Confidence to Evidence-Weighted Score Migration Guide
Version: 1.0
Status: Active
Last Updated: 2025-12-31
Sprint: 8200.0012.0003 (Policy Engine Integration)
Overview
This document describes the migration path from the legacy Confidence scoring system to the new Evidence-Weighted Score (EWS) system. The migration is designed to be gradual, low-risk, and fully reversible at each stage.
Key Differences
| Aspect | Confidence (Legacy) | Evidence-Weighted Score |
|---|---|---|
| Score Range | 0.0–1.0 (decimal) | 0–100 (integer) |
| Direction | Higher = more confident | Higher = higher risk/priority |
| Basis | Heuristic confidence in finding | Evidence-backed exploitability |
| Breakdown | Single value | 6 dimensions (Rch, Rts, Bkp, Xpl, Src, Mit) |
| Determinism | Limited | Fully deterministic with proofs |
| Attestation | Not attested | Included in verdict attestation |
Semantic Inversion
The most important difference is semantic inversion:
- Confidence: Higher values indicate higher confidence that a finding is accurate
- EWS: Higher values indicate higher exploitability evidence (more urgent to fix)
A high-confidence finding may have a low EWS if evidence shows it's mitigated. Conversely, a low-confidence finding may have a high EWS if runtime signals indicate active exploitation.
Migration Phases
Phase 1: Feature Flag (Opt-In)
Duration: Immediate → 2 weeks
Risk: None (off by default)
Deploy the EWS configuration section with every flag disabled; behavior does not change:
```json
{
  "Policy": {
    "EvidenceWeightedScore": {
      "Enabled": false,
      "DualEmitMode": false,
      "UseAsPrimaryScore": false
    }
  }
}
```
What happens:
- EWS infrastructure is loaded but dormant
- No performance impact
- No output changes
When to proceed: After infrastructure validation
Phase 2: Dual-Emit Mode (Parallel Calculation)
Duration: 2–4 weeks
Risk: Low (additive only)
Enable both scoring systems in parallel:
```json
{
  "Policy": {
    "EvidenceWeightedScore": {
      "Enabled": true,
      "DualEmitMode": true,
      "UseAsPrimaryScore": false
    }
  }
}
```
What happens:
- Both Confidence AND EWS are calculated
- Both appear in verdicts and attestations
- Telemetry compares rankings
- Existing rules use Confidence (unchanged behavior)
Verdict output example:
```json
{
  "findingId": "CVE-2024-1234:pkg:npm/lodash@4.17.0",
  "status": "block",
  "confidence": {
    "value": 0.85,
    "tier": "High"
  },
  "evidenceWeightedScore": {
    "score": 72,
    "bucket": "ScheduleNext",
    "breakdown": {
      "rch": { "weighted": 18, "weight": 0.25, "raw": 0.72 },
      "rts": { "weighted": 24, "weight": 0.30, "raw": 0.80 },
      "bkp": { "weighted": 0, "weight": 0.10, "raw": 0.00 },
      "xpl": { "weighted": 10, "weight": 0.15, "raw": 0.67 },
      "src": { "weighted": 12, "weight": 0.15, "raw": 0.80 },
      "mit": { "weighted": 8, "weight": 0.05, "raw": 1.60 }
    },
    "flags": ["live-signal"],
    "explanations": ["Runtime signal detected (score +8)", "Reachable via call graph"]
  }
}
```
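The breakdown in this example is internally consistent: each `weighted` value equals `weight × raw × 100`, rounded, and the six contributions add up to the reported `score` of 72. The sketch below reproduces that arithmetic. It assumes a purely additive model inferred from this one sample (the `mit` term contributes +8 here); the engine's actual formula, clamping, and rounding rules may differ.

```csharp
using System;
using System.Linq;

// Dimension values copied from the verdict example above: (name, weight, raw).
var breakdown = new (string Name, double Weight, double Raw)[]
{
    ("rch", 0.25, 0.72),
    ("rts", 0.30, 0.80),
    ("bkp", 0.10, 0.00),
    ("xpl", 0.15, 0.67),
    ("src", 0.15, 0.80),
    ("mit", 0.05, 1.60),
};

// Assumed model: weighted contribution = weight * raw * 100, rounded to an integer.
int Weighted(double weight, double raw) =>
    (int)Math.Round(weight * raw * 100, MidpointRounding.AwayFromZero);

// Assumed model: the score is the plain sum of the weighted contributions.
int Score((string Name, double Weight, double Raw)[] dims) =>
    dims.Sum(d => Weighted(d.Weight, d.Raw));

foreach (var dim in breakdown)
    Console.WriteLine($"{dim.Name}: {Weighted(dim.Weight, dim.Raw)}");
Console.WriteLine($"score: {Score(breakdown)}");
```

Comparing these recomputed values with the emitted `breakdown` is a quick way to spot a verdict whose weights do not match the configured `Weights` section.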
Monitoring during this phase:
- Use `IMigrationTelemetryService` to track alignment
- Review `MigrationTelemetryStats.AlignmentRate`
- Investigate samples where rankings diverge significantly
When to proceed: When alignment rate > 80% or divergences are understood
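The numeric half of this criterion can be expressed as a gate over the migration counters listed under Metrics. This is a hypothetical helper, not part of the telemetry API: it takes alignment rate as `aligned_total / comparisons_total` (the same ratio as the alignment-rate dashboard query) and refuses to proceed when no comparisons have been recorded yet.

```csharp
using System;

// Hypothetical gate: alignment rate = aligned / comparisons, mirroring the
// stellaops.policy.migration.aligned_total / comparisons_total counters.
bool ReadyForPhase3(long alignedTotal, long comparisonsTotal, double threshold = 0.8)
{
    if (comparisonsTotal <= 0)
        return false; // no data yet: do not proceed on an undefined ratio
    double alignmentRate = (double)alignedTotal / comparisonsTotal;
    return alignmentRate > threshold; // strictly greater, matching "> 80%"
}

Console.WriteLine(ReadyForPhase3(850, 1000)); // 85% aligned: True
Console.WriteLine(ReadyForPhase3(800, 1000)); // exactly 80% is not "> 80%": False
```

The "divergences are understood" alternative is a human judgment and is deliberately left out of the gate.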
Phase 3: EWS as Primary (Shadow Confidence)
Duration: 2–4 weeks
Risk: Medium (behavior change possible)
Switch primary scoring while keeping Confidence for validation:
```json
{
  "Policy": {
    "EvidenceWeightedScore": {
      "Enabled": true,
      "DualEmitMode": true,
      "UseAsPrimaryScore": true
    }
  }
}
```
What happens:
- EWS is used for policy rule evaluation
- Confidence is still calculated and emitted (deprecated field)
- Policy rules should be migrated to use `score` instead of `confidence`
Rule migration example:
Before (Confidence-based):
```yaml
rules:
  - name: block-high-confidence
    when: confidence >= 0.9
    then: block
```
After (EWS-based):
```yaml
rules:
  - name: block-high-evidence
    when: score >= 85
    then: block

  # Or use bucket-based for clearer semantics:
  - name: block-act-now
    when: score.bucket == "ActNow"
    then: block
```
Recommended rule patterns:
| Confidence Rule | EWS Equivalent | Notes |
|---|---|---|
| `confidence >= 0.9` | `score >= 85` or `score.bucket == "ActNow"` | Very high certainty |
| `confidence >= 0.7` | `score >= 60` or `score.bucket in ["ActNow", "ScheduleNext"]` | High certainty |
| `confidence >= 0.5` | `score >= 40` | Medium certainty |
| `confidence < 0.3` | `score < 25` | Low evidence |
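For the rule inventory, the four rows of this table can be encoded as a lookup. The helper below is hypothetical: it only recognizes conditions of the exact form `confidence >= X` or `confidence < X` with the thresholds listed in the table, and returns `null` for anything else so those rules get migrated by hand rather than guessed.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical migration helper encoding the rows of the table above.
string? MapConfidenceCondition(string condition)
{
    var m = Regex.Match(condition.Trim(), @"^confidence\s*(>=|<)\s*([0-9.]+)$");
    if (!m.Success)
        return null; // not a simple threshold condition: migrate manually

    string op = m.Groups[1].Value;
    if (!double.TryParse(m.Groups[2].Value, NumberStyles.Float,
                         CultureInfo.InvariantCulture, out double value))
        return null; // malformed number: migrate manually

    return (op, value) switch
    {
        (">=", 0.9) => "score >= 85",
        (">=", 0.7) => "score >= 60",
        (">=", 0.5) => "score >= 40",
        ("<", 0.3) => "score < 25",
        _ => null, // threshold not covered by the table: review by hand
    };
}

Console.WriteLine(MapConfidenceCondition("confidence >= 0.9"));
Console.WriteLine(MapConfidenceCondition("confidence >= 0.8") ?? "(manual review)");
```

Returning `null` for unlisted thresholds keeps the helper from interpolating between rows; the table gives no guidance for values such as 0.8.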
When to proceed: After rule migration and 2+ weeks of stable operation
Phase 4: EWS-Only (Deprecation Complete)
Duration: Permanent
Risk: Low (rollback path exists)
Disable legacy Confidence scoring:
```json
{
  "Policy": {
    "EvidenceWeightedScore": {
      "Enabled": true,
      "DualEmitMode": false,
      "UseAsPrimaryScore": true
    }
  }
}
```
What happens:
- Only EWS is calculated
- Confidence field is null in verdicts
- Performance improvement (single calculation)
- Consumers must use EWS fields
Breaking changes to document:
- `Verdict.Confidence` returns null
- `ConfidenceScore` type is deprecated (will be removed in v3.0)
- Rules referencing `confidence` will fail validation
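Consumers that read verdict JSON can be made phase-agnostic by reading the EWS fields first and treating `confidence` as optional. A sketch using `System.Text.Json`, assuming the field names from the Phase 2 verdict example (`evidenceWeightedScore.score`, `evidenceWeightedScore.bucket`, `confidence.value`) and that Phase 4 verdicts carry `confidence` as JSON `null` or omit it:

```csharp
using System;
using System.Text.Json;

// Reads EWS fields and tolerates a null or missing legacy confidence value.
(int Score, string Bucket, double? Confidence) ReadVerdict(string json)
{
    using JsonDocument doc = JsonDocument.Parse(json);
    JsonElement root = doc.RootElement;

    JsonElement ews = root.GetProperty("evidenceWeightedScore");
    int score = ews.GetProperty("score").GetInt32();
    string bucket = ews.GetProperty("bucket").GetString() ?? "";

    // Legacy field: present as an object in Phases 2-3, null or absent in Phase 4.
    double? confidence = null;
    if (root.TryGetProperty("confidence", out JsonElement conf) &&
        conf.ValueKind == JsonValueKind.Object)
    {
        confidence = conf.GetProperty("value").GetDouble();
    }
    return (score, bucket, confidence);
}

string phase4Verdict =
    "{\"confidence\": null, \"evidenceWeightedScore\": {\"score\": 72, \"bucket\": \"ScheduleNext\"}}";
Console.WriteLine(ReadVerdict(phase4Verdict));
```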
Configuration Reference
Full Configuration Schema
```json
{
  "Policy": {
    "EvidenceWeightedScore": {
      "Enabled": true,
      "DualEmitMode": true,
      "UseAsPrimaryScore": false,
      "EnableCaching": true,
      "CacheDurationSeconds": 300,
      "Weights": {
        "Reachability": 0.25,
        "RuntimeSignal": 0.30,
        "BackportStatus": 0.10,
        "ExploitMaturity": 0.15,
        "SourceTrust": 0.15,
        "MitigationStatus": 0.05
      },
      "BucketThresholds": {
        "ActNow": 85,
        "ScheduleNext": 60,
        "Investigate": 40
      },
      "Telemetry": {
        "EnableMigrationMetrics": true,
        "SampleRate": 0.1,
        "MaxSamples": 1000
      }
    }
  }
}
```
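With these `BucketThresholds`, bucket assignment amounts to comparing the score against each threshold from highest to lowest. This is a sketch, assuming the thresholds are inclusive lower bounds (consistent with the verdict example, where 72 lands in `ScheduleNext`, and with the rule table pairing `score >= 85` with `ActNow`). This guide does not name the bucket below `Investigate`, so the sketch returns a placeholder label.

```csharp
using System;

// Thresholds from the BucketThresholds section of the configuration above.
const int ActNow = 85, ScheduleNext = 60, Investigate = 40;

// Assumption: thresholds are inclusive lower bounds, checked highest first.
string BucketFor(int score) => score switch
{
    >= ActNow => "ActNow",
    >= ScheduleNext => "ScheduleNext",
    >= Investigate => "Investigate",
    _ => "(below Investigate)", // placeholder: this bucket's name is not given here
};

foreach (var s in new[] { 92, 72, 45, 10 })
    Console.WriteLine($"{s} -> {BucketFor(s)}");
```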
Environment Variables
| Variable | Default | Description |
|---|---|---|
| `POLICY_EWS_ENABLED` | `false` | Enable EWS calculation |
| `POLICY_EWS_DUAL_EMIT` | `false` | Emit both scores |
| `POLICY_EWS_PRIMARY` | `false` | Use EWS as primary score |
| `POLICY_EWS_CACHE_ENABLED` | `true` | Enable score caching |
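The three boolean variables select the migration phase. As a sketch, the hypothetical helper below reads them with the defaults from this table and names the phase that matches the flag settings shown for each phase, treating `Enabled = false` as Phase 1 regardless of the other flags (as the Phase 2 → Phase 1 rollback does). Accepting only a case-insensitive `true` as true, and ignoring how these variables interact with the JSON configuration, are assumptions. The lookup is injected so the logic can be tested without touching the process environment.

```csharp
using System;

// Reads a boolean flag; the default applies when the variable is unset.
// Assumption: only a case-insensitive "true" counts as true.
bool Flag(Func<string, string?> lookup, string name, bool defaultValue)
{
    string? raw = lookup(name);
    return raw is null ? defaultValue : raw.Equals("true", StringComparison.OrdinalIgnoreCase);
}

string DetectPhase(Func<string, string?> lookup)
{
    bool enabled = Flag(lookup, "POLICY_EWS_ENABLED", false);
    bool dualEmit = Flag(lookup, "POLICY_EWS_DUAL_EMIT", false);
    bool primary = Flag(lookup, "POLICY_EWS_PRIMARY", false);

    return (enabled, dualEmit, primary) switch
    {
        (false, _, _) => "Phase 1 (EWS disabled)",
        (true, true, false) => "Phase 2 (dual-emit, Confidence primary)",
        (true, true, true) => "Phase 3 (dual-emit, EWS primary)",
        (true, false, true) => "Phase 4 (EWS only)",
        _ => "Unsupported combination", // enabled, but neither dual-emit nor primary
    };
}

// Against the real process environment:
Console.WriteLine(DetectPhase(Environment.GetEnvironmentVariable));
```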
Telemetry & Monitoring
Metrics
The migration telemetry service exposes these metrics:
| Metric | Type | Description |
|---|---|---|
| `stellaops.policy.migration.comparisons_total` | Counter | Total comparisons made |
| `stellaops.policy.migration.aligned_total` | Counter | Comparisons where rankings aligned |
| `stellaops.policy.migration.score_difference` | Histogram | Distribution of score differences |
| `stellaops.policy.migration.tier_bucket_match_total` | Counter | Tier/bucket matches |
| `stellaops.policy.dual_emit.verdicts_total` | Counter | Dual-emit verdicts produced |
Dashboard Queries
Alignment rate over time:
```promql
rate(stellaops_policy_migration_aligned_total[5m])
  / rate(stellaops_policy_migration_comparisons_total[5m])
```
Score difference distribution (95th percentile over a 5-minute window):
```promql
histogram_quantile(0.95, sum by (le) (rate(stellaops_policy_migration_score_difference_bucket[5m])))
```
Sample Analysis
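Use `IMigrationTelemetryService.GetRecentSamples()` to retrieve recent comparison samples and filter for the divergent ones:
```csharp
using System;
using System.Linq;
using Microsoft.Extensions.DependencyInjection;

// serviceProvider: an IServiceProvider in which IMigrationTelemetryService is registered.
var telemetry = serviceProvider.GetRequiredService<IMigrationTelemetryService>();
var stats = telemetry.GetStats();

if (stats.AlignmentRate < 0.8m)
{
    // Keep only divergent samples, largest score difference first.
    var samples = telemetry.GetRecentSamples(50)
        .Where(s => !s.IsAligned)
        .OrderByDescending(s => Math.Abs(s.ScoreDifference));

    foreach (var sample in samples)
    {
        Console.WriteLine($"{sample.FindingId}: Conf={sample.ConfidenceValue:F2} → EWS={sample.EwsScore} (Δ={sample.ScoreDifference})");
    }
}
```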
Rollback Procedures
Phase 4 → Phase 3 (Re-enable Dual-Emit)
```json
{
  "Policy": {
    "EvidenceWeightedScore": {
      "DualEmitMode": true
    }
  }
}
```
Restart services. Confidence will be calculated again.
Phase 3 → Phase 2 (Revert to Confidence Primary)
```json
{
  "Policy": {
    "EvidenceWeightedScore": {
      "UseAsPrimaryScore": false
    }
  }
}
```
Rules using confidence will work again. Rules using score will still work.
Phase 2 → Phase 1 (Disable EWS)
```json
{
  "Policy": {
    "EvidenceWeightedScore": {
      "Enabled": false
    }
  }
}
```
No EWS calculation, no performance impact.
Emergency Rollback
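Set the environment variable in the service's environment and restart the service. Environment variables are read when the process starts, so exporting one in a separate shell does not change an already-running service; if hot-reload is enabled, setting `Enabled` to `false` in the configuration file is the no-restart option:
```bash
export POLICY_EWS_ENABLED=false
```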
Rule Migration Checklist
- Inventory all policies that use the `confidence` field
- Map confidence thresholds to EWS thresholds (see table above)
- Update rules to use `score` syntax
- Consider using bucket-based rules for clearer semantics
- Test rules in dual-emit mode before switching primary
- Update documentation and runbooks
- Train operators on new score interpretation
- Update alerting thresholds
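The inventory step can start from a plain text search. The sketch below is a hypothetical helper that flags lines in policy YAML referring to the `confidence` field. It excludes hyphenated rule names such as `block-high-confidence`, but it is a starting point for review rather than a parser: comments or strings that mention the word will also match.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Returns (line number, trimmed text) for every line that references `confidence`.
List<(int Line, string Text)> FindConfidenceReferences(string policyYaml)
{
    var hits = new List<(int Line, string Text)>();
    string[] lines = policyYaml.Split('\n');
    for (int i = 0; i < lines.Length; i++)
    {
        // Not preceded or followed by a word character or hyphen,
        // so "block-high-confidence" is skipped but "confidence >= 0.9" is found.
        if (Regex.IsMatch(lines[i], @"(?<![\w-])confidence(?![\w-])"))
            hits.Add((i + 1, lines[i].Trim()));
    }
    return hits;
}

string policy = "rules:\n  - name: block-high-confidence\n    when: confidence >= 0.9\n    then: block\n";
foreach (var (line, text) in FindConfidenceReferences(policy))
    Console.WriteLine($"line {line}: {text}");
```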
FAQ
Q: Will existing rules break?
A: Not during dual-emit mode: rules using `confidence` continue to work. Once `UseAsPrimaryScore` is `true`, new rules should use `score`. Old `confidence` rules will emit deprecation warnings and fail validation in Phase 4.
Q: How do I interpret the score difference?
A: The ConfidenceToEwsAdapter maps Confidence (0-1) to an approximate EWS (0-100) with semantic inversion. A "difference" of ±15 points is normal due to the different underlying models. Investigate differences > 30 points.
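This rule of thumb can be applied mechanically when triaging samples. A sketch, assuming the difference is already expressed in EWS points (as `ScoreDifference` is in the sample-analysis example). The answer above gives no guidance for differences between 15 and 30 points, so the sketch labels that range with a neutral placeholder rather than inventing a policy.

```csharp
using System;

// Thresholds from the FAQ: within ±15 is normal, beyond 30 should be investigated.
string ClassifyDifference(double difference)
{
    double magnitude = Math.Abs(difference);
    if (magnitude <= 15) return "Normal";
    if (magnitude > 30) return "Investigate";
    return "Between thresholds"; // placeholder: range not covered by the FAQ
}

foreach (var d in new[] { -12.0, 22.0, 41.0 })
    Console.WriteLine($"{d}: {ClassifyDifference(d)}");
```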
Q: What if my rankings diverge significantly?
A: This is expected for findings where:
- Runtime signals (Rts) differ from static analysis
- Vendor VEX overrides traditional severity
- Reachability analysis shows unreachable code
Review these cases manually. EWS is likely more accurate due to evidence integration.
Q: Can I customize the EWS weights?
A: Yes, via Weights configuration. However, changing weights affects determinism proofs. Document any changes and bump the policy version.
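To make weight changes easy to document, a hypothetical helper can diff a custom `Weights` section against the defaults from the Configuration Reference and report the total. The defaults below are copied from that section. Two assumptions are labeled in the code: a missing key keeps its default, and because this guide does not say whether weights must sum to 1.0, the sketch only reports the sum instead of rejecting other values.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Default weights from the Full Configuration Schema above.
var defaults = new Dictionary<string, double>
{
    ["Reachability"] = 0.25,
    ["RuntimeSignal"] = 0.30,
    ["BackportStatus"] = 0.10,
    ["ExploitMaturity"] = 0.15,
    ["SourceTrust"] = 0.15,
    ["MitigationStatus"] = 0.05,
};

// Lists each weight that differs from its default, plus the total weight.
// Assumption: a key missing from `custom` keeps its default value.
(List<string> Changes, double Sum) DescribeWeights(IReadOnlyDictionary<string, double> custom)
{
    var changes = new List<string>();
    foreach (var (name, defaultWeight) in defaults)
    {
        double actual = custom.TryGetValue(name, out var w) ? w : defaultWeight;
        if (Math.Abs(actual - defaultWeight) > 1e-9)
            changes.Add($"{name}: {defaultWeight:F2} -> {actual:F2}");
    }
    // Reported only; whether the engine requires a sum of 1.0 is not stated here.
    double sum = defaults.Keys.Sum(k => custom.TryGetValue(k, out var v) ? v : defaults[k]);
    return (changes, sum);
}

var example = DescribeWeights(new Dictionary<string, double>
{
    ["RuntimeSignal"] = 0.35,
    ["MitigationStatus"] = 0.0,
});
foreach (var line in example.Changes)
    Console.WriteLine(line);
Console.WriteLine($"Sum of weights: {example.Sum:F2}");
```

The printed list can go straight into the change note that accompanies the policy version bump.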
Q: What about attestations?
A: During dual-emit, attestations include both scores. After Phase 4, only EWS is attested. Old attestations remain verifiable with their original scores.
Related Documents
Revision History
| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0 | 2025-12-31 | Implementer | Initial migration guide |