docs consolidation
docs/modules/policy/design/confidence-to-ews-migration.md
# Confidence to Evidence-Weighted Score Migration Guide

> **Version:** 1.0
> **Status:** Active
> **Last Updated:** 2025-12-31
> **Sprint:** 8200.0012.0003 (Policy Engine Integration)

## Overview

This document describes the migration path from the legacy **Confidence** scoring system to the new **Evidence-Weighted Score (EWS)** system. The migration is designed to be gradual, low-risk, and fully reversible at each stage.

### Key Differences

| Aspect | Confidence (Legacy) | Evidence-Weighted Score |
|--------|---------------------|------------------------|
| **Score Range** | 0.0–1.0 (decimal) | 0–100 (integer) |
| **Direction** | Higher = more confident | Higher = higher risk/priority |
| **Basis** | Heuristic confidence in finding | Evidence-backed exploitability |
| **Breakdown** | Single value | 6 dimensions (Rch, Rts, Bkp, Xpl, Src, Mit) |
| **Determinism** | Limited | Fully deterministic with proofs |
| **Attestation** | Not attested | Included in verdict attestation |

### Semantic Inversion

The most important difference is **semantic inversion**:

- **Confidence**: Higher values indicate higher confidence that a finding is accurate
- **EWS**: Higher values indicate higher exploitability evidence (more urgent to fix)

A high-confidence finding may have a low EWS if evidence shows it's mitigated. Conversely, a low-confidence finding may have a high EWS if runtime signals indicate active exploitation.

---

## Migration Phases

### Phase 1: Feature Flag (Opt-In)

**Duration:** Immediate → 2 weeks
**Risk:** None (off by default)

Deploy the EWS configuration with every flag disabled, so behavior does not change:

```json
{
  "Policy": {
    "EvidenceWeightedScore": {
      "Enabled": false,
      "DualEmitMode": false,
      "UseAsPrimaryScore": false
    }
  }
}
```

**What happens:**
- EWS infrastructure is loaded but dormant
- No performance impact
- No output changes

**When to proceed:** After infrastructure validation

---

### Phase 2: Dual-Emit Mode (Parallel Calculation)

**Duration:** 2–4 weeks
**Risk:** Low (additive only)

Enable both scoring systems in parallel:

```json
{
  "Policy": {
    "EvidenceWeightedScore": {
      "Enabled": true,
      "DualEmitMode": true,
      "UseAsPrimaryScore": false
    }
  }
}
```

**What happens:**
- Both Confidence AND EWS are calculated
- Both appear in verdicts and attestations
- Telemetry compares rankings
- Existing rules use Confidence (unchanged behavior)

**Verdict output example:**
```json
{
  "findingId": "CVE-2024-1234:pkg:npm/lodash@4.17.0",
  "status": "block",
  "confidence": {
    "value": 0.85,
    "tier": "High"
  },
  "evidenceWeightedScore": {
    "score": 72,
    "bucket": "ScheduleNext",
    "breakdown": {
      "rch": { "weighted": 18, "weight": 0.25, "raw": 0.72 },
      "rts": { "weighted": 24, "weight": 0.30, "raw": 0.80 },
      "bkp": { "weighted": 0, "weight": 0.10, "raw": 0.00 },
      "xpl": { "weighted": 10, "weight": 0.15, "raw": 0.67 },
      "src": { "weighted": 12, "weight": 0.15, "raw": 0.80 },
      "mit": { "weighted": 8, "weight": 0.05, "raw": 1.60 }
    },
    "flags": ["live-signal"],
    "explanations": ["Runtime signal detected (score +8)", "Reachable via call graph"]
  }
}
```

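The numbers in the verdict example are internally consistent: each `weighted` value equals `weight × raw × 100` rounded to the nearest integer, and the six weighted values sum to `score`. The Python sketch below checks that arithmetic. The formula is inferred from the example values only, so treat it as an assumption about how the breakdown relates to the score rather than a description of the engine's calculation. The helper names `weighted_points` and `check_breakdown` are hypothetical.

```python
# Breakdown values copied from the verdict output example in this guide.
BREAKDOWN = {
    "rch": {"weighted": 18, "weight": 0.25, "raw": 0.72},
    "rts": {"weighted": 24, "weight": 0.30, "raw": 0.80},
    "bkp": {"weighted": 0, "weight": 0.10, "raw": 0.00},
    "xpl": {"weighted": 10, "weight": 0.15, "raw": 0.67},
    "src": {"weighted": 12, "weight": 0.15, "raw": 0.80},
    "mit": {"weighted": 8, "weight": 0.05, "raw": 1.60},
}
SCORE = 72


def weighted_points(weight: float, raw: float) -> int:
    """Assumed relationship: weight x raw x 100, rounded to an integer."""
    return round(weight * raw * 100)


def check_breakdown(breakdown: dict, score: int) -> bool:
    """Return True if every dimension matches the assumed formula
    and the weighted values add up to the overall score."""
    per_dimension_ok = all(
        weighted_points(d["weight"], d["raw"]) == d["weighted"]
        for d in breakdown.values()
    )
    total_ok = sum(d["weighted"] for d in breakdown.values()) == score
    return per_dimension_ok and total_ok


if __name__ == "__main__":
    print(check_breakdown(BREAKDOWN, SCORE))
```

A check like this can be useful when reviewing dual-emit samples by hand, since a breakdown that does not add up to its score points to a reporting problem rather than a ranking divergence.
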
**Monitoring during this phase:**
- Use `IMigrationTelemetryService` to track alignment
- Review `MigrationTelemetryStats.AlignmentRate`
- Investigate samples where rankings diverge significantly

**When to proceed:** When alignment rate > 80% or divergences are understood

---

### Phase 3: EWS as Primary (Shadow Confidence)

**Duration:** 2–4 weeks
**Risk:** Medium (behavior change possible)

Switch primary scoring while keeping Confidence for validation:

```json
{
  "Policy": {
    "EvidenceWeightedScore": {
      "Enabled": true,
      "DualEmitMode": true,
      "UseAsPrimaryScore": true
    }
  }
}
```

**What happens:**
- EWS is used for policy rule evaluation
- Confidence is still calculated and emitted (deprecated field)
- Policy rules should be migrated to use `score` instead of `confidence`

**Rule migration example:**

Before (Confidence-based):
```yaml
rules:
  - name: block-high-confidence
    when: confidence >= 0.9
    then: block
```

After (EWS-based):
```yaml
rules:
  - name: block-high-evidence
    when: score >= 85
    then: block

  # Or use bucket-based for clearer semantics:
  - name: block-act-now
    when: score.bucket == "ActNow"
    then: block
```

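When many rules need rewriting, the threshold pairs recommended in this phase (0.9 → 85, 0.7 → 60, 0.5 → 40, 0.3 → 25) can be applied mechanically. The Python sketch below is a hypothetical helper, not part of the policy engine or its DSL tooling. It rewrites simple `confidence` conditions using those pairs and refuses to guess for any other threshold, because the two scales are not linearly related (0.9 maps to 85, not 90).

```python
import re

# Threshold pairs taken from the recommended rule patterns in this guide.
RECOMMENDED_THRESHOLDS = {0.9: 85, 0.7: 60, 0.5: 40, 0.3: 25}

# Matches simple conditions such as "confidence >= 0.9" or "confidence < 0.3".
_CONDITION = re.compile(r"^confidence\s*(>=|<)\s*([0-9.]+)$")


def ews_threshold_for(confidence_threshold: float) -> int:
    """Look up the recommended EWS threshold; raise KeyError if none is listed."""
    key = round(confidence_threshold, 2)
    if key not in RECOMMENDED_THRESHOLDS:
        raise KeyError(f"no recommended EWS threshold for confidence {key}")
    return RECOMMENDED_THRESHOLDS[key]


def migrate_condition(condition: str) -> str:
    """Rewrite a simple confidence condition as a score condition."""
    match = _CONDITION.match(condition.strip())
    if match is None:
        raise ValueError(f"unsupported condition: {condition!r}")
    operator, value = match.groups()
    return f"score {operator} {ews_threshold_for(float(value))}"


if __name__ == "__main__":
    print(migrate_condition("confidence >= 0.9"))
    print(migrate_condition("confidence < 0.3"))
```

Because of the semantic inversion, a mechanically rewritten rule is only a starting point. Each migrated rule still needs review in dual-emit mode before the primary score is switched.
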
**Recommended rule patterns:**

| Confidence Rule | EWS Equivalent | Notes |
|----------------|----------------|-------|
| `confidence >= 0.9` | `score >= 85` or `score.bucket == "ActNow"` | Very high certainty |
| `confidence >= 0.7` | `score >= 60` or `score.bucket in ["ActNow", "ScheduleNext"]` | High certainty |
| `confidence >= 0.5` | `score >= 40` | Medium certainty |
| `confidence < 0.3` | `score < 25` | Low evidence |

**When to proceed:** After rule migration and 2+ weeks of stable operation

---

### Phase 4: EWS-Only (Deprecation Complete)

**Duration:** Permanent
**Risk:** Low (rollback path exists)

Disable legacy Confidence scoring:

```json
{
  "Policy": {
    "EvidenceWeightedScore": {
      "Enabled": true,
      "DualEmitMode": false,
      "UseAsPrimaryScore": true
    }
  }
}
```

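Across the four phases, only three flags change. As a quick reference, the Python sketch below maps each flag combination shown in this guide to its phase and rejects combinations the guide does not describe. The type `EwsFlags` and the function `phase_for` are hypothetical names for illustration; the actual options class in the policy engine may look different.

```python
from typing import NamedTuple


class EwsFlags(NamedTuple):
    """The three flags from Policy:EvidenceWeightedScore (illustrative type)."""
    enabled: bool
    dual_emit: bool
    primary: bool


# Flag combinations exactly as shown in the phase configurations of this guide.
PHASES = {
    EwsFlags(False, False, False): "Phase 1: Feature Flag (Opt-In)",
    EwsFlags(True, True, False): "Phase 2: Dual-Emit Mode",
    EwsFlags(True, True, True): "Phase 3: EWS as Primary",
    EwsFlags(True, False, True): "Phase 4: EWS-Only",
}


def phase_for(flags: EwsFlags) -> str:
    """Return the migration phase for a flag combination.

    Raises ValueError for combinations this guide does not describe,
    for example Enabled=true with both other flags false.
    """
    try:
        return PHASES[flags]
    except KeyError:
        raise ValueError(f"combination not described in this guide: {flags}") from None


if __name__ == "__main__":
    print(phase_for(EwsFlags(enabled=True, dual_emit=False, primary=True)))
```
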
**What happens:**
- Only EWS is calculated
- Confidence field is null in verdicts
- Performance improvement (single calculation)
- Consumers must use EWS fields

**Breaking changes to document:**
- `Verdict.Confidence` returns null
- `ConfidenceScore` type is deprecated (will be removed in v3.0)
- Rules referencing `confidence` will fail validation

---

## Configuration Reference

### Full Configuration Schema

```json
{
  "Policy": {
    "EvidenceWeightedScore": {
      "Enabled": true,
      "DualEmitMode": true,
      "UseAsPrimaryScore": false,
      "EnableCaching": true,
      "CacheDurationSeconds": 300,
      "Weights": {
        "Reachability": 0.25,
        "RuntimeSignal": 0.30,
        "BackportStatus": 0.10,
        "ExploitMaturity": 0.15,
        "SourceTrust": 0.15,
        "MitigationStatus": 0.05
      },
      "BucketThresholds": {
        "ActNow": 85,
        "ScheduleNext": 60,
        "Investigate": 40
      },
      "Telemetry": {
        "EnableMigrationMetrics": true,
        "SampleRate": 0.1,
        "MaxSamples": 1000
      }
    }
  }
}
```

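The `Weights` and `BucketThresholds` values above can be sanity-checked before deployment. The Python sketch below rests on two assumptions that this guide does not state explicitly: that the weights are meant to sum to 1.0 (the defaults do), and that each bucket threshold is an inclusive lower bound (consistent with `score >= 85` being listed alongside `ActNow` in the rule patterns). Scores below the lowest threshold return `None` because the guide does not name that bucket. The function names are hypothetical.

```python
import math
from typing import Optional

# Default values copied from the configuration schema in this guide.
WEIGHTS = {
    "Reachability": 0.25,
    "RuntimeSignal": 0.30,
    "BackportStatus": 0.10,
    "ExploitMaturity": 0.15,
    "SourceTrust": 0.15,
    "MitigationStatus": 0.05,
}
BUCKET_THRESHOLDS = {"ActNow": 85, "ScheduleNext": 60, "Investigate": 40}


def weights_sum_to_one(weights: dict) -> bool:
    """Assumed invariant: the dimension weights add up to 1.0."""
    return math.isclose(sum(weights.values()), 1.0, abs_tol=1e-9)


def bucket_for(score: int, thresholds: dict = BUCKET_THRESHOLDS) -> Optional[str]:
    """Return the highest bucket whose threshold the score meets.

    Thresholds are treated as inclusive lower bounds (an assumption).
    Returns None when the score is below every configured threshold.
    """
    ordered = sorted(thresholds.items(), key=lambda item: item[1], reverse=True)
    for name, minimum in ordered:
        if score >= minimum:
            return name
    return None


if __name__ == "__main__":
    print(weights_sum_to_one(WEIGHTS))
    print(bucket_for(72))
```

With the default thresholds, the score of 72 from the dual-emit verdict example falls in `ScheduleNext`, matching the `bucket` field shown there.
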
### Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `POLICY_EWS_ENABLED` | `false` | Enable EWS calculation |
| `POLICY_EWS_DUAL_EMIT` | `false` | Emit both scores |
| `POLICY_EWS_PRIMARY` | `false` | Use EWS as primary score |
| `POLICY_EWS_CACHE_ENABLED` | `true` | Enable score caching |

---

## Telemetry & Monitoring

### Metrics

The migration telemetry service exposes these metrics:

| Metric | Type | Description |
|--------|------|-------------|
| `stellaops.policy.migration.comparisons_total` | Counter | Total comparisons made |
| `stellaops.policy.migration.aligned_total` | Counter | Comparisons where rankings aligned |
| `stellaops.policy.migration.score_difference` | Histogram | Distribution of score differences |
| `stellaops.policy.migration.tier_bucket_match_total` | Counter | Tier/bucket matches |
| `stellaops.policy.dual_emit.verdicts_total` | Counter | Dual-emit verdicts produced |

### Dashboard Queries

**Alignment rate over time:**
```promql
# Aggregate across all series before dividing, so label sets cannot mismatch.
sum(rate(stellaops_policy_migration_aligned_total[5m]))
  / sum(rate(stellaops_policy_migration_comparisons_total[5m]))
```

**Score difference distribution (95th percentile):**
```promql
# histogram_quantile expects per-bucket rates grouped by the `le` label.
histogram_quantile(
  0.95,
  sum by (le) (rate(stellaops_policy_migration_score_difference_bucket[5m]))
)
```

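The alignment-rate query feeds directly into the Phase 2 exit criterion (alignment rate above 80%). As a rough illustration, the Python sketch below computes the same ratio from two raw counter values and applies that gate. The function names are hypothetical, and the sketch covers only the numeric half of the criterion; the alternative condition, that divergences are understood, remains a human judgment.

```python
from typing import Optional

# Phase 2 "when to proceed" threshold from this guide.
ALIGNMENT_GATE = 0.80


def alignment_rate(aligned_total: int, comparisons_total: int) -> Optional[float]:
    """Fraction of comparisons whose rankings aligned, or None with no data."""
    if comparisons_total <= 0:
        return None
    return aligned_total / comparisons_total


def meets_alignment_gate(aligned_total: int, comparisons_total: int) -> bool:
    """True only when data exists and the rate is strictly above the gate."""
    rate = alignment_rate(aligned_total, comparisons_total)
    return rate is not None and rate > ALIGNMENT_GATE


if __name__ == "__main__":
    print(alignment_rate(850, 1000))
    print(meets_alignment_gate(850, 1000))
```

Returning `None` for an empty window keeps "no comparisons yet" distinct from "zero alignment", which matters early in Phase 2 when telemetry sampling has produced little data.
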
### Sample Analysis

Use `IMigrationTelemetryService.GetRecentSamples()` to retrieve recent samples, then filter for the divergent ones:

```csharp
using System;
using System.Linq;
using Microsoft.Extensions.DependencyInjection;

// Assumes `serviceProvider` is an IServiceProvider already in scope.
var telemetry = serviceProvider.GetRequiredService<IMigrationTelemetryService>();
var stats = telemetry.GetStats();

// Only dig into samples when alignment drops below the 80% gate.
if (stats.AlignmentRate < 0.8m)
{
    // Keep misaligned samples, largest absolute score difference first.
    var samples = telemetry.GetRecentSamples(50)
        .Where(s => !s.IsAligned)
        .OrderByDescending(s => Math.Abs(s.ScoreDifference));

    foreach (var sample in samples)
    {
        Console.WriteLine($"{sample.FindingId}: Conf={sample.ConfidenceValue:F2} → EWS={sample.EwsScore} (Δ={sample.ScoreDifference})");
    }
}
```

---

## Rollback Procedures

### Phase 4 → Phase 3 (Re-enable Dual-Emit)

```json
{
  "Policy": {
    "EvidenceWeightedScore": {
      "DualEmitMode": true
    }
  }
}
```

Restart services. Confidence will be calculated again.

### Phase 3 → Phase 2 (Revert to Confidence Primary)

```json
{
  "Policy": {
    "EvidenceWeightedScore": {
      "UseAsPrimaryScore": false
    }
  }
}
```

Rules using `confidence` will work again. Rules using `score` will still work.

### Phase 2 → Phase 1 (Disable EWS)

```json
{
  "Policy": {
    "EvidenceWeightedScore": {
      "Enabled": false
    }
  }
}
```

No EWS calculation, no performance impact.

### Emergency Rollback

Set the environment variable to disable EWS. This takes effect without a restart only if hot-reload is enabled:

```bash
export POLICY_EWS_ENABLED=false
```

---

## Rule Migration Checklist

- [ ] Inventory all policies using the `confidence` field
- [ ] Map confidence thresholds to EWS thresholds (see the recommended rule patterns table in Phase 3)
- [ ] Update rules to use `score` syntax
- [ ] Consider using bucket-based rules for clearer semantics
- [ ] Test rules in dual-emit mode before switching primary
- [ ] Update documentation and runbooks
- [ ] Train operators on new score interpretation
- [ ] Update alerting thresholds

---

## FAQ

### Q: Will existing rules break?

**A:** Not during dual-emit mode. Rules using `confidence` continue to work. Once `UseAsPrimaryScore` is `true`, new rules should use `score`. Old `confidence` rules will emit deprecation warnings and will fail validation in Phase 4.

### Q: How do I interpret the score difference?

**A:** The `ConfidenceToEwsAdapter` maps Confidence (0.0–1.0) to an approximate EWS (0–100) with semantic inversion. A difference of ±15 points is normal because the underlying models differ. Investigate differences greater than 30 points.

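The two thresholds in this answer can be turned into a simple triage rule for divergent samples. The Python sketch below is illustrative only: the function name and the labels are hypothetical, and the middle band (more than 15 and up to 30 points) is labeled `"review"` as an assumption, since this guide does not say how to treat it.

```python
def triage_difference(score_difference: float) -> str:
    """Classify a Confidence-vs-EWS score difference by magnitude.

    Up to 15 points is treated as normal and more than 30 as needing
    investigation, per this guide. The band in between is an assumed
    "review" category that the guide does not define.
    """
    magnitude = abs(score_difference)
    if magnitude <= 15:
        return "normal"
    if magnitude > 30:
        return "investigate"
    return "review"


if __name__ == "__main__":
    for diff in (-12, 22, -41):
        print(diff, triage_difference(diff))
```
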
### Q: What if my rankings diverge significantly?

**A:** This is expected for findings where:
- Runtime signals (Rts) differ from static analysis
- Vendor VEX overrides traditional severity
- Reachability analysis shows unreachable code

Review these cases manually. EWS is likely more accurate due to evidence integration.

### Q: Can I customize the EWS weights?

**A:** Yes, via `Weights` configuration. However, changing weights affects determinism proofs. Document any changes and bump the policy version.

### Q: What about attestations?

**A:** During dual-emit, attestations include both scores. After Phase 4, only EWS is attested. Old attestations remain verifiable with their original scores.

---

## Related Documents

- [Evidence-Weighted Score Architecture](../../signals/architecture.md)
- [Policy DSL Reference](../contracts/policy-dsl.md)
- [Verdict Attestation](../verdict-attestation.md)
- [Sprint 8200.0012.0003](../../../../implplan/SPRINT_8200_0012_0003_policy_engine_integration.md)

---

## Revision History

| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0 | 2025-12-31 | Implementer | Initial migration guide |