Confidence to Evidence-Weighted Score Migration Guide

Version: 1.0
Status: Active
Last Updated: 2025-12-31
Sprint: 8200.0012.0003 (Policy Engine Integration)

Overview

This document describes the migration path from the legacy Confidence scoring system to the new Evidence-Weighted Score (EWS) system. The migration is designed to be gradual, low-risk, and fully reversible at each stage.

Key Differences

| Aspect | Confidence (Legacy) | Evidence-Weighted Score |
|---|---|---|
| Score Range | 0.0–1.0 (decimal) | 0–100 (integer) |
| Direction | Higher = more confident | Higher = higher risk/priority |
| Basis | Heuristic confidence in finding | Evidence-backed exploitability |
| Breakdown | Single value | 6 dimensions (Rch, Rts, Bkp, Xpl, Src, Mit) |
| Determinism | Limited | Fully deterministic with proofs |
| Attestation | Not attested | Included in verdict attestation |

Semantic Inversion

The most important difference is semantic inversion:

Confidence: Higher values indicate higher confidence that a finding is accurate
EWS: Higher values indicate higher exploitability evidence (more urgent to fix)

A high-confidence finding may have a low EWS if evidence shows it's mitigated. Conversely, a low-confidence finding may have a high EWS if runtime signals indicate active exploitation.

Migration Phases

Phase 1: Feature Flag (Opt-In)

Duration: Immediate → 2 weeks
Risk: None (off by default)

Deploy the EWS feature flags with every option disabled, so behavior does not change:

{
  "Policy": {
    "EvidenceWeightedScore": {
      "Enabled": false,
      "DualEmitMode": false,
      "UseAsPrimaryScore": false
    }
  }
}

What happens:

EWS infrastructure is loaded but dormant
No performance impact
No output changes

When to proceed: After infrastructure validation

Phase 2: Dual-Emit Mode (Parallel Calculation)

Duration: 2–4 weeks
Risk: Low (additive only)

Enable both scoring systems in parallel:

{
  "Policy": {
    "EvidenceWeightedScore": {
      "Enabled": true,
      "DualEmitMode": true,
      "UseAsPrimaryScore": false
    }
  }
}

What happens:

Both Confidence AND EWS are calculated
Both appear in verdicts and attestations
Telemetry compares rankings
Existing rules use Confidence (unchanged behavior)

Verdict output example:

{
  "findingId": "CVE-2024-1234:pkg:npm/lodash@4.17.0",
  "status": "block",
  "confidence": {
    "value": 0.85,
    "tier": "High"
  },
  "evidenceWeightedScore": {
    "score": 72,
    "bucket": "ScheduleNext",
    "breakdown": {
      "rch": { "weighted": 18, "weight": 0.25, "raw": 0.72 },
      "rts": { "weighted": 24, "weight": 0.30, "raw": 0.80 },
      "bkp": { "weighted": 0, "weight": 0.10, "raw": 0.00 },
      "xpl": { "weighted": 10, "weight": 0.15, "raw": 0.67 },
      "src": { "weighted": 12, "weight": 0.15, "raw": 0.80 },
      "mit": { "weighted": 8, "weight": 0.05, "raw": 1.60 }
    },
    "flags": ["live-signal"],
    "explanations": ["Runtime signal detected (score +8)", "Reachable via call graph"]
  }
}
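The breakdown in this example can be checked by hand: each `weighted` value matches `weight × raw × 100` rounded to an integer, and the six weighted values add up to the reported score. The Python sketch below performs that check. The rounding rule is an assumption inferred from the numbers above, not taken from the scoring engine. The `mit` entry's `raw` of 1.60 lies outside the 0–1 range of the other dimensions; the sketch uses the values exactly as given.

```python
# Values copied from the verdict example: dimension -> (weight, raw, weighted).
breakdown = {
    "rch": (0.25, 0.72, 18),
    "rts": (0.30, 0.80, 24),
    "bkp": (0.10, 0.00, 0),
    "xpl": (0.15, 0.67, 10),
    "src": (0.15, 0.80, 12),
    "mit": (0.05, 1.60, 8),
}


def recompute(weight: float, raw: float) -> int:
    """Assumed rule: weighted = weight * raw * 100, rounded to the nearest integer."""
    return round(weight * raw * 100)


# Dimensions whose reported weighted value disagrees with the assumed rule.
mismatches = {
    dim: (recompute(weight, raw), reported)
    for dim, (weight, raw, reported) in breakdown.items()
    if recompute(weight, raw) != reported
}

# Sum of the reported weighted values, to compare against "score".
total = sum(reported for _, _, reported in breakdown.values())

print("mismatches:", mismatches)
print("total:", total)
```

An empty `mismatches` dictionary and a total equal to the verdict's `score` indicate that the breakdown is internally consistent under the assumed rule.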

Monitoring during this phase:

Use IMigrationTelemetryService to track alignment
Review MigrationTelemetryStats.AlignmentRate
Investigate samples where rankings diverge significantly

When to proceed: When alignment rate > 80% or divergences are understood
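The proceed criterion can be written as a small gate. The sketch below assumes the alignment rate is the number of aligned comparisons divided by the total number of comparisons, which is how the dashboard query in this guide computes it. The function names and the `divergences_understood` flag are hypothetical and stand in for a manual sign-off.

```python
def alignment_rate(aligned: int, comparisons: int) -> float:
    """Fraction of comparisons where both systems ranked a finding alike."""
    if comparisons == 0:
        return 0.0  # no data yet, so treat as not aligned
    return aligned / comparisons


def ready_for_phase_3(
    aligned: int,
    comparisons: int,
    threshold: float = 0.80,
    divergences_understood: bool = False,
) -> bool:
    """True when the rate is strictly above the threshold, or an operator
    has reviewed and accepted the remaining divergences."""
    return divergences_understood or alignment_rate(aligned, comparisons) > threshold


print(ready_for_phase_3(850, 1000))
print(ready_for_phase_3(800, 1000))
```

The comparison is strict because the guide says "> 80%"; a rate of exactly 80% does not pass without the manual sign-off.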

Phase 3: EWS as Primary (Shadow Confidence)

Duration: 2–4 weeks
Risk: Medium (behavior change possible)

Switch primary scoring while keeping Confidence for validation:

{
  "Policy": {
    "EvidenceWeightedScore": {
      "Enabled": true,
      "DualEmitMode": true,
      "UseAsPrimaryScore": true
    }
  }
}

What happens:

EWS is used for policy rule evaluation
Confidence is still calculated and emitted (deprecated field)
Policy rules should be migrated to use score instead of confidence

Rule migration example:

Before (Confidence-based):

rules:
  - name: block-high-confidence
    when: confidence >= 0.9
    then: block

After (EWS-based):

rules:
  - name: block-high-evidence
    when: score >= 85
    then: block

  # Or use bucket-based for clearer semantics:
  - name: block-act-now
    when: score.bucket == "ActNow"
    then: block

Recommended rule patterns:

| Confidence Rule | EWS Equivalent | Notes |
|---|---|---|
| `confidence >= 0.9` | `score >= 85` or `score.bucket == "ActNow"` | Very high certainty |
| `confidence >= 0.7` | `score >= 60` or `score.bucket in ["ActNow", "ScheduleNext"]` | High certainty |
| `confidence >= 0.5` | `score >= 40` | Medium certainty |
| `confidence < 0.3` | `score < 25` | Low evidence |
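For a large rule inventory, the table can drive a mechanical first pass. The sketch below rewrites `confidence` comparisons in a rule condition string using only the four pairs listed above and refuses anything else. The function name and the regex-based approach are illustrative assumptions; this is not a tool shipped with the policy engine. Because the two scores measure different things, treat the output as a starting point to test in dual-emit mode.

```python
import re

# (operator, confidence threshold) -> EWS threshold, taken from the table.
THRESHOLD_MAP = {
    (">=", 0.9): 85,
    (">=", 0.7): 60,
    (">=", 0.5): 40,
    ("<", 0.3): 25,
}

# Matches e.g. "confidence >= 0.9"; longer operators are listed first.
_CONDITION = re.compile(r"\bconfidence\s*(>=|<=|==|>|<)\s*(\d+(?:\.\d+)?)")


def migrate_condition(condition: str) -> str:
    """Rewrite confidence comparisons to their EWS equivalents.

    Raises ValueError for comparisons the table does not cover, so that
    unmapped thresholds are reviewed by hand instead of guessed.
    """

    def _swap(match: re.Match) -> str:
        key = (match.group(1), float(match.group(2)))
        if key not in THRESHOLD_MAP:
            raise ValueError(f"no documented mapping for: {match.group(0)}")
        return f"score {key[0]} {THRESHOLD_MAP[key]}"

    return _CONDITION.sub(_swap, condition)


print(migrate_condition("confidence >= 0.9"))
print(migrate_condition("confidence < 0.3"))
```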

When to proceed: After rule migration and 2+ weeks of stable operation

Phase 4: EWS-Only (Deprecation Complete)

Duration: Permanent
Risk: Low (rollback path exists)

Disable legacy Confidence scoring:

{
  "Policy": {
    "EvidenceWeightedScore": {
      "Enabled": true,
      "DualEmitMode": false,
      "UseAsPrimaryScore": true
    }
  }
}

What happens:

Only EWS is calculated
Confidence field is null in verdicts
Performance improvement (single calculation)
Consumers must use EWS fields

Breaking changes to document:

Verdict.Confidence returns null
ConfidenceScore type is deprecated (will be removed in v3.0)
Rules referencing confidence will fail validation
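The four phases differ only in the three boolean flags, so a deployment's current phase can be read straight from its configuration. The sketch below encodes the flag combinations shown in the phase sections above. The type and function names are hypothetical, and combinations the guide does not describe return `None`.

```python
from typing import NamedTuple, Optional


class EwsFlags(NamedTuple):
    enabled: bool
    dual_emit_mode: bool
    use_as_primary_score: bool


# Flag combinations as listed in the Phase 1 to Phase 4 configuration blocks.
PHASES = {
    EwsFlags(False, False, False): 1,  # feature flag present, everything off
    EwsFlags(True, True, False): 2,    # dual-emit, Confidence is primary
    EwsFlags(True, True, True): 3,     # dual-emit, EWS is primary
    EwsFlags(True, False, True): 4,    # EWS only
}


def migration_phase(flags: EwsFlags) -> Optional[int]:
    """Return the phase number, or None for an undocumented combination."""
    return PHASES.get(flags)


print(migration_phase(EwsFlags(True, True, False)))
print(migration_phase(EwsFlags(True, False, False)))
```

A `None` result is a useful signal in a deployment check: it means the flags are in a state this guide does not cover.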

Configuration Reference

Full Configuration Schema

{
  "Policy": {
    "EvidenceWeightedScore": {
      "Enabled": true,
      "DualEmitMode": true,
      "UseAsPrimaryScore": false,
      "EnableCaching": true,
      "CacheDurationSeconds": 300,
      "Weights": {
        "Reachability": 0.25,
        "RuntimeSignal": 0.30,
        "BackportStatus": 0.10,
        "ExploitMaturity": 0.15,
        "SourceTrust": 0.15,
        "MitigationStatus": 0.05
      },
      "BucketThresholds": {
        "ActNow": 85,
        "ScheduleNext": 60,
        "Investigate": 40
      },
      "Telemetry": {
        "EnableMigrationMetrics": true,
        "SampleRate": 0.1,
        "MaxSamples": 1000
      }
    }
  }
}
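Two parts of this schema lend themselves to a quick sanity check: the weights and the bucket thresholds. The sketch below makes two assumptions that the guide does not state outright: that the six weights are expected to sum to 1.0 (the defaults do), and that each threshold is an inclusive lower bound (consistent with `score >= 85` being offered as equivalent to the `ActNow` bucket). The guide does not name the bucket below `Investigate`, so the sketch returns `None` there.

```python
import math
from typing import Dict, Optional

WEIGHTS = {
    "Reachability": 0.25,
    "RuntimeSignal": 0.30,
    "BackportStatus": 0.10,
    "ExploitMaturity": 0.15,
    "SourceTrust": 0.15,
    "MitigationStatus": 0.05,
}

BUCKET_THRESHOLDS = {"ActNow": 85, "ScheduleNext": 60, "Investigate": 40}


def weights_sum_to_one(weights: Dict[str, float]) -> bool:
    """Assumed invariant: the weights form a convex combination."""
    return math.isclose(sum(weights.values()), 1.0, abs_tol=1e-9)


def bucket_for(score: int, thresholds: Dict[str, int] = BUCKET_THRESHOLDS) -> Optional[str]:
    """Return the highest bucket whose threshold the score meets, else None."""
    ordered = sorted(thresholds.items(), key=lambda item: item[1], reverse=True)
    for name, minimum in ordered:
        if score >= minimum:
            return name
    return None


print(weights_sum_to_one(WEIGHTS))
print(bucket_for(72))
```

With the default thresholds, a score of 72 lands in `ScheduleNext`, matching the bucket shown in the dual-emit verdict example.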

Environment Variables

| Variable | Default | Description |
|---|---|---|
| `POLICY_EWS_ENABLED` | `false` | Enable EWS calculation |
| `POLICY_EWS_DUAL_EMIT` | `false` | Emit both scores |
| `POLICY_EWS_PRIMARY` | `false` | Use EWS as primary score |
| `POLICY_EWS_CACHE_ENABLED` | `true` | Enable score caching |
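A deployment script can resolve the effective flags from these variables and their defaults before starting the service. The sketch below is one way to do that. The accepted spellings for true (`true`, `1`, `yes`, `on`, case-insensitive) are an assumption; the guide only shows `true` and `false`, and the service's own parsing may differ.

```python
import os
from typing import Dict, Mapping, Optional

# Defaults as listed in the table above.
DEFAULTS = {
    "POLICY_EWS_ENABLED": False,
    "POLICY_EWS_DUAL_EMIT": False,
    "POLICY_EWS_PRIMARY": False,
    "POLICY_EWS_CACHE_ENABLED": True,
}

# Assumed truthy spellings; anything else that is set counts as false.
_TRUTHY = {"true", "1", "yes", "on"}


def read_flags(environ: Optional[Mapping[str, str]] = None) -> Dict[str, bool]:
    """Resolve each flag from the environment, falling back to its default."""
    env = os.environ if environ is None else environ
    flags = {}
    for name, default in DEFAULTS.items():
        raw = env.get(name)
        flags[name] = default if raw is None else raw.strip().lower() in _TRUTHY
    return flags


print(read_flags({"POLICY_EWS_ENABLED": "true"}))
```

Passing a plain dictionary instead of `os.environ` keeps the function easy to test without touching the real environment.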

Telemetry & Monitoring

Metrics

The migration telemetry service exposes these metrics:

| Metric | Type | Description |
|---|---|---|
| `stellaops.policy.migration.comparisons_total` | Counter | Total comparisons made |
| `stellaops.policy.migration.aligned_total` | Counter | Comparisons where rankings aligned |
| `stellaops.policy.migration.score_difference` | Histogram | Distribution of score differences |
| `stellaops.policy.migration.tier_bucket_match_total` | Counter | Tier/bucket matches |
| `stellaops.policy.dual_emit.verdicts_total` | Counter | Dual-emit verdicts produced |

Dashboard Queries

Alignment rate over time:

rate(stellaops_policy_migration_aligned_total[5m]) 
/ rate(stellaops_policy_migration_comparisons_total[5m])

Score difference distribution:

histogram_quantile(0.95, sum by (le) (rate(stellaops_policy_migration_score_difference_bucket[5m])))

Sample Analysis

Use IMigrationTelemetryService.GetRecentSamples() to pull recent comparison samples, then filter for the divergent ones:

using System;
using System.Linq;
using Microsoft.Extensions.DependencyInjection;

var telemetry = serviceProvider.GetRequiredService<IMigrationTelemetryService>();
var stats = telemetry.GetStats();

// Only inspect samples when alignment drops below the 80% proceed threshold.
if (stats.AlignmentRate < 0.8m)
{
    // Keep the misaligned samples, largest score gap first.
    var samples = telemetry.GetRecentSamples(50)
        .Where(s => !s.IsAligned)
        .OrderByDescending(s => Math.Abs(s.ScoreDifference));

    foreach (var sample in samples)
    {
        Console.WriteLine($"{sample.FindingId}: Conf={sample.ConfidenceValue:F2} → EWS={sample.EwsScore} (Δ={sample.ScoreDifference})");
    }
}

Rollback Procedures

Phase 4 → Phase 3 (Re-enable Dual-Emit)

{
  "Policy": {
    "EvidenceWeightedScore": {
      "DualEmitMode": true
    }
  }
}

Restart services. Confidence will be calculated again.
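The rollback snippets in this section list only the key that changes. The sketch below shows the intended effect, assuming the snippet is merged key by key over the existing configuration rather than replacing it. How the service actually layers its configuration sources is not described here, and the `overlay` helper is illustrative.

```python
import copy


def overlay(base: dict, override: dict) -> dict:
    """Recursively merge override into a copy of base; override wins on conflicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = overlay(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Phase 4 configuration, as shown in the Phase 4 section.
phase4 = {
    "Policy": {
        "EvidenceWeightedScore": {
            "Enabled": True,
            "DualEmitMode": False,
            "UseAsPrimaryScore": True,
        }
    }
}

# Rollback snippet for Phase 4 -> Phase 3.
rollback = {"Policy": {"EvidenceWeightedScore": {"DualEmitMode": True}}}

phase3 = overlay(phase4, rollback)
print(phase3["Policy"]["EvidenceWeightedScore"])
```

The merged result keeps `Enabled` and `UseAsPrimaryScore` at `true` and flips only `DualEmitMode`, which is the Phase 3 flag combination.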

Phase 3 → Phase 2 (Revert to Confidence Primary)

{
  "Policy": {
    "EvidenceWeightedScore": {
      "UseAsPrimaryScore": false
    }
  }
}

Rules using confidence will work again. Rules using score will still work.

Phase 2 → Phase 1 (Disable EWS)

{
  "Policy": {
    "EvidenceWeightedScore": {
      "Enabled": false
    }
  }
}

No EWS calculation, no performance impact.

Emergency Rollback

Set the environment variable to take effect immediately without a restart (only if hot-reload is enabled):

export POLICY_EWS_ENABLED=false

Rule Migration Checklist

Inventory all policies using confidence field
Map confidence thresholds to EWS thresholds (see table above)
Update rules to use score syntax
Consider using bucket-based rules for clearer semantics
Test rules in dual-emit mode before switching primary
Update documentation and runbooks
Train operators on new score interpretation
Update alerting thresholds

FAQ

Q: Will existing rules break?

A: Not during dual-emit mode. Rules using confidence continue to work. Once UseAsPrimaryScore is set to true, new rules should use score. Old confidence rules emit deprecation warnings and then fail validation in Phase 4.

Q: How do I interpret the score difference?

A: The ConfidenceToEwsAdapter maps Confidence (0-1) to an approximate EWS (0-100) with semantic inversion. A "difference" of ±15 points is normal due to the different underlying models. Investigate differences > 30 points.
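These two thresholds translate into a simple triage rule for divergent samples. In the sketch below, the function name and the `review` label for the 15–30 point band are hypothetical; the guide only says that ±15 is normal and that differences above 30 should be investigated.

```python
def triage_difference(
    score_difference: float,
    normal_band: float = 15,
    investigate_above: float = 30,
) -> str:
    """Classify a Confidence-vs-EWS score difference by its magnitude."""
    magnitude = abs(score_difference)
    if magnitude > investigate_above:
        return "investigate"
    if magnitude <= normal_band:
        return "normal"
    # Between the two documented thresholds: hypothetical middle label.
    return "review"


for delta in (10, -15, 22, -31):
    print(delta, triage_difference(delta))
```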

Q: What if my rankings diverge significantly?

A: This is expected for findings where:

Runtime signals (Rts) differ from static analysis
Vendor VEX overrides traditional severity
Reachability analysis shows unreachable code

Review these cases manually. EWS is likely more accurate due to evidence integration.

Q: Can I customize the EWS weights?

A: Yes, via Weights configuration. However, changing weights affects determinism proofs. Document any changes and bump the policy version.

Q: What about attestations?

A: During dual-emit, attestations include both scores. After Phase 4, only EWS is attested. Old attestations remain verifiable with their original scores.

Revision History

Version	Date	Author	Changes
1.0	2025-12-31	Implementer	Initial migration guide
