Fidelity Metrics Framework
Sprint: SPRINT_3403_0001_0001_fidelity_metrics
This document describes the three-tier fidelity metrics framework for measuring deterministic reproducibility in StellaOps scanner outputs.
Overview
Fidelity metrics quantify how consistently the scanner produces outputs across replay runs. The framework provides three tiers of measurement, each capturing different aspects of reproducibility:
| Metric | Abbrev. | Description | Target |
|---|---|---|---|
| Bitwise Fidelity | BF | Byte-for-byte identical outputs | ≥ 0.98 |
| Semantic Fidelity | SF | Normalized object equivalence | ≥ 0.99 |
| Policy Fidelity | PF | Policy decision consistency | ≈ 1.0 |
Metric Definitions
Bitwise Fidelity (BF)
Measures the proportion of replay runs that produce byte-for-byte identical outputs.
BF = identical_outputs / total_replays
What it captures:
- SHA-256 hash equivalence of all output artifacts
- Timestamp consistency
- JSON formatting consistency
- Field ordering consistency
When BF < 1.0:
- Timestamps embedded in outputs
- Non-deterministic field ordering
- Floating-point rounding differences
- Random identifiers (UUIDs)
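The BF ratio can be sketched as follows. The sketch assumes each run's output is available as a map from artifact name to raw bytes. The function names are hypothetical and do not reflect the BitwiseFidelityCalculator API. A replay counts as identical only when it emits the same set of artifacts and each artifact has the same SHA-256 digest as the baseline.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;

// Hex-encoded SHA-256 digest of one artifact's raw bytes.
static string Sha256Hex(byte[] bytes) => Convert.ToHexString(SHA256.HashData(bytes));

// True when the replay emits exactly the same artifact names as the baseline
// and every artifact has the same SHA-256 digest.
static bool IsBitwiseIdentical(IReadOnlyDictionary<string, byte[]> baseline,
                               IReadOnlyDictionary<string, byte[]> replay)
{
    if (baseline.Count != replay.Count) return false;
    foreach (var (name, bytes) in baseline)
    {
        if (!replay.TryGetValue(name, out var other) || Sha256Hex(bytes) != Sha256Hex(other))
            return false;
    }
    return true;
}

// BF = identical_outputs / total_replays.
// Assumption: an empty replay set yields 0.0 instead of throwing.
static double ComputeBitwiseFidelity(IReadOnlyDictionary<string, byte[]> baseline,
                                     IReadOnlyList<IReadOnlyDictionary<string, byte[]>> replays)
    => replays.Count == 0
        ? 0.0
        : (double)replays.Count(r => IsBitwiseIdentical(baseline, r)) / replays.Count;
```

This metric compares digests rather than parsed content. That choice makes it sensitive to embedded timestamps, field ordering and formatting, which are the causes listed above.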
Semantic Fidelity (SF)
Measures the proportion of replay runs that produce semantically equivalent outputs, ignoring formatting differences.
SF = semantic_matches / total_replays
What it compares:
- Package PURLs and versions
- CVE identifiers
- Severity levels (normalized to uppercase)
- VEX verdicts
- Reason codes
When SF = 1.0 but BF < 1.0:
- No actual content differences
- Only formatting differences
When SF < 1.0:
- Different packages detected
- Different CVEs matched
- Different severity assignments
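A minimal sketch of the semantic comparison follows. It assumes each run has already been reduced to a list of findings carrying the fields listed above. The tuple shape and function names are hypothetical, not the SemanticFidelityCalculator API. Each finding becomes a canonical key, with severity upper-cased and reason codes sorted. The keys are then sorted, so casing and ordering differences cannot cause a mismatch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Reduce one run's findings to a sorted array of canonical keys. Severity is
// upper-cased and reason codes are sorted, so casing and ordering differences
// do not affect the comparison.
static string[] Canonicalize(
    IEnumerable<(string Purl, string Version, string Cve, string Severity,
                 string VexVerdict, string[] ReasonCodes)> findings)
    => findings
        .Select(f => string.Join("|", f.Purl, f.Version, f.Cve,
            f.Severity.ToUpperInvariant(), f.VexVerdict,
            string.Join(",", f.ReasonCodes.OrderBy(c => c, StringComparer.Ordinal))))
        .OrderBy(key => key, StringComparer.Ordinal)
        .ToArray();

// SF = semantic_matches / total_replays, comparing canonical key arrays.
// Assumption: an empty replay set yields 0.0.
static double ComputeSemanticFidelity(string[] baselineKeys, IReadOnlyList<string[]> replayKeys)
    => replayKeys.Count == 0
        ? 0.0
        : (double)replayKeys.Count(k => k.SequenceEqual(baselineKeys)) / replayKeys.Count;
```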
Policy Fidelity (PF)
Measures the proportion of replay runs that produce matching policy decisions.
PF = policy_matches / total_replays
What it compares:
- Final pass/fail decision
- Reason codes (sorted for comparison)
- Policy rule triggering
When PF < 1.0:
- Policy outcome differs between runs
- Indicates a non-determinism bug that affects user-visible decisions
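Policy comparison can be sketched the same way. The sketch assumes a decision is represented as a pass/fail flag plus its reason codes; this shape is hypothetical. Triggered rule identifiers could be added as another sorted list. Reason codes are sorted before comparison, so only a changed outcome or a changed set of codes counts against PF.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Two decisions match when the pass/fail outcome is equal and the reason codes
// are equal after an ordinal sort, so emission order does not count as drift.
static bool PolicyMatches((bool Passed, IEnumerable<string> ReasonCodes) a,
                          (bool Passed, IEnumerable<string> ReasonCodes) b)
    => a.Passed == b.Passed
       && a.ReasonCodes.OrderBy(c => c, StringComparer.Ordinal)
           .SequenceEqual(b.ReasonCodes.OrderBy(c => c, StringComparer.Ordinal), StringComparer.Ordinal);

// PF = policy_matches / total_replays (assumption: 0.0 for an empty replay set).
static double ComputePolicyFidelity(
    (bool Passed, IEnumerable<string> ReasonCodes) baseline,
    IReadOnlyList<(bool Passed, IEnumerable<string> ReasonCodes)> replays)
    => replays.Count == 0
        ? 0.0
        : (double)replays.Count(r => PolicyMatches(baseline, r)) / replays.Count;
```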
Prometheus Metrics
The fidelity framework exports the following metrics:
| Metric Name | Type | Labels | Description |
|---|---|---|---|
| fidelity_bitwise_ratio | Gauge | tenant_id, surface_id | Bitwise fidelity ratio |
| fidelity_semantic_ratio | Gauge | tenant_id, surface_id | Semantic fidelity ratio |
| fidelity_policy_ratio | Gauge | tenant_id, surface_id | Policy fidelity ratio |
| fidelity_total_replays | Gauge | tenant_id, surface_id | Number of replays |
| fidelity_slo_breach_total | Counter | breach_type, tenant_id | SLO breach count |
SLO Thresholds
Default SLO thresholds (configurable):
| Metric | Warning | Critical |
|---|---|---|
| Bitwise Fidelity | < 0.98 | < 0.90 |
| Semantic Fidelity | < 0.99 | < 0.95 |
| Policy Fidelity | < 1.0 | < 0.99 |
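The threshold table can be read as a small classification rule, sketched below with the default values. Critical takes precedence over Warning. Both comparisons are strict, so a BF of exactly 0.98 is not a warning. The dictionary keys and string labels are illustrative assumptions.

```csharp
using System;
using System.Collections.Generic;

// Default (Warning, Critical) thresholds from the table above; a value strictly
// below a threshold trips that level.
var defaultSlo = new Dictionary<string, (double Warning, double Critical)>
{
    ["bitwise"] = (0.98, 0.90),
    ["semantic"] = (0.99, 0.95),
    ["policy"] = (1.0, 0.99),
};

// Critical is checked first so it takes precedence over Warning.
static string ClassifySlo(double value, (double Warning, double Critical) t)
    => value < t.Critical ? "Critical"
     : value < t.Warning ? "Warning"
     : "OK";
```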
Integration with DeterminismReport
Fidelity metrics are attached to the DeterminismReport and DeterminismImageReport records through an optional Fidelity property:
```csharp
public sealed record DeterminismReport(
    // ... existing fields ...
    FidelityMetrics? Fidelity = null);

public sealed record DeterminismImageReport(
    // ... existing fields ...
    FidelityMetrics? Fidelity = null);
```
Usage Example
```csharp
// Create fidelity metrics service
var service = new FidelityMetricsService(
    new BitwiseFidelityCalculator(),
    new SemanticFidelityCalculator(),
    new PolicyFidelityCalculator());

// Compute fidelity from baseline and replays
var baseline = LoadScanResult("scan-baseline.json");
var replays = LoadReplayScanResults();
var fidelity = service.Compute(baseline, replays);

// Check thresholds
if (fidelity.BitwiseFidelity < 0.98)
{
    logger.LogWarning("BF below threshold: {BF}", fidelity.BitwiseFidelity);
}

// Include in determinism report
var report = new DeterminismReport(
    // ... other fields ...
    Fidelity: fidelity);
```
Mismatch Diagnostics
When fidelity is below threshold, the framework provides diagnostic information:
```csharp
public sealed record FidelityMismatch
{
    public required int RunIndex { get; init; }
    public required FidelityMismatchType Type { get; init; }
    public required string Description { get; init; }
    public IReadOnlyList<string>? AffectedArtifacts { get; init; }
}

public enum FidelityMismatchType
{
    BitwiseOnly,   // Hash differs but content equivalent
    SemanticOnly,  // Content differs but policy matches
    PolicyDrift    // Policy decision differs
}
```
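The three mismatch types map onto the per-run results of the three comparisons. The sketch below assumes each replay has already been compared at all three tiers and reports the most severe difference. To stay self-contained, it returns the FidelityMismatchType member name as a string. It uses zero-based run indices and caps the list in the spirit of MaxMismatchesRecorded. All of these choices are assumptions rather than the framework's documented behavior.

```csharp
using System;
using System.Collections.Generic;

// Returns the FidelityMismatchType member name for the most severe difference
// in one replay, or null when the replay matches at every tier.
static string? ClassifyMismatch(bool bitwiseMatch, bool semanticMatch, bool policyMatch)
{
    if (!policyMatch) return "PolicyDrift";     // user-visible decision changed
    if (!semanticMatch) return "SemanticOnly";  // content differs, policy still matches
    if (!bitwiseMatch) return "BitwiseOnly";    // hash differs, content equivalent
    return null;
}

// Collect (RunIndex, Type) pairs for mismatching replays, stopping once
// maxRecorded entries have been collected (hypothetical cap logic).
static List<(int RunIndex, string Type)> CollectMismatches(
    IReadOnlyList<(bool Bitwise, bool Semantic, bool Policy)> runs, int maxRecorded)
{
    var result = new List<(int RunIndex, string Type)>();
    for (int i = 0; i < runs.Count && result.Count < maxRecorded; i++)
    {
        var type = ClassifyMismatch(runs[i].Bitwise, runs[i].Semantic, runs[i].Policy);
        if (type is not null) result.Add((i, type));
    }
    return result;
}
```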
Configuration
Configure fidelity options via FidelityThresholds:
```json
{
  "Fidelity": {
    "BitwiseThreshold": 0.98,
    "SemanticThreshold": 0.99,
    "PolicyThreshold": 1.0,
    "EnableDiagnostics": true,
    "MaxMismatchesRecorded": 100
  }
}
```
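To inspect what the section above contains, it can be read directly with System.Text.Json, as in this dependency-free sketch. How the service actually binds the section to FidelityThresholds is not shown here. ParseFidelityOptions is a hypothetical helper.

```csharp
using System;
using System.Text.Json;

// Read the "Fidelity" section into a tuple. Values are extracted before the
// JsonDocument is disposed at the end of the method.
static (double Bitwise, double Semantic, double Policy, bool Diagnostics, int MaxMismatches)
    ParseFidelityOptions(string json)
{
    using var doc = JsonDocument.Parse(json);
    var section = doc.RootElement.GetProperty("Fidelity");
    return (section.GetProperty("BitwiseThreshold").GetDouble(),
            section.GetProperty("SemanticThreshold").GetDouble(),
            section.GetProperty("PolicyThreshold").GetDouble(),
            section.GetProperty("EnableDiagnostics").GetBoolean(),
            section.GetProperty("MaxMismatchesRecorded").GetInt32());
}
```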
Related Documentation
- Determinism and Reproducibility Technical Reference
- Determinism Scoring Foundations Sprint
- Scanner Architecture
Source Files
- src/Scanner/StellaOps.Scanner.Worker/Determinism/FidelityMetrics.cs
- src/Scanner/StellaOps.Scanner.Worker/Determinism/FidelityMetricsService.cs
- src/Scanner/StellaOps.Scanner.Worker/Determinism/Calculators/
- src/Telemetry/StellaOps.Telemetry.Core/FidelityMetricsTelemetry.cs
- src/Telemetry/StellaOps.Telemetry.Core/FidelitySloAlertingService.cs