Fidelity Metrics Framework

Sprint: SPRINT_3403_0001_0001_fidelity_metrics

This document describes the three-tier fidelity metrics framework for measuring deterministic reproducibility in StellaOps scanner outputs.

Overview

Fidelity metrics quantify how consistently the scanner produces outputs across replay runs. The framework provides three tiers of measurement, each capturing different aspects of reproducibility:

| Metric | Abbrev. | Description | Target |
|---|---|---|---|
| Bitwise Fidelity | BF | Byte-for-byte identical outputs | ≥ 0.98 |
| Semantic Fidelity | SF | Normalized object equivalence | ≥ 0.99 |
| Policy Fidelity | PF | Policy decision consistency | ≈ 1.0 |

Metric Definitions

Bitwise Fidelity (BF)

Measures the proportion of replay runs that produce byte-for-byte identical outputs.

BF = identical_outputs / total_replays

What it captures:

  • SHA-256 hash equivalence of all output artifacts
  • Timestamp consistency
  • JSON formatting consistency
  • Field ordering consistency

Common causes when BF < 1.0:

  • Timestamps embedded in outputs
  • Non-deterministic field ordering
  • Floating-point rounding differences
  • Random identifiers (UUIDs)
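
The BF ratio can be sketched as a hash comparison between the baseline artifact and each replay. This is a minimal illustration, not the project's BitwiseFidelityCalculator: the class name `BitwiseFidelitySketch`, the single-artifact-per-run shape, and the choice to return 1.0 when there are no replays are all assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;

// Hypothetical helper for illustration; not the project's BitwiseFidelityCalculator.
public static class BitwiseFidelitySketch
{
    // Hex-encoded SHA-256 digest of one output artifact.
    public static string Sha256Hex(byte[] artifact) =>
        Convert.ToHexString(SHA256.HashData(artifact));

    // BF = identical_outputs / total_replays.
    // Returning 1.0 for an empty replay set is an assumption of this sketch.
    public static double Compute(byte[] baseline, IReadOnlyList<byte[]> replays)
    {
        if (replays.Count == 0) return 1.0;

        var expected = Sha256Hex(baseline);
        var identical = replays.Count(r => Sha256Hex(r) == expected);
        return (double)identical / replays.Count;
    }
}
```

Because the comparison is on raw bytes, a replay that differs only in whitespace or field order still counts against BF; that case is what Semantic Fidelity is meant to separate out.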

Semantic Fidelity (SF)

Measures the proportion of replay runs that produce semantically equivalent outputs, ignoring formatting differences.

SF = semantic_matches / total_replays

What it compares:

  • Package PURLs and versions
  • CVE identifiers
  • Severity levels (normalized to uppercase)
  • VEX verdicts
  • Reason codes

When BF < 1.0 but SF = 1.0:

  • No actual content differences
  • Only formatting differences

When SF < 1.0:

  • Different packages detected
  • Different CVEs matched
  • Different severity assignments
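
One way to picture the semantic comparison is to reduce each run to a canonical, sorted list of finding keys and compare those lists. The sketch below is illustrative only: the `Finding` record and `SemanticFidelitySketch` class are hypothetical, reason codes are omitted for brevity, and the actual SemanticFidelityCalculator may normalize differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical finding shape; the real scanner model is not shown in this document.
public sealed record Finding(string Purl, string CveId, string Severity, string VexVerdict);

public static class SemanticFidelitySketch
{
    // Canonical form: trimmed fields, uppercase severity, de-duplicated and sorted
    // ordinally, so ordering and casing differences do not count as mismatches.
    public static string[] Normalize(IEnumerable<Finding> findings) =>
        findings
            .Select(f => string.Join("|",
                f.Purl.Trim(),
                f.CveId.Trim(),
                f.Severity.Trim().ToUpperInvariant(),
                f.VexVerdict.Trim()))
            .Distinct()
            .OrderBy(key => key, StringComparer.Ordinal)
            .ToArray();

    // SF = semantic_matches / total_replays.
    // Returning 1.0 for an empty replay set is an assumption of this sketch.
    public static double Compute(
        IReadOnlyList<Finding> baseline,
        IReadOnlyList<IReadOnlyList<Finding>> replays)
    {
        if (replays.Count == 0) return 1.0;

        var expected = Normalize(baseline);
        var matches = replays.Count(r => Normalize(r).SequenceEqual(expected));
        return (double)matches / replays.Count;
    }
}
```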

Policy Fidelity (PF)

Measures the proportion of replay runs that produce matching policy decisions.

PF = policy_matches / total_replays

What it compares:

  • Final pass/fail decision
  • Reason codes (sorted for comparison)
  • Policy rule triggering

When PF < 1.0:

  • Policy outcome differs between runs
  • Indicates a non-determinism bug that affects user-visible decisions
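
The policy comparison can be sketched the same way: two runs match when the pass/fail decision is equal and the reason codes are equal after sorting. `PolicyOutcome` and `PolicyFidelitySketch` are hypothetical names for illustration, and rule-trigger comparison is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical outcome shape for illustration only.
public sealed record PolicyOutcome(bool Passed, IReadOnlyList<string> ReasonCodes);

public static class PolicyFidelitySketch
{
    // Record equality would compare the ReasonCodes lists by reference,
    // so the codes are sorted and compared element by element instead.
    public static bool Matches(PolicyOutcome a, PolicyOutcome b) =>
        a.Passed == b.Passed &&
        a.ReasonCodes.OrderBy(c => c, StringComparer.Ordinal)
            .SequenceEqual(b.ReasonCodes.OrderBy(c => c, StringComparer.Ordinal));

    // PF = policy_matches / total_replays.
    // Returning 1.0 for an empty replay set is an assumption of this sketch.
    public static double Compute(PolicyOutcome baseline, IReadOnlyList<PolicyOutcome> replays)
    {
        if (replays.Count == 0) return 1.0;

        var matches = replays.Count(r => Matches(baseline, r));
        return (double)matches / replays.Count;
    }
}
```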

Prometheus Metrics

The fidelity framework exports the following metrics:

| Metric Name | Type | Labels | Description |
|---|---|---|---|
| fidelity_bitwise_ratio | Gauge | tenant_id, surface_id | Bitwise fidelity ratio |
| fidelity_semantic_ratio | Gauge | tenant_id, surface_id | Semantic fidelity ratio |
| fidelity_policy_ratio | Gauge | tenant_id, surface_id | Policy fidelity ratio |
| fidelity_total_replays | Gauge | tenant_id, surface_id | Number of replays |
| fidelity_slo_breach_total | Counter | breach_type, tenant_id | SLO breach count |

SLO Thresholds

Default SLO thresholds (configurable):

| Metric | Warning | Critical |
|---|---|---|
| Bitwise Fidelity | < 0.98 | < 0.90 |
| Semantic Fidelity | < 0.99 | < 0.95 |
| Policy Fidelity | < 1.0 | < 0.99 |
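
Read as a rule, each row says that a value below the critical bound is a critical breach, a value below the warning bound (but not the critical one) is a warning, and anything else is healthy. A minimal sketch of that rule follows; `SloStatus` and `FidelitySloSketch` are hypothetical names, not the types used by FidelitySloAlertingService.

```csharp
// Hypothetical status type for illustration only.
public enum SloStatus { Ok, Warning, Critical }

public static class FidelitySloSketch
{
    // The critical bound is checked first because it is the lower of the two.
    public static SloStatus Classify(double value, double warning, double critical) =>
        value < critical ? SloStatus.Critical
        : value < warning ? SloStatus.Warning
        : SloStatus.Ok;
}
```

With the default thresholds, `Classify(0.97, 0.98, 0.90)` falls in the warning band for Bitwise Fidelity, while any Policy Fidelity value below 1.0 is at least a warning.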

Integration with DeterminismReport

Fidelity metrics are exposed as an optional Fidelity field on both the DeterminismReport and DeterminismImageReport records:

public sealed record DeterminismReport(
    // ... existing fields ...
    FidelityMetrics? Fidelity = null);

public sealed record DeterminismImageReport(
    // ... existing fields ...
    FidelityMetrics? Fidelity = null);

Usage Example

// Create fidelity metrics service
var service = new FidelityMetricsService(
    new BitwiseFidelityCalculator(),
    new SemanticFidelityCalculator(),
    new PolicyFidelityCalculator());

// Compute fidelity from baseline and replays
var baseline = LoadScanResult("scan-baseline.json");
var replays = LoadReplayScanResults();
var fidelity = service.Compute(baseline, replays);

// Check thresholds
if (fidelity.BitwiseFidelity < 0.98)
{
    logger.LogWarning("BF below threshold: {BF}", fidelity.BitwiseFidelity);
}

// Include in determinism report
var report = new DeterminismReport(
    // ... other fields ...
    Fidelity: fidelity);

Mismatch Diagnostics

When fidelity is below threshold, the framework provides diagnostic information:

public sealed record FidelityMismatch
{
    public required int RunIndex { get; init; }
    public required FidelityMismatchType Type { get; init; }
    public required string Description { get; init; }
    public IReadOnlyList<string>? AffectedArtifacts { get; init; }
}

public enum FidelityMismatchType
{
    BitwiseOnly,    // Hash differs but content equivalent
    SemanticOnly,   // Content differs but policy matches
    PolicyDrift     // Policy decision differs
}
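
The three mismatch types follow from the per-run comparison results at each tier, taking the most severe tier that disagrees. The sketch below redeclares the enum so it compiles on its own; `MismatchClassifierSketch` and its three boolean inputs are assumptions about how a run might be classified, not the framework's actual logic.

```csharp
// Redeclared here so the sketch is self-contained; mirrors the enum above.
public enum FidelityMismatchType
{
    BitwiseOnly,
    SemanticOnly,
    PolicyDrift
}

// Hypothetical classifier for illustration only.
public static class MismatchClassifierSketch
{
    // Returns null when the replay matches the baseline at every tier.
    public static FidelityMismatchType? Classify(
        bool bitwiseMatch, bool semanticMatch, bool policyMatch)
    {
        if (!policyMatch) return FidelityMismatchType.PolicyDrift;    // decision differs
        if (!semanticMatch) return FidelityMismatchType.SemanticOnly; // content differs, policy agrees
        if (!bitwiseMatch) return FidelityMismatchType.BitwiseOnly;   // only the bytes differ
        return null;
    }
}
```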

Configuration

Configure fidelity options via FidelityThresholds:

{
  "Fidelity": {
    "BitwiseThreshold": 0.98,
    "SemanticThreshold": 0.99,
    "PolicyThreshold": 1.0,
    "EnableDiagnostics": true,
    "MaxMismatchesRecorded": 100
  }
}
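
For illustration, the JSON section above could be read into a plain options object with System.Text.Json. The class `FidelitySettingsSketch`, the loader, and the range validation are assumptions made for this sketch; the real FidelityThresholds type and how it is bound are not shown in this document.

```csharp
using System;
using System.Text.Json;

// Hypothetical options shape mirroring the JSON keys above; defaults match the sample values.
public sealed class FidelitySettingsSketch
{
    public double BitwiseThreshold { get; init; } = 0.98;
    public double SemanticThreshold { get; init; } = 0.99;
    public double PolicyThreshold { get; init; } = 1.0;
    public bool EnableDiagnostics { get; init; } = true;
    public int MaxMismatchesRecorded { get; init; } = 100;
}

public static class FidelityConfigSketch
{
    // Reads the "Fidelity" section and rejects thresholds outside [0, 1].
    public static FidelitySettingsSketch Load(string json)
    {
        using var doc = JsonDocument.Parse(json);
        var section = doc.RootElement.GetProperty("Fidelity");
        var settings = JsonSerializer.Deserialize<FidelitySettingsSketch>(section.GetRawText())
                       ?? new FidelitySettingsSketch();

        foreach (var threshold in new[]
                 {
                     settings.BitwiseThreshold,
                     settings.SemanticThreshold,
                     settings.PolicyThreshold
                 })
        {
            if (threshold < 0.0 || threshold > 1.0)
                throw new ArgumentOutOfRangeException(nameof(json), "Thresholds must be within [0, 1].");
        }

        return settings;
    }
}
```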

Source Files

  • src/Scanner/StellaOps.Scanner.Worker/Determinism/FidelityMetrics.cs
  • src/Scanner/StellaOps.Scanner.Worker/Determinism/FidelityMetricsService.cs
  • src/Scanner/StellaOps.Scanner.Worker/Determinism/Calculators/
  • src/Telemetry/StellaOps.Telemetry.Core/FidelityMetricsTelemetry.cs
  • src/Telemetry/StellaOps.Telemetry.Core/FidelitySloAlertingService.cs