# Fidelity Metrics Framework

> Sprint: SPRINT_3403_0001_0001_fidelity_metrics

This document describes the three-tier fidelity metrics framework for measuring deterministic reproducibility in StellaOps scanner outputs.

## Overview

Fidelity metrics quantify how consistently the scanner produces outputs across replay runs. The framework provides three tiers of measurement, each capturing different aspects of reproducibility:

| Metric | Abbrev. | Description | Target |
|--------|---------|-------------|--------|
| Bitwise Fidelity | BF | Byte-for-byte identical outputs | ≥ 0.98 |
| Semantic Fidelity | SF | Normalized object equivalence | ≥ 0.99 |
| Policy Fidelity | PF | Policy decision consistency | ≈ 1.0 |

## Metric Definitions

### Bitwise Fidelity (BF)

Measures the proportion of replay runs that produce byte-for-byte identical outputs.

```
BF = identical_outputs / total_replays
```

**What it captures:**
- SHA-256 hash equivalence of all output artifacts
- Timestamp consistency
- JSON formatting consistency
- Field ordering consistency

**When BF < 1.0:**
- Timestamps embedded in outputs
- Non-deterministic field ordering
- Floating-point rounding differences
- Random identifiers (UUIDs)

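The BF formula can be illustrated with a minimal sketch that hashes each replay output and counts how many digests equal the baseline digest. `BitwiseFidelitySketch` and its members are hypothetical names introduced here, not the actual `BitwiseFidelityCalculator`; treating an empty replay set as 1.0 is also an assumption, since this document does not define that case.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;

// Hypothetical helper for illustration; not the StellaOps implementation.
public static class BitwiseFidelitySketch
{
    // Hex-encoded SHA-256 digest of one output artifact.
    public static string Sha256Hex(byte[] artifact) =>
        Convert.ToHexString(SHA256.HashData(artifact));

    // BF = identical_outputs / total_replays.
    public static double Compute(byte[] baseline, IReadOnlyList<byte[]> replays)
    {
        // Assumption: no replays means nothing diverged, so report 1.0.
        if (replays.Count == 0) return 1.0;

        string expected = Sha256Hex(baseline);
        int identical = replays.Count(r => Sha256Hex(r) == expected);
        return (double)identical / replays.Count;
    }
}
```

Under this sketch, a replay whose JSON differs only in whitespace still counts as a bitwise mismatch, which is the behavior the BF tier is meant to surface.
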
### Semantic Fidelity (SF)

Measures the proportion of replay runs that produce semantically equivalent outputs, ignoring formatting differences.

```
SF = semantic_matches / total_replays
```

**What it compares:**
- Package PURLs and versions
- CVE identifiers
- Severity levels (normalized to uppercase)
- VEX verdicts
- Reason codes

**When BF < 1.0 but SF = 1.0:**
- No actual content differences
- Only formatting differences

**When SF < 1.0:**
- Different packages detected
- Different CVEs matched
- Different severity assignments

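One way to picture the semantic comparison is to reduce each scan result to a canonical, order-independent form and compare those forms. The sketch below assumes a simplified finding shape (`FindingSketch`) covering only a subset of the compared fields; the real scan result model, the placeholder identifiers, and the empty-set convention are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical finding shape; the real scan result model is not shown in this document.
public sealed record FindingSketch(string Purl, string CveId, string Severity, string VexVerdict);

public static class SemanticFidelitySketch
{
    // Canonical form: severity uppercased, entries sorted ordinally so input order is irrelevant.
    public static string[] Normalize(IEnumerable<FindingSketch> findings) =>
        findings
            .Select(f => string.Join("|", f.Purl, f.CveId, f.Severity.ToUpperInvariant(), f.VexVerdict))
            .OrderBy(key => key, StringComparer.Ordinal)
            .ToArray();

    // SF = semantic_matches / total_replays.
    public static double Compute(
        IReadOnlyList<FindingSketch> baseline,
        IReadOnlyList<IReadOnlyList<FindingSketch>> replays)
    {
        // Assumption: an empty replay set is reported as 1.0.
        if (replays.Count == 0) return 1.0;

        string[] expected = Normalize(baseline);
        int matches = replays.Count(r => Normalize(r).SequenceEqual(expected));
        return (double)matches / replays.Count;
    }
}
```

With this normalization, a replay that reorders findings or changes severity casing still matches, while a replay that assigns a different severity does not.
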
### Policy Fidelity (PF)

Measures the proportion of replay runs that produce matching policy decisions.

```
PF = policy_matches / total_replays
```

**What it compares:**
- Final pass/fail decision
- Reason codes (sorted for comparison)
- Policy rule triggering

**When PF < 1.0:**
- Policy outcome differs between runs
- Indicates a non-determinism bug that affects user-visible decisions

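The policy comparison can be sketched as an equality check on the pass/fail outcome plus the reason codes after sorting. `PolicyDecisionSketch` is a hypothetical shape introduced for this example; the sketch leaves out rule triggering, and the real `PolicyFidelityCalculator` may compare more than is shown here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical decision shape; rule-trigger details are omitted in this sketch.
public sealed record PolicyDecisionSketch(bool Passed, IReadOnlyList<string> ReasonCodes);

public static class PolicyFidelitySketch
{
    // Two decisions match when the outcome is equal and the sorted reason codes are equal.
    public static bool Matches(PolicyDecisionSketch a, PolicyDecisionSketch b) =>
        a.Passed == b.Passed &&
        a.ReasonCodes.OrderBy(c => c, StringComparer.Ordinal)
            .SequenceEqual(b.ReasonCodes.OrderBy(c => c, StringComparer.Ordinal), StringComparer.Ordinal);

    // PF = policy_matches / total_replays.
    public static double Compute(PolicyDecisionSketch baseline, IReadOnlyList<PolicyDecisionSketch> replays)
    {
        // Assumption: an empty replay set is reported as 1.0.
        if (replays.Count == 0) return 1.0;

        int matches = replays.Count(r => Matches(baseline, r));
        return (double)matches / replays.Count;
    }
}
```

Sorting before comparison means that reason codes emitted in a different order do not register as policy drift.
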
## Prometheus Metrics

The fidelity framework exports the following metrics:

| Metric Name | Type | Labels | Description |
|-------------|------|--------|-------------|
| `fidelity_bitwise_ratio` | Gauge | tenant_id, surface_id | Bitwise fidelity ratio |
| `fidelity_semantic_ratio` | Gauge | tenant_id, surface_id | Semantic fidelity ratio |
| `fidelity_policy_ratio` | Gauge | tenant_id, surface_id | Policy fidelity ratio |
| `fidelity_total_replays` | Gauge | tenant_id, surface_id | Number of replays |
| `fidelity_slo_breach_total` | Counter | breach_type, tenant_id | SLO breach count |

## SLO Thresholds

Default SLO thresholds (configurable):

| Metric | Warning | Critical |
|--------|---------|----------|
| Bitwise Fidelity | < 0.98 | < 0.90 |
| Semantic Fidelity | < 0.99 | < 0.95 |
| Policy Fidelity | < 1.0 | < 0.99 |

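As a rough illustration of how the threshold table might be applied, the sketch below classifies a single fidelity value against a warning and a critical bound. `SloStatusSketch` and `FidelitySloSketch` are hypothetical names; how `FidelitySloAlertingService` actually evaluates breaches is not described in this document.

```csharp
// Hypothetical status levels for illustration.
public enum SloStatusSketch { Ok, Warning, Critical }

public static class FidelitySloSketch
{
    // Critical is checked first because any value below the critical bound
    // is also below the warning bound.
    public static SloStatusSketch Classify(double value, double warningBelow, double criticalBelow)
    {
        if (value < criticalBelow) return SloStatusSketch.Critical;
        if (value < warningBelow) return SloStatusSketch.Warning;
        return SloStatusSketch.Ok;
    }
}
```

For example, with the Policy Fidelity defaults (`warningBelow: 1.0`, `criticalBelow: 0.99`), a value of 0.995 would classify as a warning and 0.985 as critical.
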
## Integration with DeterminismReport

Fidelity metrics are integrated into the `DeterminismReport` record:

```csharp
public sealed record DeterminismReport(
    // ... existing fields ...
    FidelityMetrics? Fidelity = null);

public sealed record DeterminismImageReport(
    // ... existing fields ...
    FidelityMetrics? Fidelity = null);
```

## Usage Example

```csharp
// Create fidelity metrics service
var service = new FidelityMetricsService(
    new BitwiseFidelityCalculator(),
    new SemanticFidelityCalculator(),
    new PolicyFidelityCalculator());

// Compute fidelity from baseline and replays
var baseline = LoadScanResult("scan-baseline.json");
var replays = LoadReplayScanResults();
var fidelity = service.Compute(baseline, replays);

// Check thresholds
if (fidelity.BitwiseFidelity < 0.98)
{
    logger.LogWarning("BF below threshold: {BF}", fidelity.BitwiseFidelity);
}

// Include in determinism report
var report = new DeterminismReport(
    // ... other fields ...
    Fidelity: fidelity);
```

## Mismatch Diagnostics

When fidelity is below threshold, the framework provides diagnostic information:

```csharp
public sealed record FidelityMismatch
{
    public required int RunIndex { get; init; }
    public required FidelityMismatchType Type { get; init; }
    public required string Description { get; init; }
    public IReadOnlyList<string>? AffectedArtifacts { get; init; }
}

public enum FidelityMismatchType
{
    BitwiseOnly,   // Hash differs but content equivalent
    SemanticOnly,  // Content differs but policy matches
    PolicyDrift    // Policy decision differs
}
```

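The enum comments suggest a most-severe-tier-first classification. The sketch below shows one way a per-run mismatch type could be derived from three match flags; `MismatchClassifierSketch` is a hypothetical helper, and the precedence order is an assumption read from the enum comments rather than confirmed behavior.

```csharp
// Repeated from the definition above so this sketch compiles on its own.
public enum FidelityMismatchType
{
    BitwiseOnly,   // Hash differs but content equivalent
    SemanticOnly,  // Content differs but policy matches
    PolicyDrift    // Policy decision differs
}

// Hypothetical helper for illustration.
public static class MismatchClassifierSketch
{
    // Returns null when the replay matches the baseline at every tier.
    public static FidelityMismatchType? Classify(bool bitwiseMatch, bool semanticMatch, bool policyMatch)
    {
        if (!policyMatch) return FidelityMismatchType.PolicyDrift;
        if (!semanticMatch) return FidelityMismatchType.SemanticOnly;
        if (!bitwiseMatch) return FidelityMismatchType.BitwiseOnly;
        return null;
    }
}
```

Checking the policy tier first means a run that differs at every tier is reported once, under the category with the most user-visible impact.
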
## Configuration

Configure fidelity options via `FidelityThresholds`:

```json
{
  "Fidelity": {
    "BitwiseThreshold": 0.98,
    "SemanticThreshold": 0.99,
    "PolicyThreshold": 1.0,
    "EnableDiagnostics": true,
    "MaxMismatchesRecorded": 100
  }
}
```

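For readers who want to see how such a section could be read, here is a minimal sketch that deserializes the `Fidelity` object with `System.Text.Json`. `FidelityThresholdsSketch` is a stand-in whose property names mirror the JSON keys above; the shape of the real `FidelityThresholds` type, its defaults for `EnableDiagnostics` and `MaxMismatchesRecorded`, and how StellaOps binds configuration are all assumptions.

```csharp
using System.Text.Json;

// Hypothetical options type; property names mirror the JSON keys shown above.
public sealed class FidelityThresholdsSketch
{
    public double BitwiseThreshold { get; init; } = 0.98;
    public double SemanticThreshold { get; init; } = 0.99;
    public double PolicyThreshold { get; init; } = 1.0;

    // Assumption: these defaults simply repeat the example values.
    public bool EnableDiagnostics { get; init; } = true;
    public int MaxMismatchesRecorded { get; init; } = 100;
}

public static class FidelityConfigSketch
{
    // Reads the "Fidelity" section; keys that are absent keep their initializer values.
    public static FidelityThresholdsSketch Parse(string json)
    {
        using JsonDocument doc = JsonDocument.Parse(json);
        if (!doc.RootElement.TryGetProperty("Fidelity", out JsonElement section))
            return new FidelityThresholdsSketch();

        return section.Deserialize<FidelityThresholdsSketch>() ?? new FidelityThresholdsSketch();
    }
}
```

A partial document such as `{"Fidelity": {"BitwiseThreshold": 0.95}}` would override only the bitwise threshold in this sketch and leave the other values at their initializers.
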
## Related Documentation

- [Determinism and Reproducibility Technical Reference](../product-advisories/14-Dec-2025%20-%20Determinism%20and%20Reproducibility%20Technical%20Reference.md)
- [Determinism Scoring Foundations Sprint](../implplan/SPRINT_3401_0001_0001_determinism_scoring_foundations.md)
- [Scanner Architecture](../modules/scanner/architecture.md)

## Source Files

- `src/Scanner/StellaOps.Scanner.Worker/Determinism/FidelityMetrics.cs`
- `src/Scanner/StellaOps.Scanner.Worker/Determinism/FidelityMetricsService.cs`
- `src/Scanner/StellaOps.Scanner.Worker/Determinism/Calculators/`
- `src/Telemetry/StellaOps.Telemetry.Core/FidelityMetricsTelemetry.cs`
- `src/Telemetry/StellaOps.Telemetry.Core/FidelitySloAlertingService.cs`