Files
git.stella-ops.org/docs/benchmarks/fidelity-metrics.md
master 2170a58734
Some checks failed
Lighthouse CI / Lighthouse Audit (push) Waiting to run
Lighthouse CI / Axe Accessibility Audit (push) Waiting to run
Manifest Integrity / Validate Schema Integrity (push) Waiting to run
Manifest Integrity / Validate Contract Documents (push) Waiting to run
Manifest Integrity / Validate Pack Fixtures (push) Waiting to run
Manifest Integrity / Audit SHA256SUMS Files (push) Waiting to run
Manifest Integrity / Verify Merkle Roots (push) Waiting to run
Policy Lint & Smoke / policy-lint (push) Waiting to run
Policy Simulation / policy-simulate (push) Waiting to run
Docs CI / lint-and-preview (push) Has been cancelled
Export Center CI / export-ci (push) Has been cancelled
Findings Ledger CI / build-test (push) Has been cancelled
Findings Ledger CI / migration-validation (push) Has been cancelled
Findings Ledger CI / generate-manifest (push) Has been cancelled
Add comprehensive security tests for OWASP A02, A05, A07, and A08 categories
- Implemented tests for Cryptographic Failures (A02) to ensure proper handling of sensitive data, secure algorithms, and key management.
- Added tests for Security Misconfiguration (A05) to validate production configurations, security headers, CORS settings, and feature management.
- Developed tests for Authentication Failures (A07) to enforce strong password policies, rate limiting, session management, and MFA support.
- Created tests for Software and Data Integrity Failures (A08) to verify artifact signatures, SBOM integrity, attestation chains, and feed updates.
2025-12-16 16:40:44 +02:00

192 lines
5.5 KiB
Markdown

# Fidelity Metrics Framework
> Sprint: SPRINT_3403_0001_0001_fidelity_metrics
This document describes the three-tier fidelity metrics framework for measuring deterministic reproducibility in StellaOps scanner outputs.
## Overview
Fidelity metrics quantify how consistently the scanner produces outputs across replay runs. The framework provides three tiers of measurement, each capturing different aspects of reproducibility:
| Metric | Abbrev. | Description | Target |
|--------|---------|-------------|--------|
| Bitwise Fidelity | BF | Byte-for-byte identical outputs | ≥ 0.98 |
| Semantic Fidelity | SF | Normalized object equivalence | ≥ 0.99 |
| Policy Fidelity | PF | Policy decision consistency | ≈ 1.0 |
## Metric Definitions
### Bitwise Fidelity (BF)
Measures the proportion of replay runs that produce byte-for-byte identical outputs.
```
BF = identical_outputs / total_replays
```
**What it captures:**
- SHA-256 hash equivalence of all output artifacts
- Timestamp consistency
- JSON formatting consistency
- Field ordering consistency
**When BF < 1.0:**
- Timestamps embedded in outputs
- Non-deterministic field ordering
- Floating-point rounding differences
- Random identifiers (UUIDs)
### Semantic Fidelity (SF)
Measures the proportion of replay runs that produce semantically equivalent outputs, ignoring formatting differences.
```
SF = semantic_matches / total_replays
```
**What it compares:**
- Package PURLs and versions
- CVE identifiers
- Severity levels (normalized to uppercase)
- VEX verdicts
- Reason codes
**When SF < 1.0 but BF = SF:**
- No actual content differences
- Only formatting differences
**When SF < 1.0:**
- Different packages detected
- Different CVEs matched
- Different severity assignments
### Policy Fidelity (PF)
Measures the proportion of replay runs that produce matching policy decisions.
```
PF = policy_matches / total_replays
```
**What it compares:**
- Final pass/fail decision
- Reason codes (sorted for comparison)
- Policy rule triggering
**When PF < 1.0:**
- Policy outcome differs between runs
- Indicates a non-determinism bug that affects user-visible decisions
## Prometheus Metrics
The fidelity framework exports the following metrics:
| Metric Name | Type | Labels | Description |
|-------------|------|--------|-------------|
| `fidelity_bitwise_ratio` | Gauge | tenant_id, surface_id | Bitwise fidelity ratio |
| `fidelity_semantic_ratio` | Gauge | tenant_id, surface_id | Semantic fidelity ratio |
| `fidelity_policy_ratio` | Gauge | tenant_id, surface_id | Policy fidelity ratio |
| `fidelity_total_replays` | Gauge | tenant_id, surface_id | Number of replays |
| `fidelity_slo_breach_total` | Counter | breach_type, tenant_id | SLO breach count |
## SLO Thresholds
Default SLO thresholds (configurable):
| Metric | Warning | Critical |
|--------|---------|----------|
| Bitwise Fidelity | < 0.98 | < 0.90 |
| Semantic Fidelity | < 0.99 | < 0.95 |
| Policy Fidelity | < 1.0 | < 0.99 |
## Integration with DeterminismReport
Fidelity metrics are integrated into the `DeterminismReport` record:
```csharp
public sealed record DeterminismReport(
// ... existing fields ...
FidelityMetrics? Fidelity = null);
public sealed record DeterminismImageReport(
// ... existing fields ...
FidelityMetrics? Fidelity = null);
```
## Usage Example
```csharp
// Create fidelity metrics service
var service = new FidelityMetricsService(
new BitwiseFidelityCalculator(),
new SemanticFidelityCalculator(),
new PolicyFidelityCalculator());
// Compute fidelity from baseline and replays
var baseline = LoadScanResult("scan-baseline.json");
var replays = LoadReplayScanResults();
var fidelity = service.Compute(baseline, replays);
// Check thresholds
if (fidelity.BitwiseFidelity < 0.98)
{
logger.LogWarning("BF below threshold: {BF}", fidelity.BitwiseFidelity);
}
// Include in determinism report
var report = new DeterminismReport(
// ... other fields ...
Fidelity: fidelity);
```
## Mismatch Diagnostics
When fidelity is below threshold, the framework provides diagnostic information:
```csharp
public sealed record FidelityMismatch
{
public required int RunIndex { get; init; }
public required FidelityMismatchType Type { get; init; }
public required string Description { get; init; }
public IReadOnlyList<string>? AffectedArtifacts { get; init; }
}
public enum FidelityMismatchType
{
BitwiseOnly, // Hash differs but content equivalent
SemanticOnly, // Content differs but policy matches
PolicyDrift // Policy decision differs
}
```
## Configuration
Configure fidelity options via `FidelityThresholds`:
```json
{
"Fidelity": {
"BitwiseThreshold": 0.98,
"SemanticThreshold": 0.99,
"PolicyThreshold": 1.0,
"EnableDiagnostics": true,
"MaxMismatchesRecorded": 100
}
}
```
## Related Documentation
- [Determinism and Reproducibility Technical Reference](../product-advisories/14-Dec-2025%20-%20Determinism%20and%20Reproducibility%20Technical%20Reference.md)
- [Determinism Scoring Foundations Sprint](../implplan/SPRINT_3401_0001_0001_determinism_scoring_foundations.md)
- [Scanner Architecture](../modules/scanner/architecture.md)
## Source Files
- `src/Scanner/StellaOps.Scanner.Worker/Determinism/FidelityMetrics.cs`
- `src/Scanner/StellaOps.Scanner.Worker/Determinism/FidelityMetricsService.cs`
- `src/Scanner/StellaOps.Scanner.Worker/Determinism/Calculators/`
- `src/Telemetry/StellaOps.Telemetry.Core/FidelityMetricsTelemetry.cs`
- `src/Telemetry/StellaOps.Telemetry.Core/FidelitySloAlertingService.cs`