Add comprehensive security tests for OWASP A02, A05, A07, and A08 categories
Some checks failed
Docs CI / lint-and-preview (push) Has been cancelled
Export Center CI / export-ci (push) Has been cancelled
Findings Ledger CI / build-test (push) Has been cancelled
Findings Ledger CI / migration-validation (push) Has been cancelled
Findings Ledger CI / generate-manifest (push) Has been cancelled
Manifest Integrity / Validate Schema Integrity (push) Has been cancelled
Lighthouse CI / Lighthouse Audit (push) Has been cancelled
Lighthouse CI / Axe Accessibility Audit (push) Has been cancelled
Manifest Integrity / Validate Contract Documents (push) Has been cancelled
Manifest Integrity / Validate Pack Fixtures (push) Has been cancelled
Manifest Integrity / Audit SHA256SUMS Files (push) Has been cancelled
Manifest Integrity / Verify Merkle Roots (push) Has been cancelled
Policy Lint & Smoke / policy-lint (push) Has been cancelled
Policy Simulation / policy-simulate (push) Has been cancelled
- Implemented tests for Cryptographic Failures (A02) to ensure proper handling of sensitive data, secure algorithms, and key management.
- Added tests for Security Misconfiguration (A05) to validate production configurations, security headers, CORS settings, and feature management.
- Developed tests for Authentication Failures (A07) to enforce strong password policies, rate limiting, session management, and MFA support.
- Created tests for Software and Data Integrity Failures (A08) to verify artifact signatures, SBOM integrity, attestation chains, and feed updates.
191
docs/benchmarks/fidelity-metrics.md
Normal file
@@ -0,0 +1,191 @@
# Fidelity Metrics Framework

> Sprint: SPRINT_3403_0001_0001_fidelity_metrics

This document describes the three-tier fidelity metrics framework for measuring deterministic reproducibility in StellaOps scanner outputs.

## Overview

Fidelity metrics quantify how consistently the scanner produces outputs across replay runs. The framework provides three tiers of measurement, each capturing different aspects of reproducibility:

| Metric | Abbrev. | Description | Target |
|--------|---------|-------------|--------|
| Bitwise Fidelity | BF | Byte-for-byte identical outputs | ≥ 0.98 |
| Semantic Fidelity | SF | Normalized object equivalence | ≥ 0.99 |
| Policy Fidelity | PF | Policy decision consistency | ≈ 1.0 |

## Metric Definitions

### Bitwise Fidelity (BF)

Measures the proportion of replay runs that produce byte-for-byte identical outputs.

```
BF = identical_outputs / total_replays
```

**What it captures:**
- SHA-256 hash equivalence of all output artifacts
- Timestamp consistency
- JSON formatting consistency
- Field ordering consistency

**When BF < 1.0:**
- Timestamps embedded in outputs
- Non-deterministic field ordering
- Floating-point rounding differences
- Random identifiers (UUIDs)

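
The BF ratio can be sketched as follows. This is a minimal illustration, not the StellaOps `BitwiseFidelityCalculator`: it assumes each run's output is available as a single byte array, that replays are compared against one baseline, and that an empty replay set counts as 1.0. The names `BitwiseSketch`, `Digest`, and `Ratio` are hypothetical, and the JSON payloads are placeholder data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Placeholder inputs: one baseline output and three replay outputs.
byte[] baseline = Encoding.UTF8.GetBytes("{\"status\":\"pass\"}");
var replays = new List<byte[]>
{
    Encoding.UTF8.GetBytes("{\"status\":\"pass\"}"),    // identical bytes
    Encoding.UTF8.GetBytes("{ \"status\": \"pass\" }"), // same content, different whitespace
    Encoding.UTF8.GetBytes("{\"status\":\"pass\"}"),    // identical bytes
};

Console.WriteLine($"BF = {BitwiseSketch.Ratio(baseline, replays)}");

// Hypothetical helper for illustration only.
public static class BitwiseSketch
{
    // SHA-256 digest of an output artifact, rendered as a hex string.
    public static string Digest(byte[] output) =>
        Convert.ToHexString(SHA256.HashData(output));

    // BF = identical_outputs / total_replays.
    // Returning 1.0 for an empty replay set is an assumption of this sketch.
    public static double Ratio(byte[] baseline, IReadOnlyList<byte[]> replays)
    {
        if (replays.Count == 0) return 1.0;
        string expected = Digest(baseline);
        int identical = replays.Count(r => Digest(r) == expected);
        return (double)identical / replays.Count;
    }
}
```

Here two of the three replays hash to the baseline digest, so the ratio is 2/3. The whitespace-only difference in the second replay is exactly the kind of mismatch that lowers BF without affecting the semantic tier.
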
### Semantic Fidelity (SF)

Measures the proportion of replay runs that produce semantically equivalent outputs, ignoring formatting differences.

```
SF = semantic_matches / total_replays
```

**What it compares:**
- Package PURLs and versions
- CVE identifiers
- Severity levels (normalized to uppercase)
- VEX verdicts
- Reason codes

**When BF < 1.0 but SF = 1.0:**
- No actual content differences
- Only formatting differences

**When SF < 1.0:**
- Different packages detected
- Different CVEs matched
- Different severity assignments

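
A semantic comparison of this kind can be sketched by reducing each run to a canonical, order-independent form before comparing. The `Finding` record, the `SemanticSketch` class, and the sample PURLs and CVE identifiers below are hypothetical placeholders; the real normalization rules live in the StellaOps calculator and may differ. Reason codes are left out of the sketch for brevity.

```csharp
using System;
using System.Linq;

var baseline = new[]
{
    new Finding("pkg:npm/example@1.0.0", "CVE-0000-0001", "high", "affected"),
    new Finding("pkg:npm/other@2.1.0", "CVE-0000-0002", "LOW", "not_affected"),
};

// Same findings in a different order and with different severity casing: a semantic match.
var replaySame = new[] { baseline[1], baseline[0] with { Severity = "HIGH" } };

// A different severity assignment: a semantic mismatch.
var replayDiff = new[] { baseline[0] with { Severity = "critical" }, baseline[1] };

var replays = new[] { replaySame, replayDiff };
Console.WriteLine($"SF = {SemanticSketch.Ratio(baseline, replays)}");

// Hypothetical shape of a normalized finding (assumption, not the StellaOps model).
public sealed record Finding(string Purl, string Cve, string Severity, string Verdict);

public static class SemanticSketch
{
    // Canonical form: one key per finding, severity uppercased, keys sorted ordinally.
    public static string[] Canonicalize(Finding[] findings) =>
        findings
            .Select(f => string.Join("|", f.Purl, f.Cve, f.Severity.ToUpperInvariant(), f.Verdict))
            .OrderBy(key => key, StringComparer.Ordinal)
            .ToArray();

    // SF = semantic_matches / total_replays (1.0 for an empty replay set, by assumption).
    public static double Ratio(Finding[] baseline, Finding[][] replays)
    {
        if (replays.Length == 0) return 1.0;
        string[] expected = Canonicalize(baseline);
        int matches = replays.Count(r => Canonicalize(r).SequenceEqual(expected));
        return (double)matches / replays.Length;
    }
}
```

One of the two replays matches after normalization, so the ratio is 0.5. Sorting the keys is what makes the comparison insensitive to field and element ordering, which the bitwise tier would flag.
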
### Policy Fidelity (PF)

Measures the proportion of replay runs that produce matching policy decisions.

```
PF = policy_matches / total_replays
```

**What it compares:**
- Final pass/fail decision
- Reason codes (sorted for comparison)
- Policy rule triggering

**When PF < 1.0:**
- Policy outcome differs between runs
- Indicates a non-determinism bug that affects user-visible decisions

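
The policy comparison can be sketched in the same way. The `PolicyDecision` record, the `PolicySketch` class, and the reason-code strings are hypothetical; the sketch covers only the pass/fail decision and sorted reason codes, and omits rule triggering.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var baseline = new PolicyDecision(false, new[] { "RULE_B", "RULE_A" });
var replays = new[]
{
    new PolicyDecision(false, new[] { "RULE_A", "RULE_B" }), // same codes, different order
    new PolicyDecision(true, Array.Empty<string>()),         // outcome flipped
};

Console.WriteLine($"PF = {PolicySketch.Ratio(baseline, replays)}");

// Hypothetical decision shape (assumption, not the StellaOps model).
public sealed record PolicyDecision(bool Passed, IReadOnlyList<string> ReasonCodes);

public static class PolicySketch
{
    // Two decisions match when the outcome is equal and the sorted reason codes are equal.
    public static bool Matches(PolicyDecision a, PolicyDecision b) =>
        a.Passed == b.Passed &&
        a.ReasonCodes.OrderBy(c => c, StringComparer.Ordinal)
            .SequenceEqual(b.ReasonCodes.OrderBy(c => c, StringComparer.Ordinal));

    // PF = policy_matches / total_replays (1.0 for an empty replay set, by assumption).
    public static double Ratio(PolicyDecision baseline, PolicyDecision[] replays)
    {
        if (replays.Length == 0) return 1.0;
        int matches = replays.Count(r => Matches(baseline, r));
        return (double)matches / replays.Length;
    }
}
```

The first replay matches despite the reordered reason codes; the second does not, giving a ratio of 0.5.
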
## Prometheus Metrics

The fidelity framework exports the following metrics:

| Metric Name | Type | Labels | Description |
|-------------|------|--------|-------------|
| `fidelity_bitwise_ratio` | Gauge | tenant_id, surface_id | Bitwise fidelity ratio |
| `fidelity_semantic_ratio` | Gauge | tenant_id, surface_id | Semantic fidelity ratio |
| `fidelity_policy_ratio` | Gauge | tenant_id, surface_id | Policy fidelity ratio |
| `fidelity_total_replays` | Gauge | tenant_id, surface_id | Number of replays |
| `fidelity_slo_breach_total` | Counter | breach_type, tenant_id | SLO breach count |

## SLO Thresholds

Default SLO thresholds (configurable):

| Metric | Warning | Critical |
|--------|---------|----------|
| Bitwise Fidelity | < 0.98 | < 0.90 |
| Semantic Fidelity | < 0.99 | < 0.95 |
| Policy Fidelity | < 1.0 | < 0.99 |

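
Read as code, the table amounts to a two-step comparison per metric: check the critical bound first, then the warning bound. The sketch below assumes strict less-than comparisons, as the table's `<` signs suggest; `SloSeverity` and `SloSketch` are hypothetical names, not part of the documented API.

```csharp
using System;

// Default (warning, critical) bounds per metric, taken from the table above.
Console.WriteLine(SloSketch.Classify(0.97, warning: 0.98, critical: 0.90)); // prints "Warning"
Console.WriteLine(SloSketch.Classify(0.94, warning: 0.99, critical: 0.95)); // prints "Critical"
Console.WriteLine(SloSketch.Classify(1.00, warning: 1.00, critical: 0.99)); // prints "Ok"

// Hypothetical severity levels for illustration.
public enum SloSeverity { Ok, Warning, Critical }

public static class SloSketch
{
    // The critical bound is checked first because any critical value is also below the warning bound.
    public static SloSeverity Classify(double ratio, double warning, double critical)
    {
        if (ratio < critical) return SloSeverity.Critical;
        if (ratio < warning) return SloSeverity.Warning;
        return SloSeverity.Ok;
    }
}
```

Because the Policy Fidelity warning bound is 1.0, any single policy mismatch in a replay set is enough to raise a warning.
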
## Integration with DeterminismReport

Fidelity metrics are integrated into the `DeterminismReport` record:

```csharp
public sealed record DeterminismReport(
    // ... existing fields ...
    FidelityMetrics? Fidelity = null);

public sealed record DeterminismImageReport(
    // ... existing fields ...
    FidelityMetrics? Fidelity = null);
```

## Usage Example

```csharp
// Create fidelity metrics service
var service = new FidelityMetricsService(
    new BitwiseFidelityCalculator(),
    new SemanticFidelityCalculator(),
    new PolicyFidelityCalculator());

// Compute fidelity from baseline and replays
var baseline = LoadScanResult("scan-baseline.json");
var replays = LoadReplayScanResults();
var fidelity = service.Compute(baseline, replays);

// Check thresholds
if (fidelity.BitwiseFidelity < 0.98)
{
    logger.LogWarning("BF below threshold: {BF}", fidelity.BitwiseFidelity);
}

// Include in determinism report
var report = new DeterminismReport(
    // ... other fields ...
    Fidelity: fidelity);
```

## Mismatch Diagnostics

When fidelity is below threshold, the framework provides diagnostic information:

```csharp
public sealed record FidelityMismatch
{
    public required int RunIndex { get; init; }
    public required FidelityMismatchType Type { get; init; }
    public required string Description { get; init; }
    public IReadOnlyList<string>? AffectedArtifacts { get; init; }
}

public enum FidelityMismatchType
{
    BitwiseOnly,   // Hash differs but content equivalent
    SemanticOnly,  // Content differs but policy matches
    PolicyDrift    // Policy decision differs
}
```

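
The enum comments imply a precedence among the three tiers: a policy difference outranks a content difference, which outranks a hash-only difference. A minimal sketch of that classification, assuming the per-run comparison results are already available as three booleans, could look like this. The enum is redeclared so the snippet is self-contained; `MismatchSketch.Classify` is a hypothetical helper, and the precedence order is an inference from the comments rather than documented behavior.

```csharp
using System;

Console.WriteLine(MismatchSketch.Classify(bitwiseEqual: false, semanticEqual: true, policyEqual: true));   // prints "BitwiseOnly"
Console.WriteLine(MismatchSketch.Classify(bitwiseEqual: false, semanticEqual: false, policyEqual: true));  // prints "SemanticOnly"
Console.WriteLine(MismatchSketch.Classify(bitwiseEqual: false, semanticEqual: false, policyEqual: false)); // prints "PolicyDrift"

// Mirrors the enum shown above so the sketch compiles on its own.
public enum FidelityMismatchType
{
    BitwiseOnly,   // Hash differs but content equivalent
    SemanticOnly,  // Content differs but policy matches
    PolicyDrift    // Policy decision differs
}

public static class MismatchSketch
{
    // Returns null when the run matches the baseline on every tier.
    public static FidelityMismatchType? Classify(bool bitwiseEqual, bool semanticEqual, bool policyEqual)
    {
        if (!policyEqual) return FidelityMismatchType.PolicyDrift;
        if (!semanticEqual) return FidelityMismatchType.SemanticOnly;
        if (!bitwiseEqual) return FidelityMismatchType.BitwiseOnly;
        return null;
    }
}
```
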
## Configuration

Configure fidelity options via `FidelityThresholds`:

```json
{
  "Fidelity": {
    "BitwiseThreshold": 0.98,
    "SemanticThreshold": 0.99,
    "PolicyThreshold": 1.0,
    "EnableDiagnostics": true,
    "MaxMismatchesRecorded": 100
  }
}
```

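
As a rough illustration of how such a section might be loaded and sanity-checked, the sketch below deserializes the `Fidelity` object with `System.Text.Json`. The `FidelityOptionsSketch` class is a hypothetical stand-in whose property names simply mirror the JSON keys; it is not the actual `FidelityThresholds` type, and the range validation is an assumption. The raw string literal requires C# 11 or later.

```csharp
using System;
using System.Text.Json;

// The JSON fragment from this section, embedded as a C# 11 raw string literal.
const string json = """
{
  "Fidelity": {
    "BitwiseThreshold": 0.98,
    "SemanticThreshold": 0.99,
    "PolicyThreshold": 1.0,
    "EnableDiagnostics": true,
    "MaxMismatchesRecorded": 100
  }
}
""";

using JsonDocument document = JsonDocument.Parse(json);
FidelityOptionsSketch options =
    document.RootElement.GetProperty("Fidelity").Deserialize<FidelityOptionsSketch>()
    ?? throw new InvalidOperationException("The Fidelity section is empty.");

options.Validate();
Console.WriteLine($"Diagnostics enabled: {options.EnableDiagnostics}, cap: {options.MaxMismatchesRecorded}");

// Hypothetical options type; defaults repeat the values shown in the JSON above.
public sealed class FidelityOptionsSketch
{
    public double BitwiseThreshold { get; init; } = 0.98;
    public double SemanticThreshold { get; init; } = 0.99;
    public double PolicyThreshold { get; init; } = 1.0;
    public bool EnableDiagnostics { get; init; } = true;
    public int MaxMismatchesRecorded { get; init; } = 100;

    // Assumed validation: ratios must lie in [0, 1] and the mismatch cap must not be negative.
    public void Validate()
    {
        foreach (var (name, value) in new[]
        {
            (nameof(BitwiseThreshold), BitwiseThreshold),
            (nameof(SemanticThreshold), SemanticThreshold),
            (nameof(PolicyThreshold), PolicyThreshold),
        })
        {
            if (value is < 0.0 or > 1.0)
                throw new ArgumentOutOfRangeException(name, value, "Threshold must lie in [0, 1].");
        }

        if (MaxMismatchesRecorded < 0)
            throw new ArgumentOutOfRangeException(nameof(MaxMismatchesRecorded));
    }
}
```
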
## Related Documentation

- [Determinism and Reproducibility Technical Reference](../product-advisories/14-Dec-2025%20-%20Determinism%20and%20Reproducibility%20Technical%20Reference.md)
- [Determinism Scoring Foundations Sprint](../implplan/SPRINT_3401_0001_0001_determinism_scoring_foundations.md)
- [Scanner Architecture](../modules/scanner/architecture.md)

## Source Files

- `src/Scanner/StellaOps.Scanner.Worker/Determinism/FidelityMetrics.cs`
- `src/Scanner/StellaOps.Scanner.Worker/Determinism/FidelityMetricsService.cs`
- `src/Scanner/StellaOps.Scanner.Worker/Determinism/Calculators/`
- `src/Telemetry/StellaOps.Telemetry.Core/FidelityMetricsTelemetry.cs`
- `src/Telemetry/StellaOps.Telemetry.Core/FidelitySloAlertingService.cs`