Replayable Verdict Evaluation

Module

Policy

Status

IMPLEMENTED

Description

A replay engine that re-evaluates a verdict from its stored snapshot inputs and reports whether the replayed result matches the original. When the results differ, the report includes delta explanations. The engine is exposed via API endpoints.

Implementation Details

  • ReplayEngine: src/Policy/__Libraries/StellaOps.Policy/Replay/ReplayEngine.cs (sealed class implementing IReplayEngine)
    • ReplayAsync(ReplayRequest) -- full replay pipeline: load snapshot, resolve inputs, execute, compare, report
    • Uses frozen inputs from KnowledgeSnapshotManifest to ensure deterministic re-evaluation
    • Comparison via VerdictComparer yields ExactMatch, MatchWithinTolerance, or Mismatch
    • When results differ, the delta report contains FieldDeltas, FindingDeltas, and SuspectedCauses
  • ReplayRequest: src/Policy/__Libraries/StellaOps.Policy/Replay/ReplayRequest.cs
    • ArtifactDigest, SnapshotId, OriginalVerdictId
    • ReplayOptions: CompareWithOriginal, AllowNetworkFetch=false, GenerateDetailedReport, ScoreTolerance=0.001
  • ReplayResult: src/Policy/__Libraries/StellaOps.Policy/Replay/ReplayResult.cs
    • MatchStatus: ExactMatch, MatchWithinTolerance, Mismatch, NoComparison, ReplayFailed
    • ReplayedVerdict with Decision (Pass/Fail/PassWithExceptions/Indeterminate), Score, FindingIds
    • Duration tracking for performance monitoring
  • VerdictComparer: src/Policy/__Libraries/StellaOps.Policy/Replay/VerdictComparer.cs -- deterministic comparison with configurable tolerance
  • KnowledgeSourceResolver: src/Policy/__Libraries/StellaOps.Policy/Replay/KnowledgeSourceResolver.cs -- resolves snapshot source descriptors to data
  • SnapshotAwarePolicyEvaluator: src/Policy/__Libraries/StellaOps.Policy/Snapshots/SnapshotAwarePolicyEvaluator.cs -- evaluation with pinned inputs
  • KnowledgeSnapshotManifest: src/Policy/__Libraries/StellaOps.Policy/Snapshots/KnowledgeSnapshotManifest.cs -- content-addressed snapshot manifest
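
The replay flow and status outcomes listed above can be sketched in a few lines. This is a language-neutral Python illustration, not the C# implementation: the status names and the 0.001 default tolerance come from this document, while the `Verdict` shape, the `compare` and `replay` helpers, the inclusive tolerance boundary, and the use of `LookupError` to signal a missing source are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class MatchStatus(Enum):
    EXACT_MATCH = "ExactMatch"
    MATCH_WITHIN_TOLERANCE = "MatchWithinTolerance"
    MISMATCH = "Mismatch"
    NO_COMPARISON = "NoComparison"
    REPLAY_FAILED = "ReplayFailed"


@dataclass(frozen=True)
class Verdict:
    # Hypothetical shape modeled on the ReplayedVerdict fields named above.
    decision: str            # Pass / Fail / PassWithExceptions / Indeterminate
    score: float
    finding_ids: frozenset


def compare(original, replayed, score_tolerance=0.001):
    """Classify a replayed verdict against the original one."""
    if original == replayed:
        return MatchStatus.EXACT_MATCH
    same_outcome = (
        original.decision == replayed.decision
        and original.finding_ids == replayed.finding_ids
    )
    # Assumption: only score drift can be tolerated, and the bound is inclusive.
    if same_outcome and abs(original.score - replayed.score) <= score_tolerance:
        return MatchStatus.MATCH_WITHIN_TOLERANCE
    return MatchStatus.MISMATCH


def replay(evaluate, original, compare_with_original=True, score_tolerance=0.001):
    """Run the pinned evaluation, then compare unless comparison is disabled."""
    try:
        replayed = evaluate()    # stands in for load snapshot, resolve inputs, execute
    except LookupError:          # assumption: a missing bundled source surfaces here
        return MatchStatus.REPLAY_FAILED
    if not compare_with_original or original is None:
        return MatchStatus.NO_COMPARISON
    return compare(original, replayed, score_tolerance)
```

For instance, under these assumptions a replay whose score drifts by 0.005 is classified as MatchWithinTolerance when the tolerance is 0.01 and as Mismatch when it is 0.001.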

E2E Test Plan

  • Replay verdict with same snapshot and inputs; verify ExactMatch status
  • Replay verdict after source data changed; verify Mismatch with SuspectedCauses listing changed source
  • Replay with ScoreTolerance=0.01; introduce 0.005 score drift; verify MatchWithinTolerance
  • Replay with ScoreTolerance=0.001; introduce 0.005 score drift; verify Mismatch with FieldDelta showing score difference
  • Verify ReplayDeltaReport.FindingDeltas lists Added findings (present in replay, absent in original)
  • Verify ReplayDeltaReport.FindingDeltas lists Removed findings (absent in replay, present in original)
  • Replay with AllowNetworkFetch=false and missing bundled source; verify ReplayFailed
  • Replay with CompareWithOriginal=false; verify NoComparison status, no DeltaReport
  • Verify replay Duration is non-zero and reasonable
  • POST to the replay endpoint; verify the JSON response includes all ReplayResult fields
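
The Added/Removed expectations in the FindingDeltas test cases above amount to a set difference between the original and replayed finding IDs. The following Python sketch shows that reading; the `finding_deltas` helper and its dictionary output are hypothetical and only illustrate the expected semantics, not the actual ReplayDeltaReport type.

```python
def finding_deltas(original_ids, replayed_ids):
    """Split finding IDs into those added or removed by the replay."""
    original, replayed = set(original_ids), set(replayed_ids)
    return {
        # Present in replay, absent in original.
        "Added": sorted(replayed - original),
        # Absent in replay, present in original.
        "Removed": sorted(original - replayed),
    }


deltas = finding_deltas(["F1", "F2"], ["F2", "F3"])
print(deltas)
```

Sorting the IDs keeps the output stable across runs, which is useful when a test asserts on the delta lists directly.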