# Guidelines for Product and Development Managers: Signed, Replayable Risk Verdicts

## Purpose

Signed, replayable risk verdicts are the Stella Ops mechanism for producing a **cryptographically verifiable, audit‑ready decision** about an artifact (container image, VM image, filesystem snapshot, SBOM, etc.) that can be **recomputed later to the same result** using the same inputs (“time-travel replay”).

This capability is not “scan output with a signature.” It is a **decision artifact** that becomes the unit of governance in CI/CD, registry admission, and audits.

---

# 1) Shared definitions and non-negotiables

## 1.1 Definitions

**Risk verdict**
A structured decision: *Pass / Fail / Warn / Needs‑Review* (or similar), produced by a deterministic evaluator under a specific policy and knowledge state.

**Signed**
The verdict is wrapped in a tamper‑evident envelope (e.g., DSSE/in‑toto statement) and signed using an organization-approved trust model (key-based, keyless, or offline CA).

**Replayable**
Given the same:

* target artifact identity
* SBOM (or derivation method)
* vulnerability and advisory knowledge state
* VEX inputs
* policy bundle
* evaluator version

…Stella Ops can **re-evaluate and reproduce the same verdict** and provide evidence equivalence.

> Critical nuance: replayability is about *result equivalence*. Byte‑for‑byte equality is ideal but not always required if signatures/metadata necessarily vary. If byte‑for‑byte is a goal, you must strictly control timestamps, ordering, and serialization.

---

## 1.2 Non-negotiables (what must be true in v1)

1. **Verdicts are bound to immutable artifact identity**
   * Container image: digest (sha256:…)
   * SBOM: content digest
   * File tree: merkle root digest, or equivalent
2. **Verdicts are deterministic**
   * No “current time” dependence in scoring
   * No non-deterministic ordering of findings
   * No implicit network calls during evaluation
3. **Verdicts are explainable**
   * Every deny/block decision must cite the policy clause and evidence pointers that triggered it.
4. **Verdicts are verifiable**
   * Independent verification toolchain exists (CLI/library) that validates signature and checks referenced evidence integrity.
5. **Knowledge state is pinned**
   * The verdict references a “knowledge snapshot” (vuln feeds, advisories, VEX set) by digest/ID, not “latest.”
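To make non-negotiables 1 and 5 concrete, here is a minimal sketch in Python of pinning every evaluation input by digest before the evaluator runs. The field and helper names are illustrative assumptions, not the Stella Ops schema.

```python
import hashlib
import json

def sha256_hex(data: bytes) -> str:
    """Content-address a blob; every verdict input is referenced this way."""
    return "sha256:" + hashlib.sha256(data).hexdigest()

# Hypothetical inputs manifest: everything the evaluator is allowed to see,
# pinned by digest. "latest" never appears here (non-negotiable 5).
inputs_manifest = {
    "subject": {
        "type": "oci-image",
        "digest": "sha256:aaaa…",   # immutable artifact identity (non-negotiable 1)
    },
    "sbom_digest": "sha256:bbbb…",
    "policy_digest": "sha256:cccc…",
    "knowledge_snapshot": {
        "vuln_db_digest": "sha256:dddd…",
        "vex_set_digest": "sha256:eeee…",
    },
}

# The manifest itself is content-addressed, so the verdict can later point at
# exactly the set of inputs that produced it (compare "inputs_manifest_digest"
# in the payload example of section 3.3).
canonical = json.dumps(inputs_manifest, sort_keys=True, separators=(",", ":")).encode()
print(sha256_hex(canonical))
```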
---

## 1.3 Explicit non-goals (avoid scope traps)

* Building a full CNAPP runtime protection product as part of verdicting.
* Implementing “all possible attestation standards.” Pick one canonical representation; support others via adapters.
* Solving global revocation and key lifecycle for every ecosystem on day one; define a minimum viable trust model per deployment mode.

---

# 2) Product Management Guidelines

## 2.1 Position the verdict as the primary product artifact

**PM rule:** if a workflow does not end in a verdict artifact, it is not part of this moat.

Examples:

* CI pipeline step produces `VERDICT.attestation` attached to the OCI artifact.
* Registry admission checks for a valid verdict attestation meeting policy.
* Audit export bundles the verdict plus referenced evidence.

**Avoid:** “scan reports” as the goal. Reports are views; the verdict is the object.

---

## 2.2 Define the core personas and success outcomes

Minimum personas:

1. **Release/Platform Engineering**
   * Needs automated gates, reproducibility, and low friction.
2. **Security Engineering / AppSec**
   * Needs evidence, explainability, and exception workflows.
3. **Audit / Compliance**
   * Needs replay, provenance, and a defensible trail.

Define “first value” for each:

* Release engineer: gate merges/releases without re-running scans.
* Security engineer: investigate a deny decision with evidence pointers in minutes.
* Auditor: replay a verdict months later using the same knowledge snapshot.

---

## 2.3 Product requirements (expressed as “shall” statements)

### 2.3.1 Verdict content requirements

A verdict SHALL contain:

* **Subject**: immutable artifact reference (digest, type, locator)
* **Decision**: pass/fail/warn/etc.
* **Policy binding**: policy bundle ID + version + digest
* **Knowledge snapshot binding**: snapshot IDs/digests for vuln feed and VEX set
* **Evaluator binding**: evaluator name/version + schema version
* **Rationale summary**: stable short explanation (human-readable)
* **Findings references**: pointers to detailed findings/evidence (content-addressed)
* **Unknowns state**: explicit unknown counts and categories

### 2.3.2 Replay requirements

The product SHALL support:

* Re-evaluating the same subject under the same policy+knowledge snapshot
* Proving equivalence of inputs used in the original verdict
* Producing a “replay report” that states:
  * replay succeeded and matched
  * or replay failed and why (e.g., missing evidence, policy changed)

### 2.3.3 UX requirements

UI/UX SHALL:

* Show verdict status clearly (Pass/Fail/…)
* Display:
  * policy clause(s) responsible
  * top evidence pointers
  * knowledge snapshot ID
  * signature trust status (who signed, chain validity)
* Provide “Replay” as an action (even if replay happens offline, the UX must guide it)

---

## 2.4 Product taxonomy: separate “verdicts” from “evaluations” from “attestations”

This is where many products get confused. Your terminology must remain strict:

* **Evaluation**: internal computation that produces decision + findings.
* **Verdict**: the stable, canonical decision payload (the thing being signed).
* **Attestation**: the signed envelope binding the verdict to cryptographic identity.

PMs must enforce this vocabulary in PRDs, UI labels, and docs.

---

## 2.5 Policy model guidelines for verdicting

Verdicting depends on policy discipline. PM rules:

* Policy must be **versioned** and **content-addressed**.
* Policies must be **pure functions** of declared inputs:
  * SBOM graph
  * VEX claims
  * vulnerability data
  * reachability evidence (if present)
  * environment assertions (if present)
* Policies must produce:
  * a decision
  * plus a minimal explanation graph (policy rule ID → evidence IDs)

Avoid “freeform scripts” early. You need determinism and auditability.
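Below is a minimal sketch of the pure-function shape described above, assuming hypothetical input structures, rule fields, and numeric severities; the real rule language and schema are product decisions and are not implied here. The point is the signature: declared inputs in, a decision plus a rule-to-evidence explanation graph out, with no clock, no network, and no global state.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class PolicyDecision:
    status: str                                    # "pass" | "fail" | "warn" | ...
    # Explanation graph: policy rule ID -> evidence refs that triggered it.
    explanation: dict[str, list[str]] = field(default_factory=dict)

def evaluate(sbom_graph: dict, vex_claims: list[dict], vuln_findings: list[dict],
             policy_rules: list[dict]) -> PolicyDecision:
    """Pure function of declared inputs; sbom_graph is accepted but unused in
    this sketch (it would drive component resolution in a real evaluator)."""
    not_affected = {c["cve_id"] for c in vex_claims if c["status"] == "not_affected"}
    explanation: dict[str, list[str]] = {}
    for rule in sorted(policy_rules, key=lambda r: r["id"]):          # stable rule order
        hits = [
            f["evidence_ref"]
            for f in sorted(vuln_findings, key=lambda f: (f["component_id"], f["cve_id"]))
            if f["severity"] >= rule["min_severity"] and f["cve_id"] not in not_affected
        ]
        if hits:
            explanation[rule["id"]] = hits
    return PolicyDecision(status="fail" if explanation else "pass", explanation=explanation)
```

Because the inputs are explicit and iteration order is pinned, identical inputs always yield the identical explanation graph, which is exactly the property replay depends on.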
---

## 2.6 Exceptions are part of the verdict product, not an afterthought

PM requirement:

* Exceptions must be first-class objects with:
  * scope (exact artifact/component range)
  * owner
  * justification
  * expiry
  * required evidence (optional but strongly recommended)

And verdict logic must:

* record that an exception was applied
* include exception IDs in the verdict evidence graph
* make exception usage visible in UI and audit pack exports

---

## 2.7 Success metrics (PM-owned)

Choose metrics that reflect the moat:

* **Replay success rate**: % of verdicts that can be replayed after N days.
* **Policy determinism incidents**: number of non-deterministic evaluation bugs.
* **Audit cycle time**: time to satisfy an audit evidence request for a release.
* **Noise**: # of manual suppressions/overrides per 100 releases (should drop).
* **Gate adoption**: % of releases gated by verdict attestations (not reports).

---

# 3) Development Management Guidelines

## 3.1 Architecture principles (engineering tenets)

### Tenet A: Determinism-first evaluation

Engineering SHALL ensure evaluation is deterministic across:

* OS and architecture differences (as much as feasible)
* concurrency scheduling
* non-ordered data structures

Practical rules:

* Never iterate over maps/hashes without sorting keys.
* Canonicalize output ordering (findings sorted by stable tuple: (component_id, cve_id, path, rule_id)).
* Keep “generated at” timestamps out of the signed payload; if needed, place them in an unsigned wrapper or separate metadata field excluded from signature.

### Tenet B: Content-address everything

All significant inputs/outputs should have content digests:

* SBOM digest
* policy digest
* knowledge snapshot digest
* evidence bundle digest
* verdict digest

This makes replay and integrity checks possible.

### Tenet C: No hidden network

During evaluation, the engine must not fetch “latest” anything. Network is allowed only in:

* snapshot acquisition phase
* artifact retrieval phase
* attestation publication phase

…and each must be explicitly logged and pinned.

---

## 3.2 Canonical verdict schema and serialization rules

**Engineering guideline:** pick a canonical serialization and stick to it.

Options:

* Canonical JSON (JCS or equivalent)
* CBOR with deterministic encoding

Rules:

* Define a **schema version** and strict validation.
* Make field names stable; avoid “optional” fields that appear/disappear nondeterministically.
* Ensure numeric formatting is stable (no float drift; prefer integers or rational representation).
* Always include empty arrays if required for stability, or exclude consistently by schema rule.

---

## 3.3 Suggested verdict payload (illustrative)

This is not a mandate; use it as a baseline structure.

```json
{
  "schema_version": "1.0",
  "subject": {
    "type": "oci-image",
    "name": "registry.example.com/app/service",
    "digest": "sha256:…",
    "platform": "linux/amd64"
  },
  "evaluation": {
    "evaluator": "stella-eval",
    "evaluator_version": "0.9.0",
    "policy": {
      "id": "prod-default",
      "version": "2025.12.1",
      "digest": "sha256:…"
    },
    "knowledge_snapshot": {
      "vuln_db_digest": "sha256:…",
      "advisory_digest": "sha256:…",
      "vex_set_digest": "sha256:…"
    }
  },
  "decision": {
    "status": "fail",
    "score": 87,
    "reasons": [
      {
        "rule_id": "RISK.CRITICAL.REACHABLE",
        "evidence_ref": "sha256:…"
      }
    ],
    "unknowns": {
      "unknown_reachable": 2,
      "unknown_unreachable": 0
    }
  },
  "evidence": {
    "sbom_digest": "sha256:…",
    "finding_bundle_digest": "sha256:…",
    "inputs_manifest_digest": "sha256:…"
  }
}
```

Then wrap this payload in your chosen attestation envelope and sign it.
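To connect the serialization rules of 3.2 to the payload above, here is a minimal sketch. It assumes plain canonical-style JSON (sorted keys, minimal separators, UTF-8) rather than full JCS (RFC 8785), and the helper names are illustrative.

```python
import hashlib
import json

def canonical_bytes(payload: dict) -> bytes:
    """Approximate canonical JSON: sorted keys, minimal separators, UTF-8.
    Full JCS (RFC 8785) additionally pins number formatting and string escaping."""
    return json.dumps(
        payload,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode("utf-8")

def verdict_digest(payload: dict) -> str:
    """Content address of the payload; stable across re-serialization."""
    return "sha256:" + hashlib.sha256(canonical_bytes(payload)).hexdigest()

# Same logical content, different construction order -> same digest.
a = {"decision": {"status": "fail"}, "schema_version": "1.0"}
b = {"schema_version": "1.0", "decision": {"status": "fail"}}
assert verdict_digest(a) == verdict_digest(b)
```

Whichever encoding is chosen (JCS or deterministic CBOR), the property to preserve is the same: evaluators that agree on content must agree on bytes, so the signature and every content-addressed reference stay stable.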
---

## 3.4 Attestation format and storage guidelines

Development managers must enforce a consistent publishing model:

1. **Envelope**
   * Prefer DSSE/in-toto style envelope because it:
     * standardizes signing
     * supports multiple signature schemes
     * is widely adopted in supply chain ecosystems
2. **Attachment**
   * OCI artifacts should carry verdicts as referrers/attachments to the subject digest (preferred).
   * For non-OCI targets, store in an internal ledger keyed by the subject digest/ID.
3. **Verification**
   * Provide:
     * `stella verify …` → checks signature and integrity references
     * `stella replay …` → re-run evaluation from snapshots and compare
4. **Transparency / logs**
   * Optional in v1, but plan for:
     * transparency log (public or private) to strengthen auditability
     * offline alternatives for air-gapped customers

---

## 3.5 Knowledge snapshot engineering requirements

A “snapshot” must be an immutable bundle, ideally content-addressed.

Snapshot includes:

* vulnerability database at a specific point
* advisory sources (OS distro advisories)
* VEX statement set(s)
* any enrichment signals that influence scoring

Rules:

* Snapshot resolution must be explicit: “use snapshot digest X”
* Must support export/import for air-gapped deployments
* Must record source provenance and ingestion timestamps (timestamps may be excluded from signed payload if they cause nondeterminism; store them in snapshot metadata)

---

## 3.6 Replay engine requirements

Replay is not “re-run scan and hope it matches.” Replay must:

* retrieve the exact subject (or confirm it via digest)
* retrieve the exact SBOM (or deterministically re-generate it from the subject in a defined way)
* load exact policy bundle by digest
* load exact knowledge snapshot by digest
* run evaluator version pinned in verdict (or enforce a compatibility mapping)
* produce:
  * verdict-equivalence result
  * a delta explanation if mismatch occurs

Engineering rule: replay must fail loudly and specifically when inputs are missing.
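A minimal sketch of the verdict-equivalence check and delta explanation, assuming payloads shaped like the section 3.3 example; the function names and the set of signed fields are illustrative assumptions, not the shipped replay tool. Payloads are compared field by field after dropping any unsigned wrapper metadata, so a mismatch is reported as a concrete delta rather than a bare boolean.

```python
from typing import Any

SIGNED_FIELDS = ("subject", "evaluation", "decision", "evidence")   # schema-defined

def signed_view(payload: dict) -> dict:
    """Keep only the fields covered by the signature; drop wrapper metadata."""
    return {k: payload[k] for k in SIGNED_FIELDS if k in payload}

def diff(original: Any, replayed: Any, path: str = "") -> list[str]:
    """Recursive field-by-field delta between original and replayed payloads."""
    if isinstance(original, dict) and isinstance(replayed, dict):
        deltas = []
        for key in sorted(set(original) | set(replayed)):
            deltas += diff(original.get(key), replayed.get(key), f"{path}.{key}")
        return deltas
    if original != replayed:
        return [f"{path or '<root>'}: {original!r} -> {replayed!r}"]
    return []

def replay_report(original: dict, replayed: dict) -> dict:
    deltas = diff(signed_view(original), signed_view(replayed))
    return {"matched": not deltas, "deltas": deltas}

# Example: a replay that drifted because the knowledge snapshot was not pinned.
report = replay_report(
    {"decision": {"status": "pass"}, "evaluation": {"vex_set_digest": "sha256:aaa"}},
    {"decision": {"status": "fail"}, "evaluation": {"vex_set_digest": "sha256:bbb"}},
)
print(report["matched"], report["deltas"])
```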
---

## 3.7 Testing strategy (required)

Deterministic systems require “golden” testing. Minimum tests:

1. **Golden verdict tests**
   * Fixed artifact + fixed snapshots + fixed policy
   * Expected verdict output must match exactly
2. **Cross-platform determinism tests**
   * Run the same evaluation on different machines/containers and compare outputs
3. **Mutation tests for determinism**
   * Randomize ordering of internal collections; output should remain unchanged
4. **Replay regression tests**
   * Store verdict + snapshots and replay after code changes to ensure compatibility guarantees hold

---

## 3.8 Versioning and backward compatibility guidelines

This is essential to prevent “replay breaks after upgrades.”

Rules:

* **Verdict schema version** changes must be rare and carefully managed.
* Maintain a compatibility matrix:
  * evaluator vX can replay verdict schema vY
* If you must evolve logic, do so by:
  * bumping the evaluator version
  * preserving older evaluators in a compatibility mode (containerized evaluators are often easiest)

---

## 3.9 Security and key management guidelines

Development managers must ensure:

* Signing keys are managed via:
  * KMS/HSM (enterprise)
  * keyless (OIDC-based) where acceptable
  * offline keys for air-gapped deployments
* Verification trust policy is explicit:
  * which identities are trusted to sign verdicts
  * which policies are accepted
  * whether transparency is required
  * how to handle revocation/rotation
* Separate “can sign” from “can publish”:
  * Signing should be restricted; publishing may be broader.

---

# 4) Operational workflow requirements (cross-functional)

## 4.1 CI gate flow

* Build the artifact
* Produce the SBOM deterministically (or record the SBOM digest if generated elsewhere)
* Evaluate → produce the verdict payload
* Sign the verdict → publish the attestation attached to the artifact
* The gate decision uses verification of:
  * signature validity
  * policy compliance
  * snapshot integrity

## 4.2 Registry / admission flow

* Admission controller checks for a valid, trusted verdict attestation
* Optionally requires:
  * the verdict’s knowledge snapshot is no older than X (this is a policy decision)
  * no expired exceptions
* Replay is not required (replay is for audits; admission is the fast path)

## 4.3 Audit flow

* Export an “audit pack”:
  * verdict + signature chain
  * policy bundle
  * knowledge snapshot
  * referenced evidence bundles
* Auditor (or internal team) runs `verify` and optionally `replay` (an offline integrity-check sketch appears at the end of this document)

---

# 5) Common failure modes to avoid

1. **Signing “findings” instead of a decision**
   * Leads to unbounded payload growth and weak governance semantics.
2. **Using “latest” feeds during evaluation**
   * Breaks replayability immediately.
3. **Embedding timestamps in the signed payload**
   * Eliminates deterministic byte-level reproducibility.
4. **Letting the UI become the source of truth**
   * The verdict artifact must be the authority; the UI is a view.
5. **No clear separation between evidence store, snapshot store, and verdict store**
   * Creates coupling and makes offline operations painful.

---

# 6) Definition of Done checklist (use this to gate release)

A feature increment for signed, replayable verdicts is “done” only if:

* [ ] Verdict binds to an immutable subject digest
* [ ] Verdict includes policy digest/version and knowledge snapshot digests
* [ ] Verdict is signed and verifiable via CLI
* [ ] Verification works offline (given exported artifacts)
* [ ] Replay works with stored snapshots and produces match/mismatch output with reasons
* [ ] Determinism tests pass (golden + mutation + cross-platform)
* [ ] UI displays signer identity, policy, snapshot IDs, and rule→evidence links
* [ ] Exceptions (if implemented) are recorded in the verdict and enforced deterministically

---

## Optional: Recommended implementation sequence (keeps risk down)

1. Canonical verdict schema + deterministic evaluator skeleton
2. Signing + verification CLI
3. Snapshot bundle format + pinned evaluation
4. Replay tool + golden tests
5. OCI attachment publishing + registry/admission integration
6. Evidence bundles + UI explainability
7. Exceptions + audit pack export

---

## Optional: Formal PRD template structure

If this document is turned into a formal internal PRD, structure it as:

* “Product requirements” (MUST/SHOULD/COULD)
* “Engineering requirements” (interfaces + invariants + test plan)
* “Security model” (trust roots, signing identities, verification policy)
* “Acceptance criteria” for an MVP and for GA
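As a closing illustration for the audit flow in section 4.3 and the “Verification works offline” checklist item, here is a minimal sketch that checks the content-addressed integrity of an exported audit pack. It assumes a hypothetical pack layout (a `verdict.json` payload plus a `blobs/` directory of files named by digest); signature verification is left to the chosen envelope tooling and is out of scope here.

```python
import hashlib
import json
from pathlib import Path

def blob_digest(path: Path) -> str:
    return "sha256:" + hashlib.sha256(path.read_bytes()).hexdigest()

def check_audit_pack(pack_dir: Path) -> list[str]:
    """Verify that every digest referenced by the verdict resolves to a blob in
    the pack and that the blob's content matches its name. Fully offline."""
    verdict = json.loads((pack_dir / "verdict.json").read_text())
    referenced = [
        verdict["evaluation"]["policy"]["digest"],
        *verdict["evaluation"]["knowledge_snapshot"].values(),
        *verdict["evidence"].values(),
    ]
    problems = []
    for digest in referenced:
        # Hypothetical blob naming convention: "sha256:" prefix mapped to "sha256_".
        blob = pack_dir / "blobs" / digest.replace("sha256:", "sha256_")
        if not blob.exists():
            problems.append(f"missing blob for {digest}")
        elif blob_digest(blob) != digest:
            problems.append(f"content mismatch for {digest}")
    return problems

# Usage: problems = check_audit_pack(Path("export/audit-pack")); empty list == intact.
```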