# Guidelines for Product and Development Managers: Signed, Replayable Risk Verdicts

## Purpose

Signed, replayable risk verdicts are the Stella Ops mechanism for producing a **cryptographically verifiable, audit‑ready decision** about an artifact (container image, VM image, filesystem snapshot, SBOM, etc.) that can be **recomputed later to the same result** using the same inputs (“time-travel replay”).

This capability is not “scan output with a signature.” It is a **decision artifact** that becomes the unit of governance in CI/CD, registry admission, and audits.

---

# 1) Shared definitions and non-negotiables

## 1.1 Definitions

**Risk verdict**
A structured decision: *Pass / Fail / Warn / Needs‑Review* (or similar), produced by a deterministic evaluator under a specific policy and knowledge state.

**Signed**
The verdict is wrapped in a tamper‑evident envelope (e.g., DSSE/in‑toto statement) and signed using an organization-approved trust model (key-based, keyless, or offline CA).

**Replayable**
Given the same:

* target artifact identity
* SBOM (or derivation method)
* vulnerability and advisory knowledge state
* VEX inputs
* policy bundle
* evaluator version

…Stella Ops can **re-evaluate and reproduce the same verdict** and provide evidence equivalence.

> Critical nuance: replayability is about *result equivalence*. Byte‑for‑byte equality is ideal but not always required if signatures/metadata necessarily vary. If byte‑for‑byte is a goal, you must strictly control timestamps, ordering, and serialization.

---

## 1.2 Non-negotiables (what must be true in v1)

1. **Verdicts are bound to immutable artifact identity**
   * Container image: digest (sha256:…)
   * SBOM: content digest
   * File tree: merkle root digest, or equivalent
2. **Verdicts are deterministic**
   * No “current time” dependence in scoring
   * No non-deterministic ordering of findings
   * No implicit network calls during evaluation
3. **Verdicts are explainable**
   * Every deny/block decision must cite the policy clause and evidence pointers that triggered it.
4. **Verdicts are verifiable**
   * Independent verification toolchain exists (CLI/library) that validates signature and checks referenced evidence integrity.
5. **Knowledge state is pinned**
   * The verdict references a “knowledge snapshot” (vuln feeds, advisories, VEX set) by digest/ID, not “latest.”
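To make non-negotiables 1 and 5 concrete, here is a minimal sketch in Python of pinning every evaluation input by digest before the evaluator runs. The field and helper names are illustrative assumptions, not the Stella Ops schema.

```python
import hashlib
import json

def sha256_hex(data: bytes) -> str:
    """Content-address a blob; every verdict input is referenced this way."""
    return "sha256:" + hashlib.sha256(data).hexdigest()

# Hypothetical inputs manifest: everything the evaluator is allowed to see,
# pinned by digest. "latest" never appears here (non-negotiable 5).
inputs_manifest = {
    "subject": {
        "type": "oci-image",
        "digest": "sha256:aaaa…",   # immutable artifact identity (non-negotiable 1)
    },
    "sbom_digest": "sha256:bbbb…",
    "policy_digest": "sha256:cccc…",
    "knowledge_snapshot": {
        "vuln_db_digest": "sha256:dddd…",
        "vex_set_digest": "sha256:eeee…",
    },
}

# The manifest itself is content-addressed, so the verdict can later point at
# exactly the set of inputs that produced it (compare "inputs_manifest_digest"
# in the payload example of section 3.3).
canonical = json.dumps(inputs_manifest, sort_keys=True, separators=(",", ":")).encode()
print(sha256_hex(canonical))
```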
---

## 1.3 Explicit non-goals (avoid scope traps)

* Building a full CNAPP runtime protection product as part of verdicting.
* Implementing “all possible attestation standards.” Pick one canonical representation; support others via adapters.
* Solving global revocation and key lifecycle for every ecosystem on day one; define a minimum viable trust model per deployment mode.

---

# 2) Product Management Guidelines

## 2.1 Position the verdict as the primary product artifact

**PM rule:** if a workflow does not end in a verdict artifact, it is not part of this moat.

Examples:

* CI pipeline step produces `VERDICT.attestation` attached to the OCI artifact.
* Registry admission checks for a valid verdict attestation meeting policy.
* Audit export bundles the verdict plus referenced evidence.

**Avoid:** “scan reports” as the goal. Reports are views; the verdict is the object.

---

## 2.2 Define the core personas and success outcomes

Minimum personas:

1. **Release/Platform Engineering**
   * Needs automated gates, reproducibility, and low friction.
2. **Security Engineering / AppSec**
   * Needs evidence, explainability, and exception workflows.
3. **Audit / Compliance**
   * Needs replay, provenance, and a defensible trail.

Define “first value” for each:

* Release engineer: gate merges/releases without re-running scans.
* Security engineer: investigate a deny decision with evidence pointers in minutes.
* Auditor: replay a verdict months later using the same knowledge snapshot.

---

## 2.3 Product requirements (expressed as “shall” statements)

### 2.3.1 Verdict content requirements

A verdict SHALL contain:

* **Subject**: immutable artifact reference (digest, type, locator)
* **Decision**: pass/fail/warn/etc.
* **Policy binding**: policy bundle ID + version + digest
* **Knowledge snapshot binding**: snapshot IDs/digests for vuln feed and VEX set
* **Evaluator binding**: evaluator name/version + schema version
* **Rationale summary**: stable short explanation (human-readable)
* **Findings references**: pointers to detailed findings/evidence (content-addressed)
* **Unknowns state**: explicit unknown counts and categories

### 2.3.2 Replay requirements

The product SHALL support:

* Re-evaluating the same subject under the same policy+knowledge snapshot
* Proving equivalence of inputs used in the original verdict
* Producing a “replay report” that states:
  * replay succeeded and matched
  * or replay failed and why (e.g., missing evidence, policy changed)

### 2.3.3 UX requirements

UI/UX SHALL:

* Show verdict status clearly (Pass/Fail/…)
* Display:
  * policy clause(s) responsible
  * top evidence pointers
  * knowledge snapshot ID
  * signature trust status (who signed, chain validity)
* Provide “Replay” as an action (even if replay happens offline, the UX must guide it)

---

## 2.4 Product taxonomy: separate “verdicts” from “evaluations” from “attestations”

This is where many products get confused. Your terminology must remain strict:

* **Evaluation**: internal computation that produces decision + findings.
* **Verdict**: the stable, canonical decision payload (the thing being signed).
* **Attestation**: the signed envelope binding the verdict to cryptographic identity.

PMs must enforce this vocabulary in PRDs, UI labels, and docs.

---

## 2.5 Policy model guidelines for verdicting

Verdicting depends on policy discipline. PM rules:

* Policy must be **versioned** and **content-addressed**.
* Policies must be **pure functions** of declared inputs:
  * SBOM graph
  * VEX claims
  * vulnerability data
  * reachability evidence (if present)
  * environment assertions (if present)
* Policies must produce:
  * a decision
  * plus a minimal explanation graph (policy rule ID → evidence IDs)

Avoid “freeform scripts” early. You need determinism and auditability.
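Below is a minimal sketch of the pure-function shape described above, assuming hypothetical input structures, rule fields, and numeric severities; the real rule language and schema are product decisions and are not implied here. The point is the signature: declared inputs in, a decision plus a rule-to-evidence explanation graph out, with no clock, no network, and no global state.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class PolicyDecision:
    status: str                                    # "pass" | "fail" | "warn" | ...
    # Explanation graph: policy rule ID -> evidence refs that triggered it.
    explanation: dict[str, list[str]] = field(default_factory=dict)

def evaluate(sbom_graph: dict, vex_claims: list[dict], vuln_findings: list[dict],
             policy_rules: list[dict]) -> PolicyDecision:
    """Pure function of declared inputs; sbom_graph is accepted but unused in
    this sketch (it would drive component resolution in a real evaluator)."""
    not_affected = {c["cve_id"] for c in vex_claims if c["status"] == "not_affected"}
    explanation: dict[str, list[str]] = {}
    for rule in sorted(policy_rules, key=lambda r: r["id"]):          # stable rule order
        hits = [
            f["evidence_ref"]
            for f in sorted(vuln_findings, key=lambda f: (f["component_id"], f["cve_id"]))
            if f["severity"] >= rule["min_severity"] and f["cve_id"] not in not_affected
        ]
        if hits:
            explanation[rule["id"]] = hits
    return PolicyDecision(status="fail" if explanation else "pass", explanation=explanation)
```

Because the inputs are explicit and iteration order is pinned, identical inputs always yield the identical explanation graph, which is exactly the property replay depends on.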
---

## 2.6 Exceptions are part of the verdict product, not an afterthought

PM requirement:

* Exceptions must be first-class objects with:
  * scope (exact artifact/component range)
  * owner
  * justification
  * expiry
  * required evidence (optional but strongly recommended)

And verdict logic must:

* record that an exception was applied
* include exception IDs in the verdict evidence graph
* make exception usage visible in UI and audit pack exports

---

## 2.7 Success metrics (PM-owned)

Choose metrics that reflect the moat:

* **Replay success rate**: % of verdicts that can be replayed after N days.
* **Policy determinism incidents**: number of non-deterministic evaluation bugs.
* **Audit cycle time**: time to satisfy an audit evidence request for a release.
* **Noise**: # of manual suppressions/overrides per 100 releases (should drop).
* **Gate adoption**: % of releases gated by verdict attestations (not reports).

---

# 3) Development Management Guidelines

## 3.1 Architecture principles (engineering tenets)

### Tenet A: Determinism-first evaluation

Engineering SHALL ensure evaluation is deterministic across:

* OS and architecture differences (as much as feasible)
* concurrency scheduling
* non-ordered data structures

Practical rules:

* Never iterate over maps/hashes without sorting keys.
* Canonicalize output ordering (findings sorted by stable tuple: (component_id, cve_id, path, rule_id)).
* Keep “generated at” timestamps out of the signed payload; if needed, place them in an unsigned wrapper or separate metadata field excluded from signature.

### Tenet B: Content-address everything

All significant inputs/outputs should have content digests:

* SBOM digest
* policy digest
* knowledge snapshot digest
* evidence bundle digest
* verdict digest

This makes replay and integrity checks possible.

### Tenet C: No hidden network

During evaluation, the engine must not fetch “latest” anything. Network is allowed only in:

* snapshot acquisition phase
* artifact retrieval phase
* attestation publication phase

…and each must be explicitly logged and pinned.

---

## 3.2 Canonical verdict schema and serialization rules

**Engineering guideline:** pick a canonical serialization and stick to it.

Options:

* Canonical JSON (JCS or equivalent)
* CBOR with deterministic encoding

Rules:

* Define a **schema version** and strict validation.
* Make field names stable; avoid “optional” fields that appear/disappear nondeterministically.
* Ensure numeric formatting is stable (no float drift; prefer integers or rational representation).
* Always include empty arrays if required for stability, or exclude consistently by schema rule.

---

## 3.3 Suggested verdict payload (illustrative)

This is not a mandate; use it as a baseline structure.

```json
{
  "schema_version": "1.0",
  "subject": {
    "type": "oci-image",
    "name": "registry.example.com/app/service",
    "digest": "sha256:…",
    "platform": "linux/amd64"
  },
  "evaluation": {
    "evaluator": "stella-eval",
    "evaluator_version": "0.9.0",
    "policy": {
      "id": "prod-default",
      "version": "2025.12.1",
      "digest": "sha256:…"
    },
    "knowledge_snapshot": {
      "vuln_db_digest": "sha256:…",
      "advisory_digest": "sha256:…",
      "vex_set_digest": "sha256:…"
    }
  },
  "decision": {
    "status": "fail",
    "score": 87,
    "reasons": [
      {
        "rule_id": "RISK.CRITICAL.REACHABLE",
        "evidence_ref": "sha256:…"
      }
    ],
    "unknowns": {
      "unknown_reachable": 2,
      "unknown_unreachable": 0
    }
  },
  "evidence": {
    "sbom_digest": "sha256:…",
    "finding_bundle_digest": "sha256:…",
    "inputs_manifest_digest": "sha256:…"
  }
}
```

Then wrap this payload in your chosen attestation envelope and sign it.
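To connect the serialization rules of 3.2 to the payload above, here is a minimal sketch. It assumes plain canonical-style JSON (sorted keys, minimal separators, UTF-8) rather than full JCS (RFC 8785), and the helper names are illustrative.

```python
import hashlib
import json

def canonical_bytes(payload: dict) -> bytes:
    """Approximate canonical JSON: sorted keys, minimal separators, UTF-8.
    Full JCS (RFC 8785) additionally pins number formatting and string escaping."""
    return json.dumps(
        payload,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode("utf-8")

def verdict_digest(payload: dict) -> str:
    """Content address of the payload; stable across re-serialization."""
    return "sha256:" + hashlib.sha256(canonical_bytes(payload)).hexdigest()

# Same logical content, different construction order -> same digest.
a = {"decision": {"status": "fail"}, "schema_version": "1.0"}
b = {"schema_version": "1.0", "decision": {"status": "fail"}}
assert verdict_digest(a) == verdict_digest(b)
```

Whichever encoding is chosen (JCS or deterministic CBOR), the property to preserve is the same: evaluators that agree on content must agree on bytes, so the signature and every content-addressed reference stay stable.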
---

## 3.4 Attestation format and storage guidelines

Development managers must enforce a consistent publishing model:

1. **Envelope**
   * Prefer DSSE/in-toto style envelope because it:
     * standardizes signing
     * supports multiple signature schemes
     * is widely adopted in supply chain ecosystems
2. **Attachment**
   * OCI artifacts should carry verdicts as referrers/attachments to the subject digest (preferred).
   * For non-OCI targets, store in an internal ledger keyed by the subject digest/ID.
3. **Verification**
   * Provide:
     * `stella verify …` → checks signature and integrity references
     * `stella replay …` → re-run evaluation from snapshots and compare
4. **Transparency / logs**
   * Optional in v1, but plan for:
     * transparency log (public or private) to strengthen auditability
     * offline alternatives for air-gapped customers

---

## 3.5 Knowledge snapshot engineering requirements

A “snapshot” must be an immutable bundle, ideally content-addressed.

Snapshot includes:

* vulnerability database at a specific point
* advisory sources (OS distro advisories)
* VEX statement set(s)
* any enrichment signals that influence scoring

Rules:

* Snapshot resolution must be explicit: “use snapshot digest X”
* Must support export/import for air-gapped deployments
* Must record source provenance and ingestion timestamps (timestamps may be excluded from signed payload if they cause nondeterminism; store them in snapshot metadata)

---

## 3.6 Replay engine requirements

Replay is not “re-run scan and hope it matches.” Replay must:

* retrieve the exact subject (or confirm it via digest)
* retrieve the exact SBOM (or deterministically re-generate it from the subject in a defined way)
* load exact policy bundle by digest
* load exact knowledge snapshot by digest
* run evaluator version pinned in verdict (or enforce a compatibility mapping)
* produce:
  * verdict-equivalence result
  * a delta explanation if mismatch occurs

Engineering rule: replay must fail loudly and specifically when inputs are missing.
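A minimal sketch of the verdict-equivalence check and delta explanation, assuming payloads shaped like the section 3.3 example; the function names and the set of signed fields are illustrative assumptions, not the shipped replay tool. Payloads are compared field by field after dropping any unsigned wrapper metadata, so a mismatch is reported as a concrete delta rather than a bare boolean.

```python
from typing import Any

SIGNED_FIELDS = ("subject", "evaluation", "decision", "evidence")   # schema-defined

def signed_view(payload: dict) -> dict:
    """Keep only the fields covered by the signature; drop wrapper metadata."""
    return {k: payload[k] for k in SIGNED_FIELDS if k in payload}

def diff(original: Any, replayed: Any, path: str = "") -> list[str]:
    """Recursive field-by-field delta between original and replayed payloads."""
    if isinstance(original, dict) and isinstance(replayed, dict):
        deltas = []
        for key in sorted(set(original) | set(replayed)):
            deltas += diff(original.get(key), replayed.get(key), f"{path}.{key}")
        return deltas
    if original != replayed:
        return [f"{path or '<root>'}: {original!r} -> {replayed!r}"]
    return []

def replay_report(original: dict, replayed: dict) -> dict:
    deltas = diff(signed_view(original), signed_view(replayed))
    return {"matched": not deltas, "deltas": deltas}

# Example: a replay that drifted because the knowledge snapshot was not pinned.
report = replay_report(
    {"decision": {"status": "pass"}, "evaluation": {"vex_set_digest": "sha256:aaa"}},
    {"decision": {"status": "fail"}, "evaluation": {"vex_set_digest": "sha256:bbb"}},
)
print(report["matched"], report["deltas"])
```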
---

## 3.7 Testing strategy (required)

Deterministic systems require “golden” testing. Minimum tests:

1. **Golden verdict tests**
   * Fixed artifact + fixed snapshots + fixed policy
   * Expected verdict output must match exactly
2. **Cross-platform determinism tests**
   * Run the same evaluation on different machines/containers and compare outputs
3. **Mutation tests for determinism**
   * Randomize ordering of internal collections; output should remain unchanged
4. **Replay regression tests**
   * Store verdict + snapshots and replay after code changes to ensure compatibility guarantees hold

---

## 3.8 Versioning and backward compatibility guidelines

This is essential to prevent “replay breaks after upgrades.”

Rules:

* **Verdict schema version** changes must be rare and carefully managed.
* Maintain a compatibility matrix:
  * evaluator vX can replay verdict schema vY
* If you must evolve logic, do so by:
  * bumping the evaluator version
  * preserving older evaluators in a compatibility mode (containerized evaluators are often easiest)

---

## 3.9 Security and key management guidelines

Development managers must ensure:

* Signing keys are managed via:
  * KMS/HSM (enterprise)
  * keyless (OIDC-based) where acceptable
  * offline keys for air-gapped deployments
* Verification trust policy is explicit:
  * which identities are trusted to sign verdicts
  * which policies are accepted
  * whether transparency is required
  * how to handle revocation/rotation
* Separate “can sign” from “can publish”:
  * Signing should be restricted; publishing may be broader.

---

# 4) Operational workflow requirements (cross-functional)

## 4.1 CI gate flow

* Build the artifact
* Produce the SBOM deterministically (or record the SBOM digest if generated elsewhere)
* Evaluate → produce the verdict payload
* Sign the verdict → publish the attestation attached to the artifact
* The gate decision uses verification of:
  * signature validity
  * policy compliance
  * snapshot integrity

## 4.2 Registry / admission flow

* Admission controller checks for a valid, trusted verdict attestation
* Optionally requires:
  * the verdict’s knowledge snapshot is no older than X (this is a policy decision)
  * no expired exceptions
* Replay is not required (replay is for audits; admission is the fast path)

## 4.3 Audit flow

* Export an “audit pack”:
  * verdict + signature chain
  * policy bundle
  * knowledge snapshot
  * referenced evidence bundles
* Auditor (or internal team) runs `verify` and optionally `replay` (an offline integrity-check sketch appears at the end of this document)

---

# 5) Common failure modes to avoid

1. **Signing “findings” instead of a decision**
   * Leads to unbounded payload growth and weak governance semantics.
2. **Using “latest” feeds during evaluation**
   * Breaks replayability immediately.
3. **Embedding timestamps in the signed payload**
   * Eliminates deterministic byte-level reproducibility.
4. **Letting the UI become the source of truth**
   * The verdict artifact must be the authority; the UI is a view.
5. **No clear separation between evidence store, snapshot store, and verdict store**
   * Creates coupling and makes offline operations painful.

---

# 6) Definition of Done checklist (use this to gate release)

A feature increment for signed, replayable verdicts is “done” only if:

* [ ] Verdict binds to an immutable subject digest
* [ ] Verdict includes policy digest/version and knowledge snapshot digests
* [ ] Verdict is signed and verifiable via CLI
* [ ] Verification works offline (given exported artifacts)
* [ ] Replay works with stored snapshots and produces match/mismatch output with reasons
* [ ] Determinism tests pass (golden + mutation + cross-platform)
* [ ] UI displays signer identity, policy, snapshot IDs, and rule→evidence links
* [ ] Exceptions (if implemented) are recorded in the verdict and enforced deterministically

---

## Optional: Recommended implementation sequence (keeps risk down)

1. Canonical verdict schema + deterministic evaluator skeleton
2. Signing + verification CLI
3. Snapshot bundle format + pinned evaluation
4. Replay tool + golden tests
5. OCI attachment publishing + registry/admission integration
6. Evidence bundles + UI explainability
7. Exceptions + audit pack export

---

## Optional: Formal PRD template structure

If this document is turned into a formal internal PRD, structure it as:

* “Product requirements” (MUST/SHOULD/COULD)
* “Engineering requirements” (interfaces + invariants + test plan)
* “Security model” (trust roots, signing identities, verification policy)
* “Acceptance criteria” for an MVP and for GA
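As a closing illustration for the audit flow in section 4.3 and the “Verification works offline” checklist item, here is a minimal sketch that checks the content-addressed integrity of an exported audit pack. It assumes a hypothetical pack layout (a `verdict.json` payload plus a `blobs/` directory of files named by digest); signature verification is left to the chosen envelope tooling and is out of scope here.

```python
import hashlib
import json
from pathlib import Path

def blob_digest(path: Path) -> str:
    return "sha256:" + hashlib.sha256(path.read_bytes()).hexdigest()

def check_audit_pack(pack_dir: Path) -> list[str]:
    """Verify that every digest referenced by the verdict resolves to a blob in
    the pack and that the blob's content matches its name. Fully offline."""
    verdict = json.loads((pack_dir / "verdict.json").read_text())
    referenced = [
        verdict["evaluation"]["policy"]["digest"],
        *verdict["evaluation"]["knowledge_snapshot"].values(),
        *verdict["evidence"].values(),
    ]
    problems = []
    for digest in referenced:
        # Hypothetical blob naming convention: "sha256:" prefix mapped to "sha256_".
        blob = pack_dir / "blobs" / digest.replace("sha256:", "sha256_")
        if not blob.exists():
            problems.append(f"missing blob for {digest}")
        elif blob_digest(blob) != digest:
            problems.append(f"content mismatch for {digest}")
    return problems

# Usage: problems = check_audit_pack(Path("export/audit-pack")); empty list == intact.
```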