
Guidelines for Product and Development Managers: Signed, Replayable Risk Verdicts

Purpose

Signed, replayable risk verdicts are the Stella Ops mechanism for producing a cryptographically verifiable, audit-ready decision about an artifact (container image, VM image, filesystem snapshot, SBOM, etc.) that can be recomputed later to the same result using the same inputs (“time-travel replay”).

This capability is not “scan output with a signature.” It is a decision artifact that becomes the unit of governance in CI/CD, registry admission, and audits.


1) Shared definitions and non-negotiables

1.1 Definitions

Risk verdict: a structured decision (Pass / Fail / Warn / NeedsReview, or similar) produced by a deterministic evaluator under a specific policy and knowledge state.

Signed: the verdict is wrapped in a tamper-evident envelope (e.g., a DSSE/in-toto statement) and signed using an organization-approved trust model (key-based, keyless, or offline CA).

Replayable: given the same:

  • target artifact identity
  • SBOM (or derivation method)
  • vulnerability and advisory knowledge state
  • VEX inputs
  • policy bundle
  • evaluator version
  …Stella Ops can re-evaluate the subject, reproduce the same verdict, and provide evidence equivalence.

Critical nuance: replayability is about result equivalence. Byte-for-byte equality is ideal but not always required if signatures/metadata necessarily vary. If byte-for-byte output is a goal, you must strictly control timestamps, ordering, and serialization.
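
To make “same inputs” checkable rather than merely asserted, every replay input can be content-addressed and rolled into a single manifest digest. A minimal sketch (the manifest fields and helper names are illustrative, not the Stella Ops schema):

# Illustrative sketch (not the Stella Ops implementation): content-address every
# replay input and roll the bindings into one manifest digest, so "same inputs"
# can be proven rather than asserted.
import hashlib
import json

def sha256_hex(data: bytes) -> str:
    return "sha256:" + hashlib.sha256(data).hexdigest()

def inputs_manifest_digest(subject_digest: str, sbom_digest: str,
                           vuln_db_digest: str, vex_set_digest: str,
                           policy_digest: str, evaluator_version: str) -> str:
    # Canonical JSON (sorted keys, fixed separators) keeps the digest stable.
    manifest = {
        "subject": subject_digest,
        "sbom": sbom_digest,
        "vuln_db": vuln_db_digest,
        "vex_set": vex_set_digest,
        "policy": policy_digest,
        "evaluator_version": evaluator_version,
    }
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return sha256_hex(canonical.encode("utf-8"))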


1.2 Non-negotiables (what must be true in v1)

  1. Verdicts are bound to immutable artifact identity

    • Container image: digest (sha256:…)
    • SBOM: content digest
    • File tree: Merkle root digest, or equivalent
  2. Verdicts are deterministic

    • No “current time” dependence in scoring
    • No non-deterministic ordering of findings
    • No implicit network calls during evaluation
  3. Verdicts are explainable

    • Every deny/block decision must cite the policy clause and evidence pointers that triggered it.
  4. Verdicts are verifiable

    • Independent verification toolchain exists (CLI/library) that validates signature and checks referenced evidence integrity.
  5. Knowledge state is pinned

    • The verdict references a “knowledge snapshot” (vuln feeds, advisories, VEX set) by digest/ID, not “latest.”

1.3 Explicit non-goals (avoid scope traps)

  • Building a full CNAPP runtime protection product as part of verdicting.
  • Implementing “all possible attestation standards.” Pick one canonical representation; support others via adapters.
  • Solving global revocation and key lifecycle for every ecosystem on day one; define a minimum viable trust model per deployment mode.

2) Product Management Guidelines

2.1 Position the verdict as the primary product artifact

PM rule: if a workflow does not end in a verdict artifact, it is not part of this moat.

Examples:

  • CI pipeline step produces VERDICT.attestation attached to the OCI artifact.
  • Registry admission checks for a valid verdict attestation meeting policy.
  • Audit export bundles the verdict plus referenced evidence.

Avoid: “scan reports” as the goal. Reports are views; the verdict is the object.


2.2 Define the core personas and success outcomes

Minimum personas:

  1. Release/Platform Engineering

    • Needs automated gates, reproducibility, and low friction.
  2. Security Engineering / AppSec

    • Needs evidence, explainability, and exception workflows.
  3. Audit / Compliance

    • Needs replay, provenance, and a defensible trail.

Define “first value” for each:

  • Release engineer: gate merges/releases without re-running scans.
  • Security engineer: investigate a deny decision with evidence pointers in minutes.
  • Auditor: replay a verdict months later using the same knowledge snapshot.

2.3 Product requirements (expressed as “shall” statements)

2.3.1 Verdict content requirements

A verdict SHALL contain:

  • Subject: immutable artifact reference (digest, type, locator)
  • Decision: pass/fail/warn/etc.
  • Policy binding: policy bundle ID + version + digest
  • Knowledge snapshot binding: snapshot IDs/digests for vuln feed and VEX set
  • Evaluator binding: evaluator name/version + schema version
  • Rationale summary: stable short explanation (human-readable)
  • Findings references: pointers to detailed findings/evidence (content-addressed)
  • Unknowns state: explicit unknown counts and categories

2.3.2 Replay requirements

The product SHALL support:

  • Re-evaluating the same subject under the same policy+knowledge snapshot

  • Proving equivalence of inputs used in the original verdict

  • Producing a “replay report” that states:

    • replay succeeded and matched
    • or replay failed and why (e.g., missing evidence, policy changed)
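
For illustration, a replay report might look like the following (a Python dict mirroring a JSON document; field names are hypothetical, not a mandated schema):

# Hypothetical replay report shape; field names are illustrative only.
replay_report = {
    "original_verdict_digest": "sha256:…",
    "replayed_inputs": {
        "policy_digest": "sha256:…",
        "knowledge_snapshot_digest": "sha256:…",
        "sbom_digest": "sha256:…",
    },
    "result": "match",        # "match" | "mismatch" | "failed"
    "failure_reasons": [],    # e.g. ["missing evidence bundle", "policy digest differs"]
}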

2.3.3 UX requirements

UI/UX SHALL:

  • Show verdict status clearly (Pass/Fail/…)

  • Display:

    • policy clause(s) responsible
    • top evidence pointers
    • knowledge snapshot ID
    • signature trust status (who signed, chain validity)
  • Provide “Replay” as an action (even if replay happens offline, the UX must guide it)


2.4 Product taxonomy: separate “verdicts” from “evaluations” from “attestations”

This is where many products get confused. Your terminology must remain strict:

  • Evaluation: internal computation that produces decision + findings.
  • Verdict: the stable, canonical decision payload (the thing being signed).
  • Attestation: the signed envelope binding the verdict to cryptographic identity.

PMs must enforce this vocabulary in PRDs, UI labels, and docs.
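
A minimal sketch of how the three terms might map onto distinct types (the fields shown are illustrative, not the Stella Ops data model):

# Illustrative type separation: the evaluation is internal, the verdict is the
# canonical payload that gets signed, and the attestation is the signed
# envelope around it.
from dataclasses import dataclass

@dataclass(frozen=True)
class Evaluation:            # internal computation: decision plus findings
    decision: str
    findings: tuple = ()

@dataclass(frozen=True)
class Verdict:               # stable, canonical decision payload
    subject_digest: str
    decision: str
    policy_digest: str
    knowledge_snapshot_digest: str

@dataclass(frozen=True)
class Attestation:           # signed envelope binding the verdict to identity
    payload: bytes           # canonical serialization of a Verdict
    payload_type: str
    signatures: tuple = ()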


2.5 Policy model guidelines for verdicting

Verdicting depends on policy discipline.

PM rules:

  • Policy must be versioned and content-addressed.

  • Policies must be pure functions of declared inputs:

    • SBOM graph
    • VEX claims
    • vulnerability data
    • reachability evidence (if present)
    • environment assertions (if present)
  • Policies must produce:

    • a decision
    • plus a minimal explanation graph (policy rule ID → evidence IDs)

Avoid “freeform scripts” early. You need determinism and auditability.
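
The sketch below shows what “policy as a pure function” can look like: declared inputs in, a decision plus an explanation graph out, and nothing else. The rule ID matches the illustrative payload in section 3.3; input shapes and the blocking condition are assumptions:

# Sketch only: a policy as a pure function of declared inputs, returning a
# decision plus a minimal explanation graph (rule ID -> evidence refs).
def evaluate_policy(sbom_graph: dict, vex_claims: list, vulns: list,
                    reachability: dict) -> dict:
    # sbom_graph would scope findings to components; elided in this sketch.
    reasons = []
    for vuln in vulns:
        # Block only if critical, reachable, and not covered by a VEX
        # "not_affected" claim.
        vexed = any(c["vuln_id"] == vuln["id"] and c["status"] == "not_affected"
                    for c in vex_claims)
        reachable = reachability.get(vuln["id"]) == "reachable"
        if vuln["severity"] == "critical" and reachable and not vexed:
            reasons.append({"rule_id": "RISK.CRITICAL.REACHABLE",
                            "evidence_refs": vuln["evidence_refs"]})
    return {"status": "fail" if reasons else "pass", "reasons": reasons}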


2.6 Exceptions are part of the verdict product, not an afterthought

PM requirement:

  • Exceptions must be first-class objects with:

    • scope (exact artifact/component range)
    • owner
    • justification
    • expiry
    • required evidence (optional but strongly recommended)

And verdict logic must:

  • record that an exception was applied
  • include exception IDs in the verdict evidence graph
  • make exception usage visible in UI and audit pack exports
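
As a sketch, an exception object and the check that records its use might look like this (field names and the component scope format are assumptions):

# Illustrative sketch: an exception as a first-class object plus the check that
# records its use in the verdict's evidence graph.
from datetime import date

exception = {
    "id": "EXC-0042",                              # hypothetical identifier
    "scope": {"subject_digest": "sha256:…", "component": "pkg:deb/debian/openssl"},
    "owner": "appsec-team",
    "justification": "Mitigated by network policy; fix scheduled",
    "expiry": "2026-03-01",
    "evidence_refs": ["sha256:…"],
}

def apply_exception(exc: dict, today: date, applied_exception_ids: list) -> bool:
    # Expired exceptions must never silently suppress a finding.
    if date.fromisoformat(exc["expiry"]) < today:
        return False
    applied_exception_ids.append(exc["id"])  # surfaces in the verdict and audit pack
    return True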

2.7 Success metrics (PM-owned)

Choose metrics that reflect the moat:

  • Replay success rate: % of verdicts that can be replayed after N days.
  • Policy determinism incidents: number of non-deterministic evaluation bugs.
  • Audit cycle time: time to satisfy an audit evidence request for a release.
  • Noise: # of manual suppressions/overrides per 100 releases (should drop).
  • Gate adoption: % of releases gated by verdict attestations (not reports).

3) Development Management Guidelines

3.1 Architecture principles (engineering tenets)

Tenet A: Determinism-first evaluation

Engineering SHALL ensure evaluation is deterministic across:

  • OS and architecture differences (as much as feasible)
  • concurrency scheduling
  • non-ordered data structures

Practical rules:

  • Never iterate over maps/hashes without sorting keys.
  • Canonicalize output ordering (findings sorted by stable tuple: (component_id, cve_id, path, rule_id)).
  • Keep “generated at” timestamps out of the signed payload; if needed, place them in an unsigned wrapper or separate metadata field excluded from signature.
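
A minimal sketch of the ordering rule, assuming findings are plain dictionaries keyed by the tuple fields above:

# Findings are always emitted in a canonical order so map/set iteration order
# never leaks into the payload.
def canonical_findings(findings: list) -> list:
    return sorted(
        findings,
        key=lambda f: (f["component_id"], f["cve_id"], f["path"], f["rule_id"]),
    )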

Tenet B: Content-address everything

All significant inputs/outputs should have content digests:

  • SBOM digest
  • policy digest
  • knowledge snapshot digest
  • evidence bundle digest
  • verdict digest

This makes replay and integrity checks possible.

Tenet C: No hidden network

During evaluation, the engine must not fetch “latest” anything. Network is allowed only in:

  • snapshot acquisition phase
  • artifact retrieval phase
  • attestation publication phase
  …and each phase must be explicitly logged and pinned.

3.2 Canonical verdict schema and serialization rules

Engineering guideline: pick a canonical serialization and stick to it.

Options:

  • Canonical JSON (JCS or equivalent)
  • CBOR with deterministic encoding

Rules:

  • Define a schema version and strict validation.
  • Make field names stable; avoid “optional” fields that appear/disappear nondeterministically.
  • Ensure numeric formatting is stable (no float drift; prefer integers or rational representation).
  • Always include empty arrays if required for stability, or exclude consistently by schema rule.
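
A sketch of a canonical serializer under the simpler rules above (sorted keys, fixed separators, UTF-8, no floats). A full JCS (RFC 8785) canonicalizer adds more rules; this is only a stand-in:

# Sketch of canonical JSON serialization and digesting for the verdict payload.
import hashlib
import json

def assert_no_floats(obj) -> None:
    # Floats drift across platforms; the schema should use integers or strings.
    if isinstance(obj, float):
        raise ValueError("floats are not allowed in the signed payload")
    if isinstance(obj, dict):
        for value in obj.values():
            assert_no_floats(value)
    elif isinstance(obj, (list, tuple)):
        for value in obj:
            assert_no_floats(value)

def canonical_bytes(payload: dict) -> bytes:
    assert_no_floats(payload)
    return json.dumps(payload, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")

def verdict_digest(payload: dict) -> str:
    return "sha256:" + hashlib.sha256(canonical_bytes(payload)).hexdigest()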

3.3 Suggested verdict payload (illustrative)

This is not a mandate—use it as a baseline structure.

{
  "schema_version": "1.0",
  "subject": {
    "type": "oci-image",
    "name": "registry.example.com/app/service",
    "digest": "sha256:…",
    "platform": "linux/amd64"
  },
  "evaluation": {
    "evaluator": "stella-eval",
    "evaluator_version": "0.9.0",
    "policy": {
      "id": "prod-default",
      "version": "2025.12.1",
      "digest": "sha256:…"
    },
    "knowledge_snapshot": {
      "vuln_db_digest": "sha256:…",
      "advisory_digest": "sha256:…",
      "vex_set_digest": "sha256:…"
    }
  },
  "decision": {
    "status": "fail",
    "score": 87,
    "reasons": [
      { "rule_id": "RISK.CRITICAL.REACHABLE", "evidence_ref": "sha256:…" }
    ],
    "unknowns": {
      "unknown_reachable": 2,
      "unknown_unreachable": 0
    }
  },
  "evidence": {
    "sbom_digest": "sha256:…",
    "finding_bundle_digest": "sha256:…",
    "inputs_manifest_digest": "sha256:…"
  }
}

Then wrap this payload in your chosen attestation envelope and sign it.
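
For orientation, a DSSE-style wrapping might look like the sketch below. The pre-authentication encoding (PAE) follows the published DSSE v1 scheme; the sign callable stands in for whatever KMS, keyless, or offline signer the deployment uses:

# Sketch of a DSSE-style envelope around the canonical verdict bytes.
import base64

def pae(payload_type: str, payload: bytes) -> bytes:
    type_bytes = payload_type.encode("utf-8")
    return b" ".join([b"DSSEv1", str(len(type_bytes)).encode(), type_bytes,
                      str(len(payload)).encode(), payload])

def dsse_envelope(payload_type: str, payload: bytes, keyid: str, sign) -> dict:
    signature = sign(pae(payload_type, payload))    # sign: bytes -> bytes
    return {
        "payloadType": payload_type,
        "payload": base64.b64encode(payload).decode("ascii"),
        "signatures": [{"keyid": keyid,
                        "sig": base64.b64encode(signature).decode("ascii")}],
    }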


3.4 Attestation format and storage guidelines

Development managers must enforce a consistent publishing model:

  1. Envelope

    • Prefer DSSE/in-toto style envelope because it:

      • standardizes signing
      • supports multiple signature schemes
      • is widely adopted in supply chain ecosystems
  2. Attachment

    • OCI artifacts should carry verdicts as referrers/attachments to the subject digest (preferred).
    • For non-OCI targets, store in an internal ledger keyed by the subject digest/ID.
  3. Verification

    • Provide:

      • stella verify <artifact> → checks signature and integrity references
      • stella replay <verdict> → re-run evaluation from snapshots and compare
  4. Transparency / logs

    • Optional in v1, but plan for:

      • transparency log (public or private) to strengthen auditability
      • offline alternatives for air-gapped customers

3.5 Knowledge snapshot engineering requirements

A “snapshot” must be an immutable bundle, ideally content-addressed:

Snapshot includes:

  • vulnerability database at a specific point
  • advisory sources (OS distro advisories)
  • VEX statement set(s)
  • any enrichment signals that influence scoring

Rules:

  • Snapshot resolution must be explicit: “use snapshot digest X”
  • Must support export/import for air-gapped deployments
  • Must record source provenance and ingestion timestamps (timestamps may be excluded from signed payload if they cause nondeterminism; store them in snapshot metadata)
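
An illustrative snapshot manifest, with mutable ingestion metadata kept outside the content-addressed portion (field names are assumptions):

# Illustrative snapshot manifest; only the content bindings are digested, so
# ingestion metadata cannot break determinism.
import hashlib
import json

snapshot_manifest = {
    "snapshot_id": "ks-2025-12-21",                # hypothetical identifier
    "vuln_db_digest": "sha256:…",
    "advisory_digests": {"alpine": "sha256:…", "debian": "sha256:…"},
    "vex_set_digest": "sha256:…",
    "metadata": {"ingested_at": "2025-12-21T00:00:00Z", "sources": ["…"]},
}

def snapshot_digest(manifest: dict) -> str:
    # Digest only the content bindings, never the metadata block.
    pinned = {k: v for k, v in manifest.items() if k != "metadata"}
    canonical = json.dumps(pinned, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()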

3.6 Replay engine requirements

Replay is not “re-run scan and hope it matches.”

Replay must:

  • retrieve the exact subject (or confirm it via digest)

  • retrieve the exact SBOM (or deterministically re-generate it from the subject in a defined way)

  • load exact policy bundle by digest

  • load exact knowledge snapshot by digest

  • run evaluator version pinned in verdict (or enforce a compatibility mapping)

  • produce:

    • verdict-equivalence result
    • a delta explanation if mismatch occurs

Engineering rule: replay must fail loudly and specifically when inputs are missing.
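
A sketch of the replay comparison step, using the field layout of the illustrative payload in section 3.3. load_by_digest and evaluate are placeholders for content-addressed stores and the pinned evaluator, not Stella Ops APIs:

# Replay loads every input by digest, fails loudly on anything missing, then
# compares the replayed decision against the original verdict.
def replay(verdict: dict, load_by_digest, evaluate) -> dict:
    required = {
        "policy": verdict["evaluation"]["policy"]["digest"],
        "vuln_db": verdict["evaluation"]["knowledge_snapshot"]["vuln_db_digest"],
        "vex_set": verdict["evaluation"]["knowledge_snapshot"]["vex_set_digest"],
        "sbom": verdict["evidence"]["sbom_digest"],
    }
    inputs, missing = {}, []
    for name, digest in sorted(required.items()):
        blob = load_by_digest(digest)
        if blob is None:
            missing.append(f"{name} ({digest})")   # fail loudly and specifically
        else:
            inputs[name] = blob
    if missing:
        return {"result": "failed", "missing_inputs": missing}

    replayed = evaluate(inputs)                    # evaluator version pinned in verdict
    matched = replayed["status"] == verdict["decision"]["status"]
    return {"result": "match" if matched else "mismatch",
            "original_status": verdict["decision"]["status"],
            "replayed_status": replayed["status"]}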


3.7 Testing strategy (required)

Deterministic systems require “golden” testing.

Minimum tests:

  1. Golden verdict tests

    • Fixed artifact + fixed snapshots + fixed policy
    • Expected verdict output must match exactly
  2. Cross-platform determinism tests

    • Run same evaluation on different machines/containers and compare outputs
  3. Mutation tests for determinism

    • Randomize ordering of internal collections; output should remain unchanged
  4. Replay regression tests

    • Store verdict + snapshots and replay after code changes to ensure compatibility guarantees hold
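
As one concrete example, a mutation-style determinism test (item 3) can reuse the canonical_findings and canonical_bytes sketches from sections 3.1 and 3.2; the findings below are fabricated fixtures for illustration only:

# Randomizing internal collection order must not change the canonical output.
import random

def test_ordering_mutation_does_not_change_payload():
    findings = [
        {"component_id": "pkg-a", "cve_id": "CVE-2025-0001",
         "path": "/usr/lib/a", "rule_id": "R1"},
        {"component_id": "pkg-b", "cve_id": "CVE-2025-0002",
         "path": "/usr/lib/b", "rule_id": "R2"},
    ]
    baseline = canonical_bytes({"findings": canonical_findings(findings)})
    for seed in range(20):
        shuffled = findings[:]
        random.Random(seed).shuffle(shuffled)   # randomize internal ordering
        assert canonical_bytes({"findings": canonical_findings(shuffled)}) == baseline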

3.8 Versioning and backward compatibility guidelines

This is essential to prevent “replay breaks after upgrades.”

Rules:

  • Verdict schema version changes must be rare and carefully managed.

  • Maintain a compatibility matrix:

    • evaluator vX can replay verdict schema vY
  • If you must evolve logic, do so by:

    • bumping evaluator version
    • preserving older evaluators in a compatibility mode (containerized evaluators are often easiest)
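
A sketch of how the compatibility matrix might be enforced (version numbers are illustrative):

# Hand-maintained matrix: verdict schema version -> evaluator versions that can replay it.
REPLAY_COMPATIBILITY = {
    "1.0": {"0.9.0", "0.9.1", "1.0.0"},
}

def can_replay(schema_version: str, evaluator_version: str) -> bool:
    return evaluator_version in REPLAY_COMPATIBILITY.get(schema_version, set())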

3.9 Security and key management guidelines

Development managers must ensure:

  • Signing keys are managed via:

    • KMS/HSM (enterprise)
    • keyless (OIDC-based) where acceptable
    • offline keys for air-gapped
  • Verification trust policy is explicit:

    • which identities are trusted to sign verdicts
    • which policies are accepted
    • whether transparency is required
    • how to handle revocation/rotation
  • Separate “can sign” from “can publish”

    • Signing should be restricted; publishing may be broader.

4) Operational workflow requirements (cross-functional)

4.1 CI gate flow

  • Build artifact

  • Produce SBOM deterministically (or record SBOM digest if generated elsewhere)

  • Evaluate → produce verdict payload

  • Sign verdict → publish attestation attached to artifact

  • Gate decision uses verification of:

    • signature validity
    • policy compliance
    • snapshot integrity
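
A sketch of the final gate decision, assuming the DSSE-style envelope from section 3.3. The verifier callables are placeholders, and allowing a Warn verdict through the gate is an illustrative policy choice, not a recommendation:

# The gate checks signature validity, policy compliance, and snapshot integrity.
import base64
import json

def decode_payload(attestation: dict) -> dict:
    # The envelope payload is the canonical verdict bytes, base64-encoded.
    return json.loads(base64.b64decode(attestation["payload"]))

def gate(attestation: dict, verify_signature, policy_is_trusted,
         snapshot_is_intact) -> bool:
    if not verify_signature(attestation):          # signature validity
        return False
    verdict = decode_payload(attestation)
    return (
        verdict["decision"]["status"] in {"pass", "warn"}
        and policy_is_trusted(verdict["evaluation"]["policy"]["digest"])
        and snapshot_is_intact(verdict["evaluation"]["knowledge_snapshot"])
    )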

4.2 Registry / admission flow

  • Admission controller checks for a valid, trusted verdict attestation

  • Optionally requires:

    • verdict not older than X snapshot age (this is policy)
    • no expired exceptions
    • replay not required (replay is for audits; admission is fast-path)

4.3 Audit flow

  • Export “audit pack”:

    • verdict + signature chain
    • policy bundle
    • knowledge snapshot
    • referenced evidence bundles
  • Auditor (or internal team) runs verify and optionally replay


5) Common failure modes to avoid

  1. Signing “findings” instead of a decision

    • Leads to unbounded payload growth and weak governance semantics.
  2. Using “latest” feeds during evaluation

    • Breaks replayability immediately.
  3. Embedding timestamps in signed payload

    • Eliminates deterministic byte-level reproducibility.
  4. Letting the UI become the source of truth

    • The verdict artifact must be the authority; UI is a view.
  5. No clear separation between: evidence store, snapshot store, verdict store

    • Creates coupling and makes offline operations painful.

6) Definition of Done checklist (use this to gate release)

A feature increment for signed, replayable verdicts is “done” only if:

  • Verdict binds to immutable subject digest
  • Verdict includes policy digest/version and knowledge snapshot digests
  • Verdict is signed and verifiable via CLI
  • Verification works offline (given exported artifacts)
  • Replay works with stored snapshots and produces match/mismatch output with reasons
  • Determinism tests pass (golden + mutation + cross-platform)
  • UI displays signer identity, policy, snapshot IDs, and rule→evidence links
  • Exceptions (if implemented) are recorded in verdict and enforced deterministically

Suggested implementation sequence:

  1. Canonical verdict schema + deterministic evaluator skeleton
  2. Signing + verification CLI
  3. Snapshot bundle format + pinned evaluation
  4. Replay tool + golden tests
  5. OCI attachment publishing + registry/admission integration
  6. Evidence bundles + UI explainability
  7. Exceptions + audit pack export

This material can be expanded into a formal internal PRD template, organized as:

  • “Product requirements” (MUST/SHOULD/COULD)
  • “Engineering requirements” (interfaces + invariants + test plan)
  • “Security model” (trust roots, signing identities, verification policy)
  • “Acceptance criteria” for an MVP and for GA