
Below are implementation-grade guidelines for Stella Ops Product Managers (PMs) and Development Managers (Eng Managers / Tech Leads) for two tightly coupled capabilities:

  1. Exception management as auditable objects (not suppression files)
  2. Audit packs (exportable, verifiable evidence bundles for releases and environments)

The intent is to make these capabilities:

  • operationally useful (reduce friction in CI/CD and runtime governance),
  • defensible in audits (tamper-evident, attributable, time-bounded), and
  • consistent with Stella Ops positioning around determinism, evidence, and replayability.

1. Shared objectives and boundaries

1.1 Objectives

These two capabilities must jointly enable:

  • Risk decisions are explicit: Every “ignore/suppress/waive” is a governed decision with an owner and expiry.
  • Decisions are replayable: If an auditor asks “why did you ship this on date X?”, Stella Ops can reproduce the decision using the same policy + evidence + knowledge snapshot.
  • Decisions are exportable and verifiable: Audit packs include the minimum necessary artifacts and a manifest that allows independent verification of integrity and completeness.
  • Operational friction is reduced: Teams can ship safely with controlled exceptions, rather than ad-hoc suppressions, while retaining accountability.

1.2 Out of scope (explicitly)

Avoid scope creep early. The following are out of scope for v1 unless mandated by a target customer:

  • Full GRC mapping to specific frameworks (you can support evidence; don’t claim compliance).
  • Fully automated approvals based on HR org charts.
  • Multi-year archival systems (start with retention, export, and immutable event logs).
  • A “ticketing system replacement.” Integrate with ticketing; don’t rebuild it.

2. Shared design principles (non-negotiables)

These principles apply to both Exception Objects and Audit Packs:

  1. Attribution: every action has an authenticated actor identity (human or service), a timestamp, and a reason.

  2. Immutability of history: edits are new versions/events; never rewrite history in place.

  3. Least privilege scope: exceptions must be as narrow as possible (artifact digest over tag; component purl over “any”; environment constraints).

  4. Time-bounded risk: exceptions must expire. “Permanent ignore” is a governance smell.

  5. Deterministic evaluation: given the same policy + snapshot + exceptions + inputs, the outcome is stable and reproducible.

  6. Separation of concerns:

    • Exception store = governed decisions.
    • Scanner = evidence producer.
    • Policy engine = deterministic evaluator.
    • Audit packer = exporter/assembler/verifier.

3. Exception management as auditable objects

3.1 What an “Exception Object” is

An Exception Object is a structured, versioned record that modifies evaluation behavior in a controlled manner, while leaving the underlying findings intact.

It is not:

  • a local .ignore file,
  • a hidden suppression rule,
  • a UI-only toggle,
  • a vendor-specific “ignore list” with no audit trail.

Exception types you should support (minimum set)

PMs should start with these canonical types:

  1. Vulnerability exception

    • suppress/waive a specific vulnerability finding (e.g., CVE/CWE) under defined scope.
  2. Policy exception

    • allow a policy rule to be bypassed under defined scope (e.g., “allow unsigned artifact for dev namespace”).
  3. Unknown-state exception (if Stella models unknowns)

    • allow a release despite unresolved unknowns, with explicit risk acceptance.
  4. Component exception

    • allow/deny a component/package/version across a domain, again with explicit scope and expiry.

3.2 Required fields and schema guidelines

PMs: mandate these fields; Eng: enforce them at API and storage level.

Required fields (v1)

  • exception_id (stable identifier)
  • version (monotonic; or event-sourced)
  • status: proposed | approved | active | expired | revoked
  • owner (accountable person/team)
  • requester (who initiated)
  • approver(s) (who approved; may be empty for dev environments depending on policy)
  • created_at / updated_at / approved_at / expires_at
  • scope (see below)
  • reason_code (taxonomy)
  • rationale (free text, required)
  • evidence_refs (optional in v1 but strongly recommended)
  • risk_acceptance (explicit boolean or structured “risk accepted” block)
  • links: ticket ID, PR, incident, vendor advisory reference (optional but useful)
  • audit_log_refs (implicit if event-sourced)
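
A minimal sketch of this schema, assuming Python dataclasses and the field names listed above; the exact types and storage model (event-sourced vs. versioned rows) are implementation decisions:

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class ExceptionStatus(str, Enum):
    PROPOSED = "proposed"
    APPROVED = "approved"
    ACTIVE = "active"
    EXPIRED = "expired"
    REVOKED = "revoked"


@dataclass(frozen=True)
class ExceptionObject:
    """One immutable version of an exception; edits always produce a new version."""
    exception_id: str                    # stable identifier
    version: int                         # monotonic per exception_id
    status: ExceptionStatus
    owner: str                           # accountable person/team
    requester: str                       # who initiated
    approvers: tuple[str, ...]           # may be empty for dev scopes, per policy
    created_at: datetime
    updated_at: datetime
    approved_at: Optional[datetime]
    expires_at: datetime                 # required: risk must be time-bounded
    scope: dict                          # structured scope (see the scope model below)
    reason_code: str                     # from the taxonomy in 3.3
    rationale: str                       # free text, required
    evidence_refs: tuple[str, ...] = ()  # digests/URIs of supporting evidence
    risk_acceptance: bool = False
    links: tuple[str, ...] = ()          # ticket/PR/incident/advisory references
```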

Scope model (critical to defensibility)

Scope must be structured and narrowable. Provide scope dimensions such as:

  • Artifact scope: image digest, SBOM digest, or build provenance digest (preferred). Avoid tags as the primary scope unless paired with immutability constraints.
  • Component scope: purl + version range + ecosystem
  • Vulnerability scope: CVE ID(s), GHSA, internal ID; optionally path/function/symbol constraints
  • Environment scope: cluster/namespace, runtime env (dev/stage/prod), repository, project, tenant
  • Time scope: expires_at (required), optional “valid_from”

PM guideline: default UI and API should encourage digest-based scope and warn on broad scopes.
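
A sketch of how that guideline can be enforced at create time, assuming a hypothetical Scope structure with the dimensions above; the warning texts are illustrative:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Scope:
    artifact_digest: Optional[str] = None        # preferred: immutable identity
    artifact_tag: Optional[str] = None           # discouraged as primary scope
    component_purl: Optional[str] = None
    component_version_range: Optional[str] = None
    vulnerability_ids: tuple[str, ...] = ()
    environments: tuple[str, ...] = ()           # e.g. ("dev",) or ("prod",)


def scope_warnings(scope: Scope) -> list[str]:
    """Reviewer-facing warnings for scopes that are broader than necessary."""
    warnings = []
    if scope.artifact_digest is None and scope.artifact_tag is not None:
        warnings.append("Scope uses a mutable tag; prefer an artifact digest.")
    if scope.artifact_digest is None and scope.artifact_tag is None:
        warnings.append("No artifact constraint; exception applies to every artifact.")
    if not scope.environments:
        warnings.append("No environment constraint; exception applies everywhere, including prod.")
    if not scope.vulnerability_ids and scope.component_purl is None:
        warnings.append("Neither a vulnerability nor a component is constrained.")
    return warnings
```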

3.3 Reason codes (taxonomy)

Reason codes are a moat because they enable governance analytics and policy automation.

Minimum suggested taxonomy:

  • FALSE_POSITIVE (with evidence expectations)
  • NOT_REACHABLE (reachable proof preferred)
  • NOT_AFFECTED (VEX-backed preferred)
  • BACKPORT_FIXED (package/distro evidence preferred)
  • COMPENSATING_CONTROL (link to control evidence)
  • RISK_ACCEPTED (explicit sign-off)
  • TEMPORARY_WORKAROUND (link to mitigation plan)
  • VENDOR_PENDING (under investigation)
  • BUSINESS_EXCEPTION (rare; requires stronger approval)

PM guideline: reason codes must be selectable and reportable; do not allow “Other” as the default.
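
Because reason codes feed governance analytics and policy automation, it helps to pair each code with the evidence the approval policy should expect. A sketch of that mapping, taken from the taxonomy above (the evidence descriptions are illustrative):

```python
# Evidence the approval policy can require per reason code; exceptions whose
# evidence_refs do not include the expected kind can be held for extra review.
REASON_CODE_EVIDENCE = {
    "FALSE_POSITIVE": "analysis notes showing why the finding does not apply",
    "NOT_REACHABLE": "reachability proof fragment (call-path subgraph)",
    "NOT_AFFECTED": "VEX statement (OpenVEX / CycloneDX VEX)",
    "BACKPORT_FIXED": "distro/package advisory or patch reference",
    "COMPENSATING_CONTROL": "link to control evidence",
    "RISK_ACCEPTED": "explicit sign-off record",
    "TEMPORARY_WORKAROUND": "link to the mitigation plan",
    "VENDOR_PENDING": "vendor case or advisory reference",
    "BUSINESS_EXCEPTION": "stronger approval record",
}
```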

3.4 Evidence attachments

Exceptions should evolve from “justification-only” to “justification + evidence.”

Evidence references can point to:

  • VEX statements (OpenVEX/CycloneDX VEX)
  • reachability proof fragments (call-path subgraph, symbol references)
  • distro advisories / patch references
  • internal change tickets / mitigation PRs
  • runtime mitigations

Eng guideline: store evidence as references with integrity checks (hash/digest). For v2+, store evidence bundles as content-addressed blobs.
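
A sketch of integrity-checked evidence references, assuming evidence blobs are fetched by the caller and hashed with SHA-256; the kind labels are illustrative:

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceRef:
    kind: str      # e.g. "vex", "reachability", "advisory", "ticket"
    uri: str       # where the evidence lives
    sha256: str    # digest recorded when the reference was attached


def verify_evidence(ref: EvidenceRef, blob: bytes) -> bool:
    """Check that fetched evidence still matches the recorded digest."""
    return hashlib.sha256(blob).hexdigest() == ref.sha256


def content_address(blob: bytes) -> str:
    """For v2+: key content-addressed evidence blobs by their digest."""
    return "sha256:" + hashlib.sha256(blob).hexdigest()
```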

3.5 Lifecycle and workflows

Lifecycle states and transitions

  • Proposed → Approved → Active → (Expired or Revoked)
  • Renewal should create a new version (never extend an old record silently).

Approvals

PM guideline:

  • At least two approval modes:

    1. Self-approved (allowed only for dev/experimental scopes)
    2. Two-person review (required for prod or broad scope)

Eng guideline:

  • Enforce approval rules via policy config (not hard-coded).
  • Record every approval action with actor identity and timestamp.

Expiry enforcement

Non-negotiable:

  • Expired exceptions must stop applying automatically.
  • Renewals require an explicit action and new audit trail.
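
A sketch of the lifecycle as an explicit transition table, plus automatic expiry at evaluation time. The table follows the states in 3.5; which additional side-transitions to allow (for example withdrawing a proposal) is a policy decision:

```python
from datetime import datetime, timezone
from typing import Optional

# Allowed transitions; anything else is rejected and logged.
ALLOWED_TRANSITIONS = {
    "proposed": {"approved"},
    "approved": {"active"},
    "active": {"expired", "revoked"},
    "expired": set(),   # renewal creates a new version; it never reopens this one
    "revoked": set(),
}


def can_transition(current: str, target: str) -> bool:
    return target in ALLOWED_TRANSITIONS.get(current, set())


def effective_status(status: str, expires_at: datetime,
                     now: Optional[datetime] = None) -> str:
    """Expired exceptions stop applying automatically, even before any
    background job updates the stored status field."""
    now = now or datetime.now(timezone.utc)
    if status == "active" and now >= expires_at:
        return "expired"
    return status
```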

3.6 Evaluation semantics (how exceptions affect results)

This is where most products become non-auditable. You need deterministic, explicit rules.

PM guideline: define precedence clearly:

  • Policy engine evaluates baseline findings → applies exceptions → produces verdict.
  • Exceptions never delete underlying findings; they alter the decision outcome and annotate the reasoning.

Eng guideline: exception application must be:

  • Deterministic (stable ordering rules)
  • Transparent (verdict includes “exception applied: exception_id, reason_code, scope match explanation”)
  • Scoped (match explanation must state which scope dimensions matched)
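
A sketch of these rules under simplified assumptions: findings and exceptions are plain dicts, a scope dimension matches when it is set and equals the corresponding finding attribute, and a hypothetical blocking flag marks findings that would fail the verdict. Findings are never deleted; the verdict only annotates them:

```python
def matched_dimensions(finding: dict, scope: dict) -> list[str]:
    """Return the scope dimensions that matched, or [] if any set dimension fails."""
    dims = []
    for dim in ("artifact_digest", "component_purl", "vulnerability_id", "environment"):
        if scope.get(dim) is not None:
            if scope[dim] != finding.get(dim):
                return []                 # a constrained dimension does not match
            dims.append(dim)
    return dims


def apply_exceptions(findings: list[dict], exceptions: list[dict]) -> dict:
    """Deterministic, transparent exception application."""
    # Stable ordering: sort by (exception_id, version) so the outcome never
    # depends on retrieval order; the first match in that order wins.
    ordered = sorted(exceptions, key=lambda e: (e["exception_id"], e["version"]))

    annotated = []
    for finding in findings:
        applied = None
        for exc in ordered:
            dims = matched_dimensions(finding, exc["scope"])
            if dims:
                applied = {
                    "exception_id": exc["exception_id"],
                    "reason_code": exc["reason_code"],
                    "matched_scope_dimensions": dims,
                }
                break
        # The finding itself stays intact; only the decision is annotated.
        annotated.append({**finding, "exception_applied": applied})

    blocking = [f for f in annotated
                if f["exception_applied"] is None and f.get("blocking")]
    return {"verdict": "fail" if blocking else "pass", "findings": annotated}
```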

3.7 Auditability requirements

Exception management must be audit-ready by construction.

Minimum requirements:

  • Append-only event log for create/approve/revoke/expire/renew actions
  • Versioning: every change results in a new version or event
  • Tamper-evidence: hash chain events or sign event batches
  • Retention: define retention policy and export strategy
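
A sketch of tamper-evidence via a hash chain over exception events, assuming events are serialized as canonical JSON; signing event batches would layer on top of the same chain:

```python
import hashlib
import json


def chain_events(events: list[dict]) -> list[dict]:
    """Append events to a hash chain: each hash covers the event's canonical
    JSON plus the previous hash, so any in-place edit breaks every later link."""
    prev_hash = "0" * 64
    chained = []
    for event in events:
        body = json.dumps(event, sort_keys=True, separators=(",", ":"))
        digest = hashlib.sha256((prev_hash + body).encode()).hexdigest()
        chained.append({**event, "prev_hash": prev_hash, "event_hash": digest})
        prev_hash = digest
    return chained


def verify_chain(chained: list[dict]) -> bool:
    """Recompute the chain and fail on any altered, inserted, or removed event."""
    prev_hash = "0" * 64
    for event in chained:
        body = {k: v for k, v in event.items() if k not in ("prev_hash", "event_hash")}
        serialized = json.dumps(body, sort_keys=True, separators=(",", ":"))
        expected = hashlib.sha256((prev_hash + serialized).encode()).hexdigest()
        if event["prev_hash"] != prev_hash or event["event_hash"] != expected:
            return False
        prev_hash = expected
    return True
```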

PM guideline: auditors will ask “who approved,” “why,” “when,” “what scope,” and “what changed since.” Design the UX and exports to answer those in minutes.

3.8 UX guidelines

Key UX flows:

  • Create exception from a finding (pre-fill CVE/component/artifact scope)
  • Preview impact (“this will suppress 37 findings across 12 images; are you sure?”)
  • Expiry visibility (countdown, alerts, renewal prompts)
  • Audit trail view (who did what, with diffs between versions)
  • Search and filters by owner, reason, expiry window, scope breadth, environment

UX anti-patterns to forbid:

  • “Ignore all vulnerabilities in this image” with one click
  • Silent suppressions without owner/expiry
  • Exceptions created without linking to scope and reason

3.9 Product acceptance criteria (PM-owned)

A feature is not “done” until:

  • Every exception has owner, expiry, reason code, scope.
  • Exception history is immutable and exportable.
  • Policy outcomes show applied exceptions and why.
  • Expiry is enforced automatically.
  • A user can answer: “What exceptions were active for this release?” within 2 minutes.

4. Audit packs

4.1 What an audit pack is

An Audit Pack is a portable, verifiable bundle that answers:

  • What was evaluated? (artifacts, versions, identities)
  • Under what policies? (policy version/config)
  • Using what knowledge state? (vuln DB snapshot, VEX inputs)
  • What exceptions were applied? (IDs, owners, rationales)
  • What was the decision and why? (verdict + evidence pointers)
  • What changed since the last release? (optional diff summary)

PM guideline: treat the Audit Pack as a product deliverable, not an export button.

4.2 Pack structure and layout

Use a predictable, documented layout. Example (a manifest generation sketch follows the layout):

  • manifest.json

    • pack_id, generated_at, generator_version
    • hashes/digests of every included file
    • signing info (optional in v1; recommended soon)
  • inputs/

    • artifact identifiers (digests), repo references (optional)
    • SBOM(s) (CycloneDX/SPDX)
  • vex/

    • VEX docs used + any VEX produced
  • policy/

    • policy bundle used (versioned)
    • evaluation settings
  • exceptions/

    • all exceptions relevant to the evaluated scope
    • plus event logs / versions
  • findings/

    • normalized findings list
    • reachability evidence fragments if applicable
  • verdict/

    • final decision object
    • explanation summary
    • signed attestation (if supported)
  • diff/ (optional)

    • delta from prior baseline (what changed materially)
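
A minimal sketch of manifest generation for this layout, assuming the pack is assembled in a local directory and every file is digested with SHA-256; pack_id derivation and signing are out of scope here (see 4.4):

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def build_manifest(pack_dir: Path, pack_id: str, generator_version: str) -> dict:
    """Digest every file in the pack (except manifest.json itself), keyed by a
    sorted relative path so the output is deterministic for identical inputs."""
    files = {}
    for path in sorted(pack_dir.rglob("*")):
        if path.is_file() and path.name != "manifest.json":
            rel = path.relative_to(pack_dir).as_posix()
            files[rel] = "sha256:" + hashlib.sha256(path.read_bytes()).hexdigest()
    return {
        "pack_id": pack_id,
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "generator_version": generator_version,
        "files": files,
        # signing info is optional in v1; recommended soon
    }


def write_manifest(pack_dir: Path, manifest: dict) -> None:
    (pack_dir / "manifest.json").write_text(
        json.dumps(manifest, sort_keys=True, indent=2) + "\n"
    )
```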

4.3 Formats: human and machine

You need both:

  • Machine-readable (JSON + standard SBOM/VEX formats) for verification and automation
  • Human-readable summary (HTML or PDF) for auditors and leadership

PM guideline: machine artifacts are the source of truth. Human docs are derived views.

Eng guideline:

  • Ensure the pack can be generated offline.
  • Ensure deterministic outputs where feasible (stable ordering, consistent serialization).

4.4 Integrity and verification

At minimum:

  • manifest.json includes a digest for each file.

  • Provide a stella verify-pack CLI that checks:

    • manifest integrity
    • file hashes
    • schema versions
    • optional signature verification

For v2:

  • Sign the manifest (and/or the verdict) using your standard attestation mechanism.
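
A sketch of the core integrity check behind a verify-pack command: recompute every file digest and compare against the manifest, treating missing and unlisted files as failures. Schema-version and signature checks would layer on top:

```python
import hashlib
import json
from pathlib import Path


def verify_pack(pack_dir: Path) -> list[str]:
    """Return a list of integrity errors; an empty list means the pack verifies."""
    manifest = json.loads((pack_dir / "manifest.json").read_text())
    expected = manifest.get("files", {})

    actual = {
        p.relative_to(pack_dir).as_posix():
            "sha256:" + hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(pack_dir.rglob("*"))
        if p.is_file() and p.name != "manifest.json"
    }

    errors = []
    for rel, digest in expected.items():
        if rel not in actual:
            errors.append(f"missing file: {rel}")
        elif actual[rel] != digest:
            errors.append(f"digest mismatch: {rel}")
    for rel in actual:
        if rel not in expected:
            errors.append(f"file not listed in manifest: {rel}")
    return errors
```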

4.5 Confidentiality and redaction

Audit packs often include sensitive data (paths, internal package names, repo URLs).

PM guideline:

  • Provide redaction profiles:

    • external auditor pack (minimal identifiers)
    • internal audit pack (full detail)
  • Provide encryption options (password/recipient keys) if packs leave the environment.

Eng guideline:

  • Redaction must be deterministic and declarative (policy-based).
  • Pack generation must not leak secrets from raw scan logs.
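
A sketch of declarative, deterministic redaction, assuming a profile is a mapping of field names to drop or hash; the profile names follow the PM guideline above and the field names are illustrative:

```python
import hashlib
from copy import deepcopy

# Illustrative profiles: external auditors see minimal identifiers,
# internal audits see full detail (an empty profile means no redaction).
REDACTION_PROFILES = {
    "external_auditor": {"drop": ["repo_url", "internal_path"], "hash": ["component_name"]},
    "internal_audit": {"drop": [], "hash": []},
}


def redact(record: dict, profile_name: str) -> dict:
    """Apply a redaction profile deterministically: the same input and profile
    always yield the same output, so redacted packs remain verifiable."""
    profile = REDACTION_PROFILES[profile_name]
    redacted = deepcopy(record)
    for key in profile["drop"]:
        redacted.pop(key, None)
    for key in profile["hash"]:
        if key in redacted:
            redacted[key] = "sha256:" + hashlib.sha256(str(redacted[key]).encode()).hexdigest()
    return redacted
```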

4.6 Pack generation workflow

Key product flows:

  • Generate pack for:

    • a specific artifact digest
    • a release (set of digests)
    • an environment snapshot (e.g., cluster inventory)
    • a date range (for audit period)
  • Trigger sources:

    • UI
    • API
    • CI pipeline step

Engineering:

  • Treat pack generation as an async job (queue + status endpoint).
  • Cache pack components when inputs are identical (avoid repeated work).
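
A sketch of the caching idea: derive a cache key from the content-addressed inputs so that identical inputs map to the same, previously generated pack. The exact key fields are illustrative:

```python
import hashlib
import json


def pack_cache_key(artifact_digests: list[str],
                   policy_bundle_digest: str,
                   exception_versions: dict[str, int],
                   redaction_profile: str) -> str:
    """Identical inputs yield the same key, so pack generation can be skipped
    when a matching pack already exists."""
    payload = {
        "artifacts": sorted(artifact_digests),
        "policy": policy_bundle_digest,
        "exceptions": dict(sorted(exception_versions.items())),
        "redaction_profile": redaction_profile,
    }
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode()).hexdigest()
```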

4.7 What must be included (minimum viable audit pack)

PMs should enforce that v1 includes:

  • Artifact identity
  • SBOM(s) or component inventory
  • Findings list (normalized)
  • Policy bundle reference + policy content
  • Exceptions applied (full object + version info)
  • Final verdict + explanation summary
  • Integrity manifest with file hashes

Add these when available (v1.5+):

  • VEX inputs and outputs
  • Knowledge snapshot references
  • Reachability evidence fragments
  • Diff summary vs prior release

4.8 Product acceptance criteria (PM-owned)

Audit Packs are not “done” until:

  • A third party can validate the pack contents haven’t been altered (hash verification).
  • The pack answers “why did this pass/fail?” including exceptions applied.
  • Packs can be generated without external network calls (air-gap friendly).
  • Packs support redaction profiles.
  • Pack schema is versioned and backward compatible.

5. Cross-cutting: roles, responsibilities, and delivery checkpoints

5.1 Responsibilities

Product Manager

  • Define exception types and required fields
  • Define reason code taxonomy and governance policies
  • Define approval rules by environment and scope breadth
  • Define audit pack templates, profiles, and export targets
  • Own acceptance criteria and audit usability testing

Development Manager / Tech Lead

  • Own event model (immutability, versioning, retention)
  • Own policy evaluation semantics and determinism guarantees
  • Own integrity and signing design (manifest hashes, optional signatures)
  • Own performance and scalability targets (pack generation and query latency)
  • Own secure storage and access controls (RBAC, tenant isolation)

5.2 Deliverables checklist (for each capability)

For “Exception Objects”:

  • PRD + threat model (abuse cases: blanket waivers, privilege escalation)
  • Schema spec + versioning policy
  • API endpoints + RBAC model
  • UI flows + audit trail UI
  • Policy engine semantics + test vectors
  • Metrics dashboards

For “Audit Packs”:

  • Pack schema spec + folder layout
  • Manifest + hash verification rules
  • Generator service + async job API
  • Redaction profiles + tests
  • Verifier CLI + documentation
  • Performance benchmarks + caching strategy

6. Common failure modes to actively prevent

  1. Exceptions become suppressions again. If you allow exceptions without expiry/owner or without an audit trail, you’ve rebuilt “ignore lists.”

  2. Over-broad scopes by default. If “all repos/all images” is easy, you will accumulate permanent waivers and lose credibility.

  3. No deterministic semantics. If the same artifact can pass/fail depending on evaluation order or transient feed updates, auditors will distrust outputs.

  4. Audit packs that are reports, not evidence. A PDF without machine-verifiable artifacts is not an audit pack; it’s a slide.

  5. No renewal discipline. If renewals are frictionless and don’t require re-justification, exceptions never die.


7. Recommended phased rollout (to manage build cost)

Phase 1: Governance basics

  • Exception object schema + lifecycle + expiry enforcement
  • Create-from-finding UX
  • Audit pack v1 (SBOM/inventory + findings + policy + exceptions + manifest)

Phase 2: Evidence binding

  • Evidence refs on exceptions (VEX, reachability fragments)
  • Pack includes VEX inputs/outputs and knowledge snapshot identifiers

Phase 3: Verifiable trust

  • Signed verdicts and/or signed pack manifests
  • Verifier tooling and deterministic replay hooks

To make these guidelines immediately executable, two follow-up artifacts should be produced next:

  1. A concise PRD template (sections + required decisions) for Exceptions and Audit Packs
  2. A technical spec outline (schema definitions, endpoints, state machines, and acceptance test vectors)