
Below are implementation-grade guidelines for Stella Ops Product Managers (PMs) and Development Managers (Eng Managers / Tech Leads) for two tightly coupled capabilities:

  1. Exception management as auditable objects (not suppression files)
  2. Audit packs (exportable, verifiable evidence bundles for releases and environments)

The intent is to make these capabilities:

  • operationally useful (reduce friction in CI/CD and runtime governance),
  • defensible in audits (tamper-evident, attributable, time-bounded), and
  • consistent with Stella Ops positioning around determinism, evidence, and replayability.

1. Shared objectives and boundaries

1.1 Objectives

These two capabilities must jointly enable:

  • Risk decisions are explicit: Every “ignore/suppress/waive” is a governed decision with an owner and expiry.
  • Decisions are replayable: If an auditor asks “why did you ship this on date X?”, Stella Ops can reproduce the decision using the same policy + evidence + knowledge snapshot.
  • Decisions are exportable and verifiable: Audit packs include the minimum necessary artifacts and a manifest that allows independent verification of integrity and completeness.
  • Operational friction is reduced: Teams can ship safely with controlled exceptions, rather than ad-hoc suppressions, while retaining accountability.

1.2 Out of scope (explicitly)

Avoid scope creep early. The following are out of scope for v1 unless mandated by a target customer:

  • Full GRC mapping to specific frameworks (you can support evidence; don’t claim compliance).
  • Fully automated approvals based on HR org charts.
  • Multi-year archival systems (start with retention, export, and immutable event logs).
  • A “ticketing system replacement.” Integrate with ticketing; don’t rebuild it.

2. Shared design principles (non-negotiables)

These principles apply to both Exception Objects and Audit Packs:

  1. Attribution: every action has an authenticated actor identity (human or service), a timestamp, and a reason.

  2. Immutability of history: edits are new versions/events; never rewrite history in place.

  3. Least privilege scope: exceptions must be as narrow as possible (artifact digest over tag; component purl over “any”; environment constraints).

  4. Time-bounded risk: exceptions must expire. “Permanent ignore” is a governance smell.

  5. Deterministic evaluation: given the same policy + snapshot + exceptions + inputs, the outcome is stable and reproducible.

  6. Separation of concerns:

    • Exception store = governed decisions.
    • Scanner = evidence producer.
    • Policy engine = deterministic evaluator.
    • Audit packer = exporter/assembler/verifier.

3. Exception management as auditable objects

3.1 What an “Exception Object” is

An Exception Object is a structured, versioned record that modifies evaluation behavior in a controlled manner, while leaving the underlying findings intact.

It is not:

  • a local .ignore file,
  • a hidden suppression rule,
  • a UI-only toggle,
  • a vendor-specific “ignore list” with no audit trail.

Exception types you should support (minimum set)

PMs should start with these canonical types:

  1. Vulnerability exception

    • suppress/waive a specific vulnerability finding (e.g., CVE/CWE) under defined scope.
  2. Policy exception

    • allow a policy rule to be bypassed under defined scope (e.g., “allow unsigned artifact for dev namespace”).
  3. Unknown-state exception (if Stella models unknowns)

    • allow a release despite unresolved unknowns, with explicit risk acceptance.
  4. Component exception

    • allow/deny a component/package/version across a domain, again with explicit scope and expiry.

3.2 Required fields and schema guidelines

PMs: mandate these fields; Eng: enforce them at API and storage level.

Required fields (v1)

  • exception_id (stable identifier)
  • version (monotonic; or event-sourced)
  • status: proposed | approved | active | expired | revoked
  • owner (accountable person/team)
  • requester (who initiated)
  • approver(s) (who approved; may be empty for dev environments depending on policy)
  • created_at / updated_at / approved_at / expires_at
  • scope (see below)
  • reason_code (taxonomy)
  • rationale (free text, required)
  • evidence_refs (optional in v1 but strongly recommended)
  • risk_acceptance (explicit boolean or structured “risk accepted” block)
  • links: ticket ID, PR, incident, vendor advisory reference (optional but useful)
  • audit_log_refs (implicit if event-sourced)
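
A minimal sketch of this schema, assuming Python dataclasses and the field names listed above; the exact types and storage model (event-sourced vs. versioned rows) are implementation decisions:

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class ExceptionStatus(str, Enum):
    PROPOSED = "proposed"
    APPROVED = "approved"
    ACTIVE = "active"
    EXPIRED = "expired"
    REVOKED = "revoked"


@dataclass(frozen=True)
class ExceptionObject:
    """One immutable version of an exception; edits always produce a new version."""
    exception_id: str                    # stable identifier
    version: int                         # monotonic per exception_id
    status: ExceptionStatus
    owner: str                           # accountable person/team
    requester: str                       # who initiated
    approvers: tuple[str, ...]           # may be empty for dev scopes, per policy
    created_at: datetime
    updated_at: datetime
    approved_at: Optional[datetime]
    expires_at: datetime                 # required: risk must be time-bounded
    scope: dict                          # structured scope (see the scope model below)
    reason_code: str                     # from the taxonomy in 3.3
    rationale: str                       # free text, required
    evidence_refs: tuple[str, ...] = ()  # digests/URIs of supporting evidence
    risk_acceptance: bool = False
    links: tuple[str, ...] = ()          # ticket/PR/incident/advisory references
```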

Scope model (critical to defensibility)

Scope must be structured and narrowable. Provide scope dimensions such as:

  • Artifact scope: image digest, SBOM digest, or build provenance digest (preferred). Avoid tags as the primary scope unless paired with immutability constraints.
  • Component scope: purl + version range + ecosystem
  • Vulnerability scope: CVE ID(s), GHSA, internal ID; optionally path/function/symbol constraints
  • Environment scope: cluster/namespace, runtime env (dev/stage/prod), repository, project, tenant
  • Time scope: expires_at (required), optional “valid_from”

PM guideline: default UI and API should encourage digest-based scope and warn on broad scopes.
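
A sketch of how that guideline can be enforced at create time, assuming a hypothetical Scope structure with the dimensions above; the warning texts are illustrative:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Scope:
    artifact_digest: Optional[str] = None        # preferred: immutable identity
    artifact_tag: Optional[str] = None           # discouraged as primary scope
    component_purl: Optional[str] = None
    component_version_range: Optional[str] = None
    vulnerability_ids: tuple[str, ...] = ()
    environments: tuple[str, ...] = ()           # e.g. ("dev",) or ("prod",)


def scope_warnings(scope: Scope) -> list[str]:
    """Reviewer-facing warnings for scopes that are broader than necessary."""
    warnings = []
    if scope.artifact_digest is None and scope.artifact_tag is not None:
        warnings.append("Scope uses a mutable tag; prefer an artifact digest.")
    if scope.artifact_digest is None and scope.artifact_tag is None:
        warnings.append("No artifact constraint; exception applies to every artifact.")
    if not scope.environments:
        warnings.append("No environment constraint; exception applies everywhere, including prod.")
    if not scope.vulnerability_ids and scope.component_purl is None:
        warnings.append("Neither a vulnerability nor a component is constrained.")
    return warnings
```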

3.3 Reason codes (taxonomy)

Reason codes are a moat because they enable governance analytics and policy automation.

Minimum suggested taxonomy:

  • FALSE_POSITIVE (with evidence expectations)
  • NOT_REACHABLE (reachable proof preferred)
  • NOT_AFFECTED (VEX-backed preferred)
  • BACKPORT_FIXED (package/distro evidence preferred)
  • COMPENSATING_CONTROL (link to control evidence)
  • RISK_ACCEPTED (explicit sign-off)
  • TEMPORARY_WORKAROUND (link to mitigation plan)
  • VENDOR_PENDING (under investigation)
  • BUSINESS_EXCEPTION (rare; requires stronger approval)

PM guideline: reason codes must be selectable and reportable; do not allow “Other” as the default.
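
Because reason codes feed governance analytics and policy automation, it helps to pair each code with the evidence the approval policy should expect. A sketch of that mapping, taken from the taxonomy above (the evidence descriptions are illustrative):

```python
# Evidence the approval policy can require per reason code; exceptions whose
# evidence_refs do not include the expected kind can be held for extra review.
REASON_CODE_EVIDENCE = {
    "FALSE_POSITIVE": "analysis notes showing why the finding does not apply",
    "NOT_REACHABLE": "reachability proof fragment (call-path subgraph)",
    "NOT_AFFECTED": "VEX statement (OpenVEX / CycloneDX VEX)",
    "BACKPORT_FIXED": "distro/package advisory or patch reference",
    "COMPENSATING_CONTROL": "link to control evidence",
    "RISK_ACCEPTED": "explicit sign-off record",
    "TEMPORARY_WORKAROUND": "link to the mitigation plan",
    "VENDOR_PENDING": "vendor case or advisory reference",
    "BUSINESS_EXCEPTION": "stronger approval record",
}
```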

3.4 Evidence attachments

Exceptions should evolve from “justification-only” to “justification + evidence.”

Evidence references can point to:

  • VEX statements (OpenVEX/CycloneDX VEX)
  • reachability proof fragments (call-path subgraph, symbol references)
  • distro advisories / patch references
  • internal change tickets / mitigation PRs
  • runtime mitigations

Eng guideline: store evidence as references with integrity checks (hash/digest). For v2+, store evidence bundles as content-addressed blobs.
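
A sketch of integrity-checked evidence references, assuming evidence blobs are fetched by the caller and hashed with SHA-256; the kind labels are illustrative:

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceRef:
    kind: str      # e.g. "vex", "reachability", "advisory", "ticket"
    uri: str       # where the evidence lives
    sha256: str    # digest recorded when the reference was attached


def verify_evidence(ref: EvidenceRef, blob: bytes) -> bool:
    """Check that fetched evidence still matches the recorded digest."""
    return hashlib.sha256(blob).hexdigest() == ref.sha256


def content_address(blob: bytes) -> str:
    """For v2+: key content-addressed evidence blobs by their digest."""
    return "sha256:" + hashlib.sha256(blob).hexdigest()
```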

3.5 Lifecycle and workflows

Lifecycle states and transitions

  • Proposed → Approved → Active → (Expired or Revoked)
  • Renewal should create a new version (never extend an old record silently).

Approvals

PM guideline:

  • At least two approval modes:

    1. Self-approved (allowed only for dev/experimental scopes)
    2. Two-person review (required for prod or broad scope)

Eng guideline:

  • Enforce approval rules via policy config (not hard-coded).
  • Record every approval action with actor identity and timestamp.

Expiry enforcement

Non-negotiable:

  • Expired exceptions must stop applying automatically.
  • Renewals require an explicit action and new audit trail.
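
A sketch of the lifecycle as an explicit transition table, plus automatic expiry at evaluation time. The table follows the states in 3.5; which additional side-transitions to allow (for example withdrawing a proposal) is a policy decision:

```python
from datetime import datetime, timezone
from typing import Optional

# Allowed transitions; anything else is rejected and logged.
ALLOWED_TRANSITIONS = {
    "proposed": {"approved"},
    "approved": {"active"},
    "active": {"expired", "revoked"},
    "expired": set(),   # renewal creates a new version; it never reopens this one
    "revoked": set(),
}


def can_transition(current: str, target: str) -> bool:
    return target in ALLOWED_TRANSITIONS.get(current, set())


def effective_status(status: str, expires_at: datetime,
                     now: Optional[datetime] = None) -> str:
    """Expired exceptions stop applying automatically, even before any
    background job updates the stored status field."""
    now = now or datetime.now(timezone.utc)
    if status == "active" and now >= expires_at:
        return "expired"
    return status
```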

3.6 Evaluation semantics (how exceptions affect results)

This is where most products become non-auditable. You need deterministic, explicit rules.

PM guideline: define precedence clearly:

  • Policy engine evaluates baseline findings → applies exceptions → produces verdict.
  • Exceptions never delete underlying findings; they alter the decision outcome and annotate the reasoning.

Eng guideline: exception application must be:

  • Deterministic (stable ordering rules)
  • Transparent (verdict includes “exception applied: exception_id, reason_code, scope match explanation”)
  • Scoped (match explanation must state which scope dimensions matched)
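
A sketch of these rules under simplified assumptions: findings and exceptions are plain dicts, a scope dimension matches when it is set and equals the corresponding finding attribute, and a hypothetical blocking flag marks findings that would fail the verdict. Findings are never deleted; the verdict only annotates them:

```python
def matched_dimensions(finding: dict, scope: dict) -> list[str]:
    """Return the scope dimensions that matched, or [] if any set dimension fails."""
    dims = []
    for dim in ("artifact_digest", "component_purl", "vulnerability_id", "environment"):
        if scope.get(dim) is not None:
            if scope[dim] != finding.get(dim):
                return []                 # a constrained dimension does not match
            dims.append(dim)
    return dims


def apply_exceptions(findings: list[dict], exceptions: list[dict]) -> dict:
    """Deterministic, transparent exception application."""
    # Stable ordering: sort by (exception_id, version) so the outcome never
    # depends on retrieval order; the first match in that order wins.
    ordered = sorted(exceptions, key=lambda e: (e["exception_id"], e["version"]))

    annotated = []
    for finding in findings:
        applied = None
        for exc in ordered:
            dims = matched_dimensions(finding, exc["scope"])
            if dims:
                applied = {
                    "exception_id": exc["exception_id"],
                    "reason_code": exc["reason_code"],
                    "matched_scope_dimensions": dims,
                }
                break
        # The finding itself stays intact; only the decision is annotated.
        annotated.append({**finding, "exception_applied": applied})

    blocking = [f for f in annotated
                if f["exception_applied"] is None and f.get("blocking")]
    return {"verdict": "fail" if blocking else "pass", "findings": annotated}
```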

3.7 Auditability requirements

Exception management must be audit-ready by construction.

Minimum requirements:

  • Append-only event log for create/approve/revoke/expire/renew actions
  • Versioning: every change results in a new version or event
  • Tamper-evidence: hash chain events or sign event batches
  • Retention: define retention policy and export strategy
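
A sketch of tamper-evidence via a hash chain over exception events, assuming events are serialized as canonical JSON; signing event batches would layer on top of the same chain:

```python
import hashlib
import json


def chain_events(events: list[dict]) -> list[dict]:
    """Append events to a hash chain: each hash covers the event's canonical
    JSON plus the previous hash, so any in-place edit breaks every later link."""
    prev_hash = "0" * 64
    chained = []
    for event in events:
        body = json.dumps(event, sort_keys=True, separators=(",", ":"))
        digest = hashlib.sha256((prev_hash + body).encode()).hexdigest()
        chained.append({**event, "prev_hash": prev_hash, "event_hash": digest})
        prev_hash = digest
    return chained


def verify_chain(chained: list[dict]) -> bool:
    """Recompute the chain and fail on any altered, inserted, or removed event."""
    prev_hash = "0" * 64
    for event in chained:
        body = {k: v for k, v in event.items() if k not in ("prev_hash", "event_hash")}
        serialized = json.dumps(body, sort_keys=True, separators=(",", ":"))
        expected = hashlib.sha256((prev_hash + serialized).encode()).hexdigest()
        if event["prev_hash"] != prev_hash or event["event_hash"] != expected:
            return False
        prev_hash = expected
    return True
```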

PM guideline: auditors will ask “who approved,” “why,” “when,” “what scope,” and “what changed since.” Design the UX and exports to answer those in minutes.

3.8 UX guidelines

Key UX flows:

  • Create exception from a finding (pre-fill CVE/component/artifact scope)
  • Preview impact (“this will suppress 37 findings across 12 images; are you sure?”)
  • Expiry visibility (countdown, alerts, renewal prompts)
  • Audit trail view (who did what, with diffs between versions)
  • Search and filters by owner, reason, expiry window, scope breadth, environment

UX anti-patterns to forbid:

  • “Ignore all vulnerabilities in this image” with one click
  • Silent suppressions without owner/expiry
  • Exceptions created without linking to scope and reason

3.9 Product acceptance criteria (PM-owned)

A feature is not “done” until:

  • Every exception has owner, expiry, reason code, scope.
  • Exception history is immutable and exportable.
  • Policy outcomes show applied exceptions and why.
  • Expiry is enforced automatically.
  • A user can answer: “What exceptions were active for this release?” within 2 minutes.

4. Audit packs

4.1 What an audit pack is

An Audit Pack is a portable, verifiable bundle that answers:

  • What was evaluated? (artifacts, versions, identities)
  • Under what policies? (policy version/config)
  • Using what knowledge state? (vuln DB snapshot, VEX inputs)
  • What exceptions were applied? (IDs, owners, rationales)
  • What was the decision and why? (verdict + evidence pointers)
  • What changed since the last release? (optional diff summary)

PM guideline: treat the Audit Pack as a product deliverable, not an export button.

4.2 Pack structure and layout

Use a predictable, documented layout. Example (a manifest generation sketch follows the layout):

  • manifest.json

    • pack_id, generated_at, generator_version
    • hashes/digests of every included file
    • signing info (optional in v1; recommended soon)
  • inputs/

    • artifact identifiers (digests), repo references (optional)
    • SBOM(s) (CycloneDX/SPDX)
  • vex/

    • VEX docs used + any VEX produced
  • policy/

    • policy bundle used (versioned)
    • evaluation settings
  • exceptions/

    • all exceptions relevant to the evaluated scope
    • plus event logs / versions
  • findings/

    • normalized findings list
    • reachability evidence fragments if applicable
  • verdict/

    • final decision object
    • explanation summary
    • signed attestation (if supported)
  • diff/ (optional)

    • delta from prior baseline (what changed materially)
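
A minimal sketch of manifest generation for this layout, assuming the pack is assembled in a local directory and every file is digested with SHA-256; pack_id derivation and signing are out of scope here (see 4.4):

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def build_manifest(pack_dir: Path, pack_id: str, generator_version: str) -> dict:
    """Digest every file in the pack (except manifest.json itself), keyed by a
    sorted relative path so the output is deterministic for identical inputs."""
    files = {}
    for path in sorted(pack_dir.rglob("*")):
        if path.is_file() and path.name != "manifest.json":
            rel = path.relative_to(pack_dir).as_posix()
            files[rel] = "sha256:" + hashlib.sha256(path.read_bytes()).hexdigest()
    return {
        "pack_id": pack_id,
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "generator_version": generator_version,
        "files": files,
        # signing info is optional in v1; recommended soon
    }


def write_manifest(pack_dir: Path, manifest: dict) -> None:
    (pack_dir / "manifest.json").write_text(
        json.dumps(manifest, sort_keys=True, indent=2) + "\n"
    )
```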

4.3 Formats: human and machine

You need both:

  • Machine-readable (JSON + standard SBOM/VEX formats) for verification and automation
  • Human-readable summary (HTML or PDF) for auditors and leadership

PM guideline: machine artifacts are the source of truth. Human docs are derived views.

Eng guideline:

  • Ensure the pack can be generated offline.
  • Ensure deterministic outputs where feasible (stable ordering, consistent serialization).

4.4 Integrity and verification

At minimum:

  • manifest.json includes a digest for each file.

  • Provide a stella verify-pack CLI that checks:

    • manifest integrity
    • file hashes
    • schema versions
    • optional signature verification

For v2:

  • Sign the manifest (and/or the verdict) using your standard attestation mechanism.
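
A sketch of the core integrity check behind a verify-pack command: recompute every file digest and compare against the manifest, treating missing and unlisted files as failures. Schema-version and signature checks would layer on top:

```python
import hashlib
import json
from pathlib import Path


def verify_pack(pack_dir: Path) -> list[str]:
    """Return a list of integrity errors; an empty list means the pack verifies."""
    manifest = json.loads((pack_dir / "manifest.json").read_text())
    expected = manifest.get("files", {})

    actual = {
        p.relative_to(pack_dir).as_posix():
            "sha256:" + hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(pack_dir.rglob("*"))
        if p.is_file() and p.name != "manifest.json"
    }

    errors = []
    for rel, digest in expected.items():
        if rel not in actual:
            errors.append(f"missing file: {rel}")
        elif actual[rel] != digest:
            errors.append(f"digest mismatch: {rel}")
    for rel in actual:
        if rel not in expected:
            errors.append(f"file not listed in manifest: {rel}")
    return errors
```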

4.5 Confidentiality and redaction

Audit packs often include sensitive data (paths, internal package names, repo URLs).

PM guideline:

  • Provide redaction profiles:

    • external auditor pack (minimal identifiers)
    • internal audit pack (full detail)
  • Provide encryption options (password/recipient keys) if packs leave the environment.

Eng guideline:

  • Redaction must be deterministic and declarative (policy-based).
  • Pack generation must not leak secrets from raw scan logs.
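
A sketch of declarative, deterministic redaction, assuming a profile is a mapping of field names to drop or hash; the profile names follow the PM guideline above and the field names are illustrative:

```python
import hashlib
from copy import deepcopy

# Illustrative profiles: external auditors see minimal identifiers,
# internal audits see full detail (an empty profile means no redaction).
REDACTION_PROFILES = {
    "external_auditor": {"drop": ["repo_url", "internal_path"], "hash": ["component_name"]},
    "internal_audit": {"drop": [], "hash": []},
}


def redact(record: dict, profile_name: str) -> dict:
    """Apply a redaction profile deterministically: the same input and profile
    always yield the same output, so redacted packs remain verifiable."""
    profile = REDACTION_PROFILES[profile_name]
    redacted = deepcopy(record)
    for key in profile["drop"]:
        redacted.pop(key, None)
    for key in profile["hash"]:
        if key in redacted:
            redacted[key] = "sha256:" + hashlib.sha256(str(redacted[key]).encode()).hexdigest()
    return redacted
```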

4.6 Pack generation workflow

Key product flows:

  • Generate pack for:

    • a specific artifact digest
    • a release (set of digests)
    • an environment snapshot (e.g., cluster inventory)
    • a date range (for audit period)
  • Trigger sources:

    • UI
    • API
    • CI pipeline step

Engineering:

  • Treat pack generation as an async job (queue + status endpoint).
  • Cache pack components when inputs are identical (avoid repeated work).
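
A sketch of the caching idea: derive a cache key from the content-addressed inputs so that identical inputs map to the same, previously generated pack. The exact key fields are illustrative:

```python
import hashlib
import json


def pack_cache_key(artifact_digests: list[str],
                   policy_bundle_digest: str,
                   exception_versions: dict[str, int],
                   redaction_profile: str) -> str:
    """Identical inputs yield the same key, so pack generation can be skipped
    when a matching pack already exists."""
    payload = {
        "artifacts": sorted(artifact_digests),
        "policy": policy_bundle_digest,
        "exceptions": dict(sorted(exception_versions.items())),
        "redaction_profile": redaction_profile,
    }
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode()).hexdigest()
```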

4.7 What must be included (minimum viable audit pack)

PMs should enforce that v1 includes:

  • Artifact identity
  • SBOM(s) or component inventory
  • Findings list (normalized)
  • Policy bundle reference + policy content
  • Exceptions applied (full object + version info)
  • Final verdict + explanation summary
  • Integrity manifest with file hashes

Add these when available (v1.5+):

  • VEX inputs and outputs
  • Knowledge snapshot references
  • Reachability evidence fragments
  • Diff summary vs prior release

4.8 Product acceptance criteria (PM-owned)

Audit Packs are not “done” until:

  • A third party can validate the pack contents haven’t been altered (hash verification).
  • The pack answers “why did this pass/fail?” including exceptions applied.
  • Packs can be generated without external network calls (air-gap friendly).
  • Packs support redaction profiles.
  • Pack schema is versioned and backward compatible.

5. Cross-cutting: roles, responsibilities, and delivery checkpoints

5.1 Responsibilities

Product Manager

  • Define exception types and required fields
  • Define reason code taxonomy and governance policies
  • Define approval rules by environment and scope breadth
  • Define audit pack templates, profiles, and export targets
  • Own acceptance criteria and audit usability testing

Development Manager / Tech Lead

  • Own event model (immutability, versioning, retention)
  • Own policy evaluation semantics and determinism guarantees
  • Own integrity and signing design (manifest hashes, optional signatures)
  • Own performance and scalability targets (pack generation and query latency)
  • Own secure storage and access controls (RBAC, tenant isolation)

5.2 Deliverables checklist (for each capability)

For “Exception Objects”:

  • PRD + threat model (abuse cases: blanket waivers, privilege escalation)
  • Schema spec + versioning policy
  • API endpoints + RBAC model
  • UI flows + audit trail UI
  • Policy engine semantics + test vectors
  • Metrics dashboards

For “Audit Packs”:

  • Pack schema spec + folder layout
  • Manifest + hash verification rules
  • Generator service + async job API
  • Redaction profiles + tests
  • Verifier CLI + documentation
  • Performance benchmarks + caching strategy

6. Common failure modes to actively prevent

  1. Exceptions become suppressions again. If you allow exceptions without expiry/owner or without an audit trail, you’ve rebuilt “ignore lists.”

  2. Over-broad scopes by default. If “all repos/all images” is easy, you will accumulate permanent waivers and lose credibility.

  3. No deterministic semantics. If the same artifact can pass/fail depending on evaluation order or transient feed updates, auditors will distrust outputs.

  4. Audit packs that are reports, not evidence. A PDF without machine-verifiable artifacts is not an audit pack; it’s a slide.

  5. No renewal discipline. If renewals are frictionless and don’t require re-justification, exceptions never die.


7. Recommended phased rollout (to manage build cost)

Phase 1: Governance basics

  • Exception object schema + lifecycle + expiry enforcement
  • Create-from-finding UX
  • Audit pack v1 (SBOM/inventory + findings + policy + exceptions + manifest)

Phase 2: Evidence binding

  • Evidence refs on exceptions (VEX, reachability fragments)
  • Pack includes VEX inputs/outputs and knowledge snapshot identifiers

Phase 3: Verifiable trust

  • Signed verdicts and/or signed pack manifests
  • Verifier tooling and deterministic replay hooks

To make these guidelines immediately executable, two follow-up artifacts should be produced next:

  1. A concise PRD template (sections + required decisions) for Exceptions and Audit Packs
  2. A technical spec outline (schema definitions, endpoints, state machines, and acceptance test vectors)