Below are implementation-grade guidelines for Stella Ops Product Managers (PMs) and Development Managers (Eng Managers / Tech Leads) for two tightly coupled capabilities:
- Exception management as auditable objects (not suppression files)
- Audit packs (exportable, verifiable evidence bundles for releases and environments)
The intent is to make these capabilities:
- operationally useful (reduce friction in CI/CD and runtime governance),
- defensible in audits (tamper-evident, attributable, time-bounded), and
- consistent with Stella Ops’ positioning around determinism, evidence, and replayability.
1. Shared objectives and boundaries
1.1 Objectives
These two capabilities must jointly enable:
- Risk decisions are explicit: Every “ignore/suppress/waive” is a governed decision with an owner and expiry.
- Decisions are replayable: If an auditor asks “why did you ship this on date X?”, Stella Ops can reproduce the decision using the same policy + evidence + knowledge snapshot.
- Decisions are exportable and verifiable: Audit packs include the minimum necessary artifacts and a manifest that allows independent verification of integrity and completeness.
- Operational friction is reduced: Teams can ship safely with controlled exceptions, rather than ad-hoc suppressions, while retaining accountability.
1.2 Out of scope (explicitly)
Avoid scope creep early. The following are out of scope for v1 unless mandated by a target customer:
- Full GRC mapping to specific frameworks (you can support evidence; don’t claim compliance).
- Fully automated approvals based on HR org charts.
- Multi-year archival systems (start with retention, export, and immutable event logs).
- A “ticketing system replacement.” Integrate with ticketing; don’t rebuild it.
2. Shared design principles (non-negotiables)
These principles apply to both Exception Objects and Audit Packs:
- Attribution: every action has an authenticated actor identity (human or service), a timestamp, and a reason.
- Immutability of history: edits are new versions/events; never rewrite history in place.
- Least privilege scope: exceptions must be as narrow as possible (artifact digest over tag; component purl over “any”; environment constraints).
- Time-bounded risk: exceptions must expire. “Permanent ignore” is a governance smell.
- Deterministic evaluation: given the same policy + snapshot + exceptions + inputs, the outcome is stable and reproducible.
- Separation of concerns:
  - Exception store = governed decisions.
  - Scanner = evidence producer.
  - Policy engine = deterministic evaluator.
  - Audit packer = exporter/assembler/verifier.
3. Exception management as auditable objects
3.1 What an “Exception Object” is
An Exception Object is a structured, versioned record that modifies evaluation behavior in a controlled manner, while leaving the underlying findings intact.
It is not:
- a local .ignore file,
- a hidden suppression rule,
- a UI-only toggle,
- a vendor-specific “ignore list” with no audit trail.
Exception types you should support (minimum set)
PMs should start with these canonical types:
- Vulnerability exception
  - suppress/waive a specific vulnerability finding (e.g., CVE/CWE) under defined scope.
- Policy exception
  - allow a policy rule to be bypassed under defined scope (e.g., “allow unsigned artifact for dev namespace”).
- Unknown-state exception (if Stella models unknowns)
  - allow a release despite unresolved unknowns, with explicit risk acceptance.
- Component exception
  - allow/deny a component/package/version across a domain, again with explicit scope and expiry.
3.2 Required fields and schema guidelines
PMs: mandate these fields; Eng: enforce them at API and storage level.
Required fields (v1)
- exception_id (stable identifier)
- version (monotonic; or event-sourced)
- status: proposed | approved | active | expired | revoked
- owner (accountable person/team)
- requester (who initiated)
- approver(s) (who approved; may be empty for dev environments depending on policy)
- created_at / updated_at / approved_at / expires_at
- scope (see below)
- reason_code (taxonomy)
- rationale (free text, required)
- evidence_refs (optional in v1 but strongly recommended)
- risk_acceptance (explicit boolean or structured “risk accepted” block)
- links (ticket ID, PR, incident, vendor advisory reference) – optional but useful
- audit_log_refs (implicit if event-sourced)
Scope model (critical to defensibility)
Scope must be structured and narrowable. Provide scope dimensions such as:
- Artifact scope: image digest, SBOM digest, build provenance digest (preferred). Avoid tags as primary scope unless paired with immutability constraints.
- Component scope: purl + version range + ecosystem
- Vulnerability scope: CVE ID(s), GHSA, internal ID; optionally path/function/symbol constraints
- Environment scope: cluster/namespace, runtime env (dev/stage/prod), repository, project, tenant
- Time scope: expires_at (required), optional “valid_from”
PM guideline: default UI and API should encourage digest-based scope and warn on broad scopes.
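As a rough illustration of enforcing the required fields and a narrowable scope at the API boundary, here is a minimal Python sketch. The class names (`Scope`, `ExceptionObject`), the subset of fields shown, and the warning texts are assumptions made for this example, not a Stella Ops schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple

STATUSES = {"proposed", "approved", "active", "expired", "revoked"}


@dataclass(frozen=True)
class Scope:
    artifact_digest: Optional[str] = None      # preferred over tags
    component_purl: Optional[str] = None
    vulnerability_ids: Tuple[str, ...] = ()
    environment: Optional[str] = None          # e.g. "dev", "stage", "prod"

    def breadth_warnings(self) -> list:
        """Return warnings for scope dimensions left unconstrained."""
        warnings = []
        if self.artifact_digest is None:
            warnings.append("no artifact digest: applies across artifacts")
        if self.environment is None:
            warnings.append("no environment constraint")
        return warnings


@dataclass(frozen=True)
class ExceptionObject:
    exception_id: str
    version: int
    status: str
    owner: str
    requester: str
    reason_code: str
    rationale: str
    scope: Scope
    created_at: datetime
    expires_at: datetime
    approvers: Tuple[str, ...] = ()

    def __post_init__(self):
        # Reject records that would be indefensible in an audit.
        if self.status not in STATUSES:
            raise ValueError(f"unknown status: {self.status}")
        if not self.owner or not self.rationale.strip():
            raise ValueError("owner and rationale are required")
        if self.expires_at <= self.created_at:
            raise ValueError("expires_at must be later than created_at")
```

Making the record frozen mirrors the "edits are new versions" principle: a change produces a new object with a higher `version` rather than mutating the old one. A UI could surface `breadth_warnings()` before submission to nudge users toward digest-based scope.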
3.3 Reason codes (taxonomy)
Reason codes are a moat because they enable governance analytics and policy automation.
Minimum suggested taxonomy:
- FALSE_POSITIVE (with evidence expectations)
- NOT_REACHABLE (reachability proof preferred)
- NOT_AFFECTED (VEX-backed preferred)
- BACKPORT_FIXED (package/distro evidence preferred)
- COMPENSATING_CONTROL (link to control evidence)
- RISK_ACCEPTED (explicit sign-off)
- TEMPORARY_WORKAROUND (link to mitigation plan)
- VENDOR_PENDING (under investigation)
- BUSINESS_EXCEPTION (rare; requires stronger approval)
PM guideline: reason codes must be selectable and reportable; do not allow “Other” as the default.
3.4 Evidence attachments
Exceptions should evolve from “justification-only” to “justification + evidence.”
Evidence references can point to:
- VEX statements (OpenVEX/CycloneDX VEX)
- reachability proof fragments (call-path subgraph, symbol references)
- distro advisories / patch references
- internal change tickets / mitigation PRs
- runtime mitigations
Eng guideline: store evidence as references with integrity checks (hash/digest). For v2+, store evidence bundles as content-addressed blobs.
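One way to read "references with integrity checks" is to store a digest next to each reference and recompute it whenever the evidence is fetched. The sketch below assumes SHA-256 and a hypothetical `EvidenceRef` record; the field names are illustrative only.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceRef:
    kind: str     # e.g. "vex", "reachability", "advisory" (illustrative values)
    uri: str      # where the evidence lives
    sha256: str   # digest of the evidence content at attachment time


def make_ref(kind: str, uri: str, content: bytes) -> EvidenceRef:
    """Create a reference that pins the evidence content by digest."""
    return EvidenceRef(kind, uri, hashlib.sha256(content).hexdigest())


def verify_ref(ref: EvidenceRef, content: bytes) -> bool:
    """Return True only if the fetched content still matches the pinned digest."""
    return hashlib.sha256(content).hexdigest() == ref.sha256
```

With the digest recorded on the exception, a later edit to the referenced document is detectable even though the exception store never held the document itself.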
3.5 Lifecycle and workflows
Lifecycle states and transitions
- Proposed → Approved → Active → (Expired or Revoked)
- Renewal should create a new version (never extend an old record silently).
Approvals
PM guideline:
- At least two approval modes:
  - Self-approved (allowed only for dev/experimental scopes)
  - Two-person review (required for prod or broad scope)
Eng guideline:
- Enforce approval rules via policy config (not hard-coded).
- Record every approval action with actor identity and timestamp.
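The approval rules above could be driven by configuration roughly as follows. This is a sketch under assumptions: the config keys are hypothetical, and "two-person review" is interpreted here as the requester plus at least one independent approver, which may differ from the intended rule.

```python
# Hypothetical policy config; in practice it would be loaded, not hard-coded.
APPROVAL_POLICY = {
    "self_approval_environments": ["dev"],
    "min_independent_approvers": 1,
}


def approval_satisfied(requester, approvers, environment, broad_scope,
                       policy=APPROVAL_POLICY):
    """Check whether the recorded approvers satisfy the configured rule."""
    distinct = set(approvers)
    if environment in policy["self_approval_environments"] and not broad_scope:
        # Self-approval is acceptable for narrow dev/experimental scopes.
        return len(distinct) >= 1
    # Otherwise approvals must come from someone other than the requester.
    independent = distinct - {requester}
    return len(independent) >= policy["min_independent_approvers"]


def record_approval(log, exception_id, actor, at):
    """Return a new log with the approval appended (the input is not mutated)."""
    entry = {"action": "approve", "exception_id": exception_id,
             "actor": actor, "at": at}
    return log + [entry]
```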
Expiry enforcement
Non-negotiable:
- Expired exceptions must stop applying automatically.
- Renewals require an explicit action and new audit trail.
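The lifecycle and expiry rules can be sketched as a small state machine. The transition table follows the states listed above; the dict-based record and the choice to send a renewal back to "proposed" are assumptions made for illustration.

```python
from datetime import datetime

# Allowed transitions, following Proposed -> Approved -> Active -> (Expired or Revoked).
TRANSITIONS = {
    "proposed": {"approved"},
    "approved": {"active"},
    "active": {"expired", "revoked"},
    "expired": set(),
    "revoked": set(),
}


def transition(current: str, target: str) -> str:
    """Return the target state, or raise if the move is not allowed."""
    if target not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {target}")
    return target


def effective_status(status: str, expires_at: datetime, now: datetime) -> str:
    """Expired exceptions stop applying even if no job has updated the record."""
    if status == "active" and now >= expires_at:
        return "expired"
    return status


def renew(record: dict, new_expires_at: datetime, rationale: str) -> dict:
    """Create a new version instead of extending the old record in place."""
    if not rationale.strip():
        raise ValueError("renewal requires a fresh rationale")
    return {**record, "version": record["version"] + 1, "status": "proposed",
            "expires_at": new_expires_at, "rationale": rationale}
```

Computing `effective_status` at evaluation time means expiry does not depend on a background sweeper having run, which keeps enforcement automatic.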
3.6 Evaluation semantics (how exceptions affect results)
This is where most products become non-auditable. You need deterministic, explicit rules.
PM guideline: define precedence clearly:
- Policy engine evaluates baseline findings → applies exceptions → produces verdict.
- Exceptions never delete underlying findings; they alter the decision outcome and annotate the reasoning.
Eng guideline: exception application must be:
- Deterministic (stable ordering rules)
- Transparent (verdict includes “exception applied: exception_id, reason_code, scope match explanation”)
- Scoped (match explanation must state which scope dimensions matched)
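A minimal sketch of these semantics, assuming findings and exceptions are plain dicts and that the verdict rule is simply "fail if any finding has no applicable exception" (a real policy engine would weigh severity and rule configuration). Field names such as `matched_dimensions` are illustrative.

```python
from datetime import datetime


def matched_dimensions(scope: dict, finding: dict):
    """Return the sorted scope dimensions that matched, or None on any mismatch."""
    if not scope:
        return None  # an empty scope would match everything; refuse it
    matched = []
    for dim in sorted(scope):
        if finding.get(dim) != scope[dim]:
            return None
        matched.append(dim)
    return matched


def evaluate(findings: list, exceptions: list, now: datetime) -> dict:
    # Stable ordering: the result must not depend on input order.
    active = sorted(
        (e for e in exceptions
         if e["status"] == "active" and e["expires_at"] > now),
        key=lambda e: e["exception_id"],
    )
    results = []
    ordered = sorted(findings,
                     key=lambda f: (f["artifact_digest"], f["vulnerability_id"]))
    for f in ordered:
        applied = None
        for e in active:
            dims = matched_dimensions(e["scope"], f)
            if dims is not None:
                applied = {"exception_id": e["exception_id"],
                           "reason_code": e["reason_code"],
                           "matched_dimensions": dims}
                break
        # Findings are annotated, never removed.
        results.append({**f, "exception_applied": applied})
    verdict = "pass" if all(r["exception_applied"] for r in results) else "fail"
    return {"verdict": verdict, "findings": results}
```

Because both inputs are sorted before matching, shuffling the findings or exceptions yields an identical result, and every waived finding carries the exception ID, reason code, and matched dimensions that explain the outcome.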
3.7 Auditability requirements
Exception management must be audit-ready by construction.
Minimum requirements:
- Append-only event log for create/approve/revoke/expire/renew actions
- Versioning: every change results in a new version or event
- Tamper-evidence: hash chain events or sign event batches
- Retention: define retention policy and export strategy
PM guideline: auditors will ask “who approved,” “why,” “when,” “what scope,” and “what changed since.” Design the UX and exports to answer those in minutes.
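The tamper-evidence requirement can be illustrated with a simple hash chain, where each event's hash covers the previous event's hash. This is a sketch assuming SHA-256 and JSON payloads; the entry layout is hypothetical, and a production design might sign batches instead.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first event


def _digest(prev_hash: str, payload: dict) -> str:
    # Canonical serialization so the same payload always hashes the same way.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_event(log: list, payload: dict) -> list:
    """Return a new log with the event chained onto the last entry."""
    prev = log[-1]["hash"] if log else GENESIS
    entry = {"prev_hash": prev, "payload": payload,
             "hash": _digest(prev, payload)}
    return log + [entry]


def verify_chain(log: list) -> bool:
    """Recompute every link; any edit, removal, or reordering breaks the chain."""
    prev = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev:
            return False
        if entry["hash"] != _digest(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True
```

A chain like this detects modification of past events but not truncation of the tail, so the latest hash would still need to be anchored somewhere outside the log (for example, in a signed export).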
3.8 UX guidelines
Key UX flows:
- Create exception from a finding (pre-fill CVE/component/artifact scope)
- Preview impact (“this will suppress 37 findings across 12 images; are you sure?”)
- Expiry visibility (countdown, alerts, renewal prompts)
- Audit trail view (who did what, with diffs between versions)
- Search and filters by owner, reason, expiry window, scope breadth, environment
UX anti-patterns to forbid:
- “Ignore all vulnerabilities in this image” with one click
- Silent suppressions without owner/expiry
- Exceptions created without linking to scope and reason
3.9 Product acceptance criteria (PM-owned)
A feature is not “done” until:
- Every exception has owner, expiry, reason code, scope.
- Exception history is immutable and exportable.
- Policy outcomes show applied exceptions and why.
- Expiry is enforced automatically.
- A user can answer: “What exceptions were active for this release?” within 2 minutes.
4. Audit packs
4.1 What an audit pack is
An Audit Pack is a portable, verifiable bundle that answers:
- What was evaluated? (artifacts, versions, identities)
- Under what policies? (policy version/config)
- Using what knowledge state? (vuln DB snapshot, VEX inputs)
- What exceptions were applied? (IDs, owners, rationales)
- What was the decision and why? (verdict + evidence pointers)
- What changed since the last release? (optional diff summary)
PM guideline: treat the Audit Pack as a product deliverable, not an export button.
4.2 Pack structure (recommended)
Use a predictable, documented layout. Example:
- manifest.json
  - pack_id, generated_at, generator_version
  - hashes/digests of every included file
  - signing info (optional in v1; recommended soon)
- inputs/
  - artifact identifiers (digests), repo references (optional)
  - SBOM(s) (CycloneDX/SPDX)
- vex/
  - VEX docs used + any VEX produced
- policy/
  - policy bundle used (versioned)
  - evaluation settings
- exceptions/
  - all exceptions relevant to the evaluated scope
  - plus event logs / versions
- findings/
  - normalized findings list
  - reachability evidence fragments if applicable
- verdict/
  - final decision object
  - explanation summary
  - signed attestation (if supported)
- diff/ (optional)
  - delta from prior baseline (what changed materially)
4.3 Formats: human and machine
You need both:
- Machine-readable (JSON + standard SBOM/VEX formats) for verification and automation
- Human-readable summary (HTML or PDF) for auditors and leadership
PM guideline: machine artifacts are the source of truth. Human docs are derived views.
Eng guideline:
- Ensure the pack can be generated offline.
- Ensure deterministic outputs where feasible (stable ordering, consistent serialization).
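"Stable ordering, consistent serialization" can be made concrete with a canonical JSON encoder and a fixed sort key for findings. The sketch below uses only the standard library; the sort key fields are assumptions about what a normalized finding contains.

```python
import json


def canonical_bytes(obj) -> bytes:
    """Serialize with sorted keys and fixed separators so equal data gives equal bytes."""
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False)
    return text.encode("utf-8")


def normalize_findings(findings: list) -> list:
    """Order findings by a fixed key so output does not depend on scan order."""
    return sorted(
        findings,
        key=lambda f: (f["artifact_digest"], f["component_purl"],
                       f["vulnerability_id"]),
    )
```

For example, `canonical_bytes({"b": 1, "a": 2})` and `canonical_bytes({"a": 2, "b": 1})` both yield `b'{"a":2,"b":1}'`, so the digest recorded in a manifest stays the same across runs.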
4.4 Integrity and verification
At minimum:
- manifest.json includes a digest for each file.
- Provide a stella verify-pack CLI that checks:
  - manifest integrity
  - file hashes
  - schema versions
  - optional signature verification
For v2:
- Sign the manifest (and/or the verdict) using your standard attestation mechanism.
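The checks a verifier performs can be sketched in a few lines. This is not the `stella verify-pack` implementation; it is an in-memory illustration that assumes SHA-256 digests, a hypothetical `schema_version` field, and a `files` mapping from pack-relative path to content.

```python
import hashlib

SUPPORTED_SCHEMAS = {"1"}  # hypothetical schema versions the verifier understands


def build_manifest(files: dict, pack_id: str, generated_at: str,
                   generator_version: str) -> dict:
    """Build a manifest listing a digest for every file, in sorted path order."""
    return {
        "schema_version": "1",
        "pack_id": pack_id,
        "generated_at": generated_at,
        "generator_version": generator_version,
        "files": {path: hashlib.sha256(files[path]).hexdigest()
                  for path in sorted(files)},
    }


def verify_pack(manifest: dict, files: dict) -> list:
    """Return a list of problems; an empty list means the pack verified."""
    problems = []
    if manifest.get("schema_version") not in SUPPORTED_SCHEMAS:
        problems.append("unsupported schema version")
    listed = manifest.get("files", {})
    for path in sorted(listed):
        if path not in files:
            problems.append(f"missing file: {path}")
        elif hashlib.sha256(files[path]).hexdigest() != listed[path]:
            problems.append(f"digest mismatch: {path}")
    # Completeness: nothing may be present that the manifest does not list.
    for path in sorted(set(files) - set(listed)):
        problems.append(f"unlisted file: {path}")
    return problems
```

Hash checks alone only prove the files match the manifest; without a signature over the manifest, someone who can rewrite both can still pass verification, which is why signing is the natural next step.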
4.5 Confidentiality and redaction
Audit packs often include sensitive data (paths, internal package names, repo URLs).
PM guideline:
- Provide redaction profiles:
  - external auditor pack (minimal identifiers)
  - internal audit pack (full detail)
- Provide encryption options (password/recipient keys) if packs leave the environment.
Eng guideline:
- Redaction must be deterministic and declarative (policy-based).
- Pack generation must not leak secrets from raw scan logs.
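A declarative, deterministic redaction step might look like the following sketch. The profile names follow the two profiles above, while the field names and the "drop"/"hash" actions are assumptions chosen for illustration.

```python
import hashlib

# Hypothetical profiles: each maps a field name to an action.
# Fields not listed are kept unchanged.
PROFILES = {
    "internal": {},
    "external": {
        "repo_url": "drop",
        "file_path": "hash",
        "component_purl": "hash",
    },
}


def redact(record: dict, profile_name: str) -> dict:
    """Apply a redaction profile; the same input always gives the same output."""
    rules = PROFILES[profile_name]
    out = {}
    for key in sorted(record):
        action = rules.get(key, "keep")
        if action == "drop":
            continue
        value = record[key]
        if action == "hash":
            digest = hashlib.sha256(str(value).encode("utf-8")).hexdigest()
            value = "redacted:" + digest[:16]
        out[key] = value
    return out
```

Hashing keeps redacted values correlatable across findings without revealing them, but unsalted hashes of guessable values (such as public package names) can be reversed by enumeration, so a keyed hash may be more appropriate for external packs.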
4.6 Pack generation workflow
Key product flows:
- Generate pack for:
  - a specific artifact digest
  - a release (set of digests)
  - an environment snapshot (e.g., cluster inventory)
  - a date range (for audit period)
- Trigger sources:
  - UI
  - API
  - CI pipeline step
Engineering:
- Treat pack generation as an async job (queue + status endpoint).
- Cache pack components when inputs are identical (avoid repeated work).
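One possible shape for the async job and caching behaviour is to derive the job ID from a digest of the inputs, so identical requests map to the same job. This in-memory sketch is an assumption-laden illustration: `PackJobs`, its methods, and the input names are hypothetical, and a real service would use a queue and persistent storage.

```python
import hashlib
import json


def cache_key(artifact_digests, policy_digest, snapshot_id, profile) -> str:
    """Derive a key from the inputs; input order of digests does not matter."""
    payload = {
        "artifacts": sorted(artifact_digests),
        "policy": policy_digest,
        "snapshot": snapshot_id,
        "profile": profile,
    }
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


class PackJobs:
    """Toy job registry: submit returns immediately, status is polled later."""

    def __init__(self):
        self._jobs = {}

    def submit(self, artifact_digests, policy_digest, snapshot_id, profile) -> str:
        job_id = cache_key(artifact_digests, policy_digest, snapshot_id, profile)
        # Identical inputs reuse the existing job (and its result, if finished).
        self._jobs.setdefault(job_id, {"status": "queued", "pack": None})
        return job_id

    def complete(self, job_id: str, pack: bytes) -> None:
        self._jobs[job_id] = {"status": "done", "pack": pack}

    def status(self, job_id: str) -> str:
        return self._jobs[job_id]["status"]
```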
4.7 What must be included (minimum viable audit pack)
PMs should enforce that v1 includes:
- Artifact identity
- SBOM(s) or component inventory
- Findings list (normalized)
- Policy bundle reference + policy content
- Exceptions applied (full object + version info)
- Final verdict + explanation summary
- Integrity manifest with file hashes
Add these when available (v1.5+):
- VEX inputs and outputs
- Knowledge snapshot references
- Reachability evidence fragments
- Diff summary vs prior release
4.8 Product acceptance criteria (PM-owned)
Audit Packs are not “done” until:
- A third party can validate the pack contents haven’t been altered (hash verification).
- The pack answers “why did this pass/fail?” including exceptions applied.
- Packs can be generated without external network calls (air-gap friendly).
- Packs support redaction profiles.
- Pack schema is versioned and backward compatible.
5. Cross-cutting: roles, responsibilities, and delivery checkpoints
5.1 Responsibilities
Product Manager
- Define exception types and required fields
- Define reason code taxonomy and governance policies
- Define approval rules by environment and scope breadth
- Define audit pack templates, profiles, and export targets
- Own acceptance criteria and audit usability testing
Development Manager / Tech Lead
- Own event model (immutability, versioning, retention)
- Own policy evaluation semantics and determinism guarantees
- Own integrity and signing design (manifest hashes, optional signatures)
- Own performance and scalability targets (pack generation and query latency)
- Own secure storage and access controls (RBAC, tenant isolation)
5.2 Deliverables checklist (for each capability)
For “Exception Objects”:
- PRD + threat model (abuse cases: blanket waivers, privilege escalation)
- Schema spec + versioning policy
- API endpoints + RBAC model
- UI flows + audit trail UI
- Policy engine semantics + test vectors
- Metrics dashboards
For “Audit Packs”:
- Pack schema spec + folder layout
- Manifest + hash verification rules
- Generator service + async job API
- Redaction profiles + tests
- Verifier CLI + documentation
- Performance benchmarks + caching strategy
6. Common failure modes to actively prevent
- Exceptions become suppressions again: if you allow exceptions without expiry/owner or without audit trail, you’ve rebuilt “ignore lists.”
- Over-broad scopes by default: if “all repos/all images” is easy, you will accumulate permanent waivers and lose credibility.
- No deterministic semantics: if the same artifact can pass/fail depending on evaluation order or transient feed updates, auditors will distrust outputs.
- Audit packs that are reports, not evidence: a PDF without machine-verifiable artifacts is not an audit pack; it’s a slide.
- No renewal discipline: if renewals are frictionless and don’t require re-justification, exceptions never die.
7. Recommended phased rollout (to manage build cost)
Phase 1: Governance basics
- Exception object schema + lifecycle + expiry enforcement
- Create-from-finding UX
- Audit pack v1 (SBOM/inventory + findings + policy + exceptions + manifest)
Phase 2: Evidence binding
- Evidence refs on exceptions (VEX, reachability fragments)
- Pack includes VEX inputs/outputs and knowledge snapshot identifiers
Phase 3: Verifiable trust
- Signed verdicts and/or signed pack manifests
- Verifier tooling and deterministic replay hooks