save progress

This commit is contained in:
StellaOps Bot
2025-12-20 12:15:16 +02:00
parent 439f10966b
commit 0ada1b583f
95 changed files with 12400 additions and 65 deletions


@@ -0,0 +1,469 @@
I'm sharing a **competitive security-tool matrix** that you can plug directly into StellaOps strategy discussions: it maps real, *comparable evidence* from public sources to categories where most current tools fall short. Below the CSV is a short Markdown commentary that highlights gaps and opportunities StellaOps can exploit.
---
## 🧠 Competitive Security Tool Matrix (CSV)
**Columns:**
`Tool,SBOM Fidelity,VEX Handling,Explainability,SmartDiff,CallStack Reachability,Deterministic Scoring,Unknowns State,Ecosystem Integrations,Policy Engine,Offline/AirGapped,Provenance/Attestations,Public Evidence`
```
Tool,SBOM Fidelity,VEX Handling,Explainability,SmartDiff,CallStack Reachability,Deterministic Scoring,Unknowns State,Ecosystem Integrations,Policy Engine,Offline/AirGapped,Provenance/Attestations,Public Evidence
Trivy (open),CycloneDX/SPDX support (basic),Partial* (SBOM external references),Low,No,No,Moderate,No,Strong CI/CD/K8s,Minimal,Unknown,SBOM-only evidence; VEX support request exists but unmerged*,See public evidence links below
Grype/Syft,Strong CycloneDX/SPDX (generator + scanner),None documented,Low,No,No,Moderate,No,Strong CI/CD/K8s,Policy minimal,Unknown,Syft can create signed SBOMs but not full attestations,See public evidence links below
Snyk,SBOM export likely (platform),Unknown/limited,Vulnerability-context explainability (reports),No,No,Proprietary risk scoring,No,Partial integrations,Strong allow/deny list policies in UI,Unknown,Unknown (not focused on attestations),See public evidence links below
Prisma Cloud,Enterprise SBOM + vuln scanning,Runtime exploitability contexts?*,Enterprise dashboards,No formal smart-diff,No,Risk prioritization,No,Supports multi-cloud integrations,Rich policy engines (CNAPP),Supports offline deployment?,Unknown attestation capabilities,See public evidence links below
Aqua (enterprise),SBOM via Trivy,Unknown commercial VEX support,Some explainability in reports,No documented smart-diff,No,Risk prioritization,No,Comprehensive integrations (cloud/CI/CD/SIEM),Enterprise policy supports compliance,Air-gapped options in enterprise,Focus on compliance attestations?,See public evidence links below
Anchore Enterprise,Strong SBOM mgmt + format support,Policy engine can ingest SBOM + vulnerability sources,Moderate (reports & SBOM insights),Potential policy diff,No explicit reachability analysis,Moderate policy scoring,Partial,Rich integrations (CI/CD/registry),Policy-as-code,Air-gapped deploy supported,SBOM provenance & signing via Syft/in-toto,See public evidence links below
StellaOps,High-fidelity SBOM (CycloneDX/SPDX) planned,Native VEX ingestion + decisioning,Explainability + proof extracts,Smart-diff tech planned,Call-stack reachability analysis,Deterministic scoring with proofs,Explicit unknowns state,Integrations with CI/CD/Sigstore,Declarative multimodal policy engine,Full offline/air-gapped support,Provenance/attestations via DSSE/in-toto,StellaOps internal vision
```
---
## 📌 Key Notes, Gaps & Opportunities (Markdown)
### **SBOM Fidelity**
* **Open tools (Trivy, Syft)** already support CycloneDX/SPDX output, but mostly as flat SBOM artifacts without long-term repositories or versioned diffing. ([Ox Security][1])
* **Opportunity:** Provide *repository + lineage + merge semantics* with proofs — not just generation.
### **VEX Handling**
* Trivy has an open feature request for dynamic VEX ingestion. ([GitHub][2])
* Most competitors either lack VEX support or have no *decisioning logic* based on exploitability.
* **Opportunity:** First-class VEX ingestion with evaluation rules + automated scoring.
### **Explainability**
* Commercial tools (Prisma/Snyk) offer UI report context and dev-oriented remediation guidance. ([Snyk][3])
* OSS tools provide flat scan outputs with minimal causal trace.
* **Opportunity:** Link vulnerability flags back to *proven code paths*, enriched with SBOM + call reachability.
### **SmartDiff & Unknowns State**
* No major tool advertises *smart diffing* between SBOMs for incremental risk deltas across releases.
* **Opportunity:** Automate risk deltas between SBOMs with uncertainty margins.
### **CallStack Reachability**
* None of these tools publicly documents call-stack-based exploit reachability analysis out of the box.
* **Opportunity:** Integrate dynamic/static reachability evidence that elevates scanning from surface report → *impact map*.
### **Deterministic Scoring**
* Snyk & Prisma offer proprietary scoring that blends severity + context. ([TrustRadius][4])
* But these aren't reproducible with *signed verdicts*.
* **Opportunity:** Provide *deterministic, attestable scoring proofs*.
### **Ecosystem Integrations**
* Trivy/Grype excel at lightweight CI/CD and Kubernetes. ([Echo][5])
* Enterprise products integrate deeply into cloud/registry. ([Palo Alto Networks][6])
* **Opportunity:** Expand *sigstore/notation* based pipelines and automated attestation flows.
### **Policy Engine**
* Prisma & Aqua have mature enterprise policies. ([Aqua][7])
* OSS tools offer only simple allow/deny rules.
* **Opportunity:** Provide *lattice/constraint policies* with proof outputs.
### **Offline/AirGapped**
* Anchore supports airgapped deployment in enterprise contexts. ([Anchore][8])
* Support across all open tools is ad hoc at best.
* **Opportunity:** Built-in deterministic offline modes with offline SBOM stores and VEX ingestion.
### **Provenance/Attestations**
* Syft supports SBOM output in various formats, plus *in-toto* attestations. ([Ox Security][1])
* Most competitors don't prominently advertise *attestation pipelines*.
* **Opportunity:** End-to-end DSSE/in-toto provenance with immutable proofs.
---
### 📌 Public Evidence Links
* **Trivy / Syft / Grype SBOM support & formats:** CycloneDX/SPDX; Syft as generator + Grype scanner. ([Ox Security][1])
* **Trivy VEX feature request:** exists but not mainstream yet. ([GitHub][2])
* **Snyk platform capability:** scans containers, IaC, dev-first prioritization. ([Snyk][3])
* **Prisma Cloud container security:** lifecycle scanning + policy. ([Palo Alto Networks][6])
* **Anchore Enterprise SBOM mgmt & policy:** central imports + CI/CD ecosystem. ([Anchore Docs][9])
---
If you'd like this in **Excel, JSON, or Mermaid graph form** next, I can generate it; just tell me the format you prefer.
[1]: https://www.ox.security/blog/sbom-tools/?utm_source=chatgpt.com "Top 5 SBOM Tools for Securing the Software Supply Chain"
[2]: https://github.com/aquasecurity/trivy/discussions/8047?utm_source=chatgpt.com "Support for dynamic VEX retrieval from \"External ..."
[3]: https://snyk.io/?utm_source=chatgpt.com "Snyk AI-powered Developer Security Platform | AI-powered ..."
[4]: https://www.trustradius.com/compare-products/palo-alto-networks-prisma-cloud-vs-snyk?utm_source=chatgpt.com "Palo Alto Networks Prisma Cloud vs Snyk"
[5]: https://www.echohq.com/blog/best-container-scanning-tools?utm_source=chatgpt.com "10 Best Container Scanning Tools for 2025"
[6]: https://www.paloaltonetworks.com/prisma/cloud/container-security?utm_source=chatgpt.com "Container Security Solutions"
[7]: https://www.aquasec.com/integrations/?utm_source=chatgpt.com "Integrations - Aqua Security"
[8]: https://anchore.com/blog/effortless-sbom-analysis-how-anchore-enterprise-simplifies-integration/?utm_source=chatgpt.com "Effortless SBOM Analysis with Anchore Enterprise"
[9]: https://docs.anchore.com/current/docs/sbom_management/?utm_source=chatgpt.com "SBOM Management"
Below is a **deep, implementation-grade elaboration** that turns the comparison matrix into **actionable architecture and product differentiation** for Stella Ops. I will deliberately avoid marketing language and focus on **mechanics, evidence, and execution gaps**.
---
## 1. Why existing tools plateau (root cause analysis)
Across Trivy, Syft/Grype, Snyk, Prisma, Aqua, Anchore, there is a **structural ceiling** they all hit — regardless of OSS vs enterprise.
### Shared structural limitations
1. **SBOM is treated as a static artifact**
* Generated → stored → scanned.
* No concept of *evolving truth*, lineage, or replayability.
2. **Vulnerability scoring is probabilistic, not provable**
* CVSS + vendor heuristics.
* Cannot answer: *“Show me why this CVE is exploitable here.”*
3. **Exploitability ≠ reachability**
* “Runtime context” ≠ call-path proof.
4. **Diffing is file-level, not semantic**
* Image hash change ≠ security delta understanding.
5. **Offline support is operational, not epistemic**
* You can run it offline, but you cannot **prove** what knowledge state was used.
These are not accidental omissions. They arise from **tooling lineage**:
* Trivy/Syft grew from *package scanners*
* Snyk grew from *developer remediation UX*
* Prisma/Aqua grew from *policy & compliance platforms*
None were designed around **forensic reproducibility or trust algebra**.
---
## 2. SBOM fidelity: what “high fidelity” actually means
Most tools claim CycloneDX/SPDX support. That is **necessary but insufficient**.
### Current reality
| Dimension | Industry tools |
| ----------------------- | ---------------------- |
| Component identity | Package name + version |
| Binary provenance | Weak or absent |
| Build determinism | None |
| Dependency graph | Flat or shallow |
| Layer attribution | Partial |
| Rebuild reproducibility | Not supported |
### What Stella Ops must do differently
**SBOM must become a *stateful ledger*, not a document.**
Concrete requirements:
* **Component identity = (source + digest + build recipe hash)**
* **Binary → source mapping**
* ELF Build-ID / Mach-O UUID / PE timestamp+hash
* **Layer-aware dependency graphs**
* Not “package depends on X”
* But “binary symbol A resolves to shared object B via loader rule C”
* **Replay manifest**
* Exact feeds
* Exact policies
* Exact scoring rules
* Exact timestamps
* Hash of everything
This is the foundation for *deterministic replayable scans* — something none of the competitors even attempt.
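As a minimal sketch (field names here are illustrative, not the StellaOps schema), the replay manifest reduces to a content-addressed object: every input that can change the scan outcome is pinned by digest, and the manifest itself gets a digest.
```python
import hashlib
import json

def canonical_digest(obj: dict) -> str:
    """Hash a canonical JSON form: sorted keys, fixed separators, UTF-8."""
    payload = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return "sha256:" + hashlib.sha256(payload).hexdigest()

# Hypothetical replay manifest: every input that influences the scan outcome
# is pinned by digest, so the manifest itself is content-addressed.
replay_manifest = {
    "feeds": ["sha256:aaaa...", "sha256:bbbb..."],   # exact vulnerability feeds
    "policy_bundle": "sha256:cccc...",               # exact policies
    "scoring_rules": "sha256:dddd...",               # exact scoring rules
    "toolchain": {"scanner": "1.4.2", "graph_builder": "0.9.0"},
    "evaluated_at": "2025-12-19T00:00:00Z",          # recorded for audit
}

manifest_digest = canonical_digest(replay_manifest)
print(manifest_digest)  # same inputs -> same digest, forever
```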
---
## 3. VEX handling: ingestion vs decisioning
Most vendors misunderstand VEX.
### What competitors do
* Accept VEX as:
* Metadata
* Annotation
* Suppression rule
* No **formal reasoning** over VEX statements.
### What Stella Ops must do
VEX is not a comment — it is a **logical claim**.
Each VEX statement:
```
IF
product == X
AND component == Y
AND version in range Z
THEN
status ∈ {not_affected, affected, fixed, under_investigation}
BECAUSE
justification J
WITH
evidence E
```
Stella Ops advantage:
* VEX statements become **inputs to a lattice merge**
* Conflicting VEX from:
* Vendor
* Distro
* Internal analysis
* Runtime evidence
* Are resolved **deterministically** via policy, not precedence hacks.
This unlocks:
* Vendor-supplied proofs
* Customer-supplied overrides
* Jurisdiction-specific trust rules
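A minimal sketch of that lattice-style merge, assuming illustrative trust weights and a conservative rule: a safe status can only win when its evidence requirements are satisfied and it outweighs every unsafe claim.
```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VexClaim:
    issuer: str          # "vendor", "distro", "internal", "runtime"
    status: str          # "not_affected" | "affected" | "fixed" | "under_investigation"
    evidence_ok: bool    # were the policy's evidence requirements satisfied?

# Illustrative policy knobs: trust weights per issuer category
TRUST = {"vendor": 70, "distro": 75, "internal": 85, "runtime": 90}
SAFE = {"not_affected", "fixed"}

def resolve(claims: list[VexClaim]) -> str:
    """Deterministic merge: unsafe claims win unless a safe claim with
    satisfied evidence carries strictly higher trust."""
    claims = sorted(claims, key=lambda c: (TRUST[c.issuer], c.issuer), reverse=True)
    best_unsafe = max((TRUST[c.issuer] for c in claims if c.status not in SAFE), default=0)
    for c in claims:
        if c.status in SAFE and c.evidence_ok and TRUST[c.issuer] > best_unsafe:
            return c.status
    for c in claims:
        if c.status not in SAFE:
            return c.status
    return "unknown"   # safe claims without evidence never win
```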
---
## 4. Explainability: reports vs proofs
### Industry “explainability”
* “This vulnerability is high because…”
* Screenshots, UI hints, remediation text.
### Required explainability
Security explainability must answer **four non-negotiable questions**:
1. **What exact evidence triggered this finding?**
2. **What code or binary path makes it reachable?**
3. **What assumptions are being made?**
4. **What would falsify this conclusion?**
No existing scanner answers #4.
### Stella Ops model
Each finding emits:
* Evidence bundle:
* SBOM nodes
* Call-graph edges
* Loader resolution
* Runtime symbol presence
* Assumption set:
* Compiler flags
* Runtime configuration
* Feature gates
* Confidence score **derived from evidence density**, not CVSS
This is explainability suitable for:
* Auditors
* Regulators
* Courts
* Defense procurement
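As an illustration only (the formula and field names are hypothetical), a finding can carry its evidence bundle and assumption set directly, with confidence derived from evidence density rather than CVSS.
```python
from dataclasses import dataclass, field

@dataclass
class Finding:
    vuln_id: str
    evidence: dict = field(default_factory=dict)     # sbom_nodes, call_edges, runtime_symbols, ...
    assumptions: list = field(default_factory=list)  # compiler flags, config, feature gates

    def confidence(self) -> float:
        """Confidence from evidence density, penalized per open assumption
        (illustrative weighting, not a product formula)."""
        density = sum(len(v) for v in self.evidence.values())
        penalty = 0.1 * len(self.assumptions)
        return max(0.0, min(1.0, density / (density + 4) - penalty))

f = Finding(
    vuln_id="CVE-XXXX-YYYY",
    evidence={"sbom_nodes": ["pkg:maven/org.example/foo@1.2.3"],
              "call_edges": ["handler->parse", "parse->vulnerable_fn"],
              "runtime_symbols": ["vulnerable_fn"]},
    assumptions=["config unknown: feature flag state not captured"],  # also the falsifier
)
print(round(f.confidence(), 2))
```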
---
## 5. Smart-Diff: the missing primitive
All tools compare:
* Image A vs Image B
* Result: *“+3 CVEs, -1 CVE”*
This is **noise-centric diffing**.
### What Smart-Diff must mean
Diff not *artifacts*, but **security meaning**.
Examples:
* Same CVE remains, but:
* Call path removed → risk collapses
* New binary added, but:
* Dead code → no reachable risk
* Dependency upgraded, but:
* ABI unchanged → no exposure delta
Implementation direction:
* Diff **reachability graphs**
* Diff **policy outcomes**
* Diff **trust weights**
* Diff **unknowns**
Output:
> “This release reduces exploitability surface by 41%, despite +2 CVEs.”
No competitor does this.
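A minimal sketch of what such a semantic diff could compute, assuming both releases expose a set of reachable vulnerable CVEs, a raw CVE list, and an unknowns count (shapes are illustrative).
```python
def smart_diff(baseline: dict, target: dict) -> dict:
    """Diff security meaning, not artifacts: compare reachable-vulnerable sets
    and unknowns rather than raw CVE counts."""
    new_reachable = target["reachable_vulns"] - baseline["reachable_vulns"]
    removed_reachable = baseline["reachable_vulns"] - target["reachable_vulns"]
    surface_before = len(baseline["reachable_vulns"]) or 1
    surface_change = (len(target["reachable_vulns"]) - len(baseline["reachable_vulns"])) / surface_before
    return {
        "new_reachable": sorted(new_reachable),
        "removed_reachability": sorted(removed_reachable),
        "unknowns_delta": target["unknowns"] - baseline["unknowns"],
        "cve_count_delta": len(target["cves"]) - len(baseline["cves"]),
        "exploitable_surface_change_pct": round(100 * surface_change),
    }

baseline = {"reachable_vulns": {"CVE-1", "CVE-2", "CVE-3"},
            "cves": ["CVE-1", "CVE-2", "CVE-3"], "unknowns": 4}
target   = {"reachable_vulns": {"CVE-3"},
            "cves": ["CVE-1", "CVE-2", "CVE-3", "CVE-4", "CVE-5"], "unknowns": 3}
print(smart_diff(baseline, target))
# exploitable surface down ~67% even though the raw CVE count went up by 2
```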
---
## 6. Call-stack reachability: why runtime context isn't enough
### Current vendor claim
“Runtime exploitability analysis.”
Reality:
* Usually:
* Process exists
* Library loaded
* Port open
This is **coarse correlation**, not proof.
### Stella Ops reachability model
Reachability requires **three layers**:
1. **Static call graph**
* From entrypoints to vulnerable symbols
2. **Binary resolution**
* Dynamic loader rules
* Symbol versioning
3. **Runtime gating**
* Feature flags
* Configuration
* Environment
Only when **all three align** does exploitability exist.
This makes false positives *structurally impossible*, not heuristically reduced.
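A minimal sketch of the three-layer alignment check, assuming the inputs come from the static analyzer, the loader-resolution model, and the runtime/config snapshot respectively.
```python
def exploitable(static_path_exists: bool,
                symbol_resolved_at_load: bool,
                runtime_gate_open: bool | None) -> str:
    """Exploitability requires all three layers to align; a missing layer
    yields an explicit unknown instead of a silent pass/fail."""
    if runtime_gate_open is None:
        return "unknown"            # e.g. feature-flag state not captured
    if static_path_exists and symbol_resolved_at_load and runtime_gate_open:
        return "reachable"
    return "not_proven_reachable"

print(exploitable(True, True, None))   # unknown: runtime gating evidence missing
print(exploitable(True, True, False))  # not_proven_reachable: the flag closes the path
```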
---
## 7. Deterministic scoring: replacing trust with math
Every competitor uses:
* CVSS
* EPSS
* Proprietary weighting
Problem:
* Scores are **non-reproducible**
* Cannot be attested
* Cannot be audited
### Stella Ops scoring
Score = deterministic function of:
* Evidence count
* Evidence strength
* Assumption penalties
* Trust source weights
* Policy constraints
Same inputs → same outputs → forever.
This enables:
* Signed risk decisions
* Cross-org verification
* Legal defensibility
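A minimal sketch of such a scoring function, using integer-only arithmetic so identical inputs always produce identical outputs (the weights and formula are illustrative, not a product spec).
```python
def deterministic_score(evidence_strengths: list[int],
                        assumption_count: int,
                        trust_weight: int,
                        policy_cap: int = 100) -> int:
    """Pure, integer-only scoring so identical inputs always yield the same
    signable result: no floats, no wall clock, no external lookups."""
    evidence = sum(sorted(evidence_strengths))      # stable order, no float math
    penalty = 5 * assumption_count
    raw = (evidence * trust_weight) // 100 - penalty
    return max(0, min(policy_cap, raw))

# Same inputs -> same output, forever; the score can therefore be attested.
print(deterministic_score([30, 40, 20], assumption_count=1, trust_weight=85))
```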
---
## 8. Unknowns as a first-class state
Industry tools suppress uncertainty.
Stella Ops must **surface it**.
States:
* Known-safe
* Known-vulnerable
* **Unknown-reachable**
* **Unknown-unreachable**
Unknowns are **risk**, but different from vulnerabilities.
This is critical for:
* Air-gapped environments
* Novel exploits
* Zero-day windows
No competitor models this explicitly.
---
## 9. Offline / air-gapped: epistemic completeness
Competitors:
* “Yes, we can run offline.”
But cannot answer:
> “Exactly what knowledge did you have when you made this decision?”
Stella Ops must:
* Snapshot feeds
* Snapshot policies
* Snapshot scoring rules
* Snapshot trust anchors
Then bind them cryptographically to scan results.
This creates **forensic replayability**, not just offline execution.
---
## 10. Strategic conclusion (no hype)
### Where Stella Ops can **objectively dominate**
1. Deterministic replayable scans
2. Formal VEX reasoning
3. Reachability-backed exploitability
4. Semantic smart-diff
5. Evidence-first explainability
6. Unknowns modeling
7. Jurisdiction-ready offline trust
### Where competitors remain stronger (for now)
* Mass-market UX polish
* Low-friction SaaS onboarding
* Marketplace integrations
These are solvable later. The **hard parts** above are not.
---
If you want, next I can:
* Convert this into **developer implementation milestones**
* Map features → **moat strength vs competitors**
* Design the **Trust Algebra / Lattice Engine spec**
* Produce a **regulator-grade threat & evidence model**
Tell me which vector you want next.


@@ -0,0 +1,366 @@
# A. Executive directive (send as-is to both PM + Dev)
1. **A “Release” is not an SBOM or a scan report. A Release is a “Security State Snapshot.”**
* A snapshot is a **versioned, content-addressed bundle** containing:
* SBOM graph (canonical form, hashed)
* Reachability graph (canonical form, hashed)
* VEX claim set (canonical form, hashed)
* Policies + rule versions used (hashed)
* Data-feed identifiers used (hashed)
* Toolchain versions (hashed)
2. **Diff is a product primitive, not a UI feature.**
* “Diff” must exist as a stable API and artifact, not a one-off report.
* Every comparison produces a **Delta object** (machine-readable) and a **Delta Verdict attestation** (signed).
3. **The CI/CD gate should never ask “how many CVEs?”**
* It should ask: **“What materially changed in exploitable risk since the last approved baseline?”**
* The Delta Verdict must be deterministically reproducible given the same snapshots and policy.
4. **Every Delta Verdict must be portable and auditable.**
* It must be a signed attestation that can be stored with the build artifact (OCI attach) and replayed offline.
---
# B. Product Management directions
## B1) Define the product concept: “Security Delta as the unit of governance”
**Position the capability as change-control for software risk**, not as “a scanner with comparisons.”
### Primary user stories (MVP)
1. **Release Manager / Security Engineer**
* “Compare the candidate build to the last approved build and explain *what changed* in exploitable risk.”
2. **CI Pipeline Owner**
* “Fail the build only for *new* reachable high-risk exposures (or policy-defined deltas), not for unchanged legacy issues.”
3. **Auditor / Compliance**
* “Show a signed delta verdict with evidence references proving why this release passed.”
### MVP “Delta Verdict” policy questions to support
* Are there **new reachable vulnerabilities** introduced?
* Did any **previously unreachable vulnerability become reachable**?
* Are there **new affected VEX states** (e.g., NOT_AFFECTED → AFFECTED)?
* Are there **new Unknowns** above a threshold?
* Is the **net exploitable surface** increased beyond policy budget?
## B2) Define the baseline selection rules (product-critical)
Diff is meaningless without a baseline contract. Product must specify baseline selection as a first-class choice.
Minimum baseline modes:
* **Previous build in the same pipeline**
* **Last “approved” snapshot** (from an approval gate)
* **Last deployed in environment X** (optional later, but roadmap it)
Acceptance criteria:
* The delta object must always contain:
* `baseline_snapshot_digest`
* `target_snapshot_digest`
* `baseline_selection_method` and identifiers
## B3) Define the delta taxonomy (what your product “knows” how to talk about)
Avoid “diffing findings lists.” You need consistent delta categories.
Minimum taxonomy:
1. **SBOM deltas**
* Component added/removed
* Component version change
* Dependency edge change (graph-level)
2. **VEX deltas**
* Claim added/removed
* Status change (e.g., under_investigation → fixed)
* Justification/evidence change (optional MVP)
3. **Reachability deltas**
* New reachable vulnerable symbol(s)
* Removed reachability
* Entry point changes
4. **Decision deltas**
* Policy outcome changed (PASS → FAIL)
* Explanation changed (drivers of decision)
PM deliverable:
* A one-page **Delta Taxonomy Spec** that becomes the canonical list used across API, UI, and attestations.
## B4) Define what “signed delta verdict” means in product terms
A delta verdict is not a PDF.
It is:
* A deterministic JSON payload
* Wrapped in a signature envelope (DSSE)
* Attached to the artifact (OCI attach)
* Includes pointers (hash references) to evidence graphs
PM must define:
* Where customers can view it (UI + CLI)
* Where it lives (artifact registry + Stella store)
* How it is consumed (policy gate, audit export)
## B5) PM success metrics (must be measurable)
* % of releases gated by delta verdict
* Mean time to explain “why failed”
* Reduction in “unchanged legacy vuln” false gating
* Reproducibility rate: same inputs → same verdict (target: 100%)
---
# C. Development Management directions
## C1) Architecture: treat Snapshot and Delta as immutable, content-addressed objects
You need four core services/modules:
1. **Canonicalization + Hashing**
* Deterministic serialization (stable field ordering, normalized IDs)
* Content addressing: every graph and claim set gets a digest
2. **Snapshot Store (Ledger)**
* Store snapshots keyed by digest
* Store relationships: artifact → snapshot, snapshot → predecessor(s)
* Must support offline export/import later (design now)
3. **Diff Engine**
* Inputs: `baseline_snapshot_digest`, `target_snapshot_digest`
* Outputs:
* `delta_object` (structured)
* `delta_summary` (human-friendly)
* Must be deterministic and testable with golden fixtures
4. **Verdict Engine + Attestation Writer**
* Evaluate policies against delta
* Produce `delta_verdict`
* Wrap as DSSE / in-toto-style statement (or your chosen predicate type)
* Sign and optionally attach to OCI artifact
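A minimal sketch of modules 1 and 2 above: deterministic canonicalization plus a content-addressed snapshot store (names and shapes are illustrative).
```python
import hashlib
import json

def canonicalize(obj) -> bytes:
    """Deterministic serialization: sorted keys, fixed separators, UTF-8."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")

def digest(obj) -> str:
    return "sha256:" + hashlib.sha256(canonicalize(obj)).hexdigest()

class SnapshotStore:
    """In-memory stand-in for the ledger: immutable objects keyed by content."""
    def __init__(self):
        self._objects = {}

    def put(self, obj) -> str:
        key = digest(obj)
        self._objects[key] = canonicalize(obj)
        return key

    def get(self, key: str):
        return json.loads(self._objects[key])

store = SnapshotStore()
snap = {"artifact_ref": "sha256:deadbeef",
        "sbom_graph_digest": "sha256:...",
        "vex_claimset_digest": "sha256:..."}
snapshot_id = store.put(snap)
assert store.put(snap) == snapshot_id   # same content -> same address
```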
## C2) Data model (minimum viable schemas)
### Snapshot (conceptual fields)
* `snapshot_id` (digest)
* `artifact_ref` (e.g., image digest)
* `sbom_graph_digest`
* `reachability_graph_digest`
* `vex_claimset_digest`
* `policy_bundle_digest`
* `feed_snapshot_digest`
* `toolchain_digest`
* `created_at`
### Delta object (conceptual fields)
* `delta_id` (digest)
* `baseline_snapshot_digest`
* `target_snapshot_digest`
* `sbom_delta` (structured)
* `reachability_delta` (structured)
* `vex_delta` (structured)
* `unknowns_delta` (structured)
* `derived_risk_delta` (structured)
* `created_at`
### Delta verdict attestation (must include)
* Subjects: artifact digest(s)
* Baseline snapshot digest + Target snapshot digest
* Policy bundle digest
* Verdict enum: PASS/WARN/FAIL
* Drivers: references to delta nodes (hash pointers)
* Signature metadata
## C3) Determinism requirements (non-negotiable)
Development must implement:
* **Canonical ID scheme** for components and graph nodes
(example: package URL + version + supplier + qualifiers, then hashed)
* Stable sorting for node/edge lists
* Stable normalization of timestamps (do not include wall-clock in hash inputs unless explicitly policy-relevant)
* A “replay test harness”:
* Given the same inputs, byte-for-byte identical snapshot/delta/verdict
Definition of Done:
* Golden test vectors for snapshots and deltas checked into repo
* Deterministic hashing tests in CI
## C4) Graph diff design (how to do it without drowning in noise)
### SBOM graph diff (MVP)
Implement:
* Node set delta: added/removed/changed nodes (by stable node ID)
* Edge set delta: added/removed edges (dependency relations)
* A “noise suppressor” layer:
* ignore ordering differences
* ignore metadata-only changes unless policy enables
Output should identify:
* “What changed?” (added/removed/upgraded/downgraded)
* “Why it matters?” (ties to vulnerability & reachability where available)
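A minimal sketch of the node/edge set delta described above, assuming graphs are keyed by stable node IDs (shapes are illustrative).
```python
def sbom_graph_diff(baseline: dict, target: dict) -> dict:
    """Node/edge set delta keyed by stable node IDs; ordering differences are
    irrelevant because sets are compared, and metadata-only changes surface
    separately so policy can choose to ignore them."""
    b_nodes, t_nodes = baseline["nodes"], target["nodes"]
    common = set(b_nodes) & set(t_nodes)
    return {
        "nodes_added": sorted(set(t_nodes) - set(b_nodes)),
        "nodes_removed": sorted(set(b_nodes) - set(t_nodes)),
        "nodes_changed": sorted(n for n in common if b_nodes[n] != t_nodes[n]),
        "edges_added": sorted(target["edges"] - baseline["edges"]),
        "edges_removed": sorted(baseline["edges"] - target["edges"]),
    }

baseline = {"nodes": {"pkg:npm/lodash@4.17.20": {"license": "MIT"}},
            "edges": {("app", "pkg:npm/lodash@4.17.20")}}
target   = {"nodes": {"pkg:npm/lodash@4.17.21": {"license": "MIT"}},
            "edges": {("app", "pkg:npm/lodash@4.17.21")}}
print(sbom_graph_diff(baseline, target))
```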
### VEX claimset diff (MVP)
Implement:
* Keyed by `(product/artifact scope, component ID, vulnerability ID)`
* Delta types:
* claim added/removed
* status changed
* justification changed (optional later)
### Reachability diff (incremental approach)
MVP can start narrow:
* Support one or two ecosystems initially (e.g., Java + Maven, or Go modules)
* Represent reachability as:
* `entrypoint → function/symbol → vulnerable symbol`
* Diff should highlight:
* Newly reachable vulnerable symbols
* Removed reachability
Important: even if reachability is initially partial, the diff model must support it cleanly (unknowns must exist).
## C5) Policy evaluation must run on delta, not on raw findings
Define a policy DSL contract like:
* `fail_if new_reachable_critical > 0`
* `warn_if new_unknowns > 10`
* `fail_if vex_status_regressed == true`
* `pass_if no_net_increase_exploitable_surface == true`
Engineering directive:
* Policies must reference **delta fields**, not scanner-specific output.
* Keep the policy evaluation pure and deterministic.
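A minimal sketch of a pure gate that evaluates rules of exactly that shape against delta fields (rule encoding is illustrative).
```python
def evaluate_policy(delta: dict, rules: list[dict]) -> str:
    """Pure, deterministic gate: rules reference delta fields only, never
    scanner-specific output. A matching fail rule short-circuits."""
    verdict = "PASS"
    for rule in rules:                       # evaluated in declared order
        value = delta[rule["field"]]
        hit = {">": value > rule["value"],
               "==": value == rule["value"]}[rule["op"]]
        if hit and rule["action"] == "fail":
            return "FAIL"
        if hit and rule["action"] == "warn":
            verdict = "WARN"
    return verdict

rules = [
    {"action": "fail", "field": "new_reachable_critical", "op": ">", "value": 0},
    {"action": "warn", "field": "new_unknowns", "op": ">", "value": 10},
    {"action": "fail", "field": "vex_status_regressed", "op": "==", "value": True},
]
print(evaluate_policy({"new_reachable_critical": 0, "new_unknowns": 12,
                       "vex_status_regressed": False}, rules))   # WARN
```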
## C6) Signing and attachment (implementation-level)
Minimum requirements:
* Support signing delta verdict as a DSSE envelope with a stable predicate type.
* Support:
* keyless signing (optional)
* customer-managed keys (enterprise)
* Attach to OCI artifact as an attestation (where possible), and store in Stella ledger for retrieval.
Definition of Done:
* A CI workflow can:
1. create snapshots
2. compute delta
3. produce signed delta verdict
4. verify signature and gate
---
# D. Roadmap (sequenced to deliver value early without painting into a corner)
## Phase 1: “Snapshot + SBOM Diff + Delta Verdict”
* Version SBOM graphs
* Diff SBOM graphs
* Produce delta verdict based on SBOM delta + vulnerability delta (even before reachability)
* Signed delta verdict artifact exists
Output:
* Baseline/target selection
* Delta taxonomy v1
* Signed delta verdict v1
## Phase 2: “VEX claimsets and VEX deltas”
* Ingest OpenVEX/CycloneDX/CSAF
* Store canonical claimsets per snapshot
* Diff claimsets and incorporate into delta verdict
Output:
* “VEX status regression” gating works deterministically
## Phase 3: “Reachability graphs and reachability deltas”
* Start with one ecosystem
* Generate reachability evidence
* Diff reachability and incorporate into verdict
Output:
* “new reachable critical” becomes the primary gate
## Phase 4: “Offline replay bundle”
* Export/import snapshot + feed snapshot + policy bundle
* Replay delta verdict identically in air-gapped environment
---
# E. Acceptance criteria checklist (use this as a release gate for your own feature)
A feature is not done until:
1. **Snapshot is content-addressed** and immutable.
2. **Delta is content-addressed** and immutable.
3. Delta shows:
* SBOM delta
* VEX delta (when enabled)
* Reachability delta (when enabled)
* Unknowns delta
4. **Delta verdict is signed** and verification is automated.
5. **Replay test**: given same baseline/target snapshots + policy bundle, verdict is identical byte-for-byte.
6. The product answers, clearly:
* What changed?
* Why does it matter?
* Why is the verdict pass/fail?
* What evidence supports this?
---
# F. What to tell your teams to avoid (common failure modes)
* Do **not** ship “diff” as a UI compare of two scan outputs.
* Do **not** make reachability an unstructured “note” field; it must be a graph with stable IDs.
* Do **not** allow non-deterministic inputs into verdict hashes (timestamps, random IDs, nondeterministic ordering).
* Do **not** treat VEX as “ignore rules” only; treat it as a claimset with provenance and merge semantics (even if merge comes later).


@@ -0,0 +1,234 @@
## 1) Define the product primitive (non-negotiable)
### Directive (shared)
**The product's primary output is not “findings.” It is a “Risk Verdict Attestation” (RVA).**
Everything else (SBOMs, CVEs, VEX, reachability, reports) is *supporting evidence* referenced by the RVA.
### What “first-class artifact” means in practice
1. **The verdict is an OCI artifact “referrer” attached to a specific image/artifact digest** via OCI 1.1 `subject` and discoverable via the referrers API. ([opencontainers.org][1])
2. **The verdict is cryptographically signed** (at least one supported signing pathway).
* DSSE is a standard approach for signing attestations, and cosign supports creating/verifying in-toto attestations signed with DSSE. ([Sigstore][2])
* Notation is a widely deployed approach for signing/verifying OCI artifacts in enterprise environments. ([Microsoft Learn][3])
---
## 2) Directions for Product Managers (PM)
### A. Write the “Risk Verdict Attestation v1” product contract
**Deliverable:** A one-page contract + schema that product and customers can treat as an API.
Minimum fields the contract must standardize:
* **Subject binding:** exact OCI digest, repo/name, platform (if applicable)
* **Verdict:** `PASS | FAIL | PASS_WITH_EXCEPTIONS | INDETERMINATE`
* **Policy reference:** policy ID, policy digest, policy version, enforcement mode
* **Knowledge snapshot reference:** snapshot ID + digest (see replay semantics below)
* **Evidence references:** digests/pointers for SBOM, VEX inputs, vuln feed snapshot, reachability proof(s), config snapshot, and unknowns summary
* **Reason codes:** stable machine-readable codes (`RISK.CVE.REACHABLE`, `RISK.VEX.NOT_AFFECTED`, `RISK.UNKNOWN.INPUT_MISSING`, etc.)
* **Human explanation stub:** short rationale text plus links/IDs for deeper evidence
**Key PM rule:** the contract must be **stable and versioned**, with explicit deprecation rules. If you can't maintain compatibility, ship a new version (v2); don't silently mutate v1.
Why: OCI referrers create long-lived metadata chains. Breaking them is a customer trust failure.
### B. Define strict replay semantics as a product requirement (not “nice to have”)
PM must specify what “same inputs” means. At minimum, inputs include:
* artifact digest (subject)
* policy bundle digest
* vulnerability dataset snapshot digest(s)
* VEX bundle digest(s)
* SBOM digest(s) or SBOM generation recipe digest
* scoring rules version/digest
* engine version
* reachability configuration version/digest (if enabled)
**Product acceptance criterion:**
When a user re-runs evaluation in “replay mode” using the same knowledge snapshot and policy digest, the **verdict and reason codes must match** (byte-for-byte identical predicate is ideal; if not, the deterministic portion must match exactly).
OCI 1.1 and ORAS guidance also imply that you should avoid shoving large evidence into annotations; store large evidence as blobs and reference it by digest. ([opencontainers.org][1])
### C. Make “auditor evidence extraction” a first-order user journey
Define the auditor journey as a separate persona:
* Auditor wants: “Prove why you blocked/allowed artifact X at time Y.”
* They should be able to:
1. Verify the signature chain
2. Extract the decision + evidence package
3. Replay the evaluation
4. Produce a human-readable report without bespoke consulting
**PM feature requirements (v1)**
* `explain` experience that outputs:
* decision summary
* policy used
* evidence references and hashes
* top N reasons (with stable codes)
* unknowns and assumptions
* `export-audit-package` experience:
* exports a ZIP (or OCI bundle) containing the RVA, its referenced evidence artifacts, and a machine-readable manifest listing all digests
* `verify` experience:
* verifies signature + policy expectations (who is trusted to sign; which predicate type(s) are acceptable)
Cosign explicitly supports creating/verifying in-toto attestations (DSSE-signed) and even validating custom predicates against policy languages like Rego/CUE; this is a strong PM anchor for ecosystem interoperability. ([Sigstore][2])
---
## 3) Directions for Development Managers (Dev/Eng)
### A. Implement OCI attachment correctly (artifact, referrer, fallback)
**Engineering decisions:**
1. Store RVA as an OCI artifact manifest with:
* `artifactType` set to your verdict media type
* `subject` pointing to the exact image/artifact digest being evaluated
OCI 1.1 introduced these fields for associating metadata artifacts and retrieving them via the referrers API. ([opencontainers.org][1])
2. Support discovery via:
* Referrers API (`GET /v2/<name>/referrers/<digest>`) when registry supports it
* **Fallback “tagged index” strategy** for registries that don't support referrers (OCI 1.1 guidance calls out a fallback tag approach and client responsibilities). ([opencontainers.org][1])
**Dev acceptance tests**
* Push subject image → push RVA artifact with `subject` → query referrers → RVA appears.
* On a registry without referrers support: fallback retrieval still works.
### B. Use a standard attestation envelope and signing flow
For attestations, the lowest friction pathway is:
* in-toto Statement + DSSE envelope
* Sign/verify using cosign-compatible workflows (so customers can verify without you) ([Sigstore][2])
DSSE matters because it:
* authenticates message + type
* avoids canonicalization pitfalls
* supports arbitrary encodings ([GitHub][4])
**Engineering rule:** the signed payload must include enough data to replay and audit (policy + knowledge snapshot digests), but avoid embedding huge evidence blobs directly.
### C. Build determinism into the evaluation core (not bolted on)
**“Same inputs → same verdict” is a software architecture constraint.**
It fails if any of these are non-deterministic:
* fetching “latest” vulnerability DB at runtime
* unstable iteration order (maps/hashes)
* timestamps included as decision inputs
* concurrency races changing aggregation order
* floating point scoring without canonical rounding
**Engineering requirements**
1. Create a **Knowledge Snapshot** object (content-addressed):
* a manifest listing every dataset input by digest and version
2. The evaluation function becomes:
* `Verdict = Evaluate(subject_digest, policy_digest, knowledge_snapshot_digest, engine_version, options_digest)`
3. The RVA must embed those digests so replay is possible offline.
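A minimal sketch of that evaluation entry point; the verdict logic is elided, the point is that only content-addressed inputs cross the boundary and the digests are embedded in the output.
```python
def evaluate(subject_digest: str, policy_digest: str,
             knowledge_snapshot_digest: str, engine_version: str,
             options_digest: str) -> dict:
    """Pure function of content-addressed inputs: no network calls, no 'latest'
    feeds, no wall clock inside the decision (verdict logic elided here)."""
    verdict, reason_codes = "FAIL", ["RISK.CVE.REACHABLE"]   # placeholder outcome
    return {
        "verdict": verdict,
        "reasonCodes": reason_codes,
        # embedded so the decision can be replayed offline, bit for bit
        "subject": {"digest": subject_digest},
        "policy": {"digest": policy_digest},
        "knowledgeSnapshot": {"digest": knowledge_snapshot_digest},
        "options": {"digest": options_digest},
        "engine": {"name": "stella-eval", "version": engine_version},
    }

a = evaluate("sha256:app", "sha256:pol", "sha256:ks", "1.3.0", "sha256:opt")
b = evaluate("sha256:app", "sha256:pol", "sha256:ks", "1.3.0", "sha256:opt")
assert a == b   # replay acceptance test: same inputs -> identical predicate
```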
**Dev acceptance tests**
* Run Evaluate twice with same snapshot/policy → verdict + reason codes identical.
* Run Evaluate with one dataset changed (snapshot digest differs) → RVA must reflect changed snapshot digest.
### D. Treat “evidence” as a graph of content-addressed artifacts
Implement evidence storage with these rules:
* Large evidence artifacts are stored as OCI blobs/artifacts (SBOM, VEX bundle, reachability proof graph, config snapshot).
* RVA references evidence by digest and type.
* “Explain” traverses this graph and renders:
* a machine-readable explanation JSON
* a human-readable report
ORAS guidance highlights artifact typing via `artifactType` in OCI 1.1 and suggests keeping manifests manageable; don't overload annotations. ([oras.land][5])
### E. Provide a verification and policy enforcement path
You want customers to be able to enforce “only run artifacts with an approved RVA predicate.”
Two practical patterns:
* **Cosign verification of attestations** (customers can do `verify-attestation` and validate predicate structure; cosign supports validating attestations with policy languages like Rego/CUE). ([Sigstore][2])
* **Notation signatures** for organizations that standardize on Notary/Notation for OCI signing/verification workflows. ([Microsoft Learn][3])
Engineering should not hard-code one choice; implement an abstraction:
* signing backend: `cosign/DSSE` first
* optional: notation signature over the RVA artifact for environments that require it
---
## 4) Minimal “v1” spec by example (what your teams should build)
### A. OCI artifact requirements (registry-facing)
* artifact is discoverable as a referrer via `subject` linkage and `artifactType` classification (OCI 1.1). ([opencontainers.org][1])
### B. Attestation payload structure (contract-facing)
In code terms (illustrative only), build on the in-toto Statement model:
```json
{
"_type": "https://in-toto.io/Statement/v0.1",
"subject": [
{
"name": "oci://registry.example.com/team/app",
"digest": { "sha256": "<SUBJECT_DIGEST>" }
}
],
"predicateType": "https://stellaops.dev/attestations/risk-verdict/v1",
"predicate": {
"verdict": "FAIL",
"reasonCodes": ["RISK.CVE.REACHABLE", "RISK.POLICY.THRESHOLD_EXCEEDED"],
"policy": { "id": "prod-gate", "digest": "sha256:<POLICY_DIGEST>" },
"knowledgeSnapshot": { "id": "ks-2025-12-19", "digest": "sha256:<KS_DIGEST>" },
"evidence": {
"sbom": { "digest": "sha256:<SBOM_DIGEST>", "format": "cyclonedx-json" },
"vexBundle": { "digest": "sha256:<VEX_DIGEST>", "format": "openvex" },
"vulnData": { "digest": "sha256:<VULN_FEEDS_DIGEST>" },
"reachability": { "digest": "sha256:<REACH_PROOF_DIGEST>" },
"unknowns": { "count": 2, "digest": "sha256:<UNKNOWNS_DIGEST>" }
},
"engine": { "name": "stella-eval", "version": "1.3.0" }
}
}
```
Cosign supports creating and verifying in-toto attestations (DSSE-signed), which is exactly the interoperability you want for customer-side verification. ([Sigstore][2])
---
## 5) Definition of Done (use this to align PM/Eng and prevent scope drift)
### v1 must satisfy all of the following:
1. **OCI-attached:** RVA is stored as an OCI artifact referrer to the subject digest and discoverable (referrers API + fallback mode). ([opencontainers.org][1])
2. **Signed:** RVA can be verified by a standard toolchain (cosign at minimum). ([Sigstore][2])
3. **Replayable:** Given the embedded policy + knowledge snapshot digests, the evaluation can be replayed and produces the same verdict + reason codes.
4. **Auditor extractable:** One command produces an audit package containing:
* RVA attestation
* policy bundle
* knowledge snapshot manifest
* referenced evidence artifacts
* an “explanation report” rendering the decision
5. **Stable contract:** predicate schema is versioned and validated (strict JSON schema checks; backwards compatibility rules).


@@ -0,0 +1,463 @@
## Outcome you are shipping
A deterministic “claim resolution” capability that takes:
* Multiple **claims** about the same vulnerability (vendor VEX, distro VEX, internal assessments, scanner inferences),
* A **policy** describing trust and merge semantics,
* A set of **evidence artifacts** (SBOM, config snapshots, reachability proofs, etc.),
…and produces a **single resolved status** per vulnerability/component/artifact **with an explainable trail**:
* Which claims applied and why
* Which were rejected and why
* What evidence was required and whether it was satisfied
* What policy rules triggered the resolution outcome
This replaces naive precedence like `vendor > distro > internal`.
---
# Directions for Product Managers
## 1) Write the PRD around “claims resolution,” not “VEX support”
The customer outcome is not “we ingest VEX.” It is:
* “We can *safely* accept ‘not affected’ claims without hiding risk.”
* “We can prove, to auditors and change control, why a CVE was downgraded.”
* “We can consistently resolve conflicts between issuer statements.”
### Non-negotiable product properties
* **Deterministic**: same inputs → same resolved outcome
* **Explainable**: a human can trace the decision path
* **Guardrailed**: a “safe” resolution requires evidence, not just a statement
---
## 2) Define the core objects (these drive everything)
In the PRD, define these three objects explicitly:
### A) Claim (normalized)
A “claim” is any statement about vulnerability applicability to an artifact/component, regardless of source format.
Minimum fields:
* `vuln_id` (CVE/GHSA/etc.)
* `subject` (component identity; ideally package + version + digest/purl)
* `target` (the thing were evaluating: image, repo build, runtime instance)
* `status` (affected / not_affected / fixed / under_investigation / unknown)
* `justification` (human/machine reason)
* `issuer` (who said it; plus verification state)
* `scope` (what it applies to; versions, ranges, products)
* `timestamp` (when produced)
* `references` (links/IDs to evidence or external material)
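A minimal sketch of the normalized claim as a typed structure, mirroring the field list above (types are illustrative).
```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Claim:
    """Normalized claim, independent of source format (OpenVEX, CSAF, internal)."""
    vuln_id: str                     # CVE/GHSA/etc.
    subject: str                     # component identity, ideally a purl + digest
    target: str                      # image digest / build / runtime instance
    status: str                      # affected | not_affected | fixed | under_investigation | unknown
    justification: str | None = None
    issuer: str = "unknown"          # plus verification state in a fuller model
    scope: str | None = None         # versions, ranges, products
    timestamp: str | None = None
    references: tuple = ()           # links/IDs to evidence or external material

claim = Claim(vuln_id="CVE-XXXX-YYYY",
              subject="pkg:maven/org.example/foo@1.2.3",
              target="sha256:imagedigest",
              status="not_affected",
              justification="feature_flag_off",
              issuer="internal")
```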
### B) Evidence
A typed artifact that can satisfy a requirement.
Examples (not exhaustive):
* `config_snapshot` (e.g., Helm values, env var map, feature flag export)
* `sbom_presence_or_absence` (SBOM proof that a component is or isn't present)
* `reachability_proof` (call-path evidence from entrypoint to vulnerable symbol)
* `symbol_absence` (binary inspection shows symbol/function not present)
* `patch_presence` (artifact includes backport / fixed build)
* `manual_attestation` (human-reviewed attestation with reviewer identity + scope)
Each evidence item must have:
* `type`
* `collector` (tool/provider)
* `inputs_hash` and `output_hash`
* `scope` (what artifact/environment it applies to)
* `confidence` (optional but recommended)
* `expires_at` / `valid_for` (for config/runtime evidence)
### C) Policy
A policy describes:
* **Trust rules** (how much to trust whom, under which conditions)
* **Merge semantics** (how to resolve conflicts)
* **Evidence requirements** (what must be present to accept certain claims)
---
## 3) Ship “policy-controlled merge semantics” as a configuration schema first
Do not start with a fully general policy language. You need a small, explicit schema that makes behavior predictable.
PM deliverable: a policy spec with these sections:
1. **Issuer trust**
* weights by issuer category (vendor/distro/internal/scanner)
* optional constraints (must be signed, must match product ownership, must be within time window)
2. **Applicability rules**
* what constitutes a match to artifact/component (range semantics, digest match priority)
3. **Evidence requirements**
* per status + per justification: what evidence types are required
4. **Conflict resolution strategy**
* conservative vs weighted vs most-specific
* explicit guardrails (never accept “safe” without evidence)
5. **Override rules**
* when internal can override vendor (and what evidence is required to do so)
* environment-specific policies (prod vs dev)
---
## 4) Make “evidence hooks” a first-class user workflow
You are explicitly shipping the ability to say:
> “This is not affected **because** feature flag X is off.”
That requires:
* a way to **provide or discover** feature flag state, and
* a way to **bind** that flag to the vulnerable surface
PM must specify: what does the user do to assert that?
Minimum viable workflow:
* User attaches a `config_snapshot` (or system captures it)
* User provides a “binding” to the vulnerable module/function:
* either automatic (later) or manual (first release)
* e.g., `flag X gates module Y` with references (file path, code reference, runbook)
This “binding” itself becomes evidence.
---
## 5) Define acceptance criteria as decision trace tests
PM should write acceptance criteria as “given claims + policy + evidence → resolved outcome + trace”.
You need at least these canonical tests:
1. **Distro backport vs vendor version logic conflict**
* Vendor says affected (by version range)
* Distro says fixed (backport)
* Policy says: in distro context, distro claim can override vendor if patch evidence exists
* Outcome: fixed, with trace proving why
2. **Internal feature flag off downgrade**
* Vendor says affected
* Internal says not_affected because flag off
* Evidence: config snapshot + flag→module binding
* Outcome: not_affected **only for that environment context**, with trace
3. **Evidence missing**
* Internal says not_affected because “code not reachable”
* No reachability evidence present
* Outcome: unknown or affected (policy-dependent), but **not “not_affected”**
4. **Conflicting “safe” claims**
* Vendor says not_affected (reason A)
* Internal says affected (reason B) with strong evidence
* Outcome follows merge strategy, and trace must show why.
---
## 6) Package it as an “Explainable Resolution” feature
UI/UX requirements PM must specify:
* A “Resolved Status” view per vuln/component showing:
* contributing claims (ranked)
* rejected claims (with reason)
* evidence required vs evidence present
* the policy clauses triggered (line-level references)
* A policy editor can be CLI/JSON first; UI later, but explainability cannot wait.
---
# Directions for Development Managers
## 1) Implement as three services/modules with strict interfaces
### Module A: Claim Normalization
* Inputs: OpenVEX / CycloneDX VEX / CSAF / internal annotations / scanner hints
* Output: canonical `Claim` objects
Rules:
* Canonicalize IDs (normalize CVE formats, normalize package coordinates)
* Preserve provenance: issuer identity, signature metadata, timestamps, original document hash
### Module B: Evidence Providers (plugin boundary)
* Provide an interface like:
```
evaluate_evidence(context, claim) -> EvidenceEvaluation
```
Where `EvidenceEvaluation` returns:
* required evidence types for this claim (from policy)
* found evidence items (from store/providers)
* satisfied / not satisfied
* explanation strings
* confidence
Start with 3 providers:
1. SBOM provider (presence/absence)
2. Config provider (feature flags/config snapshot ingestion)
3. Reachability provider (even if initially limited or stubbed, it must exist as a typed hook)
### Module C: Merge & Resolution Engine
* Inputs: set of claims + policy + evidence evaluations + context
* Output: `ResolvedDecision`
A `ResolvedDecision` must include:
* final status
* selected “winning” claim(s)
* all considered claims
* evidence satisfaction summary
* applied policy rule IDs
* deterministic ordering keys/hashes
---
## 2) Define the evaluation context (this avoids foot-guns)
The resolved outcome must be context-aware.
Create an immutable `EvaluationContext` object, containing:
* artifact identity (image digest / build digest / SBOM hash)
* environment identity (prod/stage/dev; cluster; region)
* config snapshot ID
* time (evaluation timestamp)
* policy version hash
This is how you support: “not affected because feature flag off” in prod but not in dev.
---
## 3) Merge semantics: implement scoring + guardrails, not precedence
You need a deterministic function. One workable approach:
### Step 1: compute statement strength
For each claim:
* `trust_weight` from policy (issuer + scope + signature requirements)
* `evidence_factor` (1.0 if requirements satisfied; <1 or 0 if not)
* `specificity_factor` (exact digest match > exact version > range)
* `freshness_factor` (optional; policy-defined)
* `applicability` must be true or claim is excluded
Compute:
```
support = trust_weight * evidence_factor * specificity_factor * freshness_factor
```
### Step 2: apply merge strategy (policy-controlled)
Ship at least two strategies:
1. **Conservative default**
* If any “unsafe” claim (affected/under_investigation) has support above threshold, it wins
* A “safe” claim (not_affected/fixed) can override only if:
* it has equal/higher support + delta, AND
* its evidence requirements are satisfied
2. **Evidence-weighted**
* Highest support wins, but safe statuses have a hard evidence gate
### Step 3: apply guardrails
Hard guardrail to prevent bad outcomes:
* **Never emit a safe status unless evidence requirements for that safe claim are satisfied.**
* If a safe claim lacks evidence, downgrade the safe claim to “unsupported” and do not allow it to win.
This single rule is what makes your system materially different from “VEX as suppression.”
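A minimal sketch combining the support formula, the conservative strategy, and the guardrail, using the threshold/delta knobs from the example policy below (values and shapes are illustrative).
```python
SAFE = {"not_affected", "fixed"}

def support(claim: dict) -> float:
    # support = trust_weight * evidence_factor * specificity_factor * freshness_factor
    return (claim["trust_weight"] * claim["evidence_factor"]
            * claim["specificity_factor"] * claim.get("freshness_factor", 1.0))

def resolve_conservative(claims: list[dict],
                         unsafe_wins_threshold: float = 50,
                         safe_override_delta: float = 10) -> str:
    """Conservative merge with the hard guardrail: a safe status can never win
    unless its evidence requirements are satisfied."""
    claims = sorted(claims, key=lambda c: (support(c), c["issuer"]), reverse=True)
    unsafe = [c for c in claims if c["status"] not in SAFE]
    safe_ok = [c for c in claims if c["status"] in SAFE and c["evidence_satisfied"]]
    best_unsafe = max((support(c) for c in unsafe), default=0.0)
    if safe_ok and support(safe_ok[0]) >= best_unsafe + safe_override_delta:
        return safe_ok[0]["status"]
    if unsafe and best_unsafe >= unsafe_wins_threshold:
        return unsafe[0]["status"]
    return "unknown"

claims = [
    {"issuer": "vendor", "status": "affected", "trust_weight": 70,
     "evidence_factor": 0.9, "specificity_factor": 1.0, "evidence_satisfied": True},
    {"issuer": "internal", "status": "not_affected", "trust_weight": 85,
     "evidence_factor": 1.0, "specificity_factor": 1.0, "evidence_satisfied": True},
]
print(resolve_conservative(claims))   # not_affected: higher support and evidence satisfied
```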
---
## 4) Evidence hooks: treat them as typed contracts, not strings
For “feature flag off,” implement it as a structured evidence requirement.
Example evidence requirement for a “safe because feature flag off” claim:
* Required evidence types:
* `config_snapshot`
* `flag_binding` (the mapping “flag X gates vulnerable surface Y”)
Implementation:
* Config provider can parse:
* Helm values / env var sets / feature flag exports
* Store them as normalized key/value with hashes
* Binding evidence can start as manual JSON that references:
* repo path / module / function group
* a link to code ownership / runbook
* optional test evidence
Later you can automate binding via static analysis, but do not block shipping on that.
---
## 5) Determinism requirements (engineering non-negotiables)
Development manager should enforce:
* stable sorting of claims by canonical key
* stable tie-breakers (e.g., issuer ID, timestamp, claim hash)
* no nondeterministic external calls during evaluation (or they must be snapshot-based)
* every evaluation produces:
* `input_bundle_hash` (claims + evidence + policy + context)
* `decision_hash`
This is the foundation for replayability and audits.
---
## 6) Storage model: store raw inputs and canonical forms
Minimum stores:
* Raw documents (original VEX/CSAF/etc.) keyed by content hash
* Canonical claims keyed by claim hash
* Evidence items keyed by evidence hash and scoped by context
* Policy versions keyed by policy hash
* Resolutions keyed by (context, vuln_id, subject) with decision hash
---
## 7) “Definition of done” checklist for engineering
You are done when:
1. You can ingest at least two formats into canonical claims (pick OpenVEX + CycloneDX VEX first).
2. You can configure issuer trust and evidence requirements in a policy file.
3. You can resolve conflicts deterministically.
4. You can attach a config snapshot and produce:
* `not_affected because feature flag off` **only when evidence satisfied**
5. The system produces a decision trace with:
* applied policy rules
* evidence satisfaction
* selected/rejected claims and reasons
6. Golden test vectors exist for the acceptance scenarios listed above.
---
# A concrete example policy (schema-first, no full DSL required)
```yaml
version: 1
trust:
issuers:
- match: {category: vendor}
weight: 70
require_signature: true
- match: {category: distro}
weight: 75
require_signature: true
- match: {category: internal}
weight: 85
require_signature: false
- match: {category: scanner}
weight: 40
evidence_requirements:
safe_status_requires_evidence: true
rules:
- when:
status: not_affected
reason: feature_flag_off
require: [config_snapshot, flag_binding]
- when:
status: not_affected
reason: component_not_present
require: [sbom_absence]
- when:
status: not_affected
reason: not_reachable
require: [reachability_proof]
merge:
strategy: conservative
unsafe_wins_threshold: 50
safe_override_delta: 10
```
---
# A concrete example output trace (what auditors and engineers must see)
```json
{
"vuln_id": "CVE-XXXX-YYYY",
"subject": "pkg:maven/org.example/foo@1.2.3",
"context": {
"artifact_digest": "sha256:...",
"environment": "prod",
"policy_hash": "sha256:..."
},
"resolved_status": "not_affected",
"because": [
{
"winning_claim": "claim_hash_abc",
"reason": "feature_flag_off",
"evidence_required": ["config_snapshot", "flag_binding"],
"evidence_present": ["ev_hash_1", "ev_hash_2"],
"policy_rules_applied": ["trust.issuers[internal]", "evidence.rules[0]", "merge.safe_override_delta"]
}
],
"claims_considered": [
{"issuer": "vendor", "status": "affected", "support": 62, "accepted": false, "rejection_reason": "overridden_by_higher_support_safe_claim_with_satisfied_evidence"},
{"issuer": "internal", "status": "not_affected", "support": 78, "accepted": true, "evidence_satisfied": true}
],
"decision_hash": "sha256:..."
}
```
---
## The two strategic pitfalls to explicitly avoid
1. **“Trust precedence” as the merge mechanism**
* It will fail immediately on backports, forks, downstream patches, and environment-specific mitigations.
2. **Allowing “safe” without evidence**
* That turns VEX into a suppression system and will collapse trust in the product.


@@ -0,0 +1,338 @@
## Executive directive
Build **Reachability as Evidence**, not as a UI feature.
Every reachability conclusion must produce a **portable, signed, replayable evidence bundle** that answers:
1. **What vulnerable code unit is being discussed?** (symbol/method/function + version)
2. **What entrypoint is assumed?** (HTTP handler, RPC method, CLI, scheduled job, etc.)
3. **What is the witness?** (a call-path subgraph, not a screenshot)
4. **What assumptions/gates apply?** (config flags, feature toggles, runtime wiring)
5. **Can a third party reproduce it?** (same inputs → same evidence hash)
This must work for **source** and **post-build artifacts**.
---
# Directions for Product Managers
## 1) Define the product contract in one page
### Capability name
**Proof-carrying reachability**.
### Contract
Given an artifact (source or built) and a vulnerability mapping, Stella Ops outputs:
- **Reachability verdict:** `REACHABLE | NOT_PROVEN_REACHABLE | INCONCLUSIVE`
- **Witness evidence:** a minimal **reachability subgraph** + one or more witness paths
- **Reproducibility bundle:** all inputs and toolchain metadata needed to replay
- **Attestation:** signed statement tied to the artifact digest
### Important language choice
Avoid claiming “unreachable” unless you can prove non-reachability under a formally sound model.
- Use **NOT_PROVEN_REACHABLE** for “no path found under current analysis + assumptions.”
- Use **INCONCLUSIVE** when analysis cannot be performed reliably (missing symbols, obfuscation, unsupported language, dynamic dispatch uncertainty, etc.).
This is essential for credibility and audit use.
---
## 2) Anchor personas and top workflows
### Primary personas
- Security governance / AppSec: wants fewer false positives and defensible prioritization.
- Compliance/audit: wants evidence and replayability.
- Engineering teams: wants specific call paths and what to change.
### Top workflows (must support in MVP)
1. **CI gate with signed verdict**
- “Block release if any `REACHABLE` high severity is present OR if `INCONCLUSIVE` exceeds threshold.”
2. **Audit replay**
- “Reproduce the reachability proof for artifact digest X using snapshot Y.”
3. **Release delta**
- “Show what reachability changed between release A and B.”
---
## 3) Minimum viable scope: pick targets that make “post-build” real early
To satisfy “source and post-build artifacts” without biting off ELF-level complexity first:
### MVP artifact types (recommended)
- **Source repository** for 1–2 languages with mature static IR
- **Post-build intermediate artifacts** that retain symbol structure:
- Java `.jar/.class`
- .NET assemblies
- Python wheels (bytecode)
- Node bundles with sourcemaps (optional)
These give you “post-build” support where call graphs are tractable.
### Defer for later phases
- Native ELF/Mach-O deep reachability (harder due to stripping, inlining, indirect calls, dynamic loading)
- Highly dynamic languages without strong type info, unless you accept “witness-only” semantics
Your differentiator is proof portability and determinism, not “supports every binary on day one.”
---
## 4) Product requirements: what “proof-carrying” means in requirements language
### Functional requirements
- Output must include a **reachability subgraph**:
- Nodes = code units (function/method) with stable IDs
- Edges = call or dispatch edges with type annotations
- Must include at least one **witness path** from entrypoint to vulnerable node when `REACHABLE`
- Output must be **artifact-tied**:
- Evidence must reference artifact digest(s) (source commit, build artifact digest, container image digest)
- Output must be **attestable**:
- Produce a signed attestation (DSSE/in-toto style) attached to the artifact digest
- Output must be **replayable**:
- Provide a “replay recipe” (analyzer versions, configs, vulnerability mapping version, and input digests)
### Non-functional requirements
- Deterministic: repeated runs on same inputs produce identical evidence hash
- Size-bounded: subgraph evidence must be bounded (e.g., path-based extraction + limited context)
- Privacy-controllable:
- Support a mode that avoids embedding raw source content (store pointers/hashes instead)
- Verifiable offline:
- Verification and replay must work air-gapped given the snapshot bundle
---
## 5) Acceptance criteria (use as Definition of Done)
A feature is “done” only when:
1. **Verifier can validate** the attestation signature and confirm the evidence hash matches content.
2. A second machine can **reproduce the same evidence hash** given the replay bundle.
3. Evidence includes at least one witness path for `REACHABLE`.
4. Evidence includes explicit assumptions/gates; absence of gating is recorded as an assumption (e.g., “config unknown”).
5. Evidence is **linked to the precise artifact digest** being deployed/scanned.
---
## 6) Product packaging decisions that create switching cost
These are product decisions that turn engineering into moat:
- **Make “reachability proof” an exportable object**, not just a UI view.
- Provide an API: `GET /findings/{id}/proof` returning canonical evidence.
- Support policy gates on:
- `verdict`
- `confidence`
- `assumption_count`
- `inconclusive_reasons`
- Make “proof replay” a one-command workflow in CLI.
---
# Directions for Development Managers
## 1) Architecture: build a “proof pipeline” with strict boundaries
Implement as composable modules with stable interfaces:
1. **Artifact Resolver**
- Inputs: repo URL/commit, build artifact path, container image digest
- Output: normalized “artifact record” with digests and metadata
2. **Graph Builder (language-specific adapters)**
- Inputs: artifact record
- Output: canonical **Program Graph**
- Nodes: code units
- Edges: calls/dispatch
- Optional: config gates, dependency edges
3. **Vulnerability-to-Code Mapper**
- Inputs: vulnerability record (CVE), package coordinates, symbol metadata (if available)
- Output: vulnerable node set + mapping confidence
4. **Entrypoint Modeler**
- Inputs: artifact + runtime context (framework detection, routing tables, main methods)
- Output: entrypoint node set with types (HTTP, RPC, CLI, cron)
5. **Reachability Engine**
- Inputs: graph + entrypoints + vulnerable nodes + constraints
- Output: witness paths + minimal subgraph extraction
6. **Evidence Canonicalizer**
- Inputs: witness paths + subgraph + metadata
- Output: canonical JSON (stable ordering, stable IDs), plus content hash
7. **Attestor**
- Inputs: evidence hash + artifact digest
- Output: signed attestation object (OCI attachable)
8. **Verifier (separate component)**
- Must validate signatures + evidence integrity independently of generator
Critical: generator and verifier must be decoupled to preserve trust.
---
## 2) Evidence model: what to store (and how to keep it stable)
### Node identity must be stable across runs
Define a canonical NodeID scheme:
- Source node ID:
- `{language}:{repo_digest}:{symbol_signature}:{optional_source_location_hash}`
- Post-build node ID:
- `{language}:{artifact_digest}:{symbol_signature}:{optional_offset_or_token}`
Avoid raw file paths or non-deterministic compiler offsets as primary IDs unless normalized.
### Edge identity
`{caller_node_id} -> {callee_node_id} : {edge_type}`
Edge types matter (direct call, virtual dispatch, reflection, dynamic import, etc.)
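A minimal sketch of helpers that build these IDs, hashing signatures and locations so raw paths never leak into primary identifiers (the hashing choices are illustrative).
```python
import hashlib

def _h(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]

def source_node_id(language: str, repo_digest: str,
                   symbol_signature: str, source_location: str | None = None) -> str:
    """{language}:{repo_digest}:{symbol_signature}:{optional_source_location_hash}
    The location is hashed so raw file paths never become part of the ID."""
    loc = _h(source_location) if source_location else ""
    return f"{language}:{repo_digest}:{_h(symbol_signature)}:{loc}"

def edge_id(caller_node_id: str, callee_node_id: str, edge_type: str) -> str:
    return f"{caller_node_id} -> {callee_node_id} : {edge_type}"

caller = source_node_id("java", "sha256:repo", "com.example.Api#handle(Request)")
callee = source_node_id("java", "sha256:repo", "com.example.Parser#parse(String)")
print(edge_id(caller, callee, "direct_call"))
```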
### Subgraph extraction rule
Store:
- All nodes/edges on at least one witness path (or k witness paths)
- Plus bounded context:
  - 1–2 hop neighborhood around the vulnerable node and entrypoint
- routing edges (HTTP route → handler) where applicable
This makes the proof compact and audit-friendly.
### Canonicalization requirements
- Stable sorting of nodes and edges
- Canonical JSON serialization (no map-order nondeterminism)
- Explicit analyzer version + config included in evidence
- Hash everything that influences results
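A minimal canonicalization-and-hash sketch along those lines, assuming sorted keys plus fixed separators as the canonical JSON form (a production profile would pin something stricter, such as a JCS-style canonicalization):
```python
# Stable ordering, no map-order nondeterminism, analyzer identity included,
# hash computed over the exact bytes that were serialized.
import hashlib
import json


def canonicalize_evidence(evidence: dict) -> bytes:
    ev = dict(evidence)
    # Sort nodes/edges so concurrent builders always emit identical bytes.
    ev["nodes"] = sorted(ev.get("nodes", []), key=lambda n: n["id"])
    ev["edges"] = sorted(ev.get("edges", []),
                         key=lambda e: (e["from"], e["to"], e["type"]))
    return json.dumps(ev, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def evidence_hash(evidence: dict) -> str:
    return "sha256:" + hashlib.sha256(canonicalize_evidence(evidence)).hexdigest()


ev = {
    "schema_version": "1.0",
    "analyzer": {"name": "example-analyzer", "version": "0.1.0", "config_hash": "sha256:..."},
    "nodes": [{"id": "b"}, {"id": "a"}],
    "edges": [{"from": "a", "to": "b", "type": "direct-call"}],
}
print(evidence_hash(ev))   # identical on every machine for identical input
```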
---
## 3) Determinism and reproducibility: engineering guardrails
### Deterministic computation
- Avoid parallel graph traversal that yields nondeterministic order without canonical sorting
- If using concurrency, collect results and sort deterministically before emitting
### Repro bundle (“time travel”)
Persist, as digests:
- Analyzer container/image digest
- Analyzer config hash
- Vulnerability mapping dataset version hash
- Artifact digest(s)
- Graph builder version hash
A replay must be possible without “calling home.”
### Golden tests
Create fixtures where:
- Same input graph + mapping → exact evidence hash
- Regression test for canonicalization changes (version the schema intentionally)
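A golden-test sketch of that idea; the pinned hash is a placeholder to be generated once per schema version:
```python
# Pytest-style golden test: same input graph + mapping must yield the same hash.
import hashlib
import json


def _hash(ev: dict) -> str:
    return "sha256:" + hashlib.sha256(
        json.dumps(ev, sort_keys=True, separators=(",", ":")).encode()).hexdigest()


def test_evidence_hash_is_stable():
    ev = {
        "schema_version": "1.0",
        "analyzer": {"name": "example-analyzer", "version": "0.1.0"},
        "nodes": [{"id": "a"}, {"id": "b"}],
        "edges": [{"from": "a", "to": "b", "type": "direct-call"}],
    }
    # Same input yields the same hash regardless of dict construction order.
    shuffled = {k: ev[k] for k in reversed(list(ev))}
    assert _hash(ev) == _hash(shuffled)
    # Regression pin: generated once, changed only with an intentional schema bump.
    # assert _hash(ev) == "sha256:<pinned-golden-value>"
```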
---
## 4) Attestation format and verification
### Attestation contents (minimum)
- Subject: artifact digest (image digest / build artifact digest)
- Predicate: reachability evidence hash + metadata
- Predicate type: `reachability` (custom) with versioning
### Verification requirements
- Verification must run offline
- It must validate:
1) signature
2) subject digest binding
3) evidence hash matches serialized evidence
### Storage model
Use content-addressable storage keyed by evidence hash.
Attestation references the hash; evidence stored separately or embedded (size tradeoff).
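A sketch of the statement payload and the offline checks, modeled loosely on in-toto Statement conventions; the predicate type URI and field names are assumptions, and signature verification (check 1) is delegated to the signing stack:
```python
import hashlib


def build_statement(artifact_digest: str, evidence_hash: str,
                    analyzer_version: str) -> dict:
    # Field layout modeled on in-toto Statement conventions; URIs are assumptions.
    return {
        "_type": "https://in-toto.io/Statement/v1",
        "subject": [{"digest": {"sha256": artifact_digest}}],
        "predicateType": "https://example.org/reachability/v1",
        "predicate": {
            "evidence_hash": evidence_hash,
            "analyzer_version": analyzer_version,
            "schema_version": "1.0",
        },
    }


def verify_offline(statement: dict, evidence_bytes: bytes,
                   expected_artifact_digest: str) -> bool:
    """Checks 2 and 3 above; check 1 (signature) happens in the signing tooling."""
    bound = any(s["digest"].get("sha256") == expected_artifact_digest
                for s in statement["subject"])
    recomputed = "sha256:" + hashlib.sha256(evidence_bytes).hexdigest()
    return bound and statement["predicate"]["evidence_hash"] == recomputed
```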
---
## 5) Source + post-build support: engineering plan
### Unifying principle
Both sources produce the same canonical Program Graph abstraction.
#### Source analyzers produce:
- Function/method nodes using language signatures
- Edges from static analysis IR
#### Post-build analyzers produce:
- Nodes from bytecode/assembly symbol tables (where available)
- Edges from bytecode call instructions / metadata
### Practical sequencing (recommended)
1. Implement one source language adapter (fastest to prove model)
2. Implement one post-build adapter where symbols are rich (e.g., Java bytecode)
3. Ensure evidence schema and attestation workflow works identically for both
4. Expand to more ecosystems once the proof pipeline is stable
---
## 6) Operational constraints (performance, size, security)
### Performance
- Cache program graphs per artifact digest
- Cache vulnerability-to-code mapping per package/version
- Compute reachability on-demand per vulnerability, but reuse graphs
### Evidence size
- Limit witness paths (e.g., up to N shortest paths)
- Prefer “witness + bounded neighborhood” over exporting full call graph
### Security and privacy
- Provide a “redacted proof mode”
- include symbol hashes instead of raw names if needed
- store source locations as hashes/pointers
- Never embed raw source code unless explicitly enabled
---
## 7) Definition of Done for the engineering team
A milestone is complete when you can demonstrate:
1. Generate a reachability proof for a known vulnerable code unit with a witness path.
2. Serialize a canonical evidence subgraph and compute a stable hash.
3. Sign the attestation bound to the artifact digest.
4. Verify the attestation on a clean machine (offline).
5. Replay the analysis from the replay bundle and reproduce the same evidence hash.
---
# Concrete artifact example (for alignment)
A reachability evidence object should look structurally like:
- `subject`: artifact digest(s)
- `claim`:
- `verdict`: REACHABLE / NOT_PROVEN_REACHABLE / INCONCLUSIVE
- `entrypoints`: list of NodeIDs
- `vulnerable_nodes`: list of NodeIDs
- `witness_paths`: list of paths (each path = ordered NodeIDs)
- `subgraph`:
- `nodes`: list with stable IDs + metadata
- `edges`: list with stable ordering + edge types
- `assumptions`:
- gating conditions, unresolved dynamic dispatch notes, etc.
- `tooling`:
- analyzer name/version/digest
- config hash
- mapping dataset hash
- `hashes`:
- evidence content hash
- schema version
Then wrap and sign it as an attestation tied to the artifact digest.
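The same structure rendered as a concrete object with placeholder values, so teams can align on field names before arguing about content:
```python
# All values below are placeholders; only the field layout mirrors the list above.
reachability_evidence = {
    "subject": ["sha256:0f3a..."],                      # artifact digest(s)
    "claim": {
        "verdict": "REACHABLE",
        "entrypoints": ["java:sha256:0f3a...:com.acme.Api#handle(Request):a1b2"],
        "vulnerable_nodes": ["java:sha256:0f3a...:org.lib.Parser#parse(String):9f3c"],
        "witness_paths": [[
            "java:sha256:0f3a...:com.acme.Api#handle(Request):a1b2",
            "java:sha256:0f3a...:com.acme.Service#load(String):77de",
            "java:sha256:0f3a...:org.lib.Parser#parse(String):9f3c",
        ]],
    },
    "subgraph": {
        "nodes": [{"id": "...", "kind": "method"}],
        "edges": [{"from": "...", "to": "...", "type": "direct-call"}],
    },
    "assumptions": ["config gate FEATURE_X not observable; assumed enabled"],
    "tooling": {
        "analyzer": {"name": "example-analyzer", "version": "0.1.0", "digest": "sha256:..."},
        "config_hash": "sha256:...",
        "mapping_dataset_hash": "sha256:...",
    },
    "hashes": {"evidence_content_hash": "sha256:...", "schema_version": "1.0"},
}
```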
---
## The one decision you should force early
Decide (and document) whether your semantics are:
- **Witness-based** (“REACHABLE only if we can produce a witness path”), and
- **Conservative on negative claims** (“NOT_PROVEN_REACHABLE” is not “unreachable”).
This single decision will keep the system honest, reduce legal/audit risk, and prevent the product from drifting into hand-wavy “trust us” scoring.

@@ -0,0 +1,268 @@
## 1) Product direction: make “Unknowns” a first-class risk primitive
### Non-negotiable product principles
1. **Unknowns are not suppressed findings**
* They are a distinct state with distinct governance.
2. **Unknowns must be policy-addressable**
* If policy cannot block or allow them explicitly, the feature is incomplete.
3. **Unknowns must be attested**
* Every signed decision must carry “what we don’t know” in a machine-readable way.
4. **Unknowns must be default-on**
* Users may adjust thresholds, but they must not be able to “turn off unknown tracking.”
### Definition: what counts as an “unknown”
PMs must ensure that “unknown” is not vague. Define **reason-coded unknowns**, for example:
* **U-RCH**: Reachability unknown (call path indeterminate)
* **U-ID**: Component identity unknown (ambiguous package / missing digest / unresolved PURL)
* **U-PROV**: Provenance unknown (cannot map binary → source/build)
* **U-VEX**: VEX conflict or missing applicability statement
* **U-FEED**: Knowledge source missing (offline feed gaps, mirror stale)
* **U-CONFIG**: Config/runtime gate unknown (feature flag not observable)
* **U-ANALYZER**: Analyzer limitation (language/framework unsupported)
Each unknown must have:
* `reason_code` (one of a stable enum)
* `scope` (component, binary, symbol, package, image, repo)
* `evidence_refs` (what we inspected)
* `assumptions` (what would need to be true/false)
* `remediation_hint` (how to reduce unknown)
**Acceptance criterion:** every unknown surfaced to users can be traced to a reason code and remediation hint.
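A sketch of that record shape, using the reason codes above; class and field names are illustrative:
```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class UnknownReason(Enum):
    U_RCH = "U-RCH"            # reachability unknown
    U_ID = "U-ID"              # component identity unknown
    U_PROV = "U-PROV"          # provenance unknown
    U_VEX = "U-VEX"            # VEX conflict / missing applicability
    U_FEED = "U-FEED"          # knowledge source missing or stale
    U_CONFIG = "U-CONFIG"      # config/runtime gate unknown
    U_ANALYZER = "U-ANALYZER"  # analyzer limitation


@dataclass(frozen=True)
class Unknown:
    reason_code: UnknownReason
    scope: str                              # component | binary | symbol | package | image | repo
    evidence_refs: List[str] = field(default_factory=list)
    assumptions: List[str] = field(default_factory=list)
    remediation_hint: str = ""


u = Unknown(UnknownReason.U_RCH, scope="image",
            evidence_refs=["callgraph:sha256:..."],
            assumptions=["dynamic dispatch at Handler#route unresolved"],
            remediation_hint="build with debug symbols to resolve virtual calls")
```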
---
## 2) Policy direction: “unknown budgets” must be enforceable and environment-aware
### Policy model requirements
Policy must support:
* Thresholds by environment (dev/test/stage/prod)
* Thresholds by unknown type (reachability vs provenance vs feed, etc.)
* Severity weighting (e.g., unknown on internet-facing service is worse)
* Exception workflow (time-bound, owner-bound)
* Deterministic evaluation (same inputs → same result)
### Recommended default policy posture (ship as opinionated defaults)
These defaults are intentionally strict in prod:
**Prod (default)**
* `unknown_reachable == 0` (fail build/deploy)
* `unknown_provenance == 0` (fail)
* `unknown_total <= 3` (fail if exceeded)
* `unknown_feed == 0` (fail; “we didn’t have data” is unacceptable for prod)
**Stage**
* `unknown_reachable <= 1`
* `unknown_provenance <= 1`
* `unknown_total <= 10`
**Dev**
* Never hard fail by default; warn + ticket/PR annotation
* Still compute unknowns and show trendlines (so teams see drift)
### Exception policy (required to avoid “disable unknowns” pressure)
Implement **explicit exceptions** rather than toggles:
* Exception must include: `owner`, `expiry`, `justification`, `scope`, `risk_ack`
* Exception must be emitted into attestations and reports (“this passed with exception X”).
**Acceptance criterion:** there is no “turn off unknowns” knob; only thresholds and expiring exceptions.
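A sketch of how these budgets and expiring exceptions could be evaluated together; threshold values mirror the defaults above, and all names are illustrative:
```python
from datetime import date

UNKNOWN_BUDGETS = {
    "prod":  {"unknown_reachable": 0, "unknown_provenance": 0,
              "unknown_feed": 0, "unknown_total": 3},
    "stage": {"unknown_reachable": 1, "unknown_provenance": 1, "unknown_total": 10},
    "dev":   None,   # never hard-fails; warn + annotate only
}


def evaluate_unknown_budget(env: str, counts: dict, exceptions: list,
                            today: date) -> list:
    """Return the list of violated budget keys (empty list == pass)."""
    budget = UNKNOWN_BUDGETS.get(env)
    if budget is None:
        return []                                   # dev: report, do not block
    # Only unexpired, matching exceptions may relax a specific budget key.
    waived = {e["budget_key"] for e in exceptions
              if e["scope_env"] == env and e["expiry"] >= today}
    return [key for key, limit in budget.items()
            if counts.get(key, 0) > limit and key not in waived]


violations = evaluate_unknown_budget(
    "prod",
    counts={"unknown_reachable": 0, "unknown_total": 5},
    exceptions=[{"budget_key": "unknown_total", "scope_env": "prod",
                 "owner": "team-payments", "expiry": date(2026, 3, 31),
                 "justification": "vendor fix pending", "risk_ack": True}],
    today=date(2026, 1, 15),
)
print(violations)   # [] because the total budget is waived by an unexpired exception
```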
---
## 3) Reporting direction: unknowns must be visible, triaged, and trendable
### Required reporting surfaces
1. **Release / PR report**
* Unknown summary at top:
* total unknowns
* unknowns by reason code
* unknowns blocking policy vs not
* “What changed?” vs previous baseline (unknown delta)
2. **Dashboard (portfolio view)**
* Unknowns over time
* Top teams/services by unknown count
* Top unknown causes (reason codes)
3. **Operational triage view**
* “Unknown queue” sortable by:
* environment impact (prod/stage)
* exposure class (internet-facing/internal)
* reason code
* last-seen time
* owner
### Reporting should drive action, not anxiety
Every unknown row must include:
* Why it’s unknown (reason code + short explanation)
* What evidence is missing
* How to reduce unknown (concrete steps)
* Expected effect (e.g., “adding debug symbols will likely reduce U-RCH by ~X”)
**Key PM instruction:** treat unknowns like an **SLO**. Teams should be able to commit to “unknowns in prod must trend to zero.”
---
## 4) Attestations direction: unknowns must be cryptographically bound to decisions
Every signed decision/attestation must include an “unknowns summary” section.
### Attestation requirements
Include at minimum:
* `unknown_total`
* `unknown_by_reason_code` (map of reason→count)
* `unknown_blocking_count`
* `unknown_details_digest` (hash of the full list if too large)
* `policy_thresholds_applied` (the exact thresholds used)
* `exceptions_applied` (IDs + expiries)
* `knowledge_snapshot_id` (feeds/policy bundle hash if you support offline snapshots)
**Why this matters:** if you sign a “pass,” you must also sign what you *didn’t know* at the time. Otherwise the signature is not audit-grade.
**Acceptance criterion:** any downstream verifier can reject a signed “pass” based solely on unknown fields (e.g., “reject if unknown_reachable>0 in prod”).
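A sketch of the unknowns block inside a signed decision and the kind of check a downstream verifier can run on it; field names follow the list above, values are placeholders:
```python
import hashlib
import json

unknowns_summary = {
    "unknown_total": 4,
    "unknown_by_reason_code": {"U-RCH": 1, "U-FEED": 1, "U-CONFIG": 2},
    "unknown_blocking_count": 1,
    "unknown_details_digest": "sha256:" + hashlib.sha256(
        json.dumps([], sort_keys=True).encode()).hexdigest(),   # hash of the full list
    "policy_thresholds_applied": {"prod": {"unknown_reachable": 0, "unknown_total": 3}},
    "exceptions_applied": [{"id": "EXC-142", "expiry": "2026-03-31"}],
    "knowledge_snapshot_id": "snapshot:2026-01-15T00:00:00Z:sha256:...",
}


def downstream_reject(summary: dict, env: str = "prod") -> bool:
    """A verifier can reject a signed 'pass' purely from the unknown fields."""
    return env == "prod" and summary["unknown_by_reason_code"].get("U-RCH", 0) > 0


print(downstream_reject(unknowns_summary))   # True: reachability unknowns in prod
```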
---
## 5) Development direction: implement unknown propagation as a first-class data flow
### Core engineering tasks (must be done in this order)
#### A. Define the canonical “Tri-state” evaluation type
For any security claim, the evaluator must return:
* `TRUE` (evidence supports)
* `FALSE` (evidence refutes)
* `UNKNOWN` (insufficient evidence)
Do not represent unknown as nulls or missing fields. It must be explicit.
#### B. Build the unknown aggregator and reason-code framework
* A single aggregation layer computes:
* unknown counts per scope
* unknown counts per reason code
* unknown “blockers” based on policy
* This must be deterministic and stable (no random ordering, stable IDs).
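A sketch of the explicit tri-state value and a deterministic aggregator over it; names are illustrative, and output ordering is forced by sorting:
```python
from collections import Counter
from enum import Enum


class Verdict(Enum):
    TRUE = "TRUE"          # evidence supports the claim
    FALSE = "FALSE"        # evidence refutes the claim
    UNKNOWN = "UNKNOWN"    # insufficient evidence; never a null or missing field


def aggregate_unknowns(results: list) -> dict:
    """results: dicts like {"verdict": Verdict, "scope": str, "reason_code": str}."""
    unknowns = [r for r in results if r["verdict"] is Verdict.UNKNOWN]
    by_reason = Counter(r["reason_code"] for r in unknowns)
    by_scope = Counter(r["scope"] for r in unknowns)
    return {
        "unknown_total": len(unknowns),
        "unknown_by_reason_code": dict(sorted(by_reason.items())),   # stable ordering
        "unknown_by_scope": dict(sorted(by_scope.items())),
    }


print(aggregate_unknowns([
    {"verdict": Verdict.UNKNOWN, "scope": "image", "reason_code": "U-RCH"},
    {"verdict": Verdict.TRUE, "scope": "image", "reason_code": ""},
    {"verdict": Verdict.UNKNOWN, "scope": "package", "reason_code": "U-FEED"},
]))
```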
#### C. Ensure analyzers emit unknowns instead of silently failing
Any analyzer that cannot conclude must emit:
* `UNKNOWN` + reason code + evidence pointers
Examples:
* call graph incomplete → `U-RCH`
* stripped binary cannot map symbols → `U-PROV`
* unsupported language → `U-ANALYZER`
#### D. Provide “reduce unknown” instrumentation hooks
Attach remediation metadata:
* “add build flags …”
* “upload debug symbols …”
* “enable source mapping …”
* “mirror feeds …”
This is how you prevent user backlash.
---
## 6) Make it default rather than optional: rollout plan without breaking adoption
### Phase 1: compute + display (no blocking)
* Unknowns computed for all scans
* Reports show unknown budgets and what would have failed in prod
* Collect baseline metrics for 2–4 weeks of typical usage
### Phase 2: soft gating
* In prod-like pipelines: fail only on `unknown_reachable > 0`
* Everything else warns + requires owner acknowledgement
### Phase 3: full policy enforcement
* Enforce default thresholds
* Exceptions require expiry and are visible in attestations
### Phase 4: governance integration
* Unknowns become part of:
* release readiness checks
* quarterly risk reviews
* vendor compliance audits
**Dev Manager instruction:** invest in tooling that reduces unknowns early (symbol capture, provenance mapping, better analyzers). Otherwise “unknown gating” becomes politically unsustainable.
---
## 7) “Definition of Done” checklist for PMs and Dev Managers
### PM DoD
* [ ] Unknowns are explicitly defined with stable reason codes
* [ ] Policy can fail on unknowns with environment-scoped thresholds
* [ ] Reports show unknown deltas and remediation guidance
* [ ] Exceptions are time-bound and appear everywhere (UI + API + attestations)
* [ ] Unknowns cannot be disabled; only thresholds/exceptions are configurable
### Engineering DoD
* [ ] Tri-state evaluation implemented end-to-end
* [ ] Analyzer failures never disappear; they become unknowns
* [ ] Unknown aggregation is deterministic and reproducible
* [ ] Signed attestation includes unknown summary + policy thresholds + exceptions
* [ ] CI/CD integration can enforce “fail if unknowns > N in prod”
---
## 8) Concrete policy examples you can standardize internally
### Minimal policy (prod)
* Block deploy if:
* `unknown_reachable > 0`
* OR `unknown_provenance > 0`
### Balanced policy (prod)
* Block deploy if:
* `unknown_reachable > 0`
* OR `unknown_provenance > 0`
* OR `unknown_total > 3`
### Risk-sensitive policy (internet-facing prod)
* Block deploy if:
* `unknown_reachable > 0`
* OR `unknown_total > 1`
* OR any unknown affects a component with known remotely-exploitable CVEs

@@ -0,0 +1,299 @@
## 1) Anchor the differentiator in one sentence everyone repeats
**Positioning invariant:**
Stella Ops does not “consume VEX to suppress findings.” Stella Ops **verifies who made the claim, scores how much to trust it, deterministically applies it to a decision, and emits a signed, replayable verdict**.
Everything you ship should make that sentence more true.
---
## 2) Shared vocabulary PMs/DMs must standardize
If you don’t align on these, you’ll ship features that look similar to competitors but do not compound into a moat.
### Core objects
- **VEX source**: a distribution channel and issuer identity (e.g., vendor feed, distro feed, OCI-attached attestation).
- **Issuer identity**: cryptographic identity used to sign/attest the VEX (key/cert/OIDC identity), not a string.
- **VEX statement**: one claim about one vulnerability status for one or more products; common statuses include *Not Affected, Affected, Fixed, Under Investigation* (terminology varies by format).
- **Verification result**: cryptographic + semantic verification facts about a VEX document/source.
- **Trust score**: deterministic numeric/ranked evaluation of the source and/or statement quality.
- **Decision**: a policy outcome (pass/fail/needs-review) for a specific artifact or release.
- **Attestation**: signed statement bound to an artifact (e.g., OCI artifact) that captures decision + evidence.
- **Knowledge snapshot**: frozen set of inputs (VEX docs, keys, policies, vulnerability DB versions, scoring code version) required for deterministic replay.
---
## 3) Product Manager guidelines
### 3.1 Treat “VEX source onboarding” as a first-class product workflow
Your differentiator collapses if VEX is just “upload a file.”
**PM requirements:**
1. **VEX Source Registry UI/API**
- Add/edit a source: URL/feed/OCI pattern, update cadence, expected issuer(s), allowed formats.
- Define trust policy per source (thresholds, allowed statuses, expiry, overrides).
2. **Issuer enrollment & key lifecycle**
- Capture: issuer identity, trust anchor, rotation, revocation/deny-list, “break-glass disable.”
3. **Operational status**
- Source health: last fetch, last verified doc, signature failures, schema failures, drift.
**Why it matters:** customers will only operationalize VEX at scale if they can **govern it like a dependency feed**, not like a manual exception list.
### 3.2 Make “verification” visible, not implied
If users can’t see it, they won’t trust it—and auditors won’t accept it.
**Minimum UX per VEX document/statement:**
- Verification status: **Verified / Unverified / Failed**
- Issuer identity: who signed it (and via what trust anchor)
- Format + schema validation status (OpenVEX JSON schema exists and is explicitly recommended for validation).
- Freshness: timestamp, last updated
- Product mapping coverage: “X of Y products matched to SBOM/components”
### 3.3 Provide “trust score explanations” as a primary UI primitive
Trust scoring must not feel like a magic number.
**UX requirement:** every trust score shows a **breakdown** (e.g., Identity 30/30, Authority 20/25, Freshness 8/10, Evidence quality 6/10…).
This is both:
- a user adoption requirement (security teams will challenge it), and
- a moat hardener (competitors rarely expose scoring mechanics).
### 3.4 Define policy experiences that force deterministic coupling
You are not building a “VEX viewer.” You are building **decisioning**.
Policies must allow:
- “Accept VEX only if verified AND trust score ≥ threshold”
- “Accept Not Affected only if justification/impact statement exists”
- “If conflicting VEX exists, resolve by trust-weighted precedence”
- “For unverified VEX, treat status as Under Investigation (or Unknown), not Not Affected”
This aligns with CSAF’s VEX profile expectation that *known_not_affected* should have an impact statement (machine-readable flag or human-readable justification).
### 3.5 Ship “audit export” as a product feature, not a report
Auditors want to know:
- which VEX claims were applied,
- who asserted them,
- what trust policy allowed them,
- and what was the resulting decision.
ENISA’s SBOM guidance explicitly emphasizes “historical snapshots” and “evidence chain integrity” as success criteria for SBOM/VEX integration programs.
So your product needs:
- exportable evidence bundles (machine-readable)
- signed verdicts linked to the artifact
- replay semantics (“recompute this exact decision later”)
### 3.6 MVP scoping: start with sources that prove the model
For early product proof, prioritize sources that:
- are official,
- have consistent structure,
- publish frequently,
- contain configuration nuance.
Example: Ubuntu publishes VEX following OpenVEX, emphasizing exploitability in specific configurations and providing official distribution points (tarball + GitHub).
This gives you a clean first dataset for verification/trust scoring behaviors.
---
## 4) Development Manager guidelines
### 4.1 Architect it as a pipeline with hard boundaries
Do not mix verification, scoring, and decisioning in one component. You need isolatable, testable stages.
**Recommended pipeline stages:**
1. **Ingest**
- Fetch from registry/OCI
- Deduplicate by content hash
2. **Parse & normalize**
- Convert OpenVEX / CSAF VEX / CycloneDX VEX into a **canonical internal VEX model**
   - Note: OpenVEX explicitly calls out that CycloneDX VEX uses different status/justification labels and may need translation.
3. **Verify (cryptographic + semantic)**
4. **Trust score (pure function)**
5. **Conflict resolve**
6. **Decision**
7. **Attest + persist snapshot**
### 4.2 Verification must include both cryptography and semantics
#### Cryptographic verification (minimum bar)
- Verify signature/attestation against expected issuer identity.
- Validate certificate/identity chains per customer trust anchors.
- Support OCI-attached artifacts and “signature-of-signature” patterns (Sigstore describes countersigning: signature artifacts can themselves be signed).
#### Semantic verification (equally important)
- Schema validation (OpenVEX provides JSON schema guidance).
- Vulnerability identifier validity (CVE/aliases)
- Product reference validity (e.g., purl)
- Statement completeness rules:
  - “Not affected” must include rationale; CSAF VEX profile requires an impact statement for known_not_affected in flags or threats.
- Cross-check the statement scope to known SBOM/components:
- If the VEX references products that do not exist in the artifact SBOM, the claim should not affect the decision (or should reduce trust sharply).
### 4.3 Trust scoring must be deterministic by construction
If trust scoring varies between runs, you cannot produce replayable, attestable decisions.
**Rules for determinism:**
- Trust score is a **pure function** of:
- VEX document hash
- verification result
- source configuration (immutable version)
- scoring algorithm version
- evaluation timestamp (explicit input, included in snapshot)
- Never call external services during scoring unless responses are captured and hashed into the snapshot.
### 4.4 Implement two trust concepts: Source Trust and Statement Quality
Do not overload one score to do everything.
- **Source Trust**: “how much do we trust the issuer/channel?”
- **Statement Quality**: “how well-formed, specific, justified is this statement?”
You can then combine them:
`TrustScore = f(SourceTrust, StatementQuality, Freshness, TrackRecord)`
### 4.5 Conflict resolution must be policy-driven, not hard-coded
Conflicting VEX is inevitable:
- vendor vs distro
- older vs newer
- internal vs external
Resolve via:
- deterministic precedence rules configured per tenant
- trust-weighted tie-breakers
- “newer statement wins” only when issuer is the same or within the same trust class
### 4.6 Store VEX and decision inputs as content-addressed artifacts
If you want replayability, you must be able to reconstruct the “world state.”
**Persist:**
- VEX docs (by digest)
- verification artifacts (signature bundles, cert chains)
- normalized VEX statements (canonical form)
- trust score + breakdown + algorithm version
- policy bundle + version
- vulnerability DB snapshot identifiers
- decision output + evidence pointers
---
## 5) A practical trust scoring rubric you can hand to teams
Use a 0–100 score with defined buckets. The weights below are a starting point; what matters is consistency and explainability.
### 5.1 Source Trust (0–60)
1. **Issuer identity verified (0–25)**
- 0 if unsigned/unverifiable
- 25 if signature verified to a known trust anchor
2. **Issuer authority alignment (0–20)**
- 20 if issuer is the product supplier/distro maintainer for that component set
- lower if third party / aggregator
3. **Distribution integrity (0–15)**
   - extra credit if the VEX is distributed as an attestation bound to an artifact and/or uses auditable signature patterns (e.g., countersigning).
### 5.2 Statement Quality (0–40)
1. **Scope specificity (0–15)**
- exact product IDs (purl), versions, architectures, etc.
2. **Justification/impact present and structured (0–15)**
   - CSAF VEX expects impact statement for known_not_affected; Ubuntu maps “not_affected” to justifications like `vulnerable_code_not_present`.
3. **Freshness (0–10)**
- based on statement/document timestamps (explicitly hashed into snapshot)
### Score buckets
- **90–100**: Verified + authoritative + high-quality → eligible for gating
- **70–89**: Verified but weaker evidence/scope → eligible with policy constraints
- **40–69**: Mixed/partial trust → informational, not gating by default
- **0–39**: Unverified/low quality → do not affect decisions
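A sketch of the rubric as a pure, explainable function; the weights and bucket cut-offs are the starting points above, not a final algorithm, and all inputs are expected to come from the verification result and snapshot:
```python
SCORING_VERSION = "rubric-0.1"


def trust_score(*, identity_verified: bool, authority: float, distribution: float,
                scope_specificity: float, justification: float, freshness: float) -> dict:
    """Fractional inputs are 0.0-1.0, derived from the verification result and the
    normalized statement; everything used here is captured in the snapshot."""
    breakdown = {
        "identity_verified": 25 if identity_verified else 0,     # 0-25
        "issuer_authority": round(20 * authority),               # 0-20
        "distribution_integrity": round(15 * distribution),      # 0-15
        "scope_specificity": round(15 * scope_specificity),      # 0-15
        "justification_quality": round(15 * justification),      # 0-15
        "freshness": round(10 * freshness),                      # 0-10
    }
    total = sum(breakdown.values())
    if total >= 90:
        bucket = "gating-eligible"
    elif total >= 70:
        bucket = "gating-with-constraints"
    elif total >= 40:
        bucket = "informational"
    else:
        bucket = "non-influencing"
    return {"score": total, "bucket": bucket,
            "breakdown": breakdown, "algorithm_version": SCORING_VERSION}


print(trust_score(identity_verified=True, authority=1.0, distribution=0.8,
                  scope_specificity=0.9, justification=1.0, freshness=0.7))
```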
---
## 6) Tight coupling to deterministic decisioning: what “coupling” means in practice
### 6.1 VEX must be an input to the same deterministic evaluation engine that produces the verdict
Do not build “VEX handling” as a sidecar that produces annotations.
**Decision engine inputs must include:**
- SBOM / component graph
- vulnerability findings
- normalized VEX statements
- verification results + trust scores
- tenant policy bundle
- evaluation timestamp + snapshot identifiers
The engine output must include:
- final status per vulnerability (affected/not affected/fixed/under investigation/unknown)
- **why** (evidence pointers)
- the policy rule(s) that caused it
### 6.2 Default posture: fail-safe, not fail-open
Recommended defaults:
- **Unverified VEX never suppresses vulnerabilities.**
- Trust score below threshold never suppresses.
- “Not affected” without justification/impact statement never suppresses.
This is aligned with CSAF VEX expectations and avoids the easiest suppression attack vector.
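A sketch of those fail-safe rules applied to a single finding; the statement fields and status strings are illustrative, and in practice they would come from the canonical VEX representation:
```python
def apply_vex(finding_status: str, vex: dict, *, trust_threshold: int = 70) -> str:
    """Return the post-VEX status for one vulnerability; never fail open."""
    if not vex.get("verified", False):
        return finding_status                      # unverified VEX never suppresses
    if vex.get("trust_score", 0) < trust_threshold:
        return finding_status                      # low trust never suppresses
    status = vex.get("status")
    if status == "not_affected" and not vex.get("justification"):
        return finding_status                      # missing rationale never suppresses
    if status in ("not_affected", "fixed"):
        return status
    if status == "under_investigation":
        return "under_investigation"
    return finding_status                          # "affected" or unknown status keeps the finding


print(apply_vex("affected", {"verified": True, "trust_score": 92,
                             "status": "not_affected",
                             "justification": "vulnerable_code_not_present"}))
```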
### 6.3 Make uncertainty explicit
If VEX conflicts or is low trust, your decisioning must produce explicit states like:
- “Unknown (insufficient trusted VEX)”
- “Under Investigation”
That is consistent with common VEX status vocabulary and avoids false certainty.
---
## 7) Tight coupling to attestations: what to attest, when, and why
### 7.1 Attest **decisions**, not just documents
Competitors already sign SBOMs. Your moat is signing the **verdict** with the evidence chain.
Each signed verdict should bind:
- subject artifact digest (container/image/package)
- decision output (pass/fail/etc.)
- hashes of:
- VEX docs used
- verification artifacts
- trust scoring breakdown
- policy bundle
- vulnerability DB snapshot identifiers
### 7.2 Make attestations replayable
Your attestation must contain enough references (digests) that the system can:
- re-run the decision in an air-gapped environment
- obtain the same outputs
This aligns with “historical snapshots” / “evidence chain integrity” expectations in modern SBOM programs.
### 7.3 Provide two attestations (recommended)
1. **VEX intake attestation** (optional but powerful)
- “We ingested and verified this VEX doc from issuer X under policy Y.”
2. **Risk verdict attestation** (core differentiator)
- “Given SBOM, vulnerabilities, verified VEX, and policy snapshot, the artifact is acceptable/unacceptable.”
Sigstore’s countersigning concept illustrates that you can add layers of trust over artifacts/signatures; your verdict is the enterprise-grade layer.
---
## 8) “Definition of Done” checklists (use in roadmaps)
### PM DoD for VEX Trust (ship criteria)
- A customer can onboard a VEX source and see issuer identity + verification state.
- Trust score exists with a visible breakdown and policy thresholds.
- Policies can gate on trust score + verification.
- Audit export: per release, show which VEX claims affected the final decision.
### DM DoD for Deterministic + Attestable
- Same inputs → identical trust score and decision (golden tests).
- All inputs content-addressed and captured in a snapshot bundle.
- Attestation includes digests of all relevant inputs and a decision summary.
- No network dependency at evaluation time unless recorded in snapshot.
---
## 9) Metrics that prove you differentiated
Track these from the first pilot:
1. **% of decisions backed by verified VEX** (not just present)
2. **% of “not affected” outcomes with cryptographic verification + justification**
3. **Replay success rate** (recompute verdict from snapshot)
4. **Time-to-audit** (minutes to produce evidence chain for a release)
5. **False suppression rate** (should be effectively zero with fail-safe defaults)


@@ -0,0 +1,104 @@
Below is a **feature → moat strength** map for Stella Ops, explicitly benchmarked against the tools we’ve been discussing (Trivy/Aqua, Grype/Syft, Anchore Enterprise, Snyk, Prisma Cloud). I’m using **“moat”** in the strict sense: *how hard is it for an incumbent to replicate the capability to parity, and how strong are the switching costs once deployed.*
### Moat scale
* **5 = Structural moat** (new primitives, strong defensibility, durable switching cost)
* **4 = Strong moat** (difficult multi-domain engineering; incumbents have only partial analogs)
* **3 = Moderate moat** (others can build; differentiation is execution + packaging)
* **2 = Weak moat** (table-stakes soon; limited defensibility)
* **1 = Commodity** (widely available in OSS / easy to replicate)
---
## 1) Stella Ops candidate features mapped to moat strength
| Stella Ops feature (precisely defined) | Closest competitor analogs (evidence) | Competitive parity today | Moat strength | Why this is (or isnt) defensible | How to harden the moat |
| ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------: | ------------: | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Signed, replayable risk verdicts**: “this artifact is acceptable” decisions produced deterministically, with an evidence bundle + policy snapshot, signed as an attestation | Ecosystem can sign SBOM attestations (e.g., Syft + Sigstore; DSSE/in-toto via cosign), but not “risk verdict” decisions end-to-end ([Anchore][1]) | Low | **5** | This requires a **deterministic evaluation model**, a **proof/evidence schema**, and “knowledge snapshotting” so results are replayable months later. Incumbents mostly stop at exporting scan results or SBOMs, not signing a decision in a reproducible way. | Make the verdict format a **first-class artifact** (OCI-attached attestation), with strict replay semantics (“same inputs → same verdict”), plus auditor-friendly evidence extraction. |
| **VEX decisioning engine (not just ingestion)**: ingest OpenVEX/CycloneDX/CSAF, resolve conflicts with a trust/policy lattice, and produce explainable outcomes | Trivy supports multiple VEX formats (CycloneDX/OpenVEX/CSAF) but notes its “experimental/minimal functionality” ([Trivy][2]). Grype supports OpenVEX ingestion ([Chainguard][3]). Anchore can generate VEX docs from annotations (OpenVEX + CycloneDX) ([Anchore Docs][4]). Aqua runs VEX Hub for distributing VEX statements to Trivy ([Aqua][5]) | Medium (ingestion exists; decision logic is thin) | **4** | Ingestion alone is easy; the moat comes from **formal conflict resolution**, provenance-aware trust weighting, and deterministic outcomes. Most tools treat VEX as suppression/annotation, not a reasoning substrate. | Ship a **policy-controlled merge semantics** (“vendor > distro > internal” is too naive) + required evidence hooks (e.g., “not affected because feature flag off”). |
| **Reachability with proof**, tied to deployable artifacts: produce a defensible chain “entrypoint → call path → vulnerable symbol,” plus configuration gates | Snyk has reachability analysis in GA for certain languages/integrations and uses call-graph style reasoning to determine whether vulnerable code is called ([Snyk User Docs][6]). Some commercial vendors also market reachability (e.g., Endor Labs is listed in CycloneDX Tool Center as analyzing reachability) ([CycloneDX][7]) | Medium (reachability exists, but proof portability varies) | **4** | “Reachability” as a label is no longer unique. The moat is **portable proofs** (usable in audits and in air-gapped environments) + artifact-level mapping (not just source repo analysis) + deterministic replay. | Focus on **proof-carrying reachability**: store the reachability subgraph as evidence; make it reproducible and attestable; support both source and post-build artifacts. |
| **Smart-Diff (semantic risk delta)**: between releases, explain “what materially changed in exploitable surface,” not just “CVE count changed” | Anchore provides SBOM management and policy evaluation (good foundation), but “semantic risk diff” is not a prominent, standardized feature in typical scanners ([Anchore Docs][8]) | Low–Medium | **4** | Most incumbents can diff findings lists. Few can diff **reachability graphs, policy outcomes, and VEX state** to produce stable “delta narratives.” Hard to replicate without the underlying evidence model. | Treat diff as first-class: version SBOM graphs + reachability graphs + VEX claims; compute deltas over those graphs and emit a signed “delta verdict.” |
| **Unknowns as first-class state**: represent “unknown-reachable/unknown-unreachable” and force policies to account for uncertainty | Not a standard capability in common scanners/platforms; most systems output findings and (optionally) suppressions | Low | **4** | This is conceptually simple but operationally rare; it requires rethinking UX, scoring, and policy evaluation. It becomes sticky once orgs base governance on uncertainty budgets. | Bake unknowns into policies (“fail if unknowns > N in prod”), reporting, and attestations. Make it the default rather than optional. |
| **Air-gapped epistemic mode**: offline operation where the tool can prove what knowledge it used (feed snapshot + timestamps + trust anchors) | Prisma Cloud Compute Edition supports air-gapped environments and has an offline Intel Stream update mechanism ([Prisma Cloud Docs][9]). (But “prove exact knowledge state used for decisions” is typically not the emphasis.) | Medium | **4** | Air-gapped “runtime” is common; air-gapped **reproducibility** is not. The moat is packaging offline feeds + policies + deterministic scoring into a replayable bundle tied to attestations. | Deliver a “sealed knowledge snapshot” workflow (export/import), and make audits a one-command replay. |
| **SBOM ledger + lineage**: BYOS ingestion plus versioned SBOM storage, grouping, and historical tracking | Anchore explicitly positions centralized SBOM management and “Bring Your Own SBOM” ([Anchore Docs][8]). Snyk can generate SBOMs and expose SBOM via API in CycloneDX/SPDX formats ([Snyk User Docs][10]). Prisma can export CycloneDX SBOMs for scans ([Prisma Cloud Docs][11]) | High | **3** | SBOM generation/storage is quickly becoming table stakes. You can still differentiate on **graph fidelity + lineage semantics**, but “having SBOMs” alone wont be a moat. | Make the ledger valuable via **semantic diff, evidence joins (reachability/VEX), and provenance** rather than storage. |
| **Policy engine with proofs**: policy-as-code that produces a signed explanation (“why pass/fail”) and links to evidence nodes | Anchore has a mature policy model (policy JSON, gates, allowlists, mappings) ([Anchore Docs][12]). Prisma/Aqua have rich policy + runtime guardrails (platform-driven) ([Aqua][13]) | High | **3** | Policy engines are common. The moat is the **proof output** + deterministic replay + integration with attestations. | Keep policy language small but rigorous; always emit evidence pointers; support “policy compilation” to deterministic decision artifacts. |
| **VEX distribution network**: ecosystem layer that aggregates, validates, and serves VEX at scale | Aqua’s VEX Hub is explicitly a centralized repository designed for discover/fetch/consume flows with Trivy ([Aqua][5]) | Medium | **3–4** | A network layer can become a moat if it achieves broad adoption. But incumbents can also launch hubs. This becomes defensible only with **network effects + trust frameworks**. | Differentiate with **verification + trust scoring** of VEX sources, plus tight coupling to deterministic decisioning and attestations. |
| **“Integrations everywhere”** (CI/CD, registry, Kubernetes, IDE) | Everyone in this space integrates broadly; reachability and scoring features often ride those integrations (e.g., Snyk reachability depends on repo/integration access) ([Snyk User Docs][6]) | High | **1–2** | Integrations are necessary, but not defensible—mostly engineering throughput. | Use integrations to *distribute attestations and proofs*, not as the headline differentiator. |
---
## 2) Where competitors already have strong moats (avoid head-on fights early)
These are areas where incumbents are structurally advantaged, so Stella Ops should either (a) integrate rather than replace, or (b) compete only if you have a much sharper wedge.
### Snyk’s moat: developer adoption + reachability-informed prioritization
* Snyk publicly documents **reachability analysis** (GA for certain integrations/languages) ([Snyk User Docs][6])
* Snyk prioritization incorporates reachability and other signals into **Priority Score** ([Snyk User Docs][14])
**Implication:** pure “reachability” claims won’t beat Snyk; **proof-carrying, artifact-tied, replayable reachability** can.
### Prisma Cloud’s moat: CNAPP breadth + graph-based risk prioritization + air-gapped CWPP
* Prisma invests in graph-driven investigation/tracing of vulnerabilities ([Prisma Cloud Docs][15])
* Risk prioritization and risk-score ranked vulnerability views are core platform capabilities ([Prisma Cloud Docs][16])
* Compute Edition supports **air-gapped environments** and has offline update workflows ([Prisma Cloud Docs][9])
**Implication:** competing on “platform breadth” is a losing battle early; compete on **decision integrity** (deterministic, attestable, replayable) and integrate where needed.
### Anchore’s moat: SBOM operations + policy-as-code maturity
* Anchore is explicitly SBOM-management centric and supports policy gating constructs ([Anchore Docs][8])
**Implication:** Anchore is strong at “SBOM at scale.” Stella Ops should outperform on **semantic diff, VEX reasoning, and proof outputs**, not just SBOM storage.
### Aqua’s moat: code-to-runtime enforcement plus emerging VEX distribution
* Aqua provides CWPP-style runtime policy enforcement/guardrails ([Aqua][13])
* Aqua backs VEX Hub for VEX distribution and Trivy consumption ([Aqua][5])
**Implication:** if Stella Ops is not a runtime protection platform, don’t chase CWPP breadth—use Aqua/Prisma integrations and focus on upstream decision quality.
---
## 3) Practical positioning: which features produce the most durable wedge
If you want the shortest path to a *defensible* position:
1. **Moat anchor (5): Signed, replayable risk verdicts**
* Everything else (VEX, reachability, diff) becomes evidence feeding that verdict.
2. **Moat amplifier (4): VEX decisioning + proof-carrying reachability**
* In 2025, VEX ingestion exists in Trivy/Grype/Anchore ([Trivy][2]), and reachability exists in Snyk ([Snyk User Docs][6]).
* Your differentiation must be: **determinism + portability + auditability**.
3. **Moat compounding (4): Smart-Diff over risk meaning**
* Turns “scan results” into an operational change-control primitive.
---
## 4) A concise “moat thesis” per feature (one-liners you can use internally)
* **Deterministic signed verdicts:** “We don’t output findings; we output an attestable decision that can be replayed.”
* **VEX decisioning:** “We treat VEX as a logical claim system, not a suppression file.”
* **Reachability proofs:** “We provide proof of exploitability in *this* artifact, not just a badge.”
* **Smart-Diff:** “We explain what changed in exploitable surface area, not what changed in CVE count.”
* **Unknowns modeling:** “We quantify uncertainty and gate on it.”
---
If you want, I can convert the table into a **2×2 moat map** (Customer Value vs Defensibility) and a **build-order roadmap** that maximizes durable advantage while minimizing overlap with entrenched competitor moats.
[1]: https://anchore.com/sbom/creating-sbom-attestations-using-syft-and-sigstore/ "Creating SBOM Attestations Using Syft and Sigstore"
[2]: https://trivy.dev/docs/v0.50/supply-chain/vex/ "VEX"
[3]: https://www.chainguard.dev/unchained/vexed-then-grype-about-it-chainguard-and-anchore-announce-grype-supports-openvex "VEXed? Then Grype about it"
[4]: https://docs.anchore.com/current/docs/vulnerability_management/vuln_annotations/ "Vulnerability Annotations and VEX"
[5]: https://www.aquasec.com/blog/introducing-vex-hub-unified-repository-for-vex-statements/ "Trivy VEX Hub: The Solution to Vulnerability Fatigue"
[6]: https://docs.snyk.io/manage-risk/prioritize-issues-for-fixing/reachability-analysis "Reachability analysis"
[7]: https://cyclonedx.org/tool-center/ "CycloneDX Tool Center"
[8]: https://docs.anchore.com/current/docs/sbom_management/ "SBOM Management"
[9]: https://docs.prismacloud.io/en/compute-edition "Prisma Cloud Compute Edition"
[10]: https://docs.snyk.io/developer-tools/snyk-cli/commands/sbom "SBOM | Snyk User Docs"
[11]: https://docs.prismacloud.io/en/compute-edition/32/admin-guide/vulnerability-management/exporting-sboms "Exporting Software Bill of Materials on CycloneDX"
[12]: https://docs.anchore.com/current/docs/overview/concepts/policy/policies/ "Policies and Evaluation"
[13]: https://www.aquasec.com/products/cwpp-cloud-workload-protection/ "Cloud workload protection in Runtime - Aqua Security"
[14]: https://docs.snyk.io/manage-risk/prioritize-issues-for-fixing "Prioritize issues for fixing"
[15]: https://docs.prismacloud.io/en/enterprise-edition/content-collections/search-and-investigate/c2c-tracing-vulnerabilities/investigate-vulnerabilities-tracing "Use Vulnerabilities Tracing on Investigate"
[16]: https://docs.prismacloud.io/en/enterprise-edition/use-cases/secure-the-infrastructure/risk-prioritization "Risk Prioritization - Prisma Cloud Documentation"

@@ -0,0 +1,469 @@
Below are implementation-grade guidelines for Stella Ops Product Managers (PMs) and Development Managers (Eng Managers / Tech Leads) for two tightly coupled capabilities:
1. **Exception management as auditable objects** (not suppression files)
2. **Audit packs** (exportable, verifiable evidence bundles for releases and environments)
The intent is to make these capabilities:
* operationally useful (reduce friction in CI/CD and runtime governance),
* defensible in audits (tamper-evident, attributable, time-bounded), and
* consistent with Stella Ops positioning around determinism, evidence, and replayability.
---
# 1. Shared objectives and boundaries
## 1.1 Objectives
These two capabilities must jointly enable:
* **Risk decisions are explicit**: Every “ignore/suppress/waive” is a governed decision with an owner and expiry.
* **Decisions are replayable**: If an auditor asks “why did you ship this on date X?”, Stella Ops can reproduce the decision using the same policy + evidence + knowledge snapshot.
* **Decisions are exportable and verifiable**: Audit packs include the minimum necessary artifacts and a manifest that allows independent verification of integrity and completeness.
* **Operational friction is reduced**: Teams can ship safely with controlled exceptions, rather than ad-hoc suppressions, while retaining accountability.
## 1.2 Out of scope (explicitly)
Avoid scope creep early. The following are out of scope for v1 unless mandated by a target customer:
* Full GRC mapping to specific frameworks (you can *support evidence*; don’t claim compliance).
* Fully automated approvals based on HR org charts.
* Multi-year archival systems (start with retention, export, and immutable event logs).
* A “ticketing system replacement.” Integrate with ticketing; don’t rebuild it.
---
# 2. Shared design principles (non-negotiables)
These principles apply to both Exception Objects and Audit Packs:
1. **Attribution**: every action has an authenticated actor identity (human or service), a timestamp, and a reason.
2. **Immutability of history**: edits are new versions/events; never rewrite history in place.
3. **Least privilege scope**: exceptions must be as narrow as possible (artifact digest over tag; component purl over “any”; environment constraints).
4. **Time-bounded risk**: exceptions must expire. “Permanent ignore” is a governance smell.
5. **Deterministic evaluation**: given the same policy + snapshot + exceptions + inputs, the outcome is stable and reproducible.
6. **Separation of concerns**:
* Exception store = governed decisions.
* Scanner = evidence producer.
* Policy engine = deterministic evaluator.
* Audit packer = exporter/assembler/verifier.
---
# 3. Exception management as auditable objects
## 3.1 What an “Exception Object” is
An Exception Object is a structured, versioned record that modifies evaluation behavior *in a controlled manner*, while leaving the underlying findings intact.
It is not:
* a local `.ignore` file,
* a hidden suppression rule,
* a UI-only toggle,
* a vendor-specific “ignore list” with no audit trail.
### Exception types you should support (minimum set)
PMs should start with these canonical types:
1. **Vulnerability exception**
* suppress/waive a specific vulnerability finding (e.g., CVE/CWE) under defined scope.
2. **Policy exception**
* allow a policy rule to be bypassed under defined scope (e.g., “allow unsigned artifact for dev namespace”).
3. **Unknown-state exception** (if Stella models unknowns)
* allow a release despite unresolved unknowns, with explicit risk acceptance.
4. **Component exception**
* allow/deny a component/package/version across a domain, again with explicit scope and expiry.
## 3.2 Required fields and schema guidelines
PMs: mandate these fields; Eng: enforce them at API and storage level.
### Required fields (v1)
* **exception_id** (stable identifier)
* **version** (monotonic; or event-sourced)
* **status**: proposed | approved | active | expired | revoked
* **owner** (accountable person/team)
* **requester** (who initiated)
* **approver(s)** (who approved; may be empty for dev environments depending on policy)
* **created_at / updated_at / approved_at / expires_at**
* **scope** (see below)
* **reason_code** (taxonomy)
* **rationale** (free text, required)
* **evidence_refs** (optional in v1 but strongly recommended)
* **risk_acceptance** (explicit boolean or structured “risk accepted” block)
* **links** (ticket ID, PR, incident, vendor advisory reference); optional but useful
* **audit_log_refs** (implicit if event-sourced)
### Scope model (critical to defensibility)
Scope must be structured and narrowable. Provide scope dimensions such as:
* **Artifact scope**: image digest, SBOM digest, build provenance digest (preferred)
(Avoid tags as primary scope unless paired with immutability constraints.)
* **Component scope**: purl + version range + ecosystem
* **Vulnerability scope**: CVE ID(s), GHSA, internal ID; optionally path/function/symbol constraints
* **Environment scope**: cluster/namespace, runtime env (dev/stage/prod), repository, project, tenant
* **Time scope**: expires_at (required), optional “valid_from”
PM guideline: default UI and API should encourage digest-based scope and warn on broad scopes.
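A sketch of the Exception Object and its structured scope as data types; field names follow the lists above, while enum values and defaults are illustrative:
```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class ExceptionScope:
    artifact_digest: Optional[str] = None       # preferred over tags
    component_purl: Optional[str] = None        # e.g. "pkg:maven/org.lib/parser@1.2.3"
    vulnerability_ids: tuple = ()               # CVE / GHSA / internal IDs
    environment: Optional[str] = None           # dev | stage | prod, namespace, tenant ...
    valid_from: Optional[datetime] = None


@dataclass(frozen=True)
class ExceptionObject:
    exception_id: str
    version: int
    status: str                                 # proposed | approved | active | expired | revoked
    owner: str
    requester: str
    approvers: tuple
    created_at: datetime
    expires_at: datetime                        # required: risk is time-bounded
    scope: ExceptionScope
    reason_code: str                            # e.g. "NOT_REACHABLE", "RISK_ACCEPTED"
    rationale: str
    evidence_refs: tuple = ()
    risk_acceptance: bool = False
    links: tuple = ()                           # ticket / PR / advisory references

    def is_broad(self) -> bool:
        # Warn in UI/API when the scope is not pinned to a digest or purl.
        return self.scope.artifact_digest is None and self.scope.component_purl is None
```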
## 3.3 Reason codes (taxonomy)
Reason codes are a moat because they enable governance analytics and policy automation.
Minimum suggested taxonomy:
* **FALSE_POSITIVE** (with evidence expectations)
* **NOT_REACHABLE** (reachable proof preferred)
* **NOT_AFFECTED** (VEX-backed preferred)
* **BACKPORT_FIXED** (package/distro evidence preferred)
* **COMPENSATING_CONTROL** (link to control evidence)
* **RISK_ACCEPTED** (explicit sign-off)
* **TEMPORARY_WORKAROUND** (link to mitigation plan)
* **VENDOR_PENDING** (under investigation)
* **BUSINESS_EXCEPTION** (rare; requires stronger approval)
PM guideline: reason codes must be selectable and reportable; do not allow “Other” as the default.
## 3.4 Evidence attachments
Exceptions should evolve from “justification-only” to “justification + evidence.”
Evidence references can point to:
* VEX statements (OpenVEX/CycloneDX VEX)
* reachability proof fragments (call-path subgraph, symbol references)
* distro advisories / patch references
* internal change tickets / mitigation PRs
* runtime mitigations
Eng guideline: store evidence as references with integrity checks (hash/digest). For v2+, store evidence bundles as content-addressed blobs.
## 3.5 Lifecycle and workflows
### Lifecycle states and transitions
* **Proposed** → **Approved** → **Active** → (**Expired** or **Revoked**)
* **Renewal** should create a **new version** (never extend an old record silently).
### Approvals
PM guideline:
* At least two approval modes:
1. **Self-approved** (allowed only for dev/experimental scopes)
2. **Two-person review** (required for prod or broad scope)
Eng guideline:
* Enforce approval rules via policy config (not hard-coded).
* Record every approval action with actor identity and timestamp.
### Expiry enforcement
Non-negotiable:
* Expired exceptions must stop applying automatically.
* Renewals require an explicit action and new audit trail.
## 3.6 Evaluation semantics (how exceptions affect results)
This is where most products become non-auditable. You need deterministic, explicit rules.
PM guideline: define precedence clearly:
* Policy engine evaluates baseline findings → applies exceptions → produces verdict.
* Exceptions never delete underlying findings; they alter the *decision outcome* and annotate the reasoning.
Eng guideline: exception application must be:
* **Deterministic** (stable ordering rules)
* **Transparent** (verdict includes “exception applied: exception_id, reason_code, scope match explanation”)
* **Scoped** (match explanation must state which scope dimensions matched)
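Putting those three properties together, deterministic exception application can be sketched as follows; the finding and exception shapes are illustrative only, and a scope only matches when every constrained dimension matches.
```python
def apply_exceptions(findings: list[dict], exceptions: list[dict]) -> list[dict]:
    """Annotate findings with applicable exceptions; never delete findings.

    Stable ordering: findings and exceptions are sorted by fixed keys so the same
    inputs always produce the same annotations (illustrative data model only).
    """
    dimensions = {
        "artifact": ("artifact_digest", "artifact_digests"),
        "vulnerability": ("cve_id", "vulnerability_ids"),
    }
    decided = []
    for finding in sorted(findings, key=lambda f: (f["component"], f["cve_id"])):
        applied = []
        for exc in sorted(exceptions, key=lambda e: (e["exception_id"], e["version"])):
            matched, satisfied = [], True
            for dim, (finding_key, scope_key) in dimensions.items():
                constraint = exc["scope"].get(scope_key, [])
                if not constraint:
                    continue                      # dimension not constrained by this exception
                if finding[finding_key] in constraint:
                    matched.append(dim)
                else:
                    satisfied = False
                    break
            if satisfied and matched:
                applied.append({
                    "exception_id": exc["exception_id"],
                    "reason_code": exc["reason_code"],
                    "matched_scope_dimensions": matched,
                })
        # The finding stays in the output; only the decision annotation changes.
        decided.append({**finding, "exceptions_applied": applied, "waived": bool(applied)})
    return decided
```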
## 3.7 Auditability requirements
Exception management must be audit-ready by construction.
Minimum requirements:
* **Append-only event log** for create/approve/revoke/expire/renew actions
* **Versioning**: every change results in a new version or event
* **Tamper-evidence**: hash chain events or sign event batches
* **Retention**: define retention policy and export strategy
PM guideline: auditors will ask “who approved,” “why,” “when,” “what scope,” and “what changed since.” Design the UX and exports to answer those in minutes.
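For the tamper-evidence requirement, a minimal hash-chain sketch of the append-only log is shown below; a production design would additionally sign event batches and handle key rotation.
```python
import hashlib, json

GENESIS = "sha256:" + "0" * 64

def append_event(log: list[dict], event: dict) -> dict:
    """Append an entry whose hash covers both its payload and the previous entry's hash."""
    prev_hash = log[-1]["entry_hash"] if log else GENESIS
    body = json.dumps({"event": event, "prev": prev_hash}, sort_keys=True, separators=(",", ":"))
    entry = {
        "event": event,
        "prev": prev_hash,
        "entry_hash": "sha256:" + hashlib.sha256(body.encode()).hexdigest(),
    }
    log.append(entry)
    return entry

def verify_chain(log: list[dict]) -> bool:
    """Recompute every link; any edit, insertion, or deletion breaks the chain."""
    prev = GENESIS
    for entry in log:
        body = json.dumps({"event": entry["event"], "prev": prev}, sort_keys=True, separators=(",", ":"))
        expected = "sha256:" + hashlib.sha256(body.encode()).hexdigest()
        if entry["prev"] != prev or entry["entry_hash"] != expected:
            return False
        prev = entry["entry_hash"]
    return True
```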
## 3.8 UX guidelines
Key UX flows:
* **Create exception from a finding** (pre-fill CVE/component/artifact scope)
* **Preview impact** (“this will suppress 37 findings across 12 images; are you sure?”)
* **Expiry visibility** (countdown, alerts, renewal prompts)
* **Audit trail view** (who did what, with diffs between versions)
* **Search and filters** by owner, reason, expiry window, scope breadth, environment
UX anti-patterns to forbid:
* “Ignore all vulnerabilities in this image” with one click
* Silent suppressions without owner/expiry
* Exceptions created without linking to scope and reason
## 3.9 Product acceptance criteria (PM-owned)
A feature is not “done” until:
* Every exception has owner, expiry, reason code, scope.
* Exception history is immutable and exportable.
* Policy outcomes show applied exceptions and why.
* Expiry is enforced automatically.
* A user can answer: “What exceptions were active for this release?” within 2 minutes.
---
# 4. Audit packs
## 4.1 What an audit pack is
An Audit Pack is a **portable, verifiable bundle** that answers:
* What was evaluated? (artifacts, versions, identities)
* Under what policies? (policy version/config)
* Using what knowledge state? (vuln DB snapshot, VEX inputs)
* What exceptions were applied? (IDs, owners, rationales)
* What was the decision and why? (verdict + evidence pointers)
* What changed since the last release? (optional diff summary)
PM guideline: treat the Audit Pack as a product deliverable, not an export button.
## 4.2 Pack structure (recommended)
Use a predictable, documented layout. Example:
* `manifest.json`
* pack_id, generated_at, generator_version
* hashes/digests of every included file
* signing info (optional in v1; recommended soon)
* `inputs/`
* artifact identifiers (digests), repo references (optional)
* SBOM(s) (CycloneDX/SPDX)
* `vex/`
* VEX docs used + any VEX produced
* `policy/`
* policy bundle used (versioned)
* evaluation settings
* `exceptions/`
* all exceptions relevant to the evaluated scope
* plus event logs / versions
* `findings/`
* normalized findings list
* reachability evidence fragments if applicable
* `verdict/`
* final decision object
* explanation summary
* signed attestation (if supported)
* `diff/` (optional)
* delta from prior baseline (what changed materially)
## 4.3 Formats: human and machine
You need both:
* **Machine-readable** (JSON + standard SBOM/VEX formats) for verification and automation
* **Human-readable** summary (HTML or PDF) for auditors and leadership
PM guideline: machine artifacts are the source of truth. Human docs are derived views.
Eng guideline:
* Ensure the pack can be generated **offline**.
* Ensure deterministic outputs where feasible (stable ordering, consistent serialization).
## 4.4 Integrity and verification
At minimum:
* `manifest.json` includes a digest for each file.
* Provide a `stella verify-pack` CLI that checks:
* manifest integrity
* file hashes
* schema versions
* optional signature verification
For v2:
* Sign the manifest (and/or the verdict) using your standard attestation mechanism.
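The hash-checking half of a `stella verify-pack`-style tool reduces to walking the manifest; the `files` map assumed here (relative path to digest) is an illustration, not the shipped manifest schema.
```python
import hashlib, json
from pathlib import Path

def verify_pack(pack_dir: str) -> list[str]:
    """Return a list of problems; an empty list means every manifest digest matched."""
    root = Path(pack_dir)
    manifest = json.loads((root / "manifest.json").read_text())
    problems = []
    # Assumed manifest shape: {"files": {"inputs/sbom.json": "sha256:...", ...}}
    for rel_path, expected in manifest["files"].items():
        target = root / rel_path
        if not target.exists():
            problems.append(f"missing file: {rel_path}")
            continue
        actual = "sha256:" + hashlib.sha256(target.read_bytes()).hexdigest()
        if actual != expected:
            problems.append(f"digest mismatch: {rel_path}")
    return problems
```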
## 4.5 Confidentiality and redaction
Audit packs often include sensitive data (paths, internal package names, repo URLs).
PM guideline:
* Provide **redaction profiles**:
* external auditor pack (minimal identifiers)
* internal audit pack (full detail)
* Provide encryption options (password/recipient keys) if packs leave the environment.
Eng guideline:
* Redaction must be deterministic and declarative (policy-based).
* Pack generation must not leak secrets from raw scan logs.
## 4.6 Pack generation workflow
Key product flows:
* Generate pack for:
* a specific artifact digest
* a release (set of digests)
* an environment snapshot (e.g., cluster inventory)
* a date range (for audit period)
* Trigger sources:
* UI
* API
* CI pipeline step
Engineering:
* Treat pack generation as an async job (queue + status endpoint).
* Cache pack components when inputs are identical (avoid repeated work).
## 4.7 What must be included (minimum viable audit pack)
PMs should enforce that v1 includes:
* Artifact identity
* SBOM(s) or component inventory
* Findings list (normalized)
* Policy bundle reference + policy content
* Exceptions applied (full object + version info)
* Final verdict + explanation summary
* Integrity manifest with file hashes
Add these when available (v1.5+):
* VEX inputs and outputs
* Knowledge snapshot references
* Reachability evidence fragments
* Diff summary vs prior release
## 4.8 Product acceptance criteria (PM-owned)
Audit Packs are not “done” until:
* A third party can validate the pack contents haven't been altered (hash verification).
* The pack answers “why did this pass/fail?” including exceptions applied.
* Packs can be generated without external network calls (air-gap friendly).
* Packs support redaction profiles.
* Pack schema is versioned and backward compatible.
---
# 5. Cross-cutting: roles, responsibilities, and delivery checkpoints
## 5.1 Responsibilities
**Product Manager**
* Define exception types and required fields
* Define reason code taxonomy and governance policies
* Define approval rules by environment and scope breadth
* Define audit pack templates, profiles, and export targets
* Own acceptance criteria and audit usability testing
**Development Manager / Tech Lead**
* Own event model (immutability, versioning, retention)
* Own policy evaluation semantics and determinism guarantees
* Own integrity and signing design (manifest hashes, optional signatures)
* Own performance and scalability targets (pack generation and query latency)
* Own secure storage and access controls (RBAC, tenant isolation)
## 5.2 Deliverables checklist (for each capability)
For “Exception Objects”:
* PRD + threat model (abuse cases: blanket waivers, privilege escalation)
* Schema spec + versioning policy
* API endpoints + RBAC model
* UI flows + audit trail UI
* Policy engine semantics + test vectors
* Metrics dashboards
For “Audit Packs”:
* Pack schema spec + folder layout
* Manifest + hash verification rules
* Generator service + async job API
* Redaction profiles + tests
* Verifier CLI + documentation
* Performance benchmarks + caching strategy
---
# 6. Common failure modes to actively prevent
1. **Exceptions become suppressions again**
If you allow exceptions without expiry/owner or without audit trail, you've rebuilt “ignore lists.”
2. **Over-broad scopes by default**
If “all repos/all images” is easy, you will accumulate permanent waivers and lose credibility.
3. **No deterministic semantics**
If the same artifact can pass/fail depending on evaluation order or transient feed updates, auditors will distrust outputs.
4. **Audit packs that are reports, not evidence**
A PDF without machine-verifiable artifacts is not an audit pack—it's a slide.
5. **No renewal discipline**
If renewals are frictionless and don't require re-justification, exceptions never die.
---
# 7. Recommended phased rollout (to manage build cost)
**Phase 1: Governance basics**
* Exception object schema + lifecycle + expiry enforcement
* Create-from-finding UX
* Audit pack v1 (SBOM/inventory + findings + policy + exceptions + manifest)
**Phase 2: Evidence binding**
* Evidence refs on exceptions (VEX, reachability fragments)
* Pack includes VEX inputs/outputs and knowledge snapshot identifiers
**Phase 3: Verifiable trust**
* Signed verdicts and/or signed pack manifests
* Verifier tooling and deterministic replay hooks
---
If you want, I can convert the above into two artifacts your teams can execute against immediately:
1. A concise **PRD template** (sections + required decisions) for Exceptions and Audit Packs
2. A **technical spec outline** (schema definitions, endpoints, state machines, and acceptance test vectors)

---
## Guidelines for Product and Development Managers: Signed, Replayable Risk Verdicts
### Purpose
Signed, replayable risk verdicts are the Stella Ops mechanism for producing a **cryptographically verifiable, audit-ready decision** about an artifact (container image, VM image, filesystem snapshot, SBOM, etc.) that can be **recomputed later to the same result** using the same inputs (“time-travel replay”).
This capability is not “scan output with a signature.” It is a **decision artifact** that becomes the unit of governance in CI/CD, registry admission, and audits.
---
# 1) Shared definitions and non-negotiables
## 1.1 Definitions
**Risk verdict**
A structured decision: *Pass / Fail / Warn / NeedsReview* (or similar), produced by a deterministic evaluator under a specific policy and knowledge state.
**Signed**
The verdict is wrapped in a tamper-evident envelope (e.g., DSSE/in-toto statement) and signed using an organization-approved trust model (key-based, keyless, or offline CA).
**Replayable**
Given the same:
* target artifact identity
* SBOM (or derivation method)
* vulnerability and advisory knowledge state
* VEX inputs
* policy bundle
* evaluator version
…Stella Ops can **re-evaluate and reproduce the same verdict** and provide evidence equivalence.
> Critical nuance: replayability is about *result equivalence*. Byte-for-byte equality is ideal but not always required if signatures/metadata necessarily vary. If byte-for-byte is a goal, you must strictly control timestamps, ordering, and serialization.
---
## 1.2 Non-negotiables (what must be true in v1)
1. **Verdicts are bound to immutable artifact identity**
* Container image: digest (sha256:…)
* SBOM: content digest
* File tree: merkle root digest, or equivalent
2. **Verdicts are deterministic**
* No “current time” dependence in scoring
* No non-deterministic ordering of findings
* No implicit network calls during evaluation
3. **Verdicts are explainable**
* Every deny/block decision must cite the policy clause and evidence pointers that triggered it.
4. **Verdicts are verifiable**
* Independent verification toolchain exists (CLI/library) that validates signature and checks referenced evidence integrity.
5. **Knowledge state is pinned**
* The verdict references a “knowledge snapshot” (vuln feeds, advisories, VEX set) by digest/ID, not “latest.”
---
## 1.3 Explicit non-goals (avoid scope traps)
* Building a full CNAPP runtime protection product as part of verdicting.
* Implementing “all possible attestation standards.” Pick one canonical representation; support others via adapters.
* Solving global revocation and key lifecycle for every ecosystem on day one; define a minimum viable trust model per deployment mode.
---
# 2) Product Management Guidelines
## 2.1 Position the verdict as the primary product artifact
**PM rule:** if a workflow does not end in a verdict artifact, it is not part of this moat.
Examples:
* CI pipeline step produces `VERDICT.attestation` attached to the OCI artifact.
* Registry admission checks for a valid verdict attestation meeting policy.
* Audit export bundles the verdict plus referenced evidence.
**Avoid:** “scan reports” as the goal. Reports are views; the verdict is the object.
---
## 2.2 Define the core personas and success outcomes
Minimum personas:
1. **Release/Platform Engineering**
* Needs automated gates, reproducibility, and low friction.
2. **Security Engineering / AppSec**
* Needs evidence, explainability, and exception workflows.
3. **Audit / Compliance**
* Needs replay, provenance, and a defensible trail.
Define “first value” for each:
* Release engineer: gate merges/releases without re-running scans.
* Security engineer: investigate a deny decision with evidence pointers in minutes.
* Auditor: replay a verdict months later using the same knowledge snapshot.
---
## 2.3 Product requirements (expressed as “shall” statements)
### 2.3.1 Verdict content requirements
A verdict SHALL contain:
* **Subject**: immutable artifact reference (digest, type, locator)
* **Decision**: pass/fail/warn/etc.
* **Policy binding**: policy bundle ID + version + digest
* **Knowledge snapshot binding**: snapshot IDs/digests for vuln feed and VEX set
* **Evaluator binding**: evaluator name/version + schema version
* **Rationale summary**: stable short explanation (human-readable)
* **Findings references**: pointers to detailed findings/evidence (content-addressed)
* **Unknowns state**: explicit unknown counts and categories
### 2.3.2 Replay requirements
The product SHALL support:
* Re-evaluating the same subject under the same policy+knowledge snapshot
* Proving equivalence of inputs used in the original verdict
* Producing a “replay report” that states:
* replay succeeded and matched
* or replay failed and why (e.g., missing evidence, policy changed)
### 2.3.3 UX requirements
UI/UX SHALL:
* Show verdict status clearly (Pass/Fail/…)
* Display:
* policy clause(s) responsible
* top evidence pointers
* knowledge snapshot ID
* signature trust status (who signed, chain validity)
* Provide “Replay” as an action (even if replay happens offline, the UX must guide it)
---
## 2.4 Product taxonomy: separate “verdicts” from “evaluations” from “attestations”
This is where many products get confused. Your terminology must remain strict:
* **Evaluation**: internal computation that produces decision + findings.
* **Verdict**: the stable, canonical decision payload (the thing being signed).
* **Attestation**: the signed envelope binding the verdict to cryptographic identity.
PMs must enforce this vocabulary in PRDs, UI labels, and docs.
---
## 2.5 Policy model guidelines for verdicting
Verdicting depends on policy discipline.
PM rules:
* Policy must be **versioned** and **content-addressed**.
* Policies must be **pure functions** of declared inputs:
* SBOM graph
* VEX claims
* vulnerability data
* reachability evidence (if present)
* environment assertions (if present)
* Policies must produce:
* a decision
* plus a minimal explanation graph (policy rule ID → evidence IDs)
Avoid “freeform scripts” early. You need determinism and auditability.
---
## 2.6 Exceptions are part of the verdict product, not an afterthought
PM requirement:
* Exceptions must be first-class objects with:
* scope (exact artifact/component range)
* owner
* justification
* expiry
* required evidence (optional but strongly recommended)
And verdict logic must:
* record that an exception was applied
* include exception IDs in the verdict evidence graph
* make exception usage visible in UI and audit pack exports
---
## 2.7 Success metrics (PM-owned)
Choose metrics that reflect the moat:
* **Replay success rate**: % of verdicts that can be replayed after N days.
* **Policy determinism incidents**: number of non-deterministic evaluation bugs.
* **Audit cycle time**: time to satisfy an audit evidence request for a release.
* **Noise**: # of manual suppressions/overrides per 100 releases (should drop).
* **Gate adoption**: % of releases gated by verdict attestations (not reports).
---
# 3) Development Management Guidelines
## 3.1 Architecture principles (engineering tenets)
### Tenet A: Determinism-first evaluation
Engineering SHALL ensure evaluation is deterministic across:
* OS and architecture differences (as much as feasible)
* concurrency scheduling
* non-ordered data structures
Practical rules:
* Never iterate over maps/hashes without sorting keys.
* Canonicalize output ordering (findings sorted by stable tuple: (component_id, cve_id, path, rule_id)).
* Keep “generated at” timestamps out of the signed payload; if needed, place them in an unsigned wrapper or separate metadata field excluded from signature.
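Both practical rules are cheap to enforce in code; a sketch, assuming findings and verdicts are plain dictionaries:
```python
def canonical_findings(findings: list[dict]) -> list[dict]:
    """Stable tuple ordering: (component_id, cve_id, path, rule_id)."""
    return sorted(findings,
                  key=lambda f: (f["component_id"], f["cve_id"], f.get("path", ""), f["rule_id"]))

# Assumed top-level keys; "generated_at" and other volatile metadata are intentionally absent.
SIGNED_FIELDS = {"subject", "evaluation", "decision", "evidence"}

def signable_view(verdict: dict) -> dict:
    """Drop volatile metadata (timestamps, host info) before canonical serialization and signing."""
    return {k: v for k, v in verdict.items() if k in SIGNED_FIELDS}
```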
### Tenet B: Content-address everything
All significant inputs/outputs should have content digests:
* SBOM digest
* policy digest
* knowledge snapshot digest
* evidence bundle digest
* verdict digest
This makes replay and integrity checks possible.
### Tenet C: No hidden network
During evaluation, the engine must not fetch “latest” anything.
Network is allowed only in:
* snapshot acquisition phase
* artifact retrieval phase
* attestation publication phase
…and each must be explicitly logged and pinned.
---
## 3.2 Canonical verdict schema and serialization rules
**Engineering guideline:** pick a canonical serialization and stick to it.
Options:
* Canonical JSON (JCS or equivalent)
* CBOR with deterministic encoding
Rules:
* Define a **schema version** and strict validation.
* Make field names stable; avoid “optional” fields that appear/disappear nondeterministically.
* Ensure numeric formatting is stable (no float drift; prefer integers or rational representation).
* Always include empty arrays if required for stability, or exclude consistently by schema rule.
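A sketch of a deterministic serialization, assuming sorted-key JSON stands in for a full canonicalization scheme such as JCS; the resulting digest can then serve as the verdict digest under Tenet B.
```python
import hashlib, json

def canonical_bytes(payload: dict) -> bytes:
    """Deterministic JSON: sorted keys, no insignificant whitespace, UTF-8, no NaN/Infinity.

    This is a stand-in for a real canonicalization scheme (e.g. JCS); per the rules above,
    floats should be avoided in the payload entirely.
    """
    return json.dumps(payload, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False, allow_nan=False).encode("utf-8")

def content_digest(payload: dict) -> str:
    return "sha256:" + hashlib.sha256(canonical_bytes(payload)).hexdigest()
```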
---
## 3.3 Suggested verdict payload (illustrative)
This is not a mandate—use it as a baseline structure.
```json
{
"schema_version": "1.0",
"subject": {
"type": "oci-image",
"name": "registry.example.com/app/service",
"digest": "sha256:…",
"platform": "linux/amd64"
},
"evaluation": {
"evaluator": "stella-eval",
"evaluator_version": "0.9.0",
"policy": {
"id": "prod-default",
"version": "2025.12.1",
"digest": "sha256:…"
},
"knowledge_snapshot": {
"vuln_db_digest": "sha256:…",
"advisory_digest": "sha256:…",
"vex_set_digest": "sha256:…"
}
},
"decision": {
"status": "fail",
"score": 87,
"reasons": [
{ "rule_id": "RISK.CRITICAL.REACHABLE", "evidence_ref": "sha256:…" }
],
"unknowns": {
"unknown_reachable": 2,
"unknown_unreachable": 0
}
},
"evidence": {
"sbom_digest": "sha256:…",
"finding_bundle_digest": "sha256:…",
"inputs_manifest_digest": "sha256:…"
}
}
```
Then wrap this payload in your chosen attestation envelope and sign it.
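For illustration, a DSSE-style wrapping looks roughly like the sketch below; the `sign` callable is a placeholder for whatever KMS, keyless, or offline signer the deployment uses (see 3.9), and the pre-authentication encoding follows the published DSSE v1 layout.
```python
import base64

def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE v1 pre-authentication encoding: the bytes that actually get signed."""
    t = payload_type.encode("utf-8")
    return b"DSSEv1 %d %s %d %s" % (len(t), t, len(payload), payload)

def dsse_envelope(payload: bytes, payload_type: str, keyid: str, sign) -> dict:
    """`sign` is a placeholder callable (bytes -> signature bytes) backed by the chosen key system."""
    signature = sign(pae(payload_type, payload))
    return {
        "payloadType": payload_type,
        "payload": base64.b64encode(payload).decode("ascii"),
        "signatures": [{"keyid": keyid, "sig": base64.b64encode(signature).decode("ascii")}],
    }

# Usage sketch (names are assumptions):
# envelope = dsse_envelope(canonical_bytes(verdict_payload),
#                          "application/vnd.in-toto+json", "prod-signer-1", kms_sign)
```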
---
## 3.4 Attestation format and storage guidelines
Development managers must enforce a consistent publishing model:
1. **Envelope**
* Prefer DSSE/in-toto style envelope because it:
* standardizes signing
* supports multiple signature schemes
* is widely adopted in supply chain ecosystems
2. **Attachment**
* OCI artifacts should carry verdicts as referrers/attachments to the subject digest (preferred).
* For non-OCI targets, store in an internal ledger keyed by the subject digest/ID.
3. **Verification**
* Provide:
* `stella verify <artifact>` → checks signature and integrity references
* `stella replay <verdict>` → re-run evaluation from snapshots and compare
4. **Transparency / logs**
* Optional in v1, but plan for:
* transparency log (public or private) to strengthen auditability
* offline alternatives for air-gapped customers
---
## 3.5 Knowledge snapshot engineering requirements
A “snapshot” must be an immutable bundle, ideally content-addressed:
Snapshot includes:
* vulnerability database at a specific point
* advisory sources (OS distro advisories)
* VEX statement set(s)
* any enrichment signals that influence scoring
Rules:
* Snapshot resolution must be explicit: “use snapshot digest X”
* Must support export/import for air-gapped deployments
* Must record source provenance and ingestion timestamps (timestamps may be excluded from signed payload if they cause nondeterminism; store them in snapshot metadata)
---
## 3.6 Replay engine requirements
Replay is not “re-run scan and hope it matches.”
Replay must:
* retrieve the exact subject (or confirm it via digest)
* retrieve the exact SBOM (or deterministically re-generate it from the subject in a defined way)
* load exact policy bundle by digest
* load exact knowledge snapshot by digest
* run evaluator version pinned in verdict (or enforce a compatibility mapping)
* produce:
* verdict-equivalence result
* a delta explanation if mismatch occurs
Engineering rule: replay must fail loudly and specifically when inputs are missing.
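A replay driver can therefore be sketched as: resolve every pinned input by digest, fail loudly if any is missing, re-evaluate, and report match or mismatch. The `resolve` and `evaluate` callables stand in for the real snapshot store and pinned evaluator; the field names mirror the illustrative payload in 3.3.
```python
def replay(verdict: dict, resolve, evaluate) -> dict:
    """Re-run an evaluation from pinned inputs and compare against the original decision."""
    missing, inputs = [], {}
    for name, digest in [
        ("policy",  verdict["evaluation"]["policy"]["digest"]),
        ("vuln_db", verdict["evaluation"]["knowledge_snapshot"]["vuln_db_digest"]),
        ("vex_set", verdict["evaluation"]["knowledge_snapshot"]["vex_set_digest"]),
        ("sbom",    verdict["evidence"]["sbom_digest"]),
    ]:
        blob = resolve(digest)          # returns None when the blob is not in the store
        if blob is None:
            missing.append(f"{name}:{digest}")
        inputs[name] = blob
    if missing:
        # Fail loudly and specifically: name every missing input.
        return {"result": "not_replayable", "missing_inputs": missing}

    new_decision = evaluate(verdict["subject"], inputs["policy"],
                            {"vuln_db": inputs["vuln_db"],
                             "vex_set": inputs["vex_set"],
                             "sbom": inputs["sbom"]})
    if new_decision["status"] == verdict["decision"]["status"]:
        return {"result": "match"}
    return {"result": "mismatch",
            "original": verdict["decision"]["status"],
            "replayed": new_decision["status"]}
```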
---
## 3.7 Testing strategy (required)
Deterministic systems require “golden” testing.
Minimum tests:
1. **Golden verdict tests**
* Fixed artifact + fixed snapshots + fixed policy
* Expected verdict output must match exactly
2. **Cross-platform determinism tests**
* Run same evaluation on different machines/containers and compare outputs
3. **Mutation tests for determinism**
* Randomize ordering of internal collections; output should remain unchanged
4. **Replay regression tests**
* Store verdict + snapshots and replay after code changes to ensure compatibility guarantees hold
---
## 3.8 Versioning and backward compatibility guidelines
This is essential to prevent “replay breaks after upgrades.”
Rules:
* **Verdict schema version** changes must be rare and carefully managed.
* Maintain a compatibility matrix:
* evaluator vX can replay verdict schema vY
* If you must evolve logic, do so by:
* bumping evaluator version
* preserving older evaluators in a compatibility mode (containerized evaluators are often easiest)
---
## 3.9 Security and key management guidelines
Development managers must ensure:
* Signing keys are managed via:
* KMS/HSM (enterprise)
* keyless (OIDC-based) where acceptable
* offline keys for air-gapped
* Verification trust policy is explicit:
* which identities are trusted to sign verdicts
* which policies are accepted
* whether transparency is required
* how to handle revocation/rotation
* Separate “can sign” from “can publish”
* Signing should be restricted; publishing may be broader.
---
# 4) Operational workflow requirements (cross-functional)
## 4.1 CI gate flow
* Build artifact
* Produce SBOM deterministically (or record SBOM digest if generated elsewhere)
* Evaluate → produce verdict payload
* Sign verdict → publish attestation attached to artifact
* Gate decision uses verification of:
* signature validity
* policy compliance
* snapshot integrity
## 4.2 Registry / admission flow
* Admission controller checks for a valid, trusted verdict attestation
* Optionally requires:
* verdict not older than X snapshot age (this is policy)
* no expired exceptions
* replay not required (replay is for audits; admission is fast-path)
## 4.3 Audit flow
* Export “audit pack”:
* verdict + signature chain
* policy bundle
* knowledge snapshot
* referenced evidence bundles
* Auditor (or internal team) runs `verify` and optionally `replay`
---
# 5) Common failure modes to avoid
1. **Signing “findings” instead of a decision**
* Leads to unbounded payload growth and weak governance semantics.
2. **Using “latest” feeds during evaluation**
* Breaks replayability immediately.
3. **Embedding timestamps in signed payload**
* Eliminates deterministic byte-level reproducibility.
4. **Letting the UI become the source of truth**
* The verdict artifact must be the authority; UI is a view.
5. **No clear separation between: evidence store, snapshot store, verdict store**
* Creates coupling and makes offline operations painful.
---
# 6) Definition of Done checklist (use this to gate release)
A feature increment for signed, replayable verdicts is “done” only if:
* [ ] Verdict binds to immutable subject digest
* [ ] Verdict includes policy digest/version and knowledge snapshot digests
* [ ] Verdict is signed and verifiable via CLI
* [ ] Verification works offline (given exported artifacts)
* [ ] Replay works with stored snapshots and produces match/mismatch output with reasons
* [ ] Determinism tests pass (golden + mutation + cross-platform)
* [ ] UI displays signer identity, policy, snapshot IDs, and rule→evidence links
* [ ] Exceptions (if implemented) are recorded in verdict and enforced deterministically
---
## Optional: Recommended implementation sequence (keeps risk down)
1. Canonical verdict schema + deterministic evaluator skeleton
2. Signing + verification CLI
3. Snapshot bundle format + pinned evaluation
4. Replay tool + golden tests
5. OCI attachment publishing + registry/admission integration
6. Evidence bundles + UI explainability
7. Exceptions + audit pack export
---
If you want this turned into a formal internal PRD template, I can format it as:
* “Product requirements” (MUST/SHOULD/COULD)
* “Engineering requirements” (interfaces + invariants + test plan)
* “Security model” (trust roots, signing identities, verification policy)
* “Acceptance criteria” for an MVP and for GA

---
Below are internal guidelines for Stella Ops Product Managers and Development Managers for the capability: **Knowledge Snapshots / Time-Travel Replay**. This is written as an implementable operating standard (not a concept note).
---
# Knowledge Snapshots / Time-Travel Replay
## Product and Engineering Guidelines for Stella Ops
## 1) Purpose and value proposition
### What this capability must achieve
Enable Stella Ops to **reproduce any historical risk decision** (scan result, policy evaluation, verdict) **deterministically**, using a **cryptographically bound snapshot** of the exact knowledge inputs that were available at the time the decision was made.
### Why customers pay for it
This capability is primarily purchased for:
* **Auditability**: “Show me what you knew, when you knew it, and why the system decided pass/fail.”
* **Incident response**: reproduce prior posture using historical feeds/VEX/policies and explain deltas.
* **Airgapped / regulated environments**: deterministic, offline decisioning with attested knowledge state.
* **Change control**: prove whether a decision changed due to code change vs knowledge change.
### Core product promise
For a given artifact and snapshot:
* **Same inputs → same outputs** (verdict, scores, findings, evidence pointers), or Stella Ops must clearly declare the precise exceptions.
---
## 2) Definitions (PMs and engineers must align on these)
### Knowledge input
Any external or semi-external information that can influence the outcome:
* vulnerability databases and advisories (any source)
* exploit-intel signals
* VEX statements (OpenVEX, CSAF, CycloneDX VEX, etc.)
* SBOM ingestion logic and parsing rules
* package identification rules (including distro/backport logic)
* policy content and policy engine version
* scoring rules (including weights and thresholds)
* trust anchors and signature verification policy
* plugin versions and enabled capabilities
* configuration defaults and overrides that change analysis
### Knowledge Snapshot
A **sealed record** of:
1. **References** (which inputs were used), and
2. **Content** (the exact bytes used), and
3. **Execution contract** (the evaluator and ruleset versions)
### Time-Travel Replay
Re-running evaluation of an artifact **using only** the snapshot content and the recorded execution contract, producing the same decision and explainability artifacts.
---
## 3) Product principles (non-negotiables)
1. **Determinism is a product requirement**, not an engineering detail.
2. **Snapshots are first-class artifacts** with explicit lifecycle (create, verify, export/import, retain, expire).
3. **The snapshot is cryptographically bound** to outcomes and evidence (tamper-evident chain).
4. **Replays must be possible offline** (when the snapshot includes content) and must fail clearly when not possible.
5. **Minimal surprise**: the UI must explain when a verdict changed due to “knowledge drift” vs “artifact drift.”
6. **Scalability by content addressing**: the platform must deduplicate knowledge content aggressively.
7. **Backward compatibility**: old snapshots must remain replayable within a documented support window.
---
## 4) Scope boundaries (what this is not)
### Non-goals (explicitly out of scope for v1 unless approved)
* Reconstructing *external internet state* beyond what is recorded (no “fetch historical CVE state from the web”).
* Guaranteeing replay across major engine rewrites without a compatibility plan.
* Storing sensitive proprietary customer code in snapshots (unless explicitly enabled).
* Replaying “live runtime signals” unless those signals were captured into the snapshot at decision time.
---
## 5) Personas and use cases (PM guidance)
### Primary personas
* **Security Governance / GRC**: needs audit packs, controls evidence, deterministic history.
* **Incident response / AppSec lead**: needs “what changed and why” quickly.
* **Platform engineering / DevOps**: needs reproducible CI gates and airgap workflows.
* **Procurement / regulated customers**: needs proof of process and defensible attestations.
### Must-support use cases
1. **Replay a past release gate decision** in a new environment (including offline) and get identical outcome.
2. **Explain drift**: “This build fails today but passed last month—why?”
3. **Airgap export/import**: create snapshots in connected environment, import to disconnected one.
4. **Audit bundle generation**: export snapshot + verdict(s) + evidence pointers.
---
## 6) Functional requirements (PM “must/should” list)
### Must
* **Snapshot creation** for every material evaluation (or for every “decision object” chosen by configuration).
* **Snapshot manifest** containing:
* unique snapshot ID (content-addressed)
* list of knowledge sources with hashes/digests
* policy IDs and exact policy content hashes
* engine version and plugin versions
* timestamp and clock source metadata
* trust anchor set hash and verification policy hash
* **Snapshot sealing**:
* snapshot manifest is signed
* signed link from verdict → snapshot ID
* **Replay**:
* re-evaluate using only snapshot inputs
* output must match prior results (or emit a deterministic mismatch report)
* **Export/import**:
* portable bundle format
* import verifies integrity and signatures before allowing use
* **Retention controls**:
* configurable retention windows and storage quotas
* deduplication and garbage collection
### Should
* **Partial snapshots** (reference-only) vs **full snapshots** (content included), with explicit replay guarantees.
* **Diff views**: compare two snapshots and highlight what knowledge changed.
* **Multi-snapshot replay**: run “as-of snapshot A” and “as-of snapshot B” to show drift impact.
### Could
* Snapshot “federation” for large orgs (mirrors/replication with policy controls).
* Snapshot “pinning” to releases or environments as a governance policy.
---
## 7) UX and workflow guidelines (PM + Eng)
### UI must communicate three states clearly
1. **Reproducible offline**: snapshot includes all required content.
2. **Reproducible with access**: snapshot references external sources that must be available.
3. **Not reproducible**: missing content or unsupported evaluator version.
### Required UI objects
* **Snapshot Details page**
* snapshot ID and signature status
* list of knowledge sources (name, version/epoch, digest, size)
* policy bundle version, scoring rules version
* trust anchors + verification policy digest
* replay status: “verified reproducible / reproducible / not reproducible”
* **Verdict page**
* links to snapshot(s)
* “replay now” action
* “compare to latest knowledge” action
### UX guardrails
* Never show “pass/fail” without also showing:
* snapshot ID
* policy ID/version
* verification status
* When results differ on replay, show:
* exact mismatch class (engine mismatch, missing data, nondeterminism, corrupted snapshot)
* what input changed (if known)
* remediation steps
---
## 8) Data model and format guidelines (Development Managers)
### Canonical objects (recommended minimum set)
* **KnowledgeSnapshotManifest (KSM)**
* **KnowledgeBlob** (content-addressed bytes)
* **KnowledgeSourceDescriptor**
* **PolicyBundle**
* **TrustBundle**
* **Verdict** (signed decision artifact)
* **ReplayReport** (records replay result and mismatches)
### Content addressing
* Use a stable hash (e.g., SHA256) for:
* each knowledge blob
* manifest
* policy bundle
* trust bundle
* Snapshot ID should be derived from manifest digest.
### Example manifest shape (illustrative)
```json
{
"snapshot_id": "ksm:sha256:…",
"created_at": "2025-12-19T10:15:30Z",
"engine": { "name": "stella-evaluator", "version": "1.7.0", "build": "…"},
"plugins": [
{ "name": "pkg-id", "version": "2.3.1", "digest": "sha256:…" }
],
"policy": { "bundle_id": "pol:sha256:…", "digest": "sha256:…" },
"scoring": { "ruleset_id": "score:sha256:…", "digest": "sha256:…" },
"trust": { "bundle_id": "trust:sha256:…", "digest": "sha256:…" },
"sources": [
{
"name": "nvd",
"epoch": "2025-12-18",
"kind": "vuln_feed",
"content_digest": "sha256:…",
"licenses": ["…"],
"origin": { "uri": "…", "retrieved_at": "…" }
},
{
"name": "customer-vex",
"kind": "vex",
"content_digest": "sha256:…"
}
],
"environment": {
"determinism_profile": "strict",
"timezone": "UTC",
"normalization": { "line_endings": "LF", "sort_order": "canonical" }
}
}
```
### Versioning rules
* Every object is immutable once written.
* Changes create new digests; never mutate in place.
* Support schema evolution via:
* `schema_version`
* strict validation + migration tooling
* Keep manifests small; store large data as blobs.
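A content-addressed, write-once blob store gives immutability and deduplication almost for free; the directory layout below is a sketch under the assumption of SHA-256 addressing, not a storage design.
```python
import hashlib
from pathlib import Path

class BlobStore:
    """Write-once, content-addressed storage: identical knowledge blobs dedup to one file."""

    def __init__(self, root: str):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, data: bytes) -> str:
        digest = "sha256:" + hashlib.sha256(data).hexdigest()
        path = self.root / digest.replace(":", "_")
        if not path.exists():            # immutable: never overwrite, never mutate in place
            path.write_bytes(data)
        return digest

    def get(self, digest: str) -> bytes:
        return (self.root / digest.replace(":", "_")).read_bytes()
```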
---
## 9) Determinism contract (Engineering must enforce)
### Determinism requirements
* Stable ordering: sort inputs and outputs canonically.
* Stable timestamps: timestamps may exist but must not change computed scores/verdict.
* Stable randomization: no RNG; if unavoidable, fixed seed recorded in snapshot.
* Stable parsers: parser versions are pinned by digest; parsing must be deterministic.
### Allowed nondeterminism (if any) must be explicit
If you must allow nondeterminism, it must be:
* documented,
* surfaced in UI,
* included in replay report as “non-deterministic factor,”
* and excluded from the signed decision if it affects pass/fail.
---
## 10) Security model (Development Managers)
### Threats this feature must address
* Feed poisoning (tampered vulnerability data)
* Time-of-check/time-of-use drift (same artifact evaluated against moving feeds)
* Replay manipulation (swap snapshot content)
* “Policy drift hiding” (claiming old decision used different policies)
* Signature bypass (trust anchors altered)
### Controls required
* Sign manifests and verdicts.
* Bind verdict → snapshot ID → policy bundle hash → trust bundle hash.
* Verify on every import and on every replay invocation.
* Audit log:
* snapshot created
* snapshot imported
* replay executed
* verification failures
### Key handling
* Decide and document:
* who signs snapshots/verdicts (service keys vs tenant keys)
* rotation policy
* revocation/compromise handling
* Avoid designing cryptography from scratch; use well-established signing formats and separation of duties.
---
## 11) Offline / airgapped requirements
### Snapshot levels (PM packaging guideline)
Offer explicit snapshot types with clear guarantees:
* **Level A: Reference-only snapshot**
* stores hashes + source descriptors
* replay requires access to original sources
* **Level B: Portable snapshot**
* includes blobs necessary for replay
* replay works offline
* **Level C: Sealed portable snapshot**
* portable + signed + includes trust anchors
* replay works offline and can be verified independently
Do not market airgap support without specifying which level is provided.
---
## 12) Performance and storage guidelines
### Principles
* Content-address knowledge blobs to maximize deduplication.
* Separate “hot” knowledge (recent epochs) from cold storage.
* Support snapshot compaction and garbage collection.
### Operational requirements
* Retention policies per tenant/project/environment.
* Quotas and alerting when snapshot storage approaches limits.
* Export bundles should be chunked/streamable for large feeds.
---
## 13) Testing and acceptance criteria
### Required test categories
1. **Golden replay tests**
* same artifact + same snapshot → identical outputs
2. **Corruption tests**
* bit flips in blobs/manifests are detected and rejected
3. **Version skew tests**
* old snapshot + new engine should either replay deterministically or fail with a clear incompatibility report
4. **Airgap tests**
* export → import → replay without network access
5. **Diff accuracy tests**
* compare snapshots and ensure the diff identifies actual knowledge changes, not noise
### Definition of Done (DoD) for the feature
* Snapshots are created automatically according to policy.
* Snapshots can be exported and imported with verified integrity.
* Replay produces matching verdicts for a representative corpus.
* UI exposes snapshot provenance and replay status.
* Audit log records snapshot lifecycle events.
* Clear failure modes exist (missing blobs, incompatible engine, signature failure).
---
## 14) Metrics (PM ownership)
Track metrics that prove this is a moat, not a checkbox.
### Core KPIs
* **Replay success rate** (strict determinism)
* **Time to explain drift** (median time from “why changed” to root cause)
* **% verdicts with sealed portable snapshots**
* **Audit effort reduction** (customer-reported or measured via workflow steps)
* **Storage efficiency** (dedup ratio; bytes per snapshot over time)
### Guardrail metrics
* Snapshot creation latency impact on CI
* Snapshot storage growth per tenant
* Verification failure rates
---
## 15) Common failure modes (what to prevent)
1. Treating snapshots as “metadata only” and still claiming replayability.
2. Allowing “latest feed fetch” during replay (breaks the promise).
3. Not pinning parser/policy/scoring versions—causes silent drift.
4. Missing clear UX around replay limitations and failure reasons.
5. Overcapturing sensitive inputs (privacy and customer trust risk).
6. Underinvesting in dedup/retention (cost blowups).
---
## 16) Management checklists
### PM checklist (before commitment)
* Precisely define “replay” guarantee level (A/B/C) for each SKU/environment.
* Define which inputs are in scope (feeds, VEX, policies, trust bundles, plugins).
* Define customer-facing workflows:
* “replay now”
* “compare to latest”
* “export for audit / air-gap”
* Confirm governance outcomes:
* audit pack integration
* exception linkage
* release gate linkage
### Development Manager checklist (before build)
* Establish canonical schemas and versioning plan.
* Establish content-addressed storage + dedup plan.
* Establish signing and trust anchor strategy.
* Establish deterministic evaluation contract and test harness.
* Establish import/export packaging and verification.
* Establish retention, quotas, and GC.
---
## 17) Minimal phased delivery (recommended)
**Phase 1: Reference snapshot + verdict binding**
* Record source descriptors + hashes, policy/scoring/trust digests.
* Bind snapshot ID into verdict artifacts.
**Phase 2: Portable snapshots**
* Store knowledge blobs locally with dedup.
* Export/import with integrity verification.
**Phase 3: Sealed portable snapshots + replay tooling**
* Sign snapshots.
* Deterministic replay pipeline + replay report.
* UI surfacing and audit logs.
**Phase 4: Snapshot diff + drift explainability**
* Compare snapshots.
* Attribute decision drift to knowledge changes vs artifact changes.
---
If you want this turned into an internal PRD template, I can rewrite it into a structured PRD format with: objectives, user stories, functional requirements, non-functional requirements, security/compliance, dependencies, risks, and acceptance tests—ready for Jira/Linear epics and engineering design review.

---
## Stella Ops Guidelines
### Risk Budgets and Diff-Aware Release Gates
**Audience:** Product Managers (PMs) and Development Managers (DMs)
**Applies to:** All customer-impacting software and configuration changes shipped by Stella Ops (code, infrastructure-as-code, runtime config, feature flags, data migrations, dependency upgrades).
---
## 1) What we are optimizing for
Stella Ops ships quickly **without** letting change-driven incidents, security regressions, or data integrity failures become the hidden cost of “speed.”
These guidelines enforce two linked controls:
1. **Risk Budgets** — a quantitative “capacity to take risk” that prevents reliability and trust from being silently depleted.
2. **Diff-Aware Release Gates** — release checks whose strictness scales with *what changed* (the diff), not with generic process.
Together they let us move fast on low-risk diffs and slow down only when the change warrants it.
---
## 2) Non-negotiable principles
1. **All changes are risk-bearing** (even “small” diffs). We quantify and route them accordingly.
2. **Risk is managed at the product/service boundary** (each service has its own budget and gating profile).
3. **Automation first, approvals last**. Humans review what automation cannot reliably verify.
4. **Blast radius is a first-class variable**. A safe rollout beats a perfect code review.
5. **Exceptions are allowed but never free**. Every bypass is logged, justified, and paid back via budget reduction and follow-up controls.
---
## 3) Definitions
### 3.1 Risk Budget (what it is)
A **Risk Budget** is the amount of change-risk a product/service is allowed to take over a defined window (typically a sprint or month) **without increasing the probability of customer harm beyond the agreed tolerance**.
It is a management control, not a theoretical score.
### 3.2 Risk Budget vs. Error Budget (important distinction)
* **Error Budget** (classic SRE): backward-looking tolerance for *actual* unreliability vs. SLO.
* **Risk Budget** (this policy): forward-looking tolerance for *change risk* before shipping.
They interact:
* If error budget is burned (service is unstable), risk budget is automatically constrained.
* If risk budget is low, release gates tighten by policy.
### 3.3 Diff-aware release gates (what it is)
A **release gate** is a set of required checks (tests, scans, reviews, rollout controls) that must pass before a change can progress.
**Diff-aware** means the gate level is determined by:
* what changed (diff classification),
* where it changed (criticality),
* how it ships (blast radius controls),
* and current operational context (incidents, SLO health, budget remaining).
---
## 4) Roles and accountability
### Product Manager (PM) — accountable for risk appetite
PM responsibilities:
* Define product-level risk tolerance with stakeholders (customer impact tolerance, regulatory constraints).
* Approve the **Risk Budget Policy settings** for their product/service tier (criticality level, default gates).
* Prioritize reliability work when budgets are constrained.
* Own customer communications for degraded service or risk-driven release deferrals.
### Development Manager (DM) — accountable for enforcement and engineering hygiene
DM responsibilities:
* Ensure pipelines implement diff classification and enforce gates.
* Ensure tests, telemetry, rollout mechanisms, and rollback procedures exist and are maintained.
* Ensure “exceptions” process is real (logged, postmortemed, paid back).
* Own staffing/rotation decisions to ensure safe releases (on-call readiness, release captains).
### Shared responsibilities
PM + DM jointly:
* Review risk budget status weekly.
* Resolve trade-offs: feature velocity vs. reliability/security work.
* Approve gate profile changes (tighten/loosen) based on evidence.
---
## 5) Risk Budgets
### 5.1 Establish service tiers (criticality)
Each service/product component must be assigned a **Criticality Tier**:
* **Tier 0 – Internal only** (no external customers; low business impact)
* **Tier 1 – Customer-facing non-critical** (degradation tolerated; limited blast radius)
* **Tier 2 – Customer-facing critical** (core workflows; meaningful revenue/trust impact)
* **Tier 3 – Safety/financial/data-critical** (payments, auth, permissions, PII, regulated workflows)
Tier drives default budgets and minimum gates.
### 5.2 Choose a budget window and units
**Window:** default to **monthly** with weekly tracking; optionally sprint-based if release cadence is sprint-coupled.
**Units:** use **Risk Points (RP)** — consumed by each change. (Do not overcomplicate at first; tune with data.)
Recommended initial monthly budgets (adjust after 2–3 cycles with evidence):
* Tier 0: 300 RP/month
* Tier 1: 200 RP/month
* Tier 2: 120 RP/month
* Tier 3: 80 RP/month
> Interpretation: Tier 3 ships fewer “risky” changes; it can still ship frequently, but changes must be decomposed into low-risk diffs and shipped with strong controls.
### 5.3 Risk Point scoring (how changes consume budget)
Every change gets a **Release Risk Score (RRS)** in RP.
A practical baseline model:
**RRS = Base(criticality) + Diff Risk + Operational Context - Mitigations**
**Base (criticality):**
* Tier 0: +1
* Tier 1: +3
* Tier 2: +6
* Tier 3: +10
**Diff Risk (additive):**
* +1: docs, comments, non-executed code paths, telemetry-only additions
* +3: UI changes, non-core logic changes, refactors with high test coverage
* +6: API contract changes, dependency upgrades, medium-complexity logic in a core path
* +10: database schema migrations, auth/permission logic, data retention/PII handling
* +15: infra/networking changes, encryption/key handling, payment flows, queue semantics changes
**Operational Context (additive):**
* +5: service currently in incident or had Sev1/Sev2 in last 7 days
* +3: error budget < 50% remaining
* +2: on-call load high (paging above normal baseline)
* +5: release during restricted windows (holidays/freeze) via exception
**Mitigations (subtract):**
* -3: feature flag with staged rollout + instant kill switch verified
* -3: canary + automated health gates + rollback tested in last 30 days
* -2: high-confidence integration coverage for touched components
* -2: no data migration OR backward-compatible migration with proven rollback
* -2: change isolated behind permission boundary / limited cohort
**Minimum RRS floor:** never below 1 RP.
DM is responsible for making sure the pipeline can calculate a *default* RRS automatically and require humans only for edge cases.
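The default computation can be derived directly from the tables above; the category and mitigation keys below are assumptions about what the CI classifier emits, while the numbers are the ones in this section.
```python
BASE = {0: 1, 1: 3, 2: 6, 3: 10}

DIFF_RISK = {
    "docs_only": 1, "ui_or_noncore": 3, "api_or_dependency": 6,
    "schema_or_auth_or_pii": 10, "infra_or_crypto_or_payments": 15,
}

MITIGATION_CREDIT = {
    "flag_with_kill_switch": 3, "canary_with_tested_rollback": 3,
    "high_confidence_integration_tests": 2, "safe_or_no_migration": 2,
    "cohort_isolated": 2,
}

def release_risk_score(tier: int, diff_categories: list[str],
                       context: dict, mitigations: list[str]) -> int:
    """Default RRS per the baseline model: Base + Diff Risk + Context - Mitigations."""
    score = BASE[tier] + sum(DIFF_RISK[c] for c in diff_categories)   # diff risk is additive
    if context.get("recent_sev1_or_sev2"):      score += 5
    if context.get("error_budget_below_50pct"): score += 3
    if context.get("high_oncall_load"):         score += 2
    if context.get("freeze_window_exception"):  score += 5
    score -= sum(MITIGATION_CREDIT[m] for m in mitigations)
    return max(score, 1)                                              # minimum RRS floor
```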
### 5.4 Budget operating rules
**Budget ledger:** Maintain a per-service ledger:
* Budget allocated for the window
* RP consumed per release
* RP remaining
* Trendline (projected depletion date)
* Exceptions (break-glass releases)
**Control thresholds:**
* **Green (≥60% remaining):** normal operation
* **Yellow (30–59%):** additional caution; gates tighten by 1 level for medium/high-risk diffs
* **Red (<30%):** freeze high-risk diffs; allow only low-risk changes or reliability/security work
* **Exhausted (≤0%):** releases restricted to incident fixes, security fixes, and rollback-only, with tightened gates and explicit sign-off
### 5.5 What to do when budget is low (expected behavior)
When Yellow/Red:
* PM shifts roadmap execution toward:
* reliability work, defect burn-down,
* decomposing large changes into smaller, reversible diffs,
* reducing scope of risky features.
* DM enforces:
* smaller diffs,
* increased feature flagging,
* staged rollout requirements,
* improved test/observability coverage.
Budget constraints are a signal, not a punishment.
### 5.6 Budget replenishment and incentives
Budgets replenish on the window boundary, but we also allow **earned capacity**:
* If a service improves change failure rate and MTTR for 2 consecutive windows, it may earn:
  * +10–20% budget increase **or**
* one gate level relaxation for specific change categories
This must be evidence-driven (metrics, not opinions).
---
## 6) Diff-Aware Release Gates
### 6.1 Diff classification (what the pipeline must detect)
At minimum, automatically classify diffs into these categories:
**Code scope**
* Executable code vs docs-only
* Core vs non-core modules (define module ownership boundaries)
* Hot paths (latency-sensitive), correctness-sensitive paths
**Data scope**
* Schema migration (additive vs breaking)
* Backfill jobs / batch jobs
* Data model changes impacting downstream consumers
* PII / regulated data touchpoints
**Security scope**
* Authn/authz logic
* Permission checks
* Secrets, key handling, encryption changes
* Dependency changes with known CVEs
**Infra scope**
* IaC changes, networking, load balancer, DNS, autoscaling
* Runtime config changes (feature flags, limits, thresholds)
* Queue/topic changes, retention settings
**Interface scope**
* Public API contract changes
* Backward compatibility of payloads/events
* Client version dependency
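A first-pass classifier can be plain path/pattern matching over the changed files; the patterns below are illustrative assumptions, not Stella Ops' actual module map, and real classification should also consult module ownership metadata.
```python
import re

# Illustrative path patterns only; replace with the real module/ownership map.
CATEGORY_PATTERNS = {
    "schema_or_auth_or_pii":       [r"(^|/)migrations/", r"(^|/)auth/", r"(^|/)permissions/"],
    "infra_or_crypto_or_payments": [r"\.tf$", r"(^|/)helm/", r"(^|/)payments/", r"(^|/)crypto/"],
    "api_or_dependency":           [r"(^|/)api/", r"(^|/)openapi\.", r"package-lock\.json$", r"go\.sum$"],
    "docs_only":                   [r"\.md$", r"(^|/)docs/"],
}

def classify_diff(changed_files: list[str]) -> list[str]:
    doc_patterns = CATEGORY_PATTERNS["docs_only"]
    if changed_files and all(any(re.search(p, f) for p in doc_patterns) for f in changed_files):
        return ["docs_only"]                       # docs-only only when every file is documentation
    categories = set()
    for path in changed_files:
        for category, patterns in CATEGORY_PATTERNS.items():
            if category == "docs_only":
                continue
            if any(re.search(p, path) for p in patterns):
                categories.add(category)
    return sorted(categories) or ["ui_or_noncore"]  # default bucket for unmatched executable changes
```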
### 6.2 Gate levels
Define **Gate Levels G0G4**. The pipeline assigns one based on diff + context + budget.
#### G0 — No-risk / administrative
Use for:
* docs-only, comments-only, non-functional metadata
Requirements:
* Lint/format checks
* Basic CI pass (build)
#### G1 — Low risk
Use for:
* small, localized code changes with strong unit coverage
* non-core UI changes
* telemetry additions (no removal)
Requirements:
* All automated unit tests
* Static analysis/linting
* 1 peer review (code owner not required if outside critical modules)
* Automated deploy to staging
* Post-deploy smoke checks
#### G2 — Moderate risk
Use for:
* moderate logic changes in customer-facing paths
* dependency upgrades
* API changes that are backward compatible
* config changes affecting behavior
Requirements:
* G1 +
* Integration tests relevant to impacted modules
* Code owner review for touched modules
* Feature flag required if customer impact possible
* Staged rollout: canary or small cohort
* Rollback plan documented in PR
#### G3 — High risk
Use for:
* schema migrations
* auth/permission changes
* core business logic in critical flows
* infra changes affecting availability
* non-trivial concurrency/queue semantics changes
Requirements:
* G2 +
* Security scan + dependency audit (must pass, exceptions logged)
* Migration plan (forward + rollback) reviewed
* Load/performance checks if in hot path
* Observability: new/updated dashboards/alerts for the change
* Release captain / on-call sign-off (someone accountable live)
* Progressive delivery with automatic health gates (error rate/latency)
#### G4 — Very high risk / safety-critical / budget-constrained releases
Use for:
* Tier 3 critical systems with low budget remaining
* changes during freeze windows via exception
* broad blast radius changes (platform-wide)
* remediation after major incident where recurrence risk is high
Requirements:
* G3 +
* Formal risk review (PM+DM+Security/SRE) in writing
* Explicit rollback rehearsal or prior proven rollback path
* Extended canary period with success criteria and abort criteria
* Customer comms plan if impact is plausible
* Post-release verification checklist executed and logged
### 6.3 Gate selection logic (policy)
Default rule:
1. Compute **RRS** (Risk Points) from diff + context.
2. Map RRS to default gate:
   * 1–5 RP → G1
   * 6–12 RP → G2
   * 13–20 RP → G3
   * 21+ RP → G4
3. Apply modifiers:
   * If **budget Yellow**: escalate one gate for changes at G2 or above
   * If **budget Red**: escalate one gate for changes at G1 or above and block high-risk categories unless an exception is granted
* If active incident or error budget severely degraded: block non-fix releases by default
DM must ensure the pipeline enforces this mapping automatically.
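Gate selection is then a pure function of RRS plus budget and incident context, which is what makes it enforceable in CI; the thresholds are the ones above, and the blocking behavior reflects the modifier rules as interpreted here.
```python
from typing import Optional

def base_gate(rrs: int) -> int:
    """Map RRS to the default gate level per the table above."""
    if rrs <= 5:
        return 1
    if rrs <= 12:
        return 2
    if rrs <= 20:
        return 3
    return 4

def select_gate(rrs: int, budget_state: str, active_incident: bool, is_fix: bool) -> Optional[int]:
    """Return gate level G1-G4, or None when the release is blocked by policy."""
    gate = base_gate(rrs)
    if active_incident and not is_fix:
        return None                       # block non-fix releases by default
    if budget_state == "yellow" and gate >= 2:
        gate = min(gate + 1, 4)
    elif budget_state == "red":
        if gate >= 3:
            return None                   # high-risk categories blocked unless an exception is granted
        gate = min(gate + 1, 4)
    return gate
```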
### 6.4 “Diff-aware” also means “blast-radius aware”
If the diff is inherently risky, reduce risk operationally:
* feature flags with cohort controls
* dark launches (ship code disabled)
* canary deployments
* blue/green with quick revert
* backwards-compatible DB migrations (expand/contract pattern)
* circuit breakers and rate limiting
* progressive exposure by tenant / region / account segment
Large diffs are not made safe by more reviewers; they are made safe by **reversibility and containment**.
---
## 7) Exceptions (“break glass”) policy
Exceptions are permitted only when one of these is true:
* incident mitigation or customer harm prevention,
* urgent security fix (actively exploited or high severity),
* legal/compliance deadline.
**Requirements for any exception:**
* Recorded rationale in the PR/release ticket
* Named approver(s): DM + on-call owner; PM for customer-impacting risk
* Mandatory follow-up within 5 business days:
* post-incident or post-release review
* remediation tasks created and prioritized
* **Budget penalty:** subtract additional RP (e.g., +50% of the change's RRS) to reflect unmanaged risk
Repeated exceptions are a governance failure and trigger gate tightening.
---
## 8) Operational metrics (what PMs and DMs must review)
Minimum weekly review dashboard per service:
* **Risk budget remaining** (RP and %)
* **Deploy frequency**
* **Change failure rate**
* **MTTR**
* **Sev1/Sev2 count** (rolling 30/90 days)
* **SLO / error budget status**
* **Gate compliance rate** (how often gates were bypassed)
* **Diff size distribution** (are we shipping huge diffs?)
* **Rollback frequency and time-to-rollback**
Policy expectation:
* If change failure rate or MTTR worsens materially over 2 windows, budgets tighten and gate mapping escalates until stability returns.
---
## 9) Practical operating cadence
### Weekly (PM + DM)
* Review budgets and trends
* Identify upcoming high-risk releases and plan staged rollouts
* Confirm staffing for release windows (release captain / on-call coverage)
* Decide whether to defer, decompose, or harden changes
### Per release (DM-led, PM informed)
* Ensure correct gate level
* Verify rollout + rollback readiness
* Confirm monitoring/alerts exist and are watched during rollout
* Execute post-release verification checklist
### Monthly (leadership)
* Adjust tier assignments if product criticality changed
* Recalibrate budget numbers based on measured outcomes
* Identify systemic causes: test gaps, observability gaps, deployment tooling gaps
---
## 10) Required templates (standardize execution)
### 10.1 Release Plan (required for G2+)
* What is changing (1–3 bullets)
* Expected customer impact (or “none”)
* Diff category flags (DB/auth/infra/API/etc.)
* Rollout strategy (canary/cohort/blue-green)
* Abort criteria (exact metrics/thresholds)
* Rollback steps (exact commands/process)
* Owners during rollout (names)
### 10.2 Migration Plan (required for schema/data changes)
* Migration type: additive / expand-contract / breaking (breaking is disallowed without explicit G4 approval)
* Backfill approach and rate limits
* Validation checks (row counts, invariants)
* Rollback strategy (including data implications)
### 10.3 Post-release Verification Checklist (G1+)
* Smoke test results
* Key dashboards checked (latency, error rate, saturation)
* Alerts status
* User-facing workflows validated (as applicable)
* Ticket updated with outcome
---
## 11) What “good” looks like
* Low-risk diffs ship quickly with minimal ceremony (G0–G1).
* High-risk diffs are decomposed and shipped progressively, not heroically.
* Risk budgets are visible, used in planning, and treated as a real constraint.
* Exceptions are rare and followed by concrete remediation.
* Over time: deploy frequency stays high while change failure rate and MTTR decrease.
---
## 12) Immediate adoption checklist (first 30 days)
**DM deliverables**
* Implement diff classification in CI/CD (at least: DB/auth/infra/API/deps/config)
* Implement automatic gate mapping and enforcement
* Add release plan and rollback plan checks for G2+
* Add logging for gate overrides
**PM deliverables**
* Confirm service tiering for owned areas
* Approve initial monthly RP budgets
* Add risk budget review to the weekly product/engineering ritual
* Reprioritize work when budgets hit Yellow/Red (explicitly)
---
If you want, I can also provide:
* a concrete scoring worksheet (ready to paste into Confluence/Notion),
* a CI/CD policy example (e.g., GitHub Actions / GitLab rules) that computes gate level from diff patterns,
* and a one-page Release Captain Runbook aligned to G2–G4.