I’m sharing a **competitive security‑tool matrix** that you can immediately plug into Stella Ops strategy discussions — it maps real, *comparable evidence* from public sources to categories where most current tools fall short. Below the CSV is a short Markdown commentary that highlights gaps & opportunities Stella Ops can exploit.
---

## 🧠 Competitive Security Tool Matrix (CSV)

**Columns:**

`Tool,SBOM Fidelity,VEX Handling,Explainability,Smart‑Diff,Call‑Stack Reachability,Deterministic Scoring,Unknowns State,Ecosystem Integrations,Policy Engine,Offline/Air‑Gapped,Provenance/Attestations,Public Evidence`

```
Tool,SBOM Fidelity,VEX Handling,Explainability,Smart‑Diff,Call‑Stack Reachability,Deterministic Scoring,Unknowns State,Ecosystem Integrations,Policy Engine,Offline/Air‑Gapped,Provenance/Attestations,Public Evidence
Trivy (open),CycloneDX/SPDX support (basic),Partial* (SBOM ext refs),Low,No,No,Moderate,No,Strong CI/CD/K8s,Minimal,Unknown,SBOM only evidence; VEX support request exists but unmerged⟨*⟩,See links below
Grype/Syft,Strong CycloneDX/SPDX (generator + scanner),None documented,Low,No,No,Moderate,No,Strong CI/CD/K8s,Policy minimal,Unknown,Syft can create signed SBOMs but not full attestations,See links below
Snyk,SBOM export likely (platform),Unknown/limited,Vuln context explainability (reports),No,No,Proprietary risk scoring,Partial integrations,Strong Black/White list policies in UI,Unknown,Unknown (not focused on attestations),See links below
Prisma Cloud,Enterprise SBOM + vuln scanning,Runtime exploitability contexts?*,Enterprise dashboards,No formal smart‑diff,No,Risk prioritization,Supports multi‑cloud integrations,Rich policy engines (CNAPP),Supports offline deployment?,Unknown attestations capabilities,See links below
Aqua (enterprise),SBOM via Trivy,Unknown commercial VEX support,Some explainability in reports,No documented smart‑diff,No,Risk prioritization,Comprehensive integrations (cloud/CI/CD/SIEM),Enterprise policy supports compliance,Air‑gapped options in enterprise,Focus on compliance attestations?,See links below
Anchore Enterprise,Strong SBOM mgmt + format support,Policy engine can ingest SBOM + vulnerability sources,Moderate (reports & SBOM insights),Potential policy diff,No explicit reachability analysis,Moderate policy scoring,Partial,Rich integrations (CI/CD/registry),Policy‑as‑code,Air‑gapped deploy supported,SBOM provenance & signing via Syft/in‑toto,See links below
Stella Ops,High fidelity SBOM (CycloneDX/SPDX) planned,Native VEX ingestion + decisioning,Explainability + proof extracts,Smart‑diff tech planned,Call‑stack reachability analysis,Deterministic scoring with proofs,Explicit unknowns state,Integrations with CI/CD/SIGSTORE,Declarative multimodal policy engine,Full offline/air‑gapped support,Provenance/attestations via DSSE/in‑toto,StellaOps internal vision
```

---

## 📌 Key Notes, Gaps & Opportunities (Markdown)

### **SBOM Fidelity**

* **Open tools (Trivy, Syft)** already support CycloneDX/SPDX output, but mostly as flat SBOM artifacts without long‑term repositories or versioned diffing. ([Ox Security][1])
* **Opportunity:** Provide *repository + lineage + merge semantics* with proofs — not just generation.

### **VEX Handling**

* Trivy has an open feature request for dynamic VEX ingestion. ([GitHub][2])
* Most competitors either lack VEX support or have no *decisioning logic* based on exploitability.
* **Opportunity:** First‑class VEX ingestion with evaluation rules + automated scoring.

### **Explainability**

* Commercial tools (Prisma/Snyk) offer UI report context and dev‑oriented remediation guidance. ([Snyk][3])
* OSS tools provide flat scan outputs with minimal causal trace.
* **Opportunity:** Link vulnerability flags back to *proven code paths*, enriched with SBOM + call reachability.

### **Smart‑Diff & Unknowns State**

* No major tool advertises *smart diffing* between SBOMs for incremental risk deltas across releases.
* **Opportunity:** Automate risk deltas between SBOMs with uncertainty margins.

### **Call‑Stack Reachability**

* None of these tools publicly document call‑stack based exploit reachability analysis out‑of‑the‑box.
* **Opportunity:** Integrate dynamic/static reachability evidence that elevates scanning from surface report → *impact map*.

### **Deterministic Scoring**

* Snyk & Prisma offer proprietary scoring that blends severity + context. ([TrustRadius][4])
* But these scores aren’t reproducible or backed by *signed verdicts*.
* **Opportunity:** Provide *deterministic, attestable scoring proofs*.

### **Ecosystem Integrations**

* Trivy/Grype excel at lightweight CI/CD and Kubernetes. ([Echo][5])
* Enterprise products integrate deeply into cloud/registry. ([Palo Alto Networks][6])
* **Opportunity:** Expand *sigstore/notation* based pipelines and automated attestation flows.

### **Policy Engine**

* Prisma & Aqua have mature enterprise policies. ([Aqua][7])
* OSS tools offer only limited, simple allow/deny rules.
* **Opportunity:** Provide *lattice/constraint policies* with proof outputs.

### **Offline/Air‑Gapped**

* Anchore supports air‑gapped deployment in enterprise contexts. ([Anchore][8])
* Support across the open tools is ad‑hoc at best.
* **Opportunity:** Built‑in deterministic offline modes with offline SBOM stores and VEX ingestion.

### **Provenance/Attestations**

* Syft supports SBOM output in various formats; also *in‑toto* for attestations. ([Ox Security][1])
* Most competitors don’t prominently advertise *attestation pipelines*.
* **Opportunity:** End‑to‑end DSSE/in‑toto provenance with immutable proofs.

---

### 📌 Public Evidence Links

* **Trivy / Syft / Grype SBOM support & formats:** CycloneDX/SPDX; Syft as generator + Grype scanner. ([Ox Security][1])
* **Trivy VEX feature request:** exists but not mainstream yet. ([GitHub][2])
* **Snyk platform capability:** scans containers, IaC, dev‑first prioritization. ([Snyk][3])
* **Prisma Cloud container security:** lifecycle scanning + policy. ([Palo Alto Networks][6])
* **Anchore Enterprise SBOM mgmt & policy:** central imports + CI/CD ecosystem. ([Anchore Docs][9])

---

If you’d like this in **Excel, JSON, or Mermaid graph form** next, I can generate it — just tell me the format you prefer.

[1]: https://www.ox.security/blog/sbom-tools/?utm_source=chatgpt.com "Top 5 SBOM Tools for Securing the Software Supply Chain"
[2]: https://github.com/aquasecurity/trivy/discussions/8047?utm_source=chatgpt.com "Support for dynamic VEX retrieval from \"External ..."
[3]: https://snyk.io/?utm_source=chatgpt.com "Snyk AI-powered Developer Security Platform | AI-powered ..."
[4]: https://www.trustradius.com/compare-products/palo-alto-networks-prisma-cloud-vs-snyk?utm_source=chatgpt.com "Palo Alto Networks Prisma Cloud vs Snyk"
[5]: https://www.echohq.com/blog/best-container-scanning-tools?utm_source=chatgpt.com "10 Best Container Scanning Tools for 2025"
[6]: https://www.paloaltonetworks.com/prisma/cloud/container-security?utm_source=chatgpt.com "Container Security Solutions"
[7]: https://www.aquasec.com/integrations/?utm_source=chatgpt.com "Integrations - Aqua Security"
[8]: https://anchore.com/blog/effortless-sbom-analysis-how-anchore-enterprise-simplifies-integration/?utm_source=chatgpt.com "Effortless SBOM Analysis with Anchore Enterprise"
[9]: https://docs.anchore.com/current/docs/sbom_management/?utm_source=chatgpt.com "SBOM Management"

Below is a **deep, implementation-grade elaboration** that turns the comparison matrix into **actionable architecture and product differentiation** for Stella Ops. I will deliberately avoid marketing language and focus on **mechanics, evidence, and execution gaps**.
---

## 1. Why existing tools plateau (root cause analysis)

Across Trivy, Syft/Grype, Snyk, Prisma, Aqua, and Anchore, there is a **structural ceiling** they all hit — regardless of OSS vs enterprise.

### Shared structural limitations

1. **SBOM is treated as a static artifact**

   * Generated → stored → scanned.
   * No concept of *evolving truth*, lineage, or replayability.

2. **Vulnerability scoring is probabilistic, not provable**

   * CVSS + vendor heuristics.
   * Cannot answer: *“Show me why this CVE is exploitable here.”*

3. **Exploitability ≠ reachability**

   * “Runtime context” ≠ call-path proof.

4. **Diffing is file-level, not semantic**

   * Image hash change ≠ security delta understanding.

5. **Offline support is operational, not epistemic**

   * You can run it offline, but you cannot **prove** what knowledge state was used.

These are not accidental omissions. They arise from **tooling lineage**:

* Trivy/Syft grew from *package scanners*
* Snyk grew from *developer remediation UX*
* Prisma/Aqua grew from *policy & compliance platforms*

None were designed around **forensic reproducibility or trust algebra**.

---

## 2. SBOM fidelity: what “high fidelity” actually means

Most tools claim CycloneDX/SPDX support. That is **necessary but insufficient**.

### Current reality

| Dimension | Industry tools |
| ----------------------- | ---------------------- |
| Component identity | Package name + version |
| Binary provenance | Weak or absent |
| Build determinism | None |
| Dependency graph | Flat or shallow |
| Layer attribution | Partial |
| Rebuild reproducibility | Not supported |

### What Stella Ops must do differently

**SBOM must become a *stateful ledger*, not a document.**

Concrete requirements:

* **Component identity = (source + digest + build recipe hash)**
* **Binary → source mapping**

  * ELF Build-ID / Mach-O UUID / PE timestamp+hash
* **Layer-aware dependency graphs**

  * Not “package depends on X”
  * But “binary symbol A resolves to shared object B via loader rule C”
* **Replay manifest**

  * Exact feeds
  * Exact policies
  * Exact scoring rules
  * Exact timestamps
  * Hash of everything

This is the foundation for *deterministic replayable scans* — something none of the competitors even attempt.

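To make the replay manifest concrete, here is a minimal sketch (Python) of pinning every knowledge input by digest and content-addressing the manifest itself. The field names, versions, and digest inputs are illustrative assumptions, not the Stella Ops schema.

```python
import hashlib, json

def digest(obj) -> str:
    """Content-address any JSON-serializable object: canonical form, then SHA-256."""
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()
    return "sha256:" + hashlib.sha256(canonical).hexdigest()

# Hypothetical replay manifest: every knowledge input pinned by digest,
# so a scan can be re-run later against exactly the same state.
replay_manifest = {
    "sbom": digest({"components": ["pkg:deb/debian/openssl@3.0.11"]}),
    "feeds": digest({"nvd": "2025-12-19", "osv": "2025-12-19"}),
    "policies": digest({"gate": "prod", "rules_version": 7}),
    "scoring_rules": digest({"version": 3}),
    "toolchain": {"scanner": "1.4.2", "sbom_generator": "0.9.0"},
}
replay_manifest["manifest_digest"] = digest(
    {k: v for k, v in replay_manifest.items() if k != "manifest_digest"})
print(replay_manifest["manifest_digest"])
```

The same digest function can later verify that an offline replay used exactly this knowledge state.
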
---
## 3. VEX handling: ingestion vs decisioning

Most vendors misunderstand VEX.

### What competitors do

* Accept VEX as:

  * Metadata
  * Annotation
  * Suppression rule
* No **formal reasoning** over VEX statements.

### What Stella Ops must do

VEX is not a comment — it is a **logical claim**.

Each VEX statement:

```
IF
  product == X
  AND component == Y
  AND version in range Z
THEN
  status ∈ {not_affected, affected, fixed, under_investigation}
BECAUSE
  justification J
WITH
  evidence E
```

Stella Ops advantage:

* VEX statements become **inputs to a lattice merge**
* Conflicting VEX from:

  * Vendor
  * Distro
  * Internal analysis
  * Runtime evidence
* Are resolved **deterministically** via policy, not precedence hacks.

This unlocks:

* Vendor-supplied proofs
* Customer-supplied overrides
* Jurisdiction-specific trust rules

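As a sketch of what a deterministic lattice merge over VEX claims can look like, consider the following Python fragment. The issuer trust weights, status names, and guardrail rule are illustrative assumptions, not the final policy model.

```python
from typing import NamedTuple

class Claim(NamedTuple):
    issuer: str         # "vendor" | "distro" | "internal" | "runtime"
    status: str         # "affected" | "not_affected" | "fixed" | "under_investigation"
    has_evidence: bool  # evidence requirements satisfied for this claim

# Illustrative policy: issuer trust weights and a hard guardrail for "safe" statuses.
TRUST = {"vendor": 70, "distro": 75, "internal": 85, "runtime": 90}
SAFE = {"not_affected", "fixed"}

def resolve(claims: list[Claim]) -> str:
    """Deterministic merge: stable ordering for tie-breaking, and an
    evidence-free 'safe' claim is never allowed to win."""
    ranked = sorted(claims, key=lambda c: (TRUST[c.issuer], c.issuer, c.status), reverse=True)
    for claim in ranked:
        if claim.status in SAFE and not claim.has_evidence:
            continue  # guardrail: unsupported safe claims cannot win
        return claim.status
    return "under_investigation"  # nothing usable, so surface the uncertainty

print(resolve([Claim("vendor", "affected", True),
               Claim("internal", "not_affected", True)]))  # -> not_affected
```
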
---
## 4. Explainability: reports vs proofs

### Industry “explainability”

* “This vulnerability is high because…”
* Screenshots, UI hints, remediation text.

### Required explainability

Security explainability must answer **four non-negotiable questions**:

1. **What exact evidence triggered this finding?**
2. **What code or binary path makes it reachable?**
3. **What assumptions are being made?**
4. **What would falsify this conclusion?**

No existing scanner answers #4.

### Stella Ops model

Each finding emits:

* Evidence bundle:

  * SBOM nodes
  * Call-graph edges
  * Loader resolution
  * Runtime symbol presence
* Assumption set:

  * Compiler flags
  * Runtime configuration
  * Feature gates
* Confidence score **derived from evidence density**, not CVSS

This is explainability suitable for:

* Auditors
* Regulators
* Courts
* Defense procurement

---

## 5. Smart-Diff: the missing primitive

All tools compare:

* Image A vs Image B
* Result: *“+3 CVEs, –1 CVE”*

This is **noise-centric diffing**.

### What Smart-Diff must mean

Diff not *artifacts*, but **security meaning**.

Examples:

* Same CVE remains, but:

  * Call path removed → risk collapses
* New binary added, but:

  * Dead code → no reachable risk
* Dependency upgraded, but:

  * ABI unchanged → no exposure delta

Implementation direction:

* Diff **reachability graphs**
* Diff **policy outcomes**
* Diff **trust weights**
* Diff **unknowns**

Output:

> “This release reduces exploitability surface by 41%, despite +2 CVEs.”

No competitor does this.

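A minimal sketch of diffing security meaning rather than artifacts (Python). The snapshot contents and the delta shape are illustrative assumptions.

```python
# Each snapshot maps a CVE to the set of reachable call paths proving exploitability.
baseline = {
    "CVE-2024-0001": {("main", "parse", "vuln_fn")},
    "CVE-2024-0002": set(),            # present in SBOM, but no reachable path
}
candidate = {
    "CVE-2024-0001": set(),            # call path removed, so risk collapses
    "CVE-2024-0002": set(),
    "CVE-2024-0003": {("main", "handler", "bad_fn")},  # newly reachable
}

def smart_diff(base: dict, target: dict) -> dict:
    """Diff security meaning, not artifact contents."""
    return {
        "newly_reachable": sorted(cve for cve, paths in target.items()
                                  if paths and not base.get(cve)),
        "no_longer_reachable": sorted(cve for cve, paths in base.items()
                                      if paths and not target.get(cve)),
        "cve_count_delta": len(target) - len(base),  # the noisy number, kept for context
    }

print(smart_diff(baseline, candidate))
# {'newly_reachable': ['CVE-2024-0003'], 'no_longer_reachable': ['CVE-2024-0001'], 'cve_count_delta': 1}
```
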
---
## 6. Call-stack reachability: why runtime context isn’t enough

### Current vendor claim

“Runtime exploitability analysis.”

Reality:

* Usually:

  * Process exists
  * Library loaded
  * Port open

This is **coarse correlation**, not proof.

### Stella Ops reachability model

Reachability requires **three layers**:

1. **Static call graph**

   * From entrypoints to vulnerable symbols
2. **Binary resolution**

   * Dynamic loader rules
   * Symbol versioning
3. **Runtime gating**

   * Feature flags
   * Configuration
   * Environment

Only when **all three align** does exploitability exist.

This makes false positives *structurally impossible*, not heuristically reduced.

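One compact way to express the three-layer rule (Python; the predicate names are assumptions):

```python
def exploitable(static_path_exists: bool,
                symbol_resolves_at_load: bool,
                runtime_gate_open: bool) -> bool:
    """A finding is exploitable only when all three evidence layers align;
    anything less is correlation, not proof."""
    return static_path_exists and symbol_resolves_at_load and runtime_gate_open

# Library loaded and port open, but no static call path to the vulnerable symbol:
print(exploitable(False, True, True))  # False, so not an actionable finding
```
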
---
## 7. Deterministic scoring: replacing trust with math

Every competitor uses:

* CVSS
* EPSS
* Proprietary weighting

Problem:

* Scores are **non-reproducible**
* Cannot be attested
* Cannot be audited

### Stella Ops scoring

Score = deterministic function of:

* Evidence count
* Evidence strength
* Assumption penalties
* Trust source weights
* Policy constraints

Same inputs → same outputs → forever.

This enables:

* Signed risk decisions
* Cross-org verification
* Legal defensibility

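A minimal sketch of a deterministic, attestable score (Python). The weights and inputs are illustrative assumptions; the point is that the function is pure and integer-valued, so the same inputs always produce the same digestible verdict.

```python
import hashlib, json

def score(evidence_count: int, evidence_strength: int,
          assumption_penalty: int, trust_weight: int) -> int:
    """Pure, integer-only scoring: no wall clock, no randomness, no floats."""
    raw = evidence_count * evidence_strength * trust_weight - assumption_penalty * 10
    return max(0, min(1000, raw))

inputs = {"evidence_count": 3, "evidence_strength": 4,
          "assumption_penalty": 2, "trust_weight": 8}
verdict = {"score": score(**inputs), "inputs": inputs}
verdict_digest = hashlib.sha256(
    json.dumps(verdict, sort_keys=True).encode()).hexdigest()
print(verdict["score"], verdict_digest[:12])  # identical on every replay
```
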
---
## 8. Unknowns as a first-class state

Industry tools suppress uncertainty.

Stella Ops must **surface it**.

States:

* Known-safe
* Known-vulnerable
* **Unknown-reachable**
* **Unknown-unreachable**

Unknowns are **risk**, but different from vulnerabilities.

This is critical for:

* Air-gapped environments
* Novel exploits
* Zero-day windows

No competitor models this explicitly.

---

## 9. Offline / air-gapped: epistemic completeness

Competitors:

* “Yes, we can run offline.”

But cannot answer:

> “Exactly what knowledge did you have when you made this decision?”

Stella Ops must:

* Snapshot feeds
* Snapshot policies
* Snapshot scoring rules
* Snapshot trust anchors

Then bind them cryptographically to scan results.

This creates **forensic replayability**, not just offline execution.

---

## 10. Strategic conclusion (no hype)

### Where Stella Ops can **objectively dominate**

1. Deterministic replayable scans
2. Formal VEX reasoning
3. Reachability-backed exploitability
4. Semantic smart-diff
5. Evidence-first explainability
6. Unknowns modeling
7. Jurisdiction-ready offline trust

### Where competitors remain stronger (for now)

* Mass-market UX polish
* Lower-friction SaaS onboarding
* Marketplace integrations

These are solvable later. The **hard parts** above are not.

---

If you want, next I can:

* Convert this into **developer implementation milestones**
* Map features → **moat strength vs competitors**
* Design the **Trust Algebra / Lattice Engine spec**
* Produce a **regulator-grade threat & evidence model**

Tell me which vector you want next.

---

# A. Executive directive (send as-is to both PM + Dev)
|
||||
|
||||
1. **A “Release” is not an SBOM or a scan report. A Release is a “Security State Snapshot.”**
|
||||
|
||||
* A snapshot is a **versioned, content-addressed bundle** containing:
|
||||
|
||||
* SBOM graph (canonical form, hashed)
|
||||
* Reachability graph (canonical form, hashed)
|
||||
* VEX claim set (canonical form, hashed)
|
||||
* Policies + rule versions used (hashed)
|
||||
* Data-feed identifiers used (hashed)
|
||||
* Toolchain versions (hashed)
|
||||
|
||||
2. **Diff is a product primitive, not a UI feature.**
|
||||
|
||||
* “Diff” must exist as a stable API and artifact, not a one-off report.
|
||||
* Every comparison produces a **Delta object** (machine-readable) and a **Delta Verdict attestation** (signed).
|
||||
|
||||
3. **The CI/CD gate should never ask “how many CVEs?”**
|
||||
|
||||
* It should ask: **“What materially changed in exploitable risk since the last approved baseline?”**
|
||||
* The Delta Verdict must be deterministically reproducible given the same snapshots and policy.
|
||||
|
||||
4. **Every Delta Verdict must be portable and auditable.**
|
||||
|
||||
* It must be a signed attestation that can be stored with the build artifact (OCI attach) and replayed offline.
|
||||
|
||||
---
|
||||
|
||||
# B. Product Management directions
|
||||
|
||||
## B1) Define the product concept: “Security Delta as the unit of governance”
|
||||
|
||||
**Position the capability as change-control for software risk**, not as “a scanner with comparisons.”
|
||||
|
||||
### Primary user stories (MVP)
|
||||
|
||||
1. **Release Manager / Security Engineer**
|
||||
|
||||
* “Compare the candidate build to the last approved build and explain *what changed* in exploitable risk.”
|
||||
2. **CI Pipeline Owner**
|
||||
|
||||
* “Fail the build only for *new* reachable high-risk exposures (or policy-defined deltas), not for unchanged legacy issues.”
|
||||
3. **Auditor / Compliance**
|
||||
|
||||
* “Show a signed delta verdict with evidence references proving why this release passed.”
|
||||
|
||||
### MVP “Delta Verdict” policy questions to support
|
||||
|
||||
* Are there **new reachable vulnerabilities** introduced?
|
||||
* Did any **previously unreachable vulnerability become reachable**?
|
||||
* Are there **new affected VEX states** (e.g., NOT_AFFECTED → AFFECTED)?
|
||||
* Are there **new Unknowns** above a threshold?
|
||||
* Is the **net exploitable surface** increased beyond policy budget?
|
||||
|
||||
## B2) Define the baseline selection rules (product-critical)
|
||||
|
||||
Diff is meaningless without a baseline contract. Product must specify baseline selection as a first-class choice.
|
||||
|
||||
Minimum baseline modes:
|
||||
|
||||
* **Previous build in the same pipeline**
|
||||
* **Last “approved” snapshot** (from an approval gate)
|
||||
* **Last deployed in environment X** (optional later, but roadmap it)
|
||||
|
||||
Acceptance criteria:
|
||||
|
||||
* The delta object must always contain:
|
||||
|
||||
* `baseline_snapshot_digest`
|
||||
* `target_snapshot_digest`
|
||||
* `baseline_selection_method` and identifiers
|
||||
|
||||
## B3) Define the delta taxonomy (what your product “knows” how to talk about)
|
||||
|
||||
Avoid “diffing findings lists.” You need consistent delta categories.
|
||||
|
||||
Minimum taxonomy:
|
||||
|
||||
1. **SBOM deltas**
|
||||
|
||||
* Component added/removed
|
||||
* Component version change
|
||||
* Dependency edge change (graph-level)
|
||||
2. **VEX deltas**
|
||||
|
||||
* Claim added/removed
|
||||
* Status change (e.g., under_investigation → fixed)
|
||||
* Justification/evidence change (optional MVP)
|
||||
3. **Reachability deltas**
|
||||
|
||||
* New reachable vulnerable symbol(s)
|
||||
* Removed reachability
|
||||
* Entry point changes
|
||||
4. **Decision deltas**
|
||||
|
||||
* Policy outcome changed (PASS → FAIL)
|
||||
* Explanation changed (drivers of decision)
|
||||
|
||||
PM deliverable:
|
||||
|
||||
* A one-page **Delta Taxonomy Spec** that becomes the canonical list used across API, UI, and attestations.
|
||||
|
||||
## B4) Define what “signed delta verdict” means in product terms
|
||||
|
||||
A delta verdict is not a PDF.
|
||||
|
||||
It is:
|
||||
|
||||
* A deterministic JSON payload
|
||||
* Wrapped in a signature envelope (DSSE)
|
||||
* Attached to the artifact (OCI attach)
|
||||
* Includes pointers (hash references) to evidence graphs
|
||||
|
||||
PM must define:
|
||||
|
||||
* Where customers can view it (UI + CLI)
|
||||
* Where it lives (artifact registry + Stella store)
|
||||
* How it is consumed (policy gate, audit export)
|
||||
|
||||
## B5) PM success metrics (must be measurable)
|
||||
|
||||
* % of releases gated by delta verdict
|
||||
* Mean time to explain “why failed”
|
||||
* Reduction in “unchanged legacy vuln” false gating
|
||||
* Reproducibility rate: same inputs → same verdict (target: 100%)
|
||||
|
||||
---
|
||||
|
||||
# C. Development Management directions
|
||||
|
||||
## C1) Architecture: treat Snapshot and Delta as immutable, content-addressed objects
|
||||
|
||||
You need four core services/modules:
|
||||
|
||||
1. **Canonicalization + Hashing**
|
||||
|
||||
* Deterministic serialization (stable field ordering, normalized IDs)
|
||||
* Content addressing: every graph and claim set gets a digest
|
||||
|
||||
2. **Snapshot Store (Ledger)**
|
||||
|
||||
* Store snapshots keyed by digest
|
||||
* Store relationships: artifact → snapshot, snapshot → predecessor(s)
|
||||
* Must support offline export/import later (design now)
|
||||
|
||||
3. **Diff Engine**
|
||||
|
||||
* Inputs: `baseline_snapshot_digest`, `target_snapshot_digest`
|
||||
* Outputs:
|
||||
|
||||
* `delta_object` (structured)
|
||||
* `delta_summary` (human-friendly)
|
||||
* Must be deterministic and testable with golden fixtures
|
||||
|
||||
4. **Verdict Engine + Attestation Writer**
|
||||
|
||||
* Evaluate policies against delta
|
||||
* Produce `delta_verdict`
|
||||
* Wrap as DSSE / in-toto-style statement (or your chosen predicate type)
|
||||
* Sign and optionally attach to OCI artifact
|
||||
|
||||
## C2) Data model (minimum viable schemas)
|
||||
|
||||
### Snapshot (conceptual fields)
|
||||
|
||||
* `snapshot_id` (digest)
|
||||
* `artifact_ref` (e.g., image digest)
|
||||
* `sbom_graph_digest`
|
||||
* `reachability_graph_digest`
|
||||
* `vex_claimset_digest`
|
||||
* `policy_bundle_digest`
|
||||
* `feed_snapshot_digest`
|
||||
* `toolchain_digest`
|
||||
* `created_at`
|
||||
|
||||
### Delta object (conceptual fields)
|
||||
|
||||
* `delta_id` (digest)
|
||||
* `baseline_snapshot_digest`
|
||||
* `target_snapshot_digest`
|
||||
* `sbom_delta` (structured)
|
||||
* `reachability_delta` (structured)
|
||||
* `vex_delta` (structured)
|
||||
* `unknowns_delta` (structured)
|
||||
* `derived_risk_delta` (structured)
|
||||
* `created_at`
|
||||
|
||||
### Delta verdict attestation (must include)
|
||||
|
||||
* Subjects: artifact digest(s)
|
||||
* Baseline snapshot digest + Target snapshot digest
|
||||
* Policy bundle digest
|
||||
* Verdict enum: PASS/WARN/FAIL
|
||||
* Drivers: references to delta nodes (hash pointers)
|
||||
* Signature metadata
|
||||
|
||||
## C3) Determinism requirements (non-negotiable)

Development must implement:

* **Canonical ID scheme** for components and graph nodes
  (example: package URL + version + supplier + qualifiers, then hashed; see the sketch below)
* Stable sorting for node/edge lists
* Stable normalization of timestamps (do not include wall-clock in hash inputs unless explicitly policy-relevant)
* A “replay test harness”:

  * Given the same inputs, byte-for-byte identical snapshot/delta/verdict

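A minimal sketch of such a canonical ID scheme (Python). The exact fields, normalization rules, and separator are assumptions to be pinned down in the spec.

```python
import hashlib
from typing import Optional

def canonical_node_id(purl: str, version: str, supplier: str = "",
                      qualifiers: Optional[dict] = None) -> str:
    """Stable component identity: normalized fields joined in a fixed order, then hashed."""
    quals = "&".join(f"{k}={v}" for k, v in sorted((qualifiers or {}).items()))
    canonical = "|".join([purl.lower(), version, supplier.lower(), quals])
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Same logical component gives the same ID, regardless of input casing or qualifier order.
a = canonical_node_id("pkg:maven/org.example/foo", "1.2.3", "Example Corp",
                      {"type": "jar", "os": "linux"})
b = canonical_node_id("PKG:maven/org.example/foo", "1.2.3", "example corp",
                      {"os": "linux", "type": "jar"})
assert a == b
```
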
Definition of Done:
|
||||
|
||||
* Golden test vectors for snapshots and deltas checked into repo
|
||||
* Deterministic hashing tests in CI
|
||||
|
||||
## C4) Graph diff design (how to do it without drowning in noise)
|
||||
|
||||
### SBOM graph diff (MVP)
|
||||
|
||||
Implement:
|
||||
|
||||
* Node set delta: added/removed/changed nodes (by stable node ID)
|
||||
* Edge set delta: added/removed edges (dependency relations)
|
||||
* A “noise suppressor” layer:
|
||||
|
||||
* ignore ordering differences
|
||||
* ignore metadata-only changes unless policy enables
|
||||
|
||||
Output should identify:
|
||||
|
||||
* “What changed?” (added/removed/upgraded/downgraded)
|
||||
* “Why it matters?” (ties to vulnerability & reachability where available)
|
||||
|
||||
### VEX claimset diff (MVP)
|
||||
|
||||
Implement:
|
||||
|
||||
* Keyed by `(product/artifact scope, component ID, vulnerability ID)`
|
||||
* Delta types:
|
||||
|
||||
* claim added/removed
|
||||
* status changed
|
||||
* justification changed (optional later)
|
||||
|
||||
### Reachability diff (incremental approach)
|
||||
|
||||
MVP can start narrow:
|
||||
|
||||
* Support one or two ecosystems initially (e.g., Java + Maven, or Go modules)
|
||||
* Represent reachability as:
|
||||
|
||||
* `entrypoint → function/symbol → vulnerable symbol`
|
||||
* Diff should highlight:
|
||||
|
||||
* Newly reachable vulnerable symbols
|
||||
* Removed reachability
|
||||
|
||||
Important: even if reachability is initially partial, the diff model must support it cleanly (unknowns must exist).
|
||||
|
||||
## C5) Policy evaluation must run on delta, not on raw findings

Define a policy DSL contract like:

* `fail_if new_reachable_critical > 0`
* `warn_if new_unknowns > 10`
* `fail_if vex_status_regressed == true`
* `pass_if no_net_increase_exploitable_surface == true`

Engineering directive:

* Policies must reference **delta fields**, not scanner-specific output.
* Keep the policy evaluation pure and deterministic.

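A minimal sketch of pure, delta-driven policy evaluation (Python). The delta field names and thresholds mirror the illustrative DSL lines above and are assumptions, not a final schema.

```python
def evaluate_policy(delta: dict) -> tuple[str, list[str]]:
    """Pure function of the delta object: no I/O, no clock, no scanner internals."""
    reasons = []
    if delta.get("new_reachable_critical", 0) > 0:
        reasons.append("DELTA.REACHABLE_CRITICAL.NEW")
    if delta.get("vex_status_regressed", False):
        reasons.append("DELTA.VEX.STATUS_REGRESSED")
    if reasons:
        return "FAIL", reasons
    if delta.get("new_unknowns", 0) > 10:
        return "WARN", ["DELTA.UNKNOWNS.ABOVE_BUDGET"]
    return "PASS", []

print(evaluate_policy({"new_reachable_critical": 0, "new_unknowns": 12}))
# ('WARN', ['DELTA.UNKNOWNS.ABOVE_BUDGET'])
```
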
## C6) Signing and attachment (implementation-level)
|
||||
|
||||
Minimum requirements:
|
||||
|
||||
* Support signing delta verdict as a DSSE envelope with a stable predicate type.
|
||||
* Support:
|
||||
|
||||
* keyless signing (optional)
|
||||
* customer-managed keys (enterprise)
|
||||
* Attach to OCI artifact as an attestation (where possible), and store in Stella ledger for retrieval.
|
||||
|
||||
Definition of Done:
|
||||
|
||||
* A CI workflow can:
|
||||
|
||||
1. create snapshots
|
||||
2. compute delta
|
||||
3. produce signed delta verdict
|
||||
4. verify signature and gate
|
||||
|
||||
---
|
||||
|
||||
# D. Roadmap (sequenced to deliver value early without painting into a corner)
|
||||
|
||||
## Phase 1: “Snapshot + SBOM Diff + Delta Verdict”
|
||||
|
||||
* Version SBOM graphs
|
||||
* Diff SBOM graphs
|
||||
* Produce delta verdict based on SBOM delta + vulnerability delta (even before reachability)
|
||||
* Signed delta verdict artifact exists
|
||||
|
||||
Output:
|
||||
|
||||
* Baseline/target selection
|
||||
* Delta taxonomy v1
|
||||
* Signed delta verdict v1
|
||||
|
||||
## Phase 2: “VEX claimsets and VEX deltas”
|
||||
|
||||
* Ingest OpenVEX/CycloneDX/CSAF
|
||||
* Store canonical claimsets per snapshot
|
||||
* Diff claimsets and incorporate into delta verdict
|
||||
|
||||
Output:
|
||||
|
||||
* “VEX status regression” gating works deterministically
|
||||
|
||||
## Phase 3: “Reachability graphs and reachability deltas”
|
||||
|
||||
* Start with one ecosystem
|
||||
* Generate reachability evidence
|
||||
* Diff reachability and incorporate into verdict
|
||||
|
||||
Output:
|
||||
|
||||
* “new reachable critical” becomes the primary gate
|
||||
|
||||
## Phase 4: “Offline replay bundle”
|
||||
|
||||
* Export/import snapshot + feed snapshot + policy bundle
|
||||
* Replay delta verdict identically in air-gapped environment
|
||||
|
||||
---
|
||||
|
||||
# E. Acceptance criteria checklist (use this as a release gate for your own feature)
|
||||
|
||||
A feature is not done until:
|
||||
|
||||
1. **Snapshot is content-addressed** and immutable.
|
||||
2. **Delta is content-addressed** and immutable.
|
||||
3. Delta shows:
|
||||
|
||||
* SBOM delta
|
||||
* VEX delta (when enabled)
|
||||
* Reachability delta (when enabled)
|
||||
* Unknowns delta
|
||||
4. **Delta verdict is signed** and verification is automated.
|
||||
5. **Replay test**: given same baseline/target snapshots + policy bundle, verdict is identical byte-for-byte.
|
||||
6. The product answers, clearly:
|
||||
|
||||
* What changed?
|
||||
* Why does it matter?
|
||||
* Why is the verdict pass/fail?
|
||||
* What evidence supports this?
|
||||
|
||||
---
|
||||
|
||||
# F. What to tell your teams to avoid (common failure modes)
|
||||
|
||||
* Do **not** ship “diff” as a UI compare of two scan outputs.
|
||||
* Do **not** make reachability an unstructured “note” field; it must be a graph with stable IDs.
|
||||
* Do **not** allow non-deterministic inputs into verdict hashes (timestamps, random IDs, nondeterministic ordering).
|
||||
* Do **not** treat VEX as “ignore rules” only; treat it as a claimset with provenance and merge semantics (even if merge comes later).
|
||||
---

## 1) Define the product primitive (non-negotiable)
|
||||
|
||||
### Directive (shared)
|
||||
|
||||
**The product’s primary output is not “findings.” It is a “Risk Verdict Attestation” (RVA).**
|
||||
Everything else (SBOMs, CVEs, VEX, reachability, reports) is *supporting evidence* referenced by the RVA.
|
||||
|
||||
### What “first-class artifact” means in practice
|
||||
|
||||
1. **The verdict is an OCI artifact “referrer” attached to a specific image/artifact digest** via OCI 1.1 `subject` and discoverable via the referrers API. ([opencontainers.org][1])
|
||||
2. **The verdict is cryptographically signed** (at least one supported signing pathway).
|
||||
|
||||
* DSSE is a standard approach for signing attestations, and cosign supports creating/verifying in‑toto attestations signed with DSSE. ([Sigstore][2])
|
||||
* Notation is a widely deployed approach for signing/verifying OCI artifacts in enterprise environments. ([Microsoft Learn][3])
|
||||
|
||||
---
|
||||
|
||||
## 2) Directions for Product Managers (PM)
|
||||
|
||||
### A. Write the “Risk Verdict Attestation v1” product contract
|
||||
|
||||
**Deliverable:** A one-page contract + schema that product and customers can treat as an API.
|
||||
|
||||
Minimum fields the contract must standardize:
|
||||
|
||||
* **Subject binding:** exact OCI digest, repo/name, platform (if applicable)
|
||||
* **Verdict:** `PASS | FAIL | PASS_WITH_EXCEPTIONS | INDETERMINATE`
|
||||
* **Policy reference:** policy ID, policy digest, policy version, enforcement mode
|
||||
* **Knowledge snapshot reference:** snapshot ID + digest (see replay semantics below)
|
||||
* **Evidence references:** digests/pointers for SBOM, VEX inputs, vuln feed snapshot, reachability proof(s), config snapshot, and unknowns summary
|
||||
* **Reason codes:** stable machine-readable codes (`RISK.CVE.REACHABLE`, `RISK.VEX.NOT_AFFECTED`, `RISK.UNKNOWN.INPUT_MISSING`, etc.)
|
||||
* **Human explanation stub:** short rationale text plus links/IDs for deeper evidence
|
||||
|
||||
**Key PM rule:** the contract must be **stable and versioned**, with explicit deprecation rules. If you can’t maintain compatibility, ship a new version (v2), don’t silently mutate v1.
|
||||
|
||||
Why: OCI referrers create long-lived metadata chains. Breaking them is a customer trust failure.
|
||||
|
||||
### B. Define strict replay semantics as a product requirement (not “nice to have”)
|
||||
|
||||
PM must specify what “same inputs” means. At minimum, inputs include:
|
||||
|
||||
* artifact digest (subject)
|
||||
* policy bundle digest
|
||||
* vulnerability dataset snapshot digest(s)
|
||||
* VEX bundle digest(s)
|
||||
* SBOM digest(s) or SBOM generation recipe digest
|
||||
* scoring rules version/digest
|
||||
* engine version
|
||||
* reachability configuration version/digest (if enabled)
|
||||
|
||||
**Product acceptance criterion:**
|
||||
When a user re-runs evaluation in “replay mode” using the same knowledge snapshot and policy digest, the **verdict and reason codes must match** (byte-for-byte identical predicate is ideal; if not, the deterministic portion must match exactly).
|
||||
|
||||
OCI 1.1 and ORAS guidance also imply you should avoid shoving large evidence into annotations; store large evidence as blobs and reference it by digest. ([opencontainers.org][1])
|
||||
|
||||
### C. Make “auditor evidence extraction” a first-order user journey
|
||||
|
||||
Define the auditor journey as a separate persona:
|
||||
|
||||
* Auditor wants: “Prove why you blocked/allowed artifact X at time Y.”
|
||||
* They should be able to:
|
||||
|
||||
1. Verify the signature chain
|
||||
2. Extract the decision + evidence package
|
||||
3. Replay the evaluation
|
||||
4. Produce a human-readable report without bespoke consulting
|
||||
|
||||
**PM feature requirements (v1)**
|
||||
|
||||
* `explain` experience that outputs:
|
||||
|
||||
* decision summary
|
||||
* policy used
|
||||
* evidence references and hashes
|
||||
* top N reasons (with stable codes)
|
||||
* unknowns and assumptions
|
||||
* `export-audit-package` experience:
|
||||
|
||||
* exports a ZIP (or OCI bundle) containing the RVA, its referenced evidence artifacts, and a machine-readable manifest listing all digests
|
||||
* `verify` experience:
|
||||
|
||||
* verifies signature + policy expectations (who is trusted to sign; which predicate type(s) are acceptable)
|
||||
|
||||
Cosign explicitly supports creating/verifying in‑toto attestations (DSSE-signed) and even validating custom predicates against policy languages like Rego/CUE—this is a strong PM anchor for ecosystem interoperability. ([Sigstore][2])
|
||||
|
||||
---
|
||||
|
||||
## 3) Directions for Development Managers (Dev/Eng)
|
||||
|
||||
### A. Implement OCI attachment correctly (artifact, referrer, fallback)
|
||||
|
||||
**Engineering decisions:**
|
||||
|
||||
1. Store RVA as an OCI artifact manifest with:
|
||||
|
||||
* `artifactType` set to your verdict media type
|
||||
* `subject` pointing to the exact image/artifact digest being evaluated
|
||||
OCI 1.1 introduced these fields for associating metadata artifacts and retrieving them via the referrers API. ([opencontainers.org][1])
|
||||
2. Support discovery via:
|
||||
|
||||
* Referrers API (`GET /v2/<name>/referrers/<digest>`) when registry supports it
|
||||
* **Fallback “tagged index” strategy** for registries that don’t support referrers (OCI 1.1 guidance calls out a fallback tag approach and client responsibilities). ([opencontainers.org][1])
|
||||
|
||||
**Dev acceptance tests**
|
||||
|
||||
* Push subject image → push RVA artifact with `subject` → query referrers → RVA appears.
|
||||
* On a registry without referrers support: fallback retrieval still works.
|
||||
|
||||
### B. Use a standard attestation envelope and signing flow
|
||||
|
||||
For attestations, the lowest friction pathway is:
|
||||
|
||||
* in‑toto Statement + DSSE envelope
|
||||
* Sign/verify using cosign-compatible workflows (so customers can verify without you) ([Sigstore][2])
|
||||
|
||||
DSSE matters because it:

* authenticates message + type
* avoids canonicalization pitfalls
* supports arbitrary encodings ([GitHub][4])

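To make the envelope shape concrete, here is a minimal sketch of wrapping an in-toto statement in a DSSE envelope (Python). The `sign` function is a placeholder; real deployments would use cosign or another standard signer.

```python
import base64, json

def pre_auth_encoding(payload_type: str, payload: bytes) -> bytes:
    """DSSE PAE: the bytes that actually get signed (binds payload type + body)."""
    return b" ".join([b"DSSEv1",
                      str(len(payload_type)).encode(), payload_type.encode(),
                      str(len(payload)).encode(), payload])

def dsse_envelope(statement: dict, keyid: str, sign) -> dict:
    payload_type = "application/vnd.in-toto+json"
    payload = json.dumps(statement, sort_keys=True).encode()
    sig = sign(pre_auth_encoding(payload_type, payload))  # placeholder signer
    return {
        "payloadType": payload_type,
        "payload": base64.b64encode(payload).decode(),
        "signatures": [{"keyid": keyid, "sig": base64.b64encode(sig).decode()}],
    }

fake_sign = lambda message: b"demo-signature"  # stand-in, not a real signature
envelope = dsse_envelope(
    {"predicateType": "https://stellaops.dev/attestations/risk-verdict/v1"},
    "demo-key", fake_sign)
print(json.dumps(envelope, indent=2))
```
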
**Engineering rule:** the signed payload must include enough data to replay and audit (policy + knowledge snapshot digests), but avoid embedding huge evidence blobs directly.
|
||||
|
||||
### C. Build determinism into the evaluation core (not bolted on)

**“Same inputs → same verdict” is a software architecture constraint.**
It fails if any of these are non-deterministic:

* fetching “latest” vulnerability DB at runtime
* unstable iteration order (maps/hashes)
* timestamps included as decision inputs
* concurrency races changing aggregation order
* floating point scoring without canonical rounding

**Engineering requirements**

1. Create a **Knowledge Snapshot** object (content-addressed):

   * a manifest listing every dataset input by digest and version
2. The evaluation function becomes:

   * `Verdict = Evaluate(subject_digest, policy_digest, knowledge_snapshot_digest, engine_version, options_digest)`
3. The RVA must embed those digests so replay is possible offline.

**Dev acceptance tests**

* Run Evaluate twice with same snapshot/policy → verdict + reason codes identical.
* Run Evaluate with one dataset changed (snapshot digest differs) → RVA must reflect changed snapshot digest.

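A minimal sketch of the Evaluate contract and its replay test (Python). The digest values, the internal lookup, and the reason codes are illustrative placeholders.

```python
import hashlib, json

def evaluate(subject_digest: str, policy_digest: str,
             knowledge_snapshot_digest: str, engine_version: str,
             options_digest: str) -> dict:
    """Pure function of its five digests: no network, no clock, no global state."""
    reachable_criticals = 1 if knowledge_snapshot_digest.endswith("beef") else 0  # stand-in lookup
    verdict = "FAIL" if reachable_criticals else "PASS"
    payload = {
        "verdict": verdict,
        "reasonCodes": ["RISK.CVE.REACHABLE"] if reachable_criticals else [],
        "inputs": {
            "subject": subject_digest, "policy": policy_digest,
            "knowledgeSnapshot": knowledge_snapshot_digest,
            "engine": engine_version, "options": options_digest,
        },
    }
    payload["decisionDigest"] = hashlib.sha256(
        json.dumps(payload, sort_keys=True).encode()).hexdigest()
    return payload

args = ("sha256:aaaa", "sha256:bbbb", "sha256:beef", "1.3.0", "sha256:cccc")
assert evaluate(*args) == evaluate(*args)  # replay test: identical output every time
```
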
### D. Treat “evidence” as a graph of content-addressed artifacts
|
||||
|
||||
Implement evidence storage with these rules:
|
||||
|
||||
* Large evidence artifacts are stored as OCI blobs/artifacts (SBOM, VEX bundle, reachability proof graph, config snapshot).
|
||||
* RVA references evidence by digest and type.
|
||||
* “Explain” traverses this graph and renders:
|
||||
|
||||
* a machine-readable explanation JSON
|
||||
* a human-readable report
|
||||
|
||||
ORAS guidance highlights artifact typing via `artifactType` in OCI 1.1 and suggests keeping manifests manageable; don’t overload annotations. ([oras.land][5])
|
||||
|
||||
### E. Provide a verification and policy enforcement path
|
||||
|
||||
You want customers to be able to enforce “only run artifacts with an approved RVA predicate.”
|
||||
|
||||
Two practical patterns:
|
||||
|
||||
* **Cosign verification of attestations** (customers can do `verify-attestation` and validate predicate structure; cosign supports validating attestations with policy languages like Rego/CUE). ([Sigstore][2])
|
||||
* **Notation signatures** for organizations that standardize on Notary/Notation for OCI signing/verification workflows. ([Microsoft Learn][3])
|
||||
|
||||
Engineering should not hard-code one choice; implement an abstraction:
|
||||
|
||||
* signing backend: `cosign/DSSE` first
|
||||
* optional: notation signature over the RVA artifact for environments that require it
|
||||
|
||||
---
|
||||
|
||||
## 4) Minimal “v1” spec by example (what your teams should build)
|
||||
|
||||
### A. OCI artifact requirements (registry-facing)
|
||||
|
||||
* artifact is discoverable as a referrer via `subject` linkage and `artifactType` classification (OCI 1.1). ([opencontainers.org][1])
|
||||
|
||||
### B. Attestation payload structure (contract-facing)
|
||||
|
||||
In code terms (illustrative only), build on the in‑toto Statement model:
|
||||
|
||||
```json
{
  "_type": "https://in-toto.io/Statement/v0.1",
  "subject": [
    {
      "name": "oci://registry.example.com/team/app",
      "digest": { "sha256": "<SUBJECT_DIGEST>" }
    }
  ],
  "predicateType": "https://stellaops.dev/attestations/risk-verdict/v1",
  "predicate": {
    "verdict": "FAIL",
    "reasonCodes": ["RISK.CVE.REACHABLE", "RISK.POLICY.THRESHOLD_EXCEEDED"],
    "policy": { "id": "prod-gate", "digest": "sha256:<POLICY_DIGEST>" },
    "knowledgeSnapshot": { "id": "ks-2025-12-19", "digest": "sha256:<KS_DIGEST>" },
    "evidence": {
      "sbom": { "digest": "sha256:<SBOM_DIGEST>", "format": "cyclonedx-json" },
      "vexBundle": { "digest": "sha256:<VEX_DIGEST>", "format": "openvex" },
      "vulnData": { "digest": "sha256:<VULN_FEEDS_DIGEST>" },
      "reachability": { "digest": "sha256:<REACH_PROOF_DIGEST>" },
      "unknowns": { "count": 2, "digest": "sha256:<UNKNOWNS_DIGEST>" }
    },
    "engine": { "name": "stella-eval", "version": "1.3.0" }
  }
}
```
|
||||
|
||||
Cosign supports creating and verifying in‑toto attestations (DSSE-signed), which is exactly the interoperability you want for customer-side verification. ([Sigstore][2])
|
||||
|
||||
---
|
||||
|
||||
## 5) Definition of Done (use this to align PM/Eng and prevent scope drift)
|
||||
|
||||
### v1 must satisfy all of the following:
|
||||
|
||||
1. **OCI-attached:** RVA is stored as an OCI artifact referrer to the subject digest and discoverable (referrers API + fallback mode). ([opencontainers.org][1])
|
||||
2. **Signed:** RVA can be verified by a standard toolchain (cosign at minimum). ([Sigstore][2])
|
||||
3. **Replayable:** Given the embedded policy + knowledge snapshot digests, the evaluation can be replayed and produces the same verdict + reason codes.
|
||||
4. **Auditor extractable:** One command produces an audit package containing:
|
||||
|
||||
* RVA attestation
|
||||
* policy bundle
|
||||
* knowledge snapshot manifest
|
||||
* referenced evidence artifacts
|
||||
* an “explanation report” rendering the decision
|
||||
5. **Stable contract:** predicate schema is versioned and validated (strict JSON schema checks; backwards compatibility rules).
|
||||
---

## Outcome you are shipping
|
||||
|
||||
A deterministic “claim resolution” capability that takes:
|
||||
|
||||
* Multiple **claims** about the same vulnerability (vendor VEX, distro VEX, internal assessments, scanner inferences),
|
||||
* A **policy** describing trust and merge semantics,
|
||||
* A set of **evidence artifacts** (SBOM, config snapshots, reachability proofs, etc.),
|
||||
|
||||
…and produces a **single resolved status** per vulnerability/component/artifact **with an explainable trail**:
|
||||
|
||||
* Which claims applied and why
|
||||
* Which were rejected and why
|
||||
* What evidence was required and whether it was satisfied
|
||||
* What policy rules triggered the resolution outcome
|
||||
|
||||
This replaces naive precedence like `vendor > distro > internal`.
|
||||
|
||||
---
|
||||
|
||||
# Directions for Product Managers
|
||||
|
||||
## 1) Write the PRD around “claims resolution,” not “VEX support”
|
||||
|
||||
The customer outcome is not “we ingest VEX.” It is:
|
||||
|
||||
* “We can *safely* accept ‘not affected’ without hiding risk.”
|
||||
* “We can prove, to auditors and change control, why a CVE was downgraded.”
|
||||
* “We can consistently resolve conflicts between issuer statements.”
|
||||
|
||||
### Non-negotiable product properties
|
||||
|
||||
* **Deterministic**: same inputs → same resolved outcome
|
||||
* **Explainable**: a human can trace the decision path
|
||||
* **Guardrailed**: a “safe” resolution requires evidence, not just a statement
|
||||
|
||||
---
|
||||
|
||||
## 2) Define the core objects (these drive everything)
|
||||
|
||||
In the PRD, define these three objects explicitly:
|
||||
|
||||
### A) Claim (normalized)
|
||||
|
||||
A “claim” is any statement about vulnerability applicability to an artifact/component, regardless of source format.
|
||||
|
||||
Minimum fields:
|
||||
|
||||
* `vuln_id` (CVE/GHSA/etc.)
|
||||
* `subject` (component identity; ideally package + version + digest/purl)
|
||||
* `target` (the thing we’re evaluating: image, repo build, runtime instance)
|
||||
* `status` (affected / not_affected / fixed / under_investigation / unknown)
|
||||
* `justification` (human/machine reason)
|
||||
* `issuer` (who said it; plus verification state)
|
||||
* `scope` (what it applies to; versions, ranges, products)
|
||||
* `timestamp` (when produced)
|
||||
* `references` (links/IDs to evidence or external material)
|
||||
|
||||
### B) Evidence
|
||||
|
||||
A typed artifact that can satisfy a requirement.
|
||||
|
||||
Examples (not exhaustive):
|
||||
|
||||
* `config_snapshot` (e.g., Helm values, env var map, feature flag export)
|
||||
* `sbom_presence_or_absence` (SBOM proof that component is/isn’t present)
|
||||
* `reachability_proof` (call-path evidence from entrypoint to vulnerable symbol)
|
||||
* `symbol_absence` (binary inspection shows symbol/function not present)
|
||||
* `patch_presence` (artifact includes backport / fixed build)
|
||||
* `manual_attestation` (human-reviewed attestation with reviewer identity + scope)
|
||||
|
||||
Each evidence item must have:
|
||||
|
||||
* `type`
|
||||
* `collector` (tool/provider)
|
||||
* `inputs_hash` and `output_hash`
|
||||
* `scope` (what artifact/environment it applies to)
|
||||
* `confidence` (optional but recommended)
|
||||
* `expires_at` / `valid_for` (for config/runtime evidence)
|
||||
|
||||
### C) Policy
|
||||
|
||||
A policy describes:
|
||||
|
||||
* **Trust rules** (how much to trust whom, under which conditions)
|
||||
* **Merge semantics** (how to resolve conflicts)
|
||||
* **Evidence requirements** (what must be present to accept certain claims)
|
||||
|
||||
---
|
||||
|
||||
## 3) Ship “policy-controlled merge semantics” as a configuration schema first
|
||||
|
||||
Do not start with a fully general policy language. You need a small, explicit schema that makes behavior predictable.
|
||||
|
||||
PM deliverable: a policy spec with these sections:
|
||||
|
||||
1. **Issuer trust**
|
||||
|
||||
* weights by issuer category (vendor/distro/internal/scanner)
|
||||
* optional constraints (must be signed, must match product ownership, must be within time window)
|
||||
2. **Applicability rules**
|
||||
|
||||
* what constitutes a match to artifact/component (range semantics, digest match priority)
|
||||
3. **Evidence requirements**
|
||||
|
||||
* per status + per justification: what evidence types are required
|
||||
4. **Conflict resolution strategy**
|
||||
|
||||
* conservative vs weighted vs most-specific
|
||||
* explicit guardrails (never accept “safe” without evidence)
|
||||
5. **Override rules**
|
||||
|
||||
* when internal can override vendor (and what evidence is required to do so)
|
||||
* environment-specific policies (prod vs dev)
|
||||
|
||||
---
|
||||
|
||||
## 4) Make “evidence hooks” a first-class user workflow
|
||||
|
||||
You are explicitly shipping the ability to say:
|
||||
|
||||
> “This is not affected **because** feature flag X is off.”
|
||||
|
||||
That requires:
|
||||
|
||||
* a way to **provide or discover** feature flag state, and
|
||||
* a way to **bind** that flag to the vulnerable surface
|
||||
|
||||
PM must specify: what does the user do to assert that?
|
||||
|
||||
Minimum viable workflow:
|
||||
|
||||
* User attaches a `config_snapshot` (or system captures it)
|
||||
* User provides a “binding” to the vulnerable module/function:
|
||||
|
||||
* either automatic (later) or manual (first release)
|
||||
* e.g., `flag X gates module Y` with references (file path, code reference, runbook)
|
||||
|
||||
This “binding” itself becomes evidence.
|
||||
|
||||
---
|
||||
|
||||
## 5) Define acceptance criteria as decision trace tests
|
||||
|
||||
PM should write acceptance criteria as “given claims + policy + evidence → resolved outcome + trace”.
|
||||
|
||||
You need at least these canonical tests:
|
||||
|
||||
1. **Distro backport vs vendor version logic conflict**
|
||||
|
||||
* Vendor says affected (by version range)
|
||||
* Distro says fixed (backport)
|
||||
* Policy says: in distro context, distro claim can override vendor if patch evidence exists
|
||||
* Outcome: fixed, with trace proving why
|
||||
|
||||
2. **Internal ‘feature flag off’ downgrade**
|
||||
|
||||
* Vendor says affected
|
||||
* Internal says not_affected because flag off
|
||||
* Evidence: config snapshot + flag→module binding
|
||||
* Outcome: not_affected **only for that environment context**, with trace
|
||||
|
||||
3. **Evidence missing**
|
||||
|
||||
* Internal says not_affected because “code not reachable”
|
||||
* No reachability evidence present
|
||||
* Outcome: unknown or affected (policy-dependent), but **not “not_affected”**
|
||||
|
||||
4. **Conflicting “safe” claims**
|
||||
|
||||
* Vendor says not_affected (reason A)
|
||||
* Internal says affected (reason B) with strong evidence
|
||||
* Outcome follows merge strategy, and trace must show why.
|
||||
|
||||
---
|
||||
|
||||
## 6) Package it as an “Explainable Resolution” feature
|
||||
|
||||
UI/UX requirements PM must specify:
|
||||
|
||||
* A “Resolved Status” view per vuln/component showing:
|
||||
|
||||
* contributing claims (ranked)
|
||||
* rejected claims (with reason)
|
||||
* evidence required vs evidence present
|
||||
* the policy clauses triggered (line-level references)
|
||||
* A policy editor can be CLI/JSON first; UI later, but explainability cannot wait.
|
||||
|
||||
---
|
||||
|
||||
# Directions for Development Managers
|
||||
|
||||
## 1) Implement as three services/modules with strict interfaces
|
||||
|
||||
### Module A: Claim Normalization
|
||||
|
||||
* Inputs: OpenVEX / CycloneDX VEX / CSAF / internal annotations / scanner hints
|
||||
* Output: canonical `Claim` objects
|
||||
|
||||
Rules:
|
||||
|
||||
* Canonicalize IDs (normalize CVE formats, normalize package coordinates)
|
||||
* Preserve provenance: issuer identity, signature metadata, timestamps, original document hash
|
||||
|
||||
### Module B: Evidence Providers (plugin boundary)

* Provide an interface like:

```
evaluate_evidence(context, claim) -> EvidenceEvaluation
```

Where `EvidenceEvaluation` returns:

* required evidence types for this claim (from policy)
* found evidence items (from store/providers)
* satisfied / not satisfied
* explanation strings
* confidence

Start with 3 providers:

1. SBOM provider (presence/absence)
2. Config provider (feature flags/config snapshot ingestion)
3. Reachability provider (even if initially limited or stubbed, it must exist as a typed hook)

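A minimal sketch of that provider contract as typed code (Python). The field names follow the list above; everything else is an assumption.

```python
from dataclasses import dataclass, field
from typing import Protocol

@dataclass
class EvidenceEvaluation:
    required_types: list[str]                        # from policy
    found: list[str] = field(default_factory=list)   # evidence item hashes
    satisfied: bool = False
    explanations: list[str] = field(default_factory=list)
    confidence: float = 0.0

class EvidenceProvider(Protocol):
    def evaluate_evidence(self, context: dict, claim: dict) -> EvidenceEvaluation: ...

class SbomProvider:
    """Presence/absence provider: satisfied when the claimed component is absent from the SBOM."""
    def __init__(self, sbom_purls: set[str]):
        self.sbom_purls = sbom_purls

    def evaluate_evidence(self, context: dict, claim: dict) -> EvidenceEvaluation:
        absent = claim["subject"] not in self.sbom_purls
        return EvidenceEvaluation(
            required_types=["sbom_absence"],
            found=["sbom_absence"] if absent else [],
            satisfied=absent,
            explanations=[f"{claim['subject']} {'absent from' if absent else 'present in'} SBOM"],
            confidence=1.0 if absent else 0.0,
        )
```
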
### Module C: Merge & Resolution Engine
|
||||
|
||||
* Inputs: set of claims + policy + evidence evaluations + context
|
||||
* Output: `ResolvedDecision`
|
||||
|
||||
A `ResolvedDecision` must include:
|
||||
|
||||
* final status
|
||||
* selected “winning” claim(s)
|
||||
* all considered claims
|
||||
* evidence satisfaction summary
|
||||
* applied policy rule IDs
|
||||
* deterministic ordering keys/hashes
|
||||
|
||||
---
|
||||
|
||||
## 2) Define the evaluation context (this avoids foot-guns)
|
||||
|
||||
The resolved outcome must be context-aware.
|
||||
|
||||
Create an immutable `EvaluationContext` object, containing:
|
||||
|
||||
* artifact identity (image digest / build digest / SBOM hash)
|
||||
* environment identity (prod/stage/dev; cluster; region)
|
||||
* config snapshot ID
|
||||
* time (evaluation timestamp)
|
||||
* policy version hash
|
||||
|
||||
This is how you support: “not affected because feature flag off” in prod but not in dev.
|
||||
|
||||
---
|
||||
|
||||
## 3) Merge semantics: implement scoring + guardrails, not precedence
|
||||
|
||||
You need a deterministic function. One workable approach:
|
||||
|
||||
### Step 1: compute statement strength
|
||||
|
||||
For each claim:
|
||||
|
||||
* `trust_weight` from policy (issuer + scope + signature requirements)
|
||||
* `evidence_factor` (1.0 if requirements satisfied; <1 or 0 if not)
|
||||
* `specificity_factor` (exact digest match > exact version > range)
|
||||
* `freshness_factor` (optional; policy-defined)
|
||||
* `applicability` must be true or claim is excluded
|
||||
|
||||
Compute:
|
||||
|
||||
```
support = trust_weight * evidence_factor * specificity_factor * freshness_factor
```
|
||||
|
||||
### Step 2: apply merge strategy (policy-controlled)
|
||||
|
||||
Ship at least two strategies:
|
||||
|
||||
1. **Conservative default**
|
||||
|
||||
* If any “unsafe” claim (affected/under_investigation) has support above threshold, it wins
|
||||
* A “safe” claim (not_affected/fixed) can override only if:
|
||||
|
||||
* it has equal/higher support + delta, AND
|
||||
* its evidence requirements are satisfied
|
||||
|
||||
2. **Evidence-weighted**
|
||||
|
||||
* Highest support wins, but safe statuses have a hard evidence gate
|
||||
|
||||
### Step 3: apply guardrails

Hard guardrail to prevent bad outcomes:

* **Never emit a safe status unless evidence requirements for that safe claim are satisfied.**
* If a safe claim lacks evidence, downgrade the safe claim to “unsupported” and do not allow it to win.

This single rule is what makes your system materially different from “VEX as suppression.”

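Putting Steps 1 through 3 together, a minimal sketch of the conservative strategy with the guardrail (Python). The thresholds mirror the example policy below, and the support values match the example output trace; the rest is an assumption.

```python
UNSAFE = {"affected", "under_investigation"}
SAFE = {"not_affected", "fixed"}
UNSAFE_WINS_THRESHOLD = 50   # mirrors merge.unsafe_wins_threshold in the example policy
SAFE_OVERRIDE_DELTA = 10     # mirrors merge.safe_override_delta

def resolve(claims: list[dict]) -> str:
    """claims: [{'status': ..., 'support': ..., 'evidence_satisfied': ...}, ...]"""
    best_unsafe = max((c["support"] for c in claims if c["status"] in UNSAFE), default=0)
    # Guardrail: safe claims without satisfied evidence are "unsupported" and cannot win.
    safe_candidates = [c for c in claims
                       if c["status"] in SAFE and c["evidence_satisfied"]]
    best_safe = max(safe_candidates, key=lambda c: c["support"], default=None)
    if best_safe and best_safe["support"] >= best_unsafe + SAFE_OVERRIDE_DELTA:
        return best_safe["status"]
    if best_unsafe >= UNSAFE_WINS_THRESHOLD:
        return "affected"
    return "unknown"

print(resolve([
    {"status": "affected", "support": 62, "evidence_satisfied": True},      # vendor
    {"status": "not_affected", "support": 78, "evidence_satisfied": True},  # internal, flag off
]))  # -> not_affected
```
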
---
|
||||
|
||||
## 4) Evidence hooks: treat them as typed contracts, not strings
|
||||
|
||||
For “feature flag off,” implement it as a structured evidence requirement.
|
||||
|
||||
Example evidence requirement for a “safe because feature flag off” claim:
|
||||
|
||||
* Required evidence types:
|
||||
|
||||
* `config_snapshot`
|
||||
* `flag_binding` (the mapping “flag X gates vulnerable surface Y”)
|
||||
|
||||
Implementation:
|
||||
|
||||
* Config provider can parse:
|
||||
|
||||
* Helm values / env var sets / feature flag exports
|
||||
* Store them as normalized key/value with hashes
|
||||
* Binding evidence can start as manual JSON that references:
|
||||
|
||||
* repo path / module / function group
|
||||
* a link to code ownership / runbook
|
||||
* optional test evidence
|
||||
|
||||
Later you can automate binding via static analysis, but do not block shipping on that.
|
||||
|
||||
---
|
||||
|
||||
## 5) Determinism requirements (engineering non-negotiables)
|
||||
|
||||
Development manager should enforce:
|
||||
|
||||
* stable sorting of claims by canonical key
|
||||
* stable tie-breakers (e.g., issuer ID, timestamp, claim hash)
|
||||
* no nondeterministic external calls during evaluation (or they must be snapshot-based)
|
||||
* every evaluation produces:
|
||||
|
||||
* `input_bundle_hash` (claims + evidence + policy + context)
|
||||
* `decision_hash`
|
||||
|
||||
This is the foundation for replayability and audits.
|
||||
|
||||
---
|
||||
|
||||
## 6) Storage model: store raw inputs and canonical forms
|
||||
|
||||
Minimum stores:
|
||||
|
||||
* Raw documents (original VEX/CSAF/etc.) keyed by content hash
|
||||
* Canonical claims keyed by claim hash
|
||||
* Evidence items keyed by evidence hash and scoped by context
|
||||
* Policy versions keyed by policy hash
|
||||
* Resolutions keyed by (context, vuln_id, subject) with decision hash
|
||||
|
||||
---
|
||||
|
||||
## 7) “Definition of done” checklist for engineering
|
||||
|
||||
You are done when:
|
||||
|
||||
1. You can ingest at least two formats into canonical claims (pick OpenVEX + CycloneDX VEX first).
|
||||
2. You can configure issuer trust and evidence requirements in a policy file.
|
||||
3. You can resolve conflicts deterministically.
|
||||
4. You can attach a config snapshot and produce:
|
||||
|
||||
* `not_affected because feature flag off` **only when evidence satisfied**
|
||||
5. The system produces a decision trace with:
|
||||
|
||||
* applied policy rules
|
||||
* evidence satisfaction
|
||||
* selected/rejected claims and reasons
|
||||
6. Golden test vectors exist for the acceptance scenarios listed above.
|
||||
|
||||
---
|
||||
|
||||
# A concrete example policy (schema-first, no full DSL required)
|
||||
|
||||
```yaml
version: 1

trust:
  issuers:
    - match: {category: vendor}
      weight: 70
      require_signature: true
    - match: {category: distro}
      weight: 75
      require_signature: true
    - match: {category: internal}
      weight: 85
      require_signature: false
    - match: {category: scanner}
      weight: 40

evidence_requirements:
  safe_status_requires_evidence: true

  rules:
    - when:
        status: not_affected
        reason: feature_flag_off
      require: [config_snapshot, flag_binding]

    - when:
        status: not_affected
        reason: component_not_present
      require: [sbom_absence]

    - when:
        status: not_affected
        reason: not_reachable
      require: [reachability_proof]

merge:
  strategy: conservative
  unsafe_wins_threshold: 50
  safe_override_delta: 10
```
|
||||
|
||||
---
|
||||
|
||||
# A concrete example output trace (what auditors and engineers must see)
|
||||
|
||||
```json
{
  "vuln_id": "CVE-XXXX-YYYY",
  "subject": "pkg:maven/org.example/foo@1.2.3",
  "context": {
    "artifact_digest": "sha256:...",
    "environment": "prod",
    "policy_hash": "sha256:..."
  },
  "resolved_status": "not_affected",
  "because": [
    {
      "winning_claim": "claim_hash_abc",
      "reason": "feature_flag_off",
      "evidence_required": ["config_snapshot", "flag_binding"],
      "evidence_present": ["ev_hash_1", "ev_hash_2"],
      "policy_rules_applied": ["trust.issuers[internal]", "evidence.rules[0]", "merge.safe_override_delta"]
    }
  ],
  "claims_considered": [
    {"issuer": "vendor", "status": "affected", "support": 62, "accepted": false, "rejection_reason": "overridden_by_higher_support_safe_claim_with_satisfied_evidence"},
    {"issuer": "internal", "status": "not_affected", "support": 78, "accepted": true, "evidence_satisfied": true}
  ],
  "decision_hash": "sha256:..."
}
```
|
||||
|
||||
---
|
||||
|
||||
## The two strategic pitfalls to explicitly avoid
|
||||
|
||||
1. **“Trust precedence” as the merge mechanism**
|
||||
|
||||
* It will fail immediately on backports, forks, downstream patches, and environment-specific mitigations.
|
||||
2. **Allowing “safe” without evidence**
|
||||
|
||||
* That turns VEX into a suppression system and will collapse trust in the product.
|
||||
|
||||
## Executive directive
|
||||
|
||||
Build **Reachability as Evidence**, not as a UI feature.
|
||||
|
||||
Every reachability conclusion must produce a **portable, signed, replayable evidence bundle** that answers:
|
||||
|
||||
1. **What vulnerable code unit is being discussed?** (symbol/method/function + version)
|
||||
2. **What entrypoint is assumed?** (HTTP handler, RPC method, CLI, scheduled job, etc.)
|
||||
3. **What is the witness?** (a call-path subgraph, not a screenshot)
|
||||
4. **What assumptions/gates apply?** (config flags, feature toggles, runtime wiring)
|
||||
5. **Can a third party reproduce it?** (same inputs → same evidence hash)
|
||||
|
||||
This must work for **source** and **post-build artifacts**.
|
||||
|
||||
---
|
||||
|
||||
# Directions for Product Managers
|
||||
|
||||
## 1) Define the product contract in one page
|
||||
|
||||
### Capability name
|
||||
**Proof‑carrying reachability**.
|
||||
|
||||
### Contract
|
||||
Given an artifact (source or built) and a vulnerability mapping, Stella Ops outputs:
|
||||
|
||||
- **Reachability verdict:** `REACHABLE | NOT_PROVEN_REACHABLE | INCONCLUSIVE`
|
||||
- **Witness evidence:** a minimal **reachability subgraph** + one or more witness paths
|
||||
- **Reproducibility bundle:** all inputs and toolchain metadata needed to replay
|
||||
- **Attestation:** signed statement tied to the artifact digest
|
||||
|
||||
### Important language choice
|
||||
Avoid claiming “unreachable” unless you can prove non-reachability under a formally sound model.
|
||||
|
||||
- Use **NOT_PROVEN_REACHABLE** for “no path found under current analysis + assumptions.”
|
||||
- Use **INCONCLUSIVE** when analysis cannot be performed reliably (missing symbols, obfuscation, unsupported language, dynamic dispatch uncertainty, etc.).
|
||||
|
||||
This is essential for credibility and audit use.
|
||||
|
||||
---
|
||||
|
||||
## 2) Anchor personas and top workflows
|
||||
|
||||
### Primary personas
|
||||
- Security governance / AppSec: wants fewer false positives and defensible prioritization.
|
||||
- Compliance/audit: wants evidence and replayability.
|
||||
- Engineering teams: want specific call paths and concrete guidance on what to change.
|
||||
|
||||
### Top workflows (must support in MVP)
|
||||
1. **CI gate with signed verdict**
|
||||
- “Block release if any high-severity `REACHABLE` finding is present OR if the `INCONCLUSIVE` count exceeds a threshold.”
|
||||
2. **Audit replay**
|
||||
- “Reproduce the reachability proof for artifact digest X using snapshot Y.”
|
||||
3. **Release delta**
|
||||
- “Show what reachability changed between release A and B.”
|
||||
|
||||
---
|
||||
|
||||
## 3) Minimum viable scope: pick targets that make “post-build” real early
|
||||
|
||||
To satisfy “source and post-build artifacts” without biting off ELF-level complexity first:
|
||||
|
||||
### MVP artifact types (recommended)
|
||||
- **Source repository** for 1–2 languages with mature static IR
|
||||
- **Post-build intermediate artifacts** that retain symbol structure:
|
||||
- Java `.jar/.class`
|
||||
- .NET assemblies
|
||||
- Python wheels (bytecode)
|
||||
- Node bundles with sourcemaps (optional)
|
||||
|
||||
These give you “post-build” support where call graphs are tractable.
|
||||
|
||||
### Defer for later phases
|
||||
- Native ELF/Mach-O deep reachability (harder due to stripping, inlining, indirect calls, dynamic loading)
|
||||
- Highly dynamic languages without strong type info, unless you accept “witness-only” semantics
|
||||
|
||||
Your differentiator is proof portability and determinism, not “supports every binary on day one.”
|
||||
|
||||
---
|
||||
|
||||
## 4) Product requirements: what “proof-carrying” means in requirements language
|
||||
|
||||
### Functional requirements
|
||||
- Output must include a **reachability subgraph**:
|
||||
- Nodes = code units (function/method) with stable IDs
|
||||
- Edges = call or dispatch edges with type annotations
|
||||
- Must include at least one **witness path** from entrypoint to vulnerable node when `REACHABLE`
|
||||
- Output must be **artifact-tied**:
|
||||
- Evidence must reference artifact digest(s) (source commit, build artifact digest, container image digest)
|
||||
- Output must be **attestable**:
|
||||
- Produce a signed attestation (DSSE/in-toto style) attached to the artifact digest
|
||||
- Output must be **replayable**:
|
||||
- Provide a “replay recipe” (analyzer versions, configs, vulnerability mapping version, and input digests)
|
||||
|
||||
### Non-functional requirements
|
||||
- Deterministic: repeated runs on same inputs produce identical evidence hash
|
||||
- Size-bounded: subgraph evidence must be bounded (e.g., path-based extraction + limited context)
|
||||
- Privacy-controllable:
|
||||
- Support a mode that avoids embedding raw source content (store pointers/hashes instead)
|
||||
- Verifiable offline:
|
||||
- Verification and replay must work air-gapped given the snapshot bundle
|
||||
|
||||
---
|
||||
|
||||
## 5) Acceptance criteria (use as Definition of Done)
|
||||
|
||||
A feature is “done” only when:
|
||||
|
||||
1. **Verifier can validate** the attestation signature and confirm the evidence hash matches content.
|
||||
2. A second machine can **reproduce the same evidence hash** given the replay bundle.
|
||||
3. Evidence includes at least one witness path for `REACHABLE`.
|
||||
4. Evidence includes explicit assumptions/gates; absence of gating is recorded as an assumption (e.g., “config unknown”).
|
||||
5. Evidence is **linked to the precise artifact digest** being deployed/scanned.
|
||||
|
||||
---
|
||||
|
||||
## 6) Product packaging decisions that create switching cost
|
||||
|
||||
These are product decisions that turn engineering into moat:
|
||||
|
||||
- **Make “reachability proof” an exportable object**, not just a UI view.
|
||||
- Provide an API: `GET /findings/{id}/proof` returning canonical evidence.
|
||||
- Support policy gates on:
|
||||
- `verdict`
|
||||
- `confidence`
|
||||
- `assumption_count`
|
||||
- `inconclusive_reasons`
|
||||
- Make “proof replay” a one-command workflow in CLI.
|
||||
|
||||
---
|
||||
|
||||
# Directions for Development Managers
|
||||
|
||||
## 1) Architecture: build a “proof pipeline” with strict boundaries
|
||||
|
||||
Implement as composable modules with stable interfaces:
|
||||
|
||||
1. **Artifact Resolver**
|
||||
- Inputs: repo URL/commit, build artifact path, container image digest
|
||||
- Output: normalized “artifact record” with digests and metadata
|
||||
|
||||
2. **Graph Builder (language-specific adapters)**
|
||||
- Inputs: artifact record
|
||||
- Output: canonical **Program Graph**
|
||||
- Nodes: code units
|
||||
- Edges: calls/dispatch
|
||||
- Optional: config gates, dependency edges
|
||||
|
||||
3. **Vulnerability-to-Code Mapper**
|
||||
- Inputs: vulnerability record (CVE), package coordinates, symbol metadata (if available)
|
||||
- Output: vulnerable node set + mapping confidence
|
||||
|
||||
4. **Entrypoint Modeler**
|
||||
- Inputs: artifact + runtime context (framework detection, routing tables, main methods)
|
||||
- Output: entrypoint node set with types (HTTP, RPC, CLI, cron)
|
||||
|
||||
5. **Reachability Engine**
|
||||
- Inputs: graph + entrypoints + vulnerable nodes + constraints
|
||||
- Output: witness paths + minimal subgraph extraction
|
||||
|
||||
6. **Evidence Canonicalizer**
|
||||
- Inputs: witness paths + subgraph + metadata
|
||||
- Output: canonical JSON (stable ordering, stable IDs), plus content hash
|
||||
|
||||
7. **Attestor**
|
||||
- Inputs: evidence hash + artifact digest
|
||||
- Output: signed attestation object (OCI attachable)
|
||||
|
||||
8. **Verifier (separate component)**
|
||||
- Must validate signatures + evidence integrity independently of generator
|
||||
|
||||
Critical: generator and verifier must be decoupled to preserve trust.
|
||||
|
||||
---
|
||||
|
||||
## 2) Evidence model: what to store (and how to keep it stable)
|
||||
|
||||
### Node identity must be stable across runs
|
||||
Define a canonical NodeID scheme:
|
||||
|
||||
- Source node ID:
|
||||
- `{language}:{repo_digest}:{symbol_signature}:{optional_source_location_hash}`
|
||||
- Post-build node ID:
|
||||
- `{language}:{artifact_digest}:{symbol_signature}:{optional_offset_or_token}`
|
||||
|
||||
Avoid raw file paths or non-deterministic compiler offsets as primary IDs unless normalized.
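A sketch of NodeID construction following this scheme (the hash truncation and separator choices are assumptions):

```python
import hashlib
from typing import Optional

def _short_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]

def source_node_id(language: str, repo_digest: str, symbol_signature: str,
                   source_location: Optional[str] = None) -> str:
    """Canonical NodeID for a source-level code unit."""
    loc = _short_hash(source_location) if source_location else ""
    return f"{language}:{repo_digest}:{symbol_signature}:{loc}"

def postbuild_node_id(language: str, artifact_digest: str, symbol_signature: str,
                      offset_or_token: str = "") -> str:
    """Canonical NodeID for a post-build code unit (bytecode/assembly symbol)."""
    return f"{language}:{artifact_digest}:{symbol_signature}:{offset_or_token}"

print(source_node_id("java", "sha256:repo...", "com.example.Foo#parse(String)"))
```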
|
||||
|
||||
### Edge identity
|
||||
`{caller_node_id} -> {callee_node_id} : {edge_type}`
|
||||
Edge types matter (direct call, virtual dispatch, reflection, dynamic import, etc.).
|
||||
|
||||
### Subgraph extraction rule
|
||||
Store:
|
||||
- All nodes/edges on at least one witness path (or k witness paths)
|
||||
- Plus bounded context:
|
||||
- 1–2 hop neighborhood around the vulnerable node and entrypoint
|
||||
- routing edges (HTTP route → handler) where applicable
|
||||
|
||||
This makes the proof compact and audit-friendly.
|
||||
|
||||
### Canonicalization requirements
|
||||
- Stable sorting of nodes and edges
|
||||
- Canonical JSON serialization (no map-order nondeterminism)
|
||||
- Explicit analyzer version + config included in evidence
|
||||
- Hash everything that influences results
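A sketch of canonicalization that satisfies these requirements in one place: stable sorting, canonical JSON, and a content hash over everything that influences results (field names are illustrative).

```python
import hashlib
import json

def canonicalize_evidence(nodes: list, edges: list, tooling: dict) -> tuple:
    """Return (canonical_json, evidence_hash) with deterministic ordering."""
    doc = {
        "schema_version": "1",
        "nodes": sorted(nodes, key=lambda n: n["id"]),
        "edges": sorted(edges, key=lambda e: (e["caller"], e["callee"], e["type"])),
        "tooling": tooling,  # analyzer name/version/digest, config hash, mapping dataset hash
    }
    blob = json.dumps(doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return blob, "sha256:" + hashlib.sha256(blob.encode("utf-8")).hexdigest()
```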
|
||||
|
||||
---
|
||||
|
||||
## 3) Determinism and reproducibility: engineering guardrails
|
||||
|
||||
### Deterministic computation
|
||||
- Avoid parallel graph traversal that yields nondeterministic order without canonical sorting
|
||||
- If using concurrency, collect results and sort deterministically before emitting
|
||||
|
||||
### Repro bundle (“time travel”)
|
||||
Persist, as digests:
|
||||
- Analyzer container/image digest
|
||||
- Analyzer config hash
|
||||
- Vulnerability mapping dataset version hash
|
||||
- Artifact digest(s)
|
||||
- Graph builder version hash
|
||||
|
||||
A replay must be possible without “calling home.”
|
||||
|
||||
### Golden tests
|
||||
Create fixtures where:
|
||||
- Same input graph + mapping → exact evidence hash
|
||||
- Regression test for canonicalization changes (version the schema intentionally)
|
||||
|
||||
---
|
||||
|
||||
## 4) Attestation format and verification
|
||||
|
||||
### Attestation contents (minimum)
|
||||
- Subject: artifact digest (image digest / build artifact digest)
|
||||
- Predicate: reachability evidence hash + metadata
|
||||
- Predicate type: `reachability` (custom) with versioning
|
||||
|
||||
### Verification requirements
|
||||
- Verification must run offline
|
||||
- It must validate:
|
||||
1) signature
|
||||
2) subject digest binding
|
||||
3) evidence hash matches serialized evidence
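A sketch of the verifier's three checks; signature verification is stubbed out here because the signing scheme (DSSE/in-toto, key management) is decided separately.

```python
import hashlib

def verify_attestation(attestation: dict, evidence_json: str, expected_subject: str,
                       signature_ok: bool) -> bool:
    """Offline checks: (1) signature, (2) subject digest binding, (3) evidence hash."""
    if not signature_ok:                                # 1) supplied by the crypto layer
        return False
    if attestation["subject"] != expected_subject:      # 2) bound to the right artifact digest
        return False
    digest = "sha256:" + hashlib.sha256(evidence_json.encode("utf-8")).hexdigest()
    return attestation["predicate"]["evidence_hash"] == digest  # 3) content integrity
```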
|
||||
|
||||
### Storage model
|
||||
Use content-addressable storage keyed by evidence hash.
|
||||
Attestation references the hash; evidence stored separately or embedded (size tradeoff).
|
||||
|
||||
---
|
||||
|
||||
## 5) Source + post-build support: engineering plan
|
||||
|
||||
### Unifying principle
|
||||
Both sources produce the same canonical Program Graph abstraction.
|
||||
|
||||
#### Source analyzers produce:
|
||||
- Function/method nodes using language signatures
|
||||
- Edges from static analysis IR
|
||||
|
||||
#### Post-build analyzers produce:
|
||||
- Nodes from bytecode/assembly symbol tables (where available)
|
||||
- Edges from bytecode call instructions / metadata
|
||||
|
||||
### Practical sequencing (recommended)
|
||||
1. Implement one source language adapter (fastest to prove model)
|
||||
2. Implement one post-build adapter where symbols are rich (e.g., Java bytecode)
|
||||
3. Ensure the evidence schema and attestation workflow work identically for both
|
||||
4. Expand to more ecosystems once the proof pipeline is stable
|
||||
|
||||
---
|
||||
|
||||
## 6) Operational constraints (performance, size, security)
|
||||
|
||||
### Performance
|
||||
- Cache program graphs per artifact digest
|
||||
- Cache vulnerability-to-code mapping per package/version
|
||||
- Compute reachability on-demand per vulnerability, but reuse graphs
|
||||
|
||||
### Evidence size
|
||||
- Limit witness paths (e.g., up to N shortest paths)
|
||||
- Prefer “witness + bounded neighborhood” over exporting full call graph
|
||||
|
||||
### Security and privacy
|
||||
- Provide a “redacted proof mode”
|
||||
- include symbol hashes instead of raw names if needed
|
||||
- store source locations as hashes/pointers
|
||||
- Never embed raw source code unless explicitly enabled
|
||||
|
||||
---
|
||||
|
||||
## 7) Definition of Done for the engineering team
|
||||
|
||||
A milestone is complete when you can demonstrate:
|
||||
|
||||
1. Generate a reachability proof for a known vulnerable code unit with a witness path.
|
||||
2. Serialize a canonical evidence subgraph and compute a stable hash.
|
||||
3. Sign the attestation bound to the artifact digest.
|
||||
4. Verify the attestation on a clean machine (offline).
|
||||
5. Replay the analysis from the replay bundle and reproduce the same evidence hash.
|
||||
|
||||
---
|
||||
|
||||
# Concrete artifact example (for alignment)
|
||||
|
||||
A reachability evidence object should look structurally like:
|
||||
|
||||
- `subject`: artifact digest(s)
|
||||
- `claim`:
|
||||
- `verdict`: REACHABLE / NOT_PROVEN_REACHABLE / INCONCLUSIVE
|
||||
- `entrypoints`: list of NodeIDs
|
||||
- `vulnerable_nodes`: list of NodeIDs
|
||||
- `witness_paths`: list of paths (each path = ordered NodeIDs)
|
||||
- `subgraph`:
|
||||
- `nodes`: list with stable IDs + metadata
|
||||
- `edges`: list with stable ordering + edge types
|
||||
- `assumptions`:
|
||||
- gating conditions, unresolved dynamic dispatch notes, etc.
|
||||
- `tooling`:
|
||||
- analyzer name/version/digest
|
||||
- config hash
|
||||
- mapping dataset hash
|
||||
- `hashes`:
|
||||
- evidence content hash
|
||||
- schema version
|
||||
|
||||
Then wrap and sign it as an attestation tied to the artifact digest.
|
||||
|
||||
---
|
||||
|
||||
## The one decision you should force early
|
||||
|
||||
Decide (and document) whether your semantics are:
|
||||
|
||||
- **Witness-based** (“REACHABLE only if we can produce a witness path”), and
|
||||
- **Conservative on negative claims** (“NOT_PROVEN_REACHABLE” is not “unreachable”).
|
||||
|
||||
This single decision will keep the system honest, reduce legal/audit risk, and prevent the product from drifting into hand-wavy “trust us” scoring.
|
||||
|
||||
## 1) Product direction: make “Unknowns” a first-class risk primitive
|
||||
|
||||
### Non‑negotiable product principles
|
||||
|
||||
1. **Unknowns are not suppressed findings**
|
||||
|
||||
* They are a distinct state with distinct governance.
|
||||
2. **Unknowns must be policy-addressable**
|
||||
|
||||
* If policy cannot block or allow them explicitly, the feature is incomplete.
|
||||
3. **Unknowns must be attested**
|
||||
|
||||
* Every signed decision must carry “what we don’t know” in a machine-readable way.
|
||||
4. **Unknowns must be default-on**
|
||||
|
||||
* Users may adjust thresholds, but they must not be able to “turn off unknown tracking.”
|
||||
|
||||
### Definition: what counts as an “unknown”
|
||||
|
||||
PMs must ensure that “unknown” is not vague. Define **reason-coded unknowns**, for example:
|
||||
|
||||
* **U-RCH**: Reachability unknown (call path indeterminate)
|
||||
* **U-ID**: Component identity unknown (ambiguous package / missing digest / unresolved PURL)
|
||||
* **U-PROV**: Provenance unknown (cannot map binary → source/build)
|
||||
* **U-VEX**: VEX conflict or missing applicability statement
|
||||
* **U-FEED**: Knowledge source missing (offline feed gaps, mirror stale)
|
||||
* **U-CONFIG**: Config/runtime gate unknown (feature flag not observable)
|
||||
* **U-ANALYZER**: Analyzer limitation (language/framework unsupported)
|
||||
|
||||
Each unknown must have:
|
||||
|
||||
* `reason_code` (one of a stable enum)
|
||||
* `scope` (component, binary, symbol, package, image, repo)
|
||||
* `evidence_refs` (what we inspected)
|
||||
* `assumptions` (what would need to be true/false)
|
||||
* `remediation_hint` (how to reduce unknown)
|
||||
|
||||
**Acceptance criterion:** every unknown surfaced to users can be traced to a reason code and remediation hint.
|
||||
|
||||
---
|
||||
|
||||
## 2) Policy direction: “unknown budgets” must be enforceable and environment-aware
|
||||
|
||||
### Policy model requirements
|
||||
|
||||
Policy must support:
|
||||
|
||||
* Thresholds by environment (dev/test/stage/prod)
|
||||
* Thresholds by unknown type (reachability vs provenance vs feed, etc.)
|
||||
* Severity weighting (e.g., unknown on internet-facing service is worse)
|
||||
* Exception workflow (time-bound, owner-bound)
|
||||
* Deterministic evaluation (same inputs → same result)
|
||||
|
||||
### Recommended default policy posture (ship as opinionated defaults)
|
||||
|
||||
These defaults are intentionally strict in prod:
|
||||
|
||||
**Prod (default)**
|
||||
|
||||
* `unknown_reachable == 0` (fail build/deploy)
|
||||
* `unknown_provenance == 0` (fail)
|
||||
* `unknown_total <= 3` (fail if exceeded)
|
||||
* `unknown_feed == 0` (fail; “we didn’t have data” is unacceptable for prod)
|
||||
|
||||
**Stage**
|
||||
|
||||
* `unknown_reachable <= 1`
|
||||
* `unknown_provenance <= 1`
|
||||
* `unknown_total <= 10`
|
||||
|
||||
**Dev**
|
||||
|
||||
* Never hard fail by default; warn + ticket/PR annotation
|
||||
* Still compute unknowns and show trendlines (so teams see drift)
|
||||
|
||||
### Exception policy (required to avoid “disable unknowns” pressure)
|
||||
|
||||
Implement **explicit exceptions** rather than toggles:
|
||||
|
||||
* Exception must include: `owner`, `expiry`, `justification`, `scope`, `risk_ack`
|
||||
* Exception must be emitted into attestations and reports (“this passed with exception X”).
|
||||
|
||||
**Acceptance criterion:** there is no “turn off unknowns” knob; only thresholds and expiring exceptions.
|
||||
|
||||
---
|
||||
|
||||
## 3) Reporting direction: unknowns must be visible, triaged, and trendable
|
||||
|
||||
### Required reporting surfaces
|
||||
|
||||
1. **Release / PR report**
|
||||
|
||||
* Unknown summary at top:
|
||||
|
||||
* total unknowns
|
||||
* unknowns by reason code
|
||||
* unknowns blocking policy vs not
|
||||
* “What changed?” vs previous baseline (unknown delta)
|
||||
2. **Dashboard (portfolio view)**
|
||||
|
||||
* Unknowns over time
|
||||
* Top teams/services by unknown count
|
||||
* Top unknown causes (reason codes)
|
||||
3. **Operational triage view**
|
||||
|
||||
* “Unknown queue” sortable by:
|
||||
|
||||
* environment impact (prod/stage)
|
||||
* exposure class (internet-facing/internal)
|
||||
* reason code
|
||||
* last-seen time
|
||||
* owner
|
||||
|
||||
### Reporting should drive action, not anxiety
|
||||
|
||||
Every unknown row must include:
|
||||
|
||||
* Why it’s unknown (reason code + short explanation)
|
||||
* What evidence is missing
|
||||
* How to reduce unknown (concrete steps)
|
||||
* Expected effect (e.g., “adding debug symbols will likely reduce U-RCH by ~X”)
|
||||
|
||||
**Key PM instruction:** treat unknowns like an **SLO**. Teams should be able to commit to “unknowns in prod must trend to zero.”
|
||||
|
||||
---
|
||||
|
||||
## 4) Attestations direction: unknowns must be cryptographically bound to decisions
|
||||
|
||||
Every signed decision/attestation must include an “unknowns summary” section.
|
||||
|
||||
### Attestation requirements
|
||||
|
||||
Include at minimum:
|
||||
|
||||
* `unknown_total`
|
||||
* `unknown_by_reason_code` (map of reason→count)
|
||||
* `unknown_blocking_count`
|
||||
* `unknown_details_digest` (hash of the full list if too large)
|
||||
* `policy_thresholds_applied` (the exact thresholds used)
|
||||
* `exceptions_applied` (IDs + expiries)
|
||||
* `knowledge_snapshot_id` (feeds/policy bundle hash if you support offline snapshots)
|
||||
|
||||
**Why this matters:** if you sign a “pass,” you must also sign what you *didn’t know* at the time. Otherwise the signature is not audit-grade.
|
||||
|
||||
**Acceptance criterion:** any downstream verifier can reject a signed “pass” based solely on unknown fields (e.g., “reject if unknown_reachable>0 in prod”).
|
||||
|
||||
---
|
||||
|
||||
## 5) Development direction: implement unknown propagation as a first-class data flow
|
||||
|
||||
### Core engineering tasks (must be done in this order)
|
||||
|
||||
#### A. Define the canonical “Tri-state” evaluation type
|
||||
|
||||
For any security claim, the evaluator must return:
|
||||
|
||||
* `TRUE` (evidence supports)
|
||||
* `FALSE` (evidence refutes)
|
||||
* `UNKNOWN` (insufficient evidence)
|
||||
|
||||
Do not represent unknown as nulls or missing fields. It must be explicit.
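A sketch of the tri-state type with reason-coded unknowns (the reason codes mirror the enum defined above; the rest is illustrative):

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Verdict(Enum):
    TRUE = "true"        # evidence supports the claim
    FALSE = "false"      # evidence refutes the claim
    UNKNOWN = "unknown"  # insufficient evidence; never a null or missing field

class UnknownReason(Enum):
    U_RCH = "U-RCH"            # reachability unknown
    U_ID = "U-ID"              # component identity unknown
    U_PROV = "U-PROV"          # provenance unknown
    U_VEX = "U-VEX"            # VEX conflict / missing applicability
    U_FEED = "U-FEED"          # knowledge source missing
    U_CONFIG = "U-CONFIG"      # config/runtime gate unknown
    U_ANALYZER = "U-ANALYZER"  # analyzer limitation

@dataclass(frozen=True)
class Evaluation:
    verdict: Verdict
    reason_code: Optional[UnknownReason] = None  # required when verdict is UNKNOWN
    evidence_refs: tuple = ()
    remediation_hint: str = ""
```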
|
||||
|
||||
#### B. Build the unknown aggregator and reason-code framework
|
||||
|
||||
* A single aggregation layer computes:
|
||||
|
||||
* unknown counts per scope
|
||||
* unknown counts per reason code
|
||||
* unknown “blockers” based on policy
|
||||
* This must be deterministic and stable (no random ordering, stable IDs).
|
||||
|
||||
#### C. Ensure analyzers emit unknowns instead of silently failing
|
||||
|
||||
Any analyzer that cannot conclude must emit:
|
||||
|
||||
* `UNKNOWN` + reason code + evidence pointers
|
||||
Examples:
|
||||
* call graph incomplete → `U-RCH`
|
||||
* stripped binary cannot map symbols → `U-PROV`
|
||||
* unsupported language → `U-ANALYZER`
|
||||
|
||||
#### D. Provide “reduce unknown” instrumentation hooks
|
||||
|
||||
Attach remediation metadata:
|
||||
|
||||
* “add build flags …”
|
||||
* “upload debug symbols …”
|
||||
* “enable source mapping …”
|
||||
* “mirror feeds …”
|
||||
|
||||
This is how you prevent user backlash.
|
||||
|
||||
---
|
||||
|
||||
## 6) Make it default rather than optional: rollout plan without breaking adoption
|
||||
|
||||
### Phase 1: compute + display (no blocking)
|
||||
|
||||
* Unknowns computed for all scans
|
||||
* Reports show unknown budgets and what would have failed in prod
|
||||
* Collect baseline metrics for 2–4 weeks of typical usage
|
||||
|
||||
### Phase 2: soft gating
|
||||
|
||||
* In prod-like pipelines: fail only on `unknown_reachable > 0`
|
||||
* Everything else warns + requires owner acknowledgement
|
||||
|
||||
### Phase 3: full policy enforcement
|
||||
|
||||
* Enforce default thresholds
|
||||
* Exceptions require expiry and are visible in attestations
|
||||
|
||||
### Phase 4: governance integration
|
||||
|
||||
* Unknowns become part of:
|
||||
|
||||
* release readiness checks
|
||||
* quarterly risk reviews
|
||||
* vendor compliance audits
|
||||
|
||||
**Dev Manager instruction:** invest in tooling that reduces unknowns early (symbol capture, provenance mapping, better analyzers). Otherwise “unknown gating” becomes politically unsustainable.
|
||||
|
||||
---
|
||||
|
||||
## 7) “Definition of Done” checklist for PMs and Dev Managers
|
||||
|
||||
### PM DoD
|
||||
|
||||
* [ ] Unknowns are explicitly defined with stable reason codes
|
||||
* [ ] Policy can fail on unknowns with environment-scoped thresholds
|
||||
* [ ] Reports show unknown deltas and remediation guidance
|
||||
* [ ] Exceptions are time-bound and appear everywhere (UI + API + attestations)
|
||||
* [ ] Unknowns cannot be disabled; only thresholds/exceptions are configurable
|
||||
|
||||
### Engineering DoD
|
||||
|
||||
* [ ] Tri-state evaluation implemented end-to-end
|
||||
* [ ] Analyzer failures never disappear; they become unknowns
|
||||
* [ ] Unknown aggregation is deterministic and reproducible
|
||||
* [ ] Signed attestation includes unknown summary + policy thresholds + exceptions
|
||||
* [ ] CI/CD integration can enforce “fail if unknowns > N in prod”
|
||||
|
||||
---
|
||||
|
||||
## 8) Concrete policy examples you can standardize internally
|
||||
|
||||
### Minimal policy (prod)
|
||||
|
||||
* Block deploy if:
|
||||
|
||||
* `unknown_reachable > 0`
|
||||
* OR `unknown_provenance > 0`
|
||||
|
||||
### Balanced policy (prod)
|
||||
|
||||
* Block deploy if:
|
||||
|
||||
* `unknown_reachable > 0`
|
||||
* OR `unknown_provenance > 0`
|
||||
* OR `unknown_total > 3`
|
||||
|
||||
### Risk-sensitive policy (internet-facing prod)
|
||||
|
||||
* Block deploy if:
|
||||
|
||||
* `unknown_reachable > 0`
|
||||
* OR `unknown_total > 1`
|
||||
* OR any unknown affects a component with known remotely-exploitable CVEs
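As a sketch, the balanced prod policy above expressed as a deterministic gate over the unknown counters that also appear in the attestation's unknowns summary:

```python
def balanced_prod_gate(unknowns: dict) -> bool:
    """Return True if the deploy should be blocked under the balanced prod policy."""
    return (unknowns.get("unknown_reachable", 0) > 0
            or unknowns.get("unknown_provenance", 0) > 0
            or unknowns.get("unknown_total", 0) > 3)

# Example: counts taken from a signed attestation's unknowns summary.
blocked = balanced_prod_gate({"unknown_reachable": 0, "unknown_provenance": 1, "unknown_total": 2})
assert blocked is True
```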
|
||||
|
||||
## 1) Anchor the differentiator in one sentence everyone repeats
|
||||
|
||||
**Positioning invariant:**
|
||||
Stella Ops does not “consume VEX to suppress findings.” Stella Ops **verifies who made the claim, scores how much to trust it, deterministically applies it to a decision, and emits a signed, replayable verdict**.
|
||||
|
||||
Everything you ship should make that sentence more true.
|
||||
|
||||
---
|
||||
|
||||
## 2) Shared vocabulary PMs/DMs must standardize
|
||||
|
||||
If you don’t align on these, you’ll ship features that look similar to competitors but do not compound into a moat.
|
||||
|
||||
### Core objects
|
||||
- **VEX source**: a distribution channel and issuer identity (e.g., vendor feed, distro feed, OCI-attached attestation).
|
||||
- **Issuer identity**: cryptographic identity used to sign/attest the VEX (key/cert/OIDC identity), not a string.
|
||||
- **VEX statement**: one claim about one vulnerability status for one or more products; common statuses include *Not Affected, Affected, Fixed, Under Investigation* (terminology varies by format).
|
||||
- **Verification result**: cryptographic + semantic verification facts about a VEX document/source.
|
||||
- **Trust score**: deterministic numeric/ranked evaluation of the source and/or statement quality.
|
||||
- **Decision**: a policy outcome (pass/fail/needs-review) for a specific artifact or release.
|
||||
- **Attestation**: signed statement bound to an artifact (e.g., OCI artifact) that captures decision + evidence.
|
||||
- **Knowledge snapshot**: frozen set of inputs (VEX docs, keys, policies, vulnerability DB versions, scoring code version) required for deterministic replay.
|
||||
|
||||
---
|
||||
|
||||
## 3) Product Manager guidelines
|
||||
|
||||
### 3.1 Treat “VEX source onboarding” as a first-class product workflow
|
||||
Your differentiator collapses if VEX is just “upload a file.”
|
||||
|
||||
**PM requirements:**
|
||||
1. **VEX Source Registry UI/API**
|
||||
- Add/edit a source: URL/feed/OCI pattern, update cadence, expected issuer(s), allowed formats.
|
||||
- Define trust policy per source (thresholds, allowed statuses, expiry, overrides).
|
||||
2. **Issuer enrollment & key lifecycle**
|
||||
- Capture: issuer identity, trust anchor, rotation, revocation/deny-list, “break-glass disable.”
|
||||
3. **Operational status**
|
||||
- Source health: last fetch, last verified doc, signature failures, schema failures, drift.
|
||||
|
||||
**Why it matters:** customers will only operationalize VEX at scale if they can **govern it like a dependency feed**, not like a manual exception list.
|
||||
|
||||
### 3.2 Make “verification” visible, not implied
|
||||
If users can’t see it, they won’t trust it—and auditors won’t accept it.
|
||||
|
||||
**Minimum UX per VEX document/statement:**
|
||||
- Verification status: **Verified / Unverified / Failed**
|
||||
- Issuer identity: who signed it (and via what trust anchor)
|
||||
- Format + schema validation status (OpenVEX JSON schema exists and is explicitly recommended for validation).
|
||||
- Freshness: timestamp, last updated
|
||||
- Product mapping coverage: “X of Y products matched to SBOM/components”
|
||||
|
||||
### 3.3 Provide “trust score explanations” as a primary UI primitive
|
||||
Trust scoring must not feel like a magic number.
|
||||
|
||||
**UX requirement:** every trust score shows a **breakdown** (e.g., Identity 30/30, Authority 20/25, Freshness 8/10, Evidence quality 6/10…).
|
||||
|
||||
This is both:
|
||||
- a user adoption requirement (security teams will challenge it), and
|
||||
- a moat hardener (competitors rarely expose scoring mechanics).
|
||||
|
||||
### 3.4 Define policy experiences that force deterministic coupling
|
||||
You are not building a “VEX viewer.” You are building **decisioning**.
|
||||
|
||||
Policies must allow:
|
||||
- “Accept VEX only if verified AND trust score ≥ threshold”
|
||||
- “Accept Not Affected only if justification/impact statement exists”
|
||||
- “If conflicting VEX exists, resolve by trust-weighted precedence”
|
||||
- “For unverified VEX, treat status as Under Investigation (or Unknown), not Not Affected”
|
||||
|
||||
This aligns with CSAF’s VEX profile expectation that *known_not_affected* should have an impact statement (machine-readable flag or human-readable justification).
|
||||
|
||||
### 3.5 Ship “audit export” as a product feature, not a report
|
||||
Auditors want to know:
|
||||
- which VEX claims were applied,
|
||||
- who asserted them,
|
||||
- what trust policy allowed them,
|
||||
- and what was the resulting decision.
|
||||
|
||||
ENISA’s SBOM guidance explicitly emphasizes “historical snapshots” and “evidence chain integrity” as success criteria for SBOM/VEX integration programs.
|
||||
|
||||
So your product needs:
|
||||
- exportable evidence bundles (machine-readable)
|
||||
- signed verdicts linked to the artifact
|
||||
- replay semantics (“recompute this exact decision later”)
|
||||
|
||||
### 3.6 MVP scoping: start with sources that prove the model
|
||||
For early product proof, prioritize sources that:
|
||||
- are official,
|
||||
- have consistent structure,
|
||||
- publish frequently,
|
||||
- contain configuration nuance.
|
||||
|
||||
Example: Ubuntu publishes VEX following OpenVEX, emphasizing exploitability in specific configurations and providing official distribution points (tarball + GitHub).
|
||||
|
||||
This gives you a clean first dataset for verification/trust scoring behaviors.
|
||||
|
||||
---
|
||||
|
||||
## 4) Development Manager guidelines
|
||||
|
||||
### 4.1 Architect it as a pipeline with hard boundaries
|
||||
Do not mix verification, scoring, and decisioning in one component. You need isolatable, testable stages.
|
||||
|
||||
**Recommended pipeline stages:**
|
||||
1. **Ingest**
|
||||
- Fetch from registry/OCI
|
||||
- Deduplicate by content hash
|
||||
2. **Parse & normalize**
|
||||
- Convert OpenVEX / CSAF VEX / CycloneDX VEX into a **canonical internal VEX model**
|
||||
- Note: OpenVEX explicitly calls out that CycloneDX VEX uses different status/justification labels and may need translation.
|
||||
3. **Verify (cryptographic + semantic)**
|
||||
4. **Trust score (pure function)**
|
||||
5. **Conflict resolve**
|
||||
6. **Decision**
|
||||
7. **Attest + persist snapshot**
|
||||
|
||||
### 4.2 Verification must include both cryptography and semantics
|
||||
|
||||
#### Cryptographic verification (minimum bar)
|
||||
- Verify signature/attestation against expected issuer identity.
|
||||
- Validate certificate/identity chains per customer trust anchors.
|
||||
- Support OCI-attached artifacts and “signature-of-signature” patterns (Sigstore describes countersigning: signature artifacts can themselves be signed).
|
||||
|
||||
#### Semantic verification (equally important)
|
||||
- Schema validation (OpenVEX provides JSON schema guidance).
|
||||
- Vulnerability identifier validity (CVE/aliases)
|
||||
- Product reference validity (e.g., purl)
|
||||
- Statement completeness rules:
|
||||
- “Not affected” must include rationale; the CSAF VEX profile requires an impact statement for known_not_affected in flags or threats.
|
||||
- Cross-check the statement scope to known SBOM/components:
|
||||
- If the VEX references products that do not exist in the artifact SBOM, the claim should not affect the decision (or should reduce trust sharply).
|
||||
|
||||
### 4.3 Trust scoring must be deterministic by construction
|
||||
If trust scoring varies between runs, you cannot produce replayable, attestable decisions.
|
||||
|
||||
**Rules for determinism:**
|
||||
- Trust score is a **pure function** of:
|
||||
- VEX document hash
|
||||
- verification result
|
||||
- source configuration (immutable version)
|
||||
- scoring algorithm version
|
||||
- evaluation timestamp (explicit input, included in snapshot)
|
||||
- Never call external services during scoring unless responses are captured and hashed into the snapshot.
|
||||
|
||||
### 4.4 Implement two trust concepts: Source Trust and Statement Quality
|
||||
Do not overload one score to do everything.
|
||||
|
||||
- **Source Trust**: “how much do we trust the issuer/channel?”
|
||||
- **Statement Quality**: “how well-formed, specific, justified is this statement?”
|
||||
|
||||
You can then combine them:
|
||||
`TrustScore = f(SourceTrust, StatementQuality, Freshness, TrackRecord)`
|
||||
|
||||
### 4.5 Conflict resolution must be policy-driven, not hard-coded
|
||||
Conflicting VEX is inevitable:
|
||||
- vendor vs distro
|
||||
- older vs newer
|
||||
- internal vs external
|
||||
|
||||
Resolve via:
|
||||
- deterministic precedence rules configured per tenant
|
||||
- trust-weighted tie-breakers
|
||||
- “newer statement wins” only when issuer is the same or within the same trust class
|
||||
|
||||
### 4.6 Store VEX and decision inputs as content-addressed artifacts
|
||||
If you want replayability, you must be able to reconstruct the “world state.”
|
||||
|
||||
**Persist:**
|
||||
- VEX docs (by digest)
|
||||
- verification artifacts (signature bundles, cert chains)
|
||||
- normalized VEX statements (canonical form)
|
||||
- trust score + breakdown + algorithm version
|
||||
- policy bundle + version
|
||||
- vulnerability DB snapshot identifiers
|
||||
- decision output + evidence pointers
|
||||
|
||||
---
|
||||
|
||||
## 5) A practical trust scoring rubric you can hand to teams
|
||||
|
||||
Use a 0–100 score with defined buckets. The weights below are a starting point; what matters is consistency and explainability.
|
||||
|
||||
### 5.1 Source Trust (0–60)
|
||||
1. **Issuer identity verified (0–25)**
|
||||
- 0 if unsigned/unverifiable
|
||||
- 25 if signature verified to a known trust anchor
|
||||
2. **Issuer authority alignment (0–20)**
|
||||
- 20 if issuer is the product supplier/distro maintainer for that component set
|
||||
- lower if third party / aggregator
|
||||
3. **Distribution integrity (0–15)**
|
||||
- extra credit if the VEX is distributed as an attestation bound to an artifact and/or uses auditable signature patterns (e.g., countersigning).
|
||||
|
||||
### 5.2 Statement Quality (0–40)
|
||||
1. **Scope specificity (0–15)**
|
||||
- exact product IDs (purl), versions, architectures, etc.
|
||||
2. **Justification/impact present and structured (0–15)**
|
||||
- CSAF VEX expects an impact statement for known_not_affected; Ubuntu maps “not_affected” to justifications like `vulnerable_code_not_present`.
|
||||
3. **Freshness (0–10)**
|
||||
- based on statement/document timestamps (explicitly hashed into snapshot)
|
||||
|
||||
### Score buckets
|
||||
- **90–100**: Verified + authoritative + high-quality → eligible for gating
|
||||
- **70–89**: Verified but weaker evidence/scope → eligible with policy constraints
|
||||
- **40–69**: Mixed/partial trust → informational, not gating by default
|
||||
- **0–39**: Unverified/low quality → do not affect decisions
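A sketch of the rubric as a pure scoring function with an explainable breakdown; how each component score is derived from verification results is the real engineering work and is assumed here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrustBreakdown:
    identity: int       # 0-25: issuer identity verified against a trust anchor
    authority: int      # 0-20: issuer is supplier/maintainer for the component set
    distribution: int   # 0-15: attestation-bound / auditable distribution
    scope: int          # 0-15: exact purl/version/architecture specificity
    justification: int  # 0-15: structured impact statement / justification present
    freshness: int      # 0-10: statement/document timestamps

    def total(self) -> int:
        return (self.identity + self.authority + self.distribution
                + self.scope + self.justification + self.freshness)

    def bucket(self) -> str:
        t = self.total()
        if t >= 90:
            return "gating_eligible"
        if t >= 70:
            return "gating_with_constraints"
        if t >= 40:
            return "informational"
        return "do_not_affect_decisions"

# Example breakdown shown to the user alongside the score.
score = TrustBreakdown(identity=25, authority=20, distribution=10,
                       scope=12, justification=15, freshness=8)
print(score.total(), score.bucket())  # 90 gating_eligible
```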
|
||||
|
||||
---
|
||||
|
||||
## 6) Tight coupling to deterministic decisioning: what “coupling” means in practice
|
||||
|
||||
### 6.1 VEX must be an input to the same deterministic evaluation engine that produces the verdict
|
||||
Do not build “VEX handling” as a sidecar that produces annotations.
|
||||
|
||||
**Decision engine inputs must include:**
|
||||
- SBOM / component graph
|
||||
- vulnerability findings
|
||||
- normalized VEX statements
|
||||
- verification results + trust scores
|
||||
- tenant policy bundle
|
||||
- evaluation timestamp + snapshot identifiers
|
||||
|
||||
The engine output must include:
|
||||
- final status per vulnerability (affected/not affected/fixed/under investigation/unknown)
|
||||
- **why** (evidence pointers)
|
||||
- the policy rule(s) that caused it
|
||||
|
||||
### 6.2 Default posture: fail-safe, not fail-open
|
||||
Recommended defaults:
|
||||
- **Unverified VEX never suppresses vulnerabilities.**
|
||||
- Trust score below threshold never suppresses.
|
||||
- “Not affected” without justification/impact statement never suppresses.
|
||||
|
||||
This is aligned with CSAF VEX expectations and avoids the easiest suppression attack vector.
|
||||
|
||||
### 6.3 Make uncertainty explicit
|
||||
If VEX conflicts or is low trust, your decisioning must produce explicit states like:
|
||||
- “Unknown (insufficient trusted VEX)”
|
||||
- “Under Investigation”
|
||||
|
||||
That is consistent with common VEX status vocabulary and avoids false certainty.
|
||||
|
||||
---
|
||||
|
||||
## 7) Tight coupling to attestations: what to attest, when, and why
|
||||
|
||||
### 7.1 Attest **decisions**, not just documents
|
||||
Competitors already sign SBOMs. Your moat is signing the **verdict** with the evidence chain.
|
||||
|
||||
Each signed verdict should bind:
|
||||
- subject artifact digest (container/image/package)
|
||||
- decision output (pass/fail/etc.)
|
||||
- hashes of:
|
||||
- VEX docs used
|
||||
- verification artifacts
|
||||
- trust scoring breakdown
|
||||
- policy bundle
|
||||
- vulnerability DB snapshot identifiers
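A sketch of that binding as a predicate object (the predicate type URI and field names are assumptions, not a settled schema):

```python
# Illustrative risk-verdict predicate; values are placeholders, not real digests.
verdict_predicate = {
    "predicateType": "https://stella-ops.example/risk-verdict/v1",  # hypothetical URI
    "subject": {"artifact_digest": "sha256:..."},
    "decision": "pass",
    "inputs": {
        "vex_documents": ["sha256:vexdoc1...", "sha256:vexdoc2..."],
        "verification_artifacts": ["sha256:sigbundle..."],
        "trust_scoring": {"algorithm_version": "1.0", "breakdown_hash": "sha256:..."},
        "policy_bundle": "sha256:policy...",
        "vulnerability_db_snapshot": "sha256:feeds...",
    },
}
```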
|
||||
|
||||
### 7.2 Make attestations replayable
|
||||
Your attestation must contain enough references (digests) that the system can:
|
||||
- re-run the decision in an air-gapped environment
|
||||
- obtain the same outputs
|
||||
|
||||
This aligns with “historical snapshots” / “evidence chain integrity” expectations in modern SBOM programs.
|
||||
|
||||
### 7.3 Provide two attestations (recommended)
|
||||
1. **VEX intake attestation** (optional but powerful)
|
||||
- “We ingested and verified this VEX doc from issuer X under policy Y.”
|
||||
2. **Risk verdict attestation** (core differentiator)
|
||||
- “Given SBOM, vulnerabilities, verified VEX, and policy snapshot, the artifact is acceptable/unacceptable.”
|
||||
|
||||
Sigstore’s countersigning concept illustrates that you can add layers of trust over artifacts/signatures; your verdict is the enterprise-grade layer.
|
||||
|
||||
---
|
||||
|
||||
## 8) “Definition of Done” checklists (use in roadmaps)
|
||||
|
||||
### PM DoD for VEX Trust (ship criteria)
|
||||
- A customer can onboard a VEX source and see issuer identity + verification state.
|
||||
- Trust score exists with a visible breakdown and policy thresholds.
|
||||
- Policies can gate on trust score + verification.
|
||||
- Audit export: per release, show which VEX claims affected the final decision.
|
||||
|
||||
### DM DoD for Deterministic + Attestable
|
||||
- Same inputs → identical trust score and decision (golden tests).
|
||||
- All inputs content-addressed and captured in a snapshot bundle.
|
||||
- Attestation includes digests of all relevant inputs and a decision summary.
|
||||
- No network dependency at evaluation time unless recorded in snapshot.
|
||||
|
||||
---
|
||||
|
||||
## 9) Metrics that prove you differentiated
|
||||
|
||||
Track these from the first pilot:
|
||||
1. **% of decisions backed by verified VEX** (not just present)
|
||||
2. **% of “not affected” outcomes with cryptographic verification + justification**
|
||||
3. **Replay success rate** (recompute verdict from snapshot)
|
||||
4. **Time-to-audit** (minutes to produce evidence chain for a release)
|
||||
5. **False suppression rate** (should be effectively zero with fail-safe defaults)
|
||||

---
|
||||
Below is a **feature → moat strength** map for Stella Ops, explicitly benchmarked against the tools we’ve been discussing (Trivy/Aqua, Grype/Syft, Anchore Enterprise, Snyk, Prisma Cloud). I’m using **“moat”** in the strict sense: *how hard is it for an incumbent to replicate the capability to parity, and how strong are the switching costs once deployed.*
|
||||
|
||||
### Moat scale
|
||||
|
||||
* **5 = Structural moat** (new primitives, strong defensibility, durable switching cost)
|
||||
* **4 = Strong moat** (difficult multi-domain engineering; incumbents have only partial analogs)
|
||||
* **3 = Moderate moat** (others can build; differentiation is execution + packaging)
|
||||
* **2 = Weak moat** (table-stakes soon; limited defensibility)
|
||||
* **1 = Commodity** (widely available in OSS / easy to replicate)
|
||||
|
||||
---
|
||||
|
||||
## 1) Stella Ops candidate features mapped to moat strength
|
||||
|
||||
| Stella Ops feature (precisely defined) | Closest competitor analogs (evidence) | Competitive parity today | Moat strength | Why this is (or isn’t) defensible | How to harden the moat |
|
||||
| ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------: | ------------: | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||
| **Signed, replayable risk verdicts**: “this artifact is acceptable” decisions produced deterministically, with an evidence bundle + policy snapshot, signed as an attestation | Ecosystem can sign SBOM attestations (e.g., Syft + Sigstore; DSSE/in-toto via cosign), but not “risk verdict” decisions end-to-end ([Anchore][1]) | Low | **5** | This requires a **deterministic evaluation model**, a **proof/evidence schema**, and “knowledge snapshotting” so results are replayable months later. Incumbents mostly stop at exporting scan results or SBOMs, not signing a decision in a reproducible way. | Make the verdict format a **first-class artifact** (OCI-attached attestation), with strict replay semantics (“same inputs → same verdict”), plus auditor-friendly evidence extraction. |
|
||||
| **VEX decisioning engine (not just ingestion)**: ingest OpenVEX/CycloneDX/CSAF, resolve conflicts with a trust/policy lattice, and produce explainable outcomes | Trivy supports multiple VEX formats (CycloneDX/OpenVEX/CSAF) but notes it’s “experimental/minimal functionality” ([Trivy][2]). Grype supports OpenVEX ingestion ([Chainguard][3]). Anchore can generate VEX docs from annotations (OpenVEX + CycloneDX) ([Anchore Docs][4]). Aqua runs VEX Hub for distributing VEX statements to Trivy ([Aqua][5]) | Medium (ingestion exists; decision logic is thin) | **4** | Ingestion alone is easy; the moat comes from **formal conflict resolution**, provenance-aware trust weighting, and deterministic outcomes. Most tools treat VEX as suppression/annotation, not a reasoning substrate. | Ship a **policy-controlled merge semantics** (“vendor > distro > internal” is too naive) + required evidence hooks (e.g., “not affected because feature flag off”). |
| **Reachability with proof**, tied to deployable artifacts: produce a defensible chain “entrypoint → call path → vulnerable symbol,” plus configuration gates | Snyk has reachability analysis in GA for certain languages/integrations and uses call-graph style reasoning to determine whether vulnerable code is called ([Snyk User Docs][6]). Some commercial vendors also market reachability (e.g., Endor Labs is listed in CycloneDX Tool Center as analyzing reachability) ([CycloneDX][7]) | Medium (reachability exists, but proof portability varies) | **4** | “Reachability” as a label is no longer unique. The moat is **portable proofs** (usable in audits and in air-gapped environments) + artifact-level mapping (not just source repo analysis) + deterministic replay. | Focus on **proof-carrying reachability**: store the reachability subgraph as evidence; make it reproducible and attestable; support both source and post-build artifacts. |
| **Smart-Diff (semantic risk delta)**: between releases, explain “what materially changed in exploitable surface,” not just “CVE count changed” | Anchore provides SBOM management and policy evaluation (good foundation), but “semantic risk diff” is not a prominent, standardized feature in typical scanners ([Anchore Docs][8]) | Low–Medium | **4** | Most incumbents can diff findings lists. Few can diff **reachability graphs, policy outcomes, and VEX state** to produce stable “delta narratives.” Hard to replicate without the underlying evidence model. | Treat diff as first-class: version SBOM graphs + reachability graphs + VEX claims; compute deltas over those graphs and emit a signed “delta verdict.” |
| **Unknowns as first-class state**: represent “unknown-reachable/unknown-unreachable” and force policies to account for uncertainty | Not a standard capability in common scanners/platforms; most systems output findings and (optionally) suppressions | Low | **4** | This is conceptually simple but operationally rare; it requires rethinking UX, scoring, and policy evaluation. It becomes sticky once orgs base governance on uncertainty budgets. | Bake unknowns into policies (“fail if unknowns > N in prod”), reporting, and attestations. Make it the default rather than optional. |
| **Air-gapped epistemic mode**: offline operation where the tool can prove what knowledge it used (feed snapshot + timestamps + trust anchors) | Prisma Cloud Compute Edition supports air-gapped environments and has an offline Intel Stream update mechanism ([Prisma Cloud Docs][9]). (But “prove exact knowledge state used for decisions” is typically not the emphasis.) | Medium | **4** | Air-gapped “runtime” is common; air-gapped **reproducibility** is not. The moat is packaging offline feeds + policies + deterministic scoring into a replayable bundle tied to attestations. | Deliver a “sealed knowledge snapshot” workflow (export/import), and make audits a one-command replay. |
| **SBOM ledger + lineage**: BYOS ingestion plus versioned SBOM storage, grouping, and historical tracking | Anchore explicitly positions centralized SBOM management and “Bring Your Own SBOM” ([Anchore Docs][8]). Snyk can generate SBOMs and expose SBOM via API in CycloneDX/SPDX formats ([Snyk User Docs][10]). Prisma can export CycloneDX SBOMs for scans ([Prisma Cloud Docs][11]) | High | **3** | SBOM generation/storage is quickly becoming table stakes. You can still differentiate on **graph fidelity + lineage semantics**, but “having SBOMs” alone won’t be a moat. | Make the ledger valuable via **semantic diff, evidence joins (reachability/VEX), and provenance** rather than storage. |
| **Policy engine with proofs**: policy-as-code that produces a signed explanation (“why pass/fail”) and links to evidence nodes | Anchore has a mature policy model (policy JSON, gates, allowlists, mappings) ([Anchore Docs][12]). Prisma/Aqua have rich policy + runtime guardrails (platform-driven) ([Aqua][13]) | High | **3** | Policy engines are common. The moat is the **proof output** + deterministic replay + integration with attestations. | Keep policy language small but rigorous; always emit evidence pointers; support “policy compilation” to deterministic decision artifacts. |
| **VEX distribution network**: ecosystem layer that aggregates, validates, and serves VEX at scale | Aqua’s VEX Hub is explicitly a centralized repository designed for discover/fetch/consume flows with Trivy ([Aqua][5]) | Medium | **3–4** | A network layer can become a moat if it achieves broad adoption. But incumbents can also launch hubs. This becomes defensible only with **network effects + trust frameworks**. | Differentiate with **verification + trust scoring** of VEX sources, plus tight coupling to deterministic decisioning and attestations. |
| **“Integrations everywhere”** (CI/CD, registry, Kubernetes, IDE) | Everyone in this space integrates broadly; reachability and scoring features often ride those integrations (e.g., Snyk reachability depends on repo/integration access) ([Snyk User Docs][6]) | High | **1–2** | Integrations are necessary, but not defensible—mostly engineering throughput. | Use integrations to *distribute attestations and proofs*, not as the headline differentiator. |
---

## 2) Where competitors already have strong moats (avoid head‑on fights early)

These are areas where incumbents are structurally advantaged, so Stella Ops should either (a) integrate rather than replace, or (b) compete only if you have a much sharper wedge.
### Snyk’s moat: developer adoption + reachability-informed prioritization
* Snyk publicly documents **reachability analysis** (GA for certain integrations/languages) ([Snyk User Docs][6])
* Snyk prioritization incorporates reachability and other signals into **Priority Score** ([Snyk User Docs][14])

**Implication:** pure “reachability” claims won’t beat Snyk; **proof-carrying, artifact-tied, replayable reachability** can.
### Prisma Cloud’s moat: CNAPP breadth + graph-based risk prioritization + air-gapped CWPP
* Prisma invests in graph-driven investigation/tracing of vulnerabilities ([Prisma Cloud Docs][15])
* Risk prioritization and risk-score ranked vulnerability views are core platform capabilities ([Prisma Cloud Docs][16])
* Compute Edition supports **air-gapped environments** and has offline update workflows ([Prisma Cloud Docs][9])

**Implication:** competing on “platform breadth” is a losing battle early; compete on **decision integrity** (deterministic, attestable, replayable) and integrate where needed.
### Anchore’s moat: SBOM operations + policy-as-code maturity
* Anchore is explicitly SBOM-management centric and supports policy gating constructs ([Anchore Docs][8])

**Implication:** Anchore is strong at “SBOM at scale.” Stella Ops should outperform on **semantic diff, VEX reasoning, and proof outputs**, not just SBOM storage.
### Aqua’s moat: code-to-runtime enforcement plus emerging VEX distribution
* Aqua provides CWPP-style runtime policy enforcement/guardrails ([Aqua][13])
* Aqua backs VEX Hub for VEX distribution and Trivy consumption ([Aqua][5])

**Implication:** if Stella Ops is not a runtime protection platform, don’t chase CWPP breadth—use Aqua/Prisma integrations and focus on upstream decision quality.

---

## 3) Practical positioning: which features produce the most durable wedge
If you want the shortest path to a *defensible* position:
1. **Moat anchor (5): Signed, replayable risk verdicts**

* Everything else (VEX, reachability, diff) becomes evidence feeding that verdict.

2. **Moat amplifier (4): VEX decisioning + proof-carrying reachability**

* In 2025, VEX ingestion exists in Trivy/Grype/Anchore ([Trivy][2]), and reachability exists in Snyk ([Snyk User Docs][6]).
* Your differentiation must be: **determinism + portability + auditability**.

3. **Moat compounding (4): Smart-Diff over risk meaning**

* Turns “scan results” into an operational change-control primitive.
---
## 4) A concise “moat thesis” per feature (one-liners you can use internally)
* **Deterministic signed verdicts:** “We don’t output findings; we output an attestable decision that can be replayed.”
* **VEX decisioning:** “We treat VEX as a logical claim system, not a suppression file.”
* **Reachability proofs:** “We provide proof of exploitability in *this* artifact, not just a badge.”
* **Smart-Diff:** “We explain what changed in exploitable surface area, not what changed in CVE count.”
* **Unknowns modeling:** “We quantify uncertainty and gate on it.”
---

If you want, I can convert the table into a **2×2 moat map** (Customer Value vs Defensibility) and a **build-order roadmap** that maximizes durable advantage while minimizing overlap with entrenched competitor moats.

[1]: https://anchore.com/sbom/creating-sbom-attestations-using-syft-and-sigstore/?utm_source=chatgpt.com "Creating SBOM Attestations Using Syft and Sigstore"
[2]: https://trivy.dev/docs/v0.50/supply-chain/vex/?utm_source=chatgpt.com "VEX"
[3]: https://www.chainguard.dev/unchained/vexed-then-grype-about-it-chainguard-and-anchore-announce-grype-supports-openvex?utm_source=chatgpt.com "VEXed? Then Grype about it"
[4]: https://docs.anchore.com/current/docs/vulnerability_management/vuln_annotations/?utm_source=chatgpt.com "Vulnerability Annotations and VEX"
[5]: https://www.aquasec.com/blog/introducing-vex-hub-unified-repository-for-vex-statements/?utm_source=chatgpt.com "Trivy VEX Hub:The Solution to Vulnerability Fatigue"
[6]: https://docs.snyk.io/manage-risk/prioritize-issues-for-fixing/reachability-analysis?utm_source=chatgpt.com "Reachability analysis"
[7]: https://cyclonedx.org/tool-center/?utm_source=chatgpt.com "CycloneDX Tool Center"
[8]: https://docs.anchore.com/current/docs/sbom_management/?utm_source=chatgpt.com "SBOM Management"
[9]: https://docs.prismacloud.io/en/compute-edition?utm_source=chatgpt.com "Prisma Cloud Compute Edition"
[10]: https://docs.snyk.io/developer-tools/snyk-cli/commands/sbom?utm_source=chatgpt.com "SBOM | Snyk User Docs"
[11]: https://docs.prismacloud.io/en/compute-edition/32/admin-guide/vulnerability-management/exporting-sboms?utm_source=chatgpt.com "Exporting Software Bill of Materials on CycloneDX"
[12]: https://docs.anchore.com/current/docs/overview/concepts/policy/policies/?utm_source=chatgpt.com "Policies and Evaluation"
[13]: https://www.aquasec.com/products/cwpp-cloud-workload-protection/?utm_source=chatgpt.com "Cloud workload protection in Runtime - Aqua Security"
[14]: https://docs.snyk.io/manage-risk/prioritize-issues-for-fixing?utm_source=chatgpt.com "Prioritize issues for fixing"
[15]: https://docs.prismacloud.io/en/enterprise-edition/content-collections/search-and-investigate/c2c-tracing-vulnerabilities/investigate-vulnerabilities-tracing?utm_source=chatgpt.com "Use Vulnerabilities Tracing on Investigate"
[16]: https://docs.prismacloud.io/en/enterprise-edition/use-cases/secure-the-infrastructure/risk-prioritization?utm_source=chatgpt.com "Risk Prioritization - Prisma Cloud Documentation"

---

Below are implementation-grade guidelines for Stella Ops Product Managers (PMs) and Development Managers (Eng Managers / Tech Leads) for two tightly coupled capabilities:
1. **Exception management as auditable objects** (not suppression files)
|
||||
2. **Audit packs** (exportable, verifiable evidence bundles for releases and environments)
|
||||
|
||||
The intent is to make these capabilities:
|
||||
|
||||
* operationally useful (reduce friction in CI/CD and runtime governance),
|
||||
* defensible in audits (tamper-evident, attributable, time-bounded), and
|
||||
* consistent with Stella Ops’ positioning around determinism, evidence, and replayability.
|
||||
|
||||
---
|
||||
|
||||
# 1. Shared objectives and boundaries
|
||||
|
||||
## 1.1 Objectives
|
||||
|
||||
These two capabilities must jointly enable:
|
||||
|
||||
* **Risk decisions are explicit**: Every “ignore/suppress/waive” is a governed decision with an owner and expiry.
|
||||
* **Decisions are replayable**: If an auditor asks “why did you ship this on date X?”, Stella Ops can reproduce the decision using the same policy + evidence + knowledge snapshot.
|
||||
* **Decisions are exportable and verifiable**: Audit packs include the minimum necessary artifacts and a manifest that allows independent verification of integrity and completeness.
|
||||
* **Operational friction is reduced**: Teams can ship safely with controlled exceptions, rather than ad-hoc suppressions, while retaining accountability.
|
||||
|
||||
## 1.2 Out of scope (explicitly)
|
||||
|
||||
Avoid scope creep early. The following are out of scope for v1 unless mandated by a target customer:
|
||||
|
||||
* Full GRC mapping to specific frameworks (you can *support evidence*; don’t claim compliance).
|
||||
* Fully automated approvals based on HR org charts.
|
||||
* Multi-year archival systems (start with retention, export, and immutable event logs).
|
||||
* A “ticketing system replacement.” Integrate with ticketing; don’t rebuild it.
|
||||
|
||||
---
|
||||
|
||||
# 2. Shared design principles (non-negotiables)
|
||||
|
||||
These principles apply to both Exception Objects and Audit Packs:
|
||||
|
||||
1. **Attribution**: every action has an authenticated actor identity (human or service), a timestamp, and a reason.
|
||||
2. **Immutability of history**: edits are new versions/events; never rewrite history in place.
|
||||
3. **Least privilege scope**: exceptions must be as narrow as possible (artifact digest over tag; component purl over “any”; environment constraints).
|
||||
4. **Time-bounded risk**: exceptions must expire. “Permanent ignore” is a governance smell.
|
||||
5. **Deterministic evaluation**: given the same policy + snapshot + exceptions + inputs, the outcome is stable and reproducible.
|
||||
6. **Separation of concerns**:
|
||||
|
||||
* Exception store = governed decisions.
|
||||
* Scanner = evidence producer.
|
||||
* Policy engine = deterministic evaluator.
|
||||
* Audit packer = exporter/assembler/verifier.
|
||||
|
||||
---
|
||||
|
||||
# 3. Exception management as auditable objects
|
||||
|
||||
## 3.1 What an “Exception Object” is
|
||||
|
||||
An Exception Object is a structured, versioned record that modifies evaluation behavior *in a controlled manner*, while leaving the underlying findings intact.
|
||||
|
||||
It is not:
|
||||
|
||||
* a local `.ignore` file,
|
||||
* a hidden suppression rule,
|
||||
* a UI-only toggle,
|
||||
* a vendor-specific “ignore list” with no audit trail.
|
||||
|
||||
### Exception types you should support (minimum set)
|
||||
|
||||
PMs should start with these canonical types:
|
||||
|
||||
1. **Vulnerability exception**
|
||||
|
||||
* suppress/waive a specific vulnerability finding (e.g., CVE/CWE) under defined scope.
|
||||
2. **Policy exception**
|
||||
|
||||
* allow a policy rule to be bypassed under defined scope (e.g., “allow unsigned artifact for dev namespace”).
|
||||
3. **Unknown-state exception** (if Stella models unknowns)
|
||||
|
||||
* allow a release despite unresolved unknowns, with explicit risk acceptance.
|
||||
4. **Component exception**
|
||||
|
||||
* allow/deny a component/package/version across a domain, again with explicit scope and expiry.
|
||||
|
||||
## 3.2 Required fields and schema guidelines
|
||||
|
||||
PMs: mandate these fields; Eng: enforce them at API and storage level.
|
||||
|
||||
### Required fields (v1)
|
||||
|
||||
* **exception_id** (stable identifier)
|
||||
* **version** (monotonic; or event-sourced)
|
||||
* **status**: proposed | approved | active | expired | revoked
|
||||
* **owner** (accountable person/team)
|
||||
* **requester** (who initiated)
|
||||
* **approver(s)** (who approved; may be empty for dev environments depending on policy)
|
||||
* **created_at / updated_at / approved_at / expires_at**
|
||||
* **scope** (see below)
|
||||
* **reason_code** (taxonomy)
|
||||
* **rationale** (free text, required)
|
||||
* **evidence_refs** (optional in v1 but strongly recommended)
|
||||
* **risk_acceptance** (explicit boolean or structured “risk accepted” block)
|
||||
* **links** (ticket ID, PR, incident, vendor advisory reference) – optional but useful
|
||||
* **audit_log_refs** (implicit if event-sourced)
|
||||
|
||||
### Scope model (critical to defensibility)
|
||||
|
||||
Scope must be structured and narrowable. Provide scope dimensions such as:
|
||||
|
||||
* **Artifact scope**: image digest, SBOM digest, build provenance digest (preferred)
|
||||
(Avoid tags as primary scope unless paired with immutability constraints.)
|
||||
* **Component scope**: purl + version range + ecosystem
|
||||
* **Vulnerability scope**: CVE ID(s), GHSA, internal ID; optionally path/function/symbol constraints
|
||||
* **Environment scope**: cluster/namespace, runtime env (dev/stage/prod), repository, project, tenant
|
||||
* **Time scope**: expires_at (required), optional “valid_from”
|
||||
|
||||
PM guideline: default UI and API should encourage digest-based scope and warn on broad scopes.
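
For orientation, a minimal exception object with a digest-based scope might look like the sketch below (Python-style literal; the field names follow the required-field list above, while every value and identifier is invented for illustration):

```python
# Illustrative exception object; identifiers, digests, and dates are hypothetical.
example_exception = {
    "exception_id": "exc-2025-0142",
    "version": 3,
    "status": "active",                           # proposed | approved | active | expired | revoked
    "owner": "team-payments",
    "requester": "jane.doe",
    "approvers": ["sec-lead"],
    "created_at": "2025-11-02T09:14:00+00:00",
    "approved_at": "2025-11-03T10:02:00+00:00",
    "expires_at": "2026-02-01T00:00:00+00:00",    # expiry is mandatory
    "scope": {
        "artifact": {"image_digest": "sha256:…"},                 # digest-based, not tag-based
        "component": {"purl": "pkg:maven/org.example/lib@1.4.2"},
        "vulnerability": {"ids": ["CVE-2025-12345"]},
        "environment": {"env": "prod", "namespace": "payments"},
    },
    "reason_code": "NOT_AFFECTED",
    "rationale": "Vulnerable code path is gated behind a feature flag that is off in this deployment.",
    "evidence_refs": [{"type": "openvex", "digest": "sha256:…"}],
    "risk_acceptance": {"accepted": True, "accepted_by": "security-officer"},
    "links": {"ticket": "SEC-1042"},
}
```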
|
||||
|
||||
## 3.3 Reason codes (taxonomy)
|
||||
|
||||
Reason codes are a moat because they enable governance analytics and policy automation.
|
||||
|
||||
Minimum suggested taxonomy:
|
||||
|
||||
* **FALSE_POSITIVE** (with evidence expectations)
|
||||
* **NOT_REACHABLE** (proof of non-reachability preferred)
|
||||
* **NOT_AFFECTED** (VEX-backed preferred)
|
||||
* **BACKPORT_FIXED** (package/distro evidence preferred)
|
||||
* **COMPENSATING_CONTROL** (link to control evidence)
|
||||
* **RISK_ACCEPTED** (explicit sign-off)
|
||||
* **TEMPORARY_WORKAROUND** (link to mitigation plan)
|
||||
* **VENDOR_PENDING** (under investigation)
|
||||
* **BUSINESS_EXCEPTION** (rare; requires stronger approval)
|
||||
|
||||
PM guideline: reason codes must be selectable and reportable; do not allow “Other” as the default.
|
||||
|
||||
## 3.4 Evidence attachments
|
||||
|
||||
Exceptions should evolve from “justification-only” to “justification + evidence.”
|
||||
|
||||
Evidence references can point to:
|
||||
|
||||
* VEX statements (OpenVEX/CycloneDX VEX)
|
||||
* reachability proof fragments (call-path subgraph, symbol references)
|
||||
* distro advisories / patch references
|
||||
* internal change tickets / mitigation PRs
|
||||
* runtime mitigations
|
||||
|
||||
Eng guideline: store evidence as references with integrity checks (hash/digest). For v2+, store evidence bundles as content-addressed blobs.
|
||||
|
||||
## 3.5 Lifecycle and workflows
|
||||
|
||||
### Lifecycle states and transitions
|
||||
|
||||
* **Proposed** → **Approved** → **Active** → (**Expired** or **Revoked**)
|
||||
* **Renewal** should create a **new version** (never extend an old record silently).
|
||||
|
||||
### Approvals
|
||||
|
||||
PM guideline:
|
||||
|
||||
* At least two approval modes:
|
||||
|
||||
1. **Self-approved** (allowed only for dev/experimental scopes)
|
||||
2. **Two-person review** (required for prod or broad scope)
|
||||
|
||||
Eng guideline:
|
||||
|
||||
* Enforce approval rules via policy config (not hard-coded).
|
||||
* Record every approval action with actor identity and timestamp.
|
||||
|
||||
### Expiry enforcement
|
||||
|
||||
Non-negotiable:
|
||||
|
||||
* Expired exceptions must stop applying automatically.
|
||||
* Renewals require an explicit action and new audit trail.
|
||||
|
||||
## 3.6 Evaluation semantics (how exceptions affect results)
|
||||
|
||||
This is where most products become non-auditable. You need deterministic, explicit rules.
|
||||
|
||||
PM guideline: define precedence clearly:
|
||||
|
||||
* Policy engine evaluates baseline findings → applies exceptions → produces verdict.
|
||||
* Exceptions never delete underlying findings; they alter the *decision outcome* and annotate the reasoning.
|
||||
|
||||
Eng guideline: exception application must be (a sketch follows this list):
|
||||
|
||||
* **Deterministic** (stable ordering rules)
|
||||
* **Transparent** (verdict includes “exception applied: exception_id, reason_code, scope match explanation”)
|
||||
* **Scoped** (match explanation must state which scope dimensions matched)
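
A minimal sketch of these semantics, assuming simplified finding and exception shapes rather than the real Stella Ops schema:

```python
from datetime import datetime, timezone

def apply_exceptions(findings, exceptions, now=None):
    """Annotate findings with matching, unexpired exceptions; never delete findings.

    Shapes are simplified for illustration; timestamps are ISO-8601 strings with
    explicit offsets (e.g. "2026-02-01T00:00:00+00:00").
    """
    now = now or datetime.now(timezone.utc)
    # Stable ordering so the same inputs always produce the same annotations.
    active = sorted(
        (e for e in exceptions
         if e["status"] == "active" and datetime.fromisoformat(e["expires_at"]) > now),
        key=lambda e: e["exception_id"],
    )
    results = []
    for f in sorted(findings, key=lambda f: (f["component"], f["vuln_id"])):
        applied = [
            {"exception_id": e["exception_id"],
             "reason_code": e["reason_code"],
             "matched_scope": ["vulnerability", "component"]}
            for e in active
            if f["vuln_id"] in e["scope"]["vulnerability"]["ids"]
            and f["component"] == e["scope"]["component"]["purl"]
        ]
        # The finding itself is preserved; only its contribution to the gate changes.
        results.append({**f, "exceptions_applied": applied, "counts_against_gate": not applied})
    return results
```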
|
||||
|
||||
## 3.7 Auditability requirements
|
||||
|
||||
Exception management must be audit-ready by construction.
|
||||
|
||||
Minimum requirements:
|
||||
|
||||
* **Append-only event log** for create/approve/revoke/expire/renew actions
|
||||
* **Versioning**: every change results in a new version or event
|
||||
* **Tamper-evidence**: hash chain events or sign event batches
|
||||
* **Retention**: define retention policy and export strategy
|
||||
|
||||
PM guideline: auditors will ask “who approved,” “why,” “when,” “what scope,” and “what changed since.” Design the UX and exports to answer those in minutes.
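
One way to meet the tamper-evidence requirement is a hash-chained, append-only event log; the sketch below is illustrative rather than a prescribed storage design:

```python
import hashlib
import json

def append_event(log, event):
    """Append an exception lifecycle event to a hash-chained, append-only log.

    Each entry commits to the previous entry's hash, so any in-place edit breaks
    the chain. Shapes are illustrative; a real store would also sign event batches.
    """
    prev_hash = log[-1]["entry_hash"] if log else "0" * 64
    body = {"prev_hash": prev_hash, "event": event}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    log.append({**body, "entry_hash": hashlib.sha256(canonical).hexdigest()})
    return log

def verify_chain(log):
    """Recompute every entry hash; return True only if nothing was altered or reordered."""
    prev_hash = "0" * 64
    for entry in log:
        body = {"prev_hash": entry["prev_hash"], "event": entry["event"]}
        canonical = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
        if entry["prev_hash"] != prev_hash or \
           hashlib.sha256(canonical).hexdigest() != entry["entry_hash"]:
            return False
        prev_hash = entry["entry_hash"]
    return True
```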
|
||||
|
||||
## 3.8 UX guidelines
|
||||
|
||||
Key UX flows:
|
||||
|
||||
* **Create exception from a finding** (pre-fill CVE/component/artifact scope)
|
||||
* **Preview impact** (“this will suppress 37 findings across 12 images; are you sure?”)
|
||||
* **Expiry visibility** (countdown, alerts, renewal prompts)
|
||||
* **Audit trail view** (who did what, with diffs between versions)
|
||||
* **Search and filters** by owner, reason, expiry window, scope breadth, environment
|
||||
|
||||
UX anti-patterns to forbid:
|
||||
|
||||
* “Ignore all vulnerabilities in this image” with one click
|
||||
* Silent suppressions without owner/expiry
|
||||
* Exceptions created without linking to scope and reason
|
||||
|
||||
## 3.9 Product acceptance criteria (PM-owned)
|
||||
|
||||
A feature is not “done” until:
|
||||
|
||||
* Every exception has owner, expiry, reason code, scope.
|
||||
* Exception history is immutable and exportable.
|
||||
* Policy outcomes show applied exceptions and why.
|
||||
* Expiry is enforced automatically.
|
||||
* A user can answer: “What exceptions were active for this release?” within 2 minutes.
|
||||
|
||||
---
|
||||
|
||||
# 4. Audit packs
|
||||
|
||||
## 4.1 What an audit pack is
|
||||
|
||||
An Audit Pack is a **portable, verifiable bundle** that answers:
|
||||
|
||||
* What was evaluated? (artifacts, versions, identities)
|
||||
* Under what policies? (policy version/config)
|
||||
* Using what knowledge state? (vuln DB snapshot, VEX inputs)
|
||||
* What exceptions were applied? (IDs, owners, rationales)
|
||||
* What was the decision and why? (verdict + evidence pointers)
|
||||
* What changed since the last release? (optional diff summary)
|
||||
|
||||
PM guideline: treat the Audit Pack as a product deliverable, not an export button.
|
||||
|
||||
## 4.2 Pack structure (recommended)
|
||||
|
||||
Use a predictable, documented layout. Example:
|
||||
|
||||
* `manifest.json`
|
||||
|
||||
* pack_id, generated_at, generator_version
|
||||
* hashes/digests of every included file
|
||||
* signing info (optional in v1; recommended soon)
|
||||
* `inputs/`
|
||||
|
||||
* artifact identifiers (digests), repo references (optional)
|
||||
* SBOM(s) (CycloneDX/SPDX)
|
||||
* `vex/`
|
||||
|
||||
* VEX docs used + any VEX produced
|
||||
* `policy/`
|
||||
|
||||
* policy bundle used (versioned)
|
||||
* evaluation settings
|
||||
* `exceptions/`
|
||||
|
||||
* all exceptions relevant to the evaluated scope
|
||||
* plus event logs / versions
|
||||
* `findings/`
|
||||
|
||||
* normalized findings list
|
||||
* reachability evidence fragments if applicable
|
||||
* `verdict/`
|
||||
|
||||
* final decision object
|
||||
* explanation summary
|
||||
* signed attestation (if supported)
|
||||
* `diff/` (optional)
|
||||
|
||||
* delta from prior baseline (what changed materially)
|
||||
|
||||
## 4.3 Formats: human and machine
|
||||
|
||||
You need both:
|
||||
|
||||
* **Machine-readable** (JSON + standard SBOM/VEX formats) for verification and automation
|
||||
* **Human-readable** summary (HTML or PDF) for auditors and leadership
|
||||
|
||||
PM guideline: machine artifacts are the source of truth. Human docs are derived views.
|
||||
|
||||
Eng guideline:
|
||||
|
||||
* Ensure the pack can be generated **offline**.
|
||||
* Ensure deterministic outputs where feasible (stable ordering, consistent serialization).
|
||||
|
||||
## 4.4 Integrity and verification
|
||||
|
||||
At minimum:
|
||||
|
||||
* `manifest.json` includes a digest for each file.
|
||||
* Provide a `stella verify-pack` CLI (sketched after this section) that checks:
|
||||
|
||||
* manifest integrity
|
||||
* file hashes
|
||||
* schema versions
|
||||
* optional signature verification
|
||||
|
||||
For v2:
|
||||
|
||||
* Sign the manifest (and/or the verdict) using your standard attestation mechanism.
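
To make the verification checks concrete, a stripped-down verifier in the spirit of `stella verify-pack` might look like the following (the `files` map inside `manifest.json` is an assumed layout; signature and schema-version checks are omitted):

```python
import hashlib
import json
import pathlib

def verify_pack(pack_dir):
    """Check that every file listed in manifest.json exists and matches its recorded digest.

    Illustrative only: assumes manifest.json contains {"files": {"relative/path": "sha256:<hex>"}}.
    """
    root = pathlib.Path(pack_dir)
    manifest = json.loads((root / "manifest.json").read_text())
    problems = []
    for rel_path, expected in sorted(manifest["files"].items()):
        f = root / rel_path
        if not f.exists():
            problems.append(f"missing: {rel_path}")
            continue
        actual = "sha256:" + hashlib.sha256(f.read_bytes()).hexdigest()
        if actual != expected:
            problems.append(f"digest mismatch: {rel_path}")
    return {"ok": not problems, "problems": problems}
```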
|
||||
|
||||
## 4.5 Confidentiality and redaction
|
||||
|
||||
Audit packs often include sensitive data (paths, internal package names, repo URLs).
|
||||
|
||||
PM guideline:
|
||||
|
||||
* Provide **redaction profiles**:
|
||||
|
||||
* external auditor pack (minimal identifiers)
|
||||
* internal audit pack (full detail)
|
||||
* Provide encryption options (password/recipient keys) if packs leave the environment.
|
||||
|
||||
Eng guideline:
|
||||
|
||||
* Redaction must be deterministic and declarative (policy-based).
|
||||
* Pack generation must not leak secrets from raw scan logs.
|
||||
|
||||
## 4.6 Pack generation workflow
|
||||
|
||||
Key product flows:
|
||||
|
||||
* Generate pack for:
|
||||
|
||||
* a specific artifact digest
|
||||
* a release (set of digests)
|
||||
* an environment snapshot (e.g., cluster inventory)
|
||||
* a date range (for audit period)
|
||||
* Trigger sources:
|
||||
|
||||
* UI
|
||||
* API
|
||||
* CI pipeline step
|
||||
|
||||
Engineering:
|
||||
|
||||
* Treat pack generation as an async job (queue + status endpoint).
|
||||
* Cache pack components when inputs are identical (avoid repeated work).
|
||||
|
||||
## 4.7 What must be included (minimum viable audit pack)
|
||||
|
||||
PMs should enforce that v1 includes:
|
||||
|
||||
* Artifact identity
|
||||
* SBOM(s) or component inventory
|
||||
* Findings list (normalized)
|
||||
* Policy bundle reference + policy content
|
||||
* Exceptions applied (full object + version info)
|
||||
* Final verdict + explanation summary
|
||||
* Integrity manifest with file hashes
|
||||
|
||||
Add these when available (v1.5+):
|
||||
|
||||
* VEX inputs and outputs
|
||||
* Knowledge snapshot references
|
||||
* Reachability evidence fragments
|
||||
* Diff summary vs prior release
|
||||
|
||||
## 4.8 Product acceptance criteria (PM-owned)
|
||||
|
||||
Audit Packs are not “done” until:
|
||||
|
||||
* A third party can validate the pack contents haven’t been altered (hash verification).
|
||||
* The pack answers “why did this pass/fail?” including exceptions applied.
|
||||
* Packs can be generated without external network calls (air-gap friendly).
|
||||
* Packs support redaction profiles.
|
||||
* Pack schema is versioned and backward compatible.
|
||||
|
||||
---
|
||||
|
||||
# 5. Cross-cutting: roles, responsibilities, and delivery checkpoints
|
||||
|
||||
## 5.1 Responsibilities
|
||||
|
||||
**Product Manager**
|
||||
|
||||
* Define exception types and required fields
|
||||
* Define reason code taxonomy and governance policies
|
||||
* Define approval rules by environment and scope breadth
|
||||
* Define audit pack templates, profiles, and export targets
|
||||
* Own acceptance criteria and audit usability testing
|
||||
|
||||
**Development Manager / Tech Lead**
|
||||
|
||||
* Own event model (immutability, versioning, retention)
|
||||
* Own policy evaluation semantics and determinism guarantees
|
||||
* Own integrity and signing design (manifest hashes, optional signatures)
|
||||
* Own performance and scalability targets (pack generation and query latency)
|
||||
* Own secure storage and access controls (RBAC, tenant isolation)
|
||||
|
||||
## 5.2 Deliverables checklist (for each capability)
|
||||
|
||||
For “Exception Objects”:
|
||||
|
||||
* PRD + threat model (abuse cases: blanket waivers, privilege escalation)
|
||||
* Schema spec + versioning policy
|
||||
* API endpoints + RBAC model
|
||||
* UI flows + audit trail UI
|
||||
* Policy engine semantics + test vectors
|
||||
* Metrics dashboards
|
||||
|
||||
For “Audit Packs”:
|
||||
|
||||
* Pack schema spec + folder layout
|
||||
* Manifest + hash verification rules
|
||||
* Generator service + async job API
|
||||
* Redaction profiles + tests
|
||||
* Verifier CLI + documentation
|
||||
* Performance benchmarks + caching strategy
|
||||
|
||||
---
|
||||
|
||||
# 6. Common failure modes to actively prevent
|
||||
|
||||
1. **Exceptions become suppressions again**
|
||||
If you allow exceptions without expiry/owner or without audit trail, you’ve rebuilt “ignore lists.”
|
||||
|
||||
2. **Over-broad scopes by default**
|
||||
If “all repos/all images” is easy, you will accumulate permanent waivers and lose credibility.
|
||||
|
||||
3. **No deterministic semantics**
|
||||
If the same artifact can pass/fail depending on evaluation order or transient feed updates, auditors will distrust outputs.
|
||||
|
||||
4. **Audit packs that are reports, not evidence**
|
||||
A PDF without machine-verifiable artifacts is not an audit pack—it’s a slide.
|
||||
|
||||
5. **No renewal discipline**
|
||||
If renewals are frictionless and don’t require re-justification, exceptions never die.
|
||||
|
||||
---
|
||||
|
||||
# 7. Recommended phased rollout (to manage build cost)
|
||||
|
||||
**Phase 1: Governance basics**
|
||||
|
||||
* Exception object schema + lifecycle + expiry enforcement
|
||||
* Create-from-finding UX
|
||||
* Audit pack v1 (SBOM/inventory + findings + policy + exceptions + manifest)
|
||||
|
||||
**Phase 2: Evidence binding**
|
||||
|
||||
* Evidence refs on exceptions (VEX, reachability fragments)
|
||||
* Pack includes VEX inputs/outputs and knowledge snapshot identifiers
|
||||
|
||||
**Phase 3: Verifiable trust**
|
||||
|
||||
* Signed verdicts and/or signed pack manifests
|
||||
* Verifier tooling and deterministic replay hooks
|
||||
|
||||
---
|
||||
|
||||
If you want, I can convert the above into two artifacts your teams can execute against immediately:
|
||||
|
||||
1. A concise **PRD template** (sections + required decisions) for Exceptions and Audit Packs
|
||||
2. A **technical spec outline** (schema definitions, endpoints, state machines, and acceptance test vectors)
---
## Guidelines for Product and Development Managers: Signed, Replayable Risk Verdicts
|
||||
|
||||
### Purpose
|
||||
|
||||
Signed, replayable risk verdicts are the Stella Ops mechanism for producing a **cryptographically verifiable, audit‑ready decision** about an artifact (container image, VM image, filesystem snapshot, SBOM, etc.) that can be **recomputed later to the same result** using the same inputs (“time-travel replay”).
|
||||
|
||||
This capability is not “scan output with a signature.” It is a **decision artifact** that becomes the unit of governance in CI/CD, registry admission, and audits.
|
||||
|
||||
---
|
||||
|
||||
# 1) Shared definitions and non-negotiables
|
||||
|
||||
## 1.1 Definitions
|
||||
|
||||
**Risk verdict**
|
||||
A structured decision: *Pass / Fail / Warn / Needs‑Review* (or similar), produced by a deterministic evaluator under a specific policy and knowledge state.
|
||||
|
||||
**Signed**
|
||||
The verdict is wrapped in a tamper‑evident envelope (e.g., DSSE/in‑toto statement) and signed using an organization-approved trust model (key-based, keyless, or offline CA).
|
||||
|
||||
**Replayable**
|
||||
Given the same:
|
||||
|
||||
* target artifact identity
|
||||
* SBOM (or derivation method)
|
||||
* vulnerability and advisory knowledge state
|
||||
* VEX inputs
|
||||
* policy bundle
|
||||
* evaluator version
|
||||
…Stella Ops can **re-evaluate and reproduce the same verdict** and provide evidence equivalence.
|
||||
|
||||
> Critical nuance: replayability is about *result equivalence*. Byte‑for‑byte equality is ideal but not always required if signatures/metadata necessarily vary. If byte‑for‑byte is a goal, you must strictly control timestamps, ordering, and serialization.
|
||||
|
||||
---
|
||||
|
||||
## 1.2 Non-negotiables (what must be true in v1)
|
||||
|
||||
1. **Verdicts are bound to immutable artifact identity**
|
||||
|
||||
* Container image: digest (sha256:…)
|
||||
* SBOM: content digest
|
||||
* File tree: merkle root digest, or equivalent
|
||||
|
||||
2. **Verdicts are deterministic**
|
||||
|
||||
* No “current time” dependence in scoring
|
||||
* No non-deterministic ordering of findings
|
||||
* No implicit network calls during evaluation
|
||||
|
||||
3. **Verdicts are explainable**
|
||||
|
||||
* Every deny/block decision must cite the policy clause and evidence pointers that triggered it.
|
||||
|
||||
4. **Verdicts are verifiable**
|
||||
|
||||
* Independent verification toolchain exists (CLI/library) that validates signature and checks referenced evidence integrity.
|
||||
|
||||
5. **Knowledge state is pinned**
|
||||
|
||||
* The verdict references a “knowledge snapshot” (vuln feeds, advisories, VEX set) by digest/ID, not “latest.”
|
||||
|
||||
---
|
||||
|
||||
## 1.3 Explicit non-goals (avoid scope traps)
|
||||
|
||||
* Building a full CNAPP runtime protection product as part of verdicting.
|
||||
* Implementing “all possible attestation standards.” Pick one canonical representation; support others via adapters.
|
||||
* Solving global revocation and key lifecycle for every ecosystem on day one; define a minimum viable trust model per deployment mode.
|
||||
|
||||
---
|
||||
|
||||
# 2) Product Management Guidelines
|
||||
|
||||
## 2.1 Position the verdict as the primary product artifact
|
||||
|
||||
**PM rule:** if a workflow does not end in a verdict artifact, it is not part of this moat.
|
||||
|
||||
Examples:
|
||||
|
||||
* CI pipeline step produces `VERDICT.attestation` attached to the OCI artifact.
|
||||
* Registry admission checks for a valid verdict attestation meeting policy.
|
||||
* Audit export bundles the verdict plus referenced evidence.
|
||||
|
||||
**Avoid:** “scan reports” as the goal. Reports are views; the verdict is the object.
|
||||
|
||||
---
|
||||
|
||||
## 2.2 Define the core personas and success outcomes
|
||||
|
||||
Minimum personas:
|
||||
|
||||
1. **Release/Platform Engineering**
|
||||
|
||||
* Needs automated gates, reproducibility, and low friction.
|
||||
2. **Security Engineering / AppSec**
|
||||
|
||||
* Needs evidence, explainability, and exception workflows.
|
||||
3. **Audit / Compliance**
|
||||
|
||||
* Needs replay, provenance, and a defensible trail.
|
||||
|
||||
Define “first value” for each:
|
||||
|
||||
* Release engineer: gate merges/releases without re-running scans.
|
||||
* Security engineer: investigate a deny decision with evidence pointers in minutes.
|
||||
* Auditor: replay a verdict months later using the same knowledge snapshot.
|
||||
|
||||
---
|
||||
|
||||
## 2.3 Product requirements (expressed as “shall” statements)
|
||||
|
||||
### 2.3.1 Verdict content requirements
|
||||
|
||||
A verdict SHALL contain:
|
||||
|
||||
* **Subject**: immutable artifact reference (digest, type, locator)
|
||||
* **Decision**: pass/fail/warn/etc.
|
||||
* **Policy binding**: policy bundle ID + version + digest
|
||||
* **Knowledge snapshot binding**: snapshot IDs/digests for vuln feed and VEX set
|
||||
* **Evaluator binding**: evaluator name/version + schema version
|
||||
* **Rationale summary**: stable short explanation (human-readable)
|
||||
* **Findings references**: pointers to detailed findings/evidence (content-addressed)
|
||||
* **Unknowns state**: explicit unknown counts and categories
|
||||
|
||||
### 2.3.2 Replay requirements
|
||||
|
||||
The product SHALL support:
|
||||
|
||||
* Re-evaluating the same subject under the same policy+knowledge snapshot
|
||||
* Proving equivalence of inputs used in the original verdict
|
||||
* Producing a “replay report” that states:
|
||||
|
||||
* replay succeeded and matched
|
||||
* or replay failed and why (e.g., missing evidence, policy changed)
|
||||
|
||||
### 2.3.3 UX requirements
|
||||
|
||||
UI/UX SHALL:
|
||||
|
||||
* Show verdict status clearly (Pass/Fail/…)
|
||||
* Display:
|
||||
|
||||
* policy clause(s) responsible
|
||||
* top evidence pointers
|
||||
* knowledge snapshot ID
|
||||
* signature trust status (who signed, chain validity)
|
||||
* Provide “Replay” as an action (even if replay happens offline, the UX must guide it)
|
||||
|
||||
---
|
||||
|
||||
## 2.4 Product taxonomy: separate “verdicts” from “evaluations” from “attestations”
|
||||
|
||||
This is where many products get confused. Your terminology must remain strict:
|
||||
|
||||
* **Evaluation**: internal computation that produces decision + findings.
|
||||
* **Verdict**: the stable, canonical decision payload (the thing being signed).
|
||||
* **Attestation**: the signed envelope binding the verdict to cryptographic identity.
|
||||
|
||||
PMs must enforce this vocabulary in PRDs, UI labels, and docs.
|
||||
|
||||
---
|
||||
|
||||
## 2.5 Policy model guidelines for verdicting
|
||||
|
||||
Verdicting depends on policy discipline.
|
||||
|
||||
PM rules:
|
||||
|
||||
* Policy must be **versioned** and **content-addressed**.
|
||||
* Policies must be **pure functions** of declared inputs:
|
||||
|
||||
* SBOM graph
|
||||
* VEX claims
|
||||
* vulnerability data
|
||||
* reachability evidence (if present)
|
||||
* environment assertions (if present)
|
||||
* Policies must produce:
|
||||
|
||||
* a decision
|
||||
* plus a minimal explanation graph (policy rule ID → evidence IDs)
|
||||
|
||||
Avoid “freeform scripts” early. You need determinism and auditability.
|
||||
|
||||
---
|
||||
|
||||
## 2.6 Exceptions are part of the verdict product, not an afterthought
|
||||
|
||||
PM requirement:
|
||||
|
||||
* Exceptions must be first-class objects with:
|
||||
|
||||
* scope (exact artifact/component range)
|
||||
* owner
|
||||
* justification
|
||||
* expiry
|
||||
* required evidence (optional but strongly recommended)
|
||||
|
||||
And verdict logic must:
|
||||
|
||||
* record that an exception was applied
|
||||
* include exception IDs in the verdict evidence graph
|
||||
* make exception usage visible in UI and audit pack exports
|
||||
|
||||
---
|
||||
|
||||
## 2.7 Success metrics (PM-owned)
|
||||
|
||||
Choose metrics that reflect the moat:
|
||||
|
||||
* **Replay success rate**: % of verdicts that can be replayed after N days.
|
||||
* **Policy determinism incidents**: number of non-deterministic evaluation bugs.
|
||||
* **Audit cycle time**: time to satisfy an audit evidence request for a release.
|
||||
* **Noise**: # of manual suppressions/overrides per 100 releases (should drop).
|
||||
* **Gate adoption**: % of releases gated by verdict attestations (not reports).
|
||||
|
||||
---
|
||||
|
||||
# 3) Development Management Guidelines
|
||||
|
||||
## 3.1 Architecture principles (engineering tenets)
|
||||
|
||||
### Tenet A: Determinism-first evaluation
|
||||
|
||||
Engineering SHALL ensure evaluation is deterministic across:
|
||||
|
||||
* OS and architecture differences (as much as feasible)
|
||||
* concurrency scheduling
|
||||
* non-ordered data structures
|
||||
|
||||
Practical rules:
|
||||
|
||||
* Never iterate over maps/hashes without sorting keys.
|
||||
* Canonicalize output ordering (findings sorted by stable tuple: (component_id, cve_id, path, rule_id)).
|
||||
* Keep “generated at” timestamps out of the signed payload; if needed, place them in an unsigned wrapper or separate metadata field excluded from signature.
|
||||
|
||||
### Tenet B: Content-address everything
|
||||
|
||||
All significant inputs/outputs should have content digests:
|
||||
|
||||
* SBOM digest
|
||||
* policy digest
|
||||
* knowledge snapshot digest
|
||||
* evidence bundle digest
|
||||
* verdict digest
|
||||
|
||||
This makes replay and integrity checks possible.
|
||||
|
||||
### Tenet C: No hidden network
|
||||
|
||||
During evaluation, the engine must not fetch “latest” anything.
|
||||
Network is allowed only in:
|
||||
|
||||
* snapshot acquisition phase
|
||||
* artifact retrieval phase
|
||||
* attestation publication phase
|
||||
…and each must be explicitly logged and pinned.
|
||||
|
||||
---
|
||||
|
||||
## 3.2 Canonical verdict schema and serialization rules
|
||||
|
||||
**Engineering guideline:** pick a canonical serialization and stick to it.
|
||||
|
||||
Options:
|
||||
|
||||
* Canonical JSON (JCS or equivalent)
|
||||
* CBOR with deterministic encoding
|
||||
|
||||
Rules:
|
||||
|
||||
* Define a **schema version** and strict validation.
|
||||
* Make field names stable; avoid “optional” fields that appear/disappear nondeterministically.
|
||||
* Ensure numeric formatting is stable (no float drift; prefer integers or rational representation).
|
||||
* Always include empty arrays if required for stability, or exclude consistently by schema rule.
|
||||
|
||||
---
|
||||
|
||||
## 3.3 Suggested verdict payload (illustrative)
|
||||
|
||||
This is not a mandate—use it as a baseline structure.
|
||||
|
||||
```json
|
||||
{
|
||||
"schema_version": "1.0",
|
||||
"subject": {
|
||||
"type": "oci-image",
|
||||
"name": "registry.example.com/app/service",
|
||||
"digest": "sha256:…",
|
||||
"platform": "linux/amd64"
|
||||
},
|
||||
"evaluation": {
|
||||
"evaluator": "stella-eval",
|
||||
"evaluator_version": "0.9.0",
|
||||
"policy": {
|
||||
"id": "prod-default",
|
||||
"version": "2025.12.1",
|
||||
"digest": "sha256:…"
|
||||
},
|
||||
"knowledge_snapshot": {
|
||||
"vuln_db_digest": "sha256:…",
|
||||
"advisory_digest": "sha256:…",
|
||||
"vex_set_digest": "sha256:…"
|
||||
}
|
||||
},
|
||||
"decision": {
|
||||
"status": "fail",
|
||||
"score": 87,
|
||||
"reasons": [
|
||||
{ "rule_id": "RISK.CRITICAL.REACHABLE", "evidence_ref": "sha256:…" }
|
||||
],
|
||||
"unknowns": {
|
||||
"unknown_reachable": 2,
|
||||
"unknown_unreachable": 0
|
||||
}
|
||||
},
|
||||
"evidence": {
|
||||
"sbom_digest": "sha256:…",
|
||||
"finding_bundle_digest": "sha256:…",
|
||||
"inputs_manifest_digest": "sha256:…"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Then wrap this payload in your chosen attestation envelope and sign it.
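
A minimal sketch of that canonicalize-then-wrap step, assuming a JCS-like canonical JSON and a caller-supplied signing function (key handling per the security guidelines in 3.9 is out of scope here):

```python
import base64
import json

def canonical_bytes(payload: dict) -> bytes:
    # Simplified canonical JSON: sorted keys, no insignificant whitespace, UTF-8.
    # A production implementation would follow a full canonicalization spec (e.g. JCS).
    return json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")

def dsse_envelope(payload: dict, payload_type: str, sign) -> dict:
    """Wrap a verdict payload in a DSSE-style envelope.

    `sign` is a caller-supplied function from bytes to signature bytes (KMS, HSM,
    keyless, or offline key); it is a placeholder, not a specific library call.
    """
    body = canonical_bytes(payload)
    type_bytes = payload_type.encode("utf-8")
    # DSSE pre-authentication encoding: "DSSEv1 <len(type)> <type> <len(body)> <body>"
    pae = b"DSSEv1 %d %s %d %s" % (len(type_bytes), type_bytes, len(body), body)
    return {
        "payloadType": payload_type,
        "payload": base64.b64encode(body).decode("ascii"),
        "signatures": [{"sig": base64.b64encode(sign(pae)).decode("ascii")}],
    }
```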
|
||||
|
||||
---
|
||||
|
||||
## 3.4 Attestation format and storage guidelines
|
||||
|
||||
Development managers must enforce a consistent publishing model:
|
||||
|
||||
1. **Envelope**
|
||||
|
||||
* Prefer DSSE/in-toto style envelope because it:
|
||||
|
||||
* standardizes signing
|
||||
* supports multiple signature schemes
|
||||
* is widely adopted in supply chain ecosystems
|
||||
|
||||
2. **Attachment**
|
||||
|
||||
* OCI artifacts should carry verdicts as referrers/attachments to the subject digest (preferred).
|
||||
* For non-OCI targets, store in an internal ledger keyed by the subject digest/ID.
|
||||
|
||||
3. **Verification**
|
||||
|
||||
* Provide:
|
||||
|
||||
* `stella verify <artifact>` → checks signature and integrity references
|
||||
* `stella replay <verdict>` → re-run evaluation from snapshots and compare
|
||||
|
||||
4. **Transparency / logs**
|
||||
|
||||
* Optional in v1, but plan for:
|
||||
|
||||
* transparency log (public or private) to strengthen auditability
|
||||
* offline alternatives for air-gapped customers
|
||||
|
||||
---
|
||||
|
||||
## 3.5 Knowledge snapshot engineering requirements
|
||||
|
||||
A “snapshot” must be an immutable bundle, ideally content-addressed:
|
||||
|
||||
Snapshot includes:
|
||||
|
||||
* vulnerability database at a specific point
|
||||
* advisory sources (OS distro advisories)
|
||||
* VEX statement set(s)
|
||||
* any enrichment signals that influence scoring
|
||||
|
||||
Rules:
|
||||
|
||||
* Snapshot resolution must be explicit: “use snapshot digest X”
|
||||
* Must support export/import for air-gapped deployments
|
||||
* Must record source provenance and ingestion timestamps (timestamps may be excluded from signed payload if they cause nondeterminism; store them in snapshot metadata)
|
||||
|
||||
---
|
||||
|
||||
## 3.6 Replay engine requirements
|
||||
|
||||
Replay is not “re-run scan and hope it matches.”
|
||||
|
||||
Replay must:
|
||||
|
||||
* retrieve the exact subject (or confirm it via digest)
|
||||
* retrieve the exact SBOM (or deterministically re-generate it from the subject in a defined way)
|
||||
* load exact policy bundle by digest
|
||||
* load exact knowledge snapshot by digest
|
||||
* run evaluator version pinned in verdict (or enforce a compatibility mapping)
|
||||
* produce:
|
||||
|
||||
* verdict-equivalence result
|
||||
* a delta explanation if mismatch occurs
|
||||
|
||||
Engineering rule: replay must fail loudly and specifically when inputs are missing.
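
A sketch of that contract, with `load_input` and `evaluate` standing in for the snapshot store and the deterministic evaluator (both placeholders); the digest paths follow the illustrative payload in 3.3:

```python
def replay_and_compare(original_verdict, load_input, evaluate):
    """Fail loudly on missing inputs, then compare the replayed decision to the original."""
    required = {
        "policy": original_verdict["evaluation"]["policy"]["digest"],
        "vuln_db": original_verdict["evaluation"]["knowledge_snapshot"]["vuln_db_digest"],
        "vex_set": original_verdict["evaluation"]["knowledge_snapshot"]["vex_set_digest"],
        "sbom": original_verdict["evidence"]["sbom_digest"],
    }
    missing = sorted(name for name, digest in required.items() if load_input(digest) is None)
    if missing:
        # Specific, actionable failure instead of a silent "best effort" re-scan.
        return {"result": "not_replayable", "missing_inputs": missing}

    replayed = evaluate(
        subject=original_verdict["subject"],
        policy=load_input(required["policy"]),
        sbom=load_input(required["sbom"]),
        knowledge={"vuln_db": load_input(required["vuln_db"]),
                   "vex_set": load_input(required["vex_set"])},
    )
    if replayed["decision"] == original_verdict["decision"]:
        return {"result": "match"}
    return {
        "result": "mismatch",
        "original_status": original_verdict["decision"]["status"],
        "replayed_status": replayed["decision"]["status"],
    }
```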
|
||||
|
||||
---
|
||||
|
||||
## 3.7 Testing strategy (required)
|
||||
|
||||
Deterministic systems require “golden” testing.
|
||||
|
||||
Minimum tests (a sketch of the golden and mutation cases follows this list):
|
||||
|
||||
1. **Golden verdict tests**
|
||||
|
||||
* Fixed artifact + fixed snapshots + fixed policy
|
||||
* Expected verdict output must match exactly
|
||||
|
||||
2. **Cross-platform determinism tests**
|
||||
|
||||
* Run same evaluation on different machines/containers and compare outputs
|
||||
|
||||
3. **Mutation tests for determinism**
|
||||
|
||||
* Randomize ordering of internal collections; output should remain unchanged
|
||||
|
||||
4. **Replay regression tests**
|
||||
|
||||
* Store verdict + snapshots and replay after code changes to ensure compatibility guarantees hold
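
For example, the golden and mutation cases could be written as pytest-style checks (fixture names and the `evaluate` entry point are placeholders, not the real API):

```python
import copy
import random

def test_golden_verdict_is_stable(evaluate, golden_inputs, golden_verdict):
    # Fixed artifact + fixed snapshots + fixed policy must reproduce the stored verdict exactly.
    assert evaluate(**golden_inputs) == golden_verdict

def test_collection_order_does_not_change_verdict(evaluate, golden_inputs, golden_verdict):
    shuffled = copy.deepcopy(golden_inputs)
    # Mutate only the internal ordering of a collection; the verdict must not change.
    random.Random(42).shuffle(shuffled["finding_sources"])
    assert evaluate(**shuffled) == golden_verdict
```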
|
||||
|
||||
---
|
||||
|
||||
## 3.8 Versioning and backward compatibility guidelines
|
||||
|
||||
This is essential to prevent “replay breaks after upgrades.”
|
||||
|
||||
Rules:
|
||||
|
||||
* **Verdict schema version** changes must be rare and carefully managed.
|
||||
* Maintain a compatibility matrix:
|
||||
|
||||
* evaluator vX can replay verdict schema vY
|
||||
* If you must evolve logic, do so by:
|
||||
|
||||
* bumping evaluator version
|
||||
* preserving older evaluators in a compatibility mode (containerized evaluators are often easiest)
|
||||
|
||||
---
|
||||
|
||||
## 3.9 Security and key management guidelines
|
||||
|
||||
Development managers must ensure:
|
||||
|
||||
* Signing keys are managed via:
|
||||
|
||||
* KMS/HSM (enterprise)
|
||||
* keyless (OIDC-based) where acceptable
|
||||
* offline keys for air-gapped
|
||||
|
||||
* Verification trust policy is explicit:
|
||||
|
||||
* which identities are trusted to sign verdicts
|
||||
* which policies are accepted
|
||||
* whether transparency is required
|
||||
* how to handle revocation/rotation
|
||||
|
||||
* Separate “can sign” from “can publish”
|
||||
|
||||
* Signing should be restricted; publishing may be broader.
|
||||
|
||||
---
|
||||
|
||||
# 4) Operational workflow requirements (cross-functional)
|
||||
|
||||
## 4.1 CI gate flow
|
||||
|
||||
* Build artifact
|
||||
* Produce SBOM deterministically (or record SBOM digest if generated elsewhere)
|
||||
* Evaluate → produce verdict payload
|
||||
* Sign verdict → publish attestation attached to artifact
|
||||
* Gate decision uses verification of:
|
||||
|
||||
* signature validity
|
||||
* policy compliance
|
||||
* snapshot integrity
|
||||
|
||||
## 4.2 Registry / admission flow
|
||||
|
||||
* Admission controller checks for a valid, trusted verdict attestation
|
||||
* Optionally requires:
|
||||
|
||||
* the verdict to be based on a knowledge snapshot no older than X (a policy decision)
|
||||
* no expired exceptions
|
||||
* replay not required (replay is for audits; admission is fast-path)
|
||||
|
||||
## 4.3 Audit flow
|
||||
|
||||
* Export “audit pack”:
|
||||
|
||||
* verdict + signature chain
|
||||
* policy bundle
|
||||
* knowledge snapshot
|
||||
* referenced evidence bundles
|
||||
* Auditor (or internal team) runs `verify` and optionally `replay`
|
||||
|
||||
---
|
||||
|
||||
# 5) Common failure modes to avoid
|
||||
|
||||
1. **Signing “findings” instead of a decision**
|
||||
|
||||
* Leads to unbounded payload growth and weak governance semantics.
|
||||
|
||||
2. **Using “latest” feeds during evaluation**
|
||||
|
||||
* Breaks replayability immediately.
|
||||
|
||||
3. **Embedding timestamps in signed payload**
|
||||
|
||||
* Eliminates deterministic byte-level reproducibility.
|
||||
|
||||
4. **Letting the UI become the source of truth**
|
||||
|
||||
* The verdict artifact must be the authority; UI is a view.
|
||||
|
||||
5. **No clear separation between: evidence store, snapshot store, verdict store**
|
||||
|
||||
* Creates coupling and makes offline operations painful.
|
||||
|
||||
---
|
||||
|
||||
# 6) Definition of Done checklist (use this to gate release)
|
||||
|
||||
A feature increment for signed, replayable verdicts is “done” only if:
|
||||
|
||||
* [ ] Verdict binds to immutable subject digest
|
||||
* [ ] Verdict includes policy digest/version and knowledge snapshot digests
|
||||
* [ ] Verdict is signed and verifiable via CLI
|
||||
* [ ] Verification works offline (given exported artifacts)
|
||||
* [ ] Replay works with stored snapshots and produces match/mismatch output with reasons
|
||||
* [ ] Determinism tests pass (golden + mutation + cross-platform)
|
||||
* [ ] UI displays signer identity, policy, snapshot IDs, and rule→evidence links
|
||||
* [ ] Exceptions (if implemented) are recorded in verdict and enforced deterministically
|
||||
|
||||
---
|
||||
|
||||
## Optional: Recommended implementation sequence (keeps risk down)
|
||||
|
||||
1. Canonical verdict schema + deterministic evaluator skeleton
|
||||
2. Signing + verification CLI
|
||||
3. Snapshot bundle format + pinned evaluation
|
||||
4. Replay tool + golden tests
|
||||
5. OCI attachment publishing + registry/admission integration
|
||||
6. Evidence bundles + UI explainability
|
||||
7. Exceptions + audit pack export
|
||||
|
||||
---
|
||||
|
||||
If you want this turned into a formal internal PRD template, I can format it as:
|
||||
|
||||
* “Product requirements” (MUST/SHOULD/COULD)
|
||||
* “Engineering requirements” (interfaces + invariants + test plan)
|
||||
* “Security model” (trust roots, signing identities, verification policy)
|
||||
* “Acceptance criteria” for an MVP and for GA
---
Below are internal guidelines for Stella Ops Product Managers and Development Managers for the capability: **Knowledge Snapshots / Time‑Travel Replay**. This is written as an implementable operating standard (not a concept note).
|
||||
|
||||
---
|
||||
|
||||
# Knowledge Snapshots / Time‑Travel Replay
|
||||
|
||||
## Product and Engineering Guidelines for Stella Ops
|
||||
|
||||
## 1) Purpose and value proposition
|
||||
|
||||
### What this capability must achieve
|
||||
|
||||
Enable Stella Ops to **reproduce any historical risk decision** (scan result, policy evaluation, verdict) **deterministically**, using a **cryptographically bound snapshot** of the exact knowledge inputs that were available at the time the decision was made.
|
||||
|
||||
### Why customers pay for it
|
||||
|
||||
This capability is primarily purchased for:
|
||||
|
||||
* **Auditability**: “Show me what you knew, when you knew it, and why the system decided pass/fail.”
|
||||
* **Incident response**: reproduce prior posture using historical feeds/VEX/policies and explain deltas.
|
||||
* **Air‑gapped / regulated environments**: deterministic, offline decisioning with attested knowledge state.
|
||||
* **Change control**: prove whether a decision changed due to code change vs knowledge change.
|
||||
|
||||
### Core product promise
|
||||
|
||||
For a given artifact and snapshot:
|
||||
|
||||
* **Same inputs → same outputs** (verdict, scores, findings, evidence pointers), or Stella Ops must clearly declare the precise exceptions.
|
||||
|
||||
---
|
||||
|
||||
## 2) Definitions (PMs and engineers must align on these)
|
||||
|
||||
### Knowledge input
|
||||
|
||||
Any external or semi-external information that can influence the outcome:
|
||||
|
||||
* vulnerability databases and advisories (any source)
|
||||
* exploit-intel signals
|
||||
* VEX statements (OpenVEX, CSAF, CycloneDX VEX, etc.)
|
||||
* SBOM ingestion logic and parsing rules
|
||||
* package identification rules (including distro/backport logic)
|
||||
* policy content and policy engine version
|
||||
* scoring rules (including weights and thresholds)
|
||||
* trust anchors and signature verification policy
|
||||
* plugin versions and enabled capabilities
|
||||
* configuration defaults and overrides that change analysis
|
||||
|
||||
### Knowledge Snapshot
|
||||
|
||||
A **sealed record** of:
|
||||
|
||||
1. **References** (which inputs were used), and
|
||||
2. **Content** (the exact bytes used), and
|
||||
3. **Execution contract** (the evaluator and ruleset versions)
|
||||
|
||||
### Time‑Travel Replay
|
||||
|
||||
Re-running evaluation of an artifact **using only** the snapshot content and the recorded execution contract, producing the same decision and explainability artifacts.
|
||||
|
||||
---
|
||||
|
||||
## 3) Product principles (non‑negotiables)
|
||||
|
||||
1. **Determinism is a product requirement**, not an engineering detail.
|
||||
2. **Snapshots are first‑class artifacts** with explicit lifecycle (create, verify, export/import, retain, expire).
|
||||
3. **The snapshot is cryptographically bound** to outcomes and evidence (tamper-evident chain).
|
||||
4. **Replays must be possible offline** (when the snapshot includes content) and must fail clearly when not possible.
|
||||
5. **Minimal surprise**: the UI must explain when a verdict changed due to “knowledge drift” vs “artifact drift.”
|
||||
6. **Scalability by content addressing**: the platform must deduplicate knowledge content aggressively.
|
||||
7. **Backward compatibility**: old snapshots must remain replayable within a documented support window.
|
||||
|
||||
---
|
||||
|
||||
## 4) Scope boundaries (what this is not)
|
||||
|
||||
### Non-goals (explicitly out of scope for v1 unless approved)
|
||||
|
||||
* Reconstructing *external internet state* beyond what is recorded (no “fetch historical CVE state from the web”).
|
||||
* Guaranteeing replay across major engine rewrites without a compatibility plan.
|
||||
* Storing sensitive proprietary customer code in snapshots (unless explicitly enabled).
|
||||
* Replaying “live runtime signals” unless those signals were captured into the snapshot at decision time.
|
||||
|
||||
---
|
||||
|
||||
## 5) Personas and use cases (PM guidance)
|
||||
|
||||
### Primary personas
|
||||
|
||||
* **Security Governance / GRC**: needs audit packs, controls evidence, deterministic history.
|
||||
* **Incident response / AppSec lead**: needs “what changed and why” quickly.
|
||||
* **Platform engineering / DevOps**: needs reproducible CI gates and air‑gap workflows.
|
||||
* **Procurement / regulated customers**: needs proof of process and defensible attestations.
|
||||
|
||||
### Must-support use cases
|
||||
|
||||
1. **Replay a past release gate decision** in a new environment (including offline) and get identical outcome.
|
||||
2. **Explain drift**: “This build fails today but passed last month—why?”
|
||||
3. **Air‑gap export/import**: create snapshots in connected environment, import to disconnected one.
|
||||
4. **Audit bundle generation**: export snapshot + verdict(s) + evidence pointers.
|
||||
|
||||
---

## 6) Functional requirements (PM “must/should” list)

### Must

* **Snapshot creation** for every material evaluation (or for every “decision object” chosen by configuration).
* **Snapshot manifest** containing:

  * unique snapshot ID (content-addressed)
  * list of knowledge sources with hashes/digests
  * policy IDs and exact policy content hashes
  * engine version and plugin versions
  * timestamp and clock source metadata
  * trust anchor set hash and verification policy hash
* **Snapshot sealing**:

  * snapshot manifest is signed
  * signed link from verdict → snapshot ID
* **Replay**:

  * re-evaluate using only snapshot inputs
  * output must match prior results (or emit a deterministic mismatch report)
* **Export/import**:

  * portable bundle format
  * import verifies integrity and signatures before allowing use
* **Retention controls**:

  * configurable retention windows and storage quotas
  * deduplication and garbage collection

### Should

* **Partial snapshots** (reference-only) vs **full snapshots** (content included), with explicit replay guarantees.
* **Diff views**: compare two snapshots and highlight what knowledge changed.
* **Multi-snapshot replay**: run “as-of snapshot A” and “as-of snapshot B” to show drift impact.

### Could

* Snapshot “federation” for large orgs (mirrors/replication with policy controls).
* Snapshot “pinning” to releases or environments as a governance policy.

---

## 7) UX and workflow guidelines (PM + Eng)

### UI must communicate three states clearly

1. **Reproducible offline**: snapshot includes all required content.
2. **Reproducible with access**: snapshot references external sources that must be available.
3. **Not reproducible**: missing content or unsupported evaluator version.
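
A minimal sketch of how the pipeline might compute this status, assuming hypothetical fields (`referenced_digests`, `stored_digests`, `engine_version`) and caller-supplied context rather than the actual data model defined in section 8:

```python
from dataclasses import dataclass

@dataclass
class SnapshotInfo:
    # Hypothetical fields; the real manifest schema may differ.
    referenced_digests: set[str]   # digests listed in the manifest
    stored_digests: set[str]       # digests whose bytes are stored in the bundle
    engine_version: str            # evaluator version recorded at decision time

def replay_status(snap: SnapshotInfo, supported_engines: set[str],
                  reachable_digests: set[str]) -> str:
    """Map a snapshot to one of the three UI states listed above."""
    if snap.engine_version not in supported_engines:
        return "not reproducible"            # unsupported evaluator version
    missing = snap.referenced_digests - snap.stored_digests
    if not missing:
        return "reproducible offline"        # all referenced content is in the snapshot
    if missing <= reachable_digests:
        return "reproducible with access"    # external sources must be available
    return "not reproducible"                # some referenced content is gone
```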

### Required UI objects

* **Snapshot Details page**

  * snapshot ID and signature status
  * list of knowledge sources (name, version/epoch, digest, size)
  * policy bundle version, scoring rules version
  * trust anchors + verification policy digest
  * replay status: “verified reproducible / reproducible / not reproducible”
* **Verdict page**

  * links to snapshot(s)
  * “replay now” action
  * “compare to latest knowledge” action

### UX guardrails

* Never show “pass/fail” without also showing:

  * snapshot ID
  * policy ID/version
  * verification status
* When results differ on replay, show:

  * exact mismatch class (engine mismatch, missing data, nondeterminism, corrupted snapshot)
  * what input changed (if known)
  * remediation steps

---

## 8) Data model and format guidelines (Development Managers)

### Canonical objects (recommended minimum set)

* **KnowledgeSnapshotManifest (KSM)**
* **KnowledgeBlob** (content-addressed bytes)
* **KnowledgeSourceDescriptor**
* **PolicyBundle**
* **TrustBundle**
* **Verdict** (signed decision artifact)
* **ReplayReport** (records replay result and mismatches)

### Content addressing

* Use a stable hash (e.g., SHA‑256) for:

  * each knowledge blob
  * the manifest
  * the policy bundle
  * the trust bundle
* The snapshot ID should be derived from the manifest digest (see the sketch after the example manifest below).

### Example manifest shape (illustrative)

```json
{
  "snapshot_id": "ksm:sha256:…",
  "created_at": "2025-12-19T10:15:30Z",
  "engine": { "name": "stella-evaluator", "version": "1.7.0", "build": "…" },
  "plugins": [
    { "name": "pkg-id", "version": "2.3.1", "digest": "sha256:…" }
  ],
  "policy": { "bundle_id": "pol:sha256:…", "digest": "sha256:…" },
  "scoring": { "ruleset_id": "score:sha256:…", "digest": "sha256:…" },
  "trust": { "bundle_id": "trust:sha256:…", "digest": "sha256:…" },
  "sources": [
    {
      "name": "nvd",
      "epoch": "2025-12-18",
      "kind": "vuln_feed",
      "content_digest": "sha256:…",
      "licenses": ["…"],
      "origin": { "uri": "…", "retrieved_at": "…" }
    },
    {
      "name": "customer-vex",
      "kind": "vex",
      "content_digest": "sha256:…"
    }
  ],
  "environment": {
    "determinism_profile": "strict",
    "timezone": "UTC",
    "normalization": { "line_endings": "LF", "sort_order": "canonical" }
  }
}
```
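
One way the content-addressed `snapshot_id` could be derived from a manifest like the one above. This is a sketch, not a normative algorithm; it assumes canonical JSON is the chosen serialization and that the `snapshot_id` field itself is excluded before hashing:

```python
import hashlib
import json

def canonical_bytes(obj: dict) -> bytes:
    """Serialize deterministically: sorted keys, fixed separators, UTF-8."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")

def snapshot_id(manifest: dict) -> str:
    """Derive the content-addressed snapshot ID from the manifest digest."""
    body = {k: v for k, v in manifest.items() if k != "snapshot_id"}  # exclude the ID field itself
    digest = hashlib.sha256(canonical_bytes(body)).hexdigest()
    return f"ksm:sha256:{digest}"

# Two semantically identical manifests yield the same ID regardless of key order.
assert snapshot_id({"a": 1, "b": 2}) == snapshot_id({"b": 2, "a": 1})
```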

### Versioning rules

* Every object is immutable once written.
* Changes create new digests; never mutate in place.
* Support schema evolution via:

  * `schema_version`
  * strict validation + migration tooling
* Keep manifests small; store large data as blobs.

---

## 9) Determinism contract (Engineering must enforce)

### Determinism requirements

* Stable ordering: sort inputs and outputs canonically.
* Stable timestamps: timestamps may exist but must not change computed scores or verdicts.
* Stable randomization: no RNG; if it is unavoidable, record a fixed seed in the snapshot.
* Stable parsers: parser versions are pinned by digest; parsing must be deterministic.
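
As an illustration of what these requirements imply in code, a sketch of output normalization before a verdict is hashed or compared on replay; the field names (`findings`, `evaluated_at`, and so on) are assumptions, not the actual schema:

```python
def normalize_verdict(verdict: dict) -> dict:
    """Strip volatile fields and impose canonical ordering so reruns are byte-comparable."""
    volatile = {"evaluated_at", "duration_ms", "hostname"}   # assumed volatile fields
    findings = sorted(
        ({k: v for k, v in f.items() if k not in volatile}
         for f in verdict.get("findings", [])),
        key=lambda f: (f.get("component", ""), f.get("vuln_id", "")),
    )
    return {
        "decision": verdict.get("decision"),
        "policy_digest": verdict.get("policy_digest"),
        "snapshot_id": verdict.get("snapshot_id"),
        "findings": findings,
    }
```

Replay can then compare the digest of `canonical_bytes(normalize_verdict(...))` for the original and replayed runs.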

### Allowed nondeterminism (if any) must be explicit

If you must allow nondeterminism, it must be:

* documented,
* surfaced in UI,
* included in the replay report as a “non-deterministic factor,”
* and excluded from the signed decision if it affects pass/fail.

---
## 10) Security model (Development Managers)

### Threats this feature must address

* Feed poisoning (tampered vulnerability data)
* Time-of-check/time-of-use drift (the same artifact evaluated against moving feeds)
* Replay manipulation (swapped snapshot content)
* “Policy drift hiding” (claiming an old decision used different policies)
* Signature bypass (altered trust anchors)

### Controls required

* Sign manifests and verdicts.
* Bind verdict → snapshot ID → policy bundle hash → trust bundle hash (see the sketch after this list).
* Verify on every import and on every replay invocation.
* Audit log:

  * snapshot created
  * snapshot imported
  * replay executed
  * verification failures
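
A sketch of the binding verification a replay entry point could run before evaluating anything; the field names and the `verify_signature` callable are placeholders for whatever signing format is adopted:

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def verify_binding(verdict: dict, manifest_bytes: bytes,
                   policy_bytes: bytes, trust_bytes: bytes,
                   verify_signature) -> None:
    """Fail loudly if the verdict is not bound to exactly this snapshot, policy, and trust bundle."""
    if not verify_signature(verdict):                # signature over the verdict artifact
        raise ValueError("verdict signature invalid")
    if verdict["snapshot_id"] != f"ksm:sha256:{sha256_hex(manifest_bytes)}":
        raise ValueError("verdict does not reference this snapshot")
    if verdict["policy_digest"] != f"sha256:{sha256_hex(policy_bytes)}":
        raise ValueError("policy bundle digest mismatch")
    if verdict["trust_digest"] != f"sha256:{sha256_hex(trust_bytes)}":
        raise ValueError("trust bundle digest mismatch")
```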

### Key handling

* Decide and document:

  * who signs snapshots/verdicts (service keys vs tenant keys)
  * rotation policy
  * revocation/compromise handling
* Avoid designing cryptography from scratch; use well-established signing formats and separation of duties.

---

## 11) Offline / air‑gapped requirements

### Snapshot levels (PM packaging guideline)

Offer explicit snapshot types with clear guarantees:

* **Level A: Reference-only snapshot**

  * stores hashes + source descriptors
  * replay requires access to original sources
* **Level B: Portable snapshot**

  * includes blobs necessary for replay
  * replay works offline
* **Level C: Sealed portable snapshot**

  * portable + signed + includes trust anchors
  * replay works offline and can be verified independently

Do not market air‑gap support without specifying which level is provided.
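
A compact way to make the level explicit in tooling, so that claims can be checked against what a bundle actually contains; the field names are illustrative:

```python
from dataclasses import dataclass

@dataclass
class Bundle:
    has_all_blobs: bool       # Levels B and C: knowledge content is included
    is_signed: bool           # Level C: manifest and bundle are signed
    has_trust_anchors: bool   # Level C: verification possible without external infrastructure

def snapshot_level(b: Bundle) -> str:
    if b.has_all_blobs and b.is_signed and b.has_trust_anchors:
        return "C"  # sealed portable: offline replay + independent verification
    if b.has_all_blobs:
        return "B"  # portable: offline replay; verification may need external anchors
    return "A"      # reference-only: replay requires access to original sources
```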

---

## 12) Performance and storage guidelines

### Principles

* Content-address knowledge blobs to maximize deduplication.
* Separate “hot” knowledge (recent epochs) from cold storage.
* Support snapshot compaction and garbage collection.

### Operational requirements

* Retention policies per tenant/project/environment.
* Quotas and alerting when snapshot storage approaches limits.
* Export bundles should be chunked/streamable for large feeds.

---

## 13) Testing and acceptance criteria

### Required test categories

1. **Golden replay tests**

   * same artifact + same snapshot → identical outputs (see the sketch after this list)
2. **Corruption tests**

   * bit flips in blobs/manifests are detected and rejected
3. **Version skew tests**

   * old snapshot + new engine should either replay deterministically or fail with a clear incompatibility report
4. **Air‑gap tests**

   * export → import → replay without network access
5. **Diff accuracy tests**

   * compare snapshots and ensure the diff identifies actual knowledge changes, not noise
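
A sketch of the first two categories in a pytest-style harness; `load_snapshot`, `evaluate`, and `flip_one_byte` are hypothetical test helpers, and the fixture paths are placeholders:

```python
import pytest

# Hypothetical helpers; any real harness would provide its own equivalents.
from testsupport import evaluate, load_snapshot, flip_one_byte

def test_golden_replay():
    snap = load_snapshot("fixtures/snapshot-level-c.tar")
    first = evaluate("fixtures/artifact.cdx.json", snap)
    second = evaluate("fixtures/artifact.cdx.json", snap)
    # Same artifact + same snapshot must yield identical normalized verdicts.
    assert first == second

def test_corrupted_blob_is_rejected():
    snap = load_snapshot("fixtures/snapshot-level-c.tar")
    flip_one_byte(snap.blob_path("nvd"))   # simulate bit rot or tampering
    with pytest.raises(ValueError):
        snap.verify()                      # digest mismatch must fail verification
```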

### Definition of Done (DoD) for the feature

* Snapshots are created automatically according to policy.
* Snapshots can be exported and imported with verified integrity.
* Replay produces matching verdicts for a representative corpus.
* UI exposes snapshot provenance and replay status.
* Audit log records snapshot lifecycle events.
* Clear failure modes exist (missing blobs, incompatible engine, signature failure).

---

## 14) Metrics (PM ownership)

Track metrics that prove this is a moat, not a checkbox.

### Core KPIs

* **Replay success rate** (strict determinism)
* **Time to explain drift** (median time from “why did this change?” to root cause)
* **% of verdicts with sealed portable snapshots**
* **Audit effort reduction** (customer-reported or measured via workflow steps)
* **Storage efficiency** (dedup ratio; bytes per snapshot over time)

### Guardrail metrics

* Snapshot creation latency impact on CI
* Snapshot storage growth per tenant
* Verification failure rates

---

## 15) Common failure modes (what to prevent)

1. Treating snapshots as “metadata only” while still claiming replayability.
2. Allowing a “latest feed fetch” during replay (this breaks the promise).
3. Not pinning parser/policy/scoring versions, which causes silent drift.
4. Missing clear UX around replay limitations and failure reasons.
5. Overcapturing sensitive inputs (privacy and customer trust risk).
6. Underinvesting in dedup/retention (cost blowups).

---

## 16) Management checklists

### PM checklist (before commitment)

* Precisely define the “replay” guarantee level (A/B/C) for each SKU/environment.
* Define which inputs are in scope (feeds, VEX, policies, trust bundles, plugins).
* Define customer-facing workflows:

  * “replay now”
  * “compare to latest”
  * “export for audit / air-gap”
* Confirm governance outcomes:

  * audit pack integration
  * exception linkage
  * release gate linkage

### Development Manager checklist (before build)

* Establish canonical schemas and a versioning plan.
* Establish content-addressed storage + dedup plan.
* Establish signing and trust anchor strategy.
* Establish deterministic evaluation contract and test harness.
* Establish import/export packaging and verification.
* Establish retention, quotas, and GC.

---

## 17) Minimal phased delivery (recommended)

**Phase 1: Reference snapshot + verdict binding**

* Record source descriptors + hashes, policy/scoring/trust digests.
* Bind the snapshot ID into verdict artifacts.

**Phase 2: Portable snapshots**

* Store knowledge blobs locally with dedup.
* Export/import with integrity verification.

**Phase 3: Sealed portable snapshots + replay tooling**

* Sign snapshots.
* Deterministic replay pipeline + replay report.
* UI surfacing and audit logs.

**Phase 4: Snapshot diff + drift explainability**

* Compare snapshots.
* Attribute decision drift to knowledge changes vs artifact changes.

---

If you want this turned into an internal PRD template, I can rewrite it into a structured PRD format with objectives, user stories, functional requirements, non-functional requirements, security/compliance, dependencies, risks, and acceptance tests, ready for Jira/Linear epics and engineering design review.
@@ -0,0 +1,497 @@

## Stella Ops Guidelines

### Risk Budgets and Diff-Aware Release Gates

**Audience:** Product Managers (PMs) and Development Managers (DMs)
**Applies to:** All customer-impacting software and configuration changes shipped by Stella Ops (code, infrastructure-as-code, runtime config, feature flags, data migrations, dependency upgrades).

---

## 1) What we are optimizing for

Stella Ops ships quickly **without** letting change-driven incidents, security regressions, or data integrity failures become the hidden cost of “speed.”

These guidelines enforce two linked controls:

1. **Risk Budgets** — a quantitative “capacity to take risk” that prevents reliability and trust from being silently depleted.
2. **Diff-Aware Release Gates** — release checks whose strictness scales with *what changed* (the diff), not with generic process.

Together they let us move fast on low-risk diffs and slow down only when the change warrants it.

---

## 2) Non-negotiable principles

1. **All changes are risk-bearing** (even “small” diffs). We quantify and route them accordingly.
2. **Risk is managed at the product/service boundary** (each service has its own budget and gating profile).
3. **Automation first, approvals last**. Humans review what automation cannot reliably verify.
4. **Blast radius is a first-class variable**. A safe rollout beats a perfect code review.
5. **Exceptions are allowed but never free**. Every bypass is logged, justified, and paid back via budget reduction and follow-up controls.

---

## 3) Definitions

### 3.1 Risk Budget (what it is)

A **Risk Budget** is the amount of change-risk a product/service is allowed to take over a defined window (typically a sprint or month) **without increasing the probability of customer harm beyond the agreed tolerance**.

It is a management control, not a theoretical score.

### 3.2 Risk Budget vs. Error Budget (important distinction)

* **Error Budget** (classic SRE): backward-looking tolerance for *actual* unreliability vs. SLO.
* **Risk Budget** (this policy): forward-looking tolerance for *change risk* before shipping.

They interact:

* If the error budget is burned (the service is unstable), the risk budget is automatically constrained.
* If the risk budget is low, release gates tighten by policy.

### 3.3 Diff-Aware Release Gates (what they are)

A **release gate** is a set of required checks (tests, scans, reviews, rollout controls) that must pass before a change can progress.
**Diff-aware** means the gate level is determined by:

* what changed (diff classification),
* where it changed (criticality),
* how it ships (blast radius controls),
* and current operational context (incidents, SLO health, budget remaining).

---

## 4) Roles and accountability

### Product Manager (PM) — accountable for risk appetite

PM responsibilities:

* Define product-level risk tolerance with stakeholders (customer impact tolerance, regulatory constraints).
* Approve the **Risk Budget Policy settings** for their product/service tier (criticality level, default gates).
* Prioritize reliability work when budgets are constrained.
* Own customer communications for degraded service or risk-driven release deferrals.

### Development Manager (DM) — accountable for enforcement and engineering hygiene

DM responsibilities:

* Ensure pipelines implement diff classification and enforce gates.
* Ensure tests, telemetry, rollout mechanisms, and rollback procedures exist and are maintained.
* Ensure the “exceptions” process is real (logged, postmortemed, paid back).
* Own staffing/rotation decisions to ensure safe releases (on-call readiness, release captains).

### Shared responsibilities

PM + DM jointly:

* Review risk budget status weekly.
* Resolve trade-offs: feature velocity vs. reliability/security work.
* Approve gate profile changes (tighten/loosen) based on evidence.

---

## 5) Risk Budgets

### 5.1 Establish service tiers (criticality)

Each service/product component must be assigned a **Criticality Tier**:

* **Tier 0 – Internal only** (no external customers; low business impact)
* **Tier 1 – Customer-facing non-critical** (degradation tolerated; limited blast radius)
* **Tier 2 – Customer-facing critical** (core workflows; meaningful revenue/trust impact)
* **Tier 3 – Safety/financial/data-critical** (payments, auth, permissions, PII, regulated workflows)

Tier drives default budgets and minimum gates.

### 5.2 Choose a budget window and units

**Window:** default to **monthly** with weekly tracking; optionally sprint-based if release cadence is sprint-coupled.
**Units:** use **Risk Points (RP)** — consumed by each change. (Do not overcomplicate at first; tune with data.)

Recommended initial monthly budgets (adjust after 2–3 cycles with evidence):

* Tier 0: 300 RP/month
* Tier 1: 200 RP/month
* Tier 2: 120 RP/month
* Tier 3: 80 RP/month

> Interpretation: Tier 3 ships fewer “risky” changes; it can still ship frequently, but changes must be decomposed into low-risk diffs and shipped with strong controls.

### 5.3 Risk Point scoring (how changes consume budget)

Every change gets a **Release Risk Score (RRS)** in RP.

A practical baseline model:

**RRS = Base(criticality) + Diff Risk + Operational Context – Mitigations**

**Base (criticality):**

* Tier 0: +1
* Tier 1: +3
* Tier 2: +6
* Tier 3: +10

**Diff Risk (additive):**

* +1: docs, comments, non-executed code paths, telemetry-only additions
* +3: UI changes, non-core logic changes, refactors with high test coverage
* +6: API contract changes, dependency upgrades, medium-complexity logic in a core path
* +10: database schema migrations, auth/permission logic, data retention/PII handling
* +15: infra/networking changes, encryption/key handling, payment flows, queue semantics changes

**Operational Context (additive):**

* +5: service currently in incident or had a Sev1/Sev2 in the last 7 days
* +3: error budget < 50% remaining
* +2: on-call load high (paging above normal baseline)
* +5: release during restricted windows (holidays/freeze) via exception

**Mitigations (subtract):**

* –3: feature flag with staged rollout + instant kill switch verified
* –3: canary + automated health gates + rollback tested in the last 30 days
* –2: high-confidence integration coverage for touched components
* –2: no data migration OR backward-compatible migration with proven rollback
* –2: change isolated behind a permission boundary / limited cohort

**Minimum RRS floor:** never below 1 RP.

The DM is responsible for making sure the pipeline can calculate a *default* RRS automatically, requiring humans only for edge cases.
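
A minimal sketch of that default calculation, using the weights from the tables above; the flag names are illustrative stand-ins for whatever the pipeline's diff classifier emits:

```python
BASE = {0: 1, 1: 3, 2: 6, 3: 10}  # criticality tier → base points

DIFF_RISK = {
    "docs_or_telemetry_only": 1,
    "ui_or_noncore_logic": 3,
    "api_or_dependency_or_core_logic": 6,
    "schema_auth_or_pii": 10,
    "infra_crypto_payments_or_queues": 15,
}

CONTEXT = {"recent_sev1_or_sev2": 5, "error_budget_below_50pct": 3,
           "high_oncall_load": 2, "freeze_window_exception": 5}

MITIGATIONS = {"flag_with_kill_switch": 3, "canary_with_tested_rollback": 3,
               "strong_integration_coverage": 2, "safe_or_no_migration": 2,
               "limited_cohort": 2}

def release_risk_score(tier: int, diff_flags: list[str],
                       context_flags: list[str], mitigation_flags: list[str]) -> int:
    score = BASE[tier]
    score += sum(DIFF_RISK[f] for f in diff_flags)
    score += sum(CONTEXT[f] for f in context_flags)
    score -= sum(MITIGATIONS[f] for f in mitigation_flags)
    return max(score, 1)  # minimum RRS floor: never below 1 RP

# Example: a Tier 2 dependency upgrade in a healthy context, shipped with a
# canary and a kill switch: 6 + 6 - 3 - 3 = 6 RP (→ G2 under section 6.3).
```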

### 5.4 Budget operating rules

**Budget ledger:** maintain a per-service ledger tracking:

* Budget allocated for the window
* RP consumed per release
* RP remaining
* Trendline (projected depletion date)
* Exceptions (break-glass releases)

**Control thresholds:**

* **Green (≥60% remaining):** normal operation
* **Yellow (30–59%):** additional caution; gates tighten by 1 level for medium/high-risk diffs
* **Red (<30%):** freeze high-risk diffs; allow only low-risk changes or reliability/security work
* **Exhausted (≤0%):** releases restricted to incident fixes, security fixes, and rollback-only, with tightened gates and explicit sign-off
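
A sketch of the ledger bookkeeping and threshold mapping; the object shape is illustrative, not a prescribed schema:

```python
from dataclasses import dataclass, field

@dataclass
class RiskBudgetLedger:
    allocated_rp: int
    entries: list[tuple[str, int]] = field(default_factory=list)  # (release id, RP consumed)

    def consume(self, release_id: str, rrs: int) -> None:
        self.entries.append((release_id, rrs))

    @property
    def remaining_pct(self) -> float:
        spent = sum(rp for _, rp in self.entries)
        return 100.0 * (self.allocated_rp - spent) / self.allocated_rp

    def status(self) -> str:
        pct = self.remaining_pct
        if pct >= 60:
            return "green"
        if pct >= 30:
            return "yellow"
        if pct > 0:
            return "red"
        return "exhausted"

# Example: a Tier 2 service (120 RP/month) that has consumed 90 RP is at 25% remaining → "red".
```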

### 5.5 What to do when budget is low (expected behavior)

When the budget is Yellow or Red:

* PM shifts roadmap execution toward:

  * reliability work and defect burn-down,
  * decomposing large changes into smaller, reversible diffs,
  * reducing the scope of risky features.
* DM enforces:

  * smaller diffs,
  * increased feature flagging,
  * staged rollout requirements,
  * improved test/observability coverage.

Budget constraints are a signal, not a punishment.

### 5.6 Budget replenishment and incentives

Budgets replenish on the window boundary, but we also allow **earned capacity**:

* If a service improves change failure rate and MTTR for 2 consecutive windows, it may earn:

  * +10–20% budget increase **or**
  * one gate level relaxation for specific change categories

This must be evidence-driven (metrics, not opinions).

---
## 6) Diff-Aware Release Gates

### 6.1 Diff classification (what the pipeline must detect)

At minimum, automatically classify diffs into these categories (a path-based classification sketch follows the lists):

**Code scope**

* Executable code vs docs-only
* Core vs non-core modules (define module ownership boundaries)
* Hot paths (latency-sensitive) and correctness-sensitive paths

**Data scope**

* Schema migration (additive vs breaking)
* Backfill jobs / batch jobs
* Data model changes impacting downstream consumers
* PII / regulated data touchpoints

**Security scope**

* Authn/authz logic
* Permission checks
* Secrets, key handling, encryption changes
* Dependency changes with known CVEs

**Infra scope**

* IaC changes, networking, load balancer, DNS, autoscaling
* Runtime config changes (feature flags, limits, thresholds)
* Queue/topic changes, retention settings

**Interface scope**

* Public API contract changes
* Backward compatibility of payloads/events
* Client version dependency
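
The path-based classification sketch referenced above; the patterns are examples and would have to reflect the actual repository layout and module ownership boundaries:

```python
import fnmatch

# Illustrative path patterns → diff category flags; real rules live with module owners.
CATEGORY_PATTERNS = {
    "db": ["**/migrations/**", "**/*.sql"],
    "auth": ["**/auth/**", "**/permissions/**"],
    "infra": ["infra/**", "**/*.tf", "k8s/**", "helm/**"],
    "api": ["**/openapi*.yaml", "**/proto/**"],
    "deps": ["**/package-lock.json", "**/go.sum", "**/requirements*.txt"],
    "config": ["config/**", "**/feature_flags/**"],
    "docs": ["docs/**", "**/*.md"],
}

def classify_diff(changed_paths: list[str]) -> set[str]:
    """Return the set of category flags triggered by the changed file paths."""
    flags = set()
    for path in changed_paths:
        for category, patterns in CATEGORY_PATTERNS.items():
            if any(fnmatch.fnmatch(path, pat) for pat in patterns):
                flags.add(category)
    return flags or {"code"}   # anything unmatched is treated as general executable code

# Example: classify_diff(["api/proto/billing.proto", "docs/changelog.md"]) → {"api", "docs"}
```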

### 6.2 Gate levels

Define **Gate Levels G0–G4**. The pipeline assigns one based on diff + context + budget.

#### G0 — No-risk / administrative

Use for:

* docs-only, comments-only, non-functional metadata

Requirements:

* Lint/format checks
* Basic CI pass (build)

#### G1 — Low risk

Use for:

* small, localized code changes with strong unit coverage
* non-core UI changes
* telemetry additions (no removal)

Requirements:

* All automated unit tests
* Static analysis/linting
* 1 peer review (code owner not required if outside critical modules)
* Automated deploy to staging
* Post-deploy smoke checks

#### G2 — Moderate risk

Use for:

* moderate logic changes in customer-facing paths
* dependency upgrades
* API changes that are backward compatible
* config changes affecting behavior

Requirements:

* G1 +
* Integration tests relevant to impacted modules
* Code owner review for touched modules
* Feature flag required if customer impact is possible
* Staged rollout: canary or small cohort
* Rollback plan documented in the PR

#### G3 — High risk

Use for:

* schema migrations
* auth/permission changes
* core business logic in critical flows
* infra changes affecting availability
* non-trivial concurrency/queue semantics changes

Requirements:

* G2 +
* Security scan + dependency audit (must pass; exceptions logged)
* Migration plan (forward + rollback) reviewed
* Load/performance checks if in a hot path
* Observability: new/updated dashboards/alerts for the change
* Release captain / on-call sign-off (someone accountable live)
* Progressive delivery with automatic health gates (error rate/latency)

#### G4 — Very high risk / safety-critical / budget-constrained releases

Use for:

* Tier 3 critical systems with low budget remaining
* changes during freeze windows via exception
* broad blast radius changes (platform-wide)
* remediation after a major incident where recurrence risk is high

Requirements:

* G3 +
* Formal risk review (PM + DM + Security/SRE) in writing
* Explicit rollback rehearsal or a prior proven rollback path
* Extended canary period with success criteria and abort criteria
* Customer comms plan if impact is plausible
* Post-release verification checklist executed and logged

### 6.3 Gate selection logic (policy)

Default rule:

1. Compute the **RRS** (Risk Points) from the diff + context.
2. Map the RRS to a default gate:

   * 1–5 RP → G1
   * 6–12 RP → G2
   * 13–20 RP → G3
   * 21+ RP → G4
3. Apply modifiers:

   * If the **budget is Yellow**: escalate one gate level for changes ≥ G2
   * If the **budget is Red**: escalate one gate level for changes ≥ G1 and block high-risk categories unless an exception is granted
   * If there is an active incident or the error budget is severely degraded: block non-fix releases by default

The DM must ensure the pipeline enforces this mapping automatically.
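
A sketch of that mapping and the modifiers as enforcement code; the budget status and incident flags come from the ledger and operational context described in section 5:

```python
GATES = ["G0", "G1", "G2", "G3", "G4"]

def base_gate(rrs: int) -> str:
    if rrs <= 5:
        return "G1"
    if rrs <= 12:
        return "G2"
    if rrs <= 20:
        return "G3"
    return "G4"

def escalate(gate: str, levels: int = 1) -> str:
    return GATES[min(GATES.index(gate) + levels, len(GATES) - 1)]

def select_gate(rrs: int, budget_status: str, active_incident: bool, is_fix: bool) -> str:
    if active_incident and not is_fix:
        raise PermissionError("non-fix releases are blocked during an active incident")
    gate = base_gate(rrs)
    if budget_status == "yellow" and GATES.index(gate) >= GATES.index("G2"):
        gate = escalate(gate)
    if budget_status in ("red", "exhausted") and GATES.index(gate) >= GATES.index("G1"):
        gate = escalate(gate)
    return gate

# Example: RRS 6 (default G2) with a yellow budget escalates to G3.
```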

### 6.4 “Diff-aware” also means “blast-radius aware”

If the diff is inherently risky, reduce risk operationally:

* feature flags with cohort controls
* dark launches (ship code disabled)
* canary deployments
* blue/green with quick revert
* backwards-compatible DB migrations (expand/contract pattern)
* circuit breakers and rate limiting
* progressive exposure by tenant / region / account segment

Large diffs are not “made safe” by more reviewers; they are made safe by **reversibility and containment**.

---

## 7) Exceptions (“break glass”) policy

Exceptions are permitted only when one of these is true:

* incident mitigation or customer harm prevention,
* urgent security fix (actively exploited or high severity),
* legal/compliance deadline.

**Requirements for any exception:**

* Recorded rationale in the PR/release ticket
* Named approver(s): DM + on-call owner; PM for customer-impacting risk
* Mandatory follow-up within 5 business days:

  * post-incident or post-release review
  * remediation tasks created and prioritized
* **Budget penalty:** subtract additional RP (e.g., +50% of the change’s RRS, so a 16 RP change shipped via exception consumes 24 RP) to reflect unmanaged risk

Repeated exceptions are a governance failure and trigger gate tightening.

---

## 8) Operational metrics (what PMs and DMs must review)

Minimum weekly review dashboard per service:

* **Risk budget remaining** (RP and %)
* **Deploy frequency**
* **Change failure rate**
* **MTTR**
* **Sev1/Sev2 count** (rolling 30/90 days)
* **SLO / error budget status**
* **Gate compliance rate** (how often gates were bypassed)
* **Diff size distribution** (are we shipping huge diffs?)
* **Rollback frequency and time-to-rollback**

Policy expectation:

* If change failure rate or MTTR worsens materially over 2 windows, budgets tighten and gate mapping escalates until stability returns.

---

## 9) Practical operating cadence

### Weekly (PM + DM)

* Review budgets and trends
* Identify upcoming high-risk releases and plan staged rollouts
* Confirm staffing for release windows (release captain / on-call coverage)
* Decide whether to defer, decompose, or harden changes

### Per release (DM-led, PM informed)

* Ensure correct gate level
* Verify rollout + rollback readiness
* Confirm monitoring/alerts exist and are watched during rollout
* Execute post-release verification checklist

### Monthly (leadership)

* Adjust tier assignments if product criticality changed
* Recalibrate budget numbers based on measured outcomes
* Identify systemic causes: test gaps, observability gaps, deployment tooling gaps

---

## 10) Required templates (standardize execution)

### 10.1 Release Plan (required for G2+)

* What is changing (1–3 bullets)
* Expected customer impact (or “none”)
* Diff category flags (DB/auth/infra/API/etc.)
* Rollout strategy (canary/cohort/blue-green)
* Abort criteria (exact metrics/thresholds)
* Rollback steps (exact commands/process)
* Owners during rollout (names)

### 10.2 Migration Plan (required for schema/data changes)

* Migration type: additive / expand-contract / breaking (breaking is disallowed without explicit G4 approval)
* Backfill approach and rate limits
* Validation checks (row counts, invariants)
* Rollback strategy (including data implications)

### 10.3 Post-release Verification Checklist (G1+)

* Smoke test results
* Key dashboards checked (latency, error rate, saturation)
* Alerts status
* User-facing workflows validated (as applicable)
* Ticket updated with outcome

---

## 11) What “good” looks like

* Low-risk diffs ship quickly with minimal ceremony (G0–G1).
* High-risk diffs are decomposed and shipped progressively, not heroically.
* Risk budgets are visible, used in planning, and treated as a real constraint.
* Exceptions are rare and followed by concrete remediation.
* Over time: deploy frequency stays high while change failure rate and MTTR decrease.

---

## 12) Immediate adoption checklist (first 30 days)

**DM deliverables**

* Implement diff classification in CI/CD (at least: DB/auth/infra/API/deps/config)
* Implement automatic gate mapping and enforcement
* Add “release plan” and “rollback plan” checks for G2+
* Add logging for gate overrides

**PM deliverables**

* Confirm service tiering for owned areas
* Approve initial monthly RP budgets
* Add risk budget review to the weekly product/engineering ritual
* Reprioritize work when budgets hit Yellow/Red (explicitly)

---

If you want, I can also provide:

* a concrete scoring worksheet (ready to paste into Confluence/Notion),
* a CI/CD policy example (e.g., GitHub Actions / GitLab rules) that computes the gate level from diff patterns,
* and a one-page “Release Captain Runbook” aligned to G2–G4.