housekeeping work

This commit is contained in:
StellaOps Bot
2025-12-19 22:19:08 +02:00
parent 91f3610b9d
commit 5b57b04484
64 changed files with 4702 additions and 4 deletions


@@ -0,0 +1,366 @@
# A. Executive directive (send as-is to both PM + Dev)
1. **A “Release” is not an SBOM or a scan report. A Release is a “Security State Snapshot.”**
* A snapshot is a **versioned, content-addressed bundle** containing:
* SBOM graph (canonical form, hashed)
* Reachability graph (canonical form, hashed)
* VEX claim set (canonical form, hashed)
* Policies + rule versions used (hashed)
* Data-feed identifiers used (hashed)
* Toolchain versions (hashed)
2. **Diff is a product primitive, not a UI feature.**
* “Diff” must exist as a stable API and artifact, not a one-off report.
* Every comparison produces a **Delta object** (machine-readable) and a **Delta Verdict attestation** (signed).
3. **The CI/CD gate should never ask “how many CVEs?”**
* It should ask: **“What materially changed in exploitable risk since the last approved baseline?”**
* The Delta Verdict must be deterministically reproducible given the same snapshots and policy.
4. **Every Delta Verdict must be portable and auditable.**
* It must be a signed attestation that can be stored with the build artifact (OCI attach) and replayed offline.
---
# B. Product Management directions
## B1) Define the product concept: “Security Delta as the unit of governance”
**Position the capability as change-control for software risk**, not as “a scanner with comparisons.”
### Primary user stories (MVP)
1. **Release Manager / Security Engineer**
* “Compare the candidate build to the last approved build and explain *what changed* in exploitable risk.”
2. **CI Pipeline Owner**
* “Fail the build only for *new* reachable high-risk exposures (or policy-defined deltas), not for unchanged legacy issues.”
3. **Auditor / Compliance**
* “Show a signed delta verdict with evidence references proving why this release passed.”
### MVP “Delta Verdict” policy questions to support
* Are there **new reachable vulnerabilities** introduced?
* Did any **previously unreachable vulnerability become reachable**?
* Are there **new affected VEX states** (e.g., NOT_AFFECTED → AFFECTED)?
* Are there **new Unknowns** above a threshold?
* Is the **net exploitable surface** increased beyond policy budget?
## B2) Define the baseline selection rules (product-critical)
Diff is meaningless without a baseline contract. Product must specify baseline selection as a first-class choice.
Minimum baseline modes:
* **Previous build in the same pipeline**
* **Last “approved” snapshot** (from an approval gate)
* **Last deployed in environment X** (optional later, but roadmap it)
Acceptance criteria:
* The delta object must always contain:
* `baseline_snapshot_digest`
* `target_snapshot_digest`
* `baseline_selection_method` and identifiers
## B3) Define the delta taxonomy (what your product “knows” how to talk about)
Avoid “diffing findings lists.” You need consistent delta categories.
Minimum taxonomy:
1. **SBOM deltas**
* Component added/removed
* Component version change
* Dependency edge change (graph-level)
2. **VEX deltas**
* Claim added/removed
* Status change (e.g., under_investigation → fixed)
* Justification/evidence change (optional MVP)
3. **Reachability deltas**
* New reachable vulnerable symbol(s)
* Removed reachability
* Entry point changes
4. **Decision deltas**
* Policy outcome changed (PASS → FAIL)
* Explanation changed (drivers of decision)
PM deliverable:
* A one-page **Delta Taxonomy Spec** that becomes the canonical list used across API, UI, and attestations.
## B4) Define what “signed delta verdict” means in product terms
A delta verdict is not a PDF.
It is:
* A deterministic JSON payload
* Wrapped in a signature envelope (DSSE)
* Attached to the artifact (OCI attach)
* Includes pointers (hash references) to evidence graphs
PM must define:
* Where customers can view it (UI + CLI)
* Where it lives (artifact registry + Stella store)
* How it is consumed (policy gate, audit export)
## B5) PM success metrics (must be measurable)
* % of releases gated by delta verdict
* Mean time to explain “why failed”
* Reduction in “unchanged legacy vuln” false gating
* Reproducibility rate: same inputs → same verdict (target: 100%)
---
# C. Development Management directions
## C1) Architecture: treat Snapshot and Delta as immutable, content-addressed objects
You need four core services/modules:
1. **Canonicalization + Hashing**
* Deterministic serialization (stable field ordering, normalized IDs)
* Content addressing: every graph and claim set gets a digest
2. **Snapshot Store (Ledger)**
* Store snapshots keyed by digest
* Store relationships: artifact → snapshot, snapshot → predecessor(s)
* Must support offline export/import later (design now)
3. **Diff Engine**
* Inputs: `baseline_snapshot_digest`, `target_snapshot_digest`
* Outputs:
* `delta_object` (structured)
* `delta_summary` (human-friendly)
* Must be deterministic and testable with golden fixtures
4. **Verdict Engine + Attestation Writer**
* Evaluate policies against delta
* Produce `delta_verdict`
* Wrap as DSSE / in-toto-style statement (or your chosen predicate type)
* Sign and optionally attach to OCI artifact
## C2) Data model (minimum viable schemas)
### Snapshot (conceptual fields)
* `snapshot_id` (digest)
* `artifact_ref` (e.g., image digest)
* `sbom_graph_digest`
* `reachability_graph_digest`
* `vex_claimset_digest`
* `policy_bundle_digest`
* `feed_snapshot_digest`
* `toolchain_digest`
* `created_at`
### Delta object (conceptual fields)
* `delta_id` (digest)
* `baseline_snapshot_digest`
* `target_snapshot_digest`
* `sbom_delta` (structured)
* `reachability_delta` (structured)
* `vex_delta` (structured)
* `unknowns_delta` (structured)
* `derived_risk_delta` (structured)
* `created_at`
### Delta verdict attestation (must include)
* Subjects: artifact digest(s)
* Baseline snapshot digest + Target snapshot digest
* Policy bundle digest
* Verdict enum: PASS/WARN/FAIL
* Drivers: references to delta nodes (hash pointers)
* Signature metadata
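
To keep PM and engineering aligned on field names, here is a minimal sketch of the snapshot and delta objects as Python dataclasses; the fields mirror the conceptual lists above, and the types are assumptions.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Snapshot:
    # Content-addressed security state snapshot; snapshot_id is the digest
    # of the canonical serialization of the remaining fields.
    snapshot_id: str
    artifact_ref: str              # e.g. an image digest
    sbom_graph_digest: str
    reachability_graph_digest: str
    vex_claimset_digest: str
    policy_bundle_digest: str
    feed_snapshot_digest: str
    toolchain_digest: str
    created_at: str                # recorded, but excluded from hash inputs

@dataclass(frozen=True)
class DeltaObject:
    delta_id: str
    baseline_snapshot_digest: str
    target_snapshot_digest: str
    sbom_delta: dict = field(default_factory=dict)
    reachability_delta: dict = field(default_factory=dict)
    vex_delta: dict = field(default_factory=dict)
    unknowns_delta: dict = field(default_factory=dict)
    derived_risk_delta: dict = field(default_factory=dict)
    created_at: str = ""
```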
## C3) Determinism requirements (non-negotiable)
Development must implement:
* **Canonical ID scheme** for components and graph nodes
(example: package URL + version + supplier + qualifiers, then hashed)
* Stable sorting for node/edge lists
* Stable normalization of timestamps (do not include wall-clock in hash inputs unless explicitly policy-relevant)
* A “replay test harness”:
* Given the same inputs, byte-for-byte identical snapshot/delta/verdict
Definition of Done:
* Golden test vectors for snapshots and deltas checked into repo
* Deterministic hashing tests in CI
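
A minimal sketch of what canonicalization plus content addressing can look like, assuming canonical JSON (sorted keys, no insignificant whitespace) and SHA-256; the function names are illustrative.

```python
import hashlib
import json

def canonical_bytes(obj) -> bytes:
    # Deterministic serialization: sorted keys, compact separators, UTF-8,
    # so the same object always yields the same byte stream regardless of
    # insertion order.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")

def content_digest(obj) -> str:
    # Content address for a graph, claim set, or policy bundle.
    return "sha256:" + hashlib.sha256(canonical_bytes(obj)).hexdigest()

# Replay property: same inputs -> same digest, byte for byte.
assert content_digest({"b": 1, "a": 2}) == content_digest({"a": 2, "b": 1})
```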
## C4) Graph diff design (how to do it without drowning in noise)
### SBOM graph diff (MVP)
Implement:
* Node set delta: added/removed/changed nodes (by stable node ID)
* Edge set delta: added/removed edges (dependency relations)
* A “noise suppressor” layer:
* ignore ordering differences
* ignore metadata-only changes unless policy enables
Output should identify:
* “What changed?” (added/removed/upgraded/downgraded)
* “Why does it matter?” (ties to vulnerability & reachability where available)
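
A minimal sketch of the node/edge set delta, assuming graphs keyed by the stable node IDs described in C3; the graph shape and field names are assumptions.

```python
def diff_sbom_graphs(baseline: dict, target: dict) -> dict:
    # Graphs are assumed to look like {"nodes": {node_id: attrs}, "edges": {(src, dst), ...}}
    # where node_id is the canonical component ID.
    base_nodes, tgt_nodes = baseline["nodes"], target["nodes"]
    base_edges, tgt_edges = set(baseline["edges"]), set(target["edges"])

    added = sorted(tgt_nodes.keys() - base_nodes.keys())
    removed = sorted(base_nodes.keys() - tgt_nodes.keys())
    # "Changed" means same stable ID but different attributes; metadata-only
    # changes can be filtered here by the noise-suppressor layer.
    changed = sorted(n for n in tgt_nodes.keys() & base_nodes.keys()
                     if tgt_nodes[n] != base_nodes[n])

    return {
        "nodes_added": added,
        "nodes_removed": removed,
        "nodes_changed": changed,
        "edges_added": sorted(tgt_edges - base_edges),
        "edges_removed": sorted(base_edges - tgt_edges),
    }
```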
### VEX claimset diff (MVP)
Implement:
* Keyed by `(product/artifact scope, component ID, vulnerability ID)`
* Delta types:
* claim added/removed
* status changed
* justification changed (optional later)
### Reachability diff (incremental approach)
MVP can start narrow:
* Support one or two ecosystems initially (e.g., Java + Maven, or Go modules)
* Represent reachability as:
* `entrypoint → function/symbol → vulnerable symbol`
* Diff should highlight:
* Newly reachable vulnerable symbols
* Removed reachability
Important: even if reachability is initially partial, the diff model must support it cleanly (unknowns must exist).
## C5) Policy evaluation must run on delta, not on raw findings
Define a policy DSL contract like:
* `fail_if new_reachable_critical > 0`
* `warn_if new_unknowns > 10`
* `fail_if vex_status_regressed == true`
* `pass_if no_net_increase_exploitable_surface == true`
Engineering directive:
* Policies must reference **delta fields**, not scanner-specific output.
* Keep the policy evaluation pure and deterministic.
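
A minimal sketch of a pure, deterministic gate over delta fields; the field and threshold names mirror the rules above and are illustrative, not a fixed DSL.

```python
def evaluate_delta_policy(delta: dict, policy: dict) -> str:
    # Pure function: the verdict depends only on delta fields and the policy,
    # never on scanner-specific output or wall-clock time.
    new_reachable_critical = delta.get("new_reachable_critical", 0)
    new_unknowns = delta.get("new_unknowns", 0)
    vex_status_regressed = delta.get("vex_status_regressed", False)

    if new_reachable_critical > policy.get("max_new_reachable_critical", 0):
        return "FAIL"
    if vex_status_regressed and policy.get("fail_on_vex_regression", True):
        return "FAIL"
    if new_unknowns > policy.get("warn_new_unknowns", 10):
        return "WARN"
    return "PASS"
```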
## C6) Signing and attachment (implementation-level)
Minimum requirements:
* Support signing delta verdict as a DSSE envelope with a stable predicate type.
* Support:
* keyless signing (optional)
* customer-managed keys (enterprise)
* Attach to OCI artifact as an attestation (where possible), and store in Stella ledger for retrieval.
Definition of Done:
* A CI workflow can:
1. create snapshots
2. compute delta
3. produce signed delta verdict
4. verify signature and gate
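
For orientation, a minimal sketch of wrapping a statement in a DSSE envelope (pre-authentication encoding plus base64 payload); the `sign` callable is a placeholder for whichever signing backend is configured.

```python
import base64
import json

def pae(payload_type: str, payload: bytes) -> bytes:
    # DSSE pre-authentication encoding: the bytes that actually get signed.
    return b" ".join([
        b"DSSEv1",
        str(len(payload_type)).encode(), payload_type.encode(),
        str(len(payload)).encode(), payload,
    ])

def dsse_envelope(statement: dict, sign, keyid: str) -> dict:
    # `sign` stands in for the configured signer (keyless, customer-managed
    # key, KMS); it must return raw signature bytes over the PAE.
    payload_type = "application/vnd.in-toto+json"
    payload = json.dumps(statement, sort_keys=True,
                         separators=(",", ":")).encode("utf-8")
    signature = sign(pae(payload_type, payload))
    return {
        "payloadType": payload_type,
        "payload": base64.b64encode(payload).decode(),
        "signatures": [{"keyid": keyid,
                        "sig": base64.b64encode(signature).decode()}],
    }
```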
---
# D. Roadmap (sequenced to deliver value early without painting into a corner)
## Phase 1: “Snapshot + SBOM Diff + Delta Verdict”
* Version SBOM graphs
* Diff SBOM graphs
* Produce delta verdict based on SBOM delta + vulnerability delta (even before reachability)
* Signed delta verdict artifact exists
Output:
* Baseline/target selection
* Delta taxonomy v1
* Signed delta verdict v1
## Phase 2: “VEX claimsets and VEX deltas”
* Ingest OpenVEX/CycloneDX/CSAF
* Store canonical claimsets per snapshot
* Diff claimsets and incorporate into delta verdict
Output:
* “VEX status regression” gating works deterministically
## Phase 3: “Reachability graphs and reachability deltas”
* Start with one ecosystem
* Generate reachability evidence
* Diff reachability and incorporate into verdict
Output:
* “new reachable critical” becomes the primary gate
## Phase 4: “Offline replay bundle”
* Export/import snapshot + feed snapshot + policy bundle
* Replay delta verdict identically in air-gapped environment
---
# E. Acceptance criteria checklist (use this as a release gate for your own feature)
A feature is not done until:
1. **Snapshot is content-addressed** and immutable.
2. **Delta is content-addressed** and immutable.
3. Delta shows:
* SBOM delta
* VEX delta (when enabled)
* Reachability delta (when enabled)
* Unknowns delta
4. **Delta verdict is signed** and verification is automated.
5. **Replay test**: given same baseline/target snapshots + policy bundle, verdict is identical byte-for-byte.
6. The product answers, clearly:
* What changed?
* Why does it matter?
* Why is the verdict pass/fail?
* What evidence supports this?
---
# F. What to tell your teams to avoid (common failure modes)
* Do **not** ship “diff” as a UI compare of two scan outputs.
* Do **not** make reachability an unstructured “note” field; it must be a graph with stable IDs.
* Do **not** allow non-deterministic inputs into verdict hashes (timestamps, random IDs, nondeterministic ordering).
* Do **not** treat VEX as “ignore rules” only; treat it as a claimset with provenance and merge semantics (even if merge comes later).


@@ -0,0 +1,234 @@
## 1) Define the product primitive (non-negotiable)
### Directive (shared)
**The product's primary output is not “findings.” It is a “Risk Verdict Attestation” (RVA).**
Everything else (SBOMs, CVEs, VEX, reachability, reports) is *supporting evidence* referenced by the RVA.
### What “first-class artifact” means in practice
1. **The verdict is an OCI artifact “referrer” attached to a specific image/artifact digest** via OCI 1.1 `subject` and discoverable via the referrers API. ([opencontainers.org][1])
2. **The verdict is cryptographically signed** (at least one supported signing pathway).
* DSSE is a standard approach for signing attestations, and cosign supports creating/verifying in-toto attestations signed with DSSE. ([Sigstore][2])
* Notation is a widely deployed approach for signing/verifying OCI artifacts in enterprise environments. ([Microsoft Learn][3])
---
## 2) Directions for Product Managers (PM)
### A. Write the “Risk Verdict Attestation v1” product contract
**Deliverable:** A one-page contract + schema that product and customers can treat as an API.
Minimum fields the contract must standardize:
* **Subject binding:** exact OCI digest, repo/name, platform (if applicable)
* **Verdict:** `PASS | FAIL | PASS_WITH_EXCEPTIONS | INDETERMINATE`
* **Policy reference:** policy ID, policy digest, policy version, enforcement mode
* **Knowledge snapshot reference:** snapshot ID + digest (see replay semantics below)
* **Evidence references:** digests/pointers for SBOM, VEX inputs, vuln feed snapshot, reachability proof(s), config snapshot, and unknowns summary
* **Reason codes:** stable machine-readable codes (`RISK.CVE.REACHABLE`, `RISK.VEX.NOT_AFFECTED`, `RISK.UNKNOWN.INPUT_MISSING`, etc.)
* **Human explanation stub:** short rationale text plus links/IDs for deeper evidence
**Key PM rule:** the contract must be **stable and versioned**, with explicit deprecation rules. If you can't maintain compatibility, ship a new version (v2), don't silently mutate v1.
Why: OCI referrers create long-lived metadata chains. Breaking them is a customer trust failure.
### B. Define strict replay semantics as a product requirement (not “nice to have”)
PM must specify what “same inputs” means. At minimum, inputs include:
* artifact digest (subject)
* policy bundle digest
* vulnerability dataset snapshot digest(s)
* VEX bundle digest(s)
* SBOM digest(s) or SBOM generation recipe digest
* scoring rules version/digest
* engine version
* reachability configuration version/digest (if enabled)
**Product acceptance criterion:**
When a user re-runs evaluation in “replay mode” using the same knowledge snapshot and policy digest, the **verdict and reason codes must match** (byte-for-byte identical predicate is ideal; if not, the deterministic portion must match exactly).
OCI 1.1 and ORAS guidance also implies you should avoid shoving large evidence into annotations; store large evidence as blobs and reference by digest. ([opencontainers.org][1])
### C. Make “auditor evidence extraction” a first-order user journey
Define the auditor journey as a separate persona:
* Auditor wants: “Prove why you blocked/allowed artifact X at time Y.”
* They should be able to:
1. Verify the signature chain
2. Extract the decision + evidence package
3. Replay the evaluation
4. Produce a human-readable report without bespoke consulting
**PM feature requirements (v1)**
* `explain` experience that outputs:
* decision summary
* policy used
* evidence references and hashes
* top N reasons (with stable codes)
* unknowns and assumptions
* `export-audit-package` experience:
* exports a ZIP (or OCI bundle) containing the RVA, its referenced evidence artifacts, and a machine-readable manifest listing all digests
* `verify` experience:
* verifies signature + policy expectations (who is trusted to sign; which predicate type(s) are acceptable)
Cosign explicitly supports creating/verifying in-toto attestations (DSSE-signed) and even validating custom predicates against policy languages like Rego/CUE—this is a strong PM anchor for ecosystem interoperability. ([Sigstore][2])
---
## 3) Directions for Development Managers (Dev/Eng)
### A. Implement OCI attachment correctly (artifact, referrer, fallback)
**Engineering decisions:**
1. Store RVA as an OCI artifact manifest with:
* `artifactType` set to your verdict media type
* `subject` pointing to the exact image/artifact digest being evaluated
OCI 1.1 introduced these fields for associating metadata artifacts and retrieving them via the referrers API. ([opencontainers.org][1])
2. Support discovery via:
* Referrers API (`GET /v2/<name>/referrers/<digest>`) when registry supports it
* **Fallback “tagged index” strategy** for registries that don't support referrers (OCI 1.1 guidance calls out a fallback tag approach and client responsibilities). ([opencontainers.org][1])
**Dev acceptance tests**
* Push subject image → push RVA artifact with `subject` → query referrers → RVA appears.
* On a registry without referrers support: fallback retrieval still works.
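
A minimal sketch of the discovery flow, assuming anonymous access: try the referrers API first, then fall back to the conventional digest-derived tag. Endpoint paths follow the spec text above; auth and error handling are simplified.

```python
import json
import urllib.error
import urllib.request

def list_referrers(registry: str, repository: str, subject_digest: str,
                   artifact_type: str | None = None) -> list[dict]:
    # Primary path: OCI 1.1 referrers API.
    url = f"https://{registry}/v2/{repository}/referrers/{subject_digest}"
    if artifact_type:
        url += f"?artifactType={artifact_type}"
    try:
        with urllib.request.urlopen(url) as resp:
            return json.load(resp).get("manifests", [])
    except urllib.error.HTTPError as err:
        if err.code != 404:
            raise
    # Fallback path: registries without referrers support expose a tagged
    # index under a tag derived from the digest (sha256 digests fit without
    # truncation), per the OCI 1.1 fallback guidance referenced above.
    fallback_tag = subject_digest.replace(":", "-")
    url = f"https://{registry}/v2/{repository}/manifests/{fallback_tag}"
    req = urllib.request.Request(
        url, headers={"Accept": "application/vnd.oci.image.index.v1+json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp).get("manifests", [])
```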
### B. Use a standard attestation envelope and signing flow
For attestations, the lowest friction pathway is:
* in-toto Statement + DSSE envelope
* Sign/verify using cosign-compatible workflows (so customers can verify without you) ([Sigstore][2])
DSSE matters because it:
* authenticates message + type
* avoids canonicalization pitfalls
* supports arbitrary encodings ([GitHub][4])
**Engineering rule:** the signed payload must include enough data to replay and audit (policy + knowledge snapshot digests), but avoid embedding huge evidence blobs directly.
### C. Build determinism into the evaluation core (not bolted on)
**“Same inputs → same verdict” is a software architecture constraint.**
It fails if any of these are non-deterministic:
* fetching “latest” vulnerability DB at runtime
* unstable iteration order (maps/hashes)
* timestamps included as decision inputs
* concurrency races changing aggregation order
* floating point scoring without canonical rounding
**Engineering requirements**
1. Create a **Knowledge Snapshot** object (content-addressed):
* a manifest listing every dataset input by digest and version
2. The evaluation function becomes:
* `Verdict = Evaluate(subject_digest, policy_digest, knowledge_snapshot_digest, engine_version, options_digest)`
3. The RVA must embed those digests so replay is possible offline.
**Dev acceptance tests**
* Run Evaluate twice with same snapshot/policy → verdict + reason codes identical.
* Run Evaluate with one dataset changed (snapshot digest differs) → RVA must reflect changed snapshot digest.
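
A minimal sketch of the Evaluate entry point above; `decision_core` stands in for the pure policy evaluator, and the output embeds every pinned digest so replay is possible offline.

```python
import hashlib
import json

def _digest(obj) -> str:
    # Canonical JSON digest used for options and other structured inputs.
    return "sha256:" + hashlib.sha256(
        json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    ).hexdigest()

def evaluate(subject_digest: str, policy_digest: str,
             knowledge_snapshot_digest: str, engine_version: str,
             options: dict, decision_core) -> dict:
    # `decision_core` may only read data referenced by the pinned digests,
    # never "latest" feeds, the wall clock, or anything with unstable order.
    verdict, reason_codes = decision_core(
        subject_digest, policy_digest, knowledge_snapshot_digest, options)
    return {
        "verdict": verdict,
        "reasonCodes": sorted(reason_codes),      # stable ordering for replay
        "inputs": {
            "subject": subject_digest,
            "policy": policy_digest,
            "knowledgeSnapshot": knowledge_snapshot_digest,
            "engineVersion": engine_version,
            "options": _digest(options),
        },
    }
```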
### D. Treat “evidence” as a graph of content-addressed artifacts
Implement evidence storage with these rules:
* Large evidence artifacts are stored as OCI blobs/artifacts (SBOM, VEX bundle, reachability proof graph, config snapshot).
* RVA references evidence by digest and type.
* “Explain” traverses this graph and renders:
* a machine-readable explanation JSON
* a human-readable report
ORAS guidance highlights artifact typing via `artifactType` in OCI 1.1 and suggests keeping manifests manageable; don't overload annotations. ([oras.land][5])
### E. Provide a verification and policy enforcement path
You want customers to be able to enforce “only run artifacts with an approved RVA predicate.”
Two practical patterns:
* **Cosign verification of attestations** (customers can do `verify-attestation` and validate predicate structure; cosign supports validating attestations with policy languages like Rego/CUE). ([Sigstore][2])
* **Notation signatures** for organizations that standardize on Notary/Notation for OCI signing/verification workflows. ([Microsoft Learn][3])
Engineering should not hard-code one choice; implement an abstraction:
* signing backend: `cosign/DSSE` first
* optional: notation signature over the RVA artifact for environments that require it
---
## 4) Minimal “v1” spec by example (what your teams should build)
### A. OCI artifact requirements (registry-facing)
* artifact is discoverable as a referrer via `subject` linkage and `artifactType` classification (OCI 1.1). ([opencontainers.org][1])
### B. Attestation payload structure (contract-facing)
In code terms (illustrative only), build on the in-toto Statement model:
```json
{
  "_type": "https://in-toto.io/Statement/v0.1",
  "subject": [
    {
      "name": "oci://registry.example.com/team/app",
      "digest": { "sha256": "<SUBJECT_DIGEST>" }
    }
  ],
  "predicateType": "https://stellaops.dev/attestations/risk-verdict/v1",
  "predicate": {
    "verdict": "FAIL",
    "reasonCodes": ["RISK.CVE.REACHABLE", "RISK.POLICY.THRESHOLD_EXCEEDED"],
    "policy": { "id": "prod-gate", "digest": "sha256:<POLICY_DIGEST>" },
    "knowledgeSnapshot": { "id": "ks-2025-12-19", "digest": "sha256:<KS_DIGEST>" },
    "evidence": {
      "sbom": { "digest": "sha256:<SBOM_DIGEST>", "format": "cyclonedx-json" },
      "vexBundle": { "digest": "sha256:<VEX_DIGEST>", "format": "openvex" },
      "vulnData": { "digest": "sha256:<VULN_FEEDS_DIGEST>" },
      "reachability": { "digest": "sha256:<REACH_PROOF_DIGEST>" },
      "unknowns": { "count": 2, "digest": "sha256:<UNKNOWNS_DIGEST>" }
    },
    "engine": { "name": "stella-eval", "version": "1.3.0" }
  }
}
```
Cosign supports creating and verifying in-toto attestations (DSSE-signed), which is exactly the interoperability you want for customer-side verification. ([Sigstore][2])
---
## 5) Definition of Done (use this to align PM/Eng and prevent scope drift)
### v1 must satisfy all of the following:
1. **OCI-attached:** RVA is stored as an OCI artifact referrer to the subject digest and discoverable (referrers API + fallback mode). ([opencontainers.org][1])
2. **Signed:** RVA can be verified by a standard toolchain (cosign at minimum). ([Sigstore][2])
3. **Replayable:** Given the embedded policy + knowledge snapshot digests, the evaluation can be replayed and produces the same verdict + reason codes.
4. **Auditor extractable:** One command produces an audit package containing:
* RVA attestation
* policy bundle
* knowledge snapshot manifest
* referenced evidence artifacts
* an “explanation report” rendering the decision
5. **Stable contract:** predicate schema is versioned and validated (strict JSON schema checks; backwards compatibility rules).


@@ -0,0 +1,463 @@
## Outcome you are shipping
A deterministic “claim resolution” capability that takes:
* Multiple **claims** about the same vulnerability (vendor VEX, distro VEX, internal assessments, scanner inferences),
* A **policy** describing trust and merge semantics,
* A set of **evidence artifacts** (SBOM, config snapshots, reachability proofs, etc.),
…and produces a **single resolved status** per vulnerability/component/artifact **with an explainable trail**:
* Which claims applied and why
* Which were rejected and why
* What evidence was required and whether it was satisfied
* What policy rules triggered the resolution outcome
This replaces naive precedence like `vendor > distro > internal`.
---
# Directions for Product Managers
## 1) Write the PRD around “claims resolution,” not “VEX support”
The customer outcome is not “we ingest VEX.” It is:
* “We can *safely* accept 'not affected' without hiding risk.”
* “We can prove, to auditors and change control, why a CVE was downgraded.”
* “We can consistently resolve conflicts between issuer statements.”
### Non-negotiable product properties
* **Deterministic**: same inputs → same resolved outcome
* **Explainable**: a human can trace the decision path
* **Guardrailed**: a “safe” resolution requires evidence, not just a statement
---
## 2) Define the core objects (these drive everything)
In the PRD, define these three objects explicitly:
### A) Claim (normalized)
A “claim” is any statement about vulnerability applicability to an artifact/component, regardless of source format.
Minimum fields:
* `vuln_id` (CVE/GHSA/etc.)
* `subject` (component identity; ideally package + version + digest/purl)
* `target` (the thing we're evaluating: image, repo build, runtime instance)
* `status` (affected / not_affected / fixed / under_investigation / unknown)
* `justification` (human/machine reason)
* `issuer` (who said it; plus verification state)
* `scope` (what it applies to; versions, ranges, products)
* `timestamp` (when produced)
* `references` (links/IDs to evidence or external material)
### B) Evidence
A typed artifact that can satisfy a requirement.
Examples (not exhaustive):
* `config_snapshot` (e.g., Helm values, env var map, feature flag export)
* `sbom_presence_or_absence` (SBOM proof that component is/ isnt present)
* `reachability_proof` (call-path evidence from entrypoint to vulnerable symbol)
* `symbol_absence` (binary inspection shows symbol/function not present)
* `patch_presence` (artifact includes backport / fixed build)
* `manual_attestation` (human-reviewed attestation with reviewer identity + scope)
Each evidence item must have:
* `type`
* `collector` (tool/provider)
* `inputs_hash` and `output_hash`
* `scope` (what artifact/environment it applies to)
* `confidence` (optional but recommended)
* `expires_at` / `valid_for` (for config/runtime evidence)
### C) Policy
A policy describes:
* **Trust rules** (how much to trust whom, under which conditions)
* **Merge semantics** (how to resolve conflicts)
* **Evidence requirements** (what must be present to accept certain claims)
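
A minimal sketch of the Claim and Evidence objects as frozen dataclasses; field names follow the lists above, and the types are assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Claim:
    # One normalized statement about vulnerability applicability,
    # independent of the source format (OpenVEX, CycloneDX VEX, CSAF, ...).
    vuln_id: str          # CVE/GHSA/...
    subject: str          # component identity, ideally purl + version + digest
    target: str           # image / repo build / runtime instance
    status: str           # affected | not_affected | fixed | under_investigation | unknown
    justification: str
    issuer: str           # plus verification state in the full model
    scope: str
    timestamp: str
    references: tuple = ()

@dataclass(frozen=True)
class Evidence:
    type: str             # config_snapshot, reachability_proof, sbom_presence_or_absence, ...
    collector: str
    inputs_hash: str
    output_hash: str
    scope: str
    confidence: float = 1.0
    expires_at: str | None = None
```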
---
## 3) Ship “policy-controlled merge semantics” as a configuration schema first
Do not start with a fully general policy language. You need a small, explicit schema that makes behavior predictable.
PM deliverable: a policy spec with these sections:
1. **Issuer trust**
* weights by issuer category (vendor/distro/internal/scanner)
* optional constraints (must be signed, must match product ownership, must be within time window)
2. **Applicability rules**
* what constitutes a match to artifact/component (range semantics, digest match priority)
3. **Evidence requirements**
* per status + per justification: what evidence types are required
4. **Conflict resolution strategy**
* conservative vs weighted vs most-specific
* explicit guardrails (never accept “safe” without evidence)
5. **Override rules**
* when internal can override vendor (and what evidence is required to do so)
* environment-specific policies (prod vs dev)
---
## 4) Make “evidence hooks” a first-class user workflow
You are explicitly shipping the ability to say:
> “This is not affected **because** feature flag X is off.”
That requires:
* a way to **provide or discover** feature flag state, and
* a way to **bind** that flag to the vulnerable surface
PM must specify: what does the user do to assert that?
Minimum viable workflow:
* User attaches a `config_snapshot` (or system captures it)
* User provides a “binding” to the vulnerable module/function:
* either automatic (later) or manual (first release)
* e.g., `flag X gates module Y` with references (file path, code reference, runbook)
This “binding” itself becomes evidence.
---
## 5) Define acceptance criteria as decision trace tests
PM should write acceptance criteria as “given claims + policy + evidence → resolved outcome + trace”.
You need at least these canonical tests:
1. **Distro backport vs vendor version logic conflict**
* Vendor says affected (by version range)
* Distro says fixed (backport)
* Policy says: in distro context, distro claim can override vendor if patch evidence exists
* Outcome: fixed, with trace proving why
2. **Internal feature flag off downgrade**
* Vendor says affected
* Internal says not_affected because flag off
* Evidence: config snapshot + flag→module binding
* Outcome: not_affected **only for that environment context**, with trace
3. **Evidence missing**
* Internal says not_affected because “code not reachable”
* No reachability evidence present
* Outcome: unknown or affected (policy-dependent), but **not “not_affected”**
4. **Conflicting “safe” claims**
* Vendor says not_affected (reason A)
* Internal says affected (reason B) with strong evidence
* Outcome follows merge strategy, and trace must show why.
---
## 6) Package it as an “Explainable Resolution” feature
UI/UX requirements PM must specify:
* A “Resolved Status” view per vuln/component showing:
* contributing claims (ranked)
* rejected claims (with reason)
* evidence required vs evidence present
* the policy clauses triggered (line-level references)
* A policy editor can be CLI/JSON first; UI later, but explainability cannot wait.
---
# Directions for Development Managers
## 1) Implement as three services/modules with strict interfaces
### Module A: Claim Normalization
* Inputs: OpenVEX / CycloneDX VEX / CSAF / internal annotations / scanner hints
* Output: canonical `Claim` objects
Rules:
* Canonicalize IDs (normalize CVE formats, normalize package coordinates)
* Preserve provenance: issuer identity, signature metadata, timestamps, original document hash
### Module B: Evidence Providers (plugin boundary)
* Provide an interface like:
```
evaluate_evidence(context, claim) -> EvidenceEvaluation
```
Where `EvidenceEvaluation` returns:
* required evidence types for this claim (from policy)
* found evidence items (from store/providers)
* satisfied / not satisfied
* explanation strings
* confidence
Start with 3 providers:
1. SBOM provider (presence/absence)
2. Config provider (feature flags/config snapshot ingestion)
3. Reachability provider (even if initially limited or stubbed, it must exist as a typed hook)
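
A minimal sketch of the provider contract as a typed interface; names follow the `evaluate_evidence` signature above, and `EvaluationContext`/`Claim` are assumed to be defined elsewhere.

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass(frozen=True)
class EvidenceEvaluation:
    required_types: tuple[str, ...]   # from policy, per claim status/justification
    found_evidence: tuple[str, ...]   # evidence hashes from the store/providers
    satisfied: bool
    explanation: str
    confidence: float

class EvidenceProvider(Protocol):
    # Plugin boundary: the SBOM, config, and reachability providers all
    # implement this single method.
    def evaluate_evidence(self, context: "EvaluationContext",
                          claim: "Claim") -> EvidenceEvaluation: ...
```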
### Module C: Merge & Resolution Engine
* Inputs: set of claims + policy + evidence evaluations + context
* Output: `ResolvedDecision`
A `ResolvedDecision` must include:
* final status
* selected “winning” claim(s)
* all considered claims
* evidence satisfaction summary
* applied policy rule IDs
* deterministic ordering keys/hashes
---
## 2) Define the evaluation context (this avoids foot-guns)
The resolved outcome must be context-aware.
Create an immutable `EvaluationContext` object, containing:
* artifact identity (image digest / build digest / SBOM hash)
* environment identity (prod/stage/dev; cluster; region)
* config snapshot ID
* time (evaluation timestamp)
* policy version hash
This is how you support: “not affected because feature flag off” in prod but not in dev.
---
## 3) Merge semantics: implement scoring + guardrails, not precedence
You need a deterministic function. One workable approach:
### Step 1: compute statement strength
For each claim:
* `trust_weight` from policy (issuer + scope + signature requirements)
* `evidence_factor` (1.0 if requirements satisfied; <1 or 0 if not)
* `specificity_factor` (exact digest match > exact version > range)
* `freshness_factor` (optional; policy-defined)
* `applicability` must be true or claim is excluded
Compute:
```
support = trust_weight * evidence_factor * specificity_factor * freshness_factor
```
### Step 2: apply merge strategy (policy-controlled)
Ship at least two strategies:
1. **Conservative default**
* If any “unsafe” claim (affected/under_investigation) has support above threshold, it wins
* A “safe” claim (not_affected/fixed) can override only if:
* it has equal/higher support + delta, AND
* its evidence requirements are satisfied
2. **Evidence-weighted**
* Highest support wins, but safe statuses have a hard evidence gate
### Step 3: apply guardrails
Hard guardrail to prevent bad outcomes:
* **Never emit a safe status unless evidence requirements for that safe claim are satisfied.**
* If a safe claim lacks evidence, downgrade the safe claim to “unsupported” and do not allow it to win.
This single rule is what makes your system materially different from “VEX as suppression.”
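
A minimal sketch of the conservative strategy with the evidence guardrail; claim and policy attributes (the support factors, `claim_hash`, `unsafe_wins_threshold`, `safe_override_delta`) mirror the steps above and the example policy later in this document, and are assumptions about the in-memory model.

```python
UNSAFE = {"affected", "under_investigation"}
SAFE = {"not_affected", "fixed"}

def support(claim, policy) -> float:
    # support = trust_weight * evidence_factor * specificity_factor * freshness_factor
    return (policy.trust_weight(claim)
            * claim.evidence_factor      # 0.0 if required evidence is missing
            * claim.specificity_factor   # digest match > exact version > range
            * claim.freshness_factor)

def resolve_conservative(claims, policy):
    # Guardrail: a safe status can never win unless its evidence is satisfied.
    applicable = [c for c in claims if c.applicable]
    scored = sorted(((support(c, policy), c) for c in applicable),
                    key=lambda sc: (-sc[0], sc[1].claim_hash))  # stable tie-break

    best_unsafe = next(((s, c) for s, c in scored if c.status in UNSAFE), None)
    best_safe = next(((s, c) for s, c in scored
                      if c.status in SAFE and c.evidence_satisfied), None)

    if best_unsafe and best_unsafe[0] >= policy.unsafe_wins_threshold:
        # A safe claim may override only with a sufficient support margin.
        if best_safe and best_safe[0] >= best_unsafe[0] + policy.safe_override_delta:
            return best_safe[1]
        return best_unsafe[1]
    if best_safe:
        return best_safe[1]
    return None  # unknown: no supported claim survived the guardrails
```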
---
## 4) Evidence hooks: treat them as typed contracts, not strings
For “feature flag off,” implement it as a structured evidence requirement.
Example evidence requirement for a “safe because feature flag off” claim:
* Required evidence types:
* `config_snapshot`
* `flag_binding` (the mapping “flag X gates vulnerable surface Y”)
Implementation:
* Config provider can parse:
* Helm values / env var sets / feature flag exports
* Store them as normalized key/value with hashes
* Binding evidence can start as manual JSON that references:
* repo path / module / function group
* a link to code ownership / runbook
* optional test evidence
Later you can automate binding via static analysis, but do not block shipping on that.
---
## 5) Determinism requirements (engineering non-negotiables)
Development manager should enforce:
* stable sorting of claims by canonical key
* stable tie-breakers (e.g., issuer ID, timestamp, claim hash)
* no nondeterministic external calls during evaluation (or they must be snapshot-based)
* every evaluation produces:
* `input_bundle_hash` (claims + evidence + policy + context)
* `decision_hash`
This is the foundation for replayability and audits.
---
## 6) Storage model: store raw inputs and canonical forms
Minimum stores:
* Raw documents (original VEX/CSAF/etc.) keyed by content hash
* Canonical claims keyed by claim hash
* Evidence items keyed by evidence hash and scoped by context
* Policy versions keyed by policy hash
* Resolutions keyed by (context, vuln_id, subject) with decision hash
---
## 7) “Definition of done” checklist for engineering
You are done when:
1. You can ingest at least two formats into canonical claims (pick OpenVEX + CycloneDX VEX first).
2. You can configure issuer trust and evidence requirements in a policy file.
3. You can resolve conflicts deterministically.
4. You can attach a config snapshot and produce:
* `not_affected because feature flag off` **only when evidence satisfied**
5. The system produces a decision trace with:
* applied policy rules
* evidence satisfaction
* selected/rejected claims and reasons
6. Golden test vectors exist for the acceptance scenarios listed above.
---
# A concrete example policy (schema-first, no full DSL required)
```yaml
version: 1
trust:
  issuers:
    - match: {category: vendor}
      weight: 70
      require_signature: true
    - match: {category: distro}
      weight: 75
      require_signature: true
    - match: {category: internal}
      weight: 85
      require_signature: false
    - match: {category: scanner}
      weight: 40
evidence_requirements:
  safe_status_requires_evidence: true
  rules:
    - when:
        status: not_affected
        reason: feature_flag_off
      require: [config_snapshot, flag_binding]
    - when:
        status: not_affected
        reason: component_not_present
      require: [sbom_absence]
    - when:
        status: not_affected
        reason: not_reachable
      require: [reachability_proof]
merge:
  strategy: conservative
  unsafe_wins_threshold: 50
  safe_override_delta: 10
```
---
# A concrete example output trace (what auditors and engineers must see)
```json
{
  "vuln_id": "CVE-XXXX-YYYY",
  "subject": "pkg:maven/org.example/foo@1.2.3",
  "context": {
    "artifact_digest": "sha256:...",
    "environment": "prod",
    "policy_hash": "sha256:..."
  },
  "resolved_status": "not_affected",
  "because": [
    {
      "winning_claim": "claim_hash_abc",
      "reason": "feature_flag_off",
      "evidence_required": ["config_snapshot", "flag_binding"],
      "evidence_present": ["ev_hash_1", "ev_hash_2"],
      "policy_rules_applied": ["trust.issuers[internal]", "evidence.rules[0]", "merge.safe_override_delta"]
    }
  ],
  "claims_considered": [
    {"issuer": "vendor", "status": "affected", "support": 62, "accepted": false, "rejection_reason": "overridden_by_higher_support_safe_claim_with_satisfied_evidence"},
    {"issuer": "internal", "status": "not_affected", "support": 78, "accepted": true, "evidence_satisfied": true}
  ],
  "decision_hash": "sha256:..."
}
```
---
## The two strategic pitfalls to explicitly avoid
1. **“Trust precedence” as the merge mechanism**
* It will fail immediately on backports, forks, downstream patches, and environment-specific mitigations.
2. **Allowing “safe” without evidence**
* That turns VEX into a suppression system and will collapse trust in the product.


@@ -0,0 +1,338 @@
## Executive directive
Build **Reachability as Evidence**, not as a UI feature.
Every reachability conclusion must produce a **portable, signed, replayable evidence bundle** that answers:
1. **What vulnerable code unit is being discussed?** (symbol/method/function + version)
2. **What entrypoint is assumed?** (HTTP handler, RPC method, CLI, scheduled job, etc.)
3. **What is the witness?** (a call-path subgraph, not a screenshot)
4. **What assumptions/gates apply?** (config flags, feature toggles, runtime wiring)
5. **Can a third party reproduce it?** (same inputs → same evidence hash)
This must work for **source** and **post-build artifacts**.
---
# Directions for Product Managers
## 1) Define the product contract in one page
### Capability name
**Proof-carrying reachability**.
### Contract
Given an artifact (source or built) and a vulnerability mapping, Stella Ops outputs:
- **Reachability verdict:** `REACHABLE | NOT_PROVEN_REACHABLE | INCONCLUSIVE`
- **Witness evidence:** a minimal **reachability subgraph** + one or more witness paths
- **Reproducibility bundle:** all inputs and toolchain metadata needed to replay
- **Attestation:** signed statement tied to the artifact digest
### Important language choice
Avoid claiming “unreachable” unless you can prove non-reachability under a formally sound model.
- Use **NOT_PROVEN_REACHABLE** for “no path found under current analysis + assumptions.”
- Use **INCONCLUSIVE** when analysis cannot be performed reliably (missing symbols, obfuscation, unsupported language, dynamic dispatch uncertainty, etc.).
This is essential for credibility and audit use.
---
## 2) Anchor personas and top workflows
### Primary personas
- Security governance / AppSec: wants fewer false positives and defensible prioritization.
- Compliance/audit: wants evidence and replayability.
- Engineering teams: wants specific call paths and what to change.
### Top workflows (must support in MVP)
1. **CI gate with signed verdict**
- “Block release if any `REACHABLE` high severity is present OR if `INCONCLUSIVE` exceeds threshold.”
2. **Audit replay**
- “Reproduce the reachability proof for artifact digest X using snapshot Y.”
3. **Release delta**
- “Show what reachability changed between release A and B.”
---
## 3) Minimum viable scope: pick targets that make “post-build” real early
To satisfy “source and post-build artifacts” without biting off ELF-level complexity first:
### MVP artifact types (recommended)
- **Source repository** for 1–2 languages with mature static IR
- **Post-build intermediate artifacts** that retain symbol structure:
- Java `.jar/.class`
- .NET assemblies
- Python wheels (bytecode)
- Node bundles with sourcemaps (optional)
These give you “post-build” support where call graphs are tractable.
### Defer for later phases
- Native ELF/Mach-O deep reachability (harder due to stripping, inlining, indirect calls, dynamic loading)
- Highly dynamic languages without strong type info, unless you accept “witness-only” semantics
Your differentiator is proof portability and determinism, not “supports every binary on day one.”
---
## 4) Product requirements: what “proof-carrying” means in requirements language
### Functional requirements
- Output must include a **reachability subgraph**:
- Nodes = code units (function/method) with stable IDs
- Edges = call or dispatch edges with type annotations
- Must include at least one **witness path** from entrypoint to vulnerable node when `REACHABLE`
- Output must be **artifact-tied**:
- Evidence must reference artifact digest(s) (source commit, build artifact digest, container image digest)
- Output must be **attestable**:
- Produce a signed attestation (DSSE/in-toto style) attached to the artifact digest
- Output must be **replayable**:
- Provide a “replay recipe” (analyzer versions, configs, vulnerability mapping version, and input digests)
### Non-functional requirements
- Deterministic: repeated runs on same inputs produce identical evidence hash
- Size-bounded: subgraph evidence must be bounded (e.g., path-based extraction + limited context)
- Privacy-controllable:
- Support a mode that avoids embedding raw source content (store pointers/hashes instead)
- Verifiable offline:
- Verification and replay must work air-gapped given the snapshot bundle
---
## 5) Acceptance criteria (use as Definition of Done)
A feature is “done” only when:
1. **Verifier can validate** the attestation signature and confirm the evidence hash matches content.
2. A second machine can **reproduce the same evidence hash** given the replay bundle.
3. Evidence includes at least one witness path for `REACHABLE`.
4. Evidence includes explicit assumptions/gates; absence of gating is recorded as an assumption (e.g., “config unknown”).
5. Evidence is **linked to the precise artifact digest** being deployed/scanned.
---
## 6) Product packaging decisions that create switching cost
These are product decisions that turn engineering into moat:
- **Make “reachability proof” an exportable object**, not just a UI view.
- Provide an API: `GET /findings/{id}/proof` returning canonical evidence.
- Support policy gates on:
- `verdict`
- `confidence`
- `assumption_count`
- `inconclusive_reasons`
- Make “proof replay” a one-command workflow in CLI.
---
# Directions for Development Managers
## 1) Architecture: build a “proof pipeline” with strict boundaries
Implement as composable modules with stable interfaces:
1. **Artifact Resolver**
- Inputs: repo URL/commit, build artifact path, container image digest
- Output: normalized “artifact record” with digests and metadata
2. **Graph Builder (language-specific adapters)**
- Inputs: artifact record
- Output: canonical **Program Graph**
- Nodes: code units
- Edges: calls/dispatch
- Optional: config gates, dependency edges
3. **Vulnerability-to-Code Mapper**
- Inputs: vulnerability record (CVE), package coordinates, symbol metadata (if available)
- Output: vulnerable node set + mapping confidence
4. **Entrypoint Modeler**
- Inputs: artifact + runtime context (framework detection, routing tables, main methods)
- Output: entrypoint node set with types (HTTP, RPC, CLI, cron)
5. **Reachability Engine**
- Inputs: graph + entrypoints + vulnerable nodes + constraints
- Output: witness paths + minimal subgraph extraction
6. **Evidence Canonicalizer**
- Inputs: witness paths + subgraph + metadata
- Output: canonical JSON (stable ordering, stable IDs), plus content hash
7. **Attestor**
- Inputs: evidence hash + artifact digest
- Output: signed attestation object (OCI attachable)
8. **Verifier (separate component)**
- Must validate signatures + evidence integrity independently of generator
Critical: generator and verifier must be decoupled to preserve trust.
---
## 2) Evidence model: what to store (and how to keep it stable)
### Node identity must be stable across runs
Define a canonical NodeID scheme:
- Source node ID:
- `{language}:{repo_digest}:{symbol_signature}:{optional_source_location_hash}`
- Post-build node ID:
- `{language}:{artifact_digest}:{symbol_signature}:{optional_offset_or_token}`
Avoid raw file paths or non-deterministic compiler offsets as primary IDs unless normalized.
### Edge identity
`{caller_node_id} -> {callee_node_id} : {edge_type}`
Edge types matter (direct call, virtual dispatch, reflection, dynamic import, etc.)
### Subgraph extraction rule
Store:
- All nodes/edges on at least one witness path (or k witness paths)
- Plus bounded context:
- 1–2 hop neighborhood around the vulnerable node and entrypoint
- routing edges (HTTP route → handler) where applicable
This makes the proof compact and audit-friendly.
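
A minimal sketch of witness-path search and bounded-neighborhood extraction over an adjacency map keyed by stable NodeIDs; the graph shape is an assumption, and only successor context is kept for brevity.

```python
from collections import deque

def witness_path(edges: dict[str, list[str]], entrypoints: set[str],
                 vulnerable: set[str]) -> list[str] | None:
    # Breadth-first search from entrypoints; returns one shortest
    # entrypoint -> vulnerable-symbol path, or None if no path is found
    # under the current graph and assumptions (NOT_PROVEN_REACHABLE).
    parents, queue = {}, deque()
    for node in sorted(entrypoints):             # deterministic start order
        parents[node] = None
        queue.append(node)
    while queue:
        node = queue.popleft()
        if node in vulnerable:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return list(reversed(path))
        for nxt in sorted(edges.get(node, [])):  # deterministic expansion
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None

def bounded_subgraph(edges, path, hops: int = 1):
    # Keep the witness path plus a small successor neighborhood around it,
    # so the proof stays compact and audit-friendly.
    keep = set(path)
    for _ in range(hops):
        keep |= {dst for src in keep for dst in edges.get(src, [])}
    return {src: sorted(set(edges.get(src, [])) & keep) for src in sorted(keep)}
```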
### Canonicalization requirements
- Stable sorting of nodes and edges
- Canonical JSON serialization (no map-order nondeterminism)
- Explicit analyzer version + config included in evidence
- Hash everything that influences results
---
## 3) Determinism and reproducibility: engineering guardrails
### Deterministic computation
- Avoid parallel graph traversal that yields nondeterministic order without canonical sorting
- If using concurrency, collect results and sort deterministically before emitting
### Repro bundle (“time travel”)
Persist, as digests:
- Analyzer container/image digest
- Analyzer config hash
- Vulnerability mapping dataset version hash
- Artifact digest(s)
- Graph builder version hash
A replay must be possible without “calling home.”
### Golden tests
Create fixtures where:
- Same input graph + mapping → exact evidence hash
- Regression test for canonicalization changes (version the schema intentionally)
---
## 4) Attestation format and verification
### Attestation contents (minimum)
- Subject: artifact digest (image digest / build artifact digest)
- Predicate: reachability evidence hash + metadata
- Predicate type: `reachability` (custom) with versioning
### Verification requirements
- Verification must run offline
- It must validate:
1) signature
2) subject digest binding
3) evidence hash matches serialized evidence
### Storage model
Use content-addressable storage keyed by evidence hash.
Attestation references the hash; evidence stored separately or embedded (size tradeoff).
---
## 5) Source + post-build support: engineering plan
### Unifying principle
Both sources produce the same canonical Program Graph abstraction.
#### Source analyzers produce:
- Function/method nodes using language signatures
- Edges from static analysis IR
#### Post-build analyzers produce:
- Nodes from bytecode/assembly symbol tables (where available)
- Edges from bytecode call instructions / metadata
### Practical sequencing (recommended)
1. Implement one source language adapter (fastest to prove model)
2. Implement one post-build adapter where symbols are rich (e.g., Java bytecode)
3. Ensure evidence schema and attestation workflow works identically for both
4. Expand to more ecosystems once the proof pipeline is stable
---
## 6) Operational constraints (performance, size, security)
### Performance
- Cache program graphs per artifact digest
- Cache vulnerability-to-code mapping per package/version
- Compute reachability on-demand per vulnerability, but reuse graphs
### Evidence size
- Limit witness paths (e.g., up to N shortest paths)
- Prefer “witness + bounded neighborhood” over exporting full call graph
### Security and privacy
- Provide a “redacted proof mode”
- include symbol hashes instead of raw names if needed
- store source locations as hashes/pointers
- Never embed raw source code unless explicitly enabled
---
## 7) Definition of Done for the engineering team
A milestone is complete when you can demonstrate:
1. Generate a reachability proof for a known vulnerable code unit with a witness path.
2. Serialize a canonical evidence subgraph and compute a stable hash.
3. Sign the attestation bound to the artifact digest.
4. Verify the attestation on a clean machine (offline).
5. Replay the analysis from the replay bundle and reproduce the same evidence hash.
---
# Concrete artifact example (for alignment)
A reachability evidence object should look structurally like:
- `subject`: artifact digest(s)
- `claim`:
- `verdict`: REACHABLE / NOT_PROVEN_REACHABLE / INCONCLUSIVE
- `entrypoints`: list of NodeIDs
- `vulnerable_nodes`: list of NodeIDs
- `witness_paths`: list of paths (each path = ordered NodeIDs)
- `subgraph`:
- `nodes`: list with stable IDs + metadata
- `edges`: list with stable ordering + edge types
- `assumptions`:
- gating conditions, unresolved dynamic dispatch notes, etc.
- `tooling`:
- analyzer name/version/digest
- config hash
- mapping dataset hash
- `hashes`:
- evidence content hash
- schema version
Then wrap and sign it as an attestation tied to the artifact digest.
---
## The one decision you should force early
Decide (and document) whether your semantics are:
- **Witness-based** (“REACHABLE only if we can produce a witness path”), and
- **Conservative on negative claims** (“NOT_PROVEN_REACHABLE” is not “unreachable”).
This single decision will keep the system honest, reduce legal/audit risk, and prevent the product from drifting into hand-wavy “trust us” scoring.


@@ -0,0 +1,268 @@
## 1) Product direction: make “Unknowns” a first-class risk primitive
### Nonnegotiable product principles
1. **Unknowns are not suppressed findings**
* They are a distinct state with distinct governance.
2. **Unknowns must be policy-addressable**
* If policy cannot block or allow them explicitly, the feature is incomplete.
3. **Unknowns must be attested**
* Every signed decision must carry “what we don't know” in a machine-readable way.
4. **Unknowns must be default-on**
* Users may adjust thresholds, but they must not be able to “turn off unknown tracking.”
### Definition: what counts as an “unknown”
PMs must ensure that “unknown” is not vague. Define **reason-coded unknowns**, for example:
* **U-RCH**: Reachability unknown (call path indeterminate)
* **U-ID**: Component identity unknown (ambiguous package / missing digest / unresolved PURL)
* **U-PROV**: Provenance unknown (cannot map binary → source/build)
* **U-VEX**: VEX conflict or missing applicability statement
* **U-FEED**: Knowledge source missing (offline feed gaps, mirror stale)
* **U-CONFIG**: Config/runtime gate unknown (feature flag not observable)
* **U-ANALYZER**: Analyzer limitation (language/framework unsupported)
Each unknown must have:
* `reason_code` (one of a stable enum)
* `scope` (component, binary, symbol, package, image, repo)
* `evidence_refs` (what we inspected)
* `assumptions` (what would need to be true/false)
* `remediation_hint` (how to reduce unknown)
**Acceptance criterion:** every unknown surfaced to users can be traced to a reason code and remediation hint.
---
## 2) Policy direction: “unknown budgets” must be enforceable and environment-aware
### Policy model requirements
Policy must support:
* Thresholds by environment (dev/test/stage/prod)
* Thresholds by unknown type (reachability vs provenance vs feed, etc.)
* Severity weighting (e.g., unknown on internet-facing service is worse)
* Exception workflow (time-bound, owner-bound)
* Deterministic evaluation (same inputs → same result)
### Recommended default policy posture (ship as opinionated defaults)
These defaults are intentionally strict in prod:
**Prod (default)**
* `unknown_reachable == 0` (fail build/deploy)
* `unknown_provenance == 0` (fail)
* `unknown_total <= 3` (fail if exceeded)
* `unknown_feed == 0` (fail; “we didn't have data” is unacceptable for prod)
**Stage**
* `unknown_reachable <= 1`
* `unknown_provenance <= 1`
* `unknown_total <= 10`
**Dev**
* Never hard fail by default; warn + ticket/PR annotation
* Still compute unknowns and show trendlines (so teams see drift)
### Exception policy (required to avoid “disable unknowns” pressure)
Implement **explicit exceptions** rather than toggles:
* Exception must include: `owner`, `expiry`, `justification`, `scope`, `risk_ack`
* Exception must be emitted into attestations and reports (“this passed with exception X”).
**Acceptance criterion:** there is no “turn off unknowns” knob; only thresholds and expiring exceptions.
---
## 3) Reporting direction: unknowns must be visible, triaged, and trendable
### Required reporting surfaces
1. **Release / PR report**
* Unknown summary at top:
* total unknowns
* unknowns by reason code
* unknowns blocking policy vs not
* “What changed?” vs previous baseline (unknown delta)
2. **Dashboard (portfolio view)**
* Unknowns over time
* Top teams/services by unknown count
* Top unknown causes (reason codes)
3. **Operational triage view**
* “Unknown queue” sortable by:
* environment impact (prod/stage)
* exposure class (internet-facing/internal)
* reason code
* last-seen time
* owner
### Reporting should drive action, not anxiety
Every unknown row must include:
* Why it's unknown (reason code + short explanation)
* What evidence is missing
* How to reduce unknown (concrete steps)
* Expected effect (e.g., “adding debug symbols will likely reduce U-RCH by ~X”)
**Key PM instruction:** treat unknowns like an **SLO**. Teams should be able to commit to “unknowns in prod must trend to zero.”
---
## 4) Attestations direction: unknowns must be cryptographically bound to decisions
Every signed decision/attestation must include an “unknowns summary” section.
### Attestation requirements
Include at minimum:
* `unknown_total`
* `unknown_by_reason_code` (map of reason→count)
* `unknown_blocking_count`
* `unknown_details_digest` (hash of the full list if too large)
* `policy_thresholds_applied` (the exact thresholds used)
* `exceptions_applied` (IDs + expiries)
* `knowledge_snapshot_id` (feeds/policy bundle hash if you support offline snapshots)
**Why this matters:** if you sign a “pass,” you must also sign what you *didn't know* at the time. Otherwise the signature is not audit-grade.
**Acceptance criterion:** any downstream verifier can reject a signed “pass” based solely on unknown fields (e.g., “reject if unknown_reachable>0 in prod”).
---
## 5) Development direction: implement unknown propagation as a first-class data flow
### Core engineering tasks (must be done in this order)
#### A. Define the canonical “Tri-state” evaluation type
For any security claim, the evaluator must return:
* `TRUE` (evidence supports)
* `FALSE` (evidence refutes)
* `UNKNOWN` (insufficient evidence)
Do not represent unknown as nulls or missing fields. It must be explicit.
#### B. Build the unknown aggregator and reason-code framework
* A single aggregation layer computes:
* unknown counts per scope
* unknown counts per reason code
* unknown “blockers” based on policy
* This must be deterministic and stable (no random ordering, stable IDs).
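
A minimal sketch of the tri-state type, the unknown record, and deterministic aggregation; reason codes follow the enum defined in section 1, and the policy shape is an assumption.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum

class TriState(Enum):
    TRUE = "true"          # evidence supports the claim
    FALSE = "false"        # evidence refutes it
    UNKNOWN = "unknown"    # insufficient evidence; never encoded as null

@dataclass(frozen=True)
class Unknown:
    reason_code: str       # U-RCH, U-ID, U-PROV, U-VEX, U-FEED, U-CONFIG, U-ANALYZER
    scope: str             # component, binary, symbol, package, image, repo
    evidence_refs: tuple
    remediation_hint: str

def aggregate_unknowns(unknowns: list[Unknown], policy: dict) -> dict:
    # Deterministic aggregation: stable ordering, counts per reason code,
    # and "blocking" unknowns derived from the policy thresholds.
    by_reason = Counter(u.reason_code for u in unknowns)
    blocking = [u for u in sorted(unknowns, key=lambda u: (u.reason_code, u.scope))
                if u.reason_code in policy.get("blocking_reason_codes", set())]
    return {
        "unknown_total": len(unknowns),
        "unknown_by_reason_code": dict(sorted(by_reason.items())),
        "unknown_blocking_count": len(blocking),
    }
```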
#### C. Ensure analyzers emit unknowns instead of silently failing
Any analyzer that cannot conclude must emit:
* `UNKNOWN` + reason code + evidence pointers
Examples:
* call graph incomplete → `U-RCH`
* stripped binary cannot map symbols → `U-PROV`
* unsupported language → `U-ANALYZER`
#### D. Provide “reduce unknown” instrumentation hooks
Attach remediation metadata:
* “add build flags …”
* “upload debug symbols …”
* “enable source mapping …”
* “mirror feeds …”
This is how you prevent user backlash.
---
## 6) Make it default rather than optional: rollout plan without breaking adoption
### Phase 1: compute + display (no blocking)
* Unknowns computed for all scans
* Reports show unknown budgets and what would have failed in prod
* Collect baseline metrics for 2–4 weeks of typical usage
### Phase 2: soft gating
* In prod-like pipelines: fail only on `unknown_reachable > 0`
* Everything else warns + requires owner acknowledgement
### Phase 3: full policy enforcement
* Enforce default thresholds
* Exceptions require expiry and are visible in attestations
### Phase 4: governance integration
* Unknowns become part of:
* release readiness checks
* quarterly risk reviews
* vendor compliance audits
**Dev Manager instruction:** invest in tooling that reduces unknowns early (symbol capture, provenance mapping, better analyzers). Otherwise “unknown gating” becomes politically unsustainable.
---
## 7) “Definition of Done” checklist for PMs and Dev Managers
### PM DoD
* [ ] Unknowns are explicitly defined with stable reason codes
* [ ] Policy can fail on unknowns with environment-scoped thresholds
* [ ] Reports show unknown deltas and remediation guidance
* [ ] Exceptions are time-bound and appear everywhere (UI + API + attestations)
* [ ] Unknowns cannot be disabled; only thresholds/exceptions are configurable
### Engineering DoD
* [ ] Tri-state evaluation implemented end-to-end
* [ ] Analyzer failures never disappear; they become unknowns
* [ ] Unknown aggregation is deterministic and reproducible
* [ ] Signed attestation includes unknown summary + policy thresholds + exceptions
* [ ] CI/CD integration can enforce “fail if unknowns > N in prod”
---
## 8) Concrete policy examples you can standardize internally
### Minimal policy (prod)
* Block deploy if:
* `unknown_reachable > 0`
* OR `unknown_provenance > 0`
### Balanced policy (prod)
* Block deploy if:
* `unknown_reachable > 0`
* OR `unknown_provenance > 0`
* OR `unknown_total > 3`
### Risk-sensitive policy (internet-facing prod)
* Block deploy if:
* `unknown_reachable > 0`
* OR `unknown_total > 1`
* OR any unknown affects a component with known remotely-exploitable CVEs


@@ -0,0 +1,299 @@
## 1) Anchor the differentiator in one sentence everyone repeats
**Positioning invariant:**
Stella Ops does not “consume VEX to suppress findings.” Stella Ops **verifies who made the claim, scores how much to trust it, deterministically applies it to a decision, and emits a signed, replayable verdict**.
Everything you ship should make that sentence more true.
---
## 2) Shared vocabulary PMs/DMs must standardize
If you don't align on these, you'll ship features that look similar to competitors but do not compound into a moat.
### Core objects
- **VEX source**: a distribution channel and issuer identity (e.g., vendor feed, distro feed, OCI-attached attestation).
- **Issuer identity**: cryptographic identity used to sign/attest the VEX (key/cert/OIDC identity), not a string.
- **VEX statement**: one claim about one vulnerability status for one or more products; common statuses include *Not Affected, Affected, Fixed, Under Investigation* (terminology varies by format).
- **Verification result**: cryptographic + semantic verification facts about a VEX document/source.
- **Trust score**: deterministic numeric/ranked evaluation of the source and/or statement quality.
- **Decision**: a policy outcome (pass/fail/needs-review) for a specific artifact or release.
- **Attestation**: signed statement bound to an artifact (e.g., OCI artifact) that captures decision + evidence.
- **Knowledge snapshot**: frozen set of inputs (VEX docs, keys, policies, vulnerability DB versions, scoring code version) required for deterministic replay.
---
## 3) Product Manager guidelines
### 3.1 Treat “VEX source onboarding” as a first-class product workflow
Your differentiator collapses if VEX is just “upload a file.”
**PM requirements:**
1. **VEX Source Registry UI/API**
- Add/edit a source: URL/feed/OCI pattern, update cadence, expected issuer(s), allowed formats.
- Define trust policy per source (thresholds, allowed statuses, expiry, overrides).
2. **Issuer enrollment & key lifecycle**
- Capture: issuer identity, trust anchor, rotation, revocation/deny-list, “break-glass disable.”
3. **Operational status**
- Source health: last fetch, last verified doc, signature failures, schema failures, drift.
**Why it matters:** customers will only operationalize VEX at scale if they can **govern it like a dependency feed**, not like a manual exception list.
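As a sketch of what "govern it like a dependency feed" could look like, a single registry entry might carry the fields below. Every key, value, and the feed shown are illustrative assumptions, not a finalized API.

```python
# Hypothetical source registry entry; keys and values are assumptions for this sketch.
vex_source_entry = {
    "name": "ubuntu-openvex",
    "kind": "https-feed",                        # or "oci-attestation"
    "location": "<feed URL or OCI reference pattern>",
    "update_cadence": "daily",
    "allowed_formats": ["openvex"],
    "expected_issuers": ["<issuer identity pinned to a trust anchor>"],
    "trust_policy": {
        "min_trust_score": 70,
        "allowed_statuses": ["not_affected", "affected", "fixed", "under_investigation"],
        "max_statement_age_days": 180,
    },
}
```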
### 3.2 Make “verification” visible, not implied
If users can't see it, they won't trust it, and auditors won't accept it.
**Minimum UX per VEX document/statement:**
- Verification status: **Verified / Unverified / Failed**
- Issuer identity: who signed it (and via what trust anchor)
- Format + schema validation status (OpenVEX JSON schema exists and is explicitly recommended for validation).
- Freshness: timestamp, last updated
- Product mapping coverage: “X of Y products matched to SBOM/components”
### 3.3 Provide “trust score explanations” as a primary UI primitive
Trust scoring must not feel like a magic number.
**UX requirement:** every trust score shows a **breakdown** (e.g., Identity 30/30, Authority 20/25, Freshness 8/10, Evidence quality 6/10…).
This is both:
- a user adoption requirement (security teams will challenge it), and
- a moat hardener (competitors rarely expose scoring mechanics).
### 3.4 Define policy experiences that force deterministic coupling
You are not building a “VEX viewer.” You are building **decisioning**.
Policies must allow:
- “Accept VEX only if verified AND trust score ≥ threshold”
- “Accept Not Affected only if justification/impact statement exists”
- “If conflicting VEX exists, resolve by trust-weighted precedence”
- “For unverified VEX, treat status as Under Investigation (or Unknown), not Not Affected”
This aligns with CSAF's VEX profile expectation that *known_not_affected* should have an impact statement (machine-readable flag or human-readable justification).
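One way to keep the coupling deterministic is to express these rules as tenant policy data rather than code. A minimal sketch, with field names and the threshold value assumed for illustration:

```python
# Illustrative tenant policy; names and values are assumptions for this sketch.
vex_acceptance_policy = {
    "require_verification": True,                      # unverified VEX never suppresses
    "min_trust_score": 70,                             # below this, claims are informational only
    "not_affected_requires_justification": True,
    "conflict_resolution": "trust_weighted_precedence",
    "unverified_status_fallback": "under_investigation",
}
```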
### 3.5 Ship “audit export” as a product feature, not a report
Auditors want to know:
- which VEX claims were applied,
- who asserted them,
- what trust policy allowed them,
- and what was the resulting decision.
ENISA's SBOM guidance explicitly emphasizes "historical snapshots" and "evidence chain integrity" as success criteria for SBOM/VEX integration programs.
So your product needs:
- exportable evidence bundles (machine-readable)
- signed verdicts linked to the artifact
- replay semantics (“recompute this exact decision later”)
### 3.6 MVP scoping: start with sources that prove the model
For early product proof, prioritize sources that:
- are official,
- have consistent structure,
- publish frequently,
- contain configuration nuance.
Example: Ubuntu publishes VEX following OpenVEX, emphasizing exploitability in specific configurations and providing official distribution points (tarball + GitHub).
This gives you a clean first dataset for verification/trust scoring behaviors.
---
## 4) Development Manager guidelines
### 4.1 Architect it as a pipeline with hard boundaries
Do not mix verification, scoring, and decisioning in one component. You need isolatable, testable stages.
**Recommended pipeline stages:**
1. **Ingest**
- Fetch from registry/OCI
- Deduplicate by content hash
2. **Parse & normalize**
- Convert OpenVEX / CSAF VEX / CycloneDX VEX into a **canonical internal VEX model**
- Note: OpenVEX explicitly calls out that CycloneDX VEX uses different status/justification labels and may need translation.
3. **Verify (cryptographic + semantic)**
4. **Trust score (pure function)**
5. **Conflict resolve**
6. **Decision**
7. **Attest + persist snapshot**
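One way to keep the boundaries hard is to pin each stage as an explicit function signature. A minimal sketch, with every name and type hint assumed for illustration:

```python
# Stage boundaries as signatures only; all names and types here are assumptions.
def ingest(source_config) -> list[bytes]: ...                    # fetch, dedupe by content hash
def parse_and_normalize(raw: bytes) -> "CanonicalVex": ...       # OpenVEX/CSAF/CycloneDX -> one model
def verify(doc: "CanonicalVex", trust_anchors) -> "VerificationResult": ...
def score(doc, verification, source_config, algo_version, timestamp) -> int: ...   # pure function
def resolve_conflicts(scored_statements, policy) -> list["VexStatement"]: ...
def decide(sbom, findings, statements, policy, snapshot) -> "Decision": ...
def attest_and_persist(decision, snapshot) -> bytes: ...         # signed verdict + snapshot bundle
```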
### 4.2 Verification must include both cryptography and semantics
#### Cryptographic verification (minimum bar)
- Verify signature/attestation against expected issuer identity.
- Validate certificate/identity chains per customer trust anchors.
- Support OCI-attached artifacts and “signature-of-signature” patterns (Sigstore describes countersigning: signature artifacts can themselves be signed).
#### Semantic verification (equally important)
- Schema validation (OpenVEX provides JSON schema guidance).
- Vulnerability identifier validity (CVE/aliases)
- Product reference validity (e.g., purl)
- Statement completeness rules:
- “Not affected” must include rationale; CSAF VEX profile requires an impact statement for known_not_affected in flags or threats.
- Cross-check the statement scope to known SBOM/components:
- If the VEX references products that do not exist in the artifact SBOM, the claim should not affect the decision (or should reduce trust sharply).
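A minimal sketch of these completeness and scope checks, assuming the statement shape from the vocabulary section; the specific rule list is illustrative, not exhaustive.

```python
# Illustrative semantic checks; rule set, identifier prefixes, and return shape are assumptions.
def semantic_problems(statement, sbom_purls: set[str]) -> list[str]:
    problems = []
    if statement.status == "not_affected" and not statement.justification:
        problems.append("not_affected without impact statement/justification")
    if not statement.vulnerability_id.upper().startswith(("CVE-", "GHSA-")):
        problems.append("unrecognized vulnerability identifier")
    if not any(p in sbom_purls for p in statement.products):
        problems.append("no product reference matches the artifact SBOM")
    return problems   # empty list == semantically acceptable; non-empty feeds trust scoring
```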
### 4.3 Trust scoring must be deterministic by construction
If trust scoring varies between runs, you cannot produce replayable, attestable decisions.
**Rules for determinism:**
- Trust score is a **pure function** of:
- VEX document hash
- verification result
- source configuration (immutable version)
- scoring algorithm version
- evaluation timestamp (explicit input, included in snapshot)
- Never call external services during scoring unless responses are captured and hashed into the snapshot.
### 4.4 Implement two trust concepts: Source Trust and Statement Quality
Do not overload one score to do everything.
- **Source Trust**: “how much do we trust the issuer/channel?”
- **Statement Quality**: “how well-formed, specific, justified is this statement?”
You can then combine them:
`TrustScore = f(SourceTrust, StatementQuality, Freshness, TrackRecord)`
### 4.5 Conflict resolution must be policy-driven, not hard-coded
Conflicting VEX is inevitable:
- vendor vs distro
- older vs newer
- internal vs external
Resolve via:
- deterministic precedence rules configured per tenant
- trust-weighted tie-breakers
- “newer statement wins” only when issuer is the same or within the same trust class
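A sketch of a deterministic resolver consistent with those rules, assuming each normalized statement carries an issuer class, trust score, and timestamp (attribute names are assumptions):

```python
# Precedence first (tenant-configured issuer classes), trust score second, recency last,
# so "newer wins" only decides among statements of equal precedence and trust.
def resolve(statements_for_vuln, precedence_rank: dict[str, int]):
    return max(
        statements_for_vuln,
        key=lambda s: (precedence_rank.get(s.issuer_class, 0), s.trust_score, s.timestamp),
    )
```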
### 4.6 Store VEX and decision inputs as content-addressed artifacts
If you want replayability, you must be able to reconstruct the “world state.”
**Persist:**
- VEX docs (by digest)
- verification artifacts (signature bundles, cert chains)
- normalized VEX statements (canonical form)
- trust score + breakdown + algorithm version
- policy bundle + version
- vulnerability DB snapshot identifiers
- decision output + evidence pointers
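A sketch of the persisted world state as a single content-addressed manifest; all keys and placeholder digests are illustrative.

```python
# Hypothetical snapshot manifest; every value is a digest or a pinned identifier.
snapshot_manifest = {
    "vex_docs": ["sha256:<…>", "sha256:<…>"],
    "verification_artifacts": ["sha256:<…>"],
    "normalized_statements": "sha256:<…>",
    "trust_scores": {"digest": "sha256:<…>", "algorithm_version": "<version>"},
    "policy_bundle": {"digest": "sha256:<…>", "version": "<version>"},
    "vuln_db_snapshot_id": "<pinned identifier>",
    "decision_output": "sha256:<…>",
}
```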
---
## 5) A practical trust scoring rubric you can hand to teams
Use a 0–100 score with defined buckets. The weights below are a starting point; what matters is consistency and explainability.
### 5.1 Source Trust (0–60)
1. **Issuer identity verified (0–25)**
- 0 if unsigned/unverifiable
- 25 if signature verified to a known trust anchor
2. **Issuer authority alignment (0–20)**
- 20 if issuer is the product supplier/distro maintainer for that component set
- lower if third party / aggregator
3. **Distribution integrity (0–15)**
- extra credit if the VEX is distributed as an attestation bound to an artifact and/or uses auditable signature patterns (e.g., countersigning).
### 5.2 Statement Quality (0–40)
1. **Scope specificity (0–15)**
- exact product IDs (purl), versions, architectures, etc.
2. **Justification/impact present and structured (0–15)**
- CSAF VEX expects impact statement for known_not_affected; Ubuntu maps “not_affected” to justifications like `vulnerable_code_not_present`.
3. **Freshness (0–10)**
- based on statement/document timestamps (explicitly hashed into snapshot)
### Score buckets
- **90–100**: Verified + authoritative + high-quality → eligible for gating
- **70–89**: Verified but weaker evidence/scope → eligible with policy constraints
- **40–69**: Mixed/partial trust → informational, not gating by default
- **0–39**: Unverified/low quality → do not affect decisions
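The rubric transcribes directly into a pure function. The weights below are the starting-point values from this section; the values chosen for the "lower" branches (third-party issuer, unattested distribution, stale statements) are assumptions made only to keep the sketch concrete.

```python
# Pure function of explicit inputs; no I/O, no clock reads, so the score is replayable.
def trust_score(verified: bool, authoritative_issuer: bool, attested_distribution: bool,
                scope_specific: bool, justified: bool, statement_age_days: int) -> int:
    source_trust = (
        (25 if verified else 0)
        + (20 if authoritative_issuer else 10)     # "lower if third party" -> 10 assumed
        + (15 if attested_distribution else 5)     # partial credit assumed
    )
    statement_quality = (
        (15 if scope_specific else 5)
        + (15 if justified else 0)
        + (10 if statement_age_days <= 90 else 5 if statement_age_days <= 365 else 0)
    )
    return source_trust + statement_quality        # 0-100; buckets and fail-safe rules live in policy
```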
---
## 6) Tight coupling to deterministic decisioning: what “coupling” means in practice
### 6.1 VEX must be an input to the same deterministic evaluation engine that produces the verdict
Do not build “VEX handling” as a sidecar that produces annotations.
**Decision engine inputs must include:**
- SBOM / component graph
- vulnerability findings
- normalized VEX statements
- verification results + trust scores
- tenant policy bundle
- evaluation timestamp + snapshot identifiers
The engine output must include:
- final status per vulnerability (affected/not affected/fixed/under investigation/unknown)
- **why** (evidence pointers)
- the policy rule(s) that caused it
### 6.2 Default posture: fail-safe, not fail-open
Recommended defaults:
- **Unverified VEX never suppresses vulnerabilities.**
- Trust score below threshold never suppresses.
- “Not affected” without justification/impact statement never suppresses.
This is aligned with CSAF VEX expectations and avoids the easiest suppression attack vector.
### 6.3 Make uncertainty explicit
If VEX conflicts or is low trust, your decisioning must produce explicit states like:
- “Unknown (insufficient trusted VEX)”
- “Under Investigation”
That is consistent with common VEX status vocabulary and avoids false certainty.
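Pulled together, here is a sketch of the per-vulnerability decision step that honors both the fail-safe defaults and explicit uncertainty. The threshold, attribute names, and returned status strings are assumptions for illustration; the real engine would also emit the policy rule IDs that fired.

```python
# Fail-safe by construction: only verified, sufficiently trusted, justified claims can
# change a finding's status; anything weaker yields an explicit uncertain state.
def vulnerability_status(finding, statements, min_trust: int = 70):
    applicable = [s for s in statements if s.vulnerability_id == finding.vuln_id]
    if not applicable:
        return "affected", ["no applicable VEX; finding stands"]
    best = max(applicable, key=lambda s: s.trust_score)
    trusted = (
        best.verified
        and best.trust_score >= min_trust
        and (best.status != "not_affected" or best.justification is not None)
    )
    if not trusted:
        return "under_investigation", ["VEX present but below trust policy; not suppressed"]
    return best.status, [f"issuer={best.issuer}", f"trust={best.trust_score}"]
```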
---
## 7) Tight coupling to attestations: what to attest, when, and why
### 7.1 Attest **decisions**, not just documents
Competitors already sign SBOMs. Your moat is signing the **verdict** with the evidence chain.
Each signed verdict should bind:
- subject artifact digest (container/image/package)
- decision output (pass/fail/etc.)
- hashes of:
- VEX docs used
- verification artifacts
- trust scoring breakdown
- policy bundle
- vulnerability DB snapshot identifiers
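A sketch of what that binding could look like using the in-toto Statement layout; the predicateType URI and all predicate fields are illustrative assumptions, not a published schema.

```python
# Hypothetical risk-verdict attestation payload (shown as a Python dict for readability).
risk_verdict_statement = {
    "_type": "https://in-toto.io/Statement/v1",
    "subject": [{"name": "<image reference>", "digest": {"sha256": "<image digest>"}}],
    "predicateType": "https://example.invalid/attestations/risk-verdict/v1",   # placeholder URI
    "predicate": {
        "decision": "pass",
        "policy_bundle": {"digest": "sha256:<…>", "version": "<version>"},
        "vex_docs": ["sha256:<…>"],
        "verification_artifacts": ["sha256:<…>"],
        "trust_score_breakdowns": "sha256:<…>",
        "vuln_db_snapshot_id": "<pinned identifier>",
        "evaluation_timestamp": "<RFC 3339 timestamp>",
    },
}
```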
### 7.2 Make attestations replayable
Your attestation must contain enough references (digests) that the system can:
- re-run the decision in an air-gapped environment
- obtain the same outputs
This aligns with “historical snapshots” / “evidence chain integrity” expectations in modern SBOM programs.
### 7.3 Provide two attestations (recommended)
1. **VEX intake attestation** (optional but powerful)
- “We ingested and verified this VEX doc from issuer X under policy Y.”
2. **Risk verdict attestation** (core differentiator)
- “Given SBOM, vulnerabilities, verified VEX, and policy snapshot, the artifact is acceptable/unacceptable.”
Sigstore's countersigning concept illustrates that you can add layers of trust over artifacts/signatures; your verdict is the enterprise-grade layer.
---
## 8) “Definition of Done” checklists (use in roadmaps)
### PM DoD for VEX Trust (ship criteria)
- A customer can onboard a VEX source and see issuer identity + verification state.
- Trust score exists with a visible breakdown and policy thresholds.
- Policies can gate on trust score + verification.
- Audit export: per release, show which VEX claims affected the final decision.
### DM DoD for Deterministic + Attestable
- Same inputs → identical trust score and decision (golden tests).
- All inputs content-addressed and captured in a snapshot bundle.
- Attestation includes digests of all relevant inputs and a decision summary.
- No network dependency at evaluation time unless recorded in snapshot.
---
## 9) Metrics that prove you differentiated
Track these from the first pilot:
1. **% of decisions backed by verified VEX** (not just present)
2. **% of “not affected” outcomes with cryptographic verification + justification**
3. **Replay success rate** (recompute verdict from snapshot)
4. **Time-to-audit** (minutes to produce evidence chain for a release)
5. **False suppression rate** (should be effectively zero with fail-safe defaults)

View File

@@ -0,0 +1,268 @@
## 1) Product direction: make “Unknowns” a first-class risk primitive
### Non-negotiable product principles
1. **Unknowns are not suppressed findings**
* They are a distinct state with distinct governance.
2. **Unknowns must be policy-addressable**
* If policy cannot block or allow them explicitly, the feature is incomplete.
3. **Unknowns must be attested**
* Every signed decision must carry “what we don't know” in a machine-readable way.
4. **Unknowns must be default-on**
* Users may adjust thresholds, but they must not be able to “turn off unknown tracking.”
### Definition: what counts as an “unknown”
PMs must ensure that “unknown” is not vague. Define **reason-coded unknowns**, for example:
* **U-RCH**: Reachability unknown (call path indeterminate)
* **U-ID**: Component identity unknown (ambiguous package / missing digest / unresolved PURL)
* **U-PROV**: Provenance unknown (cannot map binary → source/build)
* **U-VEX**: VEX conflict or missing applicability statement
* **U-FEED**: Knowledge source missing (offline feed gaps, mirror stale)
* **U-CONFIG**: Config/runtime gate unknown (feature flag not observable)
* **U-ANALYZER**: Analyzer limitation (language/framework unsupported)
Each unknown must have:
* `reason_code` (one of a stable enum)
* `scope` (component, binary, symbol, package, image, repo)
* `evidence_refs` (what we inspected)
* `assumptions` (what would need to be true/false)
* `remediation_hint` (how to reduce unknown)
**Acceptance criterion:** every unknown surfaced to users can be traced to a reason code and remediation hint.
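A minimal sketch of the unknown record implied by these requirements; the enum values mirror the reason codes above, while the class and field shapes are assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class UnknownReason(Enum):
    U_RCH = "reachability unknown"
    U_ID = "component identity unknown"
    U_PROV = "provenance unknown"
    U_VEX = "VEX conflict or missing applicability"
    U_FEED = "knowledge source missing"
    U_CONFIG = "config/runtime gate unknown"
    U_ANALYZER = "analyzer limitation"


@dataclass(frozen=True)
class Unknown:
    reason_code: UnknownReason
    scope: str                       # component, binary, symbol, package, image, repo
    evidence_refs: tuple[str, ...]   # what was inspected
    assumptions: tuple[str, ...]     # what would need to be true/false
    remediation_hint: str            # how to reduce the unknown
```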
---
## 2) Policy direction: “unknown budgets” must be enforceable and environment-aware
### Policy model requirements
Policy must support:
* Thresholds by environment (dev/test/stage/prod)
* Thresholds by unknown type (reachability vs provenance vs feed, etc.)
* Severity weighting (e.g., unknown on internet-facing service is worse)
* Exception workflow (time-bound, owner-bound)
* Deterministic evaluation (same inputs → same result)
### Recommended default policy posture (ship as opinionated defaults)
These defaults are intentionally strict in prod:
**Prod (default)**
* `unknown_reachable == 0` (fail build/deploy)
* `unknown_provenance == 0` (fail)
* `unknown_total <= 3` (fail if exceeded)
* `unknown_feed == 0` (fail; “we didn't have data” is unacceptable for prod)
**Stage**
* `unknown_reachable <= 1`
* `unknown_provenance <= 1`
* `unknown_total <= 10`
**Dev**
* Never hard fail by default; warn + ticket/PR annotation
* Still compute unknowns and show trendlines (so teams see drift)
### Exception policy (required to avoid “disable unknowns” pressure)
Implement **explicit exceptions** rather than toggles:
* Exception must include: `owner`, `expiry`, `justification`, `scope`, `risk_ack`
* Exception must be emitted into attestations and reports (“this passed with exception X”).
**Acceptance criterion:** there is no “turn off unknowns” knob; only thresholds and expiring exceptions.
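The default posture above can ship as plain data so thresholds stay environment-scoped, reviewable, and diffable; the structure and key names are assumptions.

```python
# Opinionated defaults expressed as data; None means "warn and annotate, never hard-fail".
unknown_budgets = {
    "prod":  {"unknown_reachable": 0, "unknown_provenance": 0, "unknown_feed": 0, "unknown_total": 3},
    "stage": {"unknown_reachable": 1, "unknown_provenance": 1, "unknown_total": 10},
    "dev":   None,
}
```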
---
## 3) Reporting direction: unknowns must be visible, triaged, and trendable
### Required reporting surfaces
1. **Release / PR report**
* Unknown summary at top:
* total unknowns
* unknowns by reason code
* unknowns blocking policy vs not
* “What changed?” vs previous baseline (unknown delta)
2. **Dashboard (portfolio view)**
* Unknowns over time
* Top teams/services by unknown count
* Top unknown causes (reason codes)
3. **Operational triage view**
* “Unknown queue” sortable by:
* environment impact (prod/stage)
* exposure class (internet-facing/internal)
* reason code
* last-seen time
* owner
### Reporting should drive action, not anxiety
Every unknown row must include:
* Why it's unknown (reason code + short explanation)
* What evidence is missing
* How to reduce unknown (concrete steps)
* Expected effect (e.g., “adding debug symbols will likely reduce U-RCH by ~X”)
**Key PM instruction:** treat unknowns like an **SLO**. Teams should be able to commit to “unknowns in prod must trend to zero.”
---
## 4) Attestations direction: unknowns must be cryptographically bound to decisions
Every signed decision/attestation must include an “unknowns summary” section.
### Attestation requirements
Include at minimum:
* `unknown_total`
* `unknown_by_reason_code` (map of reason→count)
* `unknown_blocking_count`
* `unknown_details_digest` (hash of the full list if too large)
* `policy_thresholds_applied` (the exact thresholds used)
* `exceptions_applied` (IDs + expiries)
* `knowledge_snapshot_id` (feeds/policy bundle hash if you support offline snapshots)
**Why this matters:** if you sign a “pass,” you must also sign what you *didn't know* at the time. Otherwise the signature is not audit-grade.
**Acceptance criterion:** any downstream verifier can reject a signed “pass” based solely on unknown fields (e.g., “reject if unknown_reachable>0 in prod”).
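A sketch of the unknowns section a downstream verifier would see inside the signed decision; field names follow the list above and the values are placeholders only.

```python
# Hypothetical unknowns summary embedded in the attestation predicate.
unknowns_summary = {
    "unknown_total": 4,
    "unknown_by_reason_code": {"U-RCH": 1, "U-PROV": 0, "U-FEED": 1, "U-ANALYZER": 2},
    "unknown_blocking_count": 1,
    "unknown_details_digest": "sha256:<…>",
    "policy_thresholds_applied": {"environment": "prod", "unknown_reachable": 0, "unknown_total": 3},
    "exceptions_applied": [{"id": "<exception id>", "expiry": "<RFC 3339 timestamp>"}],
    "knowledge_snapshot_id": "sha256:<…>",
}
```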
---
## 5) Development direction: implement unknown propagation as a first-class data flow
### Core engineering tasks (must be done in this order)
#### A. Define the canonical “Tri-state” evaluation type
For any security claim, the evaluator must return:
* `TRUE` (evidence supports)
* `FALSE` (evidence refutes)
* `UNKNOWN` (insufficient evidence)
Do not represent unknown as nulls or missing fields. It must be explicit.
#### B. Build the unknown aggregator and reason-code framework
* A single aggregation layer computes:
* unknown counts per scope
* unknown counts per reason code
* unknown “blockers” based on policy
* This must be deterministic and stable (no random ordering, stable IDs).
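A minimal sketch of the tri-state type and a deterministic aggregator, reusing the reason-coded record sketched earlier; names are illustrative.

```python
from collections import Counter
from enum import Enum


class TriState(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"            # explicit value, never a null or missing field


def aggregate_unknowns(unknowns: list["Unknown"]) -> dict:
    by_reason = Counter(u.reason_code.name for u in unknowns)
    return {
        "unknown_total": len(unknowns),
        "unknown_by_reason_code": dict(sorted(by_reason.items())),   # stable, reproducible ordering
    }
```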
#### C. Ensure analyzers emit unknowns instead of silently failing
Any analyzer that cannot conclude must emit:
* `UNKNOWN` + reason code + evidence pointers
Examples:
* call graph incomplete → `U-RCH`
* stripped binary cannot map symbols → `U-PROV`
* unsupported language → `U-ANALYZER`
#### D. Provide “reduce unknown” instrumentation hooks
Attach remediation metadata:
* “add build flags …”
* “upload debug symbols …”
* “enable source mapping …”
* “mirror feeds …”
This is how you prevent user backlash.
---
## 6) Make it default rather than optional: rollout plan without breaking adoption
### Phase 1: compute + display (no blocking)
* Unknowns computed for all scans
* Reports show unknown budgets and what would have failed in prod
* Collect baseline metrics for 2–4 weeks of typical usage
### Phase 2: soft gating
* In prod-like pipelines: fail only on `unknown_reachable > 0`
* Everything else warns + requires owner acknowledgement
### Phase 3: full policy enforcement
* Enforce default thresholds
* Exceptions require expiry and are visible in attestations
### Phase 4: governance integration
* Unknowns become part of:
* release readiness checks
* quarterly risk reviews
* vendor compliance audits
**Dev Manager instruction:** invest in tooling that reduces unknowns early (symbol capture, provenance mapping, better analyzers). Otherwise “unknown gating” becomes politically unsustainable.
---
## 7) “Definition of Done” checklist for PMs and Dev Managers
### PM DoD
* [ ] Unknowns are explicitly defined with stable reason codes
* [ ] Policy can fail on unknowns with environment-scoped thresholds
* [ ] Reports show unknown deltas and remediation guidance
* [ ] Exceptions are time-bound and appear everywhere (UI + API + attestations)
* [ ] Unknowns cannot be disabled; only thresholds/exceptions are configurable
### Engineering DoD
* [ ] Tri-state evaluation implemented end-to-end
* [ ] Analyzer failures never disappear; they become unknowns
* [ ] Unknown aggregation is deterministic and reproducible
* [ ] Signed attestation includes unknown summary + policy thresholds + exceptions
* [ ] CI/CD integration can enforce “fail if unknowns > N in prod”
---
## 8) Concrete policy examples you can standardize internally
### Minimal policy (prod)
* Block deploy if:
* `unknown_reachable > 0`
* OR `unknown_provenance > 0`
### Balanced policy (prod)
* Block deploy if:
* `unknown_reachable > 0`
* OR `unknown_provenance > 0`
* OR `unknown_total > 3`
### Risk-sensitive policy (internet-facing prod)
* Block deploy if:
* `unknown_reachable > 0`
* OR `unknown_total > 1`
* OR any unknown affects a component with known remotely-exploitable CVEs
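As a closing sketch, here is the risk-sensitive prod policy enforced in a CI/CD gate, assuming the unknowns-summary shape used in the attestation section; the `touches_remotely_exploitable` flag is an assumed input computed upstream, not an existing field.

```python
# Deterministic prod gate for the risk-sensitive policy; True means the deploy may proceed.
def gate_internet_facing_prod(summary: dict, touches_remotely_exploitable: bool) -> bool:
    unknown_reachable = summary.get("unknown_by_reason_code", {}).get("U-RCH", 0)
    blocked = (
        unknown_reachable > 0
        or summary.get("unknown_total", 0) > 1
        or touches_remotely_exploitable
    )
    return not blocked
```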