house keeping work

2025-12-19 22:19:08 +02:00
parent 91f3610b9d
commit 5b57b04484
64 changed files with 4702 additions and 4 deletions
--- a/docs/product-advisories/unprocessed/19-Dec-2025
+++ b/docs/product-advisories/unprocessed/19-Dec-2025
@@ -0,0 +1,463 @@
+## Outcome you are shipping
+
+A deterministic “claim resolution” capability that takes:
+
+* Multiple **claims** about the same vulnerability (vendor VEX, distro VEX, internal assessments, scanner inferences),
+* A **policy** describing trust and merge semantics,
+* A set of **evidence artifacts** (SBOM, config snapshots, reachability proofs, etc.),
+
+…and produces a **single resolved status** per vulnerability/component/artifact **with an explainable trail**:
+
+* Which claims applied and why
+* Which were rejected and why
+* What evidence was required and whether it was satisfied
+* What policy rules triggered the resolution outcome
+
+This replaces naive precedence like `vendor > distro > internal`.
+
+---
+
+# Directions for Product Managers
+
+## 1) Write the PRD around “claims resolution,” not “VEX support”
+
+The customer outcome is not “we ingest VEX.” It is:
+
+* “We can *safely* accept ‘not affected’ without hiding risk.”
+* “We can prove, to auditors and change control, why a CVE was downgraded.”
+* “We can consistently resolve conflicts between issuer statements.”
+
+### Non-negotiable product properties
+
+* **Deterministic**: same inputs → same resolved outcome
+* **Explainable**: a human can trace the decision path
+* **Guardrailed**: a “safe” resolution requires evidence, not just a statement
+
+---
+
+## 2) Define the core objects (these drive everything)
+
+In the PRD, define these three objects explicitly:
+
+### A) Claim (normalized)
+
+A “claim” is any statement about vulnerability applicability to an artifact/component, regardless of source format.
+
+Minimum fields:
+
+* `vuln_id` (CVE/GHSA/etc.)
+* `subject` (component identity; ideally package + version + digest/purl)
+* `target` (the thing we’re evaluating: image, repo build, runtime instance)
+* `status` (affected / not_affected / fixed / under_investigation / unknown)
+* `justification` (human/machine reason)
+* `issuer` (who said it; plus verification state)
+* `scope` (what it applies to; versions, ranges, products)
+* `timestamp` (when produced)
+* `references` (links/IDs to evidence or external material)
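+
+As a minimal sketch (Python, hypothetical names), the normalized claim can be a frozen dataclass. The fields follow the list above. The `Status` values mirror the listed statuses, and the safe set (`not_affected`, `fixed`) matches the merge rules later in this document. `claim_hash` (SHA-256 over sorted-key JSON) is an assumed hashing scheme, not a standard.
+
+```python
+# Sketch only: class and helper names are hypothetical, not an existing API.
+from __future__ import annotations
+
+import hashlib
+import json
+from dataclasses import asdict, dataclass, field
+from enum import Enum
+
+
+class Status(str, Enum):
+    AFFECTED = "affected"
+    NOT_AFFECTED = "not_affected"
+    FIXED = "fixed"
+    UNDER_INVESTIGATION = "under_investigation"
+    UNKNOWN = "unknown"
+
+
+# "Safe" statuses downgrade risk, so the engine must demand evidence for them.
+SAFE_STATUSES = {Status.NOT_AFFECTED, Status.FIXED}
+
+
+@dataclass(frozen=True)
+class Claim:
+    vuln_id: str           # CVE/GHSA/etc.
+    subject: str           # component identity, ideally a purl with version
+    target: str            # image, repo build, or runtime instance
+    status: Status
+    justification: str     # human/machine reason, e.g. "feature_flag_off"
+    issuer: str            # who said it
+    issuer_verified: bool  # verification state of the issuer/signature
+    scope: str             # versions, ranges, or products covered
+    timestamp: str         # ISO 8601 production time
+    references: tuple[str, ...] = field(default_factory=tuple)
+
+    def is_safe(self) -> bool:
+        return self.status in SAFE_STATUSES
+
+    def claim_hash(self) -> str:
+        # Sorted keys and fixed separators make the JSON, and so the hash, stable.
+        body = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
+        return "sha256:" + hashlib.sha256(body.encode("utf-8")).hexdigest()
+```
+
+Freezing the dataclass keeps a normalized claim immutable, which the determinism requirements below depend on.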
+
+### B) Evidence
+
+A typed artifact that can satisfy a requirement.
+
+Examples (not exhaustive):
+
+* `config_snapshot` (e.g., Helm values, env var map, feature flag export)
+* `sbom_presence_or_absence` (SBOM proof that the component is or isn’t present)
+* `reachability_proof` (call-path evidence from entrypoint to vulnerable symbol)
+* `symbol_absence` (binary inspection shows symbol/function not present)
+* `patch_presence` (artifact includes backport / fixed build)
+* `manual_attestation` (human-reviewed attestation with reviewer identity + scope)
+
+Each evidence item must have:
+
+* `type`
+* `collector` (tool/provider)
+* `inputs_hash` and `output_hash`
+* `scope` (what artifact/environment it applies to)
+* `confidence` (optional but recommended)
+* `expires_at` / `valid_for` (for config/runtime evidence)
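+
+An evidence item can be modeled the same way. In this hedged sketch the fields come from the list above. The exact-string scope match, the timezone-aware `expires_at`, and the collector name `helm-values-exporter` are illustrative assumptions.
+
+```python
+# Sketch only: names and the validity rule are assumptions, not an existing API.
+from __future__ import annotations
+
+from dataclasses import dataclass
+from datetime import datetime, timezone
+from typing import Optional
+
+
+@dataclass(frozen=True)
+class Evidence:
+    type: str                    # e.g. "config_snapshot", "reachability_proof"
+    collector: str               # tool/provider that produced it
+    inputs_hash: str
+    output_hash: str
+    scope: str                   # artifact/environment the evidence applies to
+    confidence: Optional[float] = None
+    expires_at: Optional[datetime] = None  # set for config/runtime evidence
+
+    def is_valid_for(self, scope: str, at: datetime) -> bool:
+        """Usable only inside its own scope and before it expires."""
+        if self.scope != scope:
+            return False
+        return self.expires_at is None or at < self.expires_at
+
+
+snapshot = Evidence(
+    type="config_snapshot",
+    collector="helm-values-exporter",  # hypothetical collector name
+    inputs_hash="sha256:in",
+    output_hash="sha256:out",
+    scope="prod",
+    expires_at=datetime(2026, 1, 1, tzinfo=timezone.utc),
+)
+```
+
+The evaluation time is passed in rather than read from the clock, so the same check can be replayed later with identical results.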
+
+### C) Policy
+
+A policy describes:
+
+* **Trust rules** (how much to trust whom, under which conditions)
+* **Merge semantics** (how to resolve conflicts)
+* **Evidence requirements** (what must be present to accept certain claims)
+
+---
+
+## 3) Ship “policy-controlled merge semantics” as a configuration schema first
+
+Do not start with a fully general policy language. You need a small, explicit schema that makes behavior predictable.
+
+PM deliverable: a policy spec with these sections:
+
+1. **Issuer trust**
+
+   * weights by issuer category (vendor/distro/internal/scanner)
+   * optional constraints (must be signed, must match product ownership, must be within time window)
+2. **Applicability rules**
+
+   * what constitutes a match to artifact/component (range semantics, digest match priority)
+3. **Evidence requirements**
+
+   * per status + per justification: what evidence types are required
+4. **Conflict resolution strategy**
+
+   * conservative vs weighted vs most-specific
+   * explicit guardrails (never accept “safe” without evidence)
+5. **Override rules**
+
+   * when internal can override vendor (and what evidence is required to do so)
+   * environment-specific policies (prod vs dev)
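+
+The issuer-trust section could be evaluated roughly like this. The weights are copied from the example policy later in this document. The following are assumptions of this sketch:
+
+* the `IssuerRule` shape
+* the `max_age_days` time-window field and its 90-day value
+* returning weight 0 when a constraint fails, rather than rejecting the claim outright
+
+```python
+# Sketch only: IssuerRule, max_age_days, and trust_weight are hypothetical.
+from __future__ import annotations
+
+from dataclasses import dataclass
+from datetime import datetime, timedelta
+from typing import Optional
+
+
+@dataclass(frozen=True)
+class IssuerRule:
+    category: str                       # vendor / distro / internal / scanner
+    weight: int
+    require_signature: bool = False
+    max_age_days: Optional[int] = None  # assumed form of the "time window" constraint
+
+
+def trust_weight(rules: list[IssuerRule], category: str, signed: bool,
+                 issued_at: datetime, now: datetime) -> int:
+    """Return the policy weight for an issuer, or 0 if a constraint fails."""
+    for rule in rules:  # first matching rule wins, in policy-file order
+        if rule.category != category:
+            continue
+        if rule.require_signature and not signed:
+            return 0
+        if rule.max_age_days is not None and now - issued_at > timedelta(days=rule.max_age_days):
+            return 0
+        return rule.weight
+    return 0  # unknown issuer categories get no trust
+
+
+RULES = [
+    IssuerRule("vendor", 70, require_signature=True),
+    IssuerRule("distro", 75, require_signature=True),
+    IssuerRule("internal", 85, max_age_days=90),  # 90 days is an illustrative value
+    IssuerRule("scanner", 40),
+]
+```
+
+Returning 0 means an unsigned vendor statement contributes no trust at all. A policy could instead down-weight it, but that choice should be explicit in the schema.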
+
+---
+
+## 4) Make “evidence hooks” a first-class user workflow
+
+You are explicitly shipping the ability to say:
+
+> “This is not affected **because** feature flag X is off.”
+
+That requires:
+
+* a way to **provide or discover** feature flag state, and
+* a way to **bind** that flag to the vulnerable surface
+
+PM must specify: what does the user do to assert that?
+
+Minimum viable workflow:
+
+* User attaches a `config_snapshot` (or system captures it)
+* User provides a “binding” to the vulnerable module/function:
+
+  * either automatic (later) or manual (first release)
+  * e.g., `flag X gates module Y` with references (file path, code reference, runbook)
+
+This “binding” itself becomes evidence.
+
+---
+
+## 5) Define acceptance criteria as decision trace tests
+
+PM should write acceptance criteria as “given claims + policy + evidence → resolved outcome + trace”.
+
+You need at least these canonical tests:
+
+1. **Distro backport vs vendor version logic conflict**
+
+   * Vendor says affected (by version range)
+   * Distro says fixed (backport)
+   * Policy says: in distro context, distro claim can override vendor if patch evidence exists
+   * Outcome: fixed, with trace proving why
+
+2. **Internal ‘feature flag off’ downgrade**
+
+   * Vendor says affected
+   * Internal says not_affected because flag off
+   * Evidence: config snapshot + flag→module binding
+   * Outcome: not_affected **only for that environment context**, with trace
+
+3. **Evidence missing**
+
+   * Internal says not_affected because “code not reachable”
+   * No reachability evidence present
+   * Outcome: unknown or affected (policy-dependent), but **not “not_affected”**
+
+4. **Conflicting safe and unsafe claims**
+
+   * Vendor says not_affected (reason A)
+   * Internal says affected (reason B) with strong evidence
+   * Outcome follows merge strategy, and trace must show why.
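+
+Acceptance criteria in this form translate directly into table-driven golden tests. This is a minimal sketch. It assumes each vector is a plain dict and that the engine exposes a resolver returning `(status, trace)`. The vector keys and `toy_resolve` are hypothetical stand-ins that let the runner execute.
+
+```python
+# Sketch only: the vector fields and both resolvers are hypothetical.
+from __future__ import annotations
+
+from typing import Callable
+
+VECTORS = [
+    {   # Acceptance test 3: a safe claim without the evidence it needs.
+        "name": "evidence_missing",
+        "claims": [{"issuer": "internal", "status": "not_affected", "reason": "not_reachable"}],
+        "evidence": [],
+        "expect_status_not": "not_affected",
+        "expect_trace": ["missing_evidence:reachability_proof"],
+    },
+    {   # Same claim with the evidence supplied: the safe status may be emitted.
+        "name": "evidence_present",
+        "claims": [{"issuer": "internal", "status": "not_affected", "reason": "not_reachable"}],
+        "evidence": [{"type": "reachability_proof"}],
+        "expect_status": "not_affected",
+    },
+]
+
+
+def run_vectors(resolve: Callable[[dict], tuple[str, list[str]]], vectors: list[dict]) -> list[str]:
+    """Return failure messages; an empty list means every vector passed."""
+    failures = []
+    for v in vectors:
+        status, trace = resolve(v)
+        if "expect_status" in v and status != v["expect_status"]:
+            failures.append(f"{v['name']}: got {status}, want {v['expect_status']}")
+        if "expect_status_not" in v and status == v["expect_status_not"]:
+            failures.append(f"{v['name']}: must not resolve to {status}")
+        for fact in v.get("expect_trace", []):
+            if fact not in trace:
+                failures.append(f"{v['name']}: trace lacks {fact}")
+    return failures
+
+
+def toy_resolve(vector: dict) -> tuple[str, list[str]]:
+    """Stand-in resolver: skips safe claims whose required evidence is absent."""
+    required = {"not_reachable": ["reachability_proof"]}
+    present = {e["type"] for e in vector["evidence"]}
+    trace: list[str] = []
+    for claim in vector["claims"]:
+        missing = [t for t in required.get(claim["reason"], []) if t not in present]
+        if missing:
+            trace.extend(f"missing_evidence:{t}" for t in missing)
+            continue
+        return claim["status"], trace
+    return "unknown", trace
+```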
+
+---
+
+## 6) Package it as an “Explainable Resolution” feature
+
+UI/UX requirements PM must specify:
+
+* A “Resolved Status” view per vuln/component showing:
+
+  * contributing claims (ranked)
+  * rejected claims (with reason)
+  * evidence required vs evidence present
+  * the policy clauses triggered (line-level references)
+* The policy editor can ship as CLI/JSON first and get a UI later, but explainability cannot wait.
+
+---
+
+# Directions for Development Managers
+
+## 1) Implement as three services/modules with strict interfaces
+
+### Module A: Claim Normalization
+
+* Inputs: OpenVEX / CycloneDX VEX / CSAF / internal annotations / scanner hints
+* Output: canonical `Claim` objects
+
+Rules:
+
+* Canonicalize IDs (normalize CVE formats, normalize package coordinates)
+* Preserve provenance: issuer identity, signature metadata, timestamps, original document hash
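+
+ID canonicalization and provenance hashing might look like the following. The regular expressions encode the familiar CVE shape (`CVE-`, a four-digit year, then four or more digits) and the `GHSA-xxxx-xxxx-xxxx` shape. For purls, only the scheme and type are lowercased, because the package-URL spec treats those parts as case-insensitive. Leaving the rest of the purl untouched is a conservative assumption; type-specific name rules, such as PyPI's, are not applied.
+
+```python
+# Sketch only: function names are hypothetical; purl handling is deliberately minimal.
+from __future__ import annotations
+
+import hashlib
+import re
+
+CVE_RE = re.compile(r"^CVE-\d{4}-\d{4,}$")
+GHSA_RE = re.compile(r"^GHSA(-[0-9a-z]{4}){3}$")
+
+
+def normalize_vuln_id(raw: str) -> str:
+    """Canonicalize CVE/GHSA identifiers; raise on anything unrecognized."""
+    s = raw.strip()
+    upper = s.upper()
+    if CVE_RE.match(upper):
+        return upper
+    if upper.startswith("GHSA-"):
+        candidate = "GHSA-" + s[5:].lower()  # uppercase prefix, lowercase body
+        if GHSA_RE.match(candidate):
+            return candidate
+    raise ValueError(f"unrecognized vulnerability id: {raw!r}")
+
+
+def normalize_purl(raw: str) -> str:
+    """Lowercase the 'pkg' scheme and the purl type; leave the rest as-is."""
+    scheme, sep, rest = raw.strip().partition(":")
+    if scheme.lower() != "pkg" or not sep or "/" not in rest:
+        raise ValueError(f"not a purl: {raw!r}")
+    ptype, _, remainder = rest.lstrip("/").partition("/")
+    return f"pkg:{ptype.lower()}/{remainder}"
+
+
+def document_hash(raw_bytes: bytes) -> str:
+    """Content hash of the original VEX/CSAF document, kept for provenance."""
+    return "sha256:" + hashlib.sha256(raw_bytes).hexdigest()
+```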
+
+### Module B: Evidence Providers (plugin boundary)
+
+* Provide an interface like:
+
+```
+evaluate_evidence(context, claim) -> EvidenceEvaluation
+```
+
+Where `EvidenceEvaluation` returns:
+
+* required evidence types for this claim (from policy)
+* found evidence items (from store/providers)
+* satisfied / not satisfied
+* explanation strings
+* confidence
+
+Start with 3 providers:
+
+1. SBOM provider (presence/absence)
+2. Config provider (feature flags/config snapshot ingestion)
+3. Reachability provider (even if initially limited or stubbed, it must exist as a typed hook)
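+
+In Python, the provider boundary could be a small protocol plus a result record. This sketch narrows the `evaluate_evidence(context, claim)` signature to the required types and a subject purl. `EvidenceProvider` and `SbomProvider` are hypothetical, as is modeling an SBOM as a set of purls. The provider reports the type `sbom_absence` to match the example policy later in this document.
+
+```python
+# Sketch only: EvidenceProvider, SbomProvider, and evaluate_evidence are hypothetical.
+from __future__ import annotations
+
+from dataclasses import dataclass, field
+from typing import Protocol
+
+
+@dataclass
+class EvidenceEvaluation:
+    required: list[str]                                  # types demanded by policy
+    found: list[str] = field(default_factory=list)       # evidence item IDs located
+    satisfied: bool = False
+    explanations: list[str] = field(default_factory=list)
+    confidence: float = 0.0
+
+
+class EvidenceProvider(Protocol):
+    evidence_type: str
+
+    def find(self, subject: str) -> list[str]:
+        """Return IDs of evidence items of this type for the subject."""
+        ...
+
+
+class SbomProvider:
+    """Proves absence when the subject purl is not listed in the SBOM."""
+    evidence_type = "sbom_absence"
+
+    def __init__(self, sbom_id: str, purls: set[str]) -> None:
+        self.sbom_id = sbom_id
+        self.purls = purls
+
+    def find(self, subject: str) -> list[str]:
+        return [] if subject in self.purls else [self.sbom_id]
+
+
+def evaluate_evidence(required: list[str], providers: list[EvidenceProvider],
+                      subject: str) -> EvidenceEvaluation:
+    ev = EvidenceEvaluation(required=list(required))
+    by_type = {p.evidence_type: p for p in providers}
+    missing = []
+    for etype in required:
+        provider = by_type.get(etype)
+        items = provider.find(subject) if provider else []
+        if items:
+            ev.found.extend(items)
+            ev.explanations.append(f"{etype}: found {len(items)} item(s)")
+        else:
+            missing.append(etype)
+            ev.explanations.append(f"{etype}: missing")
+    ev.satisfied = not missing
+    ev.confidence = 1.0 if ev.satisfied else 0.0  # placeholder; real providers would score this
+    return ev
+```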
+
+### Module C: Merge & Resolution Engine
+
+* Inputs: set of claims + policy + evidence evaluations + context
+* Output: `ResolvedDecision`
+
+A `ResolvedDecision` must include:
+
+* final status
+* selected “winning” claim(s)
+* all considered claims
+* evidence satisfaction summary
+* applied policy rule IDs
+* deterministic ordering keys/hashes
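+
+A `ResolvedDecision` can carry its own hash. In this sketch the fields mirror the list above. The engine is assumed to pass every tuple already sorted, and SHA-256 over sorted-key JSON is an assumed scheme.
+
+```python
+# Sketch only: field names and the hashing scheme are assumptions.
+from __future__ import annotations
+
+import hashlib
+import json
+from dataclasses import asdict, dataclass
+
+
+@dataclass(frozen=True)
+class ResolvedDecision:
+    final_status: str
+    winning_claims: tuple[str, ...]      # claim hashes that won
+    considered_claims: tuple[str, ...]   # every claim hash evaluated
+    evidence_summary: tuple[tuple[str, bool], ...]  # (claim hash, satisfied?)
+    applied_rules: tuple[str, ...]       # policy rule IDs
+    ordering_key: str                    # e.g. the input bundle hash
+
+    def decision_hash(self) -> str:
+        # Identical decisions serialize to identical bytes, hence identical hashes.
+        body = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
+        return "sha256:" + hashlib.sha256(body.encode("utf-8")).hexdigest()
+```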
+
+---
+
+## 2) Define the evaluation context (this avoids foot-guns)
+
+The resolved outcome must be context-aware.
+
+Create an immutable `EvaluationContext` object, containing:
+
+* artifact identity (image digest / build digest / SBOM hash)
+* environment identity (prod/stage/dev; cluster; region)
+* config snapshot ID
+* time (evaluation timestamp)
+* policy version hash
+
+This is how you support “not affected because feature flag off” in prod but not in dev.
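+
+The immutable context maps naturally onto a frozen dataclass. In this sketch, which uses hypothetical field values, `context_key()` hashes every field. As a result, the same artifact evaluated in prod and in dev gets two distinct keys and therefore two independent resolutions.
+
+```python
+# Sketch only: field names and the key format are assumptions.
+from __future__ import annotations
+
+import hashlib
+import json
+from dataclasses import asdict, dataclass
+
+
+@dataclass(frozen=True)
+class EvaluationContext:
+    artifact_digest: str     # image/build digest or SBOM hash
+    environment: str         # prod / stage / dev
+    cluster: str
+    region: str
+    config_snapshot_id: str
+    evaluated_at: str        # ISO 8601, fixed per evaluation, never read from the clock
+    policy_hash: str
+
+    def context_key(self) -> str:
+        body = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
+        return "sha256:" + hashlib.sha256(body.encode("utf-8")).hexdigest()
+
+
+# Same artifact and policy, different environments (hypothetical values).
+prod = EvaluationContext("sha256:img", "prod", "c1", "eu-west", "cfg-prod-7",
+                         "2025-12-19T20:00:00Z", "sha256:pol")
+dev = EvaluationContext("sha256:img", "dev", "c2", "eu-west", "cfg-dev-3",
+                        "2025-12-19T20:00:00Z", "sha256:pol")
+```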
+
+---
+
+## 3) Merge semantics: implement scoring + guardrails, not precedence
+
+You need a deterministic function. One workable approach:
+
+### Step 1: compute statement strength
+
+For each claim:
+
+* `trust_weight` from policy (issuer + scope + signature requirements)
+* `evidence_factor` (1.0 if requirements satisfied; <1 or 0 if not)
+* `specificity_factor` (exact digest match > exact version > range)
+* `freshness_factor` (optional; policy-defined)
+* `applicability` must be true; otherwise the claim is excluded
+
+Compute:
+
+```
+support = trust_weight * evidence_factor * specificity_factor * freshness_factor
+```
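+
+The formula translates directly into code. The factor values here are illustrative assumptions: the specificity multipliers (1.0 for digest, 0.9 for exact version, 0.7 for range) come from no policy. An inapplicable claim returns `None`, so it is excluded rather than scored as zero.
+
+```python
+# Sketch only: SPECIFICITY values are illustrative, not policy-defined.
+from __future__ import annotations
+
+from typing import Optional
+
+# Exact digest match > exact version > range.
+SPECIFICITY = {"digest": 1.0, "version": 0.9, "range": 0.7}
+
+
+def support(trust_weight: float, evidence_factor: float, match_kind: str,
+            freshness_factor: float = 1.0, applicable: bool = True) -> Optional[float]:
+    """support = trust_weight * evidence_factor * specificity_factor * freshness_factor"""
+    if not applicable:
+        return None  # excluded from the merge entirely
+    return trust_weight * evidence_factor * SPECIFICITY[match_kind] * freshness_factor
+```
+
+For instance, an internal claim (weight 85 in the example policy below) with satisfied evidence and an exact-version match scores 85 × 1.0 × 0.9 × 1.0 = 76.5.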
+
+### Step 2: apply merge strategy (policy-controlled)
+
+Ship at least two strategies:
+
+1. **Conservative default**
+
+   * If any “unsafe” claim (affected/under_investigation) has support above `unsafe_wins_threshold`, it wins by default
+   * A “safe” claim (not_affected/fixed) can override it only if:
+
+     * its support is at least the strongest unsafe claim’s support plus `safe_override_delta`, AND
+     * its evidence requirements are satisfied
+
+2. **Evidence-weighted**
+
+   * Highest support wins, but safe statuses have a hard evidence gate
+
+### Step 3: apply guardrails
+
+Hard guardrail to prevent bad outcomes:
+
+* **Never emit a safe status unless evidence requirements for that safe claim are satisfied.**
+* If a safe claim lacks evidence, downgrade the safe claim to “unsupported” and do not allow it to win.
+
+This single rule is what makes your system materially different from “VEX as suppression.”
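+
+Steps 2 and 3 together fit in one pure function per strategy. This sketch makes several assumptions:
+
+* claims arrive with support already computed
+* `threshold` and `delta` play the roles of `unsafe_wins_threshold` and `safe_override_delta` in the example policy below
+* ties break on claim ID
+* when no unsafe claim clears the threshold, the strongest eligible claim wins, or `unknown` if none remain (the text leaves this case open)
+
+```python
+# Sketch only: ScoredClaim and the strategy functions are hypothetical names.
+from __future__ import annotations
+
+from dataclasses import dataclass
+
+SAFE = {"not_affected", "fixed"}
+UNSAFE = {"affected", "under_investigation"}
+
+
+@dataclass(frozen=True)
+class ScoredClaim:
+    claim_id: str
+    status: str
+    support: float
+    evidence_satisfied: bool
+
+
+def guardrail(claims: list[ScoredClaim]) -> tuple[list[ScoredClaim], dict[str, str]]:
+    """Step 3: a safe claim whose evidence is unsatisfied becomes 'unsupported'."""
+    eligible: list[ScoredClaim] = []
+    rejected: dict[str, str] = {}
+    for c in claims:
+        if c.status in SAFE and not c.evidence_satisfied:
+            rejected[c.claim_id] = "unsupported"
+        else:
+            eligible.append(c)
+    return eligible, rejected
+
+
+def _rank(c: ScoredClaim) -> tuple[float, str]:
+    # Equal support breaks ties on claim_id, so input order never matters.
+    return (c.support, c.claim_id)
+
+
+def conservative(claims: list[ScoredClaim], threshold: float, delta: float) -> str:
+    eligible, _ = guardrail(claims)
+    if not eligible:
+        return "unknown"
+    unsafe = [c for c in eligible if c.status in UNSAFE and c.support > threshold]
+    if not unsafe:
+        return max(eligible, key=_rank).status  # assumption: case left open by the text
+    top_unsafe = max(unsafe, key=_rank)
+    safe = [c for c in eligible if c.status in SAFE]
+    if safe:
+        top_safe = max(safe, key=_rank)
+        if top_safe.support >= top_unsafe.support + delta:
+            return top_safe.status
+    return top_unsafe.status
+
+
+def evidence_weighted(claims: list[ScoredClaim]) -> str:
+    eligible, _ = guardrail(claims)
+    return max(eligible, key=_rank).status if eligible else "unknown"
+```
+
+Take the figures from the example trace below: vendor `affected` at 62, and internal `not_affected` at 78 with satisfied evidence. With those inputs, `conservative(..., 50, 10)` returns `not_affected` because 78 ≥ 62 + 10. If the internal claim's evidence is removed, the guardrail drops that claim and the vendor's `affected` stands.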
+
+---
+
+## 4) Evidence hooks: treat them as typed contracts, not strings
+
+For “feature flag off,” implement it as a structured evidence requirement.
+
+Example evidence requirement for a “safe because feature flag off” claim:
+
+* Required evidence types:
+
+  * `config_snapshot`
+  * `flag_binding` (the mapping “flag X gates vulnerable surface Y”)
+
+Implementation:
+
+* Config provider can parse:
+
+  * Helm values / env var sets / feature flag exports
+  * Store them as normalized key/value with hashes
+* Binding evidence can start as manual JSON that references:
+
+  * repo path / module / function group
+  * a link to code ownership / runbook
+  * optional test evidence
+
+Later you can automate binding via static analysis, but do not block shipping on that.
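+
+The two contracts for “safe because feature flag off” can be checked together. This sketch assumes the config provider has already normalized the snapshot to a `dict[str, str]`. The `FlagBinding` fields, the set of values treated as off (`false`, `0`, `off`, `disabled`), and the example flag name are hypothetical.
+
+```python
+# Sketch only: FlagBinding, OFF_VALUES, and flag_off_satisfied are hypothetical.
+from __future__ import annotations
+
+from dataclasses import dataclass, field
+
+
+@dataclass(frozen=True)
+class FlagBinding:
+    """Manual evidence: 'flag X gates vulnerable surface Y'."""
+    flag_key: str          # key in the normalized config snapshot
+    gated_surface: str     # repo path / module / function group
+    references: tuple[str, ...] = field(default_factory=tuple)  # ownership, runbook, tests
+
+
+OFF_VALUES = {"false", "0", "off", "disabled"}  # assumed spellings of "off"
+
+
+def flag_off_satisfied(snapshot: dict[str, str], binding: FlagBinding,
+                       vulnerable_surface: str) -> tuple[bool, str]:
+    """Both contracts must hold: the binding covers the surface AND the flag is off."""
+    if binding.gated_surface != vulnerable_surface:
+        return False, f"binding gates {binding.gated_surface}, not {vulnerable_surface}"
+    if binding.flag_key not in snapshot:
+        # A missing flag is unproven, not off: many flags default to on.
+        return False, f"flag {binding.flag_key} absent from config snapshot"
+    value = snapshot[binding.flag_key].strip().lower()
+    if value in OFF_VALUES:
+        return True, f"flag {binding.flag_key}={value} gates {vulnerable_surface}"
+    return False, f"flag {binding.flag_key}={value} is not off"
+```
+
+The returned string is meant to be copied into the decision trace as the explanation for this evidence requirement.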
+
+---
+
+## 5) Determinism requirements (engineering non-negotiables)
+
+Development managers should enforce:
+
+* stable sorting of claims by canonical key
+* stable tie-breakers (e.g., issuer ID, timestamp, claim hash)
+* no nondeterministic external calls during evaluation (or they must be snapshot-based)
+* every evaluation produces:
+
+  * `input_bundle_hash` (claims + evidence + policy + context)
+  * `decision_hash`
+
+This is the foundation for replayability and audits.
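+
+Stable ordering and the two hashes need only the standard library. The sketch assumes each claim is a dict carrying `vuln_id`, `subject`, `issuer`, `timestamp`, and `claim_hash`, and each evidence dict carries an `id`. Canonical JSON is one possible encoding. `canonical_hash` could equally produce `decision_hash` from the serialized decision.
+
+```python
+# Sketch only: function names and the dict layout are assumptions.
+from __future__ import annotations
+
+import hashlib
+import json
+
+
+def canonical_hash(obj: object) -> str:
+    # Sorted keys and compact separators: equal inputs always serialize to equal bytes.
+    body = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
+    return "sha256:" + hashlib.sha256(body.encode("utf-8")).hexdigest()
+
+
+def sort_claims(claims: list[dict]) -> list[dict]:
+    # Canonical key first, then the tie-breakers: issuer ID, timestamp, claim hash.
+    return sorted(claims, key=lambda c: (c["vuln_id"], c["subject"], c["issuer"],
+                                         c["timestamp"], c["claim_hash"]))
+
+
+def input_bundle_hash(claims: list[dict], evidence: list[dict],
+                      policy: dict, context: dict) -> str:
+    return canonical_hash({
+        "claims": sort_claims(claims),
+        "evidence": sorted(evidence, key=lambda e: e["id"]),
+        "policy": policy,
+        "context": context,
+    })
+```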
+
+---
+
+## 6) Storage model: store raw inputs and canonical forms
+
+Minimum stores:
+
+* Raw documents (original VEX/CSAF/etc.) keyed by content hash
+* Canonical claims keyed by claim hash
+* Evidence items keyed by evidence hash and scoped by context
+* Policy versions keyed by policy hash
+* Resolutions keyed by (context, vuln_id, subject) with decision hash
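+
+The following is an in-memory sketch of this store layout. Class and method names are hypothetical, and a production system would back these stores with a database. Every store except resolutions is content-addressed, so re-ingesting an identical document is a no-op.
+
+```python
+# Sketch only: ContentStore and ResolutionStore are hypothetical, in-memory stand-ins.
+from __future__ import annotations
+
+import hashlib
+
+
+class ContentStore:
+    """Write-once store keyed by the SHA-256 of the stored bytes."""
+
+    def __init__(self) -> None:
+        self._items: dict[str, bytes] = {}
+
+    def put(self, data: bytes) -> str:
+        key = "sha256:" + hashlib.sha256(data).hexdigest()
+        self._items.setdefault(key, data)  # identical content is stored once
+        return key
+
+    def get(self, key: str) -> bytes:
+        return self._items[key]
+
+
+class ResolutionStore:
+    """Resolutions keyed by (context key, vuln_id, subject) -> decision hash."""
+
+    def __init__(self) -> None:
+        self._items: dict[tuple[str, str, str], str] = {}
+
+    def record(self, context_key: str, vuln_id: str, subject: str, decision_hash: str) -> None:
+        self._items[(context_key, vuln_id, subject)] = decision_hash
+
+    def lookup(self, context_key: str, vuln_id: str, subject: str) -> str | None:
+        return self._items.get((context_key, vuln_id, subject))
+```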
+
+---
+
+## 7) “Definition of done” checklist for engineering
+
+You are done when:
+
+1. You can ingest at least two formats into canonical claims (pick OpenVEX + CycloneDX VEX first).
+2. You can configure issuer trust and evidence requirements in a policy file.
+3. You can resolve conflicts deterministically.
+4. You can attach a config snapshot and produce:
+
+   * `not_affected because feature flag off` **only when its evidence requirements are satisfied**
+5. The system produces a decision trace with:
+
+   * applied policy rules
+   * evidence satisfaction
+   * selected/rejected claims and reasons
+6. Golden test vectors exist for the acceptance scenarios listed above.
+
+---
+
+# A concrete example policy (schema-first, no full DSL required)
+
+```yaml
+version: 1
+
+trust:
+  issuers:
+    - match: {category: vendor}
+      weight: 70
+      require_signature: true
+    - match: {category: distro}
+      weight: 75
+      require_signature: true
+    - match: {category: internal}
+      weight: 85
+      require_signature: false
+    - match: {category: scanner}
+      weight: 40
+
+evidence_requirements:
+  safe_status_requires_evidence: true
+
+  rules:
+    - when:
+        status: not_affected
+        reason: feature_flag_off
+      require: [config_snapshot, flag_binding]
+
+    - when:
+        status: not_affected
+        reason: component_not_present
+      require: [sbom_absence]
+
+    - when:
+        status: not_affected
+        reason: not_reachable
+      require: [reachability_proof]
+
+merge:
+  strategy: conservative
+  unsafe_wins_threshold: 50
+  safe_override_delta: 10
+```
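+
+Because the schema is small, finding the evidence rule for a claim is a short lookup. The sketch mirrors the YAML above as a Python dict; parsing the file itself would need a YAML library such as PyYAML, left out here to stay within the standard library. It returns rule IDs in the `evidence.rules[N]` shorthand used by the example trace below. It also assumes that a safe claim matching no rule is blocked when `safe_status_requires_evidence` is true.
+
+```python
+# Sketch only: POLICY mirrors the YAML above; function names are hypothetical.
+from __future__ import annotations
+
+POLICY = {
+    "evidence_requirements": {
+        "safe_status_requires_evidence": True,
+        "rules": [
+            {"when": {"status": "not_affected", "reason": "feature_flag_off"},
+             "require": ["config_snapshot", "flag_binding"]},
+            {"when": {"status": "not_affected", "reason": "component_not_present"},
+             "require": ["sbom_absence"]},
+            {"when": {"status": "not_affected", "reason": "not_reachable"},
+             "require": ["reachability_proof"]},
+        ],
+    },
+}
+
+SAFE = {"not_affected", "fixed"}
+
+
+def required_evidence(policy: dict, status: str, reason: str) -> tuple[str | None, list[str] | None]:
+    """Return (rule ID for the trace, required types), or (None, None) if no rule matches."""
+    facts = {"status": status, "reason": reason}
+    for i, rule in enumerate(policy["evidence_requirements"]["rules"]):
+        # First match in file order wins; every key under `when` must match.
+        if all(facts.get(k) == v for k, v in rule["when"].items()):
+            return f"evidence.rules[{i}]", rule["require"]
+    return None, None
+
+
+def safe_claim_blocked(policy: dict, status: str, reason: str) -> bool:
+    """Assumed semantics: an unmatched safe claim cannot meet 'requires evidence'."""
+    rule_id, _ = required_evidence(policy, status, reason)
+    requires = policy["evidence_requirements"]["safe_status_requires_evidence"]
+    return status in SAFE and rule_id is None and requires
+```
+
+Under this assumption, the example policy blocks `fixed` claims, such as the distro backport in acceptance test 1, until a rule requiring `patch_presence` is added.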
+
+---
+
+# A concrete example output trace (what auditors and engineers must see)
+
+```json
+{
+  "vuln_id": "CVE-XXXX-YYYY",
+  "subject": "pkg:maven/org.example/foo@1.2.3",
+  "context": {
+    "artifact_digest": "sha256:...",
+    "environment": "prod",
+    "policy_hash": "sha256:..."
+  },
+  "resolved_status": "not_affected",
+  "because": [
+    {
+      "winning_claim": "claim_hash_abc",
+      "reason": "feature_flag_off",
+      "evidence_required": ["config_snapshot", "flag_binding"],
+      "evidence_present": ["ev_hash_1", "ev_hash_2"],
+      "policy_rules_applied": ["trust.issuers[internal]", "evidence.rules[0]", "merge.safe_override_delta"]
+    }
+  ],
+  "claims_considered": [
+    {"issuer": "vendor", "status": "affected", "support": 62, "accepted": false, "rejection_reason": "overridden_by_higher_support_safe_claim_with_satisfied_evidence"},
+    {"issuer": "internal", "status": "not_affected", "support": 78, "accepted": true, "evidence_satisfied": true}
+  ],
+  "decision_hash": "sha256:..."
+}
+```
+
+---
+
+## The two strategic pitfalls to explicitly avoid
+
+1. **“Trust precedence” as the merge mechanism**
+
+   * It will fail immediately on backports, forks, downstream patches, and environment-specific mitigations.
+2. **Allowing “safe” without evidence**
+
+   * That turns VEX into a suppression system and will collapse trust in the product.