Add reference architecture and testing strategy documentation
- Created a new document for the Stella Ops Reference Architecture outlining the system's topology, trust boundaries, artifact association, and interfaces.
- Developed a comprehensive Testing Strategy document detailing the importance of offline readiness, interoperability, determinism, and operational guardrails.
- Introduced a README for the Testing Strategy, summarizing processing details and key concepts implemented.
- Added guidance for AI agents and developers in the tests directory, including directory structure, test categories, key patterns, and rules for test development.
Below are operating guidelines for Product and Development Managers to deliver a “vulnerability-first + reachability + multi-analyzer + single built-in attested verdict” capability as a coherent, top-of-market feature set.

## 1) Product north star and non-negotiables

**North star:** Every vulnerability finding must resolve to a **policy-backed, reachability-informed, runtime-corroborated verdict** that is **exportable as one signed attestation attached to the built artifact**.

**Non-negotiables**

* **Vulnerability-first UX:** Users start from a CVE/finding and immediately see applicability, reachability, runtime corroboration, and policy rationale.
* **Single canonical verdict artifact:** One built-in, signed verdict attestation per subject (OCI digest), replayable (“same inputs → same output”).
* **Deterministic evidence:** Evidence objects are content-hashed and versioned (feeds, policies, analyzers, graph snapshots).
* **Unknowns are first-class:** “Unknown reachability/runtime/config” is not hidden; it is budgeted and policy-controlled.

## 2) Scope: what “reachability” means across analyzers

PMs must define reachability per layer and enforce consistent semantics:

1. **Source reachability**
   * Entry points → call graph → vulnerable function/symbol (proof subgraph stored).
2. **Language dependency reachability**
   * Resolved dependency graph + vulnerable component mapping + (where feasible) call-path to vulnerable code.
3. **OS dependency applicability**
   * Installed package inventory + file ownership + linkage/usage hints (where available).
4. **Binary mapping reachability**
   * Build-ID / symbol tables / imports + (optional) DWARF/source map; fallback heuristics are explicitly labeled.
5. **Runtime corroboration (eBPF / runtime sensors)**
   * Execution facts: library loads, syscalls, network exposure, process ancestry; mapped to a “supports/contradicts/unknown” posture for the finding.

**Manager rule:** Any analyzer that cannot produce a proof object must emit an explicit “UNKNOWN with reason code,” never a silent “not reachable.”
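
This rule can be encoded directly in the analyzer output contract so a silent "not reachable" is unrepresentable. A minimal sketch; the type and field names here are illustrative, not an existing Stella Ops schema:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Reachability(Enum):
    REACHABLE = "reachable"
    NOT_REACHABLE = "not_reachable"
    UNKNOWN = "unknown"

@dataclass(frozen=True)
class ReachabilityResult:
    """One analyzer's reachability claim for a single finding."""
    finding_id: str                   # e.g. "CVE-2024-1234@pkg:pypi/foo@1.2.3"
    status: Reachability
    proof_hash: Optional[str] = None  # content hash of the stored proof subgraph
    reason_code: Optional[str] = None # mandatory when status is UNKNOWN

    def __post_init__(self):
        # Enforce the manager rule at construction time:
        # no proof-free NOT_REACHABLE, no bare UNKNOWN.
        if self.status is Reachability.NOT_REACHABLE and self.proof_hash is None:
            raise ValueError("NOT_REACHABLE requires a proof object hash")
        if self.status is Reachability.UNKNOWN and self.reason_code is None:
            raise ValueError("UNKNOWN requires an explicit reason code")
```

Constructing `ReachabilityResult("CVE-2024-1234", Reachability.UNKNOWN, reason_code="NO_CALL_GRAPH")` succeeds; omitting the reason code raises, which is the contract the rule demands.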

## 3) The decision model: a strict, explainable merge into one verdict

Adopt a small fixed set of verdicts and require all teams to use them:

* `AFFECTED`, `NOT_AFFECTED`, `MITIGATED`, `NEEDS_REVIEW`

Each verdict must carry:

* **Reason steps** (policy/lattice merge trace)
* **Confidence score** (bounded; explainable inputs)
* **Counterfactuals** (“what would flip this verdict”)
* **Evidence pointers** (hashes to proof objects)

**PM guidance on precedence:** Do not hardcode “vendor > distro > internal.” Require a policy-defined merge (lattice semantics) where evidence quality and freshness influence trust.
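
One way to realize such a policy-defined merge: score each evidence claim from a policy-supplied source weight scaled by evidence quality and freshness, and escalate to `NEEDS_REVIEW` when conflicting claims score within a margin. A sketch under assumed policy fields (`source_weights`, `freshness_horizon_days`, and `min_margin` are invented names, not a real policy schema):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Claim:
    verdict: str    # AFFECTED / NOT_AFFECTED / MITIGATED / NEEDS_REVIEW
    source: str     # e.g. "vendor_vex", "distro_advisory", "internal_analysis"
    quality: float  # policy-assigned evidence quality, 0..1
    age_days: int   # freshness input

def trust(claim: Claim, policy: dict) -> float:
    """Trust = policy weight for the source, scaled by quality and freshness decay."""
    weight = policy["source_weights"].get(claim.source, 0.0)
    freshness = max(0.0, 1.0 - claim.age_days / policy["freshness_horizon_days"])
    return weight * claim.quality * freshness

def merge(claims: list[Claim], policy: dict) -> tuple[str, list[str]]:
    """Merge claims into one verdict plus a reason-step trace."""
    if not claims:
        return "NEEDS_REVIEW", ["no evidence: defaulting to NEEDS_REVIEW"]
    ranked = sorted(claims, key=lambda c: trust(c, policy), reverse=True)
    steps = [f"{c.source}:{c.verdict} trust={trust(c, policy):.2f}" for c in ranked]
    top = ranked[0]
    runner_up = ranked[1] if len(ranked) > 1 else None
    # Conflicting claims with near-equal trust cannot be auto-resolved.
    if (runner_up and runner_up.verdict != top.verdict
            and trust(top, policy) - trust(runner_up, policy) < policy["min_margin"]):
        return "NEEDS_REVIEW", steps + ["conflict within margin: escalating"]
    return top.verdict, steps + [f"selected {top.source}"]
```

Note that no source outranks another unconditionally: a stale, low-quality vendor statement can lose to fresh internal analysis, and the reason steps record why.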

## 4) Built-in attestation as the primary deliverable

**Deliverable:** An OCI-attached DSSE/in-toto style attestation called (example) `stella.verdict.v1`.

Minimum contents:

* Subject: image digest(s)
* Inputs: feed snapshot IDs, analyzer versions/digests, policy bundle hash, time window, environment tags
* Per-CVE records: component, installed version, fixed version, verdict, confidence, reason steps
* Proof pointers: reachability subgraph hash, runtime fact hashes, config/exposure facts hash
* Replay manifest: “verify this verdict” command + inputs hash
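
These contents map naturally onto an in-toto Statement carried in a DSSE envelope. A sketch with placeholder digests; the predicate field names and the `stella.verdict.v1` predicateType URI are illustrative, not a published schema:

```python
import base64
import hashlib
import json

def canonical(obj) -> bytes:
    """Deterministic JSON encoding: sorted keys, fixed separators."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()

statement = {
    "_type": "https://in-toto.io/Statement/v1",
    "subject": [{"name": "registry.example/app",
                 "digest": {"sha256": "c0ffee..."}}],       # image digest (placeholder)
    "predicateType": "https://example.com/stella.verdict.v1",  # illustrative URI
    "predicate": {
        "inputs": {
            "feedSnapshot": "sha256:feed...",
            "analyzers": {"sca": "sha256:ana..."},
            "policyBundle": "sha256:pol...",
        },
        "findings": [{
            "cve": "CVE-2024-1234",
            "component": "pkg:pypi/foo@1.2.3",
            "fixedVersion": "1.2.4",
            "verdict": "NOT_AFFECTED",
            "confidence": 0.92,
            "reasonSteps": ["vendor_vex:NOT_AFFECTED trust=0.79"],
            "proofs": {"reachabilitySubgraph": "sha256:sub..."},
        }],
        "replay": {"inputsHash": None},  # filled in below
    },
}
# Content-address the pinned inputs so a verifier can check replay preconditions.
statement["predicate"]["replay"]["inputsHash"] = hashlib.sha256(
    canonical(statement["predicate"]["inputs"])).hexdigest()

# DSSE wraps the statement as a base64 payload; the signing step that
# populates "signatures" is omitted here.
envelope = {
    "payloadType": "application/vnd.in-toto+json",
    "payload": base64.b64encode(canonical(statement)).decode(),
    "signatures": [],
}
```

A verifier decodes the payload, checks the signature, re-pins the listed inputs, and replays; byte-identical output is the acceptance bar stated below.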

**Acceptance criterion:** A third party can validate signature and replay deterministically using exported inputs, obtaining byte-identical verdict output.

## 5) UX requirements (vulnerability-first, proof-linked)

PMs must enforce these UX invariants:

* Finding row shows: Verdict chip + confidence + “why” one-liner + proof badges (Reachability / Runtime / Policy / Provenance).
* Click-through yields:
  * Policy explanation (human-readable steps)
  * Evidence graph (hashes, issuers, timestamps, signature status)
  * Reachability mini-map (stored subgraph)
  * Runtime corroboration timeline (windowed)
* Export: “Audit pack” (verdict + proofs + inputs)

**Rule:** Any displayed claim must link to a proof node or be explicitly marked “operator note.”

## 6) Engineering execution rules (to keep this shippable)

**Modular contracts**

* Each analyzer outputs into a shared internal schema (typed nodes/edges + content hashes).
* Evidence objects are immutable; updates create new objects (versioned snapshots).

**Performance strategy**

* Vulnerability-first query plan: build “vulnerable element set” per CVE, then run targeted reachability; avoid whole-program graphs unless needed.
* Progressive fidelity: fast heuristic → deeper proof when requested; verdict must reflect confidence accordingly.
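
The targeted query plan amounts to a graph search that expands only from entry points and stops at the first vulnerable symbol, keeping the discovered path as the skeleton of the stored proof subgraph. A toy sketch (real analyzers operate on resolved symbol graphs, not string-keyed dicts):

```python
from collections import deque
from typing import Optional

def reachable_vulnerable(call_graph: dict[str, list[str]],
                         entry_points: list[str],
                         vulnerable: set[str]) -> tuple[bool, list[str]]:
    """Breadth-first search from entry points that stops at the first
    vulnerable symbol; the recovered path seeds the proof subgraph."""
    parent: dict[str, Optional[str]] = {e: None for e in entry_points}
    queue = deque(entry_points)
    while queue:
        fn = queue.popleft()
        if fn in vulnerable:
            # Walk parents back to an entry point to reconstruct the call path.
            path, cur = [], fn
            while cur is not None:
                path.append(cur)
                cur = parent[cur]
            return True, list(reversed(path))
        for callee in call_graph.get(fn, []):
            if callee not in parent:  # visit each symbol at most once
                parent[callee] = fn
                queue.append(callee)
    return False, []
```

For `{"main": ["parse"], "parse": ["decode"], "decode": []}` with `{"decode"}` vulnerable, this returns the path `["main", "parse", "decode"]` without ever materializing a whole-program graph; progressive fidelity then decides whether to refine that heuristic path into a full proof.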

**Determinism**

* Pin all feeds/policies/analyzer images by digest.
* Canonical serialization for graphs and verdicts.
* Stable hashing rules documented and tested.
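
The hashing rule can itself be documented as executable: canonical serialization means the content hash is invariant under key ordering and whitespace. A minimal JSON-based sketch (a real graph serialization would need extra rules for floats and binary fields):

```python
import hashlib
import json

def content_hash(obj) -> str:
    """Stable content address: canonical JSON (sorted keys, fixed
    separators, UTF-8) fed to SHA-256."""
    encoded = json.dumps(obj, sort_keys=True, separators=(",", ":"),
                         ensure_ascii=False).encode("utf-8")
    return "sha256:" + hashlib.sha256(encoded).hexdigest()

# The documented invariant, written as a test: key order never changes the hash.
assert content_hash({"policy": "p1", "feeds": ["f1"]}) == \
       content_hash({"feeds": ["f1"], "policy": "p1"})
```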

## 7) Release gates and KPIs (what managers track weekly)

**Quality KPIs**

* % findings with non-UNKNOWN reachability
* % findings with runtime corroboration available (where sensor deployed)
* False-positive reduction vs baseline (measured via developer confirmations / triage outcomes)
* “Explainability completeness”: % verdicts with reason steps + at least one proof pointer
* Replay success rate: % attestations replaying deterministically in CI

**Operational KPIs**

* Median time to first verdict per image
* Cache hit rate for graphs/proofs
* Storage growth per scan (evidence size budgets)

**Policy KPIs**

* Unknown budget breaches by environment (prod/dev)
* Override/exception volume and aging

## 8) Roadmap sequencing (recommended)

1. **Phase 1: Single attested verdict + OS/lang SCA applicability**
   * Deterministic inputs, verdict schema, signature, OCI attach, basic policy steps.
2. **Phase 2: Source reachability proofs (top languages)**
   * Store subgraphs; introduce confidence + counterfactuals.
3. **Phase 3: Binary mapping fallback**
   * Build-ID/symbol-based reachability + explicit “heuristic” labeling.
4. **Phase 4: Runtime corroboration (eBPF) integration**
   * Evidence facts + time-window model + correlation to findings.
5. **Phase 5: Full lattice merge + Trust Algebra Studio**
   * Operator-defined semantics; evidence quality weighting; vendor trust scoring.

## 9) Risk management rules (preempt common failure modes)

* **Overclaiming:** Never present “not affected” without an evidence-backed rationale; otherwise use `NEEDS_REVIEW` with a clear missing-evidence reason.
* **Evidence sprawl:** Enforce evidence budgets (per-scan size caps) and retention tiers; “audit pack export” must remain complete even when the platform prunes caches.
* **Runtime ambiguity:** Runtime corroboration is supportive, not absolute; map to “observed/supports/contradicts/unknown” rather than binary.
* **Policy drift:** Policy bundles are versioned and pinned into attestations; changes must produce new signed verdicts (delta verdicts).

## 10) Definition of done for the feature

A release is “done” only if:

* A build produces an OCI artifact with an attached **signed verdict attestation**.
* Each verdict is **explainable** (reason steps + proof pointers).
* Reachability evidence is **stored as a reproducible subgraph** (or explicitly UNKNOWN with reason).
* Replay verification reproduces the same verdict with pinned inputs.
* UX starts from vulnerabilities and links directly to proofs and audit export.