Add reference architecture and testing strategy documentation
- Created a new document for the Stella Ops Reference Architecture outlining the system's topology, trust boundaries, artifact association, and interfaces.
- Developed a comprehensive Testing Strategy document detailing the importance of offline readiness, interoperability, determinism, and operational guardrails.
- Introduced a README for the Testing Strategy, summarizing processing details and key concepts implemented.
- Added guidance for AI agents and developers in the tests directory, including directory structure, test categories, key patterns, and rules for test development.
Below are operating guidelines for Product and Development Managers to deliver a “vulnerability-first + reachability + multi-analyzer + single built-in attested verdict” capability as a coherent, top-of-market feature set.

## 1) Product north star and non-negotiables

**North star:** Every vulnerability finding must resolve to a **policy-backed, reachability-informed, runtime-corroborated verdict** that is **exportable as one signed attestation attached to the built artifact**.

**Non-negotiables**

* **Vulnerability-first UX:** Users start from a CVE/finding and immediately see applicability, reachability, runtime corroboration, and policy rationale.
* **Single canonical verdict artifact:** One built-in, signed verdict attestation per subject (OCI digest), replayable (“same inputs → same output”).
* **Deterministic evidence:** Evidence objects are content-hashed and versioned (feeds, policies, analyzers, graph snapshots).
* **Unknowns are first-class:** “Unknown reachability/runtime/config” is not hidden; it is budgeted and policy-controlled.

## 2) Scope: what “reachability” means across analyzers

PMs must define reachability per layer and enforce consistent semantics:

1. **Source reachability**
   * Entry points → call graph → vulnerable function/symbol (proof subgraph stored).
2. **Language dependency reachability**
   * Resolved dependency graph + vulnerable component mapping + (where feasible) call-path to vulnerable code.
3. **OS dependency applicability**
   * Installed package inventory + file ownership + linkage/usage hints (where available).
4. **Binary mapping reachability**
   * Build-ID / symbol tables / imports + (optional) DWARF/source map; fallback heuristics are explicitly labeled.
5. **Runtime corroboration (eBPF / runtime sensors)**
   * Execution facts: library loads, syscalls, network exposure, process ancestry; mapped to a “supports/contradicts/unknown” posture for the finding.

**Manager rule:** Any analyzer that cannot produce a proof object must emit an explicit “UNKNOWN with reason code,” never a silent “not reachable.”
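
This rule can be encoded directly in the analyzer output contract so a silent "not reachable" is unrepresentable. A minimal sketch; the type and field names here are illustrative, not an existing Stella Ops schema:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Reachability(Enum):
    REACHABLE = "reachable"
    NOT_REACHABLE = "not_reachable"
    UNKNOWN = "unknown"

@dataclass(frozen=True)
class ReachabilityResult:
    """One analyzer's reachability claim for a single finding."""
    finding_id: str                   # e.g. "CVE-2024-1234@pkg:pypi/foo@1.2.3"
    status: Reachability
    proof_hash: Optional[str] = None  # content hash of the stored proof subgraph
    reason_code: Optional[str] = None # mandatory when status is UNKNOWN

    def __post_init__(self):
        # Enforce the manager rule at construction time:
        # no proof-free NOT_REACHABLE, no bare UNKNOWN.
        if self.status is Reachability.NOT_REACHABLE and self.proof_hash is None:
            raise ValueError("NOT_REACHABLE requires a proof object hash")
        if self.status is Reachability.UNKNOWN and self.reason_code is None:
            raise ValueError("UNKNOWN requires an explicit reason code")
```

Constructing `ReachabilityResult("CVE-2024-1234", Reachability.UNKNOWN, reason_code="NO_CALL_GRAPH")` succeeds; omitting the reason code raises, which is the contract the rule demands.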

## 3) The decision model: a strict, explainable merge into one verdict

Adopt a small fixed set of verdicts and require all teams to use them:

* `AFFECTED`, `NOT_AFFECTED`, `MITIGATED`, `NEEDS_REVIEW`

Each verdict must carry:

* **Reason steps** (policy/lattice merge trace)
* **Confidence score** (bounded; explainable inputs)
* **Counterfactuals** (“what would flip this verdict”)
* **Evidence pointers** (hashes to proof objects)

**PM guidance on precedence:** Do not hardcode “vendor > distro > internal.” Require a policy-defined merge (lattice semantics) where evidence quality and freshness influence trust.
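
One way to realize such a policy-defined merge: score each evidence claim from a policy-supplied source weight scaled by evidence quality and freshness, and escalate to `NEEDS_REVIEW` when conflicting claims score within a margin. A sketch under assumed policy fields (`source_weights`, `freshness_horizon_days`, and `min_margin` are invented names, not a real policy schema):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Claim:
    verdict: str    # AFFECTED / NOT_AFFECTED / MITIGATED / NEEDS_REVIEW
    source: str     # e.g. "vendor_vex", "distro_advisory", "internal_analysis"
    quality: float  # policy-assigned evidence quality, 0..1
    age_days: int   # freshness input

def trust(claim: Claim, policy: dict) -> float:
    """Trust = policy weight for the source, scaled by quality and freshness decay."""
    weight = policy["source_weights"].get(claim.source, 0.0)
    freshness = max(0.0, 1.0 - claim.age_days / policy["freshness_horizon_days"])
    return weight * claim.quality * freshness

def merge(claims: list[Claim], policy: dict) -> tuple[str, list[str]]:
    """Merge claims into one verdict plus a reason-step trace."""
    if not claims:
        return "NEEDS_REVIEW", ["no evidence: defaulting to NEEDS_REVIEW"]
    ranked = sorted(claims, key=lambda c: trust(c, policy), reverse=True)
    steps = [f"{c.source}:{c.verdict} trust={trust(c, policy):.2f}" for c in ranked]
    top = ranked[0]
    runner_up = ranked[1] if len(ranked) > 1 else None
    # Conflicting claims with near-equal trust cannot be auto-resolved.
    if (runner_up and runner_up.verdict != top.verdict
            and trust(top, policy) - trust(runner_up, policy) < policy["min_margin"]):
        return "NEEDS_REVIEW", steps + ["conflict within margin: escalating"]
    return top.verdict, steps + [f"selected {top.source}"]
```

Note that no source outranks another unconditionally: a stale, low-quality vendor statement can lose to fresh internal analysis, and the reason steps record why.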

## 4) Built-in attestation as the primary deliverable

**Deliverable:** An OCI-attached DSSE/in-toto style attestation called (example) `stella.verdict.v1`.

Minimum contents:

* Subject: image digest(s)
* Inputs: feed snapshot IDs, analyzer versions/digests, policy bundle hash, time window, environment tags
* Per-CVE records: component, installed version, fixed version, verdict, confidence, reason steps
* Proof pointers: reachability subgraph hash, runtime fact hashes, config/exposure facts hash
* Replay manifest: “verify this verdict” command + inputs hash
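
These contents map naturally onto an in-toto Statement carried in a DSSE envelope. A sketch with placeholder digests; the predicate field names and the `stella.verdict.v1` predicateType URI are illustrative, not a published schema:

```python
import base64
import hashlib
import json

def canonical(obj) -> bytes:
    """Deterministic JSON encoding: sorted keys, fixed separators."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()

statement = {
    "_type": "https://in-toto.io/Statement/v1",
    "subject": [{"name": "registry.example/app",
                 "digest": {"sha256": "c0ffee..."}}],       # image digest (placeholder)
    "predicateType": "https://example.com/stella.verdict.v1",  # illustrative URI
    "predicate": {
        "inputs": {
            "feedSnapshot": "sha256:feed...",
            "analyzers": {"sca": "sha256:ana..."},
            "policyBundle": "sha256:pol...",
        },
        "findings": [{
            "cve": "CVE-2024-1234",
            "component": "pkg:pypi/foo@1.2.3",
            "fixedVersion": "1.2.4",
            "verdict": "NOT_AFFECTED",
            "confidence": 0.92,
            "reasonSteps": ["vendor_vex:NOT_AFFECTED trust=0.79"],
            "proofs": {"reachabilitySubgraph": "sha256:sub..."},
        }],
        "replay": {"inputsHash": None},  # filled in below
    },
}
# Content-address the pinned inputs so a verifier can check replay preconditions.
statement["predicate"]["replay"]["inputsHash"] = hashlib.sha256(
    canonical(statement["predicate"]["inputs"])).hexdigest()

# DSSE wraps the statement as a base64 payload; the signing step that
# populates "signatures" is omitted here.
envelope = {
    "payloadType": "application/vnd.in-toto+json",
    "payload": base64.b64encode(canonical(statement)).decode(),
    "signatures": [],
}
```

A verifier decodes the payload, checks the signature, re-pins the listed inputs, and replays; byte-identical output is the acceptance bar stated below.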

**Acceptance criterion:** A third party can validate signature and replay deterministically using exported inputs, obtaining byte-identical verdict output.

## 5) UX requirements (vulnerability-first, proof-linked)

PMs must enforce these UX invariants:

* Finding row shows: Verdict chip + confidence + “why” one-liner + proof badges (Reachability / Runtime / Policy / Provenance).
* Click-through yields:
  * Policy explanation (human-readable steps)
  * Evidence graph (hashes, issuers, timestamps, signature status)
  * Reachability mini-map (stored subgraph)
  * Runtime corroboration timeline (windowed)
* Export: “Audit pack” (verdict + proofs + inputs)

**Rule:** Any displayed claim must link to a proof node or be explicitly marked “operator note.”

## 6) Engineering execution rules (to keep this shippable)

**Modular contracts**

* Each analyzer outputs into a shared internal schema (typed nodes/edges + content hashes).
* Evidence objects are immutable; updates create new objects (versioned snapshots).

**Performance strategy**

* Vulnerability-first query plan: build “vulnerable element set” per CVE, then run targeted reachability; avoid whole-program graphs unless needed.
* Progressive fidelity: fast heuristic → deeper proof when requested; verdict must reflect confidence accordingly.
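
The targeted query plan amounts to a graph search that expands only from entry points and stops at the first vulnerable symbol, keeping the discovered path as the skeleton of the stored proof subgraph. A toy sketch (real analyzers operate on resolved symbol graphs, not string-keyed dicts):

```python
from collections import deque
from typing import Optional

def reachable_vulnerable(call_graph: dict[str, list[str]],
                         entry_points: list[str],
                         vulnerable: set[str]) -> tuple[bool, list[str]]:
    """Breadth-first search from entry points that stops at the first
    vulnerable symbol; the recovered path seeds the proof subgraph."""
    parent: dict[str, Optional[str]] = {e: None for e in entry_points}
    queue = deque(entry_points)
    while queue:
        fn = queue.popleft()
        if fn in vulnerable:
            # Walk parents back to an entry point to reconstruct the call path.
            path, cur = [], fn
            while cur is not None:
                path.append(cur)
                cur = parent[cur]
            return True, list(reversed(path))
        for callee in call_graph.get(fn, []):
            if callee not in parent:  # visit each symbol at most once
                parent[callee] = fn
                queue.append(callee)
    return False, []
```

For `{"main": ["parse"], "parse": ["decode"], "decode": []}` with `{"decode"}` vulnerable, this returns the path `["main", "parse", "decode"]` without ever materializing a whole-program graph; progressive fidelity then decides whether to refine that heuristic path into a full proof.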

**Determinism**

* Pin all feeds/policies/analyzer images by digest.
* Canonical serialization for graphs and verdicts.
* Stable hashing rules documented and tested.
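
The hashing rule can itself be documented as executable: canonical serialization means the content hash is invariant under key ordering and whitespace. A minimal JSON-based sketch (a real graph serialization would need extra rules for floats and binary fields):

```python
import hashlib
import json

def content_hash(obj) -> str:
    """Stable content address: canonical JSON (sorted keys, fixed
    separators, UTF-8) fed to SHA-256."""
    encoded = json.dumps(obj, sort_keys=True, separators=(",", ":"),
                         ensure_ascii=False).encode("utf-8")
    return "sha256:" + hashlib.sha256(encoded).hexdigest()

# The documented invariant, written as a test: key order never changes the hash.
assert content_hash({"policy": "p1", "feeds": ["f1"]}) == \
       content_hash({"feeds": ["f1"], "policy": "p1"})
```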

## 7) Release gates and KPIs (what managers track weekly)

**Quality KPIs**

* % findings with non-UNKNOWN reachability
* % findings with runtime corroboration available (where sensor deployed)
* False-positive reduction vs baseline (measured via developer confirmations / triage outcomes)
* “Explainability completeness”: % verdicts with reason steps + at least one proof pointer
* Replay success rate: % attestations replaying deterministically in CI

**Operational KPIs**

* Median time to first verdict per image
* Cache hit rate for graphs/proofs
* Storage growth per scan (evidence size budgets)

**Policy KPIs**

* Unknown budget breaches by environment (prod/dev)
* Override/exception volume and aging

## 8) Roadmap sequencing (recommended)

1. **Phase 1: Single attested verdict + OS/lang SCA applicability**
   * Deterministic inputs, verdict schema, signature, OCI attach, basic policy steps.
2. **Phase 2: Source reachability proofs (top languages)**
   * Store subgraphs; introduce confidence + counterfactuals.
3. **Phase 3: Binary mapping fallback**
   * Build-ID/symbol-based reachability + explicit “heuristic” labeling.
4. **Phase 4: Runtime corroboration (eBPF) integration**
   * Evidence facts + time-window model + correlation to findings.
5. **Phase 5: Full lattice merge + Trust Algebra Studio**
   * Operator-defined semantics; evidence quality weighting; vendor trust scoring.

## 9) Risk management rules (preempt common failure modes)

* **Overclaiming:** Never present “not affected” without an evidence-backed rationale; otherwise use `NEEDS_REVIEW` with a clear missing-evidence reason.
* **Evidence sprawl:** Enforce evidence budgets (per-scan size caps) and retention tiers; “audit pack export” must remain complete even when the platform prunes caches.
* **Runtime ambiguity:** Runtime corroboration is supportive, not absolute; map to “observed/supports/contradicts/unknown” rather than binary.
* **Policy drift:** Policy bundles are versioned and pinned into attestations; changes must produce new signed verdicts (delta verdicts).

## 10) Definition of done for the feature

A release is “done” only if:

* A build produces an OCI artifact with an attached **signed verdict attestation**.
* Each verdict is **explainable** (reason steps + proof pointers).
* Reachability evidence is **stored as a reproducible subgraph** (or explicitly UNKNOWN with reason).
* Replay verification reproduces the same verdict with pinned inputs.
* UX starts from vulnerabilities and links directly to proofs and audit export.