Designing Explainable Triage Workflows

Below are operating guidelines for Product and Development Managers to deliver a “vulnerability-first + reachability + multi-analyzer + single built-in attested verdict” capability as a coherent, top-of-market feature set.

1) Product north star and non-negotiables

North star: Every vulnerability finding must resolve to a policy-backed, reachability-informed, runtime-corroborated verdict that is exportable as one signed attestation attached to the built artifact.

Non-negotiables

  • Vulnerability-first UX: Users start from a CVE/finding and immediately see applicability, reachability, runtime corroboration, and policy rationale.
  • Single canonical verdict artifact: One built-in, signed verdict attestation per subject (OCI digest), replayable (“same inputs → same output”).
  • Deterministic evidence: Evidence objects are content-hashed and versioned (feeds, policies, analyzers, graph snapshots).
  • Unknowns are first-class: “Unknown reachability/runtime/config” is not hidden; it is budgeted and policy-controlled.

2) Scope: what “reachability” means across analyzers

PMs must define reachability per layer and enforce consistent semantics:

  1. Source reachability

    • Entry points → call graph → vulnerable function/symbol (proof subgraph stored).
  2. Language dependency reachability

    • Resolved dependency graph + vulnerable component mapping + (where feasible) call-path to vulnerable code.
  3. OS dependency applicability

    • Installed package inventory + file ownership + linkage/usage hints (where available).
  4. Binary mapping reachability

    • Build-ID / symbol tables / imports + (optional) DWARF/source map; fallback heuristics are explicitly labeled.
  5. Runtime corroboration (eBPF / runtime sensors)

    • Execution facts: library loads, syscalls, network exposure, process ancestry; mapped to a “supports/contradicts/unknown” posture for the finding.

Manager rule: Any analyzer that cannot produce a proof object must emit an explicit “UNKNOWN with reason code,” never a silent “not reachable.”
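
As a reference point for this rule, a minimal sketch of such an analyzer contract is shown below in Go; the type names, posture values, and reason codes are illustrative assumptions, not an existing Stella Ops schema.

```go
// Illustrative sketch only: type and constant names are hypothetical,
// not the actual Stella Ops analyzer contract.
package evidence

// Posture is the only set of values an analyzer may report for a finding.
type Posture string

const (
    PostureReachable    Posture = "REACHABLE"
    PostureNotReachable Posture = "NOT_REACHABLE"
    PostureSupports     Posture = "SUPPORTS"    // runtime facts corroborate the finding
    PostureContradicts  Posture = "CONTRADICTS" // runtime facts argue against it
    PostureUnknown      Posture = "UNKNOWN"     // must carry a reason code
)

// ReasonCode explains why an analyzer could not produce a proof object.
type ReasonCode string

const (
    ReasonNoEntryPoints  ReasonCode = "NO_ENTRY_POINTS_RESOLVED"
    ReasonStrippedBinary ReasonCode = "BINARY_SYMBOLS_STRIPPED"
    ReasonSensorAbsent   ReasonCode = "RUNTIME_SENSOR_NOT_DEPLOYED"
    ReasonBudgetExceeded ReasonCode = "ANALYSIS_BUDGET_EXCEEDED"
)

// AnalyzerResult is what every analyzer emits per finding. An empty ProofHash
// is only valid when Posture is UNKNOWN and Reason is set: silent
// "not reachable" results are rejected at ingestion.
type AnalyzerResult struct {
    FindingID string
    Posture   Posture
    Reason    ReasonCode // required when Posture == PostureUnknown
    ProofHash string     // content hash of the stored proof subgraph, if any
}
```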

3) The decision model: a strict, explainable merge into one verdict

Adopt a small fixed set of verdicts and require all teams to use them:

  • AFFECTED, NOT_AFFECTED, MITIGATED, NEEDS_REVIEW

Each verdict must carry:

  • Reason steps (policy/lattice merge trace)
  • Confidence score (bounded; explainable inputs)
  • Counterfactuals (“what would flip this verdict”)
  • Evidence pointers (hashes to proof objects)

PM guidance on precedence: Do not hardcode “vendor > distro > internal.” Require a policy-defined merge (lattice semantics) where evidence quality and freshness influence trust.
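
To make "policy-defined merge" concrete, the sketch below shows one possible shape for a lattice-style merge in Go, where source trust and freshness come from the policy bundle rather than hardcoded precedence; all names and weighting rules are hypothetical.

```go
// Hypothetical sketch of a policy-defined merge; names and weights are illustrative.
package verdict

import "time"

type Verdict string

const (
    Affected    Verdict = "AFFECTED"
    NotAffected Verdict = "NOT_AFFECTED"
    Mitigated   Verdict = "MITIGATED"
    NeedsReview Verdict = "NEEDS_REVIEW"
)

// Claim is one source's statement about a finding, with evidence metadata.
type Claim struct {
    Source      string   // e.g. vendor VEX, distro advisory, internal analysis
    Verdict     Verdict
    ProofHashes []string // pointers to content-hashed evidence
    IssuedAt    time.Time
}

// Policy supplies the merge semantics; source precedence is never hardcoded.
type Policy struct {
    SourceTrust func(source string) float64     // 0..1, operator-defined
    Freshness   func(age time.Duration) float64 // decay curve, operator-defined
}

// Merge folds claims into one verdict plus a reason trace. In this sketch the
// highest policy-weighted claim wins and unevidenced NOT_AFFECTED claims are
// rejected; a full lattice merge would also model explicit conflict handling.
func Merge(p Policy, claims []Claim) (Verdict, []string) {
    trace := []string{}
    best, bestScore := NeedsReview, 0.0
    for _, c := range claims {
        if c.Verdict == NotAffected && len(c.ProofHashes) == 0 {
            trace = append(trace, c.Source+": NOT_AFFECTED rejected (no evidence)")
            continue
        }
        score := p.SourceTrust(c.Source) * p.Freshness(time.Since(c.IssuedAt))
        trace = append(trace, c.Source+" claims "+string(c.Verdict))
        if score > bestScore {
            best, bestScore = c.Verdict, score
        }
    }
    return best, trace
}
```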

4) Built-in attestation as the primary deliverable

Deliverable: An OCI-attached, DSSE/in-toto-style attestation, named stella.verdict.v1 here as a working example.

Minimum contents:

  • Subject: image digest(s)
  • Inputs: feed snapshot IDs, analyzer versions/digests, policy bundle hash, time window, environment tags
  • Per-CVE records: component, installed version, fixed version, verdict, confidence, reason steps
  • Proof pointers: reachability subgraph hash, runtime fact hashes, config/exposure facts hash
  • Replay manifest: “verify this verdict” command + inputs hash

Acceptance criterion: A third party can validate the signature and replay the verdict deterministically from the exported inputs, obtaining byte-identical output.
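
For orientation, the predicate could be shaped roughly like the Go structs below; the field names and layout are assumptions rather than the finalized stella.verdict.v1 schema, and the DSSE envelope and OCI attachment mechanics are omitted.

```go
// Illustrative shape of a stella.verdict.v1 predicate; field names are
// assumptions, not the finalized schema.
package attestation

type VerdictPredicate struct {
    Subjects []Subject      `json:"subjects"` // image digest(s)
    Inputs   Inputs         `json:"inputs"`   // everything needed for replay
    Findings []CVERecord    `json:"findings"`
    Replay   ReplayManifest `json:"replay"`
}

type Subject struct {
    Name   string `json:"name"`
    Digest string `json:"digest"` // e.g. sha256:...
}

type Inputs struct {
    FeedSnapshots    []string          `json:"feedSnapshots"`
    AnalyzerDigests  map[string]string `json:"analyzerDigests"`
    PolicyBundleHash string            `json:"policyBundleHash"`
    TimeWindow       [2]string         `json:"timeWindow"` // RFC 3339 start/end
    EnvironmentTags  []string          `json:"environmentTags"`
}

type CVERecord struct {
    CVE         string   `json:"cve"`
    Component   string   `json:"component"`
    Installed   string   `json:"installedVersion"`
    Fixed       string   `json:"fixedVersion,omitempty"`
    Verdict     string   `json:"verdict"` // AFFECTED | NOT_AFFECTED | MITIGATED | NEEDS_REVIEW
    Confidence  float64  `json:"confidence"`
    ReasonSteps []string `json:"reasonSteps"`
    ProofHashes []string `json:"proofHashes"` // reachability subgraph, runtime facts, config facts
}

type ReplayManifest struct {
    Command    string `json:"command"`    // hypothetical, e.g. a "replay" CLI invocation
    InputsHash string `json:"inputsHash"` // hash over canonicalized Inputs
}
```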

5) UX requirements (vulnerability-first, proof-linked)

PMs must enforce these UX invariants:

  • Finding row shows: Verdict chip + confidence + “why” one-liner + proof badges (Reachability / Runtime / Policy / Provenance).

  • Click-through yields:

    • Policy explanation (human-readable steps)
    • Evidence graph (hashes, issuers, timestamps, signature status)
    • Reachability mini-map (stored subgraph)
    • Runtime corroboration timeline (windowed)
    • Export: “Audit pack” (verdict + proofs + inputs)

Rule: Any displayed claim must link to a proof node or be explicitly marked “operator note.”

6) Engineering execution rules (to keep this shippable)

Modular contracts

  • Each analyzer outputs into a shared internal schema (typed nodes/edges + content hashes).
  • Evidence objects are immutable; updates create new objects (versioned snapshots).
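
A possible reading of these two contract bullets, sketched in Go with hypothetical node and edge kinds: every object is content-addressed, so an "update" is always a new object with a new hash.

```go
// Hypothetical internal evidence schema; node/edge kinds are examples only.
package graph

import (
    "crypto/sha256"
    "encoding/hex"
    "encoding/json"
)

type Node struct {
    Kind  string            `json:"kind"`  // e.g. "package", "function", "runtime-fact"
    Attrs map[string]string `json:"attrs"` // analyzer-specific, but typed by Kind
}

type Edge struct {
    Kind string `json:"kind"` // e.g. "calls", "depends-on", "loaded-at-runtime"
    From string `json:"from"` // content hash of source node
    To   string `json:"to"`   // content hash of target node
}

// ContentHash addresses a node by its canonical JSON encoding. Because objects
// are immutable, any change to Attrs yields a new hash (a new snapshot),
// never an in-place update.
func ContentHash(n Node) (string, error) {
    b, err := json.Marshal(n) // Go marshals map keys in sorted order
    if err != nil {
        return "", err
    }
    sum := sha256.Sum256(b)
    return "sha256:" + hex.EncodeToString(sum[:]), nil
}
```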

Performance strategy

  • Vulnerability-first query plan: build “vulnerable element set” per CVE, then run targeted reachability; avoid whole-program graphs unless needed.
  • Progressive fidelity: fast heuristic → deeper proof when requested; verdict must reflect confidence accordingly.
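
One way to implement the vulnerability-first plan is a reverse traversal from the vulnerable element set toward entry points, as sketched below; the graph interface and depth budget are assumptions for illustration.

```go
// Illustrative query plan: reverse reachability from the vulnerable element set.
// Graph access is abstracted; a real implementation would page edges lazily
// rather than hold a whole-program graph in memory.
package reach

type GraphStore interface {
    // Callers returns the direct callers (predecessor nodes) of a function node.
    Callers(nodeHash string) []string
    IsEntryPoint(nodeHash string) bool
}

// PathToEntry reports whether any entry point reaches one of the vulnerable
// symbols, returning the witness path (vulnerable symbol back to entry point)
// that becomes the stored proof subgraph.
func PathToEntry(g GraphStore, vulnerable []string, maxDepth int) (bool, []string) {
    type item struct {
        node string
        path []string
    }
    seen := map[string]bool{}
    queue := []item{}
    for _, v := range vulnerable {
        queue = append(queue, item{v, []string{v}})
        seen[v] = true
    }
    for depth := 0; depth < maxDepth && len(queue) > 0; depth++ {
        next := []item{}
        for _, it := range queue {
            if g.IsEntryPoint(it.node) {
                return true, it.path
            }
            for _, caller := range g.Callers(it.node) {
                if !seen[caller] {
                    seen[caller] = true
                    next = append(next, item{caller, append(append([]string{}, it.path...), caller)})
                }
            }
        }
        queue = next
    }
    // The frontier emptied or the depth budget ran out. An empty frontier means
    // the analyzed graph contains no path (candidate NOT_AFFECTED); a budget hit
    // means the honest answer is UNKNOWN with a budget-exceeded reason.
    return false, nil
}
```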

Determinism

  • Pin all feeds/policies/analyzer images by digest.
  • Canonical serialization for graphs and verdicts.
  • Stable hashing rules documented and tested.
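
A minimal canonical-hashing sketch follows, assuming JSON payloads; it only normalizes key order and whitespace, and real rules would additionally have to pin number formatting and Unicode normalization.

```go
// Minimal canonical-JSON hashing sketch. It relies on Go's json package
// sorting map keys; production rules would also need to pin number formatting
// and Unicode normalization, which this sketch glosses over.
package canon

import (
    "crypto/sha256"
    "encoding/hex"
    "encoding/json"
)

// Hash returns a stable digest for a JSON document regardless of the key order
// or whitespace of the input bytes.
func Hash(raw []byte) (string, error) {
    var v interface{}
    if err := json.Unmarshal(raw, &v); err != nil {
        return "", err
    }
    canonical, err := json.Marshal(v) // maps re-marshal with sorted keys, no extra whitespace
    if err != nil {
        return "", err
    }
    sum := sha256.Sum256(canonical)
    return "sha256:" + hex.EncodeToString(sum[:]), nil
}
```

With a helper like this, the "documented and tested" bullet reduces to table-driven tests asserting that reordered-but-equivalent inputs produce identical digests.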

7) Release gates and KPIs (what managers track weekly)

Quality KPIs

  • % findings with non-UNKNOWN reachability
  • % findings with runtime corroboration available (where sensor deployed)
  • False-positive reduction vs baseline (measured via developer confirmations / triage outcomes)
  • “Explainability completeness”: % verdicts with reason steps + at least one proof pointer
  • Replay success rate: % attestations replaying deterministically in CI

Operational KPIs

  • Median time to first verdict per image
  • Cache hit rate for graphs/proofs
  • Storage growth per scan (evidence size budgets)

Policy KPIs

  • Unknown budget breaches by environment (prod/dev)
  • Override/exception volume and aging

8) Phased delivery roadmap

  1. Phase 1: Single attested verdict + OS/lang SCA applicability

    • Deterministic inputs, verdict schema, signature, OCI attach, basic policy steps.
  2. Phase 2: Source reachability proofs (top languages)

    • Store subgraphs; introduce confidence + counterfactuals.
  3. Phase 3: Binary mapping fallback

    • Build-ID/symbol-based reachability + explicit “heuristic” labeling.
  4. Phase 4: Runtime corroboration (eBPF) integration

    • Evidence facts + time-window model + correlation to findings.
  5. Phase 5: Full lattice merge + Trust Algebra Studio

    • Operator-defined semantics; evidence quality weighting; vendor trust scoring.

9) Risk management rules (preempt common failure modes)

  • Overclaiming: Never present “not affected” without an evidence-backed rationale; otherwise use NEEDS_REVIEW with a clear missing-evidence reason.
  • Evidence sprawl: Enforce evidence budgets (per-scan size caps) and retention tiers; “audit pack export” must remain complete even when the platform prunes caches.
  • Runtime ambiguity: Runtime corroboration is supportive, not absolute; map to “observed/supports/contradicts/unknown” rather than binary.
  • Policy drift: Policy bundles are versioned and pinned into attestations; changes must produce new signed verdicts (delta verdicts).

10) Definition of done for the feature

A release is “done” only if:

  • A build produces an OCI artifact with an attached signed verdict attestation.
  • Each verdict is explainable (reason steps + proof pointers).
  • Reachability evidence is stored as a reproducible subgraph (or explicitly UNKNOWN with reason).
  • Replay verification reproduces the same verdict with pinned inputs.
  • UX starts from vulnerabilities and links directly to proofs and audit export.
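
To make the replay criterion testable in CI, the gate could look roughly like the sketch below; the recompute step and its inputs-hash parameter are hypothetical stand-ins for whatever the replay manifest actually invokes.

```go
// Hypothetical replay gate for CI: re-run the verdict computation from pinned
// inputs and compare outputs byte-for-byte. The compute step is abstracted;
// only the comparison logic is shown.
package replay

import (
    "bytes"
    "fmt"
)

// Recompute is whatever deterministic pipeline produces the verdict document
// from the pinned inputs recorded in the attestation (feeds, analyzers, policy).
type Recompute func(inputsHash string) ([]byte, error)

// Verify fails unless the recomputed verdict is byte-identical to the attested one.
func Verify(attested []byte, inputsHash string, run Recompute) error {
    replayed, err := run(inputsHash)
    if err != nil {
        return fmt.Errorf("replay failed: %w", err)
    }
    if !bytes.Equal(attested, replayed) {
        return fmt.Errorf("verdict drift: replayed output differs from attestation for inputs %s", inputsHash)
    }
    return nil
}
```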

These guidelines can be extended into (1) a manager-ready checklist per sprint, and (2) a concrete "verdict attestation" JSON schema with canonical hashing/serialization rules.