Designing Explainable Triage Workflows

Below are operating guidelines for Product and Development Managers to deliver a “vulnerability-first + reachability + multi-analyzer + single built-in attested verdict” capability as a coherent, top-of-market feature set.

1) Product north star and non-negotiables

North star: Every vulnerability finding must resolve to a policy-backed, reachability-informed, runtime-corroborated verdict that is exportable as one signed attestation attached to the built artifact.

Non-negotiables

  • Vulnerability-first UX: Users start from a CVE/finding and immediately see applicability, reachability, runtime corroboration, and policy rationale.
  • Single canonical verdict artifact: One built-in, signed verdict attestation per subject (OCI digest), replayable (“same inputs → same output”).
  • Deterministic evidence: Evidence objects are content-hashed and versioned (feeds, policies, analyzers, graph snapshots).
  • Unknowns are first-class: “Unknown reachability/runtime/config” is not hidden; it is budgeted and policy-controlled.

2) Scope: what “reachability” means across analyzers

PMs must define reachability per layer and enforce consistent semantics:

  1. Source reachability

    • Entry points → call graph → vulnerable function/symbol (proof subgraph stored).
  2. Language dependency reachability

    • Resolved dependency graph + vulnerable component mapping + (where feasible) call-path to vulnerable code.
  3. OS dependency applicability

    • Installed package inventory + file ownership + linkage/usage hints (where available).
  4. Binary mapping reachability

    • Build-ID / symbol tables / imports + (optional) DWARF/source map; fallback heuristics are explicitly labeled.
  5. Runtime corroboration (eBPF / runtime sensors)

    • Execution facts: library loads, syscalls, network exposure, process ancestry; mapped to a “supports/contradicts/unknown” posture for the finding.

Manager rule: Any analyzer that cannot produce a proof object must emit an explicit “UNKNOWN with reason code,” never a silent “not reachable.”
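
As a reference point for this rule, a minimal sketch of such an analyzer contract is shown below in Go; the type names, posture values, and reason codes are illustrative assumptions, not an existing Stella Ops schema.

```go
// Illustrative sketch only: type and constant names are hypothetical,
// not the actual Stella Ops analyzer contract.
package evidence

// Posture is the only set of values an analyzer may report for a finding.
type Posture string

const (
    PostureReachable    Posture = "REACHABLE"
    PostureNotReachable Posture = "NOT_REACHABLE"
    PostureSupports     Posture = "SUPPORTS"    // runtime facts corroborate the finding
    PostureContradicts  Posture = "CONTRADICTS" // runtime facts argue against it
    PostureUnknown      Posture = "UNKNOWN"     // must carry a reason code
)

// ReasonCode explains why an analyzer could not produce a proof object.
type ReasonCode string

const (
    ReasonNoEntryPoints  ReasonCode = "NO_ENTRY_POINTS_RESOLVED"
    ReasonStrippedBinary ReasonCode = "BINARY_SYMBOLS_STRIPPED"
    ReasonSensorAbsent   ReasonCode = "RUNTIME_SENSOR_NOT_DEPLOYED"
    ReasonBudgetExceeded ReasonCode = "ANALYSIS_BUDGET_EXCEEDED"
)

// AnalyzerResult is what every analyzer emits per finding. An empty ProofHash
// is only valid when Posture is UNKNOWN and Reason is set: silent
// "not reachable" results are rejected at ingestion.
type AnalyzerResult struct {
    FindingID string
    Posture   Posture
    Reason    ReasonCode // required when Posture == PostureUnknown
    ProofHash string     // content hash of the stored proof subgraph, if any
}
```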

3) The decision model: a strict, explainable merge into one verdict

Adopt a small fixed set of verdicts and require all teams to use them:

  • AFFECTED, NOT_AFFECTED, MITIGATED, NEEDS_REVIEW

Each verdict must carry:

  • Reason steps (policy/lattice merge trace)
  • Confidence score (bounded; explainable inputs)
  • Counterfactuals (“what would flip this verdict”)
  • Evidence pointers (hashes to proof objects)

PM guidance on precedence: Do not hardcode “vendor > distro > internal.” Require a policy-defined merge (lattice semantics) where evidence quality and freshness influence trust.
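
To make "policy-defined merge" concrete, the sketch below shows one possible shape for a lattice-style merge in Go, where source trust and freshness come from the policy bundle rather than hardcoded precedence; all names and weighting rules are hypothetical.

```go
// Hypothetical sketch of a policy-defined merge; names and weights are illustrative.
package verdict

import "time"

type Verdict string

const (
    Affected    Verdict = "AFFECTED"
    NotAffected Verdict = "NOT_AFFECTED"
    Mitigated   Verdict = "MITIGATED"
    NeedsReview Verdict = "NEEDS_REVIEW"
)

// Claim is one source's statement about a finding, with evidence metadata.
type Claim struct {
    Source      string   // e.g. vendor VEX, distro advisory, internal analysis
    Verdict     Verdict
    ProofHashes []string // pointers to content-hashed evidence
    IssuedAt    time.Time
}

// Policy supplies the merge semantics; source precedence is never hardcoded.
type Policy struct {
    SourceTrust func(source string) float64     // 0..1, operator-defined
    Freshness   func(age time.Duration) float64 // decay curve, operator-defined
}

// Merge folds claims into one verdict plus a reason trace. In this sketch the
// highest policy-weighted claim wins and unevidenced NOT_AFFECTED claims are
// rejected; a full lattice merge would also model explicit conflict handling.
func Merge(p Policy, claims []Claim) (Verdict, []string) {
    trace := []string{}
    best, bestScore := NeedsReview, 0.0
    for _, c := range claims {
        if c.Verdict == NotAffected && len(c.ProofHashes) == 0 {
            trace = append(trace, c.Source+": NOT_AFFECTED rejected (no evidence)")
            continue
        }
        score := p.SourceTrust(c.Source) * p.Freshness(time.Since(c.IssuedAt))
        trace = append(trace, c.Source+" claims "+string(c.Verdict))
        if score > bestScore {
            best, bestScore = c.Verdict, score
        }
    }
    return best, trace
}
```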

4) Built-in attestation as the primary deliverable

Deliverable: An OCI-attached, DSSE/in-toto-style attestation, named stella.verdict.v1 here as a working example.

Minimum contents:

  • Subject: image digest(s)
  • Inputs: feed snapshot IDs, analyzer versions/digests, policy bundle hash, time window, environment tags
  • Per-CVE records: component, installed version, fixed version, verdict, confidence, reason steps
  • Proof pointers: reachability subgraph hash, runtime fact hashes, config/exposure facts hash
  • Replay manifest: “verify this verdict” command + inputs hash

Acceptance criterion: A third party can validate the signature and replay the verdict deterministically from the exported inputs, obtaining byte-identical output.
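
For orientation, the predicate could be shaped roughly like the Go structs below; the field names and layout are assumptions rather than the finalized stella.verdict.v1 schema, and the DSSE envelope and OCI attachment mechanics are omitted.

```go
// Illustrative shape of a stella.verdict.v1 predicate; field names are
// assumptions, not the finalized schema.
package attestation

type VerdictPredicate struct {
    Subjects []Subject      `json:"subjects"` // image digest(s)
    Inputs   Inputs         `json:"inputs"`   // everything needed for replay
    Findings []CVERecord    `json:"findings"`
    Replay   ReplayManifest `json:"replay"`
}

type Subject struct {
    Name   string `json:"name"`
    Digest string `json:"digest"` // e.g. sha256:...
}

type Inputs struct {
    FeedSnapshots    []string          `json:"feedSnapshots"`
    AnalyzerDigests  map[string]string `json:"analyzerDigests"`
    PolicyBundleHash string            `json:"policyBundleHash"`
    TimeWindow       [2]string         `json:"timeWindow"` // RFC 3339 start/end
    EnvironmentTags  []string          `json:"environmentTags"`
}

type CVERecord struct {
    CVE         string   `json:"cve"`
    Component   string   `json:"component"`
    Installed   string   `json:"installedVersion"`
    Fixed       string   `json:"fixedVersion,omitempty"`
    Verdict     string   `json:"verdict"` // AFFECTED | NOT_AFFECTED | MITIGATED | NEEDS_REVIEW
    Confidence  float64  `json:"confidence"`
    ReasonSteps []string `json:"reasonSteps"`
    ProofHashes []string `json:"proofHashes"` // reachability subgraph, runtime facts, config facts
}

type ReplayManifest struct {
    Command    string `json:"command"`    // hypothetical, e.g. a "replay" CLI invocation
    InputsHash string `json:"inputsHash"` // hash over canonicalized Inputs
}
```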

5) UX requirements (vulnerability-first, proof-linked)

PMs must enforce these UX invariants:

  • Finding row shows: Verdict chip + confidence + “why” one-liner + proof badges (Reachability / Runtime / Policy / Provenance).

  • Click-through yields:

    • Policy explanation (human-readable steps)
    • Evidence graph (hashes, issuers, timestamps, signature status)
    • Reachability mini-map (stored subgraph)
    • Runtime corroboration timeline (windowed)
    • Export: “Audit pack” (verdict + proofs + inputs)

Rule: Any displayed claim must link to a proof node or be explicitly marked “operator note.”

6) Engineering execution rules (to keep this shippable)

Modular contracts

  • Each analyzer outputs into a shared internal schema (typed nodes/edges + content hashes).
  • Evidence objects are immutable; updates create new objects (versioned snapshots).
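
A possible reading of these two contract bullets, sketched in Go with hypothetical node and edge kinds: every object is content-addressed, so an "update" is always a new object with a new hash.

```go
// Hypothetical internal evidence schema; node/edge kinds are examples only.
package graph

import (
    "crypto/sha256"
    "encoding/hex"
    "encoding/json"
)

type Node struct {
    Kind  string            `json:"kind"`  // e.g. "package", "function", "runtime-fact"
    Attrs map[string]string `json:"attrs"` // analyzer-specific, but typed by Kind
}

type Edge struct {
    Kind string `json:"kind"` // e.g. "calls", "depends-on", "loaded-at-runtime"
    From string `json:"from"` // content hash of source node
    To   string `json:"to"`   // content hash of target node
}

// ContentHash addresses a node by its canonical JSON encoding. Because objects
// are immutable, any change to Attrs yields a new hash (a new snapshot),
// never an in-place update.
func ContentHash(n Node) (string, error) {
    b, err := json.Marshal(n) // Go marshals map keys in sorted order
    if err != nil {
        return "", err
    }
    sum := sha256.Sum256(b)
    return "sha256:" + hex.EncodeToString(sum[:]), nil
}
```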

Performance strategy

  • Vulnerability-first query plan: build “vulnerable element set” per CVE, then run targeted reachability; avoid whole-program graphs unless needed.
  • Progressive fidelity: fast heuristic → deeper proof when requested; verdict must reflect confidence accordingly.
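
One way to implement the vulnerability-first plan is a reverse traversal from the vulnerable element set toward entry points, as sketched below; the graph interface and depth budget are assumptions for illustration.

```go
// Illustrative query plan: reverse reachability from the vulnerable element set.
// Graph access is abstracted; a real implementation would page edges lazily
// rather than hold a whole-program graph in memory.
package reach

type GraphStore interface {
    // Callers returns the direct callers (predecessor nodes) of a function node.
    Callers(nodeHash string) []string
    IsEntryPoint(nodeHash string) bool
}

// PathToEntry reports whether any entry point reaches one of the vulnerable
// symbols, returning the witness path (vulnerable symbol back to entry point)
// that becomes the stored proof subgraph.
func PathToEntry(g GraphStore, vulnerable []string, maxDepth int) (bool, []string) {
    type item struct {
        node string
        path []string
    }
    seen := map[string]bool{}
    queue := []item{}
    for _, v := range vulnerable {
        queue = append(queue, item{v, []string{v}})
        seen[v] = true
    }
    for depth := 0; depth < maxDepth && len(queue) > 0; depth++ {
        next := []item{}
        for _, it := range queue {
            if g.IsEntryPoint(it.node) {
                return true, it.path
            }
            for _, caller := range g.Callers(it.node) {
                if !seen[caller] {
                    seen[caller] = true
                    next = append(next, item{caller, append(append([]string{}, it.path...), caller)})
                }
            }
        }
        queue = next
    }
    // The frontier emptied or the depth budget ran out. An empty frontier means
    // the analyzed graph contains no path (candidate NOT_AFFECTED); a budget hit
    // means the honest answer is UNKNOWN with a budget-exceeded reason.
    return false, nil
}
```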

Determinism

  • Pin all feeds/policies/analyzer images by digest.
  • Canonical serialization for graphs and verdicts.
  • Stable hashing rules documented and tested.
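
A minimal canonical-hashing sketch follows, assuming JSON payloads; it only normalizes key order and whitespace, and real rules would additionally have to pin number formatting and Unicode normalization.

```go
// Minimal canonical-JSON hashing sketch. It relies on Go's json package
// sorting map keys; production rules would also need to pin number formatting
// and Unicode normalization, which this sketch glosses over.
package canon

import (
    "crypto/sha256"
    "encoding/hex"
    "encoding/json"
)

// Hash returns a stable digest for a JSON document regardless of the key order
// or whitespace of the input bytes.
func Hash(raw []byte) (string, error) {
    var v interface{}
    if err := json.Unmarshal(raw, &v); err != nil {
        return "", err
    }
    canonical, err := json.Marshal(v) // maps re-marshal with sorted keys, no extra whitespace
    if err != nil {
        return "", err
    }
    sum := sha256.Sum256(canonical)
    return "sha256:" + hex.EncodeToString(sum[:]), nil
}
```

With a helper like this, the "documented and tested" bullet reduces to table-driven tests asserting that reordered-but-equivalent inputs produce identical digests.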

7) Release gates and KPIs (what managers track weekly)

Quality KPIs

  • % findings with non-UNKNOWN reachability
  • % findings with runtime corroboration available (where sensor deployed)
  • False-positive reduction vs baseline (measured via developer confirmations / triage outcomes)
  • “Explainability completeness”: % verdicts with reason steps + at least one proof pointer
  • Replay success rate: % attestations replaying deterministically in CI

Operational KPIs

  • Median time to first verdict per image
  • Cache hit rate for graphs/proofs
  • Storage growth per scan (evidence size budgets)

Policy KPIs

  • Unknown budget breaches by environment (prod/dev)
  • Override/exception volume and aging

8) Phased delivery roadmap

  1. Phase 1: Single attested verdict + OS/lang SCA applicability

    • Deterministic inputs, verdict schema, signature, OCI attach, basic policy steps.
  2. Phase 2: Source reachability proofs (top languages)

    • Store subgraphs; introduce confidence + counterfactuals.
  3. Phase 3: Binary mapping fallback

    • Build-ID/symbol-based reachability + explicit “heuristic” labeling.
  4. Phase 4: Runtime corroboration (eBPF) integration

    • Evidence facts + time-window model + correlation to findings.
  5. Phase 5: Full lattice merge + Trust Algebra Studio

    • Operator-defined semantics; evidence quality weighting; vendor trust scoring.

9) Risk management rules (preempt common failure modes)

  • Overclaiming: Never present “not affected” without an evidence-backed rationale; otherwise use NEEDS_REVIEW with a clear missing-evidence reason.
  • Evidence sprawl: Enforce evidence budgets (per-scan size caps) and retention tiers; “audit pack export” must remain complete even when the platform prunes caches.
  • Runtime ambiguity: Runtime corroboration is supportive, not absolute; map to “observed/supports/contradicts/unknown” rather than binary.
  • Policy drift: Policy bundles are versioned and pinned into attestations; changes must produce new signed verdicts (delta verdicts).

10) Definition of done for the feature

A release is “done” only if:

  • A build produces an OCI artifact with an attached signed verdict attestation.
  • Each verdict is explainable (reason steps + proof pointers).
  • Reachability evidence is stored as a reproducible subgraph (or explicitly UNKNOWN with reason).
  • Replay verification reproduces the same verdict with pinned inputs.
  • UX starts from vulnerabilities and links directly to proofs and audit export.
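
To make the replay criterion testable in CI, the gate could look roughly like the sketch below; the recompute step and its inputs-hash parameter are hypothetical stand-ins for whatever the replay manifest actually invokes.

```go
// Hypothetical replay gate for CI: re-run the verdict computation from pinned
// inputs and compare outputs byte-for-byte. The compute step is abstracted;
// only the comparison logic is shown.
package replay

import (
    "bytes"
    "fmt"
)

// Recompute is whatever deterministic pipeline produces the verdict document
// from the pinned inputs recorded in the attestation (feeds, analyzers, policy).
type Recompute func(inputsHash string) ([]byte, error)

// Verify fails unless the recomputed verdict is byte-identical to the attested one.
func Verify(attested []byte, inputsHash string, run Recompute) error {
    replayed, err := run(inputsHash)
    if err != nil {
        return fmt.Errorf("replay failed: %w", err)
    }
    if !bytes.Equal(attested, replayed) {
        return fmt.Errorf("verdict drift: replayed output differs from attestation for inputs %s", inputsHash)
    }
    return nil
}
```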

These guidelines can be extended into (1) a manager-ready checklist per sprint, and (2) a concrete "verdict attestation" JSON schema with canonical hashing/serialization rules.