Designing Explainable Triage and Proof-Linked Evidence (Stella Ops product advisory, 18 Dec 2025)

Here's a practical, first-time-friendly blueprint for making your security workflow both explainable and provable, from triage to approval.

Explainable triage UX (what & why)

Show every risk score with the minimum evidence a responder needs to trust it:

  • Reachable path: the concrete call chain (or network path) proving the vuln is actually hit.
  • Entrypoint boundary: the external surface (HTTP route, CLI verb, cron, message topic) that leads to that path.
  • VEX status: the exploitability decision (Affected/Not Affected/Under Investigation/Fixed) with rationale.
  • Last-seen timestamp: when this evidence was last observed/generated.

UI pattern (compact, 1-click expand)

  • Row (collapsed): Score 72 • CVE-2024-12345 • service: api-gateway • package: x.y.z

  • Expand panel (evidence):

    • Path: POST /billing/charge → BillingController.Pay() → StripeClient.Create()
    • Boundary: Ingress: /billing/charge (JWT: required, scope: payments:write)
    • VEX: Not Affected (runtime guard strips untrusted input before sink)
    • Last seen: 2025-12-18T09:22Z (scan: sbomer#c1a2, policy run: lattice#9f0d)
    • Actions: “Open proof bundle”, “Re-run check”, “Create exception (time-boxed)”

Data contract (what the panel needs)

{
  "finding_id": "f-7b3c",
  "cve": "CVE-2024-12345",
  "component": {"name": "stripe-sdk", "version": "6.1.2"},
  "reachable_path": [
    "HTTP POST /billing/charge",
    "BillingController.Pay",
    "StripeClient.Create"
  ],
  "entrypoint": {"type":"http","route":"/billing/charge","auth":"jwt:payments:write"},
  "vex": {"status":"not_affected","justification":"runtime_sanitizer_blocks_sink","timestamp":"2025-12-18T09:22:00Z"},
  "last_seen":"2025-12-18T09:22:00Z",
  "attestation_refs": ["sha256:…sbom", "sha256:…vex", "sha256:…policy"]
}

Evidence-linked approvals (what & why)

Make “Approve to ship” contingent on verifiable proof, not screenshots:

  • Chain must exist and be machine-verifiable: SBOM → VEX → policy decision.
  • Use in-toto/DSSE attestations or SLSA provenance so each link has a signature, subject digest, and predicate.
  • Gate merges/deploys only when the chain validates.

Pipeline gate (simple policy)

  • Require:

    1. SBOM attestation referencing the exact image digest
    2. VEX attestation covering all listed components (or explicitly allowed gaps)
    3. Policy decision attestation (e.g., “risk ≤ threshold AND all reachable vulns = Not Affected/Fixed”)

Minimal decision attestation (DSSE envelope → JSON payload)

{
  "_type": "https://in-toto.io/Statement/v1",
  "predicateType": "stella/policy-decision@v1",
  "subject": [{"name":"registry/org/app","digest":{"sha256":"<image-digest>"}}],
  "predicate": {
    "policy": "risk_threshold<=75 && reachable_vulns.all(v => v.vex in ['not_affected','fixed'])",
    "inputs": {
      "sbom_ref": "sha256:<sbom>",
      "vex_ref": "sha256:<vex>"
    },
    "result": {"allowed": true, "score": 61, "exemptions":[]},
    "evidence_refs": ["sha256:<reachability-proof-bundle>"],
    "run_at": "2025-12-18T09:23:11Z"
  }
}
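The `policy` string in the predicate above is pseudocode. Here is a minimal Python sketch of how an engine might evaluate it. It reads `risk_threshold<=75` as "the risk score must not exceed 75", which is an interpretation the source does not spell out. The `evaluate_decision` name, the per-vuln `{"cve", "vex"}` dict shape, and the extra `unresolved` result field are illustrative assumptions.

```python
from typing import Iterable, List, Mapping

# VEX statuses that count as "resolved" for a reachable vulnerability.
RESOLVING_VEX = {"not_affected", "fixed"}


def evaluate_decision(risk_score: int,
                      reachable_vulns: Iterable[Mapping[str, str]],
                      risk_threshold: int = 75) -> dict:
    """Evaluate: score <= threshold AND every reachable vuln has a resolving VEX status."""
    unresolved: List[str] = [
        v["cve"] for v in reachable_vulns if v.get("vex") not in RESOLVING_VEX
    ]
    allowed = risk_score <= risk_threshold and not unresolved
    return {"allowed": allowed, "score": risk_score, "unresolved": unresolved}
```

For example, `evaluate_decision(61, [{"cve": "CVE-2024-12345", "vex": "not_affected"}])` allows the change, matching the `"allowed": true, "score": 61` result in the attestation.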

How this lands in your product (concrete moves)

  • Backend: add /findings/:id/evidence (returns the contract above) + /approvals/:artifact/attestations.

  • Storage: keep proof bundles (graphs, call stacks, logs) as content-addressed blobs; store DSSE envelopes alongside.

  • UI: one list → expandable rows; chips for VEX status; “Open proof” shows the call graph and boundary in 1 view.

  • CLI/API: stella verify image:<digest> --require sbom,vex,decision returns a signed summary; pipelines fail on a non-zero exit code.

  • Metrics:

    • % changes with complete attestations (target ≥95%)
    • TTFE (time-to-first-evidence) from alert → panel open (target ≤30s)
    • Post-deploy reversions due to missing proof (trend to zero)

Starter acceptance checklist

  • Every risk row expands to path, boundary, VEX, last-seen in <300ms.
  • “Approve” button disabled until SBOM+VEX+Decision attestations validate for the exact artifact digest.
  • One-click “Show DSSE chain” renders the three envelopes with subject digests and signers.
  • Audit log captures who approved, which digests, and which evidence hashes.

If you want, I can turn this into ready-to-drop .NET 10 endpoints + a small React panel with mocked data so your team can wire it up fast. Below is a “build-it” guide for Stella Ops that goes past the concept level: concrete services, schemas, pipelines, signing/storage choices, UI components, and the exact invariants you should enforce so triage is explainable and approvals are provably evidence-linked.


1) Start with the invariants (the rules your system must never violate)

If you implement nothing else, implement these invariants: they're what make the UX trustworthy and the approvals auditable.

Artifact anchoring invariant

Every finding, every piece of evidence, and every approval must be anchored to an immutable subject digest (e.g., container image digest sha256:…, binary SHA, or SBOM digest).

  • No “latest tag” approvals.
  • No “approve commit” without mapping to the built artifact digest.

Evidence closure invariant

A policy decision is only valid if it references exactly the evidence it used:

  • inputs.sbom_ref
  • inputs.vex_ref
  • inputs.reachability_ref (optional but recommended)
  • inputs.scan_ref (optional)
  • and any config/IaC refs used for boundary/exposure.
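One way to enforce evidence closure is to compare the refs a decision declares as inputs against the set of evidence digests the evaluation actually consumed, and reject any mismatch in either direction. The sketch below makes a few assumptions: refs are `sha256:` strings, or lists of them for config/IaC refs, and the caller tracks `consumed`. `check_evidence_closure` and the `iac_refs` key in the usage are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Union

Ref = Union[str, Iterable[str]]

REQUIRED_INPUTS = ("sbom_ref", "vex_ref")


def _flatten(inputs: Dict[str, Ref]) -> Set[str]:
    """Collect every digest referenced by the decision inputs (strings or lists of strings)."""
    refs: Set[str] = set()
    for value in inputs.values():
        if isinstance(value, str):
            if value:
                refs.add(value)
        elif value:
            refs.update(value)
    return refs


def check_evidence_closure(inputs: Dict[str, Ref], consumed: Set[str]) -> List[str]:
    """Return closure violations; an empty list means the decision cites exactly what it used."""
    errors = [f"missing required input {k}" for k in REQUIRED_INPUTS if not inputs.get(k)]
    referenced = _flatten(inputs)
    errors += [f"referenced but not used: {d}" for d in sorted(referenced - consumed)]
    errors += [f"used but not referenced: {d}" for d in sorted(consumed - referenced)]
    return errors
```

The same check can back the Approve button's "inputs match retrieved evidence" condition, so both paths apply one rule.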

Signature chain invariant

Evidence is only admissible if it is:

  1. structured (machine-readable),
  2. signed (DSSE/in-toto),
  3. verifiable (trusted identity/keys),
  4. retrievable by digest.

DSSE is specifically designed to authenticate both the message and its type (payload type) and avoid canonicalization pitfalls. (GitHub)
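To make "authenticate both the message and its type" concrete: DSSE signs a pre-authentication encoding (PAE), `"DSSEv1" SP LEN(type) SP type SP LEN(body) SP body`, rather than the raw payload. The sketch below follows that encoding and the DSSE envelope field names. It uses stdlib HMAC-SHA256 only as a stand-in for a real asymmetric signature (ECDSA or Ed25519 via cosign or a KMS). The helper names are hypothetical.

```python
import base64
import hashlib
import hmac


def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE v1 pre-authentication encoding: the exact bytes that get signed."""
    t = payload_type.encode("utf-8")
    return b"DSSEv1 %d %s %d %s" % (len(t), t, len(payload), payload)


def sign_envelope(payload_type: str, payload: bytes, key: bytes, keyid: str) -> dict:
    # HMAC-SHA256 stands in for a real asymmetric signature in this sketch.
    sig = hmac.new(key, pae(payload_type, payload), hashlib.sha256).digest()
    return {
        "payloadType": payload_type,
        "payload": base64.b64encode(payload).decode("ascii"),
        "signatures": [{"keyid": keyid, "sig": base64.b64encode(sig).decode("ascii")}],
    }


def verify_envelope(envelope: dict, key: bytes) -> bool:
    """Recompute PAE from the envelope's own type and payload, then compare signatures."""
    payload = base64.b64decode(envelope["payload"])
    expected = hmac.new(key, pae(envelope["payloadType"], payload), hashlib.sha256).digest()
    return any(
        hmac.compare_digest(expected, base64.b64decode(s["sig"]))
        for s in envelope["signatures"]
    )
```

Because the type string sits inside the signed bytes, relabeling an in-toto payload as a different type invalidates the signature.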

Staleness invariant

Evidence must have:

  • last_seen and expires_at (or TTL),
  • a “stale evidence” behavior in policy (deny or degrade score).
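The two stale-evidence behaviors can be sketched as a single policy hook. This illustration rests on several assumptions: the `apply_staleness` name, the fixed 20-point penalty in "degrade" mode, and the 0-100 cap are all hypothetical. It also assumes timestamps are RFC 3339 strings with an offset or `Z`, like the `expires_at` values elsewhere in this document.

```python
from datetime import datetime, timezone


def parse_ts(value: str) -> datetime:
    # fromisoformat only accepts a trailing "Z" from Python 3.11 on, so rewrite it as +00:00;
    # inputs are expected to carry an offset, and the result is normalized to UTC.
    return datetime.fromisoformat(value.replace("Z", "+00:00")).astimezone(timezone.utc)


def apply_staleness(score: int, expires_at: str, now: datetime,
                    mode: str = "deny", penalty: int = 20) -> dict:
    """Return the policy action for one piece of evidence: ok, deny, or degrade."""
    if now < parse_ts(expires_at):
        return {"action": "ok", "score": score}
    if mode == "deny":
        return {"action": "deny", "score": score}
    # "degrade": keep evaluating, but treat the finding as riskier (capped at 100).
    return {"action": "degrade", "score": min(100, score + penalty)}
```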

2) Choose the canonical formats and where you'll store “proof”

Attestation envelope: DSSE + in-toto Statement

Use:

  • in-toto Attestation Framework “Statement” as the payload model (“subject + predicateType + predicate”). (GitHub)
  • Wrap it in DSSE for signing. (GitHub)
  • If you use Sigstore bundles, the DSSE envelope is expected to carry an in-toto statement and use a payloadType like application/vnd.in-toto+json. (Sigstore)
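Here is a minimal sketch of building an in-toto Statement v1 and turning it into the DSSE `payloadType`/`payload` pair. The `_type` URI and the `application/vnd.in-toto+json` payload type follow the in-toto and Sigstore conventions cited above, and the helper names are hypothetical. Subject digests are bare hex keyed by algorithm, with no `sha256:` prefix. The `stella/...@v1` predicate type is kept as written in this document, although in-toto recommends URI-style predicate types.

```python
import base64
import json

STATEMENT_TYPE = "https://in-toto.io/Statement/v1"
IN_TOTO_PAYLOAD_TYPE = "application/vnd.in-toto+json"


def make_statement(subject_name: str, sha256_hex: str,
                   predicate_type: str, predicate: dict) -> dict:
    """in-toto Statement v1: subject digests + predicateType + predicate."""
    return {
        "_type": STATEMENT_TYPE,
        "subject": [{"name": subject_name, "digest": {"sha256": sha256_hex}}],
        "predicateType": predicate_type,
        "predicate": predicate,
    }


def to_dsse_payload(statement: dict) -> tuple:
    """Serialize once; DSSE signs these exact bytes, so verifiers never re-canonicalize."""
    body = json.dumps(statement, separators=(",", ":"), sort_keys=True).encode("utf-8")
    return IN_TOTO_PAYLOAD_TYPE, base64.b64encode(body).decode("ascii")
```

Verifiers should parse the bytes they decode from the envelope rather than re-serializing JSON themselves.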

SBOM format: CycloneDX or SPDX

  • SPDX is an ISO/IEC standard and has v3.0 and v2.3 lines in the ecosystem. (spdx.dev)
  • CycloneDX is an ECMA standard (ECMA-424) and widely used for application security contexts. (GitHub)

Pick one as your canonical (internally), but ingest both.

VEX format: OpenVEX (practical) + map to “classic” VEX statuses

VEX's value is triage noise reduction: vendors can assert whether a product is affected, fixed, under investigation, or not affected. (NTIA) OpenVEX is a minimal, embeddable implementation of VEX intended for interoperability. (GitHub)

Where to store proof: OCI registry referrers

Use OCI “subject/referrers” so proofs travel with the artifact:

  • OCI 1.1 introduces an explicit subject field and referrers graph for signatures/attestations/SBOMs. (opencontainers.org)
  • ORAS documentation explains linking artifacts via subject. (Oras)
  • Microsoft docs show oras attach … --artifact-type … patterns (works across registries that support referrers). (Microsoft Learn)
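At the registry level, "attaching" works like this. The attestation is pushed as an OCI image manifest whose `subject` descriptor points at the image manifest. Clients then find it through the Distribution Spec 1.1 referrers endpoint (`GET /v2/<name>/referrers/<digest>`, optionally filtered by `artifactType`). The config uses the OCI 1.1 empty descriptor, whose content is `{}`. The `application/vnd.stella.reachability.v1+json` artifact type and the helper names are hypothetical, and tools such as `oras attach` build this kind of manifest for you.

```python
import hashlib
from urllib.parse import quote

MANIFEST_TYPE = "application/vnd.oci.image.manifest.v1+json"
EMPTY_TYPE = "application/vnd.oci.empty.v1+json"  # OCI 1.1 empty config, content "{}"


def descriptor(media_type: str, data: bytes) -> dict:
    return {"mediaType": media_type,
            "digest": "sha256:" + hashlib.sha256(data).hexdigest(),
            "size": len(data)}


def referrer_manifest(subject: dict, artifact_type: str,
                      blob: bytes, blob_media_type: str) -> dict:
    """Manifest whose `subject` points at the image, making it a referrer of that image."""
    return {
        "schemaVersion": 2,
        "mediaType": MANIFEST_TYPE,
        "artifactType": artifact_type,
        "config": descriptor(EMPTY_TYPE, b"{}"),
        "layers": [descriptor(blob_media_type, blob)],
        "subject": subject,
    }


def referrers_url(registry: str, repo: str, digest: str, artifact_type: str = "") -> str:
    """GET this URL to list referrers of `digest` (OCI Distribution Spec 1.1)."""
    url = f"https://{registry}/v2/{repo}/referrers/{digest}"
    if artifact_type:
        url += "?artifactType=" + quote(artifact_type, safe="")
    return url
```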

3) System architecture (services + data flow)

Services (minimum set)

  1. Ingestor

    • Pulls scanner outputs (SCA/SAST/IaC), SBOM, runtime signals.
  2. Evidence Builder

    • Computes reachability, entrypoints, boundary/auth context, score explanation.
  3. Attestation Service

    • Creates in-toto statements, wraps them in DSSE, signs (cosign/KMS), and stores them in the registry.
  4. Policy Engine

    • Evaluates allow/deny + reason codes, emits signed decision attestation.
    • Use OPA/Rego for maintainable declarative policies. (openpolicyagent.org)
  5. Stella Ops API

    • Serves findings + evidence panels to the UI (fast, cached).
  6. UI

    • Explainable triage panel + chain viewer + approve button.

Event flow (artifact-centric)

  1. Build produces image@sha256:X

  2. Generate SBOM → sign + attach

  3. Run vuln scan → sign + attach (optional but useful)

  4. Evidence Builder creates:

    • reachability proof
    • boundary proof
    • vex doc (or imports vendor VEX + adds your context)
  5. Policy engine evaluates → emits “decision attestation”

  6. UI shows explainable triage + “approve” gating


4) Data model (the exact objects you need)

Core IDs you should standardize

  • subject_digest: sha256:<image digest>
  • subject_name: registry/org/app
  • finding_key: (subject_digest, detector, cve, component_purl, location) → stable hash
  • component_purl: package URL (PURL), the canonical component identifier
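Here is a minimal sketch of the `finding_key` stable hash. It assumes SHA-256 over the five fields joined with a NUL separator, chosen so that adjacent fields cannot run together. It also upper-cases the CVE ID as its only normalization. Both choices are assumptions, not a defined Stella Ops format.

```python
import hashlib


def finding_key(subject_digest: str, detector: str, cve: str,
                component_purl: str, location: str) -> str:
    """Deterministic key for (subject_digest, detector, cve, component_purl, location)."""
    # NUL is assumed never to occur inside a field, so field boundaries stay unambiguous.
    fields = (subject_digest, detector, cve.upper(), component_purl, location)
    return "sha256:" + hashlib.sha256("\x00".join(fields).encode("utf-8")).hexdigest()
```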

Tables (Postgres suggested)

artifacts

  • id (uuid)
  • name
  • digest (unique)
  • created_at

findings

  • id (uuid)
  • artifact_digest
  • cve
  • component_purl
  • severity
  • raw_score
  • risk_score
  • status (open/triaged/accepted/fixed)
  • first_seen, last_seen

evidence

  • id (uuid)
  • finding_id
  • kind (reachable_path | boundary | score_explain | vex | ...)
  • payload_json (jsonb, small)
  • blob_ref (content-addressed URI for big payloads)
  • last_seen
  • expires_at
  • confidence (0-1)
  • source_attestation_digest (nullable)

attestations

  • id (uuid)
  • artifact_digest
  • predicate_type
  • attestation_digest (sha256 of DSSE envelope)
  • signer_identity (OIDC subject / cert identity)
  • issued_at
  • registry_ref (where attached)

approvals

  • id (uuid)
  • artifact_digest
  • decision_attestation_digest
  • approver
  • approved_at
  • expires_at
  • reason

5) Explainable triage: how to compute the “Path + Boundary + VEX + Last-seen”

5.1 Reachable path proof (call chain / flow)

You need a uniform reachability result type:

  • reachable = true with an explicit path
  • reachable = false with justification (e.g., symbol absent, dead code)
  • reachable = unknown with reason (insufficient symbols, dynamic dispatch)
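The tri-state result type can be enforced at construction time. That way, a "reachable" result without a path, or a non-reachable result without a reason, can never be stored. The sketch below uses Python dataclasses, and the `Reachability` and `ReachabilityResult` names are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Reachability(Enum):
    REACHABLE = "reachable"
    NOT_REACHABLE = "not_reachable"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class ReachabilityResult:
    result: Reachability
    path: List[str] = field(default_factory=list)  # required when reachable
    reason: Optional[str] = None  # justification (not reachable) or reason (unknown)

    def __post_init__(self) -> None:
        if self.result is Reachability.REACHABLE and not self.path:
            raise ValueError("reachable requires an explicit path")
        if self.result is not Reachability.REACHABLE and not self.reason:
            raise ValueError(f"{self.result.value} requires a justification or reason")
```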

Implementation strategy

  1. Symbol mapping: map CVE → vulnerable symbols/functions/classes

    • Use one or more:

      • vendor advisory → patched functions
      • diff mining (commit that fixes CVE) to extract changed symbols
      • curated mapping in your DB for high-volume CVEs
  2. Program graph extraction at build time:

    • Produce a call graph or dependency graph per language.
    • Store as compact adjacency list (or protobuf) keyed by subject_digest.
  3. Entrypoint discovery:

    • HTTP routes (framework metadata)
    • gRPC service methods
    • queue/stream consumers
    • cron/CLI handlers
  4. Path search:

    • BFS/DFS from entrypoints to vulnerable symbols.
    • Record the shortest path + top-K alternatives.
  5. Proof bundle:

    • path nodes with stable IDs
    • file hashes + line ranges (no raw source required)
    • tool version + config hash
    • graph digest
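Step 4 can be sketched as a breadth-first search over the adjacency list from each entrypoint. The search stops at the first vulnerable symbol it reaches, and BFS guarantees that first hit is a shortest path in edge count. This is a simplification: "top-K alternatives" here means the K shortest among the per-entrypoint shortest paths. It is not the K shortest paths overall, which would need something like Yen's algorithm. Function names and the string node IDs are illustrative.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set


def shortest_path(graph: Dict[str, List[str]], start: str,
                  targets: Set[str]) -> Optional[List[str]]:
    """BFS from one entrypoint; returns the first (hence shortest) path to any target."""
    parent: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node in targets:
            path = []
            cur: Optional[str] = node
            while cur is not None:  # walk parents back to the entrypoint
                path.append(cur)
                cur = parent[cur]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in parent:  # first visit arrives via a shortest route
                parent[nxt] = node
                queue.append(nxt)
    return None


def find_paths(graph: Dict[str, List[str]], entrypoints: Iterable[str],
               vulnerable: Set[str], k: int = 3) -> List[List[str]]:
    """Shortest path per entrypoint, then keep the K shortest overall."""
    found = [shortest_path(graph, e, vulnerable) for e in entrypoints]
    return sorted((p for p in found if p), key=len)[:k]
```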

Reachability evidence JSON (UI-friendly)

{
  "kind": "reachable_path",
  "result": "reachable",
  "confidence": 0.86,
  "entrypoints": [
    {"type":"http","route":"POST /billing/charge","auth":"jwt:payments:write"}
  ],
  "paths": [{
    "path_id": "p-1",
    "steps": [
      {"node":"BillingController.Pay","file_hash":"sha256:aaa","lines":[41,88]},
      {"node":"StripeClient.Create","file_hash":"sha256:bbb","lines":[10,52]},
      {"node":"stripe-sdk.vulnFn","symbol":"stripe-sdk::parseWebhook","evidence":"symbol-match"}
    ]
  }],
  "graph": {"digest":"sha256:callgraph...", "format":"stella-callgraph-v1"},
  "last_seen": "2025-12-18T09:22:00Z",
  "expires_at": "2025-12-25T09:22:00Z"
}

UI rule: never show “reachable” without a concrete, replayable path ID.


5.2 Boundary proof (the “why this is exposed” part)

Boundary proof answers: “Even if reachable, who can trigger it?”

Data sources

  • Kubernetes ingress/service (exposure)
  • API gateway routes and auth policies
  • service mesh auth (mTLS, JWT)
  • IAM policies (for cloud events)
  • network policies (deny/allow)

Boundary evidence schema

{
  "kind": "boundary",
  "surface": {"type":"http","route":"POST /billing/charge"},
  "exposure": {"internet": true, "ports":[443]},
  "auth": {
    "mechanism":"jwt",
    "required_scopes":["payments:write"],
    "audience":"billing-api"
  },
  "rate_limits": {"enabled": true, "rps": 20},
  "controls": [
    {"type":"waf","status":"enabled"},
    {"type":"input_validation","status":"enabled","location":"BillingController.Pay"}
  ],
  "last_seen": "2025-12-18T09:22:00Z",
  "confidence": 0.74
}

How to build it

  • Create a “Surface Extractor” plugin per environment:

    • k8s-extractor: reads ingress + service + annotations
    • gateway-extractor: reads API gateway config
    • iac-extractor: parses Terraform/CloudFormation
  • Normalize into the schema above.
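Here is a sketch of the `k8s-extractor` idea for a single Ingress object, already parsed from YAML into a dict with the `networking.k8s.io/v1` field layout. It assumes every Ingress is internet-facing and derives the port from the `tls` hosts. A real extractor would also consult the ingress class, load-balancer annotations, and NetworkPolicies. The extra `backend` field is a hypothetical addition to the boundary schema.

```python
from typing import Dict, List


def extract_ingress_surfaces(ingress: Dict) -> List[Dict]:
    """Normalize one networking.k8s.io/v1 Ingress into boundary 'surface' records."""
    spec = ingress.get("spec", {})
    tls_hosts = {h for t in spec.get("tls", []) for h in t.get("hosts", [])}
    surfaces = []
    for rule in spec.get("rules", []):
        host = rule.get("host", "*")  # a rule without a host matches all hosts
        for p in rule.get("http", {}).get("paths", []):
            service = p.get("backend", {}).get("service", {})
            surfaces.append({
                "kind": "boundary",
                "surface": {"type": "http", "host": host, "route": p.get("path", "/")},
                # Assumption: every Ingress is internet-facing; TLS hosts are served on 443.
                "exposure": {"internet": True, "ports": [443] if host in tls_hosts else [80]},
                "backend": service.get("name"),
            })
    return surfaces
```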


5.3 VEX in Stella: statuses + justifications

VEX statuses you should support in UI:

  • Not affected
  • Affected
  • Fixed
  • Under investigation (NTIA)

OpenVEX will carry the machine-readable structure. (GitHub)

Practical approach

  • Treat VEX as the decision record for exploitability.
  • Your policy can require VEX coverage for all “reachable” high-severity vulns.

Rule of thumb

  • If reachable=true AND the boundary shows an exposed surface with weak auth → VEX defaults to affected until mitigations are proven.
  • If reachable=false with high confidence and stable proof → VEX may be not_affected.
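The rule of thumb translates into a small default function. Several parts of this sketch are assumptions: the 0.8 confidence cut-off, the `under_investigation` fallback, and the use of `None` for unknown reachability. The result is only a suggested default that a human or an imported vendor VEX can override. OpenVEX also expects a justification alongside `not_affected`.

```python
from typing import Optional


def default_vex_status(reachable: Optional[bool], confidence: float, exposed: bool,
                       weak_auth: bool, mitigations_proven: bool,
                       min_confidence: float = 0.8) -> str:
    """Suggested default VEX status; reachable=None means 'unknown'."""
    if reachable is True and exposed and weak_auth and not mitigations_proven:
        return "affected"
    if reachable is False and confidence >= min_confidence:
        return "not_affected"
    return "under_investigation"
```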

5.4 Explainable risk score (don't hide the formula)

Make score explainability first-class.

Recommended implementation

  • Store risk score as an additive model:

    • base = CVSS normalized
    • + reachability_bonus
    • + exposure_bonus
    • + privilege_bonus
    • - mitigation_discount
  • Emit a score_explain evidence object:

{
  "kind": "score_explain",
  "risk_score": 72,
  "contributions": [
    {"factor":"cvss","value":41,"reason":"CVSS 9.8"},
    {"factor":"reachability","value":18,"reason":"reachable path p-1"},
    {"factor":"exposure","value":10,"reason":"internet-facing route"},
    {"factor":"auth","value":3,"reason":"scope required lowers impact"}
  ],
  "last_seen":"2025-12-18T09:22:00Z"
}

UI rule: “Score 72” must always be clickable to a stable breakdown.
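Because the model is additive, the breakdown can be generated from the same data that produces the score, so the two cannot disagree. This is a minimal sketch, and two choices in it are assumptions: the clamp to 0-100, and ordering factors by absolute impact for display.

```python
from typing import Dict, List


def explain_score(contributions: List[Dict]) -> Dict:
    """Additive risk score: the total is the sum of the listed factors, clamped to 0-100."""
    total = sum(c["value"] for c in contributions)
    return {
        "kind": "score_explain",
        "risk_score": max(0, min(100, total)),
        # Largest absolute impact first, so the UI breakdown reads top-down.
        "contributions": sorted(contributions, key=lambda c: abs(c["value"]), reverse=True),
    }
```

Feeding it the four contributions from the `score_explain` example (41 + 18 + 10 + 3) yields `risk_score` 72.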


6) The UI you should build (components + interaction rules)

6.1 Findings list row (collapsed)

Show only what helps scanning:

  • Score badge
  • CVE + component
  • service
  • reachability chip: Reachable / Not reachable / Unknown
  • VEX chip
  • last_seen indicator (green/yellow/red)

6.2 Evidence drawer (expanded)

Tabs:

  1. Path

    • show entrypoint(s)
    • render call chain (simple list first; graph view optional)
  2. Boundary

    • exposure, auth, controls
  3. VEX

    • status + justification + issuer identity
  4. Score

    • breakdown bar/list
  5. Proof

    • attestation chain viewer (SBOM → VEX → Decision)
    • “Verify locally” action

6.3 “Open proof bundle” viewer

Must display:

  • subject digest
  • signer identity
  • predicate type
  • digest of proof bundle
  • last_seen + tool versions

This is where trust is built: responders can see that the evidence is signed, tied to the artifact, and recent.


7) Proof-linked evidence: how to generate and attach attestations

7.1 Statement format: in-toto Attestation Framework

in-toto's model is:

  • Subjects (the artifact digests)
  • Predicate type (schema ID)
  • Predicate (your actual data) (GitHub)

7.2 DSSE envelope

Wrap statements using DSSE so payload type is signed too. (GitHub)

7.3 Attach to OCI image via referrers

OCI “subject/referrers” makes attestations discoverable from the image digest. (opencontainers.org) ORAS provides the operational model (“attach artifacts to an image”). (Microsoft Learn)

7.4 Practical signing: cosign attest + verify

Cosign has built-in in-toto attestation support and can sign custom predicates. (Sigstore)

Typical patterns (example only; adapt to your environment):

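# Attach an attestation. cosign's --type takes a built-in name or a URI, so the
# stella/reachability@v1 type is written here as a hypothetical URI.
cosign attest --predicate reachability.json \
  --type https://stella-ops.org/attestations/reachability/v1 \
  <image@sha256:digest>

# Verify the attestation (add --key, or --certificate-identity and
# --certificate-oidc-issuer for keyless, to pin the expected signer)
cosign verify-attestation --type https://stella-ops.org/attestations/reachability/v1 \
  <image@sha256:digest>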

(Use keyless OIDC or KMS keys depending on your org.)


8) Define your predicate types (this is the “contract” Stella enforces)

You'll want at least these predicate types:

  1. stella/sbom@v1

    • embeds CycloneDX/SPDX (or references blob digest)
  2. stella/vex@v1

    • embeds OpenVEX document or references it (GitHub)
  3. stella/reachability@v1

    • the reachability evidence above
    • includes graph.digest, paths, confidence, expires_at
  4. stella/boundary@v1

    • exposure/auth proof and last_seen
  5. stella/policy-decision@v1

    • the gating result, references all input attestation digests
  6. Optional: stella/human-approval@v1

    • “I approve deploy of subject digest X based on decision attestation Y”
    • keep it time-boxed

9) The policy gate (how approvals become proof-linked)

9.1 Use OPA/Rego for the gate

OPA policies are written in Rego. (openpolicyagent.org)

Gate input should be a single JSON document assembled from verified attestations:

{
  "subject": {"name":"registry/org/app","digest":"sha256:..."},
  "risk_score": 61,
  "sbom": {...},
  "vex": {...},
  "reachability": {...},
  "boundary": {...},
  "org_policy": {"max_risk": 75, "max_age_hours": 168}
}
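Here is a sketch of assembling that single input document from statements that have already passed verification, keyed by predicate type. The `PREDICATE_KEYS` mapping and the `build_gate_input` name are hypothetical. The function places `risk_score` at the top level because the Rego example in this section reads `input.risk_score`. It also rejects duplicate evidence of one kind rather than guessing which copy to use.

```python
from typing import Dict, Iterable

# Hypothetical mapping from predicate type to its key in the gate input document.
PREDICATE_KEYS: Dict[str, str] = {
    "stella/sbom@v1": "sbom",
    "stella/vex@v1": "vex",
    "stella/reachability@v1": "reachability",
    "stella/boundary@v1": "boundary",
}


def build_gate_input(subject_name: str, subject_digest: str,
                     verified_statements: Iterable[dict],
                     org_policy: dict, risk_score: int) -> dict:
    """Assemble one OPA input document from already-verified in-toto statements."""
    doc = {
        "subject": {"name": subject_name, "digest": subject_digest},
        "risk_score": risk_score,
        "org_policy": org_policy,
    }
    for st in verified_statements:
        key = PREDICATE_KEYS.get(st["predicateType"])
        if key is None:
            continue  # predicate types the gate does not consume
        if key in doc:
            raise ValueError(f"duplicate {key} evidence for {subject_digest}")
        doc[key] = st["predicate"]
    return doc
```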

Example Rego (deny-by-default)

package stella.gate

# OPA 1.x syntax (rule bodies need "if", partial sets use "contains");
# import rego.v1 keeps this working on late 0.x releases too.
import rego.v1

default allow := false

# stale if the reachability proof is past its expires_at
stale_evidence if {
  time.now_ns() > time.parse_rfc3339_ns(input.reachability.expires_at)
}

# high/critical reachable vulns whose VEX status does not resolve them
unresolved_reachable contains v if {
  some v in input.reachability.findings
  v.severity in {"critical", "high"}
  v.reachable == true
  not input.vex.resolution[v.cve] in {"not_affected", "fixed"}
}

# deny by default; allow only when every condition holds
allow if {
  input.risk_score <= input.org_policy.max_risk
  not stale_evidence
  count(unresolved_reachable) == 0
}

9.2 Emit a signed policy decision attestation

When OPA returns allow=true, emit another attestation:

  • predicate includes the policy version/hash and all input refs.
  • that's what the UI “Approve” button targets.

This is the “evidence-linked approval”: approval references the signed decision, and the decision references the signed evidence.


10) “Approve” button behavior (what Stella Ops should enforce)

Disabled until…

  • subject digest known
  • SBOM attestation found + signature verified
  • VEX attestation found + signature verified
  • Decision attestation found + signature verified
  • The decision's input digests match the actual retrieved evidence

When clicked…

  1. Stella Ops creates a stella/human-approval@v1 statement:

    • subject = artifact digest
    • predicate.decision_ref = decision attestation digest
    • predicate.expires_at = short TTL (e.g., 7-30 days)
  2. Signs it with the approver identity

  3. Attaches it to the artifact (OCI referrer)
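Here is a sketch of the unsigned `stella/human-approval@v1` statement from step 1, with the TTL enforced when the statement is built. The 1-30 day bound is an assumption. The `approver` and `approved_at` predicate fields are hypothetical additions that support the audit view. Signing and attaching (steps 2 and 3) would reuse the normal attestation path.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def _rfc3339(ts: datetime) -> str:
    return ts.astimezone(timezone.utc).isoformat().replace("+00:00", "Z")


def human_approval_statement(subject_name: str, image_sha256: str, decision_digest: str,
                             approver: str, ttl_days: int = 7,
                             now: Optional[datetime] = None) -> dict:
    """Unsigned stella/human-approval@v1 statement bound to one artifact and one decision."""
    if not 1 <= ttl_days <= 30:
        raise ValueError("approvals must be time-boxed to 1-30 days")
    now = now or datetime.now(timezone.utc)
    return {
        "_type": "https://in-toto.io/Statement/v1",
        "subject": [{"name": subject_name, "digest": {"sha256": image_sha256}}],
        "predicateType": "stella/human-approval@v1",
        "predicate": {
            "decision_ref": decision_digest,
            "approver": approver,
            "approved_at": _rfc3339(now),
            "expires_at": _rfc3339(now + timedelta(days=ttl_days)),
        },
    }
```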

Audit view must show

  • approver identity
  • exact artifact digest
  • exact decision attestation digest
  • timestamp and expiry

11) Implementation details that matter in production

11.1 Verification library (shared by UI backend + CI gate)

Write one verifier module used everywhere:

Inputs

  • image digest
  • expected predicate types
  • trust policy (allowed identities/issuers, keyless rules, KMS keys)

Steps

  1. Discover referrers for image@sha256:…
  2. Filter by predicateType
  3. Verify DSSE + signature + identity
  4. Validate JSON schema for predicate
  5. Check subject.digest matches image digest
  6. Return “verified evidence set” + “errors”
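Here is a minimal sketch of steps 2-6. It assumes referrer discovery (step 1) has already returned the DSSE envelopes. Signature and identity checks against the trust policy are passed in as a callback (`verify_signature`), since they depend on cosign, KMS, or keyless tooling. The `REQUIRED_KEYS` table is a stand-in for real JSON Schema validation, and all names are hypothetical.

```python
import base64
import json
from typing import Callable, Dict, Iterable, List, Set, Tuple

IN_TOTO_PAYLOAD_TYPE = "application/vnd.in-toto+json"

# Stand-in for JSON Schema validation: required predicate keys per type (hypothetical).
REQUIRED_KEYS: Dict[str, Tuple[str, ...]] = {
    "stella/reachability@v1": ("result", "graph", "expires_at"),
    "stella/policy-decision@v1": ("inputs", "result"),
}


def verify_evidence(image_digest: str, envelopes: Iterable[dict], expected_types: Set[str],
                    verify_signature: Callable[[dict], bool]) -> Tuple[List[dict], List[str]]:
    """Return (verified in-toto statements, errors) for one image digest."""
    verified: List[dict] = []
    errors: List[str] = []
    algo, _, hex_digest = image_digest.partition(":")
    for env in envelopes:
        if env.get("payloadType") != IN_TOTO_PAYLOAD_TYPE:
            continue
        # Decode the statement from the payload bytes that the signature covers.
        st = json.loads(base64.b64decode(env["payload"]))
        ptype = st.get("predicateType")
        if ptype not in expected_types:  # step 2: filter by predicateType
            continue
        if not verify_signature(env):  # step 3: DSSE signature + signer identity
            errors.append(f"{ptype}: signature or identity check failed")
            continue
        missing = [k for k in REQUIRED_KEYS.get(ptype, ()) if k not in st.get("predicate", {})]
        if missing:  # step 4: predicate schema
            errors.append(f"{ptype}: predicate missing {missing}")
            continue
        subjects = st.get("subject", [])
        if not any(s.get("digest", {}).get(algo) == hex_digest for s in subjects):
            errors.append(f"{ptype}: subject does not match {image_digest}")  # step 5
            continue
        verified.append(st)
    found = {st["predicateType"] for st in verified}
    errors += [f"{t}: no verified attestation" for t in sorted(expected_types - found)]
    return verified, errors
```

The statement is decoded from the signed payload bytes rather than accepted as a separate input, so what gets checked is exactly what was signed.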

11.2 Evidence privacy

Reachability proofs can leak implementation details.

  • Store file hashes, symbol names, and line ranges
  • Gate raw source behind elevated permissions
  • Provide redacted proofs by default

11.3 Evidence TTL strategy

  • SBOM: long TTL (weeks/months) if digest immutable
  • Boundary: short TTL (hours/days) because env changes
  • Reachability: medium TTL (days/weeks) depending on code churn
  • VEX: must be renewed if boundary/reachability changes
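The TTL strategy can be captured as a per-kind table plus a renewal check for VEX. The durations below are placeholder assumptions. The 7-day reachability TTL happens to match the `last_seen`/`expires_at` pair in the reachability example. Detecting "changes" by comparing evidence digests rather than timestamps is a design choice: re-observing identical evidence should not force a VEX renewal.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional

# Placeholder TTLs per evidence kind (assumptions; tune per organization).
DEFAULT_TTL: Dict[str, timedelta] = {
    "sbom": timedelta(days=90),         # long: the digest is immutable
    "reachability": timedelta(days=7),  # medium: depends on code churn
    "boundary": timedelta(hours=24),    # short: the environment changes
}


def expires_at(kind: str, last_seen: datetime) -> datetime:
    return last_seen + DEFAULT_TTL[kind]


def vex_needs_renewal(vex_basis: Dict[str, Optional[str]],
                      current: Dict[str, Optional[str]]) -> bool:
    """True if the boundary/reachability evidence digests behind the VEX have changed."""
    return any(vex_basis.get(k) != current.get(k) for k in ("boundary", "reachability"))
```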

11.4 Handling “Unknown reachability”

Don't force false certainty.

  • Mark as unknown and show why (missing symbols, dynamic reflection, stripped binaries)
  • Policy can treat unknown as “reachable” for critical CVEs in internet-facing services.

12) A concrete MVP path that still delivers value

If you want a minimal but real first release:

MVP (2-3 deliverables)

  1. Evidence drawer fed by:

    • scanner output + SBOM + a simple “entrypoint map”
  2. VEX workflow

    • allow engineers to set VEX status + justification
  3. Signed decision gating

    • even if reachability is heuristic, the chain is real

Then iterate:

  • add reachability graphs
  • add boundary extraction from IaC/K8s
  • tighten policy (staleness, confidence thresholds)

13) Quick checklist for “done enough to trust”

  • Every finding expands to: Path, Boundary, VEX, Score, Proof
  • Every evidence tab shows last_seen + confidence
  • “Verify chain” works: SBOM → VEX → Decision all signed and bound to the artifact digest
  • Approve button signs a human approval attestation tied to the decision digest
  • CI gate verifies the same chain before deploy

If you want, I can also drop in:

  • a full set of JSON Schemas for stella/*@v1 predicates,
  • a reference verifier implementation outline in .NET 10 (Minimal API + a verifier class),
  • and a sample UI component tree (React) that renders path/boundary graphs and attestation chains.