master 53503cb407 Add reference architecture and testing strategy documentation

- Created a new document for the Stella Ops Reference Architecture outlining the system's topology, trust boundaries, artifact association, and interfaces.
- Developed a comprehensive Testing Strategy document detailing the importance of offline readiness, interoperability, determinism, and operational guardrails.
- Introduced a README for the Testing Strategy, summarizing processing details and key concepts implemented.
- Added guidance for AI agents and developers in the tests directory, including directory structure, test categories, key patterns, and rules for test development.

2025-12-22 07:59:30 +02:00

Here’s a practical, first‑time‑friendly blueprint for making your security workflow both explainable and provable—from triage to approval.

Explainable triage UX (what & why)

Show every risk score with the minimum evidence a responder needs to trust it:

Reachable path: the concrete call‑chain (or network path) proving the vuln is actually hit.
Entrypoint boundary: the external surface (HTTP route, CLI verb, cron, message topic) that leads to that path.
VEX status: the exploitability decision (Affected/Not Affected/Under Investigation/Fixed) with rationale.
Last‑seen timestamp: when this evidence was last observed/generated.

UI pattern (compact, 1‑click expand)

Row (collapsed): Score 72 • CVE‑2024‑12345 • service: api-gateway • package: x.y.z
Expand panel (evidence):
- Path: POST /billing/charge → BillingController.Pay() → StripeClient.Create()
- Boundary: Ingress: /billing/charge (JWT: required, scope: payments:write)
- VEX: Not Affected (runtime guard strips untrusted input before sink)
- Last seen: 2025‑12‑18T09:22Z (scan: sbomer#c1a2, policy run: lattice#9f0d)
- Actions: “Open proof bundle”, “Re-run check”, “Create exception (time‑boxed)”

Data contract (what the panel needs)

{
  "finding_id": "f-7b3c",
  "cve": "CVE-2024-12345",
  "component": {"name": "stripe-sdk", "version": "6.1.2"},
  "reachable_path": [
    "HTTP POST /billing/charge",
    "BillingController.Pay",
    "StripeClient.Create"
  ],
  "entrypoint": {"type":"http","route":"/billing/charge","auth":"jwt:payments:write"},
  "vex": {"status":"not_affected","justification":"runtime_sanitizer_blocks_sink","timestamp":"2025-12-18T09:22:00Z"},
  "last_seen":"2025-12-18T09:22:00Z",
  "attestation_refs": ["sha256:…sbom", "sha256:…vex", "sha256:…policy"]
}

Evidence‑linked approvals (what & why)

Make “Approve to ship” contingent on verifiable proof, not screenshots:

Chain must exist and be machine‑verifiable: SBOM → VEX → policy decision.
Use in‑toto/DSSE attestations or SLSA provenance so each link has a signature, subject digest, and predicate.
Gate merges/deploys only when the chain validates.

Pipeline gate (simple policy)

Require:
1. SBOM attestation referencing the exact image digest
2. VEX attestation covering all listed components (or explicit allow‑gaps)
3. Policy decision attestation (e.g., “risk ≤ threshold AND all reachable vulns = Not Affected/Fixed”)

Minimal decision attestation (DSSE envelope → JSON payload)

{
  "predicateType": "stella/policy-decision@v1",
  "subject": [{"name":"registry/org/app","digest":{"sha256":"<image-digest>"}}],
  "predicate": {
    "policy": "risk_threshold<=75 && reachable_vulns.all(v => v.vex in ['not_affected','fixed'])",
    "inputs": {
      "sbom_ref": "sha256:<sbom>",
      "vex_ref": "sha256:<vex>"
    },
    "result": {"allowed": true, "score": 61, "exemptions":[]},
    "evidence_refs": ["sha256:<reachability-proof-bundle>"],
    "run_at": "2025-12-18T09:23:11Z"
  }
}

How this lands in your product (concrete moves)

Backend: add /findings/:id/evidence (returns the contract above) + /approvals/:artifact/attestations.
Storage: keep proof bundles (graphs, call stacks, logs) as content‑addressed blobs; store DSSE envelopes alongside.
UI: one list → expandable rows; chips for VEX status; “Open proof” shows the call graph and boundary in 1 view.
CLI/API: stella verify image:<digest> --require sbom,vex,decision returns a signed summary and exits non‑zero when any required attestation is missing or invalid, so pipelines fail on it.
Metrics:
- % changes with complete attestations (target ≥95%)
- TTFE (time‑to‑first‑evidence) from alert → panel open (target ≤30s)
- Post‑deploy reversions due to missing proof (trend to zero)

Starter acceptance checklist

Every risk row expands to path, boundary, VEX, last‑seen in <300 ms.
“Approve” button disabled until SBOM+VEX+Decision attestations validate for the exact artifact digest.
One‑click “Show DSSE chain” renders the three envelopes with subject digests and signers.
Audit log captures who approved, which digests, and which evidence hashes.

If you want, I can turn this into ready‑to‑drop .NET 10 endpoints + a small React panel with mocked data so your team can wire it up fast.

Below is a “build‑it” guide for Stella Ops that goes past the concept level: concrete services, schemas, pipelines, signing/storage choices, UI components, and the exact invariants you should enforce so triage is explainable and approvals are provably evidence‑linked.

1) Start with the invariants (the rules your system must never violate)

If you implement nothing else, implement these invariants—they’re what make the UX trustworthy and the approvals auditable.

Artifact anchoring invariant

Every finding, every piece of evidence, and every approval must be anchored to an immutable subject digest (e.g., container image digest sha256:…, binary SHA, or SBOM digest).

No “latest tag” approvals.
No “approve commit” without mapping to the built artifact digest.
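One way to enforce the anchoring rule at the API edge is to reject any reference that is not pinned to a digest. The sketch below is illustrative only: the function name and the choice to accept sha256 digests exclusively are assumptions, not part of Stella Ops.

```python
import re

# Assumption: only "name@sha256:<64 hex chars>" counts as an immutable reference.
_DIGEST_REF = re.compile(r"^(?P<name>[^@\s]+)@sha256:(?P<hex>[0-9a-f]{64})$")


def parse_anchored_ref(ref: str) -> tuple[str, str]:
    """Return (name, digest) for a digest-pinned ref; reject tag-only refs."""
    match = _DIGEST_REF.match(ref)
    if match is None:
        raise ValueError(f"not anchored to an immutable digest: {ref!r}")
    return match.group("name"), "sha256:" + match.group("hex")
```

A reference such as registry/org/app:latest raises ValueError, so it can never reach the approval path.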

Evidence closure invariant

A policy decision is only valid if it references exactly the evidence it used:

inputs.sbom_ref
inputs.vex_ref
inputs.reachability_ref (optional but recommended)
inputs.scan_ref (optional)
and any config/IaC refs used for boundary/exposure.
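The closure check can be sketched as a comparison between the refs a decision claims and the digests actually retrieved. This is a minimal sketch; the function name and the flat name-to-digest dictionaries are hypothetical shapes chosen for illustration.

```python
def closure_violations(decision_inputs: dict[str, str],
                       retrieved: dict[str, str]) -> list[str]:
    """List every mismatch between claimed input refs and retrieved evidence."""
    problems = []
    for name, claimed in sorted(decision_inputs.items()):
        actual = retrieved.get(name)
        if actual is None:
            problems.append(f"{name}: referenced but not retrieved")
        elif actual != claimed:
            problems.append(f"{name}: decision used {claimed}, retrieved {actual}")
    # "Exactly" cuts both ways: evidence the decision never cited is also a violation.
    for name in sorted(set(retrieved) - set(decision_inputs)):
        problems.append(f"{name}: retrieved but not referenced by the decision")
    return problems
```

An empty list means the decision is closed over its evidence; anything else should invalidate the decision.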

Signature chain invariant

Evidence is only admissible if it is:

structured (machine readable),
signed (DSSE/in‑toto),
verifiable (trusted identity/keys),
retrievable by digest.

DSSE is specifically designed to authenticate both the message and its type (payload type) and avoid canonicalization pitfalls. (GitHub)
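To make the "message and type" point concrete, here is a minimal sketch of DSSE's pre-authentication encoding (PAE), which is the byte string that actually gets signed. The HMAC signer is a stand-in chosen so the example runs without key infrastructure; a real deployment would sign with an asymmetric key (for example via cosign or a KMS). The function names are illustrative.

```python
import base64
import hashlib
import hmac


def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE PAE: 'DSSEv1' SP len(type) SP type SP len(payload) SP payload."""
    type_bytes = payload_type.encode("utf-8")
    return b"DSSEv1 %d %b %d %b" % (len(type_bytes), type_bytes, len(payload), payload)


def sign_envelope(payload_type: str, payload: bytes, key: bytes, keyid: str) -> dict:
    # Stand-in signer (HMAC-SHA256), for illustration only.
    sig = hmac.new(key, pae(payload_type, payload), hashlib.sha256).digest()
    return {
        "payloadType": payload_type,
        "payload": base64.b64encode(payload).decode("ascii"),
        "signatures": [{"keyid": keyid, "sig": base64.b64encode(sig).decode("ascii")}],
    }


def verify_envelope(envelope: dict, key: bytes) -> bool:
    payload = base64.b64decode(envelope["payload"])
    expected = hmac.new(key, pae(envelope["payloadType"], payload), hashlib.sha256).digest()
    return any(
        hmac.compare_digest(expected, base64.b64decode(s["sig"]))
        for s in envelope["signatures"]
    )
```

Because the payload type is inside the signed bytes, swapping payloadType on a valid envelope makes verification fail, and the raw payload bytes are signed as-is, so no JSON canonicalization is needed.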

Staleness invariant

Evidence must have:

last_seen and expires_at (or TTL),
a “stale evidence” behavior in policy (deny or degrade score).
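A staleness check along these lines could back both behaviors. This is a sketch under stated assumptions: evidence with no expires_at is treated as stale (fail closed), and "degrade" is modeled as adding a fixed penalty to the risk score; neither choice comes from the text above.

```python
from datetime import datetime, timezone


def parse_ts(value: str) -> datetime:
    # fromisoformat() only accepts a trailing "Z" on Python 3.11+, so normalize it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def is_stale(evidence: dict, now: datetime) -> bool:
    expires_at = evidence.get("expires_at")
    if expires_at is None:
        return True  # assumption: missing expiry fails closed
    return now >= parse_ts(expires_at)


def apply_staleness(score: int, evidence: dict, now: datetime,
                    mode: str = "deny", penalty: int = 10) -> tuple[bool, int]:
    """Return (allowed, score) under a 'deny' or 'degrade' stale-evidence policy."""
    if not is_stale(evidence, now):
        return True, score
    if mode == "deny":
        return False, score
    return True, score + penalty  # hypothetical penalty: higher score = more risk
```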

2) Choose the canonical formats and where you’ll store “proof”

Attestation envelope: DSSE + in‑toto Statement

Use:

in‑toto Attestation Framework “Statement” as the payload model (“subject + predicateType + predicate”). (GitHub)
Wrap it in DSSE for signing. (GitHub)
If you use Sigstore bundles, the DSSE envelope is expected to carry an in‑toto statement and uses payloadType like application/vnd.in-toto+json. (Sigstore)

SBOM format: CycloneDX or SPDX

SPDX is an ISO/IEC standard and has v3.0 and v2.3 lines in the ecosystem. (spdx.dev)
CycloneDX is an ECMA standard (ECMA‑424) and widely used for application security contexts. (GitHub)

Pick one as your canonical (internally), but ingest both.

VEX format: OpenVEX (practical) + map to “classic” VEX statuses

VEX’s value is triage noise reduction: vendors can assert whether a product is affected, fixed, under investigation, or not affected. (NTIA) OpenVEX is a minimal, embeddable implementation of VEX intended for interoperability. (GitHub)

Where to store proof: OCI registry referrers

Use OCI “subject/referrers” so proofs travel with the artifact:

OCI 1.1 introduces an explicit subject field and referrers graph for signatures/attestations/SBOMs. (opencontainers.org)
ORAS documentation explains linking artifacts via subject. (Oras)
Microsoft docs show oras attach … --artifact-type … patterns (works across registries that support referrers). (Microsoft Learn)

3) System architecture (services + data flow)

Services (minimum set)

Ingestor
- Pulls scanner outputs (SCA/SAST/IaC), SBOM, runtime signals.
Evidence Builder
- Computes reachability, entrypoints, boundary/auth context, score explanation.
Attestation Service
- Creates in‑toto statements, wraps DSSE, signs (cosign/KMS), stores to registry.
Policy Engine
- Evaluates allow/deny + reason codes, emits signed decision attestation.
- Use OPA/Rego for maintainable declarative policies. (openpolicyagent.org)
Stella Ops API
- Serves findings + evidence panels to the UI (fast, cached).
UI
- Explainable triage panel + chain viewer + approve button.

Event flow (artifact‑centric)

Build produces image@sha256:X
Generate SBOM → sign + attach
Run vuln scan → sign + attach (optional but useful)
Evidence Builder creates:
- reachability proof
- boundary proof
- vex doc (or imports vendor VEX + adds your context)
Policy engine evaluates → emits “decision attestation”
UI shows explainable triage + “approve” gating

4) Data model (the exact objects you need)

Core IDs you should standardize

subject_digest: sha256:<image digest>
subject_name: registry/org/app
finding_key: stable hash of (subject_digest, detector, cve, component_purl, location)
component_purl: package URL (PURL) canonical component identifier

Tables (Postgres suggested)

artifacts

id (uuid)
name
digest (unique)
created_at

findings

id (uuid)
artifact_digest
cve
component_purl
severity
raw_score
risk_score
status (open/triaged/accepted/fixed)
first_seen, last_seen

evidence

id (uuid)
finding_id
kind (reachable_path | boundary | score_explain | vex | ...)
payload_json (jsonb, small)
blob_ref (content-addressed URI for big payloads)
last_seen
expires_at
confidence (0–1)
source_attestation_digest (nullable)

attestations

id (uuid)
artifact_digest
predicate_type
attestation_digest (sha256 of DSSE envelope)
signer_identity (OIDC subject / cert identity)
issued_at
registry_ref (where attached)

approvals

id (uuid)
artifact_digest
decision_attestation_digest
approver
approved_at
expires_at
reason

5) Explainable triage: how to compute the “Path + Boundary + VEX + Last‑seen”

5.1 Reachable path proof (call chain / flow)

You need a uniform reachability result type:

reachable = true with an explicit path
reachable = false with justification (e.g., symbol absent, dead code)
reachable = unknown with reason (insufficient symbols, dynamic dispatch)

Implementation strategy

Symbol mapping: map CVE → vulnerable symbols/functions/classes
- Use one or more:
  - vendor advisory → patched functions
  - diff mining (commit that fixes CVE) to extract changed symbols
  - curated mapping in your DB for high volume CVEs
Program graph extraction at build time:
- Produce a call graph or dependency graph per language.
- Store as compact adjacency list (or protobuf) keyed by subject_digest.
Entrypoint discovery:
- HTTP routes (framework metadata)
- gRPC service methods
- queue/stream consumers
- cron/CLI handlers
Path search:
- BFS/DFS from entrypoints to vulnerable symbols.
- Record the shortest path + top‑K alternatives.
Proof bundle:
- path nodes with stable IDs
- file hashes + line ranges (no raw source required)
- tool version + config hash
- graph digest
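The path-search step can be sketched as a multi-source breadth-first search over the adjacency list. This minimal version returns only the single shortest path (top‑K alternatives are left out), and the graph shape, a dict from node name to callee names, is an assumption for illustration.

```python
from collections import deque


def shortest_path(graph: dict, entrypoints: list, vulnerable: set):
    """BFS from all entrypoints at once; return the first path that hits a vulnerable symbol."""
    parents = {entry: None for entry in entrypoints}
    queue = deque(parents)
    while queue:
        node = queue.popleft()
        if node in vulnerable:
            path = []
            while node is not None:  # walk parent links back to the entrypoint
                path.append(node)
                node = parents[node]
            return path[::-1]
        for callee in graph.get(node, ()):
            if callee not in parents:  # visit each node once
                parents[callee] = node
                queue.append(callee)
    return None  # no static path found


graph = {
    "POST /billing/charge": ["BillingController.Pay"],
    "BillingController.Pay": ["StripeClient.Create", "Logger.Info"],
    "StripeClient.Create": ["stripe-sdk::parseWebhook"],
    "GET /health": ["Health.Check"],
}
path = shortest_path(graph, ["POST /billing/charge", "GET /health"],
                     {"stripe-sdk::parseWebhook"})
```

A None result only means the static graph contains no path; whether that maps to "not reachable" or "unknown" depends on how complete the graph is.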

Reachability evidence JSON (UI‑friendly)

{
  "kind": "reachable_path",
  "result": "reachable",
  "confidence": 0.86,
  "entrypoints": [
    {"type":"http","route":"POST /billing/charge","auth":"jwt:payments:write"}
  ],
  "paths": [{
    "path_id": "p-1",
    "steps": [
      {"node":"BillingController.Pay","file_hash":"sha256:aaa","lines":[41,88]},
      {"node":"StripeClient.Create","file_hash":"sha256:bbb","lines":[10,52]},
      {"node":"stripe-sdk.vulnFn","symbol":"stripe-sdk::parseWebhook","evidence":"symbol-match"}
    ]
  }],
  "graph": {"digest":"sha256:callgraph...", "format":"stella-callgraph-v1"},
  "last_seen": "2025-12-18T09:22:00Z",
  "expires_at": "2025-12-25T09:22:00Z"
}

UI rule: never show “reachable” without a concrete, replayable path ID.

5.2 Boundary proof (the “why this is exposed” part)

Boundary proof answers: “Even if reachable, who can trigger it?”

Data sources

Kubernetes ingress/service (exposure)
API gateway routes and auth policies
service mesh auth (mTLS, JWT)
IAM policies (for cloud events)
network policies (deny/allow)

Boundary evidence schema

{
  "kind": "boundary",
  "surface": {"type":"http","route":"POST /billing/charge"},
  "exposure": {"internet": true, "ports":[443]},
  "auth": {
    "mechanism":"jwt",
    "required_scopes":["payments:write"],
    "audience":"billing-api"
  },
  "rate_limits": {"enabled": true, "rps": 20},
  "controls": [
    {"type":"waf","status":"enabled"},
    {"type":"input_validation","status":"enabled","location":"BillingController.Pay"}
  ],
  "last_seen": "2025-12-18T09:22:00Z",
  "confidence": 0.74
}

How to build it

Create a “Surface Extractor” plugin per environment:
- k8s-extractor: reads ingress + service + annotations
- gateway-extractor: reads API gateway config
- iac-extractor: parses Terraform/CloudFormation
Normalize into the schema above.

5.3 VEX in Stella: statuses + justifications

VEX statuses you should support in UI:

Not affected
Affected
Fixed
Under investigation (NTIA)

OpenVEX will carry the machine readable structure. (GitHub)

Practical approach

Treat VEX as the decision record for exploitability.
Your policy can require VEX coverage for all “reachable” high severity vulns.

Rule of thumb

If reachable=true AND the boundary shows an exposed surface with weak auth → VEX defaults to affected until mitigations are proven.
If reachable=false with high confidence and stable proof → VEX may be not_affected.

5.4 Explainable risk score (don’t hide the formula)

Make score explainability first‑class.

Recommended implementation

Store risk score as an additive model:
- base = CVSS normalized
- + reachability_bonus
- + exposure_bonus
- + privilege_bonus
- - mitigation_discount
Emit a score_explain evidence object:

{
  "kind": "score_explain",
  "risk_score": 72,
  "contributions": [
    {"factor":"cvss","value":41,"reason":"CVSS 9.8"},
    {"factor":"reachability","value":18,"reason":"reachable path p-1"},
    {"factor":"exposure","value":10,"reason":"internet-facing route"},
    {"factor":"auth","value":3,"reason":"privilege bonus kept small because a JWT scope is required"}
  ],
  "last_seen":"2025-12-18T09:22:00Z"
}

UI rule: “Score 72” must always be clickable to a stable breakdown.
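Keeping the breakdown stable is easier if the score is always derived from the contributions rather than stored separately. A minimal sketch, assuming scores are clamped to 0–100 (the clamp and the function names are illustrative):

```python
def explain_score(contributions: list[dict], last_seen: str) -> dict:
    """Build a score_explain object whose risk_score is the sum of its parts."""
    total = sum(c["value"] for c in contributions)
    return {
        "kind": "score_explain",
        "risk_score": max(0, min(100, total)),  # assumption: 0-100 scale
        "contributions": contributions,
        "last_seen": last_seen,
    }


def breakdown_is_consistent(explain: dict) -> bool:
    """True when the displayed score matches the contributions behind it."""
    total = sum(c["value"] for c in explain["contributions"])
    return explain["risk_score"] == max(0, min(100, total))
```

With the four contributions in the example (41, 18, 10, 3) the derived score is 72, matching the badge.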

6) The UI you should build (components + interaction rules)

6.1 Findings list row (collapsed)

Show only what helps scanning:

Score badge
CVE + component
service
reachability chip: Reachable / Not reachable / Unknown
VEX chip
last_seen indicator (green/yellow/red)

6.2 Evidence drawer (expanded)

Tabs:

Path
- show entrypoint(s)
- render call chain (simple list first; graph view optional)
Boundary
- exposure, auth, controls
VEX
- status + justification + issuer identity
Score
- breakdown bar/list
Proof
- attestation chain viewer (SBOM → VEX → Decision)
- “Verify locally” action

6.3 “Open proof bundle” viewer

Must display:

subject digest
signer identity
predicate type
digest of proof bundle
last_seen + tool versions

This is where trust is built: responders can see that the evidence is signed, tied to the artifact, and recent.

7) Proof‑linked evidence: how to generate and attach attestations

7.1 Statement format: in‑toto Attestation Framework

in‑toto’s model is:

Subjects (the artifact digests)
Predicate type (schema ID)
Predicate (your actual data) (GitHub)

7.2 DSSE envelope

Wrap statements using DSSE so payload type is signed too. (GitHub)

7.3 Attach to OCI image via referrers

OCI “subject/referrers” makes attestations discoverable from the image digest. (opencontainers.org) ORAS provides the operational model (“attach artifacts to an image”). (Microsoft Learn)

7.4 Practical signing: cosign attest + verify

Cosign has built‑in in‑toto attestation support and can sign custom predicates. (Sigstore)

Typical patterns (example only; adapt to your environment):

# Attach an attestation (predicate type matches the stella/*@v1 naming used elsewhere)
cosign attest --predicate reachability.json \
  --type stella/reachability@v1 \
  <image@sha256:digest>

# Verify attestation
cosign verify-attestation --type stella/reachability@v1 \
  <image@sha256:digest>

(Use keyless OIDC or KMS keys depending on your org.)

8) Define your predicate types (this is the “contract” Stella enforces)

You’ll want at least these predicate types:

stella/sbom@v1
- embeds CycloneDX/SPDX (or references blob digest)
stella/vex@v1
- embeds OpenVEX document or references it (GitHub)
stella/reachability@v1
- the reachability evidence above
- includes graph.digest, paths, confidence, expires_at
stella/boundary@v1
- exposure/auth proof and last_seen
stella/policy-decision@v1
- the gating result, references all input attestation digests
Optional: stella/human-approval@v1
- “I approve deploy of subject digest X based on decision attestation Y”
- keep it time‑boxed

9) The policy gate (how approvals become proof‑linked)

9.1 Use OPA/Rego for the gate

OPA policies are written in Rego. (openpolicyagent.org)

Gate input should be a single JSON document assembled from verified attestations:

{
  "subject": {"name":"registry/org/app","digest":"sha256:..."},
  "sbom": {...},
  "vex": {...},
  "reachability": {...},
  "boundary": {...},
  "org_policy": {"max_risk": 75, "max_age_hours": 168}
}

Example Rego (deny‑by‑default)

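package stella.gate

import rego.v1

default allow := false

# NOTE: input.risk_score and input.reachability.findings are assumed to be
# added when the gate input is assembled from the verified attestations.

# deny if evidence is stale
stale_evidence if {
  now := time.now_ns()
  exp := time.parse_rfc3339_ns(input.reachability.expires_at)
  now > exp
}

# true when VEX marks the CVE as not_affected or fixed
vex_resolved(cve) if {
  input.vex.resolution[cve] in {"not_affected", "fixed"}
}

# deny if any high severity reachable vuln is not resolved by VEX
unresolved_reachable contains v if {
  some v in input.reachability.findings
  v.severity in {"critical", "high"}
  v.reachable == true
  not vex_resolved(v.cve)
}

allow if {
  input.risk_score <= input.org_policy.max_risk
  not stale_evidence
  count(unresolved_reachable) == 0
}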

9.2 Emit a signed policy decision attestation

When OPA returns allow=true, emit another attestation:

predicate includes the policy version/hash and all input refs.
that’s what the UI “Approve” button targets.

This is the “evidence‑linked approval”: approval references the signed decision, and the decision references the signed evidence.

10) “Approve” button behavior (what Stella Ops should enforce)

Disabled until…

subject digest known
SBOM attestation found + signature verified
VEX attestation found + signature verified
Decision attestation found + signature verified
The decision’s input digests match the evidence actually retrieved

When clicked…

Stella Ops creates a stella/human-approval@v1 statement:
- subject = artifact digest
- predicate.decision_ref = decision attestation digest
- predicate.expires_at = short TTL (e.g., 7–30 days)
Signs it with the approver identity
Attaches it to the artifact (OCI referrer)
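The statement created on click could look like the sketch below. The "_type" value is the in‑toto Statement v1 identifier; the predicate field names beyond decision_ref and expires_at (approver, approved_at) and the 7-day default are assumptions. Signing and attaching are out of scope here.

```python
from datetime import datetime, timedelta, timezone


def build_human_approval(subject_name: str, subject_digest: str, decision_ref: str,
                         approver: str, now: datetime, ttl_days: int = 7) -> dict:
    """Build an unsigned in-toto Statement for stella/human-approval@v1."""
    algo, _, hex_digest = subject_digest.partition(":")
    if algo != "sha256" or not hex_digest:
        raise ValueError("approval must target a sha256 artifact digest")

    def fmt(ts: datetime) -> str:
        return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

    return {
        "_type": "https://in-toto.io/Statement/v1",
        "subject": [{"name": subject_name, "digest": {"sha256": hex_digest}}],
        "predicateType": "stella/human-approval@v1",
        "predicate": {
            "decision_ref": decision_ref,
            "approver": approver,          # hypothetical field
            "approved_at": fmt(now),       # hypothetical field
            "expires_at": fmt(now + timedelta(days=ttl_days)),
        },
    }
```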

Audit view must show

approver identity
exact artifact digest
exact decision attestation digest
timestamp and expiry

11) Implementation details that matter in production

11.1 Verification library (shared by UI backend + CI gate)

Write one verifier module used everywhere:

Inputs

image digest
expected predicate types
trust policy (allowed identities/issuers, keyless rules, KMS keys)

Steps

Discover referrers for image@sha256:…
Filter by predicateType
Verify DSSE + signature + identity
Validate JSON schema for predicate
Check subject.digest matches image digest
Return “verified evidence set” + “errors”
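The steps above can be sketched as one function shared by the UI backend and the CI gate. Assumptions in this sketch: referrer discovery has already produced a list of DSSE envelopes, signature and identity checking is injected as a callable (signature_ok), and JSON Schema validation is omitted for brevity.

```python
import base64
import json


def verify_evidence_set(envelopes: list, image_digest: str,
                        required_types: set, signature_ok) -> tuple[dict, list]:
    """Return (verified predicates keyed by predicateType, list of errors)."""
    verified, errors = {}, []
    for envelope in envelopes:
        statement = json.loads(base64.b64decode(envelope["payload"]))
        ptype = statement.get("predicateType")
        if ptype not in required_types:
            continue  # filter by predicateType
        if not signature_ok(envelope):  # DSSE signature + signer identity
            errors.append(f"{ptype}: signature or identity check failed")
            continue
        # Predicate schema validation would go here.
        subjects = {
            "sha256:" + s.get("digest", {}).get("sha256", "")
            for s in statement.get("subject", [])
        }
        if image_digest not in subjects:
            errors.append(f"{ptype}: subject digest does not match {image_digest}")
            continue
        verified[ptype] = statement.get("predicate", {})
    for ptype in sorted(set(required_types) - set(verified)):
        errors.append(f"{ptype}: no verified attestation found")
    return verified, errors
```

Returning errors alongside the verified set lets the UI explain why "Approve" is disabled instead of only failing.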

11.2 Evidence privacy

Reachability proofs can leak implementation details.

Store file hashes, symbol names, and line ranges
Gate raw source behind elevated permissions
Provide redacted proofs by default

11.3 Evidence TTL strategy

SBOM: long TTL (weeks/months) if digest immutable
Boundary: short TTL (hours/days) because env changes
Reachability: medium TTL (days/weeks) depending on code churn
VEX: must be renewed if boundary/reachability changes
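A TTL table plus a renewal check is one simple way to implement this strategy. The concrete durations below are placeholders (the 7-day reachability value mirrors the earlier example's last_seen/expires_at gap); tune them to your own churn.

```python
from datetime import datetime, timedelta, timezone

# Placeholder TTLs per evidence kind; not prescribed values.
TTL_BY_KIND = {
    "sbom": timedelta(days=90),
    "reachable_path": timedelta(days=7),
    "boundary": timedelta(hours=24),
}


def expires_at(kind: str, last_seen: datetime) -> datetime:
    """Expiry for a piece of evidence, derived from when it was last observed."""
    return last_seen + TTL_BY_KIND[kind]


def vex_needs_renewal(vex_issued_at: datetime, boundary_changed_at: datetime,
                      reachability_changed_at: datetime) -> bool:
    """VEX has no fixed TTL here: it is invalidated by newer boundary or reachability evidence."""
    return max(boundary_changed_at, reachability_changed_at) > vex_issued_at
```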

11.4 Handling “Unknown reachability”

Don’t force false certainty.

Mark as unknown and show why (missing symbols, dynamic reflection, stripped binaries)
Policy can treat unknown as “reachable” for critical CVEs in internet‑facing services.
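In policy code this can stay a small, explicit mapping so the stored result is never overwritten. A sketch with a hypothetical function name; the stored evidence keeps "unknown", and only the gate sees the escalated value:

```python
def effective_reachability(result: str, severity: str, internet_facing: bool) -> str:
    """Value the policy gate should use; the evidence record itself is left unchanged."""
    if result == "unknown" and severity == "critical" and internet_facing:
        return "reachable"  # conservative treatment for the riskiest case
    return result
```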

12) A concrete MVP path that still delivers value

If you want a minimal but real first release:

MVP (2–3 deliverables)

Evidence drawer fed by:
- scanner output + SBOM + a simple “entrypoint map”
VEX workflow
- allow engineers to set VEX status + justification
Signed decision gating
- even if reachability is heuristic, the chain is real

Then iterate:

add reachability graphs
add boundary extraction from IaC/K8s
tighten policy (staleness, confidence thresholds)

13) Quick checklist for “done enough to trust”

Every finding expands to: Path, Boundary, VEX, Score, Proof
Every evidence tab shows last_seen + confidence
“Verify chain” works: SBOM → VEX → Decision all signed and bound to the artifact digest
Approve button signs a human approval attestation tied to the decision digest
CI gate verifies the same chain before deploy

If you want, I can also drop in:

a full set of JSON Schemas for stella/*@v1 predicates,
a reference verifier implementation outline in .NET 10 (Minimal API + a verifier class),
and a sample UI component tree (React) that renders path/boundary graphs and attestation chains.
