Here's a compact, practical plan to harden StellaOps around offline-ready security evidence and deterministic verdicts, with just enough background so it all clicks.


Why this matters (quick primer)

  • Air-gapped/offline: Many customers can't reach public feeds or registries. Your scanners, SBOM tooling, and attestations must work with pre-synced bundles and prove what data they used.
  • Interoperability: Teams mix tools (Syft/Grype/Trivy, cosign, CycloneDX/SPDX). Your CI should round-trip SBOMs and attestations end-to-end and prove that downstream consumers (e.g., Grype) can load them.
  • Determinism: Auditors expect “same inputs → same verdict.” Capture inputs, policies, and feed hashes so a verdict is exactly reproducible later.
  • Operational guardrails: Shipping gates should fail early on unknowns and apply backpressure gracefully when load spikes.

E2E test themes to add (what to build)

  1. Air-gapped operation e2e
  • Package “offline bundle” (vuln feeds, package catalogs, policy/lattice rules, certs, keys).
  • Run scans (containers, OS, language deps, binaries) without network.
  • Assert: SBOMs generated, attestations signed/verified, verdicts emitted.
  • Evidence: manifest of bundle contents + hashes in the run log.
  2. Interop round-trips (SBOM ⇄ attestation ⇄ scanner)
  • Produce SBOM (CycloneDX 1.6 and SPDX 3.0.1) with Syft.

  • Create DSSE/cosign attestation for that SBOM.

  • Verify consumer tools:

    • Grype scans from SBOM (no image pull) and respects attestations.
    • Verdict references the exact SBOM digest and attestation chain.
  • Assert: consumers load, validate, and produce identical findings vs direct scan.

  3. Replayability (delta-verdicts + strict replay)
  • Store input set: artifact digest(s), SBOM digests, policy version, feed digests, lattice rules, tool versions.
  • Rerun later; assert byte-identical verdict and same “delta-verdict” when inputs unchanged.
  4. Unknowns-budget policy gates
  • Inject controlled “unknown” conditions (missing CPE mapping, unresolved package source, unparsed distro).
  • Gate: fail build if unknowns > budget (e.g., prod=0, staging≤N).
  • Assert: UI, CLI, and attestation all record unknown counts and gate decision.
  5. Attestation round-trip & validation
  • Produce: build provenance (in-toto/DSSE), SBOM attest, VEX attest, final verdict attest.
  • Verify: signature (cosign), certificate chain, timestamping, Rekor-style (or mirror) inclusion when online; cached proofs when offline.
  • Assert: each attestation is linked in the verdict's evidence index.
  6. Router backpressure chaos (HTTP 429/503 + Retry-After)
  • Load tests that trigger per-instance and per-environment limits.
  • Assert: clients back off per Retry-After, queues drain, no data loss, latencies bounded; UI shows throttling reason.
  7. UI reducer tests for reachability & VEX chips
  • Component tests: large SBOM graphs, focused reachability subgraphs, and VEX status chips (affected/not-affected/under-investigation).
  • Assert: stable rendering under 50k+ nodes; interactions remain <200ms.

Next-week checklist (do these now)

  1. Delta-verdict replay tests: golden corpus; lock tool+feed versions; assert bit-for-bit verdict.
  2. Unknowns-budget gates in CI: policy + failing examples; surface in PR checks and UI.
  3. SBOM attestation round-trip: Syft → cosign attest → Grype consume-from-SBOM; verify signatures & digests.
  4. Router backpressure chaos: scripted spike; verify 429/503 + Retry-After handling and metrics.
  5. UI reducer tests: reachability graph snapshots; VEX chip states; regression suite.

Minimal artifacts to standardize (so tests are boring—good!)

  • Offline bundle spec: bundle.json with content digests (feeds, policies, keys).
  • Evidence manifest: machine-readable index linking verdict → SBOM digest → attestation IDs → tool versions.
  • Delta-verdict schema: captures before/after graph deltas, rule evals, and final gate result.
  • Unknowns taxonomy: codes (e.g., PKG_SOURCE_UNKNOWN, CPE_AMBIG) with severities and budgets.
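
To make these shapes concrete, here is a minimal sketch of the bundle and evidence manifests as .NET records. Every field name is illustrative rather than a committed schema, and serialization is assumed to go through canonical JSON.

```csharp
using System.Collections.Generic;

// Illustrative shapes only; the real bundle.json / evidence manifest schemas are TBD.
public sealed record OfflineBundleManifest(
    string BundleVersion,
    IReadOnlyDictionary<string, string> FeedDigests,    // feed name -> sha256:...
    IReadOnlyDictionary<string, string> PolicyDigests,   // policy id -> sha256:...
    IReadOnlyList<string> TrustRootFingerprints,
    string CreatedAtUtc);                                 // recorded as evidence, never hashed into verdicts

public sealed record EvidenceManifest(
    string VerdictDigest,
    IReadOnlyList<string> SbomDigests,
    IReadOnlyList<string> AttestationIds,
    IReadOnlyDictionary<string, string> ToolVersions,
    IReadOnlyList<string> UnknownCodes);                  // e.g. PKG_SOURCE_UNKNOWN, CPE_AMBIG
```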

CI wiring (quick sketch)

  • Jobs: offline-e2e, interop-e2e, replayable-verdicts, unknowns-gate, router-chaos, ui-reducers.
  • Matrix: {Debian/Alpine/RHEL-like} × {amd64/arm64} × {CycloneDX/SPDX}.
  • Cache discipline: pin tool versions, vendor feeds to content-addressed store.
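
If the matrix is also fanned out inside the test suite (rather than only in pipeline YAML), an xUnit theory can enumerate the same combinations. InteropFlow below is a hypothetical seam for the interop-e2e job.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Xunit;

public class InteropMatrixTests
{
    // One theory case per {distro, arch, SBOM format} combination, mirroring the CI matrix.
    public static IEnumerable<object[]> Matrix()
    {
        string[] distros = { "debian", "alpine", "rhel-like" };
        string[] arches  = { "amd64", "arm64" };
        string[] formats = { "cyclonedx-1.6", "spdx-3.0.1" };

        foreach (var distro in distros)
            foreach (var arch in arches)
                foreach (var format in formats)
                    yield return new object[] { distro, arch, format };
    }

    [Theory]
    [MemberData(nameof(Matrix))]
    public async Task Interop_round_trip_succeeds(string distro, string arch, string sbomFormat)
    {
        // Hypothetical seam: runs the interop-e2e flow for this combination against the
        // pinned corpus and throws (failing the case) on any parity violation.
        await InteropFlow.RunAsync(distro, arch, sbomFormat);
    }
}
```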

Fast success criteria (green = done)

  • Can run full scan + attest + verify with no network.
  • Rerunning a fixed input set yields identical verdict.
  • Grype (from SBOM) matches image scan results within tolerance.
  • Builds auto-fail when unknowns budget exceeded.
  • Router under burst emits correct Retry-After and recovers cleanly.
  • UI handles huge graphs; VEX chips never desync from evidence.

If you want, I'll turn this into GitLab/Gitea pipeline YAML + a tiny sample repo (image, SBOM, policies, and goldens) so your team can plug and play.

Below is a complete, end-to-end testing strategy for Stella Ops that turns your moats (offline readiness, deterministic replayable verdicts, lattice/policy decisioning, attestation provenance, unknowns budgets, router backpressure, UI reachability evidence) into continuously verified guarantees.


1) Non-negotiable test principles

1.1 Determinism as a testable contract

A scan/verdict is deterministic iff the same inputs → byte-identical outputs across time and machines (within defined tolerances; e.g., timestamps are captured as evidence rather than embedded in the payload or affecting its ordering).

Determinism controls (must be enforced by tests; see the canonicalization sketch after this list):

  • Canonical JSON (stable key order, stable array ordering where semantically unordered).

  • Stable sorting for:

    • packages/components
    • vulnerabilities
    • edges in graphs
    • evidence lists
  • Time is an input, never implicit:

    • stamp times in a dedicated evidence field; they never affect hashing or verdict evaluation.
  • PRNG uses explicit seed; seed stored in run manifest.

  • Tool versions + feed digests + policy versions are inputs.

  • Locale/encoding invariants: UTF-8 everywhere; invariant culture in .NET.
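
A minimal sketch of the canonicalization helper these tests would exercise, assuming System.Text.Json nodes; the type and method names are illustrative, not existing StellaOps code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;
using System.Text.Json.Nodes;

// Illustrative canonicalizer: recursively sorts object keys; semantically unordered
// arrays must already be sorted by the producer before serialization.
public static class CanonicalJson
{
    public static string Canonicalize(JsonNode node) => Normalize(node)!.ToJsonString();

    public static string Sha256Hex(string canonicalJson) =>
        Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(canonicalJson))).ToLowerInvariant();

    private static JsonNode? Normalize(JsonNode? node) => node switch
    {
        null => null,
        JsonObject obj => new JsonObject(obj
            .OrderBy(kv => kv.Key, StringComparer.Ordinal)
            .Select(kv => KeyValuePair.Create(kv.Key, Normalize(kv.Value)))),
        JsonArray arr => new JsonArray(arr.Select(Normalize).ToArray()),
        // scalars are cloned because a JsonNode cannot be attached to two parents
        _ => JsonNode.Parse(node.ToJsonString())
    };
}
```

With a helper like this, `CanonicalJson.Sha256Hex(CanonicalJson.Canonicalize(node))` should be invariant under key reordering of the source document, which is exactly what the determinism tests assert.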

1.2 Offline by default

Every CI job (except explicitly tagged “online”) runs with no egress.

  • Offline bundle is mandatory input for scanning.
  • Any attempted network call fails the test (proves air-gap compliance).
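
One way to make "any attempted network call fails the test" concrete in .NET, assuming the code under test accepts an injected HttpMessageHandler; the helper name is hypothetical.

```csharp
using System.Net.Http;

// Hypothetical test helper: wire this handler into any HttpClient used by the code under
// test; the first connection attempt throws, proving the offline path needs no egress.
public static class NoEgressHttp
{
    public static HttpMessageHandler CreateHandler() => new SocketsHttpHandler
    {
        ConnectCallback = (context, _) =>
            throw new HttpRequestException(
                $"Egress attempted to {context.DnsEndPoint} during an offline test.")
    };
}

// Usage in a test fixture (illustrative):
//   using var client = new HttpClient(NoEgressHttp.CreateHandler());
```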

1.3 Evidence-first validation

No test merely asserts "verdict == pass"; it must also verify the chain of evidence:

  • verdict references SBOM digest(s)
  • SBOM references artifact digest(s)
  • VEX claims reference vulnerabilities + components + reachability evidence
  • attestations verify cryptographically and chain to configured roots.

1.4 Interop is required, not “nice to have”

Stella Ops must round-trip with:

  • SBOM: CycloneDX 1.6 and SPDX 3.0.1
  • Attestation: DSSE / in-toto style envelopes, cosign-compatible flows
  • Consumer scanners: at least Grype from SBOM; ideally Trivy as cross-check

Interop tests are treated as “compatibility contracts” and block releases.

1.5 Architectural boundary enforcement (your standing rule)

  • Lattice/policy merge algorithms run in scanner.webservice.
  • Concelier and Excitors must “preserve prune source”. This is enforced with tests that detect forbidden behavior (see §6.2).

2) The test portfolio (what kinds of tests exist)

Think “coverage by risk”, not “coverage by lines”.

2.1 Test layers and what they prove

  1. Unit tests (fast, deterministic)
  • Canonicalization, hashing, semantic version range ops
  • Graph delta algorithms
  • Policy rule evaluation primitives
  • Unknowns taxonomy + budgeting math
  • Evidence index assembly
  2. Property-based tests (FsCheck; see the sketch after this list)
  • “Reordering inputs does not change verdict hash”
  • “Graph merge is associative/commutative where policy declares it”
  • “Unknowns budgets always monotonic with missing evidence”
  • Parser robustness: arbitrary JSON for SBOM/VEX envelopes never crashes
  3. Component tests (service + Postgres; optional Valkey)
  • scanner.webservice lattice merge and replay
  • Feed loader and cache behavior (offline feeds)
  • Router backpressure decision logic
  • Attestation verification modules
  4. Contract tests (API compatibility)
  • OpenAPI/JSON schema compatibility for public endpoints
  • Evidence manifest schema backward compatibility
  • OCI artifact layout compatibility (attestation attachments)
  5. Integration tests (multi-service)
  • Router → scanner.webservice → attestor → storage
  • Offline bundle import/export
  • Knowledge snapshot “time travel” replay pipeline
  6. End-to-end tests (realistic flows)
  • scan an image → generate SBOM → produce attestations → decision verdict → UI evidence extraction
  • interop consumers load SBOM and confirm findings parity
  7. Non-functional tests
  • Performance & scale (throughput, memory, large SBOM graphs)
  • Chaos/fault injection (DB restarts, queue spikes, 429/503 backpressure)
  • Security tests (fuzzers, decompression bomb defense, signature bypass resistance)
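
As promised above, a sketch of the FsCheck property layer (item 2). It assumes FsCheck.Xunit; the hasher is an inlined stand-in for the real verdict hash so the example stands alone.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;
using FsCheck.Xunit;

public class VerdictOrderingProperties
{
    // Stand-in for the real verdict hasher: canonical order is imposed before hashing.
    private static string HashComponents(string[] components) =>
        Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(
            string.Join("\n", components.OrderBy(c => c, StringComparer.Ordinal)))));

    [Property]
    public bool Reordering_inputs_does_not_change_verdict_hash(string[] components)
    {
        components ??= Array.Empty<string>();
        var reordered = components.Reverse().ToArray(); // any permutation would do
        return HashComponents(components) == HashComponents(reordered);
    }
}
```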

3) Hermetic test harness (how tests run)

3.1 Standard test profiles

You already decided: Postgres is system-of-record, Valkey is ephemeral.

Define two mandatory execution profiles in CI:

  1. Default: Postgres + Valkey
  2. Air-gapped minimal: Postgres only

Both must pass.

3.2 Environment isolation

  • Containers started with no network unless a test explicitly declares “online”.
  • For Kubernetes e2e: apply a default-deny egress NetworkPolicy.

3.3 Golden corpora repository (your “truth set”)

Create a versioned stellaops-test-corpus/ containing:

  • container images (or image tarballs) pinned by digest
  • SBOM expected outputs (CycloneDX + SPDX)
  • VEX examples (vendor/distro/internal)
  • vulnerability feed snapshots (pinned digests)
  • policies + lattice rules + unknown budgets
  • expected verdicts + delta verdicts
  • reachability subgraphs as evidence
  • negative fixtures: malformed SPDX, corrupted DSSE, missing digests, unsupported distros

Every corpus item includes a Run Manifest (see §4).

3.4 Artifact retention in CI

Every failing integration/e2e test uploads:

  • run manifest
  • offline bundle manifest + hashes
  • logs (structured)
  • produced SBOMs
  • attestations
  • verdict + delta verdict
  • evidence index

This turns failures into audit-grade reproductions.


4) Core artifacts that tests must validate

4.1 Run Manifest (replay key)

A scan run is defined by:

  • artifact digests (image/config/layers, or binary hash)
  • SBOM digests produced/consumed
  • vuln feed snapshot digest(s)
  • policy version + lattice rules digest
  • tool versions (scanner, parsers, reachability engine)
  • crypto profile (roots, key IDs, algorithm set)
  • environment profile (postgres-only vs postgres+valkey)
  • seed + canonicalization version

Test invariant: re-running the same manifest produces byte-identical verdict and same evidence references.
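
A sketch of how the run manifest could be modeled in test code. Field names are illustrative, and the digest shown here uses default serialization where a real implementation must apply the canonicalization rules from §1.1.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text.Json;

// Illustrative run manifest; the real schema lives with the corpus, not in test code.
public sealed record RunManifest(
    IReadOnlyList<string> ArtifactDigests,
    IReadOnlyList<string> SbomDigests,
    IReadOnlyList<string> FeedSnapshotDigests,
    string PolicyVersion,
    string LatticeRulesDigest,
    IReadOnlyDictionary<string, string> ToolVersions,
    string CryptoProfile,
    string EnvironmentProfile,        // "postgres-only" or "postgres+valkey"
    int Seed,
    string CanonicalizationVersion)
{
    // Replay key: a content address of the manifest itself. A real implementation must
    // serialize with stable key order (see §1.1); defaults are used here for brevity.
    public string Digest() =>
        "sha256:" + Convert.ToHexString(
            SHA256.HashData(JsonSerializer.SerializeToUtf8Bytes(this))).ToLowerInvariant();
}
```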

4.2 Offline Bundle Manifest

Bundle includes:

  • feeds + indexes
  • policies + lattice rule sets
  • trust roots, intermediate CAs, timestamp roots (as needed)
  • crypto provider modules (for sovereign readiness)
  • optional: Rekor mirror snapshot / inclusion proofs cache

Test invariant: an offline scan is blocked if the bundle is missing required parts; the error is explicit and counts as "unknown" only where policy says so.

4.3 Evidence Index

The verdict is not the product; the product is verdict + evidence graph:

  • pointers to SBOM, VEX, reachability proofs, attestations
  • their digests and verification status
  • unknowns list with codes + remediation hints

Test invariant: every "not affected" claim carries the evidence hooks required by policy ("because the feature flag is off", etc.); otherwise it becomes unknown/fail.


5) Required E2E flows (minimum set)

These are your release blockers.

Flow A: Air-gapped scan and verdict

  • Inputs: image tarball + offline bundle

  • Network: disabled

  • Output: SBOM (CycloneDX + SPDX), attestations, verdict

  • Assertions:

    • no network calls occurred
    • verdict references bundle digest + feed snapshot digest
    • unknowns within budget
    • evidence index complete

Flow B: SBOM interop round-trip

  • Produce SBOM via your pipeline

  • Attach SBOM attestation (DSSE/cosign format)

  • Consumer (Grype-from-SBOM) reads SBOM and produces findings

  • Assertions:

    • consumer can parse SBOM
    • findings parity within defined tolerance
    • verdict references exact SBOM digest used by consumer

Flow C: Deterministic replay

  • Run scan → store run manifest + outputs

  • Run again from same manifest

  • Assertions:

    • verdict bytes identical
    • evidence index identical (except allowed “execution metadata” section)
    • delta verdict is “empty delta”
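
Flow C boils down to a test like this sketch, where TestCorpus and ReplayRunner are hypothetical seams for loading a pinned manifest and invoking the full pipeline.

```csharp
using System.Threading.Tasks;
using Xunit;

public class DeterministicReplayTests
{
    [Fact]
    public async Task Replaying_the_same_manifest_yields_a_byte_identical_verdict()
    {
        // Hypothetical helpers: load a pinned run manifest from the golden corpus and
        // execute the full pipeline, returning the verdict bytes.
        var manifest = TestCorpus.LoadRunManifest("replay/golden-001/run-manifest.json");

        byte[] first = await ReplayRunner.RunAsync(manifest);
        byte[] second = await ReplayRunner.RunAsync(manifest);

        Assert.Equal(first, second); // byte-identical, not merely "semantically equal"
    }
}
```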

Flow D: Diff-aware delta verdict (smart-diff)

  • Two versions of the same image with a controlled change (one dependency bump)

  • Assertions:

    • delta verdict contains only changed nodes/edges
    • risk budget computation based on delta matches expected
    • signed delta verdict validates and is OCI-attached

Flow E: Unknowns budget gates

  • Inject unknowns (unmapped package, missing distro metadata, ambiguous CPE)

  • Policy:

    • prod budget = 0
    • staging budget = N
  • Assertions:

    • prod fails, staging passes
    • unknowns appear in attestation and UI evidence
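
The gate itself is small enough to sketch; the type names and example codes below are illustrative, assuming unknowns are reported as taxonomy codes.

```csharp
using System.Collections.Generic;

public enum GateDecision { Pass, Fail }

public sealed record UnknownsBudget(string Environment, int MaxUnknowns);

public static class UnknownsGate
{
    // Unknowns arrive as taxonomy codes (e.g. "PKG_SOURCE_UNKNOWN", "CPE_AMBIG").
    public static GateDecision Evaluate(IReadOnlyCollection<string> unknownCodes, UnknownsBudget budget) =>
        unknownCodes.Count <= budget.MaxUnknowns ? GateDecision.Pass : GateDecision.Fail;
}

// With one injected unknown:
//   UnknownsGate.Evaluate(unknowns, new UnknownsBudget("prod", 0))    => Fail
//   UnknownsGate.Evaluate(unknowns, new UnknownsBudget("staging", 5)) => Pass
```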

Flow F: Router backpressure under burst

  • Spike requests to a single router instance + environment bucket

  • Assertions:

    • 429/503 with Retry-After emitted correctly
    • clients back off; no request loss
    • metrics expose throttling reasons
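
A sketch of the client-side contract the chaos test asserts: honor Retry-After on 429/503 with a bounded retry loop, never dropping the request. The helper name, retry ceiling, and fallback delay are assumptions.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

public static class BackpressureAwareClient
{
    public static async Task<HttpResponseMessage> SendWithBackoffAsync(
        HttpClient client, Func<HttpRequestMessage> requestFactory, int maxAttempts = 5)
    {
        for (var attempt = 1; ; attempt++)
        {
            var response = await client.SendAsync(requestFactory());
            var throttled = response.StatusCode is HttpStatusCode.TooManyRequests
                                                 or HttpStatusCode.ServiceUnavailable;
            if (!throttled || attempt == maxAttempts)
                return response;

            // Honor the server's Retry-After; fall back to exponential backoff otherwise.
            var delay = response.Headers.RetryAfter?.Delta
                        ?? TimeSpan.FromSeconds(Math.Pow(2, attempt));
            response.Dispose();
            await Task.Delay(delay);
        }
    }
}
```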

Flow G: Evidence export (“audit pack”)

  • Run scan

  • Export a sealed audit pack (bundle + run manifest + evidence + verdict)

  • Import elsewhere (clean environment)

  • Assertions:

    • replay produces identical verdict
    • signatures verify under imported trust roots

6) Module-specific test requirements

6.1 scanner.webservice (lattice + policy decisioning)

Must have:

  • unit tests for lattice merge algebra
  • property tests: declared commutativity/associativity/idempotency
  • integration tests that merge vendor/distro/internal VEX and confirm precedence rules are policy-driven

Critical invariant tests:

  • “Vendor > distro > internal” must be demonstrably configurable, and wrong merges must fail deterministically.
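
A deliberately tiny sketch of a policy-driven precedence merge, useful as the target of the property tests above; the real lattice algebra is richer, and all names here are illustrative.

```csharp
using System.Collections.Generic;
using System.Linq;

public enum VexStatus { Affected, NotAffected, UnderInvestigation, Unknown }

public sealed record VexClaim(string Source, string VulnId, string ComponentPurl, VexStatus Status);

public static class LatticeMerge
{
    // Precedence is policy input, e.g. ["vendor", "distro", "internal"]; reordering the
    // policy must change the outcome, and tests assert exactly that.
    public static VexStatus Merge(IReadOnlyList<VexClaim> claims, IReadOnlyList<string> precedence)
    {
        foreach (var source in precedence)
        {
            var claim = claims.FirstOrDefault(c => c.Source == source);
            if (claim is not null)
                return claim.Status;
        }
        return VexStatus.Unknown; // no recognized source made a claim
    }
}
```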

6.2 Boundary enforcement: Concelier & Excitors preserve prune source

Add a “behavioral boundary suite”:

  • instrument events/telemetry that record where merges happened

  • feed in conflicting VEX claims and assert:

    • Concelier/Excitors do not resolve conflicts; they retain provenance and “prune source”
    • only scanner.webservice produces the final merged semantics

If Concelier/Excitors output a resolved claim, the test fails.

6.3 Router backpressure and DPoP/nonce rate limiting

  • deterministic unit tests for token bucket math
  • time-controlled tests (virtual clock)
  • integration tests with Valkey + Postgres-only fallbacks
  • chaos tests: Valkey down → router degrades gracefully (local per-instance limiter still works)
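
A sketch of the deterministic token-bucket core with an injected clock, so the "virtual clock" unit tests can advance time explicitly; names and defaults are illustrative.

```csharp
using System;

public sealed class TokenBucket
{
    private readonly double _capacity;
    private readonly double _refillPerSecond;
    private readonly Func<DateTimeOffset> _now; // injected clock: tests drive time explicitly
    private double _tokens;
    private DateTimeOffset _lastRefill;

    public TokenBucket(double capacity, double refillPerSecond, Func<DateTimeOffset> now)
    {
        _capacity = capacity;
        _refillPerSecond = refillPerSecond;
        _now = now;
        _tokens = capacity;
        _lastRefill = now();
    }

    public bool TryTake(double cost = 1.0)
    {
        var current = _now();
        var elapsed = (current - _lastRefill).TotalSeconds;
        _tokens = Math.Min(_capacity, _tokens + elapsed * _refillPerSecond);
        _lastRefill = current;

        if (_tokens < cost) return false;
        _tokens -= cost;
        return true;
    }
}
```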

6.4 Storage (Postgres) + Valkey accelerator

  • migration tests: schema upgrades forward/backward in CI
  • replay tests: Postgres-only profile yields same verdict bytes
  • consistency tests: Valkey cache misses never change decision outcomes, only latency

6.5 UI evidence rendering

  • reducer snapshot tests for:

    • reachability subgraph rendering (large graphs)
    • VEX chip states: affected/not-affected/under-investigation/unknown
  • performance budgets:

    • large graph render under threshold (define and enforce)
  • contract tests against evidence index schema


7) Non-functional test program

7.1 Performance and scale tests

Define standard workloads:

  • small image (200 packages)
  • medium (2k packages)
  • large (20k+ packages)
  • “monorepo container” worst case (50k+ nodes graph)

Metrics collected:

  • p50/p95/p99 scan time
  • memory peak
  • DB write volume
  • evidence pack size
  • router throughput + throttle rate

Add regression gates:

  • no more than X% slowdown in p95 vs baseline
  • no more than Y% growth in evidence pack size for unchanged inputs

7.2 Chaos and reliability

Run chaos suites weekly/nightly:

  • kill scanner during run → resume/retry semantics deterministic
  • restart Postgres mid-run → job fails with explicit retryable state
  • corrupt offline bundle file → fails with typed error, not crash
  • burst router + slow downstream → confirms backpressure, not meltdown

7.3 Security robustness tests

  • fuzz parsers: SPDX, CycloneDX, VEX, DSSE envelopes

  • zip/tar bomb defenses (artifact ingestion; see the bounded-extraction sketch after this list)

  • signature bypass attempts:

    • mismatched digest
    • altered payload with valid signature on different content
    • wrong root chain
  • SSRF defense: any URL fields in SBOM/VEX are treated as data, never fetched in offline mode
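
As referenced above, a sketch of a bounded extractor for artifact ingestion. The limits are illustrative; the key point is capping decompressed bytes while streaming, since declared entry sizes can be forged, plus a zip-slip path guard.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class SafeZipExtractor
{
    private const long MaxTotalBytes = 2L * 1024 * 1024 * 1024; // illustrative 2 GiB cap

    public static void Extract(string archivePath, string destinationDir)
    {
        long remaining = MaxTotalBytes;
        var destRoot = Path.GetFullPath(destinationDir) + Path.DirectorySeparatorChar;

        using var archive = ZipFile.OpenRead(archivePath);
        foreach (var entry in archive.Entries)
        {
            if (string.IsNullOrEmpty(entry.Name)) continue; // directory entry

            var target = Path.GetFullPath(Path.Combine(destRoot, entry.FullName));
            if (!target.StartsWith(destRoot, StringComparison.Ordinal))
                throw new InvalidDataException($"Zip-slip attempt: {entry.FullName}");

            Directory.CreateDirectory(Path.GetDirectoryName(target)!);
            using var source = entry.Open();
            using var output = File.Create(target);

            // Cap *decompressed* bytes while copying; headers alone cannot be trusted.
            var buffer = new byte[81920];
            int read;
            while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
            {
                remaining -= read;
                if (remaining < 0)
                    throw new InvalidDataException("Decompression bomb suspected: size cap exceeded.");
                output.Write(buffer, 0, read);
            }
        }
    }
}
```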


8) CI/CD gating rules (what blocks a release)

Release candidate is blocked if any of these fail:

  1. All mandatory E2E flows (§5) pass in both profiles:

    • Postgres-only
    • Postgres+Valkey
  2. Deterministic replay suite:

    • zero non-deterministic diffs in verdict bytes
    • allowed diff list is explicit and reviewed
  3. Interop suite:

    • CycloneDX 1.6 and SPDX 3.0.1 round-trips succeed
    • consumer scanner compatibility tests pass
  4. Risk budgets + unknowns budgets:

    • must pass on corpus, and no regressions against baseline
  5. Backpressure correctness:

    • Retry-After compliance and throttle metrics validated
  6. Performance regression budgets:

    • no breach of p95/memory budgets on standard workloads
  7. Flakiness threshold:

    • if a test flakes more than N times per week, it is quarantined and release is blocked until a deterministic root cause is established (quarantine is allowed only for non-blocking suites, never for §5 flows)

9) Implementation blueprint (how to build this test program)

Phase 0: Harness and corpus

  • Stand up test harness: docker compose + Testcontainers (.NET xUnit)
  • Create corpus repo with 10–20 curated artifacts
  • Implement run manifest + evidence index capture in all tests

Phase 1: Determinism and replay

  • canonicalization utilities + golden verdict bytes
  • replay runner that loads manifest and replays end-to-end
  • add property-based tests for ordering and merge invariants

Phase 2: Offline e2e + interop

  • offline bundle builder + strict “no egress” enforcement
  • SBOM attestation round-trip + consumer parsing suite

Phase 3: Unknowns budgets + delta verdict

  • unknown taxonomy everywhere (UI + attestations)
  • delta verdict generation and signing
  • diff-aware release gates

Phase 4: Backpressure + chaos + performance

  • router throttle chaos suite
  • scale tests with standard workloads and baselines

Phase 5: Audit packs + time-travel snapshots

  • sealed export/import
  • one-command replay for auditors

10) What you should standardize immediately

If you do only three things, do these:

  1. Run Manifest as first-class test artifact
  2. Golden corpus that pins all digests (feeds, policies, images, expected outputs)
  3. “No egress” default in CI with explicit opt-in for online tests

Everything else becomes far easier once these are in place.


If you want, I can also produce a concrete repository layout and CI job matrix (xUnit categories, docker compose profiles, artifact retention conventions, and baseline benchmark scripts) that matches .NET 10 conventions and your Postgres/Valkey profiles.