Here’s a compact, practical plan to harden Stella Ops around offline‑ready security evidence and deterministic verdicts, with just enough background so it all clicks.
Why this matters (quick primer)
- Air‑gapped/offline: Many customers can’t reach public feeds or registries. Your scanners, SBOM tooling, and attestations must work with pre‑synced bundles and prove what data they used.
- Interoperability: Teams mix tools (Syft/Grype/Trivy, cosign, CycloneDX/SPDX). Your CI should round‑trip SBOMs and attestations end‑to‑end and prove that downstream consumers (e.g., Grype) can load them.
- Determinism: Auditors expect “same inputs → same verdict.” Capture inputs, policies, and feed hashes so a verdict is exactly reproducible later.
- Operational guardrails: Shipping gates should fail early on unknowns and apply backpressure gracefully when load spikes.
E2E test themes to add (what to build)
- Air‑gapped operation e2e
- Package “offline bundle” (vuln feeds, package catalogs, policy/lattice rules, certs, keys).
- Run scans (containers, OS, language deps, binaries) without network.
- Assert: SBOMs generated, attestations signed/verified, verdicts emitted.
- Evidence: manifest of bundle contents + hashes in the run log.
- Interop round‑trips (SBOM ⇄ attestation ⇄ scanner)
- Produce SBOM (CycloneDX 1.6 and SPDX 3.0.1) with Syft.
- Create DSSE/cosign attestation for that SBOM.
- Verify consumer tools:
- Grype scans from SBOM (no image pull) and respects attestations.
- Verdict references the exact SBOM digest and attestation chain.
- Assert: consumers load, validate, and produce identical findings vs direct scan.
- Replayability (delta‑verdicts + strict replay)
- Store input set: artifact digest(s), SBOM digests, policy version, feed digests, lattice rules, tool versions.
- Re‑run later; assert byte‑identical verdict and same “delta‑verdict” when inputs unchanged.
- Unknowns‑budget policy gates
- Inject controlled “unknown” conditions (missing CPE mapping, unresolved package source, unparsed distro).
- Gate: fail build if unknowns > budget (e.g., prod=0, staging≤N).
- Assert: UI, CLI, and attestation all record unknown counts and gate decision.
- Attestation round‑trip & validation
- Produce: build‑provenance (in‑toto/DSSE), SBOM attest, VEX attest, final verdict attest.
- Verify: signature (cosign), certificate chain, time‑stamping, Rekor‑style (or mirror) inclusion when online; cached proofs when offline.
- Assert: each attestation is linked in the verdict’s evidence index.
- Router backpressure chaos (HTTP 429/503 + Retry‑After)
- Load tests that trigger per‑instance and per‑environment limits.
- Assert: clients back off per Retry‑After, queues drain, no data loss, latencies bounded; UI shows throttling reason.
- UI reducer tests for reachability & VEX chips
- Component tests: large SBOM graphs, focused reachability subgraphs, and VEX status chips (affected/not‑affected/under‑investigation).
- Assert: stable rendering under 50k+ nodes; interactions remain <200 ms.
Next‑week checklist (do these now)
- Delta‑verdict replay tests: golden corpus; lock tool+feed versions; assert bit‑for‑bit verdict.
- Unknowns‑budget gates in CI: policy + failing examples; surface in PR checks and UI.
- SBOM attestation round‑trip: Syft → cosign attest → Grype consume‑from‑SBOM; verify signatures & digests.
- Router backpressure chaos: scripted spike; verify 429/503 + Retry‑After handling and metrics.
- UI reducer tests: reachability graph snapshots; VEX chip states; regression suite.
Minimal artifacts to standardize (so tests are boring—good!)
- Offline bundle spec: `bundle.json` with content digests (feeds, policies, keys).
- Evidence manifest: machine‑readable index linking verdict → SBOM digest → attestation IDs → tool versions.
- Delta‑verdict schema: captures before/after graph deltas, rule evals, and final gate result.
- Unknowns taxonomy: codes (e.g., `PKG_SOURCE_UNKNOWN`, `CPE_AMBIG`) with severities and budgets.
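A budget gate of this shape can be sketched in a few lines; the taxonomy codes, severities, and per-environment budgets below are illustrative assumptions, not Stella Ops' actual schema:

```python
# Sketch of an unknowns-budget gate. Codes and budgets are hypothetical.
from collections import Counter

TAXONOMY = {"PKG_SOURCE_UNKNOWN": "high", "CPE_AMBIG": "medium"}  # assumed codes
BUDGETS = {"prod": 0, "staging": 3}  # prod tolerates zero unknowns

def gate(unknown_codes: list[str], environment: str) -> dict:
    """Count unknowns against the environment budget and decide pass/fail."""
    counts = Counter(unknown_codes)
    total = sum(counts.values())
    budget = BUDGETS[environment]
    return {
        "environment": environment,
        "unknowns": dict(counts),
        "total": total,
        "budget": budget,
        "pass": total <= budget,
    }

decision = gate(["PKG_SOURCE_UNKNOWN", "CPE_AMBIG"], "prod")
assert decision["pass"] is False  # prod budget is 0, so two unknowns fail
```

The same decision record can be surfaced in the PR check, the UI, and the attestation, so all three agree by construction.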
CI wiring (quick sketch)
- Jobs: `offline-e2e`, `interop-e2e`, `replayable-verdicts`, `unknowns-gate`, `router-chaos`, `ui-reducers`.
- Matrix: {Debian/Alpine/RHEL‑like} × {amd64/arm64} × {CycloneDX/SPDX}.
- Cache discipline: pin tool versions, vendor feeds to content‑addressed store.
Fast success criteria (green = done)
- Can run full scan + attest + verify with no network.
- Re‑running a fixed input set yields identical verdict.
- Grype (from SBOM) matches image scan results within tolerance.
- Builds auto‑fail when unknowns budget exceeded.
- Router under burst emits correct Retry‑After and recovers cleanly.
- UI handles huge graphs; VEX chips never desync from evidence.
If you want, I’ll turn this into GitLab/Gitea pipeline YAML + a tiny sample repo (image, SBOM, policies, and goldens) so your team can plug‑and‑play.

Below is a complete, end-to-end testing strategy for Stella Ops that turns your moats (offline readiness, deterministic replayable verdicts, lattice/policy decisioning, attestation provenance, unknowns budgets, router backpressure, UI reachability evidence) into continuously verified guarantees.
1) Non-negotiable test principles
1.1 Determinism as a testable contract
A scan/verdict is deterministic iff the same inputs → byte-identical outputs across time and machines (within defined tolerances, e.g., timestamps captured as evidence rather than embedded in the hashed payload).
Determinism controls (must be enforced by tests):
- Canonical JSON (stable key order, stable array ordering where semantically unordered).
- Stable sorting for:
- packages/components
- vulnerabilities
- edges in graphs
- evidence lists
- Time is an input, never implicit:
- timestamps are recorded in a dedicated evidence field and never affect hashing or verdict evaluation.
- PRNG uses explicit seed; seed stored in run manifest.
- Tool versions + feed digests + policy versions are inputs.
- Locale/encoding invariants: UTF-8 everywhere; invariant culture in .NET.
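The canonicalization rule can be sketched with the Python stdlib; this is a simplified stand-in for a real canonical-JSON scheme (e.g., RFC 8785 JCS), shown in Python for brevity even though the stack is .NET:

```python
# Minimal canonical-JSON hashing sketch: sorted keys, no incidental
# whitespace, UTF-8 bytes. Semantically unordered arrays must be sorted
# by the caller; timestamps live outside the hashed payload.
import hashlib
import json

def canonical_hash(doc: dict) -> str:
    canonical = json.dumps(doc, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

a = {"components": sorted(["zlib", "openssl"]), "policy": "v3"}
b = {"policy": "v3", "components": sorted(["openssl", "zlib"])}
assert canonical_hash(a) == canonical_hash(b)  # key/array order is irrelevant
```

A test that serializes the same logical document in two different orders and compares hashes is the cheapest determinism smoke test you can run on every build.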
1.2 Offline by default
Every CI job (except explicitly tagged “online”) runs with no egress.
- Offline bundle is mandatory input for scanning.
- Any attempted network call fails the test (proves air-gap compliance).
1.3 Evidence-first validation
No assertion is “verdict == pass” without verifying the chain of evidence:
- verdict references SBOM digest(s)
- SBOM references artifact digest(s)
- VEX claims reference vulnerabilities + components + reachability evidence
- attestations verify cryptographically and chain to configured roots.
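A minimal chain check walks exactly those references and reports every broken link; the field names here are hypothetical, not the real evidence schema:

```python
# Sketch: walk verdict -> SBOM -> artifact references and attestation
# verification status; return the list of missing/unverified links.
def verify_chain(verdict, sboms, artifacts, attestations) -> list[str]:
    missing = []
    for sbom_digest in verdict["sbom_digests"]:
        sbom = sboms.get(sbom_digest)
        if sbom is None:
            missing.append(f"sbom:{sbom_digest}")
            continue
        for art in sbom["artifact_digests"]:
            if art not in artifacts:
                missing.append(f"artifact:{art}")
    for att_id in verdict["attestation_ids"]:
        if not attestations.get(att_id, {}).get("verified", False):
            missing.append(f"attestation:{att_id}")
    return missing

verdict = {"sbom_digests": ["sha256:s1"], "attestation_ids": ["att1"]}
sboms = {"sha256:s1": {"artifact_digests": ["sha256:img"]}}
assert verify_chain(verdict, sboms, {"sha256:img": {}},
                    {"att1": {"verified": True}}) == []
```

The assertion in a test then becomes "chain check returns an empty list", not "verdict == pass".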
1.4 Interop is required, not “nice to have”
Stella Ops must round-trip with:
- SBOM: CycloneDX 1.6 and SPDX 3.0.1
- Attestation: DSSE / in-toto style envelopes, cosign-compatible flows
- Consumer scanners: at least Grype from SBOM; ideally Trivy as cross-check
Interop tests are treated as “compatibility contracts” and block releases.
1.5 Architectural boundary enforcement (your standing rule)
- Lattice/policy merge algorithms run in `scanner.webservice`.
- Concelier and Excitors must “preserve prune source”. This is enforced with tests that detect forbidden behavior (see §6.2).
2) The test portfolio (what kinds of tests exist)
Think “coverage by risk”, not “coverage by lines”.
2.1 Test layers and what they prove
- Unit tests (fast, deterministic)
- Canonicalization, hashing, semantic version range ops
- Graph delta algorithms
- Policy rule evaluation primitives
- Unknowns taxonomy + budgeting math
- Evidence index assembly
- Property-based tests (FsCheck)
- “Reordering inputs does not change verdict hash”
- “Graph merge is associative/commutative where policy declares it”
- “Unknowns budgets always monotonic with missing evidence”
- Parser robustness: arbitrary JSON for SBOM/VEX envelopes never crashes
- Component tests (service + Postgres; optional Valkey)
- `scanner.webservice` lattice merge and replay
- Feed loader and cache behavior (offline feeds)
- Router backpressure decision logic
- Attestation verification modules
- Contract tests (API compatibility)
- OpenAPI/JSON schema compatibility for public endpoints
- Evidence manifest schema backward compatibility
- OCI artifact layout compatibility (attestation attachments)
- Integration tests (multi-service)
- Router → scanner.webservice → attestor → storage
- Offline bundle import/export
- Knowledge snapshot “time travel” replay pipeline
- End-to-end tests (realistic flows)
- scan an image → generate SBOM → produce attestations → decision verdict → UI evidence extraction
- interop consumers load SBOM and confirm findings parity
- Non-functional tests
- Performance & scale (throughput, memory, large SBOM graphs)
- Chaos/fault injection (DB restarts, queue spikes, 429/503 backpressure)
- Security tests (fuzzers, decompression bomb defense, signature bypass resistance)
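The property-based invariant "reordering inputs does not change the verdict hash" from §2.1 can be demonstrated even without FsCheck; a plain Python sketch (the hash function and component shape are illustrative):

```python
# Property sketch: shuffling the component list never changes the hash,
# because the hash function sorts before serializing. Seeded PRNG keeps
# the test itself deterministic, per the determinism rules above.
import hashlib
import json
import random

def verdict_hash(components: list[dict]) -> str:
    ordered = sorted(components, key=lambda c: (c["name"], c["version"]))
    payload = json.dumps(ordered, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode()).hexdigest()

components = [{"name": f"pkg{i}", "version": "1.0"} for i in range(100)]
baseline = verdict_hash(components)
rng = random.Random(42)  # explicit seed, stored in the run manifest
for _ in range(20):
    shuffled = components[:]
    rng.shuffle(shuffled)
    assert verdict_hash(shuffled) == baseline  # reordering never changes the hash
```

In the real suite, FsCheck generates the permutations and shrinks failures; the invariant stays the same.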
3) Hermetic test harness (how tests run)
3.1 Standard test profiles
You already decided: Postgres is system-of-record, Valkey is ephemeral.
Define two mandatory execution profiles in CI:
- Default: Postgres + Valkey
- Air-gapped minimal: Postgres only
Both must pass.
3.2 Environment isolation
- Containers started with no network unless a test explicitly declares “online”.
- For Kubernetes e2e: apply a default-deny egress NetworkPolicy.
3.3 Golden corpora repository (your “truth set”)
Create a versioned `stellaops-test-corpus/` containing:
- container images (or image tarballs) pinned by digest
- SBOM expected outputs (CycloneDX + SPDX)
- VEX examples (vendor/distro/internal)
- vulnerability feed snapshots (pinned digests)
- policies + lattice rules + unknown budgets
- expected verdicts + delta verdicts
- reachability subgraphs as evidence
- negative fixtures: malformed SPDX, corrupted DSSE, missing digests, unsupported distros
Every corpus item includes a Run Manifest (see §4).
3.4 Artifact retention in CI
Every failing integration/e2e test uploads:
- run manifest
- offline bundle manifest + hashes
- logs (structured)
- produced SBOMs
- attestations
- verdict + delta verdict
- evidence index
This turns failures into audit-grade reproductions.
4) Core artifacts that tests must validate
4.1 Run Manifest (replay key)
A scan run is defined by:
- artifact digests (image/config/layers, or binary hash)
- SBOM digests produced/consumed
- vuln feed snapshot digest(s)
- policy version + lattice rules digest
- tool versions (scanner, parsers, reachability engine)
- crypto profile (roots, key IDs, algorithm set)
- environment profile (postgres-only vs postgres+valkey)
- seed + canonicalization version
Test invariant: re-running the same manifest produces byte-identical verdict and same evidence references.
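A run manifest like the one above reduces naturally to a single content-addressed replay key; the field names are a sketch of the inputs listed in §4.1:

```python
# Sketch: derive a replay key by hashing the canonicalized run manifest.
# Any change to any input (seed, policy, feed digest) yields a new key.
import hashlib
import json

def replay_key(manifest: dict) -> str:
    payload = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return "run:sha256:" + hashlib.sha256(payload.encode()).hexdigest()

manifest = {
    "artifact_digests": ["sha256:img"],
    "sbom_digests": ["sha256:s1"],
    "feed_snapshot": "sha256:feed",
    "policy_version": "policy-v7",
    "tool_versions": {"scanner": "2.4.1"},
    "seed": 42,
    "canonicalization": "v1",
}
assert replay_key(manifest) == replay_key(dict(manifest))  # same inputs, same key
```

Indexing stored verdicts by this key makes the replay invariant mechanically checkable: look up the key, re-run, byte-compare.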
4.2 Offline Bundle Manifest
Bundle includes:
- feeds + indexes
- policies + lattice rule sets
- trust roots, intermediate CAs, timestamp roots (as needed)
- crypto provider modules (for sovereign readiness)
- optional: Rekor mirror snapshot / inclusion proofs cache
Test invariant: offline scan is blocked if bundle is missing required parts; error is explicit and counts as “unknown” only where policy says so.
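The "explicit error on missing parts" invariant can be sketched as a manifest completeness check; the part names and error type are assumptions for illustration:

```python
# Sketch: a typed error names exactly which bundle parts are missing,
# rather than crashing or silently proceeding.
REQUIRED_PARTS = {"feeds", "policies", "trust_roots"}  # illustrative names

class BundleError(Exception):
    """Raised when an offline bundle is missing required parts."""

def check_bundle(manifest: dict) -> None:
    present = {p["name"] for p in manifest["parts"]}
    missing = REQUIRED_PARTS - present
    if missing:
        raise BundleError(f"offline bundle missing: {sorted(missing)}")

try:
    check_bundle({"parts": [{"name": "feeds"}, {"name": "policies"}]})
except BundleError as e:
    assert "trust_roots" in str(e)  # the error names the missing part
```

The negative fixture in the corpus is then just a bundle manifest with one part deleted, and the test asserts the typed error, not a crash.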
4.3 Evidence Index
The verdict is not the product; the product is verdict + evidence graph:
- pointers to SBOM, VEX, reachability proofs, attestations
- their digests and verification status
- unknowns list with codes + remediation hints
Test invariant: every “not affected” claim has required evidence hooks per policy (“because feature flag off” etc.), otherwise becomes unknown/fail.
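The "no evidence → downgrade to unknown" rule might look like the following; the statuses, field names, and unknown code are assumptions, not the real schema:

```python
# Sketch: 'not_affected' VEX claims without evidence hooks are downgraded
# to 'unknown' so they count against the unknowns budget.
def normalize_claims(claims: list[dict]) -> list[dict]:
    out = []
    for claim in claims:
        if claim["status"] == "not_affected" and not claim.get("evidence"):
            claim = {**claim, "status": "unknown",
                     "unknown_code": "VEX_EVIDENCE_MISSING"}  # hypothetical code
        out.append(claim)
    return out

claims = [
    {"vuln": "CVE-2024-0001", "status": "not_affected",
     "evidence": ["reachability:absent"]},
    {"vuln": "CVE-2024-0002", "status": "not_affected"},
]
result = normalize_claims(claims)
assert result[0]["status"] == "not_affected"  # evidence present, claim stands
assert result[1]["status"] == "unknown"       # no evidence, downgraded
```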
5) Required E2E flows (minimum set)
These are your release blockers.
Flow A: Air-gapped scan and verdict
- Inputs: image tarball + offline bundle
- Network: disabled
- Output: SBOM (CycloneDX + SPDX), attestations, verdict
- Assertions:
- no network calls occurred
- verdict references bundle digest + feed snapshot digest
- unknowns within budget
- evidence index complete
Flow B: SBOM interop round-trip
- Produce SBOM via your pipeline
- Attach SBOM attestation (DSSE/cosign format)
- Consumer (Grype-from-SBOM) reads SBOM and produces findings
- Assertions:
- consumer can parse SBOM
- findings parity within defined tolerance
- verdict references exact SBOM digest used by consumer
Flow C: Deterministic replay
- Run scan → store run manifest + outputs
- Run again from same manifest
- Assertions:
- verdict bytes identical
- evidence index identical (except allowed “execution metadata” section)
- delta verdict is “empty delta”
Flow D: Diff-aware delta verdict (smart-diff)
- Two versions of same image with controlled change (one dependency bump)
- Assertions:
- delta verdict contains only changed nodes/edges
- risk budget computation based on delta matches expected
- signed delta verdict validates and is OCI-attached
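The node/edge delta in Flow D reduces to set arithmetic over the two dependency graphs; a minimal sketch (graph shape is illustrative):

```python
# Sketch: compute added/removed nodes and edges between two dependency
# graphs. A single dependency bump should produce exactly one node swap.
def graph_delta(before: dict, after: dict) -> dict:
    b_nodes, a_nodes = set(before["nodes"]), set(after["nodes"])
    b_edges, a_edges = set(before["edges"]), set(after["edges"])
    return {
        "nodes_added": sorted(a_nodes - b_nodes),
        "nodes_removed": sorted(b_nodes - a_nodes),
        "edges_added": sorted(a_edges - b_edges),
        "edges_removed": sorted(b_edges - a_edges),
    }

before = {"nodes": ["app", "libfoo@1.0"], "edges": [("app", "libfoo@1.0")]}
after = {"nodes": ["app", "libfoo@1.1"], "edges": [("app", "libfoo@1.1")]}
delta = graph_delta(before, after)
assert delta["nodes_added"] == ["libfoo@1.1"]   # only the bumped dependency
assert delta["nodes_removed"] == ["libfoo@1.0"]
```

The "empty delta on replay" assertion in Flow C is the degenerate case: `graph_delta(g, g)` must return four empty lists.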
Flow E: Unknowns budget gates
- Inject unknowns (unmapped package, missing distro metadata, ambiguous CPE)
- Policy:
- prod budget = 0
- staging budget = N
- Assertions:
- prod fails, staging passes
- unknowns appear in attestation and UI evidence
Flow F: Router backpressure under burst
- Spike requests to a single router instance + environment bucket
- Assertions:
- 429/503 with Retry-After emitted correctly
- clients back off; no request loss
- metrics expose throttling reasons
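Client-side Retry-After compliance can be modeled deterministically with scripted responses (no real HTTP involved; the tuple shape is an assumption for the sketch):

```python
# Sketch: a client that honors Retry-After. Each response is
# (status, retry_after_seconds). The simulated client records every
# wait it would perform, so the test can assert exact backoff behavior.
def send_with_backoff(responses, max_attempts=5):
    waits = []
    for status, retry_after in responses[:max_attempts]:
        if status not in (429, 503):
            return status, waits
        waits.append(retry_after)  # client must sleep exactly this long
    return None, waits  # gave up: request is requeued, never dropped

# Two throttles, then success.
status, waits = send_with_backoff([(429, 1), (503, 4), (200, None)])
assert status == 200
assert waits == [1, 4]  # backed off per Retry-After each time
```

The "no request loss" assertion then checks that every exhausted request ends in the requeue path, never in a drop.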
Flow G: Evidence export (“audit pack”)
- Run scan
- Export a sealed audit pack (bundle + run manifest + evidence + verdict)
- Import elsewhere (clean environment)
- Assertions:
- replay produces identical verdict
- signatures verify under imported trust roots
6) Module-specific test requirements
6.1 scanner.webservice (lattice + policy decisioning)
Must have:
- unit tests for lattice merge algebra
- property tests: declared commutativity/associativity/idempotency
- integration tests that merge vendor/distro/internal VEX and confirm precedence rules are policy-driven
Critical invariant tests:
- “Vendor > distro > internal” must be demonstrably configurable, and wrong merges must fail deterministically.
6.2 Boundary enforcement: Concelier & Excitors preserve prune source
Add a “behavioral boundary suite”:
- Instrument events/telemetry that record where merges happened.
- Feed in conflicting VEX claims and assert:
- Concelier/Excitors do not resolve conflicts; they retain provenance and “prune source”
- only `scanner.webservice` produces the final merged semantics
If Concelier/Excitors output a resolved claim, the test fails.
6.3 Router backpressure and DPoP/nonce rate limiting
- deterministic unit tests for token bucket math
- time-controlled tests (virtual clock)
- integration tests with Valkey + Postgres-only fallbacks
- chaos tests: Valkey down → router degrades gracefully (local per-instance limiter still works)
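The "deterministic token bucket math with a virtual clock" requirement looks roughly like this; shown in Python for brevity, with the clock injected as a parameter rather than read from the system:

```python
# Sketch: token bucket with an injected clock. Tests pass explicit 'now'
# values, so refill behavior is fully time-controlled and deterministic.
class TokenBucket:
    def __init__(self, capacity: float, refill_per_sec: float, now: float = 0.0):
        self.capacity = capacity
        self.refill = refill_per_sec
        self.tokens = capacity
        self.last = now

    def allow(self, now: float) -> bool:
        # Refill proportionally to elapsed virtual time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

bucket = TokenBucket(capacity=2, refill_per_sec=1, now=0.0)
assert bucket.allow(0.0) and bucket.allow(0.0)  # burst up to capacity
assert not bucket.allow(0.0)                    # bucket drained
assert bucket.allow(1.0)                        # one token refilled after 1s
```

The same injected-clock pattern carries over to the .NET implementation (a `TimeProvider`-style abstraction), which is what makes the chaos and fallback tests repeatable.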
6.4 Storage (Postgres) + Valkey accelerator
- migration tests: schema upgrades forward/backward in CI
- replay tests: Postgres-only profile yields same verdict bytes
- consistency tests: Valkey cache misses never change decision outcomes, only latency
6.5 UI evidence rendering
- reducer snapshot tests for:
- reachability subgraph rendering (large graphs)
- VEX chip states: affected/not-affected/under-investigation/unknown
- performance budgets:
- large graph render under threshold (define and enforce)
- contract tests against evidence index schema
7) Non-functional test program
7.1 Performance and scale tests
Define standard workloads:
- small image (200 packages)
- medium (2k packages)
- large (20k+ packages)
- “monorepo container” worst case (50k+ nodes graph)
Metrics collected:
- p50/p95/p99 scan time
- memory peak
- DB write volume
- evidence pack size
- router throughput + throttle rate
Add regression gates:
- no more than X% slowdown in p95 vs baseline
- no more than Y% growth in evidence pack size for unchanged inputs
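The p95 regression gate is simple percentile math; a sketch using the nearest-rank method (the 10% default threshold is illustrative):

```python
# Sketch: gate a build on p95 latency regression vs a stored baseline.
import math

def p95(samples: list[float]) -> float:
    """Nearest-rank 95th percentile."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(0.95 * len(ordered)))
    return ordered[rank - 1]

def regression_gate(baseline: list[float], current: list[float],
                    max_slowdown_pct: float = 10.0) -> bool:
    """Pass iff current p95 is within max_slowdown_pct of baseline p95."""
    return p95(current) <= p95(baseline) * (1 + max_slowdown_pct / 100)

baseline = [100.0] * 95 + [200.0] * 5   # p95 = 100 ms
assert regression_gate(baseline, [105.0] * 100) is True    # +5% passes
assert regression_gate(baseline, [120.0] * 100) is False   # +20% fails
```

Storing the baseline samples (not just the computed p95) in the corpus lets you re-derive the gate when the percentile method changes.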
7.2 Chaos and reliability
Run chaos suites weekly/nightly:
- kill scanner during run → resume/retry semantics deterministic
- restart Postgres mid-run → job fails with explicit retryable state
- corrupt offline bundle file → fails with typed error, not crash
- burst router + slow downstream → confirms backpressure not meltdown
7.3 Security robustness tests
- fuzz parsers: SPDX, CycloneDX, VEX, DSSE envelopes
- zip/tar bomb defenses (artifact ingestion)
- signature bypass attempts:
- mismatched digest
- altered payload with valid signature on different content
- wrong root chain
- SSRF defense: any URL fields in SBOM/VEX are treated as data, never fetched in offline mode
8) CI/CD gating rules (what blocks a release)
Release candidate is blocked if any of these fail:
- All mandatory E2E flows (§5) pass in both profiles:
- Postgres-only
- Postgres+Valkey
- Deterministic replay suite:
- zero non-deterministic diffs in verdict bytes
- allowed diff list is explicit and reviewed
- Interop suite:
- CycloneDX 1.6 and SPDX 3.0.1 round-trips succeed
- consumer scanner compatibility tests pass
- Risk budgets + unknowns budgets:
- must pass on corpus, and no regressions against baseline
- Backpressure correctness:
- Retry-After compliance and throttle metrics validated
- Performance regression budgets:
- no breach of p95/memory budgets on standard workloads
- Flakiness threshold:
- if a test flakes more than N times per week, it is quarantined and release is blocked until a deterministic root cause is established (quarantine is allowed only for non-blocking suites, never for §5 flows)
9) Implementation blueprint (how to build this test program)
Phase 0: Harness and corpus
- Stand up test harness: docker compose + Testcontainers (.NET xUnit)
- Create corpus repo with 10–20 curated artifacts
- Implement run manifest + evidence index capture in all tests
Phase 1: Determinism and replay
- canonicalization utilities + golden verdict bytes
- replay runner that loads manifest and replays end-to-end
- add property-based tests for ordering and merge invariants
Phase 2: Offline e2e + interop
- offline bundle builder + strict “no egress” enforcement
- SBOM attestation round-trip + consumer parsing suite
Phase 3: Unknowns budgets + delta verdict
- unknown taxonomy everywhere (UI + attestations)
- delta verdict generation and signing
- diff-aware release gates
Phase 4: Backpressure + chaos + performance
- router throttle chaos suite
- scale tests with standard workloads and baselines
Phase 5: Audit packs + time-travel snapshots
- sealed export/import
- one-command replay for auditors
10) What you should standardize immediately
If you do only three things, do these:
- Run Manifest as first-class test artifact
- Golden corpus that pins all digests (feeds, policies, images, expected outputs)
- “No egress” default in CI with explicit opt-in for online tests
Everything else becomes far easier once these are in place.
If you want, I can also produce a concrete repository layout and CI job matrix (xUnit categories, docker compose profiles, artifact retention conventions, and baseline benchmark scripts) that matches .NET 10 conventions and your Postgres/Valkey profiles.