save progress

This commit is contained in:
StellaOps Bot
2025-12-20 12:15:16 +02:00
parent 439f10966b
commit 0ada1b583f
95 changed files with 12400 additions and 65 deletions

Here's a compact, practical plan to harden StellaOps around **offline-ready security evidence and deterministic verdicts**, with just enough background so it all clicks.
---
# Why this matters (quick primer)
* **Air-gapped/offline**: Many customers can't reach public feeds or registries. Your scanners, SBOM tooling, and attestations must work with **pre-synced bundles** and prove what data they used.
* **Interoperability**: Teams mix tools (Syft/Grype/Trivy, cosign, CycloneDX/SPDX). Your CI should **round-trip** SBOMs and attestations end-to-end and prove that downstream consumers (e.g., Grype) can load them.
* **Determinism**: Auditors expect **“same inputs → same verdict.”** Capture inputs, policies, and feed hashes so a verdict is exactly reproducible later.
* **Operational guardrails**: Shipping gates should fail early on **unknowns** and apply **backpressure** gracefully when load spikes.
---
# E2E test themes to add (what to build)
1. **Air-gapped operation e2e**
* Package “offline bundle” (vuln feeds, package catalogs, policy/lattice rules, certs, keys).
* Run scans (containers, OS, language deps, binaries) **without network**.
* Assert: SBOMs generated, attestations signed/verified, verdicts emitted.
* Evidence: manifest of bundle contents + hashes in the run log.
2. **Interop round-trips (SBOM ⇄ attestation ⇄ scanner)**
* Produce SBOM (CycloneDX 1.6 and SPDX 3.0.1) with Syft.
* Create **DSSE/cosign** attestation for that SBOM.
* Verify consumer tools:
* **Grype** scans **from SBOM** (no image pull) and respects attestations.
* Verdict references the exact SBOM digest and attestation chain.
* Assert: consumers load, validate, and produce identical findings vs direct scan.
3. **Replayability (delta-verdicts + strict replay)**
* Store input set: artifact digest(s), SBOM digests, policy version, feed digests, lattice rules, tool versions.
* Rerun later; assert **byte-identical verdict** and same “delta-verdict” when inputs unchanged.
4. **Unknowns-budget policy gates**
* Inject controlled “unknown” conditions (missing CPE mapping, unresolved package source, unparsed distro).
* Gate: **fail build if unknowns > budget** (e.g., prod=0, staging≤N).
* Assert: UI, CLI, and attestation all record unknown counts and gate decision.
5. **Attestation round-trip & validation**
* Produce: build provenance (in-toto/DSSE), SBOM attest, VEX attest, final **verdict attest**.
* Verify: signature (cosign), certificate chain, timestamping, Rekor-style (or mirror) inclusion when online; cached proofs when offline.
* Assert: each attestation is linked in the verdict's evidence index.
6. **Router backpressure chaos (HTTP 429/503 + Retry-After)**
* Load tests that trigger per-instance and per-environment limits.
* Assert: clients back off per **Retry-After** (a client backoff sketch follows this list), queues drain, no data loss, latencies bounded; UI shows throttling reason.
7. **UI reducer tests for reachability & VEX chips**
* Component tests: large SBOM graphs, focused **reachability subgraphs**, and VEX status chips (affected/not-affected/under-investigation).
* Assert: stable rendering under 50k+ nodes; interactions remain <200ms.
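Picking up theme 6 above: a minimal sketch of the client-side behaviour the backpressure tests assert, i.e. honoring Retry-After on 429/503 before retrying. `SendWithBackoffAsync`, `makeRequest`, and `maxAttempts` are illustrative names, not an existing StellaOps API.
```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

static class BackpressureClient
{
    public static async Task<HttpResponseMessage> SendWithBackoffAsync(
        HttpClient client, Func<HttpRequestMessage> makeRequest, int maxAttempts = 5)
    {
        for (var attempt = 1; ; attempt++)
        {
            // A fresh request per attempt: an HttpRequestMessage cannot be reused once sent.
            var response = await client.SendAsync(makeRequest());
            var throttled = response.StatusCode is HttpStatusCode.TooManyRequests
                                                or HttpStatusCode.ServiceUnavailable;
            if (!throttled || attempt >= maxAttempts)
                return response;

            // Prefer the server-provided Retry-After; fall back to exponential backoff.
            var delay = response.Headers.RetryAfter?.Delta
                        ?? TimeSpan.FromSeconds(Math.Pow(2, attempt));
            await Task.Delay(delay);
        }
    }
}
```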
---
# Next-week checklist (do these now)
1. **Delta-verdict replay tests**: golden corpus; lock tool+feed versions; assert bit-for-bit verdict.
2. **Unknowns-budget gates in CI**: policy + failing examples; surface in PR checks and UI.
3. **SBOM attestation round-trip**: Syft → cosign attest → Grype consume-from-SBOM; verify signatures & digests.
4. **Router backpressure chaos**: scripted spike; verify 429/503 + Retry-After handling and metrics.
5. **UI reducer tests**: reachability graph snapshots; VEX chip states; regression suite.
---
# Minimal artifacts to standardize (so tests are boring—good!)
* **Offline bundle spec**: `bundle.json` with content digests (feeds, policies, keys).
* **Evidence manifest**: machine-readable index linking verdict → SBOM digest → attestation IDs → tool versions.
* **Delta-verdict schema**: captures before/after graph deltas, rule evals, and final gate result.
* **Unknowns taxonomy**: codes (e.g., `PKG_SOURCE_UNKNOWN`, `CPE_AMBIG`) with severities and budgets.
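A sketch of how this taxonomy and the budgets might look as .NET records; only `PKG_SOURCE_UNKNOWN` and `CPE_AMBIG` come from the list above, and the severities, remediation strings, and the staging budget of 5 are illustrative placeholders.
```csharp
public enum UnknownSeverity { Info, Warn, Block }

public sealed record UnknownCode(string Code, UnknownSeverity Severity, string Remediation);

public sealed record UnknownsBudget(string Environment, int MaxUnknowns);

public static class UnknownsCatalog
{
    public static readonly UnknownCode PkgSourceUnknown =
        new("PKG_SOURCE_UNKNOWN", UnknownSeverity.Block, "Resolve the package's upstream source.");

    public static readonly UnknownCode CpeAmbig =
        new("CPE_AMBIG", UnknownSeverity.Warn, "Disambiguate the CPE mapping for this component.");

    // Budgets matching the gate above: prod = 0, staging <= N (N is a placeholder).
    public static readonly UnknownsBudget Prod = new("prod", MaxUnknowns: 0);
    public static readonly UnknownsBudget Staging = new("staging", MaxUnknowns: 5);
}
```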
---
# CI wiring (quick sketch)
* **Jobs**: `offline-e2e`, `interop-e2e`, `replayable-verdicts`, `unknowns-gate`, `router-chaos`, `ui-reducers`.
* **Matrix**: {Debian/Alpine/RHEL-like} × {amd64/arm64} × {CycloneDX/SPDX}.
* **Cache discipline**: pin tool versions, vendor feeds to content-addressed store.
---
# Fast success criteria (green = done)
* Can run **full scan + attest + verify** with **no network**.
* Rerunning a fixed input set yields **identical verdict**.
* Grype (from SBOM) matches image scan results within tolerance.
* Builds auto-fail when **unknowns budget exceeded**.
* Router under burst emits **correct Retry-After** and recovers cleanly.
* UI handles huge graphs; VEX chips never desync from evidence.
If you want, I'll turn this into GitLab/Gitea pipeline YAML + a tiny sample repo (image, SBOM, policies, and goldens) so your team can plug-and-play.
Below is a complete, end-to-end testing strategy for Stella Ops that turns your moats (offline readiness, deterministic replayable verdicts, lattice/policy decisioning, attestation provenance, unknowns budgets, router backpressure, UI reachability evidence) into continuously verified guarantees.
---
## 1) Non-negotiable test principles
### 1.1 Determinism as a testable contract
A scan/verdict is *deterministic* iff **same inputs → byte-identical outputs** across time and machines (within defined tolerances, e.g. timestamps are captured as evidence rather than embedded in the payload or its ordering).
**Determinism controls (must be enforced by tests):**
* Canonical JSON (stable key order, stable array ordering where semantically unordered); a hashing sketch follows this list.
* Stable sorting for:
* packages/components
* vulnerabilities
* edges in graphs
* evidence lists
* Time is an *input*, never implicit:
* stamp times in a dedicated evidence field; they must never affect hashing/verdict evaluation.
* PRNG uses explicit seed; seed stored in run manifest.
* Tool versions + feed digests + policy versions are inputs.
* Locale/encoding invariants: UTF-8 everywhere; invariant culture in .NET.
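A minimal sketch of the canonical-JSON control above: recursively sort object keys, serialize compactly, and hash the UTF-8 bytes. It assumes arrays are already in canonical order upstream and relies on `JsonNode.DeepClone` (.NET 8+); names are illustrative.
```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;
using System.Text.Json.Nodes;

static class CanonicalJson
{
    // Hash of the canonical form: sorted object keys, compact output, UTF-8 bytes.
    public static string VerdictHash(string verdictJson)
    {
        var canonical = Canonicalize(JsonNode.Parse(verdictJson))!;
        var bytes = Encoding.UTF8.GetBytes(canonical.ToJsonString());
        return Convert.ToHexString(SHA256.HashData(bytes));
    }

    static JsonNode? Canonicalize(JsonNode? node) => node switch
    {
        JsonObject obj => new JsonObject(
            obj.OrderBy(p => p.Key, StringComparer.Ordinal)
               .Select(p => KeyValuePair.Create(p.Key, Canonicalize(p.Value)))),
        JsonArray arr => new JsonArray(arr.Select(Canonicalize).ToArray()),
        _ => node?.DeepClone()   // leaf values (string/number/bool/null) pass through
    };
}
```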
### 1.2 Offline by default
Every CI job (except those explicitly tagged “online”) runs with **no egress**.
* Offline bundle is mandatory input for scanning.
* Any attempted network call fails the test (proves air-gap compliance).
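One hedged way to enforce this rule inside the test process (complementing container-level network isolation): give every `HttpClient` under test a handler that fails loudly on any outbound call. `NoEgressHandler` is an illustrative name, not an existing StellaOps type.
```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public sealed class NoEgressHandler : HttpMessageHandler
{
    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken) =>
        throw new InvalidOperationException(
            $"Air-gap violation: attempted network call to {request.RequestUri}");
}

// Usage in a test fixture (illustrative):
// var client = new HttpClient(new NoEgressHandler());
```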
### 1.3 Evidence-first validation
No assertion is “verdict == pass” without verifying the chain of evidence:
* verdict references SBOM digest(s)
* SBOM references artifact digest(s)
* VEX claims reference vulnerabilities + components + reachability evidence
* attestations verify cryptographically and chain to configured roots.
### 1.4 Interop is required, not “nice to have”
Stella Ops must round-trip with:
* SBOM: CycloneDX 1.6 and SPDX 3.0.1
* Attestation: DSSE / in-toto style envelopes, cosign-compatible flows
* Consumer scanners: at least Grype from SBOM; ideally Trivy as cross-check
Interop tests are treated as compatibility contracts and block releases.
### 1.5 Architectural boundary enforcement (your standing rule)
* Lattice/policy merge algorithms run **in `scanner.webservice`**.
* `Concelier` and `Excitors` must preserve “prune source”.
This is enforced with tests that detect forbidden behavior (see §6.2).
---
## 2) The test portfolio (what kinds of tests exist)
Think “coverage by risk”, not “coverage by lines”.
### 2.1 Test layers and what they prove
1. **Unit tests** (fast, deterministic)
* Canonicalization, hashing, semantic version range ops
* Graph delta algorithms
* Policy rule evaluation primitives
* Unknowns taxonomy + budgeting math
* Evidence index assembly
2. **Property-based tests** (FsCheck; a sketch follows this list)
* Reordering inputs does not change verdict hash
* Graph merge is associative/commutative where policy declares it
* Unknown counts are always monotonic with missing evidence (more missing evidence never reduces the unknown count)
* Parser robustness: arbitrary JSON for SBOM/VEX envelopes never crashes
3. **Component tests** (service + Postgres; optional Valkey)
* `scanner.webservice` lattice merge and replay
* Feed loader and cache behavior (offline feeds)
* Router backpressure decision logic
* Attestation verification modules
4. **Contract tests** (API compatibility)
* OpenAPI/JSON schema compatibility for public endpoints
* Evidence manifest schema backward compatibility
* OCI artifact layout compatibility (attestation attachments)
5. **Integration tests** (multi-service)
* Router → scanner.webservice → attestor → storage
* Offline bundle import/export
* Knowledge snapshot → time-travel → replay pipeline
6. **End-to-end tests** (realistic flows)
* scan an image → generate SBOM → produce attestations → decision verdict → UI evidence extraction
* interop consumers load SBOM and confirm findings parity
7. **Non-functional tests**
* Performance & scale (throughput, memory, large SBOM graphs)
* Chaos/fault injection (DB restarts, queue spikes, 429/503 backpressure)
* Security tests (fuzzers, decompression bomb defense, signature bypass resistance)
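As an example of layer 2 above, a hedged FsCheck.Xunit sketch of the “reordering inputs does not change the verdict hash” property. `BuildVerdictHash` is a stand-in for the real pipeline entry point; here it simply sorts components before hashing, which is the canonical-ordering behaviour the property exercises.
```csharp
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;
using FsCheck.Xunit;

public class DeterminismProperties
{
    // Stand-in for the real verdict pipeline: canonical order, then hash.
    static string BuildVerdictHash(string[] components)
    {
        var canonical = string.Join("\n", components.OrderBy(c => c, StringComparer.Ordinal));
        return Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(canonical)));
    }

    [Property]
    public bool ReorderingInputsDoesNotChangeVerdictHash(string[] components, int seed)
    {
        components ??= Array.Empty<string>();
        var nonNull = components.Where(c => c is not null).ToArray();
        var rng = new Random(seed);                        // explicit seed, as required above
        var shuffled = nonNull.OrderBy(_ => rng.Next()).ToArray();
        return BuildVerdictHash(nonNull) == BuildVerdictHash(shuffled);
    }
}
```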
---
## 3) Hermetic test harness (how tests run)
### 3.1 Standard test profiles
You already decided: **Postgres is system-of-record**, **Valkey is ephemeral**.
Define two mandatory execution profiles in CI:
1. **Default**: Postgres + Valkey
2. **Air-gapped minimal**: Postgres only
Both must pass.
### 3.2 Environment isolation
* Containers started with **no network** unless a test explicitly declares “online”.
* For Kubernetes e2e: apply a default-deny egress NetworkPolicy.
### 3.3 Golden corpora repository (your “truth set”)
Create a versioned `stellaops-test-corpus/` containing:
* container images (or image tarballs) pinned by digest
* SBOM expected outputs (CycloneDX + SPDX)
* VEX examples (vendor/distro/internal)
* vulnerability feed snapshots (pinned digests)
* policies + lattice rules + unknown budgets
* expected verdicts + delta verdicts
* reachability subgraphs as evidence
* negative fixtures: malformed SPDX, corrupted DSSE, missing digests, unsupported distros
Every corpus item includes a **Run Manifest** (see §4).
### 3.4 Artifact retention in CI
Every failing integration/e2e test uploads:
* run manifest
* offline bundle manifest + hashes
* logs (structured)
* produced SBOMs
* attestations
* verdict + delta verdict
* evidence index
This turns failures into audit-grade reproductions.
---
## 4) Core artifacts that tests must validate
### 4.1 Run Manifest (replay key)
A scan run is defined by:
* artifact digests (image/config/layers, or binary hash)
* SBOM digests produced/consumed
* vuln feed snapshot digest(s)
* policy version + lattice rules digest
* tool versions (scanner, parsers, reachability engine)
* crypto profile (roots, key IDs, algorithm set)
* environment profile (postgres-only vs postgres+valkey)
* seed + canonicalization version
**Test invariant:** re-running the same manifest produces **byte-identical verdict** and **same evidence references**.
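A hypothetical .NET shape for the Run Manifest; field names are illustrative, and the point is simply that every replay-relevant input from the list above is captured explicitly and can itself be canonicalized and hashed.
```csharp
using System.Collections.Immutable;

public sealed record RunManifest(
    ImmutableArray<string> ArtifactDigests,        // image/config/layers or binary hashes
    ImmutableArray<string> SbomDigests,            // SBOMs produced or consumed
    ImmutableArray<string> FeedSnapshotDigests,    // vuln feed snapshots
    string PolicyVersion,
    string LatticeRulesDigest,
    ImmutableSortedDictionary<string, string> ToolVersions,
    string CryptoProfile,                          // roots, key IDs, algorithm set
    string EnvironmentProfile,                     // "postgres-only" or "postgres+valkey"
    int Seed,
    string CanonicalizationVersion);

// Replay invariant: running the same RunManifest twice must yield a byte-identical
// verdict and the same evidence references (Flow C below exercises exactly this).
```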
### 4.2 Offline Bundle Manifest
Bundle includes:
* feeds + indexes
* policies + lattice rule sets
* trust roots, intermediate CAs, timestamp roots (as needed)
* crypto provider modules (for sovereign readiness)
* optional: Rekor mirror snapshot / inclusion proofs cache
**Test invariant:** offline scan is blocked if bundle is missing required parts; error is explicit and counts as unknown only where policy says so.
### 4.3 Evidence Index
The verdict is not the product; the product is verdict + evidence graph:
* pointers to SBOM, VEX, reachability proofs, attestations
* their digests and verification status
* unknowns list with codes + remediation hints
**Test invariant:** every “not affected” claim has the required evidence hooks per policy (“because feature flag off”, etc.), otherwise it becomes unknown/fail.
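A hedged sketch of that invariant: a “not affected” claim missing any policy-required evidence hook is downgraded to unknown. `VexClaim`, `EvidenceIndexChecks`, and the status strings are illustrative, not the real StellaOps model.
```csharp
using System.Collections.Generic;
using System.Linq;

public sealed record VexClaim(string VulnId, string Status, IReadOnlyList<string> EvidenceHooks);

public static class EvidenceIndexChecks
{
    public static string EffectiveStatus(VexClaim claim, IReadOnlyCollection<string> requiredHooks)
    {
        if (claim.Status != "not_affected")
            return claim.Status;

        // Every policy-required hook (e.g. "feature flag off") must be present.
        var missing = requiredHooks.Except(claim.EvidenceHooks).ToList();
        return missing.Count == 0 ? "not_affected" : "unknown";
    }
}
```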
---
## 5) Required E2E flows (minimum set)
These are your release blockers.
### Flow A: Air-gapped scan and verdict
* Inputs: image tarball + offline bundle
* Network: disabled
* Output: SBOM (CycloneDX + SPDX), attestations, verdict
* Assertions:
* no network calls occurred
* verdict references bundle digest + feed snapshot digest
* unknowns within budget
* evidence index complete
### Flow B: SBOM interop round-trip
* Produce SBOM via your pipeline
* Attach SBOM attestation (DSSE/cosign format)
* Consumer (Grype-from-SBOM) reads SBOM and produces findings
* Assertions:
* consumer can parse SBOM
* findings parity within defined tolerance
* verdict references exact SBOM digest used by consumer
### Flow C: Deterministic replay
* Run scan → store run manifest + outputs
* Run again from same manifest
* Assertions:
* verdict bytes identical
* evidence index identical (except allowed execution metadata section)
* delta verdict is an empty delta
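An illustrative xUnit shape for Flow C; `TestCorpus`, `ScannerHarness`, and the corpus path are assumed test-harness helpers, not existing StellaOps APIs.
```csharp
using System;
using System.Threading.Tasks;
using Xunit;

public class DeterministicReplayTests
{
    [Fact]
    public async Task ReplayFromSameManifestIsByteIdentical()
    {
        var manifest = TestCorpus.LoadRunManifest("flow-c/run-manifest.json"); // hypothetical corpus path
        var first = await ScannerHarness.RunScannerAsync(manifest);
        var second = await ScannerHarness.RunScannerAsync(manifest);

        // Byte-identical verdict (compared as hex so failures show the differing digests).
        Assert.Equal(Convert.ToHexString(first.VerdictBytes), Convert.ToHexString(second.VerdictBytes));
        // Evidence index identical except the allowed execution-metadata section.
        Assert.Equal(first.EvidenceIndexCanonical, second.EvidenceIndexCanonical);
        // Empty delta verdict.
        Assert.True(ScannerHarness.ComputeDelta(first, second).IsEmpty);
    }
}
```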
### Flow D: Diff-aware delta verdict (smart-diff)
* Two versions of same image with controlled change (one dependency bump)
* Assertions:
* delta verdict contains only changed nodes/edges
* risk budget computation based on delta matches expected
* signed delta verdict validates and is OCI-attached
### Flow E: Unknowns budget gates
* Inject unknowns (unmapped package, missing distro metadata, ambiguous CPE)
* Policy:
* prod budget = 0
* staging budget = N
* Assertions:
* prod fails, staging passes
* unknowns appear in attestation and UI evidence
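The gate math itself can stay trivial and fully deterministic; a minimal sketch (type and method names are illustrative):
```csharp
using System.Collections.Generic;

public static class UnknownsGate
{
    // Returns the gate decision plus the count that must also appear in the
    // attestation and UI evidence.
    public static (bool Pass, int UnknownCount) Evaluate(
        IReadOnlyCollection<string> unknownCodes, int budget) =>
        (Pass: unknownCodes.Count <= budget, UnknownCount: unknownCodes.Count);
}

// prod budget = 0: any unknown fails the gate; staging budget = N: up to N unknowns pass.
// var prod    = UnknownsGate.Evaluate(unknowns, budget: 0);
// var staging = UnknownsGate.Evaluate(unknowns, budget: 5);
```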
### Flow F: Router backpressure under burst
* Spike requests to a single router instance + environment bucket
* Assertions:
* 429/503 with Retry-After emitted correctly
* clients back off; no request loss
* metrics expose throttling reasons
### Flow G: Evidence export (“audit pack”)
* Run scan
* Export a sealed audit pack (bundle + run manifest + evidence + verdict)
* Import elsewhere (clean environment)
* Assertions:
* replay produces identical verdict
* signatures verify under imported trust roots
---
## 6) Module-specific test requirements
### 6.1 `scanner.webservice` (lattice + policy decisioning)
Must have:
* unit tests for lattice merge algebra
* property tests: declared commutativity/associativity/idempotency
* integration tests that merge vendor/distro/internal VEX and confirm precedence rules are policy-driven
**Critical invariant tests:**
* “Vendor > distro > internal” must be demonstrably *configurable*, and wrong merges must fail deterministically.
### 6.2 Boundary enforcement: Concelier & Excitors preserve “prune source”
Add a “behavioral boundary suite”:
* instrument events/telemetry that record where merges happened
* feed in conflicting VEX claims and assert:
* Concelier/Excitors do not resolve conflicts; they retain provenance and “prune source”
* only `scanner.webservice` produces the final merged semantics
If Concelier/Excitors output a resolved claim, the test fails.
### 6.3 `Router` backpressure and DPoP/nonce rate limiting
* deterministic unit tests for token bucket math
* time-controlled tests (virtual clock)
* integration tests with Valkey + Postgres-only fallbacks
* chaos tests: Valkey down → router degrades gracefully (local per-instance limiter still works)
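A minimal token-bucket sketch matching the “virtual clock” requirement above: time is passed in, never read from the system clock, so unit tests control it exactly. Names are illustrative, not the router's real types.
```csharp
using System;

public sealed class TokenBucket
{
    private readonly double _capacity;
    private readonly double _refillPerSecond;
    private double _tokens;
    private DateTimeOffset _lastRefill;

    public TokenBucket(double capacity, double refillPerSecond, DateTimeOffset now)
    {
        _capacity = capacity;
        _refillPerSecond = refillPerSecond;
        _tokens = capacity;
        _lastRefill = now;
    }

    // Returns true if the request is admitted; otherwise the caller should emit
    // 429/503 with a Retry-After derived from the token deficit.
    public bool TryTake(DateTimeOffset now, out TimeSpan retryAfter)
    {
        var elapsed = (now - _lastRefill).TotalSeconds;
        _tokens = Math.Min(_capacity, _tokens + elapsed * _refillPerSecond);
        _lastRefill = now;

        if (_tokens >= 1)
        {
            _tokens -= 1;
            retryAfter = TimeSpan.Zero;
            return true;
        }

        retryAfter = TimeSpan.FromSeconds((1 - _tokens) / _refillPerSecond);
        return false;
    }
}
```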
### 6.4 Storage (Postgres) + Valkey accelerator
* migration tests: schema upgrades forward/backward in CI
* replay tests: Postgres-only profile yields same verdict bytes
* consistency tests: Valkey cache misses never change decision outcomes, only latency
### 6.5 UI evidence rendering
* reducer snapshot tests for:
* reachability subgraph rendering (large graphs)
* VEX chip states: affected/not-affected/under-investigation/unknown
* performance budgets:
* large graph render under threshold (define and enforce)
* contract tests against evidence index schema
---
## 7) Non-functional test program
### 7.1 Performance and scale tests
Define standard workloads:
* small image (200 packages)
* medium (2k packages)
* large (20k+ packages)
* “monorepo container” worst case (50k+ nodes graph)
Metrics collected:
* p50/p95/p99 scan time
* memory peak
* DB write volume
* evidence pack size
* router throughput + throttle rate
Add regression gates:
* no more than X% slowdown in p95 vs baseline
* no more than Y% growth in evidence pack size for unchanged inputs
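A sketch of the p95 regression gate (nearest-rank percentile; the slowdown budget is the X% placeholder above and would be pinned per workload):
```csharp
using System;
using System.Linq;

public static class PerfGate
{
    // Nearest-rank percentile over the collected samples.
    public static double Percentile(double[] samples, double p)
    {
        var sorted = samples.OrderBy(x => x).ToArray();
        var rank = (int)Math.Ceiling(p / 100.0 * sorted.Length) - 1;
        return sorted[Math.Clamp(rank, 0, sorted.Length - 1)];
    }

    // True when current p95 stays within the allowed slowdown over the baseline.
    public static bool WithinBudget(double[] currentMs, double baselineP95Ms, double maxSlowdownPct)
    {
        var currentP95 = Percentile(currentMs, 95);
        return currentP95 <= baselineP95Ms * (1 + maxSlowdownPct / 100.0);
    }
}
```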
### 7.2 Chaos and reliability
Run chaos suites weekly/nightly:
* kill scanner during run → resume/retry semantics deterministic
* restart Postgres mid-run → job fails with explicit retryable state
* corrupt offline bundle file → fails with typed error, not crash
* burst router + slow downstream → confirms backpressure not meltdown
### 7.3 Security robustness tests
* fuzz parsers: SPDX, CycloneDX, VEX, DSSE envelopes
* zip/tar bomb defenses (artifact ingestion)
* signature bypass attempts:
* mismatched digest
* altered payload with valid signature on different content
* wrong root chain
* SSRF defense: any URL fields in SBOM/VEX are treated as data, never fetched in offline mode
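For the zip/tar bomb defense above, a hedged sketch of a guarded zip extraction that caps total extracted bytes and per-entry expansion ratio and blocks path traversal; the limits are illustrative defaults, not StellaOps settings.
```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class SafeExtract
{
    public static void ExtractZip(string zipPath, string destDir,
        long maxTotalBytes = 1L << 30, double maxRatio = 100)
    {
        long total = 0;
        var root = Path.GetFullPath(destDir);
        using var archive = ZipFile.OpenRead(zipPath);
        foreach (var entry in archive.Entries)
        {
            if (string.IsNullOrEmpty(entry.Name)) continue; // directory entry

            // Reject implausible expansion ratios and a runaway total size.
            if (entry.CompressedLength > 0 &&
                (double)entry.Length / entry.CompressedLength > maxRatio)
                throw new InvalidDataException($"Suspicious compression ratio: {entry.FullName}");
            total += entry.Length;
            if (total > maxTotalBytes)
                throw new InvalidDataException("Archive exceeds extraction size budget");

            // Zip-slip guard: the resolved target must stay under destDir.
            var target = Path.GetFullPath(Path.Combine(root, entry.FullName));
            if (!target.StartsWith(root + Path.DirectorySeparatorChar, StringComparison.Ordinal))
                throw new InvalidDataException($"Path traversal attempt: {entry.FullName}");

            Directory.CreateDirectory(Path.GetDirectoryName(target)!);
            entry.ExtractToFile(target, overwrite: false);
        }
    }
}
```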
---
## 8) CI/CD gating rules (what blocks a release)
Release candidate is blocked if any of these fail:
1. All mandatory E2E flows (§5) pass in both profiles:
* Postgres-only
* Postgres+Valkey
2. Deterministic replay suite:
* zero non-deterministic diffs in verdict bytes
* allowed diff list is explicit and reviewed
3. Interop suite:
* CycloneDX 1.6 and SPDX 3.0.1 round-trips succeed
* consumer scanner compatibility tests pass
4. Risk budgets + unknowns budgets:
* must pass on corpus, and no regressions against baseline
5. Backpressure correctness:
* Retry-After compliance and throttle metrics validated
6. Performance regression budgets:
* no breach of p95/memory budgets on standard workloads
7. Flakiness threshold:
* if a test flakes more than N times per week, it is quarantined *and* release is blocked until a deterministic root cause is established (quarantine is allowed only for non-blocking suites, never for §5 flows)
---
## 9) Implementation blueprint (how to build this test program)
### Phase 0: Harness and corpus
* Stand up test harness: docker compose + Testcontainers (.NET xUnit)
* Create corpus repo with 10–20 curated artifacts
* Implement run manifest + evidence index capture in all tests
### Phase 1: Determinism and replay
* canonicalization utilities + golden verdict bytes
* replay runner that loads manifest and replays end-to-end
* add property-based tests for ordering and merge invariants
### Phase 2: Offline e2e + interop
* offline bundle builder + strict “no egress” enforcement
* SBOM attestation round-trip + consumer parsing suite
### Phase 3: Unknowns budgets + delta verdict
* unknown taxonomy everywhere (UI + attestations)
* delta verdict generation and signing
* diff-aware release gates
### Phase 4: Backpressure + chaos + performance
* router throttle chaos suite
* scale tests with standard workloads and baselines
### Phase 5: Audit packs + time-travel snapshots
* sealed export/import
* one-command replay for auditors
---
## 10) What you should standardize immediately
If you do only three things, do these:
1. **Run Manifest** as first-class test artifact
2. **Golden corpus** that pins all digests (feeds, policies, images, expected outputs)
3. **“No egress” default** in CI with explicit opt-in for online tests
Everything else becomes far easier once these are in place.
---
If you want, I can also produce a concrete repository layout and CI job matrix (xUnit categories, docker compose profiles, artifact retention conventions, and baseline benchmark scripts) that matches .NET 10 conventions and your Postgres/Valkey profiles.