save progress

This commit is contained in:
StellaOps Bot
2025-12-20 12:15:16 +02:00
parent 439f10966b
commit 0ada1b583f
95 changed files with 12400 additions and 65 deletions

Here's a compact, practical plan to harden StellaOps around **offline-ready security evidence and deterministic verdicts**, with just enough background so it all clicks.
---
# Why this matters (quick primer)
* **Air-gapped/offline**: Many customers can't reach public feeds or registries. Your scanners, SBOM tooling, and attestations must work with **pre-synced bundles** and prove what data they used.
* **Interoperability**: Teams mix tools (Syft/Grype/Trivy, cosign, CycloneDX/SPDX). Your CI should **round-trip** SBOMs and attestations end-to-end and prove that downstream consumers (e.g., Grype) can load them.
* **Determinism**: Auditors expect **“same inputs → same verdict.”** Capture inputs, policies, and feed hashes so a verdict is exactly reproducible later.
* **Operational guardrails**: Shipping gates should fail early on **unknowns** and apply **backpressure** gracefully when load spikes.
---
# E2E test themes to add (what to build)
1. **Air-gapped operation e2e**
* Package “offline bundle” (vuln feeds, package catalogs, policy/lattice rules, certs, keys).
* Run scans (containers, OS, language deps, binaries) **without network**.
* Assert: SBOMs generated, attestations signed/verified, verdicts emitted.
* Evidence: manifest of bundle contents + hashes in the run log.
2. **Interop round-trips (SBOM ⇄ attestation ⇄ scanner)**
* Produce SBOM (CycloneDX 1.6 and SPDX 3.0.1) with Syft.
* Create **DSSE/cosign** attestation for that SBOM.
* Verify consumer tools:
* **Grype** scans **from SBOM** (no image pull) and respects attestations.
* Verdict references the exact SBOM digest and attestation chain.
* Assert: consumers load, validate, and produce identical findings vs direct scan.
3. **Replayability (delta-verdicts + strict replay)**
* Store input set: artifact digest(s), SBOM digests, policy version, feed digests, lattice rules, tool versions.
* Rerun later; assert **byte-identical verdict** and same “delta-verdict” when inputs unchanged.
4. **Unknowns-budget policy gates**
* Inject controlled “unknown” conditions (missing CPE mapping, unresolved package source, unparsed distro).
* Gate: **fail build if unknowns > budget** (e.g., prod=0, staging≤N).
* Assert: UI, CLI, and attestation all record unknown counts and gate decision.
5. **Attestation round-trip & validation**
* Produce: build provenance (in-toto/DSSE), SBOM attest, VEX attest, final **verdict attest**.
* Verify: signature (cosign), certificate chain, timestamping, Rekor-style (or mirror) inclusion when online; cached proofs when offline.
* Assert: each attestation is linked in the verdict's evidence index.
6. **Router backpressure chaos (HTTP 429/503 + Retry-After)**
* Load tests that trigger per-instance and per-environment limits.
* Assert: clients back off per **Retry-After** (a client backoff sketch follows this list), queues drain, no data loss, latencies bounded; UI shows throttling reason.
7. **UI reducer tests for reachability & VEX chips**
* Component tests: large SBOM graphs, focused **reachability subgraphs**, and VEX status chips (affected/not-affected/under-investigation).
* Assert: stable rendering under 50k+ nodes; interactions remain <200ms.
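Picking up theme 6 above: a minimal sketch of the client-side behaviour the backpressure tests assert, i.e. honoring Retry-After on 429/503 before retrying. `SendWithBackoffAsync`, `makeRequest`, and `maxAttempts` are illustrative names, not an existing StellaOps API.
```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

static class BackpressureClient
{
    public static async Task<HttpResponseMessage> SendWithBackoffAsync(
        HttpClient client, Func<HttpRequestMessage> makeRequest, int maxAttempts = 5)
    {
        for (var attempt = 1; ; attempt++)
        {
            // A fresh request per attempt: an HttpRequestMessage cannot be reused once sent.
            var response = await client.SendAsync(makeRequest());
            var throttled = response.StatusCode is HttpStatusCode.TooManyRequests
                                                or HttpStatusCode.ServiceUnavailable;
            if (!throttled || attempt >= maxAttempts)
                return response;

            // Prefer the server-provided Retry-After; fall back to exponential backoff.
            var delay = response.Headers.RetryAfter?.Delta
                        ?? TimeSpan.FromSeconds(Math.Pow(2, attempt));
            await Task.Delay(delay);
        }
    }
}
```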
---
# Next-week checklist (do these now)
1. **Delta-verdict replay tests**: golden corpus; lock tool+feed versions; assert bit-for-bit verdict.
2. **Unknowns-budget gates in CI**: policy + failing examples; surface in PR checks and UI.
3. **SBOM attestation round-trip**: Syft → cosign attest → Grype consume-from-SBOM; verify signatures & digests.
4. **Router backpressure chaos**: scripted spike; verify 429/503 + Retry-After handling and metrics.
5. **UI reducer tests**: reachability graph snapshots; VEX chip states; regression suite.
---
# Minimal artifacts to standardize (so tests are boring—good!)
* **Offline bundle spec**: `bundle.json` with content digests (feeds, policies, keys).
* **Evidence manifest**: machine-readable index linking verdict → SBOM digest → attestation IDs → tool versions.
* **Delta-verdict schema**: captures before/after graph deltas, rule evals, and final gate result.
* **Unknowns taxonomy**: codes (e.g., `PKG_SOURCE_UNKNOWN`, `CPE_AMBIG`) with severities and budgets.
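A sketch of how this taxonomy and the budgets might look as .NET records; only `PKG_SOURCE_UNKNOWN` and `CPE_AMBIG` come from the list above, and the severities, remediation strings, and the staging budget of 5 are illustrative placeholders.
```csharp
public enum UnknownSeverity { Info, Warn, Block }

public sealed record UnknownCode(string Code, UnknownSeverity Severity, string Remediation);

public sealed record UnknownsBudget(string Environment, int MaxUnknowns);

public static class UnknownsCatalog
{
    public static readonly UnknownCode PkgSourceUnknown =
        new("PKG_SOURCE_UNKNOWN", UnknownSeverity.Block, "Resolve the package's upstream source.");

    public static readonly UnknownCode CpeAmbig =
        new("CPE_AMBIG", UnknownSeverity.Warn, "Disambiguate the CPE mapping for this component.");

    // Budgets matching the gate above: prod = 0, staging <= N (N is a placeholder).
    public static readonly UnknownsBudget Prod = new("prod", MaxUnknowns: 0);
    public static readonly UnknownsBudget Staging = new("staging", MaxUnknowns: 5);
}
```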
---
# CI wiring (quick sketch)
* **Jobs**: `offline-e2e`, `interop-e2e`, `replayable-verdicts`, `unknowns-gate`, `router-chaos`, `ui-reducers`.
* **Matrix**: {Debian/Alpine/RHEL-like} × {amd64/arm64} × {CycloneDX/SPDX}.
* **Cache discipline**: pin tool versions, vendor feeds to content-addressed store.
---
# Fast success criteria (green = done)
* Can run **full scan + attest + verify** with **no network**.
* Rerunning a fixed input set yields **identical verdict**.
* Grype (from SBOM) matches image scan results within tolerance.
* Builds auto-fail when **unknowns budget exceeded**.
* Router under burst emits **correct Retry-After** and recovers cleanly.
* UI handles huge graphs; VEX chips never desync from evidence.
If you want, I'll turn this into GitLab/Gitea pipeline YAML + a tiny sample repo (image, SBOM, policies, and goldens) so your team can plug-and-play.
Below is a complete, end-to-end testing strategy for Stella Ops that turns your moats (offline readiness, deterministic replayable verdicts, lattice/policy decisioning, attestation provenance, unknowns budgets, router backpressure, UI reachability evidence) into continuously verified guarantees.
---
## 1) Non-negotiable test principles
### 1.1 Determinism as a testable contract
A scan/verdict is *deterministic* iff **same inputs → byte-identical outputs** across time and machines (within defined tolerances, e.g. timestamps are captured as evidence rather than embedded in the payload or its ordering).
**Determinism controls (must be enforced by tests):**
* Canonical JSON (stable key order, stable array ordering where semantically unordered); a hashing sketch follows this list.
* Stable sorting for:
* packages/components
* vulnerabilities
* edges in graphs
* evidence lists
* Time is an *input*, never implicit:
* stamp times in a dedicated evidence field; they must never affect hashing/verdict evaluation.
* PRNG uses explicit seed; seed stored in run manifest.
* Tool versions + feed digests + policy versions are inputs.
* Locale/encoding invariants: UTF-8 everywhere; invariant culture in .NET.
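A minimal sketch of the canonical-JSON control above: recursively sort object keys, serialize compactly, and hash the UTF-8 bytes. It assumes arrays are already in canonical order upstream and relies on `JsonNode.DeepClone` (.NET 8+); names are illustrative.
```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;
using System.Text.Json.Nodes;

static class CanonicalJson
{
    // Hash of the canonical form: sorted object keys, compact output, UTF-8 bytes.
    public static string VerdictHash(string verdictJson)
    {
        var canonical = Canonicalize(JsonNode.Parse(verdictJson))!;
        var bytes = Encoding.UTF8.GetBytes(canonical.ToJsonString());
        return Convert.ToHexString(SHA256.HashData(bytes));
    }

    static JsonNode? Canonicalize(JsonNode? node) => node switch
    {
        JsonObject obj => new JsonObject(
            obj.OrderBy(p => p.Key, StringComparer.Ordinal)
               .Select(p => KeyValuePair.Create(p.Key, Canonicalize(p.Value)))),
        JsonArray arr => new JsonArray(arr.Select(Canonicalize).ToArray()),
        _ => node?.DeepClone()   // leaf values (string/number/bool/null) pass through
    };
}
```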
### 1.2 Offline by default
Every CI job (except those explicitly tagged “online”) runs with **no egress**.
* Offline bundle is mandatory input for scanning.
* Any attempted network call fails the test (proves air-gap compliance).
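One hedged way to enforce this rule inside the test process (complementing container-level network isolation): give every `HttpClient` under test a handler that fails loudly on any outbound call. `NoEgressHandler` is an illustrative name, not an existing StellaOps type.
```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public sealed class NoEgressHandler : HttpMessageHandler
{
    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken) =>
        throw new InvalidOperationException(
            $"Air-gap violation: attempted network call to {request.RequestUri}");
}

// Usage in a test fixture (illustrative):
// var client = new HttpClient(new NoEgressHandler());
```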
### 1.3 Evidence-first validation
No assertion is “verdict == pass” without verifying the chain of evidence:
* verdict references SBOM digest(s)
* SBOM references artifact digest(s)
* VEX claims reference vulnerabilities + components + reachability evidence
* attestations verify cryptographically and chain to configured roots.
### 1.4 Interop is required, not “nice to have”
Stella Ops must round-trip with:
* SBOM: CycloneDX 1.6 and SPDX 3.0.1
* Attestation: DSSE / in-toto style envelopes, cosign-compatible flows
* Consumer scanners: at least Grype from SBOM; ideally Trivy as cross-check
Interop tests are treated as compatibility contracts and block releases.
### 1.5 Architectural boundary enforcement (your standing rule)
* Lattice/policy merge algorithms run **in `scanner.webservice`**.
* `Concelier` and `Excitors` must preserve “prune source”.
This is enforced with tests that detect forbidden behavior (see §6.2).
---
## 2) The test portfolio (what kinds of tests exist)
Think “coverage by risk”, not “coverage by lines”.
### 2.1 Test layers and what they prove
1. **Unit tests** (fast, deterministic)
* Canonicalization, hashing, semantic version range ops
* Graph delta algorithms
* Policy rule evaluation primitives
* Unknowns taxonomy + budgeting math
* Evidence index assembly
2. **Property-based tests** (FsCheck; a sketch follows this list)
* Reordering inputs does not change verdict hash
* Graph merge is associative/commutative where policy declares it
* Unknown counts are always monotonic with missing evidence (more missing evidence never reduces the unknown count)
* Parser robustness: arbitrary JSON for SBOM/VEX envelopes never crashes
3. **Component tests** (service + Postgres; optional Valkey)
* `scanner.webservice` lattice merge and replay
* Feed loader and cache behavior (offline feeds)
* Router backpressure decision logic
* Attestation verification modules
4. **Contract tests** (API compatibility)
* OpenAPI/JSON schema compatibility for public endpoints
* Evidence manifest schema backward compatibility
* OCI artifact layout compatibility (attestation attachments)
5. **Integration tests** (multi-service)
* Router → scanner.webservice → attestor → storage
* Offline bundle import/export
* Knowledge snapshot → time-travel → replay pipeline
6. **End-to-end tests** (realistic flows)
* scan an image → generate SBOM → produce attestations → decision verdict → UI evidence extraction
* interop consumers load SBOM and confirm findings parity
7. **Non-functional tests**
* Performance & scale (throughput, memory, large SBOM graphs)
* Chaos/fault injection (DB restarts, queue spikes, 429/503 backpressure)
* Security tests (fuzzers, decompression bomb defense, signature bypass resistance)
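As an example of layer 2 above, a hedged FsCheck.Xunit sketch of the “reordering inputs does not change the verdict hash” property. `BuildVerdictHash` is a stand-in for the real pipeline entry point; here it simply sorts components before hashing, which is the canonical-ordering behaviour the property exercises.
```csharp
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;
using FsCheck.Xunit;

public class DeterminismProperties
{
    // Stand-in for the real verdict pipeline: canonical order, then hash.
    static string BuildVerdictHash(string[] components)
    {
        var canonical = string.Join("\n", components.OrderBy(c => c, StringComparer.Ordinal));
        return Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(canonical)));
    }

    [Property]
    public bool ReorderingInputsDoesNotChangeVerdictHash(string[] components, int seed)
    {
        components ??= Array.Empty<string>();
        var nonNull = components.Where(c => c is not null).ToArray();
        var rng = new Random(seed);                        // explicit seed, as required above
        var shuffled = nonNull.OrderBy(_ => rng.Next()).ToArray();
        return BuildVerdictHash(nonNull) == BuildVerdictHash(shuffled);
    }
}
```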
---
## 3) Hermetic test harness (how tests run)
### 3.1 Standard test profiles
You already decided: **Postgres is system-of-record**, **Valkey is ephemeral**.
Define two mandatory execution profiles in CI:
1. **Default**: Postgres + Valkey
2. **Air-gapped minimal**: Postgres only
Both must pass.
### 3.2 Environment isolation
* Containers started with **no network** unless a test explicitly declares “online”.
* For Kubernetes e2e: apply a default-deny egress NetworkPolicy.
### 3.3 Golden corpora repository (your “truth set”)
Create a versioned `stellaops-test-corpus/` containing:
* container images (or image tarballs) pinned by digest
* SBOM expected outputs (CycloneDX + SPDX)
* VEX examples (vendor/distro/internal)
* vulnerability feed snapshots (pinned digests)
* policies + lattice rules + unknown budgets
* expected verdicts + delta verdicts
* reachability subgraphs as evidence
* negative fixtures: malformed SPDX, corrupted DSSE, missing digests, unsupported distros
Every corpus item includes a **Run Manifest** (see §4).
### 3.4 Artifact retention in CI
Every failing integration/e2e test uploads:
* run manifest
* offline bundle manifest + hashes
* logs (structured)
* produced SBOMs
* attestations
* verdict + delta verdict
* evidence index
This turns failures into audit-grade reproductions.
---
## 4) Core artifacts that tests must validate
### 4.1 Run Manifest (replay key)
A scan run is defined by:
* artifact digests (image/config/layers, or binary hash)
* SBOM digests produced/consumed
* vuln feed snapshot digest(s)
* policy version + lattice rules digest
* tool versions (scanner, parsers, reachability engine)
* crypto profile (roots, key IDs, algorithm set)
* environment profile (postgres-only vs postgres+valkey)
* seed + canonicalization version
**Test invariant:** re-running the same manifest produces **byte-identical verdict** and **same evidence references**.
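A hypothetical .NET shape for the Run Manifest; field names are illustrative, and the point is simply that every replay-relevant input from the list above is captured explicitly and can itself be canonicalized and hashed.
```csharp
using System.Collections.Immutable;

public sealed record RunManifest(
    ImmutableArray<string> ArtifactDigests,        // image/config/layers or binary hashes
    ImmutableArray<string> SbomDigests,            // SBOMs produced or consumed
    ImmutableArray<string> FeedSnapshotDigests,    // vuln feed snapshots
    string PolicyVersion,
    string LatticeRulesDigest,
    ImmutableSortedDictionary<string, string> ToolVersions,
    string CryptoProfile,                          // roots, key IDs, algorithm set
    string EnvironmentProfile,                     // "postgres-only" or "postgres+valkey"
    int Seed,
    string CanonicalizationVersion);

// Replay invariant: running the same RunManifest twice must yield a byte-identical
// verdict and the same evidence references (Flow C below exercises exactly this).
```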
### 4.2 Offline Bundle Manifest
Bundle includes:
* feeds + indexes
* policies + lattice rule sets
* trust roots, intermediate CAs, timestamp roots (as needed)
* crypto provider modules (for sovereign readiness)
* optional: Rekor mirror snapshot / inclusion proofs cache
**Test invariant:** offline scan is blocked if bundle is missing required parts; error is explicit and counts as unknown only where policy says so.
### 4.3 Evidence Index
The verdict is not the product; the product is verdict + evidence graph:
* pointers to SBOM, VEX, reachability proofs, attestations
* their digests and verification status
* unknowns list with codes + remediation hints
**Test invariant:** every “not affected” claim has the required evidence hooks per policy (“because feature flag off”, etc.), otherwise it becomes unknown/fail.
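A hedged sketch of that invariant: a “not affected” claim missing any policy-required evidence hook is downgraded to unknown. `VexClaim`, `EvidenceIndexChecks`, and the status strings are illustrative, not the real StellaOps model.
```csharp
using System.Collections.Generic;
using System.Linq;

public sealed record VexClaim(string VulnId, string Status, IReadOnlyList<string> EvidenceHooks);

public static class EvidenceIndexChecks
{
    public static string EffectiveStatus(VexClaim claim, IReadOnlyCollection<string> requiredHooks)
    {
        if (claim.Status != "not_affected")
            return claim.Status;

        // Every policy-required hook (e.g. "feature flag off") must be present.
        var missing = requiredHooks.Except(claim.EvidenceHooks).ToList();
        return missing.Count == 0 ? "not_affected" : "unknown";
    }
}
```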
---
## 5) Required E2E flows (minimum set)
These are your release blockers.
### Flow A: Air-gapped scan and verdict
* Inputs: image tarball + offline bundle
* Network: disabled
* Output: SBOM (CycloneDX + SPDX), attestations, verdict
* Assertions:
* no network calls occurred
* verdict references bundle digest + feed snapshot digest
* unknowns within budget
* evidence index complete
### Flow B: SBOM interop round-trip
* Produce SBOM via your pipeline
* Attach SBOM attestation (DSSE/cosign format)
* Consumer (Grype-from-SBOM) reads SBOM and produces findings
* Assertions:
* consumer can parse SBOM
* findings parity within defined tolerance
* verdict references exact SBOM digest used by consumer
### Flow C: Deterministic replay
* Run scan → store run manifest + outputs
* Run again from same manifest
* Assertions:
* verdict bytes identical
* evidence index identical (except allowed execution metadata section)
* delta verdict is an empty delta
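An illustrative xUnit shape for Flow C; `TestCorpus`, `ScannerHarness`, and the corpus path are assumed test-harness helpers, not existing StellaOps APIs.
```csharp
using System;
using System.Threading.Tasks;
using Xunit;

public class DeterministicReplayTests
{
    [Fact]
    public async Task ReplayFromSameManifestIsByteIdentical()
    {
        var manifest = TestCorpus.LoadRunManifest("flow-c/run-manifest.json"); // hypothetical corpus path
        var first = await ScannerHarness.RunScannerAsync(manifest);
        var second = await ScannerHarness.RunScannerAsync(manifest);

        // Byte-identical verdict (compared as hex so failures show the differing digests).
        Assert.Equal(Convert.ToHexString(first.VerdictBytes), Convert.ToHexString(second.VerdictBytes));
        // Evidence index identical except the allowed execution-metadata section.
        Assert.Equal(first.EvidenceIndexCanonical, second.EvidenceIndexCanonical);
        // Empty delta verdict.
        Assert.True(ScannerHarness.ComputeDelta(first, second).IsEmpty);
    }
}
```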
### Flow D: Diff-aware delta verdict (smart-diff)
* Two versions of same image with controlled change (one dependency bump)
* Assertions:
* delta verdict contains only changed nodes/edges
* risk budget computation based on delta matches expected
* signed delta verdict validates and is OCI-attached
### Flow E: Unknowns budget gates
* Inject unknowns (unmapped package, missing distro metadata, ambiguous CPE)
* Policy:
* prod budget = 0
* staging budget = N
* Assertions:
* prod fails, staging passes
* unknowns appear in attestation and UI evidence
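The gate math itself can stay trivial and fully deterministic; a minimal sketch (type and method names are illustrative):
```csharp
using System.Collections.Generic;

public static class UnknownsGate
{
    // Returns the gate decision plus the count that must also appear in the
    // attestation and UI evidence.
    public static (bool Pass, int UnknownCount) Evaluate(
        IReadOnlyCollection<string> unknownCodes, int budget) =>
        (Pass: unknownCodes.Count <= budget, UnknownCount: unknownCodes.Count);
}

// prod budget = 0: any unknown fails the gate; staging budget = N: up to N unknowns pass.
// var prod    = UnknownsGate.Evaluate(unknowns, budget: 0);
// var staging = UnknownsGate.Evaluate(unknowns, budget: 5);
```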
### Flow F: Router backpressure under burst
* Spike requests to a single router instance + environment bucket
* Assertions:
* 429/503 with Retry-After emitted correctly
* clients back off; no request loss
* metrics expose throttling reasons
### Flow G: Evidence export (“audit pack”)
* Run scan
* Export a sealed audit pack (bundle + run manifest + evidence + verdict)
* Import elsewhere (clean environment)
* Assertions:
* replay produces identical verdict
* signatures verify under imported trust roots
---
## 6) Module-specific test requirements
### 6.1 `scanner.webservice` (lattice + policy decisioning)
Must have:
* unit tests for lattice merge algebra
* property tests: declared commutativity/associativity/idempotency
* integration tests that merge vendor/distro/internal VEX and confirm precedence rules are policy-driven
**Critical invariant tests:**
* “Vendor > distro > internal” must be demonstrably *configurable*, and wrong merges must fail deterministically.
### 6.2 Boundary enforcement: Concelier & Excitors preserve “prune source”
Add a “behavioral boundary suite”:
* instrument events/telemetry that record where merges happened
* feed in conflicting VEX claims and assert:
* Concelier/Excitors do not resolve conflicts; they retain provenance and “prune source”
* only `scanner.webservice` produces the final merged semantics
If Concelier/Excitors output a resolved claim, the test fails.
### 6.3 `Router` backpressure and DPoP/nonce rate limiting
* deterministic unit tests for token bucket math
* time-controlled tests (virtual clock)
* integration tests with Valkey + Postgres-only fallbacks
* chaos tests: Valkey down → router degrades gracefully (local per-instance limiter still works)
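A minimal token-bucket sketch matching the “virtual clock” requirement above: time is passed in, never read from the system clock, so unit tests control it exactly. Names are illustrative, not the router's real types.
```csharp
using System;

public sealed class TokenBucket
{
    private readonly double _capacity;
    private readonly double _refillPerSecond;
    private double _tokens;
    private DateTimeOffset _lastRefill;

    public TokenBucket(double capacity, double refillPerSecond, DateTimeOffset now)
    {
        _capacity = capacity;
        _refillPerSecond = refillPerSecond;
        _tokens = capacity;
        _lastRefill = now;
    }

    // Returns true if the request is admitted; otherwise the caller should emit
    // 429/503 with a Retry-After derived from the token deficit.
    public bool TryTake(DateTimeOffset now, out TimeSpan retryAfter)
    {
        var elapsed = (now - _lastRefill).TotalSeconds;
        _tokens = Math.Min(_capacity, _tokens + elapsed * _refillPerSecond);
        _lastRefill = now;

        if (_tokens >= 1)
        {
            _tokens -= 1;
            retryAfter = TimeSpan.Zero;
            return true;
        }

        retryAfter = TimeSpan.FromSeconds((1 - _tokens) / _refillPerSecond);
        return false;
    }
}
```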
### 6.4 Storage (Postgres) + Valkey accelerator
* migration tests: schema upgrades forward/backward in CI
* replay tests: Postgres-only profile yields same verdict bytes
* consistency tests: Valkey cache misses never change decision outcomes, only latency
### 6.5 UI evidence rendering
* reducer snapshot tests for:
* reachability subgraph rendering (large graphs)
* VEX chip states: affected/not-affected/under-investigation/unknown
* performance budgets:
* large graph render under threshold (define and enforce)
* contract tests against evidence index schema
---
## 7) Non-functional test program
### 7.1 Performance and scale tests
Define standard workloads:
* small image (200 packages)
* medium (2k packages)
* large (20k+ packages)
* “monorepo container” worst case (50k+ nodes graph)
Metrics collected:
* p50/p95/p99 scan time
* memory peak
* DB write volume
* evidence pack size
* router throughput + throttle rate
Add regression gates:
* no more than X% slowdown in p95 vs baseline
* no more than Y% growth in evidence pack size for unchanged inputs
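A sketch of the p95 regression gate (nearest-rank percentile; the slowdown budget is the X% placeholder above and would be pinned per workload):
```csharp
using System;
using System.Linq;

public static class PerfGate
{
    // Nearest-rank percentile over the collected samples.
    public static double Percentile(double[] samples, double p)
    {
        var sorted = samples.OrderBy(x => x).ToArray();
        var rank = (int)Math.Ceiling(p / 100.0 * sorted.Length) - 1;
        return sorted[Math.Clamp(rank, 0, sorted.Length - 1)];
    }

    // True when current p95 stays within the allowed slowdown over the baseline.
    public static bool WithinBudget(double[] currentMs, double baselineP95Ms, double maxSlowdownPct)
    {
        var currentP95 = Percentile(currentMs, 95);
        return currentP95 <= baselineP95Ms * (1 + maxSlowdownPct / 100.0);
    }
}
```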
### 7.2 Chaos and reliability
Run chaos suites weekly/nightly:
* kill scanner during run → resume/retry semantics deterministic
* restart Postgres mid-run → job fails with explicit retryable state
* corrupt offline bundle file → fails with typed error, not crash
* burst router + slow downstream → confirms backpressure not meltdown
### 7.3 Security robustness tests
* fuzz parsers: SPDX, CycloneDX, VEX, DSSE envelopes
* zip/tar bomb defenses (artifact ingestion)
* signature bypass attempts:
* mismatched digest
* altered payload with valid signature on different content
* wrong root chain
* SSRF defense: any URL fields in SBOM/VEX are treated as data, never fetched in offline mode
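For the zip/tar bomb defense above, a hedged sketch of a guarded zip extraction that caps total extracted bytes and per-entry expansion ratio and blocks path traversal; the limits are illustrative defaults, not StellaOps settings.
```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class SafeExtract
{
    public static void ExtractZip(string zipPath, string destDir,
        long maxTotalBytes = 1L << 30, double maxRatio = 100)
    {
        long total = 0;
        var root = Path.GetFullPath(destDir);
        using var archive = ZipFile.OpenRead(zipPath);
        foreach (var entry in archive.Entries)
        {
            if (string.IsNullOrEmpty(entry.Name)) continue; // directory entry

            // Reject implausible expansion ratios and a runaway total size.
            if (entry.CompressedLength > 0 &&
                (double)entry.Length / entry.CompressedLength > maxRatio)
                throw new InvalidDataException($"Suspicious compression ratio: {entry.FullName}");
            total += entry.Length;
            if (total > maxTotalBytes)
                throw new InvalidDataException("Archive exceeds extraction size budget");

            // Zip-slip guard: the resolved target must stay under destDir.
            var target = Path.GetFullPath(Path.Combine(root, entry.FullName));
            if (!target.StartsWith(root + Path.DirectorySeparatorChar, StringComparison.Ordinal))
                throw new InvalidDataException($"Path traversal attempt: {entry.FullName}");

            Directory.CreateDirectory(Path.GetDirectoryName(target)!);
            entry.ExtractToFile(target, overwrite: false);
        }
    }
}
```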
---
## 8) CI/CD gating rules (what blocks a release)
Release candidate is blocked if any of these fail:
1. All mandatory E2E flows (§5) pass in both profiles:
* Postgres-only
* Postgres+Valkey
2. Deterministic replay suite:
* zero non-deterministic diffs in verdict bytes
* allowed diff list is explicit and reviewed
3. Interop suite:
* CycloneDX 1.6 and SPDX 3.0.1 round-trips succeed
* consumer scanner compatibility tests pass
4. Risk budgets + unknowns budgets:
* must pass on corpus, and no regressions against baseline
5. Backpressure correctness:
* Retry-After compliance and throttle metrics validated
6. Performance regression budgets:
* no breach of p95/memory budgets on standard workloads
7. Flakiness threshold:
* if a test flakes more than N times per week, it is quarantined *and* release is blocked until a deterministic root cause is established (quarantine is allowed only for non-blocking suites, never for §5 flows)
---
## 9) Implementation blueprint (how to build this test program)
### Phase 0: Harness and corpus
* Stand up test harness: docker compose + Testcontainers (.NET xUnit)
* Create corpus repo with 10–20 curated artifacts
* Implement run manifest + evidence index capture in all tests
### Phase 1: Determinism and replay
* canonicalization utilities + golden verdict bytes
* replay runner that loads manifest and replays end-to-end
* add property-based tests for ordering and merge invariants
### Phase 2: Offline e2e + interop
* offline bundle builder + strict “no egress” enforcement
* SBOM attestation round-trip + consumer parsing suite
### Phase 3: Unknowns budgets + delta verdict
* unknown taxonomy everywhere (UI + attestations)
* delta verdict generation and signing
* diff-aware release gates
### Phase 4: Backpressure + chaos + performance
* router throttle chaos suite
* scale tests with standard workloads and baselines
### Phase 5: Audit packs + time-travel snapshots
* sealed export/import
* one-command replay for auditors
---
## 10) What you should standardize immediately
If you do only three things, do these:
1. **Run Manifest** as first-class test artifact
2. **Golden corpus** that pins all digests (feeds, policies, images, expected outputs)
3. **“No egress” default** in CI with explicit opt-in for online tests
Everything else becomes far easier once these are in place.
---
If you want, I can also produce a concrete repository layout and CI job matrix (xUnit categories, docker compose profiles, artifact retention conventions, and baseline benchmark scripts) that matches .NET 10 conventions and your Postgres/Valkey profiles.