git.stella-ops.org/docs/product-advisories/29-Nov-2025 - Acceptance Tests Pack for StellaOps Guardrails.md
StellaOps Bot 25254e3831 news advisories
2025-11-30 21:00:38 +02:00

Here's a tight, drop-in acceptance-test pack for Stella Ops that turns common failure modes into concrete guardrails you can ship this sprint.


1) Feed outages & integrity drift (e.g., Grype DB / CDN hiccups)

Lesson: Never couple scans to a single live feed; pin, verify, and cache.

Add to acceptance tests

  • Rollback-safe updaters

    • If a feed update fails checksum or signature, the system keeps using the last “good” bundle.
    • On restart, the updater falls back to the last verified bundle without network access.
  • Signed offline bundles

    • Every feed bundle (SBOM catalogs, CVE DB shards, rules) must be DSSE-signed; verification blocks ingestion on mismatch.
    • Bundle manifest lists SHA-256 for each file; any deviation = reject.
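
The manifest check above is straightforward to pin down precisely. A minimal Python sketch (the manifest layout and error codes are the ones proposed in this pack; the production implementation is the .NET BundleVerifier):

```python
import hashlib
from pathlib import Path

def verify_manifest(bundle_dir: Path, manifest: dict) -> list[str]:
    """Check every file listed in the manifest against its SHA-256.

    Returns a list of error codes; an empty list means the bundle is intact.
    Error codes mirror the ones used later in this pack
    (BUNDLE_FILE_MISSING, BUNDLE_FILE_HASH_MISMATCH).
    """
    errors = []
    for entry in manifest["files"]:
        path = bundle_dir / entry["path"]
        if not path.is_file():
            errors.append(f"BUNDLE_FILE_MISSING:{entry['path']}")
            continue
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest != entry["sha256"]:
            errors.append(f"BUNDLE_FILE_HASH_MISMATCH:{entry['path']}")
    return errors
```

Any non-empty error list must block the swap; DSSE signature verification of the manifest itself happens before this per-file pass.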

Test cases (CI)

  • Simulate 404/timeout from feed URL → scanner still produces results from cached bundle.
  • Serve a tampered bundle (wrong hash) → updater logs failure; no swap; previous bundle remains active.
  • Air-gap mode: no network → scanner loads from /var/lib/stellaops/offline-bundles/* and passes verification.

2) SBOM quality & schema drift

Lesson: Garbage in = garbage VEX. Gate on schema, completeness, and provenance.

Add to acceptance tests

  • SBOM schema gating

    • Reject SBOMs not valid CycloneDX 1.6 / SPDX 2.3 (your chosen set).
    • Require: component bom-ref, supplier, version, hashes, and build provenance (SLSA/in-toto attestation) if provided.
  • Minimum completeness

    • Thresholds: ≥95% components with cryptographic hashes; no unknown package ecosystem fields for top 20 deps.
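
The coverage threshold is worth stating exactly, since off-by-one rounding can flip a gate decision. A Python sketch of the check (field name `hashes` follows CycloneDX; the gate itself would live in SbomGate):

```python
def check_hash_coverage(components: list[dict], min_coverage: float = 0.95):
    """Return (ok, ratio) for the >=95% cryptographic-hash-coverage gate.

    A component counts as covered when its 'hashes' list is non-empty.
    An empty SBOM fails the gate: no evidence, no pass.
    """
    if not components:
        return False, 0.0
    covered = sum(1 for c in components if c.get("hashes"))
    ratio = covered / len(components)
    return ratio >= min_coverage, ratio
```

The returned ratio should be echoed in the SBOM_HASH_COVERAGE_BELOW_THRESHOLD error so the submitter sees how far off they are.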

Test cases

  • Submit a malformed CycloneDX document → 400 SBOM_VALIDATION_FAILED with a pointer to the failing JSON path.
  • SBOM missing hashes for >5% of components → blocked from graph ingestion; actionable error.
  • SBOM with unsigned provenance when policy="RequireAttestation" → rejected.

3) DB/data corruption or operator error

Lesson: Snapshots save releases.

Add to acceptance tests

  • DB snapshot cadence

    • Postgres: base backup nightly + WAL archiving; RPO ≤ 15 min; automated restore rehearsals.
    • Mongo (while still in use): per-collection dumps until conversion completes; checksum each artifact.
  • Deterministic replay

    • Any graph view must be reproducible from snapshot + bundle manifest (same revision hash).

Test cases

  • Run chaos test that deletes last 24h tables → PITR restore to T-15m succeeds; graph revision IDs match pre-failure.
  • Restore rehearsal produces identical VEX verdict counts for a pinned revision.

4) Reachability engines & graph evaluation flakiness

Lesson: When reachability is uncertain, degrade gracefully and be explicit.

Add to acceptance tests

  • Reachability fallbacks

    • If call-graph build fails or language analyzer missing, verdict moves to “Potentially Affected (Unproven Reach)” with a reason code.
    • Policies must allow “conservative mode” (assume reachable) vs “lenient mode” (assume not-reachable) toggled per environment.
  • Stable graph IDs

    • Graph revision ID is a content hash of inputs (SBOM set + rules + feed versions); identical inputs → identical ID.

Test cases

  • Remove a language analyzer container at runtime → status flips to fallback code; no 500s; policy evaluation still completes.
  • Re-ingest same inputs → same graph revision ID and same verdict distribution.

5) Update pipelines & job routing

Lesson: No single point of truth; isolate, audit, and prove swaps.

Add to acceptance tests

  • Two-phase bundle swaps

    • Stage → verify → atomic symlink/label swap; all scanners pick up new label within 1 minute, or roll back.
  • Authority-gated policy changes

    • Any policy change (severity threshold, allowlist) is a signed request via Authority; audit trail must include signer and DSSE envelope hash.

Test cases

  • Introduce a new CVE ruleset; verification passes → atomic swap; running scans continue; new scans use N+1 bundle.
  • Attempt policy change with invalid signature → rejected; audit log entry created; unchanged policy in effect.

How to wire this in Stella Ops (quick pointers)

  • Offline bundle format

    • bundle.json (manifest: file list + SHA-256 + DSSE signature), /sboms/*.json, /feeds/cve/*.sqlite (or shards), /rules/*.yaml, /provenance/*.intoto.jsonl.
    • Verification entrypoint in .NET 10: StellaOps.Bundle.VerifyAsync(manifest, keyring) before any ingestion.
  • Authority integration

    • Define PolicyChangeRequest (subject, diff, reason, expiry, DSSE envelope).
    • Gate PUT /policies/* behind Authority.Verify(envelope) == true and envelope.subject == computed_diff_hash.
  • Graph determinism

    • GraphRevisionId = SHA256(Sort(JSON([SBOMRefs, RulesetVersion, FeedBundleIds, LatticeConfig, NormalizationVersion]))).
  • Postgres snapshots (until full conversion)

    • Use pg_basebackup nightly + wal-g for WAL; GitLab job runs restore rehearsal weekly into stellaops-restore namespace and asserts revision parity against prod.

Minimal developer checklist (copy to your sprint board)

  • Add BundleVerifier to scanner startup; block if verification fails.
  • Implement CacheLastGoodBundle() and atomic label swap (/current -> /v-YYYYMMDDHHmm).
  • Add SbomGate with JSON-Schema validation + completeness thresholds.
  • Emit reasoned fallbacks: REACH_FALLBACK_NO_ANALYZER, REACH_FALLBACK_TIMEOUT.
  • Compute and display GraphRevisionId everywhere (API + UI + logs).
  • Configure nightly PG backups + weekly restore rehearsal that asserts revision parity.
  • Route all policy mutations through Authority DSSE verification + auditable ledger entry.

If you want, I can turn this into ready-to-merge .NET test fixtures (xUnit) and a GitLab CI job that runs the feed-tamper/air-gap simulations automatically. What follows takes the five "miss" areas and turns them into concrete, implementable test plans, with suggested projects, fixtures, and key cases your team can start coding.

Names are kept aligned to .NET 10/xUnit and the Stella Ops modules.


0. Test layout proposal

Solution structure (tests)

/tests
  /StellaOps.Bundle.Tests
    BundleVerificationTests.cs
    CachedBundleFallbackTests.cs
  /StellaOps.SbomGate.Tests
    SbomSchemaValidationTests.cs
    SbomCompletenessTests.cs
  /StellaOps.Scanner.Tests
    ScannerOfflineBundleTests.cs
    ReachabilityFallbackTests.cs
    GraphRevisionDeterminismTests.cs
  /StellaOps.DataRecoverability.Tests
    PostgresSnapshotRestoreTests.cs
    GraphReplayParityTests.cs
  /StellaOps.Authority.Tests
    PolicyChangeSignatureTests.cs
  /StellaOps.System.Acceptance
    FeedOutageEndToEndTests.cs
    AirGapModeEndToEndTests.cs
    BundleSwapEndToEndTests.cs
/testdata
  /bundles
  /sboms
  /graphs
  /db

Use xUnit + FluentAssertions, plus Testcontainers for Postgres.


1) Feed outages & integrity drift

Objectives

  1. Scanner never “goes dark” because the CDN/feed is down.
  2. Only verified bundles are used; tampered bundles are never ingested.
  3. Offline/air-gap mode is a first-class, tested behavior.

Components under test

  • StellaOps.BundleVerifier (core library)
  • StellaOps.Scanner.Webservice (scanner, bundle loader)
  • Bundle filesystem layout: /opt/stellaops/bundles/v-<timestamp>/* + /opt/stellaops/bundles/current symlink

Test dimensions

  • Network: OK / timeout / 404 / TLS failure / DNS failure.
  • Remote bundle: correct / tampered (hash mismatch) / wrong signature / truncated.
  • Local cache: last-good present / absent / corrupted.
  • Mode: online / offline (air-gap).

Detailed test suites

1.1 Bundle verification unit tests

Project: StellaOps.Bundle.Tests

Fixtures:

  • testdata/bundles/good-bundle/
  • testdata/bundles/hash-mismatch-bundle/
  • testdata/bundles/bad-signature-bundle/
  • testdata/bundles/missing-file-bundle/

Key tests:

  1. VerifyAsync_ValidBundle_ReturnsSuccess

    • Arrange: Load good-bundle manifest + DSSE signature.

    • Act: BundleVerifier.VerifyAsync(manifest, keyring)

    • Assert:

      • result.IsValid == true
      • result.Files.All(f => f.Status == Verified)
  2. VerifyAsync_HashMismatch_FailsFast

    • Use hash-mismatch-bundle where one file's SHA-256 differs.

    • Assert:

      • IsValid == false
      • Errors contains BUNDLE_FILE_HASH_MISMATCH and the offending path.
  3. VerifyAsync_InvalidSignature_RejectsBundle

    • DSSE envelope signed with unknown key.

    • Assert:

      • IsValid == false
      • Errors contains BUNDLE_SIGNATURE_INVALID.
  4. VerifyAsync_MissingFile_RejectsBundle

    • Manifest lists file that does not exist on disk.

    • Assert:

      • IsValid == false
      • Errors contains BUNDLE_FILE_MISSING.

1.2 Cached bundle fallback logic

Class under test: BundleManager

Simplified interface:

public interface IBundleManager {
    Task<BundleRef> GetActiveBundleAsync();
    Task<BundleRef> UpdateFromRemoteAsync(CancellationToken ct);
}
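
The fallback semantics behind IBundleManager can be captured language-agnostically. A Python sketch (method and parameter names are illustrative; `fetch_remote` and `verify` are injected so tests can simulate timeouts and tampered bundles) of the three behaviors the key tests assert:

```python
class BundleManager:
    """Sketch of the last-good-bundle fallback semantics."""

    def __init__(self, last_good, fetch_remote, verify):
        self.last_good = last_good        # last verified bundle id, or None
        self.fetch_remote = fetch_remote  # callable; may raise on network failure
        self.verify = verify              # callable; True iff bundle verifies

    def get_active_bundle(self):
        # No verified bundle on disk: fail fast with a domain error.
        if self.last_good is None:
            raise RuntimeError("NO_VERIFIED_BUNDLE_AVAILABLE")
        return self.last_good

    def update_from_remote(self):
        try:
            candidate = self.fetch_remote()
        except Exception:
            return self.last_good          # remote unavailable: keep last good
        if not self.verify(candidate):
            return self.last_good          # tampered bundle: no swap
        self.last_good = candidate         # verified: adopt the new bundle
        return self.last_good
```

The production code additionally emits the failure metric and leaves the current symlink untouched on both failure paths.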

Key tests:

  1. UpdateFromRemoteAsync_RemoteUnavailable_KeepsLastGoodBundle

    • Arrange:

      • lastGood bundle exists and is marked verified.
      • Remote HTTP client always throws TaskCanceledException (simulated timeout).
    • Act: UpdateFromRemoteAsync.

    • Assert:

      • Returned bundle ID equals lastGood.Id.
      • No changes to current symlink.
  2. UpdateFromRemoteAsync_RemoteTampered_DoesNotReplaceCurrent

    • Remote returns bundle temp-bundle which fails BundleVerifier.

    • Assert:

      • current still points to lastGood.
      • An error metric is emitted (e.g. stellaops_bundle_update_failures_total++).
  3. GetActiveBundle_NoVerifiedBundle_ThrowsDomainError

    • No bundle is verified on disk.
    • GetActiveBundleAsync throws a domain exception with code NO_VERIFIED_BUNDLE_AVAILABLE.
    • Consumption pattern in Scanner: fail fast on startup with a clear log message.

1.3 Scanner behavior with outages (integration)

Project: StellaOps.Scanner.Tests

Use in-memory host (WebApplicationFactory<ScannerProgram>).

Scenarios:

  • F1: CDN timeout, last-good present.
  • F2: CDN 404, last-good present.
  • F3: CDN returns tampered bundle; verification fails.
  • F4: Air-gap: network disabled, last-good present.
  • F5: Air-gap + no last-good: scanner must refuse to start.

Example test:

[Fact]
public async Task Scanner_UsesLastGoodBundle_WhenCdnTimesOut() {
    // Arrange: put good bundle under /bundles/v-1, symlink /bundles/current -> v-1
    using var host = TestScannerHost.WithBundle("v-1", good: true, simulateCdnTimeout: true);

    // Act: call /api/scan with small fixture image
    var response = await host.Client.PostAsJsonAsync("/api/scan", scanRequest);

    // Assert:
    response.StatusCode.Should().Be(HttpStatusCode.OK);
    var content = await response.Content.ReadFromJsonAsync<ScanResult>();
    content.BundleId.Should().Be("v-1");
    host.Logs.Should().Contain("Falling back to last verified bundle");
}

1.4 System acceptance (GitLab CI)

Job idea: acceptance:feed-resilience

Steps:

  1. Spin up scanner + stub feedser container.

  2. Phase A: feed OK → run baseline scan; capture bundleId and graphRevisionId.

  3. Phase B: re-run with feed stub configured to:

    • timeout,
    • 404,
    • return tampered bundle.
  4. For each phase:

    • Assert bundleId remains the baseline one.
    • Assert graphRevisionId unchanged.

Failure of any assertion should break the pipeline.


2) SBOM quality & schema drift

Objectives

  1. Only syntactically valid SBOMs are ingested into the graph.
  2. Enforce minimum completeness (hash coverage, supplier, etc.).
  3. Clear, machine-readable error responses from SBOM ingestion API.

Components

  • StellaOps.SbomGate (validation service)
  • SBOM ingestion endpoint in Scanner/Concelier: POST /api/sboms

Schema validation tests

Project: StellaOps.SbomGate.Tests

Fixtures:

  • sbom-cdx-1.6-valid.json
  • sbom-cdx-1.6-malformed.json
  • sbom-spdx-2.3-valid.json
  • sbom-unsupported-schema.json
  • sbom-missing-hashes-10percent.json
  • sbom-no-supplier.json

Key tests:

  1. Validate_ValidCycloneDx16_Succeeds

    • Assert type SbomValidationResult.Success.
    • Ensure DetectedSchema == CycloneDx16.
  2. Validate_MalformedJson_FailsWithSyntaxError

    • Malformed JSON.

    • Assert:

      • IsValid == false
      • Errors contains SBOM_JSON_SYNTAX_ERROR with path info.
  3. Validate_UnsupportedSchemaVersion_Fails

    • SPDX 2.1 (if you only allow 2.3).
    • Expect SBOM_SCHEMA_UNSUPPORTED with schemaUri echo.
  4. Validate_MissingHashesOverThreshold_Fails

    • SBOM where >5% components lack hashes.

    • Policy: MinHashCoverage = 0.95.

    • Assert:

      • IsValid == false
      • Errors contains SBOM_HASH_COVERAGE_BELOW_THRESHOLD with actual ratio.
  5. Validate_MissingSupplier_Fails

    • Critical components missing supplier info.
    • Expect SBOM_REQUIRED_FIELD_MISSING with component.supplier.

API-level tests

Project: StellaOps.Scanner.Tests (or StellaOps.Concelier.Tests depending where SBOM ingestion lives).

Key scenarios:

  1. POST /api/sboms with malformed JSON

    • Request body: sbom-cdx-1.6-malformed.json.

    • Expected:

      • HTTP 400.
      • Body: { "code": "SBOM_VALIDATION_FAILED", "details": [ ... ], "correlationId": "..." }.
      • At least one detail contains SBOM_JSON_SYNTAX_ERROR.
  2. POST /api/sboms with missing hashes

    • Body: sbom-missing-hashes-10percent.json.
    • HTTP 400 with SBOM_HASH_COVERAGE_BELOW_THRESHOLD.
  3. POST /api/sboms with unsupported schema

    • Body: sbom-unsupported-schema.json.
    • HTTP 400 with SBOM_SCHEMA_UNSUPPORTED.
  4. POST /api/sboms valid

    • Body: sbom-cdx-1.6-valid.json.
    • HTTP 202 or 201 (depending on design).
    • Response contains SBOM ID; subsequent graph build sees that SBOM.

3) DB/data corruption & operator error

Objectives

  1. You can restore Postgres to a point in time and reproduce previous graph results.
  2. Graphs are deterministic given bundle + SBOM + rules.
  3. Obvious corruptions are detected and surfaced, not silently masked.

Components

  • Postgres cluster (new canonical store)
  • StellaOps.Scanner.Webservice (graph builder, persistence)
  • GraphRevisionId computation

3.1 Postgres snapshot / WAL tests

Project: StellaOps.DataRecoverability.Tests

Use Testcontainers to spin up Postgres.

Scenarios:

  1. PITR_Restore_ReplaysGraphsWithSameRevisionIds

    • Arrange:

      • Spin DB container with WAL archiving enabled.
      • Apply schema migrations.
      • Ingest fixed set of SBOMs + bundle refs + rules.
      • Trigger graph build → record graphRevisionIds from API.
      • Take base backup snapshot (simulate daily snapshot).
    • Act:

      • Destroy container.
      • Start new container from base backup + replay WAL up to a specific LSN.
      • Start Scanner against restored DB.
      • Query graphs again.
    • Assert:

      • For each known graph: revisionId_restored == revisionId_original.
      • Number of nodes/edges is identical.
  2. PartialDataLoss_DetectedByHealthCheck

    • After initial load, deliberately delete some rows (e.g. all edges for a given graph).

    • Run health check endpoint, e.g. /health/graph.

    • Expect:

      • HTTP 503.
      • Body indicates GRAPH_INTEGRITY_FAILED with details of missing edges.

This test forces the discipline of implementing a basic graph integrity check (e.g. row counts by state vs. expected).
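
One simple shape for that integrity check is a count comparison against the figures recorded at build time. A Python sketch (the count bookkeeping is an assumption; any persisted build-time summary would do):

```python
def graph_integrity_errors(expected_counts: dict, actual_counts: dict) -> list[str]:
    """Compare stored node/edge counts per graph against counts recorded
    at build time; any deviation signals GRAPH_INTEGRITY_FAILED.
    """
    errors = []
    for graph_id, expected in expected_counts.items():
        actual = actual_counts.get(graph_id, {"nodes": 0, "edges": 0})
        for kind in ("nodes", "edges"):
            if actual[kind] != expected[kind]:
                errors.append(f"GRAPH_INTEGRITY_FAILED:{graph_id}:{kind}")
    return errors
```

The /health/graph endpoint would return 503 whenever this list is non-empty, with the offending graph IDs in the body.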

3.2 Deterministic replay tests

Project: StellaOps.Scanner.Tests, file GraphRevisionDeterminismTests.cs

Precondition: Graph revision ID computed as:

GraphRevisionId = SHA256(
  Normalize([
    BundleId,
    OrderedSbomIds,
    RulesetVersion,
    FeedBundleIds,
    LatticeConfigVersion,
    NormalizationVersion
  ])
);
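
A minimal Python sketch of this computation (names illustrative; the production version lives in the Scanner) shows why input ordering cannot affect the ID — the collections are sorted before hashing:

```python
import hashlib
import json

def graph_revision_id(bundle_id, sbom_ids, ruleset_version,
                      feed_bundle_ids, lattice_config_version,
                      normalization_version):
    """Content-hash the normalized inputs. Sorting the SBOM and feed-bundle
    ID lists is the normalization step that makes the ID order-insensitive.
    """
    normalized = json.dumps([
        bundle_id,
        sorted(sbom_ids),
        ruleset_version,
        sorted(feed_bundle_ids),
        lattice_config_version,
        normalization_version,
    ], separators=(",", ":"))
    return hashlib.sha256(normalized.encode()).hexdigest()
```

This directly yields scenarios 1–4 below: same inputs give the same hash, any changed input changes it, and reordered SBOM lists do not.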

Scenarios:

  1. SameInputs_SameRevisionId

    • Run graph build twice for same inputs.
    • Assert identical GraphRevisionId.
  2. DifferentBundle_DifferentRevisionId

    • Same SBOMs & rules; change vulnerability bundle ID.
    • Assert GraphRevisionId changes.
  3. DifferentRuleset_DifferentRevisionId

    • Same SBOM & bundle; change ruleset version.
    • Assert GraphRevisionId changes.
  4. OrderingIrrelevant_StableRevision

    • Provide SBOMs in different order.
    • Assert GraphRevisionId same (because of internal sorting).

4) Reachability engine & graph evaluation flakiness

Objectives

  1. If reachability cannot be computed, you do not break; you downgrade verdicts with explicit reason codes.
  2. Deterministic reachability for “golden fixtures”.
  3. Graph evaluation remains stable even when analyzers come and go.

Components

  • StellaOps.Scanner.Webservice (lattice / reachability engine)
  • Language analyzers (sidecar or gRPC microservices)
  • Verdict representation, e.g.:
public sealed record VulnerabilityVerdict(
    string Status,              // "NotAffected", "Affected", "PotentiallyAffected"
    string ReasonCode,          // "REACH_CONFIRMED", "REACH_FALLBACK_NO_ANALYZER", ...
    string? AnalyzerId
);
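
The verdict mapping the fixtures below assert can be sketched as one small pure function (Python for brevity; the status strings and reason codes follow the record above, and the `analysis` encoding — None on timeout, boolean otherwise — is illustrative):

```python
def reachability_verdict(analyzer_available: bool, analysis):
    """Map analyzer availability and outcome to (status, reason_code).

    analysis is None on timeout, True when the vulnerable symbol is
    reachable, False when proven unreachable.
    """
    if not analyzer_available:
        return ("PotentiallyAffected", "REACH_FALLBACK_NO_ANALYZER")
    if analysis is None:
        return ("PotentiallyAffected", "REACH_FALLBACK_TIMEOUT")
    if analysis:
        return ("Affected", "REACH_CONFIRMED")
    return ("NotAffected", "REACH_ANALYZED_UNREACHABLE")
```

The key property: no input combination can produce an error — uncertainty always degrades to "PotentiallyAffected" with an explicit reason.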

4.1 Golden reachability fixtures

Project: StellaOps.Scanner.Tests, file GoldenReachabilityTests.cs
Fixtures directory: /testdata/reachability/fixture-*/

Each fixture:

/testdata/reachability/fixture-01-log4j/
  sbom.json
  code-snippets/...
  expected-vex.json
  config.json            # language, entrypoints, etc.

Test pattern:

For each fixture:

  1. Load SBOM + configuration.
  2. Trigger reachability analysis.
  3. Collect raw reachability graph + final VEX verdicts.
  4. Compare to expected-vex.json (status + reason codes).
  5. Store the GraphRevisionId and set it as golden as well.

Key cases:

  • R1: simple direct call → reachability confirmed → Status = "Affected", ReasonCode = "REACH_CONFIRMED".
  • R2: library present but not called → Status = "NotAffected", ReasonCode = "REACH_ANALYZED_UNREACHABLE".
  • R3: language analyzer missing → Status = "PotentiallyAffected", ReasonCode = "REACH_FALLBACK_NO_ANALYZER".
  • R4: analysis timeout → Status = "PotentiallyAffected", ReasonCode = "REACH_FALLBACK_TIMEOUT".

4.2 Analyzer unavailability / fallback behavior

Project: StellaOps.Scanner.Tests, file ReachabilityFallbackTests.cs

Scenarios:

  1. NoAnalyzerRegistered_ForLanguage_UsesFallback

    • Scanner config lists a component in language “go” but no analyzer registered.

    • Expect:

      • No 500 error from /api/graphs/....
      • All applicable vulnerabilities for that component have Status = "PotentiallyAffected" and ReasonCode = "REACH_FALLBACK_NO_ANALYZER".
  2. AnalyzerRpcFailure_UsesFallback

    • Analyzer responds with gRPC error or HTTP 500.
    • Scanner logs error and keeps going.
    • Same semantics as missing analyzer, but with AnalyzerId populated and optional ReasonDetails (e.g. RPC_UNAVAILABLE).
  3. AnalyzerTimeout_UsesTimeoutFallback

    • Force analyzer calls to time out.
    • ReasonCode = "REACH_FALLBACK_TIMEOUT".

4.3 Concurrency & determinism

Add a test that:

  1. Triggers N parallel graph builds for the same inputs.

  2. Asserts that:

    • All builds succeed.
    • All GraphRevisionId are identical.
    • All reachability reason codes are identical.

This matters for concurrent scanners and guards against race conditions in graph construction.


5) Update pipelines & job routing

Objectives

  1. Bundle swaps are atomic: scanners see either old or new, never partially written bundles.
  2. Policy changes are always signed via Authority; unsigned/invalid changes never apply.
  3. Job routing changes (if/when you move to direct microservice pools) remain stateless and testable.

5.1 Two-phase bundle swap tests

Bundle layout:

  • /opt/stellaops/bundles/current → symlink to v-YYYYMMDDHHmmss

  • New bundle:

    • Download to /opt/stellaops/bundles/staging/<temp-id>
    • Verify
    • Atomic ln -s v-new current.tmp && mv -T current.tmp current
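
The same stage → verify → rename dance, sketched in Python (`os.replace` is the `mv -T` equivalent on POSIX; directory layout as above). Readers following `current` see either the old or the new target, never a missing link:

```python
import os
from pathlib import Path

def atomic_swap(bundles_dir: Path, new_version: str) -> None:
    """Repoint bundles_dir/current at new_version atomically."""
    tmp = bundles_dir / "current.tmp"
    # Crash-recovery cleanup: a stale current.tmp from an interrupted
    # swap must be removed before retrying (scenario 3 below).
    if tmp.is_symlink() or tmp.exists():
        tmp.unlink()
    os.symlink(new_version, tmp)              # like: ln -s v-new current.tmp
    os.replace(tmp, bundles_dir / "current")  # like: mv -T current.tmp current
```

The parallel-reader test in scenario 1 amounts to hammering `os.readlink(current)` while this runs and asserting it only ever returns v-old or v-new.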

Project: StellaOps.Bundle.Tests, file BundleSwapTests.cs

Scenarios:

  1. Swap_Success_IsAtomic

    • Simulate swap in a temp directory.

    • During swap, spawn parallel tasks that repeatedly read current and open manifest.json.

    • Assert:

      • Readers never fail with “file not found” / partial manifest.
      • Readers only see either v-old or v-new, no mixed state.
  2. Swap_VerificationFails_NoChangeToCurrent

    • Stage bundle which fails BundleVerifier.

    • After attempted swap:

      • current still points to v-old.
      • No new directory with the name expected for v-new is referenced by current.
  3. Swap_CrashBetweenVerifyAndMv_LeavesSystemConsistent

    • Simulate crash after creating current.tmp but before mv -T.

    • On “restart”:

      • Cleanup code must detect current.tmp and remove it.
      • Ensure current still points to last good.

5.2 Authority-gated policy changes

Component: StellaOps.Authority + any service that exposes /policies.

Policy change flow:

  1. Client sends DSSE-signed PolicyChangeRequest to /authority/verify.
  2. Authority validates signature, subject hash.
  3. Service applies change only if Authority approves.

Project: StellaOps.Authority.Tests + StellaOps.Scanner.Tests (or wherever policies live).

Key tests:

  1. PolicyChange_WithValidSignature_Applies

    • The signed request's subject hash matches the computed diff of the old→new policy.
    • Authority returns Approved.
    • Policy service updates policy; audit log entry recorded.
  2. PolicyChange_InvalidSignature_Rejected

    • Signature does not verify against any trusted key, or the payload is corrupted.

    • Expect:

      • HTTP 403 or 400 from policy endpoint.
      • No policy change in DB.
      • Audit log entry with reason SIGNATURE_INVALID.
  3. PolicyChange_SubjectHashMismatch_Rejected

    • Attacker changes policy body but not DSSE subject.
    • On verification, the recomputed diff doesn't match the subject hash.
    • Authority rejects with SUBJECT_MISMATCH.
  4. PolicyChange_ExpiredEnvelope_Rejected

    • Envelope contains expiry in past.
    • Authority rejects with ENVELOPE_EXPIRED.
  5. PolicyChange_AuditTrail_Complete

    • After valid change:

      • Audit log contains: policyName, oldHash, newHash, signerId, envelopeId, timestamp.
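
The verification order these five tests pin down can be sketched as a single decision function. Python sketch with the DSSE crypto stubbed as a boolean (real signature verification happens first in Authority; the subject-hash and expiry checks are the interesting logic here, and the diff encoding is an assumption):

```python
import hashlib
import json

def check_policy_change(envelope: dict, old_policy: dict, new_policy: dict,
                        now: float) -> str:
    """Return the Authority decision code for a policy-change envelope."""
    if not envelope.get("signature_valid"):       # stubbed DSSE verification
        return "SIGNATURE_INVALID"
    if envelope.get("expiry", 0) <= now:
        return "ENVELOPE_EXPIRED"
    # Recompute the canonical old->new diff hash and compare to the
    # envelope subject; a body edited after signing fails here.
    diff = json.dumps({"old": old_policy, "new": new_policy}, sort_keys=True)
    if hashlib.sha256(diff.encode()).hexdigest() != envelope.get("subject"):
        return "SUBJECT_MISMATCH"
    return "APPROVED"
```

Every non-APPROVED code maps to a rejected HTTP response plus an audit-ledger entry with that reason.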

5.3 Job routing (if/when you use DB-backed routing tables)

You discussed a routing table:

domain       text,
instance_id  uuid,
last_heartbeat timestamptz,
table_name   text
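
The GC rule in test 1 is simple enough to state exactly. A Python sketch over in-memory rows (the 60-second TTL comes from the test above; in production this would be a SQL DELETE keyed on last_heartbeat):

```python
def prune_stale_routes(routes: list[dict], now: float,
                       ttl_seconds: float = 60.0) -> list[dict]:
    """Keep only routing entries whose last_heartbeat is within the TTL;
    anything older must not receive new jobs."""
    return [r for r in routes if now - r["last_heartbeat"] <= ttl_seconds]
```

Round-robin (test 2) then operates only over the surviving rows for a domain.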

Key tests (once implemented):

  1. HeartbeatExpired_DropsRoutingEntry

    • Insert entry with last_heartbeat older than 1 minute.
    • Routing GC job should remove it.
    • API gateway must not route new jobs to that instance.
  2. RoundRobinAcrossAliveInstances

    • Multiple routing rows for same domain with fresh heartbeats.
    • Issue M requests via gateway.
    • Assert approximately round-robin distribution across instance_id.
  3. NoDurabilityRequired_JobsNotReplayedAfterRestart

    • Confirm that in-memory or temp tables are used appropriately where you do not want durable queues.

If you decide to go with “N gateways x M microservices via Docker load balancer only”, then the main tests here move to health-check based routing in the load balancer and become more infra than app tests.


6) CI wiring summary

To make this actually enforceable:

  1. Unit test job (test:unit)

    • Runs StellaOps.Bundle.Tests, StellaOps.SbomGate.Tests, StellaOps.Authority.Tests, StellaOps.Scanner.Tests.
  2. DB recoverability job (test:db-recoverability)

    • Uses Testcontainers to run StellaOps.DataRecoverability.Tests.
    • Marked as “required” for main branch merges.
  3. Acceptance job (test:acceptance-system)

    • Spins up a minimal stack via Docker Compose.

    • Executes StellaOps.System.Acceptance tests:

      • Feed outages & fallback.
      • Air-gap modes.
      • Bundle swap.
    • Can be slower; run on main and release branches.

  4. Nightly chaos job (test:nightly-chaos)

    • Optional: run more expensive tests (simulated DB corruption, analyzer outages, etc.).

If you want, as a next step I can generate skeleton xUnit test classes and a /testdata layout you can paste directly into your repo (with TODOs where real fixtures are needed).