Here's a tight, drop-in acceptance-test pack for Stella Ops that turns common failure modes into concrete guardrails you can ship this sprint.
---
# 1) Feed outages & integrity drift (e.g., Grype DB / CDN hiccups)
**Lesson:** Never couple scans to a single live feed; pin, verify, and cache.
**Add to acceptance tests**
* **Rollback-safe updaters**
* If a feed update fails checksum or signature, the system keeps using the last “good” bundle.
* On restart, the updater falls back to the last verified bundle without network access.
* **Signed offline bundles**
* Every feed bundle (SBOM catalogs, CVE DB shards, rules) must be DSSE-signed; verification blocks ingestion on mismatch.
* Bundle manifest lists the SHA-256 for each file; any deviation = reject (see the sketch after the test cases below).
**Test cases (CI)**
* Simulate 404/timeout from feed URL → scanner still produces results from cached bundle.
* Serve a tampered bundle (wrong hash) → updater logs failure; no swap; previous bundle remains active.
* Air-gap mode: no network → scanner loads from `/var/lib/stellaops/offline-bundles/*` and passes verification.
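A minimal sketch of the manifest/hash check the bullets above rely on; the `BundleManifest` shape and file layout here are assumptions, not the shipped bundle format:

```csharp
using System.Security.Cryptography;
using System.Text.Json;

// Hypothetical manifest shape: bundle.json lists each file with its expected SHA-256.
public sealed record ManifestEntry(string Path, string Sha256);
public sealed record BundleManifest(string BundleId, IReadOnlyList<ManifestEntry> Files);

public static class BundleHashCheck
{
    // Returns the files whose on-disk hash deviates from the manifest (or are missing);
    // an empty list means the bundle content matches. DSSE verification of bundle.json
    // itself would happen before this step.
    public static async Task<IReadOnlyList<string>> FindDeviationsAsync(string bundleRoot)
    {
        var manifestJson = await File.ReadAllTextAsync(Path.Combine(bundleRoot, "bundle.json"));
        var manifest = JsonSerializer.Deserialize<BundleManifest>(manifestJson,
            new JsonSerializerOptions { PropertyNameCaseInsensitive = true })
            ?? throw new InvalidDataException("bundle.json could not be parsed");

        var deviations = new List<string>();
        foreach (var entry in manifest.Files)
        {
            var fullPath = Path.Combine(bundleRoot, entry.Path);
            if (!File.Exists(fullPath)) { deviations.Add(entry.Path); continue; }

            await using var stream = File.OpenRead(fullPath);
            var actual = Convert.ToHexString(await SHA256.HashDataAsync(stream));
            if (!actual.Equals(entry.Sha256, StringComparison.OrdinalIgnoreCase))
                deviations.Add(entry.Path);
        }
        return deviations;
    }
}
```

Any non-empty result would map to `BUNDLE_FILE_HASH_MISMATCH` / `BUNDLE_FILE_MISSING` and block the swap.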
---
# 2) SBOM quality & schema drift
**Lesson:** Garbage in = garbage VEX. Gate on schema, completeness, and provenance.
**Add to acceptance tests**
* **SBOM schema gating**
* Reject SBOMs not valid CycloneDX 1.6 / SPDX 2.3 (your chosen set).
* Require: component `bom-ref`, supplier, version, and hashes; validate build provenance (SLSA/in-toto attestation) whenever it is provided.
* **Minimum completeness**
* Thresholds: ≥95% of components carry cryptographic hashes; no unknown package-ecosystem fields for the top 20 dependencies (see the sketch after the test cases below).
**Test cases**
* Submit a malformed CycloneDX SBOM → `400 SBOM_VALIDATION_FAILED` with a pointer to the failing JSON path.
* SBOM missing hashes for >5% of components → blocked from graph ingestion; actionable error.
* SBOM with unsigned provenance when policy="RequireAttestation" → rejected.
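A minimal sketch of the hash-coverage gate referenced above; the component shape and the `MinHashCoverage` constant are illustrative assumptions:

```csharp
// Hypothetical, minimal component shape for the completeness gate.
public sealed record SbomComponent(string BomRef, string? Supplier, IReadOnlyList<string> Hashes);

public static class SbomCompletenessGate
{
    public const double MinHashCoverage = 0.95;

    // Returns an error code when hash coverage falls below the policy threshold,
    // otherwise null. A real policy would also check supplier, version, and provenance.
    public static string? CheckHashCoverage(IReadOnlyList<SbomComponent> components)
    {
        if (components.Count == 0)
            return "SBOM_EMPTY";

        var withHashes = components.Count(c => c.Hashes.Count > 0);
        var coverage = (double)withHashes / components.Count;

        return coverage < MinHashCoverage
            ? $"SBOM_HASH_COVERAGE_BELOW_THRESHOLD ({coverage:P1} < {MinHashCoverage:P0})"
            : null;
    }
}
```

The same pattern extends to the supplier and provenance checks.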
---
# 3) DB/data corruption or operator error
**Lesson:** Snapshots save releases.
**Add to acceptance tests**
* **DB snapshot cadence**
* Postgres: base backup nightly + WAL archiving; RPO ≤ 15 min; automated restore rehearsals.
* Mongo (while still in use): per-collection dumps until conversion completes; checksum each artifact.
* **Deterministic replay**
* Any graph view must be reproducible from snapshot + bundle manifest (same revision hash).
**Test cases**
* Run a chaos test that drops the last 24 hours of data → PITR restore to T−15m succeeds; graph revision IDs match pre-failure.
* Restore rehearsal produces identical VEX verdict counts for a pinned revision.
---
# 4) Reachability engines & graph evaluation flakiness
**Lesson:** When reachability is uncertain, degrade gracefully and be explicit.
**Add to acceptance tests**
* **Reachability fallbacks**
* If call-graph build fails or language analyzer missing, verdict moves to “Potentially Affected (Unproven Reach)” with a reason code.
* Policies must allow a “conservative mode” (assume reachable) vs a “lenient mode” (assume not reachable), toggled per environment (see the sketch after the test cases below).
* **Stable graph IDs**
* Graph revision ID is a content hash of inputs (SBOM set + rules + feed versions); identical inputs → identical ID.
**Test cases**
* Remove a language analyzer container at runtime → status flips to fallback code; no 500s; policy evaluation still completes.
* Re-ingest same inputs → same graph revision ID and same verdict distribution.
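One way the conservative/lenient toggle could be expressed; the enum, method, and the lenient mapping are interpretations of the bullets above, not a confirmed API:

```csharp
public enum ReachabilityMode { Conservative, Lenient }

public static class ReachabilityFallbackPolicy
{
    // When reachability cannot be computed (failed call-graph build, missing analyzer,
    // timeout), the verdict degrades explicitly instead of failing the evaluation.
    public static (string Status, string ReasonCode) Degrade(ReachabilityMode mode, string reasonCode) =>
        mode switch
        {
            // Conservative: treat unknown reachability as reachable.
            ReachabilityMode.Conservative => ("PotentiallyAffected", reasonCode),
            // Lenient (assumed mapping): treat unknown as not reachable, but keep the
            // reason code so the downgrade stays auditable rather than silent.
            ReachabilityMode.Lenient => ("NotAffected", reasonCode),
            _ => throw new ArgumentOutOfRangeException(nameof(mode))
        };
}
```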
---
# 5) Update pipelines & job routing
**Lesson:** No single point of truth; isolate, audit, and prove swaps.
**Add to acceptance tests**
* **Two-phase bundle swaps**
* Stage → verify → atomic symlink/label swap; all scanners pick up the new label within 1 minute, or the swap rolls back.
* **Authority-gated policy changes**
* Any policy change (severity threshold, allowlist) is a signed request via Authority; audit trail must include signer and DSSE envelope hash.
**Test cases**
* Introduce a new CVE ruleset; verification passes → atomic swap; running scans continue; new scans use N+1 bundle.
* Attempt policy change with invalid signature → rejected; audit log entry created; unchanged policy in effect.
---
## How to wire this in Stella Ops (quick pointers)
* **Offline bundle format**
* `bundle.json` (manifest: file list + SHA-256 + DSSE signature), `/sboms/*.json`, `/feeds/cve/*.sqlite` (or shards), `/rules/*.yaml`, `/provenance/*.intoto.jsonl`.
* Verification entrypoint in .NET 10: `StellaOps.Bundle.VerifyAsync(manifest, keyring)` before any ingestion.
* **Authority integration**
* Define `PolicyChangeRequest` (subject, diff, reason, expiry, DSSE envelope).
* Gate `PUT /policies/*` behind `Authority.Verify(envelope) == true` and `envelope.subject == computed_diff_hash`.
* **Graph determinism**
* `GraphRevisionId = SHA256(Sort(JSON([SBOMRefs, RulesetVersion, FeedBundleIds, LatticeConfig, NormalizationVersion])))` (see the sketch below).
* **Postgres snapshots (until full conversion)**
* Use `pg_basebackup` nightly + `wal-g` for WAL; GitLab job runs restore rehearsal weekly into `stellaops-restore` namespace and asserts revision parity against prod.
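A minimal sketch of the revision-hash computation above; the exact serialization and sort rules are assumptions that would be pinned down by `NormalizationVersion`:

```csharp
using System.Security.Cryptography;
using System.Text;
using System.Text.Json;

public static class GraphRevision
{
    // Content-addressed revision ID: identical inputs (regardless of input order)
    // must yield an identical ID, so every list is sorted before serialization.
    public static string Compute(
        IEnumerable<string> sbomRefs,
        string rulesetVersion,
        IEnumerable<string> feedBundleIds,
        string latticeConfigVersion,
        string normalizationVersion)
    {
        var payload = JsonSerializer.Serialize(new
        {
            SbomRefs = sbomRefs.OrderBy(x => x, StringComparer.Ordinal).ToArray(),
            RulesetVersion = rulesetVersion,
            FeedBundleIds = feedBundleIds.OrderBy(x => x, StringComparer.Ordinal).ToArray(),
            LatticeConfigVersion = latticeConfigVersion,
            NormalizationVersion = normalizationVersion
        });

        return Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(payload)));
    }
}
```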
---
## Minimal developer checklist (copy to your sprint board)
* [ ] Add `BundleVerifier` to scanner startup; block if verification fails.
* [ ] Implement `CacheLastGoodBundle()` and atomic label swap (`/current -> /v-YYYYMMDDHHmm`).
* [ ] Add `SbomGate` with JSON-Schema validation + completeness thresholds.
* [ ] Emit reasoned fallbacks: `REACH_FALLBACK_NO_ANALYZER`, `REACH_FALLBACK_TIMEOUT`.
* [ ] Compute and display `GraphRevisionId` everywhere (API + UI + logs).
* [ ] Configure nightly PG backups + weekly restore rehearsal that asserts revision parity.
* [ ] Route all policy mutations through Authority DSSE verification + auditable ledger entry.
If you want, I can turn this into ready-to-merge .NET test fixtures (xUnit) and a GitLab CI job that runs the feed-tamper/air-gap simulations automatically.
I'll take the 5 “miss” areas and turn them into concrete, implementable test plans, with suggested projects, fixtures, and key cases your team can start coding.
I'll keep names aligned to .NET 10/xUnit and your Stella Ops modules.
---
## 0. Test layout proposal
**Solution structure (tests)**
```text
/tests
  /StellaOps.Bundle.Tests
    BundleVerificationTests.cs
    CachedBundleFallbackTests.cs
  /StellaOps.SbomGate.Tests
    SbomSchemaValidationTests.cs
    SbomCompletenessTests.cs
  /StellaOps.Scanner.Tests
    ScannerOfflineBundleTests.cs
    ReachabilityFallbackTests.cs
    GraphRevisionDeterminismTests.cs
  /StellaOps.DataRecoverability.Tests
    PostgresSnapshotRestoreTests.cs
    GraphReplayParityTests.cs
  /StellaOps.Authority.Tests
    PolicyChangeSignatureTests.cs
  /StellaOps.System.Acceptance
    FeedOutageEndToEndTests.cs
    AirGapModeEndToEndTests.cs
    BundleSwapEndToEndTests.cs
  /testdata
    /bundles
    /sboms
    /graphs
    /db
```
Use xUnit + FluentAssertions, plus Testcontainers for Postgres.
---
## 1) Feed outages & integrity drift
### Objectives
1. Scanner never “goes dark” because the CDN/feed is down.
2. Only **verified** bundles are used; tampered bundles are never ingested.
3. Offline/air-gap mode is a first-class, tested behavior.
### Components under test
* `StellaOps.BundleVerifier` (core library)
* `StellaOps.Scanner.Webservice` (scanner, bundle loader)
* Bundle filesystem layout:
`/opt/stellaops/bundles/v-<timestamp>/*` + `/opt/stellaops/bundles/current` symlink
### Test dimensions
* Network: OK / timeout / 404 / TLS failure / DNS failure.
* Remote bundle: correct / tampered (hash mismatch) / wrong signature / truncated.
* Local cache: last-good present / absent / corrupted.
* Mode: online / offline (air-gap).
### Detailed test suites
#### 1.1 Bundle verification unit tests
**Project:** `StellaOps.Bundle.Tests`
**Fixtures:**
* `testdata/bundles/good-bundle/`
* `testdata/bundles/hash-mismatch-bundle/`
* `testdata/bundles/bad-signature-bundle/`
* `testdata/bundles/missing-file-bundle/`
**Key tests:**
1. `VerifyAsync_ValidBundle_ReturnsSuccess`
* Arrange: Load `good-bundle` manifest + DSSE signature.
* Act: `BundleVerifier.VerifyAsync(manifest, keyring)`
* Assert:
* `result.IsValid == true`
* `result.Files.All(f => f.Status == Verified)`
2. `VerifyAsync_HashMismatch_FailsFast`
* Use `hash-mismatch-bundle`, where one file's SHA-256 differs.
* Assert:
* `IsValid == false`
* `Errors` contains `BUNDLE_FILE_HASH_MISMATCH` and the offending path.
3. `VerifyAsync_InvalidSignature_RejectsBundle`
* DSSE envelope signed with unknown key.
* Assert:
* `IsValid == false`
* `Errors` contains `BUNDLE_SIGNATURE_INVALID`.
4. `VerifyAsync_MissingFile_RejectsBundle`
* Manifest lists file that does not exist on disk.
* Assert:
* `IsValid == false`
* `Errors` contains `BUNDLE_FILE_MISSING`.
#### 1.2 Cached bundle fallback logic
**Class under test:** `BundleManager`
Simplified interface:
```csharp
public interface IBundleManager {
    Task<BundleRef> GetActiveBundleAsync();
    Task<BundleRef> UpdateFromRemoteAsync(CancellationToken ct);
}
```
**Key tests:**
1. `UpdateFromRemoteAsync_RemoteUnavailable_KeepsLastGoodBundle`
* Arrange:
* `lastGood` bundle exists and is marked verified.
* Remote HTTP client always throws `TaskCanceledException` (simulated timeout).
* Act: `UpdateFromRemoteAsync`.
* Assert:
* Returned bundle ID equals `lastGood.Id`.
* No changes to `current` symlink.
2. `UpdateFromRemoteAsync_RemoteTampered_DoesNotReplaceCurrent`
* Remote returns bundle `temp-bundle` which fails `BundleVerifier`.
* Assert:
* `current` still points to `lastGood`.
* An error metric is emitted (e.g. `stellaops_bundle_update_failures_total++`).
3. `GetActiveBundle_NoVerifiedBundle_ThrowsDomainError`
* No bundle is verified on disk.
* `GetActiveBundleAsync` throws a domain exception with code `NO_VERIFIED_BUNDLE_AVAILABLE`.
* Consumption pattern in Scanner: the scanner fails fast on startup with a clear log message.
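A sketch of the first fallback test; `FakeBundleStore`, `FakeRemoteBundleSource`, and the `BundleManager` constructor are assumed test doubles, not existing fixtures:

```csharp
using FluentAssertions;
using Xunit;

public class CachedBundleFallbackTests
{
    [Fact]
    public async Task UpdateFromRemoteAsync_RemoteUnavailable_KeepsLastGoodBundle()
    {
        // Arrange: a verified last-good bundle on "disk", and a remote that always times out.
        var store = new FakeBundleStore(lastGoodId: "v-20251101-0300", verified: true);
        var remote = new FakeRemoteBundleSource(throwOnFetch: new TaskCanceledException("simulated timeout"));
        var manager = new BundleManager(store, remote);

        // Act
        var active = await manager.UpdateFromRemoteAsync(CancellationToken.None);

        // Assert: the failed update must not disturb the active bundle.
        active.Id.Should().Be("v-20251101-0300");
        store.CurrentSymlinkTarget.Should().Be("v-20251101-0300");
    }
}
```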
#### 1.3 Scanner behavior with outages (integration)
**Project:** `StellaOps.Scanner.Tests`
Use in-memory host (`WebApplicationFactory<ScannerProgram>`).
**Scenarios:**
* F1: CDN timeout, last-good present.
* F2: CDN 404, last-good present.
* F3: CDN returns tampered bundle; verification fails.
* F4: Air-gap: network disabled, last-good present.
* F5: Air-gap + no last-good: scanner must refuse to start.
Example test:
```csharp
[Fact]
public async Task Scanner_UsesLastGoodBundle_WhenCdnTimesOut() {
    // Arrange: put good bundle under /bundles/v-1, symlink /bundles/current -> v-1
    using var host = TestScannerHost.WithBundle("v-1", good: true, simulateCdnTimeout: true);
    var scanRequest = TestFixtures.SmallImageScanRequest(); // hypothetical fixture helper

    // Act: call /api/scan with small fixture image
    var response = await host.Client.PostAsJsonAsync("/api/scan", scanRequest);

    // Assert:
    response.StatusCode.Should().Be(HttpStatusCode.OK);
    var content = await response.Content.ReadFromJsonAsync<ScanResult>();
    content!.BundleId.Should().Be("v-1");
    host.Logs.Should().Contain("Falling back to last verified bundle");
}
```
#### 1.4 System acceptance (GitLab CI)
**Job idea:** `acceptance:feed-resilience`
Steps:
1. Spin up `scanner` + stub `feedser` container.
2. Phase A: feed OK → run baseline scan; capture `bundleId` and `graphRevisionId`.
3. Phase B: re-run with feed stub configured to:
* timeout,
* 404,
* return tampered bundle.
4. For each phase:
* Assert `bundleId` remains the baseline one.
* Assert `graphRevisionId` unchanged.
Failure of any assertion should break the pipeline.
---
## 2) SBOM quality & schema drift
### Objectives
1. Only syntactically valid SBOMs are ingested into the graph.
2. Enforce minimum completeness (hash coverage, supplier, etc.).
3. Clear, machine-readable error responses from SBOM ingestion API.
### Components
* `StellaOps.SbomGate` (validation service)
* SBOM ingestion endpoint in Scanner/Concelier: `POST /api/sboms`
### Schema validation tests
**Project:** `StellaOps.SbomGate.Tests`
**Fixtures:**
* `sbom-cdx-1.6-valid.json`
* `sbom-cdx-1.6-malformed.json`
* `sbom-spdx-2.3-valid.json`
* `sbom-unsupported-schema.json`
* `sbom-missing-hashes-10percent.json`
* `sbom-no-supplier.json`
**Key tests:**
1. `Validate_ValidCycloneDx16_Succeeds`
* Assert type `SbomValidationResult.Success`.
* Ensure `DetectedSchema == CycloneDx16`.
2. `Validate_MalformedJson_FailsWithSyntaxError`
* Malformed JSON.
* Assert:
* `IsValid == false`
* `Errors` contains `SBOM_JSON_SYNTAX_ERROR` with path info.
3. `Validate_UnsupportedSchemaVersion_Fails`
* SPDX 2.1 (if you only allow 2.3).
* Expect `SBOM_SCHEMA_UNSUPPORTED` with `schemaUri` echo.
4. `Validate_MissingHashesOverThreshold_Fails`
* SBOM where >5% components lack hashes.
* Policy: `MinHashCoverage = 0.95`.
* Assert:
* `IsValid == false`
* `Errors` contains `SBOM_HASH_COVERAGE_BELOW_THRESHOLD` with actual ratio.
5. `Validate_MissingSupplier_Fails`
* Critical components missing supplier info.
* Expect `SBOM_REQUIRED_FIELD_MISSING` with `component.supplier`.
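An xUnit sketch of test 4; `SbomValidator` and `SbomPolicy` are assumed names for the gate's entry point:

```csharp
using FluentAssertions;
using Xunit;

public class SbomCompletenessTests
{
    [Fact]
    public async Task Validate_MissingHashesOverThreshold_Fails()
    {
        // Arrange: fixture where >5% of components lack hashes; policy requires 95% coverage.
        var sbomJson = await File.ReadAllTextAsync("testdata/sboms/sbom-missing-hashes-10percent.json");
        var validator = new SbomValidator(new SbomPolicy { MinHashCoverage = 0.95 });

        // Act
        var result = await validator.ValidateAsync(sbomJson);

        // Assert: rejection must carry the specific error code (and ideally the measured ratio).
        result.IsValid.Should().BeFalse();
        result.Errors.Should().Contain(e => e.Code == "SBOM_HASH_COVERAGE_BELOW_THRESHOLD");
    }
}
```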
### API-level tests
**Project:** `StellaOps.Scanner.Tests` (or `StellaOps.Concelier.Tests` depending where SBOM ingestion lives).
Key scenarios:
1. `POST /api/sboms` with malformed JSON
* Request body: `sbom-cdx-1.6-malformed.json`.
* Expected:
* HTTP 400.
* Body: `{ "code": "SBOM_VALIDATION_FAILED", "details": [ ... ], "correlationId": "..." }`.
* At least one detail contains `SBOM_JSON_SYNTAX_ERROR`.
2. `POST /api/sboms` with missing hashes
* Body: `sbom-missing-hashes-10percent.json`.
* HTTP 400 with `SBOM_HASH_COVERAGE_BELOW_THRESHOLD`.
3. `POST /api/sboms` with unsupported schema
* Body: `sbom-unsupported-schema.json`.
* HTTP 400 with `SBOM_SCHEMA_UNSUPPORTED`.
4. `POST /api/sboms` valid
* Body: `sbom-cdx-1.6-valid.json`.
* HTTP 202 or 201 (depending on design).
* Response contains SBOM ID; subsequent graph build sees that SBOM.
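A sketch of scenario 1 against the ingestion endpoint, following the `WebApplicationFactory<ScannerProgram>` pattern used earlier; the `SbomErrorResponse` contract below is a hypothetical mirror of the expected body:

```csharp
using System.Net;
using System.Net.Http.Json;
using FluentAssertions;
using Microsoft.AspNetCore.Mvc.Testing;
using Xunit;

public class SbomIngestionApiTests : IClassFixture<WebApplicationFactory<ScannerProgram>>
{
    private readonly HttpClient _client;

    public SbomIngestionApiTests(WebApplicationFactory<ScannerProgram> factory)
        => _client = factory.CreateClient();

    [Fact]
    public async Task PostSboms_MalformedJson_Returns400WithMachineReadableError()
    {
        var malformed = await File.ReadAllTextAsync("testdata/sboms/sbom-cdx-1.6-malformed.json");

        var response = await _client.PostAsync("/api/sboms",
            new StringContent(malformed, System.Text.Encoding.UTF8, "application/json"));

        response.StatusCode.Should().Be(HttpStatusCode.BadRequest);
        var body = await response.Content.ReadFromJsonAsync<SbomErrorResponse>();
        body!.Code.Should().Be("SBOM_VALIDATION_FAILED");
        body.Details.Should().Contain(d => d.Contains("SBOM_JSON_SYNTAX_ERROR"));
    }
}

// Hypothetical error contract mirroring the expected response body above.
public sealed record SbomErrorResponse(string Code, string[] Details, string CorrelationId);
```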
---
## 3) DB/data corruption & operator error
### Objectives
1. You can restore Postgres to a point in time and reproduce previous graph results.
2. Graphs are deterministic given bundle + SBOM + rules.
3. Obvious corruptions are detected and surfaced, not silently masked.
### Components
* Postgres cluster (new canonical store)
* `StellaOps.Scanner.Webservice` (graph builder, persistence)
* `GraphRevisionId` computation
### 3.1 Postgres snapshot / WAL tests
**Project:** `StellaOps.DataRecoverability.Tests`
Use Testcontainers to spin up Postgres.
Scenarios:
1. `PITR_Restore_ReplaysGraphsWithSameRevisionIds`
* Arrange:
* Spin DB container with WAL archiving enabled.
* Apply schema migrations.
* Ingest fixed set of SBOMs + bundle refs + rules.
* Trigger graph build → record `graphRevisionIds` from API.
* Take base backup snapshot (simulate daily snapshot).
* Act:
* Destroy container.
* Start new container from base backup + replay WAL up to a specific LSN.
* Start Scanner against restored DB.
* Query graphs again.
* Assert:
* For each known graph: `revisionId_restored == revisionId_original`.
* Number of nodes/edges is identical.
2. `PartialDataLoss_DetectedByHealthCheck`
* After initial load, deliberately delete some rows (e.g. all edges for a given graph).
* Run health check endpoint, e.g. `/health/graph`.
* Expect:
* HTTP 503.
* Body indicates `GRAPH_INTEGRITY_FAILED` with details of missing edges.
This test forces the discipline of implementing a basic graph integrity check (e.g. counts by state vs. expected).
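A minimal Testcontainers wiring sketch for these scenarios; the `TestDb`/`TestGraphs` helpers are placeholders marking where the real migration, ingestion, snapshot, and restore steps plug in:

```csharp
using FluentAssertions;
using Testcontainers.PostgreSql;
using Xunit;

public class PostgresSnapshotRestoreTests : IAsyncLifetime
{
    private readonly PostgreSqlContainer _postgres = new PostgreSqlBuilder()
        .WithImage("postgres:16")
        .Build();

    public Task InitializeAsync() => _postgres.StartAsync();
    public Task DisposeAsync() => _postgres.DisposeAsync().AsTask();

    [Fact]
    public async Task PITR_Restore_ReplaysGraphsWithSameRevisionIds()
    {
        var connectionString = _postgres.GetConnectionString();

        // Placeholder steps: these helpers do not exist yet and mark where the
        // real migration, ingestion, snapshot, and restore logic plugs in.
        await TestDb.ApplyMigrationsAsync(connectionString);
        var originalRevisions = await TestGraphs.BuildAndRecordRevisionsAsync(connectionString);

        var restoredConnection = await TestDb.SnapshotAndRestoreAsync(_postgres);
        var restoredRevisions = await TestGraphs.BuildAndRecordRevisionsAsync(restoredConnection);

        restoredRevisions.Should().BeEquivalentTo(originalRevisions);
    }
}
```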
### 3.2 Deterministic replay tests
**Project:** `StellaOps.Scanner.Tests` → `GraphRevisionDeterminismTests.cs`
**Precondition:** Graph revision ID computed as:
```csharp
GraphRevisionId = SHA256(
    Normalize([
        BundleId,
        OrderedSbomIds,
        RulesetVersion,
        FeedBundleIds,
        LatticeConfigVersion,
        NormalizationVersion
    ])
);
```
**Scenarios:**
1. `SameInputs_SameRevisionId`
* Run graph build twice for same inputs.
* Assert identical `GraphRevisionId`.
2. `DifferentBundle_DifferentRevisionId`
* Same SBOMs & rules; change vulnerability bundle ID.
* Assert `GraphRevisionId` changes.
3. `DifferentRuleset_DifferentRevisionId`
* Same SBOM & bundle; change ruleset version.
* Assert `GraphRevisionId` changes.
4. `OrderingIrrelevant_StableRevision`
* Provide SBOMs in different order.
* Assert `GraphRevisionId` is unchanged (because inputs are sorted internally).
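A sketch covering scenarios 1 and 4 together; `GraphBuilder.BuildAsync` and `TestInputs.Fixed()` are illustrative names, not existing APIs:

```csharp
using FluentAssertions;
using Xunit;

public class GraphRevisionDeterminismTests
{
    [Fact]
    public async Task SameInputs_AnyOrder_SameRevisionId()
    {
        var inputs = TestInputs.Fixed();   // pinned SBOMs, ruleset, feed bundles
        var shuffled = inputs with { SbomIds = inputs.SbomIds.Reverse().ToArray() };

        var first  = await GraphBuilder.BuildAsync(inputs);
        var second = await GraphBuilder.BuildAsync(inputs);
        var third  = await GraphBuilder.BuildAsync(shuffled);   // input order must not matter

        second.RevisionId.Should().Be(first.RevisionId);
        third.RevisionId.Should().Be(first.RevisionId);
    }
}
```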
---
## 4) Reachability engine & graph evaluation flakiness
### Objectives
1. If reachability cannot be computed, you do not break; you downgrade verdicts with explicit reason codes.
2. Deterministic reachability for “golden fixtures”.
3. Graph evaluation remains stable even when analyzers come and go.
### Components
* `StellaOps.Scanner.Webservice` (lattice / reachability engine)
* Language analyzers (sidecar or gRPC microservices)
* Verdict representation, e.g.:
```csharp
public sealed record VulnerabilityVerdict(
    string Status,      // "NotAffected", "Affected", "PotentiallyAffected"
    string ReasonCode,  // "REACH_CONFIRMED", "REACH_FALLBACK_NO_ANALYZER", ...
    string? AnalyzerId
);
```
### 4.1 Golden reachability fixtures
**Project:** `StellaOps.Scanner.Tests` → `GoldenReachabilityTests.cs`
**Fixtures directory:** `/testdata/reachability/fixture-*/`
Each fixture:
```text
/testdata/reachability/fixture-01-log4j/
  sbom.json
  code-snippets/...
  expected-vex.json
  config.json   # language, entrypoints, etc.
```
**Test pattern:**
For each fixture:
1. Load SBOM + configuration.
2. Trigger reachability analysis.
3. Collect raw reachability graph + final VEX verdicts.
4. Compare to `expected-vex.json` (status + reason codes).
5. Store the `GraphRevisionId` and set it as golden as well.
Key cases:
* R1: simple direct call → reachability confirmed → `Status = "Affected", ReasonCode = "REACH_CONFIRMED"`.
* R2: library present but not called → `Status = "NotAffected", ReasonCode = "REACH_ANALYZED_UNREACHABLE"`.
* R3: language analyzer missing → `Status = "PotentiallyAffected", ReasonCode = "REACH_FALLBACK_NO_ANALYZER"`.
* R4: analysis timeout → `Status = "PotentiallyAffected", ReasonCode = "REACH_FALLBACK_TIMEOUT"`.
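A data-driven sketch of the fixture loop; the directory scan is concrete, while `ReachabilityHarness.AnalyzeAsync` stands in for the real analysis entry point:

```csharp
using System.Text.Json;
using FluentAssertions;
using Xunit;

public class GoldenReachabilityTests
{
    public static IEnumerable<object[]> Fixtures() =>
        Directory.EnumerateDirectories("testdata/reachability", "fixture-*")
                 .Select(dir => new object[] { dir });

    [Theory]
    [MemberData(nameof(Fixtures))]
    public async Task Fixture_ProducesExpectedVerdicts(string fixtureDir)
    {
        // Run the analysis over the fixture's SBOM + config and compare status/reason codes.
        var expectedJson = await File.ReadAllTextAsync(Path.Combine(fixtureDir, "expected-vex.json"));
        var expected = JsonSerializer.Deserialize<VulnerabilityVerdict[]>(expectedJson)!;

        var actual = await ReachabilityHarness.AnalyzeAsync(fixtureDir);

        actual.Should().BeEquivalentTo(expected,
            options => options.Including(v => v.Status).Including(v => v.ReasonCode));
    }
}
```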
### 4.2 Analyzer unavailability / fallback behavior
**Project:** `StellaOps.Scanner.Tests` → `ReachabilityFallbackTests.cs`
Scenarios:
1. `NoAnalyzerRegistered_ForLanguage_UsesFallback`
* Scanner config lists a component in language “go” but no analyzer registered.
* Expect:
* No 500 error from `/api/graphs/...`.
* All applicable vulnerabilities for that component have `Status = "PotentiallyAffected"` and `ReasonCode = "REACH_FALLBACK_NO_ANALYZER"`.
2. `AnalyzerRpcFailure_UsesFallback`
* Analyzer responds with gRPC error or HTTP 500.
* Scanner logs error and keeps going.
* Same semantics as missing analyzer, but with `AnalyzerId` populated and optional `ReasonDetails` (e.g. `RPC_UNAVAILABLE`).
3. `AnalyzerTimeout_UsesTimeoutFallback`
* Force analyzer calls to time out.
* `ReasonCode = "REACH_FALLBACK_TIMEOUT"`.
### 4.3 Concurrency & determinism
Add a test that:
1. Triggers N parallel graph builds for the same inputs.
2. Asserts that:
* All builds succeed.
* All `GraphRevisionId` are identical.
* All reachability reason codes are identical.
This matters for concurrent scanners and guards against race conditions in graph construction.
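A sketch of the parallel-build check, reusing the illustrative `GraphBuilder.BuildAsync` entry point from the determinism tests:

```csharp
using FluentAssertions;
using Xunit;

public class GraphBuildConcurrencyTests
{
    [Fact]
    public async Task ParallelBuilds_SameInputs_ProduceIdenticalRevisionsAndReasonCodes()
    {
        const int parallelism = 8;
        var inputs = TestInputs.Fixed();   // pinned SBOMs, ruleset, feed bundles

        // N concurrent builds of the same inputs must all succeed and agree.
        var builds = await Task.WhenAll(
            Enumerable.Range(0, parallelism).Select(_ => GraphBuilder.BuildAsync(inputs)));

        builds.Select(b => b.RevisionId).Distinct().Should().ContainSingle();

        // Reason codes (as a sorted multiset) must also be identical across builds.
        builds.Select(b => string.Join(",", b.Verdicts.Select(v => v.ReasonCode).OrderBy(r => r)))
              .Distinct().Should().ContainSingle();
    }
}
```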
---
## 5) Update pipelines & job routing
### Objectives
1. Bundle swaps are atomic: scanners see either old or new, never partially written bundles.
2. Policy changes are always signed via Authority; unsigned/invalid changes never apply.
3. Job routing changes (if/when you move to direct microservice pools) remain stateless and testable.
### 5.1 Two-phase bundle swap tests
**Bundle layout:**
* `/opt/stellaops/bundles/current` → symlink to `v-YYYYMMDDHHmmss`
* New bundle:
* Download to `/opt/stellaops/bundles/staging/<temp-id>`
* Verify
* Atomic `ln -s v-new current.tmp && mv -T current.tmp current`
**Project:** `StellaOps.Bundle.Tests` → `BundleSwapTests.cs`
Scenarios:
1. `Swap_Success_IsAtomic`
* Simulate swap in a temp directory.
* During swap, spawn parallel tasks that repeatedly read `current` and open `manifest.json`.
* Assert:
* Readers never fail with “file not found” / partial manifest.
* Readers only see either `v-old` or `v-new`, no mixed state.
2. `Swap_VerificationFails_NoChangeToCurrent`
* Stage bundle which fails `BundleVerifier`.
* After attempted swap:
* `current` still points to `v-old`.
* No new directory with the name expected for `v-new` is referenced by `current`.
3. `Swap_CrashBetweenVerifyAndMv_LeavesSystemConsistent`
* Simulate crash after creating `current.tmp` but before `mv -T`.
* On “restart”:
* Cleanup code must detect `current.tmp` and remove it.
* Ensure `current` still points to last good.
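A sketch of scenario 1; `TempBundleDirectory` and `BundleSwapper` are assumed helpers implementing the stage → verify → rename flow above, and the reader loop is what actually proves atomicity:

```csharp
using System.Collections.Concurrent;
using System.Text.Json;
using FluentAssertions;
using Xunit;

public class BundleSwapTests
{
    [Fact]
    public async Task Swap_Success_IsAtomic()
    {
        using var temp = new TempBundleDirectory(initial: "v-old");   // assumed fixture helper
        var swapper = new BundleSwapper(temp.Root);                   // assumed swap implementation
        using var cts = new CancellationTokenSource();
        var observed = new ConcurrentBag<string>();

        // Hammer "current" while the swap runs; a failed read (missing or partial
        // manifest) throws and fails the test via Task.WhenAll below.
        var readers = Enumerable.Range(0, 4).Select(_ => Task.Run(() =>
        {
            while (!cts.IsCancellationRequested)
            {
                var manifest = File.ReadAllText(Path.Combine(temp.Root, "current", "manifest.json"));
                observed.Add(JsonDocument.Parse(manifest).RootElement.GetProperty("bundleId").GetString()!);
            }
        })).ToArray();

        await swapper.SwapAsync("v-new");
        cts.Cancel();
        await Task.WhenAll(readers);

        // Readers only ever saw the old or the new bundle, never a mixed state.
        observed.Distinct().Should().BeSubsetOf(new[] { "v-old", "v-new" });
    }
}
```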
### 5.2 Authority-gated policy changes
**Component:** `StellaOps.Authority` + any service that exposes `/policies`.
Policy change flow:
1. Client sends DSSE-signed `PolicyChangeRequest` to `/authority/verify`.
2. Authority validates signature, subject hash.
3. Service applies change only if Authority approves.
**Project:** `StellaOps.Authority.Tests` + `StellaOps.Scanner.Tests` (or wherever policies live).
Key tests:
1. `PolicyChange_WithValidSignature_Applies`
* Signed request's `subject` hash matches the computed diff of the old→new policy.
* Authority returns `Approved`.
* Policy service updates policy; audit log entry recorded.
2. `PolicyChange_InvalidSignature_Rejected`
* Signature is not verifiable with any trusted key, or the payload is corrupted.
* Expect:
* HTTP 403 or 400 from policy endpoint.
* No policy change in DB.
* Audit log entry with reason `SIGNATURE_INVALID`.
3. `PolicyChange_SubjectHashMismatch_Rejected`
* Attacker changes policy body but not DSSE subject.
* On verification, the recomputed diff doesn't match the subject hash.
* Authority rejects with `SUBJECT_MISMATCH`.
4. `PolicyChange_ExpiredEnvelope_Rejected`
* Envelope contains `expiry` in past.
* Authority rejects with `ENVELOPE_EXPIRED`.
5. `PolicyChange_AuditTrail_Complete`
* After valid change:
* Audit log contains: `policyName`, `oldHash`, `newHash`, `signerId`, `envelopeId`, `timestamp`.
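A sketch of test 3 (subject-hash mismatch); `TestPolicies`, `TestSigning`, and `AuthorityVerifier` are assumed helpers standing in for the real Authority flow described above:

```csharp
using System.Security.Cryptography;
using System.Text;
using FluentAssertions;
using Xunit;

public class PolicyChangeSignatureTests
{
    [Fact]
    public async Task PolicyChange_SubjectHashMismatch_Rejected()
    {
        // Arrange: a validly signed envelope whose subject was computed over the
        // original diff, then a tampered policy body that no longer matches it.
        var originalDiff = TestPolicies.Diff(oldPolicy: "sev>=high", newPolicy: "sev>=critical");
        var envelope = TestSigning.SignedEnvelope(
            subject: Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(originalDiff))));

        var tamperedDiff = TestPolicies.Diff(oldPolicy: "sev>=high", newPolicy: "sev>=low");

        // Act: Authority recomputes the diff hash and compares it to the envelope subject.
        var result = await AuthorityVerifier.VerifyAsync(envelope, computedDiff: tamperedDiff);

        // Assert
        result.Approved.Should().BeFalse();
        result.ReasonCode.Should().Be("SUBJECT_MISMATCH");
    }
}
```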
### 5.3 Job routing (if/when you use DB-backed routing tables)
You discussed a `routing` table:
```sql
CREATE TABLE routing (
    domain         text,
    instance_id    uuid,
    last_heartbeat timestamptz,
    table_name     text
);
```
Key tests (once implemented):
1. `HeartbeatExpired_DropsRoutingEntry`
* Insert entry with `last_heartbeat` older than 1 minute.
* Routing GC job should remove it.
* API gateway must not route new jobs to that instance.
2. `RoundRobinAcrossAliveInstances`
* Multiple routing rows for same domain with fresh heartbeats.
* Issue M requests via gateway.
* Assert approximately round-robin distribution across `instance_id`.
3. `NoDurabilityRequired_JobsNotReplayedAfterRestart`
* Confirm that in-memory or temp tables are used appropriately where you do not want durable queues.
If you decide to go with “N gateways × M microservices behind a Docker load balancer only”, the main tests here move to health-check-based routing in the load balancer and become more infrastructure than application tests.
---
## 6) CI wiring summary
To make this actually enforceable:
1. **Unit test job** (`test:unit`)
* Runs `StellaOps.Bundle.Tests`, `StellaOps.SbomGate.Tests`, `StellaOps.Authority.Tests`, `StellaOps.Scanner.Tests`.
2. **DB recoverability job** (`test:db-recoverability`)
* Uses Testcontainers to run `StellaOps.DataRecoverability.Tests`.
* Marked as “required” for `main` branch merges.
3. **Acceptance job** (`test:acceptance-system`)
* Spins up a minimal stack via Docker Compose.
* Executes `StellaOps.System.Acceptance` tests:
* Feed outages & fallback.
* Air-gap modes.
* Bundle swap.
* Can be slower; run on main and release branches.
4. **Nightly chaos job** (`test:nightly-chaos`)
* Optional: run more expensive tests (simulated DB corruption, analyzer outages, etc.).
---
If you want, next step I can generate skeleton xUnit test classes and a `/testdata` layout you can paste directly into your repo (with TODOs where real fixtures are needed).