Here’s a tight, drop-in acceptance-tests pack for Stella Ops that turns common failure modes into concrete guardrails you can ship this sprint.

---
# 1) Feed outages & integrity drift (e.g., Grype DB / CDN hiccups)

**Lesson:** Never couple scans to a single live feed; pin, verify, and cache.

**Add to acceptance tests**

* **Rollback-safe updaters**

  * If a feed update fails checksum or signature verification, the system keeps using the last “good” bundle.
  * On restart, the updater falls back to the last verified bundle without network access.
* **Signed offline bundles**

  * Every feed bundle (SBOM catalogs, CVE DB shards, rules) must be DSSE-signed; verification blocks ingestion on mismatch.
  * The bundle manifest lists a SHA-256 for each file; any deviation = reject.

**Test cases (CI)**

* Simulate 404/timeout from the feed URL → scanner still produces results from the cached bundle.
* Serve a tampered bundle (wrong hash) → updater logs failure; no swap; previous bundle remains active.
* Air-gap mode: no network → scanner loads from `/var/lib/stellaops/offline-bundles/*` and passes verification.
---

# 2) SBOM quality & schema drift

**Lesson:** Garbage in = garbage VEX. Gate on schema, completeness, and provenance.

**Add to acceptance tests**

* **SBOM schema gating**

  * Reject SBOMs that are not valid CycloneDX 1.6 / SPDX 2.3 (your chosen set).
  * Require: component `bom-ref`, supplier, version, hashes, and build provenance (SLSA/in-toto attestation) if provided.
* **Minimum completeness**

  * Thresholds: ≥95% of components with cryptographic hashes; no unknown package-ecosystem fields for the top 20 deps.

**Test cases**

* Feed malformed CycloneDX → `400 SBOM_VALIDATION_FAILED` with a pointer to the failing JSON path.
* SBOM missing hashes for >5% of components → blocked from graph ingestion; actionable error.
* SBOM with unsigned provenance when policy="RequireAttestation" → rejected.
---

# 3) DB/data corruption or operator error

**Lesson:** Snapshots save releases.

**Add to acceptance tests**

* **DB snapshot cadence**

  * Postgres: nightly base backup + WAL archiving; RPO ≤ 15 min; automated restore rehearsals.
  * Mongo (while still in use): per-collection dumps until the conversion completes; checksum each artifact.
* **Deterministic replay**

  * Any graph view must be reproducible from snapshot + bundle manifest (same revision hash).

**Test cases**

* Run a chaos test that deletes the last 24h of tables → PITR restore to T-15m succeeds; graph revision IDs match pre-failure.
* Restore rehearsal produces identical VEX verdict counts for a pinned revision.
---

# 4) Reachability engines & graph evaluation flakiness

**Lesson:** When reachability is uncertain, degrade gracefully and be explicit.

**Add to acceptance tests**

* **Reachability fallbacks**

  * If the call-graph build fails or a language analyzer is missing, the verdict moves to “Potentially Affected (Unproven Reach)” with a reason code.
  * Policies must allow a “conservative mode” (assume reachable) vs. a “lenient mode” (assume not reachable), toggled per environment.
* **Stable graph IDs**

  * The graph revision ID is a content hash of inputs (SBOM set + rules + feed versions); identical inputs → identical ID.

**Test cases**

* Remove a language analyzer container at runtime → status flips to the fallback code; no 500s; policy evaluation still completes.
* Re-ingest the same inputs → same graph revision ID and same verdict distribution.
---

# 5) Update pipelines & job routing

**Lesson:** No single point of truth; isolate, audit, and prove swaps.

**Add to acceptance tests**

* **Two-phase bundle swaps**

  * Stage → verify → atomic symlink/label swap; all scanners pick up the new label within 1 minute, or roll back.
* **Authority-gated policy changes**

  * Any policy change (severity threshold, allowlist) is a signed request via Authority; the audit trail must include the signer and the DSSE envelope hash.

**Test cases**

* Introduce a new CVE ruleset; verification passes → atomic swap; running scans continue; new scans use the N+1 bundle.
* Attempt a policy change with an invalid signature → rejected; audit log entry created; the unchanged policy remains in effect.

---
## How to wire this in Stella Ops (quick pointers)

* **Offline bundle format**

  * `bundle.json` (manifest: file list + SHA-256 + DSSE signature), `/sboms/*.json`, `/feeds/cve/*.sqlite` (or shards), `/rules/*.yaml`, `/provenance/*.intoto.jsonl`.
  * Verification entrypoint in .NET 10: `StellaOps.Bundle.VerifyAsync(manifest, keyring)` before any ingestion.
* **Authority integration**

  * Define `PolicyChangeRequest` (subject, diff, reason, expiry, DSSE envelope).
  * Gate `PUT /policies/*` behind `Authority.Verify(envelope) == true` and `envelope.subject == computed_diff_hash`.
* **Graph determinism**

  * `GraphRevisionId = SHA256(Sort(JSON([SBOMRefs, RulesetVersion, FeedBundleIds, LatticeConfig, NormalizationVersion])))`.
* **Postgres snapshots (until full conversion)**

  * Use `pg_basebackup` nightly + `wal-g` for WAL; a GitLab job runs a restore rehearsal weekly into the `stellaops-restore` namespace and asserts revision parity against prod.

---
## Minimal developer checklist (copy to your sprint board)

* [ ] Add `BundleVerifier` to scanner startup; block if verification fails.
* [ ] Implement `CacheLastGoodBundle()` and atomic label swap (`/current -> /v-YYYYMMDDHHmm`).
* [ ] Add `SbomGate` with JSON-Schema validation + completeness thresholds.
* [ ] Emit reasoned fallbacks: `REACH_FALLBACK_NO_ANALYZER`, `REACH_FALLBACK_TIMEOUT`.
* [ ] Compute and display `GraphRevisionId` everywhere (API + UI + logs).
* [ ] Configure nightly PG backups + weekly restore rehearsal that asserts revision parity.
* [ ] Route all policy mutations through Authority DSSE verification + auditable ledger entry.

If you want, I can turn this into ready-to-merge .NET test fixtures (xUnit) and a GitLab CI job that runs the feed-tamper/air-gap simulations automatically.
I’ll take the 5 “miss” areas and turn them into concrete, implementable test plans, with suggested projects, fixtures, and key cases your team can start coding.

I’ll keep names aligned to .NET 10/xUnit and your Stella Ops modules.

---

## 0. Test layout proposal

**Solution structure (tests)**
```text
/tests
  /StellaOps.Bundle.Tests
    BundleVerificationTests.cs
    CachedBundleFallbackTests.cs
  /StellaOps.SbomGate.Tests
    SbomSchemaValidationTests.cs
    SbomCompletenessTests.cs
  /StellaOps.Scanner.Tests
    ScannerOfflineBundleTests.cs
    ReachabilityFallbackTests.cs
    GraphRevisionDeterminismTests.cs
  /StellaOps.DataRecoverability.Tests
    PostgresSnapshotRestoreTests.cs
    GraphReplayParityTests.cs
  /StellaOps.Authority.Tests
    PolicyChangeSignatureTests.cs
  /StellaOps.System.Acceptance
    FeedOutageEndToEndTests.cs
    AirGapModeEndToEndTests.cs
    BundleSwapEndToEndTests.cs
  /testdata
    /bundles
    /sboms
    /graphs
    /db
```

Use xUnit + FluentAssertions, plus Testcontainers for Postgres.

---
## 1) Feed outages & integrity drift

### Objectives

1. The scanner never “goes dark” because the CDN/feed is down.
2. Only **verified** bundles are used; tampered bundles are never ingested.
3. Offline/air-gap mode is a first-class, tested behavior.

### Components under test

* `StellaOps.BundleVerifier` (core library)
* `StellaOps.Scanner.Webservice` (scanner, bundle loader)
* Bundle filesystem layout:
  `/opt/stellaops/bundles/v-<timestamp>/*` + `/opt/stellaops/bundles/current` symlink

### Test dimensions

* Network: OK / timeout / 404 / TLS failure / DNS failure.
* Remote bundle: correct / tampered (hash mismatch) / wrong signature / truncated.
* Local cache: last-good present / absent / corrupted.
* Mode: online / offline (air-gap).
### Detailed test suites

#### 1.1 Bundle verification unit tests

**Project:** `StellaOps.Bundle.Tests`

**Fixtures:**

* `testdata/bundles/good-bundle/`
* `testdata/bundles/hash-mismatch-bundle/`
* `testdata/bundles/bad-signature-bundle/`
* `testdata/bundles/missing-file-bundle/`

**Key tests:**

1. `VerifyAsync_ValidBundle_ReturnsSuccess`

   * Arrange: load the `good-bundle` manifest + DSSE signature.
   * Act: `BundleVerifier.VerifyAsync(manifest, keyring)`
   * Assert:

     * `result.IsValid == true`
     * `result.Files.All(f => f.Status == Verified)`

2. `VerifyAsync_HashMismatch_FailsFast`

   * Use `hash-mismatch-bundle`, where one file’s SHA-256 differs.
   * Assert:

     * `IsValid == false`
     * `Errors` contains `BUNDLE_FILE_HASH_MISMATCH` and the offending path.

3. `VerifyAsync_InvalidSignature_RejectsBundle`

   * DSSE envelope signed with an unknown key.
   * Assert:

     * `IsValid == false`
     * `Errors` contains `BUNDLE_SIGNATURE_INVALID`.

4. `VerifyAsync_MissingFile_RejectsBundle`

   * Manifest lists a file that does not exist on disk.
   * Assert:

     * `IsValid == false`
     * `Errors` contains `BUNDLE_FILE_MISSING`.
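The file-level checks behind `VerifyAsync_HashMismatch_FailsFast` and `VerifyAsync_MissingFile_RejectsBundle` reduce to a small loop over the manifest. A minimal sketch, assuming a hypothetical `ManifestEntry` model (the real manifest schema and verifier API may differ):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

// Hypothetical minimal manifest model: relative path -> expected SHA-256 (hex).
public sealed record ManifestEntry(string Path, string Sha256);

public static class ManifestHashCheck
{
    // Verifies every manifest entry against the files on disk and returns
    // error codes in the same style the tests above assert on.
    public static IReadOnlyList<string> Verify(string bundleRoot, IEnumerable<ManifestEntry> entries)
    {
        var errors = new List<string>();
        foreach (var entry in entries)
        {
            var fullPath = Path.Combine(bundleRoot, entry.Path);
            if (!File.Exists(fullPath))
            {
                errors.Add($"BUNDLE_FILE_MISSING:{entry.Path}");
                continue;
            }
            using var stream = File.OpenRead(fullPath);
            var actual = Convert.ToHexString(SHA256.HashData(stream));
            if (!actual.Equals(entry.Sha256, StringComparison.OrdinalIgnoreCase))
                errors.Add($"BUNDLE_FILE_HASH_MISMATCH:{entry.Path}");
        }
        return errors;
    }
}
```

Note that this covers only the hash/presence half of verification; the DSSE signature check over the manifest itself still runs first and is out of scope here.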
#### 1.2 Cached bundle fallback logic

**Class under test:** `BundleManager`

Simplified interface:

```csharp
public interface IBundleManager {
    Task<BundleRef> GetActiveBundleAsync();
    Task<BundleRef> UpdateFromRemoteAsync(CancellationToken ct);
}
```
**Key tests:**

1. `UpdateFromRemoteAsync_RemoteUnavailable_KeepsLastGoodBundle`

   * Arrange:

     * A `lastGood` bundle exists and is marked verified.
     * The remote HTTP client always throws `TaskCanceledException` (simulated timeout).
   * Act: `UpdateFromRemoteAsync`.
   * Assert:

     * The returned bundle ID equals `lastGood.Id`.
     * No changes to the `current` symlink.

2. `UpdateFromRemoteAsync_RemoteTampered_DoesNotReplaceCurrent`

   * Remote returns bundle `temp-bundle`, which fails `BundleVerifier`.
   * Assert:

     * `current` still points to `lastGood`.
     * An error metric is emitted (e.g. `stellaops_bundle_update_failures_total++`).

3. `GetActiveBundle_NoVerifiedBundle_ThrowsDomainError`

   * No bundle is verified on disk.
   * `GetActiveBundleAsync` throws a domain exception with code `NO_VERIFIED_BUNDLE_AVAILABLE`.
   * Consumption pattern in the Scanner: the scanner fails fast on startup with a clear log.
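The fallback rules in the three tests above can be sketched with a tiny in-memory manager. `FallbackBundleManager` and its delegate parameters are hypothetical stand-ins for the real `IBundleManager` implementation and its verifier:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public sealed record BundleRef(string Id, bool Verified);

// Sketch of the fallback policy: try remote, verify, and only then swap;
// on any failure keep the last verified bundle.
public sealed class FallbackBundleManager
{
    private readonly Func<CancellationToken, Task<BundleRef>> _fetchRemote;
    private readonly Func<BundleRef, bool> _verify;
    private BundleRef? _lastGood;

    public FallbackBundleManager(Func<CancellationToken, Task<BundleRef>> fetchRemote,
                                 Func<BundleRef, bool> verify,
                                 BundleRef? lastGood = null)
    {
        _fetchRemote = fetchRemote;
        _verify = verify;
        _lastGood = lastGood;
    }

    public Task<BundleRef> GetActiveBundleAsync() =>
        _lastGood is not null
            ? Task.FromResult(_lastGood)
            : throw new InvalidOperationException("NO_VERIFIED_BUNDLE_AVAILABLE");

    public async Task<BundleRef> UpdateFromRemoteAsync(CancellationToken ct)
    {
        try
        {
            var candidate = await _fetchRemote(ct);
            if (_verify(candidate))
                _lastGood = candidate;   // the "swap"
            // tampered candidate: fall through and keep the last good bundle
        }
        catch (Exception) when (_lastGood is not null)
        {
            // network failure: keep the last good bundle (log + metric in real code)
        }
        return await GetActiveBundleAsync();
    }
}
```

The real implementation would also persist the swap atomically on disk (see section 5.1); this sketch only captures the decision logic the unit tests pin down.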
#### 1.3 Scanner behavior with outages (integration)

**Project:** `StellaOps.Scanner.Tests`

Use an in-memory host (`WebApplicationFactory<ScannerProgram>`).

**Scenarios:**

* F1: CDN timeout, last-good present.
* F2: CDN 404, last-good present.
* F3: CDN returns tampered bundle; verification fails.
* F4: Air-gap: network disabled, last-good present.
* F5: Air-gap + no last-good: scanner must refuse to start.

Example test:

```csharp
[Fact]
public async Task Scanner_UsesLastGoodBundle_WhenCdnTimesOut() {
    // Arrange: put a good bundle under /bundles/v-1, symlink /bundles/current -> v-1
    using var host = TestScannerHost.WithBundle("v-1", good: true, simulateCdnTimeout: true);

    // Act: call /api/scan with a small fixture image
    var response = await host.Client.PostAsJsonAsync("/api/scan", scanRequest);

    // Assert
    response.StatusCode.Should().Be(HttpStatusCode.OK);
    var content = await response.Content.ReadFromJsonAsync<ScanResult>();
    content.BundleId.Should().Be("v-1");
    host.Logs.Should().Contain("Falling back to last verified bundle");
}
```
#### 1.4 System acceptance (GitLab CI)

**Job idea:** `acceptance:feed-resilience`

Steps:

1. Spin up `scanner` + a stub `feedser` container.
2. Phase A: feed OK → run a baseline scan; capture `bundleId` and `graphRevisionId`.
3. Phase B: re-run with the feed stub configured to:

   * time out,
   * return 404,
   * return a tampered bundle.
4. For each phase:

   * Assert `bundleId` remains the baseline one.
   * Assert `graphRevisionId` is unchanged.

Failure of any assertion should break the pipeline.

---
## 2) SBOM quality & schema drift

### Objectives

1. Only syntactically valid SBOMs are ingested into the graph.
2. Enforce minimum completeness (hash coverage, supplier, etc.).
3. Clear, machine-readable error responses from the SBOM ingestion API.

### Components

* `StellaOps.SbomGate` (validation service)
* SBOM ingestion endpoint in Scanner/Concelier: `POST /api/sboms`

### Schema validation tests

**Project:** `StellaOps.SbomGate.Tests`

**Fixtures:**

* `sbom-cdx-1.6-valid.json`
* `sbom-cdx-1.6-malformed.json`
* `sbom-spdx-2.3-valid.json`
* `sbom-unsupported-schema.json`
* `sbom-missing-hashes-10percent.json`
* `sbom-no-supplier.json`
**Key tests:**

1. `Validate_ValidCycloneDx16_Succeeds`

   * Assert result type `SbomValidationResult.Success`.
   * Ensure `DetectedSchema == CycloneDx16`.

2. `Validate_MalformedJson_FailsWithSyntaxError`

   * Malformed JSON.
   * Assert:

     * `IsValid == false`
     * `Errors` contains `SBOM_JSON_SYNTAX_ERROR` with path info.

3. `Validate_UnsupportedSchemaVersion_Fails`

   * SPDX 2.1 (if you only allow 2.3).
   * Expect `SBOM_SCHEMA_UNSUPPORTED` with a `schemaUri` echo.

4. `Validate_MissingHashesOverThreshold_Fails`

   * SBOM where >5% of components lack hashes.
   * Policy: `MinHashCoverage = 0.95`.
   * Assert:

     * `IsValid == false`
     * `Errors` contains `SBOM_HASH_COVERAGE_BELOW_THRESHOLD` with the actual ratio.

5. `Validate_MissingSupplier_Fails`

   * Critical components missing supplier info.
   * Expect `SBOM_REQUIRED_FIELD_MISSING` with `component.supplier`.
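The completeness half of the gate (tests 4 and 5) is a simple counting pass once the SBOM has parsed. A minimal sketch, assuming a hypothetical `SbomComponent` model rather than the real CycloneDX/SPDX object graph:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal component model for the completeness gate.
public sealed record SbomComponent(string BomRef, string? Supplier, IReadOnlyList<string> Hashes);

public static class CompletenessGate
{
    // Returns error codes in the SBOM_* style used above when hash coverage
    // falls below the configured threshold (e.g. MinHashCoverage = 0.95)
    // or when a component is missing its supplier.
    public static IReadOnlyList<string> Check(IReadOnlyList<SbomComponent> components, double minHashCoverage)
    {
        var errors = new List<string>();
        if (components.Count == 0) return errors;

        var withHashes = components.Count(c => c.Hashes.Count > 0);
        var coverage = (double)withHashes / components.Count;
        if (coverage < minHashCoverage)
            errors.Add($"SBOM_HASH_COVERAGE_BELOW_THRESHOLD:{coverage:F2}");

        foreach (var c in components.Where(c => string.IsNullOrEmpty(c.Supplier)))
            errors.Add($"SBOM_REQUIRED_FIELD_MISSING:{c.BomRef}.supplier");

        return errors;
    }
}
```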
### API-level tests

**Project:** `StellaOps.Scanner.Tests` (or `StellaOps.Concelier.Tests`, depending on where SBOM ingestion lives).

Key scenarios:

1. `POST /api/sboms` with malformed JSON

   * Request body: `sbom-cdx-1.6-malformed.json`.
   * Expected:

     * HTTP 400.
     * Body: `{ "code": "SBOM_VALIDATION_FAILED", "details": [ ... ], "correlationId": "..." }`.
     * At least one detail contains `SBOM_JSON_SYNTAX_ERROR`.

2. `POST /api/sboms` with missing hashes

   * Body: `sbom-missing-hashes-10percent.json`.
   * HTTP 400 with `SBOM_HASH_COVERAGE_BELOW_THRESHOLD`.

3. `POST /api/sboms` with an unsupported schema

   * Body: `sbom-unsupported-schema.json`.
   * HTTP 400 with `SBOM_SCHEMA_UNSUPPORTED`.

4. `POST /api/sboms` valid

   * Body: `sbom-cdx-1.6-valid.json`.
   * HTTP 202 or 201 (depending on design).
   * Response contains the SBOM ID; the subsequent graph build sees that SBOM.

---
## 3) DB/data corruption & operator error

### Objectives

1. You can restore Postgres to a point in time and reproduce previous graph results.
2. Graphs are deterministic given bundle + SBOM + rules.
3. Obvious corruptions are detected and surfaced, not silently masked.

### Components

* Postgres cluster (new canonical store)
* `StellaOps.Scanner.Webservice` (graph builder, persistence)
* `GraphRevisionId` computation

### 3.1 Postgres snapshot / WAL tests

**Project:** `StellaOps.DataRecoverability.Tests`

Use Testcontainers to spin up Postgres.

Scenarios:
1. `PITR_Restore_ReplaysGraphsWithSameRevisionIds`

   * Arrange:

     * Spin up a DB container with WAL archiving enabled.
     * Apply schema migrations.
     * Ingest a fixed set of SBOMs + bundle refs + rules.
     * Trigger a graph build → record the `graphRevisionIds` from the API.
     * Take a base backup snapshot (simulating the daily snapshot).
   * Act:

     * Destroy the container.
     * Start a new container from the base backup + replay WAL up to a specific LSN.
     * Start the Scanner against the restored DB.
     * Query the graphs again.
   * Assert:

     * For each known graph: `revisionId_restored == revisionId_original`.
     * The number of nodes/edges is identical.

2. `PartialDataLoss_DetectedByHealthCheck`

   * After the initial load, deliberately delete some rows (e.g. all edges for a given graph).
   * Run the health check endpoint, e.g. `/health/graph`.
   * Expect:

     * HTTP 503.
     * Body indicates `GRAPH_INTEGRITY_FAILED` with details of the missing edges.

This test forces the team to implement a basic graph integrity check (e.g. counts by state vs. expected).
### 3.2 Deterministic replay tests

**Project:** `StellaOps.Scanner.Tests` → `GraphRevisionDeterminismTests.cs`

**Precondition:** The graph revision ID is computed as:

```csharp
GraphRevisionId = SHA256(
    Normalize([
        BundleId,
        OrderedSbomIds,
        RulesetVersion,
        FeedBundleIds,
        LatticeConfigVersion,
        NormalizationVersion
    ])
);
```
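The pseudocode above can be made concrete in a few lines. A sketch, assuming `Normalize` means “sort the ID lists, then serialize everything canonically as JSON” (the real normalization rules may add more steps):

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;
using System.Text.Json;

public static class GraphRevision
{
    // Sketch of the determinism contract: sort the ID lists so input ordering
    // is irrelevant, serialize all inputs canonically, hash once.
    public static string Compute(string bundleId,
                                 string[] sbomIds,
                                 string rulesetVersion,
                                 string[] feedBundleIds,
                                 string latticeConfigVersion,
                                 string normalizationVersion)
    {
        var canonical = JsonSerializer.Serialize(new object[] {
            bundleId,
            sbomIds.OrderBy(s => s, StringComparer.Ordinal).ToArray(),
            rulesetVersion,
            feedBundleIds.OrderBy(s => s, StringComparer.Ordinal).ToArray(),
            latticeConfigVersion,
            normalizationVersion
        });
        var hash = SHA256.HashData(Encoding.UTF8.GetBytes(canonical));
        return Convert.ToHexString(hash).ToLowerInvariant();
    }
}
```

Ordinal sorting matters here: culture-sensitive ordering would make the revision ID depend on the host locale, which is exactly the nondeterminism the tests below exist to catch.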
**Scenarios:**

1. `SameInputs_SameRevisionId`

   * Run the graph build twice for the same inputs.
   * Assert identical `GraphRevisionId`.

2. `DifferentBundle_DifferentRevisionId`

   * Same SBOMs & rules; change the vulnerability bundle ID.
   * Assert `GraphRevisionId` changes.

3. `DifferentRuleset_DifferentRevisionId`

   * Same SBOM & bundle; change the ruleset version.
   * Assert `GraphRevisionId` changes.

4. `OrderingIrrelevant_StableRevision`

   * Provide the SBOMs in a different order.
   * Assert `GraphRevisionId` is the same (because of internal sorting).

---
## 4) Reachability engine & graph evaluation flakiness

### Objectives

1. If reachability cannot be computed, you do not break; you downgrade verdicts with explicit reason codes.
2. Deterministic reachability for “golden fixtures”.
3. Graph evaluation remains stable even when analyzers come and go.

### Components

* `StellaOps.Scanner.Webservice` (lattice / reachability engine)
* Language analyzers (sidecar or gRPC microservices)
* Verdict representation, e.g.:

```csharp
public sealed record VulnerabilityVerdict(
    string Status,      // "NotAffected", "Affected", "PotentiallyAffected"
    string ReasonCode,  // "REACH_CONFIRMED", "REACH_FALLBACK_NO_ANALYZER", ...
    string? AnalyzerId
);
```
### 4.1 Golden reachability fixtures

**Project:** `StellaOps.Scanner.Tests` → `GoldenReachabilityTests.cs`
**Fixtures directory:** `/testdata/reachability/fixture-*/`

Each fixture:

```text
/testdata/reachability/fixture-01-log4j/
  sbom.json
  code-snippets/...
  expected-vex.json
  config.json   # language, entrypoints, etc.
```

**Test pattern:**

For each fixture:

1. Load the SBOM + configuration.
2. Trigger reachability analysis.
3. Collect the raw reachability graph + final VEX verdicts.
4. Compare to `expected-vex.json` (status + reason codes).
5. Store the `GraphRevisionId` and set it as golden as well.

Key cases:

* R1: simple direct call → reachability confirmed → `Status = "Affected", ReasonCode = "REACH_CONFIRMED"`.
* R2: library present but not called → `Status = "NotAffected", ReasonCode = "REACH_ANALYZED_UNREACHABLE"`.
* R3: language analyzer missing → `Status = "PotentiallyAffected", ReasonCode = "REACH_FALLBACK_NO_ANALYZER"`.
* R4: analysis timeout → `Status = "PotentiallyAffected", ReasonCode = "REACH_FALLBACK_TIMEOUT"`.
### 4.2 Analyzer unavailability / fallback behavior

**Project:** `StellaOps.Scanner.Tests` → `ReachabilityFallbackTests.cs`

Scenarios:

1. `NoAnalyzerRegistered_ForLanguage_UsesFallback`

   * The scanner config lists a component in language “go”, but no analyzer is registered.
   * Expect:

     * No 500 error from `/api/graphs/...`.
     * All applicable vulnerabilities for that component have `Status = "PotentiallyAffected"` and `ReasonCode = "REACH_FALLBACK_NO_ANALYZER"`.

2. `AnalyzerRpcFailure_UsesFallback`

   * The analyzer responds with a gRPC error or HTTP 500.
   * The scanner logs the error and keeps going.
   * Same semantics as a missing analyzer, but with `AnalyzerId` populated and an optional `ReasonDetails` (e.g. `RPC_UNAVAILABLE`).

3. `AnalyzerTimeout_UsesTimeoutFallback`

   * Force analyzer calls to time out.
   * `ReasonCode = "REACH_FALLBACK_TIMEOUT"`.
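The R1–R4 cases and the three fallback scenarios boil down to one total mapping from analyzer outcome to verdict. A sketch using a hypothetical `AnalysisOutcome` enum (the real engine’s internal states will differ), reusing the `VulnerabilityVerdict` record from above:

```csharp
using System;

// Hypothetical condensed outcome of a single reachability analysis attempt.
public enum AnalysisOutcome { Reachable, Unreachable, AnalyzerMissing, RpcFailure, Timeout }

public sealed record VulnerabilityVerdict(string Status, string ReasonCode, string? AnalyzerId);

public static class FallbackPolicy
{
    // Degradation rule from the objectives above: any failure yields
    // "PotentiallyAffected" with an explicit reason code, never a 500.
    public static VulnerabilityVerdict ToVerdict(AnalysisOutcome outcome, string? analyzerId) => outcome switch
    {
        AnalysisOutcome.Reachable       => new VulnerabilityVerdict("Affected", "REACH_CONFIRMED", analyzerId),
        AnalysisOutcome.Unreachable     => new VulnerabilityVerdict("NotAffected", "REACH_ANALYZED_UNREACHABLE", analyzerId),
        AnalysisOutcome.AnalyzerMissing => new VulnerabilityVerdict("PotentiallyAffected", "REACH_FALLBACK_NO_ANALYZER", null),
        // RPC failure keeps the NO_ANALYZER semantics but records which analyzer failed.
        AnalysisOutcome.RpcFailure      => new VulnerabilityVerdict("PotentiallyAffected", "REACH_FALLBACK_NO_ANALYZER", analyzerId),
        AnalysisOutcome.Timeout         => new VulnerabilityVerdict("PotentiallyAffected", "REACH_FALLBACK_TIMEOUT", analyzerId),
        _ => throw new ArgumentOutOfRangeException(nameof(outcome))
    };
}
```

Because the mapping is total and pure, the golden-fixture tests can assert on it exhaustively without standing up any analyzer containers.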
### 4.3 Concurrency & determinism

Add a test that:

1. Triggers N parallel graph builds for the same inputs.
2. Asserts that:

   * All builds succeed.
   * All `GraphRevisionId`s are identical.
   * All reachability reason codes are identical.

This matters for concurrent scanners and guards against race conditions in graph construction.

---
## 5) Update pipelines & job routing

### Objectives

1. Bundle swaps are atomic: scanners see either the old or the new bundle, never a partially written one.
2. Policy changes are always signed via Authority; unsigned/invalid changes never apply.
3. Job routing changes (if/when you move to direct microservice pools) remain stateless and testable.

### 5.1 Two-phase bundle swap tests

**Bundle layout:**

* `/opt/stellaops/bundles/current` → symlink to `v-YYYYMMDDHHmmss`
* New bundle:

  * Download to `/opt/stellaops/bundles/staging/<temp-id>`
  * Verify
  * Atomic `ln -s v-new current.tmp && mv -T current.tmp current`
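The `ln -s … && mv -T` sequence above translates to two .NET calls. A sketch, assuming POSIX semantics (on Linux, `File.Move` maps to `rename(2)`, which replaces the destination atomically); `AtomicSwap` is an illustrative name, not the real class:

```csharp
using System;
using System.IO;

public static class AtomicSwap
{
    // Two-phase swap: create the new symlink under a temporary name, then
    // rename it over "current" in one step, so readers see either the old
    // or the new target, never a partially written state.
    public static void PointCurrentAt(string bundlesDir, string newVersionDir)
    {
        var current = Path.Combine(bundlesDir, "current");
        var tmp = Path.Combine(bundlesDir, "current.tmp");

        // Crash recovery (scenario 3 below): remove a stale tmp link from an aborted swap.
        if (File.Exists(tmp) || Directory.Exists(tmp))
            File.Delete(tmp);

        File.CreateSymbolicLink(tmp, newVersionDir);   // stage: ln -s v-new current.tmp
        File.Move(tmp, current, overwrite: true);      // commit: mv -T current.tmp current
    }
}
```

Verification must happen before `PointCurrentAt` is ever called; the swap itself should only run against a bundle that has already passed `BundleVerifier`.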
**Project:** `StellaOps.Bundle.Tests` → `BundleSwapTests.cs`

Scenarios:

1. `Swap_Success_IsAtomic`

   * Simulate the swap in a temp directory.
   * During the swap, spawn parallel tasks that repeatedly read `current` and open `manifest.json`.
   * Assert:

     * Readers never fail with “file not found” / a partial manifest.
     * Readers only ever see `v-old` or `v-new`, never a mixed state.

2. `Swap_VerificationFails_NoChangeToCurrent`

   * Stage a bundle that fails `BundleVerifier`.
   * After the attempted swap:

     * `current` still points to `v-old`.
     * No directory with the name expected for `v-new` is referenced by `current`.

3. `Swap_CrashBetweenVerifyAndMv_LeavesSystemConsistent`

   * Simulate a crash after creating `current.tmp` but before `mv -T`.
   * On “restart”:

     * Cleanup code must detect `current.tmp` and remove it.
     * Ensure `current` still points to the last good bundle.
### 5.2 Authority-gated policy changes

**Component:** `StellaOps.Authority` + any service that exposes `/policies`.

Policy change flow:

1. The client sends a DSSE-signed `PolicyChangeRequest` to `/authority/verify`.
2. Authority validates the signature and subject hash.
3. The service applies the change only if Authority approves.

**Project:** `StellaOps.Authority.Tests` + `StellaOps.Scanner.Tests` (or wherever policies live).

Key tests:

1. `PolicyChange_WithValidSignature_Applies`

   * The signed request’s `subject` hash matches the computed diff of the old → new policy.
   * Authority returns `Approved`.
   * The policy service updates the policy; an audit log entry is recorded.

2. `PolicyChange_InvalidSignature_Rejected`

   * The signature is not verifiable with any trusted key, or the payload is corrupted.
   * Expect:

     * HTTP 403 or 400 from the policy endpoint.
     * No policy change in the DB.
     * An audit log entry with reason `SIGNATURE_INVALID`.

3. `PolicyChange_SubjectHashMismatch_Rejected`

   * An attacker changes the policy body but not the DSSE subject.
   * On verification, the recomputed diff doesn’t match the subject hash.
   * Authority rejects with `SUBJECT_MISMATCH`.

4. `PolicyChange_ExpiredEnvelope_Rejected`

   * The envelope contains an `expiry` in the past.
   * Authority rejects with `ENVELOPE_EXPIRED`.

5. `PolicyChange_AuditTrail_Complete`

   * After a valid change:

     * The audit log contains: `policyName`, `oldHash`, `newHash`, `signerId`, `envelopeId`, `timestamp`.
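The subject-hash gate behind `PolicyChange_SubjectHashMismatch_Rejected` can be sketched independently of DSSE signature verification. `PolicySubjectGate` and the `\n->\n` canonical diff form are illustrative assumptions; the real canonicalization of the old → new policy diff is a design decision that must match on both client and server:

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class PolicySubjectGate
{
    // ASSUMPTION: the canonical diff is old and new serialized policies
    // joined with a fixed separator. Any stable canonical form works, as
    // long as signer and verifier agree on it byte-for-byte.
    public static string ComputeDiffHash(string oldPolicyJson, string newPolicyJson)
    {
        var canonical = $"{oldPolicyJson}\n->\n{newPolicyJson}";
        return Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(canonical))).ToLowerInvariant();
    }

    // Server-side check: recompute the diff hash and compare with the DSSE subject.
    public static string Check(string envelopeSubject, string oldPolicyJson, string newPolicyJson) =>
        envelopeSubject == ComputeDiffHash(oldPolicyJson, newPolicyJson)
            ? "Approved"
            : "SUBJECT_MISMATCH";
}
```

This is why test 3 catches the attack: tampering with the policy body changes the recomputed hash, while the signed subject stays fixed.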
### 5.3 Job routing (if/when you use DB-backed routing tables)

You discussed a `routing` table:

```sql
domain         text,
instance_id    uuid,
last_heartbeat timestamptz,
table_name     text
```
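The GC and routing semantics this table implies can be pinned down with an in-memory stand-in before the DB-backed version exists. `RoutingGc` and its methods are hypothetical names for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// In-memory stand-in for a row of the routing table above.
public sealed record RoutingEntry(string Domain, Guid InstanceId, DateTimeOffset LastHeartbeat);

public static class RoutingGc
{
    // GC sweep: drop entries whose heartbeat is older than the TTL.
    public static List<RoutingEntry> Sweep(IEnumerable<RoutingEntry> entries, DateTimeOffset now, TimeSpan ttl) =>
        entries.Where(e => now - e.LastHeartbeat <= ttl).ToList();

    // Round-robin pick across the alive instances for a domain;
    // a stable sort keeps the rotation deterministic for tests.
    public static Guid Pick(IReadOnlyList<RoutingEntry> alive, string domain, int requestIndex)
    {
        var pool = alive.Where(e => e.Domain == domain).OrderBy(e => e.InstanceId).ToList();
        if (pool.Count == 0) throw new InvalidOperationException("NO_ALIVE_INSTANCE");
        return pool[requestIndex % pool.Count].InstanceId;
    }
}
```

The DB version replaces `Sweep` with a `DELETE … WHERE last_heartbeat < now() - interval '1 minute'` job, but the assertions in the tests below stay the same.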
Key tests (once implemented):

1. `HeartbeatExpired_DropsRoutingEntry`

   * Insert an entry with `last_heartbeat` older than 1 minute.
   * The routing GC job should remove it.
   * The API gateway must not route new jobs to that instance.

2. `RoundRobinAcrossAliveInstances`

   * Multiple routing rows for the same domain with fresh heartbeats.
   * Issue M requests via the gateway.
   * Assert an approximately round-robin distribution across `instance_id`.

3. `NoDurabilityRequired_JobsNotReplayedAfterRestart`

   * Confirm that in-memory or temp tables are used appropriately where you do not want durable queues.

If you decide to go with “N gateways × M microservices via a Docker load balancer only”, the main tests here move to health-check-based routing in the load balancer and become infrastructure tests rather than application tests.

---
## 6) CI wiring summary

To make this actually enforceable:

1. **Unit test job** (`test:unit`)

   * Runs `StellaOps.Bundle.Tests`, `StellaOps.SbomGate.Tests`, `StellaOps.Authority.Tests`, `StellaOps.Scanner.Tests`.

2. **DB recoverability job** (`test:db-recoverability`)

   * Uses Testcontainers to run `StellaOps.DataRecoverability.Tests`.
   * Marked as “required” for `main` branch merges.

3. **Acceptance job** (`test:acceptance-system`)

   * Spins up a minimal stack via Docker Compose.
   * Executes the `StellaOps.System.Acceptance` tests:

     * Feed outages & fallback.
     * Air-gap modes.
     * Bundle swap.
   * Can be slower; run on main and release branches.

4. **Nightly chaos job** (`test:nightly-chaos`)

   * Optional: run more expensive tests (simulated DB corruption, analyzer outages, etc.).

---

If you want, as a next step I can generate skeleton xUnit test classes and a `/testdata` layout you can paste directly into your repo (with TODOs where real fixtures are needed).