Add reference architecture and testing strategy documentation

- Created a new document for the Stella Ops Reference Architecture outlining the system's topology, trust boundaries, artifact association, and interfaces.
- Developed a comprehensive Testing Strategy document detailing the importance of offline readiness, interoperability, determinism, and operational guardrails.
- Introduced a README for the Testing Strategy, summarizing processing details and key concepts implemented.
- Added guidance for AI agents and developers in the tests directory, including directory structure, test categories, key patterns, and rules for test development.
2025-12-22 07:59:15 +02:00
parent 5d398ec442
commit 53503cb407
96 changed files with 37565 additions and 71 deletions
--- a/docs/19_TEST_SUITE_OVERVIEW.md
+++ b/docs/19_TEST_SUITE_OVERVIEW.md
@@ -1,7 +1,7 @@
-# Automated Test‑Suite Overview
+# Automated Test-Suite Overview

-This document enumerates **every automated check** executed by the Stella Ops
-CI pipeline, from unit level to chaos experiments.  It is intended for
+This document enumerates **every automated check** executed by the Stella Ops
+CI pipeline, from unit level to chaos experiments. It is intended for
 contributors who need to extend coverage or diagnose failures.

 > **Build parameters** – values such as `{{ dotnet }}` (runtime) and
@@ -9,40 +9,81 @@ contributors who need to extend coverage or diagnose failures.

 ---

-## Layer map
+## Test Philosophy

-| Layer | Tooling | Entry‑point | Frequency |
-|-------|---------|-------------|-----------|
-| **1. Unit** | `xUnit` (<code>dotnet test</code>) | `*.Tests.csproj` | per PR / push |
-| **2. Property‑based** | `FsCheck` | `SbomPropertyTests` | per PR |
-| **3. Integration (API)** | `Testcontainers` suite | `test/Api.Integration` | per PR + nightly |
-| **4. Integration (DB-merge)** | Testcontainers PostgreSQL + Redis | `Concelier.Integration` (vulnerability ingest/merge/export service) | per PR |
-| **5. Contract (gRPC)** | `Buf breaking` | `buf.yaml` files | per PR |
-| **6. Front‑end unit** | `Jest` | `ui/src/**/*.spec.ts` | per PR |
-| **7. Front‑end E2E** | `Playwright` | `ui/e2e/**` | nightly |
-| **8. Lighthouse perf / a11y** | `lighthouse-ci` (Chrome headless) | `ui/dist/index.html` | nightly |
-| **9. Load** | `k6` scripted scenarios | `k6/*.js` | nightly |
-| **10. Chaos CPU / OOM** | `pumba` | Docker Compose overlay | weekly |
-| **11. Dependency scanning** | `Trivy fs` + `dotnet list package --vuln` | root | per PR |
-| **12. License compliance** | `LicenceFinder` | root | per PR |
-| **13. SBOM reproducibility** | `in‑toto attestation` diff | GitLab job | release tags |
+### Core Principles
+
+1. **Determinism as Contract**: Scan verdicts must be reproducible. Same inputs → byte-identical outputs.
+2. **Offline by Default**: Every test (except those explicitly tagged "online") runs without network access.
+3. **Evidence-First Validation**: Assertions verify the complete evidence chain, not just pass/fail.
+4. **Interop is Required**: Compatibility with ecosystem tools (Syft, Grype, Trivy, cosign) blocks releases.
+5. **Coverage by Risk**: Prioritize testing high-risk paths over line coverage metrics.
+
+### Test Boundaries
+
+- **Lattice/policy merge** algorithms run in `scanner.webservice`
+- **Concelier/Excititor** preserve prune source (no conflict resolution)
+- Tests enforce these boundaries explicitly

 ---

-## Quality gates
+## Layer Map
+
+| Layer | Tooling | Entry-point | Frequency |
+|-------|---------|-------------|-----------|
+| **1. Unit** | `xUnit` (<code>dotnet test</code>) | `*.Tests.csproj` | per PR / push |
+| **2. Property-based** | `FsCheck` | `SbomPropertyTests`, `Canonicalization` | per PR |
+| **3. Integration (API)** | `Testcontainers` suite | `test/Api.Integration` | per PR + nightly |
+| **4. Integration (DB-merge)** | Testcontainers PostgreSQL + Valkey | `Concelier.Integration` | per PR |
+| **5. Contract (OpenAPI)** | Schema validation | `docs/api/*.yaml` | per PR |
+| **6. Front-end unit** | `Jest` | `ui/src/**/*.spec.ts` | per PR |
+| **7. Front-end E2E** | `Playwright` | `ui/e2e/**` | nightly |
+| **8. Lighthouse perf / a11y** | `lighthouse-ci` (Chrome headless) | `ui/dist/index.html` | nightly |
+| **9. Load** | `k6` scripted scenarios | `tests/load/*.js` | nightly |
+| **10. Chaos** | `pumba`, custom harness | `tests/chaos/` | weekly |
+| **11. Interop** | Syft/Grype/cosign | `tests/interop/` | nightly |
+| **12. Offline E2E** | Network-isolated containers | `tests/offline/` | nightly |
+| **13. Replay Verification** | Golden corpus replay | `bench/golden-corpus/` | per PR |
+| **14. Dependency scanning** | `Trivy fs` + `dotnet list package --vuln` | root | per PR |
+| **15. License compliance** | `LicenseFinder` | root | per PR |
+| **16. SBOM reproducibility** | `in-toto attestation` diff | GitLab job | release tags |
+
+---
+
+## Test Categories (xUnit Traits)
+
+```csharp
+[Trait("Category", "Unit")]           // Fast, isolated unit tests
+[Trait("Category", "Integration")]    // Tests requiring infrastructure
+[Trait("Category", "E2E")]            // Full end-to-end workflows
+[Trait("Category", "AirGap")]         // Must work without network
+[Trait("Category", "Interop")]        // Third-party tool compatibility
+[Trait("Category", "Performance")]    // Performance benchmarks
+[Trait("Category", "Chaos")]          // Failure injection tests
+[Trait("Category", "Security")]       // Security-focused tests
+```
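
The traits above can be selected individually or combined in a single `dotnet test` run. The following is a minimal sketch, not part of the committed document: `build_category_filter` is a hypothetical helper name, and it relies only on the `|` (OR) operator of `dotnet test --filter` expressions.

```bash
#!/usr/bin/env bash
# Hypothetical helper: joins category names into a `dotnet test` filter
# expression that matches tests carrying ANY of the given Category traits.
build_category_filter() {
  local filter="" category
  for category in "$@"; do
    # "|" is the OR operator in `dotnet test --filter` expressions.
    filter="${filter:+$filter|}Category=$category"
  done
  printf '%s\n' "$filter"
}

# Example: run the air-gap and interop suites in one invocation.
#   dotnet test --filter "$(build_category_filter AirGap Interop)"
```

Building the expression in a function keeps the quoting in one place, which matters because an unquoted `|` would be interpreted by the shell as a pipe.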
+
+---
+
+## Quality Gates

 | Metric | Budget | Gate |
 |--------|--------|------|
-| API unit coverage | ≥ 85 % lines | PR merge |
-| API response P95 | ≤ 120 ms | nightly alert |
-| Δ‑SBOM warm scan P95 (4 vCPU) | ≤ 5 s | nightly alert |
-| Lighthouse performance score | ≥ 90 | nightly alert |
-| Lighthouse accessibility score | ≥ 95 | nightly alert |
-| k6 sustained RPS drop | &lt; 5 % vs baseline | nightly alert |
+| API unit coverage | ≥ 85% lines | PR merge |
+| API response P95 | ≤ 120 ms | nightly alert |
+| Δ-SBOM warm scan P95 (4 vCPU) | ≤ 5 s | nightly alert |
+| Lighthouse performance score | ≥ 90 | nightly alert |
+| Lighthouse accessibility score | ≥ 95 | nightly alert |
+| k6 sustained RPS drop | < 5% vs baseline | nightly alert |
+| **Replay determinism** | 0 byte diff | **Release** |
+| **Interop findings parity** | ≥ 95% | **Release** |
+| **Offline E2E** | All pass with no network | **Release** |
+| **Unknowns budget (prod)** | ≤ configured limit | **Release** |
+| **Router Retry-After compliance** | 100% | Nightly |

 ---

-## Local runner
+## Local Runner

 ```bash
 # minimal run: unit + property + frontend tests
@@ -50,21 +91,26 @@ contributors who need to extend coverage or diagnose failures.

 # full stack incl. Playwright and lighthouse
 ./scripts/dev-test.sh --full
-````

-The script spins up PostgreSQL/Redis via Testcontainers and requires:
+# category-specific
+dotnet test --filter "Category=Unit"
+dotnet test --filter "Category=AirGap"
+dotnet test --filter "Category=Interop"
+```
+
+The `dev-test.sh` script spins up PostgreSQL/Valkey via Testcontainers and requires:

 * Docker ≥ 25
 * Node 20 (for Jest/Playwright)

-#### PostgreSQL Testcontainers
+### PostgreSQL Testcontainers

 Multiple suites (Concelier connectors, Excititor worker/WebService, Scheduler)
 use Testcontainers with PostgreSQL for integration tests. If you don't have
 Docker available, tests can also run against a local PostgreSQL instance
 listening on `127.0.0.1:5432`.

-#### Local PostgreSQL helper
+### Local PostgreSQL Helper

 Some suites (Concelier WebService/Core, Exporter JSON) need a full
 PostgreSQL instance when you want to debug or inspect data with `psql`.
@@ -84,9 +130,59 @@ By default the script uses Docker to run PostgreSQL 16, binds to
 connection string is printed on start and you can export it before
 running `dotnet test` if a suite supports overriding its connection string.

--- 
+---

-### Concelier OSV↔GHSA parity fixtures
+## New Test Infrastructure (Epic 5100)
+
+### Run Manifest & Replay
+
+Every scan captures a **Run Manifest** containing all inputs (artifact digests, feed versions, policy versions, PRNG seed). This enables deterministic replay:
+
+```bash
+# Replay a scan from manifest
+stella replay --manifest run-manifest.json --output verdict.json
+
+# Verify determinism
+stella replay verify --manifest run-manifest.json
+```
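
The "0 byte diff" requirement for replay determinism can be checked by replaying the same manifest twice and comparing the outputs byte for byte. This is a rough sketch under stated assumptions: `assert_byte_identical` is a hypothetical helper, the output file names are illustrative, and the actual behaviour of `stella replay verify` is not described in this commit.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if the two files are byte-identical.
assert_byte_identical() {
  local first="$1" second="$2"
  # `cmp -s` prints nothing and reports equality through its exit status.
  if cmp -s "$first" "$second"; then
    echo "deterministic: $first and $second are byte-identical"
  else
    echo "non-deterministic: $first and $second differ" >&2
    return 1
  fi
}

# Example (assumes the `stella replay` command shown above is on PATH):
#   stella replay --manifest run-manifest.json --output verdict-1.json
#   stella replay --manifest run-manifest.json --output verdict-2.json
#   assert_byte_identical verdict-1.json verdict-2.json
```

A byte-level comparison is stricter than a semantic JSON diff: it also catches differences in key ordering, whitespace, and trailing newlines, which is what a byte-identical contract implies.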
+
+### Evidence Index
+
+The **Evidence Index** links verdicts to their supporting evidence chain:
+- Verdict → SBOM digests → Attestation IDs → Tool versions
+
+### Golden Corpus
+
+The corpus, located at `bench/golden-corpus/`, contains 50+ test cases covering:
+- Severity levels (Critical, High, Medium, Low)
+- VEX scenarios (Not Affected, Affected, Conflicting)
+- Reachability cases (Reachable, Not Reachable, Inconclusive)
+- Unknowns scenarios
+- Scale tests (200 to 50k+ packages)
+- Multi-distro (Alpine, Debian, RHEL, SUSE, Ubuntu)
+- Interop fixtures (Syft-generated, Trivy-generated)
+- Negative cases (malformed inputs)
+
+### Offline Testing
+
+Inherit from `NetworkIsolatedTestBase` for air-gap compliance:
+
+```csharp
+[Trait("Category", "AirGap")]
+public class OfflineTests : NetworkIsolatedTestBase
+{
+    [Fact]
+    public async Task Test_WorksOffline()
+    {
+        // Test implementation goes here; the await keeps the async signature warning-free
+        await Task.CompletedTask;
+        AssertNoNetworkCalls();  // Fails if network accessed
+    }
+}
+```
+
+---
+
+## Concelier OSV↔GHSA Parity Fixtures

 The Concelier connector suite includes a regression test (`OsvGhsaParityRegressionTests`)
 that checks a curated set of GHSA identifiers against OSV responses. The fixture
@@ -104,7 +200,7 @@ fixtures stay stable across machines.

 ---

-## CI job layout
+## CI Job Layout

 ```mermaid
 flowchart LR
@@ -115,21 +211,42 @@ flowchart LR
  I1 --> FE[Jest]
  FE --> E2E[Playwright]
  E2E --> Lighthouse
+
+  subgraph release-gates
+    REPLAY[Replay Verify]
+    INTEROP[Interop E2E]
+    OFFLINE[Offline E2E]
+    BUDGET[Unknowns Gate]
+  end
+
  Lighthouse --> INTEG2[Concelier]
  INTEG2 --> LOAD[k6]
-  LOAD --> CHAOS[pumba]
+  LOAD --> CHAOS[Chaos Suite]
  CHAOS --> RELEASE[Attestation diff]
+
+  RELEASE --> release-gates
 ```

 ---

-## Adding a new test layer
+## Adding a New Test Layer

 1. Extend `scripts/dev-test.sh` so local contributors get the layer by default.
-2. Add a dedicated GitLab job in `.gitlab-ci.yml` (stage `test` or `nightly`).
+2. Add a dedicated workflow in `.gitea/workflows/` (or GitLab job in `.gitlab-ci.yml`).
 3. Register the job in `docs/19_TEST_SUITE_OVERVIEW.md` *and* list its metric
   in `docs/metrics/README.md`.
+4. If the test requires network isolation, inherit from `NetworkIsolatedTestBase`.
+5. If the test uses the golden corpus, add cases to `bench/golden-corpus/`.

 ---

-*Last updated {{ "now" | date: "%Y‑%m‑%d" }}*
+## Related Documentation
+
+- [Sprint Epic 5100 - Testing Strategy](implplan/SPRINT_5100_SUMMARY.md)
+- [tests/AGENTS.md](../tests/AGENTS.md)
+- [Offline Operation Guide](24_OFFLINE_KIT.md)
+- [Module Architecture Dossiers](modules/)
+
+---
+
+*Last updated 2025-12-21*