
#  19 · Test‑Suite Overview — **Stella Ops**
*(v2.0 — 12 Jul 2025)*

> **Purpose** — Describe the **multi‑layer automated‑test strategy** that guards Stella Ops’ five‑second performance promise, security posture and API stability, and show how each layer maps to CI gates and release criteria.

---

##  0 Table of Contents

1. Test‑pyramid at a glance
2. Layer definitions & tooling
3. Repository layout
4. CI workflows
5. Quality gates & budgets
6. Evidence retention
7. Developer quick‑start
8. Flaky‑test triage & escalation
9. Change log

---

##  1 Test‑pyramid at a glance

| Layer                  | Framework(s)             | Scope                                   | CI frequency |
| ---------------------- | ------------------------ | --------------------------------------- | ------------ |
| **Unit**               | xUnit + FluentAssertions | Pure C# methods, guard clauses, mapping | Every PR     |
| **Mutation**           | **Stryker.NET**          | Critical algorithm branches             | Nightly      |
| **Static analysis**    | **CodeQL**, **Semgrep**  | OWASP, injection, secrets               | Every PR     |
| **Integration**        | Testcontainers + xUnit   | Redis, Trivy exec, plug‑in hot‑load     | Every PR     |
| **Quota / throttle**         | Testcontainers + Clock‑mock      | 333‑scan counter, 5 s & 60 s retry‑after headers   | Every PR     |
| **End‑to‑End (UI)**    | **Playwright** (TypeScript) | Login, scan list, mute flow          | Merge→main   |
| **Performance**        | Hyperfine + K6           | P95 latency, 40 rps throughput          | Nightly      |
| **Security DAST**      | OWASP ZAP baseline       | TLS headers, auth, XSS                  | Nightly + RC |
| **Chaos / Resilience** | **Pumba** & Toxiproxy    | Redis latency, container kill           | Weekly       |
| **Compliance smoke**   | Spectral + JSON‑Schema   | SBOM & API payloads                     | Every PR     |
| **Token validity** | xUnit + ClockMock | Expiry warning, OUK update refresh, `/token/offline` flow | Every PR |

---

##  2 Layer definitions & tooling

###  2.1 Unit

* Target ≥ 80 % **line** and ≥ 60 % **branch** coverage (`coverlet` + ReportGenerator).
* Naming: `Method_ShouldExpected_WhenCondition`.
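
As a rough illustration of how the two coverage targets could be enforced, the sketch below reads the `line-rate` and `branch-rate` attributes from a Cobertura-style report (a format coverlet can emit) and lists every target that is missed. The function name `coverage_gate` and the inline sample report are hypothetical; the real gate may be wired differently.

```python
import xml.etree.ElementTree as ET

LINE_TARGET = 0.80    # line-coverage target from this section
BRANCH_TARGET = 0.60  # branch-coverage target from this section


def coverage_gate(cobertura_xml: str) -> list:
    """Return human-readable failures; an empty list means the gate passes."""
    root = ET.fromstring(cobertura_xml)
    # Cobertura stores both rates as fractions (0..1) on the root element.
    line_rate = float(root.attrib["line-rate"])
    branch_rate = float(root.attrib["branch-rate"])

    failures = []
    if line_rate < LINE_TARGET:
        failures.append(f"line coverage {line_rate:.1%} < {LINE_TARGET:.0%}")
    if branch_rate < BRANCH_TARGET:
        failures.append(f"branch coverage {branch_rate:.1%} < {BRANCH_TARGET:.0%}")
    return failures


# Hypothetical report: line coverage is fine, branch coverage is short.
sample = '<coverage line-rate="0.83" branch-rate="0.57"></coverage>'
print(coverage_gate(sample))
```

A CI step could exit non-zero whenever the returned list is non-empty.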

###  2.2 Mutation

* **Stryker.NET** runs only on projects tagged `critical-logic=true` in `Directory.Build.props`.
* Target: ≥ 60 % mutation score; the build goes red below 55 %.
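
The two numbers above imply three outcomes, which the following sketch makes explicit. How the band between 55 % and 60 % is reported is not stated here, so the `"below-target"` verdict is an assumption, as is the function name `mutation_verdict`.

```python
def mutation_verdict(score_pct: float,
                     target_pct: float = 60.0,
                     red_below_pct: float = 55.0) -> str:
    """Classify a mutation score against the target and the red-build floor."""
    if score_pct < red_below_pct:
        return "red"            # build fails
    if score_pct < target_pct:
        return "below-target"   # assumed: reported, but not build-breaking
    return "green"


for score in (52.0, 57.5, 60.0):
    print(score, mutation_verdict(score))
```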

###  2.3 Integration

* `RedisTestcontainer`, `TrivyServerTestcontainer`, `TestcontainersNetwork` for realistic wiring.
* Each test cleans up its own keys and volumes, so the suite is parallelisable.

* **Quota & throttle tests (new)**: spin up a Redis container, fix the system clock to just before UTC midnight, and hammer `/scan` with a stub token to validate:
  1. Counter hits **200** → header `X-Stella-Quota-Remaining: 133`; banner socket event emitted; a 5 s delay is added.
  2. Counter hits **333** → a 60 s delay is added.
  3. At UTC midnight rollover the key expires → counter resets to 0.
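
The expected behaviour can be modelled in a few lines, which helps when reasoning about what the fixture should assert. This is only a sketch: `QuotaModel` is a hypothetical in-memory stand-in for the Redis-backed counter, and it assumes the delay applies from the request that makes the counter reach the limit.

```python
from datetime import datetime, timedelta, timezone

DAILY_QUOTA = 333   # scans per UTC day
SOFT_LIMIT = 200    # the 5 s delay starts when the counter hits this value


class QuotaModel:
    """Hypothetical in-memory stand-in for the Redis-backed daily counter."""

    def __init__(self) -> None:
        self.day = None
        self.count = 0

    def scan(self, now: datetime) -> dict:
        today = now.astimezone(timezone.utc).date()
        if today != self.day:
            # UTC midnight rollover: the daily key has expired, start from 0.
            self.day, self.count = today, 0
        self.count += 1
        if self.count >= DAILY_QUOTA:
            delay_s = 60
        elif self.count >= SOFT_LIMIT:
            delay_s = 5
        else:
            delay_s = 0
        return {
            "count": self.count,
            "remaining": max(DAILY_QUOTA - self.count, 0),
            "delay_s": delay_s,
        }


# Clock fixed to just before UTC midnight, mirroring the fixture described above.
clock = datetime(2025, 7, 12, 23, 59, 30, tzinfo=timezone.utc)
quota = QuotaModel()
results = [quota.scan(clock) for _ in range(DAILY_QUOTA)]
print(results[199])                               # the 200th scan
print(results[332])                               # the 333rd scan
print(quota.scan(clock + timedelta(minutes=1)))   # first scan after rollover
```

The banner socket event is not modelled; a real test would observe it on the socket rather than in the return value.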

### 2.3.1 Quota / throttle layer (explicit)

* Uses the same fixture but runs in isolation to keep CI time predictable.
* Fails the pipeline if **any** of the three behaviours listed above mis‑fires.

###  2.4 End‑to‑End

* API suite asserts presence of `X-Stella-Quota-Remaining` on every successful `/scan`.
* API suite uses **async httpx** for accurate latency numbers.
* UI suite uses **Playwright** headless Chromium; Lighthouse a11y snapshot recorded.
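
The header assertion in the API suite might look roughly like the helper below. It is a sketch that takes a status code and any header mapping (for example the `headers` of an httpx response), treats 2xx as "successful", and assumes the value is an integer between 0 and 333; the helper name `quota_remaining` is hypothetical.

```python
from typing import Mapping, Optional

QUOTA_HEADER = "X-Stella-Quota-Remaining"
DAILY_QUOTA = 333


def quota_remaining(status_code: int, headers: Mapping[str, str]) -> Optional[int]:
    """Return the parsed quota header, or None for non-successful responses."""
    if not 200 <= status_code < 300:
        return None  # only successful /scan responses must carry the header

    # Header names are case-insensitive, so compare in lower case.
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = lowered.get(QUOTA_HEADER.lower())
    if raw is None:
        raise AssertionError(f"{QUOTA_HEADER} missing on successful /scan")

    remaining = int(raw)
    if not 0 <= remaining <= DAILY_QUOTA:
        raise AssertionError(f"{QUOTA_HEADER} out of range: {remaining}")
    return remaining


print(quota_remaining(200, {"x-stella-quota-remaining": "133"}))
```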

###  2.5 Performance

* Hyperfine measures CLI workflows (`SBOM_LOCAL`, `SBOM_REMOTE`, `IMAGE_WARM`).
* **K6** hits `/scan` at 40 rps for 3 min; checks P95 ≤ 5 s and error‑rate = 0.
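
To make the K6 pass criteria concrete, here is a small sketch that computes P95 with the nearest-rank method and applies both checks. k6 has its own threshold mechanism and may compute percentiles slightly differently, so treat this as an illustration of the rule rather than the actual gate; `percentile` and `k6_gate` are hypothetical names.

```python
import math


def percentile(samples, pct: int):
    """Nearest-rank percentile: smallest value with at least pct % of samples at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def k6_gate(latencies_s, errors: int, p95_budget_s: float = 5.0) -> bool:
    """Pass only if P95 latency is within budget and no request failed."""
    return percentile(latencies_s, 95) <= p95_budget_s and errors == 0


# 40 rps for 3 min is 7200 requests; a few slow outliers do not move P95.
latencies = [0.8] * 7000 + [6.5] * 200
print(percentile(latencies, 95), k6_gate(latencies, errors=0))
```

Because the error-rate budget is exactly zero, a single failed request fails the gate regardless of latency.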

* **PHASE QUOTA_WAIT** benchmark:
  * ≤ 5 s median for the first 30 blocked requests (soft back‑off).
  * 60 s (± 1 s) wall‑clock time for the hard wait‑wall.

###  2.6 Security (DAST + SAST)

* SAST: **CodeQL** (GitHub native) + **Semgrep OSS** ruleset.
* DAST: **ZAP baseline** spider + passive rules; fails on High risk alerts.

###  2.7 Chaos / Resilience

* **Pumba** randomly kills Trivy side‑car; test asserts queue retry.
* **Toxiproxy** injects 150 ms latency on Redis; perf budget still ≤ 6 s.


---

##  3 Repository layout

```text
tests/
├─ unit/                 # *.Unit.csproj
├─ mutation/stryker.conf.json
├─ integration/          # *.Integration.csproj
│   └─ fixtures/
├─ e2e/
│   ├─ api/pytest/       # test_*.py
│   └─ ui/playwright/    # *.spec.ts
├─ perf/
│   ├─ compose-perf.yml
│   ├─ hyperfine/
│   └─ k6/
├─ security/
│   ├─ zap-baseline.conf
│   └─ semgrep/
└─ chaos/
    ├─ toxiproxy/
    └─ pumba/
```

Tests mirror the module namespaces; each src project owns a matching test project.

##  4 CI workflows

| File         | Trigger       | Stages                                      |
| ------------ | ------------- | ------------------------------------------- |
| ci.yml       | Push / PR     | Lint → Unit → Static analysis → Integration |
| e2e.yml      | Merge→main    | Compose stack → API (pytest) + UI (Playwright) |
| perf.yml     | Nightly       | Hyperfine + K6; update Grafana JSON         |
| security.yml | Nightly       | ZAP baseline, Trivy FS, CodeQL              |
| mutation.yml | Nightly       | Stryker.NET; comment PR if < threshold      |
| chaos.yml    | Weekly (cron) | Toxiproxy + Pumba scenarios                 |
| release.yml  | Tag           | Run all above + evidence bundling           |

Failure policy: any red gate blocks the merge; nightly failures ping `#stella-ci`.

##  5 Quality gates & budgets

| Metric                                 | Threshold | Source                 | Maps to KPI            |
| -------------------------------------- | --------- | ---------------------- | ---------------------- |
| Line coverage                          | ≥ 80 %    | Unit, Integration      | Maintainability        |
| Mutation score                         | ≥ 60 %    | Stryker                | Defect escape          |
| P95 SBOM‑first                         | ≤ 5 s     | Hyperfine              | Product promise        |
| P95 QUOTA_WAIT (soft)                  | ≤ 10 s    | Hyperfine + Clock‑mock | Predictable throttling |
| Hard wait‑wall accuracy                | 60 ± 1 s  | Hyperfine              | Compliance with spec   |
| P95 image‑unpack                       | ≤ 10 s    | Hyperfine              | SRS FR‑IMG‑1           |
| /scan error‑rate                       | 0         | K6                     | Reliability            |
| ZAP High alerts                        | 0         | ZAP JSON               | Security NFR           |
| Trivy Critical CVEs in release SBOM    | 0         | Trivy FS               | NFR‑SEC‑1              |
| Offline token expiry warning lead‑time | ≥ 7 days  | Token tests            | Offline operability    |

Coverage and perf budgets live in `tests/budgets/*.json`; CI actions fail on regression.
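
A budget check of this kind could be as simple as the sketch below. The JSON schema (a metric name mapped to a `min` or `max` bound) and the metric keys are assumptions made for illustration; the actual budget files may be structured differently. The bounds themselves are taken from the table above.

```python
import json

# Hypothetical budget file content, using thresholds from the table above.
BUDGET_JSON = """
{
  "line_coverage_pct":  {"min": 80},
  "mutation_score_pct": {"min": 60},
  "p95_sbom_first_s":   {"max": 5},
  "zap_high_alerts":    {"max": 0}
}
"""


def check_budgets(budget_json: str, measured: dict) -> list:
    """Compare measurements with budgets; return one message per violation."""
    failures = []
    for name, rule in json.loads(budget_json).items():
        value = measured.get(name)
        if value is None:
            failures.append(f"{name}: no measurement")
        elif "min" in rule and value < rule["min"]:
            failures.append(f"{name}: {value} < min {rule['min']}")
        elif "max" in rule and value > rule["max"]:
            failures.append(f"{name}: {value} > max {rule['max']}")
    return failures


measured = {
    "line_coverage_pct": 82.4,
    "mutation_score_pct": 58.0,
    "p95_sbom_first_s": 4.1,
    "zap_high_alerts": 0,
}
print(check_budgets(BUDGET_JSON, measured))
```

Treating a missing measurement as a failure keeps a silently skipped stage from passing the gate.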

##  6 Evidence retention

| Artefact                 | Retention      | Storage               |
| ------------------------ | -------------- | --------------------- |
| Hyperfine & K6 CSV       | 18 months      | GitHub artefacts → S3 |
| Mutation reports         | 6 months       | S3                    |
| ZAP & Trivy SARIF        | 18 months      | GitHub Security tab   |
| Playwright videos        | Last 50 builds | MinIO                 |
| Test logs (JUnit/Allure) | 12 months      | S3, lifecycle policy  |

##  7 Developer quick‑start

```bash
# Bring up the full stack for e2e on a laptop
docker compose -f tests/e2e/compose-core.yml up -d
```

```bash
# Run unit + integration tests with coverage
dotnet test --collect:"XPlat Code Coverage"

# API e2e (the subshell keeps the working directory at the repo root)
(cd tests/e2e/api && pytest -q)

# UI e2e
(cd tests/e2e/ui && npx playwright install && npm test)
```

##  8 Flaky‑test triage & escalation

1. Label the failing test `flaky` and open a GitHub Discussion.
2. After 3 consecutive nightly failures, auto‑page <ops@stella-ops.org>.
3. Root‑cause within the next sprint, or quarantine behind a feature flag (max 2 weeks).

*Token‑expiry tests cannot be quarantined*: they guard offline operability.
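
The escalation rules can be read as a small decision function. The sketch below encodes them under stated assumptions: the action strings, the `FlakyTest` record and the 14-day reading of "max 2 weeks" are illustrative and not part of any existing tooling.

```python
from dataclasses import dataclass

PAGE_AFTER_NIGHTLY_FAILURES = 3
MAX_QUARANTINE_DAYS = 14  # "max 2 weeks"


@dataclass
class FlakyTest:
    name: str
    consecutive_nightly_failures: int
    is_token_expiry_test: bool = False
    days_in_quarantine: int = 0


def triage_actions(test: FlakyTest) -> list:
    """Return the follow-up actions implied by the triage policy."""
    actions = ["label:flaky", "open-discussion"]
    if test.consecutive_nightly_failures >= PAGE_AFTER_NIGHTLY_FAILURES:
        actions.append("page:ops@stella-ops.org")
    if test.is_token_expiry_test:
        # Token-expiry tests guard offline operability: no quarantine option.
        actions.append("root-cause:quarantine-not-allowed")
    elif test.days_in_quarantine > MAX_QUARANTINE_DAYS:
        actions.append("quarantine-expired")
    else:
        actions.append("root-cause-or-quarantine")
    return actions


print(triage_actions(FlakyTest("ScanList_ShouldLoad_WhenLoggedIn", 3)))
```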


##  9 Change log

| Version | Date       | Notes                                                                                                                      |
| ------- | ---------- | -------------------------------------------------------------------------------------------------------------------------- |
| v2.0    | 2025‑07‑12 | Full overhaul: mutation tests, CodeQL/Semgrep, chaos layer, role‑based escalation, perf/security budgets aligned with SRS. |
| v1.0    | 2025‑07‑09 | Original minimal overview                                                                                                  |

(End of Test‑Suite Overview v2.0)