# 19 · Test‑Suite Overview: **Stella Ops** *(v2.0, 12 Jul 2025)*

> **Purpose:** Describe the **multi‑layer automated‑test strategy** that guards Stella Ops’ five‑second performance promise, security posture and API stability, and show how each layer maps to CI gates and release criteria.

---

## 0 Table of Contents

1. Test‑pyramid at a glance
2. Layer definitions & tooling
3. Directory & naming conventions
4. CI workflows & failure policy
5. Quality gates & coverage budgets
6. Evidence retention & auditability
7. Local developer quick‑start
8. Flaky‑test triage & escalation
9. Change log

---

## 1 Test‑pyramid at a glance

| Layer                  | Framework(s)                | Scope                                                     | CI frequency |
| ---------------------- | --------------------------- | --------------------------------------------------------- | ------------ |
| **Unit**               | xUnit + FluentAssertions    | Pure C# methods, guard clauses, mapping                   | Every PR     |
| **Mutation**           | **Stryker.NET**             | Critical algorithm branches                               | Nightly      |
| **Static analysis**    | **CodeQL**, **Semgrep**     | OWASP, injection, secrets                                 | Every PR     |
| **Integration**        | Testcontainers + xUnit      | Redis, Trivy exec, plug‑in hot‑load                       | Every PR     |
| **Quota / throttle**   | Testcontainers + Clock‑mock | 333‑scan counter, 5 s & 60 s retry‑after headers          | Every PR     |
| **End‑to‑End (UI)**    | **Playwright C#**           | Login, scan list, mute flow                               | Merge→main   |
| **Performance**        | Hyperfine + K6              | P95 latency, 40 rps throughput                            | Nightly      |
| **Security DAST**      | OWASP ZAP baseline          | TLS headers, auth, XSS                                    | Nightly + RC |
| **Chaos / Resilience** | **Pumba** & Toxiproxy       | Redis latency, container kill                             | Weekly       |
| **Compliance smoke**   | Spectral + JSON‑Schema      | SBOM & API payloads                                       | Every PR     |
| **Token validity**     | xUnit + ClockMock           | Expiry warning, OUK update refresh, `/token/offline` flow | Every PR     |

---

## 2 Layer definitions & tooling

### 2.1 Unit

* Target ≥ 80 % **line** and ≥ 60 % **branch** coverage (`coverlet` + ReportGenerator).
* Naming: `Method_ShouldExpected_WhenCondition`.

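
As an illustration of the coverage target above, the sketch below reads a Cobertura report (an XML format that coverlet can emit) and compares the root element's `line-rate` and `branch-rate` attributes with the 80 % / 60 % thresholds. It is a minimal Python sketch under stated assumptions, not the project's actual gate: the function name `check_coverage` and the inline sample report are hypothetical.

```python
import xml.etree.ElementTree as ET

# Thresholds taken from the Unit layer targets above.
MIN_LINE_RATE = 0.80
MIN_BRANCH_RATE = 0.60


def check_coverage(cobertura_xml: str) -> list[str]:
    """Return human-readable violations; an empty list means the gate passes."""
    root = ET.fromstring(cobertura_xml)
    # Cobertura stores overall rates as fractions (0.0 to 1.0) on the root element.
    line_rate = float(root.attrib["line-rate"])
    branch_rate = float(root.attrib["branch-rate"])

    violations = []
    if line_rate < MIN_LINE_RATE:
        violations.append(f"line coverage {line_rate:.1%} < {MIN_LINE_RATE:.0%}")
    if branch_rate < MIN_BRANCH_RATE:
        violations.append(f"branch coverage {branch_rate:.1%} < {MIN_BRANCH_RATE:.0%}")
    return violations


# Hypothetical, heavily trimmed report used only to exercise the function.
SAMPLE_REPORT = '<coverage line-rate="0.85" branch-rate="0.55"></coverage>'

if __name__ == "__main__":
    for problem in check_coverage(SAMPLE_REPORT):
        print(problem)
```

Returning a list of violations rather than raising on the first one lets a CI step report a line and a branch shortfall in the same run.
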
### 2.2 Mutation

* **Stryker.NET** runs only on projects tagged `critical-logic=true` in `Directory.Build.props`.
* Threshold: ≥ 60 % mutation score; the build goes red below 55 %.

### 2.3 Integration

* `RedisTestcontainer`, `TrivyServerTestcontainer`, `TestcontainersNetwork` for realistic wiring.
* Each test cleans keys and volumes; parallelisable.
* **Quota & throttle tests (new):** spin up a Redis container, fix the system clock to just before UTC midnight, then hammer `/scan` with a stub token to validate:
  1. Counter hits **200** → header `X-Stella-Quota-Remaining: 133`; banner socket event emitted; a 5 s delay is added.
  2. Counter hits **333** → a 60 s delay is added.
  3. At the UTC midnight rollover the key expires → counter resets to 0.

### 2.4 Quota / throttle layer (explicit)

* Uses the same fixture but runs in isolation to keep CI time predictable.
* Fails the pipeline if **any** of the behaviours listed in § 2.3 mis‑fires.

### 2.5 End‑to‑End

* API suite asserts presence of `X-Stella-Quota-Remaining` on every successful `/scan`.
* API suite uses **async httpx** for accurate latency numbers.
* UI suite uses **Playwright** headless Chromium; a Lighthouse a11y snapshot is recorded.

### 2.6 Performance

* Hyperfine measures CLI workflows (`SBOM_LOCAL`, `SBOM_REMOTE`, `IMAGE_WARM`).
* **K6** hits `/scan` at 40 rps for 3 min; checks P95 ≤ 5 s and error‑rate = 0.
* **PHASE QUOTA_WAIT** benchmark:
  * ≤ 5 s median for the first 30 blocked requests (soft back‑off).
  * Exactly 60 s wall‑clock time for the hard wait‑wall.

### 2.7 Security (DAST + SAST)

* SAST: **CodeQL** (GitHub native) + **Semgrep OSS** ruleset.
* DAST: **ZAP baseline** spider + passive rules; fails on High risk alerts.

### 2.8 Chaos / Resilience

* **Pumba** randomly kills the Trivy side‑car; the test asserts that the queue retries.
* **Toxiproxy** injects 150 ms latency on Redis; perf budget still ≤ 6 s.

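
To make the quota and throttle expectations from § 2.3 concrete, the following Python sketch models the counter those tests exercise. It is an illustration under stated assumptions, not the service's implementation: an in‑memory dictionary stands in for Redis, the class name `QuotaCounter` and the `record_scan` method are hypothetical, and both thresholds are treated as inclusive (the delay starts on the request that reaches 200 or 333).

```python
from datetime import datetime, timezone
from typing import Callable

DAILY_LIMIT = 333      # hard limit from the quota tests
SOFT_THRESHOLD = 200   # point at which the soft back-off starts
SOFT_DELAY_S = 5
HARD_DELAY_S = 60


class QuotaCounter:
    """In-memory stand-in for the Redis-backed daily counter (assumption)."""

    def __init__(self, clock: Callable[[], datetime]):
        self._clock = clock  # injectable clock, so a test can fix the time
        self._counts: dict[str, int] = {}

    def record_scan(self) -> dict:
        # One key per UTC day: a new day starts from zero, which mimics
        # the Redis key expiring at the midnight rollover.
        day_key = self._clock().astimezone(timezone.utc).strftime("%Y-%m-%d")
        count = self._counts.get(day_key, 0) + 1
        self._counts[day_key] = count

        if count >= DAILY_LIMIT:
            delay = HARD_DELAY_S
        elif count >= SOFT_THRESHOLD:
            delay = SOFT_DELAY_S
        else:
            delay = 0
        return {
            "count": count,
            "X-Stella-Quota-Remaining": max(DAILY_LIMIT - count, 0),
            "delay_s": delay,
        }


if __name__ == "__main__":
    # Fix the clock just before UTC midnight, as the integration test does.
    now = [datetime(2025, 7, 12, 23, 59, 59, tzinfo=timezone.utc)]
    quota = QuotaCounter(clock=lambda: now[0])
    results = [quota.record_scan() for _ in range(DAILY_LIMIT)]
    print(results[199])   # the 200th scan
    print(results[332])   # the 333rd scan
    now[0] = datetime(2025, 7, 13, 0, 0, 0, tzinfo=timezone.utc)
    print(quota.record_scan())  # first scan after the rollover
```

Passing the clock in as a callable is what lets a test move time across midnight without sleeping; the real suite achieves the same effect with its clock mock.
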
---

## 3 Repository layout

```text
tests/
├─ unit/                         # *.Unit.csproj
├─ mutation/stryker.conf.json
├─ integration/                  # *.Integration.csproj
│  └─ fixtures/
├─ e2e/
│  ├─ api/pytest/                # test_*.py
│  └─ ui/playwright/             # *.spec.ts
├─ perf/
│  ├─ compose-perf.yml
│  ├─ hyperfine/
│  └─ k6/
├─ security/
│  ├─ zap-baseline.conf
│  └─ semgrep/
└─ chaos/
   ├─ toxiproxy/
   └─ pumba/
```

Tests mirror the module namespaces; each src project owns a matching test project.

## 4 CI workflows

| File         | Trigger       | Stages                                        |
| ------------ | ------------- | --------------------------------------------- |
| ci.yml       | Push / PR     | Lint → Unit → Static analysis → Integration   |
| e2e.yml      | Merge→main    | Compose stack → API+UI Playwright             |
| perf.yml     | Nightly       | Hyperfine + K6; update Grafana JSON           |
| security.yml | Nightly       | ZAP baseline, Trivy FS, CodeQL                |
| mutation.yml | Nightly       | Stryker.NET; comment PR if < threshold        |
| chaos.yml    | Weekly (cron) | Toxiproxy + Pumba scenarios                   |
| release.yml  | Tag           | Run all above + evidence bundling             |

Failure policy: any Red gate blocks merge; nightly failures ping #stella-ci.

## 5 Quality gates & budgets

| Metric                                 | Threshold | Source                 | Maps to KPI            |
| -------------------------------------- | --------- | ---------------------- | ---------------------- |
| Line coverage                          | ≥ 80 %    | Unit, Integration      | Maintainability        |
| Mutation score                         | ≥ 60 %    | Stryker                | Defect escape          |
| P95 SBOM‑first                         | ≤ 5 s     | Hyperfine              | Product promise        |
| P95 QUOTA_WAIT (soft)                  | ≤ 10 s    | Hyperfine + Clock‑mock | Predictable throttling |
| Hard wait‑wall accuracy                | 60 ± 1 s  | Hyperfine              | Compliance with spec   |
| P95 image‑unpack                       | ≤ 10 s    | Hyperfine              | SRS FR‑IMG‑1           |
| /scan error‑rate                       | 0         | K6                     | Reliability            |
| ZAP High alerts                        | 0         | ZAP JSON               | Security NFR           |
| Trivy Critical CVEs in release SBOM    | 0         | Trivy FS               | NFR‑SEC‑1              |
| Offline token expiry warning lead‑time | ≥ 7 days  | Token tests            | Offline operability    |

Coverage & perf budgets live in `tests/budgets/*.json`; CI actions fail on regression.

## 6 Evidence retention

| Artefact                 | Retention      | Storage               |
| ------------------------ | -------------- | --------------------- |
| Hyperfine & K6 CSV       | 18 months      | GitHub artefacts → S3 |
| Mutation reports         | 6 months       | S3                    |
| ZAP & Trivy SARIF        | 18 months      | GitHub Security tab   |
| Playwright videos        | Last 50 builds | MinIO                 |
| Test logs (JUnit/Allure) | 12 months      | S3, lifecycle policy  |

## 7 Developer quick‑start

```bash
# Bring up the full stack for e2e on a laptop
docker compose -f tests/e2e/compose-core.yml up -d
```

```bash
# Run unit + integration
dotnet test --collect:"XPlat Code Coverage"

# API e2e (subshell keeps the working directory at the repo root)
(cd tests/e2e/api && pytest -q)

# UI e2e
(cd tests/e2e/ui && npx playwright install && npm test)
```

## 8 Flaky‑test triage & escalation

1. Label the failing test with `flaky` and open a GitHub Discussion.
2. After 3 consecutive nightly failures, the failure is escalated by auto‑page.
3. Root‑cause within the next sprint, or quarantine behind a feature flag (max 2 weeks).

*Token‑expiry tests cannot be quarantined* because they guard offline operability.

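
The escalation rule above (page after 3 consecutive nightly failures) can be sketched as a small check over a test's nightly history. This is an illustrative Python sketch only; the function names and the list‑of‑booleans history format are assumptions, not part of the Stella Ops tooling.

```python
PAGE_AFTER_CONSECUTIVE_FAILURES = 3  # from the triage rule above


def trailing_failures(nightly_results: list[bool]) -> int:
    """Count failures at the end of the history (True = pass, False = fail)."""
    streak = 0
    for passed in reversed(nightly_results):
        if passed:
            break
        streak += 1
    return streak


def should_page(nightly_results: list[bool]) -> bool:
    """True once the most recent runs contain enough consecutive failures."""
    return trailing_failures(nightly_results) >= PAGE_AFTER_CONSECUTIVE_FAILURES


if __name__ == "__main__":
    history = [True, False, False, False]  # oldest run first
    print(trailing_failures(history), should_page(history))
```

Counting only the trailing streak means a single green night resets the escalation, which matches the word "consecutive" in the rule.
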
## 9 Change log

| Version | Date       | Notes                                                                                                                        |
| ------- | ---------- | ---------------------------------------------------------------------------------------------------------------------------- |
| v2.0    | 2025‑07‑12 | Full overhaul: mutation tests, CodeQL/Semgrep, chaos layer, role‑based escalation, perf/security budgets aligned with SRS.  |
| v1.0    | 2025‑07‑09 | Original minimal overview                                                                                                    |

(End of Test‑Suite Overview v2.0)