docs consolidation, big sln build fixes, new advisories and sprints/tasks

2026-01-05 18:37:04 +02:00
parent d0a7b88398
commit d7bdca6d97
175 changed files with 10322 additions and 307 deletions
--- a/docs/product-advisories/05-Dec-2026
+++ b/docs/product-advisories/05-Dec-2026
@@ -0,0 +1,85 @@
+**Stella Ops — Incremental Testing Enhancements (NEW since prior runs)**
+*Only net-new ideas and practices; no restatement of earlier guidance.*
+
+---
+
+## 1) Unit Testing — what to add now
+
+* **Semantic fuzzing for policies**: generate inputs that specifically target policy boundaries (quotas, geo rules, sanctions, priority overrides), not random fuzz.
+* **Time-skew simulation**: unit tests that warp time (clock drift, leap seconds, TTL expiry) to catch cache and signature failures.
+* **Decision explainability tests**: assert that every routing decision produces a minimal, machine-readable explanation payload (even if not user-facing).
+
+**Why it matters**: catches failures that only appear under temporal or policy edge conditions.
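
The time-skew bullet above could be sketched roughly as follows. This is an illustration only, not Stella Ops code: `FakeClock`, `TtlCache`, and `run_time_skew_checks` are hypothetical names, and the rule that a backward clock jump expires an entry is an assumption made for the example.

```python
from typing import Callable, Dict, Optional, Tuple


class FakeClock:
    """Hypothetical controllable clock; tests move it instead of sleeping."""

    def __init__(self, start: float = 1_000_000.0) -> None:
        self.now = start

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        # Negative values simulate the clock drifting backwards.
        self.now += seconds


class TtlCache:
    """Toy TTL cache with an injected clock (not a real Stella Ops component)."""

    def __init__(self, clock: Callable[[], float], ttl: float) -> None:
        self._clock = clock
        self._ttl = ttl
        self._items: Dict[str, Tuple[float, str]] = {}

    def put(self, key: str, value: str) -> None:
        self._items[key] = (self._clock(), value)

    def get(self, key: str) -> Optional[str]:
        entry = self._items.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        age = self._clock() - stored_at
        # Assumed policy: a negative age (clock moved backwards) expires the
        # entry instead of silently extending its lifetime.
        if age < 0 or age >= self._ttl:
            del self._items[key]
            return None
        return value


def run_time_skew_checks() -> Dict[str, Optional[str]]:
    clock = FakeClock()
    cache = TtlCache(clock, ttl=60.0)
    results: Dict[str, Optional[str]] = {}

    cache.put("k", "v")
    clock.advance(59.0)
    results["before_expiry"] = cache.get("k")

    clock.advance(1.0)  # exactly at the TTL boundary
    results["at_expiry"] = cache.get("k")

    cache.put("k", "v2")
    clock.advance(-5.0)  # simulated backward drift
    results["after_backward_drift"] = cache.get("k")
    return results
```

Injecting the clock keeps the test deterministic and fast; the same pattern would extend to signature validity windows if the signing code accepts a clock.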
+
+---
+
+## 2) Module / Source-Level Testing — new practices
+
+* **Policy-as-code tests**: treat routing and ops policies as versioned code with diff-based tests (policy change → expected behavior delta).
+* **Schema evolution tests**: automatically replay the last N schema versions against the current code to ensure backward compatibility.
+* **Dead-path detection**: fail builds if conditional branches are never exercised across the module test suite.
+
+**Why it matters**: prevents silent behavior changes when policies or schemas evolve.
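
One possible shape for a policy-as-code diff test, under stated assumptions: the two quota policies, the `CORPUS` of requests, and `EXPECTED_DELTA` are all invented for illustration. The idea is that a policy change ships together with the exact behavior delta it is expected to cause, and anything outside that delta fails the test.

```python
from typing import Callable, Dict, Tuple

Request = Dict[str, int]
Policy = Callable[[Request], str]


def policy_v1(req: Request) -> str:
    # Hypothetical old rule: deny above 100 requests per day.
    return "deny" if req["daily_requests"] > 100 else "allow"


def policy_v2(req: Request) -> str:
    # Hypothetical new rule: quota raised to 200 requests per day.
    return "deny" if req["daily_requests"] > 200 else "allow"


def behavior_delta(
    old: Policy, new: Policy, corpus: Dict[str, Request]
) -> Dict[str, Tuple[str, str]]:
    """Return only the corpus cases whose decision changed, as (before, after)."""
    delta: Dict[str, Tuple[str, str]] = {}
    for name, req in corpus.items():
        before, after = old(req), new(req)
        if before != after:
            delta[name] = (before, after)
    return delta


# Fixed corpus that sits on both sides of each quota boundary.
CORPUS: Dict[str, Request] = {
    "well_under_quota": {"daily_requests": 10},
    "at_old_quota": {"daily_requests": 100},
    "just_over_old_quota": {"daily_requests": 101},
    "at_new_quota": {"daily_requests": 200},
    "just_over_new_quota": {"daily_requests": 201},
}

# The delta reviewed and approved alongside the policy change.
EXPECTED_DELTA: Dict[str, Tuple[str, str]] = {
    "just_over_old_quota": ("deny", "allow"),
    "at_new_quota": ("deny", "allow"),
}
```

A test would then assert `behavior_delta(policy_v1, policy_v2, CORPUS) == EXPECTED_DELTA`, so both missing and unexpected changes are caught.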
+
+---
+
+## 3) Integration Testing — new focus areas
+
+* **Production trace replay (sanitized)**: replay real, anonymized traces into integration environments to validate behavior against reality, not assumptions.
+* **Failure choreography tests**: deliberately stagger dependency failures (A fails first, then B recovers, then A recovers) and assert system convergence.
+* **Idempotency verification**: explicit tests asserting that retried requests never create divergent state.
+
+**Why it matters**: most real outages are sequencing problems, not single failures.
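
The idempotency bullet above might be checked along these lines. `Ledger` and `deliver_with_retries` are hypothetical stand-ins for a real integration path, and keying deduplication on a client-supplied idempotency key is an assumption of this sketch rather than a description of Stella Ops.

```python
from typing import Dict, List, Tuple


class Ledger:
    """Toy stateful service that deduplicates on an idempotency key."""

    def __init__(self) -> None:
        self.entries: List[int] = []
        self._seen: Dict[str, int] = {}

    def credit(self, idempotency_key: str, amount: int) -> int:
        # A repeated key returns the original entry id and changes nothing.
        if idempotency_key in self._seen:
            return self._seen[idempotency_key]
        self.entries.append(amount)
        entry_id = len(self.entries) - 1
        self._seen[idempotency_key] = entry_id
        return entry_id

    def balance(self) -> int:
        return sum(self.entries)


def deliver_with_retries(
    ledger: Ledger, requests: List[Tuple[str, int]], retries: int
) -> List[int]:
    """Send every request 1 + retries times, as an aggressive client might."""
    ids: List[int] = []
    for key, amount in requests:
        for _ in range(1 + retries):
            ids.append(ledger.credit(key, amount))
    return ids


REQUESTS: List[Tuple[str, int]] = [("req-a", 10), ("req-b", 25)]
```

The assertion of interest is that a run with retries ends in exactly the same state as a run without them, which is stricter than checking that each call merely succeeds.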
+
+---
+
+## 4) Deployment / E2E Testing — additions
+
+* **Config-diff E2E tests**: assert that changing *only* config (no code) produces only the expected behavioral delta.
+* **Rollback lag tests**: measure and assert maximum time-to-safe-state after rollback is triggered.
+* **Synthetic adversarial traffic**: continuously inject malformed but valid-looking traffic post-deploy to ensure defenses stay active.
+
+**Why it matters**: many incidents come from “safe” config changes and slow rollback propagation.
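
A rollback lag assertion could look something like the sketch below. `measure_rollback_lag`, `RollbackTimeout`, and `SimulatedFleet` are hypothetical; in a real E2E test the `trigger_rollback` and `is_safe` callables would wrap the deployment tooling and a health probe, and `now`/`sleep` would be the real clock.

```python
from typing import Callable, Optional


class RollbackTimeout(AssertionError):
    """Raised when the safe state is not reached within the allowed lag."""


def measure_rollback_lag(
    trigger_rollback: Callable[[], None],
    is_safe: Callable[[], bool],
    now: Callable[[], float],
    sleep: Callable[[float], None],
    max_lag_s: float,
    poll_interval_s: float = 1.0,
) -> float:
    """Trigger a rollback and return the seconds until is_safe() first holds."""
    started = now()
    trigger_rollback()
    while True:
        elapsed = now() - started
        if is_safe():
            return elapsed
        if elapsed >= max_lag_s:
            raise RollbackTimeout(f"not safe after {elapsed:.1f}s")
        sleep(poll_interval_s)


class SimulatedFleet:
    """Stand-in deployment that becomes safe propagation_s after rollback."""

    def __init__(self, propagation_s: float) -> None:
        self.propagation_s = propagation_s
        self.t = 0.0
        self.rolled_back_at: Optional[float] = None

    def now(self) -> float:
        return self.t

    def sleep(self, seconds: float) -> None:
        self.t += seconds  # simulated time, so the test never really waits

    def trigger_rollback(self) -> None:
        self.rolled_back_at = self.t

    def is_safe(self) -> bool:
        if self.rolled_back_at is None:
            return False
        return self.t - self.rolled_back_at >= self.propagation_s
```

The measured lag is bounded below by the polling interval, so the interval should be small relative to the maximum lag being asserted.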
+
+---
+
+## 5) Competitor Parity Testing — next-level
+
+* **Behavioral fingerprinting**: derive a compact fingerprint (outputs + timing + error shape) per request class and track drift over time.
+* **Asymmetric stress tests**: apply load patterns competitors are known to struggle with and verify Stella Ops remains stable.
+* **Regression-to-market alerts**: trigger alerts when Stella Ops deviates from competitor norms in *either* direction (worse or suspiciously better).
+
+**Why it matters**: parity isn’t static; it drifts quietly unless measured continuously.
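
As a rough illustration of behavioral fingerprinting, the sketch below hashes the set of (status, error code, latency bucket) combinations observed for one request class. The sample fields, the bucket boundaries, and the 16-character truncation are assumptions chosen for the example, not an established format.

```python
import hashlib
import json
from typing import Dict, List, Optional, Union

Sample = Dict[str, Union[int, float, Optional[str]]]


def latency_bucket(latency_ms: float) -> str:
    # Coarse buckets so ordinary jitter does not change the fingerprint.
    for bound in (50, 200, 1000):
        if latency_ms < bound:
            return f"<{bound}ms"
    return ">=1000ms"


def fingerprint(samples: List[Sample]) -> str:
    """Compact, order-independent fingerprint of one request class."""
    shape = sorted(
        {
            (
                int(s["status"]),
                str(s.get("error_code") or ""),
                latency_bucket(float(s["latency_ms"])),
            )
            for s in samples
        }
    )
    payload = json.dumps(shape, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


BASELINE: List[Sample] = [
    {"status": 200, "error_code": None, "latency_ms": 32},
    {"status": 429, "error_code": "quota_exceeded", "latency_ms": 12},
]
```

Storing one fingerprint per request class per week, as the checklist suggests, reduces drift detection to a string comparison; the raw samples would still be needed to explain why a fingerprint changed.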
+
+---
+
+## 6) New Cross-Cutting Standards to Enforce
+
+* **Tests as evidence**: every integration/E2E run produces immutable artifacts suitable for audit or post-incident review.
+* **Deterministic replayability**: any failed test must be reproducible bit-for-bit within 24 hours of the failure.
+* **Blast-radius annotation**: every test declares what operational surface it covers (routing, auth, billing, compliance).
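
Blast-radius annotation could be enforced with something as small as a decorator plus a gate over the collected tests. The sketch below assumes the four surfaces named above are the full allowed set; `blast_radius`, `unannotated`, and the two sample tests are hypothetical names.

```python
from typing import Callable, FrozenSet, Iterable, List

# Assumed to be the complete list of operational surfaces.
ALLOWED_SURFACES: FrozenSet[str] = frozenset(
    {"routing", "auth", "billing", "compliance"}
)


def blast_radius(*surfaces: str) -> Callable[[Callable], Callable]:
    """Attach the operational surfaces a test covers to the test function."""
    unknown = set(surfaces) - ALLOWED_SURFACES
    if not surfaces or unknown:
        raise ValueError(f"invalid blast radius: {sorted(unknown) or 'empty'}")

    def decorate(fn: Callable) -> Callable:
        fn.blast_radius = frozenset(surfaces)  # read later by the gate below
        return fn

    return decorate


def unannotated(tests: Iterable[Callable]) -> List[str]:
    """Names of tests that declare no blast radius; a CI gate could fail on any."""
    return [t.__name__ for t in tests if not getattr(t, "blast_radius", None)]


@blast_radius("routing", "auth")
def test_token_routed_to_region() -> None:
    pass


def test_without_annotation() -> None:
    pass
```

Rejecting unknown surface names at decoration time keeps the vocabulary closed, which matters if the annotations are later aggregated into coverage reports per surface.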
+
+---
+
+## Prioritized Checklist — This Week Only
+
+**Immediate (1–2 days)**
+
+1. Add decision-explainability assertions to core routing unit tests.
+2. Introduce time-skew unit tests for cache, TTL, and signature logic.
+3. Define and enforce idempotency tests on one critical integration path.
+
+**Short-term (by end of week)**
+
+4. Enable sanitized production trace replay in one integration suite.
+5. Add rollback lag measurement to deployment/E2E tests.
+6. Start policy-as-code diff tests for routing rules.
+
+**High-leverage**
+
+7. Implement a minimal competitor behavioral fingerprint and store it weekly.
+8. Require blast-radius annotations on all new integration and E2E tests.
+
+---
+
+### Bottom line
+
+The next gains for Stella Ops testing are no longer about coverage—they’re about **temporal correctness, policy drift control, replayability, and competitive awareness**. Systems that fail now do so quietly, over time, and under sequence pressure. These additions close exactly those gaps.