docs consolidation, big sln build fixes, new advisories and sprints/tasks

This commit is contained in:
master
2026-01-05 18:37:04 +02:00
parent d0a7b88398
commit d7bdca6d97
175 changed files with 10322 additions and 307 deletions

@@ -0,0 +1,85 @@
**Stella Ops — Incremental Testing Enhancements (NEW since prior runs)**
*Only net-new ideas and practices; no restatement of earlier guidance.*
---
## 1) Unit Testing — what to add now
* **Semantic fuzzing for policies**: generate inputs that specifically target policy boundaries (quotas, geo rules, sanctions, priority overrides), not random fuzz.
* **Time-skew simulation**: unit tests that warp time (clock drift, leap seconds, TTL expiry) to catch cache and signature failures.
* **Decision explainability tests**: assert that every routing decision produces a minimal, machine-readable explanation payload (even if not user-facing).
**Why it matters**: catches failures that only appear under temporal or policy edge conditions.
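
As an illustration of the time-skew idea, here is a minimal sketch that assumes the code under test reads time only through an injected clock. `FakeClock` and `TtlCache` are hypothetical stand-ins, not Stella Ops components; the point is that TTL boundaries and backwards clock steps become deterministic test cases.

```python
class FakeClock:
    """Manually advanced clock (hypothetical test helper)."""

    def __init__(self, start):
        self._now = float(start)

    def now(self):
        return self._now

    def advance(self, seconds):
        # A negative value models the clock being stepped backwards.
        self._now += seconds


class TtlCache:
    """Toy TTL cache that reads time only through the injected clock."""

    def __init__(self, clock, ttl_seconds):
        self._clock = clock
        self._ttl = ttl_seconds
        self._entries = {}  # key -> (stored_at, value)

    def put(self, key, value):
        self._entries[key] = (self._clock.now(), value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        age = self._clock.now() - stored_at
        # A negative age means the clock moved backwards; treat the entry
        # as expired instead of letting it stay fresh longer than the TTL.
        if age < 0 or age >= self._ttl:
            del self._entries[key]
            return None
        return value


def test_entry_expires_at_ttl_boundary():
    clock = FakeClock(start=1000)
    cache = TtlCache(clock, ttl_seconds=60)
    cache.put("route:eu", "pool-eu")
    clock.advance(59)
    assert cache.get("route:eu") == "pool-eu"  # one second before expiry
    clock.advance(1)
    assert cache.get("route:eu") is None       # exactly at the TTL


def test_backwards_clock_step_expires_entry():
    clock = FakeClock(start=1000)
    cache = TtlCache(clock, ttl_seconds=60)
    cache.put("route:eu", "pool-eu")
    clock.advance(-5)                           # simulated drift correction
    assert cache.get("route:eu") is None
```

Whether a backwards step should expire an entry or keep it is a policy choice; the sketch picks the conservative option so the behavior is at least pinned down by a test.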
---
## 2) Module / Source-Level Testing — new practices
* **Policy-as-code tests**: treat routing and ops policies as versioned code with diff-based tests (policy change → expected behavior delta).
* **Schema evolution tests**: automatically replay data written under the last N schema versions against current code to ensure backward compatibility.
* **Dead-path detection**: fail builds if conditional branches are never exercised across the module test suite.
**Why it matters**: prevents silent behavior changes when policies or schemas evolve.
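
A policy-as-code diff test could look roughly like the sketch below. The policy format, the `route` function, and the request corpus are all assumptions made for illustration; the technique is to evaluate a fixed corpus under the old and new policy and assert that the set of changed decisions equals the reviewed, expected delta.

```python
def route(policy, request):
    """Toy router: the first matching rule wins, otherwise the default."""
    for rule in policy["rules"]:
        if all(request.get(k) == v for k, v in rule["match"].items()):
            return rule["target"]
    return policy["default"]


def behavior_delta(old_policy, new_policy, corpus):
    """Return {request_id: (old_decision, new_decision)} for changed decisions."""
    delta = {}
    for request in corpus:
        before = route(old_policy, request)
        after = route(new_policy, request)
        if before != after:
            delta[request["id"]] = (before, after)
    return delta


# Hypothetical policy versions: v2 adds a rule for premium traffic.
POLICY_V1 = {
    "default": "pool-a",
    "rules": [{"match": {"region": "eu"}, "target": "pool-eu"}],
}
POLICY_V2 = {
    "default": "pool-a",
    "rules": [
        {"match": {"region": "eu"}, "target": "pool-eu"},
        {"match": {"tier": "premium"}, "target": "pool-fast"},
    ],
}

# Fixed request corpus that is versioned alongside the policies.
CORPUS = [
    {"id": "r1", "region": "eu", "tier": "free"},
    {"id": "r2", "region": "us", "tier": "premium"},
    {"id": "r3", "region": "us", "tier": "free"},
    {"id": "r4", "region": "eu", "tier": "premium"},
]

# The delta a reviewer signed off on for the v1 -> v2 change.
EXPECTED_DELTA = {"r2": ("pool-a", "pool-fast")}


def test_policy_change_produces_only_expected_delta():
    assert behavior_delta(POLICY_V1, POLICY_V2, CORPUS) == EXPECTED_DELTA
```

Note that `r4` stays on `pool-eu` because rule order gives the region rule precedence; a diff test like this makes that kind of interaction explicit instead of leaving it to be discovered in production.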
---
## 3) Integration Testing — new focus areas
* **Production trace replay (sanitized)**: replay real, anonymized traces into integration environments to validate behavior against reality, not assumptions.
* **Failure choreography tests**: deliberately stagger dependency failures (A fails first, then B recovers, then A recovers) and assert system convergence.
* **Idempotency verification**: explicit tests that repeated requests under retries never create divergent state.
**Why it matters**: most real outages are sequencing problems, not single failures.
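
The idempotency check can be sketched as a reusable helper: apply a request once, snapshot the state, replay the same request several times, and fail if the response or the state changes. `LedgerStore` and its `apply` signature are hypothetical; a real integration test would call the actual service and compare persisted state.

```python
import copy


class LedgerStore:
    """Toy service that deduplicates writes by idempotency key."""

    def __init__(self):
        self.balances = {}
        self._responses = {}  # idempotency_key -> first response

    def apply(self, idempotency_key, account, amount):
        if idempotency_key in self._responses:
            # Retry of a request we already processed: return the stored result.
            return self._responses[idempotency_key]
        self.balances[account] = self.balances.get(account, 0) + amount
        response = {"account": account, "balance": self.balances[account]}
        self._responses[idempotency_key] = response
        return response


def check_idempotent(store, request, retries=5):
    """Raise AssertionError if retries change the response or the state."""
    first = store.apply(**request)
    snapshot = copy.deepcopy(store.balances)
    for attempt in range(1, retries + 1):
        response = store.apply(**request)
        if response != first:
            raise AssertionError(f"retry {attempt} returned a different response")
        if store.balances != snapshot:
            raise AssertionError(f"retry {attempt} changed stored state")


def test_retries_do_not_create_divergent_state():
    store = LedgerStore()
    request = {"idempotency_key": "req-1", "account": "acct-1", "amount": 10}
    check_idempotent(store, request)
    assert store.balances == {"acct-1": 10}
```

The same helper should fail against a handler that ignores the key, which is a useful sanity check that the test can actually detect the defect.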
---
## 4) Deployment / E2E Testing — additions
* **Config-diff E2E tests**: assert that changing *only* config (no code) produces only the expected behavioral delta.
* **Rollback lag tests**: measure and assert maximum time-to-safe-state after rollback is triggered.
* **Synthetic adversarial traffic**: continuously inject malformed but valid-looking traffic post-deploy to ensure defenses stay active.
**Why it matters**: many incidents come from “safe” config changes and slow rollback propagation.
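
One way to express a rollback lag test is a polling helper with a hard budget, sketched below. The `trigger_rollback` and `is_safe` callables are placeholders for whatever the deployment tooling exposes, and `SimulatedDeployment` exists only so the sketch runs without real infrastructure; the clock and sleep functions are injected so the measurement itself can be tested deterministically.

```python
import time


def measure_rollback_lag(trigger_rollback, is_safe, max_lag_s,
                         poll_interval_s=0.5,
                         clock=time.monotonic, sleep=time.sleep):
    """Trigger a rollback and return seconds until is_safe() is true.

    Raises AssertionError if the safe state is not reached within max_lag_s.
    """
    started = clock()
    trigger_rollback()
    while True:
        elapsed = clock() - started
        if is_safe():
            return elapsed
        if elapsed >= max_lag_s:
            raise AssertionError(
                f"not in safe state after {elapsed:.1f}s (budget {max_lag_s}s)"
            )
        sleep(poll_interval_s)


class SimulatedDeployment:
    """Hypothetical stand-in that converges a fixed time after rollback."""

    def __init__(self, converge_after_s):
        self.now = 0.0
        self._converge_after_s = converge_after_s
        self._rolled_back_at = None

    def clock(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds  # advance simulated time instead of waiting

    def trigger_rollback(self):
        self._rolled_back_at = self.now

    def is_safe(self):
        if self._rolled_back_at is None:
            return False
        return self.now - self._rolled_back_at >= self._converge_after_s


def test_rollback_converges_within_budget():
    env = SimulatedDeployment(converge_after_s=3.0)
    lag = measure_rollback_lag(env.trigger_rollback, env.is_safe, max_lag_s=5.0,
                               clock=env.clock, sleep=env.sleep)
    assert lag <= 5.0
```

In a real E2E suite the default `time.monotonic` and `time.sleep` would be used, and the measured lag would be recorded as an artifact so the trend can be tracked across releases.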
---
## 5) Competitor Parity Testing — next-level
* **Behavioral fingerprinting**: derive a compact fingerprint (outputs + timing + error shape) per request class and track drift over time.
* **Asymmetric stress tests**: apply load patterns competitors are known to struggle with and verify Stella Ops remains stable.
* **Regression-to-market alerts**: trigger alerts when Stella Ops deviates from competitor norms in *either* direction (worse or suspiciously better).
**Why it matters**: parity isn't static; it drifts quietly unless measured continuously.
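
A behavioral fingerprint might be computed along the lines of the sketch below. The sample fields (`status`, `body_keys`, `error_code`, `latency_ms`) and the power-of-two latency bucketing are assumptions chosen for illustration; the idea is to reduce outputs, timing, and error shape to a short, stable hash that only changes when behavior meaningfully shifts.

```python
import hashlib
import json
import statistics


def latency_bucket(ms):
    """Round latency up to a power of two so normal jitter is ignored."""
    bucket = 1
    while bucket < ms:
        bucket *= 2
    return bucket


def fingerprint(samples):
    """Return a 16-hex-character fingerprint for one request class.

    Each sample is a dict with the assumed keys:
    status, body_keys, error_code, latency_ms.
    """
    latencies = [s["latency_ms"] for s in samples]
    summary = {
        "statuses": sorted({s["status"] for s in samples}),
        "body_keys": sorted({k for s in samples for k in s["body_keys"]}),
        "error_codes": sorted({s["error_code"] for s in samples if s["error_code"]}),
        "median_latency_bucket_ms": latency_bucket(statistics.median(latencies)),
    }
    # Canonical JSON keeps the hash independent of dict ordering.
    canonical = json.dumps(summary, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def has_drifted(previous_fingerprint, samples):
    """True when the current samples no longer match the stored fingerprint."""
    return fingerprint(samples) != previous_fingerprint
```

Storing one fingerprint per request class per week, for Stella Ops and for each competitor being compared, gives a cheap drift signal; a changed fingerprint says that something moved, and the underlying summary says what.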
---
## 6) New Cross-Cutting Standards to Enforce
* **Tests as evidence**: every integration/E2E run produces immutable artifacts suitable for audit or post-incident review.
* **Deterministic replayability**: any failed test must be reproducible bit-for-bit within 24 hours.
* **Blast-radius annotation**: every test declares what operational surface it covers (routing, auth, billing, compliance).
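
Blast-radius annotations could be enforced with something as small as a decorator plus a collection-time check, as in this sketch. The decorator name, the attribute it sets, and the allowed surface list (taken from the examples in the bullet above) are assumptions; a real suite would wire the check into its test runner's collection hook.

```python
# Surfaces taken from the examples in the text; the real list is an assumption.
ALLOWED_SURFACES = frozenset({"routing", "auth", "billing", "compliance"})


def blast_radius(*surfaces):
    """Decorator that records which operational surfaces a test covers."""
    if not surfaces:
        raise ValueError("declare at least one surface")
    unknown = set(surfaces) - ALLOWED_SURFACES
    if unknown:
        raise ValueError(f"unknown surfaces: {sorted(unknown)}")

    def decorate(test_fn):
        test_fn.blast_radius = frozenset(surfaces)
        return test_fn

    return decorate


def missing_annotations(tests):
    """Return the names of test functions that declare no blast radius."""
    return [t.__name__ for t in tests if not getattr(t, "blast_radius", None)]


@blast_radius("routing", "auth")
def test_token_refresh_keeps_route():
    pass  # placeholder body; only the annotation matters here


def test_legacy_without_annotation():
    pass  # would be reported by missing_annotations()
```

A CI gate can then fail when `missing_annotations` returns a non-empty list for newly added integration or E2E tests, and the recorded surfaces can be attached to each run's artifacts.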
---
## Prioritized Checklist — This Week Only
**Immediate (1–2 days)**
1. Add decision-explainability assertions to core routing unit tests.
2. Introduce time-skew unit tests for cache, TTL, and signature logic.
3. Define and enforce idempotency tests on one critical integration path.
**Short-term (by end of week)**
4. Enable sanitized production trace replay in one integration suite.
5. Add rollback lag measurement to deployment/E2E tests.
6. Start policy-as-code diff tests for routing rules.
**High-leverage**
7. Implement a minimal competitor behavioral fingerprint and store it weekly.
8. Require blast-radius annotations on all new integration and E2E tests.
---
### Bottom line
The next gains for Stella Ops testing are no longer about coverage; they're about **temporal correctness, policy drift control, replayability, and competitive awareness**. Systems that fail now do so quietly, over time, and under sequence pressure. These additions close exactly those gaps.