More feature checks; setup improvements

This commit is contained in:
master
2026-02-13 02:04:55 +02:00
parent 9911b7d73c
commit 9ca2de05df
675 changed files with 37550 additions and 1826 deletions


@@ -143,7 +143,8 @@ Role switching rule:
Role inference (fallback):
- "implement / fix / add endpoint / refactor code" -> Developer / Implementer
- "add tests / stabilize flaky tests / verify determinism" -> QA / Test Automation
- "add tests / stabilize flaky tests / verify determinism" -> Test Automation (4.4)
- "enter QA / test features / verify features / feature verification / e2e tests" -> QA (4.6)
- "update docs / write guide / edit architecture dossier" -> Documentation author
- "plan / sprint / tasks / dependencies / milestones" -> Project Manager
- "review advisory / product direction / capability assessment" -> Product Manager
@@ -177,7 +178,7 @@ Behavior:
Constraints:
- Add tests for changes; maintain determinism and offline posture.
### 4.4 QA / Test Automation role
### 4.4 Test Automation role
Binding standard:
- `docs/code-of-conduct/TESTING_PRACTICES.md`
@@ -195,6 +196,88 @@ Responsibilities:
- Update module dossiers when contracts change
- Ensure docs remain consistent with implemented behavior
### 4.6 QA role (end-to-end behavioral verification)
Binding standards:
- `docs/qa/feature-checks/FLOW.md` (CRITICAL -- read in full before any QA work)
- `docs/code-of-conduct/TESTING_PRACTICES.md`
Role inference:
- "enter QA role", "test features", "verify features", "feature verification" -> this role
**Primary goal: END-TO-END BEHAVIORAL VERIFICATION.**
QA exists to prove features **actually work** by exercising them as a real user would.
File existence checks and build passes are prerequisites, not the goal.
**Tier 2 (behavioral verification) is the goal. Skipping Tier 2 is a verification failure.**
#### 4.6.1 Feature verification pipeline (mandatory)
Follow the 3-tier pipeline from `docs/qa/feature-checks/FLOW.md`:
1. **Tier 0 -- Source Verification**: Confirm source files referenced in feature `.md` exist on disk.
2. **Tier 1 -- Build + Code Review**: Build the module, run tests, AND read source code to verify the logic matches claims. Tests must assert meaningful outcomes (not just "doesn't throw").
3. **Tier 2 -- Behavioral Verification** (THE MAIN PURPOSE):
- **Tier 2a (API)**: Send real HTTP requests, verify responses. For services with HTTP endpoints.
- **Tier 2b (CLI)**: Run CLI commands, verify output and exit codes.
- **Tier 2c (UI)**: Use Playwright to navigate the UI, verify rendering and interactions.
- **Tier 2d (Library/Internal)**: Run **targeted integration tests** against the **specific test `.csproj`** (see below).
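The common thread across the Tier 2 variants can be sketched in a few lines of shell. This is a hedged illustration only: the CLI under test is stood in by a local `printf`, and the `"verdict"` field is a hypothetical output shape, not a real StellaOps contract. The point is that a Tier 2b-style check asserts both the output content and the exit code, never just "the command ran":

```shell
# Hypothetical Tier 2b-style check: a local printf stands in for the real CLI.
# Verify BOTH the exit code AND the output content, not merely that it executed.
out=$(printf '{"verdict":"pass"}')
rc=$?
if [ "$rc" -eq 0 ] && printf '%s' "$out" | grep -q '"verdict":"pass"'; then
  echo "behavior-verified"
else
  echo "behavior-mismatch"
fi
```

The same two-part assertion (status plus payload) applies to Tier 2a with HTTP status codes and response bodies, and to Tier 2c with Playwright assertions on rendered state.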
#### 4.6.2 Tier 2d deep verification rules (CRITICAL -- prevents shallow testing)
For library/internal modules (Policy, Concelier, Scanner, Signals, Attestor, etc.) with no external HTTP/CLI/UI surface:
1. **Run tests against INDIVIDUAL `.csproj` files, NOT solution filters (`.slnf`).**
Solution filters ignore `--filter` flags, causing all tests to run and producing misleading suite-wide pass counts that hide whether the feature's specific tests actually ran.
```
# CORRECT -- targets specific test project, filter works:
dotnet test "src/Policy/__Tests/StellaOps.Policy.Engine.Tests/StellaOps.Policy.Engine.Tests.csproj" \
--filter "FullyQualifiedName~EwsCalculatorTests" -v normal
# WRONG -- slnf ignores filter, runs everything, useless evidence:
dotnet test src/Policy/StellaOps.Policy.tests.slnf \
--filter "FullyQualifiedName~EwsCalculatorTests" -v normal
```
2. **Verify the `--filter` actually filtered.** The `testsRun` count in evidence must reflect the targeted subset, not the entire suite. If you see the full suite count, the filter did not work -- switch to individual `.csproj`.
3. **Read test source code.** Open the test `.cs` files and verify:
- Tests assert actual computed values (scores, verdicts, hashes, states)
- Tests exercise the feature's core logic paths (happy path + error cases)
- Tests are NOT just checking `!= null` or `doesn't throw`
- If assertions are shallow, the feature has a **test gap** -- mark it and write deeper tests
4. **Write new tests when behavioral coverage is missing.**
- If no tests exist for the feature's core behavior: **create a focused test class**
- Test actual inputs -> expected outputs for the feature's logic
- Run the new test and verify it passes
- Record new tests in evidence (`newTestsWritten` field)
5. **Fix bugs when tests fail.**
- Diagnose root cause, apply minimal fix, re-run, capture before/after evidence
- Record fixes in evidence (`bugsFixed` field)
- Follow the FLOW.md state machine: `failed -> triaged -> confirmed -> fixing -> retesting`
6. **Capture actual command output** in tier2 evidence. Include raw `dotnet test` output snippets, not just summary counts.
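Rule 2 above (verify the `--filter` actually filtered) can be automated with a small sanity check. The sketch below assumes the VSTest-style console summary line that `dotnet test` prints; the `708` suite total is taken from this document's own example, and the `12` targeted count is hypothetical:

```shell
# Sketch: confirm the --filter narrowed the run. The summary line format is an
# assumption (VSTest console summary); 708 is the doc's example suite total.
summary='Passed!  - Failed:     0, Passed:     12, Skipped:     0, Total:     12'
suite_total=708
tests_run=$(printf '%s' "$summary" | sed -n 's/.*Total:[[:space:]]*\([0-9][0-9]*\).*/\1/p')
if [ "$tests_run" -gt 0 ] && [ "$tests_run" -lt "$suite_total" ]; then
  echo "filter-effective: $tests_run tests ran"
else
  echo "filter-did-not-apply: switch to the individual .csproj"
fi
```

A count equal to the full suite total is exactly the misleading evidence rule 2 warns about, and it means the `.slnf` path was used instead of the individual `.csproj`.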
#### 4.6.3 Forbidden shortcuts (will invalidate verification)
- Declaring Tier 2 pass from suite totals alone (e.g., "708/708 pass") without targeted test evidence
- Copying previous run artifacts and editing timestamps
- Running the entire solution filter and claiming the filter "is advisory"
- Marking a feature as verified without reading and confirming test assertions
- Skipping Tier 2 for any reason other than: `hardware_required`, `multi_datacenter`, `air_gap_network`
#### 4.6.4 Orchestrator vs. subagent responsibilities
- **Orchestrator** (team lead): Writes state files (`docs/qa/feature-checks/state/*.json`), moves feature files from `unchecked/` to `checked/` or `unimplemented/`, dispatches subagents (max 4 concurrent agents, on unrelated modules)
- **Subagents** (feature checkers): Execute tiers, write evidence to `docs/qa/feature-checks/runs/`, move feature files, report results back to orchestrator. Never modify state JSON directly.
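The orchestrator's pass-path file move amounts to a `mv` between the directories named above. A minimal sketch, rooted under a temp dir so it is runnable anywhere; the feature file name is hypothetical:

```shell
# Sketch of the orchestrator's pass-path move. Directory names come from the doc;
# the feature file name is made up for illustration.
root=$(mktemp -d)
mkdir -p "$root/unchecked" "$root/checked" "$root/unimplemented"
printf 'feature spec\n' > "$root/unchecked/example-feature.md"
# On a passed outcome the orchestrator moves the feature file to checked/:
mv "$root/unchecked/example-feature.md" "$root/checked/"
```

A failed-as-unimplementable outcome would target `unimplemented/` instead; subagents report the outcome but never perform the state JSON write themselves.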
#### 4.6.5 Problems-first enforcement
- Resolve `failed`/`fixing`/`retesting` features before starting new `queued` features
- A feature in a non-terminal state blocks all new work on that module
- Follow the FLOW.md state machine strictly: `queued -> checking -> passed/failed -> done/blocked`
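The transitions quoted in this section can be encoded as a small lookup, which an orchestrator could use to reject illegal state jumps. This is a hedged sketch covering only the states named in this document; FLOW.md is authoritative and may define additional states or transitions:

```shell
# Sketch of the state transitions named in this section only (FLOW.md is
# authoritative). done and blocked are treated as terminal.
next_states() {
  case "$1" in
    queued)    echo "checking" ;;
    checking)  echo "passed failed" ;;
    passed)    echo "done" ;;
    failed)    echo "triaged blocked" ;;
    triaged)   echo "confirmed" ;;
    confirmed) echo "fixing" ;;
    fixing)    echo "retesting" ;;
    retesting) echo "passed failed" ;;
    *)         echo "" ;;
  esac
}
```

For example, `next_states checking` yields `passed failed`, so a feature jumping from `checking` straight to `done` would be flagged as an illegal transition.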
---
## 5) Module-local AGENTS.md discipline