More feature checks; setup improvements

This commit is contained in:
master
2026-02-13 02:04:55 +02:00
parent 9911b7d73c
commit 9ca2de05df
675 changed files with 37550 additions and 1826 deletions


@@ -143,7 +143,8 @@ Role switching rule:
Role inference (fallback):
- "implement / fix / add endpoint / refactor code" -> Developer / Implementer
- "add tests / stabilize flaky tests / verify determinism" -> QA / Test Automation
- "add tests / stabilize flaky tests / verify determinism" -> Test Automation (4.4)
- "enter QA / test features / verify features / feature verification / e2e tests" -> QA (4.6)
- "update docs / write guide / edit architecture dossier" -> Documentation author
- "plan / sprint / tasks / dependencies / milestones" -> Project Manager
- "review advisory / product direction / capability assessment" -> Product Manager
@@ -177,7 +178,7 @@ Behavior:
Constraints:
- Add tests for changes; maintain determinism and offline posture.
### 4.4 QA / Test Automation role
### 4.4 Test Automation role
Binding standard:
- `docs/code-of-conduct/TESTING_PRACTICES.md`
@@ -195,6 +196,88 @@ Responsibilities:
- Update module dossiers when contracts change
- Ensure docs remain consistent with implemented behavior
### 4.6 QA role (end-to-end behavioral verification)
Binding standards:
- `docs/qa/feature-checks/FLOW.md` (CRITICAL -- read in full before any QA work)
- `docs/code-of-conduct/TESTING_PRACTICES.md`
Role inference:
- "enter QA role", "test features", "verify features", "feature verification" -> this role
**Primary goal: END-TO-END BEHAVIORAL VERIFICATION.**
QA exists to prove features **actually work** by exercising them as a real user would.
File existence checks and build passes are prerequisites, not the goal.
**Tier 2 (behavioral verification) is the goal. Skipping Tier 2 is a verification failure.**
#### 4.6.1 Feature verification pipeline (mandatory)
Follow the 3-tier pipeline from `docs/qa/feature-checks/FLOW.md`:
1. **Tier 0 -- Source Verification**: Confirm source files referenced in feature `.md` exist on disk.
2. **Tier 1 -- Build + Code Review**: Build the module, run tests, AND read source code to verify the logic matches claims. Tests must assert meaningful outcomes (not just "doesn't throw").
3. **Tier 2 -- Behavioral Verification** (THE MAIN PURPOSE):
- **Tier 2a (API)**: Send real HTTP requests, verify responses. For services with HTTP endpoints.
- **Tier 2b (CLI)**: Run CLI commands, verify output and exit codes.
- **Tier 2c (UI)**: Use Playwright to navigate the UI, verify rendering and interactions.
- **Tier 2d (Library/Internal)**: Run **targeted integration tests** against the **specific test `.csproj`** (see below).
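The common thread across the Tier 2 variants can be sketched in a few lines of shell. This is a hedged illustration only: the CLI under test is stood in by a local `printf`, and the `"verdict"` field is a hypothetical output shape, not a real StellaOps contract. The point is that a Tier 2b-style check asserts both the output content and the exit code, never just "the command ran":

```shell
# Hypothetical Tier 2b-style check: a local printf stands in for the real CLI.
# Verify BOTH the exit code AND the output content, not merely that it executed.
out=$(printf '{"verdict":"pass"}')
rc=$?
if [ "$rc" -eq 0 ] && printf '%s' "$out" | grep -q '"verdict":"pass"'; then
  echo "behavior-verified"
else
  echo "behavior-mismatch"
fi
```

The same two-part assertion (status plus payload) applies to Tier 2a with HTTP status codes and response bodies, and to Tier 2c with Playwright assertions on rendered state.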
#### 4.6.2 Tier 2d deep verification rules (CRITICAL -- prevents shallow testing)
For library/internal modules (Policy, Concelier, Scanner, Signals, Attestor, etc.) with no external HTTP/CLI/UI surface:
1. **Run tests against INDIVIDUAL `.csproj` files, NOT solution filters (`.slnf`).**
Solution filters ignore `--filter` flags, causing all tests to run and producing misleading suite-wide pass counts that hide whether the feature's specific tests actually ran.
```
# CORRECT -- targets specific test project, filter works:
dotnet test "src/Policy/__Tests/StellaOps.Policy.Engine.Tests/StellaOps.Policy.Engine.Tests.csproj" \
--filter "FullyQualifiedName~EwsCalculatorTests" -v normal
# WRONG -- slnf ignores filter, runs everything, useless evidence:
dotnet test src/Policy/StellaOps.Policy.tests.slnf \
--filter "FullyQualifiedName~EwsCalculatorTests" -v normal
```
2. **Verify the `--filter` actually filtered.** The `testsRun` count in evidence must reflect the targeted subset, not the entire suite. If you see the full suite count, the filter did not work -- switch to individual `.csproj`.
3. **Read test source code.** Open the test `.cs` files and verify:
- Tests assert actual computed values (scores, verdicts, hashes, states)
- Tests exercise the feature's core logic paths (happy path + error cases)
- Tests are NOT just checking `!= null` or `doesn't throw`
- If assertions are shallow, the feature has a **test gap** -- mark it and write deeper tests
4. **Write new tests when behavioral coverage is missing.**
- If no tests exist for the feature's core behavior: **create a focused test class**
- Test actual inputs -> expected outputs for the feature's logic
- Run the new test and verify it passes
- Record new tests in evidence (`newTestsWritten` field)
5. **Fix bugs when tests fail.**
- Diagnose root cause, apply minimal fix, re-run, capture before/after evidence
- Record fixes in evidence (`bugsFixed` field)
- Follow the FLOW.md state machine: `failed -> triaged -> confirmed -> fixing -> retesting`
6. **Capture actual command output** in tier2 evidence. Include raw `dotnet test` output snippets, not just summary counts.
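Rule 2 above (verify the `--filter` actually filtered) can be automated with a small sanity check. The sketch below assumes the VSTest-style console summary line that `dotnet test` prints; the `708` suite total is taken from this document's own example, and the `12` targeted count is hypothetical:

```shell
# Sketch: confirm the --filter narrowed the run. The summary line format is an
# assumption (VSTest console summary); 708 is the doc's example suite total.
summary='Passed!  - Failed:     0, Passed:     12, Skipped:     0, Total:     12'
suite_total=708
tests_run=$(printf '%s' "$summary" | sed -n 's/.*Total:[[:space:]]*\([0-9][0-9]*\).*/\1/p')
if [ "$tests_run" -gt 0 ] && [ "$tests_run" -lt "$suite_total" ]; then
  echo "filter-effective: $tests_run tests ran"
else
  echo "filter-did-not-apply: switch to the individual .csproj"
fi
```

A count equal to the full suite total is exactly the misleading evidence rule 2 warns about, and it means the `.slnf` path was used instead of the individual `.csproj`.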
#### 4.6.3 Forbidden shortcuts (will invalidate verification)
- Declaring Tier 2 pass from suite totals alone (e.g., "708/708 pass") without targeted test evidence
- Copying previous run artifacts and editing timestamps
- Running the entire solution filter and claiming the filter "is advisory"
- Marking a feature as verified without reading and confirming test assertions
- Skipping Tier 2 for any reason other than: `hardware_required`, `multi_datacenter`, `air_gap_network`
#### 4.6.4 Orchestrator vs. subagent responsibilities
- **Orchestrator** (team lead): Writes state files (`docs/qa/feature-checks/state/*.json`), moves feature files from `unchecked/` to `checked/` or `unimplemented/`, dispatches subagents (max 4 concurrent agents, on unrelated modules)
- **Subagents** (feature checkers): Execute tiers, write evidence to `docs/qa/feature-checks/runs/`, move feature files, report results back to orchestrator. Never modify state JSON directly.
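The orchestrator's pass-path file move amounts to a `mv` between the directories named above. A minimal sketch, rooted under a temp dir so it is runnable anywhere; the feature file name is hypothetical:

```shell
# Sketch of the orchestrator's pass-path move. Directory names come from the doc;
# the feature file name is made up for illustration.
root=$(mktemp -d)
mkdir -p "$root/unchecked" "$root/checked" "$root/unimplemented"
printf 'feature spec\n' > "$root/unchecked/example-feature.md"
# On a passed outcome the orchestrator moves the feature file to checked/:
mv "$root/unchecked/example-feature.md" "$root/checked/"
```

A failed-as-unimplementable outcome would target `unimplemented/` instead; subagents report the outcome but never perform the state JSON write themselves.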
#### 4.6.5 Problems-first enforcement
- Resolve `failed`/`fixing`/`retesting` features before starting new `queued` features
- A feature in a non-terminal state blocks all new work on that module
- Follow the FLOW.md state machine strictly: `queued -> checking -> passed/failed -> done/blocked`
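The transitions quoted in this section can be encoded as a small lookup, which an orchestrator could use to reject illegal state jumps. This is a hedged sketch covering only the states named in this document; FLOW.md is authoritative and may define additional states or transitions:

```shell
# Sketch of the state transitions named in this section only (FLOW.md is
# authoritative). done and blocked are treated as terminal.
next_states() {
  case "$1" in
    queued)    echo "checking" ;;
    checking)  echo "passed failed" ;;
    passed)    echo "done" ;;
    failed)    echo "triaged blocked" ;;
    triaged)   echo "confirmed" ;;
    confirmed) echo "fixing" ;;
    fixing)    echo "retesting" ;;
    retesting) echo "passed failed" ;;
    *)         echo "" ;;
  esac
}
```

For example, `next_states checking` yields `passed failed`, so a feature jumping from `checking` straight to `done` would be flagged as an illegal transition.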
---
## 5) Module-local AGENTS.md discipline