save checkpoint: add features and their state; check some of them

This commit is contained in:
master
2026-02-10 07:54:44 +02:00
parent 4bdc298ec1
commit 5593212b41
211 changed files with 10248 additions and 1208 deletions

# Feature Verification Pipeline - FLOW
This document defines the state machine, tier system, artifact format, and priority rules
for the automated feature verification pipeline.
All agents in the pipeline MUST read this document before taking any action.
---
## 1. Directory Layout
```
docs/features/
  unchecked/<module>/<feature>.md       # Input: features to verify (1,144 files)
  checked/<module>/<feature>.md         # Output: features that passed verification
  unimplemented/<module>/<feature>.md   # Moved here when status = not_implemented
  dropped/<feature>.md                  # Not implemented / intentionally dropped
docs/qa/feature-checks/
  FLOW.md                               # This file (state machine spec)
  state/<module>.json                   # Per-module state ledger (one file per module)
  runs/<module>/<feature>/<runId>/      # Artifacts per verification run
```
---
## 2. State Machine
### 2.1 States
| State | Meaning |
|-------|---------|
| `queued` | Discovered, not yet processed |
| `checking` | Feature checker is running |
| `passed` | All tier checks passed |
| `failed` | Check found issues (pre-triage) |
| `triaged` | Issue-finder identified root cause |
| `confirmed` | Issue-confirmer validated triage |
| `fixing` | Fixer is implementing the fix |
| `retesting` | Retester is re-running checks |
| `done` | Verified and moved to `checked/` |
| `blocked` | Requires human intervention |
| `skipped` | Cannot be automatically verified (manual-only) |
| `not_implemented` | Source files missing despite the sprint marking the feature DONE |
### 2.2 Transitions
```
queued ──────────────> checking
                          │
         ┌────────────────┼────────────────┐
         v                v                v
      passed           failed      not_implemented
         │                │                │
         v                v                │  (move file to
       done            triaged             │   unimplemented/)
         │                │                v
         │                v            [terminal]
         │            confirmed
         │                │
         │                v
         │             fixing
         │                │
         │                v
         │            retesting
         │              │    │
         │              v    v
         │            done  failed ──> (retry or blocked)
         v
[move file to checked/]
```
### 2.3 Retry Policy
- Maximum retry count: **3** per feature
- After 3 retries with failures: transition to `blocked`
- Blocked features require human review before re-entering the pipeline
- Each retry increments `retryCount` in state
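The retry policy can be sketched as a small transition helper. This is an illustrative Python function, not the orchestrator's actual code; the feature-entry shape follows the state file format in Section 4.

```python
MAX_RETRIES = 3  # from the retry policy above

def on_retest_failure(feature):
    """Apply the retry policy when a retest comes back 'failed'.

    `feature` is one entry from a per-module state file (Section 4).
    Returns a new dict; the single-writer orchestrator persists it.
    """
    feature = dict(feature)  # do not mutate the caller's copy
    feature["retryCount"] = feature.get("retryCount", 0) + 1
    if feature["retryCount"] >= MAX_RETRIES:
        # After 3 retries with failures, a human must intervene.
        feature["status"] = "blocked"
    else:
        # Otherwise the feature re-enters the pipeline as a plain failure.
        feature["status"] = "failed"
    return feature
```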
### 2.4 Skip Criteria
Features that CANNOT be automatically E2E tested should be marked `skipped`:
- Air-gap/offline features (require disconnected environment)
- Crypto-sovereign features (require HSM/eIDAS hardware)
- Multi-node cluster features (require multi-host setup)
- Performance benchmarking features (require dedicated infra)
- Features with description containing "manual verification required"
The checker agent determines skip eligibility during Tier 0.
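A first-pass skip screen could key off description text. The marker list below is an assumption for illustration (it mirrors the bullet list above but is not an exhaustive production list); the checker agent may use richer signals.

```python
# Hypothetical keyword screen for Tier 0 skip eligibility.
SKIP_MARKERS = (
    "air-gap", "offline", "hsm", "eidas", "multi-node",
    "benchmark", "manual verification required",
)

def skip_reason(description):
    """Return the first matching skip marker, or None if auto-testable."""
    text = description.lower()
    for marker in SKIP_MARKERS:
        if marker in text:
            return marker
    return None
```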
---
## 3. Tier System
Verification proceeds in tiers. Each tier is a gate -- a feature must pass
the current tier before advancing to the next. **A feature is NOT verified
until ALL applicable tiers pass.** File existence alone is not verification.
### Tier 0: Source Verification (fast, cheap)
**Purpose**: Verify that the source files referenced in the feature file actually exist.
**Process**:
1. Read the feature `.md` file
2. Extract file paths from `## Implementation Details`, `## Key files`, or `## What's Implemented` sections
3. For each path, check if the file exists on disk
4. Extract class/interface names and grep for their declarations
**Outcomes**:
- All key files found: `source_verified = true`, advance to Tier 1
- Key files missing (more than 50% absent): `status = not_implemented`
- Some files missing (50% or fewer absent): `source_verified = partial`, add note, advance to Tier 1
**What this proves**: The code exists on disk. Nothing more.
**Cost**: ~0.01 USD per feature (file existence checks only)
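The Tier 0 outcome thresholds map directly onto the `tier0-source-check.json` artifact (Section 5). A minimal sketch, assuming the paths have already been extracted from the feature file:

```python
from pathlib import Path

def tier0_source_check(paths, repo_root="."):
    """Tier 0 sketch: check that extracted file paths exist on disk.

    Verdicts follow the outcomes above: all found -> pass,
    more than 50% absent -> fail (not_implemented), else partial.
    """
    root = Path(repo_root)
    found = [p for p in paths if (root / p).exists()]
    missing = [p for p in paths if not (root / p).exists()]
    absent_ratio = len(missing) / len(paths) if paths else 1.0
    if not missing:
        verdict = "pass"       # source_verified = true
    elif absent_ratio > 0.5:
        verdict = "fail"       # status = not_implemented
    else:
        verdict = "partial"    # source_verified = partial, note added
    return {"filesChecked": paths, "found": found,
            "missing": missing, "verdict": verdict}
```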
### Tier 1: Build + Code Review (medium)
**Purpose**: Verify the module compiles, tests pass, AND the code actually implements
the described behavior.
**Process**:
1. Identify the `.csproj` file(s) for the feature's module
2. Run `dotnet build <project>.csproj` and capture output
3. Run `dotnet test <test-project>.csproj --filter <relevant-filter>` -- tests MUST actually execute and pass
4. For Angular/frontend features: run `npx ng build` and `npx ng test` for the relevant library/app
5. **Code review** (CRITICAL): Read the key source files and verify:
- The classes/methods described in the feature file actually contain the logic claimed
- The feature description matches what the code does (not just that it exists)
- Tests cover the core behavior described in the feature (not just compilation)
6. If the build succeeds but tests are blocked by upstream dependency errors:
- Record as `build_verified = true, tests_blocked_upstream = true`
- The feature CANNOT advance to `passed` -- mark as `failed` with category `env_issue`
- The upstream blocker must be resolved before the feature can pass
**Code Review Checklist** (must answer YES to all):
- [ ] Does the main class/service exist with non-trivial implementation (not stubs/TODOs)?
- [ ] Does the logic match what the feature description claims?
- [ ] Are there unit tests that exercise the core behavior?
- [ ] Do those tests actually assert meaningful outcomes (not just "doesn't throw")?
**Outcomes**:
- Build + tests pass + code review confirms behavior: `build_verified = true`, advance to Tier 2
- Build fails: `status = failed`, record build errors
- Tests fail or blocked: `status = failed`, record reason
- Code review finds stubs/missing logic: `status = failed`, category = `missing_code`
**What this proves**: The code compiles, tests pass, and someone has verified the code
does what it claims.
**Cost**: ~0.10 USD per feature (compile + test execution + code reading)
### Tier 2: Behavioral Verification (API / CLI / UI)
**Purpose**: Verify the feature works end-to-end by actually exercising it through
its external interface. This is the only tier that proves the feature WORKS, not
just that code exists.
**EVERY feature MUST have a Tier 2 check unless explicitly skipped.** The check type
depends on the module's external surface.
#### Tier 2a: API Testing (Gateway, Router, Api, Platform, backend services with HTTP endpoints)
**Process**:
1. Ensure the service is running (check port, or start via `docker compose up`)
2. Send HTTP requests to the feature's endpoints using `curl` or a test script
3. Verify response status codes, headers, and body structure
4. Test error cases (unauthorized, bad input, rate limited, etc.)
5. Verify the behavior described in the feature file actually happens
**Example for `gateway-identity-header-strip`**:
```bash
# Send request with spoofed identity header
curl -H "X-Forwarded-User: attacker" http://localhost:5000/api/test
# Verify the header was stripped (response should use authenticated identity, not spoofed)
```
**Artifact**: `tier2-api-check.json`
```json
{
"type": "api",
"baseUrl": "http://localhost:5000",
"requests": [
{
"description": "Verify spoofed identity header is stripped",
"method": "GET",
"path": "/api/test",
"headers": { "X-Forwarded-User": "attacker" },
"expectedStatus": 200,
"actualStatus": 200,
"assertion": "Response X-Forwarded-User header matches authenticated user, not 'attacker'",
"result": "pass|fail",
"evidence": "actual response headers/body"
}
],
"verdict": "pass|fail|skip"
}
```
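All Tier 2 artifacts fold per-item results into a single `verdict` field. One way the checker might compute it (an assumed convention, not mandated by this spec: any failing item fails the run, an empty list means nothing was exercised):

```python
def tier2_verdict(checks):
    """Fold per-request/step results into the artifact-level verdict.

    Each check dict is assumed to carry a "result" of "pass" or "fail",
    matching the requests/steps arrays in the Tier 2 artifacts.
    """
    if not checks:
        return "skip"   # nothing was exercised
    return "fail" if any(c["result"] == "fail" for c in checks) else "pass"
```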
#### Tier 2b: CLI Testing (Cli, Tools, Bench modules)
**Process**:
1. Build the CLI tool if needed
2. Run the CLI command described in the feature's E2E Test Plan
3. Verify stdout/stderr output matches expected behavior
4. Test error cases (invalid args, missing config, etc.)
5. Verify exit codes
**Example for `cli-baseline-selection-logic`**:
```bash
stella scan --baseline last-green myimage:latest
# Verify output shows baseline was selected correctly
echo $? # Verify exit code 0
```
**Artifact**: `tier2-cli-check.json`
```json
{
"type": "cli",
"commands": [
{
"description": "Verify baseline selection with last-green strategy",
"command": "stella scan --baseline last-green myimage:latest",
"expectedExitCode": 0,
"actualExitCode": 0,
"expectedOutput": "Using baseline: ...",
"actualOutput": "...",
"result": "pass|fail"
}
],
"verdict": "pass|fail|skip"
}
```
#### Tier 2c: UI Testing (Web, ExportCenter, DevPortal, VulnExplorer, PacksRegistry)
**Process**:
1. Ensure the Angular app is running (`ng serve` or docker)
2. Use Playwright CLI or MCP to navigate to the feature's UI route
3. Follow E2E Test Plan steps: verify elements render, interactions work, data displays
4. Capture screenshots as evidence
5. Test accessibility (keyboard navigation, ARIA labels) if listed in E2E plan
**Example for `pipeline-run-centric-view`**:
```bash
npx playwright test --grep "pipeline-run" --reporter=json
# Or manually via MCP: navigate to /release-orchestrator/runs, verify table renders
```
**Artifact**: `tier2-ui-check.json`
```json
{
"type": "ui",
"baseUrl": "http://localhost:4200",
"steps": [
{
"description": "Navigate to /release-orchestrator/runs",
"action": "navigate",
"target": "/release-orchestrator/runs",
"expected": "Runs list table renders with columns",
"result": "pass|fail",
"screenshot": "step-1-runs-list.png"
}
],
"verdict": "pass|fail|skip"
}
```
#### Tier 2d: Library/Internal Testing (Attestor, Policy, Scanner, etc. with no external surface)
For modules with no HTTP/CLI/UI surface, Tier 2 means running **targeted
integration tests** or **behavioral unit tests** that prove the feature logic:
**Process**:
1. Identify tests that specifically exercise the feature's behavior
2. Run those tests: `dotnet test --filter "FullyQualifiedName~FeatureClassName"`
3. Read the test code to confirm it asserts meaningful behavior (not just "compiles")
4. If no behavioral tests exist, write a focused test and run it
**Example for `evidence-weighted-score-model`**:
```bash
dotnet test --filter "FullyQualifiedName~EwsCalculatorTests"
# Verify: normalizers produce expected dimension scores
# Verify: guardrails cap/floor scores correctly
# Verify: composite score is deterministic for same inputs
```
**Artifact**: `tier2-integration-check.json`
```json
{
"type": "integration",
"testFilter": "FullyQualifiedName~EwsCalculatorTests",
"testsRun": 21,
"testsPassed": 21,
"testsFailed": 0,
"behaviorVerified": [
"6-dimension normalization produces expected scores",
"Guardrails enforce caps and floors",
"Composite score is deterministic"
],
"verdict": "pass|fail"
}
```
### When to skip Tier 2
Mark `skipped` ONLY for features that literally cannot be tested in the current environment:
- Air-gap features requiring a disconnected network
- HSM/eIDAS features requiring physical hardware
- Multi-datacenter features requiring distributed infrastructure
- Performance benchmark features requiring dedicated load-gen infrastructure
"The app isn't running" is NOT a skip reason -- it's a `failed` with `env_issue`.
"No tests exist" is NOT a skip reason -- write a focused test.
### Tier Classification by Module
| Tier 2 Type | Modules | Feature Count |
|-------------|---------|---------------|
| 2a (API) | Gateway, Router, Api, Platform | ~30 |
| 2b (CLI) | Cli, Tools, Bench | ~110 |
| 2c (UI/Playwright) | Web, ExportCenter, DevPortal, VulnExplorer, PacksRegistry | ~190 |
| 2d (Integration) | Attestor, Policy, Scanner, BinaryIndex, Concelier, Libraries, EvidenceLocker, Orchestrator, Signals, Authority, Signer, Cryptography, ReachGraph, Graph, RiskEngine, Replay, Unknowns, Scheduler, TaskRunner, Timeline, Notifier, Findings, SbomService, Mirror, Feedser, Analyzers | ~700 |
| Manual (skip) | AirGap (subset), SmRemote (HSM), DevOps (infra) | ~25 |
---
## 4. State File Format
Per-module state files live at `docs/qa/feature-checks/state/<module>.json`.
```json
{
"module": "gateway",
"featureCount": 8,
"lastUpdatedUtc": "2026-02-09T12:00:00Z",
"features": {
"router-back-pressure-middleware": {
"status": "queued",
"tier": 0,
"retryCount": 0,
"sourceVerified": null,
"buildVerified": null,
"e2eVerified": null,
"skipReason": null,
"lastRunId": null,
"lastUpdatedUtc": "2026-02-09T12:00:00Z",
"featureFile": "docs/features/unchecked/gateway/router-back-pressure-middleware.md",
"notes": []
}
}
}
```
### State File Rules
- **Single writer**: Only the orchestrator writes state files
- **Subagents report back**: Subagents return results to the orchestrator via their output; they do NOT write state files directly
- **Atomic updates**: Each state transition must update `lastUpdatedUtc`
- **Append-only notes**: The `notes` array is append-only; never remove entries
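The rules above can be sketched as one orchestrator-only helper. This is a hypothetical implementation: it rewrites the whole ledger via write-then-rename (a common approximation of an atomic update), bumps both `lastUpdatedUtc` stamps, and only ever appends to `notes`.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

def transition(state_path, feature, new_status, note=None):
    """Single-writer state transition for one feature in a module ledger."""
    path = Path(state_path)
    state = json.loads(path.read_text())
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    entry = state["features"][feature]
    entry["status"] = new_status
    entry["lastUpdatedUtc"] = now
    if note:
        entry["notes"].append(note)        # append-only, never remove
    state["lastUpdatedUtc"] = now
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(state, indent=2))
    tmp.replace(path)                      # write-then-rename update
    return entry
```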
---
## 5. Run Artifact Format
Each verification run produces artifacts under:
`docs/qa/feature-checks/runs/<module>/<feature-slug>/<runId>/`
Where `<runId>` = `run-001`, `run-002`, etc. (zero-padded, sequential).
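Allocating the next sequential, zero-padded runId is trivial but worth pinning down; a sketch, assuming existing runIds follow the `run-NNN` convention:

```python
def next_run_id(existing):
    """Return the next zero-padded sequential runId (run-001, run-002, ...)."""
    highest = max((int(r.split("-")[1]) for r in existing), default=0)
    return f"run-{highest + 1:03d}"
```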
### Required Artifacts
| Stage | File | Format |
|-------|------|--------|
| Tier 0 | `tier0-source-check.json` | `{ "filesChecked": [...], "found": [...], "missing": [...], "verdict": "pass\|fail\|partial" }` |
| Tier 1 | `tier1-build-check.json` | `{ "project": "...", "buildResult": "pass\|fail", "testResult": "pass\|fail\|skipped", "errors": [...] }` |
| Tier 2 | `tier2-e2e-check.json` | `{ "steps": [{ "description": "...", "result": "pass\|fail", "evidence": "..." }], "screenshots": [...] }` |
| Triage | `triage.json` | `{ "rootCause": "...", "category": "missing_code\|bug\|config\|test_gap\|env_issue", "affectedFiles": [...], "confidence": 0.0-1.0 }` |
| Confirm | `confirmation.json` | `{ "approved": true\|false, "reason": "...", "revisedRootCause": "..." }` |
| Fix | `fix-summary.json` | `{ "filesModified": [...], "testsAdded": [...], "description": "..." }` |
| Retest | `retest-result.json` | `{ "previousFailures": [...], "retestResults": [...], "verdict": "pass\|fail" }` |
### Screenshot Convention
Screenshots for Tier 2 go in `<runId>/screenshots/` with names:
`step-<N>-<description-slug>.png`
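A possible slugging rule for the screenshot names (the lowercase-hyphen slug is an assumption; the spec only fixes the `step-<N>-<description-slug>.png` shape):

```python
import re

def screenshot_name(step, description):
    """Build step-<N>-<description-slug>.png from a step description."""
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    return f"step-{step}-{slug}.png"
```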
---
## 6. Priority Rules
When selecting the next feature to process, the orchestrator follows this priority order:
1. **`retesting`** - Finish in-progress retests first
2. **`fixing`** - Complete in-progress fixes
3. **`confirmed`** - Confirmed issues ready for fix
4. **`triaged`** - Triaged issues ready for confirmation
5. **`failed`** (retryCount < 3) - Failed features ready for triage
6. **`queued`** - New features not yet checked
Within the same priority level, prefer:
- Features in smaller modules first (faster to clear a module completely)
- Features with lower `retryCount`
- Alphabetical by feature slug (deterministic ordering)
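The whole priority order collapses into one sort key. A sketch, with assumed input shapes: `features` maps slug to its state entry plus a `module` field, and `module_sizes` maps module to its remaining feature count.

```python
# Rank from the numbered list above; statuses absent here are not selectable.
STATUS_PRIORITY = {"retesting": 0, "fixing": 1, "confirmed": 2,
                   "triaged": 3, "failed": 4, "queued": 5}

def pick_next(features, module_sizes):
    """Pick the next feature slug per the priority and tie-break rules."""
    def eligible(f):
        if f["status"] not in STATUS_PRIORITY:
            return False
        # Rule 5: only failed features with retryCount < 3 are selectable.
        if f["status"] == "failed" and f.get("retryCount", 0) >= 3:
            return False
        return True

    candidates = [(STATUS_PRIORITY[f["status"]],        # status priority
                   module_sizes.get(f["module"], 0),    # smaller modules first
                   f.get("retryCount", 0),              # lower retryCount first
                   slug)                                # alphabetical tie-break
                  for slug, f in features.items() if eligible(f)]
    return min(candidates)[3] if candidates else None
```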
---
## 7. File Movement Rules
### On `passed` -> `done`
1. Copy feature file from `docs/features/unchecked/<module>/<feature>.md` to `docs/features/checked/<module>/<feature>.md`
2. Update the status line in the file from `IMPLEMENTED` to `VERIFIED`
3. Append a `## Verification` section with the run ID and date
4. Remove the original from `unchecked/`
5. Create the target module directory in `checked/` if it doesn't exist
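The five steps above can be sketched in one move helper. Hypothetical implementation: it assumes the status line contains the literal token `IMPLEMENTED` and appends a minimal `## Verification` section; the real agent may edit more carefully.

```python
from pathlib import Path

def move_to_checked(module, feature, run_id, date, root="docs/features"):
    """Move a passed feature from unchecked/ to checked/ (steps 1-5 above)."""
    src = Path(root) / "unchecked" / module / f"{feature}.md"
    dst = Path(root) / "checked" / module / f"{feature}.md"
    dst.parent.mkdir(parents=True, exist_ok=True)              # step 5
    text = src.read_text().replace("IMPLEMENTED", "VERIFIED", 1)  # step 2
    text += f"\n## Verification\nRun: {run_id} ({date})\n"        # step 3
    dst.write_text(text)                                       # step 1
    src.unlink()                                               # step 4
    return dst
```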
### On `not_implemented`
1. Copy feature file from `docs/features/unchecked/<module>/<feature>.md` to `docs/features/unimplemented/<module>/<feature>.md`
2. Update status from `IMPLEMENTED` to `PARTIALLY_IMPLEMENTED`
3. Add notes about what was missing
4. Remove the original from `unchecked/`
### On `blocked`
- Do NOT move the file
- Add a `## Blocked` section to the feature file in `unchecked/` with the reason
- The feature stays in `unchecked/` until a human unblocks it
---
## 8. Agent Contracts
### stella-orchestrator
- **Reads**: State files, feature files (to pick next work)
- **Writes**: State files, moves feature files on pass/fail
- **Dispatches**: Subagents with specific feature context
- **Rule**: NEVER run checks itself; always delegate to subagents
### stella-feature-checker
- **Receives**: Feature file path, current tier, module info
- **Reads**: Feature .md file, source code files, build output
- **Executes**: File existence checks, `dotnet build`, `dotnet test`, Playwright CLI
- **Returns**: Tier check results (JSON) to orchestrator
- **Rule**: Read-only on feature files; never modify source code; never write state
### stella-issue-finder
- **Receives**: Check failure details, feature file path
- **Reads**: Source code in the relevant module, test files, build errors
- **Returns**: Triage JSON with root cause, category, affected files, confidence
- **Rule**: Read-only; never modify files; fast analysis
### stella-issue-confirmer
- **Receives**: Triage JSON, feature file path
- **Reads**: Same source code as finder, plus broader context
- **Returns**: Confirmation JSON (approved/rejected with reason)
- **Rule**: Read-only; never modify files; thorough analysis
### stella-fixer
- **Receives**: Confirmed triage, feature file path, affected files list
- **Writes**: Source code fixes, new/updated tests
- **Returns**: Fix summary JSON
- **Rule**: Only modify files listed in confirmed triage; add tests for every change; follow CODE_OF_CONDUCT.md
### stella-retester
- **Receives**: Feature file path, previous failure details, fix summary
- **Executes**: Same checks as feature-checker for the tiers that previously failed
- **Returns**: Retest result JSON
- **Rule**: Same constraints as feature-checker; never modify source code
---
## 9. Environment Prerequisites
Before running Tier 1+ checks, ensure:
### Backend (.NET)
```bash
# Verify .NET SDK is available
dotnet --version # Expected: 10.0.x
# Verify the solution builds
dotnet build src/StellaOps.sln --no-restore
```
### Frontend (Angular)
```bash
# Verify Node.js and Angular CLI
node --version # Expected: 22.x
npx ng version # Expected: 21.x
# Build the frontend
cd src/Web/StellaOps.Web && npm ci && npx ng build
```
### Playwright (Tier 2 only)
```bash
npx playwright install chromium
```
### Application Runtime (Tier 2 only)
```bash
# Start backend + frontend (if docker compose exists)
docker compose -f devops/compose/docker-compose.dev.yml up -d
# Or run individually
cd src/Web/StellaOps.Web && npx ng serve &
```
If the required runtime cannot be brought up, record the Tier 2 check as `failed`
with category `env_issue` (per Section 3, a non-running application is not a skip
reason); reserve `skipped` for the hardware/infrastructure cases listed there.
---
## 10. Cost Estimation
| Tier | Per Feature | 1,144 Features | Notes |
|------|-------------|-----------------|-------|
| Tier 0 | ~$0.01 | ~$11 | File existence only |
| Tier 1 | ~$0.05 | ~$57 | Build + test |
| Tier 2 | ~$0.50 | ~$165 (330 API/CLI/UI features) | Playwright + Opus |
| Triage | ~$0.10 | ~$30 (est. 300 failures) | Sonnet |
| Confirm | ~$0.15 | ~$30 (est. 200 confirmed) | Opus |
| Fix | ~$0.50 | ~$75 (est. 150 fixes) | o3 |
| Retest | ~$0.20 | ~$30 (est. 150 retests) | Opus |
| **Total** | | **~$400** | Conservative estimate |
Run Tier 0 first to filter out `not_implemented` features before spending on higher tiers.
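The table's arithmetic can be checked mechanically; the per-stage rates and counts below are copied from the table itself:

```python
# (per-feature USD rate, feature/run count) per stage, from the cost table.
STAGES = {
    "tier0":   (0.01, 1144),
    "tier1":   (0.05, 1144),
    "tier2":   (0.50, 330),
    "triage":  (0.10, 300),
    "confirm": (0.15, 200),
    "fix":     (0.50, 150),
    "retest":  (0.20, 150),
}

def total_cost():
    """Sum stage costs; lands just under the ~$400 conservative estimate."""
    return sum(rate * count for rate, count in STAGES.values())
```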