save checkpoint

This commit is contained in:
master
2026-02-11 01:32:14 +02:00
parent 5593212b41
commit cf5b72974f
2316 changed files with 68799 additions and 3808 deletions


All agents in the pipeline MUST read this document before taking any action.
> **THE PRIMARY GOAL IS END-TO-END BEHAVIORAL VERIFICATION.**
>
> This pipeline exists to prove that features **actually work** by exercising them
> as a real user would -- through APIs, CLIs, UIs, and integration tests. Tier 0
> (file checks) and Tier 1 (build + unit tests) are necessary prerequisites, but
> they are NOT the goal. **Tier 2 (E2E behavioral verification) is the goal.**
>
> Agents MUST:
> 1. Start Docker / Docker Desktop before running any checks
> 2. Set up required containers (Postgres, Redis, RabbitMQ, etc.)
> 3. Start application services needed for behavioral testing
> 4. Run ALL tiers including Tier 2 -- never stop at Tier 1
> 5. Act as a user: call APIs, run CLI commands, interact with UIs
>
> **Skipping Tier 2 is a verification failure, not a verification pass.**
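The five steps above can be sketched as a single preflight routine. The probes are injected as arguments so the ordering logic itself is testable; the real commands shown in the usage comment are assumptions about the local setup, not fixed pipeline contracts:

```shell
#!/usr/bin/env bash
# Sketch of the agent checklist above as an ordered preflight.
preflight() {
  local docker_probe="$1" compose_up="$2" services_up="$3"
  $docker_probe || { echo "step 1 failed: Docker is not running"; return 1; }
  $compose_up   || { echo "step 2 failed: containers did not start"; return 2; }
  $services_up  || { echo "step 3 failed: app services not reachable"; return 3; }
  echo "environment ready: run Tier 0, Tier 1, AND Tier 2 -- never stop at Tier 1"
}

# Assumed real invocation (see Section 9 for the actual setup commands):
#   preflight "docker info" \
#             "docker compose -f devops/compose/docker-compose.dev.yml up -d" \
#             "curl -sf http://localhost:5000/health"
preflight true true true
```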
---
## 1. Directory Layout
- Blocked features require human review before re-entering the pipeline
- Each retry increments `retryCount` in state
### 2.4 Skip Criteria (STRICT - almost nothing qualifies)
Features may ONLY be marked `skipped` if they match one of these 3 physical constraints:
- `hardware_required`: Requires physical HSM, smart card, or eIDAS hardware token
- `multi_datacenter`: Requires geographically distributed infrastructure
- `air_gap_network`: Requires a physically disconnected network
**Everything else MUST be tested.** Features that were previously classified as
"performance benchmarking" or "multi-node cluster" should be tested with whatever
scale is available locally (single-node Docker, local containers). Partial behavioral
verification is better than no verification.
The checker agent determines skip eligibility during Tier 0 and MUST justify the
skip with one of the 3 reasons above. Any other reason is invalid.
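As a sketch, skip eligibility reduces to a whitelist check against exactly these three reason codes (the function name is illustrative, not part of the pipeline contract):

```shell
#!/usr/bin/env bash
# Sketch: whitelist check for the 3 valid skip reasons above.
is_valid_skip_reason() {
  case "$1" in
    hardware_required|multi_datacenter|air_gap_network) return 0 ;;
    *) return 1 ;;
  esac
}

is_valid_skip_reason "hardware_required" && echo "valid skip"
is_valid_skip_reason "performance_benchmark" || echo "invalid: test it instead"
```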
---
**Cost**: ~0.10 USD per feature (compile + test execution + code reading)
### Tier 2: Behavioral Verification (API / CLI / UI) -- THE MAIN PURPOSE
**Purpose**: Verify the feature works end-to-end by actually exercising it through
its external interface. This is the only tier that proves the feature WORKS, not
just that code exists. **This is the primary reason the verification pipeline exists.**
**EVERY feature MUST have a Tier 2 check. E2E tests MUST NOT be skipped.** The whole
point of this pipeline is to act as a user and verify the software works. Tier 0 and
Tier 1 are prerequisites -- Tier 2 is the actual verification.
**If the environment is not set up, set it up.** If Docker is not running, start it.
If containers are not running, start them. If the app is not running, start it.
"Environment not ready" is never an excuse to skip -- it is a setup step the agent
must perform (see Section 9).
The check type depends on the module's external surface.
### Tier 2 Acceptance Gate (HARD REQUIREMENT)
A Tier 2 run is valid only if ALL of the following are true:
1. It uses a new run directory (`run-XYZ`) created for the current execution.
2. It contains fresh evidence captured in this run (new timestamps and new command/request outputs).
3. It includes user-surface interactions (HTTP requests, CLI invocations, or UI interactions), not only library test counts.
4. It verifies both positive and negative behavior paths when the feature has error semantics.
5. For rechecks, at least one new user transaction per feature is captured in the new run.
The following are forbidden and invalidate Tier 2:
- Copying a previous run directory and only editing `runId`, timestamps, or summary text.
- Declaring Tier 2 pass from suite totals alone without fresh request/response, command output, or UI step evidence.
- Reusing screenshots or response payloads from prior runs without replaying the interaction.
If any forbidden shortcut is detected, mark the feature `failed` with category `test_gap`
and rerun Tier 2 from scratch.
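A minimal sketch of satisfying rule 1, assuming zero-padded sequential `run-XYZ` directory names under a runs root (the root path argument is illustrative; the real layout may differ):

```shell
#!/usr/bin/env bash
# Sketch: allocate the NEXT zero-padded run directory (run-001, run-002, ...)
# so Tier 2 evidence is always captured fresh, never into a copied directory.
next_run_id() {
  local root="$1" last
  last=$(ls "$root" 2>/dev/null | grep -E '^run-[0-9]{3}$' | sort | tail -n 1 || true)
  if [ -z "$last" ]; then
    echo "run-001"
  else
    printf 'run-%03d\n' $(( 10#${last#run-} + 1 ))
  fi
}

root=$(mktemp -d)
mkdir "$root/run-001" "$root/run-002"
next_run_id "$root"   # -> run-003
```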
#### Tier 2a: API Testing (Gateway, Router, Api, Platform, backend services with HTTP endpoints)
{
"type": "api",
"baseUrl": "http://localhost:5000",
"capturedAtUtc": "2026-02-10T12:00:00Z",
"requests": [
{
"description": "Verify spoofed identity header is stripped",
"actualStatus": 200,
"assertion": "Response X-Forwarded-User header matches authenticated user, not 'attacker'",
"result": "pass|fail",
"evidence": "actual response headers/body"
"evidence": "actual response headers/body",
"requestCapturedAtUtc": "2026-02-10T12:00:01Z",
"responseSnippet": "HTTP/1.1 200 ..."
}
],
"verdict": "pass|fail|skip"
```json
{
"type": "cli",
"capturedAtUtc": "2026-02-10T12:00:00Z",
"commands": [
{
"description": "Verify baseline selection with last-green strategy",
"actualExitCode": 0,
"expectedOutput": "Using baseline: ...",
"actualOutput": "...",
"result": "pass|fail"
"result": "pass|fail",
"commandCapturedAtUtc": "2026-02-10T12:00:01Z"
}
],
"verdict": "pass|fail|skip"
{
"type": "ui",
"baseUrl": "http://localhost:4200",
"capturedAtUtc": "2026-02-10T12:00:00Z",
"steps": [
{
"description": "Navigate to /release-orchestrator/runs",
"target": "/release-orchestrator/runs",
"expected": "Runs list table renders with columns",
"result": "pass|fail",
"screenshot": "step-1-runs-list.png"
"screenshot": "step-1-runs-list.png",
"stepCapturedAtUtc": "2026-02-10T12:00:01Z"
}
],
"verdict": "pass|fail|skip"
```json
{
"type": "integration",
"capturedAtUtc": "2026-02-10T12:00:00Z",
"testFilter": "FullyQualifiedName~EwsCalculatorTests",
"testsRun": 21,
"testsPassed": 21,
}
```
### When to skip Tier 2 (ALMOST NEVER)
**Default: Tier 2 is MANDATORY.** Agents must exhaust all options before marking skip.
"The app isn't running" is NOT a skip reason -- it's a `failed` with `env_issue`.
"No tests exist" is NOT a skip reason -- write a focused test.
The ONLY acceptable skip reasons (must match exactly one):
- `hardware_required`: Feature requires physical HSM, smart card, or eIDAS token
- `multi_datacenter`: Feature requires geographically distributed infrastructure
- `air_gap_network`: Feature requires a physically disconnected network (not just no internet)
**These are NOT valid skip reasons:**
- "The app isn't running" -- **START IT** (see Section 9). If it won't start, mark `failed` with `env_issue`.
- "Docker isn't running" -- **START DOCKER** (see Section 9.0). If it won't start, mark `failed` with `env_issue`.
- "No E2E tests exist" -- **WRITE ONE.** A focused behavioral test that exercises the feature as a user would.
- "The database isn't set up" -- **SET IT UP** using Docker containers (see Section 9.1).
- "Environment not ready" -- **PREPARE IT.** That is part of the agent's job, not an excuse.
- "Too complex to test" -- Break it into smaller testable steps. Test what you can.
- "Only unit tests needed" -- Unit tests are Tier 1. Tier 2 is behavioral/integration/E2E.
- "Application not running" -- See "The app isn't running" above.
**If an agent skips Tier 2 without one of the 3 valid reasons above, the entire
feature verification is INVALID and must be re-run.**
### Tier Classification by Module
Where `<runId>` = `run-001`, `run-002`, etc. (zero-padded, sequential).
| Fix | `fix-summary.json` | `{ "filesModified": [...], "testsAdded": [...], "description": "..." }` |
| Retest | `retest-result.json` | `{ "previousFailures": [...], "retestResults": [...], "verdict": "pass\|fail" }` |
### Artifact Freshness Rules (MANDATORY)
- Every new run (`run-XYZ`) MUST be generated from fresh execution, not by copying prior run files.
- Every Tier 2 artifact MUST include `capturedAtUtc` and per-step/per-command/per-request capture times.
- Evidence fields MUST contain fresh raw output from the current run (response snippets, command output, screenshots, or logs).
- Recheck runs MUST include at least one newly captured user interaction per feature in that run directory.
- If a previous run is reused as input for convenience, that run is INVALID until all Tier 2 evidence files are regenerated.
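A minimal freshness probe for the `capturedAtUtc` field required above. This is a sketch, assuming GNU `date`; the `sed` extraction is a convenience to avoid a `jq` dependency, and the function name is illustrative:

```shell
#!/usr/bin/env bash
# Sketch: reject a Tier 2 artifact whose capturedAtUtc predates the run start.
artifact_is_fresh() {
  local artifact="$1" run_start_epoch="$2" captured captured_epoch
  captured=$(sed -n 's/.*"capturedAtUtc": *"\([^"]*\)".*/\1/p' "$artifact" | head -n 1)
  [ -n "$captured" ] || return 1                 # missing field counts as stale
  captured_epoch=$(date -u -d "$captured" +%s)   # GNU date assumed
  [ "$captured_epoch" -ge "$run_start_epoch" ]
}

run_start=$(date -u +%s)
f=$(mktemp)
printf '{ "type": "api", "capturedAtUtc": "%s" }\n' \
  "$(date -u +%Y-%m-%dT%H:%M:%SZ)" > "$f"
artifact_is_fresh "$f" "$run_start" && echo "fresh evidence"
```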
### Screenshot Convention
Screenshots for Tier 2 go in `<runId>/screenshots/` with names:
### stella-feature-checker
- **Receives**: Feature file path, current tier, module info
- **Reads**: Feature .md file, source code files, build output
- **Executes**: File existence checks, `dotnet build`, `dotnet test`, Playwright CLI, Docker commands
- **Returns**: Tier check results (JSON) to orchestrator
- **Rule**: Read-only on feature files; never modify source code; never write state
- **MUST**: Set up required infrastructure (Docker, containers, databases) before testing.
Environment setup is part of the checker's job. If Docker is not running, start it.
If containers are needed, spin them up. If the app needs to be running, start it.
The checker MUST leave the environment in a testable state before running Tier 2.
- **MUST NOT**: copy a previous run's artifacts to satisfy Tier 2. Checker must capture fresh user-surface evidence for each run.
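A sketch of how the checker might self-audit for the copied-run shortcut, assuming evidence files live directly in each run directory (the function name and layout are illustrative):

```shell
#!/usr/bin/env bash
# Sketch: self-audit for the forbidden copied-run shortcut. If every file in
# the new run is byte-identical to the previous run, no fresh evidence was
# captured in it.
copied_run() {
  local prev="$1" curr="$2"
  # diff -rq prints one line per differing/extra file; empty output => a copy
  [ -z "$(diff -rq "$prev" "$curr" 2>/dev/null)" ]
}

prev=$(mktemp -d); curr=$(mktemp -d)
echo '{"result":"pass"}' > "$prev/api-check.json"
cp "$prev/api-check.json" "$curr/api-check.json"
copied_run "$prev" "$curr" && echo "INVALID: rerun Tier 2 with fresh evidence"
```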
### stella-issue-finder
- **Receives**: Check failure details, feature file path
---
## 9. Environment Prerequisites (MANDATORY - DO NOT SKIP)
The verification pipeline exists to prove features **actually work** from a user's
perspective. Agents MUST set up the full runtime environment before running checks.
Skipping environment setup is NEVER acceptable.
### 9.0 Docker / Container Runtime (MUST BE RUNNING FIRST)
Docker is required for Tier 1 tests (Testcontainers for Postgres, Redis, RabbitMQ, etc.)
and for Tier 2 behavioral checks (running services). **Start Docker before anything else.**
```bash
# Step 1: Ensure Docker Desktop is running (Windows/macOS)
# On Windows: Start Docker Desktop from Start Menu or:
Start-Process "C:\Program Files\Docker\Docker\Docker Desktop.exe"
# On macOS:
open -a Docker
# Step 2: Wait for Docker to be ready
docker info > /dev/null 2>&1
# If this fails, Docker is not running. DO NOT proceed without Docker.
# Retry up to 60 seconds:
for i in $(seq 1 12); do docker info > /dev/null 2>&1 && break || sleep 5; done
# Step 3: Verify Docker is functional
docker ps # Should return (possibly empty) container list without errors
```
**If Docker is not available or cannot start:** Mark all affected features as
`failed` with category `env_issue` and note `"Docker unavailable"`. Do NOT mark
them as `skipped` -- infrastructure failures are failures, not skips.
### 9.1 Container Setup and Cleanup
Before running tests, ensure a clean container state:
```bash
# Clean up any stale containers from previous runs
docker compose -f devops/compose/docker-compose.dev.yml down --volumes --remove-orphans 2>/dev/null || true
# Pull required images
docker compose -f devops/compose/docker-compose.dev.yml pull
# Start infrastructure services (Postgres, Redis, RabbitMQ, etc.)
docker compose -f devops/compose/docker-compose.dev.yml up -d
# Wait for services to be healthy (check health status)
docker compose -f devops/compose/docker-compose.dev.yml ps
# Verify all services show "healthy" or "running"
# If no docker-compose file exists, start minimum required services manually:
docker run -d --name stella-postgres -e POSTGRES_PASSWORD=stella -e POSTGRES_DB=stellaops -p 5432:5432 postgres:16-alpine
docker run -d --name stella-redis -p 6379:6379 redis:7-alpine
docker run -d --name stella-rabbitmq -p 5672:5672 -p 15672:15672 rabbitmq:3-management-alpine
```
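The "wait for services to be healthy" step above can be sketched as a generic readiness poll; the probe commands in the usage comment are assumptions about the local setup:

```shell
#!/usr/bin/env bash
# Sketch: poll a readiness probe until it succeeds or a timeout elapses.
wait_until_ready() {
  local timeout="$1"; shift
  local deadline=$(( $(date +%s) + timeout ))
  until "$@" >/dev/null 2>&1; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      echo "timed out waiting for: $*" >&2
      return 1
    fi
    sleep 2
  done
}

# Assumed probes for the services started above:
#   wait_until_ready 60 docker exec stella-postgres pg_isready -U postgres
#   wait_until_ready 60 curl -sf http://localhost:15672
wait_until_ready 5 true && echo "ready"
```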
### 9.2 Backend (.NET)
```bash
# Verify .NET SDK is available
dotnet --version # Expected: 10.0.x
# Restore and build the solution
dotnet restore src/StellaOps.sln
dotnet build src/StellaOps.sln
```
### 9.3 Frontend (Angular)
```bash
# Verify Node.js and Angular CLI
node --version # Expected: 22.x
npx ng version # Expected: 21.x
# Install dependencies and build
cd src/Web/StellaOps.Web && npm ci && npx ng build
```
### 9.4 Playwright (Tier 2c UI testing)
```bash
npx playwright install chromium
```
### 9.5 Application Runtime (Tier 2 - ALL behavioral checks)
The application MUST be running for Tier 2 checks. This is not optional.
```bash
# Option A: Docker Compose (preferred - starts everything)
docker compose -f devops/compose/docker-compose.dev.yml up -d
# Option B: Run services individually
# Backend API:
dotnet run --project src/Gateway/StellaOps.Gateway.WebService/StellaOps.Gateway.WebService.csproj &
# Frontend:
cd src/Web/StellaOps.Web && npx ng serve &
# Verify services are reachable
curl -s http://localhost:5000/health || echo "Backend not reachable"
curl -s http://localhost:4200 || echo "Frontend not reachable"
```
### 9.6 Environment Teardown (after all checks complete)
```bash
# Stop and clean up all containers
docker compose -f devops/compose/docker-compose.dev.yml down --volumes --remove-orphans 2>/dev/null || true
docker rm -f stella-postgres stella-redis stella-rabbitmq 2>/dev/null || true
```
---