Add reference architecture and testing strategy documentation

- Created a new document for the Stella Ops Reference Architecture outlining the system's topology, trust boundaries, artifact association, and interfaces.
- Developed a comprehensive Testing Strategy document detailing the importance of offline readiness, interoperability, determinism, and operational guardrails.
- Introduced a README for the Testing Strategy, summarizing process details and the key concepts implemented.
- Added guidance for AI agents and developers in the tests directory, including directory structure, test categories, key patterns, and rules for test development.
2025-12-22 07:59:15 +02:00
parent 5d398ec442
commit 53503cb407
96 changed files with 37565 additions and 71 deletions

# Testing and Quality Guardrails Technical Reference
**Source Advisories**:
- 29-Nov-2025 - Acceptance Tests Pack and Guardrails
- 29-Nov-2025 - SCA Failure Catalogue for StellaOps Tests
- 30-Nov-2025 - Ecosystem Reality Test Cases for StellaOps
- 14-Dec-2025 - Create a small groundtruth corpus
**Last Updated**: 2025-12-14
---
## 1. ACCEPTANCE TEST PACK SCHEMA
### 1.1 Required Artifacts (MVP for DONE)
- Advisory summary under `docs/process/`
- Checklist stub referencing AT1–AT10
- Fixture pack path: `tests/acceptance/packs/guardrails/` (no network)
- Links into sprint tracker (`SPRINT_0300_0001_0001_documentation_process.md`)
### 1.2 Determinism & Offline
- Freeze scanner/db versions; record in `inputs.lock`
- All fixtures reproducible from seeds
- Include DSSE envelopes for pack manifests
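The lockfile format is not fixed anywhere in this document; as one minimal sketch (the field names below are illustrative assumptions, not a defined schema), an `inputs.lock` might pin:

```yaml
# Hypothetical inputs.lock sketch; the real schema may differ
scanner:
  name: stellaops-scanner      # illustrative tool name
  version: "1.4.2"             # frozen for the pack
  image_digest: "sha256:..."   # pinned digest placeholder
feeds:
  nvd_snapshot: "2025-11-29T00:00:00Z"
  osv_snapshot: "2025-11-29T00:00:00Z"
seeds:
  rng: 0x5EED2025              # matches Section 11.2
```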
## 2. SCA FAILURE CATALOGUE (FC1-FC10)
### 2.1 Required Artifacts
- Catalogue plus fixture pack root: `tests/fixtures/sca/catalogue/`
- Sprint Execution Log entry when published
### 2.2 Fixture Requirements
- Pin scanner versions and feeds
- Include `inputs.lock` and DSSE manifest per case
- Normalize results (ordering, casing) for stable comparisons
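The normalization rule above can be sketched as a small helper (illustrative Python; the field names `cve`, `package`, `version` are assumptions, not the harness's real schema):

```python
import json

def normalize_findings(findings):
    """Normalize scanner output for stable comparison (illustrative sketch)."""
    normalized = []
    for f in findings:
        normalized.append({
            "cve": f["cve"].upper(),          # CVE IDs compared case-insensitively
            "package": f["package"].lower(),  # package names lowercased
            "version": f["version"],
        })
    # Stable ordering: sort by (package, cve) so diffs are deterministic
    normalized.sort(key=lambda f: (f["package"], f["cve"]))
    return normalized

raw = [
    {"cve": "cve-2022-0001", "package": "OpenSSL", "version": "1.1.1"},
    {"cve": "CVE-2021-9999", "package": "zlib", "version": "1.2.11"},
]
print(json.dumps(normalize_findings(raw), sort_keys=True))
```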
## 3. ECOSYSTEM REALITY TEST CASES (ET1-ET10)
**Fixture Path**: `tests/fixtures/sca/catalogue/`
**Requirements**:
- Map each incident to acceptance tests and fixture paths
- Pin tool versions and feeds; no live network
- Populate fixtures and acceptance specs
## 4. GROUND-TRUTH CORPUS SCHEMA
### 4.1 Service Structure
Each service under `/toys/svc-XX-<name>/`:
```
app/
infra/ # Dockerfile, compose, network policy
tests/ # positive + negative reachability tests
labels.yaml # ground truth
evidence/ # generated by tests (trace, tags, manifests)
fix/ # minimal patch proving remediation
```
### 4.2 labels.yaml Schema
```yaml
service: svc-01-password-reset
vulns:
  - id: V1
    cve: CVE-2022-XXXXX
    type: dep_runtime|dep_build|code|config|os_pkg|supply_chain
    package: string
    version: string
    reachable: true|false
    reachability_level: R0|R1|R2|R3|R4
    entrypoint: string        # route:/reset, topic:jobs, cli:command
    preconditions: [string]   # flags/env/auth
    path_tags: [string]
    proof:
      artifacts: [string]
      tags: [string]
    fix:
      type: upgrade|config|code
      patch_path: string
      expected_delta: string
    negative_proof: string    # if unreachable
```
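A few of these schema rules can be spot-checked with a short validator (illustrative Python sketch; the authoritative gate is `schemas/labels.schema.json`, referenced in Section 7):

```python
VALID_TIERS = {"R0", "R1", "R2", "R3", "R4"}
VALID_TYPES = {"dep_runtime", "dep_build", "code", "config", "os_pkg", "supply_chain"}

def validate_vuln(vuln):
    """Return a list of problems for one labels.yaml vuln entry.
    Illustrative subset of the checks a JSON Schema validator would run."""
    problems = []
    for key in ("id", "cve", "type", "reachable", "reachability_level"):
        if key not in vuln:
            problems.append(f"missing required key: {key}")
    if vuln.get("type") not in VALID_TYPES:
        problems.append(f"unknown type: {vuln.get('type')}")
    if vuln.get("reachability_level") not in VALID_TIERS:
        problems.append(f"unknown tier: {vuln.get('reachability_level')}")
    # Unreachable claims must carry a negative proof (Section 8)
    if vuln.get("reachable") is False and not vuln.get("negative_proof"):
        problems.append("unreachable vuln lacks negative_proof")
    return problems

entry = {"id": "V1", "cve": "CVE-2022-0001", "type": "dep_runtime",
         "reachable": True, "reachability_level": "R2"}
print(validate_vuln(entry))  # []
```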
### 4.3 Reachability Tiers
- **R0 Present**: component exists in SBOM, not imported/loaded
- **R1 Loaded**: imported/linked/initialized, no executed path
- **R2 Executed**: vulnerable function executed (deterministic trace)
- **R3 Tainted execution**: execution with externally influenced input
- **R4 Exploitable**: controlled, non-harmful PoC (optional)
### 4.4 Evidence Requirements per Tier
- **R0**: SBOM + file hash/package metadata
- **R1**: runtime startup logs or module load trace tag
- **R2**: callsite tag + stack trace snippet
- **R3**: R2 + taint marker showing external data reached call
- **R4**: only if safe/necessary; non-weaponized, sandboxed
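Treating the tiers as cumulative, the per-tier minimums can be encoded as a lookup table (a sketch; the evidence-kind names are illustrative assumptions, not harness identifiers):

```python
# Minimum evidence kinds per tier, paraphrasing Section 4.4 (R4 omitted: optional)
REQUIRED_EVIDENCE = {
    "R0": {"sbom"},
    "R1": {"sbom", "load_trace"},
    "R2": {"sbom", "load_trace", "callsite_tag", "stack_trace"},
    "R3": {"sbom", "load_trace", "callsite_tag", "stack_trace", "taint_marker"},
}

def missing_evidence(tier, provided):
    """Return the evidence kinds a claim at `tier` still lacks."""
    return sorted(REQUIRED_EVIDENCE[tier] - set(provided))

print(missing_evidence("R2", ["sbom", "load_trace"]))  # ['callsite_tag', 'stack_trace']
```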
### 4.5 Canonical Tag Format
```
TAG:route:<method> <path>
TAG:topic:<name>
TAG:call:<sink>
TAG:taint:<boundary>
TAG:flag:<name>=<value>
```
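The tag grammar above reduces to a one-line regex (illustrative Python sketch, not the harness's real parser):

```python
import re

# Kinds mirror the canonical TAG format: route, topic, call, taint, flag
TAG_RE = re.compile(r"^TAG:(?P<kind>route|topic|call|taint|flag):(?P<value>.+)$")

def parse_tag(line):
    """Split one canonical tag line into (kind, value)."""
    m = TAG_RE.match(line.strip())
    if not m:
        raise ValueError(f"not a canonical tag: {line!r}")
    return m.group("kind"), m.group("value")

print(parse_tag("TAG:route:POST /reset"))  # ('route', 'POST /reset')
print(parse_tag("TAG:call:Crypto.MD5"))    # ('call', 'Crypto.MD5')
```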
### 4.6 Evidence Artifact Schema
**evidence/trace.json**:
```json
{
  "ts": "UTC ISO-8601",
  "corr": "correlation-id",
  "tags": ["TAG:route:POST /reset", "TAG:taint:http.body.email", "TAG:call:Crypto.MD5"]
}
```
### 4.7 Evidence Manifest
**evidence/manifest.json**:
```json
{
  "git_sha": "string",
  "image_digest": "string",
  "tool_versions": {"scanner": "string", "db": "string"},
  "timestamps": {"started_at": "UTC ISO-8601", "completed_at": "UTC ISO-8601"},
  "evidence_hashes": {"trace.json": "sha256:...", "tags.log": "sha256:..."}
}
```
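The `evidence_hashes` map can be produced mechanically (illustrative Python sketch; the real harness would hash the files under `evidence/` rather than in-memory bytes):

```python
import hashlib
import json

def build_evidence_hashes(files):
    """Build the evidence_hashes block of evidence/manifest.json.
    `files` maps artifact name -> raw bytes (illustrative shape)."""
    return {
        name: "sha256:" + hashlib.sha256(data).hexdigest()
        for name, data in files.items()
    }

hashes = build_evidence_hashes({
    "trace.json": b'{"ts": "2025-12-04T12:00:00Z"}',
    "tags.log": b"TAG:route:POST /reset\n",
})
print(json.dumps(hashes, sort_keys=True, indent=2))
```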
## 5. CORE TEST METRICS
| Metric | Definition |
|--------|------------|
| Recall (by class) | % of labeled vulns detected (runtime deps, OS pkgs, code, config) |
| Precision | % of reported findings matching a labeled vuln (1 - false discovery rate) |
| Reachability accuracy | % correct R0/R1/R2/R3 classifications |
| Overreach | Predicted reachable but labeled R0/R1 |
| Underreach | Labeled R2/R3 but predicted non-reachable |
| TTFS | Time-to-first-signal (first evidence-backed blocking issue) |
| Fix validation | % of applied fixes producing expected delta |
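The reachability-related metrics can be computed from labels and predictions roughly as follows (illustrative Python; the `labeled`/`predicted` shapes and the "R2+ counts as truly reachable" convention are assumptions for this sketch):

```python
def score_run(labeled, predicted):
    """Score one corpus run against ground truth (sketch of the Section 5 definitions).
    `labeled` maps vuln id -> labeled tier (R0..R4); `predicted` maps vuln id ->
    True if the scanner claims the vuln is reachable."""
    truly_reachable = {v for v, tier in labeled.items() if tier in ("R2", "R3", "R4")}
    claimed_reachable = {v for v, claim in predicted.items() if claim}

    detected = set(predicted) & set(labeled)
    recall = len(detected) / len(labeled)                 # % of labeled vulns detected
    overreach = len(claimed_reachable - truly_reachable)  # claimed reachable, labeled R0/R1
    underreach = len(truly_reachable - claimed_reachable) # labeled R2/R3 but called unreachable
    return recall, overreach, underreach

labeled = {"V1": "R2", "V2": "R0", "V3": "R3"}
predicted = {"V1": True, "V2": True}  # V3 missed entirely
print(score_run(labeled, predicted))
```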
## 6. TEST QUALITY GATES (CI ENFORCEMENT THRESHOLDS)
```yaml
thresholds:
  runtime_dependency_recall: ">= 0.95"
  unreachable_false_positives: "<= 0.05"
  reachability_underreport: "<= 0.10"
  ttfs_regression: "<= +10% vs main"
  fix_validation_pass_rate: "100%"
```
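A CI gate over these thresholds reduces to a comparator table (illustrative Python sketch; `ttfs_regression` is omitted here because it is relative to `main` rather than an absolute bound):

```python
# Comparator table mirroring the YAML thresholds (absolute metrics only)
THRESHOLDS = {
    "runtime_dependency_recall": (">=", 0.95),
    "unreachable_false_positives": ("<=", 0.05),
    "reachability_underreport": ("<=", 0.10),
    "fix_validation_pass_rate": (">=", 1.00),
}

def gate(measured):
    """Return human-readable failures for any metric violating its threshold."""
    failures = []
    for metric, (op, bound) in THRESHOLDS.items():
        value = measured[metric]
        ok = value >= bound if op == ">=" else value <= bound
        if not ok:
            failures.append(f"{metric}: {value} violates {op} {bound}")
    return failures

run = {"runtime_dependency_recall": 0.97, "unreachable_false_positives": 0.02,
       "reachability_underreport": 0.12, "fix_validation_pass_rate": 1.0}
print(gate(run))  # one failure: reachability_underreport
```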
## 7. SERVICE DEFINITION OF DONE
A service PR is DONE only if it includes:
- [ ] `labels.yaml` validated by `schemas/labels.schema.json`
- [ ] Docker build reproducible (digest pinned, lockfiles committed)
- [ ] Positive tests generating evidence proving reachability tiers
- [ ] Negative tests proving "unreachable" claims
- [ ] `fix/` patch removing/mitigating weakness with measurable delta
- [ ] `evidence/manifest.json` capturing tool versions, git sha, image digest, timestamps, evidence hashes
## 8. REVIEWER REJECTION CRITERIA
Reject PR if any fail:
- [ ] Labels complete, schema-valid, stable IDs preserved
- [ ] Proof artifacts deterministic and generated by tests
- [ ] Reachability tier justified and matches evidence
- [ ] Unreachable claims have negative proofs
- [ ] Docker build uses pinned digests + committed lockfiles
- [ ] `fix/` produces measurable delta without new unlabeled issues
- [ ] No network egress required; tests hermetic
## 9. TEST HARNESS PATTERNS
### 9.1 xUnit Test Template
```csharp
public class ReachabilityAcceptanceTests : IClassFixture<PostgresFixture>
{
    private readonly PostgresFixture _db;
    private readonly IScanner _scanner; // scanner under test; harness wiring omitted in this template

    public ReachabilityAcceptanceTests(PostgresFixture db)
    {
        _db = db;
    }

    [Theory]
    [InlineData("svc-01-password-reset", "V1", ReachabilityLevel.R2)]
    [InlineData("svc-02-file-upload", "V1", ReachabilityLevel.R0)]
    public async Task VerifyReachabilityClassification(
        string serviceId,
        string vulnId,
        ReachabilityLevel expectedLevel)
    {
        // Arrange
        var labels = await LoadLabels($"toys/{serviceId}/labels.yaml");
        var expectedVuln = labels.Vulns.First(v => v.Id == vulnId);

        // Act
        var result = await _scanner.ScanAsync(serviceId);
        var actualVuln = result.Findings.First(f => f.VulnId == vulnId);

        // Assert
        Assert.Equal(expectedLevel, actualVuln.ReachabilityLevel);
        Assert.NotEmpty(actualVuln.Evidence);
    }
}
```
### 9.2 Testcontainers Pattern
```csharp
public class PostgresFixture : IAsyncLifetime
{
    private PostgreSqlContainer? _container;

    public string ConnectionString { get; private set; } = null!;

    public async Task InitializeAsync()
    {
        _container = new PostgreSqlBuilder()
            .WithImage("postgres:16-alpine")
            .WithDatabase("stellaops_test")
            .WithUsername("test")
            .WithPassword("test")
            .Build();
        await _container.StartAsync();
        ConnectionString = _container.GetConnectionString();

        // Run migrations
        await RunMigrations(ConnectionString);
    }

    public async Task DisposeAsync()
    {
        if (_container != null)
            await _container.DisposeAsync();
    }
}
```
## 10. FIXTURE ORGANIZATION
```
tests/
  fixtures/
    sca/
      catalogue/
        FC001_openssl_version_range/
          inputs.lock
          sbom.cdx.json
          expected_findings.json
          dsse_manifest.json
  acceptance/
    packs/
      guardrails/
        AT001_reachability_present/
        AT002_reachability_loaded/
        AT003_reachability_executed/
      micro/
      motion/
      error/
      offline/
toys/
  svc-01-password-reset/
    app/
    infra/
    tests/
    labels.yaml
    evidence/
    fix/
```
## 11. DETERMINISTIC TEST REQUIREMENTS
### 11.1 Time Handling
- Freeze timers to `2025-12-04T12:00:00Z` in stories/e2e
- Use `FakeTimeProvider` in .NET tests
- Playwright: fake timers via the `page.clock` API
### 11.2 Random Number Generation
- Seed RNG with `0x5EED2025` unless scenario-specific
- Never use `Random()` without explicit seed
### 11.3 Network Isolation
- No network calls in test execution
- Offline assets bundled
- Testcontainers for external dependencies
- Mock external APIs
### 11.4 Snapshot Testing
- All fixtures stored under `tests/fixtures/`
- Golden outputs checked into git
- Stable ordering for arrays/objects
- Strip volatile fields (timestamps, UUIDs) unless semantic
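The redaction and ordering rules above can be sketched as a recursive pass over the snapshot object (illustrative Python; the volatile-key list is an assumption):

```python
import json

# Illustrative volatile-field list; real snapshots may pin a different set
VOLATILE_KEYS = {"ts", "timestamp", "uuid", "correlation_id"}

def redact(obj):
    """Strip volatile fields and sort map keys so snapshots diff deterministically."""
    if isinstance(obj, dict):
        return {k: redact(v) for k, v in sorted(obj.items()) if k not in VOLATILE_KEYS}
    if isinstance(obj, list):
        return [redact(v) for v in obj]
    return obj

snap = redact({
    "ts": "2025-12-22T07:59:15Z",
    "findings": [{"cve": "CVE-2021-1", "uuid": "deadbeef"}],
})
print(json.dumps(snap, sort_keys=True))
```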
## 12. COVERAGE REQUIREMENTS
### 12.1 Unit Tests
- **Target**: ≥85% line coverage for core modules
- **Critical paths**: 100% coverage required
- **Exceptions**: UI glue code, generated code
### 12.2 Integration Tests
- **Database operations**: All repositories tested with Testcontainers
- **API endpoints**: All endpoints tested with WebApplicationFactory
- **External integrations**: Mocked or stubbed
### 12.3 End-to-End Tests
- **Critical workflows**: User registration → scan → triage → decision
- **Happy paths**: All major features
- **Error paths**: Authentication failures, network errors, data validation
## 13. PERFORMANCE TESTING
### 13.1 Benchmark Tests
```csharp
[MemoryDiagnoser]
public class ScannerBenchmarks
{
    // _scanner, _reachability, and _testGraph are initialized in a [GlobalSetup] method (omitted)

    [Benchmark]
    public async Task ScanMediumImage()
    {
        // 100k LOC .NET service
        await _scanner.ScanAsync("medium-service");
    }

    [Benchmark]
    public async Task ComputeReachability()
    {
        await _reachability.ComputeAsync(_testGraph);
    }
}
```
### 13.2 Performance Targets
| Operation | Target |
|-----------|--------|
| Medium service scan | < 2 minutes |
| Reachability compute | < 30 seconds |
| Finding query (GET) | < 200 ms p95 |
| SBOM ingestion | < 5 seconds |
## 14. MUTATION TESTING
### 14.1 Stryker Configuration
```json
{
  "stryker-config": {
    "mutate": [
      "src/**/*.cs",
      "!src/**/*.Designer.cs",
      "!src/**/Migrations/**"
    ],
    "test-runner": "dotnet",
    "threshold-high": 90,
    "threshold-low": 70,
    "threshold-break": 60
  }
}
```
### 14.2 Mutation Score Targets
- **Critical modules**: 90%
- **Standard modules**: 70%
- **Break build**: <60%
## 15. SECURITY TESTING
### 15.1 OWASP Top 10 Coverage
- [ ] SQL Injection
- [ ] XSS (Cross-Site Scripting)
- [ ] CSRF (Cross-Site Request Forgery)
- [ ] Authentication bypasses
- [ ] Authorization bypasses
- [ ] Sensitive data exposure
- [ ] XML External Entities (XXE)
- [ ] Broken Access Control
- [ ] Security Misconfiguration
- [ ] Insecure Deserialization
### 15.2 Dependency Scanning
```bash
# SBOM generation
dotnet sbom-tool generate -b ./bin -bc ./src -pn StellaOps -pv 1.0.0
# Vulnerability scanning
dotnet list package --vulnerable --include-transitive
```
## 16. CI/CD INTEGRATION
### 16.1 GitHub Actions Workflow
```yaml
name: Test
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-dotnet@v4
        with:
          dotnet-version: '10.0.x'
      - name: Restore dependencies
        run: dotnet restore
      - name: Build
        run: dotnet build --no-restore
      - name: Test
        run: dotnet test --no-build --verbosity normal --collect:"XPlat Code Coverage"
      - name: Upload coverage
        uses: codecov/codecov-action@v4
```
### 16.2 Quality Gates
- All tests pass
- Coverage ≥ 85%
- No high/critical vulnerabilities
- Mutation score ≥ 70%
- Performance regressions <10%
## 17. BENCH HARNESSES (SIGNED, REPRODUCIBLE METRICS)
Use the repo bench harness for moat-grade, reproducible comparisons and audit kits:
- Harness root: `bench/README.md`
- Signed finding bundles + verifiers live under `bench/findings/` and `bench/tools/`
- Baseline comparisons and rollups live under `bench/results/`
Guardrail:
- Any change to scanning/policy/proof logic must be covered by at least one deterministic bench scenario (or an extension of an existing one).
---
**Document Version**: 1.0
**Target Platform**: .NET 10, PostgreSQL 16, Angular v17