Testing and Quality Guardrails Technical Reference

Source Advisories:

  • 29-Nov-2025 - Acceptance Tests Pack and Guardrails
  • 29-Nov-2025 - SCA Failure Catalogue for StellaOps Tests
  • 30-Nov-2025 - Ecosystem Reality Test Cases for StellaOps
  • 14-Dec-2025 - Create a small groundtruth corpus

Last Updated: 2025-12-14


1. ACCEPTANCE TEST PACK SCHEMA

1.1 Required Artifacts (MVP for DONE)

  • Advisory summary under docs/process/
  • Checklist stub referencing AT1-AT10
  • Fixture pack path: tests/acceptance/packs/guardrails/ (no network)
  • Links into sprint tracker (SPRINT_0300_0001_0001_documentation_process.md)

1.2 Determinism & Offline

  • Freeze scanner/db versions; record in inputs.lock
  • All fixtures reproducible from seeds
  • Include DSSE envelopes for pack manifests

2. SCA FAILURE CATALOGUE (FC1-FC10)

2.1 Required Artifacts

  • Catalogue plus fixture pack root: tests/fixtures/sca/catalogue/
  • Sprint Execution Log entry when published

2.2 Fixture Requirements

  • Pin scanner versions and feeds
  • Include inputs.lock and DSSE manifest per case
  • Normalize results (ordering, casing) for stable comparisons
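
The normalization step can be sketched as follows. This is a minimal illustration, not the project's implementation: the field names `package`, `version`, and `vuln_id` are assumptions, and which fields are case-insensitive would need to be confirmed against the real findings format.

```python
import json

def normalize_findings(findings):
    """Return findings with normalized casing in a stable order.

    Assumed (hypothetical) finding shape: package, version, vuln_id.
    """
    normalized = [
        {
            "package": f["package"].strip().lower(),
            "version": f["version"].strip(),
            "vuln_id": f["vuln_id"].strip().upper(),
        }
        for f in findings
    ]
    # Sort on every field so input order never affects the comparison.
    return sorted(
        normalized,
        key=lambda f: (f["package"], f["version"], f["vuln_id"]),
    )

def canonical_json(findings):
    """Serialize normalized findings to one byte-stable string."""
    return json.dumps(
        normalize_findings(findings), sort_keys=True, separators=(",", ":")
    )
```

Comparing `canonical_json(actual)` with `canonical_json(expected)` keeps a test from failing only because a scanner emitted the same findings in a different order or casing.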

3. ECOSYSTEM REALITY TEST CASES (ET1-ET10)

Fixture Path: tests/fixtures/sca/catalogue/

Requirements:

  • Map each incident to acceptance tests and fixture paths
  • Pin tool versions and feeds; no live network
  • Populate fixtures and acceptance specs

4. GROUND-TRUTH CORPUS SCHEMA

4.1 Service Structure

Each service lives under /toys/svc-XX-<name>/ with the following layout:

app/
infra/          # Dockerfile, compose, network policy
tests/          # positive + negative reachability tests
labels.yaml     # ground truth
evidence/       # generated by tests (trace, tags, manifests)
fix/            # minimal patch proving remediation

4.2 labels.yaml Schema

service: svc-01-password-reset
vulns:
  - id: V1
    cve: CVE-2022-XXXXX
    type: dep_runtime|dep_build|code|config|os_pkg|supply_chain
    package: string
    version: string
    reachable: true|false
    reachability_level: R0|R1|R2|R3|R4
    entrypoint: string  # route:/reset, topic:jobs, cli:command
    preconditions: [string]  # flags/env/auth
    path_tags: [string]
    proof:
      artifacts: [string]
      tags: [string]
    fix:
      type: upgrade|config|code
      patch_path: string
      expected_delta: string
    negative_proof: string  # if unreachable
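
A few of the rules implied by this schema can be checked with a short script. The sketch below operates on one already-parsed entry from `vulns` (a plain dict) and is not a substitute for schemas/labels.schema.json; treating `fix.type` as mandatory is an assumption drawn from the Definition of Done, and the function name is hypothetical.

```python
VULN_TYPES = {"dep_runtime", "dep_build", "code", "config", "os_pkg", "supply_chain"}
REACHABILITY_LEVELS = {"R0", "R1", "R2", "R3", "R4"}
FIX_TYPES = {"upgrade", "config", "code"}

def validate_vuln(vuln: dict) -> list[str]:
    """Return human-readable problems for one entry under `vulns`."""
    vid = vuln.get("id", "<missing id>")
    problems = []
    if vuln.get("type") not in VULN_TYPES:
        problems.append(f"{vid}: type must be one of {sorted(VULN_TYPES)}")
    if vuln.get("reachability_level") not in REACHABILITY_LEVELS:
        problems.append(f"{vid}: reachability_level must be R0..R4")
    reachable = vuln.get("reachable")
    if not isinstance(reachable, bool):
        problems.append(f"{vid}: reachable must be true or false")
    elif not reachable and not vuln.get("negative_proof"):
        # The schema marks negative_proof as required for unreachable claims.
        problems.append(f"{vid}: unreachable claim needs negative_proof")
    fix = vuln.get("fix") or {}
    if fix.get("type") not in FIX_TYPES:
        # Assumption: every labeled vuln carries a fix entry.
        problems.append(f"{vid}: fix.type must be one of {sorted(FIX_TYPES)}")
    return problems
```

An empty list means the entry passed these checks; anything else can be printed as a review comment.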

4.3 Reachability Tiers

  • R0 Present: component exists in SBOM, not imported/loaded
  • R1 Loaded: imported/linked/initialized, no executed path
  • R2 Executed: vulnerable function executed (deterministic trace)
  • R3 Tainted execution: execution with externally influenced input
  • R4 Exploitable: controlled, non-harmful PoC (optional)

4.4 Evidence Requirements per Tier

  • R0: SBOM + file hash/package metadata
  • R1: runtime startup logs or module load trace tag
  • R2: callsite tag + stack trace snippet
  • R3: R2 + taint marker showing external data reached call
  • R4: only if safe/necessary; non-weaponized, sandboxed
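
For the tiers whose evidence is expressed as trace tags, the requirement can be checked mechanically. The sketch below covers only R2 (a call tag) and R3 (call plus taint tags); R0, R1, and R4 evidence lives in other artifacts and is deliberately out of scope here. The mapping table and function name are illustrative assumptions.

```python
# Tag kinds a trace must contain for each tier (assumed mapping, R2/R3 only).
REQUIRED_TAG_KINDS = {
    "R2": ("call",),
    "R3": ("call", "taint"),
}

def missing_tag_kinds(level, tags):
    """Return the tag kinds required for `level` that `tags` does not contain."""
    present = set()
    for tag in tags:
        parts = tag.split(":", 2)  # "TAG:<kind>:<value>"
        if len(parts) == 3 and parts[0] == "TAG":
            present.add(parts[1])
    return [kind for kind in REQUIRED_TAG_KINDS.get(level, ()) if kind not in present]
```

A reviewer could run this against the `tags` array of a trace to confirm that a claimed R3 label is actually backed by a taint marker.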

4.5 Canonical Tag Format

TAG:route:<method> <path>
TAG:topic:<name>
TAG:call:<sink>
TAG:taint:<boundary>
TAG:flag:<name>=<value>
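
A parser for this format might look like the sketch below. The strictness rules (a route value is `<method> <path>` with a path starting at `/`, a flag value contains `=`) are inferred from the templates above and should be read as assumptions; `Tag` and `parse_tag` are hypothetical names.

```python
import re
from typing import NamedTuple

class Tag(NamedTuple):
    kind: str
    value: str

_TAG_RE = re.compile(r"^TAG:(route|topic|call|taint|flag):(.+)$")

def parse_tag(raw: str) -> Tag:
    """Parse one canonical tag string, raising ValueError if malformed."""
    match = _TAG_RE.match(raw)
    if match is None:
        raise ValueError(f"not a canonical tag: {raw!r}")
    kind, value = match.groups()
    if kind == "route":
        # Assumed shape: "<method> <path>", e.g. "POST /reset".
        method, sep, path = value.partition(" ")
        if not sep or not method or not path.startswith("/"):
            raise ValueError(f"route tag needs '<method> <path>': {raw!r}")
    if kind == "flag":
        name, sep, _ = value.partition("=")
        if not sep or not name:
            raise ValueError(f"flag tag needs '<name>=<value>': {raw!r}")
    return Tag(kind, value)
```

Rejecting malformed tags at parse time keeps evidence comparisons from silently passing on typos such as `TAG:rout:`.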

4.6 Evidence Artifact Schema

evidence/trace.json:

{
  "ts": "UTC ISO-8601",
  "corr": "correlation-id",
  "tags": ["TAG:route:POST /reset", "TAG:taint:http.body.email", "TAG:call:Crypto.MD5"]
}

4.7 Evidence Manifest

evidence/manifest.json:

{
  "git_sha": "string",
  "image_digest": "string",
  "tool_versions": {"scanner": "string", "db": "string"},
  "timestamps": {"started_at": "UTC ISO-8601", "completed_at": "UTC ISO-8601"},
  "evidence_hashes": {"trace.json": "sha256:...", "tags.log": "sha256:..."}
}
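
The `evidence_hashes` field can be produced by hashing every file in the evidence directory. The following is a sketch under two assumptions: the manifest itself is excluded from the map, and digests use the `sha256:<hex>` form shown above. The function names are hypothetical.

```python
import hashlib
from pathlib import Path

def sha256_digest(path):
    """Hash a file in chunks and return it as 'sha256:<hex>'."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return "sha256:" + digest.hexdigest()

def evidence_hashes(evidence_dir):
    """Map each evidence file name to its digest, in sorted name order."""
    root = Path(evidence_dir)
    return {
        p.name: sha256_digest(p)
        for p in sorted(root.iterdir())
        # Assumption: the manifest does not hash itself.
        if p.is_file() and p.name != "manifest.json"
    }
```

Iterating in sorted order keeps the resulting JSON stable across file systems that list directory entries differently.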

5. CORE TEST METRICS

| Metric | Definition |
|---|---|
| Recall (by class) | % of labeled vulns detected, reported per class (runtime deps, OS pkgs, code, config) |
| Precision | TP / (TP + FP): share of reported findings that match a labeled vuln (1 - false discovery rate) |
| Reachability accuracy | % correct R0/R1/R2/R3 classifications |
| Overreach | Predicted reachable but labeled R0/R1 |
| Underreach | Labeled R2/R3 but predicted non-reachable |
| TTFS | Time-to-first-signal (first evidence-backed blocking issue) |
| Fix validation | % of applied fixes producing expected delta |
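
As a worked illustration, several of these metrics can be computed from two dictionaries. The input shapes are assumptions: `labels` maps a vuln ID to its labeled tier, and `findings` maps each reported vuln ID to whether the scanner predicted it reachable. Precision is computed here as the share of reported findings that match a label.

```python
def core_metrics(labels, findings):
    """Compute recall, precision, overreach and underreach.

    labels:   {vuln_id: "R0".."R4"}   ground truth (assumed shape)
    findings: {vuln_id: bool}         predicted reachable (assumed shape)
    """
    detected = [v for v in findings if v in labels]
    recall = len(detected) / len(labels) if labels else 1.0
    precision = len(detected) / len(findings) if findings else 1.0
    # Overreach: predicted reachable, but the label says R0 or R1.
    overreach = sum(1 for v in detected if findings[v] and labels[v] in {"R0", "R1"})
    # Underreach: labeled R2 or R3, but predicted non-reachable.
    underreach = sum(1 for v in detected if not findings[v] and labels[v] in {"R2", "R3"})
    return {
        "recall": recall,
        "precision": precision,
        "overreach": overreach,
        "underreach": underreach,
    }
```

With four labeled vulns of which three are detected, plus one unlabeled finding, both recall and precision come out at 0.75; per-class recall would apply the same function to each class's subset of labels.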

6. TEST QUALITY GATES (CI ENFORCEMENT THRESHOLDS)

thresholds:
  runtime_dependency_recall: ">= 0.95"
  unreachable_false_positives: "<= 0.05"
  reachability_underreport: "<= 0.10"
  ttfs_regression: "<= +10% vs main"
  fix_validation_pass_rate: "100%"
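
A CI step enforcing these thresholds could be as small as the sketch below. It assumes all values have been converted to fractions beforehand (so `+10%` becomes `0.10` and `100%` becomes `1.00`) and that a missing measurement counts as a failure; both choices are assumptions, as is the function name.

```python
import operator

_OPS = {">=": operator.ge, "<=": operator.le}

# Thresholds from this section, expressed as (comparison, limit) pairs.
THRESHOLDS = {
    "runtime_dependency_recall": (">=", 0.95),
    "unreachable_false_positives": ("<=", 0.05),
    "reachability_underreport": ("<=", 0.10),
    "ttfs_regression": ("<=", 0.10),  # relative slowdown vs main
    "fix_validation_pass_rate": (">=", 1.00),
}

def failed_gates(measured):
    """Return the names of gates that are missing or out of bounds."""
    failures = []
    for name, (op, limit) in THRESHOLDS.items():
        value = measured.get(name)
        if value is None or not _OPS[op](value, limit):
            failures.append(name)
    return failures
```

A pipeline would exit non-zero whenever `failed_gates(...)` returns a non-empty list, printing the names so the failing gate is obvious in the log.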

7. SERVICE DEFINITION OF DONE

A service PR is DONE only if it includes:

  • labels.yaml validated by schemas/labels.schema.json
  • Docker build reproducible (digest pinned, lockfiles committed)
  • Positive tests generating evidence proving reachability tiers
  • Negative tests proving "unreachable" claims
  • fix/ patch removing/mitigating weakness with measurable delta
  • evidence/manifest.json capturing tool versions, git sha, image digest, timestamps, evidence hashes

8. REVIEWER REJECTION CRITERIA

Reject the PR if any of the following checks fails:

  • Labels complete, schema-valid, stable IDs preserved
  • Proof artifacts deterministic and generated by tests
  • Reachability tier justified and matches evidence
  • Unreachable claims have negative proofs
  • Docker build uses pinned digests + committed lockfiles
  • fix/ produces measurable delta without new unlabeled issues
  • No network egress required; tests hermetic

9. TEST HARNESS PATTERNS

9.1 xUnit Test Template

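using System.Linq;
using System.Threading.Tasks;
using Xunit;

public class ReachabilityAcceptanceTests : IClassFixture<PostgresFixture>
{
    private readonly PostgresFixture _db;

    // Assumption: IScanner and ScannerFactory are placeholders for the
    // project's own scanner type and wiring.
    private readonly IScanner _scanner;

    public ReachabilityAcceptanceTests(PostgresFixture db)
    {
        _db = db;
        _scanner = ScannerFactory.Create(_db.ConnectionString);
    }

    [Theory]
    [InlineData("svc-01-password-reset", "V1", ReachabilityLevel.R2)]
    [InlineData("svc-02-file-upload", "V1", ReachabilityLevel.R0)]
    public async Task VerifyReachabilityClassification(
        string serviceId,
        string vulnId,
        ReachabilityLevel expectedLevel)
    {
        // Arrange: load the ground truth (LoadLabels is a project helper, not shown)
        var labels = await LoadLabels($"toys/{serviceId}/labels.yaml");
        var expectedVuln = labels.Vulns.First(v => v.Id == vulnId);

        // Guard against the inline data drifting from labels.yaml
        Assert.Equal(expectedLevel, expectedVuln.ReachabilityLevel);

        // Act
        var result = await _scanner.ScanAsync(serviceId);
        var actualVuln = result.Findings.First(f => f.VulnId == vulnId);

        // Assert: the tier matches and is backed by evidence
        Assert.Equal(expectedLevel, actualVuln.ReachabilityLevel);
        Assert.NotEmpty(actualVuln.Evidence);
    }
}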

9.2 Testcontainers Pattern

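using System.Threading.Tasks;
using Testcontainers.PostgreSql;
using Xunit;

// Shared fixture: starts one throwaway PostgreSQL container per test class.
public class PostgresFixture : IAsyncLifetime
{
    private PostgreSqlContainer? _container;
    public string ConnectionString { get; private set; } = null!;

    public async Task InitializeAsync()
    {
        _container = new PostgreSqlBuilder()
            .WithImage("postgres:16-alpine") // pin by digest in CI for reproducible runs
            .WithDatabase("stellaops_test")
            .WithUsername("test")
            .WithPassword("test")
            .Build();

        await _container.StartAsync();
        ConnectionString = _container.GetConnectionString();

        // Run migrations (RunMigrations is a project helper, not shown)
        await RunMigrations(ConnectionString);
    }

    public async Task DisposeAsync()
    {
        // Stop and remove the container so no state leaks between test classes
        if (_container != null)
            await _container.DisposeAsync();
    }
}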

10. FIXTURE ORGANIZATION

tests/
  fixtures/
    sca/
      catalogue/
        FC001_openssl_version_range/
          inputs.lock
          sbom.cdx.json
          expected_findings.json
          dsse_manifest.json
    acceptance/
      packs/
        guardrails/
          AT001_reachability_present/
          AT002_reachability_loaded/
          AT003_reachability_executed/
    micro/
      motion/
      error/
      offline/
  toys/
    svc-01-password-reset/
      app/
      infra/
      tests/
      labels.yaml
      evidence/
      fix/

11. DETERMINISTIC TEST REQUIREMENTS

11.1 Time Handling

  • Freeze timers to 2025-12-04T12:00:00Z in stories/e2e
  • Use FakeTimeProvider in .NET tests
  • Playwright: fix the clock through the page.clock API (for example page.clock.setFixedTime)

11.2 Random Number Generation

  • Seed RNG with 0x5EED2025 unless scenario-specific
  • Never use Random() without explicit seed
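
The seeding rule is illustrated below with Python's `random.Random`, purely as a language-neutral sketch; .NET tests would pass the same seed to their own generator. The helper names are hypothetical.

```python
import random

DEFAULT_SEED = 0x5EED2025  # default seed named in this section

def make_rng(seed=DEFAULT_SEED):
    """Return a private generator; never rely on the global, unseeded one."""
    return random.Random(seed)

def sample_fixture_ids(count, rng=None):
    """Draw `count` pseudo-random IDs, reproducibly for a given seed."""
    if rng is None:
        rng = make_rng()
    return [rng.randrange(10_000) for _ in range(count)]
```

Because each call builds its generator from the same seed, two runs of `sample_fixture_ids(5)` return identical lists, which is what makes the fixtures reproducible.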

11.3 Network Isolation

  • No network calls in test execution
  • Offline assets bundled
  • Testcontainers for external dependencies
  • Mock external APIs

11.4 Snapshot Testing

  • All fixtures stored under tests/fixtures/
  • Golden outputs checked into git
  • Stable ordering for arrays/objects
  • Strip volatile fields (timestamps, UUIDs) unless semantic
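
One way to produce stable golden outputs is to strip volatile keys recursively and serialize with sorted object keys. The key names in `VOLATILE_KEYS` are assumptions chosen to match the evidence examples in this document; array order is preserved, so producers remain responsible for emitting arrays in a stable order.

```python
import json

# Assumed set of volatile field names; adjust to the real payloads.
VOLATILE_KEYS = {"ts", "corr", "started_at", "completed_at", "uuid"}

def strip_volatile(value):
    """Recursively drop volatile keys from dicts nested in dicts and lists."""
    if isinstance(value, dict):
        return {
            key: strip_volatile(item)
            for key, item in value.items()
            if key not in VOLATILE_KEYS
        }
    if isinstance(value, list):
        return [strip_volatile(item) for item in value]
    return value

def snapshot(value):
    """Render a value as a golden-file string with stable key order."""
    return json.dumps(strip_volatile(value), sort_keys=True, indent=2) + "\n"
```

Two traces that differ only in timestamp, correlation ID, or key order then render to the same golden text.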

12. COVERAGE REQUIREMENTS

12.1 Unit Tests

  • Target: ≥85% line coverage for core modules
  • Critical paths: 100% coverage required
  • Exceptions: UI glue code, generated code

12.2 Integration Tests

  • Database operations: All repositories tested with Testcontainers
  • API endpoints: All endpoints tested with WebApplicationFactory
  • External integrations: Mocked or stubbed

12.3 End-to-End Tests

  • Critical workflows: User registration → scan → triage → decision
  • Happy paths: All major features
  • Error paths: Authentication failures, network errors, data validation

13. PERFORMANCE TESTING

13.1 Benchmark Tests

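using System.Threading.Tasks;
using BenchmarkDotNet.Attributes;

[MemoryDiagnoser]
public class ScannerBenchmarks
{
    // Assumption: IScanner, IReachabilityEngine, CallGraph and BenchmarkFixtures
    // are placeholders for the project's own types and setup helpers.
    private IScanner _scanner = null!;
    private IReachabilityEngine _reachability = null!;
    private CallGraph _testGraph = null!;

    [GlobalSetup]
    public void Setup()
    {
        _scanner = BenchmarkFixtures.CreateScanner();
        _reachability = BenchmarkFixtures.CreateReachabilityEngine();
        _testGraph = BenchmarkFixtures.LoadCallGraph("medium-service");
    }

    [Benchmark]
    public async Task ScanMediumImage()
    {
        // 100k LOC .NET service
        await _scanner.ScanAsync("medium-service");
    }

    [Benchmark]
    public async Task ComputeReachability()
    {
        await _reachability.ComputeAsync(_testGraph);
    }
}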

13.2 Performance Targets

| Operation | Target |
|---|---|
| Medium service scan | < 2 minutes |
| Reachability compute | < 30 seconds |
| Query GET finding | < 200 ms p95 |
| SBOM ingestion | < 5 seconds |
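
The p95 target can be checked from raw latency samples with a nearest-rank percentile. This sketch assumes samples are in milliseconds and uses the nearest-rank definition; other percentile definitions interpolate and may give slightly different values.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

def meets_latency_target(samples_ms, target_ms=200, pct=95):
    """True when the chosen percentile is strictly below the target."""
    return percentile(samples_ms, pct) < target_ms
```

With 20 samples, the 95th percentile is the 19th smallest value, so a single slow outlier does not fail the gate but two do.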

14. MUTATION TESTING

14.1 Stryker Configuration

{
  "stryker-config": {
    "mutate": [
      "src/**/*.cs",
      "!src/**/*.Designer.cs",
      "!src/**/Migrations/**"
    ],
    "test-runner": "dotnet",
    "thresholds": {
      "high": 90,
      "low": 70,
      "break": 60
    }
  }
}

14.2 Mutation Score Targets

  • Critical modules: ≥90%
  • Standard modules: ≥70%
  • Break build: <60%

15. SECURITY TESTING

15.1 OWASP Top 10 Coverage

  • SQL Injection
  • XSS (Cross-Site Scripting)
  • CSRF (Cross-Site Request Forgery)
  • Authentication bypasses
  • Authorization bypasses
  • Sensitive data exposure
  • XML External Entities (XXE)
  • Broken Access Control
  • Security Misconfiguration
  • Insecure Deserialization

15.2 Dependency Scanning

# SBOM generation
dotnet sbom-tool generate -b ./bin -bc ./src -pn StellaOps -pv 1.0.0

# Vulnerability scanning
dotnet list package --vulnerable --include-transitive

16. CI/CD INTEGRATION

16.1 GitHub Actions Workflow

name: Test

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-dotnet@v4
        with:
          dotnet-version: '10.0.x'
      - name: Restore dependencies
        run: dotnet restore
      - name: Build
        run: dotnet build --no-restore
      - name: Test
        run: dotnet test --no-build --verbosity normal --collect:"XPlat Code Coverage"
      - name: Upload coverage
        uses: codecov/codecov-action@v4

16.2 Quality Gates

  • All tests pass
  • Coverage ≥85%
  • No high/critical vulnerabilities
  • Mutation score ≥70%
  • Performance regressions <10%

Document Version: 1.0
Target Platform: .NET 10, PostgreSQL ≥16, Angular v17