Testing and Quality Guardrails Technical Reference
Source Advisories:
- 29-Nov-2025 - Acceptance Tests Pack and Guardrails
- 29-Nov-2025 - SCA Failure Catalogue for StellaOps Tests
- 30-Nov-2025 - Ecosystem Reality Test Cases for StellaOps
- 14-Dec-2025 - Create a small ground‑truth corpus
Last Updated: 2025-12-14
1. ACCEPTANCE TEST PACK SCHEMA
1.1 Required Artifacts (MVP for DONE)
- Advisory summary under docs/process/
- Checklist stub referencing AT1–AT10
- Fixture pack path: tests/acceptance/packs/guardrails/ (no network)
- Links into sprint tracker (SPRINT_0300_0001_0001_documentation_process.md)
1.2 Determinism & Offline
- Freeze scanner/db versions; record in inputs.lock
- All fixtures reproducible from seeds
- Include DSSE envelopes for pack manifests
2. SCA FAILURE CATALOGUE (FC1-FC10)
2.1 Required Artifacts
- Catalogue plus fixture pack root: tests/fixtures/sca/catalogue/
- Sprint Execution Log entry when published
2.2 Fixture Requirements
- Pin scanner versions and feeds
- Include inputs.lock and DSSE manifest per case
- Normalize results (ordering, casing) for stable comparisons (see the sketch below)
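A minimal normalization sketch, assuming a hypothetical Finding record; it lowercases identifiers and sorts by stable keys so the serialized output compares byte-for-byte against the checked-in expected_findings.json:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Hypothetical finding shape; substitute the real scanner output model.
public sealed record Finding(string VulnId, string Package, string Version);

public static class FindingNormalizer
{
    // Lowercase identifiers and sort by (Package, VulnId, Version) so the
    // serialized result is byte-stable across scanner runs.
    public static string ToCanonicalJson(IEnumerable<Finding> findings)
    {
        var normalized = findings
            .Select(f => f with
            {
                VulnId = f.VulnId.ToLowerInvariant(),
                Package = f.Package.ToLowerInvariant(),
            })
            .OrderBy(f => f.Package, StringComparer.Ordinal)
            .ThenBy(f => f.VulnId, StringComparer.Ordinal)
            .ThenBy(f => f.Version, StringComparer.Ordinal)
            .ToList();

        return JsonSerializer.Serialize(normalized, new JsonSerializerOptions { WriteIndented = true });
    }
}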
3. ECOSYSTEM REALITY TEST CASES (ET1-ET10)
Fixture Path: tests/fixtures/sca/catalogue/
Requirements:
- Map each incident to acceptance tests and fixture paths
- Pin tool versions and feeds; no live network
- Populate fixtures and acceptance specs
4. GROUND-TRUTH CORPUS SCHEMA
4.1 Service Structure
Each service under /toys/svc-XX-<name>/:
app/
infra/ # Dockerfile, compose, network policy
tests/ # positive + negative reachability tests
labels.yaml # ground truth
evidence/ # generated by tests (trace, tags, manifests)
fix/ # minimal patch proving remediation
4.2 labels.yaml Schema
service: svc-01-password-reset
vulns:
  - id: V1
    cve: CVE-2022-XXXXX
    type: dep_runtime|dep_build|code|config|os_pkg|supply_chain
    package: string
    version: string
    reachable: true|false
    reachability_level: R0|R1|R2|R3|R4
    entrypoint: string        # route:/reset, topic:jobs, cli:command
    preconditions: [string]   # flags/env/auth
    path_tags: [string]
    proof:
      artifacts: [string]
      tags: [string]
    fix:
      type: upgrade|config|code
      patch_path: string
      expected_delta: string
    negative_proof: string    # if unreachable
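A minimal C# counterpart of this schema, assuming YamlDotNet for deserialization; class and property names mirror the fields above (nested proof/fix objects trimmed for brevity) and are illustrative only:

using System.Collections.Generic;
using YamlDotNet.Serialization;
using YamlDotNet.Serialization.NamingConventions;

public sealed class ServiceLabels
{
    public string Service { get; set; } = "";
    public List<VulnLabel> Vulns { get; set; } = new();
}

public sealed class VulnLabel
{
    public string Id { get; set; } = "";
    public string Cve { get; set; } = "";
    public string Type { get; set; } = "";
    public bool Reachable { get; set; }
    public string ReachabilityLevel { get; set; } = "";   // R0..R4
    public string Entrypoint { get; set; } = "";
    public List<string> Preconditions { get; set; } = new();
    public List<string> PathTags { get; set; } = new();
    public string? NegativeProof { get; set; }             // only for unreachable claims
}

public static class LabelsLoader
{
    // snake_case keys in labels.yaml map onto the PascalCase properties above.
    public static ServiceLabels Load(string yamlText) =>
        new DeserializerBuilder()
            .WithNamingConvention(UnderscoredNamingConvention.Instance)
            .IgnoreUnmatchedProperties()
            .Build()
            .Deserialize<ServiceLabels>(yamlText);
}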
4.3 Reachability Tiers
- R0 Present: component exists in SBOM, not imported/loaded
- R1 Loaded: imported/linked/initialized, no executed path
- R2 Executed: vulnerable function executed (deterministic trace)
- R3 Tainted execution: execution with externally influenced input
- R4 Exploitable: controlled, non-harmful PoC (optional)
4.4 Evidence Requirements per Tier
- R0: SBOM + file hash/package metadata
- R1: runtime startup logs or module load trace tag
- R2: callsite tag + stack trace snippet
- R3: R2 + taint marker showing external data reached call
- R4: only if safe/necessary; non-weaponized, sandboxed
4.5 Canonical Tag Format
TAG:route:<method> <path>
TAG:topic:<name>
TAG:call:<sink>
TAG:taint:<boundary>
TAG:flag:<name>=<value>
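A small illustrative helper (hypothetical name) that emits tags in exactly this canonical form, so evidence writers cannot drift from it:

public static class EvidenceTags
{
    // Formats per section 4.5, e.g. Route("POST", "/reset") => "TAG:route:POST /reset".
    public static string Route(string method, string path) => $"TAG:route:{method} {path}";
    public static string Topic(string name)                => $"TAG:topic:{name}";
    public static string Call(string sink)                 => $"TAG:call:{sink}";
    public static string Taint(string boundary)            => $"TAG:taint:{boundary}";
    public static string Flag(string name, string value)   => $"TAG:flag:{name}={value}";
}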
4.6 Evidence Artifact Schema
evidence/trace.json:
{
  "ts": "UTC ISO-8601",
  "corr": "correlation-id",
  "tags": ["TAG:route:POST /reset", "TAG:taint:http.body.email", "TAG:call:Crypto.MD5"]
}
4.7 Evidence Manifest
evidence/manifest.json:
{
  "git_sha": "string",
  "image_digest": "string",
  "tool_versions": {"scanner": "string", "db": "string"},
  "timestamps": {"started_at": "UTC ISO-8601", "completed_at": "UTC ISO-8601"},
  "evidence_hashes": {"trace.json": "sha256:...", "tags.log": "sha256:..."}
}
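A sketch of producing the evidence_hashes map, assuming plain SHA-256 over each evidence file; the helper name is illustrative:

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

public static class EvidenceHasher
{
    // Returns e.g. { "trace.json": "sha256:ab12...", "tags.log": "sha256:..." }.
    public static Dictionary<string, string> HashDirectory(string evidenceDir)
    {
        var hashes = new Dictionary<string, string>(StringComparer.Ordinal);
        foreach (var file in Directory.EnumerateFiles(evidenceDir).OrderBy(f => f, StringComparer.Ordinal))
        {
            using var stream = File.OpenRead(file);
            var digest = Convert.ToHexString(SHA256.HashData(stream)).ToLowerInvariant();
            hashes[Path.GetFileName(file)] = $"sha256:{digest}";
        }
        return hashes;
    }
}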
5. CORE TEST METRICS
| Metric | Definition |
|---|---|
| Recall (by class) | % of labeled vulns detected (runtime deps, OS pkgs, code, config) |
| Precision | % of reported findings that are true positives (1 − false discovery rate) |
| Reachability accuracy | % correct R0/R1/R2/R3 classifications |
| Overreach | Predicted reachable but labeled R0/R1 |
| Underreach | Labeled R2/R3 but predicted non-reachable |
| TTFS | Time-to-first-signal (first evidence-backed blocking issue) |
| Fix validation | % of applied fixes producing expected delta |
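A minimal sketch of how these metrics could be computed from labels versus scanner predictions; the record shapes are hypothetical, and only recall, precision, overreach, and underreach are shown:

using System.Collections.Generic;
using System.Linq;

public sealed record LabeledVuln(string Id, string Tier, bool Reachable);     // Tier: R0..R4
public sealed record PredictedVuln(string Id, bool PredictedReachable);

public static class CorpusMetrics
{
    public static double Recall(IReadOnlyCollection<LabeledVuln> labels, IReadOnlyCollection<PredictedVuln> preds)
        => labels.Count == 0 ? 1.0
         : labels.Count(l => preds.Any(p => p.Id == l.Id)) / (double)labels.Count;

    public static double Precision(IReadOnlyCollection<LabeledVuln> labels, IReadOnlyCollection<PredictedVuln> preds)
        => preds.Count == 0 ? 1.0
         : preds.Count(p => labels.Any(l => l.Id == p.Id)) / (double)preds.Count;

    // Overreach: predicted reachable although the label is R0/R1.
    public static int Overreach(IReadOnlyCollection<LabeledVuln> labels, IReadOnlyCollection<PredictedVuln> preds)
        => preds.Count(p => p.PredictedReachable
            && labels.Any(l => l.Id == p.Id && (l.Tier is "R0" or "R1")));

    // Underreach: labeled R2/R3 but predicted non-reachable (or missed entirely).
    public static int Underreach(IReadOnlyCollection<LabeledVuln> labels, IReadOnlyCollection<PredictedVuln> preds)
        => labels.Count(l => (l.Tier is "R2" or "R3")
            && !preds.Any(p => p.Id == l.Id && p.PredictedReachable));
}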
6. TEST QUALITY GATES (CI ENFORCEMENT THRESHOLDS)
thresholds:
  runtime_dependency_recall: >= 0.95
  unreachable_false_positives: <= 0.05
  reachability_underreport: <= 0.10
  ttfs_regression: <= +10% vs main
  fix_validation_pass_rate: 100%
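A sketch of enforcing this gate in CI; the report record is hypothetical and mirrors the threshold keys above:

using System.Collections.Generic;

public sealed record GateReport(
    double RuntimeDependencyRecall,
    double UnreachableFalsePositives,
    double ReachabilityUnderreport,
    double TtfsRegression,          // relative to main, e.g. 0.08 == +8%
    double FixValidationPassRate);

public static class QualityGate
{
    // Returns the violated thresholds; an empty list means the gate passes.
    public static IReadOnlyList<string> Evaluate(GateReport r)
    {
        var violations = new List<string>();
        if (r.RuntimeDependencyRecall < 0.95)   violations.Add("runtime_dependency_recall < 0.95");
        if (r.UnreachableFalsePositives > 0.05) violations.Add("unreachable_false_positives > 0.05");
        if (r.ReachabilityUnderreport > 0.10)   violations.Add("reachability_underreport > 0.10");
        if (r.TtfsRegression > 0.10)            violations.Add("ttfs_regression > +10% vs main");
        if (r.FixValidationPassRate < 1.0)      violations.Add("fix_validation_pass_rate < 100%");
        return violations;
    }
}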
7. SERVICE DEFINITION OF DONE
A service PR is DONE only if it includes:
- labels.yaml validated by schemas/labels.schema.json
- Docker build reproducible (digest pinned, lockfiles committed)
- Positive tests generating evidence proving reachability tiers
- Negative tests proving "unreachable" claims
- fix/ patch removing/mitigating weakness with measurable delta
- evidence/manifest.json capturing tool versions, git sha, image digest, timestamps, evidence hashes
8. REVIEWER REJECTION CRITERIA
Reject the PR if any of the following checks fail:
- Labels complete, schema-valid, stable IDs preserved
- Proof artifacts deterministic and generated by tests
- Reachability tier justified and matches evidence
- Unreachable claims have negative proofs
- Docker build uses pinned digests + committed lockfiles
- fix/ produces measurable delta without new unlabeled issues
- No network egress required; tests hermetic
9. TEST HARNESS PATTERNS
9.1 xUnit Test Template
public class ReachabilityAcceptanceTests : IClassFixture<PostgresFixture>
{
    private readonly PostgresFixture _db;
    // Scanner under test; IScanner and its wiring are assumed harness services, not shown here.
    private readonly IScanner _scanner;

    public ReachabilityAcceptanceTests(PostgresFixture db)
    {
        _db = db;
        _scanner = ScannerFactory.Create(db.ConnectionString);  // illustrative wiring
    }

    [Theory]
    [InlineData("svc-01-password-reset", "V1", ReachabilityLevel.R2)]
    [InlineData("svc-02-file-upload", "V1", ReachabilityLevel.R0)]
    public async Task VerifyReachabilityClassification(
        string serviceId,
        string vulnId,
        ReachabilityLevel expectedLevel)
    {
        // Arrange: load the ground-truth labels for the toy service.
        var labels = await LoadLabels($"toys/{serviceId}/labels.yaml");
        var expectedVuln = labels.Vulns.First(v => v.Id == vulnId);

        // Act: scan the service and pick out the finding for the labeled vuln.
        var result = await _scanner.ScanAsync(serviceId);
        var actualVuln = result.Findings.First(f => f.VulnId == vulnId);

        // Assert: the predicted tier matches the label and evidence is attached.
        Assert.Equal(expectedLevel, actualVuln.ReachabilityLevel);
        Assert.NotEmpty(actualVuln.Evidence);
    }
}
9.2 Testcontainers Pattern
// Requires the Testcontainers.PostgreSql package; the migrations helper is project-specific.
public class PostgresFixture : IAsyncLifetime
{
    private PostgreSqlContainer? _container;

    public string ConnectionString { get; private set; } = null!;

    public async Task InitializeAsync()
    {
        _container = new PostgreSqlBuilder()
            .WithImage("postgres:16-alpine")
            .WithDatabase("stellaops_test")
            .WithUsername("test")
            .WithPassword("test")
            .Build();
        await _container.StartAsync();
        ConnectionString = _container.GetConnectionString();

        // Bring the schema up to date before tests run.
        await RunMigrations(ConnectionString);
    }

    public async Task DisposeAsync()
    {
        if (_container != null)
            await _container.DisposeAsync();
    }
}
10. FIXTURE ORGANIZATION
tests/
  fixtures/
    sca/
      catalogue/
        FC001_openssl_version_range/
          inputs.lock
          sbom.cdx.json
          expected_findings.json
          dsse_manifest.json
  acceptance/
    packs/
      guardrails/
        AT001_reachability_present/
        AT002_reachability_loaded/
        AT003_reachability_executed/
        micro/
        motion/
        error/
        offline/
toys/
  svc-01-password-reset/
    app/
    infra/
    tests/
    labels.yaml
    evidence/
    fix/
11. DETERMINISTIC TEST REQUIREMENTS
11.1 Time Handling
- Freeze timers to 2025-12-04T12:00:00Z in stories/e2e
- Use FakeTimeProvider in .NET tests
- Playwright: useFakeTimers
11.2 Random Number Generation
- Seed RNG with 0x5EED2025 unless scenario-specific
- Never use Random() without explicit seed
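A minimal sketch covering both 11.1 and 11.2 in .NET, assuming the Microsoft.Extensions.TimeProvider.Testing package for FakeTimeProvider:

using System;
using Microsoft.Extensions.Time.Testing;

public static class DeterminismDefaults
{
    // Frozen clock for stories/e2e runs (section 11.1).
    public static readonly FakeTimeProvider Clock =
        new(DateTimeOffset.Parse("2025-12-04T12:00:00Z"));

    // Shared seed (section 11.2); never construct Random() without it.
    public const int RngSeed = 0x5EED2025;

    public static Random CreateRng() => new(RngSeed);
}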
11.3 Network Isolation
- No network calls in test execution
- Offline assets bundled
- Testcontainers for external dependencies
- Mock external APIs
11.4 Snapshot Testing
- All fixtures stored under tests/fixtures/
- Golden outputs checked into git
- Stable ordering for arrays/objects
- Strip volatile fields (timestamps, UUIDs) unless semantic
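A sketch of snapshot scrubbing under these rules using System.Text.Json; the set of volatile field names is illustrative:

using System.Text.Json;
using System.Text.Json.Nodes;

public static class SnapshotScrubber
{
    // Top-level fields removed before comparing against golden files
    // (keep any that carry semantic meaning for the specific test).
    private static readonly string[] VolatileFields = { "ts", "corr", "started_at", "completed_at" };

    public static string Scrub(string json)
    {
        var root = JsonNode.Parse(json)!.AsObject();
        foreach (var field in VolatileFields)
            root.Remove(field);
        return root.ToJsonString(new JsonSerializerOptions { WriteIndented = true });
    }
}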
12. COVERAGE REQUIREMENTS
12.1 Unit Tests
- Target: ≥85% line coverage for core modules
- Critical paths: 100% coverage required
- Exceptions: UI glue code, generated code
12.2 Integration Tests
- Database operations: All repositories tested with Testcontainers
- API endpoints: All endpoints tested with WebApplicationFactory
- External integrations: Mocked or stubbed
12.3 End-to-End Tests
- Critical workflows: User registration → scan → triage → decision
- Happy paths: All major features
- Error paths: Authentication failures, network errors, data validation
13. PERFORMANCE TESTING
13.1 Benchmark Tests
[MemoryDiagnoser]
public class ScannerBenchmarks
{
    // Populated in a [GlobalSetup] method (omitted); field types are illustrative.
    private IScanner _scanner = null!;
    private IReachabilityEngine _reachability = null!;
    private DependencyGraph _testGraph = null!;

    [Benchmark]
    public async Task ScanMediumImage()
    {
        // Medium fixture: ~100k LOC .NET service.
        await _scanner.ScanAsync("medium-service");
    }

    [Benchmark]
    public async Task ComputeReachability()
    {
        await _reachability.ComputeAsync(_testGraph);
    }
}
13.2 Performance Targets
| Operation | Target |
|---|---|
| Medium service scan | < 2 minutes |
| Reachability compute | < 30 seconds |
| Query GET finding | < 200ms p95 |
| SBOM ingestion | < 5 seconds |
14. MUTATION TESTING
14.1 Stryker Configuration
{
  "stryker-config": {
    "mutate": [
      "src/**/*.cs",
      "!src/**/*.Designer.cs",
      "!src/**/Migrations/**"
    ],
    "test-runner": "dotnet",
    "threshold-high": 90,
    "threshold-low": 70,
    "threshold-break": 60
  }
}
14.2 Mutation Score Targets
- Critical modules: ≥90%
- Standard modules: ≥70%
- Break build: <60%
15. SECURITY TESTING
15.1 OWASP Top 10 Coverage
- SQL Injection
- XSS (Cross-Site Scripting)
- CSRF (Cross-Site Request Forgery)
- Authentication bypasses
- Authorization bypasses
- Sensitive data exposure
- XML External Entities (XXE)
- Broken Access Control
- Security Misconfiguration
- Insecure Deserialization
15.2 Dependency Scanning
# SBOM generation
dotnet sbom-tool generate -b ./bin -bc ./src -pn StellaOps -pv 1.0.0
# Vulnerability scanning
dotnet list package --vulnerable --include-transitive
16. CI/CD INTEGRATION
16.1 GitHub Actions Workflow
name: Test
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-dotnet@v4
        with:
          dotnet-version: '10.0.x'
      - name: Restore dependencies
        run: dotnet restore
      - name: Build
        run: dotnet build --no-restore
      - name: Test
        run: dotnet test --no-build --verbosity normal --collect:"XPlat Code Coverage"
      - name: Upload coverage
        uses: codecov/codecov-action@v4
16.2 Quality Gates
- All tests pass
- Coverage ≥85%
- No high/critical vulnerabilities
- Mutation score ≥70%
- Performance regressions <10%
17. BENCH HARNESSES (SIGNED, REPRODUCIBLE METRICS)
Use the repo bench harness for moat-grade, reproducible comparisons and audit kits:
- Harness root: bench/README.md
- Signed finding bundles + verifiers live under bench/findings/ and bench/tools/
- Baseline comparisons and rollups live under bench/results/
Guardrail:
- Any change to scanning/policy/proof logic must be covered by at least one deterministic bench scenario (or an extension of an existing one).
Document Version: 1.0
Target Platform: .NET 10, PostgreSQL ≥16, Angular v17