Testing and Quality Guardrails Technical Reference
Source Advisories:
- 29-Nov-2025 - Acceptance Tests Pack and Guardrails
- 29-Nov-2025 - SCA Failure Catalogue for StellaOps Tests
- 30-Nov-2025 - Ecosystem Reality Test Cases for StellaOps
- 14-Dec-2025 - Create a small ground‑truth corpus
Last Updated: 2025-12-14
1. ACCEPTANCE TEST PACK SCHEMA
1.1 Required Artifacts (MVP for DONE)
- Advisory summary under docs/process/
- Checklist stub referencing AT1–AT10
- Fixture pack path: tests/acceptance/packs/guardrails/ (no network)
- Links into sprint tracker (SPRINT_0300_0001_0001_documentation_process.md)
1.2 Determinism & Offline
- Freeze scanner/db versions; record in inputs.lock
- All fixtures reproducible from seeds
- Include DSSE envelopes for pack manifests
2. SCA FAILURE CATALOGUE (FC1-FC10)
2.1 Required Artifacts
- Catalogue plus fixture pack root: tests/fixtures/sca/catalogue/
- Sprint Execution Log entry when published
2.2 Fixture Requirements
- Pin scanner versions and feeds
- Include inputs.lock and DSSE manifest per case
- Normalize results (ordering, casing) for stable comparisons (see the sketch below)
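A minimal normalization sketch, assuming a hypothetical Finding record; it lowercases identifiers and sorts by stable keys so the serialized output compares byte-for-byte against the checked-in expected_findings.json:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Hypothetical finding shape; substitute the real scanner output model.
public sealed record Finding(string VulnId, string Package, string Version);

public static class FindingNormalizer
{
    // Lowercase identifiers and sort by (Package, VulnId, Version) so the
    // serialized result is byte-stable across scanner runs.
    public static string ToCanonicalJson(IEnumerable<Finding> findings)
    {
        var normalized = findings
            .Select(f => f with
            {
                VulnId = f.VulnId.ToLowerInvariant(),
                Package = f.Package.ToLowerInvariant(),
            })
            .OrderBy(f => f.Package, StringComparer.Ordinal)
            .ThenBy(f => f.VulnId, StringComparer.Ordinal)
            .ThenBy(f => f.Version, StringComparer.Ordinal)
            .ToList();

        return JsonSerializer.Serialize(normalized, new JsonSerializerOptions { WriteIndented = true });
    }
}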
3. ECOSYSTEM REALITY TEST CASES (ET1-ET10)
Fixture Path: tests/fixtures/sca/catalogue/
Requirements:
- Map each incident to acceptance tests and fixture paths
- Pin tool versions and feeds; no live network
- Populate fixtures and acceptance specs
4. GROUND-TRUTH CORPUS SCHEMA
4.1 Service Structure
Each service under /toys/svc-XX-<name>/:
app/
infra/ # Dockerfile, compose, network policy
tests/ # positive + negative reachability tests
labels.yaml # ground truth
evidence/ # generated by tests (trace, tags, manifests)
fix/ # minimal patch proving remediation
4.2 labels.yaml Schema
service: svc-01-password-reset
vulns:
  - id: V1
    cve: CVE-2022-XXXXX
    type: dep_runtime|dep_build|code|config|os_pkg|supply_chain
    package: string
    version: string
    reachable: true|false
    reachability_level: R0|R1|R2|R3|R4
    entrypoint: string        # route:/reset, topic:jobs, cli:command
    preconditions: [string]   # flags/env/auth
    path_tags: [string]
    proof:
      artifacts: [string]
      tags: [string]
    fix:
      type: upgrade|config|code
      patch_path: string
      expected_delta: string
    negative_proof: string    # if unreachable
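A minimal C# counterpart of this schema, assuming YamlDotNet for deserialization; class and property names mirror the fields above (nested proof/fix objects trimmed for brevity) and are illustrative only:

using System.Collections.Generic;
using YamlDotNet.Serialization;
using YamlDotNet.Serialization.NamingConventions;

public sealed class ServiceLabels
{
    public string Service { get; set; } = "";
    public List<VulnLabel> Vulns { get; set; } = new();
}

public sealed class VulnLabel
{
    public string Id { get; set; } = "";
    public string Cve { get; set; } = "";
    public string Type { get; set; } = "";
    public bool Reachable { get; set; }
    public string ReachabilityLevel { get; set; } = "";   // R0..R4
    public string Entrypoint { get; set; } = "";
    public List<string> Preconditions { get; set; } = new();
    public List<string> PathTags { get; set; } = new();
    public string? NegativeProof { get; set; }             // only for unreachable claims
}

public static class LabelsLoader
{
    // snake_case keys in labels.yaml map onto the PascalCase properties above.
    public static ServiceLabels Load(string yamlText) =>
        new DeserializerBuilder()
            .WithNamingConvention(UnderscoredNamingConvention.Instance)
            .IgnoreUnmatchedProperties()
            .Build()
            .Deserialize<ServiceLabels>(yamlText);
}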
4.3 Reachability Tiers
- R0 Present: component exists in SBOM, not imported/loaded
- R1 Loaded: imported/linked/initialized, no executed path
- R2 Executed: vulnerable function executed (deterministic trace)
- R3 Tainted execution: execution with externally influenced input
- R4 Exploitable: controlled, non-harmful PoC (optional)
4.4 Evidence Requirements per Tier
- R0: SBOM + file hash/package metadata
- R1: runtime startup logs or module load trace tag
- R2: callsite tag + stack trace snippet
- R3: R2 + taint marker showing external data reached call
- R4: only if safe/necessary; non-weaponized, sandboxed
4.5 Canonical Tag Format
TAG:route:<method> <path>
TAG:topic:<name>
TAG:call:<sink>
TAG:taint:<boundary>
TAG:flag:<name>=<value>
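A small illustrative helper (hypothetical name) that emits tags in exactly this canonical form, so evidence writers cannot drift from it:

public static class EvidenceTags
{
    // Formats per section 4.5, e.g. Route("POST", "/reset") => "TAG:route:POST /reset".
    public static string Route(string method, string path) => $"TAG:route:{method} {path}";
    public static string Topic(string name)                => $"TAG:topic:{name}";
    public static string Call(string sink)                 => $"TAG:call:{sink}";
    public static string Taint(string boundary)            => $"TAG:taint:{boundary}";
    public static string Flag(string name, string value)   => $"TAG:flag:{name}={value}";
}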
4.6 Evidence Artifact Schema
evidence/trace.json:
{
  "ts": "UTC ISO-8601",
  "corr": "correlation-id",
  "tags": ["TAG:route:POST /reset", "TAG:taint:http.body.email", "TAG:call:Crypto.MD5"]
}
4.7 Evidence Manifest
evidence/manifest.json:
{
  "git_sha": "string",
  "image_digest": "string",
  "tool_versions": {"scanner": "string", "db": "string"},
  "timestamps": {"started_at": "UTC ISO-8601", "completed_at": "UTC ISO-8601"},
  "evidence_hashes": {"trace.json": "sha256:...", "tags.log": "sha256:..."}
}
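A sketch of producing the evidence_hashes map, assuming plain SHA-256 over each evidence file; the helper name is illustrative:

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

public static class EvidenceHasher
{
    // Returns e.g. { "trace.json": "sha256:ab12...", "tags.log": "sha256:..." }.
    public static Dictionary<string, string> HashDirectory(string evidenceDir)
    {
        var hashes = new Dictionary<string, string>(StringComparer.Ordinal);
        foreach (var file in Directory.EnumerateFiles(evidenceDir).OrderBy(f => f, StringComparer.Ordinal))
        {
            using var stream = File.OpenRead(file);
            var digest = Convert.ToHexString(SHA256.HashData(stream)).ToLowerInvariant();
            hashes[Path.GetFileName(file)] = $"sha256:{digest}";
        }
        return hashes;
    }
}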
5. CORE TEST METRICS
| Metric | Definition |
|---|---|
| Recall (by class) | % of labeled vulns detected (runtime deps, OS pkgs, code, config) |
| Precision | % of reported findings that are true positives (1 − false discovery rate) |
| Reachability accuracy | % correct R0/R1/R2/R3 classifications |
| Overreach | Predicted reachable but labeled R0/R1 |
| Underreach | Labeled R2/R3 but predicted non-reachable |
| TTFS | Time-to-first-signal (first evidence-backed blocking issue) |
| Fix validation | % of applied fixes producing expected delta |
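A minimal sketch of how these metrics could be computed from labels versus scanner predictions; the record shapes are hypothetical, and only recall, precision, overreach, and underreach are shown:

using System.Collections.Generic;
using System.Linq;

public sealed record LabeledVuln(string Id, string Tier, bool Reachable);     // Tier: R0..R4
public sealed record PredictedVuln(string Id, bool PredictedReachable);

public static class CorpusMetrics
{
    public static double Recall(IReadOnlyCollection<LabeledVuln> labels, IReadOnlyCollection<PredictedVuln> preds)
        => labels.Count == 0 ? 1.0
         : labels.Count(l => preds.Any(p => p.Id == l.Id)) / (double)labels.Count;

    public static double Precision(IReadOnlyCollection<LabeledVuln> labels, IReadOnlyCollection<PredictedVuln> preds)
        => preds.Count == 0 ? 1.0
         : preds.Count(p => labels.Any(l => l.Id == p.Id)) / (double)preds.Count;

    // Overreach: predicted reachable although the label is R0/R1.
    public static int Overreach(IReadOnlyCollection<LabeledVuln> labels, IReadOnlyCollection<PredictedVuln> preds)
        => preds.Count(p => p.PredictedReachable
            && labels.Any(l => l.Id == p.Id && (l.Tier is "R0" or "R1")));

    // Underreach: labeled R2/R3 but predicted non-reachable (or missed entirely).
    public static int Underreach(IReadOnlyCollection<LabeledVuln> labels, IReadOnlyCollection<PredictedVuln> preds)
        => labels.Count(l => (l.Tier is "R2" or "R3")
            && !preds.Any(p => p.Id == l.Id && p.PredictedReachable));
}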
6. TEST QUALITY GATES (CI ENFORCEMENT THRESHOLDS)
thresholds:
  runtime_dependency_recall: >= 0.95
  unreachable_false_positives: <= 0.05
  reachability_underreport: <= 0.10
  ttfs_regression: <= +10% vs main
  fix_validation_pass_rate: 100%
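A sketch of enforcing this gate in CI; the report record is hypothetical and mirrors the threshold keys above:

using System.Collections.Generic;

public sealed record GateReport(
    double RuntimeDependencyRecall,
    double UnreachableFalsePositives,
    double ReachabilityUnderreport,
    double TtfsRegression,          // relative to main, e.g. 0.08 == +8%
    double FixValidationPassRate);

public static class QualityGate
{
    // Returns the violated thresholds; an empty list means the gate passes.
    public static IReadOnlyList<string> Evaluate(GateReport r)
    {
        var violations = new List<string>();
        if (r.RuntimeDependencyRecall < 0.95)   violations.Add("runtime_dependency_recall < 0.95");
        if (r.UnreachableFalsePositives > 0.05) violations.Add("unreachable_false_positives > 0.05");
        if (r.ReachabilityUnderreport > 0.10)   violations.Add("reachability_underreport > 0.10");
        if (r.TtfsRegression > 0.10)            violations.Add("ttfs_regression > +10% vs main");
        if (r.FixValidationPassRate < 1.0)      violations.Add("fix_validation_pass_rate < 100%");
        return violations;
    }
}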
7. SERVICE DEFINITION OF DONE
A service PR is DONE only if it includes:
- labels.yaml validated by schemas/labels.schema.json
- Docker build reproducible (digest pinned, lockfiles committed)
- Positive tests generating evidence proving reachability tiers
- Negative tests proving "unreachable" claims
- fix/ patch removing/mitigating weakness with measurable delta
- evidence/manifest.json capturing tool versions, git sha, image digest, timestamps, evidence hashes
8. REVIEWER REJECTION CRITERIA
Reject the PR if any of the following checks fail:
- Labels complete, schema-valid, stable IDs preserved
- Proof artifacts deterministic and generated by tests
- Reachability tier justified and matches evidence
- Unreachable claims have negative proofs
- Docker build uses pinned digests + committed lockfiles
- fix/ produces measurable delta without new unlabeled issues
- No network egress required; tests hermetic
9. TEST HARNESS PATTERNS
9.1 xUnit Test Template
public class ReachabilityAcceptanceTests : IClassFixture<PostgresFixture>
{
    private readonly PostgresFixture _db;
    // Scanner under test; IScanner and its wiring are assumed harness services, not shown here.
    private readonly IScanner _scanner;

    public ReachabilityAcceptanceTests(PostgresFixture db)
    {
        _db = db;
        _scanner = ScannerFactory.Create(db.ConnectionString);  // illustrative wiring
    }

    [Theory]
    [InlineData("svc-01-password-reset", "V1", ReachabilityLevel.R2)]
    [InlineData("svc-02-file-upload", "V1", ReachabilityLevel.R0)]
    public async Task VerifyReachabilityClassification(
        string serviceId,
        string vulnId,
        ReachabilityLevel expectedLevel)
    {
        // Arrange: load the ground-truth labels for the toy service.
        var labels = await LoadLabels($"toys/{serviceId}/labels.yaml");
        var expectedVuln = labels.Vulns.First(v => v.Id == vulnId);

        // Act: scan the service and pick out the finding for the labeled vuln.
        var result = await _scanner.ScanAsync(serviceId);
        var actualVuln = result.Findings.First(f => f.VulnId == vulnId);

        // Assert: the predicted tier matches the label and evidence is attached.
        Assert.Equal(expectedLevel, actualVuln.ReachabilityLevel);
        Assert.NotEmpty(actualVuln.Evidence);
    }
}
9.2 Testcontainers Pattern
// Requires the Testcontainers.PostgreSql package; the migrations helper is project-specific.
public class PostgresFixture : IAsyncLifetime
{
    private PostgreSqlContainer? _container;

    public string ConnectionString { get; private set; } = null!;

    public async Task InitializeAsync()
    {
        _container = new PostgreSqlBuilder()
            .WithImage("postgres:16-alpine")
            .WithDatabase("stellaops_test")
            .WithUsername("test")
            .WithPassword("test")
            .Build();
        await _container.StartAsync();
        ConnectionString = _container.GetConnectionString();

        // Bring the schema up to date before tests run.
        await RunMigrations(ConnectionString);
    }

    public async Task DisposeAsync()
    {
        if (_container != null)
            await _container.DisposeAsync();
    }
}
10. FIXTURE ORGANIZATION
tests/
  fixtures/
    sca/
      catalogue/
        FC001_openssl_version_range/
          inputs.lock
          sbom.cdx.json
          expected_findings.json
          dsse_manifest.json
  acceptance/
    packs/
      guardrails/
        AT001_reachability_present/
        AT002_reachability_loaded/
        AT003_reachability_executed/
        micro/
        motion/
        error/
        offline/
toys/
  svc-01-password-reset/
    app/
    infra/
    tests/
    labels.yaml
    evidence/
    fix/
11. DETERMINISTIC TEST REQUIREMENTS
11.1 Time Handling
- Freeze timers to 2025-12-04T12:00:00Z in stories/e2e
- Use FakeTimeProvider in .NET tests
- Playwright: useFakeTimers
11.2 Random Number Generation
- Seed RNG with 0x5EED2025 unless scenario-specific
- Never use Random() without explicit seed
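A minimal sketch covering both 11.1 and 11.2 in .NET, assuming the Microsoft.Extensions.TimeProvider.Testing package for FakeTimeProvider:

using System;
using Microsoft.Extensions.Time.Testing;

public static class DeterminismDefaults
{
    // Frozen clock for stories/e2e runs (section 11.1).
    public static readonly FakeTimeProvider Clock =
        new(DateTimeOffset.Parse("2025-12-04T12:00:00Z"));

    // Shared seed (section 11.2); never construct Random() without it.
    public const int RngSeed = 0x5EED2025;

    public static Random CreateRng() => new(RngSeed);
}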
11.3 Network Isolation
- No network calls in test execution
- Offline assets bundled
- Testcontainers for external dependencies
- Mock external APIs
11.4 Snapshot Testing
- All fixtures stored under tests/fixtures/
- Golden outputs checked into git
- Stable ordering for arrays/objects
- Strip volatile fields (timestamps, UUIDs) unless semantic
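A sketch of snapshot scrubbing under these rules using System.Text.Json; the set of volatile field names is illustrative:

using System.Text.Json;
using System.Text.Json.Nodes;

public static class SnapshotScrubber
{
    // Top-level fields removed before comparing against golden files
    // (keep any that carry semantic meaning for the specific test).
    private static readonly string[] VolatileFields = { "ts", "corr", "started_at", "completed_at" };

    public static string Scrub(string json)
    {
        var root = JsonNode.Parse(json)!.AsObject();
        foreach (var field in VolatileFields)
            root.Remove(field);
        return root.ToJsonString(new JsonSerializerOptions { WriteIndented = true });
    }
}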
12. COVERAGE REQUIREMENTS
12.1 Unit Tests
- Target: ≥85% line coverage for core modules
- Critical paths: 100% coverage required
- Exceptions: UI glue code, generated code
12.2 Integration Tests
- Database operations: All repositories tested with Testcontainers
- API endpoints: All endpoints tested with WebApplicationFactory
- External integrations: Mocked or stubbed
12.3 End-to-End Tests
- Critical workflows: User registration → scan → triage → decision
- Happy paths: All major features
- Error paths: Authentication failures, network errors, data validation
13. PERFORMANCE TESTING
13.1 Benchmark Tests
[MemoryDiagnoser]
public class ScannerBenchmarks
{
    // Populated in a [GlobalSetup] method (omitted); field types are illustrative.
    private IScanner _scanner = null!;
    private IReachabilityEngine _reachability = null!;
    private DependencyGraph _testGraph = null!;

    [Benchmark]
    public async Task ScanMediumImage()
    {
        // Medium fixture: ~100k LOC .NET service.
        await _scanner.ScanAsync("medium-service");
    }

    [Benchmark]
    public async Task ComputeReachability()
    {
        await _reachability.ComputeAsync(_testGraph);
    }
}
13.2 Performance Targets
| Operation | Target |
|---|---|
| Medium service scan | < 2 minutes |
| Reachability compute | < 30 seconds |
| Query GET finding | < 200ms p95 |
| SBOM ingestion | < 5 seconds |
14. MUTATION TESTING
14.1 Stryker Configuration
{
  "stryker-config": {
    "mutate": [
      "src/**/*.cs",
      "!src/**/*.Designer.cs",
      "!src/**/Migrations/**"
    ],
    "test-runner": "dotnet",
    "threshold-high": 90,
    "threshold-low": 70,
    "threshold-break": 60
  }
}
14.2 Mutation Score Targets
- Critical modules: ≥90%
- Standard modules: ≥70%
- Break build: <60%
15. SECURITY TESTING
15.1 OWASP Top 10 Coverage
- SQL Injection
- XSS (Cross-Site Scripting)
- CSRF (Cross-Site Request Forgery)
- Authentication bypasses
- Authorization bypasses
- Sensitive data exposure
- XML External Entities (XXE)
- Broken Access Control
- Security Misconfiguration
- Insecure Deserialization
15.2 Dependency Scanning
# SBOM generation
dotnet sbom-tool generate -b ./bin -bc ./src -pn StellaOps -pv 1.0.0
# Vulnerability scanning
dotnet list package --vulnerable --include-transitive
16. CI/CD INTEGRATION
16.1 GitHub Actions Workflow
name: Test
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-dotnet@v4
        with:
          dotnet-version: '10.0.x'
      - name: Restore dependencies
        run: dotnet restore
      - name: Build
        run: dotnet build --no-restore
      - name: Test
        run: dotnet test --no-build --verbosity normal --collect:"XPlat Code Coverage"
      - name: Upload coverage
        uses: codecov/codecov-action@v4
16.2 Quality Gates
- All tests pass
- Coverage ≥85%
- No high/critical vulnerabilities
- Mutation score ≥70%
- Performance regressions <10%
17. BENCH HARNESSES (SIGNED, REPRODUCIBLE METRICS)
Use the repo bench harness for moat-grade, reproducible comparisons and audit kits:
- Harness root: bench/README.md
- Signed finding bundles + verifiers live under bench/findings/ and bench/tools/
- Baseline comparisons and rollups live under bench/results/
Guardrail:
- Any change to scanning/policy/proof logic must be covered by at least one deterministic bench scenario (or an extension of an existing one).
Document Version: 1.0
Target Platform: .NET 10, PostgreSQL ≥16, Angular v17