Add reference architecture and testing strategy documentation

- Created a new document for the Stella Ops Reference Architecture outlining the system's topology, trust boundaries, artifact association, and interfaces.
- Developed a comprehensive Testing Strategy document detailing the importance of offline readiness, interoperability, determinism, and operational guardrails.
- Introduced a README for the Testing Strategy, summarizing process details and the key concepts implemented.
- Added guidance for AI agents and developers in the tests directory, including directory structure, test categories, key patterns, and rules for test development.
2025-12-22 07:59:15 +02:00
parent 5d398ec442
commit 53503cb407
96 changed files with 37565 additions and 71 deletions

# Testing and Quality Guardrails Technical Reference
**Source Advisories**:
- 29-Nov-2025 - Acceptance Tests Pack and Guardrails
- 29-Nov-2025 - SCA Failure Catalogue for StellaOps Tests
- 30-Nov-2025 - Ecosystem Reality Test Cases for StellaOps
- 14-Dec-2025 - Create a small groundtruth corpus
**Last Updated**: 2025-12-14
---
## 1. ACCEPTANCE TEST PACK SCHEMA
### 1.1 Required Artifacts (MVP for DONE)
- Advisory summary under `docs/process/`
- Checklist stub referencing AT1–AT10
- Fixture pack path: `tests/acceptance/packs/guardrails/` (no network)
- Links into sprint tracker (`SPRINT_0300_0001_0001_documentation_process.md`)
### 1.2 Determinism & Offline
- Freeze scanner/db versions; record in `inputs.lock`
- All fixtures reproducible from seeds
- Include DSSE envelopes for pack manifests
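The lockfile format is not fixed anywhere in this document; as one minimal sketch (the field names below are illustrative assumptions, not a defined schema), an `inputs.lock` might pin:

```yaml
# Hypothetical inputs.lock sketch; the real schema may differ
scanner:
  name: stellaops-scanner      # illustrative tool name
  version: "1.4.2"             # frozen for the pack
  image_digest: "sha256:..."   # pinned digest placeholder
feeds:
  nvd_snapshot: "2025-11-29T00:00:00Z"
  osv_snapshot: "2025-11-29T00:00:00Z"
seeds:
  rng: 0x5EED2025              # matches Section 11.2
```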
## 2. SCA FAILURE CATALOGUE (FC1-FC10)
### 2.1 Required Artifacts
- Catalogue plus fixture pack root: `tests/fixtures/sca/catalogue/`
- Sprint Execution Log entry when published
### 2.2 Fixture Requirements
- Pin scanner versions and feeds
- Include `inputs.lock` and DSSE manifest per case
- Normalize results (ordering, casing) for stable comparisons
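The normalization rule above can be sketched as a small helper (illustrative Python; the field names `cve`, `package`, `version` are assumptions, not the harness's real schema):

```python
import json

def normalize_findings(findings):
    """Normalize scanner output for stable comparison (illustrative sketch)."""
    normalized = []
    for f in findings:
        normalized.append({
            "cve": f["cve"].upper(),          # CVE IDs compared case-insensitively
            "package": f["package"].lower(),  # package names lowercased
            "version": f["version"],
        })
    # Stable ordering: sort by (package, cve) so diffs are deterministic
    normalized.sort(key=lambda f: (f["package"], f["cve"]))
    return normalized

raw = [
    {"cve": "cve-2022-0001", "package": "OpenSSL", "version": "1.1.1"},
    {"cve": "CVE-2021-9999", "package": "zlib", "version": "1.2.11"},
]
print(json.dumps(normalize_findings(raw), sort_keys=True))
```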
## 3. ECOSYSTEM REALITY TEST CASES (ET1-ET10)
**Fixture Path**: `tests/fixtures/sca/catalogue/`
**Requirements**:
- Map each incident to acceptance tests and fixture paths
- Pin tool versions and feeds; no live network
- Populate fixtures and acceptance specs
## 4. GROUND-TRUTH CORPUS SCHEMA
### 4.1 Service Structure
Each service under `/toys/svc-XX-<name>/`:
```
app/
infra/ # Dockerfile, compose, network policy
tests/ # positive + negative reachability tests
labels.yaml # ground truth
evidence/ # generated by tests (trace, tags, manifests)
fix/ # minimal patch proving remediation
```
### 4.2 labels.yaml Schema
```yaml
service: svc-01-password-reset
vulns:
  - id: V1
    cve: CVE-2022-XXXXX
    type: dep_runtime|dep_build|code|config|os_pkg|supply_chain
    package: string
    version: string
    reachable: true|false
    reachability_level: R0|R1|R2|R3|R4
    entrypoint: string        # route:/reset, topic:jobs, cli:command
    preconditions: [string]   # flags/env/auth
    path_tags: [string]
    proof:
      artifacts: [string]
      tags: [string]
    fix:
      type: upgrade|config|code
      patch_path: string
      expected_delta: string
    negative_proof: string    # if unreachable
```
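A few of these schema rules can be spot-checked with a short validator (illustrative Python sketch; the authoritative gate is `schemas/labels.schema.json`, referenced in Section 7):

```python
VALID_TIERS = {"R0", "R1", "R2", "R3", "R4"}
VALID_TYPES = {"dep_runtime", "dep_build", "code", "config", "os_pkg", "supply_chain"}

def validate_vuln(vuln):
    """Return a list of problems for one labels.yaml vuln entry.
    Illustrative subset of the checks a JSON Schema validator would run."""
    problems = []
    for key in ("id", "cve", "type", "reachable", "reachability_level"):
        if key not in vuln:
            problems.append(f"missing required key: {key}")
    if vuln.get("type") not in VALID_TYPES:
        problems.append(f"unknown type: {vuln.get('type')}")
    if vuln.get("reachability_level") not in VALID_TIERS:
        problems.append(f"unknown tier: {vuln.get('reachability_level')}")
    # Unreachable claims must carry a negative proof (Section 8)
    if vuln.get("reachable") is False and not vuln.get("negative_proof"):
        problems.append("unreachable vuln lacks negative_proof")
    return problems

entry = {"id": "V1", "cve": "CVE-2022-0001", "type": "dep_runtime",
         "reachable": True, "reachability_level": "R2"}
print(validate_vuln(entry))  # []
```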
### 4.3 Reachability Tiers
- **R0 Present**: component exists in SBOM, not imported/loaded
- **R1 Loaded**: imported/linked/initialized, no executed path
- **R2 Executed**: vulnerable function executed (deterministic trace)
- **R3 Tainted execution**: execution with externally influenced input
- **R4 Exploitable**: controlled, non-harmful PoC (optional)
### 4.4 Evidence Requirements per Tier
- **R0**: SBOM + file hash/package metadata
- **R1**: runtime startup logs or module load trace tag
- **R2**: callsite tag + stack trace snippet
- **R3**: R2 + taint marker showing external data reached call
- **R4**: only if safe/necessary; non-weaponized, sandboxed
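Treating the tiers as cumulative, the per-tier minimums can be encoded as a lookup table (a sketch; the evidence-kind names are illustrative assumptions, not harness identifiers):

```python
# Minimum evidence kinds per tier, paraphrasing Section 4.4 (R4 omitted: optional)
REQUIRED_EVIDENCE = {
    "R0": {"sbom"},
    "R1": {"sbom", "load_trace"},
    "R2": {"sbom", "load_trace", "callsite_tag", "stack_trace"},
    "R3": {"sbom", "load_trace", "callsite_tag", "stack_trace", "taint_marker"},
}

def missing_evidence(tier, provided):
    """Return the evidence kinds a claim at `tier` still lacks."""
    return sorted(REQUIRED_EVIDENCE[tier] - set(provided))

print(missing_evidence("R2", ["sbom", "load_trace"]))  # ['callsite_tag', 'stack_trace']
```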
### 4.5 Canonical Tag Format
```
TAG:route:<method> <path>
TAG:topic:<name>
TAG:call:<sink>
TAG:taint:<boundary>
TAG:flag:<name>=<value>
```
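The tag grammar above reduces to a one-line regex (illustrative Python sketch, not the harness's real parser):

```python
import re

# Kinds mirror the canonical TAG format: route, topic, call, taint, flag
TAG_RE = re.compile(r"^TAG:(?P<kind>route|topic|call|taint|flag):(?P<value>.+)$")

def parse_tag(line):
    """Split one canonical tag line into (kind, value)."""
    m = TAG_RE.match(line.strip())
    if not m:
        raise ValueError(f"not a canonical tag: {line!r}")
    return m.group("kind"), m.group("value")

print(parse_tag("TAG:route:POST /reset"))  # ('route', 'POST /reset')
print(parse_tag("TAG:call:Crypto.MD5"))    # ('call', 'Crypto.MD5')
```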
### 4.6 Evidence Artifact Schema
**evidence/trace.json**:
```json
{
  "ts": "UTC ISO-8601",
  "corr": "correlation-id",
  "tags": ["TAG:route:POST /reset", "TAG:taint:http.body.email", "TAG:call:Crypto.MD5"]
}
```
### 4.7 Evidence Manifest
**evidence/manifest.json**:
```json
{
  "git_sha": "string",
  "image_digest": "string",
  "tool_versions": {"scanner": "string", "db": "string"},
  "timestamps": {"started_at": "UTC ISO-8601", "completed_at": "UTC ISO-8601"},
  "evidence_hashes": {"trace.json": "sha256:...", "tags.log": "sha256:..."}
}
```
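The `evidence_hashes` map can be produced mechanically (illustrative Python sketch; the real harness would hash the files under `evidence/` rather than in-memory bytes):

```python
import hashlib
import json

def build_evidence_hashes(files):
    """Build the evidence_hashes block of evidence/manifest.json.
    `files` maps artifact name -> raw bytes (illustrative shape)."""
    return {
        name: "sha256:" + hashlib.sha256(data).hexdigest()
        for name, data in files.items()
    }

hashes = build_evidence_hashes({
    "trace.json": b'{"ts": "2025-12-04T12:00:00Z"}',
    "tags.log": b"TAG:route:POST /reset\n",
})
print(json.dumps(hashes, sort_keys=True, indent=2))
```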
## 5. CORE TEST METRICS
| Metric | Definition |
|--------|------------|
| Recall (by class) | % of labeled vulns detected (runtime deps, OS pkgs, code, config) |
| Precision | % of reported findings matching a labeled vuln (1 - false discovery rate) |
| Reachability accuracy | % correct R0/R1/R2/R3 classifications |
| Overreach | Predicted reachable but labeled R0/R1 |
| Underreach | Labeled R2/R3 but predicted non-reachable |
| TTFS | Time-to-first-signal (first evidence-backed blocking issue) |
| Fix validation | % of applied fixes producing expected delta |
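The reachability-related metrics can be computed from labels and predictions roughly as follows (illustrative Python; the `labeled`/`predicted` shapes and the "R2+ counts as truly reachable" convention are assumptions for this sketch):

```python
def score_run(labeled, predicted):
    """Score one corpus run against ground truth (sketch of the Section 5 definitions).
    `labeled` maps vuln id -> labeled tier (R0..R4); `predicted` maps vuln id ->
    True if the scanner claims the vuln is reachable."""
    truly_reachable = {v for v, tier in labeled.items() if tier in ("R2", "R3", "R4")}
    claimed_reachable = {v for v, claim in predicted.items() if claim}

    detected = set(predicted) & set(labeled)
    recall = len(detected) / len(labeled)                 # % of labeled vulns detected
    overreach = len(claimed_reachable - truly_reachable)  # claimed reachable, labeled R0/R1
    underreach = len(truly_reachable - claimed_reachable) # labeled R2/R3 but called unreachable
    return recall, overreach, underreach

labeled = {"V1": "R2", "V2": "R0", "V3": "R3"}
predicted = {"V1": True, "V2": True}  # V3 missed entirely
print(score_run(labeled, predicted))
```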
## 6. TEST QUALITY GATES (CI ENFORCEMENT THRESHOLDS)
```yaml
thresholds:
  runtime_dependency_recall: ">= 0.95"
  unreachable_false_positives: "<= 0.05"
  reachability_underreport: "<= 0.10"
  ttfs_regression: "<= +10% vs main"
  fix_validation_pass_rate: "100%"
```
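A CI gate over these thresholds reduces to a comparator table (illustrative Python sketch; `ttfs_regression` is omitted here because it is relative to `main` rather than an absolute bound):

```python
# Comparator table mirroring the YAML thresholds (absolute metrics only)
THRESHOLDS = {
    "runtime_dependency_recall": (">=", 0.95),
    "unreachable_false_positives": ("<=", 0.05),
    "reachability_underreport": ("<=", 0.10),
    "fix_validation_pass_rate": (">=", 1.00),
}

def gate(measured):
    """Return human-readable failures for any metric violating its threshold."""
    failures = []
    for metric, (op, bound) in THRESHOLDS.items():
        value = measured[metric]
        ok = value >= bound if op == ">=" else value <= bound
        if not ok:
            failures.append(f"{metric}: {value} violates {op} {bound}")
    return failures

run = {"runtime_dependency_recall": 0.97, "unreachable_false_positives": 0.02,
       "reachability_underreport": 0.12, "fix_validation_pass_rate": 1.0}
print(gate(run))  # one failure: reachability_underreport
```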
## 7. SERVICE DEFINITION OF DONE
A service PR is DONE only if it includes:
- [ ] `labels.yaml` validated by `schemas/labels.schema.json`
- [ ] Docker build reproducible (digest pinned, lockfiles committed)
- [ ] Positive tests generating evidence proving reachability tiers
- [ ] Negative tests proving "unreachable" claims
- [ ] `fix/` patch removing/mitigating weakness with measurable delta
- [ ] `evidence/manifest.json` capturing tool versions, git sha, image digest, timestamps, evidence hashes
## 8. REVIEWER REJECTION CRITERIA
Reject PR if any fail:
- [ ] Labels complete, schema-valid, stable IDs preserved
- [ ] Proof artifacts deterministic and generated by tests
- [ ] Reachability tier justified and matches evidence
- [ ] Unreachable claims have negative proofs
- [ ] Docker build uses pinned digests + committed lockfiles
- [ ] `fix/` produces measurable delta without new unlabeled issues
- [ ] No network egress required; tests hermetic
## 9. TEST HARNESS PATTERNS
### 9.1 xUnit Test Template
```csharp
public class ReachabilityAcceptanceTests : IClassFixture<PostgresFixture>
{
    private readonly PostgresFixture _db;
    private readonly IScanner _scanner; // scanner under test; harness wiring omitted in this template

    public ReachabilityAcceptanceTests(PostgresFixture db)
    {
        _db = db;
    }

    [Theory]
    [InlineData("svc-01-password-reset", "V1", ReachabilityLevel.R2)]
    [InlineData("svc-02-file-upload", "V1", ReachabilityLevel.R0)]
    public async Task VerifyReachabilityClassification(
        string serviceId,
        string vulnId,
        ReachabilityLevel expectedLevel)
    {
        // Arrange
        var labels = await LoadLabels($"toys/{serviceId}/labels.yaml");
        var expectedVuln = labels.Vulns.First(v => v.Id == vulnId);

        // Act
        var result = await _scanner.ScanAsync(serviceId);
        var actualVuln = result.Findings.First(f => f.VulnId == vulnId);

        // Assert
        Assert.Equal(expectedLevel, actualVuln.ReachabilityLevel);
        Assert.NotEmpty(actualVuln.Evidence);
    }
}
```
### 9.2 Testcontainers Pattern
```csharp
public class PostgresFixture : IAsyncLifetime
{
    private PostgreSqlContainer? _container;

    public string ConnectionString { get; private set; } = null!;

    public async Task InitializeAsync()
    {
        _container = new PostgreSqlBuilder()
            .WithImage("postgres:16-alpine")
            .WithDatabase("stellaops_test")
            .WithUsername("test")
            .WithPassword("test")
            .Build();
        await _container.StartAsync();
        ConnectionString = _container.GetConnectionString();

        // Run migrations
        await RunMigrations(ConnectionString);
    }

    public async Task DisposeAsync()
    {
        if (_container != null)
            await _container.DisposeAsync();
    }
}
```
## 10. FIXTURE ORGANIZATION
```
tests/
  fixtures/
    sca/
      catalogue/
        FC001_openssl_version_range/
          inputs.lock
          sbom.cdx.json
          expected_findings.json
          dsse_manifest.json
  acceptance/
    packs/
      guardrails/
        AT001_reachability_present/
        AT002_reachability_loaded/
        AT003_reachability_executed/
      micro/
      motion/
      error/
      offline/
toys/
  svc-01-password-reset/
    app/
    infra/
    tests/
    labels.yaml
    evidence/
    fix/
```
## 11. DETERMINISTIC TEST REQUIREMENTS
### 11.1 Time Handling
- Freeze timers to `2025-12-04T12:00:00Z` in stories/e2e
- Use `FakeTimeProvider` in .NET tests
- Playwright: fake timers via the `page.clock` API
### 11.2 Random Number Generation
- Seed RNG with `0x5EED2025` unless scenario-specific
- Never use `Random()` without explicit seed
### 11.3 Network Isolation
- No network calls in test execution
- Offline assets bundled
- Testcontainers for external dependencies
- Mock external APIs
### 11.4 Snapshot Testing
- All fixtures stored under `tests/fixtures/`
- Golden outputs checked into git
- Stable ordering for arrays/objects
- Strip volatile fields (timestamps, UUIDs) unless semantic
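The redaction and ordering rules above can be sketched as a recursive pass over the snapshot object (illustrative Python; the volatile-key list is an assumption):

```python
import json

# Illustrative volatile-field list; real snapshots may pin a different set
VOLATILE_KEYS = {"ts", "timestamp", "uuid", "correlation_id"}

def redact(obj):
    """Strip volatile fields and sort map keys so snapshots diff deterministically."""
    if isinstance(obj, dict):
        return {k: redact(v) for k, v in sorted(obj.items()) if k not in VOLATILE_KEYS}
    if isinstance(obj, list):
        return [redact(v) for v in obj]
    return obj

snap = redact({
    "ts": "2025-12-22T07:59:15Z",
    "findings": [{"cve": "CVE-2021-1", "uuid": "deadbeef"}],
})
print(json.dumps(snap, sort_keys=True))
```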
## 12. COVERAGE REQUIREMENTS
### 12.1 Unit Tests
- **Target**: ≥85% line coverage for core modules
- **Critical paths**: 100% coverage required
- **Exceptions**: UI glue code, generated code
### 12.2 Integration Tests
- **Database operations**: All repositories tested with Testcontainers
- **API endpoints**: All endpoints tested with WebApplicationFactory
- **External integrations**: Mocked or stubbed
### 12.3 End-to-End Tests
- **Critical workflows**: User registration → scan → triage → decision
- **Happy paths**: All major features
- **Error paths**: Authentication failures, network errors, data validation
## 13. PERFORMANCE TESTING
### 13.1 Benchmark Tests
```csharp
[MemoryDiagnoser]
public class ScannerBenchmarks
{
    // _scanner, _reachability, and _testGraph are initialized in a [GlobalSetup] method (omitted)

    [Benchmark]
    public async Task ScanMediumImage()
    {
        // 100k LOC .NET service
        await _scanner.ScanAsync("medium-service");
    }

    [Benchmark]
    public async Task ComputeReachability()
    {
        await _reachability.ComputeAsync(_testGraph);
    }
}
```
### 13.2 Performance Targets
| Operation | Target |
|-----------|--------|
| Medium service scan | < 2 minutes |
| Reachability compute | < 30 seconds |
| Finding query (GET) | < 200 ms p95 |
| SBOM ingestion | < 5 seconds |
## 14. MUTATION TESTING
### 14.1 Stryker Configuration
```json
{
  "stryker-config": {
    "mutate": [
      "src/**/*.cs",
      "!src/**/*.Designer.cs",
      "!src/**/Migrations/**"
    ],
    "test-runner": "dotnet",
    "threshold-high": 90,
    "threshold-low": 70,
    "threshold-break": 60
  }
}
```
### 14.2 Mutation Score Targets
- **Critical modules**: 90%
- **Standard modules**: 70%
- **Break build**: <60%
## 15. SECURITY TESTING
### 15.1 OWASP Top 10 Coverage
- [ ] SQL Injection
- [ ] XSS (Cross-Site Scripting)
- [ ] CSRF (Cross-Site Request Forgery)
- [ ] Authentication bypasses
- [ ] Authorization bypasses
- [ ] Sensitive data exposure
- [ ] XML External Entities (XXE)
- [ ] Broken Access Control
- [ ] Security Misconfiguration
- [ ] Insecure Deserialization
### 15.2 Dependency Scanning
```bash
# SBOM generation
dotnet sbom-tool generate -b ./bin -bc ./src -pn StellaOps -pv 1.0.0
# Vulnerability scanning
dotnet list package --vulnerable --include-transitive
```
## 16. CI/CD INTEGRATION
### 16.1 GitHub Actions Workflow
```yaml
name: Test
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-dotnet@v4
        with:
          dotnet-version: '10.0.x'
      - name: Restore dependencies
        run: dotnet restore
      - name: Build
        run: dotnet build --no-restore
      - name: Test
        run: dotnet test --no-build --verbosity normal --collect:"XPlat Code Coverage"
      - name: Upload coverage
        uses: codecov/codecov-action@v4
```
### 16.2 Quality Gates
- All tests pass
- Coverage ≥ 85%
- No high/critical vulnerabilities
- Mutation score ≥ 70%
- Performance regressions <10%
## 17. BENCH HARNESSES (SIGNED, REPRODUCIBLE METRICS)
Use the repo bench harness for moat-grade, reproducible comparisons and audit kits:
- Harness root: `bench/README.md`
- Signed finding bundles + verifiers live under `bench/findings/` and `bench/tools/`
- Baseline comparisons and rollups live under `bench/results/`
Guardrail:
- Any change to scanning/policy/proof logic must be covered by at least one deterministic bench scenario (or an extension of an existing one).
---
**Document Version**: 1.0
**Target Platform**: .NET 10, PostgreSQL 16, Angular v17