Add reference architecture and testing strategy documentation
- Created a new document for the Stella Ops Reference Architecture outlining the system's topology, trust boundaries, artifact association, and interfaces.
- Developed a comprehensive Testing Strategy document detailing the importance of offline readiness, interoperability, determinism, and operational guardrails.
- Introduced a README for the Testing Strategy, summarizing processing details and key concepts implemented.
- Added guidance for AI agents and developers in the tests directory, including directory structure, test categories, key patterns, and rules for test development.
# Testing and Quality Guardrails Technical Reference

**Source Advisories**:
- 29-Nov-2025 - Acceptance Tests Pack and Guardrails
- 29-Nov-2025 - SCA Failure Catalogue for StellaOps Tests
- 30-Nov-2025 - Ecosystem Reality Test Cases for StellaOps
- 14-Dec-2025 - Create a small ground‑truth corpus

**Last Updated**: 2025-12-14

---

## 1. ACCEPTANCE TEST PACK SCHEMA

### 1.1 Required Artifacts (MVP for DONE)

- Advisory summary under `docs/process/`
- Checklist stub referencing AT1–AT10
- Fixture pack path: `tests/acceptance/packs/guardrails/` (no network)
- Links into sprint tracker (`SPRINT_0300_0001_0001_documentation_process.md`)

### 1.2 Determinism & Offline

- Freeze scanner/db versions; record in `inputs.lock`
- All fixtures reproducible from seeds
- Include DSSE envelopes for pack manifests

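DSSE signs the pre-authentication encoding (PAE) of the payload type and payload bytes, not the raw JSON. Producer and verifier therefore have to build byte-identical input. The sketch below follows the PAE definition in the DSSE v1 specification. The `DssePack` class name, the ephemeral ECDSA P-256 key, and SHA-256 are illustrative assumptions, not the project's actual signing setup.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Illustrative sketch: DSSE pre-authentication encoding (PAE) plus an ECDSA signature.
// DssePack is a hypothetical helper; real keys would come from the project's signing
// infrastructure, not from an ephemeral in-memory key.
public static class DssePack
{
    // PAE(type, body) = "DSSEv1" SP LEN(type) SP type SP LEN(body) SP body,
    // where LEN is the ASCII decimal byte length.
    public static byte[] Pae(string payloadType, byte[] payload)
    {
        byte[] typeBytes = Encoding.UTF8.GetBytes(payloadType);
        string header = $"DSSEv1 {typeBytes.Length} {payloadType} {payload.Length} ";
        byte[] headerBytes = Encoding.UTF8.GetBytes(header);

        var result = new byte[headerBytes.Length + payload.Length];
        Buffer.BlockCopy(headerBytes, 0, result, 0, headerBytes.Length);
        Buffer.BlockCopy(payload, 0, result, headerBytes.Length, payload.Length);
        return result;
    }

    // Signs the PAE bytes, never the payload alone.
    public static byte[] Sign(ECDsa key, string payloadType, byte[] payload) =>
        key.SignData(Pae(payloadType, payload), HashAlgorithmName.SHA256);

    // Recomputes PAE from the received type and payload and checks the signature.
    public static bool Verify(ECDsa key, string payloadType, byte[] payload, byte[] signature) =>
        key.VerifyData(Pae(payloadType, payload), signature, HashAlgorithmName.SHA256);
}
```

In the envelope, `payload` and each `signatures[].sig` are base64-encoded. A verifier decodes `payload` and recomputes PAE from those exact bytes instead of re-serializing the manifest.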
## 2. SCA FAILURE CATALOGUE (FC1-FC10)

### 2.1 Required Artifacts

- Catalogue plus fixture pack root: `tests/fixtures/sca/catalogue/`
- Sprint Execution Log entry when published

### 2.2 Fixture Requirements

- Pin scanner versions and feeds
- Include `inputs.lock` and DSSE manifest per case
- Normalize results (ordering, casing) for stable comparisons

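One way to meet the normalization rule is to canonicalize every finding before comparing it with `expected_findings.json`. The approach has two steps:

1. Fold case on identifiers where case carries no meaning.
2. Sort with an ordinal comparer, so the result depends on neither culture nor scanner output order.

The `Finding` record and `FindingNormalizer` are hypothetical. Which fields are safe to case-fold is also an assumption to adjust per ecosystem, since some package naming schemes are case-sensitive.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical finding shape used only for this sketch.
public sealed record Finding(string VulnId, string Package, string Version, string Severity);

public static class FindingNormalizer
{
    // Assumption: vuln IDs, package names and severities compare case-insensitively in
    // these fixtures. Vuln IDs are upper-cased so "cve-2024-1" and "CVE-2024-1" match.
    // Versions are left untouched because their case can be significant.
    public static IReadOnlyList<Finding> Normalize(IEnumerable<Finding> findings) =>
        findings
            .Select(f => f with
            {
                VulnId = f.VulnId.Trim().ToUpperInvariant(),
                Package = f.Package.Trim().ToLowerInvariant(),
                Severity = f.Severity.Trim().ToLowerInvariant(),
            })
            .Distinct() // record value equality removes duplicate reports
            .OrderBy(f => f.VulnId, StringComparer.Ordinal)
            .ThenBy(f => f.Package, StringComparer.Ordinal)
            .ThenBy(f => f.Version, StringComparer.Ordinal)
            .ToList();
}
```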
## 3. ECOSYSTEM REALITY TEST CASES (ET1-ET10)

**Fixture Path**: `tests/fixtures/sca/catalogue/`

**Requirements**:
- Map each incident to acceptance tests and fixture paths
- Pin tool versions and feeds; no live network
- Populate fixtures and acceptance specs

## 4. GROUND-TRUTH CORPUS SCHEMA

### 4.1 Service Structure

Each service lives under `/toys/svc-XX-<name>/` and contains:

```
app/
infra/          # Dockerfile, compose, network policy
tests/          # positive + negative reachability tests
labels.yaml     # ground truth
evidence/       # generated by tests (trace, tags, manifests)
fix/            # minimal patch proving remediation
```

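A service PR can be checked against this layout before any heavier validation runs. The sketch reports missing entries in ordinal order, so the failure message is itself deterministic. `ServiceLayout.FindMissing` is a hypothetical helper. It treats every listed entry as mandatory, which is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Sketch of a layout check for one corpus service directory
// (e.g. toys/svc-01-password-reset). The entry lists mirror the structure above.
public static class ServiceLayout
{
    private static readonly string[] RequiredDirectories = { "app", "infra", "tests", "evidence", "fix" };
    private static readonly string[] RequiredFiles = { "labels.yaml" };

    // Returns missing entries in ordinal order; an empty list means the layout is complete.
    public static IReadOnlyList<string> FindMissing(string serviceRoot)
    {
        var missing = RequiredDirectories
            .Where(d => !Directory.Exists(Path.Combine(serviceRoot, d)))
            .Select(d => d + "/")
            .Concat(RequiredFiles.Where(f => !File.Exists(Path.Combine(serviceRoot, f))));
        return missing.OrderBy(m => m, StringComparer.Ordinal).ToList();
    }
}
```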
### 4.2 labels.yaml Schema

```yaml
service: svc-01-password-reset
vulns:
  - id: V1
    cve: CVE-2022-XXXXX
    type: dep_runtime|dep_build|code|config|os_pkg|supply_chain
    package: string
    version: string
    reachable: true|false
    reachability_level: R0|R1|R2|R3|R4
    entrypoint: string          # route:/reset, topic:jobs, cli:command
    preconditions: [string]     # flags/env/auth
    path_tags: [string]
    proof:
      artifacts: [string]
      tags: [string]
    fix:
      type: upgrade|config|code
      patch_path: string
      expected_delta: string
    negative_proof: string      # if unreachable
```

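Several fields in this schema depend on each other. A typed model with a few cross-field checks catches mistakes that a per-field schema cannot. The sketch models one `vulns` entry after YAML parsing (the parsing itself is out of scope). The rules are assumptions inferred from this document:

- R2 and above count as reachable and R0/R1 do not, matching the Overreach/Underreach rows of the metrics table.
- An unreachable claim needs `negative_proof`.
- A reachable claim needs proof artifacts.

`VulnLabel` and `LabelRules` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Redeclared here so the sketch is self-contained.
public enum ReachabilityLevel { R0, R1, R2, R3, R4 }

// Hypothetical in-memory shape of one `vulns` entry (only the fields the checks need).
public sealed record VulnLabel(
    string Id,
    bool Reachable,
    ReachabilityLevel Level,
    IReadOnlyList<string> ProofArtifacts,
    string? NegativeProof);

public static class LabelRules
{
    // Assumption: "reachable" means executed (R2) or stronger.
    public static bool IsReachableTier(ReachabilityLevel level) => level >= ReachabilityLevel.R2;

    // Returns human-readable violations; an empty list means the entry is consistent.
    public static IReadOnlyList<string> Validate(VulnLabel v)
    {
        var errors = new List<string>();
        if (v.Reachable != IsReachableTier(v.Level))
            errors.Add($"{v.Id}: reachable={v.Reachable} contradicts {v.Level}");
        if (!v.Reachable && string.IsNullOrWhiteSpace(v.NegativeProof))
            errors.Add($"{v.Id}: unreachable claim needs negative_proof");
        if (v.Reachable && v.ProofArtifacts.Count == 0)
            errors.Add($"{v.Id}: reachable claim needs proof.artifacts");
        return errors;
    }
}
```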
### 4.3 Reachability Tiers

- **R0 Present**: component exists in SBOM, not imported/loaded
- **R1 Loaded**: imported/linked/initialized, no executed path
- **R2 Executed**: vulnerable function executed (deterministic trace)
- **R3 Tainted execution**: execution with externally influenced input
- **R4 Exploitable**: controlled, non-harmful PoC (optional)

### 4.4 Evidence Requirements per Tier

- **R0**: SBOM + file hash/package metadata
- **R1**: runtime startup logs or module load trace tag
- **R2**: callsite tag + stack trace snippet
- **R3**: R2 + taint marker showing external data reached call
- **R4**: only if safe/necessary; non-weaponized, sandboxed

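The tag-based half of these requirements can be checked mechanically against a trace's tag list, using the canonical prefixes from section 4.5:

- R2 needs at least one `TAG:call:` tag.
- R3 additionally needs a `TAG:taint:` tag.

Stack-trace snippets, SBOM metadata, load logs, and PoCs are not tags and are out of scope here, so R0, R1, and R4 are reported as not tag-checkable. `TierEvidence` and the three-valued `EvidenceVerdict` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Redeclared here so the sketch is self-contained.
public enum ReachabilityLevel { R0, R1, R2, R3, R4 }

public enum EvidenceVerdict { Satisfied, Missing, NotTagBased }

public static class TierEvidence
{
    public static EvidenceVerdict Check(ReachabilityLevel tier, IEnumerable<string> tags)
    {
        var tagList = tags.ToList();
        bool hasCall = tagList.Any(t => t.StartsWith("TAG:call:", StringComparison.Ordinal));
        bool hasTaint = tagList.Any(t => t.StartsWith("TAG:taint:", StringComparison.Ordinal));

        return tier switch
        {
            // R2: the vulnerable callsite must be tagged.
            ReachabilityLevel.R2 => hasCall ? EvidenceVerdict.Satisfied : EvidenceVerdict.Missing,
            // R3: R2 evidence plus a taint marker for the external input.
            ReachabilityLevel.R3 => hasCall && hasTaint ? EvidenceVerdict.Satisfied : EvidenceVerdict.Missing,
            // R0/R1/R4 evidence is not expressed as trace tags in this sketch.
            _ => EvidenceVerdict.NotTagBased,
        };
    }
}
```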
### 4.5 Canonical Tag Format

```
TAG:route:<method> <path>
TAG:topic:<name>
TAG:call:<sink>
TAG:taint:<boundary>
TAG:flag:<name>=<value>
```

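Tags are plain strings, so a small parser that enforces the five shapes above stops malformed tags from passing as evidence. For example, `TAG:route:/reset` is missing its method. `TagParser` and `EvidenceTag` are hypothetical names. The checks enforce only the separators shown in the format (a space in `route`, an `=` in `flag`), not whether methods, paths, or sink names are valid.

```csharp
using System;

// Hypothetical parsed form of a canonical tag.
public sealed record EvidenceTag(string Kind, string Value);

public static class TagParser
{
    private const string Prefix = "TAG:";

    // Parses "TAG:<kind>:<value>" and applies the per-kind shape rules from the format list.
    public static bool TryParse(string raw, out EvidenceTag? tag)
    {
        tag = null;
        if (!raw.StartsWith(Prefix, StringComparison.Ordinal)) return false;

        string rest = raw.Substring(Prefix.Length);
        int colon = rest.IndexOf(':');
        if (colon <= 0 || colon == rest.Length - 1) return false;

        string kind = rest.Substring(0, colon);
        string value = rest.Substring(colon + 1); // the value itself may contain ':'

        bool valid = kind switch
        {
            // "<method> <path>": a space that is neither first nor last
            "route" => value.IndexOf(' ') > 0 && value.IndexOf(' ') < value.Length - 1,
            // "<name>=<value>": a non-empty name before '='
            "flag" => value.IndexOf('=') > 0,
            "topic" or "call" or "taint" => true,
            _ => false,
        };
        if (!valid) return false;

        tag = new EvidenceTag(kind, value);
        return true;
    }
}
```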
### 4.6 Evidence Artifact Schema

**evidence/trace.json**:
```json
{
  "ts": "UTC ISO-8601",
  "corr": "correlation-id",
  "tags": ["TAG:route:POST /reset", "TAG:taint:http.body.email", "TAG:call:Crypto.MD5"]
}
```

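A test can emit this record with `System.Text.Json`. When the caller passes in the timestamp (for example, read from a frozen `TimeProvider`), the file is byte-stable across runs. `TraceRecord` and `TraceWriter.Serialize` are hypothetical names. Writing tags in the order given, without sorting, is an assumption that preserves the route, taint, call flow shown in the example.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical DTO mirroring evidence/trace.json.
public sealed record TraceRecord(
    [property: JsonPropertyName("ts")] string Ts,
    [property: JsonPropertyName("corr")] string Corr,
    [property: JsonPropertyName("tags")] IReadOnlyList<string> Tags);

public static class TraceWriter
{
    public static string Serialize(DateTimeOffset at, string correlationId, IReadOnlyList<string> tags)
    {
        // UTC ISO-8601 with a literal 'Z'; the invariant culture avoids locale-dependent output.
        string ts = at.UtcDateTime.ToString("yyyy-MM-dd'T'HH:mm:ss'Z'", CultureInfo.InvariantCulture);
        return JsonSerializer.Serialize(new TraceRecord(ts, correlationId, tags));
    }
}
```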
### 4.7 Evidence Manifest

**evidence/manifest.json**:
```json
{
  "git_sha": "string",
  "image_digest": "string",
  "tool_versions": {"scanner": "string", "db": "string"},
  "timestamps": {"started_at": "UTC ISO-8601", "completed_at": "UTC ISO-8601"},
  "evidence_hashes": {"trace.json": "sha256:...", "tags.log": "sha256:..."}
}
```

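The `evidence_hashes` map can be built by hashing each evidence file with SHA-256 and prefixing the lowercase hex digest with `sha256:`, as in the schema above. The sketch makes three assumptions:

- Evidence files sit directly under `evidence/`.
- `manifest.json` itself is skipped, since it cannot contain its own hash.
- An ordinal `SortedDictionary` is used so the map serializes in the same key order every run.

`EvidenceHashes` is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

public static class EvidenceHashes
{
    // Formats a digest as "sha256:<lowercase hex>".
    public static string Sha256Tag(byte[] content) =>
        "sha256:" + Convert.ToHexString(SHA256.HashData(content)).ToLowerInvariant();

    // Hashes every file directly under evidenceDir except the manifest,
    // keyed by file name in ordinal order.
    public static SortedDictionary<string, string> Compute(string evidenceDir)
    {
        var result = new SortedDictionary<string, string>(StringComparer.Ordinal);
        foreach (string path in Directory.EnumerateFiles(evidenceDir))
        {
            string name = Path.GetFileName(path);
            if (name == "manifest.json") continue;
            result[name] = Sha256Tag(File.ReadAllBytes(path));
        }
        return result;
    }
}
```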
## 5. CORE TEST METRICS

| Metric | Definition |
|--------|------------|
| Recall (by class) | % of labeled vulns detected (runtime deps, OS pkgs, code, config) |
| Precision | Share of reported findings that are true positives: TP / (TP + FP), i.e. 1 - false discovery rate |
| Reachability accuracy | % correct R0/R1/R2/R3 classifications |
| Overreach | Predicted reachable but labeled R0/R1 |
| Underreach | Labeled R2/R3 but predicted non-reachable |
| TTFS | Time-to-first-signal (first evidence-backed blocking issue) |
| Fix validation | % of applied fixes producing expected delta |

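The reachability metrics can be computed from (labeled tier, predicted tier) pairs. The sketch treats R2 and above as reachable, which matches the Overreach and Underreach rows. It normalizes Overreach by the number of R0/R1 labels and Underreach by the number of R2+ labels, so the results are rates comparable with the thresholds in section 6. That normalization is an assumption. `Scored` and `CorpusMetrics` are hypothetical names. An empty denominator yields `NaN`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Redeclared here so the sketch is self-contained.
public enum ReachabilityLevel { R0, R1, R2, R3, R4 }

// One labeled vuln paired with the scanner's output (null = not reported at all).
public sealed record Scored(ReachabilityLevel Labeled, ReachabilityLevel? Predicted);

public static class CorpusMetrics
{
    private static bool Reachable(ReachabilityLevel l) => l >= ReachabilityLevel.R2;

    // Recall: share of labeled vulns the scanner reported.
    public static double Recall(IReadOnlyList<Scored> s) =>
        s.Count(x => x.Predicted is not null) / (double)s.Count;

    // Precision = TP / (TP + FP); false positives are reported findings with no label.
    public static double Precision(int truePositives, int falsePositives) =>
        truePositives / (double)(truePositives + falsePositives);

    // Reachability accuracy: exact tier match.
    public static double Accuracy(IReadOnlyList<Scored> s) =>
        s.Count(x => x.Predicted == x.Labeled) / (double)s.Count;

    // Overreach rate: among labeled R0/R1, share predicted reachable.
    public static double Overreach(IReadOnlyList<Scored> s)
    {
        var labeledUnreachable = s.Where(x => !Reachable(x.Labeled)).ToList();
        return labeledUnreachable.Count(x => x.Predicted is { } p && Reachable(p))
               / (double)labeledUnreachable.Count;
    }

    // Underreach rate: among labeled R2+, share predicted non-reachable or missed.
    public static double Underreach(IReadOnlyList<Scored> s)
    {
        var labeledReachable = s.Where(x => Reachable(x.Labeled)).ToList();
        return labeledReachable.Count(x => x.Predicted is not { } p || !Reachable(p))
               / (double)labeledReachable.Count;
    }
}
```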
## 6. TEST QUALITY GATES (CI ENFORCEMENT THRESHOLDS)

```yaml
# Values are quoted: an unquoted ">= 0.95" starts a YAML folded block scalar and fails to parse.
thresholds:
  runtime_dependency_recall: ">= 0.95"
  unreachable_false_positives: "<= 0.05"
  reachability_underreport: "<= 0.10"
  ttfs_regression: "<= +10% vs main"
  fix_validation_pass_rate: "100%"
```

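A CI step can evaluate these gates straight from the computed metrics and fail with every violated threshold listed, not just the first one. In this sketch `RunMetrics` and `QualityGates` are hypothetical, the rates are fractions in [0, 1], and the TTFS gate is read as "at most 10% slower than the TTFS measured on main".

```csharp
using System;
using System.Collections.Generic;

// Hypothetical snapshot of one CI run's corpus metrics.
public sealed record RunMetrics(
    double RuntimeDependencyRecall,
    double UnreachableFalsePositives,
    double ReachabilityUnderreport,
    TimeSpan Ttfs,
    double FixValidationPassRate);

public static class QualityGates
{
    // Returns every violated gate; an empty list means the run passes.
    public static IReadOnlyList<string> Evaluate(RunMetrics run, TimeSpan mainTtfs)
    {
        var failures = new List<string>();
        if (run.RuntimeDependencyRecall < 0.95) failures.Add("runtime_dependency_recall < 0.95");
        if (run.UnreachableFalsePositives > 0.05) failures.Add("unreachable_false_positives > 0.05");
        if (run.ReachabilityUnderreport > 0.10) failures.Add("reachability_underreport > 0.10");
        if (run.Ttfs > mainTtfs * 1.10) failures.Add("ttfs_regression > +10% vs main");
        if (run.FixValidationPassRate < 1.0) failures.Add("fix_validation_pass_rate < 100%");
        return failures;
    }
}
```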
## 7. SERVICE DEFINITION OF DONE

A service PR is DONE only if it includes:

- [ ] `labels.yaml` validated by `schemas/labels.schema.json`
- [ ] Docker build reproducible (digest pinned, lockfiles committed)
- [ ] Positive tests generating evidence proving reachability tiers
- [ ] Negative tests proving "unreachable" claims
- [ ] `fix/` patch that removes or mitigates the weakness with a measurable delta
- [ ] `evidence/manifest.json` capturing tool versions, git SHA, image digest, timestamps, evidence hashes

## 8. REVIEWER REJECTION CRITERIA

Reject the PR if any of these checks fails:

- [ ] Labels complete, schema-valid, stable IDs preserved
- [ ] Proof artifacts deterministic and generated by tests
- [ ] Reachability tier justified and matches evidence
- [ ] Unreachable claims have negative proofs
- [ ] Docker build uses pinned digests + committed lockfiles
- [ ] `fix/` produces measurable delta without new unlabeled issues
- [ ] No network egress required; tests hermetic

## 9. TEST HARNESS PATTERNS

### 9.1 xUnit Test Template

```csharp
using System.Threading.Tasks;
using Xunit;

public class ReachabilityAcceptanceTests : IClassFixture<PostgresFixture>
{
    private readonly PostgresFixture _db;
    private readonly IScanner _scanner; // project scanner abstraction

    public ReachabilityAcceptanceTests(PostgresFixture db)
    {
        _db = db;
        _scanner = CreateScanner(db.ConnectionString); // hypothetical factory; wiring is project-specific
    }

    [Theory]
    [InlineData("svc-01-password-reset", "V1", ReachabilityLevel.R2)]
    [InlineData("svc-02-file-upload", "V1", ReachabilityLevel.R0)]
    public async Task VerifyReachabilityClassification(
        string serviceId,
        string vulnId,
        ReachabilityLevel expectedLevel)
    {
        // Arrange: ground truth comes from the service's labels.yaml (LoadLabels: project helper)
        var labels = await LoadLabels($"toys/{serviceId}/labels.yaml");
        var expectedVuln = Assert.Single(labels.Vulns, v => v.Id == vulnId);
        Assert.Equal(expectedLevel, expectedVuln.ReachabilityLevel); // keep InlineData in sync with labels

        // Act
        var result = await _scanner.ScanAsync(serviceId);
        var actualVuln = Assert.Single(result.Findings, f => f.VulnId == vulnId);

        // Assert: the tier matches ground truth and is backed by evidence
        Assert.Equal(expectedLevel, actualVuln.ReachabilityLevel);
        Assert.NotEmpty(actualVuln.Evidence);
    }
}
```

### 9.2 Testcontainers Pattern

```csharp
using System.Threading.Tasks;
using Testcontainers.PostgreSql;
using Xunit;

public class PostgresFixture : IAsyncLifetime
{
    private PostgreSqlContainer? _container;
    public string ConnectionString { get; private set; } = null!;

    public async Task InitializeAsync()
    {
        _container = new PostgreSqlBuilder()
            .WithImage("postgres:16-alpine") // pin by digest for reproducible runs
            .WithDatabase("stellaops_test")
            .WithUsername("test")
            .WithPassword("test")
            .Build();

        await _container.StartAsync();
        ConnectionString = _container.GetConnectionString();

        // Run migrations (project helper, not shown)
        await RunMigrations(ConnectionString);
    }

    public async Task DisposeAsync()
    {
        if (_container != null)
            await _container.DisposeAsync();
    }
}
```

## 10. FIXTURE ORGANIZATION

```
tests/
  fixtures/
    sca/
      catalogue/
        FC001_openssl_version_range/
          inputs.lock
          sbom.cdx.json
          expected_findings.json
          dsse_manifest.json
  acceptance/
    packs/
      guardrails/
        AT001_reachability_present/
        AT002_reachability_loaded/
        AT003_reachability_executed/
        micro/
        motion/
        error/
        offline/
toys/
  svc-01-password-reset/
    app/
    infra/
    tests/
    labels.yaml
    evidence/
    fix/
```

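File-system enumeration order is not guaranteed, so test data sources built from this tree should sort explicitly. The sketch enumerates catalogue cases in ordinal order and fails fast if a case lacks one of the four files shown for `FC001_openssl_version_range`. `CatalogueCases` is a hypothetical name, and requiring all four files in every case is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class CatalogueCases
{
    private static readonly string[] RequiredFiles =
        { "inputs.lock", "sbom.cdx.json", "expected_findings.json", "dsse_manifest.json" };

    // Yields one xUnit data row (case directory name) per case, in ordinal order.
    public static IEnumerable<object[]> All(string catalogueRoot)
    {
        foreach (string dir in Directory.GetDirectories(catalogueRoot)
                                        .OrderBy(d => d, StringComparer.Ordinal))
        {
            var missing = RequiredFiles.Where(f => !File.Exists(Path.Combine(dir, f))).ToList();
            if (missing.Count > 0)
                throw new InvalidOperationException(
                    $"{Path.GetFileName(dir)} is missing: {string.Join(", ", missing)}");

            yield return new object[] { Path.GetFileName(dir) };
        }
    }
}
```

In a test class, this could back a theory via `[MemberData(nameof(CatalogueCases.All), "tests/fixtures/sca/catalogue", MemberType = typeof(CatalogueCases))]`. xUnit passes the extra arguments to the static method.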
## 11. DETERMINISTIC TEST REQUIREMENTS

### 11.1 Time Handling

- Freeze timers to `2025-12-04T12:00:00Z` in stories/e2e
- Use `FakeTimeProvider` in .NET tests
- Playwright: freeze time with the `page.clock` API (e.g. `page.clock.setFixedTime`); `useFakeTimers` is the Jest/Vitest equivalent for unit tests

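Freezing time only works if production code reads the clock through an injected `TimeProvider` instead of `DateTime.UtcNow`. The sketch shows a TTFS measurement written that way. It uses a small `ManualTimeProvider` as a dependency-free stand-in for `FakeTimeProvider`. `TtfsStopwatch`, `ManualTimeProvider`, and `FrozenClock` are hypothetical names.

```csharp
using System;

// Stand-in for Microsoft.Extensions.Time.Testing.FakeTimeProvider, so this sketch has no
// package dependency; FakeTimeProvider offers the same idea via its Advance method.
public sealed class ManualTimeProvider(DateTimeOffset start) : TimeProvider
{
    private DateTimeOffset _now = start;
    public override DateTimeOffset GetUtcNow() => _now;
    public void Advance(TimeSpan by) => _now += by;
}

// Hypothetical component: measures time-to-first-signal from the injected clock,
// so tests control every reading.
public sealed class TtfsStopwatch(TimeProvider clock)
{
    private readonly DateTimeOffset _startedAt = clock.GetUtcNow();
    private TimeSpan? _firstSignal;

    // Only the first signal is recorded; later calls are ignored.
    public void RecordSignal() => _firstSignal ??= clock.GetUtcNow() - _startedAt;
    public TimeSpan? FirstSignal => _firstSignal;
}

public static class FrozenClock
{
    // The frozen instant this document prescribes for stories/e2e.
    public static readonly DateTimeOffset Epoch = new(2025, 12, 4, 12, 0, 0, TimeSpan.Zero);
}
```

With the real package (`Microsoft.Extensions.TimeProvider.Testing`), a test would construct `new FakeTimeProvider(FrozenClock.Epoch)` and call `Advance` the same way.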
### 11.2 Random Number Generation

- Seed RNG with `0x5EED2025` unless scenario-specific
- Never use `new Random()` without an explicit seed

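Routing every generator through one factory makes the default seed hard to bypass. `TestRandom` is a hypothetical helper. The .NET documentation states that the `System.Random` algorithm is not guaranteed to stay the same across major .NET versions. Golden files should therefore store the generated values rather than rely on regenerating them from the seed after an upgrade.

```csharp
using System;
using System.Linq;

public static class TestRandom
{
    // Default seed from this document; scenario-specific seeds are passed explicitly.
    public const int DefaultSeed = 0x5EED2025;

    // Same seed => same sequence within a given .NET version.
    public static Random Create(int seed = DefaultSeed) => new Random(seed);

    // Example fixture generator: deterministic pseudo-random package names.
    public static string[] PackageNames(int count, int seed = DefaultSeed)
    {
        var rng = Create(seed);
        return Enumerable.Range(0, count)
            .Select(i => $"pkg-{i:D3}-{rng.Next(0, 10_000):D4}")
            .ToArray();
    }
}
```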
### 11.3 Network Isolation

- No network calls in test execution
- Offline assets bundled
- Testcontainers for external dependencies
- Mock external APIs

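A cheap way to enforce the no-network rule in .NET tests is to give every `HttpClient` a primary handler that refuses to send. Any call that escapes a mock then fails loudly instead of silently reaching the network. `NoNetworkHandler` is a hypothetical name for this sketch.

```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Guard handler: every outbound request fails with a descriptive exception.
public sealed class NoNetworkHandler : HttpMessageHandler
{
    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken) =>
        Task.FromException<HttpResponseMessage>(new InvalidOperationException(
            $"Network egress blocked in tests: {request.Method} {request.RequestUri}"));
}
```

With `Microsoft.Extensions.Http`, it can be registered for typed or named clients in test DI via `ConfigurePrimaryHttpMessageHandler(() => new NoNetworkHandler())`.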
### 11.4 Snapshot Testing

- All fixtures stored under `tests/fixtures/`
- Golden outputs checked into git
- Stable ordering for arrays/objects
- Strip volatile fields (timestamps, UUIDs) unless semantic

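Golden-file comparisons become stable when both sides are canonicalized first: object keys sorted ordinally and volatile fields removed. This sketch uses `System.Text.Json.Nodes`. `SnapshotCanonicalizer` is a hypothetical name. Each test supplies its own volatile key set, since which fields are semantic varies; that is an assumption. Array order is deliberately preserved, because reordering arrays can change meaning, so producers must already emit arrays in a stable order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json.Nodes;

public static class SnapshotCanonicalizer
{
    // Returns compact JSON with object keys in ordinal order and volatile keys removed.
    public static string Canonicalize(string json, ISet<string> volatileKeys) =>
        Normalize(JsonNode.Parse(json), volatileKeys)?.ToJsonString() ?? "null";

    private static JsonNode? Normalize(JsonNode? node, ISet<string> volatileKeys) => node switch
    {
        JsonObject obj => new JsonObject(obj
            .Where(kv => !volatileKeys.Contains(kv.Key))
            .OrderBy(kv => kv.Key, StringComparer.Ordinal)
            .Select(kv => KeyValuePair.Create(kv.Key, Normalize(kv.Value, volatileKeys)))),
        JsonArray arr => new JsonArray(arr.Select(item => Normalize(item, volatileKeys)).ToArray()),
        null => null,
        _ => node.DeepClone(), // scalar: copy so the node can be attached to the new tree
    };
}
```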
## 12. COVERAGE REQUIREMENTS

### 12.1 Unit Tests

- **Target**: ≥85% line coverage for core modules
- **Critical paths**: 100% coverage required
- **Exceptions**: UI glue code, generated code

### 12.2 Integration Tests

- **Database operations**: All repositories tested with Testcontainers
- **API endpoints**: All endpoints tested with WebApplicationFactory
- **External integrations**: Mocked or stubbed

### 12.3 End-to-End Tests

- **Critical workflows**: User registration → scan → triage → decision
- **Happy paths**: All major features
- **Error paths**: Authentication failures, network errors, data validation

## 13. PERFORMANCE TESTING

### 13.1 Benchmark Tests

```csharp
using System.Threading.Tasks;
using BenchmarkDotNet.Attributes;

[MemoryDiagnoser]
public class ScannerBenchmarks
{
    // Project types (hypothetical names); construction is project-specific.
    private IScanner _scanner = null!;
    private IReachabilityEngine _reachability = null!;
    private CallGraph _testGraph = null!;

    [GlobalSetup]
    public void Setup()
    {
        // Build the scanner, the reachability engine and a fixed test graph once,
        // outside the measured region.
    }

    [Benchmark]
    public async Task ScanMediumImage()
    {
        // 100k LOC .NET service
        await _scanner.ScanAsync("medium-service");
    }

    [Benchmark]
    public async Task ComputeReachability()
    {
        await _reachability.ComputeAsync(_testGraph);
    }
}
```

### 13.2 Performance Targets

| Operation | Target |
|-----------|--------|
| Medium service scan | < 2 minutes |
| Reachability compute | < 30 seconds |
| Query GET finding | < 200ms p95 |
| SBOM ingestion | < 5 seconds |

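Checking the "Query GET finding < 200ms p95" target requires a percentile definition. The sketch uses the nearest-rank method, which is one of several common definitions; interpolating methods give slightly different values on small samples. `Latency` is a hypothetical helper, and the samples are assumed to come from the benchmark or load-test run.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Latency
{
    // Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
    public static TimeSpan Percentile(IReadOnlyCollection<TimeSpan> samples, double p)
    {
        if (samples.Count == 0) throw new ArgumentException("no samples", nameof(samples));
        if (p <= 0 || p > 100) throw new ArgumentOutOfRangeException(nameof(p));

        var sorted = samples.OrderBy(s => s).ToArray();
        int rank = (int)Math.Ceiling(p * sorted.Length / 100.0); // 1-based rank
        return sorted[rank - 1];
    }

    // True when the p95 latency is strictly below the target.
    public static bool MeetsTarget(IReadOnlyCollection<TimeSpan> samples, TimeSpan p95Target) =>
        Percentile(samples, 95) < p95Target;
}
```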
## 14. MUTATION TESTING

### 14.1 Stryker Configuration

```json
{
  "stryker-config": {
    "mutate": [
      "src/**/*.cs",
      "!src/**/*.Designer.cs",
      "!src/**/Migrations/**"
    ],
    "thresholds": {
      "high": 90,
      "low": 70,
      "break": 60
    }
  }
}
```

### 14.2 Mutation Score Targets

- **Critical modules**: ≥90%
- **Standard modules**: ≥70%
- **Break build**: <60%

## 15. SECURITY TESTING

### 15.1 OWASP Top 10 Coverage

- [ ] SQL Injection
- [ ] XSS (Cross-Site Scripting)
- [ ] CSRF (Cross-Site Request Forgery)
- [ ] Authentication bypasses
- [ ] Authorization bypasses
- [ ] Sensitive data exposure
- [ ] XML External Entities (XXE)
- [ ] Broken Access Control
- [ ] Security Misconfiguration
- [ ] Insecure Deserialization

### 15.2 Dependency Scanning

```bash
# SBOM generation (sbom-tool also expects -ps <package supplier> and -nsb <namespace URI base>)
dotnet sbom-tool generate -b ./bin -bc ./src -pn StellaOps -pv 1.0.0

# Vulnerability scanning
dotnet list package --vulnerable --include-transitive
```

## 16. CI/CD INTEGRATION

### 16.1 GitHub Actions Workflow

```yaml
name: Test

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-dotnet@v4
        with:
          dotnet-version: '10.0.x'
      - name: Restore dependencies
        run: dotnet restore
      - name: Build
        run: dotnet build --no-restore
      - name: Test
        run: dotnet test --no-build --verbosity normal --collect:"XPlat Code Coverage"
      - name: Upload coverage
        uses: codecov/codecov-action@v4
```

### 16.2 Quality Gates

- All tests pass
- Coverage ≥85%
- No high/critical vulnerabilities
- Mutation score ≥70%
- Performance regressions <10%

## 17. BENCH HARNESSES (SIGNED, REPRODUCIBLE METRICS)

Use the repo bench harness for moat-grade, reproducible comparisons and audit kits:
- Harness root: `bench/README.md`
- Signed finding bundles + verifiers live under `bench/findings/` and `bench/tools/`
- Baseline comparisons and rollups live under `bench/results/`

Guardrail:
- Any change to scanning/policy/proof logic must be covered by at least one deterministic bench scenario (or an extension of an existing one).

---

**Document Version**: 1.0

**Target Platform**: .NET 10, PostgreSQL ≥16, Angular v17