Testing Quality Guardrails Implementation
Overview
This document provides the master implementation plan for the Testing Quality Guardrails system derived from the 14-Dec-2025 - Testing and Quality Guardrails Technical Reference.md product advisory.
Source Advisory: docs/product-advisories/14-Dec-2025 - Testing and Quality Guardrails Technical Reference.md
Implementation Status: Planning Complete, Execution Pending
Executive Summary
The Testing Quality Guardrails implementation addresses gaps between the product advisory and the current StellaOps codebase. After analysis, we identified 6 high-value items to implement across 4 focused sprints.
What We're Implementing
| Item | Sprint | Value | Effort |
|---|---|---|---|
| Reachability quality gates in CI | 0350 | HIGH | LOW |
| TTFS regression tracking | 0350 | HIGH | LOW |
| Performance SLO enforcement | 0350 | HIGH | LOW |
| SCA Failure Catalogue (FC6-FC10) | 0351 | HIGH | MEDIUM |
| Security testing (OWASP Top 10) | 0352 | HIGH | MEDIUM |
| Mutation testing (Stryker.NET) | 0353 | MEDIUM | MEDIUM |
What We're NOT Implementing (and Why)
| Item | Reason |
|---|---|
| toys/svc-XX/ directory restructure | Already have equivalent in tests/reachability/corpus/ |
| labels.yaml per-service format | Already have reachgraph.truth.json with same semantics |
| Canonical TAG format | Can adopt incrementally, not blocking |
| Fix validation 100% pass rate | Too rigid; changed to 90% for fixable cases |
| Automated reviewer rejection | Over-engineering; human judgment needed |
Sprint Roadmap
┌─────────────────────────────────────────────────────────────────┐
│                   TESTING QUALITY GUARDRAILS                    │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Sprint 0350                Sprint 0351                         │
│  CI Quality Gates           SCA Failure Catalogue               │
│  ┌─────────────┐            ┌─────────────┐                     │
│  │ Reachability│            │ FC6: Java   │                     │
│  │ TTFS        │            │ FC7: .NET   │                     │
│  │ Performance │            │ FC8: Docker │                     │
│  └─────────────┘            │ FC9: PURL   │                     │
│         │                   │ FC10: CVE   │                     │
│         │                   └─────────────┘                     │
│         │                          │                            │
│         └──────────┬───────────────┘                            │
│                    │                                            │
│                    ▼                                            │
│  Sprint 0352                Sprint 0353                         │
│  Security Testing           Mutation Testing                    │
│  ┌─────────────┐            ┌─────────────┐                     │
│  │ OWASP Top 10│            │ Stryker.NET │                     │
│  │ A01-A10     │            │ Scanner     │                     │
│  │ 50+ tests   │            │ Policy      │                     │
│  └─────────────┘            │ Authority   │                     │
│                             └─────────────┘                     │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
Execution Order
- Sprint 0350 and Sprint 0351 can run in parallel (no dependencies)
- Sprint 0352 can run in parallel with 0350/0351
- Sprint 0353 should start after 0352 (security tests should be stable first)
Estimated Duration
| Sprint | Tasks | Estimated Effort |
|---|---|---|
| 0350 | 10 | 2-3 developer-days |
| 0351 | 10 | 3-4 developer-days |
| 0352 | 10 | 4-5 developer-days |
| 0353 | 10 | 3-4 developer-days |
| Total | 40 | 12-16 developer-days |
Sprint Details
Sprint 0350: CI Quality Gates Foundation
File: docs/implplan/SPRINT_0350_0001_0001_ci_quality_gates_foundation.md
Objective: Connect existing test infrastructure to CI enforcement
Key Deliverables:
- scripts/ci/compute-reachability-metrics.sh - Compute recall/precision from corpus
- scripts/ci/reachability-thresholds.yaml - Enforcement thresholds
- scripts/ci/compute-ttfs-metrics.sh - TTFS extraction from test runs
- bench/baselines/ttfs-baseline.json - TTFS targets
- scripts/ci/enforce-performance-slos.sh - Performance SLO checks
- CI workflow modifications for quality gates
Quality Thresholds:
thresholds:
  runtime_dependency_recall: ">= 0.95"
  unreachable_false_positives: "<= 0.05"
  reachability_underreport: "<= 0.10"
  ttfs_regression: "<= +10% vs main"
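As an illustration of how threshold expressions such as `>= 0.95` or `<= +10% vs main` could be evaluated, here is a minimal Python sketch. It is not the planned enforcement script: the function names `parse_threshold` and `check` are hypothetical, and it assumes each metric has already been computed as a fraction (so a TTFS regression of 8% is passed in as `0.08`).

```python
import operator
import re

# Comparison operators that appear in the threshold expressions.
_OPS = {">=": operator.ge, "<=": operator.le}


def parse_threshold(expr: str):
    """Split an expression such as '>= 0.95' or '<= +10% vs main' into (compare, limit)."""
    match = re.match(r"\s*(>=|<=)\s*([+-]?\d+(?:\.\d+)?)(%?)", expr)
    if match is None:
        raise ValueError(f"unsupported threshold expression: {expr!r}")
    op, number, percent = match.groups()
    # Percentages are normalised to fractions so all metrics share one scale.
    limit = float(number) / 100 if percent else float(number)
    return _OPS[op], limit


def check(metrics: dict, thresholds: dict) -> list:
    """Return the names of the metrics that violate their threshold."""
    failures = []
    for name, expr in thresholds.items():
        compare, limit = parse_threshold(expr)
        if not compare(metrics[name], limit):
            failures.append(name)
    return failures


# Threshold values copied from the list above.
THRESHOLDS = {
    "runtime_dependency_recall": ">= 0.95",
    "unreachable_false_positives": "<= 0.05",
    "reachability_underreport": "<= 0.10",
    "ttfs_regression": "<= +10% vs main",
}
```

Calling `check(measured, THRESHOLDS)` with a dictionary of measured values returns the list of violated gates; an empty list means every gate passed.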
Sprint 0351: SCA Failure Catalogue Completion
File: docs/implplan/SPRINT_0351_0001_0001_sca_failure_catalogue_completion.md
Objective: Complete FC6-FC10 test cases for scanner regression testing
New Failure Cases:
| ID | Name | Failure Mode |
|---|---|---|
| FC6 | Java Shadow JAR | Shaded dependencies not detected |
| FC7 | .NET Transitive Pinning | CPM pins to vulnerable version |
| FC8 | Docker Multi-Stage Leakage | Build-time deps in runtime analysis |
| FC9 | PURL Namespace Collision | npm vs pypi same package name |
| FC10 | CVE Split/Merge | Single vuln with multiple CVE IDs |
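To make the FC9 failure mode concrete, the following Python sketch shows why matching on the package name alone collides across ecosystems while matching on the purl type plus name does not. The helper `purl_identity` is hypothetical and deliberately simplified (no percent-decoding or per-type normalisation), so it should be read as an illustration rather than a spec-complete purl parser or the scanner's actual logic.

```python
def purl_identity(purl: str) -> tuple:
    """Return (type, namespace, name) for a purl, ignoring version, qualifiers, and subpath.

    Simplified for illustration: no percent-decoding or type-specific rules.
    """
    if not purl.startswith("pkg:"):
        raise ValueError(f"not a purl: {purl!r}")
    # Drop the subpath (#...) and qualifiers (?...) before looking at the path.
    body = purl[len("pkg:"):].split("#", 1)[0].split("?", 1)[0]
    ecosystem, _, path = body.partition("/")
    # The version follows the last '@' in the remaining path.
    path = path.rsplit("@", 1)[0]
    *namespace, name = path.split("/")
    return ecosystem.lower(), "/".join(namespace), name


def same_name(a: str, b: str) -> bool:
    """Naive comparison that reproduces the FC9 collision: the name only."""
    return purl_identity(a)[2] == purl_identity(b)[2]


def same_package(a: str, b: str) -> bool:
    """Comparison that keeps ecosystems apart: type, namespace, and name."""
    return purl_identity(a) == purl_identity(b)
```

With `pkg:npm/requests@2.0.0` and `pkg:pypi/requests@2.31.0`, `same_name` reports a match while `same_package` does not, which is the distinction an FC9 fixture would need to exercise.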
Key Deliverables:
- 5 new fixture directories under tests/fixtures/sca/catalogue/
- DSSE manifests for integrity verification
- xUnit test project for failure catalogue
- Updated documentation
Sprint 0352: Security Testing Framework
File: docs/implplan/SPRINT_0352_0001_0001_security_testing_framework.md
Objective: Systematic OWASP Top 10 coverage for StellaOps
Coverage Matrix:
| OWASP | Category | Test Count |
|---|---|---|
| A01 | Broken Access Control | 8+ |
| A02 | Cryptographic Failures | 6+ |
| A03 | Injection | 10+ |
| A05 | Security Misconfiguration | 6+ |
| A07 | Authentication Failures | 8+ |
| A08 | Integrity Failures | 5+ |
| A10 | SSRF | 8+ |
| Total | 7 categories | 50+ tests |
Key Deliverables:
- tests/security/StellaOps.Security.Tests/ - Security test project
- MaliciousPayloads.cs - Common attack patterns
- SecurityTestBase.cs - Test infrastructure
- .gitea/workflows/security-tests.yml - Dedicated CI workflow
- docs/testing/security-testing-guide.md - Documentation
Sprint 0353: Mutation Testing Integration
File: docs/implplan/SPRINT_0353_0001_0001_mutation_testing_integration.md
Objective: Measure test suite effectiveness with Stryker.NET
Target Modules:
| Module | Threshold (Break) | Threshold (High) |
|---|---|---|
| Scanner.Core | 60% | 85% |
| Policy.Engine | 60% | 85% |
| Authority.Core | 65% | 90% |
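The two thresholds in the table can be read as a three-way classification of a module's mutation score. The Python sketch below assumes that a score below the break threshold fails the run and a score at or above the high threshold counts as healthy; the names `MODULE_THRESHOLDS` and `classify` are hypothetical and are not part of Stryker.NET or the planned scripts.

```python
# (break, high) thresholds in percent, copied from the table above.
MODULE_THRESHOLDS = {
    "Scanner.Core": (60, 85),
    "Policy.Engine": (60, 85),
    "Authority.Core": (65, 90),
}


def classify(module: str, score: float) -> str:
    """Classify a mutation score (0-100) as 'fail', 'pass', or 'high'."""
    break_at, high = MODULE_THRESHOLDS[module]
    if score < break_at:
        return "fail"  # below the break threshold: the gate should fail
    if score >= high:
        return "high"  # at or above the high threshold
    return "pass"      # between break and high: acceptable, room to improve
```

For example, a score of 62 passes for Scanner.Core but fails for Authority.Core, because Authority.Core carries the stricter 65% break threshold.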
Key Deliverables:
- Stryker.NET configuration for each target module
- scripts/ci/mutation-thresholds.yaml - Threshold configuration
- .gitea/workflows/mutation-testing.yml - Weekly mutation runs
- bench/baselines/mutation-baselines.json - Baseline scores
- docs/testing/mutation-testing-guide.md - Developer guide
Existing Infrastructure Mapping
The advisory describes structures that already exist under different names:
| Advisory Structure | Existing Equivalent | Notes |
|---|---|---|
| toys/svc-XX/ | tests/reachability/corpus/ | Same purpose, different path |
| labels.yaml | reachgraph.truth.json | Different schema, same semantics |
| evidence/trace.json | Evidence in attestor module | Already implemented |
| PostgresFixture | PostgresIntegrationFixture | Already implemented |
| FakeTimeProvider | Authority tests, ConnectorTestHarness | Already used |
| inputs.lock | Exists in acceptance/guardrails | Already implemented |
Quality Gate Summary
After implementation, CI will enforce:
Reachability Gates
- Runtime dependency recall ≥ 95%
- Unreachable false positives ≤ 5%
- Reachability underreport ≤ 10%
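The document does not define these metrics formally, so the following Python sketch rests on stated assumptions: recall is the share of ground-truth reachable dependencies that the scanner also reports as reachable, and the false-positive rate is the share of ground-truth unreachable dependencies that the scanner reports as reachable. The function name and inputs are hypothetical; underreport is omitted because its definition is not given here.

```python
def reachability_metrics(truth_reachable: set, truth_unreachable: set,
                         reported_reachable: set) -> dict:
    """Compare reported reachability against ground truth (assumed definitions)."""
    true_positives = truth_reachable & reported_reachable
    false_positives = truth_unreachable & reported_reachable
    # Guard against empty ground-truth sets to avoid division by zero.
    recall = len(true_positives) / len(truth_reachable) if truth_reachable else 1.0
    fp_rate = len(false_positives) / len(truth_unreachable) if truth_unreachable else 0.0
    return {
        "runtime_dependency_recall": recall,
        "unreachable_false_positives": fp_rate,
    }
```

Under these assumptions, a corpus case with four truly reachable dependencies of which three are reported yields a recall of 0.75, which would fail the 95% gate.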
Performance Gates
- Medium service scan < 2 minutes
- Reachability compute < 30 seconds
- SBOM ingestion < 5 seconds
TTFS Gates
- p50 < 2 seconds
- p95 < 5 seconds
- Regression ≤ +10% vs main
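One way to evaluate the three TTFS gates together is sketched below in Python. It uses the nearest-rank percentile method and assumes that regression is measured on the p50 value relative to a baseline p50 taken from main; the document does not say which statistic the regression applies to, so that choice, like the names `percentile` and `ttfs_gate`, is an assumption.

```python
import math


def percentile(samples: list, pct: float) -> float:
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def ttfs_gate(samples_s: list, baseline_p50_s: float,
              max_regression: float = 0.10) -> dict:
    """Evaluate p50 < 2 s, p95 < 5 s, and regression <= +10% vs the baseline."""
    p50 = percentile(samples_s, 50)
    p95 = percentile(samples_s, 95)
    # Assumption: regression is the relative change of p50 against main.
    regression = (p50 - baseline_p50_s) / baseline_p50_s
    passed = p50 < 2.0 and p95 < 5.0 and regression <= max_regression
    return {"p50": p50, "p95": p95, "regression": regression, "passed": passed}
```

The baseline value would presumably come from bench/baselines/ttfs-baseline.json, whose schema is not described here.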
Coverage Gates
- Line coverage ≥ 70% (existing)
- Branch coverage ≥ 60% (existing)
- Mutation score ≥ 60% (≥ 65% for Authority.Core), enforced as the Stryker break threshold
Security Gates
- All security tests pass
- No OWASP Top 10 violations
File Changes Summary
New Files
scripts/ci/
├── compute-reachability-metrics.sh
├── compute-ttfs-metrics.sh
├── enforce-performance-slos.sh
├── enforce-thresholds.sh
├── enforce-mutation-thresholds.sh
├── extract-mutation-score.sh
├── reachability-thresholds.yaml
└── mutation-thresholds.yaml
bench/baselines/
├── ttfs-baseline.json
└── mutation-baselines.json
tests/
├── fixtures/sca/catalogue/
│   ├── fc6-java-shadow-jar/
│   ├── fc7-dotnet-transitive-pinning/
│   ├── fc8-docker-multistage-leakage/
│   ├── fc9-purl-namespace-collision/
│   └── fc10-cve-split-merge/
└── security/
    └── StellaOps.Security.Tests/
.gitea/workflows/
├── security-tests.yml
└── mutation-testing.yml
.config/
└── dotnet-tools.json (stryker)
stryker-config.json (root)
src/Scanner/__Libraries/StellaOps.Scanner.Core/stryker-config.json
src/Policy/StellaOps.Policy.Engine/stryker-config.json
src/Authority/StellaOps.Authority.Core/stryker-config.json
docs/testing/
├── ci-quality-gates.md
├── security-testing-guide.md
└── mutation-testing-guide.md
Modified Files
.gitea/workflows/build-test-deploy.yml
tests/fixtures/sca/catalogue/inputs.lock
tests/fixtures/sca/catalogue/README.md
README.md (badges)
Rollback Strategy
If quality gates cause CI instability:
- Immediate: Set failure_mode: warn in threshold configs
- Short-term: Remove needs: dependencies to unblock other jobs
- Investigation: Create issue with specific threshold that failed
- Resolution: Either fix underlying issue or adjust threshold
- Re-enable: Set failure_mode: block after verification
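The warn/block switch in the rollback steps can be pictured as a mapping from gate results to a process exit code. This Python sketch assumes `failure_mode` takes exactly the two values named above; the function `gate_exit_code` is hypothetical and only illustrates the intended behaviour.

```python
def gate_exit_code(failures: list, failure_mode: str) -> int:
    """Return the exit code a gate step would use for the given violations."""
    if failure_mode not in ("warn", "block"):
        raise ValueError(f"unknown failure_mode: {failure_mode!r}")
    for name in failures:
        # Violations are always reported, even when they do not fail the job.
        print(f"[{failure_mode}] threshold violated: {name}")
    # Only 'block' turns violations into a failing exit code.
    return 1 if failures and failure_mode == "block" else 0
```

In warn mode the job stays green while the violation remains visible in the log, which keeps the investigation step possible without blocking other jobs.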
Success Metrics
| Metric | Current | Target | Measurement |
|---|---|---|---|
| FC Catalogue coverage | 5 cases | 10 cases | Count of fixtures |
| Security test coverage | Partial | 50+ tests | OWASP categories |
| Mutation score (Scanner) | Unknown | ≥ 70% | Stryker weekly |
| Mutation score (Policy) | Unknown | ≥ 70% | Stryker weekly |
| Mutation score (Authority) | Unknown | ≥ 80% | Stryker weekly |
| Quality gate pass rate | N/A | ≥ 95% | CI runs |
References
Sprint Files
- docs/implplan/SPRINT_0350_0001_0001_ci_quality_gates_foundation.md
- docs/implplan/SPRINT_0351_0001_0001_sca_failure_catalogue_completion.md
- docs/implplan/SPRINT_0352_0001_0001_security_testing_framework.md
- docs/implplan/SPRINT_0353_0001_0001_mutation_testing_integration.md
Source Advisory
docs/product-advisories/14-Dec-2025 - Testing and Quality Guardrails Technical Reference.md
Existing Documentation
- docs/19_TEST_SUITE_OVERVIEW.md
- docs/reachability/ground-truth-schema.md
- docs/reachability/corpus-plan.md
- tests/reachability/README.md
Document Version
| Field | Value |
|---|---|
| Version | 1.0 |
| Created | 2025-12-14 |
| Author | Platform Team |
| Status | Planning Complete |