git.stella-ops.org/docs/technical/testing/ci-quality-gates.md
2026-01-28 02:30:48 +02:00

CI Quality Gates

Sprint: SPRINT_0350_0001_0001_ci_quality_gates_foundation
Task: QGATE-0350-009

Overview

StellaOps implements automated quality gates in CI to enforce:

  • Reachability Quality - Recall/precision thresholds for vulnerability detection
  • TTFS Regression - Time-to-First-Signal performance tracking
  • Performance SLOs - Scan time and compute budget enforcement

These gates run in the build-test-deploy.yml workflow, in a job that starts only after the main test suite completes.

Quality Gate Jobs

Reachability Quality Gate

Script: scripts/ci/compute-reachability-metrics.sh
Config: scripts/ci/reachability-thresholds.yaml

Validates that the scanner meets recall/precision thresholds against the ground-truth corpus.

Metrics Computed

| Metric | Description | Threshold |
| --- | --- | --- |
| runtime_dependency_recall | % of runtime dependency vulnerabilities detected | ≥ 95% |
| unreachable_false_positives | False-positive rate for unreachable findings | ≤ 5% |
| reachability_underreport | Underreporting rate | ≤ 10% |
| os_package_recall | % of OS package vulnerabilities detected | ≥ 92% |
| code_vuln_recall | % of code vulnerabilities detected | ≥ 88% |
| config_vuln_recall | % of config vulnerabilities detected | ≥ 85% |
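
To make the recall thresholds concrete, the following is a minimal sketch of how a recall percentage could be derived and compared against a minimum. It is not taken from compute-reachability-metrics.sh; the helper names `recall_pct` and `meets_minimum` and the sample counts are assumptions used only for illustration.

```shell
# Hypothetical helper: recall as a percentage, from true-positive and
# false-negative counts measured against a ground-truth corpus.
recall_pct() {
  # $1 = true positives, $2 = false negatives
  awk -v tp="$1" -v fn="$2" 'BEGIN {
    total = tp + fn
    if (total == 0) { print "0.00"; exit }
    printf "%.2f\n", 100 * tp / total
  }'
}

# Succeeds (exit 0) when the value is at or above the minimum.
# awk is used because the shell's own arithmetic is integer-only.
meets_minimum() {
  # $1 = measured value, $2 = minimum threshold
  awk -v v="$1" -v min="$2" 'BEGIN { exit !(v >= min) }'
}

# Made-up example: 190 of 200 known runtime-dependency vulnerabilities found.
recall=$(recall_pct 190 10)
if meets_minimum "$recall" 95; then
  echo "runtime_dependency_recall ${recall}% PASS"
else
  echo "runtime_dependency_recall ${recall}% FAIL"
fi
```

The "at most" metrics (false positives, underreporting) would use the mirrored comparison, `v <= max`.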

Running Locally

# Dry run (no enforcement)
./scripts/ci/compute-reachability-metrics.sh --dry-run

# Full run against corpus
./scripts/ci/compute-reachability-metrics.sh

TTFS Regression Gate

Script: scripts/ci/compute-ttfs-metrics.sh
Baseline: bench/baselines/ttfs-baseline.json

Detects performance regressions in Time-to-First-Signal.

Metrics Computed

| Metric | Description | Threshold |
| --- | --- | --- |
| ttfs_p50_ms | P50 time to first signal | ≤ baseline + 10% |
| ttfs_p95_ms | P95 time to first signal | ≤ baseline + 15% |
| ttfs_max_ms | Maximum TTFS | ≤ baseline + 25% |

Baseline Format

{
  "ttfs_p50_ms": 450,
  "ttfs_p95_ms": 1200,
  "ttfs_max_ms": 3000,
  "measured_at": "2025-12-16T00:00:00Z",
  "sample_count": 1000
}
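
As an illustration of the "baseline + N%" rule, the sketch below computes the allowed ceiling for each metric from the sample baseline values and compares a measurement against it. The function names and the measured values are hypothetical, and the real script may read the JSON file and round differently.

```shell
# Allowed ceiling for a metric: baseline plus a percentage tolerance.
# Integer arithmetic only, so the result is rounded down to whole milliseconds.
ttfs_limit() {
  # $1 = baseline in ms, $2 = tolerance in percent
  echo $(( $1 * (100 + $2) / 100 ))
}

# Prints PASS or FAIL for one metric and returns non-zero on FAIL.
check_ttfs() {
  # $1 = metric name, $2 = measured ms, $3 = baseline ms, $4 = tolerance %
  limit=$(ttfs_limit "$3" "$4")
  if [ "$2" -le "$limit" ]; then
    echo "$1: $2 ms <= $limit ms PASS"
  else
    echo "$1: $2 ms > $limit ms FAIL"
    return 1
  fi
}

# Baselines are the sample values shown above; measured values are made up.
status=0
check_ttfs ttfs_p50_ms 480 450 10  || status=1
check_ttfs ttfs_p95_ms 1400 1200 15 || status=1
check_ttfs ttfs_max_ms 3600 3000 25 || status=1
echo "gate exit status would be $status"
```

With the sample baseline, the ceilings work out to 495 ms, 1380 ms, and 3750 ms, so only the made-up P95 measurement fails.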

Performance SLO Gate

Script: scripts/ci/enforce-performance-slos.sh
Config: scripts/ci/performance-slos.yaml

Enforces scan time and compute budget SLOs.

SLOs Enforced

| SLO | Description | Target |
| --- | --- | --- |
| scan_time_p50_ms | P50 scan time | ≤ 120,000 ms (2 min) |
| scan_time_p95_ms | P95 scan time | ≤ 300,000 ms (5 min) |
| memory_peak_mb | Peak memory usage | ≤ 2048 MB |
| cpu_seconds | Total CPU time | ≤ 120 seconds |
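
The sketch below shows one way an upper-bound SLO check with a warn-only mode could behave. It is an assumption about the semantics of `--warn-only`, not the contents of enforce-performance-slos.sh; `check_slo` and the measured values are hypothetical.

```shell
# Hypothetical check: compare one measurement against its SLO target.
# With warn_only=1 a violation is reported but does not fail the step.
check_slo() {
  # $1 = SLO name, $2 = measured value, $3 = target (upper bound),
  # $4 = warn_only flag (0 or 1)
  if [ "$2" -le "$3" ]; then
    echo "OK   $1: $2 <= $3"
    return 0
  fi
  if [ "$4" -eq 1 ]; then
    echo "WARN $1: $2 > $3"
    return 0
  fi
  echo "FAIL $1: $2 > $3"
  return 1
}

# Targets come from the SLO list; measured values are made up.
check_slo memory_peak_mb 1900 2048 0
check_slo scan_time_p95_ms 310000 300000 1
```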

Workflow Integration

Quality gates are integrated into the main CI workflow:

# .gitea/workflows/build-test-deploy.yml

quality-gates:
  runs-on: ${{ vars.LINUX_RUNNER_LABEL || 'ubuntu-latest' }}
  needs: build-test
  steps:
    - name: Reachability quality gate
      run: ./scripts/ci/compute-reachability-metrics.sh

    - name: TTFS regression gate
      run: ./scripts/ci/compute-ttfs-metrics.sh

    - name: Performance SLO gate
      run: ./scripts/ci/enforce-performance-slos.sh --warn-only

Failure Modes

Hard Failure (Blocks Merge)

  • Reachability recall below threshold
  • TTFS regression exceeds 25%
  • Memory budget exceeded by 50%

Soft Failure (Warning Only)

  • Minor TTFS regression (< 15%)
  • Memory near budget limit
  • Missing baseline data (new fixtures)

Adding New Quality Gates

  1. Create a computation script in scripts/ci/
  2. Add a threshold configuration file (YAML or JSON)
  3. Integrate the script into the workflow as a new step
  4. Update this documentation
  5. Add the gate to sprint tracking
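
A new gate script could start from a skeleton like the one below. This is only a sketch under stated assumptions: `run_gate`, `measure`, `THRESHOLD`, and the metric name are placeholders, and supporting both `--dry-run` and `--warn-only` in one script is a choice made here for illustration (the existing scripts each document only one of them).

```shell
# Placeholder threshold and measurement; a real gate would load the
# threshold from its config file and compute the metric from test output.
THRESHOLD=100
measure() { echo 120; }

run_gate() {
  warn_only=0
  dry_run=0
  for arg in "$@"; do
    case "$arg" in
      --warn-only) warn_only=1 ;;
      --dry-run)   dry_run=1 ;;
      *) echo "unknown option: $arg" >&2; return 2 ;;
    esac
  done

  value=$(measure)
  echo "example_metric=$value threshold=$THRESHOLD"

  # Within budget: the gate passes.
  if [ "$value" -le "$THRESHOLD" ]; then
    return 0
  fi
  # Over budget, but enforcement is switched off: report and pass.
  if [ "$dry_run" -eq 1 ] || [ "$warn_only" -eq 1 ]; then
    echo "WARNING: threshold exceeded (not enforced)"
    return 0
  fi
  echo "ERROR: threshold exceeded" >&2
  return 1
}

run_gate --warn-only
```

In a standalone script file the last line would typically be `run_gate "$@"` so that the workflow step's arguments and exit status pass straight through.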

Troubleshooting

Gate Fails on PR but Passes on Main

Check for:

  • Non-deterministic test execution
  • Timing-sensitive assertions
  • Missing test fixtures in PR branch

Baseline Drift

If baselines become stale:

# Regenerate baselines
./scripts/ci/compute-ttfs-metrics.sh --update-baseline
./scripts/ci/compute-reachability-metrics.sh --update-baseline

Turn #6 Quality Gates (2026-01-27)

Source: Testing Enhancements (Automation Turn #6)
Sprint: docs/implplan/SPRINT_0127_002_DOCS_testing_enhancements_turn6.md

Intent Violation Gate

Purpose: Detect test changes that violate declared intent categories.

Script: scripts/ci/check-intent-violations.sh

| Check | Description | Action |
| --- | --- | --- |
| Intent missing | Non-trivial test without Intent trait | Warning (regulatory modules: Error) |
| Intent contradiction | Test behavior contradicts declared intent | Error |
| Intent coverage drop | Module intent coverage decreased | Warning |

Enforcement:

  • PR-gating for regulatory modules (Policy, Authority, Signer, Attestor, EvidenceLocker).
  • Warning-only for other modules (to allow gradual adoption).
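
The split between regulatory and other modules can be sketched as a simple lookup. The function name is hypothetical, and `Scanner` is used only as an example of a non-regulatory module; how check-intent-violations.sh actually identifies a test's module is not covered here.

```shell
# Maps a module name to the action taken when a non-trivial test
# lacks an Intent trait: Error for regulatory modules, Warning otherwise.
intent_missing_action() {
  case "$1" in
    Policy|Authority|Signer|Attestor|EvidenceLocker) echo "error" ;;
    *) echo "warning" ;;
  esac
}

for module in Policy Scanner; do
  echo "$module: $(intent_missing_action "$module")"
done
```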

Observability Contract Gate

Purpose: Validate OTel spans, structured logs, and metrics contracts.

Script: scripts/ci/check-observability-contracts.sh

| Check | Description | Threshold |
| --- | --- | --- |
| Required spans missing | Core operation spans not emitted | Error |
| Span attribute missing | Required attributes not present | Error |
| High cardinality attribute | Label cardinality exceeds limit | Warning (> 50), Error (> 100) |
| PII in logs | Sensitive data patterns in log output | Error |
| Missing log fields | Required fields not present | Warning |

Enforcement:

  • PR-gating for all W1 (WebService) modules.
  • Run as part of contract test lane.
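
The two-level cardinality limit (warning above 50, error above 100) can be illustrated with the sketch below. Both helpers are hypothetical; the real gate presumably extracts attribute values from captured telemetry, which is outside the scope of this example.

```shell
# Counts distinct lines on stdin (one observed attribute value per line).
count_distinct() {
  awk '!seen[$0]++ { n++ } END { print n + 0 }'
}

# Classifies a distinct-value count against the documented limits.
cardinality_level() {
  # $1 = number of distinct values observed for one attribute
  if [ "$1" -gt 100 ]; then
    echo "error"
  elif [ "$1" -gt 50 ]; then
    echo "warning"
  else
    echo "ok"
  fi
}

# Made-up sample: four observations, three distinct values.
n=$(printf '%s\n' tenant-a tenant-b tenant-a tenant-c | count_distinct)
echo "distinct values: $n -> $(cardinality_level "$n")"
```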

Evidence Chain Gate

Purpose: Verify requirement -> test -> artifact traceability.

Script: scripts/ci/check-evidence-chain.sh

| Check | Description | Action |
| --- | --- | --- |
| Orphaned test | Regulatory test without Requirement attribute | Warning |
| Artifact hash drift | Artifact hash changed unexpectedly | Error |
| Artifact non-deterministic | Multiple runs produce different artifacts | Error |
| Traceability gap | Requirement without test coverage | Warning |

Enforcement:

  • PR-gating for regulatory modules.
  • Traceability report generated as CI artifact.
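
As a sketch of the artifact hash drift check, the example below compares an artifact's current SHA-256 digest with a previously recorded one. It assumes the `sha256sum` utility from GNU coreutils is available; the helper names and the temporary file are illustrative, and where the real gate stores its recorded hashes is not specified here.

```shell
# Prints the SHA-256 hex digest of a file.
artifact_hash() {
  sha256sum "$1" | awk '{ print $1 }'
}

# Returns non-zero when the file's digest differs from the recorded one.
check_hash_drift() {
  # $1 = artifact path, $2 = expected SHA-256 hex digest
  actual=$(artifact_hash "$1")
  if [ "$actual" = "$2" ]; then
    echo "OK $1"
  else
    echo "ERROR artifact hash drift: $1"
    return 1
  fi
}

# Demonstration with a throwaway file standing in for a build artifact.
tmp=$(mktemp)
printf 'example artifact\n' > "$tmp"
recorded=$(artifact_hash "$tmp")
check_hash_drift "$tmp" "$recorded"
printf 'unexpected change\n' >> "$tmp"
check_hash_drift "$tmp" "$recorded" || echo "drift detected"
rm -f "$tmp"
```

The same comparison, applied to the outputs of two consecutive runs, would cover the non-determinism check.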

Longevity Gate (Release Gating)

Purpose: Detect memory leaks, connection leaks, and counter drift under sustained load.

Script: scripts/ci/run-longevity-gate.sh
Cadence: Nightly + pre-release

| Metric | Description | Threshold |
| --- | --- | --- |
| Memory growth rate | Memory increase per hour | ≤ 1% |
| Connection pool leaks | Unreturned connections | 0 |
| Counter drift | Counter value outside expected range | Error |
| GC pressure | Gen2 collections per hour | ≤ 10 |

Enforcement:

  • Not PR-gating (too slow).
  • Release-gating: longevity tests must pass before release.
  • Results stored for trend analysis.
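
The memory growth threshold can be made concrete with the sketch below. It assumes the rate is measured relative to the starting memory footprint and averaged over the whole run; the gate's actual definition may differ, and both helper names and the sample numbers are hypothetical.

```shell
# Growth rate in percent per hour, relative to the starting footprint.
growth_pct_per_hour() {
  # $1 = memory at start (MB), $2 = memory at end (MB), $3 = run length (hours)
  awk -v s="$1" -v e="$2" -v h="$3" 'BEGIN {
    if (s <= 0 || h <= 0) exit 2
    printf "%.2f\n", (e - s) * 100 / (s * h)
  }'
}

# Succeeds when the value is at or below the maximum.
within_limit() {
  # $1 = measured value, $2 = maximum allowed
  awk -v v="$1" -v max="$2" 'BEGIN { exit !(v <= max) }'
}

# Made-up run: 1000 MB at start, 1060 MB after 8 hours.
rate=$(growth_pct_per_hour 1000 1060 8)
if within_limit "$rate" 1; then
  echo "memory growth ${rate}%/h PASS"
else
  echo "memory growth ${rate}%/h FAIL"
fi
```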

Interop Gate (Release Gating)

Purpose: Validate cross-version and environment compatibility.

Script: scripts/ci/run-interop-gate.sh
Cadence: Weekly + pre-release

| Check | Description | Threshold |
| --- | --- | --- |
| N-1 compatibility | Current server with previous client | Must pass |
| N+1 compatibility | Previous server with current client | Must pass |
| Environment equivalence | Same results across infra profiles | ≤ 5% deviation |

Profiles Tested:

  • standard: default Testcontainers configuration.
  • high-latency: +100ms network latency.
  • low-bandwidth: 10 Mbps limit.
  • packet-loss: 1% packet loss (Linux only).

Enforcement:

  • Not PR-gating (requires multi-version infrastructure).
  • Release-gating: interop tests must pass before release.
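
One possible reading of the environment equivalence check is that each profile's result is compared with the result from the standard profile. The sketch below follows that assumption; which quantity is compared, the helper names, and the sample numbers are all hypothetical.

```shell
# Absolute deviation of a value from a reference, in percent of the reference.
deviation_pct() {
  # $1 = reference value (e.g. from the standard profile), $2 = other value
  awk -v ref="$1" -v v="$2" 'BEGIN {
    if (ref == 0) exit 2
    d = v - ref
    if (d < 0) d = -d
    printf "%.2f\n", d * 100 / ref
  }'
}

# Succeeds when the value is at or below the maximum.
within_limit() {
  awk -v v="$1" -v max="$2" 'BEGIN { exit !(v <= max) }'
}

# Made-up results, written as profile:value pairs.
reference=200
for sample in high-latency:208 low-bandwidth:188; do
  profile=${sample%%:*}
  value=${sample#*:}
  dev=$(deviation_pct "$reference" "$value")
  if within_limit "$dev" 5; then
    echo "$profile: ${dev}% PASS"
  else
    echo "$profile: ${dev}% FAIL"
  fi
done
```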

Post-Incident Gate

Purpose: Ensure incident-derived tests are maintained and passing.

Script: scripts/ci/check-post-incident-tests.sh

| Check | Description | Action |
| --- | --- | --- |
| Incident test failing | PostIncident test not passing | Error (P1/P2), Warning (P3) |
| Incident test missing metadata | Missing IncidentId or RootCause | Warning |
| Incident coverage | P1/P2 incidents without tests | Error |

Enforcement:

  • PR-gating: P1/P2 incident tests must pass.
  • Release-gating: all incident tests must pass.
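
The severity-dependent action for a failing incident test can be sketched as a lookup on PR builds. The function name is hypothetical, and how the gate reads the severity from test metadata is not shown.

```shell
# Action for a failing PostIncident test on a PR, by incident severity.
incident_failure_action() {
  case "$1" in
    P1|P2) echo "error" ;;
    P3)    echo "warning" ;;
    *)     echo "unknown severity: $1" >&2; return 2 ;;
  esac
}

for severity in P1 P2 P3; do
  echo "$severity: $(incident_failure_action "$severity")"
done
```

For release builds every failing incident test blocks, so the lookup would collapse to a single "error" outcome.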

Gate Summary by Gating Level

PR-Gating (Must Pass for Merge)

| Gate | Scope |
| --- | --- |
| Reachability Quality | All |
| TTFS Regression | All |
| Intent Violation | Regulatory modules |
| Observability Contract | W1 modules |
| Evidence Chain | Regulatory modules |
| Post-Incident (P1/P2) | All |

Release-Gating (Must Pass for Release)

| Gate | Scope |
| --- | --- |
| All PR gates | All |
| Longevity | Worker modules |
| Interop | Schema/API-dependent modules |
| Post-Incident (all) | All |
| Performance SLO | All |

Warning-Only (Informational)

| Gate | Scope |
| --- | --- |
| Intent missing | Non-regulatory modules |
| Intent coverage drop | All |
| Orphaned test | All |
| Traceability gap | All |