feat(metrics): Implement scan metrics repository and PostgreSQL integration
Some checks failed
Docs CI / lint-and-preview (push) Has been cancelled
- Added IScanMetricsRepository interface for scan metrics persistence and retrieval.
- Implemented PostgresScanMetricsRepository for PostgreSQL database interactions, including methods for saving and retrieving scan metrics and execution phases.
- Introduced methods for obtaining TTE statistics and recent scans for tenants.
- Implemented deletion of old metrics for retention purposes.

test(tests): Add SCA Failure Catalogue tests for FC6-FC10

- Created ScaCatalogueDeterminismTests to validate determinism properties of SCA Failure Catalogue fixtures.
- Developed ScaFailureCatalogueTests to ensure correct handling of specific failure modes in the scanner.
- Included tests for manifest validation, file existence, and expected findings across multiple failure cases.

feat(telemetry): Integrate scan completion metrics into the pipeline

- Introduced IScanCompletionMetricsIntegration interface and ScanCompletionMetricsIntegration class to record metrics upon scan completion.
- Implemented proof coverage and TTE metrics recording with logging for scan completion summaries.
155
docs/testing/ci-quality-gates.md
Normal file
@@ -0,0 +1,155 @@
# CI Quality Gates

Sprint: `SPRINT_0350_0001_0001_ci_quality_gates_foundation`
Task: `QGATE-0350-009`

## Overview

StellaOps implements automated quality gates in CI to enforce:
- **Reachability Quality** - Recall/precision thresholds for vulnerability detection
- **TTFS Regression** - Time-to-First-Signal performance tracking
- **Performance SLOs** - Scan time and compute budget enforcement

These gates run as part of the `build-test-deploy.yml` workflow after the main test suite completes.

## Quality Gate Jobs

### Reachability Quality Gate

**Script:** `scripts/ci/compute-reachability-metrics.sh`
**Config:** `scripts/ci/reachability-thresholds.yaml`

Validates that the scanner meets recall/precision thresholds against the ground-truth corpus.

#### Metrics Computed

| Metric | Description | Threshold |
|--------|-------------|-----------|
| `runtime_dependency_recall` | % of runtime dep vulns detected | ≥ 95% |
| `unreachable_false_positives` | FP rate for unreachable findings | ≤ 5% |
| `reachability_underreport` | Underreporting rate | ≤ 10% |
| `os_package_recall` | % of OS package vulns detected | ≥ 92% |
| `code_vuln_recall` | % of code vulns detected | ≥ 88% |
| `config_vuln_recall` | % of config vulns detected | ≥ 85% |
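
The table mixes lower bounds (recall) and upper bounds (false-positive and underreport rates), so a checker has to know the direction of each comparison. The sketch below illustrates one way to do that in Python. It is not the contents of `compute-reachability-metrics.sh`: the `(comparison, limit)` layout, the `find_violations` helper, and the choice to express rates as fractions are all assumptions made for illustration, and the real schema of `reachability-thresholds.yaml` may differ.

```python
import operator

# Thresholds copied from the table above, expressed as fractions.
# The (comparison, limit) layout is an assumption for illustration only.
THRESHOLDS = {
    "runtime_dependency_recall": (">=", 0.95),
    "unreachable_false_positives": ("<=", 0.05),
    "reachability_underreport": ("<=", 0.10),
    "os_package_recall": (">=", 0.92),
    "code_vuln_recall": (">=", 0.88),
    "config_vuln_recall": (">=", 0.85),
}

_COMPARATORS = {">=": operator.ge, "<=": operator.le}


def find_violations(metrics):
    """Return the sorted names of metrics that are missing or miss their threshold."""
    violations = []
    for name, (comparison, limit) in THRESHOLDS.items():
        value = metrics.get(name)
        # A missing metric counts as a violation rather than a silent pass.
        if value is None or not _COMPARATORS[comparison](value, limit):
            violations.append(name)
    return sorted(violations)
```

Treating a missing metric as a violation is a deliberate choice in this sketch: a gate that passes when no data was produced would hide a broken corpus run.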

#### Running Locally

```bash
# Dry run (no enforcement)
./scripts/ci/compute-reachability-metrics.sh --dry-run

# Full run against corpus
./scripts/ci/compute-reachability-metrics.sh
```

### TTFS Regression Gate

**Script:** `scripts/ci/compute-ttfs-metrics.sh`
**Baseline:** `bench/baselines/ttfs-baseline.json`

Detects performance regressions in Time-to-First-Signal.

#### Metrics Computed

| Metric | Description | Threshold |
|--------|-------------|-----------|
| `ttfs_p50_ms` | P50 time to first signal | ≤ baseline + 10% |
| `ttfs_p95_ms` | P95 time to first signal | ≤ baseline + 15% |
| `ttfs_max_ms` | Maximum TTFS | ≤ baseline + 25% |

#### Baseline Format

```json
{
  "ttfs_p50_ms": 450,
  "ttfs_p95_ms": 1200,
  "ttfs_max_ms": 3000,
  "measured_at": "2025-12-16T00:00:00Z",
  "sample_count": 1000
}
```
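
To make the "baseline + N%" thresholds concrete, the following Python sketch compares a set of measurements against a baseline in the format shown above. It is an illustration only, not the logic of `compute-ttfs-metrics.sh`; the `find_regressions` helper and the `TOLERANCE_PERCENT` mapping are hypothetical names introduced here.

```python
import json

# Allowed regression per metric, in percent of the baseline value
# (taken from the threshold table in this section).
TOLERANCE_PERCENT = {"ttfs_p50_ms": 10, "ttfs_p95_ms": 15, "ttfs_max_ms": 25}


def find_regressions(baseline, current):
    """Map each regressed metric to (measured value, allowed limit)."""
    regressions = {}
    for name, percent in TOLERANCE_PERCENT.items():
        # Example: a 450 ms baseline with a 10% tolerance allows up to 495 ms.
        limit = baseline[name] * (100 + percent) / 100
        if current[name] > limit:
            regressions[name] = (current[name], limit)
    return regressions


# Baseline document in the format described above.
baseline = json.loads("""
{
  "ttfs_p50_ms": 450,
  "ttfs_p95_ms": 1200,
  "ttfs_max_ms": 3000,
  "measured_at": "2025-12-16T00:00:00Z",
  "sample_count": 1000
}
""")
```

With this baseline, the allowed limits work out to 495 ms for P50, 1380 ms for P95, and 3750 ms for the maximum.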

### Performance SLO Gate

**Script:** `scripts/ci/enforce-performance-slos.sh`
**Config:** `scripts/ci/performance-slos.yaml`

Enforces scan time and compute budget SLOs.

#### SLOs Enforced

| SLO | Description | Target |
|-----|-------------|--------|
| `scan_time_p50_ms` | P50 scan time | ≤ 120,000ms (2 min) |
| `scan_time_p95_ms` | P95 scan time | ≤ 300,000ms (5 min) |
| `memory_peak_mb` | Peak memory usage | ≤ 2048 MB |
| `cpu_seconds` | Total CPU time | ≤ 120 seconds |
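
The P50 and P95 targets are percentiles over a set of scan durations. As a rough illustration, the Python sketch below derives both values with the nearest-rank method and compares them with the targets in the table. Which percentile method `enforce-performance-slos.sh` actually uses is not documented here, so nearest-rank is an assumption, and `percentile` and `check_scan_time_slos` are hypothetical helper names.

```python
# Scan-time targets from the SLO table above, in milliseconds.
SLO_TARGETS_MS = {"scan_time_p50_ms": 120_000, "scan_time_p95_ms": 300_000}


def percentile(samples, percent):
    """Nearest-rank percentile for an integer `percent` between 1 and 100."""
    if not samples:
        raise ValueError("at least one sample is required")
    ordered = sorted(samples)
    # Integer ceiling of percent/100 * n, avoiding floating-point rounding.
    rank = max(1, (percent * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def check_scan_time_slos(durations_ms):
    """Return {slo_name: (measured_ms, within_target)} for the scan-time SLOs."""
    measured = {
        "scan_time_p50_ms": percentile(durations_ms, 50),
        "scan_time_p95_ms": percentile(durations_ms, 95),
    }
    return {
        name: (value, value <= SLO_TARGETS_MS[name])
        for name, value in measured.items()
    }
```

Nearest-rank always returns an observed sample, which keeps the reported value easy to trace back to a specific scan; interpolating methods would give slightly different numbers on small sample sets.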

## Workflow Integration

Quality gates are integrated into the main CI workflow:

```yaml
# .gitea/workflows/build-test-deploy.yml

quality-gates:
  runs-on: ubuntu-22.04
  needs: build-test
  steps:
    - name: Reachability quality gate
      run: ./scripts/ci/compute-reachability-metrics.sh

    - name: TTFS regression gate
      run: ./scripts/ci/compute-ttfs-metrics.sh

    - name: Performance SLO gate
      run: ./scripts/ci/enforce-performance-slos.sh --warn-only
```

## Failure Modes

### Hard Failure (Blocks Merge)

- Reachability recall below threshold
- TTFS regression exceeds 25%
- Memory budget exceeded by 50%

### Soft Failure (Warning Only)

- Minor TTFS regression (< 15%)
- Memory near budget limit
- Missing baseline data (new fixtures)
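
The split between hard and soft failures can be sketched as a small classifier. The Python below is an illustration under stated assumptions, not the gates' actual logic. The lists above do not say how a TTFS regression between 15% and 25% is handled, what "near budget limit" means numerically, or how memory that is over budget by less than 50% is treated. This sketch treats all three as soft failures and uses a hypothetical 90% cut-off for "near".

```python
from enum import Enum


class Outcome(Enum):
    PASS = "pass"
    SOFT = "soft failure (warning only)"
    HARD = "hard failure (blocks merge)"


MEMORY_BUDGET_MB = 2048   # `memory_peak_mb` target from the SLO table
NEAR_BUDGET_PERCENT = 90  # hypothetical cut-off for "near budget limit"


def classify_ttfs(baseline_ms, current_ms):
    """Classify a TTFS measurement against its baseline."""
    if baseline_ms is None:
        return Outcome.SOFT  # missing baseline data (new fixtures)
    if current_ms * 100 > baseline_ms * 125:
        return Outcome.HARD  # regression exceeds 25%
    if current_ms > baseline_ms:
        return Outcome.SOFT  # assumption: any smaller regression only warns
    return Outcome.PASS


def classify_memory(peak_mb):
    """Classify peak memory usage against the budget."""
    if peak_mb * 100 > MEMORY_BUDGET_MB * 150:
        return Outcome.HARD  # budget exceeded by more than 50%
    if peak_mb * 100 >= MEMORY_BUDGET_MB * NEAR_BUDGET_PERCENT:
        return Outcome.SOFT  # assumption: near or moderately over budget warns
    return Outcome.PASS
```

If the intended policy is stricter, for example blocking on any breach of the per-metric TTFS thresholds, only the comparison constants in `classify_ttfs` would need to change.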

## Adding New Quality Gates

1. Create computation script in `scripts/ci/`
2. Add threshold configuration (YAML or JSON)
3. Integrate into workflow as a new step
4. Update this documentation
5. Add to sprint tracking
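
As a starting point for step 1, the following Python sketch shows the general shape of a gate entry point: read a measurement and a limit, report the result, and signal failure through the exit code so that the CI step fails. The existing gates are shell scripts, so this is an illustrative skeleton rather than a template taken from the repository. The `--value` and `--limit` options and the `evaluate` and `main` functions are hypothetical; `--dry-run` mirrors the flag shown earlier for the reachability script.

```python
import argparse


def evaluate(value, limit):
    """Hypothetical check: the metric passes when it does not exceed its limit."""
    return value <= limit


def main(argv=None):
    parser = argparse.ArgumentParser(description="Example quality gate (hypothetical)")
    parser.add_argument("--value", type=float, required=True, help="measured metric")
    parser.add_argument("--limit", type=float, required=True, help="threshold from config")
    parser.add_argument("--dry-run", action="store_true",
                        help="report the result without failing the job")
    args = parser.parse_args(argv)

    passed = evaluate(args.value, args.limit)
    status = "passed" if passed else "failed"
    print(f"gate {status}: value={args.value} limit={args.limit}")
    # A non-zero exit code fails the CI step; a dry run always returns 0.
    return 0 if passed or args.dry_run else 1
```

A standalone script would finish with `raise SystemExit(main())` so that the return value becomes the process exit code.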

## Troubleshooting

### Gate Fails on PR but Passes on Main

Check for:
- Non-deterministic test execution
- Timing-sensitive assertions
- Missing test fixtures in PR branch

### Baseline Drift

If baselines become stale:

```bash
# Regenerate baselines
./scripts/ci/compute-ttfs-metrics.sh --update-baseline
./scripts/ci/compute-reachability-metrics.sh --update-baseline
```

## Related Documentation

- [Test Suite Overview](../19_TEST_SUITE_OVERVIEW.md)
- [Reachability Corpus Plan](../reachability/corpus-plan.md)
- [Performance Workbook](../12_PERFORMANCE_WORKBOOK.md)
- [Testing Quality Guardrails](./testing-quality-guardrails-implementation.md)