# Alerting Rules > Prometheus alerting rules for the Release Orchestrator. **Status:** Planned (not yet implemented) **Source:** [Architecture Advisory Section 13.4](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md) **Related Modules:** [Metrics](metrics.md), [Observability Overview](overview.md) ## Overview The Release Orchestrator provides Prometheus alerting rules for monitoring promotions, deployments, agents, and integrations. --- ## High Priority Alerts ### Security Gate Block Rate ```yaml - alert: PromotionGateBlockRate expr: | rate(stella_security_gate_results_total{result="blocked"}[1h]) / rate(stella_security_gate_results_total[1h]) > 0.5 for: 15m labels: severity: warning annotations: summary: "High rate of security gate blocks" description: "More than 50% of promotions are being blocked by security gates" ``` ### Deployment Failure Rate ```yaml - alert: DeploymentFailureRate expr: | rate(stella_deployments_total{status="failed"}[1h]) / rate(stella_deployments_total[1h]) > 0.1 for: 10m labels: severity: critical annotations: summary: "High deployment failure rate" description: "More than 10% of deployments are failing" ``` ### Agent Offline ```yaml - alert: AgentOffline expr: | stella_agents_status{status="offline"} == 1 for: 5m labels: severity: warning annotations: summary: "Agent offline" description: "Agent {{ $labels.agent_id }} has been offline for 5 minutes" ``` ### Promotion Stuck ```yaml - alert: PromotionStuck expr: | time() - stella_promotion_start_time{status="deploying"} > 1800 for: 5m labels: severity: warning annotations: summary: "Promotion stuck in deploying state" description: "Promotion {{ $labels.promotion_id }} has been deploying for more than 30 minutes" ``` ### Integration Unhealthy ```yaml - alert: IntegrationUnhealthy expr: | stella_integration_health{status="unhealthy"} == 1 for: 10m labels: severity: warning annotations: summary: "Integration unhealthy" description: "Integration {{ $labels.integration_name }} has been unhealthy for 10 minutes" ``` --- ## Medium Priority Alerts ### Workflow Step Timeout ```yaml - alert: WorkflowStepTimeout expr: | stella_workflow_step_duration_seconds > 600 for: 1m labels: severity: warning annotations: summary: "Workflow step taking too long" description: "Step {{ $labels.step_type }} in workflow {{ $labels.workflow_run_id }} has been running for more than 10 minutes" ``` ### Evidence Generation Failure ```yaml - alert: EvidenceGenerationFailure expr: | rate(stella_evidence_generation_failures_total[1h]) > 0 for: 5m labels: severity: warning annotations: summary: "Evidence generation failures" description: "Evidence generation is failing, affecting audit compliance" ``` ### Target Health Degraded ```yaml - alert: TargetHealthDegraded expr: | stella_target_health{status!="healthy"} == 1 for: 5m labels: severity: warning annotations: summary: "Target health degraded" description: "Target {{ $labels.target_name }} is reporting {{ $labels.status }}" ``` ### Approval Timeout ```yaml - alert: ApprovalTimeout expr: | time() - stella_promotion_approval_requested_time > 86400 for: 1h labels: severity: warning annotations: summary: "Promotion awaiting approval for too long" description: "Promotion {{ $labels.promotion_id }} has been waiting for approval for more than 24 hours" ``` --- ## Low Priority Alerts ### Database Connection Pool ```yaml - alert: DatabaseConnectionPoolExhausted expr: | stella_db_connection_pool_available < 5 for: 5m labels: severity: warning annotations: summary: "Database connection pool running low" description: "Only {{ $value }} database connections available" ``` ### Plugin Error Rate ```yaml - alert: PluginErrorRate expr: | rate(stella_plugin_errors_total[5m]) > 1 for: 5m labels: severity: warning annotations: summary: "Plugin errors detected" description: "Plugin {{ $labels.plugin_id }} is experiencing errors" ``` --- ## Alert Routing ### Example AlertManager Configuration ```yaml # alertmanager.yaml route: receiver: default group_by: [alertname, severity] group_wait: 30s group_interval: 5m repeat_interval: 4h routes: - match: severity: critical receiver: pagerduty continue: true - match: severity: warning receiver: slack receivers: - name: default webhook_configs: - url: http://webhook.example.com/alerts - name: pagerduty pagerduty_configs: - service_key: ${PAGERDUTY_KEY} severity: critical - name: slack slack_configs: - channel: '#alerts' api_url: ${SLACK_WEBHOOK_URL} title: '{{ .CommonAnnotations.summary }}' text: '{{ .CommonAnnotations.description }}' ``` --- ## Dashboard Integration ### Grafana Alert Panels Recommended dashboard panels for alerts: | Panel | Query | |-------|-------| | Active Alerts | `count(ALERTS{alertstate="firing"})` | | Alert History | `count_over_time(ALERTS{alertstate="firing"}[24h])` | | By Severity | `count(ALERTS{alertstate="firing"}) by (severity)` | | By Component | `count(ALERTS{alertstate="firing"}) by (alertname)` | --- ## See Also - [Metrics](metrics.md) - [Observability Overview](overview.md) - [Logging](logging.md) - [Tracing](tracing.md)