add release orchestrator docs and sprints gaps fills

2026-01-11 01:05:17 +02:00
parent d58c093887
commit a62974a8c2
37 changed files with 6061 additions and 0 deletions


# Distributed Tracing Specification
> OpenTelemetry-based distributed tracing for the Release Orchestrator.

**Status:** Planned (not yet implemented)

**Source:** [Architecture Advisory Section 13.3](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)

**Related Modules:** [Observability Overview](overview.md), [Logging](logging.md)

## Overview
The Release Orchestrator uses OpenTelemetry for distributed tracing, giving end-to-end visibility into promotion workflows, deployments, and agent tasks.

---
## Trace Context Propagation
### W3C Trace Context
```typescript
// Trace context carried across service and agent boundaries
interface TraceContext {
  traceId: string;       // 32 lowercase hex chars
  spanId: string;        // 16 lowercase hex chars
  parentSpanId?: string; // 16 hex chars; absent on root spans
  sampled: boolean;      // bit 0 of the trace-flags field
  baggage: Record<string, string>;
}

// Propagation header names
const TRACE_HEADERS = {
  W3C_TRACEPARENT: "traceparent",
  W3C_TRACESTATE: "tracestate",
  BAGGAGE: "baggage",
} as const;

// Example traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
```
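The `baggage` field of `TraceContext` travels in the `baggage` header. Below is a minimal sketch of encoding and decoding that header in the W3C Baggage format: comma-separated `key=value` members with percent-encoded values.

Some parts are assumptions. The helper names `formatBaggage` and `parseBaggage` are hypothetical. On parse, any `;`-separated member properties are dropped.

```typescript
// Hypothetical helpers for the W3C `baggage` header; not an orchestrator or OpenTelemetry API.

// Serializes a baggage map as "k1=v1,k2=v2", percent-encoding each value.
function formatBaggage(baggage: Record<string, string>): string {
  return Object.entries(baggage)
    .map(([key, value]) => `${key.trim()}=${encodeURIComponent(value)}`)
    .join(",");
}

// Parses a header value into a map. Members without a non-empty "key=" prefix
// are skipped, and any ";"-separated properties after a value are ignored.
function parseBaggage(header: string): Record<string, string> {
  const result: Record<string, string> = {};
  for (const member of header.split(",")) {
    const pair = member.split(";")[0] ?? "";
    const eq = pair.indexOf("=");
    if (eq <= 0) continue;
    const key = pair.slice(0, eq).trim();
    if (key === "") continue;
    try {
      result[key] = decodeURIComponent(pair.slice(eq + 1).trim());
    } catch {
      // Malformed percent-encoding: drop this member instead of failing the request.
    }
  }
  return result;
}
```

For example, `formatBaggage({ "tenant.id": "t-1", note: "a,b" })` yields `tenant.id=t-1,note=a%2Cb`. Percent-encoding the value keeps the comma from being read as a member separator.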
### Header Format
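```
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
             ^  ^                                ^                ^
             |  |                                |                |
             |  trace-id (32 hex)                span-id (16 hex) flags (01 = sampled)
             version
```

The W3C specification calls the second ID `parent-id`. It is the span ID of the calling span, which the receiving service records as the parent of its own span.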
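Below is a minimal sketch of parsing and formatting the header shown above. It handles only version `00` and rejects uppercase hex and all-zero IDs, both of which the W3C specification treats as invalid. The `TraceParent` type and both function names are hypothetical.

```typescript
// Hypothetical parsed form of a version-00 traceparent header.
interface TraceParent {
  traceId: string; // 32 lowercase hex chars, not all zeros
  spanId: string;  // 16 lowercase hex chars ("parent-id" in the spec)
  sampled: boolean; // bit 0 of trace-flags
}

const TRACEPARENT_RE = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

// Returns null for anything that is not a valid version-00 header.
function parseTraceparent(header: string): TraceParent | null {
  const match = TRACEPARENT_RE.exec(header.trim());
  if (!match) return null;
  const [, traceId, spanId, flags] = match;
  // All-zero IDs are invalid per the W3C Trace Context spec.
  if (/^0+$/.test(traceId) || /^0+$/.test(spanId)) return null;
  return { traceId, spanId, sampled: (parseInt(flags, 16) & 0x01) === 1 };
}

// Builds the header for an outgoing request.
function formatTraceparent(tp: TraceParent): string {
  const flags = tp.sampled ? "01" : "00";
  return `00-${tp.traceId}-${tp.spanId}-${flags}`;
}
```

On an outgoing call, a service would put its own current span ID into `spanId` before formatting. The callee then sees that span as its parent.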
---
## Key Traces
| Operation | Span Name | Attributes |
|-----------|-----------|------------|
| Promotion request | `promotion.request` | promotion_id, release_id, environment |
| Gate evaluation | `promotion.evaluate_gates` | gate_names, result |
| Workflow execution | `workflow.execute` | workflow_run_id, template_name |
| Step execution | `workflow.step.{type}` | step_run_id, node_id, inputs |
| Deployment job | `deployment.execute` | job_id, environment, strategy |
| Agent task | `agent.task.{type}` | task_id, agent_id, target_id |
| Plugin call | `plugin.{method}` | plugin_id, method, duration |
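One way to keep emitted spans aligned with the table above is a small typed registry. This sketch is hypothetical: the `SPAN_SPECS` map and the `missingAttributes` helper are not part of the orchestrator. It covers only three rows and uses the attribute names exactly as the table writes them.

```typescript
// Hypothetical registry mirroring part of the Key Traces table.
type Operation = "promotion.request" | "deployment.execute" | "agent.task";

interface SpanSpec {
  // Builds the span name; templated names such as agent.task.{type} take a suffix.
  name: (suffix?: string) => string;
  requiredAttributes: readonly string[];
}

const SPAN_SPECS: Record<Operation, SpanSpec> = {
  "promotion.request": {
    name: () => "promotion.request",
    requiredAttributes: ["promotion_id", "release_id", "environment"],
  },
  "deployment.execute": {
    name: () => "deployment.execute",
    requiredAttributes: ["job_id", "environment", "strategy"],
  },
  "agent.task": {
    name: (type) => {
      if (!type) throw new Error("agent.task spans need a task type");
      return `agent.task.${type}`;
    },
    requiredAttributes: ["task_id", "agent_id", "target_id"],
  },
};

// Lists the table's attributes that a span is missing.
function missingAttributes(op: Operation, attributes: Record<string, unknown>): string[] {
  return SPAN_SPECS[op].requiredAttributes.filter((key) => !(key in attributes));
}
```

A check like `missingAttributes` could run in tests or in a debug build, which keeps the cost off the hot path.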
---
## Trace Hierarchy
### Promotion Flow
```
promotion.request (root)
+-- promotion.evaluate_gates
|   +-- gate.security
|   +-- gate.approval
|   +-- gate.freeze_window
|
+-- workflow.execute
|   +-- workflow.step.security-check
|   +-- workflow.step.approval
|   +-- workflow.step.deploy
|       +-- deployment.execute
|           +-- deployment.assign_tasks
|           +-- agent.task.pull
|           +-- agent.task.deploy
|           +-- agent.task.health_check
|
+-- evidence.generate
+-- evidence.sign
```
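Exported spans arrive as a flat list, and the hierarchy above is recovered from each span's `parentSpanId`. Below is a minimal sketch that rebuilds the tree and prints it in a simplified `+--` notation, without the `|` connectors. The `FlatSpan` shape is an assumption modeled on the example trace in this document.

```typescript
// Assumed flat span shape, modeled on the example trace in this document.
interface FlatSpan {
  spanId: string;
  parentSpanId?: string;
  name: string;
}

// Renders spans as an indented tree; spans whose parent is not in the list become roots.
function renderTree(spans: FlatSpan[]): string {
  const ids = new Set(spans.map((s) => s.spanId));
  const children = new Map<string, FlatSpan[]>();
  const roots: FlatSpan[] = [];
  for (const span of spans) {
    if (span.parentSpanId && ids.has(span.parentSpanId)) {
      const list = children.get(span.parentSpanId) ?? [];
      list.push(span);
      children.set(span.parentSpanId, list);
    } else {
      roots.push(span);
    }
  }
  const lines: string[] = [];
  // Depth-first walk; children keep their input order.
  const walk = (span: FlatSpan, prefix: string, isRoot: boolean): void => {
    lines.push(isRoot ? span.name : `${prefix}+-- ${span.name}`);
    for (const child of children.get(span.spanId) ?? []) {
      walk(child, isRoot ? "" : prefix + "    ", false);
    }
  };
  for (const root of roots) walk(root, "", true);
  return lines.join("\n");
}
```

Because every span has at most one parent, a span that sits in a parent cycle can never be reached from a root. The walk therefore always terminates, although such spans are left out of the output.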
---
## Span Attributes
### Common Attributes
| Attribute | Type | Description |
|-----------|------|-------------|
| `tenant.id` | string | Tenant UUID |
| `user.id` | string | User UUID (if authenticated) |
| `release.id` | string | Release UUID |
| `environment.name` | string | Environment name |
| `error` | boolean | Whether error occurred |
| `error.type` | string | Error type/class |
### Promotion Attributes
| Attribute | Type | Description |
|-----------|------|-------------|
| `promotion.id` | string | Promotion UUID |
| `promotion.status` | string | Current status |
| `promotion.gates` | string[] | Gates evaluated |
| `promotion.decision` | string | allow/deny |
### Deployment Attributes
| Attribute | Type | Description |
|-----------|------|-------------|
| `deployment.job_id` | string | Deployment job UUID |
| `deployment.strategy` | string | Deployment strategy |
| `deployment.target_count` | int | Number of targets |
| `deployment.batch_size` | int | Batch size |
### Agent Task Attributes
| Attribute | Type | Description |
|-----------|------|-------------|
| `task.id` | string | Task UUID |
| `task.type` | string | Task type |
| `agent.id` | string | Agent UUID |
| `target.id` | string | Target UUID |
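Below is a sketch of assembling the common attributes in one place. Optional fields such as `user.id` are omitted rather than set to empty strings.

The `RequestContext` type and the `commonAttributes` function are hypothetical; the attribute keys come from the Common Attributes table. In the OpenTelemetry API a failure is normally also recorded by setting the span status to `ERROR`. This sketch covers only the attribute side.

```typescript
// Hypothetical per-request context; keys follow the Common Attributes table.
type AttributeValue = string | number | boolean | string[];

interface RequestContext {
  tenantId: string;
  userId?: string; // absent for unauthenticated calls
  releaseId?: string;
  environmentName?: string;
}

function commonAttributes(ctx: RequestContext, error?: Error): Record<string, AttributeValue> {
  const attrs: Record<string, AttributeValue> = { "tenant.id": ctx.tenantId };
  if (ctx.userId) attrs["user.id"] = ctx.userId;
  if (ctx.releaseId) attrs["release.id"] = ctx.releaseId;
  if (ctx.environmentName) attrs["environment.name"] = ctx.environmentName;
  attrs["error"] = error !== undefined;
  if (error) attrs["error.type"] = error.name; // e.g. "TypeError"
  return attrs;
}
```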
---
## OpenTelemetry Configuration
### SDK Configuration
```yaml
# otel-config.yaml
service:
  name: stella-release-orchestrator
  version: ${VERSION}
exporters:
  otlp:
    endpoint: otel-collector:4317
    protocol: grpc
processors:
  batch:
    timeout: 10s
    send_batch_size: 1024
resource:
  attributes:
    - key: service.namespace
      value: stella-ops
    - key: deployment.environment
      value: ${ENVIRONMENT}
```
### Environment Variables
```bash
OTEL_SERVICE_NAME=stella-release-orchestrator
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
OTEL_EXPORTER_OTLP_PROTOCOL=grpc
OTEL_TRACES_SAMPLER=parentbased_traceidratio
OTEL_TRACES_SAMPLER_ARG=0.1
```
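The OpenTelemetry SDK reads the two sampler variables at startup. The sketch below shows what the standard sampler names mean as a parent-based flag plus a ratio for root spans.

Several parts are assumptions:

- `resolveSampler` and `SamplerConfig` are hypothetical.
- The environment is passed in as a plain object instead of `process.env`.
- A missing or out-of-range ratio falls back to 1.0, the specification's default for `OTEL_TRACES_SAMPLER_ARG`.

```typescript
// Hypothetical interpretation of OTEL_TRACES_SAMPLER / OTEL_TRACES_SAMPLER_ARG.
type Env = Record<string, string | undefined>;

interface SamplerConfig {
  parentBased: boolean; // follow the parent's sampled flag when a parent exists
  ratio: number;        // sampling probability for root spans, 0..1
}

function resolveSampler(env: Env): SamplerConfig {
  // An unset or empty value means the spec default, parentbased_always_on.
  const name = env["OTEL_TRACES_SAMPLER"] || "parentbased_always_on";
  const arg = Number.parseFloat(env["OTEL_TRACES_SAMPLER_ARG"] ?? "");
  const ratio = Number.isFinite(arg) && arg >= 0 && arg <= 1 ? arg : 1.0;
  switch (name) {
    case "always_on":
      return { parentBased: false, ratio: 1 };
    case "always_off":
      return { parentBased: false, ratio: 0 };
    case "traceidratio":
      return { parentBased: false, ratio };
    case "parentbased_always_on":
      return { parentBased: true, ratio: 1 };
    case "parentbased_always_off":
      return { parentBased: true, ratio: 0 };
    case "parentbased_traceidratio":
      return { parentBased: true, ratio };
    default:
      // Unknown sampler name: fall back to the default behavior.
      return { parentBased: true, ratio: 1 };
  }
}
```

With the variables above, `resolveSampler` returns `{ parentBased: true, ratio: 0.1 }`. A root span is therefore kept about 10% of the time, and child spans follow whatever their parent decided.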
---
## Sampling Strategy
| Environment | Sampling Rate | Reason |
|-------------|---------------|--------|
| Development | 100% | Full visibility |
| Staging | 100% | Full visibility |
| Production | 10% | Cost/performance |
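| Production (errors) | 100% | Always keep error traces; needs tail-based sampling (e.g. in the OpenTelemetry Collector), because the head sampler decides before any error occurs |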
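The table combines two decisions:

- **Head decision:** made when the root span starts, using the per-environment rate. Child spans follow their parent.
- **Tail decision:** made over the finished trace, keeping any trace that contains an error.

Below is a minimal sketch of both. The names `SAMPLING_RATES`, `headSample`, and `keepTrace` are hypothetical. The ratio check, which compares the trace ID's low 56 bits against a threshold, illustrates deterministic trace-ID sampling and is not the SDK's exact `TraceIdRatioBased` algorithm.

```typescript
// Hypothetical per-environment head-sampling rates from the table above.
const SAMPLING_RATES: Record<string, number> = {
  development: 1.0,
  staging: 1.0,
  production: 0.1,
};

// Head decision: children follow the parent's sampled flag; roots use a
// deterministic check on the trace ID, so every service reaches the same answer.
function headSample(traceId: string, environment: string, parentSampled?: boolean): boolean {
  if (parentSampled !== undefined) return parentSampled;
  const rate = SAMPLING_RATES[environment] ?? 1.0;
  if (rate >= 1) return true;
  if (rate <= 0) return false;
  // Low 56 bits of the 128-bit trace ID (last 14 hex chars).
  const low56 = BigInt("0x" + traceId.slice(-14));
  const threshold = BigInt(Math.floor(rate * 2 ** 56));
  return low56 < threshold;
}

// Tail decision, made once all spans of a trace are buffered: keep every
// trace with an error, otherwise keep what the head sampler would keep.
function keepTrace(traceId: string, environment: string, spans: { error: boolean }[]): boolean {
  return spans.some((s) => s.error) || headSample(traceId, environment);
}
```

The tail decision has to buffer the whole trace before it can decide. That is why this policy usually lives in the collector rather than in each service.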
---
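## Example Trace

This is an abridged export with two spans. For brevity, `gate.security` is shown as a direct child of the root span. In a full trace its parent is `promotion.evaluate_gates`, as in the hierarchy above.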
```json
{
  "traceId": "4bf92f3577b34da6a3ce929d0e0e4736",
  "spans": [
    {
      "spanId": "00f067aa0ba902b7",
      "name": "promotion.request",
      "duration_ms": 5234,
      "attributes": {
        "promotion.id": "promo-123",
        "release.id": "rel-456",
        "environment.name": "production"
      }
    },
    {
      "spanId": "00f067aa0ba902b8",
      "parentSpanId": "00f067aa0ba902b7",
      "name": "gate.security",
      "duration_ms": 234,
      "attributes": {
        "gate.result": "passed",
        "vulnerabilities.critical": 0
      }
    }
  ]
}
```
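An export like the one above can be checked mechanically before it is used for debugging. Below is a minimal sketch, assuming the JSON shape of this example (`traceId`, `spans[].spanId`, optional `parentSpanId`). `validateTrace` is a hypothetical helper, not part of any OpenTelemetry tooling.

```typescript
// Assumed shape of the example export above (not the OTLP schema).
interface ExportedSpan {
  spanId: string;
  parentSpanId?: string;
  name: string;
  duration_ms: number;
  attributes?: Record<string, unknown>;
}

interface ExportedTrace {
  traceId: string;
  spans: ExportedSpan[];
}

// Returns human-readable problems; an empty array means the trace looks consistent.
function validateTrace(trace: ExportedTrace): string[] {
  const problems: string[] = [];
  if (!/^[0-9a-f]{32}$/.test(trace.traceId)) problems.push(`bad traceId: ${trace.traceId}`);
  const ids = new Set(trace.spans.map((s) => s.spanId));
  if (ids.size !== trace.spans.length) problems.push("duplicate spanId");
  let roots = 0;
  for (const span of trace.spans) {
    if (!/^[0-9a-f]{16}$/.test(span.spanId)) problems.push(`bad spanId: ${span.spanId}`);
    if (span.parentSpanId === undefined) {
      roots += 1;
    } else if (!ids.has(span.parentSpanId)) {
      problems.push(`${span.name}: parent ${span.parentSpanId} not in trace`);
    }
  }
  // A complete export should have exactly one root (here, promotion.request).
  if (roots !== 1) problems.push(`expected 1 root span, found ${roots}`);
  return problems;
}
```

Running it on the example above returns an empty list. A partial export, where the root span was not sampled or not yet flushed, shows up as a missing parent plus a root count of zero.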
---
## See Also
- [Observability Overview](overview.md)
- [Logging](logging.md)
- [Metrics](metrics.md)
- [Alerting](alerting.md)