# Distributed Tracing Specification

> OpenTelemetry-based distributed tracing for the Release Orchestrator.

**Status:** Planned (not yet implemented)
**Source:** [Architecture Advisory Section 13.3](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
**Related Modules:** [Observability Overview](overview.md), [Logging](logging.md)

## Overview

The Release Orchestrator will use OpenTelemetry for distributed tracing, giving end-to-end visibility into promotion workflows, deployments, and agent tasks.

---

## Trace Context Propagation

### W3C Trace Context

```typescript
// Trace context structure
interface TraceContext {
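  traceId: string;        // 32 lowercase hex chars (16 bytes), never all zeros
  spanId: string;         // 16 lowercase hex chars (8 bytes), never all zeros
  parentSpanId?: string;  // absent on root spans
  sampled: boolean;       // bit 0 of the traceparent trace-flags field
  baggage: Record<string, string>;  // key/value pairs carried in the baggage header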
}

// Propagation headers
const TRACE_HEADERS = {
  W3C_TRACEPARENT: "traceparent",
  W3C_TRACESTATE: "tracestate",
  BAGGAGE: "baggage",
};

// Example traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
```
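As a minimal sketch of how the structure above could be written to outgoing request headers: `toHeaders` is a hypothetical helper, not an existing Orchestrator or OpenTelemetry API, and the baggage encoding is simplified (percent-encoded values, no per-entry properties).

```typescript
interface TraceContext {
  traceId: string;
  spanId: string;
  parentSpanId?: string;
  sampled: boolean;
  baggage: Record<string, string>;
}

// Hypothetical helper: serialize a TraceContext into W3C propagation headers.
function toHeaders(ctx: TraceContext): Record<string, string> {
  // Version 00 of traceparent; trace-flags 01 means "sampled".
  const flags = ctx.sampled ? "01" : "00";
  const headers: Record<string, string> = {
    traceparent: `00-${ctx.traceId}-${ctx.spanId}-${flags}`,
  };

  // Emit the baggage header only when there is something to carry.
  const keys = Object.keys(ctx.baggage);
  if (keys.length > 0) {
    headers.baggage = keys
      .map((key) => `${key}=${encodeURIComponent(ctx.baggage[key])}`)
      .join(",");
  }
  return headers;
}
```

`tracestate` is left out here because the structure above carries no vendor-specific state.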

### Header Format

```
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
             ^  ^                                ^                ^
             |  |                                |                |
             |  trace-id (32 hex)                span-id (16 hex) trace-flags (01 = sampled)
             version
```
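The format above can be parsed with a small validator. This is a sketch that accepts only the version `00` layout shown here; `parseTraceparent` and `ParsedTraceparent` are illustrative names, and a production service would normally rely on the OpenTelemetry propagator instead.

```typescript
interface ParsedTraceparent {
  version: string;
  traceId: string;
  spanId: string;
  sampled: boolean;
}

// version(2 hex)-trace-id(32 hex)-span-id(16 hex)-trace-flags(2 hex), lowercase only.
const TRACEPARENT_RE = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function parseTraceparent(header: string): ParsedTraceparent | null {
  const match = TRACEPARENT_RE.exec(header.trim());
  if (match === null) return null;

  const [, version, traceId, spanId, flags] = match;
  // Version ff is reserved as invalid; all-zero ids are invalid as well.
  if (version === "ff") return null;
  if (/^0+$/.test(traceId) || /^0+$/.test(spanId)) return null;

  // Bit 0 of trace-flags is the sampled flag.
  const sampled = (parseInt(flags, 16) & 0x01) === 0x01;
  return { version, traceId, spanId, sampled };
}
```

Returning `null` rather than throwing lets a caller fall back to starting a new trace when the incoming header is malformed.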

---

## Key Traces

| Operation | Span Name | Attributes |
|-----------|-----------|------------|
| Promotion request | `promotion.request` | promotion_id, release_id, environment |
| Gate evaluation | `promotion.evaluate_gates` | gate_names, result |
| Workflow execution | `workflow.execute` | workflow_run_id, template_name |
| Step execution | `workflow.step.{type}` | step_run_id, node_id, inputs |
| Deployment job | `deployment.execute` | job_id, environment, strategy |
| Agent task | `agent.task.{type}` | task_id, agent_id, target_id |
| Plugin call | `plugin.{method}` | plugin_id, method, duration |

---

## Trace Hierarchy

### Promotion Flow

```
promotion.request (root)
+-- promotion.evaluate_gates
|   +-- gate.security
|   +-- gate.approval
|   +-- gate.freeze_window
|
+-- workflow.execute
|   +-- workflow.step.security-check
|   +-- workflow.step.approval
|   +-- workflow.step.deploy
|       +-- deployment.execute
|           +-- deployment.assign_tasks
|           +-- agent.task.pull
|           +-- agent.task.deploy
|           +-- agent.task.health_check
|
+-- evidence.generate
    +-- evidence.sign
```
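The parent/child relationships in this hierarchy can be illustrated with a toy in-memory tracer. `SketchTracer` is hypothetical and synchronous only; it stands in for the OpenTelemetry tracer and context APIs purely to show how nesting produces `parentSpanId` links.

```typescript
interface SpanRecord {
  name: string;
  spanId: string;
  parentSpanId?: string;
}

class SketchTracer {
  readonly spans: SpanRecord[] = [];
  private stack: string[] = [];
  private nextId = 1;

  // Run fn inside a new span whose parent is the currently active span.
  withSpan<T>(name: string, fn: () => T): T {
    const spanId = (this.nextId++).toString(16).padStart(16, "0");
    const parentSpanId = this.stack[this.stack.length - 1];
    this.spans.push({ name, spanId, parentSpanId });
    this.stack.push(spanId);
    try {
      return fn();
    } finally {
      // Always restore the previous active span, even if fn throws.
      this.stack.pop();
    }
  }
}

// Reproduce part of the promotion flow shown above.
const tracer = new SketchTracer();
tracer.withSpan("promotion.request", () => {
  tracer.withSpan("promotion.evaluate_gates", () => {
    tracer.withSpan("gate.security", () => {});
    tracer.withSpan("gate.approval", () => {});
  });
  tracer.withSpan("workflow.execute", () => {
    tracer.withSpan("workflow.step.deploy", () => {
      tracer.withSpan("deployment.execute", () => {});
    });
  });
});
```

After this runs, `gate.security` records `promotion.evaluate_gates` as its parent, and only `promotion.request` has no parent.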

---

## Span Attributes

### Common Attributes

| Attribute | Type | Description |
|-----------|------|-------------|
| `tenant.id` | string | Tenant UUID |
| `user.id` | string | User UUID (if authenticated) |
| `release.id` | string | Release UUID |
| `environment.name` | string | Environment name |
| `error` | boolean | Whether error occurred |
| `error.type` | string | Error type/class |

### Promotion Attributes

| Attribute | Type | Description |
|-----------|------|-------------|
| `promotion.id` | string | Promotion UUID |
| `promotion.status` | string | Current status |
| `promotion.gates` | string[] | Gates evaluated |
| `promotion.decision` | string | allow/deny |

### Deployment Attributes

| Attribute | Type | Description |
|-----------|------|-------------|
| `deployment.job_id` | string | Deployment job UUID |
| `deployment.strategy` | string | Deployment strategy |
| `deployment.target_count` | int | Number of targets |
| `deployment.batch_size` | int | Batch size |

### Agent Task Attributes

| Attribute | Type | Description |
|-----------|------|-------------|
| `task.id` | string | Task UUID |
| `task.type` | string | Task type |
| `agent.id` | string | Agent UUID |
| `target.id` | string | Target UUID |
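
One way to keep these attribute names consistent across services is to type them once and merge groups before attaching them to a span. The sketch below covers only the common and promotion tables; the type names and `mergeAttributes` are hypothetical, and treating `user.id` and the error fields as optional is an assumption based on the descriptions above.

```typescript
type AttributeValue = string | number | boolean | string[];

// Common attributes; optional fields are an assumption (see lead-in).
type CommonAttributes = {
  "tenant.id": string;
  "user.id"?: string;
  "release.id": string;
  "environment.name": string;
  "error"?: boolean;
  "error.type"?: string;
};

type PromotionAttributes = {
  "promotion.id": string;
  "promotion.status": string;
  "promotion.gates": string[];
  "promotion.decision": "allow" | "deny";
};

// Merge attribute groups, dropping keys whose value is undefined.
function mergeAttributes(
  ...groups: Array<Record<string, AttributeValue | undefined>>
): Record<string, AttributeValue> {
  const out: Record<string, AttributeValue> = {};
  for (const group of groups) {
    for (const key of Object.keys(group)) {
      const value = group[key];
      if (value !== undefined) out[key] = value;
    }
  }
  return out;
}

// Illustrative values only.
const common: CommonAttributes = {
  "tenant.id": "tenant-1",
  "user.id": undefined, // unauthenticated request
  "release.id": "rel-456",
  "environment.name": "production",
};
const promotion: PromotionAttributes = {
  "promotion.id": "promo-123",
  "promotion.status": "evaluating",
  "promotion.gates": ["security", "approval"],
  "promotion.decision": "allow",
};
const spanAttributes = mergeAttributes(common, promotion);
```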

---

## OpenTelemetry Configuration

### SDK Configuration

```yaml
# otel-config.yaml
service:
  name: stella-release-orchestrator
  version: ${VERSION}

exporters:
  otlp:
    endpoint: otel-collector:4317
    protocol: grpc

processors:
  batch:
    timeout: 10s
    send_batch_size: 1024

resource:
  attributes:
    - key: service.namespace
      value: stella-ops
    - key: deployment.environment
      value: ${ENVIRONMENT}
```

### Environment Variables

```bash
OTEL_SERVICE_NAME=stella-release-orchestrator
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
OTEL_EXPORTER_OTLP_PROTOCOL=grpc
OTEL_TRACES_SAMPLER=parentbased_traceidratio
OTEL_TRACES_SAMPLER_ARG=0.1
```

---

## Sampling Strategy

| Environment | Sampling Rate | Reason |
|-------------|---------------|--------|
| Development | 100% | Full visibility |
| Staging | 100% | Full visibility |
| Production | 10% | Cost/performance |
| Production (errors) | 100% | Always sample errors (requires tail-based sampling, since a head-based sampler decides before an error is known) |
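
The head-based part of this strategy (the `parentbased_traceidratio` sampler) can be sketched as follows. This is an illustration under stated assumptions, not the OpenTelemetry SDK's algorithm: `shouldSample` is a hypothetical function, the environment keys in `SAMPLING_RATIO` are assumed names, and using the low 32 bits of the trace ID is a simplification. It does not cover the error row.

```typescript
// Ratios from the table above; the lowercase keys are assumed names.
const SAMPLING_RATIO: Record<string, number> = {
  development: 1.0,
  staging: 1.0,
  production: 0.1,
};

function shouldSample(
  traceId: string,
  ratio: number,
  parentSampled?: boolean,
): boolean {
  // Parent-based: if an upstream service already decided, follow it.
  if (parentSampled !== undefined) return parentSampled;

  if (ratio >= 1) return true;
  if (ratio <= 0) return false;

  // Treat the low 32 bits of the trace ID as a uniform value in [0, 2^32).
  // The same trace ID always yields the same decision.
  const low32 = parseInt(traceId.slice(-8), 16);
  return low32 < ratio * 0x100000000;
}
```

Because the decision depends only on the trace ID and the parent flag, every service that sees the same trace reaches the same answer without coordination.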

---

## Example Trace

```json
{
  "traceId": "4bf92f3577b34da6a3ce929d0e0e4736",
  "spans": [
    {
      "spanId": "00f067aa0ba902b7",
      "name": "promotion.request",
      "duration_ms": 5234,
      "attributes": {
        "promotion.id": "promo-123",
        "release.id": "rel-456",
        "environment.name": "production"
      }
    },
    {
      "spanId": "00f067aa0ba902b8",
      "parentSpanId": "00f067aa0ba902b7",
      "name": "gate.security",
      "duration_ms": 234,
      "attributes": {
        "gate.result": "passed",
        "vulnerabilities.critical": 0
      }
    }
  ]
}
```
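
A flat span list like the one above can be turned back into a hierarchy by grouping spans under their `parentSpanId`. The sketch below assumes the field names from this example payload; `renderTree` and `FlatSpan` are hypothetical names.

```typescript
interface FlatSpan {
  spanId: string;
  parentSpanId?: string;
  name: string;
  duration_ms: number;
}

// Render spans as indented lines, one per span, children under parents.
function renderTree(spans: FlatSpan[]): string[] {
  const known = new Set(spans.map((s) => s.spanId));
  const children = new Map<string, FlatSpan[]>();
  const roots: FlatSpan[] = [];

  for (const span of spans) {
    const parent = span.parentSpanId;
    // Spans with no parent, or whose parent is not in this payload, are roots.
    if (parent === undefined || !known.has(parent)) {
      roots.push(span);
    } else {
      const siblings = children.get(parent) || [];
      siblings.push(span);
      children.set(parent, siblings);
    }
  }

  const lines: string[] = [];
  const visit = (span: FlatSpan, depth: number): void => {
    lines.push(`${"  ".repeat(depth)}${span.name} (${span.duration_ms} ms)`);
    for (const child of children.get(span.spanId) || []) {
      visit(child, depth + 1);
    }
  };
  roots.forEach((root) => visit(root, 0));
  return lines;
}

// The two spans from the example trace.
const exampleLines = renderTree([
  { spanId: "00f067aa0ba902b7", name: "promotion.request", duration_ms: 5234 },
  {
    spanId: "00f067aa0ba902b8",
    parentSpanId: "00f067aa0ba902b7",
    name: "gate.security",
    duration_ms: 234,
  },
]);
```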

---

## See Also

- [Observability Overview](overview.md)
- [Logging](logging.md)
- [Metrics](metrics.md)
- [Alerting](alerting.md)