Distributed Tracing Specification

OpenTelemetry-based distributed tracing for the Release Orchestrator.

Status: Planned (not yet implemented)
Source: Architecture Advisory Section 13.3
Related Modules: Observability Overview, Logging

Overview

The Release Orchestrator will use OpenTelemetry for distributed tracing, giving end-to-end visibility into promotion workflows, deployments, and agent tasks.


Trace Context Propagation

W3C Trace Context

// Trace context structure
interface TraceContext {
  traceId: string;        // 32-char hex
  spanId: string;         // 16-char hex
  parentSpanId?: string;
  sampled: boolean;
  baggage: Record<string, string>;
}

// Propagation headers
const TRACE_HEADERS = {
  W3C_TRACEPARENT: "traceparent",
  W3C_TRACESTATE: "tracestate",
  BAGGAGE: "baggage",
};

// Example traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
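
A minimal sketch of how a `TraceContext` could be serialized into the outgoing propagation headers. The helper name `injectTraceContext` is hypothetical, and percent-encoding baggage values with `encodeURIComponent` is an assumption made for illustration. An OpenTelemetry SDK propagator would normally do this work.

```typescript
// Hypothetical helper, not part of the orchestrator codebase.
interface TraceContext {
  traceId: string;        // 32-char hex
  spanId: string;         // 16-char hex
  parentSpanId?: string;
  sampled: boolean;
  baggage: Record<string, string>;
}

// Build the W3C headers for an outgoing request from a trace context.
function injectTraceContext(ctx: TraceContext): Record<string, string> {
  // Version "00"; the low bit of the flags byte carries the sampled decision.
  const flags = ctx.sampled ? "01" : "00";
  const headers: Record<string, string> = {
    traceparent: `00-${ctx.traceId}-${ctx.spanId}-${flags}`,
  };

  // Baggage is a comma-separated list of key=value members.
  const members = Object.keys(ctx.baggage).map(
    (key) => `${key}=${encodeURIComponent(ctx.baggage[key])}`,
  );
  if (members.length > 0) {
    headers["baggage"] = members.join(",");
  }
  return headers;
}

const outgoing = injectTraceContext({
  traceId: "4bf92f3577b34da6a3ce929d0e0e4736",
  spanId: "00f067aa0ba902b7",
  sampled: true,
  baggage: { "tenant.id": "tenant-1" }, // illustrative value
});
```

The `tracestate` header is omitted here because the source does not describe any vendor-specific state to carry.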

Header Format

traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
             ^  ^                                ^                ^
             |  |                                |                |
             |  trace-id (32 hex)                span-id (16 hex) flags
             version
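
The four fields above can be pulled apart with a small parser. This is a sketch only: `parseTraceparent` and `ParsedTraceparent` are hypothetical names, and the pattern accepts just the four-field shape shown here rather than any future header versions.

```typescript
// Hypothetical parser for the four-field traceparent layout shown above.
interface ParsedTraceparent {
  version: string;
  traceId: string;
  spanId: string;
  sampled: boolean;
}

// version (2 hex) - trace-id (32 hex) - span-id (16 hex) - flags (2 hex)
const TRACEPARENT_PATTERN =
  /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function parseTraceparent(header: string): ParsedTraceparent | null {
  const match = TRACEPARENT_PATTERN.exec(header.trim());
  if (match === null) {
    return null;
  }
  const [, version, traceId, spanId, flags] = match;

  // W3C Trace Context treats version "ff" and all-zero ids as invalid.
  if (version === "ff" || /^0+$/.test(traceId) || /^0+$/.test(spanId)) {
    return null;
  }

  // The least significant bit of the flags byte is the sampled flag.
  const sampled = (parseInt(flags, 16) & 0x01) === 0x01;
  return { version, traceId, spanId, sampled };
}

const parsed = parseTraceparent(
  "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01",
);
```

Returning `null` for malformed input lets a caller fall back to starting a new trace instead of propagating a broken context.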

Key Traces

| Operation | Span Name | Attributes |
|-----------|-----------|------------|
| Promotion request | `promotion.request` | `promotion_id`, `release_id`, `environment` |
| Gate evaluation | `promotion.evaluate_gates` | `gate_names`, `result` |
| Workflow execution | `workflow.execute` | `workflow_run_id`, `template_name` |
| Step execution | `workflow.step.{type}` | `step_run_id`, `node_id`, `inputs` |
| Deployment job | `deployment.execute` | `job_id`, `environment`, `strategy` |
| Agent task | `agent.task.{type}` | `task_id`, `agent_id`, `target_id` |
| Plugin call | `plugin.{method}` | `plugin_id`, `method`, `duration` |

Trace Hierarchy

Promotion Flow

promotion.request (root)
+-- promotion.evaluate_gates
|   +-- gate.security
|   +-- gate.approval
|   +-- gate.freeze_window
|
+-- workflow.execute
|   +-- workflow.step.security-check
|   +-- workflow.step.approval
|   +-- workflow.step.deploy
|       +-- deployment.execute
|           +-- deployment.assign_tasks
|           +-- agent.task.pull
|           +-- agent.task.deploy
|           +-- agent.task.health_check
|
+-- evidence.generate
    +-- evidence.sign
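
The parent/child relationships in this hierarchy come down to two rules: a child span reuses its parent's trace ID, and it records the parent's span ID. The sketch below illustrates that with an in-memory stand-in. `SketchSpan`, `randomHex`, and `startSpan` are hypothetical names, and a real service would obtain spans from the OpenTelemetry tracer API instead.

```typescript
// Hypothetical in-memory span record, for illustration only.
interface SketchSpan {
  traceId: string;
  spanId: string;
  parentSpanId?: string;
  name: string;
}

// Math.random is adequate for a sketch; it is not a cryptographic source.
function randomHex(length: number): string {
  let out = "";
  for (let i = 0; i < length; i++) {
    out += Math.floor(Math.random() * 16).toString(16);
  }
  return out;
}

// A root span starts a new trace; a child span joins its parent's trace.
function startSpan(name: string, parent?: SketchSpan): SketchSpan {
  const span: SketchSpan = {
    traceId: parent ? parent.traceId : randomHex(32),
    spanId: randomHex(16),
    name,
  };
  if (parent) {
    span.parentSpanId = parent.spanId;
  }
  return span;
}

// Mirrors the first branch of the promotion flow above.
const rootSpan = startSpan("promotion.request");
const gatesSpan = startSpan("promotion.evaluate_gates", rootSpan);
const securityGateSpan = startSpan("gate.security", gatesSpan);
```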

Span Attributes

Common Attributes

| Attribute | Type | Description |
|-----------|------|-------------|
| `tenant.id` | string | Tenant UUID |
| `user.id` | string | User UUID (if authenticated) |
| `release.id` | string | Release UUID |
| `environment.name` | string | Environment name |
| `error` | boolean | Whether an error occurred |
| `error.type` | string | Error type/class |

Promotion Attributes

| Attribute | Type | Description |
|-----------|------|-------------|
| `promotion.id` | string | Promotion UUID |
| `promotion.status` | string | Current status |
| `promotion.gates` | string[] | Gates evaluated |
| `promotion.decision` | string | allow/deny |

Deployment Attributes

| Attribute | Type | Description |
|-----------|------|-------------|
| `deployment.job_id` | string | Deployment job UUID |
| `deployment.strategy` | string | Deployment strategy |
| `deployment.target_count` | int | Number of targets |
| `deployment.batch_size` | int | Batch size |

Agent Task Attributes

| Attribute | Type | Description |
|-----------|------|-------------|
| `task.id` | string | Task UUID |
| `task.type` | string | Task type |
| `agent.id` | string | Agent UUID |
| `target.id` | string | Target UUID |
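
One way to keep attribute keys consistent across services is to build them in a single typed helper. The sketch below does this for the common and promotion attributes. The names `CommonFields`, `PromotionFields`, and `promotionAttributes` are hypothetical, and taking `error.type` from `Error.name` is an assumption about how the error class would be recorded.

```typescript
// Hypothetical types and helper; attribute keys follow the tables above.
type AttributeValue = string | number | boolean | string[];
type Attributes = Record<string, AttributeValue>;

interface CommonFields {
  tenantId: string;
  userId?: string;          // present only for authenticated requests
  releaseId: string;
  environmentName: string;
}

interface PromotionFields {
  id: string;
  status: string;
  gates: string[];
  decision: "allow" | "deny";
}

function promotionAttributes(
  common: CommonFields,
  promotion: PromotionFields,
  error?: Error,
): Attributes {
  const attrs: Attributes = {
    "tenant.id": common.tenantId,
    "release.id": common.releaseId,
    "environment.name": common.environmentName,
    "promotion.id": promotion.id,
    "promotion.status": promotion.status,
    "promotion.gates": promotion.gates,
    "promotion.decision": promotion.decision,
    "error": error !== undefined,
  };
  if (common.userId !== undefined) {
    attrs["user.id"] = common.userId;
  }
  if (error !== undefined) {
    attrs["error.type"] = error.name; // assumption: class name as error type
  }
  return attrs;
}

// Illustrative values only.
const deniedAttrs = promotionAttributes(
  { tenantId: "tenant-1", releaseId: "rel-456", environmentName: "production" },
  { id: "promo-123", status: "rejected", gates: ["security"], decision: "deny" },
  new RangeError("critical vulnerabilities found"),
);
```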

OpenTelemetry Configuration

SDK Configuration

# otel-config.yaml
service:
  name: stella-release-orchestrator
  version: ${VERSION}

exporters:
  otlp:
    endpoint: otel-collector:4317
    protocol: grpc

processors:
  batch:
    timeout: 10s
    send_batch_size: 1024

resource:
  attributes:
    - key: service.namespace
      value: stella-ops
    - key: deployment.environment
      value: ${ENVIRONMENT}

Environment Variables

OTEL_SERVICE_NAME=stella-release-orchestrator
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
OTEL_EXPORTER_OTLP_PROTOCOL=grpc
OTEL_TRACES_SAMPLER=parentbased_traceidratio
OTEL_TRACES_SAMPLER_ARG=0.1

Sampling Strategy

| Environment | Sampling Rate | Reason |
|-------------|---------------|--------|
| Development | 100% | Full visibility |
| Staging | 100% | Full visibility |
| Production | 10% | Cost/performance |
| Production (errors) | 100% | Always sample errors |
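
The `parentbased_traceidratio` sampler combines two rules: follow the parent's decision when one exists, otherwise sample a fixed fraction of trace IDs. The sketch below shows that logic with the rates from the table. It is a simplified stand-in, not the OpenTelemetry SDK's exact algorithm: comparing the low 32 bits of the trace ID against the ratio, the `SAMPLING_RATIOS` map, and the function names are all assumptions. Because this decision is made when a span starts, before any error is known, the "always sample errors" row would likely need a separate mechanism such as tail-based sampling in the collector.

```typescript
// Hypothetical per-environment ratios taken from the table above.
const SAMPLING_RATIOS: Record<string, number> = {
  development: 1.0,
  staging: 1.0,
  production: 0.1,
};

// Deterministic ratio check: the same trace id always yields the same answer,
// so every service that sees the id agrees on the decision.
function traceIdRatioSampled(traceId: string, ratio: number): boolean {
  if (ratio >= 1) return true;
  if (ratio <= 0) return false;
  // Interpret the last 8 hex chars as an integer in [0, 2^32).
  const low = parseInt(traceId.slice(-8), 16);
  return low < ratio * 0x100000000;
}

// Parent-based wrapper: an existing parent decision always wins.
function shouldSample(
  traceId: string,
  environment: string,
  parentSampled?: boolean,
): boolean {
  if (parentSampled !== undefined) {
    return parentSampled;
  }
  const ratio = SAMPLING_RATIOS[environment];
  // Assumption: unknown environments default to full sampling.
  return traceIdRatioSampled(traceId, ratio !== undefined ? ratio : 1.0);
}

const sampledInProd = shouldSample(
  "4bf92f3577b34da6a3ce929d0e0e4736",
  "production",
);
```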

Example Trace

{
  "traceId": "4bf92f3577b34da6a3ce929d0e0e4736",
  "spans": [
    {
      "spanId": "00f067aa0ba902b7",
      "name": "promotion.request",
      "duration_ms": 5234,
      "attributes": {
        "promotion.id": "promo-123",
        "release.id": "rel-456",
        "environment.name": "production"
      }
    },
    {
      "spanId": "00f067aa0ba902b8",
      "parentSpanId": "00f067aa0ba902b7",
      "name": "gate.security",
      "duration_ms": 234,
      "attributes": {
        "gate.result": "passed",
        "vulnerabilities.critical": 0
      }
    }
  ]
}
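
An exported trace like this one is a flat list, and the hierarchy is recovered by following `parentSpanId` links. A minimal sketch of that reconstruction follows. `FlatSpan`, `SpanNode`, `buildSpanTree`, and `renderTree` are hypothetical names, and only the fields used for the tree are modeled.

```typescript
// Hypothetical types covering the fields needed to rebuild the hierarchy.
interface FlatSpan {
  spanId: string;
  parentSpanId?: string;
  name: string;
  duration_ms: number;
}

interface SpanNode extends FlatSpan {
  children: SpanNode[];
}

// Index spans by id, then attach each span to its parent (or treat it as a root).
function buildSpanTree(spans: FlatSpan[]): SpanNode[] {
  const byId: Record<string, SpanNode> = {};
  const nodes: SpanNode[] = spans.map((span) => {
    const node: SpanNode = { ...span, children: [] };
    byId[span.spanId] = node;
    return node;
  });

  const roots: SpanNode[] = [];
  nodes.forEach((node) => {
    const parent =
      node.parentSpanId !== undefined ? byId[node.parentSpanId] : undefined;
    if (parent !== undefined) {
      parent.children.push(node);
    } else {
      roots.push(node); // no parent in this trace: a root span
    }
  });
  return roots;
}

// Render the tree as indented lines, two spaces per level.
function renderTree(nodes: SpanNode[], depth: number = 0): string[] {
  const lines: string[] = [];
  const indent = new Array(depth + 1).join("  ");
  nodes.forEach((node) => {
    lines.push(`${indent}${node.name} (${node.duration_ms} ms)`);
    renderTree(node.children, depth + 1).forEach((line) => lines.push(line));
  });
  return lines;
}

// The two spans from the example trace above.
const exampleTree = buildSpanTree([
  { spanId: "00f067aa0ba902b7", name: "promotion.request", duration_ms: 5234 },
  {
    spanId: "00f067aa0ba902b8",
    parentSpanId: "00f067aa0ba902b7",
    name: "gate.security",
    duration_ms: 234,
  },
]);
const exampleLines = renderTree(exampleTree);
```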

See Also