Distributed Tracing Specification
OpenTelemetry-based distributed tracing for the Release Orchestrator.
Status: Planned (not yet implemented)
Source: Architecture Advisory Section 13.3
Related Modules: Observability Overview, Logging
Overview
The Release Orchestrator uses OpenTelemetry for distributed tracing, enabling end-to-end visibility of promotion workflows, deployments, and agent tasks.
Trace Context Propagation
W3C Trace Context
```typescript
// Trace context structure
interface TraceContext {
  traceId: string;        // 32 lowercase hex chars (16 bytes)
  spanId: string;         // 16 lowercase hex chars (8 bytes)
  parentSpanId?: string;  // absent on the root span
  sampled: boolean;       // mirrors the "sampled" bit of trace-flags
  baggage: Record<string, string>;
}

// Propagation headers
const TRACE_HEADERS = {
  W3C_TRACEPARENT: "traceparent",
  W3C_TRACESTATE: "tracestate",
  BAGGAGE: "baggage",
} as const;

// Example traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
```
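The `baggage` map travels in the W3C `baggage` header as comma-separated `key=value` members. The following is a minimal sketch of converting between the two, assuming keys are already valid header tokens. The function names `formatBaggage` and `parseBaggage` are hypothetical, and baggage properties (`;key=value` suffixes on a member) are discarded when parsing.

```typescript
// Hypothetical helpers: serialize TraceContext.baggage to a W3C `baggage`
// header value and parse it back. Assumes keys are valid header tokens.

function formatBaggage(baggage: Record<string, string>): string {
  return Object.entries(baggage)
    .map(([key, value]) => `${key}=${encodeURIComponent(value)}`)
    .join(",");
}

function parseBaggage(header: string): Record<string, string> {
  const result: Record<string, string> = {};
  for (const member of header.split(",")) {
    // Drop optional member properties that follow the first ";".
    const semi = member.indexOf(";");
    const pair = semi < 0 ? member : member.slice(0, semi);
    const eq = pair.indexOf("=");
    if (eq < 0) continue; // not a key=value member
    const key = pair.slice(0, eq).trim();
    if (key === "") continue;
    try {
      result[key] = decodeURIComponent(pair.slice(eq + 1).trim());
    } catch {
      // Skip values whose percent-encoding is invalid.
    }
  }
  return result;
}
```

Percent-encoding every value keeps commas, semicolons, and spaces in values from breaking the list syntax.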
Header Format
```
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
             ^  ^                                ^                ^
             |  |                                |                |
             |  trace-id (32 hex)                span-id (16 hex) flags
             version
```
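A minimal sketch of parsing and formatting the version-`00` header shown above. The function names are hypothetical, and the `TraceContext` interface is repeated so the block is self-contained. The incoming span-id identifies the caller's span, so a receiving service would normally record it as the `parentSpanId` of the span it starts; the sketch copies it into `spanId` to mirror the diagram.

```typescript
// Hypothetical parser/formatter for version-00 W3C traceparent headers.
interface TraceContext {
  traceId: string;
  spanId: string;
  parentSpanId?: string;
  sampled: boolean;
  baggage: Record<string, string>;
}

// version-traceid-spanid-flags, lowercase hex only.
const TRACEPARENT_RE = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function parseTraceparent(header: string): TraceContext | null {
  const match = TRACEPARENT_RE.exec(header.trim());
  if (!match) return null;
  const [, traceId, spanId, flags] = match;
  // All-zero trace or span IDs are invalid.
  if (/^0+$/.test(traceId) || /^0+$/.test(spanId)) return null;
  return {
    traceId,
    spanId,
    // Bit 0 of trace-flags is the "sampled" flag.
    sampled: (parseInt(flags, 16) & 0x01) === 0x01,
    baggage: {}, // baggage arrives in its own header
  };
}

function formatTraceparent(ctx: TraceContext): string {
  return `00-${ctx.traceId}-${ctx.spanId}-${ctx.sampled ? "01" : "00"}`;
}
```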
Key Traces
| Operation | Span Name | Attributes |
|---|---|---|
| Promotion request | `promotion.request` | promotion_id, release_id, environment |
| Gate evaluation | `promotion.evaluate_gates` | gate_names, result |
| Workflow execution | `workflow.execute` | workflow_run_id, template_name |
| Step execution | `workflow.step.{type}` | step_run_id, node_id, inputs |
| Deployment job | `deployment.execute` | job_id, environment, strategy |
| Agent task | `agent.task.{type}` | task_id, agent_id, target_id |
| Plugin call | `plugin.{method}` | plugin_id, method, duration |
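Three of these span names are templates. A sketch of a hypothetical `spanName` helper that keeps fixed and templated names apart at the type level; rejecting suffixes that contain a dot is an assumption, made so every templated name keeps the same depth.

```typescript
// Hypothetical helper for the span names in the Key Traces table.
type FixedSpanName =
  | "promotion.request"
  | "promotion.evaluate_gates"
  | "workflow.execute"
  | "deployment.execute";

// Templates: workflow.step.{type}, agent.task.{type}, plugin.{method}
type TemplatedSpanKind = "workflow.step" | "agent.task" | "plugin";

function spanName(kind: FixedSpanName): string;
function spanName(kind: TemplatedSpanKind, suffix: string): string;
function spanName(kind: string, suffix?: string): string {
  if (suffix === undefined) return kind;
  // Assumption: the suffix is a single segment (no dots, not empty).
  if (suffix === "" || suffix.includes(".")) {
    throw new Error(`invalid span name suffix: "${suffix}"`);
  }
  return `${kind}.${suffix}`;
}
```

Calling `spanName("workflow.step")` without a suffix is a compile-time error, which catches a missing `{type}` early.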
Trace Hierarchy
Promotion Flow
```
promotion.request (root)
+-- promotion.evaluate_gates
|   +-- gate.security
|   +-- gate.approval
|   +-- gate.freeze_window
|
+-- workflow.execute
|   +-- workflow.step.security-check
|   +-- workflow.step.approval
|   +-- workflow.step.deploy
|       +-- deployment.execute
|           +-- deployment.assign_tasks
|           +-- agent.task.pull
|           +-- agent.task.deploy
|           +-- agent.task.health_check
|
+-- evidence.generate
    +-- evidence.sign
```
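A hierarchy like this can be rebuilt from flat span records that carry `parentSpanId`, as in the example trace later in this spec. A sketch with a hypothetical `renderTree` function; it uses indentation only, without the `|` connectors, and treats spans whose parent is not in the list as roots.

```typescript
// Hypothetical renderer: flat spans -> indented "+--" tree.
interface FlatSpan {
  spanId: string;
  parentSpanId?: string;
  name: string;
}

function renderTree(spans: FlatSpan[]): string {
  const ids = new Set(spans.map((s) => s.spanId));
  const children = new Map<string, FlatSpan[]>();
  for (const span of spans) {
    if (span.parentSpanId === undefined) continue;
    const list = children.get(span.parentSpanId) ?? [];
    list.push(span);
    children.set(span.parentSpanId, list);
  }
  // Roots: no parent, or a parent that is missing from this list.
  const roots = spans.filter(
    (s) => s.parentSpanId === undefined || !ids.has(s.parentSpanId),
  );
  const lines: string[] = [];
  const walk = (span: FlatSpan, depth: number): void => {
    lines.push(
      depth === 0 ? span.name : `${"    ".repeat(depth - 1)}+-- ${span.name}`,
    );
    for (const child of children.get(span.spanId) ?? []) walk(child, depth + 1);
  };
  for (const root of roots) walk(root, 0);
  return lines.join("\n");
}
```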
Span Attributes
Common Attributes
| Attribute | Type | Description |
|---|---|---|
| `tenant.id` | string | Tenant UUID |
| `user.id` | string | User UUID (if authenticated) |
| `release.id` | string | Release UUID |
| `environment.name` | string | Environment name |
| `error` | boolean | Whether an error occurred |
| `error.type` | string | Error type/class |
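A sketch of assembling the common attributes for a span. The `CommonSpanContext` input shape and the `commonAttributes` name are assumptions; `user.id` is omitted for unauthenticated requests, and `error.type` is set only when an error occurred, using the error's class name.

```typescript
// Hypothetical builder for the common span attributes.
type AttributeValue = string | number | boolean | string[];

interface CommonSpanContext {
  tenantId: string;
  userId?: string;   // undefined when the request is not authenticated
  releaseId: string;
  environmentName: string;
  error?: unknown;   // the caught error, if any
}

function commonAttributes(ctx: CommonSpanContext): Record<string, AttributeValue> {
  const attrs: Record<string, AttributeValue> = {
    "tenant.id": ctx.tenantId,
    "release.id": ctx.releaseId,
    "environment.name": ctx.environmentName,
    error: ctx.error !== undefined,
  };
  if (ctx.userId !== undefined) attrs["user.id"] = ctx.userId;
  if (ctx.error !== undefined) {
    // Class name for Error instances (e.g. "TypeError"), else the JS type.
    attrs["error.type"] =
      ctx.error instanceof Error ? ctx.error.constructor.name : typeof ctx.error;
  }
  return attrs;
}
```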
Promotion Attributes
| Attribute | Type | Description |
|---|---|---|
| `promotion.id` | string | Promotion UUID |
| `promotion.status` | string | Current status |
| `promotion.gates` | string[] | Gates evaluated |
| `promotion.decision` | string | `allow` or `deny` |
Deployment Attributes
| Attribute | Type | Description |
|---|---|---|
| `deployment.job_id` | string | Deployment job UUID |
| `deployment.strategy` | string | Deployment strategy |
| `deployment.target_count` | int | Number of targets |
| `deployment.batch_size` | int | Batch size |
Agent Task Attributes
| Attribute | Type | Description |
|---|---|---|
| `task.id` | string | Task UUID |
| `task.type` | string | Task type |
| `agent.id` | string | Agent UUID |
| `target.id` | string | Target UUID |
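The attribute tables above can be expressed as a schema and checked in tests. A sketch with a hypothetical `validateAttributes` function; treating keys outside these tables as problems is an assumption about how strict the check should be (the example trace's `gate.*` attributes, for instance, would be reported).

```typescript
// Hypothetical schema built from the attribute tables in this spec.
type AttrType = "string" | "boolean" | "int" | "string[]";

const ATTRIBUTE_SCHEMA: Record<string, AttrType> = {
  "tenant.id": "string", "user.id": "string", "release.id": "string",
  "environment.name": "string", error: "boolean", "error.type": "string",
  "promotion.id": "string", "promotion.status": "string",
  "promotion.gates": "string[]", "promotion.decision": "string",
  "deployment.job_id": "string", "deployment.strategy": "string",
  "deployment.target_count": "int", "deployment.batch_size": "int",
  "task.id": "string", "task.type": "string",
  "agent.id": "string", "target.id": "string",
};

function matchesType(value: unknown, type: AttrType): boolean {
  switch (type) {
    case "string":
      return typeof value === "string";
    case "boolean":
      return typeof value === "boolean";
    case "int":
      return typeof value === "number" && Number.isInteger(value);
    case "string[]":
      return Array.isArray(value) && value.every((v) => typeof v === "string");
  }
}

// Returns a list of problems; an empty list means the attributes conform.
function validateAttributes(attrs: Record<string, unknown>): string[] {
  const problems: string[] = [];
  for (const [key, value] of Object.entries(attrs)) {
    const expected = ATTRIBUTE_SCHEMA[key];
    if (expected === undefined) problems.push(`unknown attribute: ${key}`);
    else if (!matchesType(value, expected)) problems.push(`${key}: expected ${expected}`);
  }
  // promotion.decision is documented as allow/deny.
  const decision = attrs["promotion.decision"];
  if (typeof decision === "string" && decision !== "allow" && decision !== "deny") {
    problems.push(`promotion.decision: expected "allow" or "deny"`);
  }
  return problems;
}
```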
OpenTelemetry Configuration
SDK Configuration
```yaml
# otel-config.yaml
service:
  name: stella-release-orchestrator
  version: ${VERSION}

exporters:
  otlp:
    endpoint: otel-collector:4317
    protocol: grpc

processors:
  batch:
    timeout: 10s
    send_batch_size: 1024

resource:
  attributes:
    - key: service.namespace
      value: stella-ops
    - key: deployment.environment
      value: ${ENVIRONMENT}
```
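The `${VERSION}` and `${ENVIRONMENT}` placeholders must be resolved before the file is used. This spec does not say how; the following sketch assumes plain text substitution from an environment record before YAML parsing, and fails on unset variables. `expandPlaceholders` is a hypothetical name.

```typescript
// Hypothetical ${NAME} substitution for otel-config.yaml.
function expandPlaceholders(
  text: string,
  env: Record<string, string | undefined>,
): string {
  return text.replace(/\$\{([A-Z_][A-Z0-9_]*)\}/g, (_match: string, name: string) => {
    const value = env[name];
    // Assumption: an unset variable is a configuration error.
    if (value === undefined) {
      throw new Error(`config references unset variable ${name}`);
    }
    return value;
  });
}
```

Failing fast avoids exporting spans tagged with a literal `${ENVIRONMENT}` string.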
Environment Variables
```bash
OTEL_SERVICE_NAME=stella-release-orchestrator
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
OTEL_EXPORTER_OTLP_PROTOCOL=grpc
OTEL_TRACES_SAMPLER=parentbased_traceidratio
OTEL_TRACES_SAMPLER_ARG=0.1
```
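A sketch of reading these variables into a typed settings object, with the environment passed in (for example `process.env` in Node). The `readOtelSettings` name, the fallbacks (taken from the values above), and the validation rules are assumptions; the sampler ratio defaults to 1 when unset and must lie in [0, 1].

```typescript
// Hypothetical reader for the OTEL_* variables listed above.
const OTLP_PROTOCOLS = ["grpc", "http/protobuf", "http/json"] as const;
type OtlpProtocol = (typeof OTLP_PROTOCOLS)[number];

function isOtlpProtocol(value: string): value is OtlpProtocol {
  return (OTLP_PROTOCOLS as readonly string[]).includes(value);
}

interface OtelSettings {
  serviceName: string;
  endpoint: string;
  protocol: OtlpProtocol;
  sampler: string;
  samplerRatio: number;
}

function readOtelSettings(env: Record<string, string | undefined>): OtelSettings {
  const protocol = env.OTEL_EXPORTER_OTLP_PROTOCOL ?? "grpc";
  if (!isOtlpProtocol(protocol)) {
    throw new Error(`unsupported OTLP protocol: ${protocol}`);
  }
  const ratio = Number(env.OTEL_TRACES_SAMPLER_ARG ?? "1");
  if (!Number.isFinite(ratio) || ratio < 0 || ratio > 1) {
    throw new Error(`OTEL_TRACES_SAMPLER_ARG must be in [0, 1]`);
  }
  return {
    serviceName: env.OTEL_SERVICE_NAME ?? "stella-release-orchestrator",
    endpoint: env.OTEL_EXPORTER_OTLP_ENDPOINT ?? "http://otel-collector:4317",
    protocol,
    sampler: env.OTEL_TRACES_SAMPLER ?? "parentbased_traceidratio",
    samplerRatio: ratio,
  };
}
```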
Sampling Strategy
| Environment | Sampling Rate | Reason |
|---|---|---|
| Development | 100% | Full visibility |
| Staging | 100% | Full visibility |
| Production | 10% | Cost/performance |
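| Production (errors) | 100% | Always sample errors |

Sampling every errored trace while sampling production at 10% requires tail-based sampling (for example, in the OpenTelemetry Collector), because a head-based sampler such as `parentbased_traceidratio` decides when the root span starts, before any error is known.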
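A simplified sketch of the head-sampling decision behind `parentbased_traceidratio`, covering only the rate-based rows of the table. It is not the OpenTelemetry SDK's exact algorithm: `shouldSample` is a hypothetical name, and it maps the low 52 bits of the trace ID to [0, 1) so the decision is deterministic per trace.

```typescript
// Simplified parent-based, trace-ID-ratio sampling decision.
function shouldSample(
  traceId: string,          // 32 lowercase hex chars
  ratio: number,            // e.g. 0.1 in production, 1 in dev/staging
  parentSampled?: boolean,  // undefined for a root span
): boolean {
  // Follow the parent's decision so a trace is never partially recorded.
  if (parentSampled !== undefined) return parentSampled;
  if (ratio >= 1) return true;
  if (ratio <= 0) return false;
  // 13 hex chars = 52 bits, which a double represents exactly.
  const low = parseInt(traceId.slice(-13), 16);
  return low / 2 ** 52 < ratio;
}
```

Because the decision depends only on the trace ID, every service that sees the same root trace ID reaches the same result even without a parent flag.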
Example Trace
```json
{
  "traceId": "4bf92f3577b34da6a3ce929d0e0e4736",
  "spans": [
    {
      "spanId": "00f067aa0ba902b7",
      "name": "promotion.request",
      "duration_ms": 5234,
      "attributes": {
        "promotion.id": "promo-123",
        "release.id": "rel-456",
        "environment.name": "production"
      }
    },
    {
      "spanId": "00f067aa0ba902b8",
      "parentSpanId": "00f067aa0ba902b7",
      "name": "gate.security",
      "duration_ms": 234,
      "attributes": {
        "gate.result": "passed",
        "vulnerabilities.critical": 0
      }
    }
  ]
}
```
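A trace in this shape supports simple analysis such as self time: a span's duration minus its direct children's durations. A sketch with a hypothetical `selfTimes` function, assuming children run sequentially inside the parent; overlapping children would be double-counted, so results are clamped at 0.

```typescript
// Hypothetical self-time calculation over spans shaped like the example.
interface ExampleSpan {
  spanId: string;
  parentSpanId?: string;
  name: string;
  duration_ms: number;
}

function selfTimes(spans: ExampleSpan[]): Map<string, number> {
  // Sum each parent's direct-child durations.
  const childTotals = new Map<string, number>();
  for (const span of spans) {
    if (span.parentSpanId === undefined) continue;
    childTotals.set(
      span.parentSpanId,
      (childTotals.get(span.parentSpanId) ?? 0) + span.duration_ms,
    );
  }
  // Keyed by spanId, since names such as agent.task.pull can repeat.
  const result = new Map<string, number>();
  for (const span of spans) {
    const self = span.duration_ms - (childTotals.get(span.spanId) ?? 0);
    result.set(span.spanId, Math.max(0, self));
  }
  return result;
}
```

For the two spans in the example, this attributes 5000 ms to `promotion.request` itself (5234 - 234) and 234 ms to `gate.security`.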