add release orchestrator docs and sprints gaps fills
docs/modules/release-orchestrator/operations/tracing.md
@@ -0,0 +1,222 @@
# Distributed Tracing Specification

> OpenTelemetry-based distributed tracing for the Release Orchestrator.

**Status:** Planned (not yet implemented)
**Source:** [Architecture Advisory Section 13.3](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
**Related Modules:** [Observability Overview](overview.md), [Logging](logging.md)

## Overview

The Release Orchestrator uses OpenTelemetry for distributed tracing, enabling end-to-end visibility of promotion workflows, deployments, and agent tasks.

---

## Trace Context Propagation

### W3C Trace Context

```typescript
// Trace context structure
interface TraceContext {
  traceId: string;       // 32-char hex
  spanId: string;        // 16-char hex
  parentSpanId?: string;
  sampled: boolean;
  baggage: Record<string, string>;
}

// Propagation headers
const TRACE_HEADERS = {
  W3C_TRACEPARENT: "traceparent",
  W3C_TRACESTATE: "tracestate",
  BAGGAGE: "baggage",
};

// Example traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
```
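
As a minimal sketch of how the `TraceContext` above could be written onto an outgoing request, the following builds the `traceparent` and `baggage` headers. The function name `injectTraceHeaders` is hypothetical and not part of the specification; `tracestate` is omitted, and baggage keys are assumed to already be valid header tokens.

```typescript
// Same shape as the TraceContext interface in this document,
// repeated so the sketch is self-contained.
interface TraceContext {
  traceId: string;
  spanId: string;
  parentSpanId?: string;
  sampled: boolean;
  baggage: Record<string, string>;
}

// Hypothetical helper: serialize a trace context into propagation headers.
function injectTraceHeaders(ctx: TraceContext): Record<string, string> {
  const headers: Record<string, string> = {
    // version "00", then trace-id, span-id, and the sampled flag
    traceparent: `00-${ctx.traceId}-${ctx.spanId}-${ctx.sampled ? "01" : "00"}`,
  };

  // Baggage is a comma-separated list of key=value pairs; values are percent-encoded.
  const baggage = Object.entries(ctx.baggage)
    .map(([key, value]) => `${key}=${encodeURIComponent(value)}`)
    .join(",");
  if (baggage.length > 0) {
    headers["baggage"] = baggage;
  }
  return headers;
}
```

The `baggage` header is only emitted when there is something to send, so services that carry no baggage do not add an empty header.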

### Header Format

```
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
             ^  ^                                ^                ^
             |  |                                |                |
             |  trace-id (32 hex)                span-id (16 hex) flags
             version
```
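
The four fields can be extracted with a single regular expression. The sketch below is illustrative only: `parseTraceparent` and `ParsedTraceparent` are hypothetical names, and the parser accepts just the four-field layout shown above, rejecting all-zero IDs and the reserved version `ff`.

```typescript
// Hypothetical result type for an incoming traceparent header.
interface ParsedTraceparent {
  version: string;
  traceId: string;
  spanId: string; // the caller's span, i.e. the parent of the next span created
  sampled: boolean;
}

// version (2 hex) - trace-id (32 hex) - span-id (16 hex) - flags (2 hex), lowercase only
const TRACEPARENT_RE = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function parseTraceparent(header: string): ParsedTraceparent | null {
  const match = TRACEPARENT_RE.exec(header.trim());
  if (match === null) {
    return null;
  }
  const [, version, traceId, spanId, flags] = match;

  // Version "ff" is reserved as invalid, and all-zero ids are not allowed.
  if (version === "ff" || /^0+$/.test(traceId) || /^0+$/.test(spanId)) {
    return null;
  }

  // Bit 0 of the flags byte is the "sampled" flag.
  const sampled = (parseInt(flags, 16) & 0x01) === 0x01;
  return { version, traceId, spanId, sampled };
}
```

Returning `null` for malformed input lets the caller fall back to starting a new trace instead of propagating a broken context.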

---

## Key Traces

| Operation | Span Name | Attributes |
|-----------|-----------|------------|
| Promotion request | `promotion.request` | promotion_id, release_id, environment |
| Gate evaluation | `promotion.evaluate_gates` | gate_names, result |
| Workflow execution | `workflow.execute` | workflow_run_id, template_name |
| Step execution | `workflow.step.{type}` | step_run_id, node_id, inputs |
| Deployment job | `deployment.execute` | job_id, environment, strategy |
| Agent task | `agent.task.{type}` | task_id, agent_id, target_id |
| Plugin call | `plugin.{method}` | plugin_id, method, duration |

---

## Trace Hierarchy

### Promotion Flow

```
promotion.request (root)
+-- promotion.evaluate_gates
|   +-- gate.security
|   +-- gate.approval
|   +-- gate.freeze_window
|
+-- workflow.execute
|   +-- workflow.step.security-check
|   +-- workflow.step.approval
|   +-- workflow.step.deploy
|   +-- deployment.execute
|   +-- deployment.assign_tasks
|   +-- agent.task.pull
|   +-- agent.task.deploy
|   +-- agent.task.health_check
|
+-- evidence.generate
+-- evidence.sign
```
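
A hierarchy like this is reconstructed from flat span records by following `parentSpanId` links. The sketch below shows one way to do that; `SpanRecord` and `renderSpanTree` are hypothetical names, and a span whose parent is missing from the input is treated as a root.

```typescript
// Hypothetical flat span record, using the id fields from TraceContext.
interface SpanRecord {
  spanId: string;
  parentSpanId?: string;
  name: string;
}

// Returns one line per span, indented two spaces per nesting level.
function renderSpanTree(spans: SpanRecord[]): string[] {
  const knownIds = new Set(spans.map((span) => span.spanId));
  const childrenByParent = new Map<string, SpanRecord[]>();
  const roots: SpanRecord[] = [];

  for (const span of spans) {
    const parentId = span.parentSpanId;
    if (parentId !== undefined && knownIds.has(parentId)) {
      const siblings = childrenByParent.get(parentId) ?? [];
      siblings.push(span);
      childrenByParent.set(parentId, siblings);
    } else {
      // No parent, or the parent was not exported: treat as a root.
      roots.push(span);
    }
  }

  const lines: string[] = [];
  const visit = (span: SpanRecord, depth: number): void => {
    lines.push(`${"  ".repeat(depth)}${span.name}`);
    for (const child of childrenByParent.get(span.spanId) ?? []) {
      visit(child, depth + 1);
    }
  };
  for (const root of roots) {
    visit(root, 0);
  }
  return lines;
}
```

Children are emitted in input order, so sorting the spans by start time before calling the function would give a chronological tree.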

---

## Span Attributes

### Common Attributes

| Attribute | Type | Description |
|-----------|------|-------------|
| `tenant.id` | string | Tenant UUID |
| `user.id` | string | User UUID (if authenticated) |
| `release.id` | string | Release UUID |
| `environment.name` | string | Environment name |
| `error` | boolean | Whether error occurred |
| `error.type` | string | Error type/class |
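
One way to keep these attribute keys consistent across services is to build them in a single helper. The sketch below is an assumption about how that might look, not a defined API: `CommonSpanInput` and `commonAttributes` are hypothetical names, and optional values are simply left out when absent.

```typescript
type AttributeValue = string | number | boolean | string[];

// Hypothetical input for the common attributes listed in the table.
interface CommonSpanInput {
  tenantId: string;
  userId?: string; // only present for authenticated requests
  releaseId?: string;
  environmentName?: string;
  error?: Error;
}

function commonAttributes(input: CommonSpanInput): Record<string, AttributeValue> {
  const attrs: Record<string, AttributeValue> = {
    "tenant.id": input.tenantId,
    error: input.error !== undefined,
  };
  if (input.userId !== undefined) {
    attrs["user.id"] = input.userId;
  }
  if (input.releaseId !== undefined) {
    attrs["release.id"] = input.releaseId;
  }
  if (input.environmentName !== undefined) {
    attrs["environment.name"] = input.environmentName;
  }
  if (input.error !== undefined) {
    // Use the error class name as the error type.
    attrs["error.type"] = input.error.name;
  }
  return attrs;
}
```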

### Promotion Attributes

| Attribute | Type | Description |
|-----------|------|-------------|
| `promotion.id` | string | Promotion UUID |
| `promotion.status` | string | Current status |
| `promotion.gates` | string[] | Gates evaluated |
| `promotion.decision` | string | allow/deny |

### Deployment Attributes

| Attribute | Type | Description |
|-----------|------|-------------|
| `deployment.job_id` | string | Deployment job UUID |
| `deployment.strategy` | string | Deployment strategy |
| `deployment.target_count` | int | Number of targets |
| `deployment.batch_size` | int | Batch size |

### Agent Task Attributes

| Attribute | Type | Description |
|-----------|------|-------------|
| `task.id` | string | Task UUID |
| `task.type` | string | Task type |
| `agent.id` | string | Agent UUID |
| `target.id` | string | Target UUID |

---

## OpenTelemetry Configuration

### SDK Configuration

```yaml
# otel-config.yaml
service:
  name: stella-release-orchestrator
  version: ${VERSION}

exporters:
  otlp:
    endpoint: otel-collector:4317
    protocol: grpc

processors:
  batch:
    timeout: 10s
    send_batch_size: 1024

resource:
  attributes:
    - key: service.namespace
      value: stella-ops
    - key: deployment.environment
      value: ${ENVIRONMENT}
```

### Environment Variables

```bash
OTEL_SERVICE_NAME=stella-release-orchestrator
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
OTEL_EXPORTER_OTLP_PROTOCOL=grpc
OTEL_TRACES_SAMPLER=parentbased_traceidratio
OTEL_TRACES_SAMPLER_ARG=0.1
```
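
With `parentbased_traceidratio`, a span that has a parent inherits the parent's sampling decision, and a root span is sampled with the probability given by `OTEL_TRACES_SAMPLER_ARG`, derived deterministically from the trace ID. The sketch below illustrates the idea only. It is not the OpenTelemetry SDK's algorithm: `shouldSample` is a hypothetical function, and comparing the low 32 bits of the trace ID against the ratio is a simplification chosen for this example.

```typescript
// Hypothetical head-sampling decision in the spirit of parentbased_traceidratio.
// parentSampled is undefined for root spans.
function shouldSample(traceId: string, ratio: number, parentSampled?: boolean): boolean {
  // Child spans follow the parent's decision so a trace is kept or dropped as a whole.
  if (parentSampled !== undefined) {
    return parentSampled;
  }
  if (ratio >= 1) {
    return true;
  }
  if (ratio <= 0) {
    return false;
  }
  // Interpret the last 8 hex chars as a 32-bit number and compare it to the
  // ratio scaled to the same range. The same trace id always gives the same answer.
  const low32 = parseInt(traceId.slice(-8), 16);
  return low32 < ratio * 0x100000000;
}
```

Because the decision depends only on the trace ID, every service that sees a given root trace reaches the same conclusion without coordination.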

---

## Sampling Strategy

| Environment | Sampling Rate | Reason |
|-------------|---------------|--------|
| Development | 100% | Full visibility |
| Staging | 100% | Full visibility |
| Production | 10% | Cost/performance |
| Production (errors) | 100% | Always sample errors |
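
Note that a head sampler decides when a span starts, before it is known whether an error will occur, so the "always sample errors" row implies a decision made after spans finish (tail-based sampling, for example in a collector). The sketch below shows what such a decision could look like under that assumption; `FinishedSpan` and `keepTrace` are hypothetical names, and the ratio check reuses a simplified low-32-bit comparison rather than any SDK algorithm.

```typescript
// Hypothetical finished span, carrying the `error` attribute from this document.
interface FinishedSpan {
  traceId: string;
  attributes: Record<string, string | number | boolean>;
}

// Decide whether to keep a completed trace: always keep traces containing an
// error, otherwise keep roughly `baseRatio` of traces based on the trace id.
function keepTrace(spans: FinishedSpan[], baseRatio: number): boolean {
  const first = spans[0];
  if (first === undefined) {
    return false;
  }
  if (spans.some((span) => span.attributes["error"] === true)) {
    return true;
  }
  const low32 = parseInt(first.traceId.slice(-8), 16);
  return low32 < baseRatio * 0x100000000;
}
```

For this to work, spans must reach the component that makes the decision before any head sampling drops them, which has its own cost implications.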

---

## Example Trace

```json
{
  "traceId": "4bf92f3577b34da6a3ce929d0e0e4736",
  "spans": [
    {
      "spanId": "00f067aa0ba902b7",
      "name": "promotion.request",
      "duration_ms": 5234,
      "attributes": {
        "promotion.id": "promo-123",
        "release.id": "rel-456",
        "environment.name": "production"
      }
    },
    {
      "spanId": "00f067aa0ba902b8",
      "parentSpanId": "00f067aa0ba902b7",
      "name": "gate.security",
      "duration_ms": 234,
      "attributes": {
        "gate.result": "passed",
        "vulnerabilities.critical": 0
      }
    }
  ]
}
```

---

## See Also

- [Observability Overview](overview.md)
- [Logging](logging.md)
- [Metrics](metrics.md)
- [Alerting](alerting.md)