# Orchestrator Run Ledger (DOCS-ORCH-34-001)
> **Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.
Last updated: 2025-11-25
## Purpose
Immutable record of every DAG run and step execution for audit, replay, and offline export.
## Record schema (conceptual)
- `tenant`, `runId`, `dagId`, `dagVersion`, `runToken`, `traceId`
- `status` (`running|completed|failed|cancelled`)
- `inputsHash`, `outputsHash` (overall)
- `startedUtc`, `endedUtc`, `durationMs`
- `steps[]`:
  - `stepId`, `status`, `attempt`, `startedUtc`, `endedUtc`, `durationMs`
  - `inputsHash`, `outputsHash`, `logsRef`, `metricsRef`, `errorCode`, `retryable`
- `events[]` (optional): ordered list of significant events with `timestamp`, `type`, `message`, `actor`
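
The conceptual schema above could be modelled as follows. This is a minimal sketch, not the orchestrator's actual types: the field names come from the list above, but the Python types, the choice of which fields are optional, and the class names `StepRecord`, `LedgerEvent`, and `RunRecord` are assumptions made for illustration.

```python
from dataclasses import dataclass, field, asdict
from typing import List, Optional

# Run statuses listed in the schema above.
RUN_STATUSES = ("running", "completed", "failed", "cancelled")


@dataclass(frozen=True)
class StepRecord:
    # Assumption: timestamps are ISO-8601 UTC strings and hashes are plain strings.
    stepId: str
    status: str
    attempt: int
    startedUtc: str
    endedUtc: Optional[str] = None
    durationMs: Optional[int] = None
    inputsHash: Optional[str] = None
    outputsHash: Optional[str] = None
    logsRef: Optional[str] = None
    metricsRef: Optional[str] = None
    errorCode: Optional[str] = None
    retryable: Optional[bool] = None


@dataclass(frozen=True)
class LedgerEvent:
    timestamp: str
    type: str
    message: str
    actor: str


@dataclass(frozen=True)
class RunRecord:
    tenant: str
    runId: str
    dagId: str
    dagVersion: str
    runToken: str
    traceId: str
    status: str
    inputsHash: str
    startedUtc: str
    # Assumption: these stay unset while the run is still in progress.
    outputsHash: Optional[str] = None
    endedUtc: Optional[str] = None
    durationMs: Optional[int] = None
    steps: List[StepRecord] = field(default_factory=list)
    events: List[LedgerEvent] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject any status outside the set named in the schema.
        if self.status not in RUN_STATUSES:
            raise ValueError(f"unknown run status: {self.status!r}")
```

Frozen dataclasses are used here only to mirror the ledger's immutability in memory; `asdict(record)` turns a record into a plain dictionary suitable for JSON serialization.
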
## Storage
- Mongo collection partitioned by tenant; indexes on `(tenant, dagId, runId)`, `(tenant, status, startedUtc)`.
- Artifacts/logs referenced by content hash; stored separately (object storage/GridFS).
- Append-only updates; run status transitions are monotonic.
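
One way to enforce monotonic status transitions on append is sketched below. The document only states that transitions are monotonic; the specific transition table (a run starts as `running` and may move once to a terminal status) and the helper names `can_transition` and `append_status` are assumptions for illustration.

```python
# Assumed transition table: "running" is the only non-terminal status.
ALLOWED_TRANSITIONS = {
    "running": {"completed", "failed", "cancelled"},
    "completed": set(),
    "failed": set(),
    "cancelled": set(),
}


def can_transition(current: str, new: str) -> bool:
    """Return True if moving from `current` to `new` is allowed by the table."""
    return new in ALLOWED_TRANSITIONS.get(current, set())


def append_status(history: list, new_status: str) -> list:
    """Return a new status history with `new_status` appended.

    Earlier entries are never edited, mirroring append-only storage.
    """
    if not history:
        # Assumption: every run begins in the "running" status.
        if new_status != "running":
            raise ValueError("a run must start in 'running'")
    elif not can_transition(history[-1], new_status):
        raise ValueError(
            f"non-monotonic transition: {history[-1]} -> {new_status}"
        )
    return [*history, new_status]
```

Because terminal statuses have no successors in this table, an attempt to move a `completed` run back to `running` is rejected instead of overwriting the record.
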
## Exports
- NDJSON export sorted by `startedUtc`, then `runId`; includes steps/events inline.
- Each export includes a manifest with a hash and a record count, so that repeated exports of the same data can be verified as identical.
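
A deterministic export along these lines might look like the sketch below. The sort order comes from the text above; everything else is an assumption for illustration: the function name `export_ndjson`, SHA-256 as the hash, the manifest keys `count` and `sha256`, key-sorted compact JSON as the canonical line form, and `startedUtc` values being ISO-8601 strings in one uniform format (so that string order matches time order).

```python
import hashlib
import json


def export_ndjson(runs):
    """Serialize run records to NDJSON and build a manifest for the result.

    `runs` is an iterable of dicts with at least `startedUtc` and `runId`;
    steps and events stay inline inside each record.
    """
    # Sort by startedUtc, then runId, as the export rules require.
    ordered = sorted(runs, key=lambda r: (r["startedUtc"], r["runId"]))

    # Key-sorted, compact JSON so identical records always yield identical bytes.
    lines = [
        json.dumps(r, sort_keys=True, separators=(",", ":")) for r in ordered
    ]
    body = "".join(line + "\n" for line in lines)

    # Hypothetical manifest layout: record count plus a hash of the NDJSON body.
    manifest = {
        "count": len(lines),
        "sha256": hashlib.sha256(body.encode("utf-8")).hexdigest(),
    }
    return body, manifest
```

Under these assumptions, exporting the same set of runs in any input order yields the same bytes and therefore the same manifest hash, which is what lets a consumer check an offline export against its manifest.
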
## Observability
- Metrics derived from ledger: run counts, durations, failure rates, retry counts.
- Trace links preserved via stored `traceId`.
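
The ledger-derived metrics could be computed roughly as below. This is a sketch under stated assumptions, not the orchestrator's metric definitions: the function name `ledger_metrics` and the output keys are hypothetical, the failure rate is taken as failed runs divided by runs that are no longer `running`, and each retry is assumed to appear as its own `steps[]` entry with a 1-based `attempt` greater than 1.

```python
from collections import Counter


def ledger_metrics(runs):
    """Aggregate simple metrics from a list of run-record dicts."""
    status_counts = Counter(r["status"] for r in runs)

    # Only runs that have left the "running" status contribute to rates.
    finished = [r for r in runs if r["status"] != "running"]
    durations = [
        r["durationMs"] for r in finished if r.get("durationMs") is not None
    ]

    # Assumption: a step entry with attempt > 1 represents one retry.
    retry_count = sum(
        1
        for r in runs
        for s in r.get("steps", [])
        if s.get("attempt", 1) > 1
    )

    return {
        "runCounts": dict(status_counts),
        "failureRate": (
            status_counts["failed"] / len(finished) if finished else 0.0
        ),
        "meanDurationMs": (
            sum(durations) / len(durations) if durations else None
        ),
        "retryCount": retry_count,
    }
```

If the real ledger stores only the final attempt per step, the retry count would instead need to sum `attempt - 1` over steps; the sketch would have to change accordingly.
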
## Governance
- Runs are never mutated or deleted; a cancellation is recorded as an event rather than by removing or rewriting the run.
- Access is tenant-scoped; admin queries require `orchestrator:admin`.
- Replay tokens can be derived from `inputsHash` + `dagVersion`; consumers must log rationale when replaying.
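
A replay token derivation and a rationale check could be sketched as follows. The document says only that tokens can be derived from `inputsHash` + `dagVersion` and that consumers must log a rationale; the hash function (SHA-256), the length-prefixed encoding, the function names, and the `replay-requested` event type are all assumptions for illustration.

```python
import hashlib


def derive_replay_token(inputs_hash: str, dag_version: str) -> str:
    """Derive a deterministic token from inputsHash and dagVersion.

    Length prefixes keep ("ab", "c") and ("a", "bc") from colliding.
    """
    material = (
        f"{len(inputs_hash)}:{inputs_hash}|{len(dag_version)}:{dag_version}"
    )
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


def build_replay_event(run: dict, actor: str, rationale: str, timestamp: str) -> dict:
    """Build a ledger event for a replay, refusing an empty rationale.

    The event shape follows the `events[]` fields (timestamp, type, message,
    actor); the "replay-requested" type and `replayToken` key are hypothetical.
    """
    if not rationale.strip():
        raise ValueError("a rationale is required when replaying a run")
    return {
        "timestamp": timestamp,
        "type": "replay-requested",
        "message": rationale.strip(),
        "actor": actor,
        "replayToken": derive_replay_token(run["inputsHash"], run["dagVersion"]),
    }
```

Because the token depends only on `inputsHash` and `dagVersion`, two runs with the same inputs and DAG version map to the same token, while a new DAG version yields a different one.
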