

# Task Runner Collections — Initial Migration
Last updated: 2025-11-06

This migration seeds the MongoDB collections that back the Task Runner service. It is implemented as `20251106-task-runner-baseline.mongosh` under the platform migration runner and must be applied **before** enabling the TaskRunner service in any environment.

## Collections
### `pack_runs`
| Field | Type | Notes |
|------------------|-----------------|-----------------------------------------------------------|
| `_id` | `string` | Run identifier (same as `runId`). |
| `planHash` | `string` | Deterministic hash produced by the planner. |
| `plan` | `object` | Full `TaskPackPlan` payload used to execute the run. |
| `failurePolicy` | `object` | Retry/backoff directives resolved at plan time. |
| `requestedAt` | `date` | Timestamp when the client requested the run. |
| `createdAt` | `date` | Timestamp when the run was persisted. |
| `updatedAt` | `date` | Timestamp of the last mutation. |
| `steps` | `array<object>` | Flattened step records (`stepId`, `status`, attempts…). |
| `tenantId` | `string` | Optional multi-tenant scope (reserved for future phases). |

**Indexes**

1. `{ _id: 1 }` — implicit primary key / uniqueness guarantee.
2. `{ updatedAt: -1 }` — serves `GET /runs` listings and staleness checks.
3. `{ tenantId: 1, updatedAt: -1 }` — activated once tenancy is enforced; remains sparse until then.
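
The index list above could be expressed in mongosh roughly as follows. This is a sketch rather than the contents of the baseline migration script: the helper name `ensurePackRunIndexes` is hypothetical, and no `sparse` or partial-filter option is set on the tenancy index because this document does not say how that index is declared.

```javascript
// Key patterns copied from the index list above.
// The { _id: 1 } index is created implicitly by MongoDB and needs no call.
const PACK_RUN_INDEXES = [
  { updatedAt: -1 },              // listings and staleness checks
  { tenantId: 1, updatedAt: -1 }, // tenancy-scoped listings
];

// Hypothetical helper: creates each index on the given database handle
// and returns whatever createIndex returns (the index name in mongosh).
function ensurePackRunIndexes(database) {
  const runs = database.getCollection("pack_runs");
  return PACK_RUN_INDEXES.map((keys) => runs.createIndex(keys));
}
```

In a mongosh session this would be invoked as `ensurePackRunIndexes(db)`. Re-running `createIndex` with an identical key pattern and options is a no-op, so the helper is safe to apply more than once.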
### `pack_run_logs`
| Field | Type | Notes |
|---------------|-----------------|--------------------------------------------------------|
| `_id` | `ObjectId` | Generated per log entry. |
| `runId` | `string` | Foreign key to `pack_runs._id`. |
| `sequence` | `long` | Monotonic counter assigned by the writer. |
| `timestamp` | `date` | UTC timestamp of the log event. |
| `level` | `string` | `trace`, `debug`, `info`, `warn`, `error`. |
| `eventType` | `string` | Machine-friendly event identifier (e.g. `step.started`). |
| `message` | `string` | Human-readable summary. |
| `stepId` | `string` | Optional step identifier. |
| `metadata` | `object` | Deterministic key/value payload (string-only values). |

**Indexes**

1. `{ runId: 1, sequence: 1 }` (unique) — guarantees ordered retrieval and enforces idempotence.
2. `{ runId: 1, timestamp: 1 }` — accelerates replay and time-window queries.
3. `{ timestamp: 1 }` — optional TTL (disabled by default) for retention policies.
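
To illustrate how the unique `{ runId, sequence }` index can make log writes idempotent, here is a minimal mongosh-style sketch. The function name `appendLogEntry` and its return values are hypothetical, and the actual log store may handle retries differently. The only MongoDB-specific fact relied on is that a unique-index violation surfaces as server error code 11000.

```javascript
// MongoDB server error code for a duplicate key on a unique index.
const DUPLICATE_KEY = 11000;

// Hypothetical writer: inserts one log entry and treats a replayed
// (runId, sequence) pair as "already written" instead of failing.
function appendLogEntry(logs, entry) {
  try {
    logs.insertOne(entry);
    return "inserted";
  } catch (err) {
    if (err && err.code === DUPLICATE_KEY) {
      return "duplicate"; // the same entry was persisted by an earlier attempt
    }
    throw err; // anything else is a real failure
  }
}
```

With this shape, a writer that retries after a timeout cannot produce two entries with the same sequence number for a run; the second attempt reports `"duplicate"` and the log stays ordered.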
### `pack_artifacts`
| Field | Type | Notes |
|--------------|------------|-------------------------------------------------------------|
| `_id` | `ObjectId` | Generated per artifact record. |
| `runId` | `string` | Foreign key to `pack_runs._id`. |
| `name` | `string` | Output name from the Task Pack manifest. |
| `type` | `string` | `file`, `object`, or other future evidence categories. |
| `sourcePath` | `string` | Local path captured during execution (nullable). |
| `storedPath` | `string` | Object store path or bundle-relative URI (nullable). |
| `status` | `string` | `pending`, `copied`, `materialized`, `skipped`. |
| `notes` | `string` | Free-form notes (deterministic messages only). |
| `capturedAt` | `date` | UTC timestamp recorded by the worker. |

**Indexes**

1. `{ runId: 1, name: 1 }` (unique) — ensures a run emits at most one record per output.
2. `{ runId: 1 }` — supports artifact listing alongside run inspection.
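
One way a writer could respect the one-record-per-output rule is to upsert on the same key as the unique index. The sketch below assumes mongosh; the helper name `recordArtifact` is hypothetical, and the real uploader may insert rather than upsert.

```javascript
// Hypothetical helper: records (or re-records) one artifact for a run.
// The filter matches the unique { runId, name } index, so repeating the
// call updates the existing record instead of creating a second one.
function recordArtifact(artifacts, runId, name, fields) {
  return artifacts.updateOne(
    { runId, name },   // key of the unique index
    { $set: fields },  // e.g. type, storedPath, status, capturedAt
    { upsert: true }   // insert the record if it does not exist yet
  );
}
```

A call might look like `recordArtifact(db.getCollection("pack_artifacts"), "run-1", "report", { type: "file", status: "copied", capturedAt: new Date() })`, where the run ID and output name are illustrative values.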
## Execution Order
1. Create collections with `validator` envelopes mirroring the field expectations above (if MongoDB schema validation is enabled in the environment).
2. Apply the indexes in the order listed — unique indexes first to surface data issues early.
3. Backfill existing filesystem-backed runs by importing the serialized state/log/artifact manifests into the new collections. A dedicated importer script (`tools/taskrunner/import-filesystem-state.ps1`) accompanies the migration.
4. Switch the Task Runner service configuration to point at the Mongo-backed stores (`TaskRunner:Storage:Mode = "mongo"`), then redeploy the workers and the web service.
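
Steps 1 and 2 could look roughly like this in mongosh for the `pack_run_logs` collection. This is a sketch under stated assumptions, not the shipped migration: the validator covers only a few fields, the `required` list is a guess based on the field table, and `createLogCollection` is a hypothetical name.

```javascript
// Partial $jsonSchema validator derived from the pack_run_logs field table.
// Which fields are required is an assumption; the table marks only
// stepId as optional.
const LOG_VALIDATOR = {
  $jsonSchema: {
    bsonType: "object",
    required: ["runId", "sequence", "timestamp", "level", "eventType", "message"],
    properties: {
      runId: { bsonType: "string" },
      sequence: { bsonType: "long" },
      timestamp: { bsonType: "date" },
      level: { enum: ["trace", "debug", "info", "warn", "error"] },
    },
  },
};

// Hypothetical helper covering step 1 (collection + validator) and
// step 2 (indexes, unique index first).
function createLogCollection(database) {
  database.createCollection("pack_run_logs", { validator: LOG_VALIDATOR });
  const logs = database.getCollection("pack_run_logs");
  logs.createIndex({ runId: 1, sequence: 1 }, { unique: true });
  logs.createIndex({ runId: 1, timestamp: 1 });
}
```

The TTL index on `timestamp` is left out here because the document describes it as optional and disabled by default.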
## Rollback
To revert, switch the Task Runner configuration back to the filesystem provider and stop the Mongo migration runner. Collections can remain in place; they are append-only and harmless when unused.
## Configuration Reference
Enable the Mongo-backed stores by updating the worker and web service configuration (Compose/Helm values or `appsettings*.json`):
```json
{
  "TaskRunner": {
    "Storage": {
      "Mode": "mongo",
      "Mongo": {
        "ConnectionString": "mongodb://127.0.0.1:27017/taskrunner",
        "Database": "taskrunner",
        "RunsCollection": "pack_runs",
        "LogsCollection": "pack_run_logs",
        "ArtifactsCollection": "pack_artifacts",
        "ApprovalsCollection": "pack_run_approvals"
      }
    }
  }
}
```
The worker uses the mirrored structure under the `Worker` section. Omit the `Database` property to fall back to the name embedded in the connection string.
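
The `Database` fallback described above amounts to a simple precedence rule, sketched below. The function `resolveDatabaseName` is hypothetical and only illustrates the precedence; the service's actual resolution logic is not shown in this document and most likely relies on its MongoDB driver's own connection-string parsing.

```javascript
// Hypothetical helper: an explicit Database setting wins; otherwise the
// name is taken from the path segment of the connection string.
function resolveDatabaseName(mongoOptions) {
  if (mongoOptions.Database) {
    return mongoOptions.Database;
  }
  // Capture everything between the first "/" after the host list and "?".
  const match = /^mongodb(?:\+srv)?:\/\/[^/]+\/([^?]*)/.exec(
    mongoOptions.ConnectionString
  );
  return match && match[1] ? decodeURIComponent(match[1]) : null;
}
```

For the connection string in the example configuration, omitting `Database` resolves to `taskrunner`. A connection string with no path segment yields `null` in this sketch, which a caller would need to treat as a configuration error.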