Implement MongoDB-based storage for Pack Run approval, artifact, log, and state management

- Added MongoPackRunApprovalStore for managing approval states with MongoDB.
- Introduced MongoPackRunArtifactUploader for uploading and storing artifacts.
- Created MongoPackRunLogStore to handle logging of pack run events.
- Developed MongoPackRunStateStore for persisting and retrieving pack run states.
- Implemented unit tests for MongoDB stores to ensure correct functionality.
- Added MongoTaskRunnerTestContext for setting up MongoDB test environment.
- Enhanced PackRunStateFactory to correctly initialize state with gate reasons.
2025-11-07 10:01:35 +02:00
parent e5ffcd6535
commit a1ce3f74fa
122 changed files with 8730 additions and 914 deletions
--- a/docs/modules/advisory-ai/README.md
+++ b/docs/modules/advisory-ai/README.md
@@ -1,29 +1,37 @@
-# StellaOps Advisory AI
-
-Advisory AI is the retrieval-augmented assistant that synthesizes advisory and VEX evidence into operator-ready summaries, conflict explanations, and remediation plans with strict provenance.
-
-## Responsibilities
- Generate policy-aware advisory summaries with citations back to Conseiller and Excititor evidence.
- Explain conflicting advisories/VEX statements using weights from VEX Lens and Policy Engine.
- Propose remediation hints aligned with Offline Kit staging and export bundles.
- Expose API/UI surfaces with guardrails on model prompts, outputs, and retention.
-
-## Key components
- RAG pipeline drawing from Conseiller, Excititor, VEX Lens, Policy Engine, and SBOM Service data.
- Prompt templates and guard models enforcing provenance and redaction policies.
- Vercel/offline inference workers with deterministic caching of generated artefacts.
-
-## Integrations & dependencies
- Authority for tenant-aware access control.
- Policy Engine for context-specific decisions and explain traces.
- Console/CLI for interaction surfaces.
- Export Center/Vuln Explorer for embedding generated briefs.
-
-## Operational notes
- Model cache management and offline bundle packaging per Epic 8 requirements.
- Usage/latency dashboards for prompt/response monitoring.
- Redaction policies validated against security/LLM guardrail tests.
-
-## Epic alignment
- Epic 8: Advisory AI Assistant.
- DOCS-AI stories to be tracked in ../../TASKS.md.
+# StellaOps Advisory AI
+
+Advisory AI is the retrieval-augmented assistant that synthesizes advisory and VEX evidence into operator-ready summaries, conflict explanations, and remediation plans with strict provenance.
+
+## Responsibilities
+- Generate policy-aware advisory summaries with citations back to Concelier and Excititor evidence.
+- Explain conflicting advisories/VEX statements using weights from VEX Lens and Policy Engine.
+- Propose remediation hints aligned with Offline Kit staging and export bundles.
+- Expose API/UI surfaces with guardrails on model prompts, outputs, and retention.
+
+## Key components
+- RAG pipeline drawing from Concelier, Excititor, VEX Lens, Policy Engine, and SBOM Service data.
+- Prompt templates and guard models enforcing provenance and redaction policies.
+- Vercel/offline inference workers with deterministic caching of generated artefacts.
+
+## Integrations & dependencies
+- Authority for tenant-aware access control.
+- Policy Engine for context-specific decisions and explain traces.
+- Console/CLI for interaction surfaces.
+- Export Center/Vuln Explorer for embedding generated briefs.
+
+## Operational notes
+- Model cache management and offline bundle packaging per Epic 8 requirements.
+- Usage/latency dashboards for prompt/response monitoring with `advisory_ai_latency_seconds`, guardrail block/validation counters, and citation coverage histograms wired into the default “Advisory AI” Grafana dashboard.
+- Alert policies fire when `advisory_ai_guardrail_blocks_total` or `advisory_ai_validation_failures_total` breach burn-rate thresholds (5 blocks/min or validation failures > 1% of traffic) and when latency p95 exceeds 30s.
+- Redaction policies validated against security/LLM guardrail tests.
+- Guardrail behaviour, blocked phrases, and operational alerts are detailed in `/docs/security/assistant-guardrails.md`.
+
+## CLI usage
+- `stella advise run <summary|conflict|remediation> --advisory-key <id> [--artifact-id id] [--artifact-purl purl] [--policy-version v] [--profile profile] [--section name] [--force-refresh] [--timeout seconds]`
+  - Requests an advisory plan from the web service, enqueues execution, then polls for the generated output (default wait 120 s, single check if `--timeout 0`).
+  - Renders plan metadata (cache key, prompt template, token budget), guardrail state, provenance hashes, signatures, and citations in a deterministic table view.
+  - Honors `STELLAOPS_ADVISORYAI_URL` when set; otherwise the CLI reuses the backend URL and scopes requests via `X-StellaOps-Scopes`.
+
+## Epic alignment
+- Epic 8: Advisory AI Assistant.
+- DOCS-AI stories to be tracked in ../../TASKS.md.
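
Note (illustrative, not part of this diff): the alert thresholds added to the Advisory AI operational notes can be read as three independent conditions. The sketch below is a minimal Python rendering of that reading. `WindowStats` and `breached_alerts` are hypothetical names, the one-minute aggregation is assumed, and the real rules presumably live in the alerting stack rather than in application code.

```python
from dataclasses import dataclass

# Thresholds quoted in the operational notes; how they are wired into
# real alert rules is not shown in the source.
MAX_GUARDRAIL_BLOCKS_PER_MIN = 5
MAX_VALIDATION_FAILURE_RATIO = 0.01
MAX_LATENCY_P95_SECONDS = 30.0


@dataclass(frozen=True)
class WindowStats:
    """Hypothetical one-minute aggregate of the Advisory AI metrics."""
    requests: int
    guardrail_blocks: int
    validation_failures: int
    latency_p95_seconds: float


def breached_alerts(stats: WindowStats) -> list[str]:
    """Return the names of the alert conditions breached in this window."""
    alerts = []
    # Assumption: reaching 5 blocks in a minute already counts as a breach.
    if stats.guardrail_blocks >= MAX_GUARDRAIL_BLOCKS_PER_MIN:
        alerts.append("guardrail_blocks")
    # Skip the ratio when the window saw no traffic (avoids division by zero).
    if stats.requests > 0:
        ratio = stats.validation_failures / stats.requests
        if ratio > MAX_VALIDATION_FAILURE_RATIO:
            alerts.append("validation_failures")
    if stats.latency_p95_seconds > MAX_LATENCY_P95_SECONDS:
        alerts.append("latency_p95")
    return alerts
```

Keeping the three checks separate mirrors the way the notes list them, so each condition can page on its own.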
--- a/docs/modules/cli/architecture.md
+++ b/docs/modules/cli/architecture.md
@@ -129,7 +129,16 @@ src/

 Both subcommands honour offline-first expectations (no network access) and normalise relative roots via `--root` when operators mirror the credential store.

-### 2.11 Air-gap guard
+### 2.11 Advisory AI (RAG summaries)
+
+* `advise run <summary|conflict|remediation> --advisory-key <id> [--artifact-id id] [--artifact-purl purl] [--policy-version v] [--profile profile] [--section name] [--force-refresh] [--timeout seconds]`
+
+  * Calls the Advisory AI service (`/v1/advisory-ai/pipeline/{task}` + `/outputs/{cacheKey}`) to materialise a deterministic plan, queue execution, and poll for the generated brief.
+  * Renders plan metadata (cache key, prompt template, token budgets), guardrail results, provenance hashes/signatures, and citation list. Exit code is non-zero if guardrails block or the command times out.
+  * Uses `STELLAOPS_ADVISORYAI_URL` when configured; otherwise it reuses the backend base address and adds `X-StellaOps-Scopes` (`advisory:run` + task scope) per request.
+  * `--timeout 0` performs a single cache lookup (for CI flows that only want cached artefacts).
+
+### 2.12 Air-gap guard

 - CLI outbound HTTP flows (Authority auth, backend APIs, advisory downloads) route through `StellaOps.AirGap.Policy`. When sealed mode is active the CLI refuses commands that would require external egress and surfaces the shared `AIRGAP_EGRESS_BLOCKED` remediation guidance instead of attempting the request.

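Note (illustrative, not part of this diff): the `advise run` timeout semantics described in section 2.11 can be sketched as a polling loop. This is a hedged Python approximation, not the CLI's implementation. `poll_output`, `exit_code`, the `guardrail_blocked` field, the one-second interval, and the exit value `1` are all assumptions; the source only states the 120 s default, the single lookup for `--timeout 0`, and a non-zero exit on guardrail block or timeout.

```python
import time
from typing import Callable, Optional


def poll_output(
    fetch: Callable[[], Optional[dict]],
    timeout_seconds: float = 120.0,
    interval_seconds: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Return the generated output, or None if it is not ready in time."""
    deadline = clock() + timeout_seconds
    while True:
        output = fetch()
        if output is not None:
            return output
        # With timeout 0 the deadline has already passed, so exactly one
        # lookup is performed (the cached-artefact-only CI flow).
        if clock() >= deadline:
            return None
        # Never sleep past the deadline.
        sleep(min(interval_seconds, max(0.0, deadline - clock())))


def exit_code(output: Optional[dict]) -> int:
    """Map the polling result to a process exit code (values are assumed)."""
    # Hypothetical field name; the real response schema is not in the source.
    if output is None or output.get("guardrail_blocked"):
        return 1
    return 0
```

Checking the deadline after the fetch rather than before it is what makes `--timeout 0` behave as a single cache lookup instead of returning without querying at all.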
--- a/docs/modules/concelier/README.md
+++ b/docs/modules/concelier/README.md
@@ -3,15 +3,19 @@
 Concelier ingests signed advisories from dozens of sources and converts them into immutable observations plus linksets under the Aggregation-Only Contract (AOC).

 ## Responsibilities
- Fetch and normalise vulnerability advisories via restart-time connectors.
- Persist observations and correlation linksets without precedence decisions.
- Emit deterministic exports (JSON, Trivy DB) for downstream policy evaluation.
- Coordinate offline/air-gap updates via Offline Kit bundles.
+- Fetch and normalise vulnerability advisories via restart-time connectors.
+- Persist observations and correlation linksets without precedence decisions.
+- Emit deterministic exports (JSON, Trivy DB) for downstream policy evaluation.
+- Coordinate offline/air-gap updates via Offline Kit bundles.
+- Serve paragraph-anchored advisory chunks for Advisory AI consumers without breaking the Aggregation-Only Contract.

-## Key components
- `StellaOps.Concelier.WebService` orchestration host.
- Connector libraries under `StellaOps.Concelier.Connector.*`.
- Exporter packages (`StellaOps.Concelier.Exporter.*`).
+## Key components
+- `StellaOps.Concelier.WebService` orchestration host.
+- Connector libraries under `StellaOps.Concelier.Connector.*`.
+- Exporter packages (`StellaOps.Concelier.Exporter.*`).
+
+## Recent updates
+- **2025-11-07:** Paragraph-anchored `/advisories/{advisoryKey}/chunks` endpoint shipped for Advisory AI paragraph retrieval. Details and rollout notes live in [`../../updates/2025-11-07-concelier-advisory-chunks.md`](../../updates/2025-11-07-concelier-advisory-chunks.md).

 ## Integrations & dependencies
 - MongoDB for canonical observations and schedules.
--- a/docs/modules/excititor/README.md
+++ b/docs/modules/excititor/README.md
@@ -3,8 +3,15 @@
 Excititor converts heterogeneous VEX feeds into raw observations and linksets that honour the Aggregation-Only Contract.

 ## Latest updates (2025-11-05)
- Link-Not-Merge readiness: release note [Excitor consensus beta](../../updates/2025-11-05-excitor-consensus-beta.md) captures how Excititor feeds power the Excitor consensus beta (sample payload in [consensus JSON](../../vex/consensus-json.md)).
+- Link-Not-Merge readiness: release note [Excitor consensus beta](../../updates/2025-11-05-excitor-consensus-beta.md) captures how Excititor feeds power the Excititor consensus beta (sample payload in [consensus JSON](../../vex/consensus-json.md)).
 - README now points policy/UI teams to the upcoming consensus integration work.
+- DSSE packaging for consensus bundles and Export Center hooks are documented in the [beta release note](../../updates/2025-11-05-excitor-consensus-beta.md); operators mirroring Excititor exports must verify detached JWS artefacts (`bundle.json.jws`) alongside each bundle.
+- Follow-ups called out in the release note (Policy weighting knobs `POLICY-ENGINE-30-101`, CLI verb `CLI-VEX-30-002`) remain in-flight and are tracked in `/docs/implplan/SPRINT_200_documentation_process.md`.
+
+## Release references
+- Consensus beta payload reference: [docs/vex/consensus-json.md](../../vex/consensus-json.md)
+- Export Center offline packaging: [docs/modules/export-center/devportal-offline.md](../export-center/devportal-offline.md)
+- Historical release log: [docs/updates/](../../updates/)

 ## Responsibilities
 - Fetch OpenVEX/CSAF/CycloneDX statements via restart-only connectors.
--- a/docs/modules/excititor/TASKS.md
+++ b/docs/modules/excititor/TASKS.md
@@ -4,6 +4,6 @@

 | ID | Status | Owner(s) | Description | Notes |
 |----|--------|----------|-------------|-------|
-| EXCITITOR-DOCS-0001 | DONE (2025-11-05) | Docs Guild | Validate that ./README.md aligns with the latest release notes. | README now links to the [Excitor consensus beta release note](../../updates/2025-11-05-excitor-consensus-beta.md) and [consensus JSON sample](../../vex/consensus-json.md). |
-| EXCITITOR-OPS-0001 | TODO | Ops Guild | Review runbooks/observability assets after next sprint demo. | Sync outcomes back to ../../TASKS.md |
-| EXCITITOR-ENG-0001 | TODO | Module Team | Cross-check implementation plan milestones against `/docs/implplan/SPRINT_*.md`. | Update status via ./AGENTS.md workflow |
+| EXCITITOR-DOCS-0001 | DONE (2025-11-07) | Docs Guild | Validate that ./README.md aligns with the latest release notes. | README now includes DSSE/export references + release-note cross-links for the consensus beta. |
+| EXCITITOR-OPS-0001 | DONE (2025-11-07) | Ops Guild | Review runbooks/observability assets after next sprint demo. | Added runbook/observability checklist (metrics, alerts, incident steps) to `docs/modules/excititor/mirrors.md`. |
+| EXCITITOR-ENG-0001 | DONE (2025-11-07) | Module Team | Cross-check implementation plan milestones against `/docs/implplan/SPRINT_*.md`. | Implementation plan now mirrors SPRINT_200 statuses in a new sprint-alignment table. |
--- a/docs/modules/excititor/implementation_plan.md
+++ b/docs/modules/excititor/implementation_plan.md
@@ -15,7 +15,17 @@
 - **Epic 8 – Advisory AI:** guarantee citation-ready payloads and normalized context for AI summaries/explainers.
 - Track DOCS-LNM-22-006/007 and CLI-EXC-25-001..002 in ../../TASKS.md.

-## Coordination
- Review ./AGENTS.md before picking up new work.
- Sync with cross-cutting teams noted in `/docs/implplan/SPRINT_*.md`.
- Update this plan whenever scope, dependencies, or guardrails change.
+## Coordination
+- Review ./AGENTS.md before picking up new work.
+- Sync with cross-cutting teams noted in `/docs/implplan/SPRINT_*.md`.
+- Update this plan whenever scope, dependencies, or guardrails change.
+
+## Sprint alignment (2025-11-07)
+
+| Sprint task | State (SPRINT_200) | Notes |
+| --- | --- | --- |
+| EXCITITOR-DOCS-0001 | DONE | README release alignment + consensus beta references refreshed (DSSE/export guidance). |
+| EXCITITOR-ENG-0001 | DONE | Implementation plan now mirrors `SPRINT_200_documentation_process.md` through this table. |
+| EXCITITOR-OPS-0001 | DONE | Runbook/observability checklist added to `docs/modules/excititor/mirrors.md`. |
+
+See `/docs/implplan/SPRINT_200_documentation_process.md` for the canonical status table.
--- a/docs/modules/excititor/mirrors.md
+++ b/docs/modules/excititor/mirrors.md
@@ -156,9 +156,40 @@ Downstream automation reads `manifest.json`/`bundle.json` directly, while `/exci

 ---

-## 6) Future alignment
-
-* Replace manual export definitions with generated mirror bundle manifests once `EXCITITOR-EXPORT-01-007` ships.
-* Extend `/index` payload with quiet-provenance when `EXCITITOR-EXPORT-01-006` adds that metadata.
-* Integrate domain manifests with DevOps mirror profiles (`DEVOPS-MIRROR-08-001`) so helm/compose overlays can enable or disable domains declaratively.
+## 6) Future alignment
+
+* Replace manual export definitions with generated mirror bundle manifests once `EXCITITOR-EXPORT-01-007` ships.
+* Extend `/index` payload with quiet-provenance when `EXCITITOR-EXPORT-01-006` adds that metadata.
+* Integrate domain manifests with DevOps mirror profiles (`DEVOPS-MIRROR-08-001`) so helm/compose overlays can enable or disable domains declaratively.
+
+---
+
+## 7) Runbook & observability checklist (Sprint 22 demo refresh · 2025-11-07)
+
+### Daily / on-call checks
+1. **Index freshness** – watch `excitor_mirror_export_latency_seconds` (p95 < 180 s) grouped by `domainId`. If latency grows past 10 minutes, verify the export worker queue (`stellaops-export-worker` logs) and ensure Mongo `vex_exports` has entries newer than `now()-10m`.
+2. **Quota exhaustion** – alert on `excitor_mirror_quota_exhausted_total{scope="download"}` increases. When triggered, inspect structured logs (`MirrorDomainId`, `QuotaScope`, `RemoteIp`) and either raise limits or throttle abusive clients.
+3. **Bundle signature health** – metric `excitor_mirror_bundle_signature_verified_total` should match download counts when signing is enabled. Deltas indicate missing `.jws` files; rebuild the bundle via export job or copy artefacts from the authority mirror cache.
+4. **HTTP errors** – dashboards should track 4xx/5xx rates split by route; repeated `503` statuses imply misconfigured exports. Check `mirror/index` logs for `status=misconfigured`.
+
+### Incident steps
+1. Use `GET /excititor/mirror/domains/{id}/index` to capture current manifests. Attach the response to the incident log for reproducibility.
+2. For quota incidents, temporarily raise `maxIndexRequestsPerHour`/`maxDownloadRequestsPerHour` via the `Excititor:Mirror:Domains` config override, redeploy, then work with the consuming team on caching.
+3. For stale exports, trigger the export job (`Excititor.ExportRunner`) and confirm the artefacts are written to `outputRoot/<domain>`.
+4. Validate DSSE artefacts by running `cosign verify-blob --certificate-rekor-url=<rekor> --bundle <domain>/bundle.json --signature <domain>/bundle.json.jws`.
+
+### Logging fields (structured)
+| Field | Description |
+| --- | --- |
+| `MirrorDomainId` | Domain handling the request (matches `id` in config). |
+| `QuotaScope` | `index` / `download`, useful when alerting on quota events. |
+| `ExportKey` | Included in download logs to pinpoint misconfigured exports. |
+| `BundleDigest` | SHA-256 of the artefact; compare with index payload when debugging corruption. |
+
+### OTEL signals
+- **Counters:** `excitor.mirror.requests`, `excitor.mirror.quota_blocked`, `excitor.mirror.signature.failures`.
+- **Histograms:** `excitor.mirror.download.duration`, `excitor.mirror.export.latency`.
+- **Spans:** `mirror.index`, `mirror.download` include attributes `mirror.domain`, `mirror.export.key`, and `mirror.quota.remaining`.
+
+Add these instruments via the `MirrorEndpoints` middleware; see `StellaOps.Excititor.WebService/Telemetry/MirrorMetrics.cs`.

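Note (illustrative, not part of this diff): the `BundleDigest` logging field is described as the SHA-256 of the artefact, to be compared with the index payload when debugging corruption. A minimal Python sketch of that comparison follows. The helper names are hypothetical, and the optional `sha256:` prefix and hex encoding are assumptions, since the source does not show the digest format.

```python
import hashlib
from pathlib import Path


def sha256_hex(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large bundles are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_index_digest(path: Path, expected: str) -> bool:
    """Compare a local artefact against the digest reported by the index."""
    # Assumption: the index may report the digest as "sha256:<hex>" or bare hex.
    expected_hex = expected.lower().removeprefix("sha256:")
    return sha256_hex(path) == expected_hex
```

A mismatch here points at a corrupted or stale artefact on disk rather than at the signature, which narrows the incident before anyone re-runs the export job.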
--- a/docs/modules/scanner/design/surface-fs.md
+++ b/docs/modules/scanner/design/surface-fs.md
@@ -110,6 +110,10 @@ Import script calls `PutManifest` for each manifest, verifying digests. This ena

 Scanner.Worker serialises EntryTrace graphs into Surface.FS using `SurfaceCacheKey(namespace: "entrytrace.graph", tenant, sha256(options|env|entrypoint))`. At runtime the worker checks the cache before invoking analyzers; cache hits bypass parsing and feed the result store/attestor pipeline directly. The same namespace is consumed by WebService and CLI to retrieve cached graphs for reporting.

+### 6.2 BuildX generator path
+
+`StellaOps.Scanner.Sbomer.BuildXPlugin` reuses the same CAS layout via the `--surface-*` descriptor flags (or `STELLAOPS_SURFACE_*` env vars). When layer fragment JSON, EntryTrace graph JSON, or NDJSON files are supplied, the plug-in writes them under `scanner/surface/**` within the configured CAS root and emits a manifest pointer so Scanner.WebService can pick up the artefacts without re-scanning. The Surface manifest JSON can also be copied to an arbitrary path via `--surface-manifest-output` for CI artefacts/offline kits.
+
 ## 7. Security & Tenancy

 - Tenant ID is mandatory; Surface.Validation enforces match with Authority token.
@@ -119,8 +123,12 @@ Scanner.Worker serialises EntryTrace graphs into Surface.FS using `SurfaceCacheK

 ## 8. Observability

- Logs include manifest SHA, tenant, and kind; payload paths truncated for brevity.
- Metrics exported via Prometheus with labels `{tenant, kind, result}`.
+- Logs include manifest SHA, tenant, kind, and cache namespace; payload paths are truncated for brevity.
+- Prometheus metrics (emitted by Scanner.Worker) now include:
+  - `scanner_worker_surface_manifests_published_total`, `scanner_worker_surface_manifests_failed_total`, `scanner_worker_surface_manifests_skipped_total` with labels `{queue, job_kind, surface_result, reason?, surface_payload_count}`.
+  - `scanner_worker_surface_payload_persisted_total` with `{surface_kind}` to track cache churn (`entrytrace.graph`, `entrytrace.ndjson`, `layer.fragments`, …).
+  - `scanner_worker_surface_manifest_publish_duration_ms` histogram for end-to-end persistence latency.
+- Grafana dashboard JSON: `docs/modules/scanner/operations/surface-worker-grafana-dashboard.json` (panels for publish outcomes, latency, per-kind cache rate, and failure reasons). Import alongside the analyzer dashboard and point it to the Scanner Prometheus datasource.
 - Tracing spans: `surface.fs.put`, `surface.fs.get`, `surface.fs.cache`.

 ## 9. Testing Strategy
--- a/docs/modules/scanner/operations/surface-worker-grafana-dashboard.json
+++ b/docs/modules/scanner/operations/surface-worker-grafana-dashboard.json
@@ -0,0 +1,177 @@
+{
+  "title": "StellaOps Scanner Surface Worker",
+  "uid": "scanner-surface-worker",
+  "schemaVersion": 38,
+  "version": 1,
+  "editable": true,
+  "timezone": "",
+  "graphTooltip": 0,
+  "time": {
+    "from": "now-24h",
+    "to": "now"
+  },
+  "templating": {
+    "list": [
+      {
+        "name": "datasource",
+        "type": "datasource",
+        "query": "prometheus",
+        "refresh": 1,
+        "hide": 0,
+        "current": {}
+      }
+    ]
+  },
+  "annotations": {
+    "list": []
+  },
+  "panels": [
+    {
+      "id": 1,
+      "type": "timeseries",
+      "title": "Surface Manifest Outcomes (5m rate)",
+      "datasource": {
+        "type": "prometheus",
+        "uid": "${datasource}"
+      },
+      "fieldConfig": {
+        "defaults": {
+          "displayName": "{{__series.name}}",
+          "unit": "ops"
+        },
+        "overrides": []
+      },
+      "options": {
+        "legend": {
+          "displayMode": "table",
+          "placement": "bottom"
+        },
+        "tooltip": {
+          "mode": "multi",
+          "sort": "none"
+        }
+      },
+      "targets": [
+        {
+          "expr": "sum(rate(scanner_worker_surface_manifests_published_total[5m]))",
+          "legendFormat": "published",
+          "refId": "A"
+        },
+        {
+          "expr": "sum(rate(scanner_worker_surface_manifests_failed_total[5m]))",
+          "legendFormat": "failed",
+          "refId": "B"
+        },
+        {
+          "expr": "sum(rate(scanner_worker_surface_manifests_skipped_total[5m]))",
+          "legendFormat": "skipped",
+          "refId": "C"
+        }
+      ]
+    },
+    {
+      "id": 2,
+      "type": "timeseries",
+      "title": "Surface Manifest Publish Duration (ms)",
+      "datasource": {
+        "type": "prometheus",
+        "uid": "${datasource}"
+      },
+      "fieldConfig": {
+        "defaults": {
+          "displayName": "{{__series.name}}",
+          "unit": "ms"
+        },
+        "overrides": []
+      },
+      "options": {
+        "legend": {
+          "displayMode": "table",
+          "placement": "bottom"
+        },
+        "tooltip": {
+          "mode": "single",
+          "sort": "none"
+        }
+      },
+      "targets": [
+        {
+          "expr": "histogram_quantile(0.95, sum by (le) (rate(scanner_worker_surface_manifest_publish_duration_ms_bucket[5m])))",
+          "legendFormat": "p95",
+          "refId": "A"
+        },
+        {
+          "expr": "sum(rate(scanner_worker_surface_manifest_publish_duration_ms_sum[5m])) / sum(rate(scanner_worker_surface_manifest_publish_duration_ms_count[5m]))",
+          "legendFormat": "avg",
+          "refId": "B"
+        }
+      ]
+    },
+    {
+      "id": 3,
+      "type": "timeseries",
+      "title": "Surface Payload Cached by Kind (5m rate)",
+      "datasource": {
+        "type": "prometheus",
+        "uid": "${datasource}"
+      },
+      "fieldConfig": {
+        "defaults": {
+          "displayName": "{{surface_kind}}",
+          "unit": "ops"
+        },
+        "overrides": []
+      },
+      "options": {
+        "legend": {
+          "displayMode": "table",
+          "placement": "bottom"
+        },
+        "tooltip": {
+          "mode": "multi",
+          "sort": "none"
+        }
+      },
+      "targets": [
+        {
+          "expr": "sum by (surface_kind) (rate(scanner_worker_surface_payload_persisted_total[5m]))",
+          "legendFormat": "{{surface_kind}}",
+          "refId": "A"
+        }
+      ]
+    },
+    {
+      "id": 4,
+      "type": "timeseries",
+      "title": "Surface Manifest Failures by Reason (5m rate)",
+      "datasource": {
+        "type": "prometheus",
+        "uid": "${datasource}"
+      },
+      "fieldConfig": {
+        "defaults": {
+          "displayName": "{{reason}}",
+          "unit": "ops"
+        },
+        "overrides": []
+      },
+      "options": {
+        "legend": {
+          "displayMode": "table",
+          "placement": "bottom"
+        },
+        "tooltip": {
+          "mode": "multi",
+          "sort": "none"
+        }
+      },
+      "targets": [
+        {
+          "expr": "sum by (reason) (rate(scanner_worker_surface_manifests_failed_total[5m]))",
+          "legendFormat": "{{reason}}",
+          "refId": "A"
+        }
+      ]
+    }
+  ]
+}
--- a/docs/modules/taskrunner/migrations/pack-run-collections.md
+++ b/docs/modules/taskrunner/migrations/pack-run-collections.md
@@ -0,0 +1,99 @@
+# Task Runner Collections — Initial Migration
+
+Last updated: 2025-11-06
+
+This migration seeds the MongoDB collections that back the Task Runner service. It is implemented as `20251106-task-runner-baseline.mongosh` under the platform migration runner and must be applied **before** enabling the TaskRunner service in any environment.
+
+## Collections
+
+### `pack_runs`
+
+| Field            | Type            | Notes                                                     |
+|------------------|-----------------|-----------------------------------------------------------|
+| `_id`            | `string`        | Run identifier (same as `runId`).                         |
+| `planHash`       | `string`        | Deterministic hash produced by the planner.               |
+| `plan`           | `object`        | Full `TaskPackPlan` payload used to execute the run.      |
+| `failurePolicy`  | `object`        | Retry/backoff directives resolved at plan time.           |
+| `requestedAt`    | `date`          | Timestamp when the client requested the run.              |
+| `createdAt`      | `date`          | Timestamp when the run was persisted.                     |
+| `updatedAt`      | `date`          | Timestamp of the last mutation.                           |
+| `steps`          | `array<object>` | Flattened step records (`stepId`, `status`, attempts…).   |
+| `tenantId`       | `string`        | Optional multi-tenant scope (reserved for future phases). |
+
+**Indexes**
+
+1. `{ _id: 1 }` — implicit primary key / uniqueness guarantee.
+2. `{ updatedAt: -1 }` — serves `GET /runs` listings and staleness checks.
+3. `{ tenantId: 1, updatedAt: -1 }` — activated once tenancy is enforced; remains sparse until then.
+
+### `pack_run_logs`
+
+| Field         | Type            | Notes                                                  |
+|---------------|-----------------|--------------------------------------------------------|
+| `_id`         | `ObjectId`      | Generated per log entry.                               |
+| `runId`       | `string`        | Foreign key to `pack_runs._id`.                        |
+| `sequence`    | `long`          | Monotonic counter assigned by the writer.              |
+| `timestamp`   | `date`          | UTC timestamp of the log event.                        |
+| `level`       | `string`        | `trace`, `debug`, `info`, `warn`, `error`.             |
+| `eventType`   | `string`        | Machine-friendly event identifier (e.g. `step.started`). |
+| `message`     | `string`        | Human-readable summary.                                |
+| `stepId`      | `string`        | Optional step identifier.                              |
+| `metadata`    | `object`        | Deterministic key/value payload (string-only values).  |
+
+**Indexes**
+
+1. `{ runId: 1, sequence: 1 }` (unique) — guarantees ordered retrieval and enforces idempotence.
+2. `{ runId: 1, timestamp: 1 }` — accelerates replay and time-window queries.
+3. `{ timestamp: 1 }` — optional TTL (disabled by default) for retention policies.
+
+### `pack_artifacts`
+
+| Field        | Type       | Notes                                                       |
+|--------------|------------|-------------------------------------------------------------|
+| `_id`        | `ObjectId` | Generated per artifact record.                              |
+| `runId`      | `string`   | Foreign key to `pack_runs._id`.                             |
+| `name`       | `string`   | Output name from the Task Pack manifest.                    |
+| `type`       | `string`   | `file`, `object`, or other future evidence categories.      |
+| `sourcePath` | `string`   | Local path captured during execution (nullable).            |
+| `storedPath` | `string`   | Object store path or bundle-relative URI (nullable).        |
+| `status`     | `string`   | `pending`, `copied`, `materialized`, `skipped`.             |
+| `notes`      | `string`   | Free-form notes (deterministic messages only).              |
+| `capturedAt` | `date`     | UTC timestamp recorded by the worker.                       |
+
+**Indexes**
+
+1. `{ runId: 1, name: 1 }` (unique) — ensures a run emits at most one record per output.
+2. `{ runId: 1 }` — supports artifact listing alongside run inspection.
+
+## Execution Order
+
+1. Create collections with `validator` envelopes mirroring the field expectations above (if MongoDB schema validation is enabled in the environment).
+2. Apply the indexes in the order listed — unique indexes first to surface data issues early.
+3. Backfill existing filesystem-backed runs by importing the serialized state/log/artifact manifests into the new collections. A dedicated importer script (`tools/taskrunner/import-filesystem-state.ps1`) accompanies the migration.
+4. Switch the Task Runner service configuration to point at the Mongo-backed stores (`TaskRunner:Storage:Mode = "Mongo"`), then redeploy workers and web service.
+
+## Rollback
+
+To revert, switch the Task Runner configuration back to the filesystem provider and stop the Mongo migration runner. Collections can remain in place; they are append-only and harmless when unused.
+
+## Configuration Reference
+
+Enable the Mongo-backed stores by updating the worker and web service configuration (Compose/Helm values or `appsettings*.json`):
+
+```json
+"TaskRunner": {
+  "Storage": {
+    "Mode": "mongo",
+    "Mongo": {
+      "ConnectionString": "mongodb://127.0.0.1:27017/taskrunner",
+      "Database": "taskrunner",
+      "RunsCollection": "pack_runs",
+      "LogsCollection": "pack_run_logs",
+      "ArtifactsCollection": "pack_artifacts",
+      "ApprovalsCollection": "pack_run_approvals"
+    }
+  }
+}
+```
+
+The worker uses the mirrored structure under the `Worker` section. Omit the `Database` property to fall back to the name embedded in the connection string.
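
Note (illustrative, not part of this diff): the configuration reference says that omitting `Database` falls back to the name embedded in the connection string. The Python sketch below shows one way such a fallback could be resolved. `resolve_database` is a hypothetical helper, and the service's actual resolution logic is not shown in the source.

```python
from typing import Optional
from urllib.parse import urlsplit


def resolve_database(connection_string: str, database: Optional[str] = None) -> str:
    """Pick the explicit database name, else the one in the connection string."""
    if database:
        return database
    # For "mongodb://host:port/name?options" the path component is "/name".
    name = urlsplit(connection_string).path.lstrip("/")
    if not name:
        raise ValueError(
            "no Database configured and none embedded in the connection string"
        )
    return name
```

For the sample configuration, `resolve_database("mongodb://127.0.0.1:27017/taskrunner")` yields `taskrunner`, matching the explicit `Database` value shown in the JSON.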