Implement MongoDB-based storage for Pack Run approval, artifact, log, and state management
Some checks failed
Docs CI / lint-and-preview (push) Has been cancelled
Some checks failed
Docs CI / lint-and-preview (push) Has been cancelled
- Added MongoPackRunApprovalStore for managing approval states with MongoDB. - Introduced MongoPackRunArtifactUploader for uploading and storing artifacts. - Created MongoPackRunLogStore to handle logging of pack run events. - Developed MongoPackRunStateStore for persisting and retrieving pack run states. - Implemented unit tests for MongoDB stores to ensure correct functionality. - Added MongoTaskRunnerTestContext for setting up MongoDB test environment. - Enhanced PackRunStateFactory to correctly initialize state with gate reasons.
This commit is contained in:
@@ -1,29 +1,37 @@
|
||||
# StellaOps Advisory AI
|
||||
|
||||
Advisory AI is the retrieval-augmented assistant that synthesizes advisory and VEX evidence into operator-ready summaries, conflict explanations, and remediation plans with strict provenance.
|
||||
|
||||
## Responsibilities
|
||||
- Generate policy-aware advisory summaries with citations back to Conseiller and Excititor evidence.
|
||||
- Explain conflicting advisories/VEX statements using weights from VEX Lens and Policy Engine.
|
||||
- Propose remediation hints aligned with Offline Kit staging and export bundles.
|
||||
- Expose API/UI surfaces with guardrails on model prompts, outputs, and retention.
|
||||
|
||||
## Key components
|
||||
- RAG pipeline drawing from Conseiller, Excititor, VEX Lens, Policy Engine, and SBOM Service data.
|
||||
- Prompt templates and guard models enforcing provenance and redaction policies.
|
||||
- Vercel/offline inference workers with deterministic caching of generated artefacts.
|
||||
|
||||
## Integrations & dependencies
|
||||
- Authority for tenant-aware access control.
|
||||
- Policy Engine for context-specific decisions and explain traces.
|
||||
- Console/CLI for interaction surfaces.
|
||||
- Export Center/Vuln Explorer for embedding generated briefs.
|
||||
|
||||
## Operational notes
|
||||
- Model cache management and offline bundle packaging per Epic 8 requirements.
|
||||
- Usage/latency dashboards for prompt/response monitoring.
|
||||
- Redaction policies validated against security/LLM guardrail tests.
|
||||
|
||||
## Epic alignment
|
||||
- Epic 8: Advisory AI Assistant.
|
||||
- DOCS-AI stories to be tracked in ../../TASKS.md.
|
||||
# StellaOps Advisory AI
|
||||
|
||||
Advisory AI is the retrieval-augmented assistant that synthesizes advisory and VEX evidence into operator-ready summaries, conflict explanations, and remediation plans with strict provenance.
|
||||
|
||||
## Responsibilities
|
||||
- Generate policy-aware advisory summaries with citations back to Conseiller and Excititor evidence.
|
||||
- Explain conflicting advisories/VEX statements using weights from VEX Lens and Policy Engine.
|
||||
- Propose remediation hints aligned with Offline Kit staging and export bundles.
|
||||
- Expose API/UI surfaces with guardrails on model prompts, outputs, and retention.
|
||||
|
||||
## Key components
|
||||
- RAG pipeline drawing from Conseiller, Excititor, VEX Lens, Policy Engine, and SBOM Service data.
|
||||
- Prompt templates and guard models enforcing provenance and redaction policies.
|
||||
- Vercel/offline inference workers with deterministic caching of generated artefacts.
|
||||
|
||||
## Integrations & dependencies
|
||||
- Authority for tenant-aware access control.
|
||||
- Policy Engine for context-specific decisions and explain traces.
|
||||
- Console/CLI for interaction surfaces.
|
||||
- Export Center/Vuln Explorer for embedding generated briefs.
|
||||
|
||||
## Operational notes
|
||||
- Model cache management and offline bundle packaging per Epic 8 requirements.
|
||||
- Usage/latency dashboards for prompt/response monitoring with `advisory_ai_latency_seconds`, guardrail block/validation counters, and citation coverage histograms wired into the default “Advisory AI” Grafana dashboard.
|
||||
- Alert policies fire when `advisory_ai_guardrail_blocks_total` or `advisory_ai_validation_failures_total` breach burn-rate thresholds (5 blocks/min or validation failures > 1% of traffic) and when latency p95 exceeds 30s.
|
||||
- Redaction policies validated against security/LLM guardrail tests.
|
||||
- Guardrail behaviour, blocked phrases, and operational alerts are detailed in `/docs/security/assistant-guardrails.md`.
|
||||
|
||||
## CLI usage
|
||||
- `stella advise run <summary|conflict|remediation> --advisory-key <id> [--artifact-id id] [--artifact-purl purl] [--policy-version v] [--profile profile] [--section name] [--force-refresh] [--timeout seconds]`
|
||||
- Requests an advisory plan from the web service, enqueues execution, then polls for the generated output (default wait 120 s, single check if `--timeout 0`).
|
||||
- Renders plan metadata (cache key, prompt template, token budget), guardrail state, provenance hashes, signatures, and citations in a deterministic table view.
|
||||
- Honors `STELLAOPS_ADVISORYAI_URL` when set; otherwise the CLI reuses the backend URL and scopes requests via `X-StellaOps-Scopes`.
|
||||
|
||||
## Epic alignment
|
||||
- Epic 8: Advisory AI Assistant.
|
||||
- DOCS-AI stories to be tracked in ../../TASKS.md.
|
||||
|
||||
@@ -129,7 +129,16 @@ src/
|
||||
|
||||
Both subcommands honour offline-first expectations (no network access) and normalise relative roots via `--root` when operators mirror the credential store.
|
||||
|
||||
### 2.11 Air-gap guard
|
||||
### 2.11 Advisory AI (RAG summaries)
|
||||
|
||||
* `advise run <summary|conflict|remediation> --advisory-key <id> [--artifact-id id] [--artifact-purl purl] [--policy-version v] [--profile profile] [--section name] [--force-refresh] [--timeout seconds]`
|
||||
|
||||
* Calls the Advisory AI service (`/v1/advisory-ai/pipeline/{task}` + `/outputs/{cacheKey}`) to materialise a deterministic plan, queue execution, and poll for the generated brief.
|
||||
* Renders plan metadata (cache key, prompt template, token budgets), guardrail results, provenance hashes/signatures, and citation list. Exit code is non-zero if guardrails block or the command times out.
|
||||
* Uses `STELLAOPS_ADVISORYAI_URL` when configured; otherwise it reuses the backend base address and adds `X-StellaOps-Scopes` (`advisory:run` + task scope) per request.
|
||||
* `--timeout 0` performs a single cache lookup (for CI flows that only want cached artefacts).
|
||||
|
||||
### 2.12 Air-gap guard
|
||||
|
||||
- CLI outbound HTTP flows (Authority auth, backend APIs, advisory downloads) route through `StellaOps.AirGap.Policy`. When sealed mode is active the CLI refuses commands that would require external egress and surfaces the shared `AIRGAP_EGRESS_BLOCKED` remediation guidance instead of attempting the request.
|
||||
|
||||
|
||||
@@ -3,15 +3,19 @@
|
||||
Concelier ingests signed advisories from dozens of sources and converts them into immutable observations plus linksets under the Aggregation-Only Contract (AOC).
|
||||
|
||||
## Responsibilities
|
||||
- Fetch and normalise vulnerability advisories via restart-time connectors.
|
||||
- Persist observations and correlation linksets without precedence decisions.
|
||||
- Emit deterministic exports (JSON, Trivy DB) for downstream policy evaluation.
|
||||
- Coordinate offline/air-gap updates via Offline Kit bundles.
|
||||
- Fetch and normalise vulnerability advisories via restart-time connectors.
|
||||
- Persist observations and correlation linksets without precedence decisions.
|
||||
- Emit deterministic exports (JSON, Trivy DB) for downstream policy evaluation.
|
||||
- Coordinate offline/air-gap updates via Offline Kit bundles.
|
||||
- Serve paragraph-anchored advisory chunks for Advisory AI consumers without breaking the Aggregation-Only Contract.
|
||||
|
||||
## Key components
|
||||
- `StellaOps.Concelier.WebService` orchestration host.
|
||||
- Connector libraries under `StellaOps.Concelier.Connector.*`.
|
||||
- Exporter packages (`StellaOps.Concelier.Exporter.*`).
|
||||
## Key components
|
||||
- `StellaOps.Concelier.WebService` orchestration host.
|
||||
- Connector libraries under `StellaOps.Concelier.Connector.*`.
|
||||
- Exporter packages (`StellaOps.Concelier.Exporter.*`).
|
||||
|
||||
## Recent updates
|
||||
- **2025-11-07:** Paragraph-anchored `/advisories/{advisoryKey}/chunks` endpoint shipped for Advisory AI paragraph retrieval. Details and rollout notes live in [`../../updates/2025-11-07-concelier-advisory-chunks.md`](../../updates/2025-11-07-concelier-advisory-chunks.md).
|
||||
|
||||
## Integrations & dependencies
|
||||
- MongoDB for canonical observations and schedules.
|
||||
|
||||
@@ -3,8 +3,15 @@
|
||||
Excititor converts heterogeneous VEX feeds into raw observations and linksets that honour the Aggregation-Only Contract.
|
||||
|
||||
## Latest updates (2025-11-05)
|
||||
- Link-Not-Merge readiness: release note [Excitor consensus beta](../../updates/2025-11-05-excitor-consensus-beta.md) captures how Excititor feeds power the Excitor consensus beta (sample payload in [consensus JSON](../../vex/consensus-json.md)).
|
||||
- Link-Not-Merge readiness: release note [Excitor consensus beta](../../updates/2025-11-05-excitor-consensus-beta.md) captures how Excititor feeds power the Excititor consensus beta (sample payload in [consensus JSON](../../vex/consensus-json.md)).
|
||||
- README now points policy/UI teams to the upcoming consensus integration work.
|
||||
- DSSE packaging for consensus bundles and Export Center hooks are documented in the [beta release note](../../updates/2025-11-05-excitor-consensus-beta.md); operators mirroring Excititor exports must verify detached JWS artefacts (`bundle.json.jws`) alongside each bundle.
|
||||
- Follow-ups called out in the release note (Policy weighting knobs `POLICY-ENGINE-30-101`, CLI verb `CLI-VEX-30-002`) remain in-flight and are tracked in `/docs/implplan/SPRINT_200_documentation_process.md`.
|
||||
|
||||
## Release references
|
||||
- Consensus beta payload reference: [docs/vex/consensus-json.md](../../vex/consensus-json.md)
|
||||
- Export Center offline packaging: [docs/modules/export-center/devportal-offline.md](../export-center/devportal-offline.md)
|
||||
- Historical release log: [docs/updates/](../../updates/)
|
||||
|
||||
## Responsibilities
|
||||
- Fetch OpenVEX/CSAF/CycloneDX statements via restart-only connectors.
|
||||
|
||||
@@ -4,6 +4,6 @@
|
||||
|
||||
| ID | Status | Owner(s) | Description | Notes |
|
||||
|----|--------|----------|-------------|-------|
|
||||
| EXCITITOR-DOCS-0001 | DONE (2025-11-05) | Docs Guild | Validate that ./README.md aligns with the latest release notes. | README now links to the [Excitor consensus beta release note](../../updates/2025-11-05-excitor-consensus-beta.md) and [consensus JSON sample](../../vex/consensus-json.md). |
|
||||
| EXCITITOR-OPS-0001 | TODO | Ops Guild | Review runbooks/observability assets after next sprint demo. | Sync outcomes back to ../../TASKS.md |
|
||||
| EXCITITOR-ENG-0001 | TODO | Module Team | Cross-check implementation plan milestones against `/docs/implplan/SPRINT_*.md`. | Update status via ./AGENTS.md workflow |
|
||||
| EXCITITOR-DOCS-0001 | DONE (2025-11-07) | Docs Guild | Validate that ./README.md aligns with the latest release notes. | README now includes DSSE/export references + release-note cross-links for the consensus beta. |
|
||||
| EXCITITOR-OPS-0001 | DONE (2025-11-07) | Ops Guild | Review runbooks/observability assets after next sprint demo. | Added runbook/observability checklist (metrics, alerts, incident steps) to `docs/modules/excititor/mirrors.md`. |
|
||||
| EXCITITOR-ENG-0001 | DONE (2025-11-07) | Module Team | Cross-check implementation plan milestones against `/docs/implplan/SPRINT_*.md`. | Implementation plan now mirrors SPRINT_200 statuses in a new sprint-alignment table. |
|
||||
|
||||
@@ -15,7 +15,17 @@
|
||||
- **Epic 8 – Advisory AI:** guarantee citation-ready payloads and normalized context for AI summaries/explainers.
|
||||
- Track DOCS-LNM-22-006/007 and CLI-EXC-25-001..002 in ../../TASKS.md.
|
||||
|
||||
## Coordination
|
||||
- Review ./AGENTS.md before picking up new work.
|
||||
- Sync with cross-cutting teams noted in `/docs/implplan/SPRINT_*.md`.
|
||||
- Update this plan whenever scope, dependencies, or guardrails change.
|
||||
## Coordination
|
||||
- Review ./AGENTS.md before picking up new work.
|
||||
- Sync with cross-cutting teams noted in `/docs/implplan/SPRINT_*.md`.
|
||||
- Update this plan whenever scope, dependencies, or guardrails change.
|
||||
|
||||
## Sprint alignment (2025-11-07)
|
||||
|
||||
| Sprint task | State (SPRINT_200) | Notes |
|
||||
| --- | --- | --- |
|
||||
| EXCITITOR-DOCS-0001 | DONE | README release alignment + consensus beta references refreshed (DSSE/export guidance). |
|
||||
| EXCITITOR-ENG-0001 | DONE | Implementation plan now mirrors `SPRINT_200_documentation_process.md` through this table. |
|
||||
| EXCITITOR-OPS-0001 | DONE | Runbook/observability checklist added to `docs/modules/excititor/mirrors.md`. |
|
||||
|
||||
See `/docs/implplan/SPRINT_200_documentation_process.md` for the canonical status table.
|
||||
|
||||
@@ -156,9 +156,40 @@ Downstream automation reads `manifest.json`/`bundle.json` directly, while `/exci
|
||||
|
||||
---
|
||||
|
||||
## 6) Future alignment
|
||||
|
||||
* Replace manual export definitions with generated mirror bundle manifests once `EXCITITOR-EXPORT-01-007` ships.
|
||||
* Extend `/index` payload with quiet-provenance when `EXCITITOR-EXPORT-01-006` adds that metadata.
|
||||
* Integrate domain manifests with DevOps mirror profiles (`DEVOPS-MIRROR-08-001`) so helm/compose overlays can enable or disable domains declaratively.
|
||||
## 6) Future alignment
|
||||
|
||||
* Replace manual export definitions with generated mirror bundle manifests once `EXCITITOR-EXPORT-01-007` ships.
|
||||
* Extend `/index` payload with quiet-provenance when `EXCITITOR-EXPORT-01-006` adds that metadata.
|
||||
* Integrate domain manifests with DevOps mirror profiles (`DEVOPS-MIRROR-08-001`) so helm/compose overlays can enable or disable domains declaratively.
|
||||
|
||||
---
|
||||
|
||||
## 7) Runbook & observability checklist (Sprint 22 demo refresh · 2025-11-07)
|
||||
|
||||
### Daily / on-call checks
|
||||
1. **Index freshness** – watch `excitor_mirror_export_latency_seconds` (p95 < 180) grouped by `domainId`. If latency grows past 10 minutes, verify the export worker queue (`stellaops-export-worker` logs) and ensure Mongo `vex_exports` has entries newer than `now()-10m`.
|
||||
2. **Quota exhaustion** – alert on `excitor_mirror_quota_exhausted_total{scope="download"}` increases. When triggered, inspect structured logs (`MirrorDomainId`, `QuotaScope`, `RemoteIp`) and either raise limits or throttle abusive clients.
|
||||
3. **Bundle signature health** – metric `excitor_mirror_bundle_signature_verified_total` should match download counts when signing enabled. Deltas indicate missing `.jws` files; rebuild the bundle via export job or copy artefacts from the authority mirror cache.
|
||||
4. **HTTP errors** – dashboards should track 4xx/5xx rates split by route; repeated `503` statuses imply misconfigured exports. Check `mirror/index` logs for `status=misconfigured`.
|
||||
|
||||
### Incident steps
|
||||
1. Use `GET /excititor/mirror/domains/{id}/index` to capture current manifests. Attach the response to the incident log for reproducibility.
|
||||
2. For quota incidents, temporarily raise `maxIndexRequestsPerHour`/`maxDownloadRequestsPerHour` via the `Excititor:Mirror:Domains` config override, redeploy, then work with the consuming team on caching.
|
||||
3. For stale exports, trigger the export job (`Excititor.ExportRunner`) and confirm the artefacts are written to `outputRoot/<domain>`.
|
||||
4. Validate DSSE artefacts by running `cosign verify-blob --certificate-rekor-url=<rekor> --bundle <domain>/bundle.json --signature <domain>/bundle.json.jws`.
|
||||
|
||||
### Logging fields (structured)
|
||||
| Field | Description |
|
||||
| --- | --- |
|
||||
| `MirrorDomainId` | Domain handling the request (matches `id` in config). |
|
||||
| `QuotaScope` | `index` / `download`, useful when alerting on quota events. |
|
||||
| `ExportKey` | Included in download logs to pinpoint misconfigured exports. |
|
||||
| `BundleDigest` | SHA-256 of the artefact; compare with index payload when debugging corruption. |
|
||||
|
||||
### OTEL signals
|
||||
- **Counters:** `excitor.mirror.requests`, `excitor.mirror.quota_blocked`, `excitor.mirror.signature.failures`.
|
||||
- **Histograms:** `excitor.mirror.download.duration`, `excitor.mirror.export.latency`.
|
||||
- **Spans:** `mirror.index`, `mirror.download` include attributes `mirror.domain`, `mirror.export.key`, and `mirror.quota.remaining`.
|
||||
|
||||
Add these instruments via the `MirrorEndpoints` middleware; see `StellaOps.Excititor.WebService/Telemetry/MirrorMetrics.cs`.
|
||||
|
||||
|
||||
@@ -110,6 +110,10 @@ Import script calls `PutManifest` for each manifest, verifying digests. This ena
|
||||
|
||||
Scanner.Worker serialises EntryTrace graphs into Surface.FS using `SurfaceCacheKey(namespace: "entrytrace.graph", tenant, sha256(options|env|entrypoint))`. At runtime the worker checks the cache before invoking analyzers; cache hits bypass parsing and feed the result store/attestor pipeline directly. The same namespace is consumed by WebService and CLI to retrieve cached graphs for reporting.
|
||||
|
||||
### 6.2 BuildX generator path
|
||||
|
||||
`StellaOps.Scanner.Sbomer.BuildXPlugin` reuses the same CAS layout via the `--surface-*` descriptor flags (or `STELLAOPS_SURFACE_*` env vars). When layer fragment JSON, EntryTrace graph JSON, or NDJSON files are supplied, the plug-in writes them under `scanner/surface/**` within the configured CAS root and emits a manifest pointer so Scanner.WebService can pick up the artefacts without re-scanning. The Surface manifest JSON can also be copied to an arbitrary path via `--surface-manifest-output` for CI artefacts/offline kits.
|
||||
|
||||
## 7. Security & Tenancy
|
||||
|
||||
- Tenant ID is mandatory; Surface.Validation enforces match with Authority token.
|
||||
@@ -119,8 +123,12 @@ Scanner.Worker serialises EntryTrace graphs into Surface.FS using `SurfaceCacheK
|
||||
|
||||
## 8. Observability
|
||||
|
||||
- Logs include manifest SHA, tenant, and kind; payload paths truncated for brevity.
|
||||
- Metrics exported via Prometheus with labels `{tenant, kind, result}`.
|
||||
- Logs include manifest SHA, tenant, kind, and cache namespace; payload paths are truncated for brevity.
|
||||
- Prometheus metrics (emitted by Scanner.Worker) now include:
|
||||
- `scanner_worker_surface_manifests_published_total`, `scanner_worker_surface_manifests_failed_total`, `scanner_worker_surface_manifests_skipped_total` with labels `{queue, job_kind, surface_result, reason?, surface_payload_count}`.
|
||||
- `scanner_worker_surface_payload_persisted_total` with `{surface_kind}` to track cache churn (`entrytrace.graph`, `entrytrace.ndjson`, `layer.fragments`, …).
|
||||
- `scanner_worker_surface_manifest_publish_duration_ms` histogram for end-to-end persistence latency.
|
||||
- Grafana dashboard JSON: `docs/modules/scanner/operations/surface-worker-grafana-dashboard.json` (panels for publish outcomes, latency, per-kind cache rate, and failure reasons). Import alongside the analyzer dashboard and point it to the Scanner Prometheus datasource.
|
||||
- Tracing spans: `surface.fs.put`, `surface.fs.get`, `surface.fs.cache`.
|
||||
|
||||
## 9. Testing Strategy
|
||||
|
||||
@@ -0,0 +1,177 @@
|
||||
{
|
||||
"title": "StellaOps Scanner Surface Worker",
|
||||
"uid": "scanner-surface-worker",
|
||||
"schemaVersion": 38,
|
||||
"version": 1,
|
||||
"editable": true,
|
||||
"timezone": "",
|
||||
"graphTooltip": 0,
|
||||
"time": {
|
||||
"from": "now-24h",
|
||||
"to": "now"
|
||||
},
|
||||
"templating": {
|
||||
"list": [
|
||||
{
|
||||
"name": "datasource",
|
||||
"type": "datasource",
|
||||
"query": "prometheus",
|
||||
"refresh": 1,
|
||||
"hide": 0,
|
||||
"current": {}
|
||||
}
|
||||
]
|
||||
},
|
||||
"annotations": {
|
||||
"list": []
|
||||
},
|
||||
"panels": [
|
||||
{
|
||||
"id": 1,
|
||||
"type": "timeseries",
|
||||
"title": "Surface Manifest Outcomes (5m rate)",
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "${datasource}"
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"displayName": "{{__series.name}}",
|
||||
"unit": "ops"
|
||||
},
|
||||
"overrides": []
|
||||
},
|
||||
"options": {
|
||||
"legend": {
|
||||
"displayMode": "table",
|
||||
"placement": "bottom"
|
||||
},
|
||||
"tooltip": {
|
||||
"mode": "multi",
|
||||
"sort": "none"
|
||||
}
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(rate(scanner_worker_surface_manifests_published_total[5m]))",
|
||||
"legendFormat": "published",
|
||||
"refId": "A"
|
||||
},
|
||||
{
|
||||
"expr": "sum(rate(scanner_worker_surface_manifests_failed_total[5m]))",
|
||||
"legendFormat": "failed",
|
||||
"refId": "B"
|
||||
},
|
||||
{
|
||||
"expr": "sum(rate(scanner_worker_surface_manifests_skipped_total[5m]))",
|
||||
"legendFormat": "skipped",
|
||||
"refId": "C"
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"id": 2,
|
||||
"type": "timeseries",
|
||||
"title": "Surface Manifest Publish Duration (ms)",
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "${datasource}"
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"displayName": "{{__series.name}}",
|
||||
"unit": "ms"
|
||||
},
|
||||
"overrides": []
|
||||
},
|
||||
"options": {
|
||||
"legend": {
|
||||
"displayMode": "table",
|
||||
"placement": "bottom"
|
||||
},
|
||||
"tooltip": {
|
||||
"mode": "single",
|
||||
"sort": "none"
|
||||
}
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "histogram_quantile(0.95, sum by (le) (rate(scanner_worker_surface_manifest_publish_duration_ms_bucket[5m])))",
|
||||
"legendFormat": "p95",
|
||||
"refId": "A"
|
||||
},
|
||||
{
|
||||
"expr": "sum(rate(scanner_worker_surface_manifest_publish_duration_ms_sum[5m])) / sum(rate(scanner_worker_surface_manifest_publish_duration_ms_count[5m]))",
|
||||
"legendFormat": "avg",
|
||||
"refId": "B"
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"id": 3,
|
||||
"type": "timeseries",
|
||||
"title": "Surface Payload Cached by Kind (5m rate)",
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "${datasource}"
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"displayName": "{{surface_kind}}",
|
||||
"unit": "ops"
|
||||
},
|
||||
"overrides": []
|
||||
},
|
||||
"options": {
|
||||
"legend": {
|
||||
"displayMode": "table",
|
||||
"placement": "bottom"
|
||||
},
|
||||
"tooltip": {
|
||||
"mode": "multi",
|
||||
"sort": "none"
|
||||
}
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum by (surface_kind) (rate(scanner_worker_surface_payload_persisted_total[5m]))",
|
||||
"legendFormat": "{{surface_kind}}",
|
||||
"refId": "A"
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"id": 4,
|
||||
"type": "timeseries",
|
||||
"title": "Surface Manifest Failures by Reason (5m rate)",
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "${datasource}"
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"displayName": "{{reason}}",
|
||||
"unit": "ops"
|
||||
},
|
||||
"overrides": []
|
||||
},
|
||||
"options": {
|
||||
"legend": {
|
||||
"displayMode": "table",
|
||||
"placement": "bottom"
|
||||
},
|
||||
"tooltip": {
|
||||
"mode": "multi",
|
||||
"sort": "none"
|
||||
}
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum by (reason) (rate(scanner_worker_surface_manifests_failed_total[5m]))",
|
||||
"legendFormat": "{{reason}}",
|
||||
"refId": "A"
|
||||
}
|
||||
]
|
||||
}
|
||||
]
|
||||
}
|
||||
99
docs/modules/taskrunner/migrations/pack-run-collections.md
Normal file
99
docs/modules/taskrunner/migrations/pack-run-collections.md
Normal file
@@ -0,0 +1,99 @@
|
||||
# Task Runner Collections — Initial Migration
|
||||
|
||||
Last updated: 2025-11-06
|
||||
|
||||
This migration seeds the MongoDB collections that back the Task Runner service. It is implemented as `20251106-task-runner-baseline.mongosh` under the platform migration runner and must be applied **before** enabling the TaskRunner service in any environment.
|
||||
|
||||
## Collections
|
||||
|
||||
### `pack_runs`
|
||||
|
||||
| Field | Type | Notes |
|
||||
|------------------|-----------------|-----------------------------------------------------------|
|
||||
| `_id` | `string` | Run identifier (same as `runId`). |
|
||||
| `planHash` | `string` | Deterministic hash produced by the planner. |
|
||||
| `plan` | `object` | Full `TaskPackPlan` payload used to execute the run. |
|
||||
| `failurePolicy` | `object` | Retry/backoff directives resolved at plan time. |
|
||||
| `requestedAt` | `date` | Timestamp when the client requested the run. |
|
||||
| `createdAt` | `date` | Timestamp when the run was persisted. |
|
||||
| `updatedAt` | `date` | Timestamp of the last mutation. |
|
||||
| `steps` | `array<object>` | Flattened step records (`stepId`, `status`, attempts…). |
|
||||
| `tenantId` | `string` | Optional multi-tenant scope (reserved for future phases). |
|
||||
|
||||
**Indexes**
|
||||
|
||||
1. `{ _id: 1 }` — implicit primary key / uniqueness guarantee.
|
||||
2. `{ updatedAt: -1 }` — serves `GET /runs` listings and staleness checks.
|
||||
3. `{ tenantId: 1, updatedAt: -1 }` — activated once tenancy is enforced; remains sparse until then.
|
||||
|
||||
### `pack_run_logs`
|
||||
|
||||
| Field | Type | Notes |
|
||||
|---------------|-----------------|--------------------------------------------------------|
|
||||
| `_id` | `ObjectId` | Generated per log entry. |
|
||||
| `runId` | `string` | Foreign key to `pack_runs._id`. |
|
||||
| `sequence` | `long` | Monotonic counter assigned by the writer. |
|
||||
| `timestamp` | `date` | UTC timestamp of the log event. |
|
||||
| `level` | `string` | `trace`, `debug`, `info`, `warn`, `error`. |
|
||||
| `eventType` | `string` | Machine-friendly event identifier (e.g. `step.started`). |
|
||||
| `message` | `string` | Human-readable summary. |
|
||||
| `stepId` | `string` | Optional step identifier. |
|
||||
| `metadata` | `object` | Deterministic key/value payload (string-only values). |
|
||||
|
||||
**Indexes**
|
||||
|
||||
1. `{ runId: 1, sequence: 1 }` (unique) — guarantees ordered retrieval and enforces idempotence.
|
||||
2. `{ runId: 1, timestamp: 1 }` — accelerates replay and time-window queries.
|
||||
3. `{ timestamp: 1 }` — optional TTL (disabled by default) for retention policies.
|
||||
|
||||
### `pack_artifacts`
|
||||
|
||||
| Field | Type | Notes |
|
||||
|--------------|------------|-------------------------------------------------------------|
|
||||
| `_id` | `ObjectId` | Generated per artifact record. |
|
||||
| `runId` | `string` | Foreign key to `pack_runs._id`. |
|
||||
| `name` | `string` | Output name from the Task Pack manifest. |
|
||||
| `type` | `string` | `file`, `object`, or other future evidence categories. |
|
||||
| `sourcePath` | `string` | Local path captured during execution (nullable). |
|
||||
| `storedPath` | `string` | Object store path or bundle-relative URI (nullable). |
|
||||
| `status` | `string` | `pending`, `copied`, `materialized`, `skipped`. |
|
||||
| `notes` | `string` | Free-form notes (deterministic messages only). |
|
||||
| `capturedAt` | `date` | UTC timestamp recorded by the worker. |
|
||||
|
||||
**Indexes**
|
||||
|
||||
1. `{ runId: 1, name: 1 }` (unique) — ensures a run emits at most one record per output.
|
||||
2. `{ runId: 1 }` — supports artifact listing alongside run inspection.
|
||||
|
||||
## Execution Order
|
||||
|
||||
1. Create collections with `validator` envelopes mirroring the field expectations above (if MongoDB schema validation is enabled in the environment).
|
||||
2. Apply the indexes in the order listed — unique indexes first to surface data issues early.
|
||||
3. Backfill existing filesystem-backed runs by importing the serialized state/log/artifact manifests into the new collections. A dedicated importer script (`tools/taskrunner/import-filesystem-state.ps1`) accompanies the migration.
|
||||
4. Switch the Task Runner service configuration to point at the Mongo-backed stores (`TaskRunner:Storage:Mode = "Mongo"`), then redeploy workers and web service.
|
||||
|
||||
## Rollback
|
||||
|
||||
To revert, switch the Task Runner configuration back to the filesystem provider and stop the Mongo migration runner. Collections can remain in place; they are append-only and harmless when unused.
|
||||
|
||||
## Configuration Reference
|
||||
|
||||
Enable the Mongo-backed stores by updating the worker and web service configuration (Compose/Helm values or `appsettings*.json`):
|
||||
|
||||
```json
|
||||
"TaskRunner": {
|
||||
"Storage": {
|
||||
"Mode": "mongo",
|
||||
"Mongo": {
|
||||
"ConnectionString": "mongodb://127.0.0.1:27017/taskrunner",
|
||||
"Database": "taskrunner",
|
||||
"RunsCollection": "pack_runs",
|
||||
"LogsCollection": "pack_run_logs",
|
||||
"ArtifactsCollection": "pack_artifacts",
|
||||
"ApprovalsCollection": "pack_run_approvals"
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
The worker uses the mirrored structure under the `Worker` section. Omit the `Database` property to fall back to the name embedded in the connection string.
|
||||
Reference in New Issue
Block a user