# Concelier · Orchestrator Registry & Control Prep - **Date:** 2025-11-20 - **Scope:** PREP-CONCELIER-ORCH-32-001, PREP-CONCELIER-ORCH-32-002, PREP-CONCELIER-ORCH-33-001, PREP-CONCELIER-ORCH-34-001 - **Working directory:** `src/Concelier/**` (WebService, Core, Storage.Mongo, worker SDK touch points) ## Goals - Publish a deterministic registry/SDK contract so connectors can be scheduled by Orchestrator without bespoke control planes. - Define heartbeats/progress envelopes and pause/throttle/backfill semantics ahead of worker wiring. - Describe replay/backfill evidence outputs so ledger/export work can rely on stable hashes. ## Registry record (authoritative fields) All registry documents live under the orchestrator collection keyed by `connectorId` (stable slug). Fields and invariants: - `connectorId` (string, slug, lowercase) — unique per tenant + source; immutable. - `tenant` (string) — required; enforced by WebService tenant guard. - `source` (enum) — advisory provider (`nvd`, `ghsa`, `osv`, `icscisa`, `kisa`, `vendor:`). - `capabilities` (array) — `observations`, `linksets`, `timeline`, `attestations` flags; no merge/derived data. - `authRef` (string) — reference to secrets store key; never inlined. - `schedule` (object) — `cron`, `timeZone`, `maxParallelRuns`, `maxLagMinutes`. - `ratePolicy` (object) — `rpm`, `burst`, `cooldownSeconds`; default deny if absent. - `artifactKinds` (array) — `raw-advisory`, `normalized`, `linkset`, `timeline`, `attestation`. - `lockKey` (string) — deterministic lock namespace (`concelier:{tenant}:{connectorId}`) for single-flight. - `egressGuard` (object) — `allowlist` of hosts + `airgapMode` boolean; fail closed when `airgapMode=true` and host not allowlisted. - `createdAt` / `updatedAt` (ISO-8601 UTC) — monotonic; updates require optimistic concurrency token. ### Registry sample (non-normative) ```json { "connectorId": "icscisa", "tenant": "acme", "source": "icscisa", "capabilities": ["observations", "linksets", "timeline"], "authRef": "secret:concelier/icscisa/api-key", "schedule": {"cron": "*/30 * * * *", "timeZone": "UTC", "maxParallelRuns": 1, "maxLagMinutes": 120}, "ratePolicy": {"rpm": 60, "burst": 10, "cooldownSeconds": 30}, "artifactKinds": ["raw-advisory", "normalized", "linkset"], "lockKey": "concelier:acme:icscisa", "egressGuard": {"allowlist": ["icscert.kisa.or.kr"], "airgapMode": true}, "createdAt": "2025-11-20T00:00:00Z", "updatedAt": "2025-11-20T00:00:00Z" } ``` ## Control/SDK contract (heartbeats + commands) - Heartbeat endpoint `POST /internal/orch/heartbeat` (auth: internal orchestrator role, tenant-scoped). - Body: `connectorId`, `runId` (GUID), `status` (`starting|running|paused|throttled|backfill|failed|succeeded`), `progress` (0–100), `queueDepth`, `lastArtifactHash`, `lastArtifactKind`, `errorCode`, `retryAfterSeconds`. - Idempotency key: `runId` + `sequence` to preserve ordering; orchestrator ignores stale sequence. - Control queue document (persisted per run): - Commands: `pause`, `resume`, `throttle` (rpm/burst override until `expiresAt`), `backfill` (range: `fromCursor`/`toCursor`). - Workers poll `/internal/orch/commands?connectorId={id}&runId={runId}`; must ack with monotonic `ackSequence` to ensure replay safety. - Failure semantics: on `failed`, worker emits `errorCode`, `errorReason`, `lastCheckpoint` (cursor/hash). Orchestrator may re-enqueue with backoff. ## Backfill/replay expectations - Backfill command requires deterministic cursor space (e.g., advisory sequence number or RFC3339 timestamp truncated to minutes). - Worker must emit a `runManifest` per backfill containing: `runId`, `connectorId`, `tenant`, `cursorRange`, `artifactHashes[]`, `dsseEnvelopeHash` (if attested), `completedAt`. - Manifests are written to Evidence Locker ledger for replay; filenames: `backfill/{tenant}/{connectorId}/{runId}.ndjson` with stable ordering. ## Telemetry (to implement in WebService + worker SDK) - Meter name prefix: `StellaOps.Concelier.Orch`. - Counters: - `concelier.orch.heartbeat` tags: `tenant`, `connectorId`, `status`. - `concelier.orch.command.applied` tags: `tenant`, `connectorId`, `command`. - Histograms: - `concelier.orch.lag.minutes` (now - cursor upper bound) tags: `tenant`, `connectorId`. - Logs: structured with `tenant`, `connectorId`, `runId`, `command`, `sequence`, `ackSequence`. ## Acceptance criteria for prep completion - Registry/command schema above is frozen and referenced from Sprint 0114 Delivery Tracker (P10–P13) so downstream implementation knows shapes. - Sample manifest path + naming are defined for ledger/replay flows. - Meter names/tags enumerated for observability wiring.