Concelier · Orchestrator Registry & Control Prep

  • Date: 2025-11-20
  • Scope: PREP-CONCELIER-ORCH-32-001, PREP-CONCELIER-ORCH-32-002, PREP-CONCELIER-ORCH-33-001, PREP-CONCELIER-ORCH-34-001
  • Working directory: src/Concelier/** (WebService, Core, Storage.Mongo, worker SDK touch points)

Goals

  • Publish a deterministic registry/SDK contract so connectors can be scheduled by Orchestrator without bespoke control planes.
  • Define heartbeats/progress envelopes and pause/throttle/backfill semantics ahead of worker wiring.
  • Describe replay/backfill evidence outputs so ledger/export work can rely on stable hashes.

Registry record (authoritative fields)

All registry documents live under the orchestrator collection keyed by connectorId (stable slug). Fields and invariants:

  • connectorId (string, slug, lowercase) — unique per tenant + source; immutable.
  • tenant (string) — required; enforced by WebService tenant guard.
  • source (enum) — advisory provider (nvd, ghsa, osv, icscisa, kisa, vendor:<slug>).
  • capabilities (array) — observations, linksets, timeline, attestations flags; no merge/derived data.
  • authRef (string) — reference to secrets store key; never inlined.
  • schedule (object) — cron, timeZone, maxParallelRuns, maxLagMinutes.
  • ratePolicy (object) — rpm, burst, cooldownSeconds; default deny if absent.
  • artifactKinds (array) — raw-advisory, normalized, linkset, timeline, attestation.
  • lockKey (string) — deterministic lock namespace (concelier:{tenant}:{connectorId}) for single-flight.
  • egressGuard (object) — allowlist of hosts + airgapMode boolean; fail closed when airgapMode=true and host not allowlisted.
  • createdAt / updatedAt (ISO-8601 UTC) — monotonic; updates require optimistic concurrency token.
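
Three of the invariants above (the lockKey format, the fail-closed egress guard, and default-deny rate policy) can be sketched as small pure functions. This is a minimal illustration in Python, not the Concelier implementation: the class and function names are hypothetical, and the behaviour when airgapMode is false is an assumption, since the field list only specifies the airgapped case.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class EgressGuard:
    """Hypothetical mirror of the egressGuard registry object."""
    allowlist: Tuple[str, ...] = ()
    airgap_mode: bool = False


@dataclass(frozen=True)
class RatePolicy:
    """Hypothetical mirror of the ratePolicy registry object."""
    rpm: int
    burst: int
    cooldown_seconds: int


def lock_key(tenant: str, connector_id: str) -> str:
    """Build the single-flight lock namespace concelier:{tenant}:{connectorId}."""
    return f"concelier:{tenant}:{connector_id}"


def egress_allowed(guard: EgressGuard, host: str) -> bool:
    """Fail closed: in airgap mode only allowlisted hosts may be contacted."""
    if guard.airgap_mode:
        allowed = {h.lower() for h in guard.allowlist}
        return host.lower() in allowed
    # ASSUMPTION: outside airgap mode the source does not define the rule;
    # this sketch permits the call and leaves stricter checks to the caller.
    return True


def requests_permitted(policy: Optional[RatePolicy]) -> bool:
    """Default deny: a connector with no ratePolicy may not issue requests."""
    return policy is not None
```

With the sample values used in this document, `lock_key("acme", "icscisa")` yields `concelier:acme:icscisa`, and an airgapped guard rejects any host that is not on its allowlist.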

Registry sample (non-normative)

{
  "connectorId": "icscisa",
  "tenant": "acme",
  "source": "icscisa",
  "capabilities": ["observations", "linksets", "timeline"],
  "authRef": "secret:concelier/icscisa/api-key",
  "schedule": {"cron": "*/30 * * * *", "timeZone": "UTC", "maxParallelRuns": 1, "maxLagMinutes": 120},
  "ratePolicy": {"rpm": 60, "burst": 10, "cooldownSeconds": 30},
  "artifactKinds": ["raw-advisory", "normalized", "linkset"],
  "lockKey": "concelier:acme:icscisa",
  "egressGuard": {"allowlist": ["icscert.kisa.or.kr"], "airgapMode": true},
  "createdAt": "2025-11-20T00:00:00Z",
  "updatedAt": "2025-11-20T00:00:00Z"
}

Control/SDK contract (heartbeats + commands)

  • Heartbeat endpoint POST /internal/orch/heartbeat (auth: internal orchestrator role, tenant-scoped).
    • Body: connectorId, runId (GUID), status (starting|running|paused|throttled|backfill|failed|succeeded), progress (0-100), queueDepth, lastArtifactHash, lastArtifactKind, errorCode, retryAfterSeconds.
    • Idempotency key: runId + sequence, which preserves ordering; the orchestrator ignores heartbeats with a stale sequence.
  • Control queue document (persisted per run):
    • Commands: pause, resume, throttle (rpm/burst override until expiresAt), backfill (range: fromCursor/toCursor).
    • Workers poll /internal/orch/commands?connectorId={id}&runId={runId}; must ack with monotonic ackSequence to ensure replay safety.
  • Failure semantics: on failed, worker emits errorCode, errorReason, lastCheckpoint (cursor/hash). Orchestrator may re-enqueue with backoff.
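
The ordering rules above (ignore stale heartbeat sequences, require a monotonic ackSequence) can be sketched as follows. This is an in-memory illustration under stated assumptions: `HeartbeatLog` and `CommandAcks` are hypothetical names, `sequence` is assumed to travel with each heartbeat (the body field list does not name it), and "stale" is read as "not greater than the last accepted value".

```python
from dataclasses import dataclass
from typing import Dict, Optional

VALID_STATUSES = {
    "starting", "running", "paused", "throttled",
    "backfill", "failed", "succeeded",
}


@dataclass(frozen=True)
class Heartbeat:
    connector_id: str
    run_id: str
    sequence: int  # ASSUMPTION: carried alongside the documented body fields
    status: str
    progress: int = 0


class HeartbeatLog:
    """Keeps the newest accepted heartbeat per run and drops stale ones."""

    def __init__(self) -> None:
        self._latest: Dict[str, Heartbeat] = {}

    def accept(self, hb: Heartbeat) -> bool:
        if hb.status not in VALID_STATUSES:
            raise ValueError(f"unknown status: {hb.status}")
        if not 0 <= hb.progress <= 100:
            raise ValueError("progress must be within 0..100")
        last = self._latest.get(hb.run_id)
        if last is not None and hb.sequence <= last.sequence:
            return False  # duplicate or out-of-order delivery: ignore it
        self._latest[hb.run_id] = hb
        return True

    def latest(self, run_id: str) -> Optional[Heartbeat]:
        return self._latest.get(run_id)


class CommandAcks:
    """Tracks the highest ackSequence per run so a replayed ack is a no-op."""

    def __init__(self) -> None:
        self._acked: Dict[str, int] = {}

    def ack(self, run_id: str, ack_sequence: int) -> bool:
        last = self._acked.get(run_id)
        if last is not None and ack_sequence <= last:
            return False  # already acknowledged: safe to ignore on replay
        self._acked[run_id] = ack_sequence
        return True
```

Returning a boolean rather than raising on stale input keeps redelivery harmless, which is the point of keying on runId + sequence.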

Backfill/replay expectations

  • Backfill command requires deterministic cursor space (e.g., advisory sequence number or RFC3339 timestamp truncated to minutes).
  • Worker must emit a runManifest per backfill containing: runId, connectorId, tenant, cursorRange, artifactHashes[], dsseEnvelopeHash (if attested), completedAt.
  • Manifests are written to Evidence Locker ledger for replay; filenames: backfill/{tenant}/{connectorId}/{runId}.ndjson with stable ordering.
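
A possible shape for the manifest output is sketched below. The field names and the path template come from the bullets above; everything else is an assumption made for illustration: the function names are hypothetical, the cursorRange is modelled as an object with fromCursor/toCursor, and "stable ordering" is interpreted as sorted JSON keys plus sorted artifact hashes so that identical inputs serialize to identical bytes.

```python
import json
from datetime import datetime, timezone
from typing import Iterable, Optional


def truncate_cursor(ts: datetime) -> str:
    """Render a timestamp cursor as RFC 3339 UTC truncated to whole minutes."""
    if ts.tzinfo is None:
        raise ValueError("cursor timestamps must be timezone-aware")
    utc = ts.astimezone(timezone.utc).replace(second=0, microsecond=0)
    return utc.strftime("%Y-%m-%dT%H:%M:%SZ")


def manifest_path(tenant: str, connector_id: str, run_id: str) -> str:
    """Ledger filename: backfill/{tenant}/{connectorId}/{runId}.ndjson."""
    return f"backfill/{tenant}/{connector_id}/{run_id}.ndjson"


def manifest_line(
    run_id: str,
    connector_id: str,
    tenant: str,
    from_cursor: str,
    to_cursor: str,
    artifact_hashes: Iterable[str],
    completed_at: str,
    dsse_envelope_hash: Optional[str] = None,
) -> str:
    """Serialize one runManifest record as a single deterministic NDJSON line."""
    record = {
        "runId": run_id,
        "connectorId": connector_id,
        "tenant": tenant,
        "cursorRange": {"fromCursor": from_cursor, "toCursor": to_cursor},
        # ASSUMPTION: sorting makes the output independent of emit order.
        "artifactHashes": sorted(artifact_hashes),
        "completedAt": completed_at,
    }
    if dsse_envelope_hash is not None:  # only present when the run is attested
        record["dsseEnvelopeHash"] = dsse_envelope_hash
    return json.dumps(record, sort_keys=True, separators=(",", ":"))
```

Because the line is byte-for-byte reproducible for the same inputs, any hash computed over it stays stable across replays.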

Telemetry (to implement in WebService + worker SDK)

  • Meter name prefix: StellaOps.Concelier.Orch.
  • Counters:
    • concelier.orch.heartbeat tags: tenant, connectorId, status.
    • concelier.orch.command.applied tags: tenant, connectorId, command.
  • Histograms:
    • concelier.orch.lag.minutes (now - cursor upper bound) tags: tenant, connectorId.
  • Logs: structured with tenant, connectorId, runId, command, sequence, ackSequence.
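
The lag histogram value (now - cursor upper bound) reduces to simple arithmetic, sketched here without any metrics library. The helper names are hypothetical, clamping negative lag to zero is an assumption, and comparing the result against the registry's maxLagMinutes is one plausible use of that field rather than documented behaviour.

```python
from datetime import datetime, timezone
from typing import Dict


def lag_minutes(now: datetime, cursor_upper_bound: datetime) -> float:
    """Minutes between now and the newest cursor position already processed."""
    delta = (now - cursor_upper_bound).total_seconds() / 60.0
    # ASSUMPTION: a cursor ahead of the clock is reported as zero lag.
    return max(0.0, delta)


def lag_tags(tenant: str, connector_id: str) -> Dict[str, str]:
    """Tag set listed for concelier.orch.lag.minutes."""
    return {"tenant": tenant, "connectorId": connector_id}


def exceeds_max_lag(lag: float, max_lag_minutes: int) -> bool:
    """ASSUMPTION: schedule.maxLagMinutes acts as an alerting threshold."""
    return lag > max_lag_minutes


now = datetime(2025, 11, 20, 12, 0, tzinfo=timezone.utc)
cursor = datetime(2025, 11, 20, 9, 30, tzinfo=timezone.utc)
current_lag = lag_minutes(now, cursor)  # 150.0 minutes behind
```

With the sample schedule's maxLagMinutes of 120, a lag of 150 minutes would exceed the threshold.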

Acceptance criteria for prep completion

  • Registry/command schema above is frozen and referenced from Sprint 0114 Delivery Tracker (P10-P13) so downstream implementers know the shapes.
  • Sample manifest path + naming are defined for ledger/replay flows.
  • Meter names/tags enumerated for observability wiring.