Concelier · Orchestrator Registry & Control Prep
- Date: 2025-11-20
- Scope: PREP-CONCELIER-ORCH-32-001, PREP-CONCELIER-ORCH-32-002, PREP-CONCELIER-ORCH-33-001, PREP-CONCELIER-ORCH-34-001
- Working directory: src/Concelier/** (WebService, Core, Storage.Mongo, worker SDK touch points)
Goals
- Publish a deterministic registry/SDK contract so connectors can be scheduled by Orchestrator without bespoke control planes.
- Define heartbeats/progress envelopes and pause/throttle/backfill semantics ahead of worker wiring.
- Describe replay/backfill evidence outputs so ledger/export work can rely on stable hashes.
Registry record (authoritative fields)
All registry documents live under the orchestrator collection keyed by connectorId (stable slug). Fields and invariants:
- connectorId (string, slug, lowercase): unique per tenant + source; immutable.
- tenant (string): required; enforced by WebService tenant guard.
- source (enum): advisory provider (nvd, ghsa, osv, icscisa, kisa, vendor:<slug>).
- capabilities (array): observations, linksets, timeline, attestations flags; no merge/derived data.
- authRef (string): reference to secrets store key; never inlined.
- schedule (object): cron, timeZone, maxParallelRuns, maxLagMinutes.
- ratePolicy (object): rpm, burst, cooldownSeconds; default deny if absent.
- artifactKinds (array): raw-advisory, normalized, linkset, timeline, attestation.
- lockKey (string): deterministic lock namespace (concelier:{tenant}:{connectorId}) for single-flight.
- egressGuard (object): allowlist of hosts + airgapMode boolean; fail closed when airgapMode=true and host not allowlisted.
- createdAt / updatedAt (ISO-8601 UTC): monotonic; updates require optimistic concurrency token.
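The two fail-closed rules in the field list (egressGuard and ratePolicy) could be sketched as below. This is an illustrative Python sketch, not the Concelier implementation: the function names are hypothetical, and the behaviour when airgapMode is false is an assumption, since the field list only specifies the airgapMode=true case.

```python
from typing import Optional
from urllib.parse import urlsplit


def egress_allowed(url: str, egress_guard: dict) -> bool:
    """Return True if a worker may contact `url` under the egressGuard object."""
    host = (urlsplit(url).hostname or "").lower()
    allowlist = {h.lower() for h in egress_guard.get("allowlist", [])}
    if egress_guard.get("airgapMode", False):
        # Fail closed: in airgap mode only allowlisted hosts are reachable,
        # and an unparseable URL (empty host) is rejected.
        return host != "" and host in allowlist
    # Assumption: outside airgap mode the guard does not block egress.
    return True


def rate_policy_or_deny(record: dict) -> Optional[dict]:
    """Return the ratePolicy, or None (meaning deny) when it is absent."""
    policy = record.get("ratePolicy")
    return policy if policy else None
```

A caller would treat a None result from rate_policy_or_deny as "do not schedule", which matches the "default deny if absent" invariant.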
Registry sample (non-normative)
{
"connectorId": "icscisa",
"tenant": "acme",
"source": "icscisa",
"capabilities": ["observations", "linksets", "timeline"],
"authRef": "secret:concelier/icscisa/api-key",
"schedule": {"cron": "*/30 * * * *", "timeZone": "UTC", "maxParallelRuns": 1, "maxLagMinutes": 120},
"ratePolicy": {"rpm": 60, "burst": 10, "cooldownSeconds": 30},
"artifactKinds": ["raw-advisory", "normalized", "linkset"],
"lockKey": "concelier:acme:icscisa",
"egressGuard": {"allowlist": ["icscert.kisa.or.kr"], "airgapMode": true},
"createdAt": "2025-11-20T00:00:00Z",
"updatedAt": "2025-11-20T00:00:00Z"
}
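As a rough illustration of how the invariants could be checked against a record such as the sample above, the following Python sketch derives lockKey from tenant and connectorId and reports mismatches. The slug grammar (lowercase alphanumerics separated by single hyphens) and the names derive_lock_key and check_record are assumptions made for this example; the contract itself only says "slug, lowercase".

```python
import re
from typing import List

# Assumed slug grammar: lowercase alphanumerics separated by single hyphens.
SLUG_RE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


def derive_lock_key(tenant: str, connector_id: str) -> str:
    """Build the single-flight lock namespace concelier:{tenant}:{connectorId}."""
    return f"concelier:{tenant}:{connector_id}"


def check_record(record: dict) -> List[str]:
    """Return a list of invariant violations (empty when the record passes)."""
    problems = []
    connector_id = record.get("connectorId", "")
    tenant = record.get("tenant", "")
    if not SLUG_RE.match(connector_id):
        problems.append("connectorId is not a lowercase slug")
    if not tenant:
        problems.append("tenant is required")
    if record.get("lockKey") != derive_lock_key(tenant, connector_id):
        problems.append("lockKey does not match concelier:{tenant}:{connectorId}")
    return problems
```

Because lockKey is fully determined by the other two fields, a check like this can run at write time without consulting any other state.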
Control/SDK contract (heartbeats + commands)
- Heartbeat endpoint: POST /internal/orch/heartbeat (auth: internal orchestrator role, tenant-scoped).
  - Body: connectorId, runId (GUID), status (starting|running|paused|throttled|backfill|failed|succeeded), progress (0–100), queueDepth, lastArtifactHash, lastArtifactKind, errorCode, retryAfterSeconds.
  - Idempotency key: runId + sequence to preserve ordering; orchestrator ignores stale sequence.
- Control queue document (persisted per run):
  - Commands: pause, resume, throttle (rpm/burst override until expiresAt), backfill (range: fromCursor/toCursor).
  - Workers poll /internal/orch/commands?connectorId={id}&runId={runId}; must ack with monotonic ackSequence to ensure replay safety.
- Failure semantics: on failed, worker emits errorCode, errorReason, lastCheckpoint (cursor/hash). Orchestrator may re-enqueue with backoff.
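The ordering rules for heartbeats and command acks could look roughly like this in Python. It is a sketch under stated assumptions: the class names are hypothetical, each command is assumed to carry its own integer sequence field starting at 1 (the contract above names only ackSequence), and ackSequence is assumed to equal the highest command sequence applied so far.

```python
from typing import Dict, List


class HeartbeatLog:
    """Orchestrator side: keep the newest heartbeat per runId, drop stale ones."""

    def __init__(self) -> None:
        self._last_seq: Dict[str, int] = {}
        self.latest: Dict[str, dict] = {}

    def accept(self, heartbeat: dict) -> bool:
        run_id, seq = heartbeat["runId"], heartbeat["sequence"]
        if seq <= self._last_seq.get(run_id, -1):
            return False  # duplicate or out-of-order delivery: ignore
        self._last_seq[run_id] = seq
        self.latest[run_id] = heartbeat
        return True


class CommandCursor:
    """Worker side: apply polled commands once, tracking a monotonic ackSequence."""

    def __init__(self) -> None:
        self.ack_sequence = 0  # assumption: command sequences start at 1

    def apply_new(self, commands: List[dict]) -> List[str]:
        applied = []
        for cmd in sorted(commands, key=lambda c: c["sequence"]):
            if cmd["sequence"] <= self.ack_sequence:
                continue  # already acked; the poll returned it again
            applied.append(cmd["command"])
            self.ack_sequence = cmd["sequence"]
        return applied
```

Since both sides compare against the last sequence they have seen, redelivered heartbeats and repeated poll results are no-ops, which is the property the idempotency key and ackSequence are meant to provide.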
Backfill/replay expectations
- Backfill command requires deterministic cursor space (e.g., advisory sequence number or RFC3339 timestamp truncated to minutes).
- Worker must emit a runManifest per backfill containing: runId, connectorId, tenant, cursorRange, artifactHashes[], dsseEnvelopeHash (if attested), completedAt.
- Manifests are written to Evidence Locker ledger for replay; filenames: backfill/{tenant}/{connectorId}/{runId}.ndjson with stable ordering.
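One way to keep manifest output reproducible is sketched below in Python. The helper names are hypothetical, and two choices are assumptions rather than part of the contract: "stable ordering" is interpreted as sorted JSON keys with compact separators, and SHA-256 is used as the digest. artifactHashes are serialized in the order the caller supplies them.

```python
import hashlib
import json
from datetime import datetime, timezone


def truncate_cursor_to_minute(ts: str) -> str:
    """Normalize an RFC 3339 timestamp to UTC, truncated to the minute."""
    parsed = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    parsed = parsed.astimezone(timezone.utc).replace(second=0, microsecond=0)
    return parsed.strftime("%Y-%m-%dT%H:%M:%SZ")


def manifest_path(tenant: str, connector_id: str, run_id: str) -> str:
    """Ledger filename: backfill/{tenant}/{connectorId}/{runId}.ndjson."""
    return f"backfill/{tenant}/{connector_id}/{run_id}.ndjson"


def manifest_line(manifest: dict) -> str:
    """Serialize one manifest as a single NDJSON line with sorted keys."""
    return json.dumps(manifest, sort_keys=True, separators=(",", ":"))


def manifest_digest(line: str) -> str:
    """SHA-256 of the serialized line (assumed digest algorithm)."""
    return hashlib.sha256(line.encode("utf-8")).hexdigest()
```

With sorted keys, two workers that build the same manifest in different field order produce byte-identical lines and therefore identical digests.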
Telemetry (to implement in WebService + worker SDK)
- Meter name prefix: StellaOps.Concelier.Orch.
- Counters:
  - concelier.orch.heartbeat tags: tenant, connectorId, status.
  - concelier.orch.command.applied tags: tenant, connectorId, command.
- Histograms:
  - concelier.orch.lag.minutes (now - cursor upper bound) tags: tenant, connectorId.
- Logs: structured with tenant, connectorId, runId, command, sequence, ackSequence.
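The lag histogram value and the structured log fields could be computed as in the Python sketch below. It deliberately avoids any metrics library; the function names are hypothetical, the cursor upper bound is assumed to be an RFC 3339 UTC timestamp, and clamping negative lag to zero is an assumption made for this example.

```python
import json
from datetime import datetime, timezone


def lag_minutes(cursor_upper_bound: str, now: datetime) -> float:
    """Minutes between `now` (timezone-aware) and the cursor upper bound."""
    upper = datetime.fromisoformat(cursor_upper_bound.replace("Z", "+00:00"))
    # Assumption: a cursor ahead of the clock is reported as zero lag.
    return max(0.0, (now - upper).total_seconds() / 60.0)


def log_line(tenant: str, connector_id: str, run_id: str,
             command: str, sequence: int, ack_sequence: int) -> str:
    """Render the structured log fields listed above as one JSON object."""
    return json.dumps({
        "tenant": tenant,
        "connectorId": connector_id,
        "runId": run_id,
        "command": command,
        "sequence": sequence,
        "ackSequence": ack_sequence,
    }, sort_keys=True)
```

A value from lag_minutes can also be compared against schedule.maxLagMinutes from the registry record to decide whether a connector is falling behind.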
Acceptance criteria for prep completion
- The registry/command schema above is frozen and referenced from the Sprint 0114 Delivery Tracker (P10–P13) so downstream implementers know the shapes.
- Sample manifest path + naming are defined for ledger/replay flows.
- Meter names/tags enumerated for observability wiring.