prep docs and service updates
# Concelier · Orchestrator Registry & Control Prep

- **Date:** 2025-11-20
- **Scope:** PREP-CONCELIER-ORCH-32-001, PREP-CONCELIER-ORCH-32-002, PREP-CONCELIER-ORCH-33-001, PREP-CONCELIER-ORCH-34-001
- **Working directory:** `src/Concelier/**` (WebService, Core, Storage.Mongo, worker SDK touch points)

## Goals

- Publish a deterministic registry/SDK contract so connectors can be scheduled by Orchestrator without bespoke control planes.
- Define heartbeat/progress envelopes and pause/throttle/backfill semantics ahead of worker wiring.
- Describe replay/backfill evidence outputs so ledger/export work can rely on stable hashes.

## Registry record (authoritative fields)

All registry documents live under the orchestrator collection keyed by `connectorId` (stable slug). Fields and invariants:

- `connectorId` (string, slug, lowercase) — unique per tenant + source; immutable.
- `tenant` (string) — required; enforced by WebService tenant guard.
- `source` (enum) — advisory provider (`nvd`, `ghsa`, `osv`, `icscisa`, `kisa`, `vendor:<slug>`).
- `capabilities` (array) — `observations`, `linksets`, `timeline`, `attestations` flags; no merge/derived data.
- `authRef` (string) — reference to secrets store key; never inlined.
- `schedule` (object) — `cron`, `timeZone`, `maxParallelRuns`, `maxLagMinutes`.
- `ratePolicy` (object) — `rpm`, `burst`, `cooldownSeconds`; default deny if absent.
- `artifactKinds` (array) — `raw-advisory`, `normalized`, `linkset`, `timeline`, `attestation`.
- `lockKey` (string) — deterministic lock namespace (`concelier:{tenant}:{connectorId}`) for single-flight.
- `egressGuard` (object) — `allowlist` of hosts + `airgapMode` boolean; fail closed when `airgapMode=true` and the host is not allowlisted.
- `createdAt` / `updatedAt` (ISO-8601 UTC) — monotonic; updates require an optimistic concurrency token.
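A minimal sketch of how the invariants above might be checked before a record is persisted. The function names `validate_registry_record` and `expected_lock_key`, the slug pattern, and the error strings are illustrative assumptions, not part of the contract. Only the field names, the `lockKey` format, and the default-deny rule for `ratePolicy` come from the list above, and treating a missing `ratePolicy` as a validation failure is one possible reading of "default deny".

```python
import re
from typing import Any

# Assumed slug shape: lowercase letters, digits, and hyphens. The contract only
# says "slug, lowercase", so this exact pattern is an illustration.
SLUG = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")

# Fields treated as mandatory in this sketch (an assumption for the example).
REQUIRED = ("connectorId", "tenant", "source", "authRef", "schedule", "lockKey")


def expected_lock_key(tenant: str, connector_id: str) -> str:
    """Deterministic single-flight lock namespace, as given in the field list."""
    return f"concelier:{tenant}:{connector_id}"


def validate_registry_record(record: dict[str, Any]) -> list[str]:
    """Return a list of violations; an empty list means the record passed."""
    errors = [f"missing field: {name}" for name in REQUIRED if not record.get(name)]
    if errors:
        return errors

    if not SLUG.match(record["connectorId"]):
        errors.append("connectorId must be a lowercase slug")

    if record["lockKey"] != expected_lock_key(record["tenant"], record["connectorId"]):
        errors.append("lockKey does not match concelier:{tenant}:{connectorId}")

    # Default deny, interpreted here as: a record without a ratePolicy is
    # rejected rather than allowed to run unthrottled.
    if not record.get("ratePolicy"):
        errors.append("ratePolicy absent: default deny")

    return errors
```

Returning a list of violations rather than raising on the first one keeps the check usable for both API responses and batch audits of existing records.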

### Registry sample (non-normative)

```json
{
  "connectorId": "icscisa",
  "tenant": "acme",
  "source": "icscisa",
  "capabilities": ["observations", "linksets", "timeline"],
  "authRef": "secret:concelier/icscisa/api-key",
  "schedule": {"cron": "*/30 * * * *", "timeZone": "UTC", "maxParallelRuns": 1, "maxLagMinutes": 120},
  "ratePolicy": {"rpm": 60, "burst": 10, "cooldownSeconds": 30},
  "artifactKinds": ["raw-advisory", "normalized", "linkset"],
  "lockKey": "concelier:acme:icscisa",
  "egressGuard": {"allowlist": ["icscert.kisa.or.kr"], "airgapMode": true},
  "createdAt": "2025-11-20T00:00:00Z",
  "updatedAt": "2025-11-20T00:00:00Z"
}
```
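The `egressGuard` object in the sample can be read as a fail-closed host check. The sketch below assumes exact hostname matching and uses a hypothetical helper name, `egress_allowed`. Behaviour when `airgapMode` is `false` is not specified in this document, so the sketch permits the request in that case and marks it as an assumption.

```python
from urllib.parse import urlsplit


def egress_allowed(egress_guard: dict, url: str) -> bool:
    """Fail closed when airgapMode is true and the host is not allowlisted."""
    host = (urlsplit(url).hostname or "").lower()
    allowlist = {h.lower() for h in egress_guard.get("allowlist", [])}

    if egress_guard["airgapMode"]:
        # Air-gapped: only exact allowlist matches pass. A URL without a
        # parseable host yields an empty string and is therefore rejected.
        return host in allowlist

    # Assumption: outside air-gap mode the guard does not block requests.
    return True


guard = {"allowlist": ["icscert.kisa.or.kr"], "airgapMode": True}
print(egress_allowed(guard, "https://icscert.kisa.or.kr/feed"))  # True
print(egress_allowed(guard, "https://example.org/feed"))         # False
```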

## Control/SDK contract (heartbeats + commands)

- Heartbeat endpoint `POST /internal/orch/heartbeat` (auth: internal orchestrator role, tenant-scoped).
  - Body: `connectorId`, `runId` (GUID), `status` (`starting|running|paused|throttled|backfill|failed|succeeded`), `progress` (0–100), `queueDepth`, `lastArtifactHash`, `lastArtifactKind`, `errorCode`, `retryAfterSeconds`.
  - Idempotency key: `runId` + `sequence` to preserve ordering; the orchestrator ignores heartbeats with a stale `sequence`.
- Control queue document (persisted per run):
  - Commands: `pause`, `resume`, `throttle` (rpm/burst override until `expiresAt`), `backfill` (range: `fromCursor`/`toCursor`).
  - Workers poll `/internal/orch/commands?connectorId={id}&runId={runId}` and must ack with a monotonic `ackSequence` to ensure replay safety.
- Failure semantics: on `failed`, the worker emits `errorCode`, `errorReason`, and `lastCheckpoint` (cursor/hash). The orchestrator may re-enqueue with backoff.
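One way to picture the `runId` + `sequence` idempotency rule and the monotonic `ackSequence` requirement is the in-memory sketch below. The class names `HeartbeatLog` and `CommandQueue`, their methods, and the choice to number commands in enqueue order are hypothetical; a real implementation would persist this state per run.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatLog:
    """Tracks the highest heartbeat sequence accepted per runId."""
    last_sequence: dict[str, int] = field(default_factory=dict)

    def accept(self, run_id: str, sequence: int) -> bool:
        # A heartbeat is stale if its sequence is not newer than the last one
        # accepted for this run; duplicates and reordered deliveries are ignored.
        if sequence <= self.last_sequence.get(run_id, -1):
            return False
        self.last_sequence[run_id] = sequence
        return True


@dataclass
class CommandQueue:
    """Per-run control queue; commands are numbered in enqueue order."""
    commands: list[dict] = field(default_factory=list)
    acked: int = 0  # highest ackSequence received so far

    def enqueue(self, command: str, **params) -> int:
        seq = len(self.commands) + 1
        self.commands.append({"sequence": seq, "command": command, **params})
        return seq

    def poll(self) -> list[dict]:
        # Workers see only commands they have not acknowledged yet.
        return [c for c in self.commands if c["sequence"] > self.acked]

    def ack(self, ack_sequence: int) -> bool:
        # Acks must move forward; a replayed, out-of-order, or unknown ack
        # is rejected so that replaying a run cannot re-apply a command.
        if ack_sequence <= self.acked or ack_sequence > len(self.commands):
            return False
        self.acked = ack_sequence
        return True
```

Rejecting rather than raising on a stale heartbeat or ack keeps retries cheap: a worker that resends after a timeout gets a harmless no-op.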

## Backfill/replay expectations

- Backfill command requires a deterministic cursor space (e.g., advisory sequence number or RFC 3339 timestamp truncated to minutes).
- Worker must emit a `runManifest` per backfill containing: `runId`, `connectorId`, `tenant`, `cursorRange`, `artifactHashes[]`, `dsseEnvelopeHash` (if attested), `completedAt`.
- Manifests are written to the Evidence Locker ledger for replay; filenames: `backfill/{tenant}/{connectorId}/{runId}.ndjson` with stable ordering.
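A sketch of a `runManifest` writer under stated assumptions: the hash list is sorted lexicographically to get stable ordering, JSON keys are sorted, `cursorRange` is modelled as a `fromCursor`/`toCursor` pair, and the manifest is serialized as a single NDJSON line. None of those choices is fixed by the text above; only the field names, the minute truncation, and the path template are. `truncate_to_minute`, `manifest_path`, and `build_run_manifest` are hypothetical helper names.

```python
import json
from datetime import datetime, timezone


def truncate_to_minute(ts: datetime) -> str:
    """RFC 3339 UTC timestamp truncated to minutes, usable as a cursor.

    Expects a timezone-aware datetime.
    """
    ts = ts.astimezone(timezone.utc).replace(second=0, microsecond=0)
    return ts.strftime("%Y-%m-%dT%H:%M:%SZ")


def manifest_path(tenant: str, connector_id: str, run_id: str) -> str:
    """Ledger filename following the template given above."""
    return f"backfill/{tenant}/{connector_id}/{run_id}.ndjson"


def build_run_manifest(run_id, connector_id, tenant, from_cursor, to_cursor,
                       artifact_hashes, completed_at, dsse_envelope_hash=None) -> str:
    """Serialize one manifest as a byte-stable NDJSON line."""
    manifest = {
        "runId": run_id,
        "connectorId": connector_id,
        "tenant": tenant,
        "cursorRange": {"fromCursor": from_cursor, "toCursor": to_cursor},
        # Sorting makes the output independent of artifact emission order.
        "artifactHashes": sorted(artifact_hashes),
        "completedAt": completed_at,
    }
    # Only present when the run was attested.
    if dsse_envelope_hash is not None:
        manifest["dsseEnvelopeHash"] = dsse_envelope_hash
    return json.dumps(manifest, sort_keys=True, separators=(",", ":")) + "\n"
```

With sorted keys and compact separators, two workers that produce the same artifacts for the same cursor range emit identical bytes, which is what lets a downstream hash over the manifest stay stable.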

## Telemetry (to implement in WebService + worker SDK)

- Meter name prefix: `StellaOps.Concelier.Orch`.
- Counters:
  - `concelier.orch.heartbeat` tags: `tenant`, `connectorId`, `status`.
  - `concelier.orch.command.applied` tags: `tenant`, `connectorId`, `command`.
- Histograms:
  - `concelier.orch.lag.minutes` (now - cursor upper bound) tags: `tenant`, `connectorId`.
- Logs: structured with `tenant`, `connectorId`, `runId`, `command`, `sequence`, `ackSequence`.
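The lag histogram value is described above as now minus the cursor upper bound. The sketch below shows that arithmetic, assuming the cursor upper bound is an RFC 3339 UTC timestamp. The helper names `lag_minutes` and `lag_measurement`, the clamp to zero, and the dictionary shape are illustrative; how the value is recorded on a `StellaOps.Concelier.Orch` meter is left to the instrumentation library.

```python
from datetime import datetime, timezone


def lag_minutes(cursor_upper_bound: str, now: datetime) -> float:
    """Value for concelier.orch.lag.minutes: now minus the cursor upper bound."""
    # Older Python versions of fromisoformat do not accept a trailing "Z".
    cursor = datetime.fromisoformat(cursor_upper_bound.replace("Z", "+00:00"))
    # Assumption: a cursor ahead of the clock is reported as zero lag.
    return max((now - cursor).total_seconds() / 60.0, 0.0)


def lag_measurement(tenant: str, connector_id: str,
                    cursor_upper_bound: str, now: datetime) -> dict:
    """Bundle the value with the tags listed for the histogram."""
    return {
        "name": "concelier.orch.lag.minutes",
        "value": lag_minutes(cursor_upper_bound, now),
        "tags": {"tenant": tenant, "connectorId": connector_id},
    }
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the calculation deterministic under test.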

## Acceptance criteria for prep completion

- The registry/command schema above is frozen and referenced from the Sprint 0114 Delivery Tracker (P10–P13) so downstream implementers know the shapes.
- Sample manifest path and naming are defined for ledger/replay flows.
- Meter names and tags are enumerated for observability wiring.