prep docs and service updates
Some checks failed
Docs CI / lint-and-preview (push) Has been cancelled

This commit is contained in:
master
2025-11-21 06:56:36 +00:00
parent ca35db9ef4
commit d519782a8f
242 changed files with 17293 additions and 13367 deletions

View File

@@ -0,0 +1,72 @@
# Concelier · Orchestrator Registry & Control Prep
- **Date:** 2025-11-20
- **Scope:** PREP-CONCELIER-ORCH-32-001, PREP-CONCELIER-ORCH-32-002, PREP-CONCELIER-ORCH-33-001, PREP-CONCELIER-ORCH-34-001
- **Working directory:** `src/Concelier/**` (WebService, Core, Storage.Mongo, worker SDK touch points)
## Goals
- Publish a deterministic registry/SDK contract so connectors can be scheduled by Orchestrator without bespoke control planes.
- Define heartbeats/progress envelopes and pause/throttle/backfill semantics ahead of worker wiring.
- Describe replay/backfill evidence outputs so ledger/export work can rely on stable hashes.
## Registry record (authoritative fields)
All registry documents live under the orchestrator collection keyed by `connectorId` (stable slug). Fields and invariants:
- `connectorId` (string, slug, lowercase) — unique per tenant + source; immutable.
- `tenant` (string) — required; enforced by WebService tenant guard.
- `source` (enum) — advisory provider (`nvd`, `ghsa`, `osv`, `icscisa`, `kisa`, `vendor:<slug>`).
- `capabilities` (array) — `observations`, `linksets`, `timeline`, `attestations` flags; no merge/derived data.
- `authRef` (string) — reference to secrets store key; never inlined.
- `schedule` (object) — `cron`, `timeZone`, `maxParallelRuns`, `maxLagMinutes`.
- `ratePolicy` (object) — `rpm`, `burst`, `cooldownSeconds`; default deny if absent.
- `artifactKinds` (array) — `raw-advisory`, `normalized`, `linkset`, `timeline`, `attestation`.
- `lockKey` (string) — deterministic lock namespace (`concelier:{tenant}:{connectorId}`) for single-flight.
- `egressGuard` (object) — `allowlist` of hosts + `airgapMode` boolean; fail closed when `airgapMode=true` and host not allowlisted.
- `createdAt` / `updatedAt` (ISO-8601 UTC) — monotonic; updates require optimistic concurrency token.
### Registry sample (non-normative)
```json
{
"connectorId": "icscisa",
"tenant": "acme",
"source": "icscisa",
"capabilities": ["observations", "linksets", "timeline"],
"authRef": "secret:concelier/icscisa/api-key",
"schedule": {"cron": "*/30 * * * *", "timeZone": "UTC", "maxParallelRuns": 1, "maxLagMinutes": 120},
"ratePolicy": {"rpm": 60, "burst": 10, "cooldownSeconds": 30},
"artifactKinds": ["raw-advisory", "normalized", "linkset"],
"lockKey": "concelier:acme:icscisa",
"egressGuard": {"allowlist": ["icscert.kisa.or.kr"], "airgapMode": true},
"createdAt": "2025-11-20T00:00:00Z",
"updatedAt": "2025-11-20T00:00:00Z"
}
```
## Control/SDK contract (heartbeats + commands)
- Heartbeat endpoint `POST /internal/orch/heartbeat` (auth: internal orchestrator role, tenant-scoped).
- Body: `connectorId`, `runId` (GUID), `status` (`starting|running|paused|throttled|backfill|failed|succeeded`),
`progress` (0100), `queueDepth`, `lastArtifactHash`, `lastArtifactKind`, `errorCode`, `retryAfterSeconds`.
- Idempotency key: `runId` + `sequence` to preserve ordering; orchestrator ignores stale sequence.
- Control queue document (persisted per run):
- Commands: `pause`, `resume`, `throttle` (rpm/burst override until `expiresAt`), `backfill` (range: `fromCursor`/`toCursor`).
- Workers poll `/internal/orch/commands?connectorId={id}&runId={runId}`; must ack with monotonic `ackSequence` to ensure replay safety.
- Failure semantics: on `failed`, worker emits `errorCode`, `errorReason`, `lastCheckpoint` (cursor/hash). Orchestrator may re-enqueue with backoff.
## Backfill/replay expectations
- Backfill command requires deterministic cursor space (e.g., advisory sequence number or RFC3339 timestamp truncated to minutes).
- Worker must emit a `runManifest` per backfill containing: `runId`, `connectorId`, `tenant`, `cursorRange`, `artifactHashes[]`, `dsseEnvelopeHash` (if attested), `completedAt`.
- Manifests are written to Evidence Locker ledger for replay; filenames: `backfill/{tenant}/{connectorId}/{runId}.ndjson` with stable ordering.
## Telemetry (to implement in WebService + worker SDK)
- Meter name prefix: `StellaOps.Concelier.Orch`.
- Counters:
- `concelier.orch.heartbeat` tags: `tenant`, `connectorId`, `status`.
- `concelier.orch.command.applied` tags: `tenant`, `connectorId`, `command`.
- Histograms:
- `concelier.orch.lag.minutes` (now - cursor upper bound) tags: `tenant`, `connectorId`.
- Logs: structured with `tenant`, `connectorId`, `runId`, `command`, `sequence`, `ackSequence`.
## Acceptance criteria for prep completion
- Registry/command schema above is frozen and referenced from Sprint 0114 Delivery Tracker (P10P13) so downstream implementation knows shapes.
- Sample manifest path + naming are defined for ledger/replay flows.
- Meter names/tags enumerated for observability wiring.