feat(policy): persist gate evaluation queue, snapshots, orchestrator jobs

Policy Engine: moves gate evaluation, snapshots, orchestrator job tracking,
and ledger export from in-memory state to Postgres-backed stores.

- New persistence migrations 007 (runtime state), 008 (snapshot artifact
  identity), 009 (orchestrator jobs).
- New repositories: PolicyEngineSnapshotRepository,
  PolicyEngineLedgerExportRepository, PolicyEngineOrchestratorJobRepository,
  WorkerResultRepository.
- Gateway services: GateEvaluationJobDispatchService,
  GateEvaluationJobStatusService, GateEvaluationJobWorker,
  SchedulerBackedGateEvaluationQueue (plus Unsupported fallback),
  GateTargetSnapshotMaterializer, PersistedKnowledgeSnapshotStore,
  GateBaselineBootstrapper, PolicyGateEvaluationJobExecutor.
- New endpoints: GateJobEndpoints for job status + dispatch.
- Worker host: PolicyOrchestratorJobWorkerHost to drain the persistent queue.
- PersistedOrchestratorStores + DeltaSnapshotServiceAdapter swap in the
  persistent implementations via DI.

Tests: PersistedDeltaRuntimeTests, PolicyEngineGateTargetSnapshotRuntimeTests,
PolicyEngineRegistryWebhookRuntimeTests, PostgresLedgerExportStoreTests,
PostgresSnapshotStoreTests, PolicyGatewayPersistedDeltaRuntimeTests,
RegistryWebhookQueueRuntimeTests. Archives the old S001 demo seed.

Docs: policy API + architecture pages updated.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
master
2026-04-15 11:14:41 +03:00
parent d8f6bade9a
commit 786d09b88f
70 changed files with 5994 additions and 768 deletions

@@ -450,12 +450,28 @@ POST /api/v1/webhooks/registry/harbor
POST /api/v1/webhooks/registry/generic
```
Webhook handlers enqueue async gate evaluation jobs in the Scheduler via `GateEvaluationJob`.
Webhook push handlers now use a runtime-selected async gate-evaluation path. When `Postgres:Scheduler` is not configured, both Policy hosts resolve `IGateEvaluationQueue` to an explicit unsupported runtime adapter and return `501 problem+json` instead of fabricating queued work. When scheduler persistence is configured, both hosts register the shared scheduler-backed queue runtime, auto-migrate the scheduler schema, enqueue deduplicated `policy.gate-evaluation` jobs, dispatch them through the worker service, and expose persisted request/decision status from `GET /api/v1/policy/gate/jobs/{jobId}`. The previous process-local `InMemoryGateEvaluationQueue` and background worker path was removed because it fabricated "no drift" gate contexts and fake job IDs instead of dispatching real work.
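The runtime selection described above can be sketched roughly as follows. This is an illustrative Python model, not the actual C# implementation: `UnsupportedQueue`, `SchedulerBackedQueue`, `select_queue`, `handle_webhook`, the dedup key shape, the job ID format, and the `202` success status are all assumptions made for the example. Only the `Postgres:Scheduler` configuration key, the `501` problem response, and the `policy.gate-evaluation` job type come from the text.

```python
import hashlib
from dataclasses import dataclass, field


class QueueUnsupportedError(Exception):
    """Raised when no scheduler persistence is configured."""


class UnsupportedQueue:
    """Hypothetical stand-in for the explicit unsupported runtime adapter."""

    def enqueue(self, tenant: str, repository: str, digest: str) -> str:
        # Refuse to fabricate queued work when nothing can persist it.
        raise QueueUnsupportedError("gate evaluation requires Postgres:Scheduler")


@dataclass
class SchedulerBackedQueue:
    """Hypothetical stand-in for the scheduler-backed queue.

    A dict plays the role of the persisted scheduler table.
    """

    jobs: dict = field(default_factory=dict)

    def enqueue(self, tenant: str, repository: str, digest: str) -> str:
        # Assumed dedup key: one job per (tenant, repository, digest).
        raw = f"{tenant}|{repository}|{digest}".encode()
        job_id = "gate-" + hashlib.sha256(raw).hexdigest()[:16]
        # setdefault keeps the existing job if the same push arrives twice.
        self.jobs.setdefault(
            job_id, {"type": "policy.gate-evaluation", "status": "queued"}
        )
        return job_id


def select_queue(config: dict):
    """Pick the queue implementation based on configuration."""
    if config.get("Postgres:Scheduler"):
        return SchedulerBackedQueue()
    return UnsupportedQueue()


def handle_webhook(queue, tenant: str, repository: str, digest: str):
    """Return (status, body); 501 problem details when the queue is unsupported."""
    try:
        job_id = queue.enqueue(tenant, repository, digest)
    except QueueUnsupportedError as exc:
        return 501, {"title": "Not Implemented", "status": 501, "detail": str(exc)}
    return 202, {"jobId": job_id}  # 202 is an assumption, not from the docs
```

In this sketch, two identical pushes against a configured scheduler yield the same job ID and a single stored job, while an unconfigured host answers `501` without creating any state.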
#### Gate Bypass Auditing
Bypass attempts are logged to `policy.gate_bypass_audit`:
- The live `POST /api/v1/policy/gate/evaluate` runtime now resolves gate-bypass auditing through the PostgreSQL-backed `PostgresGateBypassAuditRepository`, scoped by the current tenant context. When a tenant context is genuinely absent, the adapter falls back deterministically to tenant `public` rather than using a process-local in-memory store.
- `StellaOps.Policy.Gateway` now uses the same PostgreSQL-backed gate-bypass audit path through the unified `IStellaOpsTenantAccessor`, so the standalone gateway no longer keeps a separate in-memory audit repository for compatibility routes.
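The tenant fallback for bypass auditing can be illustrated with a minimal Python sketch. The function name `resolve_audit_tenant` is hypothetical, and treating a blank string as "absent" is an assumption; the text only states that an absent tenant context falls back deterministically to tenant `public`.

```python
from typing import Optional

DEFAULT_TENANT = "public"  # fallback tenant named in the text


def resolve_audit_tenant(tenant_id: Optional[str]) -> str:
    """Return the tenant used to scope a gate-bypass audit record."""
    # Assumption: None and whitespace-only values both count as absent.
    if tenant_id is None or not tenant_id.strip():
        return DEFAULT_TENANT
    return tenant_id.strip()
```

Because the fallback is a fixed value rather than a per-process store, the same request produces the same audit scope on every host.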
#### Runtime Snapshots And Ledger Exports
- The live snapshot surface now persists engine runtime state in PostgreSQL-owned tables `policy.engine_ledger_exports` and `policy.engine_snapshots`, reached through `PostgresLedgerExportStore` and `PostgresSnapshotStore` rather than the previous process-local in-memory stores.
- Sync and async gate evaluation now materialize a tenant-scoped target snapshot in `policy.engine_snapshots` before delta computation. The runtime derives that target snapshot from the latest persisted `policy.engine_ledger_exports` document (or the baseline snapshot's export), stamps the artifact digest/repository/tag onto the snapshot row, and then passes the real snapshot identifier into `DeltaComputer`.
- `IOrchestratorJobStore` and `IWorkerResultStore` now resolve to persisted adapters over `policy.orchestrator_jobs` and `policy.worker_results`, so Policy export/bootstrap logic survives host recreation instead of depending on process-local completed-job state.
- Direct `/policy/orchestrator/jobs` submissions now use a real producer runtime. `OrchestratorJobService.SubmitAsync` signals `PolicyOrchestratorJobWorkerHost`, the host leases the next queued job from `IOrchestratorJobStore`, marks it `running`, executes `PolicyWorkerService`, persists `policy.worker_results`, and records terminal `completed` or `failed` state instead of requiring a separate manual `/policy/worker/run` call.
- The deterministic `/api/policy/eval/batch` surface remains stateless by contract. It returns evaluation results to callers but does not populate `policy.orchestrator_jobs` or `policy.worker_results`.
- When a gate request omits an explicit baseline reference and the tenant has no persisted baseline snapshot yet, the engine now auto-builds the first ledger export from completed persisted Policy results and auto-creates a baseline snapshot before materializing the target snapshot. Explicit baseline references remain strict: if the caller asks for a missing snapshot ID, the runtime fails instead of inventing one.
- `StellaOps.Policy.Persistence` now applies startup migrations for the Policy schema on `policy-engine` boot, and `001_initial_schema.sql` is idempotent on reused local volumes so snapshot/export runtime convergence does not depend on a fresh database.
- The merged gateway compatibility routes now register the unified StellaOps tenant accessor and middleware alongside the Policy-specific tenant context middleware. This keeps copied `RequireTenant()` filters from failing pre-handler with `500` and allows the persisted delta compatibility path to reach the real `DeltaComputer`.
- The live delta compatibility surface now projects persisted engine snapshots through `PersistedKnowledgeSnapshotStore` and `DeltaSnapshotServiceAdapter`, so tenant-scoped `/api/policy/deltas/compute` requests fail only on normal contract/data issues rather than process-local tenant or snapshot-store gaps.
- `StellaOps.Policy.Gateway` now uses the same persisted delta projection path for its standalone compatibility host: `ISnapshotStore` resolves to `PersistedKnowledgeSnapshotStore` and `StellaOps.Policy.Deltas.ISnapshotService` resolves to the engine-owned `DeltaSnapshotServiceAdapter`, replacing the old `InMemorySnapshotStore` path that fabricated mostly-empty compatibility input.
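The baseline rules in the list above (strict explicit references, auto-bootstrap only when no reference is given and no baseline exists) can be sketched as follows. This is an illustrative Python model under stated assumptions: `TenantState`, `resolve_baseline`, `SnapshotNotFoundError`, and the ID formats are hypothetical, and in-memory collections stand in for `policy.engine_ledger_exports` and `policy.engine_snapshots`. Behavior when a tenant has no completed results is not specified in the text and is not modeled here.

```python
from dataclasses import dataclass, field
from typing import Optional


class SnapshotNotFoundError(Exception):
    """Raised when an explicitly requested baseline snapshot does not exist."""


@dataclass
class TenantState:
    """Hypothetical per-tenant view of the persisted tables."""

    completed_results: list = field(default_factory=list)
    exports: list = field(default_factory=list)      # ledger exports
    snapshots: dict = field(default_factory=dict)    # snapshot_id -> row
    baseline_id: Optional[str] = None


def resolve_baseline(state: TenantState, explicit_id: Optional[str] = None) -> str:
    """Return the baseline snapshot ID to use for a gate request."""
    if explicit_id is not None:
        # Explicit references stay strict: never invent a missing snapshot.
        if explicit_id not in state.snapshots:
            raise SnapshotNotFoundError(explicit_id)
        return explicit_id

    if state.baseline_id is not None:
        # A persisted baseline already exists; reuse it.
        return state.baseline_id

    # No reference and no baseline: build the first ledger export from
    # completed persisted results, then create the baseline snapshot.
    export_id = f"export-{len(state.exports) + 1}"
    state.exports.append({"id": export_id, "results": list(state.completed_results)})
    snapshot_id = f"snapshot-{len(state.snapshots) + 1}"
    state.snapshots[snapshot_id] = {"export_id": export_id, "kind": "baseline"}
    state.baseline_id = snapshot_id
    return snapshot_id
```

Calling `resolve_baseline` twice without an explicit ID bootstraps once and then reuses the stored baseline, while a request for an unknown snapshot ID raises instead of creating one.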
```json
{
"bypassId": "bypass-uuid",