- All active services now use their own persistence (release-orchestrator, scheduler, packsregistry)
- Zero remaining references from any active csproj
- Clean solution files (4 projects + 48 build configs removed from StellaOps.sln)
- Update README and AGENTS.md
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Create ISchedulerJobPlugin abstraction with JobKind routing
- Add SchedulerPluginRegistry for plugin discovery and resolution
- Wrap existing scan logic as ScanJobPlugin (zero behavioral change)
- Extend Schedule model with JobKind (default "scan") and PluginConfig (jsonb)
- Add SQL migrations 007 (job_kind/plugin_config) and 008 (doctor_trends table)
- Implement DoctorJobPlugin replacing standalone doctor-scheduler service
- Add PostgresDoctorTrendRepository for persistent trend storage
- Register Doctor trend endpoints at /api/v1/scheduler/doctor/trends/*
- Seed 3 default Doctor schedules (daily full, hourly quick, weekly compliance)
- Comment out doctor-scheduler container in compose and services-matrix
- Update Doctor architecture docs and AGENTS.md with scheduling migration info
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix dual-schema violation (scheduler was writing to scheduler + scripts)
- Move ScriptsDataSource, PostgresScriptStore, script endpoints
- Update gateway routes and UI references
- Each service now owns exactly one schema
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Workflow: add PostgreSQL auto-migration (8 tables in schema `workflow`)
with AddStartupMigrations wiring and embedded SQL migration
- Scheduler: add missing `schema_version` and `source` columns to
`scheduler.schedules` table in both init script and migration
- Platform: delay analytics maintenance 15s to avoid race with migration
020_AnalyticsRollups creating compute_daily_rollups()
- Docker: install libgssapi-krb5-2 in runtime image to eliminate Npgsql
Kerberos probe warnings across all 59 services
- Docker: remove `# syntax=docker/dockerfile:1.7` directive from both
Dockerfiles to avoid BuildKit frontend pull failures on flaky DNS
- Postgres init: add `workflow` schema to 01-create-schemas.sql
Verified: 75 containers, 0 unhealthy, 0 recurring errors after full
wipe-and-rebuild cycle.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add ScriptsEndpoints to the Scheduler WebService for CRUD operations on
automation scripts. Add a reusable script-picker overlay component for
selecting scripts from the UI.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove jobengine and jobengine-worker containers from docker-compose
- Create release-orchestrator service (120 endpoints) with full auth, tenant, and infrastructure DI
- Wire workflow engine to PostgreSQL with definition store (wf_definitions table)
- Deploy 4 canonical workflow definitions on startup (release-promotion, scan-execution, advisory-refresh, compliance-sweep)
- Fix workflow definition JSON to match canonical contract schema (set-state, call-transport, decision)
- Add WorkflowClient to release-orchestrator for starting workflow instances on promotion
- Add WorkflowTriggerClient + endpoint to scheduler for triggering workflows from system schedules
- Update gateway routes from jobengine.stella-ops.local to release-orchestrator.stella-ops.local
- Remove Platform.Database dependency on JobEngine.Infrastructure
- Fix workflow csproj duplicate Content items (EmbeddedResource + SDK default)
- System-managed schedules with source column, SystemScheduleBootstrap, inline edit UI
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Introduce PostgresDeploymentCompatibilityStore with migration 011, in-memory
fallback, deployment endpoints, and Postgres fixture for integration tests.
Update Scheduler repository with connection policy adoption.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Set ClientName on every Redis/Valkey connection across Scanner, Signals,
Concelier, Notify, Scheduler, Timeline, and Router for easier connection
attribution in monitoring.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Completes Sprint 323 TASK-001 using Option C (direct URL rewrite):
- release-management.client.ts: readBaseUrl and legacyBaseUrl now use
/api/v1/release-orchestrator/releases, eliminating the v2 proxy dependency
- All 15+ component files updated: activity, approvals, runs, versions,
bundle-organizer, sidebar queries, topology pages
- Spec files updated to match new URL patterns
- Added /releases/activity and /releases/versions backend route aliases
in ReleaseEndpoints.cs with ListActivity and ListVersions handlers
- Fixed orphaned audit-log-dashboard.component import → audit-log-table
- Both Angular build and JobEngine build pass clean
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fix#20 — Audit log empty:
Wire app.MapAuditEndpoints() in JobEngine Program.cs. The endpoint file
existed but was never registered, so /api/v1/jobengine/audit returned 404
and the Timeline unified aggregation service got 0 events.
Fix#22 — Registry search returns mock data:
Replace the catchError() synthetic mock fallback in searchImages() with
an empty array return. The release wizard will now show "no results"
instead of fabricating fake "payment-service" with "sha256:payment..."
digests. getImageDigests() returns an empty-tags placeholder on failure.
Fix#13 — Topology wizard 401 (identity envelope passthrough):
Add TryAuthenticateFromIdentityEnvelope() to Concelier's JwtBearer
OnMessageReceived handler. When no JWT bearer token is present (stripped
by gateway's IdentityHeaderPolicyMiddleware on ReverseProxy routes),
the handler reads X-StellaOps-Identity-Envelope + signature headers,
verifies the HMAC-SHA256 signature using the shared signing key, and
populates ClaimsPrincipal with subject/tenant/scopes/roles from the
envelope. This enables ReverseProxy routes to Concelier topology
endpoints to authenticate the same way Microservice/Valkey routes do.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Concelier:
- Register Topology.Read, Topology.Manage, Topology.Admin authorization
policies mapped to OrchRead/OrchOperate/PlatformContextRead/IntegrationWrite
scopes. Previously these policies were referenced by endpoints but never
registered, causing System.InvalidOperationException on every topology
API call.
Gateway routes:
- Simplified targets/environments routes (removed specific sub-path routes,
use catch-all patterns instead)
- Changed environments base route to JobEngine (where CRUD lives)
- Changed to ReverseProxy type for all topology routes
KNOWN ISSUE (not yet fixed):
- ReverseProxy routes don't forward the gateway's identity envelope to
Concelier. The regions/targets/bindings endpoints return 401 because
hasPrincipal=False — the gateway authenticates the user but doesn't
pass the identity to the backend via ReverseProxy. Microservice routes
use Valkey transport which includes envelope headers. Topology endpoints
need either: (a) Valkey transport registration in Concelier, or
(b) Concelier configured to accept raw bearer tokens on ReverseProxy paths.
This is an architecture-level fix.
Journey findings collected so far:
- Integration wizard (Harbor + GitHub App): works end-to-end
- Advisory Check All: fixed (parallel individual checks)
- Mirror domain creation: works, generate-immediately fails silently
- Topology wizard Step 1 (Region): blocked by auth passthrough issue
- Topology wizard Step 2 (Environment): POST to JobEngine needs verify
- User ID resolution: raw hashes shown everywhere
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add resource limits (heavy/medium/light tiers) to all 59 .NET services
- Add .NET GC tuning (server/workstation GC, DATAS, conserve memory)
- Convert FirstSignalSnapshotWriter from 10s polling to Valkey pub/sub
- Convert EnvironmentSettingsRefreshService from 60s polling to Valkey pub/sub
- Consolidate GraphAnalytics dual timers to single timer with idle-skip
- Increase healthcheck interval from 30s to 60s (configurable)
- Reduce debug logging to Information on 4 high-traffic services
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fix 4 classes of issues that prevented JobEngine from auto-migrating:
1. Non-idempotent DDL: add IF NOT EXISTS to CREATE TABLE, wrap CREATE
TYPE in DO blocks with EXCEPTION WHEN duplicate_object, wrap partition
creation with EXCEPTION WHEN duplicate_object OR SQLSTATE '42P17'
2. Reserved keyword: quote `window` column name in 004_slo_quotas.sql
3. Invalid syntax: replace DELETE...LIMIT with ctid subquery pattern
in 004_slo_quotas.sql and 005_audit_ledger.sql
4. Partition constraint: add tenant_id to UNIQUE(log_id) constraint
on pack_run_logs in 006_pack_runs.sql (partitioned tables require
partition key in all unique constraints)
5. Non-immutable index predicate: remove NOW() from partial index
predicate in 002_backfill.sql
6. Remove BEGIN/COMMIT wrappers from all migration files (the
StartupMigrationHost already wraps each migration in a transaction)
All 8 orchestrator migrations (001-008) now apply cleanly on fresh DB.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Wire AddStartupMigrations so JobEngine converges the orchestrator schema
on fresh database or wiped volumes without manual bootstrap scripts.
Adds StellaOps.Infrastructure.Postgres.Migrations dependency.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>