The 15-minute cron (0,15,30,45 * * * *) caused the fetch/parse/map
pipeline to fire 4x per hour, creating constant DB write pressure.
This overlapped with e2e test runs and caused advisory-source API
timeouts due to shared Postgres contention.
Changed to every 4 hours (0 */4 * * *) which is appropriate for
advisory data freshness — Red Hat advisories don't update every 15min.
Parse/map stages staggered at +10min and +20min offsets.
Manual sync via POST /advisory-sources/redhat/sync remains available
for on-demand refreshes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three-layer defense against Concelier overload during bulk advisory sync:
Layer 1 — Freshness query cache (30s TTL):
GET /advisory-sources, /advisory-sources/summary, and
/{id}/freshness now cache their results in IMemoryCache for 30s.
Eliminates the expensive 4-table LEFT JOIN with computed freshness
on every call during sync storms.
Layer 2 — Backpressure on sync endpoint (429 + Retry-After):
POST /{sourceId}/sync checks active job count via GetActiveRunsAsync().
When active runs >= MaxConcurrentJobs, returns 429 Too Many Requests
with Retry-After: 30 header. Clients get a clear signal to back off.
Layer 3 — Staged sync-all with inter-batch delay:
POST /sync now triggers sources in batches of MaxConcurrentJobs
(default: 6) with SyncBatchDelaySeconds (default: 5s) between batches.
21 sources → 4 batches over ~15s instead of 21 instant triggers.
Each batch triggers in parallel (Task.WhenAll), then delays.
New config: JobScheduler:SyncBatchDelaySeconds (default: 5)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Problem: Triggering sync on all 21+ advisory sources simultaneously
fires 21 background fetch jobs that all compete for DB connections,
HTTP connections, and CPU. This overwhelms the service, causing 504
gateway timeouts on subsequent API calls.
Fix: Add a SemaphoreSlim in JobCoordinator.ExecuteJobAsync gated by
MaxConcurrentJobs (default: 6). When more than 6 jobs are triggered
concurrently, excess jobs queue behind the semaphore rather than all
executing at once.
- JobSchedulerOptions: new MaxConcurrentJobs property (default 6)
- JobCoordinator: SemaphoreSlim wraps ExecuteJobAsync, extracted
ExecuteJobCoreAsync for the actual execution logic
- Configurable via appsettings: JobScheduler:MaxConcurrentJobs
The lease-based per-job deduplication still prevents the same job
kind from running twice. This new limit caps total concurrent jobs
across all kinds.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The topology wizard creates environments and targets via POST /api/v1/environments
and POST /api/v1/targets. These were routed to JobEngine which doesn't have
the identity envelope middleware, causing 404 on ReverseProxy routes.
Fix: Add environment CRUD, target CRUD, and agent list endpoints directly
to Concelier's TopologySetupEndpointExtensions. These use the same
Topology.Read/Manage authorization policies that work with the identity
envelope middleware.
Routes updated:
- /api/v1/environments → Concelier (was JobEngine)
- /api/v1/agents → Concelier (new)
Topology wizard now completes steps 1-4:
1. Region: CREATE OK
2. Environment: CREATE OK
3. Stage Order: OK (skip)
4. Target: CREATE OK
5. Agent: BLOCKED (expected — no agents deployed on fresh install)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The identity envelope PostConfigure on JwtBearerOptions didn't work because
AddStellaOpsResourceServerAuthentication configures its own events that
override PostConfigure. The OnMessageReceived handler was only in the
TestSigningSecret branch, never in the OIDC discovery branch used in prod.
Fix: Add a middleware BEFORE UseAuthentication() that reads
X-StellaOps-Identity-Envelope headers, verifies HMAC-SHA256 signature
using Router:IdentityEnvelopeSigningKey (from router-microservice-defaults),
and sets HttpContext.User with claims from the envelope.
Also fixed: read signing key from Router:IdentityEnvelopeSigningKey config
path (matches the compose env var Router__IdentityEnvelopeSigningKey from
x-router-microservice-defaults).
Verified: Topology wizard "Create Region" now succeeds — Next button enables.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fix#20 — Audit log empty:
Wire app.MapAuditEndpoints() in JobEngine Program.cs. The endpoint file
existed but was never registered, so /api/v1/jobengine/audit returned 404
and the Timeline unified aggregation service got 0 events.
Fix#22 — Registry search returns mock data:
Replace the catchError() synthetic mock fallback in searchImages() with
an empty array return. The release wizard will now show "no results"
instead of fabricating fake "payment-service" with "sha256:payment..."
digests. getImageDigests() returns an empty-tags placeholder on failure.
Fix#13 — Topology wizard 401 (identity envelope passthrough):
Add TryAuthenticateFromIdentityEnvelope() to Concelier's JwtBearer
OnMessageReceived handler. When no JWT bearer token is present (stripped
by gateway's IdentityHeaderPolicyMiddleware on ReverseProxy routes),
the handler reads X-StellaOps-Identity-Envelope + signature headers,
verifies the HMAC-SHA256 signature using the shared signing key, and
populates ClaimsPrincipal with subject/tenant/scopes/roles from the
envelope. This enables ReverseProxy routes to Concelier topology
endpoints to authenticate the same way Microservice/Valkey routes do.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Concelier:
- Register Topology.Read, Topology.Manage, Topology.Admin authorization
policies mapped to OrchRead/OrchOperate/PlatformContextRead/IntegrationWrite
scopes. Previously these policies were referenced by endpoints but never
registered, causing System.InvalidOperationException on every topology
API call.
Gateway routes:
- Simplified targets/environments routes (removed specific sub-path routes,
use catch-all patterns instead)
- Changed environments base route to JobEngine (where CRUD lives)
- Changed to ReverseProxy type for all topology routes
KNOWN ISSUE (not yet fixed):
- ReverseProxy routes don't forward the gateway's identity envelope to
Concelier. The regions/targets/bindings endpoints return 401 because
hasPrincipal=False — the gateway authenticates the user but doesn't
pass the identity to the backend via ReverseProxy. Microservice routes
use Valkey transport which includes envelope headers. Topology endpoints
need either: (a) Valkey transport registration in Concelier, or
(b) Concelier configured to accept raw bearer tokens on ReverseProxy paths.
This is an architecture-level fix.
Journey findings collected so far:
- Integration wizard (Harbor + GitHub App): works end-to-end
- Advisory Check All: fixed (parallel individual checks)
- Mirror domain creation: works, generate-immediately fails silently
- Topology wizard Step 1 (Region): blocked by auth passthrough issue
- Topology wizard Step 2 (Environment): POST to JobEngine needs verify
- User ID resolution: raw hashes shown everywhere
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>