Commit Graph

34 Commits

Author SHA1 Message Date
master
50abd2137f Update docs, sprint plans, and compose configuration
Add 12 new sprint files (Integrations, Graph, JobEngine, FE, Router,
AdvisoryAI), archive completed scheduler UI sprint, update module
architecture docs (router, graph, jobengine, web, integrations),
and add Gitea entrypoint script for local dev.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 08:53:50 +03:00
master
88eba753ee Isolate Authority DB from Concelier write pressure
Problem: All 46+ services share one PostgreSQL database and connection
pool. When Concelier runs advisory sync jobs (heavy writes), the shared
pool starves Authority's OIDC token validation, causing login timeouts.

Fix: Create a dedicated stellaops_authority database on the same Postgres
instance. Authority gets its own connection string with an independent
Npgsql connection pool (Maximum Pool Size=20, Minimum Pool Size=2).

Changes:
- 00-create-authority-db.sql: Creates stellaops_authority database
- 04b-authority-dedicated-schema.sql: Applies full Authority schema
  (tables, indexes, RLS, triggers, seed data) to the dedicated DB
- docker-compose.stella-ops.yml: New x-postgres-authority-connection
  anchor pointing to stellaops_authority. Authority service env updated.
  Shared pool reduced to Maximum Pool Size=50.

The existing stellaops_platform.authority schema remains for backward
compatibility. Authority reads/writes from the isolated database.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 12:32:03 +03:00
master
8536a6c707 Update compose config, policy simulation, and workflow replay
- devops/compose: README, docker-compose, hosts updates
- Policy simulation: pre-promotion and test-validate panels,
  routes, dashboard, and spec updates
- Workflow visualization: run-graph replay page template update
- Claude settings update

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 17:26:02 +03:00
master
da76d6e93e Add topology auth policies + journey findings notes
Concelier:
- Register Topology.Read, Topology.Manage, Topology.Admin authorization
  policies mapped to OrchRead/OrchOperate/PlatformContextRead/IntegrationWrite
  scopes. Previously these policies were referenced by endpoints but never
  registered, causing System.InvalidOperationException on every topology
  API call.

Gateway routes:
- Simplified targets/environments routes (removed specific sub-path routes,
  use catch-all patterns instead)
- Changed environments base route to JobEngine (where CRUD lives)
- Changed to ReverseProxy type for all topology routes

KNOWN ISSUE (not yet fixed):
- ReverseProxy routes don't forward the gateway's identity envelope to
  Concelier. The regions/targets/bindings endpoints return 401 because
  hasPrincipal=False — the gateway authenticates the user but doesn't
  pass the identity to the backend via ReverseProxy. Microservice routes
  use Valkey transport which includes envelope headers. Topology endpoints
  need either: (a) Valkey transport registration in Concelier, or
  (b) Concelier configured to accept raw bearer tokens on ReverseProxy paths.
  This is an architecture-level fix.

Journey findings collected so far:
- Integration wizard (Harbor + GitHub App): works end-to-end
- Advisory Check All: fixed (parallel individual checks)
- Mirror domain creation: works, generate-immediately fails silently
- Topology wizard Step 1 (Region): blocked by auth passthrough issue
- Topology wizard Step 2 (Environment): POST to JobEngine needs verify
- User ID resolution: raw hashes shown everywhere

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-16 08:12:39 +02:00
master
27d27b1952 Align release create wizard with canonical bundle lifecycle
Wire orch:operate scope into console bootstrap so the browser token can
execute release-control actions. Replace the silent-redirect fallback
with the canonical createBundle → publishVersion → materialize flow and
surface truthful error messages on 403/409/503. Add focused Angular
tests and Playwright journey evidence for standard and hotfix paths.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-15 13:26:20 +02:00
master
509b97a1a7 Harden scratch setup bootstrap and authority admin scopes 2026-03-12 13:12:32 +02:00
master
d93006a8fa Align release publisher scopes and preserve promotion submit context 2026-03-10 19:01:16 +02:00
master
7acf0ae8f2 Fix router frontdoor readiness and route contracts 2026-03-10 10:19:49 +02:00
master
6578c82602 Eliminate legacy gateway container (consolidate into router-gateway)
The gateway service was a redundant deployment of the same
StellaOps.Gateway.WebService binary already running as router-gateway.
It served no unique purpose — all traffic is handled by router-gateway
(slot 0). This removes the container, its route table entries, nginx
proxy blocks, health/quota stubs, and redirects STELLAOPS_GATEWAY_URL
to router.stella-ops.local so the Angular frontend resolves API base
URLs through the canonical frontdoor.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 03:50:16 +02:00
master
31cb31d0fb Eliminate Valkey queue polling fallback (phase 2 CPU optimization)
Replace hardcoded 1-5s polling constants with configurable
QueueWaitTimeoutSeconds (default 0 = pure event-driven). Consumers
now only wake on pub/sub notifications, eliminating ~118 idle
XREADGROUP polls per second across 59 services. Override with
VALKEY_QUEUE_WAIT_TIMEOUT env var if a safety-net poll is needed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 02:36:01 +02:00
master
166745f9f9 Reduce idle CPU across 62 containers (phase 1)
- Add resource limits (heavy/medium/light tiers) to all 59 .NET services
- Add .NET GC tuning (server/workstation GC, DATAS, conserve memory)
- Convert FirstSignalSnapshotWriter from 10s polling to Valkey pub/sub
- Convert EnvironmentSettingsRefreshService from 60s polling to Valkey pub/sub
- Consolidate GraphAnalytics dual timers to single timer with idle-skip
- Increase healthcheck interval from 30s to 60s (configurable)
- Reduce debug logging to Information on 4 high-traffic services

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 02:16:19 +02:00
master
18246cd74c Align live console and policy governance clients 2026-03-10 01:37:42 +02:00
master
dfd22281ed Repair live canonical migrations and scanner cache bootstrap 2026-03-09 21:56:41 +02:00
master
00bf2fa99a Repair live unified search corpus runtime 2026-03-09 19:44:16 +02:00
master
69923b648c fix(infra): repair gateway route ownership and add JobEngine/pack-registry scopes
- Route /api/v1/jobengine to jobengine service (was orchestrator)
- Route /api/v1/sources and /api/v1/witnesses to scanner service
- Add orch:quota and pack-registry scopes to platform OIDC token
- Align compose-local manifests with gateway appsettings.json

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 07:52:46 +02:00
master
c9686edf07 Restore scratch setup bootstrap and live frontdoor sweep 2026-03-09 01:42:24 +02:00
master
4f445ad951 Fix live evidence and registry auth contracts 2026-03-08 22:54:36 +02:00
master
6eb6d5e356 fix: approval legacy route prefix and jobengine orchestrator alias
- Fix approval.client.ts legacy URL from /api/release-orchestrator/ to
  /api/v1/release-orchestrator/ matching gateway route config
- Add orchestrator.stella-ops.local alias to jobengine container so
  gateway route translation resolves correctly
- Update sprint execution log with QA iteration results (40/40 pages clean)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-06 15:26:18 +02:00
master
d1b4a880e2 qa iteration 3
Fresh-DB bootstrap fixes enabling 25/25 pages zero HTTP errors:
- Fix shared.tenants schema mismatch (missing is_default column in init script 16)
- Align migration 000 column set with init script (superset for all modules)
- Seed Authority tenant + stella-ops-ui OAuth client in init script 04
- Widen Platform auth bypass to cover Docker (172.0.0.0/8) and localhost (127.0.0.0/8)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-06 02:19:05 +02:00
master
360485f556 qa iteration 1 2026-03-06 00:23:59 +02:00
master
a918d39a61 texts fixes, search bar fixes, global menu fixes. 2026-03-05 18:15:30 +02:00
master
8e1cb9448d consolidation of some of the modules, localization fixes, product advisories work, qa work 2026-03-05 03:54:22 +02:00
master
bd8fee6ed8 stela ops usage fixes roles propagation and timoeut, one account to support multi tenants, migrations consolidation, search to support documentation, doctor and open api vector db search 2026-02-22 19:27:54 +02:00
master
7e36c1f151 doctor and setup fixes 2026-02-21 09:45:32 +02:00
master
1ec797d5e8 ui progressing 2026-02-20 23:32:20 +02:00
master
cb3e361fcf e2e observation fixes 2026-02-18 22:47:34 +02:00
master
1bcab39a2c Finish off old sprints 2026-02-18 15:01:04 +02:00
master
49cdebe2f1 compose and authority fixes. finish sprints. 2026-02-18 12:00:10 +02:00
master
fb46a927ad save changes 2026-02-17 00:51:35 +02:00
master
70fdbfcf25 Stabilize U 2026-02-16 07:33:20 +02:00
master
ab794e167c frontend styling fixes 2026-02-15 12:00:34 +02:00
master
9911b7d73c save checkpoint 2026-02-12 21:02:43 +02:00
master
5548cf83bf part #2 2026-02-04 19:59:20 +02:00
master
a743bb9a1d devops folders consolidate 2026-01-25 23:39:14 +02:00