Commit Graph

98 Commits

Author SHA1 Message Date
master
b851aa8300 Fix journey cluster defects + UX improvements across 7 clusters
P0 fixes (clean-start + route contracts):
- VexHub: fix migration 002 table name + add repair migration 003
- Gateway: add /console/admin and /api/v1/unknowns routes
- IDP: add platform.idp.admin scope to OAuth client + web config
- Risk: fix URL construction from authority to gateway base
- Unknowns: fix client path from /api/v1/scanner/unknowns to /api/v1/unknowns

P1 fixes (trust + shell integrity):
- Audit: fix module name normalization, add Authority audit source
- Stage: add persistence across web store, API contracts, DB migration 059
- Posture: add per-source error tracking + degradation banner

P2 fixes (adoption + workflow clarity):
- Rename Triage to Findings in navigation + breadcrumbs
- Command palette: show quick actions for plain text queries, fix scan routes
- Scan: add local-mode limitation messaging + queue hints
- Release: add post-seal promotion CTA with pre-filled release ID
- Welcome: rewrite around operator adoption model (Get Started + What Stella Replaces)

UX improvements:
- Status rail: convert to icon-only with color state + tooltips
- Event Stream Monitor: new page at /ops/operations/event-stream
- Sidebar: collapse Operations by default
- User menu: embed theme switcher (Day/Night/System), remove standalone toggle
- Settings: add Profile section with email editing + PUT /api/v1/platform/preferences/email endpoint
- Docs viewer: replace custom parser with ngx-markdown (marked) for proper table/code/blockquote rendering

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-17 15:10:36 +02:00
master
ea5942fa1b Ship 7 remaining journey fixes: Harbor data, scan timeout, permissions,
flicker, pack creation, export tooltip, audit guidance

Sprint A: Harbor fixture now returns realistic search results (7 repos)
  and artifact digests (3 versions with tags). Release creation wizard
  Step 2 now shows actual images to select.

Sprint B: Scan polling caps at 60 polls (3 min). Shows timeout banner
  with guidance link to Scheduled Jobs and "Keep Waiting" button.
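
A minimal sketch of the capped polling decision described in Sprint B, assuming the 3-second interval from the Sprint 1 scan page commit. The function and type names are hypothetical and are not taken from scan-submit.component.ts; only the 60-poll cap and the 3 s interval come from the commit messages.

```typescript
// Hypothetical names; only the 60-poll cap and 3 s interval come from the log.
type ScanStatus = "queued" | "scanning" | "completed" | "failed";

interface PollDecision {
  action: "poll-again" | "done" | "show-timeout-banner";
  elapsedSeconds: number;
}

const POLL_INTERVAL_SECONDS = 3;
const MAX_POLLS = 60; // 60 polls x 3 s = 180 s = 3 min

// Decide what the page should do after `pollsSoFar` status checks.
function decideNextStep(status: ScanStatus, pollsSoFar: number): PollDecision {
  const elapsedSeconds = pollsSoFar * POLL_INTERVAL_SECONDS;
  // Terminal states stop polling regardless of how many polls were used.
  if (status === "completed" || status === "failed") {
    return { action: "done", elapsedSeconds };
  }
  // Cap reached while the scan is still running: surface the timeout banner.
  if (pollsSoFar >= MAX_POLLS) {
    return { action: "show-timeout-banner", elapsedSeconds };
  }
  return { action: "poll-again", elapsedSeconds };
}
```

Under this sketch, a "Keep Waiting" action could simply restart the counter at zero; whether the actual component does so is not stated in the commit.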

Sprint C: /console/profile route now renders InsufficientPermissions
  component instead of 404. Shows user/tenant, guidance, and nav links.
  Catches all 24 guard redirect dead-ends.

Sprint D: Event stream chip no longer flickers DEGRADED during context
  reloads. Loading state treated as connected (transient, not error).

Sprint E: Policy Packs empty state now has inline Create Pack form.
  Calls existing PolicyApiService.createPack() backend endpoint.

Sprint F: Diagnostics Export button shows disabled tooltip "Run a
  diagnostic check first" when no results available.

Sprint G: Audit Log shows guidance text when all module counts are 0.
  Lists automatically captured event types. Confirms audit is active.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-16 23:00:20 +02:00
master
efa33efdbc Sprint 2+3+5: Registry search, workflow chain, unified security data
Sprint 2 — Registry image search (S2-T01/T02/T03):
  Harbor plugin: SearchRepositoriesAsync + ListArtifactsAsync calling
    Harbor /api/v2.0/search and /api/v2.0/projects/*/repositories/*/artifacts
  Platform endpoint: GET /api/v1/registries/images/search proxies to
    Harbor fixture, returns aggregated RegistryImage[] response
  Frontend: release-management.client.ts now calls /api/v1/registries/*
    instead of the nonexistent /api/registry/* path
  Gateway route: /api/v1/registries → platform (ReverseProxy)

Sprint 3 — Workflow chain links (S3-T01/T02/T03/T05):
  S3-T01: Integration detail health tab shows "Scan your first image"
    CTA after successful registry connection test
  S3-T02: Scan submit page already had "View findings" link (verified)
  S3-T03: Triage findings detail shows "Check policy gates" banner
    after recording a VEX decision
  S3-T05: Promotions list + detail show "Review blocking finding"
    link when promotion is blocked by gate failure

Sprint 5 — Unified security data (S5-T01):
  Security Posture now queries VULNERABILITY_API for triage stats
  Risk Posture card shows real finding count from triage (was hardcoded 0)
  Risk label computed from triage severity breakdown (GUARDED→HIGH)
  Blocking Items shows critical+high counts from triage
  "View in Vulnerabilities workspace" drilldown link added

Angular build: 0 errors. .NET builds: 0 errors.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-16 16:08:22 +02:00
master
b97bffc430 Sprint 1: Scanner entry point + vulnerability navigation (S1-T01 to T07)
S1-T01: Add "Scan Image" to sidebar under Security > Security Posture children
  - New nav item with scanner:read scope, route /security/scan

S1-T02: Create Scan Image page (scan-submit.component.ts)
  - Image reference input, force rescan toggle, metadata fields
  - Submits POST /api/v1/scans/, polls for status every 3s
  - Shows progress badges (queued/scanning/completed/failed)
  - "View findings" link on completion
  - Route registered in security.routes.ts

S1-T04: Rename "Triage" to "Vulnerabilities" in sidebar + breadcrumbs
  - Sidebar label: Triage → Vulnerabilities
  - Route title and breadcrumb data updated
  - Internal route /triage/artifacts unchanged

S1-T05: Add 10 security terms to command palette quick actions
  - Scan image, View vulnerabilities, Search CVE, View findings,
    Create release, View audit log, Run diagnostics, Configure
    advisory sources, View promotions, Check policy gates

S1-T06: Add CTA buttons to Security Posture page
  - "Scan an Image" (primary) → /security/scan
  - "View Active Findings" (secondary) → /triage/artifacts

S1-T07: Gateway routes for scanner endpoints
  - /api/v1/scans → scanner.stella-ops.local (ReverseProxy)
  - /api/v1/scan-policies → scanner.stella-ops.local (ReverseProxy)
  - Added to both compose mount and source appsettings

Angular build: 0 errors.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-16 14:27:47 +02:00
master
a86f0d1361 Add environment/target/agent CRUD endpoints to Concelier topology
The topology wizard creates environments and targets via POST /api/v1/environments
and POST /api/v1/targets. These were routed to JobEngine which doesn't have
the identity envelope middleware, causing 404 on ReverseProxy routes.

Fix: Add environment CRUD, target CRUD, and agent list endpoints directly
to Concelier's TopologySetupEndpointExtensions. These use the same
Topology.Read/Manage authorization policies that work with the identity
envelope middleware.

Routes updated:
- /api/v1/environments → Concelier (was JobEngine)
- /api/v1/agents → Concelier (new)

Topology wizard now completes steps 1-4:
  1. Region: CREATE OK
  2. Environment: CREATE OK
  3. Stage Order: OK (skip)
  4. Target: CREATE OK
  5. Agent: BLOCKED (expected — no agents deployed on fresh install)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-16 09:49:59 +02:00
master
da76d6e93e Add topology auth policies + journey findings notes
Concelier:
- Register Topology.Read, Topology.Manage, Topology.Admin authorization
  policies mapped to OrchRead/OrchOperate/PlatformContextRead/IntegrationWrite
  scopes. Previously these policies were referenced by endpoints but never
  registered, causing System.InvalidOperationException on every topology
  API call.

Gateway routes:
- Simplified targets/environments routes (removed specific sub-path routes,
  use catch-all patterns instead)
- Changed environments base route to JobEngine (where CRUD lives)
- Changed to ReverseProxy type for all topology routes

KNOWN ISSUE (not yet fixed):
- ReverseProxy routes don't forward the gateway's identity envelope to
  Concelier. The regions/targets/bindings endpoints return 401 because
  hasPrincipal=False — the gateway authenticates the user but doesn't
  pass the identity to the backend via ReverseProxy. Microservice routes
  use Valkey transport which includes envelope headers. Topology endpoints
  need either: (a) Valkey transport registration in Concelier, or
  (b) Concelier configured to accept raw bearer tokens on ReverseProxy paths.
  This is an architecture-level fix.

Journey findings collected so far:
- Integration wizard (Harbor + GitHub App): works end-to-end
- Advisory Check All: fixed (parallel individual checks)
- Mirror domain creation: works, generate-immediately fails silently
- Topology wizard Step 1 (Region): blocked by auth passthrough issue
- Topology wizard Step 2 (Environment): POST to JobEngine needs verify
- User ID resolution: raw hashes shown everywhere

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-16 08:12:39 +02:00
master
602df77467 Fix topology routes: use ReverseProxy for all topology endpoints
Changed all topology gateway routes from Microservice (Valkey transport)
to ReverseProxy (direct HTTP) because:
- Concelier topology endpoints serve via HTTP, not Valkey message bus
- JobEngine environment CRUD serves via HTTP

Routes:
- /api/v1/regions → Concelier (ReverseProxy)
- /api/v1/infrastructure-bindings → Concelier (ReverseProxy)
- /api/v1/pending-deletions → Concelier (ReverseProxy)
- /api/v1/targets → Concelier (ReverseProxy)
- /api/v1/environments/{id}/readiness → Concelier (ReverseProxy)
- /api/v1/environments → JobEngine (ReverseProxy)

All return expected auth/tenant errors (404/500) instead of 503.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-16 07:44:58 +02:00
master
701229b3e6 Add gateway routes for topology setup endpoints on Concelier
The topology setup wizard calls /api/v1/regions, /api/v1/infrastructure-bindings,
/api/v1/pending-deletions, and target/environment readiness+validate endpoints
which are registered on the Concelier service. Without explicit gateway routes,
these fall through to the generic Microservice matcher which tries to find a
non-existent "regions" service, returning 503.

Added 6 Microservice routes forwarding topology API paths to
http://concelier.stella-ops.local. Both compose mount config and source
appsettings.json updated.

Verified: /api/v1/regions now returns 401 (auth required) instead of 503.
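
The fall-through behaviour described above can be illustrated roughly as follows. This is a sketch under stated assumptions, not the gateway's actual matcher: the types, the `resolveRoute` function, and the rule that the generic matcher derives a service name from the first segment after /api/v1/ are all hypothetical. Only the concelier.stella-ops.local target and the 503 result come from the commit message.

```typescript
// Hypothetical model of explicit routes taking priority over a generic matcher.
interface ExplicitRoute {
  pathPrefix: string;
  target: string;
}

interface Resolution {
  forwarded: boolean;
  target?: string;
  httpStatus?: number;
}

function resolveRoute(
  path: string,
  explicitRoutes: ExplicitRoute[],
  knownServices: string[]
): Resolution {
  // 1. Explicit routes win: match the prefix itself or any sub-path of it.
  for (const route of explicitRoutes) {
    if (path === route.pathPrefix || path.indexOf(route.pathPrefix + "/") === 0) {
      return { forwarded: true, target: route.target };
    }
  }
  // 2. Generic matcher (assumed): treat the first segment as a service name.
  const match = /^\/api\/v1\/([^/]+)/.exec(path);
  if (match !== null && knownServices.indexOf(match[1]) !== -1) {
    return { forwarded: true, target: "http://" + match[1] + ".stella-ops.local" };
  }
  // 3. No such service: the request cannot be routed.
  return { forwarded: false, httpStatus: 503 };
}
```

With no explicit entry, /api/v1/regions resolves to a nonexistent "regions" service and yields 503; adding an explicit route to Concelier forwards it instead, after which the backend's own auth check can answer with 401.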

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-16 02:20:22 +02:00
master
534aabfa2a First-time user experience fixes and platform contract repairs
FTUX fixes (Sprint 316-001):
- Remove all hardcoded fake data from dashboard — fresh installs show
  honest setup guide instead of fake crisis data (5 fake criticals gone)
- Curate advisory source defaults: 32 sources disabled by default
  (ecosystem, geo-restricted, exploit, hardware, mirror). ~43 core
  sources remain enabled. StellaOps Mirror no longer enabled at priority 1.
- Filter Mirror-category sources from Create Domain wizard to prevent
  circular mirror-from-mirror chains
- Add 404 catch-all route — unknown URLs show "Page Not Found" instead
  of silently rendering the dashboard
- Fix arrow characters in release target path dropdown (? → →)
- Add login credentials to quickstart documentation
- Update Feature Matrix: 14 release orchestration features marked as
  shipped (was marked planned)

Platform contract repairs (from prior session):
- Add /api/v1/jobengine/quotas/summary endpoint on Platform
- Fix gateway route prefix matching for /policy/shadow/* and
  /policy/simulations/* (regex routes instead of exact match)
- Fix VexHub PostgresVexSourceRepository missing interface method
- Fix advisory-vex-sources sweep text expectation
- Fix mirror operator journey auth (session storage token extraction)
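
To illustrate the route prefix fix for /policy/shadow/* and /policy/simulations/* listed above, here is a small sketch of why an exact-match route misses sub-paths while a regex route covers them. The helper names and the sample sub-path are hypothetical; the actual gateway patterns are not shown in the commit.

```typescript
// Exact matching only accepts the route path itself.
function exactMatch(routePath: string, requestPath: string): boolean {
  return routePath === requestPath;
}

// Assumed regex form: the base path, optionally followed by "/" and anything.
const shadowRoute = /^\/policy\/shadow(\/.*)?$/;
const simulationsRoute = /^\/policy\/simulations(\/.*)?$/;

function regexMatch(pattern: RegExp, requestPath: string): boolean {
  return pattern.test(requestPath);
}

// "/policy/shadow/results" is a made-up sub-path for illustration.
const missedByExact = exactMatch("/policy/shadow", "/policy/shadow/results");
const caughtByRegex = regexMatch(shadowRoute, "/policy/shadow/results");
```

Anchoring the pattern on a "/" boundary keeps unrelated paths such as /policy/shadowing from matching by accident.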

Verified: 110/111 canonical routes passing (1 unrelated stale approval ref)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-16 02:05:38 +02:00
master
27d27b1952 Align release create wizard with canonical bundle lifecycle
Wire orch:operate scope into console bootstrap so the browser token can
execute release-control actions. Replace the silent-redirect fallback
with the canonical createBundle → publishVersion → materialize flow and
surface truthful error messages on 403/409/503. Add focused Angular
tests and Playwright journey evidence for standard and hotfix paths.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-15 13:26:20 +02:00
master
bd78523564 Widen scratch iteration 011 with fixture-backed integrations QA 2026-03-14 03:11:45 +02:00
master
317e55e623 Complete scratch iteration 004 setup and grouped route-action fixes 2026-03-12 19:28:42 +02:00
master
509b97a1a7 Harden scratch setup bootstrap and authority admin scopes 2026-03-12 13:12:32 +02:00
master
ae09af4e65 Fix scratch setup image builder switch forwarding 2026-03-11 09:44:36 +02:00
master
d93006a8fa Align release publisher scopes and preserve promotion submit context 2026-03-10 19:01:16 +02:00
master
8578065675 Fix notifications surface ownership and frontdoor contracts 2026-03-10 16:54:25 +02:00
master
fc7aaf4d37 Restore platform ownership for v2 evidence routes 2026-03-10 13:10:06 +02:00
master
ffd4646d89 Harden scratch setup third-party readiness probes 2026-03-10 12:48:56 +02:00
master
d881fff387 Segment-bound doctor and scheduler frontdoor chunks 2026-03-10 12:47:51 +02:00
master
1b6051662f Repair router frontdoor route boundaries and service prefixes 2026-03-10 12:28:48 +02:00
master
6f808c3b3d remove temp files 2026-03-10 11:11:53 +02:00
master
7acf0ae8f2 Fix router frontdoor readiness and route contracts 2026-03-10 10:19:49 +02:00
master
ff4cd7e999 Restore policy frontdoor compatibility and live QA 2026-03-10 06:18:30 +02:00
master
6578c82602 Eliminate legacy gateway container (consolidate into router-gateway)
The gateway service was a redundant deployment of the same
StellaOps.Gateway.WebService binary already running as router-gateway.
It served no unique purpose — all traffic is handled by router-gateway
(slot 0). This removes the container, its route table entries, nginx
proxy blocks, health/quota stubs, and redirects STELLAOPS_GATEWAY_URL
to router.stella-ops.local so the Angular frontend resolves API base
URLs through the canonical frontdoor.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 03:50:16 +02:00
master
109f119a65 Fix router-gateway debug logging from mounted config file
router-gateway-local.json had Logging.LogLevel.StellaOps set to Debug,
overriding the compose env var Information setting. Fixed in both local
and reverseproxy config variants.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 03:46:59 +02:00
master
31cb31d0fb Eliminate Valkey queue polling fallback (phase 2 CPU optimization)
Replace hardcoded 1-5s polling constants with configurable
QueueWaitTimeoutSeconds (default 0 = pure event-driven). Consumers
now only wake on pub/sub notifications, eliminating ~118 idle
XREADGROUP polls per second across 59 services. Override with
VALKEY_QUEUE_WAIT_TIMEOUT env var if a safety-net poll is needed.
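
For scale, ~118 polls per second across 59 services is about 2 idle polls per service per second. The configuration rule above could be resolved along these lines; this is a hedged sketch, and the function, interface, and mode names are hypothetical. Only the default of 0 (pure event-driven) and the VALKEY_QUEUE_WAIT_TIMEOUT override come from the commit message, and treating invalid or negative values as 0 is an assumption.

```typescript
interface QueueWaitSettings {
  timeoutSeconds: number;
  mode: "event-driven" | "event-driven-with-safety-poll";
}

// `envValue` stands in for the raw VALKEY_QUEUE_WAIT_TIMEOUT value, if set.
function resolveQueueWait(envValue: string | undefined): QueueWaitSettings {
  const parsed = envValue === undefined ? NaN : Number(envValue);
  // Assumption: anything missing, non-numeric, or <= 0 falls back to 0.
  const timeoutSeconds = isFinite(parsed) && parsed > 0 ? parsed : 0;
  return {
    timeoutSeconds,
    // 0 means consumers wake only on pub/sub notifications.
    mode: timeoutSeconds === 0 ? "event-driven" : "event-driven-with-safety-poll",
  };
}
```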

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 02:36:01 +02:00
master
166745f9f9 Reduce idle CPU across 62 containers (phase 1)
- Add resource limits (heavy/medium/light tiers) to all 59 .NET services
- Add .NET GC tuning (server/workstation GC, DATAS, conserve memory)
- Convert FirstSignalSnapshotWriter from 10s polling to Valkey pub/sub
- Convert EnvironmentSettingsRefreshService from 60s polling to Valkey pub/sub
- Consolidate GraphAnalytics dual timers to single timer with idle-skip
- Increase healthcheck interval from 30s to 60s (configurable)
- Reduce debug logging to Information on 4 high-traffic services

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 02:16:19 +02:00
master
c0c0267ac9 Normalize live policy simulation tenant routing 2026-03-10 02:14:29 +02:00
master
72084355a6 Align policy simulation auth passthrough at the frontdoor 2026-03-10 01:55:51 +02:00
master
18246cd74c Align live console and policy governance clients 2026-03-10 01:37:42 +02:00
master
ac544c0064 Repair live watchlist frontdoor routing 2026-03-10 00:25:34 +02:00
master
dfd22281ed Repair live canonical migrations and scanner cache bootstrap 2026-03-09 21:56:41 +02:00
master
00bf2fa99a Repair live unified search corpus runtime 2026-03-09 19:44:16 +02:00
master
69923b648c fix(infra): repair gateway route ownership and add JobEngine/pack-registry scopes
- Route /api/v1/jobengine to jobengine service (was orchestrator)
- Route /api/v1/sources and /api/v1/witnesses to scanner service
- Add orch:quota and pack-registry scopes to platform OIDC token
- Align compose-local manifests with gateway appsettings.json

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 07:52:46 +02:00
master
f218ec82ec Speed up scratch image builds with publish-first contexts 2026-03-09 07:37:24 +02:00
master
c9686edf07 Restore scratch setup bootstrap and live frontdoor sweep 2026-03-09 01:42:24 +02:00
master
622f015421 Backfill live auth scope and evidence route metadata 2026-03-08 22:56:55 +02:00
master
4f445ad951 Fix live evidence and registry auth contracts 2026-03-08 22:54:36 +02:00
master
30532800ec fix(router): ship audit bundle frontdoor cutover 2026-03-08 14:30:12 +02:00
master
6eb6d5e356 fix: approval legacy route prefix and jobengine orchestrator alias
- Fix approval.client.ts legacy URL from /api/release-orchestrator/ to
  /api/v1/release-orchestrator/ matching gateway route config
- Add orchestrator.stella-ops.local alias to jobengine container so
  gateway route translation resolves correctly
- Update sprint execution log with QA iteration results (40/40 pages clean)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-06 15:26:18 +02:00
master
49763be70b context deterministic + randomized searches and fix for setup from stella-ops.local rather than 127.1.0.* 2026-03-06 14:41:05 +02:00
master
d1b4a880e2 qa iteration 3
Fresh-DB bootstrap fixes enabling 25/25 pages zero HTTP errors:
- Fix shared.tenants schema mismatch (missing is_default column in init script 16)
- Align migration 000 column set with init script (superset for all modules)
- Seed Authority tenant + stella-ops-ui OAuth client in init script 04
- Widen Platform auth bypass to cover Docker (172.0.0.0/8) and localhost (127.0.0.0/8)
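
A rough sketch of how a source address could be tested against the two bypass ranges named in the last item. The function names are hypothetical and this is not the Platform service's implementation; only the two CIDR ranges come from the commit message.

```typescript
// Convert a dotted-quad IPv4 string to a number, or null if malformed.
function ipv4ToInt(ip: string): number | null {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const octet = Number(part);
    if (octet > 255) return null;
    value = value * 256 + octet;
  }
  return value;
}

// True when `ip` falls inside the IPv4 block written as "a.b.c.d/prefix".
function inCidr(ip: string, cidr: string): boolean {
  const pieces = cidr.split("/");
  if (pieces.length !== 2 || !/^\d{1,2}$/.test(pieces[1])) return false;
  const prefix = Number(pieces[1]);
  const base = ipv4ToInt(pieces[0]);
  const addr = ipv4ToInt(ip);
  if (base === null || addr === null || prefix > 32) return false;
  // Two addresses share a block when they agree after dropping the host bits.
  const blockSize = Math.pow(2, 32 - prefix);
  return Math.floor(addr / blockSize) === Math.floor(base / blockSize);
}

// Ranges taken from the commit message.
const BYPASS_RANGES = ["172.0.0.0/8", "127.0.0.0/8"];

function isBypassed(ip: string): boolean {
  return BYPASS_RANGES.some((range) => inCidr(ip, range));
}
```

Note that 172.0.0.0/8 is wider than the RFC 1918 private block 172.16.0.0/12, so a check like this also admits public 172.x addresses.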

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-06 02:19:05 +02:00
master
360485f556 qa iteration 1 2026-03-06 00:23:59 +02:00
master
a918d39a61 text fixes, search bar fixes, global menu fixes. 2026-03-05 18:15:30 +02:00
master
8e1cb9448d consolidation of some of the modules, localization fixes, product advisories work, qa work 2026-03-05 03:54:22 +02:00
master
4fe8eb56ae enrich the setup. setup fixes. minimize the consolidation plan 2026-02-26 08:51:47 +02:00
master
b07d27772e search and ai stabilization work, localization stabilized. 2026-02-24 23:29:36 +02:00
master
e746577380 wip: doctor/cli/docs/api to vector db consolidation; api hardening for descriptions, tenant, and scopes; migrations and conversions of all DALs to EF v10 2026-02-23 15:30:50 +02:00
master
bd8fee6ed8 stella ops usage fixes roles propagation and timeout, one account to support multi tenants, migrations consolidation, search to support documentation, doctor and open api vector db search 2026-02-22 19:27:54 +02:00
master
a29f438f53 setup and mock fixes 2026-02-21 20:14:23 +02:00