Commit Graph

618 Commits

Author SHA1 Message Date
master
b302a5a3d6 Preserve deployment evidence navigation scope 2026-03-10 13:35:00 +02:00
1fe3f489f1 Finalize topbar status chip ownership split 2026-03-10 13:20:17 +02:00
0e764da736 Align mission control with shared context scope 2026-03-10 13:13:57 +02:00
fc7aaf4d37 Restore platform ownership for v2 evidence routes 2026-03-10 13:10:06 +02:00
d881fff387 Segment-bound doctor and scheduler frontdoor chunks 2026-03-10 12:47:51 +02:00
1b6051662f Repair router frontdoor route boundaries and service prefixes 2026-03-10 12:28:48 +02:00
7acf0ae8f2 Fix router frontdoor readiness and route contracts 2026-03-10 10:19:49 +02:00
eae2dfc9d4 Harden policy simulation direct-route defaults 2026-03-10 09:09:29 +02:00
db7371de03 Add live integrations sweep harness script 2026-03-10 08:12:15 +02:00
011aebc802 Ignore aborted navigations in ops policy sweep runtime accounting 2026-03-10 07:55:45 +02:00
f0535bcdf6 Harden live frontdoor authentication harness 2026-03-10 07:39:58 +02:00
425bccf10a Preserve topology and triage scope in live setup flows 2026-03-10 07:37:20 +02:00
b9aa1dbe24 Add live mission control action sweep 2026-03-10 06:35:05 +02:00
ff4cd7e999 Restore policy frontdoor compatibility and live QA 2026-03-10 06:18:30 +02:00
6578c82602 Eliminate legacy gateway container (consolidate into router-gateway)
The gateway service was a redundant deployment of the same
StellaOps.Gateway.WebService binary already running as router-gateway.
It served no unique purpose — all traffic is handled by router-gateway
(slot 0). This removes the container, its route table entries, nginx
proxy blocks, health/quota stubs, and redirects STELLAOPS_GATEWAY_URL
to router.stella-ops.local so the Angular frontend resolves API base
URLs through the canonical frontdoor.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 03:50:16 +02:00
31cb31d0fb Eliminate Valkey queue polling fallback (phase 2 CPU optimization)
Replace hardcoded 1-5s polling constants with configurable
QueueWaitTimeoutSeconds (default 0 = pure event-driven). Consumers
now only wake on pub/sub notifications, eliminating ~118 idle
XREADGROUP polls per second across 59 services. Override with
VALKEY_QUEUE_WAIT_TIMEOUT env var if a safety-net poll is needed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 02:36:01 +02:00
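
The timeout rule described in the commit above (0 means pure event-driven, a positive value enables a safety-net poll) can be sketched as follows. This is a minimal Python illustration rather than the project's .NET code: the function name `resolve_queue_wait_timeout` is hypothetical, and the handling of unparsable or negative values is an assumption, since the commit message does not specify it.

```python
import os
from typing import Mapping, Optional


def resolve_queue_wait_timeout(env: Mapping[str, str] = os.environ) -> Optional[float]:
    """Return the safety-net poll interval in seconds.

    None means "pure event-driven": the consumer only wakes on a
    pub/sub notification and never polls on a timer.
    """
    raw = env.get("VALKEY_QUEUE_WAIT_TIMEOUT", "0").strip()
    try:
        seconds = float(raw)
    except ValueError:
        # Assumption: an unparsable value falls back to the default of 0.
        seconds = 0.0
    # Assumption: zero or negative values both disable the fallback poll.
    return seconds if seconds > 0 else None
```

A caller would pass the result straight to its wait primitive, treating `None` as "wait indefinitely for a notification".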
166745f9f9 Reduce idle CPU across 62 containers (phase 1)
- Add resource limits (heavy/medium/light tiers) to all 59 .NET services
- Add .NET GC tuning (server/workstation GC, DATAS, conserve memory)
- Convert FirstSignalSnapshotWriter from 10s polling to Valkey pub/sub
- Convert EnvironmentSettingsRefreshService from 60s polling to Valkey pub/sub
- Consolidate GraphAnalytics dual timers to single timer with idle-skip
- Increase healthcheck interval from 30s to 60s (configurable)
- Reduce debug logging to Information on 4 high-traffic services

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 02:16:19 +02:00
c0c0267ac9 Normalize live policy simulation tenant routing 2026-03-10 02:14:29 +02:00
72084355a6 Align policy simulation auth passthrough at the frontdoor 2026-03-10 01:55:51 +02:00
d16d7a1692 Repair live JobEngine runtime contracts 2026-03-10 01:38:38 +02:00
7be7295597 Keep approval queue on live canonical contracts 2026-03-10 01:38:21 +02:00
4a13601207 Adapt live frontend clients for compatibility data 2026-03-10 01:38:10 +02:00
18246cd74c Align live console and policy governance clients 2026-03-10 01:37:42 +02:00
afb9711e61 Restore live platform compatibility contracts 2026-03-10 01:37:24 +02:00
6b7168ca3c Bind startup migrations to module schema search path 2026-03-10 01:37:02 +02:00
1df79ac75e Restore policy simulation history compatibility 2026-03-10 00:42:18 +02:00
ac544c0064 Repair live watchlist frontdoor routing 2026-03-10 00:25:34 +02:00
359fafa9da Repair release investigation workspace contracts 2026-03-09 23:19:42 +02:00
3ecafc49a3 Preserve live scope across evidence and registry flows 2026-03-09 22:11:08 +02:00
dfd22281ed Repair live canonical migrations and scanner cache bootstrap 2026-03-09 21:56:41 +02:00
00bf2fa99a Repair live unified search corpus runtime 2026-03-09 19:44:16 +02:00
bf937c9395 Repair router frontdoor convergence and live route contracts 2026-03-09 19:09:19 +02:00
49d1c57597 Align live titles and trust setup overview 2026-03-09 11:20:19 +02:00
1e53976ffb fix(jobengine): make all orchestrator migration SQL idempotent and PostgreSQL-compatible
Fix 6 classes of issues that prevented JobEngine from auto-migrating:
1. Non-idempotent DDL: add IF NOT EXISTS to CREATE TABLE, wrap CREATE
   TYPE in DO blocks with EXCEPTION WHEN duplicate_object, wrap partition
   creation with EXCEPTION WHEN duplicate_object OR SQLSTATE '42P17'
2. Reserved keyword: quote `window` column name in 004_slo_quotas.sql
3. Invalid syntax: replace DELETE...LIMIT with ctid subquery pattern
   in 004_slo_quotas.sql and 005_audit_ledger.sql
4. Partition constraint: add tenant_id to UNIQUE(log_id) constraint
   on pack_run_logs in 006_pack_runs.sql (partitioned tables require
   partition key in all unique constraints)
5. Non-immutable index predicate: remove NOW() from partial index
   predicate in 002_backfill.sql
6. Remove BEGIN/COMMIT wrappers from all migration files (the
   StartupMigrationHost already wraps each migration in a transaction)

All 8 orchestrator migrations (001-008) now apply cleanly on fresh DB.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 08:38:20 +02:00
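
Three of the patterns named in the commit above (idempotent DDL, quoting the reserved word `window`, and replacing `DELETE ... LIMIT` with a `ctid` subquery) are sketched below as PostgreSQL statements held in Python constants. The table, type, and column names are hypothetical and are not taken from the actual migration files; `apply_migration` assumes a generic DB-API cursor connected to PostgreSQL.

```python
# Idempotent CREATE TYPE: PostgreSQL has no CREATE TYPE IF NOT EXISTS,
# so the statement is wrapped in a DO block that swallows duplicate_object.
CREATE_TYPE_IDEMPOTENT = """
DO $$
BEGIN
    CREATE TYPE job_status AS ENUM ('pending', 'running', 'done');
EXCEPTION
    WHEN duplicate_object THEN NULL;
END
$$;
"""

# IF NOT EXISTS makes the table creation safe to re-run; "window" is a
# reserved keyword in PostgreSQL and must be double-quoted as a column name.
CREATE_TABLE_IDEMPOTENT = """
CREATE TABLE IF NOT EXISTS slo_quotas (
    tenant_id text NOT NULL,
    "window"  interval NOT NULL
);
"""

# PostgreSQL rejects DELETE ... LIMIT; a bounded delete can instead select
# the physical row ids (ctid) of a limited batch in a subquery.
DELETE_BATCH_BY_CTID = """
DELETE FROM audit_entries
WHERE ctid IN (
    SELECT ctid
    FROM audit_entries
    WHERE created_at < now() - interval '90 days'
    LIMIT 1000
);
"""

MIGRATION_STEPS = [CREATE_TYPE_IDEMPOTENT, CREATE_TABLE_IDEMPOTENT]


def apply_migration(cursor) -> None:
    """Run each step; no BEGIN/COMMIT here, the caller owns the transaction."""
    for statement in MIGRATION_STEPS:
        cursor.execute(statement)
```

Because every step tolerates an existing object, calling `apply_migration` twice against the same database should leave the schema unchanged the second time.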
310e9f84fe fix(web): unify API base URL resolution and repair frontend service clients
- Introduce resolveApiBaseUrl() helper for consistent URL construction
- Fix evidence-pack queries to use public /v1/evidence-packs with runId param
- Resolve notify tenant from active context instead of hard-coded override
- Gate console run stream on concrete run ID (remove synthetic 'last' token)
- Remove unnecessary installed-pack probe from dashboard load
- Expand canonical route inventory with investigation and registry surfaces

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 07:53:46 +02:00
0473a5876a fix(notify): normalize legacy channel config and restore health diagnostics endpoint
- Add legacy channel config normalization for unmapped smtpHost, webhookUrl,
  channel fields into canonical NotifyChannelConfig
- Restore GET /channels/{channelId}/health endpoint
- Add JsonConverter attribute to ChannelHealthStatus enum
- Add test coverage for legacy row shapes and health contract
- Remove hosted services from test override to isolate channel tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 07:53:33 +02:00
481a062a1a fix(jobengine): register startup migrations for orchestrator schema
Wire AddStartupMigrations so JobEngine converges the orchestrator schema
on fresh database or wiped volumes without manual bootstrap scripts.
Adds StellaOps.Infrastructure.Postgres.Migrations dependency.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 07:53:24 +02:00
354654ea84 feat(advisoryai): register runs service and expose canonical /v1/advisory-ai/runs endpoint
- Register RunService and IRunStore (InMemoryRunStore) in DI
- Disambiguate IGuidGenerator namespaces (Chat vs Runs)
- Mount RunEndpoints at canonical /v1/advisory-ai/runs path
- Make RunService public for WebService composition
- Add integration tests for runs authorization and CRUD

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 07:53:17 +02:00
e0c79e0dc0 fix(tools): improve build script discovery and update Verifier to System.CommandLine v8+
Build script:
- Add Get-RepoRelativePath() helper for cross-platform path handling
- Exclude node_modules and bin/obj from solution discovery

Verifier:
- Replace deprecated SetHandler with SetAction handler pattern
- Use GetRequiredValue/GetValue instead of GetValueForOption
- Replace SetDefaultValue with DefaultValueFactory property
- Remove CommandLineBuilder wrapper (built into framework now)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 07:53:08 +02:00
e6094e3b53 fix(project): normalize solution file paths and consolidate Scheduler references
- Normalize path separators in slnf files (forward to backslash)
- Move Scheduler project references from stale src/Scheduler/ to
  correct src/JobEngine/StellaOps.Scheduler.__Libraries/ location
- Remove BOM characters from solution files for consistency
- Fix solution folder labels for Verifier

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 07:52:58 +02:00
69923b648c fix(infra): repair gateway route ownership and add JobEngine/pack-registry scopes
- Route /api/v1/jobengine to jobengine service (was orchestrator)
- Route /api/v1/sources and /api/v1/witnesses to scanner service
- Add orch:quota and pack-registry scopes to platform OIDC token
- Align compose-local manifests with gateway appsettings.json

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 07:52:46 +02:00
841add4f27 perf(router): replace 100ms Valkey polling with Pub/Sub notification wakeup and increase heartbeat to 45s
The Valkey transport layer used 100ms busy-polling loops (Task.Delay(100))
across ~90 concurrent loops in 45+ services, generating ~900 idle
commands/sec and burning ~58% CPU while the system was completely idle.

Replace polling with Redis Pub/Sub notifications:
- Publishers fire PUBLISH after each XADD (fire-and-forget)
- Consumers SUBSCRIBE and wait on SemaphoreSlim with 30s fallback timeout
- Applies to both ValkeyMessageQueue (INotifiableQueue) and ValkeyEventStream
- Non-Valkey transports fall back to 1s polling via QueueWaitExtensions

Increase heartbeat interval from 10s to 45s across all transport options,
with corresponding health threshold adjustments (stale: 135s, degraded: 90s).

Expected idle CPU reduction: ~58% → ~3-5%.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 07:47:31 +02:00
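
The wakeup scheme in the commit above (publish a notification after each append; consumers block on a signal with a fallback timeout instead of polling every 100ms) can be sketched without a Valkey server. This is an in-memory Python analogue, not the project's transport code: `NotifiableQueue` is a hypothetical class, an `asyncio.Event` stands in for the Pub/Sub channel plus `SemaphoreSlim`, and a `deque` stands in for the stream.

```python
import asyncio
from collections import deque
from typing import Optional


class NotifiableQueue:
    """In-memory stand-in for a stream with a pub/sub wakeup channel."""

    def __init__(self) -> None:
        self._items = deque()
        self._wakeup = asyncio.Event()

    def publish(self, item: str) -> None:
        self._items.append(item)  # stands in for XADD
        self._wakeup.set()        # stands in for the fire-and-forget PUBLISH

    async def receive(self, fallback_timeout: float = 30.0) -> Optional[str]:
        """Return the next item, or None if nothing arrived before the timeout."""
        if not self._items:
            self._wakeup.clear()
            try:
                # Sleep until notified; no periodic polling while idle.
                await asyncio.wait_for(self._wakeup.wait(), timeout=fallback_timeout)
            except asyncio.TimeoutError:
                pass  # fallback: re-check the queue even without a notification
        return self._items.popleft() if self._items else None


async def demo():
    queue = NotifiableQueue()
    consumer = asyncio.create_task(queue.receive(fallback_timeout=5.0))
    await asyncio.sleep(0)          # let the consumer start waiting
    queue.publish("job-1")          # wakes the consumer immediately
    first = await consumer
    second = await queue.receive(fallback_timeout=0.01)  # nothing queued
    return first, second


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The idle cost is the point of the design: a waiting consumer issues no commands at all until either a notification or the fallback timeout fires.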
c9686edf07 Restore scratch setup bootstrap and live frontdoor sweep 2026-03-09 01:42:24 +02:00
abda749ffd Add a couple of test:E2e:live npm scripts 2026-03-09 00:19:55 +02:00
b87ffeb237 Repair live releases deployment detail flows 2026-03-09 00:09:01 +02:00
faf6278941 merge: harden derived shared ui components 2026-03-08 23:50:53 +02:00
b55760fc76 fix(web): harden derived shared ui components 2026-03-08 23:49:23 +02:00
d27d68d8c6 feat(web): derive timeline-list into canonical audit-grade event-stream timeline [SPRINT-029]
Rework the orphan TimelineListComponent into a canonical audit-grade
event-stream primitive for all mounted chronology surfaces.

Canonical event model (FE-TLD-001):
- TimelineEvent with id, timestamp (ISO-8601 UTC), title, description,
  actor, eventKind (info/success/warning/error/critical/neutral), icon,
  evidenceLink, metadata key-value pairs, and expandable detail payload
- Relative time for <24h, absolute UTC for >=24h, full ISO on tooltip
- Date grouping when events span multiple days

Derived primitive (FE-TLD-002):
- Vertical timeline with colored severity markers
- Deterministic UTC timestamp formatting
- Expandable detail sections with expand/collapse toggle
- Optional actor, metadata chips, and evidence links
- Loading skeleton and empty state
- Accessibility: role="feed", role="article", aria-labels, datetime attrs
- Content projection via ng-template for domain-specific rendering

Adopted on 3 surfaces (FE-TLD-003):
- incident-timeline: replaces bespoke inline timeline markers with shared
  component; preserves affected-services chips and correlated-events via
  expandable and content projection
- audit-timeline-search: replaces bespoke timeline rendering; preserves
  module/action badges via content projection
- releases-activity: replaces timeline view mode (was rendering duplicate
  table) with canonical timeline; preserves lane/env/outcome chips

Tests (FE-TLD-004): 32 focused tests covering event rendering, severity
markers, timestamp formatting, expandable toggle, loading/empty states,
date grouping, accessibility, and default fallbacks.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 23:23:23 +02:00
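
The timestamp rule from the commit above (relative time for events younger than 24h, absolute UTC otherwise) can be sketched as a small pure function. This is a Python illustration, not the Angular component's code: the function name and the exact label strings ("just now", "5m ago", the absolute format) are assumptions, as is rendering future timestamps in absolute form.

```python
from datetime import datetime, timedelta, timezone


def format_event_time(event_time: datetime, now: datetime) -> str:
    """Format a timezone-aware event timestamp relative to `now`."""
    age = now - event_time
    if timedelta(0) <= age < timedelta(hours=24):
        minutes = int(age.total_seconds() // 60)
        if minutes < 1:
            return "just now"
        if minutes < 60:
            return f"{minutes}m ago"
        return f"{minutes // 60}h ago"
    # 24h or older (or in the future): deterministic absolute UTC form.
    return event_time.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
```

Passing `now` explicitly, rather than reading the clock inside the function, keeps the output deterministic and makes the 24h boundary easy to test.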
12a6ef831b feat(web): derive page-header into canonical context-header with unified header contract [SPRINT-027]
Enhance ContextHeaderComponent to be the single canonical header primitive:
- Add configurable heading level (h1/h2/h3) for semantic HTML in nested shells
- Add testId input for Playwright targeting (data-testid)
- Add ARIA labels on return button and chip list (role=list/listitem)
- Add back-arrow indicator for improved return-button affordance
- Add JSDoc on all inputs for developer ergonomics

Deprecate PageHeaderComponent to a thin compatibility wrapper that delegates
to ContextHeaderComponent.

Adopt canonical header on 4 representative pages:
- RegistryAdminComponent (admin/setup surface)
- PackRegistryBrowserComponent (operational surface)
- DeadLetterDashboardComponent (operational surface)
- OfflineKitComponent (operational surface)

Each adopted page gains eyebrow breadcrumb context, consistent subtitle
placement, and projected actions via the shared header-actions slot,
replacing ~80 lines of repeated ad-hoc header markup.

15 focused component tests covering title rendering, eyebrow/subtitle
display, chips with ARIA, back action, action slot projection, heading
levels, testId, and responsive layout structure. All pass.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 23:20:21 +02:00
d7f55b72c8 feat(web): derive witness-viewer into reusable proof-inspection sections for mounted surfaces [SPRINT-031]
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 23:05:45 +02:00