Commit Graph

823 Commits

Author SHA1 Message Date
master
166745f9f9 Reduce idle CPU across 62 containers (phase 1)
- Add resource limits (heavy/medium/light tiers) to all 59 .NET services
- Add .NET GC tuning (server/workstation GC, DATAS, conserve memory)
- Convert FirstSignalSnapshotWriter from 10s polling to Valkey pub/sub
- Convert EnvironmentSettingsRefreshService from 60s polling to Valkey pub/sub
- Consolidate GraphAnalytics dual timers to single timer with idle-skip
- Increase healthcheck interval from 30s to 60s (configurable)
- Reduce debug logging to Information on 4 high-traffic services

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 02:16:19 +02:00
master
c0c0267ac9 Normalize live policy simulation tenant routing 2026-03-10 02:14:29 +02:00
master
72084355a6 Align policy simulation auth passthrough at the frontdoor 2026-03-10 01:55:51 +02:00
master
d16d7a1692 Repair live JobEngine runtime contracts 2026-03-10 01:38:38 +02:00
master
7be7295597 Keep approval queue on live canonical contracts 2026-03-10 01:38:21 +02:00
master
4a13601207 Adapt live frontend clients for compatibility data 2026-03-10 01:38:10 +02:00
master
18246cd74c Align live console and policy governance clients 2026-03-10 01:37:42 +02:00
master
afb9711e61 Restore live platform compatibility contracts 2026-03-10 01:37:24 +02:00
master
6b7168ca3c Bind startup migrations to module schema search path 2026-03-10 01:37:02 +02:00
master
1df79ac75e Restore policy simulation history compatibility 2026-03-10 00:42:18 +02:00
master
ac544c0064 Repair live watchlist frontdoor routing 2026-03-10 00:25:34 +02:00
master
359fafa9da Repair release investigation workspace contracts 2026-03-09 23:19:42 +02:00
master
3ecafc49a3 Preserve live scope across evidence and registry flows 2026-03-09 22:11:08 +02:00
master
dfd22281ed Repair live canonical migrations and scanner cache bootstrap 2026-03-09 21:56:41 +02:00
master
00bf2fa99a Repair live unified search corpus runtime 2026-03-09 19:44:16 +02:00
master
bf937c9395 Repair router frontdoor convergence and live route contracts 2026-03-09 19:09:19 +02:00
master
49d1c57597 Align live titles and trust setup overview 2026-03-09 11:20:19 +02:00
master
29fec722df docs(sprint): close sprints 001/003/004/005 — all tasks verified DONE
Mark all remaining TODO/DOING tasks as DONE with live probe evidence:
- Sprint 001 Task 003: 36/36 solutions build successfully
- Sprint 003 Task 003: sources=200, witnesses=200, advisory-ai/runs=403
- Sprint 004 Task 003: channels=200, rules=200, deliveries=200
- Sprint 005 Task 003: JobEngine healthy, all 8 migrations applied,
  jobs/runs/pack-runs routes respond 403 (scope auth, not schema)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 08:38:31 +02:00
master
1e53976ffb fix(jobengine): make all orchestrator migration SQL idempotent and PostgreSQL-compatible
Fix 6 classes of issues that prevented JobEngine from auto-migrating:
1. Non-idempotent DDL: add IF NOT EXISTS to CREATE TABLE, wrap CREATE
   TYPE in DO blocks with EXCEPTION WHEN duplicate_object, wrap partition
   creation with EXCEPTION WHEN duplicate_object OR SQLSTATE '42P17'
2. Reserved keyword: quote `window` column name in 004_slo_quotas.sql
3. Invalid syntax: replace DELETE...LIMIT with ctid subquery pattern
   in 004_slo_quotas.sql and 005_audit_ledger.sql
4. Partition constraint: add tenant_id to UNIQUE(log_id) constraint
   on pack_run_logs in 006_pack_runs.sql (partitioned tables require
   partition key in all unique constraints)
5. Non-immutable index predicate: remove NOW() from partial index
   predicate in 002_backfill.sql
6. Remove BEGIN/COMMIT wrappers from all migration files (the
   StartupMigrationHost already wraps each migration in a transaction)

All 8 orchestrator migrations (001-008) now apply cleanly on fresh DB.
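
The idempotency patterns listed above can be sketched roughly as follows. This is an illustration, not the content of the actual migration files: the table, type, and column names (example_jobs, example_status, audit_entries) are hypothetical, and the PostgreSQL statements are held in Python strings only to keep the sketch self-contained.

```python
# Hypothetical names throughout; none are taken from the real migrations.

# Pattern 1a: CREATE TABLE guarded with IF NOT EXISTS.
CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS example_jobs (
    job_id UUID PRIMARY KEY,
    tenant_id TEXT NOT NULL
);
"""

# Pattern 1b: CREATE TYPE has no IF NOT EXISTS form, so it is wrapped in a
# DO block that swallows duplicate_object on a rerun.
CREATE_TYPE = """
DO $$
BEGIN
    CREATE TYPE example_status AS ENUM ('pending', 'done');
EXCEPTION WHEN duplicate_object THEN
    NULL;  -- type already exists, so a rerun is a no-op
END
$$;
"""


def batched_delete(table: str, predicate: str, batch_size: int) -> str:
    """Pattern 3: build a bounded DELETE via a ctid subquery.

    PostgreSQL does not accept DELETE ... LIMIT, so the LIMIT moves into a
    subquery that selects the physical row ids to remove.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return (
        f"DELETE FROM {table} WHERE ctid IN ("
        f"SELECT ctid FROM {table} WHERE {predicate} LIMIT {batch_size});"
    )
```

Calling batched_delete("audit_entries", "created_at < '2026-01-01'", 500) yields a single statement that removes at most 500 matching rows per execution.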

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 08:38:20 +02:00
master
71db8d4386 docs(sprint): add sprint 003-005 planning and update sprint 002 log
- SPRINT 003: Router frontdoor contract repair tasks
- SPRINT 004: Notify service and AI runs repair tasks
- SPRINT 005: JobEngine migration and scope repair tasks
- Update sprint 002 execution log with expanded route inventory

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 07:53:56 +02:00
master
310e9f84fe fix(web): unify API base URL resolution and repair frontend service clients
- Introduce resolveApiBaseUrl() helper for consistent URL construction
- Fix evidence-pack queries to use public /v1/evidence-packs with runId param
- Resolve notify tenant from active context instead of hard-coded override
- Gate console run stream on concrete run ID (remove synthetic 'last' token)
- Remove unnecessary installed-pack probe from dashboard load
- Expand canonical route inventory with investigation and registry surfaces

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 07:53:46 +02:00
master
0473a5876a fix(notify): normalize legacy channel config and restore health diagnostics endpoint
- Add legacy channel config normalization for unmapped smtpHost, webhookUrl,
  channel fields into canonical NotifyChannelConfig
- Restore GET /channels/{channelId}/health endpoint
- Add JsonConverter attribute to ChannelHealthStatus enum
- Add test coverage for legacy row shapes and health contract
- Remove hosted services from test override to isolate channel tests
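
A minimal sketch of what legacy-row normalization could look like, assuming the canonical config keeps free-form settings in a "properties" map. That shape is an assumption made for illustration and may not match the real NotifyChannelConfig; only the legacy field names smtpHost, webhookUrl, and channel come from the message above.

```python
# The "properties" container is a hypothetical canonical shape.
LEGACY_KEYS = ("smtpHost", "webhookUrl", "channel")


def normalize_channel_config(raw: dict) -> dict:
    """Fold legacy top-level fields into the (assumed) properties map."""
    # Copy everything that is already canonical.
    config = {k: v for k, v in raw.items() if k not in LEGACY_KEYS}
    properties = dict(config.get("properties") or {})
    for key in LEGACY_KEYS:
        if raw.get(key) is not None:
            # An explicit canonical value wins over the legacy one.
            properties.setdefault(key, raw[key])
    config["properties"] = properties
    return config
```

A row that is already canonical passes through unchanged, which keeps the normalization safe to apply to every stored row.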

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 07:53:33 +02:00
master
481a062a1a fix(jobengine): register startup migrations for orchestrator schema
Wire AddStartupMigrations so JobEngine converges the orchestrator schema
on fresh database or wiped volumes without manual bootstrap scripts.
Adds StellaOps.Infrastructure.Postgres.Migrations dependency.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 07:53:24 +02:00
master
354654ea84 feat(advisoryai): register runs service and expose canonical /v1/advisory-ai/runs endpoint
- Register RunService and IRunStore (InMemoryRunStore) in DI
- Disambiguate IGuidGenerator namespaces (Chat vs Runs)
- Mount RunEndpoints at canonical /v1/advisory-ai/runs path
- Make RunService public for WebService composition
- Add integration tests for runs authorization and CRUD

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 07:53:17 +02:00
master
e0c79e0dc0 fix(tools): improve build script discovery and update Verifier to System.CommandLine v8+
Build script:
- Add Get-RepoRelativePath() helper for cross-platform path handling
- Exclude node_modules and bin/obj from solution discovery

Verifier:
- Replace deprecated SetHandler with SetAction handler pattern
- Use GetRequiredValue/GetValue instead of GetValueForOption
- Replace SetDefaultValue with DefaultValueFactory property
- Remove CommandLineBuilder wrapper (built into framework now)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 07:53:08 +02:00
master
e6094e3b53 fix(project): normalize solution file paths and consolidate Scheduler references
- Normalize path separators in slnf files (forward slash to backslash)
- Move Scheduler project references from stale src/Scheduler/ to
  correct src/JobEngine/StellaOps.Scheduler.__Libraries/ location
- Remove BOM characters from solution files for consistency
- Fix solution folder labels for Verifier

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 07:52:58 +02:00
master
69923b648c fix(infra): repair gateway route ownership and add JobEngine/pack-registry scopes
- Route /api/v1/jobengine to jobengine service (was orchestrator)
- Route /api/v1/sources and /api/v1/witnesses to scanner service
- Add orch:quota and pack-registry scopes to platform OIDC token
- Align compose-local manifests with gateway appsettings.json

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 07:52:46 +02:00
master
841add4f27 perf(router): replace 100ms Valkey polling with Pub/Sub notification wakeup and increase heartbeat to 45s
The Valkey transport layer used 100ms busy-polling loops (Task.Delay(100))
across ~90 concurrent loops in 45+ services, generating ~900 idle
commands/sec and burning ~58% CPU while the system was completely idle.

Replace polling with Redis Pub/Sub notifications:
- Publishers fire PUBLISH after each XADD (fire-and-forget)
- Consumers SUBSCRIBE and wait on SemaphoreSlim with 30s fallback timeout
- Applies to both ValkeyMessageQueue (INotifiableQueue) and ValkeyEventStream
- Non-Valkey transports fall back to 1s polling via QueueWaitExtensions

Increase heartbeat interval from 10s to 45s across all transport options,
with corresponding health threshold adjustments (stale: 135s, degraded: 90s).

Expected idle CPU reduction: ~58% → ~3-5%.
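
The wakeup pattern described above can be sketched with an in-memory stand-in. This is not the StellaOps transport code: NotifiableQueue and idle_commands_per_second are hypothetical names, an asyncio.Event plays the role of the Pub/Sub notification, and a list plays the role of the stream.

```python
import asyncio


class NotifiableQueue:
    """In-memory stand-in for a stream plus a Pub/Sub wakeup channel."""

    def __init__(self) -> None:
        self._items: list[str] = []
        self._wakeup = asyncio.Event()

    def publish(self, item: str) -> None:
        self._items.append(item)  # stands in for XADD
        self._wakeup.set()        # stands in for PUBLISH

    async def receive(self, fallback_timeout: float = 30.0) -> list[str]:
        """Wait for a notification, or give up after the fallback timeout."""
        if not self._items:
            try:
                await asyncio.wait_for(self._wakeup.wait(), fallback_timeout)
            except asyncio.TimeoutError:
                pass  # fallback path: check the queue anyway
        self._wakeup.clear()
        items, self._items = self._items, []
        return items


def idle_commands_per_second(loops: int, poll_interval_ms: int) -> int:
    """Commands issued per second by idle polling loops."""
    return loops * 1000 // poll_interval_ms
```

With the figures from the message, idle_commands_per_second(90, 100) gives 900, matching the ~900 idle commands/sec quoted above. The new health thresholds also work out to whole multiples of the 45s heartbeat: degraded at 2 x 45s = 90s and stale at 3 x 45s = 135s.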

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 07:47:31 +02:00
master
f218ec82ec Speed up scratch image builds with publish-first contexts 2026-03-09 07:37:24 +02:00
master
c9686edf07 Restore scratch setup bootstrap and live frontdoor sweep 2026-03-09 01:42:24 +02:00
master
abda749ffd add couple of test:E2e:live npm starts 2026-03-09 00:19:55 +02:00
master
b87ffeb237 Repair live releases deployment detail flows 2026-03-09 00:09:01 +02:00
master
faf6278941 merge: harden derived shared ui components 2026-03-08 23:50:53 +02:00
master
b55760fc76 fix(web): harden derived shared ui components 2026-03-08 23:49:23 +02:00
master
d27d68d8c6 feat(web): derive timeline-list into canonical audit-grade event-stream timeline [SPRINT-029]
Rework the orphan TimelineListComponent into a canonical audit-grade
event-stream primitive for all mounted chronology surfaces.

Canonical event model (FE-TLD-001):
- TimelineEvent with id, timestamp (ISO-8601 UTC), title, description,
  actor, eventKind (info/success/warning/error/critical/neutral), icon,
  evidenceLink, metadata key-value pairs, and expandable detail payload
- Relative time for <24h, absolute UTC for >=24h, full ISO timestamp in tooltip
- Date grouping when events span multiple days
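
The relative/absolute cutoff above could be implemented along these lines. This is a sketch under assumptions: format_event_time and the exact output strings ("5m ago", "2026-03-07 12:00 UTC") are hypothetical, and only the 24-hour threshold and the use of UTC come from the message.

```python
from datetime import datetime, timedelta, timezone


def format_event_time(timestamp: datetime, now: datetime) -> str:
    """Relative label under 24h of age, absolute UTC at 24h or more."""
    age = now - timestamp
    if age >= timedelta(hours=24):
        utc = timestamp.astimezone(timezone.utc)
        return utc.strftime("%Y-%m-%d %H:%M UTC")
    minutes = int(age.total_seconds() // 60)
    if minutes < 1:
        return "just now"  # also covers timestamps slightly in the future
    if minutes < 60:
        return f"{minutes}m ago"
    return f"{minutes // 60}h ago"
```

Passing now explicitly, rather than reading the clock inside the function, keeps the output deterministic in tests.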

Derived primitive (FE-TLD-002):
- Vertical timeline with colored severity markers
- Deterministic UTC timestamp formatting
- Expandable detail sections with expand/collapse toggle
- Optional actor, metadata chips, and evidence links
- Loading skeleton and empty state
- Accessibility: role="feed", role="article", aria-labels, datetime attrs
- Content projection via ng-template for domain-specific rendering

Adopted on 3 surfaces (FE-TLD-003):
- incident-timeline: replaces bespoke inline timeline markers with shared
  component; preserves affected-services chips and correlated-events via
  expandable detail and content projection
- audit-timeline-search: replaces bespoke timeline rendering; preserves
  module/action badges via content projection
- releases-activity: replaces timeline view mode (was rendering duplicate
  table) with canonical timeline; preserves lane/env/outcome chips

Tests (FE-TLD-004): 32 focused tests covering event rendering, severity
markers, timestamp formatting, expandable toggle, loading/empty states,
date grouping, accessibility, and default fallbacks.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 23:23:23 +02:00
master
12a6ef831b feat(web): derive page-header into canonical context-header with unified header contract [SPRINT-027]
Enhance ContextHeaderComponent to be the single canonical header primitive:
- Add configurable heading level (h1/h2/h3) for semantic HTML in nested shells
- Add testId input for Playwright targeting (data-testid)
- Add ARIA labels on return button and chip list (role=list/listitem)
- Add back-arrow indicator for improved return-button affordance
- Add JSDoc on all inputs for developer ergonomics

Deprecate PageHeaderComponent to a thin compatibility wrapper that delegates
to ContextHeaderComponent.

Adopt canonical header on 4 representative pages:
- RegistryAdminComponent (admin/setup surface)
- PackRegistryBrowserComponent (operational surface)
- DeadLetterDashboardComponent (operational surface)
- OfflineKitComponent (operational surface)

Each adopted page gains eyebrow breadcrumb context, consistent subtitle
placement, and projected actions via the shared header-actions slot,
replacing ~80 lines of repeated ad-hoc header markup.

15 focused component tests covering title rendering, eyebrow/subtitle
display, chips with ARIA, back action, action slot projection, heading
levels, testId, and responsive layout structure. All pass.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 23:20:21 +02:00
master
d7f55b72c8 feat(web): derive witness-viewer into reusable proof-inspection sections for mounted surfaces [SPRINT-031]
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 23:05:45 +02:00
master
2bf4d69bba feat(web): rationalize settings IA into personal-preferences shell with admin rehoming [SPRINT-026]
Settings shell now owns only personal user preferences (appearance,
language, layout, AI assistant). All 14 admin/tenant/ops leaves
converted to controlled redirects pointing at their canonical owners
(Administration, Setup, Ops). Language merged into user-preferences.
Identity-providers rehomed from settings to administration as
canonical owner. Navigation config updated. 22 new route tests added.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 22:59:38 +02:00
master
ce59f66e97 feat(web): consolidate split-pane into list-detail-shell as canonical master-detail layout [SPRINT-030]
Extend ListDetailShellComponent with collapsible toggle button, detail panel
slide-in animation, and accessibility roles (complementary, aria-controls,
focus-visible). Adopt on signing-key-dashboard (trust-admin) for side-by-side
key list + detail browsing. Deprecate SplitPaneComponent. Add 15 focused
component tests covering rendering, toggle behavior, and accessibility.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 22:59:02 +02:00
master
622f015421 Backfill live auth scope and evidence route metadata 2026-03-08 22:56:55 +02:00
master
d7c3d5ad62 feat(web): derive metric-card into canonical KPI card with semantic delta handling [SPRINT-028]
Rework MetricCardComponent from a basic label+value+delta card into the
canonical Stella Ops KPI card primitive with:

- deltaDirection input ('up-is-good' | 'up-is-bad' | 'neutral') to control
  green/red semantics per metric context
- severity input ('healthy' | 'warning' | 'critical' | 'unknown') for
  left-border health accents
- unit input for display units (ms, %, /hr, GB)
- loading, empty, and error states with skeleton/placeholder rendering
- ARIA accessibility (role="group", composite aria-label, delta labels)
- Responsive dense-grid support
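
The deltaDirection semantics above reduce to a small decision rule, sketched here. The function name delta_tone and the returned labels "good", "bad", and "neutral" are hypothetical; the three direction values are the ones listed in the message.

```python
def delta_tone(delta: float, direction: str) -> str:
    """Map a metric delta to a colour tone for the given direction policy."""
    if direction not in ("up-is-good", "up-is-bad", "neutral"):
        raise ValueError(f"unknown deltaDirection: {direction}")
    if direction == "neutral" or delta == 0:
        return "neutral"
    # An increase is an improvement only when up-is-good; a decrease is an
    # improvement only when up-is-bad.
    improved = (delta > 0) == (direction == "up-is-good")
    return "good" if improved else "bad"
```

For example, a latency card would use "up-is-bad" so that a drop renders as an improvement, while a throughput card would use "up-is-good".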

Adopted on 3 representative dashboards (12 bespoke tiles replaced):
- signals-runtime-dashboard (3 cards)
- search-quality-dashboard (4 cards)
- delivery-analytics (5 cards)

40 focused tests covering delta direction semantics, all states, severity
accents, and accessibility.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 22:55:54 +02:00
master
5d5f4de2e1 Refine live Playwright changed-surface checks 2026-03-08 22:55:12 +02:00
master
4f445ad951 Fix live evidence and registry auth contracts 2026-03-08 22:54:36 +02:00
master
aa7e0e937c chore(web): prune dead ui cleanup artifacts 2026-03-08 21:59:38 +02:00
master
6efed23647 archive these 2026-03-08 21:41:38 +02:00
master
f40043ed50 fix(web): remediate orphan revival regressions 2026-03-08 20:23:37 +02:00
master
d6b2e354f0 docs(ui): update task board and plan for orphan revival batch [SPRINT-013..023]
Sync TASKS.md, implementation_plan.md, and orphan-revival-batch README
to reflect all 11 shipped orphan component adoption sprints.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 19:25:48 +02:00
master
2a25e7b2b0 feat(ui): reconnect registry-admin under integration hub [SPRINT-023]
Mount registry-admin routes under canonical /ops/integrations (and
/setup/integrations) with plans list, editor, and audit flows reachable
from integration-hub entry points.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 19:25:40 +02:00
master
38cbdb79dd feat(ui): reconnect release investigation routes [SPRINT-022]
Mount deploy-diff, change-trace, and timeline under /releases/investigation
as bounded secondary routes. Timeline uses correlation-based model to avoid
collision with shipped run-workspace tab.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 19:25:38 +02:00
master
1b934ad47a feat(ui): reconnect evidence-thread and persona workspace routes [SPRINT-021]
Mount evidence-thread, auditor-workspace, and developer-workspace routes
under canonical /evidence family as drill-in lenses, not standalone shells.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 19:25:32 +02:00