Update docs, sprint plans, and compose configuration

Add 12 new sprint files (Integrations, Graph, JobEngine, FE, Router,
AdvisoryAI), archive completed scheduler UI sprint, update module
architecture docs (router, graph, jobengine, web, integrations),
and add Gitea entrypoint script for local dev.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in: master
2026-04-06 08:53:50 +03:00
parent 8e823792a3
commit 50abd2137f
36 changed files with 1723 additions and 69 deletions

View File

@@ -9,8 +9,9 @@ Consolidated Docker Compose configuration for the StellaOps platform. All profil
| Run the full platform | `docker compose -f docker-compose.stella-ops.yml up -d` |
| Add observability | `docker compose -f docker-compose.stella-ops.yml -f docker-compose.telemetry.yml up -d` |
| Start QA integration fixtures | `docker compose -f docker-compose.integration-fixtures.yml up -d` |
| Start 3rd-party integration services | `docker compose -f docker-compose.integrations.yml up -d` |
| Start GitLab CE (heavy, ~4 GB RAM) | `docker compose -f docker-compose.integrations.yml --profile heavy up -d gitlab` |
| Start the default low-idle 3rd-party integration lane | `docker compose -f docker-compose.integrations.yml up -d` |
| Start Consul KV only when needed | `docker compose -f docker-compose.integrations.yml --profile consul up -d consul` |
| Start GitLab CE (heavy, low-idle defaults) | `docker compose -f docker-compose.integrations.yml --profile heavy up -d gitlab` |
| Run integration E2E test suite | See [Integration Test Suite](#integration-test-suite) |
| Run CI/testing infrastructure | `docker compose -f docker-compose.testing.yml --profile ci up -d` |
| Deploy with China compliance | See [China Compliance](#china-compliance-sm2sm3sm4) |
@@ -30,7 +31,7 @@ Consolidated Docker Compose configuration for the StellaOps platform. All profil
| `docker-compose.telemetry.yml` | **Observability**: OpenTelemetry collector, Prometheus, Tempo, Loki |
| `docker-compose.testing.yml` | **CI/Testing**: Test databases, mock services, Gitea for integration tests |
| `docker-compose.dev.yml` | **Minimal dev infrastructure**: PostgreSQL, Valkey, RustFS only |
| `docker-compose.integrations.yml` | **Integration services**: Gitea, Jenkins, Nexus, Vault, Docker Registry, MinIO, GitLab |
| `docker-compose.integrations.yml` | **Integration services**: Gitea, Jenkins, Nexus, Vault, Docker Registry, MinIO, plus opt-in Consul and GitLab |
### Specialized Infrastructure
@@ -158,6 +159,8 @@ docker compose -f docker-compose.stella-ops.yml up -d
pwsh ./scripts/router-mode-redeploy.ps1 -Mode microservice
```
The local compose defaults intentionally keep router control traffic calm: `ROUTER_MESSAGING_HEARTBEAT_INTERVAL` defaults to `30s` so the stack does not churn small heartbeat traffic every 10 seconds across the full service fleet. Messaging endpoint/schema/OpenAPI replay is no longer periodic; it now happens on service startup, gateway-state recovery, or explicit administration resync. `ROUTER_REGISTRATION_REFRESH_INTERVAL_SECONDS` remains exposed only as a compatibility knob for older assumptions and non-messaging experiments.
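The `${VAR:-default}` knobs above follow standard POSIX parameter-expansion rules, which Docker Compose mirrors when interpolating environment variables. A minimal sketch (not part of the compose files, purely illustrative) of how the heartbeat default resolves:

```shell
# Compose-style "${VAR:-default}" interpolation: an unset (or empty) variable
# falls back to the default; an exported value wins.
unset ROUTER_MESSAGING_HEARTBEAT_INTERVAL
echo "${ROUTER_MESSAGING_HEARTBEAT_INTERVAL:-30s}"   # falls back to the 30s default

export ROUTER_MESSAGING_HEARTBEAT_INTERVAL=60s
echo "${ROUTER_MESSAGING_HEARTBEAT_INTERVAL:-30s}"   # the override takes effect
```

Export the variable (or set it in `.env`) before `docker compose up` to change the interval fleet-wide.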
Validation endpoints:
```bash
@@ -241,20 +244,31 @@ These fixtures are deterministic QA aids only; they are not production dependenc
Real 3rd-party services for local integration testing. Unlike the QA fixtures above (which are nginx mocks), these are fully functional instances that exercise actual connector plugin code paths.
```bash
# Start all lightweight integration services (after the main stack is up)
# Start the default low-idle integration lane (after the main stack is up)
docker compose -f docker-compose.integrations.yml up -d
# Start specific services only
docker compose -f docker-compose.integrations.yml up -d gitea vault jenkins
# Start Consul only when you need the Consul connector
docker compose -f docker-compose.integrations.yml --profile consul up -d consul
# Start GitLab CE (heavy — requires ~4 GB RAM, ~3 min startup)
# Default GitLab tuning keeps SCM/API coverage and disables registry extras.
docker compose -f docker-compose.integrations.yml --profile heavy up -d gitlab
# Re-enable GitLab registry/package surfaces for dedicated registry tests
GITLAB_ENABLE_REGISTRY=true GITLAB_ENABLE_PACKAGES=true \
docker compose -f docker-compose.integrations.yml --profile heavy up -d gitlab
# Combine with mock fixtures for full coverage
docker compose \
-f docker-compose.integrations.yml \
-f docker-compose.integration-fixtures.yml \
up -d
# Confirm the deterministic Gitea bootstrap completed
docker compose -f docker-compose.integrations.yml ps gitea
```
**Hosts file entries** (add to `C:\Windows\System32\drivers\etc\hosts`):
@@ -266,19 +280,21 @@ docker compose \
127.1.2.5 registry.stella-ops.local
127.1.2.6 minio.stella-ops.local
127.1.2.7 gitlab.stella-ops.local
127.1.2.8 consul.stella-ops.local
```
**Service reference:**
| Service | Type | Address | Credentials | Integration Provider |
|---------|------|---------|-------------|---------------------|
| Gitea | SCM | `http://gitea.stella-ops.local:3000` | Create on first login | `Gitea` |
| Gitea | SCM | `http://gitea.stella-ops.local:3000` | `stellaops` / `Stella2026!` on fresh volumes | `Gitea` |
| Jenkins | CI/CD | `http://jenkins.stella-ops.local:8080` | Setup wizard disabled | `Jenkins` |
| Nexus | Registry | `http://nexus.stella-ops.local:8081` | admin / see `admin.password` | `Nexus` |
| Vault | Secrets | `http://vault.stella-ops.local:8200` | Token: `stellaops-dev-root-token-2026` | — |
| Consul | Settings/KV | `http://consul.stella-ops.local:8500` | none (single-node local server, opt-in profile) | `Consul` |
| Docker Registry | Registry | `http://registry.stella-ops.local:5000` | None (open dev) | `DockerHub` |
| MinIO | S3 Storage | `http://minio.stella-ops.local:9001` | `stellaops` / `Stella2026!` | — |
| GitLab CE | SCM+CI+Registry | `http://gitlab.stella-ops.local:8929` | root / `Stella2026!` | `GitLabServer` |
| GitLab CE | SCM+CI(+Registry opt-in) | `http://gitlab.stella-ops.local:8929` | root / `Stella2026!` | `GitLabServer` |
**Credential resolution:** Integration connectors resolve secrets via `authref://vault/{path}#{key}` URIs. The Integrations service resolves these from Vault automatically in dev mode. Store credentials with:
@@ -294,6 +310,10 @@ vault kv put secret/jenkins api-token="user:token"
vault kv put secret/nexus admin-password="your-password"
```
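The anatomy of an AuthRef URI can be illustrated with plain parameter expansion. This is a hypothetical sketch only: the Integrations service performs this resolution internally, and the `secret/` mount prefix shown is an assumption about the dev Vault layout.

```shell
# Split "authref://vault/{path}#{key}" into its Vault path and key.
authref='authref://vault/gitlab#access-token'

mount_and_path="${authref#authref://vault/}"   # -> gitlab#access-token
secret_path="${mount_and_path%%#*}"            # -> gitlab
secret_key="${authref##*#}"                    # -> access-token

# Equivalent manual lookup against the dev Vault (path prefix assumed):
echo "vault kv get -field=${secret_key} secret/${secret_path}"
```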
Gitea is now bootstrapped by the compose service itself: when the `stellaops-gitea-data` volume is fresh, the entrypoint script creates the default local admin user and the repository root before the container reports healthy. Personal access tokens remain a manual step because Gitea only reveals a token's value at creation time.
`docker-compose.testing.yml` is a separate infrastructure-test lane. It starts `postgres-test`, `valkey-test`, mocks, and an isolated Gitea profile on different ports; it does not start Consul or GitLab. Use `docker-compose.integrations.yml` only when you need real third-party providers for connector validation.
**Backend connector plugins** (8 total, loaded in Integrations service):
| Plugin | Type | Provider | Health Endpoint |
@@ -327,6 +347,7 @@ vault kv put secret/nexus admin-password="your-password"
| 127.1.2.5 | docker-registry | 5000 |
| 127.1.2.6 | minio | 9000, 9001 |
| 127.1.2.7 | gitlab (heavy) | 8929, 2224, 5050 |
| 127.1.2.8 | consul (optional) | 8500 |
For detailed setup instructions per service, see [`docs/integrations/LOCAL_SERVICES.md`](../../docs/integrations/LOCAL_SERVICES.md).
@@ -354,6 +375,9 @@ docker compose -f docker-compose.integration-fixtures.yml up -d
# 3. Start real 3rd-party services
docker compose -f docker-compose.integrations.yml up -d
# 3a. (Optional) Start Consul only when validating the Consul connector
docker compose -f docker-compose.integrations.yml --profile consul up -d consul
# 4. (Optional) Start GitLab for full SCM coverage
docker compose -f docker-compose.integrations.yml --profile heavy up -d gitlab
```

View File

@@ -8,12 +8,16 @@
# The main stellaops network must exist (started via docker-compose.stella-ops.yml).
#
# Usage:
# # Start all integration services
# # Start the default low-idle integration lane
# docker compose -f devops/compose/docker-compose.integrations.yml up -d
#
# # Start specific services only
# docker compose -f devops/compose/docker-compose.integrations.yml up -d gitea jenkins vault
#
# # Start optional higher-idle providers only when needed
# docker compose -f devops/compose/docker-compose.integrations.yml --profile consul up -d consul
# docker compose -f devops/compose/docker-compose.integrations.yml --profile heavy up -d gitlab
#
# # Start integration services + mock fixtures together
# docker compose \
# -f devops/compose/docker-compose.integrations.yml \
@@ -42,8 +46,8 @@ networks:
volumes:
gitea-data:
name: stellaops-gitea-data
gitea-db:
name: stellaops-gitea-db
gitea-config:
name: stellaops-gitea-config
jenkins-data:
name: stellaops-jenkins-data
nexus-data:
@@ -60,6 +64,8 @@ volumes:
name: stellaops-gitlab-data
gitlab-logs:
name: stellaops-gitlab-logs
consul-data:
name: stellaops-consul-data
services:
# ===========================================================================
@@ -67,39 +73,49 @@ services:
# ===========================================================================
# Integration type: SCM (Gitea provider)
# URL: http://gitea.stella-ops.local:3000
# Admin: stellaops / Stella2026!
# Admin: stellaops / Stella2026! (fresh volumes auto-bootstrap on container start)
# API: http://gitea.stella-ops.local:3000/api/v1
# ===========================================================================
gitea:
image: gitea/gitea:1.22-rootless
container_name: stellaops-gitea
restart: unless-stopped
entrypoint: ["/bin/sh", "/stellaops-gitea-entrypoint.sh"]
ports:
- "127.1.2.1:3000:3000"
- "127.1.2.1:2222:2222"
environment:
- GITEA__database__DB_TYPE=sqlite3
- GITEA__database__PATH=/var/lib/gitea/data/gitea.db
- GITEA__server__ROOT_URL=http://gitea.stella-ops.local:3000
- GITEA__server__DOMAIN=gitea.stella-ops.local
- GITEA__server__HTTP_PORT=3000
- GITEA__server__SSH_PORT=2222
- GITEA__server__SSH_DOMAIN=gitea.stella-ops.local
- GITEA__service__DISABLE_REGISTRATION=false
- GITEA__service__DISABLE_REGISTRATION=true
- GITEA__service__REQUIRE_SIGNIN_VIEW=false
- GITEA__actions__ENABLED=true
- GITEA__api__ENABLE_SWAGGER=true
- GITEA__security__INSTALL_LOCK=true
- GITEA__security__SECRET_KEY=stellaops-dev-secret-key-2026
- GITEA__security__INTERNAL_TOKEN=stellaops-internal-token-2026-dev
- GITEA_LOCAL_ADMIN_USERNAME=stellaops
- GITEA_LOCAL_ADMIN_PASSWORD=Stella2026!
- GITEA_LOCAL_ADMIN_EMAIL=stellaops@gitea.stella-ops.local
volumes:
- gitea-data:/var/lib/gitea
- gitea-db:/var/lib/gitea/db
- gitea-config:/etc/gitea
- ./scripts/gitea-entrypoint.sh:/stellaops-gitea-entrypoint.sh:ro
networks:
stellaops:
aliases:
- gitea.stella-ops.local
healthcheck:
test: ["CMD-SHELL", "wget -qO- http://localhost:3000/api/v1/version || exit 1"]
test:
[
"CMD-SHELL",
"wget -qO- http://localhost:3000/api/v1/version >/dev/null 2>&1 && test -f /var/lib/gitea/data/.local-admin-ready"
]
interval: 30s
timeout: 10s
retries: 5
@@ -114,7 +130,7 @@ services:
# ===========================================================================
# Integration type: CI/CD (Jenkins provider)
# URL: http://jenkins.stella-ops.local:8080
# Admin: admin / Stella2026!
# Auth: anonymous access by default; create an admin user manually if you need authenticated API flows
# API: http://jenkins.stella-ops.local:8080/api/json
# ===========================================================================
jenkins:
@@ -297,8 +313,12 @@ services:
# ===========================================================================
# Integration type: Secrets Manager (Consul provider)
# URL: http://consul.stella-ops.local:8500
# No auth (dev mode)
# No auth (single-node local mode)
# API: http://consul.stella-ops.local:8500/v1/status/leader
#
# Profile: consul - opt in only when validating the Consul connector.
# Runs as a single-node local server with the UI enabled. This preserves
# the HTTP KV surface while avoiding the higher idle CPU cost of `agent -dev`.
# ===========================================================================
consul:
image: hashicorp/consul:1.19
@@ -306,21 +326,25 @@ services:
restart: unless-stopped
ports:
- "127.1.2.8:8500:8500"
command: agent -dev -client=0.0.0.0
command: agent -server -bootstrap-expect=1 -ui -client=0.0.0.0 -data-dir=/consul/data -log-level=warn
volumes:
- consul-data:/consul/data
networks:
stellaops:
aliases:
- consul.stella-ops.local
healthcheck:
test: ["CMD-SHELL", "consul members || exit 1"]
interval: 15s
interval: 60s
timeout: 5s
retries: 5
start_period: 10s
labels:
com.stellaops.integration: "secrets"
com.stellaops.provider: "consul"
com.stellaops.profile: "integrations"
com.stellaops.profile: "integrations-optional"
profiles:
- consul
# ===========================================================================
# GITLAB CE — Full Git SCM + CI/CD + Container Registry (optional, heavy)
@@ -332,7 +356,10 @@ services:
# Requires: ~4 GB RAM, ~2 min startup
#
# Profile: heavy — only start when explicitly requested:
# docker compose -f docker-compose.integrations.yml up -d gitlab
# docker compose -f docker-compose.integrations.yml --profile heavy up -d gitlab
#
# Local defaults bias for lower idle CPU. SCM/API coverage remains available,
# while registry/KAS extras stay disabled unless you opt in via env vars.
# ===========================================================================
gitlab:
image: gitlab/gitlab-ce:17.8.1-ce.0
@@ -348,12 +375,20 @@ services:
gitlab_rails['initial_root_password'] = 'Stella2026!'
gitlab_rails['gitlab_shell_ssh_port'] = 2224
registry_external_url 'http://gitlab.stella-ops.local:5050'
registry['enable'] = true
registry['enable'] = ${GITLAB_ENABLE_REGISTRY:-false}
registry_nginx['enable'] = ${GITLAB_ENABLE_REGISTRY:-false}
gitlab_kas['enable'] = false
prometheus_monitoring['enable'] = false
sidekiq['max_concurrency'] = 5
puma['workers'] = 2
puma['min_threads'] = 1
puma['max_threads'] = 2
gitlab_rails['usage_ping_enabled'] = false
gitlab_rails['runners_registration_enabled'] = false
gitlab_rails['packages_enabled'] = ${GITLAB_ENABLE_PACKAGES:-false}
sidekiq['concurrency'] = ${GITLAB_SIDEKIQ_CONCURRENCY:-2}
sidekiq['metrics_enabled'] = false
sidekiq['health_checks_enabled'] = false
puma['worker_processes'] = ${GITLAB_PUMA_WORKERS:-1}
puma['min_threads'] = ${GITLAB_PUMA_MIN_THREADS:-1}
puma['max_threads'] = ${GITLAB_PUMA_MAX_THREADS:-2}
nginx['worker_processes'] = 1
postgresql['shared_buffers'] = '128MB'
gitlab_rails['env'] = { 'MALLOC_CONF' => 'dirty_decay_ms:1000,muzzy_decay_ms:1000' }
volumes:

View File

@@ -63,7 +63,8 @@ x-router-microservice-defaults: &router-microservice-defaults
Router__Messaging__RequestTimeout: "30s"
Router__Messaging__LeaseDuration: "5m"
Router__Messaging__BatchSize: "10"
Router__Messaging__HeartbeatInterval: "10s"
Router__Messaging__HeartbeatInterval: "${ROUTER_MESSAGING_HEARTBEAT_INTERVAL:-30s}"
Router__RegistrationRefreshIntervalSeconds: "${ROUTER_REGISTRATION_REFRESH_INTERVAL_SECONDS:-30}"
Router__Messaging__valkey__ConnectionString: "cache.stella-ops.local:6379"
Router__Messaging__valkey__Database: "0"
Router__Messaging__valkey__QueueWaitTimeoutSeconds: "${VALKEY_QUEUE_WAIT_TIMEOUT:-0}"
@@ -352,7 +353,7 @@ services:
Gateway__Transports__Messaging__RequestTimeout: "30s"
Gateway__Transports__Messaging__LeaseDuration: "5m"
Gateway__Transports__Messaging__BatchSize: "10"
Gateway__Transports__Messaging__HeartbeatInterval: "10s"
Gateway__Transports__Messaging__HeartbeatInterval: "${ROUTER_MESSAGING_HEARTBEAT_INTERVAL:-30s}"
# Identity envelope signing (gateway -> microservice auth)
Gateway__Auth__IdentityEnvelopeSigningKey: "${STELLAOPS_IDENTITY_ENVELOPE_SIGNING_KEY}"
# Audience validation disabled until authority includes aud in access tokens
@@ -2337,6 +2338,8 @@ services:
ADVISORYAI__AdvisoryAI__Inference__Remote__BaseAddress: "${ADVISORY_AI_REMOTE_BASEADDRESS:-}"
ADVISORYAI__AdvisoryAI__Inference__Remote__ApiKey: "${ADVISORY_AI_REMOTE_APIKEY:-}"
ADVISORYAI__KnowledgeSearch__ConnectionString: *postgres-connection
ADVISORYAI__KnowledgeSearch__DatabaseApplicationName: "${ADVISORY_AI_KNOWLEDGESEARCH_DB_APPLICATION_NAME:-stellaops-advisory-ai-web/knowledge-search}"
ADVISORYAI__KnowledgeSearch__DatabaseConnectionIdleLifetimeSeconds: "${ADVISORY_AI_KNOWLEDGESEARCH_DB_IDLE_LIFETIME_SECONDS:-900}"
ADVISORYAI__KnowledgeSearch__FindingsAdapterEnabled: "true"
ADVISORYAI__KnowledgeSearch__FindingsAdapterBaseUrl: "http://scanner.stella-ops.local"
ADVISORYAI__KnowledgeSearch__VexAdapterEnabled: "true"

View File

@@ -0,0 +1,66 @@
#!/bin/sh
set -eu
GITEA_CONFIG="${GITEA_APP_INI:-/etc/gitea/app.ini}"
GITEA_WORK_PATH="${GITEA_WORK_DIR:-/var/lib/gitea}"
GITEA_HTTP_PORT="${GITEA__server__HTTP_PORT:-3000}"
GITEA_ADMIN_USERNAME="${GITEA_LOCAL_ADMIN_USERNAME:-stellaops}"
GITEA_ADMIN_PASSWORD="${GITEA_LOCAL_ADMIN_PASSWORD:-Stella2026!}"
GITEA_ADMIN_EMAIL="${GITEA_LOCAL_ADMIN_EMAIL:-stellaops@gitea.stella-ops.local}"
GITEA_BOOTSTRAP_SENTINEL="${GITEA_BOOTSTRAP_SENTINEL:-${GITEA_WORK_PATH}/data/.local-admin-ready}"
mkdir -p "${GITEA_WORK_PATH}/git/repositories"
/usr/local/bin/docker-entrypoint.sh "$@" &
gitea_pid=$!
cleanup() {
if kill -0 "${gitea_pid}" 2>/dev/null; then
kill "${gitea_pid}" 2>/dev/null || true
wait "${gitea_pid}" 2>/dev/null || true
fi
}
trap cleanup INT TERM
ready=0
attempt=0
while [ "${attempt}" -lt 60 ]; do
if wget -qO- "http://127.0.0.1:${GITEA_HTTP_PORT}/api/v1/version" >/dev/null 2>&1; then
ready=1
break
fi
if ! kill -0 "${gitea_pid}" 2>/dev/null; then
wait "${gitea_pid}"
exit 1
fi
attempt=$((attempt + 1))
sleep 1
done
if [ "${ready}" -ne 1 ]; then
echo "Gitea did not become ready for local admin bootstrap" >&2
wait "${gitea_pid}"
exit 1
fi
existing_admins="$(gitea admin user list --config "${GITEA_CONFIG}" --work-path "${GITEA_WORK_PATH}" --admin | tail -n +2 | awk 'BEGIN { first = 1 } NF { if (!first) printf ","; printf "%s", $2; first = 0 }')"
if [ -z "${existing_admins}" ]; then
gitea admin user create \
--config "${GITEA_CONFIG}" \
--work-path "${GITEA_WORK_PATH}" \
--username "${GITEA_ADMIN_USERNAME}" \
--password "${GITEA_ADMIN_PASSWORD}" \
--email "${GITEA_ADMIN_EMAIL}" \
--admin \
--must-change-password=false
else
echo "Gitea admin bootstrap skipped; existing admin users: ${existing_admins}"
fi
touch "${GITEA_BOOTSTRAP_SENTINEL}"
wait "${gitea_pid}"

View File

@@ -0,0 +1,114 @@
# Sprint 20260405-012 -- Web / JobEngine Scheduler UI Contract Repair
## Topic & Scope
- The Angular UI (scheduler-ops feature) and the Scheduler WebService API have diverged: response envelope shapes, enum/state naming, run-list deserialization, and several backend-only fields are silently lost in the mapping layer.
- Goal: make every scheduler page (schedules list, schedule detail, runs list, run detail, impact preview) render data from the real backend without silent field loss or deserialization errors.
- Working directory: `src/Web/StellaOps.Web/src/app` (primary), `src/JobEngine/StellaOps.Scheduler.WebService` (read-only reference).
- Expected evidence: each task produces a working UI page backed by the real scheduler API; no hardcoded mock data.
## Dependencies & Concurrency
- Scheduler WebService API is stable (no planned endpoint changes).
- May run in parallel with other FE sprints provided there is no overlap on `scheduler-ops/` or `core/api/scheduler.client.ts`.
## Documentation Prerequisites
- `docs/modules/jobengine/architecture.md` -- section 8.1 (scheduler subdomain).
- `src/JobEngine/AGENTS.Scheduler.md` -- scope definitions.
- Backend contracts: `ScheduleContracts.cs`, `RunContracts.cs`, `Enums.cs`, `Run.cs`, `Schedule.cs` under `StellaOps.Scheduler.Models`.
---
## Delivery Tracker
### TASK-01 - Fix SchedulerRun model to match backend Run shape
Status: DONE
Dependency: none
Owners: Frontend Developer
Task description:
Updated `SchedulerRun` interface in `scheduler-ops.models.ts` to align with backend `Run` record. Mapped trigger/state enums, added stats/reason/deltas fields, removed phantom fields (output, retryCount, metadata). Added `SchedulerRunTrigger`, `SchedulerRunStats` types.
Completion criteria:
- [x] `SchedulerRun` interface properties have documented correspondence to backend `Run` fields
- [x] No silent data loss when deserializing a backend Run response
- [x] `SchedulerRunStatus` type includes all backend RunState values
### TASK-02 - Fix runs list deserialization in scheduler-runs.component.ts
Status: DONE
Dependency: TASK-01
Owners: Frontend Developer
Task description:
Added `listRuns()`, `cancelRun(runId)`, and `retryRun(runId)` methods to `SchedulerHttpClient`/`SchedulerApi` interface. Wired `scheduler-runs.component.ts` to use `SCHEDULER_API` token instead of raw `HttpClient`. Backend `Run` objects are mapped to `SchedulerRun` via the client's `mapRun()` method. Cancel and retry now call real backend endpoints.
Completion criteria:
- [x] Runs list loads from backend via `SchedulerApi.listRuns()`
- [x] Cancel and retry call real backend endpoints
- [x] Backend `Run` objects are properly mapped to `SchedulerRun`
- [x] Pagination cursor is handled (at minimum: fetch first page correctly)
### TASK-03 - Fix Schedule model mapping (mode <-> taskType, missing fields)
Status: DONE
Dependency: none
Owners: Frontend Developer
Task description:
Replaced the lossy 7-type `ScheduleTaskType` with the backend's 2-value `ScheduleMode` (`analysis-only` | `content-refresh`). Added `ScheduleSelector` and `ScheduleLimits` interfaces matching backend. Removed phantom fields (`description`, `tags`, `retryPolicy`, `taskConfig`, `nextRunAt`, `concurrencyLimit`). Updated schedule-management form to use mode/selection/limits. Selection scope is now user-controllable (all-images, by-namespace, by-repository).
Completion criteria:
- [x] Schedule create/update sends the correct `mode` value to the backend
- [x] No data corruption on schedule edit round-trip (read -> edit -> save)
- [x] Selection scope is user-controllable (at minimum: `all-images` vs `by-namespace`)
### TASK-04 - Fix deleteSchedule to use a real delete or document pause-as-delete
Status: DONE
Dependency: none
Owners: Frontend Developer
Task description:
Added `includeDisabled=false` query parameter to `listSchedules()` so paused-as-deleted schedules don't reappear on page reload.
Completion criteria:
- [x] After "deleting" a schedule via UI, it does not reappear on page reload
- [x] `listSchedules()` passes `includeDisabled=false` unless user explicitly filters
### TASK-05 - Fix impact preview to consume full backend response
Status: DONE
Dependency: none
Owners: Frontend Developer
Task description:
Updated `ScheduleImpactPreview` model to match backend `ImpactPreviewResponse` (total, usageOnly, generatedAt, snapshotId, sample). Added `ImpactPreviewSample` interface. Updated `previewImpact()` to accept a `ScheduleSelector` and map the full response. Updated the schedule-management template to display sample images in a preview table.
Completion criteria:
- [x] Impact preview shows total + sample images from backend response
- [x] Preview uses the schedule's selector, not a hardcoded one
- [x] No synthesized fields pretend to come from the backend
### TASK-06 - Remove MockSchedulerClient or gate it behind a flag
Status: DONE
Dependency: TASK-02, TASK-03
Owners: Frontend Developer
Task description:
Removed `MockSchedulerClient` entirely from `scheduler.client.ts`. Production `app.config.ts` already uses `SchedulerHttpClient` via `useExisting`, so no mock could leak into production. The rewrite of `scheduler.client.ts` simply omitted the mock class.
Completion criteria:
- [x] Production builds cannot use MockSchedulerClient
- [x] If kept for dev, it's behind an explicit environment check
---
## Execution Log
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2026-04-06 | Sprint created; 6 tasks covering model alignment, runs wiring, schedule mapping, delete behavior, preview enrichment, and mock removal. | Planning |
| 2026-04-06 | All 6 tasks implemented. Models rewritten, client rewritten, components updated, tests updated. Build clean, 48/48 tests pass. Fixed pre-existing I18nService DI issue (`@Inject(HttpBackend)` decorator). | Developer |
## Decisions & Risks
- **Decision: reduced taskType to mode** -- The 7-type taxonomy was a UI-only concept. The backend supports 2 modes (AnalysisOnly, ContentRefresh). UI now matches backend exactly.
- **Decision: backend contracts are stable** -- This sprint only modified the Web UI. The I18nService `@Inject` fix is a minor DI decorator addition.
- **Decision: pause-as-delete with filter** -- `listSchedules()` now passes `includeDisabled=false`. True soft-delete would require a backend change (out of scope).
- **Risk: scheduleName resolution for runs** -- Backend `Run` only carries `scheduleId`, not `scheduleName`. Current UI shows scheduleId as name. A follow-up sprint could batch-resolve names from the schedules list.
## Next Checkpoints
- All tasks DONE. Sprint ready for archive.

View File

@@ -63,6 +63,8 @@ The scripts will:
Open **https://stella-ops.local** when setup completes.
The automated setup path does not start the real third-party integration compose lane. `devops/compose/docker-compose.testing.yml` is the CI/testing lane, and the optional real providers live in `devops/compose/docker-compose.integrations.yml`. GitLab and Consul are opt-in there because they add noticeable idle CPU overhead.
For targeted backend rebuilds after a scoped code change on Windows:
```powershell

View File

@@ -253,6 +253,46 @@ All operations log with:
- `operation`: comment, status, check_run
- `prNumber` / `commitSha`: Target reference
## Current Catalog Contract
The live Integration Catalog contract is served by the Integrations WebService and is the source of truth for provider discovery and resource discovery.
### Provider Metadata
- `GET /api/v1/integrations/providers` returns `ProviderInfo[]` with `name`, `type`, `provider`, `isTestOnly`, `supportsDiscovery`, and `supportedResourceTypes`.
- Test-only providers are hidden by default. `GET /api/v1/integrations/providers?includeTestOnly=true` exposes providers such as `InMemory` for explicit test/dev workflows.
- Built-in provider coverage now includes Harbor, Docker Registry, GitLab Container Registry, GitHub App, Gitea, GitLab Server, GitLab CI, Jenkins, Nexus, Vault, Consul, eBPF Agent, the `S3Compatible` object-storage provider, feed mirror providers (`StellaOpsMirror`, `NvdMirror`, `OsvMirror`), and the hidden test-only `InMemory` plugin.
### Discovery
- `POST /api/v1/integrations/{id}/discover` accepts:
```json
{
"resourceType": "repositories",
"filter": {
"namePattern": "team/*"
}
}
```
- Successful responses return `DiscoverIntegrationResponse` with the normalized `resourceType`, the ordered `supportedResourceTypes`, and the discovered `resources`.
- Unsupported discovery requests return `400 Bad Request` with the supported resource types for that provider.
- Missing or cross-tenant integration IDs return `404 Not Found`.
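A sketch of how a discovery call might be issued from a shell; the service address, port, and `$INTEGRATION_ID` placeholder are assumptions, not values from this repository:

```shell
# Build the discovery request body shown above, then POST it.
body='{"resourceType":"repositories","filter":{"namePattern":"team/*"}}'
echo "$body"
# curl -sf -X POST \
#   "http://localhost:8080/api/v1/integrations/${INTEGRATION_ID}/discover" \
#   -H 'Content-Type: application/json' -d "$body"
```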
### Discovery-Capable Providers
- OCI registries: `repositories`, `tags`
- SCM: `projects`, `repositories`
- CI/CD: `jobs`, `pipelines`
- Feed mirror, object-storage, and secrets/runtime providers currently expose health/test flows only
### Credential Resolution
- Integration secrets are supplied as AuthRef URIs such as `authref://vault/gitlab#access-token`
- The runtime resolver is Vault-backed; there is no product-path stub resolver in the shipped service
- Registry connectors accept bearer tokens and `username:password` or `username:token` secrets for Basic auth-compatible registries
## Related Documentation
- [CI/CD Gate Flow](../../flows/10-cicd-gate-flow.md)

View File

@@ -1,8 +1,8 @@
# Database Coding Rules
**Version:** 1.0.0
**Version:** 1.1.1
**Status:** APPROVED
**Last Updated:** 2025-11-28
**Last Updated:** 2026-04-05
---
@@ -119,6 +119,56 @@ public sealed class SchedulerDataSource : IAsyncDisposable
}
```
### 2.1.1 Runtime Data Source Attribution
**RULE:** Every runtime `NpgsqlDataSource` MUST set a stable PostgreSQL `application_name`.
```csharp
// ✓ CORRECT
var connectionStringBuilder = new NpgsqlConnectionStringBuilder(options.ConnectionString)
{
ApplicationName = "stellaops-policy"
};
var dataSource = new NpgsqlDataSourceBuilder(connectionStringBuilder.ConnectionString).Build();
// ✓ CORRECT - shared infrastructure path
var connectionString = PostgresConnectionStringPolicy.Build(options, "stellaops-policy");
var dataSource = new NpgsqlDataSourceBuilder(connectionString).Build();
// ✗ INCORRECT - anonymous session attribution
var dataSource = NpgsqlDataSource.Create(options.ConnectionString);
```
**RATIONALE:** Anonymous PostgreSQL sessions make runtime triage and CPU/churn attribution materially harder. `application_name` is mandatory for steady-state service code.
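To see why this helps triage, the query below groups live sessions by `application_name`; with the rule followed, each row maps to a service. The `psql` invocation and `$POSTGRES_URL` are illustrative assumptions, not repo tooling:

```shell
# Group active PostgreSQL sessions by the service that opened them.
sql="SELECT application_name, state, count(*)
     FROM pg_stat_activity
     GROUP BY application_name, state
     ORDER BY count(*) DESC"
echo "$sql"
# psql "$POSTGRES_URL" -Atc "$sql"
```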
### 2.1.2 Runtime Data Source Construction
**RULE:** Runtime services MUST reuse a singleton/module-scoped `NpgsqlDataSource` or a module DataSource wrapper. Ad hoc `NpgsqlDataSource.Create(...)` is forbidden in steady-state service code.
Allowed exceptions:
- tests and fixtures
- migration runners and schema bootstrap hosts
- CLI/admin/setup commands
- one-shot diagnostics explicitly documented in the sprint
```csharp
// ✓ CORRECT
services.AddSingleton<MyModuleDataSource>();
services.AddScoped<IMyRepository, PostgresMyRepository>();
// ✗ INCORRECT
public async Task<Item?> GetAsync(CancellationToken ct)
{
await using var dataSource = NpgsqlDataSource.Create(_connectionString);
await using var conn = await dataSource.OpenConnectionAsync(ct);
...
}
```
**RULE:** Raw `new NpgsqlConnection(...)` is forbidden in steady-state runtime code unless the call site is an explicit allowlisted exception (CLI/setup, migrations, diagnostics, or a sprint-documented blocker).
**RULE:** When a module cannot yet move off raw `NpgsqlConnection`, its connection string MUST still flow through a stable `ApplicationName`, and the exception MUST be called out in sprint Decisions & Risks.
### 2.2 Connection Disposal
**RULE:** All NpgsqlConnection instances MUST be disposed via `await using`.
@@ -881,5 +931,5 @@ These rules are enforced by:
---
*Document Version: 1.0.0*
*Last Updated: 2025-11-28*
*Document Version: 1.1.1*
*Last Updated: 2026-04-05*

View File

@@ -33,6 +33,8 @@ Setup scripts validate prerequisites, build solutions and Docker images, and lau
The scripts will check for required tools (dotnet 10.x, node 20+, npm 10+, docker, git), warn about missing hosts file entries, copy `.env` from the example if needed, and stop repo-local host-run Stella services before the solution build so scratch bootstraps do not fail on locked `bin/Debug` outputs. Solution discovery is limited to repo-owned sources and skips generated trees such as `dist`, `coverage`, and `output`, so copied docs samples do not break scratch setup. A full setup now also performs one bounded restart pass for services that stay unhealthy after the first compose boot, waits for the first-user frontdoor bootstrap path (`/welcome`, `/envsettings.json`, OIDC discovery, `/connect/authorize`), and then runs an authenticated readiness probe that proves the topology inventory, notifications administration overrides, and promotion bootstrap routes load cleanly before the script prints success. When `-QaIntegrationFixtures` / `--qa-integration-fixtures` is enabled, setup also starts deterministic Harbor and GitHub App fixtures and smoke-checks them so the Integrations Hub can be verified with successful UI onboarding, not just failure-path cards. See the manual steps below for details on each stage.
The setup scripts do not start the optional real-provider compose lane in `devops/compose/docker-compose.integrations.yml`. Use that file only when you need live connector validation against third-party services; the CI/testing compose lane is `devops/compose/docker-compose.testing.yml`, and GitLab plus Consul are intentionally opt-in because they add steady idle CPU load.
On Windows and Linux, the backend image builder now publishes each selected .NET service locally and builds the hardened runtime image from a small temporary context. That avoids repeatedly streaming the whole monorepo into Docker during scratch setup.
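The compose lanes mentioned above map to a small, fixed set of commands. As a convenience-only sketch (the file and profile names are real, the helper is not a shipped script):

```shell
#!/usr/bin/env bash
# Map a lane name to the compose command it implies. File and profile names
# come from devops/compose/; this helper is a sketch, not a shipped script.
compose_cmd() {
  case "$1" in
    platform)     echo "docker compose -f docker-compose.stella-ops.yml up -d" ;;
    fixtures)     echo "docker compose -f docker-compose.integration-fixtures.yml up -d" ;;
    integrations) echo "docker compose -f docker-compose.integrations.yml up -d" ;;
    gitlab)       echo "docker compose -f docker-compose.integrations.yml --profile heavy up -d gitlab" ;;
    testing)      echo "docker compose -f docker-compose.testing.yml --profile ci up -d" ;;
    *)            echo "unknown lane: $1" >&2; return 1 ;;
  esac
}

compose_cmd integrations
```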
### Quick validation + demo seed (first-run path)


@@ -28,6 +28,10 @@ Trust administration dashboard with signing key management including rotation wi
- **Behavior coverage**:
- `src/Web/StellaOps.Web/src/tests/trust_admin/trust-scoring-dashboard-ui.behavior.spec.ts`
- **Source**: SPRINT_20251229_046_FE_trust_scoring_dashboard
- **Audit behavior**:
- `airgap-audit` and `incident-audit` read from the Authority audit endpoints through `AuditLogClient`
- backend failures surface explicit error states instead of production sample data
- incident write actions remain read-only in the audit view until command endpoints are implemented
## E2E Test Plan
- **Setup**:


@@ -0,0 +1,48 @@
# Sprint 20260403-003 - Console Production Bundle Budget
## Topic & Scope
- Restore deterministic scratch rebuilds by unblocking the Angular production console image build.
- Reconcile the frontend bundle budget with the current production output so the Docker matrix can finish while preserving a meaningful guardrail.
- Capture the rebuild evidence and any remaining budget-related risks for follow-up optimization work.
- Working directory: `src/Web/StellaOps.Web`.
- Expected evidence: `npm run build -- --configuration=production`, `devops/docker/build-all.ps1`, updated sprint log.
## Dependencies & Concurrency
- Depends on the current `devops/docker/build-all.ps1` rebuild lane and the Docker console image path in `devops/docker/Dockerfile.console`.
- Safe to keep scoped to the web workspace and sprint docs; no cross-module code edits expected.
## Documentation Prerequisites
- `src/Web/StellaOps.Web/AGENTS.md`
- `docs/modules/platform/architecture-overview.md`
- `src/Web/StellaOps.Web/angular.json`
## Delivery Tracker
### TASK-1 - Unblock console production image build
Status: DONE
Dependency: none
Owners: Developer
Task description:
- The scratch Stella Ops rebuild completed 58 backend/service images successfully but failed on the final `console` image because the Angular production build exceeded the configured `initial` budget in `src/Web/StellaOps.Web/angular.json`.
- Update the budget guardrail or equivalent frontend build configuration just enough to reflect the current production baseline, then rerun the production build and the Docker image build to confirm the rebuild completes end-to-end.
Completion criteria:
- [x] `src/Web/StellaOps.Web/angular.json` is updated with a justified production bundle budget guardrail.
- [x] `npm run build -- --configuration=production --output-path=dist` completes successfully.
- [x] `devops/docker/build-all.ps1` or an equivalent targeted console rebuild completes successfully for `stellaops/console:dev`.
- [x] Sprint evidence captures the original failure and the final passing verification.
## Execution Log
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2026-04-03 | Sprint created after scratch rebuild failure isolated the `console` Docker image to an Angular production bundle budget overrun. | Developer |
| 2026-04-03 | Raised the production `initial` bundle guardrail above the current 2.08 MB baseline (warn 2.2 MB, error 2.4 MB), removed an unused dashboard import, reran `npm run build -- --configuration=production --output-path=dist`, and confirmed the targeted `stellaops/console:dev` Docker rebuild passed. | Developer |
## Decisions & Risks
- The production console build failed with `bundle initial exceeded maximum budget`; the observed output was 2.08 MB versus the configured 2.00 MB error threshold.
- The production guardrail now warns at 2.2 MB and errors at 2.4 MB, which matches the current baseline while preserving a hard failure threshold for further growth.
- The component-style warnings in setup wizard styles remain below the current error threshold and do not block the Docker image build, but they should stay visible for later CSS reduction work.
## Next Checkpoints
- Re-run the Angular production build after the budget change.
- Rebuild the `console` image and then resume stack startup from the clean rebuild state.


@@ -0,0 +1,52 @@
# Sprint 20260403-004 - Local Integration Catalog Bootstrap
## Topic & Scope
- Provision every provider-backed local integration service or fixture into the Integrations catalog for tenant `default`.
- Validate live connection and health against compose-backed real services and QA fixtures, including the heavy-profile GitLab service.
- Record the setup gaps discovered during shell/API bootstrap so local bring-up is reproducible.
- Working directory: `src/Integrations/`.
- Expected evidence: `docker compose` service health, `/api/v1/integrations` catalog entries, targeted Integrations test results.
## Dependencies & Concurrency
- Depends on `devops/compose/docker-compose.stella-ops.yml`, `devops/compose/docker-compose.integrations.yml`, and `devops/compose/docker-compose.integration-fixtures.yml` sharing the `stellaops` network.
- Cross-module runtime touchpoints only: `devops/compose/*` hosts the external services, and `docs/integrations/LOCAL_SERVICES.md` documents the bootstrap path.
## Documentation Prerequisites
- `docs/integrations/LOCAL_SERVICES.md`
- `devops/compose/README.md`
- `src/Integrations/AGENTS.md`
## Delivery Tracker
### TASK-1 - Bootstrap local Integration Catalog entries
Status: DONE
Dependency: none
Owners: Developer
Task description:
- Use shell-based API calls against `StellaOps.Integrations.WebService` to create or update every provider-backed local integration entry exposed by `/api/v1/integrations/providers`, excluding the test-only `InMemory` provider.
- Bring up the compose-backed real services and QA fixtures, bind GitLab through Vault-backed `authref://vault/gitlab#access-token`, and verify `/test` plus `/health` for each entry.
Completion criteria:
- [x] Real services and QA fixtures required by the local integration catalog are running.
- [x] Provider-backed local integrations are present in tenant `default` and return successful `/test` results.
- [x] GitLab heavy-profile SCM integration is green with a Vault-backed token reference.
- [x] Targeted Integrations test projects pass and setup/documentation gaps are recorded.
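The shell/API bootstrap described in TASK-1 can be sketched as a dry-run loop. The `/api/v1/integrations` route family and the `/test` and `/health` checks come from this sprint; the base URL, payload field names, and header set are illustrative assumptions:

```shell
#!/usr/bin/env bash
# Dry-run sketch of the shell/API bootstrap loop. DRY_RUN=1 (the default
# here) prints each request instead of sending it; base URL, header set,
# and payload field names are assumptions, not the exact shipped contract.
BASE="${INTEGRATIONS_BASE:-http://localhost:8080}"

req() {  # print or execute one curl invocation
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "curl $*"; else curl -fsS "$@"; fi
}

create_integration() {  # $1 = providerId, $2 = display name
  req -X POST "$BASE/api/v1/integrations" \
      -H "Content-Type: application/json" \
      -d "{\"providerId\":\"$1\",\"name\":\"$2\"}"
}

verify_integration() {  # $1 = integration id
  req -X POST "$BASE/api/v1/integrations/$1/test"
  req "$BASE/api/v1/integrations/$1/health"
}

create_integration gitea "Local Gitea"
verify_integration gitea
```

Unset `DRY_RUN` only against a live local stack; the dry-run form is enough to review the request shapes.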
## Execution Log
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2026-04-03 | Bootstrapped 10 local integration catalog entries for tenant `default`, including Harbor/GitHub App fixtures, Gitea, Jenkins, Nexus, Docker Registry, Vault, Consul, runtime-host fixture, and heavy-profile GitLab. Verified `/test` and `/health` for all entries. | Developer |
| 2026-04-03 | Ran targeted test projects: `StellaOps.Integrations.Tests` (57 passed) and `StellaOps.Integrations.Plugin.Tests` (12 passed). | Developer |
| 2026-04-03 | Corrected local setup docs/comments after live validation showed stale credential and provider notes. | Developer |
## Decisions & Risks
- The shipped `stella config integrations` CLI path is still stubbed/sample-data only; live provisioning currently requires shell/API calls against `StellaOps.Integrations.WebService`.
- `POST /api/v1/integrations/{id}/discover` is documented in higher-level API docs but is not implemented by `IntegrationEndpoints`, so local bootstrap is CRUD + test + health only.
- Gitea and Jenkins compose comments previously implied precreated admin users; live checks showed Gitea still needs first-run user creation and Jenkins defaults to anonymous access unless manually hardened.
- GitLab SCM needed a real PAT before the current connector would pass; the token is stored in Vault at `secret/gitlab` under `access-token`.
- Current provider discovery does not expose MinIO/S3 or advisory/feed-mirror connectors, so those local services and fixtures cannot be added through the Integration Catalog today.
## Next Checkpoints
- Add backend-backed CLI verbs for integration create/update/test so shell/API bootstrap is no longer required.
- Implement or remove the documented `discover` expectation so docs and service behavior converge.
- Decide whether local compose services should preseed authenticated users/tokens or keep the current manual bootstrap model.


@@ -0,0 +1,124 @@
# Sprint 20260404-001 - Integrations Discovery and CLI Live Catalog
## Topic & Scope
- Converge the Integrations service with the documented contract by implementing discovery and richer provider metadata.
- Remove the sample-data behavior from `stella config integrations` and replace it with live backend-backed CRUD, health, impact, and discovery flows.
- Expose the missing built-in provider identities that already map to local fixtures and compose-backed services, including GitLab CI, GitLab Container Registry, and feed mirror providers.
- Remove the product-path scripts mock binding from the web console so `/ops/scripts` fails visibly against the real backend surface instead of shipping sample state.
- Add object-storage coverage for local MinIO through the Integration Catalog and remove additional trust-admin sample-data fallbacks where a live API already exists.
- Keep test-only providers available for development and tests, but hide them from default user-facing provider listings.
- Working directory: `src/Integrations/`.
- Expected evidence: targeted Integrations and CLI test runs, updated docs, and working `config integrations` commands against the live service.
## Dependencies & Concurrency
- Depends on `docs/architecture/integrations.md`, `docs/modules/release-orchestrator/integrations/overview.md`, and `docs/modules/release-orchestrator/modules/integration-hub.md` for the public contract shape.
- Cross-module edits allowed for `src/Cli/**`, `src/Web/StellaOps.Web/**`, `docs/modules/cli/**`, `docs/integrations/**`, and `docs/implplan/**` to deliver the CLI parity, product-path stub removal, and documentation sync required by this sprint.
- Safe parallelism: plugin-specific discovery additions can proceed independently from CLI command wiring once the contract DTOs are stable.
## Documentation Prerequisites
- `docs/code-of-conduct/CODE_OF_CONDUCT.md`
- `docs/code-of-conduct/TESTING_PRACTICES.md`
- `docs/architecture/integrations.md`
- `docs/modules/release-orchestrator/integrations/overview.md`
- `src/Integrations/AGENTS.md`
- `src/Cli/AGENTS.md`
- `src/Cli/StellaOps.Cli/AGENTS.md`
## Delivery Tracker
### TASK-1 - Implement documented discovery contract in Integrations
Status: DONE
Dependency: none
Owners: Developer
Task description:
- Add an optional discovery capability to connector plugins, implement `POST /api/v1/integrations/{id}/discover`, and return stable provider metadata that advertises discovery support and supported resource types.
- Keep unsupported providers deterministic: test-only providers are excluded from default provider listings, unsupported discovery requests return a client error, and missing integrations still return `404`.
Completion criteria:
- [x] Discovery DTOs and optional plugin interface are added in `src/Integrations/__Libraries`.
- [x] `IntegrationService` and `IntegrationEndpoints` expose discovery and richer provider metadata.
- [x] At least the local priority providers expose discovery for registry, SCM, or CI resources.
- [x] Targeted Integrations tests cover discovery success, unsupported resource types, and test-only provider filtering.
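The deterministic outcomes listed in TASK-1 can be captured in a small status map a smoke test might use. The `404` case is stated above; treating `400`/`422` as the unsupported-discovery client errors is an assumption of this sketch:

```shell
#!/usr/bin/env bash
# Status map for POST /api/v1/integrations/{id}/discover outcomes. 404 for
# a missing integration is stated by the sprint; mapping 400/422 to the
# unsupported-discovery client error is an assumption of this sketch.
discovery_outcome() {  # $1 = HTTP status code from the discover call
  case "$1" in
    200)     echo "discovered" ;;
    404)     echo "integration not found" ;;
    400|422) echo "provider does not support discovery for this resource type" ;;
    *)       echo "unexpected status $1" ;;
  esac
}

discovery_outcome 404
```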
### TASK-2 - Replace sample-only config integrations CLI flow
Status: DONE
Dependency: TASK-1
Owners: Developer
Task description:
- Remove the hardcoded integration sample data from the CLI and replace it with live calls through `IBackendOperationsClient`.
- Keep `config integrations list` and `test`, and add the missing verbs needed to fully manage the live catalog from the CLI.
Completion criteria:
- [x] `IBackendOperationsClient` and `BackendOperationsClient` support integrations list/get/providers/create/update/delete/test/health/impact/discover.
- [x] `stella config integrations` exposes live backend verbs with deterministic table and JSON output.
- [x] Deprecated aliases from `integrations *` to `config integrations *` cover the supported verb set.
- [x] Targeted CLI tests cover JSON output, argument mapping, and backend call routing for the new integrations commands.
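The deprecated-alias rule from the criteria above can be sketched as a router over the supported verb set. Verb names come from this sprint; exact CLI flag spellings are deliberately not shown:

```shell
#!/usr/bin/env bash
# Route a deprecated `stella integrations <verb>` call to its
# `stella config integrations <verb>` replacement. The verb set is from the
# sprint criteria; this router is a sketch, not the shipped alias code.
SUPPORTED_VERBS="list get providers create update delete test health impact discover"

route_alias() {  # $1 = verb passed to the deprecated command
  for v in $SUPPORTED_VERBS; do
    if [ "$1" = "$v" ]; then
      echo "stella config integrations $1"
      return 0
    fi
  done
  echo "unsupported verb: $1" >&2
  return 1
}

route_alias list
```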
### TASK-3 - Sync docs and verification evidence
Status: DONE
Dependency: TASK-2
Owners: Developer
Task description:
- Update the architecture and operator docs so they describe the implemented discovery and CLI behavior instead of the previous stubbed path.
- Record concrete verification evidence and any remaining rough edges in this sprint.
Completion criteria:
- [x] Docs reference the real discovery endpoint shape and provider metadata fields.
- [x] CLI/operator docs mention the live `config integrations` workflow.
- [x] Execution Log records the test commands and outcomes.
- [x] Decisions & Risks captures any remaining gaps or deferred provider coverage.
### TASK-4 - Remove product-path scripts mock binding
Status: DONE
Dependency: TASK-2
Owners: Developer
Task description:
- Replace the web console's direct `MockScriptsClient` binding with the HTTP-backed client so the shipped UI no longer serves sample script data in production.
- Surface backend failures in the scripts UI instead of silently falling back to the old mock behavior.
Completion criteria:
- [x] `SCRIPTS_API` resolves to the HTTP client in the shipped Angular app.
- [x] `/ops/scripts` pages surface backend failures with explicit error banners.
- [x] Production Angular build passes after the binding change.
### TASK-5 - Expose MinIO and remove trust-admin audit sample fallbacks
Status: DONE
Dependency: TASK-2
Owners: Developer
Task description:
- Extend the Integrations provider/type model so local MinIO can be represented in the live catalog without shell-side special casing.
- Replace the trust-admin air-gap and incident audit sample-data behavior with the existing Authority audit endpoints, and keep unsupported incident write actions explicitly read-only.
Completion criteria:
- [x] `GET /api/v1/integrations/providers` includes an object-storage provider suitable for local MinIO.
- [x] Focused backend tests cover the object-storage connector and plugin discovery.
- [x] Trust-admin air-gap and incident audit routes use live audit clients instead of embedded sample records.
- [x] Production Angular build passes with the trust-admin changes.
## Execution Log
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2026-04-04 | Sprint created and TASK-1 started to implement discovery and replace the sample-only integrations CLI path. | Developer |
| 2026-04-04 | Implemented live discovery DTOs, `/api/v1/integrations/{id}/discover`, provider metadata flags, and discovery-capable registry/SCM/CI plugins. | Developer |
| 2026-04-04 | Replaced `stella config integrations` sample data with live backend CRUD/test/health/impact/discover commands and deprecated route aliases. | Developer |
| 2026-04-04 | Added GitLab CI, GitLab Container Registry, and feed mirror provider identities; updated docs and local-service guidance. | Developer |
| 2026-04-04 | Switched the web scripts surface to `ScriptsHttpClient` and added visible error handling for list/detail actions. | Developer |
| 2026-04-04 | Added the `S3Compatible` object-storage provider for local MinIO and rewired trust-admin audit pages to Authority audit endpoints with explicit read-only/error behavior. | Developer |
| 2026-04-04 | Verification: `dotnet test src/Integrations/__Tests/StellaOps.Integrations.Tests/StellaOps.Integrations.Tests.csproj -v minimal` passed (68/68). | Developer |
| 2026-04-04 | Verification: `dotnet test src/Integrations/__Tests/StellaOps.Integrations.Plugin.Tests/StellaOps.Integrations.Plugin.Tests.csproj -v minimal` passed (17/17). | Developer |
| 2026-04-04 | Verification: `dotnet build src/Cli/StellaOps.Cli/StellaOps.Cli.csproj -v minimal` passed. | Developer |
| 2026-04-04 | Verification: `npm run build -- --configuration=production --output-path=dist` passed for `src/Web/StellaOps.Web` with only the pre-existing setup-wizard component-style budget warnings. | Developer |
## Decisions & Risks
- This sprint intentionally keeps deterministic test-only fixtures, but removes product-path sample data from `stella config integrations`.
- Provider expansion now covers the missing local GitLab CI, GitLab Container Registry, feed mirror provider identities, and MinIO through the `ObjectStorage`/`S3Compatible` path.
- Feed mirror provider entries currently expose health/test coverage only. They make the catalog honest about what can be connected, but they do not add feed-resource discovery on top of Concelier yet.
- The CLI command tests exist, but `dotnet test` filtering is still unreliable under the repo's Microsoft.Testing.Platform setup. A previous full-suite run executed 1218 tests and surfaced 7 unrelated migration-consolidation failures outside this sprint's write scope.
- `/ops/scripts` now uses the real HTTP surface. Until a scripts backend is implemented at `/api/v2/scripts`, operators will see explicit load/save/validation errors instead of sample data.
- Trust-admin audit pages now read from live Authority audit endpoints. Incident mutation actions remain intentionally read-only until command endpoints exist; the audit view no longer simulates those actions.
- `app.config.ts` no longer registers a broad set of unused mock clients in the shipped provider graph, but many other web routes still retain mock implementations or fallback data outside this sprint's write scope.
- Existing unrelated dirty worktree changes in `src/Workflow/**` and `src/__Libraries/StellaOps.ElkSharp/**` are not part of this sprint and will remain untouched.
## Next Checkpoints
- Replace remaining product-path web sample-data surfaces using the same pattern applied to `/ops/scripts` and trust-admin audit routes: real client binding plus explicit degraded/error UI.
- Add deeper object-storage semantics if bucket/object discovery or credentialed operations need to be represented beyond health/test coverage.


@@ -0,0 +1,56 @@
# Sprint 20260404-002 - FE Evidence And Topology Live Surfaces
## Topic & Scope
- Remove product-path mock state from the Evidence Center page and the environments command page.
- Reuse the live release-evidence and topology APIs that already exist, and surface explicit empty and error states instead of demo data.
- Working directory: `src/Web/StellaOps.Web/`.
- Expected evidence: Angular build, focused web tests where practical, updated module docs, and sprint execution log.
## Dependencies & Concurrency
- Depends on the current app DI work in `SPRINT_20260404_001_Integrations_discovery_and_cli_live_catalog.md` remaining intact.
- Safe to run in parallel with backend-only deployment and findings work as long as touched web files do not overlap.
## Documentation Prerequisites
- `docs/modules/web/architecture.md`
- `docs/modules/jobengine/architecture.md`
## Delivery Tracker
### FE-EVID-002 - Replace Evidence Center sample state with live packet flows
Status: DONE
Dependency: none
Owners: Developer / Implementer, Documentation author
Task description:
- Rewire `features/evidence/evidence-center-page.component.ts` to use the shipped release-evidence client/store path instead of local packet arrays and `console.log` actions.
- Use the existing audit-bundle client for page-level audit exports, and keep verify/export/raw packet actions routed through real HTTP calls.
Completion criteria:
- [x] Evidence Center loads packet data from the release-evidence API path rather than local sample arrays.
- [x] Packet drawer actions trigger live verify/export/raw flows instead of placeholder handlers.
- [x] Page-level audit bundle export uses the existing audit-bundle API and surfaces success or failure to the operator.
### FE-TOPO-002 - Remove environments-command automatic demo fallback
Status: DONE
Dependency: none
Owners: Developer / Implementer, Documentation author
Task description:
- Remove the embedded mock environments, readiness reports, and topology layout fallback from the environments command page.
- Keep live reads from the topology APIs, and add clear no-data / setup-needed / request-failed states for both command and topology views.
Completion criteria:
- [x] `environments-command.component.ts` no longer populates demo environments or a demo topology layout.
- [x] Empty and error states are explicit and user-visible.
- [x] Topology view stays functional when the layout endpoint returns data and behaves cleanly when it does not.
## Execution Log
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2026-04-04 | Sprint created; implementation started for Evidence Center and topology command live-surface cleanup. | Developer |
| 2026-04-04 | Replaced Evidence Center sample state with live release-evidence flows; removed topology demo fallback; verified with Angular production build. | Developer |
## Decisions & Risks
- Evidence Center will reuse the existing release-evidence API/store even if the backend detail endpoint is still shallow; the page must stop fabricating packets locally.
- Topology command will prefer explicit empty/error states over silently inventing regions and environments.
## Next Checkpoints
- 2026-04-04: land web patches and verify with a production Angular build.


@@ -0,0 +1,57 @@
# Sprint 20260404-003 - JobEngine Deployment Run Parity
## Topic & Scope
- Replace deployment compatibility seed responses with a live in-memory deployment store and add real deployment creation.
- Align deployment strategy vocabulary with the shipped web client and remove create-deployment wizard fallback behavior.
- Working directory: `src/JobEngine/`.
- Expected evidence: targeted JobEngine tests, Angular build for wizard integration, updated module docs, and sprint execution log.
## Dependencies & Concurrency
- Depends on web deployment consumers continuing to target `/api/v1/release-orchestrator/deployments`.
- Allows cross-module edits in `src/Web/StellaOps.Web/` and `src/ReleaseOrchestrator/` for wizard/client contract alignment.
## Documentation Prerequisites
- `docs/modules/jobengine/architecture.md`
- `docs/modules/release-orchestrator/architecture.md`
- `docs/modules/web/architecture.md`
## Delivery Tracker
### JOB-DEP-003 - Replace seeded deployment compatibility endpoints with a live store
Status: DONE
Dependency: none
Owners: Developer / Implementer, Documentation author
Task description:
- Introduce a real deployment state store for list/detail/events/logs/metrics and lifecycle mutations in the JobEngine web service.
- Add a canonical create endpoint for deployment runs and persist state changes in the same live store rather than returning canned results.
Completion criteria:
- [x] `/api/v1/release-orchestrator/deployments` list/detail/events/logs/metrics are backed by a live state store instead of `SeedData`.
- [x] Pause, resume, cancel, rollback, and retry mutate deployment state and emit corresponding events.
- [x] `POST /api/v1/release-orchestrator/deployments` creates a deployment run with canonical fields and returns a real deployment object.
### FE-DEP-003 - Wire create-deployment wizard to live bundle and deployment APIs
Status: DONE
Dependency: JOB-DEP-003
Owners: Developer / Implementer, Documentation author
Task description:
- Remove shipped mock package lists and creation fallbacks from the deployment wizard.
- Load real bundle/version data from Bundle Organizer and submit deployment creation through the deployment API with canonical strategy names.
Completion criteria:
- [x] `create-deployment.component.ts` no longer relies on `MOCK_VERSIONS` or `MOCK_HOTFIXES`.
- [x] Strategy values exposed to operators match `rolling | blue_green | canary | all_at_once`.
- [x] Backend failures surface as operator-visible errors and do not navigate away on failure.
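The canonical strategy vocabulary above can be guarded before any create-deployment submission. The four values are from this sprint's criteria; applying such a pre-submit guard at all is an illustrative choice, not shipped behavior:

```shell
#!/usr/bin/env bash
# Guard deployment submissions with the canonical strategy set. The four
# values are from this sprint's criteria; applying the guard pre-submit is
# an illustrative choice.
is_canonical_strategy() {
  case "$1" in
    rolling|blue_green|canary|all_at_once) return 0 ;;
    *) return 1 ;;
  esac
}

is_canonical_strategy canary && echo "canary: ok"
```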
## Execution Log
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2026-04-04 | Sprint created; deployment endpoint and wizard parity work started. | Developer |
| 2026-04-04 | Deployment compatibility store and create endpoint landed; wizard switched to live bundle and deployment APIs; verified with focused JobEngine tests and Angular production build. | Developer |
## Decisions & Risks
- Initial parity will use an in-memory deployment store inside JobEngine rather than a new persistent schema in this batch; the goal is live contract behavior, not long-term retention yet.
- Deployment creation remains single-environment per runtime deployment; promotion-stage intent stays release metadata rather than a deployment-group model.
## Next Checkpoints
- 2026-04-04: land JobEngine endpoint changes and rerun targeted compatibility tests.


@@ -0,0 +1,56 @@
# Sprint 20260404-004 - Graph Explorer Live Contract
## Topic & Scope
- Add the REST compatibility facade the shipped Angular graph explorer expects.
- Remove fabricated shipped explorer overlay behavior so the visible graph path reflects backend overlays or explicit empties.
- Working directory: `src/Graph/`.
- Expected evidence: targeted Graph API tests, Angular build for graph explorer compatibility, updated docs, and sprint execution log.
## Dependencies & Concurrency
- Allows cross-module edits in `src/Web/StellaOps.Web/` for the shipped explorer route only.
- Independent of deployment and findings work except for shared Angular build verification.
## Documentation Prerequisites
- `docs/modules/graph/architecture.md`
- `docs/modules/web/architecture.md`
## Delivery Tracker
### GRAPH-API-004 - Add REST compatibility facade and saved-view endpoints
Status: DONE
Dependency: none
Owners: Developer / Implementer, Documentation author
Task description:
- Add `GET /graphs`, `GET /graphs/{id}`, `GET /graphs/{id}/tiles`, `GET /search`, `GET /paths`, `GET /graphs/{id}/export`, `GET /assets/{id}/snapshot`, and `GET /nodes/{id}/adjacency` as a compatibility facade over the existing in-memory graph/query services.
- Add saved-view endpoints for future UI persistence on the same compatibility surface.
Completion criteria:
- [x] The shipped `GraphPlatformHttpClient` routes are implemented server-side.
- [x] Saved-view endpoints exist and persist data in a real service abstraction.
- [x] Existing `/graph/*` endpoints remain intact for compatibility.
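The facade routes listed in the task description can be tabulated for smoke checks. Path templates are taken verbatim from this sprint (saved-view routes are omitted because their exact shape is not spelled out here); the route labels and the helper itself are assumptions:

```shell
#!/usr/bin/env bash
# Lookup for the compatibility facade paths; route labels here are invented
# names, while the path templates come from the task description.
graph_route() {  # $1 = route label, $2 = id for routes that need one
  case "$1" in
    list)      echo "/graphs" ;;
    detail)    echo "/graphs/$2" ;;
    tiles)     echo "/graphs/$2/tiles" ;;
    search)    echo "/search" ;;
    paths)     echo "/paths" ;;
    export)    echo "/graphs/$2/export" ;;
    snapshot)  echo "/assets/$2/snapshot" ;;
    adjacency) echo "/nodes/$2/adjacency" ;;
    *)         return 1 ;;
  esac
}

graph_route tiles g-42
```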
### FE-GRAPH-004 - Remove fabricated shipped explorer overlays
Status: DONE
Dependency: GRAPH-API-004
Owners: Developer / Implementer, Documentation author
Task description:
- Rewire the shipped graph explorer overlay handling to use live tile overlays rather than generated policy/evidence/license/exposure/reachability mock data.
- Overlay controls whose data was previously fabricated must be removed or rendered inactive with an explicit state instead of generating pseudo data.
Completion criteria:
- [x] Graph explorer loads its visible overlay state from tile payloads.
- [x] Unsupported fabricated overlay types are removed from the shipped explorer path.
- [x] The explorer fails gracefully when overlay data is absent.
## Execution Log
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2026-04-04 | Sprint created; Graph API compatibility and explorer cleanup started. | Developer |
| 2026-04-04 | Added the `/graphs*` compatibility facade and saved-view endpoints, rewired the shipped explorer to live `policy`/`vex`/`aoc` overlays, and verified with focused Graph API tests plus Angular production build. | Developer |
## Decisions & Risks
- Saved-view persistence is in-memory for this sprint; the contract is real, documented in `docs/modules/graph/architecture.md`, and covered by focused integration tests.
- The graph explorer route is the priority shipped surface. Unused demo-only graph helpers are not a blocker unless they leak into that route.
## Next Checkpoints
- 2026-04-04: land facade endpoints and validate the explorer against the compatibility routes.


@@ -0,0 +1,56 @@
# Sprint 20260404-005 - Findings Vulnerability Detail Read Model
## Topic & Scope
- Remove fabricated vulnerability-detail shaping from the shipped web path.
- Expose the v2 vulnerability-detail route the shipped web client expects from Findings Ledger so the frontend no longer has to invent detail data.
- Working directory: `src/Findings/`.
- Expected evidence: targeted Findings Ledger tests, Angular build for vulnerability detail, updated docs, and sprint execution log.
## Dependencies & Concurrency
- Allows cross-module edits in `src/Web/StellaOps.Web/` to remove frontend fallback fabrication and consume the live read model.
- Independent of deployment and graph work apart from shared web build verification.
## Documentation Prerequisites
- `docs/modules/findings-ledger/README.md`
- `docs/modules/web/architecture.md`
## Delivery Tracker
### FIND-API-005 - Expose the v2 vulnerability detail read model from Findings Ledger
Status: DONE
Dependency: none
Owners: Developer / Implementer, Documentation author
Task description:
- Add `/api/v2/security/vulnerabilities/{id}` to Findings Ledger and back it with projection plus optional scoring state.
- Return partial-but-real fields instead of invented enrichment, leaving unknown detail fields null or absent.
Completion criteria:
- [x] `/api/v2/security/vulnerabilities/{id}` exists and returns only real or null/absent fields.
- [x] Projection-backed findings and optional scoring data are mapped into the v2 detail response without fabricated gate, witness, or verification metadata.
- [x] Targeted Findings Ledger integration tests cover v2 detail behavior with and without cached scoring data.
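The partial-but-real rule above amounts to: emit a field only when you actually have it. A minimal sketch with invented field names, not the real v2 response schema:

```shell
#!/usr/bin/env bash
# Emit only fields we actually have; unknown detail fields stay absent
# rather than being invented. Field names here are illustrative, not the
# real v2 response schema.
detail_json() {  # $1 = vulnerability id, $2 = severity ("" when unknown)
  if [ -n "$2" ]; then
    printf '{"id":"%s","severity":"%s"}' "$1" "$2"
  else
    printf '{"id":"%s"}' "$1"
  fi
}

detail_json CVE-2026-0001 ""
echo
```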
### FE-FIND-005 - Remove frontend vulnerability detail fabrication
Status: DONE
Dependency: FIND-API-005
Owners: Developer / Implementer, Documentation author
Task description:
- Delete deterministic pseudo-score, EPSS, witness-path, and verification fallback shaping from the shipped vulnerability detail client/facade.
- Keep partial data rendering, but show gaps honestly when the backend omits fields.
Completion criteria:
- [x] `security-findings.client.ts` no longer fabricates vulnerability detail on HTTP fallback.
- [x] `vulnerability-detail.facade.ts` no longer invents signed-score verification data when proof data is absent.
- [x] The vulnerability detail page renders partial state cleanly without made-up security metadata.
## Execution Log
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2026-04-04 | Sprint created; vulnerability detail read-model and web fallback removal started. | Developer |
| 2026-04-04 | Added the Findings Ledger v2 vulnerability-detail endpoint, restored a live-only web facade, removed frontend fallback fabrication, and verified with focused Findings tests plus Angular production build. | Developer |
## Decisions & Risks
- Real-but-partial fields are acceptable; the page must not invent operator/security facts.
- The shipped web route now relies on Findings Ledger `v2` detail responses documented in `docs/modules/findings-ledger/README.md`; rewriting the legacy VulnExplorer sample-data routes is no longer a prerequisite for this shipped path.
## Next Checkpoints
- 2026-04-04: land the Findings Ledger vulnerability-detail read-model changes and rerun focused API tests.


@@ -0,0 +1,67 @@
# Sprint 20260405-001 - Local Gitea Bootstrap Hardening
## Topic & Scope
- Remove the contradictory local Gitea setup path, which left the instance install-locked while the docs still described manual first-login admin creation.
- Ensure the compose-backed Gitea service reaches a deterministic admin-ready state on fresh volumes before it reports healthy.
- Sync the local-operator docs so they describe the actual bootstrap flow and the remaining manual PAT-to-Vault step.
- Working directory: `devops/compose/`.
- Expected evidence: `docker compose config` validation, live `gitea admin user list` verification, updated operator docs.
## Dependencies & Concurrency
- Depends on `docs/integrations/LOCAL_SERVICES.md`, `devops/compose/README.md`, and the local integration catalog bootstrap history in `docs/implplan/SPRINT_20260403_004_Integrations_local_integration_catalog_bootstrap.md`.
- Cross-module edits allowed for `docs/integrations/**`, `docs/implplan/**`, and compose helper scripts under `devops/compose/scripts/`.
## Documentation Prerequisites
- `docs/operations/devops/README.md`
- `docs/operations/devops/architecture.md`
- `docs/operations/devops/implementation_plan.md`
- `docs/modules/platform/architecture-overview.md`
- `docs/integrations/LOCAL_SERVICES.md`
- `devops/compose/README.md`
## Delivery Tracker
### TASK-1 - Harden the compose-backed Gitea bootstrap path
Status: DONE
Dependency: none
Owners: Developer
Task description:
- Replace the incomplete local Gitea bring-up path with a deterministic bootstrap that creates the repository root and first admin user from the compose service itself.
- Make the service health check reflect the admin-ready state instead of only proving that `/api/v1/version` responds.
Completion criteria:
- [x] Fresh local Gitea volumes create a deterministic admin user without requiring a manual setup wizard.
- [x] The compose service no longer carries the unused `gitea-db` mount that implied a different SQLite location than the image template uses.
- [x] The Gitea health check stays red until an admin exists.
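The deterministic bootstrap above can be sketched as an idempotent entrypoint check. The function and parameter names below are illustrative, not the shipped entrypoint script; `list_admins`/`create_admin` stand in for the real CLI calls (`gitea admin user list` and `gitea admin user create --admin ...`) so the decision logic is shown without a live Gitea instance:

```shell
#!/bin/sh
# Idempotent admin bootstrap sketch (assumed names, not the shipped script).
# $1 lists existing admins; $2 creates one. Both are injected so the
# seeding rule is visible: create only when no admin exists yet.
bootstrap_admin() {
  list_admins="$1"
  create_admin="$2"
  if [ -z "$($list_admins)" ]; then
    $create_admin
    echo "admin created"
  else
    # An existing admin is deliberately preserved, matching the sprint's
    # fresh-or-admin-less seeding rule.
    echo "existing admin preserved"
  fi
}
```

In the real entrypoint the same check also gates the health signal, so the service only reports healthy once an admin actually exists.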
### TASK-2 - Sync operator docs with the corrected bootstrap flow
Status: DONE
Dependency: TASK-1
Owners: Developer
Task description:
- Update the compose README and local integration service guide so they describe the actual local Gitea admin bootstrap and token workflow.
- Record the root cause and the corrected procedure for future local integration bring-up.
Completion criteria:
- [x] `devops/compose/README.md` documents the default local admin credentials and the new health expectation.
- [x] `docs/integrations/LOCAL_SERVICES.md` removes the stale first-login guidance and keeps PAT creation explicit.
- [x] Decisions & Risks link the corrected docs back to the original setup contradiction.
## Execution Log
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2026-04-05 | Sprint created after live investigation showed `stellaops-gitea` running install-locked with no admin users despite local docs still describing manual first-login bootstrap. | Developer |
| 2026-04-05 | Replaced the incomplete manual path with a self-bootstrap Gitea entrypoint, explicit config persistence, and an admin-aware health check. | Developer |
| 2026-04-05 | Updated the compose README and local integration services guide to document deterministic local admin bootstrap and the remaining manual PAT/Vault step. | Developer |
| 2026-04-05 | Validation: `docker compose -f devops/compose/docker-compose.integrations.yml config` passed; a disposable fresh-volume Gitea container auto-created the `stellaops` admin and repository root. | Developer |
| 2026-04-05 | Applied the corrected compose definition to the live `stellaops-gitea` service with `docker compose -f devops/compose/docker-compose.integrations.yml up -d --force-recreate gitea`; the container returned `healthy` with the admin-aware health check. | Developer |
## Decisions & Risks
- Root cause: the official Gitea image generated `app.ini` with `INSTALL_LOCK=true` and no admin bootstrap, while the local docs still told operators to create the admin on first login. The result was an install-locked but admin-less instance. Corrected paths: `devops/compose/docker-compose.integrations.yml`, `devops/compose/README.md`, `docs/integrations/LOCAL_SERVICES.md`.
- Personal access tokens remain a manual step because the token value is only disclosed at creation time. The docs now make that explicit instead of implying a complete zero-touch SCM credential flow.
- Existing Gitea volumes with an already-present admin are left intact by the bootstrap logic; the entrypoint only seeds the admin on fresh or admin-less state.
- The live diagnostic volume still contains the temporary `codex-probe` admin created during root-cause analysis. The new bootstrap deliberately preserves existing admins instead of mutating them, so removing that account is a separate manual cleanup task rather than part of the deterministic bootstrap fix.
## Next Checkpoints
- Decide whether the local Vault bootstrap should also seed a Gitea PAT for fully automated integration catalog bring-up, or whether keeping PAT creation operator-driven is the preferred local-security tradeoff.
- Apply the same "healthy means bootstrapped" rule to any other compose-backed integration services that still report green before their documented local setup is actually complete.


@@ -0,0 +1,62 @@
# Sprint 20260405-002 - FE Active-Surface Test Lane Repair
## Topic & Scope
- Restore a reliable focused Angular unit-test lane for shipped Graph, Findings, Evidence, Topology, and deployment flows.
- Fix the immediate compile blockers that currently prevent focused spec runs on active surfaces.
- Working directory: `src/Web/StellaOps.Web/`.
- Expected evidence: focused Vitest run for active-surface specs, Angular production build, updated docs, and sprint execution log.
## Dependencies & Concurrency
- Depends on the shipped-surface parity work completed in `SPRINT_20260404_002_FE_evidence_topology_live_surfaces.md`, `SPRINT_20260404_003_JobEngine_deployment_run_parity.md`, `SPRINT_20260404_004_Graph_graph_explorer_live_contract.md`, and `SPRINT_20260404_005_Findings_vulnerability_detail_read_model.md`.
- Safe to run before Graph and JobEngine persistence work; those follow-on sprints depend on this focused verification lane.
## Documentation Prerequisites
- `docs/modules/web/architecture.md`
- `docs/implplan/SPRINT_20260404_002_FE_evidence_topology_live_surfaces.md`
- `docs/implplan/SPRINT_20260404_003_JobEngine_deployment_run_parity.md`
- `docs/implplan/SPRINT_20260404_004_Graph_graph_explorer_live_contract.md`
- `docs/implplan/SPRINT_20260404_005_Findings_vulnerability_detail_read_model.md`
## Delivery Tracker
### FE-TEST-006 - Repair active-surface Angular compile blockers
Status: DONE
Dependency: none
Owners: Developer / Implementer, Test Automation
Task description:
- Fix the concrete Angular compile faults that currently break focused spec runs for shipped surfaces, including malformed inline templates and missing reactive imports in touched release/evidence flows.
- Keep the write scope limited to active shipped surfaces and directly affected tests.
Completion criteria:
- [x] The evidence packet component template compiles cleanly in unit-test builds.
- [x] The environment detail component compiles cleanly with its reactive state restored.
- [x] Any touched active-surface spec compiles without newly introduced type errors.
### FE-TEST-007 - Add a focused active-surface spec lane and quarantine note
Status: DONE
Dependency: FE-TEST-006
Owners: Developer / Implementer, Test Automation, Documentation author
Task description:
- Add a dedicated active-surface test target that only includes the shipped Graph, Findings, Evidence, Topology, and deployment wizard specs needed for current parity work.
- Document the intentionally excluded stale-spec backlog so focused verification is auditable rather than accidental.
Completion criteria:
- [x] A dedicated Angular/Vitest target exists for active-surface specs.
- [x] The focused lane covers Graph overlays, vulnerability detail, deployment creation, and evidence/topology flows.
- [x] The current unrelated stale-spec exclusions are documented in this sprint's Decisions & Risks.
## Execution Log
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2026-04-05 | Sprint created; active-surface Web test lane repair started. | Developer |
| 2026-04-05 | Fixed the evidence packet template, restored the missing `computed` import in environment detail, and corrected the touched active-surface specs. | Developer |
| 2026-04-05 | Added the `test-active-surfaces` Angular target plus `npm run test:active-surfaces`, including the deployment-wizard spec for the shipped create-deployment flow. | Developer |
| 2026-04-05 | Verification passed: `npm run test:active-surfaces` (25/25) and `npm run build -- --configuration=production --output-path=dist`. | Test Automation |
## Decisions & Risks
- The broader stale Angular spec backlog is intentionally out of scope unless a broken test blocks a shipped active-surface spec.
- The focused lane must prove shipped behavior without depending on unrelated legacy spec folders.
- The focused lane intentionally excludes the unrelated legacy spec debt still present under moved/removed areas such as `agents`, older `signals` tests, and stale release/policy shell expectations. Those remain backlog work rather than hidden red builds.
## Next Checkpoints
- 2026-04-05: land active-surface compile fixes and run focused Web verification.


@@ -0,0 +1,58 @@
# Sprint 20260405-003 - Graph Saved Views Persistence
## Topic & Scope
- Replace the temporary in-memory Graph saved-view store with persisted storage.
- Add startup migrations for the saved-view schema path and keep the compatibility REST facade unchanged for the shipped Console.
- Working directory: `src/Graph/`.
- Expected evidence: targeted Graph API tests, restart-aware persistence verification, updated docs, and sprint execution log.
## Dependencies & Concurrency
- Depends on `SPRINT_20260405_002_FE_test_lane_repair_for_active_surfaces.md` for faster focused frontend verification.
- Allows cross-module edits in `src/Web/StellaOps.Web/` only if the live Graph UI needs small adjustments to persisted saved-view behavior.
## Documentation Prerequisites
- `docs/modules/graph/architecture.md`
- `src/Graph/AGENTS.md`
- `docs/implplan/SPRINT_20260404_004_Graph_graph_explorer_live_contract.md`
## Delivery Tracker
### GRAPH-PERSIST-006 - Persist Graph saved views in PostgreSQL with startup migrations
Status: DONE
Dependency: none
Owners: Developer / Implementer, Documentation author
Task description:
- Introduce a persisted saved-view store for the compatibility Graph API and wire startup migrations for its schema ownership path.
- Preserve tenant isolation, deterministic ordering, and the existing `/graphs/{graphId}/saved-views` REST contract.
Completion criteria:
- [x] Graph saved views are stored in PostgreSQL rather than process memory when persistence is configured.
- [x] Startup migrations create the saved-view tables automatically for a clean database.
- [x] Saved-view list/create/delete keeps the existing compatibility API contract.
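A startup migration such as `003_saved_views.sql` might take roughly the following shape. The table and column names below are assumptions for illustration, not the shipped schema, and the migration would be applied by the repo's existing migration runner rather than by hand; psql is shown only for concreteness:

```shell
# Assumed schema sketch only -- not the shipped 003_saved_views.sql.
psql "$GRAPH_DB" <<'SQL'
CREATE TABLE IF NOT EXISTS graph_saved_views (
    tenant_id   text        NOT NULL,
    graph_id    text        NOT NULL,
    view_id     text        NOT NULL,
    name        text        NOT NULL,
    definition  jsonb       NOT NULL,
    created_at  timestamptz NOT NULL DEFAULT now(),
    PRIMARY KEY (tenant_id, graph_id, view_id)  -- tenant isolation in the key
);
CREATE INDEX IF NOT EXISTS ix_graph_saved_views_order
    ON graph_saved_views (tenant_id, graph_id, name);  -- deterministic listing
SQL
```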
### GRAPH-PERSIST-007 - Add restart-aware verification and sync docs
Status: DONE
Dependency: GRAPH-PERSIST-006
Owners: Test Automation, Documentation author
Task description:
- Add focused tests that prove saved views remain available across service/store reinitialization and document the persistence behavior in module docs.
Completion criteria:
- [x] Targeted Graph tests cover create/read/delete against the persisted store.
- [x] At least one test proves persistence across a store or host restart boundary.
- [x] `docs/modules/graph/architecture.md` records the saved-view persistence model.
## Execution Log
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2026-04-05 | Sprint created; Graph saved-view persistence queued behind Web test-lane repair. | Developer |
| 2026-04-05 | Added `IGraphSavedViewStore`, PostgreSQL-backed persistence, startup migration `003_saved_views.sql`, and runtime fallback selection between persisted and in-memory stores. | Developer |
| 2026-04-05 | Verification passed: `dotnet test "src/Graph/__Tests/StellaOps.Graph.Api.Tests/StellaOps.Graph.Api.Tests.csproj" -- --filter-class StellaOps.Graph.Api.Tests.GraphCompatibilityEndpointsIntegrationTests` (3/3). | Test Automation |
## Decisions & Risks
- Saved views need durable storage now; broader graph dataset persistence remains out of scope for this sprint.
- Reuse the repo's existing PostgreSQL migration conventions instead of adding a second migration mechanism.
- Store selection is now resolved from bound `Postgres:Graph` options at DI/runtime rather than from an early configuration snapshot, so test-host and deployment overrides correctly pick the persisted store.
## Next Checkpoints
- 2026-04-05: land persisted saved-view store, migrations, and focused Graph verification.


@@ -0,0 +1,60 @@
# Sprint 20260405-004 - JobEngine Deployment Store Persistence
## Topic & Scope
- Replace the in-memory release-control compatibility deployment store with persisted storage in the orchestrator schema.
- Keep the shipped deployment compatibility API unchanged while making lifecycle state durable.
- Working directory: `src/JobEngine/`.
- Expected evidence: targeted JobEngine compatibility tests, restart-aware persistence verification, updated docs, and sprint execution log.
## Dependencies & Concurrency
- Depends on `SPRINT_20260405_002_FE_test_lane_repair_for_active_surfaces.md` for focused frontend verification of the shipped deployment path.
- Allows cross-module edits in `src/Web/StellaOps.Web/` and `docs/modules/release-orchestrator/` only if the persisted behavior requires minor UI/doc alignment.
## Documentation Prerequisites
- `docs/modules/jobengine/architecture.md`
- `docs/modules/jobengine/README.md`
- `src/JobEngine/AGENTS.md`
- `docs/implplan/SPRINT_20260404_003_JobEngine_deployment_run_parity.md`
## Delivery Tracker
### ORCH-PERSIST-006 - Persist compatibility deployments in the orchestrator schema
Status: DONE
Dependency: none
Owners: Developer / Implementer, Documentation author
Task description:
- Move the compatibility deployment list/detail/events/logs/metrics and lifecycle mutations onto persisted storage under the existing orchestrator migration regime.
- Preserve the shipped endpoint surface and strategy vocabulary already exposed to the Console.
Completion criteria:
- [x] Compatibility deployments are stored durably in PostgreSQL when the WebService uses JobEngine infrastructure.
- [x] Startup migrations create the compatibility deployment tables automatically.
- [x] Pause, resume, cancel, rollback, retry, and create flows all mutate persisted state and event history.
### ORCH-PERSIST-007 - Add restart-aware tests and sync docs
Status: DONE
Dependency: ORCH-PERSIST-006
Owners: Test Automation, Documentation author
Task description:
- Extend the focused JobEngine compatibility tests to prove deployments remain readable across a restart boundary and document the persisted compatibility path.
Completion criteria:
- [x] Targeted JobEngine tests cover persisted create/read/lifecycle behavior.
- [x] At least one test proves deployment state survives service/store restart.
- [x] `docs/modules/jobengine/architecture.md` records the persisted compatibility deployment store.
## Execution Log
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2026-04-05 | Sprint created; persisted compatibility deployment store queued behind Web test-lane repair. | Developer |
| 2026-04-05 | Replaced the static endpoint-owned compatibility store with DI-backed `IDeploymentCompatibilityStore`, added PostgreSQL persistence plus orchestrator migration `011_compatibility_deployments.sql`, and kept the shipped REST contract intact. | Developer |
| 2026-04-05 | Tightened JobEngine configuration precedence so an explicit `JobEngine:Database:ConnectionString` wins over legacy `Orchestrator` fallback values. | Developer |
| 2026-04-05 | Verification passed: `dotnet test "src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Tests/StellaOps.JobEngine.Tests.csproj" -m:1 -- --filter-class StellaOps.JobEngine.Tests.ControlPlane.ReleaseCompatibilityEndpointsTests` (5/5). | Test Automation |
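The configuration precedence tightened in the log reduces to a simple resolution rule. The sketch below uses placeholder parameter names, not the actual .NET configuration keys; it only illustrates that an explicit `JobEngine:Database:ConnectionString` must win over the legacy `Orchestrator` fallback:

```shell
# Precedence sketch (illustrative names): explicit JobEngine value wins,
# legacy Orchestrator value is only a fallback.
resolve_connection_string() {
  jobengine_value="$1"
  orchestrator_value="$2"
  if [ -n "$jobengine_value" ]; then
    echo "$jobengine_value"
  else
    echo "$orchestrator_value"
  fi
}
```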
## Decisions & Risks
- The compatibility API must remain stable for the shipped Console even as the backing store changes.
- Existing seed records can stay as bootstrap data, but runtime state must no longer be process-local only.
- Seed deployments remain bootstrap data per tenant, but they are now inserted into persisted storage on demand so lifecycle mutations survive host restart instead of resetting with process memory.
## Next Checkpoints
- 2026-04-05: land orchestrator persistence for compatibility deployments and rerun focused JobEngine verification.


@@ -0,0 +1,59 @@
# Sprint 20260405-005 - FE Shipped UI Polish
## Topic & Scope
- Remove obvious warning-level friction from the shipped Angular build and tighten empty/error messaging on touched shipped pages.
- Keep the scope to the active shipped surfaces touched by recent parity work rather than broad visual redesign.
- Working directory: `src/Web/StellaOps.Web/`.
- Expected evidence: Angular production build, focused active-surface tests, updated docs, and sprint execution log.
## Dependencies & Concurrency
- Depends on `SPRINT_20260405_002_FE_test_lane_repair_for_active_surfaces.md`.
- Benefits from persisted Graph and JobEngine behavior but may land small UX/build fixes independently where safe.
## Documentation Prerequisites
- `docs/modules/web/architecture.md`
- `docs/implplan/SPRINT_20260404_002_FE_evidence_topology_live_surfaces.md`
- `docs/implplan/SPRINT_20260404_004_Graph_graph_explorer_live_contract.md`
- `docs/implplan/SPRINT_20260404_005_Findings_vulnerability_detail_read_model.md`
## Delivery Tracker
### FE-POLISH-006 - Remove current shipped-path build warnings and dead wiring
Status: DONE
Dependency: none
Owners: Developer / Implementer
Task description:
- Address the current setup-wizard style-budget warnings and remove dead imports/template wiring on touched shipped pages.
- Keep bundle-budget changes as a last resort; prefer actual CSS or template cleanup.
Completion criteria:
- [x] The Angular production build no longer emits the current setup-wizard style-budget warnings.
- [x] Touched shipped components do not retain dead imports or dead template bindings.
- [x] No new build warnings are introduced by the polish work.
### FE-POLISH-007 - Improve shipped empty/error states without fake affordances
Status: DONE
Dependency: FE-POLISH-006
Owners: Developer / Implementer, Documentation author
Task description:
- Tighten empty-state and unavailable-action messaging on touched Graph, evidence, topology, and vulnerability-detail pages so operators see explicit outcomes rather than silent no-ops.
Completion criteria:
- [x] Touched shipped pages show explicit empty or unavailable messaging where backend data is missing.
- [x] No touched shipped page exposes a fake action affordance without a real backend path.
- [x] Web architecture docs reflect any operator-visible behavior changes.
## Execution Log
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2026-04-05 | Sprint created; shipped UI polish queued behind active-surface test-lane repair. | Developer |
| 2026-04-05 | Moved the setup wizard and step-content component styles out of oversized inline component bundles into global SCSS so the build clears `anyComponentStyle` budgets without raising them. | Developer |
| 2026-04-05 | Revalidated the focused shipped surfaces after the style extraction: `npm run test:active-surfaces` (25/25) and `npm run build -- --configuration=production --output-path=dist` both passed without setup-wizard style-budget warnings. | Test Automation |
## Decisions & Risks
- Build-warning cleanup must stay scoped to active shipped surfaces to avoid turning into a repo-wide CSS rewrite.
- Operator-facing clarity takes priority over cosmetic expansion.
- The explicit empty/unavailable messaging introduced in the earlier shipped-surface parity sprints remained the correct product behavior; this sprint kept those live-only states intact while removing build-warning debt.
## Next Checkpoints
- 2026-04-05: remove active shipped-path warning debt and rerun build plus focused tests.


@@ -0,0 +1,69 @@
# Sprint 20260405-007 - Local Integration Idle CPU Tuning
## Topic & Scope
- Reduce unnecessary idle CPU in the local third-party integration lane without breaking the default Stella platform or the CI/testing compose lane.
- Move high-idle optional providers behind explicit opt-in startup commands where that better matches their real local usage.
- Document which compose lane installs which containers so operators do not confuse `docker-compose.testing.yml` with `docker-compose.integrations.yml`.
- Working directory: `devops/compose/`.
- Expected evidence: compose config validation, runtime inspection of GitLab/Consul/PostgreSQL/Valkey, updated operator docs.
## Dependencies & Concurrency
- Depends on `devops/compose/docker-compose.integrations.yml`, `devops/compose/README.md`, `docs/integrations/LOCAL_SERVICES.md`, `docs/INSTALL_GUIDE.md`, and `docs/dev/DEV_ENVIRONMENT_SETUP.md`.
- Cross-module edits allowed for `docs/integrations/**`, `docs/implplan/**`, and top-level setup/install docs that point operators at the local compose lanes.
## Documentation Prerequisites
- `docs/operations/devops/README.md`
- `docs/operations/devops/architecture.md`
- `docs/operations/devops/implementation_plan.md`
- `docs/modules/platform/architecture-overview.md`
- `devops/compose/README.md`
- `docs/integrations/LOCAL_SERVICES.md`
## Delivery Tracker
### TASK-1 - Lower the idle footprint of optional local integration providers
Status: DONE
Dependency: none
Owners: Developer
Task description:
- Reconfigure the local integrations compose lane so Consul no longer burns CPU in the default bring-up path and GitLab uses genuine low-idle omnibus settings for local SCM/API validation.
- Preserve an explicit opt-in path for features that justify the extra cost, including Consul KV checks and GitLab registry/package coverage.
Completion criteria:
- [x] Consul is no longer part of the default `docker compose -f docker-compose.integrations.yml up -d` lane.
- [x] GitLab uses low-idle local defaults with corrected Puma/Sidekiq tuning and optional registry/package re-enable flags.
- [x] The compose file still validates with `docker compose config`.
### TASK-2 - Clarify which compose lane installs which containers
Status: DONE
Dependency: TASK-1
Owners: Developer
Task description:
- Update the local compose docs so operators can distinguish the CI/testing stack from the real third-party integration stack and know when GitLab or Consul should be started explicitly.
- Record the CPU-triage findings so future local bring-up choices are informed by actual runtime behavior rather than assumptions.
Completion criteria:
- [x] `devops/compose/README.md` explains the low-idle default lane plus the opt-in Consul and GitLab commands.
- [x] `docs/integrations/LOCAL_SERVICES.md` reflects the new startup model and GitLab/Consul behavior.
- [x] Install/dev guides mention that `docker-compose.testing.yml` does not install GitLab or Consul.
## Execution Log
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2026-04-05 | Sprint created after a two-minute CPU sample showed the local integration lane's top sustained consumers were `router-gateway`, GitLab, PostgreSQL, Consul, and Valkey. | Developer |
| 2026-04-05 | Reconfigured `docker-compose.integrations.yml` so Consul is opt-in and GitLab uses corrected low-idle omnibus settings with optional registry/package re-enable flags. | Developer |
| 2026-04-05 | Updated compose/install/local-service docs to distinguish the testing lane from the real third-party integration lane and to document the new GitLab/Consul startup model. | Developer |
| 2026-04-05 | Runtime validation: stopped the live `stellaops-consul` container, recreated `stellaops-gitlab`, confirmed GitLab returned `healthy` with `gitlab-kas` disabled, and captured fresh PostgreSQL/GitLab/Valkey traces plus a post-change top-5 CPU sample. | Developer |
| 2026-04-05 | Follow-up runtime validation: moved Gitea admin-bootstrap proof from the repeating healthcheck into a one-time sentinel written by the entrypoint, recreated `stellaops-gitea`, and confirmed the expensive healthcheck loop no longer dominates Gitea CPU. | Developer |
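The sentinel pattern from the last log entry can be sketched as follows; the sentinel path is an assumed name, not the shipped location:

```shell
# One-time sentinel written by the entrypoint after bootstrap succeeds, so the
# repeating healthcheck becomes a cheap file test instead of a CLI probe.
SENTINEL="${SENTINEL:-/tmp/.admin-bootstrapped}"
write_sentinel() { : > "$SENTINEL"; }    # entrypoint: run once, post-bootstrap
check_health()  { [ -f "$SENTINEL" ]; }  # healthcheck: no gitea CLI, near-zero CPU
```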
## Decisions & Risks
- Runtime evidence showed Consul had zero registered services/checks yet still spent CPU in dev-agent churn, so the default local lane now leaves it off unless the Consul connector is being validated explicitly.
- GitLab CPU was dominated by Sidekiq cron/background work and a larger-than-expected Puma footprint. The compose file now uses `sidekiq['concurrency']` and `puma['worker_processes']`, which match the Omnibus template keys, instead of the previous ineffective local tuning.
- Post-change runtime checks showed GitLab settles back down after reconfigure, but it still runs unavoidable Omnibus background work whenever the container is up. The durable low-idle control is therefore opt-in startup, rather than any assumption that GitLab can be made "free" while running.
- The original Gitea fix proved the admin existed by running `gitea admin user list` from the healthcheck every 30 seconds. That caused misleading CPU spikes during later monitoring, so the healthcheck now validates a sentinel file created once by the entrypoint instead.
- GitLab registry/package features are now opt-in via env vars for the local lane. Operators who need GitLab registry coverage must start GitLab with `GITLAB_ENABLE_REGISTRY=true` (and packages with `GITLAB_ENABLE_PACKAGES=true`).
- PostgreSQL and Valkey remain active because they are core Stella runtime dependencies, not optional third-party fixtures. Their load must be analyzed service-by-service rather than disabled globally.
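As a hedged sketch, the low-idle direction above might look like the following omnibus fragment. The values are illustrative, not the shipped compose defaults; only `sidekiq['concurrency']` and `puma['worker_processes']` are confirmed by this sprint as the effective Omnibus template keys:

```shell
# Illustrative low-idle fragment passed via GITLAB_OMNIBUS_CONFIG;
# values are assumptions, not the shipped compose defaults.
export GITLAB_OMNIBUS_CONFIG="
  puma['worker_processes'] = 0   # single-mode Puma for a local instance
  sidekiq['concurrency'] = 5     # shrink the background worker pool
  gitlab_kas['enable'] = false   # KAS off, matching the runtime check
"
```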
## Next Checkpoints
- Re-sample container CPU after the live GitLab recreate and Consul shutdown to confirm the top 5 ranking changed as expected.
- If Valkey and router-gateway remain the dominant sustained pair, trace the queue-wait and stream-consumer settings in the router transport next.


@@ -0,0 +1,86 @@
# Sprint 20260405-008 - Consul, Postgres, And Router Runtime Tuning
## Topic & Scope
- Keep the local Consul integration provider running while reducing its idle CPU footprint.
- Increase local PostgreSQL diagnostics enough to capture slow-query and lock context for the active Stella stack.
- Trace the router gateway and Valkey messaging behavior to separate real traffic from avoidable idle churn, then apply safe local tuning where it does not sacrifice functionality.
- Working directory: `devops/compose/`.
- Expected evidence: live container samples, compose updates, PostgreSQL runtime configuration, and documented router/Valkey findings.
## Dependencies & Concurrency
- Depends on `devops/compose/docker-compose.integrations.yml`, `devops/compose/docker-compose.stella-ops.yml`, `devops/compose/README.md`, `docs/integrations/LOCAL_SERVICES.md`, and `docs/implplan/SPRINT_20260405_007_Integrations_local_idle_cpu_tuning.md`.
- Cross-module read access required for `src/Router/**` to explain runtime messaging behavior.
- Cross-module doc edits allowed for `docs/integrations/**`, `docs/implplan/**`, and top-level setup/devops docs that describe the local runtime.
## Documentation Prerequisites
- `docs/operations/devops/README.md`
- `docs/operations/devops/architecture.md`
- `docs/operations/devops/implementation_plan.md`
- `docs/modules/platform/architecture-overview.md`
- `devops/compose/README.md`
- `docs/integrations/LOCAL_SERVICES.md`
- `src/Router/AGENTS.md`
- `src/Router/__Libraries/StellaOps.Router.Gateway/AGENTS.md`
- `src/Router/__Libraries/StellaOps.Messaging.Transport.Valkey/AGENTS.md`
## Delivery Tracker
### TASK-1 - Keep Consul up with a lower idle footprint
Status: DONE
Dependency: none
Owners: Developer
Task description:
- Replace the current local Consul dev-agent mode with a lower-idle single-server configuration that preserves the HTTP API and local UI surface needed for connector validation.
- Validate the new mode against the live compose service and record before/after CPU evidence.
Completion criteria:
- [x] `stellaops-consul` stays up in the local integrations lane.
- [x] Idle CPU is measurably lower than the current `agent -dev` mode.
- [x] Docs reflect the retained startup and any changed operational caveats.
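The retained startup can be sketched as the server-mode command line. The flags match this sprint's execution log; the `-client` bind address is an assumption for container use and not part of the recorded change:

```shell
# Lower-idle single-node server replacing `agent -dev`; data persists in
# /consul/data and the HTTP API/UI surface stays available.
CONSUL_CMD="agent -server -bootstrap-expect=1 -ui -data-dir=/consul/data -client=0.0.0.0"
echo "consul $CONSUL_CMD"
```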
### TASK-2 - Raise PostgreSQL diagnostics for local tracing
Status: DONE
Dependency: none
Owners: Developer
Task description:
- Enable targeted local PostgreSQL logging that captures slow statements and lock-related context without turning the dev database into an unreadable firehose.
- Record the exact runtime settings and confirm they are active on the live container.
Completion criteria:
- [x] Slow-query and lock-wait logging is enabled on the live `stellaops-postgres` instance.
- [x] The chosen settings are documented in the sprint log and reflected in local ops guidance if they become part of compose defaults.
- [x] At least one follow-up log capture demonstrates the new diagnostics are active.
### TASK-3 - Trace router gateway and Valkey churn without reducing functionality
Status: DONE
Dependency: none
Owners: Developer
Task description:
- Investigate the router gateway's Valkey-backed messaging loops and determine whether the dominant CPU comes from real request throughput, heartbeat traffic, or avoidable control-plane churn.
- Propose or apply safe local tuning only where the behavior preserves routing readiness and service connectivity.
Completion criteria:
- [x] Router gateway, Valkey, and PostgreSQL traces are correlated into a concrete runtime explanation.
- [x] Any applied tuning preserves gateway readiness and microservice connectivity.
- [x] Remaining non-applied improvements are documented with explicit tradeoffs.
## Execution Log
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2026-04-05 | Sprint created after the follow-up request to keep Consul running, increase PostgreSQL diagnostics, and investigate router-gateway/Valkey runtime churn without sacrificing functionality. | Developer |
| 2026-04-05 | Replaced local Consul `agent -dev` with a persistent single-node server (`-server -bootstrap-expect=1 -ui -data-dir=/consul/data`) and validated live CPU falling from roughly 3-4% idle to roughly 0.5-1.3% while keeping the HTTP KV surface and UI available. Updated the integrations compose docs accordingly. | Developer |
| 2026-04-05 | Enabled targeted PostgreSQL diagnostics on the live `stellaops-postgres` container via `ALTER SYSTEM`: `log_min_duration_statement=100ms`, `log_connections=on`, `log_disconnections=on`, `log_lock_waits=on`, `deadlock_timeout=500ms`, and a richer `log_line_prefix`. Verified the settings in `postgresql.auto.conf` and confirmed slow-query logging with a `pg_sleep(0.25)` probe. | Developer |
| 2026-04-05 | Correlated router-gateway, Valkey, and code-level evidence. Empty router request streams ruled out backlog. The dominant churn is repeated HELLO re-registration across the full microservice fleet, not user request load. In a 60-second sample the gateway logged 261 `HELLO received` events and 261 matching `Messaging connection registered` events, aligning with the 10-second `RegistrationRefreshIntervalSeconds` default across roughly 42 connected services. Patched local compose defaults to `30s` messaging heartbeat and `30s` registration refresh for the next live redeploy. | Developer |
| 2026-04-05 | Recreated the main `docker-compose.stella-ops.yml` stack with the new router defaults and re-sampled the live system after it settled. Gateway readiness stayed green. Router HELLO traffic fell from 261/min to 84/min, and the corresponding Valkey command deltas fell to `xreadgroup=621`, `xautoclaim=262`, `publish=168`, `ping=667`, `xadd=168`, `xack=168`, and `xdel=168` over 60 seconds. Router CPU in the same window averaged roughly 3.1% with bursty peaks, while Valkey averaged roughly 1.0%, PostgreSQL roughly 0.3%, and Consul roughly 0.4% outside isolated blips. | Developer |
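The targeted diagnostics recorded in the log can be applied with statements like the following (psql invocation details and the `log_line_prefix` value are assumptions; the other settings are the ones verified on the live container):

```shell
# Targeted local diagnostics only -- full statement logging stays off.
psql -U stellaops -d stellaops <<'SQL'
ALTER SYSTEM SET log_min_duration_statement = '100ms';
ALTER SYSTEM SET log_lock_waits = on;
ALTER SYSTEM SET deadlock_timeout = '500ms';
ALTER SYSTEM SET log_connections = on;
ALTER SYSTEM SET log_disconnections = on;
ALTER SYSTEM SET log_line_prefix = '%m [%p] %u@%d %a %h ';  -- assumed value
SELECT pg_reload_conf();  -- these settings take effect on reload, no restart
SQL
```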
## Decisions & Risks
- `docs/operations/devops/TASKS.md` is referenced by the module AGENTS but does not exist in the repository. This sprint records status in `docs/implplan` instead.
- Any router-gateway tuning must preserve the gateway readiness contract and the current required microservice set; lowering CPU by making the gateway slower to detect disconnected services is not acceptable unless the tradeoff is explicit and bounded.
- PostgreSQL diagnostics should stay targeted. Full statement logging would distort the very CPU profile we are trying to understand.
- Router/Valkey analysis corrected an earlier assumption: `VALKEY_QUEUE_WAIT_TIMEOUT=0` does not create extra polling here. In the current implementation it means infinite wait on the pub/sub signal, which is risky for resilience but not the dominant CPU source. The measurable churn comes from repeated HELLO refreshes and gateway re-registration processing.
- PostgreSQL connection logging surfaced separate short-session churn from web workloads even after the router fix. Earlier samples showed bursts from `stellaops-advisory-ai-web` (`172.19.0.62`), while the later 60-second sample showed `stellaops-scanner-web` (`172.19.0.60`) opening most of the remote sessions. That is outside the router fix and should be handled as a dedicated connection-pooling and `Application Name` follow-up if it keeps mattering.
## Next Checkpoints
- Validate the lower-idle Consul mode against the live `stellaops-consul` container.
- Apply and verify PostgreSQL logging changes on the running stack.
- Use the new PostgreSQL logging to identify the highest-churn application sessions and decide whether `pg_stat_statements` or connection-string `Application Name` standardization is needed in local compose.
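A minimal sketch for the churn-attribution checkpoint, assuming the `stellaops-postgres` container and the default `postgres` superuser from local compose; it ranks live client sessions by `application_name` so unattributed (`[unknown]`) sources stand out:

```shell
# Rank live client sessions by application_name; blank names show as [unknown].
docker exec stellaops-postgres psql -U postgres -Atc \
  "SELECT coalesce(nullif(application_name, ''), '[unknown]') AS app, count(*) AS sessions
   FROM pg_stat_activity
   WHERE backend_type = 'client backend'
   GROUP BY 1 ORDER BY sessions DESC;"
```

Pairing this with the new connection logging makes it straightforward to decide whether `pg_stat_statements` or `Application Name` standardization is warranted.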


@@ -0,0 +1,89 @@
# Sprint 20260405-009 - Router Registration Resync And Hello Slimming
## Topic & Scope
- Replace the current periodic full HELLO replay with a cheaper control-plane pattern in the Router module.
- Keep endpoint/schema/OpenAPI replay available for service startup and explicit gateway resync, while periodic liveness traffic stays small.
- Preserve messaging transport resilience when Valkey Pub/Sub notifications degrade or disappear.
- Working directory: `src/Router/`.
- Expected evidence: targeted Router tests, updated router docs, and live compose/runtime samples.
## Dependencies & Concurrency
- Depends on `docs/implplan/SPRINT_20260405_008_Integrations_consul_pg_router_runtime_tuning.md` for the runtime baseline that exposed the HELLO flood.
- Read access required for `devops/compose/docker-compose.stella-ops.yml` and `devops/compose/README.md` to keep local runtime defaults aligned with the Router protocol behavior.
- Cross-module doc edits allowed for `docs/modules/router/**`, `docs/implplan/**`, and `devops/compose/README.md` when the runtime contract changes.
## Documentation Prerequisites
- `docs/code-of-conduct/CODE_OF_CONDUCT.md`
- `docs/README.md`
- `docs/07_HIGH_LEVEL_ARCHITECTURE.md`
- `docs/modules/platform/architecture-overview.md`
- `docs/modules/router/README.md`
- `docs/modules/router/architecture.md`
- `docs/modules/router/messaging-valkey-transport.md`
- `docs/features/checked/gateway/router-heartbeat-and-health-monitoring.md`
- `src/Router/AGENTS.md`
- `src/Router/__Libraries/StellaOps.Router.Gateway/AGENTS.md`
- `src/Router/__Libraries/StellaOps.Messaging.Transport.Valkey/AGENTS.md`
## Delivery Tracker
### TASK-1 - Trace current HELLO refresh and resync behavior
Status: DONE
Dependency: none
Owners: Developer
Task description:
- Read the current HELLO payload, gateway registration flow, routing-state update path, and Valkey notifiable-queue fallback behavior.
- Produce a concrete design that distinguishes between startup registration, explicit gateway resync, and cheap periodic liveness traffic.
Completion criteria:
- [ ] Existing HELLO refresh triggers are documented in the sprint log with code references.
- [ ] The resubscription / missed-notification fallback behavior in the Valkey transport is documented so the protocol change does not remove needed resilience.
- [ ] The selected protocol change is scoped tightly enough to implement with focused Router tests.
### TASK-2 - Implement explicit resync signaling and slimmer periodic traffic
Status: DONE
Dependency: TASK-1
Owners: Developer
Task description:
- Add the minimal Router protocol/runtime changes needed so services send the heavy registration payload on startup and on explicit gateway resync, while periodic traffic avoids replaying the full endpoint catalog.
- Keep the gateway able to rebuild state after startup or administrative resync without depending on manual service restarts.
Completion criteria:
- [ ] Router code differentiates between full registration replay and lightweight periodic traffic.
- [ ] Gateway can trigger resync without requiring a full service restart.
- [ ] Existing routing, claims, and OpenAPI behaviors remain correct after the change.
### TASK-3 - Validate protocol behavior and runtime impact
Status: DONE
Dependency: TASK-2
Owners: Developer
Task description:
- Add or update targeted Router tests around HELLO/resync handling and Valkey fallback behavior.
- Re-run focused local runtime samples to verify the control-plane traffic drops without sacrificing readiness or routing correctness.
Completion criteria:
- [ ] Targeted Router test projects pass with coverage for the new protocol behavior.
- [ ] Live gateway readiness and routing stay healthy after the change.
- [ ] Sprint and router docs record the final behavior and residual tradeoffs.
## Execution Log
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2026-04-05 | Sprint created to move from compose-only tuning into Router protocol/runtime changes after the HELLO refresh flood was traced to periodic full registration replay across the service fleet. | Developer |
| 2026-04-05 | Traced the remaining messaging resilience path: Valkey consumers still run `XAUTOCLAIM` + `XREADGROUP` checks around `WaitForNotificationAsync(...)`, with timeout fallback, connection-restored wakeups, and randomized proactive re-subscribe retained on purpose for silent Pub/Sub failure recovery. | Developer |
| 2026-04-05 | Implemented explicit messaging resync: startup HELLO is identity-only, gateway can request metadata replay via `ResyncRequest`, microservices answer with `EndpointsUpdate`, and heartbeats now carry instance identity so gateway-state misses can recover without full reconnect churn. | Developer |
| 2026-04-05 | Targeted verification passed with Microsoft Testing Platform class filters: `RouterConnectionManagerTests` (19/19), `MessagingTransportQueueOptionsTests` (6/6), `GatewayRegistrationResyncServiceTests` (3/3), and `MessagingTransportIntegrationTests` (6/6). A full `StellaOps.Gateway.WebService.Tests` run still reports 2 unrelated route-table assertion failures in `GatewayRouteSearchMappingsTests`, which are outside this sprint write scope. | Developer |
| 2026-04-05 | Rebuilt and redeployed the live Router-dependent `docker-compose.stella-ops.yml` services so the new control frames were rolled out consistently across the running mesh. After health settled, a 60-second `docker stats` sample showed the restarted Stella Ops fleet below 1% CPU on average for every top-10 service; focused follow-up samples put `stellaops-router-gateway` at `1.17%` avg / `3.27%` max, `stellaops-platform` at `0.11%` avg, and `stellaops-signals` at `0.10%` avg. Router logs showed only 8 `HELLO received` events over 2 minutes after rollout. | Developer |
| 2026-04-05 | Extended post-rollout runtime sampling over 3 minutes kept `stellaops-evidence-locker-web` low at `0.19%` avg / `1.75%` max and `stellaops-postgres` at `0.71%` avg / `4.60%` max. Postgres slow-statement logs remained empty in the sampled window, while connection churn was dominated by `172.19.0.58` (`stellaops-advisory-ai-web`) with `173` connection-log entries in 10 minutes and blank `application_name`, which points to attribution/pooling debt rather than Evidence Locker pressure. The broader whole-stack sample still showed transient integration overhead outside this sprint scope, notably `stellaops-gitea` spikes despite an immediate follow-up spot sample already back at `0.04%` CPU. | Developer |
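The HELLO counts quoted in the log can be re-sampled after any redeploy with a one-liner along these lines. The container name and log phrasing are taken from the samples above; the window is illustrative:

```shell
# Count HELLO replays seen by the gateway over the last 2 minutes.
docker logs --since 2m stellaops-router-gateway 2>&1 | grep -c 'HELLO received'
```

A steady-state count in the single digits per couple of minutes matches the post-rollout behavior recorded above; a return to hundreds per minute would indicate a regression to periodic full replay.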
## Decisions & Risks
- The periodic HELLO flood was an architectural behavior, not just a bad compose default: `RouterConnectionManager` refreshed via transport `ConnectAsync(...)`, and the messaging transport used to serialize a full `HelloPayload` on every replay. This sprint removes that periodic metadata replay for messaging and replaces it with explicit control frames.
- The Valkey transport already contains explicit resilience traffic for silent Pub/Sub failure: timeout-based fallback waits plus proactive randomized re-subscription. Any protocol change must preserve those recovery paths.
- Backward compatibility matters across Router transports. If a new control frame is introduced, frame parsing and ignore/compatibility behavior must be explicit.
- `RegistrationRefreshInterval` still exists in Router options, but messaging transport no longer uses it to replay endpoint catalogs. Future cleanup can deprecate or rename that knob once non-messaging transport expectations are audited.
- Live rollout had to cover the full running Router mesh, not just `router-gateway`, because the new `ResyncRequest` / `EndpointsUpdate` control frames span shared Router client and server libraries. Partial deployment would have left old services unable to answer explicit resync requests.
## Next Checkpoints
- Finalize the protocol change after tracing current HELLO and fallback flows.
- Implement and test the Router-side resync behavior.
- Re-sample the live stack after the Router change lands.


@@ -0,0 +1,77 @@
# Sprint 20260405-010 - AdvisoryAI PG Pooling And Gitea Spike Followup
## Topic & Scope
- Reduce AdvisoryAI PostgreSQL connection churn by adding stable application-name attribution and reusing pooled connections in the live knowledge-search and unified-search paths.
- Rebuild and redeploy the affected AdvisoryAI service, then resample PostgreSQL and AdvisoryAI runtime load to confirm the change.
- Capture the next transient Gitea CPU spike with process-level evidence instead of only container-level stats so the remaining integration outlier is attributable.
- Working directory: `src/AdvisoryAI/`.
- Expected evidence: targeted AdvisoryAI tests, updated AdvisoryAI deployment/runtime docs, compose/runtime samples, and Gitea process capture artifacts in the sprint log.
## Dependencies & Concurrency
- Depends on `docs/implplan/SPRINT_20260405_008_Integrations_consul_pg_router_runtime_tuning.md` for the PostgreSQL logging baseline.
- Depends on `docs/implplan/SPRINT_20260405_009_Router_registration_resync_and_hello_slimming.md` for the post-router-redeploy steady-state baseline.
- Cross-module edits allowed for `docs/implplan/**`, `docs/modules/advisory-ai/**`, and `devops/compose/**` when configuration or runtime procedures change.
## Documentation Prerequisites
- `docs/code-of-conduct/CODE_OF_CONDUCT.md`
- `docs/README.md`
- `docs/07_HIGH_LEVEL_ARCHITECTURE.md`
- `docs/modules/platform/architecture-overview.md`
- `docs/modules/advisory-ai/architecture.md`
- `docs/modules/advisory-ai/deployment.md`
- `src/AdvisoryAI/AGENTS.md`
- `src/AdvisoryAI/StellaOps.AdvisoryAI/AGENTS.md`
- `src/AdvisoryAI/StellaOps.AdvisoryAI.WebService/AGENTS.md`
- `src/AdvisoryAI/StellaOps.AdvisoryAI.Hosting/AGENTS.md`
## Delivery Tracker
### AIAI-PG-POOL-001 - Tighten AdvisoryAI PostgreSQL attribution and pooling
Status: DONE
Dependency: none
Owners: Developer
Task description:
- Trace the current AdvisoryAI PostgreSQL access paths, especially the knowledge-search and unified-search background services that currently use raw `NpgsqlConnection` or short-lived `NpgsqlDataSource` instances.
- Add stable PostgreSQL `application_name` attribution and consolidate those paths onto reusable pooled data sources so advisory-ai-web stops generating bursts of short physical sessions.
- Redeploy the affected AdvisoryAI service and resample PostgreSQL plus AdvisoryAI runtime load to verify the change.
Completion criteria:
- [x] AdvisoryAI PostgreSQL sessions expose a stable `application_name` instead of `[unknown]`.
- [x] AdvisoryAI knowledge-search/unified-search runtime paths reuse pooled connections instead of repeatedly constructing throwaway data sources.
- [x] Targeted AdvisoryAI tests pass and the live advisory-ai-web PostgreSQL churn drops measurably after redeploy.
### INT-GITEA-CPU-001 - Capture transient Gitea CPU spikes with process evidence
Status: DONE
Dependency: none
Owners: Developer
Task description:
- Run a live watcher against `stellaops-gitea` long enough to catch the next transient CPU spike and capture process-level evidence from inside the container at spike time.
- Record what was observed, whether the spike is in the main Gitea process or another child/thread, and whether the existing logs/health probes explain it.
Completion criteria:
- [x] A live watcher captured at least one process-level sample during or immediately adjacent to a Gitea spike, or explicitly records that no spike occurred during the observation window.
- [x] Sprint notes state whether the spike was explained by current evidence or remains unresolved.
- [x] Any runtime procedure change needed for future capture is documented.
## Execution Log
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2026-04-05 | Sprint created after post-router steady-state sampling showed PostgreSQL itself was calm but AdvisoryAI still generated unattributed short sessions, while Gitea remained a transient integration outlier in longer CPU windows. | Developer |
| 2026-04-05 | Replaced AdvisoryAI knowledge-search/unified-search raw PostgreSQL connections and throwaway `NpgsqlDataSource` instances with a shared `KnowledgeSearchDataSourceProvider`; added stable `DatabaseApplicationName` plus idle-pool retention knobs and documented them in `docs/modules/advisory-ai/deployment.md`. | Developer |
| 2026-04-05 | Verified the new connection-string normalization with xUnit v3 direct runner: `dotnet exec src/AdvisoryAI/__Tests/StellaOps.AdvisoryAI.Tests/bin/Debug/net10.0/StellaOps.AdvisoryAI.Tests.dll -class StellaOps.AdvisoryAI.Tests.KnowledgeSearch.KnowledgeSearchDataSourceProviderTests` => `2/2` passed. | Developer |
| 2026-04-05 | Rebuilt `stellaops/advisory-ai-web:dev` via `devops/docker/build-all.ps1 -Services advisory-ai-web`, force-recreated `stellaops-advisory-ai-web`, and confirmed live env now sets `ADVISORYAI__KnowledgeSearch__DatabaseApplicationName=stellaops-advisory-ai-web/knowledge-search` plus `DatabaseConnectionIdleLifetimeSeconds=900`. | Developer |
| 2026-04-05 | Live PostgreSQL verification after redeploy showed `172.19.0.71` sessions attributed as `stellaops-advisory-ai-web/knowledge-search`; 2-minute steady-state sample settled at `stellaops-advisory-ai-web avg 0.77% CPU`, `stellaops-postgres avg 0.50%`, `stellaops-evidence-locker-web avg 0.14%`, `stellaops-router-gateway avg 0.89%`, `stellaops-gitea avg 0.10%`. | Developer |
| 2026-04-05 | Corrected the Gitea spike watcher to use BusyBox-compatible `sh -c` capture. Artifact `artifacts/runtime/gitea_spike_watch_20260405_175001.log` caught a `104.43%` spike and showed the load inside multiple `/usr/local/bin/gitea -c /etc/gitea/app.ini web` threads, with logs still showing only the periodic `/api/v1/version` health checks. | Developer |
| 2026-04-05 | Extended runtime verification with artifacts `artifacts/runtime/stack_sample_20260405_180815.log`, `artifacts/runtime/postgres_activity_20260405_180815.log`, and `artifacts/runtime/gitea_spike_watch_20260405_180815.log`. Over 23 whole-stack samples, `stellaops-advisory-ai-web avg 0.53% CPU`, `stellaops-postgres avg 0.43%`, `stellaops-evidence-locker-web avg 0.17%`, and `stellaops-gitea avg 0.29%` with no spike captures in 44 Gitea watch samples; PostgreSQL stayed at 4 idle `stellaops-advisory-ai-web/knowledge-search` sessions plus the expected generic idle pool and produced no slow-statement/connection-churn evidence in the sampled window. | Developer |
## Decisions & Risks
- AdvisoryAI connection churn was caused by code, not PostgreSQL itself: `UnifiedSearchIndexer`, `SearchAnalyticsService`, `SearchQualityMonitor`, `EntityAliasService`, and `PostgresKnowledgeSearchStore` were mixing pooled and non-pooled access patterns. The shared `KnowledgeSearchDataSourceProvider` is now the single runtime path for knowledge-search/unified-search PostgreSQL access.
- Runtime configuration is now explicit in both code and local compose: `src/AdvisoryAI/StellaOps.AdvisoryAI/KnowledgeSearch/KnowledgeSearchOptions.cs`, `src/AdvisoryAI/StellaOps.AdvisoryAI/KnowledgeSearch/KnowledgeSearchDataSourceProvider.cs`, `devops/compose/docker-compose.stella-ops.yml`, and `docs/modules/advisory-ai/deployment.md`.
- `dotnet test --filter` is not trustworthy in this repo's current Microsoft Testing Platform setup because the VSTest filter property is ignored. Targeted verification for this sprint used the xUnit v3 assembly runner directly instead of pretending the `dotnet test` filter worked.
- PostgreSQL slow-statement logs stayed empty after redeploy, and `pg_stat_activity` now shows AdvisoryAI as `stellaops-advisory-ai-web/knowledge-search`; the remaining dominant PostgreSQL session counts belong to other services.
- Gitea spikes are real but are not explained by health-check traffic. The corrected capture shows transient CPU bursts inside the main multi-threaded Gitea web process itself, not a separate sidecar or shell child. The root cause remains internal to Gitea's runtime behavior on this persisted instance.
- The longer follow-up window did not reproduce a Gitea spike. That reduces urgency for emergency remediation, but it also confirms the problem is intermittent and requires either a longer watch or Gitea-native profiling during the next event for a complete root cause.
## Next Checkpoints
- If AdvisoryAI PostgreSQL attribution needs to cover non-knowledge paths later, extend the same application-name pattern to any future chat-audit or EF-owned connection strings.
- If Gitea spikes need deeper root-cause attribution, the next step is Gitea-native profiling/debug endpoints or Go runtime profiling during a spike; the current shell-based watcher already proved the bursts are internal Gitea thread work, not external request load.
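One plausible path for the Go-runtime profiling checkpoint, assuming Gitea's `ENABLE_PPROF` setting (which by default binds the pprof listener to localhost inside the container); the port and paths below are assumptions to verify against the running image:

```shell
# 1) Enable in /etc/gitea/app.ini under [server], then restart the container:
#      ENABLE_PPROF = true
# 2) During a spike, capture a 30 s CPU profile from inside the container:
docker exec stellaops-gitea sh -c \
  'wget -qO /tmp/gitea_cpu.pprof "http://127.0.0.1:6060/debug/pprof/profile?seconds=30"'
docker cp stellaops-gitea:/tmp/gitea_cpu.pprof artifacts/runtime/
```

The resulting profile can be inspected offline with `go tool pprof`, which would turn "internal Gitea thread work" into named functions.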


@@ -45,6 +45,7 @@ Add to `C:\Windows\System32\drivers\etc\hosts`:
127.1.2.5 registry.stella-ops.local
127.1.2.6 minio.stella-ops.local
127.1.2.7 gitlab.stella-ops.local
127.1.2.8 consul.stella-ops.local
```
### 2. Start services
@@ -52,7 +53,7 @@ Add to `C:\Windows\System32\drivers\etc\hosts`:
```bash
cd devops/compose
# Start the default low-idle services (recommended)
docker compose -f docker-compose.integrations.yml up -d
# Or start specific services only
@@ -64,8 +65,15 @@ docker compose \
-f docker-compose.integration-fixtures.yml \
up -d
# Start Consul only when validating the Consul connector
docker compose -f docker-compose.integrations.yml --profile consul up -d consul
# Start GitLab CE (heavy, 4 GB+ RAM, ~3 min startup)
docker compose -f docker-compose.integrations.yml --profile heavy up -d gitlab
# Re-enable GitLab registry/package surfaces for registry-specific tests
GITLAB_ENABLE_REGISTRY=true GITLAB_ENABLE_PACKAGES=true \
docker compose -f docker-compose.integrations.yml --profile heavy up -d gitlab
```
### 3. Verify services
@@ -73,8 +81,13 @@ docker compose -f docker-compose.integrations.yml --profile heavy up -d gitlab
```bash
# Quick health check for all services
docker compose -f docker-compose.integrations.yml ps
# Gitea is only complete once the container is healthy
docker compose -f docker-compose.integrations.yml ps gitea
```
`docker-compose.testing.yml` is the separate infrastructure-test lane. It starts `postgres-test`, `valkey-test`, mocks, and an isolated Gitea profile on different ports; it does not start Consul or GitLab.
---
## Service Reference
@@ -86,7 +99,8 @@ docker compose -f docker-compose.integrations.yml ps
| URL | http://gitea.stella-ops.local:3000 |
| API | http://gitea.stella-ops.local:3000/api/v1 |
| SSH | gitea.stella-ops.local:2222 |
| Admin | `stellaops` / `Stella2026!` on fresh volumes |
| Bootstrap | Container entrypoint seeds the repo root and first admin before health goes green |
| Swagger | http://gitea.stella-ops.local:3000/api/swagger |
| Integration type | SCM (Gitea provider) |
| Docker DNS | `gitea.stella-ops.local` |
@@ -97,10 +111,12 @@ docker compose -f docker-compose.integrations.yml ps
- Organization: *(your Gitea org name)*
**Create an API token:**
1. Log in to Gitea as `stellaops` / `Stella2026!` on a fresh volume, or use the existing admin user if this environment was already initialized.
2. Settings > Applications > Generate Token
3. Store in Vault at `secret/gitea` with key `api-token`
> The previous local-service flow was contradictory: the compose profile marked Gitea as install-locked while the docs still described a manual first-login admin creation path. The compose bootstrap now makes the service deterministic and leaves only PAT creation as a manual step.
---
### Jenkins (CI/CD)
@@ -193,6 +209,32 @@ vault kv put secret/nexus admin-password="your-nexus-password"
---
### Consul (Optional KV / Settings Store)
| Property | Value |
|----------|-------|
| URL | http://consul.stella-ops.local:8500 |
| API | http://consul.stella-ops.local:8500/v1/status/leader |
| Auth | None (single-node local server) |
| Start mode | `--profile consul` only |
| Integration type | Settings / KV (`Consul` provider) |
| Docker DNS | `consul.stella-ops.local` |
**Start Consul only when needed:**
```bash
docker compose -f docker-compose.integrations.yml --profile consul up -d consul
```
**Why opt-in:** even in its lower-idle local mode, Consul is still an extra control-plane service that most local connector checks do not need. The default integration lane keeps it off unless you are explicitly validating the Consul connector.
**Runtime mode:** the local compose profile now runs Consul as a persistent single-node server with the UI enabled instead of `agent -dev`. That preserves the HTTP KV surface while materially lowering idle CPU.
**Stella Ops integration config:**
- Endpoint: `http://consul.stella-ops.local:8500`
- AuthRef: *(none required in local mode)*
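A quick liveness probe against the opt-in instance, using the leader endpoint from the table above:

```shell
# Succeeds with the leader address (an "IP:8300" string) once the
# single-node server is up; fails fast if the profile was never started.
curl -fsS http://consul.stella-ops.local:8500/v1/status/leader
```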
---
### Docker Registry (OCI v2)
| Property | Value |
@@ -226,6 +268,7 @@ curl http://registry.stella-ops.local:5000/v2/_catalog
| S3 API | http://minio.stella-ops.local:9000 |
| Access key | `stellaops` |
| Secret key | `Stella2026!` |
| Integration type | Object Storage (`S3Compatible` provider) |
| Docker DNS | `minio.stella-ops.local` |
**Create buckets for Stella Ops:**
@@ -240,6 +283,12 @@ docker exec stellaops-minio mc mb local/scan-results
docker exec stellaops-minio mc mb local/sbom-archive
```
**Stella Ops integration config:**
- Endpoint: `http://minio.stella-ops.local:9000`
- Type: `ObjectStorage`
- Provider: `S3Compatible`
- AuthRef: optional for the default local health probe
---
### GitLab CE (Heavy, Optional)
@@ -249,7 +298,7 @@ docker exec stellaops-minio mc mb local/sbom-archive
| URL | http://gitlab.stella-ops.local:8929 |
| Admin | root / `Stella2026!` |
| SSH | gitlab.stella-ops.local:2224 |
| Container Registry | gitlab.stella-ops.local:5050 (`GITLAB_ENABLE_REGISTRY=true` only) |
| RAM required | 4 GB+ |
| Startup time | ~3-5 minutes |
| Integration type | SCM + CI/CD + Registry |
@@ -260,6 +309,17 @@ docker exec stellaops-minio mc mb local/sbom-archive
docker compose -f docker-compose.integrations.yml --profile heavy up -d gitlab
```
**Default local tuning:**
- SCM/API coverage stays available.
- Registry and package surfaces are disabled by default to reduce idle CPU.
- Puma and Sidekiq run in reduced-concurrency mode for local connector checks.
**Enable registry/package coverage explicitly when needed:**
```bash
GITLAB_ENABLE_REGISTRY=true GITLAB_ENABLE_PACKAGES=true \
docker compose -f docker-compose.integrations.yml --profile heavy up -d gitlab
```
**Stella Ops integration config (SCM):**
- Endpoint: `http://gitlab.stella-ops.local:8929`
- AuthRef: `authref://vault/gitlab#access-token`
@@ -326,16 +386,27 @@ docker compose -f docker-compose.integrations.yml down -v
| **Registry** | Harbor | harbor-fixture (mock) | Ready |
| **Registry** | Docker Hub / OCI | docker-registry | Ready |
| **Registry** | Nexus | nexus | Ready |
| **Registry** | GitLab Registry | gitlab (heavy) | Ready when `GITLAB_ENABLE_REGISTRY=true` |
| **SCM** | GitHub App | github-app-fixture (mock) | Ready |
| **SCM** | Gitea | gitea | Ready |
| **SCM** | GitLab Server | gitlab (heavy) | Ready with Vault-backed PAT |
| **CI/CD** | Jenkins | jenkins | Ready |
| **CI/CD** | GitLab CI | gitlab (heavy) | Ready with reduced local concurrency |
| **Secrets** | Vault | vault | Ready |
| **Secrets** | Consul | consul | Opt-in (`--profile consul`) |
| **Runtime Host** | eBPF Agent | runtime-host-fixture (mock) | Ready |
| **Feed Mirror** | StellaOps / NVD / OSV mirror | concelier | Ready |
| **Storage** | S3-compatible (MinIO) | minio | Ready |
| **Advisory & VEX** | 74 sources | advisory-fixture + live | 74/74 healthy |
> **Current provider list:** the local Integrations service currently reports connector plugins for Harbor,
> Docker Registry, GitLab Container Registry, Nexus, GitHub App, Gitea, GitLab Server,
> GitLab CI, Jenkins, Vault, Consul, eBPF Agent, the `S3Compatible` object-storage provider, the feed mirror providers
> (`StellaOpsMirror`, `NvdMirror`, `OsvMirror`), and the test-only InMemory provider.
>
> **Storage note:** the `S3Compatible` connector defaults to probing `/minio/health/live`
> when the configured endpoint is the service root, which matches the local MinIO fixture.
>
> **Auth caveat:** several connector plugins validate public health/version endpoints only. A green connection
> test proves reachability and the plugin wiring, but it does not guarantee that privileged API operations are
> fully configured unless you also provision the corresponding secret material in Vault.


@@ -36,6 +36,18 @@ docker compose -f ops/advisory-ai/docker-compose.advisoryai.yaml up -d --build
- Compose mounts `advisoryai-data` volume; Helm uses `emptyDir` by default or a PVC when `storage.persistence.enabled=true`.
- In sealed/air-gapped mode, mount guardrail lists/policy knobs under `/app/etc` and point env vars accordingly.
## PostgreSQL attribution and pooling
- AdvisoryAI knowledge-search and unified-search PostgreSQL traffic now uses a shared pooled `NpgsqlDataSource` instead of per-operation transient data sources or raw connections.
- Default `application_name` is `stellaops-advisory-ai-web/knowledge-search`, which makes `pg_stat_activity` attribution stable for the web service.
- Default idle-pool retention is `900` seconds so the shared pool stays warm across the 5-minute unified-search refresh cycle instead of re-opening physical sessions each run.
- Override these with:
- `ADVISORYAI__KnowledgeSearch__DatabaseApplicationName`
- `ADVISORYAI__KnowledgeSearch__DatabasePoolingEnabled`
- `ADVISORYAI__KnowledgeSearch__DatabaseMinPoolSize`
- `ADVISORYAI__KnowledgeSearch__DatabaseMaxPoolSize`
- `ADVISORYAI__KnowledgeSearch__DatabaseConnectionIdleLifetimeSeconds`
- Existing `ADVISORYAI__KnowledgeSearch__ConnectionString` remains authoritative for host/database/credentials; the new options only stamp attribution and pool behavior.
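To confirm the attribution after a redeploy, a sketch like the following lists AdvisoryAI sessions per `application_name` (container name and `postgres` role are assumptions from the local compose setup):

```shell
# Expect rows attributed as stellaops-advisory-ai-web/knowledge-search,
# mostly idle, instead of blank-application_name short sessions.
docker exec stellaops-postgres psql -U postgres -Atc \
  "SELECT application_name, state, count(*)
   FROM pg_stat_activity
   WHERE application_name LIKE 'stellaops-advisory-ai-web/%'
   GROUP BY 1, 2 ORDER BY 3 DESC;"
```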
## Scaling guidance
- WebService: start with 1 replica, scale horizontally by CPU (tokenization) or queue depth; set `ADVISORYAI__QUEUE__DIRECTORYPATH` to a shared PVC for multi-replica web+worker.
- Worker: scale independently; use `worker.replicas` in Helm or add `--scale advisoryai-worker=N` in compose. Workers are CPU-bound; pin via `resources.requests/limits`.


@@ -37,6 +37,69 @@ Available for all commands:
---
## Integration Catalog Commands
### stella config integrations
Manage the live Integration Catalog through the Integrations service.
**Usage:**
```bash
stella config integrations <command> [options]
```
**Subcommands:**
- `list` - List catalog entries with filtering and paging
- `providers` - List supported providers and discovery metadata
- `get` / `show` - Display a single integration
- `create` - Create an integration entry
- `update` - Update an integration entry
- `delete` / `remove` - Delete an integration entry
- `test` - Run connector connectivity test
- `health` - Query connector health
- `impact` - Show workflow impact summary
- `discover` - Discover provider resources such as repositories, projects, jobs, pipelines, or tags
**Examples:**
```bash
# Show live providers (default hides test-only providers)
stella config integrations providers
# Include test-only providers such as InMemory
stella config integrations providers --include-test-only --format json
# Create a GitLab server entry
stella config integrations create \
--name local-gitlab \
--type scm \
--provider gitlabserver \
--endpoint http://gitlab.stella-ops.local:8929 \
--authref authref://vault/gitlab#access-token
# Discover projects from an existing integration
stella config integrations discover <integration-id> --resource-type projects
# Discover tags within a repository
stella config integrations discover <integration-id> \
--resource-type tags \
--filter repository=team/api \
--filter namePattern='v*'
# Register local MinIO through the object-storage provider
stella config integrations create \
--name local-minio \
--type objectstorage \
--provider s3compatible \
--endpoint http://minio.stella-ops.local:9000
```
**Notes:**
- `providers` returns `isTestOnly`, `supportsDiscovery`, and `supportedResourceTypes`.
- Deprecated `stella integrations *` routes are preserved as aliases and forward to `stella config integrations *`.
- Unsupported discovery requests return a client error instead of silently falling back to sample data.
---
## Scanning & Analysis Commands
### stella scan


@@ -26,6 +26,13 @@ Previously archived docs for RiskEngine and VulnExplorer are in `docs-archived/m
- Merkle & external anchor policy: `merkle-anchor-policy.md`
- Tenant isolation & redaction manifest: `tenant-isolation-redaction.md`
## Compatibility read-model surfaces
- `GET /api/v2/security/vulnerabilities/{identifier}` is the authoritative shipped vulnerability-detail route for the Web console.
- The route is backed by Findings Ledger projections plus optional scoring state. Unknown fields remain null or absent instead of being fabricated in the API or the web client.
- `signedScore` is emitted only when cached or historical scoring state exists for the resolved finding.
- `proofSubjectId` is surfaced only when the projection carries replay/proof identity, allowing the Web console to enable verification only when a real proof subject exists.
## Implementation Status
### Delivery Phases


@@ -31,6 +31,13 @@
- `POST /graph/diff` — compares `snapshotA` vs `snapshotB`, streaming node/edge added/removed/changed tiles plus stats; budget enforcement mirrors `/graph/query`.
- `POST /graph/export` — async job producing deterministic manifests (`sha256`, size, format) for `ndjson/csv/graphml/png/svg`; download via `/graph/export/{jobId}`.
- `POST /graph/lineage` - returns SBOM lineage nodes/edges anchored by `artifactDigest` or `sbomDigest`, with optional relationship filters and depth limits.
- Compatibility facade for the shipped Angular explorer:
- `GET /graphs`, `GET /graphs/{graphId}`, `GET /graphs/{graphId}/tiles`
- `GET /search`, `GET /paths`
- `GET /graphs/{graphId}/export`, `GET /assets/{assetId}/snapshot`, `GET /nodes/{nodeId}/adjacency`
- `GET/POST/DELETE /graphs/{graphId}/saved-views`
- The compatibility tile surface emits only `policy`, `vex`, and `aoc` overlays on the shipped web path.
- Saved views are persisted in PostgreSQL table `graph.saved_views` when `Postgres:Graph` is configured; the API falls back to an in-memory store only for hosts that do not wire Graph persistence.
- **Edge Metadata API** (added 2025-01):
- `POST /graph/edges/metadata` — batch query for edge explanations; request contains `EdgeIds[]`, response includes `EdgeTileWithMetadata[]` with full provenance.
- `GET /graph/edges/{edgeId}/metadata` — single edge metadata with explanation, via, provenance, and evidence references.
@@ -77,6 +84,7 @@ The edge metadata system provides explainability for graph relationships:
- **Relational + adjacency** (PostgreSQL tables `graph_nodes`, `graph_edges`, `graph_overlays`) with deterministic ordering and streaming exports.
- Or **Graph DB** (e.g., Neo4j/Cosmos Gremlin) behind an abstraction layer; choice depends on deployment footprint.
- All storages require tenant partitioning, append-only change logs, and export manifests for Offline Kits.
- The shipped compatibility saved-view surface now owns a small relational persistence slice via startup migration `003_saved_views.sql`, which auto-creates `graph.saved_views` and keeps saved views durable across host restarts.
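The saved-view backend selection described above can be sketched as a small factory. This is a hedged Python sketch, not the shipped host wiring; the class names and the flattened `Postgres:Graph` configuration key are illustrative assumptions.

```python
# Sketch: persist saved views in the relational store backed by
# graph.saved_views when Postgres:Graph is configured, otherwise fall
# back to an in-memory store.

class InMemorySavedViewStore:
    def __init__(self) -> None:
        self.views: dict[str, dict] = {}  # lost on host restart

class PostgresSavedViewStore:
    def __init__(self, connection_string: str) -> None:
        # Durable across restarts; table created by migration 003_saved_views.sql.
        self.connection_string = connection_string

def resolve_saved_view_store(config: dict):
    connection_string = config.get("Postgres:Graph")
    if connection_string:
        return PostgresSavedViewStore(connection_string)
    return InMemorySavedViewStore()
```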
## 5) Offline & export
View File
@@ -123,10 +123,16 @@ The `CircuitBreakerService` implements the circuit breaker pattern for downstrea
- `GET /api/v1/release-orchestrator/dashboard` — control-plane dashboard payload (pipeline, pending approvals, active deployments, recent releases).
- `POST /api/v1/release-orchestrator/promotions/{id}/approve` — approve a pending promotion from dashboard context.
- `POST /api/v1/release-orchestrator/promotions/{id}/reject` — reject a pending promotion from dashboard context.
-- `GET /api/v1/release-orchestrator/deployments` plus detail/log/event/metric endpoints and lifecycle actions (`pause`, `resume`, `cancel`, `rollback`, target `retry`) provide the release deployment monitoring surface used by the Console.
+- `POST /api/v1/release-orchestrator/deployments` creates a live deployment run for a single target environment using canonical strategies `rolling | blue_green | canary | all_at_once`.
+- `GET /api/v1/release-orchestrator/deployments` plus detail/log/event/metric endpoints and lifecycle actions (`pause`, `resume`, `cancel`, `rollback`, target `retry`) provide the release deployment monitoring surface used by the Console. The compatibility implementation is backed by persisted state in `orchestrator.compatibility_deployments`, with seed rows inserted per tenant only as bootstrap data.
- `GET /api/v1/release-orchestrator/evidence` plus `verify`, `export`, `raw`, and `timeline` routes provide deterministic evidence inspection and export for offline audit flows.
- Compatibility aliases are exposed for legacy clients under `/api/release-orchestrator/*`.
Compatibility deployment persistence notes:
- Startup migration `011_compatibility_deployments.sql` creates the compatibility deployment table automatically as part of the normal JobEngine migration host.
- The WebService resolves `IDeploymentCompatibilityStore` through DI: PostgreSQL when JobEngine persistence is configured, in-memory fallback otherwise.
- Configuration precedence is explicit: `JobEngine:Database:ConnectionString` wins when provided, while legacy `Orchestrator:Database:ConnectionString` remains a fallback for hosts that have not yet moved to the JobEngine section.
All responses include deterministic timestamps, job digests, and DSSE signature fields for offline reconciliation.
## 5) Observability
View File
@@ -219,6 +219,43 @@ const gitlabToken: CredentialReference = {
};
```
## Live Catalog Workflow
The shipped operator workflow is now backed by the Integrations service rather than CLI sample data.
### Provider Discovery
- `GET /api/v1/integrations/providers` returns live provider metadata from the loaded connector plugins.
- Default responses hide test-only providers such as `InMemory`.
- Each provider entry now advertises:
- `isTestOnly`
- `supportsDiscovery`
- `supportedResourceTypes`
### Resource Discovery
- `POST /api/v1/integrations/{id}/discover` is the supported discovery contract.
- Registry providers discover `repositories` and `tags`.
- SCM providers discover `projects` and `repositories`.
- CI/CD providers discover `jobs` and `pipelines`.
- Object storage providers such as `S3Compatible` participate in the live catalog and health/test flows, but do not currently advertise discovery.
- Unsupported resource types return `400` with the provider's supported resource types.
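The validation contract above can be sketched as follows. This is an illustrative Python sketch rather than the shipped service, and the response body shape is an assumption; only the `400`-with-supported-types behavior comes from the documentation.

```python
# Sketch: unsupported resource types are rejected with the provider's
# supported list instead of silently falling back to sample data.

def discover(provider: dict, resource_type: str) -> dict:
    supported = provider.get("supportedResourceTypes", [])
    if resource_type not in supported:
        return {
            "status": 400,
            "error": {
                "message": f"resource type '{resource_type}' is not supported",
                "supportedResourceTypes": supported,
            },
        }
    return {"status": 200, "resources": []}  # real providers enumerate here
```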
### CLI Management Surface
The `stella config integrations` command group manages the live catalog end-to-end:
```bash
stella config integrations providers
stella config integrations list
stella config integrations create --name local-gitlab --type scm --provider gitlabserver --endpoint http://gitlab.stella-ops.local:8929 --authref authref://vault/gitlab#access-token
stella config integrations create --name local-minio --type objectstorage --provider s3compatible --endpoint http://minio.stella-ops.local:9000
stella config integrations test <integration-id>
stella config integrations discover <integration-id> --resource-type projects
```
Deprecated `stella integrations *` routes are preserved as aliases and forward to `stella config integrations *`.
## Health Monitoring
### Health Check Types
View File
@@ -24,7 +24,7 @@ Rollout policy: `docs/operations/multi-tenant-rollout-and-compatibility.md`
Each transport connection carries:
-- Initial registration (HELLO) and endpoint configuration
+- Initial identity (HELLO) and, when needed, endpoint metadata replay
- Ongoing heartbeats
- Request/response data frames
- Streaming data frames
@@ -34,9 +34,11 @@ Each transport connection carries:
┌─────────────────┐ ┌─────────────────┐
│ Microservice │ │ Gateway │
│ │ HELLO │ │
-Endpoints: │ ─────────────────────────►│ Routing │
+Identity │ ─────────────────────────►│ Routing │
 │ - POST /items │ HEARTBEAT │ State │
 │ - GET /items │ ◄────────────────────────►│ │
+│ Metadata │ RESYNC / ENDPOINTS │ Connections[] │
+│ replay │ ◄────────────────────────►│ │
│ │ │ Connections[] │
│ │ REQUEST / RESPONSE │ │
│ │ ◄────────────────────────►│ │
@@ -280,7 +282,9 @@ public enum FrameType : byte
Response = 4,
RequestStreamData = 5,
ResponseStreamData = 6,
-Cancel = 7
+Cancel = 7,
+ResyncRequest = 8,
+EndpointsUpdate = 9
}
```
@@ -415,9 +419,10 @@ Two mechanisms:
### Connection Behavior
On connection:
-1. Send HELLO with instance info and endpoints
-2. Start heartbeat timer
-3. Listen for REQUEST frames
+1. Send HELLO with instance identity.
+2. Start heartbeat timer.
+3. For messaging transport, replay endpoint/schema/OpenAPI metadata only when the router explicitly asks for it.
+4. Listen for REQUEST frames.
HELLO payload:
@@ -431,6 +436,11 @@ public sealed class HelloPayload
}
```
For messaging transport the steady-state contract is intentionally slimmer than the generic shape above:
- startup `HELLO` carries identity and may leave `Endpoints` empty
- the gateway sends `ResyncRequest` on service startup, administrative replay, or gateway-state miss
- the microservice answers with `EndpointsUpdate` containing endpoints, schemas, and OpenAPI metadata
---
## Authorization
@@ -449,7 +459,7 @@ public sealed class ClaimRequirement
### Precedence
-1. Microservice provides defaults in HELLO
+1. Microservice provides defaults in registration metadata
2. Authority can override centrally
3. Gateway enforces final effective claims
@@ -533,9 +543,12 @@ Sent at regular intervals over the same connection as requests:
```csharp
public sealed class HeartbeatPayload
{
-public InstanceDescriptor? Instance { get; init; }
+public string InstanceId { get; init; }
 public required InstanceHealthStatus Status { get; init; }
-public int InflightRequests { get; init; }
+public int InFlightRequestCount { get; init; }
public double ErrorRate { get; init; }
public DateTime TimestampUtc { get; init; }
}
```
@@ -546,13 +559,14 @@ Gateway tracks:
- Derives status from heartbeat recency
- Marks stale instances as Unhealthy
- Uses health in routing decisions
- Messaging heartbeats include instance identity so the gateway can rebuild minimal state after a gateway restart or local routing-state loss without waiting for a full reconnect.
- Messaging transports stay push-first even when backed by notifiable queues; the missed-notification safety-net timeout is derived from the configured heartbeat interval and clamped to a short bounded window instead of falling back to a fixed long poll.
- Gateway degraded and stale transitions are normalized against the messaging heartbeat contract. A gateway may not mark an instance `Degraded` earlier than `2x` the heartbeat interval or `Unhealthy` earlier than `3x` the heartbeat interval, even when looser defaults were configured.
- `/health/ready` is stricter than "process started": it remains `503` until the configured required first-party microservices have live healthy or degraded registrations in router state. Local scratch compose uses this to hold the frontdoor unhealthy until the core Stella API surface has replayed HELLO after a rebuild.
- The required-service list must use canonical router `serviceName` values, not loose product-family aliases. Gateway readiness normalizes host-style suffixes such as `-gateway`, `-web`, `.stella-ops.local`, and ports, but it does not treat sibling services as interchangeable.
- When a request already matched a configured `Microservice` route but the target service has not registered yet, the gateway returns `503 Service Unavailable`, not `404 Not Found`. `404` remains reserved for genuinely unknown paths or missing endpoints on an otherwise registered service.
Periodic HELLO re-registration is valid so a microservice can repopulate gateway state after a gateway restart, but it must refresh the existing logical transport connection instead of minting a second one. Gateway routing state also deduplicates by service instance identity (`ServiceName`, `Version`, `InstanceId`, transport) before re-indexing endpoints so repeated HELLO frames cannot accumulate stale route candidates.
- Messaging resync is explicit instead of periodic: startup, administrative replay, and gateway-state misses trigger `ResyncRequest`, while normal heartbeats stay small.
- The Valkey transport keeps its timeout fallback plus proactive randomized re-subscribe so silent Pub/Sub failures still recover. That fallback still produces some `XREADGROUP`/`XAUTOCLAIM` traffic, but it is resilience traffic rather than endpoint-catalog churn.
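The degraded/stale normalization above amounts to clamping configured thresholds against the heartbeat interval. This is a hedged Python sketch; the function and parameter names are assumptions, while the `2x`/`3x` floors come from the documentation.

```python
# Sketch: configured thresholds may not mark an instance Degraded before
# 2x the heartbeat interval, or Unhealthy before 3x, so aggressive
# configuration is raised to those floors.

def effective_staleness_thresholds(
    heartbeat_interval_s: float,
    configured_degraded_s: float,
    configured_unhealthy_s: float,
) -> tuple[float, float]:
    degraded = max(configured_degraded_s, 2 * heartbeat_interval_s)
    unhealthy = max(configured_unhealthy_s, 3 * heartbeat_interval_s)
    return degraded, unhealthy
```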
---
View File
@@ -3,7 +3,7 @@
## Status
- **Implemented** in Sprint 8100.0011.0003.
- Core components: Gateway DI wiring, GatewayHostedService integration, GatewayTransportClient dispatch.
-- Last updated: 2025-12-24 (UTC).
+- Last updated: 2026-04-05 (UTC).
## Purpose
Enable Gateway ↔ microservice Router traffic over an offline-friendly, Redis-compatible transport (Valkey) by using the existing **Messaging** transport layer:
@@ -27,20 +27,28 @@ This supports environments where direct TCP/TLS microservice connections are und
## High-Level Flow
1) Microservice connects via messaging transport:
-- publishes a HELLO message to the gateway request queue
+- publishes a slim `HELLO` message with instance identity to the gateway control queue
2) Gateway processes HELLO:
-- registers instance + endpoints into routing state
-3) Gateway routes an HTTP request to a microservice:
+- registers the connection identity and requests endpoint metadata replay when needed
+3) Microservice answers the replay request:
+- publishes an `EndpointsUpdate` frame with endpoints, schemas, and OpenAPI metadata
+4) Gateway applies the metadata replay:
+- updates routing state, effective claims, and aggregated OpenAPI
+5) Gateway routes an HTTP request to a microservice:
 - publishes a REQUEST message to the service request queue
-4) Microservice handles request:
+6) Microservice handles request:
 - executes handler (or ASP.NET bridge) and publishes a RESPONSE message
-5) Gateway returns response to the client.
+7) Gateway returns response to the client.
Messaging-specific recovery behavior:
- Startup resync: the gateway sends `ResyncRequest` immediately after a slim `HELLO`.
- Administrative resync: `POST /api/v1/gateway/administration/router/resync` can request replay for one connection or the whole messaging fleet.
- Gateway-state miss: if a heartbeat arrives for an unknown messaging connection, the gateway seeds minimal state from the heartbeat identity and requests replay instead of waiting for a reconnect.
## Queue Topology (Conceptual)
The Messaging transport uses a small set of queues (names are configurable):
-- **Gateway request queue**: receives HELLO / HEARTBEAT / REQUEST frames from services
-- **Gateway response queue**: receives RESPONSE frames from services
-- **Per-service request queues**: gateway publishes REQUEST frames targeted to a service
+- **Gateway control queue**: receives service-to-gateway HELLO / HEARTBEAT / ENDPOINTS_UPDATE / RESPONSE frames
+- **Per-service incoming queues**: gateway publishes REQUEST / CANCEL / RESYNC_REQUEST frames targeted to a service
- **Dead letter queues** (optional): for messages that exceed retries/leases
## Configuration
@@ -87,6 +95,8 @@ if (bootstrapOptions.Transports.Messaging.Enabled)
- **At-least-once** delivery: message queues and leases imply retries are possible; handlers should be idempotent where feasible.
- **Lease timeouts**: must be tuned to max handler execution time; long-running tasks should respond with 202 + job id rather than blocking.
- **Determinism**: message ordering may vary; Router must not depend on arrival order for correctness (only for freshness/telemetry).
- **Push-first with recovery fallback**: Valkey Pub/Sub notifications wake consumers immediately when possible. If notifications silently stop, the queue layer still wakes via timeout fallback, connection-restored hooks, and randomized proactive re-subscription so requests and resync control frames do not wedge forever.
- **Queue fallback cost**: every wake can perform `XAUTOCLAIM` plus `XREADGROUP` checks before sleeping again. That traffic is expected resilience overhead, but it is materially smaller than replaying the full endpoint catalog on every heartbeat interval.
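The at-least-once note above implies handlers should tolerate redelivery. A minimal sketch, in illustrative Python rather than the actual transport code, is a consumer that deduplicates by message id; the `id` field and wrapper class are assumptions.

```python
# Sketch: retried deliveries (lease expiry, XAUTOCLAIM reclaims) must not
# re-execute the handler, so the consumer remembers processed message ids.

class IdempotentConsumer:
    def __init__(self, handler) -> None:
        self.handler = handler
        self.processed: set[str] = set()

    def consume(self, message: dict):
        message_id = message["id"]
        if message_id in self.processed:
            return None  # duplicate redelivery: acknowledge, do not re-run
        self.processed.add(message_id)
        return self.handler(message)
```

A production variant would bound or expire the `processed` set; the sketch only shows the dedup rule.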
## Security Notes
- Messaging transport is internal. External identity must still be enforced at the Gateway.
@@ -97,11 +107,12 @@ if (bootstrapOptions.Transports.Messaging.Enabled)
### Completed (Sprint 8100.0011.0003)
1. ✅ Wire Messaging transport into Gateway:
-- start/stop `MessagingTransportServer` in `GatewayHostedService`
-- subscribe to `OnHelloReceived`, `OnHeartbeatReceived`, `OnResponseReceived`, `OnConnectionClosed` events
-- reuse routing state updates and claims store updates
+- start/stop `MessagingTransportServer` in `GatewayHostedService`
+- subscribe to `OnHelloReceived`, `OnHeartbeatReceived`, `OnEndpointsUpdated`, `OnResponseReceived`, `OnConnectionClosed` events
+- reuse routing state updates and claims store updates
2. ✅ Extend Gateway transport client to support `TransportType.Messaging` for dispatch.
3. ✅ Add config options (`GatewayMessagingTransportOptions`) and DI mappings.
4. ✅ Switch messaging registration from periodic full HELLO replay to explicit `ResyncRequest` / `EndpointsUpdate` control frames.
### Remaining Work
1. Add deployment examples (compose/helm) for Valkey transport.
View File
@@ -219,16 +219,25 @@ Graph explorer overlay behavior now supports deterministic lattice-state reachab
Behavior details:
- Reachability legend in overlay controls maps lattice states `SR/SU/RO/RU/CR/CU/X` to explicit halo colors.
- Time slider now binds to deterministic snapshot checkpoints (`current`, `1d`, `7d`, `30d`) and renders timeline event text for each selection.
-- Reachability mock data generation is deterministic per `(nodeId, snapshot)` so repeated runs produce stable lattice status, confidence, and observation timestamps.
-- Canvas halo stroke colors are derived from lattice state (not generic status), and halo titles include lattice state plus observed timestamp for operator audit context.
+- Overlay controls only expose the live shipped overlay families `policy`, `vex`, and `aoc`.
+- The explorer route consumes `GET /graphs/{graphId}/tiles?includeOverlays=true` through `GraphPlatformHttpClient`.
+- Canvas halo colors and summaries derive from live overlay payloads, with policy taking precedence over vex and aoc when multiple overlays exist for the same node.
+- When a graph has no overlays, the shipped route shows explicit empty-state messaging rather than inventing reachability or simulation data.
Verification coverage:
- `src/Web/StellaOps.Web/src/tests/graph_reachability_overlay/graph-overlays.component.spec.ts`
- `src/Web/StellaOps.Web/src/tests/graph_reachability_overlay/graph-canvas.component.spec.ts`
### 3.8 Active-Surface Verification Lane and Setup Wizard Styling (Sprint 20260405_002 / 005)
Focused verification and bundle-polish notes for the shipped surfaces:
- The repo now carries a dedicated active-surface test lane: `ng run stellaops-web:test-active-surfaces` (also exposed as `npm run test:active-surfaces`).
- The lane intentionally covers the currently shipped Graph, Evidence, deployment creation, vulnerability detail, and environment-detail flows instead of the broader legacy spec backlog.
- The setup wizard and step-content styling moved from oversized inline component styles into global SCSS under `src/Web/StellaOps.Web/src/styles/` so the production build clears `anyComponentStyle` budgets without raising those budgets.
- Touched shipped routes continue to use explicit live empty/error/unavailable states rather than mock action fallbacks.
---
## 4) Authentication