refactor: JobEngine cleanup + crypto compose refactor + sprint plans + timeline merge prep

- Remove zombie JobEngine WebService (no container runs it)
- Remove dangling STELLAOPS_JOBENGINE_URL, replace with RELEASE_ORCHESTRATOR_URL
- Update Timeline audit paths to release-orchestrator
- Extract smremote to docker-compose.crypto-provider.smremote.yml
- Rename crypto compose files for consistent naming
- Add crypto provider health probe API (CP-001) + tenant preferences (CP-002)
- Create sprint plans: crypto picker, VulnExplorer merge, scheduler plugins
- Timeline merge prep: ingestion worker relocated to infrastructure lib

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -194,9 +194,9 @@ For offline bundles, imports, and update workflows, see:

| Region | Testing | Production |
|--------|---------|------------|
| China (SM2/SM3/SM4) | `docker-compose.compliance-china.yml` + `docker-compose.crypto-provider.crypto-sim.yml` | `docker-compose.compliance-china.yml` + `docker-compose.crypto-provider.smremote.yml` |
| Russia (GOST) | `docker-compose.compliance-russia.yml` + `docker-compose.crypto-provider.crypto-sim.yml` | `docker-compose.compliance-russia.yml` + `docker-compose.crypto-provider.cryptopro.yml` |
| EU (eIDAS) | `docker-compose.compliance-eu.yml` + `docker-compose.crypto-provider.crypto-sim.yml` | `docker-compose.compliance-eu.yml` |

See `devops/compose/README.md` for detailed compliance deployment instructions.
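
As a rough illustration of how the table is meant to be read, the sketch below picks the overlay files for a region and environment and expands them into `docker compose -f` arguments. The helper names (`composeFilesFor`, `composeArgs`) are hypothetical. It assumes the file names are resolved relative to `devops/compose/` and deliberately leaves out any base-stack files the README requires.

```typescript
// Hypothetical helper: maps a compliance region and environment to the compose
// overlay files listed in the table. Base-stack files are not modeled here.
type ComplianceRegion = "china" | "russia" | "eu";
type DeployEnv = "testing" | "production";

// Production crypto-provider overlay per region (null: no extra overlay).
const productionCryptoOverlay: Record<ComplianceRegion, string | null> = {
  china: "docker-compose.crypto-provider.smremote.yml",
  russia: "docker-compose.crypto-provider.cryptopro.yml",
  eu: null,
};

function composeFilesFor(region: ComplianceRegion, env: DeployEnv): string[] {
  const files = [`docker-compose.compliance-${region}.yml`];
  // Every region uses the crypto simulator overlay for testing.
  const overlay =
    env === "testing"
      ? "docker-compose.crypto-provider.crypto-sim.yml"
      : productionCryptoOverlay[region];
  if (overlay !== null) files.push(overlay);
  return files;
}

// Expands the file list into `-f <file>` pairs for `docker compose`.
function composeArgs(files: string[]): string[] {
  const args: string[] = [];
  for (const f of files) args.push("-f", f);
  return args;
}
```

The resulting arguments would be used as `docker compose <args> up -d`. The EU production row needs no crypto-provider overlay, which the sketch models as `null`.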
@@ -12,7 +12,7 @@ Dedicated remote service for Chinese SM2/SM3/SM4 cryptographic operations, runni
## Implementation Details
- **Service Entry Point**: `src/SmRemote/StellaOps.SmRemote.Service/Program.cs` -- ASP.NET Core minimal API service exposing `/status`, `/health`, `/sign`, `/verify`, `/hash`, `/encrypt`, and `/decrypt`.
- **SmRemote Integration Tests**: `src/SmRemote/__Tests/StellaOps.SmRemote.Service.Tests/SmRemoteServiceApiTests.cs` -- endpoint-level integration coverage for positive and negative paths.
- **Docker Compose Overlay**: `devops/compose/docker-compose.crypto-provider.smremote.yml` -- overlay configuration for running SM Remote alongside the base platform compose stack.

## E2E Test Plan
- [x] Start the SM Remote service and verify `/health` and `/status` return success responses.
@@ -20,7 +20,7 @@ Dedicated remote service for Chinese SM2/SM3/SM4 cryptographic operations, runni
- [x] Submit an SM2 signing request and verify the returned signature via `/verify`.
- [x] Submit an SM4 encryption request, decrypt the ciphertext via `/decrypt`, and verify that the round trip reproduces the original plaintext.
- [x] Verify negative-path validation (HTTP 400) for invalid hash payloads, invalid SM4 key lengths, and invalid sign input.
- [x] Confirm that the compose overlay contract for deploying alongside the platform remains documented (`devops/compose/docker-compose.crypto-provider.smremote.yml`).
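
The SM4 negative path in the test plan can be pictured as a small pre-validation step that maps a wrong key length to HTTP 400. This is a hedged sketch: the request field names and the hex key encoding are assumptions, not the documented SmRemote `/encrypt` contract. Only the 128-bit key size is fixed by the SM4 algorithm itself.

```typescript
// Hypothetical pre-validation for an SM4 /encrypt request. The field names and
// hex key encoding are assumptions, not the SmRemote contract.
interface Sm4EncryptRequestSketch {
  keyHex: string;
  plaintextBase64: string; // carried along, not validated here
}

interface Sm4ValidationResult {
  httpStatus: 200 | 400;
  error?: string;
}

// SM4 is a block cipher with a 128-bit key: 16 bytes, i.e. 32 hex characters.
const SM4_KEY_BYTES = 16;

function validateSm4Request(req: Sm4EncryptRequestSketch): Sm4ValidationResult {
  // Reject anything that is not whole bytes of hex before checking the length.
  if (!/^[0-9a-fA-F]*$/.test(req.keyHex) || req.keyHex.length % 2 !== 0) {
    return { httpStatus: 400, error: "key must be an even-length hex string" };
  }
  const keyBytes = req.keyHex.length / 2;
  if (keyBytes !== SM4_KEY_BYTES) {
    return { httpStatus: 400, error: `SM4 key must be ${SM4_KEY_BYTES} bytes, got ${keyBytes}` };
  }
  return { httpStatus: 200 };
}
```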

## Verification
- Verified on 2026-02-11 via FLOW Tier 0/1/2 replay in `run-005`.
@@ -0,0 +1,569 @@
# Sprint 20260408-002 - VulnExplorer Persistence Migration + Merge into Findings Ledger

## Topic & Scope
- Two-phase plan: first migrate VulnExplorer from in-memory ConcurrentDictionary stores to Postgres, then merge it into the Findings Ledger WebService.
- Phase 1 (Sprint 1) eliminates all in-memory data stores and SampleData in VulnExplorer by introducing a persistence layer with SQL migrations, while VulnExplorer continues to run as its own service. This makes the data durable and validates the schema before the merge.
- Phase 2 (Sprint 2) moves VulnExplorer's endpoint surface into Ledger WebService as projections, wires VEX decisions and fix verifications as Ledger event types, removes the VulnExplorer container, and updates all consumers.
- Working directory: `src/Findings/`
- Expected evidence: all VulnExplorer endpoints backed by Postgres (Phase 1), then served by Ledger WebService with no separate container (Phase 2); existing tests pass; new integration tests cover persistence and the merged endpoints.

## Analysis Summary (Decision Record)

### Why two phases instead of one

The original single-sprint plan assumed VulnExplorer's in-memory stores could be directly replaced by Ledger projections in one step. However:

1. VulnExplorer has five distinct in-memory stores (`SampleData`, `VexDecisionStore`, `FixVerificationStore`, `AuditBundleStore`, `EvidenceSubgraphStore`) with ConcurrentDictionary-based state and complex business logic (VEX override attestation flow, fix verification state machine, audit bundle aggregation).
2. Migrating persistence and merging service boundaries simultaneously creates too many failure modes -- schema issues mask merge issues and vice versa.
3. Phase 1 gives us a working VulnExplorer with real Postgres persistence that can be validated independently before the merge destabilizes routing and API contracts.
4. Phase 1 also validates the data model against the Ledger schema, ensuring the Phase 2 projection mapping is sound.

### Store-to-persistence mapping

| VulnExplorer Store | Phase 1 (Own Tables) | Phase 2 (Ledger Equivalent) |
|---|---|---|
| `SampleData` (VulnSummary/VulnDetail) | `vulnexplorer.vulnerabilities` table | `findings_projection` table + `VulnerabilityDetailService` + `FindingSummaryService` |
| `VexDecisionStore` | `vulnexplorer.vex_decisions` table | Ledger events (`finding.vex_decision_created/updated`) + `observations` table + `ledger_attestation_pointers` |
| `FixVerificationStore` | `vulnexplorer.fix_verifications` table | Ledger events (`finding.fix_verification_created/updated`) + `observations` table |
| `AuditBundleStore` | `vulnexplorer.audit_bundles` table | `EvidenceBundleService` + `OrchestratorExportService` |
| `EvidenceSubgraphStore` | Delegates to `EvidenceGraphBuilder` via HTTP/internal call | `EvidenceGraphBuilder` + `EvidenceGraphEndpoints` (real persistence-backed graph) |

### Key codebase facts informing this plan

**In-memory stores identified (all `ConcurrentDictionary`):**
- `VexDecisionStore` (`src/Findings/StellaOps.VulnExplorer.Api/Data/VexDecisionStore.cs`) -- 244 lines, includes `CreateWithAttestationAsync`/`UpdateWithAttestationAsync` with `IVexOverrideAttestorClient` integration
- `FixVerificationStore` (`src/Findings/StellaOps.VulnExplorer.Api/Data/TriageWorkflowStores.cs`) -- state machine with transitions
- `AuditBundleStore` (same file) -- sequential ID generation, evidence ref aggregation
- `EvidenceSubgraphStore` (same file) -- returns hardcoded graph structure
- `SampleData` (`src/Findings/StellaOps.VulnExplorer.Api/Data/SampleData.cs`) -- two hardcoded VulnSummary/VulnDetail records

**UI consumers (must preserve API shape):**
- `src/Web/StellaOps.Web/src/app/core/api/vex-decisions.client.ts` -- calls `GET/POST/PATCH /v1/vex-decisions` via `VEX_DECISIONS_API_BASE_URL`
- `src/Web/StellaOps.Web/src/app/core/api/audit-bundles.client.ts` -- calls `GET/POST /v1/audit-bundles` via `AUDIT_BUNDLES_API_BASE_URL`
- `src/Web/StellaOps.Web/src/app/features/vuln-explorer/services/evidence-subgraph.service.ts` -- calls `/api/vuln-explorer/findings/{id}/evidence-subgraph`
- `src/Web/StellaOps.Web/src/app/features/triage/services/vulnerability-list.service.ts` -- calls `/api/v1/vulnerabilities`
- `src/Web/StellaOps.Web/src/app/features/vulnerabilities/vulnerability-detail.component.ts` -- consumes VulnExplorer data
- `src/Web/StellaOps.Web/src/tests/vuln_explorer/` -- behavioral specs for evidence tree and filter presets
- `src/Web/StellaOps.Web/tests/e2e/triage-explainability-workspace.spec.ts` -- E2E test

**Cross-service consumers:**
- `src/VexLens/StellaOps.VexLens/Integration/IVulnExplorerIntegration` + `VulnExplorerIntegration` -- VexLens enriches vulnerabilities with VEX consensus data via this interface
- `src/Concelier/StellaOps.Concelier.Core/Diagnostics/VulnExplorerTelemetry.cs` -- telemetry meter `StellaOps.Concelier.VulnExplorer` for advisory processing metrics
- `src/Concelier/StellaOps.Concelier.WebService/Program.cs` -- calls `VulnExplorerTelemetry` methods during advisory ingest
- `src/Authority/StellaOps.Auth.Abstractions/StellaOpsServiceIdentities.cs` -- defines `VulnExplorer = "vuln-explorer"` service identity

**Infrastructure references (Phase 2 removal scope):**
- `devops/compose/docker-compose.stella-ops.yml` -- vulnexplorer container with alias `vulnexplorer.stella-ops.local`
- `devops/compose/docker-compose.stella-services.yml` -- vulnexplorer service definition, `Router__Messaging__ConsumerGroup: "vulnexplorer"`
- `devops/compose/router-gateway-local.json` -- route `^/api/vuln-explorer(.*)` -> `http://vulnexplorer.stella-ops.local/api/vuln-explorer$1`
- `devops/compose/envsettings-override.json` -- `apiBaseUrls.vulnexplorer`
- `devops/compose/hosts.stellaops.local` -- hostname entry
- `devops/helm/stellaops/values.yaml` -- no vulnexplorer entry found (Helm clean)
- `devops/helm/stellaops/templates/vuln-mock.yaml` -- mock deployment template

**Documentation references:**
- `docs/technical/architecture/webservice-catalog.md`
- `docs/technical/architecture/port-registry.md`
- `docs/technical/architecture/component-map.md`
- `docs/technical/architecture/module-matrix.md`
- `docs/modules/findings-ledger/README.md`
- `docs/modules/web/README.md`
- `docs/modules/ui/README.md`
- `docs/modules/ui/architecture.md`
- `docs/modules/ui/component-preservation-map/` (dead components under `vuln-explorer/`)
- `docs/modules/vex-lens/guides/explorer-integration.md`
- `docs/modules/authority/AUTHORITY.md`
- `docs/API_CLI_REFERENCE.md`
- `docs/features/checked/vulnexplorer/vulnexplorer-triage-api.md`
- `docs/features/checked/web/vuln-explorer-with-evidence-tree-and-citation-links.md`
- `docs/features/checked/web/filter-preset-pills-with-url-synchronization.md`
- `docs/operations/runbooks/vuln-ops.md`
- `docs/qa/feature-checks/state/vulnexplorer.json`
- `docs/dev/DEV_ENVIRONMENT_SETUP.md`
- `docs/dev/SOLUTION_BUILD_GUIDE.md`

**Existing test projects:**
- `src/Findings/__Tests/StellaOps.VulnExplorer.Api.Tests/` -- `VulnApiTests.cs` (4 unit tests), `VulnExplorerTriageApiE2ETests.cs` (5 integration tests covering VEX decisions, attestation, evidence subgraph, fix verification, audit bundles)

## Dependencies & Concurrency
- No upstream sprint dependencies.
- The VEX override attestation flow depends on `IVexOverrideAttestorClient`, which calls the Attestor service; this integration is preserved as-is in both phases.
- Phase 1 tasks (VXPM-*): VXPM-001/002/003 are independent and can run in parallel. VXPM-004 depends on all three. VXPM-005 depends on VXPM-004.
- Phase 2 tasks (VXLM-*) depend on Phase 1 completion. VXLM-001/002 are independent. VXLM-003 depends on both. VXLM-004/005 depend on VXLM-003.
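
The dependency rules above form a small DAG. The sketch below layers it into waves of tasks that can run in parallel. The map is transcribed from this section (VXLM-004/005 included as described); the function name is illustrative, and the function fails loudly on a cycle or a missing task.

```typescript
// Task dependencies transcribed from "Dependencies & Concurrency".
const taskDeps: Record<string, string[]> = {
  "VXPM-001": [],
  "VXPM-002": [],
  "VXPM-003": [],
  "VXPM-004": ["VXPM-001", "VXPM-002", "VXPM-003"],
  "VXPM-005": ["VXPM-004"],
  "VXLM-001": ["VXPM-005"],
  "VXLM-002": ["VXPM-005"],
  "VXLM-003": ["VXLM-001", "VXLM-002"],
  "VXLM-004": ["VXLM-003"],
  "VXLM-005": ["VXLM-003"],
};

// Kahn-style layering: each wave holds every task whose dependencies are done.
function executionWaves(deps: Record<string, string[]>): string[][] {
  const done = new Set<string>();
  const remaining = new Set(Object.keys(deps));
  const waves: string[][] = [];
  while (remaining.size > 0) {
    const wave = Array.from(remaining)
      .filter((t) => (deps[t] || []).every((d) => done.has(d)))
      .sort();
    if (wave.length === 0) {
      throw new Error(`cycle or unknown dependency among: ${Array.from(remaining).join(", ")}`);
    }
    for (const t of wave) {
      remaining.delete(t);
      done.add(t);
    }
    waves.push(wave);
  }
  return waves;
}
```

For this plan the waves are VXPM-001/002/003, then VXPM-004, VXPM-005, VXLM-001/002, VXLM-003, and finally VXLM-004/005.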

## Documentation Prerequisites
- `docs/modules/findings-ledger/schema.md` (Ledger schema and Merkle invariants)
- `docs/modules/findings-ledger/workflow-inference.md` (projection rules)
- `src/Findings/AGENTS.md` (module working rules)
- `docs/modules/vex-lens/guides/explorer-integration.md` (VexLens integration contract)

---

# Phase 1 -- In-Memory to Postgres Migration

Goal: Replace all ConcurrentDictionary stores with Postgres-backed repositories while VulnExplorer remains its own service. Validate the data model and confirm that the API contract is preserved.

## Delivery Tracker (Phase 1)

### VXPM-001 - Create VulnExplorer Postgres schema and SQL migrations
Status: TODO
Dependency: none
Owners: Backend engineer

Task description:
- Create a new persistence library `StellaOps.VulnExplorer.Persistence` (or add persistence to the existing `StellaOps.VulnExplorer.Api` project) following the pattern in `src/Findings/StellaOps.Findings.Ledger/Infrastructure/Postgres/`.
- Design tables under a `vulnexplorer` schema:
  - `vulnexplorer.vex_decisions` -- stores VEX decision records with all fields from `VexDecisionDto`: id (PK, uuid), vulnerability_id, subject (JSONB), status, justification_type, justification_text, evidence_refs (JSONB), scope (JSONB), valid_for (JSONB), attestation_ref (JSONB), signed_override (JSONB), supersedes_decision_id, created_by (JSONB), tenant_id, created_at, updated_at.
  - `vulnexplorer.fix_verifications` -- stores fix verification records: cve_id (PK), component_purl, artifact_digest, verdict, transitions (JSONB array), tenant_id, created_at, updated_at.
  - `vulnexplorer.audit_bundles` -- stores audit bundle records: bundle_id (PK), tenant_id, decision_ids (JSONB array), evidence_refs (JSONB array), created_at.
- Write SQL migration files as embedded resources:
  - `001_initial_vulnexplorer_schema.sql` -- create schema and tables
  - Include RLS policies for tenant isolation (follow pattern from `src/Findings/StellaOps.Findings.Ledger/migrations/007_enable_rls.sql`)
- Wire `AddStartupMigrations("vulnexplorer", "VulnExplorer", migrationsAssembly)` in VulnExplorer's `Program.cs` per the auto-migration requirement (CLAUDE.md section 2.7).
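
The idempotency requirement (re-running migrations must not fail) usually comes down to tracking which migration files were already applied. The sketch below is a hypothetical, database-free model of that bookkeeping, not the `AddStartupMigrations` implementation; `MigrationDb`, `applyMigrations`, and `FakeMigrationDb` are illustrative names.

```typescript
// Hypothetical, database-free model of migration bookkeeping; this is NOT the
// StellaOps AddStartupMigrations implementation. Names are illustrative.
interface MigrationDb {
  appliedMigrations(): Set<string>;
  execute(sql: string): void;
  recordApplied(name: string): void;
}

interface Migration {
  name: string; // e.g. "001_initial_vulnexplorer_schema.sql"
  sql: string;
}

// Runs pending migrations in ordinal name order and skips recorded ones, so a
// second run is a no-op rather than a failing CREATE TABLE. A real runner would
// wrap execute() and recordApplied() for each file in a single transaction.
function applyMigrations(db: MigrationDb, migrations: Migration[]): string[] {
  const applied = db.appliedMigrations();
  const ordered = [...migrations].sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));
  const ran: string[] = [];
  for (const m of ordered) {
    if (applied.has(m.name)) continue;
    db.execute(m.sql);
    db.recordApplied(m.name);
    ran.push(m.name);
  }
  return ran;
}

// In-memory fake used to exercise the runner in unit tests.
class FakeMigrationDb implements MigrationDb {
  readonly executed: string[] = [];
  private readonly applied = new Set<string>();
  appliedMigrations(): Set<string> {
    return new Set(this.applied);
  }
  execute(sql: string): void {
    this.executed.push(sql);
  }
  recordApplied(name: string): void {
    this.applied.add(name);
  }
}
```

The zero-padded numeric prefix is what makes ordinal name ordering match the intended apply order.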

Tests:
- Unit test that migration SQL is valid and can be parsed
- Integration test that migrations apply cleanly to a fresh database
- Integration test that migrations are idempotent (re-run does not fail)

Users:
- No user-facing changes -- this is infrastructure-only

Documentation:
- Add schema documentation to `src/Findings/StellaOps.VulnExplorer.Api/AGENTS.md` describing the new tables
- Document the migration file naming convention in the module AGENTS.md

Completion criteria:
- [ ] SQL migration files exist and are embedded resources in the project
- [ ] Schema creates cleanly on a fresh database
- [ ] Auto-migration wired in Program.cs and runs on startup
- [ ] RLS policies enforce tenant isolation
- [ ] No manual init scripts required

### VXPM-002 - Implement Postgres repositories for VEX decisions, fix verifications, and audit bundles
Status: TODO
Dependency: none (can start before VXPM-001 with an interface-first approach)
Owners: Backend engineer

Task description:
- Create `IVexDecisionRepository` interface mirroring the `VexDecisionStore` API surface:
  - `CreateAsync(VexDecisionDto)` -> `VexDecisionDto`
  - `UpdateAsync(Guid, UpdateVexDecisionRequest)` -> `VexDecisionDto?`
  - `GetAsync(Guid)` -> `VexDecisionDto?`
  - `QueryAsync(vulnerabilityId?, subjectName?, status?, skip, take)` -> `IReadOnlyList<VexDecisionDto>`
  - `CountAsync()` -> `int`
- Implement `PostgresVexDecisionRepository` using EF Core or raw Npgsql (follow the pattern in `src/Findings/StellaOps.Findings.Ledger/Infrastructure/Postgres/PostgresLedgerEventRepository.cs`).
- Create `IFixVerificationRepository` and `PostgresFixVerificationRepository`:
  - `CreateAsync(CreateFixVerificationRequest)` -> `FixVerificationRecord`
  - `UpdateAsync(cveId, verdict)` -> `FixVerificationRecord?`
- Create `IAuditBundleRepository` and `PostgresAuditBundleRepository`:
  - `CreateAsync(tenant, decisions)` -> `AuditBundleResponse`
- Preserve the `IVexOverrideAttestorClient` integration: `CreateWithAttestationAsync` and `UpdateWithAttestationAsync` logic moves into a service layer that wraps the repository.
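
For illustration, here is a TypeScript mirror of the VEX decision repository contract together with the ordering rule the tests pin down (createdAt desc, id asc). The real interface is C#. The trimmed `VexDecision` shape, the exact-match filters, and `applyQuery` as a reference model for the Postgres query are all assumptions.

```typescript
// TypeScript mirror of the repository surface, for illustration only; the real
// IVexDecisionRepository is C#. VexDecision is trimmed to query-relevant fields.
interface VexDecision {
  id: string; // uuid
  vulnerabilityId: string;
  subjectName: string;
  status: string;
  createdAt: string; // ISO-8601 UTC, same format for every row
}

interface VexDecisionQuery {
  vulnerabilityId?: string;
  subjectName?: string;
  status?: string;
  skip: number;
  take: number;
}

interface VexDecisionRepository {
  create(decision: VexDecision): Promise<VexDecision>;
  update(id: string, patch: Partial<Omit<VexDecision, "id">>): Promise<VexDecision | undefined>;
  get(id: string): Promise<VexDecision | undefined>;
  query(q: VexDecisionQuery): Promise<VexDecision[]>;
  count(): Promise<number>;
}

// createdAt descending, then id ascending as a stable tie-breaker.
// SQL equivalent: ORDER BY created_at DESC, id ASC.
function compareDecisions(a: VexDecision, b: VexDecision): number {
  if (a.createdAt !== b.createdAt) return a.createdAt < b.createdAt ? 1 : -1;
  return a.id < b.id ? -1 : a.id > b.id ? 1 : 0;
}

// Reference semantics a Postgres query() should reproduce: filter, order, page.
function applyQuery(rows: VexDecision[], q: VexDecisionQuery): VexDecision[] {
  return rows
    .filter((r) => q.vulnerabilityId === undefined || r.vulnerabilityId === q.vulnerabilityId)
    .filter((r) => q.subjectName === undefined || r.subjectName === q.subjectName)
    .filter((r) => q.status === undefined || r.status === q.status)
    .sort(compareDecisions)
    .slice(q.skip, q.skip + q.take);
}
```

The `id` tie-breaker matters because several decisions can share a `createdAt` value. Without it, Postgres may return tied rows in any order, and skip/take paging becomes unstable.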

Tests:
- Unit tests for each repository method with an in-memory database or test containers
- Test that VEX decision CRUD preserves all fields (especially JSONB: subject, scope, evidence_refs, signed_override)
- Test that fix verification state transitions are correctly persisted and reconstructed
- Test that audit bundle creation aggregates evidence refs from persisted decisions
- Test that deterministic ordering (createdAt desc, id asc) matches the current in-memory behavior
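
To make "persisted and reconstructed" testable, a fix verification record can be modelled as a current verdict plus an append-only transitions array (the JSONB column). The verdict names and allowed transitions below are placeholders, not the actual `FixVerificationStore` rules. The point is the round-trip check: after a JSON round trip, replaying the stored transitions must land on the stored verdict.

```typescript
// Placeholder verdicts and transition rules (NOT the real FixVerificationStore).
type FixVerdict = "pending" | "verified" | "not_fixed" | "inconclusive";

interface FixTransition {
  from: FixVerdict;
  to: FixVerdict;
  at: string; // ISO-8601 timestamp
}

interface FixVerificationRow {
  cveId: string;
  verdict: FixVerdict;
  transitions: FixTransition[]; // persisted as a JSONB array
}

const allowedFixTransitions: Record<FixVerdict, FixVerdict[]> = {
  pending: ["verified", "not_fixed", "inconclusive"],
  inconclusive: ["verified", "not_fixed"],
  not_fixed: ["pending"],
  verified: [],
};

// Returns a new row with the transition appended; rejects illegal moves.
function applyVerdict(row: FixVerificationRow, to: FixVerdict, at: string): FixVerificationRow {
  if (allowedFixTransitions[row.verdict].indexOf(to) === -1) {
    throw new Error(`illegal transition ${row.verdict} -> ${to}`);
  }
  return {
    ...row,
    verdict: to,
    transitions: [...row.transitions, { from: row.verdict, to, at }],
  };
}

// Replays stored transitions from the initial verdict; each step must start
// where the previous one ended, and the result should equal the stored verdict.
function replayVerdict(initial: FixVerdict, transitions: FixTransition[]): FixVerdict {
  return transitions.reduce<FixVerdict>((current, t) => {
    if (t.from !== current) throw new Error(`broken transition chain at ${t.at}`);
    return t.to;
  }, initial);
}
```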

Users:
- No user-facing API changes -- same endpoints, same request/response shapes
- `VexDecisionStore.CreateWithAttestationAsync` behavior preserved for `IVexOverrideAttestorClient`

Documentation:
- Document repository interfaces in module AGENTS.md

Completion criteria:
- [ ] All repository interfaces defined
- [ ] Postgres implementations for all three repositories
- [ ] Business logic (attestation flow, state machine, bundle aggregation) preserved in service layer
- [ ] All JSONB fields round-trip correctly
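
A hedged sketch of the audit-bundle aggregation that must survive the move to the service layer: collect the evidence refs of the referenced decisions, de-duplicate and sort them for determinism, and draw the bundle ID from a sequence. The `bundle-NNNNNN` format, the field names, and the dedupe/sort policy are assumptions. `nextSequence` stands in for a Postgres sequence, which, unlike an in-process counter, survives restarts and is safe across replicas.

```typescript
// Hypothetical shapes; the real AuditBundleResponse may differ.
interface DecisionEvidence {
  decisionId: string;
  evidenceRefs: string[];
}

interface AuditBundle {
  bundleId: string;
  tenantId: string;
  decisionIds: string[];
  evidenceRefs: string[];
}

function buildAuditBundle(
  tenantId: string,
  decisions: DecisionEvidence[],
  nextSequence: () => number, // e.g. backed by nextval() on a Postgres sequence
): AuditBundle {
  // De-duplicate and sort so the bundle content does not depend on the order
  // in which decisions were loaded from the database.
  const refs = new Set<string>();
  for (const d of decisions) for (const r of d.evidenceRefs) refs.add(r);
  const seq = nextSequence();
  return {
    bundleId: `bundle-${String(seq).padStart(6, "0")}`,
    tenantId,
    decisionIds: decisions.map((d) => d.decisionId).sort(),
    evidenceRefs: Array.from(refs).sort(),
  };
}
```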

### VXPM-003 - Replace SampleData with seeded Postgres data
Status: TODO
Dependency: none
Owners: Backend engineer

Task description:
- Remove `SampleData.cs` (hardcoded VulnSummary/VulnDetail records).
- Replace the vuln list/detail endpoints (`GET /v1/vulns`, `GET /v1/vulns/{id}`) with queries against a new `IVulnerabilityQueryService` that reads from `findings_projection` (the Ledger table, accessed via cross-schema query or a shared connection) or a VulnExplorer-owned view/table.
- Decision needed: whether VulnExplorer reads from `findings_ledger.findings_projection` directly (simpler, couples to Ledger schema) or maintains its own materialized view. Recommendation: read from Ledger projection directly via the shared Postgres connection, since VulnExplorer will be merged into Ledger in Phase 2 anyway.
- If the Ledger projection is used: wire the Ledger's `IFindingProjectionRepository` or create a read-only query service that maps `FindingProjection` rows to `VulnSummary`/`VulnDetail`.
- If a VulnExplorer-owned table is used: create `vulnexplorer.vulnerability_summaries` table and a sync mechanism from Ledger events.
- Replace `EvidenceSubgraphStore.Build()` (which returns a hardcoded graph) with either:
  - A call to Ledger's `IEvidenceGraphBuilder.BuildAsync()` (if accessible via shared library reference)
  - An HTTP call to Ledger's `/evidence-graph/{findingId}` endpoint
  - Recommendation: use the shared library reference, since both are in `src/Findings/`

Tests:
- Test that `GET /v1/vulns` returns findings from database (not hardcoded data)
- Test that `GET /v1/vulns/{id}` returns finding detail from database
- Test filtering (CVE, PURL, severity, exploitability, fixAvailable) works against real data
- Test pagination (pageToken/pageSize) works
- Test evidence subgraph returns real graph data (not the hardcoded stub)
- Regression test: verify the 4 existing `VulnApiTests` (List_ReturnsDeterministicOrder, List_FiltersByCve, Detail_ReturnsNotFoundWhenMissing, etc.) pass with the new persistence layer -- these will need seed data in the test database
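
For the pagination tests, one option is keyset pagination, where the page token encodes the last key returned. The sketch assumes ordering by `id` and a token that is simply the URI-encoded last id; the real `pageToken` format is not specified in this plan, and `listVulns` is a hypothetical name.

```typescript
// Hypothetical keyset-pagination sketch for GET /v1/vulns.
interface VulnRow {
  id: string;
  cve: string;
}

interface VulnPage {
  items: VulnRow[];
  nextPageToken: string | null;
}

function listVulns(rows: VulnRow[], pageSize: number, pageToken?: string): VulnPage {
  if (!Number.isInteger(pageSize) || pageSize < 1) {
    throw new Error("pageSize must be a positive integer");
  }
  // The token carries the last id of the previous page.
  const after = pageToken === undefined ? undefined : decodeURIComponent(pageToken);
  const ordered = rows
    .filter((r) => after === undefined || r.id > after)
    .sort((a, b) => (a.id < b.id ? -1 : a.id > b.id ? 1 : 0));
  const items = ordered.slice(0, pageSize);
  const hasMore = ordered.length > pageSize;
  return {
    items,
    nextPageToken: hasMore ? encodeURIComponent(items[items.length - 1].id) : null,
  };
}
```

In SQL this corresponds to `WHERE id > @after ORDER BY id LIMIT @pageSize + 1`, where the extra row only signals whether a next page exists. Unlike offset-based tokens, existing rows are neither repeated nor skipped when findings are inserted between requests.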

Users:
- `VulnerabilityListService` (UI) calls `/api/v1/vulnerabilities` -- verify response shape unchanged
- `EvidenceSubgraphService` (UI) calls `/api/vuln-explorer/findings/{id}/evidence-subgraph` -- verify response shape unchanged

Documentation:
- Update `docs/features/checked/vulnexplorer/vulnexplorer-triage-api.md` to note that data is now persisted (not in-memory)

Completion criteria:
- [ ] `SampleData.cs` deleted
- [ ] `EvidenceSubgraphStore` hardcoded data removed
- [ ] Vuln list/detail endpoints return data from Postgres
- [ ] Evidence subgraph endpoint returns real graph data
- [ ] All existing filters and pagination work against Postgres queries
- [ ] Existing test assertions updated and passing

### VXPM-004 - Wire repositories into VulnExplorer Program.cs and replace in-memory singletons
Status: TODO
Dependency: VXPM-001, VXPM-002, VXPM-003
Owners: Backend engineer

Task description:
- Update `Program.cs` to replace all in-memory `AddSingleton` registrations:
  - Remove `builder.Services.AddSingleton<VexDecisionStore>(...)` -> register `IVexDecisionRepository` (scoped)
  - Remove `builder.Services.AddSingleton<FixVerificationStore>()` -> register `IFixVerificationRepository` (scoped)
  - Remove `builder.Services.AddSingleton<AuditBundleStore>()` -> register `IAuditBundleRepository` (scoped)
  - Remove `builder.Services.AddSingleton<EvidenceSubgraphStore>()` -> register `IEvidenceGraphBuilder` or equivalent
- Update all endpoint handlers in `Program.cs` to use the repository/service interfaces instead of the concrete stores.
- Wire the Postgres connection string from `ConnectionStrings__Default` (already set in the compose environment).
- Ensure the `StubVexOverrideAttestorClient` remains wired for dev/test, with `HttpVexOverrideAttestorClient` available for production.
- Verify all 10 endpoints continue to work:
  - `GET /v1/vulns` (list)
  - `GET /v1/vulns/{id}` (detail)
  - `POST /v1/vex-decisions` (create, with optional attestation)
  - `PATCH /v1/vex-decisions/{id:guid}` (update)
  - `GET /v1/vex-decisions` (list)
  - `GET /v1/vex-decisions/{id:guid}` (get)
  - `GET /v1/evidence-subgraph/{vulnId}` (subgraph)
  - `POST /v1/fix-verifications` (create)
  - `PATCH /v1/fix-verifications/{cveId}` (update)
  - `POST /v1/audit-bundles` (create)

Tests:
- Full integration test suite against Postgres: run the existing `VulnExplorerTriageApiE2ETests` (5 tests) against the Postgres-backed service
- Run the existing `VulnApiTests` (4 tests) against the Postgres-backed service
- Verify no 500 errors on cold start (fresh DB with auto-migration)
- Verify the service starts and registers with the Valkey router

Users:
- All UI consumers should see zero behavioral change
- Gateway routing unchanged (`/api/vuln-explorer(.*) -> vulnexplorer.stella-ops.local`)

Documentation:
- Update `src/Findings/StellaOps.VulnExplorer.Api/AGENTS.md` to reflect persistence architecture

Completion criteria:
- [ ] Zero `ConcurrentDictionary` or in-memory store references in VulnExplorer
- [ ] All 10 endpoints return data from Postgres
- [ ] `VulnExplorerTriageApiE2ETests` (5 tests) pass
- [ ] `VulnApiTests` (4 tests) pass with seeded data
- [ ] Cold start works: auto-migration creates schema, service starts, responds to health check
- [ ] Docker compose: vulnexplorer container starts cleanly with Postgres

### VXPM-005 - Phase 1 integration validation
Status: TODO
Dependency: VXPM-004
Owners: QA, Backend engineer

Task description:
- Full system test: bring up the complete compose stack and verify:
  - VulnExplorer starts, auto-migrates, and registers with Valkey
  - UI flows that consume VulnExplorer work end-to-end (navigate to vuln explorer page, view findings, create VEX decision, view evidence subgraph)
  - VexLens `IVulnExplorerIntegration` continues to enrich vulnerabilities (this is an in-process integration in VexLens, not an HTTP call to VulnExplorer -- verify it still works)
  - Concelier `VulnExplorerTelemetry` metrics still emit (this is only a meter, with no runtime dependency on the VulnExplorer service)
- Run all existing test suites:
  - `src/Findings/__Tests/StellaOps.VulnExplorer.Api.Tests/` (9 tests)
  - `src/Findings/__Tests/StellaOps.Findings.Ledger.Tests/` (verify no regressions)
  - `src/Web/StellaOps.Web/src/tests/vuln_explorer/` (2 behavioral specs)

Tests:
- All tests listed above pass
- Manual or Playwright verification of the UI vuln explorer page

Users:
- End-to-end user flow validated

Documentation:
- Record test results in the Execution Log

Completion criteria:
- [ ] All 9 VulnExplorer API tests pass
- [ ] All Ledger tests pass (no regression)
- [ ] UI behavioral specs pass
- [ ] VulnExplorer container starts and responds in full compose stack
- [ ] Data survives container restart (persistence verified)

---

# Phase 2 -- Merge VulnExplorer into Findings Ledger

Goal: Eliminate VulnExplorer as a separate service. Move all endpoints into Ledger WebService. VEX decisions and fix verifications become Ledger events. Remove the VulnExplorer container from compose.

## Delivery Tracker (Phase 2)

### VXLM-001 - Migrate VulnExplorer endpoint DTOs into Ledger WebService
Status: DONE
Dependency: VXPM-005 (Phase 1 complete)
Owners: Backend engineer

Task description:
- Move VulnExplorer contract types into the Ledger WebService `Contracts/` namespace:
  - `VulnModels.cs` (VulnSummary, VulnDetail, VulnListResponse, VulnFilter, EvidenceProvenance, PolicyRationale, PackageAffect, AdvisoryRef, EvidenceRef)
  - `VexDecisionModels.cs` (VexDecisionDto, CreateVexDecisionRequest, UpdateVexDecisionRequest, VexDecisionListResponse, SubjectRefDto, EvidenceRefDto, VexScopeDto, ValidForDto, AttestationRefDto, ActorRefDto, VexOverrideAttestationDto, AttestationVerificationStatusDto, AttestationRequestOptions, and all enums: VexStatus, SubjectType, EvidenceType, VexJustificationType)
  - `FixVerificationModels.cs` (FixVerificationResponse, FixVerificationGoldenSetRef, FixVerificationAnalysis, FunctionChangeResult, FunctionChangeChild, ReachabilityChangeResult, FixVerificationRiskImpact, FixVerificationEvidenceChain, EvidenceChainItem, FixVerificationRequest)
  - `AttestationModels.cs` (VulnScanAttestationDto, AttestationSubjectDto, VulnScanPredicateDto, ScannerInfoDto, ScannerDbInfoDto, SeverityCountsDto, FindingReportDto, AttestationMetaDto, AttestationSignerDto, AttestationListResponse, AttestationSummaryDto, AttestationType)
  - `TriageWorkflowModels.cs` (CreateFixVerificationRequest, UpdateFixVerificationRequest, CreateAuditBundleRequest, AuditBundleResponse, FixVerificationTransition, FixVerificationRecord)
- Contracts from `StellaOps.VulnExplorer.WebService.Contracts.EvidenceSubgraphContracts` already exist conceptually in the Ledger's `EvidenceGraphContracts.cs` -- create thin adapter types or type aliases where the frontend expects the VulnExplorer shape.
- Keep the VulnExplorer API path prefixes (`/v1/vulns`, `/v1/vex-decisions`, `/v1/evidence-subgraph`, `/v1/fix-verifications`, `/v1/audit-bundles`) as route groups in the Ledger WebService to avoid breaking the frontend.

Tests:
- Compilation test: all contract types compile within Ledger WebService
- Verify no duplicate type definitions between the two projects
- Verify existing Ledger tests still pass after adding new contracts

Users:
- No UI changes needed at this stage -- endpoints may return 501 (Not Implemented) initially
- Frontend API clients (`vex-decisions.client.ts`, `audit-bundles.client.ts`, `evidence-subgraph.service.ts`) will be retargeted in VXLM-004

Documentation:
- Update `docs/API_CLI_REFERENCE.md` to note that VulnExplorer endpoints are now served by Findings Ledger

Completion criteria:
- [ ] All VulnExplorer contract types compile within Ledger WebService
- [ ] No duplicate type definitions between the two projects
- [ ] VulnExplorer API paths registered in Ledger WebService (can return 501 initially)
- [ ] Existing Ledger tests still pass

### VXLM-002 - Wire VulnExplorer read endpoints to Ledger projection queries
Status: DONE
Dependency: VXPM-005 (Phase 1 complete)
Owners: Backend engineer

Task description:
- Implement `/v1/vulns` (list) by querying `IFindingProjectionRepository.QueryScoredAsync()` and mapping `FindingProjection` to `VulnSummary`. The Ledger's `VulnerabilityDetailService` already does the field extraction from `labels` JSONB -- reuse that logic.
- Implement `/v1/vulns/{id}` (detail) by calling `VulnerabilityDetailService.GetAsync()` and mapping to `VulnDetail`. The existing `VulnerabilityDetailResponse` is a superset of VulnDetail.
- Implement `/v1/evidence-subgraph/{vulnId}` by calling `IEvidenceGraphBuilder.BuildAsync()` and mapping `EvidenceGraphResponse` to `EvidenceSubgraphResponse`. The Ledger's graph model (verdict node, VEX nodes, reachability, runtime, SBOM, provenance) covers all VulnExplorer subgraph node types.

Tests:
- Integration test: create a finding via a Ledger event, then query via `/v1/vulns` and verify it appears in the response
- Integration test: `GET /v1/vulns/{id}` returns correct detail for a known finding
- Integration test: evidence subgraph returns a graph with the correct node types
- Test filtering (CVE, PURL, severity, exploitability, fixAvailable) works against Ledger projection fields
- Test pagination (pageToken/pageSize) works
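
The filter parameters map onto projection fields roughly as follows. This is a hypothetical in-memory model of the `VulnFilter` semantics for tests. The field types, the case-insensitive CVE match, the severity list, and PURL prefix matching are assumptions rather than the Ledger's documented behavior; in the real service each clause would become a predicate on values extracted from the projection (for example, from the `labels` JSONB).

```typescript
// Hypothetical projection row and filter; names mirror the filters in this plan.
interface ProjectionRow {
  findingId: string;
  cve: string;
  purl: string;
  severity: "critical" | "high" | "medium" | "low";
  exploitability: string;
  fixAvailable: boolean;
}

interface VulnFilterSketch {
  cve?: string;
  purl?: string;
  severity?: ProjectionRow["severity"][]; // match any of the listed severities
  exploitability?: string;
  fixAvailable?: boolean;
}

// A row matches when every filter that is set matches; unset filters are ignored.
function matchesFilter(row: ProjectionRow, f: VulnFilterSketch): boolean {
  if (f.cve !== undefined && row.cve.toUpperCase() !== f.cve.toUpperCase()) return false;
  // Prefix match lets "pkg:npm/lodash" select every lodash version.
  if (f.purl !== undefined && !row.purl.startsWith(f.purl)) return false;
  if (f.severity !== undefined && f.severity.indexOf(row.severity) === -1) return false;
  if (f.exploitability !== undefined && row.exploitability !== f.exploitability) return false;
  if (f.fixAvailable !== undefined && row.fixAvailable !== f.fixAvailable) return false;
  return true;
}
```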

Users:
- `VulnerabilityListService` (UI) at `/api/v1/vulnerabilities` -- ensure response shape unchanged
- `EvidenceSubgraphService` (UI) at `/api/vuln-explorer/findings/{id}/evidence-subgraph` -- ensure response shape unchanged
- `vulnerability-detail.component.ts` (UI) -- verify data binding unchanged

Documentation:
- Update `docs/modules/findings-ledger/README.md` with new endpoint groups

Completion criteria:
- [ ] `/v1/vulns` returns findings from Ledger DB (not hardcoded data)
- [ ] `/v1/vulns/{id}` returns finding detail from Ledger projections
- [ ] `/v1/evidence-subgraph/{vulnId}` returns real evidence graph data
- [ ] Filtering (CVE, PURL, severity, exploitability, fixAvailable) works against Ledger projection fields
- [ ] Pagination (pageToken/pageSize) works
### VXLM-003 - Migrate VEX decision and fix verification endpoints to Ledger event persistence
Status: DONE
Dependency: VXLM-001, VXLM-002
Owners: Backend engineer

Task description:
- **New Ledger event types**: Add to `LedgerEventConstants` (`src/Findings/StellaOps.Findings.Ledger/Domain/LedgerEventConstants.cs`):
  - `EventFindingVexDecisionCreated = "finding.vex_decision_created"`
  - `EventFindingVexDecisionUpdated = "finding.vex_decision_updated"`
  - `EventFindingFixVerificationCreated = "finding.fix_verification_created"`
  - `EventFindingFixVerificationUpdated = "finding.fix_verification_updated"`
  - Add all four to `SupportedEventTypes` and `FindingEventTypes`
- **VEX Decisions**: Wire `POST /v1/vex-decisions` to emit a `finding.vex_decision_created` Ledger event carrying the VEX decision payload in the event's JSONB body. The VEX override attestation flow (`IVexOverrideAttestorClient`) is preserved and produces a `ledger_attestation_pointers` record when attestation succeeds.
- Wire `PATCH /v1/vex-decisions/{id}` to emit a `finding.vex_decision_updated` event (append-only update).
- Wire `GET /v1/vex-decisions` to query the `observations` table filtered by action type, or introduce a new Ledger projection for VEX decisions.
- Wire `GET /v1/vex-decisions/{id:guid}` to reconstruct the decision from its Ledger events.
- **Fix Verification**: Wire `POST /v1/fix-verifications` to emit a `finding.fix_verification_created` event, storing the verdict, transitions, and evidence chain in the event body. Wire `PATCH /v1/fix-verifications/{cveId}` to emit a `finding.fix_verification_updated` event carrying the state transition.
- **Audit Bundle**: Wire `POST /v1/audit-bundles` to delegate to `EvidenceBundleService` or `OrchestratorExportService`, packaging the referenced VEX decisions from the Ledger chain.
- **Data migration**: Migrate VulnExplorer's `vulnexplorer.*` tables into Ledger events. Write a one-time migration that:
  - Reads all VEX decisions from `vulnexplorer.vex_decisions` and emits the corresponding Ledger events
  - Reads all fix verifications from `vulnexplorer.fix_verifications` and emits the corresponding events
  - Records the migration in the Execution Log
- Add a new SQL migration, `010_vex_fix_verification_events.sql`, that adds the event types to the Ledger's `ledger_event_type` enum; if the column is not an enum, document the new type strings instead.

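Because the data migration is a one-time job that may still be rerun after a partial failure, it helps if it is idempotent. A minimal sketch, assuming each Ledger event ID is derived deterministically from the source table and row ID so that a rerun can skip rows it already migrated; `VexDecisionRow`, `LedgerEventDraft`, and `VexDecisionMigration` are hypothetical names, and whether the Ledger accepts caller-supplied event IDs is an assumption:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Hypothetical shapes: a source row from vulnexplorer.vex_decisions and the event to append.
public sealed record VexDecisionRow(Guid Id, string PayloadJson);
public sealed record LedgerEventDraft(Guid EventId, string EventType, string PayloadJson);

public static class VexDecisionMigration
{
    public const string SourceTable = "vulnexplorer.vex_decisions";

    // Same source row => same event ID on every run (first 16 bytes of a SHA-256 digest;
    // not an RFC 4122 name-based UUID).
    public static Guid DeterministicEventId(string sourceTable, Guid rowId)
    {
        byte[] digest = SHA256.HashData(Encoding.UTF8.GetBytes($"{sourceTable}:{rowId:D}"));
        return new Guid(digest.AsSpan(0, 16));
    }

    // Yields one finding.vex_decision_created event per row not yet present in the Ledger.
    public static IEnumerable<LedgerEventDraft> PlanEvents(
        IEnumerable<VexDecisionRow> rows, IReadOnlySet<Guid> existingEventIds)
    {
        foreach (var row in rows)
        {
            var eventId = DeterministicEventId(SourceTable, row.Id);
            if (existingEventIds.Contains(eventId)) continue; // migrated on a previous run
            yield return new LedgerEventDraft(eventId, "finding.vex_decision_created", row.PayloadJson);
        }
    }
}
```

The same pattern would apply to `vulnexplorer.fix_verifications`, with `finding.fix_verification_created` as the event type.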
Tests:
- Integration test: create a VEX decision via `POST /v1/vex-decisions`, verify it persists as a Ledger event, and query it back via `GET`
- Integration test: a VEX decision with attestation produces both a Ledger event and a `ledger_attestation_pointers` record
- Integration test: fix verification create and update produce state transitions as Ledger events
- Integration test: audit bundles aggregate from Ledger data, not the in-memory store
- Test Merkle chain integrity: new VEX/fix events participate in the append-only hash chain
- Test the data migration script: verify it correctly converts existing records
- Test backward compatibility: VEX decisions created before the migration are still queryable

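The Merkle chain integrity test needs an independent way to recompute the chain. A minimal sketch of the linear hash-chain check, assuming each event's hash is hex SHA-256 over the previous hash, the event type, and the payload JSON; `ChainedEvent` and `HashChain` are hypothetical, and the Ledger's real hashing inputs, canonicalization, and any Merkle batching are not modeled:

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

// Hypothetical, simplified event shape; real Ledger events carry more fields.
public sealed record ChainedEvent(string EventType, string PayloadJson, string PreviousHash, string Hash);

public static class HashChain
{
    // Sentinel "previous hash" for the first event in a chain.
    public static readonly string Genesis = new('0', 64);

    // hex(SHA-256(previousHash + "\n" + eventType + "\n" + payloadJson))
    public static string ComputeHash(string previousHash, string eventType, string payloadJson) =>
        Convert.ToHexString(SHA256.HashData(
            Encoding.UTF8.GetBytes($"{previousHash}\n{eventType}\n{payloadJson}")));

    public static ChainedEvent Append(IReadOnlyList<ChainedEvent> chain, string eventType, string payloadJson)
    {
        var previous = chain.Count == 0 ? Genesis : chain[chain.Count - 1].Hash;
        return new ChainedEvent(eventType, payloadJson, previous, ComputeHash(previous, eventType, payloadJson));
    }

    // True when each event links to its predecessor and its stored hash matches a recomputation.
    public static bool Verify(IReadOnlyList<ChainedEvent> chain)
    {
        var expectedPrevious = Genesis;
        foreach (var e in chain)
        {
            if (e.PreviousHash != expectedPrevious) return false;
            if (e.Hash != ComputeHash(e.PreviousHash, e.EventType, e.PayloadJson)) return false;
            expectedPrevious = e.Hash;
        }
        return true;
    }
}
```

Tampering with any stored payload changes its recomputed hash, so `Verify` fails at that event; that is the property the integration test should assert for events written through the new endpoints.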
Users:
- `VexDecisionsHttpClient` (UI) -- verify create/list/get/patch all work with Ledger persistence
- `AuditBundlesHttpClient` (UI) -- verify bundle creation aggregates from Ledger events
- `triage-explainability-workspace.spec.ts` (E2E) -- verify the full triage workflow

Documentation:
- Update `docs/modules/findings-ledger/schema.md` with the new event types
- Update `docs/modules/findings-ledger/workflow-inference.md` if projection rules change
- Update `docs/features/checked/vulnexplorer/vulnexplorer-triage-api.md` to document Ledger-backed persistence

Completion criteria:
- [ ] VEX decisions are persisted as Ledger events (append-only, with Merkle chain integrity)
- [ ] VEX override attestations produce `ledger_attestation_pointers` records
- [ ] Fix verifications are persisted as Ledger events with state transitions
- [ ] Audit bundles aggregate from Ledger data (not an in-memory store)
- [ ] New SQL migration `010_vex_fix_verification_events.sql` adds the event types
- [ ] All `ConcurrentDictionary` in-memory stores eliminated
- [ ] Data migration from `vulnexplorer.*` tables to Ledger events complete

### VXLM-004 - Remove VulnExplorer service and update compose/routing/consumers
Status: DONE
Dependency: VXLM-003
Owners: Backend engineer, DevOps

Task description:
- Remove the `StellaOps.VulnExplorer.Api/` project from the solution.
- Remove the `StellaOps.VulnExplorer.WebService/` project from the solution (inline `EvidenceSubgraphContracts` into the Ledger if still referenced).
- Remove `StellaOps.VulnExplorer.Persistence/` (the Phase 1 persistence library); its tables are superseded by Ledger events.
- Update `docker-compose.stella-ops.yml`:
  - Remove the vulnexplorer service container
  - Remove `STELLAOPS_VULNEXPLORER_URL` from the gateway's environment variables
- Update `docker-compose.stella-services.yml`:
  - Remove the vulnexplorer service definition
  - Remove `STELLAOPS_VULNEXPLORER_URL` from the shared environment
- Update `devops/compose/router-gateway-local.json`, either by:
  - Changing route `^/api/vuln-explorer(.*)` to target `http://findings-ledger.stella-ops.local/api/vuln-explorer$1`, or
  - Adding new routes for `/v1/vulns*`, `/v1/vex-decisions*`, etc. that target findings-ledger
- Update `devops/compose/hosts.stellaops.local` -- remove the vulnexplorer hostname
- Update `devops/compose/envsettings-override.json` -- point `apiBaseUrls.vulnexplorer` to findings-ledger, or remove it if the gateway handles routing
- Update `devops/docker/services-matrix.env` -- remove the vulnexplorer project path if present
- Update `devops/helm/stellaops/templates/vuln-mock.yaml` -- remove or repurpose
- Update cross-service references:
  - `src/Authority/StellaOps.Auth.Abstractions/StellaOpsServiceIdentities.cs` -- deprecate or remove the `VulnExplorer` identity (or redirect it to findings-ledger)
  - `src/VexLens/StellaOps.VexLens/Integration/` -- `IVulnExplorerIntegration`/`VulnExplorerIntegration` remain as-is (they use `IConsensusProjectionStore` in-process, not HTTP calls to VulnExplorer)
  - `src/Concelier/StellaOps.Concelier.Core/Diagnostics/VulnExplorerTelemetry.cs` -- rename the meter to `StellaOps.Findings.VulnExplorer`, or leave it as-is for telemetry continuity
  - `src/Router/__Tests/StellaOps.Gateway.WebService.Tests/Middleware/RouteDispatchMiddlewareMicroserviceTests.cs` -- update test expectations for the route target
  - `src/Router/__Tests/StellaOps.Router.Gateway.Tests/OpenApi/OpenApiDocumentGeneratorTests.cs` -- update if it references vulnexplorer routes

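The first routing option is a regex rewrite. A minimal sketch of what that rule does to incoming paths, using .NET `Regex` with the pattern and target copied from the task above; `RouteRewriteSketch` is hypothetical, and the gateway's own matcher may differ in anchoring and query-string handling:

```csharp
#nullable enable
using System.Text.RegularExpressions;

public static class RouteRewriteSketch
{
    // Pattern and target as described for router-gateway-local.json in VXLM-004.
    private static readonly Regex VulnExplorerRoute = new("^/api/vuln-explorer(.*)", RegexOptions.CultureInvariant);
    private const string Target = "http://findings-ledger.stella-ops.local/api/vuln-explorer$1";

    // Returns the upstream URL, or null when the path is not a VulnExplorer path.
    public static string? Rewrite(string path) =>
        VulnExplorerRoute.IsMatch(path) ? VulnExplorerRoute.Replace(path, Target, 1) : null;
}
```

The pattern has no boundary after `vuln-explorer`, so it also matches paths such as `/api/vuln-explorer-legacy/...`; if that matters, a stricter pattern like `^/api/vuln-explorer(/.*)?$` would limit it to the exact prefix.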
Tests:
- Verify the solution builds without the VulnExplorer projects
- Verify all containers start cleanly (62+ before this change, 61+ after removing vulnexplorer)
- Verify the gateway routes `/v1/vulns*`, `/v1/vex-decisions*`, `/v1/evidence-subgraph*`, `/v1/fix-verifications*`, and `/v1/audit-bundles*` to the findings-ledger service
- Verify VexLens integration still works (no runtime dependency on the VulnExplorer service)
- Verify Concelier telemetry still emits (no runtime dependency on the VulnExplorer service)
- Run the gateway routing tests and verify they pass with the updated route targets

Users:
- UI: `vex-decisions.client.ts` `VEX_DECISIONS_API_BASE_URL` -- verify it resolves to the gateway, which now routes to findings-ledger
- UI: `audit-bundles.client.ts` `AUDIT_BUNDLES_API_BASE_URL` -- same verification
- UI: `evidence-subgraph.service.ts` base URL `/api/vuln-explorer` -- verify the gateway route rewrite works
- UI: `vulnerability-list.service.ts` base URL `/api/v1/vulnerabilities` -- verify routing
- `envsettings-override.json` -- verify the updated `apiBaseUrls` are consumed by the UI at runtime

Documentation:
- Update `docs/technical/architecture/webservice-catalog.md` -- remove the VulnExplorer entry and note that it was merged into Findings Ledger
- Update `docs/technical/architecture/port-registry.md` -- remove the VulnExplorer port allocation
- Update `docs/technical/architecture/component-map.md` -- update the diagram
- Update `docs/technical/architecture/module-matrix.md` -- remove the VulnExplorer row
- Update `docs/dev/DEV_ENVIRONMENT_SETUP.md` -- remove VulnExplorer references
- Update `docs/dev/SOLUTION_BUILD_GUIDE.md` -- remove the VulnExplorer project
- Update `docs/technical/cicd/path-filters.md` -- remove VulnExplorer paths

Completion criteria:
- [ ] No vulnexplorer container in compose
- [ ] Gateway routes VulnExplorer API paths to the findings-ledger service
- [ ] Solution builds without VulnExplorer projects
- [ ] All containers start cleanly
- [ ] Cross-service references updated (VexLens, Concelier, Authority, Router tests)
- [ ] UI `envsettings-override.json` updated

### VXLM-005 - Integration tests, UI validation, and documentation update
Status: TODO
Dependency: VXLM-004
Owners: Backend engineer, QA

Task description:
- Port VulnExplorer test assertions to the Ledger test project (`src/Findings/__Tests/StellaOps.Findings.Ledger.Tests/`). Add integration tests that:
  - Create a finding via a Ledger event, then query it via `/v1/vulns` and `/v1/vulns/{id}`.
  - Create a VEX decision via `POST /v1/vex-decisions`, verify it persists as a Ledger event, and query it back via `GET`.
  - Create a VEX decision with attestation and verify the `ledger_attestation_pointers` record.
  - Create a fix verification and verify its state transitions persist as Ledger events.
  - Create an audit bundle from persisted decisions.
  - Retrieve the evidence subgraph for a finding with real evidence data.
  - Full triage workflow: create finding -> create VEX decision -> create fix verification -> create audit bundle -> verify all are queryable.
- Run UI behavioral specs:
  - `src/Web/StellaOps.Web/src/tests/vuln_explorer/vuln-explorer-with-evidence-tree-and-citation-links.behavior.spec.ts`
  - `src/Web/StellaOps.Web/src/tests/vuln_explorer/filter-preset-pills-with-url-synchronization.component.spec.ts`
  - `src/Web/StellaOps.Web/tests/e2e/triage-explainability-workspace.spec.ts`
- Remove or archive the old VulnExplorer test project (`src/Findings/__Tests/StellaOps.VulnExplorer.Api.Tests/`).
- Update documentation:
  - `src/Findings/AGENTS.md` -- document the merged endpoint surface and note that VulnExplorer is now part of Findings Ledger
  - `docs/modules/findings-ledger/schema.md` -- add the new event types (vex_decision_created/updated, fix_verification_created/updated)
  - `docs/modules/findings-ledger/README.md` -- note that the VulnExplorer endpoints were merged in
  - `docs/modules/web/README.md` -- update the service dependency list
  - `docs/modules/ui/architecture.md` -- update the service dependency list
  - `docs/modules/ui/component-preservation-map/README.md` -- update the VulnExplorer component status
  - `docs/modules/vex-lens/guides/explorer-integration.md` -- note that VulnExplorer was merged into the Ledger
  - `docs/modules/authority/AUTHORITY.md` -- note the service identity change
  - `docs/operations/runbooks/vuln-ops.md` -- update operational procedures
  - `docs/qa/feature-checks/state/vulnexplorer.json` -- update the state to reflect the merge
  - `docs/INDEX.md` -- update if VulnExplorer is listed separately
  - High-level architecture docs (`docs/07_HIGH_LEVEL_ARCHITECTURE.md`) if the service count changes

Tests:
- All new integration tests listed above pass
- All existing Ledger tests pass (no regression)
- UI behavioral specs pass
- E2E triage workspace spec passes
- All ported VulnExplorer test assertions pass in the Ledger test project

Users:
- End-to-end validation: all UI flows that previously hit VulnExplorer now work via Findings Ledger
- No user-visible behavior change

Documentation:
- All documentation updates listed above completed

Completion criteria:
- [ ] Integration tests cover all 6 merged endpoint groups
- [ ] Existing Ledger tests still pass
- [ ] UI behavioral specs pass
- [ ] E2E triage workspace spec passes
- [ ] Old VulnExplorer test project removed or archived
- [ ] Module AGENTS.md updated with the merged endpoint list
- [ ] Schema docs updated with the new event types
- [ ] All documentation files listed in the task description updated
- [ ] High-level architecture docs updated with the new service count

## Execution Log
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2026-04-08 | Sprint created from VulnExplorer/Ledger merge analysis. Option A (merge first, Ledger projections) selected. | Planning |
| 2026-04-08 | Sprint restructured into two phases: Phase 1 (in-memory to Postgres migration) and Phase 2 (merge into Ledger). Comprehensive consumer/dependency audit added. | Planning |
| 2026-04-08 | Phase 2 implemented (VXLM-001 through VXLM-004): DTOs moved to Ledger `Contracts/VulnExplorer/`, endpoints mounted via `VulnExplorerEndpoints.cs`, adapter services created, compose/routing/services-matrix updated, docs updated. Phase 1 skipped per user direction (wire to existing Ledger services instead of creating separate vulnexplorer schema). VXLM-005 (integration tests) remaining TODO. | Backend |

## Decisions & Risks
- **Decision**: Two-phase approach. Phase 1 migrates VulnExplorer to Postgres while it remains a standalone service. Phase 2 merges it into Findings Ledger. Rationale: separating the persistence migration from the service-boundary change reduces risk and allows the data model to be validated independently.
- **Decision**: VulnExplorer's Phase 1 tables (`vulnexplorer.*` schema) are temporary. They serve as a stepping stone to validate the data model before the Ledger merge in Phase 2. Phase 2 will migrate their data into Ledger events and drop the tables.
- **Decision**: VulnExplorer API paths are preserved as-is in the Ledger WebService to avoid breaking the frontend. They will be documented as aliases for the Ledger's native v2 endpoints.
- **Decision**: VulnExplorer reads from `findings_ledger.findings_projection` for vuln list/detail (Phase 1, VXPM-003) rather than creating its own vulnerability table. Rationale: this avoids data duplication, and it is the same table the Ledger will serve in Phase 2.
- **Risk**: The VEX override attestation workflow (`IVexOverrideAttestorClient`) currently uses a stub in VulnExplorer. Merging preserves this stub, but it must be connected to the real Attestor service for production. This is existing tech debt, not introduced by the migration.
- **Risk**: The new Ledger event types (`finding.vex_decision_*`, `finding.fix_verification_*`) require a SQL migration to extend the event type set. The migration must run before the new code deploys (auto-migration handles this).
- **Risk**: VexLens `IVulnExplorerIntegration` does not make HTTP calls to VulnExplorer; it uses `IConsensusProjectionStore` in-process. There is no service dependency, but the interface name still references VulnExplorer. Consider renaming it in a follow-up sprint.
- **Risk**: The Concelier `VulnExplorerTelemetry` meter name (`StellaOps.Concelier.VulnExplorer`) is baked into dashboards/alerts, so renaming it would break observability continuity. Decision: leave the meter name as-is and document the historical naming.
- **Risk**: `envsettings-override.json` has `apiBaseUrls.vulnexplorer` pointing to `https://stella-ops.local`. If the UI reads this value to build API URLs, it must be updated in Phase 2; if the gateway handles all routing, this may be a no-op.

## Next Checkpoints
- **Phase 1**: VXPM-001/002/003 can proceed in parallel immediately. VXPM-004 integrates all three. VXPM-005 validates the complete Phase 1.
- **Phase 2 gate**: Phase 2 must not start until VXPM-005 passes. All VulnExplorer endpoints must be Postgres-backed and tested.
- **Phase 2**: VXLM-001 and VXLM-002 can proceed in parallel. VXLM-003 is the critical-path task. VXLM-004 (service removal) should be the last code change.
- **Demo (Phase 1)**: VulnExplorer with real Postgres persistence, zero hardcoded data, and data that survives restarts.
- **Demo (Phase 2)**: Merged service with Ledger-backed VulnExplorer endpoints, no VulnExplorer container, and all UI flows working.
@@ -0,0 +1,540 @@
# Sprint 20260408-003 - Scheduler Plugin Architecture + Doctor Migration

## Topic & Scope

- Design and implement a generic job-plugin system for the Scheduler service, enabling non-scanning workloads (health checks, policy sweeps, graph builds, etc.) to be scheduled and executed as first-class Scheduler jobs.
- Migrate Doctor's thin scheduling layer (`StellaOps.Doctor.Scheduler`) to become the first Scheduler job plugin, eliminating a standalone service while preserving Doctor-specific UX and trending.
- Working directory: `src/JobEngine/` (primary), `src/Doctor/` (migration source), `src/Web/StellaOps.Web/src/app/features/doctor/` (UI adapter).
- Expected evidence: interface definitions compile, Doctor plugin builds, existing Scheduler tests pass, new plugin tests pass, Doctor UI still renders schedules and trends.

## Dependencies & Concurrency

- No upstream sprint blockers. The Scheduler WebService and Doctor Scheduler are both stable.
- Batch 1 (tasks 001-004) can proceed independently of Batch 2 (005-009).
- Batch 2 (Doctor plugin) depends on Batch 1 (plugin contracts).
- Batch 3 (UI + cleanup, tasks 010-012) depends on Batch 2.
- Safe to develop in parallel with any FE or Findings sprints, since the working directories do not overlap.

## Documentation Prerequisites

- `docs/modules/scheduler/architecture.md` (read before DOING)
- `src/JobEngine/AGENTS.Scheduler.md`
- `src/Doctor/AGENTS.md`
- `docs/doctor/doctor-capabilities.md`

---

## Architecture Design

### A. Current State Analysis

**Scheduler** (`src/JobEngine/StellaOps.Scheduler.WebService`):
- Manages `Schedule` entities with cron expressions, `ScheduleMode` (AnalysisOnly, ContentRefresh), `Selector` (image targeting), `ScheduleOnlyIf` preconditions, `ScheduleNotify` preferences, and `ScheduleLimits`.
- Creates `Run` entities with a state machine: Planning -> Queued -> Running -> Completed/Error/Cancelled.
- The `Schedule.Mode` enum is hardcoded to scanning modes, and the `Selector` model is image-centric (digests, namespaces, repositories, labels).
- Worker Host processes queue segments via `StellaOps.Scheduler.Queue` and `StellaOps.Scheduler.Worker.DependencyInjection`.
- Has an empty `StellaOps.Scheduler.plugins/scheduler/` directory and a working `PluginHostOptions` / `PluginHost.LoadPlugins()` assembly-loading pipeline via `StellaOps.Plugin.Hosting`.
- `SystemScheduleBootstrap` seeds 6 system schedules on startup.
- Already registers plugin assemblies via `RegisterPluginRoutines()` in Program.cs (line 189), which scans for `IDependencyInjectionRoutine` implementations.

**Doctor Scheduler** (`src/Doctor/StellaOps.Doctor.Scheduler`):
- Standalone slim WebApplication (~65 lines in Program.cs).
- `DoctorScheduleWorker` (BackgroundService): polls every N seconds, evaluates cron expressions via Cronos, and dispatches to `ScheduleExecutor`.
- `ScheduleExecutor`: makes an HTTP POST to the Doctor WebService `/api/v1/doctor/run`, polls for completion, stores trend data, and evaluates alert rules.
- `DoctorSchedule` model: ScheduleId, Name, CronExpression, Mode (Quick/Full/Categories/Plugins), Categories[], Plugins[], Enabled, Alerts (AlertConfiguration), TimeZoneId, LastRunAt/Id/Status.
- All persistence is in-memory (`InMemoryScheduleRepository`, `InMemoryTrendRepository`). No Postgres implementation exists yet.
- Exposes REST endpoints at `/api/v1/doctor/scheduler/schedules` and `/api/v1/doctor/scheduler/trends`.
- 20 Doctor plugins across 18+ directories under `src/Doctor/__Plugins/`, each implementing `IDoctorPlugin` with `IDoctorCheck[]`.

**Doctor UI** (`src/Web/StellaOps.Web/src/app/features/doctor`):
- Calls the Doctor WebService directly (`/doctor/api/v1/doctor/...`) for runs, checks, plugins, and reports.
- Calls the Doctor Scheduler at `/api/v1/doctor/scheduler/trends/categories/{category}` for trend sparklines.
- No schedule management UI exists yet (schedules are created via the API or seed data).

### B. Plugin Architecture Design

#### B.1 The `ISchedulerJobPlugin` Contract

A new library, `StellaOps.Scheduler.Plugin.Abstractions`, defines the plugin contract:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Routing;             // IEndpointRouteBuilder
using Microsoft.Extensions.Configuration;       // IConfiguration
using Microsoft.Extensions.DependencyInjection; // IServiceCollection

namespace StellaOps.Scheduler.Plugin;

/// <summary>
/// A pluggable Scheduler job type. Each implementation handles exactly one job kind;
/// the Scheduler matches Schedule.JobKind against <see cref="JobKind"/> to route
/// cron triggers to the correct plugin at execution time.
/// </summary>
public interface ISchedulerJobPlugin
{
    /// <summary>
    /// Unique, stable identifier for this job kind (e.g., "scan", "doctor", "policy-sweep").
    /// Stored in the Schedule record; must be immutable once published.
    /// </summary>
    string JobKind { get; }

    /// <summary>
    /// Human-readable display name for the UI.
    /// </summary>
    string DisplayName { get; }

    /// <summary>
    /// Plugin version for compatibility checking.
    /// </summary>
    Version Version { get; }

    /// <summary>
    /// Creates a typed execution plan from a Schedule + Run.
    /// Called when the cron fires or a manual run is created.
    /// Returns a plan object that the Scheduler persists as the Run's plan payload.
    /// </summary>
    Task<JobPlan> CreatePlanAsync(JobPlanContext context, CancellationToken ct);

    /// <summary>
    /// Executes the plan. Called by the Worker Host.
    /// Must be idempotent and support cancellation.
    /// Reports progress and Run state changes via the context's IRunProgressReporter.
    /// </summary>
    Task ExecuteAsync(JobExecutionContext context, CancellationToken ct);

    /// <summary>
    /// Optionally validates plugin-specific configuration stored in Schedule.PluginConfig.
    /// Called on schedule create/update.
    /// </summary>
    Task<JobConfigValidationResult> ValidateConfigAsync(
        IReadOnlyDictionary<string, object?> pluginConfig,
        CancellationToken ct);

    /// <summary>
    /// Returns the JSON schema for plugin-specific configuration, enabling UI-driven forms.
    /// </summary>
    string? GetConfigJsonSchema();

    /// <summary>
    /// Registers plugin-specific services into DI.
    /// Called once during host startup.
    /// </summary>
    void ConfigureServices(IServiceCollection services, IConfiguration configuration);

    /// <summary>
    /// Registers plugin-specific HTTP endpoints (optional).
    /// Called during the app.Map* phase.
    /// </summary>
    void MapEndpoints(IEndpointRouteBuilder routes);
}
```

#### B.2 Supporting Types

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using StellaOps.Scheduler.Models; // Schedule, Run, RunState (namespace assumed to match the project name)

namespace StellaOps.Scheduler.Plugin;

/// <summary>
/// Immutable context passed to CreatePlanAsync.
/// </summary>
public sealed record JobPlanContext(
    Schedule Schedule,
    Run Run,
    IServiceProvider Services,
    TimeProvider TimeProvider);

/// <summary>
/// The plan produced by a plugin. Serialized to JSON and stored on the Run.
/// </summary>
public sealed record JobPlan(
    string JobKind,
    IReadOnlyDictionary<string, object?> Payload,
    int EstimatedSteps = 1);

/// <summary>
/// Context passed to ExecuteAsync.
/// </summary>
public sealed record JobExecutionContext(
    Schedule Schedule,
    Run Run,
    JobPlan Plan,
    IRunProgressReporter Reporter,
    IServiceProvider Services,
    TimeProvider TimeProvider);

/// <summary>
/// Callback interface for plugins to report progress and update Run state.
/// </summary>
public interface IRunProgressReporter
{
    Task ReportProgressAsync(int completed, int total, string? message = null, CancellationToken ct = default);

    Task TransitionStateAsync(RunState newState, string? error = null, CancellationToken ct = default);

    Task AppendLogAsync(string message, string level = "info", CancellationToken ct = default);
}

/// <summary>
/// Result of plugin config validation.
/// </summary>
public sealed record JobConfigValidationResult(
    bool IsValid,
    IReadOnlyList<string> Errors);
```

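`IRunProgressReporter.TransitionStateAsync` is a natural place to enforce the Run state machine from section A (Planning -> Queued -> Running -> Completed/Error/Cancelled). A minimal sketch of that guard, using a stand-in enum because the real `RunState` members are not listed here; allowing cancellation from Planning and Queued is an assumption:

```csharp
using System;
using System.Collections.Generic;

// Stand-in for the real RunState enum, whose members are not listed in this plan.
public enum RunStateSketch { Planning, Queued, Running, Completed, Error, Cancelled }

public static class RunTransitions
{
    // Forward-only edges from section A; cancelling before Running is an assumption.
    private static readonly Dictionary<RunStateSketch, RunStateSketch[]> Allowed = new()
    {
        [RunStateSketch.Planning] = new[] { RunStateSketch.Queued, RunStateSketch.Cancelled },
        [RunStateSketch.Queued] = new[] { RunStateSketch.Running, RunStateSketch.Cancelled },
        [RunStateSketch.Running] = new[] { RunStateSketch.Completed, RunStateSketch.Error, RunStateSketch.Cancelled },
    };

    public static bool IsAllowed(RunStateSketch from, RunStateSketch to) =>
        Allowed.TryGetValue(from, out var targets) && Array.IndexOf(targets, to) >= 0;

    // Completed, Error, and Cancelled have no outgoing edges.
    public static bool IsTerminal(RunStateSketch state) => !Allowed.ContainsKey(state);
}
```

A reporter implementation could call `IsAllowed` before persisting a transition and reject anything else, so a faulty plugin cannot move a completed Run back to Running.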
#### B.3 Schedule Model Extension

The existing `Schedule` record needs two new fields:

1. **`JobKind`** (`string`, default `"scan"`): routes the schedule to the correct `ISchedulerJobPlugin`. Existing schedules implicitly use `"scan"`.
2. **`PluginConfig`** (`ImmutableDictionary<string, object?>?`, optional): plugin-specific configuration stored as JSON. For scan jobs this is null (mode and selector cover everything). For Doctor jobs it contains `{ "doctorMode": "full", "categories": [...], "plugins": [...], "alerts": {...} }`.

The existing `ScheduleMode` and `Selector` remain valid for scan-type jobs. Plugins that don't target images can ignore `Selector` and set `Scope = AllImages` as a no-op.

#### B.4 Plugin Registry and Discovery

```
SchedulerPluginRegistry : ISchedulerPluginRegistry
  - Dictionary<string, ISchedulerJobPlugin> _plugins
  - Register(ISchedulerJobPlugin plugin)
  - Resolve(string jobKind) -> ISchedulerJobPlugin?
  - ListRegistered() -> IReadOnlyList<(string JobKind, string DisplayName)>
```

Plugins are discovered in two ways:
1. **Built-in**: The existing scan logic is refactored into `ScanJobPlugin : ISchedulerJobPlugin` with `JobKind = "scan"`. It is registered in DI unconditionally.
2. **Assembly-loaded**: The existing `PluginHost.LoadPlugins()` pipeline scans `plugins/scheduler/` for DLLs. Any type implementing `ISchedulerJobPlugin` is instantiated and registered. This uses the `PluginHostOptions` infrastructure already wired into the Scheduler.

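The registry outline above, including the duplicate-kind rejection that TASK-002 tests for, could look roughly like this. A minimal sketch, assuming ordinal (case-sensitive) job-kind matching; `IJobPluginSketch` is a stand-in exposing only the two members the registry reads, so the example compiles without the Scheduler assemblies, and `FakePlugin` is a test double:

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in exposing only what the registry reads; the real ISchedulerJobPlugin (B.1) has more members.
public interface IJobPluginSketch
{
    string JobKind { get; }
    string DisplayName { get; }
}

// Hypothetical registry sketch; job kinds are compared ordinally, which is an assumption.
public sealed class SchedulerPluginRegistrySketch
{
    private readonly Dictionary<string, IJobPluginSketch> _plugins = new(StringComparer.Ordinal);

    public void Register(IJobPluginSketch plugin)
    {
        ArgumentNullException.ThrowIfNull(plugin);
        if (string.IsNullOrWhiteSpace(plugin.JobKind))
            throw new ArgumentException("JobKind must be non-empty.", nameof(plugin));

        // Two plugins claiming the same kind would make routing ambiguous, so reject the second.
        if (!_plugins.TryAdd(plugin.JobKind, plugin))
            throw new InvalidOperationException($"A plugin for job kind '{plugin.JobKind}' is already registered.");
    }

    public IJobPluginSketch? Resolve(string jobKind) =>
        _plugins.TryGetValue(jobKind, out var plugin) ? plugin : null;

    // Sorted by job kind so API listings are deterministic.
    public IReadOnlyList<(string JobKind, string DisplayName)> ListRegistered() =>
        _plugins.Values
            .OrderBy(p => p.JobKind, StringComparer.Ordinal)
            .Select(p => (p.JobKind, p.DisplayName))
            .ToList();
}

// Minimal test double.
public sealed record FakePlugin(string JobKind, string DisplayName) : IJobPluginSketch;
```

Because plugins are registered once during startup and only read afterwards, a plain `Dictionary` is sufficient here; registering plugins after startup would call for a `ConcurrentDictionary` or a lock.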
#### B.5 Execution Flow

```
Cron fires for Schedule (jobKind="doctor")
  -> SchedulerPluginRegistry.Resolve("doctor") -> DoctorJobPlugin
  -> DoctorJobPlugin.CreatePlanAsync(JobPlanContext { Schedule, Run }) -> JobPlan
  -> Run persisted with state=Queued and the plan payload
  -> Worker dequeues Run
  -> DoctorJobPlugin.ExecuteAsync(JobExecutionContext)
       -> Calls Doctor WebService HTTP API (same as current ScheduleExecutor)
       -> Reports progress via IRunProgressReporter
       -> Stores trend data
       -> Evaluates alerts
  -> Run transitions to Completed/Error
```

#### B.6 Backward Compatibility

- `Schedule.JobKind` defaults to `"scan"` for all existing schedules (the migration adds the column with a default).
- `Schedule.PluginConfig` defaults to null for existing schedules.
- `ScanJobPlugin` wraps the current execution logic with no behavioral change.
- The `ScheduleMode` enum remains but is only meaningful for `jobKind="scan"`. Other plugins ignore it (or set a sentinel value).
- All existing API contracts (`/api/v1/scheduler/schedules`, `/api/v1/scheduler/runs`) are extended, not broken.

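Backward compatibility also depends on existing schedule payloads deserializing with the new defaults. A minimal sketch using `System.Text.Json`, assuming the API layer binds schedules to a record like `ScheduleSketch` (hypothetical; it uses `JsonElement` values instead of the `ImmutableDictionary<string, object?>` proposed in B.3 so the round-trip is deterministic):

```csharp
#nullable enable
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical slice of the Schedule contract showing only the two new fields.
public sealed record ScheduleSketch
{
    public string Id { get; init; } = "";

    // Payloads written before the change carry no jobKind, so the "scan" initializer survives deserialization.
    public string JobKind { get; init; } = "scan";

    // Null for scan schedules; plugin-specific settings otherwise.
    public Dictionary<string, JsonElement>? PluginConfig { get; init; }
}
```

Deserializing `{"id":"nightly-scan"}` yields `JobKind == "scan"` and a null `PluginConfig`, which matches the database-side column default described above.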
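### C. Doctor Plugin Design

#### C.1 DoctorJobPlugin

```csharp
using Microsoft.AspNetCore.Routing;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;

public sealed class DoctorJobPlugin : ISchedulerJobPlugin
{
    public string JobKind => "doctor";
    public string DisplayName => "Doctor Health Checks";
    public Version Version => new(1, 0, 0); // placeholder version

    // Reads DoctorScheduleConfig from Schedule.PluginConfig, resolves which checks
    // to run, and returns a JobPlan carrying the check list.
    public Task<JobPlan> CreatePlanAsync(JobPlanContext context, CancellationToken ct) => throw new NotImplementedException();

    // HTTP POST to Doctor WebService /api/v1/doctor/run, polls for completion
    // (same logic as the current ScheduleExecutor), stores trend data via
    // ITrendRepository, and evaluates alerts via IAlertService.
    public Task ExecuteAsync(JobExecutionContext context, CancellationToken ct) => throw new NotImplementedException();

    // Validates the Doctor config shape described in C.2.
    public Task<JobConfigValidationResult> ValidateConfigAsync(
        IReadOnlyDictionary<string, object?> pluginConfig, CancellationToken ct) => throw new NotImplementedException();

    public string? GetConfigJsonSchema() => null; // to return the C.2 schema

    // Registers the services ExecuteAsync depends on (e.g. ITrendRepository, IAlertService).
    public void ConfigureServices(IServiceCollection services, IConfiguration configuration) { }

    // Registers the trend endpoints at their existing paths (/api/v1/doctor/scheduler/trends/*),
    // serving trend data from the Scheduler's database.
    public void MapEndpoints(IEndpointRouteBuilder routes) { }
}
```
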
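`ExecuteAsync` ends by evaluating alerts. A minimal sketch of how the boolean switches in the C.2 `alerts` config might combine, assuming a Pass/Warn/Fail status and that `enabled` gates everything; `DoctorStatus`, `AlertSettings`, and `DoctorAlertRules` are hypothetical, and the real `IAlertService` rules (including `minSeverity`, channels, and recipients) are not modeled:

```csharp
// Stand-in status values; the real Doctor run status type may differ.
public enum DoctorStatus { Pass, Warn, Fail }

// Mirrors only the boolean switches of the C.2 "alerts" block.
public sealed record AlertSettings(bool Enabled, bool AlertOnFail, bool AlertOnWarn, bool AlertOnStatusChange);

public static class DoctorAlertRules
{
    public static bool ShouldAlert(AlertSettings settings, DoctorStatus current, DoctorStatus? previous)
    {
        if (!settings.Enabled) return false;
        if (settings.AlertOnFail && current == DoctorStatus.Fail) return true;
        if (settings.AlertOnWarn && current == DoctorStatus.Warn) return true;
        // The first run has no previous status, so it never counts as a change.
        return settings.AlertOnStatusChange && previous.HasValue && previous.Value != current;
    }
}
```

With the C.2 defaults (`alertOnFail: true`, `alertOnWarn: false`, `alertOnStatusChange: true`), a run that stays at Warn does not alert, but a Pass-to-Warn change does.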
#### C.2 Doctor-Specific Config Schema

```json
{
  "doctorMode": "full|quick|categories|plugins",
  "categories": ["security", "platform"],
  "plugins": ["stellaops.doctor.agent"],
  "timeoutSeconds": 300,
  "alerts": {
    "enabled": true,
    "alertOnFail": true,
    "alertOnWarn": false,
    "alertOnStatusChange": true,
    "channels": ["email"],
    "emailRecipients": [],
    "webhookUrls": [],
    "minSeverity": "Fail"
  }
}
```

This replaces `DoctorSchedule.Mode`, `Categories`, `Plugins`, and `Alerts` with structured data inside `Schedule.PluginConfig`.

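`ValidateConfigAsync` for this plugin would check the shape above before a schedule is saved. A minimal synchronous sketch over `System.Text.Json`, assuming these rules: `doctorMode` is one of the four listed values, `categories`/`plugins` mode requires a non-empty array of that name, and `timeoutSeconds` is a positive integer when present. The rules are inferred from the example, not taken from the Doctor codebase, and `DoctorConfigValidator` is hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical validator for the Doctor pluginConfig shape; only a subset of fields is checked.
public static class DoctorConfigValidator
{
    private static readonly HashSet<string> Modes = new(StringComparer.Ordinal) { "full", "quick", "categories", "plugins" };

    public static IReadOnlyList<string> Validate(JsonElement config)
    {
        var errors = new List<string>();
        if (config.ValueKind != JsonValueKind.Object)
        {
            errors.Add("pluginConfig must be a JSON object.");
            return errors;
        }

        string mode = config.TryGetProperty("doctorMode", out var m) && m.ValueKind == JsonValueKind.String
            ? m.GetString() ?? ""
            : "";
        if (!Modes.Contains(mode))
            errors.Add("doctorMode must be one of: full, quick, categories, plugins.");

        // "categories" and "plugins" modes need a non-empty array of the same name.
        if ((mode is "categories" or "plugins") &&
            (!config.TryGetProperty(mode, out var list) || list.ValueKind != JsonValueKind.Array || list.GetArrayLength() == 0))
            errors.Add($"'{mode}' must be a non-empty array when doctorMode is '{mode}'.");

        // timeoutSeconds is optional, but must be a positive integer when present.
        if (config.TryGetProperty("timeoutSeconds", out var t) &&
            (t.ValueKind != JsonValueKind.Number || !t.TryGetInt32(out var seconds) || seconds <= 0))
            errors.Add("timeoutSeconds must be a positive integer.");

        return errors;
    }
}
```

An adapter returning `new JobConfigValidationResult(errors.Count == 0, errors)` would connect this to the `ISchedulerJobPlugin` contract.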
#### C.3 What Stays vs. What Moves

| Component | Current Location | After Migration |
|---|---|---|
| Doctor WebService | `src/Doctor/StellaOps.Doctor.WebService/` | **Stays unchanged** -- remains the execution engine |
| Doctor Scheduler (standalone service) | `src/Doctor/StellaOps.Doctor.Scheduler/` | **Deprecated** -- replaced by DoctorJobPlugin in Scheduler |
| Doctor checks (20 plugins) | `src/Doctor/__Plugins/` | **Stay unchanged** -- loaded by Doctor WebService |
| Doctor schedule CRUD | Doctor Scheduler endpoints | **Moves** to Scheduler schedule CRUD (with jobKind="doctor") |
| Doctor trend storage | `InMemoryTrendRepository` | **Moves** to Scheduler persistence (new table `scheduler.doctor_trends`) |
| Doctor trend endpoints | `/api/v1/doctor/scheduler/trends/*` | **Moves** to `DoctorJobPlugin.MapEndpoints` at the same paths (or proxied) |
| Doctor UI | `src/Web/.../doctor/` | **Minor change** -- trend API base URL may change; schedule API uses Scheduler |

#### C.4 Doctor UI Continuity

The Doctor UI (`doctor.client.ts`) currently calls:
1. `/doctor/api/v1/doctor/...` (runs, checks, plugins, reports) -- **no change needed**; the Doctor WebService stays.
2. `/api/v1/doctor/scheduler/trends/categories/{category}` (trends) -- **routed to DoctorJobPlugin endpoints registered in Scheduler**; alternatively, the existing Doctor Scheduler service can be kept running temporarily as a compatibility shim.

Strategy: DoctorJobPlugin registers the same trend endpoints in the Scheduler service, and the gateway route for `doctor-scheduler.stella-ops.local` is remapped to the Scheduler service, so the UI code requires no changes.

### D. What This Architecture Enables (Future)

After this sprint, adding a new scheduled job type requires:
1. Implement `ISchedulerJobPlugin` (one class + supporting types).
2. Drop the DLL into `plugins/scheduler/`.
3. Create schedules with `jobKind="your-kind"` and `pluginConfig={...}`.
4. No Scheduler core changes needed.

Future plugin candidates: `policy-sweep`, `graph-build`, `feed-refresh`, `evidence-export`, `compliance-audit`.

---
|
||||
|
||||
## Delivery Tracker
|
||||
|
||||
### TASK-001 - Create StellaOps.Scheduler.Plugin.Abstractions library
Status: TODO
Dependency: none
Owners: Developer (Backend)
Task description:
- Create a new class library `src/JobEngine/StellaOps.Scheduler.__Libraries/StellaOps.Scheduler.Plugin.Abstractions/`.
- Define `ISchedulerJobPlugin`, `JobPlanContext`, `JobPlan`, `JobExecutionContext`, `IRunProgressReporter`, `JobConfigValidationResult`.
- Target `net10.0`. No external dependencies beyond `StellaOps.Scheduler.Models`.
- Add to `StellaOps.JobEngine.sln`.

Completion criteria:
- [ ] Library compiles with zero warnings
- [ ] All types documented with XML comments
- [ ] Added to the solution and referenced by the Scheduler.WebService and Scheduler.Worker.Host csproj files

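The contract surface can be sketched in TypeScript for review before the C# library exists. Only the type names come from TASK-001. Every member name, the synchronous signatures (the real C# methods are `...Async` and return `Task`), and the toy `EchoJobPlugin` are assumptions. The `ConfigureServices`/`MapEndpoints` hooks from TASK-004 are left out.

```typescript
// Hypothetical TypeScript mirror of the planned C# contracts.
// Only the type names come from TASK-001; every member is an assumption.

type PluginConfig = Record<string, unknown>;

interface JobPlanContext {
  scheduleId: string;
  tenantId: string;
  pluginConfig: PluginConfig | null;
  now: Date;
}

interface JobPlan {
  jobKind: string;
  scheduleId: string;
  steps: string[]; // opaque work items the plugin will execute
}

interface JobConfigValidationResult {
  isValid: boolean;
  errors: string[];
}

interface IRunProgressReporter {
  report(completed: number, total: number, message?: string): void;
}

interface JobExecutionContext {
  plan: JobPlan;
  progress: IRunProgressReporter;
}

interface ISchedulerJobPlugin {
  readonly jobKind: string;
  validateConfig(config: PluginConfig | null): JobConfigValidationResult;
  createPlan(context: JobPlanContext): JobPlan;
  execute(context: JobExecutionContext): void;
}

// Toy plugin showing how the pieces fit together.
class EchoJobPlugin implements ISchedulerJobPlugin {
  readonly jobKind = "echo";

  validateConfig(config: PluginConfig | null): JobConfigValidationResult {
    const errors: string[] = [];
    if (config === null || typeof config["message"] !== "string") {
      errors.push("pluginConfig.message must be a string");
    }
    return { isValid: errors.length === 0, errors };
  }

  createPlan(context: JobPlanContext): JobPlan {
    return { jobKind: this.jobKind, scheduleId: context.scheduleId, steps: ["echo"] };
  }

  execute(context: JobExecutionContext): void {
    const total = context.plan.steps.length;
    // Report progress after each step (1-based count of completed steps).
    context.plan.steps.forEach((_step, index) => {
      context.progress.report(index + 1, total);
    });
  }
}
```

The Scheduler core only ever talks to `ISchedulerJobPlugin`, so a new job kind plugs in without touching planning or dispatch code.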
### TASK-002 - Create SchedulerPluginRegistry
Status: TODO
Dependency: TASK-001
Owners: Developer (Backend)
Task description:
- Create `ISchedulerPluginRegistry` and `SchedulerPluginRegistry` in the Scheduler.WebService project (or a shared library).
- Registry stores `Dictionary<string, ISchedulerJobPlugin>` keyed by `JobKind`.
- Provides `Register()`, `Resolve(string jobKind)`, `ListRegistered()`.
- Wire into DI as a singleton in `Program.cs`.
- Integrate with the existing `PluginHost.LoadPlugins()` to discover and register `ISchedulerJobPlugin` implementations from plugin assemblies.

Completion criteria:
- [ ] Registry resolves built-in plugins
- [ ] Registry discovers plugins from assembly-loaded DLLs
- [ ] Unit tests verify registration, resolution, and duplicate-kind rejection

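This minimal TypeScript sketch covers the registry behavior the unit tests should pin down: registration keyed by job kind, duplicate-kind rejection, resolution, and listing. The case-insensitive key normalization is an assumption that the task does not specify. The real registry would hold `ISchedulerJobPlugin` instances in the `Dictionary` described above.

```typescript
// Hypothetical sketch of SchedulerPluginRegistry semantics:
// register by jobKind, reject duplicates, resolve, and list.

interface JobPlugin {
  readonly jobKind: string;
}

class SchedulerPluginRegistry {
  // Keyed by a normalized (trimmed, lower-cased) jobKind so "Doctor" and "doctor" collide.
  private readonly plugins = new Map<string, JobPlugin>();

  register(plugin: JobPlugin): void {
    const key = plugin.jobKind.trim().toLowerCase();
    if (key.length === 0) {
      throw new Error("jobKind must not be empty");
    }
    if (this.plugins.has(key)) {
      throw new Error(`A plugin for jobKind '${key}' is already registered`);
    }
    this.plugins.set(key, plugin);
  }

  resolve(jobKind: string): JobPlugin | undefined {
    return this.plugins.get(jobKind.trim().toLowerCase());
  }

  // Sorted so listings are deterministic regardless of load order.
  listRegistered(): string[] {
    return Array.from(this.plugins.keys()).sort();
  }
}
```

Failing loudly on duplicates means two plugin DLLs that claim the same kind surface at startup, not as silently shadowed schedules.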
### TASK-003 - Extend Schedule model with JobKind and PluginConfig
Status: TODO
Dependency: TASK-001
Owners: Developer (Backend)
Task description:
- Add `JobKind` (`string`, default `"scan"`) and `PluginConfig` (`ImmutableDictionary<string, object?>?`) to the `Schedule` record.
- Update `ScheduleCreateRequest` and `ScheduleUpdateRequest` contracts to accept these fields.
- Update `ScheduleEndpoints` create/update handlers to validate `PluginConfig` via the resolved plugin's `ValidateConfigAsync()`.
- Add a SQL migration that adds `job_kind` (`varchar`, default `'scan'`) and `plugin_config` (`jsonb`, nullable) columns to the schedules table.
- Update EF Core entity mapping and compiled model.
- Update `SystemScheduleBootstrap` to set `JobKind = "scan"` explicitly.

Completion criteria:
- [ ] Existing schedule tests pass (backward compatible)
- [ ] New schedules can be created with `jobKind` and `pluginConfig`
- [ ] SQL migration is an embedded resource and applies automatically
- [ ] Serialization round-trips correctly for `pluginConfig`

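The backward-compatibility rule and the `pluginConfig` round-trip criterion can be sketched as a row mapper. The rule is that legacy rows resolve to `jobKind = "scan"`. Column names follow the migration above. The `ScheduleRow`/`ScheduleModel` shapes, treating a blank `job_kind` as `"scan"`, and representing `jsonb` as JSON text are assumptions, and the EF Core mapping will look different.

```typescript
// Hypothetical sketch: mapping a stored row with the new columns to the Schedule shape.

interface ScheduleRow {
  id: string;
  job_kind: string | null;      // varchar, DEFAULT 'scan'
  plugin_config: string | null; // jsonb, represented here as JSON text
}

interface ScheduleModel {
  id: string;
  jobKind: string;
  pluginConfig: Record<string, unknown> | null;
}

const DEFAULT_JOB_KIND = "scan";

function fromRow(row: ScheduleRow): ScheduleModel {
  return {
    id: row.id,
    // Null or blank job_kind falls back to "scan" for backward compatibility.
    jobKind: row.job_kind && row.job_kind.trim() !== "" ? row.job_kind : DEFAULT_JOB_KIND,
    pluginConfig: row.plugin_config === null ? null : (JSON.parse(row.plugin_config) as Record<string, unknown>),
  };
}

function toRow(model: ScheduleModel): ScheduleRow {
  return {
    id: model.id,
    job_kind: model.jobKind,
    plugin_config: model.pluginConfig === null ? null : JSON.stringify(model.pluginConfig),
  };
}
```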
### TASK-004 - Refactor existing scan logic into ScanJobPlugin
Status: TODO
Dependency: TASK-001, TASK-002
Owners: Developer (Backend)
Task description:
- Create `ScanJobPlugin : ISchedulerJobPlugin` with `JobKind = "scan"`.
- `CreatePlanAsync`: reuse existing run-planning logic (impact resolution, selector evaluation, queue dispatch).
- `ExecuteAsync`: reuse existing worker segment processing.
- `ValidateConfigAsync`: validate that `ScheduleMode` holds a valid value.
- `ConfigureServices`: no-op (scan services already registered).
- `MapEndpoints`: no-op (scan endpoints already registered).
- Register as a built-in plugin in `SchedulerPluginRegistry` during DI setup.
- This is a refactoring task. No behavioral change is allowed.

Completion criteria:
- [ ] Existing scan schedules work identically through the plugin path
- [ ] All existing Scheduler tests pass without modification
- [ ] `ScanJobPlugin` is the default plugin when `jobKind` is `"scan"` or null

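The "extract and delegate" approach can be illustrated with a thin adapter that owns the existing planner and worker and forwards calls without adding logic. `ExistingRunPlanner`, `ExistingSegmentWorker`, and the segment-id plan shape are hypothetical stand-ins for the real Scheduler planning and worker types.

```typescript
// Hypothetical sketch of "wrap, don't rewrite": ScanJobPlugin forwards to the
// existing planner and worker unchanged. The shapes below are stand-ins.

interface ExistingRunPlanner {
  planRun(scheduleId: string): string[]; // returns segment ids
}

interface ExistingSegmentWorker {
  processSegment(segmentId: string): void;
}

class ScanJobPlugin {
  readonly jobKind = "scan";
  private readonly planner: ExistingRunPlanner;
  private readonly worker: ExistingSegmentWorker;

  constructor(planner: ExistingRunPlanner, worker: ExistingSegmentWorker) {
    this.planner = planner;
    this.worker = worker;
  }

  createPlan(scheduleId: string): string[] {
    return this.planner.planRun(scheduleId); // pure delegation: no new behavior
  }

  execute(segmentIds: string[]): void {
    // Segments are processed in plan order, exactly as the existing worker would.
    for (const segmentId of segmentIds) {
      this.worker.processSegment(segmentId);
    }
  }
}
```

Because the adapter contains no decisions of its own, the existing Scheduler tests remain the regression suite for scan behavior.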
### TASK-005 - Create StellaOps.Scheduler.Plugin.Doctor library
Status: TODO
Dependency: TASK-001, TASK-003
Owners: Developer (Backend)
Task description:
- Create a new class library `src/JobEngine/StellaOps.Scheduler.plugins/StellaOps.Scheduler.Plugin.Doctor/`.
- Implement `DoctorJobPlugin : ISchedulerJobPlugin` with `JobKind = "doctor"`.
- Port `ScheduleExecutor` logic: HTTP POST to Doctor WebService, poll for completion, map results.
- Port `DoctorScheduleConfig` deserialization from `Schedule.PluginConfig`.
- Port `AlertConfiguration` evaluation and `IAlertService` integration.
- `ConfigureServices`: register `HttpClient` for the Doctor API, `IAlertService`, `ITrendRepository`.
- Use the Scheduler's persistence layer for trend storage (new table via embedded SQL migration).

Completion criteria:
- [ ] Plugin compiles and loads via `PluginHost`
- [ ] Plugin can create a plan from a doctor-type schedule
- [ ] Plugin executes a doctor run via HTTP against Doctor WebService
- [ ] Trend data is stored in the Scheduler's Postgres schema

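The trigger-then-poll flow being ported from `ScheduleExecutor` can be sketched with an injected transport, which also makes it testable without HTTP. The `DoctorTransport` interface, the state names, and the `maxPolls` bound are assumptions. The real plugin would use an `HttpClient`, await its delays, and honor cancellation.

```typescript
// Hypothetical sketch of trigger-then-poll. The transport is injected so the logic
// runs without HTTP; state names and function names are assumptions.

type DoctorRunState = "Pending" | "Running" | "Completed" | "Failed";

interface DoctorTransport {
  startRun(config: Record<string, unknown>): string; // e.g. HTTP POST, returns a run id
  getRunState(runId: string): DoctorRunState;       // e.g. HTTP GET
}

interface DoctorRunOutcome {
  runId: string;
  state: DoctorRunState | "TimedOut";
  polls: number;
}

function executeDoctorRun(
  transport: DoctorTransport,
  config: Record<string, unknown>,
  maxPolls: number,
  wait: (attempt: number) => void,
): DoctorRunOutcome {
  const runId = transport.startRun(config);
  for (let poll = 1; poll <= maxPolls; poll++) {
    const state = transport.getRunState(runId);
    if (state === "Completed" || state === "Failed") {
      return { runId, state, polls: poll }; // terminal state reached
    }
    wait(poll); // in the real plugin this would be an awaited delay
  }
  // Give up after maxPolls so a stuck run cannot block the scheduler slot forever.
  return { runId, state: "TimedOut", polls: maxPolls };
}
```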
### TASK-006 - Add Doctor trend persistence to Scheduler schema
Status: TODO
Dependency: TASK-005
Owners: Developer (Backend)
Task description:
- Add a SQL migration creating the `scheduler.doctor_trends` table (`timestamp`, `check_id`, `plugin_id`, `category`, `run_id`, `status`, `health_score`, `duration_ms`, `evidence_values jsonb`).
- Add a `scheduler.doctor_trend_summaries` materialized view or an equivalent summary query.
- Implement `PostgresDoctorTrendRepository : ITrendRepository` using the Scheduler's DB connection.
- Implement data retention pruning (configurable, default 365 days).

Completion criteria:
- [ ] Migration auto-applies on Scheduler startup
- [ ] Trend data round-trips correctly
- [ ] Pruning removes data older than the retention period
- [ ] Query performance is acceptable for 365-day windows

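Retention pruning reduces to computing a cutoff from an injected "now" and dropping older points. In Postgres this would typically be a single `DELETE` filtered on the timestamp column. The sketch assumes UTC timestamps and keeps a point that falls exactly on the cutoff; both are choices to confirm, not stated requirements.

```typescript
// Hypothetical sketch of retention pruning over in-memory trend points.

interface TrendPoint {
  timestamp: Date;
  checkId: string;
  healthScore: number;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

function retentionCutoff(now: Date, retentionDays = 365): Date {
  if (!Number.isInteger(retentionDays) || retentionDays <= 0) {
    throw new Error("retentionDays must be a positive integer");
  }
  return new Date(now.getTime() - retentionDays * MS_PER_DAY);
}

function pruneTrends(points: TrendPoint[], now: Date, retentionDays = 365): TrendPoint[] {
  const cutoff = retentionCutoff(now, retentionDays).getTime();
  // Points exactly at the cutoff are kept; strictly older points are dropped.
  return points.filter((p) => p.timestamp.getTime() >= cutoff);
}
```

Passing `now` in, instead of reading the system clock, keeps the pruning test deterministic.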
### TASK-007 - Register Doctor trend and schedule endpoints in DoctorJobPlugin
Status: TODO
Dependency: TASK-005, TASK-006
Owners: Developer (Backend)
Task description:
- Implement `DoctorJobPlugin.MapEndpoints()` to register:
  - `GET /api/v1/scheduler/doctor/trends` (mirrors the existing `/api/v1/doctor/scheduler/trends`)
  - `GET /api/v1/scheduler/doctor/trends/checks/{checkId}`
  - `GET /api/v1/scheduler/doctor/trends/categories/{category}`
  - `GET /api/v1/scheduler/doctor/trends/degrading`
- Ensure response shapes match current Doctor Scheduler endpoint contracts for UI compatibility.
- Add a gateway route alias so requests to `/api/v1/doctor/scheduler/trends/*` are forwarded to the Scheduler service.

Completion criteria:
- [ ] All trend endpoints return correct data shapes
- [ ] Existing Doctor UI trend sparklines work without code changes
- [ ] Gateway routing verified

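The alias amounts to a prefix rewrite that preserves the path suffix and query string. This TypeScript function only illustrates the mapping. How the StellaOps gateway actually expresses route aliases is not specified here, and the segment-boundary check is an assumption.

```typescript
// Hypothetical sketch: rewrite legacy Doctor Scheduler trend paths onto the
// Scheduler-hosted paths, preserving suffix and query string.

const LEGACY_PREFIX = "/api/v1/doctor/scheduler/trends";
const SCHEDULER_PREFIX = "/api/v1/scheduler/doctor/trends";

function rewriteTrendPath(pathAndQuery: string): string | null {
  const queryStart = pathAndQuery.indexOf("?");
  const pathPart = queryStart === -1 ? pathAndQuery : pathAndQuery.slice(0, queryStart);
  const queryPart = queryStart === -1 ? "" : pathAndQuery.slice(queryStart);

  // Match the prefix exactly or at a segment boundary, so ".../trendsX" is not rewritten.
  if (pathPart !== LEGACY_PREFIX && !pathPart.startsWith(LEGACY_PREFIX + "/")) {
    return null; // not a legacy trend route; leave it to other gateway rules
  }
  return SCHEDULER_PREFIX + pathPart.slice(LEGACY_PREFIX.length) + queryPart;
}
```

For example, `/api/v1/doctor/scheduler/trends/categories/security?days=30` maps to `/api/v1/scheduler/doctor/trends/categories/security?days=30` (the `days` parameter is illustrative).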
### TASK-008 - Seed default Doctor schedules via SystemScheduleBootstrap
Status: TODO
Dependency: TASK-003, TASK-005
Owners: Developer (Backend)
Task description:
- Add Doctor system schedules to `SystemScheduleBootstrap.SystemSchedules`:
  - `doctor-full-daily` ("Daily Health Check", `0 4 * * *`, `jobKind="doctor"`, `pluginConfig` for Full mode)
  - `doctor-quick-hourly` ("Hourly Quick Check", `0 * * * *`, `jobKind="doctor"`, `pluginConfig` for Quick mode)
  - `doctor-compliance-weekly` ("Weekly Compliance Audit", `0 5 * * 0`, `jobKind="doctor"`, `pluginConfig` for `Categories=["compliance"]`)
- These replace the in-memory seeds from Doctor Scheduler's `InMemoryScheduleRepository`.

Completion criteria:
- [ ] Doctor schedules are created on a fresh database
- [ ] Existing scan schedules are unaffected
- [ ] Schedules appear in the Scheduler API with the correct `jobKind` and `pluginConfig`

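Seeding is safest when it is idempotent: a system schedule is created only if its id is not already stored, so restarts neither duplicate nor overwrite existing rows. Ids, display names, and cron expressions come from the list above. The `pluginConfig` field names (`mode`, `categories`) and the skip-if-exists policy are assumptions.

```typescript
// Hypothetical sketch of idempotent seeding for the Doctor system schedules.

interface SeedSchedule {
  id: string;
  displayName: string;
  cron: string;
  jobKind: string;
  pluginConfig: Record<string, unknown>;
}

const DOCTOR_SYSTEM_SCHEDULES: readonly SeedSchedule[] = [
  { id: "doctor-full-daily", displayName: "Daily Health Check", cron: "0 4 * * *", jobKind: "doctor", pluginConfig: { mode: "Full" } },
  { id: "doctor-quick-hourly", displayName: "Hourly Quick Check", cron: "0 * * * *", jobKind: "doctor", pluginConfig: { mode: "Quick" } },
  { id: "doctor-compliance-weekly", displayName: "Weekly Compliance Audit", cron: "0 5 * * 0", jobKind: "doctor", pluginConfig: { categories: ["compliance"] } },
];

// Returns only the seeds whose ids are not already stored; existing rows are left untouched.
function schedulesToSeed(
  existingIds: ReadonlySet<string>,
  seeds: readonly SeedSchedule[] = DOCTOR_SYSTEM_SCHEDULES,
): SeedSchedule[] {
  return seeds.filter((s) => !existingIds.has(s.id));
}
```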
### TASK-009 - Integration tests for Doctor plugin lifecycle
Status: TODO
Dependency: TASK-005, TASK-006, TASK-007, TASK-008
Owners: Developer (Backend), Test Automation
Task description:
- Add integration tests in `src/JobEngine/StellaOps.Scheduler.__Tests/`:
  - Plugin discovery and registration test
  - Doctor schedule create/update with `pluginConfig` validation
  - Doctor plan creation from schedule
  - Doctor execution with mocked HTTP calls to Doctor WebService
  - Trend storage and query
  - Alert evaluation
- Use deterministic fixtures, and inject a controllable `TimeProvider` in place of `TimeProvider.System` so tests control time.

Completion criteria:
- [ ] All new tests pass
- [ ] No flaky tests (deterministic time, no network)
- [ ] Coverage includes happy path, validation errors, execution errors, cancellation

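Deterministic time means production code reads the clock only through an injected abstraction that tests can advance explicitly. This is the role `TimeProvider` plays in .NET. The TypeScript sketch below shows the idea; `Clock`, `ManualClock`, and `isRunOverdue` are hypothetical names.

```typescript
// Hypothetical sketch: code under test reads time only through an injected clock.

interface Clock {
  now(): Date;
}

class ManualClock implements Clock {
  private current: Date;

  constructor(start: Date) {
    this.current = new Date(start.getTime());
  }

  // Returns a copy so callers cannot mutate the clock's internal state.
  now(): Date {
    return new Date(this.current.getTime());
  }

  advance(ms: number): void {
    this.current = new Date(this.current.getTime() + ms);
  }
}

// Example consumer: a run is overdue once strictly more than timeoutMs has elapsed.
function isRunOverdue(clock: Clock, startedAt: Date, timeoutMs: number): boolean {
  return clock.now().getTime() - startedAt.getTime() > timeoutMs;
}
```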
### TASK-010 - Update Doctor UI trend API base URL
Status: TODO
Dependency: TASK-007
Owners: Developer (Frontend)
Task description:
- If the gateway routing alias from TASK-007 is in place, this task may be a no-op.
- If the API path changes, update the `getTrends()` method in `doctor.client.ts` to use the new endpoint path.
- Verify trend sparklines render correctly.

Completion criteria:
- [ ] Doctor dashboard trend sparklines display data
- [ ] No console errors related to trend API calls

### TASK-011 - Deprecate Doctor Scheduler standalone service
Status: TODO
Dependency: TASK-009 (all tests pass)
Owners: Developer (Backend), Project Manager
Task description:
- Add a deprecation notice to `src/Doctor/StellaOps.Doctor.Scheduler/README.md`.
- Remove Doctor Scheduler from `docker-compose.stella-ops.yml` (or disable it by default).
- Remove Doctor Scheduler from `devops/compose/services-matrix.env` if present.
- Keep source code intact for one release cycle before deletion.
- Update `docs/modules/doctor/` to reflect that scheduling is now handled by the Scheduler service.

Completion criteria:
- [ ] Doctor Scheduler container no longer starts in the default compose configuration
- [ ] All Doctor scheduling functionality verified via the Scheduler service
- [ ] Deprecation documented

### TASK-012 - Update architecture documentation
Status: TODO
Dependency: TASK-004, TASK-005
Owners: Documentation Author
Task description:
- Update `docs/modules/scheduler/architecture.md` with a plugin architecture section.
- Add an `ISchedulerJobPlugin` contract reference.
- Update `docs/modules/doctor/` to document scheduler integration.
- Update `docs/07_HIGH_LEVEL_ARCHITECTURE.md` if the Scheduler's role description needs updating.
- Create or update `src/JobEngine/StellaOps.Scheduler.plugins/AGENTS.md` with a plugin development guide.

Completion criteria:
- [ ] Architecture docs reflect the plugin system
- [ ] Doctor scheduling migration documented
- [ ] Plugin development guide exists for future plugin authors

## Execution Log

| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2026-04-08 | Sprint created with full architectural design after codebase analysis. 12 tasks defined across 3 batches. | Planning |

## Decisions & Risks

### Decisions

1. **Plugin interface vs. message-based dispatch**: Chose an in-process `ISchedulerJobPlugin` interface over a message queue dispatch model. Rationale: the Scheduler already has assembly-loading infrastructure (`PluginHost`), and in-process execution avoids adding another IPC layer. Plugins that need to call remote services (like Doctor) do so via `HttpClient`, which is the existing pattern.

2. **Schedule model extension vs. separate table**: Chose to extend the existing `Schedule` record with `JobKind` + `PluginConfig` rather than creating a separate `PluginSchedule` table. Rationale: keeps the CRUD API unified, avoids join complexity, and the `plugin_config` JSON column lets each plugin define its own config shape without per-plugin schema changes.

3. **Doctor WebService stays**: Doctor WebService remains a standalone service. The plugin only replaces the scheduling/triggering layer (Doctor Scheduler). This preserves the existing Doctor engine, plugin loading, check execution, and report storage. The plugin communicates with Doctor WebService via HTTP, same as today.

4. **Trend data in Scheduler schema**: Doctor trend data moves to the Scheduler's Postgres schema, because Doctor has no Postgres database of its own. Rationale: the Scheduler already has persistent storage, while Doctor Scheduler kept trends in memory only. This gives trends durability without adding a new database dependency to Doctor.

5. **ScanJobPlugin as refactoring, not rewrite**: The existing scan logic is wrapped in `ScanJobPlugin` by extracting and delegating, not by rewriting. This minimizes regression risk.

### Risks

1. **Schedule.PluginConfig schema evolution**: As plugin configs evolve, backward compatibility of the JSON blob must be maintained. Mitigation: plugins should version their config schema and migrate older config versions in `ValidateConfigAsync`.

2. **Doctor WebService availability during scheduled runs**: If Doctor WebService is down, the DoctorJobPlugin's execution will fail. Mitigation: implement retry with backoff in the plugin, and use the Run state machine to record the Error state with a meaningful message.

3. **Gateway routing for trend endpoints**: The UI currently hits Doctor Scheduler directly. After migration, requests must be routed to the Scheduler service. Mitigation: TASK-007 explicitly addresses gateway configuration, and TASK-010 handles UI fallback.

4. **Compiled model regeneration**: Adding columns to `Schedule` requires regenerating EF Core compiled models. This is mechanical but must not be forgotten.

5. **Plugin isolation**: In-process plugins share the Scheduler process. A misbehaving plugin (memory leak, thread starvation) affects all jobs. Mitigation: use `SemaphoreSlim` for concurrency limits (same pattern as the current Doctor Scheduler), and add plugin execution timeouts.

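For the retry mitigation in risk 2, capped exponential backoff is one common choice. The base delay (500 ms), cap (30 s), and injected `sleep` are illustrative assumptions rather than decided values. After the last failed attempt the error propagates, so the run can move to its Error state.

```typescript
// Hypothetical sketch of capped exponential backoff around a Doctor WebService call.

function backoffDelaysMs(maxAttempts: number, baseMs = 500, capMs = 30000): number[] {
  const delays: number[] = [];
  // One delay between each pair of attempts: base, 2x base, 4x base, ... capped at capMs.
  for (let retry = 0; retry < maxAttempts - 1; retry++) {
    delays.push(Math.min(capMs, baseMs * 2 ** retry));
  }
  return delays;
}

// Calls `attempt` until it succeeds or attempts run out; `sleep` is injected for testability.
function withRetry<T>(attempt: () => T, maxAttempts: number, sleep: (ms: number) => void): T {
  if (maxAttempts < 1) {
    throw new Error("maxAttempts must be at least 1");
  }
  const delays = backoffDelaysMs(maxAttempts);
  let lastError: unknown = undefined;
  for (let i = 0; i < maxAttempts; i++) {
    try {
      return attempt();
    } catch (error) {
      lastError = error;
      const delay = delays[i];
      if (delay !== undefined) {
        sleep(delay); // no sleep after the final failed attempt
      }
    }
  }
  throw lastError; // surfaces the last failure so the run can be marked as errored
}
```

Capping the delay keeps a long outage from pushing retries past the next scheduled run.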
## Next Checkpoints

- **Batch 1 complete** (TASK-001 through TASK-004): Plugin abstractions + registry + scan refactoring. Demo: existing scan schedules work through plugin dispatch. Estimated: 3-4 days.
- **Batch 2 complete** (TASK-005 through TASK-009): Doctor plugin + trend storage + tests. Demo: doctor health checks triggered by Scheduler, trends visible. Estimated: 4-5 days.
- **Batch 3 complete** (TASK-010 through TASK-012): UI fix-up, deprecation, docs. Demo: full end-to-end. Estimated: 2 days.
- **Total estimated effort**: 9-11 working days for one backend developer + 1 day frontend.

@@ -32,13 +32,13 @@ This page focuses on deterministic slot/port allocation and may include legacy o
| 14 | 10140 | 10141 | Policy Engine | `policy-engine.stella-ops.local` | `src/Policy/StellaOps.Policy.Engine` | `STELLAOPS_POLICY_ENGINE_URL` |
| 15 | 10150 | 10151 | ~~Policy Gateway~~ (merged into Policy Engine, Slot 14) | `policy-gateway.stella-ops.local` -> `policy-engine.stella-ops.local` | _removed_ | _removed_ |
| 16 | 10160 | 10161 | RiskEngine | `riskengine.stella-ops.local` | `src/Findings/StellaOps.RiskEngine.WebService` | `STELLAOPS_RISKENGINE_URL` |
| 17 | 10170 | 10171 | Orchestrator | `jobengine.stella-ops.local` | `src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.WebService` | `STELLAOPS_JOBENGINE_URL` |
| 17 | 10170 | 10171 | ~~Orchestrator~~ (retired; audit/first-signal moved to Release Orchestrator, Slot 48) | `jobengine.stella-ops.local` | _removed_ | _removed_ |
| 18 | 10180 | 10181 | TaskRunner | `taskrunner.stella-ops.local` | `src/JobEngine/StellaOps.TaskRunner/StellaOps.TaskRunner.WebService` | `STELLAOPS_TASKRUNNER_URL` |
| 19 | 10190 | 10191 | Scheduler | `scheduler.stella-ops.local` | `src/JobEngine/StellaOps.Scheduler.WebService` | `STELLAOPS_SCHEDULER_URL` |
| 20 | 10200 | 10201 | Graph API | `graph.stella-ops.local` | `src/Graph/StellaOps.Graph.Api` | `STELLAOPS_GRAPH_URL` |
| 21 | 10210 | 10211 | Cartographer | `cartographer.stella-ops.local` | `src/Scanner/StellaOps.Scanner.Cartographer` | `STELLAOPS_CARTOGRAPHER_URL` |
| 22 | 10220 | 10221 | ReachGraph | `reachgraph.stella-ops.local` | `src/ReachGraph/StellaOps.ReachGraph.WebService` | `STELLAOPS_REACHGRAPH_URL` |
| 23 | 10230 | 10231 | Timeline Indexer | `timelineindexer.stella-ops.local` | `src/Timeline/StellaOps.TimelineIndexer.WebService` | `STELLAOPS_TIMELINEINDEXER_URL` |
| 23 | 10230 | 10231 | _(Timeline Indexer merged into Timeline)_ | `timelineindexer.stella-ops.local` (alias) | _(see Timeline)_ | `STELLAOPS_TIMELINEINDEXER_URL` |
| 24 | 10240 | 10241 | Timeline | `timeline.stella-ops.local` | `src/Timeline/StellaOps.Timeline.WebService` | `STELLAOPS_TIMELINE_URL` |
| 25 | 10250 | 10251 | Findings Ledger | `findings.stella-ops.local` | `src/Findings/StellaOps.Findings.Ledger.WebService` | `STELLAOPS_FINDINGS_LEDGER_URL` |
| 26 | 10260 | 10261 | Doctor | `doctor.stella-ops.local` | `src/Doctor/StellaOps.Doctor.WebService` | `STELLAOPS_DOCTOR_URL` |
