consolidation of some of the modules, localization fixes, product advisories work, qa work
This commit is contained in:
@@ -0,0 +1,35 @@
|
||||
# DAG Planner with Critical-Path Metadata
|
||||
|
||||
## Module
|
||||
Orchestrator
|
||||
|
||||
## Status
|
||||
IMPLEMENTED
|
||||
|
||||
## Description
|
||||
DAG-based job planner that computes critical-path metadata for orchestrator execution plans, enabling dependency-aware scheduling and parallel execution of independent job chains.
|
||||
|
||||
## Implementation Details
|
||||
- **Modules**: `src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Scheduling/`, `src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.WebService/`
|
||||
- **Key Classes**:
|
||||
- `DagPlanner` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Scheduling/DagPlanner.cs`) - computes execution DAGs from job dependency graphs, identifies critical path, and enables parallel scheduling of independent chains
|
||||
- `DagEdge` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/DagEdge.cs`) - edge model representing dependencies between jobs in the execution DAG
|
||||
- `JobScheduler` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Scheduling/JobScheduler.cs`) - schedules jobs based on DAG planner output, respecting dependency ordering
|
||||
- `JobStateMachine` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Scheduling/JobStateMachine.cs`) - state machine governing job lifecycle transitions within the DAG execution
|
||||
- `Job` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/Job.cs`) - job entity with status, dependencies, and scheduling metadata
|
||||
- `JobStatus` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/JobStatus.cs`) - enum defining job lifecycle states
|
||||
- `JobHistory` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/JobHistory.cs`) - historical record of job state transitions
|
||||
- `DagEndpoints` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.WebService/Endpoints/DagEndpoints.cs`) - REST API for querying DAG execution plans
|
||||
- `DagContracts` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.WebService/Contracts/DagContracts.cs`) - API contracts for DAG responses
|
||||
- **Interfaces**: `IDagEdgeRepository` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Infrastructure/Repositories/IDagEdgeRepository.cs`)
|
||||
- **Source**: Feature matrix scan
|
||||
|
||||
## E2E Test Plan
|
||||
- [ ] Create a DAG with 5 jobs (A->B->C, A->D->E) and verify `DagPlanner` identifies A as the root and C/E as leaves
|
||||
- [ ] Verify critical path computation: the longest dependency chain (A->B->C or A->D->E) is marked as the critical path
|
||||
- [ ] Schedule the DAG via `JobScheduler` and verify B and D execute in parallel after A completes
|
||||
- [ ] Add a new dependency (D->C) creating a diamond DAG and verify the critical path updates
|
||||
- [ ] Query the DAG via `DagEndpoints` and verify the response includes all edges, critical path markers, and parallel groups
|
||||
- [ ] Create a cyclic DAG (A->B->A) and verify `DagPlanner` rejects it with a cycle detection error
|
||||
- [ ] Verify DAG metadata: each job node in the `DagContracts` response includes estimated duration and dependency count
|
||||
- [ ] Schedule a DAG with one failed job and verify `JobStateMachine` marks downstream dependencies as blocked
|
||||
35
docs/features/checked/jobengine/event-fan-out.md
Normal file
35
docs/features/checked/jobengine/event-fan-out.md
Normal file
@@ -0,0 +1,35 @@
|
||||
# Event Fan-Out (SSE/Streaming)
|
||||
|
||||
## Module
|
||||
Orchestrator
|
||||
|
||||
## Status
|
||||
IMPLEMENTED
|
||||
|
||||
## Description
|
||||
Job and pack-run streaming coordinators with stream payload models for real-time SSE event delivery.
|
||||
|
||||
## Implementation Details
|
||||
- **Modules**: `src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.WebService/Streaming/`, `src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/Events/`
|
||||
- **Key Classes**:
|
||||
- `JobStreamCoordinator` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.WebService/Streaming/JobStreamCoordinator.cs`) - coordinates SSE streaming for job lifecycle events to connected clients
|
||||
- `PackRunStreamCoordinator` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.WebService/Streaming/PackRunStreamCoordinator.cs`) - coordinates streaming for pack-run execution events
|
||||
- `RunStreamCoordinator` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.WebService/Streaming/RunStreamCoordinator.cs`) - coordinates streaming for individual run events
|
||||
- `SseWriter` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.WebService/Streaming/SseWriter.cs`) - writes Server-Sent Events to HTTP response streams
|
||||
- `StreamOptions` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.WebService/Streaming/StreamOptions.cs`) - configuration for stream connections (heartbeat interval, buffer size, timeout)
|
||||
- `StreamPayloads` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.WebService/Streaming/StreamPayloads.cs`) - typed payload models for stream events (job progress, pack-run status, log lines)
|
||||
- `StreamEndpoints` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.WebService/Endpoints/StreamEndpoints.cs`) - REST endpoints for SSE stream subscription
|
||||
- `EventEnvelope` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/Events/EventEnvelope.cs`) - typed event envelope wrapping domain events for streaming
|
||||
- `OrchestratorEventPublisher` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Infrastructure/Events/OrchestratorEventPublisher.cs`) - concrete event publisher routing events to stream coordinators
|
||||
- **Interfaces**: `IEventPublisher` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/Events/IEventPublisher.cs`)
|
||||
- **Source**: Feature matrix scan
|
||||
|
||||
## E2E Test Plan
|
||||
- [ ] Subscribe to the job stream via `StreamEndpoints` and trigger a job; verify SSE events are received for each state transition
|
||||
- [ ] Subscribe to the pack-run stream via `PackRunStreamCoordinator` and execute a pack; verify progress events include step index, status, and log lines
|
||||
- [ ] Verify heartbeat: subscribe to a stream and wait without events; confirm heartbeat events arrive at the `StreamOptions` configured interval
|
||||
- [ ] Subscribe with two clients to the same job stream and verify both receive identical events (fan-out via `JobStreamCoordinator`)
|
||||
- [ ] Disconnect a client mid-stream and verify the stream coordinator cleans up the connection without affecting other subscribers
|
||||
- [ ] Trigger a rapid sequence of events and verify `SseWriter` delivers them in order without drops
|
||||
- [ ] Verify stream payloads: each event contains a typed payload matching the `StreamPayloads` model
|
||||
- [ ] Test stream timeout: idle for longer than `StreamOptions.Timeout` and verify the connection closes gracefully
|
||||
33
docs/features/checked/jobengine/export-job-service.md
Normal file
33
docs/features/checked/jobengine/export-job-service.md
Normal file
@@ -0,0 +1,33 @@
|
||||
# Export Job Service
|
||||
|
||||
## Module
|
||||
Orchestrator
|
||||
|
||||
## Status
|
||||
IMPLEMENTED
|
||||
|
||||
## Description
|
||||
Export job management with service and domain model for orchestrated export operations.
|
||||
|
||||
## Implementation Details
|
||||
- **Modules**: `src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Services/`, `src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/Export/`
|
||||
- **Key Classes**:
|
||||
- `ExportJobService` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Services/ExportJobService.cs`) - manages export job lifecycle: creation, scheduling, execution tracking, and completion
|
||||
- `ExportJob` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/Export/ExportJob.cs`) - export job entity with status, target, format, and schedule
|
||||
- `ExportJobPolicy` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/Export/ExportJobPolicy.cs`) - policy controlling export permissions and constraints
|
||||
- `ExportJobTypes` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/Export/ExportJobTypes.cs`) - enumeration of supported export types (evidence pack, audit report, snapshot)
|
||||
- `ExportSchedule` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/Export/ExportSchedule.cs`) - scheduling configuration for recurring exports
|
||||
- `LedgerExporter` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Infrastructure/Ledger/LedgerExporter.cs`) - exports audit ledger data for compliance and audit
|
||||
- `ExportJobEndpoints` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.WebService/Endpoints/ExportJobEndpoints.cs`) - REST API for creating, querying, and managing export jobs
|
||||
- **Interfaces**: `ILedgerExporter` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Infrastructure/Ledger/ILedgerExporter.cs`)
|
||||
- **Source**: Feature matrix scan
|
||||
|
||||
## E2E Test Plan
|
||||
- [ ] Create an export job via `ExportJobEndpoints` with type=evidence_pack and verify it is persisted with status=Pending
|
||||
- [ ] Execute the export job via `ExportJobService` and verify status transitions: Pending -> Running -> Completed
|
||||
- [ ] Verify export policy enforcement: create an export job with a restricted type and verify `ExportJobPolicy` rejects it
|
||||
- [ ] Schedule a recurring export via `ExportSchedule` and verify the next execution is computed correctly
|
||||
- [ ] Export audit ledger data via `LedgerExporter` and verify the output contains all entries within the specified time range
|
||||
- [ ] Create an export job with retention policy and verify completed exports are cleaned up after expiry
|
||||
- [ ] Query export jobs via `ExportJobEndpoints` with status filter and verify pagination works correctly
|
||||
- [ ] Test export failure: simulate an export error and verify the job transitions to Failed with error details
|
||||
@@ -0,0 +1,37 @@
|
||||
# Job Lifecycle State Machine
|
||||
|
||||
## Module
|
||||
Orchestrator
|
||||
|
||||
## Status
|
||||
IMPLEMENTED
|
||||
|
||||
## Description
|
||||
Job scheduling with Postgres-backed job repository, event envelope domain model, and air-gap compatible scheduling tests.
|
||||
|
||||
## Implementation Details
|
||||
- **Modules**: `src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Scheduling/`, `src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/`
|
||||
- **Key Classes**:
|
||||
- `JobStateMachine` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Scheduling/JobStateMachine.cs`) - finite state machine governing job lifecycle transitions (Pending -> Scheduled -> Running -> Completed/Failed/Cancelled)
|
||||
- `JobScheduler` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Scheduling/JobScheduler.cs`) - schedules jobs based on state machine rules and DAG dependencies
|
||||
- `RetryPolicy` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Scheduling/RetryPolicy.cs`) - configurable retry policy for failed jobs (max retries, backoff strategy)
|
||||
- `Job` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/Job.cs`) - job entity with current status, attempts, and metadata
|
||||
- `JobStatus` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/JobStatus.cs`) - enum defining all valid job states
|
||||
- `JobHistory` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/JobHistory.cs`) - historical record of all state transitions with timestamps
|
||||
- `EventEnvelope` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/Events/EventEnvelope.cs`) - typed event envelope emitted on state transitions
|
||||
- `TimelineEvent` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/Events/TimelineEvent.cs`) - timeline event for job lifecycle tracking
|
||||
- `TimelineEventEmitter` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/Events/TimelineEventEmitter.cs`) - emits timeline events on state transitions
|
||||
- `JobEndpoints` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.WebService/Endpoints/JobEndpoints.cs`) - REST API for job management
|
||||
- `JobContracts` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.WebService/Contracts/JobContracts.cs`) - API contracts for job operations
|
||||
- **Interfaces**: `IJobRepository` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Infrastructure/Repositories/IJobRepository.cs`), `IJobHistoryRepository` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Infrastructure/Repositories/IJobHistoryRepository.cs`)
|
||||
- **Source**: Feature matrix scan
|
||||
|
||||
## E2E Test Plan
|
||||
- [ ] Create a job via `JobEndpoints` and verify initial state is Pending
|
||||
- [ ] Schedule the job via `JobScheduler` and verify state transition: Pending -> Scheduled, with `TimelineEvent` emitted
|
||||
- [ ] Start the job and verify `JobStateMachine` transition: Scheduled -> Running
|
||||
- [ ] Complete the job and verify transition: Running -> Completed with completion timestamp in `JobHistory`
|
||||
- [ ] Fail the job and verify transition: Running -> Failed with retry attempt incremented
|
||||
- [ ] Verify `RetryPolicy`: fail a job with max_retries=3 and verify it re-enters Scheduled up to 3 times before terminal failure
|
||||
- [ ] Attempt an invalid transition (e.g., Completed -> Running) and verify `JobStateMachine` rejects it
|
||||
- [ ] Verify air-gap scheduling: schedule a job in sealed mode and verify it does not attempt network egress
|
||||
@@ -0,0 +1,35 @@
|
||||
# Orchestrator Admin Quota Controls (orch:quota, orch:backfill)
|
||||
|
||||
## Module
|
||||
Orchestrator
|
||||
|
||||
## Status
|
||||
IMPLEMENTED
|
||||
|
||||
## Description
|
||||
New `orch:quota` and `orch:backfill` scopes with mandatory reason/ticket fields. Token requests must include `quota_reason`/`backfill_reason` and optionally `quota_ticket`/`backfill_ticket`. Authority persists these as claims and audit properties for traceability of capacity-affecting operations.
|
||||
|
||||
## Implementation Details
|
||||
- **Modules**: `src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/`, `src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Backfill/`, `src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.WebService/`
|
||||
- **Key Classes**:
|
||||
- `Quota` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/Quota.cs`) - quota entity with limits, current usage, and allocation metadata
|
||||
- `BackfillRequest` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/BackfillRequest.cs`) - backfill request model with reason, ticket, and scope
|
||||
- `BackfillManager` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Backfill/BackfillManager.cs`) - manages backfill operations with duplicate suppression and event time window tracking
|
||||
- `DuplicateSuppressor` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Backfill/DuplicateSuppressor.cs`) - prevents duplicate backfill requests within a time window
|
||||
- `EventTimeWindow` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Backfill/EventTimeWindow.cs`) - time window for backfill event deduplication
|
||||
- `QuotaEndpoints` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.WebService/Endpoints/QuotaEndpoints.cs`) - REST API for quota management (view, adjust, allocate)
|
||||
- `QuotaContracts` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.WebService/Contracts/QuotaContracts.cs`) - API contracts for quota operations
|
||||
- `AuditEntry` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/AuditEntry.cs`) - audit entry capturing quota/backfill actions with reason and ticket
|
||||
- `TenantResolver` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.WebService/Services/TenantResolver.cs`) - resolves tenant context for quota scoping
|
||||
- **Interfaces**: `IQuotaRepository` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Infrastructure/Repositories/IQuotaRepository.cs`), `IBackfillRepository` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Infrastructure/Repositories/IBackfillRepository.cs`)
|
||||
- **Source**: Feature matrix scan
|
||||
|
||||
## E2E Test Plan
|
||||
- [ ] Request a quota adjustment via `QuotaEndpoints` with `quota_reason` and `quota_ticket`; verify the adjustment is applied and audited in `AuditEntry`
|
||||
- [ ] Attempt a quota adjustment without `quota_reason` and verify it is rejected with a 400 error
|
||||
- [ ] Request a backfill via `BackfillManager` with `backfill_reason` and verify the backfill is initiated
|
||||
- [ ] Submit a duplicate backfill request within the `EventTimeWindow` and verify `DuplicateSuppressor` rejects it
|
||||
- [ ] Verify audit trail: check the `AuditEntry` for the quota adjustment and confirm reason and ticket are captured
|
||||
- [ ] Query current quota usage via `QuotaEndpoints` and verify limits and current usage are returned
|
||||
- [ ] Adjust quota beyond the maximum limit and verify the operation is rejected by policy
|
||||
- [ ] Verify tenant scoping via `TenantResolver`: adjust quota for tenant A and verify tenant B's quota is unchanged
|
||||
39
docs/features/checked/jobengine/jobengine-audit-ledger.md
Normal file
39
docs/features/checked/jobengine/jobengine-audit-ledger.md
Normal file
@@ -0,0 +1,39 @@
|
||||
# Orchestrator Audit Ledger
|
||||
|
||||
## Module
|
||||
Orchestrator
|
||||
|
||||
## Status
|
||||
IMPLEMENTED
|
||||
|
||||
## Description
|
||||
Append-only audit ledger tracking all orchestrator job lifecycle state changes, rate-limit decisions, and dead-letter events with tenant-scoped isolation.
|
||||
|
||||
## Implementation Details
|
||||
- **Modules**: `src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/`, `src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/DeadLetter/`, `src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Infrastructure/Ledger/`, `src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.WebService/`
|
||||
- **Key Classes**:
|
||||
- `AuditEntry` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/AuditEntry.cs`) - audit entry model with action type, actor, tenant, timestamp, and metadata
|
||||
- `RunLedger` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/RunLedger.cs`) - run-level ledger tracking execution history
|
||||
- `SignedManifest` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/SignedManifest.cs`) - signed manifest for tamper-evident ledger export
|
||||
- `LedgerExporter` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Infrastructure/Ledger/LedgerExporter.cs`) - exports ledger data for compliance and audit
|
||||
- `AuditEndpoints` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.WebService/Endpoints/AuditEndpoints.cs`) - REST API for querying audit ledger entries
|
||||
- `LedgerEndpoints` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.WebService/Endpoints/LedgerEndpoints.cs`) - REST API for ledger export and querying
|
||||
- `AuditLedgerContracts` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.WebService/Contracts/AuditLedgerContracts.cs`) - API contracts for audit responses
|
||||
- `DeadLetterEntry` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/DeadLetterEntry.cs`) - dead-letter entry in the audit trail
|
||||
- `DeadLetterNotifier` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/DeadLetter/DeadLetterNotifier.cs`) - notifies on dead-letter events
|
||||
- `ErrorClassification` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/DeadLetter/ErrorClassification.cs`) - classifies errors for dead-letter categorization
|
||||
- `ReplayManager` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/DeadLetter/ReplayManager.cs`) - manages replay of dead-letter entries
|
||||
- `DeadLetterEndpoints` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.WebService/Endpoints/DeadLetterEndpoints.cs`) - REST API for dead-letter management
|
||||
- `TenantResolver` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.WebService/Services/TenantResolver.cs`) - ensures tenant-scoped audit isolation
|
||||
- **Interfaces**: `ILedgerExporter` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Infrastructure/Ledger/ILedgerExporter.cs`), `IAuditRepository` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Infrastructure/Repositories/IAuditRepository.cs`), `IDeadLetterRepository` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/DeadLetter/IDeadLetterRepository.cs`), `ILedgerRepository` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Infrastructure/Repositories/ILedgerRepository.cs`)
|
||||
- **Source**: Feature matrix scan
|
||||
|
||||
## E2E Test Plan
|
||||
- [ ] Trigger a job state transition and verify an `AuditEntry` is created in the ledger with action type, actor, and timestamp
|
||||
- [ ] Query the audit ledger via `AuditEndpoints` with a time range filter and verify only matching entries are returned
|
||||
- [ ] Verify tenant isolation via `TenantResolver`: create audit entries for two tenants and verify each tenant only sees their own entries
|
||||
- [ ] Trigger a dead-letter event and verify it appears in both the `DeadLetterEntry` store and the audit ledger
|
||||
- [ ] Export the audit ledger via `LedgerExporter` and verify the export contains all entries within the specified range
|
||||
- [ ] Replay a dead-letter entry via `ReplayManager` and verify the replay action is also audited
|
||||
- [ ] Verify `ErrorClassification` categorizes different error types correctly (transient, permanent, unknown)
|
||||
- [ ] Query dead-letter entries via `DeadLetterEndpoints` and verify pagination and filtering work
|
||||
@@ -0,0 +1,40 @@
|
||||
# Orchestrator Event Envelopes with SSE/WebSocket Streaming
|
||||
|
||||
## Module
|
||||
Orchestrator
|
||||
|
||||
## Status
|
||||
IMPLEMENTED
|
||||
|
||||
## Description
|
||||
Typed event envelope system with SSE and WebSocket streaming for real-time orchestrator job progress, enabling live UI updates and CLI monitoring of pack-run execution.
|
||||
|
||||
## Implementation Details
|
||||
- **Modules**: `src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/Events/`, `src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Hashing/`, `src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.WebService/Streaming/`
|
||||
- **Key Classes**:
|
||||
- `EventEnvelope` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/Events/EventEnvelope.cs`) - typed event envelope with event type, payload, timestamp, and correlation ID
|
||||
- `EventEnvelope` (legacy) (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/EventEnvelope.cs`) - legacy event envelope model
|
||||
- `TimelineEvent` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/Events/TimelineEvent.cs`) - timeline event for job lifecycle tracking
|
||||
- `TimelineEventEmitter` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/Events/TimelineEventEmitter.cs`) - emits timeline events on domain actions
|
||||
- `OrchestratorEventPublisher` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Infrastructure/Events/OrchestratorEventPublisher.cs`) - concrete publisher routing events to stream coordinators
|
||||
- `EventEnvelopeHasher` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Hashing/EventEnvelopeHasher.cs`) - hashes event envelopes for integrity verification
|
||||
- `CanonicalJsonHasher` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Hashing/CanonicalJsonHasher.cs`) - canonical JSON hashing for deterministic event hashes
|
||||
- `SseWriter` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.WebService/Streaming/SseWriter.cs`) - Server-Sent Events writer
|
||||
- `JobStreamCoordinator` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.WebService/Streaming/JobStreamCoordinator.cs`) - job event stream coordinator
|
||||
- `PackRunStreamCoordinator` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.WebService/Streaming/PackRunStreamCoordinator.cs`) - pack-run stream coordinator
|
||||
- `RunStreamCoordinator` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.WebService/Streaming/RunStreamCoordinator.cs`) - run-level stream coordinator
|
||||
- `StreamEndpoints` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.WebService/Endpoints/StreamEndpoints.cs`) - REST endpoints for SSE subscriptions
|
||||
- `StreamOptions` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.WebService/Streaming/StreamOptions.cs`) - stream configuration
|
||||
- `StreamPayloads` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.WebService/Streaming/StreamPayloads.cs`) - typed event payloads
|
||||
- **Interfaces**: `IEventPublisher` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/Events/IEventPublisher.cs`)
|
||||
- **Source**: Feature matrix scan
|
||||
|
||||
## E2E Test Plan
|
||||
- [ ] Create an `EventEnvelope` with type=job_completed and payload; verify it is hashed via `EventEnvelopeHasher` and the hash is deterministic
|
||||
- [ ] Publish an event via `OrchestratorEventPublisher` and verify it reaches the `JobStreamCoordinator`
|
||||
- [ ] Subscribe to SSE via `StreamEndpoints` and verify events arrive as formatted SSE messages (data: + newline)
|
||||
- [ ] Verify canonical hashing: create two identical events and verify `CanonicalJsonHasher` produces identical hashes
|
||||
- [ ] Subscribe to pack-run stream via `PackRunStreamCoordinator` and execute a pack; verify real-time progress events include step index and status
|
||||
- [ ] Verify `StreamOptions`: configure heartbeat interval and verify heartbeats arrive at the configured cadence
|
||||
- [ ] Publish 100 events rapidly and verify `SseWriter` delivers all of them in order
|
||||
- [ ] Verify event envelope correlation: publish events with the same correlation ID and verify they can be filtered by correlation
|
||||
@@ -0,0 +1,44 @@
|
||||
# Orchestrator Golden Signals Observability
|
||||
|
||||
## Module
|
||||
Orchestrator
|
||||
|
||||
## Status
|
||||
VERIFIED
|
||||
|
||||
## Description
|
||||
Built-in golden signal metrics (latency, traffic, errors, saturation) for orchestrator job execution, with timeline event emission and job capsule provenance tracking.
|
||||
|
||||
## Implementation Details
|
||||
- **Modules**: `src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Infrastructure/Observability/`, `src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Evidence/`, `src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Scale/`
|
||||
- **Key Classes**:
|
||||
- `OrchestratorGoldenSignals` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Infrastructure/Observability/OrchestratorGoldenSignals.cs`) - golden signal metrics: latency (p50/p95/p99), traffic (requests/sec), errors (error rate), saturation (queue depth, CPU, memory)
|
||||
- `OrchestratorMetrics` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Infrastructure/Observability/OrchestratorMetrics.cs`) - OpenTelemetry metrics registration for orchestrator operations
|
||||
- `IncidentModeHooks` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Observability/IncidentModeHooks.cs`) - hooks triggered when golden signals breach thresholds, activating incident mode
|
||||
- `JobAttestationService` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Evidence/JobAttestationService.cs`) - generates attestations for job execution with provenance data
|
||||
- `JobAttestation` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Evidence/JobAttestation.cs`) - attestation model for a completed job
|
||||
- `JobCapsule` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Evidence/JobCapsule.cs`) - capsule containing job execution evidence (inputs, outputs, metrics)
|
||||
- `JobCapsuleGenerator` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Evidence/JobCapsuleGenerator.cs`) - generates job capsules from execution data
|
||||
- `JobRedactionGuard` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Evidence/JobRedactionGuard.cs`) - redacts sensitive data from job capsules before attestation
|
||||
- `SnapshotHook` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Evidence/SnapshotHook.cs`) - hook capturing execution state snapshots at key points
|
||||
- `ScaleMetrics` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Scale/ScaleMetrics.cs`) - metrics for auto-scaling decisions
|
||||
- `KpiEndpoints` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.WebService/Endpoints/KpiEndpoints.cs`) - REST endpoints for KPI/metrics queries
|
||||
- `HealthEndpoints` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.WebService/Endpoints/HealthEndpoints.cs`) - health check endpoints
|
||||
- **Interfaces**: None (uses concrete implementations)
|
||||
- **Source**: Feature matrix scan
|
||||
|
||||
## E2E Test Plan
|
||||
- [ ] Execute a job and verify `OrchestratorGoldenSignals` records latency, traffic, and error metrics
|
||||
- [ ] Verify golden signal latency: execute 10 jobs with varying durations and verify p50/p95/p99 percentiles are computed correctly
|
||||
- [ ] Trigger an error threshold breach and verify `IncidentModeHooks` activates incident mode
|
||||
- [ ] Generate a `JobCapsule` via `JobCapsuleGenerator` and verify it contains job inputs, outputs, and execution metrics
|
||||
- [ ] Verify redaction: include sensitive data in job inputs and verify `JobRedactionGuard` removes it from the capsule
|
||||
- [ ] Generate a `JobAttestation` via `JobAttestationService` and verify it contains the capsule hash and provenance data
|
||||
- [ ] Query KPI metrics via `KpiEndpoints` and verify golden signal data is returned
|
||||
- [ ] Verify `HealthEndpoints` report healthy when golden signals are within thresholds
|
||||
|
||||
## Verification
|
||||
- Verified on 2026-02-13 via `run-002`.
|
||||
- Tier 0: Source files confirmed present on disk.
|
||||
- Tier 1: `dotnet build` passed (0 errors); 1292/1292 tests passed.
|
||||
- Tier 2d: `docs/qa/feature-checks/runs/jobengine/orchestrator-golden-signals-observability/run-002/tier2-integration-check.json`
|
||||
@@ -0,0 +1,39 @@
|
||||
# Orchestrator Operator Scope with Audit Metadata
|
||||
|
||||
## Module
|
||||
Orchestrator
|
||||
|
||||
## Status
|
||||
VERIFIED
|
||||
|
||||
## Description
|
||||
New `orch:operate` scope and `Orch.Operator` role requiring explicit `operator_reason` and `operator_ticket` parameters on token requests. Authority enforces these fields and captures them as audit properties, giving SecOps traceability for every orchestrator control action.
|
||||
|
||||
## Implementation Details
|
||||
- **Modules**: `src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/`, `src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.WebService/`
|
||||
- **Key Classes**:
|
||||
- `AuditEntry` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/AuditEntry.cs`) - audit entry capturing operator actions with reason and ticket metadata
|
||||
- `TenantResolver` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.WebService/Services/TenantResolver.cs`) - resolves tenant and operator context from token claims
|
||||
- `AuditEndpoints` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.WebService/Endpoints/AuditEndpoints.cs`) - REST API for querying operator audit trail
|
||||
- `AuditLedgerContracts` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.WebService/Contracts/AuditLedgerContracts.cs`) - API contracts including operator metadata
|
||||
- `Quota` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/Quota.cs`) - quota model with operator attribution
|
||||
- `Job` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/Job.cs`) - job model with operator tracking
|
||||
- `DeprecationHeaders` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.WebService/Services/DeprecationHeaders.cs`) - deprecation header support for versioned operator APIs
|
||||
- **Interfaces**: `IAuditRepository` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Infrastructure/Repositories/IAuditRepository.cs`)
|
||||
- **Source**: Feature matrix scan
|
||||
|
||||
## E2E Test Plan
|
||||
- [ ] Request a token with `orch:operate` scope, `operator_reason="maintenance"`, and `operator_ticket="TICKET-123"`; verify the token is issued
|
||||
- [ ] Perform an operator action (e.g., cancel a job) with the scoped token; verify an `AuditEntry` captures the operator_reason and operator_ticket
|
||||
- [ ] Attempt an operator action without `operator_reason` and verify it is rejected with a 400 error
|
||||
- [ ] Query the audit trail via `AuditEndpoints` and filter by operator_ticket; verify matching entries are returned
|
||||
- [ ] Verify operator scope enforcement: use a token without `orch:operate` scope and verify operator actions are forbidden (403)
|
||||
- [ ] Perform multiple operator actions and verify each generates a separate `AuditEntry` with correct metadata
|
||||
- [ ] Verify tenant scoping via `TenantResolver`: operator actions for tenant A are not visible in tenant B's audit trail
|
||||
- [ ] Verify audit entry immutability: attempt to modify an existing `AuditEntry` and verify it is rejected
|
||||
|
||||
## Verification
|
||||
- Verified on 2026-02-13 via `run-002`.
|
||||
- Tier 0: Source files confirmed present on disk.
|
||||
- Tier 1: `dotnet build` passed (0 errors); 1292/1292 tests passed.
|
||||
- Tier 2d: `docs/qa/feature-checks/runs/jobengine/orchestrator-operator-scope-with-audit-metadata/run-002/tier2-integration-check.json`
|
||||
46
docs/features/checked/jobengine/jobengine-worker-sdks.md
Normal file
46
docs/features/checked/jobengine/jobengine-worker-sdks.md
Normal file
@@ -0,0 +1,46 @@
|
||||
# Orchestrator Worker SDKs (Go and Python)
|
||||
|
||||
## Module
|
||||
Orchestrator
|
||||
|
||||
## Status
|
||||
VERIFIED
|
||||
|
||||
## Description
|
||||
Multi-language Worker SDKs enabling external workers to participate in orchestrator job execution via Go and Python clients, with examples and structured API packages.
|
||||
|
||||
## Implementation Details
|
||||
- **Modules**: `src/JobEngine/StellaOps.JobEngine.WorkerSdk.Go/`, `src/JobEngine/StellaOps.JobEngine.WorkerSdk.Python/`, `src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.WebService/`
|
||||
- **Key Classes**:
|
||||
- `client.go` (`src/JobEngine/StellaOps.JobEngine.WorkerSdk.Go/pkg/workersdk/client.go`) - Go SDK client for worker communication
|
||||
- `config.go` (`src/JobEngine/StellaOps.JobEngine.WorkerSdk.Go/pkg/workersdk/config.go`) - Go SDK configuration
|
||||
- `artifact.go` (`src/JobEngine/StellaOps.JobEngine.WorkerSdk.Go/pkg/workersdk/artifact.go`) - artifact handling in Go SDK
|
||||
- `backfill.go` (`src/JobEngine/StellaOps.JobEngine.WorkerSdk.Go/pkg/workersdk/backfill.go`) - backfill support in Go SDK
|
||||
- `retry.go` (`src/JobEngine/StellaOps.JobEngine.WorkerSdk.Go/pkg/workersdk/retry.go`) - retry logic in Go SDK
|
||||
- `errors.go` (`src/JobEngine/StellaOps.JobEngine.WorkerSdk.Go/pkg/workersdk/errors.go`) - error types in Go SDK
|
||||
- `transport.go` (`src/JobEngine/StellaOps.JobEngine.WorkerSdk.Go/internal/transport/transport.go`) - HTTP transport layer for Go SDK
|
||||
- `main.go` (`src/JobEngine/StellaOps.JobEngine.WorkerSdk.Go/examples/smoke/main.go`) - smoke test example worker
|
||||
- `client.py` (`src/JobEngine/StellaOps.JobEngine.WorkerSdk.Python/stellaops_orchestrator_worker/client.py`) - Python SDK client
|
||||
- `config.py` (`src/JobEngine/StellaOps.JobEngine.WorkerSdk.Python/stellaops_orchestrator_worker/config.py`) - Python SDK configuration
|
||||
- `backfill.py` (`src/JobEngine/StellaOps.JobEngine.WorkerSdk.Python/stellaops_orchestrator_worker/backfill.py`) - Python backfill support
|
||||
- `WorkerEndpoints` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.WebService/Endpoints/WorkerEndpoints.cs`) - REST API for worker registration and job assignment
|
||||
- `WorkerContracts` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.WebService/Contracts/WorkerContracts.cs`) - API contracts for worker communication
|
||||
- `Worker` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Worker/Worker.cs`) - .NET worker implementation
|
||||
- **Interfaces**: None (SDK clients are standalone)
|
||||
- **Source**: Feature matrix scan
|
||||
|
||||
## E2E Test Plan
|
||||
- [ ] Register a Go worker via `WorkerEndpoints` and verify it receives a job assignment
|
||||
- [ ] Execute a job with the Go worker SDK `client.go` and verify results are reported back via the API
|
||||
- [ ] Register a Python worker via `client.py` and verify it receives a job assignment
|
||||
- [ ] Verify Go SDK retry: configure `retry.go` policy and simulate a transient failure; verify the SDK retries and succeeds
|
||||
- [ ] Verify artifact handling: upload an artifact via `artifact.go` and verify it is persisted
|
||||
- [ ] Verify backfill: trigger a backfill via `backfill.py` and verify it processes historical events
|
||||
- [ ] Verify Go SDK error types: trigger different error conditions and verify `errors.go` returns appropriate error types
|
||||
- [ ] Run the Go smoke test example `main.go` and verify it completes successfully against the orchestrator API
|
||||
|
||||
## Verification
|
||||
- Verified on 2026-02-13 via `run-002`.
|
||||
- Tier 0: Source files confirmed present on disk (Go SDK, Python SDK, .NET endpoints).
|
||||
- Tier 1: `dotnet build` passed (0 errors); 1292/1292 tests passed.
|
||||
- Tier 2d: `docs/qa/feature-checks/runs/jobengine/orchestrator-worker-sdks/run-002/tier2-integration-check.json`
|
||||
36
docs/features/checked/jobengine/network-intent-validator.md
Normal file
36
docs/features/checked/jobengine/network-intent-validator.md
Normal file
@@ -0,0 +1,36 @@
|
||||
# Network Intent Validator (Air-Gap Orchestrator Controls)
|
||||
|
||||
## Module
|
||||
Orchestrator
|
||||
|
||||
## Status
|
||||
IMPLEMENTED
|
||||
|
||||
## Description
|
||||
NetworkIntentValidator enforces air-gap network policies on orchestrator jobs, preventing egress in sealed mode. Includes MirrorJobTypes and MirrorOperationRecorder for offline mirror operations.
|
||||
|
||||
## Implementation Details
|
||||
- **Modules**: `src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/AirGap/`, `src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/AirGap/`, `src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/Mirror/`
|
||||
- **Key Classes**:
|
||||
- `NetworkIntentValidator` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/AirGap/NetworkIntentValidator.cs`) - validates job network intent against air-gap policy, blocking egress requests in sealed mode
|
||||
- `StalenessValidator` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/AirGap/StalenessValidator.cs`) - validates data freshness in air-gapped environments, ensuring cached data is within acceptable staleness bounds
|
||||
- `NetworkIntent` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/AirGap/NetworkIntent.cs`) - declares the network intent of a job (egress, ingress, local-only)
|
||||
- `SealingStatus` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/AirGap/SealingStatus.cs`) - enum for air-gap sealing state (Sealed, Unsealed, Transitioning)
|
||||
- `StalenessConfig` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/AirGap/StalenessConfig.cs`) - configuration for acceptable data staleness in air-gap mode
|
||||
- `StalenessValidationResult` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/AirGap/StalenessValidationResult.cs`) - result of staleness validation
|
||||
- `BundleProvenance` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/AirGap/BundleProvenance.cs`) - provenance tracking for air-gap bundles
|
||||
- `MirrorBundle` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/Mirror/MirrorBundle.cs`) - bundle model for offline mirror operations
|
||||
- `MirrorJobTypes` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/Mirror/MirrorJobTypes.cs`) - types of mirror jobs (sync, verify, prune)
|
||||
- `MirrorOperationRecorder` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/Mirror/MirrorOperationRecorder.cs`) - records mirror operations for audit trail
|
||||
- **Interfaces**: None (uses concrete implementations)
|
||||
- **Source**: Feature matrix scan
|
||||
|
||||
## E2E Test Plan
|
||||
- [ ] Set `SealingStatus` to Sealed and submit a job with egress intent; verify `NetworkIntentValidator` rejects it
|
||||
- [ ] Set `SealingStatus` to Unsealed and submit a job with egress intent; verify it is allowed
|
||||
- [ ] Validate staleness: set `StalenessConfig` max staleness to 24 hours and verify data older than 24 hours is rejected by `StalenessValidator`
|
||||
- [ ] Create a mirror job with type=sync and verify `MirrorOperationRecorder` records the operation
|
||||
- [ ] Verify bundle provenance: create a `MirrorBundle` and verify `BundleProvenance` captures origin, sync timestamp, and hash
|
||||
- [ ] Transition sealing status from Unsealed to Sealed and verify in-flight egress jobs are blocked
|
||||
- [ ] Submit a local-only `NetworkIntent` job in sealed mode and verify it is allowed
|
||||
- [ ] Verify staleness config: set different staleness thresholds per data type in `StalenessConfig` and verify per-type enforcement
|
||||
43
docs/features/checked/jobengine/pack-run-bridge.md
Normal file
43
docs/features/checked/jobengine/pack-run-bridge.md
Normal file
@@ -0,0 +1,43 @@
|
||||
# Pack-Run Bridge (TaskRunner Integration)
|
||||
|
||||
## Module
|
||||
Orchestrator
|
||||
|
||||
## Status
|
||||
VERIFIED
|
||||
|
||||
## Description
|
||||
Pack-run integration with Postgres repository, API endpoints, stream coordinator for log/artifact streaming, and domain model.
|
||||
|
||||
## Implementation Details
|
||||
- **Modules**: `src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/`, `src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.WebService/`
|
||||
- **Key Classes**:
|
||||
- `Pack` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/Pack.cs`) - pack entity containing a set of jobs to execute as a unit
|
||||
- `PackRun` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/PackRun.cs`) - pack-run entity tracking execution of a pack instance
|
||||
- `PackRunLog` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/PackRunLog.cs`) - log entries for pack-run execution
|
||||
- `PackRunStreamCoordinator` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.WebService/Streaming/PackRunStreamCoordinator.cs`) - coordinates real-time streaming of pack-run logs and artifacts
|
||||
- `PackRunEndpoints` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.WebService/Endpoints/PackRunEndpoints.cs`) - REST API for creating, querying, and managing pack runs
|
||||
- `PackRegistryEndpoints` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.WebService/Endpoints/PackRegistryEndpoints.cs`) - REST API for pack registration and versioning
|
||||
- `PackRunContracts` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.WebService/Contracts/PackRunContracts.cs`) - API contracts for pack-run operations
|
||||
- `PackRegistryContracts` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.WebService/Contracts/PackRegistryContracts.cs`) - API contracts for pack registry
|
||||
- `Run` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/Run.cs`) - individual run within a pack execution
|
||||
- `RunEndpoints` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.WebService/Endpoints/RunEndpoints.cs`) - REST API for run management
|
||||
- `RunContracts` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.WebService/Contracts/RunContracts.cs`) - API contracts for run operations
|
||||
- **Interfaces**: `IPackRunRepository` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Infrastructure/Repositories/IPackRunRepository.cs`), `IPackRegistryRepository` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Infrastructure/Repositories/IPackRegistryRepository.cs`), `IRunRepository` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Infrastructure/Repositories/IRunRepository.cs`)
|
||||
- **Source**: Feature matrix scan
|
||||
|
||||
## E2E Test Plan
|
||||
- [ ] Register a pack via `PackRegistryEndpoints` with 3 jobs and verify it is persisted with version 1
|
||||
- [ ] Create a pack run via `PackRunEndpoints` and verify it starts executing the pack's jobs
|
||||
- [ ] Subscribe to the pack-run stream via `PackRunStreamCoordinator` and verify real-time log entries arrive as jobs execute
|
||||
- [ ] Verify pack-run completion: all 3 jobs complete and the `PackRun` transitions to Completed
|
||||
- [ ] Verify pack versioning: update a pack and verify `PackRegistryEndpoints` creates version 2 while preserving version 1
|
||||
- [ ] Query `PackRunLog` entries via the API and verify all log entries are returned in chronological order
|
||||
- [ ] Fail one job in a pack run and verify the pack run reports partial failure
|
||||
- [ ] Create multiple pack runs concurrently and verify they execute independently
|
||||
|
||||
## Verification
|
||||
- Verified on 2026-02-13 via `run-002`.
|
||||
- Tier 0: Source files confirmed present on disk.
|
||||
- Tier 1: `dotnet build` passed (0 errors); 1292/1292 tests passed.
|
||||
- Tier 2d: `docs/qa/feature-checks/runs/jobengine/pack-run-bridge/run-002/tier2-integration-check.json`
|
||||
@@ -0,0 +1,43 @@
|
||||
# Quota Governance and Circuit Breakers
|
||||
|
||||
## Module
|
||||
Orchestrator
|
||||
|
||||
## Status
|
||||
VERIFIED
|
||||
|
||||
## Description
|
||||
Quota governance services with cross-tenant allocation policies and circuit breaker automation for downstream service failure protection, integrated with rate limiting and load shedding.
|
||||
|
||||
## Implementation Details
|
||||
- **Modules**: `src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/`, `src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/RateLimiting/`, `src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Scale/`, `src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Infrastructure/`
|
||||
- **Key Classes**:
|
||||
- `QuotaGovernanceService` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Infrastructure/Services/QuotaGovernanceService.cs`) - cross-tenant quota allocation with 5 strategies (unlimited, proportional, priority, reserved, max-limit)
|
||||
- `CircuitBreakerService` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Infrastructure/Services/CircuitBreakerService.cs`) - circuit breaker with Closed/Open/HalfOpen state transitions
|
||||
- `Quota` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/Quota.cs`) - quota entity with limits and allocation
|
||||
- `QuotaEndpoints` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.WebService/Endpoints/QuotaEndpoints.cs`) - REST API for quota queries and adjustments
|
||||
- `QuotaContracts` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.WebService/Contracts/QuotaContracts.cs`) - API contracts for quota operations
|
||||
- `Throttle` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/Throttle.cs`) - throttle configuration for rate limiting
|
||||
- `AdaptiveRateLimiter` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/RateLimiting/AdaptiveRateLimiter.cs`) - adaptive rate limiting based on system load
|
||||
- `ConcurrencyLimiter` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/RateLimiting/ConcurrencyLimiter.cs`) - limits concurrent job execution
|
||||
- `BackpressureHandler` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/RateLimiting/BackpressureHandler.cs`) - backpressure signaling
|
||||
- `LoadShedder` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Scale/LoadShedder.cs`) - load shedding under saturation
|
||||
- `PostgresQuotaRepository` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Infrastructure/Postgres/PostgresQuotaRepository.cs`) - Postgres-backed quota storage
|
||||
- `PostgresThrottleRepository` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Infrastructure/Postgres/PostgresThrottleRepository.cs`) - Postgres-backed throttle storage
|
||||
- **Source**: Feature matrix scan
|
||||
|
||||
## E2E Test Plan
|
||||
- [ ] Configure a quota policy with proportional allocation and verify QuotaGovernanceService distributes capacity across tenants
|
||||
- [ ] Request quota above max limit and verify the request is capped
|
||||
- [ ] Pause a tenant and verify quota requests are denied
|
||||
- [ ] Trigger circuit breaker by exceeding failure threshold and verify downstream requests are blocked
|
||||
- [ ] Verify circuit breaker recovery: wait for timeout, verify HalfOpen state, send success to close
|
||||
- [ ] Force-open and force-close the circuit breaker and verify state changes
|
||||
- [ ] Test concurrent access to circuit breaker and verify thread safety
|
||||
- [ ] Verify all 5 allocation strategies produce correct quota distributions
|
||||
|
||||
## Verification
|
||||
- Verified on 2026-02-13 via `run-002`.
|
||||
- Tier 0: Source files confirmed present on disk.
|
||||
- Tier 1: `dotnet build` passed (0 errors); 1292/1292 tests passed.
|
||||
- Tier 2d: `docs/qa/feature-checks/runs/jobengine/quota-governance-and-circuit-breakers/run-002/tier2-integration-check.json`
|
||||
42
docs/features/checked/jobengine/skip-locked-queue-pattern.md
Normal file
42
docs/features/checked/jobengine/skip-locked-queue-pattern.md
Normal file
@@ -0,0 +1,42 @@
|
||||
# SKIP LOCKED Queue Pattern
|
||||
|
||||
## Module
|
||||
Orchestrator
|
||||
|
||||
## Status
|
||||
VERIFIED
|
||||
|
||||
## Description
|
||||
SKIP LOCKED queue pattern is used in Scheduler and Orchestrator job repositories for reliable work distribution.
|
||||
|
||||
## Implementation Details
|
||||
- **Modules**: `src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Scheduling/`, `src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/RateLimiting/`, `src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Scale/`
|
||||
- **Key Classes**:
|
||||
- `JobScheduler` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Scheduling/JobScheduler.cs`) - job scheduler using PostgreSQL `SELECT ... FOR UPDATE SKIP LOCKED` for concurrent job dequeuing without contention
|
||||
- `Job` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/Job.cs`) - job entity with status field used for queue filtering
|
||||
- `JobStatus` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/JobStatus.cs`) - job states used in queue queries (Pending jobs are available for dequeuing)
|
||||
- `Watermark` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/Watermark.cs`) - watermark tracking for ordered processing
|
||||
- `AdaptiveRateLimiter` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/RateLimiting/AdaptiveRateLimiter.cs`) - rate limiter that adjusts based on queue depth and processing speed
|
||||
- `ConcurrencyLimiter` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/RateLimiting/ConcurrencyLimiter.cs`) - limits concurrent job processing
|
||||
- `TokenBucket` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/RateLimiting/TokenBucket.cs`) - token bucket rate limiter for smooth job distribution
|
||||
- `BackpressureHandler` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/RateLimiting/BackpressureHandler.cs`) - applies backpressure when queue depth exceeds thresholds
|
||||
- `LoadShedder` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Scale/LoadShedder.cs`) - sheds load when system is saturated
|
||||
- `ScaleMetrics` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Scale/ScaleMetrics.cs`) - metrics for monitoring queue depth and throughput
|
||||
- **Interfaces**: `IJobRepository` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Infrastructure/Repositories/IJobRepository.cs`), `IWatermarkRepository` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Infrastructure/Repositories/IWatermarkRepository.cs`)
|
||||
- **Source**: Feature matrix scan
|
||||
|
||||
## E2E Test Plan
|
||||
- [ ] Enqueue 10 jobs and dequeue from 3 concurrent workers using SKIP LOCKED via `JobScheduler`; verify each job is assigned to exactly one worker
|
||||
- [ ] Verify no contention: dequeue rapidly from 5 workers and verify no blocking or deadlocks occur
|
||||
- [ ] Verify job visibility: a job locked by worker A is not visible to worker B during dequeue
|
||||
- [ ] Complete a locked job and verify it is no longer in the queue
|
||||
- [ ] Verify `AdaptiveRateLimiter`: increase queue depth and verify the rate limiter increases throughput
|
||||
- [ ] Verify `BackpressureHandler`: fill the queue beyond the threshold and verify backpressure is signaled to producers
|
||||
- [ ] Verify `LoadShedder`: saturate the system and verify new jobs are rejected with a 503 response
|
||||
- [ ] Test `TokenBucket`: configure a rate of 10 jobs/second and verify the bucket enforces the limit
|
||||
|
||||
## Verification
|
||||
- Verified on 2026-02-13 via `run-002`.
|
||||
- Tier 0: Source files confirmed present on disk.
|
||||
- Tier 1: `dotnet build` passed (0 errors); 1292/1292 tests passed.
|
||||
- Tier 2d: `docs/qa/feature-checks/runs/jobengine/skip-locked-queue-pattern/run-002/tier2-integration-check.json`
|
||||
@@ -0,0 +1,38 @@
|
||||
# SLO Burn-Rate Computation and Alert Budget Tracking
|
||||
|
||||
## Module
|
||||
Orchestrator
|
||||
|
||||
## Status
|
||||
VERIFIED
|
||||
|
||||
## Description
|
||||
SLO burn-rate computation for orchestrator operations with configurable alert budgets, enabling proactive capacity and reliability management.
|
||||
|
||||
## Implementation Details
|
||||
- **Modules**: `src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/SloManagement/`, `src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/`
|
||||
- **Key Classes**:
|
||||
- `BurnRateEngine` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/SloManagement/BurnRateEngine.cs`) - computes SLO burn rate from error budget consumption over rolling windows (1h, 6h, 24h, 30d)
|
||||
- `Slo` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/Slo.cs`) - SLO entity with target (e.g., 99.9%), error budget, and current burn rate
|
||||
- `SloEndpoints` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.WebService/Endpoints/SloEndpoints.cs`) - REST API for SLO queries and burn rate dashboards
|
||||
- `IncidentModeHooks` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Observability/IncidentModeHooks.cs`) - activates incident mode when burn rate exceeds alert thresholds
|
||||
- `OrchestratorGoldenSignals` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Infrastructure/Observability/OrchestratorGoldenSignals.cs`) - provides underlying error/latency data for SLO computation
|
||||
- `ScaleMetrics` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Scale/ScaleMetrics.cs`) - metrics feeding SLO saturation signals
|
||||
- **Interfaces**: None (uses concrete implementations)
|
||||
- **Source**: Feature matrix scan
|
||||
|
||||
## E2E Test Plan
|
||||
- [ ] Define an `Slo` with target=99.9% and error budget=43.2 minutes/month; verify the SLO is persisted
|
||||
- [ ] Generate 10 successful and 1 failed request and verify `BurnRateEngine` computation reflects the error
|
||||
- [ ] Verify rolling window: compute burn rates for 1h, 6h, and 24h windows via `BurnRateEngine` and verify each reflects the appropriate time range
|
||||
- [ ] Exceed the alert threshold (e.g., 2x burn rate) and verify `IncidentModeHooks` triggers incident mode
|
||||
- [ ] Query SLO via `SloEndpoints` and verify the response includes current burn rate, remaining budget, and alert status
|
||||
- [ ] Verify budget depletion: consume the entire error budget and verify the `Slo` shows 0% remaining
|
||||
- [ ] Reset the SLO period (monthly rollover) and verify the error budget resets to full
|
||||
- [ ] Verify multi-SLO: define SLOs for latency and availability, verify `BurnRateEngine` computes each independently
|
||||
|
||||
## Verification
|
||||
- Verified on 2026-02-13 via `run-002`.
|
||||
- Tier 0: Source files confirmed present on disk.
|
||||
- Tier 1: `dotnet build` passed (0 errors); 1292/1292 tests passed.
|
||||
- Tier 2d: `docs/qa/feature-checks/runs/jobengine/slo-burn-rate-computation-and-alert-budget-tracking/run-002/tier2-integration-check.json`
|
||||
Reference in New Issue
Block a user