Consolidate modules, fix localization issues, product advisories work, QA work
29	src/JobEngine/AGENTS.PacksRegistry.md	Normal file
@@ -0,0 +1,29 @@
# AGENTS - PacksRegistry Module

## Working Directory
- `src/PacksRegistry/**` (core, persistence, WebService, Worker, tests).

## Required Reading
- `docs/README.md`
- `docs/07_HIGH_LEVEL_ARCHITECTURE.md`
- `docs/modules/platform/architecture-overview.md`
- `docs/modules/packs-registry/architecture.md`
- `docs/modules/packs-registry/README.md`
- `docs/modules/packs-registry/guides/spec.md`

## Engineering Rules
- Deterministic pack ingestion and indexing.
- Validate signatures and fail closed on invalid packs.
- Offline-first; no network calls in tests.

## Testing & Verification
- Tests live in `src/PacksRegistry/__Tests/**`.
- Cover pack validation, registry API, and persistence.

## Sprint Discipline
- Record pack contract changes in sprint Decisions & Risks.

## Service Endpoints
- Development: https://localhost:10340, http://localhost:10341
- Local alias: https://packsregistry.stella-ops.local, http://packsregistry.stella-ops.local
- Env var: STELLAOPS_PACKSREGISTRY_URL
46	src/JobEngine/AGENTS.Scheduler.md	Normal file
@@ -0,0 +1,46 @@
# AGENTS — Scheduler Working Directory

## Roles
- **Scheduler Worker/WebService Engineer**: .NET 10 (preview) across workers, web service, and shared libraries; keep jobs/metrics deterministic and tenant-safe.
- **QA / Reliability**: Adds/maintains unit + integration tests in `__Tests`, covers determinism, job orchestration, and metrics; validates PostgreSQL/Redis/NATS contracts without live cloud deps.
- **Docs/Runbook Touches**: Update `docs/modules/scheduler/**` and `operations/` assets when contracts or operational characteristics change.

## Required Reading
- `docs/modules/scheduler/README.md`
- `docs/modules/scheduler/architecture.md`
- `docs-archived/implplan/SPRINT_0155_0001_0001_scheduler_i.md`
- `docs/modules/platform/architecture-overview.md`
- Current sprint file(s) for this module (e.g., `docs-archived/implplan/SPRINT_0155_0001_0001_scheduler_i.md`, `SPRINT_0156_0001_0002_scheduler_ii.md`).

## Working Directory & Boundaries
- Primary scope: `src/Scheduler/**` including WebService, Worker.Host, `__Libraries`, `__Tests`, plugins, and solution files.
- Cross-module edits require an explicit note in sprint **Delivery Tracker** and **Decisions & Risks**.
- Fixtures belong under `src/Scheduler/__Tests/Fixtures` and must be deterministic.

## Engineering Rules
- Target `net10.0`; prefer the latest C# preview permitted in the repo.
- Offline-first: no new external calls; use the `.nuget/packages/` cache and configurable endpoints.
- Determinism: stable ordering, UTC ISO-8601 timestamps, seeded randomness; avoid host-specific paths in outputs/events.
- Observability: use structured logging; keep metric/label names consistent with published dashboards (`policy_simulation_*`, `graph_*`, `overlay_*`).
- Security: tenant isolation on all queues/stores; avoid leaking PII/secrets in logs or metrics.

## Testing & Verification
- Default: `dotnet test src/Scheduler/StellaOps.Scheduler.sln` (note: the GraphJobs `IGraphJobStore.UpdateAsync` accessibility issue is a known blocker; document it if encountered).
- Add/extend tests in `src/Scheduler/__Tests/**`; prefer minimal deterministic fixtures and stable sort order.
- When adding metrics, include unit tests validating label sets and defaults; update `operations/worker-prometheus-rules.yaml` if alert semantics change.

## Workflow Expectations
- Mirror task state changes in sprint files and, where applicable, module TASKS boards.
- If blocked by contracts or upstream issues, set the task to `BLOCKED` in the sprint tracker and note the required decision/fix.
- Document runbook/operational changes alongside code changes.

## Allowed Shared Libraries
- May reference shared helpers under `src/Scheduler/__Libraries/**` and existing plugins; new shared libs require a sprint note.

## Air-gap & Offline
- Support air-gapped operation: no hardcoded internet endpoints; provide config flags and mirrored feeds when needed.

## Service Endpoints
- Development: https://localhost:10190, http://localhost:10191
- Local alias: https://scheduler.stella-ops.local, http://scheduler.stella-ops.local
- Env var: STELLAOPS_SCHEDULER_URL
30	src/JobEngine/AGENTS.TaskRunner.md	Normal file
@@ -0,0 +1,30 @@
# TaskRunner Module Charter

## Mission
- Orchestrate deterministic task-pack execution, evidence, and replayable run logs.

## Responsibilities
- Define pack run lifecycle, persistence, and evidence outputs.
- Ensure canonical plan hashing and deterministic event emission.
- Maintain offline-first execution and bounded resource usage.

## Required Reading
- `docs/README.md`
- `docs/07_HIGH_LEVEL_ARCHITECTURE.md`
- `docs/modules/platform/architecture-overview.md`
- `docs/modules/taskrunner/architecture.md`

## Working Agreement
- Use `TimeProvider` and `IGuidGenerator` for all timestamps and IDs.
- Use RFC 8785 canonical JSON for hashes and signatures.
- Propagate `CancellationToken` and avoid network access by default.

## Testing Strategy
- Unit tests for plan hashing, persistence, and evidence outputs.
- Determinism tests for run logs and identifiers.
- Integration tests for API and worker loops.

## Service Endpoints
- Development: https://localhost:10180, http://localhost:10181
- Local alias: https://taskrunner.stella-ops.local, http://taskrunner.stella-ops.local
- Env var: STELLAOPS_TASKRUNNER_URL
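The canonical-hashing agreement can be illustrated with a small Go sketch. Note the caveat: Go's `encoding/json` sorts map keys when marshalling maps, which gives key-order independence for flat string maps, but full RFC 8785 canonicalization (number formatting and string-escaping rules) requires a dedicated canonicalizer. `PlanHash` is an illustrative name, not the TaskRunner API.

```go
package main

import (
    "crypto/sha256"
    "encoding/hex"
    "encoding/json"
    "fmt"
)

// PlanHash returns a stable SHA-256 over a plan's fields. Marshalling a map
// sorts its keys, so insertion order does not affect the digest; this only
// approximates RFC 8785 for simple flat maps.
func PlanHash(fields map[string]string) (string, error) {
    b, err := json.Marshal(fields)
    if err != nil {
        return "", err
    }
    sum := sha256.Sum256(b)
    return hex.EncodeToString(sum[:]), nil
}

func main() {
    a, _ := PlanHash(map[string]string{"step": "build", "image": "x"})
    b, _ := PlanHash(map[string]string{"image": "x", "step": "build"})
    fmt.Println(a == b) // insertion order does not affect the hash
}
```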
28	src/JobEngine/AGENTS.md	Normal file
@@ -0,0 +1,28 @@
# AGENTS - JobEngine Module

## Working Directory
- `src/JobEngine/**` (core, infrastructure, WebService, Worker, tests).

## Required Reading
- `docs/README.md`
- `docs/07_HIGH_LEVEL_ARCHITECTURE.md`
- `docs/modules/platform/architecture-overview.md`
- `docs/modules/jobengine/architecture.md`
- `docs/modules/jobengine/README.md`

## Engineering Rules
- Deterministic scheduling and execution; stable ordering for job runs.
- Enforce tenant isolation and authz on all APIs.
- Offline-first; no network calls in tests.

## Testing & Verification
- Tests live in `src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Tests` and `src/JobEngine/StellaOps.JobEngine.WorkerSdk.Python/stellaops_jobengine_worker/tests`.
- Cover scheduling, retries, and idempotent execution.

## Sprint Discipline
- Track task status in the sprint tracker and local TASKS boards.

## Service Endpoints
- Development: https://localhost:10170, http://localhost:10171
- Local alias: https://jobengine.stella-ops.local, http://jobengine.stella-ops.local
- Env var: STELLAOPS_ORCHESTRATOR_URL
21	src/JobEngine/StellaOps.JobEngine.WorkerSdk.Go/AGENTS.md	Normal file
@@ -0,0 +1,21 @@
# Worker SDK (Go) — Agent Charter

## Mission
Provide the official Go SDK for StellaOps orchestrated workers. Implement claim/heartbeat/progress clients, artifact publishing, error classification, and guardrails so Concelier, Excititor, SBOM, Policy, and other teams can integrate with the jobengine deterministically.

## Responsibilities
- Maintain an idiomatic Go client with configurable transports, retries, and tenant-aware headers.
- Surface structured metrics/logging hooks mirroring jobengine expectations.
- Enforce idempotency token usage, artifact checksum publication, and backfill/watermark handshakes.
- Coordinate release cadence with the Worker Python SDK, the jobengine service, DevOps packaging, and Offline Kit requirements.

## Required Reading
- `docs/modules/jobengine/architecture.md`
- `docs/modules/platform/architecture-overview.md`

## Working Agreement
1. Update task status to `DOING`/`DONE` in both the corresponding sprint file (`/docs/implplan/SPRINT_*.md`) and the local `TASKS.md` when you start or finish work.
2. Review this charter and the Required Reading documents before coding; confirm prerequisites are met.
3. Keep changes deterministic (stable ordering, timestamps, hashes) and align with offline/air-gap expectations.
4. Coordinate doc updates, tests, and cross-guild communication whenever contracts or workflows change.
5. Revert to `TODO` if you pause the task without shipping changes; leave notes in commit/PR descriptions for context.
@@ -0,0 +1,45 @@
package main

import (
    "context"
    "log"
    "time"

    "git.stella-ops.org/stellaops/jobengine/worker-sdk-go/pkg/workersdk"
)

func main() {
    client, err := workersdk.New(workersdk.Config{
        BaseURL:   "http://localhost:8080",
        APIKey:    "dev-token",
        TenantID:  "local-tenant",
        ProjectID: "demo-project",
    })
    if err != nil {
        log.Fatalf("configure client: %v", err)
    }

    ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
    defer cancel()

    claim, err := client.Claim(ctx, workersdk.ClaimJobRequest{WorkerID: "demo-worker", Capabilities: []string{"pack-run"}})
    if err != nil {
        log.Fatalf("claim job: %v", err)
    }
    if claim == nil {
        log.Println("no work available")
        return
    }

    // ... perform work using claim.Payload ...

    // heartbeat and progress
    _ = client.Heartbeat(ctx, claim.JobID, claim.LeaseID)
    _ = client.Progress(ctx, claim.JobID, claim.LeaseID, 50, "halfway")

    if err := client.Ack(ctx, workersdk.AckJobRequest{JobID: claim.JobID, LeaseID: claim.LeaseID, Status: "succeeded"}); err != nil {
        log.Fatalf("ack job: %v", err)
    }

    log.Printf("acknowledged job %s", claim.JobID)
}
3	src/JobEngine/StellaOps.JobEngine.WorkerSdk.Go/go.mod	Normal file
@@ -0,0 +1,3 @@
module git.stella-ops.org/stellaops/jobengine/worker-sdk-go

go 1.21
@@ -0,0 +1,31 @@
package transport

import (
    "context"
    "net/http"
)

// RoundTripper abstracts HTTP transport so we can stub in tests without
// depending on the default client.
type RoundTripper interface {
    RoundTrip(*http.Request) (*http.Response, error)
}

// Client wraps an http.Client-like implementation.
type Client interface {
    Do(req *http.Request) (*http.Response, error)
}

// DefaultClient returns a minimal http.Client with sane defaults.
func DefaultClient(rt RoundTripper) *http.Client {
    if rt == nil {
        return &http.Client{}
    }
    return &http.Client{Transport: rt}
}

// Do wraps an HTTP call using the provided Client.
func Do(ctx context.Context, c Client, req *http.Request) (*http.Response, error) {
    req = req.WithContext(ctx)
    return c.Do(req)
}
@@ -0,0 +1,66 @@
package workersdk

import (
    "context"
    "crypto/sha256"
    "encoding/hex"
    "fmt"
    "io"
)

// StorageClient is a minimal interface for artifact storage backends.
type StorageClient interface {
    PutObject(ctx context.Context, key string, body io.Reader, metadata map[string]string) error
}

// ArtifactPublishRequest describes an artifact to upload.
type ArtifactPublishRequest struct {
    JobID          string
    LeaseID        string
    ObjectKey      string
    Content        io.Reader
    ContentLength  int64
    ContentType    string
    ArtifactType   string
    IdempotencyKey string
    Storage        StorageClient
}

// ArtifactPublishResponse returns checksum metadata.
type ArtifactPublishResponse struct {
    SHA256 string
    Size   int64
}

// PublishArtifact uploads artifact content with checksum metadata and an idempotency guard.
func (c *Client) PublishArtifact(ctx context.Context, req ArtifactPublishRequest) (*ArtifactPublishResponse, error) {
    if req.JobID == "" || req.LeaseID == "" {
        return nil, fmt.Errorf("JobID and LeaseID are required")
    }
    if req.ObjectKey == "" {
        return nil, fmt.Errorf("ObjectKey is required")
    }
    if req.Storage == nil {
        return nil, fmt.Errorf("Storage client is required")
    }

    // Compute the SHA-256 while streaming the content to storage.
    hasher := sha256.New()
    tee := io.TeeReader(req.Content, hasher)
    // ContentLength is optional here; the storage client may rely on metadata instead.
    metadata := map[string]string{
        "x-stellaops-job-id": req.JobID,
        "x-stellaops-lease":  req.LeaseID,
        "x-stellaops-type":   req.ArtifactType,
        "x-stellaops-ct":     req.ContentType,
    }
    if req.IdempotencyKey != "" {
        metadata["x-idempotency-key"] = req.IdempotencyKey
    }

    if err := req.Storage.PutObject(ctx, req.ObjectKey, tee, metadata); err != nil {
        return nil, err
    }
    sum := hex.EncodeToString(hasher.Sum(nil))
    return &ArtifactPublishResponse{SHA256: sum, Size: req.ContentLength}, nil
}
@@ -0,0 +1,61 @@
package workersdk

import (
    "bytes"
    "context"
    "io"
    "testing"
)

type memStorage struct {
    key      string
    data     []byte
    metadata map[string]string
}

func (m *memStorage) PutObject(ctx context.Context, key string, body io.Reader, metadata map[string]string) error {
    b, err := io.ReadAll(body)
    if err != nil {
        return err
    }
    m.key = key
    m.data = b
    m.metadata = metadata
    return nil
}

func TestPublishArtifact(t *testing.T) {
    store := &memStorage{}
    client, err := New(Config{BaseURL: "https://example"})
    if err != nil {
        t.Fatalf("new client: %v", err)
    }

    content := []byte("hello")
    resp, err := client.PublishArtifact(context.Background(), ArtifactPublishRequest{
        JobID:          "job1",
        LeaseID:        "lease1",
        ObjectKey:      "artifacts/job1/output.txt",
        Content:        bytes.NewReader(content),
        ContentLength:  int64(len(content)),
        ContentType:    "text/plain",
        ArtifactType:   "log",
        IdempotencyKey: "idem-1",
        Storage:        store,
    })
    if err != nil {
        t.Fatalf("publish: %v", err)
    }
    if resp.SHA256 == "" || resp.Size != 5 {
        t.Fatalf("unexpected resp: %+v", resp)
    }
    if store.key != "artifacts/job1/output.txt" {
        t.Fatalf("key mismatch: %s", store.key)
    }
    if store.metadata["x-idempotency-key"] != "idem-1" {
        t.Fatalf("idempotency missing")
    }
    if store.metadata["x-stellaops-job-id"] != "job1" {
        t.Fatalf("job metadata missing")
    }
}
@@ -0,0 +1,86 @@
package workersdk

import (
    "context"
    "fmt"
    "time"
)

// Range represents an inclusive backfill window.
type Range struct {
    Start time.Time
    End   time.Time
}

// Validate ensures start <= end.
func (r Range) Validate() error {
    if r.End.Before(r.Start) {
        return fmt.Errorf("range end before start")
    }
    return nil
}

// WatermarkHandshake ensures the worker's view of the watermark matches the orchestrator-provided value.
type WatermarkHandshake struct {
    Expected string
    Current  string
}

func (w WatermarkHandshake) Validate() error {
    if w.Expected == "" {
        return fmt.Errorf("expected watermark required")
    }
    if w.Expected != w.Current {
        return fmt.Errorf("watermark mismatch")
    }
    return nil
}

// Deduper tracks processed artifact digests to prevent duplicate publication.
type Deduper struct {
    seen map[string]struct{}
}

// NewDeduper creates a deduper.
func NewDeduper() *Deduper {
    return &Deduper{seen: make(map[string]struct{})}
}

// Seen returns true if the digest was already processed; new digests are marked as seen.
func (d *Deduper) Seen(digest string) bool {
    if digest == "" {
        return false
    }
    if _, ok := d.seen[digest]; ok {
        return true
    }
    d.seen[digest] = struct{}{}
    return false
}

// ExecuteRange iterates from r.Start to r.End inclusive by the given step, invoking fn for each tick.
func ExecuteRange(ctx context.Context, r Range, step time.Duration, fn func(context.Context, time.Time) error) error {
    if err := r.Validate(); err != nil {
        return err
    }
    if step <= 0 {
        return fmt.Errorf("step must be positive")
    }
    for ts := r.Start; !ts.After(r.End); ts = ts.Add(step) {
        if err := fn(ctx, ts); err != nil {
            return err
        }
    }
    return nil
}

// VerifyAndPublishArtifact wraps PublishArtifact with dedupe and watermark guards.
func (c *Client) VerifyAndPublishArtifact(ctx context.Context, wm WatermarkHandshake, dedupe *Deduper, req ArtifactPublishRequest) (*ArtifactPublishResponse, error) {
    if err := wm.Validate(); err != nil {
        return nil, err
    }
    if dedupe != nil && dedupe.Seen(req.IdempotencyKey) {
        return nil, fmt.Errorf("duplicate artifact idempotency key")
    }
    return c.PublishArtifact(ctx, req)
}
@@ -0,0 +1,85 @@
package workersdk

import (
    "bytes"
    "context"
    "io"
    "testing"
    "time"
)

type stubStorage struct{}

func (stubStorage) PutObject(ctx context.Context, key string, body io.Reader, metadata map[string]string) error {
    return nil
}

func TestRangeValidation(t *testing.T) {
    r := Range{Start: time.Now(), End: time.Now().Add(-time.Hour)}
    if err := r.Validate(); err == nil {
        t.Fatalf("expected error for invalid range")
    }
}

func TestExecuteRange(t *testing.T) {
    start := time.Date(2025, 11, 15, 0, 0, 0, 0, time.UTC)
    end := start.Add(48 * time.Hour)
    r := Range{Start: start, End: end}
    calls := 0
    err := ExecuteRange(context.Background(), r, 24*time.Hour, func(ctx context.Context, ts time.Time) error {
        calls++
        return nil
    })
    if err != nil {
        t.Fatalf("execute range: %v", err)
    }
    if calls != 3 {
        t.Fatalf("expected 3 calls, got %d", calls)
    }
}

func TestWatermarkMismatch(t *testing.T) {
    wm := WatermarkHandshake{Expected: "abc", Current: "def"}
    if err := wm.Validate(); err == nil {
        t.Fatal("expected mismatch error")
    }
}

func TestDeduper(t *testing.T) {
    d := NewDeduper()
    if d.Seen("sha") {
        t.Fatal("should be new")
    }
    if !d.Seen("sha") {
        t.Fatal("should detect duplicate")
    }
}

func TestVerifyAndPublishArtifactDuplicate(t *testing.T) {
    d := NewDeduper()
    c, _ := New(Config{BaseURL: "https://x"})
    d.Seen("idem1")
    _, err := c.VerifyAndPublishArtifact(
        context.Background(),
        WatermarkHandshake{Expected: "w", Current: "w"},
        d,
        ArtifactPublishRequest{IdempotencyKey: "idem1", Storage: stubStorage{}, JobID: "j", LeaseID: "l", ObjectKey: "k", Content: bytes.NewReader([]byte{}), ArtifactType: "log"},
    )
    if err == nil {
        t.Fatal("expected duplicate error")
    }
}

func TestVerifyAndPublishArtifactWatermark(t *testing.T) {
    d := NewDeduper()
    c, _ := New(Config{BaseURL: "https://x"})
    _, err := c.VerifyAndPublishArtifact(
        context.Background(),
        WatermarkHandshake{Expected: "w1", Current: "w2"},
        d,
        ArtifactPublishRequest{IdempotencyKey: "idem2", Storage: stubStorage{}, JobID: "j", LeaseID: "l", ObjectKey: "k", Content: bytes.NewReader([]byte{}), ArtifactType: "log"},
    )
    if err == nil {
        t.Fatal("expected watermark error")
    }
}
@@ -0,0 +1,243 @@
package workersdk

import (
    "bytes"
    "context"
    "encoding/json"
    "fmt"
    "io"
    "net/http"
    "net/url"
    "path"
    "time"

    "git.stella-ops.org/stellaops/jobengine/worker-sdk-go/internal/transport"
)

// Client provides job claim/acknowledge operations.
type Client struct {
    baseURL   *url.URL
    apiKey    string
    tenantID  string
    projectID string
    userAgent string
    http      transport.Client
    logger    Logger
    metrics   MetricsSink
}

// New creates a configured Client.
func New(cfg Config) (*Client, error) {
    if err := cfg.validate(); err != nil {
        return nil, err
    }

    parsed, err := url.Parse(cfg.BaseURL)
    if err != nil {
        return nil, fmt.Errorf("invalid BaseURL: %w", err)
    }
    ua := cfg.UserAgent
    if ua == "" {
        ua = "stellaops-worker-sdk-go/0.1"
    }

    return &Client{
        baseURL:   parsed,
        apiKey:    cfg.APIKey,
        tenantID:  cfg.TenantID,
        projectID: cfg.ProjectID,
        userAgent: ua,
        http:      cfg.httpClient(),
        logger:    cfg.logger(),
        metrics:   cfg.metrics(),
    }, nil
}

// ClaimJobRequest represents a worker's desire to lease a job.
type ClaimJobRequest struct {
    WorkerID     string   `json:"worker_id"`
    Capabilities []string `json:"capabilities,omitempty"`
}

// ClaimJobResponse returns the leased job payload.
type ClaimJobResponse struct {
    JobID      string          `json:"job_id"`
    LeaseID    string          `json:"lease_id"`
    ExpiresAt  time.Time       `json:"expires_at"`
    JobType    string          `json:"job_type"`
    Payload    json.RawMessage `json:"payload"`
    RetryAfter int             `json:"retry_after_seconds,omitempty"`
    NotBefore  *time.Time      `json:"not_before,omitempty"`
    TraceID    string          `json:"trace_id,omitempty"`
}

// AckJobRequest represents completion of a job.
type AckJobRequest struct {
    JobID    string `json:"job_id"`
    LeaseID  string `json:"lease_id"`
    Status   string `json:"status"`
    Message  string `json:"message,omitempty"`
    Rotating string `json:"rotating_token,omitempty"`
}

// Claim requests the next available job for the worker.
func (c *Client) Claim(ctx context.Context, req ClaimJobRequest) (*ClaimJobResponse, error) {
    if req.WorkerID == "" {
        return nil, fmt.Errorf("WorkerID is required")
    }
    endpoint := c.resolve("/api/jobs/lease")
    resp, err := c.doClaim(ctx, endpoint, req)
    if err == nil {
        c.metrics.IncClaimed()
    }
    c.logger.Info(ctx, "claim", map[string]any{"worker_id": req.WorkerID, "err": err})
    return resp, err
}

// Ack acknowledges job completion or failure.
func (c *Client) Ack(ctx context.Context, req AckJobRequest) error {
    if req.JobID == "" || req.LeaseID == "" || req.Status == "" {
        return fmt.Errorf("JobID, LeaseID, and Status are required")
    }
    endpoint := c.resolve(path.Join("/api/jobs", req.JobID, "ack"))
    payload, err := json.Marshal(req)
    if err != nil {
        return fmt.Errorf("marshal ack request: %w", err)
    }

    httpReq, err := http.NewRequest(http.MethodPost, endpoint, bytes.NewReader(payload))
    if err != nil {
        return err
    }
    c.applyHeaders(httpReq)
    httpReq.Header.Set("Content-Type", "application/json")

    resp, err := transport.Do(ctx, c.http, httpReq)
    if err != nil {
        return err
    }
    defer resp.Body.Close()

    if resp.StatusCode >= 300 {
        b, _ := io.ReadAll(io.LimitReader(resp.Body, 8<<10))
        return fmt.Errorf("ack failed: %s (%s)", resp.Status, string(b))
    }
    c.metrics.IncAck(req.Status)
    return nil
}

// Heartbeat reports liveness for a job lease.
func (c *Client) Heartbeat(ctx context.Context, jobID, leaseID string) error {
    if jobID == "" || leaseID == "" {
        return fmt.Errorf("JobID and LeaseID are required")
    }
    endpoint := c.resolve(path.Join("/api/jobs", jobID, "heartbeat"))
    payload, _ := json.Marshal(map[string]string{"lease_id": leaseID})
    httpReq, err := http.NewRequest(http.MethodPost, endpoint, bytes.NewReader(payload))
    if err != nil {
        return err
    }
    c.applyHeaders(httpReq)
    httpReq.Header.Set("Content-Type", "application/json")
    start := time.Now()
    resp, err := transport.Do(ctx, c.http, httpReq)
    if err != nil {
        c.metrics.IncHeartbeatFailures()
        return err
    }
    defer resp.Body.Close()
    if resp.StatusCode >= 300 {
        b, _ := io.ReadAll(io.LimitReader(resp.Body, 8<<10))
        c.metrics.IncHeartbeatFailures()
        return fmt.Errorf("heartbeat failed: %s (%s)", resp.Status, string(b))
    }
    c.metrics.ObserveHeartbeatLatency(time.Since(start).Seconds())
    return nil
}

// Progress reports worker progress (0-100) with an optional message.
func (c *Client) Progress(ctx context.Context, jobID, leaseID string, pct int, message string) error {
    if pct < 0 || pct > 100 {
        return fmt.Errorf("pct must be 0-100")
    }
    payload, _ := json.Marshal(map[string]any{
        "lease_id": leaseID,
        "progress": pct,
        "message":  message,
    })
    endpoint := c.resolve(path.Join("/api/jobs", jobID, "progress"))
    httpReq, err := http.NewRequest(http.MethodPost, endpoint, bytes.NewReader(payload))
    if err != nil {
        return err
    }
    c.applyHeaders(httpReq)
    httpReq.Header.Set("Content-Type", "application/json")
    resp, err := transport.Do(ctx, c.http, httpReq)
    if err != nil {
        return err
    }
    defer resp.Body.Close()
    if resp.StatusCode >= 300 {
        b, _ := io.ReadAll(io.LimitReader(resp.Body, 8<<10))
        return fmt.Errorf("progress failed: %s (%s)", resp.Status, string(b))
    }
    return nil
}

func (c *Client) doClaim(ctx context.Context, endpoint string, req ClaimJobRequest) (*ClaimJobResponse, error) {
    payload, err := json.Marshal(req)
    if err != nil {
        return nil, fmt.Errorf("marshal claim request: %w", err)
    }

    httpReq, err := http.NewRequest(http.MethodPost, endpoint, bytes.NewReader(payload))
    if err != nil {
        return nil, err
    }
    c.applyHeaders(httpReq)
    httpReq.Header.Set("Content-Type", "application/json")

    resp, err := transport.Do(ctx, c.http, httpReq)
    if err != nil {
        return nil, err
    }
    defer resp.Body.Close()

    if resp.StatusCode == http.StatusNoContent {
        return nil, nil // no work available
    }
    if resp.StatusCode >= 300 {
        b, _ := io.ReadAll(io.LimitReader(resp.Body, 8<<10))
        return nil, fmt.Errorf("claim failed: %s (%s)", resp.Status, string(b))
    }

    var out ClaimJobResponse
    decoder := json.NewDecoder(resp.Body)
    decoder.DisallowUnknownFields()
    if err := decoder.Decode(&out); err != nil {
        return nil, fmt.Errorf("decode claim response: %w", err)
    }
    return &out, nil
}

func (c *Client) applyHeaders(r *http.Request) {
    if c.apiKey != "" {
        r.Header.Set("Authorization", "Bearer "+c.apiKey)
    }
    if c.tenantID != "" {
        r.Header.Set("X-StellaOps-Tenant", c.tenantID)
    }
    if c.projectID != "" {
        r.Header.Set("X-StellaOps-Project", c.projectID)
    }
    r.Header.Set("Accept", "application/json")
    if c.userAgent != "" {
        r.Header.Set("User-Agent", c.userAgent)
    }
}

func (c *Client) resolve(p string) string {
    clone := *c.baseURL
    clone.Path = path.Join(clone.Path, p)
    return clone.String()
}
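The SDK charter lists error classification as a responsibility, but the client above only wraps non-2xx statuses in opaque errors. One plausible classification layer is sketched below; the `Kind` type, constants, and `Classify` function are illustrative assumptions, not part of the SDK:

```go
package main

import "fmt"

// Kind classifies an HTTP status so workers know whether to retry.
type Kind int

const (
    Success Kind = iota
    Retryable
    Permanent
)

// Classify maps status codes to a retry decision: 2xx is success, 408/429
// and 5xx are retryable, and everything else is treated as permanent.
func Classify(status int) Kind {
    switch {
    case status >= 200 && status < 300:
        return Success
    case status == 408 || status == 429 || status >= 500:
        return Retryable
    default:
        return Permanent
    }
}

func main() {
    fmt.Println(Classify(202), Classify(503), Classify(404))
}
```

A worker would consult `Classify` before deciding whether to re-lease a job or mark it failed.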
@@ -0,0 +1,136 @@
|
||||
package workersdk
|
||||
|
||||
import (
|
||||
"context"
|
||||
"encoding/json"
|
||||
"net/http"
|
||||
"net/http/httptest"
|
||||
"testing"
|
||||
"time"
|
||||
)
|
||||
|
||||
type claimRecorded struct {
|
||||
Method string
|
||||
Path string
|
||||
Auth string
|
||||
Tenant string
|
||||
Project string
|
||||
Body ClaimJobRequest
|
||||
}
|
||||
|
||||
type ackRecorded struct {
|
||||
Method string
|
||||
Path string
|
||||
Body AckJobRequest
|
||||
}
|
||||
|
||||
func TestClaimAndAck(t *testing.T) {
	var claimRec claimRecorded
	var ackRec ackRecorded

	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		switch r.URL.Path {
		case "/api/jobs/lease":
			claimRec.Method = r.Method
			claimRec.Path = r.URL.Path
			claimRec.Auth = r.Header.Get("Authorization")
			claimRec.Tenant = r.Header.Get("X-StellaOps-Tenant")
			claimRec.Project = r.Header.Get("X-StellaOps-Project")
			if err := json.NewDecoder(r.Body).Decode(&claimRec.Body); err != nil {
				// The handler runs on the server goroutine, so t.Fatalf is
				// unsafe here; t.Errorf + return is the goroutine-safe form.
				t.Errorf("decode claim: %v", err)
				return
			}
			resp := ClaimJobResponse{
				JobID:     "123",
				LeaseID:   "lease-1",
				ExpiresAt: time.Date(2025, 11, 17, 0, 0, 0, 0, time.UTC),
				JobType:   "demo",
				Payload:   json.RawMessage(`{"key":"value"}`),
			}
			w.Header().Set("Content-Type", "application/json")
			_ = json.NewEncoder(w).Encode(resp)
		case "/api/jobs/123/heartbeat":
			w.WriteHeader(http.StatusAccepted)
		case "/api/jobs/123/progress":
			w.WriteHeader(http.StatusAccepted)
		default:
			ackRec.Method = r.Method
			ackRec.Path = r.URL.Path
			if err := json.NewDecoder(r.Body).Decode(&ackRec.Body); err != nil {
				t.Errorf("decode ack: %v", err)
				return
			}
			w.WriteHeader(http.StatusAccepted)
		}
	}))
	defer srv.Close()

	client, err := New(Config{
		BaseURL:   srv.URL,
		APIKey:    "token-1",
		TenantID:  "tenant-a",
		ProjectID: "project-1",
	})
	if err != nil {
		t.Fatalf("new client: %v", err)
	}

	ctx := context.Background()
	claimResp, err := client.Claim(ctx, ClaimJobRequest{WorkerID: "worker-7", Capabilities: []string{"scan"}})
	if err != nil {
		t.Fatalf("claim: %v", err)
	}

	if claimRec.Method != http.MethodPost || claimRec.Path != "/api/jobs/lease" {
		t.Fatalf("unexpected claim method/path: %s %s", claimRec.Method, claimRec.Path)
	}
	if claimRec.Auth != "Bearer token-1" {
		t.Fatalf("auth header mismatch: %s", claimRec.Auth)
	}
	if claimRec.Tenant != "tenant-a" || claimRec.Project != "project-1" {
		t.Fatalf("tenant/project headers missing: %s %s", claimRec.Tenant, claimRec.Project)
	}
	if claimRec.Body.WorkerID != "worker-7" {
		t.Fatalf("worker id missing")
	}

	if claimResp == nil || claimResp.JobID != "123" || claimResp.LeaseID != "lease-1" {
		t.Fatalf("claim response mismatch: %+v", claimResp)
	}

	err = client.Ack(ctx, AckJobRequest{JobID: claimResp.JobID, LeaseID: claimResp.LeaseID, Status: "succeeded"})
	if err != nil {
		t.Fatalf("ack error: %v", err)
	}

	if err := client.Heartbeat(ctx, claimResp.JobID, claimResp.LeaseID); err != nil {
		t.Fatalf("heartbeat error: %v", err)
	}
	if err := client.Progress(ctx, claimResp.JobID, claimResp.LeaseID, 50, "halfway"); err != nil {
		t.Fatalf("progress error: %v", err)
	}

	if ackRec.Method != http.MethodPost {
		t.Fatalf("ack method mismatch: %s", ackRec.Method)
	}
	if ackRec.Path != "/api/jobs/123/ack" {
		t.Fatalf("ack path mismatch: %s", ackRec.Path)
	}
	if ackRec.Body.Status != "succeeded" || ackRec.Body.JobID != "123" {
		t.Fatalf("ack body mismatch: %+v", ackRec.Body)
	}
}

func TestClaimMissingWorker(t *testing.T) {
	client, err := New(Config{BaseURL: "https://example.invalid"})
	if err != nil {
		t.Fatalf("new client: %v", err)
	}
	if _, err := client.Claim(context.Background(), ClaimJobRequest{}); err == nil {
		t.Fatal("expected error for missing worker id")
	}
}

func TestConfigValidation(t *testing.T) {
	if _, err := New(Config{}); err == nil {
		t.Fatal("expected error for missing base url")
	}
}
@@ -0,0 +1,49 @@
package workersdk

import (
	"errors"
	"net/http"
	"strings"

	"git.stella-ops.org/stellaops/jobengine/worker-sdk-go/internal/transport"
)

// Config holds SDK configuration.
type Config struct {
	BaseURL   string
	APIKey    string
	TenantID  string
	ProjectID string
	UserAgent string
	Client    transport.Client
	Logger    Logger
	Metrics   MetricsSink
}

func (c *Config) validate() error {
	if strings.TrimSpace(c.BaseURL) == "" {
		return errors.New("BaseURL is required")
	}
	return nil
}

func (c *Config) httpClient() transport.Client {
	if c.Client != nil {
		return c.Client
	}
	return transport.DefaultClient(http.DefaultTransport)
}

func (c *Config) logger() Logger {
	if c.Logger != nil {
		return c.Logger
	}
	return NoopLogger{}
}

func (c *Config) metrics() MetricsSink {
	if c.Metrics != nil {
		return c.Metrics
	}
	return NoopMetrics{}
}
@@ -0,0 +1,29 @@
package workersdk

// ErrorCode represents orchestrator error categories.
type ErrorCode string

const (
	ErrorCodeTemporary  ErrorCode = "temporary"
	ErrorCodePermanent  ErrorCode = "permanent"
	ErrorCodeFatal      ErrorCode = "fatal"
	ErrorCodeUnauth     ErrorCode = "unauthorized"
	ErrorCodeQuota      ErrorCode = "quota_exceeded"
	ErrorCodeValidation ErrorCode = "validation"
)

// ErrorClassification maps an HTTP status to an error code and retryability.
func ErrorClassification(status int) (ErrorCode, bool) {
	switch {
	case status == 401 || status == 403:
		return ErrorCodeUnauth, false
	case status >= 500 && status < 600:
		return ErrorCodeTemporary, true
	case status == 429:
		return ErrorCodeQuota, true
	case status >= 400 && status < 500:
		return ErrorCodePermanent, false
	default:
		return "", false
	}
}
@@ -0,0 +1,24 @@
package workersdk

import "testing"

func TestErrorClassification(t *testing.T) {
	cases := []struct {
		status int
		code   ErrorCode
		retry  bool
	}{
		{500, ErrorCodeTemporary, true},
		{503, ErrorCodeTemporary, true},
		{429, ErrorCodeQuota, true},
		{401, ErrorCodeUnauth, false},
		{400, ErrorCodePermanent, false},
		{404, ErrorCodePermanent, false},
	}
	for _, c := range cases {
		code, retry := ErrorClassification(c.status)
		if code != c.code || retry != c.retry {
			t.Fatalf("status %d -> got %s retry %v", c.status, code, retry)
		}
	}
}
@@ -0,0 +1,15 @@
package workersdk

import "context"

// Logger is a minimal structured logger interface.
type Logger interface {
	Info(ctx context.Context, msg string, fields map[string]any)
	Error(ctx context.Context, msg string, fields map[string]any)
}

// NoopLogger is used when no logger is provided.
type NoopLogger struct{}

func (NoopLogger) Info(_ context.Context, _ string, _ map[string]any)  {}
func (NoopLogger) Error(_ context.Context, _ string, _ map[string]any) {}
@@ -0,0 +1,17 @@
package workersdk

// MetricsSink allows callers to wire Prometheus or other metrics systems.
type MetricsSink interface {
	IncClaimed()
	IncAck(status string)
	ObserveHeartbeatLatency(seconds float64)
	IncHeartbeatFailures()
}

// NoopMetrics is the default sink when none is provided.
type NoopMetrics struct{}

func (NoopMetrics) IncClaimed()                       {}
func (NoopMetrics) IncAck(_ string)                   {}
func (NoopMetrics) ObserveHeartbeatLatency(_ float64) {}
func (NoopMetrics) IncHeartbeatFailures()             {}
@@ -0,0 +1,53 @@
package workersdk

import (
	"context"
	"math/rand"
	"time"
)

// RetryPolicy defines retry behavior.
type RetryPolicy struct {
	MaxAttempts int
	BaseDelay   time.Duration
	MaxDelay    time.Duration
	Jitter      float64 // between 0 and 1; the fraction of jitter applied to each delay.
}

// DefaultRetryPolicy returns exponential backoff with jitter suitable for worker I/O.
func DefaultRetryPolicy() RetryPolicy {
	return RetryPolicy{MaxAttempts: 5, BaseDelay: 200 * time.Millisecond, MaxDelay: 5 * time.Second, Jitter: 0.2}
}

// Retry executes fn with retries according to policy.
func Retry(ctx context.Context, policy RetryPolicy, fn func() error) error {
	if policy.MaxAttempts <= 0 {
		policy = DefaultRetryPolicy()
	}
	delay := policy.BaseDelay
	for attempt := 1; attempt <= policy.MaxAttempts; attempt++ {
		err := fn()
		if err == nil {
			return nil
		}
		if attempt == policy.MaxAttempts {
			return err
		}
		// Apply +/- jitter to the current delay, capped at MaxDelay.
		jitter := 1 + (policy.Jitter * (rand.Float64()*2 - 1))
		sleepFor := time.Duration(float64(delay) * jitter)
		if sleepFor > policy.MaxDelay {
			sleepFor = policy.MaxDelay
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(sleepFor):
		}
		delay *= 2
		if delay > policy.MaxDelay {
			delay = policy.MaxDelay
		}
	}
	return nil
}
21
src/JobEngine/StellaOps.JobEngine.WorkerSdk.Python/AGENTS.md
Normal file
@@ -0,0 +1,21 @@
# Worker SDK (Python) — Agent Charter

## Mission
Publish the Python client library for StellaOps orchestrated workers. Provide asyncio-friendly claim/heartbeat/progress APIs, artifact publishing helpers, error handling, and observability hooks aligned with Epic 9 requirements and the imposed rule for cross-component parity.

## Responsibilities
- Maintain a typed async client (httpx-compatible) with retry/backoff primitives mirroring jobengine expectations.
- Surface structured metrics/logging instrumentation and pluggable exporters.
- Enforce idempotency token usage, artifact checksum publication, and watermark/backfill helpers.
- Coordinate versioning with the Go SDK, jobengine service contracts, DevOps packaging, and Offline Kit deliverables.
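The idempotency-token and checksum duties listed above can be sketched with stdlib-only Python. `Deduper` is a simplified version of the helper this commit ships in `backfill.py`, and `checksum` shows the SHA-256 digest published alongside each artifact; the names here are illustrative, not the SDK's public API.

```python
import hashlib


class Deduper:
    """Tracks idempotency keys so a replayed publish can be rejected."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def seen(self, key: str) -> bool:
        # First sighting registers the key; a second sighting flags a replay.
        if key in self._seen:
            return True
        self._seen.add(key)
        return False


def checksum(content: bytes) -> str:
    """SHA-256 hex digest published next to each artifact."""
    return hashlib.sha256(content).hexdigest()


dedupe = Deduper()
assert dedupe.seen("idem-1") is False  # first use is accepted
assert dedupe.seen("idem-1") is True   # replay is detected
```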

## Required Reading
- `docs/modules/jobengine/architecture.md`
- `docs/modules/platform/architecture-overview.md`

## Working Agreement
1. Update task status to `DOING`/`DONE` in both the corresponding sprint file `/docs/implplan/SPRINT_*.md` and the local `TASKS.md` when you start or finish work.
2. Review this charter and the Required Reading documents before coding; confirm prerequisites are met.
3. Keep changes deterministic (stable ordering, timestamps, hashes) and align with offline/air-gap expectations.
4. Coordinate doc updates, tests, and cross-guild communication whenever contracts or workflows change.
5. Revert to `TODO` if you pause the task without shipping changes; leave notes in commit/PR descriptions for context.
10
src/JobEngine/StellaOps.JobEngine.WorkerSdk.Python/README.md
Normal file
@@ -0,0 +1,10 @@
# StellaOps Orchestrator Worker SDK (Python)

Async-friendly SDK for StellaOps workers: claim jobs, acknowledge results, and attach tenant-aware auth headers. The default transport is dependency-free and can be swapped for aiohttp/httpx as needed.

## Quick start
```bash
export ORCH_BASE_URL=http://localhost:8080
export ORCH_API_KEY=dev-token
python sample_worker.py
```
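The tenant-aware headers mentioned above can be illustrated with a stdlib-only sketch of what the client attaches to each request. The function name is illustrative; the header names follow this commit's `client.py`.

```python
def auth_headers(api_key=None, tenant_id=None, project_id=None):
    """Build the auth/tenant headers the SDK sends with every call."""
    headers = {"Accept": "application/json", "Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    if tenant_id:
        headers["X-StellaOps-Tenant"] = tenant_id
    if project_id:
        headers["X-StellaOps-Project"] = project_id
    return headers


# Optional fields are simply omitted when not configured.
print(auth_headers(api_key="dev-token", tenant_id="local-tenant"))
```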
@@ -0,0 +1,11 @@
[project]
name = "stellaops-jobengine-worker"
version = "0.1.0"
description = "Async worker SDK for StellaOps Orchestrator"
authors = [{name = "StellaOps"}]
readme = "README.md"
requires-python = ">=3.10"

[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"
@@ -0,0 +1,41 @@
import asyncio
import os

from stellaops_jobengine_worker import (
    AckJobRequest,
    ClaimJobRequest,
    Config,
    OrchestratorClient,
)
from stellaops_jobengine_worker.retry import RetryPolicy, retry


async def main():
    cfg = Config(
        base_url=os.environ.get("ORCH_BASE_URL", "http://localhost:8080"),
        api_key=os.environ.get("ORCH_API_KEY", "dev-token"),
        tenant_id=os.environ.get("ORCH_TENANT", "local-tenant"),
        project_id=os.environ.get("ORCH_PROJECT", "demo-project"),
    )
    client = OrchestratorClient(cfg)

    claim = await client.claim(ClaimJobRequest(worker_id="py-worker", capabilities=["pack-run"]))
    if claim is None:
        print("no work available")
        return

    # ... perform actual work described by claim.payload ...
    await client.heartbeat(job_id=claim.job_id, lease_id=claim.lease_id)
    await client.progress(job_id=claim.job_id, lease_id=claim.lease_id, pct=50, message="halfway")

    async def _ack():
        await client.ack(
            AckJobRequest(job_id=claim.job_id, lease_id=claim.lease_id, status="succeeded"),
        )

    await retry(RetryPolicy(), _ack)
    print(f"acknowledged job {claim.job_id}")


if __name__ == "__main__":
    asyncio.run(main())
@@ -0,0 +1,37 @@
"""Async worker SDK for StellaOps Orchestrator."""

from .client import OrchestratorClient, ClaimJobRequest, AckJobRequest, ClaimJobResponse
from .config import Config
from .metrics import MetricsSink, NoopMetrics
from .transport import Transport, InMemoryTransport, TransportRequest, TransportResponse
from .retry import RetryPolicy, retry
from .storage import publish_artifact, InMemoryStorage, ArtifactPublishResult, Storage
from .errors import ErrorCode, classify_status
from .backfill import Range, WatermarkHandshake, Deduper, execute_range, verify_and_publish_artifact

__all__ = [
    "OrchestratorClient",
    "ClaimJobRequest",
    "ClaimJobResponse",
    "AckJobRequest",
    "Config",
    "MetricsSink",
    "NoopMetrics",
    "RetryPolicy",
    "retry",
    "Storage",
    "publish_artifact",
    "InMemoryStorage",
    "ArtifactPublishResult",
    "Range",
    "WatermarkHandshake",
    "Deduper",
    "execute_range",
    "verify_and_publish_artifact",
    "ErrorCode",
    "classify_status",
    "Transport",
    "InMemoryTransport",
    "TransportRequest",
    "TransportResponse",
]
@@ -0,0 +1,81 @@
from __future__ import annotations

import datetime as dt
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional

from .storage import publish_artifact, ArtifactPublishResult, Storage


@dataclass
class Range:
    start: dt.datetime
    end: dt.datetime

    def validate(self) -> None:
        if self.end < self.start:
            raise ValueError("range end before start")


@dataclass
class WatermarkHandshake:
    expected: str
    current: str

    def validate(self) -> None:
        if not self.expected:
            raise ValueError("expected watermark required")
        if self.expected != self.current:
            raise ValueError("watermark mismatch")


class Deduper:
    def __init__(self):
        self._seen: set[str] = set()

    def seen(self, key: str) -> bool:
        if not key:
            return False
        if key in self._seen:
            return True
        self._seen.add(key)
        return False


async def execute_range(r: Range, step: dt.timedelta, fn: Callable[[dt.datetime], Awaitable[None]]) -> None:
    r.validate()
    if step.total_seconds() <= 0:
        raise ValueError("step must be positive")
    current = r.start
    while current <= r.end:
        await fn(current)
        current = current + step


async def verify_and_publish_artifact(
    *,
    storage: Storage,
    wm: WatermarkHandshake,
    dedupe: Optional[Deduper],
    job_id: str,
    lease_id: str,
    object_key: str,
    content: bytes,
    content_type: str = "application/octet-stream",
    artifact_type: Optional[str] = None,
    idempotency_key: Optional[str] = None,
) -> ArtifactPublishResult:
    wm.validate()
    if dedupe and idempotency_key and dedupe.seen(idempotency_key):
        raise ValueError("duplicate artifact idempotency key")
    return await publish_artifact(
        storage=storage,
        job_id=job_id,
        lease_id=lease_id,
        object_key=object_key,
        content=content,
        content_type=content_type,
        artifact_type=artifact_type,
        idempotency_key=idempotency_key,
    )
@@ -0,0 +1,111 @@
from __future__ import annotations

import json
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urljoin

from .config import Config
from .metrics import MetricsSink
from .transport import Transport, TransportRequest, TransportResponse


@dataclass
class ClaimJobRequest:
    worker_id: str
    capabilities: Optional[list[str]] = None


@dataclass
class ClaimJobResponse:
    job_id: str
    lease_id: str
    job_type: Optional[str]
    payload: dict
    expires_at: Optional[str] = None
    retry_after_seconds: Optional[int] = None


@dataclass
class AckJobRequest:
    job_id: str
    lease_id: str
    status: str
    message: Optional[str] = None


class OrchestratorClient:
    """Async client for job claim/ack operations."""

    def __init__(self, config: Config, *, transport: Optional[Transport] = None):
        config.validate()
        self._cfg = config
        self._transport = transport or config.get_transport()
        self._metrics: MetricsSink = config.get_metrics()

    async def claim(self, request: ClaimJobRequest) -> Optional[ClaimJobResponse]:
        if not request.worker_id:
            raise ValueError("worker_id is required")
        body = json.dumps(request.__dict__).encode()
        resp = await self._execute("POST", "/api/jobs/lease", body)
        if resp.status == 204:
            return None
        if resp.status >= 300:
            raise RuntimeError(f"claim failed: {resp.status} {resp.body.decode(errors='ignore')}")
        data = json.loads(resp.body)
        self._metrics.inc_claimed()
        return ClaimJobResponse(
            job_id=data["job_id"],
            lease_id=data["lease_id"],
            job_type=data.get("job_type"),
            payload=data.get("payload", {}),
            expires_at=data.get("expires_at"),
            retry_after_seconds=data.get("retry_after_seconds"),
        )

    async def ack(self, request: AckJobRequest) -> None:
        if not request.job_id or not request.lease_id or not request.status:
            raise ValueError("job_id, lease_id, and status are required")
        body = json.dumps(request.__dict__).encode()
        resp = await self._execute("POST", f"/api/jobs/{request.job_id}/ack", body)
        if resp.status >= 300:
            raise RuntimeError(f"ack failed: {resp.status} {resp.body.decode(errors='ignore')}")
        self._metrics.inc_ack(request.status)

    async def heartbeat(self, *, job_id: str, lease_id: str) -> None:
        if not job_id or not lease_id:
            raise ValueError("job_id and lease_id are required")
        body = json.dumps({"lease_id": lease_id}).encode()
        resp = await self._execute("POST", f"/api/jobs/{job_id}/heartbeat", body)
        if resp.status >= 300:
            self._metrics.inc_heartbeat_failures()
            raise RuntimeError(f"heartbeat failed: {resp.status} {resp.body.decode(errors='ignore')}")
        # latency recorded by caller; keep simple here

    async def progress(self, *, job_id: str, lease_id: str, pct: int, message: Optional[str] = None) -> None:
        if pct < 0 or pct > 100:
            raise ValueError("pct must be 0-100")
        payload = {"lease_id": lease_id, "progress": pct}
        if message:
            payload["message"] = message
        body = json.dumps(payload).encode()
        resp = await self._execute("POST", f"/api/jobs/{job_id}/progress", body)
        if resp.status >= 300:
            raise RuntimeError(f"progress failed: {resp.status} {resp.body.decode(errors='ignore')}")

    async def _execute(self, method: str, path: str, body: Optional[bytes]) -> TransportResponse:
        url = urljoin(self._cfg.base_url.rstrip("/") + "/", path.lstrip("/"))
        headers = {
            "Accept": "application/json",
            "Content-Type": "application/json",
            "User-Agent": self._cfg.user_agent,
        }
        if self._cfg.api_key:
            headers["Authorization"] = f"Bearer {self._cfg.api_key}"
        if self._cfg.tenant_id:
            headers["X-StellaOps-Tenant"] = self._cfg.tenant_id
        if self._cfg.project_id:
            headers["X-StellaOps-Project"] = self._cfg.project_id

        req = TransportRequest(method=method, url=url, headers=headers, body=body)
        return await self._transport.execute(req)
@@ -0,0 +1,30 @@
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

from .metrics import MetricsSink, NoopMetrics
from .transport import Transport, default_transport


@dataclass
class Config:
    """SDK configuration."""

    base_url: str
    api_key: Optional[str] = None
    tenant_id: Optional[str] = None
    project_id: Optional[str] = None
    user_agent: str = "stellaops-worker-sdk-py/0.1"
    transport: Optional[Transport] = None
    metrics: Optional[MetricsSink] = None

    def validate(self) -> None:
        if not self.base_url:
            raise ValueError("base_url is required")

    def get_transport(self) -> Transport:
        return self.transport or default_transport()

    def get_metrics(self) -> MetricsSink:
        return self.metrics or NoopMetrics()
@@ -0,0 +1,24 @@
from __future__ import annotations

from enum import Enum


class ErrorCode(str, Enum):
    TEMPORARY = "temporary"
    PERMANENT = "permanent"
    FATAL = "fatal"
    UNAUTHORIZED = "unauthorized"
    QUOTA = "quota_exceeded"
    VALIDATION = "validation"


def classify_status(status: int) -> tuple[ErrorCode | None, bool]:
    if status in (401, 403):
        return ErrorCode.UNAUTHORIZED, False
    if status == 429:
        return ErrorCode.QUOTA, True
    if 500 <= status < 600:
        return ErrorCode.TEMPORARY, True
    if 400 <= status < 500:
        return ErrorCode.PERMANENT, False
    return None, False
@@ -0,0 +1,24 @@
from __future__ import annotations

from typing import Protocol


class MetricsSink(Protocol):
    def inc_claimed(self) -> None: ...
    def inc_ack(self, status: str) -> None: ...
    def observe_heartbeat_latency(self, seconds: float) -> None: ...
    def inc_heartbeat_failures(self) -> None: ...


class NoopMetrics:
    def inc_claimed(self) -> None:
        return None

    def inc_ack(self, status: str) -> None:
        return None

    def observe_heartbeat_latency(self, seconds: float) -> None:
        return None

    def inc_heartbeat_failures(self) -> None:
        return None
@@ -0,0 +1,34 @@
from __future__ import annotations

import asyncio
import random
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass
class RetryPolicy:
    max_attempts: int = 5
    base_delay: float = 0.2  # seconds
    max_delay: float = 5.0  # seconds
    jitter: float = 0.2  # +/- 20%


def _jittered(delay: float, jitter: float) -> float:
    if jitter <= 0:
        return delay
    factor = 1 + ((random.random() * 2 - 1) * jitter)
    return delay * factor


async def retry(policy: RetryPolicy, fn: Callable[[], Awaitable[None]]) -> None:
    delay = policy.base_delay
    for attempt in range(1, policy.max_attempts + 1):
        try:
            await fn()
            return
        except Exception:  # pragma: no cover - caller handles fatal
            if attempt == policy.max_attempts:
                raise
            await asyncio.sleep(min(_jittered(delay, policy.jitter), policy.max_delay))
            delay = min(delay * 2, policy.max_delay)
@@ -0,0 +1,56 @@
from __future__ import annotations

import hashlib
from dataclasses import dataclass
from typing import Protocol, Dict, Optional


class Storage(Protocol):
    async def put_object(self, key: str, data: bytes, metadata: Dict[str, str]) -> None: ...


@dataclass
class ArtifactPublishResult:
    sha256: str
    size: int


async def publish_artifact(
    *,
    storage: Storage,
    job_id: str,
    lease_id: str,
    object_key: str,
    content: bytes,
    content_type: str = "application/octet-stream",
    artifact_type: Optional[str] = None,
    idempotency_key: Optional[str] = None,
) -> ArtifactPublishResult:
    if not job_id or not lease_id:
        raise ValueError("job_id and lease_id are required")
    if not object_key:
        raise ValueError("object_key is required")
    if storage is None:
        raise ValueError("storage is required")

    sha = hashlib.sha256(content).hexdigest()
    metadata = {
        "x-stellaops-job-id": job_id,
        "x-stellaops-lease": lease_id,
        "x-stellaops-ct": content_type,
    }
    if artifact_type:
        metadata["x-stellaops-type"] = artifact_type
    if idempotency_key:
        metadata["x-idempotency-key"] = idempotency_key

    await storage.put_object(object_key, content, metadata)
    return ArtifactPublishResult(sha256=sha, size=len(content))


class InMemoryStorage(Storage):
    def __init__(self):
        self.calls = []

    async def put_object(self, key: str, data: bytes, metadata: Dict[str, str]) -> None:
        self.calls.append((key, data, metadata))
@@ -0,0 +1,164 @@
import asyncio
import datetime as dt
import json
import unittest

from stellaops_jobengine_worker import (
    AckJobRequest,
    ClaimJobRequest,
    Config,
    ErrorCode,
    Deduper,
    Range,
    WatermarkHandshake,
    execute_range,
    verify_and_publish_artifact,
    InMemoryStorage,
    InMemoryTransport,
    MetricsSink,
    OrchestratorClient,
    TransportRequest,
    TransportResponse,
    classify_status,
    publish_artifact,
)


class ClientTests(unittest.TestCase):
    def test_claim_and_ack_headers(self):
        seen = {}
        metric_calls = {"claimed": 0, "ack": 0, "hb_fail": 0}

        class Metrics(MetricsSink):
            def inc_claimed(self) -> None:
                metric_calls["claimed"] += 1

            def inc_ack(self, status: str) -> None:
                metric_calls["ack"] += 1

            def observe_heartbeat_latency(self, seconds: float) -> None:
                metric_calls["latency"] = seconds

            def inc_heartbeat_failures(self) -> None:
                metric_calls["hb_fail"] += 1

        def handler(req: TransportRequest) -> TransportResponse:
            if req.url.endswith("/api/jobs/lease"):
                seen["claim_headers"] = req.headers
                seen["claim_url"] = req.url
                body = json.loads(req.body)
                self.assertEqual(body["worker_id"], "w1")
                payload = {
                    "job_id": "123",
                    "lease_id": "l1",
                    "job_type": "demo",
                    "payload": {"k": "v"},
                }
                return TransportResponse(status=200, headers={}, body=json.dumps(payload).encode())
            if req.url.endswith("/api/jobs/123/heartbeat"):
                return TransportResponse(status=202, headers={}, body=b"")
            if req.url.endswith("/api/jobs/123/progress"):
                return TransportResponse(status=202, headers={}, body=b"")
            seen["ack_headers"] = req.headers
            seen["ack_url"] = req.url
            return TransportResponse(status=202, headers={}, body=b"")

        transport = InMemoryTransport(handler)
        client = OrchestratorClient(
            Config(base_url="http://orch/", api_key="t", tenant_id="tenant-a", project_id="project-1", metrics=Metrics()),
            transport=transport,
        )

        # asyncio.run creates and closes a fresh loop per call, avoiding the
        # deprecated get_event_loop()/new_event_loop() juggling.
        claim = asyncio.run(client.claim(ClaimJobRequest(worker_id="w1", capabilities=["scan"])))
        self.assertEqual(claim.job_id, "123")
        asyncio.run(client.ack(AckJobRequest(job_id="123", lease_id="l1", status="succeeded")))
        asyncio.run(client.heartbeat(job_id="123", lease_id="l1"))
        asyncio.run(client.progress(job_id="123", lease_id="l1", pct=50, message="halfway"))

        headers = seen["claim_headers"]
        self.assertEqual(headers["Authorization"], "Bearer t")
        self.assertEqual(headers["X-StellaOps-Tenant"], "tenant-a")
        self.assertEqual(headers["X-StellaOps-Project"], "project-1")
        self.assertIn("/api/jobs/lease", seen["claim_url"])
        self.assertEqual(metric_calls["claimed"], 1)
        self.assertEqual(metric_calls["ack"], 1)

    def test_missing_worker_rejected(self):
        client = OrchestratorClient(Config(base_url="http://orch"))
        with self.assertRaises(ValueError):
            asyncio.run(client.claim(ClaimJobRequest(worker_id="")))

    def test_publish_artifact(self):
        storage = InMemoryStorage()
        result = asyncio.run(
            publish_artifact(
                storage=storage,
                job_id="j1",
                lease_id="l1",
                object_key="artifacts/j1/out.txt",
                content=b"hello",
                content_type="text/plain",
                artifact_type="log",
                idempotency_key="idem-1",
            )
        )
        self.assertEqual(result.size, 5)
        self.assertEqual(len(storage.calls), 1)
        key, data, metadata = storage.calls[0]
        self.assertEqual(key, "artifacts/j1/out.txt")
        self.assertEqual(data, b"hello")
        self.assertEqual(metadata["x-idempotency-key"], "idem-1")

    def test_classify_status(self):
        code, retry = classify_status(500)
        self.assertEqual(code, ErrorCode.TEMPORARY)
        self.assertTrue(retry)
        code, retry = classify_status(404)
        self.assertEqual(code, ErrorCode.PERMANENT)
        self.assertFalse(retry)

    def test_execute_range_and_watermark(self):
        r = Range(start=dt.datetime(2025, 11, 15), end=dt.datetime(2025, 11, 17))
        hits = []

        async def fn(ts: dt.datetime):
            hits.append(ts.date())

        asyncio.run(execute_range(r, dt.timedelta(days=1), fn))
        self.assertEqual(len(hits), 3)
        with self.assertRaises(ValueError):
            Range(start=r.end, end=r.start - dt.timedelta(days=1)).validate()

        wm = WatermarkHandshake(expected="w1", current="w2")
        with self.assertRaises(ValueError):
            wm.validate()

    def test_verify_and_publish_dedupe(self):
        storage = InMemoryStorage()
        dedupe = Deduper()
        dedupe.seen("idem-1")
        with self.assertRaises(ValueError):
            asyncio.run(
                verify_and_publish_artifact(
                    storage=storage,
                    wm=WatermarkHandshake(expected="w", current="w"),
                    dedupe=dedupe,
                    job_id="j",
                    lease_id="l",
                    object_key="k",
                    content=b"",
                    idempotency_key="idem-1",
                )
            )


if __name__ == "__main__":
    unittest.main()
@@ -0,0 +1,62 @@
from __future__ import annotations

import asyncio
import urllib.request
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class TransportRequest:
    method: str
    url: str
    headers: Dict[str, str]
    body: Optional[bytes]


@dataclass
class TransportResponse:
    status: int
    headers: Dict[str, str]
    body: bytes


class Transport:
    """Abstract transport interface for HTTP requests."""

    async def execute(self, request: TransportRequest) -> TransportResponse:  # pragma: no cover - interface
        raise NotImplementedError


class _StdlibTransport(Transport):
    def __init__(self, *, timeout: float = 10.0):
        self._timeout = timeout

    async def execute(self, request: TransportRequest) -> TransportResponse:
        def _do() -> TransportResponse:
            req = urllib.request.Request(
                request.url, data=request.body, method=request.method, headers=request.headers
            )
            with urllib.request.urlopen(req, timeout=self._timeout) as resp:  # nosec B310: controlled endpoint
                return TransportResponse(
                    status=resp.status,
                    headers=dict(resp.headers.items()),
                    body=resp.read(),
                )

        return await asyncio.to_thread(_do)


class InMemoryTransport(Transport):
    """Simple stub transport for tests that returns a prepared response."""

    def __init__(self, handler: Callable[[TransportRequest], TransportResponse]):
        self._handler = handler

    async def execute(self, request: TransportRequest) -> TransportResponse:
        return self._handler(request)


def default_transport() -> Transport:
    return _StdlibTransport()
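The in-memory stub lets tests exercise transport consumers without network access. A usage sketch (restating the request/response dataclasses and stub so the snippet runs standalone; the echo handler is a hypothetical example):

```python
import asyncio
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class TransportRequest:
    method: str
    url: str
    headers: Dict[str, str]
    body: Optional[bytes]


@dataclass
class TransportResponse:
    status: int
    headers: Dict[str, str]
    body: bytes


class InMemoryTransport:
    def __init__(self, handler: Callable[[TransportRequest], TransportResponse]):
        self._handler = handler

    async def execute(self, request: TransportRequest) -> TransportResponse:
        return self._handler(request)


def echo_handler(req: TransportRequest) -> TransportResponse:
    # Echo the request body back with a 200 status.
    return TransportResponse(status=200, headers={}, body=req.body or b"")


transport = InMemoryTransport(echo_handler)
resp = asyncio.run(transport.execute(
    TransportRequest(method="POST", url="https://example.test/jobs", headers={}, body=b"ping")
))
print(resp.status, resp.body)  # 200 b'ping'
```

Because the handler is an ordinary callable, a test can also capture the requests it receives to assert on method, URL, and headers.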
508
src/JobEngine/StellaOps.JobEngine.sln
Normal file
@@ -0,0 +1,508 @@
Microsoft Visual Studio Solution File, Format Version 12.00
# Visual Studio Version 17
VisualStudioVersion = 17.0.31903.59
MinimumVisualStudioVersion = 10.0.40219.1
Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "StellaOps.JobEngine", "StellaOps.JobEngine", "{0BD8BADA-1E00-7228-CA2D-F67E2A51EDC0}"
EndProject
Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "StellaOps.JobEngine.Core", "StellaOps.JobEngine.Core", "{C9C6ED3E-166F-F8A2-9ADB-D30271C31F89}"
EndProject
Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "StellaOps.JobEngine.Infrastructure", "StellaOps.JobEngine.Infrastructure", "{698ECAEE-58EE-22A4-23C3-A281DD9076DE}"
EndProject
Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "StellaOps.JobEngine.Tests", "StellaOps.JobEngine.Tests", "{43BD7CCE-81F1-671A-02CF-7BDE295E6D15}"
EndProject
Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "StellaOps.JobEngine.WebService", "StellaOps.JobEngine.WebService", "{7B5EBFF9-DCD8-4C3E-52B7-33A01F59BD96}"
EndProject
Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "StellaOps.JobEngine.Worker", "StellaOps.JobEngine.Worker", "{EEE65590-0DA5-BAFD-3BFC-6492600454B6}"
EndProject
Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "__External", "__External", "{5B52EF8A-3661-DCFF-797D-BC4D6AC60BDA}"
EndProject
Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "AirGap", "AirGap", "{F310596E-88BB-9E54-885E-21C61971917E}"
EndProject
Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "StellaOps.AirGap.Policy", "StellaOps.AirGap.Policy", "{D9492ED1-A812-924B-65E4-F518592B49BB}"
EndProject
Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "StellaOps.AirGap.Policy", "StellaOps.AirGap.Policy", "{3823DE1E-2ACE-C956-99E1-00DB786D9E1D}"
EndProject
Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "Router", "Router", "{FC018E5B-1E2F-DE19-1E97-0C845058C469}"
EndProject
Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "__Libraries", "__Libraries", "{1BE5B76C-B486-560B-6CB2-44C6537249AA}"
EndProject
Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "StellaOps.Messaging", "StellaOps.Messaging", "{F4F1CBE2-1CDD-CAA4-41F0-266DB4677C05}"
EndProject
Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "StellaOps.Messaging.Transport.InMemory", "StellaOps.Messaging.Transport.InMemory", "{8A8ABE17-5D77-260A-0393-3259C16EA732}"
EndProject
Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "StellaOps.Messaging.Transport.Postgres", "StellaOps.Messaging.Transport.Postgres", "{13CFAACB-89E7-1596-3B36-E39ECD8C2072}"
EndProject
Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "StellaOps.Messaging.Transport.Valkey", "StellaOps.Messaging.Transport.Valkey", "{6748B1AD-9881-8346-F454-058000A448E7}"
EndProject
Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "StellaOps.Microservice", "StellaOps.Microservice", "{3DE1DCDC-C845-4AC7-7B66-34B0A9E8626B}"
EndProject
Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "StellaOps.Microservice.AspNetCore", "StellaOps.Microservice.AspNetCore", "{6FA01E92-606B-0CB8-8583-6F693A903CFC}"
EndProject
Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "StellaOps.Router.AspNet", "StellaOps.Router.AspNet", "{A5994E92-7E0E-89FE-5628-DE1A0176B8BA}"
EndProject
Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "StellaOps.Router.Common", "StellaOps.Router.Common", "{54C11B29-4C54-7255-AB44-BEB63AF9BD1F}"
EndProject
Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "Telemetry", "Telemetry", "{E9A667F9-9627-4297-EF5E-0333593FDA14}"
EndProject
Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "StellaOps.Telemetry.Core", "StellaOps.Telemetry.Core", "{B81E0B20-6C85-AC09-1DB6-5BD6CBB8AA62}"
EndProject
Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "StellaOps.Telemetry.Core", "StellaOps.Telemetry.Core", "{74C64C1F-14F4-7B75-C354-9F252494A758}"
EndProject
Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "__Libraries", "__Libraries", "{1345DD29-BB3A-FB5F-4B3D-E29F6045A27A}"
EndProject
Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "StellaOps.Canonical.Json", "StellaOps.Canonical.Json", "{79E122F4-2325-3E92-438E-5825A307B594}"
EndProject
Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "StellaOps.Cryptography", "StellaOps.Cryptography", "{66557252-B5C4-664B-D807-07018C627474}"
EndProject
Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "StellaOps.DependencyInjection", "StellaOps.DependencyInjection", "{589A43FD-8213-E9E3-6CFF-9CBA72D53E98}"
EndProject
Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "StellaOps.Metrics", "StellaOps.Metrics", "{6DFCCD05-3039-AE97-5008-F38C440FB1A9}"
EndProject
Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "StellaOps.Plugin", "StellaOps.Plugin", "{772B02B5-6280-E1D4-3E2E-248D0455C2FB}"
EndProject
Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "StellaOps.TestKit", "StellaOps.TestKit", "{8380A20C-A5B8-EE91-1A58-270323688CB9}"
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "StellaOps.AirGap.Policy", "..\AirGap\StellaOps.AirGap.Policy\StellaOps.AirGap.Policy\StellaOps.AirGap.Policy.csproj", "{AD31623A-BC43-52C2-D906-AC1D8784A541}"
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "StellaOps.Canonical.Json", "..\__Libraries\StellaOps.Canonical.Json\StellaOps.Canonical.Json.csproj", "{AF9E7F02-25AD-3540-18D7-F6A4F8BA5A60}"
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "StellaOps.Cryptography", "..\__Libraries\StellaOps.Cryptography\StellaOps.Cryptography.csproj", "{F664A948-E352-5808-E780-77A03F19E93E}"
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "StellaOps.DependencyInjection", "..\__Libraries\StellaOps.DependencyInjection\StellaOps.DependencyInjection.csproj", "{632A1F0D-1BA5-C84B-B716-2BE638A92780}"
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "StellaOps.Messaging", "..\Router\__Libraries\StellaOps.Messaging\StellaOps.Messaging.csproj", "{97998C88-E6E1-D5E2-B632-537B58E00CBF}"
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "StellaOps.Messaging.Transport.InMemory", "..\Router\__Libraries\StellaOps.Messaging.Transport.InMemory\StellaOps.Messaging.Transport.InMemory.csproj", "{96279C16-30E6-95B0-7759-EBF32CCAB6F8}"
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "StellaOps.Messaging.Transport.Postgres", "..\Router\__Libraries\StellaOps.Messaging.Transport.Postgres\StellaOps.Messaging.Transport.Postgres.csproj", "{4CDE8730-52CD-45E3-44B8-5ED84B62AD5B}"
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "StellaOps.Messaging.Transport.Valkey", "..\Router\__Libraries\StellaOps.Messaging.Transport.Valkey\StellaOps.Messaging.Transport.Valkey.csproj", "{CB0EA9C0-9989-0BE2-EA0B-AF2D6803C1AB}"
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "StellaOps.Metrics", "..\__Libraries\StellaOps.Metrics\StellaOps.Metrics.csproj", "{5E060B4F-1CAE-5140-F5D3-6A077660BD1A}"
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "StellaOps.Microservice", "..\Router\__Libraries\StellaOps.Microservice\StellaOps.Microservice.csproj", "{BAD08D96-A80A-D27F-5D9C-656AEEB3D568}"
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "StellaOps.Microservice.AspNetCore", "..\Router\__Libraries\StellaOps.Microservice.AspNetCore\StellaOps.Microservice.AspNetCore.csproj", "{F63694F1-B56D-6E72-3F5D-5D38B1541F0F}"
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "StellaOps.JobEngine.Core", "StellaOps.JobEngine\StellaOps.JobEngine.Core\StellaOps.JobEngine.Core.csproj", "{783EF693-2851-C594-B1E4-784ADC73C8DE}"
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "StellaOps.JobEngine.Infrastructure", "StellaOps.JobEngine\StellaOps.JobEngine.Infrastructure\StellaOps.JobEngine.Infrastructure.csproj", "{245946A1-4AC0-69A3-52C2-19B102FA7D9F}"
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "StellaOps.JobEngine.Tests", "StellaOps.JobEngine\StellaOps.JobEngine.Tests\StellaOps.JobEngine.Tests.csproj", "{E1413BFB-C320-E54C-14B3-4600AC5A5A70}"
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "StellaOps.JobEngine.WebService", "StellaOps.JobEngine\StellaOps.JobEngine.WebService\StellaOps.JobEngine.WebService.csproj", "{B1C35286-4A4E-5677-A09F-4AD04ABB15D3}"
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "StellaOps.JobEngine.Worker", "StellaOps.JobEngine\StellaOps.JobEngine.Worker\StellaOps.JobEngine.Worker.csproj", "{D49617DE-10E1-78EF-0AE3-0E0EB1BCA01A}"
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "StellaOps.Plugin", "..\__Libraries\StellaOps.Plugin\StellaOps.Plugin.csproj", "{38A9EE9B-6FC8-93BC-0D43-2A906E678D66}"
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "StellaOps.Router.AspNet", "..\Router\__Libraries\StellaOps.Router.AspNet\StellaOps.Router.AspNet.csproj", "{79104479-B087-E5D0-5523-F1803282A246}"
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "StellaOps.Router.Common", "..\Router\__Libraries\StellaOps.Router.Common\StellaOps.Router.Common.csproj", "{F17A6F0B-3120-2BA9-84D8-5F8BA0B9705D}"
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "StellaOps.Telemetry.Core", "..\Telemetry\StellaOps.Telemetry.Core\StellaOps.Telemetry.Core\StellaOps.Telemetry.Core.csproj", "{8CD19568-1638-B8F6-8447-82CFD4F17ADF}"
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "StellaOps.TestKit", "..\__Libraries\StellaOps.TestKit\StellaOps.TestKit.csproj", "{AF043113-CCE3-59C1-DF71-9804155F26A8}"
EndProject
Global
	GlobalSection(SolutionConfigurationPlatforms) = preSolution
		Debug|Any CPU = Debug|Any CPU
		Release|Any CPU = Release|Any CPU
	EndGlobalSection
	GlobalSection(ProjectConfigurationPlatforms) = postSolution
		{AD31623A-BC43-52C2-D906-AC1D8784A541}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
		{AD31623A-BC43-52C2-D906-AC1D8784A541}.Debug|Any CPU.Build.0 = Debug|Any CPU
		{AD31623A-BC43-52C2-D906-AC1D8784A541}.Release|Any CPU.ActiveCfg = Release|Any CPU
		{AD31623A-BC43-52C2-D906-AC1D8784A541}.Release|Any CPU.Build.0 = Release|Any CPU
		{AF9E7F02-25AD-3540-18D7-F6A4F8BA5A60}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
		{AF9E7F02-25AD-3540-18D7-F6A4F8BA5A60}.Debug|Any CPU.Build.0 = Debug|Any CPU
		{AF9E7F02-25AD-3540-18D7-F6A4F8BA5A60}.Release|Any CPU.ActiveCfg = Release|Any CPU
		{AF9E7F02-25AD-3540-18D7-F6A4F8BA5A60}.Release|Any CPU.Build.0 = Release|Any CPU
		{F664A948-E352-5808-E780-77A03F19E93E}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
		{F664A948-E352-5808-E780-77A03F19E93E}.Debug|Any CPU.Build.0 = Debug|Any CPU
		{F664A948-E352-5808-E780-77A03F19E93E}.Release|Any CPU.ActiveCfg = Release|Any CPU
		{F664A948-E352-5808-E780-77A03F19E93E}.Release|Any CPU.Build.0 = Release|Any CPU
		{632A1F0D-1BA5-C84B-B716-2BE638A92780}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
		{632A1F0D-1BA5-C84B-B716-2BE638A92780}.Debug|Any CPU.Build.0 = Debug|Any CPU
		{632A1F0D-1BA5-C84B-B716-2BE638A92780}.Release|Any CPU.ActiveCfg = Release|Any CPU
		{632A1F0D-1BA5-C84B-B716-2BE638A92780}.Release|Any CPU.Build.0 = Release|Any CPU
		{97998C88-E6E1-D5E2-B632-537B58E00CBF}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
		{97998C88-E6E1-D5E2-B632-537B58E00CBF}.Debug|Any CPU.Build.0 = Debug|Any CPU
		{97998C88-E6E1-D5E2-B632-537B58E00CBF}.Release|Any CPU.ActiveCfg = Release|Any CPU
		{97998C88-E6E1-D5E2-B632-537B58E00CBF}.Release|Any CPU.Build.0 = Release|Any CPU
		{96279C16-30E6-95B0-7759-EBF32CCAB6F8}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
		{96279C16-30E6-95B0-7759-EBF32CCAB6F8}.Debug|Any CPU.Build.0 = Debug|Any CPU
		{96279C16-30E6-95B0-7759-EBF32CCAB6F8}.Release|Any CPU.ActiveCfg = Release|Any CPU
		{96279C16-30E6-95B0-7759-EBF32CCAB6F8}.Release|Any CPU.Build.0 = Release|Any CPU
		{4CDE8730-52CD-45E3-44B8-5ED84B62AD5B}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
		{4CDE8730-52CD-45E3-44B8-5ED84B62AD5B}.Debug|Any CPU.Build.0 = Debug|Any CPU
		{4CDE8730-52CD-45E3-44B8-5ED84B62AD5B}.Release|Any CPU.ActiveCfg = Release|Any CPU
		{4CDE8730-52CD-45E3-44B8-5ED84B62AD5B}.Release|Any CPU.Build.0 = Release|Any CPU
		{CB0EA9C0-9989-0BE2-EA0B-AF2D6803C1AB}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
		{CB0EA9C0-9989-0BE2-EA0B-AF2D6803C1AB}.Debug|Any CPU.Build.0 = Debug|Any CPU
		{CB0EA9C0-9989-0BE2-EA0B-AF2D6803C1AB}.Release|Any CPU.ActiveCfg = Release|Any CPU
		{CB0EA9C0-9989-0BE2-EA0B-AF2D6803C1AB}.Release|Any CPU.Build.0 = Release|Any CPU
		{5E060B4F-1CAE-5140-F5D3-6A077660BD1A}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
		{5E060B4F-1CAE-5140-F5D3-6A077660BD1A}.Debug|Any CPU.Build.0 = Debug|Any CPU
		{5E060B4F-1CAE-5140-F5D3-6A077660BD1A}.Release|Any CPU.ActiveCfg = Release|Any CPU
		{5E060B4F-1CAE-5140-F5D3-6A077660BD1A}.Release|Any CPU.Build.0 = Release|Any CPU
		{BAD08D96-A80A-D27F-5D9C-656AEEB3D568}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
		{BAD08D96-A80A-D27F-5D9C-656AEEB3D568}.Debug|Any CPU.Build.0 = Debug|Any CPU
		{BAD08D96-A80A-D27F-5D9C-656AEEB3D568}.Release|Any CPU.ActiveCfg = Release|Any CPU
		{BAD08D96-A80A-D27F-5D9C-656AEEB3D568}.Release|Any CPU.Build.0 = Release|Any CPU
		{F63694F1-B56D-6E72-3F5D-5D38B1541F0F}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
		{F63694F1-B56D-6E72-3F5D-5D38B1541F0F}.Debug|Any CPU.Build.0 = Debug|Any CPU
		{F63694F1-B56D-6E72-3F5D-5D38B1541F0F}.Release|Any CPU.ActiveCfg = Release|Any CPU
		{F63694F1-B56D-6E72-3F5D-5D38B1541F0F}.Release|Any CPU.Build.0 = Release|Any CPU
		{783EF693-2851-C594-B1E4-784ADC73C8DE}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
		{783EF693-2851-C594-B1E4-784ADC73C8DE}.Debug|Any CPU.Build.0 = Debug|Any CPU
		{783EF693-2851-C594-B1E4-784ADC73C8DE}.Release|Any CPU.ActiveCfg = Release|Any CPU
		{783EF693-2851-C594-B1E4-784ADC73C8DE}.Release|Any CPU.Build.0 = Release|Any CPU
		{245946A1-4AC0-69A3-52C2-19B102FA7D9F}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
		{245946A1-4AC0-69A3-52C2-19B102FA7D9F}.Debug|Any CPU.Build.0 = Debug|Any CPU
		{245946A1-4AC0-69A3-52C2-19B102FA7D9F}.Release|Any CPU.ActiveCfg = Release|Any CPU
		{245946A1-4AC0-69A3-52C2-19B102FA7D9F}.Release|Any CPU.Build.0 = Release|Any CPU
		{E1413BFB-C320-E54C-14B3-4600AC5A5A70}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
		{E1413BFB-C320-E54C-14B3-4600AC5A5A70}.Debug|Any CPU.Build.0 = Debug|Any CPU
		{E1413BFB-C320-E54C-14B3-4600AC5A5A70}.Release|Any CPU.ActiveCfg = Release|Any CPU
		{E1413BFB-C320-E54C-14B3-4600AC5A5A70}.Release|Any CPU.Build.0 = Release|Any CPU
		{B1C35286-4A4E-5677-A09F-4AD04ABB15D3}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
		{B1C35286-4A4E-5677-A09F-4AD04ABB15D3}.Debug|Any CPU.Build.0 = Debug|Any CPU
		{B1C35286-4A4E-5677-A09F-4AD04ABB15D3}.Release|Any CPU.ActiveCfg = Release|Any CPU
		{B1C35286-4A4E-5677-A09F-4AD04ABB15D3}.Release|Any CPU.Build.0 = Release|Any CPU
		{D49617DE-10E1-78EF-0AE3-0E0EB1BCA01A}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
		{D49617DE-10E1-78EF-0AE3-0E0EB1BCA01A}.Debug|Any CPU.Build.0 = Debug|Any CPU
		{D49617DE-10E1-78EF-0AE3-0E0EB1BCA01A}.Release|Any CPU.ActiveCfg = Release|Any CPU
		{D49617DE-10E1-78EF-0AE3-0E0EB1BCA01A}.Release|Any CPU.Build.0 = Release|Any CPU
		{38A9EE9B-6FC8-93BC-0D43-2A906E678D66}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
		{38A9EE9B-6FC8-93BC-0D43-2A906E678D66}.Debug|Any CPU.Build.0 = Debug|Any CPU
		{38A9EE9B-6FC8-93BC-0D43-2A906E678D66}.Release|Any CPU.ActiveCfg = Release|Any CPU
		{38A9EE9B-6FC8-93BC-0D43-2A906E678D66}.Release|Any CPU.Build.0 = Release|Any CPU
		{79104479-B087-E5D0-5523-F1803282A246}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
		{79104479-B087-E5D0-5523-F1803282A246}.Debug|Any CPU.Build.0 = Debug|Any CPU
		{79104479-B087-E5D0-5523-F1803282A246}.Release|Any CPU.ActiveCfg = Release|Any CPU
		{79104479-B087-E5D0-5523-F1803282A246}.Release|Any CPU.Build.0 = Release|Any CPU
		{F17A6F0B-3120-2BA9-84D8-5F8BA0B9705D}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
		{F17A6F0B-3120-2BA9-84D8-5F8BA0B9705D}.Debug|Any CPU.Build.0 = Debug|Any CPU
		{F17A6F0B-3120-2BA9-84D8-5F8BA0B9705D}.Release|Any CPU.ActiveCfg = Release|Any CPU
		{F17A6F0B-3120-2BA9-84D8-5F8BA0B9705D}.Release|Any CPU.Build.0 = Release|Any CPU
		{8CD19568-1638-B8F6-8447-82CFD4F17ADF}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
		{8CD19568-1638-B8F6-8447-82CFD4F17ADF}.Debug|Any CPU.Build.0 = Debug|Any CPU
		{8CD19568-1638-B8F6-8447-82CFD4F17ADF}.Release|Any CPU.ActiveCfg = Release|Any CPU
		{8CD19568-1638-B8F6-8447-82CFD4F17ADF}.Release|Any CPU.Build.0 = Release|Any CPU
		{AF043113-CCE3-59C1-DF71-9804155F26A8}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
		{AF043113-CCE3-59C1-DF71-9804155F26A8}.Debug|Any CPU.Build.0 = Debug|Any CPU
		{AF043113-CCE3-59C1-DF71-9804155F26A8}.Release|Any CPU.ActiveCfg = Release|Any CPU
		{AF043113-CCE3-59C1-DF71-9804155F26A8}.Release|Any CPU.Build.0 = Release|Any CPU
	EndGlobalSection
	GlobalSection(SolutionProperties) = preSolution
		HideSolutionNode = FALSE
	EndGlobalSection
	GlobalSection(NestedProjects) = preSolution
		{C9C6ED3E-166F-F8A2-9ADB-D30271C31F89} = {0BD8BADA-1E00-7228-CA2D-F67E2A51EDC0}
		{698ECAEE-58EE-22A4-23C3-A281DD9076DE} = {0BD8BADA-1E00-7228-CA2D-F67E2A51EDC0}
		{43BD7CCE-81F1-671A-02CF-7BDE295E6D15} = {0BD8BADA-1E00-7228-CA2D-F67E2A51EDC0}
		{7B5EBFF9-DCD8-4C3E-52B7-33A01F59BD96} = {0BD8BADA-1E00-7228-CA2D-F67E2A51EDC0}
		{EEE65590-0DA5-BAFD-3BFC-6492600454B6} = {0BD8BADA-1E00-7228-CA2D-F67E2A51EDC0}
		{F310596E-88BB-9E54-885E-21C61971917E} = {5B52EF8A-3661-DCFF-797D-BC4D6AC60BDA}
		{D9492ED1-A812-924B-65E4-F518592B49BB} = {F310596E-88BB-9E54-885E-21C61971917E}
		{3823DE1E-2ACE-C956-99E1-00DB786D9E1D} = {D9492ED1-A812-924B-65E4-F518592B49BB}
		{FC018E5B-1E2F-DE19-1E97-0C845058C469} = {5B52EF8A-3661-DCFF-797D-BC4D6AC60BDA}
		{1BE5B76C-B486-560B-6CB2-44C6537249AA} = {FC018E5B-1E2F-DE19-1E97-0C845058C469}
		{F4F1CBE2-1CDD-CAA4-41F0-266DB4677C05} = {1BE5B76C-B486-560B-6CB2-44C6537249AA}
		{8A8ABE17-5D77-260A-0393-3259C16EA732} = {1BE5B76C-B486-560B-6CB2-44C6537249AA}
		{13CFAACB-89E7-1596-3B36-E39ECD8C2072} = {1BE5B76C-B486-560B-6CB2-44C6537249AA}
		{6748B1AD-9881-8346-F454-058000A448E7} = {1BE5B76C-B486-560B-6CB2-44C6537249AA}
		{3DE1DCDC-C845-4AC7-7B66-34B0A9E8626B} = {1BE5B76C-B486-560B-6CB2-44C6537249AA}
		{6FA01E92-606B-0CB8-8583-6F693A903CFC} = {1BE5B76C-B486-560B-6CB2-44C6537249AA}
		{A5994E92-7E0E-89FE-5628-DE1A0176B8BA} = {1BE5B76C-B486-560B-6CB2-44C6537249AA}
		{54C11B29-4C54-7255-AB44-BEB63AF9BD1F} = {1BE5B76C-B486-560B-6CB2-44C6537249AA}
		{E9A667F9-9627-4297-EF5E-0333593FDA14} = {5B52EF8A-3661-DCFF-797D-BC4D6AC60BDA}
		{B81E0B20-6C85-AC09-1DB6-5BD6CBB8AA62} = {E9A667F9-9627-4297-EF5E-0333593FDA14}
		{74C64C1F-14F4-7B75-C354-9F252494A758} = {B81E0B20-6C85-AC09-1DB6-5BD6CBB8AA62}
		{1345DD29-BB3A-FB5F-4B3D-E29F6045A27A} = {5B52EF8A-3661-DCFF-797D-BC4D6AC60BDA}
		{79E122F4-2325-3E92-438E-5825A307B594} = {1345DD29-BB3A-FB5F-4B3D-E29F6045A27A}
		{66557252-B5C4-664B-D807-07018C627474} = {1345DD29-BB3A-FB5F-4B3D-E29F6045A27A}
		{589A43FD-8213-E9E3-6CFF-9CBA72D53E98} = {1345DD29-BB3A-FB5F-4B3D-E29F6045A27A}
		{6DFCCD05-3039-AE97-5008-F38C440FB1A9} = {1345DD29-BB3A-FB5F-4B3D-E29F6045A27A}
		{772B02B5-6280-E1D4-3E2E-248D0455C2FB} = {1345DD29-BB3A-FB5F-4B3D-E29F6045A27A}
		{8380A20C-A5B8-EE91-1A58-270323688CB9} = {1345DD29-BB3A-FB5F-4B3D-E29F6045A27A}
		{AD31623A-BC43-52C2-D906-AC1D8784A541} = {3823DE1E-2ACE-C956-99E1-00DB786D9E1D}
		{AF9E7F02-25AD-3540-18D7-F6A4F8BA5A60} = {79E122F4-2325-3E92-438E-5825A307B594}
		{F664A948-E352-5808-E780-77A03F19E93E} = {66557252-B5C4-664B-D807-07018C627474}
		{632A1F0D-1BA5-C84B-B716-2BE638A92780} = {589A43FD-8213-E9E3-6CFF-9CBA72D53E98}
		{97998C88-E6E1-D5E2-B632-537B58E00CBF} = {F4F1CBE2-1CDD-CAA4-41F0-266DB4677C05}
		{96279C16-30E6-95B0-7759-EBF32CCAB6F8} = {8A8ABE17-5D77-260A-0393-3259C16EA732}
		{4CDE8730-52CD-45E3-44B8-5ED84B62AD5B} = {13CFAACB-89E7-1596-3B36-E39ECD8C2072}
		{CB0EA9C0-9989-0BE2-EA0B-AF2D6803C1AB} = {6748B1AD-9881-8346-F454-058000A448E7}
		{5E060B4F-1CAE-5140-F5D3-6A077660BD1A} = {6DFCCD05-3039-AE97-5008-F38C440FB1A9}
		{BAD08D96-A80A-D27F-5D9C-656AEEB3D568} = {3DE1DCDC-C845-4AC7-7B66-34B0A9E8626B}
		{F63694F1-B56D-6E72-3F5D-5D38B1541F0F} = {6FA01E92-606B-0CB8-8583-6F693A903CFC}
		{783EF693-2851-C594-B1E4-784ADC73C8DE} = {C9C6ED3E-166F-F8A2-9ADB-D30271C31F89}
		{245946A1-4AC0-69A3-52C2-19B102FA7D9F} = {698ECAEE-58EE-22A4-23C3-A281DD9076DE}
		{E1413BFB-C320-E54C-14B3-4600AC5A5A70} = {43BD7CCE-81F1-671A-02CF-7BDE295E6D15}
		{B1C35286-4A4E-5677-A09F-4AD04ABB15D3} = {7B5EBFF9-DCD8-4C3E-52B7-33A01F59BD96}
		{D49617DE-10E1-78EF-0AE3-0E0EB1BCA01A} = {EEE65590-0DA5-BAFD-3BFC-6492600454B6}
		{38A9EE9B-6FC8-93BC-0D43-2A906E678D66} = {772B02B5-6280-E1D4-3E2E-248D0455C2FB}
		{79104479-B087-E5D0-5523-F1803282A246} = {A5994E92-7E0E-89FE-5628-DE1A0176B8BA}
		{F17A6F0B-3120-2BA9-84D8-5F8BA0B9705D} = {54C11B29-4C54-7255-AB44-BEB63AF9BD1F}
		{8CD19568-1638-B8F6-8447-82CFD4F17ADF} = {74C64C1F-14F4-7B75-C354-9F252494A758}
		{AF043113-CCE3-59C1-DF71-9804155F26A8} = {8380A20C-A5B8-EE91-1A58-270323688CB9}
	EndGlobalSection
	GlobalSection(ExtensibilityGlobals) = postSolution
		SolutionGuid = {448CB79D-193F-8952-2F87-43B50BC2B101}
	EndGlobalSection
EndGlobal
30
src/JobEngine/StellaOps.JobEngine/AGENTS.md
Normal file
@@ -0,0 +1,30 @@
# StellaOps JobEngine Service — Agent Charter

## Mission
Build and operate the Source & Job control plane (JobEngine) described in Epic 9. Own the scheduler, job state persistence, rate limiting, audit/provenance exports, and realtime streaming APIs, and apply this class of work consistently everywhere it belongs.

## Key Responsibilities
- Maintain deterministic Postgres schema/migrations for sources, runs, jobs, DAG edges, artifacts, quotas, and schedules.
- Implement the DAG planner, token-bucket rate limiting, watermark/backfill manager, dead-letter replay, and horizontal scale guards.
- Publish REST + WebSocket/SSE APIs powering Console/CLI, capture audit trails, and guard tenant isolation/RBAC scopes.
- Coordinate with the Worker SDK, Concelier, Excititor, SBOM, Policy, VEX Lens, Findings Ledger, Authority, Console, CLI, DevOps, and Docs teams to keep integrations in sync.
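The token-bucket rate limiting mentioned above can be sketched as follows. This is a minimal single-threaded illustration with hypothetical `capacity`/`rate` parameters, not the service's actual implementation:

```python
import time


class TokenBucket:
    """Allow a burst of `capacity` tokens, refilling at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float):
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity
        self.last = time.monotonic()

    def try_acquire(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


bucket = TokenBucket(capacity=2, rate=0.0)  # no refill, so only the burst is available
print([bucket.try_acquire() for _ in range(3)])  # [True, True, False]
```

A production limiter would additionally need per-tenant buckets and atomic refill/acquire under concurrency (e.g. a lock or a Postgres/Redis-backed counter).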
## Module Layout
- `StellaOps.JobEngine.Core/` — scheduler primitives, DAG models, rate-limit policies.
- `StellaOps.JobEngine.Infrastructure/` — Postgres DAL, queue integrations, telemetry shims.
- `StellaOps.JobEngine.WebService/` — control-plane APIs (sources, runs, jobs, streams).
- `StellaOps.JobEngine.Worker/` — execution coordinator / lease manager loops.
- `StellaOps.JobEngine.Tests/` — unit tests for core/infrastructure concerns.
- `StellaOps.JobEngine.sln` — solution bundling the JobEngine components.

## Required Reading
- `docs/modules/jobengine/architecture.md`
- `docs/modules/platform/architecture-overview.md`

## Working Agreement
1. Update task status to `DOING`/`DONE` in both the corresponding sprint file `/docs/implplan/SPRINT_*.md` and the local `TASKS.md` when you start or finish work.
2. Review this charter and the Required Reading documents before coding; confirm prerequisites are met.
3. Keep changes deterministic (stable ordering, timestamps, hashes) and align with offline/air-gap expectations.
4. Coordinate doc updates, tests, and cross-guild communication whenever contracts or workflows change.
5. Revert to `TODO` if you pause a task without shipping changes; leave notes in commit/PR descriptions for context.
6. **Contract guardrails:** Pack-run scheduling now requires `projectId` plus tenant headers; reject with HTTP 422 if absent. Keep OpenAPI examples and worker/CLI samples aligned. Preserve idempotency semantics (`Idempotency-Key`) and deterministic pagination/stream ordering in all APIs.
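The guardrail above (require `projectId` and tenant headers, honor `Idempotency-Key`, answer 422 otherwise) can be sketched as a request check. The header name `X-Tenant-Id` and the return shape are hypothetical illustrations, not the service's actual contract:

```python
def validate_pack_run(headers: dict, body: dict) -> tuple[int, str]:
    """Return (status, message) for a pack-run scheduling request."""
    if not headers.get("X-Tenant-Id"):        # hypothetical tenant header name
        return 422, "missing tenant header"
    if not body.get("projectId"):
        return 422, "projectId is required"
    if not headers.get("Idempotency-Key"):
        return 422, "Idempotency-Key is required"
    return 202, "scheduled"


print(validate_pack_run(
    {"X-Tenant-Id": "t1", "Idempotency-Key": "k1"},
    {"projectId": "p1"},
))  # (202, 'scheduled')
```

Rejecting before enqueue keeps the failure deterministic and visible to callers instead of surfacing later inside the worker.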
@@ -0,0 +1,13 @@
# StellaOps.JobEngine.Core Agent Charter

## Mission
Provide core orchestration domain logic, scheduling, and evidence helpers.

## Required Reading
- docs/modules/jobengine/architecture.md
- docs/modules/platform/architecture-overview.md

## Working Agreement
- Update sprint status in docs/implplan/SPRINT_*.md and local TASKS.md.
- Keep behavior deterministic (stable ordering, timestamps, hashes).
- Add or update tests for core scheduling/evidence logic.
@@ -0,0 +1,412 @@
|
||||
|
||||
using Microsoft.Extensions.Logging;
|
||||
using StellaOps.JobEngine.Core.Domain.AirGap;
|
||||
using System.Text.Json;
|
||||
using System.Text.RegularExpressions;
|
||||
|
||||
namespace StellaOps.JobEngine.Core.AirGap;
|
||||
|
||||
/// <summary>
|
||||
/// Validates network intents declared in job payloads.
|
||||
/// Per ORCH-AIRGAP-56-001: Enforce job descriptors to declare network intents.
|
||||
/// </summary>
|
||||
public interface INetworkIntentValidator
|
||||
{
|
||||
/// <summary>
|
||||
/// Validates network intents for a job payload.
|
||||
/// </summary>
|
||||
/// <param name="jobType">The job type.</param>
|
||||
/// <param name="payload">The job payload JSON.</param>
|
||||
/// <param name="config">Network intent configuration.</param>
|
||||
/// <param name="isSealed">Whether the environment is in sealed mode.</param>
|
||||
/// <returns>Validation result.</returns>
|
||||
NetworkIntentValidationResult ValidateForJob(
|
||||
string jobType,
|
||||
string payload,
|
||||
NetworkIntentConfig config,
|
||||
bool isSealed);
|
||||
|
||||
/// <summary>
|
||||
/// Extracts network endpoints from a job payload.
|
||||
/// </summary>
|
||||
/// <param name="payload">The job payload JSON.</param>
|
||||
/// <returns>List of detected network endpoints.</returns>
|
||||
IReadOnlyList<string> ExtractNetworkEndpoints(string payload);
|
||||
|
||||
/// <summary>
|
||||
/// Extracts declared network intents from a job payload.
|
||||
/// </summary>
|
||||
/// <param name="payload">The job payload JSON.</param>
|
||||
/// <returns>List of declared network intents.</returns>
|
||||
IReadOnlyList<NetworkIntent> ExtractDeclaredIntents(string payload);
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// Default implementation of network intent validator.
|
||||
/// </summary>
|
||||
public sealed partial class NetworkIntentValidator : INetworkIntentValidator
|
||||
{
|
||||
private readonly ILogger<NetworkIntentValidator> _logger;
|
||||
|
||||
// Common URL/endpoint field names in payloads
|
||||
private static readonly string[] UrlFieldNames =
|
||||
[
|
||||
"destinationUri",
|
||||
"callbackUrl",
|
||||
"webhookUrl",
|
||||
"endpoint",
|
||||
"url",
|
||||
"uri",
|
||||
"host",
|
||||
"server",
|
||||
"apiUrl",
|
||||
"serviceUrl",
|
||||
"notifyUrl",
|
||||
"targetUrl",
|
||||
"registryUrl",
|
||||
"collectorEndpoint"
|
||||
];
|
||||
|
||||
public NetworkIntentValidator(ILogger<NetworkIntentValidator> logger)
|
||||
{
|
||||
_logger = logger ?? throw new ArgumentNullException(nameof(logger));
|
||||
}
|
||||
|
||||
    /// <inheritdoc/>
    public NetworkIntentValidationResult ValidateForJob(
        string jobType,
        string payload,
        NetworkIntentConfig config,
        bool isSealed)
    {
        ArgumentException.ThrowIfNullOrEmpty(jobType);
        ArgumentException.ThrowIfNullOrEmpty(payload);
        ArgumentNullException.ThrowIfNull(config);

        // If enforcement is disabled, always pass
        if (config.EnforcementMode == EnforcementMode.Disabled)
        {
            _logger.LogDebug("Network intent enforcement disabled for job type {JobType}", jobType);
            return NetworkIntentValidationResult.Success();
        }

        // If not in sealed mode and not requiring explicit intents, pass
        if (!isSealed && !config.RequireExplicitIntents)
        {
            return NetworkIntentValidationResult.Success();
        }

        var detectedEndpoints = ExtractNetworkEndpoints(payload);
        var declaredIntents = ExtractDeclaredIntents(payload);

        // If no network endpoints detected, pass
        if (detectedEndpoints.Count == 0)
        {
            return NetworkIntentValidationResult.Success();
        }

        var violations = new List<NetworkIntentViolation>();
        var shouldBlock = config.EnforcementMode == EnforcementMode.Strict && isSealed;

        // Check for undeclared endpoints (if requiring explicit intents)
        if (config.RequireExplicitIntents)
        {
            var declaredHosts = declaredIntents
                .Select(i => i.Host.ToLowerInvariant())
                .ToHashSet();

            foreach (var endpoint in detectedEndpoints)
            {
                var host = ExtractHostFromEndpoint(endpoint);
                if (host is not null && !declaredHosts.Contains(host.ToLowerInvariant()))
                {
                    // Check if any declared intent pattern matches
                    var matchingIntent = declaredIntents.FirstOrDefault(i =>
                        HostMatchesPattern(host, i.Host));

                    if (matchingIntent is null)
                    {
                        violations.Add(new NetworkIntentViolation(
                            endpoint,
                            NetworkViolationType.MissingIntent,
                            null));
                    }
                }
            }
        }
        // In sealed mode, validate declared intents against allowlist
        if (isSealed && config.Allowlist is { Count: > 0 })
        {
            foreach (var intent in declaredIntents)
            {
                var isAllowed = config.Allowlist.Any(entry => intent.MatchesAllowlistEntry(entry));
                if (!isAllowed)
                {
                    violations.Add(new NetworkIntentViolation(
                        $"{intent.Protocol}://{intent.Host}:{intent.Port ?? 443}",
                        NetworkViolationType.NotInAllowlist,
                        intent));
                }
            }
        }
        else if (isSealed && (config.Allowlist is null || config.Allowlist.Count == 0))
        {
            // Sealed mode with no allowlist - all external network access is blocked
            foreach (var intent in declaredIntents)
            {
                violations.Add(new NetworkIntentViolation(
                    $"{intent.Protocol}://{intent.Host}:{intent.Port ?? 443}",
                    NetworkViolationType.NotInAllowlist,
                    intent));
            }
        }

        // Check for blocked protocols
        if (config.BlockedProtocols is { Count: > 0 })
        {
            foreach (var intent in declaredIntents)
            {
                if (config.BlockedProtocols.Contains(intent.Protocol, StringComparer.OrdinalIgnoreCase))
                {
                    violations.Add(new NetworkIntentViolation(
                        $"{intent.Protocol}://{intent.Host}",
                        NetworkViolationType.BlockedProtocol,
                        intent));
                }
            }
        }
        if (violations.Count == 0)
        {
            return NetworkIntentValidationResult.Success();
        }

        // Log violations
        foreach (var violation in violations)
        {
            if (shouldBlock)
            {
                _logger.LogWarning(
                    "Network intent violation for job type {JobType}: {ViolationType} - {Endpoint}",
                    jobType, violation.ViolationType, violation.Endpoint);
            }
            else
            {
                _logger.LogInformation(
                    "Network intent warning for job type {JobType}: {ViolationType} - {Endpoint}",
                    jobType, violation.ViolationType, violation.Endpoint);
            }
        }

        // Build result based on violation types
        var hasMissingIntents = violations.Any(v => v.ViolationType == NetworkViolationType.MissingIntent);
        var hasDisallowed = violations.Any(v => v.ViolationType == NetworkViolationType.NotInAllowlist);

        if (hasMissingIntents && !hasDisallowed)
        {
            var missingEndpoints = violations
                .Where(v => v.ViolationType == NetworkViolationType.MissingIntent)
                .Select(v => v.Endpoint)
                .ToList();
            return NetworkIntentValidationResult.MissingIntents(missingEndpoints, shouldBlock);
        }

        return NetworkIntentValidationResult.DisallowedIntents(violations, shouldBlock);
    }
    /// <inheritdoc/>
    public IReadOnlyList<string> ExtractNetworkEndpoints(string payload)
    {
        var endpoints = new HashSet<string>(StringComparer.OrdinalIgnoreCase);

        try
        {
            using var doc = JsonDocument.Parse(payload);
            ExtractEndpointsFromElement(doc.RootElement, endpoints);
        }
        catch (JsonException ex)
        {
            _logger.LogDebug(ex, "Failed to parse payload as JSON for endpoint extraction");
        }

        return [.. endpoints];
    }
    /// <inheritdoc/>
    public IReadOnlyList<NetworkIntent> ExtractDeclaredIntents(string payload)
    {
        try
        {
            using var doc = JsonDocument.Parse(payload);
            var root = doc.RootElement;

            // Look for "networkIntents" array in the payload
            if (root.TryGetProperty("networkIntents", out var intentsElement) &&
                intentsElement.ValueKind == JsonValueKind.Array)
            {
                var intents = new List<NetworkIntent>();
                foreach (var intentElement in intentsElement.EnumerateArray())
                {
                    var intent = ParseNetworkIntent(intentElement);
                    if (intent is not null)
                    {
                        intents.Add(intent);
                    }
                }
                return intents;
            }

            // Also check the snake_case variant
            if (root.TryGetProperty("network_intents", out var intentsElement2) &&
                intentsElement2.ValueKind == JsonValueKind.Array)
            {
                var intents = new List<NetworkIntent>();
                foreach (var intentElement in intentsElement2.EnumerateArray())
                {
                    var intent = ParseNetworkIntent(intentElement);
                    if (intent is not null)
                    {
                        intents.Add(intent);
                    }
                }
                return intents;
            }
        }
        catch (JsonException ex)
        {
            _logger.LogDebug(ex, "Failed to parse payload as JSON for intent extraction");
        }

        return [];
    }
    private static NetworkIntent? ParseNetworkIntent(JsonElement element)
    {
        if (element.ValueKind != JsonValueKind.Object)
            return null;

        string? host = null;
        int? port = null;
        string protocol = "https";
        string purpose = "unspecified";
        var direction = NetworkDirection.Egress;

        if (element.TryGetProperty("host", out var hostProp))
            host = hostProp.GetString();

        if (element.TryGetProperty("port", out var portProp) && portProp.TryGetInt32(out var portValue))
            port = portValue;

        if (element.TryGetProperty("protocol", out var protocolProp))
            protocol = protocolProp.GetString() ?? "https";

        if (element.TryGetProperty("purpose", out var purposeProp))
            purpose = purposeProp.GetString() ?? "unspecified";

        if (element.TryGetProperty("direction", out var directionProp))
        {
            var dirStr = directionProp.GetString();
            if (string.Equals(dirStr, "ingress", StringComparison.OrdinalIgnoreCase))
                direction = NetworkDirection.Ingress;
        }

        return host is not null
            ? new NetworkIntent(host, port, protocol, purpose, direction)
            : null;
    }
    private void ExtractEndpointsFromElement(JsonElement element, HashSet<string> endpoints)
    {
        switch (element.ValueKind)
        {
            case JsonValueKind.Object:
                foreach (var property in element.EnumerateObject())
                {
                    // Check if this is a URL field
                    if (IsUrlFieldName(property.Name) &&
                        property.Value.ValueKind == JsonValueKind.String)
                    {
                        var value = property.Value.GetString();
                        if (!string.IsNullOrEmpty(value) && IsNetworkEndpoint(value))
                        {
                            endpoints.Add(value);
                        }
                    }
                    else
                    {
                        ExtractEndpointsFromElement(property.Value, endpoints);
                    }
                }
                break;

            case JsonValueKind.Array:
                foreach (var item in element.EnumerateArray())
                {
                    ExtractEndpointsFromElement(item, endpoints);
                }
                break;

            case JsonValueKind.String:
                var stringValue = element.GetString();
                if (!string.IsNullOrEmpty(stringValue) && IsNetworkEndpoint(stringValue))
                {
                    endpoints.Add(stringValue);
                }
                break;
        }
    }
    private static bool IsUrlFieldName(string fieldName)
    {
        return UrlFieldNames.Any(name =>
            fieldName.Contains(name, StringComparison.OrdinalIgnoreCase));
    }

    private static bool IsNetworkEndpoint(string value)
    {
        // Check for URL patterns
        if (Uri.TryCreate(value, UriKind.Absolute, out var uri))
        {
            return uri.Scheme is "http" or "https" or "grpc" or "grpcs";
        }

        // Check for host:port patterns
        return HostPortRegex().IsMatch(value);
    }

    private static string? ExtractHostFromEndpoint(string endpoint)
    {
        if (Uri.TryCreate(endpoint, UriKind.Absolute, out var uri))
        {
            return uri.Host;
        }

        // Try host:port format
        var match = HostPortRegex().Match(endpoint);
        if (match.Success)
        {
            return match.Groups[1].Value;
        }

        return null;
    }

    private static bool HostMatchesPattern(string host, string pattern)
    {
        if (string.Equals(pattern, "*", StringComparison.Ordinal))
            return true;

        if (pattern.StartsWith("*.", StringComparison.Ordinal))
        {
            var suffix = pattern[1..];
            return host.EndsWith(suffix, StringComparison.OrdinalIgnoreCase) ||
                   string.Equals(host, pattern[2..], StringComparison.OrdinalIgnoreCase);
        }

        return string.Equals(host, pattern, StringComparison.OrdinalIgnoreCase);
    }

    // Group 1 captures the full host (not just the last label) so that
    // ExtractHostFromEndpoint can read it via match.Groups[1].
    [GeneratedRegex(@"^((?:[a-zA-Z0-9][-a-zA-Z0-9]*\.)+[a-zA-Z]{2,})(:\d+)?$")]
    private static partial Regex HostPortRegex();
}
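For context, a payload shape the extractor above would understand might look like the following (all field names and values are illustrative; the validator only keys on the `UrlFieldNames` entries and the `networkIntents` / `network_intents` arrays):

```json
{
  "jobType": "advisory.sync",
  "callbackUrl": "https://hooks.example.internal/jobs/done",
  "networkIntents": [
    {
      "host": "hooks.example.internal",
      "port": 443,
      "protocol": "https",
      "purpose": "job-completion-callback",
      "direction": "egress"
    }
  ]
}
```

Here the detected endpoint's host matches a declared intent, so no `MissingIntent` violation is raised; whether it passes in sealed mode still depends on the configured allowlist.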
@@ -0,0 +1,327 @@
using StellaOps.JobEngine.Core.Domain.AirGap;

namespace StellaOps.JobEngine.Core.AirGap;

/// <summary>
/// Service for validating air-gap staleness against configured thresholds.
/// Per ORCH-AIRGAP-56-002.
/// </summary>
public interface IStalenessValidator
{
    /// <summary>
    /// Validates staleness for a specific domain.
    /// </summary>
    StalenessValidationResult ValidateDomain(
        string domainId,
        DomainStalenessMetric metric,
        StalenessConfig config,
        StalenessValidationContext context,
        DateTimeOffset now);

    /// <summary>
    /// Validates staleness across multiple domains required for a job.
    /// </summary>
    StalenessValidationResult ValidateForJob(
        IEnumerable<string> requiredDomains,
        IReadOnlyDictionary<string, DomainStalenessMetric> domainMetrics,
        StalenessConfig config,
        DateTimeOffset now);

    /// <summary>
    /// Generates warnings for domains approaching staleness threshold.
    /// </summary>
    IReadOnlyList<StalenessWarning> GetApproachingThresholdWarnings(
        IReadOnlyDictionary<string, DomainStalenessMetric> domainMetrics,
        StalenessConfig config);
}
/// <summary>
/// Default implementation of staleness validator.
/// </summary>
public sealed class StalenessValidator : IStalenessValidator
{
    /// <summary>
    /// Validates staleness for a specific domain.
    /// </summary>
    public StalenessValidationResult ValidateDomain(
        string domainId,
        DomainStalenessMetric metric,
        StalenessConfig config,
        StalenessValidationContext context,
        DateTimeOffset now)
    {
        ArgumentException.ThrowIfNullOrWhiteSpace(domainId);
        ArgumentNullException.ThrowIfNull(metric);
        ArgumentNullException.ThrowIfNull(config);

        // Check if domain is exempt
        if (config.IsDomainExempt(domainId))
        {
            return StalenessValidationResult.Pass(
                now,
                context,
                domainId,
                metric.StalenessSeconds,
                config.FreshnessThresholdSeconds,
                config.EnforcementMode);
        }

        // Skip validation if disabled
        if (config.EnforcementMode == StalenessEnforcementMode.Disabled)
        {
            return StalenessValidationResult.Pass(
                now,
                context,
                domainId,
                metric.StalenessSeconds,
                config.FreshnessThresholdSeconds,
                config.EnforcementMode);
        }

        // Calculate effective threshold including grace period
        var effectiveThreshold = config.FreshnessThresholdSeconds + config.GracePeriodSeconds;

        // Check if stale
        if (metric.StalenessSeconds > effectiveThreshold)
        {
            var error = new StalenessError(
                StalenessErrorCode.AirgapStale,
                $"Domain '{domainId}' data is stale ({FormatDuration(metric.StalenessSeconds)}, threshold {FormatDuration(config.FreshnessThresholdSeconds)})",
                domainId,
                metric.StalenessSeconds,
                config.FreshnessThresholdSeconds,
                $"Import a fresh bundle for '{domainId}' from upstream using 'stella airgap import'");

            var warnings = GetWarningsForMetric(domainId, metric, config);

            return StalenessValidationResult.Fail(
                now,
                context,
                domainId,
                metric.StalenessSeconds,
                config.FreshnessThresholdSeconds,
                config.EnforcementMode,
                error,
                warnings);
        }

        // Check for warnings (approaching threshold)
        var validationWarnings = GetWarningsForMetric(domainId, metric, config);

        return StalenessValidationResult.Pass(
            now,
            context,
            domainId,
            metric.StalenessSeconds,
            config.FreshnessThresholdSeconds,
            config.EnforcementMode,
            validationWarnings.Count > 0 ? validationWarnings : null);
    }
    /// <summary>
    /// Validates staleness across multiple domains required for a job.
    /// </summary>
    public StalenessValidationResult ValidateForJob(
        IEnumerable<string> requiredDomains,
        IReadOnlyDictionary<string, DomainStalenessMetric> domainMetrics,
        StalenessConfig config,
        DateTimeOffset now)
    {
        ArgumentNullException.ThrowIfNull(requiredDomains);
        ArgumentNullException.ThrowIfNull(domainMetrics);
        ArgumentNullException.ThrowIfNull(config);

        var domains = requiredDomains.ToList();
        if (domains.Count == 0)
        {
            // No domain requirements - pass
            return StalenessValidationResult.Pass(
                now,
                StalenessValidationContext.JobScheduling,
                null,
                0,
                config.FreshnessThresholdSeconds,
                config.EnforcementMode);
        }

        // Skip validation if disabled
        if (config.EnforcementMode == StalenessEnforcementMode.Disabled)
        {
            return StalenessValidationResult.Pass(
                now,
                StalenessValidationContext.JobScheduling,
                null,
                0,
                config.FreshnessThresholdSeconds,
                config.EnforcementMode);
        }

        var allWarnings = new List<StalenessWarning>();
        var effectiveThreshold = config.FreshnessThresholdSeconds + config.GracePeriodSeconds;
        var maxStaleness = 0;
        string? stalestDomain = null;

        foreach (var domainId in domains)
        {
            // Check if domain is exempt
            if (config.IsDomainExempt(domainId))
            {
                continue;
            }

            // Check if we have metrics for this domain
            if (!domainMetrics.TryGetValue(domainId, out var metric))
            {
                // No bundle for domain
                var noBundleError = new StalenessError(
                    StalenessErrorCode.AirgapNoBundle,
                    $"No bundle available for domain '{domainId}'",
                    domainId,
                    null,
                    config.FreshnessThresholdSeconds,
                    $"Import a bundle for '{domainId}' from upstream using 'stella airgap import'");

                return StalenessValidationResult.Fail(
                    now,
                    StalenessValidationContext.JobScheduling,
                    domainId,
                    0,
                    config.FreshnessThresholdSeconds,
                    config.EnforcementMode,
                    noBundleError);
            }

            // Track max staleness
            if (metric.StalenessSeconds > maxStaleness)
            {
                maxStaleness = metric.StalenessSeconds;
                stalestDomain = domainId;
            }

            // Check if stale
            if (metric.StalenessSeconds > effectiveThreshold)
            {
                var error = new StalenessError(
                    StalenessErrorCode.AirgapStale,
                    $"Domain '{domainId}' data is stale ({FormatDuration(metric.StalenessSeconds)}, threshold {FormatDuration(config.FreshnessThresholdSeconds)})",
                    domainId,
                    metric.StalenessSeconds,
                    config.FreshnessThresholdSeconds,
                    $"Import a fresh bundle for '{domainId}' from upstream using 'stella airgap import'");

                return StalenessValidationResult.Fail(
                    now,
                    StalenessValidationContext.JobScheduling,
                    domainId,
                    metric.StalenessSeconds,
                    config.FreshnessThresholdSeconds,
                    config.EnforcementMode,
                    error,
                    allWarnings.Count > 0 ? allWarnings : null);
            }

            // Collect warnings
            allWarnings.AddRange(GetWarningsForMetric(domainId, metric, config));
        }

        return StalenessValidationResult.Pass(
            now,
            StalenessValidationContext.JobScheduling,
            stalestDomain,
            maxStaleness,
            config.FreshnessThresholdSeconds,
            config.EnforcementMode,
            allWarnings.Count > 0 ? allWarnings : null);
    }
    /// <summary>
    /// Generates warnings for domains approaching staleness threshold.
    /// </summary>
    public IReadOnlyList<StalenessWarning> GetApproachingThresholdWarnings(
        IReadOnlyDictionary<string, DomainStalenessMetric> domainMetrics,
        StalenessConfig config)
    {
        ArgumentNullException.ThrowIfNull(domainMetrics);
        ArgumentNullException.ThrowIfNull(config);

        var warnings = new List<StalenessWarning>();

        foreach (var (domainId, metric) in domainMetrics)
        {
            if (config.IsDomainExempt(domainId))
            {
                continue;
            }

            warnings.AddRange(GetWarningsForMetric(domainId, metric, config));
        }

        return warnings;
    }
    private static List<StalenessWarning> GetWarningsForMetric(
        string domainId,
        DomainStalenessMetric metric,
        StalenessConfig config)
    {
        var warnings = new List<StalenessWarning>();
        var percentOfThreshold = (double)metric.StalenessSeconds / config.FreshnessThresholdSeconds * 100;

        // Check notification thresholds
        if (config.NotificationThresholds is not null)
        {
            foreach (var threshold in config.NotificationThresholds.OrderByDescending(t => t.PercentOfThreshold))
            {
                if (percentOfThreshold >= threshold.PercentOfThreshold)
                {
                    var warningCode = threshold.Severity switch
                    {
                        NotificationSeverity.Critical => StalenessWarningCode.AirgapApproachingStale,
                        NotificationSeverity.Warning => StalenessWarningCode.AirgapBundleOld,
                        _ => StalenessWarningCode.AirgapNoRecentImport
                    };

                    var severityText = threshold.Severity switch
                    {
                        NotificationSeverity.Critical => "critical",
                        NotificationSeverity.Warning => "warning",
                        _ => "info"
                    };

                    warnings.Add(new StalenessWarning(
                        warningCode,
                        $"Domain '{domainId}' at {percentOfThreshold:F0}% of staleness threshold ({severityText})",
                        percentOfThreshold,
                        metric.ProjectedStaleAt));

                    break; // Only report highest severity threshold
                }
            }
        }
        else if (percentOfThreshold >= 75)
        {
            // Default warning at 75%
            warnings.Add(new StalenessWarning(
                StalenessWarningCode.AirgapApproachingStale,
                $"Domain '{domainId}' at {percentOfThreshold:F0}% of staleness threshold",
                percentOfThreshold,
                metric.ProjectedStaleAt));
        }

        return warnings;
    }
    private static string FormatDuration(int seconds)
    {
        var span = TimeSpan.FromSeconds(seconds);
        if (span.TotalDays >= 1)
        {
            return $"{span.TotalDays:F1} days";
        }
        if (span.TotalHours >= 1)
        {
            return $"{span.TotalHours:F1} hours";
        }
        return $"{span.TotalMinutes:F0} minutes";
    }
}
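A quick numeric sanity check of the grace-period arithmetic used by `ValidateDomain` (the threshold values here are illustrative, not defaults from `StalenessConfig`):

```csharp
using System;

// Mirrors: effectiveThreshold = FreshnessThresholdSeconds + GracePeriodSeconds
var freshnessThresholdSeconds = 7 * 86_400;   // 7-day freshness threshold
var gracePeriodSeconds = 86_400;              // 1-day grace period
var effectiveThreshold = freshnessThresholdSeconds + gracePeriodSeconds;

// 7.5 days of staleness is past the freshness threshold but inside the
// grace period, so validation still passes; 8.5 days would fail.
var stalenessSeconds = (int)TimeSpan.FromDays(7.5).TotalSeconds;
Console.WriteLine(stalenessSeconds > effectiveThreshold); // False
```

Note that error messages report the bare freshness threshold, while the pass/fail decision uses the grace-extended value.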
@@ -0,0 +1,590 @@
using Microsoft.Extensions.Logging;
using StellaOps.JobEngine.Core.Domain;

namespace StellaOps.JobEngine.Core.Backfill;

/// <summary>
/// Configuration options for the backfill manager.
/// </summary>
public sealed record BackfillManagerOptions
{
    /// <summary>
    /// Maximum number of events allowed in a single backfill request.
    /// </summary>
    public long MaxEventsPerBackfill { get; init; } = 1_000_000;

    /// <summary>
    /// Maximum duration allowed for a backfill operation.
    /// </summary>
    public TimeSpan MaxBackfillDuration { get; init; } = TimeSpan.FromHours(24);

    /// <summary>
    /// Data retention period - backfills cannot extend beyond this.
    /// </summary>
    public TimeSpan RetentionPeriod { get; init; } = TimeSpan.FromDays(90);

    /// <summary>
    /// Default TTL for processed event records.
    /// </summary>
    public TimeSpan DefaultProcessedEventTtl { get; init; } = TimeSpan.FromDays(30);

    /// <summary>
    /// Number of sample event keys to include in previews.
    /// </summary>
    public int PreviewSampleSize { get; init; } = 10;

    /// <summary>
    /// Estimated events per second for duration estimation.
    /// </summary>
    public double EstimatedEventsPerSecond { get; init; } = 100;
}
/// <summary>
/// Coordinates backfill operations with safety validations.
/// </summary>
public interface IBackfillManager
{
    /// <summary>
    /// Creates a new backfill request with validation.
    /// </summary>
    Task<BackfillRequest> CreateRequestAsync(
        string tenantId,
        Guid? sourceId,
        string? jobType,
        DateTimeOffset windowStart,
        DateTimeOffset windowEnd,
        string reason,
        string createdBy,
        int batchSize = 100,
        bool dryRun = false,
        bool forceReprocess = false,
        string? ticket = null,
        TimeSpan? maxDuration = null,
        CancellationToken cancellationToken = default);

    /// <summary>
    /// Validates a backfill request and runs safety checks.
    /// </summary>
    Task<BackfillRequest> ValidateRequestAsync(
        string tenantId,
        Guid backfillId,
        string updatedBy,
        CancellationToken cancellationToken = default);

    /// <summary>
    /// Generates a preview of what a backfill would process (dry-run).
    /// </summary>
    Task<BackfillPreview> PreviewAsync(
        string tenantId,
        Guid? sourceId,
        string? jobType,
        DateTimeOffset windowStart,
        DateTimeOffset windowEnd,
        int batchSize = 100,
        CancellationToken cancellationToken = default);

    /// <summary>
    /// Starts execution of a validated backfill request.
    /// </summary>
    Task<BackfillRequest> StartAsync(
        string tenantId,
        Guid backfillId,
        string updatedBy,
        CancellationToken cancellationToken = default);

    /// <summary>
    /// Pauses a running backfill.
    /// </summary>
    Task<BackfillRequest> PauseAsync(
        string tenantId,
        Guid backfillId,
        string updatedBy,
        CancellationToken cancellationToken = default);

    /// <summary>
    /// Resumes a paused backfill.
    /// </summary>
    Task<BackfillRequest> ResumeAsync(
        string tenantId,
        Guid backfillId,
        string updatedBy,
        CancellationToken cancellationToken = default);

    /// <summary>
    /// Cancels a backfill request.
    /// </summary>
    Task<BackfillRequest> CancelAsync(
        string tenantId,
        Guid backfillId,
        string updatedBy,
        CancellationToken cancellationToken = default);

    /// <summary>
    /// Gets the current status of a backfill request.
    /// </summary>
    Task<BackfillRequest?> GetStatusAsync(
        string tenantId,
        Guid backfillId,
        CancellationToken cancellationToken = default);

    /// <summary>
    /// Lists backfill requests with filters.
    /// </summary>
    Task<IReadOnlyList<BackfillRequest>> ListAsync(
        string tenantId,
        BackfillStatus? status = null,
        Guid? sourceId = null,
        string? jobType = null,
        int limit = 50,
        int offset = 0,
        CancellationToken cancellationToken = default);
}
/// <summary>
/// Provides event counting for backfill estimation.
/// </summary>
public interface IBackfillEventCounter
{
    /// <summary>
    /// Estimates the number of events in a time window.
    /// </summary>
    Task<long> EstimateEventCountAsync(
        string tenantId,
        string scopeKey,
        DateTimeOffset windowStart,
        DateTimeOffset windowEnd,
        CancellationToken cancellationToken);

    /// <summary>
    /// Gets sample event keys from a time window.
    /// </summary>
    Task<IReadOnlyList<string>> GetSampleEventKeysAsync(
        string tenantId,
        string scopeKey,
        DateTimeOffset windowStart,
        DateTimeOffset windowEnd,
        int sampleSize,
        CancellationToken cancellationToken);
}

/// <summary>
/// Validates backfill safety conditions.
/// </summary>
public interface IBackfillSafetyValidator
{
    /// <summary>
    /// Runs all safety validations for a backfill request.
    /// </summary>
    Task<BackfillSafetyChecks> ValidateAsync(
        BackfillRequest request,
        long estimatedEvents,
        TimeSpan estimatedDuration,
        CancellationToken cancellationToken);
}
/// <summary>
/// Default implementation of backfill safety validator.
/// </summary>
public sealed class DefaultBackfillSafetyValidator : IBackfillSafetyValidator
{
    private readonly ISourceValidator _sourceValidator;
    private readonly IOverlapChecker _overlapChecker;
    private readonly TimeProvider _timeProvider;
    private readonly BackfillManagerOptions _options;

    public DefaultBackfillSafetyValidator(
        ISourceValidator sourceValidator,
        IOverlapChecker overlapChecker,
        TimeProvider timeProvider,
        BackfillManagerOptions options)
    {
        _sourceValidator = sourceValidator;
        _overlapChecker = overlapChecker;
        _timeProvider = timeProvider ?? TimeProvider.System;
        _options = options;
    }

    public async Task<BackfillSafetyChecks> ValidateAsync(
        BackfillRequest request,
        long estimatedEvents,
        TimeSpan estimatedDuration,
        CancellationToken cancellationToken)
    {
        var warnings = new List<string>();
        var errors = new List<string>();

        // Check source exists
        var sourceExists = true;
        if (request.SourceId.HasValue)
        {
            sourceExists = await _sourceValidator.ExistsAsync(
                request.TenantId, request.SourceId.Value, cancellationToken);
            if (!sourceExists)
            {
                errors.Add($"Source {request.SourceId} not found.");
            }
        }

        // Check for overlapping backfills
        var hasOverlap = await _overlapChecker.HasOverlapAsync(
            request.TenantId,
            request.ScopeKey,
            request.WindowStart,
            request.WindowEnd,
            request.BackfillId,
            cancellationToken);
        if (hasOverlap)
        {
            errors.Add("An active backfill already exists for this scope and time window.");
        }

        // Check retention period
        var retentionLimit = _timeProvider.GetUtcNow() - _options.RetentionPeriod;
        var withinRetention = request.WindowStart >= retentionLimit;
        if (!withinRetention)
        {
            errors.Add($"Window start {request.WindowStart:O} is beyond the retention period ({_options.RetentionPeriod.TotalDays} days).");
        }

        // Check event limit
        var withinEventLimit = estimatedEvents <= _options.MaxEventsPerBackfill;
        if (!withinEventLimit)
        {
            errors.Add($"Estimated {estimatedEvents:N0} events exceeds maximum allowed ({_options.MaxEventsPerBackfill:N0}).");
        }
        else if (estimatedEvents > _options.MaxEventsPerBackfill * 0.8)
        {
            warnings.Add($"Estimated {estimatedEvents:N0} events is approaching the maximum limit.");
        }

        // Check duration limit
        var maxDuration = request.MaxDuration ?? _options.MaxBackfillDuration;
        var withinDurationLimit = estimatedDuration <= maxDuration;
        if (!withinDurationLimit)
        {
            errors.Add($"Estimated duration {estimatedDuration} exceeds maximum allowed ({maxDuration}).");
        }

        // Check quota availability (placeholder - always true for now)
        var quotaAvailable = true;

        // Add warnings for large backfills
        if (request.WindowDuration > TimeSpan.FromDays(7))
        {
            warnings.Add("Large time window may take significant time to process.");
        }

        if (request.ForceReprocess)
        {
            warnings.Add("Force reprocess is enabled - events will be processed even if already seen.");
        }

        return new BackfillSafetyChecks(
            SourceExists: sourceExists,
            HasOverlappingBackfill: hasOverlap,
            WithinRetention: withinRetention,
            WithinEventLimit: withinEventLimit,
            WithinDurationLimit: withinDurationLimit,
            QuotaAvailable: quotaAvailable,
            Warnings: warnings,
            Errors: errors);
    }
}
/// <summary>
/// Validates that a source exists.
/// </summary>
public interface ISourceValidator
{
    /// <summary>
    /// Checks if a source exists.
    /// </summary>
    Task<bool> ExistsAsync(string tenantId, Guid sourceId, CancellationToken cancellationToken);
}

/// <summary>
/// Checks for overlapping backfill operations.
/// </summary>
public interface IOverlapChecker
{
    /// <summary>
    /// Checks if there's an overlapping active backfill.
    /// </summary>
    Task<bool> HasOverlapAsync(
        string tenantId,
        string scopeKey,
        DateTimeOffset windowStart,
        DateTimeOffset windowEnd,
        Guid? excludeBackfillId,
        CancellationToken cancellationToken);
}

/// <summary>
/// Default implementation of the backfill manager.
/// </summary>
public sealed class BackfillManager : IBackfillManager
{
    private readonly IBackfillRepository _backfillRepository;
    private readonly IBackfillSafetyValidator _safetyValidator;
    private readonly IBackfillEventCounter _eventCounter;
    private readonly IDuplicateSuppressor _duplicateSuppressor;
    private readonly TimeProvider _timeProvider;
    private readonly BackfillManagerOptions _options;
    private readonly ILogger<BackfillManager> _logger;

    public BackfillManager(
        IBackfillRepository backfillRepository,
        IBackfillSafetyValidator safetyValidator,
        IBackfillEventCounter eventCounter,
        IDuplicateSuppressor duplicateSuppressor,
        TimeProvider timeProvider,
        BackfillManagerOptions options,
        ILogger<BackfillManager> logger)
    {
        _backfillRepository = backfillRepository;
        _safetyValidator = safetyValidator;
        _eventCounter = eventCounter;
        _duplicateSuppressor = duplicateSuppressor;
        _timeProvider = timeProvider ?? TimeProvider.System;
        _options = options;
        _logger = logger;
    }

    public async Task<BackfillRequest> CreateRequestAsync(
        string tenantId,
        Guid? sourceId,
        string? jobType,
        DateTimeOffset windowStart,
        DateTimeOffset windowEnd,
        string reason,
        string createdBy,
        int batchSize = 100,
        bool dryRun = false,
        bool forceReprocess = false,
        string? ticket = null,
        TimeSpan? maxDuration = null,
        CancellationToken cancellationToken = default)
    {
        var request = BackfillRequest.Create(
            tenantId: tenantId,
            sourceId: sourceId,
            jobType: jobType,
            windowStart: windowStart,
            windowEnd: windowEnd,
            reason: reason,
            createdBy: createdBy,
            timestamp: _timeProvider.GetUtcNow(),
            batchSize: batchSize,
            dryRun: dryRun,
            forceReprocess: forceReprocess,
            ticket: ticket,
            maxDuration: maxDuration);
|
||||
await _backfillRepository.CreateAsync(request, cancellationToken);
|
||||
|
||||
_logger.LogInformation(
|
||||
"Created backfill request {BackfillId} for scope {ScopeKey} from {WindowStart} to {WindowEnd}",
|
||||
request.BackfillId, request.ScopeKey, windowStart, windowEnd);
|
||||
|
||||
return request;
|
||||
}
|
||||
|
||||
public async Task<BackfillRequest> ValidateRequestAsync(
|
||||
string tenantId,
|
||||
Guid backfillId,
|
||||
string updatedBy,
|
||||
CancellationToken cancellationToken = default)
|
||||
{
|
||||
var request = await _backfillRepository.GetByIdAsync(tenantId, backfillId, cancellationToken)
|
||||
?? throw new InvalidOperationException($"Backfill request {backfillId} not found.");
|
||||
|
||||
request = request.StartValidation(updatedBy);
|
||||
await _backfillRepository.UpdateAsync(request, cancellationToken);
|
||||
|
||||
// Estimate event count
|
||||
var estimatedEvents = await _eventCounter.EstimateEventCountAsync(
|
||||
tenantId, request.ScopeKey, request.WindowStart, request.WindowEnd, cancellationToken);
|
||||
|
||||
// Calculate estimated duration
|
||||
var estimatedDuration = TimeSpan.FromSeconds(estimatedEvents / _options.EstimatedEventsPerSecond);
|
||||
|
||||
// Run safety validations
|
||||
var safetyChecks = await _safetyValidator.ValidateAsync(
|
||||
request, estimatedEvents, estimatedDuration, cancellationToken);
|
||||
|
||||
request = request.WithSafetyChecks(safetyChecks, estimatedEvents, estimatedDuration, updatedBy);
|
||||
await _backfillRepository.UpdateAsync(request, cancellationToken);
|
||||
|
||||
_logger.LogInformation(
|
||||
"Validated backfill request {BackfillId}: {EstimatedEvents} events, safe={IsSafe}",
|
||||
backfillId, estimatedEvents, safetyChecks.IsSafe);
|
||||
|
||||
return request;
|
||||
}
|
||||
|
||||
public async Task<BackfillPreview> PreviewAsync(
|
||||
string tenantId,
|
||||
Guid? sourceId,
|
||||
string? jobType,
|
||||
DateTimeOffset windowStart,
|
||||
DateTimeOffset windowEnd,
|
||||
int batchSize = 100,
|
||||
CancellationToken cancellationToken = default)
|
||||
{
|
||||
var scopeKey = GetScopeKey(sourceId, jobType);
|
||||
|
||||
// Estimate total events
|
||||
var estimatedEvents = await _eventCounter.EstimateEventCountAsync(
|
||||
tenantId, scopeKey, windowStart, windowEnd, cancellationToken);
|
||||
|
||||
// Get already processed count
|
||||
var processedCount = await _duplicateSuppressor.CountProcessedAsync(
|
||||
scopeKey, windowStart, windowEnd, cancellationToken);
|
||||
|
||||
// Get sample event keys
|
||||
var sampleKeys = await _eventCounter.GetSampleEventKeysAsync(
|
||||
tenantId, scopeKey, windowStart, windowEnd, _options.PreviewSampleSize, cancellationToken);
|
||||
|
||||
// Calculate estimates
|
||||
var processableEvents = Math.Max(0, estimatedEvents - processedCount);
|
||||
var estimatedDuration = TimeSpan.FromSeconds(processableEvents / _options.EstimatedEventsPerSecond);
|
||||
var estimatedBatches = (int)Math.Ceiling((double)processableEvents / batchSize);
|
||||
|
||||
// Run safety checks
|
||||
var tempRequest = BackfillRequest.Create(
|
||||
tenantId, sourceId, jobType, windowStart, windowEnd,
|
||||
"preview", "system", _timeProvider.GetUtcNow(), batchSize);
|
||||
|
||||
var safetyChecks = await _safetyValidator.ValidateAsync(
|
||||
tempRequest, estimatedEvents, estimatedDuration, cancellationToken);
|
||||
|
||||
return new BackfillPreview(
|
||||
ScopeKey: scopeKey,
|
||||
WindowStart: windowStart,
|
||||
WindowEnd: windowEnd,
|
||||
EstimatedEvents: estimatedEvents,
|
||||
SkippedEvents: processedCount,
|
||||
ProcessableEvents: processableEvents,
|
||||
EstimatedDuration: estimatedDuration,
|
||||
EstimatedBatches: estimatedBatches,
|
||||
SafetyChecks: safetyChecks,
|
||||
SampleEventKeys: sampleKeys);
|
||||
}
|
||||
|
||||
public async Task<BackfillRequest> StartAsync(
|
||||
string tenantId,
|
||||
Guid backfillId,
|
||||
string updatedBy,
|
||||
CancellationToken cancellationToken = default)
|
||||
{
|
||||
var request = await _backfillRepository.GetByIdAsync(tenantId, backfillId, cancellationToken)
|
||||
?? throw new InvalidOperationException($"Backfill request {backfillId} not found.");
|
||||
|
||||
request = request.Start(updatedBy, _timeProvider.GetUtcNow());
|
||||
await _backfillRepository.UpdateAsync(request, cancellationToken);
|
||||
|
||||
_logger.LogInformation("Started backfill request {BackfillId}", backfillId);
|
||||
|
||||
return request;
|
||||
}
|
||||
|
||||
public async Task<BackfillRequest> PauseAsync(
|
||||
string tenantId,
|
||||
Guid backfillId,
|
||||
string updatedBy,
|
||||
CancellationToken cancellationToken = default)
|
||||
{
|
||||
var request = await _backfillRepository.GetByIdAsync(tenantId, backfillId, cancellationToken)
|
||||
?? throw new InvalidOperationException($"Backfill request {backfillId} not found.");
|
||||
|
||||
request = request.Pause(updatedBy);
|
||||
await _backfillRepository.UpdateAsync(request, cancellationToken);
|
||||
|
||||
_logger.LogInformation("Paused backfill request {BackfillId}", backfillId);
|
||||
|
||||
return request;
|
||||
}
|
||||
|
||||
public async Task<BackfillRequest> ResumeAsync(
|
||||
string tenantId,
|
||||
Guid backfillId,
|
||||
string updatedBy,
|
||||
CancellationToken cancellationToken = default)
|
||||
{
|
||||
var request = await _backfillRepository.GetByIdAsync(tenantId, backfillId, cancellationToken)
|
||||
?? throw new InvalidOperationException($"Backfill request {backfillId} not found.");
|
||||
|
||||
request = request.Resume(updatedBy);
|
||||
await _backfillRepository.UpdateAsync(request, cancellationToken);
|
||||
|
||||
_logger.LogInformation("Resumed backfill request {BackfillId}", backfillId);
|
||||
|
||||
return request;
|
||||
}
|
||||
|
||||
public async Task<BackfillRequest> CancelAsync(
|
||||
string tenantId,
|
||||
Guid backfillId,
|
||||
string updatedBy,
|
||||
CancellationToken cancellationToken = default)
|
||||
{
|
||||
var request = await _backfillRepository.GetByIdAsync(tenantId, backfillId, cancellationToken)
|
||||
?? throw new InvalidOperationException($"Backfill request {backfillId} not found.");
|
||||
|
||||
request = request.Cancel(updatedBy, _timeProvider.GetUtcNow());
|
||||
await _backfillRepository.UpdateAsync(request, cancellationToken);
|
||||
|
||||
_logger.LogInformation("Canceled backfill request {BackfillId}", backfillId);
|
||||
|
||||
return request;
|
||||
}
|
||||
|
||||
public Task<BackfillRequest?> GetStatusAsync(
|
||||
string tenantId,
|
||||
Guid backfillId,
|
||||
CancellationToken cancellationToken = default)
|
||||
{
|
||||
return _backfillRepository.GetByIdAsync(tenantId, backfillId, cancellationToken);
|
||||
}
|
||||
|
||||
public Task<IReadOnlyList<BackfillRequest>> ListAsync(
|
||||
string tenantId,
|
||||
BackfillStatus? status = null,
|
||||
Guid? sourceId = null,
|
||||
string? jobType = null,
|
||||
int limit = 50,
|
||||
int offset = 0,
|
||||
CancellationToken cancellationToken = default)
|
||||
{
|
||||
return _backfillRepository.ListAsync(tenantId, status, sourceId, jobType, limit, offset, cancellationToken);
|
||||
}
|
||||
|
||||
private static string GetScopeKey(Guid? sourceId, string? jobType)
|
||||
{
|
||||
return (sourceId, jobType) switch
|
||||
{
|
||||
(Guid s, string j) when !string.IsNullOrEmpty(j) => Watermark.CreateScopeKey(s, j),
|
||||
(Guid s, _) => Watermark.CreateScopeKey(s),
|
||||
(_, string j) when !string.IsNullOrEmpty(j) => Watermark.CreateScopeKey(j),
|
||||
_ => throw new ArgumentException("Either sourceId or jobType must be specified.")
|
||||
};
|
||||
}
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// Repository interface for backfill persistence (imported for convenience).
|
||||
/// </summary>
|
||||
public interface IBackfillRepository
|
||||
{
|
||||
Task<BackfillRequest?> GetByIdAsync(string tenantId, Guid backfillId, CancellationToken cancellationToken);
|
||||
Task CreateAsync(BackfillRequest request, CancellationToken cancellationToken);
|
||||
Task UpdateAsync(BackfillRequest request, CancellationToken cancellationToken);
|
||||
Task<IReadOnlyList<BackfillRequest>> ListAsync(
|
||||
string tenantId,
|
||||
BackfillStatus? status,
|
||||
Guid? sourceId,
|
||||
string? jobType,
|
||||
int limit,
|
||||
int offset,
|
||||
CancellationToken cancellationToken);
|
||||
}
|
||||
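Taken together, `BackfillManager` implies a create → validate → start lifecycle for operators. A minimal, hypothetical sketch of how a caller might drive it (the `manager` instance, the tenant/source values, and the `SafetyChecks` property name on `BackfillRequest` are assumptions, not shown in this file):

```
// Hypothetical caller; assumes an IBackfillManager resolved from DI as `manager`.
var request = await manager.CreateRequestAsync(
    tenantId: "tenant-a",
    sourceId: sourceGuid,                      // assumed known source id
    jobType: "advisory-ingest",
    windowStart: DateTimeOffset.UtcNow.AddDays(-1),
    windowEnd: DateTimeOffset.UtcNow,
    reason: "re-ingest after parser fix",
    createdBy: "ops@example.test");

// Validation estimates event count/duration and records safety checks.
request = await manager.ValidateRequestAsync("tenant-a", request.BackfillId, "ops@example.test");

// Only start if the recorded safety checks passed (property name assumed).
if (request.SafetyChecks?.IsSafe == true)
{
    await manager.StartAsync("tenant-a", request.BackfillId, "ops@example.test");
}
```

The explicit `ValidateRequestAsync` step between create and start is what lets `dryRun` and `PreviewAsync` give operators a cost estimate before any events are reprocessed.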
@@ -0,0 +1,328 @@
namespace StellaOps.JobEngine.Core.Backfill;

/// <summary>
/// Tracks processed events for duplicate suppression.
/// </summary>
public interface IDuplicateSuppressor
{
    /// <summary>
    /// Checks if an event has already been processed.
    /// </summary>
    /// <param name="scopeKey">Scope identifier.</param>
    /// <param name="eventKey">Unique event identifier.</param>
    /// <param name="cancellationToken">Cancellation token.</param>
    /// <returns>True if the event was already processed.</returns>
    Task<bool> HasProcessedAsync(string scopeKey, string eventKey, CancellationToken cancellationToken);

    /// <summary>
    /// Checks multiple events for duplicate status.
    /// </summary>
    /// <param name="scopeKey">Scope identifier.</param>
    /// <param name="eventKeys">Event identifiers to check.</param>
    /// <param name="cancellationToken">Cancellation token.</param>
    /// <returns>Set of event keys that have already been processed.</returns>
    Task<IReadOnlySet<string>> GetProcessedAsync(string scopeKey, IEnumerable<string> eventKeys, CancellationToken cancellationToken);

    /// <summary>
    /// Marks an event as processed.
    /// </summary>
    /// <param name="scopeKey">Scope identifier.</param>
    /// <param name="eventKey">Unique event identifier.</param>
    /// <param name="eventTime">Event timestamp.</param>
    /// <param name="batchId">Optional batch/backfill identifier.</param>
    /// <param name="ttl">Time-to-live for the record.</param>
    /// <param name="cancellationToken">Cancellation token.</param>
    Task MarkProcessedAsync(
        string scopeKey,
        string eventKey,
        DateTimeOffset eventTime,
        Guid? batchId,
        TimeSpan ttl,
        CancellationToken cancellationToken);

    /// <summary>
    /// Marks multiple events as processed.
    /// </summary>
    /// <param name="scopeKey">Scope identifier.</param>
    /// <param name="events">Events to mark as processed.</param>
    /// <param name="batchId">Optional batch/backfill identifier.</param>
    /// <param name="ttl">Time-to-live for the records.</param>
    /// <param name="cancellationToken">Cancellation token.</param>
    Task MarkProcessedBatchAsync(
        string scopeKey,
        IEnumerable<ProcessedEvent> events,
        Guid? batchId,
        TimeSpan ttl,
        CancellationToken cancellationToken);

    /// <summary>
    /// Counts processed events within a time range.
    /// </summary>
    /// <param name="scopeKey">Scope identifier.</param>
    /// <param name="from">Start of time range.</param>
    /// <param name="to">End of time range.</param>
    /// <param name="cancellationToken">Cancellation token.</param>
    /// <returns>Count of processed events.</returns>
    Task<long> CountProcessedAsync(string scopeKey, DateTimeOffset from, DateTimeOffset to, CancellationToken cancellationToken);

    /// <summary>
    /// Removes expired records (cleanup).
    /// </summary>
    /// <param name="batchLimit">Maximum records to remove per call.</param>
    /// <param name="cancellationToken">Cancellation token.</param>
    /// <returns>Number of records removed.</returns>
    Task<int> CleanupExpiredAsync(int batchLimit, CancellationToken cancellationToken);
}

/// <summary>
/// Event data for duplicate tracking.
/// </summary>
public sealed record ProcessedEvent(
    /// <summary>Unique event identifier.</summary>
    string EventKey,

    /// <summary>Event timestamp.</summary>
    DateTimeOffset EventTime);

/// <summary>
/// In-memory duplicate suppressor for testing.
/// </summary>
public sealed class InMemoryDuplicateSuppressor : IDuplicateSuppressor
{
    private readonly Dictionary<string, Dictionary<string, ProcessedEventEntry>> _store = new();
    private readonly TimeProvider _timeProvider;
    private readonly object _lock = new();

    /// <summary>
    /// Creates a new in-memory duplicate suppressor.
    /// </summary>
    /// <param name="timeProvider">Time provider for deterministic time.</param>
    public InMemoryDuplicateSuppressor(TimeProvider? timeProvider = null)
    {
        _timeProvider = timeProvider ?? TimeProvider.System;
    }

    private sealed record ProcessedEventEntry(
        DateTimeOffset EventTime,
        DateTimeOffset ProcessedAt,
        Guid? BatchId,
        DateTimeOffset ExpiresAt);

    public Task<bool> HasProcessedAsync(string scopeKey, string eventKey, CancellationToken cancellationToken)
    {
        lock (_lock)
        {
            if (!_store.TryGetValue(scopeKey, out var scopeStore))
                return Task.FromResult(false);

            if (!scopeStore.TryGetValue(eventKey, out var entry))
                return Task.FromResult(false);

            // Check if expired
            if (entry.ExpiresAt < _timeProvider.GetUtcNow())
            {
                scopeStore.Remove(eventKey);
                return Task.FromResult(false);
            }

            return Task.FromResult(true);
        }
    }

    public Task<IReadOnlySet<string>> GetProcessedAsync(string scopeKey, IEnumerable<string> eventKeys, CancellationToken cancellationToken)
    {
        var now = _timeProvider.GetUtcNow();
        var result = new HashSet<string>();

        lock (_lock)
        {
            if (!_store.TryGetValue(scopeKey, out var scopeStore))
                return Task.FromResult<IReadOnlySet<string>>(result);

            foreach (var eventKey in eventKeys)
            {
                if (scopeStore.TryGetValue(eventKey, out var entry) && entry.ExpiresAt >= now)
                {
                    result.Add(eventKey);
                }
            }
        }

        return Task.FromResult<IReadOnlySet<string>>(result);
    }

    public Task MarkProcessedAsync(
        string scopeKey,
        string eventKey,
        DateTimeOffset eventTime,
        Guid? batchId,
        TimeSpan ttl,
        CancellationToken cancellationToken)
    {
        var now = _timeProvider.GetUtcNow();
        var entry = new ProcessedEventEntry(eventTime, now, batchId, now + ttl);

        lock (_lock)
        {
            if (!_store.TryGetValue(scopeKey, out var scopeStore))
            {
                scopeStore = new Dictionary<string, ProcessedEventEntry>();
                _store[scopeKey] = scopeStore;
            }

            scopeStore[eventKey] = entry;
        }

        return Task.CompletedTask;
    }

    public Task MarkProcessedBatchAsync(
        string scopeKey,
        IEnumerable<ProcessedEvent> events,
        Guid? batchId,
        TimeSpan ttl,
        CancellationToken cancellationToken)
    {
        var now = _timeProvider.GetUtcNow();
        var expiresAt = now + ttl;

        lock (_lock)
        {
            if (!_store.TryGetValue(scopeKey, out var scopeStore))
            {
                scopeStore = new Dictionary<string, ProcessedEventEntry>();
                _store[scopeKey] = scopeStore;
            }

            foreach (var evt in events)
            {
                scopeStore[evt.EventKey] = new ProcessedEventEntry(evt.EventTime, now, batchId, expiresAt);
            }
        }

        return Task.CompletedTask;
    }

    public Task<long> CountProcessedAsync(string scopeKey, DateTimeOffset from, DateTimeOffset to, CancellationToken cancellationToken)
    {
        var now = _timeProvider.GetUtcNow();
        long count = 0;

        lock (_lock)
        {
            if (_store.TryGetValue(scopeKey, out var scopeStore))
            {
                count = scopeStore.Values
                    .Count(e => e.ExpiresAt >= now && e.EventTime >= from && e.EventTime < to);
            }
        }

        return Task.FromResult(count);
    }

    public Task<int> CleanupExpiredAsync(int batchLimit, CancellationToken cancellationToken)
    {
        var now = _timeProvider.GetUtcNow();
        var removed = 0;

        lock (_lock)
        {
            foreach (var scopeStore in _store.Values)
            {
                var expiredKeys = scopeStore
                    .Where(kvp => kvp.Value.ExpiresAt < now)
                    .Take(batchLimit - removed)
                    .Select(kvp => kvp.Key)
                    .ToList();

                foreach (var key in expiredKeys)
                {
                    scopeStore.Remove(key);
                    removed++;
                }

                if (removed >= batchLimit)
                    break;
            }
        }

        return Task.FromResult(removed);
    }
}

/// <summary>
/// Result of filtering events through duplicate suppression.
/// </summary>
public sealed record DuplicateFilterResult<T>(
    /// <summary>Events that should be processed (not duplicates).</summary>
    IReadOnlyList<T> ToProcess,

    /// <summary>Events that were filtered as duplicates.</summary>
    IReadOnlyList<T> Duplicates,

    /// <summary>Total events evaluated.</summary>
    int Total)
{
    /// <summary>
    /// Number of events that passed filtering.
    /// </summary>
    public int ProcessCount => ToProcess.Count;

    /// <summary>
    /// Number of duplicates filtered.
    /// </summary>
    public int DuplicateCount => Duplicates.Count;

    /// <summary>
    /// Duplicate percentage.
    /// </summary>
    public double DuplicatePercent => Total > 0 ? Math.Round((double)DuplicateCount / Total * 100, 2) : 0;
}

/// <summary>
/// Helper methods for duplicate suppression.
/// </summary>
public static class DuplicateSuppressorExtensions
{
    /// <summary>
    /// Filters a batch of events, removing duplicates.
    /// </summary>
    /// <typeparam name="T">Event type.</typeparam>
    /// <param name="suppressor">Duplicate suppressor.</param>
    /// <param name="scopeKey">Scope identifier.</param>
    /// <param name="events">Events to filter.</param>
    /// <param name="keySelector">Function to extract event key.</param>
    /// <param name="cancellationToken">Cancellation token.</param>
    /// <returns>Filter result with events to process and duplicates.</returns>
    public static async Task<DuplicateFilterResult<T>> FilterAsync<T>(
        this IDuplicateSuppressor suppressor,
        string scopeKey,
        IReadOnlyList<T> events,
        Func<T, string> keySelector,
        CancellationToken cancellationToken)
    {
        if (events.Count == 0)
            return new DuplicateFilterResult<T>([], [], 0);

        var eventKeys = events.Select(keySelector).ToList();
        var processed = await suppressor.GetProcessedAsync(scopeKey, eventKeys, cancellationToken).ConfigureAwait(false);

        var toProcess = new List<T>();
        var duplicates = new List<T>();

        foreach (var evt in events)
        {
            var key = keySelector(evt);
            if (processed.Contains(key))
            {
                duplicates.Add(evt);
            }
            else
            {
                toProcess.Add(evt);
            }
        }

        return new DuplicateFilterResult<T>(toProcess, duplicates, events.Count);
    }
}
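A short usage sketch of the suppressor and the `FilterAsync` helper, using only types defined in this file (the scope key and event keys are illustrative values):

```
var suppressor = new InMemoryDuplicateSuppressor();

// Mark one event as already processed within the scope.
await suppressor.MarkProcessedAsync(
    scopeKey: "src-1:ingest",
    eventKey: "evt-1",
    eventTime: DateTimeOffset.UtcNow,
    batchId: null,
    ttl: TimeSpan.FromDays(7),
    CancellationToken.None);

// Filter a batch: "evt-1" is a duplicate, "evt-2" should be processed.
var batch = new[] { "evt-1", "evt-2" };
var result = await suppressor.FilterAsync(
    "src-1:ingest", batch, key => key, CancellationToken.None);
// result.ProcessCount == 1, result.DuplicateCount == 1
```

Because duplicates are tracked per scope key with a TTL, the same suppressor backs both overlapping incremental windows and `ForceReprocess`-style backfills without unbounded memory growth.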
@@ -0,0 +1,218 @@
namespace StellaOps.JobEngine.Core.Backfill;

/// <summary>
/// Represents an event-time window for batch processing.
/// </summary>
public sealed record EventTimeWindow(
    /// <summary>Start of the window (inclusive).</summary>
    DateTimeOffset Start,

    /// <summary>End of the window (exclusive).</summary>
    DateTimeOffset End)
{
    /// <summary>
    /// Duration of the window.
    /// </summary>
    public TimeSpan Duration => End - Start;

    /// <summary>
    /// Whether the window is empty (zero duration).
    /// </summary>
    public bool IsEmpty => End <= Start;

    /// <summary>
    /// Whether a timestamp falls within this window.
    /// </summary>
    public bool Contains(DateTimeOffset timestamp) => timestamp >= Start && timestamp < End;

    /// <summary>
    /// Whether this window overlaps with another.
    /// </summary>
    public bool Overlaps(EventTimeWindow other) =>
        Start < other.End && End > other.Start;

    /// <summary>
    /// Creates the intersection of two windows.
    /// </summary>
    public EventTimeWindow? Intersect(EventTimeWindow other)
    {
        var newStart = Start > other.Start ? Start : other.Start;
        var newEnd = End < other.End ? End : other.End;

        return newEnd > newStart ? new EventTimeWindow(newStart, newEnd) : null;
    }

    /// <summary>
    /// Splits the window into batches of the specified duration.
    /// </summary>
    public IEnumerable<EventTimeWindow> Split(TimeSpan batchDuration)
    {
        if (batchDuration <= TimeSpan.Zero)
            throw new ArgumentOutOfRangeException(nameof(batchDuration), "Batch duration must be positive.");

        var current = Start;
        while (current < End)
        {
            var batchEnd = current + batchDuration;
            if (batchEnd > End)
                batchEnd = End;

            yield return new EventTimeWindow(current, batchEnd);
            current = batchEnd;
        }
    }

    /// <summary>
    /// Creates a window from a duration ending at the specified time.
    /// </summary>
    public static EventTimeWindow FromDuration(DateTimeOffset end, TimeSpan duration) =>
        new(end - duration, end);

    /// <summary>
    /// Creates a window covering the last N hours from now.
    /// </summary>
    public static EventTimeWindow LastHours(int hours, DateTimeOffset now)
    {
        return FromDuration(now, TimeSpan.FromHours(hours));
    }

    /// <summary>
    /// Creates a window covering the last N days from now.
    /// </summary>
    public static EventTimeWindow LastDays(int days, DateTimeOffset now)
    {
        return FromDuration(now, TimeSpan.FromDays(days));
    }
}

/// <summary>
/// Configuration for event-time window computation.
/// </summary>
public sealed record EventTimeWindowOptions(
    /// <summary>Minimum window size (prevents too-small batches).</summary>
    TimeSpan MinWindowSize,

    /// <summary>Maximum window size (prevents too-large batches).</summary>
    TimeSpan MaxWindowSize,

    /// <summary>Overlap with previous window for late-arriving events.</summary>
    TimeSpan OverlapDuration,

    /// <summary>Maximum lag allowed before triggering alerts.</summary>
    TimeSpan MaxLag,

    /// <summary>Default lookback for initial fetch when no watermark exists.</summary>
    TimeSpan InitialLookback)
{
    /// <summary>
    /// Default options for hourly batching.
    /// </summary>
    public static EventTimeWindowOptions HourlyBatches => new(
        MinWindowSize: TimeSpan.FromMinutes(5),
        MaxWindowSize: TimeSpan.FromHours(1),
        OverlapDuration: TimeSpan.FromMinutes(5),
        MaxLag: TimeSpan.FromHours(2),
        InitialLookback: TimeSpan.FromDays(7));

    /// <summary>
    /// Default options for daily batching.
    /// </summary>
    public static EventTimeWindowOptions DailyBatches => new(
        MinWindowSize: TimeSpan.FromHours(1),
        MaxWindowSize: TimeSpan.FromDays(1),
        OverlapDuration: TimeSpan.FromHours(1),
        MaxLag: TimeSpan.FromDays(1),
        InitialLookback: TimeSpan.FromDays(30));
}

/// <summary>
/// Computes event-time windows for incremental processing.
/// </summary>
public static class EventTimeWindowPlanner
{
    /// <summary>
    /// Computes the next window to process based on current watermark.
    /// </summary>
    /// <param name="now">Current time.</param>
    /// <param name="highWatermark">Current high watermark (null for initial fetch).</param>
    /// <param name="options">Window configuration options.</param>
    /// <returns>The next window to process, or null if caught up.</returns>
    public static EventTimeWindow? GetNextWindow(
        DateTimeOffset now,
        DateTimeOffset? highWatermark,
        EventTimeWindowOptions options)
    {
        DateTimeOffset windowStart;

        if (highWatermark is null)
        {
            // Initial fetch: start from initial lookback
            windowStart = now - options.InitialLookback;
        }
        else
        {
            // Incremental fetch: start from watermark minus overlap
            windowStart = highWatermark.Value - options.OverlapDuration;

            // If we're caught up (watermark + min window > now), no work needed
            if (highWatermark.Value + options.MinWindowSize > now)
            {
                return null;
            }
        }

        // Calculate window end (at most now, at most max window from start)
        var windowEnd = windowStart + options.MaxWindowSize;
        if (windowEnd > now)
        {
            windowEnd = now;
        }

        // Ensure minimum window size
        if (windowEnd - windowStart < options.MinWindowSize)
        {
            // If window would be too small, extend end (but not past now)
            windowEnd = windowStart + options.MinWindowSize;
            if (windowEnd > now)
            {
                return null; // Not enough data accumulated yet
            }
        }

        return new EventTimeWindow(windowStart, windowEnd);
    }

    /// <summary>
    /// Calculates the current lag from the high watermark.
    /// </summary>
    public static TimeSpan CalculateLag(DateTimeOffset now, DateTimeOffset highWatermark) =>
        now - highWatermark;

    /// <summary>
    /// Determines if the lag exceeds the maximum allowed.
    /// </summary>
    public static bool IsLagging(DateTimeOffset now, DateTimeOffset highWatermark, EventTimeWindowOptions options) =>
        CalculateLag(now, highWatermark) > options.MaxLag;

    /// <summary>
    /// Estimates the number of windows needed to catch up.
    /// </summary>
    public static int EstimateWindowsToProcess(
        DateTimeOffset now,
        DateTimeOffset? highWatermark,
        EventTimeWindowOptions options)
    {
        if (highWatermark is null)
        {
            // Initial fetch
            var totalDuration = options.InitialLookback;
            return (int)Math.Ceiling(totalDuration / options.MaxWindowSize);
        }

        var lag = CalculateLag(now, highWatermark.Value);
        if (lag <= options.MinWindowSize)
            return 0;

        return (int)Math.Ceiling(lag / options.MaxWindowSize);
    }
}
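The planner above is designed to be driven in a catch-up loop: fetch the next window, process it, advance the watermark to the window's end, and repeat until `GetNextWindow` returns null. A minimal sketch using only types from this file (the processing step is elided):

```
var options = EventTimeWindowOptions.HourlyBatches;
DateTimeOffset? watermark = null;               // null => initial 7-day lookback
var now = DateTimeOffset.UtcNow;

while (EventTimeWindowPlanner.GetNextWindow(now, watermark, options) is { } window)
{
    // Process events with timestamps in [window.Start, window.End) here,
    // then persist and advance the watermark.
    watermark = window.End;
}
```

Note that after the first window, each successive window starts `OverlapDuration` before the watermark, so late-arriving events near the boundary are re-read; pairing this loop with `IDuplicateSuppressor` keeps those re-reads idempotent.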
@@ -0,0 +1,502 @@
|
||||
using Microsoft.Extensions.Logging;
|
||||
using StellaOps.JobEngine.Core.Domain;
|
||||
|
||||
namespace StellaOps.JobEngine.Core.DeadLetter;
|
||||
|
||||
/// <summary>
|
||||
/// Notification channel types.
|
||||
/// </summary>
|
||||
public enum NotificationChannel
|
||||
{
|
||||
Email,
|
||||
Slack,
|
||||
Teams,
|
||||
Webhook,
|
||||
PagerDuty
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// Notification rule for dead-letter events.
|
||||
/// </summary>
|
||||
public sealed record NotificationRule(
|
||||
Guid RuleId,
|
||||
string TenantId,
|
||||
string? JobTypePattern,
|
||||
string? ErrorCodePattern,
|
||||
ErrorCategory? Category,
    Guid? SourceId,
    bool Enabled,
    NotificationChannel Channel,
    string Endpoint,
    int CooldownMinutes,
    int MaxPerHour,
    bool Aggregate,
    DateTimeOffset? LastNotifiedAt,
    int NotificationsSent,
    DateTimeOffset CreatedAt,
    DateTimeOffset UpdatedAt,
    string CreatedBy,
    string UpdatedBy)
{
    /// <summary>Creates a new notification rule.</summary>
    public static NotificationRule Create(
        string tenantId,
        NotificationChannel channel,
        string endpoint,
        string createdBy,
        DateTimeOffset createdAt,
        string? jobTypePattern = null,
        string? errorCodePattern = null,
        ErrorCategory? category = null,
        Guid? sourceId = null,
        int cooldownMinutes = 15,
        int maxPerHour = 10,
        bool aggregate = true)
    {
        return new NotificationRule(
            RuleId: Guid.NewGuid(),
            TenantId: tenantId,
            JobTypePattern: jobTypePattern,
            ErrorCodePattern: errorCodePattern,
            Category: category,
            SourceId: sourceId,
            Enabled: true,
            Channel: channel,
            Endpoint: endpoint,
            CooldownMinutes: cooldownMinutes,
            MaxPerHour: maxPerHour,
            Aggregate: aggregate,
            LastNotifiedAt: null,
            NotificationsSent: 0,
            CreatedAt: createdAt,
            UpdatedAt: createdAt,
            CreatedBy: createdBy,
            UpdatedBy: createdBy);
    }

    /// <summary>Checks if this rule matches the given entry.</summary>
    public bool Matches(DeadLetterEntry entry)
    {
        if (!Enabled) return false;

        if (SourceId.HasValue && entry.SourceId != SourceId.Value) return false;
        if (Category.HasValue && entry.Category != Category.Value) return false;

        if (!string.IsNullOrEmpty(JobTypePattern))
        {
            if (!System.Text.RegularExpressions.Regex.IsMatch(entry.JobType, JobTypePattern))
                return false;
        }

        if (!string.IsNullOrEmpty(ErrorCodePattern))
        {
            if (!System.Text.RegularExpressions.Regex.IsMatch(entry.ErrorCode, ErrorCodePattern))
                return false;
        }

        return true;
    }

    /// <summary>Checks if this rule is within rate limits.</summary>
    public bool CanNotify(DateTimeOffset now, int notificationsSentThisHour)
    {
        if (!Enabled) return false;

        if (notificationsSentThisHour >= MaxPerHour) return false;

        if (LastNotifiedAt.HasValue)
        {
            var elapsed = now - LastNotifiedAt.Value;
            if (elapsed < TimeSpan.FromMinutes(CooldownMinutes))
                return false;
        }

        return true;
    }

    /// <summary>Records a notification sent.</summary>
    public NotificationRule RecordNotification(DateTimeOffset now) =>
        this with
        {
            LastNotifiedAt = now,
            NotificationsSent = NotificationsSent + 1,
            UpdatedAt = now
        };
}

/// <summary>
/// Notification log entry.
/// </summary>
public sealed record NotificationLogEntry(
    Guid LogId,
    string TenantId,
    Guid RuleId,
    IReadOnlyList<Guid> EntryIds,
    NotificationChannel Channel,
    string Endpoint,
    bool Success,
    string? ErrorMessage,
    string? Subject,
    int EntryCount,
    DateTimeOffset SentAt);

/// <summary>
/// Notification payload for dead-letter events.
/// </summary>
public sealed record DeadLetterNotificationPayload(
    string TenantId,
    string EventType,
    IReadOnlyList<DeadLetterEntrySummary> Entries,
    DeadLetterStatsSnapshot? Stats,
    DateTimeOffset Timestamp,
    string? ActionUrl);

/// <summary>
/// Summary of a dead-letter entry for notifications.
/// </summary>
public sealed record DeadLetterEntrySummary(
    Guid EntryId,
    Guid OriginalJobId,
    string JobType,
    string ErrorCode,
    ErrorCategory Category,
    string FailureReason,
    string? RemediationHint,
    bool IsRetryable,
    int ReplayAttempts,
    DateTimeOffset FailedAt);

/// <summary>
/// Stats snapshot for notifications.
/// </summary>
public sealed record DeadLetterStatsSnapshot(
    long PendingCount,
    long RetryableCount,
    long ExhaustedCount);

/// <summary>
/// Interface for dead-letter event notifications.
/// </summary>
public interface IDeadLetterNotifier
{
    /// <summary>Notifies when a new entry is added to the dead-letter store.</summary>
    Task NotifyNewEntryAsync(
        DeadLetterEntry entry,
        CancellationToken cancellationToken);

    /// <summary>Notifies when an entry is successfully replayed.</summary>
    Task NotifyReplaySuccessAsync(
        DeadLetterEntry entry,
        Guid newJobId,
        CancellationToken cancellationToken);

    /// <summary>Notifies when an entry exhausts all replay attempts.</summary>
    Task NotifyExhaustedAsync(
        DeadLetterEntry entry,
        CancellationToken cancellationToken);

    /// <summary>Sends aggregated notifications for pending entries.</summary>
    Task SendAggregatedNotificationsAsync(
        string tenantId,
        CancellationToken cancellationToken);
}

/// <summary>
/// Interface for notification delivery.
/// </summary>
public interface INotificationDelivery
{
    /// <summary>Sends a notification to the specified endpoint.</summary>
    Task<bool> SendAsync(
        NotificationChannel channel,
        string endpoint,
        DeadLetterNotificationPayload payload,
        CancellationToken cancellationToken);
}

/// <summary>
/// Repository for notification rules.
/// </summary>
public interface INotificationRuleRepository
{
    Task<NotificationRule?> GetByIdAsync(string tenantId, Guid ruleId, CancellationToken cancellationToken);
    Task<IReadOnlyList<NotificationRule>> ListAsync(string tenantId, bool enabledOnly, CancellationToken cancellationToken);
    Task<IReadOnlyList<NotificationRule>> GetMatchingRulesAsync(string tenantId, DeadLetterEntry entry, CancellationToken cancellationToken);
    Task CreateAsync(NotificationRule rule, CancellationToken cancellationToken);
    Task<bool> UpdateAsync(NotificationRule rule, CancellationToken cancellationToken);
    Task<bool> DeleteAsync(string tenantId, Guid ruleId, CancellationToken cancellationToken);
    Task<int> GetNotificationCountThisHourAsync(string tenantId, Guid ruleId, CancellationToken cancellationToken);
    Task LogNotificationAsync(NotificationLogEntry log, CancellationToken cancellationToken);
}

/// <summary>
/// Default dead-letter notifier implementation.
/// </summary>
public sealed class DeadLetterNotifier : IDeadLetterNotifier
{
    private readonly INotificationRuleRepository _ruleRepository;
    private readonly IDeadLetterRepository _deadLetterRepository;
    private readonly INotificationDelivery _delivery;
    private readonly TimeProvider _timeProvider;
    private readonly ILogger<DeadLetterNotifier> _logger;

    public DeadLetterNotifier(
        INotificationRuleRepository ruleRepository,
        IDeadLetterRepository deadLetterRepository,
        INotificationDelivery delivery,
        TimeProvider timeProvider,
        ILogger<DeadLetterNotifier> logger)
    {
        _ruleRepository = ruleRepository ?? throw new ArgumentNullException(nameof(ruleRepository));
        _deadLetterRepository = deadLetterRepository ?? throw new ArgumentNullException(nameof(deadLetterRepository));
        _delivery = delivery ?? throw new ArgumentNullException(nameof(delivery));
        _timeProvider = timeProvider ?? throw new ArgumentNullException(nameof(timeProvider));
        _logger = logger ?? throw new ArgumentNullException(nameof(logger));
    }

    public async Task NotifyNewEntryAsync(
        DeadLetterEntry entry,
        CancellationToken cancellationToken)
    {
        var rules = await _ruleRepository.GetMatchingRulesAsync(entry.TenantId, entry, cancellationToken)
            .ConfigureAwait(false);

        var now = _timeProvider.GetUtcNow();

        foreach (var rule in rules)
        {
            if (rule.Aggregate)
            {
                // Skip immediate notification for aggregated rules
                continue;
            }

            var notificationsThisHour = await _ruleRepository.GetNotificationCountThisHourAsync(
                entry.TenantId, rule.RuleId, cancellationToken).ConfigureAwait(false);

            if (!rule.CanNotify(now, notificationsThisHour))
            {
                continue;
            }

            await SendNotificationAsync(rule, "new_entry", [entry], null, cancellationToken)
                .ConfigureAwait(false);
        }
    }

    public async Task NotifyReplaySuccessAsync(
        DeadLetterEntry entry,
        Guid newJobId,
        CancellationToken cancellationToken)
    {
        var rules = await _ruleRepository.GetMatchingRulesAsync(entry.TenantId, entry, cancellationToken)
            .ConfigureAwait(false);

        var now = _timeProvider.GetUtcNow();

        foreach (var rule in rules)
        {
            var notificationsThisHour = await _ruleRepository.GetNotificationCountThisHourAsync(
                entry.TenantId, rule.RuleId, cancellationToken).ConfigureAwait(false);

            if (!rule.CanNotify(now, notificationsThisHour))
            {
                continue;
            }

            var payload = new DeadLetterNotificationPayload(
                TenantId: entry.TenantId,
                EventType: "replay_success",
                Entries: [ToSummary(entry)],
                Stats: null,
                Timestamp: now,
                ActionUrl: null);

            var success = await _delivery.SendAsync(rule.Channel, rule.Endpoint, payload, cancellationToken)
                .ConfigureAwait(false);

            await LogNotificationAsync(rule, [entry.EntryId], success, null, cancellationToken)
                .ConfigureAwait(false);
        }
    }

    public async Task NotifyExhaustedAsync(
        DeadLetterEntry entry,
        CancellationToken cancellationToken)
    {
        var rules = await _ruleRepository.GetMatchingRulesAsync(entry.TenantId, entry, cancellationToken)
            .ConfigureAwait(false);

        var now = _timeProvider.GetUtcNow();

        foreach (var rule in rules)
        {
            var notificationsThisHour = await _ruleRepository.GetNotificationCountThisHourAsync(
                entry.TenantId, rule.RuleId, cancellationToken).ConfigureAwait(false);

            if (!rule.CanNotify(now, notificationsThisHour))
            {
                continue;
            }

            await SendNotificationAsync(rule, "exhausted", [entry], null, cancellationToken)
                .ConfigureAwait(false);
        }
    }

    public async Task SendAggregatedNotificationsAsync(
        string tenantId,
        CancellationToken cancellationToken)
    {
        var rules = await _ruleRepository.ListAsync(tenantId, enabledOnly: true, cancellationToken)
            .ConfigureAwait(false);

        var now = _timeProvider.GetUtcNow();
        var stats = await _deadLetterRepository.GetStatsAsync(tenantId, cancellationToken).ConfigureAwait(false);

        foreach (var rule in rules.Where(r => r.Aggregate))
        {
            var notificationsThisHour = await _ruleRepository.GetNotificationCountThisHourAsync(
                tenantId, rule.RuleId, cancellationToken).ConfigureAwait(false);

            if (!rule.CanNotify(now, notificationsThisHour))
            {
                continue;
            }

            // Get pending entries matching this rule
            var options = new DeadLetterListOptions(
                Status: DeadLetterStatus.Pending,
                Category: rule.Category,
                Limit: 10);

            var entries = await _deadLetterRepository.ListAsync(tenantId, options, cancellationToken)
                .ConfigureAwait(false);

            // Filter to only matching entries
            var matchingEntries = entries.Where(e => rule.Matches(e)).ToList();

            if (matchingEntries.Count == 0)
            {
                continue;
            }

            var statsSnapshot = new DeadLetterStatsSnapshot(
                PendingCount: stats.PendingEntries,
                RetryableCount: stats.RetryableEntries,
                ExhaustedCount: stats.ExhaustedEntries);

            await SendNotificationAsync(rule, "aggregated", matchingEntries, statsSnapshot, cancellationToken)
                .ConfigureAwait(false);
        }
    }

    private async Task SendNotificationAsync(
        NotificationRule rule,
        string eventType,
        IReadOnlyList<DeadLetterEntry> entries,
        DeadLetterStatsSnapshot? stats,
        CancellationToken cancellationToken)
    {
        var now = _timeProvider.GetUtcNow();

        var payload = new DeadLetterNotificationPayload(
            TenantId: rule.TenantId,
            EventType: eventType,
            Entries: entries.Select(ToSummary).ToList(),
            Stats: stats,
            Timestamp: now,
            ActionUrl: null);

        string? errorMessage = null;
        bool success;

        try
        {
            success = await _delivery.SendAsync(rule.Channel, rule.Endpoint, payload, cancellationToken)
                .ConfigureAwait(false);
        }
        catch (Exception ex)
        {
            success = false;
            errorMessage = ex.Message;
            _logger.LogError(ex, "Failed to send {EventType} notification for rule {RuleId}", eventType, rule.RuleId);
        }

        await LogNotificationAsync(rule, entries.Select(e => e.EntryId).ToList(), success, errorMessage, cancellationToken)
            .ConfigureAwait(false);

        if (success)
        {
            var updatedRule = rule.RecordNotification(now);
            await _ruleRepository.UpdateAsync(updatedRule, cancellationToken).ConfigureAwait(false);
            _logger.LogInformation(
                "Dead-letter notification sent: tenant={TenantId}, channel={Channel}, eventType={EventType}",
                rule.TenantId, rule.Channel, eventType);
        }
        else
        {
            _logger.LogWarning(
                "Dead-letter notification failed: tenant={TenantId}, channel={Channel}, eventType={EventType}",
                rule.TenantId, rule.Channel, eventType);
        }
    }

    private async Task LogNotificationAsync(
        NotificationRule rule,
        IReadOnlyList<Guid> entryIds,
        bool success,
        string? errorMessage,
        CancellationToken cancellationToken)
    {
        var log = new NotificationLogEntry(
            LogId: Guid.NewGuid(),
            TenantId: rule.TenantId,
            RuleId: rule.RuleId,
            EntryIds: entryIds,
            Channel: rule.Channel,
            Endpoint: rule.Endpoint,
            Success: success,
            ErrorMessage: errorMessage,
            Subject: null,
            EntryCount: entryIds.Count,
            SentAt: _timeProvider.GetUtcNow());

        await _ruleRepository.LogNotificationAsync(log, cancellationToken).ConfigureAwait(false);
    }

    private static DeadLetterEntrySummary ToSummary(DeadLetterEntry entry) =>
        new(
            EntryId: entry.EntryId,
            OriginalJobId: entry.OriginalJobId,
            JobType: entry.JobType,
            ErrorCode: entry.ErrorCode,
            Category: entry.Category,
            FailureReason: entry.FailureReason,
            RemediationHint: entry.RemediationHint,
            IsRetryable: entry.IsRetryable,
            ReplayAttempts: entry.ReplayAttempts,
            FailedAt: entry.FailedAt);
}

/// <summary>
/// No-op notifier for when notifications are disabled.
/// </summary>
public sealed class NullDeadLetterNotifier : IDeadLetterNotifier
{
    public static readonly NullDeadLetterNotifier Instance = new();

    private NullDeadLetterNotifier() { }

    public Task NotifyNewEntryAsync(DeadLetterEntry entry, CancellationToken cancellationToken) =>
        Task.CompletedTask;

    public Task NotifyReplaySuccessAsync(DeadLetterEntry entry, Guid newJobId, CancellationToken cancellationToken) =>
        Task.CompletedTask;

    public Task NotifyExhaustedAsync(DeadLetterEntry entry, CancellationToken cancellationToken) =>
        Task.CompletedTask;

    public Task SendAggregatedNotificationsAsync(string tenantId, CancellationToken cancellationToken) =>
        Task.CompletedTask;
}
@@ -0,0 +1,578 @@
using StellaOps.JobEngine.Core.Domain;

namespace StellaOps.JobEngine.Core.DeadLetter;

/// <summary>
/// Represents a classified error with remediation guidance.
/// </summary>
public sealed record ClassifiedError(
    /// <summary>Error code (e.g., "ORCH-ERR-001").</summary>
    string ErrorCode,

    /// <summary>Error category.</summary>
    ErrorCategory Category,

    /// <summary>Human-readable description.</summary>
    string Description,

    /// <summary>Remediation hint for operators.</summary>
    string RemediationHint,

    /// <summary>Whether this error is potentially retryable.</summary>
    bool IsRetryable,

    /// <summary>Suggested retry delay if retryable.</summary>
    TimeSpan? SuggestedRetryDelay);

/// <summary>
/// Classifies errors and provides remediation hints.
/// </summary>
public interface IErrorClassifier
{
    /// <summary>Classifies an exception into a categorized error.</summary>
    ClassifiedError Classify(Exception exception);

    /// <summary>Classifies an error code and message.</summary>
    ClassifiedError Classify(string errorCode, string message);

    /// <summary>Classifies based on HTTP status code and message.</summary>
    ClassifiedError ClassifyHttpError(int statusCode, string? message);
}

/// <summary>
/// Default error classifier with standard error codes and remediation hints.
/// </summary>
public sealed class DefaultErrorClassifier : IErrorClassifier
{
    /// <summary>Known error codes with classifications.</summary>
    public static class ErrorCodes
    {
        // Transient errors (ORCH-TRN-xxx)
        public const string NetworkTimeout = "ORCH-TRN-001";
        public const string ConnectionRefused = "ORCH-TRN-002";
        public const string DnsResolutionFailed = "ORCH-TRN-003";
        public const string ServiceUnavailable = "ORCH-TRN-004";
        public const string GatewayTimeout = "ORCH-TRN-005";
        public const string TemporaryFailure = "ORCH-TRN-099";

        // Not found errors (ORCH-NF-xxx)
        public const string ImageNotFound = "ORCH-NF-001";
        public const string SourceNotFound = "ORCH-NF-002";
        public const string RegistryNotFound = "ORCH-NF-003";
        public const string ManifestNotFound = "ORCH-NF-004";
        public const string ResourceNotFound = "ORCH-NF-099";

        // Auth errors (ORCH-AUTH-xxx)
        public const string InvalidCredentials = "ORCH-AUTH-001";
        public const string TokenExpired = "ORCH-AUTH-002";
        public const string InsufficientPermissions = "ORCH-AUTH-003";
        public const string CertificateError = "ORCH-AUTH-004";
        public const string AuthenticationFailed = "ORCH-AUTH-099";

        // Rate limit errors (ORCH-RL-xxx)
        public const string RateLimited = "ORCH-RL-001";
        public const string QuotaExceeded = "ORCH-RL-002";
        public const string ConcurrencyLimitReached = "ORCH-RL-003";
        public const string ThrottlingError = "ORCH-RL-099";

        // Validation errors (ORCH-VAL-xxx)
        public const string InvalidPayload = "ORCH-VAL-001";
        public const string InvalidConfiguration = "ORCH-VAL-002";
        public const string SchemaValidationFailed = "ORCH-VAL-003";
        public const string MissingRequiredField = "ORCH-VAL-004";
        public const string ValidationFailed = "ORCH-VAL-099";

        // Upstream errors (ORCH-UP-xxx)
        public const string RegistryError = "ORCH-UP-001";
        public const string AdvisoryFeedError = "ORCH-UP-002";
        public const string DatabaseError = "ORCH-UP-003";
        public const string ExternalServiceError = "ORCH-UP-099";

        // Internal errors (ORCH-INT-xxx)
        public const string InternalError = "ORCH-INT-001";
        public const string StateCorruption = "ORCH-INT-002";
        public const string ProcessingError = "ORCH-INT-003";
        public const string UnexpectedError = "ORCH-INT-099";

        // Conflict errors (ORCH-CON-xxx)
        public const string DuplicateJob = "ORCH-CON-001";
        public const string VersionMismatch = "ORCH-CON-002";
        public const string ConcurrentModification = "ORCH-CON-003";
        public const string ConflictError = "ORCH-CON-099";

        // Canceled errors (ORCH-CAN-xxx)
        public const string UserCanceled = "ORCH-CAN-001";
        public const string SystemCanceled = "ORCH-CAN-002";
        public const string TimeoutCanceled = "ORCH-CAN-003";
        public const string OperationCanceled = "ORCH-CAN-099";
    }

    private static readonly Dictionary<string, ClassifiedError> KnownErrors = new()
    {
        // Transient errors
        [ErrorCodes.NetworkTimeout] = new(
            ErrorCodes.NetworkTimeout,
            ErrorCategory.Transient,
            "Network operation timed out",
            "Check network connectivity and firewall rules. If the target service is healthy, increase timeout settings.",
            IsRetryable: true,
            SuggestedRetryDelay: TimeSpan.FromMinutes(1)),

        [ErrorCodes.ConnectionRefused] = new(
            ErrorCodes.ConnectionRefused,
            ErrorCategory.Transient,
            "Connection refused by target host",
            "Verify the target service is running and accessible. Check firewall rules and network policies.",
            IsRetryable: true,
            SuggestedRetryDelay: TimeSpan.FromMinutes(2)),

        [ErrorCodes.DnsResolutionFailed] = new(
            ErrorCodes.DnsResolutionFailed,
            ErrorCategory.Transient,
            "DNS resolution failed",
            "Verify the hostname is correct. Check DNS server configuration and network connectivity.",
            IsRetryable: true,
            SuggestedRetryDelay: TimeSpan.FromMinutes(1)),

        [ErrorCodes.ServiceUnavailable] = new(
            ErrorCodes.ServiceUnavailable,
            ErrorCategory.Transient,
            "Service temporarily unavailable (503)",
            "The target service is temporarily overloaded or under maintenance. Retry with exponential backoff.",
            IsRetryable: true,
            SuggestedRetryDelay: TimeSpan.FromMinutes(5)),

        [ErrorCodes.GatewayTimeout] = new(
            ErrorCodes.GatewayTimeout,
            ErrorCategory.Transient,
            "Gateway timeout (504)",
            "An upstream service took too long to respond. This is typically transient; retry with backoff.",
            IsRetryable: true,
            SuggestedRetryDelay: TimeSpan.FromMinutes(2)),

        [ErrorCodes.TemporaryFailure] = new(
            ErrorCodes.TemporaryFailure,
            ErrorCategory.Transient,
            "Temporary failure",
            "A transient error occurred. Retry the operation after a brief delay.",
            IsRetryable: true,
            SuggestedRetryDelay: TimeSpan.FromMinutes(1)),

        // Not found errors
        [ErrorCodes.ImageNotFound] = new(
            ErrorCodes.ImageNotFound,
            ErrorCategory.NotFound,
            "Container image not found",
            "Verify the image reference is correct (repository, tag, digest). Check registry access and that the image exists.",
            IsRetryable: false,
            SuggestedRetryDelay: null),

        [ErrorCodes.SourceNotFound] = new(
            ErrorCodes.SourceNotFound,
            ErrorCategory.NotFound,
            "Source configuration not found",
            "The referenced source may have been deleted. Verify the source ID and recreate if necessary.",
            IsRetryable: false,
            SuggestedRetryDelay: null),

        [ErrorCodes.RegistryNotFound] = new(
            ErrorCodes.RegistryNotFound,
            ErrorCategory.NotFound,
            "Container registry not found",
            "Verify the registry URL is correct. Check DNS resolution and that the registry is operational.",
            IsRetryable: false,
            SuggestedRetryDelay: null),

        [ErrorCodes.ManifestNotFound] = new(
            ErrorCodes.ManifestNotFound,
            ErrorCategory.NotFound,
            "Image manifest not found",
            "The image exists but the manifest is missing. The image may have been deleted or the tag moved.",
            IsRetryable: false,
            SuggestedRetryDelay: null),

        [ErrorCodes.ResourceNotFound] = new(
            ErrorCodes.ResourceNotFound,
            ErrorCategory.NotFound,
            "Resource not found",
            "The requested resource does not exist. Verify the resource identifier is correct.",
            IsRetryable: false,
            SuggestedRetryDelay: null),

        // Auth errors
        [ErrorCodes.InvalidCredentials] = new(
            ErrorCodes.InvalidCredentials,
            ErrorCategory.AuthFailure,
            "Invalid credentials",
            "The provided credentials are invalid. Update the registry credentials in the source configuration.",
            IsRetryable: false,
            SuggestedRetryDelay: null),

        [ErrorCodes.TokenExpired] = new(
            ErrorCodes.TokenExpired,
            ErrorCategory.AuthFailure,
            "Authentication token expired",
            "The authentication token has expired. Refresh credentials or re-authenticate to obtain a new token.",
            IsRetryable: true,
            SuggestedRetryDelay: TimeSpan.FromMinutes(1)),

        [ErrorCodes.InsufficientPermissions] = new(
            ErrorCodes.InsufficientPermissions,
            ErrorCategory.AuthFailure,
            "Insufficient permissions",
            "The authenticated user lacks required permissions. Request access from the registry administrator.",
            IsRetryable: false,
            SuggestedRetryDelay: null),

        [ErrorCodes.CertificateError] = new(
            ErrorCodes.CertificateError,
            ErrorCategory.AuthFailure,
            "TLS certificate error",
            "Certificate validation failed. Verify the CA bundle or add the registry's certificate to trusted roots.",
            IsRetryable: false,
            SuggestedRetryDelay: null),

        [ErrorCodes.AuthenticationFailed] = new(
            ErrorCodes.AuthenticationFailed,
            ErrorCategory.AuthFailure,
            "Authentication failed",
            "Unable to authenticate with the target service. Verify credentials and authentication configuration.",
            IsRetryable: false,
            SuggestedRetryDelay: null),

        // Rate limit errors
        [ErrorCodes.RateLimited] = new(
            ErrorCodes.RateLimited,
            ErrorCategory.RateLimited,
            "Rate limit exceeded (429)",
            "Request rate limit exceeded. Reduce request frequency or upgrade service tier. Will auto-retry with backoff.",
            IsRetryable: true,
            SuggestedRetryDelay: TimeSpan.FromMinutes(5)),

        [ErrorCodes.QuotaExceeded] = new(
            ErrorCodes.QuotaExceeded,
            ErrorCategory.RateLimited,
            "Quota exceeded",
            "Usage quota has been exceeded. Wait for quota reset or request quota increase.",
            IsRetryable: true,
            SuggestedRetryDelay: TimeSpan.FromHours(1)),

        [ErrorCodes.ConcurrencyLimitReached] = new(
            ErrorCodes.ConcurrencyLimitReached,
            ErrorCategory.RateLimited,
            "Concurrency limit reached",
            "Maximum concurrent operations limit reached. Reduce parallel operations or increase limit.",
            IsRetryable: true,
            SuggestedRetryDelay: TimeSpan.FromMinutes(1)),

        [ErrorCodes.ThrottlingError] = new(
            ErrorCodes.ThrottlingError,
            ErrorCategory.RateLimited,
            "Request throttled",
            "Request was throttled due to rate limits. Retry with exponential backoff.",
            IsRetryable: true,
            SuggestedRetryDelay: TimeSpan.FromMinutes(2)),

        // Validation errors
        [ErrorCodes.InvalidPayload] = new(
            ErrorCodes.InvalidPayload,
            ErrorCategory.ValidationError,
            "Invalid job payload",
            "The job payload is malformed or invalid. Review the payload structure and fix validation errors.",
            IsRetryable: false,
            SuggestedRetryDelay: null),

        [ErrorCodes.InvalidConfiguration] = new(
            ErrorCodes.InvalidConfiguration,
            ErrorCategory.ValidationError,
            "Invalid configuration",
            "Source or job configuration is invalid. Review and correct the configuration settings.",
            IsRetryable: false,
            SuggestedRetryDelay: null),

        [ErrorCodes.SchemaValidationFailed] = new(
            ErrorCodes.SchemaValidationFailed,
            ErrorCategory.ValidationError,
            "Schema validation failed",
            "Input data failed schema validation. Ensure data conforms to the expected schema.",
            IsRetryable: false,
            SuggestedRetryDelay: null),

        [ErrorCodes.MissingRequiredField] = new(
            ErrorCodes.MissingRequiredField,
            ErrorCategory.ValidationError,
            "Missing required field",
            "A required field is missing from the input. Provide all required fields.",
            IsRetryable: false,
            SuggestedRetryDelay: null),

        [ErrorCodes.ValidationFailed] = new(
            ErrorCodes.ValidationFailed,
            ErrorCategory.ValidationError,
            "Validation failed",
            "Input validation failed. Review the error details and correct the input.",
            IsRetryable: false,
            SuggestedRetryDelay: null),

        // Upstream errors
        [ErrorCodes.RegistryError] = new(
            ErrorCodes.RegistryError,
            ErrorCategory.UpstreamError,
            "Container registry error",
            "The container registry returned an error. Check registry status and logs for details.",
            IsRetryable: true,
            SuggestedRetryDelay: TimeSpan.FromMinutes(5)),

        [ErrorCodes.AdvisoryFeedError] = new(
            ErrorCodes.AdvisoryFeedError,
            ErrorCategory.UpstreamError,
            "Advisory feed error",
            "Error fetching from advisory feed. Check feed URL and authentication. May be temporary.",
            IsRetryable: true,
            SuggestedRetryDelay: TimeSpan.FromMinutes(15)),

        [ErrorCodes.DatabaseError] = new(
            ErrorCodes.DatabaseError,
            ErrorCategory.UpstreamError,
            "Database error",
            "Database operation failed. Check database connectivity and status.",
            IsRetryable: true,
            SuggestedRetryDelay: TimeSpan.FromMinutes(1)),

        [ErrorCodes.ExternalServiceError] = new(
            ErrorCodes.ExternalServiceError,
            ErrorCategory.UpstreamError,
            "External service error",
            "An external service dependency failed. Check service status and connectivity.",
            IsRetryable: true,
            SuggestedRetryDelay: TimeSpan.FromMinutes(5)),

        // Internal errors
        [ErrorCodes.InternalError] = new(
            ErrorCodes.InternalError,
            ErrorCategory.InternalError,
            "Internal processing error",
            "An internal error occurred. This may indicate a bug. Please report if persistent.",
            IsRetryable: false,
            SuggestedRetryDelay: null),

        [ErrorCodes.StateCorruption] = new(
            ErrorCodes.StateCorruption,
            ErrorCategory.InternalError,
            "State corruption detected",
            "Internal state corruption detected. Manual intervention may be required.",
            IsRetryable: false,
            SuggestedRetryDelay: null),

        [ErrorCodes.ProcessingError] = new(
            ErrorCodes.ProcessingError,
            ErrorCategory.InternalError,
            "Processing error",
            "Error during job processing. Review job payload and configuration.",
            IsRetryable: false,
            SuggestedRetryDelay: null),

        [ErrorCodes.UnexpectedError] = new(
            ErrorCodes.UnexpectedError,
            ErrorCategory.InternalError,
            "Unexpected error",
            "An unexpected error occurred. This may indicate a bug. Please report with error details.",
            IsRetryable: false,
            SuggestedRetryDelay: null),

        // Conflict errors
        [ErrorCodes.DuplicateJob] = new(
            ErrorCodes.DuplicateJob,
            ErrorCategory.Conflict,
            "Duplicate job detected",
            "A job with the same idempotency key already exists. This is expected for retry scenarios.",
            IsRetryable: false,
            SuggestedRetryDelay: null),

        [ErrorCodes.VersionMismatch] = new(
            ErrorCodes.VersionMismatch,
            ErrorCategory.Conflict,
            "Version mismatch",
            "Resource version conflict detected. Refresh and retry the operation.",
            IsRetryable: true,
            SuggestedRetryDelay: TimeSpan.FromSeconds(5)),

        [ErrorCodes.ConcurrentModification] = new(
            ErrorCodes.ConcurrentModification,
            ErrorCategory.Conflict,
            "Concurrent modification",
            "Resource was modified concurrently. Refresh state and retry.",
            IsRetryable: true,
            SuggestedRetryDelay: TimeSpan.FromSeconds(5)),

        [ErrorCodes.ConflictError] = new(
            ErrorCodes.ConflictError,
            ErrorCategory.Conflict,
            "Resource conflict",
            "A resource conflict occurred. Check for concurrent operations.",
            IsRetryable: true,
            SuggestedRetryDelay: TimeSpan.FromSeconds(10)),

        // Canceled errors
        [ErrorCodes.UserCanceled] = new(
            ErrorCodes.UserCanceled,
            ErrorCategory.Canceled,
            "Canceled by user",
            "Operation was canceled by user request. No action required unless retry is desired.",
            IsRetryable: false,
            SuggestedRetryDelay: null),

        [ErrorCodes.SystemCanceled] = new(
            ErrorCodes.SystemCanceled,
            ErrorCategory.Canceled,
            "Canceled by system",
            "Operation was canceled by the system (e.g., shutdown, quota). May be automatically rescheduled.",
            IsRetryable: true,
            SuggestedRetryDelay: TimeSpan.FromMinutes(5)),

        [ErrorCodes.TimeoutCanceled] = new(
            ErrorCodes.TimeoutCanceled,
            ErrorCategory.Canceled,
            "Canceled due to timeout",
            "Operation exceeded its time limit. Consider increasing timeout or optimizing the operation.",
            IsRetryable: true,
            SuggestedRetryDelay: TimeSpan.FromMinutes(2)),

        [ErrorCodes.OperationCanceled] = new(
            ErrorCodes.OperationCanceled,
            ErrorCategory.Canceled,
            "Operation canceled",
            "The operation was canceled. Check cancellation source for details.",
            IsRetryable: false,
            SuggestedRetryDelay: null)
    };
|
||||
/// <inheritdoc />
|
||||
public ClassifiedError Classify(Exception exception)
|
||||
{
|
||||
ArgumentNullException.ThrowIfNull(exception);
|
||||
|
||||
return exception switch
|
||||
{
|
||||
OperationCanceledException => KnownErrors[ErrorCodes.OperationCanceled],
|
||||
TimeoutException => KnownErrors[ErrorCodes.NetworkTimeout],
|
||||
HttpRequestException httpEx => ClassifyHttpException(httpEx),
|
||||
_ when exception.Message.Contains("connection refused", StringComparison.OrdinalIgnoreCase)
|
||||
=> KnownErrors[ErrorCodes.ConnectionRefused],
|
||||
_ when exception.Message.Contains("DNS", StringComparison.OrdinalIgnoreCase)
|
||||
=> KnownErrors[ErrorCodes.DnsResolutionFailed],
|
||||
_ when exception.Message.Contains("timeout", StringComparison.OrdinalIgnoreCase)
|
||||
=> KnownErrors[ErrorCodes.NetworkTimeout],
|
||||
_ when exception.Message.Contains("certificate", StringComparison.OrdinalIgnoreCase)
|
||||
=> KnownErrors[ErrorCodes.CertificateError],
|
||||
_ when exception.Message.Contains("unauthorized", StringComparison.OrdinalIgnoreCase)
|
||||
=> KnownErrors[ErrorCodes.AuthenticationFailed],
|
||||
_ when exception.Message.Contains("forbidden", StringComparison.OrdinalIgnoreCase)
|
||||
=> KnownErrors[ErrorCodes.InsufficientPermissions],
|
||||
_ => new ClassifiedError(
|
||||
ErrorCodes.UnexpectedError,
|
||||
ErrorCategory.InternalError,
|
||||
exception.GetType().Name,
|
||||
$"Unexpected error: {exception.Message}. Review stack trace for details.",
|
||||
IsRetryable: false,
|
||||
SuggestedRetryDelay: null)
|
||||
};
|
||||
}
|
||||
|
||||
/// <inheritdoc />
|
||||
public ClassifiedError Classify(string errorCode, string message)
|
||||
{
|
||||
ArgumentException.ThrowIfNullOrWhiteSpace(errorCode);
|
||||
|
||||
if (KnownErrors.TryGetValue(errorCode, out var known))
|
||||
{
|
||||
return known;
|
||||
}
|
||||
|
||||
// Try to infer from error code prefix
|
||||
var category = errorCode switch
|
||||
{
|
||||
_ when errorCode.StartsWith("ORCH-TRN-", StringComparison.Ordinal) => ErrorCategory.Transient,
|
||||
_ when errorCode.StartsWith("ORCH-NF-", StringComparison.Ordinal) => ErrorCategory.NotFound,
|
||||
_ when errorCode.StartsWith("ORCH-AUTH-", StringComparison.Ordinal) => ErrorCategory.AuthFailure,
|
||||
_ when errorCode.StartsWith("ORCH-RL-", StringComparison.Ordinal) => ErrorCategory.RateLimited,
|
||||
_ when errorCode.StartsWith("ORCH-VAL-", StringComparison.Ordinal) => ErrorCategory.ValidationError,
|
||||
_ when errorCode.StartsWith("ORCH-UP-", StringComparison.Ordinal) => ErrorCategory.UpstreamError,
|
||||
_ when errorCode.StartsWith("ORCH-INT-", StringComparison.Ordinal) => ErrorCategory.InternalError,
|
||||
_ when errorCode.StartsWith("ORCH-CON-", StringComparison.Ordinal) => ErrorCategory.Conflict,
|
||||
_ when errorCode.StartsWith("ORCH-CAN-", StringComparison.Ordinal) => ErrorCategory.Canceled,
|
||||
_ => ErrorCategory.Unknown
|
||||
};
|
||||
|
||||
var isRetryable = category is ErrorCategory.Transient or ErrorCategory.RateLimited or ErrorCategory.UpstreamError;
|
||||
|
||||
return new ClassifiedError(
|
||||
errorCode,
|
||||
category,
|
||||
message,
|
||||
"Unknown error code. Review the error message for details.",
|
||||
isRetryable,
|
||||
isRetryable ? TimeSpan.FromMinutes(5) : null);
|
||||
}
|
||||
|
||||
/// <inheritdoc />
|
||||
public ClassifiedError ClassifyHttpError(int statusCode, string? message)
|
||||
{
|
||||
return statusCode switch
|
||||
{
|
||||
400 => KnownErrors[ErrorCodes.ValidationFailed],
|
||||
401 => KnownErrors[ErrorCodes.AuthenticationFailed],
|
||||
403 => KnownErrors[ErrorCodes.InsufficientPermissions],
|
||||
404 => KnownErrors[ErrorCodes.ResourceNotFound],
|
||||
408 => KnownErrors[ErrorCodes.NetworkTimeout],
|
||||
409 => KnownErrors[ErrorCodes.ConflictError],
|
||||
429 => KnownErrors[ErrorCodes.RateLimited],
|
||||
500 => KnownErrors[ErrorCodes.InternalError],
|
||||
502 => KnownErrors[ErrorCodes.ExternalServiceError],
|
||||
503 => KnownErrors[ErrorCodes.ServiceUnavailable],
|
||||
504 => KnownErrors[ErrorCodes.GatewayTimeout],
|
||||
_ when statusCode >= 400 && statusCode < 500 => new ClassifiedError(
|
||||
$"HTTP-{statusCode}",
|
||||
ErrorCategory.ValidationError,
|
||||
message ?? $"HTTP {statusCode} error",
|
||||
"Client error. Review request parameters.",
|
||||
IsRetryable: false,
|
||||
SuggestedRetryDelay: null),
|
||||
_ when statusCode >= 500 => new ClassifiedError(
|
||||
$"HTTP-{statusCode}",
|
||||
ErrorCategory.UpstreamError,
|
||||
message ?? $"HTTP {statusCode} error",
|
||||
"Server error. May be transient; retry with backoff.",
|
||||
IsRetryable: true,
|
||||
SuggestedRetryDelay: TimeSpan.FromMinutes(2)),
|
||||
_ => new ClassifiedError(
|
||||
$"HTTP-{statusCode}",
|
||||
ErrorCategory.Unknown,
|
||||
message ?? $"HTTP {statusCode}",
|
||||
"Unexpected HTTP status. Review response for details.",
|
||||
IsRetryable: false,
|
||||
SuggestedRetryDelay: null)
|
||||
};
|
||||
}
|
||||
|
||||
private ClassifiedError ClassifyHttpException(HttpRequestException ex)
|
||||
{
|
||||
if (ex.StatusCode.HasValue)
|
||||
{
|
||||
return ClassifyHttpError((int)ex.StatusCode.Value, ex.Message);
|
||||
}
|
||||
|
||||
// No status code - likely a connection error
|
||||
return ex.Message switch
|
||||
{
|
||||
_ when ex.Message.Contains("connection refused", StringComparison.OrdinalIgnoreCase)
|
||||
=> KnownErrors[ErrorCodes.ConnectionRefused],
|
||||
_ when ex.Message.Contains("name resolution", StringComparison.OrdinalIgnoreCase)
|
||||
=> KnownErrors[ErrorCodes.DnsResolutionFailed],
|
||||
_ when ex.Message.Contains("SSL", StringComparison.OrdinalIgnoreCase) ||
|
||||
ex.Message.Contains("TLS", StringComparison.OrdinalIgnoreCase)
|
||||
=> KnownErrors[ErrorCodes.CertificateError],
|
||||
_ => KnownErrors[ErrorCodes.ExternalServiceError]
|
||||
};
|
||||
}
|
||||
}
|
||||
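The prefix-based fallback in `Classify(string, string)` can be exercised as follows — a minimal sketch, assuming `classifier` is an instance of the classifier class above (its name is not visible in this hunk) and that the sample code is not already present in the `KnownErrors` table:

```csharp
// Hypothetical usage; "ORCH-TRN-999" is an illustrative unknown code.
var unknown = classifier.Classify("ORCH-TRN-999", "socket reset by peer");

// The "ORCH-TRN-" prefix maps to ErrorCategory.Transient, one of the
// retryable categories, so the fallback marks the error retryable with
// a five-minute suggested delay:
//   unknown.Category            == ErrorCategory.Transient
//   unknown.IsRetryable         == true
//   unknown.SuggestedRetryDelay == TimeSpan.FromMinutes(5)
```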
@@ -0,0 +1,221 @@
using StellaOps.JobEngine.Core.Domain;

namespace StellaOps.JobEngine.Core.DeadLetter;

/// <summary>
/// Repository for dead-letter entry persistence.
/// </summary>
public interface IDeadLetterRepository
{
    /// <summary>Gets a dead-letter entry by ID.</summary>
    Task<DeadLetterEntry?> GetByIdAsync(
        string tenantId,
        Guid entryId,
        CancellationToken cancellationToken);

    /// <summary>Gets a dead-letter entry by original job ID.</summary>
    Task<DeadLetterEntry?> GetByOriginalJobIdAsync(
        string tenantId,
        Guid originalJobId,
        CancellationToken cancellationToken);

    /// <summary>Lists dead-letter entries with filtering and pagination.</summary>
    Task<IReadOnlyList<DeadLetterEntry>> ListAsync(
        string tenantId,
        DeadLetterListOptions options,
        CancellationToken cancellationToken);

    /// <summary>Counts dead-letter entries with filtering.</summary>
    Task<long> CountAsync(
        string tenantId,
        DeadLetterListOptions options,
        CancellationToken cancellationToken);

    /// <summary>Creates a new dead-letter entry.</summary>
    Task CreateAsync(
        DeadLetterEntry entry,
        CancellationToken cancellationToken);

    /// <summary>Updates an existing dead-letter entry.</summary>
    Task<bool> UpdateAsync(
        DeadLetterEntry entry,
        CancellationToken cancellationToken);

    /// <summary>Gets entries pending replay that are retryable.</summary>
    Task<IReadOnlyList<DeadLetterEntry>> GetPendingRetryableAsync(
        string tenantId,
        int limit,
        CancellationToken cancellationToken);

    /// <summary>Gets entries by error code.</summary>
    Task<IReadOnlyList<DeadLetterEntry>> GetByErrorCodeAsync(
        string tenantId,
        string errorCode,
        DeadLetterStatus? status,
        int limit,
        CancellationToken cancellationToken);

    /// <summary>Gets entries by category.</summary>
    Task<IReadOnlyList<DeadLetterEntry>> GetByCategoryAsync(
        string tenantId,
        ErrorCategory category,
        DeadLetterStatus? status,
        int limit,
        CancellationToken cancellationToken);

    /// <summary>Gets aggregated statistics.</summary>
    Task<DeadLetterStats> GetStatsAsync(
        string tenantId,
        CancellationToken cancellationToken);

    /// <summary>Gets a summary of actionable entries grouped by error code.</summary>
    Task<IReadOnlyList<DeadLetterSummary>> GetActionableSummaryAsync(
        string tenantId,
        int limit,
        CancellationToken cancellationToken);

    /// <summary>Marks expired entries.</summary>
    Task<int> MarkExpiredAsync(
        int batchLimit,
        CancellationToken cancellationToken);

    /// <summary>Purges old resolved/expired entries.</summary>
    Task<int> PurgeOldEntriesAsync(
        int retentionDays,
        int batchLimit,
        CancellationToken cancellationToken);
}

/// <summary>
/// Options for listing dead-letter entries.
/// </summary>
public sealed record DeadLetterListOptions(
    DeadLetterStatus? Status = null,
    ErrorCategory? Category = null,
    string? JobType = null,
    string? ErrorCode = null,
    Guid? SourceId = null,
    Guid? RunId = null,
    bool? IsRetryable = null,
    DateTimeOffset? CreatedAfter = null,
    DateTimeOffset? CreatedBefore = null,
    string? Cursor = null,
    int Limit = 50,
    bool Ascending = false);

/// <summary>
/// Aggregated dead-letter statistics.
/// </summary>
public sealed record DeadLetterStats(
    long TotalEntries,
    long PendingEntries,
    long ReplayingEntries,
    long ReplayedEntries,
    long ResolvedEntries,
    long ExhaustedEntries,
    long ExpiredEntries,
    long RetryableEntries,
    IReadOnlyDictionary<ErrorCategory, long> ByCategory,
    IReadOnlyDictionary<string, long> TopErrorCodes,
    IReadOnlyDictionary<string, long> TopJobTypes);

/// <summary>
/// Summary of dead-letter entries grouped by error code.
/// </summary>
public sealed record DeadLetterSummary(
    string ErrorCode,
    ErrorCategory Category,
    long EntryCount,
    long RetryableCount,
    DateTimeOffset OldestEntry,
    string? SampleReason);

/// <summary>
/// Repository for replay audit records.
/// </summary>
public interface IReplayAuditRepository
{
    /// <summary>Gets audit records for an entry.</summary>
    Task<IReadOnlyList<ReplayAuditRecord>> GetByEntryAsync(
        string tenantId,
        Guid entryId,
        CancellationToken cancellationToken);

    /// <summary>Gets a specific audit record.</summary>
    Task<ReplayAuditRecord?> GetByIdAsync(
        string tenantId,
        Guid auditId,
        CancellationToken cancellationToken);

    /// <summary>Creates a new audit record.</summary>
    Task CreateAsync(
        ReplayAuditRecord record,
        CancellationToken cancellationToken);

    /// <summary>Updates an audit record (completion).</summary>
    Task<bool> UpdateAsync(
        ReplayAuditRecord record,
        CancellationToken cancellationToken);

    /// <summary>Gets the audit record for a new job ID (to find the replay source).</summary>
    Task<ReplayAuditRecord?> GetByNewJobIdAsync(
        string tenantId,
        Guid newJobId,
        CancellationToken cancellationToken);
}

/// <summary>
/// Replay attempt audit record.
/// </summary>
public sealed record ReplayAuditRecord(
    Guid AuditId,
    string TenantId,
    Guid EntryId,
    int AttemptNumber,
    bool Success,
    Guid? NewJobId,
    string? ErrorMessage,
    string TriggeredBy,
    DateTimeOffset TriggeredAt,
    DateTimeOffset? CompletedAt,
    string InitiatedBy)
{
    /// <summary>Creates a new audit record for a replay attempt.</summary>
    public static ReplayAuditRecord Create(
        string tenantId,
        Guid entryId,
        int attemptNumber,
        string triggeredBy,
        string initiatedBy,
        DateTimeOffset now) =>
        new(
            AuditId: Guid.NewGuid(),
            TenantId: tenantId,
            EntryId: entryId,
            AttemptNumber: attemptNumber,
            Success: false,
            NewJobId: null,
            ErrorMessage: null,
            TriggeredBy: triggeredBy,
            TriggeredAt: now,
            CompletedAt: null,
            InitiatedBy: initiatedBy);

    /// <summary>Marks the replay as successful.</summary>
    public ReplayAuditRecord Complete(Guid newJobId, DateTimeOffset now) =>
        this with
        {
            Success = true,
            NewJobId = newJobId,
            CompletedAt = now
        };

    /// <summary>Marks the replay as failed.</summary>
    public ReplayAuditRecord Fail(string errorMessage, DateTimeOffset now) =>
        this with
        {
            Success = false,
            ErrorMessage = errorMessage,
            CompletedAt = now
        };
}
@@ -0,0 +1,472 @@
using Microsoft.Extensions.Logging;
using StellaOps.JobEngine.Core.Domain;

namespace StellaOps.JobEngine.Core.DeadLetter;

/// <summary>
/// Options for replay manager configuration.
/// </summary>
public sealed record ReplayManagerOptions(
    /// <summary>Default maximum replay attempts.</summary>
    int DefaultMaxReplayAttempts = 3,

    /// <summary>Default retention period for dead-letter entries.</summary>
    TimeSpan DefaultRetention = default,

    /// <summary>Minimum delay between replay attempts.</summary>
    TimeSpan MinReplayDelay = default,

    /// <summary>Maximum batch size for bulk operations.</summary>
    int MaxBatchSize = 100,

    /// <summary>Enable automatic replay of retryable entries.</summary>
    bool AutoReplayEnabled = false,

    /// <summary>Delay before automatic replay.</summary>
    TimeSpan AutoReplayDelay = default)
{
    /// <summary>Default options.</summary>
    public static ReplayManagerOptions Default => new(
        DefaultMaxReplayAttempts: 3,
        DefaultRetention: TimeSpan.FromDays(30),
        MinReplayDelay: TimeSpan.FromMinutes(5),
        MaxBatchSize: 100,
        AutoReplayEnabled: false,
        AutoReplayDelay: TimeSpan.FromMinutes(15));
}

/// <summary>
/// Result of a replay operation.
/// </summary>
public sealed record ReplayResult(
    bool Success,
    Guid? NewJobId,
    string? ErrorMessage,
    DeadLetterEntry UpdatedEntry);

/// <summary>
/// Result of a batch replay operation.
/// </summary>
public sealed record BatchReplayResult(
    int Attempted,
    int Succeeded,
    int Failed,
    IReadOnlyList<ReplayResult> Results);

/// <summary>
/// Manages dead-letter entry replay operations.
/// </summary>
public interface IReplayManager
{
    /// <summary>Replays a single dead-letter entry.</summary>
    Task<ReplayResult> ReplayAsync(
        string tenantId,
        Guid entryId,
        string initiatedBy,
        CancellationToken cancellationToken);

    /// <summary>Replays multiple entries by ID.</summary>
    Task<BatchReplayResult> ReplayBatchAsync(
        string tenantId,
        IReadOnlyList<Guid> entryIds,
        string initiatedBy,
        CancellationToken cancellationToken);

    /// <summary>Replays all pending retryable entries matching criteria.</summary>
    Task<BatchReplayResult> ReplayPendingAsync(
        string tenantId,
        string? errorCode,
        ErrorCategory? category,
        int maxCount,
        string initiatedBy,
        CancellationToken cancellationToken);

    /// <summary>Resolves an entry without replay.</summary>
    Task<DeadLetterEntry> ResolveAsync(
        string tenantId,
        Guid entryId,
        string notes,
        string resolvedBy,
        CancellationToken cancellationToken);

    /// <summary>Resolves multiple entries without replay.</summary>
    Task<int> ResolveBatchAsync(
        string tenantId,
        IReadOnlyList<Guid> entryIds,
        string notes,
        string resolvedBy,
        CancellationToken cancellationToken);
}

/// <summary>
/// Job creator interface for replay operations.
/// </summary>
public interface IJobCreator
{
    /// <summary>Creates a new job from a dead-letter entry payload.</summary>
    Task<Job> CreateFromReplayAsync(
        string tenantId,
        string jobType,
        string payload,
        string payloadDigest,
        string idempotencyKey,
        string? correlationId,
        Guid replayOf,
        string createdBy,
        CancellationToken cancellationToken);
}

/// <summary>
/// Default replay manager implementation.
/// </summary>
public sealed class ReplayManager : IReplayManager
{
    private readonly IDeadLetterRepository _deadLetterRepository;
    private readonly IReplayAuditRepository _auditRepository;
    private readonly IJobCreator _jobCreator;
    private readonly IDeadLetterNotifier _notifier;
    private readonly TimeProvider _timeProvider;
    private readonly ReplayManagerOptions _options;
    private readonly ILogger<ReplayManager> _logger;

    public ReplayManager(
        IDeadLetterRepository deadLetterRepository,
        IReplayAuditRepository auditRepository,
        IJobCreator jobCreator,
        IDeadLetterNotifier notifier,
        TimeProvider timeProvider,
        ReplayManagerOptions options,
        ILogger<ReplayManager> logger)
    {
        _deadLetterRepository = deadLetterRepository ?? throw new ArgumentNullException(nameof(deadLetterRepository));
        _auditRepository = auditRepository ?? throw new ArgumentNullException(nameof(auditRepository));
        _jobCreator = jobCreator ?? throw new ArgumentNullException(nameof(jobCreator));
        _notifier = notifier ?? throw new ArgumentNullException(nameof(notifier));
        _timeProvider = timeProvider ?? throw new ArgumentNullException(nameof(timeProvider));
        _options = options ?? ReplayManagerOptions.Default;
        _logger = logger ?? throw new ArgumentNullException(nameof(logger));
    }

    public async Task<ReplayResult> ReplayAsync(
        string tenantId,
        Guid entryId,
        string initiatedBy,
        CancellationToken cancellationToken)
    {
        ArgumentException.ThrowIfNullOrWhiteSpace(tenantId);
        ArgumentException.ThrowIfNullOrWhiteSpace(initiatedBy);

        var entry = await _deadLetterRepository.GetByIdAsync(tenantId, entryId, cancellationToken)
            .ConfigureAwait(false);

        if (entry is null)
        {
            throw new InvalidOperationException($"Dead-letter entry {entryId} not found.");
        }

        return await ReplayEntryAsync(entry, "manual", initiatedBy, cancellationToken).ConfigureAwait(false);
    }

    public async Task<BatchReplayResult> ReplayBatchAsync(
        string tenantId,
        IReadOnlyList<Guid> entryIds,
        string initiatedBy,
        CancellationToken cancellationToken)
    {
        ArgumentException.ThrowIfNullOrWhiteSpace(tenantId);
        ArgumentNullException.ThrowIfNull(entryIds);
        ArgumentException.ThrowIfNullOrWhiteSpace(initiatedBy);

        if (entryIds.Count > _options.MaxBatchSize)
        {
            throw new ArgumentException($"Batch size {entryIds.Count} exceeds maximum {_options.MaxBatchSize}.");
        }

        var results = new List<ReplayResult>();
        var succeeded = 0;
        var failed = 0;

        foreach (var entryId in entryIds)
        {
            try
            {
                var entry = await _deadLetterRepository.GetByIdAsync(tenantId, entryId, cancellationToken)
                    .ConfigureAwait(false);

                if (entry is null)
                {
                    results.Add(new ReplayResult(
                        Success: false,
                        NewJobId: null,
                        ErrorMessage: $"Entry {entryId} not found.",
                        UpdatedEntry: null!));
                    failed++;
                    continue;
                }

                var result = await ReplayEntryAsync(entry, "batch", initiatedBy, cancellationToken)
                    .ConfigureAwait(false);
                results.Add(result);

                if (result.Success)
                {
                    succeeded++;
                }
                else
                {
                    failed++;
                }
            }
            catch (Exception ex)
            {
                _logger.LogError(ex, "Failed to replay entry {EntryId}", entryId);
                results.Add(new ReplayResult(
                    Success: false,
                    NewJobId: null,
                    ErrorMessage: ex.Message,
                    UpdatedEntry: null!));
                failed++;
            }
        }

        return new BatchReplayResult(
            Attempted: entryIds.Count,
            Succeeded: succeeded,
            Failed: failed,
            Results: results);
    }

    public async Task<BatchReplayResult> ReplayPendingAsync(
        string tenantId,
        string? errorCode,
        ErrorCategory? category,
        int maxCount,
        string initiatedBy,
        CancellationToken cancellationToken)
    {
        ArgumentException.ThrowIfNullOrWhiteSpace(tenantId);
        ArgumentException.ThrowIfNullOrWhiteSpace(initiatedBy);

        var effectiveLimit = Math.Min(maxCount, _options.MaxBatchSize);

        IReadOnlyList<DeadLetterEntry> entries;
        if (!string.IsNullOrEmpty(errorCode))
        {
            entries = await _deadLetterRepository.GetByErrorCodeAsync(
                tenantId, errorCode, DeadLetterStatus.Pending, effectiveLimit, cancellationToken)
                .ConfigureAwait(false);
        }
        else if (category.HasValue)
        {
            entries = await _deadLetterRepository.GetByCategoryAsync(
                tenantId, category.Value, DeadLetterStatus.Pending, effectiveLimit, cancellationToken)
                .ConfigureAwait(false);
        }
        else
        {
            entries = await _deadLetterRepository.GetPendingRetryableAsync(tenantId, effectiveLimit, cancellationToken)
                .ConfigureAwait(false);
        }

        var results = new List<ReplayResult>();
        var succeeded = 0;
        var failed = 0;

        foreach (var entry in entries)
        {
            if (!entry.CanReplay)
            {
                continue;
            }

            try
            {
                var result = await ReplayEntryAsync(entry, "auto", initiatedBy, cancellationToken)
                    .ConfigureAwait(false);
                results.Add(result);

                if (result.Success)
                {
                    succeeded++;
                }
                else
                {
                    failed++;
                }
            }
            catch (Exception ex)
            {
                _logger.LogError(ex, "Failed to replay entry {EntryId}", entry.EntryId);
                results.Add(new ReplayResult(
                    Success: false,
                    NewJobId: null,
                    ErrorMessage: ex.Message,
                    UpdatedEntry: entry));
                failed++;
            }
        }

        return new BatchReplayResult(
            Attempted: results.Count,
            Succeeded: succeeded,
            Failed: failed,
            Results: results);
    }

    public async Task<DeadLetterEntry> ResolveAsync(
        string tenantId,
        Guid entryId,
        string notes,
        string resolvedBy,
        CancellationToken cancellationToken)
    {
        ArgumentException.ThrowIfNullOrWhiteSpace(tenantId);
        ArgumentException.ThrowIfNullOrWhiteSpace(resolvedBy);

        var entry = await _deadLetterRepository.GetByIdAsync(tenantId, entryId, cancellationToken)
            .ConfigureAwait(false);

        if (entry is null)
        {
            throw new InvalidOperationException($"Dead-letter entry {entryId} not found.");
        }

        var now = _timeProvider.GetUtcNow();
        var resolved = entry.Resolve(notes, resolvedBy, now);

        await _deadLetterRepository.UpdateAsync(resolved, cancellationToken).ConfigureAwait(false);

        _logger.LogInformation(
            "Resolved dead-letter entry {EntryId} for job {JobId}. Notes: {Notes}",
            entryId, entry.OriginalJobId, notes);

        return resolved;
    }

    public async Task<int> ResolveBatchAsync(
        string tenantId,
        IReadOnlyList<Guid> entryIds,
        string notes,
        string resolvedBy,
        CancellationToken cancellationToken)
    {
        ArgumentException.ThrowIfNullOrWhiteSpace(tenantId);
        ArgumentNullException.ThrowIfNull(entryIds);
        ArgumentException.ThrowIfNullOrWhiteSpace(resolvedBy);

        var resolved = 0;
        var now = _timeProvider.GetUtcNow();

        foreach (var entryId in entryIds)
        {
            try
            {
                var entry = await _deadLetterRepository.GetByIdAsync(tenantId, entryId, cancellationToken)
                    .ConfigureAwait(false);

                if (entry is null || entry.IsTerminal)
                {
                    continue;
                }

                var resolvedEntry = entry.Resolve(notes, resolvedBy, now);
                await _deadLetterRepository.UpdateAsync(resolvedEntry, cancellationToken).ConfigureAwait(false);
                resolved++;
            }
            catch (Exception ex)
            {
                _logger.LogError(ex, "Failed to resolve entry {EntryId}", entryId);
            }
        }

        return resolved;
    }

    private async Task<ReplayResult> ReplayEntryAsync(
        DeadLetterEntry entry,
        string triggeredBy,
        string initiatedBy,
        CancellationToken cancellationToken)
    {
        if (!entry.CanReplay)
        {
            return new ReplayResult(
                Success: false,
                NewJobId: null,
                ErrorMessage: $"Entry cannot be replayed: status={entry.Status}, attempts={entry.ReplayAttempts}/{entry.MaxReplayAttempts}, retryable={entry.IsRetryable}",
                UpdatedEntry: entry);
        }

        var now = _timeProvider.GetUtcNow();

        // Mark entry as replaying.
        var replaying = entry.StartReplay(initiatedBy, now);
        await _deadLetterRepository.UpdateAsync(replaying, cancellationToken).ConfigureAwait(false);

        // Create audit record.
        var auditRecord = ReplayAuditRecord.Create(
            entry.TenantId,
            entry.EntryId,
            replaying.ReplayAttempts,
            triggeredBy,
            initiatedBy,
            now);
        await _auditRepository.CreateAsync(auditRecord, cancellationToken).ConfigureAwait(false);

        try
        {
            // Create the new job with a replay-scoped idempotency key.
            var newIdempotencyKey = $"{entry.IdempotencyKey}:replay:{replaying.ReplayAttempts}";
            var newJob = await _jobCreator.CreateFromReplayAsync(
                entry.TenantId,
                entry.JobType,
                entry.Payload,
                entry.PayloadDigest,
                newIdempotencyKey,
                entry.CorrelationId,
                entry.OriginalJobId,
                initiatedBy,
                cancellationToken).ConfigureAwait(false);

            // Mark replay successful.
            now = _timeProvider.GetUtcNow();
            var completed = replaying.CompleteReplay(newJob.JobId, initiatedBy, now);
            await _deadLetterRepository.UpdateAsync(completed, cancellationToken).ConfigureAwait(false);

            // Update audit record.
            var completedAudit = auditRecord.Complete(newJob.JobId, now);
            await _auditRepository.UpdateAsync(completedAudit, cancellationToken).ConfigureAwait(false);

            _logger.LogInformation(
                "Replayed dead-letter entry {EntryId} as new job {NewJobId}",
                entry.EntryId, newJob.JobId);

            // Notify on success.
            await _notifier.NotifyReplaySuccessAsync(completed, newJob.JobId, cancellationToken)
                .ConfigureAwait(false);

            return new ReplayResult(
                Success: true,
                NewJobId: newJob.JobId,
                ErrorMessage: null,
                UpdatedEntry: completed);
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, "Failed to replay entry {EntryId}", entry.EntryId);

            // Mark replay failed.
            now = _timeProvider.GetUtcNow();
            var failed = replaying.FailReplay(ex.Message, initiatedBy, now);
            await _deadLetterRepository.UpdateAsync(failed, cancellationToken).ConfigureAwait(false);

            // Update audit record.
            var failedAudit = auditRecord.Fail(ex.Message, now);
            await _auditRepository.UpdateAsync(failedAudit, cancellationToken).ConfigureAwait(false);

            // Notify when replay attempts are exhausted.
            if (failed.Status == DeadLetterStatus.Exhausted)
            {
                await _notifier.NotifyExhaustedAsync(failed, cancellationToken).ConfigureAwait(false);
            }

            return new ReplayResult(
                Success: false,
                NewJobId: null,
                ErrorMessage: ex.Message,
                UpdatedEntry: failed);
        }
    }
}
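One detail worth calling out from `ReplayEntryAsync` above: each replay derives a fresh idempotency key by suffixing the original key with the attempt counter, so the replayed job is not swallowed by duplicate detection on the original key. A minimal sketch of that derivation in isolation (the helper name and sample values are illustrative, not part of the source):

```csharp
// Mirrors the derivation in ReplayEntryAsync: original key + replay attempt.
static string DeriveReplayKey(string idempotencyKey, int replayAttempt) =>
    $"{idempotencyKey}:replay:{replayAttempt}";

// DeriveReplayKey("scan-42", 2) yields "scan-42:replay:2", distinct from the
// original "scan-42" and from any earlier replay attempt's key.
```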
@@ -0,0 +1,116 @@
|
||||
namespace StellaOps.JobEngine.Core.Domain.AirGap;
|
||||
|
||||
/// <summary>
|
||||
/// Provenance record for an imported air-gap bundle.
|
||||
/// Per ORCH-AIRGAP-56-002 and ledger-airgap-staleness.schema.json.
|
||||
/// </summary>
|
public sealed record BundleProvenance(
    /// <summary>Unique bundle identifier.</summary>
    Guid BundleId,

    /// <summary>Bundle domain (vex-advisories, vulnerability-feeds, etc.).</summary>
    string DomainId,

    /// <summary>When bundle was imported into this environment.</summary>
    DateTimeOffset ImportedAt,

    /// <summary>Original generation timestamp from source environment.</summary>
    DateTimeOffset SourceTimestamp,

    /// <summary>Source environment identifier.</summary>
    string? SourceEnvironment,

    /// <summary>SHA-256 digest of the bundle contents.</summary>
    string? BundleDigest,

    /// <summary>SHA-256 digest of the bundle manifest.</summary>
    string? ManifestDigest,

    /// <summary>Time anchor used for staleness calculation.</summary>
    TimeAnchor? TimeAnchor,

    /// <summary>Exports included in this bundle.</summary>
    IReadOnlyList<ExportRecord>? Exports,

    /// <summary>Additional bundle metadata.</summary>
    IReadOnlyDictionary<string, string>? Metadata)
{
    /// <summary>
    /// Calculates staleness in seconds (importedAt - sourceTimestamp).
    /// </summary>
    public int StalenessSeconds => (int)(ImportedAt - SourceTimestamp).TotalSeconds;

    /// <summary>
    /// Calculates current staleness based on provided time reference.
    /// </summary>
    public int CurrentStalenessSeconds(DateTimeOffset now) => (int)(now - SourceTimestamp).TotalSeconds;
}

/// <summary>
/// Trusted time reference for staleness calculations.
/// </summary>
public sealed record TimeAnchor(
    /// <summary>Type of time anchor.</summary>
    TimeAnchorType AnchorType,

    /// <summary>Anchor timestamp (UTC).</summary>
    DateTimeOffset Timestamp,

    /// <summary>Time source identifier.</summary>
    string? Source,

    /// <summary>Time uncertainty in milliseconds.</summary>
    int? Uncertainty,

    /// <summary>Digest of time attestation signature if applicable.</summary>
    string? SignatureDigest,

    /// <summary>Whether time anchor was cryptographically verified.</summary>
    bool Verified);

/// <summary>
/// Type of time anchor for staleness calculations.
/// </summary>
public enum TimeAnchorType
{
    Ntp,
    Roughtime,
    HardwareClock,
    AttestationTsa,
    Manual
}

/// <summary>
/// Record of an export included in a bundle.
/// </summary>
public sealed record ExportRecord(
    /// <summary>Export identifier.</summary>
    Guid ExportId,

    /// <summary>Export key.</summary>
    string Key,

    /// <summary>Export data format.</summary>
    ExportFormat Format,

    /// <summary>When export was created.</summary>
    DateTimeOffset CreatedAt,

    /// <summary>Export artifact digest.</summary>
    string ArtifactDigest,

    /// <summary>Number of records in export.</summary>
    int? RecordCount);

/// <summary>
/// Export data format.
/// </summary>
public enum ExportFormat
{
    OpenVex,
    Csaf,
    CycloneDx,
    Spdx,
    Ndjson,
    Json
}
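The staleness arithmetic above can be sketched with illustrative values (the timestamps and domain below are hypothetical):

```csharp
// Hypothetical bundle: generated 3 days before it was imported.
var source = new DateTimeOffset(2025, 1, 1, 0, 0, 0, TimeSpan.Zero);
var imported = source.AddDays(3);

var bundle = new BundleProvenance(
    BundleId: Guid.NewGuid(),
    DomainId: "vex-advisories",
    ImportedAt: imported,
    SourceTimestamp: source,
    SourceEnvironment: null,
    BundleDigest: null,
    ManifestDigest: null,
    TimeAnchor: null,
    Exports: null,
    Metadata: null);

// Staleness at import time: 3 days = 259200 seconds.
Console.WriteLine(bundle.StalenessSeconds);

// One day after import the data has aged another day: 4 days = 345600 seconds.
Console.WriteLine(bundle.CurrentStalenessSeconds(imported.AddDays(1)));
```

Both measures are anchored on `SourceTimestamp`, so import lag and post-import aging accumulate into a single number.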
@@ -0,0 +1,258 @@
namespace StellaOps.JobEngine.Core.Domain.AirGap;

/// <summary>
/// Enforcement mode for air-gap policies.
/// </summary>
public enum EnforcementMode
{
    /// <summary>Enforcement is disabled.</summary>
    Disabled,

    /// <summary>Violations are logged as warnings but not blocked.</summary>
    Warn,

    /// <summary>Violations are blocked strictly.</summary>
    Strict
}

/// <summary>
/// Declares a network intent for a job descriptor.
/// Per ORCH-AIRGAP-56-001: Enforce job descriptors to declare network intents.
/// </summary>
public sealed record NetworkIntent(
    /// <summary>Target host or hostname pattern.</summary>
    string Host,

    /// <summary>Target port (null for any port).</summary>
    int? Port,

    /// <summary>Protocol (http, https, grpc, etc.).</summary>
    string Protocol,

    /// <summary>Purpose description for audit trail.</summary>
    string Purpose,

    /// <summary>Whether this is an egress (outbound) or ingress (inbound) intent.</summary>
    NetworkDirection Direction = NetworkDirection.Egress)
{
    /// <summary>
    /// Creates a network intent for HTTPS egress to a specific host.
    /// </summary>
    public static NetworkIntent HttpsEgress(string host, string purpose, int? port = 443)
        => new(host, port, "https", purpose, NetworkDirection.Egress);

    /// <summary>
    /// Creates a network intent for HTTP egress to a specific host.
    /// </summary>
    public static NetworkIntent HttpEgress(string host, string purpose, int? port = 80)
        => new(host, port, "http", purpose, NetworkDirection.Egress);

    /// <summary>
    /// Creates a network intent for gRPC egress to a specific host.
    /// </summary>
    public static NetworkIntent GrpcEgress(string host, string purpose, int? port = 443)
        => new(host, port, "grpc", purpose, NetworkDirection.Egress);

    /// <summary>
    /// Checks if this intent matches an allowlist entry.
    /// </summary>
    public bool MatchesAllowlistEntry(NetworkAllowlistEntry entry)
    {
        if (!HostMatches(entry.HostPattern))
            return false;

        if (entry.Port.HasValue && Port.HasValue && entry.Port != Port)
            return false;

        if (!string.IsNullOrEmpty(entry.Protocol) &&
            !string.Equals(entry.Protocol, Protocol, StringComparison.OrdinalIgnoreCase))
            return false;

        return true;
    }

    private bool HostMatches(string pattern)
    {
        if (string.Equals(pattern, "*", StringComparison.Ordinal))
            return true;

        if (pattern.StartsWith("*.", StringComparison.Ordinal))
        {
            var suffix = pattern[1..]; // e.g., ".example.com"
            return Host.EndsWith(suffix, StringComparison.OrdinalIgnoreCase) ||
                   string.Equals(Host, pattern[2..], StringComparison.OrdinalIgnoreCase);
        }

        return string.Equals(Host, pattern, StringComparison.OrdinalIgnoreCase);
    }
}

/// <summary>
/// Network traffic direction.
/// </summary>
public enum NetworkDirection
{
    /// <summary>Outbound traffic from the job.</summary>
    Egress,

    /// <summary>Inbound traffic to the job (e.g., callbacks).</summary>
    Ingress
}

/// <summary>
/// Entry in the network allowlist for sealed mode.
/// </summary>
public sealed record NetworkAllowlistEntry(
    /// <summary>Host pattern (exact match or wildcard like "*.example.com").</summary>
    string HostPattern,

    /// <summary>Allowed port (null for any port).</summary>
    int? Port = null,

    /// <summary>Allowed protocol (null for any protocol).</summary>
    string? Protocol = null,

    /// <summary>Description of why this entry is allowed.</summary>
    string? Description = null);

/// <summary>
/// Result of network intent validation.
/// </summary>
public sealed record NetworkIntentValidationResult(
    /// <summary>Whether the validation passed.</summary>
    bool IsValid,

    /// <summary>Whether the job should be blocked from scheduling.</summary>
    bool ShouldBlock,

    /// <summary>Error code if validation failed.</summary>
    string? ErrorCode,

    /// <summary>Human-readable error message.</summary>
    string? ErrorMessage,

    /// <summary>Detailed violations found.</summary>
    IReadOnlyList<NetworkIntentViolation> Violations,

    /// <summary>Recommendations for resolving violations.</summary>
    IReadOnlyList<string> Recommendations)
{
    /// <summary>
    /// Creates a successful validation result.
    /// </summary>
    public static NetworkIntentValidationResult Success()
        => new(true, false, null, null, [], []);

    /// <summary>
    /// Creates a validation result for missing network intents.
    /// </summary>
    public static NetworkIntentValidationResult MissingIntents(
        IReadOnlyList<string> detectedEndpoints,
        bool shouldBlock)
    {
        var violations = detectedEndpoints
            .Select(e => new NetworkIntentViolation(e, NetworkViolationType.MissingIntent, null))
            .ToList();

        return new(
            IsValid: false,
            ShouldBlock: shouldBlock,
            ErrorCode: "NETWORK_INTENT_MISSING",
            ErrorMessage: $"Job accesses {detectedEndpoints.Count} network endpoint(s) without declared intents",
            Violations: violations,
            Recommendations: [
                "Add 'networkIntents' to the job payload declaring all external endpoints",
                "Use NetworkIntent.HttpsEgress() for HTTPS endpoints",
                $"Endpoints detected: {string.Join(", ", detectedEndpoints.Take(5))}"
            ]);
    }

    /// <summary>
    /// Creates a validation result for disallowed network intents.
    /// </summary>
    public static NetworkIntentValidationResult DisallowedIntents(
        IReadOnlyList<NetworkIntentViolation> violations,
        bool shouldBlock)
    {
        var disallowedHosts = violations
            .Where(v => v.ViolationType == NetworkViolationType.NotInAllowlist)
            .Select(v => v.Endpoint)
            .Distinct()
            .ToList();

        return new(
            IsValid: false,
            ShouldBlock: shouldBlock,
            ErrorCode: "NETWORK_INTENT_DISALLOWED",
            ErrorMessage: $"Job declares {violations.Count} network intent(s) not in sealed-mode allowlist",
            Violations: violations,
            Recommendations: [
                "Add the required hosts to the air-gap egress allowlist",
                "Or disable network intent enforcement in the staleness configuration",
                $"Disallowed hosts: {string.Join(", ", disallowedHosts.Take(5))}"
            ]);
    }
}

/// <summary>
/// A specific network intent violation.
/// </summary>
public sealed record NetworkIntentViolation(
    /// <summary>The endpoint that violated the policy.</summary>
    string Endpoint,

    /// <summary>Type of violation.</summary>
    NetworkViolationType ViolationType,

    /// <summary>The intent that caused the violation (if any).</summary>
    NetworkIntent? Intent);

/// <summary>
/// Type of network intent violation.
/// </summary>
public enum NetworkViolationType
{
    /// <summary>Network endpoint accessed without a declared intent.</summary>
    MissingIntent,

    /// <summary>Declared intent is not in the sealed-mode allowlist.</summary>
    NotInAllowlist,

    /// <summary>Intent declared for blocked protocol.</summary>
    BlockedProtocol,

    /// <summary>Intent declared for blocked port.</summary>
    BlockedPort
}

/// <summary>
/// Configuration for network intent enforcement.
/// </summary>
public sealed record NetworkIntentConfig(
    /// <summary>Enforcement mode for network intent validation.</summary>
    EnforcementMode EnforcementMode = EnforcementMode.Warn,

    /// <summary>Allowlist of permitted network endpoints in sealed mode.</summary>
    IReadOnlyList<NetworkAllowlistEntry>? Allowlist = null,

    /// <summary>Whether to require explicit intent declarations.</summary>
    bool RequireExplicitIntents = true,

    /// <summary>Protocols that are always blocked.</summary>
    IReadOnlyList<string>? BlockedProtocols = null)
{
    /// <summary>
    /// Default configuration with warning mode.
    /// </summary>
    public static NetworkIntentConfig Default => new();

    /// <summary>
    /// Strict enforcement configuration.
    /// </summary>
    public static NetworkIntentConfig Strict => new(EnforcementMode.Strict);

    /// <summary>
    /// Disabled enforcement configuration.
    /// </summary>
    public static NetworkIntentConfig Disabled => new(EnforcementMode.Disabled);
}
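The wildcard semantics of `HostMatches` are worth calling out: a `*.example.com` pattern matches both subdomains and the bare apex host. A short sketch using the types above (hosts and purposes are illustrative):

```csharp
var entry = new NetworkAllowlistEntry("*.example.com", Port: 443, Protocol: "https");

// Subdomains match the "*." wildcard...
var api = NetworkIntent.HttpsEgress("api.example.com", "fetch advisories");
Console.WriteLine(api.MatchesAllowlistEntry(entry));   // True

// ...and so does the bare apex host, via the pattern[2..] comparison.
var apex = NetworkIntent.HttpsEgress("example.com", "fetch advisories");
Console.WriteLine(apex.MatchesAllowlistEntry(entry));  // True

// A different host (or non-matching port/protocol) does not.
var other = NetworkIntent.HttpEgress("example.org", "telemetry");
Console.WriteLine(other.MatchesAllowlistEntry(entry)); // False
```

Note that a null `Port` or `Protocol` on the allowlist entry acts as a wildcard for that dimension.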
@@ -0,0 +1,104 @@
namespace StellaOps.JobEngine.Core.Domain.AirGap;

/// <summary>
/// Represents the current sealing status for air-gap mode.
/// Per ORCH-AIRGAP-56-002.
/// </summary>
public sealed record SealingStatus(
    /// <summary>Whether the environment is currently sealed (air-gapped).</summary>
    bool IsSealed,

    /// <summary>When the environment was sealed.</summary>
    DateTimeOffset? SealedAt,

    /// <summary>Actor who sealed the environment.</summary>
    string? SealedBy,

    /// <summary>Reason for sealing.</summary>
    string? SealReason,

    /// <summary>Per-domain staleness metrics.</summary>
    IReadOnlyDictionary<string, DomainStalenessMetric>? DomainStaleness,

    /// <summary>Aggregate staleness metrics.</summary>
    AggregateMetrics? Aggregates,

    /// <summary>When staleness metrics were last calculated.</summary>
    DateTimeOffset? MetricsCollectedAt)
{
    /// <summary>
    /// An unsealed (online) environment status.
    /// </summary>
    public static readonly SealingStatus Unsealed = new(
        IsSealed: false,
        SealedAt: null,
        SealedBy: null,
        SealReason: null,
        DomainStaleness: null,
        Aggregates: null,
        MetricsCollectedAt: null);

    /// <summary>
    /// Gets the staleness for a specific domain.
    /// </summary>
    public DomainStalenessMetric? GetDomainStaleness(string domainId)
        => DomainStaleness?.GetValueOrDefault(domainId);

    /// <summary>
    /// Checks if any domain has exceeded the staleness threshold.
    /// </summary>
    public bool HasStaleDomains => Aggregates?.StaleDomains > 0;
}

/// <summary>
/// Staleness metrics for a specific domain.
/// </summary>
public sealed record DomainStalenessMetric(
    /// <summary>Domain identifier.</summary>
    string DomainId,

    /// <summary>Current staleness in seconds.</summary>
    int StalenessSeconds,

    /// <summary>Last bundle import timestamp.</summary>
    DateTimeOffset LastImportAt,

    /// <summary>Source timestamp of last import.</summary>
    DateTimeOffset LastSourceTimestamp,

    /// <summary>Total bundles imported for this domain.</summary>
    int BundleCount,

    /// <summary>Whether domain data exceeds the staleness threshold.</summary>
    bool IsStale,

    /// <summary>Staleness as a percentage of the threshold.</summary>
    double PercentOfThreshold,

    /// <summary>When data will become stale if no updates arrive.</summary>
    DateTimeOffset? ProjectedStaleAt);

/// <summary>
/// Aggregate staleness metrics across all domains.
/// </summary>
public sealed record AggregateMetrics(
    /// <summary>Total domains tracked.</summary>
    int TotalDomains,

    /// <summary>Domains exceeding the staleness threshold.</summary>
    int StaleDomains,

    /// <summary>Domains approaching the staleness threshold.</summary>
    int WarningDomains,

    /// <summary>Domains within the healthy staleness range.</summary>
    int HealthyDomains,

    /// <summary>Maximum staleness across all domains.</summary>
    int MaxStalenessSeconds,

    /// <summary>Average staleness across all domains.</summary>
    double AvgStalenessSeconds,

    /// <summary>Timestamp of the oldest bundle source data.</summary>
    DateTimeOffset? OldestBundle);
@@ -0,0 +1,88 @@
namespace StellaOps.JobEngine.Core.Domain.AirGap;

/// <summary>
/// Configuration for air-gap staleness enforcement policies.
/// Per ORCH-AIRGAP-56-002.
/// </summary>
public sealed record StalenessConfig(
    /// <summary>Maximum age in seconds before data is considered stale (default: 7 days = 604800).</summary>
    int FreshnessThresholdSeconds = 604800,

    /// <summary>How staleness violations are handled.</summary>
    StalenessEnforcementMode EnforcementMode = StalenessEnforcementMode.Strict,

    /// <summary>Grace period after threshold before hard enforcement (default: 1 day = 86400).</summary>
    int GracePeriodSeconds = 86400,

    /// <summary>Domains exempt from staleness enforcement.</summary>
    IReadOnlyList<string>? AllowedDomains = null,

    /// <summary>Alert thresholds for approaching staleness.</summary>
    IReadOnlyList<NotificationThreshold>? NotificationThresholds = null)
{
    /// <summary>
    /// Default staleness configuration.
    /// </summary>
    public static readonly StalenessConfig Default = new();

    /// <summary>
    /// Creates a disabled staleness configuration.
    /// </summary>
    public static StalenessConfig Disabled() => new(EnforcementMode: StalenessEnforcementMode.Disabled);

    /// <summary>
    /// Checks if a domain is exempt from staleness enforcement.
    /// </summary>
    public bool IsDomainExempt(string domainId)
        => AllowedDomains?.Contains(domainId, StringComparer.OrdinalIgnoreCase) == true;
}

/// <summary>
/// How staleness violations are handled.
/// </summary>
public enum StalenessEnforcementMode
{
    /// <summary>Violations block execution with an error.</summary>
    Strict,

    /// <summary>Violations generate warnings but allow execution.</summary>
    Warn,

    /// <summary>Staleness checking is disabled.</summary>
    Disabled
}

/// <summary>
/// Alert threshold for approaching staleness.
/// </summary>
public sealed record NotificationThreshold(
    /// <summary>Percentage of freshness threshold to trigger notification (1-100).</summary>
    int PercentOfThreshold,

    /// <summary>Notification severity level.</summary>
    NotificationSeverity Severity,

    /// <summary>Notification delivery channels.</summary>
    IReadOnlyList<NotificationChannel>? Channels = null);

/// <summary>
/// Notification severity level.
/// </summary>
public enum NotificationSeverity
{
    Info,
    Warning,
    Critical
}

/// <summary>
/// Notification delivery channel.
/// </summary>
public enum NotificationChannel
{
    Email,
    Slack,
    Teams,
    Webhook,
    Metric
}
@@ -0,0 +1,172 @@
namespace StellaOps.JobEngine.Core.Domain.AirGap;

/// <summary>
/// Result of a staleness validation check.
/// Per ORCH-AIRGAP-56-002 and ledger-airgap-staleness.schema.json.
/// </summary>
public sealed record StalenessValidationResult(
    /// <summary>When validation was performed.</summary>
    DateTimeOffset ValidatedAt,

    /// <summary>Whether validation passed.</summary>
    bool Passed,

    /// <summary>Context where validation was triggered.</summary>
    StalenessValidationContext Context,

    /// <summary>Domain being validated.</summary>
    string? DomainId,

    /// <summary>Current staleness at validation time.</summary>
    int StalenessSeconds,

    /// <summary>Threshold used for validation.</summary>
    int ThresholdSeconds,

    /// <summary>Enforcement mode at validation time.</summary>
    StalenessEnforcementMode EnforcementMode,

    /// <summary>Error details if validation failed.</summary>
    StalenessError? Error,

    /// <summary>Warnings generated during validation.</summary>
    IReadOnlyList<StalenessWarning>? Warnings)
{
    /// <summary>
    /// Creates a passing validation result.
    /// </summary>
    public static StalenessValidationResult Pass(
        DateTimeOffset validatedAt,
        StalenessValidationContext context,
        string? domainId,
        int stalenessSeconds,
        int thresholdSeconds,
        StalenessEnforcementMode enforcementMode,
        IReadOnlyList<StalenessWarning>? warnings = null)
        => new(validatedAt, true, context, domainId, stalenessSeconds, thresholdSeconds, enforcementMode, null, warnings);

    /// <summary>
    /// Creates a failing validation result.
    /// </summary>
    public static StalenessValidationResult Fail(
        DateTimeOffset validatedAt,
        StalenessValidationContext context,
        string? domainId,
        int stalenessSeconds,
        int thresholdSeconds,
        StalenessEnforcementMode enforcementMode,
        StalenessError error,
        IReadOnlyList<StalenessWarning>? warnings = null)
        => new(validatedAt, false, context, domainId, stalenessSeconds, thresholdSeconds, enforcementMode, error, warnings);

    /// <summary>
    /// Whether this result should block execution (depends on enforcement mode).
    /// </summary>
    public bool ShouldBlock => !Passed && EnforcementMode == StalenessEnforcementMode.Strict;

    /// <summary>
    /// Whether this result has warnings.
    /// </summary>
    public bool HasWarnings => Warnings is { Count: > 0 };
}

/// <summary>
/// Context where staleness validation was triggered.
/// </summary>
public enum StalenessValidationContext
{
    /// <summary>Export operation.</summary>
    Export,

    /// <summary>Query operation.</summary>
    Query,

    /// <summary>Policy evaluation.</summary>
    PolicyEval,

    /// <summary>Attestation generation.</summary>
    Attestation,

    /// <summary>Job scheduling.</summary>
    JobScheduling,

    /// <summary>Run scheduling.</summary>
    RunScheduling
}

/// <summary>
/// Error details for a staleness validation failure.
/// </summary>
public sealed record StalenessError(
    /// <summary>Error code.</summary>
    StalenessErrorCode Code,

    /// <summary>Human-readable error message.</summary>
    string Message,

    /// <summary>Affected domain.</summary>
    string? DomainId,

    /// <summary>Actual staleness when the error occurred.</summary>
    int? StalenessSeconds,

    /// <summary>Threshold that was exceeded.</summary>
    int? ThresholdSeconds,

    /// <summary>Recommended action to resolve.</summary>
    string? Recommendation);

/// <summary>
/// Staleness error codes.
/// </summary>
public enum StalenessErrorCode
{
    /// <summary>Data is stale beyond threshold.</summary>
    AirgapStale,

    /// <summary>No bundle available for domain.</summary>
    AirgapNoBundle,

    /// <summary>Time anchor is missing.</summary>
    AirgapTimeAnchorMissing,

    /// <summary>Time drift detected.</summary>
    AirgapTimeDrift,

    /// <summary>Attestation is invalid.</summary>
    AirgapAttestationInvalid
}

/// <summary>
/// Warning generated during staleness validation.
/// </summary>
public sealed record StalenessWarning(
    /// <summary>Warning code.</summary>
    StalenessWarningCode Code,

    /// <summary>Human-readable warning message.</summary>
    string Message,

    /// <summary>Current staleness as a percentage of the threshold.</summary>
    double? PercentOfThreshold,

    /// <summary>When data will become stale.</summary>
    DateTimeOffset? ProjectedStaleAt);

/// <summary>
/// Staleness warning codes.
/// </summary>
public enum StalenessWarningCode
{
    /// <summary>Approaching staleness threshold.</summary>
    AirgapApproachingStale,

    /// <summary>Time uncertainty is high.</summary>
    AirgapTimeUncertaintyHigh,

    /// <summary>Bundle is old but within threshold.</summary>
    AirgapBundleOld,

    /// <summary>No recent import detected.</summary>
    AirgapNoRecentImport
}
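Note that `ShouldBlock` couples the pass/fail outcome with the enforcement mode: a failed check in `Warn` mode surfaces the error but does not block. A minimal sketch with hypothetical values:

```csharp
var error = new StalenessError(
    StalenessErrorCode.AirgapStale,
    "vex-advisories data is 10 days old (threshold: 7 days)",
    DomainId: "vex-advisories",
    StalenessSeconds: 864000,
    ThresholdSeconds: 604800,
    Recommendation: "Import a fresh mirror bundle");

var strict = StalenessValidationResult.Fail(
    DateTimeOffset.UtcNow, StalenessValidationContext.JobScheduling,
    "vex-advisories", 864000, 604800,
    StalenessEnforcementMode.Strict, error);

// Same failure, re-evaluated under Warn via a record `with` expression.
var warn = strict with { EnforcementMode = StalenessEnforcementMode.Warn };

Console.WriteLine(strict.ShouldBlock); // True: failed + Strict
Console.WriteLine(warn.ShouldBlock);   // False: failed, but Warn only logs
```

Callers therefore branch on `ShouldBlock` rather than `Passed` when deciding whether to halt scheduling.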
@@ -0,0 +1,39 @@
namespace StellaOps.JobEngine.Core.Domain;

/// <summary>
/// Represents an artifact produced by a job execution.
/// Artifacts are immutable outputs with content digests for provenance.
/// </summary>
public sealed record Artifact(
    /// <summary>Unique artifact identifier.</summary>
    Guid ArtifactId,

    /// <summary>Tenant owning this artifact.</summary>
    string TenantId,

    /// <summary>Job that produced this artifact.</summary>
    Guid JobId,

    /// <summary>Run containing the producing job (if any).</summary>
    Guid? RunId,

    /// <summary>Artifact type (e.g., "sbom", "scan-result", "attestation", "log").</summary>
    string ArtifactType,

    /// <summary>Storage URI (e.g., "s3://bucket/path", "file:///local/path").</summary>
    string Uri,

    /// <summary>Content digest (SHA-256) for integrity verification.</summary>
    string Digest,

    /// <summary>MIME type (e.g., "application/json", "application/vnd.cyclonedx+json").</summary>
    string? MimeType,

    /// <summary>Artifact size in bytes.</summary>
    long? SizeBytes,

    /// <summary>When the artifact was created.</summary>
    DateTimeOffset CreatedAt,

    /// <summary>Optional metadata JSON blob.</summary>
    string? Metadata);
@@ -0,0 +1,281 @@
|
||||
using StellaOps.JobEngine.Core.Hashing;
|
||||
|
||||
namespace StellaOps.JobEngine.Core.Domain;
|
||||
|
||||
/// <summary>
|
||||
/// Represents an immutable audit log entry for orchestrator operations.
|
||||
/// Captures who did what, when, and with what effect.
|
||||
/// </summary>
|
||||
public sealed record AuditEntry(
|
||||
/// <summary>Unique audit entry identifier.</summary>
|
||||
Guid EntryId,
|
||||
|
||||
/// <summary>Tenant owning this entry.</summary>
|
||||
string TenantId,
|
||||
|
||||
/// <summary>Type of audited event.</summary>
|
||||
AuditEventType EventType,
|
||||
|
||||
/// <summary>Resource type being audited (job, run, source, quota, etc.).</summary>
|
||||
string ResourceType,
|
||||
|
||||
/// <summary>Resource identifier being audited.</summary>
|
||||
Guid ResourceId,
|
||||
|
||||
/// <summary>Actor who performed the action.</summary>
|
||||
string ActorId,
|
||||
|
||||
/// <summary>Actor type (user, system, worker, api-key).</summary>
|
||||
ActorType ActorType,
|
||||
|
||||
/// <summary>IP address of the actor (if applicable).</summary>
|
||||
string? ActorIp,
|
||||
|
||||
/// <summary>User agent string (if applicable).</summary>
|
||||
string? UserAgent,
|
||||
|
||||
/// <summary>HTTP method used (if applicable).</summary>
|
||||
string? HttpMethod,
|
||||
|
||||
/// <summary>Request path (if applicable).</summary>
|
||||
string? RequestPath,
|
||||
|
||||
/// <summary>State before the change (JSON).</summary>
|
||||
string? OldState,
|
||||
|
||||
/// <summary>State after the change (JSON).</summary>
|
||||
string? NewState,
|
||||
|
||||
/// <summary>Human-readable description of the change.</summary>
|
||||
string Description,
|
||||
|
||||
/// <summary>Correlation ID for distributed tracing.</summary>
|
||||
string? CorrelationId,
|
||||
|
||||
/// <summary>SHA-256 hash of the previous entry for chain integrity.</summary>
|
||||
string? PreviousEntryHash,
|
||||
|
||||
/// <summary>SHA-256 hash of this entry's content for integrity.</summary>
|
||||
string ContentHash,
|
||||
|
||||
/// <summary>Sequence number within the tenant's audit stream.</summary>
|
||||
long SequenceNumber,
|
||||
|
||||
/// <summary>When the event occurred.</summary>
|
||||
DateTimeOffset OccurredAt,
|
||||
|
||||
/// <summary>Optional metadata JSON blob.</summary>
|
||||
string? Metadata)
|
||||
{
|
||||
/// <summary>
|
||||
/// Creates a new audit entry with computed hash.
|
||||
/// Uses the platform's compliance-aware crypto abstraction.
|
||||
/// </summary>
|
||||
public static AuditEntry Create(
|
||||
CanonicalJsonHasher hasher,
|
||||
string tenantId,
|
||||
AuditEventType eventType,
|
||||
string resourceType,
|
||||
Guid resourceId,
|
||||
string actorId,
|
||||
ActorType actorType,
|
||||
string description,
|
||||
DateTimeOffset occurredAt,
|
||||
string? oldState = null,
|
||||
string? newState = null,
|
||||
string? actorIp = null,
|
||||
string? userAgent = null,
|
||||
string? httpMethod = null,
|
||||
string? requestPath = null,
|
||||
string? correlationId = null,
|
||||
string? previousEntryHash = null,
|
||||
long sequenceNumber = 0,
|
||||
string? metadata = null,
|
||||
Guid? entryId = null)
|
||||
{
|
||||
ArgumentNullException.ThrowIfNull(hasher);
|
||||
|
||||
var actualEntryId = entryId ?? Guid.NewGuid();
|
||||
|
||||
// Compute canonical hash from immutable content
|
||||
// Use the same property names and fields as VerifyIntegrity to keep the hash stable.
|
||||
var contentHash = hasher.ComputeCanonicalHash(new
|
||||
{
|
||||
EntryId = actualEntryId,
|
||||
TenantId = tenantId,
|
||||
EventType = eventType,
|
||||
ResourceType = resourceType,
|
||||
ResourceId = resourceId,
|
||||
ActorId = actorId,
|
||||
ActorType = actorType,
|
||||
Description = description,
|
||||
OldState = oldState,
|
||||
NewState = newState,
|
||||
OccurredAt = occurredAt,
|
||||
SequenceNumber = sequenceNumber
|
||||
});
|
||||
|
||||
return new AuditEntry(
|
||||
EntryId: actualEntryId,
|
||||
TenantId: tenantId,
|
||||
EventType: eventType,
|
||||
ResourceType: resourceType,
|
||||
ResourceId: resourceId,
|
||||
ActorId: actorId,
|
||||
ActorType: actorType,
|
||||
ActorIp: actorIp,
|
||||
UserAgent: userAgent,
|
||||
HttpMethod: httpMethod,
|
||||
RequestPath: requestPath,
|
||||
OldState: oldState,
|
||||
NewState: newState,
|
||||
Description: description,
|
||||
CorrelationId: correlationId,
|
||||
PreviousEntryHash: previousEntryHash,
|
||||
ContentHash: contentHash,
|
||||
SequenceNumber: sequenceNumber,
|
||||
OccurredAt: occurredAt,
|
||||
Metadata: metadata);
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// Verifies the integrity of this entry's content hash.
|
||||
/// Uses the platform's compliance-aware crypto abstraction.
|
||||
/// </summary>
|
||||
public bool VerifyIntegrity(CanonicalJsonHasher hasher)
|
||||
{
|
||||
ArgumentNullException.ThrowIfNull(hasher);
|
||||
|
||||
var computed = hasher.ComputeCanonicalHash(new
|
||||
{
|
||||
EntryId,
|
||||
TenantId,
|
||||
EventType,
|
||||
ResourceType,
|
||||
ResourceId,
|
||||
ActorId,
|
||||
ActorType,
|
||||
Description,
|
||||
OldState,
|
||||
NewState,
|
||||
OccurredAt,
|
||||
SequenceNumber
|
||||
});
|
||||
return string.Equals(ContentHash, computed, StringComparison.OrdinalIgnoreCase);
|
||||
}
    /// <summary>
    /// Verifies the chain link to the previous entry.
    /// </summary>
    public bool VerifyChainLink(AuditEntry? previousEntry)
    {
        if (previousEntry is null)
        {
            return PreviousEntryHash is null || SequenceNumber == 1;
        }

        return string.Equals(PreviousEntryHash, previousEntry.ContentHash, StringComparison.OrdinalIgnoreCase);
    }
}

/// <summary>
/// Types of auditable events in the orchestrator.
/// </summary>
public enum AuditEventType
{
    // Job lifecycle events
    JobCreated = 100,
    JobScheduled = 101,
    JobLeased = 102,
    JobCompleted = 103,
    JobFailed = 104,
    JobCanceled = 105,
    JobRetried = 106,

    // Run lifecycle events
    RunCreated = 200,
    RunStarted = 201,
    RunCompleted = 202,
    RunFailed = 203,
    RunCanceled = 204,

    // Source management events
    SourceCreated = 300,
    SourceUpdated = 301,
    SourcePaused = 302,
    SourceResumed = 303,
    SourceDeleted = 304,

    // Quota management events
    QuotaCreated = 400,
    QuotaUpdated = 401,
    QuotaPaused = 402,
    QuotaResumed = 403,
    QuotaDeleted = 404,

    // SLO management events
    SloCreated = 500,
    SloUpdated = 501,
    SloEnabled = 502,
    SloDisabled = 503,
    SloDeleted = 504,
    SloAlertTriggered = 505,
    SloAlertAcknowledged = 506,
    SloAlertResolved = 507,

    // Dead-letter events
    DeadLetterCreated = 600,
    DeadLetterReplayed = 601,
    DeadLetterResolved = 602,
    DeadLetterExpired = 603,

    // Backfill events
    BackfillCreated = 700,
    BackfillStarted = 701,
    BackfillCompleted = 702,
    BackfillFailed = 703,
    BackfillCanceled = 704,

    // Ledger events
    LedgerExportRequested = 800,
    LedgerExportCompleted = 801,
    LedgerExportFailed = 802,

    // Worker events
    WorkerClaimed = 900,
    WorkerHeartbeat = 901,
    WorkerProgressReported = 902,
    WorkerCompleted = 903,

    // Security events
    AuthenticationSuccess = 1000,
    AuthenticationFailure = 1001,
    AuthorizationDenied = 1002,
    ApiKeyCreated = 1003,
    ApiKeyRevoked = 1004
}

/// <summary>
/// Types of actors that can perform auditable actions.
/// </summary>
public enum ActorType
{
    /// <summary>Human user via UI or API.</summary>
    User = 0,

    /// <summary>System-initiated action (scheduler, background job).</summary>
    System = 1,

    /// <summary>Worker process.</summary>
    Worker = 2,

    /// <summary>API key authentication.</summary>
    ApiKey = 3,

    /// <summary>Service-to-service call.</summary>
    Service = 4,

    /// <summary>Unknown or unidentified actor.</summary>
    Unknown = 99
}
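`VerifyChainLink` checks a single link; a whole ledger segment is validated by walking entries in sequence order and checking each against its predecessor. A minimal sketch, assuming an ordered list of entries (the helper itself is illustrative and not part of this commit):

```csharp
/// Verifies a hash-chained sequence of audit entries.
/// Returns false at the first broken or missing link.
public static bool VerifyChain(IReadOnlyList<AuditEntry> entries)
{
    AuditEntry? previous = null;
    foreach (var entry in entries)
    {
        // Each entry must reference the ContentHash of the entry before it;
        // the first entry must carry no previous hash (or sequence number 1).
        if (!entry.VerifyChainLink(previous))
        {
            return false;
        }

        previous = entry;
    }

    return true;
}
```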
@@ -0,0 +1,429 @@
namespace StellaOps.JobEngine.Core.Domain;

/// <summary>
/// Represents a request to backfill/reprocess events within a time window.
/// </summary>
public sealed record BackfillRequest(
    /// <summary>Unique backfill request identifier.</summary>
    Guid BackfillId,

    /// <summary>Tenant this backfill applies to.</summary>
    string TenantId,

    /// <summary>Source to backfill (null if job-type scoped).</summary>
    Guid? SourceId,

    /// <summary>Job type to backfill (null if source-scoped).</summary>
    string? JobType,

    /// <summary>Normalized scope key.</summary>
    string ScopeKey,

    /// <summary>Current status of the backfill.</summary>
    BackfillStatus Status,

    /// <summary>Start of the time window to backfill (inclusive).</summary>
    DateTimeOffset WindowStart,

    /// <summary>End of the time window to backfill (exclusive).</summary>
    DateTimeOffset WindowEnd,

    /// <summary>Current processing position within the window.</summary>
    DateTimeOffset? CurrentPosition,

    /// <summary>Total events estimated in the window.</summary>
    long? TotalEvents,

    /// <summary>Events successfully processed.</summary>
    long ProcessedEvents,

    /// <summary>Events skipped due to duplicate suppression.</summary>
    long SkippedEvents,

    /// <summary>Events that failed processing.</summary>
    long FailedEvents,

    /// <summary>Number of events to process per batch.</summary>
    int BatchSize,

    /// <summary>Whether this is a dry run (preview only, no changes).</summary>
    bool DryRun,

    /// <summary>Whether to force reprocessing (ignore duplicate suppression).</summary>
    bool ForceReprocess,

    /// <summary>Estimated duration for the backfill.</summary>
    TimeSpan? EstimatedDuration,

    /// <summary>Maximum allowed duration (safety limit).</summary>
    TimeSpan? MaxDuration,

    /// <summary>Results of safety validation checks.</summary>
    BackfillSafetyChecks? SafetyChecks,

    /// <summary>Reason for the backfill request.</summary>
    string Reason,

    /// <summary>Optional ticket reference for audit.</summary>
    string? Ticket,

    /// <summary>When the request was created.</summary>
    DateTimeOffset CreatedAt,

    /// <summary>When processing started.</summary>
    DateTimeOffset? StartedAt,

    /// <summary>When processing completed.</summary>
    DateTimeOffset? CompletedAt,

    /// <summary>Actor who created the request.</summary>
    string CreatedBy,

    /// <summary>Actor who last modified the request.</summary>
    string UpdatedBy,

    /// <summary>Error message if failed.</summary>
    string? ErrorMessage)
{
    /// <summary>
    /// Window duration.
    /// </summary>
    public TimeSpan WindowDuration => WindowEnd - WindowStart;

    /// <summary>
    /// Progress percentage (0-100).
    /// </summary>
    public double ProgressPercent => TotalEvents > 0
        ? Math.Round((double)(ProcessedEvents + SkippedEvents + FailedEvents) / TotalEvents.Value * 100, 2)
        : 0;

    /// <summary>
    /// Whether the backfill is in a terminal state.
    /// </summary>
    public bool IsTerminal => Status is BackfillStatus.Completed or BackfillStatus.Failed or BackfillStatus.Canceled;

    /// <summary>
    /// Creates a new backfill request.
    /// </summary>
    public static BackfillRequest Create(
        string tenantId,
        Guid? sourceId,
        string? jobType,
        DateTimeOffset windowStart,
        DateTimeOffset windowEnd,
        string reason,
        string createdBy,
        DateTimeOffset timestamp,
        int batchSize = 100,
        bool dryRun = false,
        bool forceReprocess = false,
        string? ticket = null,
        TimeSpan? maxDuration = null)
    {
        if (windowEnd <= windowStart)
            throw new ArgumentException("Window end must be after window start.", nameof(windowEnd));

        if (batchSize <= 0 || batchSize > 10000)
            throw new ArgumentOutOfRangeException(nameof(batchSize), "Batch size must be between 1 and 10000.");

        var scopeKey = (sourceId, jobType) switch
        {
            (Guid s, string j) when !string.IsNullOrEmpty(j) => Watermark.CreateScopeKey(s, j),
            (Guid s, _) => Watermark.CreateScopeKey(s),
            (_, string j) when !string.IsNullOrEmpty(j) => Watermark.CreateScopeKey(j),
            _ => throw new ArgumentException("Either sourceId or jobType must be specified.")
        };

        return new BackfillRequest(
            BackfillId: Guid.NewGuid(),
            TenantId: tenantId,
            SourceId: sourceId,
            JobType: jobType,
            ScopeKey: scopeKey,
            Status: BackfillStatus.Pending,
            WindowStart: windowStart,
            WindowEnd: windowEnd,
            CurrentPosition: null,
            TotalEvents: null,
            ProcessedEvents: 0,
            SkippedEvents: 0,
            FailedEvents: 0,
            BatchSize: batchSize,
            DryRun: dryRun,
            ForceReprocess: forceReprocess,
            EstimatedDuration: null,
            MaxDuration: maxDuration,
            SafetyChecks: null,
            Reason: reason,
            Ticket: ticket,
            CreatedAt: timestamp,
            StartedAt: null,
            CompletedAt: null,
            CreatedBy: createdBy,
            UpdatedBy: createdBy,
            ErrorMessage: null);
    }

    /// <summary>
    /// Transitions to validating status.
    /// </summary>
    public BackfillRequest StartValidation(string updatedBy)
    {
        if (Status != BackfillStatus.Pending)
            throw new InvalidOperationException($"Cannot start validation from status {Status}.");

        return this with
        {
            Status = BackfillStatus.Validating,
            UpdatedBy = updatedBy
        };
    }

    /// <summary>
    /// Records safety check results.
    /// </summary>
    public BackfillRequest WithSafetyChecks(BackfillSafetyChecks checks, long? totalEvents, TimeSpan? estimatedDuration, string updatedBy)
    {
        return this with
        {
            SafetyChecks = checks,
            TotalEvents = totalEvents,
            EstimatedDuration = estimatedDuration,
            UpdatedBy = updatedBy
        };
    }

    /// <summary>
    /// Transitions to running status.
    /// </summary>
    public BackfillRequest Start(string updatedBy, DateTimeOffset timestamp)
    {
        if (Status != BackfillStatus.Validating)
            throw new InvalidOperationException($"Cannot start from status {Status}.");

        if (SafetyChecks?.HasBlockingIssues == true)
            throw new InvalidOperationException("Cannot start backfill with blocking safety issues.");

        return this with
        {
            Status = BackfillStatus.Running,
            StartedAt = timestamp,
            CurrentPosition = WindowStart,
            UpdatedBy = updatedBy
        };
    }

    /// <summary>
    /// Updates progress after processing a batch.
    /// </summary>
    public BackfillRequest UpdateProgress(
        DateTimeOffset newPosition,
        long processed,
        long skipped,
        long failed,
        string updatedBy)
    {
        if (Status != BackfillStatus.Running)
            throw new InvalidOperationException($"Cannot update progress in status {Status}.");

        return this with
        {
            CurrentPosition = newPosition,
            ProcessedEvents = ProcessedEvents + processed,
            SkippedEvents = SkippedEvents + skipped,
            FailedEvents = FailedEvents + failed,
            UpdatedBy = updatedBy
        };
    }

    /// <summary>
    /// Pauses the backfill.
    /// </summary>
    public BackfillRequest Pause(string updatedBy)
    {
        if (Status != BackfillStatus.Running)
            throw new InvalidOperationException($"Cannot pause from status {Status}.");

        return this with
        {
            Status = BackfillStatus.Paused,
            UpdatedBy = updatedBy
        };
    }

    /// <summary>
    /// Resumes a paused backfill.
    /// </summary>
    public BackfillRequest Resume(string updatedBy)
    {
        if (Status != BackfillStatus.Paused)
            throw new InvalidOperationException($"Cannot resume from status {Status}.");

        return this with
        {
            Status = BackfillStatus.Running,
            UpdatedBy = updatedBy
        };
    }

    /// <summary>
    /// Completes the backfill successfully.
    /// </summary>
    public BackfillRequest Complete(string updatedBy, DateTimeOffset timestamp)
    {
        if (Status != BackfillStatus.Running)
            throw new InvalidOperationException($"Cannot complete from status {Status}.");

        return this with
        {
            Status = BackfillStatus.Completed,
            CompletedAt = timestamp,
            CurrentPosition = WindowEnd,
            UpdatedBy = updatedBy
        };
    }

    /// <summary>
    /// Fails the backfill with an error.
    /// </summary>
    public BackfillRequest Fail(string error, string updatedBy, DateTimeOffset timestamp)
    {
        return this with
        {
            Status = BackfillStatus.Failed,
            CompletedAt = timestamp,
            ErrorMessage = error,
            UpdatedBy = updatedBy
        };
    }

    /// <summary>
    /// Cancels the backfill.
    /// </summary>
    public BackfillRequest Cancel(string updatedBy, DateTimeOffset timestamp)
    {
        if (IsTerminal)
            throw new InvalidOperationException($"Cannot cancel from terminal status {Status}.");

        return this with
        {
            Status = BackfillStatus.Canceled,
            CompletedAt = timestamp,
            UpdatedBy = updatedBy
        };
    }
}

/// <summary>
/// Status of a backfill request.
/// </summary>
public enum BackfillStatus
{
    /// <summary>Request created, awaiting validation.</summary>
    Pending,

    /// <summary>Running safety validations.</summary>
    Validating,

    /// <summary>Actively processing events.</summary>
    Running,

    /// <summary>Temporarily paused.</summary>
    Paused,

    /// <summary>Successfully completed.</summary>
    Completed,

    /// <summary>Failed with error.</summary>
    Failed,

    /// <summary>Canceled by operator.</summary>
    Canceled
}

/// <summary>
/// Results of backfill safety validation checks.
/// </summary>
public sealed record BackfillSafetyChecks(
    /// <summary>Whether the source exists and is accessible.</summary>
    bool SourceExists,

    /// <summary>Whether there are overlapping active backfills.</summary>
    bool HasOverlappingBackfill,

    /// <summary>Whether the window is within the retention period.</summary>
    bool WithinRetention,

    /// <summary>Whether the estimated event count is within limits.</summary>
    bool WithinEventLimit,

    /// <summary>Whether the estimated duration is within the maximum duration.</summary>
    bool WithinDurationLimit,

    /// <summary>Whether required quotas are available.</summary>
    bool QuotaAvailable,

    /// <summary>Warning messages (non-blocking).</summary>
    IReadOnlyList<string> Warnings,

    /// <summary>Error messages (blocking).</summary>
    IReadOnlyList<string> Errors)
{
    /// <summary>
    /// Whether there are any blocking issues.
    /// </summary>
    public bool HasBlockingIssues => !SourceExists || HasOverlappingBackfill || !WithinRetention
        || !WithinEventLimit || !WithinDurationLimit || Errors.Count > 0;

    /// <summary>
    /// Whether the backfill is safe to proceed.
    /// </summary>
    public bool IsSafe => !HasBlockingIssues;

    /// <summary>
    /// Creates successful safety checks with no issues.
    /// </summary>
    public static BackfillSafetyChecks AllPassed() => new(
        SourceExists: true,
        HasOverlappingBackfill: false,
        WithinRetention: true,
        WithinEventLimit: true,
        WithinDurationLimit: true,
        QuotaAvailable: true,
        Warnings: [],
        Errors: []);
}

/// <summary>
/// Preview result for a dry-run backfill.
/// </summary>
public sealed record BackfillPreview(
    /// <summary>Scope being backfilled.</summary>
    string ScopeKey,

    /// <summary>Start of the backfill window.</summary>
    DateTimeOffset WindowStart,

    /// <summary>End of the backfill window.</summary>
    DateTimeOffset WindowEnd,

    /// <summary>Estimated total events in the window.</summary>
    long EstimatedEvents,

    /// <summary>Events that would be skipped (already processed).</summary>
    long SkippedEvents,

    /// <summary>Events that would be processed.</summary>
    long ProcessableEvents,

    /// <summary>Estimated duration.</summary>
    TimeSpan EstimatedDuration,

    /// <summary>Number of batches required.</summary>
    int EstimatedBatches,

    /// <summary>Safety validation results.</summary>
    BackfillSafetyChecks SafetyChecks,

    /// <summary>Sample of event keys that would be processed.</summary>
    IReadOnlyList<string> SampleEventKeys);
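The record above is an immutable state machine: Pending → Validating → Running → (Paused ↔ Running) → Completed/Failed/Canceled, with each transition guarding on the prior state. A hypothetical happy-path walkthrough (`sourceId`, `now`, and `batchEnd` are placeholders, not part of this commit):

```csharp
// Create and drive a backfill through its lifecycle.
var request = BackfillRequest.Create(
    tenantId: "tenant-a",
    sourceId: sourceId,
    jobType: null,
    windowStart: now.AddDays(-7),
    windowEnd: now,
    reason: "Reprocess advisories after parser fix",
    createdBy: "ops@example.test",
    timestamp: now);

request = request.StartValidation(updatedBy: "system");
request = request.WithSafetyChecks(
    BackfillSafetyChecks.AllPassed(),
    totalEvents: 1200,
    estimatedDuration: TimeSpan.FromMinutes(10),
    updatedBy: "system");
request = request.Start(updatedBy: "system", timestamp: now);

// After each processed batch, advance the window position and counters.
request = request.UpdateProgress(
    newPosition: batchEnd,
    processed: 100,
    skipped: 5,
    failed: 0,
    updatedBy: "system");

// Terminal transition once CurrentPosition reaches WindowEnd.
request = request.Complete(updatedBy: "system", timestamp: now);
```

Because every transition returns a new record via `with`, callers persist the returned instance rather than mutating in place.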
@@ -0,0 +1,94 @@
namespace StellaOps.JobEngine.Core.Domain;

/// <summary>
/// Represents a circuit breaker for a downstream service.
/// Tracks failure rates and opens the circuit when thresholds are exceeded.
/// </summary>
public sealed record CircuitBreaker(
    /// <summary>Unique circuit breaker identifier.</summary>
    Guid CircuitBreakerId,

    /// <summary>Tenant this circuit breaker applies to.</summary>
    string TenantId,

    /// <summary>Target service identifier (e.g., "scanner", "attestor", "policy-engine").</summary>
    string ServiceId,

    /// <summary>Current state of the circuit breaker.</summary>
    CircuitState State,

    /// <summary>Number of failures in the current window.</summary>
    int FailureCount,

    /// <summary>Number of successes in the current window.</summary>
    int SuccessCount,

    /// <summary>Start of the current sampling window.</summary>
    DateTimeOffset WindowStart,

    /// <summary>Failure rate threshold (0.0-1.0) that triggers circuit open.</summary>
    double FailureThreshold,

    /// <summary>Window duration for failure rate calculation.</summary>
    TimeSpan WindowDuration,

    /// <summary>Minimum samples before the circuit can trip.</summary>
    int MinimumSamples,

    /// <summary>Time when the circuit was opened (null if not open).</summary>
    DateTimeOffset? OpenedAt,

    /// <summary>Duration to keep the circuit open before transitioning to half-open.</summary>
    TimeSpan OpenDuration,

    /// <summary>Number of test requests allowed in half-open state.</summary>
    int HalfOpenTestCount,

    /// <summary>Current test request count in half-open state.</summary>
    int HalfOpenCurrentCount,

    /// <summary>Number of successful tests in half-open state.</summary>
    int HalfOpenSuccessCount,

    /// <summary>When the circuit breaker was created.</summary>
    DateTimeOffset CreatedAt,

    /// <summary>When the circuit breaker was last updated.</summary>
    DateTimeOffset UpdatedAt,

    /// <summary>Actor who last modified the circuit breaker.</summary>
    string UpdatedBy);

/// <summary>
/// Circuit breaker states.
/// </summary>
public enum CircuitState
{
    /// <summary>Circuit is closed - requests flow normally.</summary>
    Closed = 0,

    /// <summary>Circuit is open - requests are blocked.</summary>
    Open = 1,

    /// <summary>Circuit is half-open - testing if the service recovered.</summary>
    HalfOpen = 2
}

/// <summary>
/// Result of a circuit breaker check.
/// </summary>
public sealed record CircuitBreakerCheckResult(
    /// <summary>Whether the request should be allowed.</summary>
    bool IsAllowed,

    /// <summary>Current circuit state.</summary>
    CircuitState State,

    /// <summary>Current failure rate (0.0-1.0).</summary>
    double FailureRate,

    /// <summary>Time until the circuit may recover (if open).</summary>
    TimeSpan? TimeUntilRetry,

    /// <summary>Reason for blocking (if blocked).</summary>
    string? BlockReason);
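The record above is pure state (counters and thresholds); the evaluation logic lives elsewhere in the module. A minimal sketch of how these fields might drive a check — an illustrative assumption, not the committed implementation:

```csharp
// Hypothetical evaluation: derives a CircuitBreakerCheckResult from breaker state.
public static CircuitBreakerCheckResult Check(CircuitBreaker breaker, DateTimeOffset now)
{
    var samples = breaker.FailureCount + breaker.SuccessCount;
    var failureRate = samples > 0 ? (double)breaker.FailureCount / samples : 0.0;

    switch (breaker.State)
    {
        case CircuitState.Open:
            // Block until OpenDuration has elapsed since the circuit opened.
            var retryAt = (breaker.OpenedAt ?? now) + breaker.OpenDuration;
            return now >= retryAt
                ? new CircuitBreakerCheckResult(true, CircuitState.HalfOpen, failureRate, null, null)
                : new CircuitBreakerCheckResult(false, CircuitState.Open, failureRate, retryAt - now, "Circuit open.");

        case CircuitState.HalfOpen:
            // Allow only a bounded number of probe requests while half-open.
            var allowed = breaker.HalfOpenCurrentCount < breaker.HalfOpenTestCount;
            return new CircuitBreakerCheckResult(
                allowed, CircuitState.HalfOpen, failureRate, null,
                allowed ? null : "Half-open probe budget exhausted.");

        default:
            // Closed: trip only once MinimumSamples is reached and the rate crosses the threshold.
            var shouldTrip = samples >= breaker.MinimumSamples && failureRate >= breaker.FailureThreshold;
            return shouldTrip
                ? new CircuitBreakerCheckResult(false, CircuitState.Open, failureRate, breaker.OpenDuration, "Failure threshold exceeded.")
                : new CircuitBreakerCheckResult(true, CircuitState.Closed, failureRate, null, null);
    }
}
```

The `MinimumSamples` guard prevents a single early failure from tripping the breaker before the window has meaningful volume.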
@@ -0,0 +1,42 @@
namespace StellaOps.JobEngine.Core.Domain;

/// <summary>
/// Represents a dependency edge in a job DAG (Directed Acyclic Graph).
/// For a "success" edge, the child job cannot start until the parent job succeeds.
/// </summary>
public sealed record DagEdge(
    /// <summary>Unique edge identifier.</summary>
    Guid EdgeId,

    /// <summary>Tenant owning this edge.</summary>
    string TenantId,

    /// <summary>Run containing these jobs.</summary>
    Guid RunId,

    /// <summary>Parent job ID (must complete first).</summary>
    Guid ParentJobId,

    /// <summary>Child job ID (depends on parent).</summary>
    Guid ChildJobId,

    /// <summary>Edge type (e.g., "success", "always", "failure").</summary>
    string EdgeType,

    /// <summary>When this edge was created.</summary>
    DateTimeOffset CreatedAt);

/// <summary>
/// Edge types defining dependency semantics.
/// </summary>
public static class DagEdgeTypes
{
    /// <summary>Child runs only if the parent succeeds.</summary>
    public const string Success = "success";

    /// <summary>Child runs regardless of the parent outcome.</summary>
    public const string Always = "always";

    /// <summary>Child runs only if the parent fails.</summary>
    public const string Failure = "failure";
}
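Given these edge types, a child job's readiness can be derived from its incoming edges and the parents' terminal outcomes. A sketch under the assumption of simple parent-state lookups (`isParentTerminal` and `isParentSucceeded` are hypothetical delegates, not part of this commit):

```csharp
// Hypothetical readiness check: a child may start only when every incoming
// edge is satisfied according to its edge type.
public static bool IsChildReady(
    IEnumerable<DagEdge> incomingEdges,
    Func<Guid, bool> isParentTerminal,
    Func<Guid, bool> isParentSucceeded)
{
    foreach (var edge in incomingEdges)
    {
        // Every dependency must reach a terminal state before the child can start.
        if (!isParentTerminal(edge.ParentJobId))
        {
            return false;
        }

        var satisfied = edge.EdgeType switch
        {
            DagEdgeTypes.Success => isParentSucceeded(edge.ParentJobId),
            DagEdgeTypes.Failure => !isParentSucceeded(edge.ParentJobId),
            DagEdgeTypes.Always => true,
            _ => false // unknown edge types fail closed
        };

        if (!satisfied)
        {
            return false;
        }
    }

    return true;
}
```

Failing closed on unknown edge types keeps the scheduler deterministic if a newer edge type reaches an older worker.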
@@ -0,0 +1,292 @@
namespace StellaOps.JobEngine.Core.Domain;

/// <summary>
/// Represents a job that has been moved to the dead-letter store after exhausting retries
/// or encountering a non-retryable error.
/// </summary>
public sealed record DeadLetterEntry(
    /// <summary>Unique dead-letter entry identifier.</summary>
    Guid EntryId,

    /// <summary>Tenant owning this entry.</summary>
    string TenantId,

    /// <summary>Original job that failed.</summary>
    Guid OriginalJobId,

    /// <summary>Run the job belonged to (if any).</summary>
    Guid? RunId,

    /// <summary>Source the job was processing (if any).</summary>
    Guid? SourceId,

    /// <summary>Job type (e.g., "scan.image", "advisory.nvd").</summary>
    string JobType,

    /// <summary>Job payload JSON (inputs, parameters).</summary>
    string Payload,

    /// <summary>SHA-256 digest of the payload.</summary>
    string PayloadDigest,

    /// <summary>Idempotency key from the original job.</summary>
    string IdempotencyKey,

    /// <summary>Correlation ID for distributed tracing.</summary>
    string? CorrelationId,

    /// <summary>Current entry status.</summary>
    DeadLetterStatus Status,

    /// <summary>Classified error code.</summary>
    string ErrorCode,

    /// <summary>Human-readable failure reason.</summary>
    string FailureReason,

    /// <summary>Suggested remediation hint for operators.</summary>
    string? RemediationHint,

    /// <summary>Error classification category.</summary>
    ErrorCategory Category,

    /// <summary>Whether this error is potentially retryable.</summary>
    bool IsRetryable,

    /// <summary>Number of attempts made by the original job.</summary>
    int OriginalAttempts,

    /// <summary>Number of replay attempts from the dead-letter store.</summary>
    int ReplayAttempts,

    /// <summary>Maximum replay attempts allowed.</summary>
    int MaxReplayAttempts,

    /// <summary>When the job originally failed.</summary>
    DateTimeOffset FailedAt,

    /// <summary>When the entry was created in the dead-letter store.</summary>
    DateTimeOffset CreatedAt,

    /// <summary>When the entry was last updated.</summary>
    DateTimeOffset UpdatedAt,

    /// <summary>When the entry expires and can be purged.</summary>
    DateTimeOffset ExpiresAt,

    /// <summary>When the entry was resolved (if applicable).</summary>
    DateTimeOffset? ResolvedAt,

    /// <summary>Resolution notes (if resolved).</summary>
    string? ResolutionNotes,

    /// <summary>Actor who created/submitted the original job.</summary>
    string CreatedBy,

    /// <summary>Actor who last updated the entry.</summary>
    string UpdatedBy)
{
    /// <summary>Default retention period for dead-letter entries.</summary>
    public static readonly TimeSpan DefaultRetention = TimeSpan.FromDays(30);

    /// <summary>Default maximum replay attempts.</summary>
    public const int DefaultMaxReplayAttempts = 3;

    /// <summary>Whether this entry is in a terminal state.</summary>
    public bool IsTerminal => Status is DeadLetterStatus.Replayed
        or DeadLetterStatus.Resolved
        or DeadLetterStatus.Exhausted
        or DeadLetterStatus.Expired;

    /// <summary>Whether more replay attempts are allowed.</summary>
    public bool CanReplay => !IsTerminal && IsRetryable && ReplayAttempts < MaxReplayAttempts;

    /// <summary>Creates a new dead-letter entry from a failed job.</summary>
    public static DeadLetterEntry FromFailedJob(
        Job job,
        string errorCode,
        string failureReason,
        string? remediationHint,
        ErrorCategory category,
        bool isRetryable,
        DateTimeOffset now,
        TimeSpan? retention = null,
        int? maxReplayAttempts = null)
    {
        ArgumentNullException.ThrowIfNull(job);
        ArgumentException.ThrowIfNullOrWhiteSpace(errorCode);
        ArgumentException.ThrowIfNullOrWhiteSpace(failureReason);

        var effectiveRetention = retention ?? DefaultRetention;
        var effectiveMaxReplays = maxReplayAttempts ?? DefaultMaxReplayAttempts;

        return new DeadLetterEntry(
            EntryId: Guid.NewGuid(),
            TenantId: job.TenantId,
            OriginalJobId: job.JobId,
            RunId: job.RunId,
            SourceId: null, // Would be extracted from payload if available
            JobType: job.JobType,
            Payload: job.Payload,
            PayloadDigest: job.PayloadDigest,
            IdempotencyKey: job.IdempotencyKey,
            CorrelationId: job.CorrelationId,
            Status: DeadLetterStatus.Pending,
            ErrorCode: errorCode,
            FailureReason: failureReason,
            RemediationHint: remediationHint,
            Category: category,
            IsRetryable: isRetryable,
            OriginalAttempts: job.Attempt,
            ReplayAttempts: 0,
            MaxReplayAttempts: effectiveMaxReplays,
            FailedAt: job.CompletedAt ?? now,
            CreatedAt: now,
            UpdatedAt: now,
            ExpiresAt: now.Add(effectiveRetention),
            ResolvedAt: null,
            ResolutionNotes: null,
            CreatedBy: job.CreatedBy,
            UpdatedBy: "system");
    }

    /// <summary>Marks the entry as being replayed.</summary>
    public DeadLetterEntry StartReplay(string updatedBy, DateTimeOffset now)
    {
        if (!CanReplay)
            throw new InvalidOperationException($"Cannot replay entry in status {Status} with {ReplayAttempts}/{MaxReplayAttempts} attempts.");

        return this with
        {
            Status = DeadLetterStatus.Replaying,
            ReplayAttempts = ReplayAttempts + 1,
            UpdatedAt = now,
            UpdatedBy = updatedBy
        };
    }

    /// <summary>Marks the entry as successfully replayed.</summary>
    public DeadLetterEntry CompleteReplay(Guid newJobId, string updatedBy, DateTimeOffset now)
    {
        if (Status != DeadLetterStatus.Replaying)
            throw new InvalidOperationException($"Cannot complete replay from status {Status}.");

        return this with
        {
            Status = DeadLetterStatus.Replayed,
            ResolvedAt = now,
            ResolutionNotes = $"Replayed as job {newJobId}",
            UpdatedAt = now,
            UpdatedBy = updatedBy
        };
    }

    /// <summary>Marks a replay as failed.</summary>
    public DeadLetterEntry FailReplay(string reason, string updatedBy, DateTimeOffset now)
    {
        if (Status != DeadLetterStatus.Replaying)
            throw new InvalidOperationException($"Cannot fail replay from status {Status}.");

        var newStatus = ReplayAttempts >= MaxReplayAttempts
            ? DeadLetterStatus.Exhausted
            : DeadLetterStatus.Pending;

        return this with
        {
            Status = newStatus,
            FailureReason = reason,
            UpdatedAt = now,
            UpdatedBy = updatedBy
        };
    }

    /// <summary>Manually resolves the entry without replay.</summary>
    public DeadLetterEntry Resolve(string notes, string updatedBy, DateTimeOffset now)
    {
        if (IsTerminal)
            throw new InvalidOperationException($"Cannot resolve entry in terminal status {Status}.");

        return this with
        {
            Status = DeadLetterStatus.Resolved,
            ResolvedAt = now,
            ResolutionNotes = notes,
            UpdatedAt = now,
            UpdatedBy = updatedBy
        };
    }

    /// <summary>Marks the entry as expired for cleanup.</summary>
    public DeadLetterEntry MarkExpired(DateTimeOffset now)
    {
        if (IsTerminal)
            throw new InvalidOperationException($"Cannot expire entry in terminal status {Status}.");

        return this with
        {
            Status = DeadLetterStatus.Expired,
            UpdatedAt = now,
            UpdatedBy = "system"
        };
    }
}

/// <summary>
/// Dead-letter entry lifecycle states.
/// </summary>
public enum DeadLetterStatus
{
    /// <summary>Entry awaiting operator action or replay.</summary>
    Pending = 0,

    /// <summary>Entry currently being replayed.</summary>
    Replaying = 1,

    /// <summary>Entry successfully replayed as a new job.</summary>
    Replayed = 2,

    /// <summary>Entry manually resolved without replay.</summary>
    Resolved = 3,

    /// <summary>Entry exhausted all replay attempts.</summary>
    Exhausted = 4,

    /// <summary>Entry expired and eligible for purge.</summary>
    Expired = 5
}

/// <summary>
/// Error classification categories for dead-letter entries.
/// </summary>
public enum ErrorCategory
{
    /// <summary>Unknown or unclassified error.</summary>
    Unknown = 0,

    /// <summary>Transient infrastructure error (network, timeout).</summary>
    Transient = 1,

    /// <summary>Resource not found (image, source, etc.).</summary>
    NotFound = 2,

    /// <summary>Authentication or authorization failure.</summary>
    AuthFailure = 3,

    /// <summary>Rate limiting or quota exceeded.</summary>
    RateLimited = 4,

    /// <summary>Invalid input or configuration.</summary>
    ValidationError = 5,

    /// <summary>Upstream service error (registry, advisory feed).</summary>
    UpstreamError = 6,

    /// <summary>Internal processing error (bug, corruption).</summary>
    InternalError = 7,

    /// <summary>Resource conflict (duplicate, version mismatch).</summary>
    Conflict = 8,

    /// <summary>Operation canceled by user or system.</summary>
    Canceled = 9
}
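A typical operator replay cycles an entry through Pending → Replaying → Replayed, or back to Pending (or Exhausted) when resubmission fails. A hypothetical replay helper, where `submitJob` stands in for the real resubmission path (not part of this commit):

```csharp
// Hypothetical replay driver around the entry's state transitions.
public static DeadLetterEntry TryReplay(
    DeadLetterEntry entry,
    Func<DeadLetterEntry, Guid?> submitJob, // returns the new job id, or null on failure
    string actor,
    DateTimeOffset now)
{
    if (!entry.CanReplay)
    {
        return entry; // terminal, non-retryable, or replay budget exhausted
    }

    var replaying = entry.StartReplay(actor, now);
    var newJobId = submitJob(replaying);

    return newJobId is Guid id
        ? replaying.CompleteReplay(id, actor, now)
        : replaying.FailReplay("Resubmission failed.", actor, now);
}
```

Note that `StartReplay` increments `ReplayAttempts` up front, so a failed resubmission still consumes one attempt of the replay budget.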
@@ -0,0 +1,581 @@
using StellaOps.Cryptography;
using StellaOps.JobEngine.Core.Hashing;
using System.Text;
using System.Text.Json;
using System.Text.Json.Serialization;

namespace StellaOps.JobEngine.Core.Domain.Events;

/// <summary>
/// Standardized event envelope for orchestrator events.
/// Supports policy, export, and job lifecycle events with idempotency keys.
/// </summary>
public sealed record EventEnvelope(
    /// <summary>Schema version identifier.</summary>
    string SchemaVersion,

    /// <summary>Unique event ID (UUIDv7 or ULID format).</summary>
    string EventId,

    /// <summary>Event type classification.</summary>
    JobEngineEventType EventType,

    /// <summary>When the event occurred (UTC).</summary>
    DateTimeOffset OccurredAt,

    /// <summary>Idempotency key for deduplication.</summary>
    string IdempotencyKey,

    /// <summary>Correlation ID for request tracing.</summary>
    string? CorrelationId,

    /// <summary>Tenant identifier.</summary>
    string TenantId,

    /// <summary>Project identifier (optional but preferred).</summary>
    string? ProjectId,

    /// <summary>Actor who triggered/emitted the event.</summary>
    EventActor Actor,

    /// <summary>Job-related metadata (null for non-job events).</summary>
    EventJob? Job,

    /// <summary>Event metrics.</summary>
    EventMetrics? Metrics,

    /// <summary>Notifier transport metadata.</summary>
    EventNotifier? Notifier,

    /// <summary>Event-specific payload.</summary>
    JsonElement? Payload)
{
    /// <summary>Current schema version.</summary>
    public const string CurrentSchemaVersion = "orch.event.v1";

    /// <summary>Creates a new event envelope with a generated ID and idempotency key.</summary>
    public static EventEnvelope Create(
        JobEngineEventType eventType,
        string tenantId,
        EventActor actor,
        DateTimeOffset occurredAt,
        string? correlationId = null,
        string? projectId = null,
        EventJob? job = null,
        EventMetrics? metrics = null,
        EventNotifier? notifier = null,
        JsonElement? payload = null)
    {
        var eventId = GenerateEventId(occurredAt);
        var idempotencyKey = GenerateIdempotencyKey(eventType, job?.Id, job?.Attempt ?? 0);

        return new EventEnvelope(
            SchemaVersion: CurrentSchemaVersion,
            EventId: eventId,
            EventType: eventType,
            OccurredAt: occurredAt,
            IdempotencyKey: idempotencyKey,
            CorrelationId: correlationId,
            TenantId: tenantId,
            ProjectId: projectId,
            Actor: actor,
            Job: job,
            Metrics: metrics,
            Notifier: notifier,
            Payload: payload);
    }

    /// <summary>Creates a job-related event envelope.</summary>
    public static EventEnvelope ForJob(
        JobEngineEventType eventType,
        string tenantId,
        EventActor actor,
        EventJob job,
        DateTimeOffset occurredAt,
        string? correlationId = null,
        string? projectId = null,
        EventMetrics? metrics = null,
        JsonElement? payload = null)
    {
        return Create(
            eventType: eventType,
            tenantId: tenantId,
            actor: actor,
            occurredAt: occurredAt,
            correlationId: correlationId,
            projectId: projectId,
            job: job,
            metrics: metrics,
            payload: payload);
    }

    /// <summary>Creates an export-related event envelope.</summary>
    public static EventEnvelope ForExport(
        JobEngineEventType eventType,
        string tenantId,
        EventActor actor,
        EventJob exportJob,
        DateTimeOffset occurredAt,
        string? correlationId = null,
        string? projectId = null,
        EventMetrics? metrics = null,
        JsonElement? payload = null)
    {
        return ForJob(
            eventType: eventType,
            tenantId: tenantId,
            actor: actor,
job: exportJob,
|
||||
occurredAt: occurredAt,
|
||||
correlationId: correlationId,
|
||||
projectId: projectId,
|
||||
metrics: metrics,
|
||||
payload: payload);
|
||||
}
|
||||
|
||||
/// <summary>Creates a policy-related event envelope.</summary>
|
||||
public static EventEnvelope ForPolicy(
|
||||
JobEngineEventType eventType,
|
||||
string tenantId,
|
||||
EventActor actor,
|
||||
DateTimeOffset occurredAt,
|
||||
string? correlationId = null,
|
||||
string? projectId = null,
|
||||
JsonElement? payload = null)
|
||||
{
|
||||
return Create(
|
||||
eventType: eventType,
|
||||
tenantId: tenantId,
|
||||
actor: actor,
|
||||
occurredAt: occurredAt,
|
||||
correlationId: correlationId,
|
||||
projectId: projectId,
|
||||
payload: payload);
|
||||
}
|
||||
|
||||
/// <summary>Generates a UUIDv7-style event ID.</summary>
|
||||
private static string GenerateEventId(DateTimeOffset timestamp)
|
||||
{
|
||||
// UUIDv7: timestamp-based with random suffix
|
||||
var timestampMs = timestamp.ToUnixTimeMilliseconds();
|
||||
var random = Guid.NewGuid().ToString("N")[..16];
|
||||
return $"urn:orch:event:{timestampMs:x}-{random}";
|
||||
}
|
||||
|
||||
/// <summary>Generates an idempotency key for deduplication.</summary>
|
||||
public static string GenerateIdempotencyKey(JobEngineEventType eventType, string? jobId, int attempt)
|
||||
{
|
||||
var jobPart = jobId ?? "none";
|
||||
return $"orch-{eventType.ToEventTypeName()}-{jobPart}-{attempt}";
|
||||
}
|
||||
|
||||
/// <summary>Serializes the envelope to JSON.</summary>
|
||||
public string ToJson() => JsonSerializer.Serialize(this, JsonOptions);
|
||||
|
||||
/// <summary>Deserializes an envelope from JSON.</summary>
|
||||
public static EventEnvelope? FromJson(string json)
|
||||
{
|
||||
try
|
||||
{
|
||||
return JsonSerializer.Deserialize<EventEnvelope>(json, JsonOptions);
|
||||
}
|
||||
catch (JsonException)
|
||||
{
|
||||
return null;
|
||||
}
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// Computes a digest of the envelope for signing.
|
||||
/// Uses the platform's compliance-aware crypto abstraction.
|
||||
/// </summary>
|
||||
public string ComputeDigest(ICryptoHash cryptoHash)
|
||||
{
|
||||
ArgumentNullException.ThrowIfNull(cryptoHash);
|
||||
|
||||
var canonicalJson = CanonicalJsonHasher.ToCanonicalJson(new { envelope = this });
|
||||
var bytes = Encoding.UTF8.GetBytes(canonicalJson);
|
||||
var hash = cryptoHash.ComputePrefixedHashForPurpose(bytes, HashPurpose.Content);
|
||||
return hash;
|
||||
}
|
||||
|
||||
private static readonly JsonSerializerOptions JsonOptions = new()
|
||||
{
|
||||
PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
|
||||
DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull,
|
||||
WriteIndented = false,
|
||||
Converters = { new JsonStringEnumConverter(JsonNamingPolicy.SnakeCaseLower) }
|
||||
};
|
||||
}
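
A minimal usage sketch (illustrative only, not part of this change; identifiers such as `"tenant-a"` and `"job-123"` are hypothetical): a job-completion envelope built via the `ForJob` factory and round-tripped through JSON.

```csharp
// Illustrative sketch, assuming the types declared in this file; values are hypothetical.
var job = EventJob.Completed(id: "job-123", type: "export", attempt: 1);
var envelope = EventEnvelope.ForJob(
    JobEngineEventType.JobCompleted,
    tenantId: "tenant-a",
    actor: EventActor.Service("job-engine"),
    job: job,
    occurredAt: DateTimeOffset.UtcNow);

var json = envelope.ToJson();                    // camelCase keys, snake_case enums, nulls omitted
var roundTripped = EventEnvelope.FromJson(json); // null only when the JSON is malformed
```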

/// <summary>
/// Actor who triggered or emitted an event.
/// </summary>
public sealed record EventActor(
    /// <summary>Subject identifier (e.g., "service/worker-sdk-go", "user/admin@example.com").</summary>
    string Subject,

    /// <summary>Scopes/permissions under which the action was taken.</summary>
    IReadOnlyList<string>? Scopes)
{
    /// <summary>Creates a service actor.</summary>
    public static EventActor Service(string serviceName, params string[] scopes)
        => new($"service/{serviceName}", scopes.Length > 0 ? scopes : null);

    /// <summary>Creates a user actor.</summary>
    public static EventActor User(string userId, params string[] scopes)
        => new($"user/{userId}", scopes.Length > 0 ? scopes : null);

    /// <summary>Creates a system actor (for automated processes).</summary>
    public static EventActor System(string component, params string[] scopes)
        => new($"system/{component}", scopes.Length > 0 ? scopes : null);

    /// <summary>Creates a worker actor.</summary>
    public static EventActor Worker(string workerId, string sdkType)
        => new($"worker/{sdkType}/{workerId}", null);
}

/// <summary>
/// Job-related metadata in an event.
/// </summary>
public sealed record EventJob(
    /// <summary>Job identifier.</summary>
    string Id,

    /// <summary>Job type (e.g., "pack-run", "ingest", "export").</summary>
    string Type,

    /// <summary>Run identifier (for pack runs / simulations).</summary>
    string? RunId,

    /// <summary>Attempt number.</summary>
    int Attempt,

    /// <summary>Lease identifier.</summary>
    string? LeaseId,

    /// <summary>Task runner identifier.</summary>
    string? TaskRunnerId,

    /// <summary>Job status.</summary>
    string Status,

    /// <summary>Status reason (for failures/cancellations).</summary>
    string? Reason,

    /// <summary>Payload digest for integrity.</summary>
    string? PayloadDigest,

    /// <summary>Associated artifacts.</summary>
    IReadOnlyList<EventArtifact>? Artifacts)
{
    /// <summary>Creates job metadata from basic info.</summary>
    public static EventJob Create(
        string id,
        string type,
        string status,
        int attempt = 1,
        string? runId = null,
        string? leaseId = null,
        string? taskRunnerId = null,
        string? reason = null,
        string? payloadDigest = null,
        IReadOnlyList<EventArtifact>? artifacts = null)
    {
        return new EventJob(
            Id: id,
            Type: type,
            RunId: runId,
            Attempt: attempt,
            LeaseId: leaseId,
            TaskRunnerId: taskRunnerId,
            Status: status,
            Reason: reason,
            PayloadDigest: payloadDigest,
            Artifacts: artifacts);
    }

    /// <summary>Creates metadata for a completed job.</summary>
    public static EventJob Completed(string id, string type, int attempt, string? payloadDigest = null, IReadOnlyList<EventArtifact>? artifacts = null)
        => Create(id, type, "completed", attempt, payloadDigest: payloadDigest, artifacts: artifacts);

    /// <summary>Creates metadata for a failed job.</summary>
    public static EventJob Failed(string id, string type, int attempt, string reason)
        => Create(id, type, "failed", attempt, reason: reason);

    /// <summary>Creates metadata for a canceled job.</summary>
    public static EventJob Canceled(string id, string type, int attempt, string reason)
        => Create(id, type, "canceled", attempt, reason: reason);
}

/// <summary>
/// Artifact metadata in an event.
/// </summary>
public sealed record EventArtifact(
    /// <summary>Artifact URI (storage location).</summary>
    string Uri,

    /// <summary>Content digest for integrity.</summary>
    string Digest,

    /// <summary>MIME type.</summary>
    string Mime);

/// <summary>
/// Event timing and performance metrics.
/// </summary>
public sealed record EventMetrics(
    /// <summary>Duration in seconds.</summary>
    double? DurationSeconds,

    /// <summary>Log stream lag in seconds.</summary>
    double? LogStreamLagSeconds,

    /// <summary>Backoff delay in seconds.</summary>
    double? BackoffSeconds,

    /// <summary>Queue wait time in seconds.</summary>
    double? QueueWaitSeconds,

    /// <summary>Processing time in seconds.</summary>
    double? ProcessingSeconds)
{
    /// <summary>Creates metrics with just duration.</summary>
    public static EventMetrics WithDuration(double seconds)
        => new(seconds, null, null, null, null);

    /// <summary>Creates metrics with duration and processing breakdown.</summary>
    public static EventMetrics WithBreakdown(double total, double queueWait, double processing)
        => new(total, null, null, queueWait, processing);
}

/// <summary>
/// Notifier transport metadata.
/// </summary>
public sealed record EventNotifier(
    /// <summary>Notifier channel name.</summary>
    string Channel,

    /// <summary>Delivery format (e.g., "dsse", "raw").</summary>
    string Delivery,

    /// <summary>Replay metadata (for replayed events).</summary>
    EventReplay? Replay)
{
    /// <summary>Creates notifier metadata for the jobs channel.</summary>
    public static EventNotifier JobsChannel(string delivery = "dsse")
        => new("orch.jobs", delivery, null);

    /// <summary>Creates notifier metadata for the exports channel.</summary>
    public static EventNotifier ExportsChannel(string delivery = "dsse")
        => new("orch.exports", delivery, null);

    /// <summary>Creates notifier metadata for the policy channel.</summary>
    public static EventNotifier PolicyChannel(string delivery = "dsse")
        => new("orch.policy", delivery, null);

    /// <summary>Adds replay metadata.</summary>
    public EventNotifier WithReplay(int ordinal, int total)
        => this with { Replay = new EventReplay(ordinal, total) };
}

/// <summary>
/// Replay metadata for replayed events.
/// </summary>
public sealed record EventReplay(
    /// <summary>Ordinal position in replay sequence.</summary>
    int Ordinal,

    /// <summary>Total events in replay sequence.</summary>
    int Total);

/// <summary>
/// Orchestrator event types.
/// </summary>
public enum JobEngineEventType
{
    // Job lifecycle
    JobCreated,
    JobScheduled,
    JobStarted,
    JobCompleted,
    JobFailed,
    JobCanceled,
    JobRetrying,

    // Export lifecycle
    ExportCreated,
    ExportStarted,
    ExportCompleted,
    ExportFailed,
    ExportCanceled,
    ExportArchived,
    ExportExpired,
    ExportDeleted,

    // Schedule lifecycle
    ScheduleCreated,
    ScheduleEnabled,
    ScheduleDisabled,
    ScheduleTriggered,
    ScheduleSkipped,

    // Alert lifecycle
    AlertCreated,
    AlertAcknowledged,
    AlertResolved,

    // Retention lifecycle
    RetentionPruneStarted,
    RetentionPruneCompleted,

    // Policy lifecycle
    PolicyUpdated,
    PolicySimulated,
    PolicyApplied,

    // Pack run lifecycle
    PackRunCreated,
    PackRunStarted,
    PackRunLog,
    PackRunArtifact,
    PackRunCompleted,
    PackRunFailed
}

/// <summary>
/// Extension methods for event types.
/// </summary>
public static class JobEngineEventTypeExtensions
{
    /// <summary>Converts event type to canonical string name.</summary>
    public static string ToEventTypeName(this JobEngineEventType eventType)
    {
        return eventType switch
        {
            JobEngineEventType.JobCreated => "job.created",
            JobEngineEventType.JobScheduled => "job.scheduled",
            JobEngineEventType.JobStarted => "job.started",
            JobEngineEventType.JobCompleted => "job.completed",
            JobEngineEventType.JobFailed => "job.failed",
            JobEngineEventType.JobCanceled => "job.canceled",
            JobEngineEventType.JobRetrying => "job.retrying",

            JobEngineEventType.ExportCreated => "export.created",
            JobEngineEventType.ExportStarted => "export.started",
            JobEngineEventType.ExportCompleted => "export.completed",
            JobEngineEventType.ExportFailed => "export.failed",
            JobEngineEventType.ExportCanceled => "export.canceled",
            JobEngineEventType.ExportArchived => "export.archived",
            JobEngineEventType.ExportExpired => "export.expired",
            JobEngineEventType.ExportDeleted => "export.deleted",

            JobEngineEventType.ScheduleCreated => "schedule.created",
            JobEngineEventType.ScheduleEnabled => "schedule.enabled",
            JobEngineEventType.ScheduleDisabled => "schedule.disabled",
            JobEngineEventType.ScheduleTriggered => "schedule.triggered",
            JobEngineEventType.ScheduleSkipped => "schedule.skipped",

            JobEngineEventType.AlertCreated => "alert.created",
            JobEngineEventType.AlertAcknowledged => "alert.acknowledged",
            JobEngineEventType.AlertResolved => "alert.resolved",

            JobEngineEventType.RetentionPruneStarted => "retention.prune_started",
            JobEngineEventType.RetentionPruneCompleted => "retention.prune_completed",

            JobEngineEventType.PolicyUpdated => "policy.updated",
            JobEngineEventType.PolicySimulated => "policy.simulated",
            JobEngineEventType.PolicyApplied => "policy.applied",

            JobEngineEventType.PackRunCreated => "pack_run.created",
            JobEngineEventType.PackRunStarted => "pack_run.started",
            JobEngineEventType.PackRunLog => "pack_run.log",
            JobEngineEventType.PackRunArtifact => "pack_run.artifact",
            JobEngineEventType.PackRunCompleted => "pack_run.completed",
            JobEngineEventType.PackRunFailed => "pack_run.failed",

            _ => eventType.ToString().ToLowerInvariant()
        };
    }

    /// <summary>Parses a canonical event type name.</summary>
    public static JobEngineEventType? FromEventTypeName(string name)
    {
        return name switch
        {
            "job.created" => JobEngineEventType.JobCreated,
            "job.scheduled" => JobEngineEventType.JobScheduled,
            "job.started" => JobEngineEventType.JobStarted,
            "job.completed" => JobEngineEventType.JobCompleted,
            "job.failed" => JobEngineEventType.JobFailed,
            "job.canceled" => JobEngineEventType.JobCanceled,
            "job.retrying" => JobEngineEventType.JobRetrying,

            "export.created" => JobEngineEventType.ExportCreated,
            "export.started" => JobEngineEventType.ExportStarted,
            "export.completed" => JobEngineEventType.ExportCompleted,
            "export.failed" => JobEngineEventType.ExportFailed,
            "export.canceled" => JobEngineEventType.ExportCanceled,
            "export.archived" => JobEngineEventType.ExportArchived,
            "export.expired" => JobEngineEventType.ExportExpired,
            "export.deleted" => JobEngineEventType.ExportDeleted,

            "schedule.created" => JobEngineEventType.ScheduleCreated,
            "schedule.enabled" => JobEngineEventType.ScheduleEnabled,
            "schedule.disabled" => JobEngineEventType.ScheduleDisabled,
            "schedule.triggered" => JobEngineEventType.ScheduleTriggered,
            "schedule.skipped" => JobEngineEventType.ScheduleSkipped,

            "alert.created" => JobEngineEventType.AlertCreated,
            "alert.acknowledged" => JobEngineEventType.AlertAcknowledged,
            "alert.resolved" => JobEngineEventType.AlertResolved,

            "retention.prune_started" => JobEngineEventType.RetentionPruneStarted,
            "retention.prune_completed" => JobEngineEventType.RetentionPruneCompleted,

            "policy.updated" => JobEngineEventType.PolicyUpdated,
            "policy.simulated" => JobEngineEventType.PolicySimulated,
            "policy.applied" => JobEngineEventType.PolicyApplied,

            "pack_run.created" => JobEngineEventType.PackRunCreated,
            "pack_run.started" => JobEngineEventType.PackRunStarted,
            "pack_run.log" => JobEngineEventType.PackRunLog,
            "pack_run.artifact" => JobEngineEventType.PackRunArtifact,
            "pack_run.completed" => JobEngineEventType.PackRunCompleted,
            "pack_run.failed" => JobEngineEventType.PackRunFailed,

            _ => null
        };
    }

    /// <summary>Whether the event type is a failure event.</summary>
    public static bool IsFailure(this JobEngineEventType eventType)
    {
        return eventType is
            JobEngineEventType.JobFailed or
            JobEngineEventType.ExportFailed or
            JobEngineEventType.PackRunFailed;
    }

    /// <summary>Whether the event type is a completion event.</summary>
    public static bool IsCompletion(this JobEngineEventType eventType)
    {
        return eventType is
            JobEngineEventType.JobCompleted or
            JobEngineEventType.ExportCompleted or
            JobEngineEventType.PackRunCompleted or
            JobEngineEventType.RetentionPruneCompleted;
    }

    /// <summary>Whether the event type is a lifecycle terminal event.</summary>
    public static bool IsTerminal(this JobEngineEventType eventType)
    {
        return eventType.IsFailure() || eventType.IsCompletion() ||
            eventType is
                JobEngineEventType.JobCanceled or
                JobEngineEventType.ExportCanceled or
                JobEngineEventType.ExportDeleted or
                JobEngineEventType.AlertResolved;
    }
}
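
An illustrative round-trip check (not part of this change) of the canonical name mapping and the terminal-state helpers:

```csharp
// Illustrative sketch; assumes the enum and extensions declared above.
var name = JobEngineEventType.PackRunCompleted.ToEventTypeName();   // "pack_run.completed"
var parsed = JobEngineEventTypeExtensions.FromEventTypeName(name);  // PackRunCompleted
var terminal = JobEngineEventType.JobCanceled.IsTerminal();         // true: canceled jobs are terminal
```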

@@ -0,0 +1,252 @@
namespace StellaOps.JobEngine.Core.Domain.Events;

/// <summary>
/// Interface for publishing orchestrator events to the notifier bus.
/// </summary>
public interface IEventPublisher
{
    /// <summary>Publishes an event to the notifier bus.</summary>
    /// <param name="envelope">The event envelope to publish.</param>
    /// <param name="cancellationToken">Cancellation token.</param>
    /// <returns>True if published successfully; false if deduplicated.</returns>
    Task<bool> PublishAsync(EventEnvelope envelope, CancellationToken cancellationToken = default);

    /// <summary>Publishes multiple events to the notifier bus.</summary>
    /// <param name="envelopes">The event envelopes to publish.</param>
    /// <param name="cancellationToken">Cancellation token.</param>
    /// <returns>The result containing success/dedup counts.</returns>
    Task<BatchPublishResult> PublishBatchAsync(IEnumerable<EventEnvelope> envelopes, CancellationToken cancellationToken = default);

    /// <summary>Checks if an event with the given idempotency key has already been published.</summary>
    /// <param name="idempotencyKey">The idempotency key to check.</param>
    /// <param name="cancellationToken">Cancellation token.</param>
    /// <returns>True if already published.</returns>
    Task<bool> IsPublishedAsync(string idempotencyKey, CancellationToken cancellationToken = default);
}

/// <summary>
/// Result of a batch publish operation.
/// </summary>
public sealed record BatchPublishResult(
    /// <summary>Number of events successfully published.</summary>
    int Published,

    /// <summary>Number of events deduplicated (already published).</summary>
    int Deduplicated,

    /// <summary>Number of events that failed to publish.</summary>
    int Failed,

    /// <summary>Errors encountered during publishing.</summary>
    IReadOnlyList<string> Errors)
{
    /// <summary>Total events processed.</summary>
    public int Total => Published + Deduplicated + Failed;

    /// <summary>Whether any events were published successfully.</summary>
    public bool HasPublished => Published > 0;

    /// <summary>Whether any errors occurred.</summary>
    public bool HasErrors => Failed > 0 || Errors.Count > 0;

    /// <summary>Creates an empty result.</summary>
    public static BatchPublishResult Empty => new(0, 0, 0, []);

    /// <summary>Creates a successful single publish result.</summary>
    public static BatchPublishResult SingleSuccess => new(1, 0, 0, []);

    /// <summary>Creates a deduplicated single result.</summary>
    public static BatchPublishResult SingleDeduplicated => new(0, 1, 0, []);
}

/// <summary>
/// Event publishing options.
/// </summary>
public sealed record EventPublishOptions(
    /// <summary>Whether to sign events with DSSE.</summary>
    bool SignWithDsse,

    /// <summary>Maximum retry attempts for transient failures.</summary>
    int MaxRetries,

    /// <summary>Base delay between retries.</summary>
    TimeSpan RetryDelay,

    /// <summary>TTL for idempotency key tracking.</summary>
    TimeSpan IdempotencyTtl,

    /// <summary>Whether to include provenance metadata.</summary>
    bool IncludeProvenance,

    /// <summary>Whether to compress large payloads.</summary>
    bool CompressLargePayloads,

    /// <summary>Threshold for payload compression (bytes).</summary>
    int CompressionThreshold,

    /// <summary>Maximum number of events to fan out in a single batch to avoid backpressure.</summary>
    int MaxBatchSize)
{
    /// <summary>Default publishing options.</summary>
    public static EventPublishOptions Default => new(
        SignWithDsse: true,
        MaxRetries: 3,
        RetryDelay: TimeSpan.FromSeconds(1),
        IdempotencyTtl: TimeSpan.FromHours(24),
        IncludeProvenance: true,
        CompressLargePayloads: true,
        CompressionThreshold: 64 * 1024,
        MaxBatchSize: 500);
}

/// <summary>
/// Interface for event signing.
/// </summary>
public interface IEventSigner
{
    /// <summary>Signs an event envelope with DSSE.</summary>
    /// <param name="envelope">The envelope to sign.</param>
    /// <param name="cancellationToken">Cancellation token.</param>
    /// <returns>The signed envelope as a DSSE payload.</returns>
    Task<string> SignAsync(EventEnvelope envelope, CancellationToken cancellationToken = default);

    /// <summary>Verifies a signed event envelope.</summary>
    /// <param name="signedPayload">The signed DSSE payload.</param>
    /// <param name="cancellationToken">Cancellation token.</param>
    /// <returns>The verified envelope, or null if verification fails.</returns>
    Task<EventEnvelope?> VerifyAsync(string signedPayload, CancellationToken cancellationToken = default);
}

/// <summary>
/// Interface for idempotency tracking.
/// </summary>
public interface IIdempotencyStore
{
    /// <summary>Tries to mark an idempotency key as processed.</summary>
    /// <param name="key">The idempotency key.</param>
    /// <param name="ttl">TTL for the key.</param>
    /// <param name="cancellationToken">Cancellation token.</param>
    /// <returns>True if newly marked; false if already existed.</returns>
    Task<bool> TryMarkAsync(string key, TimeSpan ttl, CancellationToken cancellationToken = default);

    /// <summary>Checks if an idempotency key exists.</summary>
    /// <param name="key">The idempotency key.</param>
    /// <param name="cancellationToken">Cancellation token.</param>
    /// <returns>True if the key exists.</returns>
    Task<bool> ExistsAsync(string key, CancellationToken cancellationToken = default);

    /// <summary>Removes an idempotency key.</summary>
    /// <param name="key">The idempotency key.</param>
    /// <param name="cancellationToken">Cancellation token.</param>
    Task RemoveAsync(string key, CancellationToken cancellationToken = default);
}

/// <summary>
/// Interface for the notifier bus transport.
/// </summary>
public interface INotifierBus
{
    /// <summary>Sends a message to the notifier bus.</summary>
    /// <param name="channel">Target channel.</param>
    /// <param name="message">Message payload (JSON or signed DSSE).</param>
    /// <param name="cancellationToken">Cancellation token.</param>
    Task SendAsync(string channel, string message, CancellationToken cancellationToken = default);

    /// <summary>Sends multiple messages to the notifier bus.</summary>
    /// <param name="channel">Target channel.</param>
    /// <param name="messages">Message payloads.</param>
    /// <param name="cancellationToken">Cancellation token.</param>
    Task SendBatchAsync(string channel, IEnumerable<string> messages, CancellationToken cancellationToken = default);
}

/// <summary>
/// Null implementation of event publisher for testing.
/// </summary>
public sealed class NullEventPublisher : IEventPublisher
{
    /// <summary>Singleton instance.</summary>
    public static NullEventPublisher Instance { get; } = new();

    private NullEventPublisher() { }

    public Task<bool> PublishAsync(EventEnvelope envelope, CancellationToken cancellationToken = default)
        => Task.FromResult(true);

    public Task<BatchPublishResult> PublishBatchAsync(IEnumerable<EventEnvelope> envelopes, CancellationToken cancellationToken = default)
    {
        var count = envelopes.Count();
        return Task.FromResult(new BatchPublishResult(count, 0, 0, []));
    }

    public Task<bool> IsPublishedAsync(string idempotencyKey, CancellationToken cancellationToken = default)
        => Task.FromResult(false);
}

/// <summary>
/// In-memory implementation of idempotency store for testing.
/// </summary>
public sealed class InMemoryIdempotencyStore : IIdempotencyStore
{
    private readonly Dictionary<string, DateTimeOffset> _keys = new();
    private readonly TimeProvider _timeProvider;
    private readonly object _lock = new();

    /// <summary>Creates a new in-memory idempotency store.</summary>
    public InMemoryIdempotencyStore(TimeProvider? timeProvider = null)
    {
        _timeProvider = timeProvider ?? TimeProvider.System;
    }

    public Task<bool> TryMarkAsync(string key, TimeSpan ttl, CancellationToken cancellationToken = default)
    {
        lock (_lock)
        {
            CleanupExpired();
            if (_keys.ContainsKey(key))
            {
                return Task.FromResult(false);
            }

            _keys[key] = _timeProvider.GetUtcNow().Add(ttl);
            return Task.FromResult(true);
        }
    }

    public Task<bool> ExistsAsync(string key, CancellationToken cancellationToken = default)
    {
        lock (_lock)
        {
            CleanupExpired();
            return Task.FromResult(_keys.ContainsKey(key));
        }
    }

    public Task RemoveAsync(string key, CancellationToken cancellationToken = default)
    {
        lock (_lock)
        {
            _keys.Remove(key);
        }

        return Task.CompletedTask;
    }

    private void CleanupExpired()
    {
        var now = _timeProvider.GetUtcNow();
        var expired = _keys.Where(kv => kv.Value <= now).Select(kv => kv.Key).ToList();
        foreach (var key in expired)
        {
            _keys.Remove(key);
        }
    }

    /// <summary>Gets the current key count (for testing).</summary>
    public int Count
    {
        get { lock (_lock) { CleanupExpired(); return _keys.Count; } }
    }

    /// <summary>Clears all keys (for testing).</summary>
    public void Clear()
    {
        lock (_lock) { _keys.Clear(); }
    }
}
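
An illustrative deduplication flow (not part of this change), pairing `EventEnvelope.GenerateIdempotencyKey` with the in-memory store; the key and TTL values are arbitrary examples.

```csharp
// Illustrative sketch; demonstrates first-write-wins dedup semantics only.
var store = new InMemoryIdempotencyStore();
var key = EventEnvelope.GenerateIdempotencyKey(JobEngineEventType.JobCompleted, "job-123", attempt: 1);

var first = await store.TryMarkAsync(key, TimeSpan.FromHours(24));  // true: newly marked
var second = await store.TryMarkAsync(key, TimeSpan.FromHours(24)); // false: duplicate suppressed
```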

@@ -0,0 +1,257 @@
using System.Security.Cryptography;
using System.Text;
using System.Text.Json;
using System.Text.Json.Serialization;

namespace StellaOps.JobEngine.Core.Domain.Events;

/// <summary>
/// Unified timeline event for audit trail, observability, and evidence chain tracking.
/// Per ORCH-OBS-52-001 and timeline-event.schema.json.
/// </summary>
public sealed record TimelineEvent(
    /// <summary>Monotonically increasing sequence number for ordering.</summary>
    long? EventSeq,

    /// <summary>Globally unique event identifier.</summary>
    Guid EventId,

    /// <summary>Tenant scope for multi-tenant isolation.</summary>
    string TenantId,

    /// <summary>Event type identifier following namespace convention.</summary>
    string EventType,

    /// <summary>Service or component that emitted this event.</summary>
    string Source,

    /// <summary>When the event actually occurred.</summary>
    DateTimeOffset OccurredAt,

    /// <summary>When the event was received by the timeline indexer.</summary>
    DateTimeOffset? ReceivedAt,

    /// <summary>Correlation ID linking related events across services.</summary>
    string? CorrelationId,

    /// <summary>OpenTelemetry trace ID for distributed tracing.</summary>
    string? TraceId,

    /// <summary>OpenTelemetry span ID within the trace.</summary>
    string? SpanId,

    /// <summary>User, service account, or system that triggered the event.</summary>
    string? Actor,

    /// <summary>Event severity level.</summary>
    TimelineEventSeverity Severity,

    /// <summary>Key-value attributes for filtering and querying.</summary>
    IReadOnlyDictionary<string, string>? Attributes,

    /// <summary>SHA-256 hash of the raw payload for integrity.</summary>
    string? PayloadHash,

    /// <summary>Original event payload as JSON string.</summary>
    string? RawPayloadJson,

    /// <summary>Canonicalized JSON for deterministic hashing.</summary>
    string? NormalizedPayloadJson,

    /// <summary>Reference to associated evidence bundle or attestation.</summary>
    EvidencePointer? EvidencePointer,

    /// <summary>Run ID if this event is associated with a run.</summary>
    Guid? RunId,

    /// <summary>Job ID if this event is associated with a job.</summary>
    Guid? JobId,

    /// <summary>Project ID scope within tenant.</summary>
    string? ProjectId)
{
    private static readonly JsonSerializerOptions JsonOptions = new()
    {
        PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
        DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull,
        WriteIndented = false
    };

    private static readonly JsonSerializerOptions CanonicalJsonOptions = new()
    {
        PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
        DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull,
        WriteIndented = false,
        Encoder = System.Text.Encodings.Web.JavaScriptEncoder.UnsafeRelaxedJsonEscaping
    };

    /// <summary>
    /// Creates a new timeline event with generated ID.
    /// </summary>
    public static TimelineEvent Create(
        string tenantId,
        string eventType,
        string source,
        DateTimeOffset occurredAt,
        string? actor = null,
        TimelineEventSeverity severity = TimelineEventSeverity.Info,
        IReadOnlyDictionary<string, string>? attributes = null,
        string? correlationId = null,
        string? traceId = null,
        string? spanId = null,
        Guid? runId = null,
        Guid? jobId = null,
        string? projectId = null,
        object? payload = null,
        EvidencePointer? evidencePointer = null)
    {
        string? rawPayload = null;
        string? normalizedPayload = null;
        string? payloadHash = null;

        if (payload is not null)
        {
            rawPayload = JsonSerializer.Serialize(payload, JsonOptions);
            normalizedPayload = NormalizeJson(rawPayload);
            payloadHash = ComputeHash(normalizedPayload);
        }

        return new TimelineEvent(
            EventSeq: null,
            EventId: Guid.NewGuid(),
            TenantId: tenantId,
            EventType: eventType,
            Source: source,
            OccurredAt: occurredAt,
            ReceivedAt: null,
            CorrelationId: correlationId,
            TraceId: traceId,
            SpanId: spanId,
            Actor: actor,
|
||||
Severity: severity,
|
||||
Attributes: attributes,
|
||||
PayloadHash: payloadHash,
|
||||
RawPayloadJson: rawPayload,
|
||||
NormalizedPayloadJson: normalizedPayload,
|
||||
EvidencePointer: evidencePointer,
|
||||
RunId: runId,
|
||||
JobId: jobId,
|
||||
ProjectId: projectId);
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// Serializes the event to JSON.
|
||||
/// </summary>
|
||||
public string ToJson() => JsonSerializer.Serialize(this, JsonOptions);
|
||||
|
||||
/// <summary>
|
||||
/// Parses a timeline event from JSON.
|
||||
/// </summary>
|
||||
public static TimelineEvent? FromJson(string json)
|
||||
=> JsonSerializer.Deserialize<TimelineEvent>(json, JsonOptions);
|
||||
|
||||
/// <summary>
|
||||
/// Creates a copy with received timestamp set.
|
||||
/// </summary>
|
||||
public TimelineEvent WithReceivedAt(DateTimeOffset receivedAt)
|
||||
=> this with { ReceivedAt = receivedAt };
|
||||
|
||||
/// <summary>
|
||||
/// Creates a copy with sequence number set.
|
||||
/// </summary>
|
||||
public TimelineEvent WithSequence(long seq)
|
||||
=> this with { EventSeq = seq };
|
||||
|
||||
/// <summary>
|
||||
/// Generates an idempotency key for this event.
|
||||
/// </summary>
|
||||
public string GenerateIdempotencyKey()
|
||||
=> $"timeline:{TenantId}:{EventType}:{EventId}";
|
||||
|
||||
private static string NormalizeJson(string json)
|
||||
{
|
||||
using var doc = JsonDocument.Parse(json);
|
||||
return JsonSerializer.Serialize(doc.RootElement, CanonicalJsonOptions);
|
||||
}
|
||||
|
||||
private static string ComputeHash(string content)
|
||||
{
|
||||
var bytes = Encoding.UTF8.GetBytes(content);
|
||||
var hash = SHA256.HashData(bytes);
|
||||
return $"sha256:{Convert.ToHexString(hash).ToLowerInvariant()}";
|
||||
}
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// Event severity level.
|
||||
/// </summary>
|
||||
public enum TimelineEventSeverity
|
||||
{
|
||||
Debug,
|
||||
Info,
|
||||
Warning,
|
||||
Error,
|
||||
Critical
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// Reference to associated evidence bundle or attestation.
|
||||
/// </summary>
|
||||
public sealed record EvidencePointer(
|
||||
/// <summary>Type of evidence being referenced.</summary>
|
||||
EvidencePointerType Type,
|
||||
|
||||
/// <summary>Evidence bundle identifier.</summary>
|
||||
Guid? BundleId,
|
||||
|
||||
/// <summary>Content digest of the evidence bundle.</summary>
|
||||
string? BundleDigest,
|
||||
|
||||
/// <summary>Subject URI for the attestation.</summary>
|
||||
string? AttestationSubject,
|
||||
|
||||
/// <summary>Digest of the attestation envelope.</summary>
|
||||
string? AttestationDigest,
|
||||
|
||||
/// <summary>URI to the evidence manifest.</summary>
|
||||
string? ManifestUri,
|
||||
|
||||
/// <summary>Path within evidence locker storage.</summary>
|
||||
string? LockerPath)
|
||||
{
|
||||
/// <summary>
|
||||
/// Creates a bundle evidence pointer.
|
||||
/// </summary>
|
||||
public static EvidencePointer Bundle(Guid bundleId, string? bundleDigest = null)
|
||||
=> new(EvidencePointerType.Bundle, bundleId, bundleDigest, null, null, null, null);
|
||||
|
||||
/// <summary>
|
||||
/// Creates an attestation evidence pointer.
|
||||
/// </summary>
|
||||
public static EvidencePointer Attestation(string subject, string? digest = null)
|
||||
=> new(EvidencePointerType.Attestation, null, null, subject, digest, null, null);
|
||||
|
||||
/// <summary>
|
||||
/// Creates a manifest evidence pointer.
|
||||
/// </summary>
|
||||
public static EvidencePointer Manifest(string uri, string? lockerPath = null)
|
||||
=> new(EvidencePointerType.Manifest, null, null, null, null, uri, lockerPath);
|
||||
|
||||
/// <summary>
|
||||
/// Creates an artifact evidence pointer.
|
||||
/// </summary>
|
||||
public static EvidencePointer Artifact(string lockerPath, string? digest = null)
|
||||
=> new(EvidencePointerType.Artifact, null, digest, null, null, null, lockerPath);
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// Type of evidence being referenced.
|
||||
/// </summary>
|
||||
public enum EvidencePointerType
|
||||
{
|
||||
Bundle,
|
||||
Attestation,
|
||||
Manifest,
|
||||
Artifact
|
||||
}
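A minimal usage sketch (illustrative only, not part of this commit) showing how the `TimelineEvent` factory and `EvidencePointer` helpers above compose; the tenant/event-type strings are assumed values:

```csharp
// Illustrative sketch: build an event with a payload and attached evidence.
// PayloadHash is computed over the canonicalized JSON, so two events with
// an equivalent payload yield the same "sha256:..." digest.
var evt = TimelineEvent.Create(
    tenantId: "tenant-a",
    eventType: "job.completed",
    source: "jobengine",
    occurredAt: DateTimeOffset.UtcNow,
    payload: new { durationMs = 1200 },
    evidencePointer: EvidencePointer.Bundle(Guid.NewGuid()));

// Round-trips through JSON and carries a stable idempotency key.
var roundTripped = TimelineEvent.FromJson(evt.ToJson());
Console.WriteLine(evt.PayloadHash);
Console.WriteLine(evt.GenerateIdempotencyKey()); // "timeline:{tenant}:{type}:{id}"
```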
@@ -0,0 +1,495 @@
using Microsoft.Extensions.Logging;

namespace StellaOps.JobEngine.Core.Domain.Events;

/// <summary>
/// Service for emitting timeline events with trace IDs and retries.
/// Per ORCH-OBS-52-001.
/// </summary>
public interface ITimelineEventEmitter
{
    /// <summary>
    /// Emits a timeline event.
    /// </summary>
    Task<TimelineEmitResult> EmitAsync(TimelineEvent evt, CancellationToken cancellationToken = default);

    /// <summary>
    /// Emits multiple timeline events in batch.
    /// </summary>
    Task<TimelineBatchEmitResult> EmitBatchAsync(IEnumerable<TimelineEvent> events, CancellationToken cancellationToken = default);

    /// <summary>
    /// Creates and emits a job lifecycle event.
    /// </summary>
    Task<TimelineEmitResult> EmitJobEventAsync(
        string tenantId,
        Guid jobId,
        string eventType,
        object? payload = null,
        string? actor = null,
        string? correlationId = null,
        string? traceId = null,
        string? projectId = null,
        IReadOnlyDictionary<string, string>? attributes = null,
        CancellationToken cancellationToken = default);

    /// <summary>
    /// Creates and emits a run lifecycle event.
    /// </summary>
    Task<TimelineEmitResult> EmitRunEventAsync(
        string tenantId,
        Guid runId,
        string eventType,
        object? payload = null,
        string? actor = null,
        string? correlationId = null,
        string? traceId = null,
        string? projectId = null,
        IReadOnlyDictionary<string, string>? attributes = null,
        CancellationToken cancellationToken = default);
}

/// <summary>
/// Result of timeline event emission.
/// </summary>
public sealed record TimelineEmitResult(
    /// <summary>Whether the event was emitted successfully.</summary>
    bool Success,

    /// <summary>The emitted event (with sequence if assigned).</summary>
    TimelineEvent Event,

    /// <summary>Whether the event was deduplicated.</summary>
    bool Deduplicated,

    /// <summary>Error message if emission failed.</summary>
    string? Error);

/// <summary>
/// Result of batch timeline event emission.
/// </summary>
public sealed record TimelineBatchEmitResult(
    /// <summary>Number of events emitted successfully.</summary>
    int Emitted,

    /// <summary>Number of events deduplicated.</summary>
    int Deduplicated,

    /// <summary>Number of events that failed.</summary>
    int Failed,

    /// <summary>Errors encountered.</summary>
    IReadOnlyList<string> Errors)
{
    /// <summary>Total events processed.</summary>
    public int Total => Emitted + Deduplicated + Failed;

    /// <summary>Whether any events were emitted.</summary>
    public bool HasEmitted => Emitted > 0;

    /// <summary>Whether any errors occurred.</summary>
    public bool HasErrors => Failed > 0 || Errors.Count > 0;

    /// <summary>Creates an empty result.</summary>
    public static TimelineBatchEmitResult Empty => new(0, 0, 0, []);
}

/// <summary>
/// Default implementation of timeline event emitter.
/// </summary>
public sealed class TimelineEventEmitter : ITimelineEventEmitter
{
    private const string Source = "jobengine";
    private readonly ITimelineEventSink _sink;
    private readonly TimeProvider _timeProvider;
    private readonly ILogger<TimelineEventEmitter> _logger;
    private readonly TimelineEmitterOptions _options;

    public TimelineEventEmitter(
        ITimelineEventSink sink,
        TimeProvider timeProvider,
        ILogger<TimelineEventEmitter> logger,
        TimelineEmitterOptions? options = null)
    {
        _sink = sink ?? throw new ArgumentNullException(nameof(sink));
        _timeProvider = timeProvider ?? throw new ArgumentNullException(nameof(timeProvider));
        _logger = logger ?? throw new ArgumentNullException(nameof(logger));
        _options = options ?? TimelineEmitterOptions.Default;
    }

    public async Task<TimelineEmitResult> EmitAsync(TimelineEvent evt, CancellationToken cancellationToken = default)
    {
        ArgumentNullException.ThrowIfNull(evt);

        var eventWithReceived = evt.WithReceivedAt(_timeProvider.GetUtcNow());

        try
        {
            var result = await EmitWithRetryAsync(eventWithReceived, cancellationToken);
            return result;
        }
        catch (Exception ex)
        {
            _logger.LogError(ex,
                "Failed to emit timeline event {EventId} type {EventType} for tenant {TenantId}",
                evt.EventId, evt.EventType, evt.TenantId);

            return new TimelineEmitResult(
                Success: false,
                Event: eventWithReceived,
                Deduplicated: false,
                Error: ex.Message);
        }
    }

    public async Task<TimelineBatchEmitResult> EmitBatchAsync(
        IEnumerable<TimelineEvent> events,
        CancellationToken cancellationToken = default)
    {
        ArgumentNullException.ThrowIfNull(events);

        var emitted = 0;
        var deduplicated = 0;
        var failed = 0;
        var errors = new List<string>();

        // Order by occurredAt then eventId for deterministic fan-out
        var ordered = events
            .OrderBy(e => e.OccurredAt)
            .ThenBy(e => e.EventId)
            .ToList();

        foreach (var evt in ordered)
        {
            var result = await EmitAsync(evt, cancellationToken);

            if (result.Success)
            {
                if (result.Deduplicated)
                    deduplicated++;
                else
                    emitted++;
            }
            else
            {
                failed++;
                if (result.Error is not null)
                    errors.Add($"{evt.EventId}: {result.Error}");
            }
        }

        return new TimelineBatchEmitResult(emitted, deduplicated, failed, errors);
    }

    public async Task<TimelineEmitResult> EmitJobEventAsync(
        string tenantId,
        Guid jobId,
        string eventType,
        object? payload = null,
        string? actor = null,
        string? correlationId = null,
        string? traceId = null,
        string? projectId = null,
        IReadOnlyDictionary<string, string>? attributes = null,
        CancellationToken cancellationToken = default)
    {
        var attrs = MergeAttributes(attributes, new Dictionary<string, string>
        {
            ["jobId"] = jobId.ToString()
        });

        var evt = TimelineEvent.Create(
            tenantId: tenantId,
            eventType: eventType,
            source: Source,
            occurredAt: _timeProvider.GetUtcNow(),
            actor: actor,
            severity: GetSeverityForEventType(eventType),
            attributes: attrs,
            correlationId: correlationId,
            traceId: traceId,
            jobId: jobId,
            projectId: projectId,
            payload: payload);

        return await EmitAsync(evt, cancellationToken);
    }

    public async Task<TimelineEmitResult> EmitRunEventAsync(
        string tenantId,
        Guid runId,
        string eventType,
        object? payload = null,
        string? actor = null,
        string? correlationId = null,
        string? traceId = null,
        string? projectId = null,
        IReadOnlyDictionary<string, string>? attributes = null,
        CancellationToken cancellationToken = default)
    {
        var attrs = MergeAttributes(attributes, new Dictionary<string, string>
        {
            ["runId"] = runId.ToString()
        });

        var evt = TimelineEvent.Create(
            tenantId: tenantId,
            eventType: eventType,
            source: Source,
            occurredAt: _timeProvider.GetUtcNow(),
            actor: actor,
            severity: GetSeverityForEventType(eventType),
            attributes: attrs,
            correlationId: correlationId,
            traceId: traceId,
            runId: runId,
            projectId: projectId,
            payload: payload);

        return await EmitAsync(evt, cancellationToken);
    }

    private async Task<TimelineEmitResult> EmitWithRetryAsync(
        TimelineEvent evt,
        CancellationToken cancellationToken)
    {
        var attempt = 0;
        var delay = _options.RetryDelay;

        while (true)
        {
            try
            {
                var sinkResult = await _sink.WriteAsync(evt, cancellationToken);

                if (sinkResult.Deduplicated)
                {
                    _logger.LogDebug(
                        "Timeline event {EventId} deduplicated",
                        evt.EventId);

                    return new TimelineEmitResult(
                        Success: true,
                        Event: evt,
                        Deduplicated: true,
                        Error: null);
                }

                _logger.LogInformation(
                    "Emitted timeline event {EventId} type {EventType} tenant {TenantId} seq {Seq}",
                    evt.EventId, evt.EventType, evt.TenantId, sinkResult.Sequence);

                return new TimelineEmitResult(
                    Success: true,
                    Event: sinkResult.Sequence.HasValue ? evt.WithSequence(sinkResult.Sequence.Value) : evt,
                    Deduplicated: false,
                    Error: null);
            }
            catch (Exception ex) when (attempt < _options.MaxRetries && IsTransient(ex))
            {
                attempt++;
                _logger.LogWarning(ex,
                    "Transient failure emitting timeline event {EventId}, attempt {Attempt}/{MaxRetries}",
                    evt.EventId, attempt, _options.MaxRetries);

                await Task.Delay(delay, cancellationToken);
                delay = TimeSpan.FromMilliseconds(delay.TotalMilliseconds * 2);
            }
        }
    }

    private static IReadOnlyDictionary<string, string> MergeAttributes(
        IReadOnlyDictionary<string, string>? existing,
        Dictionary<string, string> additional)
    {
        if (existing is null || existing.Count == 0)
            return additional;

        var merged = new Dictionary<string, string>(existing);
        foreach (var (key, value) in additional)
        {
            merged.TryAdd(key, value);
        }
        return merged;
    }

    private static TimelineEventSeverity GetSeverityForEventType(string eventType)
    {
        return eventType switch
        {
            var t when t.Contains(".failed") => TimelineEventSeverity.Error,
            var t when t.Contains(".error") => TimelineEventSeverity.Error,
            var t when t.Contains(".warning") => TimelineEventSeverity.Warning,
            var t when t.Contains(".critical") => TimelineEventSeverity.Critical,
            _ => TimelineEventSeverity.Info
        };
    }

    private static bool IsTransient(Exception ex)
    {
        return ex is TimeoutException or
            TaskCanceledException or
            System.Net.Http.HttpRequestException or
            System.IO.IOException;
    }
}

/// <summary>
/// Options for timeline event emitter.
/// </summary>
public sealed record TimelineEmitterOptions(
    /// <summary>Maximum retry attempts for transient failures.</summary>
    int MaxRetries,

    /// <summary>Base delay between retries.</summary>
    TimeSpan RetryDelay,

    /// <summary>Whether to include evidence pointers.</summary>
    bool IncludeEvidencePointers)
{
    /// <summary>Default emitter options.</summary>
    public static TimelineEmitterOptions Default => new(
        MaxRetries: 3,
        RetryDelay: TimeSpan.FromSeconds(1),
        IncludeEvidencePointers: true);
}

/// <summary>
/// Sink for timeline events (Kafka, NATS, file, etc.).
/// </summary>
public interface ITimelineEventSink
{
    /// <summary>
    /// Writes a timeline event to the sink.
    /// </summary>
    Task<TimelineSinkWriteResult> WriteAsync(TimelineEvent evt, CancellationToken cancellationToken = default);

    /// <summary>
    /// Writes multiple timeline events to the sink.
    /// </summary>
    Task<TimelineSinkBatchWriteResult> WriteBatchAsync(IEnumerable<TimelineEvent> events, CancellationToken cancellationToken = default);
}

/// <summary>
/// Result of writing to timeline sink.
/// </summary>
public sealed record TimelineSinkWriteResult(
    /// <summary>Whether the event was written successfully.</summary>
    bool Success,

    /// <summary>Assigned sequence number if applicable.</summary>
    long? Sequence,

    /// <summary>Whether the event was deduplicated.</summary>
    bool Deduplicated,

    /// <summary>Error message if write failed.</summary>
    string? Error);

/// <summary>
/// Result of batch writing to timeline sink.
/// </summary>
public sealed record TimelineSinkBatchWriteResult(
    /// <summary>Number of events written successfully.</summary>
    int Written,

    /// <summary>Number of events deduplicated.</summary>
    int Deduplicated,

    /// <summary>Number of events that failed.</summary>
    int Failed);

/// <summary>
/// In-memory timeline event sink for testing.
/// </summary>
public sealed class InMemoryTimelineEventSink : ITimelineEventSink
{
    private readonly List<TimelineEvent> _events = new();
    private readonly HashSet<Guid> _seenIds = new();
    private readonly object _lock = new();
    private long _sequence;

    public Task<TimelineSinkWriteResult> WriteAsync(TimelineEvent evt, CancellationToken cancellationToken = default)
    {
        lock (_lock)
        {
            if (!_seenIds.Add(evt.EventId))
            {
                return Task.FromResult(new TimelineSinkWriteResult(
                    Success: true,
                    Sequence: null,
                    Deduplicated: true,
                    Error: null));
            }

            var seq = ++_sequence;
            var eventWithSeq = evt.WithSequence(seq);
            _events.Add(eventWithSeq);

            return Task.FromResult(new TimelineSinkWriteResult(
                Success: true,
                Sequence: seq,
                Deduplicated: false,
                Error: null));
        }
    }

    public Task<TimelineSinkBatchWriteResult> WriteBatchAsync(IEnumerable<TimelineEvent> events, CancellationToken cancellationToken = default)
    {
        var written = 0;
        var deduplicated = 0;

        lock (_lock)
        {
            foreach (var evt in events)
            {
                if (!_seenIds.Add(evt.EventId))
                {
                    deduplicated++;
                    continue;
                }

                var seq = ++_sequence;
                _events.Add(evt.WithSequence(seq));
                written++;
            }
        }

        return Task.FromResult(new TimelineSinkBatchWriteResult(written, deduplicated, 0));
    }

    /// <summary>Gets all events (for testing).</summary>
    public IReadOnlyList<TimelineEvent> GetEvents()
    {
        lock (_lock) { return _events.ToList(); }
    }

    /// <summary>Gets events for a tenant (for testing).</summary>
    public IReadOnlyList<TimelineEvent> GetEvents(string tenantId)
    {
        lock (_lock) { return _events.Where(e => e.TenantId == tenantId).ToList(); }
    }

    /// <summary>Gets events by type (for testing).</summary>
    public IReadOnlyList<TimelineEvent> GetEventsByType(string eventType)
    {
        lock (_lock) { return _events.Where(e => e.EventType == eventType).ToList(); }
    }

    /// <summary>Clears all events (for testing).</summary>
    public void Clear()
    {
        lock (_lock)
        {
            _events.Clear();
            _seenIds.Clear();
            _sequence = 0;
        }
    }

    /// <summary>Gets the current event count.</summary>
    public int Count
    {
        get { lock (_lock) { return _events.Count; } }
    }
}
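A usage sketch (illustrative only, not part of this commit) wiring `TimelineEventEmitter` to the in-memory sink the way a test might; `NullLogger` is assumed from `Microsoft.Extensions.Logging.Abstractions`:

```csharp
// Illustrative sketch: emit a job event through the in-memory sink.
var sink = new InMemoryTimelineEventSink();
var emitter = new TimelineEventEmitter(
    sink,
    TimeProvider.System,
    NullLogger<TimelineEventEmitter>.Instance);

// ".failed" in the event type maps to TimelineEventSeverity.Error.
var result = await emitter.EmitJobEventAsync(
    tenantId: "tenant-a",
    jobId: Guid.NewGuid(),
    eventType: "job.failed");

// Re-emitting the same event instance is deduplicated by EventId,
// so the sink still holds a single event.
var again = await emitter.EmitAsync(result.Event);
```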
@@ -0,0 +1,559 @@
using System.Security.Cryptography;
using System.Text;
using System.Text.Json;

namespace StellaOps.JobEngine.Core.Domain.Export;

/// <summary>
/// Export job payload containing export-specific parameters.
/// Serialized to JSON and stored in Job.Payload.
/// </summary>
public sealed record ExportJobPayload(
    /// <summary>Export format (e.g., "json", "ndjson", "csv", "spdx", "cyclonedx").</summary>
    string Format,

    /// <summary>Start of time range to export (inclusive).</summary>
    DateTimeOffset? StartTime,

    /// <summary>End of time range to export (exclusive).</summary>
    DateTimeOffset? EndTime,

    /// <summary>Filter by source ID.</summary>
    Guid? SourceId,

    /// <summary>Filter by project ID.</summary>
    string? ProjectId,

    /// <summary>Filter by specific entity IDs.</summary>
    IReadOnlyList<Guid>? EntityIds,

    /// <summary>Maximum entries to export (pagination/limit).</summary>
    int? MaxEntries,

    /// <summary>Whether to include provenance metadata.</summary>
    bool IncludeProvenance,

    /// <summary>Whether to sign the export output.</summary>
    bool SignOutput,

    /// <summary>Compression format (null = none, "gzip", "zstd").</summary>
    string? Compression,

    /// <summary>Destination URI for the export output.</summary>
    string? DestinationUri,

    /// <summary>Callback URL for completion notification.</summary>
    string? CallbackUrl,

    /// <summary>Additional export-specific options.</summary>
    IReadOnlyDictionary<string, string>? Options)
{
    /// <summary>Default export payload with minimal settings.</summary>
    public static ExportJobPayload Default(string format) => new(
        Format: format,
        StartTime: null,
        EndTime: null,
        SourceId: null,
        ProjectId: null,
        EntityIds: null,
        MaxEntries: null,
        IncludeProvenance: true,
        SignOutput: true,
        Compression: null,
        DestinationUri: null,
        CallbackUrl: null,
        Options: null);

    /// <summary>Serializes the payload to JSON.</summary>
    public string ToJson() => JsonSerializer.Serialize(this, JsonOptions);

    /// <summary>Computes SHA-256 digest of the payload.</summary>
    public string ComputeDigest()
    {
        var json = ToJson();
        var bytes = Encoding.UTF8.GetBytes(json);
        var hash = SHA256.HashData(bytes);
        return $"sha256:{Convert.ToHexStringLower(hash)}";
    }

    /// <summary>Deserializes a payload from JSON. Returns null for invalid JSON.</summary>
    public static ExportJobPayload? FromJson(string json)
    {
        try
        {
            return JsonSerializer.Deserialize<ExportJobPayload>(json, JsonOptions);
        }
        catch (JsonException)
        {
            return null;
        }
    }

    private static readonly JsonSerializerOptions JsonOptions = new()
    {
        PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
        WriteIndented = false
    };
}

/// <summary>
/// Export job result containing output metadata.
/// </summary>
public sealed record ExportJobResult(
    /// <summary>Output URI where export is stored.</summary>
    string OutputUri,

    /// <summary>SHA-256 digest of the output.</summary>
    string OutputDigest,

    /// <summary>Output size in bytes.</summary>
    long OutputSizeBytes,

    /// <summary>Number of entries exported.</summary>
    int EntryCount,

    /// <summary>Export format used.</summary>
    string Format,

    /// <summary>Compression applied (if any).</summary>
    string? Compression,

    /// <summary>Provenance attestation URI (if signed).</summary>
    string? ProvenanceUri,

    /// <summary>Start of actual exported time range.</summary>
    DateTimeOffset? ActualStartTime,

    /// <summary>End of actual exported time range.</summary>
    DateTimeOffset? ActualEndTime,

    /// <summary>Export generation timestamp.</summary>
    DateTimeOffset GeneratedAt,

    /// <summary>Duration of export operation in seconds.</summary>
    double DurationSeconds)
{
    /// <summary>Serializes the result to JSON.</summary>
    public string ToJson() => JsonSerializer.Serialize(this, JsonOptions);

    /// <summary>Deserializes a result from JSON.</summary>
    public static ExportJobResult? FromJson(string json) =>
        JsonSerializer.Deserialize<ExportJobResult>(json, JsonOptions);

    private static readonly JsonSerializerOptions JsonOptions = new()
    {
        PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
        WriteIndented = false
    };
}

/// <summary>
/// Export job progress information.
/// </summary>
public sealed record ExportJobProgress(
    /// <summary>Current phase of export.</summary>
    ExportPhase Phase,

    /// <summary>Entries processed so far.</summary>
    int EntriesProcessed,

    /// <summary>Total entries to process (if known).</summary>
    int? TotalEntries,

    /// <summary>Bytes written so far.</summary>
    long BytesWritten,

    /// <summary>Current progress message.</summary>
    string? Message)
{
    /// <summary>Computes progress percentage (0-100).</summary>
    public double? ProgressPercent => TotalEntries > 0
        ? Math.Min(100.0, 100.0 * EntriesProcessed / TotalEntries.Value)
        : null;

    /// <summary>Serializes the progress to JSON.</summary>
    public string ToJson() => JsonSerializer.Serialize(this, JsonOptions);

    /// <summary>Deserializes progress from JSON.</summary>
    public static ExportJobProgress? FromJson(string json) =>
        JsonSerializer.Deserialize<ExportJobProgress>(json, JsonOptions);

    private static readonly JsonSerializerOptions JsonOptions = new()
    {
        PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
        WriteIndented = false
    };
}

/// <summary>
/// Export job phases.
/// </summary>
public enum ExportPhase
{
    /// <summary>Initializing export.</summary>
    Initializing = 0,

    /// <summary>Querying data.</summary>
    Querying = 1,

    /// <summary>Formatting output.</summary>
    Formatting = 2,

    /// <summary>Compressing output.</summary>
    Compressing = 3,

    /// <summary>Signing/attesting output.</summary>
    Signing = 4,

    /// <summary>Uploading to destination.</summary>
    Uploading = 5,

    /// <summary>Finalizing export.</summary>
    Finalizing = 6,

    /// <summary>Export completed.</summary>
    Completed = 7
}

/// <summary>
/// Distribution metadata for export jobs.
/// Tracks where exports are stored, download URLs, and replication status.
/// </summary>
public sealed record ExportDistribution(
    /// <summary>Primary storage location URI.</summary>
    string PrimaryUri,

    /// <summary>Pre-signed download URL (time-limited).</summary>
    string? DownloadUrl,

    /// <summary>Download URL expiration time.</summary>
    DateTimeOffset? DownloadUrlExpiresAt,

    /// <summary>Storage provider (e.g., "s3", "azure-blob", "gcs", "local").</summary>
    string StorageProvider,

    /// <summary>Storage region/location.</summary>
    string? Region,

    /// <summary>Storage tier (e.g., "hot", "cool", "archive").</summary>
    string StorageTier,

    /// <summary>Replication targets with their URIs.</summary>
    IReadOnlyDictionary<string, string>? Replicas,

    /// <summary>Replication status per target.</summary>
    IReadOnlyDictionary<string, ReplicationStatus>? ReplicationStatus,

    /// <summary>Content type of the export.</summary>
    string ContentType,

    /// <summary>Access control list (principals with access).</summary>
    IReadOnlyList<string>? AccessList,

    /// <summary>Whether export is publicly accessible.</summary>
    bool IsPublic,

    /// <summary>Distribution creation timestamp.</summary>
    DateTimeOffset CreatedAt)
{
    /// <summary>Serializes distribution to JSON.</summary>
    public string ToJson() => JsonSerializer.Serialize(this, JsonOptions);

    /// <summary>Deserializes distribution from JSON.</summary>
    public static ExportDistribution? FromJson(string json)
    {
        try
        {
            return JsonSerializer.Deserialize<ExportDistribution>(json, JsonOptions);
        }
        catch (JsonException)
        {
            return null;
        }
    }

    /// <summary>Creates a download URL with expiration.</summary>
    public ExportDistribution WithDownloadUrl(string url, TimeSpan validity, DateTimeOffset timestamp) => this with
    {
        DownloadUrl = url,
        DownloadUrlExpiresAt = timestamp.Add(validity)
    };

    /// <summary>Adds a replication target.</summary>
    public ExportDistribution WithReplica(string target, string uri, ReplicationStatus status)
    {
        var replicas = Replicas is null
            ? new Dictionary<string, string> { [target] = uri }
            : new Dictionary<string, string>(Replicas) { [target] = uri };

        var replicationStatus = ReplicationStatus is null
            ? new Dictionary<string, ReplicationStatus> { [target] = status }
            : new Dictionary<string, ReplicationStatus>(ReplicationStatus) { [target] = status };

        return this with { Replicas = replicas, ReplicationStatus = replicationStatus };
    }

    private static readonly JsonSerializerOptions JsonOptions = new()
    {
        PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
        WriteIndented = false
    };
}

/// <summary>
/// Replication status for a distribution target.
/// </summary>
public enum ReplicationStatus
{
    /// <summary>Replication pending.</summary>
    Pending = 0,

    /// <summary>Replication in progress.</summary>
    InProgress = 1,

    /// <summary>Replication completed successfully.</summary>
    Completed = 2,

    /// <summary>Replication failed.</summary>
    Failed = 3,

    /// <summary>Replication skipped.</summary>
    Skipped = 4
}

/// <summary>
/// Retention policy and timestamps for export jobs.
/// Controls when exports are archived, deleted, or need manual action.
/// </summary>
public sealed record ExportRetention(
    /// <summary>Retention policy name.</summary>
    string PolicyName,

    /// <summary>When the export becomes available for download.</summary>
    DateTimeOffset AvailableAt,

    /// <summary>When the export should be moved to archive tier.</summary>
    DateTimeOffset? ArchiveAt,

    /// <summary>When the export should be deleted.</summary>
    DateTimeOffset? ExpiresAt,

    /// <summary>When the export was actually archived.</summary>
    DateTimeOffset? ArchivedAt,

    /// <summary>When the export was actually deleted.</summary>
    DateTimeOffset? DeletedAt,

    /// <summary>Whether legal hold prevents deletion.</summary>
|
||||
bool LegalHold,
|
||||
|
||||
/// <summary>Legal hold reason (if applicable).</summary>
|
||||
string? LegalHoldReason,
|
||||
|
||||
/// <summary>Whether export requires explicit release before deletion.</summary>
|
||||
bool RequiresRelease,
|
||||
|
||||
/// <summary>Who released the export for deletion (if applicable).</summary>
|
||||
string? ReleasedBy,
|
||||
|
||||
/// <summary>When export was released for deletion.</summary>
|
||||
DateTimeOffset? ReleasedAt,
|
||||
|
||||
/// <summary>Number of times retention was extended.</summary>
|
||||
int ExtensionCount,
|
||||
|
||||
/// <summary>Retention metadata (audit trail, etc.).</summary>
|
||||
IReadOnlyDictionary<string, string>? Metadata)
|
||||
{
|
||||
/// <summary>Default retention policy names.</summary>
|
||||
public static class PolicyNames
|
||||
{
|
||||
public const string Default = "default";
|
||||
public const string Compliance = "compliance";
|
||||
public const string Temporary = "temporary";
|
||||
public const string LongTerm = "long-term";
|
||||
public const string Permanent = "permanent";
|
||||
}
|
||||
|
||||
/// <summary>Default retention periods.</summary>
|
||||
public static class DefaultPeriods
|
||||
{
|
||||
public static readonly TimeSpan Temporary = TimeSpan.FromDays(7);
|
||||
public static readonly TimeSpan Default = TimeSpan.FromDays(30);
|
||||
public static readonly TimeSpan LongTerm = TimeSpan.FromDays(365);
|
||||
public static readonly TimeSpan ArchiveDelay = TimeSpan.FromDays(90);
|
||||
}
|
||||
|
||||
/// <summary>Creates a default retention policy.</summary>
|
||||
public static ExportRetention Default(DateTimeOffset now) => new(
|
||||
PolicyName: PolicyNames.Default,
|
||||
AvailableAt: now,
|
||||
ArchiveAt: now.Add(DefaultPeriods.ArchiveDelay),
|
||||
ExpiresAt: now.Add(DefaultPeriods.Default),
|
||||
ArchivedAt: null,
|
||||
DeletedAt: null,
|
||||
LegalHold: false,
|
||||
LegalHoldReason: null,
|
||||
RequiresRelease: false,
|
||||
ReleasedBy: null,
|
||||
ReleasedAt: null,
|
||||
ExtensionCount: 0,
|
||||
Metadata: null);
|
||||
|
||||
/// <summary>Creates a temporary retention policy.</summary>
|
||||
public static ExportRetention Temporary(DateTimeOffset now) => new(
|
||||
PolicyName: PolicyNames.Temporary,
|
||||
AvailableAt: now,
|
||||
ArchiveAt: null,
|
||||
ExpiresAt: now.Add(DefaultPeriods.Temporary),
|
||||
ArchivedAt: null,
|
||||
DeletedAt: null,
|
||||
LegalHold: false,
|
||||
LegalHoldReason: null,
|
||||
RequiresRelease: false,
|
||||
ReleasedBy: null,
|
||||
ReleasedAt: null,
|
||||
ExtensionCount: 0,
|
||||
Metadata: null);
|
||||
|
||||
/// <summary>Creates a compliance retention policy (requires release).</summary>
|
||||
public static ExportRetention Compliance(DateTimeOffset now, TimeSpan minimumRetention) => new(
|
||||
PolicyName: PolicyNames.Compliance,
|
||||
AvailableAt: now,
|
||||
ArchiveAt: now.Add(DefaultPeriods.ArchiveDelay),
|
||||
ExpiresAt: now.Add(minimumRetention),
|
||||
ArchivedAt: null,
|
||||
DeletedAt: null,
|
||||
LegalHold: false,
|
||||
LegalHoldReason: null,
|
||||
RequiresRelease: true,
|
||||
ReleasedBy: null,
|
||||
ReleasedAt: null,
|
||||
ExtensionCount: 0,
|
||||
Metadata: null);
|
||||
|
||||
/// <summary>Whether the export is expired at the given timestamp.</summary>
|
||||
public bool IsExpiredAt(DateTimeOffset timestamp) => ExpiresAt.HasValue && timestamp >= ExpiresAt.Value && !LegalHold;
|
||||
|
||||
/// <summary>Whether the export should be archived at the given timestamp.</summary>
|
||||
public bool ShouldArchiveAt(DateTimeOffset timestamp) => ArchiveAt.HasValue && timestamp >= ArchiveAt.Value && !ArchivedAt.HasValue;
|
||||
|
||||
/// <summary>Whether the export can be deleted at the given timestamp.</summary>
|
||||
public bool CanDeleteAt(DateTimeOffset timestamp) => IsExpiredAt(timestamp) && (!RequiresRelease || ReleasedAt.HasValue) && !LegalHold;
|
||||
|
||||
/// <summary>Extends the retention period.</summary>
|
||||
public ExportRetention ExtendRetention(TimeSpan extension, DateTimeOffset timestamp, string? reason = null)
|
||||
{
|
||||
var metadata = Metadata is null
|
||||
? new Dictionary<string, string>()
|
||||
: new Dictionary<string, string>(Metadata);
|
||||
|
||||
metadata[$"extension_{ExtensionCount + 1}_at"] = timestamp.ToString("o");
|
||||
if (reason is not null)
|
||||
metadata[$"extension_{ExtensionCount + 1}_reason"] = reason;
|
||||
|
||||
return this with
|
||||
{
|
||||
ExpiresAt = (ExpiresAt ?? timestamp).Add(extension),
|
||||
ArchiveAt = ArchiveAt?.Add(extension),
|
||||
ExtensionCount = ExtensionCount + 1,
|
||||
Metadata = metadata
|
||||
};
|
||||
}
|
||||
|
||||
/// <summary>Places a legal hold on the export.</summary>
|
||||
public ExportRetention PlaceLegalHold(string reason) => this with
|
||||
{
|
||||
LegalHold = true,
|
||||
LegalHoldReason = reason
|
||||
};
|
||||
|
||||
/// <summary>Releases a legal hold.</summary>
|
||||
public ExportRetention ReleaseLegalHold() => this with
|
||||
{
|
||||
LegalHold = false,
|
||||
LegalHoldReason = null
|
||||
};
|
||||
|
||||
/// <summary>Releases the export for deletion.</summary>
|
||||
public ExportRetention Release(string releasedBy, DateTimeOffset timestamp) => this with
|
||||
{
|
||||
ReleasedBy = releasedBy,
|
||||
ReleasedAt = timestamp
|
||||
};
|
||||
|
||||
/// <summary>Marks the export as archived.</summary>
|
||||
public ExportRetention MarkArchived(DateTimeOffset timestamp) => this with
|
||||
{
|
||||
ArchivedAt = timestamp
|
||||
};
|
||||
|
||||
/// <summary>Marks the export as deleted.</summary>
|
||||
public ExportRetention MarkDeleted(DateTimeOffset timestamp) => this with
|
||||
{
|
||||
DeletedAt = timestamp
|
||||
};
|
||||
|
||||
/// <summary>Serializes retention to JSON.</summary>
|
||||
public string ToJson() => JsonSerializer.Serialize(this, JsonOptions);
|
||||
|
||||
/// <summary>Deserializes retention from JSON.</summary>
|
||||
public static ExportRetention? FromJson(string json)
|
||||
{
|
||||
try
|
||||
{
|
||||
return JsonSerializer.Deserialize<ExportRetention>(json, JsonOptions);
|
||||
}
|
||||
catch (JsonException)
|
||||
{
|
||||
return null;
|
||||
}
|
||||
}
|
||||
|
||||
private static readonly JsonSerializerOptions JsonOptions = new()
|
||||
{
|
||||
PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
|
||||
WriteIndented = false
|
||||
};
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// Complete export job state for streaming updates.
|
||||
/// </summary>
|
||||
public sealed record ExportJobState(
|
||||
/// <summary>Job ID.</summary>
|
||||
Guid JobId,
|
||||
|
||||
/// <summary>Export type.</summary>
|
||||
string ExportType,
|
||||
|
||||
/// <summary>Current status.</summary>
|
||||
string Status,
|
||||
|
||||
/// <summary>Current progress.</summary>
|
||||
ExportJobProgress? Progress,
|
||||
|
||||
/// <summary>Job result (when complete).</summary>
|
||||
ExportJobResult? Result,
|
||||
|
||||
/// <summary>Distribution metadata (when complete).</summary>
|
||||
ExportDistribution? Distribution,
|
||||
|
||||
/// <summary>Retention policy.</summary>
|
||||
ExportRetention? Retention,
|
||||
|
||||
/// <summary>Error message (when failed).</summary>
|
||||
string? Error,
|
||||
|
||||
/// <summary>State timestamp.</summary>
|
||||
DateTimeOffset Timestamp)
|
||||
{
|
||||
/// <summary>Serializes state to JSON.</summary>
|
||||
public string ToJson() => JsonSerializer.Serialize(this, JsonOptions);
|
||||
|
||||
private static readonly JsonSerializerOptions JsonOptions = new()
|
||||
{
|
||||
PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
|
||||
WriteIndented = false
|
||||
};
|
||||
}
|
||||
@@ -0,0 +1,183 @@
namespace StellaOps.JobEngine.Core.Domain.Export;

/// <summary>
/// Default policy settings for export jobs.
/// These values are used when creating export job quotas and rate limits.
/// </summary>
public static class ExportJobPolicy
{
    /// <summary>
    /// Default quota settings for export jobs.
    /// Export jobs are typically I/O bound and should be limited to prevent resource exhaustion.
    /// </summary>
    public static class QuotaDefaults
    {
        /// <summary>Maximum concurrent export jobs per tenant.</summary>
        public const int MaxActive = 5;

        /// <summary>Maximum export jobs per hour per tenant.</summary>
        public const int MaxPerHour = 50;

        /// <summary>Token bucket burst capacity.</summary>
        public const int BurstCapacity = 10;

        /// <summary>Token refill rate (tokens per second).</summary>
        public const double RefillRate = 0.5;

        /// <summary>Default priority for export jobs (lower than scan jobs).</summary>
        public const int DefaultPriority = -10;

        /// <summary>Maximum retry attempts for export jobs.</summary>
        public const int MaxAttempts = 3;

        /// <summary>Default lease duration in seconds.</summary>
        public const int DefaultLeaseSeconds = 600; // 10 minutes

        /// <summary>Maximum lease duration in seconds.</summary>
        public const int MaxLeaseSeconds = 3600; // 1 hour

        /// <summary>Recommended heartbeat interval in seconds.</summary>
        public const int RecommendedHeartbeatInterval = 60;
    }

    /// <summary>
    /// Rate limiting settings for export jobs by type.
    /// Different export types may have different resource requirements.
    /// </summary>
    public static class RateLimits
    {
        /// <summary>Ledger export: moderate rate (database-heavy).</summary>
        public static readonly ExportRateLimit Ledger = new(
            MaxConcurrent: 3,
            MaxPerHour: 30,
            EstimatedDurationSeconds: 120);

        /// <summary>SBOM export: higher rate (typically smaller datasets).</summary>
        public static readonly ExportRateLimit Sbom = new(
            MaxConcurrent: 5,
            MaxPerHour: 100,
            EstimatedDurationSeconds: 30);

        /// <summary>VEX export: similar to SBOM.</summary>
        public static readonly ExportRateLimit Vex = new(
            MaxConcurrent: 5,
            MaxPerHour: 100,
            EstimatedDurationSeconds: 30);

        /// <summary>Scan results export: moderate rate.</summary>
        public static readonly ExportRateLimit ScanResults = new(
            MaxConcurrent: 3,
            MaxPerHour: 50,
            EstimatedDurationSeconds: 60);

        /// <summary>Policy evaluation export: moderate rate.</summary>
        public static readonly ExportRateLimit PolicyEvaluation = new(
            MaxConcurrent: 3,
            MaxPerHour: 50,
            EstimatedDurationSeconds: 60);

        /// <summary>Attestation export: lower rate (cryptographic operations).</summary>
        public static readonly ExportRateLimit Attestation = new(
            MaxConcurrent: 2,
            MaxPerHour: 20,
            EstimatedDurationSeconds: 180);

        /// <summary>Portable bundle export: lowest rate (large bundles).</summary>
        public static readonly ExportRateLimit PortableBundle = new(
            MaxConcurrent: 1,
            MaxPerHour: 10,
            EstimatedDurationSeconds: 600);

        /// <summary>Gets the rate limit for a specific export type.</summary>
        public static ExportRateLimit GetForJobType(string jobType) => jobType switch
        {
            ExportJobTypes.Ledger => Ledger,
            ExportJobTypes.Sbom => Sbom,
            ExportJobTypes.Vex => Vex,
            ExportJobTypes.ScanResults => ScanResults,
            ExportJobTypes.PolicyEvaluation => PolicyEvaluation,
            ExportJobTypes.Attestation => Attestation,
            ExportJobTypes.PortableBundle => PortableBundle,
            _ => new ExportRateLimit(MaxConcurrent: 3, MaxPerHour: 30, EstimatedDurationSeconds: 120)
        };
    }

    /// <summary>
    /// Timeout settings for export jobs.
    /// </summary>
    public static class Timeouts
    {
        /// <summary>Maximum time for an export job before it is considered stale.</summary>
        public static readonly TimeSpan MaxJobDuration = TimeSpan.FromHours(2);

        /// <summary>Maximum time to wait for a heartbeat before reclaiming.</summary>
        public static readonly TimeSpan HeartbeatTimeout = TimeSpan.FromMinutes(5);

        /// <summary>Backoff delay after failure before retry.</summary>
        public static readonly TimeSpan RetryBackoff = TimeSpan.FromMinutes(1);

        /// <summary>Maximum backoff delay for exponential retry.</summary>
        public static readonly TimeSpan MaxRetryBackoff = TimeSpan.FromMinutes(30);
    }

    /// <summary>
    /// Creates a default quota for export jobs.
    /// </summary>
    public static Quota CreateDefaultQuota(
        string tenantId,
        string? jobType = null,
        string createdBy = "system")
    {
        throw new NotImplementedException(
            "ExportJobPolicy.CreateDefaultQuota requires a timestamp parameter for deterministic behavior. " +
            "Use the overload with a DateTimeOffset now parameter.");
    }

    /// <summary>
    /// Creates a default quota for export jobs with an explicit timestamp.
    /// </summary>
    public static Quota CreateDefaultQuota(
        string tenantId,
        DateTimeOffset now,
        string? jobType = null,
        string createdBy = "system")
    {
        var rateLimit = jobType is not null && ExportJobTypes.IsExportJob(jobType)
            ? RateLimits.GetForJobType(jobType)
            : new ExportRateLimit(
                QuotaDefaults.MaxActive,
                QuotaDefaults.MaxPerHour,
                QuotaDefaults.DefaultLeaseSeconds);

        return new Quota(
            QuotaId: Guid.NewGuid(),
            TenantId: tenantId,
            JobType: jobType,
            MaxActive: rateLimit.MaxConcurrent,
            MaxPerHour: rateLimit.MaxPerHour,
            BurstCapacity: QuotaDefaults.BurstCapacity,
            RefillRate: QuotaDefaults.RefillRate,
            CurrentTokens: QuotaDefaults.BurstCapacity,
            LastRefillAt: now,
            CurrentActive: 0,
            CurrentHourCount: 0,
            CurrentHourStart: now,
            Paused: false,
            PauseReason: null,
            QuotaTicket: null,
            CreatedAt: now,
            UpdatedAt: now,
            UpdatedBy: createdBy);
    }
}

/// <summary>
/// Rate limit configuration for an export type.
/// </summary>
public sealed record ExportRateLimit(
    /// <summary>Maximum concurrent jobs of this type.</summary>
    int MaxConcurrent,

    /// <summary>Maximum jobs per hour.</summary>
    int MaxPerHour,

    /// <summary>Estimated duration in seconds (for scheduling hints).</summary>
    int EstimatedDurationSeconds);
@@ -0,0 +1,61 @@
namespace StellaOps.JobEngine.Core.Domain.Export;

/// <summary>
/// Standard export job type identifiers.
/// Export jobs follow the pattern "export.{target}", where target is the export destination/format.
/// </summary>
public static class ExportJobTypes
{
    /// <summary>Job type prefix for all export jobs.</summary>
    public const string Prefix = "export.";

    /// <summary>Run ledger export (audit trail, immutable snapshots).</summary>
    public const string Ledger = "export.ledger";

    /// <summary>SBOM export (SPDX, CycloneDX formats).</summary>
    public const string Sbom = "export.sbom";

    /// <summary>VEX document export.</summary>
    public const string Vex = "export.vex";

    /// <summary>Scan results export.</summary>
    public const string ScanResults = "export.scan-results";

    /// <summary>Policy evaluation export.</summary>
    public const string PolicyEvaluation = "export.policy-evaluation";

    /// <summary>Attestation bundle export.</summary>
    public const string Attestation = "export.attestation";

    /// <summary>Portable evidence bundle export (for air-gap transfer).</summary>
    public const string PortableBundle = "export.portable-bundle";

    /// <summary>All known export job types.</summary>
    public static readonly IReadOnlyList<string> All =
    [
        Ledger,
        Sbom,
        Vex,
        ScanResults,
        PolicyEvaluation,
        Attestation,
        PortableBundle
    ];

    /// <summary>Checks whether a job type is an export job.</summary>
    public static bool IsExportJob(string? jobType) =>
        jobType is not null && jobType.StartsWith(Prefix, StringComparison.OrdinalIgnoreCase);

    /// <summary>Gets the export target from a job type (e.g., "ledger" from "export.ledger").</summary>
    public static string? GetExportTarget(string? jobType)
    {
        if (!IsExportJob(jobType))
        {
            return null;
        }

        return jobType!.Length > Prefix.Length
            ? jobType[Prefix.Length..]
            : null;
    }
}
@@ -0,0 +1,536 @@
|
||||
using System.Text.Json;
|
||||
|
||||
namespace StellaOps.JobEngine.Core.Domain.Export;
|
||||
|
||||
/// <summary>
|
||||
/// Represents a scheduled export configuration.
|
||||
/// Exports can be scheduled to run on a cron pattern.
|
||||
/// </summary>
|
||||
public sealed record ExportSchedule(
|
||||
/// <summary>Schedule ID.</summary>
|
||||
Guid ScheduleId,
|
||||
|
||||
/// <summary>Tenant ID.</summary>
|
||||
string TenantId,
|
||||
|
||||
/// <summary>Schedule name for identification.</summary>
|
||||
string Name,
|
||||
|
||||
/// <summary>Schedule description.</summary>
|
||||
string? Description,
|
||||
|
||||
/// <summary>Export type to execute.</summary>
|
||||
string ExportType,
|
||||
|
||||
/// <summary>Cron expression for scheduling (5 or 6 fields).</summary>
|
||||
string CronExpression,
|
||||
|
||||
/// <summary>Timezone for cron evaluation (IANA format).</summary>
|
||||
string Timezone,
|
||||
|
||||
/// <summary>Whether the schedule is enabled.</summary>
|
||||
bool Enabled,
|
||||
|
||||
/// <summary>Export payload template.</summary>
|
||||
ExportJobPayload PayloadTemplate,
|
||||
|
||||
/// <summary>Retention policy to apply to generated exports.</summary>
|
||||
string RetentionPolicy,
|
||||
|
||||
/// <summary>Project ID filter (optional).</summary>
|
||||
string? ProjectId,
|
||||
|
||||
/// <summary>Maximum concurrent exports from this schedule.</summary>
|
||||
int MaxConcurrent,
|
||||
|
||||
/// <summary>Whether to skip if previous run is still executing.</summary>
|
||||
bool SkipIfRunning,
|
||||
|
||||
/// <summary>Last successful run timestamp.</summary>
|
||||
DateTimeOffset? LastRunAt,
|
||||
|
||||
/// <summary>Last run job ID.</summary>
|
||||
Guid? LastJobId,
|
||||
|
||||
/// <summary>Last run status.</summary>
|
||||
string? LastRunStatus,
|
||||
|
||||
/// <summary>Next scheduled run time.</summary>
|
||||
DateTimeOffset? NextRunAt,
|
||||
|
||||
/// <summary>Total runs executed.</summary>
|
||||
long TotalRuns,
|
||||
|
||||
/// <summary>Successful runs count.</summary>
|
||||
long SuccessfulRuns,
|
||||
|
||||
/// <summary>Failed runs count.</summary>
|
||||
long FailedRuns,
|
||||
|
||||
/// <summary>Created timestamp.</summary>
|
||||
DateTimeOffset CreatedAt,
|
||||
|
||||
/// <summary>Last updated timestamp.</summary>
|
||||
DateTimeOffset UpdatedAt,
|
||||
|
||||
/// <summary>Created by user.</summary>
|
||||
string CreatedBy,
|
||||
|
||||
/// <summary>Last updated by user.</summary>
|
||||
string UpdatedBy)
|
||||
{
|
||||
/// <summary>Creates a new export schedule.</summary>
|
||||
public static ExportSchedule Create(
|
||||
string tenantId,
|
||||
string name,
|
||||
string exportType,
|
||||
string cronExpression,
|
||||
ExportJobPayload payloadTemplate,
|
||||
string createdBy,
|
||||
DateTimeOffset timestamp,
|
||||
string? description = null,
|
||||
string timezone = "UTC",
|
||||
string retentionPolicy = "default",
|
||||
string? projectId = null,
|
||||
int maxConcurrent = 1,
|
||||
bool skipIfRunning = true)
|
||||
{
|
||||
return new ExportSchedule(
|
||||
ScheduleId: Guid.NewGuid(),
|
||||
TenantId: tenantId,
|
||||
Name: name,
|
||||
Description: description,
|
||||
ExportType: exportType,
|
||||
CronExpression: cronExpression,
|
||||
Timezone: timezone,
|
||||
Enabled: true,
|
||||
PayloadTemplate: payloadTemplate,
|
||||
RetentionPolicy: retentionPolicy,
|
||||
ProjectId: projectId,
|
||||
MaxConcurrent: maxConcurrent,
|
||||
SkipIfRunning: skipIfRunning,
|
||||
LastRunAt: null,
|
||||
LastJobId: null,
|
||||
LastRunStatus: null,
|
||||
NextRunAt: null,
|
||||
TotalRuns: 0,
|
||||
SuccessfulRuns: 0,
|
||||
FailedRuns: 0,
|
||||
CreatedAt: timestamp,
|
||||
UpdatedAt: timestamp,
|
||||
CreatedBy: createdBy,
|
||||
UpdatedBy: createdBy);
|
||||
}
|
||||
|
||||
/// <summary>Success rate as percentage (0-100).</summary>
|
||||
public double SuccessRate => TotalRuns > 0
|
||||
? 100.0 * SuccessfulRuns / TotalRuns
|
||||
: 0;
|
||||
|
||||
/// <summary>Enables the schedule.</summary>
|
||||
public ExportSchedule Enable(DateTimeOffset timestamp) => this with
|
||||
{
|
||||
Enabled = true,
|
||||
UpdatedAt = timestamp
|
||||
};
|
||||
|
||||
/// <summary>Disables the schedule.</summary>
|
||||
public ExportSchedule Disable(DateTimeOffset timestamp) => this with
|
||||
{
|
||||
Enabled = false,
|
||||
UpdatedAt = timestamp
|
||||
};
|
||||
|
||||
/// <summary>Records a successful run.</summary>
|
||||
public ExportSchedule RecordSuccess(Guid jobId, DateTimeOffset timestamp, DateTimeOffset? nextRun = null) => this with
|
||||
{
|
||||
LastRunAt = timestamp,
|
||||
LastJobId = jobId,
|
||||
LastRunStatus = "completed",
|
||||
NextRunAt = nextRun,
|
||||
TotalRuns = TotalRuns + 1,
|
||||
SuccessfulRuns = SuccessfulRuns + 1,
|
||||
UpdatedAt = timestamp
|
||||
};
|
||||
|
||||
/// <summary>Records a failed run.</summary>
|
||||
public ExportSchedule RecordFailure(Guid jobId, DateTimeOffset timestamp, string? reason = null, DateTimeOffset? nextRun = null) => this with
|
||||
{
|
||||
LastRunAt = timestamp,
|
||||
LastJobId = jobId,
|
||||
LastRunStatus = $"failed: {reason ?? "unknown"}",
|
||||
NextRunAt = nextRun,
|
||||
TotalRuns = TotalRuns + 1,
|
||||
FailedRuns = FailedRuns + 1,
|
||||
UpdatedAt = timestamp
|
||||
};
|
||||
|
||||
/// <summary>Updates the next run time.</summary>
|
||||
public ExportSchedule WithNextRun(DateTimeOffset nextRun, DateTimeOffset timestamp) => this with
|
||||
{
|
||||
NextRunAt = nextRun,
|
||||
UpdatedAt = timestamp
|
||||
};
|
||||
|
||||
/// <summary>Updates the cron expression.</summary>
|
||||
public ExportSchedule WithCron(string cronExpression, string updatedBy, DateTimeOffset timestamp) => this with
|
||||
{
|
||||
CronExpression = cronExpression,
|
||||
UpdatedAt = timestamp,
|
||||
UpdatedBy = updatedBy
|
||||
};
|
||||
|
||||
/// <summary>Updates the payload template.</summary>
|
||||
public ExportSchedule WithPayload(ExportJobPayload payload, string updatedBy, DateTimeOffset timestamp) => this with
|
||||
{
|
||||
PayloadTemplate = payload,
|
||||
UpdatedAt = timestamp,
|
||||
UpdatedBy = updatedBy
|
||||
};
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// Configuration for retention pruning.
|
||||
/// </summary>
|
||||
public sealed record RetentionPruneConfig(
|
||||
/// <summary>Pruning job ID.</summary>
|
||||
Guid PruneId,
|
||||
|
||||
/// <summary>Tenant ID (null for global).</summary>
|
||||
string? TenantId,
|
||||
|
||||
/// <summary>Export type filter (null for all).</summary>
|
||||
string? ExportType,
|
||||
|
||||
/// <summary>Whether pruning is enabled.</summary>
|
||||
bool Enabled,
|
||||
|
||||
/// <summary>Cron expression for prune schedule.</summary>
|
||||
string CronExpression,
|
||||
|
||||
/// <summary>Maximum exports to prune per run.</summary>
|
||||
int BatchSize,
|
||||
|
||||
/// <summary>Whether to archive before deleting.</summary>
|
||||
bool ArchiveBeforeDelete,
|
||||
|
||||
/// <summary>Archive storage provider.</summary>
|
||||
string? ArchiveProvider,
|
||||
|
||||
/// <summary>Whether to notify on prune completion.</summary>
|
||||
bool NotifyOnComplete,
|
||||
|
||||
/// <summary>Notification channel for alerts.</summary>
|
||||
string? NotificationChannel,
|
||||
|
||||
/// <summary>Last prune timestamp.</summary>
|
||||
DateTimeOffset? LastPruneAt,
|
||||
|
||||
/// <summary>Exports pruned in last run.</summary>
|
||||
int LastPruneCount,
|
||||
|
||||
/// <summary>Total exports pruned.</summary>
|
||||
long TotalPruned,
|
||||
|
||||
/// <summary>Created timestamp.</summary>
|
||||
DateTimeOffset CreatedAt,
|
||||
|
||||
/// <summary>Updated timestamp.</summary>
|
||||
DateTimeOffset UpdatedAt)
|
||||
{
|
||||
/// <summary>Default batch size for pruning.</summary>
|
||||
public const int DefaultBatchSize = 100;
|
||||
|
||||
/// <summary>Default cron expression (daily at 2 AM).</summary>
|
||||
public const string DefaultCronExpression = "0 2 * * *";
|
||||
|
||||
/// <summary>Creates a default prune configuration.</summary>
|
||||
public static RetentionPruneConfig Create(
|
||||
DateTimeOffset timestamp,
|
||||
string? tenantId = null,
|
||||
string? exportType = null,
|
||||
string? cronExpression = null,
|
||||
int batchSize = DefaultBatchSize)
|
||||
{
|
||||
return new RetentionPruneConfig(
|
||||
PruneId: Guid.NewGuid(),
|
||||
TenantId: tenantId,
|
||||
ExportType: exportType,
|
||||
Enabled: true,
|
||||
CronExpression: cronExpression ?? DefaultCronExpression,
|
||||
BatchSize: batchSize,
|
||||
ArchiveBeforeDelete: true,
|
||||
ArchiveProvider: null,
|
||||
NotifyOnComplete: false,
|
||||
NotificationChannel: null,
|
||||
LastPruneAt: null,
|
||||
LastPruneCount: 0,
|
||||
TotalPruned: 0,
|
||||
CreatedAt: timestamp,
|
||||
UpdatedAt: timestamp);
|
||||
}
|
||||
|
||||
/// <summary>Records a prune operation.</summary>
|
||||
public RetentionPruneConfig RecordPrune(int count, DateTimeOffset timestamp) => this with
|
||||
{
|
||||
LastPruneAt = timestamp,
|
||||
LastPruneCount = count,
|
||||
TotalPruned = TotalPruned + count,
|
||||
UpdatedAt = timestamp
|
||||
};
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// Export failure alert configuration.
|
||||
/// </summary>
|
||||
public sealed record ExportAlertConfig(
|
||||
/// <summary>Alert configuration ID.</summary>
|
||||
Guid AlertConfigId,
|
||||
|
||||
/// <summary>Tenant ID.</summary>
|
||||
string TenantId,
|
||||
|
||||
/// <summary>Alert name.</summary>
|
||||
string Name,
|
||||
|
||||
/// <summary>Export type filter (null for all).</summary>
|
||||
string? ExportType,
|
||||
|
||||
/// <summary>Whether alerting is enabled.</summary>
|
||||
bool Enabled,
|
||||
|
||||
/// <summary>Minimum consecutive failures to trigger.</summary>
|
||||
int ConsecutiveFailuresThreshold,
|
||||
|
||||
/// <summary>Failure rate threshold (0-100).</summary>
|
||||
double FailureRateThreshold,
|
||||
|
||||
/// <summary>Time window for failure rate calculation.</summary>
|
||||
TimeSpan FailureRateWindow,
|
||||
|
||||
/// <summary>Alert severity.</summary>
|
||||
ExportAlertSeverity Severity,
|
||||
|
||||
/// <summary>Notification channels (comma-separated).</summary>
|
||||
string NotificationChannels,
|
||||
|
||||
/// <summary>Alert cooldown period.</summary>
|
||||
TimeSpan Cooldown,
|
||||
|
||||
/// <summary>Last alert timestamp.</summary>
|
||||
DateTimeOffset? LastAlertAt,
|
||||
|
||||
/// <summary>Total alerts triggered.</summary>
|
||||
long TotalAlerts,
|
||||
|
||||
/// <summary>Created timestamp.</summary>
|
||||
DateTimeOffset CreatedAt,
|
||||
|
||||
/// <summary>Updated timestamp.</summary>
|
||||
DateTimeOffset UpdatedAt)
|
||||
{
|
||||
/// <summary>Creates a default alert configuration.</summary>
|
||||
public static ExportAlertConfig Create(
|
||||
string tenantId,
|
||||
string name,
|
||||
DateTimeOffset timestamp,
|
||||
string? exportType = null,
|
||||
int consecutiveFailuresThreshold = 3,
|
||||
double failureRateThreshold = 50.0,
|
||||
ExportAlertSeverity severity = ExportAlertSeverity.Warning)
|
||||
{
|
||||
return new ExportAlertConfig(
|
||||
AlertConfigId: Guid.NewGuid(),
|
||||
TenantId: tenantId,
|
||||
Name: name,
|
||||
ExportType: exportType,
|
||||
Enabled: true,
|
||||
ConsecutiveFailuresThreshold: consecutiveFailuresThreshold,
|
||||
FailureRateThreshold: failureRateThreshold,
|
||||
FailureRateWindow: TimeSpan.FromHours(1),
|
||||
Severity: severity,
|
||||
NotificationChannels: "email",
|
||||
Cooldown: TimeSpan.FromMinutes(15),
|
||||
LastAlertAt: null,
|
||||
TotalAlerts: 0,
|
||||
CreatedAt: timestamp,
|
||||
UpdatedAt: timestamp);
|
||||
}
|
||||
|
||||
/// <summary>Whether an alert can be triggered (respects cooldown).</summary>
|
||||
public bool CanAlertAt(DateTimeOffset timestamp) => !LastAlertAt.HasValue ||
|
||||
timestamp >= LastAlertAt.Value.Add(Cooldown);
|
||||
|
||||
/// <summary>Records an alert.</summary>
|
||||
public ExportAlertConfig RecordAlert(DateTimeOffset timestamp) => this with
|
||||
{
|
||||
LastAlertAt = timestamp,
|
||||
TotalAlerts = TotalAlerts + 1,
|
||||
UpdatedAt = timestamp
|
||||
};
|
||||
}
/// <summary>
/// Export alert severity levels.
/// </summary>
public enum ExportAlertSeverity
{
    /// <summary>Informational.</summary>
    Info = 0,

    /// <summary>Warning - attention needed.</summary>
    Warning = 1,

    /// <summary>Error - action required.</summary>
    Error = 2,

    /// <summary>Critical - immediate action.</summary>
    Critical = 3
}

/// <summary>
/// Export failure alert instance.
/// </summary>
public sealed record ExportAlert(
    /// <summary>Alert ID.</summary>
    Guid AlertId,

    /// <summary>Alert configuration ID.</summary>
    Guid AlertConfigId,

    /// <summary>Tenant ID.</summary>
    string TenantId,

    /// <summary>Export type.</summary>
    string ExportType,

    /// <summary>Alert severity.</summary>
    ExportAlertSeverity Severity,

    /// <summary>Alert message.</summary>
    string Message,

    /// <summary>Failed job IDs.</summary>
    IReadOnlyList<Guid> FailedJobIds,

    /// <summary>Consecutive failure count.</summary>
    int ConsecutiveFailures,

    /// <summary>Current failure rate.</summary>
    double FailureRate,

    /// <summary>Alert timestamp.</summary>
    DateTimeOffset TriggeredAt,

    /// <summary>Acknowledged timestamp.</summary>
    DateTimeOffset? AcknowledgedAt,

    /// <summary>Acknowledged by user.</summary>
    string? AcknowledgedBy,

    /// <summary>Resolved timestamp.</summary>
    DateTimeOffset? ResolvedAt,

    /// <summary>Resolution notes.</summary>
    string? ResolutionNotes)
{
    /// <summary>Creates a new alert for consecutive failures.</summary>
    public static ExportAlert CreateForConsecutiveFailures(
        Guid alertConfigId,
        string tenantId,
        string exportType,
        ExportAlertSeverity severity,
        IReadOnlyList<Guid> failedJobIds,
        int consecutiveFailures,
        DateTimeOffset timestamp)
    {
        return new ExportAlert(
            AlertId: Guid.NewGuid(),
            AlertConfigId: alertConfigId,
            TenantId: tenantId,
            ExportType: exportType,
            Severity: severity,
            Message: $"Export job {exportType} has failed {consecutiveFailures} consecutive times",
            FailedJobIds: failedJobIds,
            ConsecutiveFailures: consecutiveFailures,
            FailureRate: 0,
            TriggeredAt: timestamp,
            AcknowledgedAt: null,
            AcknowledgedBy: null,
            ResolvedAt: null,
            ResolutionNotes: null);
    }

    /// <summary>Creates a new alert for high failure rate.</summary>
    public static ExportAlert CreateForHighFailureRate(
        Guid alertConfigId,
        string tenantId,
        string exportType,
        ExportAlertSeverity severity,
        double failureRate,
        IReadOnlyList<Guid> recentFailedJobIds,
        DateTimeOffset timestamp)
    {
        return new ExportAlert(
            AlertId: Guid.NewGuid(),
            AlertConfigId: alertConfigId,
            TenantId: tenantId,
            ExportType: exportType,
            Severity: severity,
            Message: FormattableString.Invariant($"Export job {exportType} failure rate is {failureRate:F1}%"),
            FailedJobIds: recentFailedJobIds,
            ConsecutiveFailures: 0,
            FailureRate: failureRate,
            TriggeredAt: timestamp,
            AcknowledgedAt: null,
            AcknowledgedBy: null,
            ResolvedAt: null,
            ResolutionNotes: null);
    }

    /// <summary>Acknowledges the alert.</summary>
    public ExportAlert Acknowledge(string acknowledgedBy, DateTimeOffset timestamp) => this with
    {
        AcknowledgedAt = timestamp,
        AcknowledgedBy = acknowledgedBy
    };

    /// <summary>Resolves the alert.</summary>
    public ExportAlert Resolve(DateTimeOffset timestamp, string? notes = null) => this with
    {
        ResolvedAt = timestamp,
        ResolutionNotes = notes
    };

    /// <summary>Whether the alert is active (not resolved).</summary>
    public bool IsActive => ResolvedAt is null;
}

/// <summary>
/// Result of a retention prune operation.
/// </summary>
public sealed record RetentionPruneResult(
    /// <summary>Number of exports archived.</summary>
    int ArchivedCount,

    /// <summary>Number of exports deleted.</summary>
    int DeletedCount,

    /// <summary>Number of exports skipped (legal hold, etc.).</summary>
    int SkippedCount,

    /// <summary>Errors encountered.</summary>
    IReadOnlyList<string> Errors,

    /// <summary>Duration of prune operation.</summary>
    TimeSpan Duration)
{
    /// <summary>Total exports processed.</summary>
    public int TotalProcessed => ArchivedCount + DeletedCount + SkippedCount;

    /// <summary>Whether any errors occurred.</summary>
    public bool HasErrors => Errors.Count > 0;

    /// <summary>Empty result.</summary>
    public static RetentionPruneResult Empty => new(0, 0, 0, [], TimeSpan.Zero);
}
@@ -0,0 +1,74 @@
namespace StellaOps.JobEngine.Core.Domain;

/// <summary>
/// Represents the first meaningful signal for a job/run.
/// </summary>
public sealed record FirstSignal
{
    /// <summary>Schema version; defaults to "1.0".</summary>
    public string Version { get; init; } = "1.0";

    public required string SignalId { get; init; }
    public required Guid JobId { get; init; }
    public required DateTimeOffset Timestamp { get; init; }
    public required FirstSignalKind Kind { get; init; }
    public required FirstSignalPhase Phase { get; init; }
    public required FirstSignalScope Scope { get; init; }
    public required string Summary { get; init; }
    public int? EtaSeconds { get; init; }
    public LastKnownOutcome? LastKnownOutcome { get; init; }
    public IReadOnlyList<NextAction>? NextActions { get; init; }
    public required FirstSignalDiagnostics Diagnostics { get; init; }
}

public enum FirstSignalKind
{
    Queued,
    Started,
    Phase,
    Blocked,
    Failed,
    Succeeded,
    Canceled,
    Unavailable
}

public enum FirstSignalPhase
{
    Resolve,
    Fetch,
    Restore,
    Analyze,
    Policy,
    Report,
    Unknown
}

public sealed record FirstSignalScope
{
    public required string Type { get; init; } // "repo" | "image" | "artifact"
    public required string Id { get; init; }
}

public sealed record LastKnownOutcome
{
    public required string SignatureId { get; init; }
    public string? ErrorCode { get; init; }
    public required string Token { get; init; }
    public string? Excerpt { get; init; }
    public required string Confidence { get; init; } // "low" | "medium" | "high"
    public required DateTimeOffset FirstSeenAt { get; init; }
    public required int HitCount { get; init; }
}

public sealed record NextAction
{
    public required string Type { get; init; } // "open_logs" | "open_job" | "docs" | "retry" | "cli_command"
    public required string Label { get; init; }
    public required string Target { get; init; }
}

public sealed record FirstSignalDiagnostics
{
    public required bool CacheHit { get; init; }
    public required string Source { get; init; } // "snapshot" | "failure_index" | "cold_start"
    public required string CorrelationId { get; init; }
}
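For a sense of the wire shape, here is a minimal stand-in serialized with the camelCase naming policy used by the mirror payloads in this module (the record types below are illustrative copies, not `FirstSignal` itself, and the assumption that FirstSignal uses the same camelCase options is mine):

```csharp
using System.Text.Json;

// Hypothetical stand-ins mirroring a slice of the FirstSignal shape.
public sealed record DemoScope(string Type, string Id);
public sealed record DemoSignal(string Version, string SignalId, string Summary, DemoScope Scope);

public static class FirstSignalJsonDemo
{
    private static readonly JsonSerializerOptions Options = new()
    {
        PropertyNamingPolicy = JsonNamingPolicy.CamelCase
    };

    public static string Serialize(DemoSignal signal) => JsonSerializer.Serialize(signal, Options);
}
```

With these options, `Serialize(new DemoSignal("1.0", "sig-1", "queued", new DemoScope("repo", "r-1")))` produces `{"version":"1.0","signalId":"sig-1","summary":"queued","scope":{"type":"repo","id":"r-1"}}` — property names lower-camelCased, nested records serialized in place.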
@@ -0,0 +1,69 @@
namespace StellaOps.JobEngine.Core.Domain;

/// <summary>
/// Represents an operational incident triggered by threshold breaches.
/// Incidents are generated when failure rates exceed configured limits.
/// </summary>
public sealed record Incident(
    /// <summary>Unique incident identifier.</summary>
    Guid IncidentId,

    /// <summary>Tenant affected by this incident.</summary>
    string TenantId,

    /// <summary>Incident type (e.g., "failure_rate", "quota_exhausted", "circuit_open").</summary>
    string IncidentType,

    /// <summary>Incident severity (e.g., "warning", "critical").</summary>
    string Severity,

    /// <summary>Affected job type (if applicable).</summary>
    string? JobType,

    /// <summary>Affected source (if applicable).</summary>
    Guid? SourceId,

    /// <summary>Human-readable incident title.</summary>
    string Title,

    /// <summary>Detailed incident description.</summary>
    string Description,

    /// <summary>Current incident status.</summary>
    IncidentStatus Status,

    /// <summary>When the incident was created.</summary>
    DateTimeOffset CreatedAt,

    /// <summary>When the incident was acknowledged.</summary>
    DateTimeOffset? AcknowledgedAt,

    /// <summary>Actor who acknowledged the incident.</summary>
    string? AcknowledgedBy,

    /// <summary>When the incident was resolved.</summary>
    DateTimeOffset? ResolvedAt,

    /// <summary>Actor who resolved the incident.</summary>
    string? ResolvedBy,

    /// <summary>Resolution notes.</summary>
    string? ResolutionNotes,

    /// <summary>Optional metadata JSON blob.</summary>
    string? Metadata);

/// <summary>
/// Incident lifecycle states.
/// </summary>
public enum IncidentStatus
{
    /// <summary>Incident is open and unacknowledged.</summary>
    Open = 0,

    /// <summary>Incident acknowledged by operator.</summary>
    Acknowledged = 1,

    /// <summary>Incident resolved.</summary>
    Resolved = 2
}
@@ -0,0 +1,81 @@
namespace StellaOps.JobEngine.Core.Domain;

/// <summary>
/// Represents a unit of work to be executed by a worker.
/// Jobs are scheduled, leased to workers, and tracked through completion.
/// </summary>
public sealed record Job(
    /// <summary>Unique job identifier.</summary>
    Guid JobId,

    /// <summary>Tenant owning this job.</summary>
    string TenantId,

    /// <summary>Optional project scope within tenant.</summary>
    string? ProjectId,

    /// <summary>Run this job belongs to (if any).</summary>
    Guid? RunId,

    /// <summary>Job type (e.g., "scan.image", "advisory.nvd", "export.sbom").</summary>
    string JobType,

    /// <summary>Current job status.</summary>
    JobStatus Status,

    /// <summary>Priority (higher = more urgent). Default 0.</summary>
    int Priority,

    /// <summary>Current attempt number (1-based).</summary>
    int Attempt,

    /// <summary>Maximum retry attempts.</summary>
    int MaxAttempts,

    /// <summary>SHA-256 digest of the payload for determinism verification.</summary>
    string PayloadDigest,

    /// <summary>Job payload JSON (inputs, parameters).</summary>
    string Payload,

    /// <summary>Idempotency key for deduplication.</summary>
    string IdempotencyKey,

    /// <summary>Correlation ID for distributed tracing.</summary>
    string? CorrelationId,

    /// <summary>Current lease ID (if leased).</summary>
    Guid? LeaseId,

    /// <summary>Worker holding the lease (if leased).</summary>
    string? WorkerId,

    /// <summary>Task runner ID executing the job (if applicable).</summary>
    string? TaskRunnerId,

    /// <summary>Lease expiration time.</summary>
    DateTimeOffset? LeaseUntil,

    /// <summary>When the job was created.</summary>
    DateTimeOffset CreatedAt,

    /// <summary>When the job was scheduled (quota cleared).</summary>
    DateTimeOffset? ScheduledAt,

    /// <summary>When the job was leased to a worker.</summary>
    DateTimeOffset? LeasedAt,

    /// <summary>When the job completed (terminal state).</summary>
    DateTimeOffset? CompletedAt,

    /// <summary>Earliest time the job can be scheduled (for backoff).</summary>
    DateTimeOffset? NotBefore,

    /// <summary>Terminal status reason (failure message, cancel reason, etc.).</summary>
    string? Reason,

    /// <summary>ID of the original job if this is a replay.</summary>
    Guid? ReplayOf,

    /// <summary>Actor who created/submitted the job.</summary>
    string CreatedBy);
@@ -0,0 +1,48 @@
namespace StellaOps.JobEngine.Core.Domain;

/// <summary>
/// Represents an immutable history entry for job state changes.
/// Provides audit trail for all job lifecycle transitions.
/// </summary>
public sealed record JobHistory(
    /// <summary>Unique history entry identifier.</summary>
    Guid HistoryId,

    /// <summary>Tenant owning this entry.</summary>
    string TenantId,

    /// <summary>Job this history entry belongs to.</summary>
    Guid JobId,

    /// <summary>Sequence number within the job's history (1-based).</summary>
    int SequenceNo,

    /// <summary>Previous job status.</summary>
    JobStatus? FromStatus,

    /// <summary>New job status.</summary>
    JobStatus ToStatus,

    /// <summary>Attempt number at time of transition.</summary>
    int Attempt,

    /// <summary>Lease ID (if applicable).</summary>
    Guid? LeaseId,

    /// <summary>Worker ID (if applicable).</summary>
    string? WorkerId,

    /// <summary>Reason for the transition.</summary>
    string? Reason,

    /// <summary>When this transition occurred.</summary>
    DateTimeOffset OccurredAt,

    /// <summary>When this entry was recorded.</summary>
    DateTimeOffset RecordedAt,

    /// <summary>Actor who caused this transition.</summary>
    string ActorId,

    /// <summary>Actor type (system, operator, worker).</summary>
    string ActorType);
@@ -0,0 +1,30 @@
namespace StellaOps.JobEngine.Core.Domain;

/// <summary>
/// Job lifecycle states. Transitions follow the state machine:
/// Pending → Scheduled → Leased → (Succeeded | Failed | Canceled | TimedOut)
/// Failed jobs may transition to Pending via replay.
/// </summary>
public enum JobStatus
{
    /// <summary>Job enqueued but not yet scheduled (e.g., quota exceeded).</summary>
    Pending = 0,

    /// <summary>Job scheduled and awaiting worker lease.</summary>
    Scheduled = 1,

    /// <summary>Job leased to a worker for execution.</summary>
    Leased = 2,

    /// <summary>Job completed successfully.</summary>
    Succeeded = 3,

    /// <summary>Job failed after exhausting retries.</summary>
    Failed = 4,

    /// <summary>Job canceled by operator or system.</summary>
    Canceled = 5,

    /// <summary>Job timed out (lease expired without completion).</summary>
    TimedOut = 6
}
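The transition rules in the summary above can be captured as a small validity table. A standalone sketch (the helper is hypothetical; status names are strings here purely to avoid redeclaring the enum):

```csharp
using System;
using System.Collections.Generic;

public static class JobTransitionRules
{
    // Allowed transitions per the documented state machine:
    // Pending → Scheduled → Leased → terminal; Failed → Pending via replay.
    private static readonly Dictionary<string, string[]> Allowed = new(StringComparer.Ordinal)
    {
        ["Pending"] = ["Scheduled"],
        ["Scheduled"] = ["Leased"],
        ["Leased"] = ["Succeeded", "Failed", "Canceled", "TimedOut"],
        ["Failed"] = ["Pending"], // replay path back into the queue
    };

    public static bool IsValidTransition(string from, string to)
        => Allowed.TryGetValue(from, out var targets) && Array.IndexOf(targets, to) >= 0;
}
```

A table like this makes illegal transitions (e.g., a terminal `Succeeded` job being re-leased) cheap to reject before writing a `JobHistory` entry.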
@@ -0,0 +1,427 @@
using System.Security.Cryptography;
using System.Text;
using System.Text.Json;
using StellaOps.JobEngine.Core.Domain.AirGap;

namespace StellaOps.JobEngine.Core.Domain.Mirror;

/// <summary>
/// Mirror bundle job payload containing bundle-specific parameters.
/// Serialized to JSON and stored in Job.Payload.
/// Per ORCH-AIRGAP-57-001.
/// </summary>
public sealed record MirrorBundlePayload(
    /// <summary>Domains to include in the bundle (vex-advisories, vulnerability-feeds, etc.).</summary>
    IReadOnlyList<string> Domains,

    /// <summary>Start of time range to include (inclusive).</summary>
    DateTimeOffset? StartTime,

    /// <summary>End of time range to include (exclusive).</summary>
    DateTimeOffset? EndTime,

    /// <summary>Target environment identifier for the bundle.</summary>
    string? TargetEnvironment,

    /// <summary>Maximum staleness allowed in bundle data (seconds).</summary>
    int? MaxStalenessSeconds,

    /// <summary>Whether to include full provenance chain.</summary>
    bool IncludeProvenance,

    /// <summary>Whether to include audit trail entries.</summary>
    bool IncludeAuditTrail,

    /// <summary>Whether to sign the bundle with DSSE.</summary>
    bool SignBundle,

    /// <summary>Signing key identifier.</summary>
    string? SigningKeyId,

    /// <summary>Compression format (null = none, "gzip", "zstd").</summary>
    string? Compression,

    /// <summary>Destination URI for the bundle output.</summary>
    string? DestinationUri,

    /// <summary>Whether to include time anchor for staleness validation.</summary>
    bool IncludeTimeAnchor,

    /// <summary>Additional bundle-specific options.</summary>
    IReadOnlyDictionary<string, string>? Options)
{
    /// <summary>Default bundle payload with minimal settings.</summary>
    public static MirrorBundlePayload Default(IReadOnlyList<string> domains) => new(
        Domains: domains,
        StartTime: null,
        EndTime: null,
        TargetEnvironment: null,
        MaxStalenessSeconds: null,
        IncludeProvenance: true,
        IncludeAuditTrail: true,
        SignBundle: true,
        SigningKeyId: null,
        Compression: "gzip",
        DestinationUri: null,
        IncludeTimeAnchor: true,
        Options: null);

    /// <summary>Serializes the payload to JSON.</summary>
    public string ToJson() => JsonSerializer.Serialize(this, JsonOptions);

    /// <summary>Computes SHA-256 digest of the payload.</summary>
    public string ComputeDigest()
    {
        var json = ToJson();
        var bytes = Encoding.UTF8.GetBytes(json);
        var hash = SHA256.HashData(bytes);
        return $"sha256:{Convert.ToHexStringLower(hash)}";
    }

    /// <summary>Deserializes a payload from JSON. Returns null for invalid JSON.</summary>
    public static MirrorBundlePayload? FromJson(string json)
    {
        try
        {
            return JsonSerializer.Deserialize<MirrorBundlePayload>(json, JsonOptions);
        }
        catch (JsonException)
        {
            return null;
        }
    }

    private static readonly JsonSerializerOptions JsonOptions = new()
    {
        PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
        WriteIndented = false
    };
}

/// <summary>
/// Mirror bundle job result containing output metadata and provenance.
/// Per ORCH-AIRGAP-57-001.
/// </summary>
public sealed record MirrorBundleResult(
    /// <summary>Output URI where bundle is stored.</summary>
    string OutputUri,

    /// <summary>SHA-256 digest of the bundle.</summary>
    string BundleDigest,

    /// <summary>SHA-256 digest of the bundle manifest.</summary>
    string ManifestDigest,

    /// <summary>Bundle size in bytes.</summary>
    long BundleSizeBytes,

    /// <summary>Domains included in the bundle.</summary>
    IReadOnlyList<string> IncludedDomains,

    /// <summary>Per-domain export records.</summary>
    IReadOnlyList<ExportRecord> Exports,

    /// <summary>Provenance attestation URI (if signed).</summary>
    string? ProvenanceUri,

    /// <summary>Audit trail URI (if included).</summary>
    string? AuditTrailUri,

    /// <summary>Audit trail entry count.</summary>
    int? AuditEntryCount,

    /// <summary>Time anchor included in bundle.</summary>
    TimeAnchor? TimeAnchor,

    /// <summary>Compression applied.</summary>
    string? Compression,

    /// <summary>Source environment identifier.</summary>
    string SourceEnvironment,

    /// <summary>Target environment identifier (if specified).</summary>
    string? TargetEnvironment,

    /// <summary>Bundle generation timestamp.</summary>
    DateTimeOffset GeneratedAt,

    /// <summary>Duration of bundle creation in seconds.</summary>
    double DurationSeconds,

    /// <summary>DSSE signature (if signed).</summary>
    MirrorBundleSignature? Signature)
{
    /// <summary>Serializes the result to JSON.</summary>
    public string ToJson() => JsonSerializer.Serialize(this, JsonOptions);

    /// <summary>Deserializes a result from JSON.</summary>
    public static MirrorBundleResult? FromJson(string json) =>
        JsonSerializer.Deserialize<MirrorBundleResult>(json, JsonOptions);

    private static readonly JsonSerializerOptions JsonOptions = new()
    {
        PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
        WriteIndented = false
    };
}

/// <summary>
/// DSSE signature for a mirror bundle.
/// </summary>
public sealed record MirrorBundleSignature(
    /// <summary>Signature algorithm (e.g., "ECDSA-P256-SHA256").</summary>
    string Algorithm,

    /// <summary>Signing key identifier.</summary>
    string KeyId,

    /// <summary>Signature value (base64).</summary>
    string SignatureValue,

    /// <summary>Signed timestamp.</summary>
    DateTimeOffset SignedAt,

    /// <summary>DSSE payload type.</summary>
    string PayloadType,

    /// <summary>URI to full DSSE envelope.</summary>
    string? EnvelopeUri);

/// <summary>
/// Audit trail record included in mirror bundle.
/// Per ORCH-AIRGAP-57-001.
/// </summary>
public sealed record MirrorAuditEntry(
    /// <summary>Audit entry ID.</summary>
    Guid EntryId,

    /// <summary>Event type.</summary>
    string EventType,

    /// <summary>Event timestamp.</summary>
    DateTimeOffset Timestamp,

    /// <summary>Actor who triggered the event.</summary>
    string? Actor,

    /// <summary>Affected domain.</summary>
    string? DomainId,

    /// <summary>Affected entity ID.</summary>
    Guid? EntityId,

    /// <summary>Event details.</summary>
    string? Details,

    /// <summary>Content hash for integrity verification.</summary>
    string ContentHash,

    /// <summary>Correlation ID for related events.</summary>
    string? CorrelationId)
{
    /// <summary>Computes SHA-256 digest of the entry for verification.</summary>
    public string ComputeDigest()
    {
        var canonical = $"{EntryId}|{EventType}|{Timestamp:o}|{Actor ?? ""}|{DomainId ?? ""}|{EntityId?.ToString() ?? ""}|{Details ?? ""}|{CorrelationId ?? ""}";
        var bytes = Encoding.UTF8.GetBytes(canonical);
        var hash = SHA256.HashData(bytes);
        return $"sha256:{Convert.ToHexStringLower(hash)}";
    }
}

/// <summary>
/// Mirror bundle job progress information.
/// </summary>
public sealed record MirrorBundleProgress(
    /// <summary>Current phase of bundle creation.</summary>
    MirrorPhase Phase,

    /// <summary>Domains processed so far.</summary>
    int DomainsProcessed,

    /// <summary>Total domains to process.</summary>
    int TotalDomains,

    /// <summary>Records processed so far.</summary>
    int RecordsProcessed,

    /// <summary>Bytes written so far.</summary>
    long BytesWritten,

    /// <summary>Audit entries collected.</summary>
    int AuditEntriesCollected,

    /// <summary>Current progress message.</summary>
    string? Message)
{
    /// <summary>Computes progress percentage (0-100).</summary>
    public double? ProgressPercent => TotalDomains > 0
        ? Math.Min(100.0, 100.0 * DomainsProcessed / TotalDomains)
        : null;

    /// <summary>Serializes the progress to JSON.</summary>
    public string ToJson() => JsonSerializer.Serialize(this, JsonOptions);

    /// <summary>Deserializes progress from JSON.</summary>
    public static MirrorBundleProgress? FromJson(string json) =>
        JsonSerializer.Deserialize<MirrorBundleProgress>(json, JsonOptions);

    private static readonly JsonSerializerOptions JsonOptions = new()
    {
        PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
        WriteIndented = false
    };
}

/// <summary>
/// Mirror bundle job phases.
/// </summary>
public enum MirrorPhase
{
    /// <summary>Initializing bundle creation.</summary>
    Initializing = 0,

    /// <summary>Validating staleness requirements.</summary>
    ValidatingStaleness = 1,

    /// <summary>Collecting domain data.</summary>
    CollectingDomainData = 2,

    /// <summary>Collecting audit trail.</summary>
    CollectingAuditTrail = 3,

    /// <summary>Generating provenance.</summary>
    GeneratingProvenance = 4,

    /// <summary>Creating time anchor.</summary>
    CreatingTimeAnchor = 5,

    /// <summary>Compressing bundle.</summary>
    Compressing = 6,

    /// <summary>Signing bundle with DSSE.</summary>
    Signing = 7,

    /// <summary>Uploading to destination.</summary>
    Uploading = 8,

    /// <summary>Finalizing bundle.</summary>
    Finalizing = 9,

    /// <summary>Bundle creation completed.</summary>
    Completed = 10
}

/// <summary>
/// Manifest for a mirror bundle describing its contents.
/// </summary>
public sealed record MirrorBundleManifest(
    /// <summary>Bundle identifier.</summary>
    Guid BundleId,

    /// <summary>Manifest schema version.</summary>
    string SchemaVersion,

    /// <summary>Source environment identifier.</summary>
    string SourceEnvironment,

    /// <summary>Target environment identifier (if specified).</summary>
    string? TargetEnvironment,

    /// <summary>Bundle creation timestamp.</summary>
    DateTimeOffset CreatedAt,

    /// <summary>Domains included in the bundle.</summary>
    IReadOnlyList<MirrorDomainEntry> Domains,

    /// <summary>Time anchor for staleness validation.</summary>
    TimeAnchor? TimeAnchor,

    /// <summary>Provenance record.</summary>
    BundleProvenance Provenance,

    /// <summary>Audit trail summary.</summary>
    MirrorAuditSummary? AuditSummary,

    /// <summary>Bundle metadata.</summary>
    IReadOnlyDictionary<string, string>? Metadata)
{
    /// <summary>Current manifest schema version.</summary>
    public const string CurrentSchemaVersion = "1.0.0";

    /// <summary>Serializes the manifest to JSON.</summary>
    public string ToJson() => JsonSerializer.Serialize(this, JsonOptions);

    /// <summary>Computes SHA-256 digest of the manifest.</summary>
    public string ComputeDigest()
    {
        var json = ToJson();
        var bytes = Encoding.UTF8.GetBytes(json);
        var hash = SHA256.HashData(bytes);
        return $"sha256:{Convert.ToHexStringLower(hash)}";
    }

    /// <summary>Deserializes a manifest from JSON.</summary>
    public static MirrorBundleManifest? FromJson(string json) =>
        JsonSerializer.Deserialize<MirrorBundleManifest>(json, JsonOptions);

    private static readonly JsonSerializerOptions JsonOptions = new()
    {
        PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
        WriteIndented = false
    };
}

/// <summary>
/// Domain entry in a mirror bundle manifest.
/// </summary>
public sealed record MirrorDomainEntry(
    /// <summary>Domain identifier.</summary>
    string DomainId,

    /// <summary>Export format.</summary>
    ExportFormat Format,

    /// <summary>Export file path within bundle.</summary>
    string FilePath,

    /// <summary>Export digest.</summary>
    string Digest,

    /// <summary>Export size in bytes.</summary>
    long SizeBytes,

    /// <summary>Record count in export.</summary>
    int RecordCount,

    /// <summary>Source timestamp of the data.</summary>
    DateTimeOffset SourceTimestamp,

    /// <summary>Staleness at bundle creation time (seconds).</summary>
    int StalenessSeconds);

/// <summary>
/// Summary of audit trail included in mirror bundle.
/// </summary>
public sealed record MirrorAuditSummary(
    /// <summary>Total audit entries in bundle.</summary>
    int TotalEntries,

    /// <summary>Audit trail file path within bundle.</summary>
    string FilePath,

    /// <summary>Audit trail digest.</summary>
    string Digest,

    /// <summary>Audit trail size in bytes.</summary>
    long SizeBytes,

    /// <summary>Earliest audit entry timestamp.</summary>
    DateTimeOffset EarliestEntry,

    /// <summary>Latest audit entry timestamp.</summary>
    DateTimeOffset LatestEntry,

    /// <summary>Event type counts.</summary>
    IReadOnlyDictionary<string, int> EventTypeCounts);
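The `ComputeDigest` methods above all follow the same pattern: canonical UTF-8 bytes hashed with SHA-256 and prefixed with `sha256:`. A portable sketch of that pattern (using `Convert.ToHexString(...).ToLowerInvariant()` in place of the newer `Convert.ToHexStringLower` the code above relies on):

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class DigestDemo
{
    // Canonical string → lowercase hex SHA-256 with the "sha256:" prefix.
    public static string ComputeDigest(string canonical)
    {
        var hash = SHA256.HashData(Encoding.UTF8.GetBytes(canonical));
        return $"sha256:{Convert.ToHexString(hash).ToLowerInvariant()}";
    }
}
```

Verification recomputes the digest from the stored fields and compares it to `ContentHash`; any change to a field yields a different digest. For example, `ComputeDigest("abc")` is `sha256:ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad`.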
@@ -0,0 +1,54 @@
namespace StellaOps.JobEngine.Core.Domain.Mirror;

/// <summary>
/// Standard mirror job type identifiers for air-gap bundle operations.
/// Mirror jobs follow the pattern "mirror.{operation}" where operation is the mirror action.
/// Per ORCH-AIRGAP-57-001.
/// </summary>
public static class MirrorJobTypes
{
    /// <summary>Job type prefix for all mirror jobs.</summary>
    public const string Prefix = "mirror.";

    /// <summary>Bundle creation for air-gap export (creates portable bundle with provenance).</summary>
    public const string Bundle = "mirror.bundle";

    /// <summary>Bundle import from external source (validates and imports portable bundle).</summary>
    public const string Import = "mirror.import";

    /// <summary>Bundle verification (validates bundle integrity without importing).</summary>
    public const string Verify = "mirror.verify";

    /// <summary>Bundle sync (synchronizes bundles between environments).</summary>
    public const string Sync = "mirror.sync";

    /// <summary>Bundle diff (compares bundles to identify delta).</summary>
    public const string Diff = "mirror.diff";

    /// <summary>All known mirror job types.</summary>
    public static readonly IReadOnlyList<string> All =
    [
        Bundle,
        Import,
        Verify,
        Sync,
        Diff
    ];

    /// <summary>Checks if a job type is a mirror job.</summary>
    public static bool IsMirrorJob(string? jobType) =>
        jobType is not null && jobType.StartsWith(Prefix, StringComparison.OrdinalIgnoreCase);

    /// <summary>Gets the mirror operation from a job type (e.g., "bundle" from "mirror.bundle").</summary>
    public static string? GetMirrorOperation(string? jobType)
    {
        if (!IsMirrorJob(jobType))
        {
            return null;
        }

        return jobType!.Length > Prefix.Length
            ? jobType[Prefix.Length..]
            : null;
    }
}
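Two edge cases of the helpers above are worth keeping in mind: the prefix match is case-insensitive, and a bare `"mirror."` yields no operation. A standalone sketch of the same parsing (reimplemented here so the examples are self-contained; it is not the class above):

```csharp
using System;

public static class MirrorTypeDemo
{
    private const string Prefix = "mirror.";

    // Same semantics as MirrorJobTypes.GetMirrorOperation: null for non-mirror
    // job types and for the bare prefix, otherwise the trailing operation name.
    public static string? GetOperation(string? jobType)
    {
        if (jobType is null || !jobType.StartsWith(Prefix, StringComparison.OrdinalIgnoreCase))
        {
            return null;
        }

        return jobType.Length > Prefix.Length ? jobType[Prefix.Length..] : null;
    }
}
```

So `GetOperation("mirror.bundle")` yields `"bundle"`, `GetOperation("MIRROR.verify")` yields `"verify"`, and both `GetOperation("mirror.")` and `GetOperation("export.sbom")` yield null.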
@@ -0,0 +1,861 @@
using Microsoft.Extensions.Logging;
using StellaOps.JobEngine.Core.Domain.AirGap;
using StellaOps.JobEngine.Core.Domain.Events;
using StellaOps.JobEngine.Core.Evidence;

namespace StellaOps.JobEngine.Core.Domain.Mirror;

/// <summary>
/// Event types for mirror operations.
/// Per ORCH-AIRGAP-58-001.
/// </summary>
public static class MirrorEventTypes
{
    public const string Prefix = "mirror.";

    // Bundle operations
    public const string BundleStarted = "mirror.bundle.started";
    public const string BundleProgress = "mirror.bundle.progress";
    public const string BundleCompleted = "mirror.bundle.completed";
    public const string BundleFailed = "mirror.bundle.failed";

    // Import operations
    public const string ImportStarted = "mirror.import.started";
    public const string ImportValidated = "mirror.import.validated";
    public const string ImportCompleted = "mirror.import.completed";
    public const string ImportFailed = "mirror.import.failed";

    // Verification operations
    public const string VerifyStarted = "mirror.verify.started";
    public const string VerifyCompleted = "mirror.verify.completed";
    public const string VerifyFailed = "mirror.verify.failed";

    // Sync operations
    public const string SyncStarted = "mirror.sync.started";
    public const string SyncProgress = "mirror.sync.progress";
    public const string SyncCompleted = "mirror.sync.completed";
    public const string SyncFailed = "mirror.sync.failed";

    // Evidence capture
    public const string EvidenceCaptured = "mirror.evidence.captured";
    public const string ProvenanceRecorded = "mirror.provenance.recorded";
}

/// <summary>
/// Service for recording mirror import/export operations as timeline events and evidence entries.
/// Per ORCH-AIRGAP-58-001.
/// </summary>
public interface IMirrorOperationRecorder
{
    /// <summary>Records the start of a bundle creation operation.</summary>
    Task<MirrorOperationRecordResult> RecordBundleStartedAsync(
        MirrorOperationContext context,
        MirrorBundlePayload payload,
        CancellationToken cancellationToken = default);

    /// <summary>Records bundle creation progress.</summary>
    Task<MirrorOperationRecordResult> RecordBundleProgressAsync(
        MirrorOperationContext context,
        MirrorBundleProgress progress,
        CancellationToken cancellationToken = default);

    /// <summary>Records successful bundle completion with evidence.</summary>
    Task<MirrorOperationRecordResult> RecordBundleCompletedAsync(
        MirrorOperationContext context,
        MirrorBundleResult result,
        CancellationToken cancellationToken = default);

    /// <summary>Records bundle creation failure.</summary>
    Task<MirrorOperationRecordResult> RecordBundleFailedAsync(
        MirrorOperationContext context,
        string errorCode,
        string errorMessage,
        CancellationToken cancellationToken = default);

    /// <summary>Records the start of an import operation.</summary>
    Task<MirrorOperationRecordResult> RecordImportStartedAsync(
        MirrorOperationContext context,
        MirrorImportRequest request,
        CancellationToken cancellationToken = default);

    /// <summary>Records successful import validation.</summary>
    Task<MirrorOperationRecordResult> RecordImportValidatedAsync(
        MirrorOperationContext context,
        MirrorImportValidation validation,
        CancellationToken cancellationToken = default);

    /// <summary>Records successful import completion.</summary>
    Task<MirrorOperationRecordResult> RecordImportCompletedAsync(
        MirrorOperationContext context,
        MirrorImportResult result,
        CancellationToken cancellationToken = default);

    /// <summary>Records import failure.</summary>
    Task<MirrorOperationRecordResult> RecordImportFailedAsync(
        MirrorOperationContext context,
        string errorCode,
        string errorMessage,
        CancellationToken cancellationToken = default);
}

/// <summary>
/// Context for mirror operations.
/// </summary>
public sealed record MirrorOperationContext(
    /// <summary>Tenant scope.</summary>
    string TenantId,

    /// <summary>Project scope.</summary>
    string? ProjectId,

    /// <summary>Job identifier.</summary>
    Guid JobId,

    /// <summary>Operation identifier.</summary>
    Guid OperationId,

    /// <summary>Job type.</summary>
    string JobType,

    /// <summary>Actor triggering the operation.</summary>
    string? Actor,

    /// <summary>Trace ID for correlation.</summary>
    string? TraceId,

    /// <summary>Span ID for correlation.</summary>
    string? SpanId,

    /// <summary>Source environment identifier.</summary>
    string SourceEnvironment,

    /// <summary>Target environment identifier.</summary>
    string? TargetEnvironment);

/// <summary>
/// Result of recording a mirror operation.
/// </summary>
public sealed record MirrorOperationRecordResult(
    /// <summary>Whether recording was successful.</summary>
    bool Success,

    /// <summary>Timeline event ID.</summary>
    Guid? EventId,

    /// <summary>Evidence capsule ID if created.</summary>
    Guid? CapsuleId,

    /// <summary>Evidence pointer for downstream consumers.</summary>
    EvidencePointer? EvidencePointer,

    /// <summary>Error message if recording failed.</summary>
    string? Error);

/// <summary>
/// Import request details.
/// </summary>
public sealed record MirrorImportRequest(
    /// <summary>Bundle URI to import.</summary>
    string BundleUri,

    /// <summary>Expected bundle digest.</summary>
    string? ExpectedDigest,

    /// <summary>Whether to validate signatures.</summary>
    bool ValidateSignatures,

    /// <summary>Whether to verify provenance chain.</summary>
    bool VerifyProvenance,

    /// <summary>Maximum staleness allowed (seconds).</summary>
    int? MaxStalenessSeconds);

/// <summary>
/// Import validation result.
/// </summary>
public sealed record MirrorImportValidation(
    /// <summary>Whether bundle is valid.</summary>
    bool IsValid,

    /// <summary>Verified bundle digest.</summary>
    string BundleDigest,

    /// <summary>Verified manifest digest.</summary>
    string ManifestDigest,

    /// <summary>Whether signature was verified.</summary>
    bool SignatureVerified,

    /// <summary>Whether provenance was verified.</summary>
    bool ProvenanceVerified,

    /// <summary>Staleness at validation time (seconds).</summary>
    int? StalenessSeconds,

    /// <summary>Validation warnings.</summary>
    IReadOnlyList<string>? Warnings);

/// <summary>
/// Import result details.
/// </summary>
public sealed record MirrorImportResult(
    /// <summary>Number of domains imported.</summary>
    int DomainsImported,

    /// <summary>Number of records imported.</summary>
    int RecordsImported,

    /// <summary>Import duration in seconds.</summary>
    double DurationSeconds,

    /// <summary>Time anchor from bundle.</summary>
    TimeAnchor? TimeAnchor,

    /// <summary>Provenance record.</summary>
    MirrorImportProvenance Provenance);

/// <summary>
/// Provenance record for imported bundle.
/// </summary>
public sealed record MirrorImportProvenance(
    /// <summary>Original bundle ID.</summary>
    Guid BundleId,

    /// <summary>Source environment.</summary>
    string SourceEnvironment,

    /// <summary>Original creation timestamp.</summary>
    DateTimeOffset OriginalCreatedAt,

    /// <summary>Bundle digest.</summary>
    string BundleDigest,

    /// <summary>Signing key ID.</summary>
    string? SigningKeyId,

    /// <summary>Import timestamp.</summary>
    DateTimeOffset ImportedAt);

/// <summary>
/// Default implementation of mirror operation recorder.
/// </summary>
public sealed class MirrorOperationRecorder : IMirrorOperationRecorder
{
    private const string Source = "orchestrator-mirror";

    private readonly ITimelineEventEmitter _timelineEmitter;
    private readonly IJobCapsuleGenerator _capsuleGenerator;
    private readonly IMirrorEvidenceStore _evidenceStore;
    private readonly TimeProvider _timeProvider;
    private readonly ILogger<MirrorOperationRecorder> _logger;

    public MirrorOperationRecorder(
        ITimelineEventEmitter timelineEmitter,
        IJobCapsuleGenerator capsuleGenerator,
        IMirrorEvidenceStore evidenceStore,
        TimeProvider timeProvider,
        ILogger<MirrorOperationRecorder> logger)
    {
        _timelineEmitter = timelineEmitter ?? throw new ArgumentNullException(nameof(timelineEmitter));
        _capsuleGenerator = capsuleGenerator ?? throw new ArgumentNullException(nameof(capsuleGenerator));
        _evidenceStore = evidenceStore ?? throw new ArgumentNullException(nameof(evidenceStore));
        _timeProvider = timeProvider ?? throw new ArgumentNullException(nameof(timeProvider));
        _logger = logger ?? throw new ArgumentNullException(nameof(logger));
    }

    public async Task<MirrorOperationRecordResult> RecordBundleStartedAsync(
        MirrorOperationContext context,
        MirrorBundlePayload payload,
        CancellationToken cancellationToken = default)
    {
        try
        {
            var attributes = CreateBaseAttributes(context);
            attributes["domainsCount"] = payload.Domains.Count.ToString();
            attributes["includeProvenance"] = payload.IncludeProvenance.ToString();
            attributes["includeAuditTrail"] = payload.IncludeAuditTrail.ToString();

            var eventPayload = new
            {
                operationId = context.OperationId,
                domains = payload.Domains,
                targetEnvironment = payload.TargetEnvironment,
                compression = payload.Compression,
                signBundle = payload.SignBundle
            };

            var emitResult = await _timelineEmitter.EmitJobEventAsync(
                context.TenantId,
                context.JobId,
                MirrorEventTypes.BundleStarted,
                payload: eventPayload,
                actor: context.Actor,
                correlationId: context.OperationId.ToString(),
                traceId: context.TraceId,
                projectId: context.ProjectId,
                attributes: attributes,
                cancellationToken: cancellationToken);

            _logger.LogInformation(
                "Recorded bundle started for job {JobId} operation {OperationId}",
                context.JobId, context.OperationId);

            return new MirrorOperationRecordResult(
                Success: emitResult.Success,
                EventId: emitResult.Event.EventId,
                CapsuleId: null,
                EvidencePointer: null,
                Error: emitResult.Error);
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, "Failed to record bundle started for job {JobId}", context.JobId);
            return new MirrorOperationRecordResult(false, null, null, null, ex.Message);
        }
    }

    public async Task<MirrorOperationRecordResult> RecordBundleProgressAsync(
        MirrorOperationContext context,
        MirrorBundleProgress progress,
        CancellationToken cancellationToken = default)
    {
        try
        {
            var attributes = CreateBaseAttributes(context);
            attributes["phase"] = progress.Phase.ToString();
            attributes["domainsProcessed"] = progress.DomainsProcessed.ToString();
            attributes["totalDomains"] = progress.TotalDomains.ToString();

            var emitResult = await _timelineEmitter.EmitJobEventAsync(
                context.TenantId,
                context.JobId,
                MirrorEventTypes.BundleProgress,
                payload: progress,
                actor: context.Actor,
                correlationId: context.OperationId.ToString(),
                traceId: context.TraceId,
                projectId: context.ProjectId,
                attributes: attributes,
                cancellationToken: cancellationToken);

            return new MirrorOperationRecordResult(
                Success: emitResult.Success,
                EventId: emitResult.Event.EventId,
                CapsuleId: null,
                EvidencePointer: null,
                Error: emitResult.Error);
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, "Failed to record bundle progress for job {JobId}", context.JobId);
            return new MirrorOperationRecordResult(false, null, null, null, ex.Message);
        }
    }

    public async Task<MirrorOperationRecordResult> RecordBundleCompletedAsync(
        MirrorOperationContext context,
        MirrorBundleResult result,
        CancellationToken cancellationToken = default)
    {
        try
        {
            // Create evidence entry
            var now = _timeProvider.GetUtcNow();
            var evidence = new MirrorOperationEvidence(
                OperationId: context.OperationId,
                OperationType: MirrorOperationType.BundleExport,
                TenantId: context.TenantId,
                ProjectId: context.ProjectId,
                JobId: context.JobId,
                Status: MirrorOperationStatus.Completed,
                StartedAt: now.AddSeconds(-result.DurationSeconds),
                CompletedAt: now,
                SourceEnvironment: context.SourceEnvironment,
                TargetEnvironment: context.TargetEnvironment,
                BundleDigest: result.BundleDigest,
                ManifestDigest: result.ManifestDigest,
                ProvenanceUri: result.ProvenanceUri,
                AuditTrailUri: result.AuditTrailUri,
                DomainsCount: result.IncludedDomains.Count,
                RecordsCount: result.Exports.Sum(e => e.RecordCount ?? 0),
                SizeBytes: result.BundleSizeBytes,
                DurationSeconds: result.DurationSeconds,
                Error: null);

            await _evidenceStore.StoreAsync(evidence, cancellationToken);

            // Create job capsule for Evidence Locker
            var capsuleRequest = new JobCapsuleRequest(
                TenantId: context.TenantId,
                JobId: context.JobId,
                JobType: context.JobType,
                PayloadJson: result.ToJson(),
                ProjectId: context.ProjectId,
                SourceRef: new JobCapsuleSourceRef("mirror.bundle", context.OperationId.ToString(), context.Actor, context.TraceId),
                Environment: new JobCapsuleEnvironment(null, null, null, false, null),
                Metadata: new Dictionary<string, string>
                {
                    ["operationId"] = context.OperationId.ToString(),
                    ["bundleDigest"] = result.BundleDigest,
                    ["sourceEnvironment"] = result.SourceEnvironment
                });

            var outputs = new JobCapsuleOutputs(
                Status: "completed",
                ExitCode: 0,
                ResultSummary: $"Bundle created with {result.IncludedDomains.Count} domains",
                ResultHash: result.BundleDigest,
                DurationSeconds: result.DurationSeconds,
                RetryCount: 0,
                Error: null);

            var artifacts = result.Exports.Select(e => new JobCapsuleArtifact(
                Name: e.Key,
                Digest: e.ArtifactDigest,
                SizeBytes: 0,
                MediaType: "application/json",
                StorageUri: null,
                Attributes: new Dictionary<string, string> { ["format"] = e.Format.ToString() })).ToList();

            var capsuleResult = await _capsuleGenerator.GenerateJobCompletionCapsuleAsync(
                capsuleRequest, outputs, artifacts, cancellationToken);

            // Emit timeline event
            var attributes = CreateBaseAttributes(context);
            attributes["bundleDigest"] = result.BundleDigest;
            attributes["domainsCount"] = result.IncludedDomains.Count.ToString();
            attributes["sizeBytes"] = result.BundleSizeBytes.ToString();
            attributes["durationSeconds"] = result.DurationSeconds.ToString("F2");

            var emitResult = await _timelineEmitter.EmitJobEventAsync(
                context.TenantId,
                context.JobId,
                MirrorEventTypes.BundleCompleted,
                payload: new
                {
                    operationId = context.OperationId,
                    bundleDigest = result.BundleDigest,
                    manifestDigest = result.ManifestDigest,
                    includedDomains = result.IncludedDomains,
                    sizeBytes = result.BundleSizeBytes,
                    durationSeconds = result.DurationSeconds,
                    provenanceUri = result.ProvenanceUri,
                    auditTrailUri = result.AuditTrailUri
                },
                actor: context.Actor,
                correlationId: context.OperationId.ToString(),
                traceId: context.TraceId,
                projectId: context.ProjectId,
                attributes: attributes,
                cancellationToken: cancellationToken);

            _logger.LogInformation(
                "Recorded bundle completed for job {JobId} operation {OperationId}, digest {BundleDigest}",
                context.JobId, context.OperationId, result.BundleDigest);

            return new MirrorOperationRecordResult(
                Success: emitResult.Success,
                EventId: emitResult.Event.EventId,
                CapsuleId: capsuleResult.Capsule?.CapsuleId,
                EvidencePointer: capsuleResult.EvidencePointer,
                Error: emitResult.Error);
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, "Failed to record bundle completed for job {JobId}", context.JobId);
            return new MirrorOperationRecordResult(false, null, null, null, ex.Message);
        }
    }

    public async Task<MirrorOperationRecordResult> RecordBundleFailedAsync(
        MirrorOperationContext context,
        string errorCode,
        string errorMessage,
        CancellationToken cancellationToken = default)
    {
        try
        {
            var now = _timeProvider.GetUtcNow();
            var evidence = new MirrorOperationEvidence(
                OperationId: context.OperationId,
                OperationType: MirrorOperationType.BundleExport,
                TenantId: context.TenantId,
                ProjectId: context.ProjectId,
                JobId: context.JobId,
                Status: MirrorOperationStatus.Failed,
                StartedAt: now,
                CompletedAt: now,
                SourceEnvironment: context.SourceEnvironment,
                TargetEnvironment: context.TargetEnvironment,
                BundleDigest: null,
                ManifestDigest: null,
                ProvenanceUri: null,
                AuditTrailUri: null,
                DomainsCount: 0,
                RecordsCount: 0,
                SizeBytes: 0,
                DurationSeconds: 0,
                Error: new MirrorOperationError(errorCode, errorMessage));

            await _evidenceStore.StoreAsync(evidence, cancellationToken);

            var attributes = CreateBaseAttributes(context);
            attributes["errorCode"] = errorCode;

            var emitResult = await _timelineEmitter.EmitJobEventAsync(
                context.TenantId,
                context.JobId,
                MirrorEventTypes.BundleFailed,
                payload: new { operationId = context.OperationId, errorCode, errorMessage },
                actor: context.Actor,
                correlationId: context.OperationId.ToString(),
                traceId: context.TraceId,
                projectId: context.ProjectId,
                attributes: attributes,
                cancellationToken: cancellationToken);

            _logger.LogWarning(
                "Recorded bundle failed for job {JobId} operation {OperationId}: {ErrorCode} - {ErrorMessage}",
                context.JobId, context.OperationId, errorCode, errorMessage);

            return new MirrorOperationRecordResult(
                Success: emitResult.Success,
                EventId: emitResult.Event.EventId,
                CapsuleId: null,
                EvidencePointer: null,
                Error: emitResult.Error);
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, "Failed to record bundle failed for job {JobId}", context.JobId);
            return new MirrorOperationRecordResult(false, null, null, null, ex.Message);
        }
    }

    public async Task<MirrorOperationRecordResult> RecordImportStartedAsync(
        MirrorOperationContext context,
        MirrorImportRequest request,
        CancellationToken cancellationToken = default)
    {
        try
        {
            var attributes = CreateBaseAttributes(context);
            attributes["validateSignatures"] = request.ValidateSignatures.ToString();
            attributes["verifyProvenance"] = request.VerifyProvenance.ToString();

            var emitResult = await _timelineEmitter.EmitJobEventAsync(
                context.TenantId,
                context.JobId,
                MirrorEventTypes.ImportStarted,
                payload: new
                {
                    operationId = context.OperationId,
                    bundleUri = request.BundleUri,
                    expectedDigest = request.ExpectedDigest,
                    validateSignatures = request.ValidateSignatures,
                    verifyProvenance = request.VerifyProvenance
                },
                actor: context.Actor,
                correlationId: context.OperationId.ToString(),
                traceId: context.TraceId,
                projectId: context.ProjectId,
                attributes: attributes,
                cancellationToken: cancellationToken);

            _logger.LogInformation(
                "Recorded import started for job {JobId} operation {OperationId}",
                context.JobId, context.OperationId);

            return new MirrorOperationRecordResult(
                Success: emitResult.Success,
                EventId: emitResult.Event.EventId,
                CapsuleId: null,
                EvidencePointer: null,
                Error: emitResult.Error);
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, "Failed to record import started for job {JobId}", context.JobId);
            return new MirrorOperationRecordResult(false, null, null, null, ex.Message);
        }
    }

    public async Task<MirrorOperationRecordResult> RecordImportValidatedAsync(
        MirrorOperationContext context,
        MirrorImportValidation validation,
        CancellationToken cancellationToken = default)
    {
        try
        {
            var attributes = CreateBaseAttributes(context);
            attributes["isValid"] = validation.IsValid.ToString();
            attributes["signatureVerified"] = validation.SignatureVerified.ToString();
            attributes["provenanceVerified"] = validation.ProvenanceVerified.ToString();

            var emitResult = await _timelineEmitter.EmitJobEventAsync(
                context.TenantId,
                context.JobId,
                MirrorEventTypes.ImportValidated,
                payload: validation,
                actor: context.Actor,
                correlationId: context.OperationId.ToString(),
                traceId: context.TraceId,
                projectId: context.ProjectId,
                attributes: attributes,
                cancellationToken: cancellationToken);

            return new MirrorOperationRecordResult(
                Success: emitResult.Success,
                EventId: emitResult.Event.EventId,
                CapsuleId: null,
                EvidencePointer: null,
                Error: emitResult.Error);
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, "Failed to record import validated for job {JobId}", context.JobId);
            return new MirrorOperationRecordResult(false, null, null, null, ex.Message);
        }
    }

    public async Task<MirrorOperationRecordResult> RecordImportCompletedAsync(
        MirrorOperationContext context,
        MirrorImportResult result,
        CancellationToken cancellationToken = default)
    {
        try
        {
            var now = _timeProvider.GetUtcNow();
            var evidence = new MirrorOperationEvidence(
                OperationId: context.OperationId,
                OperationType: MirrorOperationType.BundleImport,
                TenantId: context.TenantId,
                ProjectId: context.ProjectId,
                JobId: context.JobId,
                Status: MirrorOperationStatus.Completed,
                StartedAt: now.AddSeconds(-result.DurationSeconds),
                CompletedAt: now,
                SourceEnvironment: result.Provenance.SourceEnvironment,
                TargetEnvironment: context.TargetEnvironment,
                BundleDigest: result.Provenance.BundleDigest,
                ManifestDigest: null,
                ProvenanceUri: null,
                AuditTrailUri: null,
                DomainsCount: result.DomainsImported,
                RecordsCount: result.RecordsImported,
                SizeBytes: 0,
                DurationSeconds: result.DurationSeconds,
                Error: null);

            await _evidenceStore.StoreAsync(evidence, cancellationToken);

            var attributes = CreateBaseAttributes(context);
            attributes["domainsImported"] = result.DomainsImported.ToString();
            attributes["recordsImported"] = result.RecordsImported.ToString();
            attributes["durationSeconds"] = result.DurationSeconds.ToString("F2");

            var emitResult = await _timelineEmitter.EmitJobEventAsync(
                context.TenantId,
                context.JobId,
                MirrorEventTypes.ImportCompleted,
                payload: new
                {
                    operationId = context.OperationId,
                    domainsImported = result.DomainsImported,
                    recordsImported = result.RecordsImported,
                    durationSeconds = result.DurationSeconds,
                    provenance = result.Provenance
                },
                actor: context.Actor,
                correlationId: context.OperationId.ToString(),
                traceId: context.TraceId,
                projectId: context.ProjectId,
                attributes: attributes,
                cancellationToken: cancellationToken);

            _logger.LogInformation(
                "Recorded import completed for job {JobId} operation {OperationId}, {DomainsImported} domains",
                context.JobId, context.OperationId, result.DomainsImported);

            return new MirrorOperationRecordResult(
                Success: emitResult.Success,
                EventId: emitResult.Event.EventId,
                CapsuleId: null,
                EvidencePointer: null,
                Error: emitResult.Error);
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, "Failed to record import completed for job {JobId}", context.JobId);
            return new MirrorOperationRecordResult(false, null, null, null, ex.Message);
        }
    }

    public async Task<MirrorOperationRecordResult> RecordImportFailedAsync(
        MirrorOperationContext context,
        string errorCode,
        string errorMessage,
        CancellationToken cancellationToken = default)
    {
        try
        {
            var now = _timeProvider.GetUtcNow();
            var evidence = new MirrorOperationEvidence(
                OperationId: context.OperationId,
                OperationType: MirrorOperationType.BundleImport,
                TenantId: context.TenantId,
                ProjectId: context.ProjectId,
                JobId: context.JobId,
                Status: MirrorOperationStatus.Failed,
                StartedAt: now,
                CompletedAt: now,
                SourceEnvironment: context.SourceEnvironment,
                TargetEnvironment: context.TargetEnvironment,
                BundleDigest: null,
                ManifestDigest: null,
                ProvenanceUri: null,
                AuditTrailUri: null,
                DomainsCount: 0,
                RecordsCount: 0,
                SizeBytes: 0,
                DurationSeconds: 0,
                Error: new MirrorOperationError(errorCode, errorMessage));

            await _evidenceStore.StoreAsync(evidence, cancellationToken);

            var attributes = CreateBaseAttributes(context);
            attributes["errorCode"] = errorCode;

            var emitResult = await _timelineEmitter.EmitJobEventAsync(
                context.TenantId,
                context.JobId,
                MirrorEventTypes.ImportFailed,
                payload: new { operationId = context.OperationId, errorCode, errorMessage },
                actor: context.Actor,
                correlationId: context.OperationId.ToString(),
                traceId: context.TraceId,
                projectId: context.ProjectId,
                attributes: attributes,
                cancellationToken: cancellationToken);

            _logger.LogWarning(
                "Recorded import failed for job {JobId} operation {OperationId}: {ErrorCode} - {ErrorMessage}",
                context.JobId, context.OperationId, errorCode, errorMessage);

            return new MirrorOperationRecordResult(
                Success: emitResult.Success,
                EventId: emitResult.Event.EventId,
                CapsuleId: null,
                EvidencePointer: null,
                Error: emitResult.Error);
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, "Failed to record import failed for job {JobId}", context.JobId);
            return new MirrorOperationRecordResult(false, null, null, null, ex.Message);
        }
    }

    private static Dictionary<string, string> CreateBaseAttributes(MirrorOperationContext context) =>
        new()
        {
            ["operationId"] = context.OperationId.ToString(),
            ["jobType"] = context.JobType,
            ["sourceEnvironment"] = context.SourceEnvironment
        };
}

/// <summary>
/// Evidence record for mirror operations.
/// </summary>
public sealed record MirrorOperationEvidence(
    Guid OperationId,
    MirrorOperationType OperationType,
    string TenantId,
    string? ProjectId,
    Guid JobId,
    MirrorOperationStatus Status,
    DateTimeOffset StartedAt,
    DateTimeOffset CompletedAt,
    string SourceEnvironment,
    string? TargetEnvironment,
    string? BundleDigest,
    string? ManifestDigest,
    string? ProvenanceUri,
    string? AuditTrailUri,
    int DomainsCount,
    int RecordsCount,
    long SizeBytes,
    double DurationSeconds,
    MirrorOperationError? Error);

/// <summary>
/// Error details for mirror operations.
/// </summary>
public sealed record MirrorOperationError(string Code, string Message);

/// <summary>
/// Types of mirror operations.
/// </summary>
public enum MirrorOperationType
{
    BundleExport,
    BundleImport,
    BundleVerify,
    BundleSync,
    BundleDiff
}

/// <summary>
/// Status of mirror operations.
/// </summary>
public enum MirrorOperationStatus
{
    Started,
    InProgress,
    Completed,
    Failed,
    Cancelled
}

/// <summary>
/// Store for mirror operation evidence.
/// </summary>
public interface IMirrorEvidenceStore
{
    Task StoreAsync(MirrorOperationEvidence evidence, CancellationToken cancellationToken = default);
    Task<MirrorOperationEvidence?> GetAsync(Guid operationId, CancellationToken cancellationToken = default);
    Task<IReadOnlyList<MirrorOperationEvidence>> ListForJobAsync(Guid jobId, CancellationToken cancellationToken = default);
}

/// <summary>
/// In-memory mirror evidence store for testing.
/// </summary>
public sealed class InMemoryMirrorEvidenceStore : IMirrorEvidenceStore
{
    private readonly Dictionary<Guid, MirrorOperationEvidence> _evidence = new();
    private readonly object _lock = new();

    public Task StoreAsync(MirrorOperationEvidence evidence, CancellationToken cancellationToken = default)
    {
        lock (_lock) { _evidence[evidence.OperationId] = evidence; }
        return Task.CompletedTask;
    }

    public Task<MirrorOperationEvidence?> GetAsync(Guid operationId, CancellationToken cancellationToken = default)
    {
        lock (_lock) { return Task.FromResult(_evidence.GetValueOrDefault(operationId)); }
    }

    public Task<IReadOnlyList<MirrorOperationEvidence>> ListForJobAsync(Guid jobId, CancellationToken cancellationToken = default)
    {
        lock (_lock)
        {
            var result = _evidence.Values.Where(e => e.JobId == jobId).ToList();
            return Task.FromResult<IReadOnlyList<MirrorOperationEvidence>>(result);
        }
    }

    public void Clear() { lock (_lock) { _evidence.Clear(); } }
    public int Count { get { lock (_lock) { return _evidence.Count; } } }
}
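A minimal test-side sketch of the in-memory store, assuming the `MirrorOperationEvidence` record and `InMemoryMirrorEvidenceStore` compile as shown above (field values are illustrative):

```csharp
// Store one evidence record and read it back by operation and by job.
var store = new InMemoryMirrorEvidenceStore();
var opId = Guid.NewGuid();
var jobId = Guid.NewGuid();
var now = DateTimeOffset.UtcNow;

var evidence = new MirrorOperationEvidence(
    OperationId: opId,
    OperationType: MirrorOperationType.BundleExport,
    TenantId: "tenant-a",
    ProjectId: null,
    JobId: jobId,
    Status: MirrorOperationStatus.Completed,
    StartedAt: now,
    CompletedAt: now,
    SourceEnvironment: "site-a",
    TargetEnvironment: "site-b",
    BundleDigest: "sha256:abc",
    ManifestDigest: null,
    ProvenanceUri: null,
    AuditTrailUri: null,
    DomainsCount: 1,
    RecordsCount: 10,
    SizeBytes: 1024,
    DurationSeconds: 1.5,
    Error: null);

await store.StoreAsync(evidence);

var fetched = await store.GetAsync(opId);        // the record stored above
var forJob = await store.ListForJobAsync(jobId); // single entry for this job
```

Storing twice under the same `OperationId` overwrites the earlier entry, since the store keys its dictionary by operation ID.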
@@ -0,0 +1,363 @@
namespace StellaOps.JobEngine.Core.Domain;

/// <summary>
/// Represents a pack in the registry with tenant/project scoping.
/// Per 150.B-PacksRegistry: Pack versioning and lifecycle management.
/// </summary>
public sealed record Pack(
    Guid PackId,
    string TenantId,
    string? ProjectId,
    string Name,
    string DisplayName,
    string? Description,
    PackStatus Status,
    string CreatedBy,
    DateTimeOffset CreatedAt,
    DateTimeOffset UpdatedAt,
    string? UpdatedBy,
    string? Metadata,
    string? Tags,
    string? IconUri,
    int VersionCount,
    string? LatestVersion,
    DateTimeOffset? PublishedAt,
    string? PublishedBy)
{
    /// <summary>
    /// Creates a new pack.
    /// </summary>
    public static Pack Create(
        Guid packId,
        string tenantId,
        string? projectId,
        string name,
        string displayName,
        string? description,
        string createdBy,
        string? metadata = null,
        string? tags = null,
        string? iconUri = null,
        DateTimeOffset? createdAt = null)
    {
        ArgumentException.ThrowIfNullOrWhiteSpace(tenantId);
        ArgumentException.ThrowIfNullOrWhiteSpace(name);
        ArgumentException.ThrowIfNullOrWhiteSpace(displayName);
        ArgumentException.ThrowIfNullOrWhiteSpace(createdBy);

        if (createdAt is null)
            throw new ArgumentNullException(nameof(createdAt), "createdAt must be provided for deterministic behavior.");

        var now = createdAt.Value;

        return new Pack(
            PackId: packId,
            TenantId: tenantId,
            ProjectId: projectId,
            Name: name.ToLowerInvariant(),
            DisplayName: displayName,
            Description: description,
            Status: PackStatus.Draft,
            CreatedBy: createdBy,
            CreatedAt: now,
            UpdatedAt: now,
            UpdatedBy: null,
            Metadata: metadata,
            Tags: tags,
            IconUri: iconUri,
            VersionCount: 0,
            LatestVersion: null,
            PublishedAt: null,
            PublishedBy: null);
    }

    /// <summary>
    /// Whether the pack is in a terminal state.
    /// </summary>
    public bool IsTerminal => Status is PackStatus.Archived;

    /// <summary>
    /// Whether the pack can accept new versions.
    /// </summary>
    public bool CanAddVersion => Status is PackStatus.Draft or PackStatus.Published;

    /// <summary>
    /// Whether the pack can be published.
    /// </summary>
    public bool CanPublish => Status == PackStatus.Draft && VersionCount > 0;

    /// <summary>
    /// Whether the pack can be deprecated.
    /// </summary>
    public bool CanDeprecate => Status == PackStatus.Published;

    /// <summary>
    /// Whether the pack can be archived.
    /// </summary>
    public bool CanArchive => Status is PackStatus.Draft or PackStatus.Deprecated;

    /// <summary>
    /// Creates a copy with updated status.
    /// </summary>
    public Pack WithStatus(PackStatus newStatus, string updatedBy, DateTimeOffset updatedAt)
    {
        return this with
        {
            Status = newStatus,
            UpdatedAt = updatedAt,
            UpdatedBy = updatedBy,
            PublishedAt = newStatus == PackStatus.Published ? updatedAt : PublishedAt,
            PublishedBy = newStatus == PackStatus.Published ? updatedBy : PublishedBy
        };
    }

    /// <summary>
    /// Creates a copy with incremented version count.
    /// </summary>
    public Pack WithVersionAdded(string version, string updatedBy, DateTimeOffset updatedAt)
    {
        return this with
        {
            VersionCount = VersionCount + 1,
            LatestVersion = version,
            UpdatedAt = updatedAt,
            UpdatedBy = updatedBy
        };
    }
}

/// <summary>
/// Pack lifecycle status.
/// </summary>
public enum PackStatus
{
    /// <summary>
    /// Pack is in draft mode, not yet published.
    /// </summary>
    Draft = 0,

    /// <summary>
    /// Pack is published and available for use.
    /// </summary>
    Published = 1,

    /// <summary>
    /// Pack is deprecated but still usable.
    /// </summary>
    Deprecated = 2,

    /// <summary>
    /// Pack is archived and no longer usable.
    /// </summary>
    Archived = 3
}

/// <summary>
/// Represents a version of a pack with artifact provenance.
/// Per 150.B-PacksRegistry: Pack artifact storage with provenance metadata.
/// </summary>
public sealed record PackVersion(
    Guid PackVersionId,
    string TenantId,
    Guid PackId,
    string Version,
    string? SemVer,
    PackVersionStatus Status,
    string ArtifactUri,
    string ArtifactDigest,
    string? ArtifactMimeType,
    long? ArtifactSizeBytes,
    string? ManifestJson,
    string? ManifestDigest,
    string? ReleaseNotes,
    string? MinEngineVersion,
    string? Dependencies,
    string CreatedBy,
    DateTimeOffset CreatedAt,
    DateTimeOffset UpdatedAt,
    string? UpdatedBy,
    DateTimeOffset? PublishedAt,
    string? PublishedBy,
    DateTimeOffset? DeprecatedAt,
    string? DeprecatedBy,
    string? DeprecationReason,
    string? SignatureUri,
    string? SignatureAlgorithm,
    string? SignedBy,
    DateTimeOffset? SignedAt,
    string? Metadata,
    int DownloadCount)
{
    /// <summary>
    /// Creates a new pack version.
    /// </summary>
    public static PackVersion Create(
        Guid packVersionId,
        string tenantId,
        Guid packId,
        string version,
        string? semVer,
        string artifactUri,
        string artifactDigest,
        string? artifactMimeType,
        long? artifactSizeBytes,
        string? manifestJson,
        string? manifestDigest,
        string? releaseNotes,
        string? minEngineVersion,
        string? dependencies,
        string createdBy,
        string? metadata = null,
        DateTimeOffset? createdAt = null)
    {
        ArgumentException.ThrowIfNullOrWhiteSpace(tenantId);
        ArgumentException.ThrowIfNullOrWhiteSpace(version);
        ArgumentException.ThrowIfNullOrWhiteSpace(artifactUri);
        ArgumentException.ThrowIfNullOrWhiteSpace(artifactDigest);
        ArgumentException.ThrowIfNullOrWhiteSpace(createdBy);

        if (createdAt is null)
            throw new ArgumentNullException(nameof(createdAt), "createdAt must be provided for deterministic behavior.");

        var now = createdAt.Value;

        return new PackVersion(
            PackVersionId: packVersionId,
            TenantId: tenantId,
            PackId: packId,
            Version: version,
            SemVer: semVer,
            Status: PackVersionStatus.Draft,
            ArtifactUri: artifactUri,
            ArtifactDigest: artifactDigest,
            ArtifactMimeType: artifactMimeType,
            ArtifactSizeBytes: artifactSizeBytes,
            ManifestJson: manifestJson,
            ManifestDigest: manifestDigest,
            ReleaseNotes: releaseNotes,
            MinEngineVersion: minEngineVersion,
            Dependencies: dependencies,
            CreatedBy: createdBy,
            CreatedAt: now,
            UpdatedAt: now,
            UpdatedBy: null,
            PublishedAt: null,
            PublishedBy: null,
            DeprecatedAt: null,
            DeprecatedBy: null,
            DeprecationReason: null,
            SignatureUri: null,
            SignatureAlgorithm: null,
            SignedBy: null,
            SignedAt: null,
            Metadata: metadata,
            DownloadCount: 0);
    }

    /// <summary>
    /// Whether the version is in a terminal state.
    /// </summary>
    public bool IsTerminal => Status == PackVersionStatus.Archived;

    /// <summary>
    /// Whether the version can be published.
    /// </summary>
    public bool CanPublish => Status == PackVersionStatus.Draft;

    /// <summary>
    /// Whether the version can be deprecated.
    /// </summary>
    public bool CanDeprecate => Status == PackVersionStatus.Published;

    /// <summary>
    /// Whether the version can be archived.
    /// </summary>
    public bool CanArchive => Status is PackVersionStatus.Draft or PackVersionStatus.Deprecated;

    /// <summary>
    /// Whether the version is signed.
    /// </summary>
    public bool IsSigned => !string.IsNullOrEmpty(SignatureUri);

    /// <summary>
    /// Creates a copy with updated status.
    /// </summary>
    public PackVersion WithStatus(PackVersionStatus newStatus, string updatedBy, DateTimeOffset updatedAt)
    {
        return this with
        {
            Status = newStatus,
            UpdatedAt = updatedAt,
            UpdatedBy = updatedBy,
            PublishedAt = newStatus == PackVersionStatus.Published ? updatedAt : PublishedAt,
            PublishedBy = newStatus == PackVersionStatus.Published ? updatedBy : PublishedBy
        };
    }

    /// <summary>
    /// Creates a copy with deprecation info.
    /// </summary>
    public PackVersion WithDeprecation(string deprecatedBy, string? reason, DateTimeOffset deprecatedAt)
    {
        return this with
        {
            Status = PackVersionStatus.Deprecated,
            UpdatedAt = deprecatedAt,
            UpdatedBy = deprecatedBy,
            DeprecatedAt = deprecatedAt,
            DeprecatedBy = deprecatedBy,
            DeprecationReason = reason
        };
    }

    /// <summary>
    /// Creates a copy with signature info.
    /// </summary>
    public PackVersion WithSignature(
        string signatureUri,
        string signatureAlgorithm,
        string signedBy,
        DateTimeOffset signedAt)
    {
        return this with
        {
            SignatureUri = signatureUri,
            SignatureAlgorithm = signatureAlgorithm,
            SignedBy = signedBy,
            SignedAt = signedAt,
            UpdatedAt = signedAt,
            UpdatedBy = signedBy
        };
    }

    /// <summary>
    /// Creates a copy with incremented download count.
    /// </summary>
    public PackVersion WithDownload() => this with { DownloadCount = DownloadCount + 1 };
}

/// <summary>
/// Pack version lifecycle status.
/// </summary>
public enum PackVersionStatus
{
    /// <summary>
    /// Version is in draft mode.
    /// </summary>
    Draft = 0,

    /// <summary>
    /// Version is published and available.
    /// </summary>
    Published = 1,

    /// <summary>
    /// Version is deprecated but still available.
    /// </summary>
    Deprecated = 2,

    /// <summary>
    /// Version is archived and no longer available.
    /// </summary>
    Archived = 3
}
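// Lifecycle sketch (illustrative, not part of the committed file): a pack moves
// Draft -> Published only once it holds at least one version; timestamps are
// caller-supplied so the domain stays deterministic. Tenant and actor values
// below are hypothetical placeholders.
internal static class PackLifecycleExample
{
    public static Pack CreateAndPublish(Guid packId, DateTimeOffset now)
    {
        // Create lowercases Name ("My-Pack" -> "my-pack") and starts in Draft.
        var pack = Pack.Create(packId, "tenant-a", null, "My-Pack", "My Pack", null, "ops@example", createdAt: now);
        pack = pack.WithVersionAdded("1.0.0", "ops@example", now);

        // CanPublish requires Draft status and VersionCount > 0.
        return pack.CanPublish ? pack.WithStatus(PackStatus.Published, "ops@example", now) : pack;
    }
}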
@@ -0,0 +1,180 @@
namespace StellaOps.JobEngine.Core.Domain;

/// <summary>
/// Represents an Authority pack execution.
/// Pack runs execute policy automation scripts with log collection and artifact production.
/// </summary>
public sealed record PackRun(
    /// <summary>Unique pack run identifier.</summary>
    Guid PackRunId,

    /// <summary>Tenant owning this pack run.</summary>
    string TenantId,

    /// <summary>Optional project scope within tenant.</summary>
    string? ProjectId,

    /// <summary>Authority pack ID being executed.</summary>
    string PackId,

    /// <summary>Pack version (e.g., "1.2.3", "latest").</summary>
    string PackVersion,

    /// <summary>Current pack run status.</summary>
    PackRunStatus Status,

    /// <summary>Priority (higher = more urgent). Default 0.</summary>
    int Priority,

    /// <summary>Current attempt number (1-based).</summary>
    int Attempt,

    /// <summary>Maximum retry attempts.</summary>
    int MaxAttempts,

    /// <summary>Pack input parameters JSON.</summary>
    string Parameters,

    /// <summary>SHA-256 digest of the parameters for determinism verification.</summary>
    string ParametersDigest,

    /// <summary>Idempotency key for deduplication.</summary>
    string IdempotencyKey,

    /// <summary>Correlation ID for distributed tracing.</summary>
    string? CorrelationId,

    /// <summary>Current lease ID (if leased to a task runner).</summary>
    Guid? LeaseId,

    /// <summary>Task runner executing this pack run.</summary>
    string? TaskRunnerId,

    /// <summary>Lease expiration time.</summary>
    DateTimeOffset? LeaseUntil,

    /// <summary>When the pack run was created.</summary>
    DateTimeOffset CreatedAt,

    /// <summary>When the pack run was scheduled (quota cleared).</summary>
    DateTimeOffset? ScheduledAt,

    /// <summary>When the pack run was leased to a task runner.</summary>
    DateTimeOffset? LeasedAt,

    /// <summary>When the pack run started executing.</summary>
    DateTimeOffset? StartedAt,

    /// <summary>When the pack run completed (terminal state).</summary>
    DateTimeOffset? CompletedAt,

    /// <summary>Earliest time the pack run can be scheduled (for backoff).</summary>
    DateTimeOffset? NotBefore,

    /// <summary>Terminal status reason (failure message, cancel reason, etc.).</summary>
    string? Reason,

    /// <summary>Exit code from pack execution (null if not completed).</summary>
    int? ExitCode,

    /// <summary>Duration of pack execution in milliseconds.</summary>
    long? DurationMs,

    /// <summary>Actor who initiated the pack run.</summary>
    string CreatedBy,

    /// <summary>Optional metadata JSON blob (e.g., trigger info, source context).</summary>
    string? Metadata)
{
    /// <summary>
    /// Creates a new pack run in pending status.
    /// </summary>
    public static PackRun Create(
        Guid packRunId,
        string tenantId,
        string? projectId,
        string packId,
        string packVersion,
        string parameters,
        string parametersDigest,
        string idempotencyKey,
        string? correlationId,
        string createdBy,
        int priority = 0,
        int maxAttempts = 3,
        string? metadata = null,
        DateTimeOffset? createdAt = null)
    {
        return new PackRun(
            PackRunId: packRunId,
            TenantId: tenantId,
            ProjectId: projectId,
            PackId: packId,
            PackVersion: packVersion,
            Status: PackRunStatus.Pending,
            Priority: priority,
            Attempt: 1,
            MaxAttempts: maxAttempts,
            Parameters: parameters,
            ParametersDigest: parametersDigest,
            IdempotencyKey: idempotencyKey,
            CorrelationId: correlationId,
            LeaseId: null,
            TaskRunnerId: null,
            LeaseUntil: null,
            CreatedAt: createdAt ?? throw new ArgumentNullException(nameof(createdAt), "createdAt must be provided for deterministic behavior."),
            ScheduledAt: null,
            LeasedAt: null,
            StartedAt: null,
            CompletedAt: null,
            NotBefore: null,
            Reason: null,
            ExitCode: null,
            DurationMs: null,
            CreatedBy: createdBy,
            Metadata: metadata);
    }

    /// <summary>
    /// Checks if the pack run is in a terminal state.
    /// </summary>
    public bool IsTerminal => Status is PackRunStatus.Succeeded or PackRunStatus.Failed or PackRunStatus.Canceled or PackRunStatus.TimedOut;

    /// <summary>
    /// Checks if the pack run can be retried.
    /// </summary>
    public bool CanRetry => Attempt < MaxAttempts && Status == PackRunStatus.Failed;
}

/// <summary>
/// Pack run lifecycle states.
/// Transitions follow the state machine:
/// Pending → Scheduled → Leased → Running → (Succeeded | Failed | Canceled | TimedOut)
/// Failed pack runs may transition to Pending via retry.
/// </summary>
public enum PackRunStatus
{
    /// <summary>Pack run created but not yet scheduled (e.g., quota exceeded).</summary>
    Pending = 0,

    /// <summary>Pack run scheduled and awaiting task runner lease.</summary>
    Scheduled = 1,

    /// <summary>Pack run leased to a task runner.</summary>
    Leased = 2,

    /// <summary>Pack run is executing (received start signal from runner).</summary>
    Running = 3,

    /// <summary>Pack run completed successfully (exit code 0).</summary>
    Succeeded = 4,

    /// <summary>Pack run failed (non-zero exit or execution error).</summary>
    Failed = 5,

    /// <summary>Pack run canceled by operator or system.</summary>
    Canceled = 6,

    /// <summary>Pack run timed out (lease expired without completion).</summary>
    TimedOut = 7
}
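// Retry-decision sketch (illustrative, not part of the committed file): of the
// four terminal states only Failed is retryable, and then only while attempts
// remain, so a scheduler's requeue check reduces to the two predicates above.
internal static class PackRunRetryExample
{
    public static bool ShouldRequeue(PackRun run)
    {
        // CanRetry already restricts to Failed runs with Attempt < MaxAttempts;
        // the IsTerminal guard documents that only completed runs are considered.
        return run.IsTerminal && run.CanRetry;
    }
}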
@@ -0,0 +1,241 @@
using StellaOps.Cryptography;
using StellaOps.JobEngine.Core.Hashing;
using System.Text;

namespace StellaOps.JobEngine.Core.Domain;

/// <summary>
/// Represents a log entry from a pack run execution.
/// Log entries are append-only and ordered by sequence number within a pack run.
/// </summary>
public sealed record PackRunLog(
    /// <summary>Unique log entry identifier.</summary>
    Guid LogId,

    /// <summary>Tenant owning this log entry.</summary>
    string TenantId,

    /// <summary>Pack run this log belongs to.</summary>
    Guid PackRunId,

    /// <summary>Sequence number within the pack run (0-based, monotonically increasing).</summary>
    long Sequence,

    /// <summary>Log level (info, warn, error, debug, trace).</summary>
    LogLevel Level,

    /// <summary>Log source (e.g., "stdout", "stderr", "system", "pack").</summary>
    string Source,

    /// <summary>Log message content.</summary>
    string Message,

    /// <summary>Canonical SHA-256 digest of the log payload (message+data+metadata).</summary>
    string Digest,

    /// <summary>Size of the log payload in bytes (UTF-8).</summary>
    long SizeBytes,

    /// <summary>When the log entry was created.</summary>
    DateTimeOffset Timestamp,

    /// <summary>Optional structured data JSON (e.g., key-value pairs, metrics).</summary>
    string? Data)
{
    /// <summary>
    /// Creates a new log entry.
    /// Uses the platform's compliance-aware crypto abstraction.
    /// </summary>
    public static PackRunLog Create(
        ICryptoHash cryptoHash,
        Guid packRunId,
        string tenantId,
        long sequence,
        LogLevel level,
        string source,
        string message,
        string? data = null,
        DateTimeOffset? timestamp = null)
    {
        ArgumentNullException.ThrowIfNull(cryptoHash);

        var (digest, sizeBytes) = ComputeDigest(cryptoHash, message, data, tenantId, packRunId, sequence, level, source);

        return new PackRunLog(
            LogId: Guid.NewGuid(),
            TenantId: tenantId,
            PackRunId: packRunId,
            Sequence: sequence,
            Level: level,
            Source: source,
            Message: message,
            Digest: digest,
            SizeBytes: sizeBytes,
            Timestamp: timestamp ?? throw new ArgumentNullException(nameof(timestamp), "timestamp must be provided for deterministic behavior."),
            Data: data);
    }

    /// <summary>
    /// Creates an info-level stdout log entry.
    /// </summary>
    public static PackRunLog Stdout(
        ICryptoHash cryptoHash,
        Guid packRunId,
        string tenantId,
        long sequence,
        string message,
        DateTimeOffset? timestamp = null)
    {
        return Create(cryptoHash, packRunId, tenantId, sequence, LogLevel.Info, "stdout", message, null, timestamp);
    }

    /// <summary>
    /// Creates a warn-level stderr log entry.
    /// </summary>
    public static PackRunLog Stderr(
        ICryptoHash cryptoHash,
        Guid packRunId,
        string tenantId,
        long sequence,
        string message,
        DateTimeOffset? timestamp = null)
    {
        return Create(cryptoHash, packRunId, tenantId, sequence, LogLevel.Warn, "stderr", message, null, timestamp);
    }

    /// <summary>
    /// Creates a system-level log entry (lifecycle events).
    /// </summary>
    public static PackRunLog System(
        ICryptoHash cryptoHash,
        Guid packRunId,
        string tenantId,
        long sequence,
        LogLevel level,
        string message,
        string? data = null,
        DateTimeOffset? timestamp = null)
    {
        return Create(cryptoHash, packRunId, tenantId, sequence, level, "system", message, data, timestamp);
    }

    private static (string Digest, long SizeBytes) ComputeDigest(
        ICryptoHash cryptoHash,
        string message,
        string? data,
        string tenantId,
        Guid packRunId,
        long sequence,
        LogLevel level,
        string source)
    {
        var payload = new
        {
            tenantId,
            packRunId,
            sequence,
            level,
            source,
            message,
            data
        };

        var canonicalJson = CanonicalJsonHasher.ToCanonicalJson(payload);
        var bytes = Encoding.UTF8.GetBytes(canonicalJson);
        var hash = cryptoHash.ComputeHashHexForPurpose(bytes, HashPurpose.Content);

        return (hash, bytes.LongLength);
    }
}

/// <summary>
/// Log levels for pack run logs.
/// </summary>
public enum LogLevel
{
    /// <summary>Trace-level logging (most verbose).</summary>
    Trace = 0,

    /// <summary>Debug-level logging.</summary>
    Debug = 1,

    /// <summary>Informational messages (default for stdout).</summary>
    Info = 2,

    /// <summary>Warning messages (default for stderr).</summary>
    Warn = 3,

    /// <summary>Error messages.</summary>
    Error = 4,

    /// <summary>Fatal/critical errors.</summary>
    Fatal = 5
}

/// <summary>
/// Represents a batch of log entries for efficient streaming/storage.
/// </summary>
public sealed record PackRunLogBatch(
    /// <summary>Pack run ID these logs belong to.</summary>
    Guid PackRunId,

    /// <summary>Tenant owning these logs.</summary>
    string TenantId,

    /// <summary>Starting sequence number of this batch.</summary>
    long StartSequence,

    /// <summary>Log entries in this batch.</summary>
    IReadOnlyList<PackRunLog> Logs)
{
    /// <summary>
    /// Gets the next expected sequence number after this batch.
    /// </summary>
    public long NextSequence => StartSequence + Logs.Count;

    /// <summary>
    /// Creates a batch from a list of logs.
    /// </summary>
    public static PackRunLogBatch FromLogs(Guid packRunId, string tenantId, IReadOnlyList<PackRunLog> logs)
    {
        if (logs.Count == 0)
            return new PackRunLogBatch(packRunId, tenantId, 0, logs);

        return new PackRunLogBatch(packRunId, tenantId, logs[0].Sequence, logs);
    }
}

/// <summary>
/// Represents a log cursor for resumable streaming.
/// </summary>
public sealed record PackRunLogCursor(
    /// <summary>Pack run ID.</summary>
    Guid PackRunId,

    /// <summary>Last seen sequence number.</summary>
    long LastSequence,

    /// <summary>Whether we've reached the end of current logs.</summary>
    bool IsComplete)
{
    /// <summary>
    /// Creates a cursor starting from the beginning.
    /// </summary>
    public static PackRunLogCursor Start(Guid packRunId) => new(packRunId, -1, false);

    /// <summary>
    /// Creates a cursor for resuming from a specific sequence.
    /// </summary>
    public static PackRunLogCursor Resume(Guid packRunId, long lastSequence) => new(packRunId, lastSequence, false);

    /// <summary>
    /// Creates a completed cursor.
    /// </summary>
    public PackRunLogCursor Complete() => this with { IsComplete = true };

    /// <summary>
    /// Advances the cursor to a new sequence.
    /// </summary>
    public PackRunLogCursor Advance(long newSequence) => this with { LastSequence = newSequence };
}
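// Streaming sketch (illustrative, not part of the committed file): a consumer
// advances the cursor past each non-empty batch and marks it complete when the
// store returns nothing further, giving resumable, gap-free log reads.
internal static class PackRunLogStreamExample
{
    public static PackRunLogCursor Consume(PackRunLogCursor cursor, PackRunLogBatch batch)
    {
        if (batch.Logs.Count == 0)
            return cursor.Complete();

        // NextSequence is StartSequence + Logs.Count, so the last entry seen
        // is NextSequence - 1; a later Resume from that value continues cleanly.
        return cursor.Advance(batch.NextSequence - 1);
    }
}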
@@ -0,0 +1,60 @@
namespace StellaOps.JobEngine.Core.Domain;

/// <summary>
/// Represents rate-limit and concurrency quotas for job scheduling.
/// Quotas are scoped to tenant and optionally job type.
/// </summary>
public sealed record Quota(
    /// <summary>Unique quota identifier.</summary>
    Guid QuotaId,

    /// <summary>Tenant this quota applies to.</summary>
    string TenantId,

    /// <summary>Job type this quota applies to (null = all job types).</summary>
    string? JobType,

    /// <summary>Maximum concurrent active (leased) jobs.</summary>
    int MaxActive,

    /// <summary>Maximum jobs per hour (sliding window).</summary>
    int MaxPerHour,

    /// <summary>Burst capacity for token bucket.</summary>
    int BurstCapacity,

    /// <summary>Token refill rate (tokens per second).</summary>
    double RefillRate,

    /// <summary>Current available tokens.</summary>
    double CurrentTokens,

    /// <summary>Last time tokens were refilled.</summary>
    DateTimeOffset LastRefillAt,

    /// <summary>Current count of active (leased) jobs.</summary>
    int CurrentActive,

    /// <summary>Jobs scheduled in current hour window.</summary>
    int CurrentHourCount,

    /// <summary>Start of current hour window.</summary>
    DateTimeOffset CurrentHourStart,

    /// <summary>Whether this quota is currently paused (operator override).</summary>
    bool Paused,

    /// <summary>Operator-provided reason for pause.</summary>
    string? PauseReason,

    /// <summary>Ticket reference for quota change audit.</summary>
    string? QuotaTicket,

    /// <summary>When the quota was created.</summary>
    DateTimeOffset CreatedAt,

    /// <summary>When the quota was last updated.</summary>
    DateTimeOffset UpdatedAt,

    /// <summary>Actor who last modified the quota.</summary>
    string UpdatedBy);
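// Token-bucket sketch (illustrative, not part of the committed file): a pure
// refill function over the Quota fields above. Tokens accrue at RefillRate per
// second since LastRefillAt and are capped at BurstCapacity; passing `now` in
// keeps the calculation deterministic, matching the module's clock discipline.
internal static class QuotaRefillExample
{
    public static Quota Refill(Quota quota, DateTimeOffset now)
    {
        var elapsedSeconds = (now - quota.LastRefillAt).TotalSeconds;
        var tokens = Math.Min(quota.BurstCapacity, quota.CurrentTokens + elapsedSeconds * quota.RefillRate);
        return quota with { CurrentTokens = tokens, LastRefillAt = now };
    }
}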
@@ -0,0 +1,136 @@
namespace StellaOps.JobEngine.Core.Domain;

/// <summary>
/// Represents a quota allocation policy that governs how quota is distributed across tenants.
/// </summary>
public sealed record QuotaAllocationPolicy(
    /// <summary>Unique policy identifier.</summary>
    Guid PolicyId,

    /// <summary>Policy name for operator reference.</summary>
    string Name,

    /// <summary>Policy description.</summary>
    string? Description,

    /// <summary>Allocation strategy type.</summary>
    AllocationStrategy Strategy,

    /// <summary>Total capacity pool to allocate from (for proportional/fair strategies).</summary>
    int TotalCapacity,

    /// <summary>Minimum guaranteed allocation per tenant.</summary>
    int MinimumPerTenant,

    /// <summary>Maximum allocation per tenant (0 = unlimited up to total).</summary>
    int MaximumPerTenant,

    /// <summary>Reserved capacity for high-priority tenants.</summary>
    int ReservedCapacity,

    /// <summary>Whether to allow burst beyond allocation when capacity is available.</summary>
    bool AllowBurst,

    /// <summary>Maximum burst multiplier (e.g., 1.5 = 150% of allocation).</summary>
    double BurstMultiplier,

    /// <summary>Policy priority (higher = evaluated first).</summary>
    int Priority,

    /// <summary>Whether this policy is currently active.</summary>
    bool Active,

    /// <summary>Job type this policy applies to (null = all).</summary>
    string? JobType,

    /// <summary>When the policy was created.</summary>
    DateTimeOffset CreatedAt,

    /// <summary>When the policy was last updated.</summary>
    DateTimeOffset UpdatedAt,

    /// <summary>Actor who last modified the policy.</summary>
    string UpdatedBy);

/// <summary>
/// Quota allocation strategies.
/// </summary>
public enum AllocationStrategy
{
    /// <summary>Equal share for all tenants.</summary>
    Equal = 0,

    /// <summary>Proportional based on tenant weight/tier.</summary>
    Proportional = 1,

    /// <summary>Priority-based with preemption.</summary>
    Priority = 2,

    /// <summary>Reserved minimum with fair sharing of remainder.</summary>
    ReservedWithFairShare = 3,

    /// <summary>Fixed allocation per tenant.</summary>
    Fixed = 4
}

/// <summary>
/// Tenant priority configuration for quota allocation.
/// </summary>
public sealed record TenantQuotaPriority(
    /// <summary>Unique priority record identifier.</summary>
    Guid PriorityId,

    /// <summary>Tenant this priority applies to.</summary>
    string TenantId,

    /// <summary>Policy this priority is associated with.</summary>
    Guid PolicyId,

    /// <summary>Weight for proportional allocation (default 1.0).</summary>
    double Weight,

    /// <summary>Priority tier (1 = highest).</summary>
    int PriorityTier,

    /// <summary>Reserved capacity for this tenant (overrides policy default).</summary>
    int? ReservedCapacity,

    /// <summary>Whether this tenant is eligible for burst.</summary>
    bool BurstEligible,

    /// <summary>When the priority was created.</summary>
    DateTimeOffset CreatedAt,

    /// <summary>When the priority was last updated.</summary>
    DateTimeOffset UpdatedAt,

    /// <summary>Actor who last modified the priority.</summary>
    string UpdatedBy);

/// <summary>
/// Result of a quota allocation calculation.
/// </summary>
public sealed record QuotaAllocationResult(
    /// <summary>Tenant receiving the allocation.</summary>
    string TenantId,

    /// <summary>Allocated quota amount.</summary>
    int AllocatedQuota,

    /// <summary>Burst capacity available (if any).</summary>
    int BurstCapacity,

    /// <summary>Reserved capacity guaranteed.</summary>
    int ReservedCapacity,

    /// <summary>Whether allocation was constrained by limits.</summary>
    bool WasConstrained,

    /// <summary>Constraint reason if applicable.</summary>
    string? ConstraintReason,

    /// <summary>Policy that produced this allocation.</summary>
    Guid PolicyId,

    /// <summary>Time of allocation calculation.</summary>
    DateTimeOffset CalculatedAt);
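// Allocation sketch (illustrative, not part of the committed file): one plausible
// reading of the Equal strategy divides TotalCapacity evenly, then clamps the
// share between MinimumPerTenant and MaximumPerTenant (where 0 means no maximum).
// The actual allocator lives elsewhere; this only demonstrates the field semantics.
internal static class EqualAllocationExample
{
    public static int EqualShare(QuotaAllocationPolicy policy, int tenantCount)
    {
        if (tenantCount <= 0)
            return 0;

        var share = Math.Max(policy.TotalCapacity / tenantCount, policy.MinimumPerTenant);
        return policy.MaximumPerTenant > 0 ? Math.Min(share, policy.MaximumPerTenant) : share;
    }
}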
@@ -0,0 +1,47 @@
using StellaOps.JobEngine.Core.Hashing;
using System.Text.Json.Serialization;

namespace StellaOps.JobEngine.Core.Domain.Replay;

/// <summary>
/// Immutable lock record that captures the exact replay inputs (tooling, policy/graph hashes, seeds, env)
/// and ties them to a specific replay manifest hash. Used to ensure deterministic replays.
/// </summary>
public sealed record ReplayInputsLock(
    [property: JsonPropertyName("schemaVersion")] string SchemaVersion,
    [property: JsonPropertyName("manifestHash")] string ManifestHash,
    [property: JsonPropertyName("createdAt")] DateTimeOffset CreatedAt,
    [property: JsonPropertyName("inputs")] ReplayInputs Inputs,
    [property: JsonPropertyName("notes")] string? Notes = null)
{
    public const string DefaultSchemaVersion = "orch.replay.lock.v1";

    public static ReplayInputsLock Create(
        ReplayManifest manifest,
        CanonicalJsonHasher hasher,
        string? notes = null,
        DateTimeOffset? createdAt = null,
        string schemaVersion = DefaultSchemaVersion)
    {
        ArgumentNullException.ThrowIfNull(manifest);
        ArgumentNullException.ThrowIfNull(hasher);

        return new ReplayInputsLock(
            SchemaVersion: schemaVersion,
            ManifestHash: manifest.ComputeHash(hasher),
            CreatedAt: createdAt ?? throw new ArgumentNullException(nameof(createdAt), "createdAt must be provided for deterministic behavior."),
            Inputs: manifest.Inputs,
            Notes: string.IsNullOrWhiteSpace(notes) ? null : notes);
    }

    /// <summary>
    /// Canonical hash of the lock content.
    /// Uses the platform's compliance-aware crypto abstraction.
    /// </summary>
    public string ComputeHash(CanonicalJsonHasher hasher)
    {
        ArgumentNullException.ThrowIfNull(hasher);
        return hasher.ComputeCanonicalHash(this);
    }
}
@@ -0,0 +1,78 @@
using System.Collections.Immutable;
using System.Text.Json.Serialization;

using StellaOps.JobEngine.Core.Hashing;

namespace StellaOps.JobEngine.Core.Domain.Replay;

/// <summary>
/// Deterministic replay manifest that captures all inputs required to faithfully re-run a job.
/// Aligns with replay-manifest.schema.json and is hashed via canonical JSON.
/// </summary>
public sealed record ReplayManifest(
    [property: JsonPropertyName("schemaVersion")] string SchemaVersion,
    [property: JsonPropertyName("jobId")] string JobId,
    [property: JsonPropertyName("replayOf")] string ReplayOf,
    [property: JsonPropertyName("createdAt")] DateTimeOffset CreatedAt,
    [property: JsonPropertyName("reason")] string? Reason,
    [property: JsonPropertyName("inputs")] ReplayInputs Inputs,
    [property: JsonPropertyName("artifacts")] ImmutableArray<ReplayArtifact> Artifacts)
{
    public static ReplayManifest Create(
        string jobId,
        string replayOf,
        ReplayInputs inputs,
        IEnumerable<ReplayArtifact>? artifacts = null,
        string schemaVersion = "orch.replay.v1",
        string? reason = null,
        DateTimeOffset? createdAt = null)
    {
        ArgumentException.ThrowIfNullOrWhiteSpace(jobId);
        ArgumentException.ThrowIfNullOrWhiteSpace(replayOf);
        ArgumentNullException.ThrowIfNull(inputs);

        return new ReplayManifest(
            SchemaVersion: schemaVersion,
            JobId: jobId,
            ReplayOf: replayOf,
            CreatedAt: createdAt ?? throw new ArgumentNullException(nameof(createdAt), "createdAt must be provided for deterministic behavior."),
            Reason: string.IsNullOrWhiteSpace(reason) ? null : reason,
            Inputs: inputs,
            Artifacts: artifacts is null ? ImmutableArray<ReplayArtifact>.Empty : ImmutableArray.CreateRange(artifacts));
    }

    /// <summary>
    /// Deterministic hash over canonical JSON representation of the manifest.
    /// Uses the platform's compliance-aware crypto abstraction.
    /// </summary>
    public string ComputeHash(CanonicalJsonHasher hasher)
    {
        ArgumentNullException.ThrowIfNull(hasher);
        return hasher.ComputeCanonicalHash(this);
    }
}

public sealed record ReplayInputs(
    [property: JsonPropertyName("policyHash")] string PolicyHash,
    [property: JsonPropertyName("graphRevisionId")] string GraphRevisionId,
    [property: JsonPropertyName("latticeHash")] string? LatticeHash,
    [property: JsonPropertyName("toolImages")] ImmutableArray<string> ToolImages,
    [property: JsonPropertyName("seeds")] ReplaySeeds Seeds,
    [property: JsonPropertyName("timeSource")] ReplayTimeSource TimeSource,
    [property: JsonPropertyName("env")] ImmutableDictionary<string, string> Env);

public sealed record ReplaySeeds(
    [property: JsonPropertyName("rng")] int? Rng,
    [property: JsonPropertyName("sampling")] int? Sampling);

public sealed record ReplayArtifact(
    [property: JsonPropertyName("name")] string Name,
    [property: JsonPropertyName("digest")] string Digest,
    [property: JsonPropertyName("mediaType")] string? MediaType);

[JsonConverter(typeof(JsonStringEnumConverter))]
public enum ReplayTimeSource
{
    monotonic,
    wall
}
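The replay records compose in a fixed order: build the `ReplayInputs`, wrap them in a `ReplayManifest`, then pin the manifest hash with a `ReplayInputsLock`. A minimal sketch of that wiring follows; it is illustrative only — the `hasher` instance, identifiers, and digest strings are placeholder assumptions, not part of the module's API.

```csharp
// Sketch: building a manifest and locking its inputs (all values hypothetical).
// Assumes a CanonicalJsonHasher instance named "hasher" is available.
var inputs = new ReplayInputs(
    PolicyHash: "sha256:aaaa...",            // placeholder digest
    GraphRevisionId: "graph-rev-1",
    LatticeHash: null,
    ToolImages: ImmutableArray.Create("registry.local/scanner@sha256:bbbb..."),
    Seeds: new ReplaySeeds(Rng: 42, Sampling: null),
    TimeSource: ReplayTimeSource.monotonic,
    Env: ImmutableDictionary<string, string>.Empty);

var manifest = ReplayManifest.Create(
    jobId: "job-123",
    replayOf: "job-122",
    inputs: inputs,
    createdAt: DateTimeOffset.Parse("2025-01-01T00:00:00Z"));

// The lock pins the manifest hash so a later replay can detect drifted inputs.
var inputsLock = ReplayInputsLock.Create(manifest, hasher, createdAt: manifest.CreatedAt);
bool unchanged = inputsLock.ManifestHash == manifest.ComputeHash(hasher);
```

Note that `createdAt` is passed explicitly in both calls: the factories deliberately refuse a defaulted wall-clock time so that hashing stays deterministic.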
@@ -0,0 +1,78 @@
namespace StellaOps.JobEngine.Core.Domain;

/// <summary>
/// Represents a run (batch/workflow execution) containing multiple jobs.
/// Runs group related jobs (e.g., scanning an image produces multiple analyzer jobs).
/// </summary>
public sealed record Run(
    /// <summary>Unique run identifier.</summary>
    Guid RunId,

    /// <summary>Tenant owning this run.</summary>
    string TenantId,

    /// <summary>Optional project scope within tenant.</summary>
    string? ProjectId,

    /// <summary>Source that initiated this run.</summary>
    Guid SourceId,

    /// <summary>Run type (e.g., "scan", "advisory-sync", "export").</summary>
    string RunType,

    /// <summary>Current aggregate status of the run.</summary>
    RunStatus Status,

    /// <summary>Correlation ID for distributed tracing.</summary>
    string? CorrelationId,

    /// <summary>Total number of jobs in this run.</summary>
    int TotalJobs,

    /// <summary>Number of completed jobs (succeeded + failed + canceled).</summary>
    int CompletedJobs,

    /// <summary>Number of succeeded jobs.</summary>
    int SucceededJobs,

    /// <summary>Number of failed jobs.</summary>
    int FailedJobs,

    /// <summary>When the run was created.</summary>
    DateTimeOffset CreatedAt,

    /// <summary>When the run started executing (first job leased).</summary>
    DateTimeOffset? StartedAt,

    /// <summary>When the run completed (all jobs terminal).</summary>
    DateTimeOffset? CompletedAt,

    /// <summary>Actor who initiated the run.</summary>
    string CreatedBy,

    /// <summary>Optional metadata JSON blob.</summary>
    string? Metadata);

/// <summary>
/// Run lifecycle states.
/// </summary>
public enum RunStatus
{
    /// <summary>Run created, jobs being enqueued.</summary>
    Pending = 0,

    /// <summary>Run is executing (at least one job leased).</summary>
    Running = 1,

    /// <summary>All jobs completed successfully.</summary>
    Succeeded = 2,

    /// <summary>Run completed with some failures.</summary>
    PartiallySucceeded = 3,

    /// <summary>All jobs failed.</summary>
    Failed = 4,

    /// <summary>Run canceled by operator.</summary>
    Canceled = 5
}
@@ -0,0 +1,342 @@
namespace StellaOps.JobEngine.Core.Domain;

/// <summary>
/// Immutable ledger entry for run execution records.
/// Provides a tamper-evident history of run outcomes with provenance to artifacts.
/// </summary>
public sealed record RunLedgerEntry(
    /// <summary>Unique ledger entry identifier.</summary>
    Guid LedgerId,

    /// <summary>Tenant owning this entry.</summary>
    string TenantId,

    /// <summary>Run this entry records.</summary>
    Guid RunId,

    /// <summary>Source that initiated the run.</summary>
    Guid SourceId,

    /// <summary>Run type (scan, advisory-sync, export).</summary>
    string RunType,

    /// <summary>Final run status.</summary>
    RunStatus FinalStatus,

    /// <summary>Total jobs in the run.</summary>
    int TotalJobs,

    /// <summary>Successfully completed jobs.</summary>
    int SucceededJobs,

    /// <summary>Failed jobs.</summary>
    int FailedJobs,

    /// <summary>When the run was created.</summary>
    DateTimeOffset RunCreatedAt,

    /// <summary>When the run started executing.</summary>
    DateTimeOffset? RunStartedAt,

    /// <summary>When the run completed.</summary>
    DateTimeOffset RunCompletedAt,

    /// <summary>Total execution duration.</summary>
    TimeSpan ExecutionDuration,

    /// <summary>Actor who initiated the run.</summary>
    string InitiatedBy,

    /// <summary>SHA-256 digest of the run's input payload.</summary>
    string InputDigest,

    /// <summary>Aggregated SHA-256 digest of all outputs.</summary>
    string OutputDigest,

    /// <summary>JSON array of artifact references with their digests.</summary>
    string ArtifactManifest,

    /// <summary>Sequence number in the tenant's ledger.</summary>
    long SequenceNumber,

    /// <summary>SHA-256 hash of the previous ledger entry.</summary>
    string? PreviousEntryHash,

    /// <summary>SHA-256 hash of this entry's content.</summary>
    string ContentHash,

    /// <summary>When this ledger entry was created.</summary>
    DateTimeOffset LedgerCreatedAt,

    /// <summary>Correlation ID for tracing.</summary>
    string? CorrelationId,

    /// <summary>Optional metadata JSON.</summary>
    string? Metadata)
{
    /// <summary>
    /// Creates a ledger entry from a completed run.
    /// </summary>
    public static RunLedgerEntry FromCompletedRun(
        Run run,
        IReadOnlyList<Artifact> artifacts,
        string inputDigest,
        long sequenceNumber,
        string? previousEntryHash,
        DateTimeOffset ledgerCreatedAt,
        string? metadata = null)
    {
        if (run.CompletedAt is null)
        {
            throw new InvalidOperationException("Cannot create ledger entry from an incomplete run.");
        }

        var ledgerId = Guid.NewGuid();

        // Build artifact manifest
        var artifactManifest = BuildArtifactManifest(artifacts);

        // Compute output digest from all artifact digests
        var outputDigest = ComputeOutputDigest(artifacts);

        // Compute execution duration
        var startTime = run.StartedAt ?? run.CreatedAt;
        var executionDuration = run.CompletedAt.Value - startTime;

        // Compute content hash for tamper evidence
        var contentToHash = $"{ledgerId}|{run.TenantId}|{run.RunId}|{run.SourceId}|{run.RunType}|{run.Status}|{run.TotalJobs}|{run.SucceededJobs}|{run.FailedJobs}|{run.CreatedAt:O}|{run.StartedAt:O}|{run.CompletedAt:O}|{inputDigest}|{outputDigest}|{sequenceNumber}|{previousEntryHash}|{ledgerCreatedAt:O}";
        var contentHash = ComputeSha256(contentToHash);

        return new RunLedgerEntry(
            LedgerId: ledgerId,
            TenantId: run.TenantId,
            RunId: run.RunId,
            SourceId: run.SourceId,
            RunType: run.RunType,
            FinalStatus: run.Status,
            TotalJobs: run.TotalJobs,
            SucceededJobs: run.SucceededJobs,
            FailedJobs: run.FailedJobs,
            RunCreatedAt: run.CreatedAt,
            RunStartedAt: run.StartedAt,
            RunCompletedAt: run.CompletedAt.Value,
            ExecutionDuration: executionDuration,
            InitiatedBy: run.CreatedBy,
            InputDigest: inputDigest,
            OutputDigest: outputDigest,
            ArtifactManifest: artifactManifest,
            SequenceNumber: sequenceNumber,
            PreviousEntryHash: previousEntryHash,
            ContentHash: contentHash,
            LedgerCreatedAt: ledgerCreatedAt,
            CorrelationId: run.CorrelationId,
            Metadata: metadata);
    }

    /// <summary>
    /// Verifies the integrity of this ledger entry.
    /// </summary>
    public bool VerifyIntegrity()
    {
        var contentToHash = $"{LedgerId}|{TenantId}|{RunId}|{SourceId}|{RunType}|{FinalStatus}|{TotalJobs}|{SucceededJobs}|{FailedJobs}|{RunCreatedAt:O}|{RunStartedAt:O}|{RunCompletedAt:O}|{InputDigest}|{OutputDigest}|{SequenceNumber}|{PreviousEntryHash}|{LedgerCreatedAt:O}";
        var computed = ComputeSha256(contentToHash);
        return string.Equals(ContentHash, computed, StringComparison.OrdinalIgnoreCase);
    }

    /// <summary>
    /// Verifies the chain link to the previous entry.
    /// </summary>
    public bool VerifyChainLink(RunLedgerEntry? previousEntry)
    {
        if (previousEntry is null)
        {
            return PreviousEntryHash is null || SequenceNumber == 1;
        }

        return string.Equals(PreviousEntryHash, previousEntry.ContentHash, StringComparison.OrdinalIgnoreCase);
    }

    private static string BuildArtifactManifest(IReadOnlyList<Artifact> artifacts)
    {
        var entries = artifacts.Select(a => new
        {
            a.ArtifactId,
            a.ArtifactType,
            a.Uri,
            a.Digest,
            a.MimeType,
            a.SizeBytes,
            a.CreatedAt
        });

        return System.Text.Json.JsonSerializer.Serialize(entries);
    }

    private static string ComputeOutputDigest(IReadOnlyList<Artifact> artifacts)
    {
        if (artifacts.Count == 0)
        {
            return ComputeSha256("(no artifacts)");
        }

        // Sort by artifact ID for deterministic ordering
        var sortedDigests = artifacts
            .OrderBy(a => a.ArtifactId)
            .Select(a => a.Digest)
            .ToList();

        var combined = string.Join("|", sortedDigests);
        return ComputeSha256(combined);
    }

    private static string ComputeSha256(string content)
    {
        var bytes = System.Text.Encoding.UTF8.GetBytes(content);
        var hash = System.Security.Cryptography.SHA256.HashData(bytes);
        return Convert.ToHexString(hash).ToLowerInvariant();
    }
}

/// <summary>
/// Represents a ledger export operation.
/// </summary>
public sealed record LedgerExport(
    /// <summary>Unique export identifier.</summary>
    Guid ExportId,

    /// <summary>Tenant requesting the export.</summary>
    string TenantId,

    /// <summary>Export status.</summary>
    LedgerExportStatus Status,

    /// <summary>Export format (json, ndjson, csv).</summary>
    string Format,

    /// <summary>Start of the time range to export.</summary>
    DateTimeOffset? StartTime,

    /// <summary>End of the time range to export.</summary>
    DateTimeOffset? EndTime,

    /// <summary>Run types to include (null = all).</summary>
    string? RunTypeFilter,

    /// <summary>Source ID filter (null = all).</summary>
    Guid? SourceIdFilter,

    /// <summary>Number of entries exported.</summary>
    int EntryCount,

    /// <summary>URI where the export is stored.</summary>
    string? OutputUri,

    /// <summary>SHA-256 digest of the export file.</summary>
    string? OutputDigest,

    /// <summary>Size of the export in bytes.</summary>
    long? OutputSizeBytes,

    /// <summary>Actor who requested the export.</summary>
    string RequestedBy,

    /// <summary>When the export was requested.</summary>
    DateTimeOffset RequestedAt,

    /// <summary>When the export started processing.</summary>
    DateTimeOffset? StartedAt,

    /// <summary>When the export completed.</summary>
    DateTimeOffset? CompletedAt,

    /// <summary>Error message if export failed.</summary>
    string? ErrorMessage)
{
    /// <summary>
    /// Creates a new pending export request.
    /// </summary>
    public static LedgerExport CreateRequest(
        string tenantId,
        string format,
        string requestedBy,
        DateTimeOffset requestedAt,
        DateTimeOffset? startTime = null,
        DateTimeOffset? endTime = null,
        string? runTypeFilter = null,
        Guid? sourceIdFilter = null)
    {
        if (string.IsNullOrWhiteSpace(format))
        {
            throw new ArgumentException("Format is required.", nameof(format));
        }

        var validFormats = new[] { "json", "ndjson", "csv" };
        if (!validFormats.Contains(format.ToLowerInvariant()))
        {
            throw new ArgumentException($"Invalid format. Must be one of: {string.Join(", ", validFormats)}", nameof(format));
        }

        return new LedgerExport(
            ExportId: Guid.NewGuid(),
            TenantId: tenantId,
            Status: LedgerExportStatus.Pending,
            Format: format.ToLowerInvariant(),
            StartTime: startTime,
            EndTime: endTime,
            RunTypeFilter: runTypeFilter,
            SourceIdFilter: sourceIdFilter,
            EntryCount: 0,
            OutputUri: null,
            OutputDigest: null,
            OutputSizeBytes: null,
            RequestedBy: requestedBy,
            RequestedAt: requestedAt,
            StartedAt: null,
            CompletedAt: null,
            ErrorMessage: null);
    }

    /// <summary>
    /// Marks the export as started.
    /// </summary>
    public LedgerExport Start(DateTimeOffset startedAt) => this with
    {
        Status = LedgerExportStatus.Processing,
        StartedAt = startedAt
    };

    /// <summary>
    /// Marks the export as completed.
    /// </summary>
    public LedgerExport Complete(string outputUri, string outputDigest, long outputSizeBytes, int entryCount, DateTimeOffset completedAt) => this with
    {
        Status = LedgerExportStatus.Completed,
        OutputUri = outputUri,
        OutputDigest = outputDigest,
        OutputSizeBytes = outputSizeBytes,
        EntryCount = entryCount,
        CompletedAt = completedAt
    };

    /// <summary>
    /// Marks the export as failed.
    /// </summary>
    public LedgerExport Fail(string errorMessage, DateTimeOffset failedAt) => this with
    {
        Status = LedgerExportStatus.Failed,
        ErrorMessage = errorMessage,
        CompletedAt = failedAt
    };
}

/// <summary>
/// Status of a ledger export operation.
/// </summary>
public enum LedgerExportStatus
{
    Pending = 0,
    Processing = 1,
    Completed = 2,
    Failed = 3,
    Canceled = 4
}
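`VerifyIntegrity` and `VerifyChainLink` compose into a full tamper-evidence check: each entry's `ContentHash` must recompute cleanly, and its `PreviousEntryHash` must point at the prior entry's `ContentHash`. A minimal sketch, assuming `entries` is a tenant's ledger loaded in `SequenceNumber` order:

```csharp
// Illustrative helper, not part of the module: walks a tenant's ledger and
// rejects any entry whose content or chain link has been altered.
static bool VerifyLedgerChain(IReadOnlyList<RunLedgerEntry> entries)
{
    RunLedgerEntry? previous = null;
    foreach (var entry in entries)
    {
        // The entry's stored ContentHash must match the recomputed hash...
        if (!entry.VerifyIntegrity())
        {
            return false;
        }

        // ...and its PreviousEntryHash must chain to the prior entry
        // (or be a valid genesis entry when previous is null).
        if (!entry.VerifyChainLink(previous))
        {
            return false;
        }

        previous = entry;
    }

    return true;
}
```

Because the walk only needs the previous entry at each step, it runs in O(n) time and constant extra memory even for large ledgers.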
@@ -0,0 +1,60 @@
namespace StellaOps.JobEngine.Core.Domain;

/// <summary>
/// Represents a scheduled job trigger (cron-based or interval-based).
/// Schedules automatically create jobs at specified times.
/// </summary>
public sealed record Schedule(
    /// <summary>Unique schedule identifier.</summary>
    Guid ScheduleId,

    /// <summary>Tenant owning this schedule.</summary>
    string TenantId,

    /// <summary>Optional project scope within tenant.</summary>
    string? ProjectId,

    /// <summary>Source that will be used for jobs.</summary>
    Guid SourceId,

    /// <summary>Human-readable schedule name.</summary>
    string Name,

    /// <summary>Job type to create.</summary>
    string JobType,

    /// <summary>Cron expression (6-field with seconds), evaluated in the configured timezone.</summary>
    string CronExpression,

    /// <summary>Timezone for cron evaluation (IANA, e.g., "UTC", "America/New_York").</summary>
    string Timezone,

    /// <summary>Whether the schedule is enabled.</summary>
    bool Enabled,

    /// <summary>Job payload template JSON.</summary>
    string PayloadTemplate,

    /// <summary>Job priority for scheduled jobs.</summary>
    int Priority,

    /// <summary>Maximum retry attempts for scheduled jobs.</summary>
    int MaxAttempts,

    /// <summary>Last time a job was triggered from this schedule.</summary>
    DateTimeOffset? LastTriggeredAt,

    /// <summary>Next scheduled trigger time.</summary>
    DateTimeOffset? NextTriggerAt,

    /// <summary>When the schedule was created.</summary>
    DateTimeOffset CreatedAt,

    /// <summary>When the schedule was last updated.</summary>
    DateTimeOffset UpdatedAt,

    /// <summary>Actor who created the schedule.</summary>
    string CreatedBy,

    /// <summary>Actor who last modified the schedule.</summary>
    string UpdatedBy);
@@ -0,0 +1,432 @@
using System.Text.Json;

namespace StellaOps.JobEngine.Core.Domain;

/// <summary>
/// Signed manifest providing provenance chain from ledger entries to artifacts.
/// Enables verification of artifact authenticity and integrity.
/// </summary>
public sealed record SignedManifest(
    /// <summary>Unique manifest identifier.</summary>
    Guid ManifestId,

    /// <summary>Manifest schema version.</summary>
    string SchemaVersion,

    /// <summary>Tenant owning this manifest.</summary>
    string TenantId,

    /// <summary>Type of provenance (run, export, attestation).</summary>
    ProvenanceType ProvenanceType,

    /// <summary>Subject of the provenance (run ID, export ID, etc.).</summary>
    Guid SubjectId,

    /// <summary>Provenance statements (JSON array).</summary>
    string Statements,

    /// <summary>Artifact references with digests (JSON array).</summary>
    string Artifacts,

    /// <summary>Materials (inputs) used to produce the artifacts (JSON array).</summary>
    string Materials,

    /// <summary>Build environment information (JSON object).</summary>
    string? BuildInfo,

    /// <summary>SHA-256 digest of the manifest payload (excluding signature).</summary>
    string PayloadDigest,

    /// <summary>Signature algorithm used.</summary>
    string SignatureAlgorithm,

    /// <summary>Base64-encoded signature.</summary>
    string Signature,

    /// <summary>Key ID used for signing.</summary>
    string KeyId,

    /// <summary>When the manifest was created.</summary>
    DateTimeOffset CreatedAt,

    /// <summary>Expiration time of the manifest (if any).</summary>
    DateTimeOffset? ExpiresAt,

    /// <summary>Additional metadata (JSON object).</summary>
    string? Metadata)
{
    /// <summary>
    /// Current schema version for manifests.
    /// </summary>
    public const string CurrentSchemaVersion = "1.0.0";

    /// <summary>
    /// Indicates whether the manifest has expired.
    /// A manifest is expired if it has an ExpiresAt value and that time has passed.
    /// </summary>
    public bool IsExpired => ExpiresAt.HasValue && ExpiresAt.Value < DateTimeOffset.UtcNow;

    /// <summary>
    /// Creates an unsigned manifest from a ledger entry.
    /// The manifest must be signed separately using SigningService.
    /// </summary>
    public static SignedManifest CreateFromLedgerEntry(
        RunLedgerEntry ledger,
        DateTimeOffset createdAt,
        string? buildInfo = null,
        string? metadata = null)
    {
        var statements = CreateStatementsFromLedger(ledger);
        var artifacts = ledger.ArtifactManifest;
        var materials = CreateMaterialsFromLedger(ledger);

        var payloadDigest = ComputePayloadDigest(
            ledger.TenantId,
            ProvenanceType.Run,
            ledger.RunId,
            statements,
            artifacts,
            materials);

        return new SignedManifest(
            ManifestId: Guid.NewGuid(),
            SchemaVersion: CurrentSchemaVersion,
            TenantId: ledger.TenantId,
            ProvenanceType: ProvenanceType.Run,
            SubjectId: ledger.RunId,
            Statements: statements,
            Artifacts: artifacts,
            Materials: materials,
            BuildInfo: buildInfo,
            PayloadDigest: payloadDigest,
            SignatureAlgorithm: "none",
            Signature: string.Empty,
            KeyId: string.Empty,
            CreatedAt: createdAt,
            ExpiresAt: null,
            Metadata: metadata);
    }

    /// <summary>
    /// Creates an unsigned manifest from a ledger export.
    /// </summary>
    public static SignedManifest CreateFromExport(
        LedgerExport export,
        IReadOnlyList<RunLedgerEntry> entries,
        DateTimeOffset createdAt,
        string? buildInfo = null,
        string? metadata = null)
    {
        if (export.Status != LedgerExportStatus.Completed)
        {
            throw new InvalidOperationException("Cannot create manifest from incomplete export.");
        }

        var statements = CreateStatementsFromExport(export, entries, createdAt);
        var artifacts = CreateExportArtifacts(export);
        var materials = CreateExportMaterials(entries);

        var payloadDigest = ComputePayloadDigest(
            export.TenantId,
            ProvenanceType.Export,
            export.ExportId,
            statements,
            artifacts,
            materials);

        return new SignedManifest(
            ManifestId: Guid.NewGuid(),
            SchemaVersion: CurrentSchemaVersion,
            TenantId: export.TenantId,
            ProvenanceType: ProvenanceType.Export,
            SubjectId: export.ExportId,
            Statements: statements,
            Artifacts: artifacts,
            Materials: materials,
            BuildInfo: buildInfo,
            PayloadDigest: payloadDigest,
            SignatureAlgorithm: "none",
            Signature: string.Empty,
            KeyId: string.Empty,
            CreatedAt: createdAt,
            ExpiresAt: null,
            Metadata: metadata);
    }

    /// <summary>
    /// Signs the manifest with the provided signature.
    /// </summary>
    public SignedManifest Sign(string signatureAlgorithm, string signature, string keyId, DateTimeOffset? expiresAt = null)
    {
        if (string.IsNullOrWhiteSpace(signatureAlgorithm))
        {
            throw new ArgumentException("Signature algorithm is required.", nameof(signatureAlgorithm));
        }

        if (string.IsNullOrWhiteSpace(signature))
        {
            throw new ArgumentException("Signature is required.", nameof(signature));
        }

        if (string.IsNullOrWhiteSpace(keyId))
        {
            throw new ArgumentException("Key ID is required.", nameof(keyId));
        }

        return this with
        {
            SignatureAlgorithm = signatureAlgorithm,
            Signature = signature,
            KeyId = keyId,
            ExpiresAt = expiresAt
        };
    }

    /// <summary>
    /// Checks if the manifest is signed.
    /// </summary>
    public bool IsSigned => !string.IsNullOrEmpty(Signature) && SignatureAlgorithm != "none";

    /// <summary>
    /// Checks if the manifest has expired at the given time.
    /// </summary>
    public bool IsExpiredAt(DateTimeOffset now) => ExpiresAt.HasValue && ExpiresAt.Value < now;

    /// <summary>
    /// Verifies the payload digest integrity.
    /// </summary>
    public bool VerifyPayloadIntegrity()
    {
        var computed = ComputePayloadDigest(TenantId, ProvenanceType, SubjectId, Statements, Artifacts, Materials);
        return string.Equals(PayloadDigest, computed, StringComparison.OrdinalIgnoreCase);
    }

    /// <summary>
    /// Parses the artifact manifest into typed objects.
    /// </summary>
    public IReadOnlyList<ArtifactReference> GetArtifactReferences()
    {
        if (string.IsNullOrEmpty(Artifacts) || Artifacts == "[]")
        {
            return Array.Empty<ArtifactReference>();
        }

        return JsonSerializer.Deserialize<List<ArtifactReference>>(Artifacts) ?? [];
    }

    /// <summary>
    /// Parses the material manifest into typed objects.
    /// </summary>
    public IReadOnlyList<MaterialReference> GetMaterialReferences()
    {
        if (string.IsNullOrEmpty(Materials) || Materials == "[]")
        {
            return Array.Empty<MaterialReference>();
        }

        return JsonSerializer.Deserialize<List<MaterialReference>>(Materials) ?? [];
    }

    /// <summary>
    /// Parses the statements into typed objects.
    /// </summary>
    public IReadOnlyList<ProvenanceStatement> GetStatements()
    {
        if (string.IsNullOrEmpty(Statements) || Statements == "[]")
        {
            return Array.Empty<ProvenanceStatement>();
        }

        return JsonSerializer.Deserialize<List<ProvenanceStatement>>(Statements) ?? [];
    }

    private static string CreateStatementsFromLedger(RunLedgerEntry ledger)
    {
        var statements = new List<ProvenanceStatement>
        {
            new(
                StatementType: "run_completed",
                Subject: $"run:{ledger.RunId}",
                Predicate: "produced",
                Object: $"outputs:{ledger.OutputDigest}",
                Timestamp: ledger.RunCompletedAt,
                Metadata: JsonSerializer.Serialize(new
                {
                    ledger.RunType,
                    ledger.FinalStatus,
                    ledger.TotalJobs,
                    ledger.SucceededJobs,
                    ledger.FailedJobs,
                    ledger.ExecutionDuration
                })),
            new(
                StatementType: "chain_link",
                Subject: $"ledger:{ledger.LedgerId}",
                Predicate: "follows",
                Object: ledger.PreviousEntryHash ?? "(genesis)",
                Timestamp: ledger.LedgerCreatedAt,
                Metadata: JsonSerializer.Serialize(new
                {
                    ledger.SequenceNumber,
                    ledger.ContentHash
                }))
        };

        return JsonSerializer.Serialize(statements);
    }

    private static string CreateMaterialsFromLedger(RunLedgerEntry ledger)
    {
        var materials = new List<MaterialReference>
        {
            new(
                Uri: $"input:{ledger.RunId}",
                Digest: ledger.InputDigest,
                MediaType: "application/json",
                Name: "run_input")
        };

        return JsonSerializer.Serialize(materials);
    }

    private static string CreateStatementsFromExport(LedgerExport export, IReadOnlyList<RunLedgerEntry> entries, DateTimeOffset createdAt)
    {
        var timestamp = export.CompletedAt ?? createdAt;
        var statements = new List<ProvenanceStatement>
        {
            new(
                StatementType: "export_completed",
                Subject: $"export:{export.ExportId}",
                Predicate: "contains",
                Object: $"entries:{entries.Count}",
                Timestamp: timestamp,
                Metadata: JsonSerializer.Serialize(new
                {
                    export.Format,
                    export.EntryCount,
                    export.StartTime,
                    export.EndTime,
                    export.RunTypeFilter,
                    export.SourceIdFilter
                }))
        };

        // Add chain integrity statement
        if (entries.Count > 0)
        {
            var first = entries.MinBy(e => e.SequenceNumber);
            var last = entries.MaxBy(e => e.SequenceNumber);
            if (first is not null && last is not null)
            {
                statements.Add(new ProvenanceStatement(
                    StatementType: "chain_range",
                    Subject: $"export:{export.ExportId}",
                    Predicate: "covers",
                    Object: $"sequence:{first.SequenceNumber}-{last.SequenceNumber}",
                    Timestamp: timestamp,
                    Metadata: JsonSerializer.Serialize(new
                    {
                        FirstEntryHash = first.ContentHash,
                        LastEntryHash = last.ContentHash
                    })));
            }
        }

        return JsonSerializer.Serialize(statements);
    }

    private static string CreateExportArtifacts(LedgerExport export)
    {
        var artifacts = new List<ArtifactReference>
        {
            new(
                ArtifactId: export.ExportId,
                ArtifactType: "ledger_export",
                Uri: export.OutputUri ?? string.Empty,
                Digest: export.OutputDigest ?? string.Empty,
                MediaType: GetMediaType(export.Format),
                SizeBytes: export.OutputSizeBytes ?? 0)
        };

        return JsonSerializer.Serialize(artifacts);
    }

    private static string CreateExportMaterials(IReadOnlyList<RunLedgerEntry> entries)
    {
        var materials = entries.Select(e => new MaterialReference(
            Uri: $"ledger:{e.LedgerId}",
            Digest: e.ContentHash,
            MediaType: "application/json",
            Name: $"run_{e.RunId}")).ToList();

        return JsonSerializer.Serialize(materials);
    }

    private static string GetMediaType(string format) => format.ToLowerInvariant() switch
    {
        "json" => "application/json",
        "ndjson" => "application/x-ndjson",
        "csv" => "text/csv",
        _ => "application/octet-stream"
    };

    private static string ComputePayloadDigest(
        string tenantId,
        ProvenanceType provenanceType,
        Guid subjectId,
        string statements,
        string artifacts,
        string materials)
    {
        var payload = $"{tenantId}|{provenanceType}|{subjectId}|{statements}|{artifacts}|{materials}";
        var bytes = System.Text.Encoding.UTF8.GetBytes(payload);
        var hash = System.Security.Cryptography.SHA256.HashData(bytes);
        return Convert.ToHexString(hash).ToLowerInvariant();
    }
}

/// <summary>
/// Types of provenance tracked by manifests.
/// </summary>
public enum ProvenanceType
{
    /// <summary>Provenance for a completed run.</summary>
    Run = 0,

    /// <summary>Provenance for a ledger export.</summary>
||||
Export = 1,
|
||||
|
||||
/// <summary>Provenance for an attestation.</summary>
|
||||
Attestation = 2
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// Reference to an artifact in a manifest.
|
||||
/// </summary>
|
||||
public sealed record ArtifactReference(
|
||||
Guid ArtifactId,
|
||||
string ArtifactType,
|
||||
string Uri,
|
||||
string Digest,
|
||||
string MediaType,
|
||||
long SizeBytes);
|
||||
|
||||
/// <summary>
|
||||
/// Reference to a material (input) in a manifest.
|
||||
/// </summary>
|
||||
public sealed record MaterialReference(
|
||||
string Uri,
|
||||
string Digest,
|
||||
string MediaType,
|
||||
string Name);
|
||||
|
||||
/// <summary>
|
||||
/// A provenance statement in a manifest.
|
||||
/// </summary>
|
||||
public sealed record ProvenanceStatement(
|
||||
string StatementType,
|
||||
string Subject,
|
||||
string Predicate,
|
||||
string Object,
|
||||
DateTimeOffset Timestamp,
|
||||
string? Metadata);
|
||||
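The payload digest above is a pipe-delimited concatenation of the six fields, hashed with SHA-256 and rendered as lowercase hex. A minimal language-agnostic sketch of the same scheme in Python (function name and string-typed arguments are illustrative, not part of the module):

```python
import hashlib

def compute_payload_digest(tenant_id: str, provenance_type: str, subject_id: str,
                           statements: str, artifacts: str, materials: str) -> str:
    # Mirror of ComputePayloadDigest: pipe-join the six fields,
    # SHA-256 over the UTF-8 bytes, lowercase hex output.
    payload = "|".join([tenant_id, provenance_type, subject_id,
                        statements, artifacts, materials])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Because the inputs include the already-serialized statements/artifacts/materials JSON, the digest is deterministic only if that serialization is deterministic, which is the point of the engineering rule above.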
@@ -0,0 +1,568 @@
namespace StellaOps.JobEngine.Core.Domain;

/// <summary>
/// Service Level Objective type.
/// </summary>
public enum SloType
{
    /// <summary>Availability SLO (percentage of successful requests).</summary>
    Availability,

    /// <summary>Latency SLO (percentile-based response time).</summary>
    Latency,

    /// <summary>Throughput SLO (minimum jobs processed per period).</summary>
    Throughput
}

/// <summary>
/// Time window for SLO computation.
/// </summary>
public enum SloWindow
{
    /// <summary>Rolling 1 hour window.</summary>
    OneHour,

    /// <summary>Rolling 1 day window.</summary>
    OneDay,

    /// <summary>Rolling 7 day window.</summary>
    SevenDays,

    /// <summary>Rolling 30 day window.</summary>
    ThirtyDays
}

/// <summary>
/// Alert severity for SLO violations.
/// </summary>
public enum AlertSeverity
{
    /// <summary>Informational - SLO approaching threshold.</summary>
    Info,

    /// <summary>Warning - SLO at risk.</summary>
    Warning,

    /// <summary>Critical - SLO likely to be breached.</summary>
    Critical,

    /// <summary>Emergency - SLO breached.</summary>
    Emergency
}

/// <summary>
/// Service Level Objective definition.
/// </summary>
public sealed record Slo(
    /// <summary>Unique SLO identifier.</summary>
    Guid SloId,

    /// <summary>Tenant this SLO belongs to.</summary>
    string TenantId,

    /// <summary>Human-readable name.</summary>
    string Name,

    /// <summary>Optional description.</summary>
    string? Description,

    /// <summary>Type of SLO.</summary>
    SloType Type,

    /// <summary>Job type this SLO applies to (null = all job types).</summary>
    string? JobType,

    /// <summary>Source ID this SLO applies to (null = all sources).</summary>
    Guid? SourceId,

    /// <summary>Target objective (e.g., 0.999 for 99.9% availability).</summary>
    double Target,

    /// <summary>Time window for SLO evaluation.</summary>
    SloWindow Window,

    /// <summary>For latency SLOs: the percentile (e.g., 0.95 for P95).</summary>
    double? LatencyPercentile,

    /// <summary>For latency SLOs: the target latency in seconds.</summary>
    double? LatencyTargetSeconds,

    /// <summary>For throughput SLOs: minimum jobs per period.</summary>
    int? ThroughputMinimum,

    /// <summary>Whether this SLO is actively monitored.</summary>
    bool Enabled,

    /// <summary>When the SLO was created.</summary>
    DateTimeOffset CreatedAt,

    /// <summary>When the SLO was last updated.</summary>
    DateTimeOffset UpdatedAt,

    /// <summary>Actor who created the SLO.</summary>
    string CreatedBy,

    /// <summary>Actor who last modified the SLO.</summary>
    string UpdatedBy)
{
    /// <summary>Calculates the error budget as a decimal (1 - target).</summary>
    public double ErrorBudget => 1.0 - Target;

    /// <summary>Creates a new availability SLO.</summary>
    public static Slo CreateAvailability(
        string tenantId,
        string name,
        double target,
        SloWindow window,
        string createdBy,
        DateTimeOffset createdAt,
        string? description = null,
        string? jobType = null,
        Guid? sourceId = null)
    {
        ValidateTarget(target);

        return new Slo(
            SloId: Guid.NewGuid(),
            TenantId: tenantId,
            Name: name,
            Description: description,
            Type: SloType.Availability,
            JobType: jobType,
            SourceId: sourceId,
            Target: target,
            Window: window,
            LatencyPercentile: null,
            LatencyTargetSeconds: null,
            ThroughputMinimum: null,
            Enabled: true,
            CreatedAt: createdAt,
            UpdatedAt: createdAt,
            CreatedBy: createdBy,
            UpdatedBy: createdBy);
    }

    /// <summary>Creates a new latency SLO.</summary>
    public static Slo CreateLatency(
        string tenantId,
        string name,
        double percentile,
        double targetSeconds,
        double target,
        SloWindow window,
        string createdBy,
        DateTimeOffset createdAt,
        string? description = null,
        string? jobType = null,
        Guid? sourceId = null)
    {
        ValidateTarget(target);
        if (percentile < 0 || percentile > 1)
            throw new ArgumentOutOfRangeException(nameof(percentile), "Percentile must be between 0 and 1");
        if (targetSeconds <= 0)
            throw new ArgumentOutOfRangeException(nameof(targetSeconds), "Target latency must be positive");

        return new Slo(
            SloId: Guid.NewGuid(),
            TenantId: tenantId,
            Name: name,
            Description: description,
            Type: SloType.Latency,
            JobType: jobType,
            SourceId: sourceId,
            Target: target,
            Window: window,
            LatencyPercentile: percentile,
            LatencyTargetSeconds: targetSeconds,
            ThroughputMinimum: null,
            Enabled: true,
            CreatedAt: createdAt,
            UpdatedAt: createdAt,
            CreatedBy: createdBy,
            UpdatedBy: createdBy);
    }

    /// <summary>Creates a new throughput SLO.</summary>
    public static Slo CreateThroughput(
        string tenantId,
        string name,
        int minimum,
        double target,
        SloWindow window,
        string createdBy,
        DateTimeOffset createdAt,
        string? description = null,
        string? jobType = null,
        Guid? sourceId = null)
    {
        ValidateTarget(target);
        if (minimum <= 0)
            throw new ArgumentOutOfRangeException(nameof(minimum), "Throughput minimum must be positive");

        return new Slo(
            SloId: Guid.NewGuid(),
            TenantId: tenantId,
            Name: name,
            Description: description,
            Type: SloType.Throughput,
            JobType: jobType,
            SourceId: sourceId,
            Target: target,
            Window: window,
            LatencyPercentile: null,
            LatencyTargetSeconds: null,
            ThroughputMinimum: minimum,
            Enabled: true,
            CreatedAt: createdAt,
            UpdatedAt: createdAt,
            CreatedBy: createdBy,
            UpdatedBy: createdBy);
    }

    /// <summary>Updates the SLO with new values.</summary>
    public Slo Update(
        DateTimeOffset updatedAt,
        string? name = null,
        string? description = null,
        double? target = null,
        bool? enabled = null,
        string? updatedBy = null)
    {
        if (target.HasValue)
            ValidateTarget(target.Value);

        return this with
        {
            Name = name ?? Name,
            Description = description ?? Description,
            Target = target ?? Target,
            Enabled = enabled ?? Enabled,
            UpdatedAt = updatedAt,
            UpdatedBy = updatedBy ?? UpdatedBy
        };
    }

    /// <summary>Disables the SLO.</summary>
    public Slo Disable(string updatedBy, DateTimeOffset updatedAt) =>
        this with
        {
            Enabled = false,
            UpdatedAt = updatedAt,
            UpdatedBy = updatedBy
        };

    /// <summary>Enables the SLO.</summary>
    public Slo Enable(string updatedBy, DateTimeOffset updatedAt) =>
        this with
        {
            Enabled = true,
            UpdatedAt = updatedAt,
            UpdatedBy = updatedBy
        };

    /// <summary>Gets the window duration as a TimeSpan.</summary>
    public TimeSpan GetWindowDuration() => Window switch
    {
        SloWindow.OneHour => TimeSpan.FromHours(1),
        SloWindow.OneDay => TimeSpan.FromDays(1),
        SloWindow.SevenDays => TimeSpan.FromDays(7),
        SloWindow.ThirtyDays => TimeSpan.FromDays(30),
        _ => throw new InvalidOperationException($"Unknown window: {Window}")
    };

    private static void ValidateTarget(double target)
    {
        if (target <= 0 || target > 1)
            throw new ArgumentOutOfRangeException(nameof(target), "Target must be between 0 (exclusive) and 1 (inclusive)");
    }
}
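The error-budget arithmetic the record exposes (`ErrorBudget => 1.0 - Target`, with the same target validation as `ValidateTarget`) can be sanity-checked with a short Python sketch; the function names here are illustrative, not part of the module:

```python
def error_budget(target: float) -> float:
    # Mirrors Slo.ErrorBudget: the fraction of events allowed to fail.
    if not 0 < target <= 1:
        raise ValueError("Target must be between 0 (exclusive) and 1 (inclusive)")
    return 1.0 - target

def allowed_bad_events(target: float, total_events: int) -> float:
    # How many failures a window of total_events tolerates before the budget is spent.
    return error_budget(target) * total_events
```

For a 99.9% availability target over 1,000,000 requests, roughly 1,000 failures fit inside the budget.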

/// <summary>
/// Current state of an SLO including burn rate and budget consumption.
/// </summary>
public sealed record SloState(
    /// <summary>The SLO this state belongs to.</summary>
    Guid SloId,

    /// <summary>Tenant this state belongs to.</summary>
    string TenantId,

    /// <summary>Current SLI value (actual performance).</summary>
    double CurrentSli,

    /// <summary>Total events/requests in the window.</summary>
    long TotalEvents,

    /// <summary>Good events (successful) in the window.</summary>
    long GoodEvents,

    /// <summary>Bad events (failed) in the window.</summary>
    long BadEvents,

    /// <summary>Error budget consumed (0-1 where 1 = fully consumed).</summary>
    double BudgetConsumed,

    /// <summary>Error budget remaining (0-1 where 1 = fully available).</summary>
    double BudgetRemaining,

    /// <summary>Current burn rate (1.0 = consuming budget at sustainable rate).</summary>
    double BurnRate,

    /// <summary>Projected time until budget exhaustion (null if not burning).</summary>
    TimeSpan? TimeToExhaustion,

    /// <summary>Whether the SLO is currently met.</summary>
    bool IsMet,

    /// <summary>Current alert severity based on budget consumption.</summary>
    AlertSeverity AlertSeverity,

    /// <summary>When this state was computed.</summary>
    DateTimeOffset ComputedAt,

    /// <summary>Start of the evaluation window.</summary>
    DateTimeOffset WindowStart,

    /// <summary>End of the evaluation window.</summary>
    DateTimeOffset WindowEnd)
{
    /// <summary>Creates a state indicating no data is available.</summary>
    public static SloState NoData(Guid sloId, string tenantId, DateTimeOffset now, SloWindow window)
    {
        var windowDuration = GetWindowDuration(window);
        return new SloState(
            SloId: sloId,
            TenantId: tenantId,
            CurrentSli: 1.0, // Assume good when no data
            TotalEvents: 0,
            GoodEvents: 0,
            BadEvents: 0,
            BudgetConsumed: 0,
            BudgetRemaining: 1.0,
            BurnRate: 0,
            TimeToExhaustion: null,
            IsMet: true,
            AlertSeverity: AlertSeverity.Info,
            ComputedAt: now,
            WindowStart: now - windowDuration,
            WindowEnd: now);
    }

    private static TimeSpan GetWindowDuration(SloWindow window) => window switch
    {
        SloWindow.OneHour => TimeSpan.FromHours(1),
        SloWindow.OneDay => TimeSpan.FromDays(1),
        SloWindow.SevenDays => TimeSpan.FromDays(7),
        SloWindow.ThirtyDays => TimeSpan.FromDays(30),
        _ => TimeSpan.FromDays(1)
    };
}
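The record above carries the computed values; the computation itself lives elsewhere in the module. As a hypothetical sketch of the standard SRE formulas consistent with the field comments (these helper names and the exact formulas are assumptions, not the repository's implementation):

```python
def budget_consumed(good: int, total: int, target: float) -> float:
    # Fraction of the error budget spent: observed failure rate
    # divided by the allowed failure rate (1 - target).
    if total == 0:
        return 0.0  # matches SloState.NoData: no events, no budget spent
    bad_rate = (total - good) / total
    return bad_rate / (1.0 - target)

def burn_rate(good: int, total: int, target: float,
              window_fraction_elapsed: float) -> float:
    # 1.0 means the budget would be exactly spent by the end of the window;
    # 2.0 means it would be spent halfway through.
    if window_fraction_elapsed <= 0:
        return 0.0
    return budget_consumed(good, total, target) / window_fraction_elapsed
```

For example, a 99.9% SLO with 999/1000 good events has consumed its whole budget; if that happened halfway through the window, the burn rate is 2.0.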

/// <summary>
/// Alert budget threshold configuration.
/// </summary>
public sealed record AlertBudgetThreshold(
    /// <summary>Unique threshold identifier.</summary>
    Guid ThresholdId,

    /// <summary>SLO this threshold applies to.</summary>
    Guid SloId,

    /// <summary>Tenant this threshold belongs to.</summary>
    string TenantId,

    /// <summary>Budget consumed percentage that triggers this alert (0-1).</summary>
    double BudgetConsumedThreshold,

    /// <summary>Burn rate threshold that triggers this alert.</summary>
    double? BurnRateThreshold,

    /// <summary>Severity of the alert.</summary>
    AlertSeverity Severity,

    /// <summary>Whether this threshold is enabled.</summary>
    bool Enabled,

    /// <summary>Notification channel for this alert.</summary>
    string? NotificationChannel,

    /// <summary>Notification endpoint for this alert.</summary>
    string? NotificationEndpoint,

    /// <summary>Cooldown period between alerts.</summary>
    TimeSpan Cooldown,

    /// <summary>When an alert was last triggered.</summary>
    DateTimeOffset? LastTriggeredAt,

    /// <summary>When the threshold was created.</summary>
    DateTimeOffset CreatedAt,

    /// <summary>When the threshold was last updated.</summary>
    DateTimeOffset UpdatedAt,

    /// <summary>Actor who created the threshold.</summary>
    string CreatedBy,

    /// <summary>Actor who last modified the threshold.</summary>
    string UpdatedBy)
{
    /// <summary>Creates a new alert threshold.</summary>
    public static AlertBudgetThreshold Create(
        Guid sloId,
        string tenantId,
        double budgetConsumedThreshold,
        AlertSeverity severity,
        string createdBy,
        DateTimeOffset createdAt,
        double? burnRateThreshold = null,
        string? notificationChannel = null,
        string? notificationEndpoint = null,
        TimeSpan? cooldown = null)
    {
        if (budgetConsumedThreshold < 0 || budgetConsumedThreshold > 1)
            throw new ArgumentOutOfRangeException(nameof(budgetConsumedThreshold), "Threshold must be between 0 and 1");

        return new AlertBudgetThreshold(
            ThresholdId: Guid.NewGuid(),
            SloId: sloId,
            TenantId: tenantId,
            BudgetConsumedThreshold: budgetConsumedThreshold,
            BurnRateThreshold: burnRateThreshold,
            Severity: severity,
            Enabled: true,
            NotificationChannel: notificationChannel,
            NotificationEndpoint: notificationEndpoint,
            Cooldown: cooldown ?? TimeSpan.FromHours(1),
            LastTriggeredAt: null,
            CreatedAt: createdAt,
            UpdatedAt: createdAt,
            CreatedBy: createdBy,
            UpdatedBy: createdBy);
    }

    /// <summary>Checks if this threshold should trigger based on current state.</summary>
    public bool ShouldTrigger(SloState state, DateTimeOffset now)
    {
        if (!Enabled) return false;

        // Check cooldown
        if (LastTriggeredAt.HasValue && (now - LastTriggeredAt.Value) < Cooldown)
            return false;

        // Check budget consumed threshold
        if (state.BudgetConsumed >= BudgetConsumedThreshold)
            return true;

        // Check burn rate threshold if set
        if (BurnRateThreshold.HasValue && state.BurnRate >= BurnRateThreshold.Value)
            return true;

        return false;
    }

    /// <summary>Records that this threshold was triggered.</summary>
    public AlertBudgetThreshold RecordTrigger(DateTimeOffset now) =>
        this with
        {
            LastTriggeredAt = now,
            UpdatedAt = now
        };
}
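`ShouldTrigger` reduces to three checks: the threshold must be enabled, the cooldown must have elapsed, and either the budget-consumed or (optional) burn-rate threshold must be crossed. A hypothetical Python transcription of that logic, with timestamps flattened to seconds for illustration:

```python
def should_trigger(enabled: bool, last_triggered_at, cooldown_s: float,
                   now_s: float, budget_consumed: float, budget_threshold: float,
                   burn_rate: float = 0.0, burn_threshold=None) -> bool:
    # Same short-circuit order as AlertBudgetThreshold.ShouldTrigger.
    if not enabled:
        return False
    if last_triggered_at is not None and (now_s - last_triggered_at) < cooldown_s:
        return False  # still inside the cooldown window
    if budget_consumed >= budget_threshold:
        return True
    return burn_threshold is not None and burn_rate >= burn_threshold
```

Note that the cooldown check comes first, so a burn-rate spike inside the cooldown window is deliberately suppressed rather than escalated.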

/// <summary>
/// SLO alert event.
/// </summary>
public sealed record SloAlert(
    /// <summary>Unique alert identifier.</summary>
    Guid AlertId,

    /// <summary>SLO this alert relates to.</summary>
    Guid SloId,

    /// <summary>Threshold that triggered this alert.</summary>
    Guid ThresholdId,

    /// <summary>Tenant this alert belongs to.</summary>
    string TenantId,

    /// <summary>Severity of the alert.</summary>
    AlertSeverity Severity,

    /// <summary>Alert message.</summary>
    string Message,

    /// <summary>Budget consumed at time of alert.</summary>
    double BudgetConsumed,

    /// <summary>Burn rate at time of alert.</summary>
    double BurnRate,

    /// <summary>Current SLI value at time of alert.</summary>
    double CurrentSli,

    /// <summary>When the alert was triggered.</summary>
    DateTimeOffset TriggeredAt,

    /// <summary>When the alert was acknowledged (null if not acknowledged).</summary>
    DateTimeOffset? AcknowledgedAt,

    /// <summary>Who acknowledged the alert.</summary>
    string? AcknowledgedBy,

    /// <summary>When the alert was resolved (null if not resolved).</summary>
    DateTimeOffset? ResolvedAt,

    /// <summary>How the alert was resolved.</summary>
    string? ResolutionNotes)
{
    /// <summary>Creates a new alert from an SLO state and threshold.</summary>
    public static SloAlert Create(
        Slo slo,
        SloState state,
        AlertBudgetThreshold threshold)
    {
        var message = threshold.BurnRateThreshold.HasValue && state.BurnRate >= threshold.BurnRateThreshold.Value
            ? FormattableString.Invariant($"SLO '{slo.Name}' burn rate {state.BurnRate:F2}x exceeds threshold {threshold.BurnRateThreshold.Value:F2}x")
            : FormattableString.Invariant($"SLO '{slo.Name}' error budget {state.BudgetConsumed:P1} consumed exceeds threshold {threshold.BudgetConsumedThreshold:P1}");

        return new SloAlert(
            AlertId: Guid.NewGuid(),
            SloId: slo.SloId,
            ThresholdId: threshold.ThresholdId,
            TenantId: slo.TenantId,
            Severity: threshold.Severity,
            Message: message,
            BudgetConsumed: state.BudgetConsumed,
            BurnRate: state.BurnRate,
            CurrentSli: state.CurrentSli,
            TriggeredAt: state.ComputedAt,
            AcknowledgedAt: null,
            AcknowledgedBy: null,
            ResolvedAt: null,
            ResolutionNotes: null);
    }

    /// <summary>Acknowledges the alert.</summary>
    public SloAlert Acknowledge(string acknowledgedBy, DateTimeOffset now) =>
        this with
        {
            AcknowledgedAt = now,
            AcknowledgedBy = acknowledgedBy
        };

    /// <summary>Resolves the alert.</summary>
    public SloAlert Resolve(string notes, DateTimeOffset now) =>
        this with
        {
            ResolvedAt = now,
            ResolutionNotes = notes
        };

    /// <summary>Whether this alert has been acknowledged.</summary>
    public bool IsAcknowledged => AcknowledgedAt.HasValue;

    /// <summary>Whether this alert has been resolved.</summary>
    public bool IsResolved => ResolvedAt.HasValue;
}
@@ -0,0 +1,42 @@
namespace StellaOps.JobEngine.Core.Domain;

/// <summary>
/// Represents a job source (producer) that submits jobs to the orchestrator.
/// Examples: Concelier, Excititor, Scheduler, Export Center, Policy Engine.
/// </summary>
public sealed record Source(
    /// <summary>Unique source identifier.</summary>
    Guid SourceId,

    /// <summary>Tenant owning this source.</summary>
    string TenantId,

    /// <summary>Human-readable source name (e.g., "concelier-nvd").</summary>
    string Name,

    /// <summary>Source type/category (e.g., "advisory-ingest", "scanner", "export").</summary>
    string SourceType,

    /// <summary>Whether the source is currently enabled.</summary>
    bool Enabled,

    /// <summary>Whether the source is paused (throttled by operator).</summary>
    bool Paused,

    /// <summary>Operator-provided reason for pause (if paused).</summary>
    string? PauseReason,

    /// <summary>Ticket reference for pause audit trail.</summary>
    string? PauseTicket,

    /// <summary>Optional configuration JSON blob.</summary>
    string? Configuration,

    /// <summary>When the source was created.</summary>
    DateTimeOffset CreatedAt,

    /// <summary>When the source was last updated.</summary>
    DateTimeOffset UpdatedAt,

    /// <summary>Actor who last modified the source.</summary>
    string UpdatedBy);
@@ -0,0 +1,60 @@
namespace StellaOps.JobEngine.Core.Domain;

/// <summary>
/// Represents a dynamic rate-limit override (throttle) for a source or job type.
/// Throttles are temporary pause/slow-down mechanisms, often in response to upstream pressure.
/// </summary>
public sealed record Throttle(
    /// <summary>Unique throttle identifier.</summary>
    Guid ThrottleId,

    /// <summary>Tenant this throttle applies to.</summary>
    string TenantId,

    /// <summary>Source to throttle (null if job-type scoped).</summary>
    Guid? SourceId,

    /// <summary>Job type to throttle (null if source-scoped).</summary>
    string? JobType,

    /// <summary>Whether this throttle is currently active.</summary>
    bool Active,

    /// <summary>Reason for the throttle (e.g., "429 from upstream", "Manual pause").</summary>
    string Reason,

    /// <summary>Optional ticket reference for audit.</summary>
    string? Ticket,

    /// <summary>When the throttle was created.</summary>
    DateTimeOffset CreatedAt,

    /// <summary>When the throttle expires (null = indefinite).</summary>
    DateTimeOffset? ExpiresAt,

    /// <summary>Actor who created the throttle.</summary>
    string CreatedBy);

/// <summary>
/// Reason categories for throttle creation.
/// </summary>
public static class ThrottleReasons
{
    /// <summary>Upstream returned 429 Too Many Requests.</summary>
    public const string UpstreamRateLimited = "upstream_429";

    /// <summary>Upstream returned 503 Service Unavailable.</summary>
    public const string UpstreamUnavailable = "upstream_503";

    /// <summary>Upstream returned 5xx errors repeatedly.</summary>
    public const string UpstreamErrors = "upstream_5xx";

    /// <summary>Manual operator intervention.</summary>
    public const string ManualPause = "manual_pause";

    /// <summary>Circuit breaker triggered.</summary>
    public const string CircuitBreaker = "circuit_breaker";

    /// <summary>Quota exhausted.</summary>
    public const string QuotaExhausted = "quota_exhausted";
}
@@ -0,0 +1,163 @@
namespace StellaOps.JobEngine.Core.Domain;

/// <summary>
/// Represents an event-time watermark for tracking processing progress.
/// Watermarks are scoped by source, job type, or custom key.
/// </summary>
public sealed record Watermark(
    /// <summary>Unique watermark identifier.</summary>
    Guid WatermarkId,

    /// <summary>Tenant this watermark belongs to.</summary>
    string TenantId,

    /// <summary>Source this watermark tracks (null if job-type scoped).</summary>
    Guid? SourceId,

    /// <summary>Job type this watermark tracks (null if source-scoped).</summary>
    string? JobType,

    /// <summary>Normalized scope key for uniqueness.</summary>
    string ScopeKey,

    /// <summary>Latest processed event time (high watermark).</summary>
    DateTimeOffset HighWatermark,

    /// <summary>Earliest event time in current window (low watermark for windowing).</summary>
    DateTimeOffset? LowWatermark,

    /// <summary>Monotonic sequence number for ordering.</summary>
    long SequenceNumber,

    /// <summary>Total events processed through this watermark.</summary>
    long ProcessedCount,

    /// <summary>SHA-256 hash of last processed batch for integrity verification.</summary>
    string? LastBatchHash,

    /// <summary>When the watermark was created.</summary>
    DateTimeOffset CreatedAt,

    /// <summary>When the watermark was last updated.</summary>
    DateTimeOffset UpdatedAt,

    /// <summary>Actor who last modified the watermark.</summary>
    string UpdatedBy)
{
    /// <summary>
    /// Creates a scope key for source-scoped watermarks.
    /// </summary>
    public static string CreateScopeKey(Guid sourceId) =>
        $"source:{sourceId:N}";

    /// <summary>
    /// Creates a scope key for job-type-scoped watermarks.
    /// </summary>
    public static string CreateScopeKey(string jobType) =>
        $"job_type:{jobType.ToLowerInvariant()}";

    /// <summary>
    /// Creates a scope key for source+job-type scoped watermarks.
    /// </summary>
    public static string CreateScopeKey(Guid sourceId, string jobType) =>
        $"source:{sourceId:N}:job_type:{jobType.ToLowerInvariant()}";

    /// <summary>
    /// Creates a new watermark with initial values.
    /// </summary>
    public static Watermark Create(
        string tenantId,
        Guid? sourceId,
        string? jobType,
        DateTimeOffset highWatermark,
        string createdBy,
        DateTimeOffset createdAt)
    {
        var scopeKey = (sourceId, jobType) switch
        {
            (Guid s, string j) when !string.IsNullOrEmpty(j) => CreateScopeKey(s, j),
            (Guid s, _) => CreateScopeKey(s),
            (_, string j) when !string.IsNullOrEmpty(j) => CreateScopeKey(j),
            _ => throw new ArgumentException("Either sourceId or jobType must be specified.")
        };

        return new Watermark(
            WatermarkId: Guid.NewGuid(),
            TenantId: tenantId,
            SourceId: sourceId,
            JobType: jobType,
            ScopeKey: scopeKey,
            HighWatermark: highWatermark,
            LowWatermark: null,
            SequenceNumber: 0,
            ProcessedCount: 0,
            LastBatchHash: null,
            CreatedAt: createdAt,
            UpdatedAt: createdAt,
            UpdatedBy: createdBy);
    }

    /// <summary>
    /// Advances the watermark after successful batch processing.
    /// </summary>
    public Watermark Advance(
        DateTimeOffset newHighWatermark,
        long eventsProcessed,
        string? batchHash,
        string updatedBy,
        DateTimeOffset updatedAt)
    {
        if (newHighWatermark < HighWatermark)
            throw new ArgumentException("New high watermark cannot be before current high watermark.", nameof(newHighWatermark));

        return this with
        {
            HighWatermark = newHighWatermark,
            SequenceNumber = SequenceNumber + 1,
            ProcessedCount = ProcessedCount + eventsProcessed,
            LastBatchHash = batchHash,
            UpdatedAt = updatedAt,
            UpdatedBy = updatedBy
        };
    }

    /// <summary>
    /// Sets the event-time window bounds.
    /// </summary>
    public Watermark WithWindow(DateTimeOffset lowWatermark, DateTimeOffset highWatermark, DateTimeOffset updatedAt)
    {
        if (highWatermark < lowWatermark)
            throw new ArgumentException("High watermark cannot be before low watermark.");

        return this with
        {
            LowWatermark = lowWatermark,
            HighWatermark = highWatermark,
            UpdatedAt = updatedAt
        };
    }
}

/// <summary>
/// Snapshot of watermark state for observability.
/// </summary>
public sealed record WatermarkSnapshot(
    string ScopeKey,
    DateTimeOffset HighWatermark,
    DateTimeOffset? LowWatermark,
    long SequenceNumber,
    long ProcessedCount,
    TimeSpan? Lag)
{
    /// <summary>
    /// Creates a snapshot from a watermark with calculated lag.
    /// </summary>
    public static WatermarkSnapshot FromWatermark(Watermark watermark, DateTimeOffset now) =>
        new(
            ScopeKey: watermark.ScopeKey,
            HighWatermark: watermark.HighWatermark,
            LowWatermark: watermark.LowWatermark,
            SequenceNumber: watermark.SequenceNumber,
            ProcessedCount: watermark.ProcessedCount,
            Lag: now - watermark.HighWatermark);
}