diff --git a/AGENTS.md b/AGENTS.md
index be402f746..d054d5ed1 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -15,27 +15,20 @@ Unless explicitly told otherwise, assume you are working inside the StellaOps mo
 ---
 
-### 1) What is StellaOps?
+## Project Overview
 
-**StellaOps** is a next-generation, sovereign container-security toolkit built for high-speed, offline operation and released under AGPL-3.0-or-later.
+**Stella Ops Suite** is a self-hostable, sovereign release control plane for non-Kubernetes container estates, released under AGPL-3.0-or-later. It orchestrates environment promotions (Dev → Stage → Prod), gates releases using reachability-aware security and policy, and produces verifiable evidence for every release decision.
 
-StellaOps is a self-hostable, sovereign container-security platform that makes proof—not promises—default. It binds every container digest to content-addressed SBOMs (SPDX 3.0.1 and CycloneDX 1.6), in-toto/DSSE attestations, and optional Sigstore Rekor transparency, then layers deterministic, replayable scanning with entry-trace and VEX-first decisioning.
+The platform combines:
+- **Release orchestration** — UI-driven promotion, approvals, policy gates, rollbacks; hook-able with scripts
+- **Security decisioning as a gate** — Scan on build, evaluate on release, re-evaluate on CVE updates
+- **OCI-digest-first releases** — Immutable digest-based release identity with "what is deployed where" tracking
+- **Toolchain-agnostic integrations** — Plug into any SCM, CI, registry, and secrets system
+- **Auditability + standards** — Evidence packets, SBOM/VEX/attestation support, deterministic replay
 
-“Next-gen” means:
+Existing capabilities (operational): Reproducible vulnerability scanning with VEX-first decisioning, SBOM generation (SPDX 2.2/2.3 and CycloneDX 1.7; SPDX 3.0.1 planned), in-toto/DSSE attestations, and optional Sigstore Rekor transparency. The platform is designed for offline/air-gapped operation with regional crypto support (eIDAS/FIPS/GOST/SM).
 
-* Findings are reproducible and explainable.
-* Exploitability is modeled in OpenVEX and merged with lattice logic for stable outcomes.
-* The same workflow runs online or fully air-gapped.
-
-“Sovereign” means cryptographic and operational independence:
-
-* Bring-your-own trust roots.
-* Regional crypto readiness (eIDAS/FIPS/GOST/SM).
-* Offline bundles and post-quantum-ready modes.
-
-Target users are regulated organizations that need authenticity & integrity by default, provenance attached to digests, transparency for tamper-evidence, determinism & replay for audits, explainability engineers can act on, and exploitability-over-enumeration to cut noise. We minimize trust and blast radius with short-lived keys, least-privilege, and content-addressed caches; we stay air-gap friendly with mirrored feeds; and we keep governance honest with reviewable OPA/Rego policy gates and VEX-based waivers.
-
-More documentation is in `./docs/*.md`. Start with `docs/README.md` to discover available documentation. When needed, you may request specific documents to be provided (e.g., `docs/modules/scanner/architecture.md`).
+Planned capabilities (release orchestration): Environment management, release bundles, promotion workflows, deployment execution (Docker/Compose/ECS/Nomad agents), progressive delivery (A/B, canary), and a three-surface plugin system. See `docs/modules/release-orchestrator/README.md` for the full specification.
 
 ---
diff --git a/CLAUDE.md b/CLAUDE.md
index b59dd10e1..f83bfb416 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -4,7 +4,18 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 
 ## Project Overview
 
-StellaOps is a self-hostable, sovereign container-security platform released under AGPL-3.0-or-later. It provides reproducible vulnerability scanning with VEX-first decisioning, SBOM generation (SPDX 2.2/2.3 and CycloneDX 1.7; SPDX 3.0.1 planned), in-toto/DSSE attestations, and optional Sigstore Rekor transparency. The platform is designed for offline/air-gapped operation with regional crypto support (eIDAS/FIPS/GOST/SM).
+**Stella Ops Suite** is a self-hostable, sovereign release control plane for non-Kubernetes container estates, released under AGPL-3.0-or-later. It orchestrates environment promotions (Dev → Stage → Prod), gates releases using reachability-aware security and policy, and produces verifiable evidence for every release decision.
+
+The platform combines:
+- **Release orchestration** — UI-driven promotion, approvals, policy gates, rollbacks; hook-able with scripts
+- **Security decisioning as a gate** — Scan on build, evaluate on release, re-evaluate on CVE updates
+- **OCI-digest-first releases** — Immutable digest-based release identity with "what is deployed where" tracking
+- **Toolchain-agnostic integrations** — Plug into any SCM, CI, registry, and secrets system
+- **Auditability + standards** — Evidence packets, SBOM/VEX/attestation support, deterministic replay
+
+Existing capabilities (operational): Reproducible vulnerability scanning with VEX-first decisioning, SBOM generation (SPDX 2.2/2.3 and CycloneDX 1.7; SPDX 3.0.1 planned), in-toto/DSSE attestations, and optional Sigstore Rekor transparency. The platform is designed for offline/air-gapped operation with regional crypto support (eIDAS/FIPS/GOST/SM).
+
+Planned capabilities (release orchestration): Environment management, release bundles, promotion workflows, deployment execution (Docker/Compose/ECS/Nomad agents), progressive delivery (A/B, canary), and a three-surface plugin system. See `docs/modules/release-orchestrator/README.md` for the full specification.
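The "OCI-digest-first releases" idea above — a release is a content-addressed bundle of component digests, never a set of mutable tags — can be sketched in a few lines. This is an illustrative Python sketch, not part of the patch or the C# services; `oci_digest`, `release_bundle`, and the sample manifest bytes are hypothetical names for this example:

```python
import hashlib
import json

def oci_digest(manifest_bytes: bytes) -> str:
    """Content-address a blob the way OCI registries do: sha256 over the raw bytes."""
    return "sha256:" + hashlib.sha256(manifest_bytes).hexdigest()

def release_bundle(version: str, components: dict[str, bytes]) -> dict:
    """Pin every component to a digest; the bundle itself is content-addressed too."""
    pinned = {name: oci_digest(blob) for name, blob in sorted(components.items())}
    # Canonical encoding (sorted keys, compact separators) makes the release ID
    # independent of insertion order — the same inputs always yield the same ID.
    body = json.dumps({"version": version, "components": pinned},
                      sort_keys=True, separators=(",", ":"))
    return {"version": version, "components": pinned,
            "release_id": "sha256:" + hashlib.sha256(body.encode()).hexdigest()}

bundle = release_bundle("1.4.0", {
    "api": b'{"schemaVersion":2}',     # stand-ins for real OCI manifest bytes
    "worker": b'{"schemaVersion":2}',
})
assert all(d.startswith("sha256:") for d in bundle["components"].values())
```

Because the release ID is derived from the pinned digests, re-resolving a mutated tag produces a visibly different release rather than silently changing an existing one.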
 ## Build Commands
diff --git a/docs/ARCHITECTURE_OVERVIEW.md b/docs/ARCHITECTURE_OVERVIEW.md
index fcdfb8d52..2e456ad7f 100755
--- a/docs/ARCHITECTURE_OVERVIEW.md
+++ b/docs/ARCHITECTURE_OVERVIEW.md
@@ -1,41 +1,84 @@
 # Architecture Overview (High-Level)
 
-This document is the 10-minute tour for StellaOps: what components exist, how they fit together, and what "offline-first + deterministic + evidence-linked decisions" means in practice.
+This document is the 10-minute tour for Stella Ops Suite: what components exist, how they fit together, and what "release control plane + security gates + evidence-linked decisions" means in practice.
 
 For the full reference map (services, boundaries, detailed flows), see `docs/ARCHITECTURE_REFERENCE.md`.
 
+## What Stella Ops Suite Is
+
+**Stella Ops Suite is a centralized, auditable release control plane for non-Kubernetes container estates.**
+
+It sits between your CI and your runtime targets, governs promotion across environments, enforces security and policy gates, and produces verifiable evidence for every release decision.
+
+```
+CI Build → Registry → Stella (Scan + Release + Promote + Gate + Deploy) → Targets → Evidence
+```
+
 ## Guiding Principles
 
-- **SBOM-first:** scan and reason over SBOMs; fall back to unpacking only when needed.
+- **Digest-first releases:** a release is an immutable set of OCI digests, never mutable tags.
 - **Deterministic replay:** the same inputs yield the same outputs (stable ordering, canonical hashing, UTC timestamps).
-- **Evidence-linked decisions:** policy decisions link back to specific evidence artifacts (SBOM slices, advisory/VEX observations, reachability proofs, attestations).
-- **Aggregation-not-merge:** upstream advisories and VEX are stored and exposed with provenance; conflicts are visible, not silently collapsed.
-- **Offline-first:** the same workflow runs connected or air-gapped via Offline Kit snapshots and signed bundles.
+- **Evidence-linked decisions:** every release decision links to concrete evidence artifacts (scan verdicts, approvals, policy evaluations).
+- **Pluggable everything:** integrations are plugins; the core orchestration engine is stable.
+- **Offline-first:** all core operations work in air-gapped environments.
+- **No feature gating:** all plans include all features; limits are environments + new digests/day.
 
-## System Map (What Runs)
+## System Map
+
+### Release-Centric Flow
 
 ```
-Build -> Sign -> Store -> Scan -> Decide -> Attest -> Notify/Export
+Build → Scan → Create Release → Request Promotion → Gate Evaluation → Deploy → Evidence
+          ↑                                                             ↓
+          └──────────────── Re-evaluate on CVE Updates ────────────────┘
 ```
 
-At a high level, StellaOps is a set of services grouped by responsibility:
+### Platform Themes
 
-- **Identity and authorization:** Authority (OIDC/OAuth2, scopes/tenancy)
-- **Scanning and SBOM:** Scanner WebService + Worker (facts generation)
-- **Advisories:** Concelier (ingest/normalize/export vulnerability sources)
-- **VEX:** Excititor + VEX Lens (VEX observations/linksets and exploration)
-- **Decisioning:** Policy Engine surfaces (lattice-style explainable policy)
-- **Signing and transparency:** Signer + Attestor (DSSE/in-toto and optional transparency)
-- **Orchestration and delivery:** Scheduler, Notify, Export Center
-- **Console:** Web UI for operators and auditors
+Stella Ops Suite organizes capabilities into **themes** (functional areas):
 
-| Tier | Services | Key responsibilities |
+#### Existing Themes (Operational)
+
+| Theme | Purpose | Key Modules |
+|-------|---------|-------------|
+| **INGEST** | Advisory ingestion | Concelier, Advisory-AI |
+| **VEXOPS** | VEX document handling | Excititor, VEX Lens, VEX Hub |
+| **REASON** | Policy and decisioning | Policy Engine, OPA Runtime |
+| **SCANENG** | Scanning and SBOM | Scanner, SBOM Service, Reachability |
+| **EVIDENCE** | Evidence and attestation | Evidence Locker, Attestor, Export Center |
+| **RUNTIME** | Runtime signals | Signals, Graph, Zastava |
+| **JOBCTRL** | Job orchestration | Scheduler, Orchestrator, TaskRunner |
+| **OBSERVE** | Observability | Notifier, Telemetry |
+| **REPLAY** | Deterministic replay | Replay Engine |
+| **DEVEXP** | Developer experience | CLI, Web UI, SDK |
+
+#### Planned Themes (Release Orchestration)
+
+| Theme | Purpose | Key Modules |
+|-------|---------|-------------|
+| **INTHUB** | Integration hub | Integration Manager, Connection Profiles, Connector Runtime, Doctor Checks |
+| **ENVMGR** | Environment management | Environment Manager, Target Registry, Agent Manager, Inventory Sync |
+| **RELMAN** | Release management | Component Registry, Version Manager, Release Manager, Release Catalog |
+| **WORKFL** | Workflow engine | Workflow Designer, Workflow Engine, Step Executor, Step Registry |
+| **PROMOT** | Promotion and approval | Promotion Manager, Approval Gateway, Decision Engine, Gate Registry |
+| **DEPLOY** | Deployment execution | Deploy Orchestrator, Target Executor, Runner Executor, Artifact Generator, Rollback Manager |
+| **AGENTS** | Deployment agents | Agent Core, Agent Docker, Agent Compose, Agent SSH, Agent WinRM, Agent ECS, Agent Nomad |
+| **PROGDL** | Progressive delivery | A/B Manager, Traffic Router, Canary Controller, Rollout Strategy |
+| **RELEVI** | Release evidence | Evidence Collector, Evidence Signer, Sticker Writer, Audit Exporter |
+| **PLUGIN** | Plugin infrastructure | Plugin Registry, Plugin Loader, Plugin Sandbox, Plugin SDK |
+
+### Service Tiers
+
+| Tier | Services | Key Responsibilities |
 |------|----------|----------------------|
-| **Edge / Identity** | `StellaOps.Authority` | Issues short-lived tokens (DPoP + mTLS), exposes OIDC device-code + auth-code flows, rotates JWKS. |
-| **Scan & attest** | `StellaOps.Scanner` (API + Worker), `StellaOps.Signer`, `StellaOps.Attestor` | Accept SBOMs/images, drive analyzers, produce DSSE bundles, optionally log to a Rekor mirror. |
-| **Evidence graph** | `StellaOps.Concelier`, `StellaOps.Excititor`, `StellaOps.Policy.Engine` | Ingest advisories/VEX, correlate linksets, run lattice policy and VEX-first decisioning. |
-| **Experience** | `StellaOps.Web` (Console), `StellaOps.Cli`, `StellaOps.Notify`, `StellaOps.ExportCenter` | Operator UX, automation, notifications, and offline/mirror packaging. |
-| **Data plane** | PostgreSQL, Valkey, RustFS/object storage (optional NATS JetStream) | Canonical store, counters/queues, and artifact storage with deterministic layouts. |
+| **Edge / Identity** | `StellaOps.Authority` | Issues short-lived tokens (DPoP + mTLS), exposes OIDC flows, rotates JWKS |
+| **Release Control** | `StellaOps.ReleaseManager`, `StellaOps.PromotionManager`, `StellaOps.WorkflowEngine` | Release bundles, promotion workflows, gate evaluation (planned) |
+| **Integration Hub** | `StellaOps.IntegrationManager`, `StellaOps.ConnectorRuntime` | SCM/CI/Registry/Vault connectors (planned) |
+| **Scan & Attest** | `StellaOps.Scanner`, `StellaOps.Signer`, `StellaOps.Attestor` | Accept SBOMs/images, produce DSSE bundles, transparency logging |
+| **Evidence Graph** | `StellaOps.Concelier`, `StellaOps.Excititor`, `StellaOps.Policy.Engine` | Advisories/VEX, linksets, lattice policy |
+| **Deployment** | `StellaOps.DeployOrchestrator`, `StellaOps.Agent.*` | Deployment execution to Docker/Compose/ECS/Nomad (planned) |
+| **Experience** | `StellaOps.Web`, `StellaOps.Cli`, `StellaOps.Notify`, `StellaOps.ExportCenter` | Operator UX, automation, notifications |
+| **Data Plane** | PostgreSQL, Valkey, RustFS/object storage | Canonical store, queues, artifact storage |
 
 ## Infrastructure (What Is Required)
 
@@ -50,7 +93,9 @@ At a high level, StellaOps is a set of services grouped by responsibility:
 - **NATS JetStream:** optional messaging transport in some deployments.
 - **Transparency log services:** Rekor mirror (and CA services) when transparency is enabled.
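The deterministic-replay principle stated in the guiding principles (stable ordering, canonical hashing, UTC timestamps) reduces to canonically encoding an artifact before hashing it. A minimal Python sketch, outside the patch, with hypothetical names:

```python
import hashlib
import json
from datetime import datetime, timezone

def canonical_hash(doc: dict) -> str:
    """Sort keys and use a fixed encoding so logically equal inputs hash identically."""
    encoded = json.dumps(doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()

# Timestamps are pinned UTC inputs, never wall-clock "now", so replays stay stable.
scanned_at = datetime(2026, 1, 9, 12, 0, tzinfo=timezone.utc).isoformat()

a = {"image": "sha256:abc", "findings": ["CVE-1", "CVE-2"], "scanned_at": scanned_at}
b = {"scanned_at": scanned_at, "findings": ["CVE-1", "CVE-2"], "image": "sha256:abc"}

# Same content, different insertion order — the canonical hash is identical.
assert canonical_hash(a) == canonical_hash(b)
```

The same discipline (sorted keys, fixed separators, pinned timestamps) is what lets two independent runs over the same frozen inputs produce byte-identical, comparable outputs.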
-## End-to-End Flow (Typical)
+## End-to-End Flows
+
+### Current: Vulnerability Scanning Flow
 
 1. **Evidence enters** via Concelier and Excititor connectors (Aggregation-Only Contract).
 2. **SBOM arrives** from CLI/CI; Scanner deduplicates layers and enqueues work.
@@ -59,22 +104,64 @@
 5. **Signer + Attestor** wrap outputs into DSSE bundles and (optionally) anchor them in a Rekor mirror.
 6. **Console/CLI/Export** surface findings and package verifiable evidence; Notify emits digests/incidents.
 
-## Extension Points (Where You Customize)
+### Planned: Release Orchestration Flow
+
+1. **CI pushes image** to registry by digest; triggers webhook to Stella.
+2. **Stella scans** the new digest and stores the verdict.
+3. **Release created** bundling component digests with semantic version.
+4. **Promotion requested** to move release from Dev → Stage → Prod.
+5. **Gate evaluation** checks: security verdict, approval count, freeze windows, custom policies.
+6. **Decision record** produced with evidence refs and signed.
+7. **Deployment executed** via agent to target (Docker/Compose/ECS/Nomad).
+8. **Version sticker** written to target for drift detection.
+9. **Evidence packet** sealed and stored.
+
+## Extension Points
+
+### Current Extension Points
 
 - **Scanner analyzers** (restart-time plug-ins) for ecosystem-specific parsing and facts extraction.
 - **Concelier connectors** for new advisory sources (preserving aggregation-only guardrails).
 - **Policy packs** for organization-specific gating and waivers/justifications.
 - **Export profiles** for output formats and offline bundle shapes.
 
+### Planned Extension Points (Three-Surface Plugin Model)
+
+Plugins contribute through three surfaces:
+
+1. **Manifest** (static declaration): What the plugin provides (integrations, steps, agents, gates)
+2. **Connector Runtime** (dynamic execution): gRPC interface for runtime operations
+3. **Step Provider** (execution contract): Execution characteristics for workflow steps
+
+Plugin types:
+- **Integration connectors:** SCM, CI, Registry, Vault, Target, Router
+- **Step providers:** Custom workflow steps
+- **Agent types:** New deployment target types
+- **Gate providers:** Custom gate evaluations
+
 ## Offline & Sovereign Notes
 
 - Offline Kit carries vulnerability feeds, container images, signatures, and verification material so the workflow stays identical when air-gapped.
 - Authority + token verification remain local; quota enforcement is verifiable offline.
 - Attestor can cache transparency proofs for offline verification.
+- Evidence packets are exportable for external audit in air-gapped environments.
+- All release decisions can be replayed with frozen inputs.
+
+## Key Architectural Decisions
+
+| Decision | Rationale |
+|----------|-----------|
+| **Digest-first release identity** | Tags are mutable; digests provide immutable release identity for audit |
+| **3-surface plugin model** | Enables extensibility without core code changes |
+| **Compiled C# scripts + sandboxed bash** | C# for complex orchestration; bash for simple hooks |
+| **Agent + agentless execution** | Agent-based preferred for reliability; agentless for adoption |
+| **Evidence packets for every decision** | Enables deterministic replay and audit-grade compliance |
 
 ## References
 
-- `docs/ARCHITECTURE_REFERENCE.md`
-- `docs/OFFLINE_KIT.md`
-- `docs/API_CLI_REFERENCE.md`
-- `docs/modules/platform/architecture-overview.md`
+- `docs/ARCHITECTURE_REFERENCE.md` — Full reference map
+- `docs/modules/release-orchestrator/architecture.md` — Release orchestrator design (planned)
+- `docs/OFFLINE_KIT.md` — Air-gap operations
+- `docs/API_CLI_REFERENCE.md` — API and CLI contracts
+- `docs/modules/platform/architecture-overview.md` — Platform service design
+- `docs/product/advisories/09-Jan-2026 - Stella Ops Orchestrator Architecture.md` — Full orchestrator specification
diff --git a/docs/FEATURE_MATRIX.md b/docs/FEATURE_MATRIX.md
index 9f5d80a18..91e0b7c76 100755
--- a/docs/FEATURE_MATRIX.md
+++ b/docs/FEATURE_MATRIX.md
@@ -1,30 +1,44 @@
-# 4 · Feature Matrix — **Stella Ops**
-*(rev 4.0 · 24 Dec 2025)*
+# Feature Matrix — Stella Ops Suite
+*(rev 5.0 · 09 Jan 2026)*
 
 > **Looking for a quick read?** Check [`key-features.md`](key-features.md) for the short capability cards; this matrix keeps full tier-by-tier detail.
 
 ---
 
-## Pricing Tiers Overview
+## Product Evolution
 
-| Tier | Scans/Day | Registration | Token Refresh | Target User | Price |
-|------|-----------|--------------|---------------|-------------|-------|
-| **Free** | 33 | None | 12h auto | Individual developer | $0 |
-| **Community** | 333 | Required | 30d manual | Startups, small teams (<25) | $0 |
-| **Enterprise** | 2,000+ | SSO/Contract | Annual | Organizations (25+), regulated | Contact Sales |
+**Stella Ops Suite** is a centralized, auditable release control plane for non-Kubernetes container estates. The platform combines release orchestration with security decisioning as a gate.
 
-**Key Differences:**
-- **Free → Community**: 10× quota, deep analysis, Helm/K8s, email alerts, requires registration
-- **Community → Enterprise**: Scale (HA), multi-team (RBAC scopes), automation (CI/CD), support (SLA)
+- **Release orchestration** — UI-driven promotion (Dev → Stage → Prod), approvals, policy gates, rollbacks
+- **Security decisioning as a gate** — Scan on build, evaluate on release, re-evaluate on CVE updates
+- **OCI-digest-first releases** — Immutable digest-based release identity
+- **Evidence packets** — Every release decision is cryptographically signed and stored
+
+---
+
+## Pricing Model
+
+**Principle:** Pay for scale, not for features or automation. No per-seat, per-project, or per-deployment taxes.
+
+| Plan | Price | Environments | New Digests/Day | Deployments | Notes |
+|------|-------|--------------|-----------------|-------------|-------|
+| **Free** | $0/month | 3 | 333 | Unlimited (fair use) | Full features |
+| **Pro** | $699/month | 33 | 3,333 | Unlimited (fair use) | Same features |
+| **Enterprise** | $1,999/month | Unlimited | Unlimited | Unlimited | Fair use on mirroring/audit bandwidth |
+
+**Key Principles:**
+- All plans include all features (no feature gating)
+- Limits are environments + new digests analyzed per day
+- Unlimited deployments with fair use policy
 
 ---
 
 ## Competitive Moat Features
 
-*These differentiators are available across all tiers to build brand and adoption.*
+*These differentiators are available across all plans.*
 
-| Capability | Free | Community | Enterprise | Notes |
-|------------|:----:|:---------:|:----------:|-------|
+| Capability | Free | Pro | Enterprise | Notes |
+|------------|:----:|:---:|:----------:|-------|
 | Signed Replayable Risk Verdicts | ✅ | ✅ | ✅ | Core differentiator |
 | Decision Capsules | ✅ | ✅ | ✅ | Audit-grade evidence bundles |
 | VEX Decisioning Engine | ✅ | ✅ | ✅ | Trust lattice + conflict resolution |
@@ -32,6 +46,79 @@
 | Smart-Diff (Semantic Risk Delta) | ✅ | ✅ | ✅ | Material change detection |
 | Unknowns as First-Class State | ✅ | ✅ | ✅ | Uncertainty budgets |
 | Deterministic Replay | ✅ | ✅ | ✅ | `stella replay srm.yaml` |
+| Non-Kubernetes First-Class | ✅ | ✅ | ✅ | Docker/Compose/ECS/Nomad targets |
+| Digest-First Release Identity | ✅ | ✅ | ✅ | Immutable releases |
+
+---
+
+## Release Orchestration (Planned)
+
+*Release orchestration capabilities are planned for implementation. All plans will include all features.*
+
+| Capability | Free | Pro | Enterprise | Notes |
+|------------|:----:|:---:|:----------:|-------|
+| **Environment Management** | | | | |
+| Environment CRUD | ⏳ | ⏳ | ⏳ | Dev/Stage/Prod definitions |
+| Freeze Windows | ⏳ | ⏳ | ⏳ | Calendar-based blocking |
+| Approval Policies | ⏳ | ⏳ | ⏳ | Per-environment rules |
+| **Release Management** | | | | |
+| Component Registry | ⏳ | ⏳ | ⏳ | Service → repository mapping |
+| Release Bundles | ⏳ | ⏳ | ⏳ | Component → digest bundles |
+| Semantic Versioning | ⏳ | ⏳ | ⏳ | SemVer release versions |
+| Tag → Digest Resolution | ⏳ | ⏳ | ⏳ | Immutable digest pinning |
+| **Promotion & Gates** | | | | |
+| Promotion Workflows | ⏳ | ⏳ | ⏳ | Environment transitions |
+| Security Gate | ⏳ | ⏳ | ⏳ | Scan verdict evaluation |
+| Approval Gate | ⏳ | ⏳ | ⏳ | Human sign-off |
+| Freeze Window Gate | ⏳ | ⏳ | ⏳ | Calendar enforcement |
+| Policy Gate (OPA/Rego) | ⏳ | ⏳ | ⏳ | Custom rules |
+| Decision Records | ⏳ | ⏳ | ⏳ | Evidence-linked decisions |
+| **Deployment Execution** | | | | |
+| Docker Host Agent | ⏳ | ⏳ | ⏳ | Direct container deployment |
+| Compose Host Agent | ⏳ | ⏳ | ⏳ | Docker Compose deployment |
+| SSH Agentless | ⏳ | ⏳ | ⏳ | Linux remote execution |
+| WinRM Agentless | ⏳ | ⏳ | ⏳ | Windows remote execution |
+| ECS Agent | ⏳ | ⏳ | ⏳ | AWS ECS deployment |
+| Nomad Agent | ⏳ | ⏳ | ⏳ | HashiCorp Nomad deployment |
+| Rollback | ⏳ | ⏳ | ⏳ | Previous version restore |
+| **Progressive Delivery** | | | | |
+| A/B Releases | ⏳ | ⏳ | ⏳ | Traffic splitting |
+| Canary Deployments | ⏳ | ⏳ | ⏳ | Gradual rollout |
+| Blue-Green | ⏳ | ⏳ | ⏳ | Zero-downtime switch |
+| Traffic Routing Plugins | ⏳ | ⏳ | ⏳ | Nginx/HAProxy/Traefik/ALB |
+| **Workflow Engine** | | | | |
+| DAG Workflow Execution | ⏳ | ⏳ | ⏳ | Directed acyclic graphs |
+| Step Registry | ⏳ | ⏳ | ⏳ | Built-in + custom steps |
+| Workflow Templates | ⏳ | ⏳ | ⏳ | Reusable workflows |
+| Script Steps (Bash/C#) | ⏳ | ⏳ | ⏳ | Custom automation |
+| **Evidence & Audit** | | | | |
+| Evidence Packets | ⏳ | ⏳ | ⏳ | Sealed decision bundles |
+| Version Stickers | ⏳ | ⏳ | ⏳ | On-target deployment records |
+| Audit Export | ⏳ | ⏳ | ⏳ | Compliance reporting |
+| **Integrations** | | | | |
+| GitHub Integration | ⏳ | ⏳ | ⏳ | SCM + webhooks |
+| GitLab Integration | ⏳ | ⏳ | ⏳ | SCM + webhooks |
+| Harbor Integration | ⏳ | ⏳ | ⏳ | Registry + scanning |
+| HashiCorp Vault | ⏳ | ⏳ | ⏳ | Secrets management |
+| AWS Secrets Manager | ⏳ | ⏳ | ⏳ | Secrets management |
+| **Plugin System** | | | | |
+| Plugin Manifest | ⏳ | ⏳ | ⏳ | Static declarations |
+| Connector Runtime | ⏳ | ⏳ | ⏳ | Dynamic execution |
+| Step Providers | ⏳ | ⏳ | ⏳ | Custom workflow steps |
+| Agent Types | ⏳ | ⏳ | ⏳ | Custom deployment targets |
+
+---
+
+## Plan Limits
+
+| Limit | Free | Pro | Enterprise |
+|-------|:----:|:---:|:----------:|
+| **Environments** | 3 | 33 | Unlimited |
+| **New Digests/Day** | 333 | 3,333 | Unlimited |
+| **Deployments** | Fair use | Fair use | Fair use |
+| **Targets per Environment** | 10 | 100 | Unlimited |
+| **Agents** | 3 | 33 | Unlimited |
+| **Integrations** | 5 | 50 | Unlimited |
 
 ---
diff --git a/docs/README.md b/docs/README.md
index e402681af..d43a3c042 100755
--- a/docs/README.md
+++ b/docs/README.md
@@ -1,6 +1,13 @@
-# StellaOps Documentation
+# Stella Ops Suite Documentation
 
-StellaOps is a deterministic, offline-first container security platform: every verdict links back to concrete evidence (SBOM slices, advisory/VEX observations, reachability proofs, policy explain traces) and can be replayed for audits.
+**Stella Ops Suite** is a centralized, auditable release control plane for non-Kubernetes container estates. It orchestrates environment promotions, gates releases using reachability-aware security and policy, and produces verifiable evidence for every decision.
+
+The platform combines:
+- **Release orchestration** — UI-driven promotion (Dev → Stage → Prod), approvals, policy gates, rollbacks
+- **Security decisioning as a gate** — Scan on build, evaluate on release, re-evaluate on CVE updates
+- **OCI-digest-first releases** — Immutable digest-based release identity with "what is deployed where" tracking
+- **Toolchain-agnostic integrations** — Plug into any SCM, CI, registry, and secrets system
+- **Auditability + standards** — Evidence packets, SBOM/VEX/attestation support, deterministic replay
 
 ## Two Levels of Documentation
 
 This documentation set is internal and does not keep compatibility stubs for old paths.
 
 ## Start Here
 
+### Product Understanding
+
 | Goal | Open this |
 | --- | --- |
 | Understand the product in 2 minutes | [overview.md](overview.md) |
-| Run a first scan (CLI) | [quickstart.md](quickstart.md) |
 | Browse capabilities | [key-features.md](key-features.md) |
+| Feature matrix | [FEATURE_MATRIX.md](FEATURE_MATRIX.md) |
+| Product vision | [product/VISION.md](product/VISION.md) |
 | Roadmap (priorities + definition of "done") | [ROADMAP.md](ROADMAP.md) |
+
+### Getting Started
+
+| Goal | Open this |
+| --- | --- |
+| Run a first scan (CLI) | [quickstart.md](quickstart.md) |
+| Ingest advisories (Concelier + CLI) | [CONCELIER_CLI_QUICKSTART.md](CONCELIER_CLI_QUICKSTART.md) |
+| Console (Web UI) operator guide | [UI_GUIDE.md](UI_GUIDE.md) |
+| Offline / air-gap operations | [OFFLINE_KIT.md](OFFLINE_KIT.md) |
+
+### Architecture
+
+| Goal | Open this |
+| --- | --- |
 | Architecture: high-level overview | [ARCHITECTURE_OVERVIEW.md](ARCHITECTURE_OVERVIEW.md) |
 | Architecture: full reference map | [ARCHITECTURE_REFERENCE.md](ARCHITECTURE_REFERENCE.md) |
 | Architecture: user flows (UML) | [technical/architecture/user-flows.md](technical/architecture/user-flows.md) |
-| Architecture: module matrix (46 modules) | [technical/architecture/module-matrix.md](technical/architecture/module-matrix.md) |
+| Architecture: module matrix | [technical/architecture/module-matrix.md](technical/architecture/module-matrix.md) |
 | Architecture: data flows | [technical/architecture/data-flows.md](technical/architecture/data-flows.md) |
 | Architecture: schema mapping | [technical/architecture/schema-mapping.md](technical/architecture/schema-mapping.md) |
-| Offline / air-gap operations | [OFFLINE_KIT.md](OFFLINE_KIT.md) |
-| Security deployment hardening | [SECURITY_HARDENING_GUIDE.md](SECURITY_HARDENING_GUIDE.md) |
-| Ingest advisories (Concelier + CLI) | [CONCELIER_CLI_QUICKSTART.md](CONCELIER_CLI_QUICKSTART.md) |
+| Release Orchestrator architecture | [modules/release-orchestrator/architecture.md](modules/release-orchestrator/architecture.md) |
+
+### Development & Operations
+
+| Goal | Open this |
+| --- | --- |
 | Develop plugins/connectors | [PLUGIN_SDK_GUIDE.md](PLUGIN_SDK_GUIDE.md) |
-| Console (Web UI) operator guide | [UI_GUIDE.md](UI_GUIDE.md) |
+| Security deployment hardening | [SECURITY_HARDENING_GUIDE.md](SECURITY_HARDENING_GUIDE.md) |
 | VEX consensus and issuer trust | [VEX_CONSENSUS_GUIDE.md](VEX_CONSENSUS_GUIDE.md) |
 | Vulnerability Explorer guide | [VULNERABILITY_EXPLORER_GUIDE.md](VULNERABILITY_EXPLORER_GUIDE.md) |
 
 ## Detailed Indexes
 
 - **Technical index (everything):** [docs/technical/README.md](/docs/technical/)
-- **End-to-end workflow flows:** [docs/flows/](/docs/flows/) (16 detailed flow documents)
+- **End-to-end workflow flows:** [docs/flows/](/docs/flows/)
 - **Module dossiers:** [docs/modules/](/docs/modules/)
 - **API contracts and samples:** [docs/api/](/docs/api/)
 - **Architecture notes / ADRs:** [docs/technical/architecture/](/docs/technical/architecture/), [docs/technical/adr/](/docs/technical/adr/)
-- **Operations and deployment:** [docs/operations/](/docs/operations/), [docs/deploy/](/docs/deploy/), [docs/deployment/](/docs/deployment/)
+- **Operations and deployment:** [docs/operations/](/docs/operations/)
 - **Air-gap workflows:** [docs/modules/airgap/guides/](/docs/modules/airgap/guides/)
 - **Security deep dives:** [docs/security/](/docs/security/)
 - **Benchmarks and fixtures:** [docs/benchmarks/](/docs/benchmarks/), [docs/assets/](/docs/assets/)
+- **Product advisories:** [docs/product/advisories/](/docs/product/advisories/)
 
-## Notes
+## Platform Themes
 
-- The product is **offline-first**: docs and examples should avoid network dependencies and prefer deterministic fixtures.
-- Feature exposure is configuration-driven; module dossiers define authoritative schemas and contracts per component.
+Stella Ops Suite organizes capabilities into themes:
+
+### Existing Themes (Operational)
+
+| Theme | Purpose | Key Modules |
+|-------|---------|-------------|
+| **INGEST** | Advisory ingestion | Concelier, Advisory-AI |
+| **VEXOPS** | VEX document handling | Excititor, VEX Lens, VEX Hub |
+| **REASON** | Policy and decisioning | Policy Engine, OPA Runtime |
+| **SCANENG** | Scanning and SBOM | Scanner, SBOM Service, Reachability |
+| **EVIDENCE** | Evidence and attestation | Evidence Locker, Attestor, Export Center |
+| **RUNTIME** | Runtime signals | Signals, Graph, Zastava |
+| **JOBCTRL** | Job orchestration | Scheduler, Orchestrator, TaskRunner |
+| **OBSERVE** | Observability | Notifier, Telemetry |
+| **REPLAY** | Deterministic replay | Replay Engine |
+| **DEVEXP** | Developer experience | CLI, Web UI, SDK |
+
+### Planned Themes (Release Orchestration)
+
+| Theme | Purpose | Key Modules |
+|-------|---------|-------------|
+| **INTHUB** | Integration hub | Integration Manager, Connection Profiles, Connector Runtime |
+| **ENVMGR** | Environment management | Environment Manager, Target Registry, Agent Manager |
+| **RELMAN** | Release management | Component Registry, Version Manager, Release Manager |
+| **WORKFL** | Workflow engine | Workflow Designer, Workflow Engine, Step Executor |
+| **PROMOT** | Promotion and approval | Promotion Manager, Approval Gateway, Decision Engine |
+| **DEPLOY** | Deployment execution | Deploy Orchestrator, Target Executor, Artifact Generator |
+| **AGENTS** | Deployment agents | Agent Core, Docker/Compose/ECS/Nomad agents |
+| **PROGDL** | Progressive delivery | A/B Manager, Traffic Router, Canary Controller |
+| **RELEVI** | Release evidence | Evidence Collector, Sticker Writer, Audit Exporter |
+| **PLUGIN** | Plugin infrastructure | Plugin Registry, Plugin Loader, Plugin SDK |
+
+## Design Principles
+
+- **Offline-first**: All core operations work in air-gapped environments
+- **Deterministic replay**: Same inputs yield same outputs (stable ordering, canonical hashing)
+- **Evidence-linked decisions**: Every decision links to concrete evidence artifacts
+- **Digest-first release identity**: Releases are immutable OCI digests, not mutable tags
+- **Pluggable everything**: Integrations are plugins; core orchestration is stable
+- **No feature gating**: All plans include all features; limits are environments + new digests/day
diff --git a/docs/ROADMAP.md b/docs/ROADMAP.md
index 94e0d1d14..937e5537b 100755
--- a/docs/ROADMAP.md
+++ b/docs/ROADMAP.md
@@ -1,34 +1,112 @@
 # Roadmap
 
-This repository is the source of truth for StellaOps direction. The roadmap is expressed as stable, evidence-based capability milestones (not calendar promises) so it stays correct during long audits and offline operation.
+This repository is the source of truth for Stella Ops Suite direction. The roadmap is expressed as stable, evidence-based capability milestones (not calendar promises) so it stays correct during long audits and offline operation.
 
-## How to read this
-- **Now / Next / Later** are priority bands, not dates.
-- A capability is "done" when the required evidence exists and is reproducible (see `docs/product/roadmap/maturity-model.md`).
+## Strategic Direction
 
-## Now (Foundation)
-- Deterministic scan pipeline: image -> SBOMs (SPDX 3.0.1 + CycloneDX 1.7) with stable identifiers and replayable outputs.
-- Advisory ingestion with offline-friendly mirrors, normalization, and deterministic merges.
-- VEX-first triage: OpenVEX ingestion/consensus with explainable, stable verdicts.
-- Policy gates: deterministic policy evaluation (OPA/Rego where applicable) with audit-friendly decision traces.
-- Offline Kit workflows (bundle -> import -> verify) with signed artifacts and deterministic indexes.
+**Stella Ops Suite** is evolving from a vulnerability scanning platform into a **centralized, auditable release control plane** for non-Kubernetes container estates. The existing scanning capabilities become security gates within release orchestration.
 
-## Next (Hardening)
-- Multi-tenant isolation (tenancy boundaries + RLS where applicable) and an audit trail built for replay.
-- Signing and provenance hardening: DSSE/in-toto everywhere; configurable crypto profiles (FIPS/GOST/SM) where enabled.
-- Determinism gates and replay tests in CI to prevent output drift across time and environments.
+- **Release orchestration** — UI-driven promotion (Dev → Stage → Prod), approvals, policy gates, rollbacks
+- **Security decisioning as a gate** — Scan on build, evaluate on release, re-evaluate on CVE updates
+- **OCI-digest-first releases** — Immutable digest-based release identity
+- **Non-Kubernetes specialization** — Docker hosts, Compose, ECS, Nomad as first-class targets
 
-## Later (Ecosystem)
-- Wider connector/plugin ecosystem, operator tooling, and SDKs.
-- Expanded graph/reachability capabilities and export/pack formats for regulated environments.
+## How to Read This
-## Detailed breakdown
-- `docs/product/roadmap/README.md`
-- `docs/product/roadmap/maturity-model.md`
+- **Operational** = capabilities that are implemented and working
+- **Now / Next / Later** = priority bands for new development (not calendar dates)
+- A capability is "done" when the required evidence exists and is reproducible (see `docs/product/roadmap/maturity-model.md`)
-## Related high-level docs
-- `docs/VISION.md`
-- `docs/FEATURE_MATRIX.md`
-- `docs/ARCHITECTURE_OVERVIEW.md`
-- `docs/OFFLINE_KIT.md`
-- `docs/key-features.md`
+---
+
+## Operational (Existing Capabilities)
+
+These capabilities are implemented and serve as the foundation for security gates:
+
+- **Deterministic scan pipeline** — Image → SBOMs (SPDX 2.2/2.3 + CycloneDX 1.7; SPDX 3.0.1 planned) with stable identifiers and replayable outputs
+- **Advisory ingestion** — Offline-friendly mirrors, normalization, deterministic merges (Concelier)
+- **VEX-first triage** — OpenVEX ingestion/consensus with explainable, stable verdicts (VEX Lens)
+- **Policy gates** — Deterministic policy evaluation (OPA/Rego) with audit-friendly decision traces
+- **Offline Kit workflows** — Bundle → import → verify with signed artifacts and deterministic indexes
+- **Signing and provenance** — DSSE/in-toto attestations; configurable crypto profiles (FIPS/eIDAS/GOST/SM)
+- **Determinism guarantees** — Replay tests in CI; frozen feeds; stable ordering
+
+---
+
+## Now (Release Orchestration Foundation)
+
+Priority: Building the core release orchestration infrastructure.
+ +### Phase 1: Foundation +- **Environment management** — Environment CRUD, freeze windows, approval policies +- **Integration hub** — Connection profiles, basic connectors (GitHub, Harbor) +- **Release bundles** — Component registry, release creation, tag → digest resolution +- **Database schemas** — Core release, environment, target tables + +### Phase 2: Workflow Engine +- **DAG execution** — Directed acyclic graph workflow processing +- **Step registry** — Built-in steps (script, approval, deploy, gate) +- **Workflow templates** — Reusable workflow definitions +- **Script execution** — C# compiled scripts + sandboxed bash + +--- + +## Next (Promotion & Deployment) + +Priority: Enabling end-to-end release flow. + +### Phase 3: Promotion & Decision +- **Approval gateway** — Approval collection, separation of duties +- **Security gates** — Integration with scan verdicts for gate evaluation +- **Decision engine** — Gate aggregation, decision record generation +- **Evidence packets** — Sealed, signed evidence bundles + +### Phase 4: Deployment Execution +- **Agent framework** — Core agent infrastructure, heartbeat, capability advertisement +- **Docker/Compose agents** — Agent-based deployment to Docker and Compose targets +- **Artifact generation** — `compose.stella.lock.yml`, deployment scripts +- **Rollback support** — Previous version restoration +- **Version stickers** — On-target deployment records for drift detection + +### Phase 5: UI & Polish +- **Release dashboard** — Release list, status, promotion history +- **Promotion UI** — Request, approve, track promotions +- **Environment management UI** — Environment configuration, freeze windows + +--- + +## Later (Advanced Capabilities) + +Priority: Expanding target support and delivery strategies. 
+ +### Phase 6: Progressive Delivery +- **A/B releases** — Traffic splitting between versions +- **Canary deployments** — Gradual rollout with health checks +- **Traffic routing plugins** — Nginx, HAProxy, Traefik, AWS ALB integration + +### Phase 7: Extended Targets +- **ECS agent** — AWS ECS service deployment +- **Nomad agent** — HashiCorp Nomad job deployment +- **SSH/WinRM agentless** — Remote execution without installed agent + +### Phase 8: Plugin Ecosystem +- **Full plugin system** — Three-surface plugin model (manifest, connector, step provider) +- **Plugin SDK** — Development kit for custom integrations +- **Additional connectors** — Expanded SCM, CI, registry, vault support + +--- + +## Detailed Breakdown + +- `docs/product/roadmap/README.md` — Detailed roadmap documentation +- `docs/product/roadmap/maturity-model.md` — Capability maturity definitions +- `docs/modules/release-orchestrator/architecture.md` — Release orchestrator architecture + +## Related Documents + +- [Product Vision](product/VISION.md) +- [Architecture Overview](ARCHITECTURE_OVERVIEW.md) +- [Feature Matrix](FEATURE_MATRIX.md) +- [Key Features](key-features.md) +- [Offline Kit](OFFLINE_KIT.md) +- [Release Orchestrator Specification](product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md) diff --git a/docs/implplan/SPRINT_20260110_100_000_INDEX_plugin_unification.md b/docs/implplan/SPRINT_20260110_100_000_INDEX_plugin_unification.md new file mode 100644 index 000000000..687201b88 --- /dev/null +++ b/docs/implplan/SPRINT_20260110_100_000_INDEX_plugin_unification.md @@ -0,0 +1,574 @@ +# SPRINT INDEX: Phase 100 - Plugin System Unification + +> **Epic:** Platform Foundation +> **Phase:** 100 - Plugin System Unification +> **Batch:** 100 +> **Status:** TODO +> **Successor:** [101_000_INDEX](SPRINT_20260110_101_000_INDEX_foundation.md) (Release Orchestrator Foundation) + +--- + +## Executive Summary + +Phase 100 establishes a **unified plugin architecture** for the 
entire Stella Ops platform. This phase reworks all existing plugin systems (Crypto, Auth, LLM, SCM, Scanner, Router, Concelier) into a single, cohesive model that supports:
+
+- **Trust-based execution** - Built-in plugins run in-process; untrusted plugins run sandboxed
+- **Capability composition** - Plugins declare and implement multiple capabilities
+- **Database-backed registry** - Centralized plugin management with health tracking
+- **Full lifecycle management** - Discovery, loading, initialization, health monitoring, graceful shutdown
+- **Multi-tenant isolation** - Per-tenant plugin instances with separate configurations
+
+This unification is a **prerequisite** for the Release Orchestrator (Phase 101+), which extends the plugin system with workflow steps, gates, and orchestration-specific connectors.
+
+---
+
+## Strategic Rationale
+
+### Why Unify Now?
+
+1. **Technical Debt Reduction** - Seven disparate plugin patterns create maintenance burden
+2. **Security Posture** - Unified trust model enables consistent security enforcement
+3. **Developer Experience** - Single SDK for all plugin development
+4. **Observability** - Centralized registry enables unified health monitoring
+5.
**Future Extensibility** - Release Orchestrator requires robust plugin infrastructure
+
+### Current State Analysis
+
+| Plugin Type | Location | Interface | Pattern | Issues |
+|-------------|----------|-----------|---------|--------|
+| Crypto | `src/Cryptography/` | `ICryptoProvider` | Simple DI | No lifecycle, no health checks |
+| Authority | `src/Authority/` | Various | Config-driven | Inconsistent interfaces |
+| LLM | `src/AdvisoryAI/` | `ILlmProviderPlugin` | Priority selection | No isolation |
+| SCM | `src/Integrations/` | `IScmConnectorPlugin` | Factory + auto-detect | No registry |
+| Scanner | `src/Scanner/` | Analyzer interfaces | Pipeline | Tightly coupled |
+| Router | `src/Router/` | `IRouterTransportPlugin` | Transport abstraction | No health tracking |
+| Concelier | `src/Concelier/` | `IConcelierConnector` | Feed ingestion | No unified lifecycle |
+
+### Target State
+
+All plugins implement:
+```csharp
+public interface IPlugin : IAsyncDisposable
+{
+    PluginInfo Info { get; }
+    PluginTrustLevel TrustLevel { get; }
+    PluginCapabilities Capabilities { get; }
+    Task InitializeAsync(IPluginContext context, CancellationToken ct);
+    Task HealthCheckAsync(CancellationToken ct);
+}
+```
+
+With capability-specific interfaces:
+```csharp
+// Crypto capability
+public interface ICryptoCapability { ... }
+
+// Connector capability
+public interface IConnectorCapability { ... }
+
+// Analysis capability
+public interface IAnalysisCapability { ... }
+
+// Transport capability
+public interface ITransportCapability { ...
} +``` + +--- + +## Architecture Overview + +``` +┌─────────────────────────────────────────────────────────────────────────────────┐ +│ UNIFIED PLUGIN ARCHITECTURE │ +│ │ +│ ┌────────────────────────────────────────────────────────────────────────────┐ │ +│ │ StellaOps.Plugin.Abstractions │ │ +│ │ │ │ +│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │ +│ │ │ IPlugin │ │ PluginInfo │ │ TrustLevel │ │ Capabilities│ │ │ +│ │ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │ │ +│ │ │ │ +│ │ ┌─────────────────────────────────────────────────────────────────────┐ │ │ +│ │ │ Capability Interfaces │ │ │ +│ │ │ │ │ │ +│ │ │ ICryptoCapability IConnectorCapability IAnalysisCapability │ │ │ +│ │ │ IAuthCapability ITransportCapability ILlmCapability │ │ │ +│ │ │ IStepProviderCapability IGateProviderCapability │ │ │ +│ │ └─────────────────────────────────────────────────────────────────────┘ │ │ +│ └────────────────────────────────────────────────────────────────────────────┘ │ +│ │ │ +│ ▼ │ +│ ┌────────────────────────────────────────────────────────────────────────────┐ │ +│ │ StellaOps.Plugin.Host │ │ +│ │ │ │ +│ │ ┌───────────────────┐ ┌───────────────────┐ ┌───────────────────┐ │ │ +│ │ │ PluginDiscovery │ │ PluginLoader │ │ PluginRegistry │ │ │ +│ │ │ - File system │ │ - Assembly load │ │ - Database │ │ │ +│ │ │ - Manifest parse │ │ - Type activate │ │ - Health track │ │ │ +│ │ └───────────────────┘ └───────────────────┘ └───────────────────┘ │ │ +│ │ │ │ +│ │ ┌───────────────────┐ ┌───────────────────┐ ┌───────────────────┐ │ │ +│ │ │LifecycleManager │ │ PluginContext │ │ HealthMonitor │ │ │ +│ │ │ - State machine │ │ - Config bind │ │ - Periodic check │ │ │ +│ │ │ - Graceful stop │ │ - Service access │ │ - Alert on fail │ │ │ +│ │ └───────────────────┘ └───────────────────┘ └───────────────────┘ │ │ +│ └────────────────────────────────────────────────────────────────────────────┘ │ +│ │ │ +│ 
┌─────────────────────────┼─────────────────────────┐ │ +│ │ │ │ │ +│ ▼ ▼ ▼ │ +│ ┌────────────────────┐ ┌────────────────────┐ ┌────────────────────┐ │ +│ │ In-Process │ │ Isolated │ │ Sandboxed │ │ +│ │ Execution │ │ Execution │ │ Execution │ │ +│ │ │ │ │ │ │ │ +│ │ TrustLevel.BuiltIn│ │ TrustLevel.Trusted │ │TrustLevel.Untrusted│ │ +│ │ - Direct calls │ │ - AppDomain/ALC │ │ - Process isolation│ │ +│ │ - Shared memory │ │ - Resource limits │ │ - gRPC boundary │ │ +│ │ - No overhead │ │ - Moderate overhead│ │ - Full sandboxing │ │ +│ └────────────────────┘ └────────────────────┘ └────────────────────┘ │ +│ │ +│ ┌────────────────────────────────────────────────────────────────────────────┐ │ +│ │ StellaOps.Plugin.Sandbox │ │ +│ │ │ │ +│ │ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │ │ +│ │ │ ProcessManager │ │ ResourceLimiter │ │ NetworkPolicy │ │ │ +│ │ │ - Spawn/kill │ │ - CPU/memory │ │ - Allow/block │ │ │ +│ │ │ - Health watch │ │ - Disk/network │ │ - Rate limit │ │ │ +│ │ └─────────────────┘ └─────────────────┘ └─────────────────┘ │ │ +│ │ │ │ +│ │ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │ │ +│ │ │ GrpcBridge │ │ SecretProxy │ │ LogCollector │ │ │ +│ │ │ - Method call │ │ - Vault access │ │ - Structured │ │ │ +│ │ │ - Streaming │ │ - Scoped access │ │ - Rate limited │ │ │ +│ │ └─────────────────┘ └─────────────────┘ └─────────────────┘ │ │ +│ └────────────────────────────────────────────────────────────────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────────┘ + + REWORKED PLUGINS +┌─────────────────────────────────────────────────────────────────────────────────┐ +│ │ +│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ +│ │ Crypto │ │ Auth │ │ LLM │ │ SCM │ │ +│ │ Plugins │ │ Plugins │ │ Plugins │ │ Connectors │ │ +│ │ │ │ │ │ │ │ │ │ +│ │ - GOST │ │ - LDAP │ │ - llama │ │ - GitHub │ │ +│ │ - eIDAS │ │ - OIDC │ │ - ollama │ │ - GitLab │ │ +│ │ - 
SM2/3/4 │ │ - SAML │ │ - OpenAI │ │ - AzDO │ │ +│ │ - FIPS │ │ - Workforce │ │ - Claude │ │ - Gitea │ │ +│ │ - HSM │ │ │ │ │ │ │ │ +│ └──────────────┘ └──────────────┘ └──────────────┘ └──────────────┘ │ +│ │ +│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ +│ │ Scanner │ │ Router │ │ Concelier │ │ Future │ │ +│ │ Analyzers │ │ Transports │ │ Connectors │ │ Plugins │ │ +│ │ │ │ │ │ │ │ │ │ +│ │ - Go │ │ - TCP/TLS │ │ - NVD │ │ - Steps │ │ +│ │ - Java │ │ - UDP │ │ - OSV │ │ - Gates │ │ +│ │ - .NET │ │ - RabbitMQ │ │ - GHSA │ │ - CI │ │ +│ │ - Python │ │ - Valkey │ │ - Distros │ │ - Registry │ │ +│ │ - 7 more... │ │ │ │ │ │ - Vault │ │ +│ └──────────────┘ └──────────────┘ └──────────────┘ └──────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────────┘ +``` + +--- + +## Sprint Structure + +| Sprint ID | Title | Working Directory | Status | Dependencies | +|-----------|-------|-------------------|--------|--------------| +| 100_001 | Plugin Abstractions Library | `src/Plugin/StellaOps.Plugin.Abstractions/` | TODO | None | +| 100_002 | Plugin Host & Lifecycle Manager | `src/Plugin/StellaOps.Plugin.Host/` | TODO | 100_001 | +| 100_003 | Plugin Registry (Database) | `src/Plugin/StellaOps.Plugin.Registry/` | TODO | 100_001, 100_002 | +| 100_004 | Plugin Sandbox Infrastructure | `src/Plugin/StellaOps.Plugin.Sandbox/` | TODO | 100_001, 100_002 | +| 100_005 | Crypto Plugin Rework | `src/Cryptography/` | TODO | 100_001, 100_002, 100_003 | +| 100_006 | Auth Plugin Rework | `src/Authority/` | TODO | 100_001, 100_002, 100_003 | +| 100_007 | LLM Provider Rework | `src/AdvisoryAI/` | TODO | 100_001, 100_002, 100_003 | +| 100_008 | SCM Connector Rework | `src/Integrations/` | TODO | 100_001, 100_002, 100_003 | +| 100_009 | Scanner Analyzer Rework | `src/Scanner/` | TODO | 100_001, 100_002, 100_003 | +| 100_010 | Router Transport Rework | `src/Router/` | TODO | 100_001, 100_002, 100_003 | +| 100_011 | Concelier 
Connector Rework | `src/Concelier/` | TODO | 100_001, 100_002, 100_003 | +| 100_012 | Plugin SDK & Developer Experience | `src/Plugin/StellaOps.Plugin.Sdk/` | TODO | All above | + +--- + +## Database Schema + +### Core Tables + +```sql +-- Platform-wide plugin registry +CREATE TABLE platform.plugins ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + plugin_id VARCHAR(255) NOT NULL, -- e.g., "com.stellaops.crypto.gost" + name VARCHAR(255) NOT NULL, + version VARCHAR(50) NOT NULL, -- SemVer + vendor VARCHAR(255) NOT NULL, + description TEXT, + license_id VARCHAR(50), -- SPDX identifier + + -- Trust and security + trust_level VARCHAR(50) NOT NULL CHECK (trust_level IN ('builtin', 'trusted', 'untrusted')), + signature BYTEA, -- Plugin signature for verification + signing_key_id VARCHAR(255), + + -- Capabilities (bitmask stored as array for queryability) + capabilities TEXT[] NOT NULL DEFAULT '{}', -- ['crypto', 'connector.scm', 'analysis'] + capability_details JSONB NOT NULL DEFAULT '{}', -- Detailed capability metadata + + -- Source and deployment + source VARCHAR(50) NOT NULL CHECK (source IN ('bundled', 'installed', 'discovered')), + assembly_path VARCHAR(500), + entry_point VARCHAR(255), -- Type name for activation + + -- Lifecycle + status VARCHAR(50) NOT NULL DEFAULT 'discovered' CHECK (status IN ( + 'discovered', 'loading', 'initializing', 'active', + 'degraded', 'stopping', 'stopped', 'failed', 'unloading' + )), + status_message TEXT, + + -- Health + health_status VARCHAR(50) DEFAULT 'unknown' CHECK (health_status IN ( + 'unknown', 'healthy', 'degraded', 'unhealthy' + )), + last_health_check TIMESTAMPTZ, + health_check_failures INT NOT NULL DEFAULT 0, + + -- Metadata + manifest JSONB, -- Full plugin manifest + runtime_info JSONB, -- Runtime metrics, resource usage + + -- Audit + created_at TIMESTAMPTZ NOT NULL DEFAULT now(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT now(), + loaded_at TIMESTAMPTZ, + + UNIQUE(plugin_id, version) +); + +-- Plugin capability 
registry (denormalized for fast queries)
+CREATE TABLE platform.plugin_capabilities (
+    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+    plugin_id UUID NOT NULL REFERENCES platform.plugins(id) ON DELETE CASCADE,
+
+    capability_type VARCHAR(100) NOT NULL, -- 'crypto', 'connector.scm', 'analysis.java'
+    capability_id VARCHAR(255) NOT NULL, -- 'sign', 'github', 'maven-analyzer'
+
+    -- Capability-specific metadata
+    config_schema JSONB, -- JSON Schema for configuration
+    input_schema JSONB, -- Input contract
+    output_schema JSONB, -- Output contract
+
+    -- Discovery metadata
+    display_name VARCHAR(255),
+    description TEXT,
+    documentation_url VARCHAR(500),
+
+    -- Runtime
+    is_enabled BOOLEAN NOT NULL DEFAULT TRUE,
+
+    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
+
+    UNIQUE(plugin_id, capability_type, capability_id)
+);
+
+-- Plugin instances for multi-tenant scenarios
+CREATE TABLE platform.plugin_instances (
+    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+    plugin_id UUID NOT NULL REFERENCES platform.plugins(id) ON DELETE CASCADE,
+    tenant_id UUID REFERENCES platform.tenants(id) ON DELETE CASCADE, -- NULL = global instance
+
+    instance_name VARCHAR(255), -- Optional friendly name
+    config JSONB NOT NULL DEFAULT '{}', -- Tenant-specific configuration
+    secrets_path VARCHAR(500), -- Vault path for secrets
+
+    -- Instance state
+    enabled BOOLEAN NOT NULL DEFAULT TRUE,
+    status VARCHAR(50) NOT NULL DEFAULT 'pending',
+
+    -- Resource allocation (for sandboxed plugins)
+    resource_limits JSONB, -- CPU, memory, network limits
+
+    -- Usage tracking
+    last_used_at TIMESTAMPTZ,
+    invocation_count BIGINT NOT NULL DEFAULT 0,
+    error_count BIGINT NOT NULL DEFAULT 0,
+
+    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
+    updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
+);
+
+-- Expression-based uniqueness cannot be an inline table constraint; enforce it
+-- with a unique index. NULLS NOT DISTINCT (PostgreSQL 15+) also makes the
+-- global instance (tenant_id IS NULL) unique per plugin.
+CREATE UNIQUE INDEX uq_plugin_instances_identity
+    ON platform.plugin_instances (plugin_id, tenant_id, COALESCE(instance_name, ''))
+    NULLS NOT DISTINCT;
+
+-- Plugin health history for trending
+CREATE TABLE platform.plugin_health_history (
+    -- NOTE: no standalone PK on id; a unique constraint on a partitioned table
+    -- must include the partition key (created_at)
+    id UUID NOT NULL DEFAULT gen_random_uuid(),
+    plugin_id
UUID NOT NULL REFERENCES platform.plugins(id) ON DELETE CASCADE, + + checked_at TIMESTAMPTZ NOT NULL DEFAULT now(), + status VARCHAR(50) NOT NULL, + response_time_ms INT, + details JSONB, + + -- Partition by time for efficient cleanup + created_at TIMESTAMPTZ NOT NULL DEFAULT now() +) PARTITION BY RANGE (created_at); + +-- Indexes +CREATE INDEX idx_plugins_status ON platform.plugins(status) WHERE status != 'active'; +CREATE INDEX idx_plugins_trust_level ON platform.plugins(trust_level); +CREATE INDEX idx_plugins_capabilities ON platform.plugins USING GIN (capabilities); +CREATE INDEX idx_plugin_capabilities_type ON platform.plugin_capabilities(capability_type); +CREATE INDEX idx_plugin_capabilities_lookup ON platform.plugin_capabilities(capability_type, capability_id); +CREATE INDEX idx_plugin_instances_tenant ON platform.plugin_instances(tenant_id) WHERE tenant_id IS NOT NULL; +CREATE INDEX idx_plugin_instances_enabled ON platform.plugin_instances(plugin_id, enabled) WHERE enabled = TRUE; +CREATE INDEX idx_plugin_health_history_plugin ON platform.plugin_health_history(plugin_id, checked_at DESC); +``` + +--- + +## Trust Model + +### Trust Level Determination + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ TRUST LEVEL DETERMINATION │ +│ │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ Plugin Discovery │ │ +│ └─────────────────────────────────────────────────────────────────────┘ │ +│ │ │ +│ ▼ │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ Is bundled with platform? │ │ +│ └─────────────────────────────────────────────────────────────────────┘ │ +│ │ │ │ +│ YES NO │ +│ │ │ │ +│ ▼ ▼ │ +│ ┌─────────────────┐ ┌─────────────────────────────────────────┐ │ +│ │ TrustLevel. │ │ Has valid signature? 
│ │ +│ │ BuiltIn │ └─────────────────────────────────────────┘ │ +│ │ │ │ │ │ +│ │ - In-process │ YES NO │ +│ │ - No sandbox │ │ │ │ +│ │ - Full access │ ▼ ▼ │ +│ └─────────────────┘ ┌─────────────────────┐ ┌─────────────────────┐ │ +│ │ Signer in trusted │ │ TrustLevel. │ │ +│ │ vendor list? │ │ Untrusted │ │ +│ └─────────────────────┘ │ │ │ +│ │ │ │ - Process isolation│ │ +│ YES NO │ - Resource limits │ │ +│ │ │ │ - Network policy │ │ +│ ▼ ▼ │ - gRPC boundary │ │ +│ ┌─────────────────┐ │ └─────────────────────┘ │ +│ │ TrustLevel. │ │ │ +│ │ Trusted │◄───┘ │ +│ │ │ │ +│ │ - AppDomain │ │ +│ │ - Soft limits │ │ +│ │ - Monitored │ │ +│ └─────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +### Capability-Based Access Control + +Each capability grants specific permissions: + +| Capability | Permissions Granted | +|------------|---------------------| +| `crypto` | Access to key material, signing operations | +| `network` | Outbound HTTP/gRPC calls (host allowlist) | +| `filesystem.read` | Read-only access to specified paths | +| `filesystem.write` | Write access to plugin workspace | +| `secrets` | Access to vault secrets (scoped by policy) | +| `database` | Database connections (scoped by schema) | +| `process` | Spawn child processes (sandboxed only) | + +--- + +## Plugin Lifecycle + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ PLUGIN LIFECYCLE STATE MACHINE │ +│ │ +│ ┌──────────────┐ │ +│ │ Discovered │ │ +│ └──────┬───────┘ │ +│ │ load() │ +│ ▼ │ +│ ┌──────────────┐ │ +│ │ Loading │ │ +│ └──────┬───────┘ │ +│ │ assembly loaded │ +│ ▼ │ +│ ┌──────────────┐ │ +│ │ Initializing │ │ +│ └──────┬───────┘ │ +│ ┌──────────────┼──────────────┐ │ +│ │ success │ │ failure │ +│ ▼ │ ▼ │ +│ ┌──────────────┐ │ ┌──────────────┐ │ +│ │ Active │ │ │ Failed │ │ +│ └──────┬───────┘ │ └──────┬───────┘ │ +│ │ │ │ │ +│ ┌─────────────┼─────────────┐│ │ retry() │ +│ │ │ ││ │ │ +│ 
health fail stop() health degrade ▼ │ +│ │ │ ││ ┌──────────────┐ │ +│ ▼ │ ▼│ │ Loading │ (retry) │ +│ ┌──────────────┐ │ ┌──────────────┐└──────────────┘ │ +│ │ Unhealthy │ │ │ Degraded │ │ +│ └──────┬───────┘ │ └──────┬───────┘ │ +│ │ │ │ │ +│ auto-recover │ health ok │ +│ │ │ │ │ +│ └─────────────┼─────────────┘ │ +│ │ │ +│ ▼ │ +│ ┌──────────────┐ │ +│ │ Stopping │ │ +│ └──────┬───────┘ │ +│ │ cleanup complete │ +│ ▼ │ +│ ┌──────────────┐ │ +│ │ Stopped │ │ +│ └──────┬───────┘ │ +│ │ unload() │ +│ ▼ │ +│ ┌──────────────┐ │ +│ │ Unloading │ │ +│ └──────┬───────┘ │ +│ │ resources freed │ +│ ▼ │ +│ ┌──────────────┐ │ +│ │ (removed) │ │ +│ └──────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +--- + +## Migration Strategy + +### Phase Approach + +Each plugin type migration follows the same pattern: + +1. **Create New Implementation** - Implement `IPlugin` + capability interfaces +2. **Parallel Operation** - Both old and new implementations active +3. **Feature Parity Validation** - Automated tests verify identical behavior +4. **Gradual Cutover** - Configuration flag switches to new implementation +5. **Deprecation** - Old interfaces marked deprecated +6. 
**Removal** - Old implementations removed after transition period + +### Breaking Change Policy + +- **Internal interfaces** - Can be changed; update all internal consumers +- **Plugin SDK** - Maintain backward compatibility for one major version +- **Configuration** - Provide migration tooling for config format changes +- **Database** - Always use migrations; never break existing data + +--- + +## Deliverables Summary + +### Libraries Created + +| Library | Purpose | NuGet Package | +|---------|---------|---------------| +| `StellaOps.Plugin.Abstractions` | Core interfaces | `StellaOps.Plugin.Abstractions` | +| `StellaOps.Plugin.Host` | Plugin hosting | `StellaOps.Plugin.Host` | +| `StellaOps.Plugin.Registry` | Database registry | Internal | +| `StellaOps.Plugin.Sandbox` | Process isolation | Internal | +| `StellaOps.Plugin.Sdk` | Plugin development | `StellaOps.Plugin.Sdk` | +| `StellaOps.Plugin.Testing` | Test infrastructure | `StellaOps.Plugin.Testing` | + +### Plugins Reworked + +| Plugin Type | Count | Capability Interface | +|-------------|-------|----------------------| +| Crypto | 5 | `ICryptoCapability` | +| Auth | 4 | `IAuthCapability` | +| LLM | 4 | `ILlmCapability` | +| SCM | 4 | `IScmCapability` | +| Scanner | 11 | `IAnalysisCapability` | +| Router | 4 | `ITransportCapability` | +| Concelier | 8+ | `IFeedCapability` | + +--- + +## Success Criteria + +### Functional Requirements + +- [ ] All existing plugin functionality preserved +- [ ] All plugins implement unified `IPlugin` interface +- [ ] Database registry tracks all plugins +- [ ] Health checks report accurate status +- [ ] Trust levels correctly enforced +- [ ] Sandboxing works for untrusted plugins + +### Non-Functional Requirements + +- [ ] Plugin load time < 500ms (in-process) +- [ ] Plugin load time < 2s (sandboxed) +- [ ] Health check latency < 100ms +- [ ] No memory leaks in plugin lifecycle +- [ ] Graceful shutdown completes in < 10s + +### Quality Requirements + +- [ ] Unit test coverage 
>= 80% +- [ ] Integration test coverage >= 70% +- [ ] All public APIs documented +- [ ] Migration guide for each plugin type + +--- + +## Risk Assessment + +| Risk | Impact | Likelihood | Mitigation | +|------|--------|------------|------------| +| Breaking existing integrations | High | Medium | Comprehensive testing, gradual rollout | +| Performance regression | Medium | Low | Benchmarking, profiling | +| Sandbox escape vulnerability | Critical | Low | Security audit, penetration testing | +| Migration complexity | Medium | Medium | Clear documentation, tooling | +| Timeline overrun | Medium | Medium | Parallel workstreams, MVP scope | + +--- + +## Dependencies + +### External Dependencies + +| Dependency | Version | Purpose | +|------------|---------|---------| +| .NET 10 | Latest | Runtime | +| gRPC | 2.x | Sandbox communication | +| Npgsql | 8.x | Database access | +| System.Text.Json | Built-in | Manifest parsing | + +### Internal Dependencies + +| Dependency | Purpose | +|------------|---------| +| `StellaOps.Infrastructure.Postgres` | Database utilities | +| `StellaOps.Telemetry` | Logging, metrics | +| `StellaOps.HybridLogicalClock` | Event ordering | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Phase 100 index created | diff --git a/docs/implplan/SPRINT_20260110_100_000_INDEX_release_orchestrator.md b/docs/implplan/SPRINT_20260110_100_000_INDEX_release_orchestrator.md new file mode 100644 index 000000000..f3904de0a --- /dev/null +++ b/docs/implplan/SPRINT_20260110_100_000_INDEX_release_orchestrator.md @@ -0,0 +1,326 @@ +# SPRINT INDEX: Release Orchestrator Implementation + +> **Epic:** Stella Ops Suite - Release Control Plane +> **Batch:** 100 +> **Status:** Planning +> **Created:** 10-Jan-2026 +> **Source:** [Architecture Specification](../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md) + +--- + +## Overview + +This sprint batch implements the **Release Orchestrator** - transforming 
Stella Ops from a vulnerability scanning platform into **Stella Ops Suite**, a unified release control plane for non-Kubernetes container environments.
+
+### Business Value
+
+- **Unified release governance:** Single pane of glass for release lifecycle
+- **Audit-grade evidence:** Cryptographically signed proof of every decision
+- **Security as a gate:** Reachability-aware scanning integrated into promotion flow
+- **Plugin extensibility:** Support for any SCM, CI, registry, and vault
+- **Non-K8s first:** Docker, Compose, ECS, Nomad deployment targets
+
+### Key Principles
+
+1. **Digest-first release identity** - Releases are immutable OCI digests, not tags
+2. **Evidence for every decision** - Every promotion/deployment produces sealed evidence
+3. **Pluggable everything, stable core** - Integrations are plugins; core is stable
+4. **No feature gating** - All plans include all features
+5. **Offline-first operation** - Core works in air-gapped environments
+6. **Immutable generated artifacts** - Every deployment generates stored artifacts
+
+---
+
+## Implementation Phases
+
+| Phase | Batch | Title | Description | Track
| +|-------|-------|-------|-------------|---------------| +| 1 | 101 | Foundation | Database schema, plugin infrastructure | Foundation | +| 2 | 102 | Integration Hub | Connector runtime, built-in integrations | Foundation | +| 3 | 103 | Environment Manager | Environments, targets, agent registration | Core | +| 4 | 104 | Release Manager | Components, versions, release bundles | Core | +| 5 | 105 | Workflow Engine | DAG execution, step registry | Core | +| 6 | 106 | Promotion & Gates | Approvals, security gates, decisions | Core | +| 7 | 107 | Deployment Execution | Deploy orchestrator, artifact generation | Core | +| 8 | 108 | Agents | Docker, Compose, SSH, WinRM agents | Deployment | +| 9 | 109 | Evidence & Audit | Evidence packets, version stickers | Audit | +| 10 | 110 | Progressive Delivery | A/B releases, canary, traffic routing | Advanced | +| 11 | 111 | UI Implementation | Dashboard, workflow editor, screens | Frontend | + +--- + +## Module Dependencies + +``` + ┌──────────────┐ + │ AUTHORITY │ (existing) + └──────┬───────┘ + │ + ┌──────────────────┼──────────────────┐ + │ │ │ + ▼ ▼ ▼ +┌───────────────┐ ┌───────────────┐ ┌───────────────┐ +│ PLUGIN │ │ INTHUB │ │ ENVMGR │ +│ (Batch 101) │ │ (Batch 102) │ │ (Batch 103) │ +└───────┬───────┘ └───────┬───────┘ └───────┬───────┘ + │ │ │ + └──────────┬───────┴──────────────────┘ + │ + ▼ + ┌───────────────┐ + │ RELMAN │ + │ (Batch 104) │ + └───────┬───────┘ + │ + ▼ + ┌───────────────┐ + │ WORKFL │ + │ (Batch 105) │ + └───────┬───────┘ + │ + ┌──────────┴──────────┐ + │ │ + ▼ ▼ +┌───────────────┐ ┌───────────────┐ +│ PROMOT │ │ DEPLOY │ +│ (Batch 106) │ │ (Batch 107) │ +└───────┬───────┘ └───────┬───────┘ + │ │ + │ ▼ + │ ┌───────────────┐ + │ │ AGENTS │ + │ │ (Batch 108) │ + │ └───────┬───────┘ + │ │ + └──────────┬──────────┘ + │ + ▼ + ┌───────────────┐ + │ RELEVI │ + │ (Batch 109) │ + └───────┬───────┘ + │ + ▼ + ┌───────────────┐ + │ PROGDL │ + │ (Batch 110) │ + └───────────────┘ +``` + +--- + +## Sprint Structure 
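+
+The module dependency diagram above implies a valid execution order for the batches below. As a non-normative sketch (hypothetical code, not part of the codebase), that order is a deterministic topological sort of the "depends on" edges, with ties broken by ordinal name so the result is stable across runs:
+
+```csharp
+using System;
+using System.Collections.Generic;
+using System.Linq;
+
+// Illustrative Kahn's-algorithm sketch: dependsOn maps a batch to the
+// batches it depends on; the result is a deterministic execution order.
+public static class BatchOrder
+{
+    public static IReadOnlyList<string> Resolve(
+        IReadOnlyDictionary<string, IReadOnlyList<string>> dependsOn)
+    {
+        var remaining = dependsOn.ToDictionary(kv => kv.Key, kv => kv.Value.Count);
+        var ready = new SortedSet<string>(
+            remaining.Where(kv => kv.Value == 0).Select(kv => kv.Key),
+            StringComparer.Ordinal);
+        var order = new List<string>();
+
+        while (ready.Count > 0)
+        {
+            var batch = ready.Min!;
+            ready.Remove(batch);
+            order.Add(batch);
+            // Unblock every batch whose last unmet dependency was just scheduled.
+            foreach (var kv in dependsOn)
+                if (kv.Value.Contains(batch) && --remaining[kv.Key] == 0)
+                    ready.Add(kv.Key);
+        }
+
+        if (order.Count != dependsOn.Count)
+            throw new InvalidOperationException("Dependency cycle: not a DAG.");
+        return order;
+    }
+}
+```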
+ +### Phase 1: Foundation (Batch 101) + +| Sprint ID | Title | Module | Dependencies | +|-----------|-------|--------|--------------| +| 101_001 | Database Schema - Core Tables | DB | - | +| 101_002 | Plugin Registry | PLUGIN | 101_001 | +| 101_003 | Plugin Loader & Sandbox | PLUGIN | 101_002 | +| 101_004 | Plugin SDK | PLUGIN | 101_003 | + +### Phase 2: Integration Hub (Batch 102) + +| Sprint ID | Title | Module | Dependencies | +|-----------|-------|--------|--------------| +| 102_001 | Integration Manager | INTHUB | 101_002 | +| 102_002 | Connector Runtime | INTHUB | 102_001 | +| 102_003 | Built-in SCM Connectors | INTHUB | 102_002 | +| 102_004 | Built-in Registry Connectors | INTHUB | 102_002 | +| 102_005 | Built-in Vault Connector | INTHUB | 102_002 | +| 102_006 | Doctor Checks | INTHUB | 102_002 | + +### Phase 3: Environment Manager (Batch 103) + +| Sprint ID | Title | Module | Dependencies | +|-----------|-------|--------|--------------| +| 103_001 | Environment CRUD | ENVMGR | 101_001 | +| 103_002 | Target Registry | ENVMGR | 103_001 | +| 103_003 | Agent Manager - Core | ENVMGR | 103_002 | +| 103_004 | Inventory Sync | ENVMGR | 103_002, 103_003 | + +### Phase 4: Release Manager (Batch 104) + +| Sprint ID | Title | Module | Dependencies | +|-----------|-------|--------|--------------| +| 104_001 | Component Registry | RELMAN | 102_004 | +| 104_002 | Version Manager | RELMAN | 104_001 | +| 104_003 | Release Manager | RELMAN | 104_002 | +| 104_004 | Release Catalog | RELMAN | 104_003 | + +### Phase 5: Workflow Engine (Batch 105) + +| Sprint ID | Title | Module | Dependencies | +|-----------|-------|--------|--------------| +| 105_001 | Workflow Template Designer | WORKFL | 101_001 | +| 105_002 | Step Registry | WORKFL | 101_002 | +| 105_003 | Workflow Engine - DAG Executor | WORKFL | 105_001, 105_002 | +| 105_004 | Step Executor | WORKFL | 105_003 | +| 105_005 | Built-in Steps | WORKFL | 105_004 | + +### Phase 6: Promotion & Gates (Batch 106) + +| Sprint ID | 
Title | Module | Dependencies | +|-----------|-------|--------|--------------| +| 106_001 | Promotion Manager | PROMOT | 104_003, 103_001 | +| 106_002 | Approval Gateway | PROMOT | 106_001 | +| 106_003 | Gate Registry | PROMOT | 106_001 | +| 106_004 | Security Gate | PROMOT | 106_003 | +| 106_005 | Decision Engine | PROMOT | 106_002, 106_003 | + +### Phase 7: Deployment Execution (Batch 107) + +| Sprint ID | Title | Module | Dependencies | +|-----------|-------|--------|--------------| +| 107_001 | Deploy Orchestrator | DEPLOY | 105_003, 106_005 | +| 107_002 | Target Executor | DEPLOY | 107_001, 103_002 | +| 107_003 | Artifact Generator | DEPLOY | 107_001 | +| 107_004 | Rollback Manager | DEPLOY | 107_002 | +| 107_005 | Deployment Strategies | DEPLOY | 107_002 | + +### Phase 8: Agents (Batch 108) + +| Sprint ID | Title | Module | Dependencies | +|-----------|-------|--------|--------------| +| 108_001 | Agent Core Runtime | AGENTS | 103_003 | +| 108_002 | Agent - Docker | AGENTS | 108_001 | +| 108_003 | Agent - Compose | AGENTS | 108_002 | +| 108_004 | Agent - SSH | AGENTS | 108_001 | +| 108_005 | Agent - WinRM | AGENTS | 108_001 | + +### Phase 9: Evidence & Audit (Batch 109) + +| Sprint ID | Title | Module | Dependencies | +|-----------|-------|--------|--------------| +| 109_001 | Evidence Collector | RELEVI | 106_005, 107_001 | +| 109_002 | Evidence Signer | RELEVI | 109_001 | +| 109_003 | Version Sticker Writer | RELEVI | 107_002 | +| 109_004 | Audit Exporter | RELEVI | 109_002 | + +### Phase 10: Progressive Delivery (Batch 110) + +| Sprint ID | Title | Module | Dependencies | +|-----------|-------|--------|--------------| +| 110_001 | A/B Release Manager | PROGDL | 107_005 | +| 110_002 | Traffic Router Framework | PROGDL | 110_001 | +| 110_003 | Canary Controller | PROGDL | 110_002 | +| 110_004 | Router Plugin - Nginx | PROGDL | 110_002 | + +### Phase 11: UI Implementation (Batch 111) + +| Sprint ID | Title | Module | Dependencies | 
+|-----------|-------|--------|--------------| +| 111_001 | Dashboard - Overview | FE | 107_001 | +| 111_002 | Environment Management UI | FE | 103_001 | +| 111_003 | Release Management UI | FE | 104_003 | +| 111_004 | Workflow Editor | FE | 105_001 | +| 111_005 | Promotion & Approval UI | FE | 106_001 | +| 111_006 | Deployment Monitoring UI | FE | 107_001 | +| 111_007 | Evidence Viewer | FE | 109_002 | + +--- + +## Documentation References + +All architecture documentation is available in: + +``` +docs/modules/release-orchestrator/ +├── README.md # Entry point +├── design/ +│ ├── principles.md # Design principles +│ └── decisions.md # ADRs +├── modules/ +│ ├── overview.md # Module landscape +│ ├── integration-hub.md # INTHUB spec +│ ├── environment-manager.md # ENVMGR spec +│ ├── release-manager.md # RELMAN spec +│ ├── workflow-engine.md # WORKFL spec +│ ├── promotion-manager.md # PROMOT spec +│ ├── deploy-orchestrator.md # DEPLOY spec +│ ├── agents.md # AGENTS spec +│ ├── progressive-delivery.md # PROGDL spec +│ ├── evidence.md # RELEVI spec +│ └── plugin-system.md # PLUGIN spec +├── data-model/ +│ ├── schema.md # PostgreSQL schema +│ └── entities.md # Entity definitions +├── api/ +│ └── overview.md # API design +├── workflow/ +│ ├── templates.md # Template spec +│ ├── execution.md # Execution state machine +│ └── promotion.md # Promotion state machine +├── security/ +│ ├── overview.md # Security architecture +│ ├── auth.md # AuthN/AuthZ +│ ├── agent-security.md # Agent security +│ └── threat-model.md # Threat model +├── deployment/ +│ ├── overview.md # Deployment architecture +│ ├── strategies.md # Deployment strategies +│ └── artifacts.md # Artifact generation +├── integrations/ +│ ├── overview.md # Integration types +│ ├── connectors.md # Connector interface +│ ├── webhooks.md # Webhook architecture +│ └── ci-cd.md # CI/CD patterns +├── operations/ +│ ├── overview.md # Observability +│ └── metrics.md # Prometheus metrics +├── ui/ +│ └── overview.md # UI 
specification +└── appendices/ + ├── glossary.md # Terms + ├── errors.md # Error codes + └── evidence-schema.md # Evidence format +``` + +--- + +## Technology Stack + +| Layer | Technology | +|-------|------------| +| Backend | .NET 10, C# preview | +| Database | PostgreSQL 16+ | +| Message Queue | RabbitMQ / Valkey | +| Frontend | Angular 17 | +| Agent Runtime | .NET AOT | +| Plugin Runtime | gRPC, container sandbox | +| Observability | OpenTelemetry, Prometheus | + +--- + +## Risk Register + +| Risk | Impact | Mitigation | +|------|--------|------------| +| Plugin security vulnerabilities | High | Sandbox isolation, capability restrictions | +| Agent compromise | High | mTLS, short-lived credentials, audit | +| Evidence tampering | High | Append-only DB, cryptographic signing | +| Registry unavailability | Medium | Connection pooling, caching, fallbacks | +| Complex workflow failures | Medium | Comprehensive testing, rollback support | + +--- + +## Success Criteria + +- [ ] Complete database schema for all 10 themes +- [ ] Plugin system supports connector, step, gate types +- [ ] At least 2 built-in connectors per integration type +- [ ] Environment → Release → Promotion → Deploy flow works E2E +- [ ] Evidence packet generated for every deployment +- [ ] Agent deploys to Docker and Compose targets +- [ ] UI shows pipeline overview, approval queues, deployment logs +- [ ] Performance: <500ms API P99, <5min deployment for 10 targets + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint index created | +| | Architecture documentation complete | diff --git a/docs/implplan/SPRINT_20260110_100_001_PLUGIN_abstractions.md b/docs/implplan/SPRINT_20260110_100_001_PLUGIN_abstractions.md new file mode 100644 index 000000000..e49a31d55 --- /dev/null +++ b/docs/implplan/SPRINT_20260110_100_001_PLUGIN_abstractions.md @@ -0,0 +1,1514 @@ +# SPRINT: Plugin Abstractions Library + +> **Sprint ID:** 100_001 +> **Module:** PLUGIN +> **Phase:** 100 - 
Plugin System Unification +> **Status:** TODO +> **Parent:** [100_000_INDEX](SPRINT_20260110_100_000_INDEX_plugin_unification.md) + +--- + +## Overview + +Create the foundational abstractions library that defines all plugin interfaces, types, and contracts. This library is the cornerstone of the unified plugin architecture and must be carefully designed for long-term stability. + +### Objectives + +- Define core `IPlugin` interface and supporting types +- Define capability interfaces for all plugin types +- Define plugin context and lifecycle contracts +- Define configuration and manifest models +- Ensure backward-compatible extensibility patterns +- Create comprehensive XML documentation + +### Working Directory + +``` +src/Plugin/ +├── StellaOps.Plugin.Abstractions/ +│ ├── StellaOps.Plugin.Abstractions.csproj +│ ├── IPlugin.cs +│ ├── PluginInfo.cs +│ ├── PluginTrustLevel.cs +│ ├── PluginCapabilities.cs +│ ├── PluginStatus.cs +│ ├── Context/ +│ │ ├── IPluginContext.cs +│ │ ├── IPluginConfiguration.cs +│ │ ├── IPluginLogger.cs +│ │ └── IPluginServices.cs +│ ├── Lifecycle/ +│ │ ├── PluginLifecycleState.cs +│ │ ├── IPluginLifecycle.cs +│ │ └── PluginLifecycleException.cs +│ ├── Health/ +│ │ ├── HealthCheckResult.cs +│ │ ├── HealthStatus.cs +│ │ └── IHealthCheckable.cs +│ ├── Capabilities/ +│ │ ├── ICryptoCapability.cs +│ │ ├── IAuthCapability.cs +│ │ ├── ILlmCapability.cs +│ │ ├── IConnectorCapability.cs +│ │ ├── IScmCapability.cs +│ │ ├── IRegistryCapability.cs +│ │ ├── IAnalysisCapability.cs +│ │ ├── ITransportCapability.cs +│ │ ├── IFeedCapability.cs +│ │ ├── IStepProviderCapability.cs +│ │ └── IGateProviderCapability.cs +│ ├── Manifest/ +│ │ ├── PluginManifest.cs +│ │ ├── ManifestCapabilityDeclaration.cs +│ │ ├── ManifestDependency.cs +│ │ └── ManifestResourceRequirements.cs +│ ├── Execution/ +│ │ ├── ExecutionContext.cs +│ │ ├── IExecutionBoundary.cs +│ │ └── ExecutionResult.cs +│ └── Attributes/ +│ ├── PluginAttribute.cs +│ ├── CapabilityAttribute.cs +│ ├── 
RequiresCapabilityAttribute.cs +│ └── PluginVersionAttribute.cs +└── __Tests/ + └── StellaOps.Plugin.Abstractions.Tests/ + ├── StellaOps.Plugin.Abstractions.Tests.csproj + ├── PluginInfoTests.cs + ├── PluginCapabilitiesTests.cs + ├── HealthCheckResultTests.cs + └── ManifestTests.cs +``` + +--- + +## Deliverables + +### Core Plugin Interface + +```csharp +// IPlugin.cs +namespace StellaOps.Plugin.Abstractions; + +/// +/// Core interface that all Stella Ops plugins must implement. +/// Plugins provide one or more capabilities to the platform. +/// +/// +/// +/// The plugin lifecycle follows these phases: +/// +/// Discovery - Plugin assembly found and manifest parsed +/// Loading - Assembly loaded, types resolved +/// Initialization - called with context +/// Active - Plugin servicing requests +/// Shutdown - called for cleanup +/// +/// +/// +/// Plugins declare their trust level, which determines execution context: +/// +/// - Runs in-process, full access +/// - Runs isolated, monitored +/// - Runs sandboxed, restricted +/// +/// +/// +public interface IPlugin : IAsyncDisposable +{ + /// + /// Unique plugin metadata including ID, version, and vendor. + /// + PluginInfo Info { get; } + + /// + /// Trust level determines the execution environment. + /// Bundled plugins return . + /// Third-party plugins typically return . + /// + PluginTrustLevel TrustLevel { get; } + + /// + /// Capabilities this plugin provides. Used for discovery and routing. + /// + PluginCapabilities Capabilities { get; } + + /// + /// Current lifecycle state of the plugin. + /// + PluginLifecycleState State { get; } + + /// + /// Initialize the plugin with the provided context. + /// Called once after loading, before any capability methods. + /// + /// Provides configuration, logging, and service access. + /// Cancellation token for initialization timeout. + /// If initialization fails. 
+    Task InitializeAsync(IPluginContext context, CancellationToken ct);
+
+    ///
+    /// Perform a health check to verify the plugin is functioning correctly.
+    /// Called periodically by the plugin host.
+    ///
+    /// Cancellation token for health check timeout.
+    /// Health check result with status and optional diagnostics.
+    Task<HealthCheckResult> HealthCheckAsync(CancellationToken ct);
+}
+```
+
+### Plugin Info
+
+```csharp
+// PluginInfo.cs
+using System.Text.RegularExpressions;
+
+namespace StellaOps.Plugin.Abstractions;
+
+///
+/// Immutable metadata identifying a plugin.
+///
+///
+/// Reverse domain notation identifier, e.g., "com.stellaops.crypto.gost".
+/// Must be unique across all plugins.
+///
+/// Human-readable display name.
+/// Semantic version string (Major.Minor.Patch[-PreRelease]).
+/// Organization or individual that created the plugin.
+/// Optional description of plugin functionality.
+/// Optional SPDX license identifier.
+/// Optional URL to project homepage or repository.
+/// Optional URL to plugin icon (64x64 PNG recommended).
+public sealed partial record PluginInfo(
+    string Id,
+    string Name,
+    string Version,
+    string Vendor,
+    string? Description = null,
+    string? LicenseId = null,
+    string? ProjectUrl = null,
+    string? IconUrl = null)
+{
+    ///
+    /// Validates the plugin info and throws if invalid.
+    ///
+    /// If any required field is invalid.
+    public void Validate()
+    {
+        if (string.IsNullOrWhiteSpace(Id))
+            throw new ArgumentException("Plugin ID is required", nameof(Id));
+
+        if (!PluginIdPattern().IsMatch(Id))
+            throw new ArgumentException(
+                "Plugin ID must be reverse domain notation (e.g., com.example.myplugin)",
+                nameof(Id));
+
+        if (string.IsNullOrWhiteSpace(Name))
+            throw new ArgumentException("Plugin name is required", nameof(Name));
+
+        if (string.IsNullOrWhiteSpace(Version))
+            throw new ArgumentException("Plugin version is required", nameof(Version));
+
+        if (!SemVerPattern().IsMatch(Version))
+            throw new ArgumentException(
+                "Plugin version must be valid SemVer (e.g., 1.0.0 or 1.0.0-beta.1)",
+                nameof(Version));
+
+        if (string.IsNullOrWhiteSpace(Vendor))
+            throw new ArgumentException("Plugin vendor is required", nameof(Vendor));
+    }
+
+    ///
+    /// Parses the version string into a comparable System.Version.
+    /// Pre-release suffixes are stripped for comparison.
+    ///
+    public System.Version ParsedVersion =>
+        System.Version.Parse(Version.Split('-')[0]);
+
+    [GeneratedRegex(@"^[a-z][a-z0-9]*(\.[a-z][a-z0-9]*)+$")]
+    private static partial Regex PluginIdPattern();
+
+    [GeneratedRegex(@"^\d+\.\d+\.\d+(-[a-zA-Z0-9\.]+)?$")]
+    private static partial Regex SemVerPattern();
+}
+```
+
+### Trust Level
+
+```csharp
+// PluginTrustLevel.cs
+namespace StellaOps.Plugin.Abstractions;
+
+///
+/// Trust level determines plugin execution environment and permissions.
+///
+public enum PluginTrustLevel
+{
+    ///
+    /// Plugin is bundled with the platform and fully trusted.
+    /// Executes in-process with full access to platform internals.
+    /// No sandboxing or resource limits applied.
+    ///
+    BuiltIn = 0,
+
+    ///
+    /// Plugin is signed by a trusted vendor.
+    /// Executes with moderate isolation (AssemblyLoadContext).
+    /// Soft resource limits applied, behavior monitored.
+    ///
+    Trusted = 1,
+
+    ///
+    /// Plugin is from an unknown or untrusted source.
+ /// Executes in isolated process with full sandboxing. + /// Hard resource limits, network restrictions, filesystem isolation. + /// Communication via gRPC over Unix domain socket. + /// + Untrusted = 2 +} + +/// +/// Extension methods for . +/// +public static class PluginTrustLevelExtensions +{ + /// + /// Returns true if the plugin requires process isolation. + /// + public static bool RequiresProcessIsolation(this PluginTrustLevel level) => + level == PluginTrustLevel.Untrusted; + + /// + /// Returns true if the plugin should have resource limits enforced. + /// + public static bool HasResourceLimits(this PluginTrustLevel level) => + level >= PluginTrustLevel.Trusted; + + /// + /// Returns true if the plugin can access platform internals directly. + /// + public static bool CanAccessInternals(this PluginTrustLevel level) => + level == PluginTrustLevel.BuiltIn; +} +``` + +### Plugin Capabilities + +```csharp +// PluginCapabilities.cs +namespace StellaOps.Plugin.Abstractions; + +/// +/// Flags indicating plugin capabilities. Plugins may provide multiple capabilities. +/// +[Flags] +public enum PluginCapabilities : long +{ + /// No capabilities declared. + None = 0, + + // ========== Core Platform Capabilities (bits 0-9) ========== + + /// Cryptographic operations (signing, verification, encryption). + Crypto = 1L << 0, + + /// Authentication and authorization. + Auth = 1L << 1, + + /// Large language model inference. + Llm = 1L << 2, + + /// General secret management. + Secrets = 1L << 3, + + // ========== Connector Capabilities (bits 10-19) ========== + + /// Source control management (GitHub, GitLab, etc.). + Scm = 1L << 10, + + /// Container registry operations. + Registry = 1L << 11, + + /// Continuous integration systems. + Ci = 1L << 12, + + /// Secret vault integration (HashiCorp, Azure KeyVault, etc.). + Vault = 1L << 13, + + /// Notification delivery (email, Slack, Teams, webhooks). 
+ Notification = 1L << 14, + + /// Issue tracking systems (Jira, GitHub Issues, etc.). + IssueTracker = 1L << 15, + + // ========== Analysis Capabilities (bits 20-29) ========== + + /// Source code or binary analysis. + Analysis = 1L << 20, + + /// Vulnerability feed ingestion. + Feed = 1L << 21, + + /// SBOM generation or parsing. + Sbom = 1L << 22, + + // ========== Infrastructure Capabilities (bits 30-39) ========== + + /// Message transport (TCP, UDP, AMQP, etc.). + Transport = 1L << 30, + + /// Network access required. + Network = 1L << 31, + + /// Filesystem read access. + FilesystemRead = 1L << 32, + + /// Filesystem write access. + FilesystemWrite = 1L << 33, + + /// Process spawning. + Process = 1L << 34, + + // ========== Orchestrator Capabilities (bits 40-49) ========== + + /// Workflow step provider. + StepProvider = 1L << 40, + + /// Promotion gate provider. + GateProvider = 1L << 41, + + /// Deployment target provider. + TargetProvider = 1L << 42, + + /// Evidence collector. + EvidenceProvider = 1L << 43, + + // ========== Composite Capabilities ========== + + /// All connector capabilities. + AllConnectors = Scm | Registry | Ci | Vault | Notification | IssueTracker, + + /// All orchestrator capabilities. + AllOrchestrator = StepProvider | GateProvider | TargetProvider | EvidenceProvider, + + /// All infrastructure capabilities. + AllInfrastructure = Transport | Network | FilesystemRead | FilesystemWrite | Process +} + +/// +/// Extension methods for . +/// +public static class PluginCapabilitiesExtensions +{ + /// + /// Returns true if the plugin has the specified capability. + /// + public static bool Has(this PluginCapabilities capabilities, PluginCapabilities capability) => + (capabilities & capability) == capability; + + /// + /// Returns true if the plugin has any of the specified capabilities. 
+    ///
+    public static bool HasAny(this PluginCapabilities capabilities, PluginCapabilities any) =>
+        (capabilities & any) != 0;
+
+    ///
+    /// Converts capabilities to a string array for database storage.
+    ///
+    public static string[] ToStringArray(this PluginCapabilities capabilities)
+    {
+        var result = new List<string>();
+        foreach (PluginCapabilities value in Enum.GetValues<PluginCapabilities>())
+        {
+            if (value != PluginCapabilities.None &&
+                !value.ToString().StartsWith("All") &&
+                capabilities.Has(value))
+            {
+                result.Add(value.ToString().ToLowerInvariant());
+            }
+        }
+        return result.ToArray();
+    }
+
+    ///
+    /// Parses capability strings back to flags.
+    ///
+    public static PluginCapabilities FromStringArray(string[] capabilities)
+    {
+        var result = PluginCapabilities.None;
+        foreach (var cap in capabilities)
+        {
+            if (Enum.TryParse<PluginCapabilities>(cap, ignoreCase: true, out var parsed))
+            {
+                result |= parsed;
+            }
+        }
+        return result;
+    }
+}
+```
+
+### Plugin Context
+
+```csharp
+// Context/IPluginContext.cs
+namespace StellaOps.Plugin.Abstractions.Context;
+
+///
+/// Provides access to platform services and configuration during plugin execution.
+/// Passed to InitializeAsync and available throughout the plugin lifetime.
+///
+public interface IPluginContext
+{
+    ///
+    /// Plugin-specific configuration bound from manifest and runtime settings.
+    ///
+    IPluginConfiguration Configuration { get; }
+
+    ///
+    /// Scoped logger for plugin diagnostics. Automatically tagged with plugin ID.
+    ///
+    IPluginLogger Logger { get; }
+
+    ///
+    /// Service locator for accessing platform services.
+    /// Available services depend on plugin trust level and declared capabilities.
+    ///
+    IPluginServices Services { get; }
+
+    ///
+    /// Current tenant ID, if the plugin is running in tenant context.
+    /// Null for global plugins.
+    ///
+    Guid? TenantId { get; }
+
+    ///
+    /// Unique instance ID for this plugin activation.
+    ///
+    Guid InstanceId { get; }
+
+    ///
+    /// Cancellation token that fires when shutdown is requested.
+ /// Plugins should monitor this and clean up gracefully. + /// + CancellationToken ShutdownToken { get; } + + /// + /// Time provider for deterministic time operations. + /// + TimeProvider TimeProvider { get; } +} + +/// +/// Plugin-specific configuration access. +/// +public interface IPluginConfiguration +{ + /// + /// Gets a configuration value by key. + /// + /// Target type for conversion. + /// Configuration key (dot-separated path). + /// Default if key not found. + T? GetValue(string key, T? defaultValue = default); + + /// + /// Binds configuration section to a strongly-typed options class. + /// + /// Options type with properties matching config keys. + /// Section key, or null for root. + T Bind(string? sectionKey = null) where T : class, new(); + + /// + /// Gets a secret value from the configured vault. + /// + /// Name of the secret. + /// Cancellation token. + /// Secret value, or null if not found. + Task GetSecretAsync(string secretName, CancellationToken ct); +} + +/// +/// Plugin logging interface with structured logging support. +/// +public interface IPluginLogger +{ + void Log(LogLevel level, string message, params object[] args); + void Log(LogLevel level, Exception exception, string message, params object[] args); + + void Debug(string message, params object[] args) => Log(LogLevel.Debug, message, args); + void Info(string message, params object[] args) => Log(LogLevel.Information, message, args); + void Warning(string message, params object[] args) => Log(LogLevel.Warning, message, args); + void Error(string message, params object[] args) => Log(LogLevel.Error, message, args); + void Error(Exception ex, string message, params object[] args) => Log(LogLevel.Error, ex, message, args); + + /// + /// Creates a scoped logger with additional properties. + /// + IPluginLogger WithProperty(string name, object value); + + /// + /// Creates a scoped logger for a specific operation. 
+    ///
+    IPluginLogger ForOperation(string operationName);
+}
+
+///
+/// Service locator for accessing platform services from plugins.
+///
+public interface IPluginServices
+{
+    ///
+    /// Gets a required service. Throws if not available.
+    ///
+    /// Service type.
+    /// If service not available.
+    T GetRequiredService<T>() where T : class;
+
+    ///
+    /// Gets an optional service. Returns null if not available.
+    ///
+    T? GetService<T>() where T : class;
+
+    ///
+    /// Gets all registered implementations of a service.
+    ///
+    IEnumerable<T> GetServices<T>() where T : class;
+
+    ///
+    /// Creates a scoped service provider for the current operation.
+    /// Dispose the scope when the operation completes.
+    ///
+    IAsyncDisposable CreateScope(out IPluginServices scopedServices);
+}
+```
+
+### Health Check
+
+```csharp
+// Health/HealthCheckResult.cs
+namespace StellaOps.Plugin.Abstractions.Health;
+
+///
+/// Result of a plugin health check.
+///
+/// Overall health status.
+/// Optional message describing status.
+/// Time taken to perform health check.
+/// Additional diagnostic details.
+public sealed record HealthCheckResult(
+    HealthStatus Status,
+    string? Message = null,
+    TimeSpan? Duration = null,
+    IReadOnlyDictionary<string, object>? Details = null)
+{
+    ///
+    /// Creates a healthy result.
+    ///
+    public static HealthCheckResult Healthy(string? message = null) =>
+        new(HealthStatus.Healthy, message);
+
+    ///
+    /// Creates a degraded result (functioning but impaired).
+    ///
+    public static HealthCheckResult Degraded(string message, IReadOnlyDictionary<string, object>? details = null) =>
+        new(HealthStatus.Degraded, message, Details: details);
+
+    ///
+    /// Creates an unhealthy result.
+    ///
+    public static HealthCheckResult Unhealthy(string message, IReadOnlyDictionary<string, object>? details = null) =>
+        new(HealthStatus.Unhealthy, message, Details: details);
+
+    ///
+    /// Creates an unhealthy result from an exception.
+    ///
+    public static HealthCheckResult Unhealthy(Exception exception) =>
+        new(HealthStatus.Unhealthy, exception.Message, Details: new Dictionary<string, object>
+        {
+            ["exceptionType"] = exception.GetType().Name,
+            ["stackTrace"] = exception.StackTrace ?? string.Empty
+        });
+}
+
+///
+/// Health status values.
+///
+public enum HealthStatus
+{
+    /// Plugin is healthy and fully operational.
+    Healthy = 0,
+
+    /// Plugin is functioning but with degraded performance or partial failures.
+    Degraded = 1,
+
+    /// Plugin is not functioning and cannot service requests.
+    Unhealthy = 2
+}
+```
+
+### Capability Interfaces - Crypto
+
+```csharp
+// Capabilities/ICryptoCapability.cs
+namespace StellaOps.Plugin.Abstractions.Capabilities;
+
+///
+/// Capability interface for cryptographic operations.
+/// Implemented by plugins providing signing, verification, encryption, or hashing.
+///
+public interface ICryptoCapability
+{
+    ///
+    /// Algorithms supported by this provider.
+    /// Format: "{family}-{variant}" e.g., "RSA-SHA256", "ECDSA-P256", "GOST-R34.10-2012".
+    ///
+    IReadOnlyList<string> SupportedAlgorithms { get; }
+
+    ///
+    /// Returns true if this provider can perform the specified operation with the given algorithm.
+    ///
+    bool CanHandle(CryptoOperation operation, string algorithm);
+
+    ///
+    /// Sign data using the specified algorithm and key.
+    ///
+    /// Data to sign.
+    /// Signing options including algorithm and key reference.
+    /// Cancellation token.
+    /// Signature bytes.
+    Task<byte[]> SignAsync(ReadOnlyMemory<byte> data, CryptoSignOptions options, CancellationToken ct);
+
+    ///
+    /// Verify a signature.
+    ///
+    /// Original data.
+    /// Signature to verify.
+    /// Verification options including algorithm and key reference.
+    /// Cancellation token.
+    /// True if signature is valid.
+    Task<bool> VerifyAsync(ReadOnlyMemory<byte> data, ReadOnlyMemory<byte> signature, CryptoVerifyOptions options, CancellationToken ct);
+
+    ///
+    /// Encrypt data.
+    ///
+    Task<byte[]> EncryptAsync(ReadOnlyMemory<byte> data, CryptoEncryptOptions options, CancellationToken ct);
+
+    ///
+    /// Decrypt data.
+    ///
+    Task<byte[]> DecryptAsync(ReadOnlyMemory<byte> data, CryptoDecryptOptions options, CancellationToken ct);
+
+    ///
+    /// Compute hash of data.
+    ///
+    Task<byte[]> HashAsync(ReadOnlyMemory<byte> data, string algorithm, CancellationToken ct);
+}
+
+public enum CryptoOperation
+{
+    Sign,
+    Verify,
+    Encrypt,
+    Decrypt,
+    Hash
+}
+
+public sealed record CryptoSignOptions(
+    string Algorithm,
+    string KeyId,
+    string? KeyVersion = null,
+    IReadOnlyDictionary<string, string>? Metadata = null);
+
+public sealed record CryptoVerifyOptions(
+    string Algorithm,
+    string KeyId,
+    string? KeyVersion = null,
+    string? CertificateChain = null);
+
+public sealed record CryptoEncryptOptions(
+    string Algorithm,
+    string KeyId,
+    byte[]? Iv = null,
+    byte[]? Aad = null);
+
+public sealed record CryptoDecryptOptions(
+    string Algorithm,
+    string KeyId,
+    byte[]? Iv = null,
+    byte[]? Aad = null);
+```
+
+### Capability Interfaces - Connector
+
+```csharp
+// Capabilities/IConnectorCapability.cs
+namespace StellaOps.Plugin.Abstractions.Capabilities;
+
+///
+/// Base capability for external system connectors.
+///
+public interface IConnectorCapability
+{
+    ///
+    /// Connector type identifier, e.g., "scm.github", "registry.ecr", "vault.hashicorp".
+    ///
+    string ConnectorType { get; }
+
+    ///
+    /// Human-readable display name.
+    ///
+    string DisplayName { get; }
+
+    ///
+    /// Test the connection to the external system.
+    ///
+    Task<ConnectionTestResult> TestConnectionAsync(CancellationToken ct);
+
+    ///
+    /// Get current connection status and metadata.
+    ///
+    Task<ConnectionInfo> GetConnectionInfoAsync(CancellationToken ct);
+}
+
+public sealed record ConnectionTestResult(
+    bool Success,
+    string? Message = null,
+    TimeSpan? Latency = null,
+    IReadOnlyDictionary<string, object>? Details = null)
+{
+    public static ConnectionTestResult Succeeded(TimeSpan?
latency = null) => + new(true, "Connection successful", latency); + + public static ConnectionTestResult Failed(string message, Exception? ex = null) => + new(false, message, Details: ex != null ? new Dictionary + { + ["exception"] = ex.GetType().Name, + ["exceptionMessage"] = ex.Message + } : null); +} + +public sealed record ConnectionInfo( + string EndpointUrl, + string? AuthenticatedAs = null, + DateTimeOffset? ConnectedSince = null, + IReadOnlyDictionary? Metadata = null); +``` + +### Capability Interfaces - SCM + +```csharp +// Capabilities/IScmCapability.cs +namespace StellaOps.Plugin.Abstractions.Capabilities; + +/// +/// Capability interface for source control management systems. +/// +public interface IScmCapability : IConnectorCapability +{ + /// + /// SCM type (github, gitlab, azdo, gitea, bitbucket). + /// + string ScmType { get; } + + /// + /// Returns true if this connector can handle the given repository URL. + /// Used for auto-detection. + /// + bool CanHandle(string repositoryUrl); + + /// + /// List branches in a repository. + /// + Task> ListBranchesAsync(string repositoryUrl, CancellationToken ct); + + /// + /// List commits on a branch. + /// + Task> ListCommitsAsync( + string repositoryUrl, + string branch, + int limit = 50, + CancellationToken ct = default); + + /// + /// Get details of a specific commit. + /// + Task GetCommitAsync(string repositoryUrl, string commitSha, CancellationToken ct); + + /// + /// Get file content at a specific ref. + /// + Task GetFileAsync( + string repositoryUrl, + string filePath, + string? reference = null, + CancellationToken ct = default); + + /// + /// Download repository archive. + /// + Task GetArchiveAsync( + string repositoryUrl, + string reference, + ArchiveFormat format = ArchiveFormat.TarGz, + CancellationToken ct = default); + + /// + /// Create or update a webhook. 
+ /// + Task UpsertWebhookAsync( + string repositoryUrl, + ScmWebhookConfig config, + CancellationToken ct); + + /// + /// Get current authenticated user info. + /// + Task GetCurrentUserAsync(CancellationToken ct); +} + +public sealed record ScmBranch( + string Name, + string CommitSha, + bool IsDefault, + bool IsProtected); + +public sealed record ScmCommit( + string Sha, + string Message, + string AuthorName, + string AuthorEmail, + DateTimeOffset AuthoredAt, + IReadOnlyList ParentShas); + +public sealed record ScmFileContent( + string Path, + string Content, + string Encoding, + string Sha, + long Size); + +public sealed record ScmWebhook( + string Id, + string Url, + IReadOnlyList Events, + bool Active); + +public sealed record ScmWebhookConfig( + string Url, + string Secret, + IReadOnlyList Events); + +public sealed record ScmUser( + string Id, + string Username, + string? DisplayName, + string? Email, + string? AvatarUrl); + +public enum ArchiveFormat +{ + TarGz, + Zip +} +``` + +### Capability Interfaces - Analysis + +```csharp +// Capabilities/IAnalysisCapability.cs +namespace StellaOps.Plugin.Abstractions.Capabilities; + +/// +/// Capability interface for source code and binary analysis. +/// Implemented by scanner analyzers for different languages/ecosystems. +/// +public interface IAnalysisCapability +{ + /// + /// Analysis type identifier, e.g., "maven", "npm", "go-mod", "dotnet". + /// + string AnalysisType { get; } + + /// + /// File patterns this analyzer can process. + /// Glob patterns, e.g., ["pom.xml", "**/pom.xml", "*.jar"]. + /// + IReadOnlyList FilePatterns { get; } + + /// + /// Languages/ecosystems this analyzer supports. + /// + IReadOnlyList SupportedEcosystems { get; } + + /// + /// Returns true if this analyzer can process the given file. + /// + bool CanAnalyze(string filePath, ReadOnlySpan fileHeader); + + /// + /// Analyze a file or directory and extract dependency information. 
+ /// + /// Analysis context with file access and configuration. + /// Cancellation token. + /// Analysis result with discovered components. + Task AnalyzeAsync(IAnalysisContext context, CancellationToken ct); +} + +public interface IAnalysisContext +{ + /// Root path being analyzed. + string RootPath { get; } + + /// Target file or directory for analysis. + string TargetPath { get; } + + /// Read file contents. + Task ReadFileAsync(string relativePath, CancellationToken ct); + + /// List files matching a pattern. + Task> GlobAsync(string pattern, CancellationToken ct); + + /// Check if file exists. + Task FileExistsAsync(string relativePath, CancellationToken ct); + + /// Analysis configuration. + IPluginConfiguration Configuration { get; } + + /// Logger for diagnostics. + IPluginLogger Logger { get; } +} + +public sealed record AnalysisResult( + bool Success, + IReadOnlyList Components, + IReadOnlyList Diagnostics, + AnalysisMetadata Metadata); + +public sealed record DiscoveredComponent( + string Name, + string Version, + string Ecosystem, + string? Purl, + string? Cpe, + ComponentType Type, + string? License, + IReadOnlyList Dependencies, + IReadOnlyDictionary? Metadata = null); + +public sealed record ComponentDependency( + string Name, + string? VersionConstraint, + DependencyScope Scope, + bool IsOptional); + +public sealed record AnalysisDiagnostic( + DiagnosticSeverity Severity, + string Code, + string Message, + string? FilePath = null, + int? 
Line = null); + +public sealed record AnalysisMetadata( + string AnalyzerType, + string AnalyzerVersion, + TimeSpan Duration, + int FilesProcessed); + +public enum ComponentType +{ + Library, + Framework, + Application, + OperatingSystem, + Device, + Container, + File +} + +public enum DependencyScope +{ + Runtime, + Development, + Test, + Build, + Optional, + Provided +} + +public enum DiagnosticSeverity +{ + Info, + Warning, + Error +} +``` + +### Capability Interfaces - Step Provider + +```csharp +// Capabilities/IStepProviderCapability.cs +namespace StellaOps.Plugin.Abstractions.Capabilities; + +/// +/// Capability interface for workflow step providers. +/// Plugins implementing this provide custom steps for release workflows. +/// +public interface IStepProviderCapability +{ + /// + /// Steps provided by this plugin. + /// + IReadOnlyList ProvidedSteps { get; } + + /// + /// Create an executor for the specified step type. + /// + /// Step type from . + /// Cancellation token. + Task CreateExecutorAsync(string stepType, CancellationToken ct); +} + +/// +/// Definition of a step type provided by a plugin. +/// +public sealed record StepDefinition( + string Type, + string DisplayName, + string Description, + string Category, + JsonDocument ConfigSchema, + JsonDocument InputSchema, + JsonDocument OutputSchema, + IReadOnlyList RequiredCapabilities); + +/// +/// Executor for a workflow step. +/// +public interface IStepExecutor : IAsyncDisposable +{ + /// + /// Execute the step with streaming events. + /// + /// Step configuration from workflow definition. + /// Input values from previous steps or workflow inputs. + /// Execution context with services and cancellation. + /// Cancellation token. + /// Async stream of step events (logs, outputs, progress, result). + IAsyncEnumerable ExecuteAsync( + JsonDocument config, + IReadOnlyDictionary inputs, + IStepContext context, + CancellationToken ct); +} + +/// +/// Context provided to step executors during execution. 
+/// </summary>
+public interface IStepContext
+{
+    /// <summary>Unique execution ID for this step run.</summary>
+    Guid ExecutionId { get; }
+
+    /// <summary>Workflow execution ID.</summary>
+    Guid WorkflowExecutionId { get; }
+
+    /// <summary>Deployment ID if step is part of deployment.</summary>
+    Guid? DeploymentId { get; }
+
+    /// <summary>Logger for step diagnostics.</summary>
+    IPluginLogger Logger { get; }
+
+    /// <summary>Secret access (scoped to step permissions).</summary>
+    Task<string?> GetSecretAsync(string secretName, CancellationToken ct);
+
+    /// <summary>Report progress (0-100).</summary>
+    Task ReportProgressAsync(int percentage, string? message = null, CancellationToken ct = default);
+}
+
+/// <summary>
+/// Event emitted during step execution.
+/// </summary>
+public abstract record StepEvent(DateTimeOffset Timestamp);
+
+public sealed record StepLogEvent(
+    DateTimeOffset Timestamp,
+    LogLevel Level,
+    string Message) : StepEvent(Timestamp);
+
+public sealed record StepOutputEvent(
+    DateTimeOffset Timestamp,
+    string Name,
+    object Value) : StepEvent(Timestamp);
+
+public sealed record StepProgressEvent(
+    DateTimeOffset Timestamp,
+    int Percentage,
+    string? Message = null) : StepEvent(Timestamp);
+
+public sealed record StepResultEvent(
+    DateTimeOffset Timestamp,
+    bool Success,
+    string? Message = null,
+    IReadOnlyDictionary<string, object?>? Outputs = null) : StepEvent(Timestamp);
+```
+
+### Capability Interfaces - Gate Provider
+
+```csharp
+// Capabilities/IGateProviderCapability.cs
+namespace StellaOps.Plugin.Abstractions.Capabilities;
+
+/// <summary>
+/// Capability interface for promotion gate providers.
+/// Plugins implementing this provide custom gates for release promotion.
+/// </summary>
+public interface IGateProviderCapability
+{
+    /// <summary>
+    /// Gates provided by this plugin.
+    /// </summary>
+    IReadOnlyList<GateDefinition> ProvidedGates { get; }
+
+    /// <summary>
+    /// Create an evaluator for the specified gate type.
+    /// </summary>
+    Task<IGateEvaluator> CreateEvaluatorAsync(string gateType, CancellationToken ct);
+}
+
+/// <summary>
+/// Definition of a gate type provided by a plugin.
+/// </summary>
+public sealed record GateDefinition(
+    string Type,
+    string DisplayName,
+    string Description,
+    string Category,
+    JsonDocument ConfigSchema,
+    bool SupportsAutoApprove,
+    bool SupportsManualOverride);
+
+/// <summary>
+/// Evaluator for a promotion gate.
+/// </summary>
+public interface IGateEvaluator : IAsyncDisposable
+{
+    /// <summary>
+    /// Evaluate the gate and return a decision.
+    /// </summary>
+    /// <param name="config">Gate configuration from workflow definition.</param>
+    /// <param name="context">Evaluation context with release and environment info.</param>
+    /// <param name="ct">Cancellation token.</param>
+    Task<GateResult> EvaluateAsync(
+        JsonDocument config,
+        IGateContext context,
+        CancellationToken ct);
+}
+
+/// <summary>
+/// Context provided to gate evaluators.
+/// </summary>
+public interface IGateContext
+{
+    /// <summary>Release being promoted.</summary>
+    GateReleaseInfo Release { get; }
+
+    /// <summary>Source environment.</summary>
+    GateEnvironmentInfo SourceEnvironment { get; }
+
+    /// <summary>Target environment.</summary>
+    GateEnvironmentInfo TargetEnvironment { get; }
+
+    /// <summary>User requesting promotion.</summary>
+    GateUserInfo RequestedBy { get; }
+
+    /// <summary>Previous gate results in this promotion.</summary>
+    IReadOnlyList<GateResult> PreviousGateResults { get; }
+
+    /// <summary>Logger for diagnostics.</summary>
+    IPluginLogger Logger { get; }
+}
+
+public sealed record GateReleaseInfo(
+    Guid Id,
+    string Name,
+    string Version,
+    IReadOnlyList<GateComponentInfo> Components);
+
+public sealed record GateComponentInfo(
+    string Name,
+    string Digest,
+    string? Version);
+
+public sealed record GateEnvironmentInfo(
+    Guid Id,
+    string Name,
+    int Tier);
+
+public sealed record GateUserInfo(
+    string Id,
+    string Username,
+    IReadOnlyList<string> Roles);
+
+/// <summary>
+/// Result of gate evaluation.
+/// </summary>
+public sealed record GateResult(
+    GateDecision Decision,
+    string? Message = null,
+    IReadOnlyList<GateFinding>? Findings = null,
+    IReadOnlyDictionary<string, object?>? Evidence = null,
+    DateTimeOffset? ExpiresAt = null);
+
+public enum GateDecision
+{
+    /// <summary>Gate passed, promotion can proceed.</summary>
+    Passed,
+
+    /// <summary>Gate failed, promotion blocked.</summary>
+    Failed,
+
+    /// <summary>Gate requires manual review.</summary>
+    PendingReview,
+
+    /// <summary>Gate could not evaluate, retry later.</summary>
+    Inconclusive
+}
+
+public sealed record GateFinding(
+    GateFindingSeverity Severity,
+    string Code,
+    string Title,
+    string Description,
+    string? Remediation = null,
+    IReadOnlyDictionary<string, string>? Metadata = null);
+
+public enum GateFindingSeverity
+{
+    Info,
+    Low,
+    Medium,
+    High,
+    Critical
+}
+```
+
+### Plugin Manifest
+
+```csharp
+// Manifest/PluginManifest.cs
+namespace StellaOps.Plugin.Abstractions.Manifest;
+
+/// <summary>
+/// Plugin manifest describing plugin metadata, capabilities, and requirements.
+/// Typically loaded from plugin.yaml or plugin.json in the plugin package.
+/// </summary>
+public sealed record PluginManifest
+{
+    /// <summary>Plugin metadata.</summary>
+    public required PluginInfo Info { get; init; }
+
+    /// <summary>Plugin entry point type (fully qualified name).</summary>
+    public required string EntryPoint { get; init; }
+
+    /// <summary>Minimum platform version required.</summary>
+    public string? MinPlatformVersion { get; init; }
+
+    /// <summary>Maximum platform version supported.</summary>
+    public string? MaxPlatformVersion { get; init; }
+
+    /// <summary>Capabilities declared by this plugin.</summary>
+    public IReadOnlyList<ManifestCapabilityDeclaration> Capabilities { get; init; } = [];
+
+    /// <summary>Dependencies on other plugins.</summary>
+    public IReadOnlyList<ManifestDependency> Dependencies { get; init; } = [];
+
+    /// <summary>Resource requirements for sandboxed execution.</summary>
+    public ManifestResourceRequirements? ResourceRequirements { get; init; }
+
+    /// <summary>Network hosts the plugin needs to access.</summary>
+    public IReadOnlyList<string> RequiredHosts { get; init; } = [];
+
+    /// <summary>Configuration schema (JSON Schema).</summary>
+    public JsonDocument? ConfigSchema { get; init; }
+
+    /// <summary>Default configuration values.</summary>
+    public JsonDocument? DefaultConfig { get; init; }
+
+    /// <summary>Path to the main plugin assembly (the loader resolves it from here).</summary>
+    public string? AssemblyPath { get; init; }
+
+    /// <summary>Optional signature over the plugin package, used for trust-level determination.</summary>
+    public string? Signature { get; init; }
+}
+
+/// <summary>
+/// Capability declaration in the manifest.
+/// </summary>
+public sealed record ManifestCapabilityDeclaration(
+    string Type,
+    string? Id = null,
+    JsonDocument? ConfigSchema = null,
+    IReadOnlyDictionary<string, string>? Metadata = null);
+
+/// <summary>
+/// Dependency on another plugin.
+/// </summary>
+public sealed record ManifestDependency(
+    string PluginId,
+    string? MinVersion = null,
+    string? MaxVersion = null,
+    bool Optional = false);
+
+/// <summary>
+/// Resource requirements for sandboxed plugins.
+/// </summary>
+public sealed record ManifestResourceRequirements(
+    int? MaxMemoryMb = null,
+    int? MaxCpuPercent = null,
+    int? MaxDiskMb = null,
+    int? MaxNetworkBandwidthMbps = null,
+    TimeSpan? InitializationTimeout = null,
+    TimeSpan? OperationTimeout = null);
+```
+
+### Plugin Attributes
+
+```csharp
+// Attributes/PluginAttribute.cs
+namespace StellaOps.Plugin.Abstractions.Attributes;
+
+/// <summary>
+/// Marks a class as a plugin entry point.
+/// </summary>
+[AttributeUsage(AttributeTargets.Class, AllowMultiple = false, Inherited = false)]
+public sealed class PluginAttribute : Attribute
+{
+    public string Id { get; }
+    public string Name { get; }
+    public string Version { get; }
+    public string Vendor { get; }
+    public string? Description { get; set; }
+    public string? LicenseId { get; set; }
+
+    public PluginAttribute(string id, string name, string version, string vendor)
+    {
+        Id = id;
+        Name = name;
+        Version = version;
+        Vendor = vendor;
+    }
+}
+
+/// <summary>
+/// Declares a capability provided by the plugin.
+/// </summary>
+[AttributeUsage(AttributeTargets.Class, AllowMultiple = true, Inherited = false)]
+public sealed class ProvidesCapabilityAttribute : Attribute
+{
+    public PluginCapabilities Capability { get; }
+    public string? CapabilityId { get; set; }
+
+    public ProvidesCapabilityAttribute(PluginCapabilities capability)
+    {
+        Capability = capability;
+    }
+}
+
+/// <summary>
+/// Declares a capability required by the plugin.
+/// </summary>
+[AttributeUsage(AttributeTargets.Class, AllowMultiple = true, Inherited = false)]
+public sealed class RequiresCapabilityAttribute : Attribute
+{
+    public PluginCapabilities Capability { get; }
+    public bool Optional { get; set; }
+
+    public RequiresCapabilityAttribute(PluginCapabilities capability)
+    {
+        Capability = capability;
+    }
+}
+
+/// <summary>
+/// Specifies the minimum platform version required.
+/// </summary>
+[AttributeUsage(AttributeTargets.Class, AllowMultiple = false, Inherited = false)]
+public sealed class RequiresPlatformVersionAttribute : Attribute
+{
+    public string MinVersion { get; }
+    public string? MaxVersion { get; set; }
+
+    public RequiresPlatformVersionAttribute(string minVersion)
+    {
+        MinVersion = minVersion;
+    }
+}
+```
+
+---
+
+## Acceptance Criteria
+
+- [ ] `IPlugin` interface defined with all lifecycle methods
+- [ ] `PluginInfo` record with validation
+- [ ] `PluginTrustLevel` enum with extension methods
+- [ ] `PluginCapabilities` flags enum with 40+ capabilities
+- [ ] `IPluginContext` and related interfaces
+- [ ] `HealthCheckResult` with factory methods
+- [ ] All capability interfaces defined:
+  - [ ] `ICryptoCapability` with sign/verify/encrypt/decrypt/hash
+  - [ ] `IConnectorCapability` with connection test
+  - [ ] `IScmCapability` with branches/commits/files/webhooks
+  - [ ] `IRegistryCapability` with repos/tags/manifests
+  - [ ] `IAnalysisCapability` with file patterns and analysis
+  - [ ] `ITransportCapability` with send/receive
+  - [ ] `IFeedCapability` with fetch/parse
+  - [ ] `ILlmCapability` with session creation
+  - [ ] `IAuthCapability` with authenticate/authorize
+  - [ ] `IStepProviderCapability` with step execution
+  - [ ] `IGateProviderCapability` with gate evaluation
+- [ ] `PluginManifest` model with all fields
+- [ ] Plugin attributes for decoration
+- [ ] XML documentation on all public members
+- [ ] Unit tests for all validation logic
+- [ ] Unit tests for enum extensions
+- [ ] Test coverage >= 90%
+
+---
+
+## Dependencies
+
+| Dependency | Type | Status |
+|------------|------|--------|
+| .NET 10 | External | Available |
+| System.Text.Json | External | Built-in |
+| Microsoft.Extensions.Logging.Abstractions | External | Available |
+
+---
+
+## Delivery Tracker
+
+| Deliverable | Status | Notes |
+|-------------|--------|-------|
+| IPlugin interface | TODO | |
+| PluginInfo record | TODO | |
+| PluginTrustLevel enum | TODO | |
+| PluginCapabilities enum | TODO | |
+| IPluginContext interfaces | TODO | |
+| Health types | TODO | |
+| ICryptoCapability | TODO | |
+| IConnectorCapability | TODO | |
+| IScmCapability | TODO | |
+| IRegistryCapability | TODO | |
+| IAnalysisCapability | TODO | |
+| ITransportCapability | TODO | |
+| IFeedCapability | TODO | |
+| ILlmCapability | TODO | |
+| IAuthCapability | TODO | |
+| IStepProviderCapability | TODO | |
+| IGateProviderCapability | TODO | |
+| PluginManifest model | TODO | |
+| Attributes | TODO | |
+| Unit tests | TODO | |
+
+---
+
+## Execution Log
+
+| Date | Entry |
+|------|-------|
+| 10-Jan-2026 | Sprint created |
diff --git a/docs/implplan/SPRINT_20260110_100_002_PLUGIN_host.md b/docs/implplan/SPRINT_20260110_100_002_PLUGIN_host.md
new file mode 100644
index 000000000..5fda5fda8
--- /dev/null
+++ b/docs/implplan/SPRINT_20260110_100_002_PLUGIN_host.md
@@ -0,0 +1,1173 @@
+# SPRINT: Plugin Host & Lifecycle Manager
+
+> **Sprint ID:** 100_002
+> **Module:** PLUGIN
+> **Phase:** 100 - Plugin System Unification
+> **Status:** TODO
+> **Parent:** [100_000_INDEX](SPRINT_20260110_100_000_INDEX_plugin_unification.md)
+
+---
+
+## Overview
+
+Implement the unified plugin host that manages plugin discovery, loading, initialization, lifecycle transitions, and shutdown. The host is the central coordinator for all plugins in the platform.
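+From a consuming service's point of view, the host is started once at application startup and stopped at shutdown. As an illustrative, non-normative sketch — `IPluginHost`, `StartAsync`, and `StopAsync` come from the interface specified in this sprint, while the `IHostedService` adapter and its name are assumptions about how a service might wire it into the .NET generic host:
+
+```csharp
+// Hypothetical adapter wiring IPluginHost into the .NET generic-host lifecycle.
+public sealed class PluginHostService : IHostedService
+{
+    private readonly IPluginHost _pluginHost;
+
+    public PluginHostService(IPluginHost pluginHost) => _pluginHost = pluginHost;
+
+    // Discover, load, and initialize all configured plugins at startup.
+    public Task StartAsync(CancellationToken ct) => _pluginHost.StartAsync(ct);
+
+    // Gracefully stop plugins (reverse dependency order) at shutdown.
+    public Task StopAsync(CancellationToken ct) => _pluginHost.StopAsync(ct);
+}
+```
+
+Consumers would then resolve capabilities through the host, e.g. `host.GetPluginsWithCapability<IGateProviderCapability>()` when evaluating promotion gates.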
+
+### Objectives
+
+- Implement plugin discovery from filesystem and embedded assemblies
+- Implement assembly loading with isolation (AssemblyLoadContext)
+- Implement plugin lifecycle state machine
+- Implement graceful initialization and shutdown
+- Implement health monitoring and auto-recovery
+- Implement plugin dependency resolution
+
+### Working Directory
+
+```
+src/Plugin/
+├── StellaOps.Plugin.Host/
+│   ├── StellaOps.Plugin.Host.csproj
+│   ├── PluginHost.cs
+│   ├── IPluginHost.cs
+│   ├── PluginHostOptions.cs
+│   ├── Discovery/
+│   │   ├── IPluginDiscovery.cs
+│   │   ├── FileSystemPluginDiscovery.cs
+│   │   ├── EmbeddedPluginDiscovery.cs
+│   │   ├── CompositePluginDiscovery.cs
+│   │   └── PluginDiscoveryResult.cs
+│   ├── Loading/
+│   │   ├── IPluginLoader.cs
+│   │   ├── AssemblyPluginLoader.cs
+│   │   ├── PluginAssemblyLoadContext.cs
+│   │   └── PluginLoadResult.cs
+│   ├── Lifecycle/
+│   │   ├── IPluginLifecycleManager.cs
+│   │   ├── PluginLifecycleManager.cs
+│   │   ├── PluginStateMachine.cs
+│   │   └── PluginStateTransition.cs
+│   ├── Context/
+│   │   ├── PluginContext.cs
+│   │   ├── PluginConfiguration.cs
+│   │   ├── PluginLogger.cs
+│   │   └── PluginServices.cs
+│   ├── Health/
+│   │   ├── IPluginHealthMonitor.cs
+│   │   ├── PluginHealthMonitor.cs
+│   │   └── HealthCheckScheduler.cs
+│   ├── Dependencies/
+│   │   ├── IPluginDependencyResolver.cs
+│   │   ├── PluginDependencyResolver.cs
+│   │   └── DependencyGraph.cs
+│   └── Extensions/
+│       └── ServiceCollectionExtensions.cs
+└── __Tests/
+    └── StellaOps.Plugin.Host.Tests/
+        ├── PluginHostTests.cs
+        ├── PluginDiscoveryTests.cs
+        ├── PluginLoaderTests.cs
+        ├── LifecycleManagerTests.cs
+        └── DependencyResolverTests.cs
+```
+
+---
+
+## Deliverables
+
+### Plugin Host Interface
+
+```csharp
+// IPluginHost.cs
+namespace StellaOps.Plugin.Host;
+
+/// <summary>
+/// Central coordinator for plugin lifecycle management.
+/// </summary>
+public interface IPluginHost : IAsyncDisposable
+{
+    /// <summary>
+    /// All currently loaded plugins.
+    /// </summary>
+    IReadOnlyDictionary<string, LoadedPlugin> Plugins { get; }
+
+    /// <summary>
+    /// Discover and load all plugins from configured sources.
+    /// </summary>
+    Task StartAsync(CancellationToken ct);
+
+    /// <summary>
+    /// Gracefully stop all plugins and release resources.
+    /// </summary>
+    Task StopAsync(CancellationToken ct);
+
+    /// <summary>
+    /// Load a specific plugin from a source.
+    /// </summary>
+    Task<LoadedPlugin> LoadPluginAsync(PluginSource source, CancellationToken ct);
+
+    /// <summary>
+    /// Unload a specific plugin.
+    /// </summary>
+    Task UnloadPluginAsync(string pluginId, CancellationToken ct);
+
+    /// <summary>
+    /// Reload a plugin (unload then load).
+    /// </summary>
+    Task<LoadedPlugin> ReloadPluginAsync(string pluginId, CancellationToken ct);
+
+    /// <summary>
+    /// Get plugins with a specific capability.
+    /// </summary>
+    IEnumerable<T> GetPluginsWithCapability<T>() where T : class;
+
+    /// <summary>
+    /// Get a specific plugin by ID.
+    /// </summary>
+    LoadedPlugin? GetPlugin(string pluginId);
+
+    /// <summary>
+    /// Get a plugin capability instance.
+    /// </summary>
+    T? GetCapability<T>(string pluginId) where T : class;
+
+    /// <summary>
+    /// Event raised when a plugin state changes.
+    /// </summary>
+    event EventHandler<PluginStateChangedEventArgs>? PluginStateChanged;
+
+    /// <summary>
+    /// Event raised when a plugin health status changes.
+    /// </summary>
+    event EventHandler<PluginHealthChangedEventArgs>? PluginHealthChanged;
+}
+
+/// <summary>
+/// Represents a loaded plugin with its runtime state.
+/// </summary>
+public sealed record LoadedPlugin
+{
+    public required string PluginId { get; init; }
+    public required PluginInfo Info { get; init; }
+    public required IPlugin Instance { get; init; }
+    public required PluginTrustLevel TrustLevel { get; init; }
+    public required PluginCapabilities Capabilities { get; init; }
+    public required PluginLifecycleState State { get; init; }
+    public required HealthStatus HealthStatus { get; init; }
+    public required DateTimeOffset LoadedAt { get; init; }
+    public DateTimeOffset? LastHealthCheck { get; init; }
+    public PluginManifest? Manifest { get; init; }
+    public IPluginContext? Context { get; init; }
+}
+
+public sealed record PluginSource(
+    PluginSourceType Type,
+    string Location,
+    IReadOnlyDictionary<string, string>?
+        Metadata = null);
+
+public enum PluginSourceType
+{
+    FileSystem,
+    Embedded,
+    Remote,
+    Database
+}
+
+public sealed class PluginStateChangedEventArgs : EventArgs
+{
+    public required string PluginId { get; init; }
+    public required PluginLifecycleState OldState { get; init; }
+    public required PluginLifecycleState NewState { get; init; }
+    public string? Reason { get; init; }
+}
+
+public sealed class PluginHealthChangedEventArgs : EventArgs
+{
+    public required string PluginId { get; init; }
+    public required HealthStatus OldStatus { get; init; }
+    public required HealthStatus NewStatus { get; init; }
+    public HealthCheckResult? CheckResult { get; init; }
+}
+```
+
+### Plugin Host Implementation
+
+```csharp
+// PluginHost.cs
+namespace StellaOps.Plugin.Host;
+
+public sealed class PluginHost : IPluginHost
+{
+    private readonly IPluginDiscovery _discovery;
+    private readonly IPluginLoader _loader;
+    private readonly IPluginLifecycleManager _lifecycle;
+    private readonly IPluginHealthMonitor _healthMonitor;
+    private readonly IPluginDependencyResolver _dependencyResolver;
+    private readonly IPluginRegistry? _registry;
+    private readonly PluginHostOptions _options;
+    private readonly ILogger<PluginHost> _logger;
+    private readonly TimeProvider _timeProvider;
+
+    private readonly ConcurrentDictionary<string, LoadedPlugin> _plugins = new();
+    private readonly SemaphoreSlim _loadLock = new(1, 1);
+    private CancellationTokenSource? _shutdownCts;
+
+    public IReadOnlyDictionary<string, LoadedPlugin> Plugins => _plugins;
+
+    public event EventHandler<PluginStateChangedEventArgs>? PluginStateChanged;
+    public event EventHandler<PluginHealthChangedEventArgs>? PluginHealthChanged;
+
+    public PluginHost(
+        IPluginDiscovery discovery,
+        IPluginLoader loader,
+        IPluginLifecycleManager lifecycle,
+        IPluginHealthMonitor healthMonitor,
+        IPluginDependencyResolver dependencyResolver,
+        IOptions<PluginHostOptions> options,
+        ILogger<PluginHost> logger,
+        TimeProvider timeProvider,
+        IPluginRegistry? registry = null)
+    {
+        _discovery = discovery;
+        _loader = loader;
+        _lifecycle = lifecycle;
+        _healthMonitor = healthMonitor;
+        _dependencyResolver = dependencyResolver;
+        _options = options.Value;
+        _logger = logger;
+        _timeProvider = timeProvider;
+        _registry = registry;
+
+        _healthMonitor.HealthChanged += OnPluginHealthChanged;
+    }
+
+    public async Task StartAsync(CancellationToken ct)
+    {
+        _shutdownCts = CancellationTokenSource.CreateLinkedTokenSource(ct);
+
+        _logger.LogInformation("Starting plugin host...");
+
+        // 1. Discover plugins
+        var discovered = await _discovery.DiscoverAsync(_options.PluginPaths, ct);
+        _logger.LogInformation("Discovered {Count} plugins", discovered.Count);
+
+        // 2. Resolve dependencies and determine load order
+        var loadOrder = _dependencyResolver.ResolveLoadOrder(discovered);
+
+        // 3. Load plugins in dependency order
+        foreach (var manifest in loadOrder)
+        {
+            try
+            {
+                await LoadPluginInternalAsync(manifest, ct);
+            }
+            catch (Exception ex)
+            {
+                _logger.LogError(ex, "Failed to load plugin {PluginId}", manifest.Info.Id);
+
+                if (_options.FailOnPluginLoadError)
+                    throw;
+            }
+        }
+
+        // 4. Start health monitoring
+        await _healthMonitor.StartAsync(_shutdownCts.Token);
+
+        _logger.LogInformation("Plugin host started with {Count} active plugins",
+            _plugins.Count(p => p.Value.State == PluginLifecycleState.Active));
+    }
+
+    public async Task StopAsync(CancellationToken ct)
+    {
+        _logger.LogInformation("Stopping plugin host...");
+
+        // Cancel ongoing operations
+        _shutdownCts?.Cancel();
+
+        // Stop health monitoring
+        await _healthMonitor.StopAsync(ct);
+
+        // Unload plugins in reverse dependency order
+        var unloadOrder = _dependencyResolver.ResolveUnloadOrder(_plugins.Values.Select(p => p.Manifest!));
+
+        foreach (var pluginId in unloadOrder)
+        {
+            try
+            {
+                await UnloadPluginInternalAsync(pluginId, ct);
+            }
+            catch (Exception ex)
+            {
+                _logger.LogError(ex, "Error unloading plugin {PluginId}", pluginId);
+            }
+        }
+
+        _logger.LogInformation("Plugin host stopped");
+    }
+
+    public async Task<LoadedPlugin> LoadPluginAsync(PluginSource source, CancellationToken ct)
+    {
+        await _loadLock.WaitAsync(ct);
+        try
+        {
+            // Discover manifest from source
+            var manifest = await _discovery.DiscoverSingleAsync(source, ct);
+
+            // Check if already loaded
+            if (_plugins.ContainsKey(manifest.Info.Id))
+                throw new InvalidOperationException($"Plugin {manifest.Info.Id} is already loaded");
+
+            return await LoadPluginInternalAsync(manifest, ct);
+        }
+        finally
+        {
+            _loadLock.Release();
+        }
+    }
+
+    private async Task<LoadedPlugin> LoadPluginInternalAsync(PluginManifest manifest, CancellationToken ct)
+    {
+        var pluginId = manifest.Info.Id;
+        _logger.LogDebug("Loading plugin {PluginId} v{Version}", pluginId, manifest.Info.Version);
+
+        // Transition to Loading state
+        await _lifecycle.TransitionAsync(pluginId, PluginLifecycleState.Loading, ct);
+        RaiseStateChanged(pluginId, PluginLifecycleState.Discovered, PluginLifecycleState.Loading);
+
+        try
+        {
+            // 1. Determine trust level
+            var trustLevel = await DetermineTrustLevelAsync(manifest, ct);
+
+            // 2. Load assembly and create instance
+            var loadResult = await _loader.LoadAsync(manifest, trustLevel, ct);
+
+            // 3. Create plugin context
+            var context = CreatePluginContext(manifest, trustLevel);
+
+            // 4. Transition to Initializing
+            await _lifecycle.TransitionAsync(pluginId, PluginLifecycleState.Initializing, ct);
+            RaiseStateChanged(pluginId, PluginLifecycleState.Loading, PluginLifecycleState.Initializing);
+
+            // 5. Initialize plugin
+            using var initCts = CancellationTokenSource.CreateLinkedTokenSource(ct);
+            initCts.CancelAfter(_options.InitializationTimeout);
+
+            await loadResult.Instance.InitializeAsync(context, initCts.Token);
+
+            // 6. Transition to Active
+            await _lifecycle.TransitionAsync(pluginId, PluginLifecycleState.Active, ct);
+
+            var loadedPlugin = new LoadedPlugin
+            {
+                PluginId = pluginId,
+                Info = manifest.Info,
+                Instance = loadResult.Instance,
+                TrustLevel = trustLevel,
+                Capabilities = loadResult.Instance.Capabilities,
+                State = PluginLifecycleState.Active,
+                HealthStatus = HealthStatus.Healthy,
+                LoadedAt = _timeProvider.GetUtcNow(),
+                Manifest = manifest,
+                Context = context
+            };
+
+            _plugins[pluginId] = loadedPlugin;
+
+            // 7. Register in database if available
+            if (_registry != null)
+            {
+                await _registry.RegisterAsync(loadedPlugin, ct);
+            }
+
+            // 8. Register with health monitor
+            _healthMonitor.RegisterPlugin(loadedPlugin);
+
+            RaiseStateChanged(pluginId, PluginLifecycleState.Initializing, PluginLifecycleState.Active);
+
+            _logger.LogInformation(
+                "Loaded plugin {PluginId} v{Version} with capabilities [{Capabilities}]",
+                pluginId, manifest.Info.Version, loadedPlugin.Capabilities);
+
+            return loadedPlugin;
+        }
+        catch (Exception ex)
+        {
+            _logger.LogError(ex, "Failed to load plugin {PluginId}", pluginId);
+            await _lifecycle.TransitionAsync(pluginId, PluginLifecycleState.Failed, ct);
+            RaiseStateChanged(pluginId, PluginLifecycleState.Initializing, PluginLifecycleState.Failed, ex.Message);
+            throw;
+        }
+    }
+
+    public async Task UnloadPluginAsync(string pluginId, CancellationToken ct)
+    {
+        await _loadLock.WaitAsync(ct);
+        try
+        {
+            await UnloadPluginInternalAsync(pluginId, ct);
+        }
+        finally
+        {
+            _loadLock.Release();
+        }
+    }
+
+    private async Task UnloadPluginInternalAsync(string pluginId, CancellationToken ct)
+    {
+        if (!_plugins.TryGetValue(pluginId, out var plugin))
+            return;
+
+        _logger.LogDebug("Unloading plugin {PluginId}", pluginId);
+
+        var oldState = plugin.State;
+
+        // Transition to Stopping
+        await _lifecycle.TransitionAsync(pluginId, PluginLifecycleState.Stopping, ct);
+        RaiseStateChanged(pluginId, oldState, PluginLifecycleState.Stopping);
+
+        try
+        {
+            // Unregister from health monitor
+            _healthMonitor.UnregisterPlugin(pluginId);
+
+            // Dispose plugin
+            using var disposeCts = CancellationTokenSource.CreateLinkedTokenSource(ct);
+            disposeCts.CancelAfter(_options.ShutdownTimeout);
+
+            await plugin.Instance.DisposeAsync();
+
+            // Transition to Stopped
+            await _lifecycle.TransitionAsync(pluginId, PluginLifecycleState.Stopped, ct);
+            RaiseStateChanged(pluginId, PluginLifecycleState.Stopping, PluginLifecycleState.Stopped);
+
+            // Unload assembly
+            await _loader.UnloadAsync(pluginId, ct);
+
+            // Remove from registry
+            _plugins.TryRemove(pluginId, out _);
+
+            if (_registry != null)
+            {
+                await _registry.UnregisterAsync(pluginId, ct);
+            }
+
+            _logger.LogInformation("Unloaded plugin {PluginId}", pluginId);
+        }
+        catch (Exception ex)
+        {
+            _logger.LogError(ex, "Error unloading plugin {PluginId}", pluginId);
+            throw;
+        }
+    }
+
+    public async Task<LoadedPlugin> ReloadPluginAsync(string pluginId, CancellationToken ct)
+    {
+        if (!_plugins.TryGetValue(pluginId, out var existing))
+            throw new InvalidOperationException($"Plugin {pluginId} is not loaded");
+
+        var manifest = existing.Manifest
+            ?? throw new InvalidOperationException($"Plugin {pluginId} has no manifest");
+
+        await UnloadPluginAsync(pluginId, ct);
+
+        // Small delay to allow resources to be released
+        await Task.Delay(100, ct);
+
+        return await LoadPluginInternalAsync(manifest, ct);
+    }
+
+    public IEnumerable<T> GetPluginsWithCapability<T>() where T : class
+    {
+        foreach (var plugin in _plugins.Values)
+        {
+            if (plugin.State == PluginLifecycleState.Active && plugin.Instance is T capability)
+            {
+                yield return capability;
+            }
+        }
+    }
+
+    public LoadedPlugin? GetPlugin(string pluginId) =>
+        _plugins.TryGetValue(pluginId, out var plugin) ? plugin : null;
+
+    public T?
+        GetCapability<T>(string pluginId) where T : class =>
+        GetPlugin(pluginId)?.Instance as T;
+
+    private async Task<PluginTrustLevel> DetermineTrustLevelAsync(PluginManifest manifest, CancellationToken ct)
+    {
+        // Built-in plugins are always trusted
+        if (_options.BuiltInPluginIds.Contains(manifest.Info.Id))
+            return PluginTrustLevel.BuiltIn;
+
+        // Check signature if present
+        if (manifest.Signature != null)
+        {
+            var isValid = await VerifySignatureAsync(manifest, ct);
+            if (isValid && _options.TrustedVendors.Contains(manifest.Info.Vendor))
+                return PluginTrustLevel.Trusted;
+        }
+
+        // Check trusted plugins list
+        if (_options.TrustedPluginIds.Contains(manifest.Info.Id))
+            return PluginTrustLevel.Trusted;
+
+        // Default to untrusted
+        return PluginTrustLevel.Untrusted;
+    }
+
+    private async Task<bool> VerifySignatureAsync(PluginManifest manifest, CancellationToken ct)
+    {
+        // Signature verification implementation
+        // Uses crypto capability if available
+        await Task.CompletedTask; // Placeholder
+        return false;
+    }
+
+    private IPluginContext CreatePluginContext(PluginManifest manifest, PluginTrustLevel trustLevel)
+    {
+        return new PluginContext(
+            manifest,
+            trustLevel,
+            _options,
+            _logger,
+            _timeProvider,
+            _shutdownCts!.Token);
+    }
+
+    private void OnPluginHealthChanged(object? sender, PluginHealthChangedEventArgs e)
+    {
+        if (_plugins.TryGetValue(e.PluginId, out var plugin))
+        {
+            // Update plugin health status
+            var updated = plugin with { HealthStatus = e.NewStatus, LastHealthCheck = _timeProvider.GetUtcNow() };
+            _plugins[e.PluginId] = updated;
+
+            // Handle unhealthy plugins
+            if (e.NewStatus == HealthStatus.Unhealthy && _options.AutoRecoverUnhealthyPlugins)
+            {
+                _ = Task.Run(async () =>
+                {
+                    try
+                    {
+                        _logger.LogWarning("Plugin {PluginId} unhealthy, attempting recovery", e.PluginId);
+                        await ReloadPluginAsync(e.PluginId, CancellationToken.None);
+                    }
+                    catch (Exception ex)
+                    {
+                        _logger.LogError(ex, "Failed to recover plugin {PluginId}", e.PluginId);
+                    }
+                });
+            }
+        }
+
+        PluginHealthChanged?.Invoke(this, e);
+    }
+
+    private void RaiseStateChanged(string pluginId, PluginLifecycleState oldState, PluginLifecycleState newState, string? reason = null)
+    {
+        PluginStateChanged?.Invoke(this, new PluginStateChangedEventArgs
+        {
+            PluginId = pluginId,
+            OldState = oldState,
+            NewState = newState,
+            Reason = reason
+        });
+    }
+
+    public async ValueTask DisposeAsync()
+    {
+        await StopAsync(CancellationToken.None);
+        _shutdownCts?.Dispose();
+        _loadLock.Dispose();
+    }
+}
+```
+
+### Plugin Loader
+
+```csharp
+// Loading/AssemblyPluginLoader.cs
+namespace StellaOps.Plugin.Host.Loading;
+
+public sealed class AssemblyPluginLoader : IPluginLoader
+{
+    private readonly ConcurrentDictionary<string, PluginAssemblyLoadContext> _loadContexts = new();
+    private readonly ILogger<AssemblyPluginLoader> _logger;
+
+    public AssemblyPluginLoader(ILogger<AssemblyPluginLoader> logger)
+    {
+        _logger = logger;
+    }
+
+    public async Task<PluginLoadResult> LoadAsync(
+        PluginManifest manifest,
+        PluginTrustLevel trustLevel,
+        CancellationToken ct)
+    {
+        var assemblyPath = ResolveAssemblyPath(manifest);
+
+        _logger.LogDebug("Loading plugin assembly from {Path}", assemblyPath);
+
+        // Create isolated load context
+        var loadContext = new PluginAssemblyLoadContext(
+            manifest.Info.Id,
+            assemblyPath,
+            isCollectible: trustLevel != PluginTrustLevel.BuiltIn);
+
+        _loadContexts[manifest.Info.Id] = loadContext;
+
+        try
+        {
+            // Load the assembly
+            var assembly = loadContext.LoadFromAssemblyPath(assemblyPath);
+
+            // Find the entry point type
+            var entryPointType = assembly.GetType(manifest.EntryPoint)
+                ?? throw new PluginLoadException($"Entry point type '{manifest.EntryPoint}' not found");
+
+            // Verify it implements IPlugin
+            if (!typeof(IPlugin).IsAssignableFrom(entryPointType))
+                throw new PluginLoadException($"Entry point type '{manifest.EntryPoint}' does not implement IPlugin");
+
+            // Create instance
+            var instance = Activator.CreateInstance(entryPointType) as IPlugin
+                ?? throw new PluginLoadException($"Failed to create instance of '{manifest.EntryPoint}'");
+
+            return new PluginLoadResult(instance, assembly, loadContext);
+        }
+        catch (Exception ex)
+        {
+            // Cleanup on failure
+            _loadContexts.TryRemove(manifest.Info.Id, out _);
+            loadContext.Unload();
+            throw new PluginLoadException($"Failed to load plugin {manifest.Info.Id}", ex);
+        }
+    }
+
+    public async Task UnloadAsync(string pluginId, CancellationToken ct)
+    {
+        if (_loadContexts.TryRemove(pluginId, out var loadContext))
+        {
+            loadContext.Unload();
+
+            // Wait for GC to collect the assemblies
+            for (int i = 0; i < 10 && loadContext.IsAlive; i++)
+            {
+                GC.Collect();
+                GC.WaitForPendingFinalizers();
+                await Task.Delay(100, ct);
+            }
+
+            if (loadContext.IsAlive)
+            {
+                _logger.LogWarning("Plugin {PluginId} load context still alive after unload", pluginId);
+            }
+        }
+    }
+
+    private static string ResolveAssemblyPath(PluginManifest manifest)
+    {
+        // Implementation to resolve the main assembly path from manifest
+        // Could be relative to manifest location or absolute
+        return manifest.AssemblyPath
+            ?? throw new PluginLoadException("Assembly path not specified in manifest");
+    }
+}
+
+public sealed class PluginAssemblyLoadContext : AssemblyLoadContext
+{
+    private readonly AssemblyDependencyResolver _resolver;
+    private readonly WeakReference _weakReference;
+
+    public bool IsAlive => _weakReference.IsAlive;
+
+    public PluginAssemblyLoadContext(string name, string pluginPath, bool isCollectible)
+        : base(name, isCollectible)
+    {
+        _resolver = new AssemblyDependencyResolver(pluginPath);
+        _weakReference = new WeakReference(this);
+    }
+
+    protected override Assembly? Load(AssemblyName assemblyName)
+    {
+        var assemblyPath = _resolver.ResolveAssemblyToPath(assemblyName);
+        if (assemblyPath != null)
+        {
+            return LoadFromAssemblyPath(assemblyPath);
+        }
+        return null;
+    }
+
+    protected override IntPtr LoadUnmanagedDll(string unmanagedDllName)
+    {
+        var libraryPath = _resolver.ResolveUnmanagedDllToPath(unmanagedDllName);
+        if (libraryPath != null)
+        {
+            return LoadUnmanagedDllFromPath(libraryPath);
+        }
+        return IntPtr.Zero;
+    }
+}
+
+public sealed record PluginLoadResult(
+    IPlugin Instance,
+    Assembly Assembly,
+    AssemblyLoadContext LoadContext);
+
+public class PluginLoadException : Exception
+{
+    public PluginLoadException(string message) : base(message) { }
+    public PluginLoadException(string message, Exception inner) : base(message, inner) { }
+}
+```
+
+### Plugin Discovery
+
+```csharp
+// Discovery/FileSystemPluginDiscovery.cs
+namespace StellaOps.Plugin.Host.Discovery;
+
+public sealed class FileSystemPluginDiscovery : IPluginDiscovery
+{
+    private readonly ILogger<FileSystemPluginDiscovery> _logger;
+    private static readonly string[] ManifestFileNames = ["plugin.yaml", "plugin.yml", "plugin.json"];
+
+    public FileSystemPluginDiscovery(ILogger<FileSystemPluginDiscovery> logger)
+    {
+        _logger = logger;
+    }
+
+    public async Task<IReadOnlyList<PluginManifest>> DiscoverAsync(
+        IEnumerable<string> searchPaths,
+        CancellationToken ct)
+    {
+        var manifests = new List<PluginManifest>();
+
+        foreach (var searchPath in searchPaths)
+        {
+            if (!Directory.Exists(searchPath))
+            {
_logger.LogWarning("Plugin search path does not exist: {Path}", searchPath); + continue; + } + + _logger.LogDebug("Searching for plugins in {Path}", searchPath); + + // Look for plugin directories (contain plugin.yaml/plugin.json) + foreach (var dir in Directory.EnumerateDirectories(searchPath)) + { + ct.ThrowIfCancellationRequested(); + + var manifestPath = FindManifestFile(dir); + if (manifestPath == null) + continue; + + try + { + var manifest = await ParseManifestAsync(manifestPath, ct); + manifests.Add(manifest); + _logger.LogDebug("Discovered plugin {PluginId} at {Path}", manifest.Info.Id, dir); + } + catch (Exception ex) + { + _logger.LogWarning(ex, "Failed to parse manifest at {Path}", manifestPath); + } + } + } + + return manifests; + } + + public async Task DiscoverSingleAsync(PluginSource source, CancellationToken ct) + { + if (source.Type != PluginSourceType.FileSystem) + throw new ArgumentException($"Unsupported source type: {source.Type}"); + + var manifestPath = FindManifestFile(source.Location) + ?? throw new FileNotFoundException($"No plugin manifest found in {source.Location}"); + + return await ParseManifestAsync(manifestPath, ct); + } + + private static string? 
FindManifestFile(string directory) + { + foreach (var fileName in ManifestFileNames) + { + var path = Path.Combine(directory, fileName); + if (File.Exists(path)) + return path; + } + return null; + } + + private static async Task<PluginManifest> ParseManifestAsync(string manifestPath, CancellationToken ct) + { + var content = await File.ReadAllTextAsync(manifestPath, ct); + var extension = Path.GetExtension(manifestPath).ToLowerInvariant(); + + return extension switch + { + ".yaml" or ".yml" => ParseYamlManifest(content, manifestPath), + ".json" => ParseJsonManifest(content, manifestPath), + _ => throw new InvalidOperationException($"Unknown manifest format: {extension}") + }; + } + + private static PluginManifest ParseYamlManifest(string content, string path) + { + var deserializer = new DeserializerBuilder() + .WithNamingConvention(CamelCaseNamingConvention.Instance) + .Build(); + + var manifestDto = deserializer.Deserialize<PluginManifestDto>(content); + return manifestDto.ToManifest(Path.GetDirectoryName(path)!); + } + + private static PluginManifest ParseJsonManifest(string content, string path) + { + var manifestDto = JsonSerializer.Deserialize<PluginManifestDto>(content, new JsonSerializerOptions + { + PropertyNamingPolicy = JsonNamingPolicy.CamelCase + }); + + return manifestDto?.ToManifest(Path.GetDirectoryName(path)!) + ?? throw new InvalidOperationException("Failed to parse manifest JSON"); + } +} +``` + +### Health Monitor + +```csharp +// Health/PluginHealthMonitor.cs +namespace StellaOps.Plugin.Host.Health; + +public sealed class PluginHealthMonitor : IPluginHealthMonitor, IAsyncDisposable +{ + private readonly PluginHostOptions _options; + private readonly ILogger<PluginHealthMonitor> _logger; + private readonly TimeProvider _timeProvider; + private readonly ConcurrentDictionary<string, PluginHealthState> _healthStates = new(); + private readonly Channel<string> _checkQueue; + private Task? _monitorTask; + private CancellationTokenSource? _cts; + + public event EventHandler<PluginHealthChangedEventArgs>? 
HealthChanged; + + public PluginHealthMonitor( + IOptions<PluginHostOptions> options, + ILogger<PluginHealthMonitor> logger, + TimeProvider timeProvider) + { + _options = options.Value; + _logger = logger; + _timeProvider = timeProvider; + _checkQueue = Channel.CreateBounded<string>(new BoundedChannelOptions(100) + { + FullMode = BoundedChannelFullMode.DropOldest + }); + } + + public Task StartAsync(CancellationToken ct) + { + _cts = CancellationTokenSource.CreateLinkedTokenSource(ct); + _monitorTask = Task.Run(() => MonitorLoopAsync(_cts.Token), _cts.Token); + _logger.LogInformation("Plugin health monitor started"); + return Task.CompletedTask; + } + + public async Task StopAsync(CancellationToken ct) + { + _cts?.Cancel(); + if (_monitorTask != null) + { + try + { + await _monitorTask.WaitAsync(ct); + } + catch (OperationCanceledException) { } + } + _logger.LogInformation("Plugin health monitor stopped"); + } + + public void RegisterPlugin(LoadedPlugin plugin) + { + _healthStates[plugin.PluginId] = new PluginHealthState + { + Plugin = plugin, + LastCheck = _timeProvider.GetUtcNow(), + Status = HealthStatus.Healthy, + ConsecutiveFailures = 0 + }; + } + + public void UnregisterPlugin(string pluginId) + { + _healthStates.TryRemove(pluginId, out _); + } + + public async Task<HealthCheckResult> CheckHealthAsync(string pluginId, CancellationToken ct) + { + if (!_healthStates.TryGetValue(pluginId, out var state)) + return HealthCheckResult.Unhealthy("Plugin not registered"); + + return await PerformHealthCheckAsync(state, ct); + } + + private async Task MonitorLoopAsync(CancellationToken ct) + { + var periodicTimer = new PeriodicTimer(_options.HealthCheckInterval); + + while (!ct.IsCancellationRequested) + { + try + { + await periodicTimer.WaitForNextTickAsync(ct); + + // Check all registered plugins + foreach (var kvp in _healthStates) + { + ct.ThrowIfCancellationRequested(); + + var state = kvp.Value; + var timeSinceLastCheck = _timeProvider.GetUtcNow() - state.LastCheck; + + if (timeSinceLastCheck >= _options.HealthCheckInterval) + { + try + { + await 
PerformHealthCheckAsync(state, ct); + } + catch (Exception ex) + { + _logger.LogError(ex, "Health check failed for plugin {PluginId}", kvp.Key); + } + } + } + } + catch (OperationCanceledException) + { + break; + } + catch (Exception ex) + { + _logger.LogError(ex, "Error in health monitor loop"); + } + } + } + + private async Task<HealthCheckResult> PerformHealthCheckAsync(PluginHealthState state, CancellationToken ct) + { + var plugin = state.Plugin; + var stopwatch = Stopwatch.StartNew(); + + try + { + using var timeoutCts = CancellationTokenSource.CreateLinkedTokenSource(ct); + timeoutCts.CancelAfter(_options.HealthCheckTimeout); + + var result = await plugin.Instance.HealthCheckAsync(timeoutCts.Token); + stopwatch.Stop(); + + result = result with { Duration = stopwatch.Elapsed }; + + // Update state + var oldStatus = state.Status; + state.Status = result.Status; + state.LastCheck = _timeProvider.GetUtcNow(); + state.LastResult = result; + + if (result.Status == HealthStatus.Healthy) + { + state.ConsecutiveFailures = 0; + } + else + { + state.ConsecutiveFailures++; + } + + // Raise event if status changed + if (oldStatus != result.Status) + { + HealthChanged?.Invoke(this, new PluginHealthChangedEventArgs + { + PluginId = plugin.PluginId, + OldStatus = oldStatus, + NewStatus = result.Status, + CheckResult = result + }); + } + + return result; + } + catch (OperationCanceledException) + { + var result = HealthCheckResult.Unhealthy("Health check timed out"); + state.ConsecutiveFailures++; + UpdateHealthStatus(state, result); + return result; + } + catch (Exception ex) + { + var result = HealthCheckResult.Unhealthy(ex); + state.ConsecutiveFailures++; + UpdateHealthStatus(state, result); + return result; + } + } + + private void UpdateHealthStatus(PluginHealthState state, HealthCheckResult result) + { + var oldStatus = state.Status; + state.Status = result.Status; + state.LastCheck = _timeProvider.GetUtcNow(); + state.LastResult = result; + + if (oldStatus != result.Status) + { + 
HealthChanged?.Invoke(this, new PluginHealthChangedEventArgs + { + PluginId = state.Plugin.PluginId, + OldStatus = oldStatus, + NewStatus = result.Status, + CheckResult = result + }); + } + } + + public async ValueTask DisposeAsync() + { + await StopAsync(CancellationToken.None); + _cts?.Dispose(); + } + + private sealed class PluginHealthState + { + public required LoadedPlugin Plugin { get; init; } + public DateTimeOffset LastCheck { get; set; } + public HealthStatus Status { get; set; } + public int ConsecutiveFailures { get; set; } + public HealthCheckResult? LastResult { get; set; } + } +} +``` + +### Service Collection Extensions + +```csharp +// Extensions/ServiceCollectionExtensions.cs +namespace StellaOps.Plugin.Host.Extensions; + +public static class ServiceCollectionExtensions +{ + public static IServiceCollection AddPluginHost( + this IServiceCollection services, + IConfiguration configuration) + { + // Bind options + services.Configure<PluginHostOptions>(configuration.GetSection("Plugins")); + + // Core services + services.AddSingleton<IPluginDiscovery, FileSystemPluginDiscovery>(); + services.AddSingleton<IPluginLoader, AssemblyPluginLoader>(); + services.AddSingleton<IPluginLifecycleManager, PluginLifecycleManager>(); + services.AddSingleton<IPluginHealthMonitor, PluginHealthMonitor>(); + services.AddSingleton<IPluginDependencyResolver, PluginDependencyResolver>(); + + // Plugin host + services.AddSingleton<IPluginHost, PluginHost>(); + + // Hosted service to start/stop plugin host + services.AddHostedService<PluginHostedService>(); + + return services; + } + + public static IServiceCollection AddPluginRegistry(this IServiceCollection services) + { + services.AddScoped<IPluginRegistry, PostgresPluginRegistry>(); + return services; + } +} + +public sealed class PluginHostedService : IHostedService +{ + private readonly IPluginHost _pluginHost; + private readonly ILogger<PluginHostedService> _logger; + + public PluginHostedService(IPluginHost pluginHost, ILogger<PluginHostedService> logger) + { + _pluginHost = pluginHost; + _logger = logger; + } + + public async Task StartAsync(CancellationToken ct) + { + _logger.LogInformation("Starting plugin host..."); + await _pluginHost.StartAsync(ct); + } + + public async Task StopAsync(CancellationToken ct) + { + _logger.LogInformation("Stopping plugin host..."); + await 
_pluginHost.StopAsync(ct); + } +} +``` + +--- + +## Acceptance Criteria + +- [ ] `IPluginHost` interface with all methods +- [ ] Plugin discovery from filesystem +- [ ] Plugin discovery from embedded assemblies +- [ ] Assembly loading with `AssemblyLoadContext` isolation +- [ ] Plugin lifecycle state machine +- [ ] Graceful initialization with timeout +- [ ] Graceful shutdown with timeout +- [ ] Health monitoring with configurable interval +- [ ] Health status change events +- [ ] Auto-recovery for unhealthy plugins (optional) +- [ ] Dependency resolution for load order +- [ ] Hot reload support +- [ ] Service collection extensions +- [ ] Integration tests with test plugins +- [ ] Unit tests for all components +- [ ] Test coverage >= 80% + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 100_001 Plugin Abstractions | Internal | TODO | +| .NET 10 | External | Available | +| YamlDotNet | External | Available | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| IPluginHost interface | TODO | | +| PluginHost implementation | TODO | | +| FileSystemPluginDiscovery | TODO | | +| EmbeddedPluginDiscovery | TODO | | +| AssemblyPluginLoader | TODO | | +| PluginAssemblyLoadContext | TODO | | +| PluginLifecycleManager | TODO | | +| PluginHealthMonitor | TODO | | +| PluginDependencyResolver | TODO | | +| PluginContext | TODO | | +| ServiceCollectionExtensions | TODO | | +| Unit tests | TODO | | +| Integration tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_100_003_PLUGIN_registry.md b/docs/implplan/SPRINT_20260110_100_003_PLUGIN_registry.md new file mode 100644 index 000000000..9a7517e63 --- /dev/null +++ b/docs/implplan/SPRINT_20260110_100_003_PLUGIN_registry.md @@ -0,0 +1,762 @@ +# SPRINT: Plugin Registry (Database) + +> **Sprint ID:** 100_003 +> **Module:** PLUGIN +> 
**Phase:** 100 - Plugin System Unification +> **Status:** TODO +> **Parent:** [100_000_INDEX](SPRINT_20260110_100_000_INDEX_plugin_unification.md) + +--- + +## Overview + +Implement the database-backed plugin registry that persists plugin metadata, tracks health status, and supports multi-tenant plugin instances. The registry provides centralized plugin management and enables querying plugins by capability. + +### Objectives + +- Implement PostgreSQL-backed plugin registry +- Implement plugin capability indexing +- Implement tenant-specific plugin instances +- Implement health history tracking +- Implement plugin version management +- Provide migration scripts for schema creation + +### Working Directory + +``` +src/Plugin/ +├── StellaOps.Plugin.Registry/ +│ ├── StellaOps.Plugin.Registry.csproj +│ ├── IPluginRegistry.cs +│ ├── PostgresPluginRegistry.cs +│ ├── Models/ +│ │ ├── PluginRecord.cs +│ │ ├── PluginCapabilityRecord.cs +│ │ ├── PluginInstanceRecord.cs +│ │ └── PluginHealthRecord.cs +│ ├── Queries/ +│ │ ├── PluginQueries.cs +│ │ ├── CapabilityQueries.cs +│ │ └── InstanceQueries.cs +│ └── Migrations/ +│ └── 001_CreatePluginTables.sql +└── __Tests/ + └── StellaOps.Plugin.Registry.Tests/ + ├── PostgresPluginRegistryTests.cs + └── PluginQueryTests.cs +``` + +--- + +## Deliverables + +### Plugin Registry Interface + +```csharp +// IPluginRegistry.cs +namespace StellaOps.Plugin.Registry; + +/// <summary> +/// Database-backed plugin registry for persistent plugin management. +/// </summary> +public interface IPluginRegistry +{ + // ========== Plugin Management ========== + + /// <summary> + /// Register a loaded plugin in the database. + /// </summary> + Task<PluginRecord> RegisterAsync(LoadedPlugin plugin, CancellationToken ct); + + /// <summary> + /// Update plugin status. + /// </summary> + Task UpdateStatusAsync(string pluginId, PluginLifecycleState status, string? message = null, CancellationToken ct = default); + + /// <summary> + /// Update plugin health status. 
+ /// </summary> + Task UpdateHealthAsync(string pluginId, HealthStatus status, HealthCheckResult? result = null, CancellationToken ct = default); + + /// <summary> + /// Unregister a plugin. + /// </summary> + Task UnregisterAsync(string pluginId, CancellationToken ct); + + /// <summary> + /// Get plugin by ID. + /// </summary> + Task<PluginRecord?> GetAsync(string pluginId, CancellationToken ct); + + /// <summary> + /// Get all registered plugins. + /// </summary> + Task<IReadOnlyList<PluginRecord>> GetAllAsync(CancellationToken ct); + + /// <summary> + /// Get plugins by status. + /// </summary> + Task<IReadOnlyList<PluginRecord>> GetByStatusAsync(PluginLifecycleState status, CancellationToken ct); + + // ========== Capability Queries ========== + + /// <summary> + /// Get plugins with a specific capability. + /// </summary> + Task<IReadOnlyList<PluginRecord>> GetByCapabilityAsync(PluginCapabilities capability, CancellationToken ct); + + /// <summary> + /// Get plugins providing a specific capability type/id. + /// </summary> + Task<IReadOnlyList<PluginRecord>> GetByCapabilityTypeAsync(string capabilityType, string? capabilityId = null, CancellationToken ct = default); + + /// <summary> + /// Register plugin capabilities. + /// </summary> + Task RegisterCapabilitiesAsync(Guid pluginDbId, IEnumerable<PluginCapabilityRecord> capabilities, CancellationToken ct); + + // ========== Instance Management ========== + + /// <summary> + /// Create a tenant-specific plugin instance. + /// </summary> + Task<PluginInstanceRecord> CreateInstanceAsync(CreatePluginInstanceRequest request, CancellationToken ct); + + /// <summary> + /// Get plugin instance. + /// </summary> + Task<PluginInstanceRecord?> GetInstanceAsync(Guid instanceId, CancellationToken ct); + + /// <summary> + /// Get instances for a tenant. + /// </summary> + Task<IReadOnlyList<PluginInstanceRecord>> GetInstancesForTenantAsync(Guid tenantId, CancellationToken ct); + + /// <summary> + /// Get instances for a plugin. + /// </summary> + Task<IReadOnlyList<PluginInstanceRecord>> GetInstancesForPluginAsync(string pluginId, CancellationToken ct); + + /// <summary> + /// Update instance configuration. + /// </summary> + Task UpdateInstanceConfigAsync(Guid instanceId, JsonDocument config, CancellationToken ct); + + /// <summary> + /// Enable/disable instance. + /// </summary> + Task SetInstanceEnabledAsync(Guid instanceId, bool enabled, CancellationToken ct); + + /// <summary> + /// Delete instance. 
+ /// </summary> + Task DeleteInstanceAsync(Guid instanceId, CancellationToken ct); + + // ========== Health History ========== + + /// <summary> + /// Record health check result. + /// </summary> + Task RecordHealthCheckAsync(string pluginId, HealthCheckResult result, CancellationToken ct); + + /// <summary> + /// Get health history for a plugin. + /// </summary> + Task<IReadOnlyList<PluginHealthRecord>> GetHealthHistoryAsync( + string pluginId, + DateTimeOffset since, + int limit = 100, + CancellationToken ct = default); +} + +public sealed record CreatePluginInstanceRequest( + string PluginId, + Guid? TenantId, + string? InstanceName, + JsonDocument Config, + string? SecretsPath = null, + JsonDocument? ResourceLimits = null); +``` + +### PostgreSQL Implementation + +```csharp +// PostgresPluginRegistry.cs +namespace StellaOps.Plugin.Registry; + +public sealed class PostgresPluginRegistry : IPluginRegistry +{ + private readonly NpgsqlDataSource _dataSource; + private readonly ILogger<PostgresPluginRegistry> _logger; + private readonly TimeProvider _timeProvider; + + public PostgresPluginRegistry( + NpgsqlDataSource dataSource, + ILogger<PostgresPluginRegistry> logger, + TimeProvider timeProvider) + { + _dataSource = dataSource; + _logger = logger; + _timeProvider = timeProvider; + } + + public async Task<PluginRecord> RegisterAsync(LoadedPlugin plugin, CancellationToken ct) + { + const string sql = """ + INSERT INTO platform.plugins ( + plugin_id, name, version, vendor, description, license_id, + trust_level, capabilities, capability_details, source, + assembly_path, entry_point, status, manifest, created_at, updated_at, loaded_at + ) VALUES ( + @plugin_id, @name, @version, @vendor, @description, @license_id, + @trust_level, @capabilities, @capability_details, @source, + @assembly_path, @entry_point, @status, @manifest, @now, @now, @now + ) + ON CONFLICT (plugin_id, version) DO UPDATE SET + status = @status, + updated_at = @now, + loaded_at = @now + RETURNING * + """; + + await using var conn = await _dataSource.OpenConnectionAsync(ct); + await using var cmd = new NpgsqlCommand(sql, conn); + + var now = 
_timeProvider.GetUtcNow(); + cmd.Parameters.AddWithValue("plugin_id", plugin.Info.Id); + cmd.Parameters.AddWithValue("name", plugin.Info.Name); + cmd.Parameters.AddWithValue("version", plugin.Info.Version); + cmd.Parameters.AddWithValue("vendor", plugin.Info.Vendor); + cmd.Parameters.AddWithValue("description", (object?)plugin.Info.Description ?? DBNull.Value); + cmd.Parameters.AddWithValue("license_id", (object?)plugin.Info.LicenseId ?? DBNull.Value); + cmd.Parameters.AddWithValue("trust_level", plugin.TrustLevel.ToString().ToLowerInvariant()); + cmd.Parameters.AddWithValue("capabilities", plugin.Capabilities.ToStringArray()); + cmd.Parameters.AddWithValue("capability_details", JsonSerializer.Serialize(new { })); + cmd.Parameters.AddWithValue("source", "installed"); + cmd.Parameters.AddWithValue("assembly_path", (object?)plugin.Manifest?.AssemblyPath ?? DBNull.Value); + cmd.Parameters.AddWithValue("entry_point", (object?)plugin.Manifest?.EntryPoint ?? DBNull.Value); + cmd.Parameters.AddWithValue("status", plugin.State.ToString().ToLowerInvariant()); + cmd.Parameters.AddWithValue("manifest", plugin.Manifest != null + ? JsonSerializer.Serialize(plugin.Manifest) + : DBNull.Value); + cmd.Parameters.AddWithValue("now", now); + + await using var reader = await cmd.ExecuteReaderAsync(ct); + if (await reader.ReadAsync(ct)) + { + var record = MapPluginRecord(reader); + + // Register capabilities + if (plugin.Manifest?.Capabilities.Count > 0) + { + var capRecords = plugin.Manifest.Capabilities.Select(c => new PluginCapabilityRecord + { + Id = Guid.NewGuid(), + PluginId = record.Id, + CapabilityType = c.Type, + CapabilityId = c.Id ?? 
c.Type, + ConfigSchema = c.ConfigSchema, + Metadata = c.Metadata, + IsEnabled = true, + CreatedAt = now + }); + + await RegisterCapabilitiesAsync(record.Id, capRecords, ct); + } + + _logger.LogDebug("Registered plugin {PluginId} with DB ID {DbId}", plugin.Info.Id, record.Id); + return record; + } + + throw new InvalidOperationException($"Failed to register plugin {plugin.Info.Id}"); + } + + public async Task UpdateStatusAsync(string pluginId, PluginLifecycleState status, string? message = null, CancellationToken ct = default) + { + const string sql = """ + UPDATE platform.plugins + SET status = @status, status_message = @message, updated_at = @now + WHERE plugin_id = @plugin_id + """; + + await using var conn = await _dataSource.OpenConnectionAsync(ct); + await using var cmd = new NpgsqlCommand(sql, conn); + + cmd.Parameters.AddWithValue("plugin_id", pluginId); + cmd.Parameters.AddWithValue("status", status.ToString().ToLowerInvariant()); + cmd.Parameters.AddWithValue("message", (object?)message ?? DBNull.Value); + cmd.Parameters.AddWithValue("now", _timeProvider.GetUtcNow()); + + await cmd.ExecuteNonQueryAsync(ct); + } + + public async Task UpdateHealthAsync(string pluginId, HealthStatus status, HealthCheckResult? 
result = null, CancellationToken ct = default) + { + const string sql = """ + UPDATE platform.plugins + SET health_status = @health_status, last_health_check = @now, updated_at = @now + WHERE plugin_id = @plugin_id + """; + + await using var conn = await _dataSource.OpenConnectionAsync(ct); + await using var cmd = new NpgsqlCommand(sql, conn); + + var now = _timeProvider.GetUtcNow(); + cmd.Parameters.AddWithValue("plugin_id", pluginId); + cmd.Parameters.AddWithValue("health_status", status.ToString().ToLowerInvariant()); + cmd.Parameters.AddWithValue("now", now); + + await cmd.ExecuteNonQueryAsync(ct); + + // Record health history + if (result != null) + { + await RecordHealthCheckAsync(pluginId, result, ct); + } + } + + public async Task UnregisterAsync(string pluginId, CancellationToken ct) + { + const string sql = "DELETE FROM platform.plugins WHERE plugin_id = @plugin_id"; + + await using var conn = await _dataSource.OpenConnectionAsync(ct); + await using var cmd = new NpgsqlCommand(sql, conn); + + cmd.Parameters.AddWithValue("plugin_id", pluginId); + await cmd.ExecuteNonQueryAsync(ct); + + _logger.LogDebug("Unregistered plugin {PluginId}", pluginId); + } + + public async Task<PluginRecord?> GetAsync(string pluginId, CancellationToken ct) + { + const string sql = "SELECT * FROM platform.plugins WHERE plugin_id = @plugin_id"; + + await using var conn = await _dataSource.OpenConnectionAsync(ct); + await using var cmd = new NpgsqlCommand(sql, conn); + + cmd.Parameters.AddWithValue("plugin_id", pluginId); + + await using var reader = await cmd.ExecuteReaderAsync(ct); + return await reader.ReadAsync(ct) ? 
MapPluginRecord(reader) : null; + } + + public async Task<IReadOnlyList<PluginRecord>> GetAllAsync(CancellationToken ct) + { + const string sql = "SELECT * FROM platform.plugins ORDER BY name"; + + await using var conn = await _dataSource.OpenConnectionAsync(ct); + await using var cmd = new NpgsqlCommand(sql, conn); + + var results = new List<PluginRecord>(); + await using var reader = await cmd.ExecuteReaderAsync(ct); + + while (await reader.ReadAsync(ct)) + { + results.Add(MapPluginRecord(reader)); + } + + return results; + } + + public async Task<IReadOnlyList<PluginRecord>> GetByCapabilityAsync(PluginCapabilities capability, CancellationToken ct) + { + var capabilityStrings = capability.ToStringArray(); + + const string sql = """ + SELECT * FROM platform.plugins + WHERE capabilities && @capabilities + AND status = 'active' + ORDER BY name + """; + + await using var conn = await _dataSource.OpenConnectionAsync(ct); + await using var cmd = new NpgsqlCommand(sql, conn); + + cmd.Parameters.AddWithValue("capabilities", capabilityStrings); + + var results = new List<PluginRecord>(); + await using var reader = await cmd.ExecuteReaderAsync(ct); + + while (await reader.ReadAsync(ct)) + { + results.Add(MapPluginRecord(reader)); + } + + return results; + } + + public async Task<IReadOnlyList<PluginRecord>> GetByCapabilityTypeAsync( + string capabilityType, + string? 
capabilityId = null, + CancellationToken ct = default) + { + var sql = """ + SELECT p.* FROM platform.plugins p + INNER JOIN platform.plugin_capabilities c ON c.plugin_id = p.id + WHERE c.capability_type = @capability_type + AND c.is_enabled = TRUE + AND p.status = 'active' + """; + + if (capabilityId != null) + { + sql += " AND c.capability_id = @capability_id"; + } + + sql += " ORDER BY p.name"; + + await using var conn = await _dataSource.OpenConnectionAsync(ct); + await using var cmd = new NpgsqlCommand(sql, conn); + + cmd.Parameters.AddWithValue("capability_type", capabilityType); + if (capabilityId != null) + { + cmd.Parameters.AddWithValue("capability_id", capabilityId); + } + + var results = new List<PluginRecord>(); + await using var reader = await cmd.ExecuteReaderAsync(ct); + + while (await reader.ReadAsync(ct)) + { + results.Add(MapPluginRecord(reader)); + } + + return results; + } + + public async Task RegisterCapabilitiesAsync( + Guid pluginDbId, + IEnumerable<PluginCapabilityRecord> capabilities, + CancellationToken ct) + { + const string sql = """ + INSERT INTO platform.plugin_capabilities ( + id, plugin_id, capability_type, capability_id, + config_schema, metadata, is_enabled, created_at + ) VALUES ( + @id, @plugin_id, @capability_type, @capability_id, + @config_schema, @metadata, @is_enabled, @created_at + ) + ON CONFLICT (plugin_id, capability_type, capability_id) DO UPDATE SET + config_schema = @config_schema, + metadata = @metadata + """; + + await using var conn = await _dataSource.OpenConnectionAsync(ct); + await using var batch = new NpgsqlBatch(conn); + + foreach (var cap in capabilities) + { + var cmd = new NpgsqlBatchCommand(sql); + cmd.Parameters.AddWithValue("id", cap.Id); + cmd.Parameters.AddWithValue("plugin_id", pluginDbId); + cmd.Parameters.AddWithValue("capability_type", cap.CapabilityType); + cmd.Parameters.AddWithValue("capability_id", cap.CapabilityId); + cmd.Parameters.AddWithValue("config_schema", cap.ConfigSchema != null + ? 
JsonSerializer.Serialize(cap.ConfigSchema) + : DBNull.Value); + cmd.Parameters.AddWithValue("metadata", cap.Metadata != null + ? JsonSerializer.Serialize(cap.Metadata) + : DBNull.Value); + cmd.Parameters.AddWithValue("is_enabled", cap.IsEnabled); + cmd.Parameters.AddWithValue("created_at", cap.CreatedAt); + batch.BatchCommands.Add(cmd); + } + + await batch.ExecuteNonQueryAsync(ct); + } + + // ========== Instance Management ========== + + public async Task CreateInstanceAsync(CreatePluginInstanceRequest request, CancellationToken ct) + { + const string sql = """ + INSERT INTO platform.plugin_instances ( + plugin_id, tenant_id, instance_name, config, secrets_path, + resource_limits, enabled, status, created_at, updated_at + ) + SELECT p.id, @tenant_id, @instance_name, @config, @secrets_path, + @resource_limits, TRUE, 'pending', @now, @now + FROM platform.plugins p + WHERE p.plugin_id = @plugin_id + RETURNING * + """; + + await using var conn = await _dataSource.OpenConnectionAsync(ct); + await using var cmd = new NpgsqlCommand(sql, conn); + + var now = _timeProvider.GetUtcNow(); + cmd.Parameters.AddWithValue("plugin_id", request.PluginId); + cmd.Parameters.AddWithValue("tenant_id", (object?)request.TenantId ?? DBNull.Value); + cmd.Parameters.AddWithValue("instance_name", (object?)request.InstanceName ?? DBNull.Value); + cmd.Parameters.AddWithValue("config", JsonSerializer.Serialize(request.Config)); + cmd.Parameters.AddWithValue("secrets_path", (object?)request.SecretsPath ?? DBNull.Value); + cmd.Parameters.AddWithValue("resource_limits", request.ResourceLimits != null + ? 
JsonSerializer.Serialize(request.ResourceLimits) + : DBNull.Value); + cmd.Parameters.AddWithValue("now", now); + + await using var reader = await cmd.ExecuteReaderAsync(ct); + if (await reader.ReadAsync(ct)) + { + return MapInstanceRecord(reader); + } + + throw new InvalidOperationException($"Failed to create instance for plugin {request.PluginId}"); + } + + public async Task RecordHealthCheckAsync(string pluginId, HealthCheckResult result, CancellationToken ct) + { + const string sql = """ + INSERT INTO platform.plugin_health_history ( + plugin_id, checked_at, status, response_time_ms, details, created_at + ) + SELECT p.id, @checked_at, @status, @response_time_ms, @details, @checked_at + FROM platform.plugins p + WHERE p.plugin_id = @plugin_id + """; + + await using var conn = await _dataSource.OpenConnectionAsync(ct); + await using var cmd = new NpgsqlCommand(sql, conn); + + cmd.Parameters.AddWithValue("plugin_id", pluginId); + cmd.Parameters.AddWithValue("checked_at", _timeProvider.GetUtcNow()); + cmd.Parameters.AddWithValue("status", result.Status.ToString().ToLowerInvariant()); + cmd.Parameters.AddWithValue("response_time_ms", result.Duration?.TotalMilliseconds ?? 0); + cmd.Parameters.AddWithValue("details", result.Details != null + ? JsonSerializer.Serialize(result.Details) + : DBNull.Value); + + await cmd.ExecuteNonQueryAsync(ct); + } + + // ... additional method implementations ... + + private static PluginRecord MapPluginRecord(NpgsqlDataReader reader) => new() + { + Id = reader.GetGuid(reader.GetOrdinal("id")), + PluginId = reader.GetString(reader.GetOrdinal("plugin_id")), + Name = reader.GetString(reader.GetOrdinal("name")), + Version = reader.GetString(reader.GetOrdinal("version")), + Vendor = reader.GetString(reader.GetOrdinal("vendor")), + Description = reader.IsDBNull(reader.GetOrdinal("description")) ? 
null : reader.GetString(reader.GetOrdinal("description")), + TrustLevel = Enum.Parse<PluginTrustLevel>(reader.GetString(reader.GetOrdinal("trust_level")), ignoreCase: true), + Capabilities = PluginCapabilitiesExtensions.FromStringArray(reader.GetFieldValue<string[]>(reader.GetOrdinal("capabilities"))), + Status = Enum.Parse<PluginLifecycleState>(reader.GetString(reader.GetOrdinal("status")), ignoreCase: true), + HealthStatus = reader.IsDBNull(reader.GetOrdinal("health_status")) + ? HealthStatus.Unknown + : Enum.Parse<HealthStatus>(reader.GetString(reader.GetOrdinal("health_status")), ignoreCase: true), + CreatedAt = reader.GetFieldValue<DateTimeOffset>(reader.GetOrdinal("created_at")), + UpdatedAt = reader.GetFieldValue<DateTimeOffset>(reader.GetOrdinal("updated_at")), + LoadedAt = reader.IsDBNull(reader.GetOrdinal("loaded_at")) ? null : reader.GetFieldValue<DateTimeOffset>(reader.GetOrdinal("loaded_at")) + }; + + private static PluginInstanceRecord MapInstanceRecord(NpgsqlDataReader reader) => new() + { + Id = reader.GetGuid(reader.GetOrdinal("id")), + PluginId = reader.GetGuid(reader.GetOrdinal("plugin_id")), + TenantId = reader.IsDBNull(reader.GetOrdinal("tenant_id")) ? null : reader.GetGuid(reader.GetOrdinal("tenant_id")), + InstanceName = reader.IsDBNull(reader.GetOrdinal("instance_name")) ? null : reader.GetString(reader.GetOrdinal("instance_name")), + Config = JsonDocument.Parse(reader.GetString(reader.GetOrdinal("config"))), + SecretsPath = reader.IsDBNull(reader.GetOrdinal("secrets_path")) ? 
null : reader.GetString(reader.GetOrdinal("secrets_path")), + Enabled = reader.GetBoolean(reader.GetOrdinal("enabled")), + Status = reader.GetString(reader.GetOrdinal("status")), + CreatedAt = reader.GetFieldValue(reader.GetOrdinal("created_at")), + UpdatedAt = reader.GetFieldValue(reader.GetOrdinal("updated_at")) + }; +} +``` + +### Database Migration + +```sql +-- Migrations/001_CreatePluginTables.sql + +-- Plugin registry table +CREATE TABLE IF NOT EXISTS platform.plugins ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + plugin_id VARCHAR(255) NOT NULL, + name VARCHAR(255) NOT NULL, + version VARCHAR(50) NOT NULL, + vendor VARCHAR(255) NOT NULL, + description TEXT, + license_id VARCHAR(50), + + -- Trust and security + trust_level VARCHAR(50) NOT NULL CHECK (trust_level IN ('builtin', 'trusted', 'untrusted')), + signature BYTEA, + signing_key_id VARCHAR(255), + + -- Capabilities + capabilities TEXT[] NOT NULL DEFAULT '{}', + capability_details JSONB NOT NULL DEFAULT '{}', + + -- Source and deployment + source VARCHAR(50) NOT NULL CHECK (source IN ('bundled', 'installed', 'discovered')), + assembly_path VARCHAR(500), + entry_point VARCHAR(255), + + -- Lifecycle + status VARCHAR(50) NOT NULL DEFAULT 'discovered' CHECK (status IN ( + 'discovered', 'loading', 'initializing', 'active', + 'degraded', 'stopping', 'stopped', 'failed', 'unloading' + )), + status_message TEXT, + + -- Health + health_status VARCHAR(50) DEFAULT 'unknown' CHECK (health_status IN ( + 'unknown', 'healthy', 'degraded', 'unhealthy' + )), + last_health_check TIMESTAMPTZ, + health_check_failures INT NOT NULL DEFAULT 0, + + -- Metadata + manifest JSONB, + runtime_info JSONB, + + -- Audit + created_at TIMESTAMPTZ NOT NULL DEFAULT now(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT now(), + loaded_at TIMESTAMPTZ, + + UNIQUE(plugin_id, version) +); + +-- Plugin capabilities +CREATE TABLE IF NOT EXISTS platform.plugin_capabilities ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + plugin_id UUID NOT 
NULL REFERENCES platform.plugins(id) ON DELETE CASCADE, + + capability_type VARCHAR(100) NOT NULL, + capability_id VARCHAR(255) NOT NULL, + + config_schema JSONB, + input_schema JSONB, + output_schema JSONB, + + display_name VARCHAR(255), + description TEXT, + documentation_url VARCHAR(500), + + is_enabled BOOLEAN NOT NULL DEFAULT TRUE, + created_at TIMESTAMPTZ NOT NULL DEFAULT now(), + + UNIQUE(plugin_id, capability_type, capability_id) +); + +-- Plugin instances (for multi-tenant) +CREATE TABLE IF NOT EXISTS platform.plugin_instances ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + plugin_id UUID NOT NULL REFERENCES platform.plugins(id) ON DELETE CASCADE, + tenant_id UUID REFERENCES platform.tenants(id) ON DELETE CASCADE, + + instance_name VARCHAR(255), + config JSONB NOT NULL DEFAULT '{}', + secrets_path VARCHAR(500), + + enabled BOOLEAN NOT NULL DEFAULT TRUE, + status VARCHAR(50) NOT NULL DEFAULT 'pending', + + resource_limits JSONB, + + last_used_at TIMESTAMPTZ, + invocation_count BIGINT NOT NULL DEFAULT 0, + error_count BIGINT NOT NULL DEFAULT 0, + + created_at TIMESTAMPTZ NOT NULL DEFAULT now(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT now() +); + +-- Uniqueness over an expression requires a unique index, not a table constraint +CREATE UNIQUE INDEX IF NOT EXISTS uq_plugin_instances_identity + ON platform.plugin_instances (plugin_id, tenant_id, COALESCE(instance_name, '')) NULLS NOT DISTINCT; + +-- Plugin health history (partitioned; the primary key must include the partition key) +CREATE TABLE IF NOT EXISTS platform.plugin_health_history ( + id UUID NOT NULL DEFAULT gen_random_uuid(), + plugin_id UUID NOT NULL REFERENCES platform.plugins(id) ON DELETE CASCADE, + + checked_at TIMESTAMPTZ NOT NULL DEFAULT now(), + status VARCHAR(50) NOT NULL, + response_time_ms INT, + details JSONB, + + created_at TIMESTAMPTZ NOT NULL DEFAULT now(), + + PRIMARY KEY (id, created_at) +) PARTITION BY RANGE (created_at); + +-- Partition bounds must be literal constants; a maintenance job creates and rolls +-- partitions forward (example initial partition shown) +CREATE TABLE IF NOT EXISTS platform.plugin_health_history_2026_01 + PARTITION OF platform.plugin_health_history + FOR VALUES FROM ('2026-01-01') TO ('2026-02-01'); + +-- Indexes +CREATE INDEX IF NOT EXISTS 
idx_plugins_plugin_id ON platform.plugins(plugin_id); +CREATE INDEX IF NOT EXISTS idx_plugins_status ON platform.plugins(status) WHERE status != 'active'; +CREATE INDEX IF NOT EXISTS idx_plugins_trust_level ON platform.plugins(trust_level); +CREATE INDEX IF NOT EXISTS idx_plugins_capabilities ON platform.plugins USING GIN (capabilities); +CREATE INDEX IF NOT EXISTS idx_plugins_health ON platform.plugins(health_status) WHERE health_status != 'healthy'; + +CREATE INDEX IF NOT EXISTS idx_plugin_capabilities_type ON platform.plugin_capabilities(capability_type); +CREATE INDEX IF NOT EXISTS idx_plugin_capabilities_lookup ON platform.plugin_capabilities(capability_type, capability_id); +CREATE INDEX IF NOT EXISTS idx_plugin_capabilities_plugin ON platform.plugin_capabilities(plugin_id); + +CREATE INDEX IF NOT EXISTS idx_plugin_instances_tenant ON platform.plugin_instances(tenant_id) WHERE tenant_id IS NOT NULL; +CREATE INDEX IF NOT EXISTS idx_plugin_instances_plugin ON platform.plugin_instances(plugin_id); +CREATE INDEX IF NOT EXISTS idx_plugin_instances_enabled ON platform.plugin_instances(plugin_id, enabled) WHERE enabled = TRUE; + +CREATE INDEX IF NOT EXISTS idx_plugin_health_history_plugin ON platform.plugin_health_history(plugin_id, checked_at DESC); +``` + +--- + +## Acceptance Criteria + +- [ ] `IPluginRegistry` interface with all methods +- [ ] PostgreSQL implementation +- [ ] Plugin registration and unregistration +- [ ] Status updates +- [ ] Health updates and history +- [ ] Capability registration and queries +- [ ] Capability type/id lookup +- [ ] Instance creation +- [ ] Instance configuration updates +- [ ] Instance enable/disable +- [ ] Tenant-scoped instance queries +- [ ] Database migration scripts +- [ ] Partitioned health history table +- [ ] Integration tests with PostgreSQL +- [ ] Test coverage >= 80% + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 100_001 Plugin Abstractions | Internal | TODO | +| 
100_002 Plugin Host | Internal | TODO | +| PostgreSQL 16+ | External | Available | +| Npgsql 8.x | External | Available | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| IPluginRegistry interface | TODO | | +| PostgresPluginRegistry | TODO | | +| PluginRecord model | TODO | | +| PluginCapabilityRecord model | TODO | | +| PluginInstanceRecord model | TODO | | +| PluginHealthRecord model | TODO | | +| Database migration | TODO | | +| Integration tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_100_004_PLUGIN_sandbox.md b/docs/implplan/SPRINT_20260110_100_004_PLUGIN_sandbox.md new file mode 100644 index 000000000..0d1da0b94 --- /dev/null +++ b/docs/implplan/SPRINT_20260110_100_004_PLUGIN_sandbox.md @@ -0,0 +1,1134 @@ +# SPRINT: Plugin Sandbox Infrastructure + +> **Sprint ID:** 100_004 +> **Module:** PLUGIN +> **Phase:** 100 - Plugin System Unification +> **Status:** TODO +> **Parent:** [100_000_INDEX](SPRINT_20260110_100_000_INDEX_plugin_unification.md) + +--- + +## Overview + +Implement the plugin sandbox infrastructure that provides process isolation, resource limits, and security boundaries for untrusted plugins. The sandbox ensures that third-party plugins cannot compromise platform stability or security. 
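From the host's perspective, the flow this sprint targets (create, limit, execute, tear down) can be sketched as follows. This is a hypothetical sketch against the `ISandbox` and `SandboxConfiguration` deliverables specified below; `factory.CreateAsync` is an assumed shape, since the sprint only names the `ISandboxFactory.cs` file.

```csharp
// Hypothetical host-side flow; ISandboxFactory's CreateAsync signature is an
// assumption, as only the file name is specified in this sprint.
await using var sandbox = await factory.CreateAsync(
    "sbx-001", SandboxConfiguration.Default, ct);

// Resource warnings surface before hard limits are hit.
sandbox.ResourceWarning += (_, e) =>
    logger.LogWarning("Sandbox resource {Resource} at {Percent:F0}%",
        e.Resource, e.CurrentUsagePercent);

await sandbox.StartAsync(manifest, ct);
var result = await sandbox.ExecuteAsync("scan", parameters: null,
    TimeSpan.FromSeconds(30), ct);
await sandbox.StopAsync(TimeSpan.FromSeconds(10), ct);
```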
+
+### Objectives
+
+- Implement process-based plugin isolation
+- Implement resource limits (CPU, memory, disk, network)
+- Implement gRPC communication bridge
+- Implement network policy enforcement
+- Implement filesystem isolation
+- Implement secret proxy for controlled vault access
+
+### Working Directory
+
+```
+src/Plugin/
+├── StellaOps.Plugin.Sandbox/
+│   ├── StellaOps.Plugin.Sandbox.csproj
+│   ├── ISandbox.cs
+│   ├── ISandboxFactory.cs
+│   ├── ProcessSandbox.cs
+│   ├── SandboxConfiguration.cs
+│   ├── Process/
+│   │   ├── PluginProcessManager.cs
+│   │   ├── PluginProcessHost.cs
+│   │   └── ProcessMonitor.cs
+│   ├── Communication/
+│   │   ├── GrpcPluginBridge.cs
+│   │   ├── PluginServiceImpl.cs
+│   │   └── Proto/
+│   │       └── plugin_bridge.proto
+│   ├── Resources/
+│   │   ├── IResourceLimiter.cs
+│   │   ├── LinuxResourceLimiter.cs
+│   │   ├── WindowsResourceLimiter.cs
+│   │   └── ResourceUsage.cs
+│   ├── Network/
+│   │   ├── INetworkPolicy.cs
+│   │   ├── NetworkPolicyEnforcer.cs
+│   │   └── AllowedHostsFilter.cs
+│   ├── Filesystem/
+│   │   ├── IFilesystemPolicy.cs
+│   │   ├── SandboxedFilesystem.cs
+│   │   └── FilesystemMount.cs
+│   └── Secrets/
+│       ├── ISecretProxy.cs
+│       └── ScopedSecretProxy.cs
+├── StellaOps.Plugin.Sandbox.Host/
+│   ├── StellaOps.Plugin.Sandbox.Host.csproj
+│   ├── Program.cs
+│   └── PluginHostService.cs
+└── __Tests/
+    └── StellaOps.Plugin.Sandbox.Tests/
+        ├── ProcessSandboxTests.cs
+        ├── ResourceLimiterTests.cs
+        └── NetworkPolicyTests.cs
+```
+
+---
+
+## Deliverables
+
+### Sandbox Interface
+
+```csharp
+// ISandbox.cs
+namespace StellaOps.Plugin.Sandbox;
+
+/// <summary>
+/// Provides an isolated execution environment for untrusted plugins.
+/// </summary>
+public interface ISandbox : IAsyncDisposable
+{
+    /// <summary>
+    /// Sandbox identifier.
+    /// </summary>
+    string Id { get; }
+
+    /// <summary>
+    /// Current sandbox state.
+    /// </summary>
+    SandboxState State { get; }
+
+    /// <summary>
+    /// Current resource usage.
+    /// </summary>
+    ResourceUsage CurrentUsage { get; }
+
+    /// <summary>
+    /// Start the sandbox and load the plugin.
+    /// </summary>
+    Task StartAsync(PluginManifest manifest, CancellationToken ct);
+
+    /// <summary>
+    /// Stop the sandbox gracefully.
+    /// </summary>
+    Task StopAsync(TimeSpan timeout, CancellationToken ct);
+
+    /// <summary>
+    /// Execute an operation in the sandbox.
+    /// </summary>
+    Task<object?> ExecuteAsync(
+        string operationName,
+        object? parameters,
+        TimeSpan timeout,
+        CancellationToken ct);
+
+    /// <summary>
+    /// Execute a streaming operation in the sandbox.
+    /// </summary>
+    IAsyncEnumerable<object> ExecuteStreamingAsync(
+        string operationName,
+        object? parameters,
+        CancellationToken ct);
+
+    /// <summary>
+    /// Perform a health check on the sandboxed plugin.
+    /// </summary>
+    Task<HealthCheckResult> HealthCheckAsync(CancellationToken ct);
+
+    /// <summary>
+    /// Event raised when the sandbox state changes.
+    /// </summary>
+    event EventHandler<SandboxStateChangedEventArgs>? StateChanged;
+
+    /// <summary>
+    /// Event raised when resource limits are approached.
+    /// </summary>
+    event EventHandler<ResourceWarningEventArgs>? ResourceWarning;
+}
+
+public enum SandboxState
+{
+    Created,
+    Starting,
+    Running,
+    Stopping,
+    Stopped,
+    Failed,
+    Killed
+}
+
+public sealed class SandboxStateChangedEventArgs : EventArgs
+{
+    public required SandboxState OldState { get; init; }
+    public required SandboxState NewState { get; init; }
+    public string? Reason { get; init; }
+}
+
+public sealed class ResourceWarningEventArgs : EventArgs
+{
+    public required ResourceType Resource { get; init; }
+    public required double CurrentUsagePercent { get; init; }
+    public required double ThresholdPercent { get; init; }
+}
+
+public enum ResourceType
+{
+    Memory,
+    Cpu,
+    Disk,
+    Network
+}
+```
+
+### Sandbox Configuration
+
+```csharp
+// SandboxConfiguration.cs
+namespace StellaOps.Plugin.Sandbox;
+
+/// <summary>
+/// Configuration for the plugin sandbox.
+/// </summary>
+public sealed record SandboxConfiguration
+{
+    /// <summary>
+    /// Resource limits for the sandbox.
+    /// </summary>
+    public required ResourceLimits ResourceLimits { get; init; }
+
+    /// <summary>
+    /// Network policy for the sandbox.
+    /// </summary>
+    public required NetworkPolicy NetworkPolicy { get; init; }
+
+    /// <summary>
+    /// Filesystem policy for the sandbox.
+    /// </summary>
+    public required FilesystemPolicy FilesystemPolicy { get; init; }
+
+    /// <summary>
+    /// Timeouts for sandbox operations.
+    /// </summary>
+    public required SandboxTimeouts Timeouts { get; init; }
+
+    /// <summary>
+    /// Whether to enable process isolation.
+    /// </summary>
+    public bool ProcessIsolation { get; init; } = true;
+
+    /// <summary>
+    /// Working directory for the sandbox.
+    /// </summary>
+    public string? WorkingDirectory { get; init; }
+
+    /// <summary>
+    /// Environment variables to pass to the sandbox.
+    /// </summary>
+    public IReadOnlyDictionary<string, string> EnvironmentVariables { get; init; } =
+        new Dictionary<string, string>();
+
+    /// <summary>
+    /// Default configuration for untrusted plugins.
+    /// </summary>
+    public static SandboxConfiguration Default => new()
+    {
+        ResourceLimits = new ResourceLimits
+        {
+            MaxMemoryMb = 512,
+            MaxCpuPercent = 25,
+            MaxDiskMb = 100,
+            MaxNetworkBandwidthMbps = 10
+        },
+        NetworkPolicy = new NetworkPolicy
+        {
+            AllowedHosts = new HashSet<string>(),
+            BlockedPorts = new HashSet<int> { 22, 3389, 5432, 27017, 6379 }
+        },
+        FilesystemPolicy = new FilesystemPolicy
+        {
+            ReadOnlyPaths = new List<string>(),
+            WritablePaths = new List<string>(),
+            BlockedPaths = new List<string> { "/etc", "/var", "/root", "C:\\Windows" }
+        },
+        Timeouts = new SandboxTimeouts
+        {
+            StartupTimeout = TimeSpan.FromSeconds(30),
+            OperationTimeout = TimeSpan.FromSeconds(60),
+            ShutdownTimeout = TimeSpan.FromSeconds(10),
+            HealthCheckTimeout = TimeSpan.FromSeconds(5)
+        }
+    };
+}
+
+public sealed record ResourceLimits
+{
+    public int MaxMemoryMb { get; init; } = 512;
+    public int MaxCpuPercent { get; init; } = 25;
+    public int MaxDiskMb { get; init; } = 100;
+    public int MaxNetworkBandwidthMbps { get; init; } = 10;
+    public int MaxOpenFiles { get; init; } = 1000;
+    public int MaxProcesses { get; init; } = 10;
+}
+
+public sealed record NetworkPolicy
+{
+    public IReadOnlySet<string> AllowedHosts { get; init; } = new HashSet<string>();
+    public IReadOnlySet<string> BlockedHosts { get; init; } = new HashSet<string>();
+    public IReadOnlySet<int> AllowedPorts { get; init; } = new HashSet<int> { 80, 443 };
+    public IReadOnlySet<int> BlockedPorts { get; init; } = new HashSet<int>();
+    public bool AllowDns { get; init; } = true;
+    public int MaxConnectionsPerHost { get; init; } = 10;
+}
+
+public sealed record FilesystemPolicy
+{
+    public IReadOnlyList<string> ReadOnlyPaths { get; init; } = new List<string>();
+    public IReadOnlyList<string> WritablePaths { get; init; } = new List<string>();
+    public IReadOnlyList<string> BlockedPaths { get; init; } = new List<string>();
+    public long MaxWriteBytes { get; init; } = 100 * 1024 * 1024; // 100 MB
+}
+
+public sealed record SandboxTimeouts
+{
+    public TimeSpan StartupTimeout { get; init; } = TimeSpan.FromSeconds(30);
+    public TimeSpan OperationTimeout { get; init; } = TimeSpan.FromSeconds(60);
+    public TimeSpan ShutdownTimeout { get; init; } = TimeSpan.FromSeconds(10);
+    public TimeSpan HealthCheckTimeout { get; init; } = TimeSpan.FromSeconds(5);
+}
+```
+
+### Process Sandbox Implementation
+
+```csharp
+// ProcessSandbox.cs
+namespace StellaOps.Plugin.Sandbox;
+
+public sealed class ProcessSandbox : ISandbox
+{
+    private readonly SandboxConfiguration _config;
+    private readonly IPluginProcessManager _processManager;
+    private readonly IGrpcPluginBridge _bridge;
+    private readonly IResourceLimiter _resourceLimiter;
+    private readonly INetworkPolicyEnforcer _networkEnforcer;
+    private readonly ILogger _logger;
+
+    private Process? _process;
+    private SandboxState _state = SandboxState.Created;
+    private ResourceUsage _currentUsage = new();
+
+    public string Id { get; }
+    public SandboxState State => _state;
+    public ResourceUsage CurrentUsage => _currentUsage;
+
+    public event EventHandler<SandboxStateChangedEventArgs>? StateChanged;
+    public event EventHandler<ResourceWarningEventArgs>?
ResourceWarning; + + public ProcessSandbox( + string id, + SandboxConfiguration config, + IPluginProcessManager processManager, + IGrpcPluginBridge bridge, + IResourceLimiter resourceLimiter, + INetworkPolicyEnforcer networkEnforcer, + ILogger logger) + { + Id = id; + _config = config; + _processManager = processManager; + _bridge = bridge; + _resourceLimiter = resourceLimiter; + _networkEnforcer = networkEnforcer; + _logger = logger; + } + + public async Task StartAsync(PluginManifest manifest, CancellationToken ct) + { + TransitionState(SandboxState.Starting); + + try + { + // 1. Create isolated working directory + var workDir = PrepareWorkingDirectory(manifest); + + // 2. Configure resource limits + var resourceConfig = _resourceLimiter.CreateConfiguration(_config.ResourceLimits); + + // 3. Configure network policy + await _networkEnforcer.ApplyPolicyAsync(Id, _config.NetworkPolicy, ct); + + // 4. Start the plugin host process + var socketPath = GetSocketPath(); + _process = await _processManager.StartAsync(new ProcessStartRequest + { + PluginAssemblyPath = manifest.AssemblyPath!, + EntryPoint = manifest.EntryPoint, + WorkingDirectory = workDir, + SocketPath = socketPath, + ResourceConfiguration = resourceConfig, + EnvironmentVariables = _config.EnvironmentVariables + }, ct); + + // 5. Wait for the process to be ready + await WaitForReadyAsync(ct); + + // 6. Connect gRPC bridge + await _bridge.ConnectAsync(socketPath, ct); + + // 7. Initialize the plugin + await _bridge.InitializePluginAsync(manifest, ct); + + // 8. 
Start resource monitoring
+            StartResourceMonitoring();
+
+            TransitionState(SandboxState.Running);
+
+            _logger.LogInformation("Sandbox {Id} started for plugin {PluginId}",
+                Id, manifest.Info.Id);
+        }
+        catch (Exception ex)
+        {
+            _logger.LogError(ex, "Failed to start sandbox {Id}", Id);
+            TransitionState(SandboxState.Failed, ex.Message);
+            throw;
+        }
+    }
+
+    public async Task StopAsync(TimeSpan timeout, CancellationToken ct)
+    {
+        if (_state != SandboxState.Running)
+            return;
+
+        TransitionState(SandboxState.Stopping);
+
+        try
+        {
+            // 1. Signal graceful shutdown via gRPC
+            using var timeoutCts = CancellationTokenSource.CreateLinkedTokenSource(ct);
+            timeoutCts.CancelAfter(timeout);
+
+            try
+            {
+                await _bridge.ShutdownPluginAsync(timeoutCts.Token);
+            }
+            catch (OperationCanceledException)
+            {
+                _logger.LogWarning("Sandbox {Id} did not shut down gracefully, killing", Id);
+            }
+
+            // 2. Disconnect bridge
+            await _bridge.DisconnectAsync(ct);
+
+            // 3. Stop the process
+            await _processManager.StopAsync(_process!, timeout, ct);
+
+            // 4. Cleanup network policy
+            await _networkEnforcer.RemovePolicyAsync(Id, ct);
+
+            // 5. Cleanup working directory
+            CleanupWorkingDirectory();
+
+            TransitionState(SandboxState.Stopped);
+
+            _logger.LogInformation("Sandbox {Id} stopped", Id);
+        }
+        catch (Exception ex)
+        {
+            _logger.LogError(ex, "Error stopping sandbox {Id}", Id);
+            TransitionState(SandboxState.Failed, ex.Message);
+            throw;
+        }
+    }
+
+    public async Task<object?> ExecuteAsync(
+        string operationName,
+        object? parameters,
+        TimeSpan timeout,
+        CancellationToken ct)
+    {
+        EnsureRunning();
+
+        using var timeoutCts = CancellationTokenSource.CreateLinkedTokenSource(ct);
+        timeoutCts.CancelAfter(timeout);
+
+        try
+        {
+            return await _bridge.InvokeAsync(operationName, parameters, timeoutCts.Token);
+        }
+        catch (OperationCanceledException) when (timeoutCts.IsCancellationRequested && !ct.IsCancellationRequested)
+        {
+            throw new TimeoutException($"Operation '{operationName}' timed out after {timeout}");
+        }
+    }
+
+    public async IAsyncEnumerable<object> ExecuteStreamingAsync(
+        string operationName,
+        object? parameters,
+        [EnumeratorCancellation] CancellationToken ct)
+    {
+        EnsureRunning();
+
+        await foreach (var evt in _bridge.InvokeStreamingAsync(operationName, parameters, ct))
+        {
+            yield return evt;
+        }
+    }
+
+    public async Task<HealthCheckResult> HealthCheckAsync(CancellationToken ct)
+    {
+        if (_state != SandboxState.Running)
+        {
+            return HealthCheckResult.Unhealthy($"Sandbox is in state {_state}");
+        }
+
+        try
+        {
+            using var timeoutCts = CancellationTokenSource.CreateLinkedTokenSource(ct);
+            timeoutCts.CancelAfter(_config.Timeouts.HealthCheckTimeout);
+
+            var result = await _bridge.HealthCheckAsync(timeoutCts.Token);
+
+            // Add resource usage to details
+            var details = new Dictionary<string, object>(result.Details ?? new Dictionary<string, object>())
+            {
+                ["sandboxId"] = Id,
+                ["memoryUsageMb"] = _currentUsage.MemoryUsageMb,
+                ["cpuUsagePercent"] = _currentUsage.CpuUsagePercent
+            };
+
+            return result with { Details = details };
+        }
+        catch (Exception ex)
+        {
+            return HealthCheckResult.Unhealthy(ex);
+        }
+    }
+
+    private void EnsureRunning()
+    {
+        if (_state != SandboxState.Running)
+        {
+            throw new InvalidOperationException($"Sandbox is not running (state: {_state})");
+        }
+    }
+
+    private void TransitionState(SandboxState newState, string?
reason = null) + { + var oldState = _state; + _state = newState; + + StateChanged?.Invoke(this, new SandboxStateChangedEventArgs + { + OldState = oldState, + NewState = newState, + Reason = reason + }); + } + + private string PrepareWorkingDirectory(PluginManifest manifest) + { + var workDir = _config.WorkingDirectory + ?? Path.Combine(Path.GetTempPath(), "stellaops-sandbox", Id); + + if (Directory.Exists(workDir)) + Directory.Delete(workDir, recursive: true); + + Directory.CreateDirectory(workDir); + + // Copy plugin files to sandbox directory + var pluginDir = Path.GetDirectoryName(manifest.AssemblyPath)!; + CopyDirectory(pluginDir, workDir); + + return workDir; + } + + private void CleanupWorkingDirectory() + { + var workDir = _config.WorkingDirectory + ?? Path.Combine(Path.GetTempPath(), "stellaops-sandbox", Id); + + if (Directory.Exists(workDir)) + { + try + { + Directory.Delete(workDir, recursive: true); + } + catch (Exception ex) + { + _logger.LogWarning(ex, "Failed to cleanup sandbox directory {WorkDir}", workDir); + } + } + } + + private string GetSocketPath() + { + if (OperatingSystem.IsWindows()) + { + return $"\\\\.\\pipe\\stellaops-sandbox-{Id}"; + } + else + { + return Path.Combine(Path.GetTempPath(), $"stellaops-sandbox-{Id}.sock"); + } + } + + private async Task WaitForReadyAsync(CancellationToken ct) + { + using var timeoutCts = CancellationTokenSource.CreateLinkedTokenSource(ct); + timeoutCts.CancelAfter(_config.Timeouts.StartupTimeout); + + while (!timeoutCts.IsCancellationRequested) + { + if (_process?.HasExited == true) + { + throw new InvalidOperationException( + $"Plugin process exited with code {_process.ExitCode}"); + } + + if (File.Exists(GetSocketPath()) || OperatingSystem.IsWindows()) + { + // Try to connect + try + { + await _bridge.ConnectAsync(GetSocketPath(), timeoutCts.Token); + return; + } + catch + { + // Not ready yet + } + } + + await Task.Delay(100, timeoutCts.Token); + } + + throw new TimeoutException("Plugin process did not 
become ready in time"); + } + + private void StartResourceMonitoring() + { + _ = Task.Run(async () => + { + while (_state == SandboxState.Running) + { + try + { + _currentUsage = await _resourceLimiter.GetUsageAsync(_process!, default); + + // Check thresholds + CheckResourceThreshold(ResourceType.Memory, + _currentUsage.MemoryUsageMb, + _config.ResourceLimits.MaxMemoryMb); + + CheckResourceThreshold(ResourceType.Cpu, + _currentUsage.CpuUsagePercent, + _config.ResourceLimits.MaxCpuPercent); + + await Task.Delay(1000); + } + catch (Exception ex) + { + _logger.LogError(ex, "Error monitoring resources for sandbox {Id}", Id); + } + } + }); + } + + private void CheckResourceThreshold(ResourceType resource, double current, double max) + { + var percent = (current / max) * 100; + if (percent >= 80) + { + ResourceWarning?.Invoke(this, new ResourceWarningEventArgs + { + Resource = resource, + CurrentUsagePercent = percent, + ThresholdPercent = 80 + }); + } + } + + private static void CopyDirectory(string source, string destination) + { + foreach (var dir in Directory.GetDirectories(source, "*", SearchOption.AllDirectories)) + { + Directory.CreateDirectory(dir.Replace(source, destination)); + } + + foreach (var file in Directory.GetFiles(source, "*", SearchOption.AllDirectories)) + { + File.Copy(file, file.Replace(source, destination), overwrite: true); + } + } + + public async ValueTask DisposeAsync() + { + if (_state == SandboxState.Running) + { + await StopAsync(_config.Timeouts.ShutdownTimeout, CancellationToken.None); + } + + _bridge?.Dispose(); + } +} +``` + +### gRPC Plugin Bridge + +```protobuf +// Proto/plugin_bridge.proto +syntax = "proto3"; + +package stellaops.plugin.bridge; + +option csharp_namespace = "StellaOps.Plugin.Sandbox.Communication"; + +service PluginBridge { + // Lifecycle + rpc Initialize(InitializeRequest) returns (InitializeResponse); + rpc Shutdown(ShutdownRequest) returns (ShutdownResponse); + rpc HealthCheck(HealthCheckRequest) returns 
(HealthCheckResponse); + + // Operations + rpc Invoke(InvokeRequest) returns (InvokeResponse); + rpc InvokeStreaming(InvokeRequest) returns (stream StreamingEvent); + + // Logging + rpc StreamLogs(LogStreamRequest) returns (stream LogEntry); +} + +message InitializeRequest { + string manifest_json = 1; + string config_json = 2; +} + +message InitializeResponse { + bool success = 1; + string error = 2; +} + +message ShutdownRequest { + int32 timeout_ms = 1; +} + +message ShutdownResponse { + bool success = 1; +} + +message HealthCheckRequest {} + +message HealthCheckResponse { + string status = 1; // healthy, degraded, unhealthy + string message = 2; + int32 duration_ms = 3; + string details_json = 4; +} + +message InvokeRequest { + string operation = 1; + string parameters_json = 2; + int32 timeout_ms = 3; +} + +message InvokeResponse { + bool success = 1; + string result_json = 2; + string error = 3; +} + +message StreamingEvent { + string event_type = 1; + string payload_json = 2; + int64 timestamp_unix_ms = 3; +} + +message LogStreamRequest { + string min_level = 1; +} + +message LogEntry { + int64 timestamp_unix_ms = 1; + string level = 2; + string message = 3; + string properties_json = 4; +} +``` + +### Resource Limiter (Linux) + +```csharp +// Resources/LinuxResourceLimiter.cs +namespace StellaOps.Plugin.Sandbox.Resources; + +public sealed class LinuxResourceLimiter : IResourceLimiter +{ + private readonly ILogger _logger; + + public LinuxResourceLimiter(ILogger logger) + { + _logger = logger; + } + + public ResourceConfiguration CreateConfiguration(ResourceLimits limits) + { + return new ResourceConfiguration + { + // Memory limit using cgroups v2 + MemoryLimitBytes = limits.MaxMemoryMb * 1024L * 1024L, + + // CPU limit as percentage (cgroups cpu.max) + CpuQuotaUs = (long)(limits.MaxCpuPercent * 1000), // Per 100ms period + CpuPeriodUs = 100_000, // 100ms + + // Process limit + MaxProcesses = limits.MaxProcesses, + + // File descriptor limit + MaxOpenFiles 
= limits.MaxOpenFiles
+        };
+    }
+
+    public async Task ApplyLimitsAsync(Process process, ResourceConfiguration config, CancellationToken ct)
+    {
+        var cgroupPath = $"/sys/fs/cgroup/stellaops-sandbox/{process.Id}";
+
+        try
+        {
+            // Create cgroup for this process
+            Directory.CreateDirectory(cgroupPath);
+
+            // Set memory limit
+            await File.WriteAllTextAsync(
+                Path.Combine(cgroupPath, "memory.max"),
+                config.MemoryLimitBytes.ToString(),
+                ct);
+
+            // Set CPU limit
+            await File.WriteAllTextAsync(
+                Path.Combine(cgroupPath, "cpu.max"),
+                $"{config.CpuQuotaUs} {config.CpuPeriodUs}",
+                ct);
+
+            // Set process limit
+            await File.WriteAllTextAsync(
+                Path.Combine(cgroupPath, "pids.max"),
+                config.MaxProcesses.ToString(),
+                ct);
+
+            // Add process to cgroup
+            await File.WriteAllTextAsync(
+                Path.Combine(cgroupPath, "cgroup.procs"),
+                process.Id.ToString(),
+                ct);
+
+            _logger.LogDebug("Applied cgroup limits for process {ProcessId}", process.Id);
+        }
+        catch (Exception ex)
+        {
+            _logger.LogError(ex, "Failed to apply cgroup limits for process {ProcessId}", process.Id);
+            throw;
+        }
+    }
+
+    public async Task<ResourceUsage> GetUsageAsync(Process process, CancellationToken ct)
+    {
+        var cgroupPath = $"/sys/fs/cgroup/stellaops-sandbox/{process.Id}";
+
+        try
+        {
+            // Read memory usage
+            var memoryUsageStr = await File.ReadAllTextAsync(
+                Path.Combine(cgroupPath, "memory.current"), ct);
+            var memoryUsageBytes = long.Parse(memoryUsageStr.Trim());
+
+            // Read CPU usage
+            var cpuStatStr = await File.ReadAllTextAsync(
+                Path.Combine(cgroupPath, "cpu.stat"), ct);
+            var cpuUsageUs = ParseCpuStat(cpuStatStr);
+
+            return new ResourceUsage
+            {
+                MemoryUsageMb = memoryUsageBytes / (1024.0 * 1024.0),
+                CpuUsagePercent = CalculateCpuPercent(cpuUsageUs),
+                ProcessCount = process.Threads.Count
+            };
+        }
+        catch (Exception ex)
+        {
+            _logger.LogWarning(ex, "Failed to read resource usage for process {ProcessId}", process.Id);
+            return new ResourceUsage();
+        }
+    }
+
+    public async Task RemoveLimitsAsync(Process
process, CancellationToken ct)
+    {
+        var cgroupPath = $"/sys/fs/cgroup/stellaops-sandbox/{process.Id}";
+
+        try
+        {
+            if (Directory.Exists(cgroupPath))
+            {
+                // Move process out of cgroup first
+                await File.WriteAllTextAsync(
+                    "/sys/fs/cgroup/cgroup.procs",
+                    process.Id.ToString(),
+                    ct);
+
+                Directory.Delete(cgroupPath);
+            }
+        }
+        catch (Exception ex)
+        {
+            _logger.LogWarning(ex, "Failed to cleanup cgroup for process {ProcessId}", process.Id);
+        }
+    }
+
+    private static long ParseCpuStat(string stat)
+    {
+        foreach (var line in stat.Split('\n'))
+        {
+            if (line.StartsWith("usage_usec"))
+            {
+                return long.Parse(line.Split(' ')[1]);
+            }
+        }
+        return 0;
+    }
+
+    private double CalculateCpuPercent(long cpuUsageUs)
+    {
+        // Simplified calculation - usage must be sampled over time for accuracy
+        return 0;
+    }
+}
+
+public sealed record ResourceConfiguration
+{
+    public long MemoryLimitBytes { get; init; }
+    public long CpuQuotaUs { get; init; }
+    public long CpuPeriodUs { get; init; }
+    public int MaxProcesses { get; init; }
+    public int MaxOpenFiles { get; init; }
+}
+
+public sealed record ResourceUsage
+{
+    public double MemoryUsageMb { get; init; }
+    public double CpuUsagePercent { get; init; }
+    public int ProcessCount { get; init; }
+    public long DiskUsageBytes { get; init; }
+    public long NetworkBytesIn { get; init; }
+    public long NetworkBytesOut { get; init; }
+}
+```
+
+### Network Policy Enforcer
+
+```csharp
+// Network/NetworkPolicyEnforcer.cs
+namespace StellaOps.Plugin.Sandbox.Network;
+
+public sealed class NetworkPolicyEnforcer : INetworkPolicyEnforcer
+{
+    private readonly ILogger _logger;
+    private readonly ConcurrentDictionary<string, NetworkPolicy> _activePolicies = new();
+
+    public NetworkPolicyEnforcer(ILogger logger)
+    {
+        _logger = logger;
+    }
+
+    public async Task ApplyPolicyAsync(string sandboxId, NetworkPolicy policy, CancellationToken ct)
+    {
+        _activePolicies[sandboxId] = policy;
+
+        if (OperatingSystem.IsLinux())
+        {
+            await ApplyIptablesRulesAsync(sandboxId, policy, ct);
+        }
+        else if (OperatingSystem.IsWindows())
+        {
+            await ApplyWindowsFirewallRulesAsync(sandboxId, policy, ct);
+        }
+
+        _logger.LogDebug("Applied network policy for sandbox {SandboxId}", sandboxId);
+    }
+
+    public async Task RemovePolicyAsync(string sandboxId, CancellationToken ct)
+    {
+        if (_activePolicies.TryRemove(sandboxId, out var policy))
+        {
+            if (OperatingSystem.IsLinux())
+            {
+                await RemoveIptablesRulesAsync(sandboxId, ct);
+            }
+            else if (OperatingSystem.IsWindows())
+            {
+                await RemoveWindowsFirewallRulesAsync(sandboxId, policy, ct);
+            }
+
+            _logger.LogDebug("Removed network policy for sandbox {SandboxId}", sandboxId);
+        }
+    }
+
+    public bool IsAllowed(string sandboxId, string host, int port)
+    {
+        if (!_activePolicies.TryGetValue(sandboxId, out var policy))
+            return false;
+
+        // Check blocked ports
+        if (policy.BlockedPorts.Contains(port))
+            return false;
+
+        // Check allowed ports
+        if (policy.AllowedPorts.Count > 0 && !policy.AllowedPorts.Contains(port))
+            return false;
+
+        // Check blocked hosts
+        if (policy.BlockedHosts.Contains(host))
+            return false;
+
+        // Check allowed hosts (if specified, only these are allowed)
+        if (policy.AllowedHosts.Count > 0 && !policy.AllowedHosts.Contains(host))
+            return false;
+
+        return true;
+    }
+
+    private async Task ApplyIptablesRulesAsync(string sandboxId, NetworkPolicy policy, CancellationToken ct)
+    {
+        var chain = $"STELLAOPS_SANDBOX_{sandboxId.Replace("-", "_").ToUpperInvariant()}";
+
+        // Create chain (it must also be attached to OUTPUT, e.g. via an
+        // owner-match rule for the sandbox process user, to take effect)
+        await ExecuteCommandAsync("iptables", $"-N {chain}", ct);
+
+        // Add rules for blocked ports
+        foreach (var port in policy.BlockedPorts)
+        {
+            await ExecuteCommandAsync("iptables",
+                $"-A {chain} -p tcp --dport {port} -j DROP", ct);
+            await ExecuteCommandAsync("iptables",
+                $"-A {chain} -p udp --dport {port} -j DROP", ct);
+        }
+
+        // Add rules for allowed hosts only
+        if (policy.AllowedHosts.Count > 0)
+        {
+            foreach (var host in policy.AllowedHosts)
+            {
+                await ExecuteCommandAsync("iptables",
+                    $"-A {chain} -d {host} -j ACCEPT", ct);
+            }
+            // Drop everything else
+            await ExecuteCommandAsync("iptables",
+                $"-A {chain} -j DROP", ct);
+        }
+    }
+
+    private async Task RemoveIptablesRulesAsync(string sandboxId, CancellationToken ct)
+    {
+        var chain = $"STELLAOPS_SANDBOX_{sandboxId.Replace("-", "_").ToUpperInvariant()}";
+
+        await ExecuteCommandAsync("iptables", $"-F {chain}", ct);
+        await ExecuteCommandAsync("iptables", $"-X {chain}", ct);
+    }
+
+    private async Task ApplyWindowsFirewallRulesAsync(string sandboxId, NetworkPolicy policy, CancellationToken ct)
+    {
+        // Use Windows Firewall with Advanced Security
+        foreach (var port in policy.BlockedPorts)
+        {
+            var ruleName = $"StellaOps-Sandbox-{sandboxId}-Block-{port}";
+            await ExecuteCommandAsync("netsh",
+                $"advfirewall firewall add rule name=\"{ruleName}\" dir=out protocol=tcp remoteport={port} action=block",
+                ct);
+        }
+    }
+
+    private async Task RemoveWindowsFirewallRulesAsync(string sandboxId, NetworkPolicy policy, CancellationToken ct)
+    {
+        // netsh cannot pattern-match rule names (and does not accept shell pipes
+        // in its arguments), so delete each rule by the exact name it was created with
+        foreach (var port in policy.BlockedPorts)
+        {
+            var ruleName = $"StellaOps-Sandbox-{sandboxId}-Block-{port}";
+            await ExecuteCommandAsync("netsh",
+                $"advfirewall firewall delete rule name=\"{ruleName}\"",
+                ct);
+        }
+    }
+
+    private static async Task ExecuteCommandAsync(string command, string args, CancellationToken ct)
+    {
+        using var process = new Process
+        {
+            StartInfo = new ProcessStartInfo
+            {
+                FileName = command,
+                Arguments = args,
+                RedirectStandardOutput = true,
+                RedirectStandardError = true,
+                UseShellExecute = false,
+                CreateNoWindow = true
+            }
+        };
+
+        process.Start();
+        await process.WaitForExitAsync(ct);
+
+        if (process.ExitCode != 0)
+        {
+            var error = await process.StandardError.ReadToEndAsync(ct);
+            throw new InvalidOperationException($"Command failed: {command} {args}\n{error}");
+        }
+    }
+}
+```
+
+---
+
+## Acceptance Criteria
+
+- [ ] `ISandbox` interface with all methods
+- [ ] `SandboxConfiguration` with defaults
+- [ ] Process-based sandbox implementation
+- [ ] gRPC communication bridge
+- [ ] Plugin host process executable
+- [ ] Resource limits (Linux cgroups
v2) +- [ ] Resource limits (Windows Job Objects) +- [ ] Network policy enforcement (iptables/Windows Firewall) +- [ ] Filesystem isolation +- [ ] Resource usage monitoring +- [ ] Resource warning events +- [ ] Graceful shutdown with timeout +- [ ] Process kill on timeout +- [ ] Unit tests for all components +- [ ] Integration tests with real processes +- [ ] Test coverage >= 80% + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 100_001 Plugin Abstractions | Internal | TODO | +| 100_002 Plugin Host | Internal | TODO | +| Grpc.AspNetCore | External | Available | +| .NET 10 | External | Available | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| ISandbox interface | TODO | | +| SandboxConfiguration | TODO | | +| ProcessSandbox | TODO | | +| GrpcPluginBridge | TODO | | +| plugin_bridge.proto | TODO | | +| PluginProcessManager | TODO | | +| LinuxResourceLimiter | TODO | | +| WindowsResourceLimiter | TODO | | +| NetworkPolicyEnforcer | TODO | | +| SandboxedFilesystem | TODO | | +| ScopedSecretProxy | TODO | | +| Plugin host executable | TODO | | +| Unit tests | TODO | | +| Integration tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_100_005_PLUGIN_crypto_rework.md b/docs/implplan/SPRINT_20260110_100_005_PLUGIN_crypto_rework.md new file mode 100644 index 000000000..d801ce48f --- /dev/null +++ b/docs/implplan/SPRINT_20260110_100_005_PLUGIN_crypto_rework.md @@ -0,0 +1,421 @@ +# SPRINT: Crypto Plugin Rework + +> **Sprint ID:** 100_005 +> **Module:** PLUGIN +> **Phase:** 100 - Plugin System Unification +> **Status:** TODO +> **Parent:** [100_000_INDEX](SPRINT_20260110_100_000_INDEX_plugin_unification.md) + +--- + +## Overview + +Rework all cryptographic providers (GOST, eIDAS, SM2/SM3/SM4, FIPS, HSM) to implement the unified plugin architecture with `IPlugin` and 
`ICryptoCapability` interfaces. + +### Objectives + +- Migrate GOST provider to unified plugin model +- Migrate eIDAS provider to unified plugin model +- Migrate SM2/SM3/SM4 provider to unified plugin model +- Migrate FIPS provider to unified plugin model +- Migrate HSM integration to unified plugin model +- Preserve all existing functionality +- Add health checks for all providers +- Add plugin manifests + +### Current State + +``` +src/Cryptography/ +├── StellaOps.Cryptography.Gost/ # GOST R 34.10-2012, R 34.11-2012 +├── StellaOps.Cryptography.Eidas/ # EU eIDAS qualified signatures +├── StellaOps.Cryptography.Sm/ # Chinese SM2/SM3/SM4 +├── StellaOps.Cryptography.Fips/ # US FIPS 140-2 compliant +└── StellaOps.Cryptography.Hsm/ # Hardware Security Module integration +``` + +### Target State + +``` +src/Cryptography/ +├── StellaOps.Cryptography.Plugin.Gost/ +│ ├── GostPlugin.cs # IPlugin implementation +│ ├── GostCryptoCapability.cs # ICryptoCapability implementation +│ ├── plugin.yaml # Plugin manifest +│ └── ... +├── StellaOps.Cryptography.Plugin.Eidas/ +├── StellaOps.Cryptography.Plugin.Sm/ +├── StellaOps.Cryptography.Plugin.Fips/ +└── StellaOps.Cryptography.Plugin.Hsm/ +``` + +--- + +## Deliverables + +### GOST Plugin Implementation + +```csharp +// GostPlugin.cs +namespace StellaOps.Cryptography.Plugin.Gost; + +[Plugin( + id: "com.stellaops.crypto.gost", + name: "GOST Cryptography Provider", + version: "1.0.0", + vendor: "Stella Ops")] +[ProvidesCapability(PluginCapabilities.Crypto, CapabilityId = "gost")] +public sealed class GostPlugin : IPlugin, ICryptoCapability +{ + private IPluginContext? _context; + private GostCryptoService? 
_cryptoService;
+
+    public PluginInfo Info => new(
+        Id: "com.stellaops.crypto.gost",
+        Name: "GOST Cryptography Provider",
+        Version: "1.0.0",
+        Vendor: "Stella Ops",
+        Description: "Russian GOST R 34.10-2012 and R 34.11-2012 cryptographic algorithms",
+        LicenseId: "AGPL-3.0-or-later");
+
+    public PluginTrustLevel TrustLevel => PluginTrustLevel.BuiltIn;
+
+    public PluginCapabilities Capabilities => PluginCapabilities.Crypto;
+
+    public PluginLifecycleState State { get; private set; } = PluginLifecycleState.Discovered;
+
+    // ICryptoCapability implementation
+    public IReadOnlyList<string> SupportedAlgorithms => new[]
+    {
+        "GOST-R34.10-2012-256",
+        "GOST-R34.10-2012-512",
+        "GOST-R34.11-2012-256",
+        "GOST-R34.11-2012-512",
+        "GOST-28147-89"
+    };
+
+    public async Task InitializeAsync(IPluginContext context, CancellationToken ct)
+    {
+        _context = context;
+        State = PluginLifecycleState.Initializing;
+
+        try
+        {
+            var options = context.Configuration.Bind<GostOptions>();
+            _cryptoService = new GostCryptoService(options, context.Logger);
+
+            await _cryptoService.InitializeAsync(ct);
+
+            State = PluginLifecycleState.Active;
+            context.Logger.Info("GOST cryptography provider initialized");
+        }
+        catch (Exception ex)
+        {
+            State = PluginLifecycleState.Failed;
+            context.Logger.Error(ex, "Failed to initialize GOST provider");
+            throw;
+        }
+    }
+
+    public async Task<HealthCheckResult> HealthCheckAsync(CancellationToken ct)
+    {
+        if (_cryptoService == null)
+            return HealthCheckResult.Unhealthy("Provider not initialized");
+
+        try
+        {
+            // Verify we can perform a test operation
+            var testData = "test"u8.ToArray();
+            var hash = await HashAsync(testData, "GOST-R34.11-2012-256", ct);
+
+            if (hash.Length != 32)
+                return HealthCheckResult.Degraded("Hash output size mismatch");
+
+            return HealthCheckResult.Healthy();
+        }
+        catch (Exception ex)
+        {
+            return HealthCheckResult.Unhealthy(ex);
+        }
+    }
+
+    public bool CanHandle(CryptoOperation operation, string algorithm)
+    {
+        return algorithm.StartsWith("GOST", StringComparison.OrdinalIgnoreCase) &&
+               SupportedAlgorithms.Contains(algorithm, StringComparer.OrdinalIgnoreCase);
+    }
+
+    public async Task<byte[]> SignAsync(
+        ReadOnlyMemory<byte> data,
+        CryptoSignOptions options,
+        CancellationToken ct)
+    {
+        EnsureInitialized();
+
+        _context!.Logger.Debug("Signing with algorithm {Algorithm}", options.Algorithm);
+
+        return await _cryptoService!.SignAsync(
+            data,
+            options.Algorithm,
+            options.KeyId,
+            options.KeyVersion,
+            ct);
+    }
+
+    public async Task<bool> VerifyAsync(
+        ReadOnlyMemory<byte> data,
+        ReadOnlyMemory<byte> signature,
+        CryptoVerifyOptions options,
+        CancellationToken ct)
+    {
+        EnsureInitialized();
+
+        return await _cryptoService!.VerifyAsync(
+            data,
+            signature,
+            options.Algorithm,
+            options.KeyId,
+            options.CertificateChain,
+            ct);
+    }
+
+    public async Task<byte[]> EncryptAsync(
+        ReadOnlyMemory<byte> data,
+        CryptoEncryptOptions options,
+        CancellationToken ct)
+    {
+        EnsureInitialized();
+
+        if (!options.Algorithm.Contains("28147", StringComparison.Ordinal))
+            throw new NotSupportedException($"Encryption not supported for {options.Algorithm}");
+
+        return await _cryptoService!.EncryptAsync(
+            data,
+            options.KeyId,
+            options.Iv,
+            ct);
+    }
+
+    public async Task<byte[]> DecryptAsync(
+        ReadOnlyMemory<byte> data,
+        CryptoDecryptOptions options,
+        CancellationToken ct)
+    {
+        EnsureInitialized();
+
+        return await _cryptoService!.DecryptAsync(
+            data,
+            options.KeyId,
+            options.Iv,
+            ct);
+    }
+
+    public async Task<byte[]> HashAsync(
+        ReadOnlyMemory<byte> data,
+        string algorithm,
+        CancellationToken ct)
+    {
+        EnsureInitialized();
+
+        return await _cryptoService!.HashAsync(data, algorithm, ct);
+    }
+
+    private void EnsureInitialized()
+    {
+        if (State != PluginLifecycleState.Active || _cryptoService == null)
+            throw new InvalidOperationException("GOST provider is not initialized");
+    }
+
+    public async ValueTask DisposeAsync()
+    {
+        if (_cryptoService != null)
+        {
+            await _cryptoService.DisposeAsync();
+            _cryptoService = null;
+        }
+        State = PluginLifecycleState.Stopped;
+    }
+}
+```
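+
+As a hedged illustration of how the plugin host (sprint 100_002) is expected to drive this implementation — the host context, `CryptoSignOptions` shape, and `HealthStatus` check below are assumptions for this sketch, not deliverables of this sprint:
+
+```csharp
+// Illustrative lifecycle only: discover -> initialize -> health-check -> invoke -> dispose.
+var plugin = new GostPlugin();
+await plugin.InitializeAsync(hostContext, ct);   // Discovered -> Active
+
+var health = await plugin.HealthCheckAsync(ct);  // runs the test-hash round trip
+if (health.Status != HealthStatus.Healthy)       // Status property is an assumption
+    throw new InvalidOperationException("GOST provider unhealthy");
+
+// Route a signing request through the capability interface.
+ICryptoCapability crypto = plugin;
+if (crypto.CanHandle(CryptoOperation.Sign, "GOST-R34.10-2012-256"))
+{
+    var signature = await crypto.SignAsync(
+        payload,
+        new CryptoSignOptions(Algorithm: "GOST-R34.10-2012-256", KeyId: "default", KeyVersion: null),
+        ct);
+}
+
+await plugin.DisposeAsync();                     // Active -> Stopped
+```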
+ +### Plugin Manifest + +```yaml +# plugin.yaml +plugin: + id: com.stellaops.crypto.gost + name: GOST Cryptography Provider + version: 1.0.0 + vendor: Stella Ops + description: Russian GOST R 34.10-2012 and R 34.11-2012 cryptographic algorithms + license: AGPL-3.0-or-later + +entryPoint: StellaOps.Cryptography.Plugin.Gost.GostPlugin + +minPlatformVersion: 1.0.0 + +capabilities: + - type: crypto + id: gost + algorithms: + - GOST-R34.10-2012-256 + - GOST-R34.10-2012-512 + - GOST-R34.11-2012-256 + - GOST-R34.11-2012-512 + - GOST-28147-89 + +configSchema: + type: object + properties: + keyStorePath: + type: string + description: Path to GOST key store + defaultKeyId: + type: string + description: Default key identifier for signing + required: [] +``` + +### Shared Crypto Base Class + +```csharp +// CryptoPluginBase.cs +namespace StellaOps.Cryptography.Plugin; + +/// +/// Base class for crypto plugins with common functionality. +/// +public abstract class CryptoPluginBase : IPlugin, ICryptoCapability +{ + protected IPluginContext? 
Context { get; private set; }
+
+    public abstract PluginInfo Info { get; }
+    public PluginTrustLevel TrustLevel => PluginTrustLevel.BuiltIn;
+    public PluginCapabilities Capabilities => PluginCapabilities.Crypto;
+    public PluginLifecycleState State { get; protected set; } = PluginLifecycleState.Discovered;
+
+    public abstract IReadOnlyList<string> SupportedAlgorithms { get; }
+
+    public async Task InitializeAsync(IPluginContext context, CancellationToken ct)
+    {
+        Context = context;
+        State = PluginLifecycleState.Initializing;
+
+        try
+        {
+            await InitializeCryptoServiceAsync(context, ct);
+            State = PluginLifecycleState.Active;
+            context.Logger.Info("{PluginName} initialized", Info.Name);
+        }
+        catch (Exception ex)
+        {
+            State = PluginLifecycleState.Failed;
+            context.Logger.Error(ex, "Failed to initialize {PluginName}", Info.Name);
+            throw;
+        }
+    }
+
+    protected abstract Task InitializeCryptoServiceAsync(IPluginContext context, CancellationToken ct);
+
+    public virtual async Task<HealthCheckResult> HealthCheckAsync(CancellationToken ct)
+    {
+        if (State != PluginLifecycleState.Active)
+            return HealthCheckResult.Unhealthy($"Plugin is in state {State}");
+
+        try
+        {
+            // Default health check: verify we can hash test data
+            var testData = "health-check-test"u8.ToArray();
+            var algorithm = SupportedAlgorithms.FirstOrDefault(a => a.Contains("256") || a.Contains("SHA"));
+
+            if (algorithm != null)
+            {
+                await HashAsync(testData, algorithm, ct);
+            }
+
+            return HealthCheckResult.Healthy();
+        }
+        catch (Exception ex)
+        {
+            return HealthCheckResult.Unhealthy(ex);
+        }
+    }
+
+    public abstract bool CanHandle(CryptoOperation operation, string algorithm);
+    public abstract Task<byte[]> SignAsync(ReadOnlyMemory<byte> data, CryptoSignOptions options, CancellationToken ct);
+    public abstract Task<bool> VerifyAsync(ReadOnlyMemory<byte> data, ReadOnlyMemory<byte> signature, CryptoVerifyOptions options, CancellationToken ct);
+    public abstract Task<byte[]> EncryptAsync(ReadOnlyMemory<byte> data, CryptoEncryptOptions options, CancellationToken ct);
+    public abstract Task<byte[]> DecryptAsync(ReadOnlyMemory<byte> data, CryptoDecryptOptions options, CancellationToken ct);
+    public abstract Task<byte[]> HashAsync(ReadOnlyMemory<byte> data, string algorithm, CancellationToken ct);
+
+    public abstract ValueTask DisposeAsync();
+
+    protected void EnsureActive()
+    {
+        if (State != PluginLifecycleState.Active)
+            throw new InvalidOperationException($"{Info.Name} is not active (state: {State})");
+    }
+}
+```
+
+### Migration Tasks
+
+| Provider | Current Interface | New Implementation | Status |
+|----------|-------------------|-------------------|--------|
+| GOST | `ICryptoProvider` | `GostPlugin : IPlugin, ICryptoCapability` | TODO |
+| eIDAS | `ICryptoProvider` | `EidasPlugin : IPlugin, ICryptoCapability` | TODO |
+| SM2/SM3/SM4 | `ICryptoProvider` | `SmPlugin : IPlugin, ICryptoCapability` | TODO |
+| FIPS | `ICryptoProvider` | `FipsPlugin : IPlugin, ICryptoCapability` | TODO |
+| HSM | `IHsmProvider` | `HsmPlugin : IPlugin, ICryptoCapability` | TODO |
+
+---
+
+## Acceptance Criteria
+
+- [ ] All 5 crypto providers implement `IPlugin`
+- [ ] All 5 crypto providers implement `ICryptoCapability`
+- [ ] All providers have plugin manifests
+- [ ] All existing crypto operations preserved
+- [ ] Health checks implemented for all providers
+- [ ] All providers discoverable by plugin host
+- [ ] All providers register in plugin registry
+- [ ] Backward-compatible configuration
+- [ ] Unit tests migrated/updated
+- [ ] Integration tests passing
+- [ ] Performance benchmarks comparable to original
+
+---
+
+## Dependencies
+
+| Dependency | Type | Status |
+|------------|------|--------|
+| 100_001 Plugin Abstractions | Internal | TODO |
+| 100_002 Plugin Host | Internal | TODO |
+| 100_003 Plugin Registry | Internal | TODO |
+| BouncyCastle | External | Available |
+| CryptoPro SDK | External | Available (GOST) |
+
+---
+
+## Delivery Tracker
+
+| Deliverable | Status | Notes |
+|-------------|--------|-------|
+| GostPlugin | TODO | |
+| EidasPlugin | TODO | 
| +| SmPlugin | TODO | | +| FipsPlugin | TODO | | +| HsmPlugin | TODO | | +| CryptoPluginBase | TODO | | +| Plugin manifests (5) | TODO | | +| Unit tests | TODO | | +| Integration tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_100_006_PLUGIN_auth_rework.md b/docs/implplan/SPRINT_20260110_100_006_PLUGIN_auth_rework.md new file mode 100644 index 000000000..3b8c9104f --- /dev/null +++ b/docs/implplan/SPRINT_20260110_100_006_PLUGIN_auth_rework.md @@ -0,0 +1,455 @@ +# SPRINT: Auth Plugin Rework + +> **Sprint ID:** 100_006 +> **Module:** PLUGIN +> **Phase:** 100 - Plugin System Unification +> **Status:** TODO +> **Parent:** [100_000_INDEX](SPRINT_20260110_100_000_INDEX_plugin_unification.md) + +--- + +## Overview + +Rework all authentication providers (LDAP, OIDC, SAML, Workforce Identity) to implement the unified plugin architecture with `IPlugin` and `IAuthCapability` interfaces. + +### Objectives + +- Migrate LDAP provider to unified plugin model +- Migrate OIDC providers (Azure AD, Okta, Google, etc.) 
to unified plugin model
+- Migrate SAML provider to unified plugin model
+- Migrate Workforce Identity provider to unified plugin model
+- Preserve all existing authentication flows
+- Add health checks for all providers
+- Add plugin manifests
+
+### Current State
+
+```
+src/Authority/
+├── __Plugins/
+│   ├── StellaOps.Authority.Plugin.Ldap/
+│   ├── StellaOps.Authority.Plugin.Oidc/
+│   └── StellaOps.Authority.Plugin.Saml/
+└── __Libraries/
+    └── StellaOps.Authority.Identity/
+```
+
+### Target State
+
+Each auth plugin implements:
+- `IPlugin` - Core plugin interface with lifecycle
+- `IAuthCapability` - Authentication/authorization operations
+- Health checks for connectivity
+- Plugin manifest for discovery
+
+---
+
+## Deliverables
+
+### Auth Capability Interface
+
+```csharp
+// IAuthCapability.cs (added to 100_001 Abstractions)
+namespace StellaOps.Plugin.Abstractions.Capabilities;
+
+/// <summary>
+/// Capability interface for authentication and authorization.
+/// </summary>
+public interface IAuthCapability
+{
+    /// <summary>
+    /// Auth provider type (ldap, oidc, saml, workforce).
+    /// </summary>
+    string ProviderType { get; }
+
+    /// <summary>
+    /// Supported authentication methods.
+    /// </summary>
+    IReadOnlyList<string> SupportedMethods { get; }
+
+    /// <summary>
+    /// Authenticate a user with credentials.
+    /// </summary>
+    Task<AuthResult> AuthenticateAsync(AuthRequest request, CancellationToken ct);
+
+    /// <summary>
+    /// Validate an existing token/session.
+    /// </summary>
+    Task<ValidationResult> ValidateTokenAsync(string token, CancellationToken ct);
+
+    /// <summary>
+    /// Get user information.
+    /// </summary>
+    Task<UserInfo?> GetUserInfoAsync(string userId, CancellationToken ct);
+
+    /// <summary>
+    /// Get user's group memberships.
+    /// </summary>
+    Task<IReadOnlyList<GroupInfo>> GetUserGroupsAsync(string userId, CancellationToken ct);
+
+    /// <summary>
+    /// Check if user has specific permission.
+    /// </summary>
+    Task<bool> HasPermissionAsync(string userId, string permission, CancellationToken ct);
+
+    /// <summary>
+    /// Initiate SSO flow (for OIDC/SAML).
+    /// </summary>
+    Task<SsoInitiation?> InitiateSsoAsync(SsoRequest request, CancellationToken ct);
+
+    /// <summary>
+    /// Complete SSO callback.
+    /// </summary>
+    Task<AuthResult> CompleteSsoAsync(SsoCallback callback, CancellationToken ct);
+}
+
+public sealed record AuthRequest(
+    string Method,
+    string? Username,
+    string? Password,
+    string? Token,
+    IReadOnlyDictionary<string, string>? AdditionalData);
+
+public sealed record AuthResult(
+    bool Success,
+    string? UserId,
+    string? AccessToken,
+    string? RefreshToken,
+    DateTimeOffset? ExpiresAt,
+    IReadOnlyList<string>? Roles,
+    string? Error);
+
+public sealed record ValidationResult(
+    bool Valid,
+    string? UserId,
+    DateTimeOffset? ExpiresAt,
+    IReadOnlyList<string>? Claims,
+    string? Error);
+
+public sealed record UserInfo(
+    string Id,
+    string Username,
+    string? Email,
+    string? DisplayName,
+    IReadOnlyDictionary<string, string>? Attributes);
+
+public sealed record GroupInfo(
+    string Id,
+    string Name,
+    string? Description);
+
+public sealed record SsoRequest(
+    string RedirectUri,
+    string? State,
+    IReadOnlyList<string>? Scopes);
+
+public sealed record SsoInitiation(
+    string AuthorizationUrl,
+    string State,
+    string? CodeVerifier);
+
+public sealed record SsoCallback(
+    string? Code,
+    string? State,
+    string? Error,
+    string? CodeVerifier);
+```
+
+### LDAP Plugin Implementation
+
+```csharp
+// LdapPlugin.cs
+namespace StellaOps.Authority.Plugin.Ldap;
+
+[Plugin(
+    id: "com.stellaops.auth.ldap",
+    name: "LDAP Authentication Provider",
+    version: "1.0.0",
+    vendor: "Stella Ops")]
+[ProvidesCapability(PluginCapabilities.Auth, CapabilityId = "ldap")]
+public sealed class LdapPlugin : IPlugin, IAuthCapability
+{
+    private IPluginContext? _context;
+    private LdapConnection? _connection;
+    private LdapOptions?
_options;
+
+    public PluginInfo Info => new(
+        Id: "com.stellaops.auth.ldap",
+        Name: "LDAP Authentication Provider",
+        Version: "1.0.0",
+        Vendor: "Stella Ops",
+        Description: "LDAP/Active Directory authentication and user lookup");
+
+    public PluginTrustLevel TrustLevel => PluginTrustLevel.BuiltIn;
+    public PluginCapabilities Capabilities => PluginCapabilities.Auth | PluginCapabilities.Network;
+    public PluginLifecycleState State { get; private set; } = PluginLifecycleState.Discovered;
+
+    public string ProviderType => "ldap";
+    public IReadOnlyList<string> SupportedMethods => new[] { "password", "kerberos" };
+
+    public async Task InitializeAsync(IPluginContext context, CancellationToken ct)
+    {
+        _context = context;
+        State = PluginLifecycleState.Initializing;
+
+        _options = context.Configuration.Bind<LdapOptions>();
+
+        // Test connection
+        _connection = new LdapConnection(new LdapDirectoryIdentifier(_options.Server, _options.Port));
+        _connection.Credential = new NetworkCredential(_options.BindDn, _options.BindPassword);
+        _connection.AuthType = AuthType.Basic;
+        _connection.SessionOptions.SecureSocketLayer = _options.UseSsl;
+
+        await Task.Run(() => _connection.Bind(), ct);
+
+        State = PluginLifecycleState.Active;
+        context.Logger.Info("LDAP plugin connected to {Server}", _options.Server);
+    }
+
+    public async Task<HealthCheckResult> HealthCheckAsync(CancellationToken ct)
+    {
+        if (_connection == null)
+            return HealthCheckResult.Unhealthy("Not initialized");
+
+        try
+        {
+            // Perform a simple search to verify connectivity
+            var request = new SearchRequest(
+                _options!.BaseDn,
+                "(objectClass=*)",
+                SearchScope.Base,
+                "objectClass");
+
+            var response = await Task.Run(() =>
+                (SearchResponse)_connection.SendRequest(request), ct);
+
+            return response.Entries.Count > 0
+                ? HealthCheckResult.Healthy()
+                : HealthCheckResult.Degraded("Base DN search returned no results");
+        }
+        catch (Exception ex)
+        {
+            return HealthCheckResult.Unhealthy(ex);
+        }
+    }
+
+    public async Task<AuthResult> AuthenticateAsync(AuthRequest request, CancellationToken ct)
+    {
+        if (request.Method != "password" || string.IsNullOrEmpty(request.Username))
+            return new AuthResult(false, null, null, null, null, null, "Invalid auth method or missing username");
+
+        try
+        {
+            // Find user DN
+            var userDn = await FindUserDnAsync(request.Username, ct);
+            if (userDn == null)
+                return new AuthResult(false, null, null, null, null, null, "User not found");
+
+            // Attempt bind with user credentials
+            using var userConnection = new LdapConnection(
+                new LdapDirectoryIdentifier(_options!.Server, _options.Port));
+            userConnection.Credential = new NetworkCredential(userDn, request.Password);
+
+            await Task.Run(() => userConnection.Bind(), ct);
+
+            // Get user info and groups
+            var userInfo = await GetUserInfoAsync(request.Username, ct);
+            var groups = await GetUserGroupsAsync(request.Username, ct);
+
+            return new AuthResult(
+                Success: true,
+                UserId: request.Username,
+                AccessToken: null, // LDAP doesn't issue tokens
+                RefreshToken: null,
+                ExpiresAt: null,
+                Roles: groups.Select(g => g.Name).ToList(),
+                Error: null);
+        }
+        catch (LdapException ex)
+        {
+            _context?.Logger.Warning(ex, "LDAP authentication failed for {Username}", request.Username);
+            return new AuthResult(false, null, null, null, null, null, "Authentication failed");
+        }
+    }
+
+    public Task<ValidationResult> ValidateTokenAsync(string token, CancellationToken ct)
+    {
+        // LDAP doesn't use tokens
+        return Task.FromResult(new ValidationResult(false, null, null, null, "LDAP does not support token validation"));
+    }
+
+    public async Task<UserInfo?> GetUserInfoAsync(string userId, CancellationToken ct)
+    {
+        var userDn = await FindUserDnAsync(userId, ct);
+        if (userDn == null) return null;
+
+        var request = new SearchRequest(
+            userDn,
+            "(objectClass=*)",
+            SearchScope.Base,
+            "uid", "mail", "displayName", "cn", "sn", "givenName");
+
+        var response = await Task.Run(() =>
+            (SearchResponse)_connection!.SendRequest(request), ct);
+
+        if (response.Entries.Count == 0) return null;
+
+        var entry = response.Entries[0];
+        return new UserInfo(
+            Id: userId,
+            Username: GetAttribute(entry, "uid") ?? userId,
+            Email: GetAttribute(entry, "mail"),
+            DisplayName: GetAttribute(entry, "displayName") ?? GetAttribute(entry, "cn"),
+            Attributes: entry.Attributes.Values.Cast<DirectoryAttribute>()
+                .ToDictionary(a => a.Name, a => a[0]?.ToString() ?? ""));
+    }
+
+    public async Task<IReadOnlyList<GroupInfo>> GetUserGroupsAsync(string userId, CancellationToken ct)
+    {
+        var userDn = await FindUserDnAsync(userId, ct);
+        if (userDn == null) return Array.Empty<GroupInfo>();
+
+        var request = new SearchRequest(
+            _options!.GroupBaseDn ?? _options.BaseDn,
+            $"(member={userDn})",
+            SearchScope.Subtree,
+            "cn", "description");
+
+        var response = await Task.Run(() =>
+            (SearchResponse)_connection!.SendRequest(request), ct);
+
+        return response.Entries.Cast<SearchResultEntry>()
+            .Select(e => new GroupInfo(
+                Id: e.DistinguishedName,
+                Name: GetAttribute(e, "cn") ?? e.DistinguishedName,
+                Description: GetAttribute(e, "description")))
+            .ToList();
+    }
+
+    public async Task<bool> HasPermissionAsync(string userId, string permission, CancellationToken ct)
+    {
+        var groups = await GetUserGroupsAsync(userId, ct);
+        // Permission checking would be based on group membership
+        return groups.Any(g => g.Name.Equals(permission, StringComparison.OrdinalIgnoreCase));
+    }
+
+    public Task<SsoInitiation?> InitiateSsoAsync(SsoRequest request, CancellationToken ct)
+    {
+        // LDAP doesn't support SSO initiation
+        return Task.FromResult<SsoInitiation?>(null);
+    }
+
+    public Task<AuthResult> CompleteSsoAsync(SsoCallback callback, CancellationToken ct)
+    {
+        return Task.FromResult(new AuthResult(false, null, null, null, null, null, "LDAP does not support SSO"));
+    }
+
+    private async Task<string?> FindUserDnAsync(string username, CancellationToken ct)
+    {
+        var filter = string.Format(_options!.UserFilter, username);
+        var request = new SearchRequest(
+            _options.BaseDn,
+            filter,
+            SearchScope.Subtree,
+            "distinguishedName");
+
+        var response = await Task.Run(() =>
+            (SearchResponse)_connection!.SendRequest(request), ct);
+
+        return response.Entries.Count > 0 ? response.Entries[0].DistinguishedName : null;
+    }
+
+    private static string? GetAttribute(SearchResultEntry entry, string name)
+    {
+        return entry.Attributes[name]?[0]?.ToString();
+    }
+
+    public ValueTask DisposeAsync()
+    {
+        _connection?.Dispose();
+        _connection = null;
+        State = PluginLifecycleState.Stopped;
+        return ValueTask.CompletedTask;
+    }
+}
+
+public sealed class LdapOptions
+{
+    public string Server { get; set; } = "localhost";
+    public int Port { get; set; } = 389;
+    public bool UseSsl { get; set; } = false;
+    public string BaseDn { get; set; } = "";
+    public string?
GroupBaseDn { get; set; } + public string BindDn { get; set; } = ""; + public string BindPassword { get; set; } = ""; + public string UserFilter { get; set; } = "(uid={0})"; +} +``` + +### Migration Tasks + +| Provider | Current Interface | New Implementation | Status | +|----------|-------------------|-------------------|--------| +| LDAP | Authority plugin interfaces | `LdapPlugin : IPlugin, IAuthCapability` | TODO | +| OIDC Generic | Authority plugin interfaces | `OidcPlugin : IPlugin, IAuthCapability` | TODO | +| Azure AD | Authority plugin interfaces | `AzureAdPlugin : OidcPlugin` | TODO | +| Okta | Authority plugin interfaces | `OktaPlugin : OidcPlugin` | TODO | +| Google | Authority plugin interfaces | `GooglePlugin : OidcPlugin` | TODO | +| SAML | Authority plugin interfaces | `SamlPlugin : IPlugin, IAuthCapability` | TODO | +| Workforce | Authority plugin interfaces | `WorkforcePlugin : IPlugin, IAuthCapability` | TODO | + +--- + +## Acceptance Criteria + +- [ ] All auth providers implement `IPlugin` +- [ ] All auth providers implement `IAuthCapability` +- [ ] All providers have plugin manifests +- [ ] LDAP bind/search operations work +- [ ] OIDC authorization flow works +- [ ] OIDC token validation works +- [ ] SAML assertion handling works +- [ ] SSO initiation/completion works +- [ ] User info retrieval works +- [ ] Group membership queries work +- [ ] Health checks for all providers +- [ ] Unit tests migrated/updated +- [ ] Integration tests passing + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 100_001 Plugin Abstractions | Internal | TODO | +| 100_002 Plugin Host | Internal | TODO | +| 100_003 Plugin Registry | Internal | TODO | +| System.DirectoryServices.Protocols | External | Available | +| Microsoft.IdentityModel.* | External | Available | +| ITfoxtec.Identity.Saml2 | External | Available | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| 
IAuthCapability interface | TODO | | +| LdapPlugin | TODO | | +| OidcPlugin (base) | TODO | | +| AzureAdPlugin | TODO | | +| OktaPlugin | TODO | | +| GooglePlugin | TODO | | +| SamlPlugin | TODO | | +| WorkforcePlugin | TODO | | +| Plugin manifests | TODO | | +| Unit tests | TODO | | +| Integration tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_100_007_PLUGIN_llm_rework.md b/docs/implplan/SPRINT_20260110_100_007_PLUGIN_llm_rework.md new file mode 100644 index 000000000..204d49a3f --- /dev/null +++ b/docs/implplan/SPRINT_20260110_100_007_PLUGIN_llm_rework.md @@ -0,0 +1,453 @@ +# SPRINT: LLM Provider Rework + +> **Sprint ID:** 100_007 +> **Module:** PLUGIN +> **Phase:** 100 - Plugin System Unification +> **Status:** TODO +> **Parent:** [100_000_INDEX](SPRINT_20260110_100_000_INDEX_plugin_unification.md) + +--- + +## Overview + +Rework all LLM providers (llama-server, ollama, OpenAI, Claude) to implement the unified plugin architecture with `IPlugin` and `ILlmCapability` interfaces. + +### Objectives + +- Migrate llama-server provider to unified plugin model +- Migrate ollama provider to unified plugin model +- Migrate OpenAI provider to unified plugin model +- Migrate Claude provider to unified plugin model +- Preserve priority-based provider selection +- Add health checks with model availability +- Add plugin manifests + +### Current State + +``` +src/AdvisoryAI/ +├── __Libraries/ +│ └── StellaOps.AdvisoryAI.Providers/ +│ ├── LlamaServerProvider.cs +│ ├── OllamaProvider.cs +│ ├── OpenAiProvider.cs +│ └── ClaudeProvider.cs +``` + +--- + +## Deliverables + +### LLM Capability Interface + +```csharp +// ILlmCapability.cs +namespace StellaOps.Plugin.Abstractions.Capabilities; + +/// +/// Capability interface for Large Language Model inference. +/// +public interface ILlmCapability +{ + /// + /// Provider identifier (llama, ollama, openai, claude). 
+    /// </summary>
+    string ProviderId { get; }
+
+    /// <summary>
+    /// Priority for provider selection (higher = preferred).
+    /// </summary>
+    int Priority { get; }
+
+    /// <summary>
+    /// Available models from this provider.
+    /// </summary>
+    IReadOnlyList<LlmModelInfo> AvailableModels { get; }
+
+    /// <summary>
+    /// Create an inference session.
+    /// </summary>
+    Task<ILlmSession> CreateSessionAsync(LlmSessionOptions options, CancellationToken ct);
+
+    /// <summary>
+    /// Check if provider can serve the specified model.
+    /// </summary>
+    Task<bool> CanServeModelAsync(string modelId, CancellationToken ct);
+
+    /// <summary>
+    /// Refresh available models list.
+    /// </summary>
+    Task RefreshModelsAsync(CancellationToken ct);
+}
+
+public interface ILlmSession : IAsyncDisposable
+{
+    /// <summary>
+    /// Session identifier.
+    /// </summary>
+    string SessionId { get; }
+
+    /// <summary>
+    /// Model being used.
+    /// </summary>
+    string ModelId { get; }
+
+    /// <summary>
+    /// Generate a completion.
+    /// </summary>
+    Task<LlmCompletion> CompleteAsync(LlmPrompt prompt, CancellationToken ct);
+
+    /// <summary>
+    /// Generate a streaming completion.
+    /// </summary>
+    IAsyncEnumerable<LlmCompletionChunk> CompleteStreamingAsync(LlmPrompt prompt, CancellationToken ct);
+
+    /// <summary>
+    /// Generate embeddings for text.
+    /// </summary>
+    Task<LlmEmbedding> EmbedAsync(string text, CancellationToken ct);
+}
+
+public sealed record LlmModelInfo(
+    string Id,
+    string Name,
+    string? Description,
+    long? ParameterCount,
+    int? ContextLength,
+    IReadOnlyList<string> Capabilities); // ["chat", "completion", "embedding"]
+
+public sealed record LlmSessionOptions(
+    string ModelId,
+    LlmParameters? Parameters = null,
+    string? SystemPrompt = null);
+
+public sealed record LlmParameters(
+    float? Temperature = null,
+    float? TopP = null,
+    int? MaxTokens = null,
+    float? FrequencyPenalty = null,
+    float? PresencePenalty = null,
+    IReadOnlyList<string>? StopSequences = null);
+
+public sealed record LlmPrompt(
+    IReadOnlyList<LlmMessage> Messages,
+    LlmParameters?
ParameterOverrides = null);
+
+public sealed record LlmMessage(
+    LlmRole Role,
+    string Content);
+
+public enum LlmRole
+{
+    System,
+    User,
+    Assistant
+}
+
+public sealed record LlmCompletion(
+    string Content,
+    LlmUsage Usage,
+    string? FinishReason);
+
+public sealed record LlmCompletionChunk(
+    string Content,
+    bool IsComplete,
+    LlmUsage? Usage = null);
+
+public sealed record LlmUsage(
+    int PromptTokens,
+    int CompletionTokens,
+    int TotalTokens);
+
+public sealed record LlmEmbedding(
+    float[] Vector,
+    int Dimensions,
+    LlmUsage Usage);
+```
+
+### OpenAI Plugin Implementation
+
+```csharp
+// OpenAiPlugin.cs
+namespace StellaOps.AdvisoryAI.Plugin.OpenAi;
+
+[Plugin(
+    id: "com.stellaops.llm.openai",
+    name: "OpenAI LLM Provider",
+    version: "1.0.0",
+    vendor: "Stella Ops")]
+[ProvidesCapability(PluginCapabilities.Llm, CapabilityId = "openai")]
+[RequiresCapability(PluginCapabilities.Network)]
+public sealed class OpenAiPlugin : IPlugin, ILlmCapability
+{
+    private IPluginContext? _context;
+    private OpenAiClient? _client;
+    private List<LlmModelInfo> _models = new();
+
+    public PluginInfo Info => new(
+        Id: "com.stellaops.llm.openai",
+        Name: "OpenAI LLM Provider",
+        Version: "1.0.0",
+        Vendor: "Stella Ops",
+        Description: "OpenAI GPT models for AI-assisted advisory analysis");
+
+    public PluginTrustLevel TrustLevel => PluginTrustLevel.BuiltIn;
+    public PluginCapabilities Capabilities => PluginCapabilities.Llm | PluginCapabilities.Network;
+    public PluginLifecycleState State { get; private set; } = PluginLifecycleState.Discovered;
+
+    public string ProviderId => "openai";
+    public int Priority { get; private set; } = 10;
+    public IReadOnlyList<LlmModelInfo> AvailableModels => _models;
+
+    public async Task InitializeAsync(IPluginContext context, CancellationToken ct)
+    {
+        _context = context;
+        State = PluginLifecycleState.Initializing;
+
+        var options = context.Configuration.Bind<OpenAiOptions>();
+        var apiKey = await context.Configuration.GetSecretAsync("openai-api-key", ct)
+            ?? options.ApiKey;
+
+        if (string.IsNullOrEmpty(apiKey))
+        {
+            State = PluginLifecycleState.Failed;
+            throw new InvalidOperationException("OpenAI API key not configured");
+        }
+
+        _client = new OpenAiClient(apiKey, options.BaseUrl);
+        Priority = options.Priority;
+
+        await RefreshModelsAsync(ct);
+
+        State = PluginLifecycleState.Active;
+        context.Logger.Info("OpenAI plugin initialized with {ModelCount} models", _models.Count);
+    }
+
+    public async Task<HealthCheckResult> HealthCheckAsync(CancellationToken ct)
+    {
+        if (_client == null)
+            return HealthCheckResult.Unhealthy("Not initialized");
+
+        try
+        {
+            var models = await _client.ListModelsAsync(ct);
+            return HealthCheckResult.Healthy(details: new Dictionary<string, object>
+            {
+                ["modelCount"] = models.Count
+            });
+        }
+        catch (Exception ex)
+        {
+            return HealthCheckResult.Unhealthy(ex);
+        }
+    }
+
+    public async Task<ILlmSession> CreateSessionAsync(LlmSessionOptions options, CancellationToken ct)
+    {
+        EnsureActive();
+
+        if (!await CanServeModelAsync(options.ModelId, ct))
+            throw new InvalidOperationException($"Model {options.ModelId} not available");
+
+        return new OpenAiSession(_client!, options, _context!.Logger);
+    }
+
+    public Task<bool> CanServeModelAsync(string modelId, CancellationToken ct)
+    {
+        return Task.FromResult(_models.Any(m => m.Id.Equals(modelId, StringComparison.OrdinalIgnoreCase)));
+    }
+
+    public async Task RefreshModelsAsync(CancellationToken ct)
+    {
+        var models = await _client!.ListModelsAsync(ct);
+        _models = models
+            .Where(m => m.Id.StartsWith("gpt") || m.Id.Contains("embedding"))
+            .Select(m => new LlmModelInfo(
+                Id: m.Id,
+                Name: m.Id,
+                Description: null,
+                ParameterCount: null,
+                ContextLength: GetContextLength(m.Id),
+                Capabilities: GetModelCapabilities(m.Id)))
+            .ToList();
+    }
+
+    private static int? GetContextLength(string modelId) => modelId switch
+    {
+        var m when m.Contains("gpt-4-turbo") => 128000,
+        var m when m.Contains("gpt-4") => 8192,
+        var m when m.Contains("gpt-3.5-turbo-16k") => 16384,
+        var m when m.Contains("gpt-3.5") => 4096,
+        _ => null
+    };
+
+    private static List<string> GetModelCapabilities(string modelId)
+    {
+        if (modelId.Contains("embedding"))
+            return new List<string> { "embedding" };
+        return new List<string> { "chat", "completion" };
+    }
+
+    private void EnsureActive()
+    {
+        if (State != PluginLifecycleState.Active)
+            throw new InvalidOperationException($"OpenAI plugin is not active (state: {State})");
+    }
+
+    public ValueTask DisposeAsync()
+    {
+        _client?.Dispose();
+        State = PluginLifecycleState.Stopped;
+        return ValueTask.CompletedTask;
+    }
+}
+
+internal sealed class OpenAiSession : ILlmSession
+{
+    private readonly OpenAiClient _client;
+    private readonly LlmSessionOptions _options;
+    private readonly IPluginLogger _logger;
+
+    public string SessionId { get; } = Guid.NewGuid().ToString("N");
+    public string ModelId => _options.ModelId;
+
+    public OpenAiSession(OpenAiClient client, LlmSessionOptions options, IPluginLogger logger)
+    {
+        _client = client;
+        _options = options;
+        _logger = logger;
+    }
+
+    public async Task<LlmCompletion> CompleteAsync(LlmPrompt prompt, CancellationToken ct)
+    {
+        var request = BuildRequest(prompt);
+        var response = await _client.ChatCompleteAsync(request, ct);
+
+        return new LlmCompletion(
+            Content: response.Choices[0].Message.Content,
+            Usage: new LlmUsage(
+                response.Usage.PromptTokens,
+                response.Usage.CompletionTokens,
+                response.Usage.TotalTokens),
+            FinishReason: response.Choices[0].FinishReason);
+    }
+
+    public async IAsyncEnumerable<LlmCompletionChunk> CompleteStreamingAsync(
+        LlmPrompt prompt,
+        [EnumeratorCancellation] CancellationToken ct)
+    {
+        var request = BuildRequest(prompt);
+        request.Stream = true;
+
+        await foreach (var chunk in _client.ChatCompleteStreamAsync(request, ct))
+        {
+            yield return new LlmCompletionChunk(
+                Content: chunk.Choices[0].Delta?.Content ?? "",
+                IsComplete: chunk.Choices[0].FinishReason != null,
+                Usage: chunk.Usage != null ? new LlmUsage(
+                    chunk.Usage.PromptTokens,
+                    chunk.Usage.CompletionTokens,
+                    chunk.Usage.TotalTokens) : null);
+        }
+    }
+
+    public async Task<LlmEmbedding> EmbedAsync(string text, CancellationToken ct)
+    {
+        var response = await _client.EmbedAsync(text, "text-embedding-ada-002", ct);
+
+        return new LlmEmbedding(
+            Vector: response.Data[0].Embedding,
+            Dimensions: response.Data[0].Embedding.Length,
+            Usage: new LlmUsage(response.Usage.PromptTokens, 0, response.Usage.TotalTokens));
+    }
+
+    private ChatCompletionRequest BuildRequest(LlmPrompt prompt)
+    {
+        var messages = new List<ChatMessage>();
+
+        if (!string.IsNullOrEmpty(_options.SystemPrompt))
+        {
+            messages.Add(new ChatMessage("system", _options.SystemPrompt));
+        }
+
+        messages.AddRange(prompt.Messages.Select(m => new ChatMessage(
+            m.Role.ToString().ToLowerInvariant(),
+            m.Content)));
+
+        var parameters = prompt.ParameterOverrides ?? _options.Parameters ??
new LlmParameters(); + + return new ChatCompletionRequest + { + Model = ModelId, + Messages = messages, + Temperature = parameters.Temperature, + TopP = parameters.TopP, + MaxTokens = parameters.MaxTokens, + FrequencyPenalty = parameters.FrequencyPenalty, + PresencePenalty = parameters.PresencePenalty, + Stop = parameters.StopSequences?.ToArray() + }; + } + + public ValueTask DisposeAsync() => ValueTask.CompletedTask; +} +``` + +### Migration Tasks + +| Provider | Priority | New Implementation | Status | +|----------|----------|-------------------|--------| +| llama-server | 100 (local) | `LlamaServerPlugin : IPlugin, ILlmCapability` | TODO | +| ollama | 90 (local) | `OllamaPlugin : IPlugin, ILlmCapability` | TODO | +| Claude | 20 | `ClaudePlugin : IPlugin, ILlmCapability` | TODO | +| OpenAI | 10 | `OpenAiPlugin : IPlugin, ILlmCapability` | TODO | + +--- + +## Acceptance Criteria + +- [ ] All LLM providers implement `IPlugin` +- [ ] All LLM providers implement `ILlmCapability` +- [ ] Priority-based provider selection preserved +- [ ] Chat completion works +- [ ] Streaming completion works +- [ ] Embedding generation works +- [ ] Model listing works +- [ ] Health checks verify API connectivity +- [ ] Local providers (llama/ollama) check process availability +- [ ] Unit tests migrated/updated +- [ ] Integration tests with mock servers + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 100_001 Plugin Abstractions | Internal | TODO | +| 100_002 Plugin Host | Internal | TODO | +| OpenAI .NET SDK | External | Available | +| Anthropic SDK | External | Available | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| ILlmCapability interface | TODO | | +| LlamaServerPlugin | TODO | | +| OllamaPlugin | TODO | | +| OpenAiPlugin | TODO | | +| ClaudePlugin | TODO | | +| LlmProviderSelector | TODO | Priority-based selection | +| Plugin manifests | TODO | | +| Unit tests | TODO | | + 
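+The Delivery Tracker above lists an `LlmProviderSelector` responsible for priority-based selection. A minimal sketch of that selection logic, assuming the plugin host can surface each provider's priority alongside its `ILlmCapability` (the `LlmProviderSelector` shape, `SelectAsync` name, and `HealthStatus` enum are illustrative, not finalized API):
+
+```csharp
+// Hypothetical sketch: choose the highest-priority healthy LLM provider.
+// Priorities mirror the Migration Tasks table (llama-server=100, ollama=90, Claude=20, OpenAI=10),
+// so local providers win over hosted APIs whenever they pass their health check.
+public sealed class LlmProviderSelector
+{
+    private readonly IReadOnlyList<(IPlugin Plugin, ILlmCapability Llm, int Priority)> _providers;
+
+    public LlmProviderSelector(IEnumerable<(IPlugin Plugin, ILlmCapability Llm, int Priority)> providers)
+        => _providers = providers.OrderByDescending(p => p.Priority).ToList();
+
+    public async Task<ILlmCapability?> SelectAsync(CancellationToken ct)
+    {
+        foreach (var (plugin, llm, _) in _providers)
+        {
+            // Skip providers whose backing process/API is unreachable.
+            var health = await plugin.HealthCheckAsync(ct);
+            if (health.Status == HealthStatus.Healthy)
+                return llm;
+        }
+        return null;
+    }
+}
+```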
+--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_100_008_PLUGIN_scm_rework.md b/docs/implplan/SPRINT_20260110_100_008_PLUGIN_scm_rework.md new file mode 100644 index 000000000..3aa251343 --- /dev/null +++ b/docs/implplan/SPRINT_20260110_100_008_PLUGIN_scm_rework.md @@ -0,0 +1,359 @@ +# SPRINT: SCM Connector Rework + +> **Sprint ID:** 100_008 +> **Module:** PLUGIN +> **Phase:** 100 - Plugin System Unification +> **Status:** TODO +> **Parent:** [100_000_INDEX](SPRINT_20260110_100_000_INDEX_plugin_unification.md) + +--- + +## Overview + +Rework all SCM connectors (GitHub, GitLab, Azure DevOps, Gitea, Bitbucket) to implement the unified plugin architecture with `IPlugin` and `IScmCapability` interfaces. + +### Objectives + +- Migrate GitHub connector to unified plugin model +- Migrate GitLab connector to unified plugin model +- Migrate Azure DevOps connector to unified plugin model +- Migrate Gitea connector to unified plugin model +- Add Bitbucket connector +- Preserve URL auto-detection +- Add health checks with API connectivity +- Add plugin manifests + +### Migration Tasks + +| Provider | Current Interface | New Implementation | Status | +|----------|-------------------|-------------------|--------| +| GitHub | `IScmConnectorPlugin` | `GitHubPlugin : IPlugin, IScmCapability` | TODO | +| GitLab | `IScmConnectorPlugin` | `GitLabPlugin : IPlugin, IScmCapability` | TODO | +| Azure DevOps | `IScmConnectorPlugin` | `AzureDevOpsPlugin : IPlugin, IScmCapability` | TODO | +| Gitea | `IScmConnectorPlugin` | `GiteaPlugin : IPlugin, IScmCapability` | TODO | +| Bitbucket | (new) | `BitbucketPlugin : IPlugin, IScmCapability` | TODO | + +--- + +## Deliverables + +### GitHub Plugin Implementation + +```csharp +// GitHubPlugin.cs +namespace StellaOps.Integrations.Plugin.GitHub; + +[Plugin( + id: "com.stellaops.scm.github", + name: "GitHub SCM Connector", + version: "1.0.0", + vendor: "Stella 
Ops")] +[ProvidesCapability(PluginCapabilities.Scm, CapabilityId = "github")] +public sealed class GitHubPlugin : IPlugin, IScmCapability +{ + private IPluginContext? _context; + private GitHubClient? _client; + private GitHubOptions? _options; + + public PluginInfo Info => new( + Id: "com.stellaops.scm.github", + Name: "GitHub SCM Connector", + Version: "1.0.0", + Vendor: "Stella Ops", + Description: "GitHub repository integration for source control operations"); + + public PluginTrustLevel TrustLevel => PluginTrustLevel.BuiltIn; + public PluginCapabilities Capabilities => PluginCapabilities.Scm | PluginCapabilities.Network; + public PluginLifecycleState State { get; private set; } = PluginLifecycleState.Discovered; + + public string ConnectorType => "scm.github"; + public string DisplayName => "GitHub"; + public string ScmType => "github"; + + private static readonly Regex GitHubUrlPattern = new( + @"^https?://(?:www\.)?github\.com/([^/]+)/([^/]+?)(?:\.git)?/?$", + RegexOptions.Compiled | RegexOptions.IgnoreCase); + + public async Task InitializeAsync(IPluginContext context, CancellationToken ct) + { + _context = context; + State = PluginLifecycleState.Initializing; + + _options = context.Configuration.Bind<GitHubOptions>(); + + var token = await context.Configuration.GetSecretAsync("github-token", ct) + ?? 
_options.Token; + + _client = new GitHubClient(new ProductHeaderValue("StellaOps")) + { + Credentials = new Credentials(token) + }; + + if (!string.IsNullOrEmpty(_options.BaseUrl)) + { + _client = new GitHubClient( + new ProductHeaderValue("StellaOps"), + new Uri(_options.BaseUrl)) + { + Credentials = new Credentials(token) + }; + } + + State = PluginLifecycleState.Active; + context.Logger.Info("GitHub plugin initialized"); + } + + public async Task<HealthCheckResult> HealthCheckAsync(CancellationToken ct) + { + if (_client == null) + return HealthCheckResult.Unhealthy("Not initialized"); + + try + { + var user = await _client.User.Current(); + return HealthCheckResult.Healthy(details: new Dictionary<string, object> + { + ["authenticatedAs"] = user.Login, + ["rateLimitRemaining"] = _client.GetLastApiInfo()?.RateLimit?.Remaining ?? -1 + }); + } + catch (Exception ex) + { + return HealthCheckResult.Unhealthy(ex); + } + } + + public bool CanHandle(string repositoryUrl) => GitHubUrlPattern.IsMatch(repositoryUrl); + + public async Task<ConnectionTestResult> TestConnectionAsync(CancellationToken ct) + { + try + { + var sw = Stopwatch.StartNew(); + var user = await _client!.User.Current(); + sw.Stop(); + + return ConnectionTestResult.Succeeded(sw.Elapsed); + } + catch (Exception ex) + { + return ConnectionTestResult.Failed(ex.Message, ex); + } + } + + public async Task<ConnectionInfo> GetConnectionInfoAsync(CancellationToken ct) + { + var user = await _client!.User.Current(); + var apiInfo = _client.GetLastApiInfo(); + + return new ConnectionInfo( + EndpointUrl: _options?.BaseUrl ?? "https://api.github.com", + AuthenticatedAs: user.Login, + Metadata: new Dictionary<string, object> + { + ["rateLimitRemaining"] = apiInfo?.RateLimit?.Remaining ?? -1, + ["rateLimitReset"] = apiInfo?.RateLimit?.Reset.ToString() ?? 
"" + }); + } + + public async Task<IReadOnlyList<ScmBranch>> ListBranchesAsync(string repositoryUrl, CancellationToken ct) + { + var (owner, repo) = ParseRepositoryUrl(repositoryUrl); + var branches = await _client!.Repository.Branch.GetAll(owner, repo); + var defaultBranch = (await _client.Repository.Get(owner, repo)).DefaultBranch; + + return branches.Select(b => new ScmBranch( + Name: b.Name, + CommitSha: b.Commit.Sha, + IsDefault: b.Name == defaultBranch, + IsProtected: b.Protected)).ToList(); + } + + public async Task<IReadOnlyList<ScmCommit>> ListCommitsAsync( + string repositoryUrl, + string branch, + int limit = 50, + CancellationToken ct = default) + { + var (owner, repo) = ParseRepositoryUrl(repositoryUrl); + var commits = await _client!.Repository.Commit.GetAll(owner, repo, + new CommitRequest { Sha = branch }, + new ApiOptions { PageSize = limit, PageCount = 1 }); + + return commits.Select(c => new ScmCommit( + Sha: c.Sha, + Message: c.Commit.Message, + AuthorName: c.Commit.Author.Name, + AuthorEmail: c.Commit.Author.Email, + AuthoredAt: c.Commit.Author.Date, + ParentShas: c.Parents.Select(p => p.Sha).ToList())).ToList(); + } + + public async Task<ScmCommit> GetCommitAsync(string repositoryUrl, string commitSha, CancellationToken ct) + { + var (owner, repo) = ParseRepositoryUrl(repositoryUrl); + var commit = await _client!.Repository.Commit.Get(owner, repo, commitSha); + + return new ScmCommit( + Sha: commit.Sha, + Message: commit.Commit.Message, + AuthorName: commit.Commit.Author.Name, + AuthorEmail: commit.Commit.Author.Email, + AuthoredAt: commit.Commit.Author.Date, + ParentShas: commit.Parents.Select(p => p.Sha).ToList()); + } + + public async Task<ScmFileContent> GetFileAsync( + string repositoryUrl, + string filePath, + string? reference = null, + CancellationToken ct = default) + { + var (owner, repo) = ParseRepositoryUrl(repositoryUrl); + var content = await _client!.Repository.Content.GetAllContentsByRef(owner, repo, filePath, reference ?? 
"HEAD"); + var file = content.First(); + + return new ScmFileContent( + Path: file.Path, + Content: file.Content, + Encoding: file.Encoding.StringValue, + Sha: file.Sha, + Size: file.Size); + } + + public async Task<Stream> GetArchiveAsync( + string repositoryUrl, + string reference, + ArchiveFormat format = ArchiveFormat.TarGz, + CancellationToken ct = default) + { + var (owner, repo) = ParseRepositoryUrl(repositoryUrl); + var archiveFormat = format == ArchiveFormat.Zip + ? Octokit.ArchiveFormat.Zipball + : Octokit.ArchiveFormat.Tarball; + + var bytes = await _client!.Repository.Content.GetArchive(owner, repo, archiveFormat, reference); + return new MemoryStream(bytes); + } + + public async Task<ScmWebhook> UpsertWebhookAsync( + string repositoryUrl, + ScmWebhookConfig config, + CancellationToken ct) + { + var (owner, repo) = ParseRepositoryUrl(repositoryUrl); + + var existingHooks = await _client!.Repository.Hooks.GetAll(owner, repo); + var existing = existingHooks.FirstOrDefault(h => + h.Config.TryGetValue("url", out var url) && url == config.Url); + + if (existing != null) + { + var updated = await _client.Repository.Hooks.Edit(owner, repo, (int)existing.Id, + new EditRepositoryHook(config.Events.ToArray()) + { + Active = true, + Config = new Dictionary<string, string> + { + ["url"] = config.Url, + ["secret"] = config.Secret, + ["content_type"] = "json" + } + }); + + return new ScmWebhook(updated.Id.ToString(), updated.Config["url"], updated.Events.ToList(), updated.Active); + } + + var created = await _client.Repository.Hooks.Create(owner, repo, new NewRepositoryHook("web", new Dictionary<string, string> + { + ["url"] = config.Url, + ["secret"] = config.Secret, + ["content_type"] = "json" + }) + { + Events = config.Events.ToArray(), + Active = true + }); + + return new ScmWebhook(created.Id.ToString(), created.Config["url"], created.Events.ToList(), created.Active); + } + + public async Task<ScmUser> GetCurrentUserAsync(CancellationToken ct) + { + var user = await _client!.User.Current(); + + return new ScmUser( + Id: 
user.Id.ToString(), + Username: user.Login, + DisplayName: user.Name, + Email: user.Email, + AvatarUrl: user.AvatarUrl); + } + + private static (string Owner, string Repo) ParseRepositoryUrl(string url) + { + var match = GitHubUrlPattern.Match(url); + if (!match.Success) + throw new ArgumentException($"Invalid GitHub repository URL: {url}"); + + return (match.Groups[1].Value, match.Groups[2].Value); + } + + public ValueTask DisposeAsync() + { + State = PluginLifecycleState.Stopped; + return ValueTask.CompletedTask; + } +} +``` + +--- + +## Acceptance Criteria + +- [ ] All SCM connectors implement `IPlugin` +- [ ] All SCM connectors implement `IScmCapability` +- [ ] URL auto-detection works for all providers +- [ ] Branch listing works +- [ ] Commit listing works +- [ ] File retrieval works +- [ ] Archive download works +- [ ] Webhook management works +- [ ] Health checks verify API connectivity +- [ ] Rate limit information exposed +- [ ] Unit tests migrated/updated +- [ ] Integration tests with mock APIs + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 100_001 Plugin Abstractions | Internal | TODO | +| 100_002 Plugin Host | Internal | TODO | +| Octokit | External | Available | +| GitLabApiClient | External | Available | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| GitHubPlugin | TODO | | +| GitLabPlugin | TODO | | +| AzureDevOpsPlugin | TODO | | +| GiteaPlugin | TODO | | +| BitbucketPlugin | TODO | New | +| ScmPluginBase | TODO | Shared base class | +| Plugin manifests | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_100_009_PLUGIN_scanner_rework.md b/docs/implplan/SPRINT_20260110_100_009_PLUGIN_scanner_rework.md new file mode 100644 index 000000000..a00a4457d --- /dev/null +++ 
b/docs/implplan/SPRINT_20260110_100_009_PLUGIN_scanner_rework.md @@ -0,0 +1,1156 @@ +# SPRINT: Scanner Analyzer Rework + +> **Sprint ID:** 100_009 +> **Module:** PLUGIN +> **Phase:** 100 - Plugin System Unification +> **Status:** TODO +> **Parent:** [100_000_INDEX](SPRINT_20260110_100_000_INDEX_plugin_unification.md) + +--- + +## Overview + +Rework all Scanner analyzers (11 language analyzers) to implement the unified plugin architecture with `IPlugin` and `IAnalysisCapability` interfaces. + +### Objectives + +- Migrate all 11 language analyzers to unified plugin model +- Migrate SBOM generators to unified plugin model +- Migrate binary analyzers to unified plugin model +- Preserve deterministic output guarantees +- Add health checks for analyzer availability +- Add plugin manifests +- Maintain backward compatibility with existing scan workflows + +### Current State + +``` +src/Scanner/ +├── __Libraries/ +│ └── StellaOps.Scanner.Analyzers/ +│ ├── Languages/ +│ │ ├── DotNetAnalyzer.cs # .NET/NuGet +│ │ ├── GoAnalyzer.cs # Go modules +│ │ ├── JavaAnalyzer.cs # Maven/Gradle +│ │ ├── JavaScriptAnalyzer.cs # npm/yarn/pnpm +│ │ ├── PythonAnalyzer.cs # pip/poetry/pipenv +│ │ ├── RubyAnalyzer.cs # Bundler/Gemfile +│ │ ├── RustAnalyzer.cs # Cargo +│ │ ├── PhpAnalyzer.cs # Composer +│ │ ├── SwiftAnalyzer.cs # Swift Package Manager +│ │ ├── CppAnalyzer.cs # Conan/vcpkg +│ │ └── ElixirAnalyzer.cs # Mix/Hex +│ ├── Binary/ +│ │ ├── ElfAnalyzer.cs +│ │ ├── PeAnalyzer.cs +│ │ └── MachOAnalyzer.cs +│ └── Sbom/ +│ ├── SpdxGenerator.cs +│ └── CycloneDxGenerator.cs +``` + +### Target State + +Each analyzer implements: +- `IPlugin` - Core plugin interface with lifecycle +- `IAnalysisCapability` - Analysis operations +- Health checks for tool availability +- Plugin manifest for discovery +- Deterministic output guarantees + +--- + +## Deliverables + +### Analysis Capability Interface + +```csharp +// IAnalysisCapability.cs +namespace StellaOps.Plugin.Abstractions.Capabilities; + +/// <summary>
+/// Capability interface for container/dependency analysis. +/// </summary> +public interface IAnalysisCapability +{ + /// <summary> + /// Analyzer identifier (dotnet, go, java, etc.). + /// </summary> + string AnalyzerId { get; } + + /// <summary> + /// Analysis category (language, binary, sbom). + /// </summary> + AnalysisCategory Category { get; } + + /// <summary> + /// File patterns this analyzer can process. + /// </summary> + IReadOnlyList<string> SupportedPatterns { get; } + + /// <summary> + /// Check if analyzer can handle specific file. + /// </summary> + bool CanAnalyze(string filePath); + + /// <summary> + /// Analyze a single file or directory. + /// </summary> + Task<AnalysisResult> AnalyzeAsync( + AnalysisRequest request, + CancellationToken ct); + + /// <summary> + /// Batch analyze multiple targets. + /// </summary> + IAsyncEnumerable<AnalysisResult> AnalyzeBatchAsync( + IReadOnlyList<AnalysisRequest> requests, + CancellationToken ct); +} + +public enum AnalysisCategory +{ + Language, + Binary, + Sbom, + Container +} + +public sealed record AnalysisRequest( + string TargetPath, + AnalysisOptions Options); + +public sealed record AnalysisOptions( + bool IncludeDevDependencies = false, + bool IncludeTransitive = true, + int MaxDepth = 100, + IReadOnlyList<string>? ExcludePatterns = null, + IReadOnlyDictionary<string, string>? Environment = null); + +public sealed record AnalysisResult( + string TargetPath, + string AnalyzerId, + bool Success, + IReadOnlyList<DetectedComponent> Components, + IReadOnlyList<AnalysisError> Errors, + AnalysisMetadata Metadata); + +public sealed record DetectedComponent( + string Name, + string Version, + string? Purl, + ComponentType Type, + string? Ecosystem, + string? License, + string? SourceLocation, + IReadOnlyList<string> DirectDependencies, + IReadOnlyDictionary<string, string>? Metadata); + +public enum ComponentType +{ + Library, + Application, + Framework, + Container, + OperatingSystem, + Device, + File, + Data +} + +public sealed record AnalysisError( + string Code, + string Message, + string? FilePath, + int? 
Line, + AnalysisErrorSeverity Severity); + +public enum AnalysisErrorSeverity +{ + Warning, + Error, + Fatal +} + +public sealed record AnalysisMetadata( + DateTimeOffset AnalyzedAt, + TimeSpan Duration, + string AnalyzerVersion, + IReadOnlyDictionary<string, object>? AdditionalInfo); +``` + +### Language Analyzer Base Class + +```csharp +// LanguageAnalyzerBase.cs +namespace StellaOps.Scanner.Plugin; + +/// <summary> +/// Base class for language-specific analyzers. +/// </summary> +public abstract class LanguageAnalyzerBase : IPlugin, IAnalysisCapability +{ + protected IPluginContext? Context { get; private set; } + + public abstract PluginInfo Info { get; } + public PluginTrustLevel TrustLevel => PluginTrustLevel.BuiltIn; + public PluginCapabilities Capabilities => PluginCapabilities.Analysis; + public PluginLifecycleState State { get; protected set; } = PluginLifecycleState.Discovered; + + public abstract string AnalyzerId { get; } + public AnalysisCategory Category => AnalysisCategory.Language; + public abstract IReadOnlyList<string> SupportedPatterns { get; } + + public async Task InitializeAsync(IPluginContext context, CancellationToken ct) + { + Context = context; + State = PluginLifecycleState.Initializing; + + try + { + await InitializeAnalyzerAsync(context, ct); + State = PluginLifecycleState.Active; + context.Logger.Info("{AnalyzerId} analyzer initialized", AnalyzerId); + } + catch (Exception ex) + { + State = PluginLifecycleState.Failed; + context.Logger.Error(ex, "Failed to initialize {AnalyzerId} analyzer", AnalyzerId); + throw; + } + } + + protected virtual Task InitializeAnalyzerAsync(IPluginContext context, CancellationToken ct) + => Task.CompletedTask; + + public virtual async Task<HealthCheckResult> HealthCheckAsync(CancellationToken ct) + { + if (State != PluginLifecycleState.Active) + return HealthCheckResult.Unhealthy($"Analyzer is in state {State}"); + + try + { + // Check if required tools are available + var toolsAvailable = await CheckToolAvailabilityAsync(ct); + if (!toolsAvailable) + return 
HealthCheckResult.Degraded("Some analysis tools unavailable"); + + return HealthCheckResult.Healthy(); + } + catch (Exception ex) + { + return HealthCheckResult.Unhealthy(ex); + } + } + + protected virtual Task<bool> CheckToolAvailabilityAsync(CancellationToken ct) + => Task.FromResult(true); + + public virtual bool CanAnalyze(string filePath) + { + var fileName = Path.GetFileName(filePath); + return SupportedPatterns.Any(pattern => + FilePatternMatcher.Matches(fileName, pattern)); + } + + public abstract Task<AnalysisResult> AnalyzeAsync( + AnalysisRequest request, + CancellationToken ct); + + public virtual async IAsyncEnumerable<AnalysisResult> AnalyzeBatchAsync( + IReadOnlyList<AnalysisRequest> requests, + [EnumeratorCancellation] CancellationToken ct) + { + foreach (var request in requests) + { + ct.ThrowIfCancellationRequested(); + yield return await AnalyzeAsync(request, ct); + } + } + + protected void EnsureActive() + { + if (State != PluginLifecycleState.Active) + throw new InvalidOperationException($"{AnalyzerId} analyzer is not active (state: {State})"); + } + + public virtual ValueTask DisposeAsync() + { + State = PluginLifecycleState.Stopped; + return ValueTask.CompletedTask; + } +} +``` + +### .NET Analyzer Plugin Implementation + +```csharp +// DotNetAnalyzerPlugin.cs +namespace StellaOps.Scanner.Plugin.DotNet; + +[Plugin( + id: "com.stellaops.analyzer.dotnet", + name: ".NET Dependency Analyzer", + version: "1.0.0", + vendor: "Stella Ops")] +[ProvidesCapability(PluginCapabilities.Analysis, CapabilityId = "dotnet")] +public sealed class DotNetAnalyzerPlugin : LanguageAnalyzerBase +{ + private NuGetClient? _nugetClient; + private DotNetAnalyzerOptions? 
_options; + + public override PluginInfo Info => new( + Id: "com.stellaops.analyzer.dotnet", + Name: ".NET Dependency Analyzer", + Version: "1.0.0", + Vendor: "Stella Ops", + Description: "Analyzes .NET projects for NuGet dependencies"); + + public override string AnalyzerId => "dotnet"; + + public override IReadOnlyList<string> SupportedPatterns => new[] + { + "*.csproj", + "*.fsproj", + "*.vbproj", + "*.sln", + "packages.config", + "*.deps.json", + "Directory.Packages.props", + "global.json" + }; + + protected override async Task InitializeAnalyzerAsync(IPluginContext context, CancellationToken ct) + { + _options = context.Configuration.Bind<DotNetAnalyzerOptions>(); + + // Initialize NuGet client for metadata enrichment + _nugetClient = new NuGetClient( + _options.NuGetSources ?? new[] { "https://api.nuget.org/v3/index.json" }, + context.Logger); + + await _nugetClient.InitializeAsync(ct); + } + + protected override async Task<bool> CheckToolAvailabilityAsync(CancellationToken ct) + { + // Check if dotnet CLI is available + try + { + var result = await ProcessRunner.RunAsync("dotnet", "--version", ct); + return result.ExitCode == 0; + } + catch + { + return false; + } + } + + public override async Task<AnalysisResult> AnalyzeAsync( + AnalysisRequest request, + CancellationToken ct) + { + EnsureActive(); + + var sw = Stopwatch.StartNew(); + var components = new List<DetectedComponent>(); + var errors = new List<AnalysisError>(); + + try + { + Context!.Logger.Debug("Analyzing .NET project: {Path}", request.TargetPath); + + var fileType = DetermineFileType(request.TargetPath); + + components = fileType switch + { + DotNetFileType.Project => await AnalyzeProjectAsync(request, ct), + DotNetFileType.Solution => await AnalyzeSolutionAsync(request, ct), + DotNetFileType.PackagesConfig => await AnalyzePackagesConfigAsync(request, ct), + DotNetFileType.DepsJson => await AnalyzeDepsJsonAsync(request, ct), + DotNetFileType.DirectoryPackagesProps => await AnalyzeCentralPackageManagementAsync(request, ct), + _ => throw new NotSupportedException($"Unsupported
 file type: {request.TargetPath}") + }; + + // Enrich with NuGet metadata if enabled + if (_options!.EnrichMetadata) + { + components = await EnrichComponentsAsync(components, ct); + } + + // Sort for deterministic output + components = components + .OrderBy(c => c.Name, StringComparer.OrdinalIgnoreCase) + .ThenBy(c => c.Version, StringComparer.OrdinalIgnoreCase) + .ToList(); + + sw.Stop(); + + return new AnalysisResult( + TargetPath: request.TargetPath, + AnalyzerId: AnalyzerId, + Success: true, + Components: components, + Errors: errors, + Metadata: new AnalysisMetadata( + AnalyzedAt: Context.TimeProvider.GetUtcNow(), + Duration: sw.Elapsed, + AnalyzerVersion: Info.Version, + AdditionalInfo: new Dictionary<string, object> + { + ["fileType"] = fileType.ToString(), + ["componentCount"] = components.Count + })); + } + catch (Exception ex) + { + sw.Stop(); + Context!.Logger.Error(ex, "Failed to analyze {Path}", request.TargetPath); + + errors.Add(new AnalysisError( + Code: "DOTNET001", + Message: ex.Message, + FilePath: request.TargetPath, + Line: null, + Severity: AnalysisErrorSeverity.Error)); + + return new AnalysisResult( + TargetPath: request.TargetPath, + AnalyzerId: AnalyzerId, + Success: false, + Components: components, + Errors: errors, + Metadata: new AnalysisMetadata( + AnalyzedAt: Context.TimeProvider.GetUtcNow(), + Duration: sw.Elapsed, + AnalyzerVersion: Info.Version, + AdditionalInfo: null)); + } + } + + private async Task<List<DetectedComponent>> AnalyzeProjectAsync( + AnalysisRequest request, + CancellationToken ct) + { + var components = new List<DetectedComponent>(); + + // Use dotnet list package --format json for accurate dependency resolution + var result = await ProcessRunner.RunAsync( + "dotnet", + $"list \"{request.TargetPath}\" package --format json" + + (request.Options.IncludeTransitive ? 
" --include-transitive" : ""), + ct, + workingDirectory: Path.GetDirectoryName(request.TargetPath)); + + if (result.ExitCode != 0) + { + // Fallback to parsing project file directly + return await ParseProjectFileAsync(request.TargetPath, ct); + } + + var packageData = JsonSerializer.Deserialize<DotNetPackageListOutput>(result.Output); + if (packageData?.Projects == null) return components; + + foreach (var project in packageData.Projects) + { + foreach (var framework in project.Frameworks ?? Enumerable.Empty<ProjectFramework>()) + { + foreach (var pkg in framework.TopLevelPackages ?? Enumerable.Empty<PackageInfo>()) + { + components.Add(CreateComponent(pkg, isDirect: true)); + } + + if (request.Options.IncludeTransitive) + { + foreach (var pkg in framework.TransitivePackages ?? Enumerable.Empty<PackageInfo>()) + { + components.Add(CreateComponent(pkg, isDirect: false)); + } + } + } + } + + return components; + } + + private async Task<List<DetectedComponent>> AnalyzeSolutionAsync( + AnalysisRequest request, + CancellationToken ct) + { + var components = new List<DetectedComponent>(); + + // Parse solution file to find all projects + var solutionContent = await File.ReadAllTextAsync(request.TargetPath, ct); + var projectPaths = SolutionParser.ExtractProjectPaths(solutionContent, Path.GetDirectoryName(request.TargetPath)!); + + foreach (var projectPath in projectPaths) + { + if (!File.Exists(projectPath)) continue; + + var projectRequest = request with { TargetPath = projectPath }; + var projectComponents = await AnalyzeProjectAsync(projectRequest, ct); + components.AddRange(projectComponents); + } + + // Deduplicate by name+version + return components + .GroupBy(c => (c.Name, c.Version)) + .Select(g => g.First()) + .ToList(); + } + + private async Task<List<DetectedComponent>> AnalyzePackagesConfigAsync( + AnalysisRequest request, + CancellationToken ct) + { + var components = new List<DetectedComponent>(); + var content = await File.ReadAllTextAsync(request.TargetPath, ct); + var doc = XDocument.Parse(content); + + foreach (var package in doc.Descendants("package")) + { + var id = package.Attribute("id")?.Value; + 
var version = package.Attribute("version")?.Value; + + if (string.IsNullOrEmpty(id) || string.IsNullOrEmpty(version)) continue; + + components.Add(new DetectedComponent( + Name: id, + Version: version, + Purl: $"pkg:nuget/{id}@{version}", + Type: ComponentType.Library, + Ecosystem: "nuget", + License: null, + SourceLocation: request.TargetPath, + DirectDependencies: Array.Empty<string>(), + Metadata: null)); + } + + return components; + } + + private async Task<List<DetectedComponent>> AnalyzeDepsJsonAsync( + AnalysisRequest request, + CancellationToken ct) + { + var components = new List<DetectedComponent>(); + var content = await File.ReadAllTextAsync(request.TargetPath, ct); + var depsJson = JsonSerializer.Deserialize<DepsJsonFile>(content); + + if (depsJson?.Libraries == null) return components; + + foreach (var (key, library) in depsJson.Libraries) + { + var parts = key.Split('/'); + if (parts.Length != 2) continue; + + var name = parts[0]; + var version = parts[1]; + + // Skip project references; only package libraries are reported + if (library.Type == "project") continue; + + components.Add(new DetectedComponent( + Name: name, + Version: version, + Purl: $"pkg:nuget/{name}@{version}", + Type: ComponentType.Library, + Ecosystem: "nuget", + License: null, + SourceLocation: request.TargetPath, + DirectDependencies: Array.Empty<string>(), + Metadata: library.Sha512 != null + ? 
new Dictionary<string, string> { ["sha512"] = library.Sha512 } + : null)); + } + + return components; + } + + private async Task<List<DetectedComponent>> AnalyzeCentralPackageManagementAsync( + AnalysisRequest request, + CancellationToken ct) + { + var components = new List<DetectedComponent>(); + var content = await File.ReadAllTextAsync(request.TargetPath, ct); + var doc = XDocument.Parse(content); + + foreach (var packageVersion in doc.Descendants("PackageVersion")) + { + var include = packageVersion.Attribute("Include")?.Value; + var version = packageVersion.Attribute("Version")?.Value; + + if (string.IsNullOrEmpty(include) || string.IsNullOrEmpty(version)) continue; + + components.Add(new DetectedComponent( + Name: include, + Version: version, + Purl: $"pkg:nuget/{include}@{version}", + Type: ComponentType.Library, + Ecosystem: "nuget", + License: null, + SourceLocation: request.TargetPath, + DirectDependencies: Array.Empty<string>(), + Metadata: new Dictionary<string, string> { ["centrallyManaged"] = "true" })); + } + + return components; + } + + private async Task<List<DetectedComponent>> EnrichComponentsAsync( + List<DetectedComponent> components, + CancellationToken ct) + { + var enriched = new List<DetectedComponent>(); + + foreach (var component in components) + { + try + { + var metadata = await _nugetClient!.GetPackageMetadataAsync( + component.Name, + component.Version, + ct); + + enriched.Add(component with + { + License = metadata?.LicenseExpression ?? metadata?.LicenseUrl, + Metadata = MergeMetadata(component.Metadata, new Dictionary<string, string> + { + ["description"] = metadata?.Description ?? "", + ["projectUrl"] = metadata?.ProjectUrl ?? "", + ["authors"] = metadata?.Authors ?? "" + }) + }); + } + catch + { + enriched.Add(component); + } + } + + return enriched; + } + + private DetectedComponent CreateComponent(PackageInfo pkg, bool isDirect) + { + return new DetectedComponent( + Name: pkg.Id, + Version: pkg.ResolvedVersion ?? pkg.RequestedVersion, + Purl: $"pkg:nuget/{pkg.Id}@{pkg.ResolvedVersion ?? 
pkg.RequestedVersion}", + Type: ComponentType.Library, + Ecosystem: "nuget", + License: null, + SourceLocation: null, + DirectDependencies: Array.Empty<string>(), + Metadata: new Dictionary<string, string> + { + ["isDirect"] = isDirect.ToString(), + ["requestedVersion"] = pkg.RequestedVersion ?? "" + }); + } + + private static DotNetFileType DetermineFileType(string path) + { + var fileName = Path.GetFileName(path); + return fileName.ToLowerInvariant() switch + { + "packages.config" => DotNetFileType.PackagesConfig, + "directory.packages.props" => DotNetFileType.DirectoryPackagesProps, + "global.json" => DotNetFileType.GlobalJson, + var f when f.EndsWith(".deps.json") => DotNetFileType.DepsJson, + var f when f.EndsWith(".sln") => DotNetFileType.Solution, + var f when f.EndsWith("proj") => DotNetFileType.Project, + _ => DotNetFileType.Unknown + }; + } + + private static IReadOnlyDictionary<string, string>? MergeMetadata( + IReadOnlyDictionary<string, string>? existing, + Dictionary<string, string> additional) + { + if (existing == null) return additional; + var merged = new Dictionary<string, string>(existing); + foreach (var (key, value) in additional) + { + merged[key] = value; + } + return merged; + } + + private enum DotNetFileType + { + Unknown, + Project, + Solution, + PackagesConfig, + DepsJson, + DirectoryPackagesProps, + GlobalJson + } + + public override async ValueTask DisposeAsync() + { + if (_nugetClient != null) + { + await _nugetClient.DisposeAsync(); + _nugetClient = null; + } + await base.DisposeAsync(); + } +} + +public sealed class DotNetAnalyzerOptions +{ + public string[]? 
NuGetSources { get; set; } + public bool EnrichMetadata { get; set; } = true; +} +``` + +### Go Analyzer Plugin Implementation + +```csharp +// GoAnalyzerPlugin.cs +namespace StellaOps.Scanner.Plugin.Go; + +[Plugin( + id: "com.stellaops.analyzer.go", + name: "Go Module Analyzer", + version: "1.0.0", + vendor: "Stella Ops")] +[ProvidesCapability(PluginCapabilities.Analysis, CapabilityId = "go")] +public sealed class GoAnalyzerPlugin : LanguageAnalyzerBase +{ + public override PluginInfo Info => new( + Id: "com.stellaops.analyzer.go", + Name: "Go Module Analyzer", + Version: "1.0.0", + Vendor: "Stella Ops", + Description: "Analyzes Go modules for dependencies"); + + public override string AnalyzerId => "go"; + + public override IReadOnlyList<string> SupportedPatterns => new[] + { + "go.mod", + "go.sum", + "Gopkg.lock", + "Gopkg.toml", + "vendor/modules.txt" + }; + + protected override async Task<bool> CheckToolAvailabilityAsync(CancellationToken ct) + { + try + { + var result = await ProcessRunner.RunAsync("go", "version", ct); + return result.ExitCode == 0; + } + catch + { + return false; + } + } + + public override async Task<AnalysisResult> AnalyzeAsync( + AnalysisRequest request, + CancellationToken ct) + { + EnsureActive(); + + var sw = Stopwatch.StartNew(); + var components = new List<DetectedComponent>(); + var errors = new List<AnalysisError>(); + + try + { + var fileName = Path.GetFileName(request.TargetPath); + + if (fileName == "go.mod") + { + components = await AnalyzeGoModAsync(request, ct); + } + else if (fileName == "go.sum") + { + components = await AnalyzeGoSumAsync(request, ct); + } + else if (fileName == "Gopkg.lock") + { + components = await AnalyzeDepLockAsync(request, ct); + } + else if (fileName == "modules.txt") + { + components = await AnalyzeVendorModulesAsync(request, ct); + } + + // Sort for deterministic output + components = components + .OrderBy(c => c.Name, StringComparer.OrdinalIgnoreCase) + .ThenBy(c => c.Version, StringComparer.OrdinalIgnoreCase) + .ToList(); + + sw.Stop(); + + return new 
AnalysisResult( + TargetPath: request.TargetPath, + AnalyzerId: AnalyzerId, + Success: true, + Components: components, + Errors: errors, + Metadata: new AnalysisMetadata( + AnalyzedAt: Context!.TimeProvider.GetUtcNow(), + Duration: sw.Elapsed, + AnalyzerVersion: Info.Version, + AdditionalInfo: new Dictionary<string, object> + { + ["componentCount"] = components.Count + })); + } + catch (Exception ex) + { + sw.Stop(); + Context!.Logger.Error(ex, "Failed to analyze {Path}", request.TargetPath); + + errors.Add(new AnalysisError( + Code: "GO001", + Message: ex.Message, + FilePath: request.TargetPath, + Line: null, + Severity: AnalysisErrorSeverity.Error)); + + return new AnalysisResult( + TargetPath: request.TargetPath, + AnalyzerId: AnalyzerId, + Success: false, + Components: components, + Errors: errors, + Metadata: new AnalysisMetadata( + AnalyzedAt: Context.TimeProvider.GetUtcNow(), + Duration: sw.Elapsed, + AnalyzerVersion: Info.Version, + AdditionalInfo: null)); + } + } + + private async Task<List<DetectedComponent>> AnalyzeGoModAsync( + AnalysisRequest request, + CancellationToken ct) + { + var components = new List<DetectedComponent>(); + var workDir = Path.GetDirectoryName(request.TargetPath)!; + + // Use go list for accurate dependency resolution + var result = await ProcessRunner.RunAsync( + "go", + "list -m -json all", + ct, + workingDirectory: workDir, + environment: new Dictionary<string, string> + { + ["GO111MODULE"] = "on" + }); + + if (result.ExitCode == 0) + { + // Parse NDJSON output + var reader = new StringReader(result.Output); + string? line; + while ((line = await reader.ReadLineAsync(ct)) != null) + { + if (string.IsNullOrWhiteSpace(line)) continue; + + var module = JsonSerializer.Deserialize<GoModule>(line); + if (module?.Path == null || module.Main) continue; + + components.Add(new DetectedComponent( + Name: module.Path, + Version: module.Version ??
"unknown", + Purl: CreateGoPurl(module.Path, module.Version), + Type: ComponentType.Library, + Ecosystem: "go", + License: null, + SourceLocation: request.TargetPath, + DirectDependencies: Array.Empty<string>(), + Metadata: module.Replace != null + ? new Dictionary<string, string> + { + ["replacedBy"] = module.Replace.Path ?? "", + ["replacedVersion"] = module.Replace.Version ?? "" + } + : null)); + } + } + else + { + // Fallback: parse go.mod directly + components = await ParseGoModFileAsync(request.TargetPath, ct); + } + + return components; + } + + private async Task<List<DetectedComponent>> ParseGoModFileAsync( + string path, + CancellationToken ct) + { + var components = new List<DetectedComponent>(); + var lines = await File.ReadAllLinesAsync(path, ct); + var inRequireBlock = false; + + foreach (var line in lines) + { + var trimmed = line.Trim(); + + if (trimmed.StartsWith("require (")) + { + inRequireBlock = true; + continue; + } + if (trimmed == ")") + { + inRequireBlock = false; + continue; + } + + if (inRequireBlock || trimmed.StartsWith("require ")) + { + var requireLine = inRequireBlock ?
trimmed : trimmed[8..].Trim(); + var parts = requireLine.Split(' ', StringSplitOptions.RemoveEmptyEntries); + + if (parts.Length >= 2) + { + var modulePath = parts[0]; + var version = parts[1]; + + // Record whether the dependency is direct ("// indirect" marks transitive ones) + var isIndirect = requireLine.Contains("// indirect"); + + components.Add(new DetectedComponent( + Name: modulePath, + Version: version, + Purl: CreateGoPurl(modulePath, version), + Type: ComponentType.Library, + Ecosystem: "go", + License: null, + SourceLocation: path, + DirectDependencies: Array.Empty<string>(), + Metadata: new Dictionary<string, string> + { + ["isDirect"] = (!isIndirect).ToString() + })); + } + } + } + + return components; + } + + private async Task<List<DetectedComponent>> AnalyzeGoSumAsync( + AnalysisRequest request, + CancellationToken ct) + { + var components = new List<DetectedComponent>(); + var lines = await File.ReadAllLinesAsync(request.TargetPath, ct); + + foreach (var line in lines) + { + var parts = line.Split(' ', StringSplitOptions.RemoveEmptyEntries); + if (parts.Length < 3) continue; + + var modulePath = parts[0]; + var version = parts[1]; + var hash = parts[2]; + + // go.sum lists each module twice: once for the module itself and once for + // its go.mod file. Skip the "/go.mod" entries to avoid duplicates. + if (version.EndsWith("/go.mod")) continue; + + components.Add(new DetectedComponent( + Name: modulePath, + Version: version, + Purl: CreateGoPurl(modulePath, version), + Type: ComponentType.Library, + Ecosystem: "go", + License: null, + SourceLocation: request.TargetPath, + DirectDependencies: Array.Empty<string>(), + Metadata: new Dictionary<string, string> + { + ["h1"] = hash + })); + } + + // Deduplicate by module path + version + return components + .GroupBy(c => (c.Name, c.Version)) + .Select(g => g.First()) + .ToList(); + } + + private async Task<List<DetectedComponent>> AnalyzeDepLockAsync( + AnalysisRequest request, + CancellationToken ct) + { + var components = new List<DetectedComponent>(); + var content = await File.ReadAllTextAsync(request.TargetPath, ct); + + // Parse TOML lock file + var toml = Toml.Parse(content); + + foreach (var project in
toml.GetTableArray("projects")) + { + var name = project.GetString("name"); + var version = project.GetString("version"); + var revision = project.GetString("revision"); + + if (string.IsNullOrEmpty(name)) continue; + + components.Add(new DetectedComponent( + Name: name, + Version: version ?? revision ?? "unknown", + Purl: CreateGoPurl(name, version ?? revision), + Type: ComponentType.Library, + Ecosystem: "go", + License: null, + SourceLocation: request.TargetPath, + DirectDependencies: Array.Empty<string>(), + Metadata: revision != null + ? new Dictionary<string, string> { ["revision"] = revision } + : null)); + } + + return components; + } + + private async Task<List<DetectedComponent>> AnalyzeVendorModulesAsync( + AnalysisRequest request, + CancellationToken ct) + { + var components = new List<DetectedComponent>(); + var lines = await File.ReadAllLinesAsync(request.TargetPath, ct); + + foreach (var line in lines) + { + if (!line.StartsWith("# ")) continue; + + var parts = line[2..].Split(' ', StringSplitOptions.RemoveEmptyEntries); + if (parts.Length < 2) continue; + + var modulePath = parts[0]; + var version = parts[1]; + + components.Add(new DetectedComponent( + Name: modulePath, + Version: version, + Purl: CreateGoPurl(modulePath, version), + Type: ComponentType.Library, + Ecosystem: "go", + License: null, + SourceLocation: request.TargetPath, + DirectDependencies: Array.Empty<string>(), + Metadata: new Dictionary<string, string> + { + ["vendored"] = "true" + })); + } + + return components; + } + + private static string CreateGoPurl(string modulePath, string? version) + { + // Go purls use the lowercased module path as namespace/name. Encode each + // path segment individually so the '/' separators stay intact. + var encoded = string.Join("/", + modulePath.ToLowerInvariant() + .Split('/') + .Select(Uri.EscapeDataString)); + return version != null + ? $"pkg:golang/{encoded}@{version}" + : $"pkg:golang/{encoded}"; + } + + private sealed record GoModule( + string? Path, + string? Version, + bool Main, + GoModule?
Replace); +} +``` + +### Migration Tasks + +| Analyzer | Current Interface | New Implementation | Status | +|----------|-------------------|-------------------|--------| +| DotNet | `ILanguageAnalyzer` | `DotNetAnalyzerPlugin : IPlugin, IAnalysisCapability` | TODO | +| Go | `ILanguageAnalyzer` | `GoAnalyzerPlugin : IPlugin, IAnalysisCapability` | TODO | +| Java | `ILanguageAnalyzer` | `JavaAnalyzerPlugin : IPlugin, IAnalysisCapability` | TODO | +| JavaScript | `ILanguageAnalyzer` | `JavaScriptAnalyzerPlugin : IPlugin, IAnalysisCapability` | TODO | +| Python | `ILanguageAnalyzer` | `PythonAnalyzerPlugin : IPlugin, IAnalysisCapability` | TODO | +| Ruby | `ILanguageAnalyzer` | `RubyAnalyzerPlugin : IPlugin, IAnalysisCapability` | TODO | +| Rust | `ILanguageAnalyzer` | `RustAnalyzerPlugin : IPlugin, IAnalysisCapability` | TODO | +| PHP | `ILanguageAnalyzer` | `PhpAnalyzerPlugin : IPlugin, IAnalysisCapability` | TODO | +| Swift | `ILanguageAnalyzer` | `SwiftAnalyzerPlugin : IPlugin, IAnalysisCapability` | TODO | +| C++ | `ILanguageAnalyzer` | `CppAnalyzerPlugin : IPlugin, IAnalysisCapability` | TODO | +| Elixir | `ILanguageAnalyzer` | `ElixirAnalyzerPlugin : IPlugin, IAnalysisCapability` | TODO | +| ELF Binary | `IBinaryAnalyzer` | `ElfAnalyzerPlugin : IPlugin, IAnalysisCapability` | TODO | +| PE Binary | `IBinaryAnalyzer` | `PeAnalyzerPlugin : IPlugin, IAnalysisCapability` | TODO | +| Mach-O Binary | `IBinaryAnalyzer` | `MachOAnalyzerPlugin : IPlugin, IAnalysisCapability` | TODO | +| SPDX Gen | `ISbomGenerator` | `SpdxGeneratorPlugin : IPlugin, ISbomCapability` | TODO | +| CycloneDX Gen | `ISbomGenerator` | `CycloneDxGeneratorPlugin : IPlugin, ISbomCapability` | TODO | + +--- + +## Acceptance Criteria + +- [ ] All 11 language analyzers implement `IPlugin` +- [ ] All 11 language analyzers implement `IAnalysisCapability` +- [ ] Binary analyzers (ELF, PE, Mach-O) implement plugin interfaces +- [ ] SBOM generators implement plugin interfaces +- [ ] Deterministic output 
maintained (sorted components) +- [ ] Health checks verify tool availability +- [ ] Plugin manifests for all analyzers +- [ ] Backward compatibility with Scanner service +- [ ] Unit tests migrated/updated +- [ ] Integration tests with real packages + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 100_001 Plugin Abstractions | Internal | TODO | +| 100_002 Plugin Host | Internal | TODO | +| NuGet.Protocol | External | Available | +| Tomlyn | External | Available | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| IAnalysisCapability interface | TODO | | +| LanguageAnalyzerBase | TODO | | +| DotNetAnalyzerPlugin | TODO | | +| GoAnalyzerPlugin | TODO | | +| JavaAnalyzerPlugin | TODO | Maven + Gradle | +| JavaScriptAnalyzerPlugin | TODO | npm + yarn + pnpm | +| PythonAnalyzerPlugin | TODO | pip + poetry + pipenv | +| RubyAnalyzerPlugin | TODO | Bundler | +| RustAnalyzerPlugin | TODO | Cargo | +| PhpAnalyzerPlugin | TODO | Composer | +| SwiftAnalyzerPlugin | TODO | SPM | +| CppAnalyzerPlugin | TODO | Conan + vcpkg | +| ElixirAnalyzerPlugin | TODO | Mix/Hex | +| ElfAnalyzerPlugin | TODO | | +| PeAnalyzerPlugin | TODO | | +| MachOAnalyzerPlugin | TODO | | +| SpdxGeneratorPlugin | TODO | | +| CycloneDxGeneratorPlugin | TODO | | +| Plugin manifests | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_100_010_PLUGIN_router_rework.md b/docs/implplan/SPRINT_20260110_100_010_PLUGIN_router_rework.md new file mode 100644 index 000000000..489eac6c8 --- /dev/null +++ b/docs/implplan/SPRINT_20260110_100_010_PLUGIN_router_rework.md @@ -0,0 +1,1129 @@ +# SPRINT: Router Transport Rework + +> **Sprint ID:** 100_010 +> **Module:** PLUGIN +> **Phase:** 100 - Plugin System Unification +> **Status:** TODO +> **Parent:** 
[100_000_INDEX](SPRINT_20260110_100_000_INDEX_plugin_unification.md) + +--- + +## Overview + +Rework all Router transport providers (TCP/TLS, UDP, RabbitMQ, Valkey) to implement the unified plugin architecture with `IPlugin` and `ITransportCapability` interfaces. + +### Objectives + +- Migrate TCP/TLS transport to unified plugin model +- Migrate UDP transport to unified plugin model +- Migrate RabbitMQ transport to unified plugin model +- Migrate Valkey (Redis) transport to unified plugin model +- Preserve message routing semantics +- Add health checks with connectivity verification +- Add plugin manifests +- Support hot-swap transport configuration + +### Current State + +``` +src/Router/ +├── __Libraries/ +│ └── StellaOps.Router.Core/ +│ └── Transports/ +│ ├── TcpTransport.cs +│ ├── TlsTransport.cs +│ ├── UdpTransport.cs +│ ├── RabbitMqTransport.cs +│ └── ValkeyTransport.cs +``` + +### Target State + +Each transport implements: +- `IPlugin` - Core plugin interface with lifecycle +- `ITransportCapability` - Message transport operations +- Health checks for connectivity +- Plugin manifest for discovery +- Connection pooling and resilience + +--- + +## Deliverables + +### Transport Capability Interface + +```csharp +// ITransportCapability.cs +namespace StellaOps.Plugin.Abstractions.Capabilities; + +/// <summary> +/// Capability interface for message transport. +/// </summary> +public interface ITransportCapability +{ + /// <summary> + /// Transport identifier (tcp, tls, udp, rabbitmq, valkey). + /// </summary> + string TransportId { get; } + + /// <summary> + /// Transport protocol type. + /// </summary> + TransportProtocol Protocol { get; } + + /// <summary> + /// Whether this transport supports pub/sub patterns. + /// </summary> + bool SupportsPubSub { get; } + + /// <summary> + /// Whether this transport supports request/reply patterns. + /// </summary> + bool SupportsRequestReply { get; } + + /// <summary> + /// Whether this transport supports message queuing. + /// </summary> + bool SupportsQueuing { get; } + + /// + /// Create a connection to a destination.
+ /// + Task<ITransportConnection> ConnectAsync( + TransportEndpoint endpoint, + TransportConnectionOptions options, + CancellationToken ct); + + /// <summary> + /// Create a listener on an endpoint. + /// </summary> + Task<ITransportListener> ListenAsync( + TransportEndpoint endpoint, + TransportListenerOptions options, + CancellationToken ct); + + /// <summary> + /// Subscribe to a topic/channel (for pub/sub transports). + /// </summary> + Task<ITransportSubscription> SubscribeAsync( + string topic, + TransportSubscriptionOptions options, + CancellationToken ct); +} + +public enum TransportProtocol +{ + Tcp, + Tls, + Udp, + Amqp, + Redis +} + +public sealed record TransportEndpoint( + string Host, + int Port, + string? Path = null, + IReadOnlyDictionary<string, string>? Parameters = null); + +public sealed record TransportConnectionOptions( + TimeSpan ConnectTimeout = default, + TimeSpan ReadTimeout = default, + TimeSpan WriteTimeout = default, + int MaxRetries = 3, + TimeSpan RetryDelay = default, + bool KeepAlive = true, + int BufferSize = 65536, + TlsOptions? Tls = null); + +public sealed record TlsOptions( + string? CertificatePath = null, + string? CertificatePassword = null, + bool ValidateServerCertificate = true, + string? ServerName = null, + IReadOnlyList<string>? AllowedCipherSuites = null); + +public sealed record TransportListenerOptions( + int Backlog = 100, + int MaxConnections = 1000, + TimeSpan IdleTimeout = default, + TlsOptions? Tls = null); + +public sealed record TransportSubscriptionOptions( + string? ConsumerGroup = null, + int PrefetchCount = 10, + bool AutoAck = false, + TimeSpan? AckTimeout = null); + +/// <summary> +/// Represents an active transport connection. +/// </summary> +public interface ITransportConnection : IAsyncDisposable +{ + /// <summary> + /// Connection identifier. + /// </summary> + string ConnectionId { get; } + + /// <summary> + /// Remote endpoint. + /// </summary> + TransportEndpoint RemoteEndpoint { get; } + + /// <summary> + /// Connection state. + /// </summary> + TransportConnectionState State { get; } + + /// + /// Send a message.
+ /// + Task SendAsync(TransportMessage message, CancellationToken ct); + + /// <summary> + /// Receive a message (blocking). + /// </summary> + Task<TransportMessage> ReceiveAsync(CancellationToken ct); + + /// <summary> + /// Stream of incoming messages. + /// </summary> + IAsyncEnumerable<TransportMessage> ReceiveStreamAsync(CancellationToken ct); + + /// <summary> + /// Request-reply pattern. + /// </summary> + Task<TransportMessage> RequestAsync( + TransportMessage request, + TimeSpan timeout, + CancellationToken ct); +} + +public enum TransportConnectionState +{ + Connecting, + Connected, + Disconnected, + Failed +} + +/// <summary> +/// Represents a transport listener accepting connections. +/// </summary> +public interface ITransportListener : IAsyncDisposable +{ + /// <summary> + /// Local endpoint being listened on. + /// </summary> + TransportEndpoint LocalEndpoint { get; } + + /// <summary> + /// Accept incoming connections. + /// </summary> + IAsyncEnumerable<ITransportConnection> AcceptAsync(CancellationToken ct); +} + +/// <summary> +/// Represents a pub/sub subscription. +/// </summary> +public interface ITransportSubscription : IAsyncDisposable +{ + /// <summary> + /// Subscription topic. + /// </summary> + string Topic { get; } + + /// <summary> + /// Stream of incoming messages. + /// </summary> + IAsyncEnumerable<TransportMessage> MessagesAsync(CancellationToken ct); + + /// <summary> + /// Acknowledge a message. + /// </summary> + Task AcknowledgeAsync(string messageId, CancellationToken ct); + + /// <summary> + /// Negative acknowledge (requeue) a message. + /// </summary> + Task NegativeAcknowledgeAsync(string messageId, CancellationToken ct); +} + +/// <summary> +/// Transport message envelope. +/// </summary> +public sealed record TransportMessage( + string Id, + ReadOnlyMemory<byte> Payload, + string? ContentType = null, + string? CorrelationId = null, + string? ReplyTo = null, + IReadOnlyDictionary<string, string>? Headers = null, + DateTimeOffset? Timestamp = null, + TimeSpan?
Ttl = null); +``` + +### TCP/TLS Transport Plugin Implementation + +```csharp +// TcpTransportPlugin.cs +namespace StellaOps.Router.Plugin.Tcp; + +[Plugin( + id: "com.stellaops.transport.tcp", + name: "TCP Transport", + version: "1.0.0", + vendor: "Stella Ops")] +[ProvidesCapability(PluginCapabilities.Transport, CapabilityId = "tcp")] +public sealed class TcpTransportPlugin : IPlugin, ITransportCapability +{ + private IPluginContext? _context; + private TcpTransportOptions? _options; + private readonly ConcurrentDictionary _connections = new(); + + public PluginInfo Info => new( + Id: "com.stellaops.transport.tcp", + Name: "TCP Transport", + Version: "1.0.0", + Vendor: "Stella Ops", + Description: "Raw TCP transport for high-performance internal messaging"); + + public PluginTrustLevel TrustLevel => PluginTrustLevel.BuiltIn; + public PluginCapabilities Capabilities => PluginCapabilities.Transport | PluginCapabilities.Network; + public PluginLifecycleState State { get; private set; } = PluginLifecycleState.Discovered; + + public string TransportId => "tcp"; + public TransportProtocol Protocol => TransportProtocol.Tcp; + public bool SupportsPubSub => false; + public bool SupportsRequestReply => true; + public bool SupportsQueuing => false; + + public Task InitializeAsync(IPluginContext context, CancellationToken ct) + { + _context = context; + State = PluginLifecycleState.Initializing; + + _options = context.Configuration.Bind(); + + State = PluginLifecycleState.Active; + context.Logger.Info("TCP transport initialized"); + + return Task.CompletedTask; + } + + public async Task HealthCheckAsync(CancellationToken ct) + { + if (State != PluginLifecycleState.Active) + return HealthCheckResult.Unhealthy($"Transport is in state {State}"); + + var activeConnections = _connections.Count(c => c.Value.State == TransportConnectionState.Connected); + + return HealthCheckResult.Healthy(details: new Dictionary + { + ["activeConnections"] = activeConnections, + ["totalConnections"] 
= _connections.Count + }); + } + + public async Task ConnectAsync( + TransportEndpoint endpoint, + TransportConnectionOptions options, + CancellationToken ct) + { + EnsureActive(); + + var client = new TcpClient(); + + try + { + var connectTimeout = options.ConnectTimeout != default + ? options.ConnectTimeout + : TimeSpan.FromSeconds(30); + + using var cts = CancellationTokenSource.CreateLinkedTokenSource(ct); + cts.CancelAfter(connectTimeout); + + await client.ConnectAsync(endpoint.Host, endpoint.Port, cts.Token); + + if (options.KeepAlive) + { + client.Client.SetSocketOption( + SocketOptionLevel.Socket, + SocketOptionName.KeepAlive, + true); + } + + client.ReceiveBufferSize = options.BufferSize; + client.SendBufferSize = options.BufferSize; + + var connection = new TcpTransportConnection( + client, + endpoint, + options, + _context!.Logger, + _context.TimeProvider); + + _connections[connection.ConnectionId] = connection; + + _context.Logger.Debug("TCP connection established: {ConnectionId} -> {Host}:{Port}", + connection.ConnectionId, endpoint.Host, endpoint.Port); + + return connection; + } + catch (Exception ex) + { + client.Dispose(); + _context!.Logger.Error(ex, "Failed to connect to {Host}:{Port}", + endpoint.Host, endpoint.Port); + throw; + } + } + + public async Task ListenAsync( + TransportEndpoint endpoint, + TransportListenerOptions options, + CancellationToken ct) + { + EnsureActive(); + + var listener = new TcpListener( + IPAddress.Parse(endpoint.Host), + endpoint.Port); + + listener.Server.SetSocketOption( + SocketOptionLevel.Socket, + SocketOptionName.ReuseAddress, + true); + + listener.Start(options.Backlog); + + var transportListener = new TcpTransportListener( + listener, + endpoint, + options, + _context!, + conn => _connections[conn.ConnectionId] = conn); + + _context.Logger.Info("TCP listener started on {Host}:{Port}", + endpoint.Host, endpoint.Port); + + return transportListener; + } + + public Task SubscribeAsync( + string topic, + 
TransportSubscriptionOptions options, + CancellationToken ct) + { + throw new NotSupportedException("TCP transport does not support pub/sub"); + } + + private void EnsureActive() + { + if (State != PluginLifecycleState.Active) + throw new InvalidOperationException($"TCP transport is not active (state: {State})"); + } + + public async ValueTask DisposeAsync() + { + foreach (var connection in _connections.Values) + { + await connection.DisposeAsync(); + } + _connections.Clear(); + State = PluginLifecycleState.Stopped; + } +} + +internal sealed class TcpTransportConnection : ITransportConnection +{ + private readonly TcpClient _client; + private readonly NetworkStream _stream; + private readonly TransportConnectionOptions _options; + private readonly IPluginLogger _logger; + private readonly TimeProvider _timeProvider; + private readonly SemaphoreSlim _sendLock = new(1, 1); + private readonly SemaphoreSlim _receiveLock = new(1, 1); + + public string ConnectionId { get; } = Guid.NewGuid().ToString("N"); + public TransportEndpoint RemoteEndpoint { get; } + public TransportConnectionState State { get; private set; } = TransportConnectionState.Connected; + + public TcpTransportConnection( + TcpClient client, + TransportEndpoint endpoint, + TransportConnectionOptions options, + IPluginLogger logger, + TimeProvider timeProvider) + { + _client = client; + _stream = client.GetStream(); + RemoteEndpoint = endpoint; + _options = options; + _logger = logger; + _timeProvider = timeProvider; + } + + public async Task SendAsync(TransportMessage message, CancellationToken ct) + { + await _sendLock.WaitAsync(ct); + try + { + var frame = FrameEncoder.Encode(message); + await _stream.WriteAsync(frame, ct); + await _stream.FlushAsync(ct); + } + finally + { + _sendLock.Release(); + } + } + + public async Task ReceiveAsync(CancellationToken ct) + { + await _receiveLock.WaitAsync(ct); + try + { + return await FrameDecoder.DecodeAsync(_stream, ct); + } + finally + { + 
_receiveLock.Release(); + } + } + + public async IAsyncEnumerable ReceiveStreamAsync( + [EnumeratorCancellation] CancellationToken ct) + { + while (!ct.IsCancellationRequested && State == TransportConnectionState.Connected) + { + TransportMessage message; + try + { + message = await ReceiveAsync(ct); + } + catch (IOException) when (ct.IsCancellationRequested) + { + yield break; + } + catch (Exception ex) + { + _logger.Error(ex, "Error receiving message on connection {ConnectionId}", ConnectionId); + State = TransportConnectionState.Disconnected; + yield break; + } + + yield return message; + } + } + + public async Task RequestAsync( + TransportMessage request, + TimeSpan timeout, + CancellationToken ct) + { + using var cts = CancellationTokenSource.CreateLinkedTokenSource(ct); + cts.CancelAfter(timeout); + + var correlationId = request.CorrelationId ?? Guid.NewGuid().ToString("N"); + var requestWithCorrelation = request with { CorrelationId = correlationId }; + + await SendAsync(requestWithCorrelation, cts.Token); + var response = await ReceiveAsync(cts.Token); + + if (response.CorrelationId != correlationId) + { + throw new InvalidOperationException( + $"Correlation ID mismatch: expected {correlationId}, got {response.CorrelationId}"); + } + + return response; + } + + public async ValueTask DisposeAsync() + { + State = TransportConnectionState.Disconnected; + _sendLock.Dispose(); + _receiveLock.Dispose(); + await _stream.DisposeAsync(); + _client.Dispose(); + } +} + +internal sealed class TcpTransportListener : ITransportListener +{ + private readonly TcpListener _listener; + private readonly TransportListenerOptions _options; + private readonly IPluginContext _context; + private readonly Action _onConnectionAccepted; + + public TransportEndpoint LocalEndpoint { get; } + + public TcpTransportListener( + TcpListener listener, + TransportEndpoint localEndpoint, + TransportListenerOptions options, + IPluginContext context, + Action onConnectionAccepted) + { + 
_listener = listener; + LocalEndpoint = localEndpoint; + _options = options; + _context = context; + _onConnectionAccepted = onConnectionAccepted; + } + + public async IAsyncEnumerable AcceptAsync( + [EnumeratorCancellation] CancellationToken ct) + { + while (!ct.IsCancellationRequested) + { + TcpClient client; + try + { + client = await _listener.AcceptTcpClientAsync(ct); + } + catch (OperationCanceledException) + { + yield break; + } + catch (SocketException ex) when (ex.SocketErrorCode == SocketError.OperationAborted) + { + yield break; + } + + var connection = new TcpTransportConnection( + client, + new TransportEndpoint( + ((IPEndPoint)client.Client.RemoteEndPoint!).Address.ToString(), + ((IPEndPoint)client.Client.RemoteEndPoint!).Port), + new TransportConnectionOptions(), + _context.Logger, + _context.TimeProvider); + + _onConnectionAccepted(connection); + + _context.Logger.Debug("Accepted TCP connection: {ConnectionId} from {RemoteEndpoint}", + connection.ConnectionId, connection.RemoteEndpoint); + + yield return connection; + } + } + + public ValueTask DisposeAsync() + { + _listener.Stop(); + return ValueTask.CompletedTask; + } +} + +public sealed class TcpTransportOptions +{ + public int DefaultBufferSize { get; set; } = 65536; + public TimeSpan DefaultConnectTimeout { get; set; } = TimeSpan.FromSeconds(30); + public bool EnableKeepAlive { get; set; } = true; +} +``` + +### RabbitMQ Transport Plugin Implementation + +```csharp +// RabbitMqTransportPlugin.cs +namespace StellaOps.Router.Plugin.RabbitMq; + +[Plugin( + id: "com.stellaops.transport.rabbitmq", + name: "RabbitMQ Transport", + version: "1.0.0", + vendor: "Stella Ops")] +[ProvidesCapability(PluginCapabilities.Transport, CapabilityId = "rabbitmq")] +[RequiresCapability(PluginCapabilities.Network)] +public sealed class RabbitMqTransportPlugin : IPlugin, ITransportCapability +{ + private IPluginContext? _context; + private IConnection? _connection; + private RabbitMqOptions? 
_options; + private readonly ConcurrentDictionary _channels = new(); + + public PluginInfo Info => new( + Id: "com.stellaops.transport.rabbitmq", + Name: "RabbitMQ Transport", + Version: "1.0.0", + Vendor: "Stella Ops", + Description: "RabbitMQ AMQP transport for reliable message queuing"); + + public PluginTrustLevel TrustLevel => PluginTrustLevel.BuiltIn; + public PluginCapabilities Capabilities => PluginCapabilities.Transport | PluginCapabilities.Network; + public PluginLifecycleState State { get; private set; } = PluginLifecycleState.Discovered; + + public string TransportId => "rabbitmq"; + public TransportProtocol Protocol => TransportProtocol.Amqp; + public bool SupportsPubSub => true; + public bool SupportsRequestReply => true; + public bool SupportsQueuing => true; + + public async Task InitializeAsync(IPluginContext context, CancellationToken ct) + { + _context = context; + State = PluginLifecycleState.Initializing; + + _options = context.Configuration.Bind(); + + var password = await context.Configuration.GetSecretAsync("rabbitmq-password", ct) + ?? 
_options.Password; + + var factory = new ConnectionFactory + { + HostName = _options.Host, + Port = _options.Port, + UserName = _options.Username, + Password = password, + VirtualHost = _options.VirtualHost, + AutomaticRecoveryEnabled = true, + NetworkRecoveryInterval = TimeSpan.FromSeconds(10), + RequestedHeartbeat = TimeSpan.FromSeconds(60) + }; + + if (_options.UseSsl) + { + factory.Ssl = new SslOption + { + Enabled = true, + ServerName = _options.Host + }; + } + + _connection = await Task.Run(() => factory.CreateConnection(), ct); + + State = PluginLifecycleState.Active; + context.Logger.Info("RabbitMQ transport connected to {Host}:{Port}", + _options.Host, _options.Port); + } + + public async Task HealthCheckAsync(CancellationToken ct) + { + if (_connection == null || !_connection.IsOpen) + return HealthCheckResult.Unhealthy("Connection not open"); + + try + { + using var channel = _connection.CreateModel(); + return HealthCheckResult.Healthy(details: new Dictionary + { + ["connected"] = _connection.IsOpen, + ["serverVersion"] = _connection.ServerProperties.TryGetValue("version", out var v) ? v : "unknown", + ["activeChannels"] = _channels.Count + }); + } + catch (Exception ex) + { + return HealthCheckResult.Unhealthy(ex); + } + } + + public Task ConnectAsync( + TransportEndpoint endpoint, + TransportConnectionOptions options, + CancellationToken ct) + { + EnsureActive(); + + var channel = _connection!.CreateModel(); + var queueName = endpoint.Path ?? 
throw new ArgumentException("Queue name required in endpoint path"); + + // Declare queue if it doesn't exist + channel.QueueDeclare( + queue: queueName, + durable: true, + exclusive: false, + autoDelete: false, + arguments: null); + + var connection = new RabbitMqConnection( + channel, + queueName, + endpoint, + _context!); + + _channels[connection.ConnectionId] = channel; + + return Task.FromResult(connection); + } + + public Task ListenAsync( + TransportEndpoint endpoint, + TransportListenerOptions options, + CancellationToken ct) + { + throw new NotSupportedException( + "RabbitMQ uses Subscribe for consuming messages, not Listen"); + } + + public Task SubscribeAsync( + string topic, + TransportSubscriptionOptions options, + CancellationToken ct) + { + EnsureActive(); + + var channel = _connection!.CreateModel(); + + // For topic-based routing, use fanout exchange + channel.ExchangeDeclare( + exchange: topic, + type: ExchangeType.Fanout, + durable: true); + + // Create exclusive queue for this subscription + var queueName = channel.QueueDeclare( + queue: "", + durable: false, + exclusive: true, + autoDelete: true).QueueName; + + channel.QueueBind(queueName, topic, ""); + + if (options.PrefetchCount > 0) + { + channel.BasicQos(0, (ushort)options.PrefetchCount, false); + } + + var subscription = new RabbitMqSubscription( + channel, + topic, + queueName, + options, + _context!); + + return Task.FromResult(subscription); + } + + private void EnsureActive() + { + if (State != PluginLifecycleState.Active || _connection == null || !_connection.IsOpen) + throw new InvalidOperationException("RabbitMQ transport is not active"); + } + + public async ValueTask DisposeAsync() + { + foreach (var channel in _channels.Values) + { + channel.Close(); + channel.Dispose(); + } + _channels.Clear(); + + _connection?.Close(); + _connection?.Dispose(); + _connection = null; + + State = PluginLifecycleState.Stopped; + } +} + +internal sealed class RabbitMqConnection : 
ITransportConnection +{ + private readonly IModel _channel; + private readonly string _queueName; + private readonly IPluginContext _context; + private readonly Channel _incomingMessages; + private readonly AsyncEventingBasicConsumer? _consumer; + + public string ConnectionId { get; } = Guid.NewGuid().ToString("N"); + public TransportEndpoint RemoteEndpoint { get; } + public TransportConnectionState State { get; private set; } = TransportConnectionState.Connected; + + public RabbitMqConnection( + IModel channel, + string queueName, + TransportEndpoint endpoint, + IPluginContext context) + { + _channel = channel; + _queueName = queueName; + RemoteEndpoint = endpoint; + _context = context; + _incomingMessages = Channel.CreateUnbounded(); + + // Start consumer + _consumer = new AsyncEventingBasicConsumer(_channel); + _consumer.Received += OnMessageReceived; + _channel.BasicConsume(_queueName, autoAck: false, _consumer); + } + + private async Task OnMessageReceived(object sender, BasicDeliverEventArgs e) + { + var message = new TransportMessage( + Id: e.BasicProperties.MessageId ?? e.DeliveryTag.ToString(), + Payload: e.Body.ToArray(), + ContentType: e.BasicProperties.ContentType, + CorrelationId: e.BasicProperties.CorrelationId, + ReplyTo: e.BasicProperties.ReplyTo, + Headers: e.BasicProperties.Headers?.ToDictionary( + h => h.Key, + h => Encoding.UTF8.GetString((byte[])h.Value)), + Timestamp: e.BasicProperties.Timestamp.UnixTime > 0 + ? DateTimeOffset.FromUnixTimeSeconds(e.BasicProperties.Timestamp.UnixTime) + : null, + Ttl: e.BasicProperties.Expiration != null + ? TimeSpan.FromMilliseconds(int.Parse(e.BasicProperties.Expiration)) + : null); + + await _incomingMessages.Writer.WriteAsync(message); + } + + public Task SendAsync(TransportMessage message, CancellationToken ct) + { + var properties = _channel.CreateBasicProperties(); + properties.MessageId = message.Id; + properties.ContentType = message.ContentType ?? 
"application/octet-stream"; + properties.CorrelationId = message.CorrelationId; + properties.ReplyTo = message.ReplyTo; + properties.DeliveryMode = 2; // Persistent + + if (message.Headers != null) + { + properties.Headers = message.Headers.ToDictionary( + h => h.Key, + h => (object)Encoding.UTF8.GetBytes(h.Value)); + } + + if (message.Timestamp.HasValue) + { + properties.Timestamp = new AmqpTimestamp(message.Timestamp.Value.ToUnixTimeSeconds()); + } + + if (message.Ttl.HasValue) + { + // RabbitMQ expects expiration as a string of whole milliseconds + properties.Expiration = ((long)message.Ttl.Value.TotalMilliseconds).ToString(CultureInfo.InvariantCulture); + } + + _channel.BasicPublish( + exchange: "", + routingKey: _queueName, + basicProperties: properties, + body: message.Payload); + + return Task.CompletedTask; + } + + public async Task<TransportMessage> ReceiveAsync(CancellationToken ct) + { + return await _incomingMessages.Reader.ReadAsync(ct); + } + + public async IAsyncEnumerable<TransportMessage> ReceiveStreamAsync( + [EnumeratorCancellation] CancellationToken ct) + { + await foreach (var message in _incomingMessages.Reader.ReadAllAsync(ct)) + { + yield return message; + } + } + + public async Task<TransportMessage> RequestAsync( + TransportMessage request, + TimeSpan timeout, + CancellationToken ct) + { + var correlationId = request.CorrelationId ?? Guid.NewGuid().ToString("N"); + var replyQueue = _channel.QueueDeclare(queue: "", exclusive: true).QueueName; + + var requestWithReply = request with + { + CorrelationId = correlationId, + ReplyTo = replyQueue + }; + + var tcs = new TaskCompletionSource<TransportMessage>(); + + var consumer = new AsyncEventingBasicConsumer(_channel); + consumer.Received += async (_, e) => + { + if (e.BasicProperties.CorrelationId == correlationId) + { + var response = new TransportMessage( + Id: e.BasicProperties.MessageId ?? 
e.DeliveryTag.ToString(), + Payload: e.Body.ToArray(), + ContentType: e.BasicProperties.ContentType, + CorrelationId: e.BasicProperties.CorrelationId); + tcs.TrySetResult(response); + } + }; + + _channel.BasicConsume(replyQueue, autoAck: true, consumer); + await SendAsync(requestWithReply, ct); + + using var cts = CancellationTokenSource.CreateLinkedTokenSource(ct); + cts.CancelAfter(timeout); + + using var registration = cts.Token.Register(() => + tcs.TrySetCanceled(cts.Token)); + + return await tcs.Task; + } + + public ValueTask DisposeAsync() + { + State = TransportConnectionState.Disconnected; + _incomingMessages.Writer.Complete(); + _channel.Close(); + _channel.Dispose(); + return ValueTask.CompletedTask; + } +} + +internal sealed class RabbitMqSubscription : ITransportSubscription +{ + private readonly IModel _channel; + private readonly string _queueName; + private readonly TransportSubscriptionOptions _options; + private readonly IPluginContext _context; + private readonly Channel<(TransportMessage Message, ulong DeliveryTag)> _messages; + private readonly string _consumerTag; + + public string Topic { get; } + + public RabbitMqSubscription( + IModel channel, + string topic, + string queueName, + TransportSubscriptionOptions options, + IPluginContext context) + { + _channel = channel; + Topic = topic; + _queueName = queueName; + _options = options; + _context = context; + _messages = Channel.CreateUnbounded<(TransportMessage, ulong)>(); + + var consumer = new AsyncEventingBasicConsumer(_channel); + consumer.Received += OnMessageReceived; + _consumerTag = _channel.BasicConsume(_queueName, autoAck: options.AutoAck, consumer); + } + + private async Task OnMessageReceived(object sender, BasicDeliverEventArgs e) + { + var message = new TransportMessage( + Id: e.DeliveryTag.ToString(), + Payload: e.Body.ToArray(), + ContentType: e.BasicProperties.ContentType, + CorrelationId: e.BasicProperties.CorrelationId, + Headers: e.BasicProperties.Headers?.ToDictionary( + h 
=> h.Key, + h => Encoding.UTF8.GetString((byte[])h.Value))); + + await _messages.Writer.WriteAsync((message, e.DeliveryTag)); + } + + public async IAsyncEnumerable<TransportMessage> MessagesAsync( + [EnumeratorCancellation] CancellationToken ct) + { + await foreach (var (message, _) in _messages.Reader.ReadAllAsync(ct)) + { + yield return message; + } + } + + public Task AcknowledgeAsync(string messageId, CancellationToken ct) + { + if (ulong.TryParse(messageId, out var deliveryTag)) + { + _channel.BasicAck(deliveryTag, multiple: false); + } + return Task.CompletedTask; + } + + public Task NegativeAcknowledgeAsync(string messageId, CancellationToken ct) + { + if (ulong.TryParse(messageId, out var deliveryTag)) + { + _channel.BasicNack(deliveryTag, multiple: false, requeue: true); + } + return Task.CompletedTask; + } + + public ValueTask DisposeAsync() + { + _channel.BasicCancel(_consumerTag); + _messages.Writer.Complete(); + _channel.Close(); + _channel.Dispose(); + return ValueTask.CompletedTask; + } +} + +public sealed class RabbitMqOptions +{ + public string Host { get; set; } = "localhost"; + public int Port { get; set; } = 5672; + public string Username { get; set; } = "guest"; + public string Password { get; set; } = "guest"; + public string VirtualHost { get; set; } = "/"; + public bool UseSsl { get; set; } = false; +} +``` + +### Migration Tasks + +| Transport | Current Interface | New Implementation | Status | +|-----------|-------------------|-------------------|--------| +| TCP | `ITransport` | `TcpTransportPlugin : IPlugin, ITransportCapability` | TODO | +| TLS | `ITransport` | `TlsTransportPlugin : IPlugin, ITransportCapability` | TODO | +| UDP | `ITransport` | `UdpTransportPlugin : IPlugin, ITransportCapability` | TODO | +| RabbitMQ | `ITransport` | `RabbitMqTransportPlugin : IPlugin, ITransportCapability` | TODO | +| Valkey | `ITransport` | `ValkeyTransportPlugin : IPlugin, ITransportCapability` | TODO | + +--- + +## Acceptance Criteria + +- [ ] All transports 
implement `IPlugin` +- [ ] All transports implement `ITransportCapability` +- [ ] TCP/TLS transports support connection pooling +- [ ] RabbitMQ supports pub/sub and queuing +- [ ] Valkey supports pub/sub +- [ ] Health checks verify connectivity +- [ ] Message framing is consistent +- [ ] Request/reply pattern works +- [ ] Graceful connection shutdown +- [ ] Plugin manifests for all transports +- [ ] Unit tests migrated/updated +- [ ] Integration tests with real brokers + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 100_001 Plugin Abstractions | Internal | TODO | +| 100_002 Plugin Host | Internal | TODO | +| RabbitMQ.Client | External | Available | +| StackExchange.Redis | External | Available | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| ITransportCapability interface | TODO | | +| TcpTransportPlugin | TODO | | +| TlsTransportPlugin | TODO | | +| UdpTransportPlugin | TODO | | +| RabbitMqTransportPlugin | TODO | | +| ValkeyTransportPlugin | TODO | | +| FrameEncoder/Decoder | TODO | | +| Plugin manifests | TODO | | +| Unit tests | TODO | | +| Integration tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_100_011_PLUGIN_concelier_rework.md b/docs/implplan/SPRINT_20260110_100_011_PLUGIN_concelier_rework.md new file mode 100644 index 000000000..4a6b311ec --- /dev/null +++ b/docs/implplan/SPRINT_20260110_100_011_PLUGIN_concelier_rework.md @@ -0,0 +1,1209 @@ +# SPRINT: Concelier Connector Rework + +> **Sprint ID:** 100_011 +> **Module:** PLUGIN +> **Phase:** 100 - Plugin System Unification +> **Status:** TODO +> **Parent:** [100_000_INDEX](SPRINT_20260110_100_000_INDEX_plugin_unification.md) + +--- + +## Overview + +Rework all Concelier vulnerability feed connectors to implement the unified plugin architecture with `IPlugin` and `IFeedCapability` interfaces. 
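
As a sketch of how the plugin host might drive one of these reworked connectors end to end (assuming the `IPlugin`/`IFeedCapability` shapes specified in this sprint; `LoadCheckpoint` and `StoreCheckpoint` are hypothetical host-side helpers, not part of the contract):

```csharp
// Hypothetical host-side sync loop against the contracts sketched below.
// Illustrative only — not the final host implementation.
public static async Task RunFeedSyncAsync(
    FeedConnectorBase connector,
    IPluginContext context,
    CancellationToken ct)
{
    await connector.InitializeAsync(context, ct);
    try
    {
        // Resume from the last stored checkpoint when the feed supports deltas.
        DateTimeOffset? since = connector.SupportsIncremental
            ? LoadCheckpoint(connector.FeedId)   // hypothetical helper
            : null;

        var result = await connector.SyncAsync(new FeedSyncOptions(Since: since), ct);

        context.Logger.Info(
            "{FeedId}: {Count} advisories processed (success={Success})",
            connector.FeedId, result.ItemsProcessed, result.Success);

        if (result.Success && result.NextCheckpoint is not null)
            StoreCheckpoint(connector.FeedId, result.NextCheckpoint); // hypothetical helper
    }
    finally
    {
        await connector.DisposeAsync();
    }
}
```

The `try`/`finally` mirrors the lifecycle rule implied by the acceptance criteria above: a connector that was initialized must always be disposed, even when a sync fails.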
+ +### Objectives + +- Migrate all vulnerability feed connectors to unified plugin model +- Migrate OVAL connectors (distro security data) +- Migrate NVD/CVE connectors +- Migrate OSV connectors (ecosystem advisories) +- Migrate vendor advisory connectors +- Preserve feed synchronization semantics +- Add health checks for feed availability +- Add plugin manifests +- Support incremental/delta updates + +### Current State + +``` +src/Concelier/ +├── __Libraries/ +│ └── StellaOps.Concelier.Connectors/ +│ ├── Oval/ +│ │ ├── RedHatOvalConnector.cs +│ │ ├── UbuntuOvalConnector.cs +│ │ ├── DebianOvalConnector.cs +│ │ ├── SuseOvalConnector.cs +│ │ ├── OracleOvalConnector.cs +│ │ ├── AlmaLinuxOvalConnector.cs +│ │ ├── RockyLinuxOvalConnector.cs +│ │ └── AlpineSecDbConnector.cs +│ ├── Cve/ +│ │ ├── NvdConnector.cs +│ │ ├── MitreConnector.cs +│ │ └── CveListV5Connector.cs +│ ├── Osv/ +│ │ ├── OsvConnector.cs +│ │ ├── GhsaConnector.cs +│ │ └── GitLabAdvisoriesConnector.cs +│ ├── Vendor/ +│ │ ├── MicrosoftMsrcConnector.cs +│ │ ├── AmazonInspectorConnector.cs +│ │ └── CisaKevConnector.cs +│ └── Mirror/ +│ └── MirrorFeedConnector.cs +``` + +### Target State + +Each connector implements: +- `IPlugin` - Core plugin interface with lifecycle +- `IFeedCapability` - Feed synchronization operations +- Health checks for feed availability +- Plugin manifest for discovery +- Incremental update support +- Deterministic output ordering + +--- + +## Deliverables + +### Feed Capability Interface + +```csharp +// IFeedCapability.cs +namespace StellaOps.Plugin.Abstractions.Capabilities; + +/// <summary> +/// Capability interface for vulnerability feed ingestion. +/// </summary> +public interface IFeedCapability +{ + /// <summary> + /// Feed identifier (nvd, ghsa, redhat-oval, etc.). + /// </summary> + string FeedId { get; } + + /// <summary> + /// Feed type category. + /// </summary> + FeedType Type { get; } + + /// <summary> + /// Supported advisory formats.
 + /// </summary> + IReadOnlyList<AdvisoryFormat> SupportedFormats { get; } + + /// <summary> + /// Whether this feed supports incremental updates. + /// </summary> + bool SupportsIncremental { get; } + + /// <summary> + /// Get feed metadata and statistics. + /// </summary> + Task<FeedMetadata> GetMetadataAsync(CancellationToken ct); + + /// <summary> + /// Synchronize feed data. + /// </summary> + Task<FeedSyncResult> SyncAsync(FeedSyncOptions options, CancellationToken ct); + + /// <summary> + /// Stream advisories incrementally. + /// </summary> + IAsyncEnumerable<Advisory> StreamAdvisoriesAsync( + FeedStreamOptions options, + CancellationToken ct); + + /// <summary> + /// Get a specific advisory by ID. + /// </summary> + Task<Advisory?> GetAdvisoryAsync(string advisoryId, CancellationToken ct); +} + +public enum FeedType +{ + Cve, + Oval, + Osv, + Vendor, + Kev, + Mirror +} + +public enum AdvisoryFormat +{ + Cve5, + Oval, + Osv, + Csaf, + Vex, + Custom +} + +public sealed record FeedMetadata( + string FeedId, + string Name, + string? Description, + DateTimeOffset? LastModified, + DateTimeOffset? LastSync, + long AdvisoryCount, + string? Version, + string? SourceUrl, + IReadOnlyDictionary<string, object>? AdditionalInfo); + +public sealed record FeedSyncOptions( + DateTimeOffset? Since = null, + DateTimeOffset? Until = null, + bool FullSync = false, + int? MaxItems = null, + IReadOnlyList<string>? FilterIds = null, + string? Checkpoint = null); + +public sealed record FeedSyncResult( + bool Success, + int ItemsProcessed, + int ItemsAdded, + int ItemsUpdated, + int ItemsRemoved, + DateTimeOffset SyncedAt, + TimeSpan Duration, + string? NextCheckpoint, + IReadOnlyList<FeedSyncError> Errors); + +public sealed record FeedSyncError( + string AdvisoryId, + string Message, + Exception? Exception); + +public sealed record FeedStreamOptions( + DateTimeOffset? ModifiedSince = null, + int BatchSize = 100, + string? StartAfter = null, + IReadOnlyList<string>? Ecosystems = null); + +/// <summary> +/// Normalized advisory representation. +/// </summary> +public sealed record Advisory( + string Id, + string FeedId, + AdvisoryFormat SourceFormat, + string? Title, + string? Description, + AdvisorySeverity? 
Severity, + CvssScore? Cvss, + DateTimeOffset? Published, + DateTimeOffset? Modified, + IReadOnlyList<AffectedPackage> AffectedPackages, + IReadOnlyList<AdvisoryReference> References, + IReadOnlyList<string>? Aliases, + IReadOnlyDictionary<string, string>? Metadata, + ReadOnlyMemory<byte>? RawData); + +public enum AdvisorySeverity +{ + None, + Low, + Medium, + High, + Critical +} + +public sealed record CvssScore( + string Version, + double Score, + string? Vector, + string? Severity); + +public sealed record AffectedPackage( + string Name, + string? Ecosystem, + string? Purl, + VersionRange? AffectedVersions, + string? FixedVersion, + PackageStatus Status); + +public sealed record VersionRange( + string? Start, + bool StartInclusive, + string? End, + bool EndInclusive); + +public enum PackageStatus +{ + Unknown, + Affected, + NotAffected, + Fixed, + UnderInvestigation +} + +public sealed record AdvisoryReference( + string Url, + ReferenceType Type, + string? Description); + +public enum ReferenceType +{ + Advisory, + Article, + Report, + Fix, + Web, + Package, + Evidence +} +``` + +### Feed Connector Base Class + +```csharp +// FeedConnectorBase.cs +namespace StellaOps.Concelier.Plugin; + +/// <summary> +/// Base class for vulnerability feed connectors. +/// </summary> +public abstract class FeedConnectorBase : IPlugin, IFeedCapability +{ + protected IPluginContext? Context { get; private set; } + protected HttpClient? 
HttpClient { get; private set; } + + public abstract PluginInfo Info { get; } + public PluginTrustLevel TrustLevel => PluginTrustLevel.BuiltIn; + public PluginCapabilities Capabilities => PluginCapabilities.Feed | PluginCapabilities.Network; + public PluginLifecycleState State { get; protected set; } = PluginLifecycleState.Discovered; + + public abstract string FeedId { get; } + public abstract FeedType Type { get; } + public abstract IReadOnlyList<AdvisoryFormat> SupportedFormats { get; } + public virtual bool SupportsIncremental => true; + + public async Task InitializeAsync(IPluginContext context, CancellationToken ct) + { + Context = context; + State = PluginLifecycleState.Initializing; + + try + { + HttpClient = context.HttpClientFactory.CreateClient(FeedId); + await InitializeConnectorAsync(context, ct); + + State = PluginLifecycleState.Active; + context.Logger.Info("{FeedId} feed connector initialized", FeedId); + } + catch (Exception ex) + { + State = PluginLifecycleState.Failed; + context.Logger.Error(ex, "Failed to initialize {FeedId} connector", FeedId); + throw; + } + } + + protected virtual Task InitializeConnectorAsync(IPluginContext context, CancellationToken ct) + => Task.CompletedTask; + + public virtual async Task<HealthCheckResult> HealthCheckAsync(CancellationToken ct) + { + if (State != PluginLifecycleState.Active) + return HealthCheckResult.Unhealthy($"Connector is in state {State}"); + + try + { + var metadata = await GetMetadataAsync(ct); + return HealthCheckResult.Healthy(details: new Dictionary<string, object> + { + ["lastModified"] = metadata.LastModified?.ToString("O") ?? 
"unknown", + ["advisoryCount"] = metadata.AdvisoryCount + }); + } + catch (Exception ex) + { + return HealthCheckResult.Unhealthy(ex); + } + } + + public abstract Task<FeedMetadata> GetMetadataAsync(CancellationToken ct); + public abstract Task<FeedSyncResult> SyncAsync(FeedSyncOptions options, CancellationToken ct); + public abstract IAsyncEnumerable<Advisory> StreamAdvisoriesAsync(FeedStreamOptions options, CancellationToken ct); + public abstract Task<Advisory?> GetAdvisoryAsync(string advisoryId, CancellationToken ct); + + protected void EnsureActive() + { + if (State != PluginLifecycleState.Active) + throw new InvalidOperationException($"{FeedId} connector is not active (state: {State})"); + } + + protected static AdvisorySeverity ParseSeverity(double? cvssScore) => cvssScore switch + { + null => AdvisorySeverity.None, + < 0.1 => AdvisorySeverity.None, + < 4.0 => AdvisorySeverity.Low, + < 7.0 => AdvisorySeverity.Medium, + < 9.0 => AdvisorySeverity.High, + _ => AdvisorySeverity.Critical + }; + + public virtual ValueTask DisposeAsync() + { + HttpClient?.Dispose(); + HttpClient = null; + State = PluginLifecycleState.Stopped; + return ValueTask.CompletedTask; + } +} +``` + +### NVD Connector Plugin Implementation + +```csharp +// NvdConnectorPlugin.cs +namespace StellaOps.Concelier.Plugin.Nvd; + +[Plugin( + id: "com.stellaops.feed.nvd", + name: "NVD CVE Feed", + version: "1.0.0", + vendor: "Stella Ops")] +[ProvidesCapability(PluginCapabilities.Feed, CapabilityId = "nvd")] +public sealed class NvdConnectorPlugin : FeedConnectorBase +{ + private NvdOptions? _options; + private string? 
_apiKey; + + public override PluginInfo Info => new( + Id: "com.stellaops.feed.nvd", + Name: "NVD CVE Feed", + Version: "1.0.0", + Vendor: "Stella Ops", + Description: "NIST National Vulnerability Database CVE feed"); + + public override string FeedId => "nvd"; + public override FeedType Type => FeedType.Cve; + public override IReadOnlyList<AdvisoryFormat> SupportedFormats => new[] { AdvisoryFormat.Cve5 }; + + protected override async Task InitializeConnectorAsync(IPluginContext context, CancellationToken ct) + { + _options = context.Configuration.Bind<NvdOptions>(); + _apiKey = await context.Configuration.GetSecretAsync("nvd-api-key", ct); + + if (string.IsNullOrEmpty(_apiKey)) + { + context.Logger.Warning("NVD API key not configured - rate limits will apply"); + } + } + + public override async Task<FeedMetadata> GetMetadataAsync(CancellationToken ct) + { + EnsureActive(); + + // Query NVD statistics + var request = new HttpRequestMessage(HttpMethod.Get, + $"{_options!.BaseUrl}/cves/2.0?resultsPerPage=1"); + + AddApiKeyHeader(request); + + var response = await HttpClient!.SendAsync(request, ct); + response.EnsureSuccessStatusCode(); + + var content = await response.Content.ReadAsStringAsync(ct); + var result = JsonSerializer.Deserialize<NvdResponse>(content); + + return new FeedMetadata( + FeedId: FeedId, + Name: "NIST National Vulnerability Database", + Description: "Official US government CVE feed", + LastModified: result?.Timestamp, + LastSync: null, + AdvisoryCount: result?.TotalResults ?? 0, + Version: result?.Version, + SourceUrl: _options.BaseUrl, + AdditionalInfo: new Dictionary<string, object> + { + ["format"] = result?.Format ?? 
"NVD_CVE" + }); + } + + public override async Task<FeedSyncResult> SyncAsync(FeedSyncOptions options, CancellationToken ct) + { + EnsureActive(); + + var sw = Stopwatch.StartNew(); + var errors = new List<FeedSyncError>(); + var added = 0; + var updated = 0; + var processed = 0; + + try + { + await foreach (var advisory in StreamAdvisoriesAsync( + new FeedStreamOptions(ModifiedSince: options.Since), + ct)) + { + processed++; + + // Emit advisory for processing by caller + // In real implementation, this would invoke storage callback + + if (advisory.Published >= (options.Since ?? DateTimeOffset.MinValue)) + added++; + else + updated++; + + if (options.MaxItems.HasValue && processed >= options.MaxItems.Value) + break; + } + + sw.Stop(); + + return new FeedSyncResult( + Success: true, + ItemsProcessed: processed, + ItemsAdded: added, + ItemsUpdated: updated, + ItemsRemoved: 0, + SyncedAt: Context!.TimeProvider.GetUtcNow(), + Duration: sw.Elapsed, + NextCheckpoint: Context.TimeProvider.GetUtcNow().ToString("O"), + Errors: errors); + } + catch (Exception ex) + { + sw.Stop(); + Context!.Logger.Error(ex, "NVD sync failed"); + + return new FeedSyncResult( + Success: false, + ItemsProcessed: processed, + ItemsAdded: added, + ItemsUpdated: updated, + ItemsRemoved: 0, + SyncedAt: Context.TimeProvider.GetUtcNow(), + Duration: sw.Elapsed, + NextCheckpoint: null, + Errors: new[] { new FeedSyncError("sync", ex.Message, ex) }); + } + } + + public override async IAsyncEnumerable<Advisory> StreamAdvisoriesAsync( + FeedStreamOptions options, + [EnumeratorCancellation] CancellationToken ct) + { + EnsureActive(); + + var startIndex = 0; + var batchSize = Math.Min(options.BatchSize, 2000); // NVD max is 2000 + + while (!ct.IsCancellationRequested) + { + var url = BuildQueryUrl(options, startIndex, batchSize); + var request = new HttpRequestMessage(HttpMethod.Get, url); + AddApiKeyHeader(request); + + var response = await HttpClient!.SendAsync(request, ct); + + // Handle rate limiting + if (response.StatusCode == 
HttpStatusCode.TooManyRequests) + { + var retryAfter = response.Headers.RetryAfter?.Delta ?? TimeSpan.FromSeconds(30); + Context!.Logger.Warning("NVD rate limited, waiting {Seconds}s", retryAfter.TotalSeconds); + await Task.Delay(retryAfter, ct); + continue; + } + + response.EnsureSuccessStatusCode(); + + var content = await response.Content.ReadAsStringAsync(ct); + var result = JsonSerializer.Deserialize<NvdResponse>(content); + + if (result?.Vulnerabilities == null || result.Vulnerabilities.Count == 0) + yield break; + + foreach (var vuln in result.Vulnerabilities) + { + var advisory = MapToAdvisory(vuln); + if (advisory != null) + yield return advisory; + } + + startIndex += batchSize; + if (startIndex >= result.TotalResults) + yield break; + + // Rate limit delay (6 requests per minute without API key) + if (string.IsNullOrEmpty(_apiKey)) + { + await Task.Delay(TimeSpan.FromSeconds(10), ct); + } + } + } + + public override async Task<Advisory?> GetAdvisoryAsync(string advisoryId, CancellationToken ct) + { + EnsureActive(); + + var request = new HttpRequestMessage(HttpMethod.Get, + $"{_options!.BaseUrl}/cves/2.0?cveId={advisoryId}"); + AddApiKeyHeader(request); + + var response = await HttpClient!.SendAsync(request, ct); + if (response.StatusCode == HttpStatusCode.NotFound) + return null; + + response.EnsureSuccessStatusCode(); + + var content = await response.Content.ReadAsStringAsync(ct); + var result = JsonSerializer.Deserialize<NvdResponse>(content); + + return result?.Vulnerabilities?.FirstOrDefault() is { } vuln + ? 
MapToAdvisory(vuln) + : null; + } + + private string BuildQueryUrl(FeedStreamOptions options, int startIndex, int batchSize) + { + var url = $"{_options!.BaseUrl}/cves/2.0?startIndex={startIndex}&resultsPerPage={batchSize}"; + + if (options.ModifiedSince.HasValue) + { + url += $"&lastModStartDate={options.ModifiedSince.Value:yyyy-MM-ddTHH:mm:ss.fff}Z"; + url += $"&lastModEndDate={Context!.TimeProvider.GetUtcNow():yyyy-MM-ddTHH:mm:ss.fff}Z"; + } + + return url; + } + + private void AddApiKeyHeader(HttpRequestMessage request) + { + if (!string.IsNullOrEmpty(_apiKey)) + { + request.Headers.Add("apiKey", _apiKey); + } + } + + private Advisory? MapToAdvisory(NvdVulnerability vuln) + { + var cve = vuln.Cve; + if (cve == null) return null; + + var cvss = ExtractCvss(cve); + + return new Advisory( + Id: cve.Id, + FeedId: FeedId, + SourceFormat: AdvisoryFormat.Cve5, + Title: cve.Id, + Description: cve.Descriptions?.FirstOrDefault(d => d.Lang == "en")?.Value, + Severity: ParseSeverity(cvss?.Score), + Cvss: cvss, + Published: cve.Published, + Modified: cve.LastModified, + AffectedPackages: MapAffectedPackages(cve.Configurations), + References: MapReferences(cve.References), + Aliases: null, + Metadata: new Dictionary<string, string> + { + ["vulnStatus"] = cve.VulnStatus ?? "unknown", + ["source"] = "NVD" + }, + RawData: null); + } + + private CvssScore? 
ExtractCvss(NvdCve cve) + { + // Prefer CVSS 3.1, then 3.0, then 2.0 + var metrics = cve.Metrics; + if (metrics == null) return null; + + if (metrics.CvssMetricV31?.FirstOrDefault() is { } v31) + { + return new CvssScore( + Version: "3.1", + Score: v31.CvssData.BaseScore, + Vector: v31.CvssData.VectorString, + Severity: v31.CvssData.BaseSeverity); + } + + if (metrics.CvssMetricV30?.FirstOrDefault() is { } v30) + { + return new CvssScore( + Version: "3.0", + Score: v30.CvssData.BaseScore, + Vector: v30.CvssData.VectorString, + Severity: v30.CvssData.BaseSeverity); + } + + if (metrics.CvssMetricV2?.FirstOrDefault() is { } v2) + { + return new CvssScore( + Version: "2.0", + Score: v2.CvssData.BaseScore, + Vector: v2.CvssData.VectorString, + Severity: v2.BaseSeverity); + } + + return null; + } + + private IReadOnlyList<AffectedPackage> MapAffectedPackages(IReadOnlyList<NvdConfiguration>? configs) + { + if (configs == null) return Array.Empty<AffectedPackage>(); + + var packages = new List<AffectedPackage>(); + + foreach (var config in configs) + { + foreach (var node in config.Nodes ?? Enumerable.Empty<NvdNode>()) + { + foreach (var cpeMatch in node.CpeMatch ?? Enumerable.Empty<NvdCpeMatch>()) + { + if (!cpeMatch.Vulnerable) continue; + + // Parse CPE to extract package info + var cpe = ParseCpe(cpeMatch.Criteria); + if (cpe == null) continue; + + packages.Add(new AffectedPackage( + Name: cpe.Product, + Ecosystem: cpe.Vendor, + Purl: null, // CPE doesn't map directly to PURL + AffectedVersions: new VersionRange( + Start: cpeMatch.VersionStartIncluding ?? cpeMatch.VersionStartExcluding, + StartInclusive: cpeMatch.VersionStartIncluding != null, + End: cpeMatch.VersionEndIncluding ?? 
cpeMatch.VersionEndExcluding, + EndInclusive: cpeMatch.VersionEndIncluding != null), + FixedVersion: null, + Status: PackageStatus.Affected)); + } + } + } + + // Sort for deterministic output + return packages + .OrderBy(p => p.Name, StringComparer.OrdinalIgnoreCase) + .ThenBy(p => p.Ecosystem, StringComparer.OrdinalIgnoreCase) + .ToList(); + } + + private IReadOnlyList<AdvisoryReference> MapReferences(IReadOnlyList<NvdReference>? refs) + { + if (refs == null) return Array.Empty<AdvisoryReference>(); + + return refs + .Select(r => new AdvisoryReference( + Url: r.Url, + Type: MapReferenceType(r.Tags), + Description: r.Source)) + .OrderBy(r => r.Url, StringComparer.OrdinalIgnoreCase) + .ToList(); + } + + private static ReferenceType MapReferenceType(IReadOnlyList<string>? tags) + { + if (tags == null || tags.Count == 0) return ReferenceType.Web; + + if (tags.Contains("Patch")) return ReferenceType.Fix; + if (tags.Contains("Vendor Advisory")) return ReferenceType.Advisory; + if (tags.Contains("Third Party Advisory")) return ReferenceType.Advisory; + if (tags.Contains("Exploit")) return ReferenceType.Evidence; + + return ReferenceType.Web; + } + + private static CpeInfo? ParseCpe(string cpe) + { + // CPE 2.3 format: cpe:2.3:a:vendor:product:version:... + var parts = cpe.Split(':'); + if (parts.Length < 5) return null; + + return new CpeInfo( + Part: parts[2], + Vendor: parts[3], + Product: parts[4], + Version: parts.Length > 5 ? parts[5] : null); + } + + private sealed record CpeInfo(string Part, string Vendor, string Product, string? Version); + + // NVD API response models + private sealed record NvdResponse( + int ResultsPerPage, + int StartIndex, + int TotalResults, + string? Format, + string? Version, + DateTimeOffset? Timestamp, + IReadOnlyList<NvdVulnerability>? Vulnerabilities); + + private sealed record NvdVulnerability(NvdCve? Cve); + + private sealed record NvdCve( + string Id, + string? VulnStatus, + DateTimeOffset? Published, + DateTimeOffset? LastModified, + IReadOnlyList<NvdDescription>? Descriptions, + NvdMetrics? Metrics, + IReadOnlyList<NvdConfiguration>? 
Configurations, + IReadOnlyList<NvdReference>? References); + + private sealed record NvdDescription(string Lang, string Value); + + private sealed record NvdMetrics( + IReadOnlyList<NvdCvssMetricV31>? CvssMetricV31, + IReadOnlyList<NvdCvssMetricV30>? CvssMetricV30, + IReadOnlyList<NvdCvssMetricV2>? CvssMetricV2); + + private sealed record NvdCvssMetricV31(NvdCvssData CvssData); + private sealed record NvdCvssMetricV30(NvdCvssData CvssData); + private sealed record NvdCvssMetricV2(NvdCvssDataV2 CvssData, string? BaseSeverity); + + private sealed record NvdCvssData(double BaseScore, string? VectorString, string? BaseSeverity); + private sealed record NvdCvssDataV2(double BaseScore, string? VectorString); + + private sealed record NvdConfiguration(IReadOnlyList<NvdNode>? Nodes); + private sealed record NvdNode(IReadOnlyList<NvdCpeMatch>? CpeMatch); + + private sealed record NvdCpeMatch( + bool Vulnerable, + string Criteria, + string? VersionStartIncluding, + string? VersionStartExcluding, + string? VersionEndIncluding, + string? VersionEndExcluding); + + private sealed record NvdReference(string Url, string? Source, IReadOnlyList<string>? Tags); +} + +public sealed class NvdOptions +{ + public string BaseUrl { get; set; } = "https://services.nvd.nist.gov/rest/json"; +} +``` + +### Red Hat OVAL Connector Plugin Implementation + +```csharp +// RedHatOvalConnectorPlugin.cs +namespace StellaOps.Concelier.Plugin.Oval.RedHat; + +[Plugin( + id: "com.stellaops.feed.oval.redhat", + name: "Red Hat OVAL Feed", + version: "1.0.0", + vendor: "Stella Ops")] +[ProvidesCapability(PluginCapabilities.Feed, CapabilityId = "redhat-oval")] +public sealed class RedHatOvalConnectorPlugin : FeedConnectorBase +{ + private RedHatOvalOptions? 
_options; + + public override PluginInfo Info => new( + Id: "com.stellaops.feed.oval.redhat", + Name: "Red Hat OVAL Feed", + Version: "1.0.0", + Vendor: "Stella Ops", + Description: "Red Hat Enterprise Linux security advisories in OVAL format"); + + public override string FeedId => "redhat-oval"; + public override FeedType Type => FeedType.Oval; + public override IReadOnlyList<AdvisoryFormat> SupportedFormats => new[] { AdvisoryFormat.Oval }; + + protected override Task InitializeConnectorAsync(IPluginContext context, CancellationToken ct) + { + _options = context.Configuration.Bind<RedHatOvalOptions>(); + return Task.CompletedTask; + } + + public override async Task<FeedMetadata> GetMetadataAsync(CancellationToken ct) + { + EnsureActive(); + + // Check PULP repository for metadata + var response = await HttpClient!.GetAsync( + $"{_options!.BaseUrl}/PULP_MANIFEST", ct); + + if (!response.IsSuccessStatusCode) + { + return new FeedMetadata( + FeedId: FeedId, + Name: "Red Hat OVAL", + Description: "Red Hat Enterprise Linux security data", + LastModified: null, + LastSync: null, + AdvisoryCount: 0, + Version: null, + SourceUrl: _options.BaseUrl, + AdditionalInfo: null); + } + + var manifest = await response.Content.ReadAsStringAsync(ct); + var lines = manifest.Split('\n', StringSplitOptions.RemoveEmptyEntries); + + return new FeedMetadata( + FeedId: FeedId, + Name: "Red Hat OVAL", + Description: "Red Hat Enterprise Linux security data", + LastModified: response.Content.Headers.LastModified, + LastSync: null, + AdvisoryCount: lines.Length, + Version: null, + SourceUrl: _options.BaseUrl, + AdditionalInfo: new Dictionary<string, object> + { + ["fileCount"] = lines.Length + }); + } + + public override async Task<FeedSyncResult> SyncAsync(FeedSyncOptions options, CancellationToken ct) + { + EnsureActive(); + + var sw = Stopwatch.StartNew(); + var errors = new List<FeedSyncError>(); + var processed = 0; + var added = 0; + + try + { + // Get list of OVAL files from PULP manifest + var manifestResponse = await HttpClient!.GetAsync( + $"{_options!.BaseUrl}/PULP_MANIFEST", 
ct); + manifestResponse.EnsureSuccessStatusCode(); + + var manifest = await manifestResponse.Content.ReadAsStringAsync(ct); + var files = ParseManifest(manifest); + + foreach (var file in files) + { + if (ct.IsCancellationRequested) break; + + try + { + var ovalContent = await DownloadOvalFileAsync(file, ct); + var advisories = await ParseOvalContentAsync(ovalContent, ct); + + foreach (var advisory in advisories) + { + processed++; + added++; + } + } + catch (Exception ex) + { + errors.Add(new FeedSyncError(file.Path, ex.Message, ex)); + } + } + + sw.Stop(); + + return new FeedSyncResult( + Success: errors.Count == 0, + ItemsProcessed: processed, + ItemsAdded: added, + ItemsUpdated: 0, + ItemsRemoved: 0, + SyncedAt: Context!.TimeProvider.GetUtcNow(), + Duration: sw.Elapsed, + NextCheckpoint: Context.TimeProvider.GetUtcNow().ToString("O"), + Errors: errors); + } + catch (Exception ex) + { + sw.Stop(); + return new FeedSyncResult( + Success: false, + ItemsProcessed: processed, + ItemsAdded: added, + ItemsUpdated: 0, + ItemsRemoved: 0, + SyncedAt: Context!.TimeProvider.GetUtcNow(), + Duration: sw.Elapsed, + NextCheckpoint: null, + Errors: new[] { new FeedSyncError("manifest", ex.Message, ex) }); + } + } + + public override async IAsyncEnumerable<Advisory> StreamAdvisoriesAsync( + FeedStreamOptions options, + [EnumeratorCancellation] CancellationToken ct) + { + EnsureActive(); + + // Get manifest + var manifestResponse = await HttpClient!.GetAsync( + $"{_options!.BaseUrl}/PULP_MANIFEST", ct); + manifestResponse.EnsureSuccessStatusCode(); + + var manifest = await manifestResponse.Content.ReadAsStringAsync(ct); + var files = ParseManifest(manifest); + + foreach (var file in files) + { + if (ct.IsCancellationRequested) yield break; + + var ovalContent = await DownloadOvalFileAsync(file, ct); + var advisories = await ParseOvalContentAsync(ovalContent, ct); + + foreach (var advisory in advisories) + { + if (options.ModifiedSince.HasValue && + advisory.Modified < 
options.ModifiedSince.Value) + continue; + + yield return advisory; + } + } + } + + public override Task<Advisory?> GetAdvisoryAsync(string advisoryId, CancellationToken ct) + { + // OVAL doesn't support individual advisory lookup + // Would need to search through all files + return Task.FromResult<Advisory?>(null); + } + + private IReadOnlyList<OvalFileInfo> ParseManifest(string manifest) + { + var files = new List<OvalFileInfo>(); + + foreach (var line in manifest.Split('\n', StringSplitOptions.RemoveEmptyEntries)) + { + var parts = line.Split(','); + if (parts.Length < 3) continue; + + files.Add(new OvalFileInfo( + Path: parts[0].Trim(), + Checksum: parts[1].Trim(), + Size: long.Parse(parts[2].Trim()))); + } + + return files + .Where(f => f.Path.EndsWith(".xml") || f.Path.EndsWith(".xml.bz2")) + .OrderBy(f => f.Path, StringComparer.OrdinalIgnoreCase) + .ToList(); + } + + private async Task<string> DownloadOvalFileAsync(OvalFileInfo file, CancellationToken ct) + { + var response = await HttpClient!.GetAsync($"{_options!.BaseUrl}/{file.Path}", ct); + response.EnsureSuccessStatusCode(); + + if (file.Path.EndsWith(".bz2")) + { + await using var stream = await response.Content.ReadAsStreamAsync(ct); + await using var decompressed = new BZip2InputStream(stream); + using var reader = new StreamReader(decompressed); + return await reader.ReadToEndAsync(ct); + } + + return await response.Content.ReadAsStringAsync(ct); + } + + private Task<IReadOnlyList<Advisory>> ParseOvalContentAsync(string content, CancellationToken ct) + { + var advisories = new List<Advisory>(); + var doc = XDocument.Parse(content); + var ns = doc.Root?.GetDefaultNamespace() ?? 
XNamespace.None;
+
+        var definitions = doc.Descendants(ns + "definition");
+
+        foreach (var def in definitions)
+        {
+            var id = def.Attribute("id")?.Value;
+            if (string.IsNullOrEmpty(id)) continue;
+
+            var metadata = def.Element(ns + "metadata");
+            var title = metadata?.Element(ns + "title")?.Value;
+            var description = metadata?.Element(ns + "description")?.Value;
+
+            var advisory = metadata?.Element(ns + "advisory");
+            var severity = advisory?.Element(ns + "severity")?.Value;
+            var issued = advisory?.Element(ns + "issued")?.Attribute("date")?.Value;
+            var updated = advisory?.Element(ns + "updated")?.Attribute("date")?.Value;
+
+            var cves = advisory?.Elements(ns + "cve")
+                .Select(c => c.Value)
+                .ToList() ?? new List<string>();
+
+            var references = metadata?.Elements(ns + "reference")
+                .Select(r => new AdvisoryReference(
+                    Url: r.Attribute("ref_url")?.Value ?? "",
+                    Type: ReferenceType.Advisory,
+                    Description: r.Attribute("source")?.Value))
+                .Where(r => !string.IsNullOrEmpty(r.Url))
+                .ToList() ??
new List<AdvisoryReference>();
+
+            advisories.Add(new Advisory(
+                Id: id,
+                FeedId: FeedId,
+                SourceFormat: AdvisoryFormat.Oval,
+                Title: title,
+                Description: description,
+                Severity: MapOvalSeverity(severity),
+                Cvss: null,
+                Published: ParseDate(issued),
+                Modified: ParseDate(updated),
+                AffectedPackages: ParseAffectedPackages(def, ns),
+                References: references,
+                Aliases: cves,
+                Metadata: new Dictionary<string, string>
+                {
+                    ["source"] = "Red Hat OVAL"
+                },
+                RawData: null));
+        }
+
+        // Sort for deterministic output
+        return Task.FromResult<IReadOnlyList<Advisory>>(advisories
+            .OrderBy(a => a.Id, StringComparer.OrdinalIgnoreCase)
+            .ToList());
+    }
+
+    private IReadOnlyList<AffectedPackage> ParseAffectedPackages(XElement definition, XNamespace ns)
+    {
+        var packages = new List<AffectedPackage>();
+
+        // Parse criteria for RPM references
+        var criteria = definition.Descendants(ns + "criterion");
+
+        foreach (var criterion in criteria)
+        {
+            var comment = criterion.Attribute("comment")?.Value;
+            if (string.IsNullOrEmpty(comment)) continue;
+
+            // Parse patterns like "package-name is earlier than 0:1.2.3-4.el8"
+            var match = Regex.Match(comment, @"^(.+?)\s+is earlier than\s+(.+)$");
+            if (match.Success)
+            {
+                packages.Add(new AffectedPackage(
+                    Name: match.Groups[1].Value,
+                    Ecosystem: "rpm",
+                    Purl: $"pkg:rpm/redhat/{match.Groups[1].Value}",
+                    AffectedVersions: new VersionRange(
+                        Start: null,
+                        StartInclusive: false,
+                        End: match.Groups[2].Value,
+                        EndInclusive: false),
+                    FixedVersion: match.Groups[2].Value,
+                    Status: PackageStatus.Fixed));
+            }
+        }
+
+        return packages
+            .OrderBy(p => p.Name, StringComparer.OrdinalIgnoreCase)
+            .ToList();
+    }
+
+    private static AdvisorySeverity MapOvalSeverity(string? severity) => severity?.ToLowerInvariant() switch
+    {
+        "critical" => AdvisorySeverity.Critical,
+        "important" => AdvisorySeverity.High,
+        "moderate" => AdvisorySeverity.Medium,
+        "low" => AdvisorySeverity.Low,
+        _ => AdvisorySeverity.None
+    };
+
+    private static DateTimeOffset? ParseDate(string?
date)
+    {
+        if (string.IsNullOrEmpty(date)) return null;
+        return DateTimeOffset.TryParse(date, out var result) ? result : null;
+    }
+
+    private sealed record OvalFileInfo(string Path, string Checksum, long Size);
+}
+
+public sealed class RedHatOvalOptions
+{
+    public string BaseUrl { get; set; } = "https://www.redhat.com/security/data/oval/v2";
+}
+```
+
+### Migration Tasks
+
+| Connector | Current Interface | New Implementation | Status |
+|-----------|-------------------|--------------------|--------|
+| NVD | `IFeedConnector` | `NvdConnectorPlugin : IPlugin, IFeedCapability` | TODO |
+| MITRE | `IFeedConnector` | `MitreConnectorPlugin : IPlugin, IFeedCapability` | TODO |
+| CVE List v5 | `IFeedConnector` | `CveListV5ConnectorPlugin : IPlugin, IFeedCapability` | TODO |
+| Red Hat OVAL | `IFeedConnector` | `RedHatOvalConnectorPlugin : IPlugin, IFeedCapability` | TODO |
+| Ubuntu OVAL | `IFeedConnector` | `UbuntuOvalConnectorPlugin : IPlugin, IFeedCapability` | TODO |
+| Debian OVAL | `IFeedConnector` | `DebianOvalConnectorPlugin : IPlugin, IFeedCapability` | TODO |
+| SUSE OVAL | `IFeedConnector` | `SuseOvalConnectorPlugin : IPlugin, IFeedCapability` | TODO |
+| Oracle OVAL | `IFeedConnector` | `OracleOvalConnectorPlugin : IPlugin, IFeedCapability` | TODO |
+| AlmaLinux OVAL | `IFeedConnector` | `AlmaLinuxOvalConnectorPlugin : IPlugin, IFeedCapability` | TODO |
+| Rocky Linux OVAL | `IFeedConnector` | `RockyLinuxOvalConnectorPlugin : IPlugin, IFeedCapability` | TODO |
+| Alpine SecDB | `IFeedConnector` | `AlpineSecDbConnectorPlugin : IPlugin, IFeedCapability` | TODO |
+| OSV | `IFeedConnector` | `OsvConnectorPlugin : IPlugin, IFeedCapability` | TODO |
+| GHSA | `IFeedConnector` | `GhsaConnectorPlugin : IPlugin, IFeedCapability` | TODO |
+| GitLab Advisories | `IFeedConnector` | `GitLabAdvisoriesConnectorPlugin : IPlugin, IFeedCapability` | TODO |
+| Microsoft MSRC | `IFeedConnector` | `MsrcConnectorPlugin : IPlugin, IFeedCapability` | TODO |
+| Amazon Inspector | `IFeedConnector` | `AmazonInspectorConnectorPlugin : IPlugin, IFeedCapability` | TODO |
+| CISA KEV | `IFeedConnector` | `CisaKevConnectorPlugin : IPlugin, IFeedCapability` | TODO |
+| Mirror | `IFeedConnector` | `MirrorFeedConnectorPlugin : IPlugin, IFeedCapability` | TODO |
+
+---
+
+## Acceptance Criteria
+
+- [ ] All feed connectors implement `IPlugin`
+- [ ] All feed connectors implement `IFeedCapability`
+- [ ] Incremental sync with checkpoints works
+- [ ] Full sync works for all feeds
+- [ ] Advisory streaming works
+- [ ] Deterministic output ordering maintained
+- [ ] Health checks verify feed availability
+- [ ] Rate limiting handled gracefully
+- [ ] Plugin manifests for all connectors
+- [ ] Air-gap mirror connector works
+- [ ] Unit tests migrated/updated
+- [ ] Integration tests with mock feeds
+
+---
+
+## Dependencies
+
+| Dependency | Type | Status |
+|------------|------|--------|
+| 100_001 Plugin Abstractions | Internal | TODO |
+| 100_002 Plugin Host | Internal | TODO |
+| SharpCompress | External | Available (bz2) |
+
+---
+
+## Delivery Tracker
+
+| Deliverable | Status | Notes |
+|-------------|--------|-------|
+| IFeedCapability interface | TODO | |
+| FeedConnectorBase | TODO | |
+| NvdConnectorPlugin | TODO | |
+| MitreConnectorPlugin | TODO | |
+| CveListV5ConnectorPlugin | TODO | |
+| RedHatOvalConnectorPlugin | TODO | |
+| UbuntuOvalConnectorPlugin | TODO | |
+| DebianOvalConnectorPlugin | TODO | |
+| SuseOvalConnectorPlugin | TODO | |
+| OracleOvalConnectorPlugin | TODO | |
+| AlmaLinuxOvalConnectorPlugin | TODO | |
+| RockyLinuxOvalConnectorPlugin | TODO | |
+| AlpineSecDbConnectorPlugin | TODO | |
+| OsvConnectorPlugin | TODO | |
+| GhsaConnectorPlugin | TODO | |
+| GitLabAdvisoriesConnectorPlugin | TODO | |
+| MsrcConnectorPlugin | TODO | |
+| AmazonInspectorConnectorPlugin | TODO | |
+| CisaKevConnectorPlugin | TODO | |
+| MirrorFeedConnectorPlugin | TODO | |
+| Plugin manifests | TODO | |
+| Unit tests | TODO | |
+
+---
+
+## Execution Log
+
+| Date | Entry |
+|------|-------|
+| 10-Jan-2026 | Sprint created |
diff --git a/docs/implplan/SPRINT_20260110_100_012_PLUGIN_sdk.md b/docs/implplan/SPRINT_20260110_100_012_PLUGIN_sdk.md
new file mode 100644
index 000000000..43291d8a5
--- /dev/null
+++ b/docs/implplan/SPRINT_20260110_100_012_PLUGIN_sdk.md
@@ -0,0 +1,1168 @@
+# SPRINT: Plugin SDK & Developer Experience
+
+> **Sprint ID:** 100_012
+> **Module:** PLUGIN
+> **Phase:** 100 - Plugin System Unification
+> **Status:** TODO
+> **Parent:** [100_000_INDEX](SPRINT_20260110_100_000_INDEX_plugin_unification.md)
+
+---
+
+## Overview
+
+Create a comprehensive Plugin SDK that provides developers with tools, templates, testing utilities, and documentation for building plugins for the Stella Ops platform.
+
+### Objectives
+
+- Create plugin project templates (dotnet new)
+- Build plugin development CLI tool
+- Create plugin testing framework
+- Build plugin packaging tooling
+- Create plugin validation tooling
+- Write comprehensive documentation
+- Provide sample plugins for each capability
+
+### Target Deliverables
+
+```
+src/
+├── Plugin/
+│   ├── StellaOps.Plugin.Sdk/        # SDK library
+│   ├── StellaOps.Plugin.Templates/  # dotnet new templates
+│   ├── StellaOps.Plugin.Testing/    # Testing utilities
+│   ├── StellaOps.Plugin.Cli/        # Plugin development CLI
+│   └── StellaOps.Plugin.Samples/    # Sample plugins
+```
+
+---
+
+## Deliverables
+
+### Plugin SDK Library
+
+```csharp
+// StellaOps.Plugin.Sdk - Main entry point for plugin developers
+
+namespace StellaOps.Plugin.Sdk;
+
+/// <summary>
+/// Base class for simplified plugin development.
+/// Provides common patterns and reduces boilerplate.
+/// </summary>
+public abstract class PluginBase : IPlugin
+{
+    private IPluginContext? _context;
+    protected IPluginLogger Logger => _context?.Logger ?? NullPluginLogger.Instance;
+    protected IPluginConfiguration Configuration => _context?.Configuration ??
EmptyConfiguration.Instance; + protected TimeProvider TimeProvider => _context?.TimeProvider ?? TimeProvider.System; + + public abstract PluginInfo Info { get; } + public virtual PluginTrustLevel TrustLevel => PluginTrustLevel.Untrusted; + public abstract PluginCapabilities Capabilities { get; } + public PluginLifecycleState State { get; protected set; } = PluginLifecycleState.Discovered; + + public async Task InitializeAsync(IPluginContext context, CancellationToken ct) + { + _context = context; + State = PluginLifecycleState.Initializing; + + try + { + await OnInitializeAsync(context, ct); + State = PluginLifecycleState.Active; + Logger.Info("Plugin {PluginId} initialized successfully", Info.Id); + } + catch (Exception ex) + { + State = PluginLifecycleState.Failed; + Logger.Error(ex, "Plugin {PluginId} failed to initialize", Info.Id); + throw; + } + } + + protected virtual Task OnInitializeAsync(IPluginContext context, CancellationToken ct) + => Task.CompletedTask; + + public virtual Task HealthCheckAsync(CancellationToken ct) + { + return Task.FromResult(State == PluginLifecycleState.Active + ? HealthCheckResult.Healthy() + : HealthCheckResult.Unhealthy($"Plugin is in state {State}")); + } + + public virtual async ValueTask DisposeAsync() + { + try + { + await OnDisposeAsync(); + } + finally + { + State = PluginLifecycleState.Stopped; + } + } + + protected virtual ValueTask OnDisposeAsync() => ValueTask.CompletedTask; +} + +/// +/// Fluent builder for creating PluginInfo. +/// +public sealed class PluginInfoBuilder +{ + private string _id = ""; + private string _name = ""; + private string _version = "1.0.0"; + private string _vendor = ""; + private string? _description; + private string? _licenseId; + private string? _homepage; + private string? 
_repository; + private readonly List _dependencies = new(); + private readonly Dictionary _metadata = new(); + + public PluginInfoBuilder WithId(string id) + { + _id = id; + return this; + } + + public PluginInfoBuilder WithName(string name) + { + _name = name; + return this; + } + + public PluginInfoBuilder WithVersion(string version) + { + _version = version; + return this; + } + + public PluginInfoBuilder WithVendor(string vendor) + { + _vendor = vendor; + return this; + } + + public PluginInfoBuilder WithDescription(string description) + { + _description = description; + return this; + } + + public PluginInfoBuilder WithLicense(string licenseId) + { + _licenseId = licenseId; + return this; + } + + public PluginInfoBuilder WithHomepage(string homepage) + { + _homepage = homepage; + return this; + } + + public PluginInfoBuilder WithRepository(string repository) + { + _repository = repository; + return this; + } + + public PluginInfoBuilder DependsOn(string pluginId, string? versionRange = null) + { + _dependencies.Add(new PluginDependency(pluginId, versionRange, false)); + return this; + } + + public PluginInfoBuilder OptionallyDependsOn(string pluginId, string? versionRange = null) + { + _dependencies.Add(new PluginDependency(pluginId, versionRange, true)); + return this; + } + + public PluginInfoBuilder WithMetadata(string key, string value) + { + _metadata[key] = value; + return this; + } + + public PluginInfo Build() + { + if (string.IsNullOrEmpty(_id)) + throw new InvalidOperationException("Plugin ID is required"); + if (string.IsNullOrEmpty(_name)) + throw new InvalidOperationException("Plugin name is required"); + + return new PluginInfo( + Id: _id, + Name: _name, + Version: _version, + Vendor: _vendor, + Description: _description, + LicenseId: _licenseId, + Homepage: _homepage, + Repository: _repository, + Dependencies: _dependencies, + Metadata: _metadata.Count > 0 ? _metadata : null); + } +} + +/// +/// Extension methods for common plugin operations. 
+/// </summary>
+public static class PluginExtensions
+{
+    /// <summary>
+    /// Get configuration value with type conversion.
+    /// </summary>
+    public static T GetValue<T>(this IPluginConfiguration config, string key, T defaultValue = default!)
+    {
+        var value = config.GetValue(key);
+        if (value == null) return defaultValue;
+
+        return (T)Convert.ChangeType(value, typeof(T), CultureInfo.InvariantCulture);
+    }
+
+    /// <summary>
+    /// Get secret with caching.
+    /// </summary>
+    public static async Task<string?> GetCachedSecretAsync(
+        this IPluginConfiguration config,
+        string key,
+        TimeSpan cacheDuration,
+        CancellationToken ct)
+    {
+        // Implementation would cache secrets to reduce vault calls
+        return await config.GetSecretAsync(key, ct);
+    }
+
+    /// <summary>
+    /// Create a scoped logger for a specific operation.
+    /// </summary>
+    public static IDisposable BeginScope(this IPluginLogger logger, string operationName)
+    {
+        logger.Debug("Starting operation: {Operation}", operationName);
+        var sw = Stopwatch.StartNew();
+
+        return new ScopeDisposable(() =>
+        {
+            sw.Stop();
+            logger.Debug("Completed operation: {Operation} in {Elapsed}ms",
+                operationName, sw.ElapsedMilliseconds);
+        });
+    }
+
+    private sealed class ScopeDisposable(Action onDispose) : IDisposable
+    {
+        public void Dispose() => onDispose();
+    }
+}
+
+/// <summary>
+/// Attribute for marking plugin configuration properties.
+/// </summary>
+[AttributeUsage(AttributeTargets.Property)]
+public sealed class PluginConfigAttribute : Attribute
+{
+    public string? Key { get; set; }
+    public string? Description { get; set; }
+    public bool Required { get; set; }
+    public object? DefaultValue { get; set; }
+    public bool Secret { get; set; }
+}
+
+/// <summary>
+/// Options base class with validation support.
+/// </summary>
+public abstract class PluginOptionsBase : IValidatableObject
+{
+    public virtual IEnumerable<ValidationResult> Validate(ValidationContext validationContext)
+    {
+        yield break;
+    }
+}
+```
+
+### Plugin Testing Framework
+
+```csharp
+// StellaOps.Plugin.Testing
+
+namespace StellaOps.Plugin.Testing;
+
+/// <summary>
+/// Test host for running plugins in isolation during testing.
+/// </summary>
+public sealed class PluginTestHost : IAsyncDisposable
+{
+    private readonly List<IPlugin> _plugins = new();
+    private readonly TestPluginContext _context;
+
+    public PluginTestHost(Action<PluginTestHostOptions>? configure = null)
+    {
+        var options = new PluginTestHostOptions();
+        configure?.Invoke(options);
+
+        _context = new TestPluginContext(options);
+    }
+
+    /// <summary>
+    /// Load and initialize a plugin.
+    /// </summary>
+    public async Task<T> LoadPluginAsync<T>(CancellationToken ct = default) where T : IPlugin, new()
+    {
+        var plugin = new T();
+        await plugin.InitializeAsync(_context, ct);
+        _plugins.Add(plugin);
+        return plugin;
+    }
+
+    /// <summary>
+    /// Load and initialize a plugin with custom configuration.
+    /// </summary>
+    public async Task<T> LoadPluginAsync<T>(
+        Dictionary<string, object> configuration,
+        CancellationToken ct = default) where T : IPlugin, new()
+    {
+        foreach (var (key, value) in configuration)
+        {
+            _context.Configuration.SetValue(key, value);
+        }
+
+        return await LoadPluginAsync<T>(ct);
+    }
+
+    /// <summary>
+    /// Get the test context for assertions.
+    /// </summary>
+    public TestPluginContext Context => _context;
+
+    /// <summary>
+    /// Verify plugin health.
+    /// </summary>
+    public async Task<HealthCheckResult> CheckHealthAsync<T>(T plugin, CancellationToken ct = default)
+        where T : IPlugin
+    {
+        return await plugin.HealthCheckAsync(ct);
+    }
+
+    public async ValueTask DisposeAsync()
+    {
+        foreach (var plugin in _plugins)
+        {
+            await plugin.DisposeAsync();
+        }
+        _plugins.Clear();
+    }
+}
+
+/// <summary>
+/// Options for configuring the test host.
+/// </summary>
+public sealed class PluginTestHostOptions
+{
+    public bool EnableLogging { get; set; } = true;
+    public LogLevel MinLogLevel { get; set; } = LogLevel.Debug;
+    public TimeProvider?
TimeProvider { get; set; } + public Dictionary Secrets { get; } = new(); + public Dictionary Configuration { get; } = new(); +} + +/// +/// Test implementation of IPluginContext. +/// +public sealed class TestPluginContext : IPluginContext +{ + public TestPluginConfiguration Configuration { get; } + public TestPluginLogger Logger { get; } + public TimeProvider TimeProvider { get; } + public IHttpClientFactory HttpClientFactory { get; } + public IGuidGenerator GuidGenerator { get; } + + IPluginConfiguration IPluginContext.Configuration => Configuration; + IPluginLogger IPluginContext.Logger => Logger; + + public TestPluginContext(PluginTestHostOptions options) + { + Configuration = new TestPluginConfiguration(options.Configuration, options.Secrets); + Logger = new TestPluginLogger(options.MinLogLevel, options.EnableLogging); + TimeProvider = options.TimeProvider ?? new FakeTimeProvider(DateTimeOffset.UtcNow); + HttpClientFactory = new TestHttpClientFactory(); + GuidGenerator = new SequentialGuidGenerator(); + } +} + +/// +/// Test implementation of plugin configuration. +/// +public sealed class TestPluginConfiguration : IPluginConfiguration +{ + private readonly Dictionary _values; + private readonly Dictionary _secrets; + + public TestPluginConfiguration( + Dictionary values, + Dictionary secrets) + { + _values = new Dictionary(values); + _secrets = new Dictionary(secrets); + } + + public string? GetValue(string key) + { + return _values.TryGetValue(key, out var value) ? 
value?.ToString() : null;
+    }
+
+    public void SetValue(string key, object value)
+    {
+        _values[key] = value;
+    }
+
+    public T Bind<T>() where T : new()
+    {
+        var result = new T();
+        var properties = typeof(T).GetProperties();
+
+        foreach (var prop in properties)
+        {
+            var key = prop.Name;
+            var configAttr = prop.GetCustomAttribute<PluginConfigAttribute>();
+            if (configAttr?.Key != null)
+                key = configAttr.Key;
+
+            if (_values.TryGetValue(key, out var value))
+            {
+                prop.SetValue(result, Convert.ChangeType(value, prop.PropertyType));
+            }
+        }
+
+        return result;
+    }
+
+    public Task<string?> GetSecretAsync(string key, CancellationToken ct)
+    {
+        return Task.FromResult<string?>(_secrets.TryGetValue(key, out var secret) ? secret : null);
+    }
+
+    public void SetSecret(string key, string value)
+    {
+        _secrets[key] = value;
+    }
+}
+
+/// <summary>
+/// Test logger that captures log entries for assertions.
+/// </summary>
+public sealed class TestPluginLogger : IPluginLogger
+{
+    private readonly LogLevel _minLevel;
+    private readonly bool _enabled;
+    private readonly List<LogEntry> _entries = new();
+    private readonly object _lock = new();
+
+    public IReadOnlyList<LogEntry> Entries
+    {
+        get
+        {
+            lock (_lock) return _entries.ToList();
+        }
+    }
+
+    public TestPluginLogger(LogLevel minLevel, bool enabled)
+    {
+        _minLevel = minLevel;
+        _enabled = enabled;
+    }
+
+    public void Log(LogLevel level, string message, params object[] args)
+    {
+        if (!_enabled || level < _minLevel) return;
+
+        var formatted = args.Length > 0
+            ? string.Format(CultureInfo.InvariantCulture, message, args)
+            : message;
+
+        lock (_lock)
+        {
+            _entries.Add(new LogEntry(level, formatted, null));
+        }
+
+        if (_enabled)
+        {
+            Console.WriteLine($"[{level}] {formatted}");
+        }
+    }
+
+    public void Log(LogLevel level, Exception exception, string message, params object[] args)
+    {
+        if (!_enabled || level < _minLevel) return;
+
+        var formatted = args.Length > 0
+            ?
string.Format(CultureInfo.InvariantCulture, message, args) + : message; + + lock (_lock) + { + _entries.Add(new LogEntry(level, formatted, exception)); + } + + if (_enabled) + { + Console.WriteLine($"[{level}] {formatted}"); + Console.WriteLine(exception); + } + } + + public void Debug(string message, params object[] args) => Log(LogLevel.Debug, message, args); + public void Info(string message, params object[] args) => Log(LogLevel.Information, message, args); + public void Warning(string message, params object[] args) => Log(LogLevel.Warning, message, args); + public void Warning(Exception ex, string message, params object[] args) => Log(LogLevel.Warning, ex, message, args); + public void Error(string message, params object[] args) => Log(LogLevel.Error, message, args); + public void Error(Exception ex, string message, params object[] args) => Log(LogLevel.Error, ex, message, args); + + public bool HasLoggedAtLevel(LogLevel level) => Entries.Any(e => e.Level == level); + public bool HasLoggedError() => HasLoggedAtLevel(LogLevel.Error); + public bool HasLoggedWarning() => HasLoggedAtLevel(LogLevel.Warning); + + public void Clear() + { + lock (_lock) _entries.Clear(); + } +} + +public sealed record LogEntry(LogLevel Level, string Message, Exception? Exception); + +/// +/// Fake time provider for deterministic testing. +/// +public sealed class FakeTimeProvider : TimeProvider +{ + private DateTimeOffset _now; + + public FakeTimeProvider(DateTimeOffset startTime) + { + _now = startTime; + } + + public override DateTimeOffset GetUtcNow() => _now; + + public void Advance(TimeSpan duration) => _now += duration; + public void SetTime(DateTimeOffset time) => _now = time; +} + +/// +/// Sequential GUID generator for deterministic testing. 
+/// +public sealed class SequentialGuidGenerator : IGuidGenerator +{ + private int _counter; + + public Guid NewGuid() + { + var counter = Interlocked.Increment(ref _counter); + var bytes = new byte[16]; + BitConverter.GetBytes(counter).CopyTo(bytes, 0); + return new Guid(bytes); + } + + public void Reset() => _counter = 0; +} + +/// +/// Test HTTP client factory with request recording. +/// +public sealed class TestHttpClientFactory : IHttpClientFactory +{ + private readonly Dictionary _handlers = new(); + private readonly List _requests = new(); + + public HttpClient CreateClient(string name) + { + if (!_handlers.TryGetValue(name, out var handler)) + { + handler = new MockHttpMessageHandler(_requests); + _handlers[name] = handler; + } + + return new HttpClient(handler); + } + + public void SetupResponse(string name, string url, HttpResponseMessage response) + { + if (!_handlers.TryGetValue(name, out var handler)) + { + handler = new MockHttpMessageHandler(_requests); + _handlers[name] = handler; + } + + handler.SetupResponse(url, response); + } + + public IReadOnlyList RecordedRequests => _requests; +} + +internal sealed class MockHttpMessageHandler : HttpMessageHandler +{ + private readonly List _requests; + private readonly Dictionary _responses = new(); + + public MockHttpMessageHandler(List requests) + { + _requests = requests; + } + + public void SetupResponse(string url, HttpResponseMessage response) + { + _responses[url] = response; + } + + protected override Task SendAsync( + HttpRequestMessage request, + CancellationToken cancellationToken) + { + _requests.Add(request); + + var url = request.RequestUri?.ToString() ?? ""; + if (_responses.TryGetValue(url, out var response)) + { + return Task.FromResult(response); + } + + return Task.FromResult(new HttpResponseMessage(HttpStatusCode.NotFound)); + } +} + +/// +/// xUnit test fixtures for plugin testing. 
+/// </summary>
+public abstract class PluginTestBase<TPlugin> : IAsyncLifetime where TPlugin : IPlugin, new()
+{
+    protected PluginTestHost Host { get; private set; } = null!;
+    protected TPlugin Plugin { get; private set; } = default!;
+    protected TestPluginContext Context => Host.Context;
+
+    protected virtual void ConfigureHost(PluginTestHostOptions options) { }
+    protected virtual Dictionary<string, object> GetConfiguration() => new();
+
+    public virtual async Task InitializeAsync()
+    {
+        Host = new PluginTestHost(ConfigureHost);
+        Plugin = await Host.LoadPluginAsync<TPlugin>(GetConfiguration());
+    }
+
+    public virtual async Task DisposeAsync()
+    {
+        await Host.DisposeAsync();
+    }
+}
+```
+
+### Plugin CLI Tool
+
+```csharp
+// StellaOps.Plugin.Cli - Command-line tool for plugin development
+
+namespace StellaOps.Plugin.Cli;
+
+/// <summary>
+/// CLI commands for plugin development workflow.
+/// </summary>
+public static class PluginCliCommands
+{
+    /// <summary>
+    /// Create a new plugin project from template.
+    /// </summary>
+    [Command("new")]
+    public sealed class NewCommand
+    {
+        [CommandOption("--name", "-n", Description = "Plugin name")]
+        public required string Name { get; set; }
+
+        [CommandOption("--capability", "-c", Description = "Plugin capability type")]
+        public PluginCapabilities Capability { get; set; } = PluginCapabilities.Custom;
+
+        [CommandOption("--output", "-o", Description = "Output directory")]
+        public string Output { get; set; } = ".";
+
+        public async Task<int> ExecuteAsync()
+        {
+            Console.WriteLine($"Creating new plugin: {Name}");
+            Console.WriteLine($"Capability: {Capability}");
+            Console.WriteLine($"Output: {Output}");
+
+            var generator = new PluginProjectGenerator();
+            await generator.GenerateAsync(Name, Capability, Output);
+
+            Console.WriteLine("Plugin project created successfully!");
+            return 0;
+        }
+    }
+
+    /// <summary>
+    /// Validate a plugin manifest.
+ /// + [Command("validate")] + public static class ValidateCommand + { + [CommandOption("--manifest", "-m", Description = "Path to plugin manifest")] + public string Manifest { get; set; } = "plugin.yaml"; + + public async Task ExecuteAsync() + { + Console.WriteLine($"Validating manifest: {Manifest}"); + + var validator = new PluginManifestValidator(); + var result = await validator.ValidateAsync(Manifest); + + if (result.IsValid) + { + Console.WriteLine("Manifest is valid."); + return 0; + } + + Console.WriteLine("Validation errors:"); + foreach (var error in result.Errors) + { + Console.WriteLine($" - {error}"); + } + return 1; + } + } + + /// + /// Package a plugin for distribution. + /// + [Command("pack")] + public static class PackCommand + { + [CommandOption("--project", "-p", Description = "Path to plugin project")] + public string Project { get; set; } = "."; + + [CommandOption("--output", "-o", Description = "Output directory for package")] + public string Output { get; set; } = "./packages"; + + [CommandOption("--configuration", "-c", Description = "Build configuration")] + public string Configuration { get; set; } = "Release"; + + public async Task ExecuteAsync() + { + Console.WriteLine($"Packaging plugin from: {Project}"); + + var packager = new PluginPackager(); + var package = await packager.PackAsync(Project, Output, Configuration); + + Console.WriteLine($"Package created: {package}"); + return 0; + } + } + + /// + /// Run a plugin locally for testing. + /// + [Command("run")] + public static class RunCommand + { + [CommandOption("--project", "-p", Description = "Path to plugin project")] + public string Project { get; set; } = "."; + + [CommandOption("--config", "-c", Description = "Path to configuration file")] + public string? 
Config { get; set; } + + public async Task ExecuteAsync() + { + Console.WriteLine($"Running plugin from: {Project}"); + + var runner = new PluginLocalRunner(); + await runner.RunAsync(Project, Config, CancellationToken.None); + + return 0; + } + } + + /// + /// Generate plugin manifest from code. + /// + [Command("manifest")] + public static class ManifestCommand + { + [CommandOption("--assembly", "-a", Description = "Path to plugin assembly")] + public required string Assembly { get; set; } + + [CommandOption("--output", "-o", Description = "Output path for manifest")] + public string Output { get; set; } = "plugin.yaml"; + + public async Task ExecuteAsync() + { + Console.WriteLine($"Generating manifest from: {Assembly}"); + + var generator = new ManifestGenerator(); + await generator.GenerateFromAssemblyAsync(Assembly, Output); + + Console.WriteLine($"Manifest generated: {Output}"); + return 0; + } + } + + /// + /// Test plugin in isolated environment. + /// + [Command("test")] + public static class TestCommand + { + [CommandOption("--project", "-p", Description = "Path to plugin project")] + public string Project { get; set; } = "."; + + [CommandOption("--filter", "-f", Description = "Test filter")] + public string? Filter { get; set; } + + public async Task ExecuteAsync() + { + Console.WriteLine($"Testing plugin: {Project}"); + + var tester = new PluginTestRunner(); + var result = await tester.RunAsync(Project, Filter); + + Console.WriteLine($"Tests: {result.Passed} passed, {result.Failed} failed"); + return result.Failed > 0 ? 1 : 0; + } + } +} + +/// +/// Generates plugin project structure from templates. 
+/// </summary>
+public sealed class PluginProjectGenerator
+{
+    public async Task GenerateAsync(string name, PluginCapabilities capability, string outputDir)
+    {
+        var projectDir = Path.Combine(outputDir, name);
+        Directory.CreateDirectory(projectDir);
+
+        // Generate .csproj
+        var csproj = GenerateCsproj(name, capability);
+        await File.WriteAllTextAsync(Path.Combine(projectDir, $"{name}.csproj"), csproj);
+
+        // Generate main plugin class
+        var pluginClass = GeneratePluginClass(name, capability);
+        await File.WriteAllTextAsync(Path.Combine(projectDir, $"{ToPascalCase(name)}Plugin.cs"), pluginClass);
+
+        // Generate plugin manifest
+        var manifest = GenerateManifest(name, capability);
+        await File.WriteAllTextAsync(Path.Combine(projectDir, "plugin.yaml"), manifest);
+
+        // Generate options class
+        var options = GenerateOptions(name);
+        await File.WriteAllTextAsync(Path.Combine(projectDir, $"{ToPascalCase(name)}Options.cs"), options);
+
+        // Generate test project
+        var testDir = Path.Combine(projectDir, "Tests");
+        Directory.CreateDirectory(testDir);
+
+        var testCsproj = GenerateTestCsproj(name);
+        await File.WriteAllTextAsync(Path.Combine(testDir, $"{name}.Tests.csproj"), testCsproj);
+
+        var testClass = GenerateTestClass(name);
+        await File.WriteAllTextAsync(Path.Combine(testDir, $"{ToPascalCase(name)}PluginTests.cs"), testClass);
+    }
+
+    private string GenerateCsproj(string name, PluginCapabilities capability) => """
+        <Project Sdk="Microsoft.NET.Sdk">
+          <PropertyGroup>
+            <TargetFramework>net10.0</TargetFramework>
+            <Nullable>enable</Nullable>
+            <ImplicitUsings>enable</ImplicitUsings>
+            <IsPackable>true</IsPackable>
+          </PropertyGroup>
+          <ItemGroup>
+            <PackageReference Include="StellaOps.Plugin.Sdk" Version="1.0.0" />
+          </ItemGroup>
+        </Project>
+        """;
+
+    private string GeneratePluginClass(string name, PluginCapabilities capability)
+    {
+        var className = ToPascalCase(name);
+        var capabilityInterface = GetCapabilityInterface(capability);
+
+        return $$"""
+            using StellaOps.Plugin.Abstractions;
+            using StellaOps.Plugin.Sdk;
+
+            namespace {{className}};
+
+            [Plugin(
+                id: "com.example.{{name.ToLowerInvariant()}}",
+                name: "{{className}}",
+                version: "1.0.0",
+                vendor: "Your Company")]
+            [ProvidesCapability(PluginCapabilities.{{capability}}, CapabilityId = "{{name.ToLowerInvariant()}}")]
+            public sealed class {{className}}Plugin : PluginBase{{(capabilityInterface != null ? $", {capabilityInterface}" : "")}}
+            {
+                private {{className}}Options? _options;
+
+                public override PluginInfo Info => new PluginInfoBuilder()
+                    .WithId("com.example.{{name.ToLowerInvariant()}}")
+                    .WithName("{{className}}")
+                    .WithVersion("1.0.0")
+                    .WithVendor("Your Company")
+                    .WithDescription("Description of your plugin")
+                    .Build();
+
+                public override PluginCapabilities Capabilities => PluginCapabilities.{{capability}};
+
+                protected override async Task OnInitializeAsync(IPluginContext context, CancellationToken ct)
+                {
+                    _options = context.Configuration.Bind<{{className}}Options>();
+
+                    // Add your initialization logic here
+                    Logger.Info("{{className}} plugin initialized");
+                }
+
+                public override async Task<HealthCheckResult> HealthCheckAsync(CancellationToken ct)
+                {
+                    // Add your health check logic here
+                    return HealthCheckResult.Healthy();
+                }
+
+                protected override async ValueTask OnDisposeAsync()
+                {
+                    // Add your cleanup logic here
+                }
+            }
+            """;
+    }
+
+    private string GenerateManifest(string name, PluginCapabilities capability) => $"""
+        plugin:
+          id: com.example.{name.ToLowerInvariant()}
+          name: {ToPascalCase(name)}
+          version: 1.0.0
+          vendor: Your Company
+          description: Description of your plugin
+          license: MIT
+
+        entryPoint: {ToPascalCase(name)}.{ToPascalCase(name)}Plugin
+
+        minPlatformVersion: 1.0.0
+
+        capabilities:
+          - type: {capability.ToString().ToLowerInvariant()}
+            id: {name.ToLowerInvariant()}
+
+        configSchema:
+          type: object
+          properties:
+            exampleSetting:
+              type: string
+              description: An example configuration setting
+          required: []
+        """;
+
+    private string GenerateOptions(string name) => $$"""
+        using System.ComponentModel.DataAnnotations;
+        using StellaOps.Plugin.Sdk;
+
+        namespace {{ToPascalCase(name)}};
+
+        public sealed class {{ToPascalCase(name)}}Options : PluginOptionsBase
+        {
+            [PluginConfig(Description = "An example configuration setting")]
+            public string? ExampleSetting { get; set; }
+        }
+        """;
+
+    private string GenerateTestCsproj(string name) => $"""
+        <Project Sdk="Microsoft.NET.Sdk">
+          <PropertyGroup>
+            <TargetFramework>net10.0</TargetFramework>
+            <Nullable>enable</Nullable>
+            <ImplicitUsings>enable</ImplicitUsings>
+            <IsPackable>false</IsPackable>
+          </PropertyGroup>
+          <ItemGroup>
+            <PackageReference Include="StellaOps.Plugin.Testing" Version="1.0.0" />
+            <PackageReference Include="Microsoft.NET.Test.Sdk" Version="17.12.0" />
+            <PackageReference Include="xunit" Version="2.9.2" />
+            <PackageReference Include="xunit.runner.visualstudio" Version="2.8.2" />
+          </ItemGroup>
+          <ItemGroup>
+            <ProjectReference Include="../{name}.csproj" />
+          </ItemGroup>
+        </Project>
+        """;
+
+    private string GenerateTestClass(string name) => $$"""
+        using StellaOps.Plugin.Testing;
+        using Xunit;
+
+        namespace {{ToPascalCase(name)}}.Tests;
+
+        public class {{ToPascalCase(name)}}PluginTests : PluginTestBase<{{ToPascalCase(name)}}Plugin>
+        {
+            [Fact]
+            public void Plugin_Initializes_Successfully()
+            {
+                // Assert plugin is in active state after initialization
+                Assert.Equal(PluginLifecycleState.Active, Plugin.State);
+            }
+
+            [Fact]
+            public async Task HealthCheck_Returns_Healthy()
+            {
+                var result = await Host.CheckHealthAsync(Plugin);
+                Assert.Equal(HealthStatus.Healthy, result.Status);
+            }
+        }
+        """;
+
+    private static string? GetCapabilityInterface(PluginCapabilities capability) => capability switch
+    {
+        PluginCapabilities.Crypto => "ICryptoCapability",
+        PluginCapabilities.Auth => "IAuthCapability",
+        PluginCapabilities.Llm => "ILlmCapability",
+        PluginCapabilities.Scm => "IScmCapability",
+        PluginCapabilities.Analysis => "IAnalysisCapability",
+        PluginCapabilities.Transport => "ITransportCapability",
+        PluginCapabilities.Feed => "IFeedCapability",
+        PluginCapabilities.WorkflowStep => "IStepProviderCapability",
+        PluginCapabilities.PromotionGate => "IGateProviderCapability",
+        _ => null
+    };
+
+    private static string ToPascalCase(string name)
+    {
+        return string.Join("", name.Split('-', '_')
+            .Where(s => s.Length > 0)
+            .Select(s => char.ToUpperInvariant(s[0]) + s[1..]));
+    }
+}
+
+/// <summary>
+/// Validates plugin manifests.
+/// </summary>
+public sealed class PluginManifestValidator
+{
+    private static readonly Regex PluginIdPattern = new(@"^[a-z][a-z0-9]*(\.[a-z][a-z0-9]*)+$", RegexOptions.Compiled);
+    private static readonly Regex SemVerPattern = new(@"^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?(\+[0-9A-Za-z.-]+)?$", RegexOptions.Compiled);
+
+    public async Task<ValidationResult> ValidateAsync(string manifestPath)
+    {
+        var errors = new List<string>();
+
+        if (!File.Exists(manifestPath))
+        {
+            errors.Add($"Manifest file not found: {manifestPath}");
+            return new ValidationResult(false, errors);
+        }
+
+        var content = await File.ReadAllTextAsync(manifestPath);
+
+        try
+        {
+            var manifest = YamlDeserializer.Deserialize<PluginManifest>(content);
+
+            // Validate required fields
+            if (string.IsNullOrEmpty(manifest.Plugin?.Id))
+                errors.Add("Plugin ID is required");
+
+            if (string.IsNullOrEmpty(manifest.Plugin?.Name))
+                errors.Add("Plugin name is required");
+
+            if (string.IsNullOrEmpty(manifest.Plugin?.Version))
+                errors.Add("Plugin version is required");
+
+            if (string.IsNullOrEmpty(manifest.EntryPoint))
+                errors.Add("Entry point is required");
+
+            // Validate ID format
+            if (manifest.Plugin?.Id != null && !PluginIdPattern.IsMatch(manifest.Plugin.Id))
+                errors.Add("Plugin ID must be in reverse domain notation (e.g., com.example.plugin)");
+
+            // Validate version format
+            if (manifest.Plugin?.Version != null && !SemVerPattern.IsMatch(manifest.Plugin.Version))
+                errors.Add("Plugin version must be valid SemVer");
+        }
+        catch (Exception ex)
+        {
+            errors.Add($"Failed to parse manifest: {ex.Message}");
+        }
+
+        return new ValidationResult(errors.Count == 0, errors);
+    }
+
+    public sealed record ValidationResult(bool IsValid, IReadOnlyList<string> Errors);
+}
+```
+
+### Sample Plugins
+
+```yaml
+# Directory structure for sample plugins
+samples/
+├── HelloWorldPlugin/ # Basic plugin example
+├── CustomStepPlugin/ # Workflow step example
+├── CustomGatePlugin/ # Promotion gate example
+├── WebhookReceiverPlugin/ # Webhook handling example
+└── MetricsCollectorPlugin/ # Metrics
capability example +``` + +--- + +## Acceptance Criteria + +- [ ] dotnet new templates work for all capability types +- [ ] Plugin CLI tool builds and runs +- [ ] `stellaops-plugin new` creates valid projects +- [ ] `stellaops-plugin validate` validates manifests +- [ ] `stellaops-plugin pack` creates distributable packages +- [ ] `stellaops-plugin test` runs plugin tests +- [ ] Testing framework provides all mock implementations +- [ ] Deterministic testing with FakeTimeProvider works +- [ ] HTTP request recording works in tests +- [ ] Sample plugins compile and pass tests +- [ ] Documentation is comprehensive +- [ ] API reference generated from XML docs + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 100_001 Plugin Abstractions | Internal | TODO | +| 100_002 Plugin Host | Internal | TODO | +| YamlDotNet | External | Available | +| McMaster.Extensions.CommandLineUtils | External | Available | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| StellaOps.Plugin.Sdk | TODO | | +| StellaOps.Plugin.Templates | TODO | | +| StellaOps.Plugin.Testing | TODO | | +| StellaOps.Plugin.Cli | TODO | | +| HelloWorldPlugin sample | TODO | | +| CustomStepPlugin sample | TODO | | +| CustomGatePlugin sample | TODO | | +| WebhookReceiverPlugin sample | TODO | | +| Developer documentation | TODO | | +| API reference | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_101_000_INDEX_foundation.md b/docs/implplan/SPRINT_20260110_101_000_INDEX_foundation.md new file mode 100644 index 000000000..a4c31dcec --- /dev/null +++ b/docs/implplan/SPRINT_20260110_101_000_INDEX_foundation.md @@ -0,0 +1,200 @@ +# SPRINT INDEX: Phase 1 - Foundation + +> **Epic:** Release Orchestrator +> **Phase:** 1 - Foundation +> **Batch:** 101 +> **Status:** TODO +> **Parent:** 
[100_000_INDEX](SPRINT_20260110_100_000_INDEX_release_orchestrator.md) +> **Prerequisites:** [100_000_INDEX - Plugin System Unification](SPRINT_20260110_100_000_INDEX_plugin_unification.md) (must be completed first) + +--- + +## Overview + +Phase 1 establishes the foundational infrastructure for the Release Orchestrator: database schema and Release Orchestrator-specific plugin extensions. The unified plugin system from Phase 100 provides the core plugin infrastructure; this phase builds on it with Release Orchestrator domain-specific capabilities. + +### Prerequisites + +**Phase 100 - Plugin System Unification** must be completed before starting Phase 101. Phase 100 provides: +- `IPlugin` base interface and lifecycle management +- `IPluginHost` and `PluginHost` implementation +- Database-backed plugin registry +- Plugin sandbox infrastructure +- Core capability interfaces (ICryptoCapability, IAuthCapability, etc.) +- Plugin SDK and developer tooling + +### Objectives + +- Create PostgreSQL schema for all release orchestration tables +- Extend plugin registry with Release Orchestrator-specific capability types +- Implement `IStepProviderCapability` for workflow steps +- Implement `IGateProviderCapability` for promotion gates +- Deliver built-in step and gate providers + +--- + +## Sprint Structure + +| Sprint ID | Title | Module | Status | Dependencies | +|-----------|-------|--------|--------|--------------| +| 101_001 | Database Schema - Core Tables | DB | TODO | Phase 100 complete | +| 101_002 | Plugin Registry Extensions | PLUGIN | TODO | 101_001, 100_003 | +| 101_003 | Loader & Sandbox Extensions | PLUGIN | TODO | 101_002, 100_002, 100_004 | +| 101_004 | SDK Extensions | PLUGIN | TODO | 101_003, 100_012 | + +> **Note:** Sprint numbers 101_002-101_004 now focus on Release Orchestrator-specific plugin extensions rather than duplicating the unified plugin infrastructure built in Phase 100. 
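+
+To make the step/gate objectives concrete, here is a minimal sketch of what the two Release Orchestrator-specific capability contracts could look like. The interface shapes, member names, and the `StepContext`/`StepResult`/`GateContext`/`GateResult` types are illustrative assumptions only; the authoritative contracts are delivered by sprints 101_002-101_004.
+
+```csharp
+// Sketch only — not the final contracts.
+public interface IStepProviderCapability
+{
+    // Step type key referenced from workflow template definitions (assumed).
+    string StepType { get; }
+    Task<StepResult> ExecuteAsync(StepContext context, CancellationToken ct);
+}
+
+public interface IGateProviderCapability
+{
+    // Gate type key recorded in release.gate_results (assumed).
+    string GateType { get; }
+    Task<GateResult> EvaluateAsync(GateContext context, CancellationToken ct);
+}
+```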
+ +--- + +## Architecture + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ FOUNDATION LAYER │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ DATABASE SCHEMA (101_001) │ │ +│ │ │ │ +│ │ release.integration_types release.environments │ │ +│ │ release.integrations release.targets │ │ +│ │ release.components release.releases │ │ +│ │ release.workflow_templates release.promotions │ │ +│ │ release.deployment_jobs release.evidence_packets │ │ +│ │ release.plugins release.agents │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ PLUGIN SYSTEM │ │ +│ │ │ │ +│ │ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │ │ +│ │ │ Plugin Registry │ │ Plugin Loader │ │ Plugin Sandbox │ │ │ +│ │ │ (101_002) │ │ (101_003) │ │ (101_003) │ │ │ +│ │ │ │ │ │ │ │ │ │ +│ │ │ - Discovery │ │ - Load/Unload │ │ - Process │ │ │ +│ │ │ - Versioning │ │ - Health check │ │ isolation │ │ │ +│ │ │ - Dependencies │ │ - Hot reload │ │ - Resource │ │ │ +│ │ │ - Manifest │ │ - Lifecycle │ │ limits │ │ │ +│ │ └─────────────────┘ └─────────────────┘ └─────────────────┘ │ │ +│ │ │ │ +│ │ ┌─────────────────────────────────────────────────────────────────┐ │ │ +│ │ │ Plugin SDK (101_004) │ │ │ +│ │ │ │ │ │ +│ │ │ - Connector interfaces - Step provider interfaces │ │ │ +│ │ │ - Gate provider interfaces - Manifest builder │ │ │ +│ │ │ - Testing utilities - Documentation templates │ │ │ +│ │ └─────────────────────────────────────────────────────────────────┘ │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +--- + +## Deliverables Summary + +### 101_001: Database Schema + +| Deliverable | Type | Description | +|-------------|------|-------------| +| Migration 001 | SQL | 
Integration hub tables | +| Migration 002 | SQL | Environment tables | +| Migration 003 | SQL | Release management tables | +| Migration 004 | SQL | Workflow engine tables | +| Migration 005 | SQL | Promotion tables | +| Migration 006 | SQL | Deployment tables | +| Migration 007 | SQL | Agent tables | +| Migration 008 | SQL | Evidence tables | +| Migration 009 | SQL | Plugin tables | +| RLS Policies | SQL | Row-level security | +| Indexes | SQL | Performance indexes | + +### 101_002: Plugin Registry + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `IPluginRegistry` | Interface | Plugin discovery/versioning | +| `PluginRegistry` | Class | Implementation | +| `PluginManifest` | Record | Manifest schema | +| `PluginManifestValidator` | Class | Schema validation | +| `PluginVersion` | Record | SemVer handling | +| `PluginDependencyResolver` | Class | Dependency resolution | + +### 101_003: Plugin Loader & Sandbox + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `IPluginLoader` | Interface | Load/unload/reload | +| `PluginLoader` | Class | Implementation | +| `IPluginSandbox` | Interface | Isolation contract | +| `ContainerSandbox` | Class | Container-based isolation | +| `ProcessSandbox` | Class | Process-based isolation | +| `ResourceLimiter` | Class | CPU/memory limits | +| `PluginHealthMonitor` | Class | Health checking | + +### 101_004: Plugin SDK + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `StellaOps.Plugin.Sdk` | NuGet | SDK package | +| `IConnectorPlugin` | Interface | Connector contract | +| `IStepProvider` | Interface | Step contract | +| `IGateProvider` | Interface | Gate contract | +| `ManifestBuilder` | Class | Fluent manifest building | +| Plugin Templates | dotnet new | Project templates | +| Documentation | Markdown | SDK documentation | + +--- + +## Dependencies + +### Phase Dependencies + +| Phase | Purpose | Status | +|-------|---------|--------| +| 
**Phase 100 - Plugin System Unification** | Unified plugin infrastructure | TODO | +| 100_001 Plugin Abstractions | IPlugin, capabilities | TODO | +| 100_002 Plugin Host | Lifecycle management | TODO | +| 100_003 Plugin Registry | Database registry | TODO | +| 100_004 Plugin Sandbox | Process isolation | TODO | +| 100_012 Plugin SDK | Developer tooling | TODO | + +### External Dependencies + +| Dependency | Purpose | +|------------|---------| +| PostgreSQL 16+ | Database | +| Docker | Plugin sandbox (via Phase 100) | +| gRPC | Plugin communication (via Phase 100) | + +### Internal Dependencies + +| Module | Purpose | +|--------|---------| +| Authority | Tenant context, permissions | +| Telemetry | Metrics, tracing | +| StellaOps.Plugin.Abstractions | Core plugin interfaces (from Phase 100) | +| StellaOps.Plugin.Host | Plugin host (from Phase 100) | +| StellaOps.Plugin.Sdk | SDK library (from Phase 100) | + +--- + +## Acceptance Criteria + +- [ ] All database migrations execute successfully +- [ ] RLS policies enforce tenant isolation +- [ ] Plugin manifest validation covers all required fields +- [ ] Plugin loader can load, start, stop, and unload plugins +- [ ] Sandbox enforces resource limits +- [ ] SDK compiles to NuGet package +- [ ] Sample plugin builds and loads successfully +- [ ] Unit test coverage ≥80% +- [ ] Integration tests pass + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Phase 1 index created | +| 10-Jan-2026 | Added Phase 100 (Plugin System Unification) as prerequisite - plugin infrastructure now centralized | diff --git a/docs/implplan/SPRINT_20260110_101_001_DB_schema_core_tables.md b/docs/implplan/SPRINT_20260110_101_001_DB_schema_core_tables.md new file mode 100644 index 000000000..329a2ac65 --- /dev/null +++ b/docs/implplan/SPRINT_20260110_101_001_DB_schema_core_tables.md @@ -0,0 +1,617 @@ +# SPRINT: Database Schema - Core Tables + +> **Sprint ID:** 101_001 +> **Module:** DB +> **Phase:** 1 - Foundation +> 
**Status:** TODO +> **Parent:** [101_000_INDEX](SPRINT_20260110_101_000_INDEX_foundation.md) + +--- + +## Overview + +Create the PostgreSQL schema for all Release Orchestrator tables within the `release` schema. This sprint establishes the data model foundation for all subsequent modules. + +> **NORMATIVE:** This sprint MUST comply with [docs/db/SPECIFICATION.md](../../db/SPECIFICATION.md) which defines the authoritative database design patterns for Stella Ops, including schema ownership, RLS policies, UUID generation, and JSONB conventions. + +### Objectives + +- Create `release` schema with RLS policies per SPECIFICATION.md +- Implement all core tables for 10 platform themes +- Add performance indexes and constraints +- Create audit triggers for append-only tables +- Use `require_current_tenant()` RLS helper pattern +- Add generated columns for JSONB hot paths + +### Working Directory + +``` +src/Platform/__Libraries/StellaOps.Platform.Database/ +├── Migrations/ +│ └── Release/ +│ ├── 001_IntegrationHub.sql +│ ├── 002_Environments.sql +│ ├── 003_ReleaseManagement.sql +│ ├── 004_Workflow.sql +│ ├── 005_Promotion.sql +│ ├── 006_Deployment.sql +│ ├── 007_Agents.sql +│ ├── 008_Evidence.sql +│ └── 009_Plugin.sql +└── ReleaseSchema/ + ├── Tables/ + ├── Indexes/ + ├── Functions/ + └── Policies/ +``` + +--- + +## Architecture Reference + +- [Data Model Schema](../modules/release-orchestrator/data-model/schema.md) +- [Entity Definitions](../modules/release-orchestrator/data-model/entities.md) +- [Security Overview](../modules/release-orchestrator/security/overview.md) + +--- + +## Deliverables + +### Migration 001: Integration Hub Tables + +| Table | Description | Key Columns | +|-------|-------------|-------------| +| `release.integration_types` | Enum-like type registry | `id`, `name`, `category` | +| `release.integrations` | Configured integrations | `id`, `tenant_id`, `type_id`, `name`, `config_encrypted` | +| `release.integration_health_checks` | Health check history 
| `id`, `integration_id`, `status`, `checked_at` | + +```sql +-- release.integration_types +CREATE TABLE release.integration_types ( + id TEXT PRIMARY KEY, + name TEXT NOT NULL, + category TEXT NOT NULL CHECK (category IN ('scm', 'ci', 'registry', 'vault', 'notify')), + description TEXT, + config_schema JSONB NOT NULL, + created_at TIMESTAMPTZ NOT NULL DEFAULT now() +); + +-- release.integrations +CREATE TABLE release.integrations ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id), + type_id TEXT NOT NULL REFERENCES release.integration_types(id), + name TEXT NOT NULL, + display_name TEXT NOT NULL, + config_encrypted BYTEA NOT NULL, + is_enabled BOOLEAN NOT NULL DEFAULT true, + health_status TEXT NOT NULL DEFAULT 'unknown', + last_health_check TIMESTAMPTZ, + created_at TIMESTAMPTZ NOT NULL DEFAULT now(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT now(), + created_by UUID NOT NULL, + UNIQUE (tenant_id, name) +); +``` + +### Migration 002: Environment Tables + +| Table | Description | Key Columns | +|-------|-------------|-------------| +| `release.environments` | Deployment environments | `id`, `tenant_id`, `name`, `order_index` | +| `release.targets` | Deployment targets | `id`, `environment_id`, `type`, `connection_config` | +| `release.freeze_windows` | Deployment freeze periods | `id`, `environment_id`, `start_at`, `end_at` | + +```sql +-- release.environments +CREATE TABLE release.environments ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id), + name TEXT NOT NULL, + display_name TEXT NOT NULL, + description TEXT, + order_index INT NOT NULL, + is_production BOOLEAN NOT NULL DEFAULT false, + required_approvals INT NOT NULL DEFAULT 0, + require_separation_of_duties BOOLEAN NOT NULL DEFAULT false, + auto_promote_from UUID REFERENCES release.environments(id), + deployment_timeout_seconds INT NOT NULL DEFAULT 600, + created_at TIMESTAMPTZ NOT NULL DEFAULT now(), + 
updated_at TIMESTAMPTZ NOT NULL DEFAULT now(), + created_by UUID NOT NULL, + UNIQUE (tenant_id, name), + UNIQUE (tenant_id, order_index) +); + +-- release.targets +CREATE TABLE release.targets ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id), + environment_id UUID NOT NULL REFERENCES release.environments(id), + name TEXT NOT NULL, + display_name TEXT NOT NULL, + type TEXT NOT NULL CHECK (type IN ('docker_host', 'compose_host', 'ecs_service', 'nomad_job')), + connection_config_encrypted BYTEA NOT NULL, + agent_id UUID, + health_status TEXT NOT NULL DEFAULT 'unknown', + last_health_check TIMESTAMPTZ, + last_sync_at TIMESTAMPTZ, + inventory_snapshot JSONB, + created_at TIMESTAMPTZ NOT NULL DEFAULT now(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT now(), + UNIQUE (tenant_id, environment_id, name) +); +``` + +### Migration 003: Release Management Tables + +| Table | Description | Key Columns | +|-------|-------------|-------------| +| `release.components` | Container components | `id`, `tenant_id`, `name`, `registry_integration_id` | +| `release.component_versions` | Version snapshots | `id`, `component_id`, `digest`, `semver` | +| `release.releases` | Release bundles | `id`, `tenant_id`, `name`, `status` | +| `release.release_components` | Release-component mapping | `release_id`, `component_version_id` | + +```sql +-- release.components +CREATE TABLE release.components ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id), + name TEXT NOT NULL, + display_name TEXT NOT NULL, + description TEXT, + registry_integration_id UUID NOT NULL REFERENCES release.integrations(id), + repository TEXT NOT NULL, + scm_integration_id UUID REFERENCES release.integrations(id), + scm_repository TEXT, + created_at TIMESTAMPTZ NOT NULL DEFAULT now(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT now(), + UNIQUE (tenant_id, name) +); + +-- release.releases +CREATE TABLE release.releases ( + id UUID 
PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id), + name TEXT NOT NULL, + display_name TEXT NOT NULL, + description TEXT, + status TEXT NOT NULL DEFAULT 'draft' CHECK (status IN ('draft', 'ready', 'promoting', 'deployed', 'deprecated')), + source_commit_sha TEXT, + source_branch TEXT, + ci_build_id TEXT, + ci_pipeline_url TEXT, + finalized_at TIMESTAMPTZ, + finalized_by UUID, + created_at TIMESTAMPTZ NOT NULL DEFAULT now(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT now(), + created_by UUID NOT NULL, + UNIQUE (tenant_id, name) +); +``` + +### Migration 004: Workflow Tables + +| Table | Description | Key Columns | +|-------|-------------|-------------| +| `release.workflow_templates` | DAG templates | `id`, `tenant_id`, `name`, `definition` | +| `release.workflow_runs` | Workflow executions | `id`, `template_id`, `status` | +| `release.workflow_steps` | Step definitions | `id`, `run_id`, `step_type`, `status` | + +```sql +-- release.workflow_templates +CREATE TABLE release.workflow_templates ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id), + name TEXT NOT NULL, + display_name TEXT NOT NULL, + description TEXT, + definition JSONB NOT NULL, + version INT NOT NULL DEFAULT 1, + is_active BOOLEAN NOT NULL DEFAULT true, + created_at TIMESTAMPTZ NOT NULL DEFAULT now(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT now(), + UNIQUE (tenant_id, name, version) +); + +-- release.workflow_runs +CREATE TABLE release.workflow_runs ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id), + template_id UUID NOT NULL REFERENCES release.workflow_templates(id), + template_version INT NOT NULL, + context_type TEXT NOT NULL, + context_id UUID NOT NULL, + status TEXT NOT NULL DEFAULT 'pending' CHECK (status IN ('pending', 'running', 'succeeded', 'failed', 'cancelled')), + started_at TIMESTAMPTZ, + completed_at TIMESTAMPTZ, + error_message TEXT, + created_at 
TIMESTAMPTZ NOT NULL DEFAULT now() +); +``` + +### Migration 005: Promotion Tables + +| Table | Description | Key Columns | +|-------|-------------|-------------| +| `release.promotions` | Promotion requests | `id`, `release_id`, `target_environment_id`, `status` | +| `release.approvals` | Approval records | `id`, `promotion_id`, `approver_id`, `decision` | +| `release.gate_results` | Gate evaluation results | `id`, `promotion_id`, `gate_type`, `passed` | + +```sql +-- release.promotions +CREATE TABLE release.promotions ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id), + release_id UUID NOT NULL REFERENCES release.releases(id), + source_environment_id UUID REFERENCES release.environments(id), + target_environment_id UUID NOT NULL REFERENCES release.environments(id), + status TEXT NOT NULL DEFAULT 'pending' CHECK (status IN ( + 'pending', 'awaiting_approval', 'approved', 'rejected', + 'deploying', 'deployed', 'failed', 'cancelled', 'rolled_back' + )), + requested_by UUID NOT NULL, + requested_at TIMESTAMPTZ NOT NULL DEFAULT now(), + request_reason TEXT, + decision TEXT CHECK (decision IN ('allow', 'block')), + decided_at TIMESTAMPTZ, + deployment_job_id UUID, + created_at TIMESTAMPTZ NOT NULL DEFAULT now(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT now() +); + +-- release.approvals (append-only) +CREATE TABLE release.approvals ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id), + promotion_id UUID NOT NULL REFERENCES release.promotions(id), + approver_id UUID NOT NULL, + decision TEXT NOT NULL CHECK (decision IN ('approved', 'rejected')), + comment TEXT, + created_at TIMESTAMPTZ NOT NULL DEFAULT now() + -- No updated_at - append only +); + +-- Prevent modifications to approvals +REVOKE UPDATE, DELETE ON release.approvals FROM app_role; +``` + +### Migration 006: Deployment Tables + +| Table | Description | Key Columns | +|-------|-------------|-------------| +| 
`release.deployment_jobs` | Deployment executions | `id`, `promotion_id`, `strategy`, `status` | +| `release.deployment_tasks` | Per-target tasks | `id`, `job_id`, `target_id`, `status` | +| `release.deployment_artifacts` | Generated artifacts | `id`, `job_id`, `type`, `storage_ref` | + +```sql +-- release.deployment_jobs +CREATE TABLE release.deployment_jobs ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id), + promotion_id UUID NOT NULL REFERENCES release.promotions(id), + strategy TEXT NOT NULL DEFAULT 'rolling' CHECK (strategy IN ('rolling', 'blue_green', 'canary', 'all_at_once')), + status TEXT NOT NULL DEFAULT 'pending' CHECK (status IN ( + 'pending', 'pulling', 'deploying', 'verifying', + 'succeeded', 'failed', 'rolling_back', 'rolled_back' + )), + started_at TIMESTAMPTZ, + completed_at TIMESTAMPTZ, + error_message TEXT, + created_at TIMESTAMPTZ NOT NULL DEFAULT now(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT now() +); + +-- release.deployment_tasks +CREATE TABLE release.deployment_tasks ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id), + job_id UUID NOT NULL REFERENCES release.deployment_jobs(id), + target_id UUID NOT NULL REFERENCES release.targets(id), + agent_id UUID, + status TEXT NOT NULL DEFAULT 'pending' CHECK (status IN ( + 'pending', 'assigned', 'pulling', 'deploying', + 'verifying', 'succeeded', 'failed' + )), + digest_deployed TEXT, + sticker_written BOOLEAN NOT NULL DEFAULT false, + started_at TIMESTAMPTZ, + completed_at TIMESTAMPTZ, + error_message TEXT, + created_at TIMESTAMPTZ NOT NULL DEFAULT now(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT now() +); +``` + +### Migration 007: Agent Tables + +| Table | Description | Key Columns | +|-------|-------------|-------------| +| `release.agents` | Registered agents | `id`, `tenant_id`, `name`, `status` | +| `release.agent_capabilities` | Agent capabilities | `agent_id`, `capability` | +| 
`release.agent_heartbeats` | Heartbeat history | `id`, `agent_id`, `received_at` | + +```sql +-- release.agents +CREATE TABLE release.agents ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id), + name TEXT NOT NULL, + display_name TEXT NOT NULL, + version TEXT NOT NULL, + status TEXT NOT NULL DEFAULT 'pending' CHECK (status IN ('pending', 'active', 'inactive', 'revoked')), + certificate_thumbprint TEXT, + certificate_expires_at TIMESTAMPTZ, + last_heartbeat_at TIMESTAMPTZ, + last_heartbeat_status JSONB, + registered_at TIMESTAMPTZ, + created_at TIMESTAMPTZ NOT NULL DEFAULT now(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT now(), + UNIQUE (tenant_id, name) +); + +-- release.agent_capabilities +CREATE TABLE release.agent_capabilities ( + agent_id UUID NOT NULL REFERENCES release.agents(id) ON DELETE CASCADE, + capability TEXT NOT NULL CHECK (capability IN ('docker', 'compose', 'ssh', 'winrm')), + config JSONB, + PRIMARY KEY (agent_id, capability) +); +``` + +### Migration 008: Evidence Tables + +| Table | Description | Key Columns | +|-------|-------------|-------------| +| `release.evidence_packets` | Signed evidence (append-only) | `id`, `promotion_id`, `type`, `content` | + +```sql +-- release.evidence_packets (append-only, immutable) +CREATE TABLE release.evidence_packets ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id), + promotion_id UUID NOT NULL REFERENCES release.promotions(id), + type TEXT NOT NULL CHECK (type IN ('release_decision', 'deployment', 'rollback', 'ab_promotion')), + version TEXT NOT NULL DEFAULT '1.0', + content JSONB NOT NULL, + content_hash TEXT NOT NULL, + signature TEXT NOT NULL, + signature_algorithm TEXT NOT NULL, + signer_key_ref TEXT NOT NULL, + generated_at TIMESTAMPTZ NOT NULL, + generator_version TEXT NOT NULL, + created_at TIMESTAMPTZ NOT NULL DEFAULT now() + -- No updated_at - packets are immutable +); + +-- Prevent modifications 
+REVOKE UPDATE, DELETE ON release.evidence_packets FROM app_role;
+
+-- Index for quick lookups
+CREATE INDEX idx_evidence_packets_promotion ON release.evidence_packets(promotion_id);
+CREATE INDEX idx_evidence_packets_type ON release.evidence_packets(tenant_id, type);
+```
+
+### Migration 009: Plugin Tables
+
+| Table | Description | Key Columns |
+|-------|-------------|-------------|
+| `release.plugins` | Registered plugins | `id`, `tenant_id`, `name`, `type` |
+| `release.plugin_versions` | Plugin versions | `id`, `plugin_id`, `version`, `manifest` |
+
+```sql
+-- release.plugins
+CREATE TABLE release.plugins (
+    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+    tenant_id UUID REFERENCES tenants(id), -- NULL for system plugins
+    name TEXT NOT NULL,
+    display_name TEXT NOT NULL,
+    description TEXT,
+    type TEXT NOT NULL CHECK (type IN ('connector', 'step', 'gate')),
+    is_builtin BOOLEAN NOT NULL DEFAULT false,
+    is_enabled BOOLEAN NOT NULL DEFAULT true,
+    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
+    updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
+);
+
+-- Expressions are not allowed in UNIQUE table constraints, so enforce
+-- name uniqueness (including NULL tenant_id for system plugins) via a
+-- unique expression index instead.
+CREATE UNIQUE INDEX uq_plugins_tenant_name ON release.plugins (
+    COALESCE(tenant_id, '00000000-0000-0000-0000-000000000000'::UUID), name
+);
+
+-- release.plugin_versions
+CREATE TABLE release.plugin_versions (
+    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+    plugin_id UUID NOT NULL REFERENCES release.plugins(id),
+    version TEXT NOT NULL,
+    manifest JSONB NOT NULL,
+    package_hash TEXT NOT NULL,
+    package_url TEXT,
+    is_active BOOLEAN NOT NULL DEFAULT false,
+    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
+    UNIQUE (plugin_id, version)
+);
+```
+
+### RLS Policies
+
+Following the pattern established in `docs/db/SPECIFICATION.md`, all RLS policies use the `require_current_tenant()` helper function for consistent tenant isolation.
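+
+As a usage sketch, a request-scoped database session sets the tenant GUC that the helper reads, after which row-level security scopes every statement in the transaction (the tenant UUID below is a placeholder):
+
+```sql
+BEGIN;
+-- Set by the application (e.g., Authority middleware) after authentication.
+SET LOCAL app.current_tenant_id = '11111111-1111-1111-1111-111111111111';
+SELECT id, name, status FROM release.releases;  -- RLS returns only this tenant's rows
+COMMIT;
+```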
+ +```sql +-- Create helper function per SPECIFICATION.md Section 2.3 +CREATE OR REPLACE FUNCTION release_app.require_current_tenant() +RETURNS UUID +LANGUAGE sql +STABLE +AS $$ + SELECT COALESCE( + NULLIF(current_setting('app.current_tenant_id', true), '')::UUID, + (SELECT id FROM shared.tenants WHERE is_default = true LIMIT 1) + ) +$$; + +-- Enable RLS on all tables +ALTER TABLE release.integrations ENABLE ROW LEVEL SECURITY; +ALTER TABLE release.environments ENABLE ROW LEVEL SECURITY; +ALTER TABLE release.targets ENABLE ROW LEVEL SECURITY; +ALTER TABLE release.components ENABLE ROW LEVEL SECURITY; +ALTER TABLE release.releases ENABLE ROW LEVEL SECURITY; +ALTER TABLE release.promotions ENABLE ROW LEVEL SECURITY; +ALTER TABLE release.approvals ENABLE ROW LEVEL SECURITY; +ALTER TABLE release.deployment_jobs ENABLE ROW LEVEL SECURITY; +ALTER TABLE release.deployment_tasks ENABLE ROW LEVEL SECURITY; +ALTER TABLE release.agents ENABLE ROW LEVEL SECURITY; +ALTER TABLE release.evidence_packets ENABLE ROW LEVEL SECURITY; +ALTER TABLE release.plugins ENABLE ROW LEVEL SECURITY; + +-- Standard tenant isolation policy using helper (example for integrations) +CREATE POLICY tenant_isolation ON release.integrations + USING (tenant_id = release_app.require_current_tenant()); + +-- Repeat pattern for all tenant-scoped tables +``` + +### Performance Indexes + +```sql +-- High-cardinality lookup indexes +CREATE INDEX idx_releases_tenant_status ON release.releases(tenant_id, status); +CREATE INDEX idx_promotions_tenant_status ON release.promotions(tenant_id, status); +CREATE INDEX idx_promotions_release ON release.promotions(release_id); +CREATE INDEX idx_deployment_jobs_promotion ON release.deployment_jobs(promotion_id); +CREATE INDEX idx_deployment_tasks_job ON release.deployment_tasks(job_id); +CREATE INDEX idx_agents_tenant_status ON release.agents(tenant_id, status); +CREATE INDEX idx_targets_environment ON release.targets(environment_id); +CREATE INDEX idx_targets_agent ON 
release.targets(agent_id); + +-- Partial indexes for active records +CREATE INDEX idx_promotions_pending ON release.promotions(tenant_id, target_environment_id) + WHERE status IN ('pending', 'awaiting_approval'); +CREATE INDEX idx_agents_active ON release.agents(tenant_id) + WHERE status = 'active'; +``` + +### Generated Columns for JSONB Hot Paths + +Per `docs/db/SPECIFICATION.md` Section 4.5, use generated columns for frequently-queried JSONB fields to enable efficient indexing and avoid repeated JSON parsing. + +```sql +-- Evidence packets: extract release_id for quick lookups +ALTER TABLE release.evidence_packets + ADD COLUMN release_id UUID GENERATED ALWAYS AS ( + (content->>'releaseId')::UUID + ) STORED; + +-- Evidence packets: extract what.type for filtering by evidence type +ALTER TABLE release.evidence_packets + ADD COLUMN evidence_what_type TEXT GENERATED ALWAYS AS ( + content->'what'->>'type' + ) STORED; + +-- Workflow templates: extract step count for UI display +ALTER TABLE release.workflow_templates + ADD COLUMN step_count INT GENERATED ALWAYS AS ( + jsonb_array_length(COALESCE(definition->'steps', '[]'::JSONB)) + ) STORED; + +-- Agents: extract primary capability from last heartbeat +ALTER TABLE release.agents + ADD COLUMN primary_capability TEXT GENERATED ALWAYS AS ( + last_heartbeat_status->>'primaryCapability' + ) STORED; + +-- Targets: extract deployed digest from inventory snapshot +ALTER TABLE release.targets + ADD COLUMN current_digest TEXT GENERATED ALWAYS AS ( + inventory_snapshot->>'digest' + ) STORED; + +-- Index generated columns for efficient queries +CREATE INDEX idx_evidence_packets_release ON release.evidence_packets(release_id); +CREATE INDEX idx_evidence_packets_what_type ON release.evidence_packets(tenant_id, evidence_what_type); +CREATE INDEX idx_targets_current_digest ON release.targets(current_digest) WHERE current_digest IS NOT NULL; +``` + +--- + +## Acceptance Criteria + +- [ ] All 9 migrations execute successfully in order 
+- [ ] Schema complies with docs/db/SPECIFICATION.md +- [ ] RLS policies use `require_current_tenant()` helper +- [ ] RLS policies enforce tenant isolation +- [ ] Append-only tables reject UPDATE/DELETE +- [ ] All foreign key constraints valid +- [ ] Performance indexes created +- [ ] Generated columns created for JSONB hot paths +- [ ] Schema documentation generated +- [ ] Migration rollback scripts created +- [ ] Integration tests pass with Testcontainers + +--- + +## Test Plan + +### Unit Tests + +| Test | Description | +|------|-------------| +| `MigrationOrderTest` | Verify migrations run in dependency order | +| `RlsPolicyTest` | Verify tenant isolation enforced | +| `AppendOnlyTest` | Verify UPDATE/DELETE rejected on evidence tables | +| `ForeignKeyTest` | Verify all FK constraints | + +### Integration Tests + +| Test | Description | +|------|-------------| +| `SchemaCreationTest` | Full schema creation on fresh database | +| `MigrationIdempotencyTest` | Migrations can be re-run safely | +| `PerformanceIndexTest` | Verify indexes used in common queries | + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| PostgreSQL 16+ | External | Available | +| `tenants` table | Internal | Exists | +| Testcontainers | Testing | Available | + +--- + +## Risks & Mitigations + +| Risk | Impact | Mitigation | +|------|--------|------------| +| Schema conflicts with existing tables | High | Use dedicated `release` schema | +| Migration performance on large DBs | Medium | Use concurrent index creation | +| RLS policy overhead | Low | Benchmark and optimize | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| Migration 001 - Integration Hub | TODO | | +| Migration 002 - Environments | TODO | | +| Migration 003 - Release Management | TODO | | +| Migration 004 - Workflow | TODO | | +| Migration 005 - Promotion | TODO | | +| Migration 006 - Deployment | TODO | | +| Migration 007 - Agents 
| TODO | | +| Migration 008 - Evidence | TODO | | +| Migration 009 - Plugin | TODO | | +| RLS Policies | TODO | Uses `require_current_tenant()` helper | +| Performance Indexes | TODO | | +| Generated Columns | TODO | JSONB hot paths for evidence, workflows, agents | +| Integration Tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | +| 10-Jan-2026 | Added reference to docs/db/SPECIFICATION.md as normative | +| 10-Jan-2026 | Added require_current_tenant() RLS helper pattern | +| 10-Jan-2026 | Added generated columns for JSONB hot paths (evidence_packets, workflow_templates, agents, targets) | diff --git a/docs/implplan/SPRINT_20260110_101_002_PLUGIN_registry.md b/docs/implplan/SPRINT_20260110_101_002_PLUGIN_registry.md new file mode 100644 index 000000000..1e483eead --- /dev/null +++ b/docs/implplan/SPRINT_20260110_101_002_PLUGIN_registry.md @@ -0,0 +1,938 @@ +# SPRINT: Plugin Registry Extensions for Release Orchestrator + +> **Sprint ID:** 101_002 +> **Module:** PLUGIN +> **Phase:** 1 - Foundation +> **Status:** TODO +> **Parent:** [101_000_INDEX](SPRINT_20260110_101_000_INDEX_foundation.md) +> **Prerequisites:** [100_003 Plugin Registry](SPRINT_20260110_100_003_PLUGIN_registry.md), [100_001 Plugin Abstractions](SPRINT_20260110_100_001_PLUGIN_abstractions.md) + +--- + +## Overview + +Extend the unified plugin registry (from Phase 100) with Release Orchestrator-specific capability types, including workflow step providers, promotion gate providers, and integration connectors. This sprint builds on top of the core `IPluginRegistry` infrastructure. + +> **Note:** The core plugin registry (`IPluginRegistry`, `PostgresPluginRegistry`, database schema) is implemented in Phase 100 sprint 100_003. This sprint adds Release Orchestrator domain-specific extensions. 
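To make the extension relationship concrete before the formal deliverables, here is a minimal illustrative C# sketch of the pattern this sprint follows: query the Phase 100 registry by capability, then downcast to a domain-specific interface. Names other than `IPluginRegistry` (e.g. `IExampleGateLookup`, `PluginInfo`, the exact `QueryByCapabilityAsync` signature) are assumptions, not final API, and are subject to the interfaces defined under Deliverables.

```csharp
// Hedged sketch only — IPluginRegistry comes from Phase 100 (sprint 100_003);
// PluginInfo and PluginCapabilities.PromotionGate follow the conventions used
// later in this document but are not yet frozen API.
public sealed class IExampleGateLookup
{
    private readonly IPluginRegistry _registry; // core registry from Phase 100

    public IExampleGateLookup(IPluginRegistry registry) => _registry = registry;

    // The same discovery pattern the domain registries in this sprint use:
    // query plugins advertising a capability, then the caller downcasts each
    // plugin instance to the Release Orchestrator capability interface.
    public Task<IReadOnlyList<PluginInfo>> FindGatePluginsAsync(CancellationToken ct) =>
        _registry.QueryByCapabilityAsync(PluginCapabilities.PromotionGate, ct);
}
```

The point of the sketch: the core registry stays domain-agnostic; everything Release Orchestrator-specific lives in the capability interfaces and thin query wrappers added by this sprint.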
+ +### Objectives + +- Register Release Orchestrator capability interfaces with the plugin system +- Define `IStepProviderCapability` for workflow steps +- Define `IGateProviderCapability` for promotion gates +- Define `IConnectorCapability` variants for Integration Hub +- Create domain-specific registry queries +- Add capability-specific validation + +### Working Directory + +``` +src/ReleaseOrchestrator/ +├── __Libraries/ +│ └── StellaOps.ReleaseOrchestrator.Plugin/ +│ ├── Capabilities/ +│ │ ├── IStepProviderCapability.cs +│ │ ├── IGateProviderCapability.cs +│ │ ├── IScmConnectorCapability.cs +│ │ ├── IRegistryConnectorCapability.cs +│ │ ├── IVaultConnectorCapability.cs +│ │ ├── INotifyConnectorCapability.cs +│ │ └── ICiConnectorCapability.cs +│ ├── Registry/ +│ │ ├── ReleaseOrchestratorPluginRegistry.cs +│ │ ├── StepProviderRegistry.cs +│ │ ├── GateProviderRegistry.cs +│ │ └── ConnectorRegistry.cs +│ └── Models/ +│ ├── StepDefinition.cs +│ ├── GateDefinition.cs +│ └── ConnectorDefinition.cs +└── __Tests/ + └── StellaOps.ReleaseOrchestrator.Plugin.Tests/ + ├── StepProviderRegistryTests.cs + ├── GateProviderRegistryTests.cs + └── ConnectorRegistryTests.cs +``` + +--- + +## Architecture Reference + +- [Phase 100 Plugin System](SPRINT_20260110_100_000_INDEX_plugin_unification.md) +- [Plugin System](../modules/release-orchestrator/modules/plugin-system.md) +- [Workflow Engine](../modules/release-orchestrator/modules/workflow-engine.md) + +--- + +## Deliverables + +### IStepProviderCapability Interface + +```csharp +namespace StellaOps.ReleaseOrchestrator.Plugin.Capabilities; + +/// +/// Capability interface for workflow step providers. +/// Plugins implementing this capability can provide custom workflow steps. +/// +public interface IStepProviderCapability +{ + /// + /// Get step definitions provided by this plugin. + /// + IReadOnlyList GetStepDefinitions(); + + /// + /// Execute a step. 
+    /// </summary>
+    Task<StepResult> ExecuteStepAsync(StepExecutionContext context, CancellationToken ct);
+
+    /// <summary>
+    /// Validate step configuration before execution.
+    /// </summary>
+    Task<StepValidationResult> ValidateStepConfigAsync(
+        string stepType,
+        JsonElement configuration,
+        CancellationToken ct);
+
+    /// <summary>
+    /// Get step output schema for a step type.
+    /// </summary>
+    JsonSchema? GetOutputSchema(string stepType);
+}
+
+public sealed record StepDefinition(
+    string Type,
+    string DisplayName,
+    string Description,
+    string Category,
+    JsonSchema ConfigSchema,
+    JsonSchema OutputSchema,
+    IReadOnlyList<string> RequiredCapabilities,
+    bool SupportsRetry,
+    TimeSpan DefaultTimeout);
+
+public sealed record StepExecutionContext(
+    Guid StepId,
+    Guid WorkflowRunId,
+    Guid TenantId,
+    string StepType,
+    JsonElement Configuration,
+    IReadOnlyDictionary<string, JsonElement> Inputs,
+    IStepOutputWriter OutputWriter,
+    IPluginLogger Logger);
+
+public sealed record StepResult(
+    StepStatus Status,
+    IReadOnlyDictionary<string, JsonElement> Outputs,
+    string? ErrorMessage = null,
+    TimeSpan? Duration = null);
+
+public enum StepStatus
+{
+    Succeeded,
+    Failed,
+    Skipped,
+    TimedOut
+}
+
+public sealed record StepValidationResult(
+    bool IsValid,
+    IReadOnlyList<string> Errors)
+{
+    public static StepValidationResult Success() => new(true, []);
+    public static StepValidationResult Failure(params string[] errors) => new(false, errors);
+}
+```
+
+### IGateProviderCapability Interface
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Plugin.Capabilities;
+
+/// <summary>
+/// Capability interface for promotion gate providers.
+/// Plugins implementing this capability can provide custom promotion gates.
+/// </summary>
+public interface IGateProviderCapability
+{
+    /// <summary>
+    /// Get gate definitions provided by this plugin.
+    /// </summary>
+    IReadOnlyList<GateDefinition> GetGateDefinitions();
+
+    /// <summary>
+    /// Evaluate a gate for a promotion.
+    /// </summary>
+    Task<GateResult> EvaluateGateAsync(GateEvaluationContext context, CancellationToken ct);
+
+    /// <summary>
+    /// Validate gate configuration.
+    /// </summary>
+    Task<GateValidationResult> ValidateGateConfigAsync(
+        string gateType,
+        JsonElement configuration,
+        CancellationToken ct);
+}
+
+public sealed record GateDefinition(
+    string Type,
+    string DisplayName,
+    string Description,
+    string Category,
+    JsonSchema ConfigSchema,
+    bool IsBlocking,
+    bool SupportsOverride,
+    IReadOnlyList<string> RequiredPermissions);
+
+public sealed record GateEvaluationContext(
+    Guid GateId,
+    Guid PromotionId,
+    Guid ReleaseId,
+    Guid SourceEnvironmentId,
+    Guid TargetEnvironmentId,
+    Guid TenantId,
+    string GateType,
+    JsonElement Configuration,
+    ReleaseInfo Release,
+    EnvironmentInfo TargetEnvironment,
+    IPluginLogger Logger);
+
+public sealed record GateResult(
+    GateStatus Status,
+    string Message,
+    IReadOnlyDictionary<string, JsonElement> Details,
+    IReadOnlyList<GateEvidence>? Evidence = null)
+{
+    public static GateResult Pass(string message, IReadOnlyDictionary<string, JsonElement>? details = null) =>
+        new(GateStatus.Passed, message, details ?? new Dictionary<string, JsonElement>());
+
+    public static GateResult Fail(string message, IReadOnlyDictionary<string, JsonElement>? details = null) =>
+        new(GateStatus.Failed, message, details ?? new Dictionary<string, JsonElement>());
+
+    public static GateResult Warn(string message, IReadOnlyDictionary<string, JsonElement>? details = null) =>
+        new(GateStatus.Warning, message, details ?? new Dictionary<string, JsonElement>());
+
+    public static GateResult Pending(string message) =>
+        new(GateStatus.Pending, message, new Dictionary<string, JsonElement>());
+}
+
+public enum GateStatus
+{
+    Passed,
+    Failed,
+    Warning,
+    Pending,
+    Skipped
+}
+
+public sealed record GateEvidence(
+    string Type,
+    string Description,
+    JsonElement Data);
+
+public sealed record GateValidationResult(
+    bool IsValid,
+    IReadOnlyList<string> Errors)
+{
+    public static GateValidationResult Success() => new(true, []);
+    public static GateValidationResult Failure(params string[] errors) => new(false, errors);
+}
+```
+
+### Integration Connector Capability Interfaces
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Plugin.Capabilities;
+
+/// <summary>
+/// Base interface for all Integration Hub connectors.
+/// +public interface IIntegrationConnectorCapability +{ + /// + /// Connector category (SCM, CI, Registry, Vault, Notify). + /// + ConnectorCategory Category { get; } + + /// + /// Connector type identifier. + /// + string ConnectorType { get; } + + /// + /// Human-readable display name. + /// + string DisplayName { get; } + + /// + /// Validate connector configuration. + /// + Task ValidateConfigAsync( + JsonElement config, + CancellationToken ct); + + /// + /// Test connection with current configuration. + /// + Task TestConnectionAsync( + ConnectorContext context, + CancellationToken ct); + + /// + /// Get connector capabilities. + /// + IReadOnlyList GetSupportedOperations(); +} + +public enum ConnectorCategory +{ + Scm, + Ci, + Registry, + Vault, + Notify +} + +public sealed record ConnectorContext( + Guid IntegrationId, + Guid TenantId, + JsonElement Configuration, + ISecretResolver SecretResolver, + IPluginLogger Logger); + +/// +/// Extended interface for SCM connectors in Release Orchestrator context. +/// Extends the base IScmCapability from Phase 100 with Release Orchestrator-specific operations. +/// +public interface IScmConnectorCapability : IIntegrationConnectorCapability +{ + /// + /// List repositories accessible by this integration. + /// + Task> ListRepositoriesAsync( + ConnectorContext context, + string? searchPattern = null, + CancellationToken ct = default); + + /// + /// Get commit information. + /// + Task GetCommitAsync( + ConnectorContext context, + string repository, + string commitSha, + CancellationToken ct = default); + + /// + /// Create a webhook for repository events. + /// + Task CreateWebhookAsync( + ConnectorContext context, + string repository, + IReadOnlyList events, + string callbackUrl, + CancellationToken ct = default); + + /// + /// Get release/tag information. 
+ /// + Task> ListReleasesAsync( + ConnectorContext context, + string repository, + CancellationToken ct = default); +} + +/// +/// Extended interface for container registry connectors. +/// +public interface IRegistryConnectorCapability : IIntegrationConnectorCapability +{ + /// + /// List repositories in the registry. + /// + Task> ListRepositoriesAsync( + ConnectorContext context, + string? prefix = null, + CancellationToken ct = default); + + /// + /// List tags for a repository. + /// + Task> ListTagsAsync( + ConnectorContext context, + string repository, + CancellationToken ct = default); + + /// + /// Resolve a tag to its digest. + /// + Task ResolveTagAsync( + ConnectorContext context, + string repository, + string tag, + CancellationToken ct = default); + + /// + /// Get image manifest. + /// + Task GetManifestAsync( + ConnectorContext context, + string repository, + string reference, + CancellationToken ct = default); + + /// + /// Generate pull credentials for an image. + /// + Task GetPullCredentialsAsync( + ConnectorContext context, + string repository, + CancellationToken ct = default); +} + +/// +/// Extended interface for vault/secrets connectors. +/// +public interface IVaultConnectorCapability : IIntegrationConnectorCapability +{ + /// + /// Get a secret value. + /// + Task GetSecretAsync( + ConnectorContext context, + string path, + CancellationToken ct = default); + + /// + /// List secrets at a path. + /// + Task> ListSecretsAsync( + ConnectorContext context, + string path, + CancellationToken ct = default); +} + +/// +/// Extended interface for notification connectors. +/// +public interface INotifyConnectorCapability : IIntegrationConnectorCapability +{ + /// + /// Send a notification. + /// + Task SendNotificationAsync( + ConnectorContext context, + Notification notification, + CancellationToken ct = default); + + /// + /// Get supported notification channels. 
+ /// + IReadOnlyList GetSupportedChannels(); +} + +/// +/// Extended interface for CI/CD system connectors. +/// +public interface ICiConnectorCapability : IIntegrationConnectorCapability +{ + /// + /// Trigger a pipeline/workflow. + /// + Task TriggerPipelineAsync( + ConnectorContext context, + PipelineTriggerRequest request, + CancellationToken ct = default); + + /// + /// Get pipeline status. + /// + Task GetPipelineStatusAsync( + ConnectorContext context, + string pipelineId, + CancellationToken ct = default); + + /// + /// List available pipelines. + /// + Task> ListPipelinesAsync( + ConnectorContext context, + string? repository = null, + CancellationToken ct = default); +} +``` + +### Step Provider Registry + +```csharp +namespace StellaOps.ReleaseOrchestrator.Plugin.Registry; + +/// +/// Registry for discovering and querying step providers. +/// Builds on top of the unified plugin registry from Phase 100. +/// +public interface IStepProviderRegistry +{ + /// + /// Get all registered step definitions. + /// + Task> GetAllStepsAsync(CancellationToken ct = default); + + /// + /// Get steps by category. + /// + Task> GetStepsByCategoryAsync( + string category, + CancellationToken ct = default); + + /// + /// Get a specific step definition. + /// + Task GetStepAsync(string stepType, CancellationToken ct = default); + + /// + /// Get the plugin that provides a step. + /// + Task GetStepProviderPluginAsync(string stepType, CancellationToken ct = default); + + /// + /// Execute a step using its provider. + /// + Task ExecuteStepAsync( + string stepType, + StepExecutionContext context, + CancellationToken ct = default); +} + +public sealed record RegisteredStep( + StepDefinition Definition, + string PluginId, + string PluginVersion, + bool IsBuiltIn); + +/// +/// Implementation that queries the unified plugin registry for step providers. 
+/// </summary>
+public sealed class StepProviderRegistry : IStepProviderRegistry
+{
+    private readonly IPluginHost _pluginHost;
+    private readonly IPluginRegistry _pluginRegistry;
+    private readonly ILogger<StepProviderRegistry> _logger;
+
+    public StepProviderRegistry(
+        IPluginHost pluginHost,
+        IPluginRegistry pluginRegistry,
+        ILogger<StepProviderRegistry> logger)
+    {
+        _pluginHost = pluginHost;
+        _pluginRegistry = pluginRegistry;
+        _logger = logger;
+    }
+
+    public async Task<IReadOnlyList<RegisteredStep>> GetAllStepsAsync(CancellationToken ct = default)
+    {
+        var steps = new List<RegisteredStep>();
+
+        // Query plugins with WorkflowStep capability
+        var stepProviders = await _pluginRegistry.QueryByCapabilityAsync(
+            PluginCapabilities.WorkflowStep, ct);
+
+        foreach (var pluginInfo in stepProviders)
+        {
+            var plugin = _pluginHost.GetPlugin(pluginInfo.Id);
+            if (plugin is IStepProviderCapability stepProvider)
+            {
+                var definitions = stepProvider.GetStepDefinitions();
+                foreach (var def in definitions)
+                {
+                    steps.Add(new RegisteredStep(
+                        Definition: def,
+                        PluginId: pluginInfo.Id,
+                        PluginVersion: pluginInfo.Version,
+                        IsBuiltIn: plugin.TrustLevel == PluginTrustLevel.BuiltIn));
+                }
+            }
+        }
+
+        return steps;
+    }
+
+    public async Task<IReadOnlyList<RegisteredStep>> GetStepsByCategoryAsync(
+        string category,
+        CancellationToken ct = default)
+    {
+        var allSteps = await GetAllStepsAsync(ct);
+        return allSteps.Where(s =>
+                s.Definition.Category.Equals(category, StringComparison.OrdinalIgnoreCase))
+            .ToList();
+    }
+
+    public async Task<RegisteredStep?> GetStepAsync(string stepType, CancellationToken ct = default)
+    {
+        var allSteps = await GetAllStepsAsync(ct);
+        return allSteps.FirstOrDefault(s =>
+            s.Definition.Type.Equals(stepType, StringComparison.OrdinalIgnoreCase));
+    }
+
+    public async Task<IPlugin?> GetStepProviderPluginAsync(string stepType, CancellationToken ct = default)
+    {
+        var step = await GetStepAsync(stepType, ct);
+        if (step == null) return null;
+
+        return _pluginHost.GetPlugin(step.PluginId);
+    }
+
+    public async Task<StepResult> ExecuteStepAsync(
+        string stepType,
+        StepExecutionContext context,
+
CancellationToken ct = default) + { + var plugin = await GetStepProviderPluginAsync(stepType, ct); + if (plugin is not IStepProviderCapability stepProvider) + { + throw new InvalidOperationException($"No step provider found for type: {stepType}"); + } + + return await stepProvider.ExecuteStepAsync(context, ct); + } +} +``` + +### Gate Provider Registry + +```csharp +namespace StellaOps.ReleaseOrchestrator.Plugin.Registry; + +/// +/// Registry for discovering and querying gate providers. +/// +public interface IGateProviderRegistry +{ + /// + /// Get all registered gate definitions. + /// + Task> GetAllGatesAsync(CancellationToken ct = default); + + /// + /// Get gates by category. + /// + Task> GetGatesByCategoryAsync( + string category, + CancellationToken ct = default); + + /// + /// Get a specific gate definition. + /// + Task GetGateAsync(string gateType, CancellationToken ct = default); + + /// + /// Get the plugin that provides a gate. + /// + Task GetGateProviderPluginAsync(string gateType, CancellationToken ct = default); + + /// + /// Evaluate a gate using its provider. + /// + Task EvaluateGateAsync( + string gateType, + GateEvaluationContext context, + CancellationToken ct = default); +} + +public sealed record RegisteredGate( + GateDefinition Definition, + string PluginId, + string PluginVersion, + bool IsBuiltIn); + +/// +/// Implementation that queries the unified plugin registry for gate providers. 
+/// +public sealed class GateProviderRegistry : IGateProviderRegistry +{ + private readonly IPluginHost _pluginHost; + private readonly IPluginRegistry _pluginRegistry; + private readonly ILogger _logger; + + public GateProviderRegistry( + IPluginHost pluginHost, + IPluginRegistry pluginRegistry, + ILogger logger) + { + _pluginHost = pluginHost; + _pluginRegistry = pluginRegistry; + _logger = logger; + } + + public async Task> GetAllGatesAsync(CancellationToken ct = default) + { + var gates = new List(); + + var gateProviders = await _pluginRegistry.QueryByCapabilityAsync( + PluginCapabilities.PromotionGate, ct); + + foreach (var pluginInfo in gateProviders) + { + var plugin = _pluginHost.GetPlugin(pluginInfo.Id); + if (plugin is IGateProviderCapability gateProvider) + { + var definitions = gateProvider.GetGateDefinitions(); + foreach (var def in definitions) + { + gates.Add(new RegisteredGate( + Definition: def, + PluginId: pluginInfo.Id, + PluginVersion: pluginInfo.Version, + IsBuiltIn: plugin.TrustLevel == PluginTrustLevel.BuiltIn)); + } + } + } + + return gates; + } + + public async Task> GetGatesByCategoryAsync( + string category, + CancellationToken ct = default) + { + var allGates = await GetAllGatesAsync(ct); + return allGates.Where(g => + g.Definition.Category.Equals(category, StringComparison.OrdinalIgnoreCase)) + .ToList(); + } + + public async Task GetGateAsync(string gateType, CancellationToken ct = default) + { + var allGates = await GetAllGatesAsync(ct); + return allGates.FirstOrDefault(g => + g.Definition.Type.Equals(gateType, StringComparison.OrdinalIgnoreCase)); + } + + public async Task GetGateProviderPluginAsync(string gateType, CancellationToken ct = default) + { + var gate = await GetGateAsync(gateType, ct); + if (gate == null) return null; + + return _pluginHost.GetPlugin(gate.PluginId); + } + + public async Task EvaluateGateAsync( + string gateType, + GateEvaluationContext context, + CancellationToken ct = default) + { + var plugin = await 
GetGateProviderPluginAsync(gateType, ct); + if (plugin is not IGateProviderCapability gateProvider) + { + throw new InvalidOperationException($"No gate provider found for type: {gateType}"); + } + + return await gateProvider.EvaluateGateAsync(context, ct); + } +} +``` + +### Connector Registry + +```csharp +namespace StellaOps.ReleaseOrchestrator.Plugin.Registry; + +/// +/// Registry for discovering and querying integration connectors. +/// +public interface IConnectorRegistry +{ + /// + /// Get all registered connectors. + /// + Task> GetAllConnectorsAsync(CancellationToken ct = default); + + /// + /// Get connectors by category. + /// + Task> GetConnectorsByCategoryAsync( + ConnectorCategory category, + CancellationToken ct = default); + + /// + /// Get a specific connector. + /// + Task GetConnectorAsync(string connectorType, CancellationToken ct = default); + + /// + /// Get the plugin that provides a connector. + /// + Task GetConnectorPluginAsync(string connectorType, CancellationToken ct = default); +} + +public sealed record RegisteredConnector( + string Type, + string DisplayName, + ConnectorCategory Category, + string PluginId, + string PluginVersion, + IReadOnlyList SupportedOperations, + bool IsBuiltIn); + +/// +/// Implementation that queries the unified plugin registry for connectors. 
+/// </summary>
+public sealed class ConnectorRegistry : IConnectorRegistry
+{
+    private readonly IPluginHost _pluginHost;
+    private readonly IPluginRegistry _pluginRegistry;
+    private readonly ILogger<ConnectorRegistry> _logger;
+
+    private static readonly Dictionary<ConnectorCategory, string> CategoryToCapability = new()
+    {
+        [ConnectorCategory.Scm] = PluginCapabilities.Scm,
+        [ConnectorCategory.Ci] = PluginCapabilities.Ci,
+        [ConnectorCategory.Registry] = PluginCapabilities.ContainerRegistry,
+        [ConnectorCategory.Vault] = PluginCapabilities.SecretsVault,
+        [ConnectorCategory.Notify] = PluginCapabilities.Notification
+    };
+
+    public ConnectorRegistry(
+        IPluginHost pluginHost,
+        IPluginRegistry pluginRegistry,
+        ILogger<ConnectorRegistry> logger)
+    {
+        _pluginHost = pluginHost;
+        _pluginRegistry = pluginRegistry;
+        _logger = logger;
+    }
+
+    public async Task<IReadOnlyList<RegisteredConnector>> GetAllConnectorsAsync(CancellationToken ct = default)
+    {
+        var connectors = new List<RegisteredConnector>();
+
+        foreach (var (category, _) in CategoryToCapability)
+        {
+            var categoryConnectors = await GetConnectorsByCategoryAsync(category, ct);
+            connectors.AddRange(categoryConnectors);
+        }
+
+        return connectors;
+    }
+
+    public async Task<IReadOnlyList<RegisteredConnector>> GetConnectorsByCategoryAsync(
+        ConnectorCategory category,
+        CancellationToken ct = default)
+    {
+        var connectors = new List<RegisteredConnector>();
+
+        if (!CategoryToCapability.TryGetValue(category, out var capability))
+            return connectors;
+
+        var plugins = await _pluginRegistry.QueryByCapabilityAsync(capability, ct);
+
+        foreach (var pluginInfo in plugins)
+        {
+            var plugin = _pluginHost.GetPlugin(pluginInfo.Id);
+            if (plugin is IIntegrationConnectorCapability connector)
+            {
+                connectors.Add(new RegisteredConnector(
+                    Type: connector.ConnectorType,
+                    DisplayName: connector.DisplayName,
+                    Category: connector.Category,
+                    PluginId: pluginInfo.Id,
+                    PluginVersion: pluginInfo.Version,
+                    SupportedOperations: connector.GetSupportedOperations(),
+                    IsBuiltIn: plugin.TrustLevel == PluginTrustLevel.BuiltIn));
+            }
+        }
+
+        return connectors;
+    }
+
+    public async Task<RegisteredConnector?>
GetConnectorAsync(string connectorType, CancellationToken ct = default) + { + var all = await GetAllConnectorsAsync(ct); + return all.FirstOrDefault(c => + c.Type.Equals(connectorType, StringComparison.OrdinalIgnoreCase)); + } + + public async Task GetConnectorPluginAsync(string connectorType, CancellationToken ct = default) + { + var connector = await GetConnectorAsync(connectorType, ct); + if (connector == null) return null; + + return _pluginHost.GetPlugin(connector.PluginId); + } +} +``` + +--- + +## Acceptance Criteria + +- [ ] `IStepProviderCapability` interface defined with full step lifecycle +- [ ] `IGateProviderCapability` interface defined with gate evaluation +- [ ] Integration connector interfaces defined for all categories (SCM, CI, Registry, Vault, Notify) +- [ ] `StepProviderRegistry` queries plugins from unified registry +- [ ] `GateProviderRegistry` queries plugins from unified registry +- [ ] `ConnectorRegistry` queries plugins from unified registry +- [ ] Step execution routing works through registry +- [ ] Gate evaluation routing works through registry +- [ ] Unit test coverage >= 90% + +--- + +## Test Plan + +### Unit Tests + +| Test | Description | +|------|-------------| +| `StepProviderRegistry_ReturnsStepsFromPlugins` | Registry queries plugins correctly | +| `GateProviderRegistry_ReturnsGatesFromPlugins` | Registry queries plugins correctly | +| `ConnectorRegistry_FiltersByCategory` | Category filtering works | +| `StepExecution_RoutesToCorrectPlugin` | Step execution routing | +| `GateEvaluation_RoutesToCorrectPlugin` | Gate evaluation routing | + +### Integration Tests + +| Test | Description | +|------|-------------| +| `BuiltInStepsAvailable` | Built-in steps discoverable | +| `BuiltInGatesAvailable` | Built-in gates discoverable | +| `ThirdPartyPluginIntegration` | Third-party plugins integrate | + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 100_001 Plugin Abstractions | Internal | 
TODO | +| 100_002 Plugin Host | Internal | TODO | +| 100_003 Plugin Registry | Internal | TODO | +| 101_001 Database Schema | Internal | TODO | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| IStepProviderCapability interface | TODO | | +| IGateProviderCapability interface | TODO | | +| IScmConnectorCapability interface | TODO | | +| IRegistryConnectorCapability interface | TODO | | +| IVaultConnectorCapability interface | TODO | | +| INotifyConnectorCapability interface | TODO | | +| ICiConnectorCapability interface | TODO | | +| StepProviderRegistry | TODO | | +| GateProviderRegistry | TODO | | +| ConnectorRegistry | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | +| 10-Jan-2026 | Refocused on Release Orchestrator-specific extensions (builds on Phase 100 core) | diff --git a/docs/implplan/SPRINT_20260110_101_003_PLUGIN_loader_sandbox.md b/docs/implplan/SPRINT_20260110_101_003_PLUGIN_loader_sandbox.md new file mode 100644 index 000000000..beed25d1b --- /dev/null +++ b/docs/implplan/SPRINT_20260110_101_003_PLUGIN_loader_sandbox.md @@ -0,0 +1,935 @@ +# SPRINT: Plugin Loader & Sandbox Extensions for Release Orchestrator + +> **Sprint ID:** 101_003 +> **Module:** PLUGIN +> **Phase:** 1 - Foundation +> **Status:** TODO +> **Parent:** [101_000_INDEX](SPRINT_20260110_101_000_INDEX_foundation.md) +> **Prerequisites:** [100_002 Plugin Host](SPRINT_20260110_100_002_PLUGIN_host.md), [100_004 Plugin Sandbox](SPRINT_20260110_100_004_PLUGIN_sandbox.md), [101_002 Registry Extensions](SPRINT_20260110_101_002_PLUGIN_registry.md) + +--- + +## Overview + +Extend the unified plugin host and sandbox (from Phase 100) with Release Orchestrator-specific execution contexts, service integrations, and domain-specific lifecycle management. This sprint builds on the core plugin infrastructure to add release orchestration capabilities. 
+ +> **Note:** The core plugin host (`IPluginHost`, `PluginHost`, lifecycle management) and sandbox infrastructure (`ISandbox`, `ProcessSandbox`, resource limits) are implemented in Phase 100 sprints 100_002 and 100_004. This sprint adds Release Orchestrator domain-specific extensions. + +### Objectives + +- Create Release Orchestrator plugin context extensions +- Implement step execution context with workflow integration +- Implement gate evaluation context with promotion integration +- Add connector context with tenant-aware secret resolution +- Integrate with Release Orchestrator services (secrets, evidence, notifications) +- Add domain-specific health monitoring + +### Working Directory + +``` +src/ReleaseOrchestrator/ +├── __Libraries/ +│ └── StellaOps.ReleaseOrchestrator.Plugin/ +│ ├── Context/ +│ │ ├── ReleaseOrchestratorPluginContext.cs +│ │ ├── StepExecutionContextBuilder.cs +│ │ ├── GateEvaluationContextBuilder.cs +│ │ └── ConnectorContextBuilder.cs +│ ├── Integration/ +│ │ ├── TenantSecretResolver.cs +│ │ ├── EvidenceCollector.cs +│ │ ├── NotificationBridge.cs +│ │ └── AuditLogger.cs +│ ├── Execution/ +│ │ ├── StepExecutor.cs +│ │ ├── GateEvaluator.cs +│ │ └── ConnectorInvoker.cs +│ └── Monitoring/ +│ ├── ReleaseOrchestratorPluginMonitor.cs +│ └── PluginMetricsCollector.cs +└── __Tests/ + └── StellaOps.ReleaseOrchestrator.Plugin.Tests/ + ├── StepExecutorTests.cs + ├── GateEvaluatorTests.cs + └── ConnectorInvokerTests.cs +``` + +--- + +## Architecture Reference + +- [Phase 100 Plugin System](SPRINT_20260110_100_000_INDEX_plugin_unification.md) +- [Workflow Engine](../modules/release-orchestrator/modules/workflow-engine.md) +- [Promotion Gates](../modules/release-orchestrator/modules/promotion-gates.md) + +--- + +## Deliverables + +### Release Orchestrator Plugin Context Extensions + +```csharp +namespace StellaOps.ReleaseOrchestrator.Plugin.Context; + +/// +/// Extended plugin context for Release Orchestrator with domain-specific services. 
+/// Wraps the base IPluginContext from Phase 100. +/// +public sealed class ReleaseOrchestratorPluginContext : IPluginContext +{ + private readonly IPluginContext _baseContext; + private readonly ITenantSecretResolver _secretResolver; + private readonly IEvidenceCollector _evidenceCollector; + private readonly INotificationBridge _notificationBridge; + private readonly IAuditLogger _auditLogger; + + public ReleaseOrchestratorPluginContext( + IPluginContext baseContext, + ITenantSecretResolver secretResolver, + IEvidenceCollector evidenceCollector, + INotificationBridge notificationBridge, + IAuditLogger auditLogger) + { + _baseContext = baseContext; + _secretResolver = secretResolver; + _evidenceCollector = evidenceCollector; + _notificationBridge = notificationBridge; + _auditLogger = auditLogger; + } + + // Delegate to base context + public IPluginConfiguration Configuration => _baseContext.Configuration; + public IPluginLogger Logger => _baseContext.Logger; + public TimeProvider TimeProvider => _baseContext.TimeProvider; + public IHttpClientFactory HttpClientFactory => _baseContext.HttpClientFactory; + public IGuidGenerator GuidGenerator => _baseContext.GuidGenerator; + + // Release Orchestrator-specific services + public ITenantSecretResolver SecretResolver => _secretResolver; + public IEvidenceCollector EvidenceCollector => _evidenceCollector; + public INotificationBridge NotificationBridge => _notificationBridge; + public IAuditLogger AuditLogger => _auditLogger; + + /// + /// Create a scoped context for a specific tenant. 
+ /// + public ReleaseOrchestratorPluginContext ForTenant(Guid tenantId) + { + return new ReleaseOrchestratorPluginContext( + _baseContext, + _secretResolver.ForTenant(tenantId), + _evidenceCollector.ForTenant(tenantId), + _notificationBridge.ForTenant(tenantId), + _auditLogger.ForTenant(tenantId)); + } +} +``` + +### Tenant-Aware Secret Resolution + +```csharp +namespace StellaOps.ReleaseOrchestrator.Plugin.Integration; + +/// +/// Resolves secrets with tenant isolation and vault connector integration. +/// +public interface ITenantSecretResolver : ISecretResolver +{ + /// + /// Create a resolver scoped to a specific tenant. + /// + ITenantSecretResolver ForTenant(Guid tenantId); + + /// + /// Resolve a secret using a specific vault integration. + /// + Task ResolveFromVaultAsync( + Guid integrationId, + string secretPath, + CancellationToken ct = default); + + /// + /// Resolve secret references in configuration. + /// Handles patterns like ${vault:integration-id/path/to/secret} + /// + Task ResolveConfigurationSecretsAsync( + JsonElement configuration, + CancellationToken ct = default); +} + +public sealed class TenantSecretResolver : ITenantSecretResolver +{ + private readonly IConnectorRegistry _connectorRegistry; + private readonly IPluginHost _pluginHost; + private readonly ILogger _logger; + private Guid? 
_tenantId;
+
+    public TenantSecretResolver(
+        IConnectorRegistry connectorRegistry,
+        IPluginHost pluginHost,
+        ILogger<TenantSecretResolver> logger)
+    {
+        _connectorRegistry = connectorRegistry;
+        _pluginHost = pluginHost;
+        _logger = logger;
+    }
+
+    public ITenantSecretResolver ForTenant(Guid tenantId)
+    {
+        return new TenantSecretResolver(_connectorRegistry, _pluginHost, _logger)
+        {
+            _tenantId = tenantId
+        };
+    }
+
+    public Task<string?> ResolveAsync(string key, CancellationToken ct = default)
+    {
+        // First try environment variables
+        var envValue = Environment.GetEnvironmentVariable(key);
+        if (envValue != null) return Task.FromResult<string?>(envValue);
+
+        // Then try the configured secrets store.
+        // Implementation depends on deployment configuration.
+        return Task.FromResult<string?>(null);
+    }
+
+    public async Task<string?> ResolveFromVaultAsync(
+        Guid integrationId,
+        string secretPath,
+        CancellationToken ct = default)
+    {
+        if (_tenantId == null)
+            throw new InvalidOperationException("Tenant ID not set. Call ForTenant first.");
+
+        // Find the vault connector for this integration
+        var connector = await _connectorRegistry.GetConnectorAsync("vault", ct);
+        if (connector == null)
+        {
+            _logger.LogWarning("No vault connector found for integration {IntegrationId}", integrationId);
+            return null;
+        }
+
+        var plugin = await _connectorRegistry.GetConnectorPluginAsync(connector.Type, ct);
+        if (plugin is not IVaultConnectorCapability vaultConnector)
+        {
+            _logger.LogWarning("Connector {ConnectorType} is not a vault connector", connector.Type);
+            return null;
+        }
+
+        var context = new ConnectorContext(
+            IntegrationId: integrationId,
+            TenantId: _tenantId.Value,
+            Configuration: JsonDocument.Parse("{}").RootElement, // Loaded from DB
+            SecretResolver: this,
+            Logger: new PluginLoggerAdapter(_logger));
+
+        var secret = await vaultConnector.GetSecretAsync(context, secretPath, ct);
+        return secret?.Value;
+    }
+
+    public async Task<JsonElement> ResolveConfigurationSecretsAsync(
+        JsonElement configuration,
+        CancellationToken ct = default)
+    {
+        // Parse and resolve ${vault:...} patterns in configuration
+        var json = configuration.GetRawText();
+        var pattern = new Regex(@"\$\{vault:([^/]+)/([^}]+)\}");
+
+        var matches = pattern.Matches(json);
+        foreach (Match match in matches)
+        {
+            var integrationId = Guid.Parse(match.Groups[1].Value);
+            var secretPath = match.Groups[2].Value;
+
+            var secretValue = await ResolveFromVaultAsync(integrationId, secretPath, ct);
+            if (secretValue != null)
+            {
+                // JSON-encode the secret so the substituted document stays valid
+                json = json.Replace(match.Value, JsonEncodedText.Encode(secretValue).ToString());
+            }
+        }
+
+        return JsonDocument.Parse(json).RootElement;
+    }
+}
+```
+
+### Step Executor
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Plugin.Execution;
+
+/// <summary>
+/// Executes workflow steps with full context integration.
+/// </summary>
+public interface IStepExecutor
+{
+    /// <summary>
+    /// Execute a step in the context of a workflow run.
+    /// </summary>
+    Task<StepExecutionResult> ExecuteAsync(
+        StepExecutionRequest request,
+        CancellationToken ct = default);
+}
+
+public sealed record StepExecutionRequest(
+    Guid StepId,
+    Guid WorkflowRunId,
+    Guid TenantId,
+    string StepType,
+    JsonElement Configuration,
+    IReadOnlyDictionary<string, object?> Inputs,
+    TimeSpan Timeout);
+
+public sealed record StepExecutionResult(
+    StepStatus Status,
+    IReadOnlyDictionary<string, object?> Outputs,
+    TimeSpan Duration,
+    string? ErrorMessage,
+    IReadOnlyList<StepLogEntry> Logs,
+    EvidencePacket?
Evidence); + +public sealed record StepLogEntry( + DateTimeOffset Timestamp, + LogLevel Level, + string Message); + +public sealed class StepExecutor : IStepExecutor +{ + private readonly IStepProviderRegistry _stepRegistry; + private readonly ITenantSecretResolver _secretResolver; + private readonly IEvidenceCollector _evidenceCollector; + private readonly IAuditLogger _auditLogger; + private readonly ILogger _logger; + private readonly TimeProvider _timeProvider; + + public StepExecutor( + IStepProviderRegistry stepRegistry, + ITenantSecretResolver secretResolver, + IEvidenceCollector evidenceCollector, + IAuditLogger auditLogger, + ILogger logger, + TimeProvider timeProvider) + { + _stepRegistry = stepRegistry; + _secretResolver = secretResolver; + _evidenceCollector = evidenceCollector; + _auditLogger = auditLogger; + _logger = logger; + _timeProvider = timeProvider; + } + + public async Task ExecuteAsync( + StepExecutionRequest request, + CancellationToken ct = default) + { + var startTime = _timeProvider.GetUtcNow(); + var logs = new List(); + var outputWriter = new BufferedStepOutputWriter(); + + // Resolve secrets in configuration + var resolvedConfig = await _secretResolver + .ForTenant(request.TenantId) + .ResolveConfigurationSecretsAsync(request.Configuration, ct); + + // Create execution context + var context = new StepExecutionContext( + StepId: request.StepId, + WorkflowRunId: request.WorkflowRunId, + TenantId: request.TenantId, + StepType: request.StepType, + Configuration: resolvedConfig, + Inputs: request.Inputs, + OutputWriter: outputWriter, + Logger: new StepLogger(logs, _logger)); + + // Log execution start + await _auditLogger.LogAsync(new AuditEntry( + EventType: "step.execution.started", + TenantId: request.TenantId, + ResourceType: "workflow_step", + ResourceId: request.StepId.ToString(), + Details: new Dictionary + { + ["stepType"] = request.StepType, + ["workflowRunId"] = request.WorkflowRunId + })); + + try + { + // Execute with timeout + 
using var cts = CancellationTokenSource.CreateLinkedTokenSource(ct); + cts.CancelAfter(request.Timeout); + + var result = await _stepRegistry.ExecuteStepAsync( + request.StepType, + context, + cts.Token); + + var duration = _timeProvider.GetUtcNow() - startTime; + + // Collect evidence if step succeeded + EvidencePacket? evidence = null; + if (result.Status == StepStatus.Succeeded) + { + evidence = await _evidenceCollector + .ForTenant(request.TenantId) + .CollectStepEvidenceAsync(request.StepId, result, ct); + } + + // Log execution completion + await _auditLogger.LogAsync(new AuditEntry( + EventType: "step.execution.completed", + TenantId: request.TenantId, + ResourceType: "workflow_step", + ResourceId: request.StepId.ToString(), + Details: new Dictionary + { + ["stepType"] = request.StepType, + ["status"] = result.Status.ToString(), + ["durationMs"] = duration.TotalMilliseconds + })); + + return new StepExecutionResult( + Status: result.Status, + Outputs: result.Outputs, + Duration: duration, + ErrorMessage: result.ErrorMessage, + Logs: logs, + Evidence: evidence); + } + catch (OperationCanceledException) when (!ct.IsCancellationRequested) + { + var duration = _timeProvider.GetUtcNow() - startTime; + + await _auditLogger.LogAsync(new AuditEntry( + EventType: "step.execution.timeout", + TenantId: request.TenantId, + ResourceType: "workflow_step", + ResourceId: request.StepId.ToString(), + Details: new Dictionary + { + ["stepType"] = request.StepType, + ["timeoutMs"] = request.Timeout.TotalMilliseconds + })); + + return new StepExecutionResult( + Status: StepStatus.TimedOut, + Outputs: new Dictionary(), + Duration: duration, + ErrorMessage: $"Step timed out after {request.Timeout.TotalSeconds}s", + Logs: logs, + Evidence: null); + } + catch (Exception ex) + { + var duration = _timeProvider.GetUtcNow() - startTime; + + _logger.LogError(ex, "Step execution failed: {StepType}", request.StepType); + + await _auditLogger.LogAsync(new AuditEntry( + EventType: 
"step.execution.failed", + TenantId: request.TenantId, + ResourceType: "workflow_step", + ResourceId: request.StepId.ToString(), + Details: new Dictionary + { + ["stepType"] = request.StepType, + ["error"] = ex.Message + })); + + return new StepExecutionResult( + Status: StepStatus.Failed, + Outputs: new Dictionary(), + Duration: duration, + ErrorMessage: ex.Message, + Logs: logs, + Evidence: null); + } + } +} +``` + +### Gate Evaluator + +```csharp +namespace StellaOps.ReleaseOrchestrator.Plugin.Execution; + +/// +/// Evaluates promotion gates with full context integration. +/// +public interface IGateEvaluator +{ + /// + /// Evaluate a gate for a promotion. + /// + Task EvaluateAsync( + GateEvaluationRequest request, + CancellationToken ct = default); +} + +public sealed record GateEvaluationRequest( + Guid GateId, + Guid PromotionId, + Guid ReleaseId, + Guid SourceEnvironmentId, + Guid TargetEnvironmentId, + Guid TenantId, + string GateType, + JsonElement Configuration); + +public sealed record GateEvaluationResult( + GateStatus Status, + string Message, + IReadOnlyDictionary Details, + IReadOnlyList Evidence, + TimeSpan Duration, + bool CanOverride, + IReadOnlyList OverridePermissions); + +public sealed class GateEvaluator : IGateEvaluator +{ + private readonly IGateProviderRegistry _gateRegistry; + private readonly ITenantSecretResolver _secretResolver; + private readonly IEvidenceCollector _evidenceCollector; + private readonly IReleaseRepository _releaseRepository; + private readonly IEnvironmentRepository _environmentRepository; + private readonly IAuditLogger _auditLogger; + private readonly ILogger _logger; + private readonly TimeProvider _timeProvider; + + public GateEvaluator( + IGateProviderRegistry gateRegistry, + ITenantSecretResolver secretResolver, + IEvidenceCollector evidenceCollector, + IReleaseRepository releaseRepository, + IEnvironmentRepository environmentRepository, + IAuditLogger auditLogger, + ILogger logger, + TimeProvider timeProvider) 
+ { + _gateRegistry = gateRegistry; + _secretResolver = secretResolver; + _evidenceCollector = evidenceCollector; + _releaseRepository = releaseRepository; + _environmentRepository = environmentRepository; + _auditLogger = auditLogger; + _logger = logger; + _timeProvider = timeProvider; + } + + public async Task EvaluateAsync( + GateEvaluationRequest request, + CancellationToken ct = default) + { + var startTime = _timeProvider.GetUtcNow(); + + // Load release and environment info + var release = await _releaseRepository.GetAsync(request.ReleaseId, ct) + ?? throw new InvalidOperationException($"Release not found: {request.ReleaseId}"); + + var targetEnvironment = await _environmentRepository.GetAsync(request.TargetEnvironmentId, ct) + ?? throw new InvalidOperationException($"Environment not found: {request.TargetEnvironmentId}"); + + // Get gate definition + var gateDefinition = await _gateRegistry.GetGateAsync(request.GateType, ct); + if (gateDefinition == null) + { + throw new InvalidOperationException($"Gate type not found: {request.GateType}"); + } + + // Resolve secrets in configuration + var resolvedConfig = await _secretResolver + .ForTenant(request.TenantId) + .ResolveConfigurationSecretsAsync(request.Configuration, ct); + + // Create evaluation context + var context = new GateEvaluationContext( + GateId: request.GateId, + PromotionId: request.PromotionId, + ReleaseId: request.ReleaseId, + SourceEnvironmentId: request.SourceEnvironmentId, + TargetEnvironmentId: request.TargetEnvironmentId, + TenantId: request.TenantId, + GateType: request.GateType, + Configuration: resolvedConfig, + Release: release.ToReleaseInfo(), + TargetEnvironment: targetEnvironment.ToEnvironmentInfo(), + Logger: new GateLogger(_logger)); + + // Log evaluation start + await _auditLogger.LogAsync(new AuditEntry( + EventType: "gate.evaluation.started", + TenantId: request.TenantId, + ResourceType: "promotion_gate", + ResourceId: request.GateId.ToString(), + Details: new Dictionary + { + 
["gateType"] = request.GateType, + ["promotionId"] = request.PromotionId, + ["releaseId"] = request.ReleaseId + })); + + try + { + var result = await _gateRegistry.EvaluateGateAsync( + request.GateType, + context, + ct); + + var duration = _timeProvider.GetUtcNow() - startTime; + + // Collect evidence + var evidence = result.Evidence?.ToList() ?? new List(); + + // Add evaluation metadata as evidence + evidence.Add(new GateEvidence( + Type: "gate_evaluation_metadata", + Description: "Gate evaluation details", + Data: JsonSerializer.SerializeToElement(new + { + gateType = request.GateType, + evaluatedAt = _timeProvider.GetUtcNow(), + durationMs = duration.TotalMilliseconds, + status = result.Status.ToString() + }))); + + // Log evaluation completion + await _auditLogger.LogAsync(new AuditEntry( + EventType: "gate.evaluation.completed", + TenantId: request.TenantId, + ResourceType: "promotion_gate", + ResourceId: request.GateId.ToString(), + Details: new Dictionary + { + ["gateType"] = request.GateType, + ["status"] = result.Status.ToString(), + ["message"] = result.Message, + ["durationMs"] = duration.TotalMilliseconds + })); + + return new GateEvaluationResult( + Status: result.Status, + Message: result.Message, + Details: result.Details, + Evidence: evidence, + Duration: duration, + CanOverride: gateDefinition.Definition.SupportsOverride, + OverridePermissions: gateDefinition.Definition.RequiredPermissions); + } + catch (Exception ex) + { + var duration = _timeProvider.GetUtcNow() - startTime; + + _logger.LogError(ex, "Gate evaluation failed: {GateType}", request.GateType); + + await _auditLogger.LogAsync(new AuditEntry( + EventType: "gate.evaluation.failed", + TenantId: request.TenantId, + ResourceType: "promotion_gate", + ResourceId: request.GateId.ToString(), + Details: new Dictionary + { + ["gateType"] = request.GateType, + ["error"] = ex.Message + })); + + return new GateEvaluationResult( + Status: GateStatus.Failed, + Message: $"Gate evaluation error: 
{ex.Message}", + Details: new Dictionary { ["exception"] = ex.GetType().Name }, + Evidence: new List(), + Duration: duration, + CanOverride: gateDefinition.Definition.SupportsOverride, + OverridePermissions: gateDefinition.Definition.RequiredPermissions); + } + } +} +``` + +### Evidence Collector Integration + +```csharp +namespace StellaOps.ReleaseOrchestrator.Plugin.Integration; + +/// +/// Collects evidence from plugin executions for audit trails. +/// +public interface IEvidenceCollector +{ + /// + /// Create a collector scoped to a specific tenant. + /// + IEvidenceCollector ForTenant(Guid tenantId); + + /// + /// Collect evidence from a step execution. + /// + Task CollectStepEvidenceAsync( + Guid stepId, + StepResult result, + CancellationToken ct = default); + + /// + /// Collect evidence from a gate evaluation. + /// + Task CollectGateEvidenceAsync( + Guid gateId, + GateResult result, + CancellationToken ct = default); + + /// + /// Collect evidence from a connector operation. + /// + Task CollectConnectorEvidenceAsync( + Guid integrationId, + string operation, + JsonElement result, + CancellationToken ct = default); +} + +public sealed record EvidencePacket( + Guid Id, + string Type, + Guid TenantId, + DateTimeOffset CollectedAt, + string ContentDigest, + JsonElement Content, + IReadOnlyDictionary Metadata); +``` + +### Plugin Metrics Collector + +```csharp +namespace StellaOps.ReleaseOrchestrator.Plugin.Monitoring; + +/// +/// Collects metrics from Release Orchestrator plugin executions. +/// +public sealed class ReleaseOrchestratorPluginMonitor : IHostedService +{ + private readonly IPluginHost _pluginHost; + private readonly IStepProviderRegistry _stepRegistry; + private readonly IGateProviderRegistry _gateRegistry; + private readonly IConnectorRegistry _connectorRegistry; + private readonly IMeterFactory _meterFactory; + private readonly ILogger _logger; + private readonly TimeSpan _monitoringInterval = TimeSpan.FromSeconds(30); + private Timer? 
_timer;
+
+    private Meter _meter = null!;
+    private Counter<long> _stepExecutionCounter = null!;
+    private Counter<long> _gateEvaluationCounter = null!;
+    private Counter<long> _connectorOperationCounter = null!;
+    private Histogram<double> _stepExecutionDuration = null!;
+    private Histogram<double> _gateEvaluationDuration = null!;
+
+    public ReleaseOrchestratorPluginMonitor(
+        IPluginHost pluginHost,
+        IStepProviderRegistry stepRegistry,
+        IGateProviderRegistry gateRegistry,
+        IConnectorRegistry connectorRegistry,
+        IMeterFactory meterFactory,
+        ILogger<ReleaseOrchestratorPluginMonitor> logger)
+    {
+        _pluginHost = pluginHost;
+        _stepRegistry = stepRegistry;
+        _gateRegistry = gateRegistry;
+        _connectorRegistry = connectorRegistry;
+        _meterFactory = meterFactory;
+        _logger = logger;
+
+        InitializeMetrics();
+    }
+
+    private void InitializeMetrics()
+    {
+        _meter = _meterFactory.Create("StellaOps.ReleaseOrchestrator.Plugin");
+
+        _stepExecutionCounter = _meter.CreateCounter<long>(
+            "stellaops_step_executions_total",
+            description: "Total number of step executions");
+
+        _gateEvaluationCounter = _meter.CreateCounter<long>(
+            "stellaops_gate_evaluations_total",
+            description: "Total number of gate evaluations");
+
+        _connectorOperationCounter = _meter.CreateCounter<long>(
+            "stellaops_connector_operations_total",
+            description: "Total number of connector operations");
+
+        _stepExecutionDuration = _meter.CreateHistogram<double>(
+            "stellaops_step_execution_duration_ms",
+            unit: "ms",
+            description: "Step execution duration in milliseconds");
+
+        _gateEvaluationDuration = _meter.CreateHistogram<double>(
+            "stellaops_gate_evaluation_duration_ms",
+            unit: "ms",
+            description: "Gate evaluation duration in milliseconds");
+    }
+
+    public Task StartAsync(CancellationToken ct)
+    {
+        _timer = new Timer(
+            MonitorPlugins,
+            null,
+            TimeSpan.FromSeconds(10),
+            _monitoringInterval);
+
+        return Task.CompletedTask;
+    }
+
+    public Task StopAsync(CancellationToken ct)
+    {
+        _timer?.Change(Timeout.Infinite, 0);
+        return Task.CompletedTask;
+    }
+
+    private async void MonitorPlugins(object? state)
+    {
+        try
+        {
+            // Collect plugin health status
+            var plugins = _pluginHost.GetLoadedPlugins();
+            foreach (var pluginInfo in plugins)
+            {
+                var plugin = _pluginHost.GetPlugin(pluginInfo.Id);
+                if (plugin != null)
+                {
+                    var health = await plugin.HealthCheckAsync(CancellationToken.None);
+                    _logger.LogDebug(
+                        "Plugin {PluginId} health: {Status}",
+                        pluginInfo.Id,
+                        health.Status);
+                }
+            }
+
+            // Count available steps, gates, connectors
+            var steps = await _stepRegistry.GetAllStepsAsync();
+            var gates = await _gateRegistry.GetAllGatesAsync();
+            var connectors = await _connectorRegistry.GetAllConnectorsAsync();
+
+            _logger.LogDebug(
+                "Plugin inventory: {Steps} steps, {Gates} gates, {Connectors} connectors",
+                steps.Count, gates.Count, connectors.Count);
+        }
+        catch (Exception ex)
+        {
+            _logger.LogError(ex, "Error monitoring plugins");
+        }
+    }
+
+    /// <summary>
+    /// Record a step execution metric.
+    /// </summary>
+    public void RecordStepExecution(string stepType, StepStatus status, TimeSpan duration)
+    {
+        _stepExecutionCounter.Add(1,
+            new KeyValuePair<string, object?>("step_type", stepType),
+            new KeyValuePair<string, object?>("status", status.ToString()));
+
+        _stepExecutionDuration.Record(duration.TotalMilliseconds,
+            new KeyValuePair<string, object?>("step_type", stepType));
+    }
+
+    /// <summary>
+    /// Record a gate evaluation metric.
+    /// </summary>
+    public void RecordGateEvaluation(string gateType, GateStatus status, TimeSpan duration)
+    {
+        _gateEvaluationCounter.Add(1,
+            new KeyValuePair<string, object?>("gate_type", gateType),
+            new KeyValuePair<string, object?>("status", status.ToString()));
+
+        _gateEvaluationDuration.Record(duration.TotalMilliseconds,
+            new KeyValuePair<string, object?>("gate_type", gateType));
+    }
+
+    /// <summary>
+    /// Record a connector operation metric.
+    /// </summary>
+    public void RecordConnectorOperation(string connectorType, string operation, bool success)
+    {
+        _connectorOperationCounter.Add(1,
+            new KeyValuePair<string, object?>("connector_type", connectorType),
+            new KeyValuePair<string, object?>("operation", operation),
+            new KeyValuePair<string, object?>("success", success));
+    }
+}
+```
+
+---
+
+## Acceptance Criteria
+
+- [ ] `ReleaseOrchestratorPluginContext` wraps base context with domain services
+- [ ] `TenantSecretResolver` resolves secrets with tenant isolation
+- [ ] Secret reference patterns (`${vault:...}`) resolved in configuration
+- [ ] `StepExecutor` executes steps with full context integration
+- [ ] `GateEvaluator` evaluates gates with evidence collection
+- [ ] Audit logging for all plugin executions
+- [ ] Evidence collection for steps and gates
+- [ ] Plugin metrics exposed via OpenTelemetry
+- [ ] Unit test coverage >= 85%
+- [ ] Integration tests with mock plugins
+
+---
+
+## Test Plan
+
+### Unit Tests
+
+| Test | Description |
+|------|-------------|
+| `TenantSecretResolver_IsolatesTenants` | Tenant isolation works |
+| `SecretPattern_ResolvedInConfig` | `${vault:...}` patterns resolved |
+| `StepExecutor_RecordsAuditLogs` | Audit logging works |
+| `StepExecutor_CollectsEvidence` | Evidence collected on success |
+| `StepExecutor_HandlesTimeout` | Timeout handling works |
+| `GateEvaluator_ReturnsOverrideInfo` | Override permissions returned |
+| `PluginMonitor_CollectsMetrics` | Metrics recorded correctly |
+
+### Integration Tests
+
+| Test | Description |
+|------|-------------|
+| `StepExecutor_ExecutesBuiltInStep` | Built-in step execution |
+| `GateEvaluator_EvaluatesBuiltInGate` | Built-in gate evaluation |
+| `EvidenceCollector_PersistsEvidence` | Evidence persisted to storage |
+
+---
+
+## Dependencies
+
+| Dependency | Type | Status |
+|------------|------|--------|
+| 100_002 Plugin Host | Internal | TODO |
+| 100_004 Plugin Sandbox | Internal | TODO |
+| 101_002 Registry Extensions | Internal | TODO |
+| 101_001 Database Schema | Internal | TODO |
+
+---
+
+## Delivery Tracker
+
+| Deliverable | Status | Notes |
+|-------------|--------|-------|
+| ReleaseOrchestratorPluginContext | TODO | |
+| TenantSecretResolver | TODO | |
+| StepExecutor | TODO | |
+| GateEvaluator | TODO | |
+| ConnectorInvoker | TODO | |
+| EvidenceCollector | TODO | |
+| AuditLogger integration | TODO | |
+| ReleaseOrchestratorPluginMonitor | TODO | |
+| Unit tests | TODO | |
+| Integration tests | TODO | |
+
+---
+
+## Execution Log
+
+| Date | Entry |
+|------|-------|
+| 10-Jan-2026 | Sprint created |
+| 10-Jan-2026 | Refocused on Release Orchestrator-specific extensions (builds on Phase 100 core) |
diff --git a/docs/implplan/SPRINT_20260110_101_004_PLUGIN_sdk.md b/docs/implplan/SPRINT_20260110_101_004_PLUGIN_sdk.md
new file mode 100644
index 000000000..05a86a8b8
--- /dev/null
+++ b/docs/implplan/SPRINT_20260110_101_004_PLUGIN_sdk.md
@@ -0,0 +1,1120 @@
+# SPRINT: Plugin SDK Extensions for Release Orchestrator
+
+> **Sprint ID:** 101_004
+> **Module:** PLUGIN
+> **Phase:** 1 - Foundation
+> **Status:** TODO
+> **Parent:** [101_000_INDEX](SPRINT_20260110_101_000_INDEX_foundation.md)
+> **Prerequisites:** [100_012 Plugin SDK](SPRINT_20260110_100_012_PLUGIN_sdk.md), [101_002 Registry Extensions](SPRINT_20260110_101_002_PLUGIN_registry.md), [101_003 Loader Extensions](SPRINT_20260110_101_003_PLUGIN_loader_sandbox.md)
+
+---
+
+## Overview
+
+Extend the unified Plugin SDK (from Phase 100) with Release Orchestrator-specific base classes, project templates, and testing utilities for building workflow steps, promotion gates, and integration connectors.
+
+> **Note:** The core Plugin SDK (`StellaOps.Plugin.Sdk`, base classes, manifest builder, developer tools) is implemented in Phase 100 sprint 100_012. This sprint adds Release Orchestrator domain-specific templates and utilities.
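+To make the intended developer experience concrete, a minimal step plugin written against the `StepPluginBase` surface specified later in this document might look like the sketch below. The `DelayStepPlugin` class, the `sample.delay` step type, and the `SampleSchemas` constants are illustrative names rather than part of the specification, and schema construction is elided:
+
+```csharp
+namespace StellaOps.Plugin.Sample.Step;
+
+/// <summary>
+/// Hypothetical step that waits for a configured number of seconds.
+/// </summary>
+public sealed class DelayStepPlugin : StepPluginBase
+{
+    public override StepPluginMetadata StepMetadata => new(
+        StepType: "sample.delay",
+        DisplayName: "Delay",
+        Description: "Waits for a configured number of seconds.",
+        Category: "sample",
+        ConfigSchema: SampleSchemas.DelayConfig,   // schema construction elided in this sketch
+        OutputSchema: SampleSchemas.DelayOutput,
+        DefaultTimeout: TimeSpan.FromMinutes(1));
+
+    public override async Task<StepResult> ExecuteStepAsync(
+        StepExecutionContext context,
+        CancellationToken ct)
+    {
+        // Secret references in context.Configuration are already resolved by the host
+        if (!context.Configuration.TryGetProperty("seconds", out var secondsElement))
+            return Failure("Configuration is missing required field 'seconds'.");
+
+        var seconds = secondsElement.GetInt32();
+        Log(context, LogLevel.Information, "Delaying for {Seconds}s", seconds);
+
+        await Task.Delay(TimeSpan.FromSeconds(seconds), ct);
+
+        return Success(new Dictionary<string, object?> { ["delayedSeconds"] = seconds });
+    }
+}
+```
+
+Note how the plugin only implements `StepMetadata` and `ExecuteStepAsync`; registration, validation, and the `Success`/`Failure` result helpers come from the base class.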
+ +### Objectives + +- Create base classes for step, gate, and connector plugins +- Create project templates for each plugin type +- Build testing utilities for Release Orchestrator plugins +- Create sample plugins demonstrating best practices +- Add documentation and tutorials + +### Working Directory + +``` +src/ReleaseOrchestrator/ +├── __Libraries/ +│ └── StellaOps.ReleaseOrchestrator.Plugin.Sdk/ +│ ├── Contracts/ +│ │ ├── IStepPlugin.cs +│ │ ├── IGatePlugin.cs +│ │ └── IConnectorPlugin.cs +│ ├── Base/ +│ │ ├── StepPluginBase.cs +│ │ ├── GatePluginBase.cs +│ │ ├── ScmConnectorPluginBase.cs +│ │ ├── RegistryConnectorPluginBase.cs +│ │ ├── VaultConnectorPluginBase.cs +│ │ └── NotifyConnectorPluginBase.cs +│ ├── Testing/ +│ │ ├── StepTestHost.cs +│ │ ├── GateTestHost.cs +│ │ ├── ConnectorTestHost.cs +│ │ ├── MockReleaseContext.cs +│ │ └── MockEnvironmentContext.cs +│ └── StellaOps.ReleaseOrchestrator.Plugin.Sdk.csproj +├── __Templates/ +│ └── stella-orchestrator-plugin/ +│ ├── template.json +│ ├── StepPlugin/ +│ ├── GatePlugin/ +│ └── ConnectorPlugin/ +└── __Samples/ + ├── StellaOps.Plugin.Sample.Step/ + ├── StellaOps.Plugin.Sample.Gate/ + └── StellaOps.Plugin.Sample.Connector/ +``` + +--- + +## Architecture Reference + +- [Phase 100 Plugin SDK](SPRINT_20260110_100_012_PLUGIN_sdk.md) +- [Plugin System](../modules/release-orchestrator/modules/plugin-system.md) +- [Workflow Engine](../modules/release-orchestrator/modules/workflow-engine.md) + +--- + +## Deliverables + +### Step Plugin Contracts and Base Class + +```csharp +namespace StellaOps.ReleaseOrchestrator.Plugin.Sdk.Contracts; + +/// +/// Complete interface for a workflow step plugin. +/// Combines IPlugin from Phase 100 with IStepProviderCapability. +/// +public interface IStepPlugin : IPlugin, IStepProviderCapability +{ + /// + /// Step metadata for registration. 
+    /// </summary>
+    StepPluginMetadata StepMetadata { get; }
+}
+
+public sealed record StepPluginMetadata(
+    string StepType,
+    string DisplayName,
+    string Description,
+    string Category,
+    JsonSchema ConfigSchema,
+    JsonSchema OutputSchema,
+    bool SupportsRetry = true,
+    TimeSpan? DefaultTimeout = null,
+    IReadOnlyList<string>? RequiredCapabilities = null);
+```
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Plugin.Sdk.Base;
+
+/// <summary>
+/// Base class for workflow step plugins with common functionality.
+/// </summary>
+public abstract class StepPluginBase : PluginBase, IStepPlugin
+{
+    private readonly List<StepDefinition> _stepDefinitions = new();
+
+    /// <summary>
+    /// Step metadata. Derived classes must implement.
+    /// </summary>
+    public abstract StepPluginMetadata StepMetadata { get; }
+
+    /// <summary>
+    /// Register additional step types (if the plugin provides multiple).
+    /// </summary>
+    protected void RegisterStep(StepDefinition definition)
+    {
+        _stepDefinitions.Add(definition);
+    }
+
+    public IReadOnlyList<StepDefinition> GetStepDefinitions()
+    {
+        // Primary step from metadata
+        var primary = new StepDefinition(
+            Type: StepMetadata.StepType,
+            DisplayName: StepMetadata.DisplayName,
+            Description: StepMetadata.Description,
+            Category: StepMetadata.Category,
+            ConfigSchema: StepMetadata.ConfigSchema,
+            OutputSchema: StepMetadata.OutputSchema,
+            RequiredCapabilities: StepMetadata.RequiredCapabilities ?? [],
+            SupportsRetry: StepMetadata.SupportsRetry,
+            DefaultTimeout: StepMetadata.DefaultTimeout ?? TimeSpan.FromMinutes(5));
+
+        return new[] { primary }.Concat(_stepDefinitions).ToList();
+    }
+
+    /// <summary>
+    /// Execute the step. Derived classes must implement.
+    /// </summary>
+    public abstract Task<StepResult> ExecuteStepAsync(
+        StepExecutionContext context,
+        CancellationToken ct);
+
+    public virtual Task<StepValidationResult> ValidateStepConfigAsync(
+        string stepType,
+        JsonElement configuration,
+        CancellationToken ct)
+    {
+        // Default implementation - schema validation only
+        return Task.FromResult(StepValidationResult.Success());
+    }
+
+    public virtual JsonSchema?
GetOutputSchema(string stepType) + { + if (stepType == StepMetadata.StepType) + return StepMetadata.OutputSchema; + + var definition = _stepDefinitions.FirstOrDefault(d => d.Type == stepType); + return definition?.OutputSchema; + } + + // Helper methods for derived classes + + /// + /// Write structured output. + /// + protected static Task WriteOutputAsync( + StepExecutionContext context, + string key, + object value) + { + return context.OutputWriter.WriteAsync(key, value); + } + + /// + /// Write log message. + /// + protected static void Log( + StepExecutionContext context, + LogLevel level, + string message, + params object[] args) + { + context.Logger.Log(level, message, args); + } + + /// + /// Create a success result. + /// + protected static StepResult Success(IReadOnlyDictionary? outputs = null) + { + return new StepResult( + Status: StepStatus.Succeeded, + Outputs: outputs ?? new Dictionary()); + } + + /// + /// Create a failure result. + /// + protected static StepResult Failure(string errorMessage) + { + return new StepResult( + Status: StepStatus.Failed, + Outputs: new Dictionary(), + ErrorMessage: errorMessage); + } + + /// + /// Create a skipped result. + /// + protected static StepResult Skipped(string reason) + { + return new StepResult( + Status: StepStatus.Skipped, + Outputs: new Dictionary + { + ["skipReason"] = reason + }); + } +} +``` + +### Gate Plugin Contracts and Base Class + +```csharp +namespace StellaOps.ReleaseOrchestrator.Plugin.Sdk.Contracts; + +/// +/// Complete interface for a promotion gate plugin. +/// +public interface IGatePlugin : IPlugin, IGateProviderCapability +{ + /// + /// Gate metadata for registration. + /// + GatePluginMetadata GateMetadata { get; } +} + +public sealed record GatePluginMetadata( + string GateType, + string DisplayName, + string Description, + string Category, + JsonSchema ConfigSchema, + bool IsBlocking = true, + bool SupportsOverride = true, + IReadOnlyList? 
RequiredPermissions = null); +``` + +```csharp +namespace StellaOps.ReleaseOrchestrator.Plugin.Sdk.Base; + +/// +/// Base class for promotion gate plugins. +/// +public abstract class GatePluginBase : PluginBase, IGatePlugin +{ + private readonly List _gateDefinitions = new(); + + public abstract GatePluginMetadata GateMetadata { get; } + + protected void RegisterGate(GateDefinition definition) + { + _gateDefinitions.Add(definition); + } + + public IReadOnlyList GetGateDefinitions() + { + var primary = new GateDefinition( + Type: GateMetadata.GateType, + DisplayName: GateMetadata.DisplayName, + Description: GateMetadata.Description, + Category: GateMetadata.Category, + ConfigSchema: GateMetadata.ConfigSchema, + IsBlocking: GateMetadata.IsBlocking, + SupportsOverride: GateMetadata.SupportsOverride, + RequiredPermissions: GateMetadata.RequiredPermissions ?? []); + + return new[] { primary }.Concat(_gateDefinitions).ToList(); + } + + /// + /// Evaluate the gate. Derived classes must implement. + /// + public abstract Task EvaluateGateAsync( + GateEvaluationContext context, + CancellationToken ct); + + public virtual Task ValidateGateConfigAsync( + string gateType, + JsonElement configuration, + CancellationToken ct) + { + return Task.FromResult(GateValidationResult.Success()); + } + + // Helper methods + + /// + /// Create a pass result. + /// + protected static GateResult Pass( + string message, + IReadOnlyDictionary? details = null, + IReadOnlyList? evidence = null) + { + return new GateResult( + Status: GateStatus.Passed, + Message: message, + Details: details ?? new Dictionary(), + Evidence: evidence); + } + + /// + /// Create a fail result. + /// + protected static GateResult Fail( + string message, + IReadOnlyDictionary? details = null, + IReadOnlyList? evidence = null) + { + return new GateResult( + Status: GateStatus.Failed, + Message: message, + Details: details ?? 
new Dictionary(), + Evidence: evidence); + } + + /// + /// Create a warning result (passes but with advisory). + /// + protected static GateResult Warn( + string message, + IReadOnlyDictionary? details = null, + IReadOnlyList? evidence = null) + { + return new GateResult( + Status: GateStatus.Warning, + Message: message, + Details: details ?? new Dictionary(), + Evidence: evidence); + } + + /// + /// Create a pending result (requires async evaluation). + /// + protected static GateResult Pending(string message) + { + return new GateResult( + Status: GateStatus.Pending, + Message: message, + Details: new Dictionary()); + } + + /// + /// Create evidence from JSON data. + /// + protected static GateEvidence CreateEvidence( + string type, + string description, + object data) + { + return new GateEvidence( + Type: type, + Description: description, + Data: JsonSerializer.SerializeToElement(data)); + } +} +``` + +### Connector Plugin Base Classes + +```csharp +namespace StellaOps.ReleaseOrchestrator.Plugin.Sdk.Base; + +/// +/// Base class for SCM connector plugins. +/// +public abstract class ScmConnectorPluginBase : PluginBase, IPlugin, IScmConnectorCapability +{ + public ConnectorCategory Category => ConnectorCategory.Scm; + + public abstract string ConnectorType { get; } + public abstract string DisplayName { get; } + + public abstract Task ValidateConfigAsync( + JsonElement config, + CancellationToken ct); + + public abstract Task TestConnectionAsync( + ConnectorContext context, + CancellationToken ct); + + public abstract IReadOnlyList GetSupportedOperations(); + + public abstract Task> ListRepositoriesAsync( + ConnectorContext context, + string? 
searchPattern = null, + CancellationToken ct = default); + + public abstract Task GetCommitAsync( + ConnectorContext context, + string repository, + string commitSha, + CancellationToken ct = default); + + public abstract Task CreateWebhookAsync( + ConnectorContext context, + string repository, + IReadOnlyList events, + string callbackUrl, + CancellationToken ct = default); + + public abstract Task> ListReleasesAsync( + ConnectorContext context, + string repository, + CancellationToken ct = default); + + // Helper methods + + /// + /// Get configuration value with type conversion. + /// + protected static T GetConfig(JsonElement config, string key, T defaultValue = default!) + { + if (config.TryGetProperty(key, out var element)) + { + try + { + return element.Deserialize() ?? defaultValue; + } + catch + { + return defaultValue; + } + } + return defaultValue; + } + + /// + /// Validate required configuration fields. + /// + protected static ConfigValidationResult ValidateRequired( + JsonElement config, + params string[] requiredFields) + { + var errors = new List(); + + foreach (var field in requiredFields) + { + if (!config.TryGetProperty(field, out var element) || + element.ValueKind == JsonValueKind.Null || + (element.ValueKind == JsonValueKind.String && string.IsNullOrEmpty(element.GetString()))) + { + errors.Add($"Required field '{field}' is missing or empty"); + } + } + + return errors.Count == 0 + ? ConfigValidationResult.Success() + : ConfigValidationResult.Failure(errors.ToArray()); + } + + /// + /// Create a connection test result. + /// + protected static ConnectionTestResult ConnectionSuccess(TimeSpan responseTime) + { + return new ConnectionTestResult(true, "Connection successful", responseTime); + } + + protected static ConnectionTestResult ConnectionFailed(string message) + { + return new ConnectionTestResult(false, message, TimeSpan.Zero); + } +} + +/// +/// Base class for registry connector plugins. 
+/// +public abstract class RegistryConnectorPluginBase : PluginBase, IPlugin, IRegistryConnectorCapability +{ + public ConnectorCategory Category => ConnectorCategory.Registry; + + public abstract string ConnectorType { get; } + public abstract string DisplayName { get; } + + public abstract Task ValidateConfigAsync( + JsonElement config, + CancellationToken ct); + + public abstract Task TestConnectionAsync( + ConnectorContext context, + CancellationToken ct); + + public abstract IReadOnlyList GetSupportedOperations(); + + public abstract Task> ListRepositoriesAsync( + ConnectorContext context, + string? prefix = null, + CancellationToken ct = default); + + public abstract Task> ListTagsAsync( + ConnectorContext context, + string repository, + CancellationToken ct = default); + + public abstract Task ResolveTagAsync( + ConnectorContext context, + string repository, + string tag, + CancellationToken ct = default); + + public abstract Task GetManifestAsync( + ConnectorContext context, + string repository, + string reference, + CancellationToken ct = default); + + public abstract Task GetPullCredentialsAsync( + ConnectorContext context, + string repository, + CancellationToken ct = default); + + // Helper methods delegated to ScmConnectorPluginBase (static members are not inherited across unrelated base classes) + protected static T GetConfig(JsonElement config, string key, T defaultValue = default!) 
=> + ScmConnectorPluginBase.GetConfig(config, key, defaultValue); + + protected static ConfigValidationResult ValidateRequired(JsonElement config, params string[] fields) => + ScmConnectorPluginBase.ValidateRequired(config, fields); + + protected static ConnectionTestResult ConnectionSuccess(TimeSpan responseTime) => + ScmConnectorPluginBase.ConnectionSuccess(responseTime); + + protected static ConnectionTestResult ConnectionFailed(string message) => + ScmConnectorPluginBase.ConnectionFailed(message); +} +``` + +### Testing Utilities + +```csharp +namespace StellaOps.ReleaseOrchestrator.Plugin.Sdk.Testing; + +/// +/// Test host for step plugin unit testing. +/// +public sealed class StepTestHost : IAsyncDisposable +{ + private readonly IStepPlugin _plugin; + private readonly IPluginContext _context; + + private StepTestHost(IStepPlugin plugin, IPluginContext context) + { + _plugin = plugin; + _context = context; + } + + /// + /// Create a test host for a step plugin. + /// + public static async Task CreateAsync( + Action? configureContext = null) + where TPlugin : IStepPlugin, new() + { + var context = new MockPluginContext(); + configureContext?.Invoke(context); + + var plugin = new TPlugin(); + await plugin.InitializeAsync(context, CancellationToken.None); + + return new StepTestHost(plugin, context); + } + + /// + /// Execute the step with test inputs. + /// + public Task ExecuteAsync( + JsonElement? configuration = null, + IReadOnlyDictionary? inputs = null, + CancellationToken ct = default) + { + var context = new StepExecutionContext( + StepId: Guid.NewGuid(), + WorkflowRunId: Guid.NewGuid(), + TenantId: Guid.NewGuid(), + StepType: _plugin.StepMetadata.StepType, + Configuration: configuration ?? JsonDocument.Parse("{}").RootElement, + Inputs: inputs ?? new Dictionary(), + OutputWriter: new MockStepOutputWriter(), + Logger: new MockPluginLogger()); + + return _plugin.ExecuteStepAsync(context, ct); + } + + /// + /// Execute with strongly-typed configuration. 
+ /// + public Task ExecuteAsync( + TConfig configuration, + IReadOnlyDictionary? inputs = null, + CancellationToken ct = default) + { + var configJson = JsonSerializer.SerializeToElement(configuration); + return ExecuteAsync(configJson, inputs, ct); + } + + /// + /// Validate step configuration. + /// + public Task ValidateConfigAsync( + JsonElement configuration, + CancellationToken ct = default) + { + return _plugin.ValidateStepConfigAsync( + _plugin.StepMetadata.StepType, + configuration, + ct); + } + + public async ValueTask DisposeAsync() + { + await _plugin.DisposeAsync(); + } +} + +/// +/// Test host for gate plugin unit testing. +/// +public sealed class GateTestHost : IAsyncDisposable +{ + private readonly IGatePlugin _plugin; + private readonly IPluginContext _context; + + private GateTestHost(IGatePlugin plugin, IPluginContext context) + { + _plugin = plugin; + _context = context; + } + + public static async Task CreateAsync( + Action? configureContext = null) + where TPlugin : IGatePlugin, new() + { + var context = new MockPluginContext(); + configureContext?.Invoke(context); + + var plugin = new TPlugin(); + await plugin.InitializeAsync(context, CancellationToken.None); + + return new GateTestHost(plugin, context); + } + + /// + /// Evaluate the gate with test context. + /// + public Task EvaluateAsync( + JsonElement? configuration = null, + ReleaseInfo? release = null, + EnvironmentInfo? environment = null, + CancellationToken ct = default) + { + var context = new GateEvaluationContext( + GateId: Guid.NewGuid(), + PromotionId: Guid.NewGuid(), + ReleaseId: Guid.NewGuid(), + SourceEnvironmentId: Guid.NewGuid(), + TargetEnvironmentId: Guid.NewGuid(), + TenantId: Guid.NewGuid(), + GateType: _plugin.GateMetadata.GateType, + Configuration: configuration ?? JsonDocument.Parse("{}").RootElement, + Release: release ?? MockReleaseContext.Create(), + TargetEnvironment: environment ?? 
MockEnvironmentContext.Create(), + Logger: new MockPluginLogger()); + + return _plugin.EvaluateGateAsync(context, ct); + } + + public async ValueTask DisposeAsync() + { + await _plugin.DisposeAsync(); + } +} + +/// +/// Test host for connector plugin unit testing. +/// +public sealed class ConnectorTestHost : IAsyncDisposable + where TConnector : IPlugin, IIntegrationConnectorCapability +{ + private readonly TConnector _plugin; + private readonly MockPluginContext _context; + + private ConnectorTestHost(TConnector plugin, MockPluginContext context) + { + _plugin = plugin; + _context = context; + } + + public TConnector Plugin => _plugin; + + public static async Task> CreateAsync( + TConnector plugin, + Action? configureContext = null) + { + var context = new MockPluginContext(); + configureContext?.Invoke(context); + + await plugin.InitializeAsync(context, CancellationToken.None); + + return new ConnectorTestHost(plugin, context); + } + + /// + /// Create a connector context for testing. + /// + public ConnectorContext CreateContext( + JsonElement? configuration = null, + Dictionary? secrets = null) + { + return new ConnectorContext( + IntegrationId: Guid.NewGuid(), + TenantId: Guid.NewGuid(), + Configuration: configuration ?? JsonDocument.Parse("{}").RootElement, + SecretResolver: new MockSecretResolver(secrets ?? new()), + Logger: new MockPluginLogger()); + } + + /// + /// Test connection with configuration. + /// + public Task TestConnectionAsync( + JsonElement? configuration = null, + CancellationToken ct = default) + { + var context = CreateContext(configuration); + return _plugin.TestConnectionAsync(context, ct); + } + + public async ValueTask DisposeAsync() + { + await _plugin.DisposeAsync(); + } +} + +/// +/// Mock release context for testing. +/// +public static class MockReleaseContext +{ + public static ReleaseInfo Create( + string? version = null, + IReadOnlyList? 
components = null) + { + return new ReleaseInfo( + Id: Guid.NewGuid(), + Name: "test-release", + Version: version ?? "1.0.0", + CreatedAt: DateTimeOffset.UtcNow, + Components: components ?? new List + { + new ReleaseComponent( + Name: "test-service", + ImageReference: "registry.example.com/test:1.0.0", + Digest: "sha256:abc123", + Tags: new[] { "1.0.0", "latest" }) + }, + Metadata: new Dictionary()); + } +} + +/// +/// Mock environment context for testing. +/// +public static class MockEnvironmentContext +{ + public static EnvironmentInfo Create( + string? name = null, + string? tier = null) + { + return new EnvironmentInfo( + Id: Guid.NewGuid(), + Name: name ?? "test-env", + Tier: tier ?? "development", + Variables: new Dictionary(), + Metadata: new Dictionary()); + } +} +``` + +### Project Template + +```json +// template.json +{ + "$schema": "http://json.schemastore.org/template", + "author": "Stella Ops", + "classifications": ["Stella", "Plugin", "Release Orchestrator", "Step", "Gate", "Connector"], + "identity": "StellaOps.ReleaseOrchestrator.Plugin.Templates", + "name": "Stella Ops Release Orchestrator Plugin", + "shortName": "stella-orchestrator-plugin", + "tags": { + "language": "C#", + "type": "project" + }, + "sourceName": "StellaOps.Plugin.Template", + "preferNameDirectory": true, + "symbols": { + "pluginType": { + "type": "parameter", + "datatype": "choice", + "choices": [ + { "choice": "step", "description": "Workflow step plugin" }, + { "choice": "gate", "description": "Promotion gate plugin" }, + { "choice": "scm", "description": "SCM connector plugin" }, + { "choice": "registry", "description": "Container registry connector plugin" }, + { "choice": "vault", "description": "Secrets vault connector plugin" }, + { "choice": "notify", "description": "Notification connector plugin" } + ], + "defaultValue": "step", + "description": "Type of plugin to create" + }, + "pluginName": { + "type": "parameter", + "datatype": "string", + "defaultValue": 
"my-plugin", + "description": "Plugin identifier (lowercase, alphanumeric, hyphens)" + }, + "author": { + "type": "parameter", + "datatype": "string", + "defaultValue": "Your Name", + "description": "Plugin author name" + } + }, + "sources": [ + { + "modifiers": [ + { "condition": "(pluginType == 'step')", "include": ["StepPlugin/**/*"] }, + { "condition": "(pluginType == 'gate')", "include": ["GatePlugin/**/*"] }, + { "condition": "(pluginType == 'scm')", "include": ["ScmConnectorPlugin/**/*"] }, + { "condition": "(pluginType == 'registry')", "include": ["RegistryConnectorPlugin/**/*"] }, + { "condition": "(pluginType == 'vault')", "include": ["VaultConnectorPlugin/**/*"] }, + { "condition": "(pluginType == 'notify')", "include": ["NotifyConnectorPlugin/**/*"] } + ] + } + ] +} +``` + +### Sample Step Plugin + +```csharp +// Sample: HttpRequestStep - executes HTTP requests as workflow step +namespace StellaOps.Plugin.Sample.Step; + +[Plugin( + id: "com.stellaops.step.http-request", + name: "HTTP Request Step", + version: "1.0.0", + vendor: "Stella Ops")] +[ProvidesCapability(PluginCapabilities.WorkflowStep, CapabilityId = "http-request")] +[RequiresCapability(PluginCapabilities.Network)] +public sealed class HttpRequestStep : StepPluginBase +{ + private HttpClient? 
_httpClient; + + public override PluginInfo Info => new( + Id: "com.stellaops.step.http-request", + Name: "HTTP Request Step", + Version: "1.0.0", + Vendor: "Stella Ops", + Description: "Execute HTTP requests as part of workflow"); + + public override PluginTrustLevel TrustLevel => PluginTrustLevel.BuiltIn; + public override PluginCapabilities Capabilities => + PluginCapabilities.WorkflowStep | PluginCapabilities.Network; + + public override StepPluginMetadata StepMetadata => new( + StepType: "http-request", + DisplayName: "HTTP Request", + Description: "Execute an HTTP request and capture the response", + Category: "Integration", + ConfigSchema: CreateConfigSchema(), + OutputSchema: CreateOutputSchema(), + SupportsRetry: true, + DefaultTimeout: TimeSpan.FromSeconds(30)); + + protected override Task InitializeCoreAsync(IPluginContext context, CancellationToken ct) + { + _httpClient = context.HttpClientFactory.CreateClient("HttpRequestStep"); + return Task.CompletedTask; + } + + public override async Task ExecuteStepAsync( + StepExecutionContext context, + CancellationToken ct) + { + var config = context.Configuration; + + var method = GetConfig(config, "method", "GET"); + var url = GetConfig(config, "url", ""); + var headers = GetConfig(config, "headers", new()); + var body = GetConfig(config, "body", null); + var expectedStatus = GetConfig(config, "expectedStatus", null); + + if (string.IsNullOrEmpty(url)) + { + return Failure("URL is required"); + } + + Log(context, LogLevel.Information, "Executing {Method} request to {Url}", method, url); + + var request = new HttpRequestMessage(new HttpMethod(method), url); + + foreach (var header in headers) + { + request.Headers.TryAddWithoutValidation(header.Key, header.Value); + } + + if (!string.IsNullOrEmpty(body)) + { + request.Content = new StringContent(body, Encoding.UTF8, "application/json"); + } + + var response = await _httpClient!.SendAsync(request, ct); + + var responseBody = await 
response.Content.ReadAsStringAsync(ct); + var statusCode = (int)response.StatusCode; + + await WriteOutputAsync(context, "statusCode", statusCode); + await WriteOutputAsync(context, "body", responseBody); + await WriteOutputAsync(context, "headers", + response.Headers.ToDictionary(h => h.Key, h => string.Join(", ", h.Value))); + + if (expectedStatus.HasValue && statusCode != expectedStatus.Value) + { + return Failure($"Expected status {expectedStatus.Value} but got {statusCode}"); + } + + if (!response.IsSuccessStatusCode) + { + return Failure($"Request failed with status {statusCode}"); + } + + return Success(new Dictionary + { + ["statusCode"] = statusCode, + ["body"] = responseBody + }); + } + + private static JsonSchema CreateConfigSchema() + { + return JsonSchema.Parse(""" + { + "type": "object", + "properties": { + "method": { "type": "string", "enum": ["GET", "POST", "PUT", "DELETE", "PATCH"] }, + "url": { "type": "string", "format": "uri" }, + "headers": { "type": "object", "additionalProperties": { "type": "string" } }, + "body": { "type": "string" }, + "expectedStatus": { "type": "integer" } + }, + "required": ["url"] + } + """); + } + + private static JsonSchema CreateOutputSchema() + { + return JsonSchema.Parse(""" + { + "type": "object", + "properties": { + "statusCode": { "type": "integer" }, + "body": { "type": "string" }, + "headers": { "type": "object" } + } + } + """); + } + + private static T GetConfig(JsonElement config, string key, T defaultValue) + { + if (config.TryGetProperty(key, out var element)) + { + try { return element.Deserialize() ?? 
defaultValue; } + catch { return defaultValue; } + } + return defaultValue; + } + + public override ValueTask DisposeAsync() + { + _httpClient?.Dispose(); + return base.DisposeAsync(); + } +} +``` + +--- + +## NuGet Package Configuration + +```xml + + + + net10.0 + enable + enable + preview + + StellaOps.ReleaseOrchestrator.Plugin.Sdk + 1.0.0 + Stella Ops + Stella Ops + SDK for building Stella Ops Release Orchestrator plugins (steps, gates, connectors) + stellaops;plugin;step;gate;connector;release;orchestrator;workflow + AGPL-3.0-or-later + https://stellaops.io + https://git.stella-ops.org/stella-ops.org/git.stella-ops.org + git + README.md + true + + + + + + + + + + + +``` + +--- + +## Acceptance Criteria + +- [ ] `IStepPlugin` and `StepPluginBase` enable step development +- [ ] `IGatePlugin` and `GatePluginBase` enable gate development +- [ ] Connector base classes for all categories (SCM, Registry, Vault, Notify) +- [ ] `StepTestHost` enables step unit testing +- [ ] `GateTestHost` enables gate unit testing +- [ ] `ConnectorTestHost` enables connector unit testing +- [ ] Mock contexts for releases and environments +- [ ] Project templates install via `dotnet new` +- [ ] Sample plugins demonstrate best practices +- [ ] SDK documentation complete +- [ ] NuGet package builds + +--- + +## Test Plan + +### Unit Tests + +| Test | Description | +|------|-------------| +| `StepPluginBase_ProvidesDefinitions` | Step definitions registered | +| `GatePluginBase_ProvidesDefinitions` | Gate definitions registered | +| `StepTestHost_ExecutesStep` | Test host works | +| `GateTestHost_EvaluatesGate` | Test host works | +| `MockContexts_ProvideDefaults` | Mock contexts usable | + +### Integration Tests + +| Test | Description | +|------|-------------| +| `SampleStepPlugin_ExecutesCorrectly` | HTTP request step works | +| `TemplateProject_Builds` | Template generates valid project | + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 
100_012 Plugin SDK | Internal | TODO | +| 101_002 Registry Extensions | Internal | TODO | +| 101_003 Loader Extensions | Internal | TODO | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| IStepPlugin interface | TODO | | +| StepPluginBase | TODO | | +| IGatePlugin interface | TODO | | +| GatePluginBase | TODO | | +| Connector base classes (5) | TODO | | +| StepTestHost | TODO | | +| GateTestHost | TODO | | +| ConnectorTestHost | TODO | | +| Mock contexts | TODO | | +| Project templates | TODO | | +| Sample plugins (3) | TODO | | +| NuGet package | TODO | | +| SDK documentation | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | +| 10-Jan-2026 | Refocused on Release Orchestrator-specific SDK extensions (builds on Phase 100 core) | diff --git a/docs/implplan/SPRINT_20260110_102_000_INDEX_integration_hub.md b/docs/implplan/SPRINT_20260110_102_000_INDEX_integration_hub.md new file mode 100644 index 000000000..cf648e2ec --- /dev/null +++ b/docs/implplan/SPRINT_20260110_102_000_INDEX_integration_hub.md @@ -0,0 +1,201 @@ +# SPRINT INDEX: Phase 2 - Integration Hub + +> **Epic:** Release Orchestrator +> **Phase:** 2 - Integration Hub +> **Batch:** 102 +> **Status:** TODO +> **Parent:** [100_000_INDEX](SPRINT_20260110_100_000_INDEX_release_orchestrator.md) + +--- + +## Overview + +Phase 2 builds the Integration Hub - the system for connecting to external SCM, CI, Registry, Vault, and Notification services. Includes the connector runtime and built-in connectors. 
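The fault-tolerance primitives planned for the connector runtime (retry with backoff plus a circuit breaker, see 102_002 below) can be sketched as follows. This is a minimal illustration under stated assumptions only; the class and method names (`FaultTolerantExecutor`, `ExecuteAsync`) are hypothetical and not part of the planned API.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical names for illustration only; not the planned 102_002 API.
public sealed class FaultTolerantExecutor
{
    private readonly int _maxAttempts;
    private readonly TimeSpan _baseDelay = TimeSpan.FromMilliseconds(200);
    private readonly int _breakThreshold;
    private int _consecutiveFailures;
    private DateTimeOffset _openUntil = DateTimeOffset.MinValue;

    public FaultTolerantExecutor(int maxAttempts = 3, int breakThreshold = 5)
    {
        _maxAttempts = maxAttempts;
        _breakThreshold = breakThreshold;
    }

    public async Task<T> ExecuteAsync<T>(
        Func<CancellationToken, Task<T>> call, CancellationToken ct = default)
    {
        // Circuit breaker: fail fast while the circuit is open.
        if (DateTimeOffset.UtcNow < _openUntil)
            throw new InvalidOperationException("Circuit open; failing fast.");

        for (var attempt = 1; ; attempt++)
        {
            try
            {
                var result = await call(ct);
                _consecutiveFailures = 0; // any success closes the circuit
                return result;
            }
            catch (Exception) when (attempt < _maxAttempts)
            {
                // Retry with exponential backoff: 200 ms, 400 ms, 800 ms, ...
                await Task.Delay(_baseDelay * Math.Pow(2, attempt - 1), ct);
            }
            catch
            {
                // All attempts exhausted: count the failure and maybe open the circuit.
                if (++_consecutiveFailures >= _breakThreshold)
                    _openUntil = DateTimeOffset.UtcNow.AddSeconds(30);
                throw;
            }
        }
    }
}
```

A production runtime would additionally layer a rate limiter on top and make attempt counts, delays, and break windows configurable per integration.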
+ +### Objectives + +- Implement Integration Manager for CRUD operations +- Build Connector Runtime for plugin execution +- Create built-in SCM connectors (GitHub, GitLab, Gitea) +- Create built-in Registry connectors (Docker Hub, Harbor, ACR, ECR, GCR) +- Create built-in Vault connector +- Implement Doctor checks for integration health + +--- + +## Sprint Structure + +| Sprint ID | Title | Module | Status | Dependencies | +|-----------|-------|--------|--------|--------------| +| 102_001 | Integration Manager | INTHUB | TODO | 101_002 | +| 102_002 | Connector Runtime | INTHUB | TODO | 102_001 | +| 102_003 | Built-in SCM Connectors | INTHUB | TODO | 102_002 | +| 102_004 | Built-in Registry Connectors | INTHUB | TODO | 102_002 | +| 102_005 | Built-in Vault Connector | INTHUB | TODO | 102_002 | +| 102_006 | Doctor Checks | INTHUB | TODO | 102_002 | + +--- + +## Architecture + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ INTEGRATION HUB │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ INTEGRATION MANAGER (102_001) │ │ +│ │ │ │ +│ │ - Integration CRUD - Config encryption │ │ +│ │ - Health status tracking - Integration events │ │ +│ │ - Tenant isolation - Audit logging │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ CONNECTOR RUNTIME (102_002) │ │ +│ │ │ │ +│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │ +│ │ │ Connector │ │ Connector │ │ Pool │ │ │ +│ │ │ Factory │ │ Pool │ │ Manager │ │ │ +│ │ └─────────────┘ └─────────────┘ └─────────────┘ │ │ +│ │ │ │ +│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │ +│ │ │ Retry │ │ Circuit │ │ Rate │ │ │ +│ │ │ Policy │ │ Breaker │ │ Limiter │ │ │ +│ │ └─────────────┘ └─────────────┘ └─────────────┘ │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ 
┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ BUILT-IN CONNECTORS │ │ +│ │ │ │ +│ │ ┌─────────────────────┐ ┌─────────────────────┐ │ │ +│ │ │ SCM (102_003) │ │ Registry (102_004) │ │ │ +│ │ │ │ │ │ │ │ +│ │ │ - GitHub │ │ - Docker Hub │ │ │ +│ │ │ - GitLab │ │ - Harbor │ │ │ +│ │ │ - Gitea │ │ - ACR / ECR / GCR │ │ │ +│ │ │ - Azure DevOps │ │ - Generic OCI │ │ │ +│ │ └─────────────────────┘ └─────────────────────┘ │ │ +│ │ │ │ +│ │ ┌─────────────────────┐ ┌─────────────────────┐ │ │ +│ │ │ Vault (102_005) │ │ Doctor (102_006) │ │ │ +│ │ │ │ │ │ │ │ +│ │ │ - HashiCorp Vault │ │ - Connectivity │ │ │ +│ │ │ - Azure Key Vault │ │ - Credentials │ │ │ +│ │ │ - AWS Secrets Mgr │ │ - Permissions │ │ │ +│ │ │ │ │ - Rate limits │ │ │ +│ │ └─────────────────────┘ └─────────────────────┘ │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +--- + +## Deliverables Summary + +### 102_001: Integration Manager + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `IIntegrationManager` | Interface | CRUD operations | +| `IntegrationManager` | Class | Implementation | +| `IntegrationStore` | Class | Database persistence | +| `IntegrationEncryption` | Class | Config encryption | +| `IntegrationEvents` | Events | Domain events | + +### 102_002: Connector Runtime + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `IConnectorFactory` | Interface | Creates connectors | +| `ConnectorFactory` | Class | Plugin-aware factory | +| `ConnectorPool` | Class | Connection pooling | +| `ConnectorRetryPolicy` | Class | Retry with backoff | +| `ConnectorCircuitBreaker` | Class | Fault tolerance | +| `ConnectorRateLimiter` | Class | Rate limiting | + +### 102_003: Built-in SCM Connectors + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `GitHubConnector` | 
Connector | GitHub.com / GHE | +| `GitLabConnector` | Connector | GitLab.com / Self-hosted | +| `GiteaConnector` | Connector | Gitea self-hosted | +| `AzureDevOpsConnector` | Connector | Azure DevOps Services | + +### 102_004: Built-in Registry Connectors + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `DockerHubConnector` | Connector | Docker Hub | +| `HarborConnector` | Connector | Harbor registry | +| `AcrConnector` | Connector | Azure Container Registry | +| `EcrConnector` | Connector | AWS ECR | +| `GcrConnector` | Connector | Google Container Registry | +| `GenericOciConnector` | Connector | Any OCI-compliant registry | + +### 102_005: Built-in Vault Connector + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `HashiCorpVaultConnector` | Connector | HashiCorp Vault | +| `AzureKeyVaultConnector` | Connector | Azure Key Vault | +| `AwsSecretsManagerConnector` | Connector | AWS Secrets Manager | + +### 102_006: Doctor Checks + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `IDoctorCheck` | Interface | Health check contract | +| `ConnectivityCheck` | Check | Network connectivity | +| `CredentialsCheck` | Check | Credential validity | +| `PermissionsCheck` | Check | Required permissions | +| `RateLimitCheck` | Check | Rate limit status | +| `DoctorReport` | Record | Aggregated results | + +--- + +## Dependencies + +### External Dependencies + +| Dependency | Purpose | +|------------|---------| +| Octokit | GitHub API client | +| GitLabApiClient | GitLab API client | +| AWSSDK.* | AWS service clients | +| Azure.* | Azure service clients | +| Docker.DotNet | Docker API client | + +### Internal Dependencies + +| Module | Purpose | +|--------|---------| +| 101_002 Plugin Registry | Plugin discovery | +| 101_003 Plugin Loader | Plugin execution | +| Authority | Tenant context, credentials | + +--- + +## Acceptance Criteria + +- [ ] Integration CRUD operations work +- [ ] 
Config encryption with tenant keys +- [ ] Connector factory creates correct instances +- [ ] Connection pooling reduces overhead +- [ ] Retry policy handles transient failures +- [ ] Circuit breaker prevents cascading failures +- [ ] All built-in SCM connectors work +- [ ] All built-in registry connectors work +- [ ] Vault connectors retrieve secrets +- [ ] Doctor checks identify issues +- [ ] Unit test coverage ≥80% +- [ ] Integration tests pass + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Phase 2 index created | diff --git a/docs/implplan/SPRINT_20260110_102_001_INTHUB_integration_manager.md b/docs/implplan/SPRINT_20260110_102_001_INTHUB_integration_manager.md new file mode 100644 index 000000000..8cd6d9e17 --- /dev/null +++ b/docs/implplan/SPRINT_20260110_102_001_INTHUB_integration_manager.md @@ -0,0 +1,328 @@ +# SPRINT: Integration Manager + +> **Sprint ID:** 102_001 +> **Module:** INTHUB +> **Phase:** 2 - Integration Hub +> **Status:** TODO +> **Parent:** [102_000_INDEX](SPRINT_20260110_102_000_INDEX_integration_hub.md) + +--- + +## Overview + +Implement the Integration Manager for creating, updating, and managing integrations with external systems. Includes encrypted configuration storage and tenant isolation. 
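One way to realize the encrypted configuration storage described above is authenticated encryption under a per-tenant key. The sketch below uses AES-GCM with a `nonce || tag || ciphertext` envelope; it is illustrative only. The `ConfigCrypto` name and envelope layout are assumptions, and the draft `IntegrationEncryption` deliverable later in this sprint uses AES-CBC with a prepended IV, which encrypts but does not authenticate the ciphertext.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Illustrative sketch: per-tenant AES-GCM envelope (nonce || tag || ciphertext).
public static class ConfigCrypto
{
    public static byte[] Encrypt(byte[] tenantKey, string configJson)
    {
        var nonce = RandomNumberGenerator.GetBytes(12); // standard AES-GCM nonce size
        var plaintext = Encoding.UTF8.GetBytes(configJson);
        var ciphertext = new byte[plaintext.Length];
        var tag = new byte[16];

        using var gcm = new AesGcm(tenantKey, tagSizeInBytes: 16);
        gcm.Encrypt(nonce, plaintext, ciphertext, tag);

        // Envelope layout: 12-byte nonce, 16-byte tag, then ciphertext.
        var envelope = new byte[12 + 16 + ciphertext.Length];
        nonce.CopyTo(envelope, 0);
        tag.CopyTo(envelope, 12);
        ciphertext.CopyTo(envelope.AsSpan(28));
        return envelope;
    }

    public static string Decrypt(byte[] tenantKey, byte[] envelope)
    {
        var nonce = envelope.AsSpan(0, 12);
        var tag = envelope.AsSpan(12, 16);
        var ciphertext = envelope.AsSpan(28);
        var plaintext = new byte[ciphertext.Length];

        using var gcm = new AesGcm(tenantKey, tagSizeInBytes: 16);
        gcm.Decrypt(nonce, ciphertext, tag, plaintext); // throws if tampered
        return Encoding.UTF8.GetString(plaintext);
    }
}
```

The authenticated tag means any bit-flip in the stored blob fails decryption outright, which is usually preferable for secrets at rest; the tenant key itself would come from the key provider abstraction this sprint defines.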
+ +### Objectives + +- CRUD operations for integrations +- Encrypted configuration storage +- Health status tracking +- Tenant isolation +- Domain events for integration changes + +### Working Directory + +``` +src/ReleaseOrchestrator/ +├── __Libraries/ +│ └── StellaOps.ReleaseOrchestrator.IntegrationHub/ +│ ├── Manager/ +│ │ ├── IIntegrationManager.cs +│ │ ├── IntegrationManager.cs +│ │ └── IntegrationValidator.cs +│ ├── Store/ +│ │ ├── IIntegrationStore.cs +│ │ ├── IntegrationStore.cs +│ │ └── IntegrationMapper.cs +│ ├── Encryption/ +│ │ ├── IIntegrationEncryption.cs +│ │ └── IntegrationEncryption.cs +│ ├── Events/ +│ │ ├── IntegrationCreated.cs +│ │ ├── IntegrationUpdated.cs +│ │ └── IntegrationDeleted.cs +│ └── Models/ +│ ├── Integration.cs +│ ├── IntegrationType.cs +│ └── HealthStatus.cs +└── __Tests/ + └── StellaOps.ReleaseOrchestrator.IntegrationHub.Tests/ + └── Manager/ +``` + +--- + +## Architecture Reference + +- [Integration Hub](../modules/release-orchestrator/modules/integration-hub.md) +- [Security Overview](../modules/release-orchestrator/security/overview.md) + +--- + +## Deliverables + +### IIntegrationManager Interface + +```csharp +namespace StellaOps.ReleaseOrchestrator.IntegrationHub.Manager; + +public interface IIntegrationManager +{ + Task CreateAsync( + CreateIntegrationRequest request, + CancellationToken ct = default); + + Task UpdateAsync( + Guid id, + UpdateIntegrationRequest request, + CancellationToken ct = default); + + Task DeleteAsync(Guid id, CancellationToken ct = default); + + Task GetAsync(Guid id, CancellationToken ct = default); + + Task GetByNameAsync( + string name, + CancellationToken ct = default); + + Task> ListAsync( + IntegrationFilter? 
filter = null, + CancellationToken ct = default); + + Task> ListByTypeAsync( + IntegrationType type, + CancellationToken ct = default); + + Task SetEnabledAsync(Guid id, bool enabled, CancellationToken ct = default); + + Task UpdateHealthAsync( + Guid id, + HealthStatus status, + CancellationToken ct = default); + + Task TestConnectionAsync( + Guid id, + CancellationToken ct = default); +} + +public sealed record CreateIntegrationRequest( + string Name, + string DisplayName, + IntegrationType Type, + JsonElement Configuration +); + +public sealed record UpdateIntegrationRequest( + string? DisplayName = null, + JsonElement? Configuration = null, + bool? IsEnabled = null +); + +public sealed record IntegrationFilter( + IntegrationType? Type = null, + bool? IsEnabled = null, + HealthStatus? HealthStatus = null +); +``` + +### Integration Model + +```csharp +namespace StellaOps.ReleaseOrchestrator.IntegrationHub.Models; + +public sealed record Integration +{ + public required Guid Id { get; init; } + public required Guid TenantId { get; init; } + public required string Name { get; init; } + public required string DisplayName { get; init; } + public required IntegrationType Type { get; init; } + public required bool IsEnabled { get; init; } + public required HealthStatus HealthStatus { get; init; } + public DateTimeOffset? LastHealthCheck { get; init; } + public DateTimeOffset CreatedAt { get; init; } + public DateTimeOffset UpdatedAt { get; init; } + public Guid CreatedBy { get; init; } + + // Configuration is decrypted on demand, not stored in memory + public JsonElement? 
Configuration { get; init; } +} + +public enum IntegrationType +{ + Scm, + Ci, + Registry, + Vault, + Notify +} + +public enum HealthStatus +{ + Unknown, + Healthy, + Degraded, + Unhealthy +} +``` + +### IntegrationEncryption + +```csharp +namespace StellaOps.ReleaseOrchestrator.IntegrationHub.Encryption; + +public interface IIntegrationEncryption +{ + Task EncryptAsync( + Guid tenantId, + JsonElement configuration, + CancellationToken ct = default); + + Task DecryptAsync( + Guid tenantId, + byte[] encryptedConfig, + CancellationToken ct = default); +} + +public sealed class IntegrationEncryption : IIntegrationEncryption +{ + private readonly ITenantKeyProvider _keyProvider; + + public IntegrationEncryption(ITenantKeyProvider keyProvider) + { + _keyProvider = keyProvider; + } + + public async Task EncryptAsync( + Guid tenantId, + JsonElement configuration, + CancellationToken ct = default) + { + var key = await _keyProvider.GetKeyAsync(tenantId, ct); + var json = configuration.GetRawText(); + var plaintext = Encoding.UTF8.GetBytes(json); + + using var aes = Aes.Create(); + aes.Key = key; + aes.GenerateIV(); + + using var encryptor = aes.CreateEncryptor(); + var ciphertext = encryptor.TransformFinalBlock( + plaintext, 0, plaintext.Length); + + // Prepend IV to ciphertext + var result = new byte[aes.IV.Length + ciphertext.Length]; + aes.IV.CopyTo(result, 0); + ciphertext.CopyTo(result, aes.IV.Length); + + return result; + } + + public async Task DecryptAsync( + Guid tenantId, + byte[] encryptedConfig, + CancellationToken ct = default) + { + var key = await _keyProvider.GetKeyAsync(tenantId, ct); + + using var aes = Aes.Create(); + aes.Key = key; + + // Extract IV from beginning + var iv = encryptedConfig[..16]; + var ciphertext = encryptedConfig[16..]; + aes.IV = iv; + + using var decryptor = aes.CreateDecryptor(); + var plaintext = decryptor.TransformFinalBlock( + ciphertext, 0, ciphertext.Length); + + var json = Encoding.UTF8.GetString(plaintext); + return 
JsonDocument.Parse(json).RootElement; + } +} +``` + +### Domain Events + +```csharp +namespace StellaOps.ReleaseOrchestrator.IntegrationHub.Events; + +public sealed record IntegrationCreated( + Guid IntegrationId, + Guid TenantId, + string Name, + IntegrationType Type, + DateTimeOffset CreatedAt, + Guid CreatedBy +) : IDomainEvent; + +public sealed record IntegrationUpdated( + Guid IntegrationId, + Guid TenantId, + IReadOnlyList ChangedFields, + DateTimeOffset UpdatedAt, + Guid UpdatedBy +) : IDomainEvent; + +public sealed record IntegrationDeleted( + Guid IntegrationId, + Guid TenantId, + string Name, + DateTimeOffset DeletedAt, + Guid DeletedBy +) : IDomainEvent; + +public sealed record IntegrationHealthChanged( + Guid IntegrationId, + Guid TenantId, + HealthStatus OldStatus, + HealthStatus NewStatus, + DateTimeOffset ChangedAt +) : IDomainEvent; +``` + +--- + +## Acceptance Criteria + +- [ ] Create integration with encrypted config +- [ ] Update integration preserves encryption +- [ ] Delete integration removes all data +- [ ] List integrations with filtering +- [ ] Tenant isolation enforced +- [ ] Domain events published +- [ ] Health status tracked +- [ ] Connection test works +- [ ] Unit test coverage ≥85% + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 101_001 Database Schema | Internal | TODO | +| 101_002 Plugin Registry | Internal | TODO | +| Authority | Internal | Exists | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| IIntegrationManager | TODO | | +| IntegrationManager | TODO | | +| IntegrationStore | TODO | | +| IntegrationEncryption | TODO | | +| Domain events | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_102_002_INTHUB_connector_runtime.md b/docs/implplan/SPRINT_20260110_102_002_INTHUB_connector_runtime.md new file mode 
100644 index 000000000..59e30ea04 --- /dev/null +++ b/docs/implplan/SPRINT_20260110_102_002_INTHUB_connector_runtime.md @@ -0,0 +1,522 @@ +# SPRINT: Connector Runtime + +> **Sprint ID:** 102_002 +> **Module:** INTHUB +> **Phase:** 2 - Integration Hub +> **Status:** TODO +> **Parent:** [102_000_INDEX](SPRINT_20260110_102_000_INDEX_integration_hub.md) + +--- + +## Overview + +Build the Connector Runtime that manages connector instantiation, pooling, and resilience patterns. It handles both built-in and plugin connectors uniformly. + +### Objectives + +- Connector factory for creating instances +- Connection pooling for efficiency +- Retry policies with exponential backoff +- Circuit breaker for fault isolation +- Rate limiting per integration + +### Working Directory + +``` +src/ReleaseOrchestrator/ +├── __Libraries/ +│ └── StellaOps.ReleaseOrchestrator.IntegrationHub/ +│ └── Runtime/ +│ ├── IConnectorFactory.cs +│ ├── ConnectorFactory.cs +│ ├── ConnectorPool.cs +│ ├── ConnectorPoolManager.cs +│ ├── ConnectorRetryPolicy.cs +│ ├── ConnectorCircuitBreaker.cs +│ ├── ConnectorRateLimiter.cs +│ └── ConnectorContext.cs +└── __Tests/ + └── StellaOps.ReleaseOrchestrator.IntegrationHub.Tests/ + └── Runtime/ +``` + +--- + +## Deliverables + +### IConnectorFactory + +```csharp +namespace StellaOps.ReleaseOrchestrator.IntegrationHub.Runtime; + +public interface IConnectorFactory +{ + Task<IConnectorPlugin> CreateAsync( + Integration integration, + CancellationToken ct = default); + + Task<T> CreateAsync<T>( + Integration integration, + CancellationToken ct = default) where T : IConnectorPlugin; + + bool CanCreate(IntegrationType type, string? 
pluginName = null); + + IReadOnlyList<string> GetAvailableConnectors(IntegrationType type); +} + +public sealed class ConnectorFactory : IConnectorFactory +{ + private readonly IPluginRegistry _pluginRegistry; + private readonly IPluginLoader _pluginLoader; + private readonly IIntegrationEncryption _encryption; + private readonly IServiceProvider _serviceProvider; + private readonly ILogger<ConnectorFactory> _logger; + + private readonly Dictionary<string, Type> _builtInConnectors = new() + { + ["github"] = typeof(GitHubConnector), + ["gitlab"] = typeof(GitLabConnector), + ["gitea"] = typeof(GiteaConnector), + ["dockerhub"] = typeof(DockerHubConnector), + ["harbor"] = typeof(HarborConnector), + ["acr"] = typeof(AcrConnector), + ["ecr"] = typeof(EcrConnector), + ["gcr"] = typeof(GcrConnector), + ["hashicorp-vault"] = typeof(HashiCorpVaultConnector), + ["azure-keyvault"] = typeof(AzureKeyVaultConnector) + }; + + public async Task<IConnectorPlugin> CreateAsync( + Integration integration, + CancellationToken ct = default) + { + var config = await _encryption.DecryptAsync( + integration.TenantId, + integration.EncryptedConfig!, + ct); + + // Check built-in first + var connectorKey = GetConnectorKey(config); + if (_builtInConnectors.TryGetValue(connectorKey, out var type)) + { + return CreateBuiltIn(type, integration, config); + } + + // Fall back to plugin + var plugin = await _pluginLoader.GetPlugin(connectorKey); + if (plugin?.Sandbox is not null) + { + return CreateFromPlugin(plugin, integration, config); + } + + throw new ConnectorNotFoundException( + $"No connector found for type {connectorKey}"); + } +} +``` + +### ConnectorPool + +```csharp +namespace StellaOps.ReleaseOrchestrator.IntegrationHub.Runtime; + +public sealed class ConnectorPool : IAsyncDisposable +{ + private readonly Integration _integration; + private readonly IConnectorFactory _factory; + private readonly Channel<PooledConnector> _available; + private readonly ConcurrentDictionary<Guid, PooledConnector> _inUse = new(); + private readonly int _maxSize; + private int _currentSize; + + public 
ConnectorPool( + Integration integration, + IConnectorFactory factory, + int maxSize = 10) + { + _integration = integration; + _factory = factory; + _maxSize = maxSize; + _available = Channel.CreateBounded<PooledConnector>(maxSize); + } + + public async Task<PooledConnector> AcquireAsync( + CancellationToken ct = default) + { + // Try to get existing + if (_available.Reader.TryRead(out var existing)) + { + existing.MarkInUse(); + _inUse[existing.Id] = existing; + return existing; + } + + // Create new if under limit + if (Interlocked.Increment(ref _currentSize) <= _maxSize) + { + var connector = await _factory.CreateAsync(_integration, ct); + var pooled = new PooledConnector(connector, this); + pooled.MarkInUse(); + _inUse[pooled.Id] = pooled; + return pooled; + } + + Interlocked.Decrement(ref _currentSize); + + // Wait for available + var released = await _available.Reader.ReadAsync(ct); + released.MarkInUse(); + _inUse[released.Id] = released; + return released; + } + + public void Release(PooledConnector connector) + { + _inUse.TryRemove(connector.Id, out _); + connector.MarkAvailable(); + + if (!_available.Writer.TryWrite(connector)) + { + // Pool full, dispose + connector.DisposeConnector(); + Interlocked.Decrement(ref _currentSize); + } + } + + public async ValueTask DisposeAsync() + { + _available.Writer.Complete(); + + await foreach (var connector in _available.Reader.ReadAllAsync()) + { + connector.DisposeConnector(); + } + + foreach (var (_, connector) in _inUse) + { + connector.DisposeConnector(); + } + } +} + +public sealed class PooledConnector : IAsyncDisposable +{ + private readonly ConnectorPool _pool; + private readonly IConnectorPlugin _connector; + + public Guid Id { get; } = Guid.NewGuid(); + public IConnectorPlugin Connector => _connector; + public bool InUse { get; private set; } + public DateTimeOffset LastUsed { get; private set; } + + internal PooledConnector(IConnectorPlugin connector, ConnectorPool pool) + { + _connector = connector; + _pool = pool; + } + + internal void 
MarkInUse() + { + InUse = true; + LastUsed = TimeProvider.System.GetUtcNow(); + } + + internal void MarkAvailable() => InUse = false; + + internal void DisposeConnector() + { + if (_connector is IDisposable disposable) + disposable.Dispose(); + } + + public ValueTask DisposeAsync() + { + _pool.Release(this); + return ValueTask.CompletedTask; + } +} +``` + +### ConnectorRetryPolicy + +```csharp +namespace StellaOps.ReleaseOrchestrator.IntegrationHub.Runtime; + +public sealed class ConnectorRetryPolicy +{ + private readonly int _maxRetries; + private readonly TimeSpan _baseDelay; + private readonly ILogger _logger; + + public ConnectorRetryPolicy( + int maxRetries = 3, + TimeSpan? baseDelay = null, + ILogger? logger = null) + { + _maxRetries = maxRetries; + _baseDelay = baseDelay ?? TimeSpan.FromMilliseconds(200); + _logger = logger ?? NullLogger.Instance; + } + + public async Task<T> ExecuteAsync<T>( + Func<CancellationToken, Task<T>> action, + CancellationToken ct = default) + { + var attempt = 0; + var exceptions = new List<Exception>(); + + while (true) + { + try + { + return await action(ct); + } + catch (Exception ex) when (IsTransient(ex) && attempt < _maxRetries) + { + exceptions.Add(ex); + attempt++; + + var delay = CalculateDelay(attempt); + _logger.LogWarning( + "Connector operation failed (attempt {Attempt}/{Max}), retrying in {Delay}ms: {Error}", + attempt, _maxRetries, delay.TotalMilliseconds, ex.Message); + + await Task.Delay(delay, ct); + } + catch (Exception ex) + { + exceptions.Add(ex); + throw new ConnectorRetryExhaustedException( + $"Operation failed after {attempt + 1} attempts", + new AggregateException(exceptions)); + } + } + } + + private TimeSpan CalculateDelay(int attempt) + { + // Exponential backoff with jitter + var exponential = Math.Pow(2, attempt - 1); + var jitter = Random.Shared.NextDouble() * 0.3 + 0.85; // 0.85-1.15 + return TimeSpan.FromMilliseconds( + _baseDelay.TotalMilliseconds * exponential * jitter); + } + + private static bool IsTransient(Exception ex) => + ex is 
HttpRequestException or + TimeoutException or + TaskCanceledException { CancellationToken.IsCancellationRequested: false } or + OperationCanceledException { CancellationToken.IsCancellationRequested: false }; +} +``` + +### ConnectorCircuitBreaker + +```csharp +namespace StellaOps.ReleaseOrchestrator.IntegrationHub.Runtime; + +public sealed class ConnectorCircuitBreaker +{ + private readonly int _failureThreshold; + private readonly TimeSpan _resetTimeout; + private readonly ILogger _logger; + + private int _failureCount; + private CircuitState _state = CircuitState.Closed; + private DateTimeOffset _lastFailure; + private DateTimeOffset _openedAt; + + public ConnectorCircuitBreaker( + int failureThreshold = 5, + TimeSpan? resetTimeout = null, + ILogger? logger = null) + { + _failureThreshold = failureThreshold; + _resetTimeout = resetTimeout ?? TimeSpan.FromMinutes(1); + _logger = logger ?? NullLogger.Instance; + } + + public CircuitState State => _state; + + public async Task<T> ExecuteAsync<T>( + Func<CancellationToken, Task<T>> action, + CancellationToken ct = default) + { + if (_state == CircuitState.Open) + { + if (ShouldAttemptReset()) + { + _state = CircuitState.HalfOpen; + _logger.LogInformation("Circuit breaker half-open, attempting reset"); + } + else + { + throw new CircuitBreakerOpenException( + $"Circuit breaker is open, retry after {_openedAt.Add(_resetTimeout)}"); + } + } + + try + { + var result = await action(ct); + OnSuccess(); + return result; + } + catch (Exception ex) when (!IsCritical(ex)) + { + OnFailure(); + throw; + } + } + + private void OnSuccess() + { + _failureCount = 0; + if (_state == CircuitState.HalfOpen) + { + _state = CircuitState.Closed; + _logger.LogInformation("Circuit breaker closed after successful request"); + } + } + + private void OnFailure() + { + _lastFailure = TimeProvider.System.GetUtcNow(); + _failureCount++; + + if (_state == CircuitState.HalfOpen) + { + _state = CircuitState.Open; + _openedAt = _lastFailure; + _logger.LogWarning("Circuit breaker 
opened after half-open failure"); + } + else if (_failureCount >= _failureThreshold) + { + _state = CircuitState.Open; + _openedAt = _lastFailure; + _logger.LogWarning( + "Circuit breaker opened after {Count} failures", + _failureCount); + } + } + + private bool ShouldAttemptReset() => + TimeProvider.System.GetUtcNow() >= _openedAt.Add(_resetTimeout); + + private static bool IsCritical(Exception ex) => + ex is OutOfMemoryException or StackOverflowException; +} + +public enum CircuitState +{ + Closed, + Open, + HalfOpen +} +``` + +### ConnectorRateLimiter + +```csharp +namespace StellaOps.ReleaseOrchestrator.IntegrationHub.Runtime; + +public sealed class ConnectorRateLimiter : IAsyncDisposable +{ + private readonly SemaphoreSlim _semaphore; + private readonly RateLimitLease[] _leases; + private readonly TimeSpan _window; + private readonly int _limit; + private int _leaseIndex; + + public ConnectorRateLimiter(int requestsPerWindow, TimeSpan window) + { + _limit = requestsPerWindow; + _window = window; + _semaphore = new SemaphoreSlim(requestsPerWindow, requestsPerWindow); + _leases = new RateLimitLease[requestsPerWindow]; + } + + public async Task AcquireAsync(CancellationToken ct = default) + { + await _semaphore.WaitAsync(ct); + + var index = Interlocked.Increment(ref _leaseIndex) % _limit; + var oldLease = _leases[index]; + + if (oldLease is not null) + { + var elapsed = TimeProvider.System.GetUtcNow() - oldLease.AcquiredAt; + if (elapsed < _window) + { + await Task.Delay(_window - elapsed, ct); + } + } + + var lease = new RateLimitLease(this); + _leases[index] = lease; + return lease; + } + + private void Release() => _semaphore.Release(); + + public ValueTask DisposeAsync() + { + _semaphore.Dispose(); + return ValueTask.CompletedTask; + } + + private sealed class RateLimitLease : IDisposable + { + private readonly ConnectorRateLimiter _limiter; + public DateTimeOffset AcquiredAt { get; } = TimeProvider.System.GetUtcNow(); + + public 
RateLimitLease(ConnectorRateLimiter limiter) => _limiter = limiter; + public void Dispose() => _limiter.Release(); + } +} +``` + +--- + +## Acceptance Criteria + +- [ ] Factory creates built-in connectors +- [ ] Factory creates plugin connectors +- [ ] Connection pooling works +- [ ] Retry policy retries transient failures +- [ ] Circuit breaker opens on failures +- [ ] Rate limiter enforces limits +- [ ] Metrics exposed for monitoring +- [ ] Unit test coverage ≥85% + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 102_001 Integration Manager | Internal | TODO | +| 101_003 Plugin Loader | Internal | TODO | +| Polly | NuGet | Available | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| IConnectorFactory | TODO | | +| ConnectorFactory | TODO | | +| ConnectorPool | TODO | | +| ConnectorRetryPolicy | TODO | | +| ConnectorCircuitBreaker | TODO | | +| ConnectorRateLimiter | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_102_003_INTHUB_scm_connectors.md b/docs/implplan/SPRINT_20260110_102_003_INTHUB_scm_connectors.md new file mode 100644 index 000000000..313e50f4b --- /dev/null +++ b/docs/implplan/SPRINT_20260110_102_003_INTHUB_scm_connectors.md @@ -0,0 +1,460 @@ +# SPRINT: Built-in SCM Connectors + +> **Sprint ID:** 102_003 +> **Module:** INTHUB +> **Phase:** 2 - Integration Hub +> **Status:** TODO +> **Parent:** [102_000_INDEX](SPRINT_20260110_102_000_INDEX_integration_hub.md) + +--- + +## Overview + +Implement built-in SCM (Source Control Management) connectors for GitHub, GitLab, Gitea, and Azure DevOps. Each connector implements the `IScmConnector` interface. 
+ +### Objectives + +- GitHub connector (GitHub.com and GitHub Enterprise) +- GitLab connector (GitLab.com and self-hosted) +- Gitea connector (self-hosted) +- Azure DevOps connector +- Webhook support for all connectors + +### Working Directory + +``` +src/ReleaseOrchestrator/ +├── __Libraries/ +│ └── StellaOps.ReleaseOrchestrator.IntegrationHub/ +│ └── Connectors/ +│ └── Scm/ +│ ├── GitHubConnector.cs +│ ├── GitLabConnector.cs +│ ├── GiteaConnector.cs +│ └── AzureDevOpsConnector.cs +└── __Tests/ + └── StellaOps.ReleaseOrchestrator.IntegrationHub.Tests/ + └── Connectors/ + └── Scm/ +``` + +--- + +## Deliverables + +### GitHubConnector + +```csharp +namespace StellaOps.ReleaseOrchestrator.IntegrationHub.Connectors.Scm; + +public sealed class GitHubConnector : IScmConnector +{ + public ConnectorCategory Category => ConnectorCategory.Scm; + public IReadOnlyList<string> Capabilities { get; } = [ + "repositories", "commits", "branches", "webhooks", "actions" + ]; + + private GitHubClient? _client; + private ConnectorContext? _context; + + public async Task InitializeAsync( + ConnectorContext context, + CancellationToken ct = default) + { + _context = context; + var config = ParseConfig(context.Configuration); + + var credentials = new Credentials( + config.Token ?? await ResolveTokenAsync(context, ct)); + + // Use the enterprise base URL when configured, otherwise GitHub.com + _client = string.IsNullOrEmpty(config.BaseUrl) + ? new GitHubClient(new ProductHeaderValue("StellaOps")) + : new GitHubClient( + new ProductHeaderValue("StellaOps"), + new Uri(config.BaseUrl)); + _client.Credentials = credentials; + } + + public async Task<ConfigValidationResult> ValidateConfigAsync( + JsonElement config, + CancellationToken ct = default) + { + var errors = new List<string>(); + var parsed = ParseConfig(config); + + if (string.IsNullOrEmpty(parsed.Token) && + string.IsNullOrEmpty(parsed.TokenSecretRef)) + { + errors.Add("Either 'token' or 'tokenSecretRef' is required"); + } + + return errors.Count == 0 + ? 
ConfigValidationResult.Success() + : ConfigValidationResult.Failure(errors.ToArray()); + } + + public async Task<ConnectionTestResult> TestConnectionAsync( + ConnectorContext context, + CancellationToken ct = default) + { + var sw = Stopwatch.StartNew(); + try + { + var user = await _client!.User.Current(); + return new ConnectionTestResult( + Success: true, + Message: $"Authenticated as {user.Login}", + ResponseTime: sw.Elapsed); + } + catch (Exception ex) + { + return new ConnectionTestResult( + Success: false, + Message: ex.Message, + ResponseTime: sw.Elapsed); + } + } + + public async Task<IReadOnlyList<ScmRepository>> ListRepositoriesAsync( + ConnectorContext context, + string? searchPattern = null, + CancellationToken ct = default) + { + var repos = await _client!.Repository.GetAllForCurrent(); + + var result = repos + .Where(r => searchPattern is null || + r.FullName.Contains(searchPattern, StringComparison.OrdinalIgnoreCase)) + .Select(r => new ScmRepository( + Id: r.Id.ToString(CultureInfo.InvariantCulture), + Name: r.Name, + FullName: r.FullName, + DefaultBranch: r.DefaultBranch, + CloneUrl: r.CloneUrl, + IsPrivate: r.Private)) + .ToList(); + + return result; + } + + public async Task<ScmCommit> GetCommitAsync( + ConnectorContext context, + string repository, + string commitSha, + CancellationToken ct = default) + { + var (owner, repo) = ParseRepository(repository); + var commit = await _client!.Repository.Commit.Get(owner, repo, commitSha); + + return new ScmCommit( + Sha: commit.Sha, + Message: commit.Commit.Message, + AuthorName: commit.Commit.Author.Name, + AuthorEmail: commit.Commit.Author.Email, + AuthoredAt: commit.Commit.Author.Date, + ParentSha: commit.Parents.FirstOrDefault()?.Sha); + } + + public async Task<WebhookRegistration> CreateWebhookAsync( + ConnectorContext context, + string repository, + IReadOnlyList<string> events, + string callbackUrl, + CancellationToken ct = default) + { + var (owner, repo) = ParseRepository(repository); + + var webhook = await _client!.Repository.Hooks.Create(owner, repo, new NewRepositoryHook( 
"web", + new Dictionary + { + ["url"] = callbackUrl, + ["content_type"] = "json", + ["secret"] = GenerateWebhookSecret() + }) + { + Events = events.ToArray(), + Active = true + }); + + return new WebhookRegistration( + Id: webhook.Id.ToString(CultureInfo.InvariantCulture), + Url: callbackUrl, + Secret: webhook.Config["secret"], + Events: events); + } + + private static (string Owner, string Repo) ParseRepository(string fullName) + { + var parts = fullName.Split('/'); + return (parts[0], parts[1]); + } +} + +internal sealed record GitHubConfig( + string? BaseUrl, + string? Token, + string? TokenSecretRef +); +``` + +### GitLabConnector + +```csharp +namespace StellaOps.ReleaseOrchestrator.IntegrationHub.Connectors.Scm; + +public sealed class GitLabConnector : IScmConnector +{ + public ConnectorCategory Category => ConnectorCategory.Scm; + public IReadOnlyList Capabilities { get; } = [ + "repositories", "commits", "branches", "webhooks", "pipelines" + ]; + + private GitLabClient? _client; + + public async Task InitializeAsync( + ConnectorContext context, + CancellationToken ct = default) + { + var config = ParseConfig(context.Configuration); + var token = config.Token ?? + await context.SecretResolver.ResolveAsync(config.TokenSecretRef!, ct); + + var baseUrl = config.BaseUrl ?? "https://gitlab.com"; + _client = new GitLabClient(baseUrl, token); + } + + public async Task> ListRepositoriesAsync( + ConnectorContext context, + string? 
searchPattern = null, + CancellationToken ct = default) + { + var projects = await _client!.Projects.GetAsync(new ProjectQueryOptions + { + Search = searchPattern, + Membership = true + }); + + return projects.Select(p => new ScmRepository( + Id: p.Id.ToString(CultureInfo.InvariantCulture), + Name: p.Name, + FullName: p.PathWithNamespace, + DefaultBranch: p.DefaultBranch, + CloneUrl: p.HttpUrlToRepo, + IsPrivate: p.Visibility == ProjectVisibility.Private)) + .ToList(); + } + + public async Task GetCommitAsync( + ConnectorContext context, + string repository, + string commitSha, + CancellationToken ct = default) + { + var commit = await _client!.Commits.GetAsync(repository, commitSha); + + return new ScmCommit( + Sha: commit.Id, + Message: commit.Message, + AuthorName: commit.AuthorName, + AuthorEmail: commit.AuthorEmail, + AuthoredAt: commit.AuthoredDate, + ParentSha: commit.ParentIds?.FirstOrDefault()); + } + + public async Task CreateWebhookAsync( + ConnectorContext context, + string repository, + IReadOnlyList events, + string callbackUrl, + CancellationToken ct = default) + { + var secret = GenerateWebhookSecret(); + var hook = await _client!.Projects.CreateWebhookAsync(repository, new CreateWebhookRequest + { + Url = callbackUrl, + Token = secret, + PushEvents = events.Contains("push"), + TagPushEvents = events.Contains("tag_push"), + MergeRequestsEvents = events.Contains("merge_request"), + PipelineEvents = events.Contains("pipeline") + }); + + return new WebhookRegistration( + Id: hook.Id.ToString(CultureInfo.InvariantCulture), + Url: callbackUrl, + Secret: secret, + Events: events); + } +} +``` + +### GiteaConnector + +```csharp +namespace StellaOps.ReleaseOrchestrator.IntegrationHub.Connectors.Scm; + +public sealed class GiteaConnector : IScmConnector +{ + public ConnectorCategory Category => ConnectorCategory.Scm; + public IReadOnlyList Capabilities { get; } = [ + "repositories", "commits", "branches", "webhooks" + ]; + + private HttpClient? 
_httpClient; + private string? _baseUrl; + + public async Task InitializeAsync( + ConnectorContext context, + CancellationToken ct = default) + { + var config = ParseConfig(context.Configuration); + var token = config.Token ?? + await context.SecretResolver.ResolveAsync(config.TokenSecretRef!, ct); + + _baseUrl = config.BaseUrl?.TrimEnd('/'); + + _httpClient = new HttpClient + { + BaseAddress = new Uri(_baseUrl + "/api/v1/") + }; + _httpClient.DefaultRequestHeaders.Authorization = + new AuthenticationHeaderValue("token", token); + } + + public async Task> ListRepositoriesAsync( + ConnectorContext context, + string? searchPattern = null, + CancellationToken ct = default) + { + var response = await _httpClient!.GetAsync("user/repos", ct); + response.EnsureSuccessStatusCode(); + + var repos = await response.Content + .ReadFromJsonAsync>(ct); + + return repos! + .Where(r => searchPattern is null || + r.FullName.Contains(searchPattern, StringComparison.OrdinalIgnoreCase)) + .Select(r => new ScmRepository( + Id: r.Id.ToString(CultureInfo.InvariantCulture), + Name: r.Name, + FullName: r.FullName, + DefaultBranch: r.DefaultBranch, + CloneUrl: r.CloneUrl, + IsPrivate: r.Private)) + .ToList(); + } + + // Additional methods... +} +``` + +### AzureDevOpsConnector + +```csharp +namespace StellaOps.ReleaseOrchestrator.IntegrationHub.Connectors.Scm; + +public sealed class AzureDevOpsConnector : IScmConnector +{ + public ConnectorCategory Category => ConnectorCategory.Scm; + public IReadOnlyList Capabilities { get; } = [ + "repositories", "commits", "branches", "webhooks", "pipelines", "workitems" + ]; + + private VssConnection? _connection; + private GitHttpClient? _gitClient; + + public async Task InitializeAsync( + ConnectorContext context, + CancellationToken ct = default) + { + var config = ParseConfig(context.Configuration); + var pat = config.Pat ?? 
+ await context.SecretResolver.ResolveAsync(config.PatSecretRef!, ct); + + var credentials = new VssBasicCredential(string.Empty, pat); + _connection = new VssConnection(new Uri(config.OrganizationUrl), credentials); + _gitClient = await _connection.GetClientAsync(ct); + } + + public async Task> ListRepositoriesAsync( + ConnectorContext context, + string? searchPattern = null, + CancellationToken ct = default) + { + var repos = await _gitClient!.GetRepositoriesAsync(cancellationToken: ct); + + return repos + .Where(r => searchPattern is null || + r.Name.Contains(searchPattern, StringComparison.OrdinalIgnoreCase)) + .Select(r => new ScmRepository( + Id: r.Id.ToString(), + Name: r.Name, + FullName: $"{r.ProjectReference.Name}/{r.Name}", + DefaultBranch: r.DefaultBranch?.Replace("refs/heads/", "") ?? "main", + CloneUrl: r.RemoteUrl, + IsPrivate: true)) // Azure DevOps repos are always private to org + .ToList(); + } + + // Additional methods... +} +``` + +--- + +## Acceptance Criteria + +- [ ] GitHub connector authenticates +- [ ] GitHub connector lists repositories +- [ ] GitHub connector creates webhooks +- [ ] GitLab connector works +- [ ] Gitea connector works +- [ ] Azure DevOps connector works +- [ ] All connectors handle errors gracefully +- [ ] Webhook secret generation is secure +- [ ] Config validation catches issues +- [ ] Integration tests pass + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 102_002 Connector Runtime | Internal | TODO | +| Octokit | NuGet | Available | +| GitLabApiClient | NuGet | Available | +| Microsoft.TeamFoundationServer.Client | NuGet | Available | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| GitHubConnector | TODO | | +| GitLabConnector | TODO | | +| GiteaConnector | TODO | | +| AzureDevOpsConnector | TODO | | +| Unit tests | TODO | | +| Integration tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| 
+| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_102_004_INTHUB_registry_connectors.md b/docs/implplan/SPRINT_20260110_102_004_INTHUB_registry_connectors.md new file mode 100644 index 000000000..8e65a1eba --- /dev/null +++ b/docs/implplan/SPRINT_20260110_102_004_INTHUB_registry_connectors.md @@ -0,0 +1,617 @@ +# SPRINT: Built-in Registry Connectors + +> **Sprint ID:** 102_004 +> **Module:** INTHUB +> **Phase:** 2 - Integration Hub +> **Status:** TODO +> **Parent:** [102_000_INDEX](SPRINT_20260110_102_000_INDEX_integration_hub.md) + +--- + +## Overview + +Implement built-in container registry connectors for Docker Hub, Harbor, ACR, ECR, GCR, and generic OCI registries. Each implements `IRegistryConnector`. + +### Objectives + +- Docker Hub connector with rate limit handling +- Harbor connector for self-hosted registries +- Azure Container Registry (ACR) connector +- AWS Elastic Container Registry (ECR) connector +- Google Container Registry (GCR) connector +- Generic OCI connector for any compliant registry + +### Working Directory + +``` +src/ReleaseOrchestrator/ +├── __Libraries/ +│ └── StellaOps.ReleaseOrchestrator.IntegrationHub/ +│ └── Connectors/ +│ └── Registry/ +│ ├── DockerHubConnector.cs +│ ├── HarborConnector.cs +│ ├── AcrConnector.cs +│ ├── EcrConnector.cs +│ ├── GcrConnector.cs +│ └── GenericOciConnector.cs +└── __Tests/ + └── StellaOps.ReleaseOrchestrator.IntegrationHub.Tests/ + └── Connectors/ + └── Registry/ +``` + +--- + +## Deliverables + +### DockerHubConnector + +```csharp +namespace StellaOps.ReleaseOrchestrator.IntegrationHub.Connectors.Registry; + +public sealed class DockerHubConnector : IRegistryConnector +{ + private const string RegistryUrl = "https://registry-1.docker.io"; + private const string AuthUrl = "https://auth.docker.io/token"; + private const string HubApiUrl = "https://hub.docker.com/v2"; + + public ConnectorCategory Category => ConnectorCategory.Registry; + public IReadOnlyList Capabilities { get; } 
= [ + "list_repos", "list_tags", "resolve_tag", "get_manifest", "pull_credentials" + ]; + + private HttpClient? _httpClient; + private string? _username; + private string? _password; + + public async Task InitializeAsync( + ConnectorContext context, + CancellationToken ct = default) + { + var config = ParseConfig(context.Configuration); + _username = config.Username; + _password = config.Password ?? + await context.SecretResolver.ResolveAsync(config.PasswordSecretRef!, ct); + + _httpClient = new HttpClient(); + } + + public async Task> ListTagsAsync( + ConnectorContext context, + string repository, + CancellationToken ct = default) + { + var token = await GetAuthTokenAsync(repository, "pull", ct); + + var request = new HttpRequestMessage( + HttpMethod.Get, + $"{RegistryUrl}/v2/{repository}/tags/list"); + request.Headers.Authorization = + new AuthenticationHeaderValue("Bearer", token); + + var response = await _httpClient!.SendAsync(request, ct); + response.EnsureSuccessStatusCode(); + + var result = await response.Content + .ReadFromJsonAsync(ct); + + // Get tag details from Hub API + var tags = new List(); + foreach (var tag in result!.Tags) + { + var detail = await GetTagDetailAsync(repository, tag, ct); + tags.Add(new ImageTag( + Name: tag, + Digest: detail?.Digest, + PushedAt: detail?.LastPushed)); + } + + return tags; + } + + public async Task ResolveTagAsync( + ConnectorContext context, + string repository, + string tag, + CancellationToken ct = default) + { + var token = await GetAuthTokenAsync(repository, "pull", ct); + + var request = new HttpRequestMessage( + HttpMethod.Head, + $"{RegistryUrl}/v2/{repository}/manifests/{tag}"); + request.Headers.Authorization = + new AuthenticationHeaderValue("Bearer", token); + request.Headers.Accept.Add(new MediaTypeWithQualityHeaderValue( + "application/vnd.docker.distribution.manifest.v2+json")); + request.Headers.Accept.Add(new MediaTypeWithQualityHeaderValue( + "application/vnd.oci.image.manifest.v1+json")); + + var 
response = await _httpClient!.SendAsync(request, ct); + if (response.StatusCode == HttpStatusCode.NotFound) + return null; + + response.EnsureSuccessStatusCode(); + + var digest = response.Headers.GetValues("Docker-Content-Digest").First(); + var contentType = response.Content.Headers.ContentType?.MediaType ?? ""; + var size = response.Content.Headers.ContentLength ?? 0; + + return new ImageDigest(digest, contentType, size); + } + + public async Task GetManifestAsync( + ConnectorContext context, + string repository, + string reference, + CancellationToken ct = default) + { + var token = await GetAuthTokenAsync(repository, "pull", ct); + + var request = new HttpRequestMessage( + HttpMethod.Get, + $"{RegistryUrl}/v2/{repository}/manifests/{reference}"); + request.Headers.Authorization = + new AuthenticationHeaderValue("Bearer", token); + request.Headers.Accept.Add(new MediaTypeWithQualityHeaderValue( + "application/vnd.docker.distribution.manifest.v2+json")); + request.Headers.Accept.Add(new MediaTypeWithQualityHeaderValue( + "application/vnd.oci.image.manifest.v1+json")); + + var response = await _httpClient!.SendAsync(request, ct); + if (response.StatusCode == HttpStatusCode.NotFound) + return null; + + response.EnsureSuccessStatusCode(); + + var json = await response.Content.ReadAsStringAsync(ct); + var digest = response.Headers.GetValues("Docker-Content-Digest").First(); + + return new ImageManifest( + Digest: digest, + MediaType: response.Content.Headers.ContentType?.MediaType ?? "", + Size: response.Content.Headers.ContentLength ?? 
0, + RawManifest: json); + } + + public Task<PullCredentials> GetPullCredentialsAsync( + ConnectorContext context, + string repository, + CancellationToken ct = default) + { + return Task.FromResult(new PullCredentials( + Registry: "docker.io", + Username: _username!, + Password: _password!, + ExpiresAt: DateTimeOffset.MaxValue)); + } + + private async Task<string> GetAuthTokenAsync( + string repository, + string scope, + CancellationToken ct) + { + var url = $"{AuthUrl}?service=registry.docker.io&scope=repository:{repository}:{scope}"; + + var request = new HttpRequestMessage(HttpMethod.Get, url); + if (!string.IsNullOrEmpty(_username)) + { + var credentials = Convert.ToBase64String( + Encoding.UTF8.GetBytes($"{_username}:{_password}")); + request.Headers.Authorization = + new AuthenticationHeaderValue("Basic", credentials); + } + + var response = await _httpClient!.SendAsync(request, ct); + response.EnsureSuccessStatusCode(); + + var result = await response.Content.ReadFromJsonAsync<TokenResponse>(ct); // DTO name assumed + return result!.Token; + } +} +``` + +### AcrConnector + +```csharp +namespace StellaOps.ReleaseOrchestrator.IntegrationHub.Connectors.Registry; + +public sealed class AcrConnector : IRegistryConnector +{ + public ConnectorCategory Category => ConnectorCategory.Registry; + + private ContainerRegistryClient? _client; + private string? _registryUrl; + + public async Task InitializeAsync( + ConnectorContext context, + CancellationToken ct = default) + { + var config = ParseConfig(context.Configuration); + _registryUrl = config.RegistryUrl; + + // Support both service principal and managed identity + TokenCredential credential = config.AuthMethod switch + { + "service_principal" => new ClientSecretCredential( + config.TenantId, + config.ClientId, + config.ClientSecret ?? 
+ await context.SecretResolver.ResolveAsync(config.ClientSecretRef!, ct)), + + "managed_identity" => new ManagedIdentityCredential(), + + _ => new DefaultAzureCredential() + }; + + _client = new ContainerRegistryClient( + new Uri($"https://{_registryUrl}"), + credential); + } + + public async Task<IReadOnlyList<RegistryRepository>> ListRepositoriesAsync( + ConnectorContext context, + string? prefix = null, + CancellationToken ct = default) + { + var repos = new List<RegistryRepository>(); + + await foreach (var name in _client!.GetRepositoryNamesAsync(ct)) + { + if (prefix is null || name.StartsWith(prefix, StringComparison.OrdinalIgnoreCase)) + { + repos.Add(new RegistryRepository(name)); + } + } + + return repos; + } + + public async Task<IReadOnlyList<ImageTag>> ListTagsAsync( + ConnectorContext context, + string repository, + CancellationToken ct = default) + { + var repo = _client!.GetRepository(repository); + var tags = new List<ImageTag>(); + + await foreach (var manifest in repo.GetAllManifestPropertiesAsync(ct)) + { + foreach (var tag in manifest.Tags) + { + tags.Add(new ImageTag( + Name: tag, + Digest: manifest.Digest, + PushedAt: manifest.CreatedOn)); + } + } + + return tags; + } + + public async Task<ImageDigest?> ResolveTagAsync( + ConnectorContext context, + string repository, + string tag, + CancellationToken ct = default) + { + try + { + var repo = _client!.GetRepository(repository); + var artifact = repo.GetArtifact(tag); + var manifest = await artifact.GetManifestPropertiesAsync(ct); + + return new ImageDigest( + Digest: manifest.Value.Digest, + MediaType: manifest.Value.MediaType ?? "", + Size: manifest.Value.SizeInBytes ?? 
0); + } + catch (RequestFailedException ex) when (ex.Status == 404) + { + return null; + } + } + + public async Task GetPullCredentialsAsync( + ConnectorContext context, + string repository, + CancellationToken ct = default) + { + // Get short-lived token for pull + var exchangeClient = new ContainerRegistryContentClient( + new Uri($"https://{_registryUrl}"), + repository, + new DefaultAzureCredential()); + + // Use refresh token exchange + return new PullCredentials( + Registry: _registryUrl!, + Username: "00000000-0000-0000-0000-000000000000", + Password: await GetAcrRefreshTokenAsync(ct), + ExpiresAt: DateTimeOffset.UtcNow.AddHours(1)); + } +} +``` + +### EcrConnector + +```csharp +namespace StellaOps.ReleaseOrchestrator.IntegrationHub.Connectors.Registry; + +public sealed class EcrConnector : IRegistryConnector +{ + public ConnectorCategory Category => ConnectorCategory.Registry; + + private AmazonECRClient? _ecrClient; + private string? _registryId; + + public async Task InitializeAsync( + ConnectorContext context, + CancellationToken ct = default) + { + var config = ParseConfig(context.Configuration); + + AWSCredentials credentials = config.AuthMethod switch + { + "access_key" => new BasicAWSCredentials( + config.AccessKeyId, + config.SecretAccessKey ?? + await context.SecretResolver.ResolveAsync(config.SecretAccessKeyRef!, ct)), + + "assume_role" => new AssumeRoleAWSCredentials( + new BasicAWSCredentials(config.AccessKeyId, config.SecretAccessKey), + config.RoleArn, + "StellaOps"), + + _ => new InstanceProfileAWSCredentials() + }; + + _ecrClient = new AmazonECRClient(credentials, RegionEndpoint.GetBySystemName(config.Region)); + _registryId = config.RegistryId; + } + + public async Task> ListRepositoriesAsync( + ConnectorContext context, + string? prefix = null, + CancellationToken ct = default) + { + var repos = new List(); + string? 
nextToken = null; + + do + { + var response = await _ecrClient!.DescribeRepositoriesAsync( + new DescribeRepositoriesRequest + { + RegistryId = _registryId, + NextToken = nextToken + }, ct); + + foreach (var repo in response.Repositories) + { + if (prefix is null || + repo.RepositoryName.StartsWith(prefix, StringComparison.OrdinalIgnoreCase)) + { + repos.Add(new RegistryRepository(repo.RepositoryName)); + } + } + + nextToken = response.NextToken; + } while (!string.IsNullOrEmpty(nextToken)); + + return repos; + } + + public async Task> ListTagsAsync( + ConnectorContext context, + string repository, + CancellationToken ct = default) + { + var tags = new List(); + string? nextToken = null; + + do + { + var response = await _ecrClient!.DescribeImagesAsync( + new DescribeImagesRequest + { + RegistryId = _registryId, + RepositoryName = repository, + NextToken = nextToken + }, ct); + + foreach (var image in response.ImageDetails) + { + foreach (var tag in image.ImageTags) + { + tags.Add(new ImageTag( + Name: tag, + Digest: image.ImageDigest, + PushedAt: image.ImagePushedAt)); + } + } + + nextToken = response.NextToken; + } while (!string.IsNullOrEmpty(nextToken)); + + return tags; + } + + public async Task GetPullCredentialsAsync( + ConnectorContext context, + string repository, + CancellationToken ct = default) + { + var response = await _ecrClient!.GetAuthorizationTokenAsync( + new GetAuthorizationTokenRequest + { + RegistryIds = _registryId is not null ? 
[_registryId] : null + }, ct); + + var auth = response.AuthorizationData.First(); + var decoded = Encoding.UTF8.GetString( + Convert.FromBase64String(auth.AuthorizationToken)); + var parts = decoded.Split(':'); + + return new PullCredentials( + Registry: new Uri(auth.ProxyEndpoint).Host, + Username: parts[0], + Password: parts[1], + ExpiresAt: auth.ExpiresAt); + } +} +``` + +### GenericOciConnector + +```csharp +namespace StellaOps.ReleaseOrchestrator.IntegrationHub.Connectors.Registry; + +/// +/// Generic OCI Distribution-compliant registry connector. +/// Works with any registry implementing OCI Distribution Spec. +/// +public sealed class GenericOciConnector : IRegistryConnector +{ + public ConnectorCategory Category => ConnectorCategory.Registry; + + private HttpClient? _httpClient; + private string? _registryUrl; + private string? _username; + private string? _password; + + public async Task InitializeAsync( + ConnectorContext context, + CancellationToken ct = default) + { + var config = ParseConfig(context.Configuration); + _registryUrl = config.RegistryUrl.TrimEnd('/'); + _username = config.Username; + _password = config.Password ?? + (config.PasswordSecretRef is not null + ? await context.SecretResolver.ResolveAsync(config.PasswordSecretRef, ct) + : null); + + _httpClient = new HttpClient(); + + if (!string.IsNullOrEmpty(_username)) + { + var credentials = Convert.ToBase64String( + Encoding.UTF8.GetBytes($"{_username}:{_password}")); + _httpClient.DefaultRequestHeaders.Authorization = + new AuthenticationHeaderValue("Basic", credentials); + } + } + + public async Task> ListRepositoriesAsync( + ConnectorContext context, + string? 
prefix = null, + CancellationToken ct = default) + { + var response = await _httpClient!.GetAsync( + $"{_registryUrl}/v2/_catalog", ct); + response.EnsureSuccessStatusCode(); + + var result = await response.Content + .ReadFromJsonAsync<CatalogResponse>(ct); // DTO name assumed; shape per /v2/_catalog API + + return result!.Repositories + .Where(r => prefix is null || + r.StartsWith(prefix, StringComparison.OrdinalIgnoreCase)) + .Select(r => new RegistryRepository(r)) + .ToList(); + } + + public async Task<ImageDigest?> ResolveTagAsync( + ConnectorContext context, + string repository, + string tag, + CancellationToken ct = default) + { + var request = new HttpRequestMessage( + HttpMethod.Head, + $"{_registryUrl}/v2/{repository}/manifests/{tag}"); + + request.Headers.Accept.Add(new MediaTypeWithQualityHeaderValue( + "application/vnd.oci.image.manifest.v1+json")); + request.Headers.Accept.Add(new MediaTypeWithQualityHeaderValue( + "application/vnd.docker.distribution.manifest.v2+json")); + + var response = await _httpClient!.SendAsync(request, ct); + if (response.StatusCode == HttpStatusCode.NotFound) + return null; + + response.EnsureSuccessStatusCode(); + + var digest = response.Headers + .GetValues("Docker-Content-Digest") + .FirstOrDefault() ?? ""; + var mediaType = response.Content.Headers.ContentType?.MediaType ?? ""; + var size = response.Content.Headers.ContentLength ?? 0; + + return new ImageDigest(digest, mediaType, size); + } + + public Task<PullCredentials> GetPullCredentialsAsync( + ConnectorContext context, + string repository, + CancellationToken ct = default) + { + var uri = new Uri(_registryUrl!); + return Task.FromResult(new PullCredentials( + Registry: uri.Host, + Username: _username ?? "", + Password: _password ?? 
"", + ExpiresAt: DateTimeOffset.MaxValue)); + } +} +``` + +--- + +## Acceptance Criteria + +- [ ] Docker Hub connector works with rate limiting +- [ ] Harbor connector supports webhooks +- [ ] ACR connector uses Azure Identity +- [ ] ECR connector handles token refresh +- [ ] GCR connector uses GCP credentials +- [ ] Generic OCI connector works with any registry +- [ ] All connectors resolve tags to digests +- [ ] Pull credentials generated correctly +- [ ] Integration tests pass + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 102_002 Connector Runtime | Internal | TODO | +| Azure.Containers.ContainerRegistry | NuGet | Available | +| AWSSDK.ECR | NuGet | Available | +| Google.Cloud.ArtifactRegistry.V1 | NuGet | Available | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| DockerHubConnector | TODO | | +| HarborConnector | TODO | | +| AcrConnector | TODO | | +| EcrConnector | TODO | | +| GcrConnector | TODO | | +| GenericOciConnector | TODO | | +| Unit tests | TODO | | +| Integration tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_102_005_INTHUB_vault_connector.md b/docs/implplan/SPRINT_20260110_102_005_INTHUB_vault_connector.md new file mode 100644 index 000000000..cf54ed255 --- /dev/null +++ b/docs/implplan/SPRINT_20260110_102_005_INTHUB_vault_connector.md @@ -0,0 +1,503 @@ +# SPRINT: Built-in Vault Connector + +> **Sprint ID:** 102_005 +> **Module:** INTHUB +> **Phase:** 2 - Integration Hub +> **Status:** TODO +> **Parent:** [102_000_INDEX](SPRINT_20260110_102_000_INDEX_integration_hub.md) + +--- + +## Overview + +Implement built-in vault connectors for secrets management: HashiCorp Vault, Azure Key Vault, and AWS Secrets Manager. 
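The three backends address secrets differently: HashiCorp Vault uses a mount/path plus a key inside the secret's data, Azure Key Vault uses a flat secret name, and AWS Secrets Manager uses a secret id whose value may be a JSON object of keys. A unified `path` + `key` caller therefore benefits from one reference syntax that splits into those two parts. Below is a minimal sketch of one possible convention, `path#key`; the `SecretRef` type and the `#` separator are illustrative assumptions, not part of this specification:

```csharp
using System;

// Sketch only: "SecretRef" and the "path#key" syntax are assumptions for
// illustration, not part of the specified vault connector interface.
public readonly record struct SecretRef(string Path, string? Key)
{
    // "kv/myapp#db_password" -> Path "kv/myapp", Key "db_password".
    // A reference without '#' addresses the whole secret.
    public static SecretRef Parse(string reference)
    {
        var idx = reference.IndexOf('#');
        return idx < 0
            ? new SecretRef(reference, null)
            : new SecretRef(reference[..idx], reference[(idx + 1)..]);
    }
}
```

Each connector would then map `Path`/`Key` onto its backend: Vault reads `Key` out of the KV secret data, Azure Key Vault joins the parts into a flat name, and AWS Secrets Manager treats `Key` as a JSON property of the secret string.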
+ +### Objectives + +- HashiCorp Vault connector with multiple auth methods +- Azure Key Vault connector with managed identity support +- AWS Secrets Manager connector +- Unified secret resolution interface + +### Working Directory + +``` +src/ReleaseOrchestrator/ +├── __Libraries/ +│ └── StellaOps.ReleaseOrchestrator.IntegrationHub/ +│ └── Connectors/ +│ └── Vault/ +│ ├── IVaultConnector.cs +│ ├── HashiCorpVaultConnector.cs +│ ├── AzureKeyVaultConnector.cs +│ └── AwsSecretsManagerConnector.cs +└── __Tests/ +``` + +--- + +## Deliverables + +### IVaultConnector Interface + +```csharp +namespace StellaOps.ReleaseOrchestrator.IntegrationHub.Connectors.Vault; + +public interface IVaultConnector : IConnectorPlugin +{ + Task<string?> GetSecretAsync( + ConnectorContext context, + string path, + string? key = null, + CancellationToken ct = default); + + Task<IReadOnlyDictionary<string, string>> GetSecretsAsync( + ConnectorContext context, + string path, + CancellationToken ct = default); + + Task SetSecretAsync( + ConnectorContext context, + string path, + string key, + string value, + CancellationToken ct = default); + + Task<IReadOnlyList<string>> ListSecretsAsync( + ConnectorContext context, + string? path = null, + CancellationToken ct = default); +} +``` + +### HashiCorpVaultConnector + +```csharp +namespace StellaOps.ReleaseOrchestrator.IntegrationHub.Connectors.Vault; + +public sealed class HashiCorpVaultConnector : IVaultConnector +{ + public ConnectorCategory Category => ConnectorCategory.Vault; + + private VaultClient? _client; + + public async Task InitializeAsync( + ConnectorContext context, + CancellationToken ct = default) + { + var config = ParseConfig(context.Configuration); + + IAuthMethodInfo authMethod = config.AuthMethod switch + { + "token" => new TokenAuthMethodInfo( + config.Token ?? await context.SecretResolver.ResolveAsync( + config.TokenSecretRef!, ct)), + + "approle" => new AppRoleAuthMethodInfo( + config.RoleId, + config.SecretId ?? 
await context.SecretResolver.ResolveAsync( + config.SecretIdRef!, ct)), + + "kubernetes" => new KubernetesAuthMethodInfo( + config.Role, + await File.ReadAllTextAsync( + "/var/run/secrets/kubernetes.io/serviceaccount/token", ct)), + + _ => throw new ArgumentException($"Unknown auth method: {config.AuthMethod}") + }; + + var vaultSettings = new VaultClientSettings(config.Address, authMethod); + _client = new VaultClient(vaultSettings); + } + + public async Task<string?> GetSecretAsync( + ConnectorContext context, + string path, + string? key = null, + CancellationToken ct = default) + { + var config = ParseConfig(context.Configuration); + var mountPoint = config.MountPoint ?? "secret"; + + var secret = await _client!.V1.Secrets.KeyValue.V2 + .ReadSecretAsync(path, mountPoint: mountPoint); + + if (secret?.Data?.Data is null) + return null; + + if (key is null) + { + // Return first value if no key specified + return secret.Data.Data.Values.FirstOrDefault()?.ToString(); + } + + return secret.Data.Data.TryGetValue(key, out var value) + ? value?.ToString() + : null; + } + + public async Task<IReadOnlyDictionary<string, string>> GetSecretsAsync( + ConnectorContext context, + string path, + CancellationToken ct = default) + { + var config = ParseConfig(context.Configuration); + var mountPoint = config.MountPoint ?? "secret"; + + var secret = await _client!.V1.Secrets.KeyValue.V2 + .ReadSecretAsync(path, mountPoint: mountPoint); + + if (secret?.Data?.Data is null) + return new Dictionary<string, string>(); + + return secret.Data.Data + .ToDictionary( + kvp => kvp.Key, + kvp => kvp.Value?.ToString() ?? ""); + } + + public async Task SetSecretAsync( + ConnectorContext context, + string path, + string key, + string value, + CancellationToken ct = default) + { + var config = ParseConfig(context.Configuration); + var mountPoint = config.MountPoint ?? 
"secret"; + + // Get existing secrets to merge + var existing = await GetSecretsAsync(context, path, ct); + var data = new Dictionary( + existing.ToDictionary(k => k.Key, v => (object)v.Value)) + { + [key] = value + }; + + await _client!.V1.Secrets.KeyValue.V2 + .WriteSecretAsync(path, data, mountPoint: mountPoint); + } + + public async Task> ListSecretsAsync( + ConnectorContext context, + string? path = null, + CancellationToken ct = default) + { + var config = ParseConfig(context.Configuration); + var mountPoint = config.MountPoint ?? "secret"; + + var result = await _client!.V1.Secrets.KeyValue.V2 + .ReadSecretPathsAsync(path ?? "", mountPoint: mountPoint); + + return result?.Data?.Keys?.ToList() ?? []; + } +} +``` + +### AzureKeyVaultConnector + +```csharp +namespace StellaOps.ReleaseOrchestrator.IntegrationHub.Connectors.Vault; + +public sealed class AzureKeyVaultConnector : IVaultConnector +{ + public ConnectorCategory Category => ConnectorCategory.Vault; + + private SecretClient? _client; + + public async Task InitializeAsync( + ConnectorContext context, + CancellationToken ct = default) + { + var config = ParseConfig(context.Configuration); + + TokenCredential credential = config.AuthMethod switch + { + "service_principal" => new ClientSecretCredential( + config.TenantId, + config.ClientId, + config.ClientSecret ?? await context.SecretResolver.ResolveAsync( + config.ClientSecretRef!, ct)), + + "managed_identity" => new ManagedIdentityCredential( + config.ManagedIdentityClientId), + + _ => new DefaultAzureCredential() + }; + + _client = new SecretClient( + new Uri($"https://{config.VaultName}.vault.azure.net/"), + credential); + } + + public async Task GetSecretAsync( + ConnectorContext context, + string path, + string? key = null, + CancellationToken ct = default) + { + try + { + // Azure Key Vault uses flat namespace, path is the secret name + var secretName = key is not null ? 
$"{path}--{key}" : path; + var response = await _client!.GetSecretAsync(secretName, cancellationToken: ct); + return response.Value.Value; + } + catch (RequestFailedException ex) when (ex.Status == 404) + { + return null; + } + } + + public async Task> GetSecretsAsync( + ConnectorContext context, + string path, + CancellationToken ct = default) + { + var result = new Dictionary(); + + await foreach (var secret in _client!.GetPropertiesOfSecretsAsync(ct)) + { + if (secret.Name.StartsWith(path, StringComparison.OrdinalIgnoreCase)) + { + var value = await _client.GetSecretAsync(secret.Name, cancellationToken: ct); + var key = secret.Name[(path.Length + 2)..]; // Remove prefix and "--" + result[key] = value.Value.Value; + } + } + + return result; + } + + public async Task SetSecretAsync( + ConnectorContext context, + string path, + string key, + string value, + CancellationToken ct = default) + { + var secretName = $"{path}--{key}"; + await _client!.SetSecretAsync(secretName, value, ct); + } + + public async Task> ListSecretsAsync( + ConnectorContext context, + string? path = null, + CancellationToken ct = default) + { + var result = new List(); + + await foreach (var secret in _client!.GetPropertiesOfSecretsAsync(ct)) + { + if (path is null || secret.Name.StartsWith(path, StringComparison.OrdinalIgnoreCase)) + { + result.Add(secret.Name); + } + } + + return result; + } +} +``` + +### AwsSecretsManagerConnector + +```csharp +namespace StellaOps.ReleaseOrchestrator.IntegrationHub.Connectors.Vault; + +public sealed class AwsSecretsManagerConnector : IVaultConnector +{ + public ConnectorCategory Category => ConnectorCategory.Vault; + + private AmazonSecretsManagerClient? 
_client; + + public async Task InitializeAsync( + ConnectorContext context, + CancellationToken ct = default) + { + var config = ParseConfig(context.Configuration); + + AWSCredentials credentials = config.AuthMethod switch + { + "access_key" => new BasicAWSCredentials( + config.AccessKeyId, + config.SecretAccessKey ?? await context.SecretResolver.ResolveAsync( + config.SecretAccessKeyRef!, ct)), + + "assume_role" => new AssumeRoleAWSCredentials( + new BasicAWSCredentials(config.AccessKeyId, config.SecretAccessKey), + config.RoleArn, + "StellaOps"), + + _ => new InstanceProfileAWSCredentials() + }; + + _client = new AmazonSecretsManagerClient( + credentials, + RegionEndpoint.GetBySystemName(config.Region)); + } + + public async Task GetSecretAsync( + ConnectorContext context, + string path, + string? key = null, + CancellationToken ct = default) + { + try + { + var response = await _client!.GetSecretValueAsync( + new GetSecretValueRequest { SecretId = path }, ct); + + var secretValue = response.SecretString; + + if (key is null) + return secretValue; + + // AWS secrets can be JSON, try to parse + try + { + var json = JsonDocument.Parse(secretValue); + if (json.RootElement.TryGetProperty(key, out var prop)) + { + return prop.GetString(); + } + } + catch (JsonException) + { + // Not JSON, return full value + } + + return secretValue; + } + catch (ResourceNotFoundException) + { + return null; + } + } + + public async Task> GetSecretsAsync( + ConnectorContext context, + string path, + CancellationToken ct = default) + { + var secretValue = await GetSecretAsync(context, path, ct: ct); + if (secretValue is null) + return new Dictionary(); + + try + { + var json = JsonDocument.Parse(secretValue); + return json.RootElement.EnumerateObject() + .ToDictionary(p => p.Name, p => p.Value.GetString() ?? 
""); + } + catch (JsonException) + { + return new Dictionary { ["value"] = secretValue }; + } + } + + public async Task SetSecretAsync( + ConnectorContext context, + string path, + string key, + string value, + CancellationToken ct = default) + { + // Get existing secret to merge + var existing = await GetSecretsAsync(context, path, ct); + var data = new Dictionary(existing) { [key] = value }; + var json = JsonSerializer.Serialize(data); + + try + { + await _client!.UpdateSecretAsync( + new UpdateSecretRequest + { + SecretId = path, + SecretString = json + }, ct); + } + catch (ResourceNotFoundException) + { + await _client.CreateSecretAsync( + new CreateSecretRequest + { + Name = path, + SecretString = json + }, ct); + } + } + + public async Task> ListSecretsAsync( + ConnectorContext context, + string? path = null, + CancellationToken ct = default) + { + var result = new List(); + string? nextToken = null; + + do + { + var response = await _client!.ListSecretsAsync( + new ListSecretsRequest + { + NextToken = nextToken, + Filters = path is not null + ? 
[new Filter { Key = FilterNameStringType.Name, Values = [path] }] + : null + }, ct); + + result.AddRange(response.SecretList.Select(s => s.Name)); + nextToken = response.NextToken; + } while (!string.IsNullOrEmpty(nextToken)); + + return result; + } +} +``` + +--- + +## Acceptance Criteria + +- [ ] HashiCorp Vault token auth works +- [ ] HashiCorp Vault AppRole auth works +- [ ] HashiCorp Vault Kubernetes auth works +- [ ] Azure Key Vault service principal works +- [ ] Azure Key Vault managed identity works +- [ ] AWS Secrets Manager IAM auth works +- [ ] All connectors read/write secrets +- [ ] Secret listing works with path prefix +- [ ] Integration tests pass + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 102_002 Connector Runtime | Internal | TODO | +| VaultSharp | NuGet | Available | +| Azure.Security.KeyVault.Secrets | NuGet | Available | +| AWSSDK.SecretsManager | NuGet | Available | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| IVaultConnector | TODO | | +| HashiCorpVaultConnector | TODO | | +| AzureKeyVaultConnector | TODO | | +| AwsSecretsManagerConnector | TODO | | +| Unit tests | TODO | | +| Integration tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_102_006_INTHUB_doctor_checks.md b/docs/implplan/SPRINT_20260110_102_006_INTHUB_doctor_checks.md new file mode 100644 index 000000000..14ca1cef6 --- /dev/null +++ b/docs/implplan/SPRINT_20260110_102_006_INTHUB_doctor_checks.md @@ -0,0 +1,605 @@ +# SPRINT: Doctor Checks + +> **Sprint ID:** 102_006 +> **Module:** INTHUB +> **Phase:** 2 - Integration Hub +> **Status:** TODO +> **Parent:** [102_000_INDEX](SPRINT_20260110_102_000_INDEX_integration_hub.md) + +--- + +## Overview + +Implement Doctor checks that diagnose integration health issues. 
Checks validate connectivity, credentials, permissions, and rate limit status. + +### Objectives + +- Connectivity check for all integration types +- Credential validation checks +- Permission verification checks +- Rate limit status checks +- Aggregated health report generation + +### Working Directory + +``` +src/ReleaseOrchestrator/ +├── __Libraries/ +│ └── StellaOps.ReleaseOrchestrator.IntegrationHub/ +│ └── Doctor/ +│ ├── IDoctorCheck.cs +│ ├── DoctorService.cs +│ ├── Checks/ +│ │ ├── ConnectivityCheck.cs +│ │ ├── CredentialsCheck.cs +│ │ ├── PermissionsCheck.cs +│ │ └── RateLimitCheck.cs +│ └── Reports/ +│ ├── DoctorReport.cs +│ └── CheckResult.cs +└── __Tests/ +``` + +--- + +## Deliverables + +### IDoctorCheck Interface + +```csharp +namespace StellaOps.ReleaseOrchestrator.IntegrationHub.Doctor; + +public interface IDoctorCheck +{ + string Name { get; } + string Description { get; } + CheckCategory Category { get; } + + Task<CheckResult> ExecuteAsync( + Integration integration, + IConnectorPlugin connector, + CancellationToken ct = default); +} + +public enum CheckCategory +{ + Connectivity, + Credentials, + Permissions, + RateLimit +} + +public sealed record CheckResult( + string CheckName, + CheckStatus Status, + string Message, + IReadOnlyDictionary<string, object>? Details = null, + TimeSpan Duration = default +) +{ + public static CheckResult Pass(string name, string message, + IReadOnlyDictionary<string, object>? details = null) => + new(name, CheckStatus.Pass, message, details); + + public static CheckResult Warn(string name, string message, + IReadOnlyDictionary<string, object>? details = null) => + new(name, CheckStatus.Warning, message, details); + + public static CheckResult Fail(string name, string message, + IReadOnlyDictionary<string, object>? 
details = null) => + new(name, CheckStatus.Fail, message, details); + + public static CheckResult Skip(string name, string message) => + new(name, CheckStatus.Skipped, message); +} + +public enum CheckStatus +{ + Pass, + Warning, + Fail, + Skipped +} +``` + +### DoctorService + +```csharp +namespace StellaOps.ReleaseOrchestrator.IntegrationHub.Doctor; + +public sealed class DoctorService +{ + private readonly IIntegrationManager _integrationManager; + private readonly IConnectorFactory _connectorFactory; + private readonly IEnumerable<IDoctorCheck> _checks; + private readonly ILogger<DoctorService> _logger; + + public DoctorService( + IIntegrationManager integrationManager, + IConnectorFactory connectorFactory, + IEnumerable<IDoctorCheck> checks, + ILogger<DoctorService> logger) + { + _integrationManager = integrationManager; + _connectorFactory = connectorFactory; + _checks = checks; + _logger = logger; + } + + public async Task<DoctorReport> CheckIntegrationAsync( + Guid integrationId, + CancellationToken ct = default) + { + var integration = await _integrationManager.GetAsync(integrationId, ct) + ?? 
throw new IntegrationNotFoundException(integrationId); + + var connector = await _connectorFactory.CreateAsync(integration, ct); + var results = new List<CheckResult>(); + + foreach (var check in _checks) + { + try + { + var sw = Stopwatch.StartNew(); + var result = await check.ExecuteAsync(integration, connector, ct); + results.Add(result with { Duration = sw.Elapsed }); + } + catch (Exception ex) + { + _logger.LogError(ex, + "Doctor check {CheckName} failed for integration {IntegrationId}", + check.Name, integrationId); + + results.Add(CheckResult.Fail( + check.Name, + $"Check threw exception: {ex.Message}")); + } + } + + return new DoctorReport( + IntegrationId: integrationId, + IntegrationName: integration.Name, + IntegrationType: integration.Type, + CheckedAt: TimeProvider.System.GetUtcNow(), + Results: results, + OverallStatus: DetermineOverallStatus(results)); + } + + public async Task<IReadOnlyList<DoctorReport>> CheckAllIntegrationsAsync( + CancellationToken ct = default) + { + var integrations = await _integrationManager.ListAsync(ct: ct); + var reports = new List<DoctorReport>(); + + foreach (var integration in integrations) + { + var report = await CheckIntegrationAsync(integration.Id, ct); + reports.Add(report); + } + + return reports; + } + + private static HealthStatus DetermineOverallStatus( + IReadOnlyList<CheckResult> results) + { + if (results.Any(r => r.Status == CheckStatus.Fail)) + return HealthStatus.Unhealthy; + + if (results.Any(r => r.Status == CheckStatus.Warning)) + return HealthStatus.Degraded; + + return HealthStatus.Healthy; + } +} + +public sealed record DoctorReport( + Guid IntegrationId, + string IntegrationName, + IntegrationType IntegrationType, + DateTimeOffset CheckedAt, + IReadOnlyList<CheckResult> Results, + HealthStatus OverallStatus +); +``` + +### ConnectivityCheck + +```csharp +namespace StellaOps.ReleaseOrchestrator.IntegrationHub.Doctor.Checks; + +public sealed class ConnectivityCheck : IDoctorCheck +{ + public string Name => "connectivity"; + public string Description => "Verifies network 
connectivity to the integration endpoint"; + public CheckCategory Category => CheckCategory.Connectivity; + + public async Task<CheckResult> ExecuteAsync( + Integration integration, + IConnectorPlugin connector, + CancellationToken ct = default) + { + try + { + var result = await connector.TestConnectionAsync( + new ConnectorContext( + integration.Id, + integration.TenantId, + default, // Config already loaded in connector + null!, + NullLogger.Instance), + ct); + + if (result.Success) + { + return CheckResult.Pass( + Name, + $"Connected successfully in {result.ResponseTime.TotalMilliseconds:F0}ms", + new Dictionary<string, object> + { + ["response_time_ms"] = result.ResponseTime.TotalMilliseconds + }); + } + + return CheckResult.Fail( + Name, + $"Connection failed: {result.Message}"); + } + catch (HttpRequestException ex) + { + return CheckResult.Fail( + Name, + $"Network error: {ex.Message}", + new Dictionary<string, object> + { + ["exception_type"] = ex.GetType().Name + }); + } + catch (TaskCanceledException) + { + return CheckResult.Fail(Name, "Connection timed out"); + } + } +} +``` + +### CredentialsCheck + +```csharp +namespace StellaOps.ReleaseOrchestrator.IntegrationHub.Doctor.Checks; + +public sealed class CredentialsCheck : IDoctorCheck +{ + public string Name => "credentials"; + public string Description => "Validates that credentials are valid and not expired"; + public CheckCategory Category => CheckCategory.Credentials; + + public async Task<CheckResult> ExecuteAsync( + Integration integration, + IConnectorPlugin connector, + CancellationToken ct = default) + { + // First verify we can connect + var connectionResult = await connector.TestConnectionAsync( + CreateContext(integration), ct); + + if (!connectionResult.Success) + { + // Check if it's specifically a credential issue + if (IsCredentialError(connectionResult.Message)) + { + return CheckResult.Fail( + Name, + $"Invalid credentials: {connectionResult.Message}", + new Dictionary<string, object> + { + ["error_type"] = "authentication_failed" + }); + } + + return 
CheckResult.Skip( + Name, + "Skipped: connectivity check failed first"); + } + + // Check for expiring credentials if applicable + if (connector is ICredentialExpiration credExpiration) + { + var expiration = await credExpiration.GetCredentialExpirationAsync(ct); + if (expiration.HasValue) + { + var remaining = expiration.Value - TimeProvider.System.GetUtcNow(); + + if (remaining < TimeSpan.Zero) + { + return CheckResult.Fail( + Name, + "Credentials have expired", + new Dictionary + { + ["expired_at"] = expiration.Value.ToString("O") + }); + } + + if (remaining < TimeSpan.FromDays(7)) + { + return CheckResult.Warn( + Name, + $"Credentials expire in {remaining.Days} days", + new Dictionary + { + ["expires_at"] = expiration.Value.ToString("O"), + ["days_remaining"] = remaining.Days + }); + } + } + } + + return CheckResult.Pass(Name, "Credentials are valid"); + } + + private static bool IsCredentialError(string? message) + { + if (message is null) return false; + + var credentialKeywords = new[] + { + "401", "unauthorized", "authentication", + "invalid token", "invalid credentials", + "access denied", "forbidden" + }; + + return credentialKeywords.Any(k => + message.Contains(k, StringComparison.OrdinalIgnoreCase)); + } +} +``` + +### PermissionsCheck + +```csharp +namespace StellaOps.ReleaseOrchestrator.IntegrationHub.Doctor.Checks; + +public sealed class PermissionsCheck : IDoctorCheck +{ + public string Name => "permissions"; + public string Description => "Verifies the integration has required permissions"; + public CheckCategory Category => CheckCategory.Permissions; + + public async Task ExecuteAsync( + Integration integration, + IConnectorPlugin connector, + CancellationToken ct = default) + { + var requiredCapabilities = GetRequiredCapabilities(integration.Type); + var availableCapabilities = connector.GetCapabilities(); + + var missing = requiredCapabilities + .Except(availableCapabilities) + .ToList(); + + if (missing.Count > 0) + { + return CheckResult.Warn( 
+ Name, + $"Missing capabilities: {string.Join(", ", missing)}", + new Dictionary<string, object> + { + ["missing_capabilities"] = missing, + ["available_capabilities"] = availableCapabilities + }); + } + + // Type-specific permission checks + var specificResult = integration.Type switch + { + IntegrationType.Scm => await CheckScmPermissionsAsync( + (IScmConnector)connector, ct), + IntegrationType.Registry => await CheckRegistryPermissionsAsync( + (IRegistryConnector)connector, ct), + IntegrationType.Vault => await CheckVaultPermissionsAsync( + (IVaultConnector)connector, ct), + _ => null + }; + + if (specificResult is not null && specificResult.Status != CheckStatus.Pass) + { + return specificResult; + } + + return CheckResult.Pass( + Name, + "All required permissions available", + new Dictionary<string, object> + { + ["capabilities"] = availableCapabilities + }); + } + + private async Task<CheckResult?> CheckScmPermissionsAsync( + IScmConnector connector, + CancellationToken ct) + { + try + { + // Try to list repos to verify read access + var repos = await connector.ListRepositoriesAsync( + CreateContext(), null, ct); + + return repos.Count == 0 + ? CheckResult.Warn(Name, "No repositories accessible") + : null; + } + catch (Exception ex) + { + return CheckResult.Fail( + Name, + $"Cannot list repositories: {ex.Message}"); + } + } + + private async Task<CheckResult?> CheckRegistryPermissionsAsync( + IRegistryConnector connector, + CancellationToken ct) + { + try + { + var repos = await connector.ListRepositoriesAsync( + CreateContext(), null, ct); + + return repos.Count == 0 + ? 
CheckResult.Warn(Name, "No repositories accessible") + : null; + } + catch (Exception ex) + { + return CheckResult.Fail( + Name, + $"Cannot list repositories: {ex.Message}"); + } + } + + private async Task<CheckResult?> CheckVaultPermissionsAsync( + IVaultConnector connector, + CancellationToken ct) + { + try + { + _ = await connector.ListSecretsAsync(CreateContext(), ct: ct); + return null; // Can list = has permissions + } + catch (Exception ex) + { + return CheckResult.Fail( + Name, + $"Cannot list secrets: {ex.Message}"); + } + } +} +``` + +### RateLimitCheck + +```csharp +namespace StellaOps.ReleaseOrchestrator.IntegrationHub.Doctor.Checks; + +public sealed class RateLimitCheck : IDoctorCheck +{ + public string Name => "rate_limit"; + public string Description => "Checks remaining API rate limit quota"; + public CheckCategory Category => CheckCategory.RateLimit; + + public async Task<CheckResult> ExecuteAsync( + Integration integration, + IConnectorPlugin connector, + CancellationToken ct = default) + { + if (connector is not IRateLimitInfo rateLimitInfo) + { + return CheckResult.Skip( + Name, + "Connector does not expose rate limit information"); + } + + try + { + var info = await rateLimitInfo.GetRateLimitInfoAsync(ct); + + var percentUsed = info.Limit > 0 + ? (double)(info.Limit - info.Remaining) / info.Limit * 100 + : 0; + + var details = new Dictionary<string, object> + { + ["limit"] = info.Limit, + ["remaining"] = info.Remaining, + ["reset_at"] = info.ResetAt?.ToString("O") ?? 
"unknown", + ["percent_used"] = percentUsed + }; + + if (info.Remaining == 0) + { + return CheckResult.Fail( + Name, + $"Rate limit exhausted, resets at {info.ResetAt:HH:mm:ss}", + details); + } + + if (percentUsed > 80) + { + return CheckResult.Warn( + Name, + $"Rate limit {percentUsed:F0}% consumed ({info.Remaining}/{info.Limit} remaining)", + details); + } + + return CheckResult.Pass( + Name, + $"Rate limit healthy: {info.Remaining}/{info.Limit} remaining", + details); + } + catch (Exception ex) + { + return CheckResult.Skip( + Name, + $"Could not retrieve rate limit info: {ex.Message}"); + } + } +} + +public interface IRateLimitInfo +{ + Task<RateLimitStatus> GetRateLimitInfoAsync(CancellationToken ct = default); +} + +public sealed record RateLimitStatus( + int Limit, + int Remaining, + DateTimeOffset? ResetAt +); +``` + +--- + +## Acceptance Criteria + +- [ ] Connectivity check works for all types +- [ ] Credential check detects auth failures +- [ ] Credential expiration warning works +- [ ] Permission check verifies capabilities +- [ ] Rate limit check warns on low quota +- [ ] Doctor report aggregates all results +- [ ] Checking all integrations at once works +- [ ] Health status updates after checks +- [ ] Unit test coverage ≥85% + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 102_001 Integration Manager | Internal | TODO | +| 102_002 Connector Runtime | Internal | TODO | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| IDoctorCheck interface | TODO | | +| DoctorService | TODO | | +| ConnectivityCheck | TODO | | +| CredentialsCheck | TODO | | +| PermissionsCheck | TODO | | +| RateLimitCheck | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_103_000_INDEX_environment_manager.md b/docs/implplan/SPRINT_20260110_103_000_INDEX_environment_manager.md new 
file mode 100644 index 000000000..7ffd35a39 --- /dev/null +++ b/docs/implplan/SPRINT_20260110_103_000_INDEX_environment_manager.md @@ -0,0 +1,197 @@ +# SPRINT INDEX: Phase 3 - Environment Manager + +> **Epic:** Release Orchestrator +> **Phase:** 3 - Environment Manager +> **Batch:** 103 +> **Status:** TODO +> **Parent:** [100_000_INDEX](SPRINT_20260110_100_000_INDEX_release_orchestrator.md) + +--- + +## Overview + +Phase 3 implements the Environment Manager - managing deployment environments (Dev, Stage, Prod), targets within environments, and agent registration. + +### Objectives + +- Environment CRUD with promotion order +- Target registry for deployment destinations +- Agent registration and lifecycle +- Inventory synchronization from targets + +--- + +## Sprint Structure + +| Sprint ID | Title | Module | Status | Dependencies | +|-----------|-------|--------|--------|--------------| +| 103_001 | Environment CRUD | ENVMGR | TODO | 101_001 | +| 103_002 | Target Registry | ENVMGR | TODO | 103_001 | +| 103_003 | Agent Manager - Core | ENVMGR | TODO | 103_002 | +| 103_004 | Inventory Sync | ENVMGR | TODO | 103_002, 103_003 | + +--- + +## Architecture + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ ENVIRONMENT MANAGER │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ ENVIRONMENT SERVICE (103_001) │ │ +│ │ │ │ +│ │ - Create/Update/Delete environments │ │ +│ │ - Promotion order management │ │ +│ │ - Freeze window configuration │ │ +│ │ - Auto-promotion rules │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ TARGET REGISTRY (103_002) │ │ +│ │ │ │ +│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │ +│ │ │Docker Host │ │Compose Host │ │ECS Service │ │ Nomad Job │ │ │ +│ │ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │ │ +│ │ │ │ 
+│ │ - Target registration - Health monitoring │ │ +│ │ - Connection validation - Capability detection │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ AGENT MANAGER (103_003) │ │ +│ │ │ │ +│ │ - Agent registration flow - Certificate issuance │ │ +│ │ - Heartbeat processing - Capability registration │ │ +│ │ - Agent lifecycle (active/inactive/revoked) │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ INVENTORY SYNC (103_004) │ │ +│ │ │ │ +│ │ - Pull current state from targets │ │ +│ │ - Detect drift from expected state │ │ +│ │ - Container inventory snapshot │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +--- + +## Deliverables Summary + +### 103_001: Environment CRUD + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `IEnvironmentService` | Interface | Environment operations | +| `EnvironmentService` | Class | Implementation | +| `Environment` | Model | Environment entity | +| `FreezeWindow` | Model | Deployment freeze windows | +| `EnvironmentValidator` | Class | Business rule validation | + +### 103_002: Target Registry + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `ITargetRegistry` | Interface | Target registration | +| `TargetRegistry` | Class | Implementation | +| `Target` | Model | Deployment target entity | +| `TargetType` | Enum | docker_host, compose_host, ecs_service, nomad_job | +| `TargetHealthChecker` | Class | Health monitoring | + +### 103_003: Agent Manager - Core + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `IAgentManager` | Interface | Agent lifecycle | +| `AgentManager` | Class | 
Implementation | +| `AgentRegistration` | Flow | One-time token registration | +| `AgentCertificateService` | Class | mTLS certificate issuance | +| `HeartbeatProcessor` | Class | Process agent heartbeats | + +### 103_004: Inventory Sync + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `IInventorySyncService` | Interface | Sync operations | +| `InventorySyncService` | Class | Implementation | +| `InventorySnapshot` | Model | Container state snapshot | +| `DriftDetector` | Class | Detect configuration drift | + +--- + +## Key Interfaces + +```csharp +public interface IEnvironmentService +{ + Task<Environment> CreateAsync(CreateEnvironmentRequest request, CancellationToken ct); + Task<Environment> UpdateAsync(Guid id, UpdateEnvironmentRequest request, CancellationToken ct); + Task DeleteAsync(Guid id, CancellationToken ct); + Task<Environment?> GetAsync(Guid id, CancellationToken ct); + Task<IReadOnlyList<Environment>> ListAsync(CancellationToken ct); + Task ReorderAsync(IReadOnlyList<Guid> orderedIds, CancellationToken ct); + Task<bool> IsFrozenAsync(Guid id, CancellationToken ct); +} + +public interface ITargetRegistry +{ + Task<Target> RegisterAsync(RegisterTargetRequest request, CancellationToken ct); + Task<Target> UpdateAsync(Guid id, UpdateTargetRequest request, CancellationToken ct); + Task UnregisterAsync(Guid id, CancellationToken ct); + Task<Target?> GetAsync(Guid id, CancellationToken ct); + Task<IReadOnlyList<Target>> ListByEnvironmentAsync(Guid environmentId, CancellationToken ct); + Task UpdateHealthAsync(Guid id, HealthStatus status, CancellationToken ct); +} + +public interface IAgentManager +{ + Task<RegistrationToken> CreateRegistrationTokenAsync(CreateTokenRequest request, CancellationToken ct); + Task<Agent> RegisterAsync(AgentRegistrationRequest request, CancellationToken ct); + Task ProcessHeartbeatAsync(AgentHeartbeat heartbeat, CancellationToken ct); + Task<Agent?> GetAsync(Guid id, CancellationToken ct); + Task RevokeAsync(Guid id, CancellationToken ct); +} +``` + +--- + +## Dependencies + +### External Dependencies + +| Dependency | Purpose | 
+|------------|---------| +| PostgreSQL 16+ | Database | +| gRPC | Agent communication | + +### Internal Dependencies + +| Module | Purpose | +|--------|---------| +| 101_001 Database Schema | Tables | +| Authority | Tenant context, PKI | + +--- + +## Acceptance Criteria + +- [ ] Environment CRUD with ordering +- [ ] Freeze window blocks deployments +- [ ] Target types validated +- [ ] Agent registration flow works +- [ ] mTLS certificates issued +- [ ] Heartbeats update status +- [ ] Inventory snapshot captured +- [ ] Drift detection works +- [ ] Unit test coverage ≥80% + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Phase 3 index created | diff --git a/docs/implplan/SPRINT_20260110_103_001_ENVMGR_environment.md b/docs/implplan/SPRINT_20260110_103_001_ENVMGR_environment.md new file mode 100644 index 000000000..b3ebab046 --- /dev/null +++ b/docs/implplan/SPRINT_20260110_103_001_ENVMGR_environment.md @@ -0,0 +1,401 @@ +# SPRINT: Environment CRUD + +> **Sprint ID:** 103_001 +> **Module:** ENVMGR +> **Phase:** 3 - Environment Manager +> **Status:** TODO +> **Parent:** [103_000_INDEX](SPRINT_20260110_103_000_INDEX_environment_manager.md) + +--- + +## Overview + +Implement Environment CRUD operations including promotion order management, freeze windows, and auto-promotion configuration. 
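The promotion-order rule in this sprint (a lower `OrderIndex` promotes into the next higher one, e.g. Dev → Stage → Prod) can be sanity-checked with a minimal, self-contained sketch. The `Env` record and `NextTarget` helper below are illustrative stand-ins, not part of the sprint's deliverables:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative stand-in for the Environment entity defined in this sprint.
public sealed record Env(Guid Id, string Name, int OrderIndex);

public static class PromotionOrder
{
    // Resolves the environment that follows `currentId` in the promotion chain,
    // or null when the current environment is last (e.g. Prod).
    public static Env? NextTarget(IReadOnlyList<Env> environments, Guid currentId)
    {
        var ordered = environments.OrderBy(e => e.OrderIndex).ToList();
        var index = ordered.FindIndex(e => e.Id == currentId);
        return index >= 0 && index < ordered.Count - 1 ? ordered[index + 1] : null;
    }
}
```

Returning null for the terminal environment mirrors one way `GetNextPromotionTargetAsync` could signal "no further promotion".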
+ +### Objectives + +- Create/Read/Update/Delete environments +- Manage promotion order (Dev → Stage → Prod) +- Configure freeze windows for deployment blocks +- Set up auto-promotion rules between environments + +### Working Directory + +``` +src/ReleaseOrchestrator/ +├── __Libraries/ +│ └── StellaOps.ReleaseOrchestrator.Environment/ +│ ├── Services/ +│ │ ├── IEnvironmentService.cs +│ │ ├── EnvironmentService.cs +│ │ └── EnvironmentValidator.cs +│ ├── Store/ +│ │ ├── IEnvironmentStore.cs +│ │ ├── EnvironmentStore.cs +│ │ └── EnvironmentMapper.cs +│ ├── FreezeWindow/ +│ │ ├── IFreezeWindowService.cs +│ │ ├── FreezeWindowService.cs +│ │ └── FreezeWindowChecker.cs +│ ├── Models/ +│ │ ├── Environment.cs +│ │ ├── FreezeWindow.cs +│ │ ├── EnvironmentConfig.cs +│ │ └── PromotionPolicy.cs +│ └── Events/ +│ ├── EnvironmentCreated.cs +│ ├── EnvironmentUpdated.cs +│ └── FreezeWindowActivated.cs +└── __Tests/ + └── StellaOps.ReleaseOrchestrator.Environment.Tests/ +``` + +--- + +## Architecture Reference + +- [Environment Manager](../modules/release-orchestrator/modules/environment-manager.md) +- [Data Model Schema](../modules/release-orchestrator/data-model/schema.md) + +--- + +## Deliverables + +### IEnvironmentService Interface + +```csharp +namespace StellaOps.ReleaseOrchestrator.Environment.Services; + +public interface IEnvironmentService +{ + Task<Environment> CreateAsync(CreateEnvironmentRequest request, CancellationToken ct = default); + Task<Environment> UpdateAsync(Guid id, UpdateEnvironmentRequest request, CancellationToken ct = default); + Task DeleteAsync(Guid id, CancellationToken ct = default); + Task<Environment?> GetAsync(Guid id, CancellationToken ct = default); + Task<Environment?> GetByNameAsync(string name, CancellationToken ct = default); + Task<IReadOnlyList<Environment>> ListAsync(CancellationToken ct = default); + Task<IReadOnlyList<Environment>> ListOrderedAsync(CancellationToken ct = default); + Task ReorderAsync(IReadOnlyList<Guid> orderedIds, CancellationToken ct = default); + Task<Environment?> GetNextPromotionTargetAsync(Guid environmentId, CancellationToken ct = 
default); +} + +public sealed record CreateEnvironmentRequest( + string Name, + string DisplayName, + string? Description, + int OrderIndex, + bool IsProduction, + int RequiredApprovals, + bool RequireSeparationOfDuties, + Guid? AutoPromoteFrom, + int DeploymentTimeoutSeconds +); + +public sealed record UpdateEnvironmentRequest( + string? DisplayName = null, + string? Description = null, + int? OrderIndex = null, + bool? IsProduction = null, + int? RequiredApprovals = null, + bool? RequireSeparationOfDuties = null, + Guid? AutoPromoteFrom = null, + int? DeploymentTimeoutSeconds = null +); +``` + +### Environment Model + +```csharp +namespace StellaOps.ReleaseOrchestrator.Environment.Models; + +public sealed record Environment +{ + public required Guid Id { get; init; } + public required Guid TenantId { get; init; } + public required string Name { get; init; } + public required string DisplayName { get; init; } + public string? Description { get; init; } + public required int OrderIndex { get; init; } + public required bool IsProduction { get; init; } + public required int RequiredApprovals { get; init; } + public required bool RequireSeparationOfDuties { get; init; } + public Guid? 
AutoPromoteFrom { get; init; } + public required int DeploymentTimeoutSeconds { get; init; } + public DateTimeOffset CreatedAt { get; init; } + public DateTimeOffset UpdatedAt { get; init; } + public Guid CreatedBy { get; init; } +} +``` + +### IFreezeWindowService Interface + +```csharp +namespace StellaOps.ReleaseOrchestrator.Environment.FreezeWindow; + +public interface IFreezeWindowService +{ + Task<FreezeWindow> CreateAsync(CreateFreezeWindowRequest request, CancellationToken ct = default); + Task<FreezeWindow> UpdateAsync(Guid id, UpdateFreezeWindowRequest request, CancellationToken ct = default); + Task DeleteAsync(Guid id, CancellationToken ct = default); + Task<IReadOnlyList<FreezeWindow>> ListByEnvironmentAsync(Guid environmentId, CancellationToken ct = default); + Task<bool> IsEnvironmentFrozenAsync(Guid environmentId, CancellationToken ct = default); + Task<FreezeWindow?> GetActiveFreezeWindowAsync(Guid environmentId, CancellationToken ct = default); + Task GrantExemptionAsync(Guid freezeWindowId, GrantExemptionRequest request, CancellationToken ct = default); +} + +public sealed record FreezeWindow +{ + public required Guid Id { get; init; } + public required Guid EnvironmentId { get; init; } + public required string Name { get; init; } + public required DateTimeOffset StartAt { get; init; } + public required DateTimeOffset EndAt { get; init; } + public string? Reason { get; init; } + public bool IsRecurring { get; init; } + public string? RecurrenceRule { get; init; } // iCal RRULE format + public DateTimeOffset CreatedAt { get; init; } + public Guid CreatedBy { get; init; } +} + +public sealed record CreateFreezeWindowRequest( + Guid EnvironmentId, + string Name, + DateTimeOffset StartAt, + DateTimeOffset EndAt, + string? Reason, + bool IsRecurring = false, + string? 
RecurrenceRule = null +); +``` + +### EnvironmentValidator + +```csharp +namespace StellaOps.ReleaseOrchestrator.Environment.Services; + +public sealed class EnvironmentValidator +{ + private readonly IEnvironmentStore _store; + + public EnvironmentValidator(IEnvironmentStore store) + { + _store = store; + } + + public async Task<ValidationResult> ValidateCreateAsync( + CreateEnvironmentRequest request, + CancellationToken ct = default) + { + var errors = new List<string>(); + + // Name format validation + if (!IsValidEnvironmentName(request.Name)) + { + errors.Add("Environment name must be lowercase alphanumeric with hyphens, 2-32 characters"); + } + + // Check for duplicate name + var existing = await _store.GetByNameAsync(request.Name, ct); + if (existing is not null) + { + errors.Add($"Environment with name '{request.Name}' already exists"); + } + + // Check for duplicate order index + var existingOrder = await _store.GetByOrderIndexAsync(request.OrderIndex, ct); + if (existingOrder is not null) + { + errors.Add($"Environment with order index {request.OrderIndex} already exists"); + } + + // Validate auto-promote reference + if (request.AutoPromoteFrom.HasValue) + { + var sourceEnv = await _store.GetAsync(request.AutoPromoteFrom.Value, ct); + if (sourceEnv is null) + { + errors.Add("Auto-promote source environment not found"); + } + else if (sourceEnv.OrderIndex >= request.OrderIndex) + { + errors.Add("Auto-promote source must have lower order index (earlier in pipeline)"); + } + } + + // Production environment validation + if (request.IsProduction && request.RequiredApprovals < 1) + { + errors.Add("Production environments must require at least 1 approval"); + } + + return errors.Count == 0 + ? 
ValidationResult.Success() + : ValidationResult.Failure(errors); + } + + public async Task<ValidationResult> ValidateReorderAsync( + IReadOnlyList<Guid> orderedIds, + CancellationToken ct = default) + { + var errors = new List<string>(); + var allEnvironments = await _store.ListAsync(ct); + + // Check all environments are included + var existingIds = allEnvironments.Select(e => e.Id).ToHashSet(); + var providedIds = orderedIds.ToHashSet(); + + if (!existingIds.SetEquals(providedIds)) + { + errors.Add("Reorder must include all existing environments exactly once"); + } + + // Check no duplicates + if (orderedIds.Count != orderedIds.Distinct().Count()) + { + errors.Add("Reorder list contains duplicate environment IDs"); + } + + // Validate auto-promote chains don't break + var order = orderedIds.ToList(); + foreach (var env in allEnvironments.Where(e => e.AutoPromoteFrom.HasValue)) + { + var sourceIndex = order.IndexOf(env.AutoPromoteFrom!.Value); + var targetIndex = order.IndexOf(env.Id); + + if (sourceIndex >= targetIndex) + { + errors.Add($"Reorder would break auto-promote chain: {env.Name} must come after its source"); + } + } + + return errors.Count == 0 + ? 
ValidationResult.Success() + : ValidationResult.Failure(errors); + } + + private static bool IsValidEnvironmentName(string name) => + Regex.IsMatch(name, @"^[a-z][a-z0-9-]{1,31}$"); +} +``` + +### Domain Events + +```csharp +namespace StellaOps.ReleaseOrchestrator.Environment.Events; + +public sealed record EnvironmentCreated( + Guid EnvironmentId, + Guid TenantId, + string Name, + int OrderIndex, + bool IsProduction, + DateTimeOffset CreatedAt, + Guid CreatedBy +) : IDomainEvent; + +public sealed record EnvironmentUpdated( + Guid EnvironmentId, + Guid TenantId, + IReadOnlyList ChangedFields, + DateTimeOffset UpdatedAt, + Guid UpdatedBy +) : IDomainEvent; + +public sealed record EnvironmentDeleted( + Guid EnvironmentId, + Guid TenantId, + string Name, + DateTimeOffset DeletedAt, + Guid DeletedBy +) : IDomainEvent; + +public sealed record FreezeWindowActivated( + Guid FreezeWindowId, + Guid EnvironmentId, + Guid TenantId, + DateTimeOffset StartAt, + DateTimeOffset EndAt, + string? Reason +) : IDomainEvent; + +public sealed record FreezeWindowDeactivated( + Guid FreezeWindowId, + Guid EnvironmentId, + Guid TenantId, + DateTimeOffset EndedAt +) : IDomainEvent; +``` + +--- + +## Acceptance Criteria + +- [ ] Create environment with all fields +- [ ] Update environment preserves audit fields +- [ ] Delete environment checks for targets/releases +- [ ] List environments returns ordered by OrderIndex +- [ ] Reorder validates chain integrity +- [ ] Auto-promote reference validated +- [ ] Freeze window blocks deployments +- [ ] Freeze window exemptions work +- [ ] Recurring freeze windows calculated correctly +- [ ] Domain events published +- [ ] Unit test coverage ≥85% + +--- + +## Test Plan + +### Unit Tests + +| Test | Description | +|------|-------------| +| `CreateEnvironment_ValidRequest_Succeeds` | Valid creation works | +| `CreateEnvironment_DuplicateName_Fails` | Duplicate name rejected | +| `CreateEnvironment_DuplicateOrder_Fails` | Duplicate order rejected | +| 
`UpdateEnvironment_ValidRequest_Succeeds` | Update works | +| `DeleteEnvironment_WithTargets_Fails` | Cannot delete with children | +| `Reorder_AllEnvironments_Succeeds` | Reorder works | +| `Reorder_BreaksAutoPromote_Fails` | Chain validation works | +| `IsFrozen_ActiveWindow_ReturnsTrue` | Freeze detection works | +| `IsFrozen_WithExemption_ReturnsFalse` | Exemption works | + +### Integration Tests + +| Test | Description | +|------|-------------| +| `EnvironmentLifecycle_E2E` | Full CRUD cycle | +| `FreezeWindowRecurrence_E2E` | Recurring windows | + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 101_001 Database Schema | Internal | TODO | +| Authority | Internal | Exists | +| ICal.Net | NuGet | Available | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| IEnvironmentService | TODO | | +| EnvironmentService | TODO | | +| EnvironmentValidator | TODO | | +| IFreezeWindowService | TODO | | +| FreezeWindowService | TODO | | +| FreezeWindowChecker | TODO | | +| Domain events | TODO | | +| Unit tests | TODO | | +| Integration tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_103_002_ENVMGR_target_registry.md b/docs/implplan/SPRINT_20260110_103_002_ENVMGR_target_registry.md new file mode 100644 index 000000000..bd2126999 --- /dev/null +++ b/docs/implplan/SPRINT_20260110_103_002_ENVMGR_target_registry.md @@ -0,0 +1,407 @@ +# SPRINT: Target Registry + +> **Sprint ID:** 103_002 +> **Module:** ENVMGR +> **Phase:** 3 - Environment Manager +> **Status:** TODO +> **Parent:** [103_000_INDEX](SPRINT_20260110_103_000_INDEX_environment_manager.md) + +--- + +## Overview + +Implement the Target Registry for managing deployment targets within environments. Targets represent where containers are deployed (Docker hosts, Compose hosts, ECS services, Nomad jobs). 
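Because each target type carries a different connection shape, the registry's connection tester can dispatch on the polymorphic config record. A minimal sketch under assumed types — `ConnCfg`, `DockerHost`, `NomadJob`, and the `Probe` helper are hypothetical simplifications of this sprint's `TargetConnectionConfig` hierarchy (Docker's `/_ping` and Nomad's `/v1/job/{id}` are the real health endpoints for those backends):

```csharp
using System;

// Hypothetical, simplified stand-ins for the TargetConnectionConfig hierarchy.
public abstract record ConnCfg;
public sealed record DockerHost(string Host, int Port = 2376, bool UseTls = true) : ConnCfg;
public sealed record NomadJob(string Address, string JobId) : ConnCfg;

public static class Endpoints
{
    // Derives the URL a connection tester might probe for each target type.
    public static string Probe(ConnCfg cfg) => cfg switch
    {
        DockerHost d => $"{(d.UseTls ? "https" : "http")}://{d.Host}:{d.Port}/_ping",
        NomadJob n => $"{n.Address}/v1/job/{n.JobId}",
        _ => throw new ArgumentOutOfRangeException(nameof(cfg))
    };
}
```

Pattern matching on sealed record subtypes keeps per-type connection logic in one place without casting on a `TargetType` enum.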
+ +### Objectives + +- Register deployment targets in environments +- Support multiple target types (docker_host, compose_host, ecs_service, nomad_job) +- Validate target connection configurations +- Track target health status +- Manage target-agent associations + +### Working Directory + +``` +src/ReleaseOrchestrator/ +├── __Libraries/ +│ └── StellaOps.ReleaseOrchestrator.Environment/ +│ ├── Target/ +│ │ ├── ITargetRegistry.cs +│ │ ├── TargetRegistry.cs +│ │ ├── TargetValidator.cs +│ │ └── TargetConnectionTester.cs +│ ├── Store/ +│ │ ├── ITargetStore.cs +│ │ └── TargetStore.cs +│ ├── Health/ +│ │ ├── ITargetHealthChecker.cs +│ │ ├── TargetHealthChecker.cs +│ │ └── HealthCheckScheduler.cs +│ └── Models/ +│ ├── Target.cs +│ ├── TargetType.cs +│ ├── TargetConfig.cs +│ └── TargetHealthStatus.cs +└── __Tests/ + └── StellaOps.ReleaseOrchestrator.Environment.Tests/ + └── Target/ +``` + +--- + +## Architecture Reference + +- [Environment Manager](../modules/release-orchestrator/modules/environment-manager.md) +- [Agents](../modules/release-orchestrator/modules/agents.md) + +--- + +## Deliverables + +### ITargetRegistry Interface + +```csharp +namespace StellaOps.ReleaseOrchestrator.Environment.Target; + +public interface ITargetRegistry +{ + Task<Target> RegisterAsync(RegisterTargetRequest request, CancellationToken ct = default); + Task<Target> UpdateAsync(Guid id, UpdateTargetRequest request, CancellationToken ct = default); + Task UnregisterAsync(Guid id, CancellationToken ct = default); + Task<Target?> GetAsync(Guid id, CancellationToken ct = default); + Task<Target?> GetByNameAsync(Guid environmentId, string name, CancellationToken ct = default); + Task<IReadOnlyList<Target>> ListByEnvironmentAsync(Guid environmentId, CancellationToken ct = default); + Task<IReadOnlyList<Target>> ListByAgentAsync(Guid agentId, CancellationToken ct = default); + Task<IReadOnlyList<Target>> ListHealthyAsync(Guid environmentId, CancellationToken ct = default); + Task AssignAgentAsync(Guid targetId, Guid agentId, CancellationToken ct = default); + Task UnassignAgentAsync(Guid targetId, 
CancellationToken ct = default); + Task UpdateHealthAsync(Guid id, HealthStatus status, string? message, CancellationToken ct = default); + Task TestConnectionAsync(Guid id, CancellationToken ct = default); +} + +public sealed record RegisterTargetRequest( + Guid EnvironmentId, + string Name, + string DisplayName, + TargetType Type, + TargetConnectionConfig ConnectionConfig, + Guid? AgentId = null +); + +public sealed record UpdateTargetRequest( + string? DisplayName = null, + TargetConnectionConfig? ConnectionConfig = null, + Guid? AgentId = null +); +``` + +### Target Model + +```csharp +namespace StellaOps.ReleaseOrchestrator.Environment.Models; + +public sealed record Target +{ + public required Guid Id { get; init; } + public required Guid TenantId { get; init; } + public required Guid EnvironmentId { get; init; } + public required string Name { get; init; } + public required string DisplayName { get; init; } + public required TargetType Type { get; init; } + public Guid? AgentId { get; init; } + public required HealthStatus HealthStatus { get; init; } + public string? HealthMessage { get; init; } + public DateTimeOffset? LastHealthCheck { get; init; } + public DateTimeOffset? LastSyncAt { get; init; } + public InventorySnapshot? 
InventorySnapshot { get; init; } + public DateTimeOffset CreatedAt { get; init; } + public DateTimeOffset UpdatedAt { get; init; } +} + +public enum TargetType +{ + DockerHost, + ComposeHost, + EcsService, + NomadJob +} + +public enum HealthStatus +{ + Unknown, + Healthy, + Degraded, + Unhealthy, + Unreachable +} +``` + +### TargetConnectionConfig + +```csharp +namespace StellaOps.ReleaseOrchestrator.Environment.Models; + +public abstract record TargetConnectionConfig +{ + public abstract TargetType TargetType { get; } +} + +public sealed record DockerHostConfig : TargetConnectionConfig +{ + public override TargetType TargetType => TargetType.DockerHost; + public required string Host { get; init; } + public int Port { get; init; } = 2376; + public bool UseTls { get; init; } = true; + public string? CaCertSecretRef { get; init; } + public string? ClientCertSecretRef { get; init; } + public string? ClientKeySecretRef { get; init; } +} + +public sealed record ComposeHostConfig : TargetConnectionConfig +{ + public override TargetType TargetType => TargetType.ComposeHost; + public required string Host { get; init; } + public int Port { get; init; } = 2376; + public bool UseTls { get; init; } = true; + public required string ComposeProjectPath { get; init; } + public string? ComposeFile { get; init; } = "docker-compose.yml"; + public string? CaCertSecretRef { get; init; } + public string? ClientCertSecretRef { get; init; } + public string? ClientKeySecretRef { get; init; } +} + +public sealed record EcsServiceConfig : TargetConnectionConfig +{ + public override TargetType TargetType => TargetType.EcsService; + public required string Region { get; init; } + public required string ClusterArn { get; init; } + public required string ServiceName { get; init; } + public string? RoleArn { get; init; } + public string? AccessKeyIdSecretRef { get; init; } + public string? 
SecretAccessKeySecretRef { get; init; } +} + +public sealed record NomadJobConfig : TargetConnectionConfig +{ + public override TargetType TargetType => TargetType.NomadJob; + public required string Address { get; init; } + public required string Namespace { get; init; } + public required string JobId { get; init; } + public string? TokenSecretRef { get; init; } + public bool UseTls { get; init; } = true; +} +``` + +### TargetHealthChecker + +```csharp +namespace StellaOps.ReleaseOrchestrator.Environment.Health; + +public interface ITargetHealthChecker +{ + Task<HealthCheckResult> CheckAsync(Target target, CancellationToken ct = default); +} + +public sealed class TargetHealthChecker : ITargetHealthChecker +{ + private readonly ITargetConnectionTester _connectionTester; + private readonly IAgentManager _agentManager; + private readonly ILogger<TargetHealthChecker> _logger; + + public TargetHealthChecker( + ITargetConnectionTester connectionTester, + IAgentManager agentManager, + ILogger<TargetHealthChecker> logger) + { + _connectionTester = connectionTester; + _agentManager = agentManager; + _logger = logger; + } + + public async Task<HealthCheckResult> CheckAsync( + Target target, + CancellationToken ct = default) + { + var sw = Stopwatch.StartNew(); + + try + { + // If target has assigned agent, check via agent + if (target.AgentId.HasValue) + { + return await CheckViaAgentAsync(target, ct); + } + + // Otherwise, check directly + return await CheckDirectlyAsync(target, ct); + } + catch (Exception ex) + { + _logger.LogWarning(ex, + "Health check failed for target {TargetId}", + target.Id); + + return new HealthCheckResult( + Status: HealthStatus.Unreachable, + Message: ex.Message, + Duration: sw.Elapsed, + CheckedAt: TimeProvider.System.GetUtcNow() + ); + } + } + + private async Task<HealthCheckResult> CheckViaAgentAsync( + Target target, + CancellationToken ct) + { + var agent = await _agentManager.GetAsync(target.AgentId!.Value, ct); + if (agent is null || agent.Status != AgentStatus.Active) + { + return new HealthCheckResult( + Status: HealthStatus.Unreachable, + Message: "Assigned agent is not active", + Duration: TimeSpan.Zero, + CheckedAt: TimeProvider.System.GetUtcNow() + ); + } + + // Dispatch health check task to agent + var result = await _agentManager.ExecuteTaskAsync( 
+ target.AgentId!.Value, + new HealthCheckTask(target.Id, target.Type), + ct); + + return ParseAgentHealthResult(result); + } + + private async Task<HealthCheckResult> CheckDirectlyAsync( + Target target, + CancellationToken ct) + { + var testResult = await _connectionTester.TestAsync(target, ct); + + return new HealthCheckResult( + Status: testResult.Success ? HealthStatus.Healthy : HealthStatus.Unreachable, + Message: testResult.Message, + Duration: testResult.Duration, + CheckedAt: TimeProvider.System.GetUtcNow() + ); + } +} + +public sealed record HealthCheckResult( + HealthStatus Status, + string? Message, + TimeSpan Duration, + DateTimeOffset CheckedAt +); +``` + +### HealthCheckScheduler + +```csharp +namespace StellaOps.ReleaseOrchestrator.Environment.Health; + +public sealed class HealthCheckScheduler : IHostedService, IDisposable +{ + private readonly IEnvironmentService _environmentService; + private readonly ITargetRegistry _targetRegistry; + private readonly ITargetHealthChecker _healthChecker; + private readonly ILogger<HealthCheckScheduler> _logger; + private readonly TimeSpan _checkInterval = TimeSpan.FromMinutes(1); + private Timer? _timer; + + public HealthCheckScheduler( + IEnvironmentService environmentService, + ITargetRegistry targetRegistry, + ITargetHealthChecker healthChecker, + ILogger<HealthCheckScheduler> logger) + { + _environmentService = environmentService; + _targetRegistry = targetRegistry; + _healthChecker = healthChecker; + _logger = logger; + } + + public Task StartAsync(CancellationToken ct) + { + _timer = new Timer( + DoHealthChecks, + null, + TimeSpan.FromSeconds(30), + _checkInterval); + + return Task.CompletedTask; + } + + public Task StopAsync(CancellationToken ct) + { + _timer?.Change(Timeout.Infinite, 0); + return Task.CompletedTask; + } + + // async void matches the TimerCallback signature; all exceptions are caught below + private async void DoHealthChecks(object? 
state) + { + try + { + var environments = await _environmentService.ListAsync(); + + foreach (var env in environments) + { + var targets = await _targetRegistry.ListByEnvironmentAsync(env.Id); + + foreach (var target in targets) + { + try + { + var result = await _healthChecker.CheckAsync(target); + await _targetRegistry.UpdateHealthAsync( + target.Id, + result.Status, + result.Message); + } + catch (Exception ex) + { + _logger.LogError(ex, + "Failed to update health for target {TargetId}", + target.Id); + } + } + } + } + catch (Exception ex) + { + _logger.LogError(ex, "Health check scheduler failed"); + } + } + + public void Dispose() => _timer?.Dispose(); +} +``` + +--- + +## Acceptance Criteria + +- [ ] Register target with connection config +- [ ] Update target preserves encrypted config +- [ ] Unregister checks for active deployments +- [ ] List targets by environment works +- [ ] List healthy targets filters correctly +- [ ] Assign/unassign agent works +- [ ] Connection test validates config +- [ ] Health check updates status +- [ ] Scheduled health checks run +- [ ] Unit test coverage ≥85% + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 103_001 Environment CRUD | Internal | TODO | +| 101_001 Database Schema | Internal | TODO | +| Docker.DotNet | NuGet | Available | +| AWSSDK.ECS | NuGet | Available | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| ITargetRegistry | TODO | | +| TargetRegistry | TODO | | +| TargetValidator | TODO | | +| TargetConnectionTester | TODO | | +| ITargetHealthChecker | TODO | | +| TargetHealthChecker | TODO | | +| HealthCheckScheduler | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_103_003_ENVMGR_agent_manager.md b/docs/implplan/SPRINT_20260110_103_003_ENVMGR_agent_manager.md new file mode 100644 index 
000000000..7d46fe65f --- /dev/null +++ b/docs/implplan/SPRINT_20260110_103_003_ENVMGR_agent_manager.md @@ -0,0 +1,539 @@ +# SPRINT: Agent Manager - Core + +> **Sprint ID:** 103_003 +> **Module:** ENVMGR +> **Phase:** 3 - Environment Manager +> **Status:** TODO +> **Parent:** [103_000_INDEX](SPRINT_20260110_103_000_INDEX_environment_manager.md) + +--- + +## Overview + +Implement the Agent Manager for registering, authenticating, and managing deployment agents. Agents are secure executors that run on target hosts. + +### Objectives + +- One-time token generation for agent registration +- Agent registration with certificate issuance +- Heartbeat processing and status tracking +- Agent capability registration +- Agent lifecycle management (active/inactive/revoked) + +### Working Directory + +``` +src/ReleaseOrchestrator/ +├── __Libraries/ +│ └── StellaOps.ReleaseOrchestrator.Agent/ +│ ├── Manager/ +│ │ ├── IAgentManager.cs +│ │ ├── AgentManager.cs +│ │ └── AgentValidator.cs +│ ├── Registration/ +│ │ ├── IAgentRegistration.cs +│ │ ├── AgentRegistration.cs +│ │ ├── RegistrationTokenService.cs +│ │ └── RegistrationToken.cs +│ ├── Certificate/ +│ │ ├── IAgentCertificateService.cs +│ │ ├── AgentCertificateService.cs +│ │ └── CertificateTemplate.cs +│ ├── Heartbeat/ +│ │ ├── IHeartbeatProcessor.cs +│ │ ├── HeartbeatProcessor.cs +│ │ └── HeartbeatTimeoutMonitor.cs +│ ├── Capability/ +│ │ ├── AgentCapability.cs +│ │ └── CapabilityRegistry.cs +│ └── Models/ +│ ├── Agent.cs +│ ├── AgentStatus.cs +│ └── AgentHeartbeat.cs +└── __Tests/ +``` + +--- + +## Architecture Reference + +- [Agent Security](../modules/release-orchestrator/security/agent-security.md) +- [Agents](../modules/release-orchestrator/modules/agents.md) + +--- + +## Deliverables + +### IAgentManager Interface + +```csharp +namespace StellaOps.ReleaseOrchestrator.Agent.Manager; + +public interface IAgentManager +{ + // Registration + Task CreateRegistrationTokenAsync( + CreateRegistrationTokenRequest request, + 
CancellationToken ct = default);
+
+    Task RegisterAsync(
+        AgentRegistrationRequest request,
+        CancellationToken ct = default);
+
+    // Lifecycle
+    Task<Agent?> GetAsync(Guid id, CancellationToken ct = default);
+    Task<Agent?> GetByNameAsync(string name, CancellationToken ct = default);
+    Task<IReadOnlyList<Agent>> ListAsync(AgentFilter? filter = null, CancellationToken ct = default);
+    Task<IReadOnlyList<Agent>> ListActiveAsync(CancellationToken ct = default);
+    Task ActivateAsync(Guid id, CancellationToken ct = default);
+    Task DeactivateAsync(Guid id, CancellationToken ct = default);
+    Task MarkStaleAsync(Guid id, CancellationToken ct = default);
+    Task RevokeAsync(Guid id, string reason, CancellationToken ct = default);
+
+    // Heartbeat
+    Task ProcessHeartbeatAsync(AgentHeartbeat heartbeat, CancellationToken ct = default);
+
+    // Certificate
+    Task<AgentCertificate> RenewCertificateAsync(Guid id, CancellationToken ct = default);
+
+    // Task execution (AgentTaskResult is a placeholder name; envelope shape TBD)
+    Task<AgentTaskResult> ExecuteTaskAsync(
+        Guid agentId,
+        AgentTask task,
+        CancellationToken ct = default);
+}
+
+public sealed record CreateRegistrationTokenRequest(
+    string AgentName,
+    string DisplayName,
+    IReadOnlyList<AgentCapability> Capabilities,
+    TimeSpan? ValidFor = null
+);
+
+public sealed record AgentRegistrationRequest(
+    string Token,
+    string AgentVersion,
+    string Hostname,
+    IReadOnlyDictionary<string, string> Labels
+);
+```
+
+### Agent Model
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Agent.Models;
+
+public sealed record Agent
+{
+    public required Guid Id { get; init; }
+    public required Guid TenantId { get; init; }
+    public required string Name { get; init; }
+    public required string DisplayName { get; init; }
+    public required string Version { get; init; }
+    public string? Hostname { get; init; }
+    public required AgentStatus Status { get; init; }
+    public required ImmutableArray<AgentCapability> Capabilities { get; init; }
+    public ImmutableDictionary<string, string> Labels { get; init; } = ImmutableDictionary<string, string>.Empty;
+    public string? CertificateThumbprint { get; init; }
+    public DateTimeOffset? CertificateExpiresAt { get; init; }
+    public DateTimeOffset?
LastHeartbeatAt { get; init; } + public AgentResourceStatus? LastResourceStatus { get; init; } + public DateTimeOffset? RegisteredAt { get; init; } + public DateTimeOffset CreatedAt { get; init; } + public DateTimeOffset UpdatedAt { get; init; } +} + +public enum AgentStatus +{ + Pending, // Token created, not yet registered + Active, // Registered and healthy + Inactive, // Manually deactivated + Stale, // Missed heartbeats + Revoked // Permanently disabled +} + +public enum AgentCapability +{ + Docker, + Compose, + Ssh, + WinRm +} + +public sealed record AgentResourceStatus( + double CpuPercent, + long MemoryUsedBytes, + long MemoryTotalBytes, + long DiskUsedBytes, + long DiskTotalBytes +); +``` + +### RegistrationTokenService + +```csharp +namespace StellaOps.ReleaseOrchestrator.Agent.Registration; + +public sealed class RegistrationTokenService +{ + private readonly IAgentStore _store; + private readonly TimeProvider _timeProvider; + private readonly IGuidGenerator _guidGenerator; + + private static readonly TimeSpan DefaultTokenValidity = TimeSpan.FromHours(24); + + public async Task CreateAsync( + CreateRegistrationTokenRequest request, + CancellationToken ct = default) + { + // Validate agent name is unique + var existing = await _store.GetByNameAsync(request.AgentName, ct); + if (existing is not null) + { + throw new AgentAlreadyExistsException(request.AgentName); + } + + var token = GenerateSecureToken(); + var validity = request.ValidFor ?? 
DefaultTokenValidity;
+        var expiresAt = _timeProvider.GetUtcNow().Add(validity);
+
+        var registrationToken = new RegistrationToken
+        {
+            Id = _guidGenerator.NewGuid(),
+            Token = token,
+            AgentName = request.AgentName,
+            DisplayName = request.DisplayName,
+            Capabilities = request.Capabilities.ToImmutableArray(),
+            ExpiresAt = expiresAt,
+            CreatedAt = _timeProvider.GetUtcNow(),
+            IsUsed = false
+        };
+
+        await _store.SaveRegistrationTokenAsync(registrationToken, ct);
+
+        return registrationToken;
+    }
+
+    public async Task<RegistrationToken?> ValidateAndConsumeAsync(
+        string token,
+        CancellationToken ct = default)
+    {
+        var registrationToken = await _store.GetRegistrationTokenAsync(token, ct);
+
+        if (registrationToken is null)
+        {
+            return null;
+        }
+
+        if (registrationToken.IsUsed)
+        {
+            throw new RegistrationTokenAlreadyUsedException(token);
+        }
+
+        if (registrationToken.ExpiresAt < _timeProvider.GetUtcNow())
+        {
+            throw new RegistrationTokenExpiredException(token);
+        }
+
+        // Mark as used
+        await _store.MarkRegistrationTokenUsedAsync(registrationToken.Id, ct);
+
+        return registrationToken;
+    }
+
+    private static string GenerateSecureToken()
+    {
+        var bytes = RandomNumberGenerator.GetBytes(32);
+        return Convert.ToBase64String(bytes)
+            .Replace("+", "-")
+            .Replace("/", "_")
+            .TrimEnd('=');
+    }
+}
+
+public sealed record RegistrationToken
+{
+    public required Guid Id { get; init; }
+    public required string Token { get; init; }
+    public required string AgentName { get; init; }
+    public required string DisplayName { get; init; }
+    public required ImmutableArray<AgentCapability> Capabilities { get; init; }
+    public required DateTimeOffset ExpiresAt { get; init; }
+    public required DateTimeOffset CreatedAt { get; init; }
+    public required bool IsUsed { get; init; }
+}
+```
+
+### AgentCertificateService
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Agent.Certificate;
+
+public interface IAgentCertificateService
+{
+    Task<AgentCertificate> IssueAsync(Agent agent, CancellationToken ct = default);
+    Task<AgentCertificate>
RenewAsync(Agent agent, CancellationToken ct = default); + Task RevokeAsync(Agent agent, CancellationToken ct = default); + Task ValidateAsync(string thumbprint, CancellationToken ct = default); +} + +public sealed class AgentCertificateService : IAgentCertificateService +{ + private readonly ICertificateAuthority _ca; + private readonly IAgentStore _store; + private readonly TimeProvider _timeProvider; + + private static readonly TimeSpan CertificateValidity = TimeSpan.FromHours(24); + + public async Task IssueAsync( + Agent agent, + CancellationToken ct = default) + { + var now = _timeProvider.GetUtcNow(); + var notAfter = now.Add(CertificateValidity); + + var subject = new X500DistinguishedName( + $"CN={agent.Name}, O=StellaOps Agent, OU={agent.TenantId}"); + + var certificate = await _ca.IssueCertificateAsync( + subject: subject, + notBefore: now, + notAfter: notAfter, + keyUsage: X509KeyUsageFlags.DigitalSignature | X509KeyUsageFlags.KeyEncipherment, + extendedKeyUsage: [Oids.ClientAuthentication], + ct: ct); + + var agentCertificate = new AgentCertificate + { + Thumbprint = certificate.Thumbprint, + SubjectName = certificate.Subject, + NotBefore = now, + NotAfter = notAfter, + CertificatePem = certificate.ExportCertificatePem(), + PrivateKeyPem = certificate.GetRSAPrivateKey()!.ExportRSAPrivateKeyPem() + }; + + // Update agent with new certificate + await _store.UpdateCertificateAsync( + agent.Id, + agentCertificate.Thumbprint, + notAfter, + ct); + + return agentCertificate; + } + + public async Task RenewAsync( + Agent agent, + CancellationToken ct = default) + { + // Revoke old certificate if exists + if (!string.IsNullOrEmpty(agent.CertificateThumbprint)) + { + await _ca.RevokeCertificateAsync(agent.CertificateThumbprint, ct); + } + + // Issue new certificate + return await IssueAsync(agent, ct); + } + + public async Task RevokeAsync(Agent agent, CancellationToken ct = default) + { + if (!string.IsNullOrEmpty(agent.CertificateThumbprint)) + { + await 
_ca.RevokeCertificateAsync(agent.CertificateThumbprint, ct); + await _store.ClearCertificateAsync(agent.Id, ct); + } + } +} + +public sealed record AgentCertificate +{ + public required string Thumbprint { get; init; } + public required string SubjectName { get; init; } + public required DateTimeOffset NotBefore { get; init; } + public required DateTimeOffset NotAfter { get; init; } + public required string CertificatePem { get; init; } + public required string PrivateKeyPem { get; init; } +} +``` + +### HeartbeatProcessor + +```csharp +namespace StellaOps.ReleaseOrchestrator.Agent.Heartbeat; + +public interface IHeartbeatProcessor +{ + Task ProcessAsync(AgentHeartbeat heartbeat, CancellationToken ct = default); +} + +public sealed class HeartbeatProcessor : IHeartbeatProcessor +{ + private readonly IAgentStore _store; + private readonly TimeProvider _timeProvider; + private readonly ILogger _logger; + + public async Task ProcessAsync( + AgentHeartbeat heartbeat, + CancellationToken ct = default) + { + var agent = await _store.GetAsync(heartbeat.AgentId, ct); + if (agent is null) + { + _logger.LogWarning( + "Received heartbeat from unknown agent {AgentId}", + heartbeat.AgentId); + return; + } + + if (agent.Status == AgentStatus.Revoked) + { + _logger.LogWarning( + "Received heartbeat from revoked agent {AgentName}", + agent.Name); + return; + } + + // Update last heartbeat + await _store.UpdateHeartbeatAsync( + heartbeat.AgentId, + _timeProvider.GetUtcNow(), + heartbeat.ResourceStatus, + ct); + + // If agent was stale, reactivate it + if (agent.Status == AgentStatus.Stale) + { + await _store.UpdateStatusAsync( + heartbeat.AgentId, + AgentStatus.Active, + ct); + + _logger.LogInformation( + "Agent {AgentName} recovered from stale state", + agent.Name); + } + } +} + +public sealed record AgentHeartbeat( + Guid AgentId, + string Version, + AgentResourceStatus ResourceStatus, + IReadOnlyList RunningTasks, + DateTimeOffset Timestamp +); +``` + +### 
HeartbeatTimeoutMonitor + +```csharp +namespace StellaOps.ReleaseOrchestrator.Agent.Heartbeat; + +public sealed class HeartbeatTimeoutMonitor : IHostedService, IDisposable +{ + private readonly IAgentManager _agentManager; + private readonly ILogger _logger; + private readonly TimeSpan _checkInterval = TimeSpan.FromSeconds(30); + private readonly TimeSpan _heartbeatTimeout = TimeSpan.FromMinutes(2); + private Timer? _timer; + + public Task StartAsync(CancellationToken ct) + { + _timer = new Timer( + CheckForTimeouts, + null, + TimeSpan.FromMinutes(1), + _checkInterval); + + return Task.CompletedTask; + } + + public Task StopAsync(CancellationToken ct) + { + _timer?.Change(Timeout.Infinite, 0); + return Task.CompletedTask; + } + + private async void CheckForTimeouts(object? state) + { + try + { + var agents = await _agentManager.ListActiveAsync(); + var now = TimeProvider.System.GetUtcNow(); + + foreach (var agent in agents) + { + if (agent.LastHeartbeatAt is null) + continue; + + var timeSinceHeartbeat = now - agent.LastHeartbeatAt.Value; + + if (timeSinceHeartbeat > _heartbeatTimeout) + { + _logger.LogWarning( + "Agent {AgentName} missed heartbeat (last: {LastHeartbeat})", + agent.Name, + agent.LastHeartbeatAt); + + await _agentManager.MarkStaleAsync(agent.Id); + } + } + } + catch (Exception ex) + { + _logger.LogError(ex, "Heartbeat timeout check failed"); + } + } + + public void Dispose() => _timer?.Dispose(); +} +``` + +--- + +## Acceptance Criteria + +- [ ] Registration token created with expiry +- [ ] Token can only be used once +- [ ] Agent registered with certificate +- [ ] mTLS certificate issued correctly +- [ ] Certificate renewed before expiry +- [ ] Heartbeat updates agent status +- [ ] Stale agents detected after timeout +- [ ] Revoked agents cannot send heartbeats +- [ ] Agent capabilities stored correctly +- [ ] Unit test coverage ≥85% + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 103_002 Target 
Registry | Internal | TODO | +| 101_001 Database Schema | Internal | TODO | +| Authority | Internal | Exists | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| IAgentManager | TODO | | +| AgentManager | TODO | | +| RegistrationTokenService | TODO | | +| IAgentCertificateService | TODO | | +| AgentCertificateService | TODO | | +| HeartbeatProcessor | TODO | | +| HeartbeatTimeoutMonitor | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_103_004_ENVMGR_inventory_sync.md b/docs/implplan/SPRINT_20260110_103_004_ENVMGR_inventory_sync.md new file mode 100644 index 000000000..ae5d2b3e7 --- /dev/null +++ b/docs/implplan/SPRINT_20260110_103_004_ENVMGR_inventory_sync.md @@ -0,0 +1,385 @@ +# SPRINT: Inventory Sync + +> **Sprint ID:** 103_004 +> **Module:** ENVMGR +> **Phase:** 3 - Environment Manager +> **Status:** TODO +> **Parent:** [103_000_INDEX](SPRINT_20260110_103_000_INDEX_environment_manager.md) + +--- + +## Overview + +Implement Inventory Sync for capturing current container state from targets and detecting configuration drift. 
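+The collector, drift detector, and scheduler in this sprint compose around `IInventorySyncService`. As a hedged sketch (the `DriftAuditJob` wrapper is illustrative, and it assumes `DetectDriftAsync` returns the `DriftReport` record specified in this sprint), an on-demand audit could look like:
+
+```csharp
+// Illustrative only: sync one target, then surface any drift as warnings.
+public sealed class DriftAuditJob
+{
+    private readonly IInventorySyncService _sync;
+    private readonly ILogger<DriftAuditJob> _logger;
+
+    public DriftAuditJob(IInventorySyncService sync, ILogger<DriftAuditJob> logger)
+        => (_sync, _logger) = (sync, logger);
+
+    public async Task AuditAsync(Guid targetId, Guid releaseId, CancellationToken ct)
+    {
+        // Refresh the snapshot first, so drift is judged against current state.
+        await _sync.SyncTargetAsync(targetId, ct);
+
+        var report = await _sync.DetectDriftAsync(targetId, releaseId, ct);
+        foreach (var drift in report.Drifts)
+        {
+            _logger.LogWarning("Drift on target {TargetId}: {Type} - {Message}",
+                targetId, drift.Type, drift.Message);
+        }
+    }
+}
+```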
+
+### Objectives
+
+- Pull current container state from targets
+- Create inventory snapshots
+- Detect drift from expected/deployed state
+- Support scheduled and on-demand sync
+
+### Working Directory
+
+```
+src/ReleaseOrchestrator/
+├── __Libraries/
+│   └── StellaOps.ReleaseOrchestrator.Environment/
+│       └── Inventory/
+│           ├── IInventorySyncService.cs
+│           ├── InventorySyncService.cs
+│           ├── InventoryCollector.cs
+│           ├── DriftDetector.cs
+│           ├── InventorySnapshot.cs
+│           └── SyncScheduler.cs
+└── __Tests/
+```
+
+---
+
+## Deliverables
+
+### IInventorySyncService Interface
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Environment.Inventory;
+
+public interface IInventorySyncService
+{
+    Task<InventorySnapshot> SyncTargetAsync(Guid targetId, CancellationToken ct = default);
+    Task<IReadOnlyList<InventorySnapshot>> SyncEnvironmentAsync(Guid environmentId, CancellationToken ct = default);
+    Task<InventorySnapshot?> GetLatestSnapshotAsync(Guid targetId, CancellationToken ct = default);
+    Task<DriftReport> DetectDriftAsync(Guid targetId, CancellationToken ct = default);
+    Task<DriftReport> DetectDriftAsync(Guid targetId, Guid releaseId, CancellationToken ct = default);
+}
+```
+
+### InventorySnapshot Model
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Environment.Inventory;
+
+public sealed record InventorySnapshot
+{
+    public required Guid Id { get; init; }
+    public required Guid TargetId { get; init; }
+    public required DateTimeOffset CollectedAt { get; init; }
+    public required ImmutableArray<ContainerInfo> Containers { get; init; }
+    public required ImmutableArray<NetworkInfo> Networks { get; init; }
+    public required ImmutableArray<VolumeInfo> Volumes { get; init; }
+    public string? CollectionError { get; init; }
+}
+
+public sealed record ContainerInfo(
+    string Id,
+    string Name,
+    string Image,
+    string ImageDigest,
+    string Status,
+    ImmutableDictionary<string, string> Labels,
+    ImmutableArray<PortMapping> Ports,
+    DateTimeOffset CreatedAt,
+    DateTimeOffset?
StartedAt +); + +public sealed record NetworkInfo( + string Id, + string Name, + string Driver, + ImmutableArray ConnectedContainers +); + +public sealed record VolumeInfo( + string Name, + string Driver, + string Mountpoint +); + +public sealed record PortMapping( + int PrivatePort, + int? PublicPort, + string Type +); +``` + +### DriftDetector + +```csharp +namespace StellaOps.ReleaseOrchestrator.Environment.Inventory; + +public sealed class DriftDetector +{ + public DriftReport Detect( + InventorySnapshot currentState, + ExpectedState expectedState) + { + var drifts = new List(); + + // Check for missing containers + foreach (var expected in expectedState.Containers) + { + var actual = currentState.Containers + .FirstOrDefault(c => c.Name == expected.Name); + + if (actual is null) + { + drifts.Add(new DriftItem( + Type: DriftType.Missing, + Resource: "container", + Name: expected.Name, + Expected: expected.ImageDigest, + Actual: null, + Message: $"Container '{expected.Name}' not found" + )); + continue; + } + + // Check digest mismatch + if (actual.ImageDigest != expected.ImageDigest) + { + drifts.Add(new DriftItem( + Type: DriftType.DigestMismatch, + Resource: "container", + Name: expected.Name, + Expected: expected.ImageDigest, + Actual: actual.ImageDigest, + Message: $"Container '{expected.Name}' has different image digest" + )); + } + + // Check status + if (actual.Status != "running") + { + drifts.Add(new DriftItem( + Type: DriftType.StatusMismatch, + Resource: "container", + Name: expected.Name, + Expected: "running", + Actual: actual.Status, + Message: $"Container '{expected.Name}' is not running" + )); + } + } + + // Check for unexpected containers + var expectedNames = expectedState.Containers.Select(c => c.Name).ToHashSet(); + foreach (var actual in currentState.Containers) + { + if (!expectedNames.Contains(actual.Name) && + !IsSystemContainer(actual.Name)) + { + drifts.Add(new DriftItem( + Type: DriftType.Unexpected, + Resource: "container", + Name: 
actual.Name, + Expected: null, + Actual: actual.ImageDigest, + Message: $"Unexpected container '{actual.Name}' found" + )); + } + } + + return new DriftReport( + TargetId: currentState.TargetId, + DetectedAt: TimeProvider.System.GetUtcNow(), + HasDrift: drifts.Count > 0, + Drifts: drifts.ToImmutableArray() + ); + } + + private static bool IsSystemContainer(string name) => + name.StartsWith("stella-agent") || + name.StartsWith("k8s_") || + name.StartsWith("rancher-"); +} + +public sealed record DriftReport( + Guid TargetId, + DateTimeOffset DetectedAt, + bool HasDrift, + ImmutableArray Drifts +); + +public sealed record DriftItem( + DriftType Type, + string Resource, + string Name, + string? Expected, + string? Actual, + string Message +); + +public enum DriftType +{ + Missing, + Unexpected, + DigestMismatch, + StatusMismatch, + ConfigMismatch +} +``` + +### InventoryCollector + +```csharp +namespace StellaOps.ReleaseOrchestrator.Environment.Inventory; + +public sealed class InventoryCollector +{ + private readonly IAgentManager _agentManager; + private readonly ILogger _logger; + + public async Task CollectAsync( + Target target, + CancellationToken ct = default) + { + var sw = Stopwatch.StartNew(); + + try + { + if (target.AgentId is null) + { + throw new InvalidOperationException( + $"Target {target.Name} has no assigned agent"); + } + + var agent = await _agentManager.GetAsync(target.AgentId.Value, ct); + if (agent?.Status != AgentStatus.Active) + { + throw new InvalidOperationException( + $"Agent for target {target.Name} is not active"); + } + + // Dispatch inventory collection task to agent + var result = await _agentManager.ExecuteTaskAsync( + target.AgentId.Value, + new InventoryCollectionTask(target.Id, target.Type), + ct); + + return ParseInventoryResult(target.Id, result); + } + catch (Exception ex) + { + _logger.LogError(ex, + "Failed to collect inventory from target {TargetName}", + target.Name); + + return new InventorySnapshot + { + Id = 
Guid.NewGuid(), + TargetId = target.Id, + CollectedAt = TimeProvider.System.GetUtcNow(), + Containers = [], + Networks = [], + Volumes = [], + CollectionError = ex.Message + }; + } + } +} +``` + +### SyncScheduler + +```csharp +namespace StellaOps.ReleaseOrchestrator.Environment.Inventory; + +public sealed class SyncScheduler : IHostedService, IDisposable +{ + private readonly IInventorySyncService _syncService; + private readonly IEnvironmentService _environmentService; + private readonly ILogger _logger; + private readonly TimeSpan _syncInterval = TimeSpan.FromMinutes(5); + private Timer? _timer; + + public Task StartAsync(CancellationToken ct) + { + _timer = new Timer( + DoSync, + null, + TimeSpan.FromMinutes(2), + _syncInterval); + + return Task.CompletedTask; + } + + public Task StopAsync(CancellationToken ct) + { + _timer?.Change(Timeout.Infinite, 0); + return Task.CompletedTask; + } + + private async void DoSync(object? state) + { + try + { + var environments = await _environmentService.ListAsync(); + + foreach (var env in environments) + { + try + { + await _syncService.SyncEnvironmentAsync(env.Id); + } + catch (Exception ex) + { + _logger.LogError(ex, + "Failed to sync environment {EnvironmentName}", + env.Name); + } + } + } + catch (Exception ex) + { + _logger.LogError(ex, "Inventory sync scheduler failed"); + } + } + + public void Dispose() => _timer?.Dispose(); +} +``` + +--- + +## Acceptance Criteria + +- [ ] Collect inventory from Docker targets +- [ ] Collect inventory from Compose targets +- [ ] Store inventory snapshots +- [ ] Detect missing containers +- [ ] Detect digest mismatches +- [ ] Detect unexpected containers +- [ ] Generate drift report +- [ ] Scheduled sync runs periodically +- [ ] On-demand sync works +- [ ] Unit test coverage ≥85% + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 103_002 Target Registry | Internal | TODO | +| 103_003 Agent Manager | Internal | TODO | + +--- + +## Delivery 
Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| IInventorySyncService | TODO | | +| InventorySyncService | TODO | | +| InventoryCollector | TODO | | +| DriftDetector | TODO | | +| SyncScheduler | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_104_000_INDEX_release_manager.md b/docs/implplan/SPRINT_20260110_104_000_INDEX_release_manager.md new file mode 100644 index 000000000..fd0a97447 --- /dev/null +++ b/docs/implplan/SPRINT_20260110_104_000_INDEX_release_manager.md @@ -0,0 +1,200 @@ +# SPRINT INDEX: Phase 4 - Release Manager + +> **Epic:** Release Orchestrator +> **Phase:** 4 - Release Manager +> **Batch:** 104 +> **Status:** TODO +> **Parent:** [100_000_INDEX](SPRINT_20260110_100_000_INDEX_release_orchestrator.md) + +--- + +## Overview + +Phase 4 implements the Release Manager - handling components (container images), versions, release bundles, and the release catalog. 
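+As a hedged sketch of the intended flow (only the interface and request-record names come from this phase's Key Interfaces; the constructor shapes and the `Task<Release>` return types are assumptions, since they are finalized in the individual sprints), promoting tag-addressed builds into a digest-locked release could look like:
+
+```csharp
+// Illustrative only: resolve mutable tags to immutable digests, bundle, finalize.
+public static async Task<Release> PrepareReleaseAsync(
+    IVersionManager versions,
+    IReleaseManager releases,
+    Guid apiComponentId,
+    Guid workerComponentId,
+    CancellationToken ct)
+{
+    // Tag → digest resolution; the digest is the source of truth thereafter.
+    var api = await versions.ResolveAsync(apiComponentId, "v2.3.1", ct);
+    var worker = await versions.ResolveAsync(workerComponentId, "v2.3.1", ct);
+
+    var release = await releases.CreateAsync(new CreateReleaseRequest("myapp-v2.3.1"), ct);
+    await releases.AddComponentAsync(release.Id,
+        new AddReleaseComponentRequest(apiComponentId, api.Digest), ct);
+    await releases.AddComponentAsync(release.Id,
+        new AddReleaseComponentRequest(workerComponentId, worker.Digest), ct);
+
+    // Finalization locks component digests and moves the release from draft to ready.
+    return await releases.FinalizeAsync(release.Id, ct);
+}
+```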
+ +### Objectives + +- Component registry for tracking container images +- Version management with digest-first identity +- Release bundle creation (multiple components) +- Release catalog with status lifecycle + +--- + +## Sprint Structure + +| Sprint ID | Title | Module | Status | Dependencies | +|-----------|-------|--------|--------|--------------| +| 104_001 | Component Registry | RELMAN | TODO | 102_004 | +| 104_002 | Version Manager | RELMAN | TODO | 104_001 | +| 104_003 | Release Manager | RELMAN | TODO | 104_002 | +| 104_004 | Release Catalog | RELMAN | TODO | 104_003 | + +--- + +## Architecture + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ RELEASE MANAGER │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ COMPONENT REGISTRY (104_001) │ │ +│ │ │ │ +│ │ Component ──────────────────────────────────────────────┐ │ │ +│ │ │ id: UUID │ │ │ +│ │ │ name: "api" │ │ │ +│ │ │ registry: acr.example.io │ │ │ +│ │ │ repository: myorg/api │ │ │ +│ │ └───────────────────────────────────────────────────────┘ │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ VERSION MANAGER (104_002) │ │ +│ │ │ │ +│ │ ComponentVersion ───────────────────────────────────────┐ │ │ +│ │ │ digest: sha256:abc123... 
│ │ │ +│ │ │ semver: 2.3.1 │ ◄── SOURCE │ │ +│ │ │ tag: v2.3.1 │ OF TRUTH│ │ +│ │ │ discovered_at: 2026-01-10T10:00:00Z │ │ │ +│ │ └───────────────────────────────────────────────────────┘ │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ RELEASE MANAGER (104_003) │ │ +│ │ │ │ +│ │ Release Bundle ─────────────────────────────────────────┐ │ │ +│ │ │ name: "myapp-v2.3.1" │ │ │ +│ │ │ status: ready │ │ │ +│ │ │ components: [ │ │ │ +│ │ │ { component: api, version: sha256:abc... } │ │ │ +│ │ │ { component: worker, version: sha256:def... } │ │ │ +│ │ │ ] │ │ │ +│ │ └───────────────────────────────────────────────────────┘ │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ RELEASE CATALOG (104_004) │ │ +│ │ │ │ +│ │ Status Lifecycle: │ │ +│ │ ┌──────┐ finalize ┌───────┐ promote ┌──────────┐ │ │ +│ │ │draft │──────────►│ ready │─────────►│promoting │ │ │ +│ │ └──────┘ └───────┘ └────┬─────┘ │ │ +│ │ │ │ │ +│ │ ┌──────────┴──────────┐ │ │ +│ │ ▼ ▼ │ │ +│ │ ┌──────────┐ ┌────────────┐ │ │ +│ │ │ deployed │ │ deprecated │ │ │ +│ │ └──────────┘ └────────────┘ │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +--- + +## Deliverables Summary + +### 104_001: Component Registry + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `IComponentRegistry` | Interface | Component CRUD | +| `ComponentRegistry` | Class | Implementation | +| `Component` | Model | Container component entity | +| `ComponentDiscovery` | Class | Auto-discover from registry | + +### 104_002: Version Manager + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `IVersionManager` | Interface | 
Version tracking | +| `VersionManager` | Class | Implementation | +| `ComponentVersion` | Model | Digest-first version | +| `VersionResolver` | Class | Tag → Digest resolution | +| `VersionWatcher` | Service | Watch for new versions | + +### 104_003: Release Manager + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `IReleaseManager` | Interface | Release operations | +| `ReleaseManager` | Class | Implementation | +| `Release` | Model | Release bundle entity | +| `ReleaseComponent` | Model | Release-component mapping | +| `ReleaseFinalizer` | Class | Finalize and lock release | + +### 104_004: Release Catalog + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `IReleaseCatalog` | Interface | Catalog queries | +| `ReleaseCatalog` | Class | Implementation | +| `ReleaseStatusMachine` | Class | Status transitions | +| `ReleaseHistory` | Service | Track release history | + +--- + +## Key Interfaces + +```csharp +public interface IComponentRegistry +{ + Task CreateAsync(CreateComponentRequest request, CancellationToken ct); + Task GetAsync(Guid id, CancellationToken ct); + Task> ListAsync(CancellationToken ct); + Task> GetVersionsAsync(Guid componentId, CancellationToken ct); +} + +public interface IVersionManager +{ + Task ResolveAsync(Guid componentId, string tagOrDigest, CancellationToken ct); + Task GetByDigestAsync(Guid componentId, string digest, CancellationToken ct); + Task> ListLatestAsync(Guid componentId, int count, CancellationToken ct); +} + +public interface IReleaseManager +{ + Task CreateAsync(CreateReleaseRequest request, CancellationToken ct); + Task AddComponentAsync(Guid releaseId, AddReleaseComponentRequest request, CancellationToken ct); + Task FinalizeAsync(Guid releaseId, CancellationToken ct); + Task GetAsync(Guid releaseId, CancellationToken ct); +} + +public interface IReleaseCatalog +{ + Task> ListAsync(ReleaseFilter? 
filter, CancellationToken ct); + Task GetLatestDeployedAsync(Guid environmentId, CancellationToken ct); + Task GetHistoryAsync(Guid releaseId, CancellationToken ct); +} +``` + +--- + +## Dependencies + +| Module | Purpose | +|--------|---------| +| 102_004 Registry Connectors | Tag resolution | +| 101_001 Database Schema | Tables | + +--- + +## Acceptance Criteria + +- [ ] Component registration works +- [ ] Tag resolves to immutable digest +- [ ] Release bundles multiple components +- [ ] Release finalization locks versions +- [ ] Status transitions validated +- [ ] Release history tracked +- [ ] Deprecation prevents promotion +- [ ] Unit test coverage ≥80% + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Phase 4 index created | diff --git a/docs/implplan/SPRINT_20260110_104_001_RELMAN_component_registry.md b/docs/implplan/SPRINT_20260110_104_001_RELMAN_component_registry.md new file mode 100644 index 000000000..dd9b8cb7b --- /dev/null +++ b/docs/implplan/SPRINT_20260110_104_001_RELMAN_component_registry.md @@ -0,0 +1,522 @@ +# SPRINT: Component Registry + +> **Sprint ID:** 104_001 +> **Module:** RELMAN +> **Phase:** 4 - Release Manager +> **Status:** TODO +> **Parent:** [104_000_INDEX](SPRINT_20260110_104_000_INDEX_release_manager.md) + +--- + +## Overview + +Implement the Component Registry for tracking container images as deployable components. 
+ +### Objectives + +- Register container components with registry/repository metadata +- Discover components from connected registries +- Track component configurations and labels +- Support component lifecycle (active/deprecated) + +### Working Directory + +``` +src/ReleaseOrchestrator/ +├── __Libraries/ +│ └── StellaOps.ReleaseOrchestrator.Release/ +│ ├── Component/ +│ │ ├── IComponentRegistry.cs +│ │ ├── ComponentRegistry.cs +│ │ ├── ComponentValidator.cs +│ │ └── ComponentDiscovery.cs +│ ├── Store/ +│ │ ├── IComponentStore.cs +│ │ └── ComponentStore.cs +│ └── Models/ +│ ├── Component.cs +│ ├── ComponentConfig.cs +│ └── ComponentStatus.cs +└── __Tests/ + └── StellaOps.ReleaseOrchestrator.Release.Tests/ + └── Component/ +``` + +--- + +## Architecture Reference + +- [Release Manager](../modules/release-orchestrator/modules/release-manager.md) + [Data Model Schema](../modules/release-orchestrator/data-model/schema.md) + +--- + +## Deliverables + +### IComponentRegistry Interface + +```csharp +namespace StellaOps.ReleaseOrchestrator.Release.Component; + +public interface IComponentRegistry +{ + Task<Component> RegisterAsync(RegisterComponentRequest request, CancellationToken ct = default); + Task<Component> UpdateAsync(Guid id, UpdateComponentRequest request, CancellationToken ct = default); + Task<Component?> GetAsync(Guid id, CancellationToken ct = default); + Task<Component?> GetByNameAsync(string name, CancellationToken ct = default); + Task<IReadOnlyList<Component>> ListAsync(ComponentFilter? filter = null, CancellationToken ct = default); + Task<IReadOnlyList<Component>> ListActiveAsync(CancellationToken ct = default); + Task DeprecateAsync(Guid id, string reason, CancellationToken ct = default); + Task ReactivateAsync(Guid id, CancellationToken ct = default); + Task DeleteAsync(Guid id, CancellationToken ct = default); +} + +public sealed record RegisterComponentRequest( + string Name, + string DisplayName, + string RegistryUrl, + string Repository, + string? Description = null, + IReadOnlyDictionary<string, string>? Labels = null, + ComponentConfig?
Config = null ); + +public sealed record UpdateComponentRequest( + string? DisplayName = null, + string? Description = null, + IReadOnlyDictionary<string, string>? Labels = null, + ComponentConfig? Config = null +); + +public sealed record ComponentFilter( + string? NameContains = null, + string? RegistryUrl = null, + ComponentStatus? Status = null, + IReadOnlyDictionary<string, string>? Labels = null +); +``` + +### Component Model + +```csharp +namespace StellaOps.ReleaseOrchestrator.Release.Models; + +public sealed record Component +{ + public required Guid Id { get; init; } + public required Guid TenantId { get; init; } + public required string Name { get; init; } + public required string DisplayName { get; init; } + public string? Description { get; init; } + public required string RegistryUrl { get; init; } + public required string Repository { get; init; } + public required ComponentStatus Status { get; init; } + public string? DeprecationReason { get; init; } + public ImmutableDictionary<string, string> Labels { get; init; } = ImmutableDictionary<string, string>.Empty; + public ComponentConfig? Config { get; init; } + public DateTimeOffset CreatedAt { get; init; } + public DateTimeOffset UpdatedAt { get; init; } + public Guid CreatedBy { get; init; } + + public string FullImageRef => $"{RegistryUrl}/{Repository}"; +} + +public enum ComponentStatus +{ + Active, + Deprecated +} + +public sealed record ComponentConfig +{ + public string? DefaultTag { get; init; } + public bool WatchForNewVersions { get; init; } = true; + public string? TagPattern { get; init; } // Regex for valid tags + public int?
RetainVersionCount { get; init; } + public ImmutableArray<string> RequiredLabels { get; init; } = []; +} +``` + +### ComponentRegistry Implementation + +```csharp +namespace StellaOps.ReleaseOrchestrator.Release.Component; + +public sealed class ComponentRegistry : IComponentRegistry +{ + private readonly IComponentStore _store; + private readonly IComponentValidator _validator; + private readonly IRegistryConnectorFactory _registryFactory; + private readonly IEventPublisher _eventPublisher; + private readonly TimeProvider _timeProvider; + private readonly IGuidGenerator _guidGenerator; + private readonly ITenantContext _tenantContext; // assumed ambient tenant accessor + private readonly IUserContext _userContext; // assumed ambient user accessor + private readonly ILogger<ComponentRegistry> _logger; + + public async Task<Component> RegisterAsync( + RegisterComponentRequest request, + CancellationToken ct = default) + { + // Validate request + var validation = await _validator.ValidateRegisterAsync(request, ct); + if (!validation.IsValid) + { + throw new ComponentValidationException(validation.Errors); + } + + // Verify registry connectivity + var connector = await _registryFactory.GetConnectorAsync(request.RegistryUrl, ct); + var exists = await connector.RepositoryExistsAsync(request.Repository, ct); + if (!exists) + { + throw new RepositoryNotFoundException(request.RegistryUrl, request.Repository); + } + + var component = new Component + { + Id = _guidGenerator.NewGuid(), + TenantId = _tenantContext.TenantId, + Name = request.Name, + DisplayName = request.DisplayName, + Description = request.Description, + RegistryUrl = request.RegistryUrl, + Repository = request.Repository, + Status = ComponentStatus.Active, + Labels = request.Labels?.ToImmutableDictionary() ??
ImmutableDictionary<string, string>.Empty, + Config = request.Config, + CreatedAt = _timeProvider.GetUtcNow(), + UpdatedAt = _timeProvider.GetUtcNow(), + CreatedBy = _userContext.UserId + }; + + await _store.SaveAsync(component, ct); + + await _eventPublisher.PublishAsync(new ComponentRegistered( + component.Id, + component.TenantId, + component.Name, + component.FullImageRef, + _timeProvider.GetUtcNow() + ), ct); + + _logger.LogInformation( + "Registered component {ComponentName} ({FullRef})", + component.Name, + component.FullImageRef); + + return component; + } + + public async Task DeprecateAsync( + Guid id, + string reason, + CancellationToken ct = default) + { + var component = await _store.GetAsync(id, ct) + ?? throw new ComponentNotFoundException(id); + + if (component.Status == ComponentStatus.Deprecated) + { + return; + } + + var updated = component with + { + Status = ComponentStatus.Deprecated, + DeprecationReason = reason, + UpdatedAt = _timeProvider.GetUtcNow() + }; + + await _store.SaveAsync(updated, ct); + + await _eventPublisher.PublishAsync(new ComponentDeprecated( + component.Id, + component.TenantId, + component.Name, + reason, + _timeProvider.GetUtcNow() + ), ct); + } + + public async Task DeleteAsync(Guid id, CancellationToken ct = default) + { + var component = await _store.GetAsync(id, ct) + ??
throw new ComponentNotFoundException(id); + + // Check for existing releases using this component + var hasReleases = await _store.HasReleasesAsync(id, ct); + if (hasReleases) + { + throw new ComponentInUseException(id, + "Cannot delete component with existing releases"); + } + + await _store.DeleteAsync(id, ct); + + await _eventPublisher.PublishAsync(new ComponentDeleted( + component.Id, + component.TenantId, + component.Name, + _timeProvider.GetUtcNow() + ), ct); + } +} +``` + +### ComponentDiscovery + +```csharp +namespace StellaOps.ReleaseOrchestrator.Release.Component; + +public sealed class ComponentDiscovery +{ + private readonly IRegistryConnectorFactory _registryFactory; + private readonly IComponentRegistry _registry; + private readonly ILogger<ComponentDiscovery> _logger; + + public async Task<IReadOnlyList<DiscoveredComponent>> DiscoverAsync( + string registryUrl, + string repositoryPattern, + CancellationToken ct = default) + { + var connector = await _registryFactory.GetConnectorAsync(registryUrl, ct); + var repositories = await connector.ListRepositoriesAsync(repositoryPattern, ct); + + var discovered = new List<DiscoveredComponent>(); + + foreach (var repo in repositories) + { + var existing = await _registry.GetByNameAsync( + NormalizeComponentName(repo), ct); + + discovered.Add(new DiscoveredComponent( + RegistryUrl: registryUrl, + Repository: repo, + SuggestedName: NormalizeComponentName(repo), + AlreadyRegistered: existing is not null, + ExistingComponentId: existing?.Id + )); + } + + return discovered.AsReadOnly(); + } + + public async Task<IReadOnlyList<Component>> ImportDiscoveredAsync( + IReadOnlyList<DiscoveredComponent> components, + CancellationToken ct = default) + { + var imported = new List<Component>(); + + foreach (var discovered in components.Where(c => !c.AlreadyRegistered)) + { + try + { + var component = await _registry.RegisterAsync( + new RegisterComponentRequest( + Name: discovered.SuggestedName, + DisplayName: FormatDisplayName(discovered.SuggestedName), + RegistryUrl: discovered.RegistryUrl, + Repository: discovered.Repository + ), ct); +
imported.Add(component); + } + catch (Exception ex) + { + _logger.LogWarning(ex, + "Failed to import discovered component {Repository}", + discovered.Repository); + } + } + + return imported.AsReadOnly(); + } + + private static string NormalizeComponentName(string repository) => + repository.Replace("/", "-").ToLowerInvariant(); + + private static string FormatDisplayName(string name) => + CultureInfo.InvariantCulture.TextInfo.ToTitleCase( + name.Replace("-", " ").Replace("_", " ")); +} + +public sealed record DiscoveredComponent( + string RegistryUrl, + string Repository, + string SuggestedName, + bool AlreadyRegistered, + Guid? ExistingComponentId +); +``` + +### ComponentValidator + +```csharp +namespace StellaOps.ReleaseOrchestrator.Release.Component; + +public sealed class ComponentValidator : IComponentValidator +{ + private readonly IComponentStore _store; + + public async Task<ValidationResult> ValidateRegisterAsync( + RegisterComponentRequest request, + CancellationToken ct = default) + { + var errors = new List<string>(); + + // Name validation + if (!IsValidComponentName(request.Name)) + { + errors.Add("Component name must be lowercase alphanumeric with hyphens, 2-64 characters"); + } + + // Check for duplicate name + var existing = await _store.GetByNameAsync(request.Name, ct); + if (existing is not null) + { + errors.Add($"Component with name '{request.Name}' already exists"); + } + + // Check for duplicate registry/repository combination + var duplicate = await _store.GetByRegistryAndRepositoryAsync( + request.RegistryUrl, + request.Repository, + ct); + if (duplicate is not null) + { + errors.Add($"Component already registered for {request.RegistryUrl}/{request.Repository}"); + } + + // Validate registry URL format + if (!Uri.TryCreate($"https://{request.RegistryUrl}", UriKind.Absolute, out _)) + { + errors.Add("Invalid registry URL format"); + } + + // Validate tag pattern if specified + if (request.Config?.TagPattern is not null) + { + try + { + _ = new
Regex(request.Config.TagPattern); + } + catch (RegexParseException) + { + errors.Add("Invalid tag pattern regex"); + } + } + + return errors.Count == 0 + ? ValidationResult.Success() + : ValidationResult.Failure(errors); + } + + private static bool IsValidComponentName(string name) => + Regex.IsMatch(name, @"^[a-z][a-z0-9-]{1,63}$"); +} +``` + +### Domain Events + +```csharp +namespace StellaOps.ReleaseOrchestrator.Release.Events; + +public sealed record ComponentRegistered( + Guid ComponentId, + Guid TenantId, + string Name, + string FullImageRef, + DateTimeOffset RegisteredAt +) : IDomainEvent; + +public sealed record ComponentUpdated( + Guid ComponentId, + Guid TenantId, + IReadOnlyList<string> ChangedFields, + DateTimeOffset UpdatedAt +) : IDomainEvent; + +public sealed record ComponentDeprecated( + Guid ComponentId, + Guid TenantId, + string Name, + string Reason, + DateTimeOffset DeprecatedAt +) : IDomainEvent; + +public sealed record ComponentDeleted( + Guid ComponentId, + Guid TenantId, + string Name, + DateTimeOffset DeletedAt +) : IDomainEvent; +``` + +--- + +## Acceptance Criteria + +- [ ] Register component with registry/repository +- [ ] Validate registry connectivity on register +- [ ] Check for duplicate components +- [ ] List components with filters +- [ ] Deprecate component with reason +- [ ] Reactivate deprecated component +- [ ] Delete component (only if no releases) +- [ ] Discover components from registry +- [ ] Import discovered components +- [ ] Unit test coverage >=85% + +--- + +## Test Plan + +### Unit Tests + +| Test | Description | +|------|-------------| +| `RegisterComponent_ValidRequest_Succeeds` | Registration works | +| `RegisterComponent_DuplicateName_Fails` | Duplicate name rejected | +| `RegisterComponent_InvalidRegistry_Fails` | Bad registry rejected | +| `DeprecateComponent_Active_Succeeds` | Deprecation works | +| `DeleteComponent_WithReleases_Fails` | In-use check works | +| `DiscoverComponents_ReturnsRepositories` | Discovery works
| +| `ImportDiscovered_CreatesComponents` | Import works | + +### Integration Tests + +| Test | Description | +|------|-------------| +| `ComponentLifecycle_E2E` | Full CRUD cycle | +| `RegistryDiscovery_E2E` | Discovery from real registry | + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 102_004 Registry Connectors | Internal | TODO | +| 101_001 Database Schema | Internal | TODO | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| IComponentRegistry | TODO | | +| ComponentRegistry | TODO | | +| ComponentValidator | TODO | | +| ComponentDiscovery | TODO | | +| IComponentStore | TODO | | +| ComponentStore | TODO | | +| Domain events | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_104_002_RELMAN_version_manager.md b/docs/implplan/SPRINT_20260110_104_002_RELMAN_version_manager.md new file mode 100644 index 000000000..c2987e259 --- /dev/null +++ b/docs/implplan/SPRINT_20260110_104_002_RELMAN_version_manager.md @@ -0,0 +1,527 @@ +# SPRINT: Version Manager + +> **Sprint ID:** 104_002 +> **Module:** RELMAN +> **Phase:** 4 - Release Manager +> **Status:** TODO +> **Parent:** [104_000_INDEX](SPRINT_20260110_104_000_INDEX_release_manager.md) + +--- + +## Overview + +Implement the Version Manager for digest-first version tracking of container images. 
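As a minimal sketch of the digest-vs-tag distinction this sprint is built on (the `VersionRefs` helper below is hypothetical, not a sprint deliverable — it only illustrates the "sha256:" + 64-hex-chars format and the 12-character short form used in the logs):

```csharp
// Hypothetical helper illustrating the version-reference rules assumed here.
// A digest ("sha256:" + 64 hex chars) is immutable; anything else is a mutable
// tag that must be resolved against the registry before it enters a release.
using System.Text.RegularExpressions;

public static class VersionRefs
{
    private static readonly Regex DigestPattern =
        new("^sha256:[0-9a-f]{64}$", RegexOptions.Compiled);

    public static bool IsDigest(string reference) => DigestPattern.IsMatch(reference);

    // Shorten a digest for logs: the first 12 hex chars after the "sha256:" prefix.
    public static string Short(string digest) =>
        IsDigest(digest) ? digest[7..19] : digest;
}
```

This mirrors the `IsDigest` check and `ShortDigest` property specified below.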
+ +### Objectives + +- Resolve tags to immutable digests +- Track component versions with metadata +- Watch for new versions from registries +- Support semantic versioning extraction + +### Working Directory + +``` +src/ReleaseOrchestrator/ +├── __Libraries/ +│ └── StellaOps.ReleaseOrchestrator.Release/ +│ ├── Version/ +│ │ ├── IVersionManager.cs +│ │ ├── VersionManager.cs +│ │ ├── VersionResolver.cs +│ │ ├── VersionWatcher.cs +│ │ └── SemVerExtractor.cs +│ ├── Store/ +│ │ ├── IVersionStore.cs +│ │ └── VersionStore.cs +│ └── Models/ +│ ├── ComponentVersion.cs +│ └── VersionMetadata.cs +└── __Tests/ + └── StellaOps.ReleaseOrchestrator.Release.Tests/ + └── Version/ +``` + +--- + +## Architecture Reference + +- [Release Manager](../modules/release-orchestrator/modules/release-manager.md) +- [Data Model Schema](../modules/release-orchestrator/data-model/schema.md) + +--- + +## Deliverables + +### IVersionManager Interface + +```csharp +namespace StellaOps.ReleaseOrchestrator.Release.Version; + +public interface IVersionManager +{ + Task<ComponentVersion> ResolveAsync( + Guid componentId, + string tagOrDigest, + CancellationToken ct = default); + + Task<ComponentVersion?> GetByDigestAsync( + Guid componentId, + string digest, + CancellationToken ct = default); + + Task<ComponentVersion?> GetLatestAsync( + Guid componentId, + CancellationToken ct = default); + + Task<IReadOnlyList<ComponentVersion>> ListAsync( + Guid componentId, + VersionFilter? filter = null, + CancellationToken ct = default); + + Task<IReadOnlyList<ComponentVersion>> ListLatestAsync( + Guid componentId, + int count = 10, + CancellationToken ct = default); + + Task<ComponentVersion> RecordVersionAsync( + RecordVersionRequest request, + CancellationToken ct = default); + + Task<bool> DigestExistsAsync( + Guid componentId, + string digest, + CancellationToken ct = default); +} + +public sealed record VersionFilter( + string? DigestPrefix = null, + string? TagContains = null, + SemanticVersion? MinVersion = null, + SemanticVersion? MaxVersion = null, + DateTimeOffset? DiscoveredAfter = null, + DateTimeOffset?
DiscoveredBefore = null ); + +public sealed record RecordVersionRequest( + Guid ComponentId, + string Digest, + string? Tag = null, + SemanticVersion? SemVer = null, + VersionMetadata? Metadata = null +); +``` + +### ComponentVersion Model + +```csharp +namespace StellaOps.ReleaseOrchestrator.Release.Models; + +public sealed record ComponentVersion +{ + public required Guid Id { get; init; } + public required Guid ComponentId { get; init; } + public required string Digest { get; init; } // sha256:abc123... + public string? Tag { get; init; } // v2.3.1, latest, etc. + public SemanticVersion? SemVer { get; init; } // Parsed semantic version + public VersionMetadata Metadata { get; init; } = new(); + public required DateTimeOffset DiscoveredAt { get; init; } + public DateTimeOffset? BuiltAt { get; init; } + public Guid? DiscoveredBy { get; init; } // User or system + + public string ShortDigest => Digest.Length > 19 + ? Digest[7..19] // sha256: prefix + 12 chars + : Digest; +} + +public sealed record SemanticVersion( + int Major, + int Minor, + int Patch, + string? Prerelease = null, + string? BuildMetadata = null +) : IComparable<SemanticVersion> +{ + public override string ToString() + { + var version = $"{Major}.{Minor}.{Patch}"; + if (Prerelease is not null) + version += $"-{Prerelease}"; + if (BuildMetadata is not null) + version += $"+{BuildMetadata}"; + return version; + } + + public int CompareTo(SemanticVersion?
other) + { + if (other is null) return 1; + + var majorCmp = Major.CompareTo(other.Major); + if (majorCmp != 0) return majorCmp; + + var minorCmp = Minor.CompareTo(other.Minor); + if (minorCmp != 0) return minorCmp; + + var patchCmp = Patch.CompareTo(other.Patch); + if (patchCmp != 0) return patchCmp; + + // Prerelease versions have lower precedence + if (Prerelease is null && other.Prerelease is not null) return 1; + if (Prerelease is not null && other.Prerelease is null) return -1; + + // Simplified ordering; full SemVer 2.0.0 precedence compares dot-separated + // identifiers individually, with numeric identifiers compared numerically. + return string.Compare(Prerelease, other.Prerelease, + StringComparison.OrdinalIgnoreCase); + } +} + +public sealed record VersionMetadata +{ + public long? SizeBytes { get; init; } + public string? Architecture { get; init; } + public string? Os { get; init; } + public string? Author { get; init; } + public DateTimeOffset? CreatedAt { get; init; } + public ImmutableDictionary<string, string> Labels { get; init; } = + ImmutableDictionary<string, string>.Empty; + public ImmutableArray<LayerInfo> Layers { get; init; } = []; +} +``` + +### VersionResolver + +```csharp +namespace StellaOps.ReleaseOrchestrator.Release.Version; + +public sealed class VersionResolver +{ + private readonly IRegistryConnectorFactory _registryFactory; + private readonly IComponentStore _componentStore; + private readonly ILogger<VersionResolver> _logger; + + public async Task<ResolvedVersion> ResolveAsync( + Guid componentId, + string tagOrDigest, + CancellationToken ct = default) + { + var component = await _componentStore.GetAsync(componentId, ct) + ??
throw new ComponentNotFoundException(componentId); + + var connector = await _registryFactory.GetConnectorAsync( + component.RegistryUrl, ct); + + // Check if already a digest + if (IsDigest(tagOrDigest)) + { + var manifest = await connector.GetManifestAsync( + component.Repository, + tagOrDigest, + ct); + + return new ResolvedVersion( + Digest: tagOrDigest, + Tag: null, + Manifest: manifest, + ResolvedAt: TimeProvider.System.GetUtcNow() + ); + } + + // Resolve tag to digest + var tag = tagOrDigest; + var resolvedDigest = await connector.ResolveTagAsync( + component.Repository, + tag, + ct); + + var manifestData = await connector.GetManifestAsync( + component.Repository, + resolvedDigest, + ct); + + _logger.LogDebug( + "Resolved {Component}:{Tag} to {Digest}", + component.Name, + tag, + resolvedDigest); + + return new ResolvedVersion( + Digest: resolvedDigest, + Tag: tag, + Manifest: manifestData, + ResolvedAt: TimeProvider.System.GetUtcNow() + ); + } + + private static bool IsDigest(string value) => + value.StartsWith("sha256:", StringComparison.OrdinalIgnoreCase) && + value.Length == 71; // sha256: + 64 hex chars +} + +public sealed record ResolvedVersion( + string Digest, + string? Tag, + ManifestData Manifest, + DateTimeOffset ResolvedAt +); + +public sealed record ManifestData( + string MediaType, + long TotalSize, + string? Architecture, + string? Os, + IReadOnlyList<LayerInfo> Layers, + IReadOnlyDictionary<string, string> Labels, + DateTimeOffset?
CreatedAt +); + +public sealed record LayerInfo( + string Digest, + long Size, + string MediaType +); +``` + +### VersionWatcher + +```csharp +namespace StellaOps.ReleaseOrchestrator.Release.Version; + +public sealed class VersionWatcher : IHostedService, IDisposable +{ + private readonly IComponentRegistry _componentRegistry; + private readonly IVersionManager _versionManager; + private readonly IRegistryConnectorFactory _registryFactory; + private readonly IEventPublisher _eventPublisher; + private readonly ILogger _logger; + private readonly TimeSpan _pollInterval = TimeSpan.FromMinutes(5); + private Timer? _timer; + + public Task StartAsync(CancellationToken ct) + { + _timer = new Timer( + PollForNewVersions, + null, + TimeSpan.FromMinutes(1), + _pollInterval); + + return Task.CompletedTask; + } + + public Task StopAsync(CancellationToken ct) + { + _timer?.Change(Timeout.Infinite, 0); + return Task.CompletedTask; + } + + private async void PollForNewVersions(object? state) + { + try + { + var components = await _componentRegistry.ListActiveAsync(); + + foreach (var component in components) + { + if (component.Config?.WatchForNewVersions != true) + continue; + + await CheckForNewVersionsAsync(component); + } + } + catch (Exception ex) + { + _logger.LogError(ex, "Version watch poll failed"); + } + } + + private async Task CheckForNewVersionsAsync(Component component) + { + try + { + var connector = await _registryFactory.GetConnectorAsync( + component.RegistryUrl); + + var tags = await connector.ListTagsAsync( + component.Repository, + component.Config?.TagPattern); + + foreach (var tag in tags) + { + var digest = await connector.ResolveTagAsync( + component.Repository, + tag); + + var exists = await _versionManager.DigestExistsAsync( + component.Id, + digest); + + if (!exists) + { + var version = await _versionManager.RecordVersionAsync( + new RecordVersionRequest( + ComponentId: component.Id, + Digest: digest, + Tag: tag, + SemVer: SemVerExtractor.TryParse(tag) 
+ )); + + await _eventPublisher.PublishAsync(new NewVersionDiscovered( + component.Id, + component.TenantId, + component.Name, + version.Digest, + tag, + TimeProvider.System.GetUtcNow() + )); + + _logger.LogInformation( + "Discovered new version for {Component}: {Tag} ({Digest})", + component.Name, + tag, + version.ShortDigest); + } + } + } + catch (Exception ex) + { + _logger.LogWarning(ex, + "Failed to check versions for component {Component}", + component.Name); + } + } + + public void Dispose() => _timer?.Dispose(); +} +``` + +### SemVerExtractor + +```csharp +namespace StellaOps.ReleaseOrchestrator.Release.Version; + +public static class SemVerExtractor +{ + private static readonly Regex SemVerPattern = new( + @"^v?(?<major>\d+)\.(?<minor>\d+)\.(?<patch>\d+)" + + @"(?:-(?<prerelease>[0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*))?" + + @"(?:\+(?<build>[0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*))?$", + RegexOptions.Compiled | RegexOptions.CultureInvariant); + + public static SemanticVersion? TryParse(string? tag) + { + if (string.IsNullOrEmpty(tag)) + return null; + + var match = SemVerPattern.Match(tag); + if (!match.Success) + return null; + + return new SemanticVersion( + Major: int.Parse(match.Groups["major"].Value, CultureInfo.InvariantCulture), + Minor: int.Parse(match.Groups["minor"].Value, CultureInfo.InvariantCulture), + Patch: int.Parse(match.Groups["patch"].Value, CultureInfo.InvariantCulture), + Prerelease: match.Groups["prerelease"].Success + ? match.Groups["prerelease"].Value + : null, + BuildMetadata: match.Groups["build"].Success + ? match.Groups["build"].Value + : null + ); + } + + public static bool IsValidSemVer(string tag) => + SemVerPattern.IsMatch(tag); +} +``` + +### Domain Events + +```csharp +namespace StellaOps.ReleaseOrchestrator.Release.Events; + +public sealed record NewVersionDiscovered( + Guid ComponentId, + Guid TenantId, + string ComponentName, + string Digest, + string?
Tag, + DateTimeOffset DiscoveredAt +) : IDomainEvent; + +public sealed record VersionResolved( + Guid ComponentId, + Guid TenantId, + string Tag, + string Digest, + DateTimeOffset ResolvedAt +) : IDomainEvent; +``` + +--- + +## Acceptance Criteria + +- [ ] Resolve tag to digest +- [ ] Resolve digest returns same digest +- [ ] Record new version with metadata +- [ ] Extract semantic version from tag +- [ ] Watch for new versions +- [ ] Filter versions by criteria +- [ ] Get latest version for component +- [ ] List versions with pagination +- [ ] Unit test coverage >=85% + +--- + +## Test Plan + +### Unit Tests + +| Test | Description | +|------|-------------| +| `ResolveTag_ReturnsDigest` | Tag resolution works | +| `ResolveDigest_ReturnsSameDigest` | Digest passthrough works | +| `RecordVersion_StoresMetadata` | Recording works | +| `SemVerExtractor_ParsesValid` | SemVer parsing works | +| `SemVerExtractor_RejectsInvalid` | Invalid tags rejected | +| `GetLatest_ReturnsNewest` | Latest selection works | +| `ListVersions_AppliesFilter` | Filtering works | + +### Integration Tests + +| Test | Description | +|------|-------------| +| `VersionResolution_E2E` | Full resolution flow | +| `VersionWatcher_E2E` | Discovery polling | + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 104_001 Component Registry | Internal | TODO | +| 102_004 Registry Connectors | Internal | TODO | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| IVersionManager | TODO | | +| VersionManager | TODO | | +| VersionResolver | TODO | | +| VersionWatcher | TODO | | +| SemVerExtractor | TODO | | +| ComponentVersion model | TODO | | +| IVersionStore | TODO | | +| VersionStore | TODO | | +| Domain events | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git 
a/docs/implplan/SPRINT_20260110_104_003_RELMAN_release_manager.md b/docs/implplan/SPRINT_20260110_104_003_RELMAN_release_manager.md new file mode 100644 index 000000000..fb940e93c --- /dev/null +++ b/docs/implplan/SPRINT_20260110_104_003_RELMAN_release_manager.md @@ -0,0 +1,628 @@ +# SPRINT: Release Manager + +> **Sprint ID:** 104_003 +> **Module:** RELMAN +> **Phase:** 4 - Release Manager +> **Status:** TODO +> **Parent:** [104_000_INDEX](SPRINT_20260110_104_000_INDEX_release_manager.md) + +--- + +## Overview + +Implement the Release Manager for creating and managing release bundles containing multiple component versions. + +### Objectives + +- Create release bundles with multiple components +- Add/remove components from draft releases +- Finalize releases to lock component versions +- Generate release manifests + +### Working Directory + +``` +src/ReleaseOrchestrator/ +├── __Libraries/ +│ └── StellaOps.ReleaseOrchestrator.Release/ +│ ├── Manager/ +│ │ ├── IReleaseManager.cs +│ │ ├── ReleaseManager.cs +│ │ ├── ReleaseValidator.cs +│ │ ├── ReleaseFinalizer.cs +│ │ └── ReleaseManifestGenerator.cs +│ ├── Store/ +│ │ ├── IReleaseStore.cs +│ │ └── ReleaseStore.cs +│ └── Models/ +│ ├── Release.cs +│ ├── ReleaseComponent.cs +│ ├── ReleaseStatus.cs +│ └── ReleaseManifest.cs +└── __Tests/ + └── StellaOps.ReleaseOrchestrator.Release.Tests/ + └── Manager/ +``` + +--- + +## Architecture Reference + +- [Release Manager](../modules/release-orchestrator/modules/release-manager.md) +- [Data Model Schema](../modules/release-orchestrator/data-model/schema.md) + +--- + +## Deliverables + +### IReleaseManager Interface + +```csharp +namespace StellaOps.ReleaseOrchestrator.Release.Manager; + +public interface IReleaseManager +{ + // CRUD + Task<Release> CreateAsync(CreateReleaseRequest request, CancellationToken ct = default); + Task<Release> UpdateAsync(Guid id, UpdateReleaseRequest request, CancellationToken ct = default); + Task<Release?> GetAsync(Guid id, CancellationToken ct = default); + Task<Release?> GetByNameAsync(string name, CancellationToken ct = default); + Task DeleteAsync(Guid id, CancellationToken ct = default); + + // Component management + Task<Release> AddComponentAsync(Guid releaseId, AddComponentRequest request, CancellationToken ct = default); + Task<Release> UpdateComponentAsync(Guid releaseId, Guid componentId, UpdateReleaseComponentRequest request, CancellationToken ct = default); + Task<Release> RemoveComponentAsync(Guid releaseId, Guid componentId, CancellationToken ct = default); + + // Lifecycle + Task<Release> FinalizeAsync(Guid id, CancellationToken ct = default); + Task<Release> DeprecateAsync(Guid id, string reason, CancellationToken ct = default); + + // Manifest + Task<ReleaseManifest> GetManifestAsync(Guid id, CancellationToken ct = default); +} + +public sealed record CreateReleaseRequest( + string Name, + string DisplayName, + string? Description = null, + IReadOnlyList<AddComponentRequest>? Components = null +); + +public sealed record UpdateReleaseRequest( + string? DisplayName = null, + string? Description = null +); + +public sealed record AddComponentRequest( + Guid ComponentId, + string VersionRef, // Tag or digest + IReadOnlyDictionary<string, string>? Config = null +); + +public sealed record UpdateReleaseComponentRequest( + string? VersionRef = null, + IReadOnlyDictionary<string, string>? Config = null +); +``` + +### Release Model + +```csharp +namespace StellaOps.ReleaseOrchestrator.Release.Models; + +public sealed record Release +{ + public required Guid Id { get; init; } + public required Guid TenantId { get; init; } + public required string Name { get; init; } + public required string DisplayName { get; init; } + public string? Description { get; init; } + public required ReleaseStatus Status { get; init; } + public required ImmutableArray<ReleaseComponent> Components { get; init; } + public string? ManifestDigest { get; init; } // Set on finalization + public DateTimeOffset? FinalizedAt { get; init; } + public Guid? FinalizedBy { get; init; } + public string? DeprecationReason { get; init; } + public DateTimeOffset?
DeprecatedAt { get; init; } + public DateTimeOffset CreatedAt { get; init; } + public DateTimeOffset UpdatedAt { get; init; } + public Guid CreatedBy { get; init; } + + public bool IsDraft => Status == ReleaseStatus.Draft; + public bool IsFinalized => Status != ReleaseStatus.Draft; +} + +public enum ReleaseStatus +{ + Draft, // Can be modified + Ready, // Finalized, can be promoted + Promoting, // Currently being promoted + Deployed, // Deployed to at least one environment + Deprecated // Should not be used +} + +public sealed record ReleaseComponent +{ + public required Guid Id { get; init; } + public required Guid ComponentId { get; init; } + public required string ComponentName { get; init; } + public required string Digest { get; init; } + public string? Tag { get; init; } + public string? SemVer { get; init; } + public ImmutableDictionary<string, string> Config { get; init; } = + ImmutableDictionary<string, string>.Empty; + public int OrderIndex { get; init; } +} +``` + +### ReleaseManager Implementation + +```csharp +namespace StellaOps.ReleaseOrchestrator.Release.Manager; + +public sealed class ReleaseManager : IReleaseManager +{ + private readonly IReleaseStore _store; + private readonly IReleaseValidator _validator; + private readonly IVersionManager _versionManager; + private readonly IComponentRegistry _componentRegistry; + private readonly IReleaseFinalizer _finalizer; + private readonly IEventPublisher _eventPublisher; + private readonly TimeProvider _timeProvider; + private readonly IGuidGenerator _guidGenerator; + private readonly ITenantContext _tenantContext; // assumed ambient tenant accessor + private readonly IUserContext _userContext; // assumed ambient user accessor + private readonly ILogger<ReleaseManager> _logger; + + public async Task<Release> CreateAsync( + CreateReleaseRequest request, + CancellationToken ct = default) + { + var validation = await _validator.ValidateCreateAsync(request, ct); + if (!validation.IsValid) + { + throw new ReleaseValidationException(validation.Errors); + } + + var release = new Release + { + Id = _guidGenerator.NewGuid(), + TenantId = _tenantContext.TenantId, + Name = request.Name, + DisplayName = request.DisplayName, +
Description = request.Description, + Status = ReleaseStatus.Draft, + Components = [], + CreatedAt = _timeProvider.GetUtcNow(), + UpdatedAt = _timeProvider.GetUtcNow(), + CreatedBy = _userContext.UserId + }; + + await _store.SaveAsync(release, ct); + + // Add initial components if provided + if (request.Components?.Count > 0) + { + foreach (var compRequest in request.Components) + { + release = await AddComponentInternalAsync(release, compRequest, ct); + } + } + + await _eventPublisher.PublishAsync(new ReleaseCreated( + release.Id, + release.TenantId, + release.Name, + _timeProvider.GetUtcNow() + ), ct); + + return release; + } + + public async Task AddComponentAsync( + Guid releaseId, + AddComponentRequest request, + CancellationToken ct = default) + { + var release = await _store.GetAsync(releaseId, ct) + ?? throw new ReleaseNotFoundException(releaseId); + + if (!release.IsDraft) + { + throw new ReleaseNotEditableException(releaseId, + "Cannot modify finalized release"); + } + + return await AddComponentInternalAsync(release, request, ct); + } + + private async Task AddComponentInternalAsync( + Release release, + AddComponentRequest request, + CancellationToken ct) + { + // Check component exists + var component = await _componentRegistry.GetAsync(request.ComponentId, ct) + ?? throw new ComponentNotFoundException(request.ComponentId); + + // Check for duplicate component + if (release.Components.Any(c => c.ComponentId == request.ComponentId)) + { + throw new DuplicateReleaseComponentException(release.Id, request.ComponentId); + } + + // Resolve version + var version = await _versionManager.ResolveAsync( + request.ComponentId, + request.VersionRef, + ct); + + var releaseComponent = new ReleaseComponent + { + Id = _guidGenerator.NewGuid(), + ComponentId = component.Id, + ComponentName = component.Name, + Digest = version.Digest, + Tag = version.Tag, + SemVer = version.SemVer?.ToString(), + Config = request.Config?.ToImmutableDictionary() ?? 
+ ImmutableDictionary.Empty, + OrderIndex = release.Components.Length + }; + + var updatedRelease = release with + { + Components = release.Components.Add(releaseComponent), + UpdatedAt = _timeProvider.GetUtcNow() + }; + + await _store.SaveAsync(updatedRelease, ct); + + _logger.LogInformation( + "Added component {Component}@{Digest} to release {Release}", + component.Name, + version.ShortDigest, + release.Name); + + return updatedRelease; + } + + public async Task FinalizeAsync( + Guid id, + CancellationToken ct = default) + { + var release = await _store.GetAsync(id, ct) + ?? throw new ReleaseNotFoundException(id); + + if (!release.IsDraft) + { + throw new ReleaseAlreadyFinalizedException(id); + } + + // Validate release is complete + var validation = await _validator.ValidateFinalizeAsync(release, ct); + if (!validation.IsValid) + { + throw new ReleaseValidationException(validation.Errors); + } + + // Generate manifest and digest + var (manifest, manifestDigest) = await _finalizer.FinalizeAsync(release, ct); + + var finalizedRelease = release with + { + Status = ReleaseStatus.Ready, + ManifestDigest = manifestDigest, + FinalizedAt = _timeProvider.GetUtcNow(), + FinalizedBy = _userContext.UserId, + UpdatedAt = _timeProvider.GetUtcNow() + }; + + await _store.SaveAsync(finalizedRelease, ct); + await _store.SaveManifestAsync(id, manifest, ct); + + await _eventPublisher.PublishAsync(new ReleaseFinalized( + release.Id, + release.TenantId, + release.Name, + manifestDigest, + release.Components.Length, + _timeProvider.GetUtcNow() + ), ct); + + _logger.LogInformation( + "Finalized release {Release} with {Count} components (manifest: {Digest})", + release.Name, + release.Components.Length, + manifestDigest[..16]); + + return finalizedRelease; + } + + public async Task DeleteAsync(Guid id, CancellationToken ct = default) + { + var release = await _store.GetAsync(id, ct) + ?? 
throw new ReleaseNotFoundException(id); + + if (!release.IsDraft) + { + throw new ReleaseNotEditableException(id, + "Cannot delete finalized release"); + } + + await _store.DeleteAsync(id, ct); + + await _eventPublisher.PublishAsync(new ReleaseDeleted( + release.Id, + release.TenantId, + release.Name, + _timeProvider.GetUtcNow() + ), ct); + } +} +``` + +### ReleaseFinalizer + +```csharp +namespace StellaOps.ReleaseOrchestrator.Release.Manager; + +public sealed class ReleaseFinalizer : IReleaseFinalizer +{ + private readonly IReleaseManifestGenerator _manifestGenerator; + private readonly ILogger _logger; + + public async Task<(ReleaseManifest Manifest, string Digest)> FinalizeAsync( + Release release, + CancellationToken ct = default) + { + // Generate canonical manifest + var manifest = await _manifestGenerator.GenerateAsync(release, ct); + + // Compute digest of canonical JSON + var canonicalJson = CanonicalJsonSerializer.Serialize(manifest); + var digest = ComputeDigest(canonicalJson); + + _logger.LogDebug( + "Generated manifest for release {Release}: {Digest}", + release.Name, + digest); + + return (manifest, digest); + } + + private static string ComputeDigest(string content) + { + var bytes = Encoding.UTF8.GetBytes(content); + var hash = SHA256.HashData(bytes); + return $"sha256:{Convert.ToHexString(hash).ToLowerInvariant()}"; + } +} +``` + +### ReleaseManifest + +```csharp +namespace StellaOps.ReleaseOrchestrator.Release.Models; + +public sealed record ReleaseManifest +{ + public required string SchemaVersion { get; init; } = "1.0"; + public required string Name { get; init; } + public required string DisplayName { get; init; } + public string? 
Description { get; init; } + public required DateTimeOffset FinalizedAt { get; init; } + public required string FinalizedBy { get; init; } + public required ImmutableArray Components { get; init; } + public required ManifestMetadata Metadata { get; init; } +} + +public sealed record ManifestComponent( + string Name, + string Registry, + string Repository, + string Digest, + string? Tag, + string? SemVer, + int Order +); + +public sealed record ManifestMetadata( + string TenantId, + string CreatedBy, + DateTimeOffset CreatedAt, + int TotalComponents, + long? TotalSizeBytes +); +``` + +### ReleaseValidator + +```csharp +namespace StellaOps.ReleaseOrchestrator.Release.Manager; + +public sealed class ReleaseValidator : IReleaseValidator +{ + private readonly IReleaseStore _store; + + public async Task ValidateCreateAsync( + CreateReleaseRequest request, + CancellationToken ct = default) + { + var errors = new List(); + + // Name format validation + if (!IsValidReleaseName(request.Name)) + { + errors.Add("Release name must be lowercase alphanumeric with hyphens, 2-64 characters"); + } + + // Check for duplicate name + var existing = await _store.GetByNameAsync(request.Name, ct); + if (existing is not null) + { + errors.Add($"Release with name '{request.Name}' already exists"); + } + + return errors.Count == 0 + ? 
ValidationResult.Success() + : ValidationResult.Failure(errors); + } + + public async Task ValidateFinalizeAsync( + Release release, + CancellationToken ct = default) + { + var errors = new List(); + + // Must have at least one component + if (release.Components.Length == 0) + { + errors.Add("Release must have at least one component"); + } + + // All components must have valid digests + foreach (var component in release.Components) + { + if (string.IsNullOrEmpty(component.Digest)) + { + errors.Add($"Component {component.ComponentName} has no digest"); + } + + if (!IsValidDigest(component.Digest)) + { + errors.Add($"Component {component.ComponentName} has invalid digest format"); + } + } + + return errors.Count == 0 + ? ValidationResult.Success() + : ValidationResult.Failure(errors); + } + + private static bool IsValidReleaseName(string name) => + Regex.IsMatch(name, @"^[a-z][a-z0-9-]{1,63}$"); + + private static bool IsValidDigest(string digest) => + digest.StartsWith("sha256:", StringComparison.OrdinalIgnoreCase) && + digest.Length == 71; +} +``` + +### Domain Events + +```csharp +namespace StellaOps.ReleaseOrchestrator.Release.Events; + +public sealed record ReleaseCreated( + Guid ReleaseId, + Guid TenantId, + string Name, + DateTimeOffset CreatedAt +) : IDomainEvent; + +public sealed record ReleaseComponentAdded( + Guid ReleaseId, + Guid TenantId, + Guid ComponentId, + string ComponentName, + string Digest, + DateTimeOffset AddedAt +) : IDomainEvent; + +public sealed record ReleaseFinalized( + Guid ReleaseId, + Guid TenantId, + string Name, + string ManifestDigest, + int ComponentCount, + DateTimeOffset FinalizedAt +) : IDomainEvent; + +public sealed record ReleaseDeprecated( + Guid ReleaseId, + Guid TenantId, + string Name, + string Reason, + DateTimeOffset DeprecatedAt +) : IDomainEvent; + +public sealed record ReleaseDeleted( + Guid ReleaseId, + Guid TenantId, + string Name, + DateTimeOffset DeletedAt +) : IDomainEvent; +``` + +--- + +## Acceptance Criteria + 
+- [ ] Create draft release +- [ ] Add components to draft release +- [ ] Remove components from draft release +- [ ] Finalize release locks versions +- [ ] Cannot modify finalized release +- [ ] Generate release manifest +- [ ] Compute manifest digest +- [ ] Deprecate release +- [ ] Delete only draft releases +- [ ] Unit test coverage >=85% + +--- + +## Test Plan + +### Unit Tests + +| Test | Description | +|------|-------------| +| `CreateRelease_ValidRequest_Succeeds` | Creation works | +| `AddComponent_ToDraft_Succeeds` | Add component works | +| `AddComponent_ToFinalized_Fails` | Finalized protection works | +| `AddComponent_Duplicate_Fails` | Duplicate check works | +| `FinalizeRelease_GeneratesManifest` | Finalization works | +| `FinalizeRelease_NoComponents_Fails` | Validation works | +| `DeleteRelease_Draft_Succeeds` | Draft deletion works | +| `DeleteRelease_Finalized_Fails` | Finalized protection works | + +### Integration Tests + +| Test | Description | +|------|-------------| +| `ReleaseLifecycle_E2E` | Full create-add-finalize flow | +| `ManifestGeneration_E2E` | Manifest correctness | + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 104_002 Version Manager | Internal | TODO | +| 104_001 Component Registry | Internal | TODO | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| IReleaseManager | TODO | | +| ReleaseManager | TODO | | +| ReleaseValidator | TODO | | +| ReleaseFinalizer | TODO | | +| ReleaseManifestGenerator | TODO | | +| Release model | TODO | | +| IReleaseStore | TODO | | +| ReleaseStore | TODO | | +| Domain events | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_104_004_RELMAN_release_catalog.md b/docs/implplan/SPRINT_20260110_104_004_RELMAN_release_catalog.md new file mode 100644 index 000000000..11ef0caf0 --- 
/dev/null
+++ b/docs/implplan/SPRINT_20260110_104_004_RELMAN_release_catalog.md
@@ -0,0 +1,623 @@
+# SPRINT: Release Catalog
+
+> **Sprint ID:** 104_004
+> **Module:** RELMAN
+> **Phase:** 4 - Release Manager
+> **Status:** TODO
+> **Parent:** [104_000_INDEX](SPRINT_20260110_104_000_INDEX_release_manager.md)
+
+---
+
+## Overview
+
+Implement the Release Catalog for querying releases and tracking deployment history.
+
+### Objectives
+
+- Query releases with filtering and pagination
+- Track release status transitions
+- Maintain deployment history per environment
+- Support release comparison
+
+### Working Directory
+
+```
+src/ReleaseOrchestrator/
+├── __Libraries/
+│   └── StellaOps.ReleaseOrchestrator.Release/
+│       ├── Catalog/
+│       │   ├── IReleaseCatalog.cs
+│       │   ├── ReleaseCatalog.cs
+│       │   ├── ReleaseStatusMachine.cs
+│       │   └── ReleaseComparer.cs
+│       ├── History/
+│       │   ├── IReleaseHistory.cs
+│       │   ├── ReleaseHistory.cs
+│       │   └── DeploymentRecord.cs
+│       └── Models/
+│           ├── ReleaseDeploymentHistory.cs
+│           └── ReleaseComparison.cs
+└── __Tests/
+    └── StellaOps.ReleaseOrchestrator.Release.Tests/
+        └── Catalog/
+```
+
+---
+
+## Architecture Reference
+
+- [Release Manager](../modules/release-orchestrator/modules/release-manager.md)
+- [Data Model Schema](../modules/release-orchestrator/data-model/schema.md)
+
+---
+
+## Deliverables
+
+### IReleaseCatalog Interface
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Release.Catalog;
+
+public interface IReleaseCatalog
+{
+    // Queries
+    Task<IReadOnlyList<Release>> ListAsync(
+        ReleaseFilter? filter = null,
+        CancellationToken ct = default);
+
+    Task<PagedResult<Release>> ListPagedAsync(
+        ReleaseFilter? filter,
+        PaginationParams pagination,
+        CancellationToken ct = default);
+
+    Task<Release?> GetLatestAsync(CancellationToken ct = default);
+
+    Task<Release?> GetLatestDeployedAsync(
+        Guid environmentId,
+        CancellationToken ct = default);
+
+    Task<IReadOnlyList<Release>> GetDeployedReleasesAsync(
+        Guid environmentId,
+        CancellationToken ct = default);
+
+    // History
+    Task<ReleaseDeploymentHistory> GetHistoryAsync(
+        Guid releaseId,
+        CancellationToken ct = default);
+
+    Task<IReadOnlyList<DeploymentRecord>> GetEnvironmentHistoryAsync(
+        Guid environmentId,
+        int limit = 50,
+        CancellationToken ct = default);
+
+    // Comparison
+    Task<ReleaseComparison> CompareAsync(
+        Guid sourceReleaseId,
+        Guid targetReleaseId,
+        CancellationToken ct = default);
+}
+
+public sealed record ReleaseFilter(
+    string? NameContains = null,
+    ReleaseStatus? Status = null,
+    Guid? ComponentId = null,
+    DateTimeOffset? CreatedAfter = null,
+    DateTimeOffset? CreatedBefore = null,
+    DateTimeOffset? FinalizedAfter = null,
+    DateTimeOffset? FinalizedBefore = null,
+    bool? HasDeployments = null
+);
+
+public sealed record PaginationParams(
+    int PageNumber = 1,
+    int PageSize = 20,
+    string? SortBy = null,
+    bool SortDescending = true
+);
+
+public sealed record PagedResult<T>(
+    IReadOnlyList<T> Items,
+    int TotalCount,
+    int PageNumber,
+    int PageSize,
+    int TotalPages
+);
+```
+
+### ReleaseDeploymentHistory
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Release.Models;
+
+public sealed record ReleaseDeploymentHistory
+{
+    public required Guid ReleaseId { get; init; }
+    public required string ReleaseName { get; init; }
+    public required ImmutableArray<EnvironmentDeployment> Deployments { get; init; }
+    public DateTimeOffset? FirstDeployedAt { get; init; }
+    public DateTimeOffset? LastDeployedAt { get; init; }
+    public int TotalDeployments { get; init; }
+}
+
+public sealed record EnvironmentDeployment(
+    Guid EnvironmentId,
+    string EnvironmentName,
+    DeploymentStatus Status,
+    DateTimeOffset DeployedAt,
+    Guid DeployedBy,
+    DateTimeOffset? ReplacedAt,
+    Guid? ReplacedByReleaseId
+);
+
+public sealed record DeploymentRecord
+{
+    public required Guid Id { get; init; }
+    public required Guid EnvironmentId { get; init; }
+    public required Guid ReleaseId { get; init; }
+    public required string ReleaseName { get; init; }
+    public required DeploymentStatus Status { get; init; }
+    public required DateTimeOffset DeployedAt { get; init; }
+    public required Guid DeployedBy { get; init; }
+    public TimeSpan? Duration { get; init; }
+    public string? Notes { get; init; }
+}
+
+public enum DeploymentStatus
+{
+    Current,    // Currently deployed
+    Replaced,   // Was deployed, replaced by newer
+    RolledBack, // Was rolled back from
+    Failed      // Deployment failed
+}
+```
+
+### ReleaseStatusMachine
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Release.Catalog;
+
+public sealed class ReleaseStatusMachine
+{
+    private static readonly ImmutableDictionary<ReleaseStatus, ImmutableArray<ReleaseStatus>> ValidTransitions =
+        new Dictionary<ReleaseStatus, ImmutableArray<ReleaseStatus>>
+        {
+            [ReleaseStatus.Draft] = [ReleaseStatus.Ready],
+            [ReleaseStatus.Ready] = [ReleaseStatus.Promoting, ReleaseStatus.Deprecated],
+            [ReleaseStatus.Promoting] = [ReleaseStatus.Ready, ReleaseStatus.Deployed],
+            [ReleaseStatus.Deployed] = [ReleaseStatus.Promoting, ReleaseStatus.Deprecated],
+            [ReleaseStatus.Deprecated] = [] // Terminal state
+        }.ToImmutableDictionary();
+
+    public bool CanTransition(ReleaseStatus from, ReleaseStatus to)
+    {
+        if (!ValidTransitions.TryGetValue(from, out var validTargets))
+            return false;
+
+        return validTargets.Contains(to);
+    }
+
+    public ValidationResult ValidateTransition(ReleaseStatus from, ReleaseStatus to)
+    {
+        if (CanTransition(from, to))
+            return ValidationResult.Success();
+
+        return ValidationResult.Failure(
+            $"Invalid status transition from {from} to {to}");
+    }
+
+    public IReadOnlyList<ReleaseStatus> GetValidTransitions(ReleaseStatus current)
+    {
+        return ValidTransitions.TryGetValue(current, out var targets)
+            ? targets
+            : [];
+    }
+}
+```
+
+### ReleaseCatalog Implementation
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Release.Catalog;
+
+public sealed class ReleaseCatalog : IReleaseCatalog
+{
+    private readonly IReleaseStore _releaseStore;
+    private readonly IDeploymentStore _deploymentStore;
+    private readonly IEnvironmentService _environmentService;
+    private readonly ILogger<ReleaseCatalog> _logger;
+
+    public async Task<PagedResult<Release>> ListPagedAsync(
+        ReleaseFilter? filter,
+        PaginationParams pagination,
+        CancellationToken ct = default)
+    {
+        var (releases, totalCount) = await _releaseStore.QueryAsync(
+            filter,
+            pagination.PageNumber,
+            pagination.PageSize,
+            pagination.SortBy,
+            pagination.SortDescending,
+            ct);
+
+        var totalPages = (int)Math.Ceiling((double)totalCount / pagination.PageSize);
+
+        return new PagedResult<Release>(
+            Items: releases,
+            TotalCount: totalCount,
+            PageNumber: pagination.PageNumber,
+            PageSize: pagination.PageSize,
+            TotalPages: totalPages
+        );
+    }
+
+    public async Task<Release?> GetLatestDeployedAsync(
+        Guid environmentId,
+        CancellationToken ct = default)
+    {
+        var deployment = await _deploymentStore.GetCurrentDeploymentAsync(
+            environmentId, ct);
+
+        if (deployment is null)
+            return null;
+
+        return await _releaseStore.GetAsync(deployment.ReleaseId, ct);
+    }
+
+    public async Task<ReleaseDeploymentHistory> GetHistoryAsync(
+        Guid releaseId,
+        CancellationToken ct = default)
+    {
+        var release = await _releaseStore.GetAsync(releaseId, ct)
+            ?? throw new ReleaseNotFoundException(releaseId);
+
+        var deployments = await _deploymentStore.GetDeploymentsForReleaseAsync(
+            releaseId, ct);
+
+        var environments = await _environmentService.ListAsync(ct);
+        var envLookup = environments.ToDictionary(e => e.Id);
+
+        var envDeployments = deployments
+            .Select(d => new EnvironmentDeployment(
+                EnvironmentId: d.EnvironmentId,
+                EnvironmentName: envLookup.TryGetValue(d.EnvironmentId, out var env)
+                    ? env.Name
+                    : "Unknown",
+                Status: d.Status,
+                DeployedAt: d.DeployedAt,
+                DeployedBy: d.DeployedBy,
+                ReplacedAt: d.ReplacedAt,
+                ReplacedByReleaseId: d.ReplacedByReleaseId
+            ))
+            .ToImmutableArray();
+
+        return new ReleaseDeploymentHistory
+        {
+            ReleaseId = release.Id,
+            ReleaseName = release.Name,
+            Deployments = envDeployments,
+            FirstDeployedAt = envDeployments.MinBy(d => d.DeployedAt)?.DeployedAt,
+            LastDeployedAt = envDeployments.MaxBy(d => d.DeployedAt)?.DeployedAt,
+            TotalDeployments = envDeployments.Length
+        };
+    }
+}
+```
+
+### ReleaseComparer
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Release.Catalog;
+
+public sealed class ReleaseComparer
+{
+    public ReleaseComparison Compare(Release source, Release target)
+    {
+        var sourceComponents = source.Components.ToDictionary(c => c.ComponentId);
+        var targetComponents = target.Components.ToDictionary(c => c.ComponentId);
+
+        var added = new List<ComponentChange>();
+        var removed = new List<ComponentChange>();
+        var changed = new List<ComponentChange>();
+        var unchanged = new List<ComponentChange>();
+
+        // Find added and changed components
+        foreach (var (componentId, targetComp) in targetComponents)
+        {
+            if (!sourceComponents.TryGetValue(componentId, out var sourceComp))
+            {
+                added.Add(new ComponentChange(
+                    ComponentId: componentId,
+                    ComponentName: targetComp.ComponentName,
+                    ChangeType: ComponentChangeType.Added,
+                    OldDigest: null,
+                    NewDigest: targetComp.Digest,
+                    OldTag: null,
+                    NewTag: targetComp.Tag
+                ));
+            }
+            else if (sourceComp.Digest != targetComp.Digest)
+            {
+                changed.Add(new ComponentChange(
+                    ComponentId: componentId,
+                    ComponentName: targetComp.ComponentName,
+                    ChangeType: ComponentChangeType.Changed,
+                    OldDigest: sourceComp.Digest,
+                    NewDigest: targetComp.Digest,
+                    OldTag: sourceComp.Tag,
+                    NewTag: targetComp.Tag
+                ));
+            }
+            else
+            {
+                unchanged.Add(new ComponentChange(
+                    ComponentId: componentId,
+                    ComponentName: targetComp.ComponentName,
+                    ChangeType: ComponentChangeType.Unchanged,
+                    OldDigest: sourceComp.Digest,
+                    NewDigest: targetComp.Digest,
+                    OldTag: sourceComp.Tag,
+                    NewTag: targetComp.Tag
+                ));
+            }
+        }
+
+        // Find removed components
+        foreach (var (componentId, sourceComp) in sourceComponents)
+        {
+            if (!targetComponents.ContainsKey(componentId))
+            {
+                removed.Add(new ComponentChange(
+                    ComponentId: componentId,
+                    ComponentName: sourceComp.ComponentName,
+                    ChangeType: ComponentChangeType.Removed,
+                    OldDigest: sourceComp.Digest,
+                    NewDigest: null,
+                    OldTag: sourceComp.Tag,
+                    NewTag: null
+                ));
+            }
+        }
+
+        return new ReleaseComparison(
+            SourceReleaseId: source.Id,
+            SourceReleaseName: source.Name,
+            TargetReleaseId: target.Id,
+            TargetReleaseName: target.Name,
+            Added: added.ToImmutableArray(),
+            Removed: removed.ToImmutableArray(),
+            Changed: changed.ToImmutableArray(),
+            Unchanged: unchanged.ToImmutableArray(),
+            HasChanges: added.Count > 0 || removed.Count > 0 || changed.Count > 0
+        );
+    }
+}
+
+public sealed record ReleaseComparison(
+    Guid SourceReleaseId,
+    string SourceReleaseName,
+    Guid TargetReleaseId,
+    string TargetReleaseName,
+    ImmutableArray<ComponentChange> Added,
+    ImmutableArray<ComponentChange> Removed,
+    ImmutableArray<ComponentChange> Changed,
+    ImmutableArray<ComponentChange> Unchanged,
+    bool HasChanges
+)
+{
+    public int TotalChanges => Added.Length + Removed.Length + Changed.Length;
+}
+
+public sealed record ComponentChange(
+    Guid ComponentId,
+    string ComponentName,
+    ComponentChangeType ChangeType,
+    string? OldDigest,
+    string? NewDigest,
+    string? OldTag,
+    string? NewTag
+);
+
+public enum ComponentChangeType
+{
+    Added,
+    Removed,
+    Changed,
+    Unchanged
+}
+```
+
+### ReleaseHistory Service
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Release.History;
+
+public interface IReleaseHistory
+{
+    Task RecordDeploymentAsync(
+        Guid releaseId,
+        Guid environmentId,
+        Guid deploymentId,
+        CancellationToken ct = default);
+
+    Task RecordReplacementAsync(
+        Guid oldReleaseId,
+        Guid newReleaseId,
+        Guid environmentId,
+        CancellationToken ct = default);
+
+    Task RecordRollbackAsync(
+        Guid fromReleaseId,
+        Guid toReleaseId,
+        Guid environmentId,
+        CancellationToken ct = default);
+}
+
+public sealed class ReleaseHistory : IReleaseHistory
+{
+    private readonly IDeploymentStore _store;
+    private readonly IReleaseStore _releaseStore;
+    private readonly ReleaseStatusMachine _statusMachine;
+    private readonly IEventPublisher _eventPublisher;
+    private readonly TimeProvider _timeProvider;
+    private readonly ILogger<ReleaseHistory> _logger;
+
+    public async Task RecordDeploymentAsync(
+        Guid releaseId,
+        Guid environmentId,
+        Guid deploymentId,
+        CancellationToken ct = default)
+    {
+        var release = await _releaseStore.GetAsync(releaseId, ct)
+            ?? throw new ReleaseNotFoundException(releaseId);
+
+        // Mark any existing deployment as replaced
+        var currentDeployment = await _store.GetCurrentDeploymentAsync(
+            environmentId, ct);
+
+        if (currentDeployment is not null)
+        {
+            await _store.MarkReplacedAsync(
+                currentDeployment.Id,
+                releaseId,
+                _timeProvider.GetUtcNow(),
+                ct);
+        }
+
+        // Update release status if first deployment
+        if (release.Status == ReleaseStatus.Ready ||
+            release.Status == ReleaseStatus.Promoting)
+        {
+            var updatedRelease = release with
+            {
+                Status = ReleaseStatus.Deployed,
+                UpdatedAt = _timeProvider.GetUtcNow()
+            };
+            await _releaseStore.SaveAsync(updatedRelease, ct);
+        }
+
+        _logger.LogInformation(
+            "Recorded deployment of release {Release} to environment {Environment}",
+            release.Name,
+            environmentId);
+    }
+
+    public async Task RecordRollbackAsync(
+        Guid fromReleaseId,
+        Guid toReleaseId,
+        Guid environmentId,
+        CancellationToken ct = default)
+    {
+        // Mark the from-deployment as rolled back
+        var currentDeployment = await _store.GetCurrentDeploymentAsync(
+            environmentId, ct);
+
+        if (currentDeployment?.ReleaseId == fromReleaseId)
+        {
+            await _store.MarkRolledBackAsync(
+                currentDeployment.Id,
+                _timeProvider.GetUtcNow(),
+                ct);
+        }
+
+        await _eventPublisher.PublishAsync(new ReleaseRolledBack(
+            FromReleaseId: fromReleaseId,
+            ToReleaseId: toReleaseId,
+            EnvironmentId: environmentId,
+            RolledBackAt: _timeProvider.GetUtcNow()
+        ), ct);
+    }
+}
+```
+
+### Domain Events
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Release.Events;
+
+public sealed record ReleaseStatusChanged(
+    Guid ReleaseId,
+    Guid TenantId,
+    ReleaseStatus OldStatus,
+    ReleaseStatus NewStatus,
+    DateTimeOffset ChangedAt
+) : IDomainEvent;
+
+public sealed record ReleaseRolledBack(
+    Guid FromReleaseId,
+    Guid ToReleaseId,
+    Guid EnvironmentId,
+    DateTimeOffset RolledBackAt
+) : IDomainEvent;
+```
+
+---
+
+## Acceptance Criteria
+
+- [ ] List releases with filtering
+- [ ] Paginate release list
+- [ ] Get latest deployed release for environment
+- [ ] Track deployment history
+- [ ] Record status transitions
+- [ ] Compare two releases
+- [ ] Identify added/removed/changed components
+- [ ] Record rollback history
+- [ ] Status machine validates transitions
+- [ ] Unit test coverage >=85%
+
+---
+
+## Test Plan
+
+### Unit Tests
+
+| Test | Description |
+|------|-------------|
+| `ListReleases_WithFilter_ReturnsFiltered` | Filtering works |
+| `ListReleases_Paginated_ReturnsPaged` | Pagination works |
+| `GetLatestDeployed_ReturnsCorrect` | Latest lookup works |
+| `CompareReleases_DetectsChanges` | Comparison works |
+| `StatusMachine_ValidTransitions` | Valid transitions work |
+| `StatusMachine_InvalidTransitions_Rejected` | Invalid rejected |
+| `RecordDeployment_UpdatesHistory` | History recording works |
+
+### Integration Tests
+
+| Test | Description |
+|------|-------------|
+| `ReleaseCatalog_E2E` | Full query/history flow |
+| `ReleaseComparison_E2E` | Comparison accuracy |
+
+---
+
+## Dependencies
+
+| Dependency | Type | Status |
+|------------|------|--------|
+| 104_003 Release Manager | Internal | TODO |
+| 103_001 Environment CRUD | Internal | TODO |
+
+---
+
+## Delivery Tracker
+
+| Deliverable | Status | Notes |
+|-------------|--------|-------|
+| IReleaseCatalog | TODO | |
+| ReleaseCatalog | TODO | |
+| ReleaseStatusMachine | TODO | |
+| ReleaseComparer | TODO | |
+| IReleaseHistory | TODO | |
+| ReleaseHistory | TODO | |
+| IDeploymentStore | TODO | |
+| DeploymentStore | TODO | |
+| Domain events | TODO | |
+| Unit tests | TODO | |
+
+---
+
+## Execution Log
+
+| Date | Entry |
+|------|-------|
+| 10-Jan-2026 | Sprint created |
diff --git a/docs/implplan/SPRINT_20260110_105_000_INDEX_workflow_engine.md b/docs/implplan/SPRINT_20260110_105_000_INDEX_workflow_engine.md
new file mode 100644
index 000000000..3befbbbe1
--- /dev/null
+++ b/docs/implplan/SPRINT_20260110_105_000_INDEX_workflow_engine.md
@@ -0,0 +1,263 @@
+# SPRINT INDEX: Phase 5 -
Workflow Engine + +> **Epic:** Release Orchestrator +> **Phase:** 5 - Workflow Engine +> **Batch:** 105 +> **Status:** TODO +> **Parent:** [100_000_INDEX](SPRINT_20260110_100_000_INDEX_release_orchestrator.md) + +--- + +## Overview + +Phase 5 implements the Workflow Engine - DAG-based workflow execution for deployments, promotions, and custom automation. + +### Objectives + +- Workflow template designer with YAML/JSON DSL +- Step registry for built-in and plugin steps +- DAG executor with parallel and sequential execution +- Step executor with retry and timeout handling +- Built-in steps (script, approval, notification) + +--- + +## Sprint Structure + +| Sprint ID | Title | Module | Status | Dependencies | +|-----------|-------|--------|--------|--------------| +| 105_001 | Workflow Template Designer | WORKFL | TODO | 101_001 | +| 105_002 | Step Registry | WORKFL | TODO | 101_002 | +| 105_003 | Workflow Engine - DAG Executor | WORKFL | TODO | 105_001, 105_002 | +| 105_004 | Step Executor | WORKFL | TODO | 105_003 | +| 105_005 | Built-in Steps | WORKFL | TODO | 105_004 | + +--- + +## Architecture + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ WORKFLOW ENGINE │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ WORKFLOW TEMPLATE (105_001) │ │ +│ │ │ │ +│ │ name: deploy-to-production │ │ +│ │ steps: │ │ +│ │ - id: security-scan │ │ +│ │ type: security-gate │ │ +│ │ - id: approval │ │ +│ │ type: approval │ │ +│ │ dependsOn: [security-scan] │ │ +│ │ - id: deploy │ │ +│ │ type: deploy │ │ +│ │ dependsOn: [approval] │ │ +│ │ - id: notify │ │ +│ │ type: notify │ │ +│ │ dependsOn: [deploy] │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ STEP REGISTRY (105_002) │ │ +│ │ │ │ +│ │ Built-in Steps: Plugin Steps: │ │ +│ │ ├── script ├── custom-gate │ │ +│ │ ├── approval ├── 
jira-update │ │ +│ │ ├── notify ├── terraform-apply │ │ +│ │ ├── wait └── k8s-rollout │ │ +│ │ ├── security-gate │ │ +│ │ └── deploy │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ DAG EXECUTOR (105_003) │ │ +│ │ │ │ +│ │ ┌─────────────┐ │ │ +│ │ │security-scan│ │ │ +│ │ └──────┬──────┘ │ │ +│ │ │ │ │ +│ │ ▼ │ │ +│ │ ┌─────────────┐ │ │ +│ │ │ approval │ │ │ +│ │ └──────┬──────┘ │ │ +│ │ │ │ │ +│ │ ┌────┴────┐ │ │ +│ │ ▼ ▼ │ │ +│ │ ┌──────┐ ┌──────┐ (parallel) │ │ +│ │ │deploy│ │smoke │ │ │ +│ │ └──┬───┘ └──┬───┘ │ │ +│ │ └────┬───┘ │ │ +│ │ ▼ │ │ +│ │ ┌─────────────┐ │ │ +│ │ │ notify │ │ │ +│ │ └─────────────┘ │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +--- + +## Deliverables Summary + +### 105_001: Workflow Template Designer + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `WorkflowTemplate` | Model | Template entity | +| `WorkflowParser` | Class | YAML/JSON parser | +| `WorkflowValidator` | Class | DAG validation | +| `TemplateStore` | Class | Persistence | + +### 105_002: Step Registry + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `IStepRegistry` | Interface | Step lookup | +| `StepRegistry` | Class | Implementation | +| `StepDefinition` | Model | Step metadata | +| `StepSchema` | Class | Config schema | + +### 105_003: DAG Executor + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `IWorkflowEngine` | Interface | Execution control | +| `WorkflowEngine` | Class | Implementation | +| `DagScheduler` | Class | Step scheduling | +| `WorkflowRun` | Model | Execution state | + +### 105_004: Step Executor + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `IStepExecutor` | Interface | Step 
execution | +| `StepExecutor` | Class | Implementation | +| `StepContext` | Model | Execution context | +| `StepRetryPolicy` | Class | Retry handling | + +### 105_005: Built-in Steps + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `ScriptStep` | Step | Execute shell scripts | +| `ApprovalStep` | Step | Manual approval | +| `NotifyStep` | Step | Send notifications | +| `WaitStep` | Step | Time delay | +| `SecurityGateStep` | Step | Security check | +| `DeployStep` | Step | Deployment trigger | + +--- + +## Key Interfaces + +```csharp +public interface IWorkflowEngine +{ + Task StartAsync(Guid templateId, WorkflowContext context, CancellationToken ct); + Task ResumeAsync(Guid runId, CancellationToken ct); + Task CancelAsync(Guid runId, CancellationToken ct); + Task GetRunAsync(Guid runId, CancellationToken ct); +} + +public interface IStepExecutor +{ + Task ExecuteAsync(StepDefinition step, StepContext context, CancellationToken ct); +} + +public interface IStepRegistry +{ + void RegisterBuiltIn(string type) where T : IStepProvider; + Task GetAsync(string type, CancellationToken ct); + IReadOnlyList GetAllDefinitions(); +} +``` + +--- + +## Workflow DSL Example + +```yaml +name: production-deployment +version: 1 +triggers: + - type: promotion + environment: production + +steps: + - id: security-check + type: security-gate + config: + maxCritical: 0 + maxHigh: 5 + + - id: lead-approval + type: approval + dependsOn: [security-check] + config: + approvers: ["@release-managers"] + minApprovals: 1 + + - id: deploy + type: deploy + dependsOn: [lead-approval] + config: + strategy: rolling + batchSize: 25% + + - id: smoke-test + type: script + dependsOn: [deploy] + config: + script: ./scripts/smoke-test.sh + timeout: 300 + + - id: notify-success + type: notify + dependsOn: [smoke-test] + condition: success() + config: + channel: slack + message: "Deployment to production succeeded" + + - id: notify-failure + type: notify + dependsOn: 
[smoke-test] + condition: failure() + config: + channel: slack + message: "Deployment to production FAILED" +``` + +--- + +## Dependencies + +| Module | Purpose | +|--------|---------| +| 101_002 Plugin Registry | Plugin steps | +| 101_003 Plugin Loader | Execute plugin steps | +| 106_* Promotion | Gate integration | + +--- + +## Acceptance Criteria + +- [ ] Workflow templates parse correctly +- [ ] DAG cycle detection works +- [ ] Parallel steps execute concurrently +- [ ] Step dependencies respected +- [ ] Retry policy works +- [ ] Timeout cancels steps +- [ ] Built-in steps functional +- [ ] Workflow state persisted +- [ ] Unit test coverage ≥80% + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Phase 5 index created | diff --git a/docs/implplan/SPRINT_20260110_105_001_WORKFL_workflow_template.md b/docs/implplan/SPRINT_20260110_105_001_WORKFL_workflow_template.md new file mode 100644 index 000000000..ff15baf91 --- /dev/null +++ b/docs/implplan/SPRINT_20260110_105_001_WORKFL_workflow_template.md @@ -0,0 +1,671 @@ +# SPRINT: Workflow Template Designer + +> **Sprint ID:** 105_001 +> **Module:** WORKFL +> **Phase:** 5 - Workflow Engine +> **Status:** TODO +> **Parent:** [105_000_INDEX](SPRINT_20260110_105_000_INDEX_workflow_engine.md) + +--- + +## Overview + +Implement the Workflow Template Designer for defining deployment and automation workflows using YAML/JSON DSL. 
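+
+For orientation, a minimal template in this DSL might look as follows. This sketch is illustrative only: the step types and field names mirror the DSL example in the phase index, while the authoritative schema is whatever `WorkflowParser` and `WorkflowValidator` accept.
+
+```yaml
+# Illustrative sketch — field names follow the phase-index DSL example.
+name: stage-smoke
+version: 1
+triggers:
+  - type: manual
+steps:
+  - id: deploy
+    type: deploy
+    config:
+      strategy: rolling
+  - id: smoke-test
+    type: script
+    dependsOn: [deploy]
+    config:
+      script: ./scripts/smoke-test.sh
+      timeout: 120
+```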
+
+### Objectives
+
+- Define workflow template data model
+- Parse YAML/JSON workflow definitions
+- Validate DAG structure (no cycles)
+- Store and version workflow templates
+
+### Working Directory
+
+```
+src/ReleaseOrchestrator/
+├── __Libraries/
+│   └── StellaOps.ReleaseOrchestrator.Workflow/
+│       ├── Template/
+│       │   ├── IWorkflowTemplateService.cs
+│       │   ├── WorkflowTemplateService.cs
+│       │   ├── WorkflowParser.cs
+│       │   ├── WorkflowValidator.cs
+│       │   └── DagBuilder.cs
+│       ├── Store/
+│       │   ├── IWorkflowTemplateStore.cs
+│       │   └── WorkflowTemplateStore.cs
+│       └── Models/
+│           ├── WorkflowTemplate.cs
+│           ├── WorkflowStep.cs
+│           ├── StepConfig.cs
+│           └── WorkflowTrigger.cs
+└── __Tests/
+    └── StellaOps.ReleaseOrchestrator.Workflow.Tests/
+        └── Template/
+```
+
+---
+
+## Architecture Reference
+
+- [Workflow Engine](../modules/release-orchestrator/modules/workflow-engine.md)
+- [Data Model Schema](../modules/release-orchestrator/data-model/schema.md)
+
+---
+
+## Deliverables
+
+### IWorkflowTemplateService Interface
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Workflow.Template;
+
+public interface IWorkflowTemplateService
+{
+    Task<WorkflowTemplate> CreateAsync(CreateWorkflowTemplateRequest request, CancellationToken ct = default);
+    Task<WorkflowTemplate> UpdateAsync(Guid id, UpdateWorkflowTemplateRequest request, CancellationToken ct = default);
+    Task<WorkflowTemplate?> GetAsync(Guid id, CancellationToken ct = default);
+    Task<WorkflowTemplate?> GetByNameAsync(string name, CancellationToken ct = default);
+    Task<WorkflowTemplate?> GetByNameAndVersionAsync(string name, int version, CancellationToken ct = default);
+    Task<IReadOnlyList<WorkflowTemplate>> ListAsync(WorkflowTemplateFilter? filter = null, CancellationToken ct = default);
+    Task PublishAsync(Guid id, CancellationToken ct = default);
+    Task DeprecateAsync(Guid id, CancellationToken ct = default);
+    Task DeleteAsync(Guid id, CancellationToken ct = default);
+    Task<WorkflowValidationResult> ValidateAsync(string content, WorkflowFormat format, CancellationToken ct = default);
+}
+
+public sealed record CreateWorkflowTemplateRequest(
+    string Name,
+    string DisplayName,
+    string Content,
+    WorkflowFormat Format,
+    string? Description = null
+);
+
+public sealed record UpdateWorkflowTemplateRequest(
+    string? DisplayName = null,
+    string? Content = null,
+    string? Description = null
+);
+
+public enum WorkflowFormat
+{
+    Yaml,
+    Json
+}
+```
+
+### WorkflowTemplate Model
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Workflow.Models;
+
+public sealed record WorkflowTemplate
+{
+    public required Guid Id { get; init; }
+    public required Guid TenantId { get; init; }
+    public required string Name { get; init; }
+    public required string DisplayName { get; init; }
+    public string? Description { get; init; }
+    public required int Version { get; init; }
+    public required WorkflowTemplateStatus Status { get; init; }
+    public required string Content { get; init; }
+    public required WorkflowFormat Format { get; init; }
+    public required ImmutableArray<WorkflowStep> Steps { get; init; }
+    public required ImmutableArray<WorkflowTrigger> Triggers { get; init; }
+    public ImmutableDictionary<string, string> Variables { get; init; } =
+        ImmutableDictionary<string, string>.Empty;
+    public DateTimeOffset CreatedAt { get; init; }
+    public DateTimeOffset UpdatedAt { get; init; }
+    public DateTimeOffset? PublishedAt { get; init; }
+    public Guid CreatedBy { get; init; }
+}
+
+public enum WorkflowTemplateStatus
+{
+    Draft,
+    Published,
+    Deprecated
+}
+
+public sealed record WorkflowStep
+{
+    public required string Id { get; init; }
+    public required string Type { get; init; }
+    public string? DisplayName { get; init; }
+    public ImmutableArray<string> DependsOn { get; init; } = [];
+    public string?
Condition { get; init; } + public ImmutableDictionary Config { get; init; } = + ImmutableDictionary.Empty; + public TimeSpan? Timeout { get; init; } + public RetryConfig? Retry { get; init; } + public bool ContinueOnError { get; init; } = false; +} + +public sealed record RetryConfig( + int MaxAttempts = 3, + TimeSpan InitialDelay = default, + double BackoffMultiplier = 2.0 +) +{ + public TimeSpan InitialDelay { get; init; } = InitialDelay == default + ? TimeSpan.FromSeconds(5) + : InitialDelay; +} + +public sealed record WorkflowTrigger +{ + public required TriggerType Type { get; init; } + public Guid? EnvironmentId { get; init; } + public string? EnvironmentName { get; init; } + public string? CronExpression { get; init; } + public ImmutableDictionary Filters { get; init; } = + ImmutableDictionary.Empty; +} + +public enum TriggerType +{ + Manual, + Promotion, + Schedule, + Webhook, + NewVersion +} +``` + +### WorkflowParser + +```csharp +namespace StellaOps.ReleaseOrchestrator.Workflow.Template; + +public sealed class WorkflowParser +{ + private readonly ILogger _logger; + + public ParsedWorkflow Parse(string content, WorkflowFormat format) + { + return format switch + { + WorkflowFormat.Yaml => ParseYaml(content), + WorkflowFormat.Json => ParseJson(content), + _ => throw new ArgumentOutOfRangeException(nameof(format)) + }; + } + + private ParsedWorkflow ParseYaml(string content) + { + var deserializer = new DeserializerBuilder() + .WithNamingConvention(CamelCaseNamingConvention.Instance) + .Build(); + + try + { + var raw = deserializer.Deserialize(content); + return MapToWorkflow(raw); + } + catch (YamlException ex) + { + throw new WorkflowParseException($"YAML parse error at line {ex.Start.Line}: {ex.Message}", ex); + } + } + + private ParsedWorkflow ParseJson(string content) + { + try + { + var raw = JsonSerializer.Deserialize(content, + new JsonSerializerOptions + { + PropertyNamingPolicy = JsonNamingPolicy.CamelCase, + ReadCommentHandling = 
JsonCommentHandling.Skip + }); + + return MapToWorkflow(raw!); + } + catch (JsonException ex) + { + throw new WorkflowParseException($"JSON parse error: {ex.Message}", ex); + } + } + + private ParsedWorkflow MapToWorkflow(RawWorkflowDefinition raw) + { + var steps = raw.Steps.Select(s => new WorkflowStep + { + Id = s.Id, + Type = s.Type, + DisplayName = s.DisplayName, + DependsOn = s.DependsOn?.ToImmutableArray() ?? [], + Condition = s.Condition, + Config = s.Config?.ToImmutableDictionary() ?? ImmutableDictionary.Empty, + Timeout = s.Timeout.HasValue ? TimeSpan.FromSeconds(s.Timeout.Value) : null, + Retry = s.Retry is not null ? new RetryConfig( + s.Retry.MaxAttempts ?? 3, + TimeSpan.FromSeconds(s.Retry.InitialDelaySeconds ?? 5), + s.Retry.BackoffMultiplier ?? 2.0 + ) : null, + ContinueOnError = s.ContinueOnError ?? false + }).ToImmutableArray(); + + var triggers = raw.Triggers?.Select(t => new WorkflowTrigger + { + Type = Enum.Parse(t.Type, ignoreCase: true), + EnvironmentName = t.Environment, + CronExpression = t.Cron, + Filters = t.Filters?.ToImmutableDictionary() ?? ImmutableDictionary.Empty + }).ToImmutableArray() ?? []; + + return new ParsedWorkflow( + Name: raw.Name, + Version: raw.Version ?? 1, + Steps: steps, + Triggers: triggers, + Variables: raw.Variables?.ToImmutableDictionary() ?? 
ImmutableDictionary.Empty + ); + } +} + +public sealed record ParsedWorkflow( + string Name, + int Version, + ImmutableArray Steps, + ImmutableArray Triggers, + ImmutableDictionary Variables +); +``` + +### WorkflowValidator + +```csharp +namespace StellaOps.ReleaseOrchestrator.Workflow.Template; + +public sealed class WorkflowValidator +{ + private readonly IStepRegistry _stepRegistry; + + public async Task ValidateAsync( + ParsedWorkflow workflow, + CancellationToken ct = default) + { + var errors = new List(); + var warnings = new List(); + + // Validate workflow name + if (!IsValidWorkflowName(workflow.Name)) + { + errors.Add(new ValidationError( + "workflow.name", + "Workflow name must be lowercase alphanumeric with hyphens, 2-64 characters")); + } + + // Validate steps exist + if (workflow.Steps.Length == 0) + { + errors.Add(new ValidationError( + "workflow.steps", + "Workflow must have at least one step")); + } + + // Validate step IDs are unique + var stepIds = workflow.Steps.Select(s => s.Id).ToList(); + var duplicates = stepIds.GroupBy(id => id) + .Where(g => g.Count() > 1) + .Select(g => g.Key); + + foreach (var dup in duplicates) + { + errors.Add(new ValidationError( + $"steps.{dup}", + $"Duplicate step ID: {dup}")); + } + + // Validate step types exist + foreach (var step in workflow.Steps) + { + var stepDef = await _stepRegistry.GetAsync(step.Type, ct); + if (stepDef is null) + { + errors.Add(new ValidationError( + $"steps.{step.Id}.type", + $"Unknown step type: {step.Type}")); + } + } + + // Validate dependencies exist + var stepIdSet = stepIds.ToHashSet(); + foreach (var step in workflow.Steps) + { + foreach (var dep in step.DependsOn) + { + if (!stepIdSet.Contains(dep)) + { + errors.Add(new ValidationError( + $"steps.{step.Id}.dependsOn", + $"Unknown dependency: {dep}")); + } + } + } + + // Validate DAG has no cycles + var cycleError = DetectCycles(workflow.Steps); + if (cycleError is not null) + { + errors.Add(cycleError); + } + + // Validate 
triggers + foreach (var (trigger, index) in workflow.Triggers.Select((t, i) => (t, i))) + { + if (trigger.Type == TriggerType.Schedule && + string.IsNullOrEmpty(trigger.CronExpression)) + { + errors.Add(new ValidationError( + $"triggers[{index}].cron", + "Schedule trigger requires cron expression")); + } + } + + // Check for unreachable steps (warning only) + var reachable = FindReachableSteps(workflow.Steps); + var unreachable = stepIdSet.Except(reachable); + foreach (var stepId in unreachable) + { + warnings.Add(new ValidationWarning( + $"steps.{stepId}", + $"Step {stepId} is unreachable (has dependencies but nothing depends on it)")); + } + + return new WorkflowValidationResult( + IsValid: errors.Count == 0, + Errors: errors.ToImmutableArray(), + Warnings: warnings.ToImmutableArray() + ); + } + + private static ValidationError? DetectCycles(ImmutableArray steps) + { + var visited = new HashSet(); + var recursionStack = new HashSet(); + var stepMap = steps.ToDictionary(s => s.Id); + + foreach (var step in steps) + { + if (HasCycle(step.Id, stepMap, visited, recursionStack, out var cycle)) + { + return new ValidationError( + "workflow.steps", + $"Circular dependency detected: {string.Join(" -> ", cycle)}"); + } + } + + return null; + } + + private static bool HasCycle( + string stepId, + Dictionary stepMap, + HashSet visited, + HashSet recursionStack, + out List cycle) + { + cycle = []; + + if (recursionStack.Contains(stepId)) + { + cycle.Add(stepId); + return true; + } + + if (visited.Contains(stepId)) + return false; + + visited.Add(stepId); + recursionStack.Add(stepId); + + if (stepMap.TryGetValue(stepId, out var step)) + { + foreach (var dep in step.DependsOn) + { + if (HasCycle(dep, stepMap, visited, recursionStack, out cycle)) + { + cycle.Insert(0, stepId); + return true; + } + } + } + + recursionStack.Remove(stepId); + return false; + } + + private static HashSet FindReachableSteps(ImmutableArray steps) + { + // Steps with no dependencies are entry points + 
var entryPoints = steps.Where(s => s.DependsOn.Length == 0) + .Select(s => s.Id) + .ToHashSet(); + + // All steps that are depended upon + var dependedUpon = steps.SelectMany(s => s.DependsOn).ToHashSet(); + + // Entry points + all steps that something depends on + return entryPoints.Union(dependedUpon).ToHashSet(); + } + + private static bool IsValidWorkflowName(string name) => + Regex.IsMatch(name, @"^[a-z][a-z0-9-]{1,63}$"); +} + +public sealed record WorkflowValidationResult( + bool IsValid, + ImmutableArray Errors, + ImmutableArray Warnings +); + +public sealed record ValidationError(string Path, string Message); +public sealed record ValidationWarning(string Path, string Message); +``` + +### DagBuilder + +```csharp +namespace StellaOps.ReleaseOrchestrator.Workflow.Template; + +public sealed class DagBuilder +{ + public WorkflowDag Build(ImmutableArray steps) + { + var nodes = new Dictionary(); + + // Create nodes + foreach (var step in steps) + { + nodes[step.Id] = new DagNode(step.Id, step); + } + + // Build edges + foreach (var step in steps) + { + var node = nodes[step.Id]; + foreach (var dep in step.DependsOn) + { + if (nodes.TryGetValue(dep, out var depNode)) + { + node.Dependencies.Add(depNode); + depNode.Dependents.Add(node); + } + } + } + + // Find entry nodes (no dependencies) + var entryNodes = nodes.Values + .Where(n => n.Dependencies.Count == 0) + .ToImmutableArray(); + + // Compute topological order + var order = TopologicalSort(nodes.Values); + + return new WorkflowDag( + Nodes: nodes.Values.ToImmutableArray(), + EntryNodes: entryNodes, + TopologicalOrder: order + ); + } + + private static ImmutableArray TopologicalSort(IEnumerable nodes) + { + var sorted = new List(); + var visited = new HashSet(); + var nodeList = nodes.ToList(); + + void Visit(DagNode node) + { + if (visited.Contains(node.Id)) + return; + + visited.Add(node.Id); + + foreach (var dep in node.Dependencies) + { + Visit(dep); + } + + sorted.Add(node); + } + + foreach (var node 
in nodeList) + { + Visit(node); + } + + return sorted.ToImmutableArray(); + } +} + +public sealed record WorkflowDag( + ImmutableArray Nodes, + ImmutableArray EntryNodes, + ImmutableArray TopologicalOrder +); + +public sealed class DagNode +{ + public string Id { get; } + public WorkflowStep Step { get; } + public List Dependencies { get; } = []; + public List Dependents { get; } = []; + + public DagNode(string id, WorkflowStep step) + { + Id = id; + Step = step; + } + + public bool IsReady(IReadOnlySet completedSteps) => + Dependencies.All(d => completedSteps.Contains(d.Id)); +} +``` + +### Domain Events + +```csharp +namespace StellaOps.ReleaseOrchestrator.Workflow.Events; + +public sealed record WorkflowTemplateCreated( + Guid TemplateId, + Guid TenantId, + string Name, + int Version, + int StepCount, + DateTimeOffset CreatedAt +) : IDomainEvent; + +public sealed record WorkflowTemplatePublished( + Guid TemplateId, + Guid TenantId, + string Name, + int Version, + DateTimeOffset PublishedAt +) : IDomainEvent; + +public sealed record WorkflowTemplateDeprecated( + Guid TemplateId, + Guid TenantId, + string Name, + DateTimeOffset DeprecatedAt +) : IDomainEvent; +``` + +--- + +## Acceptance Criteria + +- [ ] Parse YAML workflow definitions +- [ ] Parse JSON workflow definitions +- [ ] Validate step types exist +- [ ] Detect circular dependencies +- [ ] Validate dependencies exist +- [ ] Create workflow templates +- [ ] Version workflow templates +- [ ] Publish workflow templates +- [ ] Deprecate workflow templates +- [ ] Unit test coverage >=85% + +--- + +## Test Plan + +### Unit Tests + +| Test | Description | +|------|-------------| +| `ParseYaml_ValidWorkflow_Succeeds` | YAML parsing works | +| `ParseJson_ValidWorkflow_Succeeds` | JSON parsing works | +| `Validate_CyclicDependency_Fails` | Cycle detection works | +| `Validate_MissingDependency_Fails` | Dependency check works | +| `Validate_UnknownStepType_Fails` | Step type check works | +| 
`DagBuilder_CreatesTopologicalOrder` | DAG building works | +| `CreateTemplate_StoresContent` | Creation works | +| `PublishTemplate_ChangesStatus` | Publishing works | + +### Integration Tests + +| Test | Description | +|------|-------------| +| `WorkflowTemplateLifecycle_E2E` | Full CRUD cycle | +| `WorkflowParsing_E2E` | Real workflow files | + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 101_001 Database Schema | Internal | TODO | +| 105_002 Step Registry | Internal | TODO | +| YamlDotNet | NuGet | Available | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| IWorkflowTemplateService | TODO | | +| WorkflowTemplateService | TODO | | +| WorkflowParser | TODO | | +| WorkflowValidator | TODO | | +| DagBuilder | TODO | | +| WorkflowTemplate model | TODO | | +| IWorkflowTemplateStore | TODO | | +| WorkflowTemplateStore | TODO | | +| Domain events | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_105_002_WORKFL_step_registry.md b/docs/implplan/SPRINT_20260110_105_002_WORKFL_step_registry.md new file mode 100644 index 000000000..e85c11fbb --- /dev/null +++ b/docs/implplan/SPRINT_20260110_105_002_WORKFL_step_registry.md @@ -0,0 +1,549 @@ +# SPRINT: Step Registry + +> **Sprint ID:** 105_002 +> **Module:** WORKFL +> **Phase:** 5 - Workflow Engine +> **Status:** TODO +> **Parent:** [105_000_INDEX](SPRINT_20260110_105_000_INDEX_workflow_engine.md) + +--- + +## Overview + +Implement the Step Registry for managing built-in and plugin workflow steps. 
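+
+As a consumer-side sketch, workflow validation and scheduling code is expected to use the registry roughly like this. This is illustrative only — it assumes the `IStepRegistry` and `IStepProvider` surfaces specified later in this document, with generic arguments and result members (`IsValid`, `Errors`) inferred from usage rather than fixed by this sprint:
+
+```csharp
+// Hypothetical caller; not a deliverable of this sprint.
+public sealed class StepReferenceChecker
+{
+    private readonly IStepRegistry _registry;
+
+    public StepReferenceChecker(IStepRegistry registry) => _registry = registry;
+
+    // Returns null when the step reference is usable, otherwise an error message.
+    public async Task<string?> CheckAsync(
+        string type,
+        IReadOnlyDictionary<string, object?> config,
+        CancellationToken ct = default)
+    {
+        if (!_registry.IsRegistered(type))
+            return $"Unknown step type: {type}";
+
+        var provider = await _registry.GetProviderAsync(type, ct);
+        var result = await provider!.ValidateConfigAsync(config, ct);
+        return result.IsValid ? null : string.Join("; ", result.Errors);
+    }
+}
+```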
+
+### Objectives
+
+- Register built-in step types
+- Load plugin step types dynamically
+- Define step schemas for configuration validation
+- Provide step discovery and documentation
+
+### Working Directory
+
+```
+src/ReleaseOrchestrator/
+├── __Libraries/
+│   └── StellaOps.ReleaseOrchestrator.Workflow/
+│       ├── Steps/
+│       │   ├── IStepRegistry.cs
+│       │   ├── StepRegistry.cs
+│       │   ├── IStepProvider.cs
+│       │   ├── StepDefinition.cs
+│       │   └── StepSchema.cs
+│       ├── Steps.BuiltIn/
+│       │   └── (see 105_005)
+│       └── Steps.Plugin/
+│           ├── IPluginStepLoader.cs
+│           └── PluginStepLoader.cs
+└── __Tests/
+    └── StellaOps.ReleaseOrchestrator.Workflow.Tests/
+        └── Steps/
+```
+
+---
+
+## Architecture Reference
+
+- [Workflow Engine](../modules/release-orchestrator/modules/workflow-engine.md)
+- [Plugin System](../modules/release-orchestrator/plugins/step-plugins.md)
+
+---
+
+## Deliverables
+
+### IStepRegistry Interface
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Workflow.Steps;
+
+public interface IStepRegistry
+{
+    void RegisterBuiltIn<T>(string type) where T : class, IStepProvider;
+    void RegisterPlugin(StepDefinition definition, IStepProvider provider);
+    Task<IStepProvider?> GetProviderAsync(string type, CancellationToken ct = default);
+    StepDefinition? GetDefinition(string type);
+    IReadOnlyList<StepDefinition> GetAllDefinitions();
+    IReadOnlyList<StepDefinition> GetBuiltInDefinitions();
+    IReadOnlyList<StepDefinition> GetPluginDefinitions();
+    bool IsRegistered(string type);
+}
+
+public interface IStepProvider
+{
+    string Type { get; }
+    string DisplayName { get; }
+    string Description { get; }
+    StepSchema ConfigSchema { get; }
+    StepCapabilities Capabilities { get; }
+
+    Task<StepResult> ExecuteAsync(StepContext context, CancellationToken ct = default);
+    Task<ValidationResult> ValidateConfigAsync(IReadOnlyDictionary<string, object?> config, CancellationToken ct = default);
+}
+```
+
+### StepDefinition Model
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Workflow.Steps;
+
+public sealed record StepDefinition
+{
+    public required string Type { get; init; }
+    public required string DisplayName { get; init; }
+    public required string Description { get; init; }
+    public required StepCategory Category { get; init; }
+    public required StepSource Source { get; init; }
+    public string? PluginId { get; init; }
+    public required StepSchema ConfigSchema { get; init; }
+    public required StepCapabilities Capabilities { get; init; }
+    public string? DocumentationUrl { get; init; }
+    public string?
IconUrl { get; init; } + public ImmutableArray Examples { get; init; } = []; +} + +public enum StepCategory +{ + Deployment, + Gate, + Approval, + Notification, + Script, + Integration, + Utility +} + +public enum StepSource +{ + BuiltIn, + Plugin +} + +public sealed record StepCapabilities +{ + public bool SupportsRetry { get; init; } = true; + public bool SupportsTimeout { get; init; } = true; + public bool SupportsCondition { get; init; } = true; + public bool RequiresAgent { get; init; } = false; + public bool IsAsync { get; init; } = false; // Requires callback to complete + public ImmutableArray RequiredPermissions { get; init; } = []; +} + +public sealed record StepExample( + string Name, + string Description, + ImmutableDictionary Config +); +``` + +### StepSchema + +```csharp +namespace StellaOps.ReleaseOrchestrator.Workflow.Steps; + +public sealed record StepSchema +{ + public ImmutableArray Properties { get; init; } = []; + public ImmutableArray Required { get; init; } = []; + + public ValidationResult Validate(IReadOnlyDictionary config) + { + var errors = new List(); + + // Check required properties + foreach (var required in Required) + { + if (!config.ContainsKey(required) || config[required] is null) + { + errors.Add($"Required property '{required}' is missing"); + } + } + + // Validate property types + foreach (var prop in Properties) + { + if (config.TryGetValue(prop.Name, out var value) && value is not null) + { + var propError = ValidateProperty(prop, value); + if (propError is not null) + { + errors.Add(propError); + } + } + } + + return errors.Count == 0 + ? ValidationResult.Success() + : ValidationResult.Failure(errors); + } + + private static string? 
ValidateProperty(StepProperty prop, object value) + { + return prop.Type switch + { + StepPropertyType.String when value is not string => + $"Property '{prop.Name}' must be a string", + + StepPropertyType.Integer when !IsInteger(value) => + $"Property '{prop.Name}' must be an integer", + + StepPropertyType.Number when !IsNumber(value) => + $"Property '{prop.Name}' must be a number", + + StepPropertyType.Boolean when value is not bool => + $"Property '{prop.Name}' must be a boolean", + + StepPropertyType.Array when value is not IEnumerable => + $"Property '{prop.Name}' must be an array", + + StepPropertyType.Object when value is not IDictionary => + $"Property '{prop.Name}' must be an object", + + _ => null + }; + } + + private static bool IsInteger(object value) => + value is int or long or short or byte; + + private static bool IsNumber(object value) => + value is int or long or short or byte or float or double or decimal; +} + +public sealed record StepProperty +{ + public required string Name { get; init; } + public required StepPropertyType Type { get; init; } + public string? Description { get; init; } + public object? Default { get; init; } + public ImmutableArray? Enum { get; init; } + public int? MinValue { get; init; } + public int? MaxValue { get; init; } + public int? MinLength { get; init; } + public int? MaxLength { get; init; } + public string? 
Pattern { get; init; } +} + +public enum StepPropertyType +{ + String, + Integer, + Number, + Boolean, + Array, + Object, + Secret +} +``` + +### StepRegistry Implementation + +```csharp +namespace StellaOps.ReleaseOrchestrator.Workflow.Steps; + +public sealed class StepRegistry : IStepRegistry +{ + private readonly ConcurrentDictionary _steps = new(); + private readonly IServiceProvider _serviceProvider; + private readonly ILogger _logger; + + public StepRegistry(IServiceProvider serviceProvider, ILogger logger) + { + _serviceProvider = serviceProvider; + _logger = logger; + } + + public void RegisterBuiltIn(string type) where T : class, IStepProvider + { + var provider = _serviceProvider.GetRequiredService(); + + var definition = new StepDefinition + { + Type = type, + DisplayName = provider.DisplayName, + Description = provider.Description, + Category = InferCategory(type), + Source = StepSource.BuiltIn, + ConfigSchema = provider.ConfigSchema, + Capabilities = provider.Capabilities + }; + + if (!_steps.TryAdd(type, (definition, provider))) + { + throw new InvalidOperationException($"Step type '{type}' is already registered"); + } + + _logger.LogInformation("Registered built-in step: {Type}", type); + } + + public void RegisterPlugin(StepDefinition definition, IStepProvider provider) + { + if (definition.Source != StepSource.Plugin) + { + throw new ArgumentException("Definition must have Plugin source"); + } + + if (!_steps.TryAdd(definition.Type, (definition, provider))) + { + throw new InvalidOperationException($"Step type '{definition.Type}' is already registered"); + } + + _logger.LogInformation( + "Registered plugin step: {Type} from {PluginId}", + definition.Type, + definition.PluginId); + } + + public Task GetProviderAsync(string type, CancellationToken ct = default) + { + return _steps.TryGetValue(type, out var entry) + ? Task.FromResult(entry.Provider) + : Task.FromResult(null); + } + + public StepDefinition? 
GetDefinition(string type) + { + return _steps.TryGetValue(type, out var entry) + ? entry.Definition + : null; + } + + public IReadOnlyList GetAllDefinitions() + { + return _steps.Values.Select(e => e.Definition).ToList().AsReadOnly(); + } + + public IReadOnlyList GetBuiltInDefinitions() + { + return _steps.Values + .Where(e => e.Definition.Source == StepSource.BuiltIn) + .Select(e => e.Definition) + .ToList() + .AsReadOnly(); + } + + public IReadOnlyList GetPluginDefinitions() + { + return _steps.Values + .Where(e => e.Definition.Source == StepSource.Plugin) + .Select(e => e.Definition) + .ToList() + .AsReadOnly(); + } + + public bool IsRegistered(string type) => _steps.ContainsKey(type); + + private static StepCategory InferCategory(string type) => + type switch + { + "deploy" or "rollback" => StepCategory.Deployment, + "security-gate" or "policy-gate" => StepCategory.Gate, + "approval" => StepCategory.Approval, + "notify" => StepCategory.Notification, + "script" => StepCategory.Script, + "wait" => StepCategory.Utility, + _ => StepCategory.Integration + }; +} +``` + +### PluginStepLoader + +```csharp +namespace StellaOps.ReleaseOrchestrator.Workflow.Steps.Plugin; + +public interface IPluginStepLoader +{ + Task LoadPluginStepsAsync(CancellationToken ct = default); + Task ReloadPluginAsync(string pluginId, CancellationToken ct = default); +} + +public sealed class PluginStepLoader : IPluginStepLoader +{ + private readonly IPluginLoader _pluginLoader; + private readonly IStepRegistry _stepRegistry; + private readonly ILogger _logger; + + public async Task LoadPluginStepsAsync(CancellationToken ct = default) + { + var plugins = await _pluginLoader.GetPluginsAsync(ct); + + foreach (var plugin in plugins) + { + try + { + await LoadPluginAsync(plugin, ct); + } + catch (Exception ex) + { + _logger.LogError(ex, + "Failed to load step plugin {PluginId}", + plugin.Manifest.Id); + } + } + } + + private async Task LoadPluginAsync(LoadedPlugin plugin, CancellationToken ct) + { 
+ var stepProviders = plugin.Instance.GetStepProviders(); + + foreach (var provider in stepProviders) + { + var definition = new StepDefinition + { + Type = provider.Type, + DisplayName = provider.DisplayName, + Description = provider.Description, + Category = plugin.Instance.Category, + Source = StepSource.Plugin, + PluginId = plugin.Manifest.Id, + ConfigSchema = provider.ConfigSchema, + Capabilities = provider.Capabilities, + DocumentationUrl = plugin.Manifest.DocumentationUrl + }; + + _stepRegistry.RegisterPlugin(definition, provider); + + _logger.LogInformation( + "Loaded step '{Type}' from plugin '{PluginId}'", + provider.Type, + plugin.Manifest.Id); + } + } + + public async Task ReloadPluginAsync(string pluginId, CancellationToken ct = default) + { + // Unregister existing steps from this plugin + var existingDefs = _stepRegistry.GetPluginDefinitions() + .Where(d => d.PluginId == pluginId) + .ToList(); + + // Note: Full unregistration would require registry modification + // For now, just log and reload (new registration will override) + + _logger.LogInformation("Reloading step plugin {PluginId}", pluginId); + + var plugin = await _pluginLoader.GetPluginAsync(pluginId, ct); + if (plugin is not null) + { + await LoadPluginAsync(plugin, ct); + } + } +} + +public interface IStepPlugin +{ + StepCategory Category { get; } + IReadOnlyList GetStepProviders(); +} +``` + +### StepRegistryInitializer + +```csharp +namespace StellaOps.ReleaseOrchestrator.Workflow.Steps; + +public sealed class StepRegistryInitializer : IHostedService +{ + private readonly IStepRegistry _registry; + private readonly IPluginStepLoader _pluginLoader; + private readonly ILogger _logger; + + public async Task StartAsync(CancellationToken ct) + { + _logger.LogInformation("Initializing step registry"); + + // Register built-in steps + _registry.RegisterBuiltIn("script"); + _registry.RegisterBuiltIn("approval"); + _registry.RegisterBuiltIn("notify"); + _registry.RegisterBuiltIn("wait"); + 
_registry.RegisterBuiltIn("security-gate"); + _registry.RegisterBuiltIn("deploy"); + _registry.RegisterBuiltIn("rollback"); + + _logger.LogInformation( + "Registered {Count} built-in steps", + _registry.GetBuiltInDefinitions().Count); + + // Load plugin steps + await _pluginLoader.LoadPluginStepsAsync(ct); + + _logger.LogInformation( + "Loaded {Count} plugin steps", + _registry.GetPluginDefinitions().Count); + } + + public Task StopAsync(CancellationToken ct) => Task.CompletedTask; +} +``` + +--- + +## Acceptance Criteria + +- [ ] Register built-in step types +- [ ] Load plugin step types +- [ ] Validate step configurations against schema +- [ ] Get step provider by type +- [ ] List all step definitions +- [ ] Filter by built-in vs plugin +- [ ] Step schema validation works +- [ ] Required property validation works +- [ ] Unit test coverage >=85% + +--- + +## Test Plan + +### Unit Tests + +| Test | Description | +|------|-------------| +| `RegisterBuiltIn_AddsStep` | Registration works | +| `RegisterPlugin_AddsStep` | Plugin registration works | +| `GetProvider_ReturnsProvider` | Lookup works | +| `GetDefinition_ReturnsDefinition` | Definition lookup works | +| `SchemaValidation_RequiredMissing_Fails` | Required check works | +| `SchemaValidation_WrongType_Fails` | Type check works | +| `SchemaValidation_Valid_Succeeds` | Valid config passes | + +### Integration Tests + +| Test | Description | +|------|-------------| +| `StepRegistryInit_E2E` | Full initialization | +| `PluginStepLoading_E2E` | Plugin step loading | + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 101_002 Plugin Registry | Internal | TODO | +| 101_003 Plugin Loader | Internal | TODO | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| IStepRegistry | TODO | | +| StepRegistry | TODO | | +| IStepProvider | TODO | | +| StepDefinition | TODO | | +| StepSchema | TODO | | +| IPluginStepLoader | TODO | | +| 
PluginStepLoader | TODO | | +| StepRegistryInitializer | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_105_003_WORKFL_dag_executor.md b/docs/implplan/SPRINT_20260110_105_003_WORKFL_dag_executor.md new file mode 100644 index 000000000..bdafc828a --- /dev/null +++ b/docs/implplan/SPRINT_20260110_105_003_WORKFL_dag_executor.md @@ -0,0 +1,719 @@ +# SPRINT: Workflow Engine - DAG Executor + +> **Sprint ID:** 105_003 +> **Module:** WORKFL +> **Phase:** 5 - Workflow Engine +> **Status:** TODO +> **Parent:** [105_000_INDEX](SPRINT_20260110_105_000_INDEX_workflow_engine.md) + +--- + +## Overview + +Implement the DAG Executor for orchestrating workflow step execution with parallel and sequential support. + +### Objectives + +- Start workflow runs from templates +- Schedule steps based on DAG dependencies +- Execute parallel steps concurrently +- Track workflow run state +- Support pause/resume/cancel + +### Working Directory + +``` +src/ReleaseOrchestrator/ +├── __Libraries/ +│ └── StellaOps.ReleaseOrchestrator.Workflow/ +│ ├── Engine/ +│ │ ├── IWorkflowEngine.cs +│ │ ├── WorkflowEngine.cs +│ │ ├── DagScheduler.cs +│ │ └── WorkflowRuntime.cs +│ ├── State/ +│ │ ├── IWorkflowStateManager.cs +│ │ ├── WorkflowStateManager.cs +│ │ └── WorkflowCheckpoint.cs +│ ├── Store/ +│ │ ├── IWorkflowRunStore.cs +│ │ └── WorkflowRunStore.cs +│ └── Models/ +│ ├── WorkflowRun.cs +│ ├── StepRun.cs +│ └── WorkflowContext.cs +└── __Tests/ + └── StellaOps.ReleaseOrchestrator.Workflow.Tests/ + └── Engine/ +``` + +--- + +## Architecture Reference + +- [Workflow Engine](../modules/release-orchestrator/modules/workflow-engine.md) +- [Data Model Schema](../modules/release-orchestrator/data-model/schema.md) + +--- + +## Deliverables + +### IWorkflowEngine Interface + +```csharp +namespace StellaOps.ReleaseOrchestrator.Workflow.Engine; + +public interface IWorkflowEngine +{ + 
Task<WorkflowRun> StartAsync(
+        Guid templateId,
+        WorkflowContext context,
+        CancellationToken ct = default);
+
+    Task<WorkflowRun> StartFromTemplateAsync(
+        WorkflowTemplate template,
+        WorkflowContext context,
+        CancellationToken ct = default);
+
+    Task ResumeAsync(Guid runId, CancellationToken ct = default);
+    Task PauseAsync(Guid runId, CancellationToken ct = default);
+    Task CancelAsync(Guid runId, string? reason = null, CancellationToken ct = default);
+    Task<WorkflowRun?> GetRunAsync(Guid runId, CancellationToken ct = default);
+    Task<IReadOnlyList<WorkflowRun>> ListRunsAsync(WorkflowRunFilter? filter = null, CancellationToken ct = default);
+    Task RetryStepAsync(Guid runId, string stepId, CancellationToken ct = default);
+    Task SkipStepAsync(Guid runId, string stepId, CancellationToken ct = default);
+}
+```
+
+### WorkflowRun Model
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Workflow.Models;
+
+public sealed record WorkflowRun
+{
+    public required Guid Id { get; init; }
+    public required Guid TenantId { get; init; }
+    public required Guid TemplateId { get; init; }
+    public required string TemplateName { get; init; }
+    public required int TemplateVersion { get; init; }
+    public required WorkflowRunStatus Status { get; init; }
+    public required WorkflowContext Context { get; init; }
+    public required ImmutableArray<StepRun> Steps { get; init; }
+    public string? FailureReason { get; init; }
+    public string? CancelReason { get; init; }
+    public DateTimeOffset StartedAt { get; init; }
+    public DateTimeOffset? CompletedAt { get; init; }
+    public DateTimeOffset? PausedAt { get; init; }
+    public TimeSpan? Duration => CompletedAt.HasValue
+        ? 
CompletedAt.Value - StartedAt
+        : null;
+    public Guid StartedBy { get; init; }
+
+    public bool IsTerminal => Status is
+        WorkflowRunStatus.Completed or
+        WorkflowRunStatus.Failed or
+        WorkflowRunStatus.Cancelled;
+}
+
+public enum WorkflowRunStatus
+{
+    Pending,
+    Running,
+    Paused,
+    WaitingForApproval,
+    Completed,
+    Failed,
+    Cancelled
+}
+
+public sealed record StepRun
+{
+    public required string StepId { get; init; }
+    public required string StepType { get; init; }
+    public required StepRunStatus Status { get; init; }
+    public int AttemptCount { get; init; }
+    public DateTimeOffset? StartedAt { get; init; }
+    public DateTimeOffset? CompletedAt { get; init; }
+    public StepResult? Result { get; init; }
+    public string? Error { get; init; }
+    public ImmutableArray<StepAttempt> Attempts { get; init; } = [];
+}
+
+public enum StepRunStatus
+{
+    Pending,
+    Ready,
+    Running,
+    WaitingForCallback,
+    Completed,
+    Failed,
+    Skipped,
+    Cancelled
+}
+
+public sealed record StepAttempt(
+    int AttemptNumber,
+    DateTimeOffset StartedAt,
+    DateTimeOffset? CompletedAt,
+    StepResult? Result,
+    string? Error
+);
+
+public sealed record WorkflowContext
+{
+    public Guid? ReleaseId { get; init; }
+    public Guid? EnvironmentId { get; init; }
+    public Guid? PromotionId { get; init; }
+    public Guid? 
DeploymentId { get; init; }
+    public ImmutableDictionary<string, string> Variables { get; init; } =
+        ImmutableDictionary<string, string>.Empty;
+    public ImmutableDictionary<string, object?> Outputs { get; init; } =
+        ImmutableDictionary<string, object?>.Empty;
+}
+```
+
+### WorkflowEngine Implementation
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Workflow.Engine;
+
+public sealed class WorkflowEngine : IWorkflowEngine
+{
+    private readonly IWorkflowTemplateService _templateService;
+    private readonly IWorkflowRunStore _runStore;
+    private readonly IWorkflowStateManager _stateManager;
+    private readonly IDagScheduler _scheduler;
+    private readonly IStepExecutor _stepExecutor;
+    private readonly IEventPublisher _eventPublisher;
+    private readonly ITenantContext _tenantContext;  // ambient tenant accessor (interface name assumed; supplies TenantId below)
+    private readonly IUserContext _userContext;      // ambient user accessor (interface name assumed; supplies UserId below)
+    private readonly TimeProvider _timeProvider;
+    private readonly IGuidGenerator _guidGenerator;
+    private readonly ILogger<WorkflowEngine> _logger;
+
+    public async Task<WorkflowRun> StartAsync(
+        Guid templateId,
+        WorkflowContext context,
+        CancellationToken ct = default)
+    {
+        var template = await _templateService.GetAsync(templateId, ct)
+            ?? 
throw new WorkflowTemplateNotFoundException(templateId); + + if (template.Status != WorkflowTemplateStatus.Published) + { + throw new WorkflowTemplateNotPublishedException(templateId); + } + + return await StartFromTemplateAsync(template, context, ct); + } + + public async Task StartFromTemplateAsync( + WorkflowTemplate template, + WorkflowContext context, + CancellationToken ct = default) + { + var now = _timeProvider.GetUtcNow(); + + var stepRuns = template.Steps.Select(step => new StepRun + { + StepId = step.Id, + StepType = step.Type, + Status = StepRunStatus.Pending, + AttemptCount = 0 + }).ToImmutableArray(); + + var run = new WorkflowRun + { + Id = _guidGenerator.NewGuid(), + TenantId = _tenantContext.TenantId, + TemplateId = template.Id, + TemplateName = template.Name, + TemplateVersion = template.Version, + Status = WorkflowRunStatus.Pending, + Context = context, + Steps = stepRuns, + StartedAt = now, + StartedBy = _userContext.UserId + }; + + await _runStore.SaveAsync(run, ct); + + await _eventPublisher.PublishAsync(new WorkflowRunStarted( + run.Id, + run.TenantId, + run.TemplateName, + run.Context.ReleaseId, + run.Context.EnvironmentId, + now + ), ct); + + _logger.LogInformation( + "Started workflow run {RunId} from template {TemplateName}", + run.Id, + template.Name); + + // Start execution + _ = ExecuteAsync(run.Id, template, ct); + + return run; + } + + private async Task ExecuteAsync( + Guid runId, + WorkflowTemplate template, + CancellationToken ct) + { + try + { + var dag = new DagBuilder().Build(template.Steps); + + await _stateManager.SetStatusAsync(runId, WorkflowRunStatus.Running, ct); + + while (!ct.IsCancellationRequested) + { + var run = await _runStore.GetAsync(runId, ct); + if (run is null || run.IsTerminal) + break; + + if (run.Status == WorkflowRunStatus.Paused) + { + await Task.Delay(TimeSpan.FromSeconds(1), ct); + continue; + } + + // Get ready steps + var completedStepIds = run.Steps + .Where(s => s.Status == StepRunStatus.Completed 
|| s.Status == StepRunStatus.Skipped) + .Select(s => s.StepId) + .ToHashSet(); + + var readySteps = _scheduler.GetReadySteps(dag, completedStepIds, run.Steps); + + if (readySteps.Count == 0) + { + // Check if all steps are complete or if we're stuck + var pendingSteps = run.Steps.Where(s => + s.Status is StepRunStatus.Pending or + StepRunStatus.Ready or + StepRunStatus.Running or + StepRunStatus.WaitingForCallback); + + if (!pendingSteps.Any()) + { + // All steps complete + await _stateManager.CompleteAsync(runId, ct); + break; + } + + // Waiting for async steps + await Task.Delay(TimeSpan.FromSeconds(1), ct); + continue; + } + + // Execute ready steps in parallel + var tasks = readySteps.Select(step => + ExecuteStepAsync(runId, step, run.Context, ct)); + + await Task.WhenAll(tasks); + } + } + catch (OperationCanceledException) when (ct.IsCancellationRequested) + { + _logger.LogInformation("Workflow run {RunId} cancelled", runId); + } + catch (Exception ex) + { + _logger.LogError(ex, "Workflow run {RunId} failed", runId); + await _stateManager.FailAsync(runId, ex.Message, ct); + } + } + + private async Task ExecuteStepAsync( + Guid runId, + WorkflowStep step, + WorkflowContext context, + CancellationToken ct) + { + try + { + await _stateManager.SetStepStatusAsync(runId, step.Id, StepRunStatus.Running, ct); + + var stepContext = new StepContext + { + RunId = runId, + StepId = step.Id, + StepType = step.Type, + Config = step.Config, + WorkflowContext = context, + Timeout = step.Timeout, + RetryConfig = step.Retry + }; + + var result = await _stepExecutor.ExecuteAsync(step, stepContext, ct); + + if (result.IsSuccess) + { + await _stateManager.CompleteStepAsync(runId, step.Id, result, ct); + } + else if (result.RequiresCallback) + { + await _stateManager.SetStepStatusAsync(runId, step.Id, + StepRunStatus.WaitingForCallback, ct); + } + else + { + await _stateManager.FailStepAsync(runId, step.Id, result.Error ?? 
"Unknown error", ct);
+            }
+        }
+        catch (Exception ex)
+        {
+            _logger.LogError(ex, "Step {StepId} failed in run {RunId}", step.Id, runId);
+            await _stateManager.FailStepAsync(runId, step.Id, ex.Message, ct);
+        }
+    }
+
+    public async Task CancelAsync(Guid runId, string? reason = null, CancellationToken ct = default)
+    {
+        var run = await _runStore.GetAsync(runId, ct)
+            ?? throw new WorkflowRunNotFoundException(runId);
+
+        if (run.IsTerminal)
+        {
+            throw new WorkflowRunAlreadyTerminalException(runId);
+        }
+
+        await _stateManager.CancelAsync(runId, reason, ct);
+
+        await _eventPublisher.PublishAsync(new WorkflowRunCancelled(
+            runId,
+            run.TenantId,
+            run.TemplateName,
+            reason,
+            _timeProvider.GetUtcNow()
+        ), ct);
+    }
+}
+```
+
+### DagScheduler
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Workflow.Engine;
+
+public interface IDagScheduler
+{
+    IReadOnlyList<WorkflowStep> GetReadySteps(
+        WorkflowDag dag,
+        IReadOnlySet<string> completedStepIds,
+        ImmutableArray<StepRun> stepRuns);
+}
+
+public sealed class DagScheduler : IDagScheduler
+{
+    public IReadOnlyList<WorkflowStep> GetReadySteps(
+        WorkflowDag dag,
+        IReadOnlySet<string> completedStepIds,
+        ImmutableArray<StepRun> stepRuns)
+    {
+        var readySteps = new List<WorkflowStep>();
+
+        foreach (var node in dag.Nodes)
+        {
+            var stepRun = stepRuns.FirstOrDefault(s => s.StepId == node.Id);
+            if (stepRun is null)
+                continue;
+
+            // Only Pending/Ready steps are candidates; running, waiting,
+            // completed, and failed steps are filtered out here.
+            if (stepRun.Status != StepRunStatus.Pending &&
+                stepRun.Status != StepRunStatus.Ready)
+                continue;
+
+            // Check if all dependencies are complete
+            if (node.IsReady(completedStepIds))
+            {
+                // Evaluate condition if present
+                if (ShouldExecute(node.Step, stepRuns))
+                {
+                    readySteps.Add(node.Step);
+                }
+            }
+        }
+
+        return readySteps.AsReadOnly();
+    }
+
+    private static bool ShouldExecute(WorkflowStep step, ImmutableArray<StepRun> stepRuns)
+    {
+        if 
(string.IsNullOrEmpty(step.Condition)) + return true; + + // Evaluate simple conditions + return step.Condition switch + { + "success()" => stepRuns + .Where(s => step.DependsOn.Contains(s.StepId)) + .All(s => s.Status == StepRunStatus.Completed), + + "failure()" => stepRuns + .Where(s => step.DependsOn.Contains(s.StepId)) + .Any(s => s.Status == StepRunStatus.Failed), + + "always()" => true, + + _ => true // Default to execute for unrecognized conditions + }; + } +} +``` + +### WorkflowStateManager + +```csharp +namespace StellaOps.ReleaseOrchestrator.Workflow.State; + +public interface IWorkflowStateManager +{ + Task SetStatusAsync(Guid runId, WorkflowRunStatus status, CancellationToken ct = default); + Task SetStepStatusAsync(Guid runId, string stepId, StepRunStatus status, CancellationToken ct = default); + Task CompleteAsync(Guid runId, CancellationToken ct = default); + Task FailAsync(Guid runId, string reason, CancellationToken ct = default); + Task CancelAsync(Guid runId, string? reason, CancellationToken ct = default); + Task CompleteStepAsync(Guid runId, string stepId, StepResult result, CancellationToken ct = default); + Task FailStepAsync(Guid runId, string stepId, string error, CancellationToken ct = default); +} + +public sealed class WorkflowStateManager : IWorkflowStateManager +{ + private readonly IWorkflowRunStore _store; + private readonly IEventPublisher _eventPublisher; + private readonly TimeProvider _timeProvider; + private readonly ILogger _logger; + + public async Task CompleteStepAsync( + Guid runId, + string stepId, + StepResult result, + CancellationToken ct = default) + { + var run = await _store.GetAsync(runId, ct) + ?? 
throw new WorkflowRunNotFoundException(runId); + + var updatedSteps = run.Steps.Select(s => + { + if (s.StepId != stepId) + return s; + + return s with + { + Status = StepRunStatus.Completed, + CompletedAt = _timeProvider.GetUtcNow(), + Result = result, + AttemptCount = s.AttemptCount + 1 + }; + }).ToImmutableArray(); + + var updatedRun = run with { Steps = updatedSteps }; + await _store.SaveAsync(updatedRun, ct); + + await _eventPublisher.PublishAsync(new WorkflowStepCompleted( + runId, + stepId, + result.Outputs, + _timeProvider.GetUtcNow() + ), ct); + + _logger.LogInformation( + "Step {StepId} completed in run {RunId}", + stepId, + runId); + } + + public async Task FailStepAsync( + Guid runId, + string stepId, + string error, + CancellationToken ct = default) + { + var run = await _store.GetAsync(runId, ct) + ?? throw new WorkflowRunNotFoundException(runId); + + var step = run.Steps.FirstOrDefault(s => s.StepId == stepId); + if (step is null) + return; + + // Check if we should retry + var template = await GetStepDefinition(run.TemplateId, stepId, ct); + var shouldRetry = template?.Retry is not null && + step.AttemptCount < template.Retry.MaxAttempts; + + var updatedSteps = run.Steps.Select(s => + { + if (s.StepId != stepId) + return s; + + return s with + { + Status = shouldRetry ? 
StepRunStatus.Pending : StepRunStatus.Failed, + Error = error, + AttemptCount = s.AttemptCount + 1 + }; + }).ToImmutableArray(); + + var updatedRun = run with { Steps = updatedSteps }; + + // If step failed and no retry, fail the workflow + if (!shouldRetry && !step.ContinueOnError) + { + updatedRun = updatedRun with + { + Status = WorkflowRunStatus.Failed, + FailureReason = $"Step {stepId} failed: {error}", + CompletedAt = _timeProvider.GetUtcNow() + }; + } + + await _store.SaveAsync(updatedRun, ct); + + if (!shouldRetry) + { + await _eventPublisher.PublishAsync(new WorkflowStepFailed( + runId, + stepId, + error, + _timeProvider.GetUtcNow() + ), ct); + } + } +} +``` + +### Domain Events + +```csharp +namespace StellaOps.ReleaseOrchestrator.Workflow.Events; + +public sealed record WorkflowRunStarted( + Guid RunId, + Guid TenantId, + string TemplateName, + Guid? ReleaseId, + Guid? EnvironmentId, + DateTimeOffset StartedAt +) : IDomainEvent; + +public sealed record WorkflowRunCompleted( + Guid RunId, + Guid TenantId, + string TemplateName, + TimeSpan Duration, + DateTimeOffset CompletedAt +) : IDomainEvent; + +public sealed record WorkflowRunFailed( + Guid RunId, + Guid TenantId, + string TemplateName, + string Reason, + DateTimeOffset FailedAt +) : IDomainEvent; + +public sealed record WorkflowRunCancelled( + Guid RunId, + Guid TenantId, + string TemplateName, + string? Reason, + DateTimeOffset CancelledAt +) : IDomainEvent; + +public sealed record WorkflowStepCompleted( + Guid RunId, + string StepId, + IReadOnlyDictionary? 
Outputs, + DateTimeOffset CompletedAt +) : IDomainEvent; + +public sealed record WorkflowStepFailed( + Guid RunId, + string StepId, + string Error, + DateTimeOffset FailedAt +) : IDomainEvent; +``` + +--- + +## Acceptance Criteria + +- [ ] Start workflow from template +- [ ] Execute steps in dependency order +- [ ] Execute independent steps in parallel +- [ ] Track workflow run state +- [ ] Pause/resume workflow +- [ ] Cancel workflow +- [ ] Retry failed step +- [ ] Skip step +- [ ] Evaluate step conditions +- [ ] Unit test coverage >=85% + +--- + +## Test Plan + +### Unit Tests + +| Test | Description | +|------|-------------| +| `StartWorkflow_CreatesRun` | Start creates run | +| `DagScheduler_RespectsOrdering` | Dependencies respected | +| `DagScheduler_ParallelSteps` | Parallel execution | +| `ExecuteStep_Success_CompletesStep` | Success handling | +| `ExecuteStep_Failure_RetriesOrFails` | Failure handling | +| `Cancel_StopsExecution` | Cancellation works | +| `Condition_Success_ExecutesStep` | Condition evaluation | +| `Condition_Failure_SkipsStep` | Conditional skip | + +### Integration Tests + +| Test | Description | +|------|-------------| +| `WorkflowExecution_E2E` | Full workflow run | +| `WorkflowRetry_E2E` | Retry behavior | + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 105_001 Workflow Template | Internal | TODO | +| 105_002 Step Registry | Internal | TODO | +| 105_004 Step Executor | Internal | TODO | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| IWorkflowEngine | TODO | | +| WorkflowEngine | TODO | | +| IDagScheduler | TODO | | +| DagScheduler | TODO | | +| IWorkflowStateManager | TODO | | +| WorkflowStateManager | TODO | | +| WorkflowRun model | TODO | | +| IWorkflowRunStore | TODO | | +| WorkflowRunStore | TODO | | +| Domain events | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 
10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_105_004_WORKFL_step_executor.md b/docs/implplan/SPRINT_20260110_105_004_WORKFL_step_executor.md new file mode 100644 index 000000000..d159dd1f5 --- /dev/null +++ b/docs/implplan/SPRINT_20260110_105_004_WORKFL_step_executor.md @@ -0,0 +1,615 @@ +# SPRINT: Step Executor + +> **Sprint ID:** 105_004 +> **Module:** WORKFL +> **Phase:** 5 - Workflow Engine +> **Status:** TODO +> **Parent:** [105_000_INDEX](SPRINT_20260110_105_000_INDEX_workflow_engine.md) + +--- + +## Overview + +Implement the Step Executor for executing individual workflow steps with retry and timeout handling. + +### Objectives + +- Execute steps with configuration validation +- Apply retry policies with exponential backoff +- Handle step timeouts +- Manage step execution context +- Support async steps with callbacks + +### Working Directory + +``` +src/ReleaseOrchestrator/ +├── __Libraries/ +│ └── StellaOps.ReleaseOrchestrator.Workflow/ +│ ├── Executor/ +│ │ ├── IStepExecutor.cs +│ │ ├── StepExecutor.cs +│ │ ├── StepContext.cs +│ │ ├── StepResult.cs +│ │ ├── StepRetryPolicy.cs +│ │ └── StepTimeoutHandler.cs +│ └── Callback/ +│ ├── IStepCallbackHandler.cs +│ ├── StepCallbackHandler.cs +│ └── CallbackToken.cs +└── __Tests/ + └── StellaOps.ReleaseOrchestrator.Workflow.Tests/ + └── Executor/ +``` + +--- + +## Architecture Reference + +- [Workflow Engine](../modules/release-orchestrator/modules/workflow-engine.md) + +--- + +## Deliverables + +### IStepExecutor Interface + +```csharp +namespace StellaOps.ReleaseOrchestrator.Workflow.Executor; + +public interface IStepExecutor +{ + Task ExecuteAsync( + WorkflowStep step, + StepContext context, + CancellationToken ct = default); +} +``` + +### StepContext Model + +```csharp +namespace StellaOps.ReleaseOrchestrator.Workflow.Executor; + +public sealed record StepContext +{ + public required Guid RunId { get; init; } + public required string StepId { get; init; } + public required string 
StepType { get; init; }
+    public required ImmutableDictionary<string, object?> Config { get; init; }  // value type assumed
+    public required WorkflowContext WorkflowContext { get; init; }
+    public TimeSpan? Timeout { get; init; }
+    public RetryConfig? RetryConfig { get; init; }
+    public int AttemptNumber { get; init; } = 1;
+
+    // For variable interpolation
+    public string Interpolate(string template)
+    {
+        var result = template;
+
+        foreach (var (key, value) in WorkflowContext.Variables)
+        {
+            result = result.Replace($"${{variables.{key}}}", value);
+        }
+
+        foreach (var (key, value) in WorkflowContext.Outputs)
+        {
+            result = result.Replace($"${{outputs.{key}}}", value?.ToString() ?? "");
+        }
+
+        // Built-in variables
+        result = result.Replace("${run.id}", RunId.ToString());
+        result = result.Replace("${step.id}", StepId);
+        result = result.Replace("${step.attempt}", AttemptNumber.ToString(CultureInfo.InvariantCulture));
+
+        if (WorkflowContext.ReleaseId.HasValue)
+            result = result.Replace("${release.id}", WorkflowContext.ReleaseId.Value.ToString());
+
+        if (WorkflowContext.EnvironmentId.HasValue)
+            result = result.Replace("${environment.id}", WorkflowContext.EnvironmentId.Value.ToString());
+
+        return result;
+    }
+}
+```
+
+### StepResult Model
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Workflow.Executor;
+
+public sealed record StepResult
+{
+    public required StepResultStatus Status { get; init; }
+    public string? Error { get; init; }
+    public ImmutableDictionary<string, object?> Outputs { get; init; } =
+        ImmutableDictionary<string, object?>.Empty;
+    public TimeSpan Duration { get; init; }
+    public bool RequiresCallback { get; init; }
+    public string? CallbackToken { get; init; }
+    public DateTimeOffset? CallbackExpiresAt { get; init; }
+
+    public bool IsSuccess => Status == StepResultStatus.Success;
+    public bool IsFailure => Status == StepResultStatus.Failed;
+
+    public static StepResult Success(
+        ImmutableDictionary<string, object?>? 
outputs = null,
+        TimeSpan duration = default) =>
+        new()
+        {
+            Status = StepResultStatus.Success,
+            Outputs = outputs ?? ImmutableDictionary<string, object?>.Empty,
+            Duration = duration
+        };
+
+    public static StepResult Failed(string error, TimeSpan duration = default) =>
+        new()
+        {
+            Status = StepResultStatus.Failed,
+            Error = error,
+            Duration = duration
+        };
+
+    public static StepResult WaitingForCallback(
+        string callbackToken,
+        DateTimeOffset expiresAt) =>
+        new()
+        {
+            Status = StepResultStatus.WaitingForCallback,
+            RequiresCallback = true,
+            CallbackToken = callbackToken,
+            CallbackExpiresAt = expiresAt
+        };
+
+    public static StepResult Skipped(string reason) =>
+        new()
+        {
+            Status = StepResultStatus.Skipped,
+            Error = reason
+        };
+}
+
+public enum StepResultStatus
+{
+    Success,
+    Failed,
+    Skipped,
+    WaitingForCallback,
+    TimedOut,
+    Cancelled
+}
+```
+
+### StepExecutor Implementation
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Workflow.Executor;
+
+public sealed class StepExecutor : IStepExecutor
+{
+    private readonly IStepRegistry _stepRegistry;
+    private readonly IStepRetryPolicy _retryPolicy;
+    private readonly IStepTimeoutHandler _timeoutHandler;
+    private readonly ILogger<StepExecutor> _logger;
+    private readonly TimeProvider _timeProvider;
+
+    public async Task<StepResult> ExecuteAsync(
+        WorkflowStep step,
+        StepContext context,
+        CancellationToken ct = default)
+    {
+        var sw = Stopwatch.StartNew();
+
+        _logger.LogInformation(
+            "Executing step {StepId} (type: {StepType}, attempt: {Attempt})",
+            step.Id,
+            step.Type,
+            context.AttemptNumber);
+
+        try
+        {
+            // Get step provider
+            var provider = await _stepRegistry.GetProviderAsync(step.Type, ct);
+            if (provider is null)
+            {
+                return StepResult.Failed($"Unknown step type: {step.Type}", sw.Elapsed);
+            }
+
+            // Validate configuration
+            var validation = await provider.ValidateConfigAsync(context.Config, ct);
+            if (!validation.IsValid)
+            {
+                return StepResult.Failed(
+                    $"Invalid configuration: {string.Join(", ", 
validation.Errors)}",
+                    sw.Elapsed);
+            }
+
+            // Apply timeout if configured
+            using var timeoutCts = context.Timeout.HasValue
+                ? new CancellationTokenSource(context.Timeout.Value)
+                : new CancellationTokenSource();
+
+            using var linkedCts = CancellationTokenSource.CreateLinkedTokenSource(
+                ct, timeoutCts.Token);
+
+            try
+            {
+                var result = await provider.ExecuteAsync(context, linkedCts.Token);
+                result = result with { Duration = sw.Elapsed };
+
+                _logger.LogInformation(
+                    "Step {StepId} completed with status {Status} in {Duration}ms",
+                    step.Id,
+                    result.Status,
+                    sw.ElapsedMilliseconds);
+
+                return result;
+            }
+            catch (OperationCanceledException) when (timeoutCts.IsCancellationRequested)
+            {
+                _logger.LogWarning(
+                    "Step {StepId} timed out after {Timeout}",
+                    step.Id,
+                    context.Timeout);
+
+                return new StepResult
+                {
+                    Status = StepResultStatus.TimedOut,
+                    Error = $"Step timed out after {context.Timeout}",
+                    Duration = sw.Elapsed
+                };
+            }
+        }
+        catch (OperationCanceledException) when (ct.IsCancellationRequested)
+        {
+            return new StepResult
+            {
+                Status = StepResultStatus.Cancelled,
+                Duration = sw.Elapsed
+            };
+        }
+        catch (Exception ex)
+        {
+            _logger.LogError(ex,
+                "Step {StepId} failed with exception",
+                step.Id);
+
+            return StepResult.Failed(ex.Message, sw.Elapsed);
+        }
+    }
+}
+```
+
+### StepRetryPolicy
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Workflow.Executor;
+
+public interface IStepRetryPolicy
+{
+    bool ShouldRetry(StepResult result, RetryConfig? config, int attemptNumber);
+    TimeSpan GetDelay(RetryConfig config, int attemptNumber);
+}
+
+public sealed class StepRetryPolicy : IStepRetryPolicy
+{
+    private static readonly HashSet<StepResultStatus> RetryableStatuses = new()
+    {
+        StepResultStatus.Failed,
+        StepResultStatus.TimedOut
+    };
+
+    public bool ShouldRetry(StepResult result, RetryConfig? 
config, int attemptNumber)
+    {
+        if (config is null)
+            return false;
+
+        if (!RetryableStatuses.Contains(result.Status))
+            return false;
+
+        if (attemptNumber >= config.MaxAttempts)
+            return false;
+
+        return true;
+    }
+
+    public TimeSpan GetDelay(RetryConfig config, int attemptNumber)
+    {
+        // Exponential backoff with jitter
+        var baseDelay = config.InitialDelay.TotalMilliseconds;
+        var exponentialDelay = baseDelay * Math.Pow(config.BackoffMultiplier, attemptNumber - 1);
+
+        // Add jitter (±20%)
+        var jitter = exponentialDelay * (Random.Shared.NextDouble() * 0.4 - 0.2);
+        var totalDelayMs = exponentialDelay + jitter;
+
+        // Cap at 5 minutes
+        var cappedDelay = Math.Min(totalDelayMs, TimeSpan.FromMinutes(5).TotalMilliseconds);
+
+        return TimeSpan.FromMilliseconds(cappedDelay);
+    }
+}
+```
+
+### StepCallbackHandler
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Workflow.Callback;
+
+public interface IStepCallbackHandler
+{
+    Task<CallbackToken> CreateCallbackAsync(
+        Guid runId,
+        string stepId,
+        TimeSpan? expiresIn = null,
+        CancellationToken ct = default);
+
+    Task<CallbackResult> ProcessCallbackAsync(
+        string token,
+        CallbackPayload payload,
+        CancellationToken ct = default);
+
+    Task<bool> ValidateCallbackAsync(
+        string token,
+        CancellationToken ct = default);
+}
+
+public sealed class StepCallbackHandler : IStepCallbackHandler
+{
+    private readonly ICallbackStore _store;
+    private readonly IWorkflowStateManager _stateManager;
+    private readonly TimeProvider _timeProvider;
+    private readonly ILogger<StepCallbackHandler> _logger;
+
+    private static readonly TimeSpan DefaultExpiry = TimeSpan.FromHours(24);
+
+    public async Task<CallbackToken> CreateCallbackAsync(
+        Guid runId,
+        string stepId,
+        TimeSpan? expiresIn = null,
+        CancellationToken ct = default)
+    {
+        var token = GenerateSecureToken();
+        var expiry = _timeProvider.GetUtcNow().Add(expiresIn ?? 
DefaultExpiry); + + var callback = new PendingCallback + { + Token = token, + RunId = runId, + StepId = stepId, + CreatedAt = _timeProvider.GetUtcNow(), + ExpiresAt = expiry + }; + + await _store.SaveAsync(callback, ct); + + return new CallbackToken(token, expiry); + } + + public async Task ProcessCallbackAsync( + string token, + CallbackPayload payload, + CancellationToken ct = default) + { + var pending = await _store.GetByTokenAsync(token, ct); + if (pending is null) + { + return CallbackResult.Failed("Invalid callback token"); + } + + if (pending.ExpiresAt < _timeProvider.GetUtcNow()) + { + return CallbackResult.Failed("Callback token expired"); + } + + if (pending.ProcessedAt.HasValue) + { + return CallbackResult.Failed("Callback already processed"); + } + + // Mark as processed + pending = pending with { ProcessedAt = _timeProvider.GetUtcNow() }; + await _store.SaveAsync(pending, ct); + + // Update step with callback result + var result = payload.Success + ? StepResult.Success(payload.Outputs?.ToImmutableDictionary()) + : StepResult.Failed(payload.Error ?? "Callback indicated failure"); + + await _stateManager.CompleteStepAsync(pending.RunId, pending.StepId, result, ct); + + _logger.LogInformation( + "Processed callback for step {StepId} in run {RunId}", + pending.StepId, + pending.RunId); + + return CallbackResult.Succeeded(pending.RunId, pending.StepId); + } + + private static string GenerateSecureToken() + { + var bytes = RandomNumberGenerator.GetBytes(32); + return Convert.ToBase64String(bytes) + .Replace("+", "-") + .Replace("/", "_") + .TrimEnd('='); + } +} + +public sealed record CallbackToken( + string Token, + DateTimeOffset ExpiresAt +); + +public sealed record CallbackPayload( + bool Success, + string? Error = null, + IReadOnlyDictionary? Outputs = null +); + +public sealed record CallbackResult( + bool IsSuccess, + string? Error = null, + Guid? RunId = null, + string? 
StepId = null +) +{ + public static CallbackResult Succeeded(Guid runId, string stepId) => + new(true, RunId: runId, StepId: stepId); + + public static CallbackResult Failed(string error) => + new(false, Error: error); +} + +public sealed record PendingCallback +{ + public required string Token { get; init; } + public required Guid RunId { get; init; } + public required string StepId { get; init; } + public required DateTimeOffset CreatedAt { get; init; } + public required DateTimeOffset ExpiresAt { get; init; } + public DateTimeOffset? ProcessedAt { get; init; } +} +``` + +### StepTimeoutHandler + +```csharp +namespace StellaOps.ReleaseOrchestrator.Workflow.Executor; + +public interface IStepTimeoutHandler +{ + Task MonitorTimeoutsAsync(CancellationToken ct = default); +} + +public sealed class StepTimeoutHandler : IStepTimeoutHandler, IHostedService +{ + private readonly IWorkflowRunStore _runStore; + private readonly IWorkflowStateManager _stateManager; + private readonly ICallbackStore _callbackStore; + private readonly TimeProvider _timeProvider; + private readonly ILogger _logger; + private Timer? 
_timer; + + public Task StartAsync(CancellationToken ct) + { + _timer = new Timer( + _ => _ = MonitorTimeoutsAsync(ct), + null, + TimeSpan.FromSeconds(30), + TimeSpan.FromSeconds(30)); + + return Task.CompletedTask; + } + + public Task StopAsync(CancellationToken ct) + { + _timer?.Change(Timeout.Infinite, 0); + return Task.CompletedTask; + } + + public async Task MonitorTimeoutsAsync(CancellationToken ct = default) + { + try + { + var now = _timeProvider.GetUtcNow(); + + // Check for expired callbacks + var expiredCallbacks = await _callbackStore.GetExpiredAsync(now, ct); + foreach (var callback in expiredCallbacks) + { + _logger.LogWarning( + "Callback expired for step {StepId} in run {RunId}", + callback.StepId, + callback.RunId); + + await _stateManager.FailStepAsync( + callback.RunId, + callback.StepId, + "Callback timed out", + ct); + } + } + catch (Exception ex) + { + _logger.LogError(ex, "Error monitoring step timeouts"); + } + } + + public void Dispose() => _timer?.Dispose(); +} +``` + +--- + +## Acceptance Criteria + +- [ ] Execute steps with configuration +- [ ] Validate step configuration before execution +- [ ] Apply timeout to step execution +- [ ] Retry failed steps with exponential backoff +- [ ] Create callback tokens for async steps +- [ ] Process callbacks and complete steps +- [ ] Monitor and fail expired callbacks +- [ ] Interpolate variables in configuration +- [ ] Unit test coverage >=85% + +--- + +## Test Plan + +### Unit Tests + +| Test | Description | +|------|-------------| +| `ExecuteStep_Success_ReturnsSuccess` | Success case | +| `ExecuteStep_InvalidConfig_Fails` | Config validation | +| `ExecuteStep_Timeout_ReturnsTimedOut` | Timeout handling | +| `RetryPolicy_ShouldRetry_ReturnsTrue` | Retry logic | +| `RetryPolicy_MaxAttempts_ReturnsFalse` | Max attempts | +| `GetDelay_ExponentialBackoff` | Backoff calculation | +| `Callback_ValidToken_Succeeds` | Callback processing | +| `Callback_ExpiredToken_Fails` | Expiry handling | +| 
`Interpolate_ReplacesVariables` | Variable interpolation | + +### Integration Tests + +| Test | Description | +|------|-------------| +| `StepExecution_E2E` | Full execution flow | +| `StepCallback_E2E` | Async callback flow | + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 105_002 Step Registry | Internal | TODO | +| 105_003 DAG Executor | Internal | TODO | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| IStepExecutor | TODO | | +| StepExecutor | TODO | | +| StepContext | TODO | | +| StepResult | TODO | | +| IStepRetryPolicy | TODO | | +| StepRetryPolicy | TODO | | +| IStepCallbackHandler | TODO | | +| StepCallbackHandler | TODO | | +| IStepTimeoutHandler | TODO | | +| StepTimeoutHandler | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_105_005_WORKFL_builtin_steps.md b/docs/implplan/SPRINT_20260110_105_005_WORKFL_builtin_steps.md new file mode 100644 index 000000000..4ce74aa4b --- /dev/null +++ b/docs/implplan/SPRINT_20260110_105_005_WORKFL_builtin_steps.md @@ -0,0 +1,771 @@ +# SPRINT: Built-in Steps + +> **Sprint ID:** 105_005 +> **Module:** WORKFL +> **Phase:** 5 - Workflow Engine +> **Status:** TODO +> **Parent:** [105_000_INDEX](SPRINT_20260110_105_000_INDEX_workflow_engine.md) + +--- + +## Overview + +Implement the built-in workflow steps for common deployment and automation tasks. + +### Objectives + +- Implement core workflow steps required for v1 release +- Define complete step type catalog (16 types total) +- Document deferral strategy for post-v1 and plugin-based steps + +### Step Type Catalog + +The Release Orchestrator supports 16 built-in step types. This sprint implements the **7 core types** required for v1; remaining types are deferred to post-v1 or delivered via the plugin SDK. 
+ +| Step Type | v1 Status | Description | +|-----------|-----------|-------------| +| `script` | **v1** | Execute shell scripts on target host | +| `approval` | **v1** | Request manual approval before proceeding | +| `notify` | **v1** | Send notifications via configured channels | +| `wait` | **v1** | Pause execution for duration/until time | +| `security-gate` | **v1** | Check vulnerability thresholds | +| `deploy` | **v1** | Trigger deployment to target environment | +| `rollback` | **v1** | Rollback to previous release version | +| `http` | Post-v1 | Make HTTP requests (API calls, webhooks) | +| `smoke-test` | Post-v1 | Run smoke tests against deployed service | +| `health-check` | Post-v1 | Custom health check beyond deploy step | +| `database-migrate` | Post-v1 | Run database migrations via agent | +| `feature-flag` | Plugin | Toggle feature flags (LaunchDarkly, Split, etc.) | +| `cache-invalidate` | Plugin | Invalidate CDN/cache (CloudFront, Fastly, etc.) | +| `metric-check` | Plugin | Query metrics (Prometheus, DataDog, etc.) | +| `dns-switch` | Plugin | Update DNS records for blue-green | +| `custom` | Plugin | User-defined plugin steps | + +> **Deferral Strategy:** Post-v1 types will be implemented in Release Orchestrator 1.1. Plugin types are delivered via the Plugin SDK (`StellaOps.Plugin.Sdk`) and can be developed by users or third parties using `IStepProviderCapability`. 
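+The routing rule implied by the catalog above — built-in v1 types resolve directly, while everything else must come from an installed plugin — can be sketched in a few lines. This is a language-agnostic illustration in Python; the registry class, method names, and error handling are hypothetical, not the actual `StellaOps.Plugin.Sdk` / `IStepProviderCapability` surface.

```python
# Hypothetical sketch of step-type routing between built-in and plugin providers.
# The real SDK is C# (IStepProviderCapability); names here are illustrative only.

BUILTIN_V1 = {"script", "approval", "notify", "wait", "security-gate", "deploy", "rollback"}

class StepRegistry:
    def __init__(self):
        self.plugins = {}  # step type -> plugin-supplied provider

    def register_plugin(self, step_type, provider):
        # Built-in v1 types are reserved and cannot be shadowed by plugins.
        if step_type in BUILTIN_V1:
            raise ValueError(f"'{step_type}' is a built-in v1 type and cannot be overridden")
        self.plugins[step_type] = provider

    def resolve(self, step_type):
        if step_type in BUILTIN_V1:
            return f"builtin:{step_type}"
        if step_type in self.plugins:
            return f"plugin:{step_type}"
        # Post-v1 types (http, smoke-test, ...) fall through here until implemented.
        raise KeyError(f"unknown step type '{step_type}' (post-v1 or plugin not installed)")
```

The key design point the catalog encodes is that plugin types extend the namespace without being able to redefine the reserved v1 semantics.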
+ +### v1 Core Step Objectives + +- Script step for executing shell commands +- Approval step for manual gates +- Notify step for sending notifications +- Wait step for time delays +- Security gate step for vulnerability checks +- Deploy step for triggering deployments +- Rollback step for reverting releases + +### Working Directory + +``` +src/ReleaseOrchestrator/ +├── __Libraries/ +│ └── StellaOps.ReleaseOrchestrator.Workflow/ +│ └── Steps.BuiltIn/ +│ ├── ScriptStepProvider.cs +│ ├── ApprovalStepProvider.cs +│ ├── NotifyStepProvider.cs +│ ├── WaitStepProvider.cs +│ ├── SecurityGateStepProvider.cs +│ ├── DeployStepProvider.cs +│ └── RollbackStepProvider.cs +└── __Tests/ + └── StellaOps.ReleaseOrchestrator.Workflow.Tests/ + └── Steps.BuiltIn/ +``` + +--- + +## Architecture Reference + +- [Workflow Engine](../modules/release-orchestrator/modules/workflow-engine.md) +- [Step Plugins](../modules/release-orchestrator/plugins/step-plugins.md) + +--- + +## Deliverables + +### ScriptStepProvider + +```csharp +namespace StellaOps.ReleaseOrchestrator.Workflow.Steps.BuiltIn; + +public sealed class ScriptStepProvider : IStepProvider +{ + private readonly IAgentManager _agentManager; + private readonly ILogger _logger; + + public string Type => "script"; + public string DisplayName => "Script"; + public string Description => "Execute a shell script or command on target host"; + + public StepSchema ConfigSchema => new() + { + Properties = + [ + new StepProperty { Name = "script", Type = StepPropertyType.String, Description = "Script content or path" }, + new StepProperty { Name = "shell", Type = StepPropertyType.String, Default = "bash", Description = "Shell to use (bash, sh, powershell)" }, + new StepProperty { Name = "workingDir", Type = StepPropertyType.String, Description = "Working directory" }, + new StepProperty { Name = "environment", Type = StepPropertyType.Object, Description = "Environment variables" }, + new StepProperty { Name = "timeout", Type = 
StepPropertyType.Integer, Default = 300, Description = "Timeout in seconds" }, + new StepProperty { Name = "failOnNonZero", Type = StepPropertyType.Boolean, Default = true, Description = "Fail if exit code is non-zero" } + ], + Required = ["script"] + }; + + public StepCapabilities Capabilities => new() + { + SupportsRetry = true, + SupportsTimeout = true, + RequiresAgent = true + }; + + public async Task<StepResult> ExecuteAsync(StepContext context, CancellationToken ct) + { + var script = context.Interpolate(context.Config.GetValueOrDefault("script")?.ToString() ?? ""); + var shell = context.Config.GetValueOrDefault("shell")?.ToString() ?? "bash"; + var workingDir = context.Config.GetValueOrDefault("workingDir")?.ToString(); + var timeout = context.Config.GetValueOrDefault("timeout") is int t ? t : 300; + var failOnNonZero = context.Config.GetValueOrDefault("failOnNonZero") as bool? ?? true; + + var environmentId = context.WorkflowContext.EnvironmentId + ?? throw new InvalidOperationException("Script step requires an environment"); + + // Get agent for target + var agent = await GetAgentForEnvironment(environmentId, ct); + + var task = new ScriptExecutionTask + { + Script = script, + Shell = shell, + WorkingDirectory = workingDir, + TimeoutSeconds = timeout, + Environment = ExtractEnvironment(context.Config) + }; + + var result = await _agentManager.ExecuteTaskAsync(agent.Id, task, ct); + + if (result.ExitCode != 0 && failOnNonZero) + { + return StepResult.Failed( + $"Script exited with code {result.ExitCode}: {result.Stderr}"); + } + + return StepResult.Success(new Dictionary<string, object> + { + ["exitCode"] = result.ExitCode, + ["stdout"] = result.Stdout, + ["stderr"] = result.Stderr + }.ToImmutableDictionary()); + } + + public Task<ValidationResult> ValidateConfigAsync( + IReadOnlyDictionary<string, object> config, + CancellationToken ct) => + Task.FromResult(ConfigSchema.Validate(config)); +} +``` + +### ApprovalStepProvider + +```csharp +namespace StellaOps.ReleaseOrchestrator.Workflow.Steps.BuiltIn; + +public sealed 
class ApprovalStepProvider : IStepProvider +{ + private readonly IApprovalService _approvalService; + private readonly IStepCallbackHandler _callbackHandler; + private readonly ILogger _logger; + + public string Type => "approval"; + public string DisplayName => "Approval"; + public string Description => "Request manual approval before proceeding"; + + public StepSchema ConfigSchema => new() + { + Properties = + [ + new StepProperty { Name = "approvers", Type = StepPropertyType.Array, Description = "List of approver user IDs or group names" }, + new StepProperty { Name = "minApprovals", Type = StepPropertyType.Integer, Default = 1, Description = "Minimum approvals required" }, + new StepProperty { Name = "message", Type = StepPropertyType.String, Description = "Message to display to approvers" }, + new StepProperty { Name = "timeout", Type = StepPropertyType.Integer, Default = 86400, Description = "Timeout in seconds (default 24h)" }, + new StepProperty { Name = "autoApproveOnTimeout", Type = StepPropertyType.Boolean, Default = false, Description = "Auto-approve if timeout reached" } + ], + Required = ["approvers"] + }; + + public StepCapabilities Capabilities => new() + { + SupportsRetry = false, + SupportsTimeout = true, + IsAsync = true + }; + + public async Task<StepResult> ExecuteAsync(StepContext context, CancellationToken ct) + { + var approvers = context.Config.GetValueOrDefault("approvers") as IEnumerable<object> + ?? throw new InvalidOperationException("Approvers required"); + var minApprovals = context.Config.GetValueOrDefault("minApprovals") as int? ?? 1; + var message = context.Interpolate( + context.Config.GetValueOrDefault("message")?.ToString() ?? "Approval required"); + var timeoutSeconds = context.Config.GetValueOrDefault("timeout") as int? ?? 
86400; + + // Create callback token + var callback = await _callbackHandler.CreateCallbackAsync( + context.RunId, + context.StepId, + TimeSpan.FromSeconds(timeoutSeconds), + ct); + + // Create approval request + var approval = await _approvalService.CreateAsync(new CreateApprovalRequest + { + WorkflowRunId = context.RunId, + StepId = context.StepId, + Message = message, + Approvers = approvers.Select(a => a.ToString()!).ToList(), + MinApprovals = minApprovals, + ExpiresAt = callback.ExpiresAt, + CallbackToken = callback.Token, + ReleaseId = context.WorkflowContext.ReleaseId, + EnvironmentId = context.WorkflowContext.EnvironmentId + }, ct); + + _logger.LogInformation( + "Created approval request {ApprovalId} for step {StepId}", + approval.Id, + context.StepId); + + return StepResult.WaitingForCallback(callback.Token, callback.ExpiresAt); + } + + public Task<ValidationResult> ValidateConfigAsync( + IReadOnlyDictionary<string, object> config, + CancellationToken ct) => + Task.FromResult(ConfigSchema.Validate(config)); +} +``` + +### NotifyStepProvider + +```csharp +namespace StellaOps.ReleaseOrchestrator.Workflow.Steps.BuiltIn; + +public sealed class NotifyStepProvider : IStepProvider +{ + private readonly INotificationService _notificationService; + private readonly ILogger _logger; + + public string Type => "notify"; + public string DisplayName => "Notify"; + public string Description => "Send notifications via configured channels"; + + public StepSchema ConfigSchema => new() + { + Properties = + [ + new StepProperty { Name = "channel", Type = StepPropertyType.String, Description = "Notification channel (slack, teams, email, webhook)" }, + new StepProperty { Name = "message", Type = StepPropertyType.String, Description = "Message content" }, + new StepProperty { Name = "title", Type = StepPropertyType.String, Description = "Message title" }, + new StepProperty { Name = "recipients", Type = StepPropertyType.Array, Description = "Recipient addresses/channels" }, + new StepProperty { Name = "severity", 
Type = StepPropertyType.String, Default = "info", Description = "Message severity (info, warning, error)" }, + new StepProperty { Name = "template", Type = StepPropertyType.String, Description = "Named template to use" } + ], + Required = ["channel", "message"] + }; + + public StepCapabilities Capabilities => new() + { + SupportsRetry = true, + SupportsTimeout = true + }; + + public async Task<StepResult> ExecuteAsync(StepContext context, CancellationToken ct) + { + var channel = context.Config.GetValueOrDefault("channel")?.ToString() + ?? throw new InvalidOperationException("Channel required"); + var message = context.Interpolate( + context.Config.GetValueOrDefault("message")?.ToString() ?? ""); + var title = context.Interpolate( + context.Config.GetValueOrDefault("title")?.ToString() ?? "Workflow Notification"); + var severity = context.Config.GetValueOrDefault("severity")?.ToString() ?? "info"; + + var recipients = context.Config.GetValueOrDefault("recipients") as IEnumerable<object>; + + var notification = new NotificationRequest + { + Channel = channel, + Title = title, + Message = message, + Severity = Enum.Parse<NotificationSeverity>(severity, ignoreCase: true), + Recipients = recipients?.Select(r => r.ToString()!).ToList(), + Metadata = new Dictionary<string, string> + { + ["workflowRunId"] = context.RunId.ToString(), + ["stepId"] = context.StepId, + ["releaseId"] = context.WorkflowContext.ReleaseId?.ToString() ?? "", + ["environmentId"] = context.WorkflowContext.EnvironmentId?.ToString() ?? 
"" + } + }; + + await _notificationService.SendAsync(notification, ct); + + _logger.LogInformation( + "Sent {Channel} notification for step {StepId}", + channel, + context.StepId); + + return StepResult.Success(); + } + + public Task ValidateConfigAsync( + IReadOnlyDictionary config, + CancellationToken ct) => + Task.FromResult(ConfigSchema.Validate(config)); +} +``` + +### WaitStepProvider + +```csharp +namespace StellaOps.ReleaseOrchestrator.Workflow.Steps.BuiltIn; + +public sealed class WaitStepProvider : IStepProvider +{ + public string Type => "wait"; + public string DisplayName => "Wait"; + public string Description => "Pause workflow execution for specified duration"; + + public StepSchema ConfigSchema => new() + { + Properties = + [ + new StepProperty { Name = "duration", Type = StepPropertyType.Integer, Description = "Wait duration in seconds" }, + new StepProperty { Name = "until", Type = StepPropertyType.String, Description = "Wait until specific time (ISO 8601)" } + ] + }; + + public StepCapabilities Capabilities => new() + { + SupportsRetry = false, + SupportsTimeout = false + }; + + public async Task ExecuteAsync(StepContext context, CancellationToken ct) + { + TimeSpan waitDuration; + + if (context.Config.TryGetValue("duration", out var durationObj) && durationObj is int duration) + { + waitDuration = TimeSpan.FromSeconds(duration); + } + else if (context.Config.TryGetValue("until", out var untilObj) && + untilObj is string untilStr && + DateTimeOffset.TryParse(untilStr, CultureInfo.InvariantCulture, + DateTimeStyles.AssumeUniversal, out var until)) + { + waitDuration = until - TimeProvider.System.GetUtcNow(); + if (waitDuration < TimeSpan.Zero) + waitDuration = TimeSpan.Zero; + } + else + { + return StepResult.Failed("Either 'duration' or 'until' must be specified"); + } + + if (waitDuration > TimeSpan.Zero) + { + await Task.Delay(waitDuration, ct); + } + + return StepResult.Success(new Dictionary + { + ["waitedSeconds"] = 
(int)waitDuration.TotalSeconds + }.ToImmutableDictionary()); + } + + public Task<ValidationResult> ValidateConfigAsync( + IReadOnlyDictionary<string, object> config, + CancellationToken ct) + { + if (!config.ContainsKey("duration") && !config.ContainsKey("until")) + { + return Task.FromResult(ValidationResult.Failure( + "Either 'duration' or 'until' must be specified")); + } + return Task.FromResult(ValidationResult.Success()); + } +} +``` + +### SecurityGateStepProvider + +```csharp +namespace StellaOps.ReleaseOrchestrator.Workflow.Steps.BuiltIn; + +public sealed class SecurityGateStepProvider : IStepProvider +{ + private readonly IReleaseManager _releaseManager; + private readonly IScannerService _scannerService; + private readonly ILogger _logger; + + public string Type => "security-gate"; + public string DisplayName => "Security Gate"; + public string Description => "Check security vulnerabilities meet thresholds"; + + public StepSchema ConfigSchema => new() + { + Properties = + [ + new StepProperty { Name = "maxCritical", Type = StepPropertyType.Integer, Default = 0, Description = "Maximum critical vulnerabilities allowed" }, + new StepProperty { Name = "maxHigh", Type = StepPropertyType.Integer, Default = 5, Description = "Maximum high vulnerabilities allowed" }, + new StepProperty { Name = "maxMedium", Type = StepPropertyType.Integer, Description = "Maximum medium vulnerabilities allowed" }, + new StepProperty { Name = "requireScan", Type = StepPropertyType.Boolean, Default = true, Description = "Require scan to exist" }, + new StepProperty { Name = "maxAge", Type = StepPropertyType.Integer, Default = 86400, Description = "Max scan age in seconds" } + ] + }; + + public StepCapabilities Capabilities => new() + { + SupportsRetry = true, + SupportsTimeout = true, + RequiredPermissions = ["release:read", "scanner:read"] + }; + + public async Task<StepResult> ExecuteAsync(StepContext context, CancellationToken ct) + { + var releaseId = context.WorkflowContext.ReleaseId + ?? 
throw new InvalidOperationException("Security gate requires a release"); + + var maxCritical = context.Config.GetValueOrDefault("maxCritical") as int? ?? 0; + var maxHigh = context.Config.GetValueOrDefault("maxHigh") as int? ?? 5; + var maxMedium = context.Config.GetValueOrDefault("maxMedium") as int?; + var requireScan = context.Config.GetValueOrDefault("requireScan") as bool? ?? true; + var maxAgeSeconds = context.Config.GetValueOrDefault("maxAge") as int? ?? 86400; + + var release = await _releaseManager.GetAsync(releaseId, ct) + ?? throw new ReleaseNotFoundException(releaseId); + + var violations = new List(); + var totalCritical = 0; + var totalHigh = 0; + var totalMedium = 0; + + foreach (var component in release.Components) + { + var scanResult = await _scannerService.GetLatestScanAsync( + component.Digest, ct); + + if (scanResult is null) + { + if (requireScan) + { + violations.Add($"No scan found for {component.ComponentName}"); + } + continue; + } + + var scanAge = TimeProvider.System.GetUtcNow() - scanResult.CompletedAt; + if (scanAge.TotalSeconds > maxAgeSeconds) + { + violations.Add($"Scan for {component.ComponentName} is too old ({scanAge.TotalHours:F1}h)"); + } + + totalCritical += scanResult.CriticalCount; + totalHigh += scanResult.HighCount; + totalMedium += scanResult.MediumCount; + } + + // Check thresholds + if (totalCritical > maxCritical) + { + violations.Add($"Critical vulnerabilities ({totalCritical}) exceed threshold ({maxCritical})"); + } + + if (totalHigh > maxHigh) + { + violations.Add($"High vulnerabilities ({totalHigh}) exceed threshold ({maxHigh})"); + } + + if (maxMedium.HasValue && totalMedium > maxMedium.Value) + { + violations.Add($"Medium vulnerabilities ({totalMedium}) exceed threshold ({maxMedium})"); + } + + if (violations.Count > 0) + { + return StepResult.Failed(string.Join("; ", violations)); + } + + _logger.LogInformation( + "Security gate passed for release {ReleaseId}: {Critical}C/{High}H/{Medium}M", + releaseId, + 
totalCritical, + totalHigh, + totalMedium); + + return StepResult.Success(new Dictionary<string, object> + { + ["criticalCount"] = totalCritical, + ["highCount"] = totalHigh, + ["mediumCount"] = totalMedium, + ["componentsScanned"] = release.Components.Length + }.ToImmutableDictionary()); + } + + public Task<ValidationResult> ValidateConfigAsync( + IReadOnlyDictionary<string, object> config, + CancellationToken ct) => + Task.FromResult(ConfigSchema.Validate(config)); +} +``` + +### DeployStepProvider + +```csharp +namespace StellaOps.ReleaseOrchestrator.Workflow.Steps.BuiltIn; + +public sealed class DeployStepProvider : IStepProvider +{ + private readonly IDeploymentService _deploymentService; + private readonly IStepCallbackHandler _callbackHandler; + private readonly ILogger _logger; + + public string Type => "deploy"; + public string DisplayName => "Deploy"; + public string Description => "Deploy release to target environment"; + + public StepSchema ConfigSchema => new() + { + Properties = + [ + new StepProperty { Name = "strategy", Type = StepPropertyType.String, Default = "rolling", Description = "Deployment strategy (rolling, blue-green, canary)" }, + new StepProperty { Name = "batchSize", Type = StepPropertyType.String, Default = "25%", Description = "Batch size for rolling deploys" }, + new StepProperty { Name = "timeout", Type = StepPropertyType.Integer, Default = 3600, Description = "Deployment timeout in seconds" }, + new StepProperty { Name = "healthCheck", Type = StepPropertyType.Boolean, Default = true, Description = "Wait for health checks" }, + new StepProperty { Name = "rollbackOnFailure", Type = StepPropertyType.Boolean, Default = true, Description = "Auto-rollback on failure" } + ] + }; + + public StepCapabilities Capabilities => new() + { + SupportsRetry = false, // Deployments should not auto-retry + SupportsTimeout = true, + IsAsync = true, + RequiresAgent = true, + RequiredPermissions = ["deployment:create", "environment:deploy"] + }; + + public async Task<StepResult> ExecuteAsync(StepContext context, 
CancellationToken ct) + { + var releaseId = context.WorkflowContext.ReleaseId + ?? throw new InvalidOperationException("Deploy step requires a release"); + var environmentId = context.WorkflowContext.EnvironmentId + ?? throw new InvalidOperationException("Deploy step requires an environment"); + + var strategy = context.Config.GetValueOrDefault("strategy")?.ToString() ?? "rolling"; + var batchSize = context.Config.GetValueOrDefault("batchSize")?.ToString() ?? "25%"; + var timeoutSeconds = context.Config.GetValueOrDefault("timeout") as int? ?? 3600; + var healthCheck = context.Config.GetValueOrDefault("healthCheck") as bool? ?? true; + var rollbackOnFailure = context.Config.GetValueOrDefault("rollbackOnFailure") as bool? ?? true; + + // Create callback for deployment completion + var callback = await _callbackHandler.CreateCallbackAsync( + context.RunId, + context.StepId, + TimeSpan.FromSeconds(timeoutSeconds), + ct); + + // Start deployment + var deployment = await _deploymentService.CreateAsync(new CreateDeploymentRequest + { + ReleaseId = releaseId, + EnvironmentId = environmentId, + Strategy = Enum.Parse<DeploymentStrategy>(strategy, ignoreCase: true), + BatchSize = batchSize, + WaitForHealthCheck = healthCheck, + RollbackOnFailure = rollbackOnFailure, + WorkflowRunId = context.RunId, + CallbackToken = callback.Token + }, ct); + + _logger.LogInformation( + "Started deployment {DeploymentId} for release {ReleaseId} to environment {EnvironmentId}", + deployment.Id, + releaseId, + environmentId); + + return StepResult.WaitingForCallback(callback.Token, callback.ExpiresAt); + } + + public Task<ValidationResult> ValidateConfigAsync( + IReadOnlyDictionary<string, object> config, + CancellationToken ct) => + Task.FromResult(ConfigSchema.Validate(config)); +} +``` + +### RollbackStepProvider + +```csharp +namespace StellaOps.ReleaseOrchestrator.Workflow.Steps.BuiltIn; + +public sealed class RollbackStepProvider : IStepProvider +{ + private readonly IDeploymentService _deploymentService; + private readonly IReleaseCatalog 
_releaseCatalog; + private readonly IStepCallbackHandler _callbackHandler; + private readonly ILogger _logger; + + public string Type => "rollback"; + public string DisplayName => "Rollback"; + public string Description => "Rollback to previous release version"; + + public StepSchema ConfigSchema => new() + { + Properties = + [ + new StepProperty { Name = "targetRelease", Type = StepPropertyType.String, Description = "Specific release ID to rollback to (optional)" }, + new StepProperty { Name = "skipCount", Type = StepPropertyType.Integer, Default = 1, Description = "Number of releases to skip back" }, + new StepProperty { Name = "timeout", Type = StepPropertyType.Integer, Default = 1800, Description = "Rollback timeout in seconds" } + ] + }; + + public StepCapabilities Capabilities => new() + { + SupportsRetry = false, + SupportsTimeout = true, + IsAsync = true, + RequiredPermissions = ["deployment:rollback"] + }; + + public async Task<StepResult> ExecuteAsync(StepContext context, CancellationToken ct) + { + var environmentId = context.WorkflowContext.EnvironmentId + ?? throw new InvalidOperationException("Rollback step requires an environment"); + + var targetReleaseId = context.Config.GetValueOrDefault("targetRelease")?.ToString(); + var skipCount = context.Config.GetValueOrDefault("skipCount") as int? ?? 1; + var timeoutSeconds = context.Config.GetValueOrDefault("timeout") as int? ?? 
1800; + + Guid rollbackToReleaseId; + + if (!string.IsNullOrEmpty(targetReleaseId)) + { + rollbackToReleaseId = Guid.Parse(targetReleaseId); + } + else + { + // Get deployment history and find previous release + var history = await _releaseCatalog.GetEnvironmentHistoryAsync( + environmentId, skipCount + 1, ct); + + if (history.Count <= skipCount) + { + return StepResult.Failed("No previous release to rollback to"); + } + + rollbackToReleaseId = history[skipCount].ReleaseId; + } + + var callback = await _callbackHandler.CreateCallbackAsync( + context.RunId, + context.StepId, + TimeSpan.FromSeconds(timeoutSeconds), + ct); + + var deployment = await _deploymentService.RollbackAsync(new RollbackRequest + { + EnvironmentId = environmentId, + TargetReleaseId = rollbackToReleaseId, + WorkflowRunId = context.RunId, + CallbackToken = callback.Token + }, ct); + + _logger.LogInformation( + "Started rollback to release {ReleaseId} in environment {EnvironmentId}", + rollbackToReleaseId, + environmentId); + + return StepResult.WaitingForCallback(callback.Token, callback.ExpiresAt); + } + + public Task<ValidationResult> ValidateConfigAsync( + IReadOnlyDictionary<string, object> config, + CancellationToken ct) => + Task.FromResult(ValidationResult.Success()); +} +``` + +--- + +## Acceptance Criteria + +- [ ] Script step executes commands via agent +- [ ] Approval step creates approval request +- [ ] Approval completes via callback +- [ ] Notify step sends to configured channels +- [ ] Wait step delays execution +- [ ] Security gate checks vulnerability thresholds +- [ ] Deploy step triggers deployment +- [ ] Rollback step reverts to previous release +- [ ] All steps validate configuration +- [ ] Unit test coverage >=85% + +--- + +## Test Plan + +### Unit Tests + +| Test | Description | +|------|-------------| +| `ScriptStep_ExecutesViaAgent` | Script execution | +| `ApprovalStep_CreatesRequest` | Approval creation | +| `ApprovalStep_CallbackCompletes` | Callback handling | +| `NotifyStep_SendsNotification` | 
Notification sending | +| `WaitStep_DelaysExecution` | Wait behavior | +| `SecurityGate_PassesThreshold` | Pass case | +| `SecurityGate_FailsThreshold` | Fail case | +| `DeployStep_TriggersDeployment` | Deployment trigger | +| `RollbackStep_FindsPreviousRelease` | Rollback logic | + +### Integration Tests + +| Test | Description | +|------|-------------| +| `ScriptStep_E2E` | Full script execution | +| `ApprovalWorkflow_E2E` | Approval flow | +| `DeploymentWorkflow_E2E` | Deploy flow | + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 105_004 Step Executor | Internal | TODO | +| 103_003 Agent Manager | Internal | TODO | +| 107_* Deployment Execution | Internal | TODO | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| ScriptStepProvider | TODO | | +| ApprovalStepProvider | TODO | | +| NotifyStepProvider | TODO | | +| WaitStepProvider | TODO | | +| SecurityGateStepProvider | TODO | | +| DeployStepProvider | TODO | | +| RollbackStepProvider | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | +| 10-Jan-2026 | Added Step Type Catalog (16 types) with v1/post-v1/plugin deferral strategy | diff --git a/docs/implplan/SPRINT_20260110_106_000_INDEX_promotion_gates.md b/docs/implplan/SPRINT_20260110_106_000_INDEX_promotion_gates.md new file mode 100644 index 000000000..b97eae6f4 --- /dev/null +++ b/docs/implplan/SPRINT_20260110_106_000_INDEX_promotion_gates.md @@ -0,0 +1,254 @@ +# SPRINT INDEX: Phase 6 - Promotion & Gates + +> **Epic:** Release Orchestrator +> **Phase:** 6 - Promotion & Gates +> **Batch:** 106 +> **Status:** TODO +> **Parent:** [100_000_INDEX](SPRINT_20260110_100_000_INDEX_release_orchestrator.md) + +--- + +## Overview + +Phase 6 implements the Promotion system - managing release promotions between environments with approval workflows and policy gates. 
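+To make the combination rule concrete, here is a minimal sketch of how gate results and approvals could be folded into a single allow/deny decision, including separation of duties (the requester's own approval does not count). This is an illustrative Python sketch under assumed semantics, not the `DecisionEngine` implementation planned in 106_005.

```python
# Hypothetical decision logic: all gates must pass AND enough distinct
# approvers (excluding the requester) must approve. Names are illustrative.

def decide(gate_results, approvals, requester, min_approvals=1):
    # Separation of duties: the requester may not approve their own promotion.
    valid_approvers = {a for a in approvals if a != requester}
    failed_gates = sorted(g for g, passed in gate_results.items() if not passed)
    if failed_gates:
        return ("deny", "gates failed: " + ", ".join(failed_gates))
    if len(valid_approvers) < min_approvals:
        return ("deny", f"approvals {len(valid_approvers)}/{min_approvals}")
    return ("allow", "all gates passed, approvals satisfied")
```

Note that gate failures short-circuit before approvals are counted, matching the intent that a blocking gate denies the promotion regardless of sign-off.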
+ +### Objectives + +- Promotion manager for promotion requests +- Approval gateway with multi-approver support +- Gate registry for built-in and plugin gates +- Security gate with vulnerability thresholds +- Decision engine combining gates and approvals + +--- + +## Sprint Structure + +| Sprint ID | Title | Module | Status | Dependencies | +|-----------|-------|--------|--------|--------------| +| 106_001 | Promotion Manager | PROMOT | TODO | 104_003, 103_001 | +| 106_002 | Approval Gateway | PROMOT | TODO | 106_001 | +| 106_003 | Gate Registry | PROMOT | TODO | 106_001 | +| 106_004 | Security Gate | PROMOT | TODO | 106_003 | +| 106_005 | Decision Engine | PROMOT | TODO | 106_002, 106_003 | + +--- + +## Architecture + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ PROMOTION & GATES │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ PROMOTION MANAGER (106_001) │ │ +│ │ │ │ +│ │ Promotion Request ──────────────────────────────────────┐ │ │ +│ │ │ release_id: uuid │ │ │ +│ │ │ source_environment: staging │ │ │ +│ │ │ target_environment: production │ │ │ +│ │ │ requested_by: user-123 │ │ │ +│ │ │ reason: "Release v2.3.1 for Q1 launch" │ │ │ +│ │ └───────────────────────────────────────────────────────┘ │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ APPROVAL GATEWAY (106_002) │ │ +│ │ │ │ +│ │ ┌─────────────────────────────────────────────────────────────────┐ │ │ +│ │ │ Approval Flow │ │ │ +│ │ │ │ │ │ +│ │ │ pending ──► awaiting_approval ──┬──► approved ──► deploying │ │ │ +│ │ │ │ │ │ │ +│ │ │ └──► rejected │ │ │ +│ │ │ │ │ │ +│ │ │ Separation of Duties: requester ≠ approver │ │ │ +│ │ │ Multi-approval: 2 of 3 approvers required │ │ │ +│ │ └─────────────────────────────────────────────────────────────────┘ │ │ +│ 
└───────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ GATE REGISTRY (106_003) │ │ +│ │ │ │ +│ │ Built-in Gates: Plugin Gates: │ │ +│ │ ├── security-gate ├── compliance-gate │ │ +│ │ ├── approval-gate ├── change-window-gate │ │ +│ │ ├── freeze-window-gate └── custom-policy-gate │ │ +│ │ └── manual-gate │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ SECURITY GATE (106_004) │ │ +│ │ │ │ +│ │ Config: Result: │ │ +│ │ ├── max_critical: 0 ├── passed: false │ │ +│ │ ├── max_high: 5 ├── blocking: true │ │ +│ │ ├── max_medium: -1 ├── message: "3 critical vulns found" │ │ +│ │ └── require_sbom: true └── details: { critical: 3, high: 2 } │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ DECISION ENGINE (106_005) │ │ +│ │ │ │ +│ │ Input: Promotion + Gates + Approvals │ │ +│ │ │ │ +│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │ +│ │ │ Security Gate│ │Approval Gate │ │ Freeze Gate │ │ │ +│ │ │ ✓ PASS │ │ ✓ PASS │ │ ✓ PASS │ │ │ +│ │ └──────────────┘ └──────────────┘ └──────────────┘ │ │ +│ │ │ │ │ +│ │ ▼ │ │ +│ │ Decision: ALLOW ───────────────────────────────────────────────────►│ │ +│ │ │ │ +│ │ Decision Record: { gates: [...], approvals: [...], decision: allow } │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +--- + +## Deliverables Summary + +### 106_001: Promotion Manager + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `IPromotionManager` | Interface | Promotion operations | +| `PromotionManager` | Class | Implementation | +| `Promotion` | Model | 
Promotion entity | +| `PromotionValidator` | Class | Business rules | + +### 106_002: Approval Gateway + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `IApprovalGateway` | Interface | Approval operations | +| `ApprovalGateway` | Class | Implementation | +| `Approval` | Model | Approval record | +| `SeparationOfDuties` | Class | SoD enforcement | + +### 106_003: Gate Registry + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `IGateRegistry` | Interface | Gate lookup | +| `GateRegistry` | Class | Implementation | +| `GateDefinition` | Model | Gate metadata | +| `GateEvaluator` | Class | Execute gates | + +### 106_004: Security Gate + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `SecurityGate` | Gate | Vulnerability threshold gate | +| `SecurityGateConfig` | Config | Threshold configuration | +| `VulnerabilityCounter` | Class | Count by severity | + +### 106_005: Decision Engine + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `IDecisionEngine` | Interface | Decision evaluation | +| `DecisionEngine` | Class | Implementation | +| `DecisionRecord` | Model | Decision with evidence | +| `DecisionRules` | Class | Gate combination rules | + +--- + +## Key Interfaces + +```csharp +public interface IPromotionManager +{ + Task<Promotion> RequestAsync(PromotionRequest request, CancellationToken ct); + Task<Promotion> ApproveAsync(Guid promotionId, ApprovalRequest request, CancellationToken ct); + Task<Promotion> RejectAsync(Guid promotionId, RejectionRequest request, CancellationToken ct); + Task CancelAsync(Guid promotionId, CancellationToken ct); + Task<Promotion?> GetAsync(Guid promotionId, CancellationToken ct); + Task<IReadOnlyList<Promotion>> ListPendingAsync(Guid? 
environmentId, CancellationToken ct); +} + +public interface IDecisionEngine +{ + Task<DecisionRecord> EvaluateAsync(Guid promotionId, CancellationToken ct); + Task<GateResult> EvaluateGateAsync(Guid promotionId, string gateName, CancellationToken ct); +} + +public interface IGateProvider +{ + string GateType { get; } + Task<GateResult> EvaluateAsync(GateContext context, CancellationToken ct); +} +``` + +--- + +## Promotion State Machine + +``` +┌─────────┐ +│ pending │ +└────┬────┘ + │ submit + ▼ +┌───────────────────┐ +│ awaiting_approval │◄─────────┐ +└─────────┬─────────┘ │ + │ │ + ┌─────┴─────┐ more approvers + │ │ needed + ▼ ▼ │ +┌────────┐ ┌────────┐ │ +│approved│ │rejected│ │ +└───┬────┘ └────────┘ │ + │ │ + │ gates pass │ + ▼ │ +┌──────────┐ │ +│ deploying│───────────────────┘ +└────┬─────┘ rollback + │ + ├──────────────┐ + ▼ ▼ +┌────────┐ ┌────────┐ +│deployed│ │ failed │ +└────────┘ └───┬────┘ + │ + ▼ + ┌───────────┐ + │rolled_back│ + └───────────┘ +``` + +--- + +## Dependencies + +| Module | Purpose | +|--------|---------| +| 104_003 Release Manager | Release to promote | +| 103_001 Environment CRUD | Target environment | +| Scanner | Security data | + +--- + +## Acceptance Criteria + +- [ ] Promotion request created +- [ ] Approval flow works +- [ ] Separation of duties enforced +- [ ] Multiple approvers supported +- [ ] Security gate blocks on vulns +- [ ] Freeze window blocks promotions +- [ ] Decision record captured +- [ ] Gate results aggregated +- [ ] Unit test coverage ≥80% + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Phase 6 index created | diff --git a/docs/implplan/SPRINT_20260110_106_001_PROMOT_promotion_manager.md b/docs/implplan/SPRINT_20260110_106_001_PROMOT_promotion_manager.md new file mode 100644 index 000000000..140c88cbf --- /dev/null +++ b/docs/implplan/SPRINT_20260110_106_001_PROMOT_promotion_manager.md @@ -0,0 +1,585 @@ +# SPRINT: Promotion Manager + +> **Sprint ID:** 106_001 +> **Module:** PROMOT +> **Phase:** 6 - Promotion & Gates 
+> **Status:** TODO
+> **Parent:** [106_000_INDEX](SPRINT_20260110_106_000_INDEX_promotion_gates.md)
+
+---
+
+## Overview
+
+Implement the Promotion Manager for handling release promotion requests between environments.
+
+### Objectives
+
+- Create promotion requests with release and environment
+- Validate promotion prerequisites
+- Track promotion lifecycle states
+- Support promotion cancellation
+
+### Working Directory
+
+```
+src/ReleaseOrchestrator/
+├── __Libraries/
+│   └── StellaOps.ReleaseOrchestrator.Promotion/
+│       ├── Manager/
+│       │   ├── IPromotionManager.cs
+│       │   ├── PromotionManager.cs
+│       │   ├── PromotionValidator.cs
+│       │   └── PromotionStateMachine.cs
+│       ├── Store/
+│       │   ├── IPromotionStore.cs
+│       │   └── PromotionStore.cs
+│       └── Models/
+│           ├── Promotion.cs
+│           ├── PromotionStatus.cs
+│           └── PromotionRequest.cs
+└── __Tests/
+    └── StellaOps.ReleaseOrchestrator.Promotion.Tests/
+        └── Manager/
+```
+
+---
+
+## Architecture Reference
+
+- [Promotion Manager](../modules/release-orchestrator/modules/promotion-manager.md)
+- [Data Model Schema](../modules/release-orchestrator/data-model/schema.md)
+
+---
+
+## Deliverables
+
+### IPromotionManager Interface
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Promotion.Manager;
+
+public interface IPromotionManager
+{
+    Task<Promotion> RequestAsync(CreatePromotionRequest request, CancellationToken ct = default);
+    Task<Promotion> SubmitAsync(Guid promotionId, CancellationToken ct = default);
+    Task<Promotion?> GetAsync(Guid promotionId, CancellationToken ct = default);
+    Task<IReadOnlyList<Promotion>> ListAsync(PromotionFilter? filter = null, CancellationToken ct = default);
+    Task<IReadOnlyList<Promotion>> ListPendingApprovalsAsync(Guid? environmentId = null, CancellationToken ct = default);
+    Task<IReadOnlyList<Promotion>> ListByReleaseAsync(Guid releaseId, CancellationToken ct = default);
+    Task<Promotion> CancelAsync(Guid promotionId, string?
reason = null, CancellationToken ct = default);
+    Task UpdateStatusAsync(Guid promotionId, PromotionStatus status, CancellationToken ct = default);
+}
+
+public sealed record CreatePromotionRequest(
+    Guid ReleaseId,
+    Guid SourceEnvironmentId,
+    Guid TargetEnvironmentId,
+    string? Reason = null,
+    bool AutoSubmit = false
+);
+
+public sealed record PromotionFilter(
+    Guid? ReleaseId = null,
+    Guid? SourceEnvironmentId = null,
+    Guid? TargetEnvironmentId = null,
+    PromotionStatus? Status = null,
+    Guid? RequestedBy = null,
+    DateTimeOffset? RequestedAfter = null,
+    DateTimeOffset? RequestedBefore = null
+);
+```
+
+### Promotion Model
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Promotion.Models;
+
+public sealed record Promotion
+{
+    public required Guid Id { get; init; }
+    public required Guid TenantId { get; init; }
+    public required Guid ReleaseId { get; init; }
+    public required string ReleaseName { get; init; }
+    public required Guid SourceEnvironmentId { get; init; }
+    public required string SourceEnvironmentName { get; init; }
+    public required Guid TargetEnvironmentId { get; init; }
+    public required string TargetEnvironmentName { get; init; }
+    public required PromotionStatus Status { get; init; }
+    public string? Reason { get; init; }
+    public string? RejectionReason { get; init; }
+    public string? CancellationReason { get; init; }
+    public string? FailureReason { get; init; }
+    public ImmutableArray<ApprovalRecord> Approvals { get; init; } = [];
+    public ImmutableArray<GateResult> GateResults { get; init; } = [];
+    public Guid? DeploymentId { get; init; }
+    public DateTimeOffset RequestedAt { get; init; }
+    public DateTimeOffset? SubmittedAt { get; init; }
+    public DateTimeOffset? ApprovedAt { get; init; }
+    public DateTimeOffset? DeployedAt { get; init; }
+    public DateTimeOffset?
CompletedAt { get; init; } + public Guid RequestedBy { get; init; } + public string RequestedByName { get; init; } = ""; + + public bool IsActive => Status is + PromotionStatus.Pending or + PromotionStatus.AwaitingApproval or + PromotionStatus.Approved or + PromotionStatus.Deploying; + + public bool IsTerminal => Status is + PromotionStatus.Deployed or + PromotionStatus.Rejected or + PromotionStatus.Cancelled or + PromotionStatus.Failed or + PromotionStatus.RolledBack; +} + +public enum PromotionStatus +{ + Pending, // Created, not yet submitted + AwaitingApproval, // Submitted, waiting for approvals + Approved, // Approvals complete, ready to deploy + Deploying, // Deployment in progress + Deployed, // Successfully deployed + Rejected, // Approval rejected + Cancelled, // Cancelled by requester + Failed, // Deployment failed + RolledBack // Rolled back after failure +} + +public sealed record ApprovalRecord( + Guid UserId, + string UserName, + ApprovalDecision Decision, + string? Comment, + DateTimeOffset DecidedAt +); + +public enum ApprovalDecision +{ + Approved, + Rejected +} + +public sealed record GateResult( + string GateName, + string GateType, + bool Passed, + bool Blocking, + string? 
Message,
+    ImmutableDictionary<string, object> Details,
+    DateTimeOffset EvaluatedAt
+);
+```
+
+### PromotionManager Implementation
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Promotion.Manager;
+
+public sealed class PromotionManager : IPromotionManager
+{
+    private readonly IPromotionStore _store;
+    private readonly IPromotionValidator _validator;
+    private readonly PromotionStateMachine _stateMachine;
+    private readonly IReleaseManager _releaseManager;
+    private readonly IEnvironmentService _environmentService;
+    private readonly IEventPublisher _eventPublisher;
+    private readonly ITenantContext _tenantContext;  // ambient tenant accessor (referenced below)
+    private readonly IUserContext _userContext;      // ambient user accessor (referenced below)
+    private readonly TimeProvider _timeProvider;
+    private readonly IGuidGenerator _guidGenerator;
+    private readonly ILogger<PromotionManager> _logger;
+
+    public async Task<Promotion> RequestAsync(
+        CreatePromotionRequest request,
+        CancellationToken ct = default)
+    {
+        // Validate request
+        var validation = await _validator.ValidateRequestAsync(request, ct);
+        if (!validation.IsValid)
+        {
+            throw new PromotionValidationException(validation.Errors);
+        }
+
+        // Get release and environments
+        var release = await _releaseManager.GetAsync(request.ReleaseId, ct)
+            ?? throw new ReleaseNotFoundException(request.ReleaseId);
+
+        var sourceEnv = await _environmentService.GetAsync(request.SourceEnvironmentId, ct)
+            ?? throw new EnvironmentNotFoundException(request.SourceEnvironmentId);
+
+        var targetEnv = await _environmentService.GetAsync(request.TargetEnvironmentId, ct)
+            ??
throw new EnvironmentNotFoundException(request.TargetEnvironmentId);
+
+        var now = _timeProvider.GetUtcNow();
+
+        var promotion = new Promotion
+        {
+            Id = _guidGenerator.NewGuid(),
+            TenantId = _tenantContext.TenantId,
+            ReleaseId = release.Id,
+            ReleaseName = release.Name,
+            SourceEnvironmentId = sourceEnv.Id,
+            SourceEnvironmentName = sourceEnv.Name,
+            TargetEnvironmentId = targetEnv.Id,
+            TargetEnvironmentName = targetEnv.Name,
+            Status = PromotionStatus.Pending,
+            Reason = request.Reason,
+            RequestedAt = now,
+            RequestedBy = _userContext.UserId,
+            RequestedByName = _userContext.UserName
+        };
+
+        await _store.SaveAsync(promotion, ct);
+
+        await _eventPublisher.PublishAsync(new PromotionRequested(
+            promotion.Id,
+            promotion.TenantId,
+            promotion.ReleaseName,
+            promotion.SourceEnvironmentName,
+            promotion.TargetEnvironmentName,
+            now,
+            _userContext.UserId
+        ), ct);
+
+        _logger.LogInformation(
+            "Created promotion {PromotionId} for release {Release} to {Environment}",
+            promotion.Id,
+            release.Name,
+            targetEnv.Name);
+
+        // Auto-submit if requested
+        if (request.AutoSubmit)
+        {
+            promotion = await SubmitAsync(promotion.Id, ct);
+        }
+
+        return promotion;
+    }
+
+    public async Task<Promotion> SubmitAsync(Guid promotionId, CancellationToken ct = default)
+    {
+        var promotion = await _store.GetAsync(promotionId, ct)
+            ?? throw new PromotionNotFoundException(promotionId);
+
+        _stateMachine.ValidateTransition(promotion.Status, PromotionStatus.AwaitingApproval);
+
+        var updatedPromotion = promotion with
+        {
+            Status = PromotionStatus.AwaitingApproval,
+            SubmittedAt = _timeProvider.GetUtcNow()
+        };
+
+        await _store.SaveAsync(updatedPromotion, ct);
+
+        await _eventPublisher.PublishAsync(new PromotionSubmitted(
+            promotionId,
+            promotion.TenantId,
+            promotion.TargetEnvironmentId,
+            _timeProvider.GetUtcNow()
+        ), ct);
+
+        return updatedPromotion;
+    }
+
+    public async Task<Promotion> CancelAsync(
+        Guid promotionId,
+        string?
reason = null,
+        CancellationToken ct = default)
+    {
+        var promotion = await _store.GetAsync(promotionId, ct)
+            ?? throw new PromotionNotFoundException(promotionId);
+
+        if (promotion.IsTerminal)
+        {
+            throw new PromotionAlreadyTerminalException(promotionId);
+        }
+
+        // Only requester or admin can cancel
+        if (promotion.RequestedBy != _userContext.UserId &&
+            !_userContext.IsInRole("admin"))
+        {
+            throw new UnauthorizedPromotionActionException(promotionId, "cancel");
+        }
+
+        var updatedPromotion = promotion with
+        {
+            Status = PromotionStatus.Cancelled,
+            CancellationReason = reason,
+            CompletedAt = _timeProvider.GetUtcNow()
+        };
+
+        await _store.SaveAsync(updatedPromotion, ct);
+
+        await _eventPublisher.PublishAsync(new PromotionCancelled(
+            promotionId,
+            promotion.TenantId,
+            reason,
+            _timeProvider.GetUtcNow()
+        ), ct);
+
+        return updatedPromotion;
+    }
+
+    public async Task<IReadOnlyList<Promotion>> ListPendingApprovalsAsync(
+        Guid? environmentId = null,
+        CancellationToken ct = default)
+    {
+        var filter = new PromotionFilter(
+            Status: PromotionStatus.AwaitingApproval,
+            TargetEnvironmentId: environmentId
+        );
+
+        return await _store.ListAsync(filter, ct);
+    }
+}
+```
+
+### PromotionValidator
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Promotion.Manager;
+
+public sealed class PromotionValidator : IPromotionValidator
+{
+    private readonly IReleaseManager _releaseManager;
+    private readonly IEnvironmentService _environmentService;
+    private readonly IFreezeWindowService _freezeWindowService;
+    private readonly IPromotionStore _promotionStore;
+
+    public async Task<ValidationResult> ValidateRequestAsync(
+        CreatePromotionRequest request,
+        CancellationToken ct = default)
+    {
+        var errors = new List<string>();
+
+        // Check release exists and is finalized
+        var release = await _releaseManager.GetAsync(request.ReleaseId, ct);
+        if (release is null)
+        {
+            errors.Add($"Release {request.ReleaseId} not found");
+        }
+        else if (release.Status == ReleaseStatus.Draft)
+        {
+            errors.Add("Cannot promote a draft release");
+        }
+        else if (release.Status == ReleaseStatus.Deprecated)
+        {
+            errors.Add("Cannot promote a deprecated release");
+        }
+
+        // Check environments exist
+        var sourceEnv = await _environmentService.GetAsync(request.SourceEnvironmentId, ct);
+        var targetEnv = await _environmentService.GetAsync(request.TargetEnvironmentId, ct);
+
+        if (sourceEnv is null)
+        {
+            errors.Add($"Source environment {request.SourceEnvironmentId} not found");
+        }
+        if (targetEnv is null)
+        {
+            errors.Add($"Target environment {request.TargetEnvironmentId} not found");
+        }
+
+        // Validate environment order (target must be after source)
+        if (sourceEnv is not null && targetEnv is not null)
+        {
+            if (sourceEnv.OrderIndex >= targetEnv.OrderIndex)
+            {
+                errors.Add("Target environment must be later in promotion order than source");
+            }
+        }
+
+        // Check for freeze window on target
+        if (targetEnv is not null)
+        {
+            var isFrozen = await _freezeWindowService.IsEnvironmentFrozenAsync(targetEnv.Id, ct);
+            if (isFrozen)
+            {
+                errors.Add($"Target environment {targetEnv.Name} is currently frozen");
+            }
+        }
+
+        // Check for existing active promotion
+        var existingPromotions = await _promotionStore.ListAsync(new PromotionFilter(
+            ReleaseId: request.ReleaseId,
+            TargetEnvironmentId: request.TargetEnvironmentId
+        ), ct);
+
+        if (existingPromotions.Any(p => p.IsActive))
+        {
+            errors.Add("An active promotion already exists for this release and environment");
+        }
+
+        return errors.Count == 0
+            ?
ValidationResult.Success()
+            : ValidationResult.Failure(errors);
+    }
+}
+```
+
+### PromotionStateMachine
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Promotion.Manager;
+
+public sealed class PromotionStateMachine
+{
+    private static readonly ImmutableDictionary<PromotionStatus, ImmutableArray<PromotionStatus>> ValidTransitions =
+        new Dictionary<PromotionStatus, ImmutableArray<PromotionStatus>>
+        {
+            [PromotionStatus.Pending] = [PromotionStatus.AwaitingApproval, PromotionStatus.Cancelled],
+            [PromotionStatus.AwaitingApproval] = [PromotionStatus.Approved, PromotionStatus.Rejected, PromotionStatus.Cancelled],
+            [PromotionStatus.Approved] = [PromotionStatus.Deploying, PromotionStatus.Cancelled],
+            [PromotionStatus.Deploying] = [PromotionStatus.Deployed, PromotionStatus.Failed],
+            [PromotionStatus.Failed] = [PromotionStatus.RolledBack, PromotionStatus.AwaitingApproval],
+            [PromotionStatus.Deployed] = [],
+            [PromotionStatus.Rejected] = [],
+            [PromotionStatus.Cancelled] = [],
+            [PromotionStatus.RolledBack] = []
+        }.ToImmutableDictionary();
+
+    public bool CanTransition(PromotionStatus from, PromotionStatus to)
+    {
+        return ValidTransitions.TryGetValue(from, out var targets) &&
+               targets.Contains(to);
+    }
+
+    public void ValidateTransition(PromotionStatus from, PromotionStatus to)
+    {
+        if (!CanTransition(from, to))
+        {
+            throw new InvalidPromotionTransitionException(from, to);
+        }
+    }
+
+    public IReadOnlyList<PromotionStatus> GetValidTransitions(PromotionStatus current)
+    {
+        return ValidTransitions.TryGetValue(current, out var targets)
+            ?
targets + : []; + } +} +``` + +### Domain Events + +```csharp +namespace StellaOps.ReleaseOrchestrator.Promotion.Events; + +public sealed record PromotionRequested( + Guid PromotionId, + Guid TenantId, + string ReleaseName, + string SourceEnvironment, + string TargetEnvironment, + DateTimeOffset RequestedAt, + Guid RequestedBy +) : IDomainEvent; + +public sealed record PromotionSubmitted( + Guid PromotionId, + Guid TenantId, + Guid TargetEnvironmentId, + DateTimeOffset SubmittedAt +) : IDomainEvent; + +public sealed record PromotionApproved( + Guid PromotionId, + Guid TenantId, + int ApprovalCount, + DateTimeOffset ApprovedAt +) : IDomainEvent; + +public sealed record PromotionRejected( + Guid PromotionId, + Guid TenantId, + Guid RejectedBy, + string Reason, + DateTimeOffset RejectedAt +) : IDomainEvent; + +public sealed record PromotionCancelled( + Guid PromotionId, + Guid TenantId, + string? Reason, + DateTimeOffset CancelledAt +) : IDomainEvent; + +public sealed record PromotionDeployed( + Guid PromotionId, + Guid TenantId, + Guid DeploymentId, + DateTimeOffset DeployedAt +) : IDomainEvent; +``` + +--- + +## Acceptance Criteria + +- [ ] Create promotion request +- [ ] Validate release is finalized +- [ ] Validate environment order +- [ ] Check for freeze window +- [ ] Prevent duplicate active promotions +- [ ] Submit promotion for approval +- [ ] Cancel promotion +- [ ] State machine validates transitions +- [ ] List pending approvals +- [ ] Unit test coverage >=85% + +--- + +## Test Plan + +### Unit Tests + +| Test | Description | +|------|-------------| +| `RequestPromotion_ValidRequest_Succeeds` | Creation works | +| `RequestPromotion_DraftRelease_Fails` | Draft release rejected | +| `RequestPromotion_FrozenEnvironment_Fails` | Freeze check works | +| `RequestPromotion_DuplicateActive_Fails` | Duplicate check works | +| `SubmitPromotion_ChangesStatus` | Submission works | +| `CancelPromotion_ByRequester_Succeeds` | Cancellation works | +| 
`StateMachine_ValidTransition_Succeeds` | State transitions | +| `StateMachine_InvalidTransition_Fails` | Invalid transitions blocked | + +### Integration Tests + +| Test | Description | +|------|-------------| +| `PromotionLifecycle_E2E` | Full promotion flow | + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 104_003 Release Manager | Internal | TODO | +| 103_001 Environment CRUD | Internal | TODO | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| IPromotionManager | TODO | | +| PromotionManager | TODO | | +| PromotionValidator | TODO | | +| PromotionStateMachine | TODO | | +| Promotion model | TODO | | +| IPromotionStore | TODO | | +| PromotionStore | TODO | | +| Domain events | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_106_002_PROMOT_approval_gateway.md b/docs/implplan/SPRINT_20260110_106_002_PROMOT_approval_gateway.md new file mode 100644 index 000000000..233bcb1d4 --- /dev/null +++ b/docs/implplan/SPRINT_20260110_106_002_PROMOT_approval_gateway.md @@ -0,0 +1,632 @@ +# SPRINT: Approval Gateway + +> **Sprint ID:** 106_002 +> **Module:** PROMOT +> **Phase:** 6 - Promotion & Gates +> **Status:** TODO +> **Parent:** [106_000_INDEX](SPRINT_20260110_106_000_INDEX_promotion_gates.md) + +--- + +## Overview + +Implement the Approval Gateway for managing approval workflows with multi-approver and separation of duties support. 
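As a quick illustration of the decision this sprint implements, the core approval rule — N required approvals, any rejection is terminal, and under separation of duties the requester's own vote never counts — can be sketched as follows. This is illustrative only; the names are hypothetical, and the authoritative contracts are the `IApprovalGateway` deliverables in this sprint.

```csharp
// Sketch: is a promotion fully approved, given the recorded decisions?
static bool IsApproved(
    Guid requesterId,
    IReadOnlyList<(Guid UserId, bool Approved)> decisions,
    int requiredApprovals,
    bool requireSeparationOfDuties)
{
    // Any rejection is terminal for the promotion.
    if (decisions.Any(d => !d.Approved))
        return false;

    // Under separation of duties, the requester's own vote is excluded.
    var counted = decisions.Count(d =>
        d.Approved && (!requireSeparationOfDuties || d.UserId != requesterId));

    return counted >= requiredApprovals;
}
```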
+
+### Objectives
+
+- Process approval/rejection decisions
+- Enforce separation of duties (requester != approver)
+- Support multi-approver requirements
+- Track approval history
+
+### Working Directory
+
+```
+src/ReleaseOrchestrator/
+├── __Libraries/
+│   └── StellaOps.ReleaseOrchestrator.Promotion/
+│       ├── Approval/
+│       │   ├── IApprovalGateway.cs
+│       │   ├── ApprovalGateway.cs
+│       │   ├── SeparationOfDutiesEnforcer.cs
+│       │   ├── ApprovalEligibilityChecker.cs
+│       │   └── ApprovalNotifier.cs
+│       ├── Store/
+│       │   ├── IApprovalStore.cs
+│       │   └── ApprovalStore.cs
+│       └── Models/
+│           ├── Approval.cs
+│           └── ApprovalConfig.cs
+└── __Tests/
+    └── StellaOps.ReleaseOrchestrator.Promotion.Tests/
+        └── Approval/
+```
+
+---
+
+## Architecture Reference
+
+- [Promotion Manager](../modules/release-orchestrator/modules/promotion-manager.md)
+
+---
+
+## Deliverables
+
+### IApprovalGateway Interface
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Promotion.Approval;
+
+public interface IApprovalGateway
+{
+    Task<ApprovalResult> ApproveAsync(Guid promotionId, ApprovalRequest request, CancellationToken ct = default);
+    Task<ApprovalResult> RejectAsync(Guid promotionId, RejectionRequest request, CancellationToken ct = default);
+    Task<ApprovalStatus> GetStatusAsync(Guid promotionId, CancellationToken ct = default);
+    Task<IReadOnlyList<Approval>> GetHistoryAsync(Guid promotionId, CancellationToken ct = default);
+    Task<IReadOnlyList<EligibleApprover>> GetEligibleApproversAsync(Guid promotionId, CancellationToken ct = default);
+    Task<bool> CanUserApproveAsync(Guid promotionId, Guid userId, CancellationToken ct = default);
+}
+
+public sealed record ApprovalRequest(
+    string? Comment = null
+);
+
+public sealed record RejectionRequest(
+    string Reason
+);
+
+public sealed record ApprovalResult(
+    bool Success,
+    ApprovalStatus Status,
+    string?
Message = null
+);
+
+public sealed record ApprovalStatus(
+    int RequiredApprovals,
+    int CurrentApprovals,
+    bool IsApproved,
+    bool IsRejected,
+    IReadOnlyList<Approval> Approvals
+);
+
+public sealed record EligibleApprover(
+    Guid UserId,
+    string UserName,
+    string? Email,
+    bool HasAlreadyDecided,
+    ApprovalDecision? Decision
+);
+```
+
+### Approval Model
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Promotion.Models;
+
+public sealed record Approval
+{
+    public required Guid Id { get; init; }
+    public required Guid PromotionId { get; init; }
+    public required Guid UserId { get; init; }
+    public required string UserName { get; init; }
+    public required ApprovalDecision Decision { get; init; }
+    public string? Comment { get; init; }
+    public required DateTimeOffset DecidedAt { get; init; }
+}
+
+public sealed record ApprovalConfig
+{
+    public required int RequiredApprovals { get; init; }
+    public required bool RequireSeparationOfDuties { get; init; }
+    public ImmutableArray<Guid> ApproverUserIds { get; init; } = [];
+    public ImmutableArray<string> ApproverGroupNames { get; init; } = [];
+    public TimeSpan?
Timeout { get; init; }
+    public bool AutoApproveOnTimeout { get; init; } = false;
+}
+```
+
+### ApprovalGateway Implementation
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Promotion.Approval;
+
+public sealed class ApprovalGateway : IApprovalGateway
+{
+    private readonly IPromotionStore _promotionStore;
+    private readonly IApprovalStore _approvalStore;
+    private readonly IEnvironmentService _environmentService;
+    private readonly SeparationOfDutiesEnforcer _sodEnforcer;
+    private readonly ApprovalEligibilityChecker _eligibilityChecker;
+    private readonly IPromotionManager _promotionManager;
+    private readonly IEventPublisher _eventPublisher;
+    private readonly IUserContext _userContext;  // ambient user accessor (referenced below)
+    private readonly TimeProvider _timeProvider;
+    private readonly IGuidGenerator _guidGenerator;
+    private readonly ILogger<ApprovalGateway> _logger;
+
+    public async Task<ApprovalResult> ApproveAsync(
+        Guid promotionId,
+        ApprovalRequest request,
+        CancellationToken ct = default)
+    {
+        var promotion = await _promotionStore.GetAsync(promotionId, ct)
+            ?? throw new PromotionNotFoundException(promotionId);
+
+        if (promotion.Status != PromotionStatus.AwaitingApproval)
+        {
+            return new ApprovalResult(false, await GetStatusAsync(promotionId, ct),
+                "Promotion is not awaiting approval");
+        }
+
+        // Check eligibility
+        var canApprove = await CanUserApproveAsync(promotionId, _userContext.UserId, ct);
+        if (!canApprove)
+        {
+            return new ApprovalResult(false, await GetStatusAsync(promotionId, ct),
+                "User is not eligible to approve this promotion");
+        }
+
+        // Record approval
+        var approval = new Approval
+        {
+            Id = _guidGenerator.NewGuid(),
+            PromotionId = promotionId,
+            UserId = _userContext.UserId,
+            UserName = _userContext.UserName,
+            Decision = ApprovalDecision.Approved,
+            Comment = request.Comment,
+            DecidedAt = _timeProvider.GetUtcNow()
+        };
+
+        await _approvalStore.SaveAsync(approval, ct);
+
+        // Update promotion with new approval
+        var updatedApprovals = promotion.Approvals.Add(new ApprovalRecord(
+            approval.UserId,
+            approval.UserName,
approval.Decision,
+            approval.Comment,
+            approval.DecidedAt
+        ));
+
+        var updatedPromotion = promotion with { Approvals = updatedApprovals };
+
+        // Check if we have enough approvals
+        var config = await GetApprovalConfigAsync(promotion.TargetEnvironmentId, ct);
+        var approvalCount = updatedApprovals.Count(a => a.Decision == ApprovalDecision.Approved);
+
+        if (approvalCount >= config.RequiredApprovals)
+        {
+            updatedPromotion = updatedPromotion with
+            {
+                Status = PromotionStatus.Approved,
+                ApprovedAt = _timeProvider.GetUtcNow()
+            };
+
+            await _eventPublisher.PublishAsync(new PromotionApproved(
+                promotionId,
+                promotion.TenantId,
+                approvalCount,
+                _timeProvider.GetUtcNow()
+            ), ct);
+        }
+
+        await _promotionStore.SaveAsync(updatedPromotion, ct);
+
+        _logger.LogInformation(
+            "User {User} approved promotion {PromotionId} ({Current}/{Required})",
+            _userContext.UserName,
+            promotionId,
+            approvalCount,
+            config.RequiredApprovals);
+
+        return new ApprovalResult(true, await GetStatusAsync(promotionId, ct));
+    }
+
+    public async Task<ApprovalResult> RejectAsync(
+        Guid promotionId,
+        RejectionRequest request,
+        CancellationToken ct = default)
+    {
+        var promotion = await _promotionStore.GetAsync(promotionId, ct)
+            ??
throw new PromotionNotFoundException(promotionId);
+
+        if (promotion.Status != PromotionStatus.AwaitingApproval)
+        {
+            return new ApprovalResult(false, await GetStatusAsync(promotionId, ct),
+                "Promotion is not awaiting approval");
+        }
+
+        var canApprove = await CanUserApproveAsync(promotionId, _userContext.UserId, ct);
+        if (!canApprove)
+        {
+            return new ApprovalResult(false, await GetStatusAsync(promotionId, ct),
+                "User is not eligible to reject this promotion");
+        }
+
+        var approval = new Approval
+        {
+            Id = _guidGenerator.NewGuid(),
+            PromotionId = promotionId,
+            UserId = _userContext.UserId,
+            UserName = _userContext.UserName,
+            Decision = ApprovalDecision.Rejected,
+            Comment = request.Reason,
+            DecidedAt = _timeProvider.GetUtcNow()
+        };
+
+        await _approvalStore.SaveAsync(approval, ct);
+
+        var updatedApprovals = promotion.Approvals.Add(new ApprovalRecord(
+            approval.UserId,
+            approval.UserName,
+            approval.Decision,
+            approval.Comment,
+            approval.DecidedAt
+        ));
+
+        var updatedPromotion = promotion with
+        {
+            Status = PromotionStatus.Rejected,
+            RejectionReason = request.Reason,
+            Approvals = updatedApprovals,
+            CompletedAt = _timeProvider.GetUtcNow()
+        };
+
+        await _promotionStore.SaveAsync(updatedPromotion, ct);
+
+        await _eventPublisher.PublishAsync(new PromotionRejected(
+            promotionId,
+            promotion.TenantId,
+            _userContext.UserId,
+            request.Reason,
+            _timeProvider.GetUtcNow()
+        ), ct);
+
+        _logger.LogInformation(
+            "User {User} rejected promotion {PromotionId}: {Reason}",
+            _userContext.UserName,
+            promotionId,
+            request.Reason);
+
+        return new ApprovalResult(true, await GetStatusAsync(promotionId, ct));
+    }
+
+    public async Task<bool> CanUserApproveAsync(
+        Guid promotionId,
+        Guid userId,
+        CancellationToken ct = default)
+    {
+        var promotion = await _promotionStore.GetAsync(promotionId, ct);
+        if (promotion is null)
+            return false;
+
+        // Check separation of duties
+        var config = await GetApprovalConfigAsync(promotion.TargetEnvironmentId, ct);
+        if
(config.RequireSeparationOfDuties && promotion.RequestedBy == userId)
+        {
+            return false;
+        }
+
+        // Check if user already approved/rejected
+        if (promotion.Approvals.Any(a => a.UserId == userId))
+        {
+            return false;
+        }
+
+        // Check if user is in approvers list
+        return await _eligibilityChecker.IsEligibleAsync(
+            userId, config.ApproverUserIds, config.ApproverGroupNames, ct);
+    }
+
+    private async Task<ApprovalConfig> GetApprovalConfigAsync(
+        Guid environmentId,
+        CancellationToken ct)
+    {
+        var environment = await _environmentService.GetAsync(environmentId, ct)
+            ?? throw new EnvironmentNotFoundException(environmentId);
+
+        return new ApprovalConfig
+        {
+            RequiredApprovals = environment.RequiredApprovals,
+            RequireSeparationOfDuties = environment.RequireSeparationOfDuties,
+            // ApproverUserIds and ApproverGroupNames from environment config
+        };
+    }
+}
+```
+
+### SeparationOfDutiesEnforcer
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Promotion.Approval;
+
+public sealed class SeparationOfDutiesEnforcer
+{
+    private readonly ILogger<SeparationOfDutiesEnforcer> _logger;
+
+    public ValidationResult Validate(
+        Promotion promotion,
+        Guid approvingUserId,
+        ApprovalConfig config)
+    {
+        if (!config.RequireSeparationOfDuties)
+        {
+            return ValidationResult.Success();
+        }
+
+        var errors = new List<string>();
+
+        // Requester cannot approve their own promotion
+        if (promotion.RequestedBy == approvingUserId)
+        {
+            errors.Add("Separation of duties: requester cannot approve their own promotion");
+        }
+
+        // Check previous approvals don't include this user
+        if (promotion.Approvals.Any(a => a.UserId == approvingUserId))
+        {
+            errors.Add("User has already provided an approval decision");
+        }
+
+        if (errors.Count > 0)
+        {
+            _logger.LogWarning(
+                "Separation of duties violation for promotion {PromotionId} by user {UserId}: {Errors}",
+                promotion.Id,
+                approvingUserId,
+                string.Join("; ", errors));
+        }
+
+        return errors.Count == 0
+            ?
ValidationResult.Success()
+            : ValidationResult.Failure(errors);
+    }
+}
+```
+
+### ApprovalEligibilityChecker
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Promotion.Approval;
+
+public sealed class ApprovalEligibilityChecker
+{
+    private readonly IUserService _userService;
+    private readonly IGroupService _groupService;
+
+    public async Task<bool> IsEligibleAsync(
+        Guid userId,
+        ImmutableArray<Guid> approverUserIds,
+        ImmutableArray<string> approverGroupNames,
+        CancellationToken ct = default)
+    {
+        // If no specific approvers configured, any authenticated user can approve
+        if (approverUserIds.Length == 0 && approverGroupNames.Length == 0)
+        {
+            return true;
+        }
+
+        // Check if user is directly in approvers list
+        if (approverUserIds.Contains(userId))
+        {
+            return true;
+        }
+
+        // Check if user is in any approver group
+        if (approverGroupNames.Length > 0)
+        {
+            var userGroups = await _groupService.GetUserGroupsAsync(userId, ct);
+            if (userGroups.Any(g => approverGroupNames.Contains(g.Name)))
+            {
+                return true;
+            }
+        }
+
+        return false;
+    }
+
+    public async Task<IReadOnlyList<EligibleApprover>> GetEligibleApproversAsync(
+        Guid promotionId,
+        ApprovalConfig config,
+        ImmutableArray<ApprovalRecord> existingApprovals,
+        CancellationToken ct = default)
+    {
+        var eligibleUsers = new List<EligibleApprover>();
+
+        // Get users from direct list
+        foreach (var userId in config.ApproverUserIds)
+        {
+            var user = await _userService.GetAsync(userId, ct);
+            if (user is not null)
+            {
+                var existingApproval = existingApprovals.FirstOrDefault(a => a.UserId == userId);
+                eligibleUsers.Add(new EligibleApprover(
+                    userId,
+                    user.Name,
+                    user.Email,
+                    existingApproval is not null,
+                    existingApproval?.Decision
+                ));
+            }
+        }
+
+        // Get users from groups
+        foreach (var groupName in config.ApproverGroupNames)
+        {
+            var groupMembers = await _groupService.GetMembersAsync(groupName, ct);
+            foreach (var member in groupMembers)
+            {
+                if (!eligibleUsers.Any(u => u.UserId == member.Id))
+                {
+                    var existingApproval = existingApprovals.FirstOrDefault(a => a.UserId ==
member.Id);
+                    eligibleUsers.Add(new EligibleApprover(
+                        member.Id,
+                        member.Name,
+                        member.Email,
+                        existingApproval is not null,
+                        existingApproval?.Decision
+                    ));
+                }
+            }
+        }
+
+        return eligibleUsers.AsReadOnly();
+    }
+}
+```
+
+### ApprovalNotifier
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Promotion.Approval;
+
+public sealed class ApprovalNotifier
+{
+    private readonly INotificationService _notificationService;
+    private readonly ApprovalEligibilityChecker _eligibilityChecker;
+    private readonly ILogger<ApprovalNotifier> _logger;
+
+    public async Task NotifyApprovalRequestedAsync(
+        Promotion promotion,
+        ApprovalConfig config,
+        CancellationToken ct = default)
+    {
+        var eligibleApprovers = await _eligibilityChecker.GetEligibleApproversAsync(
+            promotion.Id, config, promotion.Approvals, ct);
+
+        var pendingApprovers = eligibleApprovers
+            .Where(a => !a.HasAlreadyDecided)
+            .ToList();
+
+        if (pendingApprovers.Count == 0)
+        {
+            _logger.LogWarning(
+                "No eligible approvers found for promotion {PromotionId}",
+                promotion.Id);
+            return;
+        }
+
+        var notification = new NotificationRequest
+        {
+            Channel = "email",
+            Title = $"Approval Required: {promotion.ReleaseName} to {promotion.TargetEnvironmentName}",
+            Message = BuildApprovalMessage(promotion),
+            Recipients = pendingApprovers.Where(a => a.Email is not null).Select(a => a.Email!).ToList(),
+            Metadata = new Dictionary<string, string>
+            {
+                ["promotionId"] = promotion.Id.ToString(),
+                ["releaseId"] = promotion.ReleaseId.ToString(),
+                ["targetEnvironment"] = promotion.TargetEnvironmentName
+            }
+        };
+
+        await _notificationService.SendAsync(notification, ct);
+
+        _logger.LogInformation(
+            "Sent approval notification for promotion {PromotionId} to {Count} approvers",
+            promotion.Id,
+            pendingApprovers.Count);
+    }
+
+    private static string BuildApprovalMessage(Promotion promotion) =>
+        $"Release '{promotion.ReleaseName}' is requesting promotion from " +
+        $"{promotion.SourceEnvironmentName} to {promotion.TargetEnvironmentName}.\n\n" +
$"Requested by: {promotion.RequestedByName}\n" + + $"Reason: {promotion.Reason ?? "No reason provided"}\n\n" + + $"Please review and approve or reject this promotion."; +} +``` + +### Domain Events + +```csharp +namespace StellaOps.ReleaseOrchestrator.Promotion.Events; + +public sealed record ApprovalDecisionRecorded( + Guid PromotionId, + Guid TenantId, + Guid UserId, + string UserName, + ApprovalDecision Decision, + DateTimeOffset DecidedAt +) : IDomainEvent; + +public sealed record ApprovalThresholdMet( + Guid PromotionId, + Guid TenantId, + int ApprovalCount, + int RequiredApprovals, + DateTimeOffset MetAt +) : IDomainEvent; +``` + +--- + +## Acceptance Criteria + +- [ ] Approve promotion with comment +- [ ] Reject promotion with reason +- [ ] Enforce separation of duties +- [ ] Support multi-approver requirements +- [ ] Check user eligibility +- [ ] List eligible approvers +- [ ] Track approval history +- [ ] Notify approvers on request +- [ ] Unit test coverage >=85% + +--- + +## Test Plan + +### Unit Tests + +| Test | Description | +|------|-------------| +| `Approve_ValidUser_Succeeds` | Approval works | +| `Approve_Requester_FailsSoD` | SoD enforcement | +| `Approve_AlreadyDecided_Fails` | Duplicate check | +| `Approve_ThresholdMet_ApprovesPromotion` | Threshold logic | +| `Reject_SetsStatusRejected` | Rejection works | +| `CanUserApprove_InGroup_ReturnsTrue` | Group membership | +| `GetEligibleApprovers_ReturnsCorrectList` | Eligibility list | + +### Integration Tests + +| Test | Description | +|------|-------------| +| `ApprovalWorkflow_E2E` | Full approval flow | +| `MultiApprover_E2E` | Multi-approver scenario | + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 106_001 Promotion Manager | Internal | TODO | +| Authority | Internal | Exists | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| IApprovalGateway | TODO | | +| ApprovalGateway | TODO | | +| 
SeparationOfDutiesEnforcer | TODO | | +| ApprovalEligibilityChecker | TODO | | +| ApprovalNotifier | TODO | | +| Approval model | TODO | | +| IApprovalStore | TODO | | +| ApprovalStore | TODO | | +| Domain events | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_106_003_PROMOT_gate_registry.md b/docs/implplan/SPRINT_20260110_106_003_PROMOT_gate_registry.md new file mode 100644 index 000000000..123a130f2 --- /dev/null +++ b/docs/implplan/SPRINT_20260110_106_003_PROMOT_gate_registry.md @@ -0,0 +1,727 @@ +# SPRINT: Gate Registry + +> **Sprint ID:** 106_003 +> **Module:** PROMOT +> **Phase:** 6 - Promotion & Gates +> **Status:** TODO +> **Parent:** [106_000_INDEX](SPRINT_20260110_106_000_INDEX_promotion_gates.md) + +--- + +## Overview + +Implement the Gate Registry for managing built-in and plugin promotion gates. + +### Objectives + +- Register built-in gate types (8 types total) +- Load plugin gate types via `IGateProviderCapability` +- Execute gates in promotion context +- Track gate evaluation results + +### Gate Type Catalog + +The Release Orchestrator supports 8 built-in promotion gates. All gates implement `IGateProvider`. 
+ +| Gate Type | Category | Blocking | Sprint | Description | +|-----------|----------|----------|--------|-------------| +| `security-gate` | Security | Yes | 106_004 | Blocks if vulnerabilities exceed thresholds | +| `policy-gate` | Compliance | Yes | 106_003 | Evaluates policy rules (OPA/Rego) | +| `freeze-window-gate` | Operational | Yes | 106_003 | Blocks during freeze windows | +| `manual-gate` | Operational | Yes | 106_003 | Requires manual confirmation | +| `approval-gate` | Compliance | Yes | 106_003 | Requires multi-party approval (N of M) | +| `schedule-gate` | Operational | Yes | 106_003 | Deployment window restrictions | +| `dependency-gate` | Quality | No | 106_003 | Checks upstream dependencies are healthy | +| `metric-gate` | Quality | Configurable | Plugin | SLO/error rate threshold checks | + +> **Note:** `metric-gate` is delivered as a plugin reference implementation via the Plugin SDK because it requires integration with external metrics systems (Prometheus, DataDog, etc.). See 101_004 for plugin SDK details. + +### Gate Categories + +- **Security:** Gates that block based on security findings (vulnerabilities, compliance) +- **Compliance:** Gates that enforce organizational policies and approvals +- **Quality:** Gates that check service health and dependencies +- **Operational:** Gates that manage deployment timing and manual interventions +- **Custom:** User-defined plugin gates + +### This Sprint's Scope + +This sprint (106_003) implements the Gate Registry and the following built-in gates: +- `freeze-window-gate` (blocking) +- `manual-gate` (blocking) +- `policy-gate` (blocking) +- `approval-gate` (blocking) +- `schedule-gate` (blocking) +- `dependency-gate` (non-blocking) + +> **Note:** `security-gate` is detailed in sprint 106_004. 
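+
+Taken together, the registry and evaluator delivered in this sprint are consumed roughly as follows. This is an illustrative sketch only: the `registry`/`evaluator` instances, the identifier variables, and the release name are assumed placeholders (normally supplied via DI and the promotion pipeline), not part of this sprint's contract.
+
+```csharp
+// Sketch (hypothetical wiring): instances would normally come from DI.
+registry.RegisterBuiltIn<FreezeWindowGate>("freeze-window-gate");
+registry.RegisterBuiltIn<ManualGate>("manual-gate");
+
+var context = new GateContext
+{
+    PromotionId = promotionId,
+    ReleaseId = releaseId,
+    ReleaseName = "payments-api 2.4.0",   // hypothetical release
+    SourceEnvironmentId = stageId,
+    TargetEnvironmentId = prodId,
+    TargetEnvironmentName = "prod",
+    Config = ImmutableDictionary<string, object>.Empty,
+    RequestedBy = userId,
+    RequestedAt = TimeProvider.System.GetUtcNow()
+};
+
+var results = await evaluator.EvaluateAllAsync(
+    ["freeze-window-gate", "manual-gate"], context, ct);
+
+// A promotion proceeds only if no blocking gate failed.
+bool blocked = results.Any(r => !r.Passed && r.Blocking);
+```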
+
+### Working Directory
+
+```
+src/ReleaseOrchestrator/
+├── __Libraries/
+│   └── StellaOps.ReleaseOrchestrator.Promotion/
+│       ├── Gate/
+│       │   ├── IGateRegistry.cs
+│       │   ├── GateRegistry.cs
+│       │   ├── IGateProvider.cs
+│       │   ├── GateEvaluator.cs
+│       │   └── GateContext.cs
+│       ├── Gate.BuiltIn/
+│       │   ├── FreezeWindowGate.cs
+│       │   ├── ManualGate.cs
+│       │   ├── PolicyGate.cs
+│       │   ├── ApprovalGate.cs
+│       │   ├── ScheduleGate.cs
+│       │   └── DependencyGate.cs
+│       └── Models/
+│           ├── GateDefinition.cs
+│           ├── GateResult.cs
+│           └── GateConfig.cs
+└── __Tests/
+    └── StellaOps.ReleaseOrchestrator.Promotion.Tests/
+        └── Gate/
+```
+
+---
+
+## Architecture Reference
+
+- [Promotion Manager](../modules/release-orchestrator/modules/promotion-manager.md)
+- [Plugin System](../modules/release-orchestrator/plugins/gate-plugins.md)
+
+---
+
+## Deliverables
+
+### IGateRegistry Interface
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Promotion.Gate;
+
+public interface IGateRegistry
+{
+    void RegisterBuiltIn<T>(string gateName) where T : class, IGateProvider;
+    void RegisterPlugin(GateDefinition definition, IGateProvider provider);
+    Task<IGateProvider?> GetProviderAsync(string gateName, CancellationToken ct = default);
+    GateDefinition? GetDefinition(string gateName);
+    IReadOnlyList<GateDefinition> GetAllDefinitions();
+    IReadOnlyList<GateDefinition> GetBuiltInDefinitions();
+    IReadOnlyList<GateDefinition> GetPluginDefinitions();
+    bool IsRegistered(string gateName);
+}
+
+public interface IGateProvider
+{
+    string GateName { get; }
+    string DisplayName { get; }
+    string Description { get; }
+    GateConfigSchema ConfigSchema { get; }
+    bool IsBlocking { get; }
+
+    Task<GateResult> EvaluateAsync(GateContext context, CancellationToken ct = default);
+    Task<ValidationResult> ValidateConfigAsync(IReadOnlyDictionary<string, object> config, CancellationToken ct = default);
+}
+```
+
+### GateDefinition Model
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Promotion.Models;
+
+public sealed record GateDefinition
+{
+    public required string GateName { get; init; }
+    public required string DisplayName { get; init; }
+    public required string Description { get; init; }
+    public required GateCategory Category { get; init; }
+    public required GateSource Source { get; init; }
+    public string? PluginId { get; init; }
+    public required GateConfigSchema ConfigSchema { get; init; }
+    public required bool IsBlocking { get; init; }
+    public string? DocumentationUrl { get; init; }
+}
+
+public enum GateCategory
+{
+    Security,
+    Compliance,
+    Quality,
+    Operational,
+    Custom
+}
+
+public enum GateSource
+{
+    BuiltIn,
+    Plugin
+}
+
+public sealed record GateConfigSchema
+{
+    public ImmutableArray<GateConfigProperty> Properties { get; init; } = [];
+    public ImmutableArray<string> Required { get; init; } = [];
+}
+
+public sealed record GateConfigProperty(
+    string Name,
+    GatePropertyType Type,
+    string Description,
+    object?
Default = null
+);
+
+public enum GatePropertyType
+{
+    String,
+    Integer,
+    Boolean,
+    Array,
+    Object
+}
+```
+
+### GateResult Model
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Promotion.Models;
+
+public sealed record GateResult
+{
+    public required string GateName { get; init; }
+    public required string GateType { get; init; }
+    public required bool Passed { get; init; }
+    public required bool Blocking { get; init; }
+    public string? Message { get; init; }
+    public ImmutableDictionary<string, object> Details { get; init; } =
+        ImmutableDictionary<string, object>.Empty;
+    public required DateTimeOffset EvaluatedAt { get; init; }
+    public TimeSpan Duration { get; init; }
+
+    public static GateResult Pass(
+        string gateName,
+        string gateType,
+        string? message = null,
+        ImmutableDictionary<string, object>? details = null) =>
+        new()
+        {
+            GateName = gateName,
+            GateType = gateType,
+            Passed = true,
+            Blocking = false,
+            Message = message,
+            Details = details ?? ImmutableDictionary<string, object>.Empty,
+            EvaluatedAt = TimeProvider.System.GetUtcNow()
+        };
+
+    public static GateResult Fail(
+        string gateName,
+        string gateType,
+        string message,
+        bool blocking = true,
+        ImmutableDictionary<string, object>? details = null) =>
+        new()
+        {
+            GateName = gateName,
+            GateType = gateType,
+            Passed = false,
+            Blocking = blocking,
+            Message = message,
+            Details = details ?? ImmutableDictionary<string, object>.Empty,
+            EvaluatedAt = TimeProvider.System.GetUtcNow()
+        };
+}
+```
+
+### GateContext Model
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Promotion.Gate;
+
+public sealed record GateContext
+{
+    public required Guid PromotionId { get; init; }
+    public required Guid ReleaseId { get; init; }
+    public required string ReleaseName { get; init; }
+    public required Guid SourceEnvironmentId { get; init; }
+    public required Guid TargetEnvironmentId { get; init; }
+    public required string TargetEnvironmentName { get; init; }
+    public required ImmutableDictionary<string, object> Config { get; init; }
+    public required Guid RequestedBy { get; init; }
+    public required DateTimeOffset RequestedAt { get; init; }
+}
+```
+
+### GateRegistry Implementation
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Promotion.Gate;
+
+public sealed class GateRegistry : IGateRegistry
+{
+    private readonly ConcurrentDictionary<string, (GateDefinition Definition, IGateProvider Provider)> _gates = new();
+    private readonly IServiceProvider _serviceProvider;
+    private readonly ILogger<GateRegistry> _logger;
+
+    public void RegisterBuiltIn<T>(string gateName) where T : class, IGateProvider
+    {
+        var provider = _serviceProvider.GetRequiredService<T>();
+
+        var definition = new GateDefinition
+        {
+            GateName = gateName,
+            DisplayName = provider.DisplayName,
+            Description = provider.Description,
+            Category = InferCategory(gateName),
+            Source = GateSource.BuiltIn,
+            ConfigSchema = provider.ConfigSchema,
+            IsBlocking = provider.IsBlocking
+        };
+
+        if (!_gates.TryAdd(gateName, (definition, provider)))
+        {
+            throw new InvalidOperationException($"Gate '{gateName}' is already registered");
+        }
+
+        _logger.LogInformation("Registered built-in gate: {GateName}", gateName);
+    }
+
+    public void RegisterPlugin(GateDefinition definition, IGateProvider provider)
+    {
+        if (definition.Source != GateSource.Plugin)
+        {
+            throw new ArgumentException("Definition must have Plugin source");
+        }
+
+        if (!_gates.TryAdd(definition.GateName, (definition, provider)))
+        {
+            throw new
InvalidOperationException($"Gate '{definition.GateName}' is already registered");
+        }
+
+        _logger.LogInformation(
+            "Registered plugin gate: {GateName} from {PluginId}",
+            definition.GateName,
+            definition.PluginId);
+    }
+
+    public Task<IGateProvider?> GetProviderAsync(string gateName, CancellationToken ct = default)
+    {
+        return _gates.TryGetValue(gateName, out var entry)
+            ? Task.FromResult<IGateProvider?>(entry.Provider)
+            : Task.FromResult<IGateProvider?>(null);
+    }
+
+    public GateDefinition? GetDefinition(string gateName)
+    {
+        return _gates.TryGetValue(gateName, out var entry)
+            ? entry.Definition
+            : null;
+    }
+
+    public IReadOnlyList<GateDefinition> GetAllDefinitions()
+    {
+        return _gates.Values.Select(e => e.Definition).ToList().AsReadOnly();
+    }
+
+    public IReadOnlyList<GateDefinition> GetBuiltInDefinitions()
+    {
+        return _gates.Values
+            .Where(e => e.Definition.Source == GateSource.BuiltIn)
+            .Select(e => e.Definition)
+            .ToList()
+            .AsReadOnly();
+    }
+
+    public IReadOnlyList<GateDefinition> GetPluginDefinitions()
+    {
+        return _gates.Values
+            .Where(e => e.Definition.Source == GateSource.Plugin)
+            .Select(e => e.Definition)
+            .ToList()
+            .AsReadOnly();
+    }
+
+    public bool IsRegistered(string gateName) => _gates.ContainsKey(gateName);
+
+    // Categories follow the Gate Type Catalog above; unknown names fall back to Custom.
+    private static GateCategory InferCategory(string gateName) =>
+        gateName switch
+        {
+            "security-gate" => GateCategory.Security,
+            "freeze-window-gate" => GateCategory.Operational,
+            "policy-gate" => GateCategory.Compliance,
+            "manual-gate" => GateCategory.Operational,
+            "approval-gate" => GateCategory.Compliance,
+            "schedule-gate" => GateCategory.Operational,
+            "dependency-gate" => GateCategory.Quality,
+            _ => GateCategory.Custom
+        };
+}
+```
+
+### GateEvaluator
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Promotion.Gate;
+
+public sealed class GateEvaluator
+{
+    private readonly IGateRegistry _registry;
+    private readonly ILogger<GateEvaluator> _logger;
+    private readonly TimeProvider _timeProvider;
+
+    public async Task<GateResult> EvaluateAsync(
+        string gateName,
+        GateContext context,
+        CancellationToken ct = default)
+    {
+        var sw = Stopwatch.StartNew();
+
+        var provider = await _registry.GetProviderAsync(gateName, ct);
+        if (provider is null)
+        {
+            return GateResult.Fail(
+                gateName,
+                "unknown",
+                $"Unknown gate type: {gateName}",
+                blocking: true);
+        }
+
+        try
+        {
+            _logger.LogDebug(
+                "Evaluating gate {GateName} for promotion {PromotionId}",
+                gateName,
+                context.PromotionId);
+
+            var result = await provider.EvaluateAsync(context, ct);
+            result = result with { Duration = sw.Elapsed };
+
+            _logger.LogInformation(
+                "Gate {GateName} for promotion {PromotionId}: {Result} in {Duration}ms",
+                gateName,
+                context.PromotionId,
+                result.Passed ? "PASSED" : "FAILED",
+                sw.ElapsedMilliseconds);
+
+            return result;
+        }
+        catch (Exception ex)
+        {
+            _logger.LogError(ex,
+                "Gate {GateName} evaluation failed for promotion {PromotionId}",
+                gateName,
+                context.PromotionId);
+
+            return GateResult.Fail(
+                gateName,
+                provider.GateName,
+                $"Gate evaluation failed: {ex.Message}",
+                blocking: provider.IsBlocking);
+        }
+    }
+
+    public async Task<IReadOnlyList<GateResult>> EvaluateAllAsync(
+        IReadOnlyList<string> gateNames,
+        GateContext context,
+        CancellationToken ct = default)
+    {
+        var tasks = gateNames.Select(name => EvaluateAsync(name, context, ct));
+        var results = await Task.WhenAll(tasks);
+        return results.ToList().AsReadOnly();
+    }
+}
+```
+
+### FreezeWindowGate (Built-in)
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Promotion.Gate.BuiltIn;
+
+public sealed class FreezeWindowGate : IGateProvider
+{
+    private readonly IFreezeWindowService _freezeWindowService;
+
+    public string GateName => "freeze-window-gate";
+    public string DisplayName => "Freeze Window Gate";
+    public string Description => "Blocks promotion during freeze windows";
+    public bool IsBlocking => true;
+
+    public GateConfigSchema ConfigSchema => new()
+    {
+        Properties =
+        [
+            new GateConfigProperty(
+                "allowExemptions",
+                GatePropertyType.Boolean,
+                "Allow exemptions to bypass freeze",
+                Default: true)
+        ]
+    };
+
+    public async Task<GateResult> EvaluateAsync(GateContext context, CancellationToken ct)
+    {
+        var activeFreezeWindow = await _freezeWindowService.GetActiveFreezeWindowAsync(
context.TargetEnvironmentId, ct);
+
+        if (activeFreezeWindow is null)
+        {
+            return GateResult.Pass(
+                GateName,
+                GateName,
+                "No active freeze window");
+        }
+
+        // Check for exemption
+        var allowExemptions = context.Config.GetValueOrDefault("allowExemptions") as bool? ?? true;
+        if (allowExemptions)
+        {
+            var hasExemption = await _freezeWindowService.HasExemptionAsync(
+                activeFreezeWindow.Id, context.RequestedBy, ct);
+
+            if (hasExemption)
+            {
+                return GateResult.Pass(
+                    GateName,
+                    GateName,
+                    "Freeze window active but user has exemption",
+                    new Dictionary<string, object>
+                    {
+                        ["freezeWindowId"] = activeFreezeWindow.Id,
+                        ["exemptionGranted"] = true
+                    }.ToImmutableDictionary());
+            }
+        }
+
+        return GateResult.Fail(
+            GateName,
+            GateName,
+            $"Environment is frozen: {activeFreezeWindow.Name}",
+            blocking: true,
+            new Dictionary<string, object>
+            {
+                ["freezeWindowId"] = activeFreezeWindow.Id,
+                ["freezeWindowName"] = activeFreezeWindow.Name,
+                ["endsAt"] = activeFreezeWindow.EndAt.ToString("O")
+            }.ToImmutableDictionary());
+    }
+
+    public Task<ValidationResult> ValidateConfigAsync(
+        IReadOnlyDictionary<string, object> config,
+        CancellationToken ct) =>
+        Task.FromResult(ValidationResult.Success());
+}
+```
+
+### ManualGate (Built-in)
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Promotion.Gate.BuiltIn;
+
+public sealed class ManualGate : IGateProvider
+{
+    private readonly IPromotionStore _promotionStore;
+    private readonly IStepCallbackHandler _callbackHandler;
+
+    public string GateName => "manual-gate";
+    public string DisplayName => "Manual Gate";
+    public string Description => "Requires manual confirmation to proceed";
+    public bool IsBlocking => true;
+
+    public GateConfigSchema ConfigSchema => new()
+    {
+        Properties =
+        [
+            new GateConfigProperty(
+                "message",
+                GatePropertyType.String,
+                "Message to display for manual confirmation"),
+            new GateConfigProperty(
+                "confirmers",
+                GatePropertyType.Array,
+                "User IDs or group names who can confirm"),
+            new GateConfigProperty(
+                "timeout",
+                GatePropertyType.Integer,
+                "Timeout in seconds",
+                Default: 86400)
+        ],
+        Required = ["message"]
+    };
+
+    public async Task<GateResult> EvaluateAsync(GateContext context, CancellationToken ct)
+    {
+        var message = context.Config.GetValueOrDefault("message")?.ToString() ?? "Manual confirmation required";
+        var timeoutSeconds = context.Config.GetValueOrDefault("timeout") as int? ?? 86400;
+
+        // Create callback for manual confirmation
+        var callback = await _callbackHandler.CreateCallbackAsync(
+            context.PromotionId,
+            "manual-gate",
+            TimeSpan.FromSeconds(timeoutSeconds),
+            ct);
+
+        // Return a result indicating we're waiting
+        return new GateResult
+        {
+            GateName = GateName,
+            GateType = GateName,
+            Passed = false,
+            Blocking = true,
+            Message = message,
+            Details = new Dictionary<string, object>
+            {
+                ["callbackToken"] = callback.Token,
+                ["expiresAt"] = callback.ExpiresAt.ToString("O"),
+                ["waitingForConfirmation"] = true
+            }.ToImmutableDictionary(),
+            EvaluatedAt = TimeProvider.System.GetUtcNow()
+        };
+    }
+
+    public Task<ValidationResult> ValidateConfigAsync(
+        IReadOnlyDictionary<string, object> config,
+        CancellationToken ct) =>
+        Task.FromResult(ValidationResult.Success());
+}
+```
+
+### GateRegistryInitializer
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Promotion.Gate;
+
+public sealed class GateRegistryInitializer : IHostedService
+{
+    private readonly IGateRegistry _registry;
+    private readonly IPluginLoader _pluginLoader;
+    private readonly ILogger<GateRegistryInitializer> _logger;
+
+    public async Task StartAsync(CancellationToken ct)
+    {
+        _logger.LogInformation("Initializing gate registry");
+
+        // Register built-in gates (6 gates in this sprint, security-gate in 106_004)
+        _registry.RegisterBuiltIn<FreezeWindowGate>("freeze-window-gate");
+        _registry.RegisterBuiltIn<ManualGate>("manual-gate");
+        _registry.RegisterBuiltIn<PolicyGate>("policy-gate");
+        _registry.RegisterBuiltIn<ApprovalGate>("approval-gate");
+        _registry.RegisterBuiltIn<ScheduleGate>("schedule-gate");
+        _registry.RegisterBuiltIn<DependencyGate>("dependency-gate");
+
+        _logger.LogInformation(
+            "Registered {Count} built-in gates",
_registry.GetBuiltInDefinitions().Count); + + // Load plugin gates + var plugins = await _pluginLoader.GetPluginsAsync(ct); + foreach (var plugin in plugins) + { + try + { + var providers = plugin.Instance.GetGateProviders(); + foreach (var provider in providers) + { + var definition = new GateDefinition + { + GateName = provider.GateName, + DisplayName = provider.DisplayName, + Description = provider.Description, + Category = GateCategory.Custom, + Source = GateSource.Plugin, + PluginId = plugin.Manifest.Id, + ConfigSchema = provider.ConfigSchema, + IsBlocking = provider.IsBlocking + }; + + _registry.RegisterPlugin(definition, provider); + } + } + catch (Exception ex) + { + _logger.LogError(ex, + "Failed to load gate plugin {PluginId}", + plugin.Manifest.Id); + } + } + + _logger.LogInformation( + "Loaded {Count} plugin gates", + _registry.GetPluginDefinitions().Count); + } + + public Task StopAsync(CancellationToken ct) => Task.CompletedTask; +} +``` + +--- + +## Acceptance Criteria + +- [ ] Register built-in gate types +- [ ] Load plugin gate types +- [ ] Evaluate individual gate +- [ ] Evaluate all gates for promotion +- [ ] Freeze window gate blocks during freeze +- [ ] Manual gate waits for confirmation +- [ ] Track gate results +- [ ] Validate gate configuration +- [ ] Unit test coverage >=85% + +--- + +## Test Plan + +### Unit Tests + +| Test | Description | +|------|-------------| +| `RegisterBuiltIn_AddsGate` | Registration works | +| `RegisterPlugin_AddsGate` | Plugin registration works | +| `GetProvider_ReturnsProvider` | Lookup works | +| `EvaluateGate_ReturnsResult` | Evaluation works | +| `FreezeWindowGate_ActiveFreeze_Fails` | Freeze gate logic | +| `FreezeWindowGate_NoFreeze_Passes` | No freeze passes | +| `ManualGate_CreatesCallback` | Manual gate logic | + +### Integration Tests + +| Test | Description | +|------|-------------| +| `GateRegistryInit_E2E` | Full initialization | +| `PluginGateLoading_E2E` | Plugin gate loading | + +--- + +## 
Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 106_001 Promotion Manager | Internal | TODO | +| 101_002 Plugin Registry | Internal | TODO | +| 103_001 Environment (Freeze Windows) | Internal | TODO | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| IGateRegistry | TODO | | +| GateRegistry | TODO | | +| IGateProvider | TODO | | +| GateEvaluator | TODO | | +| GateContext | TODO | | +| FreezeWindowGate | TODO | Blocks during freeze windows | +| ManualGate | TODO | Manual confirmation | +| PolicyGate | TODO | OPA/Rego policy evaluation | +| ApprovalGate | TODO | Multi-party approval (N of M) | +| ScheduleGate | TODO | Deployment window restrictions | +| DependencyGate | TODO | Upstream dependency checks | +| GateRegistryInitializer | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | +| 10-Jan-2026 | Added Gate Type Catalog (8 types) with categories and sprint assignments | diff --git a/docs/implplan/SPRINT_20260110_106_004_PROMOT_security_gate.md b/docs/implplan/SPRINT_20260110_106_004_PROMOT_security_gate.md new file mode 100644 index 000000000..fa82ea190 --- /dev/null +++ b/docs/implplan/SPRINT_20260110_106_004_PROMOT_security_gate.md @@ -0,0 +1,576 @@ +# SPRINT: Security Gate + +> **Sprint ID:** 106_004 +> **Module:** PROMOT +> **Phase:** 6 - Promotion & Gates +> **Status:** TODO +> **Parent:** [106_000_INDEX](SPRINT_20260110_106_000_INDEX_promotion_gates.md) + +--- + +## Overview + +Implement the Security Gate for blocking promotions based on vulnerability thresholds. 
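+
+As a concrete illustration of the thresholds this gate enforces, a promotion's gate configuration might carry values such as the following. The values and the `securityGateConfig` variable are hypothetical; only the property names and defaults come from `SecurityGate.ConfigSchema` in this sprint.
+
+```csharp
+// Hypothetical example values; keys match SecurityGate.ConfigSchema property names.
+var securityGateConfig = ImmutableDictionary.CreateRange(new KeyValuePair<string, object>[]
+{
+    new("maxCritical", 0),            // no critical vulnerabilities tolerated
+    new("maxHigh", 5),
+    new("maxMedium", 20),             // omit to leave unlimited
+    new("requireSbom", true),
+    new("maxScanAge", 24),            // hours
+    new("applyVexExceptions", true),
+    new("blockOnKnownExploited", true)
+});
+```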
+ +### Objectives + +- Check vulnerability counts against thresholds +- Support severity-based limits (critical, high, medium) +- Require SBOM presence +- Integrate with Scanner service +- Support VEX-based exceptions + +### Working Directory + +``` +src/ReleaseOrchestrator/ +├── __Libraries/ +│ └── StellaOps.ReleaseOrchestrator.Promotion/ +│ └── Gate.Security/ +│ ├── SecurityGate.cs +│ ├── SecurityGateConfig.cs +│ ├── VulnerabilityCounter.cs +│ ├── VexExceptionChecker.cs +│ └── SbomRequirementChecker.cs +└── __Tests/ + └── StellaOps.ReleaseOrchestrator.Promotion.Tests/ + └── Gate.Security/ +``` + +--- + +## Architecture Reference + +- [Security Gate](../modules/release-orchestrator/modules/gates/security-gate.md) +- [Scanner Integration](../modules/scanner/integration.md) + +--- + +## Deliverables + +### SecurityGate + +```csharp +namespace StellaOps.ReleaseOrchestrator.Promotion.Gate.Security; + +public sealed class SecurityGate : IGateProvider +{ + private readonly IReleaseManager _releaseManager; + private readonly IScannerService _scannerService; + private readonly VulnerabilityCounter _vulnCounter; + private readonly VexExceptionChecker _vexChecker; + private readonly SbomRequirementChecker _sbomChecker; + private readonly ILogger _logger; + + public string GateName => "security-gate"; + public string DisplayName => "Security Gate"; + public string Description => "Enforces vulnerability thresholds for release promotion"; + public bool IsBlocking => true; + + public GateConfigSchema ConfigSchema => new() + { + Properties = + [ + new GateConfigProperty("maxCritical", GatePropertyType.Integer, "Maximum critical vulnerabilities allowed", Default: 0), + new GateConfigProperty("maxHigh", GatePropertyType.Integer, "Maximum high vulnerabilities allowed", Default: 5), + new GateConfigProperty("maxMedium", GatePropertyType.Integer, "Maximum medium vulnerabilities allowed (null = unlimited)"), + new GateConfigProperty("maxLow", GatePropertyType.Integer, "Maximum low 
vulnerabilities allowed (null = unlimited)"), + new GateConfigProperty("requireSbom", GatePropertyType.Boolean, "Require SBOM for all components", Default: true), + new GateConfigProperty("maxScanAge", GatePropertyType.Integer, "Maximum scan age in hours", Default: 24), + new GateConfigProperty("applyVexExceptions", GatePropertyType.Boolean, "Apply VEX exceptions to counts", Default: true), + new GateConfigProperty("blockOnKnownExploited", GatePropertyType.Boolean, "Block on KEV vulnerabilities", Default: true) + ] + }; + + public async Task EvaluateAsync(GateContext context, CancellationToken ct) + { + var config = ParseConfig(context.Config); + + var release = await _releaseManager.GetAsync(context.ReleaseId, ct) + ?? throw new ReleaseNotFoundException(context.ReleaseId); + + var violations = new List(); + var details = new Dictionary(); + var totalVulns = new VulnerabilityCounts(); + + foreach (var component in release.Components) + { + // Check SBOM requirement + if (config.RequireSbom) + { + var hasSbom = await _sbomChecker.HasSbomAsync(component.Digest, ct); + if (!hasSbom) + { + violations.Add($"Component {component.ComponentName} has no SBOM"); + } + } + + // Get scan results + var scan = await _scannerService.GetLatestScanAsync(component.Digest, ct); + if (scan is null) + { + if (config.RequireSbom) + { + violations.Add($"Component {component.ComponentName} has no security scan"); + } + continue; + } + + // Check scan age + var scanAge = TimeProvider.System.GetUtcNow() - scan.CompletedAt; + if (scanAge.TotalHours > config.MaxScanAgeHours) + { + violations.Add($"Component {component.ComponentName} scan is too old ({scanAge.TotalHours:F1}h)"); + } + + // Count vulnerabilities + var vulnCounts = await _vulnCounter.CountAsync( + scan, + config.ApplyVexExceptions ? 
component.Digest : null, + ct); + + totalVulns = totalVulns.Add(vulnCounts); + + // Check for known exploited vulnerabilities + if (config.BlockOnKnownExploited && vulnCounts.KnownExploitedCount > 0) + { + violations.Add( + $"Component {component.ComponentName} has {vulnCounts.KnownExploitedCount} known exploited vulnerabilities"); + } + + details[$"component_{component.ComponentName}"] = new Dictionary + { + ["critical"] = vulnCounts.Critical, + ["high"] = vulnCounts.High, + ["medium"] = vulnCounts.Medium, + ["low"] = vulnCounts.Low, + ["knownExploited"] = vulnCounts.KnownExploitedCount, + ["scanAge"] = scanAge.TotalHours + }; + } + + // Check thresholds + if (totalVulns.Critical > config.MaxCritical) + { + violations.Add($"Critical vulnerabilities ({totalVulns.Critical}) exceed threshold ({config.MaxCritical})"); + } + + if (totalVulns.High > config.MaxHigh) + { + violations.Add($"High vulnerabilities ({totalVulns.High}) exceed threshold ({config.MaxHigh})"); + } + + if (config.MaxMedium.HasValue && totalVulns.Medium > config.MaxMedium.Value) + { + violations.Add($"Medium vulnerabilities ({totalVulns.Medium}) exceed threshold ({config.MaxMedium})"); + } + + if (config.MaxLow.HasValue && totalVulns.Low > config.MaxLow.Value) + { + violations.Add($"Low vulnerabilities ({totalVulns.Low}) exceed threshold ({config.MaxLow})"); + } + + details["totals"] = new Dictionary + { + ["critical"] = totalVulns.Critical, + ["high"] = totalVulns.High, + ["medium"] = totalVulns.Medium, + ["low"] = totalVulns.Low, + ["knownExploited"] = totalVulns.KnownExploitedCount, + ["componentsScanned"] = release.Components.Length + }; + + details["thresholds"] = new Dictionary + { + ["maxCritical"] = config.MaxCritical, + ["maxHigh"] = config.MaxHigh, + ["maxMedium"] = config.MaxMedium ?? -1, + ["maxLow"] = config.MaxLow ?? 
-1 + }; + + if (violations.Count > 0) + { + _logger.LogWarning( + "Security gate failed for release {ReleaseId}: {Violations}", + context.ReleaseId, + string.Join("; ", violations)); + + return GateResult.Fail( + GateName, + GateName, + string.Join("; ", violations), + blocking: true, + details.ToImmutableDictionary()); + } + + _logger.LogInformation( + "Security gate passed for release {ReleaseId}: {Critical}C/{High}H/{Medium}M/{Low}L", + context.ReleaseId, + totalVulns.Critical, + totalVulns.High, + totalVulns.Medium, + totalVulns.Low); + + return GateResult.Pass( + GateName, + GateName, + $"All security thresholds met", + details.ToImmutableDictionary()); + } + + private static SecurityGateConfig ParseConfig(ImmutableDictionary config) => + new() + { + MaxCritical = config.GetValueOrDefault("maxCritical") as int? ?? 0, + MaxHigh = config.GetValueOrDefault("maxHigh") as int? ?? 5, + MaxMedium = config.GetValueOrDefault("maxMedium") as int?, + MaxLow = config.GetValueOrDefault("maxLow") as int?, + RequireSbom = config.GetValueOrDefault("requireSbom") as bool? ?? true, + MaxScanAgeHours = config.GetValueOrDefault("maxScanAge") as int? ?? 24, + ApplyVexExceptions = config.GetValueOrDefault("applyVexExceptions") as bool? ?? true, + BlockOnKnownExploited = config.GetValueOrDefault("blockOnKnownExploited") as bool? ?? true + }; + + public Task ValidateConfigAsync( + IReadOnlyDictionary config, + CancellationToken ct) + { + var errors = new List(); + + if (config.TryGetValue("maxCritical", out var maxCritical) && + maxCritical is int mc && mc < 0) + { + errors.Add("maxCritical cannot be negative"); + } + + if (config.TryGetValue("maxScanAge", out var maxScanAge) && + maxScanAge is int msa && msa < 1) + { + errors.Add("maxScanAge must be at least 1 hour"); + } + + return Task.FromResult(errors.Count == 0 + ? 
ValidationResult.Success()
+            : ValidationResult.Failure(errors));
+    }
+}
+```
+
+### SecurityGateConfig
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Promotion.Gate.Security;
+
+public sealed record SecurityGateConfig
+{
+    public int MaxCritical { get; init; } = 0;
+    public int MaxHigh { get; init; } = 5;
+    public int? MaxMedium { get; init; }
+    public int? MaxLow { get; init; }
+    public bool RequireSbom { get; init; } = true;
+    public int MaxScanAgeHours { get; init; } = 24;
+    public bool ApplyVexExceptions { get; init; } = true;
+    public bool BlockOnKnownExploited { get; init; } = true;
+}
+```
+
+### VulnerabilityCounter
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Promotion.Gate.Security;
+
+public sealed class VulnerabilityCounter
+{
+    private readonly IVexService _vexService;
+    private readonly IKevService _kevService;
+
+    public async Task<VulnerabilityCounts> CountAsync(
+        ScanResult scan,
+        string? digestForVex = null,
+        CancellationToken ct = default)
+    {
+        var counts = new VulnerabilityCounts
+        {
+            Critical = scan.CriticalCount,
+            High = scan.HighCount,
+            Medium = scan.MediumCount,
+            Low = scan.LowCount
+        };
+
+        // Count known exploited
+        var kevVulns = await _kevService.GetKevVulnerabilitiesAsync(
+            scan.Vulnerabilities.Select(v => v.CveId), ct);
+        counts = counts with { KnownExploitedCount = kevVulns.Count };
+
+        // Apply VEX exceptions if requested
+        if (digestForVex is not null)
+        {
+            var vexDocs = await _vexService.GetVexForDigestAsync(digestForVex, ct);
+            counts = ApplyVexExceptions(counts, scan.Vulnerabilities, vexDocs);
+        }
+
+        return counts;
+    }
+
+    private static VulnerabilityCounts ApplyVexExceptions(
+        VulnerabilityCounts counts,
+        IReadOnlyList<Vulnerability> vulnerabilities,
+        IReadOnlyList<VexDocument> vexDocs)
+    {
+        var exceptedCves = vexDocs
+            .SelectMany(v => v.Statements)
+            .Where(s => s.Status == VexStatus.NotAffected || s.Status == VexStatus.Fixed)
+            .Select(s => s.VulnerabilityId)
+            .ToHashSet();
+
+        var adjustedCounts = counts;
+
+        foreach (var vuln in vulnerabilities)
+        {
+            if (exceptedCves.Contains(vuln.CveId))
+            {
+                adjustedCounts = vuln.Severity switch
+                {
+                    VulnerabilitySeverity.Critical => adjustedCounts with { Critical = adjustedCounts.Critical - 1 },
+                    VulnerabilitySeverity.High => adjustedCounts with { High = adjustedCounts.High - 1 },
+                    VulnerabilitySeverity.Medium => adjustedCounts with { Medium = adjustedCounts.Medium - 1 },
+                    VulnerabilitySeverity.Low => adjustedCounts with { Low = adjustedCounts.Low - 1 },
+                    _ => adjustedCounts
+                };
+            }
+        }
+
+        return adjustedCounts;
+    }
+}
+
+public sealed record VulnerabilityCounts
+{
+    public int Critical { get; init; }
+    public int High { get; init; }
+    public int Medium { get; init; }
+    public int Low { get; init; }
+    public int KnownExploitedCount { get; init; }
+
+    public int Total => Critical + High + Medium + Low;
+
+    public VulnerabilityCounts Add(VulnerabilityCounts other) =>
+        new()
+        {
+            Critical = Critical + other.Critical,
+            High = High + other.High,
+            Medium = Medium + other.Medium,
+            Low = Low + other.Low,
+            KnownExploitedCount = KnownExploitedCount + other.KnownExploitedCount
+        };
+}
+```
+
+### VexExceptionChecker
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Promotion.Gate.Security;
+
+public sealed class VexExceptionChecker
+{
+    private readonly IVexService _vexService;
+    private readonly ILogger<VexExceptionChecker> _logger;
+
+    public async Task<VexExceptionResult> CheckAsync(
+        string digest,
+        string cveId,
+        CancellationToken ct = default)
+    {
+        var vexDocs = await _vexService.GetVexForDigestAsync(digest, ct);
+
+        foreach (var doc in vexDocs)
+        {
+            var statement = doc.Statements
+                .FirstOrDefault(s => s.VulnerabilityId == cveId);
+
+            if (statement is null)
+                continue;
+
+            if (statement.Status == VexStatus.NotAffected)
+            {
+                return new VexExceptionResult(
+                    IsExcepted: true,
+                    Reason: statement.Justification ?? "Not affected",
+                    VexDocumentId: doc.Id,
+                    VexStatus: VexStatus.NotAffected
+                );
+            }
+
+            if (statement.Status == VexStatus.Fixed)
+            {
+                return new VexExceptionResult(
+                    IsExcepted: true,
+                    Reason: statement.ActionStatement ?? "Fixed",
+                    VexDocumentId: doc.Id,
+                    VexStatus: VexStatus.Fixed
+                );
+            }
+        }
+
+        return new VexExceptionResult(
+            IsExcepted: false,
+            Reason: null,
+            VexDocumentId: null,
+            VexStatus: null
+        );
+    }
+}
+
+public sealed record VexExceptionResult(
+    bool IsExcepted,
+    string? Reason,
+    Guid? VexDocumentId,
+    VexStatus? VexStatus
+);
+```
+
+### SbomRequirementChecker
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Promotion.Gate.Security;
+
+public sealed class SbomRequirementChecker
+{
+    private readonly ISbomService _sbomService;
+    private readonly ILogger<SbomRequirementChecker> _logger;
+
+    public async Task<bool> HasSbomAsync(string digest, CancellationToken ct = default)
+    {
+        var sbom = await _sbomService.GetByDigestAsync(digest, ct);
+        return sbom is not null;
+    }
+
+    public async Task<SbomValidationResult> ValidateSbomAsync(
+        string digest,
+        CancellationToken ct = default)
+    {
+        var sbom = await _sbomService.GetByDigestAsync(digest, ct);
+
+        if (sbom is null)
+        {
+            return new SbomValidationResult(
+                HasSbom: false,
+                IsValid: false,
+                Errors: ["No SBOM found for digest"]
+            );
+        }
+
+        var errors = new List<string>();
+
+        // Check SBOM has components
+        if (sbom.Components.Length == 0)
+        {
+            errors.Add("SBOM has no components");
+        }
+
+        // Check SBOM format
+        if (string.IsNullOrEmpty(sbom.Format))
+        {
+            errors.Add("SBOM format not specified");
+        }
+
+        // Check SBOM is not too old (optional)
+        var sbomAge = TimeProvider.System.GetUtcNow() - sbom.GeneratedAt;
+        if (sbomAge.TotalDays > 90)
+        {
+            errors.Add($"SBOM is {sbomAge.TotalDays:F0} days old");
+        }
+
+        return new SbomValidationResult(
+            HasSbom: true,
+            IsValid: errors.Count == 0,
+            Errors: errors.ToImmutableArray(),
+            SbomId: sbom.Id,
+            Format: sbom.Format,
+            ComponentCount: sbom.Components.Length,
+            GeneratedAt: sbom.GeneratedAt
+        );
+    }
+}
+
+public sealed record SbomValidationResult(
+    bool HasSbom,
+    bool IsValid,
+    ImmutableArray<string> Errors,
+    Guid? SbomId = null,
+    string? Format = null,
+    int? ComponentCount = null,
+    DateTimeOffset? GeneratedAt = null
+);
+```
+
+---
+
+## Acceptance Criteria
+
+- [ ] Check vulnerability counts against thresholds
+- [ ] Block on critical vulnerabilities above threshold
+- [ ] Block on high vulnerabilities above threshold
+- [ ] Support optional medium/low thresholds
+- [ ] Require SBOM presence
+- [ ] Check scan age
+- [ ] Apply VEX exceptions
+- [ ] Block on known exploited (KEV) vulnerabilities
+- [ ] Return detailed gate result
+- [ ] Unit test coverage >=85%
+
+---
+
+## Test Plan
+
+### Unit Tests
+
+| Test | Description |
+|------|-------------|
+| `Evaluate_BelowThreshold_Passes` | Pass case |
+| `Evaluate_CriticalAboveThreshold_Fails` | Critical block |
+| `Evaluate_HighAboveThreshold_Fails` | High block |
+| `Evaluate_NoSbom_Fails` | SBOM requirement |
+| `Evaluate_OldScan_Fails` | Scan age check |
+| `VulnCounter_AppliesVexExceptions` | VEX logic |
+| `VulnCounter_CountsKev` | KEV counting |
+| `SbomChecker_ValidatesSbom` | SBOM validation |
+
+### Integration Tests
+
+| Test | Description |
+|------|-------------|
+| `SecurityGate_E2E` | Full gate evaluation |
+| `VexException_E2E` | VEX integration |
+
+---
+
+## Dependencies
+
+| Dependency | Type | Status |
+|------------|------|--------|
+| 106_003 Gate Registry | Internal | TODO |
+| Scanner | Internal | Exists |
+| VexService | Internal | Exists |
+| SbomService | Internal | Exists |
+
+---
+
+## Delivery Tracker
+
+| Deliverable | Status | Notes |
+|-------------|--------|-------|
+| SecurityGate | TODO | |
+| SecurityGateConfig | TODO | |
+| VulnerabilityCounter | TODO | |
+| VexExceptionChecker | TODO | |
+| SbomRequirementChecker | TODO | |
+| Unit tests | TODO | |
+
+---
+
+## Execution Log
+
+| Date | Entry |
+|------|-------|
+| 10-Jan-2026 | Sprint created |
diff --git
a/docs/implplan/SPRINT_20260110_106_005_PROMOT_decision_engine.md b/docs/implplan/SPRINT_20260110_106_005_PROMOT_decision_engine.md
new file mode 100644
index 000000000..05588bfe6
--- /dev/null
+++ b/docs/implplan/SPRINT_20260110_106_005_PROMOT_decision_engine.md
@@ -0,0 +1,626 @@
+# SPRINT: Decision Engine
+
+> **Sprint ID:** 106_005
+> **Module:** PROMOT
+> **Phase:** 6 - Promotion & Gates
+> **Status:** TODO
+> **Parent:** [106_000_INDEX](SPRINT_20260110_106_000_INDEX_promotion_gates.md)
+
+---
+
+## Overview
+
+Implement the Decision Engine for combining gate results and approvals into final promotion decisions.
+
+### Objectives
+
+- Evaluate all configured gates
+- Combine gate results with approval status
+- Generate decision records with evidence
+- Support configurable decision rules
+
+### Working Directory
+
+```
+src/ReleaseOrchestrator/
+├── __Libraries/
+│   └── StellaOps.ReleaseOrchestrator.Promotion/
+│       ├── Decision/
+│       │   ├── IDecisionEngine.cs
+│       │   ├── DecisionEngine.cs
+│       │   ├── DecisionRules.cs
+│       │   ├── DecisionRecorder.cs
+│       │   └── DecisionNotifier.cs
+│       └── Models/
+│           ├── DecisionResult.cs
+│           ├── DecisionRecord.cs
+│           └── EnvironmentGateConfig.cs
+└── __Tests/
+    └── StellaOps.ReleaseOrchestrator.Promotion.Tests/
+        └── Decision/
+```
+
+---
+
+## Architecture Reference
+
+- [Promotion Manager](../modules/release-orchestrator/modules/promotion-manager.md)
+
+---
+
+## Deliverables
+
+### IDecisionEngine Interface
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Promotion.Decision;
+
+public interface IDecisionEngine
+{
+    Task<DecisionResult> EvaluateAsync(Guid promotionId, CancellationToken ct = default);
+    Task<GateResult> EvaluateGateAsync(Guid promotionId, string gateName, CancellationToken ct = default);
+    Task<ImmutableArray<GateResult>> EvaluateAllGatesAsync(Guid promotionId, CancellationToken ct = default);
+    Task<DecisionRecord> GetDecisionRecordAsync(Guid promotionId, CancellationToken ct = default);
+    Task<IReadOnlyList<DecisionRecord>> GetDecisionHistoryAsync(Guid promotionId, CancellationToken ct = default);
+}
+```
+
+### DecisionResult Model
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Promotion.Models;
+
+public sealed record DecisionResult
+{
+    public required Guid PromotionId { get; init; }
+    public required DecisionOutcome Outcome { get; init; }
+    public required bool CanProceed { get; init; }
+    public string? BlockingReason { get; init; }
+    public required ImmutableArray<GateResult> GateResults { get; init; }
+    public required ApprovalStatus ApprovalStatus { get; init; }
+    public required DateTimeOffset EvaluatedAt { get; init; }
+    public TimeSpan Duration { get; init; }
+
+    public IEnumerable<GateResult> PassedGates =>
+        GateResults.Where(g => g.Passed);
+
+    public IEnumerable<GateResult> FailedGates =>
+        GateResults.Where(g => !g.Passed);
+
+    public IEnumerable<GateResult> BlockingFailedGates =>
+        GateResults.Where(g => !g.Passed && g.Blocking);
+}
+
+public enum DecisionOutcome
+{
+    Allow,           // All gates passed, approvals complete
+    Deny,            // Blocking gate failed
+    PendingApproval, // Gates passed, awaiting approvals
+    PendingGate,     // Async gate awaiting callback
+    Error            // Evaluation error
+}
+```
+
+### DecisionRecord Model
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Promotion.Models;
+
+public sealed record DecisionRecord
+{
+    public required Guid Id { get; init; }
+    public required Guid PromotionId { get; init; }
+    public required Guid TenantId { get; init; }
+    public required DecisionOutcome Outcome { get; init; }
+    public required string OutcomeReason { get; init; }
+    public required ImmutableArray<GateResult> GateResults { get; init; }
+    public required ImmutableArray<Approval> Approvals { get; init; }
+    public required EnvironmentGateConfig GateConfig { get; init; }
+    public required DateTimeOffset EvaluatedAt { get; init; }
+    public required Guid EvaluatedBy { get; init; } // System or user
+    public string?
EvidenceDigest { get; init; } +} + +public sealed record EnvironmentGateConfig +{ + public required Guid EnvironmentId { get; init; } + public required ImmutableArray RequiredGates { get; init; } + public required int RequiredApprovals { get; init; } + public required bool RequireSeparationOfDuties { get; init; } + public required bool AllGatesMustPass { get; init; } +} +``` + +### DecisionEngine Implementation + +```csharp +namespace StellaOps.ReleaseOrchestrator.Promotion.Decision; + +public sealed class DecisionEngine : IDecisionEngine +{ + private readonly IPromotionStore _promotionStore; + private readonly IEnvironmentService _environmentService; + private readonly IGateRegistry _gateRegistry; + private readonly GateEvaluator _gateEvaluator; + private readonly IApprovalGateway _approvalGateway; + private readonly DecisionRules _decisionRules; + private readonly DecisionRecorder _decisionRecorder; + private readonly IEventPublisher _eventPublisher; + private readonly TimeProvider _timeProvider; + private readonly ILogger _logger; + + public async Task EvaluateAsync( + Guid promotionId, + CancellationToken ct = default) + { + var sw = Stopwatch.StartNew(); + + var promotion = await _promotionStore.GetAsync(promotionId, ct) + ?? 
throw new PromotionNotFoundException(promotionId); + + var gateConfig = await GetGateConfigAsync(promotion.TargetEnvironmentId, ct); + + // Evaluate all required gates + var gateContext = BuildGateContext(promotion); + var gateResults = await EvaluateGatesAsync(gateConfig.RequiredGates, gateContext, ct); + + // Get approval status + var approvalStatus = await _approvalGateway.GetStatusAsync(promotionId, ct); + + // Apply decision rules + var outcome = _decisionRules.Evaluate(gateResults, approvalStatus, gateConfig); + + var result = new DecisionResult + { + PromotionId = promotionId, + Outcome = outcome.Decision, + CanProceed = outcome.CanProceed, + BlockingReason = outcome.BlockingReason, + GateResults = gateResults, + ApprovalStatus = approvalStatus, + EvaluatedAt = _timeProvider.GetUtcNow(), + Duration = sw.Elapsed + }; + + // Record decision + await _decisionRecorder.RecordAsync(promotion, result, gateConfig, ct); + + // Update promotion gate results + var updatedPromotion = promotion with { GateResults = gateResults }; + await _promotionStore.SaveAsync(updatedPromotion, ct); + + // Publish event + await _eventPublisher.PublishAsync(new PromotionDecisionMade( + promotionId, + promotion.TenantId, + result.Outcome, + result.CanProceed, + gateResults.Count(g => g.Passed), + gateResults.Count(g => !g.Passed), + _timeProvider.GetUtcNow() + ), ct); + + _logger.LogInformation( + "Decision for promotion {PromotionId}: {Outcome} (proceed={CanProceed}) in {Duration}ms", + promotionId, + result.Outcome, + result.CanProceed, + sw.ElapsedMilliseconds); + + return result; + } + + private async Task> EvaluateGatesAsync( + ImmutableArray gateNames, + GateContext context, + CancellationToken ct) + { + var results = new List(); + + // Evaluate gates in parallel + var tasks = gateNames.Select(name => + _gateEvaluator.EvaluateAsync(name, context, ct)); + + var gateResults = await Task.WhenAll(tasks); + return gateResults.ToImmutableArray(); + } + + private async Task 
GetGateConfigAsync( + Guid environmentId, + CancellationToken ct) + { + var environment = await _environmentService.GetAsync(environmentId, ct) + ?? throw new EnvironmentNotFoundException(environmentId); + + // Get configured gates for this environment + var configuredGates = await _environmentService.GetGatesAsync(environmentId, ct); + + return new EnvironmentGateConfig + { + EnvironmentId = environmentId, + RequiredGates = configuredGates.Select(g => g.GateName).ToImmutableArray(), + RequiredApprovals = environment.RequiredApprovals, + RequireSeparationOfDuties = environment.RequireSeparationOfDuties, + AllGatesMustPass = true // Configurable in future + }; + } + + private static GateContext BuildGateContext(Promotion promotion) => + new() + { + PromotionId = promotion.Id, + ReleaseId = promotion.ReleaseId, + ReleaseName = promotion.ReleaseName, + SourceEnvironmentId = promotion.SourceEnvironmentId, + TargetEnvironmentId = promotion.TargetEnvironmentId, + TargetEnvironmentName = promotion.TargetEnvironmentName, + Config = ImmutableDictionary.Empty, + RequestedBy = promotion.RequestedBy, + RequestedAt = promotion.RequestedAt + }; + + public async Task GetDecisionRecordAsync( + Guid promotionId, + CancellationToken ct = default) + { + return await _decisionRecorder.GetLatestAsync(promotionId, ct) + ?? 
throw new DecisionRecordNotFoundException(promotionId); + } +} +``` + +### DecisionRules + +```csharp +namespace StellaOps.ReleaseOrchestrator.Promotion.Decision; + +public sealed class DecisionRules +{ + public DecisionOutcomeResult Evaluate( + ImmutableArray gateResults, + ApprovalStatus approvalStatus, + EnvironmentGateConfig config) + { + // Check for blocking gate failures first + var blockingFailures = gateResults.Where(g => !g.Passed && g.Blocking).ToList(); + if (blockingFailures.Count > 0) + { + return new DecisionOutcomeResult( + Decision: DecisionOutcome.Deny, + CanProceed: false, + BlockingReason: $"Blocked by gates: {string.Join(", ", blockingFailures.Select(g => g.GateName))}" + ); + } + + // Check for async gates waiting for callback + var pendingGates = gateResults + .Where(g => !g.Passed && g.Details.ContainsKey("waitingForConfirmation")) + .ToList(); + + if (pendingGates.Count > 0) + { + return new DecisionOutcomeResult( + Decision: DecisionOutcome.PendingGate, + CanProceed: false, + BlockingReason: $"Waiting for: {string.Join(", ", pendingGates.Select(g => g.GateName))}" + ); + } + + // Check if all gates must pass + if (config.AllGatesMustPass) + { + var failedGates = gateResults.Where(g => !g.Passed).ToList(); + if (failedGates.Count > 0) + { + return new DecisionOutcomeResult( + Decision: DecisionOutcome.Deny, + CanProceed: false, + BlockingReason: $"Failed gates: {string.Join(", ", failedGates.Select(g => g.GateName))}" + ); + } + } + + // Check approval status + if (approvalStatus.IsRejected) + { + return new DecisionOutcomeResult( + Decision: DecisionOutcome.Deny, + CanProceed: false, + BlockingReason: "Promotion was rejected" + ); + } + + if (!approvalStatus.IsApproved && config.RequiredApprovals > 0) + { + return new DecisionOutcomeResult( + Decision: DecisionOutcome.PendingApproval, + CanProceed: false, + BlockingReason: $"Awaiting approvals: {approvalStatus.CurrentApprovals}/{config.RequiredApprovals}" + ); + } + + // All checks passed 
+ return new DecisionOutcomeResult( + Decision: DecisionOutcome.Allow, + CanProceed: true, + BlockingReason: null + ); + } +} + +public sealed record DecisionOutcomeResult( + DecisionOutcome Decision, + bool CanProceed, + string? BlockingReason +); +``` + +### DecisionRecorder + +```csharp +namespace StellaOps.ReleaseOrchestrator.Promotion.Decision; + +public sealed class DecisionRecorder +{ + private readonly IDecisionRecordStore _store; + private readonly TimeProvider _timeProvider; + private readonly IGuidGenerator _guidGenerator; + private readonly ILogger _logger; + + public async Task RecordAsync( + Promotion promotion, + DecisionResult result, + EnvironmentGateConfig config, + CancellationToken ct = default) + { + var record = new DecisionRecord + { + Id = _guidGenerator.NewGuid(), + PromotionId = promotion.Id, + TenantId = promotion.TenantId, + Outcome = result.Outcome, + OutcomeReason = result.BlockingReason ?? "All requirements met", + GateResults = result.GateResults, + Approvals = promotion.Approvals, + GateConfig = config, + EvaluatedAt = _timeProvider.GetUtcNow(), + EvaluatedBy = Guid.Empty, // System evaluation + EvidenceDigest = ComputeEvidenceDigest(result) + }; + + await _store.SaveAsync(record, ct); + + _logger.LogDebug( + "Recorded decision {DecisionId} for promotion {PromotionId}: {Outcome}", + record.Id, + promotion.Id, + result.Outcome); + } + + public async Task GetLatestAsync( + Guid promotionId, + CancellationToken ct = default) + { + return await _store.GetLatestAsync(promotionId, ct); + } + + public async Task> GetHistoryAsync( + Guid promotionId, + CancellationToken ct = default) + { + return await _store.ListByPromotionAsync(promotionId, ct); + } + + private static string ComputeEvidenceDigest(DecisionResult result) + { + // Create canonical representation and hash + var evidence = new + { + result.PromotionId, + result.Outcome, + result.EvaluatedAt, + Gates = result.GateResults.Select(g => new + { + g.GateName, + g.Passed, + g.Message 
+ }).OrderBy(g => g.GateName), + Approvals = result.ApprovalStatus.Approvals.Select(a => new + { + a.UserId, + a.Decision, + a.DecidedAt + }).OrderBy(a => a.DecidedAt) + }; + + var json = CanonicalJsonSerializer.Serialize(evidence); + var hash = SHA256.HashData(Encoding.UTF8.GetBytes(json)); + return $"sha256:{Convert.ToHexString(hash).ToLowerInvariant()}"; + } +} +``` + +### DecisionNotifier + +```csharp +namespace StellaOps.ReleaseOrchestrator.Promotion.Decision; + +public sealed class DecisionNotifier +{ + private readonly INotificationService _notificationService; + private readonly IPromotionStore _promotionStore; + private readonly ILogger _logger; + + public async Task NotifyDecisionAsync( + DecisionResult result, + CancellationToken ct = default) + { + var promotion = await _promotionStore.GetAsync(result.PromotionId, ct); + if (promotion is null) + return; + + var notification = result.Outcome switch + { + DecisionOutcome.Allow => BuildAllowNotification(promotion, result), + DecisionOutcome.Deny => BuildDenyNotification(promotion, result), + DecisionOutcome.PendingApproval => BuildPendingApprovalNotification(promotion, result), + _ => null + }; + + if (notification is not null) + { + await _notificationService.SendAsync(notification, ct); + + _logger.LogInformation( + "Sent {Outcome} notification for promotion {PromotionId}", + result.Outcome, + result.PromotionId); + } + } + + private static NotificationRequest BuildAllowNotification( + Promotion promotion, + DecisionResult result) => + new() + { + Channel = "slack", + Title = $"Promotion Approved: {promotion.ReleaseName}", + Message = $"Release '{promotion.ReleaseName}' has been approved for deployment to {promotion.TargetEnvironmentName}.", + Severity = NotificationSeverity.Info, + Metadata = new Dictionary + { + ["promotionId"] = promotion.Id.ToString(), + ["outcome"] = "allow" + } + }; + + private static NotificationRequest BuildDenyNotification( + Promotion promotion, + DecisionResult result) => + 
new() + { + Channel = "slack", + Title = $"Promotion Blocked: {promotion.ReleaseName}", + Message = $"Release '{promotion.ReleaseName}' promotion to {promotion.TargetEnvironmentName} was blocked.\n\nReason: {result.BlockingReason}", + Severity = NotificationSeverity.Warning, + Metadata = new Dictionary + { + ["promotionId"] = promotion.Id.ToString(), + ["outcome"] = "deny" + } + }; + + private static NotificationRequest BuildPendingApprovalNotification( + Promotion promotion, + DecisionResult result) => + new() + { + Channel = "slack", + Title = $"Approval Required: {promotion.ReleaseName}", + Message = $"Release '{promotion.ReleaseName}' is awaiting approval for deployment to {promotion.TargetEnvironmentName}.\n\n{result.BlockingReason}", + Severity = NotificationSeverity.Info, + Metadata = new Dictionary + { + ["promotionId"] = promotion.Id.ToString(), + ["outcome"] = "pending_approval" + } + }; +} +``` + +### Domain Events + +```csharp +namespace StellaOps.ReleaseOrchestrator.Promotion.Events; + +public sealed record PromotionDecisionMade( + Guid PromotionId, + Guid TenantId, + DecisionOutcome Outcome, + bool CanProceed, + int PassedGates, + int FailedGates, + DateTimeOffset DecidedAt +) : IDomainEvent; + +public sealed record PromotionReadyForDeployment( + Guid PromotionId, + Guid TenantId, + Guid ReleaseId, + Guid TargetEnvironmentId, + DateTimeOffset ReadyAt +) : IDomainEvent; +``` + +--- + +## Acceptance Criteria + +- [ ] Evaluate all configured gates +- [ ] Combine gate results with approvals +- [ ] Deny on blocking gate failure +- [ ] Pending on approval required +- [ ] Allow when all requirements met +- [ ] Record decision with evidence +- [ ] Compute evidence digest +- [ ] Notify on decision +- [ ] Support decision history +- [ ] Unit test coverage >=85% + +--- + +## Test Plan + +### Unit Tests + +| Test | Description | +|------|-------------| +| `Evaluate_AllGatesPass_AllApprovals_Allows` | Allow case | +| `Evaluate_BlockingGateFails_Denies` | Deny case 
| +| `Evaluate_PendingApprovals_ReturnsPending` | Pending case | +| `DecisionRules_AllMustPass_AnyFails_Denies` | Rule logic | +| `DecisionRecorder_ComputesDigest` | Evidence hash | +| `DecisionRecorder_SavesHistory` | History tracking | + +### Integration Tests + +| Test | Description | +|------|-------------| +| `DecisionEngine_E2E` | Full evaluation flow | +| `DecisionHistory_E2E` | Multiple decisions | + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 106_002 Approval Gateway | Internal | TODO | +| 106_003 Gate Registry | Internal | TODO | +| 106_004 Security Gate | Internal | TODO | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| IDecisionEngine | TODO | | +| DecisionEngine | TODO | | +| DecisionRules | TODO | | +| DecisionRecorder | TODO | | +| DecisionNotifier | TODO | | +| DecisionResult model | TODO | | +| DecisionRecord model | TODO | | +| IDecisionRecordStore | TODO | | +| DecisionRecordStore | TODO | | +| Domain events | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_107_000_INDEX_deployment_execution.md b/docs/implplan/SPRINT_20260110_107_000_INDEX_deployment_execution.md new file mode 100644 index 000000000..71abb47f8 --- /dev/null +++ b/docs/implplan/SPRINT_20260110_107_000_INDEX_deployment_execution.md @@ -0,0 +1,254 @@ +# SPRINT INDEX: Phase 7 - Deployment Execution + +> **Epic:** Release Orchestrator +> **Phase:** 7 - Deployment Execution +> **Batch:** 107 +> **Status:** TODO +> **Parent:** [100_000_INDEX](SPRINT_20260110_100_000_INDEX_release_orchestrator.md) + +--- + +## Overview + +Phase 7 implements the Deployment Execution system - orchestrating the actual deployment of releases to targets via agents. 
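As an illustrative aid to the phase overview above, the batched execution that the deployment strategies perform can be sketched in a few lines. This is a non-normative sketch (Python used purely for brevity); the `plan_batches` helper and the percentage-string batch-size convention are assumptions drawn from the `BatchSize = "25%"` default in this spec, not an existing StellaOps API.

```python
# Illustrative sketch: partition deployment targets into ordered batches.
# A batch size of "25%" means roughly a quarter of the targets per wave.
import math

def plan_batches(targets: list[str], batch_size: str = "25%") -> list[list[str]]:
    """Split targets into sequential batches for rolling-style execution."""
    if batch_size.endswith("%"):
        # Percentage of the fleet per batch, always at least one target.
        per_batch = max(1, math.ceil(len(targets) * int(batch_size[:-1]) / 100))
    else:
        # Absolute count per batch.
        per_batch = max(1, int(batch_size))
    return [targets[i:i + per_batch] for i in range(0, len(targets), per_batch)]

batches = plan_batches(["target-1", "target-2", "target-3"], "25%")
# With three targets and 25%, each batch holds a single target.
```

A rolling strategy would drain these batches in order, gating each on health checks; an all-at-once strategy is the degenerate case of a single batch.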
+ +### Objectives + +- Deploy orchestrator coordinates multi-target deployments +- Target executor dispatches tasks to agents +- Artifact generator creates deployment artifacts +- Rollback manager handles failure recovery +- Deployment strategies (rolling, blue-green, canary) + +--- + +## Sprint Structure + +| Sprint ID | Title | Module | Status | Dependencies | +|-----------|-------|--------|--------|--------------| +| 107_001 | Deploy Orchestrator | DEPLOY | TODO | 105_003, 106_005 | +| 107_002 | Target Executor | DEPLOY | TODO | 107_001, 103_002 | +| 107_003 | Artifact Generator | DEPLOY | TODO | 107_001 | +| 107_004 | Rollback Manager | DEPLOY | TODO | 107_002 | +| 107_005 | Deployment Strategies | DEPLOY | TODO | 107_002 | + +--- + +## Architecture + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ DEPLOYMENT EXECUTION │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ DEPLOY ORCHESTRATOR (107_001) │ │ +│ │ │ │ +│ │ ┌─────────────────────────────────────────────────────────────────┐ │ │ +│ │ │ Deployment Job │ │ │ +│ │ │ promotion_id: uuid │ │ │ +│ │ │ strategy: rolling │ │ │ +│ │ │ targets: [target-1, target-2, target-3] │ │ │ +│ │ │ status: deploying │ │ │ +│ │ └─────────────────────────────────────────────────────────────────┘ │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ TARGET EXECUTOR (107_002) │ │ +│ │ │ │ +│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │ +│ │ │ Target 1 │ │ Target 2 │ │ Target 3 │ │ │ +│ │ │ ✓ Done │ │ ⟳ Running │ │ ○ Pending │ │ │ +│ │ └─────────────┘ └─────────────┘ └─────────────┘ │ │ +│ │ │ │ +│ │ Task dispatch via gRPC to agents │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ ARTIFACT GENERATOR 
(107_003) │ │ +│ │ │ │ +│ │ Generated artifacts for each deployment: │ │ +│ │ ├── compose.stella.lock.yml (digested compose file) │ │ +│ │ ├── stella.version.json (version sticker) │ │ +│ │ └── deployment-manifest.json (full deployment record) │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ ROLLBACK MANAGER (107_004) │ │ +│ │ │ │ +│ │ On failure: │ │ +│ │ 1. Stop pending tasks │ │ +│ │ 2. Rollback completed targets to previous version │ │ +│ │ 3. Generate rollback evidence │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ DEPLOYMENT STRATEGIES (107_005) │ │ +│ │ │ │ +│ │ Rolling: [■■□□□] → [■■■□□] → [■■■■□] → [■■■■■] │ │ +│ │ Blue-Green: [■■■■■] ──swap──► [□□□□□] (instant cutover) │ │ +│ │ Canary: [■□□□□] → [■■□□□] → [■■■□□] → [■■■■■] (gradual) │ │ +│ │ All-at-once: [□□□□□] → [■■■■■] (simultaneous) │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +--- + +## Deliverables Summary + +### 107_001: Deploy Orchestrator + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `IDeployOrchestrator` | Interface | Deployment coordination | +| `DeployOrchestrator` | Class | Implementation | +| `DeploymentJob` | Model | Job entity | +| `DeploymentScheduler` | Class | Task scheduling | + +### 107_002: Target Executor + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `ITargetExecutor` | Interface | Target deployment | +| `TargetExecutor` | Class | Implementation | +| `DeploymentTask` | Model | Per-target task | +| `AgentDispatcher` | Class | gRPC task dispatch | + +### 107_003: Artifact Generator + +| Deliverable | Type | Description | 
+|-------------|------|-------------|
+| `IArtifactGenerator` | Interface | Artifact creation |
+| `ComposeLockGenerator` | Class | Digest-locked compose |
+| `VersionStickerGenerator` | Class | stella.version.json |
+| `DeploymentManifestGenerator` | Class | Full manifest |
+
+### 107_004: Rollback Manager
+
+| Deliverable | Type | Description |
+|-------------|------|-------------|
+| `IRollbackManager` | Interface | Rollback operations |
+| `RollbackManager` | Class | Implementation |
+| `RollbackPlan` | Model | Rollback strategy |
+| `RollbackExecutor` | Class | Execute rollback |
+
+### 107_005: Deployment Strategies
+
+| Deliverable | Type | Description |
+|-------------|------|-------------|
+| `IDeploymentStrategy` | Interface | Strategy contract |
+| `RollingStrategy` | Strategy | Rolling deployment |
+| `BlueGreenStrategy` | Strategy | Blue-green deployment |
+| `CanaryStrategy` | Strategy | Canary deployment |
+| `AllAtOnceStrategy` | Strategy | Simultaneous deployment |
+
+---
+
+## Key Interfaces
+
+```csharp
+public interface IDeployOrchestrator
+{
+    Task<DeploymentJob> StartAsync(Guid promotionId, DeploymentOptions options, CancellationToken ct);
+    Task<DeploymentJob?> GetJobAsync(Guid jobId, CancellationToken ct);
+    Task CancelAsync(Guid jobId, CancellationToken ct);
+    Task<DeploymentJob> WaitForCompletionAsync(Guid jobId, CancellationToken ct);
+}
+
+public interface ITargetExecutor
+{
+    Task<DeploymentTask> DeployToTargetAsync(Guid jobId, Guid targetId, DeploymentPayload payload, CancellationToken ct);
+    Task<DeploymentTask?> GetTaskAsync(Guid taskId, CancellationToken ct);
+}
+
+public interface IDeploymentStrategy
+{
+    string Name { get; }
+    Task<ImmutableArray<DeploymentBatch>> PlanAsync(DeploymentJob job, CancellationToken ct);
+    Task<bool> ShouldProceedAsync(DeploymentBatch completedBatch, CancellationToken ct);
+}
+
+public interface IRollbackManager
+{
+    Task<RollbackPlan> PlanAsync(Guid jobId, CancellationToken ct);
+    Task ExecuteAsync(RollbackPlan plan, CancellationToken ct);
+}
+```
+
+---
+
+## Deployment Flow
+
+```
+┌──────────────────────────────────────────────────────────────────────────────┐ +│ DEPLOYMENT FLOW │ +│ │ +│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ +│ │ Promotion │───►│ Decision │───►│ Deploy │───►│ Generate │ │ +│ │ Approved │ │ Allow │ │ Start │ │ Artifacts │ │ +│ └─────────────┘ └─────────────┘ └─────────────┘ └──────┬──────┘ │ +│ │ │ +│ ┌─────────────────────────────────────────────┘ │ +│ │ │ +│ ▼ │ +│ ┌─────────────────────────────────────────────────────────────────────────┐│ +│ │ Strategy Execution ││ +│ │ ││ +│ │ Batch 1 Batch 2 Batch 3 ││ +│ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ ││ +│ │ │Target-1 │ ──► │Target-2 │ ──► │Target-3 │ ││ +│ │ │ ✓ Done │ │ ✓ Done │ │ ⟳ Active │ ││ +│ │ └─────────┘ └─────────┘ └─────────┘ ││ +│ │ │ │ │ ││ +│ │ ▼ ▼ ▼ ││ +│ │ Health Check Health Check Health Check ││ +│ │ │ │ │ ││ +│ │ ▼ ▼ ▼ ││ +│ │ Write Sticker Write Sticker Write Sticker ││ +│ └─────────────────────────────────────────────────────────────────────────┘│ +│ │ +│ ┌─────────────────────────────────────────────────────┐ │ +│ │ On Failure │ │ +│ │ │ │ +│ │ 1. Stop pending batches │ │ +│ │ 2. Rollback completed targets │ │ +│ │ 3. Generate rollback evidence │ │ +│ │ 4. 
Update promotion status │ │ +│ └─────────────────────────────────────────────────────┘ │ +│ │ +└──────────────────────────────────────────────────────────────────────────────┘ +``` + +--- + +## Dependencies + +| Module | Purpose | +|--------|---------| +| 105_003 Workflow Engine | Workflow execution | +| 106_005 Decision Engine | Deployment approval | +| 103_002 Target Registry | Target information | +| 108_* Agents | Task execution | + +--- + +## Acceptance Criteria + +- [ ] Deployment job created from promotion +- [ ] Tasks dispatched to agents +- [ ] Rolling deployment works +- [ ] Blue-green deployment works +- [ ] Canary deployment works +- [ ] Artifacts generated for each target +- [ ] Rollback restores previous version +- [ ] Health checks gate progression +- [ ] Unit test coverage ≥80% + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Phase 7 index created | diff --git a/docs/implplan/SPRINT_20260110_107_001_DEPLOY_orchestrator.md b/docs/implplan/SPRINT_20260110_107_001_DEPLOY_orchestrator.md new file mode 100644 index 000000000..666a70b9a --- /dev/null +++ b/docs/implplan/SPRINT_20260110_107_001_DEPLOY_orchestrator.md @@ -0,0 +1,410 @@ +# SPRINT: Deploy Orchestrator + +> **Sprint ID:** 107_001 +> **Module:** DEPLOY +> **Phase:** 7 - Deployment Execution +> **Status:** TODO +> **Parent:** [107_000_INDEX](SPRINT_20260110_107_000_INDEX_deployment_execution.md) + +--- + +## Overview + +Implement the Deploy Orchestrator for coordinating multi-target deployments. 
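The progress tracking named in the objectives below is essentially a roll-up over per-target task states. A minimal sketch of that roll-up (Python for brevity; the status names mirror `DeploymentTaskStatus` from this spec, everything else is illustrative):

```python
# Sketch of the job-progress roll-up implied by the DeploymentJob model:
# progress is completed tasks over total tasks, as a percentage.

def progress_percent(task_statuses: list[str]) -> float:
    """Return completion percentage; an empty task list reports 0."""
    if not task_statuses:
        return 0.0
    completed = sum(1 for s in task_statuses if s == "Completed")
    return completed / len(task_statuses) * 100

print(progress_percent(["Completed", "Running", "Pending", "Completed"]))  # 50.0
```

Note the empty-list guard: it matches the `TotalTaskCount > 0` check in the C# model and avoids a division by zero before any tasks are scheduled.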
+ +### Objectives + +- Create deployment jobs from approved promotions +- Coordinate deployment across multiple targets +- Track deployment progress and status +- Support deployment cancellation + +### Working Directory + +``` +src/ReleaseOrchestrator/ +├── __Libraries/ +│ └── StellaOps.ReleaseOrchestrator.Deployment/ +│ ├── Orchestrator/ +│ │ ├── IDeployOrchestrator.cs +│ │ ├── DeployOrchestrator.cs +│ │ ├── DeploymentCoordinator.cs +│ │ └── DeploymentScheduler.cs +│ ├── Store/ +│ │ ├── IDeploymentJobStore.cs +│ │ └── DeploymentJobStore.cs +│ └── Models/ +│ ├── DeploymentJob.cs +│ ├── DeploymentOptions.cs +│ └── DeploymentStatus.cs +└── __Tests/ +``` + +--- + +## Deliverables + +### IDeployOrchestrator Interface + +```csharp +namespace StellaOps.ReleaseOrchestrator.Deployment.Orchestrator; + +public interface IDeployOrchestrator +{ + Task StartAsync(Guid promotionId, DeploymentOptions options, CancellationToken ct = default); + Task GetJobAsync(Guid jobId, CancellationToken ct = default); + Task> ListJobsAsync(DeploymentJobFilter? filter = null, CancellationToken ct = default); + Task CancelAsync(Guid jobId, string? reason = null, CancellationToken ct = default); + Task WaitForCompletionAsync(Guid jobId, TimeSpan? timeout = null, CancellationToken ct = default); + Task GetProgressAsync(Guid jobId, CancellationToken ct = default); +} + +public sealed record DeploymentOptions( + DeploymentStrategy Strategy = DeploymentStrategy.Rolling, + string? BatchSize = "25%", + bool WaitForHealthCheck = true, + bool RollbackOnFailure = true, + TimeSpan? Timeout = null, + Guid? WorkflowRunId = null, + string? 
CallbackToken = null
+);
+
+public enum DeploymentStrategy
+{
+    Rolling,
+    BlueGreen,
+    Canary,
+    AllAtOnce
+}
+```
+
+### DeploymentJob Model
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Deployment.Models;
+
+public sealed record DeploymentJob
+{
+    public required Guid Id { get; init; }
+    public required Guid TenantId { get; init; }
+    public required Guid PromotionId { get; init; }
+    public required Guid ReleaseId { get; init; }
+    public required string ReleaseName { get; init; }
+    public required Guid EnvironmentId { get; init; }
+    public required string EnvironmentName { get; init; }
+    public required DeploymentStatus Status { get; init; }
+    public required DeploymentStrategy Strategy { get; init; }
+    public required DeploymentOptions Options { get; init; }
+    public required ImmutableArray<DeploymentTask> Tasks { get; init; }
+    public string? FailureReason { get; init; }
+    public string? CancelReason { get; init; }
+    public DateTimeOffset StartedAt { get; init; }
+    public DateTimeOffset? CompletedAt { get; init; }
+    public Guid StartedBy { get; init; }
+    public Guid? RollbackJobId { get; init; }
+
+    public TimeSpan? Duration => CompletedAt.HasValue
+        ? CompletedAt.Value - StartedAt
+        : null;
+
+    public int CompletedTaskCount => Tasks.Count(t => t.Status == DeploymentTaskStatus.Completed);
+    public int TotalTaskCount => Tasks.Length;
+    public double ProgressPercent => TotalTaskCount > 0
+        ? (double)CompletedTaskCount / TotalTaskCount * 100
+        : 0;
+}
+
+public enum DeploymentStatus
+{
+    Pending,
+    Running,
+    Completed,
+    Failed,
+    Cancelled,
+    RollingBack,
+    RolledBack
+}
+
+public sealed record DeploymentTask
+{
+    public required Guid Id { get; init; }
+    public required Guid TargetId { get; init; }
+    public required string TargetName { get; init; }
+    public required int BatchIndex { get; init; }
+    public required DeploymentTaskStatus Status { get; init; }
+    public string? AgentId { get; init; }
+    public DateTimeOffset? StartedAt { get; init; }
+    public DateTimeOffset?
CompletedAt { get; init; }
+    public string? Error { get; init; }
+    public ImmutableDictionary<string, string> Result { get; init; } = ImmutableDictionary<string, string>.Empty;
+}
+
+public enum DeploymentTaskStatus
+{
+    Pending,
+    Running,
+    Completed,
+    Failed,
+    Skipped,
+    Cancelled
+}
+```
+
+### DeployOrchestrator Implementation
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Deployment.Orchestrator;
+
+public sealed class DeployOrchestrator : IDeployOrchestrator
+{
+    private readonly IDeploymentJobStore _jobStore;
+    private readonly IPromotionManager _promotionManager;
+    private readonly IReleaseManager _releaseManager;
+    private readonly ITargetRegistry _targetRegistry;
+    private readonly IDeploymentStrategyFactory _strategyFactory;
+    private readonly ITargetExecutor _targetExecutor;
+    private readonly IArtifactGenerator _artifactGenerator;
+    private readonly IEventPublisher _eventPublisher;
+    private readonly ITenantContext _tenantContext; // ambient tenant accessor (used below)
+    private readonly IUserContext _userContext;     // ambient user accessor (used below)
+    private readonly TimeProvider _timeProvider;
+    private readonly IGuidGenerator _guidGenerator;
+    private readonly ILogger<DeployOrchestrator> _logger;
+
+    public async Task<DeploymentJob> StartAsync(
+        Guid promotionId,
+        DeploymentOptions options,
+        CancellationToken ct = default)
+    {
+        var promotion = await _promotionManager.GetAsync(promotionId, ct)
+            ?? throw new PromotionNotFoundException(promotionId);
+
+        if (promotion.Status != PromotionStatus.Approved)
+        {
+            throw new PromotionNotApprovedException(promotionId);
+        }
+
+        var release = await _releaseManager.GetAsync(promotion.ReleaseId, ct)
+            ??
throw new ReleaseNotFoundException(promotion.ReleaseId); + + var targets = await _targetRegistry.ListHealthyAsync(promotion.TargetEnvironmentId, ct); + if (targets.Count == 0) + { + throw new NoHealthyTargetsException(promotion.TargetEnvironmentId); + } + + // Create deployment tasks for each target + var tasks = targets.Select((target, index) => new DeploymentTask + { + Id = _guidGenerator.NewGuid(), + TargetId = target.Id, + TargetName = target.Name, + BatchIndex = 0, // Will be set by strategy + Status = DeploymentTaskStatus.Pending + }).ToImmutableArray(); + + var job = new DeploymentJob + { + Id = _guidGenerator.NewGuid(), + TenantId = _tenantContext.TenantId, + PromotionId = promotionId, + ReleaseId = release.Id, + ReleaseName = release.Name, + EnvironmentId = promotion.TargetEnvironmentId, + EnvironmentName = promotion.TargetEnvironmentName, + Status = DeploymentStatus.Pending, + Strategy = options.Strategy, + Options = options, + Tasks = tasks, + StartedAt = _timeProvider.GetUtcNow(), + StartedBy = _userContext.UserId + }; + + await _jobStore.SaveAsync(job, ct); + + // Update promotion status + await _promotionManager.UpdateStatusAsync(promotionId, PromotionStatus.Deploying, ct); + + await _eventPublisher.PublishAsync(new DeploymentJobStarted( + job.Id, + job.TenantId, + job.ReleaseName, + job.EnvironmentName, + job.Strategy, + targets.Count, + _timeProvider.GetUtcNow() + ), ct); + + _logger.LogInformation( + "Started deployment job {JobId} for release {Release} to {Environment} with {TargetCount} targets", + job.Id, release.Name, promotion.TargetEnvironmentName, targets.Count); + + // Start deployment execution + _ = ExecuteDeploymentAsync(job.Id, ct); + + return job; + } + + private async Task ExecuteDeploymentAsync(Guid jobId, CancellationToken ct) + { + try + { + var job = await _jobStore.GetAsync(jobId, ct); + if (job is null) return; + + job = job with { Status = DeploymentStatus.Running }; + await _jobStore.SaveAsync(job, ct); + + // Get strategy and 
plan batches + var strategy = _strategyFactory.Create(job.Strategy); + var batches = await strategy.PlanAsync(job, ct); + + // Execute batches + foreach (var batch in batches) + { + job = await _jobStore.GetAsync(jobId, ct); + if (job is null || job.Status == DeploymentStatus.Cancelled) break; + + await ExecuteBatchAsync(job, batch, ct); + + // Check if should continue + if (!await strategy.ShouldProceedAsync(batch, ct)) + { + _logger.LogWarning("Strategy halted deployment after batch {BatchIndex}", batch.Index); + break; + } + } + + // Complete or fail + job = await _jobStore.GetAsync(jobId, ct); + if (job is not null && job.Status == DeploymentStatus.Running) + { + var allCompleted = job.Tasks.All(t => t.Status == DeploymentTaskStatus.Completed); + job = job with + { + Status = allCompleted ? DeploymentStatus.Completed : DeploymentStatus.Failed, + CompletedAt = _timeProvider.GetUtcNow() + }; + await _jobStore.SaveAsync(job, ct); + + await NotifyCompletionAsync(job, ct); + } + } + catch (Exception ex) + { + _logger.LogError(ex, "Deployment job {JobId} failed", jobId); + await FailJobAsync(jobId, ex.Message, ct); + } + } + + private async Task ExecuteBatchAsync(DeploymentJob job, DeploymentBatch batch, CancellationToken ct) + { + _logger.LogInformation("Executing batch {BatchIndex} with {TaskCount} tasks", + batch.Index, batch.TaskIds.Count); + + // Generate artifacts + var payload = await _artifactGenerator.GeneratePayloadAsync(job, ct); + + // Execute tasks in parallel within batch + var tasks = batch.TaskIds.Select(taskId => + _targetExecutor.DeployToTargetAsync(job.Id, taskId, payload, ct)); + + await Task.WhenAll(tasks); + } + + public async Task CancelAsync(Guid jobId, string? reason = null, CancellationToken ct = default) + { + var job = await _jobStore.GetAsync(jobId, ct) + ?? 
throw new DeploymentJobNotFoundException(jobId); + + if (job.Status != DeploymentStatus.Running && job.Status != DeploymentStatus.Pending) + { + throw new DeploymentJobNotCancellableException(jobId); + } + + job = job with + { + Status = DeploymentStatus.Cancelled, + CancelReason = reason, + CompletedAt = _timeProvider.GetUtcNow() + }; + + await _jobStore.SaveAsync(job, ct); + + await _eventPublisher.PublishAsync(new DeploymentJobCancelled( + jobId, job.TenantId, reason, _timeProvider.GetUtcNow() + ), ct); + } + + public async Task GetProgressAsync(Guid jobId, CancellationToken ct = default) + { + var job = await _jobStore.GetAsync(jobId, ct) + ?? throw new DeploymentJobNotFoundException(jobId); + + return new DeploymentProgress( + JobId: job.Id, + Status: job.Status, + TotalTargets: job.TotalTaskCount, + CompletedTargets: job.CompletedTaskCount, + FailedTargets: job.Tasks.Count(t => t.Status == DeploymentTaskStatus.Failed), + PendingTargets: job.Tasks.Count(t => t.Status == DeploymentTaskStatus.Pending), + ProgressPercent: job.ProgressPercent, + CurrentBatch: job.Tasks.Where(t => t.Status == DeploymentTaskStatus.Running).Select(t => t.BatchIndex).FirstOrDefault() + ); + } +} + +public sealed record DeploymentProgress( + Guid JobId, + DeploymentStatus Status, + int TotalTargets, + int CompletedTargets, + int FailedTargets, + int PendingTargets, + double ProgressPercent, + int CurrentBatch +); +``` + +--- + +## Acceptance Criteria + +- [ ] Create deployment job from promotion +- [ ] Coordinate multi-target deployment +- [ ] Track task progress per target +- [ ] Cancel running deployment +- [ ] Wait for deployment completion +- [ ] Report deployment progress +- [ ] Handle deployment failures +- [ ] Unit test coverage >=85% + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 106_005 Decision Engine | Internal | TODO | +| 103_002 Target Registry | Internal | TODO | +| 107_002 Target Executor | Internal | TODO | + +--- + +## 
Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| IDeployOrchestrator | TODO | | +| DeployOrchestrator | TODO | | +| DeploymentCoordinator | TODO | | +| DeploymentScheduler | TODO | | +| DeploymentJob model | TODO | | +| IDeploymentJobStore | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_107_002_DEPLOY_target_executor.md b/docs/implplan/SPRINT_20260110_107_002_DEPLOY_target_executor.md new file mode 100644 index 000000000..a4fd21d52 --- /dev/null +++ b/docs/implplan/SPRINT_20260110_107_002_DEPLOY_target_executor.md @@ -0,0 +1,367 @@ +# SPRINT: Target Executor + +> **Sprint ID:** 107_002 +> **Module:** DEPLOY +> **Phase:** 7 - Deployment Execution +> **Status:** TODO +> **Parent:** [107_000_INDEX](SPRINT_20260110_107_000_INDEX_deployment_execution.md) + +--- + +## Overview + +Implement the Target Executor for dispatching deployment tasks to agents. 
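+
+The objectives below call for timeout and retry handling, but the deliverables only show a per-attempt timeout. One way the executor could layer a bounded retry on top of agent dispatch is sketched here; the helper name, signature, and backoff policy are assumptions, not part of this sprint's contract.
+
+```csharp
+// Illustrative only: bounded retry around an agent dispatch delegate.
+// AgentTaskResult follows this sprint's deliverables; the policy is assumed.
+private static async Task<AgentTaskResult> DispatchWithRetryAsync(
+    Func<CancellationToken, Task<AgentTaskResult>> dispatch,
+    int maxAttempts,
+    TimeSpan perAttemptTimeout,
+    CancellationToken ct)
+{
+    for (var attempt = 1; ; attempt++)
+    {
+        using var timeoutCts = new CancellationTokenSource(perAttemptTimeout);
+        using var linked = CancellationTokenSource.CreateLinkedTokenSource(ct, timeoutCts.Token);
+        try
+        {
+            return await dispatch(linked.Token);
+        }
+        catch (Exception) when (attempt < maxAttempts && !ct.IsCancellationRequested)
+        {
+            // Transient failure or per-attempt timeout: back off exponentially, then retry.
+            await Task.Delay(TimeSpan.FromSeconds(Math.Pow(2, attempt)), ct);
+        }
+    }
+}
+```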
+
+### Objectives
+
+- Dispatch deployment tasks to agents via gRPC
+- Track task execution status
+- Handle task timeouts and retries
+- Collect task results and logs
+
+### Working Directory
+
+```
+src/ReleaseOrchestrator/
+├── __Libraries/
+│   └── StellaOps.ReleaseOrchestrator.Deployment/
+│       ├── Executor/
+│       │   ├── ITargetExecutor.cs
+│       │   ├── TargetExecutor.cs
+│       │   ├── AgentDispatcher.cs
+│       │   └── TaskResultCollector.cs
+│       └── Models/
+│           ├── DeploymentPayload.cs
+│           └── TaskResult.cs
+└── __Tests/
+```
+
+---
+
+## Deliverables
+
+### ITargetExecutor Interface
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Deployment.Executor;
+
+public interface ITargetExecutor
+{
+    Task<DeploymentTask> DeployToTargetAsync(
+        Guid jobId,
+        Guid taskId,
+        DeploymentPayload payload,
+        CancellationToken ct = default);
+
+    Task<DeploymentTask?> GetTaskAsync(Guid taskId, CancellationToken ct = default);
+    Task CancelTaskAsync(Guid taskId, CancellationToken ct = default);
+    Task<string?> GetTaskLogsAsync(Guid taskId, CancellationToken ct = default);
+}
+
+public sealed record DeploymentPayload
+{
+    public required Guid ReleaseId { get; init; }
+    public required string ReleaseName { get; init; }
+    public required ImmutableArray<DeploymentComponent> Components { get; init; }
+    public required string ComposeLock { get; init; }
+    public required string VersionSticker { get; init; }
+    public required string DeploymentManifest { get; init; }
+    public ImmutableDictionary<string, string> Variables { get; init; } = ImmutableDictionary<string, string>.Empty;
+}
+
+public sealed record DeploymentComponent(
+    string Name,
+    string Image,
+    string Digest,
+    ImmutableDictionary<string, string> Config
+);
+```
+
+### TargetExecutor Implementation
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Deployment.Executor;
+
+public sealed class TargetExecutor : ITargetExecutor
+{
+    private readonly IDeploymentJobStore _jobStore;
+    private readonly ITargetRegistry _targetRegistry;
+    private readonly IAgentManager _agentManager;
+    private readonly AgentDispatcher _dispatcher;
+    private readonly
TaskResultCollector _resultCollector; + private readonly IEventPublisher _eventPublisher; + private readonly TimeProvider _timeProvider; + private readonly ILogger _logger; + + public async Task DeployToTargetAsync( + Guid jobId, + Guid taskId, + DeploymentPayload payload, + CancellationToken ct = default) + { + var job = await _jobStore.GetAsync(jobId, ct) + ?? throw new DeploymentJobNotFoundException(jobId); + + var task = job.Tasks.FirstOrDefault(t => t.Id == taskId) + ?? throw new DeploymentTaskNotFoundException(taskId); + + var target = await _targetRegistry.GetAsync(task.TargetId, ct) + ?? throw new TargetNotFoundException(task.TargetId); + + if (target.AgentId is null) + { + throw new NoAgentAssignedException(target.Id); + } + + var agent = await _agentManager.GetAsync(target.AgentId.Value, ct); + if (agent?.Status != AgentStatus.Active) + { + throw new AgentNotActiveException(target.AgentId.Value); + } + + // Update task status + task = task with + { + Status = DeploymentTaskStatus.Running, + AgentId = agent.Id.ToString(), + StartedAt = _timeProvider.GetUtcNow() + }; + + await UpdateTaskAsync(job, task, ct); + + await _eventPublisher.PublishAsync(new DeploymentTaskStarted( + taskId, jobId, target.Name, agent.Name, _timeProvider.GetUtcNow() + ), ct); + + try + { + // Dispatch to agent + var agentTask = BuildAgentTask(target, payload); + var result = await _dispatcher.DispatchAsync(agent.Id, agentTask, ct); + + // Collect results + task = await _resultCollector.CollectAsync(task, result, ct); + + if (task.Status == DeploymentTaskStatus.Completed) + { + await _eventPublisher.PublishAsync(new DeploymentTaskCompleted( + taskId, jobId, target.Name, task.CompletedAt!.Value - task.StartedAt!.Value, + _timeProvider.GetUtcNow() + ), ct); + } + else + { + await _eventPublisher.PublishAsync(new DeploymentTaskFailed( + taskId, jobId, target.Name, task.Error ?? 
"Unknown error", + _timeProvider.GetUtcNow() + ), ct); + } + } + catch (Exception ex) + { + _logger.LogError(ex, "Deployment task {TaskId} failed for target {Target}", taskId, target.Name); + + task = task with + { + Status = DeploymentTaskStatus.Failed, + Error = ex.Message, + CompletedAt = _timeProvider.GetUtcNow() + }; + + await _eventPublisher.PublishAsync(new DeploymentTaskFailed( + taskId, jobId, target.Name, ex.Message, _timeProvider.GetUtcNow() + ), ct); + } + + await UpdateTaskAsync(job, task, ct); + return task; + } + + private static AgentDeploymentTask BuildAgentTask(Target target, DeploymentPayload payload) + { + return new AgentDeploymentTask + { + Type = target.Type switch + { + TargetType.DockerHost => AgentTaskType.DockerDeploy, + TargetType.ComposeHost => AgentTaskType.ComposeDeploy, + _ => throw new UnsupportedTargetTypeException(target.Type) + }, + Payload = new AgentDeploymentPayload + { + Components = payload.Components.Select(c => new AgentComponent + { + Name = c.Name, + Image = $"{c.Image}@{c.Digest}", + Config = c.Config + }).ToList(), + ComposeLock = payload.ComposeLock, + VersionSticker = payload.VersionSticker, + Variables = payload.Variables + } + }; + } + + private async Task UpdateTaskAsync(DeploymentJob job, DeploymentTask updatedTask, CancellationToken ct) + { + var tasks = job.Tasks.Select(t => t.Id == updatedTask.Id ? 
updatedTask : t).ToImmutableArray();
+        var updatedJob = job with { Tasks = tasks };
+        await _jobStore.SaveAsync(updatedJob, ct);
+    }
+}
+```
+
+### AgentDispatcher
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Deployment.Executor;
+
+public sealed class AgentDispatcher
+{
+    private readonly IAgentManager _agentManager;
+    private readonly ILogger<AgentDispatcher> _logger;
+    private readonly TimeSpan _defaultTimeout = TimeSpan.FromMinutes(30);
+
+    public async Task<AgentTaskResult> DispatchAsync(
+        Guid agentId,
+        AgentDeploymentTask task,
+        CancellationToken ct = default)
+    {
+        _logger.LogDebug("Dispatching task to agent {AgentId}", agentId);
+
+        using var timeoutCts = new CancellationTokenSource(_defaultTimeout);
+        using var linkedCts = CancellationTokenSource.CreateLinkedTokenSource(ct, timeoutCts.Token);
+
+        try
+        {
+            var result = await _agentManager.ExecuteTaskAsync(agentId, task, linkedCts.Token);
+
+            _logger.LogDebug(
+                "Agent {AgentId} completed task with status {Status}",
+                agentId,
+                result.Success ? "success" : "failure");
+
+            return result;
+        }
+        catch (OperationCanceledException) when (timeoutCts.IsCancellationRequested)
+        {
+            throw new AgentTaskTimeoutException(agentId, _defaultTimeout);
+        }
+    }
+}
+
+public sealed record AgentDeploymentTask
+{
+    public required AgentTaskType Type { get; init; }
+    public required AgentDeploymentPayload Payload { get; init; }
+}
+
+public enum AgentTaskType
+{
+    DockerDeploy,
+    ComposeDeploy,
+    DockerRollback,
+    ComposeRollback
+}
+
+public sealed record AgentDeploymentPayload
+{
+    public required IReadOnlyList<AgentComponent> Components { get; init; }
+    public required string ComposeLock { get; init; }
+    public required string VersionSticker { get; init; }
+    public IReadOnlyDictionary<string, string> Variables { get; init; } = new Dictionary<string, string>();
+}
+
+public sealed record AgentComponent
+{
+    public required string Name { get; init; }
+    public required string Image { get; init; }
+    public IReadOnlyDictionary<string, string> Config { get; init; } = new Dictionary<string, string>();
+}
+
+public sealed record
AgentTaskResult +{ + public bool Success { get; init; } + public string? Error { get; init; } + public IReadOnlyDictionary Outputs { get; init; } = new Dictionary(); + public string? Logs { get; init; } + public TimeSpan Duration { get; init; } +} +``` + +### TaskResultCollector + +```csharp +namespace StellaOps.ReleaseOrchestrator.Deployment.Executor; + +public sealed class TaskResultCollector +{ + private readonly TimeProvider _timeProvider; + private readonly ILogger _logger; + + public Task CollectAsync( + DeploymentTask task, + AgentTaskResult result, + CancellationToken ct = default) + { + var updatedTask = task with + { + Status = result.Success ? DeploymentTaskStatus.Completed : DeploymentTaskStatus.Failed, + Error = result.Error, + CompletedAt = _timeProvider.GetUtcNow(), + Result = result.Outputs.ToImmutableDictionary() + }; + + _logger.LogDebug( + "Collected result for task {TaskId}: {Status}", + task.Id, + updatedTask.Status); + + return Task.FromResult(updatedTask); + } +} +``` + +--- + +## Acceptance Criteria + +- [ ] Dispatch tasks to agents via gRPC +- [ ] Track task execution status +- [ ] Handle task timeouts +- [ ] Collect task results +- [ ] Collect task logs +- [ ] Cancel running tasks +- [ ] Support Docker and Compose targets +- [ ] Unit test coverage >=85% + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 107_001 Deploy Orchestrator | Internal | TODO | +| 103_002 Target Registry | Internal | TODO | +| 103_003 Agent Manager | Internal | TODO | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| ITargetExecutor | TODO | | +| TargetExecutor | TODO | | +| AgentDispatcher | TODO | | +| TaskResultCollector | TODO | | +| DeploymentPayload | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_107_003_DEPLOY_artifact_generator.md 
b/docs/implplan/SPRINT_20260110_107_003_DEPLOY_artifact_generator.md
new file mode 100644
index 000000000..19a776b28
--- /dev/null
+++ b/docs/implplan/SPRINT_20260110_107_003_DEPLOY_artifact_generator.md
@@ -0,0 +1,461 @@
+# SPRINT: Artifact Generator
+
+> **Sprint ID:** 107_003
+> **Module:** DEPLOY
+> **Phase:** 7 - Deployment Execution
+> **Status:** TODO
+> **Parent:** [107_000_INDEX](SPRINT_20260110_107_000_INDEX_deployment_execution.md)
+
+---
+
+## Overview
+
+Implement the Artifact Generator for creating deployment artifacts, including digest-locked compose files and version stickers.
+
+### Objectives
+
+- Generate digest-locked compose files
+- Create version sticker files (stella.version.json)
+- Generate deployment manifests
+- Support multiple artifact formats
+
+### Working Directory
+
+```
+src/ReleaseOrchestrator/
+├── __Libraries/
+│   └── StellaOps.ReleaseOrchestrator.Deployment/
+│       └── Artifact/
+│           ├── IArtifactGenerator.cs
+│           ├── ArtifactGenerator.cs
+│           ├── ComposeLockGenerator.cs
+│           ├── VersionStickerGenerator.cs
+│           └── DeploymentManifestGenerator.cs
+└── __Tests/
+```
+
+---
+
+## Deliverables
+
+### IArtifactGenerator Interface
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Deployment.Artifact;
+
+public interface IArtifactGenerator
+{
+    Task<DeploymentPayload> GeneratePayloadAsync(DeploymentJob job, CancellationToken ct = default);
+    Task<string> GenerateComposeLockAsync(Release release, CancellationToken ct = default);
+    Task<string> GenerateVersionStickerAsync(Release release, DeploymentJob job, CancellationToken ct = default);
+    Task<string> GenerateDeploymentManifestAsync(DeploymentJob job, CancellationToken ct = default);
+}
+```
+
+### ComposeLockGenerator
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Deployment.Artifact;
+
+public sealed class ComposeLockGenerator
+{
+    private readonly ILogger<ComposeLockGenerator> _logger;
+
+    public string Generate(Release release, ComposeTemplate?
template = null) + { + var services = new Dictionary(); + + foreach (var component in release.Components.OrderBy(c => c.OrderIndex)) + { + var service = new Dictionary + { + ["image"] = $"{GetFullImageRef(component)}@{component.Digest}", + ["labels"] = new Dictionary + { + ["stella.release.id"] = release.Id.ToString(), + ["stella.release.name"] = release.Name, + ["stella.component.id"] = component.ComponentId.ToString(), + ["stella.component.name"] = component.ComponentName, + ["stella.digest"] = component.Digest + } + }; + + // Add config from component + foreach (var (key, value) in component.Config) + { + service[key] = value; + } + + services[component.ComponentName] = service; + } + + var compose = new Dictionary + { + ["version"] = "3.8", + ["services"] = services, + ["x-stella"] = new Dictionary + { + ["release"] = new Dictionary + { + ["id"] = release.Id.ToString(), + ["name"] = release.Name, + ["manifestDigest"] = release.ManifestDigest ?? "" + }, + ["generated"] = TimeProvider.System.GetUtcNow().ToString("O") + } + }; + + // Merge with template if provided + if (template is not null) + { + compose = MergeWithTemplate(compose, template); + } + + var yaml = new SerializerBuilder() + .WithNamingConvention(CamelCaseNamingConvention.Instance) + .Build() + .Serialize(compose); + + _logger.LogDebug( + "Generated compose.stella.lock.yml for release {Release} with {Count} services", + release.Name, + services.Count); + + return yaml; + } + + private static string GetFullImageRef(ReleaseComponent component) + { + // Component config should include registry info + var registry = component.Config.GetValueOrDefault("registry", ""); + var repository = component.Config.GetValueOrDefault("repository", component.ComponentName); + return string.IsNullOrEmpty(registry) ? 
repository : $"{registry}/{repository}"; + } + + private static Dictionary MergeWithTemplate( + Dictionary generated, + ComposeTemplate template) + { + // Deep merge template with generated config + // Template provides networks, volumes, etc. + var merged = new Dictionary(generated); + + if (template.Networks is not null) + merged["networks"] = template.Networks; + + if (template.Volumes is not null) + merged["volumes"] = template.Volumes; + + // Merge service configs from template + if (template.ServiceDefaults is not null && merged["services"] is Dictionary services) + { + foreach (var (serviceName, serviceConfig) in services) + { + if (serviceConfig is Dictionary config) + { + foreach (var (key, value) in template.ServiceDefaults) + { + if (!config.ContainsKey(key)) + { + config[key] = value; + } + } + } + } + } + + return merged; + } +} + +public sealed record ComposeTemplate( + IReadOnlyDictionary? Networks, + IReadOnlyDictionary? Volumes, + IReadOnlyDictionary? ServiceDefaults +); +``` + +### VersionStickerGenerator + +```csharp +namespace StellaOps.ReleaseOrchestrator.Deployment.Artifact; + +public sealed class VersionStickerGenerator +{ + private readonly TimeProvider _timeProvider; + private readonly ILogger _logger; + + public string Generate(Release release, DeploymentJob job, Target target) + { + var sticker = new VersionSticker + { + SchemaVersion = "1.0", + Release = new ReleaseInfo + { + Id = release.Id.ToString(), + Name = release.Name, + ManifestDigest = release.ManifestDigest, + FinalizedAt = release.FinalizedAt?.ToString("O") + }, + Deployment = new DeploymentInfo + { + JobId = job.Id.ToString(), + EnvironmentId = job.EnvironmentId.ToString(), + EnvironmentName = job.EnvironmentName, + TargetId = target.Id.ToString(), + TargetName = target.Name, + Strategy = job.Strategy.ToString(), + DeployedAt = _timeProvider.GetUtcNow().ToString("O") + }, + Components = release.Components.Select(c => new ComponentInfo + { + Name = c.ComponentName, + Digest = 
c.Digest, + Tag = c.Tag, + SemVer = c.SemVer + }).ToList() + }; + + var json = JsonSerializer.Serialize(sticker, new JsonSerializerOptions + { + WriteIndented = true, + PropertyNamingPolicy = JsonNamingPolicy.CamelCase + }); + + _logger.LogDebug( + "Generated stella.version.json for release {Release} on target {Target}", + release.Name, + target.Name); + + return json; + } +} + +public sealed class VersionSticker +{ + public required string SchemaVersion { get; set; } + public required ReleaseInfo Release { get; set; } + public required DeploymentInfo Deployment { get; set; } + public required IReadOnlyList Components { get; set; } +} + +public sealed class ReleaseInfo +{ + public required string Id { get; set; } + public required string Name { get; set; } + public string? ManifestDigest { get; set; } + public string? FinalizedAt { get; set; } +} + +public sealed class DeploymentInfo +{ + public required string JobId { get; set; } + public required string EnvironmentId { get; set; } + public required string EnvironmentName { get; set; } + public required string TargetId { get; set; } + public required string TargetName { get; set; } + public required string Strategy { get; set; } + public required string DeployedAt { get; set; } +} + +public sealed class ComponentInfo +{ + public required string Name { get; set; } + public required string Digest { get; set; } + public string? Tag { get; set; } + public string? 
SemVer { get; set; } +} +``` + +### DeploymentManifestGenerator + +```csharp +namespace StellaOps.ReleaseOrchestrator.Deployment.Artifact; + +public sealed class DeploymentManifestGenerator +{ + private readonly IReleaseManager _releaseManager; + private readonly IEnvironmentService _environmentService; + private readonly IPromotionManager _promotionManager; + private readonly TimeProvider _timeProvider; + private readonly ILogger _logger; + + public async Task GenerateAsync(DeploymentJob job, CancellationToken ct = default) + { + var release = await _releaseManager.GetAsync(job.ReleaseId, ct); + var environment = await _environmentService.GetAsync(job.EnvironmentId, ct); + var promotion = await _promotionManager.GetAsync(job.PromotionId, ct); + + var manifest = new DeploymentManifest + { + SchemaVersion = "1.0", + Deployment = new DeploymentMetadata + { + JobId = job.Id.ToString(), + Strategy = job.Strategy.ToString(), + StartedAt = job.StartedAt.ToString("O"), + StartedBy = job.StartedBy.ToString() + }, + Release = new ReleaseMetadata + { + Id = release!.Id.ToString(), + Name = release.Name, + ManifestDigest = release.ManifestDigest, + FinalizedAt = release.FinalizedAt?.ToString("O"), + Components = release.Components.Select(c => new ComponentMetadata + { + Id = c.ComponentId.ToString(), + Name = c.ComponentName, + Digest = c.Digest, + Tag = c.Tag, + SemVer = c.SemVer + }).ToList() + }, + Environment = new EnvironmentMetadata + { + Id = environment!.Id.ToString(), + Name = environment.Name, + IsProduction = environment.IsProduction + }, + Promotion = promotion is not null ? 
new PromotionMetadata + { + Id = promotion.Id.ToString(), + RequestedBy = promotion.RequestedBy.ToString(), + RequestedAt = promotion.RequestedAt.ToString("O"), + Approvals = promotion.Approvals.Select(a => new ApprovalMetadata + { + UserId = a.UserId.ToString(), + UserName = a.UserName, + Decision = a.Decision.ToString(), + DecidedAt = a.DecidedAt.ToString("O") + }).ToList(), + GateResults = promotion.GateResults.Select(g => new GateResultMetadata + { + GateName = g.GateName, + Passed = g.Passed, + Message = g.Message + }).ToList() + } : null, + Targets = job.Tasks.Select(t => new TargetMetadata + { + Id = t.TargetId.ToString(), + Name = t.TargetName, + Status = t.Status.ToString() + }).ToList(), + GeneratedAt = _timeProvider.GetUtcNow().ToString("O") + }; + + var json = JsonSerializer.Serialize(manifest, new JsonSerializerOptions + { + WriteIndented = true, + PropertyNamingPolicy = JsonNamingPolicy.CamelCase + }); + + _logger.LogDebug("Generated deployment manifest for job {JobId}", job.Id); + + return json; + } +} + +// Manifest models +public sealed class DeploymentManifest +{ + public required string SchemaVersion { get; set; } + public required DeploymentMetadata Deployment { get; set; } + public required ReleaseMetadata Release { get; set; } + public required EnvironmentMetadata Environment { get; set; } + public PromotionMetadata? Promotion { get; set; } + public required IReadOnlyList Targets { get; set; } + public required string GeneratedAt { get; set; } +} + +// Additional metadata classes abbreviated for brevity... 
+``` + +### ArtifactGenerator (Coordinator) + +```csharp +namespace StellaOps.ReleaseOrchestrator.Deployment.Artifact; + +public sealed class ArtifactGenerator : IArtifactGenerator +{ + private readonly IReleaseManager _releaseManager; + private readonly ComposeLockGenerator _composeLockGenerator; + private readonly VersionStickerGenerator _versionStickerGenerator; + private readonly DeploymentManifestGenerator _manifestGenerator; + private readonly ILogger _logger; + + public async Task GeneratePayloadAsync( + DeploymentJob job, + CancellationToken ct = default) + { + var release = await _releaseManager.GetAsync(job.ReleaseId, ct) + ?? throw new ReleaseNotFoundException(job.ReleaseId); + + var composeLock = await GenerateComposeLockAsync(release, ct); + var versionSticker = await GenerateVersionStickerAsync(release, job, ct); + var manifest = await GenerateDeploymentManifestAsync(job, ct); + + var components = release.Components.Select(c => new DeploymentComponent( + c.ComponentName, + c.Config.GetValueOrDefault("image", c.ComponentName), + c.Digest, + c.Config + )).ToImmutableArray(); + + return new DeploymentPayload + { + ReleaseId = release.Id, + ReleaseName = release.Name, + Components = components, + ComposeLock = composeLock, + VersionSticker = versionSticker, + DeploymentManifest = manifest + }; + } +} +``` + +--- + +## Acceptance Criteria + +- [ ] Generate digest-locked compose files +- [ ] All images use digest references +- [ ] Generate stella.version.json stickers +- [ ] Generate deployment manifests +- [ ] Include all required metadata +- [ ] Merge with compose templates +- [ ] JSON/YAML formats valid +- [ ] Unit test coverage >=85% + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 107_001 Deploy Orchestrator | Internal | TODO | +| 104_003 Release Manager | Internal | TODO | +| YamlDotNet | NuGet | Available | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| 
+| IArtifactGenerator | TODO | | +| ArtifactGenerator | TODO | | +| ComposeLockGenerator | TODO | | +| VersionStickerGenerator | TODO | | +| DeploymentManifestGenerator | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_107_004_DEPLOY_rollback_manager.md b/docs/implplan/SPRINT_20260110_107_004_DEPLOY_rollback_manager.md new file mode 100644 index 000000000..b9c81f0de --- /dev/null +++ b/docs/implplan/SPRINT_20260110_107_004_DEPLOY_rollback_manager.md @@ -0,0 +1,461 @@ +# SPRINT: Rollback Manager + +> **Sprint ID:** 107_004 +> **Module:** DEPLOY +> **Phase:** 7 - Deployment Execution +> **Status:** TODO +> **Parent:** [107_000_INDEX](SPRINT_20260110_107_000_INDEX_deployment_execution.md) + +--- + +## Overview + +Implement the Rollback Manager for handling deployment failure recovery. + +### Objectives + +- Plan rollback strategy for failed deployments +- Execute rollback to previous release +- Track rollback progress and status +- Generate rollback evidence + +### Working Directory + +``` +src/ReleaseOrchestrator/ +├── __Libraries/ +│ └── StellaOps.ReleaseOrchestrator.Deployment/ +│ └── Rollback/ +│ ├── IRollbackManager.cs +│ ├── RollbackManager.cs +│ ├── RollbackPlanner.cs +│ ├── RollbackExecutor.cs +│ └── RollbackEvidenceGenerator.cs +└── __Tests/ +``` + +--- + +## Deliverables + +### IRollbackManager Interface + +```csharp +namespace StellaOps.ReleaseOrchestrator.Deployment.Rollback; + +public interface IRollbackManager +{ + Task PlanAsync(Guid jobId, CancellationToken ct = default); + Task ExecuteAsync(RollbackPlan plan, CancellationToken ct = default); + Task ExecuteAsync(Guid jobId, CancellationToken ct = default); + Task GetPlanAsync(Guid jobId, CancellationToken ct = default); + Task CanRollbackAsync(Guid jobId, CancellationToken ct = default); +} + +public sealed record RollbackPlan +{ + public required Guid Id { get; init; } + 
public required Guid FailedJobId { get; init; } + public required Guid TargetReleaseId { get; init; } + public required string TargetReleaseName { get; init; } + public required ImmutableArray<RollbackTarget> Targets { get; init; } + public required RollbackStrategy Strategy { get; init; } + public required DateTimeOffset PlannedAt { get; init; } +} + +public enum RollbackStrategy +{ + RedeployPrevious, // Redeploy the previous release + RestoreSnapshot, // Restore from snapshot if available + Manual // Requires manual intervention +} + +public sealed record RollbackTarget( + Guid TargetId, + string TargetName, + string CurrentDigest, + string RollbackToDigest, + RollbackTargetStatus Status +); + +public enum RollbackTargetStatus +{ + Pending, + RollingBack, + RolledBack, + Failed, + Skipped +} +``` + +### RollbackManager Implementation + +```csharp +namespace StellaOps.ReleaseOrchestrator.Deployment.Rollback; + +public sealed class RollbackManager : IRollbackManager +{ + private readonly IDeploymentJobStore _jobStore; + private readonly IReleaseHistory _releaseHistory; + private readonly IReleaseManager _releaseManager; + private readonly ITargetExecutor _targetExecutor; + private readonly IArtifactGenerator _artifactGenerator; + private readonly RollbackPlanner _planner; + private readonly RollbackEvidenceGenerator _evidenceGenerator; + private readonly IEventPublisher _eventPublisher; + private readonly TimeProvider _timeProvider; + private readonly IGuidGenerator _guidGenerator; + private readonly ILogger<RollbackManager> _logger; + + public async Task<RollbackPlan> PlanAsync(Guid jobId, CancellationToken ct = default) + { + var job = await _jobStore.GetAsync(jobId, ct) + ??
throw new DeploymentJobNotFoundException(jobId); + + if (job.Status != DeploymentStatus.Failed) + { + throw new RollbackNotRequiredException(jobId); + } + + // Find previous successful deployment + var previousRelease = await _releaseHistory.GetPreviousDeployedAsync( + job.EnvironmentId, job.ReleaseId, ct); + + if (previousRelease is null) + { + throw new NoPreviousReleaseException(job.EnvironmentId); + } + + var plan = await _planner.CreatePlanAsync(job, previousRelease, ct); + + _logger.LogInformation( + "Created rollback plan {PlanId} for job {JobId}: rollback to {Release}", + plan.Id, jobId, previousRelease.Name); + + return plan; + } + + public async Task<DeploymentJob> ExecuteAsync( + RollbackPlan plan, + CancellationToken ct = default) + { + var failedJob = await _jobStore.GetAsync(plan.FailedJobId, ct) + ?? throw new DeploymentJobNotFoundException(plan.FailedJobId); + + var targetRelease = await _releaseManager.GetAsync(plan.TargetReleaseId, ct) + ?? throw new ReleaseNotFoundException(plan.TargetReleaseId); + + // Update original job to rolling back + failedJob = failedJob with { Status = DeploymentStatus.RollingBack }; + await _jobStore.SaveAsync(failedJob, ct); + + await _eventPublisher.PublishAsync(new RollbackStarted( + plan.Id, plan.FailedJobId, plan.TargetReleaseId, + plan.TargetReleaseName, plan.Targets.Length, _timeProvider.GetUtcNow() + ), ct); + + try + { + // Generate rollback payload + var payload = await _artifactGenerator.GeneratePayloadAsync( + new DeploymentJob + { + Id = _guidGenerator.NewGuid(), + TenantId = failedJob.TenantId, + PromotionId = failedJob.PromotionId, + ReleaseId = targetRelease.Id, + ReleaseName = targetRelease.Name, + EnvironmentId = failedJob.EnvironmentId, + EnvironmentName = failedJob.EnvironmentName, + Status = DeploymentStatus.Running, + Strategy = DeploymentStrategy.AllAtOnce, + Options = new DeploymentOptions(), + Tasks = [], + StartedAt = _timeProvider.GetUtcNow(), + StartedBy = Guid.Empty + }, ct); + + // Execute rollback on
each target + foreach (var target in plan.Targets) + { + if (target.Status != RollbackTargetStatus.Pending) + continue; + + try + { + await ExecuteTargetRollbackAsync(failedJob, target, payload, ct); + } + catch (Exception ex) + { + _logger.LogError(ex, + "Rollback failed for target {Target}", + target.TargetName); + } + } + + // Update job status + failedJob = failedJob with + { + Status = DeploymentStatus.RolledBack, + RollbackJobId = plan.Id, + CompletedAt = _timeProvider.GetUtcNow() + }; + await _jobStore.SaveAsync(failedJob, ct); + + // Generate evidence + await _evidenceGenerator.GenerateAsync(plan, failedJob, ct); + + await _eventPublisher.PublishAsync(new RollbackCompleted( + plan.Id, plan.FailedJobId, plan.TargetReleaseName, + _timeProvider.GetUtcNow() + ), ct); + + _logger.LogInformation( + "Rollback completed for job {JobId} to release {Release}", + plan.FailedJobId, targetRelease.Name); + + return failedJob; + } + catch (Exception ex) + { + _logger.LogError(ex, "Rollback failed for job {JobId}", plan.FailedJobId); + + failedJob = failedJob with + { + Status = DeploymentStatus.Failed, + FailureReason = $"Rollback failed: {ex.Message}" + }; + await _jobStore.SaveAsync(failedJob, ct); + + await _eventPublisher.PublishAsync(new RollbackFailed( + plan.Id, plan.FailedJobId, ex.Message, _timeProvider.GetUtcNow() + ), ct); + + throw; + } + } + + private async Task ExecuteTargetRollbackAsync( + DeploymentJob job, + RollbackTarget target, + DeploymentPayload payload, + CancellationToken ct) + { + // Digests may be empty when the prior state is unknown, so guard the truncation. + _logger.LogInformation( + "Rolling back target {Target} from {Current} to {Previous}", + target.TargetName, + target.CurrentDigest[..Math.Min(16, target.CurrentDigest.Length)], + target.RollbackToDigest[..Math.Min(16, target.RollbackToDigest.Length)]); + + // Create a rollback task + var task = new DeploymentTask + { + Id = _guidGenerator.NewGuid(), + TargetId = target.TargetId, + TargetName = target.TargetName, + BatchIndex = 0, + Status = DeploymentTaskStatus.Pending + }; + + await _targetExecutor.DeployToTargetAsync(job.Id, task.Id, payload, ct);
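+ + // Illustrative follow-up (an assumption, not part of this sprint's spec): after + // dispatching the rollback, the executor could re-verify target health via the + // ITargetHealthChecker that the deployment strategies already use, before the + // target is counted as RolledBack. RollbackVerificationException is hypothetical: + // if (!await healthChecker.CheckTaskHealthAsync(task.Id, ct)) + // throw new RollbackVerificationException(target.TargetName);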
+ } + + public async Task<bool> CanRollbackAsync(Guid jobId, CancellationToken ct = default) + { + var job = await _jobStore.GetAsync(jobId, ct); + if (job is null) + return false; + + if (job.Status != DeploymentStatus.Failed) + return false; + + var previousRelease = await _releaseHistory.GetPreviousDeployedAsync( + job.EnvironmentId, job.ReleaseId, ct); + + return previousRelease is not null; + } +} +``` + +### RollbackPlanner + +```csharp +namespace StellaOps.ReleaseOrchestrator.Deployment.Rollback; + +public sealed class RollbackPlanner +{ + private readonly IInventorySyncService _inventoryService; + private readonly TimeProvider _timeProvider; + private readonly IGuidGenerator _guidGenerator; + + public async Task<RollbackPlan> CreatePlanAsync( + DeploymentJob failedJob, + Release targetRelease, + CancellationToken ct = default) + { + var targets = new List<RollbackTarget>(); + + foreach (var task in failedJob.Tasks) + { + // Get current state from inventory + var snapshot = await _inventoryService.GetLatestSnapshotAsync(task.TargetId, ct); + + var currentDigest = snapshot?.Containers + .FirstOrDefault(c => IsDeployedComponent(c, failedJob.ReleaseName)) + ?.ImageDigest ?? ""; + + var rollbackDigest = targetRelease.Components + .FirstOrDefault(c => MatchesTarget(c, task)) + ?.Digest ?? ""; + + targets.Add(new RollbackTarget( + TargetId: task.TargetId, + TargetName: task.TargetName, + CurrentDigest: currentDigest, + RollbackToDigest: rollbackDigest, + Status: task.Status == DeploymentTaskStatus.Completed + ?
RollbackTargetStatus.Pending + : RollbackTargetStatus.Skipped + )); + } + + return new RollbackPlan + { + Id = _guidGenerator.NewGuid(), + FailedJobId = failedJob.Id, + TargetReleaseId = targetRelease.Id, + TargetReleaseName = targetRelease.Name, + Targets = targets.ToImmutableArray(), + Strategy = RollbackStrategy.RedeployPrevious, + PlannedAt = _timeProvider.GetUtcNow() + }; + } + + private static bool IsDeployedComponent(ContainerInfo container, string releaseName) => + container.Labels.GetValueOrDefault("stella.release.name") == releaseName; + + private static bool MatchesTarget(ReleaseComponent component, DeploymentTask task) => + component.ComponentName == task.TargetName; +} +``` + +### RollbackEvidenceGenerator + +```csharp +namespace StellaOps.ReleaseOrchestrator.Deployment.Rollback; + +public sealed class RollbackEvidenceGenerator +{ + private readonly IEvidencePacketService _evidenceService; + private readonly TimeProvider _timeProvider; + private readonly ILogger<RollbackEvidenceGenerator> _logger; + + public async Task GenerateAsync( + RollbackPlan plan, + DeploymentJob job, + CancellationToken ct = default) + { + var evidence = new RollbackEvidence + { + PlanId = plan.Id.ToString(), + FailedJobId = plan.FailedJobId.ToString(), + TargetReleaseId = plan.TargetReleaseId.ToString(), + TargetReleaseName = plan.TargetReleaseName, + RollbackStrategy = plan.Strategy.ToString(), + PlannedAt = plan.PlannedAt.ToString("O"), + ExecutedAt = _timeProvider.GetUtcNow().ToString("O"), + Targets = plan.Targets.Select(t => new RollbackTargetEvidence + { + TargetId = t.TargetId.ToString(), + TargetName = t.TargetName, + FromDigest = t.CurrentDigest, + ToDigest = t.RollbackToDigest, + Status = t.Status.ToString() + }).ToList(), + OriginalFailure = job.FailureReason + }; + + var packet = await _evidenceService.CreatePacketAsync(new CreateEvidencePacketRequest + { + Type = EvidenceType.Rollback, + SubjectId = plan.FailedJobId, + Content = JsonSerializer.Serialize(evidence), + Metadata = new
Dictionary<string, string> + { + ["rollbackPlanId"] = plan.Id.ToString(), + ["targetRelease"] = plan.TargetReleaseName, + ["environment"] = job.EnvironmentName + } + }, ct); + + _logger.LogInformation( + "Generated rollback evidence packet {PacketId} for job {JobId}", + packet.Id, plan.FailedJobId); + } +} + +public sealed class RollbackEvidence +{ + public required string PlanId { get; set; } + public required string FailedJobId { get; set; } + public required string TargetReleaseId { get; set; } + public required string TargetReleaseName { get; set; } + public required string RollbackStrategy { get; set; } + public required string PlannedAt { get; set; } + public required string ExecutedAt { get; set; } + public required IReadOnlyList<RollbackTargetEvidence> Targets { get; set; } + public string? OriginalFailure { get; set; } +} + +public sealed class RollbackTargetEvidence +{ + public required string TargetId { get; set; } + public required string TargetName { get; set; } + public required string FromDigest { get; set; } + public required string ToDigest { get; set; } + public required string Status { get; set; } +} +``` + +--- + +## Acceptance Criteria + +- [ ] Plan rollback from failed deployment +- [ ] Find previous successful release +- [ ] Execute rollback on completed targets +- [ ] Skip targets not yet deployed +- [ ] Track rollback progress +- [ ] Generate rollback evidence +- [ ] Update deployment status +- [ ] Unit test coverage >=85% + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 107_002 Target Executor | Internal | TODO | +| 104_004 Release Catalog | Internal | TODO | +| 109_002 Evidence Packets | Internal | TODO | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| IRollbackManager | TODO | | +| RollbackManager | TODO | | +| RollbackPlanner | TODO | | +| RollbackEvidenceGenerator | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +|
10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_107_005_DEPLOY_strategies.md b/docs/implplan/SPRINT_20260110_107_005_DEPLOY_strategies.md new file mode 100644 index 000000000..711191e0b --- /dev/null +++ b/docs/implplan/SPRINT_20260110_107_005_DEPLOY_strategies.md @@ -0,0 +1,460 @@ +# SPRINT: Deployment Strategies + +> **Sprint ID:** 107_005 +> **Module:** DEPLOY +> **Phase:** 7 - Deployment Execution +> **Status:** TODO +> **Parent:** [107_000_INDEX](SPRINT_20260110_107_000_INDEX_deployment_execution.md) + +--- + +## Overview + +Implement the deployment strategies that control how targets are batched, health-checked, and progressed during a rollout. + +### Objectives + +- Rolling deployment strategy +- Blue-green deployment strategy +- Canary deployment strategy +- All-at-once deployment strategy +- Strategy factory for selection + +### Working Directory + +``` +src/ReleaseOrchestrator/ +├── __Libraries/ +│ └── StellaOps.ReleaseOrchestrator.Deployment/ +│ └── Strategy/ +│ ├── IDeploymentStrategy.cs +│ ├── DeploymentStrategyFactory.cs +│ ├── RollingStrategy.cs +│ ├── BlueGreenStrategy.cs +│ ├── CanaryStrategy.cs +│ └── AllAtOnceStrategy.cs +└── __Tests/ +``` + +--- + +## Deliverables + +### IDeploymentStrategy Interface + +```csharp +namespace StellaOps.ReleaseOrchestrator.Deployment.Strategy; + +public interface IDeploymentStrategy +{ + string Name { get; } + Task<IReadOnlyList<DeploymentBatch>> PlanAsync(DeploymentJob job, CancellationToken ct = default); + Task<bool> ShouldProceedAsync(DeploymentBatch completedBatch, CancellationToken ct = default); +} + +public sealed record DeploymentBatch( + int Index, + ImmutableArray<Guid> TaskIds, + BatchRequirements Requirements +); + +public sealed record BatchRequirements( + bool WaitForHealthCheck = true, + TimeSpan?
HealthCheckTimeout = null, + double MinSuccessRate = 1.0 +); +``` + +### RollingStrategy + +```csharp +namespace StellaOps.ReleaseOrchestrator.Deployment.Strategy; + +public sealed class RollingStrategy : IDeploymentStrategy +{ + private readonly ITargetHealthChecker _healthChecker; + private readonly ILogger<RollingStrategy> _logger; + + public string Name => "rolling"; + + public Task<IReadOnlyList<DeploymentBatch>> PlanAsync( + DeploymentJob job, + CancellationToken ct = default) + { + var batchSize = ParseBatchSize(job.Options.BatchSize, job.Tasks.Length); + var batches = new List<DeploymentBatch>(); + + var taskIds = job.Tasks.Select(t => t.Id).ToList(); + var batchIndex = 0; + + while (taskIds.Count > 0) + { + var batchTaskIds = taskIds.Take(batchSize).ToImmutableArray(); + taskIds = taskIds.Skip(batchSize).ToList(); + + batches.Add(new DeploymentBatch( + Index: batchIndex++, + TaskIds: batchTaskIds, + Requirements: new BatchRequirements( + WaitForHealthCheck: job.Options.WaitForHealthCheck, + HealthCheckTimeout: TimeSpan.FromMinutes(5) + ) + )); + } + + _logger.LogInformation( + "Rolling strategy planned {BatchCount} batches of ~{BatchSize} targets", + batches.Count, batchSize); + + return Task.FromResult<IReadOnlyList<DeploymentBatch>>(batches); + } + + public async Task<bool> ShouldProceedAsync( + DeploymentBatch completedBatch, + CancellationToken ct = default) + { + if (!completedBatch.Requirements.WaitForHealthCheck) + return true; + + // Check health of deployed targets + foreach (var taskId in completedBatch.TaskIds) + { + var isHealthy = await _healthChecker.CheckTaskHealthAsync(taskId, ct); + if (!isHealthy) + { + _logger.LogWarning( + "Task {TaskId} in batch {BatchIndex} is unhealthy, halting rollout", + taskId, completedBatch.Index); + return false; + } + } + + return true; + } + + private static int ParseBatchSize(string?
batchSizeSpec, int totalTargets) + { + if (string.IsNullOrEmpty(batchSizeSpec)) + return Math.Max(1, totalTargets / 4); + + if (batchSizeSpec.EndsWith('%')) + { + var percent = int.Parse(batchSizeSpec.TrimEnd('%'), CultureInfo.InvariantCulture); + return Math.Max(1, totalTargets * percent / 100); + } + + return int.Parse(batchSizeSpec, CultureInfo.InvariantCulture); + } +} +``` + +### BlueGreenStrategy + +```csharp +namespace StellaOps.ReleaseOrchestrator.Deployment.Strategy; + +public sealed class BlueGreenStrategy : IDeploymentStrategy +{ + private readonly ITargetHealthChecker _healthChecker; + private readonly ITrafficRouter _trafficRouter; + private readonly ILogger<BlueGreenStrategy> _logger; + + public string Name => "blue-green"; + + public Task<IReadOnlyList<DeploymentBatch>> PlanAsync( + DeploymentJob job, + CancellationToken ct = default) + { + // Blue-green deploys to all targets at once (the "green" set) + // Then switches traffic from "blue" to "green" + var batches = new List<DeploymentBatch> + { + // Phase 1: Deploy to green (all targets) + new DeploymentBatch( + Index: 0, + TaskIds: job.Tasks.Select(t => t.Id).ToImmutableArray(), + Requirements: new BatchRequirements( + WaitForHealthCheck: true, + HealthCheckTimeout: TimeSpan.FromMinutes(10), + MinSuccessRate: 1.0 // All must succeed + ) + ) + }; + + _logger.LogInformation( + "Blue-green strategy: deploy all {Count} targets, then switch traffic", + job.Tasks.Length); + + return Task.FromResult<IReadOnlyList<DeploymentBatch>>(batches); + } + + public async Task<bool> ShouldProceedAsync( + DeploymentBatch completedBatch, + CancellationToken ct = default) + { + // All targets must be healthy before switching traffic + foreach (var taskId in completedBatch.TaskIds) + { + var isHealthy = await _healthChecker.CheckTaskHealthAsync(taskId, ct); + if (!isHealthy) + { + _logger.LogWarning( + "Blue-green: target {TaskId} unhealthy, not switching traffic", + taskId); + return false; + } + } + + // Switch traffic to new deployment + _logger.LogInformation("Blue-green: switching traffic to new deployment"); + //
Traffic switching handled externally based on deployment type + + return true; + } +} +``` + +### CanaryStrategy + +```csharp +namespace StellaOps.ReleaseOrchestrator.Deployment.Strategy; + +public sealed class CanaryStrategy : IDeploymentStrategy +{ + private readonly ITargetHealthChecker _healthChecker; + private readonly IMetricsCollector _metricsCollector; + private readonly ILogger<CanaryStrategy> _logger; + + public string Name => "canary"; + + public Task<IReadOnlyList<DeploymentBatch>> PlanAsync( + DeploymentJob job, + CancellationToken ct = default) + { + var tasks = job.Tasks.ToList(); + var batches = new List<DeploymentBatch>(); + + if (tasks.Count == 0) + return Task.FromResult<IReadOnlyList<DeploymentBatch>>(batches); + + // Canary phase: 1 target (or min 5% if many targets) + var canarySize = Math.Max(1, tasks.Count / 20); + batches.Add(new DeploymentBatch( + Index: 0, + TaskIds: tasks.Take(canarySize).Select(t => t.Id).ToImmutableArray(), + Requirements: new BatchRequirements( + WaitForHealthCheck: true, + HealthCheckTimeout: TimeSpan.FromMinutes(10), + MinSuccessRate: 1.0 + ) + )); + tasks = tasks.Skip(canarySize).ToList(); + + // Gradual rollout: 25% increments + var batchIndex = 1; + var incrementSize = Math.Max(1, (tasks.Count + 3) / 4); + + while (tasks.Count > 0) + { + var batchTasks = tasks.Take(incrementSize).ToList(); + tasks = tasks.Skip(incrementSize).ToList(); + + batches.Add(new DeploymentBatch( + Index: batchIndex++, + TaskIds: batchTasks.Select(t => t.Id).ToImmutableArray(), + Requirements: new BatchRequirements( + WaitForHealthCheck: true, + MinSuccessRate: 0.95 // Allow some failures in later batches + ) + )); + } + + _logger.LogInformation( + "Canary strategy: {CanarySize} canary, then {Batches} batches", + canarySize, batches.Count - 1); + + return Task.FromResult<IReadOnlyList<DeploymentBatch>>(batches); + } + + public async Task<bool> ShouldProceedAsync( + DeploymentBatch completedBatch, + CancellationToken ct = default) + { + // Check health + var healthyCount = 0; + foreach (var taskId in completedBatch.TaskIds) + { + if (await
_healthChecker.CheckTaskHealthAsync(taskId, ct)) + healthyCount++; + } + + var successRate = (double)healthyCount / completedBatch.TaskIds.Length; + if (successRate < completedBatch.Requirements.MinSuccessRate) + { + _logger.LogWarning( + "Canary batch {Index}: success rate {Rate:P0} below threshold {Required:P0}", + completedBatch.Index, successRate, completedBatch.Requirements.MinSuccessRate); + return false; + } + + // For canary batch (index 0), also check metrics + if (completedBatch.Index == 0) + { + var metrics = await _metricsCollector.GetCanaryMetricsAsync( + completedBatch.TaskIds, ct); + + if (metrics.ErrorRate > 0.05) + { + _logger.LogWarning( + "Canary error rate {Rate:P1} exceeds threshold", + metrics.ErrorRate); + return false; + } + } + + return true; + } +} + +public sealed record CanaryMetrics( + double ErrorRate, + double Latency99th, + int RequestCount +); +``` + +### AllAtOnceStrategy + +```csharp +namespace StellaOps.ReleaseOrchestrator.Deployment.Strategy; + +public sealed class AllAtOnceStrategy : IDeploymentStrategy +{ + private readonly ITargetHealthChecker _healthChecker; + private readonly ILogger<AllAtOnceStrategy> _logger; + + public string Name => "all-at-once"; + + public Task<IReadOnlyList<DeploymentBatch>> PlanAsync( + DeploymentJob job, + CancellationToken ct = default) + { + var batches = new List<DeploymentBatch> + { + new DeploymentBatch( + Index: 0, + TaskIds: job.Tasks.Select(t => t.Id).ToImmutableArray(), + Requirements: new BatchRequirements( + WaitForHealthCheck: job.Options.WaitForHealthCheck, + MinSuccessRate: 0.8 // Allow some failures + ) + ) + }; + + _logger.LogInformation( + "All-at-once strategy: deploying to all {Count} targets simultaneously", + job.Tasks.Length); + + return Task.FromResult<IReadOnlyList<DeploymentBatch>>(batches); + } + + public Task<bool> ShouldProceedAsync( + DeploymentBatch completedBatch, + CancellationToken ct = default) + { + // Single batch, always "proceed" (nothing to proceed to) + return Task.FromResult(true); + } +} +``` + +### DeploymentStrategyFactory + +```csharp +namespace
StellaOps.ReleaseOrchestrator.Deployment.Strategy; + +public interface IDeploymentStrategyFactory +{ + IDeploymentStrategy Create(DeploymentStrategy strategy); + IReadOnlyList<string> GetAvailableStrategies(); +} + +public sealed class DeploymentStrategyFactory : IDeploymentStrategyFactory +{ + private readonly IServiceProvider _serviceProvider; + private readonly ILogger<DeploymentStrategyFactory> _logger; + + private static readonly Dictionary<DeploymentStrategy, Type> StrategyTypes = new() + { + [DeploymentStrategy.Rolling] = typeof(RollingStrategy), + [DeploymentStrategy.BlueGreen] = typeof(BlueGreenStrategy), + [DeploymentStrategy.Canary] = typeof(CanaryStrategy), + [DeploymentStrategy.AllAtOnce] = typeof(AllAtOnceStrategy) + }; + + public IDeploymentStrategy Create(DeploymentStrategy strategy) + { + if (!StrategyTypes.TryGetValue(strategy, out var type)) + { + throw new UnsupportedStrategyException(strategy); + } + + var instance = _serviceProvider.GetRequiredService(type) as IDeploymentStrategy; + if (instance is null) + { + throw new StrategyCreationException(strategy); + } + + _logger.LogDebug("Created {Strategy} deployment strategy", strategy); + return instance; + } + + public IReadOnlyList<string> GetAvailableStrategies() => + StrategyTypes.Keys.Select(s => s.ToString()).ToList().AsReadOnly(); +} +``` + +--- + +## Acceptance Criteria + +- [ ] Rolling strategy batches targets +- [ ] Rolling strategy checks health between batches +- [ ] Blue-green deploys all then switches +- [ ] Canary deploys incrementally +- [ ] Canary checks metrics after canary batch +- [ ] All-at-once deploys simultaneously +- [ ] Strategy factory creates correct type +- [ ] Batch size parsing works +- [ ] Unit test coverage >=85% + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 107_002 Target Executor | Internal | TODO | +| 103_002 Target Registry | Internal | TODO | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| IDeploymentStrategy | TODO | | +|
DeploymentStrategyFactory | TODO | | +| RollingStrategy | TODO | | +| BlueGreenStrategy | TODO | | +| CanaryStrategy | TODO | | +| AllAtOnceStrategy | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_108_000_INDEX_agents.md b/docs/implplan/SPRINT_20260110_108_000_INDEX_agents.md new file mode 100644 index 000000000..d1e45c1bd --- /dev/null +++ b/docs/implplan/SPRINT_20260110_108_000_INDEX_agents.md @@ -0,0 +1,291 @@ +# SPRINT INDEX: Phase 8 - Agents + +> **Epic:** Release Orchestrator +> **Phase:** 8 - Agents +> **Batch:** 108 +> **Status:** TODO +> **Parent:** [100_000_INDEX](SPRINT_20260110_100_000_INDEX_release_orchestrator.md) + +--- + +## Overview + +Phase 8 implements the deployment Agents - lightweight, secure executors that run on target hosts to perform container operations. + +### Objectives + +- Agent core runtime with gRPC communication +- Docker agent for standalone containers +- Compose agent for docker-compose deployments +- SSH agent for remote execution +- WinRM agent for Windows hosts +- ECS agent for AWS Elastic Container Service +- Nomad agent for HashiCorp Nomad + +--- + +## Sprint Structure + +| Sprint ID | Title | Module | Status | Dependencies | +|-----------|-------|--------|--------|--------------| +| 108_001 | Agent Core Runtime | AGENTS | TODO | 103_003 | +| 108_002 | Agent - Docker | AGENTS | TODO | 108_001 | +| 108_003 | Agent - Compose | AGENTS | TODO | 108_002 | +| 108_004 | Agent - SSH | AGENTS | TODO | 108_001 | +| 108_005 | Agent - WinRM | AGENTS | TODO | 108_001 | +| 108_006 | Agent - ECS | AGENTS | TODO | 108_001 | +| 108_007 | Agent - Nomad | AGENTS | TODO | 108_001 | + +--- + +## Architecture + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ AGENT SYSTEM │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ AGENT CORE RUNTIME 
(108_001) │ │ +│ │ │ │ +│ │ ┌─────────────────────────────────────────────────────────────────┐ │ │ +│ │ │ Stella Agent │ │ │ +│ │ │ │ │ │ +│ │ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │ │ +│ │ │ │ gRPC │ │ Task Queue │ │ Heartbeat │ │ │ │ +│ │ │ │ Server │ │ Executor │ │ Service │ │ │ │ +│ │ │ └──────────────┘ └──────────────┘ └──────────────┘ │ │ │ +│ │ │ │ │ │ +│ │ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │ │ +│ │ │ │ Credential │ │ Log │ │ Metrics │ │ │ │ +│ │ │ │ Resolver │ │ Streamer │ │ Reporter │ │ │ │ +│ │ │ └──────────────┘ └──────────────┘ └──────────────┘ │ │ │ +│ │ └─────────────────────────────────────────────────────────────────┘ │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌─────────────────────────┐ ┌─────────────────────────┐ │ +│ │ DOCKER AGENT (108_002)│ │ COMPOSE AGENT (108_003)│ │ +│ │ │ │ │ │ +│ │ - docker pull │ │ - docker compose pull │ │ +│ │ - docker run │ │ - docker compose up │ │ +│ │ - docker stop │ │ - docker compose down │ │ +│ │ - docker rm │ │ - service health check │ │ +│ │ - health check │ │ - volume management │ │ +│ │ - log streaming │ │ - network management │ │ +│ └─────────────────────────┘ └─────────────────────────┘ │ +│ │ +│ ┌─────────────────────────┐ ┌─────────────────────────┐ │ +│ │ SSH AGENT (108_004) │ │ WINRM AGENT (108_005) │ │ +│ │ │ │ │ │ +│ │ - Remote Docker ops │ │ - Windows containers │ │ +│ │ - Remote script exec │ │ - IIS management │ │ +│ │ - File transfer │ │ - Windows services │ │ +│ │ - SSH key auth │ │ - PowerShell execution │ │ +│ └─────────────────────────┘ └─────────────────────────┘ │ +│ │ +│ ┌─────────────────────────┐ ┌─────────────────────────┐ │ +│ │ ECS AGENT (108_006) │ │ NOMAD AGENT (108_007) │ │ +│ │ │ │ │ │ +│ │ - ECS service deploy │ │ - Nomad job deploy │ │ +│ │ - Task execution │ │ - Job scaling │ │ +│ │ - Service scaling │ │ - Allocation health │ │ +│ │ - CloudWatch logs │ │ - Log streaming │ │ +│ │ - Fargate + 
EC2 │ │ - Multiple drivers │ │ +│ └─────────────────────────┘ └─────────────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +--- + +## Deliverables Summary + +### 108_001: Agent Core Runtime + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `AgentHost` | Service | Main agent process | +| `GrpcAgentServer` | gRPC | Task receiver | +| `TaskExecutor` | Class | Task execution | +| `HeartbeatService` | Service | Health reporting | +| `CredentialResolver` | Class | Secret resolution | +| `LogStreamer` | Class | Log forwarding | + +### 108_002: Agent - Docker + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `DockerCapability` | Capability | Docker operations | +| `DockerPullTask` | Task | Pull images | +| `DockerRunTask` | Task | Create/start containers | +| `DockerStopTask` | Task | Stop containers | +| `DockerHealthCheck` | Task | Container health | + +### 108_003: Agent - Compose + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `ComposeCapability` | Capability | Compose operations | +| `ComposePullTask` | Task | Pull compose images | +| `ComposeUpTask` | Task | Deploy compose stack | +| `ComposeDownTask` | Task | Remove compose stack | +| `ComposeScaleTask` | Task | Scale services | + +### 108_004: Agent - SSH + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `SshCapability` | Capability | SSH operations | +| `SshExecuteTask` | Task | Remote command execution | +| `SshFileTransferTask` | Task | SCP file transfer | +| `SshTunnelTask` | Task | SSH tunneling | + +### 108_005: Agent - WinRM + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `WinRmCapability` | Capability | WinRM operations | +| `PowerShellTask` | Task | PowerShell execution | +| `WindowsServiceTask` | Task | Service management | +| `WindowsContainerTask` | Task | Windows container ops | + +### 
108_006: Agent - ECS + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `EcsCapability` | Capability | AWS ECS operations | +| `EcsDeployServiceTask` | Task | Deploy/update ECS services | +| `EcsRunTaskTask` | Task | Run one-off ECS tasks | +| `EcsStopTaskTask` | Task | Stop running tasks | +| `EcsScaleServiceTask` | Task | Scale services | +| `EcsHealthCheckTask` | Task | Service health check | +| `CloudWatchLogStreamer` | Class | Log streaming | + +### 108_007: Agent - Nomad + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `NomadCapability` | Capability | Nomad operations | +| `NomadDeployJobTask` | Task | Deploy Nomad jobs | +| `NomadStopJobTask` | Task | Stop jobs | +| `NomadScaleJobTask` | Task | Scale task groups | +| `NomadHealthCheckTask` | Task | Job health check | +| `NomadDispatchJobTask` | Task | Dispatch parameterized jobs | +| `NomadLogStreamer` | Class | Allocation log streaming | + +--- + +## Agent Protocol (gRPC) + +```protobuf +syntax = "proto3"; +package stella.agent.v1; + +service AgentService { + // Task execution + rpc ExecuteTask(TaskRequest) returns (stream TaskProgress); + rpc CancelTask(CancelTaskRequest) returns (CancelTaskResponse); + + // Health and status + rpc Heartbeat(HeartbeatRequest) returns (HeartbeatResponse); + rpc GetStatus(StatusRequest) returns (StatusResponse); + + // Logs + rpc StreamLogs(LogStreamRequest) returns (stream LogEntry); +} + +message TaskRequest { + string task_id = 1; + string task_type = 2; + bytes payload = 3; + map<string, string> credentials = 4; +} + +message TaskProgress { + string task_id = 1; + TaskState state = 2; + int32 progress_percent = 3; + string message = 4; + bytes result = 5; +} + +enum TaskState { + PENDING = 0; + RUNNING = 1; + SUCCEEDED = 2; + FAILED = 3; + CANCELLED = 4; +} +``` + +--- + +## Agent Security + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ AGENT SECURITY MODEL │ +│ │ +│ Registration
Flow: │ +│ ┌─────────────┐ 1. Get token ┌─────────────┐ │ +│ │ Admin │ ───────────────► │ Orchestrator │ │ +│ └─────────────┘ └──────┬──────┘ │ +│ │ 2. Generate one-time token │ +│ ▼ │ +│ ┌─────────────┐ 3. Register ┌─────────────┐ │ +│ │ Agent │ ───────────────► │ Orchestrator │ │ +│ │ (token) │ └──────┬──────┘ │ +│ └─────────────┘ │ 4. Issue mTLS certificate │ +│ ▼ │ +│ ┌─────────────┐ 5. Connect ┌─────────────┐ │ +│ │ Agent │ ◄───────────────►│ Orchestrator │ │ +│ │ (mTLS) │ (gRPC) └─────────────┘ │ +│ └─────────────┘ │ +│ │ +│ Security Controls: │ +│ - mTLS with short-lived certificates (24h) │ +│ - Capability-based authorization │ +│ - Task-scoped credentials (never stored) │ +│ - Audit logging of all operations │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +--- + +## Dependencies + +| Module | Purpose | +|--------|---------| +| 103_003 Agent Manager | Registration | +| 107_002 Target Executor | Task dispatch | +| Docker.DotNet | Docker API | +| AWSSDK.ECS | AWS ECS API | +| AWSSDK.CloudWatchLogs | AWS CloudWatch Logs | +| Nomad.Api (custom) | Nomad HTTP API | + +--- + +## Acceptance Criteria + +- [ ] Agent registers with one-time token +- [ ] mTLS established after registration +- [ ] Heartbeat updates agent status +- [ ] Docker pull/run/stop works +- [ ] Compose up/down works +- [ ] SSH remote execution works +- [ ] WinRM PowerShell works +- [ ] ECS service deploy/scale works +- [ ] ECS task run/stop works +- [ ] Nomad job deploy/stop works +- [ ] Nomad job scaling works +- [ ] Log streaming works (Docker, CloudWatch, Nomad) +- [ ] Credentials resolved at runtime +- [ ] Unit test coverage ≥80% + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Phase 8 index created | +| 10-Jan-2026 | Added ECS agent (108_006) and Nomad agent (108_007) sprints per feature completeness review | diff --git a/docs/implplan/SPRINT_20260110_108_001_AGENTS_core_runtime.md 
b/docs/implplan/SPRINT_20260110_108_001_AGENTS_core_runtime.md
new file mode 100644
index 000000000..a38938d28
--- /dev/null
+++ b/docs/implplan/SPRINT_20260110_108_001_AGENTS_core_runtime.md
@@ -0,0 +1,776 @@
+# SPRINT: Agent Core Runtime
+
+> **Sprint ID:** 108_001
+> **Module:** AGENTS
+> **Phase:** 8 - Agents
+> **Status:** TODO
+> **Parent:** [108_000_INDEX](SPRINT_20260110_108_000_INDEX_agents.md)
+
+---
+
+## Overview
+
+Implement the Agent Core Runtime - the foundational process that runs on target hosts to receive and execute deployment tasks.
+
+### Objectives
+
+- Agent host process with lifecycle management
+- gRPC server for task reception
+- Heartbeat service for health reporting
+- Credential resolution at runtime
+- Log streaming to orchestrator
+- Capability registration system
+
+### Working Directory
+
+```
+src/ReleaseOrchestrator/
+├── __Agents/
+│   └── StellaOps.Agent.Core/
+│       ├── AgentHost.cs
+│       ├── AgentConfiguration.cs
+│       ├── GrpcAgentServer.cs
+│       ├── TaskExecutor.cs
+│       ├── HeartbeatService.cs
+│       ├── CredentialResolver.cs
+│       ├── LogStreamer.cs
+│       └── CapabilityRegistry.cs
+└── __Tests/
+```
+
+---
+
+## Deliverables
+
+### AgentConfiguration
+
+```csharp
+namespace StellaOps.Agent.Core;
+
+public sealed class AgentConfiguration
+{
+    public required string AgentId { get; set; }
+    public required string AgentName { get; set; }
+    public required string OrchestratorUrl { get; set; }
+    public required string CertificatePath { get; set; }
+    public required string PrivateKeyPath { get; set; }
+    public required string CaCertificatePath { get; set; }
+    public int GrpcPort { get; set; } = 50051;
+    public TimeSpan HeartbeatInterval { get; set; } = TimeSpan.FromSeconds(30);
+    public TimeSpan TaskTimeout { get; set; } = TimeSpan.FromMinutes(30);
+    public IReadOnlyList<string> EnabledCapabilities { get; set; } = [];
+}
+```
+
+### IAgentCapability Interface
+
+```csharp
+namespace StellaOps.Agent.Core;
+
+public interface IAgentCapability
+{
+    string Name { get; }
+    string Version { get; }
+    IReadOnlyList<string> SupportedTaskTypes { get; }
+    Task<bool> InitializeAsync(CancellationToken ct = default);
+    Task<TaskResult> ExecuteAsync(AgentTask task, CancellationToken ct = default);
+    Task<CapabilityHealthStatus> CheckHealthAsync(CancellationToken ct = default);
+}
+
+public sealed record CapabilityHealthStatus(
+    bool IsHealthy,
+    string? Message = null,
+    IReadOnlyDictionary<string, object>? Details = null
+);
+```
+
+### CapabilityRegistry
+
+```csharp
+namespace StellaOps.Agent.Core;
+
+public sealed class CapabilityRegistry
+{
+    private readonly Dictionary<string, IAgentCapability> _capabilities = new();
+    private readonly ILogger<CapabilityRegistry> _logger;
+
+    public void Register(IAgentCapability capability)
+    {
+        if (_capabilities.ContainsKey(capability.Name))
+        {
+            throw new CapabilityAlreadyRegisteredException(capability.Name);
+        }
+
+        _capabilities[capability.Name] = capability;
+        _logger.LogInformation(
+            "Registered capability {Name} v{Version} with tasks: {Tasks}",
+            capability.Name,
+            capability.Version,
+            string.Join(", ", capability.SupportedTaskTypes));
+    }
+
+    public IAgentCapability?
GetForTaskType(string taskType)
+    {
+        return _capabilities.Values
+            .FirstOrDefault(c => c.SupportedTaskTypes.Contains(taskType));
+    }
+
+    public IReadOnlyList<CapabilityInfo> GetCapabilities()
+    {
+        return _capabilities.Values.Select(c => new CapabilityInfo(
+            c.Name,
+            c.Version,
+            c.SupportedTaskTypes.ToImmutableArray()
+        )).ToList().AsReadOnly();
+    }
+
+    public async Task InitializeAllAsync(CancellationToken ct = default)
+    {
+        foreach (var (name, capability) in _capabilities)
+        {
+            var success = await capability.InitializeAsync(ct);
+            if (!success)
+            {
+                _logger.LogWarning("Capability {Name} failed to initialize", name);
+            }
+        }
+    }
+}
+
+public sealed record CapabilityInfo(
+    string Name,
+    string Version,
+    ImmutableArray<string> SupportedTaskTypes
+);
+```
+
+### AgentTask Model
+
+```csharp
+namespace StellaOps.Agent.Core;
+
+public sealed record AgentTask
+{
+    public required Guid Id { get; init; }
+    public required string TaskType { get; init; }
+    public required string Payload { get; init; }
+    public required IReadOnlyDictionary<string, string> Credentials { get; init; }
+    public required IReadOnlyDictionary<string, string> Variables { get; init; }
+    public DateTimeOffset ReceivedAt { get; init; }
+    public TimeSpan Timeout { get; init; } = TimeSpan.FromMinutes(30);
+}
+
+public sealed record TaskResult
+{
+    public required Guid TaskId { get; init; }
+    public required bool Success { get; init; }
+    public string? Error { get; init; }
+    public IReadOnlyDictionary<string, object> Outputs { get; init; } = new Dictionary<string, object>();
+    public DateTimeOffset CompletedAt { get; init; }
+    public TimeSpan Duration { get; init; }
+}
+```
+
+### TaskExecutor
+
+```csharp
+namespace StellaOps.Agent.Core;
+
+public sealed class TaskExecutor
+{
+    private readonly CapabilityRegistry _capabilities;
+    private readonly CredentialResolver _credentialResolver;
+    private readonly ILogger<TaskExecutor> _logger;
+    private readonly ConcurrentDictionary<Guid, CancellationTokenSource> _runningTasks = new();
+
+    public async Task<TaskResult> ExecuteAsync(
+        AgentTask task,
+        IProgress<TaskProgress>?
progress = null, + CancellationToken ct = default) + { + var capability = _capabilities.GetForTaskType(task.TaskType) + ?? throw new UnsupportedTaskTypeException(task.TaskType); + + using var taskCts = new CancellationTokenSource(task.Timeout); + using var linkedCts = CancellationTokenSource.CreateLinkedTokenSource(ct, taskCts.Token); + + _runningTasks[task.Id] = linkedCts; + + var stopwatch = Stopwatch.StartNew(); + + try + { + _logger.LogInformation( + "Executing task {TaskId} of type {TaskType}", + task.Id, task.TaskType); + + progress?.Report(new TaskProgress(task.Id, TaskState.Running, 0, "Starting")); + + // Resolve credentials + var resolvedTask = await ResolveCredentialsAsync(task, linkedCts.Token); + + // Execute via capability + var result = await capability.ExecuteAsync(resolvedTask, linkedCts.Token); + + progress?.Report(new TaskProgress( + task.Id, + result.Success ? TaskState.Succeeded : TaskState.Failed, + 100, + result.Success ? "Completed" : result.Error ?? "Failed")); + + _logger.LogInformation( + "Task {TaskId} completed with status {Status} in {Duration}ms", + task.Id, + result.Success ? 
"success" : "failure", + stopwatch.ElapsedMilliseconds); + + return result with { Duration = stopwatch.Elapsed }; + } + catch (OperationCanceledException) when (taskCts.IsCancellationRequested) + { + _logger.LogWarning("Task {TaskId} timed out after {Timeout}", task.Id, task.Timeout); + + progress?.Report(new TaskProgress(task.Id, TaskState.Failed, 0, "Timeout")); + + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = $"Task timed out after {task.Timeout}", + CompletedAt = DateTimeOffset.UtcNow, + Duration = stopwatch.Elapsed + }; + } + catch (OperationCanceledException) + { + _logger.LogInformation("Task {TaskId} was cancelled", task.Id); + + progress?.Report(new TaskProgress(task.Id, TaskState.Cancelled, 0, "Cancelled")); + + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = "Task was cancelled", + CompletedAt = DateTimeOffset.UtcNow, + Duration = stopwatch.Elapsed + }; + } + catch (Exception ex) + { + _logger.LogError(ex, "Task {TaskId} failed with exception", task.Id); + + progress?.Report(new TaskProgress(task.Id, TaskState.Failed, 0, ex.Message)); + + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = ex.Message, + CompletedAt = DateTimeOffset.UtcNow, + Duration = stopwatch.Elapsed + }; + } + finally + { + _runningTasks.TryRemove(task.Id, out _); + } + } + + public bool CancelTask(Guid taskId) + { + if (_runningTasks.TryGetValue(taskId, out var cts)) + { + cts.Cancel(); + return true; + } + return false; + } + + private async Task ResolveCredentialsAsync(AgentTask task, CancellationToken ct) + { + var resolvedCredentials = new Dictionary(); + + foreach (var (key, value) in task.Credentials) + { + resolvedCredentials[key] = await _credentialResolver.ResolveAsync(value, ct); + } + + return task with { Credentials = resolvedCredentials }; + } +} + +public sealed record TaskProgress( + Guid TaskId, + TaskState State, + int ProgressPercent, + string Message +); + +public enum TaskState +{ + 
Pending,
+    Running,
+    Succeeded,
+    Failed,
+    Cancelled
+}
+```
+
+### CredentialResolver
+
+```csharp
+namespace StellaOps.Agent.Core;
+
+public sealed class CredentialResolver
+{
+    private readonly IEnumerable<ICredentialProvider> _providers;
+    private readonly ILogger<CredentialResolver> _logger;
+
+    public async Task<string> ResolveAsync(string reference, CancellationToken ct = default)
+    {
+        // Reference format: provider://path
+        // e.g., env://DB_PASSWORD, file:///etc/secrets/api-key, vault://secrets/myapp/apikey
+
+        var parsed = ParseReference(reference);
+        if (parsed is null)
+        {
+            // Not a reference, return as-is (literal value)
+            return reference;
+        }
+
+        var provider = _providers.FirstOrDefault(p => p.Scheme == parsed.Scheme)
+            ?? throw new UnknownCredentialProviderException(parsed.Scheme);
+
+        var value = await provider.GetSecretAsync(parsed.Path, ct);
+        if (value is null)
+        {
+            throw new CredentialNotFoundException(reference);
+        }
+
+        _logger.LogDebug("Resolved credential reference {Scheme}://***", parsed.Scheme);
+        return value;
+    }
+
+    private static CredentialReference?
ParseReference(string reference)
+    {
+        if (string.IsNullOrEmpty(reference))
+            return null;
+
+        var match = Regex.Match(reference, @"^([a-z]+)://(.+)$");
+        if (!match.Success)
+            return null;
+
+        return new CredentialReference(match.Groups[1].Value, match.Groups[2].Value);
+    }
+}
+
+public interface ICredentialProvider
+{
+    string Scheme { get; }
+    Task<string?> GetSecretAsync(string path, CancellationToken ct = default);
+}
+
+public sealed class EnvironmentCredentialProvider : ICredentialProvider
+{
+    public string Scheme => "env";
+
+    public Task<string?> GetSecretAsync(string path, CancellationToken ct = default)
+    {
+        return Task.FromResult(Environment.GetEnvironmentVariable(path));
+    }
+}
+
+public sealed class FileCredentialProvider : ICredentialProvider
+{
+    public string Scheme => "file";
+
+    public async Task<string?> GetSecretAsync(string path, CancellationToken ct = default)
+    {
+        if (!File.Exists(path))
+            return null;
+
+        return (await File.ReadAllTextAsync(path, ct)).Trim();
+    }
+}
+
+internal sealed record CredentialReference(string Scheme, string Path);
+```
+
+### HeartbeatService
+
+```csharp
+namespace StellaOps.Agent.Core;
+
+public sealed class HeartbeatService : BackgroundService
+{
+    private readonly AgentConfiguration _config;
+    private readonly CapabilityRegistry _capabilities;
+    private readonly IOrchestratorClient _orchestratorClient;
+    private readonly ILogger<HeartbeatService> _logger;
+
+    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
+    {
+        _logger.LogInformation("Heartbeat service started");
+
+        while (!stoppingToken.IsCancellationRequested)
+        {
+            try
+            {
+                await SendHeartbeatAsync(stoppingToken);
+            }
+            catch (Exception ex)
+            {
+                _logger.LogWarning(ex, "Failed to send heartbeat");
+            }
+
+            await Task.Delay(_config.HeartbeatInterval, stoppingToken);
+        }
+    }
+
+    private async Task SendHeartbeatAsync(CancellationToken ct)
+    {
+        var capabilities = _capabilities.GetCapabilities();
+        var health = await CheckCapabilityHealthAsync(ct);
+
+        var heartbeat = new
AgentHeartbeat
+        {
+            AgentId = _config.AgentId,
+            Timestamp = DateTimeOffset.UtcNow,
+            Status = health.AllHealthy ? AgentStatus.Active : AgentStatus.Degraded,
+            Capabilities = capabilities,
+            SystemInfo = GetSystemInfo(),
+            RunningTaskCount = GetRunningTaskCount(),
+            HealthDetails = health.Details
+        };
+
+        await _orchestratorClient.SendHeartbeatAsync(heartbeat, ct);
+
+        _logger.LogDebug(
+            "Heartbeat sent: status={Status}, tasks={TaskCount}",
+            heartbeat.Status,
+            heartbeat.RunningTaskCount);
+    }
+
+    private async Task<HealthCheckResult> CheckCapabilityHealthAsync(CancellationToken ct)
+    {
+        var details = new Dictionary<string, object>();
+        var allHealthy = true;
+
+        foreach (var capability in _capabilities.GetCapabilities())
+        {
+            var cap = _capabilities.GetForTaskType(capability.SupportedTaskTypes.First());
+            if (cap is null) continue;
+
+            var health = await cap.CheckHealthAsync(ct);
+            details[capability.Name] = new { health.IsHealthy, health.Message };
+            allHealthy = allHealthy && health.IsHealthy;
+        }
+
+        return new HealthCheckResult(allHealthy, details);
+    }
+
+    private static SystemInfo GetSystemInfo()
+    {
+        return new SystemInfo
+        {
+            Hostname = Environment.MachineName,
+            OsDescription = RuntimeInformation.OSDescription,
+            ProcessorCount = Environment.ProcessorCount,
+            MemoryBytes = GC.GetGCMemoryInfo().TotalAvailableMemoryBytes
+        };
+    }
+
+    private int GetRunningTaskCount()
+    {
+        // Implementation would get from TaskExecutor
+        return 0;
+    }
+}
+
+public sealed record AgentHeartbeat
+{
+    public required string AgentId { get; init; }
+    public required DateTimeOffset Timestamp { get; init; }
+    public required AgentStatus Status { get; init; }
+    public required IReadOnlyList<CapabilityInfo> Capabilities { get; init; }
+    public required SystemInfo SystemInfo { get; init; }
+    public int RunningTaskCount { get; init; }
+    public IReadOnlyDictionary<string, object>?
HealthDetails { get; init; }
+}
+
+public sealed record SystemInfo
+{
+    public required string Hostname { get; init; }
+    public required string OsDescription { get; init; }
+    public required int ProcessorCount { get; init; }
+    public required long MemoryBytes { get; init; }
+}
+
+public enum AgentStatus
+{
+    Inactive,
+    Active,
+    Degraded,
+    Disconnected
+}
+
+internal sealed record HealthCheckResult(
+    bool AllHealthy,
+    IReadOnlyDictionary<string, object> Details
+);
+```
+
+### LogStreamer
+
+```csharp
+namespace StellaOps.Agent.Core;
+
+public sealed class LogStreamer : IAsyncDisposable
+{
+    private readonly IOrchestratorClient _orchestratorClient;
+    private readonly Channel<LogEntry> _logChannel;
+    private readonly ILogger<LogStreamer> _logger;
+    private readonly CancellationTokenSource _cts = new();
+    private readonly Task _streamTask;
+
+    public LogStreamer(IOrchestratorClient orchestratorClient, ILogger<LogStreamer> logger)
+    {
+        _orchestratorClient = orchestratorClient;
+        _logger = logger;
+        _logChannel = Channel.CreateBounded<LogEntry>(new BoundedChannelOptions(10000)
+        {
+            FullMode = BoundedChannelFullMode.DropOldest
+        });
+
+        _streamTask = StreamLogsAsync(_cts.Token);
+    }
+
+    public void Log(Guid taskId, LogLevel level, string message)
+    {
+        var entry = new LogEntry
+        {
+            TaskId = taskId,
+            Timestamp = DateTimeOffset.UtcNow,
+            Level = level,
+            Message = message
+        };
+
+        if (!_logChannel.Writer.TryWrite(entry))
+        {
+            _logger.LogWarning("Log channel full, dropping log entry");
+        }
+    }
+
+    private async Task StreamLogsAsync(CancellationToken ct)
+    {
+        var batch = new List<LogEntry>();
+        var batchTimeout = TimeSpan.FromMilliseconds(100);
+
+        while (!ct.IsCancellationRequested)
+        {
+            try
+            {
+                // Collect logs for batching
+                using var timeoutCts = new CancellationTokenSource(batchTimeout);
+                using var linkedCts = CancellationTokenSource.CreateLinkedTokenSource(ct, timeoutCts.Token);
+
+                while (batch.Count < 100)
+                {
+                    if (_logChannel.Reader.TryRead(out var entry))
+                    {
+                        batch.Add(entry);
+                    }
+                    else
+                    {
+                        await
_logChannel.Reader.WaitToReadAsync(linkedCts.Token); + } + } + } + catch (OperationCanceledException) when (!ct.IsCancellationRequested) + { + // Timeout, send what we have + } + + if (batch.Count > 0) + { + try + { + await _orchestratorClient.SendLogsAsync(batch, ct); + batch.Clear(); + } + catch (Exception ex) + { + _logger.LogWarning(ex, "Failed to send logs, will retry"); + } + } + } + } + + public async ValueTask DisposeAsync() + { + _cts.Cancel(); + await _streamTask; + _cts.Dispose(); + } +} + +public sealed record LogEntry +{ + public required Guid TaskId { get; init; } + public required DateTimeOffset Timestamp { get; init; } + public required LogLevel Level { get; init; } + public required string Message { get; init; } +} +``` + +### AgentHost + +```csharp +namespace StellaOps.Agent.Core; + +public sealed class AgentHost : IHostedService +{ + private readonly AgentConfiguration _config; + private readonly CapabilityRegistry _capabilities; + private readonly GrpcAgentServer _grpcServer; + private readonly HeartbeatService _heartbeatService; + private readonly IOrchestratorClient _orchestratorClient; + private readonly ILogger _logger; + + public async Task StartAsync(CancellationToken cancellationToken) + { + _logger.LogInformation( + "Starting Stella Agent {Name} ({Id})", + _config.AgentName, + _config.AgentId); + + // Initialize capabilities + await _capabilities.InitializeAllAsync(cancellationToken); + + // Connect to orchestrator + await _orchestratorClient.ConnectAsync(cancellationToken); + + // Start gRPC server + await _grpcServer.StartAsync(cancellationToken); + + _logger.LogInformation( + "Agent started on port {Port} with {Count} capabilities", + _config.GrpcPort, + _capabilities.GetCapabilities().Count); + } + + public async Task StopAsync(CancellationToken cancellationToken) + { + _logger.LogInformation("Stopping Stella Agent"); + + await _grpcServer.StopAsync(cancellationToken); + await _orchestratorClient.DisconnectAsync(cancellationToken); + 
+ _logger.LogInformation("Agent stopped"); + } +} +``` + +### GrpcAgentServer + +```csharp +namespace StellaOps.Agent.Core; + +public sealed class GrpcAgentServer +{ + private readonly AgentConfiguration _config; + private readonly TaskExecutor _taskExecutor; + private readonly LogStreamer _logStreamer; + private readonly ILogger _logger; + private Server? _server; + + public Task StartAsync(CancellationToken ct = default) + { + var serverCredentials = BuildServerCredentials(); + + _server = new Server + { + Services = { AgentService.BindService(new AgentServiceImpl(_taskExecutor, _logStreamer)) }, + Ports = { new ServerPort("0.0.0.0", _config.GrpcPort, serverCredentials) } + }; + + _server.Start(); + _logger.LogInformation("gRPC server started on port {Port}", _config.GrpcPort); + + return Task.CompletedTask; + } + + public async Task StopAsync(CancellationToken ct = default) + { + if (_server is not null) + { + await _server.ShutdownAsync(); + _logger.LogInformation("gRPC server stopped"); + } + } + + private ServerCredentials BuildServerCredentials() + { + var cert = File.ReadAllText(_config.CertificatePath); + var key = File.ReadAllText(_config.PrivateKeyPath); + var caCert = File.ReadAllText(_config.CaCertificatePath); + + var keyCertPair = new KeyCertificatePair(cert, key); + + return new SslServerCredentials( + new[] { keyCertPair }, + caCert, + SslClientCertificateRequestType.RequestAndRequireAndVerify); + } +} +``` + +--- + +## Acceptance Criteria + +- [ ] Agent process starts and runs as service +- [ ] gRPC server accepts mTLS connections +- [ ] Capabilities register at startup +- [ ] Tasks execute via correct capability +- [ ] Task cancellation works +- [ ] Heartbeat sends to orchestrator +- [ ] Credentials resolve at runtime +- [ ] Logs stream to orchestrator +- [ ] Unit test coverage >=85% + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 103_003 Agent Manager | Internal | TODO | +| Grpc.AspNetCore | NuGet | 
Available | +| Google.Protobuf | NuGet | Available | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| AgentConfiguration | TODO | | +| IAgentCapability | TODO | | +| CapabilityRegistry | TODO | | +| TaskExecutor | TODO | | +| CredentialResolver | TODO | | +| HeartbeatService | TODO | | +| LogStreamer | TODO | | +| AgentHost | TODO | | +| GrpcAgentServer | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_108_002_AGENTS_docker.md b/docs/implplan/SPRINT_20260110_108_002_AGENTS_docker.md new file mode 100644 index 000000000..500d4fa63 --- /dev/null +++ b/docs/implplan/SPRINT_20260110_108_002_AGENTS_docker.md @@ -0,0 +1,921 @@ +# SPRINT: Agent - Docker + +> **Sprint ID:** 108_002 +> **Module:** AGENTS +> **Phase:** 8 - Agents +> **Status:** TODO +> **Parent:** [108_000_INDEX](SPRINT_20260110_108_000_INDEX_agents.md) + +--- + +## Overview + +Implement the Docker Agent capability for managing standalone Docker containers on target hosts. 
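+As an illustration of the task payload contract, a `docker.run` payload (deserialized into the `RunPayload` record defined later in this sprint) might look like the following. All values are illustrative, and the camelCase property names assume camel-case serializer options:
+
+```json
+{
+  "image": "registry.example.internal/myapp:1.4.2",
+  "name": "myapp-api",
+  "environment": {
+    "ASPNETCORE_ENVIRONMENT": "Production",
+    "DB_HOST": "${DB_HOST}"
+  },
+  "ports": ["8080:8080", "8443:8443/tcp"],
+  "volumes": ["/srv/myapp/config:/app/config:ro"],
+  "labels": { "stella.environment": "prod" },
+  "healthCheck": {
+    "test": ["CMD-SHELL", "curl -f http://localhost:8080/healthz || exit 1"],
+    "retries": 3
+  },
+  "restartPolicy": { "name": "Always", "maximumRetryCount": 0 }
+}
+```
+
+On execution the agent substitutes `${DB_HOST}` from the task's variables, merges in the `stella.managed` labels, and replaces any existing container with the same name before starting the new one.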
+
+### Objectives
+
+- Docker image pull operations
+- Container creation and start
+- Container stop and removal
+- Container health checking
+- Log streaming from containers
+- Registry authentication
+
+### Working Directory
+
+```
+src/ReleaseOrchestrator/
+├── __Agents/
+│   └── StellaOps.Agent.Docker/
+│       ├── DockerCapability.cs
+│       ├── Tasks/
+│       │   ├── DockerPullTask.cs
+│       │   ├── DockerRunTask.cs
+│       │   ├── DockerStopTask.cs
+│       │   ├── DockerRemoveTask.cs
+│       │   └── DockerHealthCheckTask.cs
+│       ├── DockerClientFactory.cs
+│       └── ContainerLogStreamer.cs
+└── __Tests/
+```
+
+---
+
+## Deliverables
+
+### DockerCapability
+
+```csharp
+namespace StellaOps.Agent.Docker;
+
+public sealed class DockerCapability : IAgentCapability
+{
+    private readonly IDockerClient _dockerClient;
+    private readonly ILogger<DockerCapability> _logger;
+    private readonly Dictionary<string, Func<AgentTask, CancellationToken, Task<TaskResult>>> _taskHandlers;
+
+    public string Name => "docker";
+    public string Version => "1.0.0";
+
+    public IReadOnlyList<string> SupportedTaskTypes => new[]
+    {
+        "docker.pull",
+        "docker.run",
+        "docker.stop",
+        "docker.remove",
+        "docker.health-check",
+        "docker.logs"
+    };
+
+    public DockerCapability(IDockerClient dockerClient, ILogger<DockerCapability> logger)
+    {
+        _dockerClient = dockerClient;
+        _logger = logger;
+
+        _taskHandlers = new Dictionary<string, Func<AgentTask, CancellationToken, Task<TaskResult>>>
+        {
+            ["docker.pull"] = ExecutePullAsync,
+            ["docker.run"] = ExecuteRunAsync,
+            ["docker.stop"] = ExecuteStopAsync,
+            ["docker.remove"] = ExecuteRemoveAsync,
+            ["docker.health-check"] = ExecuteHealthCheckAsync
+        };
+    }
+
+    public async Task<bool> InitializeAsync(CancellationToken ct = default)
+    {
+        try
+        {
+            var version = await _dockerClient.System.GetVersionAsync(ct);
+            _logger.LogInformation(
+                "Docker capability initialized: Docker {Version} on {OS}",
+                version.Version,
+                version.Os);
+            return true;
+        }
+        catch (Exception ex)
+        {
+            _logger.LogError(ex, "Failed to initialize Docker capability");
+            return false;
+        }
+    }
+
+    public async Task<TaskResult> ExecuteAsync(AgentTask task, CancellationToken ct = default)
+    {
+        if
(!_taskHandlers.TryGetValue(task.TaskType, out var handler)) + { + throw new UnsupportedTaskTypeException(task.TaskType); + } + + return await handler(task, ct); + } + + public async Task CheckHealthAsync(CancellationToken ct = default) + { + try + { + await _dockerClient.System.PingAsync(ct); + return new CapabilityHealthStatus(true, "Docker daemon responding"); + } + catch (Exception ex) + { + return new CapabilityHealthStatus(false, $"Docker daemon not responding: {ex.Message}"); + } + } + + private Task ExecutePullAsync(AgentTask task, CancellationToken ct) => + new DockerPullTask(_dockerClient, _logger).ExecuteAsync(task, ct); + + private Task ExecuteRunAsync(AgentTask task, CancellationToken ct) => + new DockerRunTask(_dockerClient, _logger).ExecuteAsync(task, ct); + + private Task ExecuteStopAsync(AgentTask task, CancellationToken ct) => + new DockerStopTask(_dockerClient, _logger).ExecuteAsync(task, ct); + + private Task ExecuteRemoveAsync(AgentTask task, CancellationToken ct) => + new DockerRemoveTask(_dockerClient, _logger).ExecuteAsync(task, ct); + + private Task ExecuteHealthCheckAsync(AgentTask task, CancellationToken ct) => + new DockerHealthCheckTask(_dockerClient, _logger).ExecuteAsync(task, ct); +} +``` + +### DockerPullTask + +```csharp +namespace StellaOps.Agent.Docker.Tasks; + +public sealed class DockerPullTask +{ + private readonly IDockerClient _dockerClient; + private readonly ILogger _logger; + + public sealed record PullPayload + { + public required string Image { get; init; } + public string? Tag { get; init; } + public string? Digest { get; init; } + public string? Registry { get; init; } + } + + public async Task ExecuteAsync(AgentTask task, CancellationToken ct) + { + var payload = JsonSerializer.Deserialize(task.Payload) + ?? 
throw new InvalidPayloadException("docker.pull"); + + var imageRef = BuildImageReference(payload); + + _logger.LogInformation("Pulling image {Image}", imageRef); + + try + { + // Get registry credentials if provided + AuthConfig? authConfig = null; + if (task.Credentials.TryGetValue("registry.username", out var username) && + task.Credentials.TryGetValue("registry.password", out var password)) + { + authConfig = new AuthConfig + { + Username = username, + Password = password, + ServerAddress = payload.Registry ?? "https://index.docker.io/v1/" + }; + } + + await _dockerClient.Images.CreateImageAsync( + new ImagesCreateParameters + { + FromImage = imageRef + }, + authConfig, + new Progress(msg => + { + if (!string.IsNullOrEmpty(msg.Status)) + { + _logger.LogDebug("Pull progress: {Status}", msg.Status); + } + }), + ct); + + // Verify the image was pulled + var images = await _dockerClient.Images.ListImagesAsync( + new ImagesListParameters + { + Filters = new Dictionary> + { + ["reference"] = new Dictionary { [imageRef] = true } + } + }, + ct); + + if (images.Count == 0) + { + throw new ImagePullException(imageRef, "Image not found after pull"); + } + + var pulledImage = images.First(); + + _logger.LogInformation( + "Successfully pulled image {Image} (ID: {Id})", + imageRef, + pulledImage.ID[..12]); + + return new TaskResult + { + TaskId = task.Id, + Success = true, + Outputs = new Dictionary + { + ["imageId"] = pulledImage.ID, + ["size"] = pulledImage.Size, + ["digest"] = payload.Digest ?? 
ExtractDigest(pulledImage) + }, + CompletedAt = DateTimeOffset.UtcNow + }; + } + catch (DockerApiException ex) + { + _logger.LogError(ex, "Failed to pull image {Image}", imageRef); + + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = $"Failed to pull image: {ex.Message}", + CompletedAt = DateTimeOffset.UtcNow + }; + } + } + + private static string BuildImageReference(PullPayload payload) + { + var image = payload.Image; + + if (!string.IsNullOrEmpty(payload.Registry)) + { + image = $"{payload.Registry}/{image}"; + } + + if (!string.IsNullOrEmpty(payload.Digest)) + { + return $"{image}@{payload.Digest}"; + } + + if (!string.IsNullOrEmpty(payload.Tag)) + { + return $"{image}:{payload.Tag}"; + } + + return $"{image}:latest"; + } + + private static string ExtractDigest(ImagesListResponse image) + { + return image.RepoDigests.FirstOrDefault()?.Split('@').LastOrDefault() ?? ""; + } +} +``` + +### DockerRunTask + +```csharp +namespace StellaOps.Agent.Docker.Tasks; + +public sealed class DockerRunTask +{ + private readonly IDockerClient _dockerClient; + private readonly ILogger _logger; + + public sealed record RunPayload + { + public required string Image { get; init; } + public required string Name { get; init; } + public IReadOnlyDictionary? Environment { get; init; } + public IReadOnlyList? Ports { get; init; } + public IReadOnlyList? Volumes { get; init; } + public IReadOnlyDictionary? Labels { get; init; } + public string? Network { get; init; } + public IReadOnlyList? Command { get; init; } + public ContainerHealthConfig? HealthCheck { get; init; } + public bool AutoRemove { get; init; } + public RestartPolicy? 
RestartPolicy { get; init; } + } + + public sealed record ContainerHealthConfig + { + public required IReadOnlyList Test { get; init; } + public TimeSpan Interval { get; init; } = TimeSpan.FromSeconds(30); + public TimeSpan Timeout { get; init; } = TimeSpan.FromSeconds(10); + public int Retries { get; init; } = 3; + public TimeSpan StartPeriod { get; init; } = TimeSpan.FromSeconds(0); + } + + public sealed record RestartPolicy + { + public string Name { get; init; } = "no"; + public int MaximumRetryCount { get; init; } + } + + public async Task ExecuteAsync(AgentTask task, CancellationToken ct) + { + var payload = JsonSerializer.Deserialize(task.Payload) + ?? throw new InvalidPayloadException("docker.run"); + + _logger.LogInformation( + "Creating container {Name} from image {Image}", + payload.Name, + payload.Image); + + try + { + // Check if container already exists + var existingContainers = await _dockerClient.Containers.ListContainersAsync( + new ContainersListParameters + { + All = true, + Filters = new Dictionary> + { + ["name"] = new Dictionary { [payload.Name] = true } + } + }, + ct); + + if (existingContainers.Any()) + { + var existing = existingContainers.First(); + _logger.LogInformation( + "Container {Name} already exists (ID: {Id}), removing", + payload.Name, + existing.ID[..12]); + + await _dockerClient.Containers.StopContainerAsync(existing.ID, new ContainerStopParameters(), ct); + await _dockerClient.Containers.RemoveContainerAsync(existing.ID, new ContainerRemoveParameters(), ct); + } + + // Merge labels with Stella metadata + var labels = new Dictionary(payload.Labels ?? 
new Dictionary()); + labels["stella.managed"] = "true"; + labels["stella.task.id"] = task.Id.ToString(); + + // Build create parameters + var createParams = new CreateContainerParameters + { + Image = payload.Image, + Name = payload.Name, + Env = BuildEnvironment(payload.Environment, task.Variables), + Labels = labels, + Cmd = payload.Command?.ToList(), + HostConfig = new HostConfig + { + PortBindings = ParsePortBindings(payload.Ports), + Binds = payload.Volumes?.ToList(), + NetworkMode = payload.Network, + AutoRemove = payload.AutoRemove, + RestartPolicy = payload.RestartPolicy is not null + ? new Docker.DotNet.Models.RestartPolicy + { + Name = Enum.Parse(payload.RestartPolicy.Name, ignoreCase: true), + MaximumRetryCount = payload.RestartPolicy.MaximumRetryCount + } + : null + }, + Healthcheck = payload.HealthCheck is not null + ? new HealthConfig + { + Test = payload.HealthCheck.Test.ToList(), + Interval = (long)payload.HealthCheck.Interval.TotalNanoseconds, + Timeout = (long)payload.HealthCheck.Timeout.TotalNanoseconds, + Retries = payload.HealthCheck.Retries, + StartPeriod = (long)payload.HealthCheck.StartPeriod.TotalNanoseconds + } + : null + }; + + // Create container + var createResponse = await _dockerClient.Containers.CreateContainerAsync(createParams, ct); + + _logger.LogInformation( + "Created container {Name} (ID: {Id})", + payload.Name, + createResponse.ID[..12]); + + // Start container + var started = await _dockerClient.Containers.StartContainerAsync( + createResponse.ID, + new ContainerStartParameters(), + ct); + + if (!started) + { + throw new ContainerStartException(payload.Name, "Container failed to start"); + } + + // Get container info + var containerInfo = await _dockerClient.Containers.InspectContainerAsync(createResponse.ID, ct); + + _logger.LogInformation( + "Started container {Name} (State: {State})", + payload.Name, + containerInfo.State.Status); + + return new TaskResult + { + TaskId = task.Id, + Success = true, + Outputs = new 
Dictionary + { + ["containerId"] = createResponse.ID, + ["containerName"] = payload.Name, + ["state"] = containerInfo.State.Status, + ["ipAddress"] = containerInfo.NetworkSettings.IPAddress + }, + CompletedAt = DateTimeOffset.UtcNow + }; + } + catch (DockerApiException ex) + { + _logger.LogError(ex, "Failed to create/start container {Name}", payload.Name); + + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = $"Failed to create/start container: {ex.Message}", + CompletedAt = DateTimeOffset.UtcNow + }; + } + } + + private static List BuildEnvironment( + IReadOnlyDictionary? env, + IReadOnlyDictionary variables) + { + var result = new List(); + + if (env is not null) + { + foreach (var (key, value) in env) + { + // Substitute variables in values + var resolvedValue = SubstituteVariables(value, variables); + result.Add($"{key}={resolvedValue}"); + } + } + + return result; + } + + private static string SubstituteVariables(string value, IReadOnlyDictionary variables) + { + return Regex.Replace(value, @"\$\{([^}]+)\}", match => + { + var varName = match.Groups[1].Value; + return variables.TryGetValue(varName, out var varValue) ? varValue : match.Value; + }); + } + + private static IDictionary> ParsePortBindings(IReadOnlyList? ports) + { + var bindings = new Dictionary>(); + + if (ports is null) + return bindings; + + foreach (var port in ports) + { + // Format: hostPort:containerPort or hostPort:containerPort/protocol + var parts = port.Split(':'); + if (parts.Length != 2) + continue; + + var hostPort = parts[0]; + var containerPortWithProtocol = parts[1]; + var containerPort = containerPortWithProtocol.Contains('/') + ? 
containerPortWithProtocol + : $"{containerPortWithProtocol}/tcp"; + + bindings[containerPort] = new List<PortBinding> + { + new() { HostPort = hostPort } + }; + } + + return bindings; + } +} +``` + +### DockerStopTask + +```csharp +namespace StellaOps.Agent.Docker.Tasks; + +public sealed class DockerStopTask +{ + private readonly IDockerClient _dockerClient; + private readonly ILogger _logger; + + public sealed record StopPayload + { + public string? ContainerId { get; init; } + public string? ContainerName { get; init; } + public TimeSpan Timeout { get; init; } = TimeSpan.FromSeconds(30); + } + + public async Task<TaskResult> ExecuteAsync(AgentTask task, CancellationToken ct) + { + var payload = JsonSerializer.Deserialize<StopPayload>(task.Payload) + ?? throw new InvalidPayloadException("docker.stop"); + + var containerId = await ResolveContainerIdAsync(payload, ct); + if (containerId is null) + { + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = "Container not found", + CompletedAt = DateTimeOffset.UtcNow + }; + } + + _logger.LogInformation("Stopping container {ContainerId}", containerId[..12]); + + try + { + var stopped = await _dockerClient.Containers.StopContainerAsync( + containerId, + new ContainerStopParameters + { + WaitBeforeKillSeconds = (uint)payload.Timeout.TotalSeconds + }, + ct); + + if (stopped) + { + _logger.LogInformation("Container {ContainerId} stopped", containerId[..12]); + } + else + { + _logger.LogWarning("Container {ContainerId} was already stopped", containerId[..12]); + } + + return new TaskResult + { + TaskId = task.Id, + Success = true, + Outputs = new Dictionary<string, object> + { + ["containerId"] = containerId, + ["wasRunning"] = stopped + }, + CompletedAt = DateTimeOffset.UtcNow + }; + } + catch (DockerApiException ex) + { + _logger.LogError(ex, "Failed to stop container {ContainerId}", containerId[..12]); + + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = $"Failed to stop container: {ex.Message}", + CompletedAt =
DateTimeOffset.UtcNow + }; + } + } + + private async Task<string?> ResolveContainerIdAsync(StopPayload payload, CancellationToken ct) + { + if (!string.IsNullOrEmpty(payload.ContainerId)) + { + return payload.ContainerId; + } + + if (!string.IsNullOrEmpty(payload.ContainerName)) + { + var containers = await _dockerClient.Containers.ListContainersAsync( + new ContainersListParameters + { + All = true, + Filters = new Dictionary<string, IDictionary<string, bool>> + { + ["name"] = new Dictionary<string, bool> { [payload.ContainerName] = true } + } + }, + ct); + + return containers.FirstOrDefault()?.ID; + } + + return null; + } +} +``` + +### DockerHealthCheckTask + +```csharp +namespace StellaOps.Agent.Docker.Tasks; + +public sealed class DockerHealthCheckTask +{ + private readonly IDockerClient _dockerClient; + private readonly ILogger _logger; + + public sealed record HealthCheckPayload + { + public string? ContainerId { get; init; } + public string? ContainerName { get; init; } + public TimeSpan Timeout { get; init; } = TimeSpan.FromSeconds(30); + public bool WaitForHealthy { get; init; } = true; + } + + public async Task<TaskResult> ExecuteAsync(AgentTask task, CancellationToken ct) + { + var payload = JsonSerializer.Deserialize<HealthCheckPayload>(task.Payload) + ??
throw new InvalidPayloadException("docker.health-check"); + + var containerId = await ResolveContainerIdAsync(payload, ct); + if (containerId is null) + { + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = "Container not found", + CompletedAt = DateTimeOffset.UtcNow + }; + } + + _logger.LogInformation("Checking health of container {ContainerId}", containerId[..12]); + + try + { + using var timeoutCts = new CancellationTokenSource(payload.Timeout); + using var linkedCts = CancellationTokenSource.CreateLinkedTokenSource(ct, timeoutCts.Token); + + while (!linkedCts.IsCancellationRequested) + { + var containerInfo = await _dockerClient.Containers.InspectContainerAsync(containerId, linkedCts.Token); + + if (containerInfo.State.Status != "running") + { + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = $"Container not running (state: {containerInfo.State.Status})", + Outputs = new Dictionary + { + ["state"] = containerInfo.State.Status + }, + CompletedAt = DateTimeOffset.UtcNow + }; + } + + var health = containerInfo.State.Health; + if (health is null) + { + // No health check configured, container is running + return new TaskResult + { + TaskId = task.Id, + Success = true, + Outputs = new Dictionary + { + ["containerId"] = containerId, + ["state"] = "running", + ["healthCheck"] = "none" + }, + CompletedAt = DateTimeOffset.UtcNow + }; + } + + if (health.Status == "healthy") + { + _logger.LogInformation("Container {ContainerId} is healthy", containerId[..12]); + + return new TaskResult + { + TaskId = task.Id, + Success = true, + Outputs = new Dictionary + { + ["containerId"] = containerId, + ["state"] = "running", + ["healthStatus"] = "healthy", + ["failingStreak"] = health.FailingStreak + }, + CompletedAt = DateTimeOffset.UtcNow + }; + } + + if (health.Status == "unhealthy") + { + var lastLog = health.Log.LastOrDefault(); + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = $"Container unhealthy: 
{lastLog?.Output ?? "unknown"}", + Outputs = new Dictionary + { + ["containerId"] = containerId, + ["healthStatus"] = "unhealthy", + ["failingStreak"] = health.FailingStreak + }, + CompletedAt = DateTimeOffset.UtcNow + }; + } + + if (!payload.WaitForHealthy) + { + return new TaskResult + { + TaskId = task.Id, + Success = true, + Outputs = new Dictionary + { + ["containerId"] = containerId, + ["healthStatus"] = health.Status + }, + CompletedAt = DateTimeOffset.UtcNow + }; + } + + // Wait before checking again + await Task.Delay(TimeSpan.FromSeconds(2), linkedCts.Token); + } + + throw new OperationCanceledException(); + } + catch (OperationCanceledException) + { + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = $"Health check timed out after {payload.Timeout}", + CompletedAt = DateTimeOffset.UtcNow + }; + } + } + + private async Task ResolveContainerIdAsync(HealthCheckPayload payload, CancellationToken ct) + { + if (!string.IsNullOrEmpty(payload.ContainerId)) + { + return payload.ContainerId; + } + + if (!string.IsNullOrEmpty(payload.ContainerName)) + { + var containers = await _dockerClient.Containers.ListContainersAsync( + new ContainersListParameters + { + All = true, + Filters = new Dictionary> + { + ["name"] = new Dictionary { [payload.ContainerName] = true } + } + }, + ct); + + return containers.FirstOrDefault()?.ID; + } + + return null; + } +} +``` + +### ContainerLogStreamer + +```csharp +namespace StellaOps.Agent.Docker; + +public sealed class ContainerLogStreamer +{ + private readonly IDockerClient _dockerClient; + private readonly LogStreamer _logStreamer; + private readonly ILogger _logger; + + public async Task StreamLogsAsync( + Guid taskId, + string containerId, + CancellationToken ct = default) + { + try + { + var stream = await _dockerClient.Containers.GetContainerLogsAsync( + containerId, + false, + new ContainerLogsParameters + { + Follow = true, + ShowStdout = true, + ShowStderr = true, + Timestamps = true + }, + ct); + 
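+ // NOTE (assumption): for non-TTY containers the Docker API multiplexes stdout and
+ // stderr into a single stream, prefixing each chunk with an 8-byte frame header;
+ // newer Docker.DotNet overloads therefore return a MultiplexedStream rather than a
+ // raw Stream. This sketch assumes the raw-stream overload, so stray header bytes
+ // may appear at line starts and are simply passed through to ParseLogLine below.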
+ using var reader = new StreamReader(stream); + + while (!ct.IsCancellationRequested) + { + var line = await reader.ReadLineAsync(ct); + if (line is null) + break; + + var (level, message) = ParseLogLine(line); + _logStreamer.Log(taskId, level, message); + } + } + catch (OperationCanceledException) + { + // Expected when task completes + } + catch (Exception ex) + { + _logger.LogWarning(ex, "Error streaming logs for container {ContainerId}", containerId[..12]); + } + } + + private static (LogLevel Level, string Message) ParseLogLine(string line) + { + // Docker log format includes stream type marker + // First 8 bytes are header: [stream_type, 0, 0, 0, size (4 bytes)] + // For text streams, we just parse the content + + var level = LogLevel.Information; + + // Simple heuristic for log level detection + if (line.Contains("ERROR", StringComparison.OrdinalIgnoreCase) || + line.Contains("FATAL", StringComparison.OrdinalIgnoreCase)) + { + level = LogLevel.Error; + } + else if (line.Contains("WARN", StringComparison.OrdinalIgnoreCase)) + { + level = LogLevel.Warning; + } + else if (line.Contains("DEBUG", StringComparison.OrdinalIgnoreCase)) + { + level = LogLevel.Debug; + } + + return (level, line); + } +} +``` + +--- + +## Acceptance Criteria + +- [ ] Pull images with digest references +- [ ] Pull from authenticated registries +- [ ] Create containers with environment variables +- [ ] Create containers with port mappings +- [ ] Create containers with volume mounts +- [ ] Start containers successfully +- [ ] Stop containers gracefully +- [ ] Remove containers +- [ ] Check container health status +- [ ] Wait for health check to pass +- [ ] Stream container logs +- [ ] Unit test coverage >=85% + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 108_001 Agent Core Runtime | Internal | TODO | +| Docker.DotNet | NuGet | Available | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| 
DockerCapability | TODO | | +| DockerPullTask | TODO | | +| DockerRunTask | TODO | | +| DockerStopTask | TODO | | +| DockerRemoveTask | TODO | | +| DockerHealthCheckTask | TODO | | +| ContainerLogStreamer | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_108_003_AGENTS_compose.md b/docs/implplan/SPRINT_20260110_108_003_AGENTS_compose.md new file mode 100644 index 000000000..b3c85384d --- /dev/null +++ b/docs/implplan/SPRINT_20260110_108_003_AGENTS_compose.md @@ -0,0 +1,961 @@ +# SPRINT: Agent - Compose + +> **Sprint ID:** 108_003 +> **Module:** AGENTS +> **Phase:** 8 - Agents +> **Status:** TODO +> **Parent:** [108_000_INDEX](SPRINT_20260110_108_000_INDEX_agents.md) + +--- + +## Overview + +Implement the Compose Agent capability for managing docker-compose stacks on target hosts. + +### Objectives + +- Compose stack deployment (up) +- Compose stack teardown (down) +- Service scaling +- Stack health checking +- Compose file management with digest-locked references + +### Working Directory + +``` +src/ReleaseOrchestrator/ +├── __Agents/ +│ └── StellaOps.Agent.Compose/ +│ ├── ComposeCapability.cs +│ ├── Tasks/ +│ │ ├── ComposePullTask.cs +│ │ ├── ComposeUpTask.cs +│ │ ├── ComposeDownTask.cs +│ │ ├── ComposeScaleTask.cs +│ │ └── ComposeHealthCheckTask.cs +│ ├── ComposeFileManager.cs +│ └── ComposeExecutor.cs +└── __Tests/ +``` + +--- + +## Deliverables + +### ComposeCapability + +```csharp +namespace StellaOps.Agent.Compose; + +public sealed class ComposeCapability : IAgentCapability +{ + private readonly ComposeExecutor _executor; + private readonly ComposeFileManager _fileManager; + private readonly ILogger _logger; + private readonly Dictionary>> _taskHandlers; + + public string Name => "compose"; + public string Version => "1.0.0"; + + public IReadOnlyList SupportedTaskTypes => new[] + { + "compose.pull", + "compose.up", + "compose.down", 
+ "compose.scale", + "compose.health-check", + "compose.ps" + }; + + public ComposeCapability( + ComposeExecutor executor, + ComposeFileManager fileManager, + ILogger logger) + { + _executor = executor; + _fileManager = fileManager; + _logger = logger; + + _taskHandlers = new Dictionary>> + { + ["compose.pull"] = ExecutePullAsync, + ["compose.up"] = ExecuteUpAsync, + ["compose.down"] = ExecuteDownAsync, + ["compose.scale"] = ExecuteScaleAsync, + ["compose.health-check"] = ExecuteHealthCheckAsync, + ["compose.ps"] = ExecutePsAsync + }; + } + + public async Task InitializeAsync(CancellationToken ct = default) + { + try + { + var version = await _executor.GetVersionAsync(ct); + _logger.LogInformation("Compose capability initialized: {Version}", version); + return true; + } + catch (Exception ex) + { + _logger.LogError(ex, "Failed to initialize Compose capability"); + return false; + } + } + + public async Task ExecuteAsync(AgentTask task, CancellationToken ct = default) + { + if (!_taskHandlers.TryGetValue(task.TaskType, out var handler)) + { + throw new UnsupportedTaskTypeException(task.TaskType); + } + + return await handler(task, ct); + } + + public async Task CheckHealthAsync(CancellationToken ct = default) + { + try + { + await _executor.GetVersionAsync(ct); + return new CapabilityHealthStatus(true, "Docker Compose available"); + } + catch (Exception ex) + { + return new CapabilityHealthStatus(false, $"Docker Compose not available: {ex.Message}"); + } + } + + private Task ExecutePullAsync(AgentTask task, CancellationToken ct) => + new ComposePullTask(_executor, _fileManager, _logger).ExecuteAsync(task, ct); + + private Task ExecuteUpAsync(AgentTask task, CancellationToken ct) => + new ComposeUpTask(_executor, _fileManager, _logger).ExecuteAsync(task, ct); + + private Task ExecuteDownAsync(AgentTask task, CancellationToken ct) => + new ComposeDownTask(_executor, _fileManager, _logger).ExecuteAsync(task, ct); + + private Task ExecuteScaleAsync(AgentTask task, 
CancellationToken ct) => + new ComposeScaleTask(_executor, _logger).ExecuteAsync(task, ct); + + private Task ExecuteHealthCheckAsync(AgentTask task, CancellationToken ct) => + new ComposeHealthCheckTask(_executor, _logger).ExecuteAsync(task, ct); + + private Task ExecutePsAsync(AgentTask task, CancellationToken ct) => + new ComposePsTask(_executor, _logger).ExecuteAsync(task, ct); +} +``` + +### ComposeExecutor + +```csharp +namespace StellaOps.Agent.Compose; + +public sealed class ComposeExecutor +{ + private readonly string _composeCommand; + private readonly ILogger _logger; + + public ComposeExecutor(ILogger logger) + { + _logger = logger; + // Detect docker compose v2 vs docker-compose v1 + _composeCommand = DetectComposeCommand(); + } + + public async Task GetVersionAsync(CancellationToken ct = default) + { + var result = await ExecuteAsync("version --short", null, ct); + return result.StandardOutput.Trim(); + } + + public async Task PullAsync( + string projectDir, + string composeFile, + IReadOnlyDictionary? 
credentials = null, + CancellationToken ct = default) + { + var args = $"-f {composeFile} pull"; + return await ExecuteAsync(args, projectDir, ct, BuildEnvironment(credentials)); + } + + public async Task UpAsync( + string projectDir, + string composeFile, + ComposeUpOptions options, + CancellationToken ct = default) + { + var args = $"-f {composeFile} up -d"; + + if (options.ForceRecreate) + args += " --force-recreate"; + + if (options.RemoveOrphans) + args += " --remove-orphans"; + + if (options.NoStart) + args += " --no-start"; + + if (options.Services?.Count > 0) + args += " " + string.Join(" ", options.Services); + + return await ExecuteAsync(args, projectDir, ct, options.Environment); + } + + public async Task DownAsync( + string projectDir, + string composeFile, + ComposeDownOptions options, + CancellationToken ct = default) + { + var args = $"-f {composeFile} down"; + + if (options.RemoveVolumes) + args += " -v"; + + if (options.RemoveOrphans) + args += " --remove-orphans"; + + if (options.Timeout.HasValue) + args += $" -t {(int)options.Timeout.Value.TotalSeconds}"; + + return await ExecuteAsync(args, projectDir, ct); + } + + public async Task ScaleAsync( + string projectDir, + string composeFile, + IReadOnlyDictionary scaling, + CancellationToken ct = default) + { + var scaleArgs = string.Join(" ", scaling.Select(kv => $"{kv.Key}={kv.Value}")); + var args = $"-f {composeFile} up -d --no-recreate --scale {scaleArgs}"; + return await ExecuteAsync(args, projectDir, ct); + } + + public async Task PsAsync( + string projectDir, + string composeFile, + bool all = false, + CancellationToken ct = default) + { + var args = $"-f {composeFile} ps --format json"; + if (all) + args += " -a"; + + return await ExecuteAsync(args, projectDir, ct); + } + + private async Task ExecuteAsync( + string arguments, + string? workingDirectory, + CancellationToken ct, + IReadOnlyDictionary? 
environment = null) + { + var psi = new ProcessStartInfo + { + FileName = _composeCommand.Split(' ')[0], + Arguments = _composeCommand.Contains(' ') + ? $"{_composeCommand.Substring(_composeCommand.IndexOf(' ') + 1)} {arguments}" + : arguments, + WorkingDirectory = workingDirectory ?? Environment.CurrentDirectory, + RedirectStandardOutput = true, + RedirectStandardError = true, + UseShellExecute = false, + CreateNoWindow = true + }; + + if (environment is not null) + { + foreach (var (key, value) in environment) + { + psi.Environment[key] = value; + } + } + + _logger.LogDebug("Executing: {Command} {Args}", psi.FileName, psi.Arguments); + + using var process = new Process { StartInfo = psi }; + var stdout = new StringBuilder(); + var stderr = new StringBuilder(); + + process.OutputDataReceived += (_, e) => + { + if (e.Data is not null) + stdout.AppendLine(e.Data); + }; + + process.ErrorDataReceived += (_, e) => + { + if (e.Data is not null) + stderr.AppendLine(e.Data); + }; + + process.Start(); + process.BeginOutputReadLine(); + process.BeginErrorReadLine(); + + await process.WaitForExitAsync(ct); + + var result = new ComposeResult( + process.ExitCode == 0, + process.ExitCode, + stdout.ToString(), + stderr.ToString()); + + if (!result.Success) + { + _logger.LogWarning( + "Compose command failed with exit code {ExitCode}: {Stderr}", + result.ExitCode, + result.StandardError); + } + + return result; + } + + private static string DetectComposeCommand() + { + // Try docker compose (v2) first + try + { + var psi = new ProcessStartInfo + { + FileName = "docker", + Arguments = "compose version", + RedirectStandardOutput = true, + RedirectStandardError = true, + UseShellExecute = false, + CreateNoWindow = true + }; + + using var process = Process.Start(psi); + process?.WaitForExit(5000); + if (process?.ExitCode == 0) + { + return "docker compose"; + } + } + catch { } + + // Fall back to docker-compose (v1) + return "docker-compose"; + } + + private static 
IReadOnlyDictionary? BuildEnvironment( + IReadOnlyDictionary? credentials) + { + if (credentials is null) + return null; + + var env = new Dictionary(); + + if (credentials.TryGetValue("registry.username", out var user)) + env["DOCKER_REGISTRY_USER"] = user; + + if (credentials.TryGetValue("registry.password", out var pass)) + env["DOCKER_REGISTRY_PASSWORD"] = pass; + + return env; + } +} + +public sealed record ComposeResult( + bool Success, + int ExitCode, + string StandardOutput, + string StandardError +); + +public sealed record ComposeUpOptions +{ + public bool ForceRecreate { get; init; } + public bool RemoveOrphans { get; init; } = true; + public bool NoStart { get; init; } + public IReadOnlyList? Services { get; init; } + public IReadOnlyDictionary? Environment { get; init; } +} + +public sealed record ComposeDownOptions +{ + public bool RemoveVolumes { get; init; } + public bool RemoveOrphans { get; init; } = true; + public TimeSpan? Timeout { get; init; } +} +``` + +### ComposeFileManager + +```csharp +namespace StellaOps.Agent.Compose; + +public sealed class ComposeFileManager +{ + private readonly string _deploymentRoot; + private readonly ILogger _logger; + + public ComposeFileManager(AgentConfiguration config, ILogger logger) + { + _deploymentRoot = config.DeploymentRoot ?? 
"/var/lib/stella-agent/deployments"; + _logger = logger; + } + + public async Task WriteComposeFileAsync( + string projectName, + string composeLockContent, + string versionStickerContent, + CancellationToken ct = default) + { + var projectDir = Path.Combine(_deploymentRoot, projectName); + Directory.CreateDirectory(projectDir); + + // Write compose.stella.lock.yml + var composeFile = Path.Combine(projectDir, "compose.stella.lock.yml"); + await File.WriteAllTextAsync(composeFile, composeLockContent, ct); + _logger.LogDebug("Wrote compose file: {Path}", composeFile); + + // Write stella.version.json + var versionFile = Path.Combine(projectDir, "stella.version.json"); + await File.WriteAllTextAsync(versionFile, versionStickerContent, ct); + _logger.LogDebug("Wrote version sticker: {Path}", versionFile); + + return projectDir; + } + + public string GetProjectDirectory(string projectName) + { + return Path.Combine(_deploymentRoot, projectName); + } + + public string GetComposeFilePath(string projectName) + { + return Path.Combine(GetProjectDirectory(projectName), "compose.stella.lock.yml"); + } + + public async Task GetVersionStickerAsync(string projectName, CancellationToken ct = default) + { + var path = Path.Combine(GetProjectDirectory(projectName), "stella.version.json"); + if (!File.Exists(path)) + return null; + + return await File.ReadAllTextAsync(path, ct); + } + + public async Task BackupExistingAsync(string projectName, CancellationToken ct = default) + { + var projectDir = GetProjectDirectory(projectName); + if (!Directory.Exists(projectDir)) + return; + + var backupDir = Path.Combine(projectDir, ".backup", DateTime.UtcNow.ToString("yyyyMMdd-HHmmss")); + Directory.CreateDirectory(backupDir); + + foreach (var file in Directory.GetFiles(projectDir, "*.*")) + { + var fileName = Path.GetFileName(file); + if (fileName.StartsWith(".")) + continue; + + File.Copy(file, Path.Combine(backupDir, fileName)); + } + + _logger.LogDebug("Backed up existing deployment to 
{BackupDir}", backupDir); + } + + public async Task CleanupAsync(string projectName, CancellationToken ct = default) + { + var projectDir = GetProjectDirectory(projectName); + if (Directory.Exists(projectDir)) + { + Directory.Delete(projectDir, recursive: true); + _logger.LogDebug("Cleaned up project directory: {Path}", projectDir); + } + } +} +``` + +### ComposeUpTask + +```csharp +namespace StellaOps.Agent.Compose.Tasks; + +public sealed class ComposeUpTask +{ + private readonly ComposeExecutor _executor; + private readonly ComposeFileManager _fileManager; + private readonly ILogger _logger; + + public sealed record UpPayload + { + public required string ProjectName { get; init; } + public required string ComposeLock { get; init; } + public required string VersionSticker { get; init; } + public bool ForceRecreate { get; init; } = true; + public bool RemoveOrphans { get; init; } = true; + public IReadOnlyList? Services { get; init; } + public IReadOnlyDictionary? Environment { get; init; } + public bool BackupExisting { get; init; } = true; + } + + public async Task ExecuteAsync(AgentTask task, CancellationToken ct) + { + var payload = JsonSerializer.Deserialize(task.Payload) + ?? 
throw new InvalidPayloadException("compose.up"); + + _logger.LogInformation("Deploying compose stack: {Project}", payload.ProjectName); + + try + { + // Backup existing deployment + if (payload.BackupExisting) + { + await _fileManager.BackupExistingAsync(payload.ProjectName, ct); + } + + // Write compose files + var projectDir = await _fileManager.WriteComposeFileAsync( + payload.ProjectName, + payload.ComposeLock, + payload.VersionSticker, + ct); + + var composeFile = _fileManager.GetComposeFilePath(payload.ProjectName); + + // Pull images first + _logger.LogInformation("Pulling images for {Project}", payload.ProjectName); + var pullResult = await _executor.PullAsync( + projectDir, + composeFile, + task.Credentials as IReadOnlyDictionary, + ct); + + if (!pullResult.Success) + { + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = $"Failed to pull images: {pullResult.StandardError}", + CompletedAt = DateTimeOffset.UtcNow + }; + } + + // Deploy the stack + _logger.LogInformation("Starting compose stack: {Project}", payload.ProjectName); + var upResult = await _executor.UpAsync( + projectDir, + composeFile, + new ComposeUpOptions + { + ForceRecreate = payload.ForceRecreate, + RemoveOrphans = payload.RemoveOrphans, + Services = payload.Services, + Environment = MergeEnvironment(payload.Environment, task.Variables) + }, + ct); + + if (!upResult.Success) + { + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = $"Failed to deploy stack: {upResult.StandardError}", + CompletedAt = DateTimeOffset.UtcNow + }; + } + + // Get running services + var psResult = await _executor.PsAsync(projectDir, composeFile, ct: ct); + var services = ParseServicesFromPs(psResult.StandardOutput); + + _logger.LogInformation( + "Deployed compose stack {Project} with {Count} services", + payload.ProjectName, + services.Count); + + return new TaskResult + { + TaskId = task.Id, + Success = true, + Outputs = new Dictionary + { + ["projectName"] = 
payload.ProjectName, + ["projectDir"] = projectDir, + ["services"] = services + }, + CompletedAt = DateTimeOffset.UtcNow + }; + } + catch (Exception ex) + { + _logger.LogError(ex, "Failed to deploy compose stack {Project}", payload.ProjectName); + + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = ex.Message, + CompletedAt = DateTimeOffset.UtcNow + }; + } + } + + private static IReadOnlyDictionary<string, string>? MergeEnvironment( + IReadOnlyDictionary<string, string>? env, + IReadOnlyDictionary<string, string> variables) + { + if (env is null && variables.Count == 0) + return null; + + var merged = new Dictionary<string, string>(variables); + if (env is not null) + { + foreach (var (key, value) in env) + { + merged[key] = value; + } + } + return merged; + } + + private static IReadOnlyList<ServiceStatus> ParseServicesFromPs(string output) + { + if (string.IsNullOrWhiteSpace(output)) + return []; + + try + { + var services = new List<ServiceStatus>(); + foreach (var line in output.Split('\n', StringSplitOptions.RemoveEmptyEntries)) + { + var service = JsonSerializer.Deserialize<JsonElement>(line); + services.Add(new ServiceStatus( + service.GetProperty("Name").GetString() ?? "", + service.GetProperty("Service").GetString() ?? "", + service.GetProperty("State").GetString() ?? "", + service.TryGetProperty("Health", out var health) ? health.GetString() : null + )); + } + return services; + } + catch + { + return []; + } + } +} + +public sealed record ServiceStatus( + string Name, + string Service, + string State, + string? Health +); +``` + +### ComposeDownTask + +```csharp +namespace StellaOps.Agent.Compose.Tasks; + +public sealed class ComposeDownTask +{ + private readonly ComposeExecutor _executor; + private readonly ComposeFileManager _fileManager; + private readonly ILogger _logger; + + public sealed record DownPayload + { + public required string ProjectName { get; init; } + public bool RemoveVolumes { get; init; } + public bool RemoveOrphans { get; init; } = true; + public bool CleanupFiles { get; init; } + public TimeSpan?
Timeout { get; init; } = TimeSpan.FromSeconds(30); + } + + public async Task ExecuteAsync(AgentTask task, CancellationToken ct) + { + var payload = JsonSerializer.Deserialize(task.Payload) + ?? throw new InvalidPayloadException("compose.down"); + + _logger.LogInformation("Stopping compose stack: {Project}", payload.ProjectName); + + try + { + var projectDir = _fileManager.GetProjectDirectory(payload.ProjectName); + var composeFile = _fileManager.GetComposeFilePath(payload.ProjectName); + + if (!File.Exists(composeFile)) + { + _logger.LogWarning( + "Compose file not found for project {Project}, skipping down", + payload.ProjectName); + + return new TaskResult + { + TaskId = task.Id, + Success = true, + Outputs = new Dictionary + { + ["projectName"] = payload.ProjectName, + ["skipped"] = true, + ["reason"] = "Compose file not found" + }, + CompletedAt = DateTimeOffset.UtcNow + }; + } + + var result = await _executor.DownAsync( + projectDir, + composeFile, + new ComposeDownOptions + { + RemoveVolumes = payload.RemoveVolumes, + RemoveOrphans = payload.RemoveOrphans, + Timeout = payload.Timeout + }, + ct); + + if (!result.Success) + { + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = $"Failed to stop stack: {result.StandardError}", + CompletedAt = DateTimeOffset.UtcNow + }; + } + + // Cleanup files if requested + if (payload.CleanupFiles) + { + await _fileManager.CleanupAsync(payload.ProjectName, ct); + } + + _logger.LogInformation("Stopped compose stack: {Project}", payload.ProjectName); + + return new TaskResult + { + TaskId = task.Id, + Success = true, + Outputs = new Dictionary + { + ["projectName"] = payload.ProjectName, + ["removedVolumes"] = payload.RemoveVolumes, + ["cleanedFiles"] = payload.CleanupFiles + }, + CompletedAt = DateTimeOffset.UtcNow + }; + } + catch (Exception ex) + { + _logger.LogError(ex, "Failed to stop compose stack {Project}", payload.ProjectName); + + return new TaskResult + { + TaskId = task.Id, + Success = false, 
+ Error = ex.Message, + CompletedAt = DateTimeOffset.UtcNow + }; + } + } +} +``` + +### ComposeHealthCheckTask + +```csharp +namespace StellaOps.Agent.Compose.Tasks; + +public sealed class ComposeHealthCheckTask +{ + private readonly ComposeExecutor _executor; + private readonly ILogger _logger; + + public sealed record HealthCheckPayload + { + public required string ProjectName { get; init; } + public string? ComposeFile { get; init; } + public IReadOnlyList? Services { get; init; } + public TimeSpan Timeout { get; init; } = TimeSpan.FromMinutes(5); + public bool WaitForHealthy { get; init; } = true; + } + + public async Task ExecuteAsync(AgentTask task, CancellationToken ct) + { + var payload = JsonSerializer.Deserialize(task.Payload) + ?? throw new InvalidPayloadException("compose.health-check"); + + _logger.LogInformation("Checking health of compose stack: {Project}", payload.ProjectName); + + try + { + var projectDir = Path.Combine("/var/lib/stella-agent/deployments", payload.ProjectName); + var composeFile = payload.ComposeFile ?? 
Path.Combine(projectDir, "compose.stella.lock.yml"); + + using var timeoutCts = new CancellationTokenSource(payload.Timeout); + using var linkedCts = CancellationTokenSource.CreateLinkedTokenSource(ct, timeoutCts.Token); + + while (!linkedCts.IsCancellationRequested) + { + var psResult = await _executor.PsAsync(projectDir, composeFile, ct: linkedCts.Token); + var services = ParseServices(psResult.StandardOutput); + + // Filter to requested services if specified + if (payload.Services?.Count > 0) + { + services = services.Where(s => payload.Services.Contains(s.Service)).ToList(); + } + + var allRunning = services.All(s => s.State == "running"); + var allHealthy = services.All(s => + s.Health is null || s.Health == "healthy" || s.Health == ""); + + if (allRunning && allHealthy) + { + _logger.LogInformation( + "Compose stack {Project} is healthy ({Count} services)", + payload.ProjectName, + services.Count); + + return new TaskResult + { + TaskId = task.Id, + Success = true, + Outputs = new Dictionary + { + ["projectName"] = payload.ProjectName, + ["services"] = services, + ["allHealthy"] = true + }, + CompletedAt = DateTimeOffset.UtcNow + }; + } + + var unhealthyServices = services.Where(s => + s.State != "running" || (s.Health is not null && s.Health != "healthy" && s.Health != "")); + + if (!payload.WaitForHealthy) + { + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = "Some services are unhealthy", + Outputs = new Dictionary + { + ["projectName"] = payload.ProjectName, + ["services"] = services, + ["unhealthyServices"] = unhealthyServices.ToList() + }, + CompletedAt = DateTimeOffset.UtcNow + }; + } + + await Task.Delay(TimeSpan.FromSeconds(5), linkedCts.Token); + } + + throw new OperationCanceledException(); + } + catch (OperationCanceledException) + { + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = $"Health check timed out after {payload.Timeout}", + CompletedAt = DateTimeOffset.UtcNow + }; + } + catch 
(Exception ex) + { + _logger.LogError(ex, "Health check failed for stack {Project}", payload.ProjectName); + + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = ex.Message, + CompletedAt = DateTimeOffset.UtcNow + }; + } + } + + private static List ParseServices(string output) + { + var services = new List(); + + if (string.IsNullOrWhiteSpace(output)) + return services; + + foreach (var line in output.Split('\n', StringSplitOptions.RemoveEmptyEntries)) + { + try + { + var service = JsonSerializer.Deserialize(line); + services.Add(new ServiceStatus( + service.GetProperty("Name").GetString() ?? "", + service.GetProperty("Service").GetString() ?? "", + service.GetProperty("State").GetString() ?? "", + service.TryGetProperty("Health", out var health) ? health.GetString() : null + )); + } + catch { } + } + + return services; + } +} +``` + +--- + +## Acceptance Criteria + +- [ ] Deploy compose stack from compose.stella.lock.yml +- [ ] Pull images before deployment +- [ ] Support authenticated registries +- [ ] Force recreate containers option +- [ ] Remove orphan containers +- [ ] Stop and remove compose stack +- [ ] Optionally remove volumes on down +- [ ] Scale services up/down +- [ ] Check health of all services +- [ ] Wait for services to become healthy +- [ ] Backup existing deployment before update +- [ ] Unit test coverage >=85% + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 108_001 Agent Core Runtime | Internal | TODO | +| 108_002 Agent - Docker | Internal | TODO | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| ComposeCapability | TODO | | +| ComposeExecutor | TODO | | +| ComposeFileManager | TODO | | +| ComposePullTask | TODO | | +| ComposeUpTask | TODO | | +| ComposeDownTask | TODO | | +| ComposeScaleTask | TODO | | +| ComposeHealthCheckTask | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | 
+|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_108_004_AGENTS_ssh.md b/docs/implplan/SPRINT_20260110_108_004_AGENTS_ssh.md new file mode 100644 index 000000000..c7e2a90a0 --- /dev/null +++ b/docs/implplan/SPRINT_20260110_108_004_AGENTS_ssh.md @@ -0,0 +1,797 @@ +# SPRINT: Agent - SSH + +> **Sprint ID:** 108_004 +> **Module:** AGENTS +> **Phase:** 8 - Agents +> **Status:** TODO +> **Parent:** [108_000_INDEX](SPRINT_20260110_108_000_INDEX_agents.md) + +--- + +## Overview + +Implement the SSH Agent capability for remote command execution and file transfer via SSH. + +### Objectives + +- Remote command execution via SSH +- File transfer (SCP/SFTP) +- SSH key authentication +- SSH tunneling for remote Docker/Compose operations +- Connection pooling for efficiency + +### Working Directory + +``` +src/ReleaseOrchestrator/ +├── __Agents/ +│ └── StellaOps.Agent.Ssh/ +│ ├── SshCapability.cs +│ ├── Tasks/ +│ │ ├── SshExecuteTask.cs +│ │ ├── SshFileTransferTask.cs +│ │ ├── SshTunnelTask.cs +│ │ └── SshDockerProxyTask.cs +│ ├── SshConnectionPool.cs +│ └── SshClientFactory.cs +└── __Tests/ +``` + +--- + +## Deliverables + +### SshCapability + +```csharp +namespace StellaOps.Agent.Ssh; + +public sealed class SshCapability : IAgentCapability +{ + private readonly SshConnectionPool _connectionPool; + private readonly ILogger<SshCapability> _logger; + private readonly Dictionary<string, Func<AgentTask, CancellationToken, Task<TaskResult>>> _taskHandlers; + + public string Name => "ssh"; + public string Version => "1.0.0"; + + public IReadOnlyList<string> SupportedTaskTypes => new[] + { + "ssh.execute", + "ssh.upload", + "ssh.download", + "ssh.tunnel", + "ssh.docker-proxy" + }; + + public SshCapability(SshConnectionPool connectionPool, ILogger<SshCapability> logger) + { + _connectionPool = connectionPool; + _logger = logger; + + _taskHandlers = new Dictionary<string, Func<AgentTask, CancellationToken, Task<TaskResult>>> + { + ["ssh.execute"] = ExecuteCommandAsync, + ["ssh.upload"] = UploadFileAsync, + ["ssh.download"] = DownloadFileAsync, + ["ssh.tunnel"] = CreateTunnelAsync, 
["ssh.docker-proxy"] = DockerProxyAsync + }; + } + + public Task InitializeAsync(CancellationToken ct = default) + { + // SSH capability is always available if SSH.NET is loaded + _logger.LogInformation("SSH capability initialized"); + return Task.FromResult(true); + } + + public async Task ExecuteAsync(AgentTask task, CancellationToken ct = default) + { + if (!_taskHandlers.TryGetValue(task.TaskType, out var handler)) + { + throw new UnsupportedTaskTypeException(task.TaskType); + } + + return await handler(task, ct); + } + + public Task CheckHealthAsync(CancellationToken ct = default) + { + return Task.FromResult(new CapabilityHealthStatus(true, "SSH capability available")); + } + + private Task ExecuteCommandAsync(AgentTask task, CancellationToken ct) => + new SshExecuteTask(_connectionPool, _logger).ExecuteAsync(task, ct); + + private Task UploadFileAsync(AgentTask task, CancellationToken ct) => + new SshFileTransferTask(_connectionPool, _logger).UploadAsync(task, ct); + + private Task DownloadFileAsync(AgentTask task, CancellationToken ct) => + new SshFileTransferTask(_connectionPool, _logger).DownloadAsync(task, ct); + + private Task CreateTunnelAsync(AgentTask task, CancellationToken ct) => + new SshTunnelTask(_connectionPool, _logger).ExecuteAsync(task, ct); + + private Task DockerProxyAsync(AgentTask task, CancellationToken ct) => + new SshDockerProxyTask(_connectionPool, _logger).ExecuteAsync(task, ct); +} +``` + +### SshConnectionPool + +```csharp +namespace StellaOps.Agent.Ssh; + +public sealed class SshConnectionPool : IAsyncDisposable +{ + private readonly ConcurrentDictionary _connections = new(); + private readonly TimeSpan _connectionTimeout = TimeSpan.FromMinutes(10); + private readonly ILogger _logger; + private readonly Timer _cleanupTimer; + + public SshConnectionPool(ILogger logger) + { + _logger = logger; + _cleanupTimer = new Timer(CleanupExpiredConnections, null, TimeSpan.FromMinutes(1), TimeSpan.FromMinutes(1)); + } + + public async Task 
GetConnectionAsync( + SshConnectionInfo connectionInfo, + CancellationToken ct = default) + { + var key = connectionInfo.GetConnectionKey(); + + if (_connections.TryGetValue(key, out var pooled) && pooled.Client.IsConnected) + { + pooled.LastUsed = DateTimeOffset.UtcNow; + return pooled.Client; + } + + var client = await CreateConnectionAsync(connectionInfo, ct); + _connections[key] = new PooledConnection(client, DateTimeOffset.UtcNow); + + return client; + } + + private async Task CreateConnectionAsync( + SshConnectionInfo info, + CancellationToken ct) + { + var authMethods = new List(); + + // Private key authentication + if (!string.IsNullOrEmpty(info.PrivateKey)) + { + var keyFile = string.IsNullOrEmpty(info.PrivateKeyPassphrase) + ? new PrivateKeyFile(new MemoryStream(Encoding.UTF8.GetBytes(info.PrivateKey))) + : new PrivateKeyFile(new MemoryStream(Encoding.UTF8.GetBytes(info.PrivateKey)), info.PrivateKeyPassphrase); + + authMethods.Add(new PrivateKeyAuthenticationMethod(info.Username, keyFile)); + } + + // Password authentication + if (!string.IsNullOrEmpty(info.Password)) + { + authMethods.Add(new PasswordAuthenticationMethod(info.Username, info.Password)); + } + + var connectionInfo = new ConnectionInfo( + info.Host, + info.Port, + info.Username, + authMethods.ToArray()); + + var client = new SshClient(connectionInfo); + + await Task.Run(() => client.Connect(), ct); + + _logger.LogDebug( + "SSH connection established to {User}@{Host}:{Port}", + info.Username, + info.Host, + info.Port); + + return client; + } + + public void ReleaseConnection(string connectionKey) + { + // Connection stays in pool for reuse + if (_connections.TryGetValue(connectionKey, out var pooled)) + { + pooled.LastUsed = DateTimeOffset.UtcNow; + } + } + + private void CleanupExpiredConnections(object? 
state) + { + var expired = _connections + .Where(kv => DateTimeOffset.UtcNow - kv.Value.LastUsed > _connectionTimeout) + .ToList(); + + foreach (var (key, pooled) in expired) + { + if (_connections.TryRemove(key, out _)) + { + try + { + pooled.Client.Disconnect(); + pooled.Client.Dispose(); + _logger.LogDebug("Closed expired SSH connection: {Key}", key); + } + catch (Exception ex) + { + _logger.LogWarning(ex, "Error closing SSH connection: {Key}", key); + } + } + } + } + + public async ValueTask DisposeAsync() + { + _cleanupTimer.Dispose(); + + foreach (var (_, pooled) in _connections) + { + try + { + pooled.Client.Disconnect(); + pooled.Client.Dispose(); + } + catch { } + } + + _connections.Clear(); + } + + private sealed class PooledConnection + { + public SshClient Client { get; } + public DateTimeOffset LastUsed { get; set; } + + public PooledConnection(SshClient client, DateTimeOffset lastUsed) + { + Client = client; + LastUsed = lastUsed; + } + } +} + +public sealed record SshConnectionInfo +{ + public required string Host { get; init; } + public int Port { get; init; } = 22; + public required string Username { get; init; } + public string? Password { get; init; } + public string? PrivateKey { get; init; } + public string? PrivateKeyPassphrase { get; init; } + + public string GetConnectionKey() => $"{Username}@{Host}:{Port}"; +} +``` + +### SshExecuteTask + +```csharp +namespace StellaOps.Agent.Ssh.Tasks; + +public sealed class SshExecuteTask +{ + private readonly SshConnectionPool _connectionPool; + private readonly ILogger _logger; + + public sealed record ExecutePayload + { + public required string Host { get; init; } + public int Port { get; init; } = 22; + public required string Username { get; init; } + public required string Command { get; init; } + public IReadOnlyDictionary? Environment { get; init; } + public string? 
WorkingDirectory { get; init; } + public TimeSpan Timeout { get; init; } = TimeSpan.FromMinutes(10); + public bool CombineOutput { get; init; } = true; + } + + public async Task<TaskResult> ExecuteAsync(AgentTask task, CancellationToken ct) + { + var payload = JsonSerializer.Deserialize<ExecutePayload>(task.Payload) + ?? throw new InvalidPayloadException("ssh.execute"); + + var connectionInfo = new SshConnectionInfo + { + Host = payload.Host, + Port = payload.Port, + Username = payload.Username, + Password = task.Credentials.GetValueOrDefault("ssh.password"), + PrivateKey = task.Credentials.GetValueOrDefault("ssh.privateKey"), + PrivateKeyPassphrase = task.Credentials.GetValueOrDefault("ssh.passphrase") + }; + + _logger.LogInformation( + "Executing SSH command on {User}@{Host}", + payload.Username, + payload.Host); + + try + { + var client = await _connectionPool.GetConnectionAsync(connectionInfo, ct); + + // Build command with environment and working directory + var fullCommand = BuildCommand(payload); + + using var command = client.CreateCommand(fullCommand); + command.CommandTimeout = payload.Timeout; + + var asyncResult = command.BeginExecute(); + + // Wait for completion with cancellation support + while (!asyncResult.IsCompleted) + { + ct.ThrowIfCancellationRequested(); + await Task.Delay(100, ct); + } + + var result = command.EndExecute(asyncResult); + + var exitCode = command.ExitStatus; + var stdout = result; + var stderr = command.Error; + + _logger.LogInformation( + "SSH command completed with exit code {ExitCode}", + exitCode); + + return new TaskResult + { + TaskId = task.Id, + Success = exitCode == 0, + Error = exitCode != 0 ? stderr : null, + Outputs = new Dictionary<string, object> + { + ["exitCode"] = exitCode, + ["stdout"] = stdout, + ["stderr"] = stderr, + ["output"] = payload.CombineOutput ? 
$"{stdout}\n{stderr}" : stdout + }, + CompletedAt = DateTimeOffset.UtcNow + }; + } + catch (SshException ex) + { + _logger.LogError(ex, "SSH command failed on {Host}", payload.Host); + + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = $"SSH error: {ex.Message}", + CompletedAt = DateTimeOffset.UtcNow + }; + } + } + + private static string BuildCommand(ExecutePayload payload) + { + var parts = new List<string>(); + + // Set environment variables + if (payload.Environment is not null) + { + foreach (var (key, value) in payload.Environment) + { + parts.Add($"export {key}='{EscapeShellString(value)}'"); + } + } + + // Change to working directory + if (!string.IsNullOrEmpty(payload.WorkingDirectory)) + { + parts.Add($"cd '{EscapeShellString(payload.WorkingDirectory)}'"); + } + + parts.Add(payload.Command); + + return string.Join(" && ", parts); + } + + private static string EscapeShellString(string value) + { + return value.Replace("'", "'\"'\"'"); + } +} +``` + +### SshFileTransferTask + +```csharp +namespace StellaOps.Agent.Ssh.Tasks; + +public sealed class SshFileTransferTask +{ + private readonly SshConnectionPool _connectionPool; + private readonly ILogger<SshFileTransferTask> _logger; + + public sealed record UploadPayload + { + public required string Host { get; init; } + public int Port { get; init; } = 22; + public required string Username { get; init; } + public required string LocalPath { get; init; } + public required string RemotePath { get; init; } + public bool CreateDirectory { get; init; } = true; + public int Permissions { get; init; } = 0b110_100_100; // rw-r--r-- (octal 644); note C# has no octal literals, so 0644 would parse as decimal 644 + } + + public sealed record DownloadPayload + { + public required string Host { get; init; } + public int Port { get; init; } = 22; + public required string Username { get; init; } + public required string RemotePath { get; init; } + public required string LocalPath { get; init; } + public bool CreateDirectory { get; init; } = true; + } + + public async Task<TaskResult> UploadAsync(AgentTask task, CancellationToken ct) + { + 
var payload = JsonSerializer.Deserialize(task.Payload) + ?? throw new InvalidPayloadException("ssh.upload"); + + var connectionInfo = BuildConnectionInfo(payload.Host, payload.Port, payload.Username, task.Credentials); + + _logger.LogInformation( + "Uploading {Local} to {User}@{Host}:{Remote}", + payload.LocalPath, + payload.Username, + payload.Host, + payload.RemotePath); + + try + { + var client = await _connectionPool.GetConnectionAsync(connectionInfo, ct); + + using var sftp = new SftpClient(client.ConnectionInfo); + await Task.Run(() => sftp.Connect(), ct); + + // Create parent directory if needed + if (payload.CreateDirectory) + { + var parentDir = Path.GetDirectoryName(payload.RemotePath)?.Replace('\\', '/'); + if (!string.IsNullOrEmpty(parentDir)) + { + await CreateRemoteDirectoryAsync(sftp, parentDir, ct); + } + } + + // Upload file + await using var localFile = File.OpenRead(payload.LocalPath); + await Task.Run(() => sftp.UploadFile(localFile, payload.RemotePath), ct); + + // Set permissions + sftp.ChangePermissions(payload.RemotePath, (short)payload.Permissions); + + var fileInfo = sftp.GetAttributes(payload.RemotePath); + + sftp.Disconnect(); + + _logger.LogInformation( + "Uploaded {Size} bytes to {Remote}", + fileInfo.Size, + payload.RemotePath); + + return new TaskResult + { + TaskId = task.Id, + Success = true, + Outputs = new Dictionary + { + ["remotePath"] = payload.RemotePath, + ["size"] = fileInfo.Size, + ["permissions"] = payload.Permissions + }, + CompletedAt = DateTimeOffset.UtcNow + }; + } + catch (SftpPathNotFoundException ex) + { + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = $"Remote path not found: {ex.Message}", + CompletedAt = DateTimeOffset.UtcNow + }; + } + catch (Exception ex) + { + _logger.LogError(ex, "Failed to upload file to {Host}", payload.Host); + + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = ex.Message, + CompletedAt = DateTimeOffset.UtcNow + }; + } + } + + public 
async Task DownloadAsync(AgentTask task, CancellationToken ct) + { + var payload = JsonSerializer.Deserialize(task.Payload) + ?? throw new InvalidPayloadException("ssh.download"); + + var connectionInfo = BuildConnectionInfo(payload.Host, payload.Port, payload.Username, task.Credentials); + + _logger.LogInformation( + "Downloading {User}@{Host}:{Remote} to {Local}", + payload.Username, + payload.Host, + payload.RemotePath, + payload.LocalPath); + + try + { + var client = await _connectionPool.GetConnectionAsync(connectionInfo, ct); + + using var sftp = new SftpClient(client.ConnectionInfo); + await Task.Run(() => sftp.Connect(), ct); + + // Create local directory if needed + if (payload.CreateDirectory) + { + var localDir = Path.GetDirectoryName(payload.LocalPath); + if (!string.IsNullOrEmpty(localDir)) + { + Directory.CreateDirectory(localDir); + } + } + + // Download file + var remoteAttributes = sftp.GetAttributes(payload.RemotePath); + await using var localFile = File.Create(payload.LocalPath); + await Task.Run(() => sftp.DownloadFile(payload.RemotePath, localFile), ct); + + sftp.Disconnect(); + + _logger.LogInformation( + "Downloaded {Size} bytes to {Local}", + remoteAttributes.Size, + payload.LocalPath); + + return new TaskResult + { + TaskId = task.Id, + Success = true, + Outputs = new Dictionary + { + ["localPath"] = payload.LocalPath, + ["size"] = remoteAttributes.Size + }, + CompletedAt = DateTimeOffset.UtcNow + }; + } + catch (SftpPathNotFoundException) + { + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = $"Remote file not found: {payload.RemotePath}", + CompletedAt = DateTimeOffset.UtcNow + }; + } + catch (Exception ex) + { + _logger.LogError(ex, "Failed to download file from {Host}", payload.Host); + + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = ex.Message, + CompletedAt = DateTimeOffset.UtcNow + }; + } + } + + private static async Task CreateRemoteDirectoryAsync(SftpClient sftp, string path, 
CancellationToken ct) + { + var parts = path.Split('/').Where(p => !string.IsNullOrEmpty(p)).ToList(); + var current = ""; + + foreach (var part in parts) + { + current = $"{current}/{part}"; + + try + { + var attrs = sftp.GetAttributes(current); + if (!attrs.IsDirectory) + { + throw new InvalidOperationException($"Path exists but is not a directory: {current}"); + } + } + catch (SftpPathNotFoundException) + { + await Task.Run(() => sftp.CreateDirectory(current), ct); + } + } + } + + private static SshConnectionInfo BuildConnectionInfo( + string host, + int port, + string username, + IReadOnlyDictionary credentials) + { + return new SshConnectionInfo + { + Host = host, + Port = port, + Username = username, + Password = credentials.GetValueOrDefault("ssh.password"), + PrivateKey = credentials.GetValueOrDefault("ssh.privateKey"), + PrivateKeyPassphrase = credentials.GetValueOrDefault("ssh.passphrase") + }; + } +} +``` + +### SshTunnelTask + +```csharp +namespace StellaOps.Agent.Ssh.Tasks; + +public sealed class SshTunnelTask +{ + private readonly SshConnectionPool _connectionPool; + private readonly ILogger _logger; + + public sealed record TunnelPayload + { + public required string Host { get; init; } + public int Port { get; init; } = 22; + public required string Username { get; init; } + public required int LocalPort { get; init; } + public required string RemoteHost { get; init; } + public required int RemotePort { get; init; } + public TimeSpan Duration { get; init; } = TimeSpan.FromMinutes(30); + } + + public async Task ExecuteAsync(AgentTask task, CancellationToken ct) + { + var payload = JsonSerializer.Deserialize(task.Payload) + ?? 
throw new InvalidPayloadException("ssh.tunnel"); + + var connectionInfo = new SshConnectionInfo + { + Host = payload.Host, + Port = payload.Port, + Username = payload.Username, + Password = task.Credentials.GetValueOrDefault("ssh.password"), + PrivateKey = task.Credentials.GetValueOrDefault("ssh.privateKey"), + PrivateKeyPassphrase = task.Credentials.GetValueOrDefault("ssh.passphrase") + }; + + _logger.LogInformation( + "Creating SSH tunnel: localhost:{Local} -> {User}@{Host} -> {RemoteHost}:{RemotePort}", + payload.LocalPort, + payload.Username, + payload.Host, + payload.RemoteHost, + payload.RemotePort); + + try + { + var client = await _connectionPool.GetConnectionAsync(connectionInfo, ct); + + var tunnel = new ForwardedPortLocal( + "127.0.0.1", + (uint)payload.LocalPort, + payload.RemoteHost, + (uint)payload.RemotePort); + + client.AddForwardedPort(tunnel); + tunnel.Start(); + + _logger.LogInformation( + "SSH tunnel established: localhost:{Local} -> {RemoteHost}:{RemotePort}", + payload.LocalPort, + payload.RemoteHost, + payload.RemotePort); + + // Keep tunnel open for specified duration + using var durationCts = new CancellationTokenSource(payload.Duration); + using var linkedCts = CancellationTokenSource.CreateLinkedTokenSource(ct, durationCts.Token); + + try + { + await Task.Delay(payload.Duration, linkedCts.Token); + } + catch (OperationCanceledException) when (durationCts.IsCancellationRequested) + { + // Duration expired, normal completion + } + + tunnel.Stop(); + client.RemoveForwardedPort(tunnel); + + return new TaskResult + { + TaskId = task.Id, + Success = true, + Outputs = new Dictionary + { + ["localPort"] = payload.LocalPort, + ["remoteHost"] = payload.RemoteHost, + ["remotePort"] = payload.RemotePort, + ["duration"] = payload.Duration.ToString() + }, + CompletedAt = DateTimeOffset.UtcNow + }; + } + catch (Exception ex) + { + _logger.LogError(ex, "Failed to create SSH tunnel to {Host}", payload.Host); + + return new TaskResult + { + TaskId = 
task.Id, + Success = false, + Error = ex.Message, + CompletedAt = DateTimeOffset.UtcNow + }; + } + } +} +``` + +--- + +## Acceptance Criteria + +- [ ] Execute remote commands via SSH +- [ ] Support password authentication +- [ ] Support private key authentication +- [ ] Support passphrase-protected keys +- [ ] Upload files via SFTP +- [ ] Download files via SFTP +- [ ] Create remote directories automatically +- [ ] Set file permissions on upload +- [ ] Create SSH tunnels for port forwarding +- [ ] Connection pooling for efficiency +- [ ] Timeout handling for commands +- [ ] Unit test coverage >=85% + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 108_001 Agent Core Runtime | Internal | TODO | +| SSH.NET | NuGet | Available | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| SshCapability | TODO | | +| SshConnectionPool | TODO | | +| SshExecuteTask | TODO | | +| SshFileTransferTask | TODO | | +| SshTunnelTask | TODO | | +| SshDockerProxyTask | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_108_005_AGENTS_winrm.md b/docs/implplan/SPRINT_20260110_108_005_AGENTS_winrm.md new file mode 100644 index 000000000..20e6f542b --- /dev/null +++ b/docs/implplan/SPRINT_20260110_108_005_AGENTS_winrm.md @@ -0,0 +1,915 @@ +# SPRINT: Agent - WinRM + +> **Sprint ID:** 108_005 +> **Module:** AGENTS +> **Phase:** 8 - Agents +> **Status:** TODO +> **Parent:** [108_000_INDEX](SPRINT_20260110_108_000_INDEX_agents.md) + +--- + +## Overview + +Implement the WinRM Agent capability for remote Windows management via WinRM/PowerShell. 
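As a concrete illustration of what such a task carries, a `winrm.powershell` payload might look like the sketch below. Field names follow the `PowerShellPayload` record defined later in this sprint; the host, user, and script values are placeholders, and credentials are deliberately absent because they travel separately in `task.Credentials` (e.g. `winrm.password`):

```json
{
  "host": "win-host-01.example.internal",
  "port": 5985,
  "useSSL": false,
  "username": "deploy-svc",
  "domain": "CORP",
  "script": "Restart-Service -Name 'MyAppService'",
  "timeout": "00:05:00",
  "noProfile": true
}
```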
+ +### Objectives + +- Remote PowerShell execution via WinRM +- Windows service management +- Windows container operations +- File transfer to Windows hosts +- NTLM and Kerberos authentication + +### Working Directory + +``` +src/ReleaseOrchestrator/ +├── __Agents/ +│ └── StellaOps.Agent.WinRM/ +│ ├── WinRmCapability.cs +│ ├── Tasks/ +│ │ ├── PowerShellTask.cs +│ │ ├── WindowsServiceTask.cs +│ │ ├── WindowsContainerTask.cs +│ │ └── WinRmFileTransferTask.cs +│ ├── WinRmConnectionPool.cs +│ └── PowerShellRunner.cs +└── __Tests/ +``` + +--- + +## Deliverables + +### WinRmCapability + +```csharp +namespace StellaOps.Agent.WinRM; + +public sealed class WinRmCapability : IAgentCapability +{ + private readonly WinRmConnectionPool _connectionPool; + private readonly ILogger _logger; + private readonly Dictionary>> _taskHandlers; + + public string Name => "winrm"; + public string Version => "1.0.0"; + + public IReadOnlyList SupportedTaskTypes => new[] + { + "winrm.powershell", + "winrm.service.start", + "winrm.service.stop", + "winrm.service.restart", + "winrm.service.status", + "winrm.container.deploy", + "winrm.file.upload", + "winrm.file.download" + }; + + public WinRmCapability(WinRmConnectionPool connectionPool, ILogger logger) + { + _connectionPool = connectionPool; + _logger = logger; + + _taskHandlers = new Dictionary>> + { + ["winrm.powershell"] = ExecutePowerShellAsync, + ["winrm.service.start"] = StartServiceAsync, + ["winrm.service.stop"] = StopServiceAsync, + ["winrm.service.restart"] = RestartServiceAsync, + ["winrm.service.status"] = GetServiceStatusAsync, + ["winrm.container.deploy"] = DeployContainerAsync, + ["winrm.file.upload"] = UploadFileAsync, + ["winrm.file.download"] = DownloadFileAsync + }; + } + + public Task InitializeAsync(CancellationToken ct = default) + { + _logger.LogInformation("WinRM capability initialized"); + return Task.FromResult(true); + } + + public async Task ExecuteAsync(AgentTask task, CancellationToken ct = default) + { + if 
(!_taskHandlers.TryGetValue(task.TaskType, out var handler)) + { + throw new UnsupportedTaskTypeException(task.TaskType); + } + + return await handler(task, ct); + } + + public Task CheckHealthAsync(CancellationToken ct = default) + { + return Task.FromResult(new CapabilityHealthStatus(true, "WinRM capability available")); + } + + private Task ExecutePowerShellAsync(AgentTask task, CancellationToken ct) => + new PowerShellTask(_connectionPool, _logger).ExecuteAsync(task, ct); + + private Task StartServiceAsync(AgentTask task, CancellationToken ct) => + new WindowsServiceTask(_connectionPool, _logger).StartAsync(task, ct); + + private Task StopServiceAsync(AgentTask task, CancellationToken ct) => + new WindowsServiceTask(_connectionPool, _logger).StopAsync(task, ct); + + private Task RestartServiceAsync(AgentTask task, CancellationToken ct) => + new WindowsServiceTask(_connectionPool, _logger).RestartAsync(task, ct); + + private Task GetServiceStatusAsync(AgentTask task, CancellationToken ct) => + new WindowsServiceTask(_connectionPool, _logger).GetStatusAsync(task, ct); + + private Task DeployContainerAsync(AgentTask task, CancellationToken ct) => + new WindowsContainerTask(_connectionPool, _logger).DeployAsync(task, ct); + + private Task UploadFileAsync(AgentTask task, CancellationToken ct) => + new WinRmFileTransferTask(_connectionPool, _logger).UploadAsync(task, ct); + + private Task DownloadFileAsync(AgentTask task, CancellationToken ct) => + new WinRmFileTransferTask(_connectionPool, _logger).DownloadAsync(task, ct); +} +``` + +### WinRmConnectionPool + +```csharp +namespace StellaOps.Agent.WinRM; + +public sealed class WinRmConnectionPool : IAsyncDisposable +{ + private readonly ConcurrentDictionary _sessions = new(); + private readonly TimeSpan _sessionTimeout = TimeSpan.FromMinutes(10); + private readonly ILogger _logger; + private readonly Timer _cleanupTimer; + + public WinRmConnectionPool(ILogger logger) + { + _logger = logger; + _cleanupTimer = new 
Timer(CleanupExpiredSessions, null, TimeSpan.FromMinutes(1), TimeSpan.FromMinutes(1)); + } + + public async Task GetSessionAsync( + WinRmConnectionInfo connectionInfo, + CancellationToken ct = default) + { + var key = connectionInfo.GetConnectionKey(); + + if (_sessions.TryGetValue(key, out var pooled) && pooled.IsValid) + { + pooled.LastUsed = DateTimeOffset.UtcNow; + return pooled.Session; + } + + var session = await CreateSessionAsync(connectionInfo, ct); + _sessions[key] = new PooledSession(session, DateTimeOffset.UtcNow); + + return session; + } + + private async Task CreateSessionAsync( + WinRmConnectionInfo info, + CancellationToken ct) + { + var sessionOptions = new WSManSessionOptions + { + DestinationHost = info.Host, + DestinationPort = info.Port, + UseSSL = info.UseSSL, + AuthenticationMechanism = info.AuthMechanism, + Credential = new NetworkCredential(info.Username, info.Password, info.Domain) + }; + + var session = await Task.Run(() => new WSManSession(sessionOptions), ct); + + _logger.LogDebug( + "WinRM session established to {Host}:{Port}", + info.Host, + info.Port); + + return session; + } + + private void CleanupExpiredSessions(object? 
state) + { + var expired = _sessions + .Where(kv => DateTimeOffset.UtcNow - kv.Value.LastUsed > _sessionTimeout) + .ToList(); + + foreach (var (key, pooled) in expired) + { + if (_sessions.TryRemove(key, out _)) + { + try + { + pooled.Session.Dispose(); + _logger.LogDebug("Closed expired WinRM session: {Key}", key); + } + catch (Exception ex) + { + _logger.LogWarning(ex, "Error closing WinRM session: {Key}", key); + } + } + } + } + + public async ValueTask DisposeAsync() + { + _cleanupTimer.Dispose(); + + foreach (var (_, pooled) in _sessions) + { + try + { + pooled.Session.Dispose(); + } + catch { } + } + + _sessions.Clear(); + } + + private sealed class PooledSession + { + public WSManSession Session { get; } + public DateTimeOffset LastUsed { get; set; } + public bool IsValid => !Session.IsDisposed; + + public PooledSession(WSManSession session, DateTimeOffset lastUsed) + { + Session = session; + LastUsed = lastUsed; + } + } +} + +public sealed record WinRmConnectionInfo +{ + public required string Host { get; init; } + public int Port { get; init; } = 5985; + public bool UseSSL { get; init; } + public required string Username { get; init; } + public required string Password { get; init; } + public string? Domain { get; init; } + public WinRmAuthMechanism AuthMechanism { get; init; } = WinRmAuthMechanism.Negotiate; + + public string GetConnectionKey() => $"{Domain ?? ""}\\{Username}@{Host}:{Port}"; +} + +public enum WinRmAuthMechanism +{ + Basic, + Negotiate, + Kerberos, + CredSSP +} +``` + +### PowerShellTask + +```csharp +namespace StellaOps.Agent.WinRM.Tasks; + +public sealed class PowerShellTask +{ + private readonly WinRmConnectionPool _connectionPool; + private readonly ILogger _logger; + + public sealed record PowerShellPayload + { + public required string Host { get; init; } + public int Port { get; init; } = 5985; + public bool UseSSL { get; init; } + public required string Username { get; init; } + public string? 
Domain { get; init; } + public required string Script { get; init; } + public IReadOnlyDictionary? Parameters { get; init; } + public TimeSpan Timeout { get; init; } = TimeSpan.FromMinutes(10); + public bool NoProfile { get; init; } = true; + } + + public async Task ExecuteAsync(AgentTask task, CancellationToken ct) + { + var payload = JsonSerializer.Deserialize(task.Payload) + ?? throw new InvalidPayloadException("winrm.powershell"); + + var connectionInfo = new WinRmConnectionInfo + { + Host = payload.Host, + Port = payload.Port, + UseSSL = payload.UseSSL, + Username = payload.Username, + Password = task.Credentials.GetValueOrDefault("winrm.password") ?? "", + Domain = payload.Domain + }; + + _logger.LogInformation( + "Executing PowerShell script on {Host}", + payload.Host); + + try + { + var session = await _connectionPool.GetSessionAsync(connectionInfo, ct); + + // Build the script with parameters + var script = BuildScript(payload); + + using var timeoutCts = new CancellationTokenSource(payload.Timeout); + using var linkedCts = CancellationTokenSource.CreateLinkedTokenSource(ct, timeoutCts.Token); + + var result = await Task.Run(() => + { + using var shell = session.CreatePowerShellShell(); + + if (payload.NoProfile) + { + shell.AddScript("$PSDefaultParameterValues['*:NoProfile'] = $true"); + } + + shell.AddScript(script); + + // Add parameters + if (payload.Parameters is not null) + { + foreach (var (key, value) in payload.Parameters) + { + shell.AddParameter(key, value); + } + } + + var output = shell.Invoke(); + + return new PowerShellResult + { + Output = output.Select(o => o.ToString()).ToList(), + HadErrors = shell.HadErrors, + Errors = shell.Streams.Error.Select(e => e.ToString()).ToList() + }; + }, linkedCts.Token); + + _logger.LogInformation( + "PowerShell script completed (errors: {HadErrors})", + result.HadErrors); + + return new TaskResult + { + TaskId = task.Id, + Success = !result.HadErrors, + Error = result.HadErrors ? 
string.Join("\n", result.Errors) : null, + Outputs = new Dictionary + { + ["output"] = result.Output, + ["errors"] = result.Errors, + ["hadErrors"] = result.HadErrors + }, + CompletedAt = DateTimeOffset.UtcNow + }; + } + catch (OperationCanceledException) + { + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = $"PowerShell execution timed out after {payload.Timeout}", + CompletedAt = DateTimeOffset.UtcNow + }; + } + catch (Exception ex) + { + _logger.LogError(ex, "PowerShell execution failed on {Host}", payload.Host); + + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = ex.Message, + CompletedAt = DateTimeOffset.UtcNow + }; + } + } + + private static string BuildScript(PowerShellPayload payload) + { + // Wrap script in error handling + return $@" +$ErrorActionPreference = 'Stop' +try {{ + {payload.Script} +}} catch {{ + Write-Error $_.Exception.Message + throw +}}"; + } + + private sealed record PowerShellResult + { + public required IReadOnlyList Output { get; init; } + public required bool HadErrors { get; init; } + public required IReadOnlyList Errors { get; init; } + } +} +``` + +### WindowsServiceTask + +```csharp +namespace StellaOps.Agent.WinRM.Tasks; + +public sealed class WindowsServiceTask +{ + private readonly WinRmConnectionPool _connectionPool; + private readonly ILogger _logger; + + public sealed record ServicePayload + { + public required string Host { get; init; } + public int Port { get; init; } = 5985; + public bool UseSSL { get; init; } + public required string Username { get; init; } + public string? 
Domain { get; init; } + public required string ServiceName { get; init; } + public TimeSpan Timeout { get; init; } = TimeSpan.FromMinutes(5); + } + + public async Task StartAsync(AgentTask task, CancellationToken ct) + { + return await ExecuteServiceCommandAsync(task, "Start-Service", ct); + } + + public async Task StopAsync(AgentTask task, CancellationToken ct) + { + return await ExecuteServiceCommandAsync(task, "Stop-Service", ct); + } + + public async Task RestartAsync(AgentTask task, CancellationToken ct) + { + return await ExecuteServiceCommandAsync(task, "Restart-Service", ct); + } + + public async Task GetStatusAsync(AgentTask task, CancellationToken ct) + { + var payload = JsonSerializer.Deserialize(task.Payload) + ?? throw new InvalidPayloadException("winrm.service.status"); + + var connectionInfo = BuildConnectionInfo(payload, task.Credentials); + + _logger.LogInformation( + "Getting service status for {Service} on {Host}", + payload.ServiceName, + payload.Host); + + try + { + var session = await _connectionPool.GetSessionAsync(connectionInfo, ct); + + var script = $@" +$service = Get-Service -Name '{EscapeString(payload.ServiceName)}' -ErrorAction Stop +@{{ + Name = $service.Name + DisplayName = $service.DisplayName + Status = $service.Status.ToString() + StartType = $service.StartType.ToString() + CanStop = $service.CanStop + CanPauseAndContinue = $service.CanPauseAndContinue +}} | ConvertTo-Json"; + + var result = await ExecutePowerShellAsync(session, script, payload.Timeout, ct); + + if (result.HadErrors) + { + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = string.Join("\n", result.Errors), + CompletedAt = DateTimeOffset.UtcNow + }; + } + + var serviceInfo = JsonSerializer.Deserialize(string.Join("", result.Output)); + + return new TaskResult + { + TaskId = task.Id, + Success = true, + Outputs = new Dictionary + { + ["serviceName"] = serviceInfo?.Name ?? payload.ServiceName, + ["displayName"] = serviceInfo?.DisplayName ?? 
"", + ["status"] = serviceInfo?.Status ?? "Unknown", + ["startType"] = serviceInfo?.StartType ?? "Unknown" + }, + CompletedAt = DateTimeOffset.UtcNow + }; + } + catch (Exception ex) + { + _logger.LogError(ex, "Failed to get service status on {Host}", payload.Host); + + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = ex.Message, + CompletedAt = DateTimeOffset.UtcNow + }; + } + } + + private async Task ExecuteServiceCommandAsync( + AgentTask task, + string command, + CancellationToken ct) + { + var payload = JsonSerializer.Deserialize(task.Payload) + ?? throw new InvalidPayloadException("winrm.service"); + + var connectionInfo = BuildConnectionInfo(payload, task.Credentials); + + _logger.LogInformation( + "Executing {Command} for {Service} on {Host}", + command, + payload.ServiceName, + payload.Host); + + try + { + var session = await _connectionPool.GetSessionAsync(connectionInfo, ct); + + var script = $@" +{command} -Name '{EscapeString(payload.ServiceName)}' -ErrorAction Stop +$service = Get-Service -Name '{EscapeString(payload.ServiceName)}' +$service.Status.ToString()"; + + var result = await ExecutePowerShellAsync(session, script, payload.Timeout, ct); + + if (result.HadErrors) + { + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = string.Join("\n", result.Errors), + CompletedAt = DateTimeOffset.UtcNow + }; + } + + var status = string.Join("", result.Output).Trim(); + + _logger.LogInformation( + "Service {Service} is now {Status}", + payload.ServiceName, + status); + + return new TaskResult + { + TaskId = task.Id, + Success = true, + Outputs = new Dictionary + { + ["serviceName"] = payload.ServiceName, + ["status"] = status + }, + CompletedAt = DateTimeOffset.UtcNow + }; + } + catch (Exception ex) + { + _logger.LogError(ex, "Service command failed on {Host}", payload.Host); + + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = ex.Message, + CompletedAt = DateTimeOffset.UtcNow + }; + } 
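+
+        // Illustrative payload for a "winrm.service" task as bound to the
+        // ServicePayload record above (host and service names are hypothetical,
+        // and property casing assumes default case-sensitive System.Text.Json binding):
+        //   { "Host": "win-app-01", "Username": "deploy",
+        //     "ServiceName": "MyAppService", "Timeout": "00:05:00" }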
+    }
+
+    private static async Task<PowerShellResult> ExecutePowerShellAsync(
+        WSManSession session,
+        string script,
+        TimeSpan timeout,
+        CancellationToken ct)
+    {
+        using var timeoutCts = new CancellationTokenSource(timeout);
+        using var linkedCts = CancellationTokenSource.CreateLinkedTokenSource(ct, timeoutCts.Token);
+
+        return await Task.Run(() =>
+        {
+            using var shell = session.CreatePowerShellShell();
+            shell.AddScript(script);
+
+            var output = shell.Invoke();
+
+            return new PowerShellResult
+            {
+                Output = output.Select(o => o.ToString()).ToList(),
+                HadErrors = shell.HadErrors,
+                Errors = shell.Streams.Error.Select(e => e.ToString()).ToList()
+            };
+        }, linkedCts.Token);
+    }
+
+    private static WinRmConnectionInfo BuildConnectionInfo(
+        ServicePayload payload,
+        IReadOnlyDictionary<string, string> credentials)
+    {
+        return new WinRmConnectionInfo
+        {
+            Host = payload.Host,
+            Port = payload.Port,
+            UseSSL = payload.UseSSL,
+            Username = payload.Username,
+            Password = credentials.GetValueOrDefault("winrm.password") ?? "",
+            Domain = payload.Domain
+        };
+    }
+
+    private static string EscapeString(string value)
+    {
+        return value.Replace("'", "''");
+    }
+
+    private sealed record ServiceInfo
+    {
+        public string? Name { get; init; }
+        public string? DisplayName { get; init; }
+        public string? Status { get; init; }
+        public string?
StartType { get; init; } + } + + private sealed record PowerShellResult + { + public required IReadOnlyList Output { get; init; } + public required bool HadErrors { get; init; } + public required IReadOnlyList Errors { get; init; } + } +} +``` + +### WindowsContainerTask + +```csharp +namespace StellaOps.Agent.WinRM.Tasks; + +public sealed class WindowsContainerTask +{ + private readonly WinRmConnectionPool _connectionPool; + private readonly ILogger _logger; + + public sealed record ContainerPayload + { + public required string Host { get; init; } + public int Port { get; init; } = 5985; + public bool UseSSL { get; init; } + public required string Username { get; init; } + public string? Domain { get; init; } + public required string Image { get; init; } + public required string Name { get; init; } + public IReadOnlyDictionary? Environment { get; init; } + public IReadOnlyList? Ports { get; init; } + public IReadOnlyList? Volumes { get; init; } + public string? Network { get; init; } + public bool RemoveExisting { get; init; } = true; + public TimeSpan Timeout { get; init; } = TimeSpan.FromMinutes(10); + } + + public async Task DeployAsync(AgentTask task, CancellationToken ct) + { + var payload = JsonSerializer.Deserialize(task.Payload) + ?? throw new InvalidPayloadException("winrm.container.deploy"); + + var connectionInfo = new WinRmConnectionInfo + { + Host = payload.Host, + Port = payload.Port, + UseSSL = payload.UseSSL, + Username = payload.Username, + Password = task.Credentials.GetValueOrDefault("winrm.password") ?? 
"", + Domain = payload.Domain + }; + + _logger.LogInformation( + "Deploying Windows container {Name} on {Host}", + payload.Name, + payload.Host); + + try + { + var session = await _connectionPool.GetSessionAsync(connectionInfo, ct); + + // Build deployment script + var script = BuildDeploymentScript(payload, task.Credentials); + + using var timeoutCts = new CancellationTokenSource(payload.Timeout); + using var linkedCts = CancellationTokenSource.CreateLinkedTokenSource(ct, timeoutCts.Token); + + var result = await Task.Run(() => + { + using var shell = session.CreatePowerShellShell(); + shell.AddScript(script); + + var output = shell.Invoke(); + + return new PowerShellResult + { + Output = output.Select(o => o.ToString()).ToList(), + HadErrors = shell.HadErrors, + Errors = shell.Streams.Error.Select(e => e.ToString()).ToList() + }; + }, linkedCts.Token); + + if (result.HadErrors) + { + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = string.Join("\n", result.Errors), + CompletedAt = DateTimeOffset.UtcNow + }; + } + + // Parse container ID from output + var containerId = ParseContainerId(result.Output); + + _logger.LogInformation( + "Windows container {Name} deployed (ID: {Id})", + payload.Name, + containerId?[..12] ?? "unknown"); + + return new TaskResult + { + TaskId = task.Id, + Success = true, + Outputs = new Dictionary + { + ["containerId"] = containerId ?? 
"", + ["containerName"] = payload.Name, + ["image"] = payload.Image + }, + CompletedAt = DateTimeOffset.UtcNow + }; + } + catch (Exception ex) + { + _logger.LogError(ex, "Container deployment failed on {Host}", payload.Host); + + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = ex.Message, + CompletedAt = DateTimeOffset.UtcNow + }; + } + } + + private static string BuildDeploymentScript( + ContainerPayload payload, + IReadOnlyDictionary credentials) + { + var sb = new StringBuilder(); + + // Registry login if credentials provided + if (credentials.TryGetValue("registry.username", out var regUser) && + credentials.TryGetValue("registry.password", out var regPass)) + { + var registry = payload.Image.Contains('/') ? payload.Image.Split('/')[0] : ""; + if (!string.IsNullOrEmpty(registry) && registry.Contains('.')) + { + sb.AppendLine($@" +$securePassword = ConvertTo-SecureString '{EscapeString(regPass)}' -AsPlainText -Force +$credential = New-Object System.Management.Automation.PSCredential('{EscapeString(regUser)}', $securePassword) +docker login {registry} --username $credential.UserName --password $credential.GetNetworkCredential().Password"); + } + } + + // Remove existing container if requested + if (payload.RemoveExisting) + { + sb.AppendLine($@" +$existing = docker ps -a --filter 'name=^{payload.Name}$' --format '{{{{.ID}}}}' +if ($existing) {{ + docker stop $existing 2>&1 | Out-Null + docker rm $existing 2>&1 | Out-Null +}}"); + } + + // Pull image + sb.AppendLine($"docker pull '{EscapeString(payload.Image)}'"); + + // Build run command + var runArgs = new List { "docker run -d", $"--name '{EscapeString(payload.Name)}'" }; + + // Environment variables + if (payload.Environment is not null) + { + foreach (var (key, value) in payload.Environment) + { + runArgs.Add($"-e '{EscapeString(key)}={EscapeString(value)}'"); + } + } + + // Port mappings + if (payload.Ports is not null) + { + foreach (var port in payload.Ports) + { + 
runArgs.Add($"-p {port}"); + } + } + + // Volume mounts + if (payload.Volumes is not null) + { + foreach (var volume in payload.Volumes) + { + runArgs.Add($"-v '{EscapeString(volume)}'"); + } + } + + // Network + if (!string.IsNullOrEmpty(payload.Network)) + { + runArgs.Add($"--network '{EscapeString(payload.Network)}'"); + } + + runArgs.Add($"'{EscapeString(payload.Image)}'"); + + sb.AppendLine(string.Join(" `\n ", runArgs)); + + return sb.ToString(); + } + + private static string? ParseContainerId(IReadOnlyList output) + { + return output.LastOrDefault(l => l.Length >= 12 && l.All(c => char.IsLetterOrDigit(c))); + } + + private static string EscapeString(string value) + { + return value.Replace("'", "''"); + } + + private sealed record PowerShellResult + { + public required IReadOnlyList Output { get; init; } + public required bool HadErrors { get; init; } + public required IReadOnlyList Errors { get; init; } + } +} +``` + +--- + +## Acceptance Criteria + +- [ ] Execute PowerShell scripts remotely +- [ ] Support NTLM authentication +- [ ] Support Kerberos authentication +- [ ] Start Windows services +- [ ] Stop Windows services +- [ ] Restart Windows services +- [ ] Get Windows service status +- [ ] Deploy Windows containers via remote Docker +- [ ] Upload files to Windows hosts +- [ ] Download files from Windows hosts +- [ ] Connection pooling for efficiency +- [ ] Timeout handling +- [ ] Unit test coverage >=85% + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 108_001 Agent Core Runtime | Internal | TODO | +| System.Management.Automation | NuGet | Available | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| WinRmCapability | TODO | | +| WinRmConnectionPool | TODO | | +| PowerShellTask | TODO | | +| WindowsServiceTask | TODO | | +| WindowsContainerTask | TODO | | +| WinRmFileTransferTask | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | 
Entry |
+|------|-------|
+| 10-Jan-2026 | Sprint created |
diff --git a/docs/implplan/SPRINT_20260110_108_006_AGENTS_ecs.md b/docs/implplan/SPRINT_20260110_108_006_AGENTS_ecs.md
new file mode 100644
index 000000000..cbc11be6a
--- /dev/null
+++ b/docs/implplan/SPRINT_20260110_108_006_AGENTS_ecs.md
@@ -0,0 +1,961 @@
+# SPRINT: Agent - ECS
+
+> **Sprint ID:** 108_006
+> **Module:** AGENTS
+> **Phase:** 8 - Agents
+> **Status:** TODO
+> **Parent:** [108_000_INDEX](SPRINT_20260110_108_000_INDEX_agents.md)
+
+---
+
+## Overview
+
+Implement the ECS Agent capability for managing AWS Elastic Container Service deployments on ECS clusters (Fargate or EC2 launch types).
+
+### Objectives
+
+- ECS service deployments (create, update, delete)
+- ECS task execution (run tasks, stop tasks)
+- Task definition registration
+- Service scaling operations
+- Deployment health monitoring
+- Log streaming via CloudWatch Logs
+- Support for Fargate and EC2 launch types
+
+### Working Directory
+
+```
+src/ReleaseOrchestrator/
+├── __Agents/
+│   └── StellaOps.Agent.Ecs/
+│       ├── EcsCapability.cs
+│       ├── Tasks/
+│       │   ├── EcsDeployServiceTask.cs
+│       │   ├── EcsRunTaskTask.cs
+│       │   ├── EcsStopTaskTask.cs
+│       │   ├── EcsScaleServiceTask.cs
+│       │   ├── EcsRegisterTaskDefinitionTask.cs
+│       │   └── EcsHealthCheckTask.cs
+│       ├── EcsClientFactory.cs
+│       └── CloudWatchLogStreamer.cs
+└── __Tests/
+    └── StellaOps.Agent.Ecs.Tests/
+```
+
+---
+
+## Deliverables
+
+### EcsCapability
+
+```csharp
+namespace StellaOps.Agent.Ecs;
+
+public sealed class EcsCapability : IAgentCapability
+{
+    private readonly IAmazonECS _ecsClient;
+    private readonly IAmazonCloudWatchLogs _logsClient;
+    private readonly ILogger<EcsCapability> _logger;
+    private readonly Dictionary<string, Func<AgentTask, CancellationToken, Task<TaskResult>>> _taskHandlers;
+
+    public string Name => "ecs";
+    public string Version => "1.0.0";
+
+    public IReadOnlyList<string> SupportedTaskTypes => new[]
+    {
+        "ecs.deploy-service",
+        "ecs.run-task",
+        "ecs.stop-task",
+        "ecs.scale-service",
+        "ecs.register-task-definition",
"ecs.health-check",
+        "ecs.describe-service"
+    };
+
+    public EcsCapability(
+        IAmazonECS ecsClient,
+        IAmazonCloudWatchLogs logsClient,
+        ILogger<EcsCapability> logger)
+    {
+        _ecsClient = ecsClient;
+        _logsClient = logsClient;
+        _logger = logger;
+
+        _taskHandlers = new Dictionary<string, Func<AgentTask, CancellationToken, Task<TaskResult>>>
+        {
+            ["ecs.deploy-service"] = ExecuteDeployServiceAsync,
+            ["ecs.run-task"] = ExecuteRunTaskAsync,
+            ["ecs.stop-task"] = ExecuteStopTaskAsync,
+            ["ecs.scale-service"] = ExecuteScaleServiceAsync,
+            ["ecs.register-task-definition"] = ExecuteRegisterTaskDefinitionAsync,
+            ["ecs.health-check"] = ExecuteHealthCheckAsync
+        };
+    }
+
+    public async Task<bool> InitializeAsync(CancellationToken ct = default)
+    {
+        try
+        {
+            // Verify AWS credentials and ECS access
+            var clusters = await _ecsClient.ListClustersAsync(new ListClustersRequest
+            {
+                MaxResults = 1
+            }, ct);
+
+            _logger.LogInformation(
+                "ECS capability initialized, discovered {ClusterCount} clusters",
+                clusters.ClusterArns.Count);
+
+            return true;
+        }
+        catch (Exception ex)
+        {
+            _logger.LogError(ex, "Failed to initialize ECS capability");
+            return false;
+        }
+    }
+
+    public async Task<TaskResult> ExecuteAsync(AgentTask task, CancellationToken ct = default)
+    {
+        if (!_taskHandlers.TryGetValue(task.TaskType, out var handler))
+        {
+            throw new UnsupportedTaskTypeException(task.TaskType);
+        }
+
+        return await handler(task, ct);
+    }
+
+    public async Task<CapabilityHealthStatus> CheckHealthAsync(CancellationToken ct = default)
+    {
+        try
+        {
+            await _ecsClient.ListClustersAsync(new ListClustersRequest { MaxResults = 1 }, ct);
+            return new CapabilityHealthStatus(true, "ECS API responding");
+        }
+        catch (Exception ex)
+        {
+            return new CapabilityHealthStatus(false, $"ECS API not responding: {ex.Message}");
+        }
+    }
+
+    private Task<TaskResult> ExecuteDeployServiceAsync(AgentTask task, CancellationToken ct) =>
+        new EcsDeployServiceTask(_ecsClient, _logger).ExecuteAsync(task, ct);
+
+    private Task<TaskResult> ExecuteRunTaskAsync(AgentTask task, CancellationToken ct) =>
+        new EcsRunTaskTask(_ecsClient,
_logger).ExecuteAsync(task, ct); + + private Task ExecuteStopTaskAsync(AgentTask task, CancellationToken ct) => + new EcsStopTaskTask(_ecsClient, _logger).ExecuteAsync(task, ct); + + private Task ExecuteScaleServiceAsync(AgentTask task, CancellationToken ct) => + new EcsScaleServiceTask(_ecsClient, _logger).ExecuteAsync(task, ct); + + private Task ExecuteRegisterTaskDefinitionAsync(AgentTask task, CancellationToken ct) => + new EcsRegisterTaskDefinitionTask(_ecsClient, _logger).ExecuteAsync(task, ct); + + private Task ExecuteHealthCheckAsync(AgentTask task, CancellationToken ct) => + new EcsHealthCheckTask(_ecsClient, _logger).ExecuteAsync(task, ct); +} +``` + +### EcsDeployServiceTask + +```csharp +namespace StellaOps.Agent.Ecs.Tasks; + +public sealed class EcsDeployServiceTask +{ + private readonly IAmazonECS _ecsClient; + private readonly ILogger _logger; + + public sealed record DeployServicePayload + { + public required string Cluster { get; init; } + public required string ServiceName { get; init; } + public required string TaskDefinition { get; init; } + public int DesiredCount { get; init; } = 1; + public string? LaunchType { get; init; } // FARGATE or EC2 + public NetworkConfiguration? NetworkConfig { get; init; } + public LoadBalancerConfiguration? LoadBalancer { get; init; } + public DeploymentConfiguration? DeploymentConfig { get; init; } + public IReadOnlyDictionary? Tags { get; init; } + public bool ForceNewDeployment { get; init; } = true; + public TimeSpan DeploymentTimeout { get; init; } = TimeSpan.FromMinutes(10); + } + + public sealed record NetworkConfiguration + { + public required IReadOnlyList Subnets { get; init; } + public IReadOnlyList? 
SecurityGroups { get; init; } + public bool AssignPublicIp { get; init; } = false; + } + + public sealed record LoadBalancerConfiguration + { + public required string TargetGroupArn { get; init; } + public required string ContainerName { get; init; } + public required int ContainerPort { get; init; } + } + + public sealed record DeploymentConfiguration + { + public int MaximumPercent { get; init; } = 200; + public int MinimumHealthyPercent { get; init; } = 100; + public DeploymentCircuitBreaker? CircuitBreaker { get; init; } + } + + public sealed record DeploymentCircuitBreaker + { + public bool Enable { get; init; } = true; + public bool Rollback { get; init; } = true; + } + + public async Task ExecuteAsync(AgentTask task, CancellationToken ct) + { + var payload = JsonSerializer.Deserialize(task.Payload) + ?? throw new InvalidPayloadException("ecs.deploy-service"); + + _logger.LogInformation( + "Deploying ECS service {Service} to cluster {Cluster} with task definition {TaskDef}", + payload.ServiceName, + payload.Cluster, + payload.TaskDefinition); + + try + { + // Check if service exists + var existingService = await GetServiceAsync(payload.Cluster, payload.ServiceName, ct); + + if (existingService is not null) + { + return await UpdateServiceAsync(task.Id, payload, ct); + } + else + { + return await CreateServiceAsync(task.Id, payload, ct); + } + } + catch (AmazonECSException ex) + { + _logger.LogError(ex, "Failed to deploy ECS service {Service}", payload.ServiceName); + + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = $"Failed to deploy service: {ex.Message}", + CompletedAt = DateTimeOffset.UtcNow + }; + } + } + + private async Task GetServiceAsync(string cluster, string serviceName, CancellationToken ct) + { + try + { + var response = await _ecsClient.DescribeServicesAsync(new DescribeServicesRequest + { + Cluster = cluster, + Services = new List { serviceName } + }, ct); + + return response.Services.FirstOrDefault(s => s.Status != 
"INACTIVE"); + } + catch + { + return null; + } + } + + private async Task CreateServiceAsync( + Guid taskId, + DeployServicePayload payload, + CancellationToken ct) + { + _logger.LogInformation("Creating new ECS service {Service}", payload.ServiceName); + + var request = new CreateServiceRequest + { + Cluster = payload.Cluster, + ServiceName = payload.ServiceName, + TaskDefinition = payload.TaskDefinition, + DesiredCount = payload.DesiredCount, + LaunchType = string.IsNullOrEmpty(payload.LaunchType) ? null : new LaunchType(payload.LaunchType), + DeploymentConfiguration = payload.DeploymentConfig is not null + ? new Amazon.ECS.Model.DeploymentConfiguration + { + MaximumPercent = payload.DeploymentConfig.MaximumPercent, + MinimumHealthyPercent = payload.DeploymentConfig.MinimumHealthyPercent, + DeploymentCircuitBreaker = payload.DeploymentConfig.CircuitBreaker is not null + ? new Amazon.ECS.Model.DeploymentCircuitBreaker + { + Enable = payload.DeploymentConfig.CircuitBreaker.Enable, + Rollback = payload.DeploymentConfig.CircuitBreaker.Rollback + } + : null + } + : null, + Tags = payload.Tags?.Select(kv => new Tag { Key = kv.Key, Value = kv.Value }).ToList() + }; + + if (payload.NetworkConfig is not null) + { + request.NetworkConfiguration = new Amazon.ECS.Model.NetworkConfiguration + { + AwsvpcConfiguration = new AwsVpcConfiguration + { + Subnets = payload.NetworkConfig.Subnets.ToList(), + SecurityGroups = payload.NetworkConfig.SecurityGroups?.ToList(), + AssignPublicIp = payload.NetworkConfig.AssignPublicIp ? 
AssignPublicIp.ENABLED : AssignPublicIp.DISABLED + } + }; + } + + if (payload.LoadBalancer is not null) + { + request.LoadBalancers = new List + { + new() + { + TargetGroupArn = payload.LoadBalancer.TargetGroupArn, + ContainerName = payload.LoadBalancer.ContainerName, + ContainerPort = payload.LoadBalancer.ContainerPort + } + }; + } + + var createResponse = await _ecsClient.CreateServiceAsync(request, ct); + var service = createResponse.Service; + + _logger.LogInformation( + "Created ECS service {Service} (ARN: {Arn})", + payload.ServiceName, + service.ServiceArn); + + // Wait for deployment to stabilize + var stable = await WaitForServiceStableAsync( + payload.Cluster, + payload.ServiceName, + payload.DeploymentTimeout, + ct); + + return new TaskResult + { + TaskId = taskId, + Success = stable, + Error = stable ? null : "Service did not stabilize within timeout", + Outputs = new Dictionary + { + ["serviceArn"] = service.ServiceArn, + ["serviceName"] = service.ServiceName, + ["taskDefinition"] = service.TaskDefinition, + ["runningCount"] = service.RunningCount, + ["desiredCount"] = service.DesiredCount, + ["deploymentStatus"] = stable ? 
"COMPLETED" : "TIMED_OUT" + }, + CompletedAt = DateTimeOffset.UtcNow + }; + } + + private async Task UpdateServiceAsync( + Guid taskId, + DeployServicePayload payload, + CancellationToken ct) + { + _logger.LogInformation( + "Updating existing ECS service {Service} to task definition {TaskDef}", + payload.ServiceName, + payload.TaskDefinition); + + var request = new UpdateServiceRequest + { + Cluster = payload.Cluster, + Service = payload.ServiceName, + TaskDefinition = payload.TaskDefinition, + DesiredCount = payload.DesiredCount, + ForceNewDeployment = payload.ForceNewDeployment + }; + + if (payload.DeploymentConfig is not null) + { + request.DeploymentConfiguration = new Amazon.ECS.Model.DeploymentConfiguration + { + MaximumPercent = payload.DeploymentConfig.MaximumPercent, + MinimumHealthyPercent = payload.DeploymentConfig.MinimumHealthyPercent, + DeploymentCircuitBreaker = payload.DeploymentConfig.CircuitBreaker is not null + ? new Amazon.ECS.Model.DeploymentCircuitBreaker + { + Enable = payload.DeploymentConfig.CircuitBreaker.Enable, + Rollback = payload.DeploymentConfig.CircuitBreaker.Rollback + } + : null + }; + } + + var updateResponse = await _ecsClient.UpdateServiceAsync(request, ct); + var service = updateResponse.Service; + + _logger.LogInformation( + "Updated ECS service {Service}, deployment ID: {DeploymentId}", + payload.ServiceName, + service.Deployments.FirstOrDefault()?.Id ?? "unknown"); + + // Wait for deployment to stabilize + var stable = await WaitForServiceStableAsync( + payload.Cluster, + payload.ServiceName, + payload.DeploymentTimeout, + ct); + + return new TaskResult + { + TaskId = taskId, + Success = stable, + Error = stable ? 
null : "Service did not stabilize within timeout", + Outputs = new Dictionary + { + ["serviceArn"] = service.ServiceArn, + ["serviceName"] = service.ServiceName, + ["taskDefinition"] = service.TaskDefinition, + ["runningCount"] = service.RunningCount, + ["desiredCount"] = service.DesiredCount, + ["deploymentId"] = service.Deployments.FirstOrDefault()?.Id ?? "", + ["deploymentStatus"] = stable ? "COMPLETED" : "TIMED_OUT" + }, + CompletedAt = DateTimeOffset.UtcNow + }; + } + + private async Task WaitForServiceStableAsync( + string cluster, + string serviceName, + TimeSpan timeout, + CancellationToken ct) + { + _logger.LogInformation("Waiting for service {Service} to stabilize", serviceName); + + using var timeoutCts = new CancellationTokenSource(timeout); + using var linkedCts = CancellationTokenSource.CreateLinkedTokenSource(ct, timeoutCts.Token); + + try + { + while (!linkedCts.IsCancellationRequested) + { + var response = await _ecsClient.DescribeServicesAsync(new DescribeServicesRequest + { + Cluster = cluster, + Services = new List { serviceName } + }, linkedCts.Token); + + var service = response.Services.FirstOrDefault(); + if (service is null) + { + _logger.LogWarning("Service {Service} not found during stabilization check", serviceName); + return false; + } + + var primaryDeployment = service.Deployments.FirstOrDefault(d => d.Status == "PRIMARY"); + if (primaryDeployment is null) + { + await Task.Delay(TimeSpan.FromSeconds(10), linkedCts.Token); + continue; + } + + if (primaryDeployment.RunningCount == primaryDeployment.DesiredCount && + service.Deployments.Count == 1) + { + _logger.LogInformation( + "Service {Service} stabilized with {Count} running tasks", + serviceName, + primaryDeployment.RunningCount); + return true; + } + + _logger.LogDebug( + "Service {Service} not stable: running={Running}, desired={Desired}, deployments={Deployments}", + serviceName, + primaryDeployment.RunningCount, + primaryDeployment.DesiredCount, + service.Deployments.Count); + + 
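+
+                // Note: this mirrors the heuristic behind the AWS CLI's
+                // "aws ecs wait services-stable" waiter: the PRIMARY deployment has
+                // reached its desired count and no older deployments remain draining
+                // (service.Deployments.Count == 1).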
await Task.Delay(TimeSpan.FromSeconds(10), linkedCts.Token); + } + } + catch (OperationCanceledException) when (timeoutCts.IsCancellationRequested) + { + _logger.LogWarning("Service {Service} stabilization timed out after {Timeout}", serviceName, timeout); + } + + return false; + } +} +``` + +### EcsRunTaskTask + +```csharp +namespace StellaOps.Agent.Ecs.Tasks; + +public sealed class EcsRunTaskTask +{ + private readonly IAmazonECS _ecsClient; + private readonly ILogger _logger; + + public sealed record RunTaskPayload + { + public required string Cluster { get; init; } + public required string TaskDefinition { get; init; } + public int Count { get; init; } = 1; + public string? LaunchType { get; init; } + public NetworkConfiguration? NetworkConfig { get; init; } + public IReadOnlyList? Overrides { get; init; } + public string? Group { get; init; } + public IReadOnlyDictionary? Tags { get; init; } + public bool WaitForCompletion { get; init; } = true; + public TimeSpan CompletionTimeout { get; init; } = TimeSpan.FromMinutes(30); + } + + public sealed record ContainerOverride + { + public required string Name { get; init; } + public IReadOnlyList? Command { get; init; } + public IReadOnlyDictionary? Environment { get; init; } + public int? Cpu { get; init; } + public int? Memory { get; init; } + } + + public async Task ExecuteAsync(AgentTask task, CancellationToken ct) + { + var payload = JsonSerializer.Deserialize(task.Payload) + ?? throw new InvalidPayloadException("ecs.run-task"); + + _logger.LogInformation( + "Running ECS task from definition {TaskDef} on cluster {Cluster}", + payload.TaskDefinition, + payload.Cluster); + + try + { + var request = new RunTaskRequest + { + Cluster = payload.Cluster, + TaskDefinition = payload.TaskDefinition, + Count = payload.Count, + LaunchType = string.IsNullOrEmpty(payload.LaunchType) ? 
null : new LaunchType(payload.LaunchType), + Group = payload.Group, + Tags = payload.Tags?.Select(kv => new Tag { Key = kv.Key, Value = kv.Value }).ToList() + }; + + if (payload.NetworkConfig is not null) + { + request.NetworkConfiguration = new Amazon.ECS.Model.NetworkConfiguration + { + AwsvpcConfiguration = new AwsVpcConfiguration + { + Subnets = payload.NetworkConfig.Subnets.ToList(), + SecurityGroups = payload.NetworkConfig.SecurityGroups?.ToList(), + AssignPublicIp = payload.NetworkConfig.AssignPublicIp ? AssignPublicIp.ENABLED : AssignPublicIp.DISABLED + } + }; + } + + if (payload.Overrides is not null) + { + request.Overrides = new TaskOverride + { + ContainerOverrides = payload.Overrides.Select(o => new Amazon.ECS.Model.ContainerOverride + { + Name = o.Name, + Command = o.Command?.ToList(), + Environment = o.Environment?.Select(kv => new Amazon.ECS.Model.KeyValuePair + { + Name = kv.Key, + Value = kv.Value + }).ToList(), + Cpu = o.Cpu, + Memory = o.Memory + }).ToList() + }; + } + + var runResponse = await _ecsClient.RunTaskAsync(request, ct); + + if (runResponse.Failures.Any()) + { + var failure = runResponse.Failures.First(); + _logger.LogError( + "Failed to run ECS task: {Reason} (ARN: {Arn})", + failure.Reason, + failure.Arn); + + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = $"Failed to run task: {failure.Reason}", + CompletedAt = DateTimeOffset.UtcNow + }; + } + + var ecsTasks = runResponse.Tasks; + var taskArns = ecsTasks.Select(t => t.TaskArn).ToList(); + + _logger.LogInformation( + "Started {Count} ECS task(s): {TaskArns}", + ecsTasks.Count, + string.Join(", ", taskArns.Select(a => a.Split('/').Last()))); + + if (!payload.WaitForCompletion) + { + return new TaskResult + { + TaskId = task.Id, + Success = true, + Outputs = new Dictionary + { + ["taskArns"] = taskArns, + ["taskCount"] = ecsTasks.Count, + ["status"] = "RUNNING" + }, + CompletedAt = DateTimeOffset.UtcNow + }; + } + + // Wait for tasks to complete + var 
(completed, exitCodes) = await WaitForTasksAsync( + payload.Cluster, + taskArns, + payload.CompletionTimeout, + ct); + + var allSucceeded = completed && exitCodes.All(e => e == 0); + + return new TaskResult + { + TaskId = task.Id, + Success = allSucceeded, + Error = allSucceeded ? null : $"Task(s) failed with exit codes: {string.Join(", ", exitCodes)}", + Outputs = new Dictionary + { + ["taskArns"] = taskArns, + ["taskCount"] = ecsTasks.Count, + ["exitCodes"] = exitCodes, + ["status"] = allSucceeded ? "SUCCEEDED" : "FAILED" + }, + CompletedAt = DateTimeOffset.UtcNow + }; + } + catch (AmazonECSException ex) + { + _logger.LogError(ex, "Failed to run ECS task from {TaskDef}", payload.TaskDefinition); + + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = $"Failed to run task: {ex.Message}", + CompletedAt = DateTimeOffset.UtcNow + }; + } + } + + private async Task<(bool Completed, List ExitCodes)> WaitForTasksAsync( + string cluster, + List taskArns, + TimeSpan timeout, + CancellationToken ct) + { + using var timeoutCts = new CancellationTokenSource(timeout); + using var linkedCts = CancellationTokenSource.CreateLinkedTokenSource(ct, timeoutCts.Token); + + var exitCodes = new List(); + + try + { + while (!linkedCts.IsCancellationRequested) + { + var response = await _ecsClient.DescribeTasksAsync(new DescribeTasksRequest + { + Cluster = cluster, + Tasks = taskArns + }, linkedCts.Token); + + var allStopped = response.Tasks.All(t => t.LastStatus == "STOPPED"); + if (allStopped) + { + exitCodes = response.Tasks + .SelectMany(t => t.Containers.Select(c => c.ExitCode ?? 
-1)) + .ToList(); + return (true, exitCodes); + } + + await Task.Delay(TimeSpan.FromSeconds(10), linkedCts.Token); + } + } + catch (OperationCanceledException) when (timeoutCts.IsCancellationRequested) + { + _logger.LogWarning("Task completion wait timed out after {Timeout}", timeout); + } + + return (false, exitCodes); + } +} +``` + +### EcsHealthCheckTask + +```csharp +namespace StellaOps.Agent.Ecs.Tasks; + +public sealed class EcsHealthCheckTask +{ + private readonly IAmazonECS _ecsClient; + private readonly ILogger _logger; + + public sealed record HealthCheckPayload + { + public required string Cluster { get; init; } + public required string ServiceName { get; init; } + public int MinHealthyPercent { get; init; } = 100; + public TimeSpan Timeout { get; init; } = TimeSpan.FromMinutes(5); + } + + public async Task ExecuteAsync(AgentTask task, CancellationToken ct) + { + var payload = JsonSerializer.Deserialize(task.Payload) + ?? throw new InvalidPayloadException("ecs.health-check"); + + _logger.LogInformation( + "Checking health of ECS service {Service} in cluster {Cluster}", + payload.ServiceName, + payload.Cluster); + + try + { + using var timeoutCts = new CancellationTokenSource(payload.Timeout); + using var linkedCts = CancellationTokenSource.CreateLinkedTokenSource(ct, timeoutCts.Token); + + while (!linkedCts.IsCancellationRequested) + { + var response = await _ecsClient.DescribeServicesAsync(new DescribeServicesRequest + { + Cluster = payload.Cluster, + Services = new List { payload.ServiceName } + }, linkedCts.Token); + + var service = response.Services.FirstOrDefault(); + if (service is null) + { + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = "Service not found", + CompletedAt = DateTimeOffset.UtcNow + }; + } + + var healthyPercent = service.DesiredCount > 0 + ? 
(service.RunningCount * 100) / service.DesiredCount + : 0; + + if (healthyPercent >= payload.MinHealthyPercent && service.Deployments.Count == 1) + { + _logger.LogInformation( + "Service {Service} is healthy: {Running}/{Desired} tasks running ({Percent}%)", + payload.ServiceName, + service.RunningCount, + service.DesiredCount, + healthyPercent); + + return new TaskResult + { + TaskId = task.Id, + Success = true, + Outputs = new Dictionary + { + ["serviceName"] = service.ServiceName, + ["runningCount"] = service.RunningCount, + ["desiredCount"] = service.DesiredCount, + ["healthyPercent"] = healthyPercent, + ["status"] = service.Status, + ["deployments"] = service.Deployments.Count + }, + CompletedAt = DateTimeOffset.UtcNow + }; + } + + _logger.LogDebug( + "Service {Service} health check: {Running}/{Desired} ({Percent}%), waiting...", + payload.ServiceName, + service.RunningCount, + service.DesiredCount, + healthyPercent); + + await Task.Delay(TimeSpan.FromSeconds(10), linkedCts.Token); + } + + throw new OperationCanceledException(); + } + catch (OperationCanceledException) + { + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = $"Health check timed out after {payload.Timeout}", + CompletedAt = DateTimeOffset.UtcNow + }; + } + catch (AmazonECSException ex) + { + _logger.LogError(ex, "Failed to check health of ECS service {Service}", payload.ServiceName); + + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = $"Health check failed: {ex.Message}", + CompletedAt = DateTimeOffset.UtcNow + }; + } + } +} +``` + +### CloudWatchLogStreamer + +```csharp +namespace StellaOps.Agent.Ecs; + +public sealed class CloudWatchLogStreamer +{ + private readonly IAmazonCloudWatchLogs _logsClient; + private readonly LogStreamer _logStreamer; + private readonly ILogger _logger; + + public async Task StreamLogsAsync( + Guid taskId, + string logGroupName, + string logStreamName, + CancellationToken ct = default) + { + string? 
nextToken = null; + + try + { + while (!ct.IsCancellationRequested) + { + var request = new GetLogEventsRequest + { + LogGroupName = logGroupName, + LogStreamName = logStreamName, + StartFromHead = true, + NextToken = nextToken + }; + + var response = await _logsClient.GetLogEventsAsync(request, ct); + + foreach (var logEvent in response.Events) + { + var level = DetectLogLevel(logEvent.Message); + _logStreamer.Log(taskId, level, logEvent.Message); + } + + if (response.NextForwardToken == nextToken) + { + // No new logs, wait before polling again + await Task.Delay(TimeSpan.FromSeconds(2), ct); + } + + nextToken = response.NextForwardToken; + } + } + catch (OperationCanceledException) + { + // Expected when task completes + } + catch (Exception ex) + { + _logger.LogWarning( + ex, + "Error streaming logs from {LogGroup}/{LogStream}", + logGroupName, + logStreamName); + } + } + + private static LogLevel DetectLogLevel(string message) + { + if (message.Contains("ERROR", StringComparison.OrdinalIgnoreCase) || + message.Contains("FATAL", StringComparison.OrdinalIgnoreCase)) + { + return LogLevel.Error; + } + + if (message.Contains("WARN", StringComparison.OrdinalIgnoreCase)) + { + return LogLevel.Warning; + } + + if (message.Contains("DEBUG", StringComparison.OrdinalIgnoreCase)) + { + return LogLevel.Debug; + } + + return LogLevel.Information; + } +} +``` + +--- + +## Acceptance Criteria + +- [ ] Deploy new ECS services (Fargate and EC2 launch types) +- [ ] Update existing ECS services with new task definitions +- [ ] Run one-off ECS tasks +- [ ] Stop running ECS tasks +- [ ] Scale ECS services up/down +- [ ] Register new task definitions +- [ ] Check service health and stability +- [ ] Wait for deployments to complete +- [ ] Stream logs from CloudWatch +- [ ] Support network configuration (VPC, subnets, security groups) +- [ ] Support load balancer integration +- [ ] Support deployment circuit breaker +- [ ] Unit test coverage >= 85% + +--- + +## Dependencies + +| 
Dependency | Type | Status | +|------------|------|--------| +| 108_001 Agent Core Runtime | Internal | TODO | +| AWSSDK.ECS | NuGet | Available | +| AWSSDK.CloudWatchLogs | NuGet | Available | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| EcsCapability | TODO | | +| EcsDeployServiceTask | TODO | | +| EcsRunTaskTask | TODO | | +| EcsStopTaskTask | TODO | | +| EcsScaleServiceTask | TODO | | +| EcsRegisterTaskDefinitionTask | TODO | | +| EcsHealthCheckTask | TODO | | +| CloudWatchLogStreamer | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_108_007_AGENTS_nomad.md b/docs/implplan/SPRINT_20260110_108_007_AGENTS_nomad.md new file mode 100644 index 000000000..a36fd5fec --- /dev/null +++ b/docs/implplan/SPRINT_20260110_108_007_AGENTS_nomad.md @@ -0,0 +1,900 @@ +# SPRINT: Agent - Nomad + +> **Sprint ID:** 108_007 +> **Module:** AGENTS +> **Phase:** 8 - Agents +> **Status:** TODO +> **Parent:** [108_000_INDEX](SPRINT_20260110_108_000_INDEX_agents.md) + +--- + +## Overview + +Implement the Nomad Agent capability for managing HashiCorp Nomad job deployments, supporting Docker, raw_exec, and other Nomad task drivers. 
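Because HashiCorp ships no official .NET SDK for Nomad, the planned `NomadClientFactory` wrapper will talk to the Nomad HTTP API directly. As a rough, hypothetical sketch (the class and method names here are illustrative, not part of the spec), registering a job reduces to a `PUT /v1/jobs` call carrying the job payload, with the ACL token in the `X-Nomad-Token` header; the real wrapper must also handle namespaces, regions, TLS, and retries.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Json;
using System.Text.Json;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical minimal wrapper around the Nomad HTTP API.
public sealed class MinimalNomadClient
{
    private readonly HttpClient _http;

    public MinimalNomadClient(Uri address, string? aclToken = null)
    {
        _http = new HttpClient { BaseAddress = address };
        if (aclToken is not null)
        {
            // Nomad reads ACL tokens from the X-Nomad-Token header.
            _http.DefaultRequestHeaders.Add("X-Nomad-Token", aclToken);
        }
    }

    // PUT /v1/jobs registers (creates or updates) a job; Nomad responds
    // with an evaluation ID that callers can poll for scheduling status.
    public async Task<string?> RegisterJobAsync(object jobSpec, CancellationToken ct = default)
    {
        var response = await _http.PutAsJsonAsync("/v1/jobs", new { Job = jobSpec }, ct);
        response.EnsureSuccessStatusCode();

        using var doc = JsonDocument.Parse(await response.Content.ReadAsStringAsync(ct));
        return doc.RootElement.TryGetProperty("EvalID", out var evalId) ? evalId.GetString() : null;
    }
}
```

Deploy, stop, scale, and status task types would each map onto one or two such endpoint calls on top of this wrapper.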
+ +### Objectives + +- Nomad job deployments (register, run, stop) +- Job scaling operations +- Deployment monitoring and health checks +- Allocation status tracking +- Log streaming from allocations +- Support for multiple task drivers (docker, raw_exec, java) +- Constraint and affinity configuration + +### Working Directory + +``` +src/ReleaseOrchestrator/ +├── __Agents/ +│ └── StellaOps.Agent.Nomad/ +│ ├── NomadCapability.cs +│ ├── Tasks/ +│ │ ├── NomadDeployJobTask.cs +│ │ ├── NomadStopJobTask.cs +│ │ ├── NomadScaleJobTask.cs +│ │ ├── NomadJobStatusTask.cs +│ │ └── NomadHealthCheckTask.cs +│ ├── NomadClientFactory.cs +│ └── NomadLogStreamer.cs +└── __Tests/ + └── StellaOps.Agent.Nomad.Tests/ +``` + +--- + +## Deliverables + +### NomadCapability + +```csharp +namespace StellaOps.Agent.Nomad; + +public sealed class NomadCapability : IAgentCapability +{ + private readonly NomadClient _nomadClient; + private readonly ILogger _logger; + private readonly Dictionary<string, Func<AgentTask, CancellationToken, Task<TaskResult>>> _taskHandlers; + + public string Name => "nomad"; + public string Version => "1.0.0"; + + public IReadOnlyList<string> SupportedTaskTypes => new[] + { + "nomad.deploy-job", + "nomad.stop-job", + "nomad.scale-job", + "nomad.job-status", + "nomad.health-check", + "nomad.dispatch-job" + }; + + public NomadCapability(NomadClient nomadClient, ILogger logger) + { + _nomadClient = nomadClient; + _logger = logger; + + _taskHandlers = new Dictionary<string, Func<AgentTask, CancellationToken, Task<TaskResult>>> + { + ["nomad.deploy-job"] = ExecuteDeployJobAsync, + ["nomad.stop-job"] = ExecuteStopJobAsync, + ["nomad.scale-job"] = ExecuteScaleJobAsync, + ["nomad.job-status"] = ExecuteJobStatusAsync, + ["nomad.health-check"] = ExecuteHealthCheckAsync, + ["nomad.dispatch-job"] = ExecuteDispatchJobAsync + }; + } + + public async Task<bool> InitializeAsync(CancellationToken ct = default) + { + try + { + var status = await _nomadClient.Agent.GetSelfAsync(ct); + _logger.LogInformation( + "Nomad capability initialized, connected to {Region} region (version {Version})", + 
status.Stats["nomad"]["region"], + status.Stats["nomad"]["version"]); + return true; + } + catch (Exception ex) + { + _logger.LogError(ex, "Failed to initialize Nomad capability"); + return false; + } + } + + public async Task<TaskResult> ExecuteAsync(AgentTask task, CancellationToken ct = default) + { + if (!_taskHandlers.TryGetValue(task.TaskType, out var handler)) + { + throw new UnsupportedTaskTypeException(task.TaskType); + } + + return await handler(task, ct); + } + + public async Task<CapabilityHealthStatus> CheckHealthAsync(CancellationToken ct = default) + { + try + { + var status = await _nomadClient.Agent.GetSelfAsync(ct); + return new CapabilityHealthStatus(true, $"Nomad agent responding ({status.Stats["nomad"]["region"]})"); + } + catch (Exception ex) + { + return new CapabilityHealthStatus(false, $"Nomad agent not responding: {ex.Message}"); + } + } + + private Task<TaskResult> ExecuteDeployJobAsync(AgentTask task, CancellationToken ct) => + new NomadDeployJobTask(_nomadClient, _logger).ExecuteAsync(task, ct); + + private Task<TaskResult> ExecuteStopJobAsync(AgentTask task, CancellationToken ct) => + new NomadStopJobTask(_nomadClient, _logger).ExecuteAsync(task, ct); + + private Task<TaskResult> ExecuteScaleJobAsync(AgentTask task, CancellationToken ct) => + new NomadScaleJobTask(_nomadClient, _logger).ExecuteAsync(task, ct); + + private Task<TaskResult> ExecuteJobStatusAsync(AgentTask task, CancellationToken ct) => + new NomadJobStatusTask(_nomadClient, _logger).ExecuteAsync(task, ct); + + private Task<TaskResult> ExecuteHealthCheckAsync(AgentTask task, CancellationToken ct) => + new NomadHealthCheckTask(_nomadClient, _logger).ExecuteAsync(task, ct); + + private Task<TaskResult> ExecuteDispatchJobAsync(AgentTask task, CancellationToken ct) => + new NomadDispatchJobTask(_nomadClient, _logger).ExecuteAsync(task, ct); +} +``` + +### NomadDeployJobTask + +```csharp +namespace StellaOps.Agent.Nomad.Tasks; + +public sealed class NomadDeployJobTask +{ + private readonly NomadClient _nomadClient; + private readonly ILogger _logger; + + public sealed record 
DeployJobPayload + { + /// <summary> + /// Job specification in HCL or JSON format. + /// </summary> + public string? JobSpec { get; init; } + + /// <summary> + /// Pre-parsed job definition (alternative to JobSpec). + /// </summary> + public JobDefinition? Job { get; init; } + + /// <summary> + /// Variables to substitute in job spec. + /// </summary> + public IReadOnlyDictionary<string, string>? Variables { get; init; } + + /// <summary> + /// Nomad namespace. + /// </summary> + public string? Namespace { get; init; } + + /// <summary> + /// Region to deploy to. + /// </summary> + public string? Region { get; init; } + + /// <summary> + /// Whether to wait for deployment to complete. + /// </summary> + public bool WaitForDeployment { get; init; } = true; + + /// <summary> + /// Deployment completion timeout. + /// </summary> + public TimeSpan DeploymentTimeout { get; init; } = TimeSpan.FromMinutes(10); + + /// <summary> + /// If true, job is run in detached mode (fire and forget). + /// </summary> + public bool Detach { get; init; } = false; + } + + public sealed record JobDefinition + { + public required string ID { get; init; } + public required string Name { get; init; } + public string Type { get; init; } = "service"; + public string? Namespace { get; init; } + public string? Region { get; init; } + public int Priority { get; init; } = 50; + public IReadOnlyList<string>? Datacenters { get; init; } + public IReadOnlyList<TaskGroupDefinition>? TaskGroups { get; init; } + public UpdateStrategy? Update { get; init; } + public IReadOnlyDictionary<string, string>? Meta { get; init; } + public IReadOnlyList<ConstraintDefinition>? Constraints { get; init; } + } + + public sealed record TaskGroupDefinition + { + public required string Name { get; init; } + public int Count { get; init; } = 1; + public IReadOnlyList<TaskDefinition>? Tasks { get; init; } + public IReadOnlyList<NetworkDefinition>? Networks { get; init; } + public IReadOnlyList<ServiceDefinition>? Services { get; init; } + public RestartPolicy? RestartPolicy { get; init; } + public EphemeralDisk? EphemeralDisk { get; init; } + } + + public sealed record TaskDefinition + { + public required string Name { get; init; } + public required string Driver { get; init; } // docker, raw_exec, java, etc. 
+ public required IReadOnlyDictionary<string, object> Config { get; init; } + public ResourceRequirements? Resources { get; init; } + public IReadOnlyDictionary<string, string>? Env { get; init; } + public IReadOnlyList<TemplateDefinition>? Templates { get; init; } + public IReadOnlyList<ArtifactDefinition>? Artifacts { get; init; } + public LogConfig? Logs { get; init; } + } + + public sealed record UpdateStrategy + { + public int MaxParallel { get; init; } = 1; + public string HealthCheck { get; init; } = "checks"; + public TimeSpan MinHealthyTime { get; init; } = TimeSpan.FromSeconds(10); + public TimeSpan HealthyDeadline { get; init; } = TimeSpan.FromMinutes(5); + public bool AutoRevert { get; init; } = false; + public bool AutoPromote { get; init; } = false; + public int Canary { get; init; } = 0; + } + + public async Task<TaskResult> ExecuteAsync(AgentTask task, CancellationToken ct) + { + var payload = JsonSerializer.Deserialize<DeployJobPayload>(task.Payload) + ?? throw new InvalidPayloadException("nomad.deploy-job"); + + Job nomadJob; + + if (!string.IsNullOrEmpty(payload.JobSpec)) + { + // Parse HCL or JSON job spec + var parseResponse = await _nomadClient.Jobs.ParseJobAsync( + payload.JobSpec, + payload.Variables?.ToDictionary(kv => kv.Key, kv => kv.Value), + ct); + nomadJob = parseResponse; + } + else if (payload.Job is not null) + { + nomadJob = ConvertToNomadJob(payload.Job); + } + else + { + throw new InvalidPayloadException("nomad.deploy-job", "Either JobSpec or Job must be provided"); + } + + _logger.LogInformation( + "Deploying Nomad job {JobId} to region {Region}", + nomadJob.ID, + payload.Region ?? 
"default"); + + try + { + // Register the job + var registerResponse = await _nomadClient.Jobs.RegisterAsync( + nomadJob, + new WriteOptions + { + Namespace = payload.Namespace, + Region = payload.Region + }, + ct); + + _logger.LogInformation( + "Registered Nomad job {JobId}, evaluation ID: {EvalId}", + nomadJob.ID, + registerResponse.EvalID); + + if (payload.Detach) + { + return new TaskResult + { + TaskId = task.Id, + Success = true, + Outputs = new Dictionary<string, object> + { + ["jobId"] = nomadJob.ID, + ["evalId"] = registerResponse.EvalID, + ["status"] = "DETACHED" + }, + CompletedAt = DateTimeOffset.UtcNow + }; + } + + if (!payload.WaitForDeployment) + { + // Just wait for evaluation to complete + var evaluation = await WaitForEvaluationAsync( + registerResponse.EvalID, + payload.Namespace, + TimeSpan.FromMinutes(2), + ct); + + return new TaskResult + { + TaskId = task.Id, + Success = evaluation.Status == "complete", + Outputs = new Dictionary<string, object> + { + ["jobId"] = nomadJob.ID, + ["evalId"] = registerResponse.EvalID, + ["evalStatus"] = evaluation.Status, + ["status"] = evaluation.Status == "complete" ? "EVALUATED" : "EVAL_FAILED" + }, + CompletedAt = DateTimeOffset.UtcNow + }; + } + + // Wait for deployment to complete + var deployment = await WaitForDeploymentAsync( + nomadJob.ID, + payload.Namespace, + payload.DeploymentTimeout, + ct); + + var success = deployment?.Status == "successful"; + + return new TaskResult + { + TaskId = task.Id, + Success = success, + Error = success ? null : $"Deployment failed: {deployment?.StatusDescription ?? "unknown"}", + Outputs = new Dictionary<string, object> + { + ["jobId"] = nomadJob.ID, + ["evalId"] = registerResponse.EvalID, + ["deploymentId"] = deployment?.ID ?? "", + ["deploymentStatus"] = deployment?.Status ?? "unknown", + ["status"] = success ? 
"DEPLOYED" : "DEPLOYMENT_FAILED" + }, + CompletedAt = DateTimeOffset.UtcNow + }; + } + catch (NomadApiException ex) + { + _logger.LogError(ex, "Failed to deploy Nomad job {JobId}", nomadJob.ID); + + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = $"Failed to deploy job: {ex.Message}", + CompletedAt = DateTimeOffset.UtcNow + }; + } + } + + private async Task<Evaluation> WaitForEvaluationAsync( + string evalId, + string? ns, + TimeSpan timeout, + CancellationToken ct) + { + using var timeoutCts = new CancellationTokenSource(timeout); + using var linkedCts = CancellationTokenSource.CreateLinkedTokenSource(ct, timeoutCts.Token); + + while (!linkedCts.IsCancellationRequested) + { + var evaluation = await _nomadClient.Evaluations.GetAsync( + evalId, + new QueryOptions { Namespace = ns }, + linkedCts.Token); + + if (evaluation.Status is "complete" or "failed" or "canceled") + { + return evaluation; + } + + _logger.LogDebug("Evaluation {EvalId} status: {Status}", evalId, evaluation.Status); + await Task.Delay(TimeSpan.FromSeconds(2), linkedCts.Token); + } + + throw new OperationCanceledException("Evaluation wait timed out"); + } + + private async Task<Deployment?> WaitForDeploymentAsync( + string jobId, + string? ns, + TimeSpan timeout, + CancellationToken ct) + { + using var timeoutCts = new CancellationTokenSource(timeout); + using var linkedCts = CancellationTokenSource.CreateLinkedTokenSource(ct, timeoutCts.Token); + + Deployment? 
deployment = null; + + while (!linkedCts.IsCancellationRequested) + { + var deployments = await _nomadClient.Jobs.GetDeploymentsAsync( + jobId, + new QueryOptions { Namespace = ns }, + linkedCts.Token); + + deployment = deployments.FirstOrDefault(); + if (deployment is null) + { + await Task.Delay(TimeSpan.FromSeconds(2), linkedCts.Token); + continue; + } + + if (deployment.Status is "successful" or "failed" or "cancelled") + { + return deployment; + } + + _logger.LogDebug( + "Deployment {DeploymentId} status: {Status}", + deployment.ID, + deployment.Status); + + await Task.Delay(TimeSpan.FromSeconds(5), linkedCts.Token); + } + + return deployment; + } + + private static Job ConvertToNomadJob(JobDefinition def) + { + return new Job + { + ID = def.ID, + Name = def.Name, + Type = def.Type, + Namespace = def.Namespace, + Region = def.Region, + Priority = def.Priority, + Datacenters = def.Datacenters?.ToList(), + Meta = def.Meta?.ToDictionary(kv => kv.Key, kv => kv.Value), + TaskGroups = def.TaskGroups?.Select(tg => new TaskGroup + { + Name = tg.Name, + Count = tg.Count, + Tasks = tg.Tasks?.Select(t => new Task + { + Name = t.Name, + Driver = t.Driver, + Config = t.Config?.ToDictionary(kv => kv.Key, kv => kv.Value), + Env = t.Env?.ToDictionary(kv => kv.Key, kv => kv.Value), + Resources = t.Resources is not null ? new Resources + { + CPU = t.Resources.CPU, + MemoryMB = t.Resources.MemoryMB + } : null + }).ToList() + }).ToList(), + Update = def.Update is not null ? 
new UpdateStrategy + { + MaxParallel = def.Update.MaxParallel, + HealthCheck = def.Update.HealthCheck, + MinHealthyTime = (long)def.Update.MinHealthyTime.TotalNanoseconds, + HealthyDeadline = (long)def.Update.HealthyDeadline.TotalNanoseconds, + AutoRevert = def.Update.AutoRevert, + AutoPromote = def.Update.AutoPromote, + Canary = def.Update.Canary + } : null + }; + } +} +``` + +### NomadStopJobTask + +```csharp +namespace StellaOps.Agent.Nomad.Tasks; + +public sealed class NomadStopJobTask +{ + private readonly NomadClient _nomadClient; + private readonly ILogger _logger; + + public sealed record StopJobPayload + { + public required string JobId { get; init; } + public string? Namespace { get; init; } + public string? Region { get; init; } + public bool Purge { get; init; } = false; + public bool Global { get; init; } = false; + } + + public async Task<TaskResult> ExecuteAsync(AgentTask task, CancellationToken ct) + { + var payload = JsonSerializer.Deserialize<StopJobPayload>(task.Payload) + ?? throw new InvalidPayloadException("nomad.stop-job"); + + _logger.LogInformation( + "Stopping Nomad job {JobId} (purge: {Purge})", + payload.JobId, + payload.Purge); + + try + { + var response = await _nomadClient.Jobs.DeregisterAsync( + payload.JobId, + payload.Purge, + payload.Global, + new WriteOptions + { + Namespace = payload.Namespace, + Region = payload.Region + }, + ct); + + _logger.LogInformation( + "Stopped Nomad job {JobId}, evaluation ID: {EvalId}", + payload.JobId, + response.EvalID); + + return new TaskResult + { + TaskId = task.Id, + Success = true, + Outputs = new Dictionary<string, object> + { + ["jobId"] = payload.JobId, + ["evalId"] = response.EvalID, + ["purged"] = payload.Purge + }, + CompletedAt = DateTimeOffset.UtcNow + }; + } + catch (NomadApiException ex) + { + _logger.LogError(ex, "Failed to stop Nomad job {JobId}", payload.JobId); + + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = $"Failed to stop job: {ex.Message}", + CompletedAt = DateTimeOffset.UtcNow + }; + } + 
} +} +``` + +### NomadScaleJobTask + +```csharp +namespace StellaOps.Agent.Nomad.Tasks; + +public sealed class NomadScaleJobTask +{ + private readonly NomadClient _nomadClient; + private readonly ILogger _logger; + + public sealed record ScaleJobPayload + { + public required string JobId { get; init; } + public required string TaskGroup { get; init; } + public required int Count { get; init; } + public string? Namespace { get; init; } + public string? Region { get; init; } + public string? Reason { get; init; } + public bool PolicyOverride { get; init; } = false; + } + + public async Task<TaskResult> ExecuteAsync(AgentTask task, CancellationToken ct) + { + var payload = JsonSerializer.Deserialize<ScaleJobPayload>(task.Payload) + ?? throw new InvalidPayloadException("nomad.scale-job"); + + _logger.LogInformation( + "Scaling Nomad job {JobId} task group {TaskGroup} to {Count}", + payload.JobId, + payload.TaskGroup, + payload.Count); + + try + { + var response = await _nomadClient.Jobs.ScaleAsync( + payload.JobId, + payload.TaskGroup, + payload.Count, + payload.Reason ?? 
$"Scaled by Stella Ops (task: {task.Id})", + payload.PolicyOverride, + new WriteOptions + { + Namespace = payload.Namespace, + Region = payload.Region + }, + ct); + + _logger.LogInformation( + "Scaled Nomad job {JobId} task group {TaskGroup} to {Count}, evaluation ID: {EvalId}", + payload.JobId, + payload.TaskGroup, + payload.Count, + response.EvalID); + + return new TaskResult + { + TaskId = task.Id, + Success = true, + Outputs = new Dictionary<string, object> + { + ["jobId"] = payload.JobId, + ["taskGroup"] = payload.TaskGroup, + ["count"] = payload.Count, + ["evalId"] = response.EvalID + }, + CompletedAt = DateTimeOffset.UtcNow + }; + } + catch (NomadApiException ex) + { + _logger.LogError( + ex, + "Failed to scale Nomad job {JobId} task group {TaskGroup}", + payload.JobId, + payload.TaskGroup); + + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = $"Failed to scale job: {ex.Message}", + CompletedAt = DateTimeOffset.UtcNow + }; + } + } +} +``` + +### NomadHealthCheckTask + +```csharp +namespace StellaOps.Agent.Nomad.Tasks; + +public sealed class NomadHealthCheckTask +{ + private readonly NomadClient _nomadClient; + private readonly ILogger _logger; + + public sealed record HealthCheckPayload + { + public required string JobId { get; init; } + public string? Namespace { get; init; } + public string? Region { get; init; } + public int MinHealthyAllocations { get; init; } = 1; + public TimeSpan Timeout { get; init; } = TimeSpan.FromMinutes(5); + } + + public async Task<TaskResult> ExecuteAsync(AgentTask task, CancellationToken ct) + { + var payload = JsonSerializer.Deserialize<HealthCheckPayload>(task.Payload) + ?? 
throw new InvalidPayloadException("nomad.health-check"); + + _logger.LogInformation( + "Checking health of Nomad job {JobId}", + payload.JobId); + + try + { + using var timeoutCts = new CancellationTokenSource(payload.Timeout); + using var linkedCts = CancellationTokenSource.CreateLinkedTokenSource(ct, timeoutCts.Token); + + while (!linkedCts.IsCancellationRequested) + { + var allocations = await _nomadClient.Jobs.GetAllocationsAsync( + payload.JobId, + new QueryOptions + { + Namespace = payload.Namespace, + Region = payload.Region + }, + linkedCts.Token); + + var runningAllocations = allocations + .Where(a => a.ClientStatus == "running") + .ToList(); + + var healthyCount = runningAllocations + .Count(a => a.DeploymentStatus?.Healthy == true); + + if (healthyCount >= payload.MinHealthyAllocations) + { + _logger.LogInformation( + "Nomad job {JobId} is healthy: {Healthy}/{Total} allocations healthy", + payload.JobId, + healthyCount, + runningAllocations.Count); + + return new TaskResult + { + TaskId = task.Id, + Success = true, + Outputs = new Dictionary<string, object> + { + ["jobId"] = payload.JobId, + ["healthyAllocations"] = healthyCount, + ["totalAllocations"] = allocations.Count, + ["runningAllocations"] = runningAllocations.Count + }, + CompletedAt = DateTimeOffset.UtcNow + }; + } + + _logger.LogDebug( + "Nomad job {JobId} health check: {Healthy}/{MinRequired} healthy, waiting...", + payload.JobId, + healthyCount, + payload.MinHealthyAllocations); + + await Task.Delay(TimeSpan.FromSeconds(5), linkedCts.Token); + } + + throw new OperationCanceledException(); + } + catch (OperationCanceledException) + { + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = $"Health check timed out after {payload.Timeout}", + CompletedAt = DateTimeOffset.UtcNow + }; + } + catch (NomadApiException ex) + { + _logger.LogError(ex, "Failed to check health of Nomad job {JobId}", payload.JobId); + + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = 
$"Health check failed: {ex.Message}", + CompletedAt = DateTimeOffset.UtcNow + }; + } + } +} +``` + +### NomadLogStreamer + +```csharp +namespace StellaOps.Agent.Nomad; + +public sealed class NomadLogStreamer +{ + private readonly NomadClient _nomadClient; + private readonly LogStreamer _logStreamer; + private readonly ILogger _logger; + + public async Task StreamLogsAsync( + Guid taskId, + string allocationId, + string taskName, + string logType, // "stdout" or "stderr" + CancellationToken ct = default) + { + try + { + var stream = await _nomadClient.Allocations.GetLogsAsync( + allocationId, + taskName, + logType, + follow: true, + ct); + + using var reader = new StreamReader(stream); + + while (!ct.IsCancellationRequested) + { + var line = await reader.ReadLineAsync(ct); + if (line is null) + break; + + var level = logType == "stderr" ? LogLevel.Error : LogLevel.Information; + + // Override level based on content heuristics + if (logType == "stdout") + { + level = DetectLogLevel(line); + } + + _logStreamer.Log(taskId, level, line); + } + } + catch (OperationCanceledException) + { + // Expected when task completes + } + catch (Exception ex) + { + _logger.LogWarning( + ex, + "Error streaming logs for allocation {AllocationId} task {TaskName}", + allocationId, + taskName); + } + } + + private static LogLevel DetectLogLevel(string message) + { + if (message.Contains("ERROR", StringComparison.OrdinalIgnoreCase) || + message.Contains("FATAL", StringComparison.OrdinalIgnoreCase)) + { + return LogLevel.Error; + } + + if (message.Contains("WARN", StringComparison.OrdinalIgnoreCase)) + { + return LogLevel.Warning; + } + + if (message.Contains("DEBUG", StringComparison.OrdinalIgnoreCase)) + { + return LogLevel.Debug; + } + + return LogLevel.Information; + } +} +``` + +--- + +## Acceptance Criteria + +- [ ] Register and deploy Nomad jobs from HCL or JSON spec +- [ ] Register and deploy Nomad jobs from structured JobDefinition +- [ ] Stop Nomad jobs (with optional purge) +- [ 
] Scale Nomad job task groups +- [ ] Check job health and allocation status +- [ ] Wait for deployments to complete +- [ ] Dispatch parameterized batch jobs +- [ ] Stream logs from allocations +- [ ] Support Docker task driver +- [ ] Support raw_exec task driver +- [ ] Support job constraints and affinities +- [ ] Support update strategies (rolling, canary) +- [ ] Unit test coverage >= 85% + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 108_001 Agent Core Runtime | Internal | TODO | +| Nomad.Api (or custom HTTP client) | NuGet/Custom | TODO | + +> **Note:** HashiCorp does not provide an official .NET SDK for Nomad. Implementation will use a custom HTTP client wrapper or community library. + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| NomadCapability | TODO | | +| NomadDeployJobTask | TODO | | +| NomadStopJobTask | TODO | | +| NomadScaleJobTask | TODO | | +| NomadJobStatusTask | TODO | | +| NomadHealthCheckTask | TODO | | +| NomadDispatchJobTask | TODO | | +| NomadLogStreamer | TODO | | +| NomadClient wrapper | TODO | Custom HTTP client | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_109_000_INDEX_evidence_audit.md b/docs/implplan/SPRINT_20260110_109_000_INDEX_evidence_audit.md new file mode 100644 index 000000000..14455249f --- /dev/null +++ b/docs/implplan/SPRINT_20260110_109_000_INDEX_evidence_audit.md @@ -0,0 +1,243 @@ +# SPRINT INDEX: Phase 9 - Evidence & Audit + +> **Epic:** Release Orchestrator +> **Phase:** 9 - Evidence & Audit +> **Batch:** 109 +> **Status:** TODO +> **Parent:** [100_000_INDEX](SPRINT_20260110_100_000_INDEX_release_orchestrator.md) + +--- + +## Overview + +Phase 9 implements the Evidence & Audit system - generating cryptographically signed, immutable evidence packets for every deployment decision. 
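As a hedged sketch of the 109_002 signing step (canonicalize, hash, sign), the ES256 variant maps onto a few BCL calls. Note that `JsonSerializer.Serialize` below is only a stand-in: it does not implement RFC 8785 canonicalization (sorted keys, normalized number formatting), which the planned `CanonicalJsonSerializer` must provide; the class and method names here are illustrative.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;
using System.Text.Json;

// Sketch of the evidence signing step: canonicalize -> SHA-256 -> sign (ES256).
public static class EvidenceSigningSketch
{
    public static (string ContentHash, byte[] Signature) Sign(object content, ECDsa signingKey)
    {
        // Stand-in for RFC 8785 canonicalization; the real CanonicalJsonSerializer
        // must emit sorted keys and normalized numbers so hashes are reproducible.
        var canonical = JsonSerializer.Serialize(content);
        var bytes = Encoding.UTF8.GetBytes(canonical);

        // Content hash recorded alongside the packet (the "sha256:..." form).
        var contentHash = "sha256:" + Convert.ToHexString(SHA256.HashData(bytes)).ToLowerInvariant();

        // ES256 = ECDSA over P-256 with SHA-256.
        var signature = signingKey.SignData(bytes, HashAlgorithmName.SHA256);
        return (contentHash, signature);
    }

    public static bool Verify(object content, byte[] signature, ECDsa signingKey)
    {
        var bytes = Encoding.UTF8.GetBytes(JsonSerializer.Serialize(content));
        return signingKey.VerifyData(bytes, signature, HashAlgorithmName.SHA256);
    }
}
```

A round trip with a P-256 key (`ECDsa.Create(ECCurve.NamedCurves.nistP256)`) verifies the signature and yields a `sha256:`-prefixed content hash for the append-only store.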
+ +### Objectives + +- Evidence collector gathers deployment context +- Evidence signer creates tamper-proof signatures +- Version sticker writer records deployment state +- Audit exporter generates compliance reports + +--- + +## Sprint Structure + +| Sprint ID | Title | Module | Status | Dependencies | +|-----------|-------|--------|--------|--------------| +| 109_001 | Evidence Collector | RELEVI | TODO | 106_005, 107_001 | +| 109_002 | Evidence Signer | RELEVI | TODO | 109_001 | +| 109_003 | Version Sticker Writer | RELEVI | TODO | 107_002 | +| 109_004 | Audit Exporter | RELEVI | TODO | 109_002 | + +--- + +## Architecture + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ EVIDENCE & AUDIT │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ EVIDENCE COLLECTOR (109_001) │ │ +│ │ │ │ +│ │ Collects from: │ │ +│ │ ├── Release bundle (components, digests, source refs) │ │ +│ │ ├── Promotion (requester, approvers, gates) │ │ +│ │ ├── Deployment (targets, tasks, artifacts) │ │ +│ │ └── Decision (gate results, freeze window status) │ │ +│ │ │ │ +│ │ ┌─────────────────────────────────────────────────────────────────┐ │ │ +│ │ │ Evidence Packet │ │ │ +│ │ │ { │ │ │ +│ │ │ "type": "deployment", │ │ │ +│ │ │ "release": { ... }, │ │ │ +│ │ │ "environment": { ... }, │ │ │ +│ │ │ "actors": { requester, approvers, deployer }, │ │ │ +│ │ │ "decision": { gates, freeze_check, sod }, │ │ │ +│ │ │ "execution": { tasks, artifacts, metrics } │ │ │ +│ │ │ } │ │ │ +│ │ └─────────────────────────────────────────────────────────────────┘ │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ EVIDENCE SIGNER (109_002) │ │ +│ │ │ │ +│ │ 1. Canonicalize JSON (RFC 8785) │ │ +│ │ 2. Hash content (SHA-256) │ │ +│ │ 3. Sign hash with signing key (RS256 or ES256) │ │ +│ │ 4. 
Store in append-only table │ │ +│ │ │ │ +│ │ ┌─────────────────────────────────────────────────────────────────┐ │ │ +│ │ │ { │ │ │ +│ │ │ "content": { ... }, │ │ │ +│ │ │ "contentHash": "sha256:abc...", │ │ │ +│ │ │ "signature": "base64...", │ │ │ +│ │ │ "signatureAlgorithm": "RS256", │ │ │ +│ │ │ "signerKeyRef": "stella/signing/prod-key-2026" │ │ │ +│ │ │ } │ │ │ +│ │ └─────────────────────────────────────────────────────────────────┘ │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ VERSION STICKER WRITER (109_003) │ │ +│ │ │ │ +│ │ stella.version.json written to each target: │ │ +│ │ { │ │ +│ │ "release": "myapp-v2.3.1", │ │ +│ │ "deployment_id": "uuid", │ │ +│ │ "deployed_at": "2026-01-10T14:35:00Z", │ │ +│ │ "components": [ │ │ +│ │ { "name": "api", "digest": "sha256:..." }, │ │ +│ │ { "name": "worker", "digest": "sha256:..." } │ │ +│ │ ], │ │ +│ │ "evidence_id": "evid-uuid" │ │ +│ │ } │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ AUDIT EXPORTER (109_004) │ │ +│ │ │ │ +│ │ Export formats: │ │ +│ │ ├── JSON - Machine-readable, full detail │ │ +│ │ ├── PDF - Human-readable compliance reports │ │ +│ │ ├── CSV - Spreadsheet analysis │ │ +│ │ └── SLSA - SLSA provenance format │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +--- + +## Deliverables Summary + +### 109_001: Evidence Collector + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `IEvidenceCollector` | Interface | Evidence collection | +| `EvidenceCollector` | Class | Implementation | +| `EvidenceContent` | Model | Evidence structure | +| `ContentBuilder` | Class | Build evidence sections | + +### 109_002: 
Evidence Signer + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `IEvidenceSigner` | Interface | Signing operations | +| `EvidenceSigner` | Class | Implementation | +| `CanonicalJsonSerializer` | Class | RFC 8785 canonicalization | +| `SigningKeyProvider` | Class | Key management | + +### 109_003: Version Sticker Writer + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `IVersionStickerWriter` | Interface | Sticker writing | +| `VersionStickerWriter` | Class | Implementation | +| `VersionSticker` | Model | Sticker structure | +| `StickerAgentTask` | Task | Agent writes sticker | + +### 109_004: Audit Exporter + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `IAuditExporter` | Interface | Export operations | +| `JsonExporter` | Exporter | JSON format | +| `PdfExporter` | Exporter | PDF format | +| `CsvExporter` | Exporter | CSV format | +| `SlsaExporter` | Exporter | SLSA format | + +--- + +## Key Interfaces + +```csharp +public interface IEvidenceCollector +{ + Task<EvidencePacket> CollectAsync(Guid promotionId, EvidenceType type, CancellationToken ct); +} + +public interface IEvidenceSigner +{ + Task<SignedEvidencePacket> SignAsync(EvidencePacket packet, CancellationToken ct); + Task<bool> VerifyAsync(SignedEvidencePacket packet, CancellationToken ct); +} + +public interface IVersionStickerWriter +{ + Task WriteAsync(Guid deploymentTaskId, VersionSticker sticker, CancellationToken ct); + Task<VersionSticker?> ReadAsync(Guid targetId, CancellationToken ct); +} + +public interface IAuditExporter +{ + Task<AuditExportResult> ExportAsync(AuditExportRequest request, CancellationToken ct); + IReadOnlyList<ExportFormat> SupportedFormats { get; } +} +``` + +--- + +## Evidence Lifecycle + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ EVIDENCE LIFECYCLE │ +│ │ +│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ +│ │ Promotion │───►│ Collect │───►│ Sign │───►│ Store │ │ +│ │ Complete │ │ Evidence │ │ Evidence │ │ 
(immutable) │ │ +│ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │ +│ │ +│ │ │ +│ ┌───────────────────┴───────────────────┐ │ +│ ▼ ▼ │ +│ ┌─────────────┐ ┌─────────────┐ │ +│ │ Export │ │ Verify │ │ +│ │ (on-demand)│ │ (on-demand) │ │ +│ └─────────────┘ └─────────────┘ │ +│ │ │ │ +│ ┌───────────┼───────────┬───────────┐ │ │ +│ ▼ ▼ ▼ ▼ ▼ │ +│ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ ┌───────────┐ │ +│ │ JSON │ │ PDF │ │ CSV │ │ SLSA │ │ Verified │ │ +│ └────────┘ └────────┘ └────────┘ └────────┘ │ Report │ │ +│ └───────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +--- + +## Dependencies + +| Module | Purpose | +|--------|---------| +| 106_005 Decision Engine | Decision data | +| 107_001 Deploy Orchestrator | Deployment data | +| 107_002 Target Executor | Task data | +| Signer | Cryptographic signing | + +--- + +## Acceptance Criteria + +- [ ] Evidence collected for all promotions +- [ ] Evidence signed with platform key +- [ ] Signature verification works +- [ ] Append-only storage enforced +- [ ] Version sticker written to targets +- [ ] JSON export works +- [ ] PDF export readable +- [ ] SLSA format compliant +- [ ] Unit test coverage ≥80% + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Phase 9 index created | diff --git a/docs/implplan/SPRINT_20260110_109_001_RELEVI_evidence_collector.md b/docs/implplan/SPRINT_20260110_109_001_RELEVI_evidence_collector.md new file mode 100644 index 000000000..0851221bb --- /dev/null +++ b/docs/implplan/SPRINT_20260110_109_001_RELEVI_evidence_collector.md @@ -0,0 +1,597 @@ +# SPRINT: Evidence Collector + +> **Sprint ID:** 109_001 +> **Module:** RELEVI +> **Phase:** 9 - Evidence & Audit +> **Status:** TODO +> **Parent:** [109_000_INDEX](SPRINT_20260110_109_000_INDEX_evidence_audit.md) + +--- + +## Overview + +Implement the Evidence Collector for gathering deployment decision context into cryptographically sealed evidence 
packets.
+
+### Objectives
+
+- Collect evidence from release, promotion, and deployment data
+- Build comprehensive evidence packets
+- Track evidence dependencies and lineage
+- Store evidence in append-only store
+
+### Working Directory
+
+```
+src/ReleaseOrchestrator/
+├── __Libraries/
+│   └── StellaOps.ReleaseOrchestrator.Evidence/
+│       ├── Collector/
+│       │   ├── IEvidenceCollector.cs
+│       │   ├── EvidenceCollector.cs
+│       │   ├── ContentBuilder.cs
+│       │   └── Collectors/
+│       │       ├── ReleaseEvidenceCollector.cs
+│       │       ├── PromotionEvidenceCollector.cs
+│       │       ├── DeploymentEvidenceCollector.cs
+│       │       └── DecisionEvidenceCollector.cs
+│       ├── Models/
+│       │   ├── EvidencePacket.cs
+│       │   ├── EvidenceContent.cs
+│       │   └── EvidenceType.cs
+│       └── Store/
+│           ├── IEvidenceStore.cs
+│           └── EvidenceStore.cs
+└── __Tests/
+```
+
+---
+
+## Deliverables
+
+### IEvidenceCollector Interface
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Evidence.Collector;
+
+public interface IEvidenceCollector
+{
+    Task<EvidencePacket> CollectAsync(
+        Guid subjectId,
+        EvidenceType type,
+        CancellationToken ct = default);
+
+    Task<EvidencePacket> CollectDeploymentEvidenceAsync(
+        Guid deploymentJobId,
+        CancellationToken ct = default);
+
+    Task<EvidencePacket> CollectPromotionEvidenceAsync(
+        Guid promotionId,
+        CancellationToken ct = default);
+}
+
+public enum EvidenceType
+{
+    Promotion,
+    Deployment,
+    Rollback,
+    GateDecision,
+    Approval
+}
+```
+
+### EvidencePacket Model
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Evidence.Models;
+
+public sealed record EvidencePacket
+{
+    public required Guid Id { get; init; }
+    public required Guid TenantId { get; init; }
+    public required EvidenceType Type { get; init; }
+    public required Guid SubjectId { get; init; }
+    public required string SubjectType { get; init; }
+    public required EvidenceContent Content { get; init; }
+    public required ImmutableArray<Guid> DependsOn { get; init; }
+    public required DateTimeOffset CollectedAt { get; init; }
+    public required string CollectorVersion { get; init; }
+}
+
+public
sealed record EvidenceContent
+{
+    public required ReleaseEvidence? Release { get; init; }
+    public required PromotionEvidence? Promotion { get; init; }
+    public required DeploymentEvidence? Deployment { get; init; }
+    public required DecisionEvidence? Decision { get; init; }
+    public required ImmutableDictionary<string, string> Metadata { get; init; }
+}
+
+public sealed record ReleaseEvidence
+{
+    public required Guid ReleaseId { get; init; }
+    public required string ReleaseName { get; init; }
+    public string? ManifestDigest { get; init; }
+    public DateTimeOffset? FinalizedAt { get; init; }
+    public required ImmutableArray<ComponentEvidence> Components { get; init; }
+}
+
+public sealed record ComponentEvidence
+{
+    public required Guid ComponentId { get; init; }
+    public required string ComponentName { get; init; }
+    public required string Digest { get; init; }
+    public string? Tag { get; init; }
+    public string? SemVer { get; init; }
+    public string? SourceRef { get; init; }
+    public string? SbomDigest { get; init; }
+}
+
+public sealed record PromotionEvidence
+{
+    public required Guid PromotionId { get; init; }
+    public required Guid SourceEnvironmentId { get; init; }
+    public required string SourceEnvironmentName { get; init; }
+    public required Guid TargetEnvironmentId { get; init; }
+    public required string TargetEnvironmentName { get; init; }
+    public required ActorEvidence Requester { get; init; }
+    public required ImmutableArray<ApprovalEvidence> Approvals { get; init; }
+    public required DateTimeOffset RequestedAt { get; init; }
+    public DateTimeOffset? ApprovedAt { get; init; }
+}
+
+public sealed record ActorEvidence
+{
+    public required Guid UserId { get; init; }
+    public required string UserName { get; init; }
+    public required string UserEmail { get; init; }
+    public ImmutableArray<string> Groups { get; init; } = [];
+}
+
+public sealed record ApprovalEvidence
+{
+    public required ActorEvidence Approver { get; init; }
+    public required string Decision { get; init; }
+    public string?
Comment { get; init; }
+    public required DateTimeOffset DecidedAt { get; init; }
+}
+
+public sealed record DeploymentEvidence
+{
+    public required Guid DeploymentJobId { get; init; }
+    public required string Strategy { get; init; }
+    public required DateTimeOffset StartedAt { get; init; }
+    public DateTimeOffset? CompletedAt { get; init; }
+    public required string Status { get; init; }
+    public required ImmutableArray<TaskEvidence> Tasks { get; init; }
+    public required ImmutableArray<ArtifactEvidence> Artifacts { get; init; }
+}
+
+public sealed record TaskEvidence
+{
+    public required Guid TaskId { get; init; }
+    public required Guid TargetId { get; init; }
+    public required string TargetName { get; init; }
+    public required string Status { get; init; }
+    public DateTimeOffset? StartedAt { get; init; }
+    public DateTimeOffset? CompletedAt { get; init; }
+    public string? Error { get; init; }
+}
+
+public sealed record ArtifactEvidence
+{
+    public required string ArtifactType { get; init; }
+    public required string Digest { get; init; }
+    public required string Location { get; init; }
+}
+
+public sealed record DecisionEvidence
+{
+    public required ImmutableArray<GateResultEvidence> GateResults { get; init; }
+    public required FreezeCheckEvidence FreezeCheck { get; init; }
+    public required SodCheckEvidence SodCheck { get; init; }
+    public required string FinalDecision { get; init; }
+    public required DateTimeOffset DecidedAt { get; init; }
+}
+
+public sealed record GateResultEvidence
+{
+    public required string GateName { get; init; }
+    public required string GateType { get; init; }
+    public required bool Passed { get; init; }
+    public string? Message { get; init; }
+    public ImmutableDictionary<string, string>? Details { get; init; }
+    public required DateTimeOffset EvaluatedAt { get; init; }
+}
+
+public sealed record FreezeCheckEvidence
+{
+    public required bool Checked { get; init; }
+    public required bool FreezeActive { get; init; }
+    public string?
FreezeReason { get; init; }
+    public bool Overridden { get; init; }
+    public ActorEvidence? OverriddenBy { get; init; }
+}
+
+public sealed record SodCheckEvidence
+{
+    public required bool Required { get; init; }
+    public required bool Satisfied { get; init; }
+    public string? Violation { get; init; }
+}
+```
+
+### EvidenceCollector Implementation
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Evidence.Collector;
+
+public sealed class EvidenceCollector : IEvidenceCollector
+{
+    private readonly ReleaseEvidenceCollector _releaseCollector;
+    private readonly PromotionEvidenceCollector _promotionCollector;
+    private readonly DeploymentEvidenceCollector _deploymentCollector;
+    private readonly DecisionEvidenceCollector _decisionCollector;
+    private readonly IEvidenceStore _evidenceStore;
+    private readonly TimeProvider _timeProvider;
+    private readonly IGuidGenerator _guidGenerator;
+    private readonly ITenantContext _tenantContext;
+    private readonly ILogger<EvidenceCollector> _logger;
+
+    private const string CollectorVersion = "1.0.0";
+
+    public async Task<EvidencePacket> CollectAsync(
+        Guid subjectId,
+        EvidenceType type,
+        CancellationToken ct = default)
+    {
+        return type switch
+        {
+            EvidenceType.Promotion => await CollectPromotionEvidenceAsync(subjectId, ct),
+            EvidenceType.Deployment => await CollectDeploymentEvidenceAsync(subjectId, ct),
+            _ => throw new UnsupportedEvidenceTypeException(type)
+        };
+    }
+
+    public async Task<EvidencePacket> CollectDeploymentEvidenceAsync(
+        Guid deploymentJobId,
+        CancellationToken ct = default)
+    {
+        _logger.LogInformation(
+            "Collecting deployment evidence for job {JobId}",
+            deploymentJobId);
+
+        // Collect all evidence sections
+        var deploymentEvidence = await _deploymentCollector.CollectAsync(deploymentJobId, ct);
+        var releaseEvidence = await _releaseCollector.CollectAsync(deploymentEvidence.ReleaseId, ct);
+        var promotionEvidence = await _promotionCollector.CollectAsync(deploymentEvidence.PromotionId, ct);
+        var decisionEvidence = await
_decisionCollector.CollectAsync(deploymentEvidence.PromotionId, ct); + + var content = new EvidenceContent + { + Release = releaseEvidence, + Promotion = promotionEvidence, + Deployment = deploymentEvidence.ToEvidence(), + Decision = decisionEvidence, + Metadata = ImmutableDictionary.Empty + .Add("platform", "stella-ops") + .Add("collectorVersion", CollectorVersion) + }; + + var packet = new EvidencePacket + { + Id = _guidGenerator.NewGuid(), + TenantId = _tenantContext.TenantId, + Type = EvidenceType.Deployment, + SubjectId = deploymentJobId, + SubjectType = "DeploymentJob", + Content = content, + DependsOn = await GetDependentEvidenceAsync(deploymentJobId, ct), + CollectedAt = _timeProvider.GetUtcNow(), + CollectorVersion = CollectorVersion + }; + + // Store the packet + await _evidenceStore.StoreAsync(packet, ct); + + _logger.LogInformation( + "Collected deployment evidence {PacketId} for job {JobId}", + packet.Id, + deploymentJobId); + + return packet; + } + + public async Task CollectPromotionEvidenceAsync( + Guid promotionId, + CancellationToken ct = default) + { + _logger.LogInformation( + "Collecting promotion evidence for {PromotionId}", + promotionId); + + var promotionEvidence = await _promotionCollector.CollectAsync(promotionId, ct); + var releaseEvidence = await _releaseCollector.CollectAsync(promotionEvidence.ReleaseId, ct); + var decisionEvidence = await _decisionCollector.CollectAsync(promotionId, ct); + + var content = new EvidenceContent + { + Release = releaseEvidence, + Promotion = promotionEvidence.ToEvidence(), + Deployment = null, + Decision = decisionEvidence, + Metadata = ImmutableDictionary.Empty + .Add("platform", "stella-ops") + .Add("collectorVersion", CollectorVersion) + }; + + var packet = new EvidencePacket + { + Id = _guidGenerator.NewGuid(), + TenantId = _tenantContext.TenantId, + Type = EvidenceType.Promotion, + SubjectId = promotionId, + SubjectType = "Promotion", + Content = content, + DependsOn = ImmutableArray.Empty, + 
CollectedAt = _timeProvider.GetUtcNow(), + CollectorVersion = CollectorVersion + }; + + await _evidenceStore.StoreAsync(packet, ct); + + _logger.LogInformation( + "Collected promotion evidence {PacketId} for {PromotionId}", + packet.Id, + promotionId); + + return packet; + } + + private async Task> GetDependentEvidenceAsync( + Guid deploymentJobId, + CancellationToken ct) + { + // Find promotion evidence that this deployment depends on + var promotion = await _promotionCollector.GetPromotionForJobAsync(deploymentJobId, ct); + if (promotion is null) + return ImmutableArray.Empty; + + var promotionEvidence = await _evidenceStore.GetBySubjectAsync( + promotion.Id, + EvidenceType.Promotion, + ct); + + if (promotionEvidence is null) + return ImmutableArray.Empty; + + return ImmutableArray.Create(promotionEvidence.Id); + } +} +``` + +### ContentBuilder + +```csharp +namespace StellaOps.ReleaseOrchestrator.Evidence.Collector; + +public sealed class ContentBuilder +{ + public static ReleaseEvidence BuildReleaseEvidence(Release release) + { + return new ReleaseEvidence + { + ReleaseId = release.Id, + ReleaseName = release.Name, + ManifestDigest = release.ManifestDigest, + FinalizedAt = release.FinalizedAt, + Components = release.Components.Select(c => new ComponentEvidence + { + ComponentId = c.ComponentId, + ComponentName = c.ComponentName, + Digest = c.Digest, + Tag = c.Tag, + SemVer = c.SemVer, + SourceRef = c.Config.GetValueOrDefault("sourceRef"), + SbomDigest = c.Config.GetValueOrDefault("sbomDigest") + }).ToImmutableArray() + }; + } + + public static PromotionEvidence BuildPromotionEvidence( + Promotion promotion, + IReadOnlyList approvals, + IReadOnlyList users) + { + var userLookup = users.ToDictionary(u => u.Id); + + return new PromotionEvidence + { + PromotionId = promotion.Id, + SourceEnvironmentId = promotion.SourceEnvironmentId, + SourceEnvironmentName = promotion.SourceEnvironmentName, + TargetEnvironmentId = promotion.TargetEnvironmentId, + 
TargetEnvironmentName = promotion.TargetEnvironmentName, + Requester = BuildActorEvidence(promotion.RequestedBy, userLookup), + Approvals = approvals.Select(a => new ApprovalEvidence + { + Approver = BuildActorEvidence(a.UserId, userLookup), + Decision = a.Decision.ToString(), + Comment = a.Comment, + DecidedAt = a.DecidedAt + }).ToImmutableArray(), + RequestedAt = promotion.RequestedAt, + ApprovedAt = promotion.ApprovedAt + }; + } + + public static DeploymentEvidence BuildDeploymentEvidence( + DeploymentJob job, + IReadOnlyList artifacts) + { + return new DeploymentEvidence + { + DeploymentJobId = job.Id, + Strategy = job.Strategy.ToString(), + StartedAt = job.StartedAt, + CompletedAt = job.CompletedAt, + Status = job.Status.ToString(), + Tasks = job.Tasks.Select(t => new TaskEvidence + { + TaskId = t.Id, + TargetId = t.TargetId, + TargetName = t.TargetName, + Status = t.Status.ToString(), + StartedAt = t.StartedAt, + CompletedAt = t.CompletedAt, + Error = t.Error + }).ToImmutableArray(), + Artifacts = artifacts.Select(a => new ArtifactEvidence + { + ArtifactType = a.Type, + Digest = a.Digest, + Location = a.Location + }).ToImmutableArray() + }; + } + + public static DecisionEvidence BuildDecisionEvidence( + DecisionRecord decision, + IReadOnlyList gateResults) + { + return new DecisionEvidence + { + GateResults = gateResults.Select(g => new GateResultEvidence + { + GateName = g.GateName, + GateType = g.GateType, + Passed = g.Passed, + Message = g.Message, + Details = g.Details?.ToImmutableDictionary(), + EvaluatedAt = g.EvaluatedAt + }).ToImmutableArray(), + FreezeCheck = new FreezeCheckEvidence + { + Checked = true, + FreezeActive = decision.FreezeActive, + FreezeReason = decision.FreezeReason, + Overridden = decision.FreezeOverridden, + OverriddenBy = decision.FreezeOverriddenBy is not null + ? new ActorEvidence + { + UserId = decision.FreezeOverriddenBy.Value, + UserName = decision.FreezeOverriddenByName ?? 
"",
+                        UserEmail = ""
+                    }
+                    : null
+            },
+            SodCheck = new SodCheckEvidence
+            {
+                Required = decision.SodRequired,
+                Satisfied = decision.SodSatisfied,
+                Violation = decision.SodViolation
+            },
+            FinalDecision = decision.FinalDecision.ToString(),
+            DecidedAt = decision.DecidedAt
+        };
+    }
+
+    private static ActorEvidence BuildActorEvidence(
+        Guid userId,
+        Dictionary<Guid, User> userLookup)
+    {
+        if (userLookup.TryGetValue(userId, out var user))
+        {
+            return new ActorEvidence
+            {
+                UserId = user.Id,
+                UserName = user.Name,
+                UserEmail = user.Email,
+                Groups = user.Groups.ToImmutableArray()
+            };
+        }
+
+        return new ActorEvidence
+        {
+            UserId = userId,
+            UserName = "Unknown",
+            UserEmail = "",
+            Groups = ImmutableArray<string>.Empty
+        };
+    }
+}
+```
+
+### IEvidenceStore Interface
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Evidence.Store;
+
+public interface IEvidenceStore
+{
+    Task StoreAsync(EvidencePacket packet, CancellationToken ct = default);
+    Task<EvidencePacket?> GetAsync(Guid packetId, CancellationToken ct = default);
+    Task<EvidencePacket?> GetBySubjectAsync(Guid subjectId, EvidenceType type, CancellationToken ct = default);
+    Task<IReadOnlyList<EvidencePacket>> ListAsync(EvidenceQueryFilter filter, CancellationToken ct = default);
+    Task<bool> ExistsAsync(Guid packetId, CancellationToken ct = default);
+}
+
+public sealed record EvidenceQueryFilter
+{
+    public Guid? TenantId { get; init; }
+    public EvidenceType? Type { get; init; }
+    public DateTimeOffset? FromDate { get; init; }
+    public DateTimeOffset? ToDate { get; init; }
+    public int Limit { get; init; } = 100;
+    public int Offset { get; init; } = 0;
+}
+```
+
+---
+
+## Acceptance Criteria
+
+- [ ] Collect release evidence with all components
+- [ ] Collect promotion evidence with approvals
+- [ ] Collect deployment evidence with all tasks
+- [ ] Collect decision evidence with gate results
+- [ ] Build comprehensive evidence packets
+- [ ] Track evidence dependencies
+- [ ] Store evidence in append-only store
+- [ ] Query evidence by subject
+- [ ] Unit test coverage ≥85%
+
+---
+
+## Dependencies
+
+| Dependency | Type | Status |
+|------------|------|--------|
+| 106_005 Decision Engine | Internal | TODO |
+| 107_001 Deploy Orchestrator | Internal | TODO |
+| 104_003 Release Manager | Internal | TODO |
+
+---
+
+## Delivery Tracker
+
+| Deliverable | Status | Notes |
+|-------------|--------|-------|
+| IEvidenceCollector | TODO | |
+| EvidenceCollector | TODO | |
+| ContentBuilder | TODO | |
+| EvidencePacket model | TODO | |
+| ReleaseEvidenceCollector | TODO | |
+| PromotionEvidenceCollector | TODO | |
+| DeploymentEvidenceCollector | TODO | |
+| DecisionEvidenceCollector | TODO | |
+| IEvidenceStore | TODO | |
+| EvidenceStore | TODO | |
+| Unit tests | TODO | |
+
+---
+
+## Execution Log
+
+| Date | Entry |
+|------|-------|
+| 10-Jan-2026 | Sprint created |
diff --git a/docs/implplan/SPRINT_20260110_109_002_RELEVI_evidence_signer.md b/docs/implplan/SPRINT_20260110_109_002_RELEVI_evidence_signer.md
new file mode 100644
index 000000000..a300817e6
--- /dev/null
+++ b/docs/implplan/SPRINT_20260110_109_002_RELEVI_evidence_signer.md
@@ -0,0 +1,626 @@
+# SPRINT: Evidence Signer
+
+> **Sprint ID:** 109_002
+> **Module:** RELEVI
+> **Phase:** 9 - Evidence & Audit
+> **Status:** TODO
+> **Parent:** [109_000_INDEX](SPRINT_20260110_109_000_INDEX_evidence_audit.md)
+
+---
+
+## Overview
+
+Implement the Evidence Signer for creating cryptographically signed, tamper-evident evidence packets.
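+The signer relies on one property: identical evidence content must always canonicalize to the same byte sequence, and therefore to the same content hash. A minimal sketch of the hash step (the `sha256:` prefix convention follows the examples in this plan; the JSON payload here is hypothetical):
+
+```csharp
+using System.Security.Cryptography;
+using System.Text;
+
+// RFC 8785 orders object members by Unicode code point, so {"b":2,"a":1}
+// and {"a":1,"b":2} both canonicalize to the single form below.
+var canonical = "{\"a\":1,\"b\":2}";
+var hashBytes = SHA256.HashData(Encoding.UTF8.GetBytes(canonical));
+var contentHash = $"sha256:{Convert.ToHexString(hashBytes).ToLowerInvariant()}";
+Console.WriteLine(contentHash);
+```
+
+Because verification re-canonicalizes the stored content and recomputes this hash, any post-signing mutation of the packet changes the digest and fails verification.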
+
+### Objectives
+
+- Canonicalize JSON using RFC 8785
+- Hash evidence content with SHA-256
+- Sign with RS256 or ES256 algorithms
+- Verify signatures on demand
+- Support key rotation
+
+### Working Directory
+
+```
+src/ReleaseOrchestrator/
+├── __Libraries/
+│   └── StellaOps.ReleaseOrchestrator.Evidence/
+│       └── Signing/
+│           ├── IEvidenceSigner.cs
+│           ├── EvidenceSigner.cs
+│           ├── CanonicalJsonSerializer.cs
+│           ├── SigningKeyProvider.cs
+│           ├── SignedEvidencePacket.cs
+│           └── Algorithms/
+│               ├── Rs256Signer.cs
+│               └── Es256Signer.cs
+└── __Tests/
+```
+
+---
+
+## Deliverables
+
+### IEvidenceSigner Interface
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Evidence.Signing;
+
+public interface IEvidenceSigner
+{
+    Task<SignedEvidencePacket> SignAsync(
+        EvidencePacket packet,
+        CancellationToken ct = default);
+
+    Task<bool> VerifyAsync(
+        SignedEvidencePacket signedPacket,
+        CancellationToken ct = default);
+
+    Task<VerificationResult> VerifyWithDetailsAsync(
+        SignedEvidencePacket signedPacket,
+        CancellationToken ct = default);
+}
+
+public sealed record SignedEvidencePacket
+{
+    public required Guid Id { get; init; }
+    public required EvidencePacket Content { get; init; }
+    public required string ContentHash { get; init; }
+    public required string Signature { get; init; }
+    public required string SignatureAlgorithm { get; init; }
+    public required string SignerKeyRef { get; init; }
+    public required DateTimeOffset SignedAt { get; init; }
+}
+
+public sealed record VerificationResult
+{
+    public required bool IsValid { get; init; }
+    public required bool SignatureValid { get; init; }
+    public required bool ContentHashValid { get; init; }
+    public required bool KeyValid { get; init; }
+    public string? Error { get; init; }
+    public DateTimeOffset VerifiedAt { get; init; }
+}
+```
+
+### CanonicalJsonSerializer
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Evidence.Signing;
+
+///
+/// RFC 8785 (JCS) compliant JSON canonicalizer.
+/// +public static class CanonicalJsonSerializer +{ + public static string Serialize(object value) + { + // Convert to JsonElement for processing + var json = JsonSerializer.Serialize(value, new JsonSerializerOptions + { + PropertyNamingPolicy = null, // Preserve property names + WriteIndented = false + }); + + var element = JsonDocument.Parse(json).RootElement; + return Canonicalize(element); + } + + public static string Serialize(EvidencePacket packet) + { + // Use explicit ordering for evidence packets + var orderedContent = new SortedDictionary + { + ["id"] = packet.Id.ToString(), + ["tenantId"] = packet.TenantId.ToString(), + ["type"] = packet.Type.ToString(), + ["subjectId"] = packet.SubjectId.ToString(), + ["subjectType"] = packet.SubjectType, + ["content"] = SerializeContent(packet.Content), + ["dependsOn"] = packet.DependsOn.Select(d => d.ToString()).ToArray(), + ["collectedAt"] = FormatTimestamp(packet.CollectedAt), + ["collectorVersion"] = packet.CollectorVersion + }; + + return SerializeOrdered(orderedContent); + } + + private static string Canonicalize(JsonElement element) + { + return element.ValueKind switch + { + JsonValueKind.Object => CanonicalizeObject(element), + JsonValueKind.Array => CanonicalizeArray(element), + JsonValueKind.String => CanonicalizeString(element), + JsonValueKind.Number => CanonicalizeNumber(element), + JsonValueKind.True => "true", + JsonValueKind.False => "false", + JsonValueKind.Null => "null", + _ => throw new InvalidOperationException($"Unsupported JSON type: {element.ValueKind}") + }; + } + + private static string CanonicalizeObject(JsonElement element) + { + // RFC 8785: Sort properties by Unicode code point order + var properties = element.EnumerateObject() + .OrderBy(p => p.Name, StringComparer.Ordinal) + .Select(p => $"\"{EscapeString(p.Name)}\":{Canonicalize(p.Value)}"); + + return "{" + string.Join(",", properties) + "}"; + } + + private static string CanonicalizeArray(JsonElement element) + { + var items = 
element.EnumerateArray() + .Select(Canonicalize); + + return "[" + string.Join(",", items) + "]"; + } + + private static string CanonicalizeString(JsonElement element) + { + return "\"" + EscapeString(element.GetString() ?? "") + "\""; + } + + private static string CanonicalizeNumber(JsonElement element) + { + // RFC 8785: Numbers are serialized without exponent notation + // and without trailing zeros + if (element.TryGetInt64(out var longValue)) + { + return longValue.ToString(CultureInfo.InvariantCulture); + } + + if (element.TryGetDouble(out var doubleValue)) + { + // Format without exponent, minimal precision + return FormatDouble(doubleValue); + } + + return element.GetRawText(); + } + + private static string FormatDouble(double value) + { + if (double.IsNaN(value) || double.IsInfinity(value)) + { + throw new InvalidOperationException("NaN and Infinity not allowed in canonical JSON"); + } + + // Use G17 for full precision, then normalize + var str = value.ToString("G17", CultureInfo.InvariantCulture); + + // Remove exponent notation if present + if (str.Contains('E') || str.Contains('e')) + { + var d = double.Parse(str, CultureInfo.InvariantCulture); + str = d.ToString("F15", CultureInfo.InvariantCulture).TrimEnd('0').TrimEnd('.'); + } + + return str; + } + + private static string EscapeString(string value) + { + var sb = new StringBuilder(); + + foreach (var c in value) + { + switch (c) + { + case '"': sb.Append("\\\""); break; + case '\\': sb.Append("\\\\"); break; + case '\b': sb.Append("\\b"); break; + case '\f': sb.Append("\\f"); break; + case '\n': sb.Append("\\n"); break; + case '\r': sb.Append("\\r"); break; + case '\t': sb.Append("\\t"); break; + default: + if (c < 0x20) + { + sb.Append($"\\u{(int)c:x4}"); + } + else + { + sb.Append(c); + } + break; + } + } + + return sb.ToString(); + } + + private static string FormatTimestamp(DateTimeOffset timestamp) + { + return timestamp.ToUniversalTime().ToString("yyyy-MM-ddTHH:mm:ss.fffZ", 
CultureInfo.InvariantCulture); + } + + private static object SerializeContent(EvidenceContent content) + { + // Serialize with sorted keys + return new SortedDictionary + { + ["decision"] = content.Decision, + ["deployment"] = content.Deployment, + ["metadata"] = content.Metadata, + ["promotion"] = content.Promotion, + ["release"] = content.Release + }; + } + + private static string SerializeOrdered(SortedDictionary dict) + { + return JsonSerializer.Serialize(dict, new JsonSerializerOptions + { + PropertyNamingPolicy = null, + WriteIndented = false, + DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull + }); + } +} +``` + +### EvidenceSigner Implementation + +```csharp +namespace StellaOps.ReleaseOrchestrator.Evidence.Signing; + +public sealed class EvidenceSigner : IEvidenceSigner +{ + private readonly ISigningKeyProvider _keyProvider; + private readonly ISignedEvidenceStore _signedStore; + private readonly TimeProvider _timeProvider; + private readonly ILogger _logger; + + public async Task SignAsync( + EvidencePacket packet, + CancellationToken ct = default) + { + _logger.LogDebug("Signing evidence packet {PacketId}", packet.Id); + + // Get signing key + var key = await _keyProvider.GetCurrentSigningKeyAsync(ct); + + // Canonicalize content + var canonicalJson = CanonicalJsonSerializer.Serialize(packet); + + // Compute content hash + var contentHashBytes = SHA256.HashData(Encoding.UTF8.GetBytes(canonicalJson)); + var contentHash = $"sha256:{Convert.ToHexString(contentHashBytes).ToLowerInvariant()}"; + + // Sign the hash + var signatureBytes = await SignHashAsync(key, contentHashBytes, ct); + var signature = Convert.ToBase64String(signatureBytes); + + var signedPacket = new SignedEvidencePacket + { + Id = packet.Id, + Content = packet, + ContentHash = contentHash, + Signature = signature, + SignatureAlgorithm = key.Algorithm, + SignerKeyRef = key.KeyRef, + SignedAt = _timeProvider.GetUtcNow() + }; + + // Store signed packet + await 
_signedStore.StoreAsync(signedPacket, ct); + + _logger.LogInformation( + "Signed evidence packet {PacketId} with key {KeyRef}", + packet.Id, + key.KeyRef); + + return signedPacket; + } + + public async Task VerifyAsync( + SignedEvidencePacket signedPacket, + CancellationToken ct = default) + { + var result = await VerifyWithDetailsAsync(signedPacket, ct); + return result.IsValid; + } + + public async Task VerifyWithDetailsAsync( + SignedEvidencePacket signedPacket, + CancellationToken ct = default) + { + _logger.LogDebug("Verifying evidence packet {PacketId}", signedPacket.Id); + + try + { + // Get the signing key + var key = await _keyProvider.GetKeyByRefAsync(signedPacket.SignerKeyRef, ct); + if (key is null) + { + return new VerificationResult + { + IsValid = false, + SignatureValid = false, + ContentHashValid = false, + KeyValid = false, + Error = $"Signing key not found: {signedPacket.SignerKeyRef}", + VerifiedAt = _timeProvider.GetUtcNow() + }; + } + + // Verify content hash + var canonicalJson = CanonicalJsonSerializer.Serialize(signedPacket.Content); + var computedHashBytes = SHA256.HashData(Encoding.UTF8.GetBytes(canonicalJson)); + var computedHash = $"sha256:{Convert.ToHexString(computedHashBytes).ToLowerInvariant()}"; + + var contentHashValid = signedPacket.ContentHash == computedHash; + + if (!contentHashValid) + { + _logger.LogWarning( + "Content hash mismatch for packet {PacketId}: expected {Expected}, got {Actual}", + signedPacket.Id, + signedPacket.ContentHash, + computedHash); + + return new VerificationResult + { + IsValid = false, + SignatureValid = false, + ContentHashValid = false, + KeyValid = true, + Error = "Content hash mismatch - evidence may have been tampered with", + VerifiedAt = _timeProvider.GetUtcNow() + }; + } + + // Verify signature + var signatureBytes = Convert.FromBase64String(signedPacket.Signature); + var signatureValid = await VerifySignatureAsync(key, computedHashBytes, signatureBytes, ct); + + if (!signatureValid) + { + 
_logger.LogWarning( + "Signature verification failed for packet {PacketId}", + signedPacket.Id); + + return new VerificationResult + { + IsValid = false, + SignatureValid = false, + ContentHashValid = true, + KeyValid = true, + Error = "Signature verification failed", + VerifiedAt = _timeProvider.GetUtcNow() + }; + } + + _logger.LogDebug("Evidence packet {PacketId} verified successfully", signedPacket.Id); + + return new VerificationResult + { + IsValid = true, + SignatureValid = true, + ContentHashValid = true, + KeyValid = true, + VerifiedAt = _timeProvider.GetUtcNow() + }; + } + catch (Exception ex) + { + _logger.LogError(ex, "Error verifying evidence packet {PacketId}", signedPacket.Id); + + return new VerificationResult + { + IsValid = false, + SignatureValid = false, + ContentHashValid = false, + KeyValid = false, + Error = ex.Message, + VerifiedAt = _timeProvider.GetUtcNow() + }; + } + } + + private async Task SignHashAsync( + SigningKey key, + byte[] hash, + CancellationToken ct) + { + return key.Algorithm switch + { + "RS256" => await SignRs256Async(key, hash, ct), + "ES256" => await SignEs256Async(key, hash, ct), + _ => throw new UnsupportedAlgorithmException(key.Algorithm) + }; + } + + private Task SignRs256Async(SigningKey key, byte[] hash, CancellationToken ct) + { + using var rsa = RSA.Create(); + rsa.ImportFromPem(key.PrivateKey); + + var signature = rsa.SignHash(hash, HashAlgorithmName.SHA256, RSASignaturePadding.Pkcs1); + return Task.FromResult(signature); + } + + private Task SignEs256Async(SigningKey key, byte[] hash, CancellationToken ct) + { + using var ecdsa = ECDsa.Create(); + ecdsa.ImportFromPem(key.PrivateKey); + + var signature = ecdsa.SignHash(hash); + return Task.FromResult(signature); + } + + private Task VerifySignatureAsync( + SigningKey key, + byte[] hash, + byte[] signature, + CancellationToken ct) + { + return key.Algorithm switch + { + "RS256" => Task.FromResult(VerifyRs256(key, hash, signature)), + "ES256" => 
Task.FromResult(VerifyEs256(key, hash, signature)),
+            _ => throw new UnsupportedAlgorithmException(key.Algorithm)
+        };
+    }
+
+    private static bool VerifyRs256(SigningKey key, byte[] hash, byte[] signature)
+    {
+        using var rsa = RSA.Create();
+        rsa.ImportFromPem(key.PublicKey);
+
+        return rsa.VerifyHash(hash, signature, HashAlgorithmName.SHA256, RSASignaturePadding.Pkcs1);
+    }
+
+    private static bool VerifyEs256(SigningKey key, byte[] hash, byte[] signature)
+    {
+        using var ecdsa = ECDsa.Create();
+        ecdsa.ImportFromPem(key.PublicKey);
+
+        return ecdsa.VerifyHash(hash, signature);
+    }
+}
+```
+
+### SigningKeyProvider
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Evidence.Signing;
+
+public interface ISigningKeyProvider
+{
+    Task<SigningKey> GetCurrentSigningKeyAsync(CancellationToken ct = default);
+    Task<SigningKey?> GetKeyByRefAsync(string keyRef, CancellationToken ct = default);
+    Task<IReadOnlyList<SigningKeyInfo>> ListKeysAsync(CancellationToken ct = default);
+}
+
+public sealed class SigningKeyProvider : ISigningKeyProvider
+{
+    private readonly IKeyVaultClient _keyVault;
+    private readonly SigningConfiguration _config;
+    private readonly ILogger<SigningKeyProvider> _logger;
+
+    public async Task<SigningKey> GetCurrentSigningKeyAsync(CancellationToken ct = default)
+    {
+        var keyRef = _config.CurrentKeyRef;
+        var key = await GetKeyByRefAsync(keyRef, ct)
+            ??
throw new SigningKeyNotFoundException(keyRef); + + return key; + } + + public async Task<SigningKey?> GetKeyByRefAsync(string keyRef, CancellationToken ct = default) + { + try + { + var vaultKey = await _keyVault.GetKeyAsync(keyRef, ct); + if (vaultKey is null) + return null; + + return new SigningKey + { + KeyRef = keyRef, + Algorithm = vaultKey.Algorithm, + PublicKey = vaultKey.PublicKey, + PrivateKey = vaultKey.PrivateKey, + CreatedAt = vaultKey.CreatedAt, + ExpiresAt = vaultKey.ExpiresAt + }; + } + catch (Exception ex) + { + _logger.LogWarning(ex, "Failed to get signing key {KeyRef}", keyRef); + return null; + } + } + + public async Task<IReadOnlyList<SigningKeyInfo>> ListKeysAsync(CancellationToken ct = default) + { + var keys = await _keyVault.ListKeysAsync(_config.KeyPrefix, ct); + + return keys.Select(k => new SigningKeyInfo + { + KeyRef = k.KeyRef, + Algorithm = k.Algorithm, + CreatedAt = k.CreatedAt, + ExpiresAt = k.ExpiresAt, + IsCurrent = k.KeyRef == _config.CurrentKeyRef + }).ToList().AsReadOnly(); + } +} + +public sealed record SigningKey +{ + public required string KeyRef { get; init; } + public required string Algorithm { get; init; } + public required string PublicKey { get; init; } + public required string PrivateKey { get; init; } + public required DateTimeOffset CreatedAt { get; init; } + public DateTimeOffset? ExpiresAt { get; init; } +} + +public sealed record SigningKeyInfo +{ + public required string KeyRef { get; init; } + public required string Algorithm { get; init; } + public required DateTimeOffset CreatedAt { get; init; } + public DateTimeOffset? 
ExpiresAt { get; init; } + public bool IsCurrent { get; init; } +} + +public sealed class SigningConfiguration +{ + public required string CurrentKeyRef { get; set; } + public string KeyPrefix { get; set; } = "stella/signing/"; + public string DefaultAlgorithm { get; set; } = "RS256"; +} +``` + +--- + +## Acceptance Criteria + +- [ ] Canonicalize JSON per RFC 8785 +- [ ] Hash content with SHA-256 +- [ ] Sign with RS256 algorithm +- [ ] Sign with ES256 algorithm +- [ ] Verify signatures +- [ ] Detect content tampering +- [ ] Support key rotation +- [ ] Store signed packets immutably +- [ ] Unit test coverage >=85% + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 109_001 Evidence Collector | Internal | TODO | +| Signer service | Internal | Existing | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| IEvidenceSigner | TODO | | +| EvidenceSigner | TODO | | +| CanonicalJsonSerializer | TODO | | +| SigningKeyProvider | TODO | | +| Rs256Signer | TODO | | +| Es256Signer | TODO | | +| SignedEvidencePacket | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_109_003_RELEVI_version_sticker.md b/docs/implplan/SPRINT_20260110_109_003_RELEVI_version_sticker.md new file mode 100644 index 000000000..033d4cdff --- /dev/null +++ b/docs/implplan/SPRINT_20260110_109_003_RELEVI_version_sticker.md @@ -0,0 +1,538 @@ +# SPRINT: Version Sticker Writer + +> **Sprint ID:** 109_003 +> **Module:** RELEVI +> **Phase:** 9 - Evidence & Audit +> **Status:** TODO +> **Parent:** [109_000_INDEX](SPRINT_20260110_109_000_INDEX_evidence_audit.md) + +--- + +## Overview + +Implement the Version Sticker Writer for recording deployment state as stella.version.json files on each target. 
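The sticker is a plain JSON file the agent drops next to the deployed workload, and validation later reduces to comparing its recorded digests against what the container runtime actually reports. A minimal Python sketch of that round trip (field names follow the `VersionSticker` model in this sprint; the helper names and sample values are illustrative, not part of the spec):

```python
import json

STICKER_FILE = "stella.version.json"

def build_sticker(release, components):
    """Build a sticker payload mirroring the VersionSticker model (illustrative subset)."""
    return {
        "schemaVersion": "1.0",
        "release": release,
        "components": [{"name": n, "digest": d} for n, d in components],
    }

def components_match(sticker, running):
    """Compare recorded digests against running containers (name -> digest map)."""
    mismatches = []
    for comp in sticker["components"]:
        actual = running.get(comp["name"])
        if actual is None:
            mismatches.append(f"{comp['name']}: not running")
        elif actual != comp["digest"]:
            mismatches.append(f"{comp['name']}: digest mismatch")
    return mismatches

sticker = build_sticker("payments-1.4.2", [("api", "sha256:abc"), ("worker", "sha256:def")])
raw = json.dumps(sticker, indent=2)   # what the agent would write to disk
restored = json.loads(raw)            # what a validate step would read back
assert components_match(restored, {"api": "sha256:abc", "worker": "sha256:def"}) == []
assert components_match(restored, {"api": "sha256:abc"}) == ["worker: not running"]
```

This mirrors the split of responsibilities below: the generator serializes, the writer ships the file via an agent task, and validation re-reads and diffs it.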
+ +### Objectives + +- Generate version sticker content +- Write stickers to targets via agents +- Read stickers from targets for verification +- Track sticker state across deployments + +### Working Directory + +``` +src/ReleaseOrchestrator/ +├── __Libraries/ +│ └── StellaOps.ReleaseOrchestrator.Evidence/ +│ └── Sticker/ +│ ├── IVersionStickerWriter.cs +│ ├── VersionStickerWriter.cs +│ ├── VersionStickerGenerator.cs +│ ├── StickerAgentTask.cs +│ └── Models/ +│ ├── VersionSticker.cs +│ └── StickerWriteResult.cs +└── __Tests/ +``` + +--- + +## Deliverables + +### VersionSticker Model + +```csharp +namespace StellaOps.ReleaseOrchestrator.Evidence.Sticker.Models; + +public sealed record VersionSticker +{ + public required string SchemaVersion { get; init; } = "1.0"; + public required string Release { get; init; } + public required Guid ReleaseId { get; init; } + public required Guid DeploymentId { get; init; } + public required Guid EnvironmentId { get; init; } + public required string EnvironmentName { get; init; } + public required Guid TargetId { get; init; } + public required string TargetName { get; init; } + public required DateTimeOffset DeployedAt { get; init; } + public required ImmutableArray<ComponentSticker> Components { get; init; } + public required Guid EvidenceId { get; init; } + public string? EvidenceDigest { get; init; } + public required StickerMetadata Metadata { get; init; } +} + +public sealed record ComponentSticker +{ + public required string Name { get; init; } + public required string Digest { get; init; } + public string? Tag { get; init; } + public string? SemVer { get; init; } + public string? Image { get; init; } +} + +public sealed record StickerMetadata +{ + public required string Platform { get; init; } = "stella-ops"; + public required string PlatformVersion { get; init; } + public required string DeploymentStrategy { get; init; } + public Guid? PromotionId { get; init; } + public string? 
SourceEnvironment { get; init; } + public ImmutableDictionary<string, string> CustomLabels { get; init; } = ImmutableDictionary<string, string>.Empty; +} +``` + +### IVersionStickerWriter Interface + +```csharp +namespace StellaOps.ReleaseOrchestrator.Evidence.Sticker; + +public interface IVersionStickerWriter +{ + Task<StickerWriteResult> WriteAsync( + Guid deploymentTaskId, + VersionSticker sticker, + CancellationToken ct = default); + + Task<VersionSticker?> ReadAsync( + Guid targetId, + CancellationToken ct = default); + + Task<IReadOnlyList<StickerWriteResult>> WriteAllAsync( + Guid deploymentJobId, + CancellationToken ct = default); + + Task<StickerValidationResult> ValidateAsync( + Guid targetId, + Guid expectedReleaseId, + CancellationToken ct = default); +} + +public sealed record StickerWriteResult +{ + public required Guid TargetId { get; init; } + public required string TargetName { get; init; } + public required bool Success { get; init; } + public string? Error { get; init; } + public string? StickerPath { get; init; } + public DateTimeOffset WrittenAt { get; init; } +} + +public sealed record StickerValidationResult +{ + public required Guid TargetId { get; init; } + public required bool Valid { get; init; } + public required bool StickerExists { get; init; } + public required bool ReleaseMatches { get; init; } + public required bool ComponentsMatch { get; init; } + public Guid? ActualReleaseId { get; init; } + public IReadOnlyList<string>? MismatchedComponents { get; init; } +} +``` + +### VersionStickerGenerator + +```csharp +namespace StellaOps.ReleaseOrchestrator.Evidence.Sticker; + +public sealed class VersionStickerGenerator +{ + private readonly IReleaseManager _releaseManager; + private readonly IDeploymentJobStore _jobStore; + private readonly TimeProvider _timeProvider; + private readonly ILogger<VersionStickerGenerator> _logger; + + private const string PlatformVersion = "1.0.0"; + + public async Task<VersionSticker> GenerateAsync( + DeploymentJob job, + DeploymentTask task, + Guid evidenceId, + CancellationToken ct = default) + { + var release = await _releaseManager.GetAsync(job.ReleaseId, ct) + ?? 
throw new ReleaseNotFoundException(job.ReleaseId); + + var components = release.Components.Select(c => new ComponentSticker + { + Name = c.ComponentName, + Digest = c.Digest, + Tag = c.Tag, + SemVer = c.SemVer, + Image = c.Config.GetValueOrDefault("image") + }).ToImmutableArray(); + + var sticker = new VersionSticker + { + SchemaVersion = "1.0", + Release = release.Name, + ReleaseId = release.Id, + DeploymentId = job.Id, + EnvironmentId = job.EnvironmentId, + EnvironmentName = job.EnvironmentName, + TargetId = task.TargetId, + TargetName = task.TargetName, + DeployedAt = _timeProvider.GetUtcNow(), + Components = components, + EvidenceId = evidenceId, + Metadata = new StickerMetadata + { + Platform = "stella-ops", + PlatformVersion = PlatformVersion, + DeploymentStrategy = job.Strategy.ToString(), + PromotionId = job.PromotionId + } + }; + + _logger.LogDebug( + "Generated version sticker for release {Release} on target {Target}", + release.Name, + task.TargetName); + + return sticker; + } + + public string Serialize(VersionSticker sticker) + { + return JsonSerializer.Serialize(sticker, new JsonSerializerOptions + { + WriteIndented = true, + PropertyNamingPolicy = JsonNamingPolicy.CamelCase, + DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull + }); + } + + public VersionSticker? 
Deserialize(string json) + { + try + { + return JsonSerializer.Deserialize<VersionSticker>(json, new JsonSerializerOptions + { + PropertyNamingPolicy = JsonNamingPolicy.CamelCase + }); + } + catch (JsonException ex) + { + _logger.LogWarning(ex, "Failed to deserialize version sticker"); + return null; + } + } +} +``` + +### VersionStickerWriter Implementation + +```csharp +namespace StellaOps.ReleaseOrchestrator.Evidence.Sticker; + +public sealed class VersionStickerWriter : IVersionStickerWriter +{ + private readonly IDeploymentJobStore _jobStore; + private readonly ITargetExecutor _targetExecutor; + private readonly VersionStickerGenerator _stickerGenerator; + private readonly IEvidenceCollector _evidenceCollector; + private readonly TimeProvider _timeProvider; + private readonly ILogger<VersionStickerWriter> _logger; + + private const string StickerFileName = "stella.version.json"; + + public async Task<StickerWriteResult> WriteAsync( + Guid deploymentTaskId, + VersionSticker sticker, + CancellationToken ct = default) + { + _logger.LogDebug( + "Writing version sticker to target {Target}", + sticker.TargetName); + + try + { + var stickerJson = _stickerGenerator.Serialize(sticker); + + // Create agent task to write sticker + var agentTask = new StickerAgentTask + { + TargetId = sticker.TargetId, + FileName = StickerFileName, + Content = stickerJson, + Location = GetStickerLocation(sticker) + }; + + var result = await _targetExecutor.ExecuteStickerWriteAsync(agentTask, ct); + + if (result.Success) + { + _logger.LogInformation( + "Wrote version sticker to target {Target} at {Path}", + sticker.TargetName, + result.StickerPath); + } + else + { + _logger.LogWarning( + "Failed to write version sticker to target {Target}: {Error}", + sticker.TargetName, + result.Error); + } + + return new StickerWriteResult + { + TargetId = sticker.TargetId, + TargetName = sticker.TargetName, + Success = result.Success, + Error = result.Error, + StickerPath = result.StickerPath, + WrittenAt = _timeProvider.GetUtcNow() + }; + } + catch (Exception 
ex) + { + _logger.LogError(ex, + "Error writing version sticker to target {Target}", + sticker.TargetName); + + return new StickerWriteResult + { + TargetId = sticker.TargetId, + TargetName = sticker.TargetName, + Success = false, + Error = ex.Message, + WrittenAt = _timeProvider.GetUtcNow() + }; + } + } + + public async Task<IReadOnlyList<StickerWriteResult>> WriteAllAsync( + Guid deploymentJobId, + CancellationToken ct = default) + { + var job = await _jobStore.GetAsync(deploymentJobId, ct) + ?? throw new DeploymentJobNotFoundException(deploymentJobId); + + // Collect evidence first + var evidence = await _evidenceCollector.CollectDeploymentEvidenceAsync(deploymentJobId, ct); + + var results = new List<StickerWriteResult>(); + + foreach (var task in job.Tasks) + { + if (task.Status != DeploymentTaskStatus.Completed) + { + results.Add(new StickerWriteResult + { + TargetId = task.TargetId, + TargetName = task.TargetName, + Success = false, + Error = $"Task not completed (status: {task.Status})", + WrittenAt = _timeProvider.GetUtcNow() + }); + continue; + } + + var sticker = await _stickerGenerator.GenerateAsync(job, task, evidence.Id, ct); + var result = await WriteAsync(task.Id, sticker, ct); + results.Add(result); + } + + _logger.LogInformation( + "Wrote version stickers for job {JobId}: {Success}/{Total} succeeded", + deploymentJobId, + results.Count(r => r.Success), + results.Count); + + return results.AsReadOnly(); + } + + public async Task<VersionSticker?> ReadAsync( + Guid targetId, + CancellationToken ct = default) + { + try + { + var agentTask = new StickerReadAgentTask + { + TargetId = targetId, + FileName = StickerFileName + }; + + var result = await _targetExecutor.ExecuteStickerReadAsync(agentTask, ct); + + if (!result.Success || string.IsNullOrEmpty(result.Content)) + { + _logger.LogDebug("No version sticker found on target {TargetId}", targetId); + return null; + } + + return _stickerGenerator.Deserialize(result.Content); + } + catch (Exception ex) + { + _logger.LogWarning(ex, "Error reading version sticker from target 
{TargetId}", targetId); + return null; + } + } + + public async Task<StickerValidationResult> ValidateAsync( + Guid targetId, + Guid expectedReleaseId, + CancellationToken ct = default) + { + var sticker = await ReadAsync(targetId, ct); + + if (sticker is null) + { + return new StickerValidationResult + { + TargetId = targetId, + Valid = false, + StickerExists = false, + ReleaseMatches = false, + ComponentsMatch = false + }; + } + + var releaseMatches = sticker.ReleaseId == expectedReleaseId; + + // If release doesn't match, we can't validate components + if (!releaseMatches) + { + return new StickerValidationResult + { + TargetId = targetId, + Valid = false, + StickerExists = true, + ReleaseMatches = false, + ComponentsMatch = false, + ActualReleaseId = sticker.ReleaseId + }; + } + + // Validate components against actual running containers + var validation = await ValidateComponentsAsync(targetId, sticker.Components, ct); + + return new StickerValidationResult + { + TargetId = targetId, + Valid = validation.AllMatch, + StickerExists = true, + ReleaseMatches = true, + ComponentsMatch = validation.AllMatch, + ActualReleaseId = sticker.ReleaseId, + MismatchedComponents = validation.Mismatches + }; + } + + private async Task<(bool AllMatch, IReadOnlyList<string> Mismatches)> ValidateComponentsAsync( + Guid targetId, + ImmutableArray<ComponentSticker> expectedComponents, + CancellationToken ct) + { + var mismatches = new List<string>(); + + // Query actual container digests from target + var actualContainers = await _targetExecutor.GetRunningContainersAsync(targetId, ct); + + foreach (var expected in expectedComponents) + { + var actual = actualContainers.FirstOrDefault(c => c.Name == expected.Name); + + if (actual is null) + { + mismatches.Add($"{expected.Name}: not running"); + } + else if (actual.Digest != expected.Digest) + { + mismatches.Add($"{expected.Name}: digest mismatch (expected {expected.Digest[..16]}, got {actual.Digest[..16]})"); + } + } + + return (mismatches.Count == 0, mismatches.AsReadOnly()); + } + + 
private static string GetStickerLocation(VersionSticker sticker) + { + // Default to /var/lib/stella-agent/<deploymentId>/ + return $"/var/lib/stella-agent/{sticker.DeploymentId}/"; + } +} +``` + +### StickerAgentTask + +```csharp +namespace StellaOps.ReleaseOrchestrator.Evidence.Sticker; + +public sealed record StickerAgentTask +{ + public required Guid TargetId { get; init; } + public required string FileName { get; init; } + public required string Content { get; init; } + public required string Location { get; init; } +} + +public sealed record StickerReadAgentTask +{ + public required Guid TargetId { get; init; } + public required string FileName { get; init; } + public string? Location { get; init; } +} + +public sealed record StickerWriteAgentResult +{ + public required bool Success { get; init; } + public string? Error { get; init; } + public string? StickerPath { get; init; } +} + +public sealed record StickerReadAgentResult +{ + public required bool Success { get; init; } + public string? Content { get; init; } + public string? 
Error { get; init; } +} +``` + +--- + +## Acceptance Criteria + +- [ ] Generate version sticker with all components +- [ ] Serialize sticker as valid JSON +- [ ] Write sticker to target via agent +- [ ] Write stickers for all completed tasks +- [ ] Read sticker from target +- [ ] Validate sticker against expected release +- [ ] Validate components against running containers +- [ ] Detect digest mismatches +- [ ] Unit test coverage >=85% + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 107_002 Target Executor | Internal | TODO | +| 109_001 Evidence Collector | Internal | TODO | +| 108_001 Agent Core Runtime | Internal | TODO | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| IVersionStickerWriter | TODO | | +| VersionStickerWriter | TODO | | +| VersionStickerGenerator | TODO | | +| VersionSticker model | TODO | | +| StickerAgentTask | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_109_004_RELEVI_audit_exporter.md b/docs/implplan/SPRINT_20260110_109_004_RELEVI_audit_exporter.md new file mode 100644 index 000000000..8739750da --- /dev/null +++ b/docs/implplan/SPRINT_20260110_109_004_RELEVI_audit_exporter.md @@ -0,0 +1,706 @@ +# SPRINT: Audit Exporter + +> **Sprint ID:** 109_004 +> **Module:** RELEVI +> **Phase:** 9 - Evidence & Audit +> **Status:** TODO +> **Parent:** [109_000_INDEX](SPRINT_20260110_109_000_INDEX_evidence_audit.md) + +--- + +## Overview + +Implement the Audit Exporter for generating compliance reports in multiple formats from signed evidence packets. 
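Of the four formats, CSV is the one where naive string joining silently corrupts rows, since evidence fields (release names, error messages) can contain commas, quotes, or newlines. A quick Python sketch of the RFC 4180 quote-doubling rule that the CSV exporter in this sprint applies (the function name and sample values are illustrative):

```python
def escape_csv(value: str) -> str:
    """Quote a field containing a comma, quote, or newline; double embedded quotes (RFC 4180)."""
    if value and any(ch in value for ch in ',"\n'):
        return '"' + value.replace('"', '""') + '"'
    return value

# A row of evidence fields, one of which contains both a comma and quotes.
row = [escape_csv(v) for v in ["evt-1", "Deployment", 'release "v2", hotfix']]
assert ",".join(row) == 'evt-1,Deployment,"release ""v2"", hotfix"'
```

Plain fields pass through untouched, so typical rows stay byte-identical to the raw values and only hostile fields pay the quoting cost.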
+ +### Objectives + +- Export to JSON for machine processing +- Export to PDF for human-readable reports +- Export to CSV for spreadsheet analysis +- Export to SLSA provenance format +- Batch export for audit periods + +### Working Directory + +``` +src/ReleaseOrchestrator/ +├── __Libraries/ +│ └── StellaOps.ReleaseOrchestrator.Evidence/ +│ └── Export/ +│ ├── IAuditExporter.cs +│ ├── AuditExporter.cs +│ ├── Exporters/ +│ │ ├── JsonExporter.cs +│ │ ├── PdfExporter.cs +│ │ ├── CsvExporter.cs +│ │ └── SlsaExporter.cs +│ └── Models/ +│ ├── AuditExportRequest.cs +│ └── ExportFormat.cs +└── __Tests/ +``` + +--- + +## Deliverables + +### IAuditExporter Interface + +```csharp +namespace StellaOps.ReleaseOrchestrator.Evidence.Export; + +public interface IAuditExporter +{ + Task<ExportResult> ExportAsync( + AuditExportRequest request, + CancellationToken ct = default); + + IReadOnlyList<ExportFormat> SupportedFormats { get; } + + Task<Stream> ExportToStreamAsync( + AuditExportRequest request, + CancellationToken ct = default); +} + +public sealed record AuditExportRequest +{ + public required ExportFormat Format { get; init; } + public Guid? TenantId { get; init; } + public Guid? EnvironmentId { get; init; } + public DateTimeOffset? FromDate { get; init; } + public DateTimeOffset? ToDate { get; init; } + public IReadOnlyList<Guid>? EvidenceIds { get; init; } + public IReadOnlyList<EvidenceType>? Types { get; init; } + public bool IncludeVerification { get; init; } = true; + public bool IncludeSignatures { get; init; } = false; + public string? 
ReportTitle { get; init; } +} + +public enum ExportFormat +{ + Json, + Pdf, + Csv, + Slsa +} + +public sealed record ExportResult +{ + public required bool Success { get; init; } + public required ExportFormat Format { get; init; } + public required string FileName { get; init; } + public required string ContentType { get; init; } + public required long SizeBytes { get; init; } + public required int EvidenceCount { get; init; } + public required DateTimeOffset GeneratedAt { get; init; } + public Stream? Content { get; init; } + public string? Error { get; init; } +} +``` + +### AuditExporter Implementation + +```csharp +namespace StellaOps.ReleaseOrchestrator.Evidence.Export; + +public sealed class AuditExporter : IAuditExporter +{ + private readonly ISignedEvidenceStore _evidenceStore; + private readonly IEvidenceSigner _evidenceSigner; + private readonly IEnumerable _exporters; + private readonly TimeProvider _timeProvider; + private readonly ILogger _logger; + + public IReadOnlyList SupportedFormats => + _exporters.Select(e => e.Format).ToList().AsReadOnly(); + + public async Task ExportAsync( + AuditExportRequest request, + CancellationToken ct = default) + { + _logger.LogInformation( + "Starting audit export: format={Format}, from={From}, to={To}", + request.Format, + request.FromDate, + request.ToDate); + + var exporter = _exporters.FirstOrDefault(e => e.Format == request.Format) + ?? throw new UnsupportedExportFormatException(request.Format); + + try + { + // Query evidence + var evidence = await QueryEvidenceAsync(request, ct); + + if (evidence.Count == 0) + { + return new ExportResult + { + Success = false, + Format = request.Format, + FileName = "", + ContentType = "", + SizeBytes = 0, + EvidenceCount = 0, + GeneratedAt = _timeProvider.GetUtcNow(), + Error = "No evidence found matching the criteria" + }; + } + + // Verify evidence if requested + var verificationResults = request.IncludeVerification + ? 
await VerifyAllAsync(evidence, ct) + : null; + + // Export + var stream = await exporter.ExportAsync(evidence, verificationResults, request, ct); + + var fileName = GenerateFileName(request); + var contentType = exporter.ContentType; + + _logger.LogInformation( + "Audit export completed: {Count} evidence packets, {Size} bytes", + evidence.Count, + stream.Length); + + return new ExportResult + { + Success = true, + Format = request.Format, + FileName = fileName, + ContentType = contentType, + SizeBytes = stream.Length, + EvidenceCount = evidence.Count, + GeneratedAt = _timeProvider.GetUtcNow(), + Content = stream + }; + } + catch (Exception ex) + { + _logger.LogError(ex, "Audit export failed"); + + return new ExportResult + { + Success = false, + Format = request.Format, + FileName = "", + ContentType = "", + SizeBytes = 0, + EvidenceCount = 0, + GeneratedAt = _timeProvider.GetUtcNow(), + Error = ex.Message + }; + } + } + + public async Task ExportToStreamAsync( + AuditExportRequest request, + CancellationToken ct = default) + { + var result = await ExportAsync(request, ct); + + if (!result.Success || result.Content is null) + { + throw new ExportFailedException(result.Error ?? 
"Unknown error"); + } + + return result.Content; + } + + private async Task> QueryEvidenceAsync( + AuditExportRequest request, + CancellationToken ct) + { + if (request.EvidenceIds?.Count > 0) + { + var packets = new List(); + foreach (var id in request.EvidenceIds) + { + var packet = await _evidenceStore.GetAsync(id, ct); + if (packet is not null) + { + packets.Add(packet); + } + } + return packets.AsReadOnly(); + } + + var filter = new SignedEvidenceQueryFilter + { + TenantId = request.TenantId, + FromDate = request.FromDate, + ToDate = request.ToDate, + Types = request.Types + }; + + return await _evidenceStore.ListAsync(filter, ct); + } + + private async Task> VerifyAllAsync( + IReadOnlyList evidence, + CancellationToken ct) + { + var results = new Dictionary(); + + foreach (var packet in evidence) + { + var result = await _evidenceSigner.VerifyWithDetailsAsync(packet, ct); + results[packet.Id] = result; + } + + return results.AsReadOnly(); + } + + private string GenerateFileName(AuditExportRequest request) + { + var timestamp = _timeProvider.GetUtcNow().ToString("yyyyMMdd-HHmmss", CultureInfo.InvariantCulture); + var extension = request.Format switch + { + ExportFormat.Json => "json", + ExportFormat.Pdf => "pdf", + ExportFormat.Csv => "csv", + ExportFormat.Slsa => "slsa.json", + _ => "dat" + }; + + return $"audit-export-{timestamp}.{extension}"; + } +} +``` + +### IFormatExporter Interface + +```csharp +namespace StellaOps.ReleaseOrchestrator.Evidence.Export; + +public interface IFormatExporter +{ + ExportFormat Format { get; } + string ContentType { get; } + + Task ExportAsync( + IReadOnlyList evidence, + IReadOnlyDictionary? 
verificationResults, + AuditExportRequest request, + CancellationToken ct = default); +} +``` + +### JsonExporter + +```csharp +namespace StellaOps.ReleaseOrchestrator.Evidence.Export.Exporters; + +public sealed class JsonExporter : IFormatExporter +{ + public ExportFormat Format => ExportFormat.Json; + public string ContentType => "application/json"; + + public async Task ExportAsync( + IReadOnlyList evidence, + IReadOnlyDictionary? verificationResults, + AuditExportRequest request, + CancellationToken ct = default) + { + var export = new JsonAuditExport + { + SchemaVersion = "1.0", + GeneratedAt = DateTimeOffset.UtcNow.ToString("O"), + ReportTitle = request.ReportTitle ?? "Audit Export", + Query = new QueryInfo + { + FromDate = request.FromDate?.ToString("O"), + ToDate = request.ToDate?.ToString("O"), + TenantId = request.TenantId?.ToString(), + EnvironmentId = request.EnvironmentId?.ToString(), + Types = request.Types?.Select(t => t.ToString()).ToList() + }, + Summary = new ExportSummary + { + TotalEvidence = evidence.Count, + ByType = evidence.GroupBy(e => e.Content.Type) + .ToDictionary(g => g.Key.ToString(), g => g.Count()), + VerificationSummary = verificationResults is not null + ? new VerificationSummary + { + TotalVerified = verificationResults.Count, + AllValid = verificationResults.Values.All(v => v.IsValid), + FailedCount = verificationResults.Values.Count(v => !v.IsValid) + } + : null + }, + Evidence = evidence.Select(e => new EvidenceEntry + { + Id = e.Id.ToString(), + Type = e.Content.Type.ToString(), + SubjectId = e.Content.SubjectId.ToString(), + CollectedAt = e.Content.CollectedAt.ToString("O"), + SignedAt = e.SignedAt.ToString("O"), + ContentHash = request.IncludeSignatures ? e.ContentHash : null, + Signature = request.IncludeSignatures ? e.Signature : null, + SignatureAlgorithm = request.IncludeSignatures ? 
e.SignatureAlgorithm : null, + SignerKeyRef = e.SignerKeyRef, + Verification = verificationResults?.TryGetValue(e.Id, out var v) == true + ? new VerificationEntry + { + IsValid = v.IsValid, + SignatureValid = v.SignatureValid, + ContentHashValid = v.ContentHashValid, + Error = v.Error + } + : null, + Content = e.Content + }).ToList() + }; + + var stream = new MemoryStream(); + await JsonSerializer.SerializeAsync(stream, export, new JsonSerializerOptions + { + WriteIndented = true, + PropertyNamingPolicy = JsonNamingPolicy.CamelCase, + DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull + }, ct); + + stream.Position = 0; + return stream; + } +} + +// JSON export models +public sealed class JsonAuditExport +{ + public required string SchemaVersion { get; init; } + public required string GeneratedAt { get; init; } + public required string ReportTitle { get; init; } + public required QueryInfo Query { get; init; } + public required ExportSummary Summary { get; init; } + public required IReadOnlyList Evidence { get; init; } +} + +public sealed class QueryInfo +{ + public string? FromDate { get; init; } + public string? ToDate { get; init; } + public string? TenantId { get; init; } + public string? EnvironmentId { get; init; } + public IReadOnlyList? Types { get; init; } +} + +public sealed class ExportSummary +{ + public required int TotalEvidence { get; init; } + public required IReadOnlyDictionary ByType { get; init; } + public VerificationSummary? 
VerificationSummary { get; init; } +} + +public sealed class VerificationSummary +{ + public required int TotalVerified { get; init; } + public required bool AllValid { get; init; } + public required int FailedCount { get; init; } +} + +public sealed class EvidenceEntry +{ + public required string Id { get; init; } + public required string Type { get; init; } + public required string SubjectId { get; init; } + public required string CollectedAt { get; init; } + public required string SignedAt { get; init; } + public string? ContentHash { get; init; } + public string? Signature { get; init; } + public string? SignatureAlgorithm { get; init; } + public required string SignerKeyRef { get; init; } + public VerificationEntry? Verification { get; init; } + public required EvidencePacket Content { get; init; } +} + +public sealed class VerificationEntry +{ + public required bool IsValid { get; init; } + public required bool SignatureValid { get; init; } + public required bool ContentHashValid { get; init; } + public string? Error { get; init; } +} +``` + +### CsvExporter + +```csharp +namespace StellaOps.ReleaseOrchestrator.Evidence.Export.Exporters; + +public sealed class CsvExporter : IFormatExporter +{ + public ExportFormat Format => ExportFormat.Csv; + public string ContentType => "text/csv"; + + public Task ExportAsync( + IReadOnlyList evidence, + IReadOnlyDictionary? verificationResults, + AuditExportRequest request, + CancellationToken ct = default) + { + var stream = new MemoryStream(); + using var writer = new StreamWriter(stream, Encoding.UTF8, leaveOpen: true); + + // Write header + writer.WriteLine("EvidenceId,Type,SubjectId,ReleaseName,EnvironmentName,CollectedAt,SignedAt,SignerKeyRef,IsValid,VerificationError"); + + // Write data rows + foreach (var packet in evidence) + { + var verification = verificationResults?.TryGetValue(packet.Id, out var v) == true ? 
v : null; + + var row = new[] + { + packet.Id.ToString(), + packet.Content.Type.ToString(), + packet.Content.SubjectId.ToString(), + EscapeCsv(packet.Content.Content.Release?.ReleaseName ?? ""), + EscapeCsv(packet.Content.Content.Deployment?.DeploymentJobId.ToString() ?? + packet.Content.Content.Promotion?.TargetEnvironmentName ?? ""), + packet.Content.CollectedAt.ToString("O"), + packet.SignedAt.ToString("O"), + packet.SignerKeyRef, + verification?.IsValid.ToString() ?? "", + EscapeCsv(verification?.Error ?? "") + }; + + writer.WriteLine(string.Join(",", row)); + } + + writer.Flush(); + stream.Position = 0; + + return Task.FromResult(stream); + } + + private static string EscapeCsv(string value) + { + if (string.IsNullOrEmpty(value)) + return ""; + + if (value.Contains(',') || value.Contains('"') || value.Contains('\n')) + { + return $"\"{value.Replace("\"", "\"\"")}\""; + } + + return value; + } +} +``` + +### SlsaExporter + +```csharp +namespace StellaOps.ReleaseOrchestrator.Evidence.Export.Exporters; + +public sealed class SlsaExporter : IFormatExporter +{ + public ExportFormat Format => ExportFormat.Slsa; + public string ContentType => "application/vnd.in-toto+json"; + + public async Task ExportAsync( + IReadOnlyList evidence, + IReadOnlyDictionary? 
verificationResults, + AuditExportRequest request, + CancellationToken ct = default) + { + // Export as SLSA Provenance v1.0 format + var provenances = evidence + .Where(e => e.Content.Type == EvidenceType.Deployment) + .Select(e => BuildSlsaProvenance(e)) + .ToList(); + + var stream = new MemoryStream(); + + // Write as NDJSON (one provenance per line) + using var writer = new StreamWriter(stream, Encoding.UTF8, leaveOpen: true); + + foreach (var provenance in provenances) + { + var json = JsonSerializer.Serialize(provenance, new JsonSerializerOptions + { + PropertyNamingPolicy = JsonNamingPolicy.CamelCase, + DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull + }); + await writer.WriteLineAsync(json); + } + + await writer.FlushAsync(ct); + stream.Position = 0; + + return stream; + } + + private static SlsaProvenance BuildSlsaProvenance(SignedEvidencePacket packet) + { + var deployment = packet.Content.Content.Deployment; + var release = packet.Content.Content.Release; + + return new SlsaProvenance + { + Type = "https://in-toto.io/Statement/v1", + Subject = release?.Components.Select(c => new SlsaSubject + { + Name = c.ComponentName, + Digest = new Dictionary + { + ["sha256"] = c.Digest.Replace("sha256:", "") + } + }).ToList() ?? 
[],
+            PredicateType = "https://slsa.dev/provenance/v1",
+            Predicate = new SlsaPredicate
+            {
+                BuildDefinition = new SlsaBuildDefinition
+                {
+                    BuildType = "https://stella-ops.io/DeploymentProvenanceV1",
+                    ExternalParameters = new Dictionary<string, object?>
+                    {
+                        ["deployment"] = new
+                        {
+                            jobId = deployment?.DeploymentJobId.ToString(),
+                            strategy = deployment?.Strategy,
+                            environment = packet.Content.Content.Promotion?.TargetEnvironmentName
+                        }
+                    },
+                    InternalParameters = new Dictionary<string, object?>
+                    {
+                        ["evidenceId"] = packet.Id.ToString(),
+                        ["collectedAt"] = packet.Content.CollectedAt.ToString("O")
+                    },
+                    ResolvedDependencies = release?.Components.Select(c => new SlsaResourceDescriptor
+                    {
+                        Name = c.ComponentName,
+                        Uri = $"oci://{c.ComponentName}@{c.Digest}",
+                        Digest = new Dictionary<string, string>
+                        {
+                            ["sha256"] = c.Digest.Replace("sha256:", "")
+                        }
+                    }).ToList() ?? []
+                },
+                RunDetails = new SlsaRunDetails
+                {
+                    Builder = new SlsaBuilder
+                    {
+                        Id = "https://stella-ops.io/ReleaseOrchestrator",
+                        Version = new Dictionary<string, string>
+                        {
+                            ["stella-ops"] = packet.Content.CollectorVersion
+                        }
+                    },
+                    Metadata = new SlsaMetadata
+                    {
+                        InvocationId = packet.Content.SubjectId.ToString(),
+                        StartedOn = deployment?.StartedAt.ToString("O"),
+                        FinishedOn = deployment?.CompletedAt?.ToString("O")
+                    }
+                }
+            }
+        };
+    }
+}
+
+// SLSA Provenance models
+public sealed class SlsaProvenance
+{
+    [JsonPropertyName("_type")]
+    public required string Type { get; init; }
+    public required IReadOnlyList<SlsaSubject> Subject { get; init; }
+    public required string PredicateType { get; init; }
+    public required SlsaPredicate Predicate { get; init; }
+}
+
+public sealed class SlsaSubject
+{
+    public required string Name { get; init; }
+    public required IReadOnlyDictionary<string, string> Digest { get; init; }
+}
+
+public sealed class SlsaPredicate
+{
+    public required SlsaBuildDefinition BuildDefinition { get; init; }
+    public required SlsaRunDetails RunDetails { get; init; }
+}
+
+public sealed class SlsaBuildDefinition
+{
+    public required string BuildType { get; init; }
+    public required IReadOnlyDictionary<string, object?> ExternalParameters { get; init; }
+    public required IReadOnlyDictionary<string, object?> InternalParameters { get; init; }
+    public required IReadOnlyList<SlsaResourceDescriptor> ResolvedDependencies { get; init; }
+}
+
+public sealed class SlsaResourceDescriptor
+{
+    public required string Name { get; init; }
+    public required string Uri { get; init; }
+    public required IReadOnlyDictionary<string, string> Digest { get; init; }
+}
+
+public sealed class SlsaRunDetails
+{
+    public required SlsaBuilder Builder { get; init; }
+    public required SlsaMetadata Metadata { get; init; }
+}
+
+public sealed class SlsaBuilder
+{
+    public required string Id { get; init; }
+    public required IReadOnlyDictionary<string, string> Version { get; init; }
+}
+
+public sealed class SlsaMetadata
+{
+    public required string InvocationId { get; init; }
+    public string? StartedOn { get; init; }
+    public string? FinishedOn { get; init; }
+}
+```
+
+---
+
+## Acceptance Criteria
+
+- [ ] Export evidence as JSON
+- [ ] Export evidence as PDF
+- [ ] Export evidence as CSV
+- [ ] Export evidence as SLSA provenance
+- [ ] Include verification results
+- [ ] Filter by date range
+- [ ] Filter by evidence type
+- [ ] Generate meaningful file names
+- [ ] SLSA format compliant with spec
+- [ ] Unit test coverage >=85%
+
+---
+
+## Dependencies
+
+| Dependency | Type | Status |
+|------------|------|--------|
+| 109_001 Evidence Collector | Internal | TODO |
+| 109_002 Evidence Signer | Internal | TODO |
+| QuestPDF | NuGet | Available |
+
+---
+
+## Delivery Tracker
+
+| Deliverable | Status | Notes |
+|-------------|--------|-------|
+| IAuditExporter | TODO | |
+| AuditExporter | TODO | |
+| JsonExporter | TODO | |
+| PdfExporter | TODO | |
+| CsvExporter | TODO | |
+| SlsaExporter | TODO | |
+| Unit tests | TODO | |
+
+---
+
+## Execution Log
+
+| Date | Entry |
+|------|-------|
+| 10-Jan-2026 | Sprint created |
diff --git a/docs/implplan/SPRINT_20260110_110_000_INDEX_progressive_delivery.md
b/docs/implplan/SPRINT_20260110_110_000_INDEX_progressive_delivery.md new file mode 100644 index 000000000..a63d4def6 --- /dev/null +++ b/docs/implplan/SPRINT_20260110_110_000_INDEX_progressive_delivery.md @@ -0,0 +1,250 @@ +# SPRINT INDEX: Phase 10 - Progressive Delivery + +> **Epic:** Release Orchestrator +> **Phase:** 10 - Progressive Delivery +> **Batch:** 110 +> **Status:** TODO +> **Parent:** [100_000_INDEX](SPRINT_20260110_100_000_INDEX_release_orchestrator.md) + +--- + +## Overview + +Phase 10 implements Progressive Delivery - A/B releases, canary deployments, and traffic routing for gradual rollouts. + +### Objectives + +- A/B release manager for parallel versions +- Traffic router framework abstraction +- Canary controller for gradual promotion +- Router plugin for Nginx (reference implementation) + +--- + +## Sprint Structure + +| Sprint ID | Title | Module | Status | Dependencies | +|-----------|-------|--------|--------|--------------| +| 110_001 | A/B Release Manager | PROGDL | TODO | 107_005 | +| 110_002 | Traffic Router Framework | PROGDL | TODO | 110_001 | +| 110_003 | Canary Controller | PROGDL | TODO | 110_002 | +| 110_004 | Router Plugin - Nginx | PROGDL | TODO | 110_002 | + +--- + +## Architecture + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ PROGRESSIVE DELIVERY │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ A/B RELEASE MANAGER (110_001) │ │ +│ │ │ │ +│ │ ┌─────────────────────────────────────────────────────────────────┐ │ │ +│ │ │ A/B Release │ │ │ +│ │ │ │ │ │ +│ │ │ Control (current): sha256:abc123 ──► 80% traffic │ │ │ +│ │ │ Treatment (new): sha256:def456 ──► 20% traffic │ │ │ +│ │ │ │ │ │ +│ │ │ Status: active │ │ │ +│ │ │ Decision: pending │ │ │ +│ │ └─────────────────────────────────────────────────────────────────┘ │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ 
┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ TRAFFIC ROUTER FRAMEWORK (110_002) │ │ +│ │ │ │ +│ │ ITrafficRouter │ │ +│ │ ├── SetWeights(control: 80, treatment: 20) │ │ +│ │ ├── SetHeaderRouting(x-canary: true → treatment) │ │ +│ │ ├── SetCookieRouting(ab_group: B → treatment) │ │ +│ │ └── GetCurrentRouting() → RoutingConfig │ │ +│ │ │ │ +│ │ Implementations: │ │ +│ │ ├── NginxRouter │ │ +│ │ ├── HaproxyRouter │ │ +│ │ ├── TraefikRouter │ │ +│ │ └── AwsAlbRouter │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ CANARY CONTROLLER (110_003) │ │ +│ │ │ │ +│ │ Canary Progression: │ │ +│ │ │ │ +│ │ Step 1: 5% ──────┐ │ │ +│ │ Step 2: 10% ─────┤ │ │ +│ │ Step 3: 25% ─────┼──► Auto-advance if metrics pass │ │ +│ │ Step 4: 50% ─────┤ │ │ +│ │ Step 5: 100% ────┘ │ │ +│ │ │ │ +│ │ Rollback triggers: │ │ +│ │ ├── Error rate > threshold │ │ +│ │ ├── Latency P99 > threshold │ │ +│ │ └── Manual intervention │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ NGINX ROUTER PLUGIN (110_004) │ │ +│ │ │ │ +│ │ upstream control { │ │ +│ │ server app-v1:8080 weight=80; │ │ +│ │ } │ │ +│ │ upstream treatment { │ │ +│ │ server app-v2:8080 weight=20; │ │ +│ │ } │ │ +│ │ │ │ +│ │ # Header-based routing │ │ +│ │ if ($http_x_canary = "true") { │ │ +│ │ proxy_pass http://treatment; │ │ +│ │ } │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +--- + +## Deliverables Summary + +### 110_001: A/B Release Manager + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `IAbReleaseManager` | Interface | A/B operations | +| `AbReleaseManager` | Class | Implementation | +| 
`AbRelease` | Model | A/B release entity | +| `AbDecision` | Model | Promotion decision | + +### 110_002: Traffic Router Framework + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `ITrafficRouter` | Interface | Router abstraction | +| `RoutingConfig` | Model | Current routing state | +| `WeightedRouting` | Strategy | Percentage-based | +| `HeaderRouting` | Strategy | Header-based | +| `CookieRouting` | Strategy | Cookie-based | + +### 110_003: Canary Controller + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `ICanaryController` | Interface | Canary operations | +| `CanaryController` | Class | Implementation | +| `CanaryStep` | Model | Progression step | +| `CanaryMetrics` | Model | Health metrics | +| `AutoRollback` | Class | Automatic rollback | + +### 110_004: Router Plugin - Nginx + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `NginxRouter` | Router | Nginx implementation | +| `NginxConfigGenerator` | Class | Config generation | +| `NginxReloader` | Class | Hot reload | +| `NginxMetrics` | Class | Status parsing | + +--- + +## Key Interfaces + +```csharp +public interface IAbReleaseManager +{ + Task CreateAsync(CreateAbReleaseRequest request, CancellationToken ct); + Task UpdateWeightsAsync(Guid id, int controlWeight, int treatmentWeight, CancellationToken ct); + Task PromoteAsync(Guid id, AbDecision decision, CancellationToken ct); + Task RollbackAsync(Guid id, CancellationToken ct); + Task GetAsync(Guid id, CancellationToken ct); +} + +public interface ITrafficRouter +{ + string RouterType { get; } + Task ApplyAsync(RoutingConfig config, CancellationToken ct); + Task GetCurrentAsync(CancellationToken ct); + Task HealthCheckAsync(CancellationToken ct); +} + +public interface ICanaryController +{ + Task StartAsync(Guid releaseId, CanaryConfig config, CancellationToken ct); + Task AdvanceAsync(Guid canaryId, CancellationToken ct); + Task RollbackAsync(Guid canaryId, 
string reason, CancellationToken ct); + Task CompleteAsync(Guid canaryId, CancellationToken ct); +} +``` + +--- + +## Canary Flow + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ CANARY FLOW │ +│ │ +│ ┌─────────────┐ │ +│ │ Start │ │ +│ │ Canary │ │ +│ └──────┬──────┘ │ +│ │ │ +│ ▼ │ +│ ┌─────────────┐ metrics ┌─────────────┐ pass ┌─────────────┐ │ +│ │ Step 1 │ ────────────►│ Analyze │ ──────────►│ Step 2 │ │ +│ │ 5% │ │ Metrics │ │ 10% │ │ +│ └─────────────┘ └──────┬──────┘ └──────┬──────┘ │ +│ │ │ │ +│ │ fail │ │ +│ ▼ ▼ │ +│ ┌─────────────┐ ... continue │ +│ │ Rollback │ │ │ +│ │ to Control │ │ │ +│ └─────────────┘ │ │ +│ ▼ │ +│ ┌─────────────┐ │ +│ │ Step N │ │ +│ │ 100% │ │ +│ └──────┬──────┘ │ +│ │ │ +│ ▼ │ +│ ┌─────────────┐ │ +│ │ Complete │ │ +│ │ Promote │ │ +│ └─────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +--- + +## Dependencies + +| Module | Purpose | +|--------|---------| +| 107_005 Deployment Strategies | Base deployment | +| 107_002 Target Executor | Deploy versions | +| Telemetry | Metrics collection | + +--- + +## Acceptance Criteria + +- [ ] A/B release created +- [ ] Traffic weights applied +- [ ] Header-based routing works +- [ ] Canary progression advances +- [ ] Auto-rollback on metrics failure +- [ ] Nginx config generated +- [ ] Nginx hot reload works +- [ ] Evidence captured for A/B +- [ ] Unit test coverage ≥80% + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Phase 10 index created | diff --git a/docs/implplan/SPRINT_20260110_110_001_PROGDL_ab_release_manager.md b/docs/implplan/SPRINT_20260110_110_001_PROGDL_ab_release_manager.md new file mode 100644 index 000000000..f5b594ea6 --- /dev/null +++ b/docs/implplan/SPRINT_20260110_110_001_PROGDL_ab_release_manager.md @@ -0,0 +1,613 @@ +# SPRINT: A/B Release Manager + +> **Sprint ID:** 110_001 +> **Module:** PROGDL +> **Phase:** 10 - Progressive Delivery +> 
**Status:** TODO +> **Parent:** [110_000_INDEX](SPRINT_20260110_110_000_INDEX_progressive_delivery.md) + +--- + +## Overview + +Implement the A/B Release Manager for running parallel versions with traffic splitting. + +### Objectives + +- Create A/B releases with control and treatment versions +- Manage traffic weight distribution +- Track A/B experiment metrics +- Promote or rollback based on results + +### Working Directory + +``` +src/ReleaseOrchestrator/ +├── __Libraries/ +│ └── StellaOps.ReleaseOrchestrator.Progressive/ +│ ├── AbRelease/ +│ │ ├── IAbReleaseManager.cs +│ │ ├── AbReleaseManager.cs +│ │ ├── AbReleaseStore.cs +│ │ └── AbMetricsCollector.cs +│ └── Models/ +│ ├── AbRelease.cs +│ ├── AbDecision.cs +│ └── AbMetrics.cs +└── __Tests/ +``` + +--- + +## Deliverables + +### AbRelease Model + +```csharp +namespace StellaOps.ReleaseOrchestrator.Progressive.Models; + +public sealed record AbRelease +{ + public required Guid Id { get; init; } + public required Guid TenantId { get; init; } + public required Guid EnvironmentId { get; init; } + public required string EnvironmentName { get; init; } + public required AbVersion Control { get; init; } + public required AbVersion Treatment { get; init; } + public required int ControlWeight { get; init; } + public required int TreatmentWeight { get; init; } + public required AbReleaseStatus Status { get; init; } + public required DateTimeOffset CreatedAt { get; init; } + public required Guid CreatedBy { get; init; } + public DateTimeOffset? StartedAt { get; init; } + public DateTimeOffset? CompletedAt { get; init; } + public AbDecision? Decision { get; init; } + public AbMetrics? 
LatestMetrics { get; init; }
+}
+
+public sealed record AbVersion
+{
+    public required Guid ReleaseId { get; init; }
+    public required string ReleaseName { get; init; }
+    public required string Variant { get; init; } // "control" or "treatment"
+    public required ImmutableArray<AbComponent> Components { get; init; }
+    public required ImmutableArray<Guid> TargetIds { get; init; }
+}
+
+public sealed record AbComponent
+{
+    public required string Name { get; init; }
+    public required string Digest { get; init; }
+    public string? Endpoint { get; init; }
+}
+
+public enum AbReleaseStatus
+{
+    Draft,
+    Deploying,
+    Active,
+    Paused,
+    Promoting,
+    RollingBack,
+    Completed,
+    Failed
+}
+
+public sealed record AbDecision
+{
+    public required AbDecisionType Type { get; init; }
+    public required string Reason { get; init; }
+    public required Guid DecidedBy { get; init; }
+    public required DateTimeOffset DecidedAt { get; init; }
+    public AbMetrics? MetricsAtDecision { get; init; }
+}
+
+public enum AbDecisionType
+{
+    PromoteTreatment,
+    KeepControl,
+    ExtendExperiment
+}
+
+public sealed record AbMetrics
+{
+    public required DateTimeOffset CollectedAt { get; init; }
+    public required AbVariantMetrics ControlMetrics { get; init; }
+    public required AbVariantMetrics TreatmentMetrics { get; init; }
+    public double? StatisticalSignificance { get; init; }
+}
+
+public sealed record AbVariantMetrics
+{
+    public required long RequestCount { get; init; }
+    public required double ErrorRate { get; init; }
+    public required double LatencyP50 { get; init; }
+    public required double LatencyP95 { get; init; }
+    public required double LatencyP99 { get; init; }
+    public ImmutableDictionary<string, double> CustomMetrics { get; init; } = ImmutableDictionary<string, double>.Empty;
+}
+```
+
+### IAbReleaseManager Interface
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Progressive.AbRelease;
+
+public interface IAbReleaseManager
+{
+    Task<AbRelease> CreateAsync(
+        CreateAbReleaseRequest request,
+        CancellationToken ct = default);
+
+    Task<AbRelease> StartAsync(
+        Guid id,
+        CancellationToken ct = default);
+
+    Task<AbRelease> UpdateWeightsAsync(
+        Guid id,
+        int controlWeight,
+        int treatmentWeight,
+        CancellationToken ct = default);
+
+    Task<AbRelease> PauseAsync(
+        Guid id,
+        string? reason = null,
+        CancellationToken ct = default);
+
+    Task<AbRelease> ResumeAsync(
+        Guid id,
+        CancellationToken ct = default);
+
+    Task<AbRelease> PromoteAsync(
+        Guid id,
+        AbDecision decision,
+        CancellationToken ct = default);
+
+    Task<AbRelease> RollbackAsync(
+        Guid id,
+        string reason,
+        CancellationToken ct = default);
+
+    Task<AbRelease?> GetAsync(
+        Guid id,
+        CancellationToken ct = default);
+
+    Task<IReadOnlyList<AbRelease>> ListAsync(
+        AbReleaseFilter? filter = null,
+        CancellationToken ct = default);
+
+    Task<AbMetrics> GetLatestMetricsAsync(
+        Guid id,
+        CancellationToken ct = default);
+}
+
+public sealed record CreateAbReleaseRequest
+{
+    public required Guid EnvironmentId { get; init; }
+    public required Guid ControlReleaseId { get; init; }
+    public required Guid TreatmentReleaseId { get; init; }
+    public int InitialControlWeight { get; init; } = 90;
+    public int InitialTreatmentWeight { get; init; } = 10;
+    public IReadOnlyList<Guid>? ControlTargetIds { get; init; }
+    public IReadOnlyList<Guid>?
TreatmentTargetIds { get; init; } + public AbRoutingMode RoutingMode { get; init; } = AbRoutingMode.Weighted; +} + +public enum AbRoutingMode +{ + Weighted, // Random distribution by weight + HeaderBased, // Route by header value + CookieBased, // Route by cookie value + UserIdBased // Route by user ID hash +} + +public sealed record AbReleaseFilter +{ + public Guid? EnvironmentId { get; init; } + public AbReleaseStatus? Status { get; init; } + public DateTimeOffset? FromDate { get; init; } + public DateTimeOffset? ToDate { get; init; } +} +``` + +### AbReleaseManager Implementation + +```csharp +namespace StellaOps.ReleaseOrchestrator.Progressive.AbRelease; + +public sealed class AbReleaseManager : IAbReleaseManager +{ + private readonly IAbReleaseStore _store; + private readonly IReleaseManager _releaseManager; + private readonly IDeployOrchestrator _deployOrchestrator; + private readonly ITrafficRouter _trafficRouter; + private readonly AbMetricsCollector _metricsCollector; + private readonly IEventPublisher _eventPublisher; + private readonly TimeProvider _timeProvider; + private readonly IGuidGenerator _guidGenerator; + private readonly ITenantContext _tenantContext; + private readonly IUserContext _userContext; + private readonly ILogger _logger; + + public async Task CreateAsync( + CreateAbReleaseRequest request, + CancellationToken ct = default) + { + var controlRelease = await _releaseManager.GetAsync(request.ControlReleaseId, ct) + ?? throw new ReleaseNotFoundException(request.ControlReleaseId); + + var treatmentRelease = await _releaseManager.GetAsync(request.TreatmentReleaseId, ct) + ?? 
throw new ReleaseNotFoundException(request.TreatmentReleaseId); + + ValidateWeights(request.InitialControlWeight, request.InitialTreatmentWeight); + + var abRelease = new AbRelease + { + Id = _guidGenerator.NewGuid(), + TenantId = _tenantContext.TenantId, + EnvironmentId = request.EnvironmentId, + EnvironmentName = controlRelease.Components.First().Config.GetValueOrDefault("environment", ""), + Control = new AbVersion + { + ReleaseId = controlRelease.Id, + ReleaseName = controlRelease.Name, + Variant = "control", + Components = controlRelease.Components.Select(c => new AbComponent + { + Name = c.ComponentName, + Digest = c.Digest + }).ToImmutableArray(), + TargetIds = request.ControlTargetIds?.ToImmutableArray() ?? ImmutableArray.Empty + }, + Treatment = new AbVersion + { + ReleaseId = treatmentRelease.Id, + ReleaseName = treatmentRelease.Name, + Variant = "treatment", + Components = treatmentRelease.Components.Select(c => new AbComponent + { + Name = c.ComponentName, + Digest = c.Digest + }).ToImmutableArray(), + TargetIds = request.TreatmentTargetIds?.ToImmutableArray() ?? 
ImmutableArray.Empty + }, + ControlWeight = request.InitialControlWeight, + TreatmentWeight = request.InitialTreatmentWeight, + Status = AbReleaseStatus.Draft, + CreatedAt = _timeProvider.GetUtcNow(), + CreatedBy = _userContext.UserId + }; + + await _store.SaveAsync(abRelease, ct); + + await _eventPublisher.PublishAsync(new AbReleaseCreated( + abRelease.Id, + abRelease.Control.ReleaseName, + abRelease.Treatment.ReleaseName, + abRelease.EnvironmentId, + _timeProvider.GetUtcNow() + ), ct); + + _logger.LogInformation( + "Created A/B release {Id}: control={Control}, treatment={Treatment}", + abRelease.Id, + controlRelease.Name, + treatmentRelease.Name); + + return abRelease; + } + + public async Task StartAsync(Guid id, CancellationToken ct = default) + { + var abRelease = await GetRequiredAsync(id, ct); + + if (abRelease.Status != AbReleaseStatus.Draft) + { + throw new InvalidAbReleaseStateException(id, abRelease.Status, "Cannot start - not in Draft status"); + } + + // Deploy both versions + abRelease = abRelease with { Status = AbReleaseStatus.Deploying }; + await _store.SaveAsync(abRelease, ct); + + try + { + // Deploy control version to control targets + await DeployVersionAsync(abRelease.Control, abRelease.EnvironmentId, ct); + + // Deploy treatment version to treatment targets + await DeployVersionAsync(abRelease.Treatment, abRelease.EnvironmentId, ct); + + // Configure traffic routing + await _trafficRouter.ApplyAsync(new RoutingConfig + { + AbReleaseId = abRelease.Id, + ControlEndpoints = abRelease.Control.Components.Select(c => c.Endpoint ?? "").ToList(), + TreatmentEndpoints = abRelease.Treatment.Components.Select(c => c.Endpoint ?? 
"").ToList(), + ControlWeight = abRelease.ControlWeight, + TreatmentWeight = abRelease.TreatmentWeight + }, ct); + + abRelease = abRelease with + { + Status = AbReleaseStatus.Active, + StartedAt = _timeProvider.GetUtcNow() + }; + await _store.SaveAsync(abRelease, ct); + + await _eventPublisher.PublishAsync(new AbReleaseStarted( + abRelease.Id, + abRelease.ControlWeight, + abRelease.TreatmentWeight, + _timeProvider.GetUtcNow() + ), ct); + + _logger.LogInformation( + "Started A/B release {Id}: {ControlWeight}% control, {TreatmentWeight}% treatment", + id, + abRelease.ControlWeight, + abRelease.TreatmentWeight); + + return abRelease; + } + catch (Exception ex) + { + _logger.LogError(ex, "Failed to start A/B release {Id}", id); + + abRelease = abRelease with + { + Status = AbReleaseStatus.Failed, + Decision = new AbDecision + { + Type = AbDecisionType.KeepControl, + Reason = $"Deployment failed: {ex.Message}", + DecidedBy = _userContext.UserId, + DecidedAt = _timeProvider.GetUtcNow() + } + }; + await _store.SaveAsync(abRelease, ct); + + throw; + } + } + + public async Task UpdateWeightsAsync( + Guid id, + int controlWeight, + int treatmentWeight, + CancellationToken ct = default) + { + var abRelease = await GetRequiredAsync(id, ct); + + if (abRelease.Status != AbReleaseStatus.Active && abRelease.Status != AbReleaseStatus.Paused) + { + throw new InvalidAbReleaseStateException(id, abRelease.Status, "Cannot update weights - not active or paused"); + } + + ValidateWeights(controlWeight, treatmentWeight); + + // Update traffic routing + await _trafficRouter.ApplyAsync(new RoutingConfig + { + AbReleaseId = abRelease.Id, + ControlEndpoints = abRelease.Control.Components.Select(c => c.Endpoint ?? "").ToList(), + TreatmentEndpoints = abRelease.Treatment.Components.Select(c => c.Endpoint ?? 
"").ToList(), + ControlWeight = controlWeight, + TreatmentWeight = treatmentWeight + }, ct); + + abRelease = abRelease with + { + ControlWeight = controlWeight, + TreatmentWeight = treatmentWeight + }; + await _store.SaveAsync(abRelease, ct); + + await _eventPublisher.PublishAsync(new AbReleaseWeightsUpdated( + abRelease.Id, + controlWeight, + treatmentWeight, + _timeProvider.GetUtcNow() + ), ct); + + _logger.LogInformation( + "Updated A/B release {Id} weights: {ControlWeight}% control, {TreatmentWeight}% treatment", + id, + controlWeight, + treatmentWeight); + + return abRelease; + } + + public async Task PromoteAsync( + Guid id, + AbDecision decision, + CancellationToken ct = default) + { + var abRelease = await GetRequiredAsync(id, ct); + + if (abRelease.Status != AbReleaseStatus.Active) + { + throw new InvalidAbReleaseStateException(id, abRelease.Status, "Cannot promote - not active"); + } + + abRelease = abRelease with { Status = AbReleaseStatus.Promoting }; + await _store.SaveAsync(abRelease, ct); + + try + { + var winningRelease = decision.Type == AbDecisionType.PromoteTreatment + ? abRelease.Treatment + : abRelease.Control; + + // Route 100% traffic to winner + await _trafficRouter.ApplyAsync(new RoutingConfig + { + AbReleaseId = abRelease.Id, + ControlEndpoints = winningRelease.Components.Select(c => c.Endpoint ?? 
"").ToList(), + TreatmentEndpoints = [], + ControlWeight = 100, + TreatmentWeight = 0 + }, ct); + + // Collect final metrics + var finalMetrics = await _metricsCollector.CollectAsync(abRelease, ct); + + abRelease = abRelease with + { + Status = AbReleaseStatus.Completed, + CompletedAt = _timeProvider.GetUtcNow(), + Decision = decision with { MetricsAtDecision = finalMetrics }, + LatestMetrics = finalMetrics + }; + await _store.SaveAsync(abRelease, ct); + + await _eventPublisher.PublishAsync(new AbReleaseCompleted( + abRelease.Id, + decision.Type, + winningRelease.ReleaseName, + _timeProvider.GetUtcNow() + ), ct); + + _logger.LogInformation( + "A/B release {Id} completed: winner={Winner}", + id, + winningRelease.ReleaseName); + + return abRelease; + } + catch (Exception ex) + { + _logger.LogError(ex, "Failed to promote A/B release {Id}", id); + + abRelease = abRelease with { Status = AbReleaseStatus.Active }; + await _store.SaveAsync(abRelease, ct); + + throw; + } + } + + public async Task RollbackAsync( + Guid id, + string reason, + CancellationToken ct = default) + { + var abRelease = await GetRequiredAsync(id, ct); + + if (abRelease.Status != AbReleaseStatus.Active && abRelease.Status != AbReleaseStatus.Promoting) + { + throw new InvalidAbReleaseStateException(id, abRelease.Status, "Cannot rollback"); + } + + abRelease = abRelease with { Status = AbReleaseStatus.RollingBack }; + await _store.SaveAsync(abRelease, ct); + + // Route 100% to control + await _trafficRouter.ApplyAsync(new RoutingConfig + { + AbReleaseId = abRelease.Id, + ControlEndpoints = abRelease.Control.Components.Select(c => c.Endpoint ?? 
"").ToList(), + TreatmentEndpoints = [], + ControlWeight = 100, + TreatmentWeight = 0 + }, ct); + + abRelease = abRelease with + { + Status = AbReleaseStatus.Completed, + CompletedAt = _timeProvider.GetUtcNow(), + Decision = new AbDecision + { + Type = AbDecisionType.KeepControl, + Reason = $"Rollback: {reason}", + DecidedBy = _userContext.UserId, + DecidedAt = _timeProvider.GetUtcNow() + } + }; + await _store.SaveAsync(abRelease, ct); + + _logger.LogInformation("Rolled back A/B release {Id}: {Reason}", id, reason); + + return abRelease; + } + + public async Task GetLatestMetricsAsync(Guid id, CancellationToken ct = default) + { + var abRelease = await GetRequiredAsync(id, ct); + return await _metricsCollector.CollectAsync(abRelease, ct); + } + + private async Task DeployVersionAsync(AbVersion version, Guid environmentId, CancellationToken ct) + { + // Create a deployment for this version + // Implementation depends on target assignment strategy + } + + private async Task GetRequiredAsync(Guid id, CancellationToken ct) + { + return await _store.GetAsync(id, ct) + ?? 
throw new AbReleaseNotFoundException(id); + } + + private static void ValidateWeights(int controlWeight, int treatmentWeight) + { + if (controlWeight < 0 || treatmentWeight < 0) + { + throw new InvalidWeightException("Weights must be non-negative"); + } + + if (controlWeight + treatmentWeight != 100) + { + throw new InvalidWeightException("Weights must sum to 100"); + } + } +} +``` + +--- + +## Acceptance Criteria + +- [ ] Create A/B release with control and treatment versions +- [ ] Validate weights sum to 100 +- [ ] Deploy both versions on start +- [ ] Configure traffic routing +- [ ] Update weights dynamically +- [ ] Pause and resume experiments +- [ ] Promote treatment version +- [ ] Rollback to control version +- [ ] Collect metrics for both variants +- [ ] Unit test coverage >=85% + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 107_005 Deployment Strategies | Internal | TODO | +| 104_003 Release Manager | Internal | TODO | +| 110_002 Traffic Router Framework | Internal | TODO | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| IAbReleaseManager | TODO | | +| AbReleaseManager | TODO | | +| AbRelease model | TODO | | +| AbDecision model | TODO | | +| AbMetrics model | TODO | | +| AbReleaseStore | TODO | | +| AbMetricsCollector | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_110_002_PROGDL_traffic_router.md b/docs/implplan/SPRINT_20260110_110_002_PROGDL_traffic_router.md new file mode 100644 index 000000000..5d843b144 --- /dev/null +++ b/docs/implplan/SPRINT_20260110_110_002_PROGDL_traffic_router.md @@ -0,0 +1,520 @@ +# SPRINT: Traffic Router Framework + +> **Sprint ID:** 110_002 +> **Module:** PROGDL +> **Phase:** 10 - Progressive Delivery +> **Status:** TODO +> **Parent:** 
[110_000_INDEX](SPRINT_20260110_110_000_INDEX_progressive_delivery.md)
+
+---
+
+## Overview
+
+Implement the Traffic Router Framework, which provides load-balancer-agnostic abstractions for traffic splitting.
+
+### Objectives
+
+- Define traffic router interface for plugins
+- Support weighted routing (percentage-based)
+- Support header-based routing
+- Support cookie-based routing
+- Track routing state and transitions
+
+### Working Directory
+
+```
+src/ReleaseOrchestrator/
+├── __Libraries/
+│   └── StellaOps.ReleaseOrchestrator.Progressive/
+│       └── Router/
+│           ├── ITrafficRouter.cs
+│           ├── TrafficRouterRegistry.cs
+│           ├── RoutingConfig.cs
+│           ├── Strategies/
+│           │   ├── WeightedRouting.cs
+│           │   ├── HeaderRouting.cs
+│           │   └── CookieRouting.cs
+│           └── Store/
+│               ├── IRoutingStateStore.cs
+│               └── RoutingStateStore.cs
+└── __Tests/
+```
+
+---
+
+## Deliverables
+
+### ITrafficRouter Interface
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Progressive.Router;
+
+public interface ITrafficRouter
+{
+    string RouterType { get; }
+    IReadOnlyList<RoutingStrategy> SupportedStrategies { get; }
+
+    Task<bool> IsAvailableAsync(CancellationToken ct = default);
+
+    Task ApplyAsync(
+        RoutingConfig config,
+        CancellationToken ct = default);
+
+    Task<RoutingConfig?> GetCurrentAsync(
+        Guid contextId,
+        CancellationToken ct = default);
+
+    Task RemoveAsync(
+        Guid contextId,
+        CancellationToken ct = default);
+
+    Task<bool> HealthCheckAsync(CancellationToken ct = default);
+
+    Task<RouterMetrics?> GetMetricsAsync(
+        Guid contextId,
+        CancellationToken ct = default);
+}
+
+public sealed record RoutingConfig
+{
+    public required Guid AbReleaseId { get; init; }
+    public required IReadOnlyList<string> ControlEndpoints { get; init; }
+    public required IReadOnlyList<string> TreatmentEndpoints { get; init; }
+    public required int ControlWeight { get; init; }
+    public required int TreatmentWeight { get; init; }
+    public RoutingStrategy Strategy { get; init; } = RoutingStrategy.Weighted;
+    public HeaderRoutingConfig? HeaderRouting { get; init; }
+    public CookieRoutingConfig? CookieRouting { get; init; }
+    public IReadOnlyDictionary<string, string> Metadata { get; init; } = new Dictionary<string, string>();
+}
+
+public enum RoutingStrategy
+{
+    Weighted,
+    HeaderBased,
+    CookieBased,
+    Combined
+}
+
+public sealed record HeaderRoutingConfig
+{
+    public required string HeaderName { get; init; }
+    public required string TreatmentValue { get; init; }
+    public bool FallbackToWeighted { get; init; } = true;
+}
+
+public sealed record CookieRoutingConfig
+{
+    public required string CookieName { get; init; }
+    public required string TreatmentValue { get; init; }
+    public bool FallbackToWeighted { get; init; } = true;
+}
+
+public sealed record RouterMetrics
+{
+    public required long ControlRequests { get; init; }
+    public required long TreatmentRequests { get; init; }
+    public required double ControlErrorRate { get; init; }
+    public required double TreatmentErrorRate { get; init; }
+    public required double ControlLatencyP50 { get; init; }
+    public required double TreatmentLatencyP50 { get; init; }
+    public required DateTimeOffset CollectedAt { get; init; }
+}
+```
+
+### TrafficRouterRegistry
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Progressive.Router;
+
+public sealed class TrafficRouterRegistry
+{
+    private readonly Dictionary<string, ITrafficRouter> _routers = new();
+    private readonly ILogger<TrafficRouterRegistry> _logger;
+
+    public void Register(ITrafficRouter router)
+    {
+        if (_routers.ContainsKey(router.RouterType))
+        {
+            throw new RouterAlreadyRegisteredException(router.RouterType);
+        }
+
+        _routers[router.RouterType] = router;
+        _logger.LogInformation(
+            "Registered traffic router: {Type} with strategies: {Strategies}",
+            router.RouterType,
+            string.Join(", ", router.SupportedStrategies));
+    }
+
+    public ITrafficRouter? Get(string routerType)
+    {
+        return _routers.TryGetValue(routerType, out var router) ? router : null;
+    }
+
+    public ITrafficRouter GetRequired(string routerType)
+    {
+        return Get(routerType)
+            ?? throw new RouterNotFoundException(routerType);
+    }
+
+    public IReadOnlyList<RouterInfo> GetAvailable()
+    {
+        return _routers.Values.Select(r => new RouterInfo
+        {
+            Type = r.RouterType,
+            SupportedStrategies = r.SupportedStrategies
+        }).ToList().AsReadOnly();
+    }
+
+    public async Task<IReadOnlyList<RouterHealthStatus>> CheckHealthAsync(CancellationToken ct = default)
+    {
+        var results = new List<RouterHealthStatus>();
+
+        foreach (var (type, router) in _routers)
+        {
+            try
+            {
+                var isHealthy = await router.HealthCheckAsync(ct);
+                results.Add(new RouterHealthStatus(type, isHealthy, null));
+            }
+            catch (Exception ex)
+            {
+                results.Add(new RouterHealthStatus(type, false, ex.Message));
+            }
+        }
+
+        return results.AsReadOnly();
+    }
+}
+
+public sealed record RouterInfo
+{
+    public required string Type { get; init; }
+    public required IReadOnlyList<RoutingStrategy> SupportedStrategies { get; init; }
+}
+
+public sealed record RouterHealthStatus(
+    string RouterType,
+    bool IsHealthy,
+    string? Error
+);
+```
+
+### RoutingStateStore
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Progressive.Router.Store;
+
+public interface IRoutingStateStore
+{
+    Task SaveAsync(RoutingState state, CancellationToken ct = default);
+    Task<RoutingState?> GetAsync(Guid contextId, CancellationToken ct = default);
+    Task<IReadOnlyList<RoutingState>> ListActiveAsync(CancellationToken ct = default);
+    Task DeleteAsync(Guid contextId, CancellationToken ct = default);
+    Task<IReadOnlyList<RoutingTransition>> GetHistoryAsync(Guid contextId, CancellationToken ct = default);
+}
+
+public sealed record RoutingState
+{
+    public required Guid ContextId { get; init; }
+    public required string RouterType { get; init; }
+    public required RoutingConfig Config { get; init; }
+    public required RoutingStateStatus Status { get; init; }
+    public required DateTimeOffset AppliedAt { get; init; }
+    public required DateTimeOffset LastVerifiedAt { get; init; }
+    public string?
Error { get; init; } +} + +public enum RoutingStateStatus +{ + Pending, + Applied, + Verified, + Failed, + Removed +} + +public sealed record RoutingTransition +{ + public required Guid ContextId { get; init; } + public required RoutingConfig FromConfig { get; init; } + public required RoutingConfig ToConfig { get; init; } + public required string Reason { get; init; } + public required Guid TriggeredBy { get; init; } + public required DateTimeOffset TransitionedAt { get; init; } +} +``` + +### WeightedRouting Strategy + +```csharp +namespace StellaOps.ReleaseOrchestrator.Progressive.Router.Strategies; + +public sealed class WeightedRouting +{ + public static UpstreamConfig Generate(RoutingConfig config) + { + var controlUpstream = new UpstreamDefinition + { + Name = $"control-{config.AbReleaseId:N}", + Servers = config.ControlEndpoints.Select(e => new UpstreamServer + { + Address = e, + Weight = config.ControlWeight + }).ToList() + }; + + var treatmentUpstream = new UpstreamDefinition + { + Name = $"treatment-{config.AbReleaseId:N}", + Servers = config.TreatmentEndpoints.Select(e => new UpstreamServer + { + Address = e, + Weight = config.TreatmentWeight + }).ToList() + }; + + return new UpstreamConfig + { + ContextId = config.AbReleaseId, + Upstreams = new[] { controlUpstream, treatmentUpstream }.ToList(), + DefaultUpstream = controlUpstream.Name, + SplitConfig = new SplitConfig + { + ControlUpstream = controlUpstream.Name, + TreatmentUpstream = treatmentUpstream.Name, + ControlWeight = config.ControlWeight, + TreatmentWeight = config.TreatmentWeight + } + }; + } +} + +public sealed record UpstreamConfig +{ + public required Guid ContextId { get; init; } + public required IReadOnlyList Upstreams { get; init; } + public required string DefaultUpstream { get; init; } + public SplitConfig? SplitConfig { get; init; } + public HeaderMatchConfig? HeaderConfig { get; init; } + public CookieMatchConfig? 
CookieConfig { get; init; } +} + +public sealed record UpstreamDefinition +{ + public required string Name { get; init; } + public required IReadOnlyList Servers { get; init; } +} + +public sealed record UpstreamServer +{ + public required string Address { get; init; } + public int Weight { get; init; } = 1; + public int MaxFails { get; init; } = 3; + public TimeSpan FailTimeout { get; init; } = TimeSpan.FromSeconds(30); +} + +public sealed record SplitConfig +{ + public required string ControlUpstream { get; init; } + public required string TreatmentUpstream { get; init; } + public required int ControlWeight { get; init; } + public required int TreatmentWeight { get; init; } +} +``` + +### HeaderRouting Strategy + +```csharp +namespace StellaOps.ReleaseOrchestrator.Progressive.Router.Strategies; + +public sealed class HeaderRouting +{ + public static HeaderMatchConfig Generate(RoutingConfig config) + { + if (config.HeaderRouting is null) + { + throw new InvalidOperationException("Header routing config is required"); + } + + return new HeaderMatchConfig + { + HeaderName = config.HeaderRouting.HeaderName, + Matches = new[] + { + new HeaderMatch + { + Value = config.HeaderRouting.TreatmentValue, + Upstream = $"treatment-{config.AbReleaseId:N}" + } + }.ToList(), + DefaultUpstream = $"control-{config.AbReleaseId:N}", + FallbackToWeighted = config.HeaderRouting.FallbackToWeighted + }; + } +} + +public sealed record HeaderMatchConfig +{ + public required string HeaderName { get; init; } + public required IReadOnlyList Matches { get; init; } + public required string DefaultUpstream { get; init; } + public bool FallbackToWeighted { get; init; } +} + +public sealed record HeaderMatch +{ + public required string Value { get; init; } + public required string Upstream { get; init; } +} +``` + +### CookieRouting Strategy + +```csharp +namespace StellaOps.ReleaseOrchestrator.Progressive.Router.Strategies; + +public sealed class CookieRouting +{ + public static CookieMatchConfig 
Generate(RoutingConfig config) + { + if (config.CookieRouting is null) + { + throw new InvalidOperationException("Cookie routing config is required"); + } + + return new CookieMatchConfig + { + CookieName = config.CookieRouting.CookieName, + Matches = new[] + { + new CookieMatch + { + Value = config.CookieRouting.TreatmentValue, + Upstream = $"treatment-{config.AbReleaseId:N}" + } + }.ToList(), + DefaultUpstream = $"control-{config.AbReleaseId:N}", + FallbackToWeighted = config.CookieRouting.FallbackToWeighted, + SetCookieOnFirstRequest = true + }; + } +} + +public sealed record CookieMatchConfig +{ + public required string CookieName { get; init; } + public required IReadOnlyList<CookieMatch> Matches { get; init; } + public required string DefaultUpstream { get; init; } + public bool FallbackToWeighted { get; init; } + public bool SetCookieOnFirstRequest { get; init; } + public TimeSpan CookieExpiry { get; init; } = TimeSpan.FromDays(30); +} + +public sealed record CookieMatch +{ + public required string Value { get; init; } + public required string Upstream { get; init; } +} +``` + +### Routing Config Validator + +```csharp +namespace StellaOps.ReleaseOrchestrator.Progressive.Router; + +public static class RoutingConfigValidator +{ + public static ValidationResult Validate(RoutingConfig config) + { + var errors = new List<string>(); + + // Validate weights + if (config.ControlWeight + config.TreatmentWeight != 100) + { + errors.Add("Weights must sum to 100"); + } + + if (config.ControlWeight < 0 || config.TreatmentWeight < 0) + { + errors.Add("Weights must be non-negative"); + } + + // Validate endpoints + if (config.ControlEndpoints.Count == 0 && config.ControlWeight > 0) + { + errors.Add("Control endpoints required when control weight > 0"); + } + + if (config.TreatmentEndpoints.Count == 0 && config.TreatmentWeight > 0) + { + errors.Add("Treatment endpoints required when treatment weight > 0"); + } + + // Validate strategy-specific config + if (config.Strategy == 
RoutingStrategy.HeaderBased && config.HeaderRouting is null) + { + errors.Add("Header routing config required for header-based strategy"); + } + + if (config.Strategy == RoutingStrategy.CookieBased && config.CookieRouting is null) + { + errors.Add("Cookie routing config required for cookie-based strategy"); + } + + return new ValidationResult(errors.Count == 0, errors.AsReadOnly()); + } +} + +public sealed record ValidationResult( + bool IsValid, + IReadOnlyList Errors +); +``` + +--- + +## Acceptance Criteria + +- [ ] Define traffic router interface +- [ ] Register and discover routers +- [ ] Generate weighted routing config +- [ ] Generate header-based routing config +- [ ] Generate cookie-based routing config +- [ ] Validate routing configurations +- [ ] Store routing state transitions +- [ ] Query active routing states +- [ ] Health check router implementations +- [ ] Unit test coverage >=85% + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 110_001 A/B Release Manager | Internal | TODO | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| ITrafficRouter | TODO | | +| TrafficRouterRegistry | TODO | | +| RoutingConfig | TODO | | +| WeightedRouting | TODO | | +| HeaderRouting | TODO | | +| CookieRouting | TODO | | +| RoutingStateStore | TODO | | +| RoutingConfigValidator | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_110_003_PROGDL_canary_controller.md b/docs/implplan/SPRINT_20260110_110_003_PROGDL_canary_controller.md new file mode 100644 index 000000000..ae423a0c4 --- /dev/null +++ b/docs/implplan/SPRINT_20260110_110_003_PROGDL_canary_controller.md @@ -0,0 +1,702 @@ +# SPRINT: Canary Controller + +> **Sprint ID:** 110_003 +> **Module:** PROGDL +> **Phase:** 10 - Progressive Delivery +> **Status:** TODO +> **Parent:** 
[110_000_INDEX](SPRINT_20260110_110_000_INDEX_progressive_delivery.md) + +--- + +## Overview + +Implement the Canary Controller for gradual traffic promotion with automatic rollback based on metrics. + +### Objectives + +- Define canary progression steps +- Auto-advance based on metrics analysis +- Auto-rollback on metric threshold breach +- Manual intervention support +- Configurable promotion schedules + +### Working Directory + +``` +src/ReleaseOrchestrator/ +├── __Libraries/ +│ └── StellaOps.ReleaseOrchestrator.Progressive/ +│ └── Canary/ +│ ├── ICanaryController.cs +│ ├── CanaryController.cs +│ ├── CanaryProgressionEngine.cs +│ ├── CanaryMetricsAnalyzer.cs +│ ├── AutoRollback.cs +│ └── Models/ +│ ├── CanaryRelease.cs +│ ├── CanaryStep.cs +│ └── CanaryConfig.cs +└── __Tests/ +``` + +--- + +## Deliverables + +### CanaryRelease Model + +```csharp +namespace StellaOps.ReleaseOrchestrator.Progressive.Canary.Models; + +public sealed record CanaryRelease +{ + public required Guid Id { get; init; } + public required Guid TenantId { get; init; } + public required Guid ReleaseId { get; init; } + public required string ReleaseName { get; init; } + public required Guid EnvironmentId { get; init; } + public required string EnvironmentName { get; init; } + public required CanaryConfig Config { get; init; } + public required ImmutableArray Steps { get; init; } + public required int CurrentStepIndex { get; init; } + public required CanaryStatus Status { get; init; } + public required DateTimeOffset CreatedAt { get; init; } + public required Guid CreatedBy { get; init; } + public DateTimeOffset? StartedAt { get; init; } + public DateTimeOffset? CurrentStepStartedAt { get; init; } + public DateTimeOffset? CompletedAt { get; init; } + public CanaryRollbackInfo? RollbackInfo { get; init; } + public CanaryMetrics? 
LatestMetrics { get; init; } +} + +public enum CanaryStatus +{ + Pending, + Running, + WaitingForMetrics, + Advancing, + Paused, + RollingBack, + Completed, + Failed, + Cancelled +} + +public sealed record CanaryConfig +{ + public required ImmutableArray StepConfigs { get; init; } + public required CanaryMetricThresholds Thresholds { get; init; } + public TimeSpan MetricsWindowDuration { get; init; } = TimeSpan.FromMinutes(5); + public TimeSpan MinStepDuration { get; init; } = TimeSpan.FromMinutes(10); + public bool AutoAdvance { get; init; } = true; + public bool AutoRollback { get; init; } = true; + public int MetricCheckIntervalSeconds { get; init; } = 60; +} + +public sealed record CanaryStepConfig +{ + public required int StepIndex { get; init; } + public required int TrafficPercentage { get; init; } + public TimeSpan? MinDuration { get; init; } + public TimeSpan? MaxDuration { get; init; } + public bool RequireManualApproval { get; init; } +} + +public sealed record CanaryStep +{ + public required int Index { get; init; } + public required int TrafficPercentage { get; init; } + public required CanaryStepStatus Status { get; init; } + public DateTimeOffset? StartedAt { get; init; } + public DateTimeOffset? CompletedAt { get; init; } + public CanaryMetrics? MetricsAtStart { get; init; } + public CanaryMetrics? MetricsAtEnd { get; init; } + public string? Notes { get; init; } +} + +public enum CanaryStepStatus +{ + Pending, + Running, + WaitingApproval, + Completed, + Skipped, + Failed +} + +public sealed record CanaryMetricThresholds +{ + public double MaxErrorRate { get; init; } = 0.05; // 5% + public double MaxLatencyP99Ms { get; init; } = 1000; // 1 second + public double? MaxLatencyP95Ms { get; init; } + public double? 
MaxLatencyP50Ms { get; init; } + public double MinSuccessRate { get; init; } = 0.95; // 95% + public ImmutableDictionary<string, double> CustomThresholds { get; init; } = ImmutableDictionary<string, double>.Empty; +} + +public sealed record CanaryMetrics +{ + public required DateTimeOffset CollectedAt { get; init; } + public required TimeSpan WindowDuration { get; init; } + public required long RequestCount { get; init; } + public required double ErrorRate { get; init; } + public required double SuccessRate { get; init; } + public required double LatencyP50Ms { get; init; } + public required double LatencyP95Ms { get; init; } + public required double LatencyP99Ms { get; init; } + public ImmutableDictionary<string, double> CustomMetrics { get; init; } = ImmutableDictionary<string, double>.Empty; +} + +public sealed record CanaryRollbackInfo +{ + public required string Reason { get; init; } + public required bool WasAutomatic { get; init; } + public required int RolledBackFromStep { get; init; } + public required CanaryMetrics? MetricsAtRollback { get; init; } + public required DateTimeOffset RolledBackAt { get; init; } + public Guid? TriggeredBy { get; init; } +} +``` + +### ICanaryController Interface + +```csharp +namespace StellaOps.ReleaseOrchestrator.Progressive.Canary; + +public interface ICanaryController +{ + Task<CanaryRelease> StartAsync( + Guid releaseId, + CanaryConfig config, + CancellationToken ct = default); + + Task<CanaryRelease> AdvanceAsync( + Guid canaryId, + CancellationToken ct = default); + + Task<CanaryRelease> PauseAsync( + Guid canaryId, + string? reason = null, + CancellationToken ct = default); + + Task<CanaryRelease> ResumeAsync( + Guid canaryId, + CancellationToken ct = default); + + Task<CanaryRelease> RollbackAsync( + Guid canaryId, + string reason, + CancellationToken ct = default); + + Task<CanaryRelease> CompleteAsync( + Guid canaryId, + CancellationToken ct = default); + + Task<CanaryRelease> ApproveStepAsync( + Guid canaryId, + int stepIndex, + string? 
comment = null, + CancellationToken ct = default); + + Task<CanaryRelease?> GetAsync( + Guid canaryId, + CancellationToken ct = default); + + Task<IReadOnlyList<CanaryRelease>> ListActiveAsync( + Guid? environmentId = null, + CancellationToken ct = default); + + Task<CanaryMetrics?> GetCurrentMetricsAsync( + Guid canaryId, + CancellationToken ct = default); +} +``` + +### CanaryController Implementation + +```csharp +namespace StellaOps.ReleaseOrchestrator.Progressive.Canary; + +public sealed class CanaryController : ICanaryController +{ + private readonly ICanaryStore _store; + private readonly IReleaseManager _releaseManager; + private readonly ITrafficRouter _trafficRouter; + private readonly CanaryProgressionEngine _progressionEngine; + private readonly CanaryMetricsAnalyzer _metricsAnalyzer; + private readonly IEventPublisher _eventPublisher; + private readonly TimeProvider _timeProvider; + private readonly IGuidGenerator _guidGenerator; + private readonly ITenantContext _tenantContext; + private readonly IUserContext _userContext; + private readonly ILogger<CanaryController> _logger; + + public async Task<CanaryRelease> StartAsync( + Guid releaseId, + CanaryConfig config, + CancellationToken ct = default) + { + var release = await _releaseManager.GetAsync(releaseId, ct) + ?? throw new ReleaseNotFoundException(releaseId); + + ValidateConfig(config); + + var steps = BuildSteps(config); + + var canary = new CanaryRelease + { + Id = _guidGenerator.NewGuid(), + TenantId = _tenantContext.TenantId, + ReleaseId = release.Id, + ReleaseName = release.Name, + EnvironmentId = release.EnvironmentId ?? Guid.Empty, + EnvironmentName = release.EnvironmentName ?? 
"", + Config = config, + Steps = steps, + CurrentStepIndex = 0, + Status = CanaryStatus.Pending, + CreatedAt = _timeProvider.GetUtcNow(), + CreatedBy = _userContext.UserId + }; + + await _store.SaveAsync(canary, ct); + + _logger.LogInformation( + "Created canary release {Id} for release {Release} with {StepCount} steps", + canary.Id, + release.Name, + steps.Length); + + // Start first step + return await StartStepAsync(canary, 0, ct); + } + + public async Task AdvanceAsync( + Guid canaryId, + CancellationToken ct = default) + { + var canary = await GetRequiredAsync(canaryId, ct); + + if (canary.Status != CanaryStatus.Running && canary.Status != CanaryStatus.WaitingForMetrics) + { + throw new InvalidCanaryStateException(canaryId, canary.Status, "Cannot advance"); + } + + var currentStep = canary.Steps[canary.CurrentStepIndex]; + if (currentStep.Status == CanaryStepStatus.WaitingApproval) + { + throw new CanaryStepAwaitingApprovalException(canaryId, canary.CurrentStepIndex); + } + + // Check if there are more steps + var nextStepIndex = canary.CurrentStepIndex + 1; + if (nextStepIndex >= canary.Steps.Length) + { + // All steps complete, finalize + return await CompleteAsync(canaryId, ct); + } + + // Collect metrics for current step + var currentMetrics = await _metricsAnalyzer.CollectAsync(canary, ct); + + // Update current step as completed + var updatedSteps = canary.Steps.SetItem(canary.CurrentStepIndex, currentStep with + { + Status = CanaryStepStatus.Completed, + CompletedAt = _timeProvider.GetUtcNow(), + MetricsAtEnd = currentMetrics + }); + + canary = canary with + { + Steps = updatedSteps, + LatestMetrics = currentMetrics + }; + + await _store.SaveAsync(canary, ct); + + // Start next step + return await StartStepAsync(canary, nextStepIndex, ct); + } + + public async Task RollbackAsync( + Guid canaryId, + string reason, + CancellationToken ct = default) + { + var canary = await GetRequiredAsync(canaryId, ct); + + if (canary.Status == CanaryStatus.Completed || 
canary.Status == CanaryStatus.Failed) + { + throw new InvalidCanaryStateException(canaryId, canary.Status, "Cannot rollback completed canary"); + } + + canary = canary with { Status = CanaryStatus.RollingBack }; + await _store.SaveAsync(canary, ct); + + _logger.LogWarning( + "Rolling back canary {Id} from step {Step}: {Reason}", + canaryId, + canary.CurrentStepIndex, + reason); + + try + { + // Route 100% traffic back to baseline + await _trafficRouter.ApplyAsync(new RoutingConfig + { + AbReleaseId = canary.Id, + ControlEndpoints = await GetBaselineEndpointsAsync(canary, ct), + TreatmentEndpoints = [], + ControlWeight = 100, + TreatmentWeight = 0 + }, ct); + + var currentMetrics = await _metricsAnalyzer.CollectAsync(canary, ct); + + canary = canary with + { + Status = CanaryStatus.Failed, + CompletedAt = _timeProvider.GetUtcNow(), + RollbackInfo = new CanaryRollbackInfo + { + Reason = reason, + WasAutomatic = false, + RolledBackFromStep = canary.CurrentStepIndex, + MetricsAtRollback = currentMetrics, + RolledBackAt = _timeProvider.GetUtcNow(), + TriggeredBy = _userContext.UserId + } + }; + + await _store.SaveAsync(canary, ct); + + await _eventPublisher.PublishAsync(new CanaryRolledBack( + canary.Id, + canary.ReleaseName, + reason, + canary.CurrentStepIndex, + _timeProvider.GetUtcNow() + ), ct); + + return canary; + } + catch (Exception ex) + { + _logger.LogError(ex, "Failed to rollback canary {Id}", canaryId); + throw; + } + } + + public async Task CompleteAsync( + Guid canaryId, + CancellationToken ct = default) + { + var canary = await GetRequiredAsync(canaryId, ct); + + if (canary.Status != CanaryStatus.Running) + { + throw new InvalidCanaryStateException(canaryId, canary.Status, "Cannot complete"); + } + + // Verify we're at 100% traffic + var currentStep = canary.Steps[canary.CurrentStepIndex]; + if (currentStep.TrafficPercentage != 100) + { + throw new CanaryNotAtFullTrafficException(canaryId, currentStep.TrafficPercentage); + } + + var finalMetrics = await 
_metricsAnalyzer.CollectAsync(canary, ct); + + // Mark final step complete + var updatedSteps = canary.Steps.SetItem(canary.CurrentStepIndex, currentStep with + { + Status = CanaryStepStatus.Completed, + CompletedAt = _timeProvider.GetUtcNow(), + MetricsAtEnd = finalMetrics + }); + + canary = canary with + { + Steps = updatedSteps, + Status = CanaryStatus.Completed, + CompletedAt = _timeProvider.GetUtcNow(), + LatestMetrics = finalMetrics + }; + + await _store.SaveAsync(canary, ct); + + await _eventPublisher.PublishAsync(new CanaryCompleted( + canary.Id, + canary.ReleaseName, + canary.Steps.Length, + _timeProvider.GetUtcNow() + ), ct); + + _logger.LogInformation( + "Canary {Id} completed successfully after {StepCount} steps", + canaryId, + canary.Steps.Length); + + return canary; + } + + private async Task StartStepAsync( + CanaryRelease canary, + int stepIndex, + CancellationToken ct) + { + var step = canary.Steps[stepIndex]; + var stepConfig = canary.Config.StepConfigs[stepIndex]; + + _logger.LogInformation( + "Starting canary step {Step} at {Percentage}% traffic", + stepIndex, + step.TrafficPercentage); + + // Apply traffic routing + await _trafficRouter.ApplyAsync(new RoutingConfig + { + AbReleaseId = canary.Id, + ControlEndpoints = await GetBaselineEndpointsAsync(canary, ct), + TreatmentEndpoints = await GetCanaryEndpointsAsync(canary, ct), + ControlWeight = 100 - step.TrafficPercentage, + TreatmentWeight = step.TrafficPercentage + }, ct); + + var startMetrics = await _metricsAnalyzer.CollectAsync(canary, ct); + + var status = stepConfig.RequireManualApproval + ? 
CanaryStepStatus.WaitingApproval + : CanaryStepStatus.Running; + + var updatedSteps = canary.Steps.SetItem(stepIndex, step with + { + Status = status, + StartedAt = _timeProvider.GetUtcNow(), + MetricsAtStart = startMetrics + }); + + canary = canary with + { + Steps = updatedSteps, + CurrentStepIndex = stepIndex, + CurrentStepStartedAt = _timeProvider.GetUtcNow(), + Status = status == CanaryStepStatus.WaitingApproval + ? CanaryStatus.WaitingForMetrics + : CanaryStatus.Running, + StartedAt = canary.StartedAt ?? _timeProvider.GetUtcNow(), + LatestMetrics = startMetrics + }; + + await _store.SaveAsync(canary, ct); + + await _eventPublisher.PublishAsync(new CanaryStepStarted( + canary.Id, + stepIndex, + step.TrafficPercentage, + _timeProvider.GetUtcNow() + ), ct); + + return canary; + } + + private static ImmutableArray BuildSteps(CanaryConfig config) + { + return config.StepConfigs.Select(c => new CanaryStep + { + Index = c.StepIndex, + TrafficPercentage = c.TrafficPercentage, + Status = CanaryStepStatus.Pending + }).ToImmutableArray(); + } + + private static void ValidateConfig(CanaryConfig config) + { + if (config.StepConfigs.Length == 0) + { + throw new InvalidCanaryConfigException("At least one step is required"); + } + + var lastPercentage = 0; + foreach (var step in config.StepConfigs.OrderBy(s => s.StepIndex)) + { + if (step.TrafficPercentage <= lastPercentage) + { + throw new InvalidCanaryConfigException("Traffic percentage must increase with each step"); + } + lastPercentage = step.TrafficPercentage; + } + + if (lastPercentage != 100) + { + throw new InvalidCanaryConfigException("Final step must have 100% traffic"); + } + } + + private async Task GetRequiredAsync(Guid id, CancellationToken ct) + { + return await _store.GetAsync(id, ct) + ?? 
throw new CanaryNotFoundException(id); + } + + private Task<IReadOnlyList<string>> GetBaselineEndpointsAsync(CanaryRelease canary, CancellationToken ct) + { + // Implementation to get baseline/stable version endpoints + return Task.FromResult<IReadOnlyList<string>>(new List<string>()); + } + + private Task<IReadOnlyList<string>> GetCanaryEndpointsAsync(CanaryRelease canary, CancellationToken ct) + { + // Implementation to get canary version endpoints + return Task.FromResult<IReadOnlyList<string>>(new List<string>()); + } +} +``` + +### AutoRollback + +```csharp +namespace StellaOps.ReleaseOrchestrator.Progressive.Canary; + +public sealed class AutoRollback : BackgroundService +{ + private readonly ICanaryController _canaryController; + private readonly CanaryMetricsAnalyzer _metricsAnalyzer; + private readonly ICanaryStore _store; + private readonly ILogger<AutoRollback> _logger; + + protected override async Task ExecuteAsync(CancellationToken stoppingToken) + { + _logger.LogInformation("Auto-rollback service started"); + + while (!stoppingToken.IsCancellationRequested) + { + try + { + await CheckActiveCanariesAsync(stoppingToken); + } + catch (Exception ex) + { + _logger.LogError(ex, "Error in auto-rollback check"); + } + + await Task.Delay(TimeSpan.FromSeconds(30), stoppingToken); + } + } + + private async Task CheckActiveCanariesAsync(CancellationToken ct) + { + var activeCanaries = await _store.ListByStatusAsync(CanaryStatus.Running, ct); + + foreach (var canary in activeCanaries) + { + if (!canary.Config.AutoRollback) + continue; + + try + { + await CheckAndRollbackIfNeededAsync(canary, ct); + } + catch (Exception ex) + { + _logger.LogError(ex, + "Error checking canary {Id} for auto-rollback", + canary.Id); + } + } + } + + private async Task CheckAndRollbackIfNeededAsync(CanaryRelease canary, CancellationToken ct) + { + var metrics = await _metricsAnalyzer.CollectAsync(canary, ct); + var thresholds = canary.Config.Thresholds; + + var violations = new List<string>(); + + if (metrics.ErrorRate > thresholds.MaxErrorRate) + { + violations.Add($"Error rate {metrics.ErrorRate:P1} exceeds 
threshold {thresholds.MaxErrorRate:P1}"); + } + + if (metrics.LatencyP99Ms > thresholds.MaxLatencyP99Ms) + { + violations.Add($"P99 latency {metrics.LatencyP99Ms:F0}ms exceeds threshold {thresholds.MaxLatencyP99Ms:F0}ms"); + } + + if (metrics.SuccessRate < thresholds.MinSuccessRate) + { + violations.Add($"Success rate {metrics.SuccessRate:P1} below threshold {thresholds.MinSuccessRate:P1}"); + } + + // Check custom thresholds + foreach (var (metricName, threshold) in thresholds.CustomThresholds) + { + if (metrics.CustomMetrics.TryGetValue(metricName, out var value) && value > threshold) + { + violations.Add($"Custom metric {metricName} ({value:F2}) exceeds threshold ({threshold:F2})"); + } + } + + if (violations.Count > 0) + { + var reason = $"Auto-rollback triggered: {string.Join("; ", violations)}"; + + _logger.LogWarning( + "Auto-rolling back canary {Id}: {Reason}", + canary.Id, + reason); + + await _canaryController.RollbackAsync(canary.Id, reason, ct); + } + } +} +``` + +--- + +## Acceptance Criteria + +- [ ] Create canary with progression steps +- [ ] Start canary at initial traffic percentage +- [ ] Advance through steps automatically +- [ ] Wait for manual approval when configured +- [ ] Rollback on metric threshold breach +- [ ] Auto-rollback runs in background +- [ ] Complete canary at 100% traffic +- [ ] Pause and resume canary +- [ ] Track metrics at each step +- [ ] Unit test coverage >=85% + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 110_001 A/B Release Manager | Internal | TODO | +| 110_002 Traffic Router Framework | Internal | TODO | +| Telemetry | External | Existing | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| ICanaryController | TODO | | +| CanaryController | TODO | | +| CanaryProgressionEngine | TODO | | +| CanaryMetricsAnalyzer | TODO | | +| AutoRollback | TODO | | +| CanaryRelease model | TODO | | +| CanaryStep model | TODO | | +| Unit 
tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_110_004_PROGDL_nginx_router.md b/docs/implplan/SPRINT_20260110_110_004_PROGDL_nginx_router.md new file mode 100644 index 000000000..a588926da --- /dev/null +++ b/docs/implplan/SPRINT_20260110_110_004_PROGDL_nginx_router.md @@ -0,0 +1,762 @@ +# SPRINT: Router Plugin - Nginx + +> **Sprint ID:** 110_004 +> **Module:** PROGDL +> **Phase:** 10 - Progressive Delivery +> **Status:** TODO +> **Parent:** [110_000_INDEX](SPRINT_20260110_110_000_INDEX_progressive_delivery.md) + +--- + +## Overview + +Implement the Nginx traffic router plugin as the **reference implementation** for progressive delivery traffic splitting. This plugin serves as the primary built-in router and as a template for additional router plugins. + +### Router Plugin Catalog + +The Release Orchestrator supports multiple traffic router implementations via the `ITrafficRouter` interface: + +| Router | Status | Description | +|--------|--------|-------------| +| **Nginx** | **v1 Built-in** | Reference implementation (this sprint) | +| HAProxy | Plugin Example | Sample implementation for plugin developers | +| Traefik | Plugin Example | Sample implementation for plugin developers | +| AWS ALB | Plugin Example | Sample implementation for plugin developers | +| Envoy | Post-v1 | Planned for future release | + +> **Plugin Developer Note:** HAProxy, Traefik, and AWS ALB are provided as reference examples in the Plugin SDK (`StellaOps.Plugin.Sdk`) to demonstrate how third parties can implement the `ITrafficRouter` interface. These examples can be found in `src/ReleaseOrchestrator/__Plugins/StellaOps.Plugin.Sdk/Examples/Routers/`. Organizations can implement their own routers for Istio, Linkerd, Kong, or any other traffic management system. 
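+
+The `ITrafficRouter` contract described above can be sketched as a skeleton for plugin authors. This is an illustrative sample, not part of the specification: the `IstioRouter` name and its stub bodies are assumptions, and only the interface members defined in sprint 110_002 are taken from the spec.
+
+```csharp
+// Hypothetical plugin skeleton showing the ITrafficRouter surface a custom
+// router must implement; every method body here is a placeholder.
+public sealed class IstioRouter : ITrafficRouter
+{
+    public string RouterType => "istio";
+
+    public IReadOnlyList<string> SupportedStrategies => new[] { "weighted", "header-based" };
+
+    public Task<bool> IsAvailableAsync(CancellationToken ct = default)
+        => Task.FromResult(true); // e.g. probe the mesh control plane here
+
+    public Task ApplyAsync(RoutingConfig config, CancellationToken ct = default)
+    {
+        // Translate ControlWeight/TreatmentWeight into the target system's
+        // native traffic-split primitive (for Istio: VirtualService route weights).
+        return Task.CompletedTask;
+    }
+
+    public Task<RoutingConfig> GetCurrentAsync(Guid contextId, CancellationToken ct = default)
+        => throw new NotImplementedException();
+
+    public Task RemoveAsync(Guid contextId, CancellationToken ct = default)
+        => Task.CompletedTask;
+
+    public Task<bool> HealthCheckAsync(CancellationToken ct = default)
+        => Task.FromResult(true);
+
+    public Task<RouterMetrics> GetMetricsAsync(Guid contextId, CancellationToken ct = default)
+        => throw new NotImplementedException();
+}
+```
+
+Registering the router at startup (`registry.Register(new IstioRouter());`) makes it discoverable through `TrafficRouterRegistry.GetAvailable()` and eligible for health checks.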
+ +### Objectives + +- Generate Nginx upstream configurations +- Generate Nginx split_clients config for weighted routing +- Support header-based routing via map directives +- Hot reload Nginx configuration +- Parse Nginx status for metrics +- Serve as reference implementation for custom router plugins + +### Working Directory + +``` +src/ReleaseOrchestrator/ +├── __Libraries/ +│ └── StellaOps.ReleaseOrchestrator.Progressive/ +│ └── Routers/ +│ └── Nginx/ +│ ├── NginxRouter.cs +│ ├── NginxConfigGenerator.cs +│ ├── NginxReloader.cs +│ ├── NginxStatusParser.cs +│ └── Templates/ +│ ├── upstream.conf.template +│ ├── split_clients.conf.template +│ └── location.conf.template +└── __Tests/ +``` + +--- + +## Deliverables + +### NginxRouter + +```csharp +namespace StellaOps.ReleaseOrchestrator.Progressive.Routers.Nginx; + +public sealed class NginxRouter : ITrafficRouter +{ + private readonly NginxConfigGenerator _configGenerator; + private readonly NginxReloader _reloader; + private readonly NginxStatusParser _statusParser; + private readonly NginxConfiguration _config; + private readonly IRoutingStateStore _stateStore; + private readonly TimeProvider _timeProvider; + private readonly ILogger _logger; + + public string RouterType => "nginx"; + + public IReadOnlyList SupportedStrategies => new[] + { + "weighted", + "header-based", + "cookie-based", + "combined" + }; + + public async Task IsAvailableAsync(CancellationToken ct = default) + { + try + { + var configTest = await TestConfigAsync(ct); + return configTest; + } + catch + { + return false; + } + } + + public async Task ApplyAsync( + RoutingConfig config, + CancellationToken ct = default) + { + _logger.LogInformation( + "Applying Nginx routing config for {ContextId}: {Control}%/{Treatment}%", + config.AbReleaseId, + config.ControlWeight, + config.TreatmentWeight); + + try + { + // Generate configuration files + var nginxConfig = _configGenerator.Generate(config); + + // Write configuration files + await 
WriteConfigFilesAsync(nginxConfig, ct); + + // Test configuration + var testResult = await TestConfigAsync(ct); + if (!testResult) + { + throw new NginxConfigurationException("Configuration test failed"); + } + + // Reload Nginx + await _reloader.ReloadAsync(ct); + + // Store state + await _stateStore.SaveAsync(new RoutingState + { + ContextId = config.AbReleaseId, + RouterType = RouterType, + Config = config, + Status = RoutingStateStatus.Applied, + AppliedAt = _timeProvider.GetUtcNow(), + LastVerifiedAt = _timeProvider.GetUtcNow() + }, ct); + + _logger.LogInformation( + "Successfully applied Nginx config for {ContextId}", + config.AbReleaseId); + } + catch (Exception ex) + { + _logger.LogError(ex, + "Failed to apply Nginx config for {ContextId}", + config.AbReleaseId); + + await _stateStore.SaveAsync(new RoutingState + { + ContextId = config.AbReleaseId, + RouterType = RouterType, + Config = config, + Status = RoutingStateStatus.Failed, + AppliedAt = _timeProvider.GetUtcNow(), + LastVerifiedAt = _timeProvider.GetUtcNow(), + Error = ex.Message + }, ct); + + throw; + } + } + + public async Task<RoutingConfig> GetCurrentAsync( + Guid contextId, + CancellationToken ct = default) + { + var state = await _stateStore.GetAsync(contextId, ct); + if (state is null) + { + throw new RoutingConfigNotFoundException(contextId); + } + + return state.Config; + } + + public async Task RemoveAsync( + Guid contextId, + CancellationToken ct = default) + { + _logger.LogInformation("Removing Nginx config for {ContextId}", contextId); + + // Remove all generated configuration files (ApplyAsync writes upstream, routing, and location configs) + foreach (var prefix in new[] { "upstream", "routing", "location" }) + { + var configPath = Path.Combine(_config.ConfigDirectory, $"{prefix}-{contextId:N}.conf"); + if (File.Exists(configPath)) + { + File.Delete(configPath); + } + } + + // Reload Nginx + await _reloader.ReloadAsync(ct); + + // Update state + await _stateStore.DeleteAsync(contextId, ct); + } + + public async Task<bool> HealthCheckAsync(CancellationToken ct = default) + { + try + { + // Check Nginx is running + var isRunning = await CheckNginxRunningAsync(ct); + if (!isRunning) + return false; + + 
// Check config is valid + var configValid = await TestConfigAsync(ct); + return configValid; + } + catch + { + return false; + } + } + + public async Task GetMetricsAsync( + Guid contextId, + CancellationToken ct = default) + { + var statusUrl = $"{_config.StatusEndpoint}/status"; + return await _statusParser.ParseAsync(statusUrl, contextId, ct); + } + + private async Task WriteConfigFilesAsync(NginxConfig config, CancellationToken ct) + { + var basePath = _config.ConfigDirectory; + Directory.CreateDirectory(basePath); + + // Write upstream config + var upstreamPath = Path.Combine(basePath, $"upstream-{config.ContextId:N}.conf"); + await File.WriteAllTextAsync(upstreamPath, config.UpstreamConfig, ct); + + // Write routing config + var routingPath = Path.Combine(basePath, $"routing-{config.ContextId:N}.conf"); + await File.WriteAllTextAsync(routingPath, config.RoutingConfig, ct); + + // Write location config + var locationPath = Path.Combine(basePath, $"location-{config.ContextId:N}.conf"); + await File.WriteAllTextAsync(locationPath, config.LocationConfig, ct); + } + + private async Task TestConfigAsync(CancellationToken ct) + { + var result = await ExecuteNginxCommandAsync("-t", ct); + return result.ExitCode == 0; + } + + private async Task CheckNginxRunningAsync(CancellationToken ct) + { + var result = await ExecuteNginxCommandAsync("-s reload", ct); + return result.ExitCode == 0; + } + + private async Task ExecuteNginxCommandAsync(string args, CancellationToken ct) + { + var psi = new ProcessStartInfo + { + FileName = _config.NginxPath, + Arguments = args, + RedirectStandardOutput = true, + RedirectStandardError = true, + UseShellExecute = false + }; + + using var process = Process.Start(psi); + if (process is null) + { + return new ProcessResult(-1, "", "Failed to start process"); + } + + await process.WaitForExitAsync(ct); + + return new ProcessResult( + process.ExitCode, + await process.StandardOutput.ReadToEndAsync(ct), + await 
process.StandardError.ReadToEndAsync(ct)); + } + + private string GetConfigPath(Guid contextId) + { + return Path.Combine(_config.ConfigDirectory, $"routing-{contextId:N}.conf"); + } +} + +public sealed record ProcessResult(int ExitCode, string Stdout, string Stderr); +``` + +### NginxConfigGenerator + +```csharp +namespace StellaOps.ReleaseOrchestrator.Progressive.Routers.Nginx; + +public sealed class NginxConfigGenerator +{ + private readonly NginxConfiguration _config; + + public NginxConfig Generate(RoutingConfig config) + { + var contextId = config.AbReleaseId.ToString("N"); + + var upstreamConfig = GenerateUpstreams(config, contextId); + var routingConfig = GenerateRouting(config, contextId); + var locationConfig = GenerateLocation(config, contextId); + + return new NginxConfig + { + ContextId = config.AbReleaseId, + UpstreamConfig = upstreamConfig, + RoutingConfig = routingConfig, + LocationConfig = locationConfig + }; + } + + private string GenerateUpstreams(RoutingConfig config, string contextId) + { + var sb = new StringBuilder(); + + // Control upstream + sb.AppendLine($"upstream control_{contextId} {{"); + foreach (var endpoint in config.ControlEndpoints) + { + sb.AppendLine($" server {endpoint};"); + } + sb.AppendLine("}"); + sb.AppendLine(); + + // Treatment upstream + if (config.TreatmentEndpoints.Count > 0) + { + sb.AppendLine($"upstream treatment_{contextId} {{"); + foreach (var endpoint in config.TreatmentEndpoints) + { + sb.AppendLine($" server {endpoint};"); + } + sb.AppendLine("}"); + } + + return sb.ToString(); + } + + private string GenerateRouting(RoutingConfig config, string contextId) + { + var sb = new StringBuilder(); + + switch (config.Strategy) + { + case RoutingStrategy.Weighted: + sb.Append(GenerateWeightedRouting(config, contextId)); + break; + + case RoutingStrategy.HeaderBased: + sb.Append(GenerateHeaderRouting(config, contextId)); + break; + + case RoutingStrategy.CookieBased: + sb.Append(GenerateCookieRouting(config, 
contextId)); + break; + + case RoutingStrategy.Combined: + sb.Append(GenerateCombinedRouting(config, contextId)); + break; + } + + return sb.ToString(); + } + + private string GenerateWeightedRouting(RoutingConfig config, string contextId) + { + var sb = new StringBuilder(); + + // Use split_clients for weighted distribution + sb.AppendLine($"split_clients \"$request_id\" $ab_upstream_{contextId} {{"); + sb.AppendLine($" {config.ControlWeight}% control_{contextId};"); + sb.AppendLine($" * treatment_{contextId};"); + sb.AppendLine("}"); + + return sb.ToString(); + } + + private string GenerateHeaderRouting(RoutingConfig config, string contextId) + { + var header = config.HeaderRouting!; + var sb = new StringBuilder(); + + // Use map for header-based routing + sb.AppendLine($"map $http_{NormalizeHeaderName(header.HeaderName)} $ab_upstream_{contextId} {{"); + sb.AppendLine($" default control_{contextId};"); + sb.AppendLine($" \"{header.TreatmentValue}\" treatment_{contextId};"); + sb.AppendLine("}"); + + return sb.ToString(); + } + + private string GenerateCookieRouting(RoutingConfig config, string contextId) + { + var cookie = config.CookieRouting!; + var sb = new StringBuilder(); + + // Use map for cookie-based routing + sb.AppendLine($"map $cookie_{cookie.CookieName} $ab_upstream_{contextId} {{"); + sb.AppendLine($" default control_{contextId};"); + sb.AppendLine($" \"{cookie.TreatmentValue}\" treatment_{contextId};"); + sb.AppendLine("}"); + + return sb.ToString(); + } + + private string GenerateCombinedRouting(RoutingConfig config, string contextId) + { + var sb = new StringBuilder(); + + // Check header first, then cookie, then weighted + sb.AppendLine($"# Combined routing for {contextId}"); + + if (config.HeaderRouting is not null) + { + var header = config.HeaderRouting; + sb.AppendLine($"map $http_{NormalizeHeaderName(header.HeaderName)} $ab_header_{contextId} {{"); + sb.AppendLine($" default \"\";"); + sb.AppendLine($" \"{header.TreatmentValue}\" 
treatment_{contextId};"); + sb.AppendLine("}"); + sb.AppendLine(); + } + + if (config.CookieRouting is not null) + { + var cookie = config.CookieRouting; + sb.AppendLine($"map $cookie_{cookie.CookieName} $ab_cookie_{contextId} {{"); + sb.AppendLine($" default \"\";"); + sb.AppendLine($" \"{cookie.TreatmentValue}\" treatment_{contextId};"); + sb.AppendLine("}"); + sb.AppendLine(); + } + + // Weighted fallback + sb.AppendLine($"split_clients \"$request_id\" $ab_weighted_{contextId} {{"); + sb.AppendLine($" {config.ControlWeight}% control_{contextId};"); + sb.AppendLine($" * treatment_{contextId};"); + sb.AppendLine("}"); + sb.AppendLine(); + + // Combined decision + sb.AppendLine($"map $ab_header_{contextId}$ab_cookie_{contextId} $ab_upstream_{contextId} {{"); + sb.AppendLine($" default $ab_weighted_{contextId};"); + sb.AppendLine($" \"~treatment_\" treatment_{contextId};"); + sb.AppendLine("}"); + + return sb.ToString(); + } + + private string GenerateLocation(RoutingConfig config, string contextId) + { + var sb = new StringBuilder(); + + sb.AppendLine($"# Location for A/B release {config.AbReleaseId}"); + sb.AppendLine($"location @ab_{contextId} {{"); + sb.AppendLine($" proxy_pass http://$ab_upstream_{contextId};"); + sb.AppendLine($" proxy_set_header Host $host;"); + sb.AppendLine($" proxy_set_header X-Real-IP $remote_addr;"); + sb.AppendLine($" proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;"); + sb.AppendLine($" proxy_set_header X-AB-Variant $ab_upstream_{contextId};"); + sb.AppendLine($" proxy_set_header X-AB-Release-Id {config.AbReleaseId};"); + sb.AppendLine("}"); + + return sb.ToString(); + } + + private static string NormalizeHeaderName(string headerName) + { + // Convert header name to Nginx variable format + // X-Canary-Test -> x_canary_test + return headerName.ToLowerInvariant().Replace('-', '_'); + } +} + +public sealed record NginxConfig +{ + public required Guid ContextId { get; init; } + public required string UpstreamConfig { get; init; 
}
+    public required string RoutingConfig { get; init; }
+    public required string LocationConfig { get; init; }
+}
+```
+
+### NginxReloader
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Progressive.Routers.Nginx;
+
+public sealed class NginxReloader
+{
+    private readonly NginxConfiguration _config;
+    private readonly ILogger _logger;
+    private readonly SemaphoreSlim _reloadLock = new(1, 1);
+
+    public async Task ReloadAsync(CancellationToken ct = default)
+    {
+        await _reloadLock.WaitAsync(ct);
+
+        try
+        {
+            _logger.LogDebug("Reloading Nginx configuration");
+
+            var result = await ExecuteAsync("-s reload", ct);
+
+            if (result.ExitCode != 0)
+            {
+                _logger.LogError(
+                    "Nginx reload failed: {Stderr}",
+                    result.Stderr);
+                throw new NginxReloadException(result.Stderr);
+            }
+
+            // Give workers a moment to pick up the new configuration
+            await Task.Delay(TimeSpan.FromMilliseconds(500), ct);
+
+            // Re-validate the configuration after the reload
+            var testResult = await ExecuteAsync("-t", ct);
+            if (testResult.ExitCode != 0)
+            {
+                throw new NginxReloadException("Post-reload test failed");
+            }
+
+            _logger.LogInformation("Nginx configuration reloaded successfully");
+        }
+        finally
+        {
+            _reloadLock.Release();
+        }
+    }
+
+    public async Task<bool> TestConfigAsync(CancellationToken ct = default)
+    {
+        var result = await ExecuteAsync("-t", ct);
+        return result.ExitCode == 0;
+    }
+
+    private async Task<ProcessResult> ExecuteAsync(string args, CancellationToken ct)
+    {
+        var psi = new ProcessStartInfo
+        {
+            FileName = _config.NginxPath,
+            Arguments = args,
+            RedirectStandardOutput = true,
+            RedirectStandardError = true,
+            UseShellExecute = false,
+            CreateNoWindow = true
+        };
+
+        using var process = new Process { StartInfo = psi };
+        process.Start();
+
+        var stdout = await process.StandardOutput.ReadToEndAsync(ct);
+        var stderr = await process.StandardError.ReadToEndAsync(ct);
+
+        await process.WaitForExitAsync(ct);
+
+        return new ProcessResult(process.ExitCode, stdout, stderr);
+    }
+}
+
+public sealed class NginxReloadException : Exception
+{
+    public
NginxReloadException(string message) : base(message) { } +} +``` + +### NginxStatusParser + +```csharp +namespace StellaOps.ReleaseOrchestrator.Progressive.Routers.Nginx; + +public sealed class NginxStatusParser +{ + private readonly HttpClient _httpClient; + private readonly TimeProvider _timeProvider; + private readonly ILogger _logger; + + public async Task ParseAsync( + string statusUrl, + Guid contextId, + CancellationToken ct = default) + { + try + { + // Get Nginx stub_status or extended status + var response = await _httpClient.GetStringAsync(statusUrl, ct); + + // Parse the status response + var status = ParseStatusResponse(response); + + // Get upstream-specific metrics if available + var upstreamMetrics = await GetUpstreamMetricsAsync(contextId, ct); + + return new RouterMetrics + { + ControlRequests = upstreamMetrics.ControlRequests, + TreatmentRequests = upstreamMetrics.TreatmentRequests, + ControlErrorRate = upstreamMetrics.ControlErrorRate, + TreatmentErrorRate = upstreamMetrics.TreatmentErrorRate, + ControlLatencyP50 = upstreamMetrics.ControlLatencyP50, + TreatmentLatencyP50 = upstreamMetrics.TreatmentLatencyP50, + CollectedAt = _timeProvider.GetUtcNow() + }; + } + catch (Exception ex) + { + _logger.LogWarning(ex, "Failed to parse Nginx status from {Url}", statusUrl); + + return new RouterMetrics + { + ControlRequests = 0, + TreatmentRequests = 0, + ControlErrorRate = 0, + TreatmentErrorRate = 0, + ControlLatencyP50 = 0, + TreatmentLatencyP50 = 0, + CollectedAt = _timeProvider.GetUtcNow() + }; + } + } + + private NginxStatus ParseStatusResponse(string response) + { + // Parse stub_status format: + // Active connections: 43 + // server accepts handled requests + // 7368 7368 10993 + // Reading: 0 Writing: 1 Waiting: 42 + + var lines = response.Split('\n', StringSplitOptions.RemoveEmptyEntries); + var status = new NginxStatus(); + + foreach (var line in lines) + { + if (line.StartsWith("Active connections:")) + { + var value = 
line.Split(':')[1].Trim(); + status.ActiveConnections = int.Parse(value, CultureInfo.InvariantCulture); + } + else if (line.Contains("Reading:")) + { + var parts = line.Split(new[] { ' ', ':' }, StringSplitOptions.RemoveEmptyEntries); + for (var i = 0; i < parts.Length; i++) + { + switch (parts[i]) + { + case "Reading": + status.Reading = int.Parse(parts[i + 1], CultureInfo.InvariantCulture); + break; + case "Writing": + status.Writing = int.Parse(parts[i + 1], CultureInfo.InvariantCulture); + break; + case "Waiting": + status.Waiting = int.Parse(parts[i + 1], CultureInfo.InvariantCulture); + break; + } + } + } + } + + return status; + } + + private async Task GetUpstreamMetricsAsync(Guid contextId, CancellationToken ct) + { + // This would typically query Nginx Plus API or a metrics exporter + // For open source Nginx, we'd use access log analysis or Prometheus metrics + + return new UpstreamMetrics + { + ControlRequests = 0, + TreatmentRequests = 0, + ControlErrorRate = 0, + TreatmentErrorRate = 0, + ControlLatencyP50 = 0, + TreatmentLatencyP50 = 0 + }; + } +} + +internal sealed class NginxStatus +{ + public int ActiveConnections { get; set; } + public int Accepts { get; set; } + public int Handled { get; set; } + public int Requests { get; set; } + public int Reading { get; set; } + public int Writing { get; set; } + public int Waiting { get; set; } +} + +internal sealed class UpstreamMetrics +{ + public long ControlRequests { get; set; } + public long TreatmentRequests { get; set; } + public double ControlErrorRate { get; set; } + public double TreatmentErrorRate { get; set; } + public double ControlLatencyP50 { get; set; } + public double TreatmentLatencyP50 { get; set; } +} +``` + +### NginxConfiguration + +```csharp +namespace StellaOps.ReleaseOrchestrator.Progressive.Routers.Nginx; + +public sealed class NginxConfiguration +{ + public string NginxPath { get; set; } = "/usr/sbin/nginx"; + public string ConfigDirectory { get; set; } = 
"/etc/nginx/conf.d/stella-ab"; + public string StatusEndpoint { get; set; } = "http://127.0.0.1:8080"; + public TimeSpan ReloadTimeout { get; set; } = TimeSpan.FromSeconds(30); + public bool TestConfigBeforeReload { get; set; } = true; + public bool BackupConfigOnChange { get; set; } = true; +} +``` + +--- + +## Acceptance Criteria + +- [ ] Generate upstream configuration +- [ ] Generate split_clients for weighted routing +- [ ] Generate map for header-based routing +- [ ] Generate map for cookie-based routing +- [ ] Support combined routing strategies +- [ ] Test configuration before reload +- [ ] Hot reload Nginx configuration +- [ ] Parse Nginx status for metrics +- [ ] Handle reload failures gracefully +- [ ] Unit test coverage >=85% + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 110_002 Traffic Router Framework | Internal | TODO | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| NginxRouter | TODO | | +| NginxConfigGenerator | TODO | | +| NginxReloader | TODO | | +| NginxStatusParser | TODO | | +| NginxConfiguration | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | +| 10-Jan-2026 | Added Router Plugin Catalog with HAProxy/Traefik/ALB as plugin reference examples | diff --git a/docs/implplan/SPRINT_20260110_111_000_INDEX_ui_implementation.md b/docs/implplan/SPRINT_20260110_111_000_INDEX_ui_implementation.md new file mode 100644 index 000000000..937adf5f7 --- /dev/null +++ b/docs/implplan/SPRINT_20260110_111_000_INDEX_ui_implementation.md @@ -0,0 +1,300 @@ +# SPRINT INDEX: Phase 11 - UI Implementation + +> **Epic:** Release Orchestrator +> **Phase:** 11 - UI Implementation +> **Batch:** 111 +> **Status:** TODO +> **Parent:** [100_000_INDEX](SPRINT_20260110_100_000_INDEX_release_orchestrator.md) + +--- + +## Overview + +Phase 11 implements the frontend UI for the Release 
Orchestrator - Angular-based dashboards, management screens, and workflow editors. + +### Objectives + +- Dashboard with pipeline overview +- Environment management UI +- Release management UI +- Visual workflow editor +- Promotion and approval UI +- Deployment monitoring UI +- Evidence viewer + +--- + +## Sprint Structure + +| Sprint ID | Title | Module | Status | Dependencies | +|-----------|-------|--------|--------|--------------| +| 111_001 | Dashboard - Overview | FE | TODO | 107_001 | +| 111_002 | Environment Management UI | FE | TODO | 103_001 | +| 111_003 | Release Management UI | FE | TODO | 104_003 | +| 111_004 | Workflow Editor | FE | TODO | 105_001 | +| 111_005 | Promotion & Approval UI | FE | TODO | 106_001 | +| 111_006 | Deployment Monitoring UI | FE | TODO | 107_001 | +| 111_007 | Evidence Viewer | FE | TODO | 109_002 | + +--- + +## Architecture + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ UI IMPLEMENTATION │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ DASHBOARD (111_001) │ │ +│ │ │ │ +│ │ ┌─────────────────────────────────────────────────────────────────┐ │ │ +│ │ │ Pipeline Overview │ │ │ +│ │ │ ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐ │ │ │ +│ │ │ │ DEV │──►│STAGE │──►│ UAT │──►│ PROD │ │ │ │ +│ │ │ │ ✓ 3 │ │ ✓ 2 │ │ ⟳ 1 │ │ ○ 0 │ │ │ │ +│ │ │ └──────┘ └──────┘ └──────┘ └──────┘ │ │ │ +│ │ └─────────────────────────────────────────────────────────────────┘ │ │ +│ │ │ │ +│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │ +│ │ │ Pending │ │ Active │ │ Recent │ │ │ +│ │ │ Approvals │ │ Deployments │ │ Releases │ │ │ +│ │ │ (5) │ │ (2) │ │ (12) │ │ │ +│ │ └─────────────┘ └─────────────┘ └─────────────┘ │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ ENVIRONMENT MANAGEMENT (111_002) │ │ +│ │ │ │ +│ │ Environments │ Environment: Production │ 
│ +│ │ ├── Development │ ┌───────────────────────────────────────────────┐ │ │ +│ │ ├── Staging │ │ Targets (4) │ Settings │ │ │ +│ │ ├── UAT │ │ ├── prod-web-01 │ Required Approvals: 2 │ │ │ +│ │ └── Production◄─┤ │ ├── prod-web-02 │ Freeze Windows: 1 │ │ │ +│ │ │ │ ├── prod-api-01 │ Auto-promote: disabled │ │ │ +│ │ │ │ └── prod-api-02 │ SoD: enabled │ │ │ +│ │ │ └───────────────────────────────────────────────┘ │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ WORKFLOW EDITOR (111_004) │ │ +│ │ │ │ +│ │ ┌─────────────────────────────────────────────────────────────────┐ │ │ +│ │ │ Visual DAG Editor │ │ │ +│ │ │ │ │ │ +│ │ │ ┌──────────┐ │ │ │ +│ │ │ │ Security │ │ │ │ +│ │ │ │ Gate │ │ │ │ +│ │ │ └────┬─────┘ │ │ │ +│ │ │ │ │ │ │ +│ │ │ ┌────▼─────┐ │ │ │ +│ │ │ │ Approval │ [ Step Palette ] │ │ │ +│ │ │ └────┬─────┘ ├── Script │ │ │ +│ │ │ │ ├── Approval │ │ │ +│ │ │ ┌────────┼────────┐ ├── Deploy │ │ │ +│ │ │ ▼ ▼ ├── Notify │ │ │ +│ │ │ ┌──────┐ ┌──────┐└── Gate │ │ │ +│ │ │ │Deploy│ │Smoke │ │ │ │ +│ │ │ └──┬───┘ └──┬───┘ │ │ │ +│ │ │ └───────┬───────┘ │ │ │ +│ │ │ ▼ │ │ │ +│ │ │ ┌─────────┐ │ │ │ +│ │ │ │ Notify │ │ │ │ +│ │ │ └─────────┘ │ │ │ +│ │ └─────────────────────────────────────────────────────────────────┘ │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ DEPLOYMENT MONITORING (111_006) │ │ +│ │ │ │ +│ │ Deployment: myapp-v2.3.1 → Production │ │ +│ │ ┌─────────────────────────────────────────────────────────────────┐ │ │ +│ │ │ Progress: 75% ████████████████████░░░░░░░ │ │ │ +│ │ │ │ │ │ +│ │ │ Target Status Duration Agent │ │ │ +│ │ │ prod-web-01 ✓ Done 2m 15s agent-01 │ │ │ +│ │ │ prod-web-02 ✓ Done 2m 08s agent-01 │ │ │ +│ │ │ prod-api-01 ⟳ Running 1m 45s agent-02 │ │ │ +│ │ │ prod-api-02 ○ Pending - - │ │ 
│ +│ │ │ │ │ │ +│ │ │ [View Logs] [Cancel] [Rollback] │ │ │ +│ │ └─────────────────────────────────────────────────────────────────┘ │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +--- + +## Deliverables Summary + +### 111_001: Dashboard - Overview + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `DashboardComponent` | Angular | Main dashboard | +| `PipelineOverview` | Component | Environment pipeline | +| `PendingApprovals` | Component | Approval queue | +| `ActiveDeployments` | Component | Running deployments | +| `RecentReleases` | Component | Release list | + +### 111_002: Environment Management UI + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `EnvironmentListComponent` | Angular | Environment list | +| `EnvironmentDetailComponent` | Angular | Environment detail | +| `TargetListComponent` | Component | Target management | +| `FreezeWindowEditor` | Component | Freeze window config | + +### 111_003: Release Management UI + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `ReleaseListComponent` | Angular | Release catalog | +| `ReleaseDetailComponent` | Angular | Release detail | +| `CreateReleaseWizard` | Component | Release creation | +| `ComponentSelector` | Component | Add components | + +### 111_004: Workflow Editor + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `WorkflowEditorComponent` | Angular | DAG editor | +| `StepPalette` | Component | Available steps | +| `StepConfigPanel` | Component | Step configuration | +| `DagCanvas` | Component | Visual DAG | +| `YamlEditor` | Component | Raw YAML editing | + +### 111_005: Promotion & Approval UI + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `PromotionRequestComponent` | Angular | Request promotion | +| 
`ApprovalQueueComponent` | Angular | Pending approvals | +| `ApprovalDetailComponent` | Angular | Approval action | +| `GateResultsPanel` | Component | Gate status | + +### 111_006: Deployment Monitoring UI + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `DeploymentMonitorComponent` | Angular | Deployment status | +| `TargetProgressList` | Component | Per-target progress | +| `LogStreamViewer` | Component | Real-time logs | +| `RollbackDialog` | Component | Rollback confirmation | + +### 111_007: Evidence Viewer + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `EvidenceListComponent` | Angular | Evidence packets | +| `EvidenceDetailComponent` | Angular | Evidence detail | +| `EvidenceVerifier` | Component | Verify signature | +| `ExportDialog` | Component | Export options | + +--- + +## Component Library + +```typescript +// Shared UI Components +@NgModule({ + declarations: [ + // Layout + PageHeaderComponent, + SideNavComponent, + BreadcrumbsComponent, + + // Data Display + StatusBadgeComponent, + ProgressBarComponent, + TimelineComponent, + DataTableComponent, + + // Forms + SearchInputComponent, + FilterPanelComponent, + JsonEditorComponent, + + // Feedback + ToastNotificationComponent, + ConfirmDialogComponent, + LoadingSpinnerComponent, + + // Domain + EnvironmentBadgeComponent, + ReleaseStatusComponent, + GateStatusIconComponent, + DigestDisplayComponent + ] +}) +export class SharedUiModule {} +``` + +--- + +## State Management + +```typescript +// NgRx Store Structure +interface AppState { + environments: EnvironmentsState; + releases: ReleasesState; + promotions: PromotionsState; + deployments: DeploymentsState; + evidence: EvidenceState; + ui: UiState; +} + +// Actions Pattern +export const EnvironmentActions = createActionGroup({ + source: 'Environments', + events: { + 'Load Environments': emptyProps(), + 'Load Environments Success': props<{ environments: Environment[] }>(), + 'Load 
Environments Failure': props<{ error: string }>(), + 'Select Environment': props<{ id: string }>(), + 'Create Environment': props<{ request: CreateEnvironmentRequest }>(), + 'Update Environment': props<{ id: string; request: UpdateEnvironmentRequest }>(), + 'Delete Environment': props<{ id: string }>() + } +}); +``` + +--- + +## Dependencies + +| Module | Purpose | +|--------|---------| +| All backend APIs | Data source | +| Angular 17 | Framework | +| NgRx | State management | +| PrimeNG | UI components | +| Monaco Editor | YAML/JSON editing | +| D3.js | DAG visualization | + +--- + +## Acceptance Criteria + +- [ ] Dashboard loads quickly (<2s) +- [ ] Environment CRUD works +- [ ] Target health displayed +- [ ] Release creation wizard works +- [ ] Workflow editor saves correctly +- [ ] DAG visualization renders +- [ ] Approval flow works end-to-end +- [ ] Deployment progress updates real-time +- [ ] Log streaming works +- [ ] Evidence verification shows result +- [ ] Export downloads file +- [ ] Responsive on tablet/desktop + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Phase 11 index created | diff --git a/docs/implplan/SPRINT_20260110_111_001_FE_dashboard_overview.md b/docs/implplan/SPRINT_20260110_111_001_FE_dashboard_overview.md new file mode 100644 index 000000000..165269e1b --- /dev/null +++ b/docs/implplan/SPRINT_20260110_111_001_FE_dashboard_overview.md @@ -0,0 +1,772 @@ +# SPRINT: Dashboard - Overview + +> **Sprint ID:** 111_001 +> **Module:** FE +> **Phase:** 11 - UI Implementation +> **Status:** TODO +> **Parent:** [111_000_INDEX](SPRINT_20260110_111_000_INDEX_ui_implementation.md) + +--- + +## Overview + +Implement the main Release Orchestrator dashboard providing at-a-glance visibility into pipeline status, pending approvals, active deployments, and recent releases. 
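
The pipeline overview reduces release state to per-environment counts (deployed vs. in-flight) before rendering the cards. A minimal sketch of that rollup as a pure function; `ReleaseSummary`, `EnvironmentCard`, and `rollupPipeline` are illustrative names, not part of the planned API:

```typescript
// Roll release summaries up into the per-environment counts shown on the
// dashboard's pipeline cards. Types and names here are illustrative only.
export interface ReleaseSummary {
  environment: string;
  status: 'pending' | 'deploying' | 'deployed';
}

export interface EnvironmentCard {
  environment: string;
  deployed: number;  // releases fully deployed in this environment
  inFlight: number;  // releases pending or currently deploying
}

export function rollupPipeline(
  releases: ReleaseSummary[],
  environmentOrder: string[],
): EnvironmentCard[] {
  // Preserve the configured promotion order (Dev -> Stage -> Prod) in the output.
  return environmentOrder.map((environment) => {
    const here = releases.filter((r) => r.environment === environment);
    return {
      environment,
      deployed: here.filter((r) => r.status === 'deployed').length,
      inFlight: here.filter((r) => r.status !== 'deployed').length,
    };
  });
}
```

Keeping the rollup pure makes it trivially unit-testable, independent of NgRx state or component lifecycle.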
+ +### Objectives + +- Pipeline overview showing environments progression +- Pending approvals count and quick actions +- Active deployments with progress indicators +- Recent releases list +- Real-time updates via SignalR + +### Working Directory + +``` +src/Web/StellaOps.Web/ +├── src/app/features/release-orchestrator/ +│ └── dashboard/ +│ ├── dashboard.component.ts +│ ├── dashboard.component.html +│ ├── dashboard.component.scss +│ ├── dashboard.routes.ts +│ ├── components/ +│ │ ├── pipeline-overview/ +│ │ │ ├── pipeline-overview.component.ts +│ │ │ ├── pipeline-overview.component.html +│ │ │ └── pipeline-overview.component.scss +│ │ ├── pending-approvals/ +│ │ │ ├── pending-approvals.component.ts +│ │ │ ├── pending-approvals.component.html +│ │ │ └── pending-approvals.component.scss +│ │ ├── active-deployments/ +│ │ │ ├── active-deployments.component.ts +│ │ │ ├── active-deployments.component.html +│ │ │ └── active-deployments.component.scss +│ │ └── recent-releases/ +│ │ ├── recent-releases.component.ts +│ │ ├── recent-releases.component.html +│ │ └── recent-releases.component.scss +│ └── services/ +│ └── dashboard.service.ts +└── src/app/store/release-orchestrator/ + └── dashboard/ + ├── dashboard.actions.ts + ├── dashboard.reducer.ts + ├── dashboard.effects.ts + └── dashboard.selectors.ts +``` + +--- + +## Deliverables + +### Dashboard Component + +```typescript +// dashboard.component.ts +import { Component, OnInit, OnDestroy, inject, ChangeDetectionStrategy } from '@angular/core'; +import { CommonModule } from '@angular/common'; +import { Store } from '@ngrx/store'; +import { Subject, takeUntil } from 'rxjs'; +import { PipelineOverviewComponent } from './components/pipeline-overview/pipeline-overview.component'; +import { PendingApprovalsComponent } from './components/pending-approvals/pending-approvals.component'; +import { ActiveDeploymentsComponent } from './components/active-deployments/active-deployments.component'; +import { RecentReleasesComponent } 
from './components/recent-releases/recent-releases.component';
+import { DashboardActions } from '@store/release-orchestrator/dashboard/dashboard.actions';
+import * as DashboardSelectors from '@store/release-orchestrator/dashboard/dashboard.selectors';
+
+@Component({
+  selector: 'so-dashboard',
+  standalone: true,
+  imports: [
+    CommonModule,
+    PipelineOverviewComponent,
+    PendingApprovalsComponent,
+    ActiveDeploymentsComponent,
+    RecentReleasesComponent
+  ],
+  templateUrl: './dashboard.component.html',
+  styleUrl: './dashboard.component.scss',
+  changeDetection: ChangeDetectionStrategy.OnPush
+})
+export class DashboardComponent implements OnInit, OnDestroy {
+  private readonly store = inject(Store);
+  private readonly destroy$ = new Subject<void>();
+
+  readonly loading$ = this.store.select(DashboardSelectors.selectLoading);
+  readonly error$ = this.store.select(DashboardSelectors.selectError);
+  readonly pipelineData$ = this.store.select(DashboardSelectors.selectPipelineData);
+  readonly pendingApprovals$ = this.store.select(DashboardSelectors.selectPendingApprovals);
+  readonly activeDeployments$ = this.store.select(DashboardSelectors.selectActiveDeployments);
+  readonly recentReleases$ = this.store.select(DashboardSelectors.selectRecentReleases);
+  readonly lastUpdated$ = this.store.select(DashboardSelectors.selectLastUpdated);
+
+  ngOnInit(): void {
+    this.store.dispatch(DashboardActions.loadDashboard());
+    this.store.dispatch(DashboardActions.subscribeToUpdates());
+  }
+
+  ngOnDestroy(): void {
+    this.store.dispatch(DashboardActions.unsubscribeFromUpdates());
+    this.destroy$.next();
+    this.destroy$.complete();
+  }
+
+  onRefresh(): void {
+    this.store.dispatch(DashboardActions.loadDashboard());
+  }
+}
+```
+
+```html
+<!-- dashboard.component.html -->
+<div class="dashboard">
+  <header class="dashboard__header">
+    <h1>Release Orchestrator</h1>
+    <div class="dashboard__actions">
+      <span class="dashboard__updated" *ngIf="lastUpdated$ | async as lastUpdated">
+        Last updated: {{ lastUpdated | date:'medium' }}
+      </span>
+      <button type="button" (click)="onRefresh()">Refresh</button>
+    </div>
+  </header>
+
+  <div class="dashboard__error" *ngIf="error$ | async as error">{{ error }}</div>
+
+  <so-pipeline-overview
+    [data]="pipelineData$ | async"
+    [loading]="(loading$ | async) ?? false">
+  </so-pipeline-overview>
+
+  <div class="dashboard__grid">
+    <so-pending-approvals
+      [approvals]="pendingApprovals$ | async"
+      [loading]="(loading$ | async) ?? false">
+    </so-pending-approvals>
+
+    <so-active-deployments
+      [deployments]="activeDeployments$ | async"
+      [loading]="(loading$ | async) ?? false">
+    </so-active-deployments>
+
+    <so-recent-releases
+      [releases]="recentReleases$ | async"
+      [loading]="(loading$ | async) ?? false">
+    </so-recent-releases>
+  </div>
+</div>
+
+``` + +### Pipeline Overview Component + +```typescript +// pipeline-overview.component.ts +import { Component, Input, ChangeDetectionStrategy } from '@angular/core'; +import { CommonModule } from '@angular/common'; +import { RouterModule } from '@angular/router'; + +export interface PipelineEnvironment { + id: string; + name: string; + order: number; + releaseCount: number; + pendingCount: number; + healthStatus: 'healthy' | 'degraded' | 'unhealthy' | 'unknown'; +} + +export interface PipelineData { + environments: PipelineEnvironment[]; + connections: Array<{ from: string; to: string }>; +} + +@Component({ + selector: 'so-pipeline-overview', + standalone: true, + imports: [CommonModule, RouterModule], + templateUrl: './pipeline-overview.component.html', + styleUrl: './pipeline-overview.component.scss', + changeDetection: ChangeDetectionStrategy.OnPush +}) +export class PipelineOverviewComponent { + @Input() data: PipelineData | null = null; + @Input() loading = false; + + getStatusIcon(status: string): string { + switch (status) { + case 'healthy': return 'pi-check-circle'; + case 'degraded': return 'pi-exclamation-triangle'; + case 'unhealthy': return 'pi-times-circle'; + default: return 'pi-question-circle'; + } + } + + getStatusClass(status: string): string { + return `env-card--${status}`; + } +} +``` + +```html + + +``` + +### Pending Approvals Component + +```typescript +// pending-approvals.component.ts +import { Component, Input, Output, EventEmitter, ChangeDetectionStrategy } from '@angular/core'; +import { CommonModule } from '@angular/common'; +import { RouterModule } from '@angular/router'; + +export interface PendingApproval { + id: string; + releaseId: string; + releaseName: string; + sourceEnvironment: string; + targetEnvironment: string; + requestedBy: string; + requestedAt: Date; + urgency: 'low' | 'normal' | 'high' | 'critical'; +} + +@Component({ + selector: 'so-pending-approvals', + standalone: true, + imports: [CommonModule, RouterModule], + 
templateUrl: './pending-approvals.component.html', + styleUrl: './pending-approvals.component.scss', + changeDetection: ChangeDetectionStrategy.OnPush +}) +export class PendingApprovalsComponent { + @Input() approvals: PendingApproval[] | null = null; + @Input() loading = false; + @Output() approve = new EventEmitter(); + @Output() reject = new EventEmitter(); + + getUrgencyClass(urgency: string): string { + return `approval--${urgency}`; + } + + onQuickApprove(event: Event, id: string): void { + event.preventDefault(); + event.stopPropagation(); + this.approve.emit(id); + } + + onQuickReject(event: Event, id: string): void { + event.preventDefault(); + event.stopPropagation(); + this.reject.emit(id); + } +} +``` + +```html + + +``` + +### Active Deployments Component + +```typescript +// active-deployments.component.ts +import { Component, Input, ChangeDetectionStrategy } from '@angular/core'; +import { CommonModule } from '@angular/common'; +import { RouterModule } from '@angular/router'; + +export interface ActiveDeployment { + id: string; + releaseId: string; + releaseName: string; + environment: string; + progress: number; + status: 'running' | 'paused' | 'waiting'; + startedAt: Date; + completedTargets: number; + totalTargets: number; +} + +@Component({ + selector: 'so-active-deployments', + standalone: true, + imports: [CommonModule, RouterModule], + templateUrl: './active-deployments.component.html', + styleUrl: './active-deployments.component.scss', + changeDetection: ChangeDetectionStrategy.OnPush +}) +export class ActiveDeploymentsComponent { + @Input() deployments: ActiveDeployment[] | null = null; + @Input() loading = false; + + getStatusIcon(status: string): string { + switch (status) { + case 'running': return 'pi-spin pi-spinner'; + case 'paused': return 'pi-pause'; + case 'waiting': return 'pi-clock'; + default: return 'pi-question'; + } + } + + getDuration(startedAt: Date): string { + const diff = Date.now() - new Date(startedAt).getTime(); + 
+    const minutes = Math.floor(diff / 60000);
+    const seconds = Math.floor((diff % 60000) / 1000);
+    return `${minutes}m ${seconds}s`;
+  }
+}
+```
+
+```html
+<!-- active-deployments.component.html -->
+<div class="active-deployments">
+  <h3>Active Deployments</h3>
+  <a class="deployment"
+     *ngFor="let deployment of deployments"
+     [routerLink]="['/release-orchestrator/deployments', deployment.id]">
+    <i class="pi" [ngClass]="getStatusIcon(deployment.status)"></i>
+    <div class="deployment__info">
+      <span class="deployment__release">{{ deployment.releaseName }}</span>
+      <span class="deployment__environment">{{ deployment.environment }}</span>
+    </div>
+    <div class="deployment__progress">
+      <progress max="100" [value]="deployment.progress"></progress>
+      <span>{{ deployment.completedTargets }}/{{ deployment.totalTargets }} targets &middot; {{ getDuration(deployment.startedAt) }}</span>
+    </div>
+  </a>
+  <p class="empty" *ngIf="!deployments?.length && !loading">No active deployments</p>
+</div>
+```
+
+### Recent Releases Component
+
+```typescript
+// recent-releases.component.ts
+import { Component, Input, ChangeDetectionStrategy } from '@angular/core';
+import { CommonModule } from '@angular/common';
+import { RouterModule } from '@angular/router';
+
+export interface RecentRelease {
+  id: string;
+  name: string;
+  version: string;
+  status: 'draft' | 'ready' | 'deploying' | 'deployed' | 'failed' | 'rolled_back';
+  currentEnvironment: string | null;
+  createdAt: Date;
+  createdBy: string;
+  componentCount: number;
+}
+
+@Component({
+  selector: 'so-recent-releases',
+  standalone: true,
+  imports: [CommonModule, RouterModule],
+  templateUrl: './recent-releases.component.html',
+  styleUrl: './recent-releases.component.scss',
+  changeDetection: ChangeDetectionStrategy.OnPush
+})
+export class RecentReleasesComponent {
+  @Input() releases: RecentRelease[] | null = null;
+  @Input() loading = false;
+
+  getStatusBadgeClass(status: string): string {
+    const map: Record<string, string> = {
+      draft: 'badge--secondary',
+      ready: 'badge--info',
+      deploying: 'badge--warning',
+      deployed: 'badge--success',
+      failed: 'badge--danger',
+      rolled_back: 'badge--warning'
+    };
+    return map[status] || 'badge--secondary';
+  }
+
+  formatStatus(status: string): string {
+    return status.replace('_', ' ').replace(/\b\w/g, c => c.toUpperCase());
+  }
+}
+```
+
+```html
+<!-- recent-releases.component.html -->
+<div class="panel recent-releases">
+  <div class="panel__header">
+    <h3>Recent Releases</h3>
+    <a routerLink="/release-orchestrator/releases">View all</a>
+  </div>
+
+  <table class="releases-table" *ngIf="!loading; else skeleton">
+    <thead>
+      <tr>
+        <th>Release</th>
+        <th>Status</th>
+        <th>Environment</th>
+        <th>Components</th>
+        <th>Created</th>
+      </tr>
+    </thead>
+    <tbody>
+      <tr *ngFor="let release of releases"
+          [routerLink]="['/release-orchestrator/releases', release.id]">
+        <td>{{ release.name }} {{ release.version }}</td>
+        <td>
+          <span class="badge" [ngClass]="getStatusBadgeClass(release.status)">
+            {{ formatStatus(release.status) }}
+          </span>
+        </td>
+        <td>{{ release.currentEnvironment || '-' }}</td>
+        <td>{{ release.componentCount }}</td>
+        <td>{{ release.createdAt | date:'short' }}</td>
+      </tr>
+      <tr *ngIf="!releases?.length">
+        <td colspan="5" class="empty">No releases found</td>
+      </tr>
+    </tbody>
+  </table>
+
+  <ng-template #skeleton>
+    <div class="skeleton-rows"></div>
+  </ng-template>
+</div>
+``` + +### Dashboard Store + +```typescript +// dashboard.actions.ts +import { createActionGroup, emptyProps, props } from '@ngrx/store'; +import { PipelineData, PendingApproval, ActiveDeployment, RecentRelease } from '../models'; + +export const DashboardActions = createActionGroup({ + source: 'Dashboard', + events: { + 'Load Dashboard': emptyProps(), + 'Load Dashboard Success': props<{ + pipelineData: PipelineData; + pendingApprovals: PendingApproval[]; + activeDeployments: ActiveDeployment[]; + recentReleases: RecentRelease[]; + }>(), + 'Load Dashboard Failure': props<{ error: string }>(), + 'Subscribe To Updates': emptyProps(), + 'Unsubscribe From Updates': emptyProps(), + 'Update Pipeline': props<{ pipelineData: PipelineData }>(), + 'Update Approvals': props<{ approvals: PendingApproval[] }>(), + 'Update Deployments': props<{ deployments: ActiveDeployment[] }>(), + 'Update Releases': props<{ releases: RecentRelease[] }>(), + } +}); + +// dashboard.reducer.ts +import { createReducer, on } from '@ngrx/store'; +import { DashboardActions } from './dashboard.actions'; + +export interface DashboardState { + pipelineData: PipelineData | null; + pendingApprovals: PendingApproval[]; + activeDeployments: ActiveDeployment[]; + recentReleases: RecentRelease[]; + loading: boolean; + error: string | null; + lastUpdated: Date | null; +} + +const initialState: DashboardState = { + pipelineData: null, + pendingApprovals: [], + activeDeployments: [], + recentReleases: [], + loading: false, + error: null, + lastUpdated: null +}; + +export const dashboardReducer = createReducer( + initialState, + on(DashboardActions.loadDashboard, (state) => ({ + ...state, + loading: true, + error: null + })), + on(DashboardActions.loadDashboardSuccess, (state, { pipelineData, pendingApprovals, activeDeployments, recentReleases }) => ({ + ...state, + pipelineData, + pendingApprovals, + activeDeployments, + recentReleases, + loading: false, + lastUpdated: new Date() + })), + 
+  on(DashboardActions.loadDashboardFailure, (state, { error }) => ({
+    ...state,
+    loading: false,
+    error
+  })),
+  on(DashboardActions.updatePipeline, (state, { pipelineData }) => ({
+    ...state,
+    pipelineData,
+    lastUpdated: new Date()
+  })),
+  on(DashboardActions.updateApprovals, (state, { approvals }) => ({
+    ...state,
+    pendingApprovals: approvals,
+    lastUpdated: new Date()
+  })),
+  on(DashboardActions.updateDeployments, (state, { deployments }) => ({
+    ...state,
+    activeDeployments: deployments,
+    lastUpdated: new Date()
+  })),
+  on(DashboardActions.updateReleases, (state, { releases }) => ({
+    ...state,
+    recentReleases: releases,
+    lastUpdated: new Date()
+  }))
+);
+
+// dashboard.selectors.ts
+import { createFeatureSelector, createSelector } from '@ngrx/store';
+import { DashboardState } from './dashboard.reducer';
+
+export const selectDashboardState = createFeatureSelector<DashboardState>('dashboard');
+
+export const selectLoading = createSelector(selectDashboardState, state => state.loading);
+export const selectError = createSelector(selectDashboardState, state => state.error);
+export const selectPipelineData = createSelector(selectDashboardState, state => state.pipelineData);
+export const selectPendingApprovals = createSelector(selectDashboardState, state => state.pendingApprovals);
+export const selectActiveDeployments = createSelector(selectDashboardState, state => state.activeDeployments);
+export const selectRecentReleases = createSelector(selectDashboardState, state => state.recentReleases);
+export const selectLastUpdated = createSelector(selectDashboardState, state => state.lastUpdated);
+export const selectPendingApprovalCount = createSelector(selectPendingApprovals, approvals => approvals.length);
+export const selectActiveDeploymentCount = createSelector(selectActiveDeployments, deployments => deployments.length);
+```
+
+### Dashboard Service
+
+```typescript
+// dashboard.service.ts
+import { Injectable, inject } from '@angular/core';
+import {
+  HttpClient } from '@angular/common/http';
+import { Observable, Subject } from 'rxjs';
+import { HubConnection, HubConnectionBuilder } from '@microsoft/signalr';
+import { environment } from '@env/environment';
+import { PipelineData, PendingApproval, ActiveDeployment, RecentRelease } from '../models';
+
+export interface DashboardData {
+  pipelineData: PipelineData;
+  pendingApprovals: PendingApproval[];
+  activeDeployments: ActiveDeployment[];
+  recentReleases: RecentRelease[];
+}
+
+@Injectable({ providedIn: 'root' })
+export class DashboardService {
+  private readonly http = inject(HttpClient);
+  private readonly baseUrl = `${environment.apiUrl}/api/v1/release-orchestrator/dashboard`;
+  private hubConnection: HubConnection | null = null;
+  private readonly updates$ = new Subject<Partial<DashboardData>>();
+
+  getDashboardData(): Observable<DashboardData> {
+    return this.http.get<DashboardData>(this.baseUrl);
+  }
+
+  subscribeToUpdates(): Observable<Partial<DashboardData>> {
+    if (!this.hubConnection) {
+      this.hubConnection = new HubConnectionBuilder()
+        .withUrl(`${environment.apiUrl}/hubs/dashboard`)
+        .withAutomaticReconnect()
+        .build();
+
+      this.hubConnection.on('PipelineUpdated', (data) => {
+        this.updates$.next({ pipelineData: data });
+      });
+
+      this.hubConnection.on('ApprovalsUpdated', (data) => {
+        this.updates$.next({ pendingApprovals: data });
+      });
+
+      this.hubConnection.on('DeploymentsUpdated', (data) => {
+        this.updates$.next({ activeDeployments: data });
+      });
+
+      this.hubConnection.on('ReleasesUpdated', (data) => {
+        this.updates$.next({ recentReleases: data });
+      });
+
+      this.hubConnection.start().catch(err => console.error('SignalR connection error:', err));
+    }
+
+    return this.updates$.asObservable();
+  }
+
+  unsubscribeFromUpdates(): void {
+    if (this.hubConnection) {
+      this.hubConnection.stop();
+      this.hubConnection = null;
+    }
+  }
+}
+```
+
+---
+
+## Acceptance Criteria
+
+- [ ] Dashboard loads within 2 seconds
+- [ ] Pipeline overview shows all environments
+- [ ] Environment health status displayed correctly
+- [ ] Pending approvals show count badge
+- [ ] Quick approve/reject actions work
+- 
[ ] Active deployments show progress +- [ ] Recent releases table paginated +- [ ] Real-time updates via SignalR +- [ ] Loading skeletons shown during fetch +- [ ] Error messages displayed appropriately +- [ ] Responsive layout on tablet/desktop +- [ ] Unit test coverage >=80% + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 107_001 Platform API Gateway | Internal | TODO | +| Angular 17 | External | Available | +| NgRx 17 | External | Available | +| PrimeNG 17 | External | Available | +| SignalR Client | External | Available | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| DashboardComponent | TODO | | +| PipelineOverviewComponent | TODO | | +| PendingApprovalsComponent | TODO | | +| ActiveDeploymentsComponent | TODO | | +| RecentReleasesComponent | TODO | | +| Dashboard NgRx Store | TODO | | +| DashboardService | TODO | | +| SignalR integration | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_111_002_FE_environment_management_ui.md b/docs/implplan/SPRINT_20260110_111_002_FE_environment_management_ui.md new file mode 100644 index 000000000..c51bb7c49 --- /dev/null +++ b/docs/implplan/SPRINT_20260110_111_002_FE_environment_management_ui.md @@ -0,0 +1,976 @@ +# SPRINT: Environment Management UI + +> **Sprint ID:** 111_002 +> **Module:** FE +> **Phase:** 11 - UI Implementation +> **Status:** TODO +> **Parent:** [111_000_INDEX](SPRINT_20260110_111_000_INDEX_ui_implementation.md) + +--- + +## Overview + +Implement the Environment Management UI providing CRUD operations for environments, target management, freeze window configuration, and environment settings. 
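The freeze-window model this sprint introduces (a start/end pair plus a `recurrence` flag) implies an evaluation rule the promotion gate must apply. A framework-free sketch of that rule follows; the recurrence semantics (the same time-of-day repeated per day or week, the same calendar day per month) and the field names are assumptions, and month-end dates such as the 31st are not handled:

```typescript
// Sketch: does a freeze window block a promotion at instant `now`?
type Recurrence = 'none' | 'daily' | 'weekly' | 'monthly';

interface FreezeWindowSpec {
  startTime: Date;        // start of the first occurrence
  endTime: Date;          // end of the first occurrence
  recurrence: Recurrence;
}

function isFrozen(w: FreezeWindowSpec, now: Date): boolean {
  const start = w.startTime.getTime();
  const end = w.endTime.getTime();
  const t = now.getTime();

  if (w.recurrence === 'none') {
    return start <= t && t <= end;          // single fixed interval
  }
  if (t < start) return false;              // recurrence has not begun yet

  if (w.recurrence === 'daily' || w.recurrence === 'weekly') {
    const period = (w.recurrence === 'daily' ? 1 : 7) * 24 * 3600_000;
    // Position of `now` inside the current cycle, measured from the first start.
    const offset = (t - start) % period;
    return offset <= end - start;
  }

  // monthly: same calendar day and time-of-day as the first occurrence
  const shifted = new Date(w.startTime);
  shifted.setFullYear(now.getFullYear());
  shifted.setMonth(now.getMonth());
  const s = shifted.getTime();
  return s <= t && t <= s + (end - start);
}
```

For example, a daily 09:00–10:00 window first occurring on 1 Jan is frozen at 09:30 on any later day and open again by 11:00.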
+ +### Objectives + +- Environment list with hierarchy visualization +- Environment detail with targets and settings +- Target management (add/remove/health) +- Freeze window editor +- Environment settings configuration + +### Working Directory + +``` +src/Web/StellaOps.Web/ +├── src/app/features/release-orchestrator/ +│ └── environments/ +│ ├── environment-list/ +│ │ ├── environment-list.component.ts +│ │ ├── environment-list.component.html +│ │ └── environment-list.component.scss +│ ├── environment-detail/ +│ │ ├── environment-detail.component.ts +│ │ ├── environment-detail.component.html +│ │ └── environment-detail.component.scss +│ ├── components/ +│ │ ├── target-list/ +│ │ ├── target-form/ +│ │ ├── freeze-window-editor/ +│ │ ├── environment-settings/ +│ │ └── environment-form/ +│ ├── services/ +│ │ └── environment.service.ts +│ └── environments.routes.ts +└── src/app/store/release-orchestrator/ + └── environments/ + ├── environments.actions.ts + ├── environments.reducer.ts + ├── environments.effects.ts + └── environments.selectors.ts +``` + +--- + +## Deliverables + +### Environment List Component + +```typescript +// environment-list.component.ts +import { Component, OnInit, inject, ChangeDetectionStrategy, signal, computed } from '@angular/core'; +import { CommonModule } from '@angular/common'; +import { RouterModule } from '@angular/router'; +import { Store } from '@ngrx/store'; +import { DialogService, DynamicDialogRef } from 'primeng/dynamicdialog'; +import { ConfirmationService } from 'primeng/api'; +import { EnvironmentActions } from '@store/release-orchestrator/environments/environments.actions'; +import * as EnvironmentSelectors from '@store/release-orchestrator/environments/environments.selectors'; +import { EnvironmentFormComponent } from '../components/environment-form/environment-form.component'; + +export interface Environment { + id: string; + name: string; + description: string; + order: number; + isProduction: boolean; + targetCount: number; + 
healthyTargetCount: number; + requiresApproval: boolean; + requiredApprovers: number; + freezeWindowCount: number; + activeFreezeWindow: boolean; + createdAt: Date; + updatedAt: Date; +} + +@Component({ + selector: 'so-environment-list', + standalone: true, + imports: [CommonModule, RouterModule], + templateUrl: './environment-list.component.html', + styleUrl: './environment-list.component.scss', + changeDetection: ChangeDetectionStrategy.OnPush, + providers: [DialogService, ConfirmationService] +}) +export class EnvironmentListComponent implements OnInit { + private readonly store = inject(Store); + private readonly dialogService = inject(DialogService); + private readonly confirmationService = inject(ConfirmationService); + + readonly environments$ = this.store.select(EnvironmentSelectors.selectAllEnvironments); + readonly loading$ = this.store.select(EnvironmentSelectors.selectLoading); + readonly error$ = this.store.select(EnvironmentSelectors.selectError); + + searchTerm = signal(''); + + ngOnInit(): void { + this.store.dispatch(EnvironmentActions.loadEnvironments()); + } + + onSearch(term: string): void { + this.searchTerm.set(term); + } + + onCreate(): void { + const ref = this.dialogService.open(EnvironmentFormComponent, { + header: 'Create Environment', + width: '600px', + data: { mode: 'create' } + }); + + ref.onClose.subscribe((result) => { + if (result) { + this.store.dispatch(EnvironmentActions.createEnvironment({ request: result })); + } + }); + } + + onDelete(env: Environment): void { + this.confirmationService.confirm({ + message: `Are you sure you want to delete "${env.name}"? 
This action cannot be undone.`, + header: 'Delete Environment', + icon: 'pi pi-exclamation-triangle', + acceptButtonStyleClass: 'p-button-danger', + accept: () => { + this.store.dispatch(EnvironmentActions.deleteEnvironment({ id: env.id })); + } + }); + } + + getHealthPercentage(env: Environment): number { + if (env.targetCount === 0) return 100; + return Math.round((env.healthyTargetCount / env.targetCount) * 100); + } + + getHealthClass(env: Environment): string { + const pct = this.getHealthPercentage(env); + if (pct >= 90) return 'health--good'; + if (pct >= 70) return 'health--warning'; + return 'health--critical'; + } +} +``` + +```html + +
+<!-- environment-list.component.html -->
+<div class="environment-list">
+  <div class="page-header">
+    <h2>Environments</h2>
+    <div class="page-header__actions">
+      <input #search type="text" placeholder="Search environments" (input)="onSearch(search.value)" />
+      <button type="button" class="p-button" (click)="onCreate()">
+        <i class="pi pi-plus"></i> Create Environment
+      </button>
+    </div>
+  </div>
+
+  <ng-container *ngIf="environments$ | async as environments">
+    <div class="environment-grid">
+      <div class="env-card" *ngFor="let env of environments" [routerLink]="[env.id]">
+        <div class="env-card__header">
+          <span class="env-card__order">#{{ env.order }}</span>
+          <span class="env-card__name">{{ env.name }}</span>
+          <span class="badge badge--production" *ngIf="env.isProduction">Production</span>
+        </div>
+
+        <p class="env-card__description">{{ env.description }}</p>
+
+        <div class="env-card__stats">
+          <div class="stat">
+            <span class="stat__value">{{ env.targetCount }}</span>
+            <span class="stat__label">Targets</span>
+          </div>
+          <div class="stat" [ngClass]="getHealthClass(env)">
+            <span class="stat__value">{{ getHealthPercentage(env) }}%</span>
+            <span class="stat__label">Healthy</span>
+          </div>
+          <div class="stat">
+            <span class="stat__value">{{ env.requiredApprovers }}</span>
+            <span class="stat__label">Approvers</span>
+          </div>
+        </div>
+
+        <div class="env-card__actions">
+          <button type="button" class="p-button p-button-text p-button-danger"
+                  (click)="onDelete(env); $event.stopPropagation()">
+            <i class="pi pi-trash"></i>
+          </button>
+        </div>
+      </div>
+    </div>
+
+    <div class="empty-state" *ngIf="environments.length === 0">
+      <h3>No environments yet</h3>
+      <p>Create your first environment to start managing releases.</p>
+      <button type="button" class="p-button" (click)="onCreate()">Create Environment</button>
+    </div>
+  </ng-container>
+ + +``` + +### Environment Detail Component + +```typescript +// environment-detail.component.ts +import { Component, OnInit, inject, ChangeDetectionStrategy } from '@angular/core'; +import { CommonModule } from '@angular/common'; +import { ActivatedRoute, RouterModule } from '@angular/router'; +import { Store } from '@ngrx/store'; +import { TabViewModule } from 'primeng/tabview'; +import { TargetListComponent } from '../components/target-list/target-list.component'; +import { FreezeWindowEditorComponent } from '../components/freeze-window-editor/freeze-window-editor.component'; +import { EnvironmentSettingsComponent } from '../components/environment-settings/environment-settings.component'; +import { EnvironmentActions } from '@store/release-orchestrator/environments/environments.actions'; +import * as EnvironmentSelectors from '@store/release-orchestrator/environments/environments.selectors'; + +@Component({ + selector: 'so-environment-detail', + standalone: true, + imports: [ + CommonModule, + RouterModule, + TabViewModule, + TargetListComponent, + FreezeWindowEditorComponent, + EnvironmentSettingsComponent + ], + templateUrl: './environment-detail.component.html', + styleUrl: './environment-detail.component.scss', + changeDetection: ChangeDetectionStrategy.OnPush +}) +export class EnvironmentDetailComponent implements OnInit { + private readonly store = inject(Store); + private readonly route = inject(ActivatedRoute); + + readonly environment$ = this.store.select(EnvironmentSelectors.selectSelectedEnvironment); + readonly targets$ = this.store.select(EnvironmentSelectors.selectSelectedEnvironmentTargets); + readonly freezeWindows$ = this.store.select(EnvironmentSelectors.selectSelectedEnvironmentFreezeWindows); + readonly loading$ = this.store.select(EnvironmentSelectors.selectLoading); + + activeTabIndex = 0; + + ngOnInit(): void { + const id = this.route.snapshot.paramMap.get('id'); + if (id) { + this.store.dispatch(EnvironmentActions.loadEnvironment({ id 
})); + this.store.dispatch(EnvironmentActions.loadEnvironmentTargets({ environmentId: id })); + this.store.dispatch(EnvironmentActions.loadFreezeWindows({ environmentId: id })); + } + } + + onTabChange(index: number): void { + this.activeTabIndex = index; + } +} +``` + +```html + +
+<!-- environment-detail.component.html -->
+<div class="environment-detail" *ngIf="environment$ | async as env">
+  <nav class="breadcrumb">
+    <a routerLink="/release-orchestrator/environments">Environments</a>
+    <i class="pi pi-angle-right"></i>
+    <span>{{ env.name }}</span>
+  </nav>
+
+  <div class="page-header">
+    <h2>
+      {{ env.name }}
+      <span class="badge badge--production" *ngIf="env.isProduction">Production</span>
+    </h2>
+    <p>{{ env.description }}</p>
+  </div>
+
+  <div class="stat-row">
+    <div class="stat">
+      <span class="stat__value">{{ env.targetCount }}</span>
+      <span class="stat__label">Deployment Targets</span>
+    </div>
+    <div class="stat">
+      <span class="stat__value">{{ env.requiredApprovers }}</span>
+      <span class="stat__label">Required Approvers</span>
+    </div>
+    <div class="stat">
+      <span class="stat__value">{{ env.freezeWindowCount }}</span>
+      <span class="stat__label">Freeze Windows</span>
+    </div>
+  </div>
+
+  <p-tabView [(activeIndex)]="activeTabIndex" (onChange)="onTabChange($event.index)">
+    <p-tabPanel header="Targets">
+      <so-target-list
+        [targets]="targets$ | async"
+        [environmentId]="env.id"
+        [loading]="(loading$ | async) ?? false">
+      </so-target-list>
+    </p-tabPanel>
+    <p-tabPanel header="Freeze Windows">
+      <so-freeze-window-editor
+        [freezeWindows]="freezeWindows$ | async"
+        [environmentId]="env.id">
+      </so-freeze-window-editor>
+    </p-tabPanel>
+    <p-tabPanel header="Settings">
+      <so-environment-settings
+        [environment]="env"
+        [loading]="(loading$ | async) ?? false">
+      </so-environment-settings>
+    </p-tabPanel>
+  </p-tabView>
+</div>
+```
+
+### Target List Component
+
+```typescript
+// target-list.component.ts
+import { Component, Input, inject, ChangeDetectionStrategy } from '@angular/core';
+import { CommonModule } from '@angular/common';
+import { Store } from '@ngrx/store';
+import { DialogService } from 'primeng/dynamicdialog';
+import { ConfirmationService, MessageService } from 'primeng/api';
+import { TargetFormComponent } from '../target-form/target-form.component';
+import { EnvironmentActions } from '@store/release-orchestrator/environments/environments.actions';
+
+export interface DeploymentTarget {
+  id: string;
+  environmentId: string;
+  name: string;
+  type: 'docker_host' | 'compose_host' | 'ecs_service' | 'nomad_job';
+  agentId: string | null;
+  agentStatus: 'connected' | 'disconnected' | 'unknown';
+  healthStatus: 'healthy' | 'unhealthy' | 'unknown';
+  lastHealthCheck: Date | null;
+  metadata: Record<string, unknown>;
+  createdAt: Date;
+}
+
+@Component({
+  selector: 'so-target-list',
+  standalone: true,
+  imports: [CommonModule],
+  templateUrl: './target-list.component.html',
+  styleUrl: './target-list.component.scss',
+  changeDetection: ChangeDetectionStrategy.OnPush,
+  providers: [DialogService, ConfirmationService, MessageService]
+})
+export class TargetListComponent {
+  @Input() targets: DeploymentTarget[] | null = null;
+  @Input() environmentId: string = '';
+  @Input() loading = false;
+
+  private readonly store = inject(Store);
+  private readonly dialogService = inject(DialogService);
+  private readonly confirmationService = inject(ConfirmationService);
+
+  onAddTarget(): void {
+    const ref = this.dialogService.open(TargetFormComponent, {
+      header: 'Add Deployment Target',
+      width: '600px',
+      data: { environmentId: this.environmentId, mode: 'create' }
+    });
+
+    ref.onClose.subscribe((result) => {
+      if (result) {
+        this.store.dispatch(EnvironmentActions.addTarget({
+          environmentId: this.environmentId,
+          request: result
+        }));
+      }
+    });
+  }
+
+  onEditTarget(target: DeploymentTarget):
+  void {
+    const ref = this.dialogService.open(TargetFormComponent, {
+      header: 'Edit Deployment Target',
+      width: '600px',
+      data: { environmentId: this.environmentId, target, mode: 'edit' }
+    });
+
+    ref.onClose.subscribe((result) => {
+      if (result) {
+        this.store.dispatch(EnvironmentActions.updateTarget({
+          environmentId: this.environmentId,
+          targetId: target.id,
+          request: result
+        }));
+      }
+    });
+  }
+
+  onRemoveTarget(target: DeploymentTarget): void {
+    this.confirmationService.confirm({
+      message: `Remove target "${target.name}" from this environment?`,
+      header: 'Remove Target',
+      icon: 'pi pi-exclamation-triangle',
+      accept: () => {
+        this.store.dispatch(EnvironmentActions.removeTarget({
+          environmentId: this.environmentId,
+          targetId: target.id
+        }));
+      }
+    });
+  }
+
+  onHealthCheck(target: DeploymentTarget): void {
+    this.store.dispatch(EnvironmentActions.checkTargetHealth({
+      environmentId: this.environmentId,
+      targetId: target.id
+    }));
+  }
+
+  getTypeIcon(type: string): string {
+    const icons: Record<string, string> = {
+      docker_host: 'pi-box',
+      compose_host: 'pi-th-large',
+      ecs_service: 'pi-cloud',
+      nomad_job: 'pi-sitemap'
+    };
+    return icons[type] || 'pi-server';
+  }
+
+  getHealthClass(status: string): string {
+    return `health-badge--${status}`;
+  }
+
+  getAgentStatusClass(status: string): string {
+    return `agent-status--${status}`;
+  }
+}
+```
+
+```html
+<!-- target-list.component.html -->
+<div class="target-list">
+  <div class="section-header">
+    <h3>Deployment Targets</h3>
+    <button type="button" class="p-button" (click)="onAddTarget()">
+      <i class="pi pi-plus"></i> Add Target
+    </button>
+  </div>
+
+  <table class="targets-table" *ngIf="targets?.length; else empty">
+    <thead>
+      <tr>
+        <th>Name</th>
+        <th>Type</th>
+        <th>Agent</th>
+        <th>Health</th>
+        <th>Last Check</th>
+        <th>Actions</th>
+      </tr>
+    </thead>
+    <tbody>
+      <tr *ngFor="let target of targets">
+        <td>
+          <i class="pi" [ngClass]="getTypeIcon(target.type)"></i>
+          {{ target.name }}
+        </td>
+        <td>{{ target.type | titlecase }}</td>
+        <td>
+          <span [ngClass]="getAgentStatusClass(target.agentStatus)">
+            {{ target.agentId || 'Not assigned' }}
+          </span>
+        </td>
+        <td>
+          <span class="health-badge" [ngClass]="getHealthClass(target.healthStatus)">
+            {{ target.healthStatus | titlecase }}
+          </span>
+        </td>
+        <td>{{ target.lastHealthCheck | date:'short' }}</td>
+        <td>
+          <button type="button" title="Run health check" (click)="onHealthCheck(target)"><i class="pi pi-heart"></i></button>
+          <button type="button" title="Edit" (click)="onEditTarget(target)"><i class="pi pi-pencil"></i></button>
+          <button type="button" title="Remove" (click)="onRemoveTarget(target)"><i class="pi pi-trash"></i></button>
+        </td>
+      </tr>
+    </tbody>
+  </table>
+
+  <ng-template #empty>
+    <div class="empty-state">
+      <p>No deployment targets configured</p>
+      <button type="button" class="p-button" (click)="onAddTarget()">Add Target</button>
+    </div>
+  </ng-template>
+</div>
+ + +``` + +### Freeze Window Editor Component + +```typescript +// freeze-window-editor.component.ts +import { Component, Input, inject, ChangeDetectionStrategy } from '@angular/core'; +import { CommonModule } from '@angular/common'; +import { FormBuilder, FormGroup, ReactiveFormsModule, Validators } from '@angular/forms'; +import { Store } from '@ngrx/store'; +import { DialogService } from 'primeng/dynamicdialog'; +import { EnvironmentActions } from '@store/release-orchestrator/environments/environments.actions'; + +export interface FreezeWindow { + id: string; + environmentId: string; + name: string; + reason: string; + startTime: Date; + endTime: Date; + recurrence: 'none' | 'daily' | 'weekly' | 'monthly'; + isActive: boolean; + createdBy: string; + createdAt: Date; +} + +@Component({ + selector: 'so-freeze-window-editor', + standalone: true, + imports: [CommonModule, ReactiveFormsModule], + templateUrl: './freeze-window-editor.component.html', + styleUrl: './freeze-window-editor.component.scss', + changeDetection: ChangeDetectionStrategy.OnPush, + providers: [DialogService] +}) +export class FreezeWindowEditorComponent { + @Input() freezeWindows: FreezeWindow[] | null = null; + @Input() environmentId: string = ''; + @Input() loading = false; + + private readonly store = inject(Store); + private readonly fb = inject(FormBuilder); + private readonly dialogService = inject(DialogService); + + showForm = false; + editingId: string | null = null; + + form: FormGroup = this.fb.group({ + name: ['', [Validators.required, Validators.maxLength(100)]], + reason: ['', [Validators.required, Validators.maxLength(500)]], + startTime: [null, Validators.required], + endTime: [null, Validators.required], + recurrence: ['none'] + }); + + recurrenceOptions = [ + { label: 'None (One-time)', value: 'none' }, + { label: 'Daily', value: 'daily' }, + { label: 'Weekly', value: 'weekly' }, + { label: 'Monthly', value: 'monthly' } + ]; + + onAdd(): void { + this.showForm = true; + 
this.editingId = null; + this.form.reset({ recurrence: 'none' }); + } + + onEdit(window: FreezeWindow): void { + this.showForm = true; + this.editingId = window.id; + this.form.patchValue({ + name: window.name, + reason: window.reason, + startTime: new Date(window.startTime), + endTime: new Date(window.endTime), + recurrence: window.recurrence + }); + } + + onCancel(): void { + this.showForm = false; + this.editingId = null; + this.form.reset(); + } + + onSave(): void { + if (this.form.invalid) return; + + const value = this.form.value; + if (this.editingId) { + this.store.dispatch(EnvironmentActions.updateFreezeWindow({ + environmentId: this.environmentId, + windowId: this.editingId, + request: value + })); + } else { + this.store.dispatch(EnvironmentActions.createFreezeWindow({ + environmentId: this.environmentId, + request: value + })); + } + + this.onCancel(); + } + + onDelete(window: FreezeWindow): void { + this.store.dispatch(EnvironmentActions.deleteFreezeWindow({ + environmentId: this.environmentId, + windowId: window.id + })); + } + + isActiveNow(window: FreezeWindow): boolean { + const now = new Date(); + return new Date(window.startTime) <= now && now <= new Date(window.endTime); + } + + getRecurrenceLabel(value: string): string { + return this.recurrenceOptions.find(o => o.value === value)?.label || value; + } +} +``` + +```html + +
+<!-- freeze-window-editor.component.html -->
+<div class="freeze-window-editor">
+  <div class="section-header">
+    <h3>Freeze Windows</h3>
+    <button type="button" class="p-button" (click)="onAdd()" *ngIf="!showForm">
+      <i class="pi pi-plus"></i> Add Freeze Window
+    </button>
+  </div>
+
+  <form class="freeze-window-form" [formGroup]="form" (ngSubmit)="onSave()" *ngIf="showForm">
+    <div class="form-field">
+      <label for="fw-name">Name</label>
+      <input id="fw-name" type="text" formControlName="name" />
+    </div>
+    <div class="form-field">
+      <label for="fw-reason">Reason</label>
+      <textarea id="fw-reason" rows="3" formControlName="reason"></textarea>
+    </div>
+    <div class="form-row">
+      <div class="form-field">
+        <label for="fw-start">Start Time</label>
+        <input id="fw-start" type="datetime-local" formControlName="startTime" />
+      </div>
+      <div class="form-field">
+        <label for="fw-end">End Time</label>
+        <input id="fw-end" type="datetime-local" formControlName="endTime" />
+      </div>
+    </div>
+    <div class="form-field">
+      <label for="fw-recurrence">Recurrence</label>
+      <select id="fw-recurrence" formControlName="recurrence">
+        <option *ngFor="let option of recurrenceOptions" [value]="option.value">{{ option.label }}</option>
+      </select>
+    </div>
+    <div class="form-actions">
+      <button type="submit" class="p-button" [disabled]="form.invalid">Save</button>
+      <button type="button" class="p-button p-button-text" (click)="onCancel()">Cancel</button>
+    </div>
+  </form>
+
+  <div class="freeze-window-list">
+    <div class="freeze-window" *ngFor="let window of freezeWindows">
+      <div class="freeze-window__header">
+        <i class="pi pi-lock"></i>
+        <span class="freeze-window__name">{{ window.name }}</span>
+        <span class="badge badge--active" *ngIf="isActiveNow(window)">Active</span>
+      </div>
+      <p class="freeze-window__reason">{{ window.reason }}</p>
+      <div class="freeze-window__schedule">
+        {{ window.startTime | date:'medium' }} - {{ window.endTime | date:'medium' }}
+        <span *ngIf="window.recurrence !== 'none'">({{ getRecurrenceLabel(window.recurrence) }})</span>
+      </div>
+      <div class="freeze-window__actions">
+        <button type="button" (click)="onEdit(window)"><i class="pi pi-pencil"></i></button>
+        <button type="button" (click)="onDelete(window)"><i class="pi pi-trash"></i></button>
+      </div>
+    </div>
+
+    <div class="empty-state" *ngIf="!freezeWindows?.length">
+      <p>No freeze windows configured</p>
+    </div>
+  </div>
+</div>
+``` + +### Environment Settings Component + +```typescript +// environment-settings.component.ts +import { Component, Input, inject, OnChanges, SimpleChanges, ChangeDetectionStrategy } from '@angular/core'; +import { CommonModule } from '@angular/common'; +import { FormBuilder, FormGroup, ReactiveFormsModule, Validators } from '@angular/forms'; +import { Store } from '@ngrx/store'; +import { EnvironmentActions } from '@store/release-orchestrator/environments/environments.actions'; + +@Component({ + selector: 'so-environment-settings', + standalone: true, + imports: [CommonModule, ReactiveFormsModule], + templateUrl: './environment-settings.component.html', + styleUrl: './environment-settings.component.scss', + changeDetection: ChangeDetectionStrategy.OnPush +}) +export class EnvironmentSettingsComponent implements OnChanges { + @Input() environment: Environment | null = null; + @Input() loading = false; + + private readonly store = inject(Store); + private readonly fb = inject(FormBuilder); + + form: FormGroup = this.fb.group({ + requiresApproval: [true], + requiredApprovers: [1, [Validators.min(0), Validators.max(10)]], + autoPromoteOnSuccess: [false], + separationOfDuties: [false], + notifyOnPromotion: [true], + notifyOnDeployment: [true], + notifyOnFailure: [true], + webhookUrl: [''], + maxConcurrentDeployments: [1, [Validators.min(1), Validators.max(100)]], + deploymentTimeout: [3600, [Validators.min(60), Validators.max(86400)]] + }); + + ngOnChanges(changes: SimpleChanges): void { + if (changes['environment'] && this.environment) { + this.form.patchValue({ + requiresApproval: this.environment.requiresApproval, + requiredApprovers: this.environment.requiredApprovers, + autoPromoteOnSuccess: this.environment.autoPromoteOnSuccess, + separationOfDuties: this.environment.separationOfDuties, + notifyOnPromotion: this.environment.notifyOnPromotion, + notifyOnDeployment: this.environment.notifyOnDeployment, + notifyOnFailure: this.environment.notifyOnFailure, + 
webhookUrl: this.environment.webhookUrl || '', + maxConcurrentDeployments: this.environment.maxConcurrentDeployments, + deploymentTimeout: this.environment.deploymentTimeout + }); + } + } + + onSave(): void { + if (this.form.invalid || !this.environment) return; + + this.store.dispatch(EnvironmentActions.updateEnvironmentSettings({ + id: this.environment.id, + settings: this.form.value + })); + } + + onReset(): void { + if (this.environment) { + this.ngOnChanges({ environment: { currentValue: this.environment } } as any); + } + } +} +``` + +```html + +
+<!-- environment-settings.component.html -->
+<form class="environment-settings" [formGroup]="form" (ngSubmit)="onSave()">
+  <section class="settings-section">
+    <h3>Approval Settings</h3>
+    <div class="form-field form-field--checkbox">
+      <input id="requiresApproval" type="checkbox" formControlName="requiresApproval" />
+      <label for="requiresApproval">Require approval before promotion</label>
+    </div>
+    <div class="form-field">
+      <label for="requiredApprovers">Required approvers</label>
+      <input id="requiredApprovers" type="number" min="0" max="10" formControlName="requiredApprovers" />
+    </div>
+    <div class="form-field form-field--checkbox">
+      <input id="autoPromoteOnSuccess" type="checkbox" formControlName="autoPromoteOnSuccess" />
+      <label for="autoPromoteOnSuccess">Auto-promote on successful deployment</label>
+    </div>
+    <div class="form-field form-field--checkbox">
+      <input id="separationOfDuties" type="checkbox" formControlName="separationOfDuties" />
+      <label for="separationOfDuties">Enforce separation of duties</label>
+    </div>
+  </section>
+
+  <section class="settings-section">
+    <h3>Notifications</h3>
+    <div class="form-field form-field--checkbox">
+      <input id="notifyOnPromotion" type="checkbox" formControlName="notifyOnPromotion" />
+      <label for="notifyOnPromotion">Notify on promotion</label>
+    </div>
+    <div class="form-field form-field--checkbox">
+      <input id="notifyOnDeployment" type="checkbox" formControlName="notifyOnDeployment" />
+      <label for="notifyOnDeployment">Notify on deployment</label>
+    </div>
+    <div class="form-field form-field--checkbox">
+      <input id="notifyOnFailure" type="checkbox" formControlName="notifyOnFailure" />
+      <label for="notifyOnFailure">Notify on failure</label>
+    </div>
+    <div class="form-field">
+      <label for="webhookUrl">Webhook URL</label>
+      <input id="webhookUrl" type="url" formControlName="webhookUrl" />
+    </div>
+  </section>
+
+  <section class="settings-section">
+    <h3>Deployment Limits</h3>
+    <div class="form-field">
+      <label for="maxConcurrentDeployments">Max concurrent deployments</label>
+      <input id="maxConcurrentDeployments" type="number" min="1" max="100" formControlName="maxConcurrentDeployments" />
+    </div>
+    <div class="form-field">
+      <label for="deploymentTimeout">Deployment timeout (seconds)</label>
+      <input id="deploymentTimeout" type="number" min="60" max="86400" formControlName="deploymentTimeout" />
+      <span class="hint">{{ form.get('deploymentTimeout')?.value / 60 | number:'1.0-0' }} minutes</span>
+    </div>
+  </section>
+
+  <div class="form-actions">
+    <button type="submit" class="p-button" [disabled]="form.invalid || loading">Save Settings</button>
+    <button type="button" class="p-button p-button-text" (click)="onReset()">Reset</button>
+  </div>
+</form>
+``` + +--- + +## Acceptance Criteria + +- [ ] Environment list displays all environments +- [ ] Environment cards show health status +- [ ] Create environment dialog works +- [ ] Delete environment with confirmation +- [ ] Environment detail loads correctly +- [ ] Target list shows all targets +- [ ] Add/edit/remove targets works +- [ ] Target health check triggers +- [ ] Freeze window CRUD operations work +- [ ] Freeze window recurrence saves +- [ ] Active freeze window highlighted +- [ ] Environment settings save correctly +- [ ] Form validation works +- [ ] Unit test coverage >=80% + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 103_001 Environment Model | Internal | TODO | +| Angular 17 | External | Available | +| NgRx 17 | External | Available | +| PrimeNG 17 | External | Available | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| EnvironmentListComponent | TODO | | +| EnvironmentDetailComponent | TODO | | +| TargetListComponent | TODO | | +| TargetFormComponent | TODO | | +| FreezeWindowEditorComponent | TODO | | +| EnvironmentSettingsComponent | TODO | | +| Environment NgRx Store | TODO | | +| EnvironmentService | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_111_003_FE_release_management_ui.md b/docs/implplan/SPRINT_20260110_111_003_FE_release_management_ui.md new file mode 100644 index 000000000..0ecbc0faa --- /dev/null +++ b/docs/implplan/SPRINT_20260110_111_003_FE_release_management_ui.md @@ -0,0 +1,931 @@ +# SPRINT: Release Management UI + +> **Sprint ID:** 111_003 +> **Module:** FE +> **Phase:** 11 - UI Implementation +> **Status:** TODO +> **Parent:** [111_000_INDEX](SPRINT_20260110_111_000_INDEX_ui_implementation.md) + +--- + +## Overview + +Implement the Release Management UI providing release catalog, release detail 
views, release creation wizard, and component selection functionality. + +### Objectives + +- Release catalog with filtering and search +- Release detail view with components +- Create release wizard (multi-step) +- Component selector with registry integration +- Release status tracking +- Release bundle comparison + +### Working Directory + +``` +src/Web/StellaOps.Web/ +├── src/app/features/release-orchestrator/ +│ └── releases/ +│ ├── release-list/ +│ │ ├── release-list.component.ts +│ │ ├── release-list.component.html +│ │ └── release-list.component.scss +│ ├── release-detail/ +│ │ ├── release-detail.component.ts +│ │ ├── release-detail.component.html +│ │ └── release-detail.component.scss +│ ├── create-release/ +│ │ ├── create-release.component.ts +│ │ ├── steps/ +│ │ │ ├── basic-info-step/ +│ │ │ ├── component-selection-step/ +│ │ │ ├── configuration-step/ +│ │ │ └── review-step/ +│ │ └── create-release.routes.ts +│ ├── components/ +│ │ ├── component-selector/ +│ │ ├── component-list/ +│ │ ├── release-timeline/ +│ │ └── release-comparison/ +│ └── releases.routes.ts +└── src/app/store/release-orchestrator/ + └── releases/ + ├── releases.actions.ts + ├── releases.reducer.ts + ├── releases.effects.ts + └── releases.selectors.ts +``` + +--- + +## Deliverables + +### Release List Component + +```typescript +// release-list.component.ts +import { Component, OnInit, inject, signal, computed, ChangeDetectionStrategy } from '@angular/core'; +import { CommonModule } from '@angular/common'; +import { RouterModule } from '@angular/router'; +import { FormsModule } from '@angular/forms'; +import { Store } from '@ngrx/store'; +import { ReleaseActions } from '@store/release-orchestrator/releases/releases.actions'; +import * as ReleaseSelectors from '@store/release-orchestrator/releases/releases.selectors'; + +export interface Release { + id: string; + name: string; + version: string; + description: string; + status: 'draft' | 'ready' | 'deploying' | 'deployed' | 'failed' | 
'rolled_back';
+  currentEnvironment: string | null;
+  targetEnvironment: string | null;
+  componentCount: number;
+  createdAt: Date;
+  createdBy: string;
+  updatedAt: Date;
+  deployedAt: Date | null;
+}
+
+@Component({
+  selector: 'so-release-list',
+  standalone: true,
+  imports: [CommonModule, RouterModule, FormsModule],
+  templateUrl: './release-list.component.html',
+  styleUrl: './release-list.component.scss',
+  changeDetection: ChangeDetectionStrategy.OnPush
+})
+export class ReleaseListComponent implements OnInit {
+  private readonly store = inject(Store);
+
+  readonly releases$ = this.store.select(ReleaseSelectors.selectFilteredReleases);
+  readonly loading$ = this.store.select(ReleaseSelectors.selectLoading);
+  readonly totalCount$ = this.store.select(ReleaseSelectors.selectTotalCount);
+
+  searchTerm = signal('');
+  statusFilter = signal<string[]>([]);
+  environmentFilter = signal<string | null>(null);
+  sortField = signal('createdAt');
+  sortOrder = signal<'asc' | 'desc'>('desc');
+
+  readonly statusOptions = [
+    { label: 'Draft', value: 'draft' },
+    { label: 'Ready', value: 'ready' },
+    { label: 'Deploying', value: 'deploying' },
+    { label: 'Deployed', value: 'deployed' },
+    { label: 'Failed', value: 'failed' },
+    { label: 'Rolled Back', value: 'rolled_back' }
+  ];
+
+  ngOnInit(): void {
+    this.loadReleases();
+  }
+
+  loadReleases(): void {
+    this.store.dispatch(ReleaseActions.loadReleases({
+      filter: {
+        search: this.searchTerm(),
+        statuses: this.statusFilter(),
+        environment: this.environmentFilter(),
+        sortField: this.sortField(),
+        sortOrder: this.sortOrder()
+      }
+    }));
+  }
+
+  onSearch(term: string): void {
+    this.searchTerm.set(term);
+    this.loadReleases();
+  }
+
+  onStatusFilterChange(statuses: string[]): void {
+    this.statusFilter.set(statuses);
+    this.loadReleases();
+  }
+
+  onSort(field: string): void {
+    if (this.sortField() === field) {
+      this.sortOrder.set(this.sortOrder() === 'asc' ?
'desc' : 'asc'); + } else { + this.sortField.set(field); + this.sortOrder.set('desc'); + } + this.loadReleases(); + } + + getStatusClass(status: string): string { + const classes: Record = { + draft: 'badge--secondary', + ready: 'badge--info', + deploying: 'badge--warning', + deployed: 'badge--success', + failed: 'badge--danger', + rolled_back: 'badge--warning' + }; + return classes[status] || 'badge--secondary'; + } + + formatStatus(status: string): string { + return status.replace('_', ' ').replace(/\b\w/g, c => c.toUpperCase()); + } +} +``` + +```html + +
+
+

Releases

+ +
+ +
+ + + + + + + + +
+ + + + + + Release + + Status + Environment + Components + + Created + + Actions + + + + + + + {{ release.name }} + {{ release.version }} + + {{ release.description }} + + + + {{ formatStatus(release.status) }} + + + + + {{ release.currentEnvironment }} + + - + + {{ release.componentCount }} + + {{ release.createdAt | date:'short' }} + by {{ release.createdBy }} + + + + + + + + + + +
+ +

No releases found

+

Create your first release to get started.

+ +
+ + +
+
+
+```
+
+### Release Detail Component
+
+```typescript
+// release-detail.component.ts
+import { Component, OnInit, inject, ChangeDetectionStrategy } from '@angular/core';
+import { CommonModule } from '@angular/common';
+import { ActivatedRoute, RouterModule } from '@angular/router';
+import { Store } from '@ngrx/store';
+import { ConfirmationService } from 'primeng/api';
+import { ComponentListComponent } from '../components/component-list/component-list.component';
+import { ReleaseTimelineComponent } from '../components/release-timeline/release-timeline.component';
+import { Release } from '../release-list/release-list.component';
+import { ReleaseActions } from '@store/release-orchestrator/releases/releases.actions';
+import * as ReleaseSelectors from '@store/release-orchestrator/releases/releases.selectors';
+
+export interface ReleaseComponent {
+  id: string;
+  name: string;
+  imageRef: string;
+  digest: string;
+  tag: string | null;
+  version: string;
+  type: 'container' | 'helm' | 'script';
+  configOverrides: Record<string, unknown>;
+}
+
+export interface ReleaseEvent {
+  id: string;
+  type: 'created' | 'promoted' | 'approved' | 'rejected' | 'deployed' | 'failed' | 'rolled_back';
+  environment: string | null;
+  actor: string;
+  message: string;
+  timestamp: Date;
+  metadata: Record<string, unknown>;
+}
+
+@Component({
+  selector: 'so-release-detail',
+  standalone: true,
+  imports: [CommonModule, RouterModule, ComponentListComponent, ReleaseTimelineComponent],
+  templateUrl: './release-detail.component.html',
+  styleUrl: './release-detail.component.scss',
+  changeDetection: ChangeDetectionStrategy.OnPush,
+  providers: [ConfirmationService]
+})
+export class ReleaseDetailComponent implements OnInit {
+  private readonly store = inject(Store);
+  private readonly route = inject(ActivatedRoute);
+  private readonly confirmationService = inject(ConfirmationService);
+
+  readonly release$ = this.store.select(ReleaseSelectors.selectSelectedRelease);
+  readonly components$ = this.store.select(ReleaseSelectors.selectSelectedReleaseComponents);
+  readonly events$ = this.store.select(ReleaseSelectors.selectSelectedReleaseEvents);
+  readonly loading$ = this.store.select(ReleaseSelectors.selectLoading);
+
+  activeTab = 'components';
+
+  ngOnInit(): void {
+    const id = this.route.snapshot.paramMap.get('id');
+    if (id) {
+      this.store.dispatch(ReleaseActions.loadRelease({ id }));
+      this.store.dispatch(ReleaseActions.loadReleaseComponents({ releaseId: id }));
+      this.store.dispatch(ReleaseActions.loadReleaseEvents({ releaseId: id }));
+    }
+  }
+
+  onPromote(release: Release): void {
+    this.store.dispatch(ReleaseActions.requestPromotion({ releaseId: release.id }));
+  }
+
+  onDeploy(release: Release): void {
+    this.confirmationService.confirm({
+      message: `Deploy "${release.name}" to ${release.targetEnvironment}?`,
+      header: 'Confirm Deployment',
+      icon: 'pi pi-exclamation-triangle',
+      accept: () => {
+        this.store.dispatch(ReleaseActions.deploy({ releaseId: release.id }));
+      }
+    });
+  }
+
+  onRollback(release: Release): void {
+    this.confirmationService.confirm({
+      message: `Rollback "${release.name}" from ${release.currentEnvironment}? This will restore the previous release.`,
+      header: 'Confirm Rollback',
+      icon: 'pi pi-exclamation-triangle',
+      acceptButtonStyleClass: 'p-button-danger',
+      accept: () => {
+        this.store.dispatch(ReleaseActions.rollback({ releaseId: release.id }));
+      }
+    });
+  }
+
+  canPromote(release: Release): boolean {
+    return release.status === 'ready' || release.status === 'deployed';
+  }
+
+  canDeploy(release: Release): boolean {
+    return release.status === 'ready' && release.targetEnvironment !== null;
+  }
+
+  canRollback(release: Release): boolean {
+    return release.status === 'deployed' || release.status === 'failed';
+  }
+
+  getStatusClass(status: string): string {
+    const classes: Record<string, string> = {
+      draft: 'badge--secondary',
+      ready: 'badge--info',
+      deploying: 'badge--warning',
+      deployed: 'badge--success',
+      failed: 'badge--danger',
+      rolled_back: 'badge--warning'
+    };
+    return classes[status] || 'badge--secondary';
+  }
+}
+```
+
+```html
+
+
+
+ Releases + + {{ release.name }} +
+ +
+
+

+ {{ release.name }} + {{ release.version }} + + {{ release.status | titlecase }} + +

+

{{ release.description }}

+
+
+ + + +
+
+ +
+
+ + Created by {{ release.createdBy }} +
+
+ + {{ release.createdAt | date:'medium' }} +
+
+ + Currently in {{ release.currentEnvironment }} +
+
+ + Target: {{ release.targetEnvironment }} +
+
+
+ +
+ + + +
+ +
+ + + + + + +
+
{{ release | json }}
+
+
+
+```
+
+### Create Release Wizard
+
+```typescript
+// create-release.component.ts
+import { Component, inject, signal, ChangeDetectionStrategy } from '@angular/core';
+import { CommonModule } from '@angular/common';
+import { Router } from '@angular/router';
+import { Store } from '@ngrx/store';
+import { StepsModule } from 'primeng/steps';
+import { MenuItem } from 'primeng/api';
+import { BasicInfoStepComponent } from './steps/basic-info-step/basic-info-step.component';
+import { ComponentSelectionStepComponent } from './steps/component-selection-step/component-selection-step.component';
+import { ConfigurationStepComponent } from './steps/configuration-step/configuration-step.component';
+import { ReviewStepComponent } from './steps/review-step/review-step.component';
+import { ReleaseComponent } from '../release-detail/release-detail.component';
+import { ReleaseActions } from '@store/release-orchestrator/releases/releases.actions';
+
+export interface CreateReleaseData {
+  basicInfo: {
+    name: string;
+    version: string;
+    description: string;
+  };
+  components: ReleaseComponent[];
+  configuration: {
+    targetEnvironment: string;
+    deploymentStrategy: string;
+    configOverrides: Record<string, unknown>;
+  };
+}
+
+@Component({
+  selector: 'so-create-release',
+  standalone: true,
+  imports: [
+    CommonModule,
+    StepsModule,
+    BasicInfoStepComponent,
+    ComponentSelectionStepComponent,
+    ConfigurationStepComponent,
+    ReviewStepComponent
+  ],
+  templateUrl: './create-release.component.html',
+  styleUrl: './create-release.component.scss',
+  changeDetection: ChangeDetectionStrategy.OnPush
+})
+export class CreateReleaseComponent {
+  private readonly store = inject(Store);
+  private readonly router = inject(Router);
+
+  activeIndex = signal(0);
+  releaseData = signal<Partial<CreateReleaseData>>({});
+
+  readonly steps: MenuItem[] = [
+    { label: 'Basic Info' },
+    { label: 'Components' },
+    { label: 'Configuration' },
+    { label: 'Review' }
+  ];
+
+  onBasicInfoComplete(data: CreateReleaseData['basicInfo']): void {
+    this.releaseData.update(current => ({ ...current, basicInfo: data }));
+    this.activeIndex.set(1);
+  }
+
+  onComponentsComplete(components: ReleaseComponent[]): void {
+    this.releaseData.update(current => ({ ...current, components }));
+    this.activeIndex.set(2);
+  }
+
+  onConfigurationComplete(config: CreateReleaseData['configuration']): void {
+    this.releaseData.update(current => ({ ...current, configuration: config }));
+    this.activeIndex.set(3);
+  }
+
+  onBack(): void {
+    this.activeIndex.update(i => Math.max(0, i - 1));
+  }
+
+  onCancel(): void {
+    this.router.navigate(['/releases']);
+  }
+
+  onSubmit(): void {
+    const data = this.releaseData();
+    if (data.basicInfo && data.components && data.configuration) {
+      this.store.dispatch(ReleaseActions.createRelease({
+        request: {
+          name: data.basicInfo.name,
+          version: data.basicInfo.version,
+          description: data.basicInfo.description,
+          components: data.components,
+          targetEnvironment: data.configuration.targetEnvironment,
+          deploymentStrategy: data.configuration.deploymentStrategy,
+          configOverrides: data.configuration.configOverrides
+        }
+      }));
+    }
+  }
+}
+```
+
+```html
+
+
+

Create Release

+ +
+ + + +
+ + + + + + + + + + + +
+
+```
+
+### Component Selector Component
+
+```typescript
+// component-selector.component.ts
+import { Component, Input, Output, EventEmitter, signal, ChangeDetectionStrategy } from '@angular/core';
+import { takeUntilDestroyed } from '@angular/core/rxjs-interop';
+import { CommonModule } from '@angular/common';
+import { FormsModule } from '@angular/forms';
+import { debounceTime, Subject } from 'rxjs';
+import { ReleaseComponent } from '../../release-detail/release-detail.component';
+
+export interface RegistryImage {
+  name: string;
+  repository: string;
+  tags: string[];
+  digests: Array<{ tag: string; digest: string; pushedAt: Date }>;
+  lastPushed: Date;
+}
+
+@Component({
+  selector: 'so-component-selector',
+  standalone: true,
+  imports: [CommonModule, FormsModule],
+  templateUrl: './component-selector.component.html',
+  styleUrl: './component-selector.component.scss',
+  changeDetection: ChangeDetectionStrategy.OnPush
+})
+export class ComponentSelectorComponent {
+  @Input() selectedComponents: ReleaseComponent[] = [];
+  @Output() selectionChange = new EventEmitter<ReleaseComponent[]>();
+  @Output() close = new EventEmitter<void>();
+
+  private readonly searchSubject = new Subject<string>();
+
+  searchTerm = signal('');
+  searchResults = signal<RegistryImage[]>([]);
+  loading = signal(false);
+  selectedImage = signal<RegistryImage | null>(null);
+  selectedDigest = signal<string | null>(null);
+
+  constructor() {
+    this.searchSubject.pipe(
+      debounceTime(300),
+      takeUntilDestroyed()
+    ).subscribe(term => this.search(term));
+  }
+
+  onSearchInput(term: string): void {
+    this.searchTerm.set(term);
+    this.searchSubject.next(term);
+  }
+
+  private search(term: string): void {
+    if (term.length < 2) {
+      this.searchResults.set([]);
+      return;
+    }
+
+    this.loading.set(true);
+    // API call would go here.
+    // For now, simulating with a timeout.
+    setTimeout(() => {
+      this.searchResults.set([
+        {
+          name: term,
+          repository: `registry.example.com/${term}`,
+          tags: ['latest', 'v1.0.0', 'v1.1.0'],
+          digests: [
+            { tag: 'latest', digest: 'sha256:abc123...', pushedAt: new Date() },
+            { tag: 'v1.0.0', digest: 'sha256:def456...', pushedAt: new Date() }
+          ],
+          lastPushed: new Date()
+        }
+      ]);
+      this.loading.set(false);
+    }, 500);
+  }
+
+  onSelectImage(image: RegistryImage): void {
+    this.selectedImage.set(image);
+    this.selectedDigest.set(null);
+  }
+
+  onSelectDigest(digest: string): void {
+    this.selectedDigest.set(digest);
+  }
+
+  onAddComponent(): void {
+    const image = this.selectedImage();
+    const digest = this.selectedDigest();
+
+    if (!image || !digest) return;
+
+    const digestInfo = image.digests.find(d => d.digest === digest);
+    const component: ReleaseComponent = {
+      id: crypto.randomUUID(),
+      name: image.name,
+      imageRef: image.repository,
+      digest: digest,
+      tag: digestInfo?.tag || null,
+      version: digestInfo?.tag || digest.substring(7, 19),
+      type: 'container',
+      configOverrides: {}
+    };
+
+    const updated = [...this.selectedComponents, component];
+    this.selectionChange.emit(updated);
+    this.selectedImage.set(null);
+    this.selectedDigest.set(null);
+  }
+
+  onRemoveComponent(id: string): void {
+    const updated = this.selectedComponents.filter(c => c.id !== id);
+    this.selectionChange.emit(updated);
+  }
+
+  isAlreadySelected(image: RegistryImage, digest: string): boolean {
+    return this.selectedComponents.some(
+      c => c.imageRef === image.repository && c.digest === digest
+    );
+  }
+
+  formatDigest(digest: string): string {
+    return digest.substring(0, 19) + '...';
+  }
+}
+```
+
+```html
+
+ + +
+
+ No images found matching "{{ searchTerm() }}" +
+ +
+
+ + {{ image.name }} +
+ {{ image.repository }} + +
+
+ {{ d.tag || 'untagged' }} + {{ formatDigest(d.digest) }} + {{ d.pushedAt | date:'short' }} + Added +
+
+
+
+ +
+

Selected Components ({{ selectedComponents.length }})

+
+ No components selected yet +
+
+
+ {{ comp.name }} + {{ comp.version }} + {{ formatDigest(comp.digest) }} +
+ +
+
+ +
+ +
+
+```
+
+### Release Timeline Component
+
+```typescript
+// release-timeline.component.ts
+import { Component, Input, ChangeDetectionStrategy } from '@angular/core';
+import { CommonModule } from '@angular/common';
+import { ReleaseEvent } from '../../release-detail/release-detail.component';
+
+@Component({
+  selector: 'so-release-timeline',
+  standalone: true,
+  imports: [CommonModule],
+  templateUrl: './release-timeline.component.html',
+  styleUrl: './release-timeline.component.scss',
+  changeDetection: ChangeDetectionStrategy.OnPush
+})
+export class ReleaseTimelineComponent {
+  @Input() events: ReleaseEvent[] | null = null;
+
+  getEventIcon(type: string): string {
+    const icons: Record<string, string> = {
+      created: 'pi-plus-circle',
+      promoted: 'pi-arrow-right',
+      approved: 'pi-check-circle',
+      rejected: 'pi-times-circle',
+      deployed: 'pi-cloud-upload',
+      failed: 'pi-exclamation-triangle',
+      rolled_back: 'pi-undo'
+    };
+    return icons[type] || 'pi-circle';
+  }
+
+  getEventClass(type: string): string {
+    const classes: Record<string, string> = {
+      created: 'event--info',
+      promoted: 'event--info',
+      approved: 'event--success',
+      rejected: 'event--danger',
+      deployed: 'event--success',
+      failed: 'event--danger',
+      rolled_back: 'event--warning'
+    };
+    return classes[type] || 'event--default';
+  }
+}
+```
+
+```html
+
+
+ +

No events yet

+
+ +
+
+
+ +
+
+
+ {{ event.type | titlecase }} + + {{ event.environment }} + +
+

{{ event.message }}

+
+ {{ event.actor }} + {{ event.timestamp | date:'medium' }} +
+
+
+
+
+```
+
+---
+
+## Acceptance Criteria
+
+- [ ] Release list displays all releases
+- [ ] Filtering by status works
+- [ ] Filtering by environment works
+- [ ] Search finds releases by name
+- [ ] Sorting by columns works
+- [ ] Pagination works correctly
+- [ ] Release detail loads correctly
+- [ ] Component list displays properly
+- [ ] Timeline shows release events
+- [ ] Promote action works
+- [ ] Deploy action prompts for confirmation
+- [ ] Rollback action prompts for confirmation
+- [ ] Create wizard completes all steps
+- [ ] Component selector searches registry
+- [ ] Component selector adds by digest
+- [ ] Review step shows all data
+- [ ] Unit test coverage >=80%
+
+---
+
+## Dependencies
+
+| Dependency | Type | Status |
+|------------|------|--------|
+| 104_003 Release Bundle | Internal | TODO |
+| Angular 17 | External | Available |
+| NgRx 17 | External | Available |
+| PrimeNG 17 | External | Available |
+
+---
+
+## Delivery Tracker
+
+| Deliverable | Status | Notes |
+|-------------|--------|-------|
+| ReleaseListComponent | TODO | |
+| ReleaseDetailComponent | TODO | |
+| CreateReleaseComponent | TODO | |
+| BasicInfoStepComponent | TODO | |
+| ComponentSelectionStepComponent | TODO | |
+| ConfigurationStepComponent | TODO | |
+| ReviewStepComponent | TODO | |
+| ComponentSelectorComponent | TODO | |
+| ComponentListComponent | TODO | |
+| ReleaseTimelineComponent | TODO | |
+| Release NgRx Store | TODO | |
+| Unit tests | TODO | |
+
+---
+
+## Execution Log
+
+| Date | Entry |
+|------|-------|
+| 10-Jan-2026 | Sprint created |
diff --git a/docs/implplan/SPRINT_20260110_111_004_FE_workflow_editor.md b/docs/implplan/SPRINT_20260110_111_004_FE_workflow_editor.md
new file mode 100644
index 000000000..f8623e7a1
--- /dev/null
+++ b/docs/implplan/SPRINT_20260110_111_004_FE_workflow_editor.md
@@ -0,0 +1,1189 @@
+# SPRINT: Workflow Editor
+
+> **Sprint ID:** 111_004
+> **Module:** FE
+> **Phase:** 11 - UI Implementation
+> **Status:** TODO
+> **Parent:** [111_000_INDEX](SPRINT_20260110_111_000_INDEX_ui_implementation.md)
+
+---
+
+## Overview
+
+Implement the visual workflow editor, providing DAG-based workflow design, a step palette, a step configuration panel, and YAML editing capabilities.
+
+### Objectives
+
+- Visual DAG editor with drag-and-drop
+- Step palette with available step types
+- Step configuration panel
+- Connection validation
+- YAML view with syntax highlighting
+- Import/export workflow definitions
+
+### Working Directory
+
+```
+src/Web/StellaOps.Web/
+├── src/app/features/release-orchestrator/
+│   └── workflows/
+│       ├── workflow-list/
+│       │   ├── workflow-list.component.ts
+│       │   ├── workflow-list.component.html
+│       │   └── workflow-list.component.scss
+│       ├── workflow-editor/
+│       │   ├── workflow-editor.component.ts
+│       │   ├── workflow-editor.component.html
+│       │   └── workflow-editor.component.scss
+│       ├── components/
+│       │   ├── dag-canvas/
+│       │   │   ├── dag-canvas.component.ts
+│       │   │   ├── dag-canvas.component.html
+│       │   │   └── dag-canvas.component.scss
+│       │   ├── step-palette/
+│       │   │   ├── step-palette.component.ts
+│       │   │   └── step-palette.component.html
+│       │   ├── step-config-panel/
+│       │   │   ├── step-config-panel.component.ts
+│       │   │   └── step-config-panel.component.html
+│       │   ├── step-node/
+│       │   │   ├── step-node.component.ts
+│       │   │   └── step-node.component.html
+│       │   └── yaml-editor/
+│       │       ├── yaml-editor.component.ts
+│       │       └── yaml-editor.component.html
+│       ├── services/
+│       │   ├── workflow.service.ts
+│       │   └── dag-layout.service.ts
+│       └── workflows.routes.ts
+└── src/app/store/release-orchestrator/
+    └── workflows/
+        ├── workflows.actions.ts
+        ├── workflows.reducer.ts
+        └── workflows.selectors.ts
+```
+
+---
+
+## Deliverables
+
+### Workflow Editor Component
+
+```typescript
+// workflow-editor.component.ts
+import { Component, OnInit, inject, signal, computed, ChangeDetectionStrategy } from '@angular/core';
+import { CommonModule } from '@angular/common';
+import { ActivatedRoute, Router } from '@angular/router';
+import { Store } from
'@ngrx/store';
+import { ConfirmationService, MessageService } from 'primeng/api';
+import { DagCanvasComponent } from '../components/dag-canvas/dag-canvas.component';
+import { StepPaletteComponent } from '../components/step-palette/step-palette.component';
+import { StepConfigPanelComponent } from '../components/step-config-panel/step-config-panel.component';
+import { YamlEditorComponent } from '../components/yaml-editor/yaml-editor.component';
+import { WorkflowActions } from '@store/release-orchestrator/workflows/workflows.actions';
+import * as WorkflowSelectors from '@store/release-orchestrator/workflows/workflows.selectors';
+
+export interface WorkflowStep {
+  id: string;
+  type: 'script' | 'approval' | 'deploy' | 'notify' | 'gate' | 'wait' | 'parallel' | 'manual' | 'webhook';
+  name: string;
+  config: Record<string, unknown>;
+  position: { x: number; y: number };
+  dependencies: string[];
+}
+
+export interface Workflow {
+  id: string;
+  name: string;
+  description: string;
+  version: number;
+  steps: WorkflowStep[];
+  connections: Array<{ from: string; to: string }>;
+  triggers: WorkflowTrigger[];
+  isDraft: boolean;
+  createdAt: Date;
+  updatedAt: Date;
+}
+
+export interface WorkflowTrigger {
+  type: 'manual' | 'promotion' | 'schedule' | 'webhook';
+  config: Record<string, unknown>;
+}
+
+@Component({
+  selector: 'so-workflow-editor',
+  standalone: true,
+  imports: [
+    CommonModule,
+    DagCanvasComponent,
+    StepPaletteComponent,
+    StepConfigPanelComponent,
+    YamlEditorComponent
+  ],
+  templateUrl: './workflow-editor.component.html',
+  styleUrl: './workflow-editor.component.scss',
+  changeDetection: ChangeDetectionStrategy.OnPush,
+  providers: [ConfirmationService, MessageService]
+})
+export class WorkflowEditorComponent implements OnInit {
+  private readonly store = inject(Store);
+  private readonly route = inject(ActivatedRoute);
+  private readonly router = inject(Router);
+  private readonly confirmationService = inject(ConfirmationService);
+  private readonly messageService = inject(MessageService);
+
+  readonly workflow$ = this.store.select(WorkflowSelectors.selectCurrentWorkflow);
+  readonly loading$ = this.store.select(WorkflowSelectors.selectLoading);
+  readonly validationErrors$ = this.store.select(WorkflowSelectors.selectValidationErrors);
+  readonly isDirty$ = this.store.select(WorkflowSelectors.selectIsDirty);
+
+  viewMode = signal<'visual' | 'yaml'>('visual');
+  selectedStepId = signal<string | null>(null);
+  showPalette = signal(true);
+  zoom = signal(100);
+
+  readonly selectedStep = computed(() => {
+    const stepId = this.selectedStepId();
+    // Placeholder: would resolve the step via a store selector in the real implementation.
+    return null;
+  });
+
+  ngOnInit(): void {
+    const id = this.route.snapshot.paramMap.get('id');
+    if (id && id !== 'new') {
+      this.store.dispatch(WorkflowActions.loadWorkflow({ id }));
+    } else {
+      this.store.dispatch(WorkflowActions.createNewWorkflow());
+    }
+  }
+
+  onStepAdded(step: Partial<WorkflowStep>): void {
+    this.store.dispatch(WorkflowActions.addStep({ step: step as WorkflowStep }));
+  }
+
+  onStepSelected(stepId: string | null): void {
+    this.selectedStepId.set(stepId);
+  }
+
+  onStepUpdated(step: WorkflowStep): void {
+    this.store.dispatch(WorkflowActions.updateStep({ step }));
+  }
+
+  onStepDeleted(stepId: string): void {
+    this.confirmationService.confirm({
+      message: 'Delete this step?
All connections to this step will also be removed.', + header: 'Delete Step', + icon: 'pi pi-exclamation-triangle', + accept: () => { + this.store.dispatch(WorkflowActions.deleteStep({ stepId })); + this.selectedStepId.set(null); + } + }); + } + + onConnectionAdded(connection: { from: string; to: string }): void { + this.store.dispatch(WorkflowActions.addConnection({ connection })); + } + + onConnectionRemoved(connection: { from: string; to: string }): void { + this.store.dispatch(WorkflowActions.removeConnection({ connection })); + } + + onStepMoved(stepId: string, position: { x: number; y: number }): void { + this.store.dispatch(WorkflowActions.moveStep({ stepId, position })); + } + + onYamlChanged(yaml: string): void { + this.store.dispatch(WorkflowActions.updateFromYaml({ yaml })); + } + + onSave(): void { + this.store.dispatch(WorkflowActions.saveWorkflow()); + } + + onValidate(): void { + this.store.dispatch(WorkflowActions.validateWorkflow()); + } + + onPublish(): void { + this.confirmationService.confirm({ + message: 'Publish this workflow? It will become available for use in releases.', + header: 'Publish Workflow', + accept: () => { + this.store.dispatch(WorkflowActions.publishWorkflow()); + } + }); + } + + onZoomIn(): void { + this.zoom.update(z => Math.min(200, z + 10)); + } + + onZoomOut(): void { + this.zoom.update(z => Math.max(25, z - 10)); + } + + onZoomReset(): void { + this.zoom.set(100); + } + + onToggleView(): void { + this.viewMode.update(m => m === 'visual' ? 'yaml' : 'visual'); + } + + canDeactivate(): boolean { + // Would check isDirty$ and prompt for confirmation + return true; + } +} +``` + +```html + +
+
+
+ + + + + Draft + Unsaved +
+ +
+
+ + +
+ + + + +
+
+ +
+ + +
    +
  • {{ error }}
  • +
+
+
+
+ +
+ + + + +
+
+ + {{ zoom() }}% + + + + +
+ + + +
+ + +
+ + +
+ + +
+
+
+ + + +``` + +### DAG Canvas Component + +```typescript +// dag-canvas.component.ts +import { Component, Input, Output, EventEmitter, ElementRef, ViewChild, + AfterViewInit, OnChanges, SimpleChanges, inject, ChangeDetectionStrategy } from '@angular/core'; +import { CommonModule } from '@angular/common'; +import * as d3 from 'd3'; +import { DagLayoutService } from '../../services/dag-layout.service'; +import { StepNodeComponent } from '../step-node/step-node.component'; + +interface DagNode { + id: string; + x: number; + y: number; + width: number; + height: number; + step: WorkflowStep; +} + +interface DagEdge { + from: string; + to: string; + path: string; +} + +@Component({ + selector: 'so-dag-canvas', + standalone: true, + imports: [CommonModule, StepNodeComponent], + templateUrl: './dag-canvas.component.html', + styleUrl: './dag-canvas.component.scss', + changeDetection: ChangeDetectionStrategy.OnPush +}) +export class DagCanvasComponent implements AfterViewInit, OnChanges { + @Input() workflow: Workflow | null = null; + @Input() zoom = 100; + @Input() selectedStepId: string | null = null; + + @Output() stepSelected = new EventEmitter(); + @Output() stepMoved = new EventEmitter<{ stepId: string; position: { x: number; y: number } }>(); + @Output() stepDeleted = new EventEmitter(); + @Output() connectionAdded = new EventEmitter<{ from: string; to: string }>(); + @Output() connectionRemoved = new EventEmitter<{ from: string; to: string }>(); + + @ViewChild('svgContainer') svgContainer!: ElementRef; + @ViewChild('canvasContainer') canvasContainer!: ElementRef; + + private readonly layoutService = inject(DagLayoutService); + + nodes: DagNode[] = []; + edges: DagEdge[] = []; + connectingFrom: string | null = null; + mousePosition = { x: 0, y: 0 }; + + ngAfterViewInit(): void { + this.setupDragDrop(); + this.setupPanZoom(); + } + + ngOnChanges(changes: SimpleChanges): void { + if (changes['workflow'] && this.workflow) { + this.computeLayout(); + } + } + + private 
computeLayout(): void { + if (!this.workflow) return; + + const layout = this.layoutService.computeLayout( + this.workflow.steps, + this.workflow.connections + ); + + this.nodes = layout.nodes; + this.edges = layout.edges; + } + + private setupDragDrop(): void { + // D3 drag behavior for nodes + const drag = d3.drag() + .on('start', (event, d) => { + d3.select(event.sourceEvent.target.closest('.dag-node')).raise(); + }) + .on('drag', (event, d) => { + d.x = event.x; + d.y = event.y; + this.updateNodePosition(d); + this.updateEdges(); + }) + .on('end', (event, d) => { + this.stepMoved.emit({ + stepId: d.id, + position: { x: d.x, y: d.y } + }); + }); + + // Apply to all node elements + d3.select(this.svgContainer.nativeElement) + .selectAll('.dag-node') + .call(drag); + } + + private setupPanZoom(): void { + const svg = d3.select(this.svgContainer.nativeElement); + const zoom = d3.zoom() + .scaleExtent([0.25, 2]) + .on('zoom', (event) => { + svg.select('.canvas-content') + .attr('transform', event.transform.toString()); + }); + + svg.call(zoom); + } + + private updateNodePosition(node: DagNode): void { + d3.select(this.svgContainer.nativeElement) + .select(`#node-${node.id}`) + .attr('transform', `translate(${node.x}, ${node.y})`); + } + + private updateEdges(): void { + this.edges = this.layoutService.computeEdgePaths(this.nodes, this.workflow?.connections || []); + } + + onNodeClick(node: DagNode, event: MouseEvent): void { + event.stopPropagation(); + this.stepSelected.emit(node.id); + } + + onCanvasClick(): void { + this.stepSelected.emit(null); + this.connectingFrom = null; + } + + onStartConnection(nodeId: string): void { + this.connectingFrom = nodeId; + } + + onEndConnection(nodeId: string): void { + if (this.connectingFrom && this.connectingFrom !== nodeId) { + this.connectionAdded.emit({ + from: this.connectingFrom, + to: nodeId + }); + } + this.connectingFrom = null; + } + + onEdgeClick(edge: DagEdge, event: MouseEvent): void { + event.stopPropagation(); + 
// Could show context menu for edge deletion + } + + onDeleteEdge(edge: DagEdge): void { + this.connectionRemoved.emit({ from: edge.from, to: edge.to }); + } + + onMouseMove(event: MouseEvent): void { + if (this.connectingFrom) { + const rect = this.canvasContainer.nativeElement.getBoundingClientRect(); + this.mousePosition = { + x: event.clientX - rect.left, + y: event.clientY - rect.top + }; + } + } + + getNodeTransform(node: DagNode): string { + return `translate(${node.x}, ${node.y})`; + } +} +``` + +```html + +
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ Release to connect +
+
+```
+
+### Step Palette Component
+
+```typescript
+// step-palette.component.ts
+import { Component, Output, EventEmitter, ChangeDetectionStrategy } from '@angular/core';
+import { CommonModule } from '@angular/common';
+import { DragDropModule, CdkDragEnd } from '@angular/cdk/drag-drop';
+import { WorkflowStep } from '../../workflow-editor/workflow-editor.component';
+
+export interface StepTemplate {
+  type: string;
+  name: string;
+  icon: string;
+  description: string;
+  category: 'actions' | 'control' | 'integration';
+  defaultConfig: Record<string, unknown>;
+}
+
+@Component({
+  selector: 'so-step-palette',
+  standalone: true,
+  imports: [CommonModule, DragDropModule],
+  templateUrl: './step-palette.component.html',
+  styleUrl: './step-palette.component.scss',
+  changeDetection: ChangeDetectionStrategy.OnPush
+})
+export class StepPaletteComponent {
+  @Output() stepDragStart = new EventEmitter<Partial<WorkflowStep>>();
+
+  readonly stepTemplates: StepTemplate[] = [
+    // Actions
+    {
+      type: 'script',
+      name: 'Script',
+      icon: 'pi-code',
+      description: 'Execute a shell script',
+      category: 'actions',
+      defaultConfig: { script: '', timeout: 300 }
+    },
+    {
+      type: 'deploy',
+      name: 'Deploy',
+      icon: 'pi-cloud-upload',
+      description: 'Deploy to target',
+      category: 'actions',
+      defaultConfig: { strategy: 'rolling', targetSelector: '' }
+    },
+    {
+      type: 'notify',
+      name: 'Notify',
+      icon: 'pi-bell',
+      description: 'Send notification',
+      category: 'actions',
+      defaultConfig: { channel: 'slack', message: '' }
+    },
+    // Control
+    {
+      type: 'approval',
+      name: 'Approval',
+      icon: 'pi-check-circle',
+      description: 'Wait for approval',
+      category: 'control',
+      defaultConfig: { approvers: [], minApprovals: 1 }
+    },
+    {
+      type: 'gate',
+      name: 'Gate',
+      icon: 'pi-shield',
+      description: 'Conditional gate',
+      category: 'control',
+      defaultConfig: { condition: '', failAction: 'stop' }
+    },
+    {
+      type: 'wait',
+      name: 'Wait',
+      icon: 'pi-clock',
+      description: 'Wait for duration',
+      category: 'control',
+      defaultConfig: { duration: 60 }
+    },
+    {
+      type: 'parallel',
+      name: 'Parallel',
+      icon: 'pi-arrows-h',
+      description: 'Run steps in parallel',
+      category: 'control',
+      defaultConfig: { maxConcurrency: 0 }
+    },
+    {
+      type: 'manual',
+      name: 'Manual',
+      icon: 'pi-user',
+      description: 'Manual intervention',
+      category: 'control',
+      defaultConfig: { instructions: '' }
+    },
+    // Integration
+    {
+      type: 'webhook',
+      name: 'Webhook',
+      icon: 'pi-link',
+      description: 'Call external webhook',
+      category: 'integration',
+      defaultConfig: { url: '', method: 'POST', headers: {} }
+    }
+  ];
+
+  readonly categories = [
+    { key: 'actions', label: 'Actions' },
+    { key: 'control', label: 'Control Flow' },
+    { key: 'integration', label: 'Integration' }
+  ];
+
+  getStepsByCategory(category: string): StepTemplate[] {
+    return this.stepTemplates.filter(s => s.category === category);
+  }
+
+  onDragEnd(event: CdkDragEnd, template: StepTemplate): void {
+    const step: Partial<WorkflowStep> = {
+      id: crypto.randomUUID(),
+      type: template.type as WorkflowStep['type'],
+      name: template.name,
+      config: { ...template.defaultConfig },
+      position: {
+        x: event.dropPoint.x,
+        y: event.dropPoint.y
+      },
+      dependencies: []
+    };
+
+    this.stepDragStart.emit(step);
+  }
+}
+```
+
+```html
+
+

Steps

+ +
+

{{ category.label }}

+
+
+
+ +
+
+ {{ template.name }} + {{ template.description }} +
+
+ + {{ template.name }} +
+
+
+
+
+```
+
+### Step Config Panel Component
+
+```typescript
+// step-config-panel.component.ts
+import { Component, Input, Output, EventEmitter, OnChanges, SimpleChanges,
+  inject, ChangeDetectionStrategy } from '@angular/core';
+import { CommonModule } from '@angular/common';
+import { FormBuilder, FormGroup, ReactiveFormsModule } from '@angular/forms';
+import { WorkflowStep } from '../../workflow-editor/workflow-editor.component';
+
+@Component({
+  selector: 'so-step-config-panel',
+  standalone: true,
+  imports: [CommonModule, ReactiveFormsModule],
+  templateUrl: './step-config-panel.component.html',
+  styleUrl: './step-config-panel.component.scss',
+  changeDetection: ChangeDetectionStrategy.OnPush
+})
+export class StepConfigPanelComponent implements OnChanges {
+  @Input() step: WorkflowStep | null = null;
+  @Output() stepUpdated = new EventEmitter<WorkflowStep>();
+  @Output() close = new EventEmitter<void>();
+
+  private readonly fb = inject(FormBuilder);
+
+  form: FormGroup = this.fb.group({
+    name: [''],
+    config: this.fb.group({})
+  });
+
+  ngOnChanges(changes: SimpleChanges): void {
+    if (changes['step'] && this.step) {
+      this.buildForm();
+    }
+  }
+
+  private buildForm(): void {
+    if (!this.step) return;
+
+    // Build a dynamic form based on the step type
+    const configGroup = this.getConfigFormGroup(this.step.type, this.step.config);
+
+    this.form = this.fb.group({
+      name: [this.step.name],
+      config: configGroup
+    });
+  }
+
+  private getConfigFormGroup(type: string, config: Record<string, unknown>): FormGroup {
+    switch (type) {
+      case 'script':
+        return this.fb.group({
+          script: [config['script'] || ''],
+          timeout: [config['timeout'] || 300],
+          shell: [config['shell'] || '/bin/bash']
+        });
+
+      case 'deploy':
+        return this.fb.group({
+          strategy: [config['strategy'] || 'rolling'],
+          targetSelector: [config['targetSelector'] || ''],
+          healthCheckPath: [config['healthCheckPath'] || '/health'],
+          rollbackOnFailure: [config['rollbackOnFailure'] ?? true]
+        });
+
+      case 'approval':
+        return this.fb.group({
+          approvers: [config['approvers'] || []],
+          minApprovals: [config['minApprovals'] || 1],
+          timeout: [config['timeout'] || 86400],
+          autoReject: [config['autoReject'] ?? false]
+        });
+
+      case 'gate':
+        return this.fb.group({
+          condition: [config['condition'] || ''],
+          failAction: [config['failAction'] || 'stop'],
+          retryCount: [config['retryCount'] || 0],
+          retryDelay: [config['retryDelay'] || 60]
+        });
+
+      case 'notify':
+        return this.fb.group({
+          channel: [config['channel'] || 'slack'],
+          message: [config['message'] || ''],
+          recipients: [config['recipients'] || []]
+        });
+
+      case 'wait':
+        return this.fb.group({
+          duration: [config['duration'] || 60],
+          until: [config['until'] || null]
+        });
+
+      case 'webhook':
+        return this.fb.group({
+          url: [config['url'] || ''],
+          method: [config['method'] || 'POST'],
+          headers: [config['headers'] || {}],
+          body: [config['body'] || ''],
+          expectedStatus: [config['expectedStatus'] || 200]
+        });
+
+      default:
+        return this.fb.group({});
+    }
+  }
+
+  onSave(): void {
+    if (!this.step || this.form.invalid) return;
+
+    const updated: WorkflowStep = {
+      ...this.step,
+      name: this.form.value.name,
+      config: this.form.value.config
+    };
+
+    this.stepUpdated.emit(updated);
+  }
+
+  getStepIcon(): string {
+    const icons: Record<string, string> = {
+      script: 'pi-code',
+      deploy: 'pi-cloud-upload',
+      approval: 'pi-check-circle',
+      gate: 'pi-shield',
+      notify: 'pi-bell',
+      wait: 'pi-clock',
+      parallel: 'pi-arrows-h',
+      manual: 'pi-user',
+      webhook: 'pi-link'
+    };
+    return icons[this.step?.type || ''] || 'pi-circle';
+  }
+}
+```
+
+```html
+
+
+
+ + {{ step?.type | titlecase }} Configuration +
+ +
+ +
+
+ + +
+ + + + +
+ + +
+
+ + +
+
+ + +
+
+ + + +
+ + +
+
+ + +
+
+ + +
+
+ + +
+
+ + + +
+ + +
+
+ + +
+
+ + +
+
+ + + +
+ + +
+
+ + +
+
+ + + +
+ + +
+
+ + +
+
+ + + +
+ + +
+
+
+ +
+ +
+
+
+```
+
+### YAML Editor Component
+
+```typescript
+// yaml-editor.component.ts
+import { Component, Input, Output, EventEmitter, AfterViewInit, ViewChild,
+  ElementRef, OnChanges, SimpleChanges, ChangeDetectionStrategy } from '@angular/core';
+import { CommonModule } from '@angular/common';
+import * as yaml from 'js-yaml';
+import * as monaco from 'monaco-editor';
+
+@Component({
+  selector: 'so-yaml-editor',
+  standalone: true,
+  imports: [CommonModule],
+  templateUrl: './yaml-editor.component.html',
+  styleUrl: './yaml-editor.component.scss',
+  changeDetection: ChangeDetectionStrategy.OnPush
+})
+export class YamlEditorComponent implements AfterViewInit, OnChanges {
+  @Input() workflow: Workflow | null = null;
+  @Output() yamlChanged = new EventEmitter<string>();
+
+  @ViewChild('editorContainer') editorContainer!: ElementRef;
+
+  private editor: monaco.editor.IStandaloneCodeEditor | null = null;
+  private ignoreChange = false;
+
+  parseError: string | null = null;
+
+  ngAfterViewInit(): void {
+    this.initEditor();
+  }
+
+  ngOnChanges(changes: SimpleChanges): void {
+    if (changes['workflow'] && this.editor && this.workflow) {
+      this.ignoreChange = true;
+      const yamlContent = this.workflowToYaml(this.workflow);
+      this.editor.setValue(yamlContent);
+      this.ignoreChange = false;
+    }
+  }
+
+  private initEditor(): void {
+    this.editor = monaco.editor.create(this.editorContainer.nativeElement, {
+      value: this.workflow ? this.workflowToYaml(this.workflow) : '',
+      language: 'yaml',
+      theme: 'vs-dark',
+      automaticLayout: true,
+      minimap: { enabled: false },
+      lineNumbers: 'on',
+      scrollBeyondLastLine: false,
+      wordWrap: 'on',
+      fontSize: 14,
+      tabSize: 2
+    });
+
+    this.editor.onDidChangeModelContent(() => {
+      if (this.ignoreChange) return;
+
+      const content = this.editor?.getValue() || '';
+      this.validateAndEmit(content);
+    });
+  }
+
+  private validateAndEmit(content: string): void {
+    try {
+      yaml.load(content);
+      this.parseError = null;
+      this.yamlChanged.emit(content);
+    } catch (e: any) {
+      this.parseError = e.message;
+    }
+  }
+
+  private workflowToYaml(workflow: Workflow): string {
+    const yamlObj = {
+      name: workflow.name,
+      description: workflow.description,
+      triggers: workflow.triggers,
+      steps: workflow.steps.map(step => ({
+        id: step.id,
+        name: step.name,
+        type: step.type,
+        depends_on: step.dependencies.length > 0 ? step.dependencies : undefined,
+        config: step.config
+      }))
+    };
+
+    return yaml.dump(yamlObj, {
+      indent: 2,
+      lineWidth: 120,
+      noRefs: true
+    });
+  }
+
+  onFormat(): void {
+    if (!this.editor) return;
+
+    try {
+      const content = this.editor.getValue();
+      const parsed = yaml.load(content);
+      const formatted = yaml.dump(parsed, { indent: 2, lineWidth: 120 });
+      this.ignoreChange = true;
+      this.editor.setValue(formatted);
+      this.ignoreChange = false;
+    } catch (e) {
+      // Ignore formatting errors; keep the user's text as-is
+    }
+  }
+
+  onCopy(): void {
+    if (this.editor) {
+      navigator.clipboard.writeText(this.editor.getValue());
+    }
+  }
+}
+```
+
+```html
+
+
+ + + + + {{ parseError }} + +
+
+
+``` + +--- + +## Acceptance Criteria + +- [ ] Workflow list displays all workflows +- [ ] Create new workflow initializes empty canvas +- [ ] Load existing workflow displays DAG +- [ ] Drag step from palette to canvas +- [ ] Connect steps by dragging +- [ ] Delete steps and connections +- [ ] Step configuration panel updates step +- [ ] Save workflow persists changes +- [ ] Publish workflow marks as non-draft +- [ ] YAML view shows valid YAML +- [ ] YAML changes reflect in visual view +- [ ] Syntax highlighting in YAML editor +- [ ] Zoom in/out works +- [ ] Pan canvas works +- [ ] Validation errors displayed +- [ ] Unit test coverage >=80% + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 105_001 Workflow DAG | Internal | TODO | +| Angular 17 | External | Available | +| D3.js | External | Available | +| Monaco Editor | External | Available | +| js-yaml | External | Available | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| WorkflowListComponent | TODO | | +| WorkflowEditorComponent | TODO | | +| DagCanvasComponent | TODO | | +| StepPaletteComponent | TODO | | +| StepConfigPanelComponent | TODO | | +| StepNodeComponent | TODO | | +| YamlEditorComponent | TODO | | +| DagLayoutService | TODO | | +| Workflow NgRx Store | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_111_005_FE_promotion_approval_ui.md b/docs/implplan/SPRINT_20260110_111_005_FE_promotion_approval_ui.md new file mode 100644 index 000000000..2d3407ce1 --- /dev/null +++ b/docs/implplan/SPRINT_20260110_111_005_FE_promotion_approval_ui.md @@ -0,0 +1,991 @@ +# SPRINT: Promotion & Approval UI + +> **Sprint ID:** 111_005 +> **Module:** FE +> **Phase:** 11 - UI Implementation +> **Status:** TODO +> **Parent:** [111_000_INDEX](SPRINT_20260110_111_000_INDEX_ui_implementation.md) + +--- 
+ +## Overview + +Implement the Promotion and Approval UI providing promotion request creation, approval queue management, approval detail views, and gate results display. + +### Objectives + +- Promotion request form with gate preview +- Approval queue with filtering +- Approval detail with gate results +- Approve/reject with comments +- Batch approval support +- Approval history + +### Working Directory + +``` +src/Web/StellaOps.Web/ +├── src/app/features/release-orchestrator/ +│ └── approvals/ +│ ├── promotion-request/ +│ │ ├── promotion-request.component.ts +│ │ ├── promotion-request.component.html +│ │ └── promotion-request.component.scss +│ ├── approval-queue/ +│ │ ├── approval-queue.component.ts +│ │ ├── approval-queue.component.html +│ │ └── approval-queue.component.scss +│ ├── approval-detail/ +│ │ ├── approval-detail.component.ts +│ │ ├── approval-detail.component.html +│ │ └── approval-detail.component.scss +│ ├── components/ +│ │ ├── gate-results-panel/ +│ │ ├── approval-form/ +│ │ ├── approval-history/ +│ │ └── approver-list/ +│ └── approvals.routes.ts +└── src/app/store/release-orchestrator/ + └── approvals/ + ├── approvals.actions.ts + ├── approvals.reducer.ts + └── approvals.selectors.ts +``` + +--- + +## Deliverables + +### Promotion Request Component + +```typescript +// promotion-request.component.ts +import { Component, OnInit, inject, signal, ChangeDetectionStrategy } from '@angular/core'; +import { CommonModule } from '@angular/common'; +import { ActivatedRoute, Router } from '@angular/router'; +import { FormBuilder, FormGroup, ReactiveFormsModule, Validators } from '@angular/forms'; +import { Store } from '@ngrx/store'; +import { GateResultsPanelComponent } from '../components/gate-results-panel/gate-results-panel.component'; +import { PromotionActions } from '@store/release-orchestrator/approvals/approvals.actions'; +import * as ApprovalSelectors from '@store/release-orchestrator/approvals/approvals.selectors'; + +export interface GateResult 
{ + gateId: string; + gateName: string; + type: 'security' | 'policy' | 'quality' | 'custom'; + status: 'passed' | 'failed' | 'warning' | 'pending' | 'skipped'; + message: string; + details: Record; + evaluatedAt: Date; +} + +export interface PromotionPreview { + releaseId: string; + releaseName: string; + sourceEnvironment: string; + targetEnvironment: string; + gateResults: GateResult[]; + allGatesPassed: boolean; + requiredApprovers: number; + estimatedDeployTime: number; + warnings: string[]; +} + +@Component({ + selector: 'so-promotion-request', + standalone: true, + imports: [CommonModule, ReactiveFormsModule, GateResultsPanelComponent], + templateUrl: './promotion-request.component.html', + styleUrl: './promotion-request.component.scss', + changeDetection: ChangeDetectionStrategy.OnPush +}) +export class PromotionRequestComponent implements OnInit { + private readonly store = inject(Store); + private readonly route = inject(ActivatedRoute); + private readonly router = inject(Router); + private readonly fb = inject(FormBuilder); + + readonly preview$ = this.store.select(ApprovalSelectors.selectPromotionPreview); + readonly loading$ = this.store.select(ApprovalSelectors.selectLoading); + readonly submitting$ = this.store.select(ApprovalSelectors.selectSubmitting); + + readonly environments$ = this.store.select(ApprovalSelectors.selectAvailableEnvironments); + + form: FormGroup = this.fb.group({ + targetEnvironment: ['', Validators.required], + urgency: ['normal'], + justification: ['', [Validators.required, Validators.minLength(10)]], + notifyApprovers: [true], + scheduledTime: [null] + }); + + urgencyOptions = [ + { label: 'Low', value: 'low' }, + { label: 'Normal', value: 'normal' }, + { label: 'High', value: 'high' }, + { label: 'Critical', value: 'critical' } + ]; + + releaseId: string = ''; + + ngOnInit(): void { + this.releaseId = this.route.snapshot.paramMap.get('releaseId') || ''; + if (this.releaseId) { + 
this.store.dispatch(PromotionActions.loadAvailableEnvironments({ releaseId: this.releaseId })); + } + + // Watch for target environment changes to fetch preview + this.form.get('targetEnvironment')?.valueChanges.subscribe(envId => { + if (envId) { + this.store.dispatch(PromotionActions.loadPromotionPreview({ + releaseId: this.releaseId, + targetEnvironmentId: envId + })); + } + }); + } + + onSubmit(): void { + if (this.form.invalid) return; + + this.store.dispatch(PromotionActions.submitPromotionRequest({ + releaseId: this.releaseId, + request: { + targetEnvironmentId: this.form.value.targetEnvironment, + urgency: this.form.value.urgency, + justification: this.form.value.justification, + notifyApprovers: this.form.value.notifyApprovers, + scheduledTime: this.form.value.scheduledTime + } + })); + } + + onCancel(): void { + this.router.navigate(['/releases', this.releaseId]); + } + + hasFailedGates(preview: PromotionPreview): boolean { + return preview.gateResults.some(g => g.status === 'failed'); + } + + getFailedGatesCount(preview: PromotionPreview): number { + return preview.gateResults.filter(g => g.status === 'failed').length; + } +} +``` + +```html + +
+
+

Request Promotion

+
+ +
+
+
+

Promotion Details

+ +
+ + + +
+ +
+ + + +
+ +
+ + + Minimum 10 characters required +
+ +
+ + + + Leave empty to deploy as soon as approved +
+ +
+ + +
+
+ + +
+

Gate Evaluation Preview

+ +
+
+ + All gates passed + + {{ getFailedGatesCount(preview) }} gate(s) failed + +
+
+ Required approvers: {{ preview.requiredApprovers }} + Est. deploy time: {{ preview.estimatedDeployTime }}s +
+
+ +
+ + +
    +
  • {{ warning }}
+
+
+
+
+ + +
+ +
+ + +
+
+
+
+``` + +### Approval Queue Component + +```typescript +// approval-queue.component.ts +import { Component, OnInit, inject, signal, ChangeDetectionStrategy } from '@angular/core'; +import { CommonModule } from '@angular/common'; +import { RouterModule } from '@angular/router'; +import { FormsModule } from '@angular/forms'; +import { Store } from '@ngrx/store'; +import { ConfirmationService } from 'primeng/api'; +import { ApprovalActions } from '@store/release-orchestrator/approvals/approvals.actions'; +import * as ApprovalSelectors from '@store/release-orchestrator/approvals/approvals.selectors'; + +export interface ApprovalRequest { + id: string; + releaseId: string; + releaseName: string; + releaseVersion: string; + sourceEnvironment: string; + targetEnvironment: string; + requestedBy: string; + requestedAt: Date; + urgency: 'low' | 'normal' | 'high' | 'critical'; + justification: string; + status: 'pending' | 'approved' | 'rejected' | 'expired'; + currentApprovals: number; + requiredApprovals: number; + gatesPassed: boolean; + scheduledTime: Date | null; + expiresAt: Date; +} + +@Component({ + selector: 'so-approval-queue', + standalone: true, + imports: [CommonModule, RouterModule, FormsModule], + templateUrl: './approval-queue.component.html', + styleUrl: './approval-queue.component.scss', + changeDetection: ChangeDetectionStrategy.OnPush, + providers: [ConfirmationService] +}) +export class ApprovalQueueComponent implements OnInit { + private readonly store = inject(Store); + private readonly confirmationService = inject(ConfirmationService); + + readonly approvals$ = this.store.select(ApprovalSelectors.selectFilteredApprovals); + readonly loading$ = this.store.select(ApprovalSelectors.selectLoading); + readonly selectedIds = signal>(new Set()); + + statusFilter = signal(['pending']); + urgencyFilter = signal([]); + environmentFilter = signal(null); + + readonly statusOptions = [ + { label: 'Pending', value: 'pending' }, + { label: 'Approved', value: 
'approved' }, + { label: 'Rejected', value: 'rejected' }, + { label: 'Expired', value: 'expired' } + ]; + + readonly urgencyOptions = [ + { label: 'Low', value: 'low' }, + { label: 'Normal', value: 'normal' }, + { label: 'High', value: 'high' }, + { label: 'Critical', value: 'critical' } + ]; + + ngOnInit(): void { + this.loadApprovals(); + } + + loadApprovals(): void { + this.store.dispatch(ApprovalActions.loadApprovals({ + filter: { + statuses: this.statusFilter(), + urgencies: this.urgencyFilter(), + environment: this.environmentFilter() + } + })); + } + + onStatusFilterChange(statuses: string[]): void { + this.statusFilter.set(statuses); + this.loadApprovals(); + } + + onToggleSelect(id: string): void { + this.selectedIds.update(ids => { + const newIds = new Set(ids); + if (newIds.has(id)) { + newIds.delete(id); + } else { + newIds.add(id); + } + return newIds; + }); + } + + onSelectAll(approvals: ApprovalRequest[]): void { + const pendingIds = approvals + .filter(a => a.status === 'pending') + .map(a => a.id); + this.selectedIds.set(new Set(pendingIds)); + } + + onDeselectAll(): void { + this.selectedIds.set(new Set()); + } + + onBatchApprove(): void { + const ids = Array.from(this.selectedIds()); + if (ids.length === 0) return; + + this.confirmationService.confirm({ + message: `Approve ${ids.length} promotion request(s)?`, + header: 'Batch Approve', + accept: () => { + this.store.dispatch(ApprovalActions.batchApprove({ ids, comment: 'Batch approved' })); + this.selectedIds.set(new Set()); + } + }); + } + + onBatchReject(): void { + const ids = Array.from(this.selectedIds()); + if (ids.length === 0) return; + + this.confirmationService.confirm({ + message: `Reject ${ids.length} promotion request(s)?`, + header: 'Batch Reject', + acceptButtonStyleClass: 'p-button-danger', + accept: () => { + this.store.dispatch(ApprovalActions.batchReject({ ids, comment: 'Batch rejected' })); + this.selectedIds.set(new Set()); + } + }); + } + + getUrgencyClass(urgency: string): 
string { + const classes: Record = { + low: 'urgency--low', + normal: 'urgency--normal', + high: 'urgency--high', + critical: 'urgency--critical' + }; + return classes[urgency] || ''; + } + + getStatusClass(status: string): string { + const classes: Record = { + pending: 'badge--warning', + approved: 'badge--success', + rejected: 'badge--danger', + expired: 'badge--secondary' + }; + return classes[status] || ''; + } + + isExpiringSoon(approval: ApprovalRequest): boolean { + const hoursUntilExpiry = (new Date(approval.expiresAt).getTime() - Date.now()) / 3600000; + return approval.status === 'pending' && hoursUntilExpiry < 4; + } +} +``` + +```html + +
+
+

Approval Queue

+
+ + +
+
+ +
+ {{ selectedIds().size }} selected + + + +
+ +
+ + + + + + + + Release + Promotion + Urgency + Status + Approvals + Requested + Actions + + + + + + + + + + + {{ approval.releaseName }} + {{ approval.releaseVersion }} + + + + + {{ approval.sourceEnvironment }} + + {{ approval.targetEnvironment }} + + + + + {{ approval.urgency | titlecase }} + + + + + {{ approval.status | titlecase }} + + + + + + + + {{ approval.currentApprovals }}/{{ approval.requiredApprovals }} + + + + {{ approval.requestedAt | date:'short' }} + by {{ approval.requestedBy }} + + + + + + + + + +
+ +

No approvals found

+

There are no promotion requests matching your filters.

+
+ + +
+
+
+ + + + +
+ + +``` + +### Approval Detail Component + +```typescript +// approval-detail.component.ts +import { Component, OnInit, inject, ChangeDetectionStrategy } from '@angular/core'; +import { CommonModule } from '@angular/common'; +import { ActivatedRoute, RouterModule } from '@angular/router'; +import { FormBuilder, FormGroup, ReactiveFormsModule, Validators } from '@angular/forms'; +import { Store } from '@ngrx/store'; +import { GateResultsPanelComponent } from '../components/gate-results-panel/gate-results-panel.component'; +import { ApprovalHistoryComponent } from '../components/approval-history/approval-history.component'; +import { ApproverListComponent } from '../components/approver-list/approver-list.component'; +import { ApprovalActions } from '@store/release-orchestrator/approvals/approvals.actions'; +import * as ApprovalSelectors from '@store/release-orchestrator/approvals/approvals.selectors'; + +export interface ApprovalAction { + id: string; + approvalId: string; + action: 'approved' | 'rejected'; + actor: string; + comment: string; + timestamp: Date; +} + +export interface ApprovalDetail extends ApprovalRequest { + gateResults: GateResult[]; + actions: ApprovalAction[]; + approvers: Array<{ + id: string; + name: string; + email: string; + hasApproved: boolean; + approvedAt: Date | null; + }>; + releaseComponents: Array<{ + name: string; + version: string; + digest: string; + }>; +} + +@Component({ + selector: 'so-approval-detail', + standalone: true, + imports: [ + CommonModule, + RouterModule, + ReactiveFormsModule, + GateResultsPanelComponent, + ApprovalHistoryComponent, + ApproverListComponent + ], + templateUrl: './approval-detail.component.html', + styleUrl: './approval-detail.component.scss', + changeDetection: ChangeDetectionStrategy.OnPush +}) +export class ApprovalDetailComponent implements OnInit { + private readonly store = inject(Store); + private readonly route = inject(ActivatedRoute); + private readonly fb = inject(FormBuilder); + + 
readonly approval$ = this.store.select(ApprovalSelectors.selectCurrentApproval); + readonly loading$ = this.store.select(ApprovalSelectors.selectLoading); + readonly canApprove$ = this.store.select(ApprovalSelectors.selectCanApprove); + readonly submitting$ = this.store.select(ApprovalSelectors.selectSubmitting); + + approvalForm: FormGroup = this.fb.group({ + comment: ['', Validators.required] + }); + + showApprovalForm = false; + pendingAction: 'approve' | 'reject' | null = null; + + ngOnInit(): void { + const id = this.route.snapshot.paramMap.get('id'); + if (id) { + this.store.dispatch(ApprovalActions.loadApproval({ id })); + } + } + + onStartApprove(): void { + this.pendingAction = 'approve'; + this.showApprovalForm = true; + } + + onStartReject(): void { + this.pendingAction = 'reject'; + this.showApprovalForm = true; + } + + onCancelAction(): void { + this.pendingAction = null; + this.showApprovalForm = false; + this.approvalForm.reset(); + } + + onSubmitAction(approvalId: string): void { + if (this.approvalForm.invalid || !this.pendingAction) return; + + if (this.pendingAction === 'approve') { + this.store.dispatch(ApprovalActions.approve({ + id: approvalId, + comment: this.approvalForm.value.comment + })); + } else { + this.store.dispatch(ApprovalActions.reject({ + id: approvalId, + comment: this.approvalForm.value.comment + })); + } + + this.onCancelAction(); + } + + getUrgencyClass(urgency: string): string { + return `urgency--${urgency}`; + } + + getStatusClass(status: string): string { + const classes: Record = { + pending: 'badge--warning', + approved: 'badge--success', + rejected: 'badge--danger', + expired: 'badge--secondary' + }; + return classes[status] || ''; + } + + getTimeRemaining(expiresAt: Date): string { + const ms = new Date(expiresAt).getTime() - Date.now(); + if (ms <= 0) return 'Expired'; + + const hours = Math.floor(ms / 3600000); + const minutes = Math.floor((ms % 3600000) / 60000); + + if (hours > 24) { + return `${Math.floor(hours / 
24)}d ${hours % 24}h`; + } + return `${hours}h ${minutes}m`; + } +} +``` + +```html + +
+
+
+ Approvals + + {{ approval.releaseName }} +
+ +
+
+

+ Promotion Request + + {{ approval.status | titlecase }} + +

+

+ {{ approval.sourceEnvironment }} + + {{ approval.targetEnvironment }} +

+
+ +
+ + +
+
+ +
+
+ + {{ approval.urgency | titlecase }} urgency +
+
+ + Requested by {{ approval.requestedBy }} +
+
+ + {{ approval.requestedAt | date:'medium' }} +
+
+ + Expires in {{ getTimeRemaining(approval.expiresAt) }} +
+
+
+ + +
+
+

{{ pendingAction === 'approve' ? 'Approve' : 'Reject' }} Promotion

+
+
+ + +
+
+ + +
+
+
+
+ +
+
+ +
+

Justification

+

{{ approval.justification }}

+
+ + +
+

+ Gate Evaluation + + {{ approval.gatesPassed ? 'All Passed' : 'Some Failed' }} + +

+ +
+ + +
+

Release Components

+
+
+ {{ comp.name }} + {{ comp.version }} + {{ comp.digest | slice:0:19 }}... +
+
+
+
+ + +
+
+ +
+ +
+``` + +### Gate Results Panel Component + +```typescript +// gate-results-panel.component.ts +import { Component, Input, ChangeDetectionStrategy } from '@angular/core'; +import { CommonModule } from '@angular/common'; + +@Component({ + selector: 'so-gate-results-panel', + standalone: true, + imports: [CommonModule], + templateUrl: './gate-results-panel.component.html', + styleUrl: './gate-results-panel.component.scss', + changeDetection: ChangeDetectionStrategy.OnPush +}) +export class GateResultsPanelComponent { + @Input() results: GateResult[] = []; + @Input() showDetails = true; + + expandedGates = new Set(); + + getGateIcon(status: string): string { + const icons: Record = { + passed: 'pi-check-circle', + failed: 'pi-times-circle', + warning: 'pi-exclamation-triangle', + pending: 'pi-spin pi-spinner', + skipped: 'pi-minus-circle' + }; + return icons[status] || 'pi-question-circle'; + } + + getGateClass(status: string): string { + return `gate--${status}`; + } + + getTypeIcon(type: string): string { + const icons: Record = { + security: 'pi-shield', + policy: 'pi-book', + quality: 'pi-chart-bar', + custom: 'pi-cog' + }; + return icons[type] || 'pi-circle'; + } + + toggleExpand(gateId: string): void { + if (this.expandedGates.has(gateId)) { + this.expandedGates.delete(gateId); + } else { + this.expandedGates.add(gateId); + } + } + + isExpanded(gateId: string): boolean { + return this.expandedGates.has(gateId); + } +} +``` + +```html + +
+
+ No gates configured for this promotion +
+ +
+
+
+ +
+
+ + + {{ gate.gateName }} + + {{ gate.message }} +
+
+ +
+
+ +
+
+ {{ item.key }} + {{ item.value | json }} +
+
+ Evaluated at {{ gate.evaluatedAt | date:'medium' }} +
+
+
+
+``` + +--- + +## Acceptance Criteria + +- [ ] Promotion request form validates input +- [ ] Target environment dropdown populated +- [ ] Gate preview loads on environment select +- [ ] Failed gates block submission (with override option) +- [ ] Approval queue shows pending requests +- [ ] Filtering by status works +- [ ] Filtering by urgency works +- [ ] Batch selection works +- [ ] Batch approve/reject works +- [ ] Approval detail loads correctly +- [ ] Approve action with comment +- [ ] Reject action with comment +- [ ] Gate results display correctly +- [ ] Approval progress shows correctly +- [ ] Approver list shows who approved +- [ ] Approval history timeline +- [ ] Unit test coverage >=80% + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 106_001 Promotion Request | Internal | TODO | +| Angular 17 | External | Available | +| NgRx 17 | External | Available | +| PrimeNG 17 | External | Available | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| PromotionRequestComponent | TODO | | +| ApprovalQueueComponent | TODO | | +| ApprovalDetailComponent | TODO | | +| GateResultsPanelComponent | TODO | | +| ApprovalFormComponent | TODO | | +| ApprovalHistoryComponent | TODO | | +| ApproverListComponent | TODO | | +| Approval NgRx Store | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_111_006_FE_deployment_monitoring_ui.md b/docs/implplan/SPRINT_20260110_111_006_FE_deployment_monitoring_ui.md new file mode 100644 index 000000000..bf59f6cb7 --- /dev/null +++ b/docs/implplan/SPRINT_20260110_111_006_FE_deployment_monitoring_ui.md @@ -0,0 +1,895 @@ +# SPRINT: Deployment Monitoring UI + +> **Sprint ID:** 111_006 +> **Module:** FE +> **Phase:** 11 - UI Implementation +> **Status:** TODO +> **Parent:** 
[111_000_INDEX](SPRINT_20260110_111_000_INDEX_ui_implementation.md) + +--- + +## Overview + +Implement the Deployment Monitoring UI providing real-time deployment status, per-target progress tracking, live log streaming, and rollback capabilities. + +### Objectives + +- Deployment status overview +- Per-target progress tracking +- Real-time log streaming +- Deployment actions (pause, resume, cancel) +- Rollback confirmation dialog +- Deployment history + +### Working Directory + +``` +src/Web/StellaOps.Web/ +├── src/app/features/release-orchestrator/ +│ └── deployments/ +│ ├── deployment-list/ +│ │ ├── deployment-list.component.ts +│ │ ├── deployment-list.component.html +│ │ └── deployment-list.component.scss +│ ├── deployment-monitor/ +│ │ ├── deployment-monitor.component.ts +│ │ ├── deployment-monitor.component.html +│ │ └── deployment-monitor.component.scss +│ ├── components/ +│ │ ├── target-progress-list/ +│ │ ├── log-stream-viewer/ +│ │ ├── deployment-timeline/ +│ │ ├── rollback-dialog/ +│ │ └── deployment-metrics/ +│ └── deployments.routes.ts +└── src/app/store/release-orchestrator/ + └── deployments/ + ├── deployments.actions.ts + ├── deployments.reducer.ts + └── deployments.selectors.ts +``` + +--- + +## Deliverables + +### Deployment Monitor Component + +```typescript +// deployment-monitor.component.ts +import { Component, OnInit, OnDestroy, inject, signal, ChangeDetectionStrategy } from '@angular/core'; +import { CommonModule } from '@angular/common'; +import { ActivatedRoute, RouterModule } from '@angular/router'; +import { Store } from '@ngrx/store'; +import { Subject, takeUntil } from 'rxjs'; +import { ConfirmationService, MessageService } from 'primeng/api'; +import { TargetProgressListComponent } from '../components/target-progress-list/target-progress-list.component'; +import { LogStreamViewerComponent } from '../components/log-stream-viewer/log-stream-viewer.component'; +import { DeploymentTimelineComponent } from 
'../components/deployment-timeline/deployment-timeline.component';
+import { DeploymentMetricsComponent } from '../components/deployment-metrics/deployment-metrics.component';
+import { RollbackDialogComponent } from '../components/rollback-dialog/rollback-dialog.component';
+import { DeploymentActions } from '@store/release-orchestrator/deployments/deployments.actions';
+import * as DeploymentSelectors from '@store/release-orchestrator/deployments/deployments.selectors';
+
+export interface Deployment {
+  id: string;
+  releaseId: string;
+  releaseName: string;
+  releaseVersion: string;
+  environmentId: string;
+  environmentName: string;
+  status: 'pending' | 'running' | 'paused' | 'completed' | 'failed' | 'cancelled' | 'rolling_back';
+  strategy: 'rolling' | 'blue_green' | 'canary' | 'all_at_once';
+  progress: number;
+  startedAt: Date;
+  completedAt: Date | null;
+  initiatedBy: string;
+  targets: DeploymentTarget[];
+}
+
+export interface DeploymentTarget {
+  id: string;
+  name: string;
+  type: string;
+  status: 'pending' | 'running' | 'completed' | 'failed' | 'skipped';
+  progress: number;
+  startedAt: Date | null;
+  completedAt: Date | null;
+  duration: number | null;
+  agentId: string;
+  error: string | null;
+}
+
+export interface DeploymentEvent {
+  id: string;
+  type: 'started' | 'target_started' | 'target_completed' | 'target_failed' | 'paused' | 'resumed' | 'completed' | 'failed' | 'cancelled' | 'rollback_started';
+  targetId: string | null;
+  targetName: string | null;
+  message: string;
+  timestamp: Date;
+}
+
+@Component({
+  selector: 'so-deployment-monitor',
+  standalone: true,
+  imports: [
+    CommonModule,
+    RouterModule,
+    TargetProgressListComponent,
+    LogStreamViewerComponent,
+    DeploymentTimelineComponent,
+    DeploymentMetricsComponent,
+    RollbackDialogComponent
+  ],
+  templateUrl: './deployment-monitor.component.html',
+  styleUrl: './deployment-monitor.component.scss',
+  changeDetection: ChangeDetectionStrategy.OnPush,
+  providers: [ConfirmationService, MessageService]
+})
+export class DeploymentMonitorComponent implements OnInit, OnDestroy {
+  private readonly store = inject(Store);
+  private readonly route = inject(ActivatedRoute);
+  private readonly confirmationService = inject(ConfirmationService);
+  private readonly destroy$ = new Subject<void>();
+
+  readonly deployment$ = this.store.select(DeploymentSelectors.selectCurrentDeployment);
+  readonly targets$ = this.store.select(DeploymentSelectors.selectDeploymentTargets);
+  readonly events$ = this.store.select(DeploymentSelectors.selectDeploymentEvents);
+  readonly logs$ = this.store.select(DeploymentSelectors.selectDeploymentLogs);
+  readonly metrics$ = this.store.select(DeploymentSelectors.selectDeploymentMetrics);
+  readonly loading$ = this.store.select(DeploymentSelectors.selectLoading);
+
+  selectedTargetId = signal<string | null>(null);
+  showRollbackDialog = signal(false);
+  activeTab = signal<'logs' | 'timeline' | 'metrics'>('logs');
+
+  ngOnInit(): void {
+    const id = this.route.snapshot.paramMap.get('id');
+    if (id) {
+      this.store.dispatch(DeploymentActions.loadDeployment({ id }));
+      this.store.dispatch(DeploymentActions.subscribeToUpdates({ deploymentId: id }));
+    }
+  }
+
+  ngOnDestroy(): void {
+    this.store.dispatch(DeploymentActions.unsubscribeFromUpdates());
+    this.destroy$.next();
+    this.destroy$.complete();
+  }
+
+  onPause(deployment: Deployment): void {
+    this.confirmationService.confirm({
+      message: 'Pause the deployment? In-progress targets will complete, but no new targets will start.',
+      header: 'Pause Deployment',
+      accept: () => {
+        this.store.dispatch(DeploymentActions.pause({ deploymentId: deployment.id }));
+      }
+    });
+  }
+
+  onResume(deployment: Deployment): void {
+    this.store.dispatch(DeploymentActions.resume({ deploymentId: deployment.id }));
+  }
+
+  onCancel(deployment: Deployment): void {
+    this.confirmationService.confirm({
+      message: 'Cancel the deployment? In-progress targets will complete, but no new targets will start. This cannot be undone.',
+      header: 'Cancel Deployment',
+      acceptButtonStyleClass: 'p-button-danger',
+      accept: () => {
+        this.store.dispatch(DeploymentActions.cancel({ deploymentId: deployment.id }));
+      }
+    });
+  }
+
+  onRollback(): void {
+    this.showRollbackDialog.set(true);
+  }
+
+  onRollbackConfirm(options: { targetIds?: string[]; reason: string }): void {
+    const id = this.route.snapshot.paramMap.get('id');
+    if (id) {
+      this.store.dispatch(DeploymentActions.rollback({
+        deploymentId: id,
+        targetIds: options.targetIds,
+        reason: options.reason
+      }));
+    }
+    this.showRollbackDialog.set(false);
+  }
+
+  onTargetSelect(targetId: string | null): void {
+    this.selectedTargetId.set(targetId);
+    if (targetId) {
+      const id = this.route.snapshot.paramMap.get('id');
+      if (id) {
+        this.store.dispatch(DeploymentActions.loadTargetLogs({
+          deploymentId: id,
+          targetId
+        }));
+      }
+    }
+  }
+
+  onRetryTarget(targetId: string): void {
+    const id = this.route.snapshot.paramMap.get('id');
+    if (id) {
+      this.store.dispatch(DeploymentActions.retryTarget({
+        deploymentId: id,
+        targetId
+      }));
+    }
+  }
+
+  getStatusIcon(status: string): string {
+    const icons: Record<string, string> = {
+      pending: 'pi-clock',
+      running: 'pi-spin pi-spinner',
+      paused: 'pi-pause',
+      completed: 'pi-check-circle',
+      failed: 'pi-times-circle',
+      cancelled: 'pi-ban',
+      rolling_back: 'pi-spin pi-undo'
+    };
+    return icons[status] || 'pi-question';
+  }
+
+  getStatusClass(status: string): string {
+    const classes: Record<string, string> = {
+      pending: 'status--pending',
+      running: 'status--running',
+      paused: 'status--paused',
+      completed: 'status--success',
+      failed: 'status--danger',
+      cancelled: 'status--cancelled',
+      rolling_back: 'status--warning'
+    };
+    return classes[status] || '';
+  }
+
+  getDuration(deployment: Deployment): string {
+    const start = new Date(deployment.startedAt).getTime();
+    const end = deployment.completedAt
+      ? new Date(deployment.completedAt).getTime()
+      : Date.now();
+    const seconds = Math.floor((end - start) / 1000);
+    const minutes = Math.floor(seconds / 60);
+    const remainingSeconds = seconds % 60;
+    return `${minutes}m ${remainingSeconds}s`;
+  }
+
+  canPause(deployment: Deployment): boolean {
+    return deployment.status === 'running';
+  }
+
+  canResume(deployment: Deployment): boolean {
+    return deployment.status === 'paused';
+  }
+
+  canCancel(deployment: Deployment): boolean {
+    return ['running', 'paused', 'pending'].includes(deployment.status);
+  }
+
+  canRollback(deployment: Deployment): boolean {
+    return ['completed', 'failed'].includes(deployment.status);
+  }
+}
+```
+
+```html
+
+
+
+ Deployments + + {{ deployment.releaseName }} +
+ +
+
+

+ + {{ deployment.releaseName }} + {{ deployment.releaseVersion }} +

+

+ Deploying to {{ deployment.environmentName }} + using {{ deployment.strategy | titlecase }} strategy +

+
+ +
+ + + + +
+
+ +
+
+ {{ deployment.progress }}% complete + Duration: {{ getDuration(deployment) }} +
+ + +
+ +
+
+ {{ (targets$ | async)?.length || 0 }} + Total Targets +
+
+ + {{ ((targets$ | async) || []) | filter:'status':'completed' | count }} + + Completed +
+
+ + {{ ((targets$ | async) || []) | filter:'status':'running' | count }} + + Running +
+
+ + {{ ((targets$ | async) || []) | filter:'status':'failed' | count }} + + Failed +
+
+
+ +
+ + +
+
+ + + +
+ +
+ + + + + + + + +
+
+
+
+
+
+
+
+
+```
+
+### Target Progress List Component
+
+```typescript
+// target-progress-list.component.ts
+import { Component, Input, Output, EventEmitter, ChangeDetectionStrategy } from '@angular/core';
+import { CommonModule } from '@angular/common';
+
+@Component({
+  selector: 'so-target-progress-list',
+  standalone: true,
+  imports: [CommonModule],
+  templateUrl: './target-progress-list.component.html',
+  styleUrl: './target-progress-list.component.scss',
+  changeDetection: ChangeDetectionStrategy.OnPush
+})
+export class TargetProgressListComponent {
+  @Input() targets: DeploymentTarget[] | null = null;
+  @Input() selectedTargetId: string | null = null;
+  @Output() targetSelect = new EventEmitter<string | null>();
+  @Output() retryTarget = new EventEmitter<string>();
+
+  getStatusIcon(status: string): string {
+    const icons: Record<string, string> = {
+      pending: 'pi-clock',
+      running: 'pi-spin pi-spinner',
+      completed: 'pi-check-circle',
+      failed: 'pi-times-circle',
+      skipped: 'pi-minus-circle'
+    };
+    return icons[status] || 'pi-question';
+  }
+
+  getStatusClass(status: string): string {
+    return `target--${status}`;
+  }
+
+  getTypeIcon(type: string): string {
+    const icons: Record<string, string> = {
+      docker_host: 'pi-box',
+      compose_host: 'pi-th-large',
+      ecs_service: 'pi-cloud',
+      nomad_job: 'pi-sitemap'
+    };
+    return icons[type] || 'pi-server';
+  }
+
+  formatDuration(ms: number | null): string {
+    if (ms === null) return '-';
+    const seconds = Math.floor(ms / 1000);
+    const minutes = Math.floor(seconds / 60);
+    const remainingSeconds = seconds % 60;
+    if (minutes > 0) {
+      return `${minutes}m ${remainingSeconds}s`;
+    }
+    return `${seconds}s`;
+  }
+
+  onSelect(targetId: string): void {
+    if (this.selectedTargetId === targetId) {
+      this.targetSelect.emit(null);
+    } else {
+      this.targetSelect.emit(targetId);
+    }
+  }
+
+  onRetry(event: Event, targetId: string): void {
+    event.stopPropagation();
+    this.retryTarget.emit(targetId);
+  }
+}
+```
+
+```html
+
+

Deployment Targets

+ +
+
+
+ +
+ +
+
+ + {{ target.name }} +
+
+ +
+
+ + {{ formatDuration(target.duration) }} + + Agent: {{ target.agentId }} +
+
+ {{ target.error }} +
+
+ +
+ +
+
+
+ +
+ +

No targets

+
+
+```
+
+### Log Stream Viewer Component
+
+```typescript
+// log-stream-viewer.component.ts
+import { Component, Input, ViewChild, ElementRef, AfterViewChecked,
+  OnChanges, SimpleChanges, signal, ChangeDetectionStrategy } from '@angular/core';
+import { CommonModule } from '@angular/common';
+import { FormsModule } from '@angular/forms';
+
+export interface LogEntry {
+  timestamp: Date;
+  level: 'debug' | 'info' | 'warn' | 'error';
+  source: string;
+  targetId: string | null;
+  message: string;
+}
+
+@Component({
+  selector: 'so-log-stream-viewer',
+  standalone: true,
+  imports: [CommonModule, FormsModule],
+  templateUrl: './log-stream-viewer.component.html',
+  styleUrl: './log-stream-viewer.component.scss',
+  changeDetection: ChangeDetectionStrategy.OnPush
+})
+export class LogStreamViewerComponent implements AfterViewChecked, OnChanges {
+  @Input() logs: LogEntry[] | null = null;
+  @Input() targetId: string | null = null;
+
+  @ViewChild('logContainer') logContainer!: ElementRef<HTMLElement>;
+
+  autoScroll = signal(true);
+  searchTerm = signal('');
+  levelFilter = signal<string[]>(['debug', 'info', 'warn', 'error']);
+  private shouldScroll = false;
+
+  readonly levelOptions = [
+    { label: 'Debug', value: 'debug' },
+    { label: 'Info', value: 'info' },
+    { label: 'Warn', value: 'warn' },
+    { label: 'Error', value: 'error' }
+  ];
+
+  ngOnChanges(changes: SimpleChanges): void {
+    if (changes['logs'] && this.autoScroll()) {
+      this.shouldScroll = true;
+    }
+  }
+
+  ngAfterViewChecked(): void {
+    if (this.shouldScroll && this.logContainer) {
+      this.scrollToBottom();
+      this.shouldScroll = false;
+    }
+  }
+
+  get filteredLogs(): LogEntry[] {
+    if (!this.logs) return [];
+
+    return this.logs.filter(log => {
+      // Filter by target
+      if (this.targetId && log.targetId !== this.targetId) {
+        return false;
+      }
+
+      // Filter by level
+      if (!this.levelFilter().includes(log.level)) {
+        return false;
+      }
+
+      // Filter by search term
+      const term = this.searchTerm().toLowerCase();
+      if (term && !log.message.toLowerCase().includes(term)) {
+        return false;
+      }
+
+      return true;
+    });
+  }
+
+  getLevelClass(level: string): string {
+    return `log-entry--${level}`;
+  }
+
+  formatTimestamp(timestamp: Date): string {
+    return new Date(timestamp).toISOString().split('T')[1].slice(0, 12);
+  }
+
+  scrollToBottom(): void {
+    if (this.logContainer) {
+      const el = this.logContainer.nativeElement;
+      el.scrollTop = el.scrollHeight;
+    }
+  }
+
+  onClear(): void {
+    this.searchTerm.set('');
+  }
+
+  onCopy(): void {
+    const text = this.filteredLogs
+      .map(log => `${this.formatTimestamp(log.timestamp)} [${log.level.toUpperCase()}] ${log.message}`)
+      .join('\n');
+    navigator.clipboard.writeText(text);
+  }
+
+  onDownload(): void {
+    const text = this.filteredLogs
+      .map(log => JSON.stringify(log))
+      .join('\n');
+    const blob = new Blob([text], { type: 'application/x-ndjson' });
+    const url = URL.createObjectURL(blob);
+    const a = document.createElement('a');
+    a.href = url;
+    a.download = `deployment-logs-${Date.now()}.ndjson`;
+    a.click();
+    URL.revokeObjectURL(url);
+  }
+}
+```
+
+```html
+
+
+ + + + + + + + +
+ + + + +
+
+ +
+
+ +

No logs to display

+
+ +
+ + {{ log.level | uppercase }} + {{ log.source }} + {{ log.message }} +
+
+
+```
+
+### Rollback Dialog Component
+
+```typescript
+// rollback-dialog.component.ts
+import { Component, Input, Output, EventEmitter, inject, signal, ChangeDetectionStrategy } from '@angular/core';
+import { CommonModule } from '@angular/common';
+import { FormBuilder, FormGroup, ReactiveFormsModule, Validators } from '@angular/forms';
+
+@Component({
+  selector: 'so-rollback-dialog',
+  standalone: true,
+  imports: [CommonModule, ReactiveFormsModule],
+  templateUrl: './rollback-dialog.component.html',
+  styleUrl: './rollback-dialog.component.scss',
+  changeDetection: ChangeDetectionStrategy.OnPush
+})
+export class RollbackDialogComponent {
+  @Input() deployment: Deployment | null = null;
+  @Input() targets: DeploymentTarget[] | null = null;
+  @Output() confirm = new EventEmitter<{ targetIds?: string[]; reason: string }>();
+  @Output() cancel = new EventEmitter<void>();
+
+  private readonly fb = inject(FormBuilder);
+
+  rollbackType = signal<'all' | 'selected'>('all');
+  selectedTargets = signal<Set<string>>(new Set());
+
+  form: FormGroup = this.fb.group({
+    reason: ['', [Validators.required, Validators.minLength(10)]]
+  });
+
+  get completedTargets(): DeploymentTarget[] {
+    return (this.targets || []).filter(t => t.status === 'completed');
+  }
+
+  onToggleTarget(targetId: string): void {
+    this.selectedTargets.update(set => {
+      const newSet = new Set(set);
+      if (newSet.has(targetId)) {
+        newSet.delete(targetId);
+      } else {
+        newSet.add(targetId);
+      }
+      return newSet;
+    });
+  }
+
+  onSelectAll(): void {
+    this.selectedTargets.set(new Set(this.completedTargets.map(t => t.id)));
+  }
+
+  onDeselectAll(): void {
+    this.selectedTargets.set(new Set());
+  }
+
+  onConfirm(): void {
+    if (this.form.invalid) return;
+
+    const targetIds = this.rollbackType() === 'selected'
+      ? Array.from(this.selectedTargets())
+      : undefined;
+
+    this.confirm.emit({
+      targetIds,
+      reason: this.form.value.reason
+    });
+  }
+}
+```
+
+```html
+
+
+
+

+ + Rollback Deployment +

+ +
+ +
+ + +
+

Rollback Scope

+
+
+ + + +
+
+ + + +
+
+
+ +
+
+ Select targets to rollback ({{ selectedTargets().size }} selected) + +
+
+
+ + + +
+
+
+ +
+
+ + + Minimum 10 characters required +
+
+
+ +
+ + +
+
+
+```
+
+---
+
+## Acceptance Criteria
+
+- [ ] Deployment list shows all deployments
+- [ ] Deployment monitor loads correctly
+- [ ] Real-time progress updates via SignalR
+- [ ] Target list shows all targets with status
+- [ ] Target selection shows target-specific logs
+- [ ] Log streaming works in real-time
+- [ ] Log filtering by level works
+- [ ] Log search works
+- [ ] Pause deployment works
+- [ ] Resume deployment works
+- [ ] Cancel deployment with confirmation
+- [ ] Rollback dialog opens
+- [ ] Rollback scope selection works
+- [ ] Rollback reason required
+- [ ] Retry failed target works
+- [ ] Timeline shows deployment events
+- [ ] Metrics display correctly
+- [ ] Unit test coverage >=80%
+
+---
+
+## Dependencies
+
+| Dependency | Type | Status |
+|------------|------|--------|
+| 107_001 Platform API Gateway | Internal | TODO |
+| Angular 17 | External | Available |
+| NgRx 17 | External | Available |
+| PrimeNG 17 | External | Available |
+| SignalR Client | External | Available |
+
+---
+
+## Delivery Tracker
+
+| Deliverable | Status | Notes |
+|-------------|--------|-------|
+| DeploymentListComponent | TODO | |
+| DeploymentMonitorComponent | TODO | |
+| TargetProgressListComponent | TODO | |
+| LogStreamViewerComponent | TODO | |
+| DeploymentTimelineComponent | TODO | |
+| DeploymentMetricsComponent | TODO | |
+| RollbackDialogComponent | TODO | |
+| Deployment NgRx Store | TODO | |
+| SignalR integration | TODO | |
+| Unit tests | TODO | |
+
+---
+
+## Execution Log
+
+| Date | Entry |
+|------|-------|
+| 10-Jan-2026 | Sprint created |
diff --git a/docs/implplan/SPRINT_20260110_111_007_FE_evidence_viewer.md b/docs/implplan/SPRINT_20260110_111_007_FE_evidence_viewer.md
new file mode 100644
index 000000000..68151df41
--- /dev/null
+++ b/docs/implplan/SPRINT_20260110_111_007_FE_evidence_viewer.md
@@ -0,0 +1,1109 @@
+# SPRINT: Evidence Viewer
+
+> **Sprint ID:** 111_007
+> **Module:** FE
+> **Phase:** 11 - UI Implementation
+> **Status:** TODO
+> **Parent:** [111_000_INDEX](SPRINT_20260110_111_000_INDEX_ui_implementation.md)
+
+---
+
+## Overview
+
+Implement the Evidence Viewer UI providing evidence packet browsing, detailed evidence inspection, cryptographic signature verification, and evidence export capabilities.
+
+### Objectives
+
+- Evidence packet list with filtering
+- Evidence detail view with content inspection
+- Cryptographic signature verification
+- Evidence export to multiple formats
+- Evidence comparison view
+- Audit trail integration
+
+### Working Directory
+
+```
+src/Web/StellaOps.Web/
+├── src/app/features/release-orchestrator/
+│   └── evidence/
+│       ├── evidence-list/
+│       │   ├── evidence-list.component.ts
+│       │   ├── evidence-list.component.html
+│       │   └── evidence-list.component.scss
+│       ├── evidence-detail/
+│       │   ├── evidence-detail.component.ts
+│       │   ├── evidence-detail.component.html
+│       │   └── evidence-detail.component.scss
+│       ├── components/
+│       │   ├── evidence-verifier/
+│       │   ├── evidence-content-viewer/
+│       │   ├── export-dialog/
+│       │   ├── evidence-timeline/
+│       │   └── signature-panel/
+│       └── evidence.routes.ts
+└── src/app/store/release-orchestrator/
+    └── evidence/
+        ├── evidence.actions.ts
+        ├── evidence.reducer.ts
+        └── evidence.selectors.ts
+```
+
+---
+
+## Deliverables
+
+### Evidence List Component
+
+```typescript
+// evidence-list.component.ts
+import { Component, OnInit, inject, signal, ChangeDetectionStrategy } from '@angular/core';
+import { CommonModule } from '@angular/common';
+import { RouterModule } from '@angular/router';
+import { FormsModule } from '@angular/forms';
+import { Store } from '@ngrx/store';
+import { EvidenceActions } from '@store/release-orchestrator/evidence/evidence.actions';
+import * as EvidenceSelectors from '@store/release-orchestrator/evidence/evidence.selectors';
+
+export interface EvidencePacket {
+  id: string;
+  deploymentId: string;
+  releaseId: string;
+  releaseName: string;
+  releaseVersion: string;
+  environmentId: string;
+  environmentName: string;
+  status: 'pending' | 'complete' | 'failed';
+  signatureStatus: 'unsigned' | 'valid' | 'invalid' | 'expired';
+  contentHash: string;
+  signedAt: Date | null;
+  signedBy: string | null;
+  createdAt: Date;
+  size: number;
+  contentTypes: string[];
+}
+
+@Component({
+  selector: 'so-evidence-list',
+  standalone: true,
+  imports: [CommonModule, RouterModule, FormsModule],
+  templateUrl: './evidence-list.component.html',
+  styleUrl: './evidence-list.component.scss',
+  changeDetection: ChangeDetectionStrategy.OnPush
+})
+export class EvidenceListComponent implements OnInit {
+  private readonly store = inject(Store);
+
+  readonly evidencePackets$ = this.store.select(EvidenceSelectors.selectFilteredEvidence);
+  readonly loading$ = this.store.select(EvidenceSelectors.selectLoading);
+  readonly totalCount$ = this.store.select(EvidenceSelectors.selectTotalCount);
+
+  searchTerm = signal('');
+  signatureFilter = signal<string[]>([]);
+  environmentFilter = signal<string | null>(null);
+  dateRange = signal<[Date, Date] | null>(null);
+
+  readonly signatureOptions = [
+    { label: 'Valid', value: 'valid' },
+    { label: 'Invalid', value: 'invalid' },
+    { label: 'Unsigned', value: 'unsigned' },
+    { label: 'Expired', value: 'expired' }
+  ];
+
+  ngOnInit(): void {
+    this.loadEvidence();
+  }
+
+  loadEvidence(): void {
+    this.store.dispatch(EvidenceActions.loadEvidence({
+      filter: {
+        search: this.searchTerm(),
+        signatureStatuses: this.signatureFilter(),
+        environment: this.environmentFilter(),
+        dateRange: this.dateRange()
+      }
+    }));
+  }
+
+  onSearch(term: string): void {
+    this.searchTerm.set(term);
+    this.loadEvidence();
+  }
+
+  onSignatureFilterChange(statuses: string[]): void {
+    this.signatureFilter.set(statuses);
+    this.loadEvidence();
+  }
+
+  onDateRangeChange(range: [Date, Date] | null): void {
+    this.dateRange.set(range);
+    this.loadEvidence();
+  }
+
+  getSignatureIcon(status: string): string {
+    const icons: Record<string, string> = {
+      valid: 'pi-verified',
+      invalid: 'pi-times-circle',
+      unsigned: 'pi-minus-circle',
+      expired: 'pi-clock'
+    };
+    return icons[status] || 'pi-question-circle';
+  }
+
+  getSignatureClass(status: string): string {
+    const classes: Record<string, string> = {
+      valid: 'signature--valid',
+      invalid: 'signature--invalid',
+      unsigned: 'signature--unsigned',
+      expired: 'signature--expired'
+    };
+    return classes[status] || '';
+  }
+
+  formatSize(bytes: number): string {
+    if (bytes < 1024) return `${bytes} B`;
+    if (bytes < 1024 * 1024) return `${(bytes / 1024).toFixed(1)} KB`;
+    return `${(bytes / (1024 * 1024)).toFixed(1)} MB`;
+  }
+
+  onExportMultiple(ids: string[]): void {
+    this.store.dispatch(EvidenceActions.exportMultiple({ ids, format: 'zip' }));
+  }
+}
+```
+
+```html
+
+
+

Evidence Packets

+
+ +
+
+ +
+ + + + + + + + + + +
+ +
+ + + + + + + Release + Environment + Signature + Contents + Size + Created + Actions + + + + + + + + + + {{ evidence.releaseName }} + {{ evidence.releaseVersion }} + + + {{ evidence.environmentName }} + + + + {{ evidence.signatureStatus | titlecase }} + + + +
+ + {{ type }} + +
+ + {{ formatSize(evidence.size) }} + {{ evidence.createdAt | date:'short' }} + + + + + +
+ + + +
+ +

No evidence packets found

+

Evidence packets are generated during deployments.

+
+ + +
+
+
+ + + + +
+```
+
+### Evidence Detail Component
+
+```typescript
+// evidence-detail.component.ts
+import { Component, OnInit, inject, signal, ChangeDetectionStrategy } from '@angular/core';
+import { CommonModule } from '@angular/common';
+import { ActivatedRoute, RouterModule } from '@angular/router';
+import { Store } from '@ngrx/store';
+import { EvidenceVerifierComponent } from '../components/evidence-verifier/evidence-verifier.component';
+import { EvidenceContentViewerComponent } from '../components/evidence-content-viewer/evidence-content-viewer.component';
+import { SignaturePanelComponent } from '../components/signature-panel/signature-panel.component';
+import { ExportDialogComponent } from '../components/export-dialog/export-dialog.component';
+import { EvidenceTimelineComponent } from '../components/evidence-timeline/evidence-timeline.component';
+import { EvidenceActions } from '@store/release-orchestrator/evidence/evidence.actions';
+import * as EvidenceSelectors from '@store/release-orchestrator/evidence/evidence.selectors';
+
+export interface EvidenceDetail extends EvidencePacket {
+  content: EvidenceContent;
+  signature: EvidenceSignature | null;
+  verificationResult: VerificationResult | null;
+}
+
+export interface EvidenceContent {
+  metadata: {
+    deploymentId: string;
+    releaseId: string;
+    environmentId: string;
+    startedAt: string;
+    completedAt: string;
+    initiatedBy: string;
+    outcome: string;
+  };
+  release: {
+    name: string;
+    version: string;
+    components: Array<{
+      name: string;
+      digest: string;
+      version: string;
+    }>;
+  };
+  workflow: {
+    id: string;
+    name: string;
+    version: number;
+    stepsExecuted: number;
+    stepsFailed: number;
+  };
+  targets: Array<{
+    id: string;
+    name: string;
+    type: string;
+    outcome: string;
+    duration: number;
+  }>;
+  approvals: Array<{
+    approver: string;
+    action: string;
+    timestamp: string;
+    comment: string;
+  }>;
+  gateResults: Array<{
+    gateId: string;
+    gateName: string;
+    status: string;
+    evaluatedAt: string;
+  }>;
+  artifacts: Array<{
+    name: string;
+    type: string;
+    digest: string;
+    size: number;
+  }>;
+}
+
+export interface EvidenceSignature {
+  algorithm: string;
+  keyId: string;
+  signature: string;
+  signedAt: Date;
+  signedBy: string;
+  certificate: string | null;
+}
+
+export interface VerificationResult {
+  valid: boolean;
+  message: string;
+  details: {
+    signatureValid: boolean;
+    contentHashValid: boolean;
+    certificateValid: boolean;
+    timestampValid: boolean;
+  };
+  verifiedAt: Date;
+}
+
+@Component({
+  selector: 'so-evidence-detail',
+  standalone: true,
+  imports: [
+    CommonModule,
+    RouterModule,
+    EvidenceVerifierComponent,
+    EvidenceContentViewerComponent,
+    SignaturePanelComponent,
+    ExportDialogComponent,
+    EvidenceTimelineComponent
+  ],
+  templateUrl: './evidence-detail.component.html',
+  styleUrl: './evidence-detail.component.scss',
+  changeDetection: ChangeDetectionStrategy.OnPush
+})
+export class EvidenceDetailComponent implements OnInit {
+  private readonly store = inject(Store);
+  private readonly route = inject(ActivatedRoute);
+
+  readonly evidence$ = this.store.select(EvidenceSelectors.selectCurrentEvidence);
+  readonly loading$ = this.store.select(EvidenceSelectors.selectLoading);
+  readonly verifying$ = this.store.select(EvidenceSelectors.selectVerifying);
+
+  showExportDialog = signal(false);
+  activeTab = signal<'overview' | 'content' | 'signature' | 'timeline'>('overview');
+
+  ngOnInit(): void {
+    const id = this.route.snapshot.paramMap.get('id');
+    if (id) {
+      this.store.dispatch(EvidenceActions.loadEvidence({ id }));
+    }
+  }
+
+  onVerify(): void {
+    const id = this.route.snapshot.paramMap.get('id');
+    if (id) {
+      this.store.dispatch(EvidenceActions.verifyEvidence({ id }));
+    }
+  }
+
+  onExport(): void {
+    this.showExportDialog.set(true);
+  }
+
+  onExportConfirm(options: { format: string; includeSignature: boolean }): void {
+    const id = this.route.snapshot.paramMap.get('id');
+    if (id) {
+      this.store.dispatch(EvidenceActions.exportEvidence({
+        id,
+        format: options.format,
+        includeSignature: options.includeSignature
+      }));
+    }
+    this.showExportDialog.set(false);
+  }
+
+  onDownloadRaw(): void {
+    const id = this.route.snapshot.paramMap.get('id');
+    if (id) {
+      this.store.dispatch(EvidenceActions.downloadRaw({ id }));
+    }
+  }
+
+  getSignatureIcon(status: string): string {
+    const icons: Record<string, string> = {
+      valid: 'pi-verified',
+      invalid: 'pi-times-circle',
+      unsigned: 'pi-minus-circle',
+      expired: 'pi-clock'
+    };
+    return icons[status] || 'pi-question-circle';
+  }
+
+  getSignatureClass(status: string): string {
+    return `signature--${status}`;
+  }
+
+  getOutcomeClass(outcome: string): string {
+    const classes: Record<string, string> = {
+      success: 'outcome--success',
+      failure: 'outcome--failure',
+      partial: 'outcome--warning',
+      cancelled: 'outcome--secondary'
+    };
+    return classes[outcome] || '';
+  }
+}
+```
+
+```html
+
+
+
+ Evidence + + {{ evidence.releaseName }} +
+ +
+
+

+ Evidence Packet + + + {{ evidence.signatureStatus | titlecase }} + +

+

+ {{ evidence.releaseName }} {{ evidence.releaseVersion }} + deployed to {{ evidence.environmentName }} +

+
+ +
+ + +
+
+ +
+
+ Content Hash + {{ evidence.contentHash }} +
+
+ Created + {{ evidence.createdAt | date:'medium' }} +
+
+ Signed + + {{ evidence.signedAt | date:'medium' }} + by {{ evidence.signedBy }} + +
+
+
+ + +
+ +
+ {{ evidence.verificationResult.valid ? 'Verification Passed' : 'Verification Failed' }} + {{ evidence.verificationResult.message }} +
+ + Verified {{ evidence.verificationResult.verifiedAt | date:'medium' }} + +
+ +
+ + + + +
+ +
+ +
+
+
+

Deployment Summary

+
+
+ {{ evidence.content.metadata.outcome | titlecase }} +
+
+
+ {{ evidence.content.targets.length }} + Targets +
+
+ {{ evidence.content.workflow.stepsExecuted }} + Steps +
+
+ {{ evidence.content.approvals.length }} + Approvals +
+
+
+
+ +
+

Release Components

+
+
+ {{ comp.name }} + {{ comp.version }} + {{ comp.digest | slice:0:19 }}... +
+
+
+ +
+

Gate Results

+
+
+ + {{ gate.gateName }} + {{ gate.status | titlecase }} +
+
+
+ +
+

Artifacts

+
+
+ + {{ artifact.name }} + {{ artifact.type }} + {{ artifact.digest | slice:0:12 }}... +
+
+
+
+
+ + + + + + + + + + + + +
+
+ + + + +
+ +
+```
+
+### Evidence Verifier Component
+
+```typescript
+// evidence-verifier.component.ts
+import { Component, Input, Output, EventEmitter, ChangeDetectionStrategy } from '@angular/core';
+import { CommonModule } from '@angular/common';
+
+@Component({
+  selector: 'so-evidence-verifier',
+  standalone: true,
+  imports: [CommonModule],
+  templateUrl: './evidence-verifier.component.html',
+  styleUrl: './evidence-verifier.component.scss',
+  changeDetection: ChangeDetectionStrategy.OnPush
+})
+export class EvidenceVerifierComponent {
+  @Input() result: VerificationResult | null = null;
+  @Input() verifying = false;
+  @Output() verify = new EventEmitter<void>();
+
+  getCheckIcon(passed: boolean): string {
+    return passed ? 'pi-check-circle' : 'pi-times-circle';
+  }
+
+  getCheckClass(passed: boolean): string {
+    return passed ? 'check--passed' : 'check--failed';
+  }
+}
+```
+
+```html
+
+
+ +

Verify the cryptographic signature and content integrity of this evidence packet.

+ +
+ +
+ +

Verifying evidence...

+
+ +
+
+ +

{{ result.valid ? 'Verification Passed' : 'Verification Failed' }}

+
+ +

{{ result.message }}

+ +
+
+ + Signature verification +
+
+ + Content hash verification +
+
+ + Certificate validation +
+
+ + Timestamp validation +
+
+ + +
+
+```
+
+### Export Dialog Component
+
+```typescript
+// export-dialog.component.ts
+import { Component, Output, EventEmitter, signal, ChangeDetectionStrategy } from '@angular/core';
+import { CommonModule } from '@angular/common';
+import { FormsModule } from '@angular/forms';
+
+@Component({
+  selector: 'so-export-dialog',
+  standalone: true,
+  imports: [CommonModule, FormsModule],
+  templateUrl: './export-dialog.component.html',
+  styleUrl: './export-dialog.component.scss',
+  changeDetection: ChangeDetectionStrategy.OnPush
+})
+export class ExportDialogComponent {
+  @Output() confirm = new EventEmitter<{ format: string; includeSignature: boolean }>();
+  @Output() cancel = new EventEmitter<void>();
+
+  selectedFormat = signal('json');
+  includeSignature = signal(true);
+
+  readonly formatOptions = [
+    {
+      value: 'json',
+      label: 'JSON',
+      description: 'Raw JSON evidence packet',
+      icon: 'pi-file'
+    },
+    {
+      value: 'pdf',
+      label: 'PDF Report',
+      description: 'Human-readable PDF document',
+      icon: 'pi-file-pdf'
+    },
+    {
+      value: 'csv',
+      label: 'CSV',
+      description: 'Spreadsheet-compatible format',
+      icon: 'pi-file-excel'
+    },
+    {
+      value: 'slsa',
+      label: 'SLSA Provenance',
+      description: 'SLSA v1.0 provenance format',
+      icon: 'pi-shield'
+    }
+  ];
+
+  onConfirm(): void {
+    this.confirm.emit({
+      format: this.selectedFormat(),
+      includeSignature: this.includeSignature()
+    });
+  }
+}
+```
+
+```html
+
+
+
+

+ + Export Evidence +

+ +
+ +
+
+

Export Format

+
+
+
+ +
+
+ {{ format.label }} + {{ format.description }} +
+
+ +
+
+
+
+ +
+

Options

+
+ + + +
+
+
+ +
+ + +
+
+
+```
+
+### Signature Panel Component
+
+```typescript
+// signature-panel.component.ts
+import { Component, Input, ChangeDetectionStrategy } from '@angular/core';
+import { CommonModule } from '@angular/common';
+
+@Component({
+  selector: 'so-signature-panel',
+  standalone: true,
+  imports: [CommonModule],
+  templateUrl: './signature-panel.component.html',
+  styleUrl: './signature-panel.component.scss',
+  changeDetection: ChangeDetectionStrategy.OnPush
+})
+export class SignaturePanelComponent {
+  @Input() signature: EvidenceSignature | null = null;
+  @Input() verificationResult: VerificationResult | null = null;
+
+  showFullSignature = false;
+  showFullCertificate = false;
+
+  formatSignature(sig: string): string {
+    if (this.showFullSignature) return sig;
+    return sig.length > 64 ? sig.substring(0, 64) + '...' : sig;
+  }
+
+  onCopySignature(): void {
+    if (this.signature) {
+      navigator.clipboard.writeText(this.signature.signature);
+    }
+  }
+
+  onCopyCertificate(): void {
+    if (this.signature?.certificate) {
+      navigator.clipboard.writeText(this.signature.certificate);
+    }
+  }
+}
+```
+
+```html
+
+
+ +

Unsigned Evidence

+

This evidence packet has not been cryptographically signed.

+
+ +
+
+

Signature Details

+
+
+ Algorithm + {{ signature.algorithm }} +
+
+ Key ID + {{ signature.keyId }} +
+
+ Signed At + {{ signature.signedAt | date:'medium' }} +
+
+ Signed By + {{ signature.signedBy }} +
+
+
+ +
+
+

Signature Value

+
+ + +
+
+ + {{ formatSignature(signature.signature) }} + +
+ +
+
+

Certificate

+
+ + +
+
+
{{ signature.certificate }}
+
+ +
+

Verification Status

+
+ +
+ {{ verificationResult.valid ? 'Valid Signature' : 'Invalid Signature' }} + {{ verificationResult.message }} +
+
+
+
+
+```
+
+### Evidence Content Viewer Component
+
+```typescript
+// evidence-content-viewer.component.ts
+import { Component, Input, signal, ChangeDetectionStrategy } from '@angular/core';
+import { CommonModule } from '@angular/common';
+
+@Component({
+  selector: 'so-evidence-content-viewer',
+  standalone: true,
+  imports: [CommonModule],
+  templateUrl: './evidence-content-viewer.component.html',
+  styleUrl: './evidence-content-viewer.component.scss',
+  changeDetection: ChangeDetectionStrategy.OnPush
+})
+export class EvidenceContentViewerComponent {
+  @Input() content: EvidenceContent | null = null;
+
+  viewMode = signal<'formatted' | 'raw'>('formatted');
+
+  get rawJson(): string {
+    return JSON.stringify(this.content, null, 2);
+  }
+
+  onCopy(): void {
+    navigator.clipboard.writeText(this.rawJson);
+  }
+}
+```
+
+```html
+
+
+
+ + +
+ +
+ +
+
+

Metadata

+
+
+ {{ item.key }} + {{ item.value }} +
+
+
+ +
+

Targets ({{ content.targets.length }})

+
+ + + + + + + + + + + + + + + + + +
NameTypeOutcomeDuration
{{ target.name }}{{ target.type }} + + {{ target.outcome }} + + {{ target.duration }}ms
+
+
+ +
+

Approvals ({{ content.approvals.length }})

+
+
+
+ {{ approval.approver }} + + {{ approval.action }} + +
+

{{ approval.comment }}

+ {{ approval.timestamp | date:'medium' }} +
+
+
+
+ +
+
{{ rawJson }}
+
+
+```
+
+---
+
+## Acceptance Criteria
+
+- [ ] Evidence list displays all packets
+- [ ] Filtering by signature status works
+- [ ] Filtering by date range works
+- [ ] Search finds evidence by release/environment
+- [ ] Evidence detail loads correctly
+- [ ] Overview tab shows summary
+- [ ] Content tab shows formatted/raw view
+- [ ] Signature tab shows signature details
+- [ ] Verification triggers and shows result
+- [ ] Export dialog opens
+- [ ] Export to JSON works
+- [ ] Export to PDF works
+- [ ] Export to CSV works
+- [ ] Export to SLSA format works
+- [ ] Copy signature/certificate works
+- [ ] Download raw evidence works
+- [ ] Timeline shows evidence events
+- [ ] Unit test coverage >=80%
+
+---
+
+## Dependencies
+
+| Dependency | Type | Status |
+|------------|------|--------|
+| 109_002 Evidence Signer | Internal | TODO |
+| Angular 17 | External | Available |
+| NgRx 17 | External | Available |
+| PrimeNG 17 | External | Available |
+
+---
+
+## Delivery Tracker
+
+| Deliverable | Status | Notes |
+|-------------|--------|-------|
+| EvidenceListComponent | TODO | |
+| EvidenceDetailComponent | TODO | |
+| EvidenceVerifierComponent | TODO | |
+| EvidenceContentViewerComponent | TODO | |
+| ExportDialogComponent | TODO | |
+| SignaturePanelComponent | TODO | |
+| EvidenceTimelineComponent | TODO | |
+| Evidence NgRx Store | TODO | |
+| Unit tests | TODO | |
+
+---
+
+## Execution Log
+
+| Date | Entry |
+|------|-------|
+| 10-Jan-2026 | Sprint created |
diff --git a/docs/key-features.md b/docs/key-features.md
index e0121829b..ad0f70c4f 100644
--- a/docs/key-features.md
+++ b/docs/key-features.md
@@ -1,91 +1,186 @@
-# Key Features – Capability Cards
+# Key Features — Capability Cards
 
-> **Core Thesis:** Stella Ops isn't a scanner that outputs findings. It's a platform that outputs **attestable decisions that can be replayed**. That difference survives auditors, regulators, and supply-chain propagation.
+> **Core Thesis:** Stella Ops Suite isn't a scanner or a deployment tool—it's a **release control plane** that produces **attestable decisions that can be replayed**. Security is a gate, not a blocker. Evidence survives auditors, regulators, and supply-chain propagation. -> **Looking for the complete feature catalog?** See [`full-features-list.md`](full-features-list.md) for the comprehensive list of all platform capabilities, or [`FEATURE_MATRIX.md`](FEATURE_MATRIX.md) for tier-by-tier availability. +> **Looking for the complete feature catalog?** See [`full-features-list.md`](full-features-list.md) for the comprehensive list, or [`FEATURE_MATRIX.md`](FEATURE_MATRIX.md) for tier-by-tier availability. --- ## At a Glance -| What Competitors Do | What Stella Ops Does | -|--------------------|---------------------| -| Output findings | Output decisions with proof chains | +| What Competitors Do | What Stella Ops Suite Does | +|--------------------|---------------------------| +| CI/CD runs pipelines | Central release authority across environments | +| Deployment tools promote | Promotion with integrated security gates | +| Scanners output findings | Security gates output decisions with proof chains | | VEX as suppression file | VEX as logical claim system (K4 lattice) | -| Reachability as badge | Reachability as signed proof | -| "+3 CVEs" reports | "Exploitability dropped 41%" semantic deltas | -| Hide unknowns | Surface and score unknowns | +| Release identity via tags | Release identity via immutable digests | +| Per-seat/per-project pricing | Pay for environments + new digests/day | | Online-first | Offline-first with full parity | --- -Each card below pairs the headline capability with the evidence that backs it and why it matters day to day. +## Release Orchestration (Planned) -## 0. Decision Capsules — Audit-Grade Evidence Bundles +### 0. 
Release Control Plane -**The core moat capability.** Every scan result is sealed in a **Decision Capsule**—a content-addressed bundle containing everything needed to reproduce and verify the vulnerability decision. +**The new core capability.** Stella Ops Suite becomes the central release authority between CI and runtime targets. + +| Capability | What It Does | +|------------|--------------| +| **Environment management** | Define Dev/Stage/Prod with freeze windows and approval policies | +| **Release bundles** | Compose releases from component OCI digests with semantic versioning | +| **Promotion workflows** | DAG-based workflow engine with approvals, gates, and hooks | +| **Security gates** | Scan on build, evaluate on release, re-evaluate on CVE updates | +| **Deployment execution** | Deploy to Docker/Compose/ECS/Nomad via agents or agentless | +| **Evidence packets** | Every release decision is cryptographically signed and stored | + +**Why it matters:** Non-Kubernetes container teams finally get a central release authority with audit-grade evidence—without replacing their existing CI/SCM/registry stack. + +### 1. Digest-First Release Identity + +**Tags are mutable; digests are truth.** A release is an immutable set of OCI digests, resolved at release creation time. + +``` +Release: myapp-v2.3.1 +Components: + api: sha256:abc123... + worker: sha256:def456... + frontend: sha256:789ghi... +``` + +**What this enables:** +- Tamper detection at pull time (digest mismatch = deployment failure) +- Audit trail of exactly what was deployed +- Rollback to known-good digests, not "latest" tags +- "What is deployed where" tracking with integrity + +**Modules (planned):** `ReleaseManager`, `ComponentRegistry`, `VersionManager` + +### 2. Promotion Workflows with Security Gates + +**Security integrated into release flow, not bolted on.** Promotion requests trigger gate evaluation before deployment. 
+ +| Gate Type | What It Checks | +|-----------|---------------| +| **Security gate** | Reachable critical/high vulnerabilities | +| **Approval gate** | Required approval count, separation of duties | +| **Freeze window gate** | Environment freeze windows | +| **Policy gate** | Custom OPA/Rego policies | +| **Previous environment gate** | Release deployed to prior environment | + +**Decision records include:** +- All gate results with pass/fail reasons +- Evidence refs (scan verdicts, approval records) +- Policy hash + inputs hash for replay +- "Why blocked?" explainability + +**Modules (planned):** `PromotionManager`, `ApprovalGateway`, `DecisionEngine` + +### 3. Deployment Execution + +**Deploy to non-Kubernetes targets as first-class citizens.** Agent-based or agentless deployment to Docker hosts, Compose, ECS, Nomad. + +| Target Type | Deployment Method | +|-------------|-------------------| +| **Docker host** | Agent pulls and starts containers | +| **Compose host** | Agent writes `compose.stella.lock.yml` and runs `docker-compose up` | +| **ECS service** | Agent updates task definition and service | +| **Nomad job** | Agent updates job spec and submits | +| **SSH remote** | Agentless via SSH (Linux) | +| **WinRM remote** | Agentless via WinRM (Windows) | + +**Generated artifacts:** +- `compose.stella.lock.yml`: Pinned digests, resolved environment refs +- `stella.version.json`: Version sticker on target for drift detection +- `release.evidence.json`: Decision record + +**Modules (planned):** `DeployOrchestrator`, `Agent.*`, `ArtifactGenerator` + +### 4. Progressive Delivery + +**A/B releases and canary deployments.** Gradual rollout with automatic rollback on health failure. 
+ +| Strategy | Description | +|----------|-------------| +| **Immediate** | 0% → 100% instantly | +| **Canary** | 10% → 25% → 50% → 100% with health checks | +| **Blue-green** | Deploy to B, switch traffic, retire A | +| **Rolling** | 10% at a time with health checks | + +**Traffic routing plugins:** Nginx, HAProxy, Traefik, AWS ALB + +**Modules (planned):** `ABManager`, `TrafficRouter`, `CanaryController` + +### 5. Plugin System (Three-Surface Model) + +**Extensible without core code changes.** Plugins contribute through three surfaces. + +| Surface | What It Does | +|---------|--------------| +| **Manifest** | Declares what the plugin provides (integrations, steps, agents) | +| **Connector runtime** | gRPC interface for runtime operations | +| **Step provider** | Execution characteristics for workflow steps | + +**Plugin types:** +- **Integration connectors:** SCM (GitHub, GitLab), CI (Actions, Jenkins), Registry (Harbor, ECR), Vault (HashiCorp, AWS Secrets) +- **Step providers:** Custom workflow steps +- **Agent types:** New deployment targets +- **Gate providers:** Custom gate evaluations + +**Modules (planned):** `PluginRegistry`, `PluginLoader`, `PluginSandbox`, `PluginSDK` + +--- + +## Security Capabilities (Operational) + +### 6. Decision Capsules — Audit-Grade Evidence Bundles + +**Every scan and release decision is sealed.** A Decision Capsule is a content-addressed bundle containing everything needed to reproduce and verify the decision. 
| Component | What's Included | |-----------|----------------| -| **Inputs** | Exact SBOM, frozen feed snapshots (with Merkle roots), policy version, lattice rules | +| **Inputs** | Exact SBOM, frozen feed snapshots (with Merkle roots), policy version | | **Evidence** | Reachability proofs (static + runtime), VEX statements, binary fingerprints | | **Outputs** | Verdicts, risk scores, remediation paths | | **Signatures** | DSSE envelopes over all of the above | -**Why it matters:** Six months from now, an auditor can run `stella replay srm.yaml --assert-digest ` and get *identical* results. This is what "audit-grade assurance" actually means. +**Why it matters:** Auditors can replay any decision bit-for-bit. This is what "audit-grade assurance" actually means. -**No competitor offers this.** Trivy, Grype, Snyk—none can replay a past scan bit-for-bit because they don't freeze feeds or produce deterministic manifests. +**Modules:** `EvidenceLocker`, `Attestor`, `Replay` -## 1. Delta SBOM Engine +### 7. Lattice Policy + OpenVEX (K4 Logic) -**Performance without sacrificing determinism.** Layer-aware ingestion keeps the SBOM catalog content-addressed; rescans only fetch new layers. +**VEX as a logical claim system, not a suppression file.** The policy engine uses Belnap K4 four-valued logic. -- **Speed:** Warm scans < 1 second; CI/CD pipelines stay fast -- **Determinism:** Replay Manifest (SRM) captures exact analyzer inputs/outputs per layer -- **Evidence:** Binary crosswalk via Build-ID mapping; `bin:{sha256}` fallbacks for stripped binaries +| State | Meaning | +|-------|---------| +| **Unknown (bottom)** | No information | +| **True** | Positive assertion | +| **False** | Negative assertion | +| **Conflict (top)** | Contradictory assertions | -**Modules:** `Scanner`, `SbomService`, `BinaryIndex` +**Why it matters:** When vendor says "not_affected" but runtime shows the function was called, you have a *conflict*—not a false positive. 
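The four-valued merge described above can be sketched as a knowledge-order join, where unknown adds nothing, agreement is stable, and disagreement surfaces as conflict. This is an illustrative sketch only; the type and function names are assumptions, not the actual `TrustLatticeEngine` API:

```typescript
// Belnap K4 four-valued logic: knowledge-order join of VEX claims.
// Illustrative sketch; not the real TrustLatticeEngine API.
type K4 = "unknown" | "true" | "false" | "conflict";

// join = least upper bound in the knowledge order.
function join(a: K4, b: K4): K4 {
  if (a === "unknown") return b;   // no information adds nothing
  if (b === "unknown") return a;
  if (a === b) return a;           // agreement is stable
  return "conflict";               // disagreement (or existing conflict) is explicit
}

// Merge heterogeneous claims about one CVE/package pair.
function merge(claims: K4[]): K4 {
  return claims.reduce(join, "unknown");
}

// Vendor says not_affected ("false"); runtime shows the function was called ("true").
const disposition = merge(["false", "true"]);
// disposition === "conflict" — preserved for policy resolution, not hidden.
```

Note that the join never silently discards a claim: once two sources contradict each other, every later claim still lands on `conflict`, which is exactly the "conflicts are explicit state" behavior described above.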
---- +**Modules:** `VexLens`, `TrustLatticeEngine`, `Policy` -## 2. Lattice Policy + OpenVEX (K4 Logic) +### 8. Signed Reachability Proofs -**VEX as a logical claim system, not a suppression file.** The policy engine uses **Belnap K4 four-valued logic** (Unknown, True, False, Conflict) to merge SBOM, advisories, VEX, and waivers. +**Proof of exploitability, not just a badge.** Every reachability graph is sealed with DSSE. -| What Competitors Do | What Stella Does | -|--------------------|------------------| -| VEX filters findings (boolean) | VEX is logical claims with trust weighting | -| Conflicts hidden | Conflicts are explicit state (⊤) | -| "Vendor says not_affected" = done | Vendor + runtime + reachability merged; conflicts surfaced | -| Unknown = assume safe | Unknown = first-class state with risk implications | +| Layer | What It Proves | +|-------|---------------| +| **Static** | Call graph shows path from entrypoint → vulnerable function | +| **Binary** | Compiled binary contains the symbol | +| **Runtime** | Process actually executed the code path | -**Why it matters:** When vendor says "not_affected" but your runtime shows the function was called, you have a *conflict*—not a false positive. The lattice preserves this for policy resolution. +**Why it matters:** "Here's the exact call path" vs "potentially reachable." Signed, not claimed. -**Modules:** `VexLens`, `TrustLatticeEngine`, `Excititor` (110+ tests passing) +**Modules:** `ReachGraph`, `PathWitnessBuilder` ---- +### 9. Deterministic Replay -## 3. Sovereign Crypto Profiles - -**Regional compliance without code changes.** FIPS, eIDAS, GOST, SM, and PQC (post-quantum) profiles are configuration toggles, not recompiles. 
- -| Profile | Algorithms | Use Case | -|---------|-----------|----------| -| **FIPS-140-3** | ECDSA P-256, RSA-PSS | US federal requirements | -| **eIDAS** | ETSI TS 119 312 | EU qualified signatures | -| **GOST-2012** | GOST R 34.10-2012 | Russian Federation | -| **SM2** | GM/T 0003.2-2012 | People's Republic of China | -| **PQC** | Dilithium, Falcon | Post-quantum readiness | - -**Why it matters:** Multi-signature DSSE envelopes (sign with FIPS *and* GOST) for cross-jurisdiction compliance. No competitor offers this. - -**Modules:** `Cryptography`, `CryptoProfile`, `RootPack` - ---- - -## 4. Deterministic Replay - -**The audit-grade guarantee.** Every scan produces a DSSE + SRM bundle that can be replayed with `stella replay srm.yaml`. +**The audit-grade guarantee.** Every scan produces a DSSE + SRM bundle that can be replayed. ```bash # Six months later, prove what you knew @@ -93,212 +188,62 @@ stella replay srm.yaml --assert-digest sha256:abc123... # Output: PASS - identical result ``` -**What's frozen:** -- Feed snapshots (NVD, KEV, EPSS, distro advisories) with content hashes -- Analyzer versions and configs -- Policy rules and lattice state -- Random seeds for deterministic ordering +**What's frozen:** Feed snapshots, analyzer versions, policy rules, random seeds. -**Why it matters:** This is what "audit-grade" actually means. Not "we logged it" but "you can re-run it." +**Modules:** `Replay`, `Scanner`, `Policy` + +### 10. Sovereign Crypto Profiles + +**Regional compliance without code changes.** FIPS, eIDAS, GOST, SM, and PQC profiles are configuration toggles. + +| Profile | Use Case | +|---------|----------| +| **FIPS-140-3** | US federal | +| **eIDAS** | EU qualified signatures | +| **GOST-2012** | Russian Federation | +| **SM2** | People's Republic of China | +| **PQC** | Post-quantum readiness | + +**Modules:** `Cryptography`, `CryptoProfile` + +### 11. 
Offline Operations (Air-Gap Parity) + +**Full functionality without network.** Offline Update Kits bundle everything needed. + +| Component | Offline Method | +|-----------|----------------| +| Feed updates | Sealed bundle with Merkle roots | +| Crypto verification | Embedded revocation lists | +| Transparency logging | Local transparency mirror | + +**Modules:** `AirGap.Controller`, `TrustStore` --- -## 5. Offline Operations (Air-Gap Parity) +## Competitive Moats Summary -**Full functionality without network.** Offline Update Kits bundle everything needed for air-gapped operation. +**Six capabilities no competitor offers together:** -| Component | Online | Offline | -|-----------|--------|---------| -| Feed updates | Live | Sealed bundle with Merkle roots | -| Crypto verification | OCSP/CRL | Embedded revocation lists | -| Transparency logging | Rekor | Local transparency mirror | -| Trust roots | Live TSL | RootPack bundles | +| # | Capability | Category | +|---|-----------|----------| +| 1 | **Non-Kubernetes Specialization** | Release orchestration | +| 2 | **Digest-First Release Identity** | Release orchestration | +| 3 | **Security Gates in Promotion Flow** | Release orchestration | +| 4 | **Signed Reachability Proofs** | Security | +| 5 | **Deterministic Replay** | Security | +| 6 | **Sovereign + Offline Operation** | Operations | -**Why it matters:** Air-gapped environments get *identical* results to connected, not degraded. Competitors offer partial offline (cached feeds) but not epistemic parity (sealed, reproducible knowledge state). - -**Modules:** `AirGap.Controller`, `TrustStore`, `EgressPolicy` - ---- - -## 6. Signed Reachability Proofs - -**Proof of exploitability, not just a badge.** Every reachability graph is sealed with DSSE; optional edge-bundle attestations for contested paths. 
- -| Layer | What It Proves | Attestation | -|-------|---------------|-------------| -| **Static** | Call graph says function is reachable | Graph-level DSSE | -| **Binary** | Compiled binary contains the symbol | Build-ID mapping | -| **Runtime** | Process actually executed the code path | Edge-bundle DSSE (optional) | - -**Why it matters:** Not "potentially reachable" but "here's the exact call path from `main()` to `vulnerable_function()`." You can quarantine or dispute individual edges, not just all-or-nothing. - -**No competitor signs reachability graphs.** They claim reachability; we *prove* it. - -**Modules:** `ReachGraph`, `PathWitnessBuilder`, `CompositeGateDetector` - ---- - -## 7. Semantic Smart-Diff - -**Diff security meaning, not CVE counts.** Compare reachability graphs, policy outcomes, and trust weights between releases. - -``` -Before: 5 critical CVEs (3 reachable) -After: 7 critical CVEs (1 reachable) - -Smart-Diff output: "Exploitability DECREASED by 67% despite +2 CVEs" -``` - -**What's compared:** -- Reachability graph deltas -- VEX state changes -- Policy outcome changes -- Trust weight shifts - -**Why it matters:** "+3 CVEs" tells you nothing. "Reachable attack surface dropped by half" tells you everything. - -**Modules:** `MaterialRiskChangeDetector`, `RiskStateSnapshot`, `Scanner.ReachabilityDrift` - ---- - -## 8. Unknowns as First-Class State - -**Uncertainty is risk—we surface and score it.** Explicit modeling of what we *don't* know, with policy implications. - -| Band | Meaning | Policy Action | -|------|---------|---------------| -| **HOT** | High uncertainty + exploit pressure | Immediate investigation | -| **WARM** | Moderate uncertainty | Scheduled review | -| **COLD** | Low uncertainty | Decay toward resolution | -| **RESOLVED** | Uncertainty eliminated | No action | - -**Why it matters:** Competitors hide unknowns (assume safe). 
We track them with decay algorithms, blast-radius containment, and policy budgets ("fail if unknowns > N"). - -**Modules:** `UnknownStateLedger`, `Policy`, `Signals` - ---- - -## 9. Three-Layer Reachability Proofs - -**Structural false positive elimination.** All three layers must align for exploitability to be confirmed. - -``` -Layer 1 (Static): Call graph shows path from entrypoint → vulnerable function -Layer 2 (Binary): Compiled binary contains the symbol with matching offset -Layer 3 (Runtime): eBPF probe confirms function was actually executed -``` - -**Confidence tiers:** -- **Confirmed** — All three layers agree -- **Likely** — Static + binary agree; no runtime data -- **Present** — Package present; no reachability evidence -- **Unreachable** — Static analysis proves no path exists - -**Why it matters:** False positives become *structurally impossible*, not heuristically reduced. - -**Modules:** `Scanner.VulnSurfaces`, `PathWitnessBuilder` - ---- - -## 10. Competitive Moats Summary - -**Four capabilities no competitor offers together:** - -| # | Capability | Why It's Hard to Copy | -|---|-----------|----------------------| -| 1 | **Signed Reachability** | Requires three-layer instrumentation + cryptographic binding | -| 2 | **Deterministic Replay** | Requires content-addressed evidence + feed snapshotting | -| 3 | **K4 Lattice VEX** | Requires rethinking VEX from suppression to claims | -| 4 | **Sovereign Offline** | Requires pluggable crypto + offline trust roots | - -**Reference:** `docs/product/competitive-landscape.md`, `docs/product/moat-strategy-summary.md` - ---- - -## 11. Trust Algebra Engine (K4 Lattice) - -**Formal conflict resolution, not naive precedence.** The lattice engine uses Belnap K4 four-valued logic to aggregate heterogeneous security assertions. 
- -| State | Meaning | Example | -|-------|---------|---------| -| **Unknown (⊥)** | No information | New package, no VEX yet | -| **True (T)** | Positive assertion | "This CVE affects this package" | -| **False (F)** | Negative assertion | "This CVE does not affect this package" | -| **Conflict (⊤)** | Contradictory assertions | Vendor says not_affected; runtime says called | - -**Security Atoms (six orthogonal propositions):** -- PRESENT, APPLIES, REACHABLE, MITIGATED, FIXED, MISATTRIBUTED - -**Why it matters:** Unlike naive precedence (vendor > distro > scanner), we: -- Preserve conflicts as explicit state, not hidden -- Track critical unknowns separately from ancillary ones -- Produce deterministic, explainable dispositions - -**Modules:** `TrustLatticeEngine`, `Policy` (110+ tests passing) - ---- - -## 12. Deterministic Task Packs - -**Auditable automation.** TaskRunner executes declarative Task Packs with plan-hash binding, approvals, and DSSE evidence bundles. - -- **Plan-hash binding:** Task pack execution is tied to specific plan versions -- **Approval gates:** Human sign-off required before execution -- **Sealed mode:** Air-gap compatible execution -- **Evidence bundles:** DSSE-signed results for audit trails - -**Why it matters:** Same workflows online or offline, with provable provenance. - -**Reference:** `docs/modules/packs-registry/guides/spec.md`, `docs/modules/taskrunner/architecture.md` - ---- - -## 13. Evidence-Grade Testing - -**Determinism as a continuous guarantee.** CI lanes that make reproducibility continuously provable. - -| Test Type | What It Proves | -|----------|---------------| -| **Determinism tests** | Same inputs → same outputs | -| **Offline parity tests** | Air-gapped = connected results | -| **Contract stability tests** | APIs don't break | -| **Golden fixture tests** | Historical scans still replay | - -**Why it matters:** Regression-proof audits. Evidence, not assumptions, drives releases. 
- -**Reference:** `docs/technical/testing/testing-strategy-models.md`, `docs/TEST_SUITE_OVERVIEW.md` +**Pricing moat:** No per-seat, per-project, or per-deployment tax. Limits are environments + new digests/day. --- ## Quick Reference -### Key Commands - -```bash -# Determinism proof -stella scan --image --srm-out a.yaml -stella scan --image --srm-out b.yaml -diff a.yaml b.yaml # Identical - -# Replay proof -stella replay srm.yaml --assert-digest - -# Reachability proof -stella graph show --cve CVE-XXXX-YYYY --artifact - -# VEX evaluation -stella vex evaluate --artifact - -# Offline scan -stella rootpack import bundle.tar.gz -stella scan --offline --image -``` - ### Key Documents -- **Competitive Landscape**: `docs/product/competitive-landscape.md` -- **Moat Strategy**: `docs/product/moat-strategy-summary.md` -- **Proof Architecture**: `docs/modules/platform/proof-driven-moats-architecture.md` -- **Vision**: `docs/VISION.md` -- **Architecture Overview**: `docs/ARCHITECTURE_OVERVIEW.md` -- **Quickstart**: `docs/quickstart.md` +- **Product Vision**: [`docs/product/VISION.md`](product/VISION.md) +- **Architecture Overview**: [`docs/ARCHITECTURE_OVERVIEW.md`](ARCHITECTURE_OVERVIEW.md) +- **Release Orchestrator Architecture**: [`docs/modules/release-orchestrator/architecture.md`](modules/release-orchestrator/architecture.md) +- **Competitive Landscape**: [`docs/product/competitive-landscape.md`](product/competitive-landscape.md) +- **Quickstart**: [`docs/quickstart.md`](quickstart.md) +- **Feature Matrix**: [`docs/FEATURE_MATRIX.md`](FEATURE_MATRIX.md) diff --git a/docs/modules/release-orchestrator/README.md b/docs/modules/release-orchestrator/README.md new file mode 100644 index 000000000..7761711fb --- /dev/null +++ b/docs/modules/release-orchestrator/README.md @@ -0,0 +1,137 @@ +# Release Orchestrator + +> Central release control plane for non-Kubernetes container estates. 
+ +**Status:** Planned (not yet implemented) +**Source:** [Full Architecture Specification](../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md) + +## Purpose + +The Release Orchestrator extends Stella Ops from a vulnerability scanning platform into **Stella Ops Suite** — a unified release control plane for non-Kubernetes container environments. It integrates: + +- **Existing capabilities**: SBOM generation, reachability-aware vulnerability analysis, VEX support, policy engine, evidence locker, deterministic replay +- **New capabilities**: Environment management, release orchestration, promotion workflows, deployment execution, progressive delivery, audit-grade release governance + +## Scope + +| In Scope | Out of Scope | +|----------|--------------| +| Non-K8s container deployments (Docker, Compose, ECS, Nomad) | Kubernetes deployments (use ArgoCD, Flux) | +| Release identity via OCI digests | Tag-based release identity | +| Plugin-extensible integrations | Hard-coded vendor integrations | +| SSH/WinRM + agent-based deployment | Cloud-native serverless deployments | +| L4/L7 traffic management via router plugins | Built-in service mesh | + +## Documentation Structure + +### Design & Principles +- [Design Principles](design/principles.md) — Core principles and invariants +- [Key Decisions](design/decisions.md) — Architectural decision record + +### Implementation +- [Implementation Guide](implementation-guide.md) — .NET 10 patterns and best practices +- [Test Structure](test-structure.md) — Test organization and guidelines + +### Module Architecture +- [Module Overview](modules/overview.md) — All modules and themes +- [Integration Hub (INTHUB)](modules/integration-hub.md) — External integrations +- [Environment Manager (ENVMGR)](modules/environment-manager.md) — Environments and targets +- [Release Manager (RELMAN)](modules/release-manager.md) — Release bundles and versions +- [Workflow Engine 
(WORKFL)](modules/workflow-engine.md) — DAG execution +- [Promotion Manager (PROMOT)](modules/promotion-manager.md) — Approvals and gates +- [Deploy Orchestrator (DEPLOY)](modules/deploy-orchestrator.md) — Deployment execution +- [Agents (AGENTS)](modules/agents.md) — Deployment agents +- [Progressive Delivery (PROGDL)](modules/progressive-delivery.md) — A/B and canary +- [Release Evidence (RELEVI)](modules/evidence.md) — Evidence packets +- [Plugin System (PLUGIN)](modules/plugin-system.md) — Plugin infrastructure + +### Data Model +- [Database Schema](data-model/schema.md) — PostgreSQL schema specification +- [Entity Definitions](data-model/entities.md) — Entity descriptions + +### API Specification +- [API Overview](api/overview.md) — API design principles +- [Environment APIs](api/environments.md) — Environment endpoints +- [Release APIs](api/releases.md) — Release endpoints +- [Promotion APIs](api/promotions.md) — Promotion endpoints +- [Workflow APIs](api/workflows.md) — Workflow endpoints +- [Agent APIs](api/agents.md) — Agent endpoints +- [WebSocket APIs](api/websockets.md) — Real-time endpoints + +### Workflow Engine +- [Template Structure](workflow/templates.md) — Workflow template specification +- [Execution State Machine](workflow/execution.md) — Workflow state machine +- [Promotion State Machine](workflow/promotion.md) — Promotion state machine + +### Security +- [Security Overview](security/overview.md) — Security principles +- [Authentication & Authorization](security/auth.md) — AuthN/AuthZ +- [Agent Security](security/agent-security.md) — Agent security model +- [Threat Model](security/threat-model.md) — Threats and mitigations +- [Audit Trail](security/audit-trail.md) — Audit logging + +### Integrations +- [Integration Overview](integrations/overview.md) — Integration types +- [Connector Interface](integrations/connectors.md) — Connector specification +- [Webhook Architecture](integrations/webhooks.md) — Webhook handling +- [CI/CD 
Patterns](integrations/ci-cd.md) — CI/CD integration patterns + +### Deployment +- [Deployment Overview](deployment/overview.md) — Architecture overview +- [Deployment Strategies](deployment/strategies.md) — Deployment strategies +- [Agent-Based Deployment](deployment/agent-based.md) — Agent deployment +- [Agentless Deployment](deployment/agentless.md) — SSH/WinRM deployment +- [Artifact Generation](deployment/artifacts.md) — Generated artifacts + +### Progressive Delivery +- [Progressive Overview](progressive-delivery/overview.md) — Progressive delivery architecture +- [A/B Releases](progressive-delivery/ab-releases.md) — A/B release models +- [Canary Controller](progressive-delivery/canary.md) — Canary implementation +- [Router Plugins](progressive-delivery/routers.md) — Traffic routing plugins + +### UI/UX +- [Dashboard Specification](ui/dashboard.md) — Dashboard screens +- [Workflow Editor](ui/workflow-editor.md) — Workflow editor +- [Screen Reference](ui/screens.md) — Key UI screens + +### Operations +- [Metrics](operations/metrics.md) — Metrics specification +- [Logging](operations/logging.md) — Logging patterns +- [Tracing](operations/tracing.md) — Distributed tracing +- [Alerting](operations/alerting.md) — Alert rules + +### Implementation +- [Roadmap](roadmap.md) — Implementation phases +- [Resource Requirements](roadmap.md#resource-requirements) — Sizing + +### Appendices +- [Glossary](appendices/glossary.md) — Term definitions +- [Configuration Reference](appendices/config.md) — Configuration options +- [Error Codes](appendices/errors.md) — API error codes +- [Evidence Schema](appendices/evidence-schema.md) — Evidence packet format + +## Quick Reference + +### Key Principles + +1. **Digest-first release identity** — Releases are immutable OCI digests, not tags +2. **Evidence for every decision** — Every promotion/deployment produces sealed evidence +3. **Pluggable everything, stable core** — Integrations are plugins; core is stable +4. 
**No feature gating** — All plans include all features +5. **Offline-first operation** — Core works in air-gapped environments +6. **Immutable generated artifacts** — Every deployment generates stored artifacts + +### Platform Themes + +| Theme | Purpose | +|-------|---------| +| **INTHUB** | Integration hub — external system connections | +| **ENVMGR** | Environment management — environments, targets, agents | +| **RELMAN** | Release management — components, versions, releases | +| **WORKFL** | Workflow engine — DAG execution, steps | +| **PROMOT** | Promotion — approvals, gates, decisions | +| **DEPLOY** | Deployment — execution, artifacts, rollback | +| **AGENTS** | Agents — Docker, Compose, ECS, Nomad | +| **PROGDL** | Progressive delivery — A/B, canary | +| **RELEVI** | Evidence — packets, stickers, audit | +| **PLUGIN** | Plugins — registry, loader, SDK | diff --git a/docs/modules/release-orchestrator/api/overview.md b/docs/modules/release-orchestrator/api/overview.md new file mode 100644 index 000000000..4bb8f857b --- /dev/null +++ b/docs/modules/release-orchestrator/api/overview.md @@ -0,0 +1,299 @@ +# API Overview + +**Version**: v1 +**Base Path**: `/api/v1` + +## Design Principles + +| Principle | Implementation | +|-----------|----------------| +| **RESTful** | Resource-oriented URLs, standard HTTP methods | +| **Versioned** | `/api/v1/...` prefix; breaking changes require version bump | +| **Consistent** | Standard response envelope, error format, pagination | +| **Authenticated** | OAuth 2.0 Bearer tokens via Authority module | +| **Tenant-scoped** | Tenant ID from token; all operations scoped to tenant | +| **Audited** | All mutating operations logged with user/timestamp | + +## Authentication + +All API requests require a valid JWT Bearer token: + +```http +Authorization: Bearer +``` + +Tokens are issued by the Authority module and contain: +- `user_id`: User identifier +- `tenant_id`: Tenant scope +- `roles`: User roles +- `permissions`: Specific 
permissions + +## Standard Response Envelope + +### Success Response + +```typescript +interface ApiResponse { + success: true; + data: T; + meta?: { + pagination?: PaginationMeta; + requestId: string; + timestamp: string; + }; +} +``` + +### Error Response + +```typescript +interface ApiErrorResponse { + success: false; + error: { + code: string; // e.g., "PROMOTION_BLOCKED" + message: string; // Human-readable message + details?: object; // Additional context + validationErrors?: ValidationError[]; + }; + meta: { + requestId: string; + timestamp: string; + }; +} + +interface ValidationError { + field: string; + message: string; + code: string; +} +``` + +### Pagination + +```typescript +interface PaginationMeta { + page: number; + pageSize: number; + totalItems: number; + totalPages: number; + hasNext: boolean; + hasPrevious: boolean; +} +``` + +## HTTP Status Codes + +| Code | Description | +|------|-------------| +| `200` | Success | +| `201` | Created | +| `204` | No Content | +| `400` | Bad Request - validation error | +| `401` | Unauthorized - invalid/missing token | +| `403` | Forbidden - insufficient permissions | +| `404` | Not Found | +| `409` | Conflict - resource state conflict | +| `422` | Unprocessable Entity - business rule violation | +| `429` | Too Many Requests - rate limited | +| `500` | Internal Server Error | + +## Common Query Parameters + +| Parameter | Type | Description | +|-----------|------|-------------| +| `page` | integer | Page number (1-indexed) | +| `pageSize` | integer | Items per page (max 100) | +| `sort` | string | Sort field (prefix `-` for descending) | +| `filter` | string | JSON filter expression | + +## API Modules + +### Integration Hub (INTHUB) + +``` +GET /api/v1/integration-types +GET /api/v1/integration-types/{typeId} +POST /api/v1/integrations +GET /api/v1/integrations +GET /api/v1/integrations/{id} +PUT /api/v1/integrations/{id} +DELETE /api/v1/integrations/{id} +POST /api/v1/integrations/{id}/test +POST 
/api/v1/integrations/{id}/discover +GET /api/v1/integrations/{id}/health +``` + +### Environment & Inventory (ENVMGR) + +``` +POST /api/v1/environments +GET /api/v1/environments +GET /api/v1/environments/{id} +PUT /api/v1/environments/{id} +DELETE /api/v1/environments/{id} +POST /api/v1/environments/{envId}/freeze-windows +GET /api/v1/environments/{envId}/freeze-windows +DELETE /api/v1/environments/{envId}/freeze-windows/{windowId} +POST /api/v1/targets +GET /api/v1/targets +GET /api/v1/targets/{id} +PUT /api/v1/targets/{id} +DELETE /api/v1/targets/{id} +POST /api/v1/targets/{id}/health-check +GET /api/v1/targets/{id}/sticker +GET /api/v1/targets/{id}/drift +POST /api/v1/agents/register +GET /api/v1/agents +GET /api/v1/agents/{id} +PUT /api/v1/agents/{id} +DELETE /api/v1/agents/{id} +POST /api/v1/agents/{id}/heartbeat +``` + +### Release Management (RELMAN) + +``` +POST /api/v1/components +GET /api/v1/components +GET /api/v1/components/{id} +PUT /api/v1/components/{id} +DELETE /api/v1/components/{id} +POST /api/v1/components/{id}/sync-versions +GET /api/v1/components/{id}/versions +POST /api/v1/releases +GET /api/v1/releases +GET /api/v1/releases/{id} +PUT /api/v1/releases/{id} +DELETE /api/v1/releases/{id} +GET /api/v1/releases/{id}/state +POST /api/v1/releases/{id}/deprecate +GET /api/v1/releases/{id}/compare/{otherId} +POST /api/v1/releases/from-latest +``` + +### Workflow Engine (WORKFL) + +``` +POST /api/v1/workflow-templates +GET /api/v1/workflow-templates +GET /api/v1/workflow-templates/{id} +PUT /api/v1/workflow-templates/{id} +DELETE /api/v1/workflow-templates/{id} +POST /api/v1/workflow-templates/{id}/validate +GET /api/v1/step-types +GET /api/v1/step-types/{type} +POST /api/v1/workflow-runs +GET /api/v1/workflow-runs +GET /api/v1/workflow-runs/{id} +POST /api/v1/workflow-runs/{id}/pause +POST /api/v1/workflow-runs/{id}/resume +POST /api/v1/workflow-runs/{id}/cancel +GET /api/v1/workflow-runs/{id}/steps +GET /api/v1/workflow-runs/{id}/steps/{nodeId} +GET 
/api/v1/workflow-runs/{id}/steps/{nodeId}/logs +GET /api/v1/workflow-runs/{id}/steps/{nodeId}/artifacts +``` + +### Promotion & Approval (PROMOT) + +``` +POST /api/v1/promotions +GET /api/v1/promotions +GET /api/v1/promotions/{id} +POST /api/v1/promotions/{id}/approve +POST /api/v1/promotions/{id}/reject +POST /api/v1/promotions/{id}/cancel +GET /api/v1/promotions/{id}/decision +GET /api/v1/promotions/{id}/approvals +GET /api/v1/promotions/{id}/evidence +POST /api/v1/promotions/preview-gates +POST /api/v1/approval-policies +GET /api/v1/approval-policies +GET /api/v1/my/pending-approvals +``` + +### Deployment (DEPLOY) + +``` +GET /api/v1/deployment-jobs +GET /api/v1/deployment-jobs/{id} +GET /api/v1/deployment-jobs/{id}/tasks +GET /api/v1/deployment-jobs/{id}/tasks/{taskId} +GET /api/v1/deployment-jobs/{id}/tasks/{taskId}/logs +GET /api/v1/deployment-jobs/{id}/artifacts +GET /api/v1/deployment-jobs/{id}/artifacts/{artifactId} +POST /api/v1/rollbacks +GET /api/v1/rollbacks +``` + +### Progressive Delivery (PROGDL) + +``` +POST /api/v1/ab-releases +GET /api/v1/ab-releases +GET /api/v1/ab-releases/{id} +POST /api/v1/ab-releases/{id}/start +POST /api/v1/ab-releases/{id}/advance +POST /api/v1/ab-releases/{id}/promote +POST /api/v1/ab-releases/{id}/rollback +GET /api/v1/ab-releases/{id}/traffic +GET /api/v1/ab-releases/{id}/health +GET /api/v1/rollout-strategies +``` + +### Release Evidence (RELEVI) + +``` +GET /api/v1/evidence-packets +GET /api/v1/evidence-packets/{id} +GET /api/v1/evidence-packets/{id}/download +POST /api/v1/audit-reports +GET /api/v1/audit-reports/{id} +GET /api/v1/audit-reports/{id}/download +GET /api/v1/version-stickers +GET /api/v1/version-stickers/{id} +``` + +### Plugin Infrastructure (PLUGIN) + +``` +GET /api/v1/plugins +GET /api/v1/plugins/{id} +POST /api/v1/plugins/{id}/enable +POST /api/v1/plugins/{id}/disable +GET /api/v1/plugins/{id}/health +POST /api/v1/plugin-instances +GET /api/v1/plugin-instances +PUT /api/v1/plugin-instances/{id} 
+DELETE /api/v1/plugin-instances/{id} +``` + +## WebSocket Endpoints + +``` +WS /api/v1/workflow-runs/{id}/stream +WS /api/v1/deployment-jobs/{id}/stream +WS /api/v1/agents/{id}/task-stream +WS /api/v1/dashboard/stream +``` + +## Rate Limits + +| Tier | Requests/minute | Burst | +|------|-----------------|-------| +| Standard | 1000 | 100 | +| Premium | 5000 | 500 | + +Rate limit headers: +- `X-RateLimit-Limit`: Request limit +- `X-RateLimit-Remaining`: Remaining requests +- `X-RateLimit-Reset`: Reset timestamp + +## References + +- [Environments API](environments.md) +- [Releases API](releases.md) +- [Promotions API](promotions.md) +- [Workflows API](workflows.md) +- [Agents API](agents.md) +- [WebSocket API](websockets.md) diff --git a/docs/modules/release-orchestrator/appendices/errors.md b/docs/modules/release-orchestrator/appendices/errors.md new file mode 100644 index 000000000..3b829ddc7 --- /dev/null +++ b/docs/modules/release-orchestrator/appendices/errors.md @@ -0,0 +1,296 @@ +# API Error Codes + +## Overview + +All API errors follow a consistent format with error codes for programmatic handling. 
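Because codes are grouped by documented prefix (`AUTH_`, `PERM_`, `VAL_`, `SYS_`, and the 422-range domain prefixes), clients can branch on the prefix rather than enumerate individual codes. A minimal client-side sketch — the `categorize` helper, category names, and retry policy are illustrative, not part of the API:

```typescript
// Illustrative only: maps a machine-readable error code to a coarse
// category by its documented prefix, so a client can decide whether
// to retry, surface a validation message, or fail hard.
type ErrorCategory =
  | "authentication" | "permission" | "validation"
  | "resource" | "domain" | "system" | "unknown";

const PREFIX_CATEGORIES: Array<[string, ErrorCategory]> = [
  ["AUTH_", "authentication"],
  ["PERM_", "permission"],
  ["VAL_", "validation"],
  ["RES_", "resource"],
  ["SYS_", "system"],
  // ENV_/REL_/PROM_/DEPLOY_/GATE_/AGT_/INT_/WF_ are all 422 domain errors.
  ...["ENV_", "REL_", "PROM_", "DEPLOY_", "GATE_", "AGT_", "INT_", "WF_"]
    .map((p): [string, ErrorCategory] => [p, "domain"]),
];

function categorize(code: string): ErrorCategory {
  const match = PREFIX_CATEGORIES.find(([prefix]) => code.startsWith(prefix));
  return match ? match[1] : "unknown";
}

// Only system-level (5xx) failures are generally safe to retry blindly.
function isRetryable(code: string): boolean {
  return categorize(code) === "system";
}
```

Unknown codes fall through to `"unknown"`, so new server-side codes degrade gracefully instead of breaking older clients.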
+ +## Error Response Format + +```typescript +interface ApiErrorResponse { + success: false; + error: { + code: string; // Machine-readable error code + message: string; // Human-readable message + details?: object; // Additional context + validationErrors?: ValidationError[]; + }; + meta: { + requestId: string; + timestamp: string; + }; +} + +interface ValidationError { + field: string; + message: string; + code: string; +} +``` + +## Error Code Categories + +| Prefix | Category | HTTP Status Range | +|--------|----------|-------------------| +| `AUTH_` | Authentication | 401 | +| `PERM_` | Authorization/Permission | 403 | +| `VAL_` | Validation | 400 | +| `RES_` | Resource | 404, 409 | +| `ENV_` | Environment | 422 | +| `REL_` | Release | 422 | +| `PROM_` | Promotion | 422 | +| `DEPLOY_` | Deployment | 422 | +| `GATE_` | Gate | 422 | +| `AGT_` | Agent | 422 | +| `INT_` | Integration | 422 | +| `WF_` | Workflow | 422 | +| `SYS_` | System | 500 | + +## Authentication Errors (401) + +| Code | Message | Description | +|------|---------|-------------| +| `AUTH_TOKEN_MISSING` | Authentication token required | No token provided | +| `AUTH_TOKEN_INVALID` | Invalid authentication token | Token cannot be parsed | +| `AUTH_TOKEN_EXPIRED` | Authentication token expired | Token has expired | +| `AUTH_TOKEN_REVOKED` | Authentication token revoked | Token has been revoked | +| `AUTH_AGENT_CERT_INVALID` | Invalid agent certificate | Agent mTLS cert invalid | +| `AUTH_AGENT_CERT_EXPIRED` | Agent certificate expired | Agent cert has expired | +| `AUTH_API_KEY_INVALID` | Invalid API key | API key not recognized | + +## Permission Errors (403) + +| Code | Message | Description | +|------|---------|-------------| +| `PERM_DENIED` | Permission denied | Generic permission denial | +| `PERM_RESOURCE_DENIED` | Access to resource denied | Cannot access specific resource | +| `PERM_ACTION_DENIED` | Action not permitted | Cannot perform specific action | +| `PERM_SCOPE_DENIED` | Outside 
permitted scope | Action outside user's scope | +| `PERM_SOD_VIOLATION` | Separation of duties violation | SoD prevents action | +| `PERM_SELF_APPROVAL` | Cannot approve own request | Self-approval not allowed | +| `PERM_TENANT_MISMATCH` | Tenant mismatch | Resource belongs to different tenant | + +## Validation Errors (400) + +| Code | Message | Description | +|------|---------|-------------| +| `VAL_REQUIRED_FIELD` | Required field missing | Field is required | +| `VAL_INVALID_FORMAT` | Invalid field format | Field format incorrect | +| `VAL_INVALID_VALUE` | Invalid field value | Value not in allowed set | +| `VAL_TOO_LONG` | Field value too long | Exceeds max length | +| `VAL_TOO_SHORT` | Field value too short | Below min length | +| `VAL_INVALID_UUID` | Invalid UUID format | Not a valid UUID | +| `VAL_INVALID_DIGEST` | Invalid digest format | Not a valid OCI digest | +| `VAL_INVALID_SEMVER` | Invalid semver format | Not valid semantic version | +| `VAL_INVALID_JSON` | Invalid JSON | Request body not valid JSON | +| `VAL_SCHEMA_MISMATCH` | Schema validation failed | Doesn't match schema | + +## Resource Errors (404, 409) + +| Code | Message | HTTP | Description | +|------|---------|------|-------------| +| `RES_NOT_FOUND` | Resource not found | 404 | Generic not found | +| `RES_ENVIRONMENT_NOT_FOUND` | Environment not found | 404 | Environment doesn't exist | +| `RES_RELEASE_NOT_FOUND` | Release not found | 404 | Release doesn't exist | +| `RES_PROMOTION_NOT_FOUND` | Promotion not found | 404 | Promotion doesn't exist | +| `RES_TARGET_NOT_FOUND` | Target not found | 404 | Target doesn't exist | +| `RES_AGENT_NOT_FOUND` | Agent not found | 404 | Agent doesn't exist | +| `RES_CONFLICT` | Resource conflict | 409 | Resource state conflict | +| `RES_ALREADY_EXISTS` | Resource already exists | 409 | Duplicate resource | +| `RES_VERSION_CONFLICT` | Version conflict | 409 | Optimistic lock failure | + +## Environment Errors (422) + +| Code | Message | Description | 
+|------|---------|-------------| +| `ENV_FROZEN` | Environment is frozen | Deployment blocked by freeze window | +| `ENV_FREEZE_ACTIVE` | Active freeze window | Cannot modify during freeze | +| `ENV_INVALID_ORDER` | Invalid environment order | Order index conflict | +| `ENV_CIRCULAR_PROMOTION` | Circular promotion path | Auto-promote creates cycle | +| `ENV_QUOTA_EXCEEDED` | Environment quota exceeded | Max environments reached | + +## Release Errors (422) + +| Code | Message | Description | +|------|---------|-------------| +| `REL_ALREADY_FINALIZED` | Release already finalized | Cannot modify finalized release | +| `REL_NOT_READY` | Release not ready | Release not in ready state | +| `REL_DIGEST_MISMATCH` | Digest mismatch | Resolved digest differs | +| `REL_TAG_NOT_FOUND` | Tag not found in registry | Cannot resolve tag | +| `REL_COMPONENT_MISSING` | Component not found | Referenced component missing | +| `REL_INVALID_STATUS_TRANSITION` | Invalid status transition | Status change not allowed | +| `REL_DEPRECATED` | Release deprecated | Cannot promote deprecated release | + +## Promotion Errors (422) + +| Code | Message | Description | +|------|---------|-------------| +| `PROM_ALREADY_EXISTS` | Promotion already pending | Duplicate promotion request | +| `PROM_NOT_PENDING` | Promotion not pending | Cannot approve/reject | +| `PROM_ALREADY_APPROVED` | Promotion already approved | Already approved | +| `PROM_ALREADY_REJECTED` | Promotion already rejected | Already rejected | +| `PROM_ALREADY_CANCELLED` | Promotion already cancelled | Already cancelled | +| `PROM_DEPLOYING` | Promotion is deploying | Cannot cancel during deploy | +| `PROM_INVALID_STATE` | Invalid promotion state | State doesn't allow action | +| `PROM_APPROVER_REQUIRED` | Additional approvers required | Insufficient approvals | +| `PROM_SKIP_ENVIRONMENT` | Cannot skip environments | Must promote sequentially | + +## Deployment Errors (422) + +| Code | Message | Description | 
+|------|---------|-------------| +| `DEPLOY_IN_PROGRESS` | Deployment in progress | Another deployment running | +| `DEPLOY_NO_TARGETS` | No targets available | No targets in environment | +| `DEPLOY_TARGET_UNHEALTHY` | Target unhealthy | Target failed health check | +| `DEPLOY_AGENT_UNAVAILABLE` | Agent unavailable | Required agent offline | +| `DEPLOY_ARTIFACT_MISSING` | Deployment artifact missing | Required artifact not found | +| `DEPLOY_TIMEOUT` | Deployment timeout | Exceeded timeout | +| `DEPLOY_PULL_FAILED` | Image pull failed | Cannot pull container image | +| `DEPLOY_DIGEST_VERIFICATION_FAILED` | Digest verification failed | Image tampered | +| `DEPLOY_HEALTH_CHECK_FAILED` | Health check failed | Post-deploy health failed | +| `DEPLOY_ROLLBACK_IN_PROGRESS` | Rollback in progress | Already rolling back | +| `DEPLOY_NOTHING_TO_ROLLBACK` | Nothing to rollback | No previous deployment | + +## Gate Errors (422) + +| Code | Message | Description | +|------|---------|-------------| +| `GATE_EVALUATION_FAILED` | Gate evaluation failed | Gate cannot be evaluated | +| `GATE_SECURITY_BLOCKED` | Blocked by security gate | Security policy violation | +| `GATE_POLICY_BLOCKED` | Blocked by policy gate | Custom policy violation | +| `GATE_APPROVAL_BLOCKED` | Blocked pending approval | Awaiting approval | +| `GATE_TIMEOUT` | Gate evaluation timeout | Evaluation exceeded timeout | + +## Agent Errors (422) + +| Code | Message | Description | +|------|---------|-------------| +| `AGT_REGISTRATION_FAILED` | Agent registration failed | Cannot register agent | +| `AGT_TOKEN_INVALID` | Invalid registration token | Bad or expired token | +| `AGT_TOKEN_USED` | Registration token already used | One-time token reused | +| `AGT_CERTIFICATE_FAILED` | Certificate issuance failed | Cannot issue certificate | +| `AGT_OFFLINE` | Agent offline | Agent not responding | +| `AGT_CAPABILITY_MISSING` | Missing capability | Agent lacks required capability | +| `AGT_TASK_FAILED` | Task 
execution failed | Agent task failed | +| `AGT_HEARTBEAT_TIMEOUT` | Heartbeat timeout | Agent heartbeat overdue | + +## Integration Errors (422) + +| Code | Message | Description | +|------|---------|-------------| +| `INT_CONNECTION_FAILED` | Connection failed | Cannot connect to integration | +| `INT_AUTH_FAILED` | Authentication failed | Integration auth failed | +| `INT_RATE_LIMITED` | Rate limited | Integration rate limit hit | +| `INT_TIMEOUT` | Integration timeout | Request timeout | +| `INT_INVALID_RESPONSE` | Invalid response | Unexpected response format | +| `INT_RESOURCE_NOT_FOUND` | External resource not found | Registry/SCM resource missing | + +## Workflow Errors (422) + +| Code | Message | Description | +|------|---------|-------------| +| `WF_TEMPLATE_NOT_FOUND` | Workflow template not found | Template doesn't exist | +| `WF_TEMPLATE_INVALID` | Invalid workflow template | Template validation failed | +| `WF_CYCLE_DETECTED` | Cycle detected in workflow | DAG contains cycle | +| `WF_STEP_FAILED` | Workflow step failed | Step execution failed | +| `WF_ALREADY_RUNNING` | Workflow already running | Duplicate workflow run | +| `WF_INVALID_STATE` | Invalid workflow state | Cannot perform action | +| `WF_EXPRESSION_ERROR` | Expression evaluation error | Bad expression | + +## System Errors (500) + +| Code | Message | Description | +|------|---------|-------------| +| `SYS_INTERNAL_ERROR` | Internal server error | Unexpected error | +| `SYS_DATABASE_ERROR` | Database error | Database operation failed | +| `SYS_STORAGE_ERROR` | Storage error | Storage operation failed | +| `SYS_VAULT_ERROR` | Vault error | Secret retrieval failed | +| `SYS_QUEUE_ERROR` | Queue error | Message queue failed | +| `SYS_SERVICE_UNAVAILABLE` | Service unavailable | Dependency unavailable | +| `SYS_OVERLOADED` | System overloaded | Capacity exceeded | + +## Example Error Responses + +### Validation Error + +```json +{ + "success": false, + "error": { + "code": "VAL_REQUIRED_FIELD", 
+ "message": "Validation failed", + "validationErrors": [ + { + "field": "releaseId", + "message": "Release ID is required", + "code": "VAL_REQUIRED_FIELD" + }, + { + "field": "targetEnvironmentId", + "message": "Invalid UUID format", + "code": "VAL_INVALID_UUID" + } + ] + }, + "meta": { + "requestId": "req-12345", + "timestamp": "2026-01-10T14:30:00Z" + } +} +``` + +### Permission Error + +```json +{ + "success": false, + "error": { + "code": "PERM_SOD_VIOLATION", + "message": "Separation of duties violation: requester cannot approve their own promotion", + "details": { + "promotionId": "promo-uuid", + "requesterId": "user-uuid", + "approverId": "user-uuid", + "environmentId": "env-uuid", + "requiresSoD": true + } + }, + "meta": { + "requestId": "req-12345", + "timestamp": "2026-01-10T14:30:00Z" + } +} +``` + +### Gate Block Error + +```json +{ + "success": false, + "error": { + "code": "GATE_SECURITY_BLOCKED", + "message": "Promotion blocked by security gate", + "details": { + "gateName": "security-gate", + "releaseId": "rel-uuid", + "targetEnvironment": "production", + "violations": [ + { + "type": "critical_vulnerability", + "count": 3, + "threshold": 0 + } + ] + } + }, + "meta": { + "requestId": "req-12345", + "timestamp": "2026-01-10T14:30:00Z" + } +} +``` + +## References + +- [API Overview](../api/overview.md) +- [Security Overview](../security/overview.md) diff --git a/docs/modules/release-orchestrator/appendices/evidence-schema.md b/docs/modules/release-orchestrator/appendices/evidence-schema.md new file mode 100644 index 000000000..8d50c7320 --- /dev/null +++ b/docs/modules/release-orchestrator/appendices/evidence-schema.md @@ -0,0 +1,549 @@ +# Evidence Packet Schema + +## Overview + +Evidence packets are cryptographically signed, immutable records of deployment decisions and outcomes. They provide audit-grade proof of who did what, when, and why. 
+ +## Evidence Packet Types + +| Type | Description | Generated When | +|------|-------------|----------------| +| `release_decision` | Promotion decision evidence | Promotion approved/rejected | +| `deployment` | Deployment execution evidence | Deployment completes | +| `rollback` | Rollback evidence | Rollback completes | +| `ab_promotion` | A/B release promotion evidence | A/B promotion completes | + +## Schema Definition + +### Evidence Packet Structure + +```typescript +interface EvidencePacket { + // Identification + id: UUID; + version: "1.0"; + type: EvidencePacketType; + + // Metadata + generatedAt: DateTime; + generatorVersion: string; + tenantId: UUID; + + // Content + content: EvidenceContent; + + // Integrity + contentHash: string; // SHA-256 of canonical JSON content + signature: string; // Base64-encoded signature + signatureAlgorithm: string; // "RS256", "ES256" + signerKeyRef: string; // Reference to signing key +} + +type EvidencePacketType = + | "release_decision" + | "deployment" + | "rollback" + | "ab_promotion"; +``` + +### Evidence Content + +```typescript +interface EvidenceContent { + // What was released + release: ReleaseEvidence; + + // Where it was released + environment: EnvironmentEvidence; + + // Who requested and approved + actors: ActorEvidence; + + // Why it was allowed + decision: DecisionEvidence; + + // How it was executed (deployment only) + execution?: ExecutionEvidence; + + // Previous state (for rollback) + previous?: PreviousStateEvidence; +} +``` + +### Release Evidence + +```typescript +interface ReleaseEvidence { + id: UUID; + name: string; + displayName: string; + createdAt: DateTime; + createdBy: ActorRef; + + components: Array<{ + id: UUID; + name: string; + digest: string; + semver: string; + tag: string; + role: "primary" | "sidecar" | "init" | "migration"; + }>; + + sourceRef?: { + scmIntegrationId?: UUID; + repository?: string; + commitSha?: string; + branch?: string; + ciIntegrationId?: UUID; + buildId?: string; 
+ pipelineUrl?: string; + }; +} +``` + +### Environment Evidence + +```typescript +interface EnvironmentEvidence { + id: UUID; + name: string; + displayName: string; + orderIndex: number; + + targets: Array<{ + id: UUID; + name: string; + type: string; + healthStatus: string; + }>; + + configuration: { + requiredApprovals: number; + requireSeparationOfDuties: boolean; + promotionPolicy?: string; + deploymentTimeout: number; + }; +} +``` + +### Actor Evidence + +```typescript +interface ActorEvidence { + requester: ActorRef; + requestReason: string; + requestedAt: DateTime; + + approvers: Array<{ + actor: ActorRef; + action: "approved" | "rejected"; + comment?: string; + timestamp: DateTime; + roles: string[]; + }>; + + deployer?: { + agent: AgentRef; + triggeredBy: ActorRef; + startedAt: DateTime; + }; +} + +interface ActorRef { + id: UUID; + type: "user" | "system" | "agent"; + name: string; + email?: string; +} + +interface AgentRef { + id: UUID; + name: string; + version: string; +} +``` + +### Decision Evidence + +```typescript +interface DecisionEvidence { + promotionId: UUID; + decision: "allow" | "block"; + decidedAt: DateTime; + + gateResults: Array<{ + gateName: string; + gateType: string; + passed: boolean; + blocking: boolean; + message: string; + evaluatedAt: DateTime; + details: object; + }>; + + freezeWindowCheck: { + checked: boolean; + windowActive: boolean; + windowId?: UUID; + exemption?: { + grantedBy: UUID; + reason: string; + }; + }; + + separationOfDuties: { + required: boolean; + satisfied: boolean; + requesterIds: UUID[]; + approverIds: UUID[]; + }; +} +``` + +### Execution Evidence + +```typescript +interface ExecutionEvidence { + deploymentJobId: UUID; + strategy: string; + startedAt: DateTime; + completedAt: DateTime; + status: "succeeded" | "failed" | "rolled_back"; + + tasks: Array<{ + targetId: UUID; + targetName: string; + agentId: UUID; + status: string; + startedAt: DateTime; + completedAt: DateTime; + digest: string; + 
stickerWritten: boolean; + error?: string; + }>; + + artifacts: Array<{ + name: string; + type: string; + sha256: string; + storageRef: string; + }>; + + metrics: { + totalTasks: number; + succeededTasks: number; + failedTasks: number; + totalDurationSeconds: number; + }; +} +``` + +### Previous State Evidence + +```typescript +interface PreviousStateEvidence { + releaseId: UUID; + releaseName: string; + deployedAt: DateTime; + deployedBy: ActorRef; + components: Array<{ + name: string; + digest: string; + }>; +} +``` + +## Example Evidence Packet + +```json +{ + "id": "evid-12345-uuid", + "version": "1.0", + "type": "deployment", + "generatedAt": "2026-01-10T14:35:00Z", + "generatorVersion": "stella-evidence-generator@1.5.0", + "tenantId": "tenant-uuid", + + "content": { + "release": { + "id": "rel-uuid", + "name": "myapp-v2.3.1", + "displayName": "MyApp v2.3.1", + "createdAt": "2026-01-10T10:00:00Z", + "createdBy": { + "id": "user-uuid", + "type": "user", + "name": "John Doe", + "email": "john@example.com" + }, + "components": [ + { + "id": "comp-api-uuid", + "name": "api", + "digest": "sha256:abc123def456...", + "semver": "2.3.1", + "tag": "v2.3.1", + "role": "primary" + }, + { + "id": "comp-worker-uuid", + "name": "worker", + "digest": "sha256:789xyz...", + "semver": "2.3.1", + "tag": "v2.3.1", + "role": "primary" + } + ], + "sourceRef": { + "repository": "github.com/myorg/myapp", + "commitSha": "abc123", + "branch": "main", + "buildId": "build-456" + } + }, + + "environment": { + "id": "env-prod-uuid", + "name": "production", + "displayName": "Production", + "orderIndex": 2, + "targets": [ + { + "id": "target-1-uuid", + "name": "prod-web-01", + "type": "compose_host", + "healthStatus": "healthy" + }, + { + "id": "target-2-uuid", + "name": "prod-web-02", + "type": "compose_host", + "healthStatus": "healthy" + } + ], + "configuration": { + "requiredApprovals": 2, + "requireSeparationOfDuties": true, + "deploymentTimeout": 600 + } + }, + + "actors": { + 
"requester": { + "id": "user-john-uuid", + "type": "user", + "name": "John Doe", + "email": "john@example.com" + }, + "requestReason": "Release v2.3.1 with performance improvements", + "requestedAt": "2026-01-10T12:00:00Z", + "approvers": [ + { + "actor": { + "id": "user-jane-uuid", + "type": "user", + "name": "Jane Smith", + "email": "jane@example.com" + }, + "action": "approved", + "comment": "LGTM, tests passed", + "timestamp": "2026-01-10T13:00:00Z", + "roles": ["release_manager"] + }, + { + "actor": { + "id": "user-bob-uuid", + "type": "user", + "name": "Bob Johnson", + "email": "bob@example.com" + }, + "action": "approved", + "comment": "Approved for production", + "timestamp": "2026-01-10T13:30:00Z", + "roles": ["approver"] + } + ], + "deployer": { + "agent": { + "id": "agent-prod-uuid", + "name": "prod-agent-01", + "version": "1.5.0" + }, + "triggeredBy": { + "id": "system", + "type": "system", + "name": "Stella Orchestrator" + }, + "startedAt": "2026-01-10T14:00:00Z" + } + }, + + "decision": { + "promotionId": "promo-uuid", + "decision": "allow", + "decidedAt": "2026-01-10T13:55:00Z", + "gateResults": [ + { + "gateName": "security-gate", + "gateType": "security", + "passed": true, + "blocking": true, + "message": "No critical or high vulnerabilities", + "evaluatedAt": "2026-01-10T13:50:00Z", + "details": { + "critical": 0, + "high": 0, + "medium": 5, + "low": 12 + } + }, + { + "gateName": "approval-gate", + "gateType": "approval", + "passed": true, + "blocking": true, + "message": "2/2 required approvals received", + "evaluatedAt": "2026-01-10T13:55:00Z", + "details": { + "required": 2, + "received": 2 + } + } + ], + "freezeWindowCheck": { + "checked": true, + "windowActive": false + }, + "separationOfDuties": { + "required": true, + "satisfied": true, + "requesterIds": ["user-john-uuid"], + "approverIds": ["user-jane-uuid", "user-bob-uuid"] + } + }, + + "execution": { + "deploymentJobId": "job-uuid", + "strategy": "rolling", + "startedAt": 
"2026-01-10T14:00:00Z", + "completedAt": "2026-01-10T14:35:00Z", + "status": "succeeded", + "tasks": [ + { + "targetId": "target-1-uuid", + "targetName": "prod-web-01", + "agentId": "agent-prod-uuid", + "status": "succeeded", + "startedAt": "2026-01-10T14:00:00Z", + "completedAt": "2026-01-10T14:15:00Z", + "digest": "sha256:abc123def456...", + "stickerWritten": true + }, + { + "targetId": "target-2-uuid", + "targetName": "prod-web-02", + "agentId": "agent-prod-uuid", + "status": "succeeded", + "startedAt": "2026-01-10T14:20:00Z", + "completedAt": "2026-01-10T14:35:00Z", + "digest": "sha256:abc123def456...", + "stickerWritten": true + } + ], + "artifacts": [ + { + "name": "compose.stella.lock.yml", + "type": "compose-lock", + "sha256": "checksum...", + "storageRef": "s3://artifacts/job-uuid/compose.stella.lock.yml" + } + ], + "metrics": { + "totalTasks": 2, + "succeededTasks": 2, + "failedTasks": 0, + "totalDurationSeconds": 2100 + } + } + }, + + "contentHash": "sha256:content-hash...", + "signature": "base64-signature...", + "signatureAlgorithm": "RS256", + "signerKeyRef": "stella/signing/prod-key-2026" +} +``` + +## Signature Verification + +```typescript +async function verifyEvidencePacket(packet: EvidencePacket): Promise { + // 1. Verify content hash + const canonicalContent = canonicalize(packet.content); + const computedHash = sha256(canonicalContent); + + if (computedHash !== packet.contentHash) { + return { valid: false, error: "Content hash mismatch" }; + } + + // 2. Get signing key + const publicKey = await getPublicKey(packet.signerKeyRef); + + // 3. 
Verify signature + const signatureValid = await verify( + packet.signature, + packet.contentHash, + publicKey, + packet.signatureAlgorithm + ); + + if (!signatureValid) { + return { valid: false, error: "Invalid signature" }; + } + + return { valid: true }; +} +``` + +## Storage + +Evidence packets are stored in an append-only table: + +```sql +CREATE TABLE release.evidence_packets ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id), + promotion_id UUID NOT NULL REFERENCES release.promotions(id), + type TEXT NOT NULL, + version TEXT NOT NULL DEFAULT '1.0', + content JSONB NOT NULL, + content_hash TEXT NOT NULL, + signature TEXT NOT NULL, + signature_algorithm TEXT NOT NULL, + signer_key_ref TEXT NOT NULL, + generated_at TIMESTAMPTZ NOT NULL, + generator_version TEXT NOT NULL, + created_at TIMESTAMPTZ NOT NULL DEFAULT now() + -- Note: No updated_at - packets are immutable +); + +-- Prevent modifications +REVOKE UPDATE, DELETE ON release.evidence_packets FROM app_role; +``` + +## Export Formats + +Evidence packets can be exported in multiple formats: + +| Format | Use Case | +|--------|----------| +| JSON | API consumption, archival | +| PDF | Human-readable compliance reports | +| CSV | Spreadsheet analysis | +| SLSA | SLSA provenance format | + +## References + +- [Security Overview](../security/overview.md) +- [Deployment Artifacts](../deployment/artifacts.md) +- [Audit Trail](../security/audit-trail.md) diff --git a/docs/modules/release-orchestrator/appendices/glossary.md b/docs/modules/release-orchestrator/appendices/glossary.md new file mode 100644 index 000000000..0a59a5e95 --- /dev/null +++ b/docs/modules/release-orchestrator/appendices/glossary.md @@ -0,0 +1,235 @@ +# Glossary + +## Core Concepts + +### Agent +A software component installed on deployment targets that receives and executes deployment tasks. Agents communicate with the orchestrator via mTLS and execute deployments locally on the target. 
+ +### Approval +A human decision to authorize a promotion request. Approvals may require multiple approvers and enforce separation of duties. + +### Approval Policy +Rules defining who can approve promotions to specific environments, including required approval counts and SoD requirements. + +### Blue-Green Deployment +A deployment strategy using two identical production environments. Traffic switches from "blue" (current) to "green" (new) after validation. + +### Canary Deployment +A deployment strategy that gradually rolls out changes to a small subset of targets before full deployment, allowing validation with real traffic. + +### Channel +A version stream for components (e.g., "stable", "beta", "nightly"). Each channel tracks the latest compatible version. + +### Component +A deployable unit mapped to a container image repository. Components have versions tracked via digest. + +### Compose Lock +A Docker Compose file with all image references pinned to specific digests, ensuring reproducible deployments. + +### Connector +A plugin that integrates Release Orchestrator with external systems (registries, CI/CD, notifications, etc.). + +### Decision Record +An immutable record of all gate evaluations and conditions considered when making a promotion decision. + +### Deployment Job +A unit of work representing the deployment of a release to an environment. Contains multiple deployment tasks. + +### Deployment Task +A single target-level deployment operation within a deployment job. + +### Digest +A cryptographic hash (SHA-256) that uniquely identifies a container image. Format: `sha256:abc123...` + +### Drift +A mismatch between the expected deployed version (from version sticker) and the actual running version on a target. + +### Environment +A logical grouping of deployment targets representing a stage in the promotion pipeline (e.g., dev, staging, production). 
+ +### Evidence Packet +An immutable, cryptographically signed record of deployment decisions and outcomes for audit purposes. + +### Freeze Window +A time period during which deployments to an environment are blocked (e.g., holiday code freeze). + +### Gate +A checkpoint in the promotion workflow that must pass before deployment proceeds. Types include security gates, approval gates, and custom policy gates. + +### Promotion +The process of moving a release from one environment to another, subject to gates and approvals. + +### Release +A versioned bundle of component digests representing a deployable unit. Releases are immutable once created. + +### Rolling Deployment +A deployment strategy that updates targets in batches, maintaining availability throughout the process. + +### Rollback +The process of reverting to a previous release version when a deployment fails or causes issues. + +### Security Gate +An automated gate that evaluates security policies (vulnerability thresholds, compliance requirements) before allowing promotion. + +### Separation of Duties (SoD) +A security principle requiring that the person who requests a promotion cannot be the same person who approves it. + +### Step +A single unit of work within a workflow template. Steps have types (deploy, approve, notify, etc.) and can have dependencies. + +### Target +A specific deployment destination (host, service, container) within an environment. + +### Tenant +An isolated organizational unit with its own environments, releases, and configurations. Multi-tenancy ensures data isolation. + +### Version Map +A mapping of image tags to digests for a component, allowing tag-based references while maintaining digest-based deployments. + +### Version Sticker +Metadata placed on deployment targets indicating the currently deployed release and digest. + +### Workflow +A DAG (Directed Acyclic Graph) of steps defining the deployment process, including gates, approvals, and verification. 
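Since a workflow must be a DAG, template validation rejects any step graph containing a cycle (surfaced as `WF_CYCLE_DETECTED`). A minimal sketch of that check using depth-first search — the adjacency-map input shape is illustrative:

```typescript
// Detects a cycle in a step graph given as nodeId -> successor nodeIds.
// Classic two-state DFS: nodes currently on the recursion stack are
// "visiting"; reaching one again means a back edge, i.e. a cycle.
function hasCycle(edges: Record<string, string[]>): boolean {
  const state = new Map<string, "visiting" | "done">();

  function visit(node: string): boolean {
    const s = state.get(node);
    if (s === "visiting") return true;  // back edge found
    if (s === "done") return false;     // already fully explored
    state.set(node, "visiting");
    for (const next of edges[node] ?? []) {
      if (visit(next)) return true;
    }
    state.set(node, "done");
    return false;
  }

  return Object.keys(edges).some((node) => visit(node));
}
```

Each node is fully explored at most once, so validation stays linear in the number of steps and dependencies even for large templates.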
+ +### Workflow Template +A reusable workflow definition that can be customized for specific deployment scenarios. + +## Module Abbreviations + +| Abbreviation | Full Name | Description | +|--------------|-----------|-------------| +| INTHUB | Integration Hub | External system integration | +| ENVMGR | Environment Manager | Environment and target management | +| RELMAN | Release Management | Component and release management | +| WORKFL | Workflow Engine | Workflow execution | +| PROMOT | Promotion & Approval | Promotion and approval handling | +| DEPLOY | Deployment Execution | Deployment orchestration | +| AGENTS | Deployment Agents | Agent management | +| PROGDL | Progressive Delivery | A/B and canary releases | +| RELEVI | Release Evidence | Audit and compliance | +| PLUGIN | Plugin Infrastructure | Plugin system | + +## Deployment Strategies + +| Strategy | Description | +|----------|-------------| +| All-at-once | Deploy to all targets simultaneously | +| Rolling | Deploy in batches with availability | +| Canary | Gradual rollout with metrics validation | +| Blue-Green | Parallel environment with traffic switch | + +## Status Values + +### Promotion Status + +| Status | Description | +|--------|-------------| +| `pending` | Promotion created, not yet evaluated | +| `pending_approval` | Waiting for human approval | +| `approved` | Approved, ready for deployment | +| `rejected` | Rejected by approver | +| `deploying` | Deployment in progress | +| `completed` | Successfully deployed | +| `failed` | Deployment failed | +| `cancelled` | Cancelled by user | + +### Deployment Job Status + +| Status | Description | +|--------|-------------| +| `pending` | Job created, not started | +| `preparing` | Generating artifacts | +| `running` | Tasks executing | +| `completing` | Verifying deployment | +| `completed` | Successfully completed | +| `failed` | Deployment failed | +| `rolling_back` | Rollback in progress | +| `rolled_back` | Rollback completed | + +### Agent 
Status + +| Status | Description | +|--------|-------------| +| `online` | Agent connected and healthy | +| `offline` | Agent not connected | +| `degraded` | Agent connected but reporting issues | + +### Target Health Status + +| Status | Description | +|--------|-------------| +| `healthy` | Target responding correctly | +| `unhealthy` | Target failing health checks | +| `unknown` | Health status not determined | + +## API Error Codes + +| Code | Description | +|------|-------------| +| `RELEASE_NOT_FOUND` | Release ID does not exist | +| `ENVIRONMENT_NOT_FOUND` | Environment ID does not exist | +| `PROMOTION_BLOCKED` | Promotion blocked by gate or freeze | +| `APPROVAL_REQUIRED` | Promotion requires approval | +| `INSUFFICIENT_APPROVALS` | Not enough approvals | +| `SOD_VIOLATION` | Separation of duties violated | +| `FREEZE_WINDOW_ACTIVE` | Environment in freeze window | +| `SECURITY_GATE_FAILED` | Security requirements not met | +| `NO_AGENT_AVAILABLE` | No agent available for target | +| `DEPLOYMENT_IN_PROGRESS` | Another deployment running | +| `ROLLBACK_NOT_POSSIBLE` | No previous version to roll back to | + +## Integration Types + +| Type | Category | Description | +|------|----------|-------------| +| `docker-registry` | Registry | Docker Registry v2 | +| `ecr` | Registry | AWS ECR | +| `acr` | Registry | Azure Container Registry | +| `gcr` | Registry | Google Container Registry | +| `harbor` | Registry | Harbor Registry | +| `gitlab-ci` | CI/CD | GitLab CI/CD | +| `github-actions` | CI/CD | GitHub Actions | +| `jenkins` | CI/CD | Jenkins | +| `slack` | Notification | Slack | +| `teams` | Notification | Microsoft Teams | +| `email` | Notification | Email (SMTP) | +| `hashicorp-vault` | Secrets | HashiCorp Vault | +| `prometheus` | Metrics | Prometheus | + +## Workflow Step Types + +| Type | Category | Description | +|------|----------|-------------| +| `approval` | Control | Wait for human approval | +| `wait` | Control | Wait for duration | +| `condition` 
| Control | Branch based on condition | +| `parallel` | Control | Execute children in parallel | +| `security-gate` | Gate | Evaluate security policy | +| `custom-gate` | Gate | Custom OPA policy | +| `freeze-check` | Gate | Check freeze windows | +| `deploy-docker` | Deploy | Deploy single container | +| `deploy-compose` | Deploy | Deploy Compose stack | +| `health-check` | Verify | HTTP/TCP health check | +| `smoke-test` | Verify | Run smoke tests | +| `notify` | Notify | Send notification | +| `webhook` | Integration | Call external webhook | +| `trigger-ci` | Integration | Trigger CI pipeline | +| `rollback` | Recovery | Rollback deployment | + +## Security Terms + +| Term | Description | +|------|-------------| +| mTLS | Mutual TLS - both client and server authenticate with certificates | +| JWT | JSON Web Token - used for API authentication | +| RBAC | Role-Based Access Control | +| OPA | Open Policy Agent - policy evaluation engine | +| SoD | Separation of Duties | +| PEP | Policy Enforcement Point | + +## References + +- [Design Principles](../design/principles.md) +- [API Overview](../api/overview.md) +- [Security Overview](../security/overview.md) diff --git a/docs/modules/release-orchestrator/architecture.md b/docs/modules/release-orchestrator/architecture.md new file mode 100644 index 000000000..5af298546 --- /dev/null +++ b/docs/modules/release-orchestrator/architecture.md @@ -0,0 +1,410 @@ +# Release Orchestrator Architecture + +> Technical architecture specification for the Release Orchestrator — Stella Ops Suite's central release control plane for non-Kubernetes container estates. + +**Status:** Planned (not yet implemented) + +## Overview + +The Release Orchestrator transforms Stella Ops Suite from a vulnerability scanning platform into a centralized, auditable release control plane. 
It sits between CI systems and runtime targets, governing promotion across environments, enforcing security and policy gates, and producing verifiable evidence for every release decision. + +### Core Value Proposition + +- **Release orchestration** — UI-driven promotion (Dev → Stage → Prod), approvals, policy gates, rollbacks +- **Security decisioning as a gate** — Scan on build, evaluate on release, re-evaluate on CVE updates +- **OCI-digest-first releases** — Immutable digest-based release identity +- **Toolchain-agnostic integrations** — Plug into any SCM, CI, registry, and secrets system +- **Auditability + standards** — Evidence packets, SBOM/VEX/attestation support, deterministic replay + +## Design Principles + +1. **Digest-First Release Identity** — A release is an immutable set of OCI digests, never mutable tags. Tags are resolved to digests at release creation time. + +2. **Pluggable Everything, Stable Core** — Integrations are plugins; the core orchestration engine is stable. Plugins contribute UI screens, connector logic, step types, and agent types. + +3. **Evidence for Every Decision** — Every deployment/promotion produces an immutable evidence record containing who, what, why, how, and when. + +4. **No Feature Gating** — All plans include all features; the only limits are environment count, new digests per day, and fair-use caps on deployments. + +5. **Offline-First Operation** — All core operations work in air-gapped environments. Plugins may require connectivity; core does not. + +6. **Immutable Generated Artifacts** — Every deployment generates and stores immutable artifacts (compose lockfiles, scripts, evidence). 
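Principle 1 can be made concrete with a small sketch, in illustrative Python (the platform itself is C#, and the registry index and every name below are hypothetical): mutable tags are resolved to digests exactly once, at release creation, and the resulting component mapping is read-only afterwards.

```python
from types import MappingProxyType

# Hypothetical stand-in for a registry connector's tag index.
TAG_INDEX = {
    ("registry.example.com/app/api", "v2.3.1"): "sha256:aaa111",
    ("registry.example.com/app/web", "v2.3.1"): "sha256:bbb222",
}

def resolve_digest(repository: str, tag: str) -> str:
    """Resolve a mutable tag to its current immutable digest."""
    return TAG_INDEX[(repository, tag)]

def create_release(version: str, components: dict[str, tuple[str, str]]) -> dict:
    """Resolve tags to digests once, at creation; the component map is frozen."""
    resolved = {
        name: {
            "repository": repo,
            "digest": resolve_digest(repo, tag),
            "resolved_from_tag": tag,  # kept for provenance only
        }
        for name, (repo, tag) in components.items()
    }
    # MappingProxyType gives a read-only view: no components can be added later.
    return {"version": version, "components": MappingProxyType(resolved)}

release = create_release("2.3.1", {
    "api": ("registry.example.com/app/api", "v2.3.1"),
    "web": ("registry.example.com/app/web", "v2.3.1"),
})
print(release["components"]["api"]["digest"])  # sha256:aaa111
```

If the `v2.3.1` tag is later re-pointed in the registry, the release is unaffected: deployment always uses the digest captured at creation time.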
+ +## Platform Themes + +The Release Orchestrator introduces ten new functional themes: + +| Theme | Purpose | Key Modules | +|-------|---------|-------------| +| **INTHUB** | Integration hub | Integration Manager, Connection Profiles, Connector Runtime | +| **ENVMGR** | Environment management | Environment Manager, Target Registry, Agent Manager | +| **RELMAN** | Release management | Component Registry, Version Manager, Release Manager | +| **WORKFL** | Workflow engine | Workflow Designer, Workflow Engine, Step Executor | +| **PROMOT** | Promotion and approval | Promotion Manager, Approval Gateway, Decision Engine | +| **DEPLOY** | Deployment execution | Deploy Orchestrator, Target Executor, Artifact Generator | +| **AGENTS** | Deployment agents | Agent Core, Docker/Compose/ECS/Nomad agents | +| **PROGDL** | Progressive delivery | A/B Manager, Traffic Router, Canary Controller | +| **RELEVI** | Release evidence | Evidence Collector, Sticker Writer, Audit Exporter | +| **PLUGIN** | Plugin infrastructure | Plugin Registry, Plugin Loader, Plugin SDK | + +## Components + +``` +ReleaseOrchestrator/ +├── __Libraries/ +│ ├── StellaOps.ReleaseOrchestrator.Core/ # Core domain models +│ ├── StellaOps.ReleaseOrchestrator.Workflow/ # DAG workflow engine +│ ├── StellaOps.ReleaseOrchestrator.Promotion/ # Promotion logic +│ ├── StellaOps.ReleaseOrchestrator.Deploy/ # Deployment coordination +│ ├── StellaOps.ReleaseOrchestrator.Evidence/ # Evidence generation +│ ├── StellaOps.ReleaseOrchestrator.Plugin/ # Plugin infrastructure +│ └── StellaOps.ReleaseOrchestrator.Integration/ # Integration connectors +├── StellaOps.ReleaseOrchestrator.WebService/ # HTTP API +├── StellaOps.ReleaseOrchestrator.Worker/ # Background processing +├── StellaOps.Agent.Core/ # Agent base framework +├── StellaOps.Agent.Docker/ # Docker host agent +├── StellaOps.Agent.Compose/ # Docker Compose agent +├── StellaOps.Agent.SSH/ # SSH agentless executor +├── StellaOps.Agent.WinRM/ # WinRM agentless executor 
+├── StellaOps.Agent.ECS/ # AWS ECS agent +├── StellaOps.Agent.Nomad/ # HashiCorp Nomad agent +└── __Tests/ + └── StellaOps.ReleaseOrchestrator.*.Tests/ +``` + +## Data Flow + +### Release Orchestration Flow + +``` +CI Build → Registry Push → Webhook → Stella Scan → Create Release → +Request Promotion → Gate Evaluation → Decision Record → +Deploy via Agent → Version Sticker → Evidence Packet +``` + +### Detailed Flow + +1. **CI pushes image** to registry by digest; triggers webhook to Stella +2. **Stella scans** the new digest (if not already scanned); stores verdict +3. **Release created** bundling component digests with semantic version +4. **Promotion requested** to move release from source → target environment +5. **Gate evaluation** runs: security verdict, approval count, freeze windows, custom policies +6. **Decision record** produced with evidence refs and signed +7. **Deployment executed** via agent to target (Docker/Compose/ECS/Nomad) +8. **Version sticker** written to target for drift detection +9. **Evidence packet** sealed and stored + +## Key Abstractions + +### Environment + +```csharp +public sealed record Environment +{ + public required Guid Id { get; init; } + public required Guid TenantId { get; init; } + public required string Name { get; init; } // "dev", "stage", "prod" + public required string Slug { get; init; } // URL-safe identifier + public required int PromotionOrder { get; init; } // 1, 2, 3... 
+ public required FreezeWindow[] FreezeWindows { get; init; } + public required ApprovalPolicy ApprovalPolicy { get; init; } + public required bool IsProduction { get; init; } + public EnvironmentState State { get; init; } // Active, Frozen, Retired +} +``` + +### Release + +```csharp +public sealed record Release +{ + public required Guid Id { get; init; } + public required Guid TenantId { get; init; } + public required string Version { get; init; } // SemVer: "2.3.1" + public required string Name { get; init; } // Display name + public required ImmutableDictionary<string, ComponentDigest> Components { get; init; } + public required string SourceRef { get; init; } // Git SHA or tag + public required DateTimeOffset CreatedAt { get; init; } + public required Guid CreatedBy { get; init; } + public ReleaseState State { get; init; } // Draft, Active, Deprecated +} + +public sealed record ComponentDigest +{ + public required string Repository { get; init; } // registry.example.com/app/api + public required string Digest { get; init; } // sha256:abc123... + public required string? ResolvedFromTag { get; init; } // Optional: "v2.3.1" +} +``` + +### Promotion + +```csharp +public sealed record Promotion +{ + public required Guid Id { get; init; } + public required Guid TenantId { get; init; } + public required Guid ReleaseId { get; init; } + public required Guid SourceEnvironmentId { get; init; } + public required Guid TargetEnvironmentId { get; init; } + public required Guid RequestedBy { get; init; } + public required DateTimeOffset RequestedAt { get; init; } + public PromotionState State { get; init; } // Pending, Approved, Rejected, Deployed, RolledBack + public required ImmutableArray<GateResult> GateResults { get; init; } + public required ImmutableArray<Approval> Approvals { get; init; } + public required DecisionRecord? 
Decision { get; init; } +} +``` + +### Workflow + +```csharp +public sealed record Workflow +{ + public required Guid Id { get; init; } + public required string Name { get; init; } + public required ImmutableArray<WorkflowStep> Steps { get; init; } + public required ImmutableDictionary<string, string[]> DependencyGraph { get; init; } +} + +public sealed record WorkflowStep +{ + public required string Id { get; init; } + public required string Type { get; init; } // "script", "approval", "deploy", "gate" + public required StepProvider Provider { get; init; } + public required ImmutableDictionary<string, object> Config { get; init; } + public required string[] DependsOn { get; init; } + public StepState State { get; init; } +} +``` + +### Target + +```csharp +public sealed record Target +{ + public required Guid Id { get; init; } + public required Guid TenantId { get; init; } + public required Guid EnvironmentId { get; init; } + public required string Name { get; init; } + public required TargetType Type { get; init; } // DockerHost, ComposeHost, ECSService, NomadJob + public required ImmutableDictionary<string, string> Labels { get; init; } + public required Guid? 
AgentId { get; init; } // Null for agentless + public required TargetState State { get; init; } + public required HealthStatus Health { get; init; } +} + +public enum TargetType +{ + DockerHost, + ComposeHost, + ECSService, + NomadJob, + SSHRemote, + WinRMRemote +} +``` + +### Agent + +```csharp +public sealed record Agent +{ + public required Guid Id { get; init; } + public required Guid TenantId { get; init; } + public required string Name { get; init; } + public required string Version { get; init; } + public required ImmutableArray<string> Capabilities { get; init; } + public required DateTimeOffset LastHeartbeat { get; init; } + public required AgentState State { get; init; } // Online, Offline, Degraded + public required ImmutableDictionary<string, string> Labels { get; init; } +} +``` + +## Database Schema + +| Table | Purpose | +|-------|---------| +| `release.environments` | Environment definitions with freeze windows | +| `release.targets` | Deployment targets within environments | +| `release.agents` | Registered deployment agents | +| `release.components` | Component definitions (service → repository mapping) | +| `release.releases` | Release bundles (version → component digests) | +| `release.promotions` | Promotion requests and state | +| `release.approvals` | Approval records | +| `release.workflows` | Workflow templates | +| `release.workflow_runs` | Workflow execution state | +| `release.deployment_jobs` | Deployment job records | +| `release.evidence_packets` | Sealed evidence records | +| `release.integrations` | Integration configurations | +| `release.plugins` | Plugin registrations | + +## Gate Types + +| Gate | Purpose | Evaluation | +|------|---------|------------| +| **Security** | Check scan verdict | Query latest scan for release digest; block on critical/high reachable | +| **Approval** | Human sign-off | Count approvals; check SoD rules | +| **FreezeWindow** | Calendar-based blocking | Check target environment freeze windows | +| **PreviousEnvironment** | 
Require prior deployment | Verify release deployed to source environment | +| **Policy** | Custom OPA/Rego rules | Evaluate policy with promotion context | +| **HealthCheck** | Target health | Verify target is healthy before deploy | + +## Plugin System (Three-Surface Model) + +Plugins contribute through three surfaces: + +### 1. Manifest (Static Declaration) + +```yaml +# plugin-manifest.yaml +name: github-integration +version: 1.0.0 +provider: StellaOps.Integration.GitHub.Plugin +capabilities: + integrations: + - type: scm + id: github + displayName: GitHub + steps: + - type: github-status + displayName: Update GitHub Status + gates: + - type: github-check + displayName: GitHub Check Required +``` + +### 2. Connector Runtime (Dynamic Execution) + +```csharp +public interface IIntegrationConnector +{ + Task<bool> TestConnectionAsync(CancellationToken ct); + Task<HealthStatus> GetHealthAsync(CancellationToken ct); + Task<IReadOnlyList<DiscoveredResource>> DiscoverResourcesAsync(string resourceType, CancellationToken ct); +} + +public interface ISCMConnector : IIntegrationConnector +{ + Task<Commit> GetCommitAsync(string gitRef, CancellationToken ct); + Task CreateCommitStatusAsync(string commit, CommitStatus status, CancellationToken ct); +} + +public interface IRegistryConnector : IIntegrationConnector +{ + Task<string> ResolveDigestAsync(string imageRef, CancellationToken ct); + Task<bool> VerifyDigestAsync(string imageRef, string expectedDigest, CancellationToken ct); +} +``` + +### 3. Step Provider (Execution Contract) + +```csharp +public interface IStepProvider +{ + StepExecutionCharacteristics Characteristics { get; } + Task<StepResult> ExecuteAsync(StepContext context, CancellationToken ct); + Task RollbackAsync(StepContext context, CancellationToken ct); +} + +public sealed record StepExecutionCharacteristics +{ + public bool IsIdempotent { get; init; } + public bool SupportsRollback { get; init; } + public TimeSpan DefaultTimeout { get; init; } + public ResourceRequirements Resources { get; init; } +} +``` + +## Invariants + +1. 
**Release identity is immutable** — Once created, a release's component digests cannot be changed. Create a new release instead. + +2. **Promotions are append-only** — Promotion state transitions are recorded; no edits or deletions. + +3. **Evidence packets are sealed** — Evidence is cryptographically signed and stored immutably. + +4. **Digest verification at deploy time** — Agents verify image digests at pull time; mismatch fails deployment. + +5. **Separation of duties enforced** — Requester cannot be sole approver for production promotions. + +6. **Workflow execution is deterministic** — Same inputs produce same execution order and outputs. + +## Error Handling + +- **Transient failures** — Retry with exponential backoff; circuit breaker for repeated failures +- **Agent disconnection** — Mark agent offline; reassign pending tasks to other agents +- **Deployment failure** — Automatic rollback if configured; otherwise mark promotion as failed +- **Gate failure** — Block promotion; require manual intervention or re-evaluation + +## Observability + +### Metrics + +- `release_promotions_total` — Counter by environment and outcome +- `release_deployments_duration_seconds` — Histogram of deployment times +- `release_gate_evaluations_total` — Counter by gate type and result +- `release_agents_online` — Gauge of online agents +- `release_workflow_steps_duration_seconds` — Histogram by step type + +### Traces + +- `promotion.request` — Span for promotion request handling +- `gate.evaluate` — Span for each gate evaluation +- `deployment.execute` — Span for deployment execution +- `agent.task` — Span for agent task execution + +### Logs + +- Structured logs with correlation IDs +- Promotion ID, release ID, environment ID in all relevant logs +- Sensitive data (secrets, credentials) masked + +## Security Considerations + +### Agent Security + +- **mTLS authentication** — Agents authenticate with CA-signed certificates +- **Short-lived credentials** — Task credentials expire 
after execution +- **Capability-based authorization** — Agents only receive tasks matching their capabilities +- **Heartbeat monitoring** — Detect and flag agent disconnections + +### Secrets Management + +- **Never stored in database** — Only vault references stored +- **Fetched at execution time** — Secrets retrieved just-in-time for deployment +- **Short-lived** — Dynamic credentials with minimal TTL +- **Masked in logs** — Secret values never logged + +### Plugin Sandbox + +- **Resource limits** — CPU, memory, timeout limits per plugin +- **Capability restrictions** — Plugins declare required capabilities +- **Network isolation** — Optional network restrictions for plugins + +## Performance Characteristics + +- **Promotion evaluation** — < 5 seconds for typical gate evaluation +- **Deployment latency** — Dominated by image pull time; orchestration overhead < 10 seconds +- **Agent heartbeat** — 30-second interval; offline detection within 90 seconds +- **Workflow step timeout** — Configurable; default 5 minutes per step + +## Implementation Roadmap + +| Phase | Focus | Key Deliverables | +|-------|-------|------------------| +| **Phase 1** | Foundation | Environment management, integration hub, release bundles | +| **Phase 2** | Workflow Engine | DAG execution, step registry, workflow templates | +| **Phase 3** | Promotion & Decision | Approval gateway, security gates, decision records | +| **Phase 4** | Deployment Execution | Docker/Compose agents, artifact generation, rollback | +| **Phase 5** | UI & Polish | Release dashboard, promotion UI, environment management | +| **Phase 6** | Progressive Delivery | A/B releases, canary, traffic routing | +| **Phase 7** | Extended Targets | ECS, Nomad, SSH/WinRM agentless | +| **Phase 8** | Plugin Ecosystem | Full plugin system, marketplace | + +## References + +- [Product Vision](../../product/VISION.md) +- [Architecture Overview](../../ARCHITECTURE_OVERVIEW.md) +- [Full Orchestrator 
Specification](../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md) +- [Competitive Landscape](../../product/competitive-landscape.md) diff --git a/docs/modules/release-orchestrator/data-model/entities.md b/docs/modules/release-orchestrator/data-model/entities.md new file mode 100644 index 000000000..b538fb7a0 --- /dev/null +++ b/docs/modules/release-orchestrator/data-model/entities.md @@ -0,0 +1,343 @@ +# Entity Definitions + +This document describes the core entities in the Release Orchestrator data model. + +## Entity Relationship Overview + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ ENTITY RELATIONSHIPS │ +│ │ +│ ┌──────────┐ ┌──────────────┐ ┌────────────┐ │ +│ │ Tenant │───────│ Environment │───────│ Target │ │ +│ └──────────┘ └──────────────┘ └────────────┘ │ +│ │ │ │ │ +│ │ │ │ │ +│ ▼ ▼ ▼ │ +│ ┌──────────┐ ┌──────────────┐ ┌────────────┐ │ +│ │ Component│ │ Approval │ │ Agent │ │ +│ └──────────┘ │ Policy │ └────────────┘ │ +│ │ └──────────────┘ │ │ +│ │ │ │ │ +│ ▼ │ ▼ │ +│ ┌──────────┐ │ ┌─────────────┐ │ +│ │ Version │ │ │ Deployment │ │ +│ │ Map │ │ │ Task │ │ +│ └──────────┘ │ └─────────────┘ │ +│ │ │ │ │ +│ │ │ │ │ +│ ▼ │ ▼ │ +│ ┌─────────────────────────┼─────────────────────────────┐ │ +│ │ │ │ │ +│ │ ┌──────────┐ ┌─────▼─────┐ ┌─────────────┐ │ │ +│ │ │ Release │─────│ Promotion │─────│ Deployment │ │ │ +│ │ └──────────┘ └───────────┘ │ Job │ │ │ +│ │ │ │ └─────────────┘ │ │ +│ │ │ │ │ │ │ +│ │ │ ▼ │ │ │ +│ │ │ ┌───────────┐ │ │ │ +│ │ │ │ Approval │ │ │ │ +│ │ │ └───────────┘ │ │ │ +│ │ │ │ │ │ │ +│ │ │ ▼ ▼ │ │ +│ │ │ ┌───────────┐ ┌───────────┐ │ │ +│ │ │ │ Decision │ │ Generated │ │ │ +│ │ │ │ Record │ │ Artifacts │ │ │ +│ │ │ └───────────┘ └───────────┘ │ │ +│ │ │ │ │ │ │ +│ │ │ └────────┬────────┘ │ │ +│ │ │ │ │ │ +│ │ │ ▼ │ │ +│ │ │ ┌───────────┐ │ │ +│ │ └───────────────────►│ Evidence │◄────────────┘ │ +│ │ │ Packet │ │ +│ │ └───────────┘ │ +│ │ │ │ +│ │ ▼ │ +│ 
│ ┌───────────┐ │ +│ │ │ Version │ │ +│ │ │ Sticker │ │ +│ │ └───────────┘ │ +│ │ │ +│ └─────────────────────────────────────────────────────────────────────────┘ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +## Core Entities + +### Environment + +Represents a deployment target environment (dev, staging, production). + +| Field | Type | Description | +|-------|------|-------------| +| `id` | UUID | Primary key | +| `tenant_id` | UUID | Tenant reference | +| `name` | string | Unique name (e.g., "prod") | +| `display_name` | string | Display name (e.g., "Production") | +| `order_index` | integer | Promotion order | +| `config` | JSONB | Environment configuration | +| `freeze_windows` | JSONB | Active freeze windows | +| `required_approvals` | integer | Approvals needed for promotion | +| `require_sod` | boolean | Require separation of duties | +| `created_at` | timestamp | Creation time | + +### Target + +Represents a deployment target (host, service). + +| Field | Type | Description | +|-------|------|-------------| +| `id` | UUID | Primary key | +| `tenant_id` | UUID | Tenant reference | +| `environment_id` | UUID | Environment reference | +| `name` | string | Target name | +| `target_type` | string | Type (docker_host, compose_host, etc.) | +| `connection` | JSONB | Connection configuration | +| `labels` | JSONB | Target labels | +| `health_status` | string | Current health status | +| `current_digest` | string | Currently deployed digest | + +### Agent + +Represents a deployment agent. 
+ +| Field | Type | Description | +|-------|------|-------------| +| `id` | UUID | Primary key | +| `tenant_id` | UUID | Tenant reference | +| `name` | string | Agent name | +| `version` | string | Agent version | +| `capabilities` | JSONB | Agent capabilities | +| `status` | string | online/offline/degraded | +| `last_heartbeat` | timestamp | Last heartbeat time | + +### Component + +Represents a deployable component (maps to an image repository). + +| Field | Type | Description | +|-------|------|-------------| +| `id` | UUID | Primary key | +| `tenant_id` | UUID | Tenant reference | +| `name` | string | Component name | +| `display_name` | string | Display name | +| `image_repository` | string | Image repository URL | +| `versioning_strategy` | JSONB | How versions are determined | +| `default_channel` | string | Default version channel | + +### Version Map + +Maps image tags to digests and semantic versions. + +| Field | Type | Description | +|-------|------|-------------| +| `id` | UUID | Primary key | +| `component_id` | UUID | Component reference | +| `tag` | string | Image tag | +| `digest` | string | Image digest (sha256:...) | +| `semver` | string | Semantic version | +| `channel` | string | Version channel (stable, beta) | + +### Release + +A versioned bundle of component digests. + +| Field | Type | Description | +|-------|------|-------------| +| `id` | UUID | Primary key | +| `tenant_id` | UUID | Tenant reference | +| `name` | string | Release name | +| `display_name` | string | Display name | +| `components` | JSONB | Component/digest mappings | +| `source_ref` | JSONB | Source code reference | +| `status` | string | draft/ready/deployed/deprecated | +| `created_by` | UUID | Creator user reference | + +### Promotion + +A request to promote a release to an environment. 
+ +| Field | Type | Description | +|-------|------|-------------| +| `id` | UUID | Primary key | +| `tenant_id` | UUID | Tenant reference | +| `release_id` | UUID | Release reference | +| `source_environment_id` | UUID | Source environment (nullable) | +| `target_environment_id` | UUID | Target environment | +| `status` | string | Promotion status | +| `decision_record` | JSONB | Gate evaluation results | +| `workflow_run_id` | UUID | Associated workflow run | +| `requested_by` | UUID | Requesting user | +| `requested_at` | timestamp | Request time | + +### Approval + +An approval or rejection of a promotion. + +| Field | Type | Description | +|-------|------|-------------| +| `id` | UUID | Primary key | +| `promotion_id` | UUID | Promotion reference | +| `approver_id` | UUID | Approving user | +| `action` | string | approved/rejected | +| `comment` | string | Approval comment | +| `approved_at` | timestamp | Approval time | + +### Deployment Job + +A deployment execution job. + +| Field | Type | Description | +|-------|------|-------------| +| `id` | UUID | Primary key | +| `promotion_id` | UUID | Promotion reference | +| `release_id` | UUID | Release reference | +| `environment_id` | UUID | Environment reference | +| `status` | string | Job status | +| `strategy` | string | Deployment strategy | +| `artifacts` | JSONB | Generated artifacts | +| `rollback_of` | UUID | If rollback, original job | + +### Deployment Task + +A task to deploy to a single target. + +| Field | Type | Description | +|-------|------|-------------| +| `id` | UUID | Primary key | +| `job_id` | UUID | Job reference | +| `target_id` | UUID | Target reference | +| `digest` | string | Digest to deploy | +| `status` | string | Task status | +| `agent_id` | UUID | Assigned agent | +| `logs` | text | Execution logs | +| `previous_digest` | string | Previous digest (for rollback) | + +### Evidence Packet + +Immutable audit evidence for a promotion/deployment. 
+ +| Field | Type | Description | +|-------|------|-------------| +| `id` | UUID | Primary key | +| `promotion_id` | UUID | Promotion reference | +| `packet_type` | string | Type of evidence | +| `content` | JSONB | Evidence content | +| `content_hash` | string | SHA-256 of content | +| `signature` | string | Cryptographic signature | +| `signer_key_ref` | string | Signing key reference | +| `created_at` | timestamp | Creation time (no update) | + +### Version Sticker + +Version marker placed on deployment targets. + +| Field | Type | Description | +|-------|------|-------------| +| `id` | UUID | Primary key | +| `target_id` | UUID | Target reference | +| `release_id` | UUID | Release reference | +| `promotion_id` | UUID | Promotion reference | +| `sticker_content` | JSONB | Sticker JSON content | +| `content_hash` | string | Content hash | +| `written_at` | timestamp | Write time | +| `drift_detected` | boolean | Drift detection flag | + +## Workflow Entities + +### Workflow Template + +A reusable workflow definition. + +| Field | Type | Description | +|-------|------|-------------| +| `id` | UUID | Primary key | +| `tenant_id` | UUID | Tenant reference (null for builtin) | +| `name` | string | Template name | +| `version` | integer | Template version | +| `nodes` | JSONB | Step nodes | +| `edges` | JSONB | Step edges | +| `inputs` | JSONB | Input definitions | +| `outputs` | JSONB | Output definitions | +| `is_builtin` | boolean | Is built-in template | + +### Workflow Run + +An execution of a workflow template. 
+ +| Field | Type | Description | +|-------|------|-------------| +| `id` | UUID | Primary key | +| `template_id` | UUID | Template reference | +| `template_version` | integer | Template version at execution | +| `status` | string | Run status | +| `context` | JSONB | Execution context | +| `inputs` | JSONB | Input values | +| `outputs` | JSONB | Output values | +| `started_at` | timestamp | Start time | +| `completed_at` | timestamp | Completion time | + +### Step Run + +Execution of a single step within a workflow run. + +| Field | Type | Description | +|-------|------|-------------| +| `id` | UUID | Primary key | +| `workflow_run_id` | UUID | Workflow run reference | +| `node_id` | string | Node ID from template | +| `status` | string | Step status | +| `inputs` | JSONB | Resolved inputs | +| `outputs` | JSONB | Produced outputs | +| `logs` | text | Execution logs | +| `attempt_number` | integer | Retry attempt number | + +## Plugin Entities + +### Plugin + +A registered plugin. + +| Field | Type | Description | +|-------|------|-------------| +| `id` | UUID | Primary key | +| `plugin_id` | string | Unique plugin identifier | +| `version` | string | Plugin version | +| `vendor` | string | Plugin vendor | +| `manifest` | JSONB | Plugin manifest | +| `status` | string | Plugin status | +| `entrypoint` | string | Plugin entrypoint path | + +### Plugin Instance + +A tenant-specific plugin configuration. + +| Field | Type | Description | +|-------|------|-------------| +| `id` | UUID | Primary key | +| `plugin_id` | UUID | Plugin reference | +| `tenant_id` | UUID | Tenant reference | +| `config` | JSONB | Tenant configuration | +| `enabled` | boolean | Is enabled for tenant | + +## Integration Entities + +### Integration + +A configured external integration. 
+ +| Field | Type | Description | +|-------|------|-------------| +| `id` | UUID | Primary key | +| `tenant_id` | UUID | Tenant reference | +| `type_id` | string | Integration type | +| `name` | string | Integration name | +| `config` | JSONB | Integration configuration | +| `credential_ref` | string | Vault credential reference | +| `health_status` | string | Connection health | + +## References + +- [Database Schema](schema.md) +- [Module Overview](../modules/overview.md) diff --git a/docs/modules/release-orchestrator/data-model/schema.md b/docs/modules/release-orchestrator/data-model/schema.md new file mode 100644 index 000000000..68539d111 --- /dev/null +++ b/docs/modules/release-orchestrator/data-model/schema.md @@ -0,0 +1,631 @@ +# Database Schema (PostgreSQL) + +This document specifies the complete PostgreSQL schema for the Release Orchestrator. + +## Schema Organization + +All release orchestration tables reside in the `release` schema: + +```sql +CREATE SCHEMA IF NOT EXISTS release; +SET search_path TO release, public; +``` + +## Core Tables + +### Tenant and Authority Extensions + +```sql +-- Extended: Add release-related permissions +ALTER TABLE permissions ADD COLUMN IF NOT EXISTS + resource_type VARCHAR(50) CHECK (resource_type IN ( + 'environment', 'release', 'promotion', 'target', 'workflow', 'plugin' + )); +``` + +--- + +## Integration Hub + +```sql +CREATE TABLE integration_types ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + name VARCHAR(100) NOT NULL UNIQUE, + category VARCHAR(50) NOT NULL CHECK (category IN ( + 'scm', 'ci', 'registry', 'vault', 'target', 'router' + )), + plugin_id UUID REFERENCES plugins(id), + config_schema JSONB NOT NULL, + secrets_schema JSONB NOT NULL, + is_builtin BOOLEAN NOT NULL DEFAULT FALSE, + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW() +); + +CREATE TABLE integrations ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + integration_type_id UUID 
NOT NULL REFERENCES integration_types(id), + name VARCHAR(255) NOT NULL, + config JSONB NOT NULL, + credential_ref VARCHAR(500), -- Vault path or encrypted ref + status VARCHAR(50) NOT NULL DEFAULT 'unknown' CHECK (status IN ( + 'healthy', 'degraded', 'unhealthy', 'unknown' + )), + last_health_check TIMESTAMPTZ, + last_health_message TEXT, + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + created_by UUID REFERENCES users(id), + UNIQUE (tenant_id, name) +); + +CREATE INDEX idx_integrations_tenant ON integrations(tenant_id); +CREATE INDEX idx_integrations_type ON integrations(integration_type_id); + +CREATE TABLE connection_profiles ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + user_id UUID NOT NULL REFERENCES users(id), + integration_type_id UUID NOT NULL REFERENCES integration_types(id), + name VARCHAR(255) NOT NULL, + config_defaults JSONB NOT NULL, + is_default BOOLEAN NOT NULL DEFAULT FALSE, + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + UNIQUE (tenant_id, user_id, integration_type_id, name) +); +``` + +--- + +## Environment & Inventory + +```sql +CREATE TABLE environments ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + name VARCHAR(100) NOT NULL, + display_name VARCHAR(255) NOT NULL, + order_index INTEGER NOT NULL, + config JSONB NOT NULL DEFAULT '{}', + freeze_windows JSONB NOT NULL DEFAULT '[]', + required_approvals INTEGER NOT NULL DEFAULT 0, + require_sod BOOLEAN NOT NULL DEFAULT FALSE, + auto_promote_from UUID REFERENCES environments(id), + promotion_policy VARCHAR(255), + deployment_timeout INTEGER NOT NULL DEFAULT 600, + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + UNIQUE (tenant_id, name) +); + +CREATE INDEX idx_environments_tenant ON 
environments(tenant_id); +CREATE INDEX idx_environments_order ON environments(tenant_id, order_index); + +CREATE TABLE target_groups ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + environment_id UUID NOT NULL REFERENCES environments(id) ON DELETE CASCADE, + name VARCHAR(255) NOT NULL, + labels JSONB NOT NULL DEFAULT '{}', + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + UNIQUE (tenant_id, environment_id, name) +); + +CREATE TABLE targets ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + environment_id UUID NOT NULL REFERENCES environments(id) ON DELETE CASCADE, + target_group_id UUID REFERENCES target_groups(id), + name VARCHAR(255) NOT NULL, + target_type VARCHAR(100) NOT NULL, + connection JSONB NOT NULL, + capabilities JSONB NOT NULL DEFAULT '[]', + labels JSONB NOT NULL DEFAULT '{}', + deployment_directory VARCHAR(500), + health_status VARCHAR(50) NOT NULL DEFAULT 'unknown' CHECK (health_status IN ( + 'healthy', 'degraded', 'unhealthy', 'unknown' + )), + last_health_check TIMESTAMPTZ, + current_digest VARCHAR(100), + agent_id UUID REFERENCES agents(id), + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + UNIQUE (tenant_id, environment_id, name) +); + +CREATE INDEX idx_targets_tenant_env ON targets(tenant_id, environment_id); +CREATE INDEX idx_targets_type ON targets(target_type); +CREATE INDEX idx_targets_labels ON targets USING GIN (labels); + +CREATE TABLE agents ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + name VARCHAR(255) NOT NULL, + version VARCHAR(50) NOT NULL, + capabilities JSONB NOT NULL DEFAULT '[]', + labels JSONB NOT NULL DEFAULT '{}', + status VARCHAR(50) NOT NULL DEFAULT 'offline' CHECK (status IN ( + 'online', 'offline', 'degraded' + )), + last_heartbeat TIMESTAMPTZ, + 
resource_usage JSONB, + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + UNIQUE (tenant_id, name) +); + +CREATE INDEX idx_agents_tenant ON agents(tenant_id); +CREATE INDEX idx_agents_status ON agents(status); +CREATE INDEX idx_agents_capabilities ON agents USING GIN (capabilities); +``` + +--- + +## Release Management + +```sql +CREATE TABLE components ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + name VARCHAR(255) NOT NULL, + display_name VARCHAR(255) NOT NULL, + image_repository VARCHAR(500) NOT NULL, + registry_integration_id UUID REFERENCES integrations(id), + versioning_strategy JSONB NOT NULL DEFAULT '{"type": "semver"}', + deployment_template VARCHAR(255), + default_channel VARCHAR(50) NOT NULL DEFAULT 'stable', + metadata JSONB NOT NULL DEFAULT '{}', + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + UNIQUE (tenant_id, name) +); + +CREATE INDEX idx_components_tenant ON components(tenant_id); + +CREATE TABLE version_maps ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + component_id UUID NOT NULL REFERENCES components(id) ON DELETE CASCADE, + tag VARCHAR(255) NOT NULL, + digest VARCHAR(100) NOT NULL, + semver VARCHAR(50), + channel VARCHAR(50) NOT NULL DEFAULT 'stable', + prerelease BOOLEAN NOT NULL DEFAULT FALSE, + build_metadata VARCHAR(255), + resolved_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + source VARCHAR(50) NOT NULL DEFAULT 'auto' CHECK (source IN ('auto', 'manual')), + UNIQUE (tenant_id, component_id, digest) +); + +CREATE INDEX idx_version_maps_component ON version_maps(component_id); +CREATE INDEX idx_version_maps_digest ON version_maps(digest); +CREATE INDEX idx_version_maps_semver ON version_maps(semver); + +CREATE TABLE releases ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL 
REFERENCES tenants(id) ON DELETE CASCADE, + name VARCHAR(255) NOT NULL, + display_name VARCHAR(255) NOT NULL, + components JSONB NOT NULL, -- [{componentId, digest, semver, tag, role}] + source_ref JSONB, -- {scmIntegrationId, commitSha, ciIntegrationId, buildId} + status VARCHAR(50) NOT NULL DEFAULT 'draft' CHECK (status IN ( + 'draft', 'ready', 'promoting', 'deployed', 'deprecated', 'archived' + )), + metadata JSONB NOT NULL DEFAULT '{}', + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + created_by UUID REFERENCES users(id), + UNIQUE (tenant_id, name) +); + +CREATE INDEX idx_releases_tenant ON releases(tenant_id); +CREATE INDEX idx_releases_status ON releases(status); +CREATE INDEX idx_releases_created ON releases(created_at DESC); + +CREATE TABLE release_environment_state ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + environment_id UUID NOT NULL REFERENCES environments(id) ON DELETE CASCADE, + release_id UUID NOT NULL REFERENCES releases(id), + status VARCHAR(50) NOT NULL CHECK (status IN ( + 'deployed', 'deploying', 'failed', 'rolling_back', 'rolled_back' + )), + deployed_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + deployed_by UUID REFERENCES users(id), + promotion_id UUID, -- will reference promotions + evidence_ref VARCHAR(255), + UNIQUE (tenant_id, environment_id) +); + +CREATE INDEX idx_release_env_state_env ON release_environment_state(environment_id); +CREATE INDEX idx_release_env_state_release ON release_environment_state(release_id); +``` + +--- + +## Workflow Engine + +```sql +CREATE TABLE workflow_templates ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID REFERENCES tenants(id) ON DELETE CASCADE, -- NULL for builtin + name VARCHAR(255) NOT NULL, + display_name VARCHAR(255) NOT NULL, + description TEXT, + version INTEGER NOT NULL DEFAULT 1, + nodes JSONB NOT NULL, + edges JSONB NOT NULL, + inputs JSONB NOT NULL 
DEFAULT '[]', + outputs JSONB NOT NULL DEFAULT '[]', + is_builtin BOOLEAN NOT NULL DEFAULT FALSE, + tags JSONB NOT NULL DEFAULT '[]', + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + created_by UUID REFERENCES users(id), + UNIQUE (tenant_id, name, version) +); + +CREATE INDEX idx_workflow_templates_tenant ON workflow_templates(tenant_id); +CREATE INDEX idx_workflow_templates_builtin ON workflow_templates(is_builtin); + +CREATE TABLE workflow_runs ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + template_id UUID NOT NULL REFERENCES workflow_templates(id), + template_version INTEGER NOT NULL, + status VARCHAR(50) NOT NULL DEFAULT 'created' CHECK (status IN ( + 'created', 'running', 'paused', 'succeeded', 'failed', 'cancelled' + )), + context JSONB NOT NULL, -- inputs, variables, release info + outputs JSONB, + started_at TIMESTAMPTZ, + completed_at TIMESTAMPTZ, + error_message TEXT, + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + triggered_by UUID REFERENCES users(id) +); + +CREATE INDEX idx_workflow_runs_tenant ON workflow_runs(tenant_id); +CREATE INDEX idx_workflow_runs_status ON workflow_runs(status); +CREATE INDEX idx_workflow_runs_template ON workflow_runs(template_id); + +CREATE TABLE step_runs ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + workflow_run_id UUID NOT NULL REFERENCES workflow_runs(id) ON DELETE CASCADE, + node_id VARCHAR(100) NOT NULL, + step_type VARCHAR(100) NOT NULL, + status VARCHAR(50) NOT NULL DEFAULT 'pending' CHECK (status IN ( + 'pending', 'running', 'succeeded', 'failed', 'skipped', 'retrying', 'cancelled' + )), + inputs JSONB NOT NULL, + config JSONB NOT NULL, + outputs JSONB, + started_at TIMESTAMPTZ, + completed_at TIMESTAMPTZ, + attempt_number INTEGER NOT NULL DEFAULT 1, + error_message TEXT, + error_type VARCHAR(100), + logs TEXT, + artifacts JSONB NOT NULL DEFAULT '[]', + t_hlc BIGINT, -- Hybrid 
Logical Clock for ordering (optional) + ts_wall TIMESTAMPTZ, -- Wall-clock timestamp for debugging (optional) + UNIQUE (workflow_run_id, node_id, attempt_number) +); + +CREATE INDEX idx_step_runs_workflow ON step_runs(workflow_run_id); +CREATE INDEX idx_step_runs_status ON step_runs(status); +``` + +--- + +## Promotion & Approval + +```sql +CREATE TABLE promotions ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + release_id UUID NOT NULL REFERENCES releases(id), + source_environment_id UUID REFERENCES environments(id), + target_environment_id UUID NOT NULL REFERENCES environments(id), + status VARCHAR(50) NOT NULL DEFAULT 'pending_approval' CHECK (status IN ( + 'pending_approval', 'pending_gate', 'approved', 'rejected', + 'deploying', 'deployed', 'failed', 'cancelled', 'rolled_back' + )), + decision_record JSONB, + workflow_run_id UUID REFERENCES workflow_runs(id), + requested_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + requested_by UUID NOT NULL REFERENCES users(id), + request_reason TEXT, + decided_at TIMESTAMPTZ, + started_at TIMESTAMPTZ, + completed_at TIMESTAMPTZ, + evidence_packet_id UUID, + t_hlc BIGINT, -- Hybrid Logical Clock for ordering (optional) + ts_wall TIMESTAMPTZ -- Wall-clock timestamp for debugging (optional) +); + +CREATE INDEX idx_promotions_tenant ON promotions(tenant_id); +CREATE INDEX idx_promotions_release ON promotions(release_id); +CREATE INDEX idx_promotions_status ON promotions(status); +CREATE INDEX idx_promotions_target_env ON promotions(target_environment_id); + +-- Add FK to release_environment_state +ALTER TABLE release_environment_state + ADD CONSTRAINT fk_release_env_state_promotion + FOREIGN KEY (promotion_id) REFERENCES promotions(id); + +CREATE TABLE approvals ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + promotion_id UUID NOT NULL REFERENCES promotions(id) ON DELETE CASCADE, + 
approver_id UUID NOT NULL REFERENCES users(id), + action VARCHAR(50) NOT NULL CHECK (action IN ('approved', 'rejected')), + comment TEXT, + approved_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + approver_role VARCHAR(255), + approver_groups JSONB NOT NULL DEFAULT '[]' +); + +CREATE INDEX idx_approvals_promotion ON approvals(promotion_id); +CREATE INDEX idx_approvals_approver ON approvals(approver_id); + +CREATE TABLE approval_policies ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + environment_id UUID NOT NULL REFERENCES environments(id) ON DELETE CASCADE, + required_count INTEGER NOT NULL DEFAULT 1, + required_roles JSONB NOT NULL DEFAULT '[]', + required_groups JSONB NOT NULL DEFAULT '[]', + require_sod BOOLEAN NOT NULL DEFAULT FALSE, + allow_self_approval BOOLEAN NOT NULL DEFAULT FALSE, + expiration_minutes INTEGER NOT NULL DEFAULT 1440, + UNIQUE (tenant_id, environment_id) +); +``` + +--- + +## Deployment + +```sql +CREATE TABLE deployment_jobs ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + promotion_id UUID NOT NULL REFERENCES promotions(id), + release_id UUID NOT NULL REFERENCES releases(id), + environment_id UUID NOT NULL REFERENCES environments(id), + status VARCHAR(50) NOT NULL DEFAULT 'pending' CHECK (status IN ( + 'pending', 'running', 'succeeded', 'failed', 'cancelled', 'rolling_back', 'rolled_back' + )), + strategy VARCHAR(50) NOT NULL DEFAULT 'all-at-once', + started_at TIMESTAMPTZ, + completed_at TIMESTAMPTZ, + artifacts JSONB NOT NULL DEFAULT '[]', + rollback_of UUID REFERENCES deployment_jobs(id), + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + t_hlc BIGINT, -- Hybrid Logical Clock for ordering (optional) + ts_wall TIMESTAMPTZ -- Wall-clock timestamp for debugging (optional) +); + +CREATE INDEX idx_deployment_jobs_promotion ON deployment_jobs(promotion_id); +CREATE INDEX idx_deployment_jobs_status ON 
deployment_jobs(status); + +CREATE TABLE deployment_tasks ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + job_id UUID NOT NULL REFERENCES deployment_jobs(id) ON DELETE CASCADE, + target_id UUID NOT NULL REFERENCES targets(id), + digest VARCHAR(100) NOT NULL, + status VARCHAR(50) NOT NULL DEFAULT 'pending' CHECK (status IN ( + 'pending', 'running', 'succeeded', 'failed', 'cancelled', 'skipped' + )), + agent_id UUID REFERENCES agents(id), + started_at TIMESTAMPTZ, + completed_at TIMESTAMPTZ, + exit_code INTEGER, + logs TEXT, + previous_digest VARCHAR(100), + sticker_written BOOLEAN NOT NULL DEFAULT FALSE, + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW() +); + +CREATE INDEX idx_deployment_tasks_job ON deployment_tasks(job_id); +CREATE INDEX idx_deployment_tasks_target ON deployment_tasks(target_id); +CREATE INDEX idx_deployment_tasks_status ON deployment_tasks(status); + +CREATE TABLE generated_artifacts ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + deployment_job_id UUID REFERENCES deployment_jobs(id) ON DELETE CASCADE, + artifact_type VARCHAR(50) NOT NULL CHECK (artifact_type IN ( + 'compose_lock', 'script', 'sticker', 'evidence', 'config' + )), + name VARCHAR(255) NOT NULL, + content_hash VARCHAR(100) NOT NULL, + content BYTEA, -- for small artifacts + storage_ref VARCHAR(500), -- for large artifacts (S3, etc.) 
+ created_at TIMESTAMPTZ NOT NULL DEFAULT NOW() +); + +CREATE INDEX idx_generated_artifacts_job ON generated_artifacts(deployment_job_id); +``` + +--- + +## Progressive Delivery + +```sql +CREATE TABLE ab_releases ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + environment_id UUID NOT NULL REFERENCES environments(id), + name VARCHAR(255) NOT NULL, + variations JSONB NOT NULL, -- [{name, releaseId, targetGroupId, trafficPercentage}] + active_variation VARCHAR(50) NOT NULL DEFAULT 'A', + traffic_split JSONB NOT NULL, + rollout_strategy JSONB NOT NULL, + status VARCHAR(50) NOT NULL DEFAULT 'created' CHECK (status IN ( + 'created', 'deploying', 'running', 'promoting', 'completed', 'rolled_back' + )), + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + completed_at TIMESTAMPTZ, + created_by UUID REFERENCES users(id) +); + +CREATE INDEX idx_ab_releases_tenant_env ON ab_releases(tenant_id, environment_id); +CREATE INDEX idx_ab_releases_status ON ab_releases(status); + +CREATE TABLE canary_stages ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + ab_release_id UUID NOT NULL REFERENCES ab_releases(id) ON DELETE CASCADE, + stage_number INTEGER NOT NULL, + traffic_percentage INTEGER NOT NULL, + status VARCHAR(50) NOT NULL DEFAULT 'pending' CHECK (status IN ( + 'pending', 'running', 'succeeded', 'failed', 'skipped' + )), + health_threshold DECIMAL(5,2), + duration_seconds INTEGER, + require_approval BOOLEAN NOT NULL DEFAULT FALSE, + started_at TIMESTAMPTZ, + completed_at TIMESTAMPTZ, + health_result JSONB, + UNIQUE (ab_release_id, stage_number) +); +``` + +--- + +## Release Evidence + +```sql +CREATE TABLE evidence_packets ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + promotion_id UUID NOT NULL REFERENCES promotions(id), + packet_type VARCHAR(50) NOT NULL CHECK (packet_type IN ( + 'release_decision', 'deployment', 'rollback', 
'ab_promotion' + )), + content JSONB NOT NULL, + content_hash VARCHAR(100) NOT NULL, + signature TEXT, + signer_key_ref VARCHAR(255), + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW() + -- Note: No UPDATE or DELETE allowed (append-only) +); + +CREATE INDEX idx_evidence_packets_promotion ON evidence_packets(promotion_id); +CREATE INDEX idx_evidence_packets_created ON evidence_packets(created_at DESC); + +-- Append-only enforcement via trigger +CREATE OR REPLACE FUNCTION prevent_evidence_modification() +RETURNS TRIGGER AS $$ +BEGIN + RAISE EXCEPTION 'Evidence packets are immutable and cannot be modified or deleted'; +END; +$$ LANGUAGE plpgsql; + +CREATE TRIGGER evidence_packets_immutable +BEFORE UPDATE OR DELETE ON evidence_packets +FOR EACH ROW EXECUTE FUNCTION prevent_evidence_modification(); + +CREATE TABLE version_stickers ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + target_id UUID NOT NULL REFERENCES targets(id), + deployment_job_id UUID REFERENCES deployment_jobs(id), + release_id UUID NOT NULL REFERENCES releases(id), + digest VARCHAR(100) NOT NULL, + sticker_content JSONB NOT NULL, + written_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + verified_at TIMESTAMPTZ, + verification_status VARCHAR(50) CHECK (verification_status IN ('valid', 'mismatch', 'missing')) +); + +CREATE INDEX idx_version_stickers_target ON version_stickers(target_id); +CREATE INDEX idx_version_stickers_release ON version_stickers(release_id); +``` + +--- + +## Plugin Infrastructure + +```sql +CREATE TABLE plugins ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + name VARCHAR(255) NOT NULL UNIQUE, + display_name VARCHAR(255) NOT NULL, + version VARCHAR(50) NOT NULL, + description TEXT, + manifest JSONB NOT NULL, + status VARCHAR(50) NOT NULL DEFAULT 'inactive' CHECK (status IN ( + 'active', 'inactive', 'error' + )), + error_message TEXT, + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + updated_at TIMESTAMPTZ NOT NULL 
DEFAULT NOW()
+);
+
+CREATE TABLE plugin_instances (
+    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
+    plugin_id UUID NOT NULL REFERENCES plugins(id),
+    config JSONB NOT NULL DEFAULT '{}',
+    enabled BOOLEAN NOT NULL DEFAULT TRUE,
+    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
+    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
+    UNIQUE (tenant_id, plugin_id)
+);
+
+CREATE INDEX idx_plugin_instances_tenant ON plugin_instances(tenant_id);
+```
+
+---
+
+## Hybrid Logical Clock (HLC) for Distributed Ordering
+
+**Optional Enhancement**: For strict distributed ordering and multi-region support, the following tables include optional `t_hlc` (Hybrid Logical Clock timestamp) and `ts_wall` (wall-clock timestamp) columns:
+
+- `promotions` — Promotion state transitions
+- `deployment_jobs` — Deployment task ordering
+- `step_runs` — Workflow step execution ordering
+
+**When to use HLC**:
+- Multi-region deployments requiring strict causal ordering
+- Deterministic replay across distributed systems
+- Timeline event ordering in audit logs
+
+**HLC Schema**:
+```sql
+t_hlc BIGINT        -- HLC timestamp (monotonic, skew-tolerant)
+ts_wall TIMESTAMPTZ -- Wall-clock timestamp (informational)
+```
+
+**Usage**:
+- `t_hlc` is generated by `IHybridLogicalClock.Tick()` on state transitions
+- `ts_wall` is populated by `TimeProvider.GetUtcNow()` for debugging
+- Index on `t_hlc` for ordering queries: `CREATE INDEX idx_promotions_hlc ON promotions(t_hlc);`
+
+**Reference**: See [Implementation Guide](../implementation-guide.md#hybrid-logical-clock-hlc-for-distributed-ordering) for HLC usage patterns.
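To make the `Tick()` semantics concrete, here is a minimal TypeScript sketch of a hybrid logical clock. The class name and the packing layout (wall-clock milliseconds in the high bits, a 16-bit logical counter in the low bits) are illustrative assumptions, not the actual `IHybridLogicalClock` implementation; the layout is chosen so that `t_hlc` values compare correctly as plain `BIGINT`s.

```typescript
// Minimal HLC sketch. Assumption: pack wall-clock milliseconds into the
// high bits and a logical counter into the low 16 bits, so two ticks in
// the same millisecond (or after backwards clock skew) still produce
// strictly increasing values.
class HybridLogicalClock {
  private lastPhysical = 0n;
  private counter = 0n;

  // Injectable time source makes the clock testable and deterministic.
  constructor(private readonly now: () => number = () => Date.now()) {}

  tick(): bigint {
    const physical = BigInt(this.now());
    if (physical > this.lastPhysical) {
      // Wall clock advanced: adopt it and reset the logical counter.
      this.lastPhysical = physical;
      this.counter = 0n;
    } else {
      // Same millisecond or skewed clock: bump the counter instead.
      this.counter += 1n;
    }
    return (this.lastPhysical << 16n) | this.counter;
  }
}
```

A state-transition handler would store `t_hlc = clock.tick()` alongside the `ts_wall` wall-clock value; ordering rows by `t_hlc` is then total even for events within the same millisecond. A production clock would additionally guard against counter overflow and persist its state across restarts.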
+ +--- + +## Row-Level Security (Multi-Tenancy) + +All tables with `tenant_id` should have RLS enabled: + +```sql +-- Enable RLS on all release tables +ALTER TABLE integrations ENABLE ROW LEVEL SECURITY; +ALTER TABLE environments ENABLE ROW LEVEL SECURITY; +ALTER TABLE targets ENABLE ROW LEVEL SECURITY; +ALTER TABLE releases ENABLE ROW LEVEL SECURITY; +ALTER TABLE promotions ENABLE ROW LEVEL SECURITY; +-- ... etc. + +-- Example policy +CREATE POLICY tenant_isolation ON integrations + FOR ALL + USING (tenant_id = current_setting('app.tenant_id')::UUID); +``` diff --git a/docs/modules/release-orchestrator/deployment/artifacts.md b/docs/modules/release-orchestrator/deployment/artifacts.md new file mode 100644 index 000000000..95f8c910f --- /dev/null +++ b/docs/modules/release-orchestrator/deployment/artifacts.md @@ -0,0 +1,308 @@ +# Artifact Generation + +## Overview + +Every deployment generates immutable artifacts that enable reproducibility, audit, and rollback. + +## Generated Artifacts + +### 1. Compose Lock File + +**File:** `compose.stella.lock.yml` + +A Docker Compose file with all image references pinned to specific digests. + +```yaml +# compose.stella.lock.yml +# Generated by Stella Ops - DO NOT EDIT +# Release: myapp-v2.3.1 +# Generated: 2026-01-10T14:30:00Z +# Generator: stella-artifact-generator@1.5.0 + +version: "3.8" + +services: + api: + image: registry.example.com/myapp/api@sha256:abc123... + # Original tag: v2.3.1 + deploy: + replicas: 2 + environment: + - DATABASE_URL=${DATABASE_URL} + - REDIS_URL=${REDIS_URL} + labels: + stella.component.id: "comp-api-uuid" + stella.release.id: "rel-uuid" + stella.digest: "sha256:abc123..." + + worker: + image: registry.example.com/myapp/worker@sha256:def456... + # Original tag: v2.3.1 + deploy: + replicas: 1 + labels: + stella.component.id: "comp-worker-uuid" + stella.release.id: "rel-uuid" + stella.digest: "sha256:def456..." 
+ +# Stella metadata +x-stella: + release: + id: "rel-uuid" + name: "myapp-v2.3.1" + created_at: "2026-01-10T14:00:00Z" + environment: + id: "env-uuid" + name: "production" + deployment: + id: "deploy-uuid" + started_at: "2026-01-10T14:30:00Z" + checksums: + sha256: "checksum-of-this-file" +``` + +### 2. Version Sticker + +**File:** `stella.version.json` + +Metadata file placed on deployment targets indicating current deployment state. + +```json +{ + "version": "1.0", + "generatedAt": "2026-01-10T14:35:00Z", + "generator": "stella-artifact-generator@1.5.0", + + "release": { + "id": "rel-uuid", + "name": "myapp-v2.3.1", + "createdAt": "2026-01-10T14:00:00Z", + "components": [ + { + "name": "api", + "digest": "sha256:abc123...", + "semver": "2.3.1", + "tag": "v2.3.1" + }, + { + "name": "worker", + "digest": "sha256:def456...", + "semver": "2.3.1", + "tag": "v2.3.1" + } + ] + }, + + "deployment": { + "id": "deploy-uuid", + "promotionId": "promo-uuid", + "environmentId": "env-uuid", + "environmentName": "production", + "targetId": "target-uuid", + "targetName": "prod-web-01", + "strategy": "rolling", + "startedAt": "2026-01-10T14:30:00Z", + "completedAt": "2026-01-10T14:35:00Z" + }, + + "deployer": { + "userId": "user-uuid", + "userName": "john.doe", + "agentId": "agent-uuid", + "agentName": "prod-agent-01" + }, + + "previous": { + "releaseId": "prev-rel-uuid", + "releaseName": "myapp-v2.3.0", + "digest": "sha256:789..." + }, + + "signature": "base64-encoded-signature", + "signatureAlgorithm": "RS256", + "signerKeyRef": "stella/signing/prod-key-2026" +} +``` + +### 3. Evidence Packet + +**File:** Evidence stored in database (exportable as JSON/PDF) + +See [Evidence Schema](../appendices/evidence-schema.md) for full specification. + +### 4. 
Deployment Script (Optional) + +**File:** `deploy.stella.script.dll` or `deploy.stella.sh` + +When deployments use C# or shell scripts with hooks: + +```csharp +// deploy.stella.csx (source, compiled to DLL) +#r "nuget: StellaOps.Sdk, 1.0.0" + +using StellaOps.Sdk; + +// Pre-deploy hook +await Context.RunPreDeployHook(async (ctx) => { + await ctx.ExecuteCommand("./scripts/backup-database.sh"); + await ctx.HealthCheck("/ready", timeout: 30); +}); + +// Deploy +await Context.Deploy(); + +// Post-deploy hook +await Context.RunPostDeployHook(async (ctx) => { + await ctx.ExecuteCommand("./scripts/warm-cache.sh"); + await ctx.Notify("slack", "Deployment complete"); +}); +``` + +## Artifact Storage + +### Storage Structure + +``` +artifacts/ +├── {tenant_id}/ +│ ├── {deployment_id}/ +│ │ ├── compose.stella.lock.yml +│ │ ├── deploy.stella.script.dll (if applicable) +│ │ ├── deploy.stella.script.csx (source) +│ │ ├── manifest.json +│ │ └── checksums.sha256 +│ └── ... +└── ... +``` + +### Manifest File + +```json +{ + "version": "1.0", + "deploymentId": "deploy-uuid", + "createdAt": "2026-01-10T14:30:00Z", + "artifacts": [ + { + "name": "compose.stella.lock.yml", + "type": "compose-lock", + "size": 2048, + "sha256": "abc123..." + }, + { + "name": "deploy.stella.script.dll", + "type": "script-compiled", + "size": 8192, + "sha256": "def456..." + } + ], + "totalSize": 10240, + "signature": "base64-signature" +} +``` + +## Artifact Generation Process + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ ARTIFACT GENERATION FLOW │ +│ │ +│ ┌─────────────────┐ │ +│ │ Promotion │ │ +│ │ Approved │ │ +│ └────────┬────────┘ │ +│ │ │ +│ ▼ │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ ARTIFACT GENERATOR │ │ +│ │ │ │ +│ │ 1. Load release bundle (components, digests) │ │ +│ │ 2. Load environment configuration (variables, secrets refs) │ │ +│ │ 3. Load workflow template (hooks, scripts) │ │ +│ │ 4. 
Generate compose.stella.lock.yml │ │ +│ │ 5. Compile scripts (if any) │ │ +│ │ 6. Generate version sticker template │ │ +│ │ 7. Compute checksums │ │ +│ │ 8. Sign artifacts │ │ +│ │ 9. Store in artifact storage │ │ +│ │ │ │ +│ └────────────────────────────┬────────────────────────────────────────┘ │ +│ │ │ +│ ▼ │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ DEPLOYMENT ORCHESTRATOR │ │ +│ │ │ │ +│ │ Artifacts distributed to targets via agents │ │ +│ └─────────────────────────────────────────────────────────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +## Artifact Properties + +### Immutability + +Once generated, artifacts are never modified: +- Content-addressed storage (hash in path/metadata) +- No overwrite capability +- Append-only storage pattern + +### Integrity + +All artifacts are: +- Checksummed (SHA-256) +- Signed with deployment key +- Verifiable at deployment time + +### Retention + +| Environment | Retention Period | +|-------------|------------------| +| Development | 30 days | +| Staging | 90 days | +| Production | 7 years (compliance) | + +## API Operations + +```yaml +# List artifacts for deployment +GET /api/v1/deployment-jobs/{id}/artifacts +Response: Artifact[] + +# Download specific artifact +GET /api/v1/deployment-jobs/{id}/artifacts/{name} +Response: binary + +# Get artifact manifest +GET /api/v1/deployment-jobs/{id}/artifacts/manifest +Response: ArtifactManifest + +# Verify artifact integrity +POST /api/v1/deployment-jobs/{id}/artifacts/{name}/verify +Response: { valid: boolean, checksum: string, signature: string } +``` + +## Drift Detection + +Version stickers enable drift detection: + +```typescript +interface DriftCheck { + targetId: UUID; + expectedSticker: VersionSticker; + actualSticker: VersionSticker | null; + driftDetected: boolean; + driftType?: "missing" | "corrupted" | "mismatch"; + details?: { + expectedDigest: string; + 
actualDigest: string; + field: string; + }; +} +``` + +## References + +- [Deployment Overview](overview.md) +- [Deployment Strategies](strategies.md) +- [Evidence Schema](../appendices/evidence-schema.md) diff --git a/docs/modules/release-orchestrator/deployment/overview.md b/docs/modules/release-orchestrator/deployment/overview.md new file mode 100644 index 000000000..e15aa58ad --- /dev/null +++ b/docs/modules/release-orchestrator/deployment/overview.md @@ -0,0 +1,671 @@ +# Deployment Overview + +## Purpose + +The Deployment system executes the actual deployment of releases to target environments, managing deployment jobs, tasks, artifact generation, and rollback capabilities. + +## Deployment Architecture + +``` + DEPLOYMENT ARCHITECTURE + + ┌─────────────────────────────────────────────────────────────────────────────┐ + │ DEPLOY ORCHESTRATOR │ + │ │ + │ ┌─────────────────────────────────────────────────────────────────────┐ │ + │ │ DEPLOYMENT JOB MANAGER │ │ + │ │ │ │ + │ │ Promotion ───► Create Job ───► Plan Tasks ───► Execute Tasks │ │ + │ │ │ │ + │ └─────────────────────────────────────────────────────────────────────┘ │ + │ │ │ + │ ┌───────────────┼───────────────┐ │ + │ │ │ │ │ + │ ▼ ▼ ▼ │ + │ ┌─────────────────────┐ ┌─────────────────┐ ┌─────────────────────┐ │ + │ │ TARGET EXECUTOR │ │ RUNNER EXECUTOR │ │ ARTIFACT GENERATOR │ │ + │ │ │ │ │ │ │ │ + │ │ - Task dispatch │ │ - Agent tasks │ │ - Compose files │ │ + │ │ - Status tracking │ │ - SSH tasks │ │ - Env configs │ │ + │ │ - Log aggregation │ │ - API tasks │ │ - Manifests │ │ + │ └─────────────────────┘ └─────────────────┘ └─────────────────────┘ │ + │ │ │ + └─────────────────────────────────────────────────────────────────────────────┘ + │ + ┌────────────────────────────┼────────────────────────────┐ + │ │ │ + ▼ ▼ ▼ + ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ + │ Agent │ │ Agentless │ │ API │ + │ Execution │ │ Execution │ │ Execution │ + │ │ │ │ │ │ + │ Docker, │ │ SSH, │ │ ECS, │ + │ Compose │ 
│ WinRM │ │ Nomad │ + └─────────────┘ └─────────────┘ └─────────────┘ +``` + +## Deployment Flow + +### Standard Deployment Flow + +``` + DEPLOYMENT FLOW + + Promotion Deployment Task Agent/Target + Approved Job Execution + │ │ │ │ + │ Create Job │ │ │ + ├───────────────►│ │ │ + │ │ │ │ + │ │ Generate │ │ + │ │ Artifacts │ │ + │ ├────────────────►│ │ + │ │ │ │ + │ │ Create Tasks │ │ + │ │ per Target │ │ + │ ├────────────────►│ │ + │ │ │ │ + │ │ │ Dispatch Task │ + │ │ ├────────────────►│ + │ │ │ │ + │ │ │ Execute │ + │ │ │ (Pull, Deploy) │ + │ │ │ │ + │ │ │ Report Status │ + │ │ │◄────────────────┤ + │ │ │ │ + │ │ Aggregate │ │ + │ │ Results │ │ + │ │◄────────────────┤ │ + │ │ │ │ + │ Job Complete │ │ │ + │◄───────────────┤ │ │ + │ │ │ │ +``` + +## Deployment Job + +### Job Entity + +```typescript +interface DeploymentJob { + id: UUID; + promotionId: UUID; + releaseId: UUID; + environmentId: UUID; + + // Execution configuration + strategy: DeploymentStrategy; + parallelism: number; + + // Status tracking + status: JobStatus; + startedAt?: DateTime; + completedAt?: DateTime; + + // Artifacts + artifacts: GeneratedArtifact[]; + + // Rollback reference + rollbackOf?: UUID; // If this is a rollback job + previousJobId?: UUID; // Previous successful job + + // Tasks + tasks: DeploymentTask[]; +} + +type JobStatus = + | "pending" + | "preparing" + | "running" + | "completing" + | "completed" + | "failed" + | "rolling_back" + | "rolled_back"; + +type DeploymentStrategy = + | "all-at-once" + | "rolling" + | "canary" + | "blue-green"; +``` + +### Job State Machine + +``` + JOB STATE MACHINE + + ┌──────────┐ + │ PENDING │ + └────┬─────┘ + │ start() + ▼ + ┌──────────┐ + │PREPARING │ + │ │ + │ Generate │ + │ artifacts│ + └────┬─────┘ + │ + ▼ + ┌──────────┐ + │ RUNNING │◄────────────────┐ + │ │ │ + │ Execute │ │ + │ tasks │ │ + └────┬─────┘ │ + │ │ + ┌───────────────┼───────────────┐ │ + │ │ │ │ + ▼ ▼ ▼ │ + ┌──────────┐ ┌──────────┐ ┌──────────┐ │ + │COMPLETING│ │ FAILED │ │ 
ROLLING │ │ + │ │ │ │ │ BACK │──┘ + │ Verify │ │ │ │ │ + │ health │ │ │ │ │ + └────┬─────┘ └────┬─────┘ └────┬─────┘ + │ │ │ + ▼ │ ▼ + ┌──────────┐ │ ┌──────────┐ + │COMPLETED │ │ │ ROLLED │ + └──────────┘ │ │ BACK │ + │ └──────────┘ + │ + ▼ + [Failure + handling] +``` + +## Deployment Task + +### Task Entity + +```typescript +interface DeploymentTask { + id: UUID; + jobId: UUID; + targetId: UUID; + + // What to deploy + componentId: UUID; + digest: string; + + // Execution + status: TaskStatus; + agentId?: UUID; + startedAt?: DateTime; + completedAt?: DateTime; + + // Results + logs: string; + previousDigest?: string; // For rollback + error?: string; + + // Retry tracking + attemptNumber: number; + maxAttempts: number; +} + +type TaskStatus = + | "pending" + | "queued" + | "dispatched" + | "running" + | "verifying" + | "succeeded" + | "failed" + | "retrying"; +``` + +### Task Dispatch + +```typescript +class TaskDispatcher { + async dispatchTask(task: DeploymentTask): Promise { + const target = await this.targetRepository.get(task.targetId); + + switch (target.executionModel) { + case "agent": + await this.dispatchToAgent(task, target); + break; + + case "ssh": + await this.dispatchViaSsh(task, target); + break; + + case "api": + await this.dispatchViaApi(task, target); + break; + } + } + + private async dispatchToAgent( + task: DeploymentTask, + target: Target + ): Promise { + // Find available agent for target + const agent = await this.agentManager.findAgentForTarget(target); + + if (!agent) { + throw new NoAgentAvailableError(target.id); + } + + // Create task payload + const payload: AgentTaskPayload = { + taskId: task.id, + targetId: target.id, + action: "deploy", + digest: task.digest, + config: target.connection, + credentials: await this.fetchTaskCredentials(target) + }; + + // Dispatch to agent + await this.agentClient.dispatchTask(agent.id, payload); + + // Update task status + task.status = "dispatched"; + task.agentId = agent.id; + await 
this.taskRepository.update(task); + } +} +``` + +## Generated Artifacts + +### Artifact Types + +| Type | Description | Format | +|------|-------------|--------| +| `compose-file` | Docker Compose file | YAML | +| `compose-lock` | Pinned compose file | YAML | +| `env-file` | Environment variables | .env | +| `systemd-unit` | Systemd service unit | .service | +| `nginx-config` | Nginx configuration | .conf | +| `manifest` | Deployment manifest | JSON | + +### Compose Lock Generation + +```typescript +interface ComposeLock { + version: string; + services: Record<string, LockedService>; + generated: { + releaseId: string; + promotionId: string; + timestamp: string; + digest: string; // Hash of this file (computed with this field empty) + }; +} + +interface LockedService { + image: string; // Full image reference with digest + environment?: Record<string, string>; + labels: Record<string, string>; +} + +class ComposeArtifactGenerator { + async generateLock( + release: Release, + target: Target, + template: ComposeTemplate + ): Promise<ComposeLock> { + const services: Record<string, LockedService> = {}; + + for (const [serviceName, serviceConfig] of Object.entries(template.services)) { + // Find component for this service + const componentDigest = release.components.find( + c => c.name === serviceConfig.componentName + ); + + if (!componentDigest) { + throw new Error(`No component found for service ${serviceName}`); + } + + // Build locked image reference + const imageRef = `${componentDigest.repository}@${componentDigest.digest}`; + + services[serviceName] = { + image: imageRef, + environment: { + ...serviceConfig.environment, + STELLA_RELEASE_ID: release.id, + STELLA_DIGEST: componentDigest.digest + }, + labels: { + "stella.release.id": release.id, + "stella.component.name": componentDigest.name, + "stella.digest": componentDigest.digest, + "stella.deployed.at": new Date().toISOString() + } + }; + } + + const lock: ComposeLock = { + version: "3.8", + services, + generated: { + releaseId: release.id, + promotionId: target.promotionId, + timestamp: new Date().toISOString(), + digest: "" //
Computed below + } + }; + + // Compute content hash + const content = yaml.stringify(lock); + lock.generated.digest = crypto.createHash("sha256").update(content).digest("hex"); + + return lock; + } +} +``` + +## Deployment Execution + +### Execution Models + +| Model | Description | Use Case | +|-------|-------------|----------| +| `agent` | Stella agent on target | Docker hosts, servers | +| `ssh` | SSH-based agentless | Unix servers | +| `winrm` | WinRM-based agentless | Windows servers | +| `api` | API-based | ECS, Nomad, K8s | + +### Agent-Based Execution + +```typescript +class AgentExecutor { + async execute(task: DeploymentTask): Promise { + const agent = await this.agentManager.get(task.agentId); + const target = await this.targetRepository.get(task.targetId); + + // Prepare task payload with secrets + const payload: TaskPayload = { + taskId: task.id, + targetId: target.id, + action: "deploy", + digest: task.digest, + config: target.connection, + artifacts: await this.getArtifacts(task.jobId), + credentials: await this.secretsManager.fetchForTask(target) + }; + + // Dispatch to agent + const taskRef = await this.agentClient.dispatchTask(agent.id, payload); + + // Wait for completion + const result = await this.waitForTaskCompletion(taskRef, task.timeout); + + return result; + } + + private async waitForTaskCompletion( + taskRef: TaskReference, + timeout: number + ): Promise { + const deadline = Date.now() + timeout * 1000; + + while (Date.now() < deadline) { + const status = await this.agentClient.getTaskStatus(taskRef); + + if (status.completed) { + return { + success: status.success, + logs: status.logs, + deployedDigest: status.deployedDigest, + error: status.error + }; + } + + await sleep(1000); + } + + throw new TimeoutError(`Task did not complete within ${timeout} seconds`); + } +} +``` + +### SSH-Based Execution + +```typescript +class SshExecutor { + async execute(task: DeploymentTask): Promise { + const target = await 
this.targetRepository.get(task.targetId); + const sshConfig = target.connection as SshConnectionConfig; + + // Get SSH credentials from vault + const creds = await this.secretsManager.fetchSshCredentials( + sshConfig.credentialRef + ); + + // Connect via SSH + const ssh = new NodeSSH(); + await ssh.connect({ + host: sshConfig.host, + port: sshConfig.port || 22, + username: creds.username, + privateKey: creds.privateKey + }); + + try { + // Upload artifacts + const artifacts = await this.getArtifacts(task.jobId); + for (const artifact of artifacts) { + await ssh.putFile(artifact.localPath, artifact.remotePath); + } + + // Execute deployment script + const result = await ssh.execCommand( + this.buildDeployCommand(task, target), + { cwd: sshConfig.workDir } + ); + + return { + success: result.code === 0, + logs: `${result.stdout}\n${result.stderr}`, + error: result.code !== 0 ? result.stderr : undefined + }; + } finally { + ssh.dispose(); + } + } + + private buildDeployCommand(task: DeploymentTask, target: Target): string { + // Build deployment command based on target type + switch (target.targetType) { + case "compose_host": + return `cd ${target.connection.workDir} && docker-compose pull && docker-compose up -d`; + + case "docker_host": + return `docker pull ${task.digest} && docker stop ${target.containerName} && docker run -d --name ${target.containerName} ${task.digest}`; + + default: + throw new Error(`Unsupported target type: ${target.targetType}`); + } + } +} +``` + +## Health Verification + +```typescript +interface HealthCheckConfig { + type: "http" | "tcp" | "command"; + timeout: number; + retries: number; + interval: number; + + // HTTP-specific + path?: string; + expectedStatus?: number; + expectedBody?: string; + + // TCP-specific + port?: number; + + // Command-specific + command?: string; +} + +class HealthVerifier { + async verify( + target: Target, + config: HealthCheckConfig + ): Promise { + let lastError: Error | undefined; + + for (let attempt = 
0; attempt < config.retries; attempt++) { + try { + const result = await this.performCheck(target, config); + + if (result.healthy) { + return result; + } + + lastError = new Error(result.message); + } catch (error) { + lastError = error as Error; + } + + if (attempt < config.retries - 1) { + await sleep(config.interval * 1000); + } + } + + return { + healthy: false, + message: lastError?.message || "Health check failed", + attempts: config.retries + }; + } + + private async performCheck( + target: Target, + config: HealthCheckConfig + ): Promise { + switch (config.type) { + case "http": + return this.httpCheck(target, config); + + case "tcp": + return this.tcpCheck(target, config); + + case "command": + return this.commandCheck(target, config); + } + } + + private async httpCheck( + target: Target, + config: HealthCheckConfig + ): Promise { + const url = `${target.healthEndpoint}${config.path || "/health"}`; + + try { + const response = await fetch(url, { + signal: AbortSignal.timeout(config.timeout * 1000) + }); + + const healthy = response.status === (config.expectedStatus || 200); + + return { + healthy, + message: healthy ? 
"OK" : `Status ${response.status}`, + statusCode: response.status + }; + } catch (error) { + return { + healthy: false, + message: (error as Error).message + }; + } + } +} +``` + +## Rollback Management + +```typescript +class RollbackManager { + async initiateRollback( + jobId: UUID, + reason: string + ): Promise { + const failedJob = await this.jobRepository.get(jobId); + const previousJob = await this.findPreviousSuccessfulJob( + failedJob.environmentId, + failedJob.releaseId + ); + + if (!previousJob) { + throw new NoRollbackTargetError(jobId); + } + + // Create rollback job + const rollbackJob: DeploymentJob = { + id: uuidv4(), + promotionId: failedJob.promotionId, + releaseId: previousJob.releaseId, // Previous release + environmentId: failedJob.environmentId, + strategy: "all-at-once", // Fast rollback + parallelism: 10, + status: "pending", + rollbackOf: jobId, + previousJobId: previousJob.id, + artifacts: [], + tasks: [] + }; + + // Create tasks to restore previous state + for (const task of failedJob.tasks) { + const previousTask = previousJob.tasks.find( + t => t.targetId === task.targetId + ); + + if (previousTask) { + rollbackJob.tasks.push({ + id: uuidv4(), + jobId: rollbackJob.id, + targetId: task.targetId, + componentId: previousTask.componentId, + digest: previousTask.previousDigest || task.previousDigest!, + status: "pending", + logs: "", + attemptNumber: 0, + maxAttempts: 3 + }); + } + } + + await this.jobRepository.save(rollbackJob); + + // Execute rollback + await this.executeJob(rollbackJob); + + return rollbackJob; + } + + private async findPreviousSuccessfulJob( + environmentId: UUID, + excludeReleaseId: UUID + ): Promise { + return this.jobRepository.findOne({ + environmentId, + status: "completed", + releaseId: { $ne: excludeReleaseId } + }, { + orderBy: { completedAt: "desc" } + }); + } +} +``` + +## References + +- [Deployment Strategies](strategies.md) +- [Agent-Based Deployment](agent-based.md) +- [Agentless Deployment](agentless.md) 
+- [Generated Artifacts](artifacts.md) +- [Deploy Orchestrator Module](../modules/deploy-orchestrator.md) diff --git a/docs/modules/release-orchestrator/deployment/strategies.md b/docs/modules/release-orchestrator/deployment/strategies.md new file mode 100644 index 000000000..f787dfc08 --- /dev/null +++ b/docs/modules/release-orchestrator/deployment/strategies.md @@ -0,0 +1,656 @@ +# Deployment Strategies + +## Overview + +Release Orchestrator supports multiple deployment strategies to balance deployment speed, risk, and availability requirements. + +## Strategy Comparison + +| Strategy | Description | Risk Level | Downtime | Rollback Speed | +|----------|-------------|------------|----------|----------------| +| All-at-once | Deploy to all targets simultaneously | High | Brief | Fast | +| Rolling | Deploy to targets in batches | Medium | None | Medium | +| Canary | Deploy to subset, then expand | Low | None | Fast | +| Blue-Green | Deploy to parallel environment | Low | None | Instant | + +## All-at-Once Strategy + +### Description + +Deploys to all targets simultaneously. Simple and fast, but highest risk. 
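When an all-at-once deployment fails partway, the configured failure behavior decides what happens next. A minimal sketch of those semantics (names are illustrative, not the shipped API):

```typescript
type FailureBehavior = "rollback" | "continue" | "pause";
type JobOutcome = "completed" | "failed" | "rolled_back";

// Map per-task results plus the configured failure behavior to a job outcome:
// "rollback" undoes every target, "pause" stops and surfaces the error,
// "continue" proceeds despite partial failure.
function resolveOutcome(taskSucceeded: boolean[], behavior: FailureBehavior): JobOutcome {
  const failures = taskSucceeded.filter(ok => !ok).length;
  if (failures === 0) return "completed";
  switch (behavior) {
    case "rollback": return "rolled_back";
    case "pause":    return "failed";
    case "continue": return "completed";
  }
}

// Two of three targets succeed; the behavior decides the verdict.
resolveOutcome([true, false, true], "rollback"); // "rolled_back"
```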
+ +``` + ALL-AT-ONCE DEPLOYMENT + + Time 0 Time 1 + ┌─────────────────┐ ┌─────────────────┐ + │ Target 1 [v1] │ │ Target 1 [v2] │ + ├─────────────────┤ ├─────────────────┤ + │ Target 2 [v1] │ ───► │ Target 2 [v2] │ + ├─────────────────┤ ├─────────────────┤ + │ Target 3 [v1] │ │ Target 3 [v2] │ + └─────────────────┘ └─────────────────┘ +``` + +### Configuration + +```typescript +interface AllAtOnceConfig { + strategy: "all-at-once"; + + // Concurrency limit (0 = unlimited) + maxConcurrent: number; + + // Health check after deployment + healthCheck: HealthCheckConfig; + + // Failure behavior + failureBehavior: "rollback" | "continue" | "pause"; +} + +// Example +const config: AllAtOnceConfig = { + strategy: "all-at-once", + maxConcurrent: 0, + healthCheck: { + type: "http", + path: "/health", + timeout: 30, + retries: 3, + interval: 10 + }, + failureBehavior: "rollback" +}; +``` + +### Execution + +```typescript +class AllAtOnceExecutor { + async execute(job: DeploymentJob, config: AllAtOnceConfig): Promise { + const tasks = job.tasks; + const concurrency = config.maxConcurrent || tasks.length; + + // Execute all tasks with concurrency limit + const results = await pMap( + tasks, + async (task) => { + try { + await this.executeTask(task); + return { taskId: task.id, success: true }; + } catch (error) { + return { taskId: task.id, success: false, error }; + } + }, + { concurrency } + ); + + // Check for failures + const failures = results.filter(r => !r.success); + + if (failures.length > 0) { + if (config.failureBehavior === "rollback") { + await this.rollbackAll(job); + throw new DeploymentFailedError(failures); + } else if (config.failureBehavior === "pause") { + job.status = "failed"; + throw new DeploymentFailedError(failures); + } + // "continue" - proceed despite failures + } + + // Health check all targets + await this.verifyAllTargets(job, config.healthCheck); + } +} +``` + +### Use Cases + +- Development environments +- Small deployments +- Time-critical 
updates +- Stateless services with fast startup + +## Rolling Strategy + +### Description + +Deploys to targets in configurable batches, maintaining availability throughout. + +``` + ROLLING DEPLOYMENT (batch size: 1) + + Time 0 Time 1 Time 2 Time 3 + ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ + │ T1 [v1] │ │ T1 [v2] ✓ │ │ T1 [v2] ✓ │ │ T1 [v2] ✓ │ + ├─────────────┤ ├─────────────┤ ├─────────────┤ ├─────────────┤ + │ T2 [v1] │──►│ T2 [v1] │──►│ T2 [v2] ✓ │──►│ T2 [v2] ✓ │ + ├─────────────┤ ├─────────────┤ ├─────────────┤ ├─────────────┤ + │ T3 [v1] │ │ T3 [v1] │ │ T3 [v1] │ │ T3 [v2] ✓ │ + └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ +``` + +### Configuration + +```typescript +interface RollingConfig { + strategy: "rolling"; + + // Batch configuration + batchSize: number; // Targets per batch + batchPercent?: number; // Alternative: percentage of targets + + // Timing + batchDelay: number; // Seconds between batches + stabilizationTime: number; // Wait after health check passes + + // Health check + healthCheck: HealthCheckConfig; + + // Failure handling + maxFailedBatches: number; // Failures before stopping + failureBehavior: "rollback" | "pause" | "skip"; + + // Ordering + targetOrder: "default" | "shuffle" | "priority"; +} + +// Example +const config: RollingConfig = { + strategy: "rolling", + batchSize: 2, + batchDelay: 30, + stabilizationTime: 60, + healthCheck: { + type: "http", + path: "/health", + timeout: 30, + retries: 5, + interval: 10 + }, + maxFailedBatches: 1, + failureBehavior: "rollback", + targetOrder: "default" +}; +``` + +### Execution + +```typescript +class RollingExecutor { + async execute(job: DeploymentJob, config: RollingConfig): Promise { + const tasks = this.orderTasks(job.tasks, config.targetOrder); + const batches = this.createBatches(tasks, config); + let failedBatches = 0; + const completedTasks: DeploymentTask[] = []; + + for (const batch of batches) { + this.emitProgress(job, { + phase: 
"deploying", + currentBatch: batches.indexOf(batch) + 1, + totalBatches: batches.length, + completedTargets: completedTasks.length, + totalTargets: tasks.length + }); + + // Execute batch + const results = await Promise.all( + batch.map(task => this.executeTask(task)) + ); + + // Check batch results + const failures = results.filter(r => !r.success); + + if (failures.length > 0) { + failedBatches++; + + if (failedBatches > config.maxFailedBatches) { + if (config.failureBehavior === "rollback") { + await this.rollbackCompleted(completedTasks); + } + throw new DeploymentFailedError(failures); + } + + if (config.failureBehavior === "pause") { + job.status = "failed"; + throw new DeploymentFailedError(failures); + } + // "skip" - continue to next batch + } + + // Health check batch targets + await this.verifyBatch(batch, config.healthCheck); + + // Wait for stabilization + if (config.stabilizationTime > 0) { + await sleep(config.stabilizationTime * 1000); + } + + completedTasks.push(...batch); + + // Wait before next batch + if (batches.indexOf(batch) < batches.length - 1) { + await sleep(config.batchDelay * 1000); + } + } + } + + private createBatches( + tasks: DeploymentTask[], + config: RollingConfig + ): DeploymentTask[][] { + const batchSize = config.batchPercent + ? Math.ceil(tasks.length * config.batchPercent / 100) + : config.batchSize; + + const batches: DeploymentTask[][] = []; + for (let i = 0; i < tasks.length; i += batchSize) { + batches.push(tasks.slice(i, i + batchSize)); + } + + return batches; + } +} +``` + +### Use Cases + +- Production deployments +- High-availability requirements +- Large target counts +- Services requiring gradual rollout + +## Canary Strategy + +### Description + +Deploys to a small subset of targets first, validates, then expands to remaining targets. 
+ +``` + CANARY DEPLOYMENT + + Phase 1: Canary (10%) Phase 2: Expand (50%) Phase 3: Full (100%) + + ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ + │ T1 [v2] ✓ │ ◄─canary │ T1 [v2] ✓ │ │ T1 [v2] ✓ │ + ├─────────────┤ ├─────────────┤ ├─────────────┤ + │ T2 [v1] │ │ T2 [v2] ✓ │ │ T2 [v2] ✓ │ + ├─────────────┤ ├─────────────┤ ├─────────────┤ + │ T3 [v1] │ │ T3 [v2] ✓ │ │ T3 [v2] ✓ │ + ├─────────────┤ ├─────────────┤ ├─────────────┤ + │ T4 [v1] │ │ T4 [v2] ✓ │ │ T4 [v2] ✓ │ + ├─────────────┤ ├─────────────┤ ├─────────────┤ + │ T5 [v1] │ │ T5 [v1] │ │ T5 [v2] ✓ │ + └─────────────┘ └─────────────┘ └─────────────┘ + + │ │ │ + ▼ ▼ ▼ + Health Check Health Check Health Check + Error Rate Check Error Rate Check Error Rate Check +``` + +### Configuration + +```typescript +interface CanaryConfig { + strategy: "canary"; + + // Canary stages + stages: CanaryStage[]; + + // Canary selection + canarySelector: "random" | "labeled" | "first"; + canaryLabel?: string; // Label for canary targets + + // Automatic vs manual progression + autoProgress: boolean; + + // Health and metrics checks + healthCheck: HealthCheckConfig; + metricsCheck?: MetricsCheckConfig; +} + +interface CanaryStage { + name: string; + percentage: number; // Target percentage + duration: number; // Minimum time at this stage (seconds) + autoProgress: boolean; // Auto-advance after duration +} + +interface MetricsCheckConfig { + integrationId: UUID; // Metrics integration + queries: MetricQuery[]; + failureThreshold: number; // Percentage deviation to fail +} + +interface MetricQuery { + name: string; + query: string; // PromQL or similar + operator: "lt" | "gt" | "eq"; + threshold: number; +} + +// Example +const config: CanaryConfig = { + strategy: "canary", + stages: [ + { name: "canary", percentage: 10, duration: 300, autoProgress: false }, + { name: "expand", percentage: 50, duration: 300, autoProgress: true }, + { name: "full", percentage: 100, duration: 0, autoProgress: true } + ], + canarySelector: 
"labeled", + canaryLabel: "canary=true", + autoProgress: false, + healthCheck: { + type: "http", + path: "/health", + timeout: 30, + retries: 5, + interval: 10 + }, + metricsCheck: { + integrationId: "prometheus-uuid", + queries: [ + { + name: "error_rate", + query: "rate(http_requests_total{status=~\"5..\"}[5m]) / rate(http_requests_total[5m])", + operator: "lt", + threshold: 0.01 // Less than 1% error rate + } + ], + failureThreshold: 10 + } +}; +``` + +### Execution + +```typescript +class CanaryExecutor { + async execute(job: DeploymentJob, config: CanaryConfig): Promise { + const tasks = this.orderTasks(job.tasks, config); + + for (const stage of config.stages) { + const targetCount = Math.ceil(tasks.length * stage.percentage / 100); + const stageTasks = tasks.slice(0, targetCount); + const newTasks = stageTasks.filter(t => t.status === "pending"); + + this.emitProgress(job, { + phase: "canary", + stage: stage.name, + percentage: stage.percentage, + targets: stageTasks.length + }); + + // Deploy to new targets in this stage + await Promise.all(newTasks.map(task => this.executeTask(task))); + + // Health check stage targets + await this.verifyTargets(stageTasks, config.healthCheck); + + // Metrics check if configured + if (config.metricsCheck) { + await this.checkMetrics(stageTasks, config.metricsCheck); + } + + // Wait for stage duration + if (stage.duration > 0) { + await this.waitWithMonitoring( + stageTasks, + stage.duration, + config.metricsCheck + ); + } + + // Wait for manual approval if not auto-progress + if (!stage.autoProgress && stage.percentage < 100) { + await this.waitForApproval(job, stage.name); + } + } + } + + private async checkMetrics( + targets: DeploymentTask[], + config: MetricsCheckConfig + ): Promise { + const metricsClient = await this.getMetricsClient(config.integrationId); + + for (const query of config.queries) { + const result = await metricsClient.query(query.query); + + const passed = this.evaluateMetric(result, query); + + if 
(!passed) { + throw new CanaryMetricsFailedError(query.name, result, query.threshold); + } + } + } +} +``` + +### Use Cases + +- Risk-sensitive deployments +- Services with real user traffic +- Deployments with metrics-based validation +- Gradual feature rollouts + +## Blue-Green Strategy + +### Description + +Deploys to a parallel "green" environment while "blue" continues serving traffic, then switches. + +``` + BLUE-GREEN DEPLOYMENT + + Phase 1: Deploy Green Phase 2: Switch Traffic + + ┌─────────────────────────┐ ┌─────────────────────────┐ + │ Load Balancer │ │ Load Balancer │ + │ │ │ │ │ │ + │ ▼ │ │ ▼ │ + │ ┌─────────────┐ │ │ ┌─────────────┐ │ + │ │ Blue [v1] │◄─active│ │ │ Blue [v1] │ │ + │ │ T1, T2, T3 │ │ │ │ T1, T2, T3 │ │ + │ └─────────────┘ │ │ └─────────────┘ │ + │ │ │ │ + │ ┌─────────────┐ │ │ ┌─────────────┐ │ + │ │ Green [v2] │◄─deploy│ │ │ Green [v2] │◄─active│ + │ │ T4, T5, T6 │ │ │ │ T4, T5, T6 │ │ + │ └─────────────┘ │ │ └─────────────┘ │ + │ │ │ │ + └─────────────────────────┘ └─────────────────────────┘ +``` + +### Configuration + +```typescript +interface BlueGreenConfig { + strategy: "blue-green"; + + // Environment labels + blueLabel: string; // Label for blue targets + greenLabel: string; // Label for green targets + + // Traffic routing + routerIntegration: UUID; // Router/LB integration + routingConfig: RoutingConfig; + + // Validation + healthCheck: HealthCheckConfig; + warmupTime: number; // Seconds to warm up green + validationTests?: string[]; // Test suites to run + + // Switchover + switchoverMode: "instant" | "gradual"; + gradualSteps?: number[]; // Percentage steps for gradual + + // Rollback + keepBlueActive: number; // Seconds to keep blue ready +} + +// Example +const config: BlueGreenConfig = { + strategy: "blue-green", + blueLabel: "deployment=blue", + greenLabel: "deployment=green", + routerIntegration: "nginx-lb-uuid", + routingConfig: { + upstreamName: "myapp", + healthEndpoint: "/health" + }, + healthCheck: { + type: 
"http", + path: "/health", + timeout: 30, + retries: 5, + interval: 10 + }, + warmupTime: 60, + validationTests: ["smoke-test-suite"], + switchoverMode: "instant", + keepBlueActive: 1800 // 30 minutes +}; +``` + +### Execution + +```typescript +class BlueGreenExecutor { + async execute(job: DeploymentJob, config: BlueGreenConfig): Promise { + // Identify blue and green targets + const { blue, green } = this.categorizeTargets(job.tasks, config); + + // Phase 1: Deploy to green + this.emitProgress(job, { phase: "deploying-green" }); + + await Promise.all(green.map(task => this.executeTask(task))); + + // Health check green targets + await this.verifyTargets(green, config.healthCheck); + + // Warmup period + if (config.warmupTime > 0) { + this.emitProgress(job, { phase: "warming-up" }); + await sleep(config.warmupTime * 1000); + } + + // Run validation tests + if (config.validationTests?.length) { + this.emitProgress(job, { phase: "validating" }); + await this.runValidationTests(green, config.validationTests); + } + + // Phase 2: Switch traffic + this.emitProgress(job, { phase: "switching-traffic" }); + + if (config.switchoverMode === "instant") { + await this.instantSwitchover(config, blue, green); + } else { + await this.gradualSwitchover(config, blue, green); + } + + // Verify traffic routing + await this.verifyRouting(green, config); + + // Schedule blue decommission + if (config.keepBlueActive > 0) { + this.scheduleBlueDecommission(blue, config.keepBlueActive); + } + } + + private async instantSwitchover( + config: BlueGreenConfig, + blue: DeploymentTask[], + green: DeploymentTask[] + ): Promise { + const router = await this.getRouter(config.routerIntegration); + + // Update upstream to green targets + await router.updateUpstream(config.routingConfig.upstreamName, { + servers: green.map(t => ({ + address: t.target.address, + weight: 1 + })) + }); + + // Remove blue from rotation + await router.removeServers( + config.routingConfig.upstreamName, + blue.map(t => 
t.target.address) + ); + } + + private async gradualSwitchover( + config: BlueGreenConfig, + blue: DeploymentTask[], + green: DeploymentTask[] + ): Promise { + const router = await this.getRouter(config.routerIntegration); + const steps = config.gradualSteps || [25, 50, 75, 100]; + + for (const percentage of steps) { + await router.setTrafficSplit(config.routingConfig.upstreamName, { + blue: 100 - percentage, + green: percentage + }); + + // Monitor for errors + await this.monitorTraffic(30); + } + } +} +``` + +### Use Cases + +- Zero-downtime deployments +- Database migration deployments +- High-stakes production updates +- Instant rollback requirements + +## Strategy Selection Guide + +``` + STRATEGY SELECTION + + START + │ + ▼ + ┌────────────────────────┐ + │ Zero downtime needed? │ + └───────────┬────────────┘ + │ + No │ Yes + │ │ │ + ▼ │ ▼ + ┌──────────┐ │ ┌───────────────────┐ + │ All-at- │ │ │ Metrics-based │ + │ once │ │ │ validation needed?│ + └──────────┘ │ └─────────┬─────────┘ + │ │ + │ No │ Yes + │ │ │ │ + │ ▼ │ ▼ + │ ┌──────────┐│ ┌──────────┐ + │ │ Instant ││ │ Canary │ + │ │ rollback? ││ │ │ + │ └────┬─────┘│ └──────────┘ + │ │ │ + │ No │ Yes │ + │ │ │ │ │ + │ ▼ │ ▼ │ + │┌──────┐│┌────┴─────┐ + ││Rolling│││Blue-Green│ + │└──────┘│└──────────┘ + │ │ + └───────┘ +``` + +## References + +- [Deployment Overview](overview.md) +- [Progressive Delivery](../modules/progressive-delivery.md) +- [Rollback Management](overview.md#rollback-management) diff --git a/docs/modules/release-orchestrator/design/decisions.md b/docs/modules/release-orchestrator/design/decisions.md new file mode 100644 index 000000000..5b1f1386e --- /dev/null +++ b/docs/modules/release-orchestrator/design/decisions.md @@ -0,0 +1,249 @@ +# Key Architectural Decisions + +This document records significant architectural decisions and their rationale. 
+ +## ADR-001: Digest-First Release Identity + +**Status:** Accepted + +**Context:** +Container images can be referenced by tags (e.g., `v1.2.3`) or digests (e.g., `sha256:abc123...`). Tags are mutable - the same tag can point to different images over time. + +**Decision:** +All releases are identified by immutable OCI digests, never tags. Tags are accepted as input but immediately resolved to digests at release creation time. + +**Consequences:** +- Releases are immutable and reproducible +- Digest mismatch at pull time indicates tampering (deployment fails) +- Rollback targets specific digest, not "previous tag" +- Requires registry integration for tag resolution +- Users see both tag (friendly) and digest (authoritative) in UI + +--- + +## ADR-002: Evidence for Every Decision + +**Status:** Accepted + +**Context:** +Compliance and audit requirements demand proof of what was deployed, when, by whom, and why. + +**Decision:** +Every promotion and deployment produces a cryptographically signed evidence packet that is immutable and append-only. + +**Consequences:** +- Evidence table has no UPDATE/DELETE permissions +- Evidence enables audit-grade compliance reporting +- Evidence enables deterministic replay (same inputs + policy = same decision) +- Evidence packets are exportable for external audit systems +- Storage requirements increase over time + +--- + +## ADR-003: Plugin Architecture for Integrations + +**Status:** Accepted + +**Context:** +Organizations use diverse toolchains (registries, CI/CD, vaults, notification systems). Hard-coding integrations limits adoption. + +**Decision:** +All integrations are implemented as plugins via a three-surface contract (Manifest, Connector Runtime, Step Provider). Core orchestration is stable and plugin-agnostic. 
+ +**Consequences:** +- Core has no hard-coded vendor integrations +- New integrations can be added without core changes +- Plugin failures cannot crash core (sandbox isolation) +- Plugin interface must be versioned and stable +- Additional complexity in plugin lifecycle management + +--- + +## ADR-004: No Feature Gating + +**Status:** Accepted + +**Context:** +Enterprise software often gates security features behind premium tiers, creating "pay for security" anti-patterns. + +**Decision:** +All plans include all features. Pricing is based only on: +- Number of environments +- New digests analyzed per day +- Fair use on deployments + +**Consequences:** +- No feature flags tied to billing tier +- Transparent pricing without feature fragmentation +- May limit revenue optimization per customer +- Quota enforcement must be clear and user-friendly + +--- + +## ADR-005: Offline-First Operation + +**Status:** Accepted + +**Context:** +Many organizations operate in air-gapped or restricted network environments. Dependency on external services limits adoption. + +**Decision:** +All core operations must work in air-gapped environments. External data is synced via mirror bundles. Plugins may require connectivity; core does not. + +**Consequences:** +- No runtime calls to external APIs for core decisions +- Advisory data synced via offline bundles +- Plugin connectivity requirements are declared in manifest +- Evidence packets exportable for external submission +- Additional complexity in data synchronization + +--- + +## ADR-006: Agent-Based and Agentless Deployment + +**Status:** Accepted + +**Context:** +Some organizations prefer agents for security isolation; others prefer agentless for simplicity. + +**Decision:** +Support both agent-based (persistent daemon on targets) and agentless (SSH/WinRM on demand) deployment models. 
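The unified task model this decision implies can be sketched as a discriminated union over execution models, so callers never branch on how a target is reached (types are illustrative, not the shipped API):

```typescript
// One dispatch description per execution model; the orchestrator switches
// once, and everything upstream treats targets uniformly.
type ExecutionModel =
  | { kind: "agent"; agentId: string }
  | { kind: "ssh"; host: string; port: number }
  | { kind: "winrm"; host: string }
  | { kind: "api"; endpoint: string };

function describeDispatch(model: ExecutionModel): string {
  switch (model.kind) {
    case "agent": return `queue task on agent ${model.agentId}`;
    case "ssh":   return `open SSH session to ${model.host}:${model.port}`;
    case "winrm": return `open WinRM session to ${model.host}`;
    case "api":   return `call platform API at ${model.endpoint}`;
  }
}

describeDispatch({ kind: "ssh", host: "web-01", port: 22 }); // "open SSH session to web-01:22"
```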
+ +**Consequences:** +- Agent provides better performance and reliability +- Agentless reduces infrastructure footprint +- Unified task model abstracts deployment details +- Security model must handle both patterns +- Higher testing matrix + +--- + +## ADR-007: PostgreSQL as Primary Database + +**Status:** Accepted + +**Context:** +Database choice affects scalability, operations, and feature availability. + +**Decision:** +PostgreSQL (16+) as the primary database with: +- Per-module schema isolation +- Row-level security for multi-tenancy +- JSONB for flexible configuration +- Append-only triggers for evidence tables + +**Consequences:** +- Proven scalability and reliability +- Rich feature set (JSONB, RLS, triggers) +- Single database technology to operate +- Requires PostgreSQL expertise +- Schema migrations must be carefully managed + +--- + +## ADR-008: Workflow Engine with DAG Execution + +**Status:** Accepted + +**Context:** +Deployment workflows need conditional logic, parallel execution, error handling, and rollback support. + +**Decision:** +Implement a DAG-based workflow engine where: +- Workflows are templates with nodes (steps) and edges (dependencies) +- Steps execute when all dependencies are satisfied +- Expressions reference previous step outputs +- Built-in support for approval, retry, timeout, and rollback + +**Consequences:** +- Flexible workflow composition +- Visual representation in UI +- Complex error handling scenarios supported +- Learning curve for workflow authors +- Expression engine security considerations + +--- + +## ADR-009: Separation of Duties Enforcement + +**Status:** Accepted + +**Context:** +Compliance requires that the person requesting a change cannot be the same person approving it. + +**Decision:** +Separation of Duties (SoD) is enforced at the approval gateway level, preventing self-approval when SoD is enabled for an environment. 
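The core check is small; a sketch under the assumption that identities are comparable strings (the real gateway would also evaluate roles and quorum, and the signature is illustrative):

```typescript
interface ApprovalAttempt {
  requestedBy: string; // identity that initiated the promotion
  approver: string;    // identity attempting to approve it
  sodEnabled: boolean; // per-environment Separation of Duties flag
}

// Block self-approval whenever SoD is enabled for the target environment.
function canApprove(attempt: ApprovalAttempt): boolean {
  return !(attempt.sodEnabled && attempt.requestedBy === attempt.approver);
}

canApprove({ requestedBy: "alice", approver: "alice", sodEnabled: true }); // false
canApprove({ requestedBy: "alice", approver: "bob", sodEnabled: true });   // true
```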
+ +**Consequences:** +- Prevents single-person deployment to sensitive environments +- Configurable per environment +- May slow down deployments +- Requires minimum team size for SoD-enabled environments + +--- + +## ADR-010: Version Stickers for Drift Detection + +**Status:** Accepted + +**Context:** +Knowing what's actually deployed on targets is essential for audit and troubleshooting. + +**Decision:** +Every deployment writes a `stella.version.json` sticker file on the target containing release ID, digests, deployment timestamp, and deployer identity. + +**Consequences:** +- Enables drift detection (expected vs actual) +- Provides audit trail on target hosts +- Enables accurate "what's deployed where" queries +- Requires file access on targets +- Sticker corruption/deletion must be handled + +--- + +## ADR-011: Security Gate Integration + +**Status:** Accepted + +**Context:** +Security scanning exists as a separate concern; release orchestration should leverage but not duplicate it. + +**Decision:** +Security scanning remains in existing modules (Scanner, VEX). Release orchestration consumes scan results through a security gate that evaluates vulnerability thresholds. + +**Consequences:** +- Clear separation of concerns +- Existing scanning investment preserved +- Gate configuration determines block thresholds +- Requires API integration with scanning modules +- Policy engine evaluates security verdicts + +--- + +## ADR-012: gRPC for Agent Communication + +**Status:** Accepted + +**Context:** +Agent communication requires efficient, bidirectional, and secure data transfer. 
+ +**Decision:** +Use gRPC for agent communication with: +- mTLS for transport security +- Bidirectional streaming for logs and progress +- Protocol buffers for efficient serialization + +**Consequences:** +- Efficient binary protocol +- Strong typing via protobuf +- Built-in streaming support +- Requires gRPC infrastructure +- Firewall considerations for gRPC traffic + +--- + +## References + +- [Design Principles](principles.md) +- [Security Architecture](../security/overview.md) +- [Plugin System](../modules/plugin-system.md) diff --git a/docs/modules/release-orchestrator/design/principles.md b/docs/modules/release-orchestrator/design/principles.md new file mode 100644 index 000000000..61163dccf --- /dev/null +++ b/docs/modules/release-orchestrator/design/principles.md @@ -0,0 +1,221 @@ +# Design Principles & Invariants + +> These principles are **inviolable** and MUST be reflected in all code, UI, documentation, and audit artifacts. + +## Core Principles + +### Principle 1: Release Identity via Digest + +``` +INVARIANT: A release is a set of OCI image digests (component → digest mapping), never tags. +``` + +- Tags are convenience inputs for resolution +- Tags are resolved to digests at release creation time +- All downstream operations (promotion, deployment, rollback) use digests +- Digest mismatch at pull time = deployment failure (tamper detection) + +**Implementation Requirements:** +- Release creation API accepts tags but immediately resolves to digests +- All internal references use `sha256:` prefixed digests +- Agent deployment verifies digest at pull time +- Rollback targets specific digest, not "previous tag" + +### Principle 2: Determinism and Evidence + +``` +INVARIANT: Every deployment/promotion produces an immutable evidence record. 
+``` + +Evidence record contains: +- **Who**: User identity (from Authority) +- **What**: Release bundle (digests), target environment, target hosts +- **Why**: Policy evaluation result, approval records, decision reasons +- **How**: Generated artifacts (compose files, scripts), execution logs +- **When**: Timestamps for request, decision, execution, completion + +Evidence enables: +- Audit-grade compliance reporting +- Deterministic replay (same inputs + policy → same decision) +- "Why blocked?" explainability + +**Implementation Requirements:** +- Evidence is generated synchronously with decision +- Evidence is signed before storage +- Evidence table is append-only (no UPDATE/DELETE) +- Evidence includes hash of all inputs for replay verification + +### Principle 3: Pluggable Everything, Stable Core + +``` +INVARIANT: Integrations are plugins; the core orchestration engine is stable. +``` + +**Plugins contribute:** +- Configuration screens (UI) +- Connector logic (runtime) +- Step node types (workflow) +- Doctor checks (diagnostics) +- Agent types (deployment) + +**Core engine provides:** +- Workflow execution (DAG processing) +- State machine management +- Evidence generation +- Policy evaluation +- Credential brokering + +**Implementation Requirements:** +- Core has no hard-coded integrations +- Plugin interface is versioned and stable +- Plugin failures cannot crash core +- Core provides fallback behavior when plugins unavailable + +### Principle 4: No Feature Gating + +``` +INVARIANT: All plans include all features. 
Limits are only: +- Number of environments +- Number of new digests analyzed per day +- Fair use on deployments +``` + +This prevents: +- "Pay for security" anti-pattern +- Per-project/per-seat billing landmines +- Feature fragmentation across tiers + +**Implementation Requirements:** +- No feature flags tied to billing tier +- Quota enforcement is transparent (clear error messages) +- Usage metrics exposed for customer visibility +- Overage handling is graceful (soft limits with warnings) + +### Principle 5: Offline-First Operation + +``` +INVARIANT: All core operations MUST work in air-gapped environments. +``` + +Implications: +- No runtime calls to external APIs for core decisions +- Vulnerability data synced via mirror bundles +- Plugins may require connectivity; core does not +- Evidence packets exportable for external audit + +**Implementation Requirements:** +- Core decision logic has no external HTTP calls +- All external data is pre-synced and cached +- Plugin connectivity requirements are declared in manifest +- Offline mode is explicit configuration, not degraded fallback + +### Principle 6: Immutable Generated Artifacts + +``` +INVARIANT: Every deployment generates and stores immutable artifacts. 
+``` + +Generated artifacts: +- `compose.stella.lock.yml`: Pinned digests, resolved env refs +- `deploy.stella.script.dll`: Compiled C# script (or hash reference) +- `release.evidence.json`: Decision record +- `stella.version.json`: Version sticker placed on target + +Version sticker enables: +- Drift detection (expected vs actual) +- Audit trail on target host +- Rollback reference + +**Implementation Requirements:** +- Artifacts are content-addressed (hash in filename or metadata) +- Artifacts are stored before deployment execution +- Artifact storage is immutable (no overwrites) +- Version sticker is atomic write on target + +--- + +## Architectural Invariants (Enforced by Design) + +These invariants are enforced through database constraints, code architecture, and operational controls. + +| Invariant | Enforcement Mechanism | +|-----------|----------------------| +| Digests are immutable | Database constraint: digest column is unique, no updates | +| Evidence packets are append-only | Evidence table has no UPDATE/DELETE permissions | +| Secrets never in database | Vault integration; only references stored | +| Plugins cannot bypass policy | Policy evaluation in core, not plugin | +| Multi-tenant isolation | `tenant_id` FK on all tables; row-level security | +| Workflow state is auditable | State transitions logged; no direct state manipulation | +| Approvals are tamper-evident | Approval records are signed and append-only | + +### Database Enforcement + +```sql +-- Example: Evidence table with no UPDATE/DELETE +CREATE TABLE release.evidence_packets ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id), + promotion_id UUID NOT NULL REFERENCES release.promotions(id), + content_hash TEXT NOT NULL, + content JSONB NOT NULL, + signature TEXT NOT NULL, + created_at TIMESTAMPTZ NOT NULL DEFAULT now() + -- No updated_at column; immutable by design +); + +-- Revoke UPDATE/DELETE from application role +REVOKE UPDATE, DELETE 
ON release.evidence_packets FROM app_role;
+```
+
+### Code Architecture Enforcement
+
+```csharp
+// Policy evaluation is ALWAYS in core, never delegated to plugins
+public sealed class PromotionDecisionEngine
+{
+    // Plugins provide gate implementations, but core orchestrates evaluation
+    public async Task<PromotionDecision> EvaluateAsync(
+        Promotion promotion,
+        IReadOnlyList<IPromotionGate> gates,
+        CancellationToken ct)
+    {
+        // Core controls evaluation order and aggregation
+        var results = new List<GateResult>();
+        foreach (var gate in gates)
+        {
+            // Plugin provides evaluation logic
+            var result = await gate.EvaluateAsync(promotion, ct);
+            results.Add(result);
+
+            // Core decides how to aggregate (plugins cannot override)
+            if (result.IsBlocking && _policy.FailFast)
+                break;
+        }
+
+        // Core makes final decision
+        return _decisionAggregator.Aggregate(results);
+    }
+}
+```
+
+---
+
+## Document Conventions
+
+Throughout the Release Orchestrator documentation:
+
+- **MUST**: Mandatory requirement; non-compliance is a bug
+- **SHOULD**: Recommended but not mandatory; deviation requires justification
+- **MAY**: Optional; implementation decision
+- **Entity names**: `PascalCase` (e.g., `ReleaseBundle`)
+- **Table names**: `snake_case` (e.g., `release_bundles`)
+- **API paths**: `/api/v1/resource-name`
+- **Module names**: `kebab-case` (e.g., `release-manager`)
+
+---
+
+## References
+
+- [Key Architectural Decisions](decisions.md)
+- [Module Architecture](../modules/overview.md)
+- [Security Architecture](../security/overview.md)
diff --git a/docs/modules/release-orchestrator/implementation-guide.md b/docs/modules/release-orchestrator/implementation-guide.md
new file mode 100644
index 000000000..f9b806b32
--- /dev/null
+++ b/docs/modules/release-orchestrator/implementation-guide.md
@@ -0,0 +1,602 @@
+# Implementation Guide
+
+> .NET 10 implementation patterns and best practices for Release Orchestrator modules.
+
+**Target Audience**: Development team implementing Release Orchestrator modules
+**Prerequisites**: Familiarity with [CLAUDE.md](../../../CLAUDE.md) coding rules
+
+---
+
+## Overview
+
+This guide supplements the architecture documentation with .NET 10-specific implementation patterns required for all Release Orchestrator modules. These patterns ensure:
+
+- Deterministic behavior for evidence reproducibility
+- Testability through dependency injection
+- Compliance with Stella Ops coding standards
+- Performance and reliability
+
+---
+
+## Code Quality Requirements
+
+### Compiler Configuration
+
+All Release Orchestrator projects **MUST** enforce warnings as errors:
+
+```xml
+<PropertyGroup>
+  <TreatWarningsAsErrors>true</TreatWarningsAsErrors>
+  <Nullable>enable</Nullable>
+  <ImplicitUsings>disable</ImplicitUsings>
+</PropertyGroup>
+```
+
+**Rationale**: Warnings indicate potential bugs, regressions, or code quality drift. Treating them as errors prevents them from being ignored.
+
+---
+
+## Determinism & Time Handling
+
+### TimeProvider Injection
+
+**Never** use `DateTime.UtcNow`, `DateTimeOffset.UtcNow`, or `DateTimeOffset.Now` directly. Always inject `TimeProvider`.
+
+```csharp
+// ❌ BAD - non-deterministic, hard to test
+public class PromotionManager
+{
+    public Promotion CreatePromotion(Guid releaseId, Guid targetEnvId)
+    {
+        return new Promotion
+        {
+            Id = Guid.NewGuid(),
+            ReleaseId = releaseId,
+            TargetEnvironmentId = targetEnvId,
+            RequestedAt = DateTimeOffset.UtcNow // ❌ Hard-coded time
+        };
+    }
+}
+
+// ✅ GOOD - injectable, testable, deterministic
+public class PromotionManager
+{
+    private readonly TimeProvider _timeProvider;
+    private readonly IGuidGenerator _guidGenerator;
+
+    public PromotionManager(TimeProvider timeProvider, IGuidGenerator guidGenerator)
+    {
+        _timeProvider = timeProvider;
+        _guidGenerator = guidGenerator;
+    }
+
+    public Promotion CreatePromotion(Guid releaseId, Guid targetEnvId)
+    {
+        return new Promotion
+        {
+            Id = _guidGenerator.NewGuid(),
+            ReleaseId = releaseId,
+            TargetEnvironmentId = targetEnvId,
+            RequestedAt = _timeProvider.GetUtcNow() // ✅ Injected, testable
+        };
+    }
+}
+```
+
+**Registration**:
+```csharp
+// Production: use system time
+services.AddSingleton(TimeProvider.System);
+
+// Testing: use manual time for deterministic tests
+var manualTime = new ManualTimeProvider();
+manualTime.SetUtcNow(new DateTimeOffset(2026, 1, 10, 12, 0, 0, TimeSpan.Zero));
+services.AddSingleton<TimeProvider>(manualTime);
+```
+
+---
+
+### GUID Generation
+
+**Never** use `Guid.NewGuid()` directly. Always inject `IGuidGenerator`.
+
+```csharp
+// ❌ BAD
+var releaseId = Guid.NewGuid();
+
+// ✅ GOOD
+var releaseId = _guidGenerator.NewGuid();
+```
+
+**Interface**:
+```csharp
+public interface IGuidGenerator
+{
+    Guid NewGuid();
+}
+
+// Production implementation
+public sealed class SystemGuidGenerator : IGuidGenerator
+{
+    public Guid NewGuid() => Guid.NewGuid();
+}
+
+// Deterministic test implementation
+public sealed class SequentialGuidGenerator : IGuidGenerator
+{
+    private int _counter;
+
+    public Guid NewGuid()
+    {
+        var bytes = new byte[16];
+        BitConverter.GetBytes(_counter++).CopyTo(bytes, 0);
+        return new Guid(bytes);
+    }
+}
+```
+
+---
+
+## Async & Cancellation
+
+### CancellationToken Propagation
+
+**Always** propagate `CancellationToken` through async call chains. Never use `CancellationToken.None` except at entry points where no token is available.
+
+```csharp
+// ❌ BAD - ignores cancellation
+public async Task<Promotion> ApprovePromotionAsync(Guid promotionId, Guid userId, CancellationToken ct)
+{
+    var promotion = await _repository.GetByIdAsync(promotionId, CancellationToken.None); // ❌ Wrong
+
+    promotion.Approvals.Add(new Approval
+    {
+        ApproverId = userId,
+        ApprovedAt = _timeProvider.GetUtcNow()
+    });
+
+    await _repository.SaveAsync(promotion, CancellationToken.None); // ❌ Wrong
+    await Task.Delay(1000); // ❌ Missing ct
+
+    return promotion;
+}
+
+// ✅ GOOD - propagates cancellation
+public async Task<Promotion> ApprovePromotionAsync(Guid promotionId, Guid userId, CancellationToken ct)
+{
+    var promotion = await _repository.GetByIdAsync(promotionId, ct); // ✅ Propagated
+
+    promotion.Approvals.Add(new Approval
+    {
+        ApproverId = userId,
+        ApprovedAt = _timeProvider.GetUtcNow()
+    });
+
+    await _repository.SaveAsync(promotion, ct); // ✅ Propagated
+    await Task.Delay(1000, ct); // ✅ Cancellable
+
+    return promotion;
+}
+```
+
+---
+
+## HTTP Client Usage
+
+### IHttpClientFactory for Connector Runtime
+
+**Never** instantiate `HttpClient` directly.
Always use `IHttpClientFactory` with configured timeouts and resilience policies.
+
+```csharp
+// ❌ BAD - direct instantiation risks socket exhaustion
+public class GitHubConnector
+{
+    public async Task<string> GetCommitAsync(string sha)
+    {
+        using var client = new HttpClient(); // ❌ Socket exhaustion risk
+        var response = await client.GetAsync($"https://api.github.com/commits/{sha}");
+        return await response.Content.ReadAsStringAsync();
+    }
+}
+
+// ✅ GOOD - factory with resilience
+public class GitHubConnector
+{
+    private readonly IHttpClientFactory _httpClientFactory;
+
+    public GitHubConnector(IHttpClientFactory httpClientFactory)
+    {
+        _httpClientFactory = httpClientFactory;
+    }
+
+    public async Task<string> GetCommitAsync(string sha, CancellationToken ct)
+    {
+        var client = _httpClientFactory.CreateClient("GitHub");
+        var response = await client.GetAsync($"/commits/{sha}", ct);
+        response.EnsureSuccessStatusCode();
+        return await response.Content.ReadAsStringAsync(ct);
+    }
+}
+```
+
+**Registration with resilience**:
+```csharp
+services.AddHttpClient("GitHub", client =>
+{
+    client.BaseAddress = new Uri("https://api.github.com");
+    client.Timeout = TimeSpan.FromSeconds(30);
+    client.DefaultRequestHeaders.Add("User-Agent", "StellaOps/1.0");
+})
+.AddStandardResilienceHandler(options =>
+{
+    options.Retry.MaxRetryAttempts = 3;
+    options.CircuitBreaker.SamplingDuration = TimeSpan.FromSeconds(30);
+    options.TotalRequestTimeout.Timeout = TimeSpan.FromMinutes(1);
+});
+```
+
+---
+
+## Culture & Formatting
+
+### Invariant Culture for Parsing
+
+**Always** use `CultureInfo.InvariantCulture` for parsing and formatting dates, numbers, and any string that will be persisted, hashed, or compared.
+ +```csharp +// ❌ BAD - culture-sensitive +var percentage = double.Parse(input); +var formatted = value.ToString("P2"); +var dateStr = date.ToString("yyyy-MM-dd"); + +// ✅ GOOD - invariant culture +var percentage = double.Parse(input, CultureInfo.InvariantCulture); +var formatted = value.ToString("P2", CultureInfo.InvariantCulture); +var dateStr = date.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture); +``` + +--- + +## JSON Handling + +### RFC 8785 Canonical JSON for Evidence + +For evidence packets and decision records that will be hashed or signed, use **RFC 8785-compliant** canonical JSON serialization. + +```csharp +// ❌ BAD - non-canonical JSON +var json = JsonSerializer.Serialize(decisionRecord, new JsonSerializerOptions +{ + Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping, + PropertyNamingPolicy = JsonNamingPolicy.CamelCase +}); +var hash = ComputeHash(json); // ❌ Non-deterministic + +// ✅ GOOD - use shared canonicalizer +var canonicalJson = CanonicalJsonSerializer.Serialize(decisionRecord); +var hash = ComputeHash(canonicalJson); // ✅ Deterministic +``` + +**Canonical JSON Requirements**: +- Keys sorted alphabetically +- Minimal escaping per RFC 8785 spec +- No exponent notation for numbers +- No trailing/leading zeros +- No whitespace + +--- + +## Database Interaction + +### DateTimeOffset for PostgreSQL timestamptz + +PostgreSQL `timestamptz` columns **MUST** be read and written as `DateTimeOffset`, not `DateTime`. 
+
+```csharp
+// ❌ BAD - loses offset information
+await using var reader = await command.ExecuteReaderAsync(ct);
+while (await reader.ReadAsync(ct))
+{
+    var createdAt = reader.GetDateTime(reader.GetOrdinal("created_at")); // ❌ Loses offset
+}
+
+// ✅ GOOD - preserves offset
+await using var reader = await command.ExecuteReaderAsync(ct);
+while (await reader.ReadAsync(ct))
+{
+    var createdAt = reader.GetFieldValue<DateTimeOffset>(reader.GetOrdinal("created_at")); // ✅ Correct
+}
+```
+
+**Insertion**:
+```csharp
+// ✅ Always use UTC DateTimeOffset
+var createdAt = _timeProvider.GetUtcNow(); // Returns DateTimeOffset
+await command.ExecuteNonQueryAsync(ct);
+```
+
+---
+
+## Hybrid Logical Clock (HLC) for Distributed Ordering
+
+For distributed ordering and audit-safe sequencing, use `IHybridLogicalClock` from `StellaOps.HybridLogicalClock`.
+
+**When to use HLC**:
+- Promotion state transitions
+- Workflow step execution ordering
+- Deployment task sequencing
+- Timeline event ordering
+
+```csharp
+public class PromotionStateTransition
+{
+    private readonly IHybridLogicalClock _hlc;
+    private readonly TimeProvider _timeProvider;
+
+    public async Task TransitionStateAsync(
+        Promotion promotion,
+        PromotionState newState,
+        CancellationToken ct)
+    {
+        var transition = new StateTransition
+        {
+            PromotionId = promotion.Id,
+            FromState = promotion.Status,
+            ToState = newState,
+            THlc = _hlc.Tick(), // ✅ Monotonic, skew-tolerant ordering
+            TsWall = _timeProvider.GetUtcNow(), // ✅ Informational timestamp
+            TransitionedBy = _currentUser.Id
+        };
+
+        await _repository.RecordTransitionAsync(transition, ct);
+    }
+}
+```
+
+**HLC State Persistence**:
+```csharp
+// Service startup
+public async Task StartAsync(CancellationToken ct)
+{
+    await _hlc.InitializeFromStateAsync(ct); // Restore monotonicity
+}
+
+// Service shutdown
+public async Task StopAsync(CancellationToken ct)
+{
+    await _hlc.PersistStateAsync(ct); // Persist HLC state
+}
+```
+
+---
+
+## Configuration & Options
+
+### 
Options Validation at Startup
+
+Use `ValidateDataAnnotations()` and `ValidateOnStart()` for all options classes.
+
+```csharp
+// Options class
+public sealed class PromotionManagerOptions
+{
+    [Required]
+    [Range(1, 10)]
+    public int MaxConcurrentPromotions { get; set; } = 3;
+
+    [Required]
+    [Range(1, 3600)]
+    public int ApprovalExpirationSeconds { get; set; } = 1440;
+}
+
+// Registration with validation
+services.AddOptions<PromotionManagerOptions>()
+    .Bind(configuration.GetSection("PromotionManager"))
+    .ValidateDataAnnotations()
+    .ValidateOnStart();
+
+// Complex validation
+public class PromotionManagerOptionsValidator : IValidateOptions<PromotionManagerOptions>
+{
+    public ValidateOptionsResult Validate(string? name, PromotionManagerOptions options)
+    {
+        if (options.MaxConcurrentPromotions <= 0)
+            return ValidateOptionsResult.Fail("MaxConcurrentPromotions must be positive");
+
+        return ValidateOptionsResult.Success;
+    }
+}
+
+services.AddSingleton<IValidateOptions<PromotionManagerOptions>, PromotionManagerOptionsValidator>();
+```
+
+---
+
+## Immutability & Collections
+
+### Return Immutable Collections from Public APIs
+
+Public APIs **MUST** return `IReadOnlyList<T>`, `ImmutableArray<T>`, or defensive copies. Never expose mutable backing stores.
+
+```csharp
+// ❌ BAD - exposes mutable backing store
+public class ReleaseManager
+{
+    private readonly List<Component> _components = new();
+
+    public List<Component> Components => _components; // ❌ Callers can mutate!
+}
+
+// ✅ GOOD - immutable return
+public class ReleaseManager
+{
+    private readonly List<Component> _components = new();
+
+    public IReadOnlyList<Component> Components => _components.AsReadOnly(); // ✅ Read-only
+
+    // Or using ImmutableArray
+    public ImmutableArray<Component> GetComponents() => _components.ToImmutableArray();
+}
+```
+
+---
+
+## Error Handling
+
+### No Silent Stubs
+
+Placeholder code **MUST** throw `NotImplementedException` or return an explicit error. Never return success from unimplemented paths.
+
+```csharp
+// ❌ BAD - silent stub masks missing implementation
+public async Task<Result> DeployToNomadAsync(Deployment deployment, CancellationToken ct)
+{
+    // TODO: implement Nomad deployment
+    return Result.Success(); // ❌ Ships broken feature!
+}
+
+// ✅ GOOD - explicit failure
+public async Task<Result> DeployToNomadAsync(Deployment deployment, CancellationToken ct)
+{
+    throw new NotImplementedException(
+        "Nomad deployment not yet implemented. See SPRINT_20260115_003_AGENTS_nomad_support.md");
+}
+
+// ✅ Alternative: return unsupported result
+public async Task<Result> DeployToNomadAsync(Deployment deployment, CancellationToken ct)
+{
+    return Result.Failure("Nomad deployment target not yet supported. Use Docker or Compose.");
+}
+```
+
+---
+
+## Caching
+
+### Bounded Caches with Eviction
+
+**Do not** use `ConcurrentDictionary` or `Dictionary` for caching without eviction policies. Use bounded caches with TTL/LRU eviction.
+
+```csharp
+// ❌ BAD - unbounded growth
+public class VersionMapCache
+{
+    private readonly ConcurrentDictionary<string, DigestMapping> _cache = new();
+
+    public void Add(string tag, DigestMapping mapping)
+    {
+        _cache[tag] = mapping; // ❌ Never evicts, memory grows forever
+    }
+}
+
+// ✅ GOOD - bounded with eviction
+public class VersionMapCache
+{
+    private readonly MemoryCache _cache;
+
+    public VersionMapCache()
+    {
+        _cache = new MemoryCache(new MemoryCacheOptions
+        {
+            SizeLimit = 10_000 // Max 10k entries
+        });
+    }
+
+    public void Add(string tag, DigestMapping mapping)
+    {
+        _cache.Set(tag, mapping, new MemoryCacheEntryOptions
+        {
+            Size = 1,
+            SlidingExpiration = TimeSpan.FromHours(1) // ✅ 1 hour TTL
+        });
+    }
+
+    public DigestMapping?
Get(string tag) => _cache.Get<DigestMapping>(tag);
+}
+```
+
+**Cache TTL Recommendations**:
+- **Integration health checks**: 5 minutes
+- **Version maps (tag → digest)**: 1 hour
+- **Environment configs**: 30 minutes
+- **Agent capabilities**: 10 minutes
+
+---
+
+## Testing
+
+### Test Helpers Must Call Production Code
+
+Test helpers **MUST** call production code, not reimplement algorithms. Only mock I/O and network boundaries.
+
+```csharp
+// ❌ BAD - test reimplements production logic
+public static string ComputeEvidenceHash(DecisionRecord record)
+{
+    // Custom hash implementation in test
+    var json = JsonSerializer.Serialize(record); // ❌ Different from production!
+    return Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(json)));
+}
+
+// ✅ GOOD - test uses production code
+public static string ComputeEvidenceHash(DecisionRecord record)
+{
+    // Calls production EvidenceHasher
+    return EvidenceHasher.ComputeHash(record); // ✅ Same as production
+}
+```
+
+---
+
+## Path Resolution
+
+### Explicit CLI Options for Paths
+
+**Do not** derive paths from `AppContext.BaseDirectory` with parent directory walks. Use explicit CLI options or environment variables.
+
+```csharp
+// ❌ BAD - fragile parent walks
+var repoRoot = Path.GetFullPath(Path.Combine(
+    AppContext.BaseDirectory, "..", "..", "..", ".."));
+
+// ✅ GOOD - explicit option with fallback
+[Option("--repo-root", Description = "Repository root path")]
+public string? RepoRoot { get; set; }
+
+public string GetRepoRoot() =>
+    RepoRoot
+    ?? Environment.GetEnvironmentVariable("STELLAOPS_REPO_ROOT")
+    ?? throw new InvalidOperationException(
+        "Repository root not specified. 
Use --repo-root or set STELLAOPS_REPO_ROOT."); +``` + +--- + +## Summary Checklist + +Before submitting a pull request, verify: + +- [ ] `TreatWarningsAsErrors` enabled in project file +- [ ] All timestamps use `TimeProvider`, never `DateTime.UtcNow` +- [ ] All GUIDs use `IGuidGenerator`, never `Guid.NewGuid()` +- [ ] `CancellationToken` propagated through all async methods +- [ ] HTTP clients use `IHttpClientFactory`, never `new HttpClient()` +- [ ] Culture-invariant parsing for all formatted strings +- [ ] Canonical JSON for evidence/decision records +- [ ] `DateTimeOffset` for all PostgreSQL `timestamptz` columns +- [ ] HLC used for distributed ordering where applicable +- [ ] Options classes validated at startup with `ValidateOnStart()` +- [ ] Public APIs return immutable collections +- [ ] No silent stubs; unimplemented code throws `NotImplementedException` +- [ ] Caches have bounded size and TTL eviction +- [ ] Tests exercise production code, not reimplementations + +--- + +## References + +- [CLAUDE.md](../../../CLAUDE.md) — Stella Ops coding rules +- [Test Structure](./test-structure.md) — Test organization guidelines +- [Database Schema](./data-model/schema.md) — Schema patterns +- [HLC Documentation](../../eventing/event-envelope-schema.md) — Event ordering with HLC diff --git a/docs/modules/release-orchestrator/integrations/ci-cd.md b/docs/modules/release-orchestrator/integrations/ci-cd.md new file mode 100644 index 000000000..7d0b65dbc --- /dev/null +++ b/docs/modules/release-orchestrator/integrations/ci-cd.md @@ -0,0 +1,643 @@ +# CI/CD Integration + +## Overview + +Release Orchestrator integrates with CI/CD systems to: +- Receive build completion notifications +- Trigger additional pipelines during deployment +- Create releases from CI artifacts +- Report deployment status back to CI systems + +## Integration Patterns + +### Pattern 1: CI Triggers Release + +``` + CI TRIGGERS RELEASE + + ┌────────────┐ ┌────────────┐ ┌────────────────────┐ + │ CI/CD │ 
│ Container │ │ Release │ + │ System │ │ Registry │ │ Orchestrator │ + └─────┬──────┘ └─────┬──────┘ └─────────┬──────────┘ + │ │ │ + │ Build & Push │ │ + │─────────────────►│ │ + │ │ │ + │ │ Webhook: image pushed + │ │─────────────────────►│ + │ │ │ + │ │ │ Create/Update + │ │ │ Version Map + │ │ │ + │ │ │ Auto-create + │ │ │ Release (if configured) + │ │ │ + │ API: Create Release (optional) │ + │────────────────────────────────────────►│ + │ │ │ + │ │ │ Start Promotion + │ │ │ Workflow + │ │ │ +``` + +### Pattern 2: Orchestrator Triggers CI + +``` + ORCHESTRATOR TRIGGERS CI + + ┌────────────────────┐ ┌────────────┐ ┌────────────┐ + │ Release │ │ CI/CD │ │ Target │ + │ Orchestrator │ │ System │ │ Systems │ + └─────────┬──────────┘ └─────┬──────┘ └─────┬──────┘ + │ │ │ + │ Pre-deploy: Trigger │ │ + │ Integration Tests │ │ + │─────────────────────►│ │ + │ │ │ + │ │ Run Tests │ + │ │─────────────────►│ + │ │ │ + │ Wait for completion │ │ + │◄─────────────────────│ │ + │ │ │ + │ If passed: Deploy │ │ + │─────────────────────────────────────────► + │ │ │ +``` + +### Pattern 3: Bidirectional Integration + +``` + BIDIRECTIONAL INTEGRATION + + ┌────────────┐ ┌────────────────────┐ + │ CI/CD │◄───────────────────────►│ Release │ + │ System │ │ Orchestrator │ + └─────┬──────┘ └─────────┬──────────┘ + │ │ + │══════════════════════════════════════════│ + │ Events (both directions) │ + │══════════════════════════════════════════│ + │ │ + │ CI Events: │ + │ - Pipeline completed │ + │ - Tests passed/failed │ + │ - Artifacts ready │ + │ │ + │ Orchestrator Events: │ + │ - Deployment started │ + │ - Deployment completed │ + │ - Rollback initiated │ + │ │ +``` + +## CI/CD System Configuration + +### GitLab CI Integration + +```yaml +# .gitlab-ci.yml +stages: + - build + - push + - release + +variables: + STELLA_API_URL: https://stella.example.com/api/v1 + COMPONENT_NAME: myapp + +build: + stage: build + script: + - docker build -t $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA . 
+
+push:
+  stage: push
+  script:
+    - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
+    - docker tag $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA $CI_REGISTRY_IMAGE:$CI_COMMIT_TAG
+    - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_TAG
+    # Capture the pushed digest here, where the Docker CLI is available,
+    # and hand it to the release job through a dotenv artifact.
+    - echo "DIGEST=$(docker inspect --format='{{index .RepoDigests 0}}' $CI_REGISTRY_IMAGE:$CI_COMMIT_TAG | cut -d@ -f2)" >> build.env
+  artifacts:
+    reports:
+      dotenv: build.env
+  rules:
+    - if: $CI_COMMIT_TAG
+
+release:
+  stage: release
+  image: curlimages/curl:latest
+  script:
+    - |
+      # $DIGEST comes from the push job's dotenv artifact; the curl image
+      # has no Docker CLI, so the digest cannot be resolved in this job.
+
+      # Create release in Stella
+      curl -X POST "$STELLA_API_URL/releases" \
+        -H "Authorization: Bearer $STELLA_TOKEN" \
+        -H "Content-Type: application/json" \
+        -d "{
+          \"name\": \"$COMPONENT_NAME-$CI_COMMIT_TAG\",
+          \"components\": [{
+            \"componentId\": \"$STELLA_COMPONENT_ID\",
+            \"digest\": \"$DIGEST\"
+          }],
+          \"sourceRef\": {
+            \"type\": \"git\",
+            \"repository\": \"$CI_PROJECT_URL\",
+            \"commit\": \"$CI_COMMIT_SHA\",
+            \"tag\": \"$CI_COMMIT_TAG\"
+          }
+        }"
+  rules:
+    - if: $CI_COMMIT_TAG
+```
+
+### GitHub Actions Integration
+
+```yaml
+# .github/workflows/release.yml
+name: Release to Stella
+
+on:
+  push:
+    tags:
+      - 'v*'
+
+jobs:
+  build-and-release:
+    runs-on: ubuntu-latest
+
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@v3
+
+      - name: Login to Container Registry
+        uses: docker/login-action@v3
+        with:
+          registry: ghcr.io
+          username: ${{ github.actor }}
+          password: ${{ secrets.GITHUB_TOKEN }}
+
+      - name: Build and push
+        id: build
+        uses: docker/build-push-action@v5
+        with:
+          push: true
+          tags: |
+            ghcr.io/${{ github.repository }}:${{ github.sha }}
+            ghcr.io/${{ github.repository }}:${{ github.ref_name }}
+
+      # Buildx pushes without loading the image into the local daemon, so
+      # `docker inspect` cannot read the digest here; use the digest output
+      # that build-push-action already exposes.
+      - name: Get image digest
+        id: digest
+        run: echo "digest=${{ steps.build.outputs.digest }}" >> "$GITHUB_OUTPUT"
+
+      - name: Create Stella Release
+        uses: stella-ops/create-release-action@v1
+        with:
+          stella-url: ${{ vars.STELLA_API_URL }}
+          stella-token: ${{ secrets.STELLA_TOKEN }}
+          release-name: ${{ github.event.repository.name }}-${{ github.ref_name }}
+          components: |
+            - componentId: ${{ vars.STELLA_COMPONENT_ID }}
+              digest: ${{ steps.digest.outputs.digest }}
+          source-ref: |
+            type: git
+            repository: ${{ github.server_url }}/${{ github.repository }}
+            commit: ${{ github.sha }}
+            tag: ${{ github.ref_name }}
+```
+
+### Jenkins Integration
+
+```groovy
+// Jenkinsfile
+pipeline {
+  agent any
+
+  environment {
+    STELLA_API_URL = 'https://stella.example.com/api/v1'
+    STELLA_TOKEN = credentials('stella-api-token')
+    REGISTRY = 'registry.example.com'
+    IMAGE_NAME = 'myorg/myapp'
+  }
+
+  stages {
+    stage('Build') {
+      steps {
+        script {
+          docker.build("${REGISTRY}/${IMAGE_NAME}:${env.BUILD_TAG}")
+        }
+      }
+    }
+
+    stage('Push') {
+      steps {
+        script {
+          docker.withRegistry("https://${REGISTRY}", 'registry-creds') {
+            docker.image("${REGISTRY}/${IMAGE_NAME}:${env.BUILD_TAG}").push()
+          }
+        }
+      }
+    }
+
+    stage('Create Release') {
+      when {
+        tag pattern: "v\\d+\\.\\d+\\.\\d+", comparator: "REGEXP"
+      }
+      steps {
+        script {
+          // Inspect the tag that was actually built and pushed above;
+          // only :${BUILD_TAG} exists in the local daemon.
+          def digest = sh(
+            script: "docker inspect --format='{{index .RepoDigests 0}}' ${REGISTRY}/${IMAGE_NAME}:${env.BUILD_TAG} | cut -d@ -f2",
+            returnStdout: true
+          ).trim()
+
+          def response = httpRequest(
+            url: "${STELLA_API_URL}/releases",
+            httpMode: 'POST',
+            contentType: 'APPLICATION_JSON',
+            customHeaders: [[name: 'Authorization', value: "Bearer ${STELLA_TOKEN}"]],
+            requestBody: """
+              {
+                "name": "${IMAGE_NAME}-${env.TAG_NAME}",
+                "components": [{
+                  "componentId": "${env.STELLA_COMPONENT_ID}",
+                  "digest": "${digest}"
+                }],
+                "sourceRef": {
+                  "type": "git",
+                  "repository": "${env.GIT_URL}",
+                  "commit": "${env.GIT_COMMIT}",
+                  "tag": "${env.TAG_NAME}"
+                }
+              }
+            """
+          )
+
+          echo "Release created: ${response.content}"
+        }
+      }
+    }
+  }
+
+  post {
+    success {
+      // Notify Stella of the successful build
+      httpRequest(
+        url: "${STELLA_API_URL}/webhooks/ci-status",
+        httpMode: 'POST',
+        contentType: 'APPLICATION_JSON',
+        customHeaders: [[name: 'Authorization', value: "Bearer ${STELLA_TOKEN}"]],
+        requestBody: """
+          {
+            "buildId": "${env.BUILD_ID}",
+            "status": "success",
+            "commit": "${env.GIT_COMMIT}"
+          }
+        """
+      )
+    }
+  }
+}
+```
+
+## Workflow Step Integration
+
+### Trigger CI Pipeline Step
+
+```typescript
+// Step type: trigger-ci
+interface TriggerCIConfig {
+  integrationId: UUID;                 // CI integration reference
+  pipelineId: string;                  // Pipeline to trigger
+  ref?: string;                        // Branch/tag reference
+  variables?: Record<string, string>;
+  waitForCompletion: boolean;
+  timeout?: number;
+}
+
+class TriggerCIStep implements IStepExecutor {
+  async execute(
+    inputs: StepInputs,
+    config: TriggerCIConfig,
+    context: ExecutionContext
+  ): Promise<StepOutputs> {
+    const connector = await this.getConnector(config.integrationId);
+
+    // Trigger pipeline
+    const run = await connector.triggerPipeline(
+      config.pipelineId,
+      {
+        ref: config.ref || context.release?.sourceRef?.tag,
+        variables: {
+          ...config.variables,
+          STELLA_RELEASE_ID: context.release?.id,
+          STELLA_PROMOTION_ID: context.promotion?.id,
+          STELLA_ENVIRONMENT: context.environment?.name
+        }
+      }
+    );
+
+    if (!config.waitForCompletion) {
+      return {
+        pipelineRunId: run.id,
+        status: run.status,
+        webUrl: run.webUrl
+      };
+    }
+
+    // Wait for completion
+    const finalStatus = await this.waitForPipeline(
+      connector,
+      run.id,
+      config.timeout || 3600
+    );
+
+    if (finalStatus.status !== "success") {
+      throw new StepError(
+        `Pipeline failed with status: ${finalStatus.status}`,
+        { pipelineRunId: run.id, status: finalStatus }
+      );
+    }
+
+    return {
+      pipelineRunId: run.id,
+      status: finalStatus.status,
+      webUrl: run.webUrl
+    };
+  }
+
+  private async waitForPipeline(
+    connector: ICICDConnector,
+    runId: string,
+    timeout: number
+  ): Promise<PipelineRun> {
+    const deadline = Date.now() + timeout * 1000;
+
+    while (Date.now() < deadline) {
+      const run = await connector.getPipelineRun(runId);
+
+      if (run.status === "success" || run.status === "failed" ||
+          run.status === "cancelled") {
+        return run;
+      }
+
+      await sleep(10000); // Poll every 10 seconds
+    }
+
+    throw new TimeoutError(`Pipeline did not complete within ${timeout} seconds`);
+  }
+}
+```
+
+### Wait for CI Step
+
+```typescript
+// Step type: wait-ci
+interface WaitCIConfig {
+  integrationId: UUID;
+  runId?: string;          // If known, or from input
+  runIdInput?: string;     // Input name containing run ID
+  timeout: number;
+  failOnError: boolean;
+}
+
+class WaitCIStep implements IStepExecutor {
+  async execute(
+    inputs: StepInputs,
+    config: WaitCIConfig,
+    context: ExecutionContext
+  ): Promise<StepOutputs> {
+    const runId = config.runId || inputs[config.runIdInput!];
+
+    if (!runId) {
+      throw new StepError("Pipeline run ID not provided");
+    }
+
+    const connector = await this.getConnector(config.integrationId);
+
+    const finalStatus = await this.waitForPipeline(
+      connector,
+      runId,
+      config.timeout
+    );
+
+    const success = finalStatus.status === "success";
+
+    if (!success && config.failOnError) {
+      throw new StepError(
+        `Pipeline failed with status: ${finalStatus.status}`,
+        { pipelineRunId: runId, status: finalStatus }
+      );
+    }
+
+    return {
+      status: finalStatus.status,
+      success,
+      pipelineRun: finalStatus
+    };
+  }
+}
+```
+
+## Deployment Status Reporting
+
+### GitHub Deployment Status
+
+```typescript
+class GitHubStatusReporter {
+  async reportDeploymentStart(
+    integration: Integration,
+    deployment: DeploymentContext
+  ): Promise<void> {
+    const client = await this.getClient(integration);
+
+    // Create deployment
+    const { data: ghDeployment } = await client.repos.createDeployment({
+      owner: deployment.repository.owner,
+      repo: deployment.repository.name,
+      ref: deployment.sourceRef.commit,
+      environment: deployment.environment.name,
+      auto_merge: false,
+      required_contexts: [],
+      payload: {
+        stellaReleaseId: deployment.release.id,
+        stellaPromotionId: deployment.promotion.id
+      }
+    });
+
+    // Set status to in_progress
+    await client.repos.createDeploymentStatus({
+      owner: deployment.repository.owner,
+      repo: deployment.repository.name,
+      deployment_id: ghDeployment.id,
+      state: "in_progress",
+      log_url: `${this.stellaUrl}/deployments/${deployment.jobId}`,
+      description: "Deployment in progress"
+    });
+
+    // Store deployment ID for later status update
+    await this.storeMapping(deployment.jobId, ghDeployment.id);
+  }
+
+  async reportDeploymentComplete(
+    integration: Integration,
+    deployment: DeploymentContext,
+    success: boolean
+  ): Promise<void> {
+    const client = await this.getClient(integration);
+    const ghDeploymentId = await this.getMapping(deployment.jobId);
+
+    await client.repos.createDeploymentStatus({
+      owner: deployment.repository.owner,
+      repo: deployment.repository.name,
+      deployment_id: ghDeploymentId,
+      state: success ? "success" : "failure",
+      log_url: `${this.stellaUrl}/deployments/${deployment.jobId}`,
+      environment_url: deployment.environment.url,
+      description: success
+        ? "Deployment completed successfully"
+        : "Deployment failed"
+    });
+  }
+}
+```
+
+### GitLab Pipeline Status
+
+```typescript
+class GitLabStatusReporter {
+  async reportDeploymentStatus(
+    integration: Integration,
+    deployment: DeploymentContext,
+    state: "running" | "success" | "failed" | "canceled"
+  ): Promise<void> {
+    const client = await this.getClient(integration);
+
+    await client.post(
+      `/projects/${integration.config.projectId}/statuses/${deployment.sourceRef.commit}`,
+      {
+        state,
+        ref: deployment.sourceRef.tag || deployment.sourceRef.branch,
+        name: `stella/${deployment.environment.name}`,
+        target_url: `${this.stellaUrl}/deployments/${deployment.jobId}`,
+        description: this.getDescription(state, deployment)
+      }
+    );
+  }
+
+  private getDescription(state: string, deployment: DeploymentContext): string {
+    switch (state) {
+      case "running":
+        return `Deploying to ${deployment.environment.name}`;
+      case "success":
+        return `Deployed to ${deployment.environment.name}`;
+      case "failed":
+        return `Deployment to ${deployment.environment.name} failed`;
+      case "canceled":
+        return `Deployment to ${deployment.environment.name} cancelled`;
+      default:
+        return "";
+    }
+  }
+}
+```
+
+## API for CI Systems
+
+### Create Release from CI
+
+```http
+POST /api/v1/releases
+Authorization: Bearer <token>
+Content-Type: application/json
+
+{
+  "name": "myapp-v1.2.0",
+  "components": [
+    {
+      "componentId": "component-uuid",
+      "digest": "sha256:abc123..."
+    }
+  ],
+  "sourceRef": {
+    "type": "git",
+    "repository": "https://github.com/myorg/myapp",
+    "commit": "abc123def456",
+    "tag": "v1.2.0",
+    "branch": "main"
+  },
+  "metadata": {
+    "buildId": "12345",
+    "buildUrl": "https://ci.example.com/builds/12345",
+    "triggeredBy": "ci-pipeline"
+  }
+}
+```
+
+### Report Build Status
+
+```http
+POST /api/v1/ci-events/build-complete
+Authorization: Bearer <token>
+Content-Type: application/json
+
+{
+  "integrationId": "integration-uuid",
+  "buildId": "12345",
+  "status": "success",
+  "commit": "abc123def456",
+  "artifacts": [
+    {
+      "name": "myapp",
+      "digest": "sha256:abc123...",
+      "repository": "registry.example.com/myorg/myapp"
+    }
+  ],
+  "testResults": {
+    "passed": 150,
+    "failed": 0,
+    "skipped": 5
+  }
+}
+```
+
+## Service Account for CI
+
+### Creating CI Service Account
+
+```http
+POST /api/v1/service-accounts
+Authorization: Bearer <token>
+Content-Type: application/json
+
+{
+  "name": "ci-pipeline",
+  "description": "Service account for CI/CD integration",
+  "roles": ["release-creator"],
+  "permissions": [
+    { "resource": "release", "action": "create" },
+    { "resource": "component", "action": "read" },
+    { "resource": "version-map", "action": "read" }
+  ],
+  "expiresIn": "365d"
+}
+```
+
+Response:
+```json
+{
+  "success": true,
+  "data": {
+    "id": "sa-uuid",
+    "name": "ci-pipeline",
+    "token": "stella_sa_xxxxxxxxxxxxx",
+    "expiresAt": "2027-01-09T00:00:00Z"
+  }
+}
+```
+
+## References
+
+- [Integrations Overview](overview.md)
+- [Connectors](connectors.md)
+- [Webhooks](webhooks.md)
+- [Workflow Templates](../workflow/templates.md)
diff --git a/docs/modules/release-orchestrator/integrations/connectors.md b/docs/modules/release-orchestrator/integrations/connectors.md
new file mode 100644
index 000000000..007cc7505
--- /dev/null
+++ b/docs/modules/release-orchestrator/integrations/connectors.md
@@ -0,0 +1,900 @@
+# Connector Development
+
+## Overview
+
+Connectors are the integration layer between Release Orchestrator and external systems. Each connector implements a standard interface for its integration type.
+
+## Connector Architecture
+
+```
+                             CONNECTOR ARCHITECTURE
+
+    ┌─────────────────────────────────────────────────────────────────────────────┐
+    │                              CONNECTOR RUNTIME                              │
+    │                                                                             │
+    │   ┌─────────────────────────────────────────────────────────────────────┐   │
+    │   │                         CONNECTOR INTERFACE                         │   │
+    │   │                                                                     │   │
+    │   │  ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐     │   │
+    │   │  │ getCapabilities()│ │      ping()      │ │  authenticate()  │     │   │
+    │   │  └──────────────────┘ └──────────────────┘ └──────────────────┘     │   │
+    │   │                                                                     │   │
+    │   │  ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐     │   │
+    │   │  │    discover()    │ │    execute()     │ │   healthCheck()  │     │   │
+    │   │  └──────────────────┘ └──────────────────┘ └──────────────────┘     │   │
+    │   │                                                                     │   │
+    │   └─────────────────────────────────────────────────────────────────────┘   │
+    │                                      │                                      │
+    │                                      ▼                                      │
+    │   ┌─────────────────────────────────────────────────────────────────────┐   │
+    │   │                      CONNECTOR IMPLEMENTATIONS                      │   │
+    │   │                                                                     │   │
+    │   │  ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐    │   │
+    │   │  │  Registry   │ │    CI/CD    │ │ Notification│ │   Secret    │    │   │
+    │   │  │ Connectors  │ │ Connectors  │ │ Connectors  │ │ Connectors  │    │   │
+    │   │  │             │ │             │ │             │ │             │    │   │
+    │   │  │ - Docker    │ │ - GitLab    │ │ - Slack     │ │ - Vault     │    │   │
+    │   │  │ - ECR       │ │ - GitHub    │ │ - Teams     │ │ - AWS SM    │    │   │
+    │   │  │ - ACR       │ │ - Jenkins   │ │ - Email     │ │ - Azure KV  │    │   │
+    │   │  │ - Harbor    │ │ - Azure DO  │ │ - PagerDuty │ │             │    │   │
+    │   │  └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘    │   │
+    │   │                                                                     │   │
+    │   └─────────────────────────────────────────────────────────────────────┘   │
+    │                                                                             │
+    └─────────────────────────────────────────────────────────────────────────────┘
+```
+
+## Base Connector Interface
+
+```typescript
+interface IConnector {
+  // Metadata
+  readonly typeId: string;
+  readonly displayName: string;
+  readonly version: string;
+  readonly capabilities: ConnectorCapabilities;
+
+  // Lifecycle
+  initialize(config: IntegrationConfig): Promise<void>;
+  dispose(): Promise<void>;
+
+  // Health
+  ping(config: IntegrationConfig): Promise<void>;
+  healthCheck(config: IntegrationConfig, creds: Credential): Promise<HealthCheckResult>;
+
+  // Authentication
+  authenticate(config: IntegrationConfig, creds: Credential): Promise<AuthContext>;
+
+  // Discovery (optional)
+  discover?(
+    config: IntegrationConfig,
+    authContext: AuthContext,
+    resourceType: string,
+    filter?: DiscoveryFilter
+  ): Promise<DiscoveredResource[]>;
+}
+
+interface ConnectorCapabilities {
+  discovery: boolean;
+  webhooks: boolean;
+  streaming: boolean;
+  batchOperations: boolean;
+  customActions: string[];
+}
+```
+
+## Registry Connectors
+
+### IRegistryConnector
+
+```typescript
+interface IRegistryConnector extends IConnector {
+  // Repository operations
+  listRepositories(authContext: AuthContext): Promise<Repository[]>;
+
+  // Tag operations
+  listTags(authContext: AuthContext, repository: string): Promise<Tag[]>;
+  getManifest(authContext: AuthContext, repository: string, reference: string): Promise<Manifest>;
+  getDigest(authContext: AuthContext, repository: string, tag: string): Promise<string>;
+
+  // Image operations
+  imageExists(authContext: AuthContext, repository: string, digest: string): Promise<boolean>;
+  getImageMetadata(authContext: AuthContext, repository: string, digest: string): Promise<ImageMetadata>;
+}
+
+interface Repository {
+  name: string;
+  fullName: string;
+  tagCount?: number;
+  lastUpdated?: DateTime;
+}
+
+interface Tag {
+  name: string;
+  digest: string;
+  createdAt?: DateTime;
+  size?: number;
+}
+
+interface ImageMetadata {
+  digest: string;
+  mediaType: string;
+  size: number;
+  architecture: string;
+  os: string;
+  created: DateTime;
+  labels: Record<string, string>;
+  layers: LayerInfo[];
+}
+```
+
+### Docker Registry Connector
+
+```typescript
+class DockerRegistryConnector implements IRegistryConnector {
+  readonly typeId = "docker-registry";
+  readonly displayName = "Docker Registry";
+  readonly version = "1.0.0";
+  readonly capabilities: ConnectorCapabilities = {
+    discovery: true,
+    webhooks: true,
+    streaming: false,
+    batchOperations: false,
+    customActions: []
+  };
+
+  private httpClient: HttpClient;
+
+  async initialize(config: DockerRegistryConfig): Promise<void> {
+    this.httpClient = new HttpClient({
+      baseUrl: config.url,
+      timeout: config.timeout || 30000,
+      insecureSkipVerify: config.insecureSkipVerify
+    });
+  }
+
+  async ping(config: DockerRegistryConfig): Promise<void> {
+    const response = await this.httpClient.get("/v2/");
+    if (response.status !== 200 && response.status !== 401) {
+      throw new Error(`Registry unavailable: ${response.status}`);
+    }
+  }
+
+  async authenticate(
+    config: DockerRegistryConfig,
+    creds: BasicCredential
+  ): Promise<AuthContext> {
+    // Get auth challenge from /v2/
+    const challenge = await this.getAuthChallenge();
+
+    if (challenge.type === "bearer") {
+      // OAuth2 token flow
+      const token = await this.getToken(challenge, creds);
+      return { type: "bearer", token };
+    } else {
+      // Basic auth
+      return {
+        type: "basic",
+        credentials: Buffer.from(`${creds.username}:${creds.password}`).toString("base64")
+      };
+    }
+  }
+
+  async getDigest(
+    authContext: AuthContext,
+    repository: string,
+    tag: string
+  ): Promise<string> {
+    const response = await this.httpClient.head(
+      `/v2/${repository}/manifests/${tag}`,
+      {
+        headers: {
+          ...this.authHeader(authContext),
+          Accept: "application/vnd.docker.distribution.manifest.v2+json"
+        }
+      }
+    );
+
+    const digest = response.headers.get("docker-content-digest");
+    if (!digest) {
+      throw new Error("No digest header in response");
+    }
+
+    return digest;
+  }
+
+  async getImageMetadata(
+    authContext: AuthContext,
+    repository: string,
+    digest: string
+  ): Promise<ImageMetadata> {
+    // Fetch manifest
+    const manifest = await this.getManifest(authContext, repository, digest);
+
+    // Fetch config blob
+    const configDigest = manifest.config.digest;
+    const configResponse = await this.httpClient.get(
+      `/v2/${repository}/blobs/${configDigest}`,
+      { headers: this.authHeader(authContext) }
+    );
+
+    const config = await configResponse.json();
+
+    return {
+      digest,
+      mediaType: manifest.mediaType,
+      size: manifest.config.size,
+      architecture: config.architecture,
+      os: config.os,
+      created: new Date(config.created),
+      labels: config.config?.Labels || {},
+      layers: manifest.layers.map(l => ({
+        digest: l.digest,
+        size: l.size,
+        mediaType: l.mediaType
+      }))
+    };
+  }
+}
+```
+
+### ECR Connector
+
+```typescript
+class ECRConnector implements IRegistryConnector {
+  readonly typeId = "ecr";
+  readonly displayName = "AWS ECR";
+  readonly version = "1.0.0";
+  readonly capabilities: ConnectorCapabilities = {
+    discovery: true,
+    webhooks: false,
+    streaming: false,
+    batchOperations: true,
+    customActions: ["createRepository", "setLifecyclePolicy"]
+  };
+
+  private ecrClient: ECRClient;
+
+  async initialize(config: ECRConfig): Promise<void> {
+    this.ecrClient = new ECRClient({
+      region: config.region,
+      credentials: {
+        accessKeyId: config.accessKeyId,
+        secretAccessKey: config.secretAccessKey
+      }
+    });
+  }
+
+  async authenticate(
+    config: ECRConfig,
+    creds: AWSCredential
+  ): Promise<AuthContext> {
+    const command = new GetAuthorizationTokenCommand({});
+    const response = await this.ecrClient.send(command);
+
+    const authData = response.authorizationData?.[0];
+    if (!authData?.authorizationToken) {
+      throw new Error("Failed to get ECR authorization token");
+    }
+
+    return {
+      type: "bearer",
+      token: authData.authorizationToken,
+      expiresAt: authData.expiresAt
+    };
+  }
+
+  async listRepositories(authContext: AuthContext): Promise<Repository[]> {
+    const repositories: Repository[] = [];
+    let nextToken: string | undefined;
+
+    do {
+      const command = new DescribeRepositoriesCommand({
+        nextToken
+      });
+      const response = await this.ecrClient.send(command);
+
+      for (const repo of response.repositories || []) {
+        repositories.push({
+          name: repo.repositoryName!,
+          fullName: repo.repositoryUri!,
+          lastUpdated: repo.createdAt
+        });
+      }
+
+      nextToken = response.nextToken;
+    } while (nextToken);
+
+    return repositories;
+  }
+}
+```
+
+## CI/CD Connectors
+
+### ICICDConnector
+
+```typescript
+interface ICICDConnector extends IConnector {
+  // Pipeline operations
+  listPipelines(authContext: AuthContext): Promise<Pipeline[]>;
+  getPipeline(authContext: AuthContext, pipelineId: string): Promise<Pipeline>;
+
+  // Trigger operations
+  triggerPipeline(
+    authContext: AuthContext,
+    pipelineId: string,
+    params: TriggerParams
+  ): Promise<PipelineRun>;
+
+  // Run operations
+  getPipelineRun(authContext: AuthContext, runId: string): Promise<PipelineRun>;
+  cancelPipelineRun(authContext: AuthContext, runId: string): Promise<void>;
+  getPipelineRunLogs(authContext: AuthContext, runId: string): Promise<string>;
+}
+
+interface Pipeline {
+  id: string;
+  name: string;
+  ref?: string;
+  webUrl?: string;
+}
+
+interface TriggerParams {
+  ref?: string;                        // Branch/tag
+  variables?: Record<string, string>;
+}
+
+interface PipelineRun {
+  id: string;
+  pipelineId: string;
+  status: PipelineStatus;
+  ref?: string;
+  webUrl?: string;
+  startedAt?: DateTime;
+  finishedAt?: DateTime;
+}
+
+type PipelineStatus =
+  | "pending"
+  | "running"
+  | "success"
+  | "failed"
+  | "cancelled";
+```
+
+### GitLab CI Connector
+
+```typescript
+class GitLabCIConnector implements ICICDConnector {
+  readonly typeId = "gitlab-ci";
+  readonly displayName = "GitLab CI/CD";
+  readonly version = "1.0.0";
+  readonly capabilities: ConnectorCapabilities = {
+    discovery: true,
+    webhooks: true,
+    streaming: false,
+    batchOperations: false,
+    customActions: ["retryPipeline"]
+  };
+
+  private apiClient: GitLabClient;
+
+  async initialize(config: GitLabCIConfig): Promise<void> {
+    this.apiClient = new GitLabClient({
+      baseUrl: config.url,
+      projectId: config.projectId
+    });
+  }
+
+  async authenticate(
+    config: GitLabCIConfig,
+    creds: TokenCredential
+  ): Promise<AuthContext> {
+    // Validate token with user endpoint
+    this.apiClient.setToken(creds.token);
+    await this.apiClient.get("/user");
+
+    return {
+      type: "bearer",
+      token: creds.token
+    };
+  }
+
+  async triggerPipeline(
+    authContext: AuthContext,
+    pipelineId: string,
+    params: TriggerParams
+  ): Promise<PipelineRun> {
+    const response = await this.apiClient.post(
+      `/projects/${this.projectId}/pipeline`,
+      {
+        ref: params.ref || this.defaultBranch,
+        variables: Object.entries(params.variables || {}).map(([key, value]) => ({
+          key,
+          value,
+          variable_type: "env_var"
+        }))
+      },
+      { headers: { Authorization: `Bearer ${authContext.token}` } }
+    );
+
+    return {
+      id: response.id.toString(),
+      pipelineId: pipelineId,
+      status: this.mapStatus(response.status),
+      ref: response.ref,
+      webUrl: response.web_url,
+      startedAt: response.started_at ? new Date(response.started_at) : undefined
+    };
+  }
+
+  async getPipelineRun(
+    authContext: AuthContext,
+    runId: string
+  ): Promise<PipelineRun> {
+    const response = await this.apiClient.get(
+      `/projects/${this.projectId}/pipelines/${runId}`,
+      { headers: { Authorization: `Bearer ${authContext.token}` } }
+    );
+
+    return {
+      id: response.id.toString(),
+      pipelineId: response.id.toString(),
+      status: this.mapStatus(response.status),
+      ref: response.ref,
+      webUrl: response.web_url,
+      startedAt: response.started_at ? new Date(response.started_at) : undefined,
+      finishedAt: response.finished_at ? new Date(response.finished_at) : undefined
+    };
+  }
+
+  private mapStatus(gitlabStatus: string): PipelineStatus {
+    const statusMap: Record<string, PipelineStatus> = {
+      created: "pending",
+      waiting_for_resource: "pending",
+      preparing: "pending",
+      pending: "pending",
+      running: "running",
+      success: "success",
+      failed: "failed",
+      canceled: "cancelled",
+      skipped: "cancelled",
+      manual: "pending"
+    };
+    return statusMap[gitlabStatus] || "pending";
+  }
+}
+```
+
+## Notification Connectors
+
+### INotificationConnector
+
+```typescript
+interface INotificationConnector extends IConnector {
+  // Channel operations
+  listChannels(authContext: AuthContext): Promise<Channel[]>;
+
+  // Send operations
+  sendMessage(
+    authContext: AuthContext,
+    channel: string,
+    message: NotificationMessage
+  ): Promise<SendResult>;
+
+  sendTemplate(
+    authContext: AuthContext,
+    channel: string,
+    templateId: string,
+    data: Record<string, unknown>
+  ): Promise<SendResult>;
+}
+
+interface Channel {
+  id: string;
+  name: string;
+  type: string;
+}
+
+interface NotificationMessage {
+  text: string;
+  title?: string;
+  color?: string;
+  fields?: MessageField[];
+  actions?: MessageAction[];
+}
+
+interface MessageField {
+  name: string;
+  value: string;
+  inline?: boolean;
+}
+
+interface MessageAction {
+  type: "button" | "link";
+  text: string;
+  url?: string;
+  style?: "primary" | "danger" | "default";
+}
+```
+
+### Slack Connector
+
+```typescript
+class SlackConnector implements INotificationConnector {
+  readonly typeId = "slack";
+  readonly displayName = "Slack";
+  readonly version = "1.0.0";
+  readonly capabilities: ConnectorCapabilities = {
+    discovery: true,
+    webhooks: true,
+    streaming: false,
+    batchOperations: false,
+    customActions: ["addReaction", "updateMessage"]
+  };
+
+  private slackClient: WebClient;
+
+  async initialize(config: SlackConfig): Promise<void> {
+    // Client initialized in authenticate
+  }
+
+  async authenticate(
+    config: SlackConfig,
+    creds: TokenCredential
+  ): Promise<AuthContext> {
+    this.slackClient = new WebClient(creds.token);
+
+    // Test authentication
+    const result = await this.slackClient.auth.test();
+    if (!result.ok) {
+      throw new Error("Slack authentication failed");
+    }
+
+    return {
+      type: "bearer",
+      token: creds.token,
+      teamId: result.team_id,
+      userId: result.user_id
+    };
+  }
+
+  async listChannels(authContext: AuthContext): Promise<Channel[]> {
+    const channels: Channel[] = [];
+    let cursor: string | undefined;
+
+    do {
+      const result = await this.slackClient.conversations.list({
+        types: "public_channel,private_channel",
+        cursor
+      });
+
+      for (const channel of result.channels || []) {
+        channels.push({
+          id: channel.id!,
+          name: channel.name!,
+          type: channel.is_private ? "private" : "public"
+        });
+      }
+
+      cursor = result.response_metadata?.next_cursor;
+    } while (cursor);
+
+    return channels;
+  }
+
+  async sendMessage(
+    authContext: AuthContext,
+    channel: string,
+    message: NotificationMessage
+  ): Promise<SendResult> {
+    const blocks = this.buildBlocks(message);
+
+    const result = await this.slackClient.chat.postMessage({
+      channel,
+      text: message.text,
+      blocks,
+      attachments: message.color ? [{
+        color: message.color,
+        blocks
+      }] : undefined
+    });
+
+    return {
+      messageId: result.ts!,
+      channel: result.channel!,
+      success: result.ok
+    };
+  }
+
+  private buildBlocks(message: NotificationMessage): KnownBlock[] {
+    const blocks: KnownBlock[] = [];
+
+    if (message.title) {
+      blocks.push({
+        type: "header",
+        text: {
+          type: "plain_text",
+          text: message.title
+        }
+      });
+    }
+
+    blocks.push({
+      type: "section",
+      text: {
+        type: "mrkdwn",
+        text: message.text
+      }
+    });
+
+    if (message.fields?.length) {
+      blocks.push({
+        type: "section",
+        fields: message.fields.map(f => ({
+          type: "mrkdwn",
+          text: `*${f.name}*\n${f.value}`
+        }))
+      });
+    }
+
+    if (message.actions?.length) {
+      blocks.push({
+        type: "actions",
+        elements: message.actions.map(a => ({
+          type: "button",
+          text: {
+            type: "plain_text",
+            text: a.text
+          },
+          url: a.url,
+          style: a.style === "danger" ? "danger" : "primary"
+        }))
+      });
+    }
+
+    return blocks;
+  }
+}
+```
+
+## Secret Store Connectors
+
+### ISecretConnector
+
+```typescript
+interface ISecretConnector extends IConnector {
+  // Secret operations
+  getSecret(
+    authContext: AuthContext,
+    path: string,
+    key?: string
+  ): Promise<SecretValue>;
+
+  listSecrets(
+    authContext: AuthContext,
+    path: string
+  ): Promise<string[]>;
+}
+
+interface SecretValue {
+  value: string;
+  version?: string;
+  createdAt?: DateTime;
+  expiresAt?: DateTime;
+}
+```
+
+### HashiCorp Vault Connector
+
+```typescript
+class VaultConnector implements ISecretConnector {
+  readonly typeId = "hashicorp-vault";
+  readonly displayName = "HashiCorp Vault";
+  readonly version = "1.0.0";
+  readonly capabilities: ConnectorCapabilities = {
+    discovery: true,
+    webhooks: false,
+    streaming: false,
+    batchOperations: false,
+    customActions: ["renewToken"]
+  };
+
+  private vaultClient: VaultClient;
+
+  async initialize(config: VaultConfig): Promise<void> {
+    this.vaultClient = new VaultClient({
+      endpoint: config.url,
+      namespace: config.namespace
+    });
+  }
+
+  async authenticate(
+    config: VaultConfig,
+    creds: Credential
+  ): Promise<AuthContext> {
+    let token: string;
+
+    switch (config.authMethod) {
+      case "token":
+        token = (creds as TokenCredential).token;
+        break;
+
+      case "approle": {
+        const approle = creds as AppRoleCredential;
+        const result = await this.vaultClient.auth.approle.login({
+          role_id: approle.roleId,
+          secret_id: approle.secretId
+        });
+        token = result.auth.client_token;
+        break;
+      }
+
+      case "kubernetes": {
+        const k8s = creds as KubernetesCredential;
+        const k8sResult = await this.vaultClient.auth.kubernetes.login({
+          role: k8s.role,
+          jwt: k8s.serviceAccountToken
+        });
+        token = k8sResult.auth.client_token;
+        break;
+      }
+
+      default:
+        throw new Error(`Unsupported auth method: ${config.authMethod}`);
+    }
+
+    this.vaultClient.token = token;
+
+    return {
+      type: "bearer",
+      token,
+      renewable: true
+    };
+  }
+
+  async getSecret(
+    authContext: AuthContext,
+    path: string,
+    key?: string
+  ): Promise<SecretValue> {
+    const result = await this.vaultClient.kv.v2.read({
+      mount_path: this.mountPath,
+      path
+    });
+
+    const data = result.data.data;
+    const value = key ? data[key] : JSON.stringify(data);
+
+    return {
+      value,
+      version: result.data.metadata.version.toString(),
+      createdAt: new Date(result.data.metadata.created_time)
+    };
+  }
+
+  async listSecrets(
+    authContext: AuthContext,
+    path: string
+  ): Promise<string[]> {
+    const result = await this.vaultClient.kv.v2.list({
+      mount_path: this.mountPath,
+      path
+    });
+
+    return result.data.keys;
+  }
+}
+```
+
+## Custom Connector Development
+
+### Plugin Structure
+
+```
+my-connector/
+  ├── manifest.yaml
+  ├── src/
+  │   ├── connector.ts
+  │   ├── config.ts
+  │   └── types.ts
+  └── package.json
+```
+
+### Manifest
+
+```yaml
+# manifest.yaml
+id: my-custom-connector
+version: 1.0.0
+name: My Custom Connector
+description: Custom connector for XYZ service
+author: Your Name
+
+connector:
+  typeId: my-service
+  displayName: My Service
+  entrypoint: ./src/connector.js
+
+  capabilities:
+    discovery: true
+    webhooks: false
+    streaming: false
+    batchOperations: false
+
+  config_schema:
+    type: object
+    properties:
+      url:
+        type: string
+        format: uri
+        description: Service URL
+      timeout:
+        type: integer
+        default: 30000
+    required:
+      - url
+
+  credential_types:
+    - api-key
+    - oauth2
+```
+
+### Implementation
+
+```typescript
+// connector.ts
+import { IConnector, ConnectorCapabilities } from "@stella-ops/connector-sdk";
+
+export class MyConnector implements IConnector {
+  readonly typeId = "my-service";
+  readonly displayName = "My Service";
+  readonly version = "1.0.0";
+  readonly capabilities: ConnectorCapabilities = {
+    discovery: true,
+    webhooks: false,
+    streaming: false,
+    batchOperations: false,
+    customActions: []
+  };
+
+  async initialize(config: MyConfig): Promise<void> {
+    // Initialize your connector
+  }
+
+  async dispose(): Promise<void> {
+    // Cleanup resources
+  }
+
+  async ping(config: MyConfig): Promise<void> {
+    // Check connectivity
+  }
+
+  async healthCheck(config: MyConfig, creds: Credential): Promise<HealthCheckResult> {
+    // Full health check
+  }
+
+  async authenticate(config: MyConfig, creds: Credential): Promise<AuthContext> {
+    // Authenticate and return context
+  }
+
+  async discover(
+    config: MyConfig,
+    authContext: AuthContext,
+    resourceType: string,
+    filter?: DiscoveryFilter
+  ): Promise<DiscoveredResource[]> {
+    // Discover resources
+  }
+}
+
+// Export connector factory
+export default function createConnector(): IConnector {
+  return new MyConnector();
+}
+```
+
+## References
+
+- [Integrations Overview](overview.md)
+- [Webhooks](webhooks.md)
+- [Plugin System](../modules/plugin-system.md)
diff --git a/docs/modules/release-orchestrator/integrations/overview.md b/docs/modules/release-orchestrator/integrations/overview.md
new file mode 100644
index 000000000..5574ceec8
--- /dev/null
+++ b/docs/modules/release-orchestrator/integrations/overview.md
@@ -0,0 +1,412 @@
+# Integrations Overview
+
+## Purpose
+
+The Integration Hub (INTHUB) provides a unified interface for connecting Release Orchestrator to external systems, including container registries, CI/CD pipelines, notification services, secret stores, and metrics providers.
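All of these integration families are driven through one connector contract. As a rough illustration of that shared lifecycle (initialize, then a connectivity ping, then authentication), here is a minimal sketch; the `Connector` shape is abbreviated from the connector interface documented in `connectors.md`, and `MockConnector` and `connectIntegration` are illustrative names, not part of the SDK.

```typescript
// Abbreviated connector contract (see connectors.md for the full IConnector).
interface AuthContext {
  type: string;
  token: string;
}

interface Connector {
  initialize(config: { url: string }): Promise<void>;
  ping(config: { url: string }): Promise<void>;
  authenticate(config: { url: string }, creds: { token: string }): Promise<AuthContext>;
}

// Hypothetical stand-in for a real connector implementation.
class MockConnector implements Connector {
  async initialize(): Promise<void> {}
  async ping(): Promise<void> {}
  async authenticate(
    _config: { url: string },
    creds: { token: string }
  ): Promise<AuthContext> {
    return { type: "bearer", token: creds.token };
  }
}

// The hub runs initialize -> ping -> authenticate before any discovery
// or execution happens, regardless of integration type.
async function connectIntegration(
  connector: Connector,
  config: { url: string },
  creds: { token: string }
): Promise<AuthContext> {
  await connector.initialize(config);
  await connector.ping(config);        // fail fast on connectivity problems
  return connector.authenticate(config, creds);
}
```

Because every connector follows the same sequence, the hub can health-check and pool heterogeneous integrations uniformly.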
+ +## Integration Architecture + +``` + INTEGRATION HUB ARCHITECTURE + + ┌─────────────────────────────────────────────────────────────────────────────┐ + │ INTEGRATION HUB │ + │ │ + │ ┌─────────────────────────────────────────────────────────────────────┐ │ + │ │ INTEGRATION MANAGER │ │ + │ │ │ │ + │ │ ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐ │ │ + │ │ │ Type │ │ Instance │ │ Health │ │ Discovery │ │ │ + │ │ │ Registry │ │ Manager │ │ Monitor │ │ Service │ │ │ + │ │ └────────────┘ └────────────┘ └────────────┘ └────────────┘ │ │ + │ │ │ │ + │ └─────────────────────────────────────────────────────────────────────┘ │ + │ │ │ + │ ▼ │ + │ ┌─────────────────────────────────────────────────────────────────────┐ │ + │ │ CONNECTOR RUNTIME │ │ + │ │ │ │ + │ │ ┌──────────────────────────────────────────────────────────────┐ │ │ + │ │ │ CONNECTOR POOL │ │ │ + │ │ │ │ │ │ + │ │ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │ │ + │ │ │ │ Docker │ │ GitLab │ │ Slack │ │ Vault │ │ │ │ + │ │ │ │ Registry │ │ CI │ │ │ │ │ │ │ │ + │ │ │ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │ │ │ + │ │ │ │ │ │ + │ │ └──────────────────────────────────────────────────────────────┘ │ │ + │ │ │ │ + │ └─────────────────────────────────────────────────────────────────────┘ │ + │ │ + └─────────────────────────────────────────────────────────────────────────────┘ + │ + ┌─────────────┬─────────────────┼─────────────────┬─────────────┐ + │ │ │ │ │ + ▼ ▼ ▼ ▼ ▼ + ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ + │Container│ │ CI/CD │ │ Notifi- │ │ Secret │ │ Metrics │ + │Registry │ │ Systems │ │ cations │ │ Stores │ │ Systems │ + └─────────┘ └─────────┘ └─────────┘ └─────────┘ └─────────┘ +``` + +## Integration Types + +### Container Registries + +| Type ID | Description | Discovery Support | +|---------|-------------|-------------------| +| `docker-registry` | Docker Registry v2 API | Yes | +| `docker-hub` | Docker Hub | Yes | +| `gcr` | Google Container 
Registry | Yes | +| `ecr` | AWS Elastic Container Registry | Yes | +| `acr` | Azure Container Registry | Yes | +| `ghcr` | GitHub Container Registry | Yes | +| `harbor` | Harbor Registry | Yes | +| `jfrog` | JFrog Artifactory | Yes | +| `nexus` | Sonatype Nexus | Yes | +| `quay` | Quay.io | Yes | + +### CI/CD Systems + +| Type ID | Description | Trigger Support | +|---------|-------------|-----------------| +| `gitlab-ci` | GitLab CI/CD | Yes | +| `github-actions` | GitHub Actions | Yes | +| `jenkins` | Jenkins | Yes | +| `azure-devops` | Azure DevOps Pipelines | Yes | +| `circleci` | CircleCI | Yes | +| `teamcity` | TeamCity | Yes | +| `drone` | Drone CI | Yes | + +### Notification Services + +| Type ID | Description | Features | +|---------|-------------|----------| +| `slack` | Slack | Channels, threads, reactions | +| `teams` | Microsoft Teams | Channels, cards | +| `email` | Email (SMTP) | Templates, attachments | +| `webhook` | Generic Webhook | JSON payloads | +| `pagerduty` | PagerDuty | Incidents, alerts | +| `opsgenie` | OpsGenie | Alerts, on-call | + +### Secret Stores + +| Type ID | Description | Features | +|---------|-------------|----------| +| `hashicorp-vault` | HashiCorp Vault | KV, Transit, PKI | +| `aws-secrets-manager` | AWS Secrets Manager | Rotation, versioning | +| `azure-key-vault` | Azure Key Vault | Keys, secrets, certs | +| `gcp-secret-manager` | GCP Secret Manager | Versions, labels | + +### Metrics & Monitoring + +| Type ID | Description | Use Case | +|---------|-------------|----------| +| `prometheus` | Prometheus | Canary metrics | +| `datadog` | Datadog | APM, logs, metrics | +| `newrelic` | New Relic | APM, infra monitoring | +| `dynatrace` | Dynatrace | Full-stack monitoring | + +## Integration Configuration + +### Integration Entity + +```typescript +interface Integration { + id: UUID; + tenantId: UUID; + typeId: string; // e.g., "docker-registry" + name: string; // Display name + description?: string; + + // Connection 
configuration
+  config: IntegrationConfig;
+
+  // Credential reference (stored in vault)
+  credentialRef: string;
+
+  // Health tracking
+  healthStatus: "healthy" | "degraded" | "unhealthy" | "unknown";
+  lastHealthCheck?: DateTime;
+
+  // Metadata
+  labels: Record<string, string>;
+  createdAt: DateTime;
+  updatedAt: DateTime;
+}
+
+interface IntegrationConfig {
+  // Common fields
+  url?: string;
+  timeout?: number;
+  retries?: number;
+
+  // Type-specific fields
+  [key: string]: any;
+}
+```
+
+### Type-Specific Configuration
+
+```typescript
+// Docker Registry
+interface DockerRegistryConfig extends IntegrationConfig {
+  url: string; // https://registry.example.com
+  repository?: string; // Optional default repository
+  insecureSkipVerify?: boolean; // Skip TLS verification
+}
+
+// GitLab CI
+interface GitLabCIConfig extends IntegrationConfig {
+  url: string; // https://gitlab.example.com
+  projectId: string; // Project ID or path
+  defaultBranch?: string; // Default ref for triggers
+}
+
+// Slack
+interface SlackConfig extends IntegrationConfig {
+  workspace?: string; // Workspace identifier
+  defaultChannel?: string; // Default channel for notifications
+  iconEmoji?: string; // Bot icon
+}
+
+// HashiCorp Vault
+interface VaultConfig extends IntegrationConfig {
+  url: string; // https://vault.example.com
+  namespace?: string; // Vault namespace
+  mountPath: string; // Secret mount path
+  authMethod: "token" | "approle" | "kubernetes";
+}
+```
+
+## Credential Management
+
+Credentials are never stored in the Release Orchestrator database. Instead, references to external secret stores are used.
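As a sketch of how such a reference might be resolved, the parser below splits a vault-style reference into its parts. The `parseCredentialRef` helper and its exact `vault://<vault-integration-id>/<secret-path>#<key>` shape are illustrative assumptions, not a fixed API:

```typescript
// Illustrative parser for vault-style credential references.
// Assumed shape: vault://<vault-integration-id>/<secret-path>#<key>
interface ParsedCredentialRef {
  vaultId: string;     // ID of the Vault integration to query
  secretPath: string;  // Path of the secret within the mount
  key: string;         // Key inside the secret payload
}

function parseCredentialRef(ref: string): ParsedCredentialRef {
  const match = /^vault:\/\/([^/]+)\/(.+)#([^#]+)$/.exec(ref);
  if (!match) {
    throw new Error(`Invalid credential reference: ${ref}`);
  }
  return { vaultId: match[1], secretPath: match[2], key: match[3] };
}

// Example: split a Docker registry reference into its parts
const parsed = parseCredentialRef(
  "vault://vault-1/docker/registry.example.com#credentials"
);
// parsed.vaultId === "vault-1", parsed.key === "credentials"
```

Keeping parsing strict (throwing on malformed references) surfaces configuration mistakes at integration-creation time rather than at deploy time.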
+
+### Credential Reference Format
+
+```
+vault://vault-integration-id/path/to/secret#key
+        └─────────┬────────┘ └─────┬─────┘ └┬┘
+             Vault ID         Secret path   Key
+```
+
+### Credential Types
+
+```typescript
+type CredentialType =
+  | "basic"           // Username/password
+  | "token"           // Bearer token
+  | "api-key"         // API key
+  | "oauth2"          // OAuth2 credentials
+  | "service-account" // GCP/K8s service account
+  | "certificate";    // Client certificate
+
+interface CredentialReference {
+  type: CredentialType;
+  ref: string; // Vault reference
+}
+
+// Examples
+const dockerCreds: CredentialReference = {
+  type: "basic",
+  ref: "vault://vault-1/docker/registry.example.com#credentials"
+};
+
+const gitlabToken: CredentialReference = {
+  type: "token",
+  ref: "vault://vault-1/ci/gitlab#access_token"
+};
+```
+
+## Health Monitoring
+
+### Health Check Types
+
+| Check Type | Description | Frequency |
+|------------|-------------|-----------|
+| `connectivity` | TCP/HTTP connectivity | 1 min |
+| `authentication` | Credential validity | 5 min |
+| `functionality` | Full operation test | 15 min |
+
+### Health Check Flow
+
+```typescript
+interface HealthCheckResult {
+  integrationId: UUID;
+  checkType: string;
+  status: "healthy" | "degraded" | "unhealthy";
+  latencyMs: number;
+  message?: string;
+  checkedAt: DateTime;
+}
+
+class IntegrationHealthMonitor {
+  async checkHealth(integration: Integration): Promise<HealthCheckResult> {
+    const connector = this.connectorPool.get(integration.typeId);
+    const startTime = Date.now();
+
+    try {
+      // Connectivity check
+      await connector.ping(integration.config);
+
+      // Authentication check
+      const creds = await this.fetchCredentials(integration.credentialRef);
+      await connector.authenticate(integration.config, creds);
+
+      return {
+        integrationId: integration.id,
+        checkType: "full",
+        status: "healthy",
+        latencyMs: Date.now() - startTime,
+        checkedAt: new Date()
+      };
+    } catch (error) {
+      return {
+        integrationId: integration.id,
+        checkType: "full",
+        status: 
this.classifyError(error),
+        latencyMs: Date.now() - startTime,
+        message: error.message,
+        checkedAt: new Date()
+      };
+    }
+  }
+}
+```
+
+## Discovery Service
+
+Integrations can discover resources from connected systems.
+
+### Discovery Operations
+
+```typescript
+interface DiscoveryService {
+  // Discover available repositories
+  discoverRepositories(integrationId: UUID): Promise<Repository[]>;
+
+  // Discover tags/versions
+  discoverTags(integrationId: UUID, repository: string): Promise<Tag[]>;
+
+  // Discover pipelines
+  discoverPipelines(integrationId: UUID): Promise<Pipeline[]>;
+
+  // Discover notification channels
+  discoverChannels(integrationId: UUID): Promise<Channel[]>;
+}
+
+// Example: Discover Docker repositories
+const repos = await discoveryService.discoverRepositories(dockerIntegrationId);
+// Returns: [{ name: "myapp", tags: ["latest", "v1.0.0", ...] }, ...]
+```
+
+### Discovery Caching
+
+```typescript
+interface DiscoveryCache {
+  key: string; // integration_id:resource_type
+  data: any;
+  discoveredAt: DateTime;
+  ttlSeconds: number;
+}
+
+// Cache TTLs by resource type
+const cacheTTLs = {
+  repositories: 3600, // 1 hour
+  tags: 300, // 5 minutes
+  pipelines: 3600, // 1 hour
+  channels: 86400 // 24 hours
+};
+```
+
+## API Reference
+
+### Create Integration
+
+```http
+POST /api/v1/integrations
+Content-Type: application/json
+
+{
+  "typeId": "docker-registry",
+  "name": "Production Registry",
+  "config": {
+    "url": "https://registry.example.com",
+    "repository": "myorg"
+  },
+  "credentialRef": "vault://vault-1/docker/prod-registry#credentials",
+  "labels": {
+    "environment": "production"
+  }
+}
+```
+
+### Test Integration
+
+```http
+POST /api/v1/integrations/{id}/test
+```
+
+Response:
+```json
+{
+  "success": true,
+  "data": {
+    "connectivityTest": { "status": "passed", "latencyMs": 45 },
+    "authenticationTest": { "status": "passed", "latencyMs": 120 },
+    "functionalityTest": { "status": "passed", "latencyMs": 230 }
+  }
+}
+```
+
+### Discover Resources
+
+```http
+POST 
/api/v1/integrations/{id}/discover +Content-Type: application/json + +{ + "resourceType": "repositories", + "filter": { + "namePattern": "myapp-*" + } +} +``` + +## Error Handling + +### Integration Errors + +| Error Code | Description | Retry Strategy | +|------------|-------------|----------------| +| `INTEGRATION_NOT_FOUND` | Integration ID not found | No retry | +| `INTEGRATION_UNHEALTHY` | Integration health check failing | Backoff retry | +| `CREDENTIAL_FETCH_FAILED` | Cannot fetch credentials | Retry with backoff | +| `CONNECTION_REFUSED` | Cannot connect to endpoint | Retry with backoff | +| `AUTHENTICATION_FAILED` | Invalid credentials | No retry | +| `RATE_LIMITED` | Too many requests | Retry after delay | + +### Circuit Breaker + +```typescript +interface CircuitBreakerConfig { + failureThreshold: number; // Failures before opening + successThreshold: number; // Successes to close + timeout: number; // Time in open state (ms) +} + +// Default configuration +const defaultCircuitBreaker: CircuitBreakerConfig = { + failureThreshold: 5, + successThreshold: 3, + timeout: 60000 +}; +``` + +## References + +- [Connectors](connectors.md) +- [Webhooks](webhooks.md) +- [CI/CD Integration](ci-cd.md) +- [Integration Hub Module](../modules/integration-hub.md) diff --git a/docs/modules/release-orchestrator/integrations/webhooks.md b/docs/modules/release-orchestrator/integrations/webhooks.md new file mode 100644 index 000000000..23aae6ee2 --- /dev/null +++ b/docs/modules/release-orchestrator/integrations/webhooks.md @@ -0,0 +1,627 @@ +# Webhooks + +## Overview + +Release Orchestrator supports both inbound webhooks (receiving events from external systems) and outbound webhooks (sending events to external systems). 
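Both directions rely on the same underlying integrity mechanism: an HMAC signature over the raw request body. As a minimal receiver-side sketch (the hex encoding and the surrounding header handling are assumptions for illustration; the concrete scheme is configured per webhook):

```typescript
import * as crypto from "node:crypto";

// Sketch: constant-time verification of an HMAC-SHA256 hex signature
// computed over the raw request body. The helper name and hex digest
// encoding are illustrative assumptions, not a fixed contract.
function verifyWebhookSignature(
  body: Buffer,
  signature: string,
  secret: string
): boolean {
  const expected = crypto
    .createHmac("sha256", secret)
    .update(body)
    .digest("hex");
  const a = Buffer.from(signature);
  const b = Buffer.from(expected);
  // timingSafeEqual throws on length mismatch, so guard first
  return a.length === b.length && crypto.timingSafeEqual(a, b);
}
```

Verifying against the raw bytes (before any JSON parsing or re-serialization) matters: re-encoding the payload can change key order or whitespace and invalidate an otherwise correct signature.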
+
+## Inbound Webhooks
+
+### Webhook Types
+
+| Type | Source | Triggers |
+|------|--------|----------|
+| `registry-push` | Container registries | Image push events |
+| `ci-pipeline` | CI/CD systems | Pipeline completion |
+| `github-app` | GitHub | PR, push, workflow events |
+| `gitlab-webhook` | GitLab | Pipeline, push, MR events |
+| `generic` | Any system | Custom payloads |
+
+### Registry Push Webhook
+
+Receives events when new images are pushed to registries.
+
+```
+POST /api/v1/webhooks/registry/{integrationId}
+Content-Type: application/json
+
+# Docker Hub
+{
+  "push_data": {
+    "tag": "v1.2.0",
+    "images": ["sha256:abc123..."],
+    "pushed_at": 1704067200
+  },
+  "repository": {
+    "name": "myapp",
+    "namespace": "myorg",
+    "repo_url": "https://hub.docker.com/r/myorg/myapp"
+  }
+}
+
+# Harbor
+{
+  "type": "PUSH_ARTIFACT",
+  "occur_at": 1704067200,
+  "event_data": {
+    "repository": {
+      "name": "myapp",
+      "repo_full_name": "myorg/myapp"
+    },
+    "resources": [{
+      "digest": "sha256:abc123...",
+      "tag": "v1.2.0"
+    }]
+  }
+}
+```
+
+### Webhook Handler
+
+```typescript
+interface WebhookHandler {
+  handleRegistryPush(
+    integrationId: UUID,
+    payload: RegistryPushPayload
+  ): Promise<WebhookResult>;
+
+  handleCIPipeline(
+    integrationId: UUID,
+    payload: CIPipelinePayload
+  ): Promise<WebhookResult>;
+}
+
+class RegistryWebhookHandler implements WebhookHandler {
+  async handleRegistryPush(
+    integrationId: UUID,
+    payload: RegistryPushPayload
+  ): Promise<WebhookResult> {
+    // Normalize payload from different registries
+    const normalized = this.normalizePayload(payload);
+
+    // Find matching component
+    const component = await this.componentRegistry.findByRepository(
+      normalized.repository
+    );
+
+    if (!component) {
+      return {
+        success: true,
+        action: "ignored",
+        reason: "No matching component"
+      };
+    }
+
+    // Update version map
+    await this.versionManager.addVersion({
+      componentId: component.id,
+      tag: normalized.tag,
+      digest: normalized.digest,
+      channel: this.determineChannel(normalized.tag)
}); + + // Check for auto-release triggers + const triggers = await this.getTriggers(component.id, normalized.tag); + for (const trigger of triggers) { + await this.triggerRelease(trigger, normalized); + } + + return { + success: true, + action: "processed", + componentId: component.id, + versionsAdded: 1, + triggersActivated: triggers.length + }; + } + + private normalizePayload(payload: any): NormalizedPushEvent { + // Detect registry type and normalize + if (payload.push_data) { + // Docker Hub format + return { + repository: `${payload.repository.namespace}/${payload.repository.name}`, + tag: payload.push_data.tag, + digest: payload.push_data.images[0], + pushedAt: new Date(payload.push_data.pushed_at * 1000) + }; + } + + if (payload.type === "PUSH_ARTIFACT") { + // Harbor format + return { + repository: payload.event_data.repository.repo_full_name, + tag: payload.event_data.resources[0].tag, + digest: payload.event_data.resources[0].digest, + pushedAt: new Date(payload.occur_at * 1000) + }; + } + + // Generic format + return payload as NormalizedPushEvent; + } +} +``` + +### Webhook Authentication + +```typescript +interface WebhookAuth { + // Signature validation + validateSignature( + payload: Buffer, + signature: string, + secret: string, + algorithm: SignatureAlgorithm + ): boolean; + + // Token validation + validateToken( + token: string, + expectedToken: string + ): boolean; +} + +type SignatureAlgorithm = "hmac-sha256" | "hmac-sha1"; + +class WebhookAuthenticator implements WebhookAuth { + validateSignature( + payload: Buffer, + signature: string, + secret: string, + algorithm: SignatureAlgorithm + ): boolean { + const algo = algorithm === "hmac-sha256" ? 
"sha256" : "sha1";
+    const expected = crypto
+      .createHmac(algo, secret)
+      .update(payload)
+      .digest("hex");
+
+    // Constant-time comparison
+    return crypto.timingSafeEqual(
+      Buffer.from(signature),
+      Buffer.from(expected)
+    );
+  }
+}
+```
+
+### Webhook Configuration
+
+```typescript
+interface WebhookConfig {
+  id: UUID;
+  integrationId: UUID;
+  type: WebhookType;
+
+  // Security
+  secretRef: string; // Vault reference for signature secret
+  signatureHeader?: string; // Header containing signature
+  signatureAlgorithm?: SignatureAlgorithm;
+
+  // Processing
+  enabled: boolean;
+  filters?: WebhookFilter[]; // Filter events
+
+  // Retry
+  retryPolicy: RetryPolicy;
+}
+
+interface WebhookFilter {
+  field: string; // JSONPath to field
+  operator: "equals" | "contains" | "matches";
+  value: string;
+}
+
+// Example: Only process tags matching semver
+const semverFilter: WebhookFilter = {
+  field: "$.tag",
+  operator: "matches",
+  value: "^v\\d+\\.\\d+\\.\\d+$"
+};
+```
+
+## Outbound Webhooks
+
+### Event Types
+
+| Event | Description | Payload |
+|-------|-------------|---------|
+| `release.created` | New release created | Release details |
+| `promotion.requested` | Promotion requested | Promotion details |
+| `promotion.approved` | Promotion approved | Approval details |
+| `promotion.rejected` | Promotion rejected | Rejection details |
+| `deployment.started` | Deployment started | Job details |
+| `deployment.completed` | Deployment completed | Job details, results |
+| `deployment.failed` | Deployment failed | Job details, error |
+| `rollback.initiated` | Rollback initiated | Rollback details |
+
+### Webhook Subscription
+
+```typescript
+interface WebhookSubscription {
+  id: UUID;
+  tenantId: UUID;
+  name: string;
+
+  // Target
+  url: string;
+  method: "POST" | "PUT";
+  headers?: Record<string, string>;
+
+  // Authentication
+  authType: "none" | "basic" | "bearer" | "signature";
+  credentialRef?: string;
+  signatureSecret?: string;
+
+  // Events
+  events: string[]; // 
Event types to subscribe to
+  filters?: EventFilter[]; // Filter events
+
+  // Delivery
+  retryPolicy: RetryPolicy;
+  timeout: number;
+
+  // Status
+  enabled: boolean;
+  lastDelivery?: DateTime;
+  lastStatus?: number;
+}
+
+interface EventFilter {
+  field: string;
+  operator: string;
+  value: any;
+}
+```
+
+### Webhook Delivery
+
+```typescript
+interface WebhookPayload {
+  id: string; // Delivery ID
+  timestamp: string; // ISO-8601
+  event: string; // Event type
+  tenantId: string;
+  data: Record<string, unknown>; // Event-specific data
+}
+
+class WebhookDeliveryService {
+  async deliver(
+    subscription: WebhookSubscription,
+    event: DomainEvent
+  ): Promise<DeliveryResult> {
+    const payload: WebhookPayload = {
+      id: uuidv4(),
+      timestamp: new Date().toISOString(),
+      event: event.type,
+      tenantId: subscription.tenantId,
+      data: this.buildEventData(event)
+    };
+
+    const headers = this.buildHeaders(subscription, payload);
+    const body = JSON.stringify(payload);
+
+    // Attempt delivery with retries
+    return this.deliverWithRetry(subscription, headers, body);
+  }
+
+  private buildHeaders(
+    subscription: WebhookSubscription,
+    payload: WebhookPayload
+  ): Record<string, string> {
+    const headers: Record<string, string> = {
+      "Content-Type": "application/json",
+      "X-Stella-Event": payload.event,
+      "X-Stella-Delivery": payload.id,
+      "X-Stella-Timestamp": payload.timestamp,
+      ...subscription.headers
+    };
+
+    // Add signature if configured
+    if (subscription.authType === "signature") {
+      const signature = this.computeSignature(
+        JSON.stringify(payload),
+        subscription.signatureSecret!
+      );
+      headers["X-Stella-Signature"] = signature;
+    }
+
+    return headers;
+  }
+
+  private async deliverWithRetry(
+    subscription: WebhookSubscription,
+    headers: Record<string, string>,
+    body: string
+  ): Promise<DeliveryResult> {
+    const policy = subscription.retryPolicy;
+    let lastError: Error | undefined;
+
+    for (let attempt = 0; attempt <= policy.maxRetries; attempt++) {
+      try {
+        const response = await fetch(subscription.url, {
+          method: subscription.method,
+          headers,
+          body,
+          signal: AbortSignal.timeout(subscription.timeout)
+        });
+
+        // Record delivery
+        await this.recordDelivery(subscription.id, {
+          attempt,
+          statusCode: response.status,
+          success: response.ok
+        });
+
+        if (response.ok) {
+          return { success: true, statusCode: response.status, attempts: attempt + 1 };
+        }
+
+        // Non-retryable status codes
+        if (response.status >= 400 && response.status < 500) {
+          return {
+            success: false,
+            statusCode: response.status,
+            attempts: attempt + 1,
+            error: `Client error: ${response.status}`
+          };
+        }
+
+        lastError = new Error(`Server error: ${response.status}`);
+      } catch (error) {
+        lastError = error as Error;
+      }
+
+      // Wait before retry
+      if (attempt < policy.maxRetries) {
+        const delay = this.calculateDelay(policy, attempt);
+        await sleep(delay);
+      }
+    }
+
+    return {
+      success: false,
+      attempts: policy.maxRetries + 1,
+      error: lastError?.message
+    };
+  }
+}
+```
+
+### Delivery Logging
+
+```typescript
+interface WebhookDeliveryLog {
+  id: UUID;
+  subscriptionId: UUID;
+  deliveryId: string;
+
+  // Request
+  url: string;
+  method: string;
+  headers: Record<string, string>;
+  body: string;
+
+  // Response
+  statusCode?: number;
+  responseBody?: string;
+  responseTime: number;
+
+  // Result
+  success: boolean;
+  attempt: number;
+  error?: string;
+
+  // Timing
+  createdAt: DateTime;
+}
+```
+
+## Webhook API
+
+### Register Subscription
+
+```http
+POST /api/v1/webhook-subscriptions
+Content-Type: application/json
+
+{
+  "name": "Deployment Notifications",
+  "url": 
"https://api.example.com/webhooks/stella", + "method": "POST", + "authType": "signature", + "signatureSecret": "my-secret-key", + "events": [ + "deployment.started", + "deployment.completed", + "deployment.failed" + ], + "filters": [ + { + "field": "data.environment.name", + "operator": "equals", + "value": "production" + } + ], + "retryPolicy": { + "maxRetries": 3, + "backoffType": "exponential", + "backoffSeconds": 10 + }, + "timeout": 30000 +} +``` + +### Test Subscription + +```http +POST /api/v1/webhook-subscriptions/{id}/test +Content-Type: application/json + +{ + "event": "deployment.completed" +} +``` + +Response: +```json +{ + "success": true, + "data": { + "deliveryId": "d1234567-...", + "statusCode": 200, + "responseTime": 245, + "response": "OK" + } +} +``` + +### List Deliveries + +```http +GET /api/v1/webhook-subscriptions/{id}/deliveries?page=1&pageSize=20 +``` + +## Event Payloads + +### deployment.completed + +```json +{ + "id": "delivery-uuid", + "timestamp": "2026-01-09T10:30:00Z", + "event": "deployment.completed", + "tenantId": "tenant-uuid", + "data": { + "deploymentJob": { + "id": "job-uuid", + "status": "completed" + }, + "release": { + "id": "release-uuid", + "name": "myapp-v1.2.0", + "components": [ + { + "name": "api", + "digest": "sha256:abc123..." 
+ } + ] + }, + "environment": { + "id": "env-uuid", + "name": "production" + }, + "promotion": { + "id": "promo-uuid", + "requestedBy": "user@example.com" + }, + "targets": [ + { + "id": "target-uuid", + "name": "prod-host-1", + "status": "succeeded" + } + ], + "timing": { + "startedAt": "2026-01-09T10:25:00Z", + "completedAt": "2026-01-09T10:30:00Z", + "durationSeconds": 300 + } + } +} +``` + +### promotion.requested + +```json +{ + "id": "delivery-uuid", + "timestamp": "2026-01-09T10:00:00Z", + "event": "promotion.requested", + "tenantId": "tenant-uuid", + "data": { + "promotion": { + "id": "promo-uuid", + "status": "pending_approval" + }, + "release": { + "id": "release-uuid", + "name": "myapp-v1.2.0" + }, + "sourceEnvironment": { + "id": "staging-uuid", + "name": "staging" + }, + "targetEnvironment": { + "id": "prod-uuid", + "name": "production" + }, + "requestedBy": { + "id": "user-uuid", + "email": "user@example.com", + "name": "John Doe" + }, + "approvalRequired": { + "count": 2, + "currentApprovals": 0 + } + } +} +``` + +## Security Considerations + +### Signature Verification + +Receivers should verify webhook signatures: + +```python +import hmac +import hashlib + +def verify_signature(payload: bytes, signature: str, secret: str) -> bool: + expected = hmac.new( + secret.encode(), + payload, + hashlib.sha256 + ).hexdigest() + + return hmac.compare_digest(signature, expected) + +# In webhook handler +@app.route("/webhooks/stella", methods=["POST"]) +def handle_webhook(): + signature = request.headers.get("X-Stella-Signature") + if not verify_signature(request.data, signature, WEBHOOK_SECRET): + return "Invalid signature", 401 + + payload = request.json + # Process event... 
+``` + +### IP Allowlisting + +Configure firewall rules to only accept webhooks from Stella IP ranges: +- Document IP ranges in deployment configuration +- Use VPN or private networking where possible + +### Replay Protection + +Check delivery timestamps to prevent replay attacks: + +```python +from datetime import datetime, timedelta + +MAX_TIMESTAMP_AGE = timedelta(minutes=5) + +def check_timestamp(timestamp_str: str) -> bool: + timestamp = datetime.fromisoformat(timestamp_str.replace("Z", "+00:00")) + now = datetime.now(timestamp.tzinfo) + return abs(now - timestamp) < MAX_TIMESTAMP_AGE +``` + +## References + +- [Integrations Overview](overview.md) +- [Connectors](connectors.md) +- [CI/CD Integration](ci-cd.md) diff --git a/docs/modules/release-orchestrator/modules/agents.md b/docs/modules/release-orchestrator/modules/agents.md new file mode 100644 index 000000000..5b07ab936 --- /dev/null +++ b/docs/modules/release-orchestrator/modules/agents.md @@ -0,0 +1,597 @@ +# AGENTS: Deployment Agents + +**Purpose**: Lightweight deployment agents for target execution. + +## Agent Types + +| Agent Type | Transport | Target Types | +|------------|-----------|--------------| +| `agent-docker` | gRPC | Docker hosts | +| `agent-compose` | gRPC | Docker Compose hosts | +| `agent-ssh` | SSH | Linux remote hosts | +| `agent-winrm` | WinRM | Windows remote hosts | +| `agent-ecs` | AWS API | AWS ECS services | +| `agent-nomad` | Nomad API | HashiCorp Nomad jobs | + +## Modules + +### Module: `agent-core` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Shared agent runtime; task execution framework | +| **Protocol** | gRPC for communication with Stella Core | +| **Security** | mTLS authentication; short-lived JWT for tasks | + +**Agent Lifecycle**: +1. Agent starts with registration token +2. Agent registers with capabilities and labels +3. Agent sends heartbeats (default: 30s interval) +4. Agent receives tasks from Stella Core +5. 
Agent reports task completion/failure
+
+**Agent Task Protocol**:
+```typescript
+// Task assignment (Core → Agent)
+interface AgentTask {
+  id: UUID;
+  type: TaskType;
+  targetId: UUID;
+  payload: TaskPayload;
+  credentials: EncryptedCredentials;
+  timeout: number;
+  priority: TaskPriority;
+  idempotencyKey: string;
+  assignedAt: DateTime;
+  expiresAt: DateTime;
+}
+
+type TaskType =
+  | "deploy"
+  | "rollback"
+  | "health-check"
+  | "inspect"
+  | "execute-command"
+  | "upload-files"
+  | "write-sticker"
+  | "read-sticker";
+
+interface DeployTaskPayload {
+  image: string;
+  digest: string;
+  config: DeployConfig;
+  artifacts: ArtifactReference[];
+  previousDigest?: string;
+  hooks: {
+    preDeploy?: HookConfig;
+    postDeploy?: HookConfig;
+  };
+}
+
+// Task result (Agent → Core)
+interface TaskResult {
+  taskId: UUID;
+  success: boolean;
+  startedAt: DateTime;
+  completedAt: DateTime;
+
+  // Success details
+  outputs?: Record<string, unknown>;
+  artifacts?: ArtifactReference[];
+
+  // Failure details
+  error?: string;
+  errorType?: string;
+  retriable?: boolean;
+
+  // Logs
+  logs: string;
+
+  // Metrics
+  metrics: {
+    pullDurationMs?: number;
+    deployDurationMs?: number;
+    healthCheckDurationMs?: number;
+  };
+}
+```
+
+---
+
+### Module: `agent-docker`
+
+| Aspect | Specification |
+|--------|---------------|
+| **Responsibility** | Docker container deployment |
+| **Dependencies** | Docker Engine API |
+| **Capabilities** | `docker.deploy`, `docker.rollback`, `docker.inspect` |
+
+**Docker Agent Implementation**:
+```typescript
+class DockerAgent implements TargetExecutor {
+  private docker: Docker;
+
+  async deploy(task: DeployTaskPayload): Promise<DeployResult> {
+    const { image, digest, config, previousDigest } = task;
+    const containerName = config.containerName;
+
+    // 1. 
Pull image and verify digest + this.log(`Pulling image ${image}@${digest}`); + await this.docker.pull(image, { digest }); + + const pulledDigest = await this.getImageDigest(image); + if (pulledDigest !== digest) { + throw new DigestMismatchError( + `Expected digest ${digest}, got ${pulledDigest}. Possible tampering detected.` + ); + } + + // 2. Run pre-deploy hook + if (task.hooks?.preDeploy) { + await this.runHook(task.hooks.preDeploy, "pre-deploy"); + } + + // 3. Stop and rename existing container + const existingContainer = await this.findContainer(containerName); + if (existingContainer) { + this.log(`Stopping existing container ${containerName}`); + await existingContainer.stop({ t: 10 }); + await existingContainer.rename(`${containerName}-previous-${Date.now()}`); + } + + // 4. Create new container + this.log(`Creating container ${containerName} from ${image}@${digest}`); + const container = await this.docker.createContainer({ + name: containerName, + Image: `${image}@${digest}`, // Always use digest, not tag + Env: this.buildEnvVars(config.environment), + HostConfig: { + PortBindings: this.buildPortBindings(config.ports), + Binds: this.buildBindMounts(config.volumes), + RestartPolicy: { Name: config.restartPolicy || "unless-stopped" }, + Memory: config.memoryLimit, + CpuQuota: config.cpuLimit, + }, + Labels: { + "stella.release.id": config.releaseId, + "stella.release.name": config.releaseName, + "stella.digest": digest, + "stella.deployed.at": new Date().toISOString(), + }, + }); + + // 5. Start container + this.log(`Starting container ${containerName}`); + await container.start(); + + // 6. 
Wait for container to be healthy
+    if (config.healthCheck) {
+      this.log(`Waiting for container health check`);
+      const healthy = await this.waitForHealthy(container, config.healthCheck.timeout);
+      if (!healthy) {
+        await this.rollbackContainer(containerName, existingContainer);
+        throw new HealthCheckFailedError(`Container ${containerName} failed health check`);
+      }
+    }
+
+    // 7. Run post-deploy hook
+    if (task.hooks?.postDeploy) {
+      await this.runHook(task.hooks.postDeploy, "post-deploy");
+    }
+
+    // 8. Cleanup previous container
+    if (existingContainer && config.cleanupPrevious !== false) {
+      this.log(`Removing previous container`);
+      await existingContainer.remove({ force: true });
+    }
+
+    return {
+      success: true,
+      containerId: container.id,
+      previousDigest: previousDigest,
+    };
+  }
+
+  async rollback(task: RollbackTaskPayload): Promise<DeployResult> {
+    const { containerName, targetDigest } = task;
+
+    if (targetDigest) {
+      // Deploy specific digest
+      return this.deploy({ ...task, digest: targetDigest });
+    }
+
+    // Find and restore previous container
+    const previousContainer = await this.findContainer(`${containerName}-previous-*`);
+    if (!previousContainer) {
+      throw new RollbackError(`No previous container found for ${containerName}`);
+    }
+
+    const currentContainer = await this.findContainer(containerName);
+    if (currentContainer) {
+      await currentContainer.stop({ t: 10 });
+      await currentContainer.rename(`${containerName}-failed-${Date.now()}`);
+    }
+
+    await previousContainer.rename(containerName);
+    await previousContainer.start();
+
+    return { success: true, containerId: previousContainer.id };
+  }
+
+  async writeSticker(sticker: VersionSticker): Promise<void> {
+    const stickerPath = this.config.stickerPath || "/var/stella/version.json";
+    const stickerContent = JSON.stringify(sticker, null, 2);
+
+    if (this.config.stickerLocation === "volume") {
+      await this.docker.run("alpine", [
+        "sh", "-c",
+        `echo '${stickerContent}' > ${stickerPath}`
+      ], {
+        HostConfig: { 
Binds: [`${this.config.stickerVolume}:/var/stella`] }
+      });
+    } else {
+      fs.writeFileSync(stickerPath, stickerContent);
+    }
+  }
+}
+```
+
+---
+
+### Module: `agent-compose`
+
+| Aspect | Specification |
+|--------|---------------|
+| **Responsibility** | Docker Compose stack deployment |
+| **Dependencies** | Docker Compose CLI |
+| **Capabilities** | `compose.deploy`, `compose.rollback`, `compose.inspect` |
+
+**Compose Agent Implementation**:
+```typescript
+class ComposeAgent implements TargetExecutor {
+  async deploy(task: DeployTaskPayload): Promise<DeployResult> {
+    const { artifacts, config } = task;
+    const deployDir = config.deploymentDirectory;
+
+    // 1. Write compose lock file
+    const composeLock = artifacts.find(a => a.type === "compose_lock");
+    const composeContent = await this.fetchArtifact(composeLock);
+    const composePath = path.join(deployDir, "compose.stella.lock.yml");
+    await fs.writeFile(composePath, composeContent);
+
+    // 2. Run pre-deploy hook
+    if (task.hooks?.preDeploy) {
+      await this.runHook(task.hooks.preDeploy, deployDir);
+    }
+
+    // 3. Pull images
+    this.log("Pulling images...");
+    await this.runCompose(deployDir, ["pull"]);
+
+    // 4. Verify digests
+    await this.verifyDigests(composePath, config.expectedDigests);
+
+    // 5. Deploy
+    this.log("Deploying services...");
+    await this.runCompose(deployDir, ["up", "-d", "--remove-orphans", "--force-recreate"]);
+
+    // 6. Wait for services to be healthy
+    if (config.healthCheck) {
+      const healthy = await this.waitForServicesHealthy(deployDir, config.healthCheck.timeout);
+      if (!healthy) {
+        await this.rollbackToBackup(deployDir);
+        throw new HealthCheckFailedError("Services failed health check");
+      }
+    }
+
+    // 7. Run post-deploy hook
+    if (task.hooks?.postDeploy) {
+      await this.runHook(task.hooks.postDeploy, deployDir);
+    }
+
+    // 8. 
Write version sticker
+    await this.writeSticker(config.sticker, deployDir);
+
+    return { success: true };
+  }
+
+  private async verifyDigests(
+    composePath: string,
+    expectedDigests: Record<string, string>
+  ): Promise<void> {
+    const composeContent = yaml.parse(await fs.readFile(composePath, "utf-8"));
+
+    for (const [service, expectedDigest] of Object.entries(expectedDigests)) {
+      const serviceConfig = composeContent.services[service];
+      if (!serviceConfig) {
+        throw new Error(`Service ${service} not found in compose file`);
+      }
+
+      const image = serviceConfig.image;
+      if (!image.includes("@sha256:")) {
+        throw new Error(`Service ${service} image not pinned to digest: ${image}`);
+      }
+
+      const actualDigest = image.split("@")[1];
+      if (actualDigest !== expectedDigest) {
+        throw new DigestMismatchError(
+          `Service ${service}: expected ${expectedDigest}, got ${actualDigest}`
+        );
+      }
+    }
+  }
+}
+```
+
+---
+
+### Module: `agent-ssh`
+
+| Aspect | Specification |
+|--------|---------------|
+| **Responsibility** | SSH remote execution (agentless) |
+| **Dependencies** | SSH client library |
+| **Capabilities** | `ssh.deploy`, `ssh.execute`, `ssh.upload` |
+
+**SSH Remote Executor**:
+```typescript
+class SSHRemoteExecutor implements TargetExecutor {
+  async connect(config: SSHConnectionConfig): Promise<void> {
+    const privateKey = await this.secrets.getSecret(config.privateKeyRef);
+
+    this.ssh = new SSHClient();
+    await this.ssh.connect({
+      host: config.host,
+      port: config.port || 22,
+      username: config.username,
+      privateKey: privateKey.value,
+      readyTimeout: config.connectionTimeout || 30000,
+    });
+  }
+
+  async deploy(task: DeployTaskPayload): Promise<DeployResult> {
+    const { artifacts, config } = task;
+    const deployDir = config.deploymentDirectory;
+
+    try {
+      // 1. Ensure deployment directory exists
+      await this.exec(`mkdir -p ${deployDir}`);
+      await this.exec(`mkdir -p ${deployDir}/.stella-backup`);
+
+      // 2. 
Backup current deployment + await this.exec(`cp -r ${deployDir}/* ${deployDir}/.stella-backup/ 2>/dev/null || true`); + + // 3. Upload artifacts + for (const artifact of artifacts) { + const content = await this.fetchArtifact(artifact); + const remotePath = path.join(deployDir, artifact.name); + await this.uploadFile(content, remotePath); + } + + // 4. Run pre-deploy hook + if (task.hooks?.preDeploy) { + await this.runRemoteHook(task.hooks.preDeploy, deployDir); + } + + // 5. Execute deployment script + const deployScript = artifacts.find(a => a.type === "deploy_script"); + if (deployScript) { + const scriptPath = path.join(deployDir, deployScript.name); + await this.exec(`chmod +x ${scriptPath}`); + const result = await this.exec(scriptPath, { cwd: deployDir, timeout: config.deploymentTimeout }); + if (result.exitCode !== 0) { + throw new DeploymentError(`Deploy script failed: ${result.stderr}`); + } + } + + // 6. Run post-deploy hook + if (task.hooks?.postDeploy) { + await this.runRemoteHook(task.hooks.postDeploy, deployDir); + } + + // 7. Health check + if (config.healthCheck) { + const healthy = await this.runHealthCheck(config.healthCheck); + if (!healthy) { + await this.rollback(task); + throw new HealthCheckFailedError("Health check failed"); + } + } + + // 8. Write version sticker + await this.writeSticker(config.sticker, deployDir); + + // 9. 
Cleanup backup + await this.exec(`rm -rf ${deployDir}/.stella-backup`); + + return { success: true }; + } finally { + this.ssh.end(); + } + } +} +``` + +--- + +### Module: `agent-winrm` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | WinRM remote execution (agentless) | +| **Dependencies** | WinRM client library | +| **Capabilities** | `winrm.deploy`, `winrm.execute`, `winrm.upload` | +| **Authentication** | NTLM, Kerberos, Basic | + +--- + +### Module: `agent-ecs` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | AWS ECS service deployment | +| **Dependencies** | AWS SDK | +| **Capabilities** | `ecs.deploy`, `ecs.rollback`, `ecs.inspect` | + +--- + +### Module: `agent-nomad` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | HashiCorp Nomad job deployment | +| **Dependencies** | Nomad API client | +| **Capabilities** | `nomad.deploy`, `nomad.rollback`, `nomad.inspect` | + +--- + +## Agent Security Model + +### Registration Flow + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ AGENT REGISTRATION FLOW │ +│ │ +│ 1. Admin generates registration token (one-time use) │ +│ POST /api/v1/admin/agent-tokens │ +│ → { token: "reg_xxx", expiresAt: "..." } │ +│ │ +│ 2. Agent starts with registration token │ +│ ./stella-agent --register --token=reg_xxx │ +│ │ +│ 3. Agent requests mTLS certificate │ +│ POST /api/v1/agents/register │ +│ Headers: X-Registration-Token: reg_xxx │ +│ Body: { name, version, capabilities, csr } │ +│ → { agentId, certificate, caCertificate } │ +│ │ +│ 4. Agent establishes mTLS connection │ +│ Uses issued certificate for all subsequent requests │ +│ │ +│ 5. Agent requests short-lived JWT for task execution │ +│ POST /api/v1/agents/token (over mTLS) │ +│ → { token, expiresIn: 3600 } // 1 hour │ +│ │ +│ 6. 
Agent refreshes token before expiration │ +│ Token refresh only over mTLS connection │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +### Communication Security + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ AGENT COMMUNICATION SECURITY │ +│ │ +│ ┌──────────────┐ ┌──────────────┐ │ +│ │ AGENT │ │ STELLA CORE │ │ +│ └──────┬───────┘ └──────┬───────┘ │ +│ │ │ │ +│ │ mTLS (mutual TLS) │ │ +│ │ - Agent cert signed by Stella CA │ │ +│ │ - Server cert verified by Agent │ │ +│ │ - TLS 1.3 only │ │ +│ │ - Perfect forward secrecy │ │ +│ │◄───────────────────────────────────────►│ │ +│ │ │ │ +│ │ Encrypted payload │ │ +│ │ - Task payloads encrypted with │ │ +│ │ agent-specific key │ │ +│ │ - Logs encrypted in transit │ │ +│ │◄───────────────────────────────────────►│ │ +│ │ │ │ +│ │ Heartbeat + capability refresh │ │ +│ │ - Every 30 seconds │ │ +│ │ - Signed with agent key │ │ +│ │─────────────────────────────────────────►│ │ +│ │ │ │ +│ │ Task assignment │ │ +│ │ - Contains short-lived credentials │ │ +│ │ - Scoped to specific target │ │ +│ │ - Expires after task timeout │ │ +│ │◄─────────────────────────────────────────│ │ +│ │ │ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +--- + +## Database Schema + +```sql +-- Agents +CREATE TABLE release.agents ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + name VARCHAR(255) NOT NULL, + version VARCHAR(50) NOT NULL, + capabilities JSONB NOT NULL DEFAULT '[]', + labels JSONB NOT NULL DEFAULT '{}', + status VARCHAR(50) NOT NULL DEFAULT 'offline' CHECK (status IN ( + 'online', 'offline', 'degraded' + )), + last_heartbeat TIMESTAMPTZ, + resource_usage JSONB, + certificate_fingerprint VARCHAR(64), + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + UNIQUE (tenant_id, name) +); + +CREATE 
INDEX idx_agents_tenant ON release.agents(tenant_id); +CREATE INDEX idx_agents_status ON release.agents(status); +CREATE INDEX idx_agents_capabilities ON release.agents USING GIN (capabilities); +``` + +--- + +## API Endpoints + +```yaml +# Agent Registration +POST /api/v1/agents/register + Headers: X-Registration-Token: {token} + Body: { name, version, capabilities, csr } + Response: { agentId, certificate, caCertificate } + +# Agent Management +GET /api/v1/agents + Query: ?status={online|offline|degraded}&capability={type} + Response: Agent[] + +GET /api/v1/agents/{id} + Response: Agent + +PUT /api/v1/agents/{id} + Body: { labels?, capabilities? } + Response: Agent + +DELETE /api/v1/agents/{id} + Response: { deleted: true } + +# Agent Communication +POST /api/v1/agents/{id}/heartbeat + Body: { status, resourceUsage, capabilities } + Response: { tasks: AgentTask[] } + +POST /api/v1/agents/{id}/tasks/{taskId}/complete + Body: { success, result, logs } + Response: { acknowledged: true } + +# WebSocket for real-time task stream +WS /api/v1/agents/{id}/task-stream + Messages: + - { type: "task_assigned", task: AgentTask } + - { type: "task_cancelled", taskId } +``` + +--- + +## References + +- [Module Overview](overview.md) +- [Deploy Orchestrator](deploy-orchestrator.md) +- [Agent Security](../security/agent-security.md) +- [API Documentation](../api/agents.md) diff --git a/docs/modules/release-orchestrator/modules/deploy-orchestrator.md b/docs/modules/release-orchestrator/modules/deploy-orchestrator.md new file mode 100644 index 000000000..4d67023df --- /dev/null +++ b/docs/modules/release-orchestrator/modules/deploy-orchestrator.md @@ -0,0 +1,477 @@ +# DEPLOY: Deployment Execution + +**Purpose**: Orchestrate deployment jobs, execute on targets, manage rollbacks, and generate artifacts. 
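As a non-normative sketch of the lifecycle this purpose implies, the job status transitions can be expressed as a small table. The status names mirror the `DeploymentStatus` union defined in this module spec; the transition table itself is an illustrative assumption, not part of the specification.

```typescript
// Hypothetical sketch: which DeploymentJob status transitions are legal.
// The edges below are assumptions inferred from the status descriptions.
type DeploymentStatus =
  | "pending" | "running" | "succeeded" | "failed"
  | "cancelled" | "rolling_back" | "rolled_back";

const allowedTransitions: Record<DeploymentStatus, DeploymentStatus[]> = {
  pending: ["running", "cancelled"],
  running: ["succeeded", "failed", "cancelled"],
  succeeded: ["rolling_back"],          // a succeeded job can still be rolled back
  failed: ["rolling_back"],
  cancelled: [],                        // terminal
  rolling_back: ["rolled_back", "failed"],
  rolled_back: [],                      // terminal
};

function canTransition(from: DeploymentStatus, to: DeploymentStatus): boolean {
  return allowedTransitions[from].includes(to);
}
```

A coordinator can call `canTransition` before persisting any status change, so illegal jumps (e.g. `rolled_back` back to `running`) are rejected uniformly.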
+ +## Modules + +### Module: `deploy-orchestrator` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Deployment job coordination; strategy execution | +| **Dependencies** | `target-executor`, `artifact-generator`, `agent-manager` | +| **Data Entities** | `DeploymentJob`, `DeploymentTask` | +| **Events Produced** | `deployment.started`, `deployment.task_started`, `deployment.task_completed`, `deployment.completed`, `deployment.failed` | + +**Deployment Job Entity**: +```typescript +interface DeploymentJob { + id: UUID; + tenantId: UUID; + promotionId: UUID; + releaseId: UUID; + environmentId: UUID; + status: DeploymentStatus; + strategy: DeploymentStrategy; + startedAt: DateTime; + completedAt: DateTime; + artifacts: GeneratedArtifact[]; + rollbackOf: UUID | null; // If this is a rollback job + tasks: DeploymentTask[]; +} + +type DeploymentStatus = + | "pending" // Waiting to start + | "running" // Deployment in progress + | "succeeded" // All tasks succeeded + | "failed" // One or more tasks failed + | "cancelled" // User cancelled + | "rolling_back" // Rollback in progress + | "rolled_back"; // Rollback complete + +interface DeploymentTask { + id: UUID; + jobId: UUID; + targetId: UUID; + digest: string; + status: TaskStatus; + agentId: UUID | null; + startedAt: DateTime; + completedAt: DateTime; + exitCode: number | null; + logs: string; + previousDigest: string | null; + stickerWritten: boolean; +} + +type TaskStatus = + | "pending" + | "running" + | "succeeded" + | "failed" + | "cancelled" + | "skipped"; +``` + +--- + +### Module: `target-executor` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Target-specific deployment logic | +| **Dependencies** | `agent-manager`, `connector-runtime` | +| **Protocol** | gRPC for agents, SSH/WinRM for agentless | + +**Executor Types**: + +| Type | Transport | Use Case | +|------|-----------|----------| +| `agent-docker` | gRPC | Docker hosts with agent | +| 
`agent-compose` | gRPC | Compose hosts with agent | +| `ssh-remote` | SSH | Agentless Linux hosts | +| `winrm-remote` | WinRM | Agentless Windows hosts | +| `ecs-api` | AWS API | AWS ECS services | +| `nomad-api` | Nomad API | HashiCorp Nomad jobs | + +--- + +### Module: `runner-executor` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Script/hook execution in sandbox | +| **Dependencies** | `plugin-sandbox` | +| **Supported Scripts** | C# (.csx), Bash, PowerShell | + +**Hook Types**: +- `pre-deploy`: Run before deployment starts +- `post-deploy`: Run after deployment succeeds +- `on-failure`: Run when deployment fails +- `on-rollback`: Run during rollback + +--- + +### Module: `artifact-generator` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Generate immutable deployment artifacts | +| **Dependencies** | `release-manager`, `environment-manager` | +| **Data Entities** | `GeneratedArtifact`, `ComposeLock`, `VersionSticker` | + +**Generated Artifacts**: + +| Artifact Type | Description | +|---------------|-------------| +| `compose_lock` | `compose.stella.lock.yml` - Pinned digests | +| `script` | Compiled deployment script | +| `sticker` | `stella.version.json` - Version marker | +| `evidence` | Decision and execution evidence | +| `config` | Environment-specific config files | + +**Compose Lock File Generation**: +```typescript +class ComposeLockGenerator { + async generate( + release: Release, + environment: Environment, + targets: Target[] + ): Promise<GeneratedArtifact> { + + const services: Record<string, object> = {}; + + for (const component of release.components) { + services[component.componentName] = { + // CRITICAL: Always use digest, never tag + image: `${component.imageRepository}@${component.digest}`, + + // Environment variables + environment: this.mergeEnvironment( + environment.config.variables, + this.buildStellaEnv(release, environment) + ), + + // Labels for Stella tracking + labels: { + "stella.release.id": 
release.id, + "stella.release.name": release.name, + "stella.component.name": component.componentName, + "stella.component.digest": component.digest, + "stella.environment": environment.name, + "stella.deployed.at": new Date().toISOString(), + }, + }; + } + + const composeLock = { + version: "3.8", + services, + "x-stella": { + release_id: release.id, + release_name: release.name, + environment: environment.name, + generated_at: new Date().toISOString(), + inputs_hash: this.computeInputsHash(release, environment), + components: release.components.map(c => ({ + name: c.componentName, + digest: c.digest, + semver: c.semver, + })), + }, + }; + + const content = yaml.stringify(composeLock); + const hash = crypto.createHash("sha256").update(content).digest("hex"); + + return { + type: "compose_lock", + name: "compose.stella.lock.yml", + content: Buffer.from(content), + contentHash: `sha256:${hash}`, + }; + } +} +``` + +**Version Sticker Generation**: +```typescript +interface VersionSticker { + stella_version: "1.0"; + release_id: UUID; + release_name: string; + components: Array<{ + name: string; + digest: string; + semver: string; + tag: string; + image_repository: string; + }>; + environment: string; + environment_id: UUID; + deployed_at: string; + deployed_by: UUID; + promotion_id: UUID; + workflow_run_id: UUID; + evidence_packet_id: UUID; + evidence_packet_hash: string; + orchestrator_version: string; + source_ref?: { + commit_sha: string; + branch: string; + repository: string; + }; +} +``` + +--- + +### Module: `rollback-manager` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Rollback orchestration; previous state recovery | +| **Dependencies** | `deploy-orchestrator`, `target-registry` | + +**Rollback Strategies**: + +| Strategy | Description | +|----------|-------------| +| `to-previous` | Roll back to last successful deployment | +| `to-release` | Roll back to specific release ID | +| `to-sticker` | Roll back to version in 
sticker on target | + +**Rollback Flow**: +1. Identify rollback target (previous release or specified) +2. Create rollback deployment job +3. Execute deployment with rollback artifacts +4. Update target state and sticker +5. Record rollback evidence + +--- + +## Deployment Strategies + +### All-at-Once +Deploy to all targets simultaneously. + +```typescript +interface AllAtOnceConfig { + parallelism: number; // Max concurrent deployments (0 = unlimited) + continueOnFailure: boolean; // Continue if some targets fail + failureThreshold: number; // Max failures before abort +} +``` + +### Rolling +Deploy to targets sequentially with health checks. + +```typescript +interface RollingConfig { + batchSize: number; // Targets per batch + batchDelay: number; // Seconds between batches + healthCheckBetweenBatches: boolean; + rollbackOnFailure: boolean; + maxUnavailable: number; // Max targets unavailable at once +} +``` + +### Canary +Deploy to subset, verify, then proceed. + +```typescript +interface CanaryConfig { + canaryTargets: number; // Number or percentage for canary + canaryDuration: number; // Seconds to run canary + healthThreshold: number; // Required health percentage + autoPromote: boolean; // Auto-proceed if healthy + requireApproval: boolean; // Require manual approval +} +``` + +### Blue-Green +Deploy to B, switch traffic, retire A. 
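To make the gradual traffic shift concrete, here is a hypothetical helper (not part of the spec) that turns a percentage schedule such as `[10, 25, 50, 100]` into per-step blue/green traffic weights:

```typescript
// Illustrative only: each step is the percentage of traffic routed to the
// new (green) deployment; the remainder stays on the current (blue) one.
function shiftPlan(steps: number[]): Array<{ green: number; blue: number }> {
  return steps.map((pct) => {
    if (pct < 0 || pct > 100) throw new Error(`invalid shift step: ${pct}`);
    return { green: pct, blue: 100 - pct };
  });
}
```

The final step should always be 100, so blue receives no traffic and can be retired.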
+ +```typescript +interface BlueGreenConfig { + targetGroupA: UUID; // Current (blue) target group + targetGroupB: UUID; // New (green) target group + trafficShiftType: "instant" | "gradual"; + gradualShiftSteps?: number[]; // e.g., [10, 25, 50, 100] + rollbackOnHealthFailure: boolean; +} +``` + +--- + +## Rolling Deployment Algorithm + +```python +class RollingDeploymentExecutor: + def execute(self, job: DeploymentJob, config: RollingConfig) -> DeploymentResult: + targets = self.get_targets(job.environment_id) + batches = self.create_batches(targets, config.batch_size) + + deployed_targets = [] + failed_targets = [] + + for batch_index, batch in enumerate(batches): + self.log(f"Starting batch {batch_index + 1} of {len(batches)}") + + # Deploy batch in parallel + batch_results = self.deploy_batch(job, batch) + + for target, result in batch_results: + if result.success: + deployed_targets.append(target) + # Write version sticker + self.write_sticker(target, job.release) + else: + failed_targets.append(target) + + if config.rollback_on_failure: + # Rollback all deployed targets + self.rollback_targets(deployed_targets, job.previous_release) + return DeploymentResult( + success=False, + error=f"Batch {batch_index + 1} failed, rolled back", + deployed=deployed_targets, + failed=failed_targets, + rolled_back=deployed_targets + ) + + # Health check between batches + if config.health_check_between_batches and batch_index < len(batches) - 1: + health_result = self.check_batch_health(deployed_targets[-len(batch):]) + + if not health_result.healthy: + if config.rollback_on_failure: + self.rollback_targets(deployed_targets, job.previous_release) + return DeploymentResult( + success=False, + error=f"Health check failed after batch {batch_index + 1}", + deployed=deployed_targets, + failed=failed_targets, + rolled_back=deployed_targets + ) + + # Delay between batches + if config.batch_delay > 0 and batch_index < len(batches) - 1: + time.sleep(config.batch_delay) + + return 
DeploymentResult( + success=len(failed_targets) == 0, + deployed=deployed_targets, + failed=failed_targets + ) +``` + +--- + +## Database Schema + +```sql +-- Deployment Jobs +CREATE TABLE release.deployment_jobs ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + promotion_id UUID NOT NULL REFERENCES release.promotions(id), + release_id UUID NOT NULL REFERENCES release.releases(id), + environment_id UUID NOT NULL REFERENCES release.environments(id), + status VARCHAR(50) NOT NULL DEFAULT 'pending' CHECK (status IN ( + 'pending', 'running', 'succeeded', 'failed', 'cancelled', 'rolling_back', 'rolled_back' + )), + strategy VARCHAR(50) NOT NULL DEFAULT 'all-at-once', + started_at TIMESTAMPTZ, + completed_at TIMESTAMPTZ, + artifacts JSONB NOT NULL DEFAULT '[]', + rollback_of UUID REFERENCES release.deployment_jobs(id), + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW() +); + +CREATE INDEX idx_deployment_jobs_promotion ON release.deployment_jobs(promotion_id); +CREATE INDEX idx_deployment_jobs_status ON release.deployment_jobs(status); + +-- Deployment Tasks +CREATE TABLE release.deployment_tasks ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + job_id UUID NOT NULL REFERENCES release.deployment_jobs(id) ON DELETE CASCADE, + target_id UUID NOT NULL REFERENCES release.targets(id), + digest VARCHAR(100) NOT NULL, + status VARCHAR(50) NOT NULL DEFAULT 'pending' CHECK (status IN ( + 'pending', 'running', 'succeeded', 'failed', 'cancelled', 'skipped' + )), + agent_id UUID REFERENCES release.agents(id), + started_at TIMESTAMPTZ, + completed_at TIMESTAMPTZ, + exit_code INTEGER, + logs TEXT, + previous_digest VARCHAR(100), + sticker_written BOOLEAN NOT NULL DEFAULT FALSE, + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW() +); + +CREATE INDEX idx_deployment_tasks_job ON release.deployment_tasks(job_id); +CREATE INDEX idx_deployment_tasks_target ON release.deployment_tasks(target_id); +CREATE INDEX 
idx_deployment_tasks_status ON release.deployment_tasks(status); + +-- Generated Artifacts +CREATE TABLE release.generated_artifacts ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + deployment_job_id UUID REFERENCES release.deployment_jobs(id) ON DELETE CASCADE, + artifact_type VARCHAR(50) NOT NULL CHECK (artifact_type IN ( + 'compose_lock', 'script', 'sticker', 'evidence', 'config' + )), + name VARCHAR(255) NOT NULL, + content_hash VARCHAR(100) NOT NULL, + content BYTEA, -- for small artifacts + storage_ref VARCHAR(500), -- for large artifacts (S3, etc.) + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW() +); + +CREATE INDEX idx_generated_artifacts_job ON release.generated_artifacts(deployment_job_id); +``` + +--- + +## API Endpoints + +```yaml +# Deployment Jobs (mostly read-only; created by promotions) +GET /api/v1/deployment-jobs + Query: ?promotionId={uuid}&status={status}&environmentId={uuid} + Response: DeploymentJob[] + +GET /api/v1/deployment-jobs/{id} + Response: DeploymentJob (with tasks) + +GET /api/v1/deployment-jobs/{id}/tasks + Response: DeploymentTask[] + +GET /api/v1/deployment-jobs/{id}/tasks/{taskId} + Response: DeploymentTask (with logs) + +GET /api/v1/deployment-jobs/{id}/tasks/{taskId}/logs + Query: ?follow=true + Response: string | SSE stream + +GET /api/v1/deployment-jobs/{id}/artifacts + Response: GeneratedArtifact[] + +GET /api/v1/deployment-jobs/{id}/artifacts/{artifactId} + Response: binary (download) + +# Rollback +POST /api/v1/rollbacks + Body: { + environmentId: UUID, + strategy: "to-previous" | "to-release" | "to-sticker", + targetReleaseId?: UUID # for to-release strategy + } + Response: DeploymentJob (rollback job) + +GET /api/v1/rollbacks + Query: ?environmentId={uuid} + Response: DeploymentJob[] (rollback jobs only) +``` + +--- + +## References + +- [Module Overview](overview.md) +- [Agents Specification](agents.md) +- [Deployment 
Strategies](../deployment/strategies.md) +- [Artifact Generation](../deployment/artifacts.md) +- [API Documentation](../api/deployments.md) diff --git a/docs/modules/release-orchestrator/modules/environment-manager.md b/docs/modules/release-orchestrator/modules/environment-manager.md new file mode 100644 index 000000000..3b5b70a3c --- /dev/null +++ b/docs/modules/release-orchestrator/modules/environment-manager.md @@ -0,0 +1,418 @@ +# ENVMGR: Environment & Inventory Manager + +**Purpose**: Model environments, targets, agents, and their relationships. + +## Modules + +### Module: `environment-manager` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Environment CRUD, ordering, configuration, freeze windows | +| **Dependencies** | `authority` | +| **Data Entities** | `Environment`, `EnvironmentConfig`, `FreezeWindow` | +| **Events Produced** | `environment.created`, `environment.updated`, `environment.freeze_started`, `environment.freeze_ended` | + +**Key Operations**: +``` +CreateEnvironment(name, displayName, orderIndex, config) → Environment +UpdateEnvironment(id, config) → Environment +DeleteEnvironment(id) → void +SetFreezeWindow(environmentId, start, end, reason, exceptions) → FreezeWindow +ClearFreezeWindow(environmentId, windowId) → void +ListEnvironments(tenantId) → Environment[] +GetEnvironmentState(id) → EnvironmentState +``` + +**Environment Entity**: +```typescript +interface Environment { + id: UUID; + tenantId: UUID; + name: string; // "dev", "stage", "prod" + displayName: string; // "Development" + orderIndex: number; // 0, 1, 2 for promotion order + config: EnvironmentConfig; + freezeWindows: FreezeWindow[]; + requiredApprovals: number; // 0 for dev, 1+ for prod + requireSeparationOfDuties: boolean; + autoPromoteFrom: UUID | null; // auto-promote from this env + promotionPolicy: string; // OPA policy name + createdAt: DateTime; + updatedAt: DateTime; +} + +interface EnvironmentConfig { + variables: Record; // 
env-specific variables + secrets: SecretReference[]; // vault references + registryOverrides: RegistryOverride[]; // per-env registry + agentLabels: string[]; // required agent labels + deploymentTimeout: number; // seconds + healthCheckConfig: HealthCheckConfig; +} + +interface FreezeWindow { + id: UUID; + start: DateTime; + end: DateTime; + reason: string; + createdBy: UUID; + exceptions: UUID[]; // users who can override +} +``` + +--- + +### Module: `target-registry` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Deployment target inventory; capability tracking | +| **Dependencies** | `environment-manager`, `agent-manager` | +| **Data Entities** | `Target`, `TargetGroup`, `TargetCapability` | +| **Events Produced** | `target.created`, `target.updated`, `target.deleted`, `target.health_changed` | + +**Target Types** (plugin-provided): + +| Type | Description | +|------|-------------| +| `docker_host` | Single Docker host | +| `compose_host` | Docker Compose host | +| `ssh_remote` | Generic SSH target | +| `winrm_remote` | Windows remote target | +| `ecs_service` | AWS ECS service | +| `nomad_job` | HashiCorp Nomad job | + +**Target Entity**: +```typescript +interface Target { + id: UUID; + tenantId: UUID; + environmentId: UUID; + name: string; // "prod-web-01" + targetType: string; // "docker_host" + connection: TargetConnection; // type-specific + capabilities: TargetCapability[]; + labels: Record<string, string>; // for grouping + healthStatus: HealthStatus; + lastHealthCheck: DateTime; + deploymentDirectory: string; // where artifacts are placed + currentDigest: string | null; // what's currently deployed + agentId: UUID | null; // assigned agent +} + +interface TargetConnection { + // Common fields + host: string; + port: number; + + // Type-specific (examples) + // docker_host: + dockerSocket?: string; + tlsCert?: SecretReference; + + // ssh_remote: + username?: string; + privateKey?: SecretReference; + + // ecs_service: + cluster?: 
string; + service?: string; + region?: string; + roleArn?: string; +} + +interface TargetGroup { + id: UUID; + tenantId: UUID; + environmentId: UUID; + name: string; + labels: Record<string, string>; + createdAt: DateTime; +} +``` + +--- + +### Module: `agent-manager` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Agent registration, heartbeat, capability advertisement | +| **Dependencies** | `authority` (for agent tokens) | +| **Data Entities** | `Agent`, `AgentCapability`, `AgentHeartbeat` | +| **Events Produced** | `agent.registered`, `agent.online`, `agent.offline`, `agent.capability_changed` | + +**Agent Lifecycle**: +1. Agent starts, requests registration token from Authority +2. Agent registers with capabilities and labels +3. Agent sends heartbeats (default: 30s interval) +4. Agent pulls tasks from task queue +5. Agent reports task completion/failure + +**Agent Entity**: +```typescript +interface Agent { + id: UUID; + tenantId: UUID; + name: string; + version: string; + capabilities: AgentCapability[]; + labels: Record<string, string>; + status: "online" | "offline" | "degraded"; + lastHeartbeat: DateTime; + assignedTargets: UUID[]; + resourceUsage: ResourceUsage; +} + +interface AgentCapability { + type: string; // "docker", "compose", "ssh", "winrm" + version: string; // capability version + config: object; // capability-specific config +} + +interface ResourceUsage { + cpuPercent: number; + memoryPercent: number; + diskPercent: number; + activeTasks: number; +} +``` + +**Agent Registration Protocol**: +``` +1. Admin generates registration token (one-time use) + POST /api/v1/admin/agent-tokens + → { token: "reg_xxx", expiresAt: "..." } + +2. Agent starts with registration token + ./stella-agent --register --token=reg_xxx + +3. Agent requests mTLS certificate + POST /api/v1/agents/register + Headers: X-Registration-Token: reg_xxx + Body: { name, version, capabilities, csr } + → { agentId, certificate, caCertificate } + +4. 
Agent establishes mTLS connection + Uses issued certificate for all subsequent requests + +5. Agent requests short-lived JWT for task execution + POST /api/v1/agents/token (over mTLS) + → { token, expiresIn: 3600 } // 1 hour +``` + +--- + +### Module: `inventory-sync` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Drift detection; expected vs actual state reconciliation | +| **Dependencies** | `target-registry`, `agent-manager` | +| **Events Produced** | `inventory.drift_detected`, `inventory.reconciled` | + +**Drift Detection Process**: +1. Read `stella.version.json` from target deployment directory +2. Compare with expected state in database +3. Flag discrepancies (digest mismatch, missing sticker, unexpected files) +4. Report on dashboard + +**Drift Detection Types**: + +| Drift Type | Description | Severity | +|------------|-------------|----------| +| `digest_mismatch` | Running digest differs from expected | Critical | +| `missing_sticker` | No version sticker found on target | Warning | +| `stale_sticker` | Sticker timestamp older than last deployment | Warning | +| `orphan_container` | Container not managed by Stella | Info | +| `extra_files` | Unexpected files in deployment directory | Info | + +--- + +## Cache Eviction Policies + +Environment configurations and target states are cached to improve performance. 
**All caches MUST have bounded size and TTL-based eviction**: + +| Cache Type | Purpose | TTL | Max Size | Eviction Strategy | +|-----------|---------|-----|----------|-------------------| +| **Environment Configs** | Environment configuration data | 30 minutes | 500 entries | Sliding expiration | +| **Target Health** | Target health status | 5 minutes | 2,000 entries | Sliding expiration | +| **Agent Capabilities** | Agent capability advertisement | 10 minutes | 1,000 entries | Sliding expiration | +| **Freeze Windows** | Active freeze window checks | 15 minutes | 100 entries | Absolute expiration | + +**Implementation**: +```csharp +public class EnvironmentConfigCache +{ + private readonly MemoryCache _cache; + + public EnvironmentConfigCache() + { + _cache = new MemoryCache(new MemoryCacheOptions + { + SizeLimit = 500 // Max 500 environment configs + }); + } + + public void CacheConfig(Guid environmentId, EnvironmentConfig config) + { + _cache.Set(environmentId, config, new MemoryCacheEntryOptions + { + Size = 1, + SlidingExpiration = TimeSpan.FromMinutes(30) // 30-minute TTL + }); + } + + public EnvironmentConfig? GetCachedConfig(Guid environmentId) + => _cache.Get<EnvironmentConfig>(environmentId); + + public void InvalidateConfig(Guid environmentId) + => _cache.Remove(environmentId); +} +``` + +**Cache Invalidation**: +- Environment configs: Invalidate on update +- Target health: Invalidate on health check or deployment +- Agent capabilities: Invalidate on capability change event +- Freeze windows: Invalidate on window creation/deletion + +**Reference**: See [Implementation Guide](../implementation-guide.md#caching) for cache implementation patterns. 
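For readers implementing these eviction policies outside .NET, here is a minimal TypeScript sketch of a size-bounded cache with sliding expiration. It is illustrative only: the `BoundedTtlCache` name and the evict-oldest-insertion choice are assumptions, not part of the spec.

```typescript
// Sketch: bounded cache with sliding-expiration TTL, mirroring the
// policy table above (e.g. 500 entries / 30-minute TTL for env configs).
class BoundedTtlCache<K, V> {
  private entries = new Map<K, { value: V; expiresAt: number }>();

  constructor(private maxSize: number, private ttlMs: number) {}

  set(key: K, value: V): void {
    this.entries.delete(key); // re-insert so the key moves to newest position
    if (this.entries.size >= this.maxSize) {
      // Size bound hit: evict the oldest entry (Map preserves insertion order).
      const oldest = this.entries.keys().next().value as K;
      this.entries.delete(oldest);
    }
    this.entries.set(key, { value, expiresAt: Date.now() + this.ttlMs });
  }

  get(key: K): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) {
      this.entries.delete(key); // lazy TTL eviction on read
      return undefined;
    }
    // Sliding expiration: a read extends the entry's lifetime.
    entry.expiresAt = Date.now() + this.ttlMs;
    return entry.value;
  }
}
```

Explicit invalidation (e.g. on `environment.updated` events) is just `entries.delete(key)`, matching the invalidation rules listed above.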
+ +--- + +## Database Schema + +```sql +-- Environments +CREATE TABLE release.environments ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + name VARCHAR(100) NOT NULL, + display_name VARCHAR(255) NOT NULL, + order_index INTEGER NOT NULL, + config JSONB NOT NULL DEFAULT '{}', + freeze_windows JSONB NOT NULL DEFAULT '[]', + required_approvals INTEGER NOT NULL DEFAULT 0, + require_sod BOOLEAN NOT NULL DEFAULT FALSE, + auto_promote_from UUID REFERENCES release.environments(id), + promotion_policy VARCHAR(255), + deployment_timeout INTEGER NOT NULL DEFAULT 600, + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + UNIQUE (tenant_id, name) +); + +CREATE INDEX idx_environments_tenant ON release.environments(tenant_id); +CREATE INDEX idx_environments_order ON release.environments(tenant_id, order_index); + +-- Target Groups +CREATE TABLE release.target_groups ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + environment_id UUID NOT NULL REFERENCES release.environments(id) ON DELETE CASCADE, + name VARCHAR(255) NOT NULL, + labels JSONB NOT NULL DEFAULT '{}', + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + UNIQUE (tenant_id, environment_id, name) +); + +-- Targets +CREATE TABLE release.targets ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + environment_id UUID NOT NULL REFERENCES release.environments(id) ON DELETE CASCADE, + target_group_id UUID REFERENCES release.target_groups(id), + name VARCHAR(255) NOT NULL, + target_type VARCHAR(100) NOT NULL, + connection JSONB NOT NULL, + capabilities JSONB NOT NULL DEFAULT '[]', + labels JSONB NOT NULL DEFAULT '{}', + deployment_directory VARCHAR(500), + health_status VARCHAR(50) NOT NULL DEFAULT 'unknown', + last_health_check TIMESTAMPTZ, + current_digest VARCHAR(100), + 
agent_id UUID REFERENCES release.agents(id), + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + UNIQUE (tenant_id, environment_id, name) +); + +CREATE INDEX idx_targets_tenant_env ON release.targets(tenant_id, environment_id); +CREATE INDEX idx_targets_type ON release.targets(target_type); +CREATE INDEX idx_targets_labels ON release.targets USING GIN (labels); + +-- Agents +CREATE TABLE release.agents ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + name VARCHAR(255) NOT NULL, + version VARCHAR(50) NOT NULL, + capabilities JSONB NOT NULL DEFAULT '[]', + labels JSONB NOT NULL DEFAULT '{}', + status VARCHAR(50) NOT NULL DEFAULT 'offline', + last_heartbeat TIMESTAMPTZ, + resource_usage JSONB, + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + UNIQUE (tenant_id, name) +); + +CREATE INDEX idx_agents_tenant ON release.agents(tenant_id); +CREATE INDEX idx_agents_status ON release.agents(status); +CREATE INDEX idx_agents_capabilities ON release.agents USING GIN (capabilities); +``` + +--- + +## API Endpoints + +```yaml +# Environments +POST /api/v1/environments +GET /api/v1/environments +GET /api/v1/environments/{id} +PUT /api/v1/environments/{id} +DELETE /api/v1/environments/{id} + +# Freeze Windows +POST /api/v1/environments/{envId}/freeze-windows +GET /api/v1/environments/{envId}/freeze-windows +DELETE /api/v1/environments/{envId}/freeze-windows/{windowId} + +# Target Groups +POST /api/v1/environments/{envId}/target-groups +GET /api/v1/environments/{envId}/target-groups +GET /api/v1/target-groups/{id} +PUT /api/v1/target-groups/{id} +DELETE /api/v1/target-groups/{id} + +# Targets +POST /api/v1/targets +GET /api/v1/targets +GET /api/v1/targets/{id} +PUT /api/v1/targets/{id} +DELETE /api/v1/targets/{id} +POST /api/v1/targets/{id}/health-check +GET /api/v1/targets/{id}/sticker +GET 
/api/v1/targets/{id}/drift + +# Agents +POST /api/v1/agents/register +GET /api/v1/agents +GET /api/v1/agents/{id} +PUT /api/v1/agents/{id} +DELETE /api/v1/agents/{id} +POST /api/v1/agents/{id}/heartbeat +POST /api/v1/agents/{id}/tasks/{taskId}/complete +``` + +--- + +## References + +- [Module Overview](overview.md) +- [Agent Specification](agents.md) +- [API Documentation](../api/environments.md) +- [Agent Security](../security/agent-security.md) diff --git a/docs/modules/release-orchestrator/modules/evidence.md b/docs/modules/release-orchestrator/modules/evidence.md new file mode 100644 index 000000000..38bc30410 --- /dev/null +++ b/docs/modules/release-orchestrator/modules/evidence.md @@ -0,0 +1,575 @@ +# RELEVI: Release Evidence + +**Purpose**: Cryptographically sealed evidence packets for audit-grade release governance. + +## Modules + +### Module: `evidence-collector` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Evidence aggregation; packet composition | +| **Dependencies** | `promotion-manager`, `deploy-orchestrator`, `decision-engine` | +| **Data Entities** | `EvidencePacket`, `EvidenceContent` | +| **Events Produced** | `evidence.collected`, `evidence.packet_created` | + +**Evidence Packet Structure**: +```typescript +interface EvidencePacket { + id: UUID; + tenantId: UUID; + promotionId: UUID; + packetType: EvidencePacketType; + content: EvidenceContent; + contentHash: string; // SHA-256 of content + signature: string; // Cryptographic signature + signerKeyRef: string; // Reference to signing key + createdAt: DateTime; + // Note: No updatedAt - packets are immutable +} + +type EvidencePacketType = + | "release_decision" // Promotion decision evidence + | "deployment" // Deployment execution evidence + | "rollback" // Rollback evidence + | "ab_promotion"; // A/B promotion evidence + +interface EvidenceContent { + // Metadata + version: "1.0"; + generatedAt: DateTime; + generatorVersion: string; + + // What + release: { 
+    id: UUID;
+    name: string;
+    components: Array<{
+      name: string;
+      digest: string;
+      semver: string;
+      imageRepository: string;
+    }>;
+    sourceRef: SourceReference | null;
+  };
+
+  // Where
+  environment: {
+    id: UUID;
+    name: string;
+    targets: Array<{
+      id: UUID;
+      name: string;
+      type: string;
+    }>;
+  };
+
+  // Who
+  actors: {
+    requester: {
+      id: UUID;
+      name: string;
+      email: string;
+    };
+    approvers: Array<{
+      id: UUID;
+      name: string;
+      action: string;
+      at: DateTime;
+      comment: string | null;
+    }>;
+  };
+
+  // Why
+  decision: {
+    result: "allow" | "deny";
+    gates: Array<{
+      type: string;
+      name: string;
+      status: string;
+      message: string;
+      details: Record<string, unknown>;
+    }>;
+    reasons: string[];
+  };
+
+  // How
+  execution: {
+    workflowRunId: UUID | null;
+    deploymentJobId: UUID | null;
+    artifacts: Array<{
+      type: string;
+      name: string;
+      contentHash: string;
+    }>;
+    logs: string | null; // Compressed/truncated
+  };
+
+  // When
+  timeline: {
+    requestedAt: DateTime;
+    decidedAt: DateTime | null;
+    startedAt: DateTime | null;
+    completedAt: DateTime | null;
+  };
+
+  // Integrity
+  inputsHash: string; // Hash of all inputs for replay
+  previousEvidenceId: UUID | null; // Chain to previous evidence
+}
+```
+
+---
+
+### Module: `evidence-signer`
+
+| Aspect | Specification |
+|--------|---------------|
+| **Responsibility** | Cryptographic signing of evidence packets |
+| **Dependencies** | `authority`, `vault` (for key storage) |
+| **Algorithms** | RS256, ES256, Ed25519 |
+
+**Signing Process**:
+```typescript
+class EvidenceSigner {
+  async sign(content: EvidenceContent): Promise<SignedEvidence> {
+    // 1. Canonicalize content (RFC 8785)
+    const canonicalJson = canonicalize(content);
+
+    // 2. Compute content hash
+    const contentHash = crypto
+      .createHash("sha256")
+      .update(canonicalJson)
+      .digest("hex");
+
+    // 3. Get signing key from vault
+    const keyRef = await this.getActiveSigningKey();
+    const privateKey = await this.vault.getPrivateKey(keyRef);
+
+    // 4.
Sign the content hash
+    const signature = await this.signWithKey(contentHash, privateKey);
+
+    return {
+      content,
+      contentHash: `sha256:${contentHash}`,
+      signature: base64Encode(signature),
+      signerKeyRef: keyRef,
+      algorithm: this.config.signatureAlgorithm,
+    };
+  }
+
+  async verify(packet: EvidencePacket): Promise<VerificationResult> {
+    // 1. Canonicalize stored content
+    const canonicalJson = canonicalize(packet.content);
+
+    // 2. Verify content hash
+    const computedHash = crypto
+      .createHash("sha256")
+      .update(canonicalJson)
+      .digest("hex");
+
+    if (`sha256:${computedHash}` !== packet.contentHash) {
+      return { valid: false, error: "Content hash mismatch" };
+    }
+
+    // 3. Get public key
+    const publicKey = await this.vault.getPublicKey(packet.signerKeyRef);
+
+    // 4. Verify signature
+    const signatureValid = await this.verifySignature(
+      computedHash,
+      base64Decode(packet.signature),
+      publicKey
+    );
+
+    return {
+      valid: signatureValid,
+      signerKeyRef: packet.signerKeyRef,
+      signedAt: packet.createdAt,
+    };
+  }
+}
+```
+
+---
+
+### Module: `sticker-writer`
+
+| Aspect | Specification |
+|--------|---------------|
+| **Responsibility** | Version sticker generation and placement |
+| **Dependencies** | `deploy-orchestrator`, `agent-manager` |
+| **Data Entities** | `VersionSticker` |
+
+**Version Sticker Schema**:
+```typescript
+interface VersionSticker {
+  stella_version: "1.0";
+
+  // Release identity
+  release_id: UUID;
+  release_name: string;
+
+  // Component details
+  components: Array<{
+    name: string;
+    digest: string;
+    semver: string;
+    tag: string;
+    image_repository: string;
+  }>;
+
+  // Deployment context
+  environment: string;
+  environment_id: UUID;
+  deployed_at: string; // ISO 8601
+  deployed_by: UUID;
+
+  // Traceability
+  promotion_id: UUID;
+  workflow_run_id: UUID;
+
+  // Evidence chain
+  evidence_packet_id: UUID;
+  evidence_packet_hash: string;
+  policy_decision_hash: string;
+
+  // Orchestrator info
+  orchestrator_version: string;
+
+  // Source
reference + source_ref?: { + commit_sha: string; + branch: string; + repository: string; + }; +} +``` + +**Sticker Placement**: +- Written to `/var/stella/version.json` on each target +- Atomic write (write to temp, rename) +- Read during drift detection +- Verified against expected state + +--- + +### Module: `audit-exporter` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Compliance report generation; evidence export | +| **Dependencies** | `evidence-collector` | +| **Export Formats** | JSON, PDF, CSV | + +**Audit Report Types**: + +| Report Type | Description | +|-------------|-------------| +| `release_audit` | Full audit trail for a release | +| `environment_audit` | All deployments to an environment | +| `compliance_summary` | Summary for compliance review | +| `change_log` | Chronological change log | + +**Report Generation**: +```typescript +interface AuditReportRequest { + type: AuditReportType; + scope: { + releaseId?: UUID; + environmentId?: UUID; + from?: DateTime; + to?: DateTime; + }; + format: "json" | "pdf" | "csv"; + options?: { + includeDecisionDetails: boolean; + includeApproverDetails: boolean; + includeLogs: boolean; + includeArtifacts: boolean; + }; +} + +interface AuditReport { + id: UUID; + type: AuditReportType; + scope: ReportScope; + generatedAt: DateTime; + generatedBy: UUID; + + summary: { + totalPromotions: number; + successfulDeployments: number; + failedDeployments: number; + rollbacks: number; + averageDeploymentTime: number; + }; + + entries: AuditEntry[]; + + // For compliance + signatureChain: { + valid: boolean; + verifiedPackets: number; + invalidPackets: number; + }; +} +``` + +--- + +## Immutability Enforcement + +Evidence packets are append-only. 
This is enforced at multiple levels:
+
+### Database Level
+```sql
+-- Evidence packets table with no UPDATE/DELETE
+CREATE TABLE release.evidence_packets (
+  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+  tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
+  promotion_id UUID NOT NULL REFERENCES release.promotions(id),
+  packet_type VARCHAR(50) NOT NULL CHECK (packet_type IN (
+    'release_decision', 'deployment', 'rollback', 'ab_promotion'
+  )),
+  content JSONB NOT NULL,
+  content_hash VARCHAR(100) NOT NULL,
+  signature TEXT,
+  signer_key_ref VARCHAR(255),
+  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
+  -- Note: No updated_at column; immutable by design
+);
+
+-- Append-only enforcement via trigger
+CREATE OR REPLACE FUNCTION prevent_evidence_modification()
+RETURNS TRIGGER AS $$
+BEGIN
+  RAISE EXCEPTION 'Evidence packets are immutable and cannot be modified or deleted';
+END;
+$$ LANGUAGE plpgsql;
+
+CREATE TRIGGER evidence_packets_immutable
+BEFORE UPDATE OR DELETE ON release.evidence_packets
+FOR EACH ROW EXECUTE FUNCTION prevent_evidence_modification();
+
+-- Revoke UPDATE/DELETE from application role
+REVOKE UPDATE, DELETE ON release.evidence_packets FROM app_role;
+
+-- Version stickers table
+CREATE TABLE release.version_stickers (
+  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+  tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
+  target_id UUID NOT NULL REFERENCES release.targets(id),
+  release_id UUID NOT NULL REFERENCES release.releases(id),
+  promotion_id UUID NOT NULL REFERENCES release.promotions(id),
+  sticker_content JSONB NOT NULL,
+  content_hash VARCHAR(100) NOT NULL,
+  written_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
+  verified_at TIMESTAMPTZ,
+  drift_detected BOOLEAN NOT NULL DEFAULT FALSE
+);
+
+CREATE INDEX idx_version_stickers_target ON release.version_stickers(target_id);
+CREATE INDEX idx_version_stickers_release ON release.version_stickers(release_id);
+CREATE INDEX idx_evidence_packets_promotion ON
release.evidence_packets(promotion_id);
+CREATE INDEX idx_evidence_packets_created ON release.evidence_packets(created_at DESC);
+```
+
+### Application Level
+```csharp
+// Evidence service enforces immutability
+public sealed class EvidenceService
+{
+    // Only Create method - no Update or Delete
+    public async Task<EvidencePacket> CreateAsync(
+        EvidenceContent content,
+        CancellationToken ct)
+    {
+        // Sign content
+        var signed = await _signer.SignAsync(content, ct);
+
+        // Store (append-only)
+        var packet = new EvidencePacket
+        {
+            Id = Guid.NewGuid(),
+            TenantId = content.TenantId,
+            PromotionId = content.PromotionId,
+            PacketType = content.PacketType,
+            Content = content,
+            ContentHash = signed.ContentHash,
+            Signature = signed.Signature,
+            SignerKeyRef = signed.SignerKeyRef,
+            CreatedAt = DateTime.UtcNow,
+        };
+
+        await _repository.InsertAsync(packet, ct);
+        return packet;
+    }
+
+    // Read methods only (signatures shown)
+    public Task<EvidencePacket?> GetAsync(Guid id, CancellationToken ct);
+    public Task<IReadOnlyList<EvidencePacket>> ListAsync(
+        EvidenceFilter filter, CancellationToken ct);
+    public Task<VerificationResult> VerifyAsync(
+        Guid id, CancellationToken ct);
+
+    // No Update or Delete methods exist
+}
+```
+
+---
+
+## Evidence Chain
+
+Evidence packets form a verifiable chain:
+
+```
+┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
+│  Evidence #1    │     │  Evidence #2    │     │  Evidence #3    │
+│  (Dev Deploy)   │────►│  (Stage Deploy) │────►│  (Prod Deploy)  │
+│                 │     │                 │     │                 │
+│ prevEvidenceId: │     │ prevEvidenceId: │     │ prevEvidenceId: │
+│   null          │     │   #1            │     │   #2            │
+│                 │     │                 │     │                 │
+│ contentHash:    │     │ contentHash:    │     │ contentHash:    │
+│   sha256:abc... │     │   sha256:def... │     │   sha256:ghi... │
+└─────────────────┘     └─────────────────┘     └─────────────────┘
+```
+
+**Chain Verification**:
+```typescript
+async function verifyEvidenceChain(releaseId: UUID): Promise<ChainVerificationResult> {
+  const packets = await getPacketsForRelease(releaseId);
+  const results: PacketVerificationResult[] = [];
+
+  let previousHash: string | null = null;
+
+  for (const packet of packets) {
+    // 1.
Verify packet signature + const signatureValid = await verifySignature(packet); + + // 2. Verify content hash + const contentValid = await verifyContentHash(packet); + + // 3. Verify chain link + const chainValid = packet.content.previousEvidenceId === null + ? previousHash === null + : await verifyPreviousLink(packet, previousHash); + + results.push({ + packetId: packet.id, + signatureValid, + contentValid, + chainValid, + valid: signatureValid && contentValid && chainValid, + }); + + previousHash = packet.contentHash; + } + + return { + valid: results.every(r => r.valid), + packets: results, + }; +} +``` + +--- + +## API Endpoints + +```yaml +# Evidence Packets +GET /api/v1/evidence-packets + Query: ?promotionId={uuid}&type={type}&from={date}&to={date} + Response: EvidencePacket[] + +GET /api/v1/evidence-packets/{id} + Response: EvidencePacket (full content) + +GET /api/v1/evidence-packets/{id}/verify + Response: VerificationResult + +GET /api/v1/evidence-packets/{id}/download + Query: ?format={json|pdf} + Response: binary + +# Evidence Chain +GET /api/v1/releases/{id}/evidence-chain + Response: EvidenceChain + +GET /api/v1/releases/{id}/evidence-chain/verify + Response: ChainVerificationResult + +# Audit Reports +POST /api/v1/audit-reports + Body: { + type: "release" | "environment" | "compliance", + scope: { releaseId?, environmentId?, from?, to? }, + format: "json" | "pdf" | "csv" + } + Response: { reportId: UUID, status: "generating" } + +GET /api/v1/audit-reports/{id} + Response: { status, downloadUrl? 
}
+
+GET /api/v1/audit-reports/{id}/download
+  Response: binary
+
+# Version Stickers
+GET /api/v1/version-stickers
+  Query: ?targetId={uuid}&releaseId={uuid}
+  Response: VersionSticker[]
+
+GET /api/v1/version-stickers/{id}
+  Response: VersionSticker
+```
+
+---
+
+## Deterministic Replay
+
+Evidence packets enable deterministic replay. Given the same inputs and policy version, the same decision is produced:
+
+```typescript
+async function replayDecision(evidencePacket: EvidencePacket): Promise<ReplayResult> {
+  const content = evidencePacket.content;
+
+  // 1. Verify inputs hash
+  const currentInputsHash = computeInputsHash(
+    content.release,
+    content.environment,
+    content.decision.gates
+  );
+
+  if (currentInputsHash !== content.inputsHash) {
+    return { valid: false, error: "Inputs have changed since original decision" };
+  }
+
+  // 2. Re-evaluate decision with same inputs
+  const replayedDecision = await evaluateDecision(
+    content.release,
+    content.environment,
+    { asOf: content.timeline.decidedAt } // Use policy version from that time
+  );
+
+  // 3. Compare decisions
+  const decisionsMatch = replayedDecision.result === content.decision.result;
+
+  return {
+    valid: decisionsMatch,
+    originalDecision: content.decision.result,
+    replayedDecision: replayedDecision.result,
+    differences: decisionsMatch ?
[] : computeDifferences(content.decision, replayedDecision), + }; +} +``` + +--- + +## References + +- [Module Overview](overview.md) +- [Design Principles](../design/principles.md) +- [Security Architecture](../security/overview.md) +- [Evidence Schema](../appendices/evidence-schema.md) diff --git a/docs/modules/release-orchestrator/modules/integration-hub.md b/docs/modules/release-orchestrator/modules/integration-hub.md new file mode 100644 index 000000000..7db9acf90 --- /dev/null +++ b/docs/modules/release-orchestrator/modules/integration-hub.md @@ -0,0 +1,373 @@ +# INTHUB: Integration Hub + +**Purpose**: Central management of all external integrations (SCM, CI, registries, vaults, targets). + +## Modules + +### Module: `integration-manager` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | CRUD for integration instances; plugin type registry | +| **Dependencies** | `plugin-registry`, `authority` (for credentials) | +| **Data Entities** | `Integration`, `IntegrationType`, `IntegrationCredential` | +| **Events Produced** | `integration.created`, `integration.updated`, `integration.deleted`, `integration.health_changed` | +| **Events Consumed** | `plugin.registered`, `plugin.unregistered` | + +**Key Operations**: +``` +CreateIntegration(type, name, config, credentials) → Integration +UpdateIntegration(id, config, credentials) → Integration +DeleteIntegration(id) → void +TestConnection(id) → ConnectionTestResult +DiscoverResources(id, resourceType) → Resource[] +GetIntegrationHealth(id) → HealthStatus +ListIntegrations(filter) → Integration[] +``` + +**Integration Entity**: +```typescript +interface Integration { + id: UUID; + tenantId: UUID; + type: string; // "scm.github", "registry.harbor" + name: string; // user-defined name + config: IntegrationConfig; // type-specific config + credentialId: UUID; // reference to vault + healthStatus: HealthStatus; + lastHealthCheck: DateTime; + createdAt: DateTime; + updatedAt: DateTime; +} + 
+interface IntegrationConfig {
+  endpoint: string;
+  authMode: "token" | "oauth" | "mtls" | "iam";
+  timeout: number;
+  retryPolicy: RetryPolicy;
+  customHeaders?: Record<string, string>;
+  // Type-specific fields added by plugin
+  [key: string]: any;
+}
+```
+
+---
+
+### Module: `connection-profiles`
+
+| Aspect | Specification |
+|--------|---------------|
+| **Responsibility** | Default settings management; "last used" pattern |
+| **Dependencies** | `integration-manager` |
+| **Data Entities** | `ConnectionProfile`, `ProfileTemplate` |
+
+**Behavior**: When a user adds a new integration instance:
+1. The wizard defaults to the last used endpoint, auth mode, and network settings
+2. Secrets are **never** auto-reused (explicit confirmation required)
+3. The user can save the settings as a named profile for reuse
+
+**Profile Entity**:
+```typescript
+interface ConnectionProfile {
+  id: UUID;
+  tenantId: UUID;
+  name: string;              // "Production GitHub"
+  integrationType: string;
+  defaultConfig: Partial<IntegrationConfig>;
+  isDefault: boolean;
+  lastUsedAt: DateTime;
+  createdBy: UUID;
+}
+```
+
+---
+
+### Module: `connector-runtime`
+
+| Aspect | Specification |
+|--------|---------------|
+| **Responsibility** | Execute plugin connector logic in a controlled environment |
+| **Dependencies** | `plugin-loader`, `plugin-sandbox` |
+| **Protocol** | gRPC (preferred) or HTTP/REST |
+
+**Connector Interface** (implemented by plugins):
+```protobuf
+service Connector {
+  // Connection management
+  rpc TestConnection(TestConnectionRequest) returns (TestConnectionResponse);
+  rpc GetHealth(HealthRequest) returns (HealthResponse);
+
+  // Resource discovery
+  rpc DiscoverResources(DiscoverRequest) returns (DiscoverResponse);
+  rpc ListRepositories(ListReposRequest) returns (ListReposResponse);
+  rpc ListBranches(ListBranchesRequest) returns (ListBranchesResponse);
+  rpc ListTags(ListTagsRequest) returns (ListTagsResponse);
+
+  // Registry operations
+  rpc ResolveTagToDigest(ResolveRequest) returns (ResolveResponse);
+  rpc
FetchManifest(ManifestRequest) returns (ManifestResponse);
+  rpc VerifyDigest(VerifyRequest) returns (VerifyResponse);
+
+  // Secrets operations
+  rpc GetSecretsRef(SecretsRequest) returns (SecretsResponse);
+  rpc FetchSecret(FetchSecretRequest) returns (FetchSecretResponse);
+
+  // Workflow step execution
+  rpc ExecuteStep(StepRequest) returns (stream StepResponse);
+  rpc CancelStep(CancelRequest) returns (CancelResponse);
+}
+```
+
+**Request/Response Types**:
+```protobuf
+message TestConnectionRequest {
+  string integration_id = 1;
+  map<string, string> config = 2;
+  string credential_ref = 3;
+}
+
+message TestConnectionResponse {
+  bool success = 1;
+  string error_message = 2;
+  map<string, string> details = 3;
+  int64 latency_ms = 4;
+}
+
+message ResolveRequest {
+  string integration_id = 1;
+  string image_ref = 2;       // "myapp:v2.3.1"
+}
+
+message ResolveResponse {
+  string digest = 1;          // "sha256:abc123..."
+  string manifest_type = 2;
+  int64 size_bytes = 3;
+  google.protobuf.Timestamp pushed_at = 4;
+}
+```
+
+---
+
+### Module: `doctor-checks`
+
+| Aspect | Specification |
+|--------|---------------|
+| **Responsibility** | Integration health diagnostics; troubleshooting |
+| **Dependencies** | `integration-manager`, `connector-runtime` |
+
+**Doctor Check Types**:
+
+| Check | Purpose | Pass Criteria |
+|-------|---------|---------------|
+| **Connectivity** | Can reach endpoint | TCP connect succeeds |
+| **TLS** | Certificate valid | Chain validates, not expired |
+| **Authentication** | Credentials valid | Auth request succeeds |
+| **Authorization** | Permissions sufficient | Required scopes present |
+| **Version** | API version supported | Version in supported range |
+| **Rate Limit** | Quota available | >10% remaining |
+| **Latency** | Response time acceptable | <5s p99 |
+
+**Doctor Check Output**:
+```typescript
+interface DoctorCheckResult {
+  checkType: string;
+  status: "pass" | "warn" | "fail";
+  message: string;
+  details: Record<string, unknown>;
+  suggestions: string[];
+  runAt:
DateTime;
+  durationMs: number;
+}
+
+interface DoctorReport {
+  integrationId: UUID;
+  overallStatus: "healthy" | "degraded" | "unhealthy";
+  checks: DoctorCheckResult[];
+  generatedAt: DateTime;
+}
+```
+
+---
+
+## Cache Eviction Policies
+
+Integration health status and connector results are cached to reduce load on external systems. **All caches MUST have bounded size and TTL-based eviction**:
+
+| Cache Type | Purpose | TTL | Max Size | Eviction Strategy |
+|-----------|---------|-----|----------|-------------------|
+| **Health Checks** | Integration health status | 5 minutes | 1,000 entries | Sliding expiration |
+| **Connection Tests** | Test connection results | 2 minutes | 500 entries | Sliding expiration |
+| **Resource Discovery** | Discovered resources (repos, tags) | 10 minutes | 5,000 entries | Sliding expiration |
+| **Tag Resolution** | Tag → digest mappings | 1 hour | 10,000 entries | Absolute expiration |
+
+**Implementation**:
+```csharp
+public class IntegrationHealthCache
+{
+    private readonly MemoryCache _cache;
+
+    public IntegrationHealthCache()
+    {
+        _cache = new MemoryCache(new MemoryCacheOptions
+        {
+            SizeLimit = 1_000 // Max 1,000 integration health entries
+        });
+    }
+
+    public void CacheHealthStatus(Guid integrationId, HealthStatus status)
+    {
+        _cache.Set(integrationId, status, new MemoryCacheEntryOptions
+        {
+            Size = 1,
+            SlidingExpiration = TimeSpan.FromMinutes(5) // 5-minute TTL
+        });
+    }
+
+    public HealthStatus? GetCachedHealthStatus(Guid integrationId)
+        => _cache.Get<HealthStatus?>(integrationId);
+}
+```
+
+**Reference**: See [Implementation Guide](../implementation-guide.md#caching) for cache implementation patterns.
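The sliding-expiration policy in the table above can also be sketched in a few lines of TypeScript. The class name, the FIFO choice for size-bound eviction, and the injectable clock below are assumptions made for the sketch, not part of the specification:

```typescript
// Minimal size-bounded cache with sliding TTL (illustrative sketch only).
class BoundedTtlCache<K, V> {
  private entries = new Map<K, { value: V; expiresAt: number }>();

  constructor(
    private maxSize: number,
    private ttlMs: number,
    // Injectable clock so the behavior is testable deterministically.
    private now: () => number = () => Date.now(),
  ) {}

  set(key: K, value: V): void {
    // Enforce the size bound before inserting a new key (simple FIFO eviction).
    if (!this.entries.has(key) && this.entries.size >= this.maxSize) {
      const oldest = this.entries.keys().next().value as K;
      this.entries.delete(oldest);
    }
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  get(key: K): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      // TTL expired: evict lazily on read.
      this.entries.delete(key);
      return undefined;
    }
    // Sliding expiration: each read pushes the expiry forward.
    entry.expiresAt = this.now() + this.ttlMs;
    return entry.value;
  }
}
```

A production implementation would more likely use LRU eviction and per-entry size weights, which is what `MemoryCacheEntryOptions.Size` provides in the C# version above.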
+ +--- + +## Integration Types + +The following integration types are supported (via plugins): + +### SCM Integrations + +| Type | Plugin | Capabilities | +|------|--------|--------------| +| `scm.github` | Built-in | repos, branches, commits, webhooks, status | +| `scm.gitlab` | Built-in | repos, branches, commits, webhooks, pipelines | +| `scm.bitbucket` | Plugin | repos, branches, commits, webhooks | +| `scm.azure_repos` | Plugin | repos, branches, commits, pipelines | + +### Registry Integrations + +| Type | Plugin | Capabilities | +|------|--------|--------------| +| `registry.harbor` | Built-in | repos, tags, digests, scanning status | +| `registry.ecr` | Plugin | repos, tags, digests, IAM auth | +| `registry.gcr` | Plugin | repos, tags, digests | +| `registry.dockerhub` | Plugin | repos, tags, digests | +| `registry.ghcr` | Plugin | repos, tags, digests | +| `registry.acr` | Plugin | repos, tags, digests | + +### Vault Integrations + +| Type | Plugin | Capabilities | +|------|--------|--------------| +| `vault.hashicorp` | Built-in | KV, transit, PKI | +| `vault.aws_secrets` | Plugin | secrets, IAM auth | +| `vault.azure_keyvault` | Plugin | secrets, certificates | +| `vault.gcp_secrets` | Plugin | secrets, IAM auth | + +### CI Integrations + +| Type | Plugin | Capabilities | +|------|--------|--------------| +| `ci.github_actions` | Built-in | workflows, runs, artifacts, status | +| `ci.gitlab_ci` | Built-in | pipelines, jobs, artifacts | +| `ci.jenkins` | Plugin | jobs, builds, artifacts | +| `ci.azure_pipelines` | Plugin | pipelines, runs, artifacts | + +### Router Integrations (for Progressive Delivery) + +| Type | Plugin | Capabilities | +|------|--------|--------------| +| `router.nginx` | Plugin | upstream config, reload | +| `router.haproxy` | Plugin | backend config, reload | +| `router.traefik` | Plugin | dynamic config | +| `router.aws_alb` | Plugin | target groups, listener rules | + +--- + +## Database Schema + +```sql +-- Integration types 
(populated by plugins) +CREATE TABLE release.integration_types ( + id TEXT PRIMARY KEY, -- "scm.github" + plugin_id UUID REFERENCES release.plugins(id), + display_name TEXT NOT NULL, + description TEXT, + icon_url TEXT, + config_schema JSONB NOT NULL, -- JSON Schema for config + capabilities TEXT[] NOT NULL, -- ["repos", "webhooks", "status"] + created_at TIMESTAMPTZ NOT NULL DEFAULT now() +); + +-- Integration instances +CREATE TABLE release.integrations ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id), + type_id TEXT NOT NULL REFERENCES release.integration_types(id), + name TEXT NOT NULL, + config JSONB NOT NULL, + credential_ref TEXT NOT NULL, -- vault reference + health_status TEXT NOT NULL DEFAULT 'unknown', + last_health_check TIMESTAMPTZ, + created_at TIMESTAMPTZ NOT NULL DEFAULT now(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT now(), + created_by UUID NOT NULL REFERENCES users(id), + UNIQUE(tenant_id, name) +); + +-- Connection profiles +CREATE TABLE release.connection_profiles ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id), + name TEXT NOT NULL, + integration_type TEXT NOT NULL, + default_config JSONB NOT NULL, + is_default BOOLEAN NOT NULL DEFAULT false, + last_used_at TIMESTAMPTZ, + created_by UUID NOT NULL REFERENCES users(id), + created_at TIMESTAMPTZ NOT NULL DEFAULT now(), + UNIQUE(tenant_id, name) +); + +-- Doctor check history +CREATE TABLE release.doctor_checks ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + integration_id UUID NOT NULL REFERENCES release.integrations(id), + check_type TEXT NOT NULL, + status TEXT NOT NULL, + message TEXT, + details JSONB, + duration_ms INTEGER NOT NULL, + run_at TIMESTAMPTZ NOT NULL DEFAULT now() +); + +CREATE INDEX idx_doctor_checks_integration ON release.doctor_checks(integration_id, run_at DESC); +``` + +--- + +## API Endpoints + +See [API Documentation](../api/overview.md) for full specification. 
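To make the doctor-report semantics concrete, here is a small, illustrative TypeScript sketch of the status roll-up implied by the `doctor-checks` module. The aggregation rule is an assumption rather than normative: any `fail` makes the report `unhealthy`, otherwise any `warn` makes it `degraded`, otherwise it is `healthy`.

```typescript
type CheckStatus = "pass" | "warn" | "fail";
type OverallStatus = "healthy" | "degraded" | "unhealthy";

// Subset of the DoctorCheckResult shape from the doctor-checks module.
interface DoctorCheckResult {
  checkType: string;
  status: CheckStatus;
  message: string;
}

// Roll individual check results up into the report-level status.
function overallStatus(checks: DoctorCheckResult[]): OverallStatus {
  if (checks.some((c) => c.status === "fail")) return "unhealthy";
  if (checks.some((c) => c.status === "warn")) return "degraded";
  return "healthy";
}
```

This mirrors how a `DoctorReport.overallStatus` value would be derived from its `checks` array before per-check rows are persisted.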
+ +``` +GET /api/v1/integration-types # List available types +GET /api/v1/integration-types/{type} # Get type details + +GET /api/v1/integrations # List integrations +POST /api/v1/integrations # Create integration +GET /api/v1/integrations/{id} # Get integration +PUT /api/v1/integrations/{id} # Update integration +DELETE /api/v1/integrations/{id} # Delete integration +POST /api/v1/integrations/{id}/test # Test connection +GET /api/v1/integrations/{id}/health # Get health status +POST /api/v1/integrations/{id}/doctor # Run doctor checks +GET /api/v1/integrations/{id}/resources # Discover resources + +GET /api/v1/connection-profiles # List profiles +POST /api/v1/connection-profiles # Create profile +GET /api/v1/connection-profiles/{id} # Get profile +PUT /api/v1/connection-profiles/{id} # Update profile +DELETE /api/v1/connection-profiles/{id} # Delete profile +``` diff --git a/docs/modules/release-orchestrator/modules/overview.md b/docs/modules/release-orchestrator/modules/overview.md new file mode 100644 index 000000000..2e81a99bc --- /dev/null +++ b/docs/modules/release-orchestrator/modules/overview.md @@ -0,0 +1,203 @@ +# Module Landscape Overview + +The Stella Ops Suite comprises existing modules (vulnerability scanning) and new modules (release orchestration). Modules are organized into **themes** (functional areas). 
+ +## Architecture Diagram + +``` +┌─────────────────────────────────────────────────────────────────────────────────┐ +│ STELLA OPS SUITE │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────────┐ │ +│ │ EXISTING THEMES (Vulnerability) │ │ +│ │ │ │ +│ │ INGEST VEXOPS REASON SCANENG EVIDENCE │ │ +│ │ ├─concelier ├─excititor ├─policy ├─scanner ├─locker │ │ +│ │ └─advisory-ai └─linksets └─opa-runtime ├─sbom-gen ├─export │ │ +│ │ └─reachability └─timeline │ │ +│ │ │ │ +│ │ RUNTIME JOBCTRL OBSERVE REPLAY DEVEXP │ │ +│ │ ├─signals ├─scheduler ├─notifier └─replay-core ├─cli │ │ +│ │ ├─graph ├─orchestrator └─telemetry ├─web-ui │ │ +│ │ └─zastava └─task-runner └─sdk │ │ +│ └───────────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────────┐ │ +│ │ NEW THEMES (Release Orchestration) │ │ +│ │ │ │ +│ │ INTHUB (Integration Hub) │ │ +│ │ ├─integration-manager Central registry of configured integrations │ │ +│ │ ├─connection-profiles Default settings + credential management │ │ +│ │ ├─connector-runtime Plugin connector execution environment │ │ +│ │ └─doctor-checks Integration health diagnostics │ │ +│ │ │ │ +│ │ ENVMGR (Environment & Inventory) │ │ +│ │ ├─environment-manager Environment CRUD, ordering, config │ │ +│ │ ├─target-registry Deployment targets (hosts/services) │ │ +│ │ ├─agent-manager Agent registration, health, capabilities │ │ +│ │ └─inventory-sync Drift detection, state reconciliation │ │ +│ │ │ │ +│ │ RELMAN (Release Management) │ │ +│ │ ├─component-registry Image repos → components mapping │ │ +│ │ ├─version-manager Tag/digest → semver mapping │ │ +│ │ ├─release-manager Release bundle lifecycle │ │ +│ │ └─release-catalog Release history, search, compare │ │ +│ │ │ │ +│ │ WORKFL (Workflow Engine) │ │ +│ │ ├─workflow-designer Template creation, step graph editor │ │ +│ │ ├─workflow-engine DAG execution, state machine │ │ +│ │ 
├─step-executor Step dispatch, retry, timeout │ │ +│ │ └─step-registry Built-in + plugin-provided steps │ │ +│ │ │ │ +│ │ PROMOT (Promotion & Approval) │ │ +│ │ ├─promotion-manager Promotion request lifecycle │ │ +│ │ ├─approval-gateway Approval collection, SoD enforcement │ │ +│ │ ├─decision-engine Gate evaluation, policy integration │ │ +│ │ └─gate-registry Built-in + custom gates │ │ +│ │ │ │ +│ │ DEPLOY (Deployment Execution) │ │ +│ │ ├─deploy-orchestrator Deployment job coordination │ │ +│ │ ├─target-executor Target-specific deployment logic │ │ +│ │ ├─runner-executor Script/hook execution sandbox │ │ +│ │ ├─artifact-generator Compose/script artifact generation │ │ +│ │ └─rollback-manager Rollback orchestration │ │ +│ │ │ │ +│ │ AGENTS (Deployment Agents) │ │ +│ │ ├─agent-core Shared agent runtime │ │ +│ │ ├─agent-docker Docker host agent │ │ +│ │ ├─agent-compose Docker Compose agent │ │ +│ │ ├─agent-ssh SSH remote executor │ │ +│ │ ├─agent-winrm WinRM remote executor │ │ +│ │ ├─agent-ecs AWS ECS agent │ │ +│ │ └─agent-nomad HashiCorp Nomad agent │ │ +│ │ │ │ +│ │ PROGDL (Progressive Delivery) │ │ +│ │ ├─ab-manager A/B release coordination │ │ +│ │ ├─traffic-router Router plugin orchestration │ │ +│ │ ├─canary-controller Canary ramp automation │ │ +│ │ └─rollout-strategy Strategy templates │ │ +│ │ │ │ +│ │ RELEVI (Release Evidence) │ │ +│ │ ├─evidence-collector Evidence aggregation │ │ +│ │ ├─evidence-signer Cryptographic signing │ │ +│ │ ├─sticker-writer Version sticker generation │ │ +│ │ └─audit-exporter Compliance report generation │ │ +│ │ │ │ +│ │ PLUGIN (Plugin Infrastructure) │ │ +│ │ ├─plugin-registry Plugin discovery, versioning │ │ +│ │ ├─plugin-loader Plugin lifecycle management │ │ +│ │ ├─plugin-sandbox Isolation, resource limits │ │ +│ │ └─plugin-sdk SDK for plugin development │ │ +│ └───────────────────────────────────────────────────────────────────────────┘ │ +└─────────────────────────────────────────────────────────────────────────────────┘ 
+``` + +## Theme Summary + +### Existing Themes (Vulnerability Scanning) + +| Theme | Purpose | Key Modules | +|-------|---------|-------------| +| **INGEST** | Advisory ingestion | concelier, advisory-ai | +| **VEXOPS** | VEX document handling | excititor, linksets | +| **REASON** | Policy and decisioning | policy, opa-runtime | +| **SCANENG** | Scanning and SBOM | scanner, sbom-gen, reachability | +| **EVIDENCE** | Evidence and attestation | locker, export, timeline | +| **RUNTIME** | Runtime signals | signals, graph, zastava | +| **JOBCTRL** | Job orchestration | scheduler, orchestrator, task-runner | +| **OBSERVE** | Observability | notifier, telemetry | +| **REPLAY** | Deterministic replay | replay-core | +| **DEVEXP** | Developer experience | cli, web-ui, sdk | + +### New Themes (Release Orchestration) + +| Theme | Purpose | Key Modules | Documentation | +|-------|---------|-------------|---------------| +| **INTHUB** | Integration hub | integration-manager, connection-profiles, connector-runtime, doctor-checks | [Details](integration-hub.md) | +| **ENVMGR** | Environment & inventory | environment-manager, target-registry, agent-manager, inventory-sync | [Details](environment-manager.md) | +| **RELMAN** | Release management | component-registry, version-manager, release-manager, release-catalog | [Details](release-manager.md) | +| **WORKFL** | Workflow engine | workflow-designer, workflow-engine, step-executor, step-registry | [Details](workflow-engine.md) | +| **PROMOT** | Promotion & approval | promotion-manager, approval-gateway, decision-engine, gate-registry | [Details](promotion-manager.md) | +| **DEPLOY** | Deployment execution | deploy-orchestrator, target-executor, runner-executor, artifact-generator, rollback-manager | [Details](deploy-orchestrator.md) | +| **AGENTS** | Deployment agents | agent-core, agent-docker, agent-compose, agent-ssh, agent-winrm, agent-ecs, agent-nomad | [Details](agents.md) | +| **PROGDL** | Progressive delivery | 
ab-manager, traffic-router, canary-controller, rollout-strategy | [Details](progressive-delivery.md) | +| **RELEVI** | Release evidence | evidence-collector, evidence-signer, sticker-writer, audit-exporter | [Details](evidence.md) | +| **PLUGIN** | Plugin infrastructure | plugin-registry, plugin-loader, plugin-sandbox, plugin-sdk | [Details](plugin-system.md) | + +## Module Dependencies + +``` + ┌──────────────┐ + │ AUTHORITY │ + └──────┬───────┘ + │ + ┌──────────────────┼──────────────────┐ + │ │ │ + ▼ ▼ ▼ +┌───────────────┐ ┌───────────────┐ ┌───────────────┐ +│ INTHUB │ │ ENVMGR │ │ PLUGIN │ +│ (Integrations)│ │ (Environments)│ │ (Plugins) │ +└───────┬───────┘ └───────┬───────┘ └───────┬───────┘ + │ │ │ + └──────────┬───────┴──────────────────┘ + │ + ▼ + ┌───────────────┐ + │ RELMAN │ + │ (Releases) │ + └───────┬───────┘ + │ + ▼ + ┌───────────────┐ + │ WORKFL │ + │ (Workflows) │ + └───────┬───────┘ + │ + ┌──────────┴──────────┐ + │ │ + ▼ ▼ +┌───────────────┐ ┌───────────────┐ +│ PROMOT │ │ DEPLOY │ +│ (Promotion) │ │ (Deployment) │ +└───────┬───────┘ └───────┬───────┘ + │ │ + │ ▼ + │ ┌───────────────┐ + │ │ AGENTS │ + │ │ (Agents) │ + │ └───────┬───────┘ + │ │ + └──────────┬──────────┘ + │ + ▼ + ┌───────────────┐ + │ RELEVI │ + │ (Evidence) │ + └───────────────┘ +``` + +## Communication Patterns + +| Pattern | Usage | +|---------|-------| +| **Synchronous API** | User-initiated operations (CRUD, queries) | +| **Event Bus** | Cross-module notifications (domain events) | +| **Task Queue** | Long-running operations (deployments, syncs) | +| **WebSocket/SSE** | Real-time UI updates | +| **gRPC Streams** | Agent communication | + +## Database Schema Organization + +Each theme owns a PostgreSQL schema: + +| Schema | Owner Theme | +|--------|-------------| +| `release.integrations` | INTHUB | +| `release.environments` | ENVMGR | +| `release.components` | RELMAN | +| `release.workflows` | WORKFL | +| `release.promotions` | PROMOT | +| `release.deployments` | DEPLOY | +| 
`release.agents` | AGENTS | +| `release.evidence` | RELEVI | +| `release.plugins` | PLUGIN | diff --git a/docs/modules/release-orchestrator/modules/plugin-system.md b/docs/modules/release-orchestrator/modules/plugin-system.md new file mode 100644 index 000000000..4671d5537 --- /dev/null +++ b/docs/modules/release-orchestrator/modules/plugin-system.md @@ -0,0 +1,629 @@ +# PLUGIN: Plugin Infrastructure + +**Purpose**: Extensible plugin system for integrations, steps, and custom functionality. + +## Architecture Overview + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ PLUGIN ARCHITECTURE │ +│ │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ PLUGIN REGISTRY │ │ +│ │ │ │ +│ │ - Plugin discovery and versioning │ │ +│ │ - Manifest validation │ │ +│ │ - Dependency resolution │ │ +│ └──────────────────────────────┬──────────────────────────────────────┘ │ +│ │ │ +│ ▼ │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ PLUGIN LOADER │ │ +│ │ │ │ +│ │ - Lifecycle management (load, start, stop, unload) │ │ +│ │ - Health monitoring │ │ +│ │ - Hot reload support │ │ +│ └──────────────────────────────┬──────────────────────────────────────┘ │ +│ │ │ +│ ▼ │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ PLUGIN SANDBOX │ │ +│ │ │ │ +│ │ - Process isolation │ │ +│ │ - Resource limits (CPU, memory, network) │ │ +│ │ - Capability enforcement │ │ +│ └─────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ Plugin Types: │ +│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ +│ │ Connector │ │ Step │ │ Gate │ │ Agent │ │ +│ │ Plugins │ │ Providers │ │ Providers │ │ Plugins │ │ +│ └──────────────┘ └──────────────┘ └──────────────┘ └──────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +## Modules + +### Module: `plugin-registry` + +| Aspect | 
Specification |
+|--------|---------------|
+| **Responsibility** | Plugin discovery; versioning; manifest management |
+| **Data Entities** | `Plugin`, `PluginManifest`, `PluginVersion` |
+| **Events Produced** | `plugin.discovered`, `plugin.registered`, `plugin.unregistered` |
+
+**Plugin Entity**:
+```typescript
+interface Plugin {
+  id: UUID;
+  pluginId: string;              // "com.example.my-connector"
+  version: string;               // "1.2.3"
+  vendor: string;
+  license: string;
+  manifest: PluginManifest;
+  status: PluginStatus;
+  entrypoint: string;            // Path to plugin executable/module
+  lastHealthCheck: DateTime;
+  healthMessage: string | null;
+  installedAt: DateTime;
+  updatedAt: DateTime;
+}
+
+type PluginStatus =
+  | "discovered"    // Found but not loaded
+  | "loaded"        // Loaded but not active
+  | "active"        // Running and healthy
+  | "stopped"       // Manually stopped
+  | "failed"        // Failed to load or crashed
+  | "degraded";     // Running but with issues
+```
+
+---
+
+### Module: `plugin-loader`
+
+| Aspect | Specification |
+|--------|---------------|
+| **Responsibility** | Plugin lifecycle management |
+| **Dependencies** | `plugin-registry`, `plugin-sandbox` |
+| **Events Produced** | `plugin.loaded`, `plugin.started`, `plugin.stopped`, `plugin.failed` |
+
+**Plugin Lifecycle**:
+```
+┌──────────────┐
+│  DISCOVERED  │ ──── Plugin found in registry
+└──────┬───────┘
+       │ load()
+       ▼
+┌──────────────┐
+│    LOADED    │ ──── Plugin validated and prepared
+└──────┬───────┘
+       │ start()
+       ▼
+┌──────────────┐      ┌──────────────┐
+│    ACTIVE    │ ──── │   DEGRADED   │ ◄── Health issues
+└──────┬───────┘      └──────┬───────┘
+       │ stop()              │
+       ▼                     │
+┌──────────────┐             │
+│   STOPPED    │ ◄───────────┘ manual stop
+└──────┬───────┘
+       │ unload()
+       ▼
+┌──────────────┐
+│   UNLOADED   │
+└──────────────┘
+```
+
+**Lifecycle Operations**:
+```typescript
+interface PluginLoader {
+  // Discovery
+  discover(): Promise<Plugin[]>;
+  refresh(): Promise<void>;
+
+  // Lifecycle
+  load(pluginId: string): Promise<Plugin>;
+  start(pluginId: string): Promise<void>;
+  stop(pluginId: string): Promise<void>;
+  unload(pluginId: string): Promise<void>;
+  restart(pluginId: string): Promise<void>;
+
+  // Health
+  checkHealth(pluginId: string): Promise<HealthStatus>;
+  getStatus(pluginId: string): Promise<PluginStatus>;
+
+  // Hot reload
+  reload(pluginId: string): Promise<void>;
+}
+```
+
+---
+
+### Module: `plugin-sandbox`
+
+| Aspect | Specification |
+|--------|---------------|
+| **Responsibility** | Isolation; resource limits; security |
+| **Enforcement** | Process isolation, capability-based security |
+
+**Sandbox Configuration**:
+```typescript
+interface SandboxConfig {
+  // Process isolation
+  processIsolation: boolean;       // Run in separate process
+  containerIsolation: boolean;     // Run in container
+
+  // Resource limits
+  resourceLimits: {
+    maxMemoryMb: number;           // Memory limit
+    maxCpuPercent: number;         // CPU limit
+    maxDiskMb: number;             // Disk quota
+    maxNetworkBandwidth: number;   // Network bandwidth limit
+  };
+
+  // Network restrictions
+  networkPolicy: {
+    allowedHosts: string[];        // Allowed outbound hosts
+    blockedHosts: string[];        // Blocked hosts
+    allowOutbound: boolean;        // Allow any outbound
+  };
+
+  // Filesystem restrictions
+  filesystemPolicy: {
+    readOnlyPaths: string[];
+    writablePaths: string[];
+    blockedPaths: string[];
+  };
+
+  // Timeouts
+  timeouts: {
+    initializationMs: number;
+    operationMs: number;
+    shutdownMs: number;
+  };
+}
+```
+
+**Capability Enforcement**:
+```typescript
+interface PluginCapabilities {
+  // Integration capabilities
+  integrations: {
+    scm: boolean;
+    ci: boolean;
+    registry: boolean;
+    vault: boolean;
+    router: boolean;
+  };
+
+  // Step capabilities
+  steps: {
+    deploy: boolean;
+    gate: boolean;
+    notify: boolean;
+    custom: boolean;
+  };
+
+  // System capabilities
+  system: {
+    network: boolean;
+    filesystem: boolean;
+    secrets: boolean;
+    database: boolean;
+  };
+}
+```
+
+---
+
+### Module: `plugin-sdk`
+
+| Aspect | Specification |
+|--------|---------------|
+| **Responsibility** | SDK for plugin development |
+| **Languages** | C#, TypeScript, Go |
+
+**Plugin SDK Interface**:
+```typescript
+// Base plugin interface
+interface StellaPlugin {
+  // Lifecycle
+  initialize(config: PluginConfig): Promise<void>;
+  start(): Promise<void>;
+  stop(): Promise<void>;
+  dispose(): Promise<void>;
+
+  // Health
+  getHealth(): Promise<HealthStatus>;
+
+  // Metadata
+  getManifest(): PluginManifest;
+}
+
+// Connector plugin interface
+interface ConnectorPlugin extends StellaPlugin {
+  createConnector(config: ConnectorConfig): Promise<Connector>;
+}
+
+// Step provider plugin interface
+interface StepProviderPlugin extends StellaPlugin {
+  getStepTypes(): StepType[];
+  executeStep(
+    stepType: string,
+    config: StepConfig,
+    inputs: StepInputs,
+    context: StepContext
+  ): AsyncGenerator<StepEvent>;
+}
+
+// Gate provider plugin interface
+interface GateProviderPlugin extends StellaPlugin {
+  getGateTypes(): GateType[];
+  evaluateGate(
+    gateType: string,
+    config: GateConfig,
+    context: GateContext
+  ): Promise<GateResult>;
+}
+```
+
+---
+
+## Three-Surface Plugin Model
+
+Plugins contribute to the system through three distinct surfaces:
+
+### 1. 
Manifest Surface (Static) + +The plugin manifest declares: +- Plugin identity and version +- Required capabilities +- Provided integrations/steps/gates +- Configuration schema +- UI components (optional) + +```yaml +# plugin.stella.yaml +plugin: + id: "com.example.jenkins-connector" + version: "1.0.0" + vendor: "Example Corp" + license: "Apache-2.0" + description: "Jenkins CI integration for Stella Ops" + +capabilities: + required: + - network + optional: + - secrets + +provides: + integrations: + - type: "ci.jenkins" + displayName: "Jenkins" + configSchema: "./schemas/jenkins-config.json" + capabilities: + - "pipelines" + - "builds" + - "artifacts" + + steps: + - type: "jenkins-trigger" + displayName: "Trigger Jenkins Build" + category: "integration" + configSchema: "./schemas/jenkins-trigger-config.json" + inputSchema: "./schemas/jenkins-trigger-input.json" + outputSchema: "./schemas/jenkins-trigger-output.json" + +ui: + configScreen: "./ui/config.html" + icon: "./assets/jenkins-icon.svg" + +dependencies: + stellaCore: ">=1.0.0" +``` + +### 2. 
Connector Runtime Surface (Dynamic)
+
+Plugins implement connector interfaces for runtime operations:
+
+```typescript
+// Jenkins connector implementation
+class JenkinsConnector implements CIConnector {
+  private client: JenkinsClient;
+
+  async initialize(config: ConnectorConfig, secrets: SecretHandle[]): Promise<void> {
+    const apiToken = await this.getSecret(secrets, "api_token");
+    this.client = new JenkinsClient({
+      baseUrl: config.endpoint,
+      username: config.username,
+      apiToken: apiToken,
+    });
+  }
+
+  async testConnection(): Promise<ConnectionResult> {
+    try {
+      const crumb = await this.client.getCrumb();
+      return { success: true, message: "Connected to Jenkins" };
+    } catch (error) {
+      return { success: false, message: error.message };
+    }
+  }
+
+  async listPipelines(): Promise<Pipeline[]> {
+    const jobs = await this.client.getJobs();
+    return jobs.map(job => ({
+      id: job.name,
+      name: job.displayName,
+      url: job.url,
+      lastBuild: job.lastBuild?.number,
+    }));
+  }
+
+  async triggerPipeline(pipelineId: string, params: object): Promise<PipelineRun> {
+    const queueItem = await this.client.build(pipelineId, params);
+    return {
+      id: queueItem.id.toString(),
+      pipelineId,
+      status: "queued",
+      startedAt: new Date(),
+    };
+  }
+
+  async getPipelineRun(runId: string): Promise<PipelineRun> {
+    const build = await this.client.getBuild(runId);
+    return {
+      id: build.number.toString(),
+      pipelineId: build.job,
+      status: this.mapStatus(build.result),
+      startedAt: new Date(build.timestamp),
+      completedAt: build.result ? new Date(build.timestamp + build.duration) : null,
+    };
+  }
+}
+```
+
+### 3. Step Provider Surface (Execution)
+
+Plugins implement step execution logic:
+
+```typescript
+// Jenkins trigger step implementation
+class JenkinsTriggerStep implements StepExecutor {
+  async *execute(
+    config: StepConfig,
+    inputs: StepInputs,
+    context: StepContext
+  ): AsyncGenerator<StepEvent> {
+    const connector = await context.getConnector(config.integrationId);
+
+    yield { type: "log", line: `Triggering Jenkins job: ${config.jobName}` };
+
+    // Trigger build
+    const run = await connector.triggerPipeline(config.jobName, inputs.parameters);
+    yield { type: "output", name: "buildId", value: run.id };
+    yield { type: "log", line: `Build queued: ${run.id}` };
+
+    // Wait for completion if configured
+    if (config.waitForCompletion) {
+      yield { type: "log", line: "Waiting for build to complete..." };
+
+      while (true) {
+        const status = await connector.getPipelineRun(run.id);
+
+        if (status.status === "succeeded") {
+          yield { type: "output", name: "status", value: "succeeded" };
+          yield { type: "result", success: true };
+          return;
+        }
+
+        if (status.status === "failed") {
+          yield { type: "output", name: "status", value: "failed" };
+          yield { type: "result", success: false, message: "Build failed" };
+          return;
+        }
+
+        yield { type: "progress", progress: 50, message: `Build running: ${status.status}` };
+        await sleep(config.pollIntervalSeconds * 1000);
+      }
+    }
+
+    yield { type: "result", success: true };
+  }
+}
+```
+
+---
+
+## Database Schema
+
+```sql
+-- Plugins
+CREATE TABLE release.plugins (
+  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+  plugin_id VARCHAR(255) NOT NULL UNIQUE,
+  version VARCHAR(50) NOT NULL,
+  vendor VARCHAR(255) NOT NULL,
+  license VARCHAR(100),
+  manifest JSONB NOT NULL,
+  status VARCHAR(50) NOT NULL DEFAULT 'discovered' CHECK (status IN (
+    'discovered', 'loaded', 'active', 'stopped', 'failed', 'degraded'
+  )),
+  entrypoint VARCHAR(500) NOT NULL,
+  last_health_check TIMESTAMPTZ,
+  health_message TEXT,
+  installed_at TIMESTAMPTZ NOT NULL DEFAULT 
NOW(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW() +); + +CREATE INDEX idx_plugins_status ON release.plugins(status); + +-- Plugin Instances (per-tenant configuration) +CREATE TABLE release.plugin_instances ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + plugin_id UUID NOT NULL REFERENCES release.plugins(id) ON DELETE CASCADE, + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + config JSONB NOT NULL DEFAULT '{}', + enabled BOOLEAN NOT NULL DEFAULT TRUE, + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW() +); + +CREATE INDEX idx_plugin_instances_tenant ON release.plugin_instances(tenant_id); + +-- Integration types (populated by plugins) +CREATE TABLE release.integration_types ( + id TEXT PRIMARY KEY, -- "scm.github", "ci.jenkins" + plugin_id UUID REFERENCES release.plugins(id), + display_name TEXT NOT NULL, + description TEXT, + icon_url TEXT, + config_schema JSONB NOT NULL, -- JSON Schema for config + capabilities TEXT[] NOT NULL, -- ["repos", "webhooks", "status"] + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW() +); +``` + +--- + +## API Endpoints + +```yaml +# Plugin Registry +GET /api/v1/plugins + Query: ?status={status}&capability={type} + Response: Plugin[] + +GET /api/v1/plugins/{id} + Response: Plugin (with manifest) + +POST /api/v1/plugins/{id}/enable + Response: Plugin + +POST /api/v1/plugins/{id}/disable + Response: Plugin + +GET /api/v1/plugins/{id}/health + Response: { status, message, diagnostics[] } + +# Plugin Instances (per-tenant config) +POST /api/v1/plugin-instances + Body: { pluginId: UUID, config: object } + Response: PluginInstance + +GET /api/v1/plugin-instances + Response: PluginInstance[] + +PUT /api/v1/plugin-instances/{id} + Body: { config: object, enabled: boolean } + Response: PluginInstance + +DELETE /api/v1/plugin-instances/{id} + Response: { deleted: true } +``` + +--- + +## Plugin Security + +### Capability Declaration + +Plugins must declare all required capabilities in their manifest. 
The system enforces:
+
+1. **Network Access**: Plugins can only access declared hosts
+2. **Secret Access**: Plugins receive secrets through controlled injection
+3. **Database Access**: No direct database access; API only
+4. **Filesystem Access**: Limited to declared paths
+
+### Sandbox Enforcement
+
+```typescript
+// Plugin execution is sandboxed
+class PluginSandbox {
+  async execute<T>(
+    plugin: Plugin,
+    operation: () => Promise<T>
+  ): Promise<T> {
+    // 1. Verify capabilities
+    this.verifyCapabilities(plugin);
+
+    // 2. Set resource limits
+    const limits = this.getResourceLimits(plugin);
+    await this.applyLimits(limits);
+
+    // 3. Create isolated context
+    const context = await this.createIsolatedContext(plugin);
+
+    try {
+      // 4. Execute with timeout
+      return await this.withTimeout(
+        operation(),
+        plugin.manifest.timeouts.operationMs
+      );
+    } catch (error) {
+      // 5. Log and handle errors
+      await this.handlePluginError(plugin, error);
+      throw error;
+    } finally {
+      // 6. Cleanup
+      await context.dispose();
+    }
+  }
+}
+```
+
+### Plugin Failures Cannot Crash Core
+
+```csharp
+// Core orchestration is protected from plugin failures
+public sealed class PromotionDecisionEngine
+{
+    public async Task<DecisionRecord> EvaluateAsync(
+        Promotion promotion,
+        IReadOnlyList<IGate> gates,
+        CancellationToken ct)
+    {
+        var results = new List<GateResult>();
+
+        foreach (var gate in gates)
+        {
+            try
+            {
+                // Plugin provides evaluation logic
+                var result = await gate.EvaluateAsync(promotion, ct);
+                results.Add(result);
+            }
+            catch (Exception ex)
+            {
+                // Plugin failure is logged but doesn't crash core
+                _logger.LogError(ex, "Gate {GateType} failed", gate.Type);
+                results.Add(new GateResult
+                {
+                    GateType = gate.Type,
+                    Status = GateStatus.Failed,
+                    Message = $"Gate evaluation failed: {ex.Message}",
+                    IsBlocking = gate.IsBlocking,
+                });
+            }
+
+            // Core decides how to aggregate (plugins cannot override)
+            if (results.Last().IsBlocking && _policy.FailFast)
+                break;
+        }
+
+        // Core makes final decision
+        return 
_decisionAggregator.Aggregate(results); + } +} +``` + +--- + +## References + +- [Module Overview](overview.md) +- [Integration Hub](integration-hub.md) +- [Workflow Engine](workflow-engine.md) +- [Connector Interface](../integrations/connectors.md) diff --git a/docs/modules/release-orchestrator/modules/progressive-delivery.md b/docs/modules/release-orchestrator/modules/progressive-delivery.md new file mode 100644 index 000000000..03c02fe1f --- /dev/null +++ b/docs/modules/release-orchestrator/modules/progressive-delivery.md @@ -0,0 +1,471 @@ +# PROGDL: Progressive Delivery + +**Purpose**: A/B releases, canary deployments, and traffic management. + +## Architecture Overview + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ PROGRESSIVE DELIVERY ARCHITECTURE │ +│ │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ A/B RELEASE MANAGER │ │ +│ │ │ │ +│ │ - Create A/B release with variations │ │ +│ │ - Manage traffic split configuration │ │ +│ │ - Coordinate rollout stages │ │ +│ │ - Handle promotion/rollback │ │ +│ └──────────────────────────────┬──────────────────────────────────────┘ │ +│ │ │ +│ ┌──────────────────┴──────────────────┐ │ +│ │ │ │ +│ ▼ ▼ │ +│ ┌───────────────────────┐ ┌───────────────────────┐ │ +│ │ TARGET-GROUP A/B │ │ ROUTER-BASED A/B │ │ +│ │ │ │ │ │ +│ │ Deploy to groups │ │ Configure traffic │ │ +│ │ by labels/membership │ │ via load balancer │ │ +│ │ │ │ │ │ +│ │ Good for: │ │ Good for: │ │ +│ │ - Background workers │ │ - Web/API traffic │ │ +│ │ - Batch processors │ │ - Customer-facing │ │ +│ │ - Internal services │ │ - L7 routing │ │ +│ └───────────────────────┘ └───────────────────────┘ │ +│ │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ CANARY CONTROLLER │ │ +│ │ │ │ +│ │ - Execute rollout stages │ │ +│ │ - Monitor health metrics │ │ +│ │ - Auto-advance or pause │ │ +│ │ - Trigger rollback on failure │ │ +│ 
└─────────────────────────────────────────────────────────────────────┘   │
+│                                                                             │
+│  ┌─────────────────────────────────────────────────────────────────────┐   │
+│  │                     TRAFFIC ROUTER INTEGRATION                      │   │
+│  │                                                                     │   │
+│  │  Plugin-based integration with:                                     │   │
+│  │  - Nginx (config generation + reload)                               │   │
+│  │  - HAProxy (config generation + reload)                             │   │
+│  │  - Traefik (dynamic config API)                                     │   │
+│  │  - AWS ALB (target group weights)                                   │   │
+│  │  - Custom (webhook)                                                 │   │
+│  └─────────────────────────────────────────────────────────────────────┘   │
+│                                                                             │
+└─────────────────────────────────────────────────────────────────────────────┘
+```
+
+## Modules
+
+### Module: `ab-manager`
+
+| Aspect | Specification |
+|--------|---------------|
+| **Responsibility** | A/B release lifecycle; variation management |
+| **Dependencies** | `release-manager`, `environment-manager`, `deploy-orchestrator` |
+| **Data Entities** | `ABRelease`, `Variation`, `TrafficSplit` |
+| **Events Produced** | `ab.created`, `ab.started`, `ab.stage_advanced`, `ab.promoted`, `ab.rolled_back` |
+
+**A/B Release Entity**:
+```typescript
+interface ABRelease {
+  id: UUID;
+  tenantId: UUID;
+  environmentId: UUID;
+  name: string;
+  variations: Variation[];
+  activeVariation: string;         // "A" or "B"
+  trafficSplit: TrafficSplit;
+  rolloutStrategy: RolloutStrategy;
+  status: ABReleaseStatus;
+  createdAt: DateTime;
+  completedAt: DateTime | null;
+  createdBy: UUID;
+}
+
+interface Variation {
+  name: string;                    // "A", "B"
+  releaseId: UUID;
+  targetGroupId: UUID | null;      // for target-group based A/B
+  trafficPercentage: number;
+  deploymentJobId: UUID | null;
+}
+
+interface TrafficSplit {
+  type: "percentage" | "sticky" | "header";
+  percentages: Record<string, number>;   // {"A": 90, "B": 10}
+  stickyKey?: string;                    // cookie or header name
+  headerMatch?: {                        // for header-based routing
+    header: string;
+    values: Record<string, string>;      // value -> variation
+  };
+}
+
+type ABReleaseStatus =
+  | "created"      // Configured, not started
+  | "deploying"    // Deploying variations
+  | 
"running"      // Active with traffic split
+  | "promoting"    // Promoting winner to 100%
+  | "completed"    // Successfully completed
+  | "rolled_back"; // Rolled back to original
+```
+
+**A/B Release Models**:
+
+| Model | Description | Use Case |
+|-------|-------------|----------|
+| **Target-Group A/B** | Deploy different releases to different target groups | Background workers, internal services |
+| **Router-Based A/B** | Use load balancer to split traffic | Web/API traffic, customer-facing |
+| **Hybrid A/B** | Combination of both | Complex deployments |
+
+---
+
+### Module: `traffic-router`
+
+| Aspect | Specification |
+|--------|---------------|
+| **Responsibility** | Router plugin orchestration; traffic shifting |
+| **Dependencies** | `integration-manager`, `connector-runtime` |
+| **Protocol** | Plugin-specific (API calls, config generation) |
+
+**Router Connector Interface**:
+```typescript
+interface RouterConnector extends BaseConnector {
+  // Traffic management
+  configureRoute(config: RouteConfig): Promise<void>;
+  getTrafficDistribution(): Promise<TrafficDistribution>;
+  shiftTraffic(from: string, to: string, percentage: number): Promise<void>;
+
+  // Configuration
+  reloadConfig(): Promise<void>;
+  validateConfig(config: string): Promise<ValidationResult>;
+}
+
+interface RouteConfig {
+  upstream: string;
+  backends: Array<{
+    name: string;
+    targets: string[];
+    weight: number;
+  }>;
+  healthCheck?: {
+    path: string;
+    interval: number;
+    timeout: number;
+  };
+}
+
+interface TrafficDistribution {
+  backends: Array<{
+    name: string;
+    weight: number;
+    healthyTargets: number;
+    totalTargets: number;
+  }>;
+  timestamp: DateTime;
+}
+```
+
+**Router Plugins**:
+
+| Plugin | Capabilities |
+|--------|-------------|
+| `router.nginx` | Config generation, reload via signal/API |
+| `router.haproxy` | Config generation, reload via socket |
+| `router.traefik` | Dynamic config API |
+| `router.aws_alb` | Target group weights via AWS API |
+| `router.custom` | Webhook-based custom integration |
+
+---
+
+### Module: `canary-controller`
+
+| Aspect | Specification |
+|--------|---------------|
+| **Responsibility** | Canary ramp automation; health monitoring |
+| **Dependencies** | `ab-manager`, `traffic-router` |
+| **Data Entities** | `CanaryStage`, `HealthResult` |
+| **Events Produced** | `canary.stage_started`, `canary.stage_passed`, `canary.stage_failed` |
+
+**Canary Stage Entity**:
+```typescript
+interface CanaryStage {
+  id: UUID;
+  abReleaseId: UUID;
+  stageNumber: number;
+  trafficPercentage: number;
+  status: CanaryStageStatus;
+  healthThreshold: number;       // Required health % to pass
+  durationSeconds: number;       // How long to run stage
+  requireApproval: boolean;      // Require manual approval
+  startedAt: DateTime | null;
+  completedAt: DateTime | null;
+  healthResult: HealthResult | null;
+}
+
+type CanaryStageStatus =
+  | "pending"
+  | "running"
+  | "succeeded"
+  | "failed"
+  | "skipped";
+
+interface HealthResult {
+  healthy: boolean;
+  healthPercentage: number;
+  metrics: {
+    successRate: number;
+    errorRate: number;
+    latencyP50: number;
+    latencyP99: number;
+  };
+  samples: number;
+  evaluatedAt: DateTime;
+}
+```
+
+**Canary Rollout Execution**:
+```typescript
+class CanaryController {
+  async executeRollout(abRelease: ABRelease): Promise<void> {
+    const stages = abRelease.rolloutStrategy.stages;
+
+    for (const stage of stages) {
+      this.log(`Starting canary stage ${stage.stageNumber}: ${stage.trafficPercentage}%`);
+
+      // 1. Shift traffic to canary percentage
+      await this.trafficRouter.shiftTraffic(
+        abRelease.variations[0].name,   // baseline
+        abRelease.variations[1].name,   // canary
+        stage.trafficPercentage
+      );
+
+      // 2. Update stage status
+      stage.status = "running";
+      stage.startedAt = new Date();
+      await this.save(stage);
+
+      // 3. Wait for stage duration
+      await this.waitForDuration(stage.durationSeconds);
+
+      // 4. Evaluate health
+      const healthResult = await this.evaluateHealth(abRelease, stage);
+      stage.healthResult = healthResult;
+
+      if (!healthResult.healthy || healthResult.healthPercentage < stage.healthThreshold) {
+        stage.status = "failed";
+        await this.save(stage);
+
+        // Rollback
+        await this.rollback(abRelease);
+        throw new CanaryFailedError(`Stage ${stage.stageNumber} failed health check`);
+      }
+
+      // 5. Check if approval required
+      if (stage.requireApproval) {
+        await this.waitForApproval(abRelease, stage);
+      }
+
+      stage.status = "succeeded";
+      stage.completedAt = new Date();
+      await this.save(stage);
+
+      // 6. Check for auto-advance
+      if (!abRelease.rolloutStrategy.autoAdvance) {
+        await this.waitForManualAdvance(abRelease);
+      }
+    }
+
+    // All stages passed - promote canary to 100%
+    await this.promote(abRelease, abRelease.variations[1].name);
+  }
+
+  private async evaluateHealth(abRelease: ABRelease, stage: CanaryStage): Promise<HealthResult> {
+    // Collect metrics from targets
+    const canaryVariation = abRelease.variations.find(v => v.name === "B");
+    const targets = await this.getTargets(canaryVariation.targetGroupId);
+
+    let healthyCount = 0;
+    let totalLatency = 0;
+    let errorCount = 0;
+
+    for (const target of targets) {
+      const health = await this.checkTargetHealth(target);
+      if (health.healthy) healthyCount++;
+      totalLatency += health.latencyMs;
+      if (health.errorRate > 0) errorCount++;
+    }
+
+    return {
+      healthy: healthyCount >= targets.length * (stage.healthThreshold / 100),
+      healthPercentage: (healthyCount / targets.length) * 100,
+      metrics: {
+        successRate: ((targets.length - errorCount) / targets.length) * 100,
+        errorRate: (errorCount / targets.length) * 100,
+        latencyP50: totalLatency / targets.length,
+        latencyP99: totalLatency / targets.length * 1.5,   // simplified
+      },
+      samples: targets.length,
+      evaluatedAt: new Date(),
+    };
+  }
+}
+```
+
+---
+
+### Module: `rollout-strategy`
+
+| Aspect | Specification |
+|--------|---------------|
+| **Responsibility** 
| Strategy templates; configuration | +| **Data Entities** | `RolloutStrategyTemplate` | + +**Built-in Strategy Templates**: + +| Template | Stages | Description | +|----------|--------|-------------| +| `canary-10-25-50-100` | 4 | Standard canary: 10%, 25%, 50%, 100% | +| `canary-1-5-10-50-100` | 5 | Conservative: 1%, 5%, 10%, 50%, 100% | +| `blue-green-instant` | 2 | Deploy 100% to green, instant switch | +| `blue-green-gradual` | 4 | Gradual shift: 25%, 50%, 75%, 100% | + +**Rollout Strategy Definition**: +```typescript +interface RolloutStrategy { + id: UUID; + name: string; + stages: Array<{ + trafficPercentage: number; + durationSeconds: number; + healthThreshold: number; + requireApproval: boolean; + }>; + autoAdvance: boolean; + rollbackOnFailure: boolean; + healthCheckInterval: number; +} + +// Example: Standard Canary +const standardCanary: RolloutStrategy = { + name: "canary-10-25-50-100", + stages: [ + { trafficPercentage: 10, durationSeconds: 300, healthThreshold: 95, requireApproval: false }, + { trafficPercentage: 25, durationSeconds: 600, healthThreshold: 95, requireApproval: false }, + { trafficPercentage: 50, durationSeconds: 900, healthThreshold: 95, requireApproval: true }, + { trafficPercentage: 100, durationSeconds: 0, healthThreshold: 95, requireApproval: false }, + ], + autoAdvance: true, + rollbackOnFailure: true, + healthCheckInterval: 30, +}; +``` + +--- + +## Database Schema + +```sql +-- A/B Releases +CREATE TABLE release.ab_releases ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + environment_id UUID NOT NULL REFERENCES release.environments(id), + name VARCHAR(255) NOT NULL, + variations JSONB NOT NULL, -- [{name, releaseId, targetGroupId, trafficPercentage}] + active_variation VARCHAR(50) NOT NULL DEFAULT 'A', + traffic_split JSONB NOT NULL, + rollout_strategy JSONB NOT NULL, + status VARCHAR(50) NOT NULL DEFAULT 'created' CHECK (status IN ( + 'created', 
'deploying', 'running', 'promoting', 'completed', 'rolled_back' + )), + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + completed_at TIMESTAMPTZ, + created_by UUID REFERENCES users(id) +); + +CREATE INDEX idx_ab_releases_tenant_env ON release.ab_releases(tenant_id, environment_id); +CREATE INDEX idx_ab_releases_status ON release.ab_releases(status); + +-- Canary Stages +CREATE TABLE release.canary_stages ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + ab_release_id UUID NOT NULL REFERENCES release.ab_releases(id) ON DELETE CASCADE, + stage_number INTEGER NOT NULL, + traffic_percentage INTEGER NOT NULL, + status VARCHAR(50) NOT NULL DEFAULT 'pending' CHECK (status IN ( + 'pending', 'running', 'succeeded', 'failed', 'skipped' + )), + health_threshold DECIMAL(5,2), + duration_seconds INTEGER, + require_approval BOOLEAN NOT NULL DEFAULT FALSE, + started_at TIMESTAMPTZ, + completed_at TIMESTAMPTZ, + health_result JSONB, + UNIQUE (ab_release_id, stage_number) +); +``` + +--- + +## API Endpoints + +```yaml +# A/B Releases +POST /api/v1/ab-releases + Body: { + environmentId: UUID, + name: string, + variations: [ + { name: "A", releaseId: UUID, targetGroupId?: UUID }, + { name: "B", releaseId: UUID, targetGroupId?: UUID } + ], + trafficSplit: TrafficSplit, + rolloutStrategy: RolloutStrategy + } + Response: ABRelease + +GET /api/v1/ab-releases + Query: ?environmentId={uuid}&status={status} + Response: ABRelease[] + +GET /api/v1/ab-releases/{id} + Response: ABRelease (with stages) + +POST /api/v1/ab-releases/{id}/start + Response: ABRelease + +POST /api/v1/ab-releases/{id}/advance + Body: { stageNumber?: number } # advance to next or specific stage + Response: ABRelease + +POST /api/v1/ab-releases/{id}/promote + Body: { variation: "A" | "B" } # promote to 100% + Response: ABRelease + +POST /api/v1/ab-releases/{id}/rollback + Response: ABRelease + +GET /api/v1/ab-releases/{id}/traffic + Response: { currentSplit: TrafficDistribution, history: TrafficHistory[] } + +GET 
/api/v1/ab-releases/{id}/health + Response: { variations: [{ name, healthStatus, metrics }] } + +# Rollout Strategies +GET /api/v1/rollout-strategies + Response: RolloutStrategyTemplate[] + +GET /api/v1/rollout-strategies/{id} + Response: RolloutStrategyTemplate +``` + +--- + +## References + +- [Module Overview](overview.md) +- [Deploy Orchestrator](deploy-orchestrator.md) +- [A/B Releases](../progressive-delivery/ab-releases.md) +- [Canary Controller](../progressive-delivery/canary.md) +- [Router Plugins](../progressive-delivery/routers.md) diff --git a/docs/modules/release-orchestrator/modules/promotion-manager.md b/docs/modules/release-orchestrator/modules/promotion-manager.md new file mode 100644 index 000000000..40e331f4e --- /dev/null +++ b/docs/modules/release-orchestrator/modules/promotion-manager.md @@ -0,0 +1,433 @@ +# PROMOT: Promotion & Approval Manager + +**Purpose**: Manage promotion requests, approvals, gates, and decision records. + +## Modules + +### Module: `promotion-manager` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Promotion request lifecycle; state management | +| **Dependencies** | `release-manager`, `environment-manager`, `workflow-engine` | +| **Data Entities** | `Promotion`, `PromotionState` | +| **Events Produced** | `promotion.requested`, `promotion.approved`, `promotion.rejected`, `promotion.started`, `promotion.completed`, `promotion.failed`, `promotion.rolled_back` | + +**Key Operations**: +``` +RequestPromotion(releaseId, targetEnvironmentId, reason) → Promotion +ApprovePromotion(promotionId, comment) → Promotion +RejectPromotion(promotionId, reason) → Promotion +CancelPromotion(promotionId) → Promotion +GetPromotionStatus(promotionId) → PromotionState +GetDecisionRecord(promotionId) → DecisionRecord +``` + +**Promotion Entity**: +```typescript +interface Promotion { + id: UUID; + tenantId: UUID; + releaseId: UUID; + sourceEnvironmentId: UUID | null; // null for first deployment + 
targetEnvironmentId: UUID; + status: PromotionStatus; + decisionRecord: DecisionRecord; + workflowRunId: UUID | null; + requestedAt: DateTime; + requestedBy: UUID; + requestReason: string; + decidedAt: DateTime | null; + startedAt: DateTime | null; + completedAt: DateTime | null; + evidencePacketId: UUID | null; +} + +type PromotionStatus = + | "pending_approval" // Waiting for human approval + | "pending_gate" // Waiting for gate evaluation + | "approved" // Ready for deployment + | "rejected" // Blocked by approval or gate + | "deploying" // Deployment in progress + | "deployed" // Successfully deployed + | "failed" // Deployment failed + | "cancelled" // User cancelled + | "rolled_back"; // Rolled back after failure +``` + +--- + +### Module: `approval-gateway` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Approval collection; separation of duties enforcement | +| **Dependencies** | `authority` (for user/group lookup) | +| **Data Entities** | `Approval`, `ApprovalPolicy` | +| **Events Produced** | `approval.granted`, `approval.denied` | + +**Approval Policy Entity**: +```typescript +interface ApprovalPolicy { + id: UUID; + tenantId: UUID; + environmentId: UUID; + requiredCount: number; // Minimum approvals required + requiredRoles: string[]; // At least one approver must have role + requiredGroups: string[]; // At least one approver must be in group + requireSeparationOfDuties: boolean; // Requester cannot approve + allowSelfApproval: boolean; // Override SoD for specific users + expirationMinutes: number; // Approval expires after N minutes +} + +interface Approval { + id: UUID; + tenantId: UUID; + promotionId: UUID; + approverId: UUID; + action: "approved" | "rejected"; + comment: string; + approvedAt: DateTime; + approverRole: string; + approverGroups: string[]; +} +``` + +**Separation of Duties (SoD) Rules**: +1. Requester cannot approve their own promotion (if `requireSeparationOfDuties` is true) +2. 
Same user cannot approve twice +3. At least N different users must approve (based on `requiredCount`) +4. At least one approver must match `requiredRoles` if specified +5. At least one approver must be in `requiredGroups` if specified + +--- + +### Module: `decision-engine` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Gate evaluation; policy integration; decision record generation | +| **Dependencies** | `gate-registry`, `policy` (OPA integration), `scanner` (security data) | +| **Data Entities** | `DecisionRecord`, `GateResult` | +| **Events Produced** | `decision.evaluated`, `decision.recorded` | + +**Decision Record Structure**: +```typescript +interface DecisionRecord { + promotionId: UUID; + evaluatedAt: DateTime; + decision: "allow" | "deny" | "pending"; + + // What was evaluated + release: { + id: UUID; + name: string; + components: Array<{ + name: string; + digest: string; + semver: string; + }>; + }; + + environment: { + id: UUID; + name: string; + requiredApprovals: number; + freezeWindow: boolean; + }; + + // Gate evaluation results + gates: GateResult[]; + + // Approval status + approvalStatus: { + required: number; + received: number; + approvers: Array<{ + userId: UUID; + action: string; + at: DateTime; + }>; + sodViolation: boolean; + }; + + // Reason for decision + reasons: string[]; + + // Hash of all inputs for replay verification + inputsHash: string; +} + +interface GateResult { + gateType: string; + gateName: string; + status: "passed" | "failed" | "warning" | "skipped"; + message: string; + details: Record<string, any>; + evaluatedAt: DateTime; + durationMs: number; +} +``` + +**Gate Evaluation Order**: +1. **Freeze Window Check**: Is environment in freeze? +2. **Approval Check**: All required approvals received? +3. **Security Gate**: No blocking vulnerabilities? +4. **Custom Policy Gates**: All OPA policies pass? +5. **Integration Gates**: External system checks pass?
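The evaluation order above can be sketched as a short-circuiting loop over gate evaluators. This is an illustrative sketch only, not the `decision-engine` implementation: the `GATE_ORDER` constant (mapping the five steps to representative built-in gate types), the `evaluators` map, and the slimmed-down `GateOutcome` shape are assumptions for the example — the real `GateResult` carries more fields and evaluators come from `gate-registry`.

```typescript
// Illustrative sketch of the fixed gate evaluation order with short-circuiting.
type GateStatus = "passed" | "failed" | "warning" | "skipped";

interface GateOutcome {
  gateType: string;
  status: GateStatus;
  message: string;
}

// Representative built-in gate types, in the documented evaluation order.
const GATE_ORDER = ["freeze-window", "approval", "security-scan", "custom-opa", "webhook"];

function evaluateGates(
  evaluators: Record<string, () => GateOutcome>
): { decision: "allow" | "deny"; gates: GateOutcome[] } {
  const results: GateOutcome[] = [];
  let blocked = false;

  for (const gateType of GATE_ORDER) {
    if (blocked) {
      // Once a blocking gate fails, later gates are recorded as skipped.
      results.push({ gateType, status: "skipped", message: "not evaluated" });
      continue;
    }
    const outcome = evaluators[gateType]
      ? evaluators[gateType]()
      : { gateType, status: "skipped" as GateStatus, message: "no evaluator registered" };
    results.push(outcome);
    if (outcome.status === "failed") blocked = true;
  }

  return { decision: blocked ? "deny" : "allow", gates: results };
}
```

A blocking failure denies the promotion and marks every later gate `skipped`, which is how the `skipped` status in `GateResult` arises.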
+ +--- + +### Module: `gate-registry` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Built-in + custom gate registration | +| **Dependencies** | `plugin-registry` | +| **Data Entities** | `GateDefinition`, `GateConfig` | + +**Built-in Gates**: + +| Gate Type | Description | +|-----------|-------------| +| `freeze-window` | Check if environment is in freeze | +| `approval` | Check if required approvals received | +| `security-scan` | Check for blocking vulnerabilities | +| `scan-freshness` | Check if scan is recent enough | +| `digest-verification` | Verify digests haven't changed | +| `environment-sequence` | Enforce promotion order | +| `custom-opa` | Custom OPA/Rego policy | +| `webhook` | External webhook gate | + +**Gate Definition**: +```typescript +interface GateDefinition { + type: string; + displayName: string; + description: string; + configSchema: JSONSchema; + evaluator: "builtin" | UUID; // builtin or plugin ID + blocking: boolean; // Can block promotion + cacheable: boolean; // Can cache result + cacheTtlSeconds: number; +} +``` + +--- + +## Promotion State Machine + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ PROMOTION STATE MACHINE │ +│ │ +│ ┌───────────────┐ │ +│ │ REQUESTED │ ◄──── User requests promotion │ +│ └───────┬───────┘ │ +│ │ │ +│ ▼ │ +│ ┌───────────────┐ ┌───────────────┐ │ +│ │ PENDING │─────►│ REJECTED │ ◄──── Approver rejects │ +│ │ APPROVAL │ └───────────────┘ │ +│ └───────┬───────┘ │ +│ │ approval received │ +│ ▼ │ +│ ┌───────────────┐ ┌───────────────┐ │ +│ │ PENDING │─────►│ REJECTED │ ◄──── Gate fails │ +│ │ GATE │ └───────────────┘ │ +│ └───────┬───────┘ │ +│ │ all gates pass │ +│ ▼ │ +│ ┌───────────────┐ │ +│ │ APPROVED │ ◄──── Ready for deployment │ +│ └───────┬───────┘ │ +│ │ workflow starts │ +│ ▼ │ +│ ┌───────────────┐ ┌───────────────┐ ┌───────────────┐ │ +│ │ DEPLOYING │─────►│ FAILED │─────►│ ROLLED_BACK │ │ +│ └───────┬───────┘ 
└───────────────┘ └───────────────┘ │ +│ │ │ +│ │ deployment complete │ +│ ▼ │ +│ ┌───────────────┐ │ +│ │ DEPLOYED │ ◄──── Success! │ +│ └───────────────┘ │ +│ │ +│ Additional transitions: │ +│ - Any non-terminal → CANCELLED: user cancels │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +--- + +## Database Schema + +```sql +-- Promotions +CREATE TABLE release.promotions ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + release_id UUID NOT NULL REFERENCES release.releases(id), + source_environment_id UUID REFERENCES release.environments(id), + target_environment_id UUID NOT NULL REFERENCES release.environments(id), + status VARCHAR(50) NOT NULL DEFAULT 'pending_approval' CHECK (status IN ( + 'pending_approval', 'pending_gate', 'approved', 'rejected', + 'deploying', 'deployed', 'failed', 'cancelled', 'rolled_back' + )), + decision_record JSONB, + workflow_run_id UUID REFERENCES release.workflow_runs(id), + requested_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + requested_by UUID NOT NULL REFERENCES users(id), + request_reason TEXT, + decided_at TIMESTAMPTZ, + started_at TIMESTAMPTZ, + completed_at TIMESTAMPTZ, + evidence_packet_id UUID +); + +CREATE INDEX idx_promotions_tenant ON release.promotions(tenant_id); +CREATE INDEX idx_promotions_release ON release.promotions(release_id); +CREATE INDEX idx_promotions_status ON release.promotions(status); +CREATE INDEX idx_promotions_target_env ON release.promotions(target_environment_id); + +-- Approvals +CREATE TABLE release.approvals ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + promotion_id UUID NOT NULL REFERENCES release.promotions(id) ON DELETE CASCADE, + approver_id UUID NOT NULL REFERENCES users(id), + action VARCHAR(50) NOT NULL CHECK (action IN ('approved', 'rejected')), + comment TEXT, + approved_at TIMESTAMPTZ NOT NULL DEFAULT 
NOW(), + approver_role VARCHAR(255), + approver_groups JSONB NOT NULL DEFAULT '[]' +); + +CREATE INDEX idx_approvals_promotion ON release.approvals(promotion_id); +CREATE INDEX idx_approvals_approver ON release.approvals(approver_id); + +-- Approval Policies +CREATE TABLE release.approval_policies ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + environment_id UUID NOT NULL REFERENCES release.environments(id) ON DELETE CASCADE, + required_count INTEGER NOT NULL DEFAULT 1, + required_roles JSONB NOT NULL DEFAULT '[]', + required_groups JSONB NOT NULL DEFAULT '[]', + require_sod BOOLEAN NOT NULL DEFAULT FALSE, + allow_self_approval BOOLEAN NOT NULL DEFAULT FALSE, + expiration_minutes INTEGER NOT NULL DEFAULT 1440, + UNIQUE (tenant_id, environment_id) +); +``` + +--- + +## API Endpoints + +```yaml +# Promotions +POST /api/v1/promotions + Body: { releaseId, targetEnvironmentId, reason? } + Response: Promotion + +GET /api/v1/promotions + Query: ?status={status}&releaseId={uuid}&environmentId={uuid}&page={n} + Response: { data: Promotion[], meta: PaginationMeta } + +GET /api/v1/promotions/{id} + Response: Promotion (with decision record, approvals) + +POST /api/v1/promotions/{id}/approve + Body: { comment? 
} + Response: Promotion + +POST /api/v1/promotions/{id}/reject + Body: { reason } + Response: Promotion + +POST /api/v1/promotions/{id}/cancel + Response: Promotion + +GET /api/v1/promotions/{id}/decision + Response: DecisionRecord + +GET /api/v1/promotions/{id}/approvals + Response: Approval[] + +GET /api/v1/promotions/{id}/evidence + Response: EvidencePacket + +# Gate Evaluation Preview +POST /api/v1/promotions/preview-gates + Body: { releaseId, targetEnvironmentId } + Response: { wouldPass: boolean, gates: GateResult[] } + +# Approval Policies +POST /api/v1/approval-policies +GET /api/v1/approval-policies +GET /api/v1/approval-policies/{id} +PUT /api/v1/approval-policies/{id} +DELETE /api/v1/approval-policies/{id} + +# Pending Approvals (for current user) +GET /api/v1/my/pending-approvals + Response: Promotion[] +``` + +--- + +## Security Gate Integration + +The security gate evaluates the release against vulnerability data from the Scanner module: + +```typescript +interface SecurityGateConfig { + blockOnCritical: boolean; // Block if any critical severity + blockOnHigh: boolean; // Block if any high severity + maxCritical: number; // Max allowed critical (0 for strict) + maxHigh: number; // Max allowed high + requireFreshScan: boolean; // Require scan within N hours + scanFreshnessHours: number; // How recent scan must be + allowExceptions: boolean; // Allow VEX exceptions + requireVexJustification: boolean; // Require VEX for exceptions +} + +interface SecurityGateResult { + passed: boolean; + summary: { + critical: number; + high: number; + medium: number; + low: number; + }; + blocking: Array<{ + cve: string; + severity: string; + component: string; + digest: string; + fixAvailable: boolean; + }>; + exceptions: Array<{ + cve: string; + vexStatus: string; + justification: string; + }>; + scanAge: { + component: string; + scannedAt: DateTime; + ageHours: number; + fresh: boolean; + }[]; +} +``` + +--- + +## References + +- [Module Overview](overview.md) +- 
[Workflow Engine](workflow-engine.md) +- [Security Architecture](../security/overview.md) +- [API Documentation](../api/promotions.md) diff --git a/docs/modules/release-orchestrator/modules/release-manager.md b/docs/modules/release-orchestrator/modules/release-manager.md new file mode 100644 index 000000000..c43b68f47 --- /dev/null +++ b/docs/modules/release-orchestrator/modules/release-manager.md @@ -0,0 +1,406 @@ +# RELMAN: Release Management + +**Purpose**: Manage components, versions, and release bundles. + +## Modules + +### Module: `component-registry` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Map image repositories to logical components | +| **Dependencies** | `integration-manager` (for registry access) | +| **Data Entities** | `Component`, `ComponentVersion` | +| **Events Produced** | `component.created`, `component.updated`, `component.deleted` | + +**Key Operations**: +``` +CreateComponent(name, displayName, imageRepository, registryId) → Component +UpdateComponent(id, config) → Component +DeleteComponent(id) → void +SyncVersions(componentId, forceRefresh) → VersionMap[] +ListComponents(tenantId) → Component[] +``` + +**Component Entity**: +```typescript +interface Component { + id: UUID; + tenantId: UUID; + name: string; // "api", "worker", "frontend" + displayName: string; // "API Service" + imageRepository: string; // "registry.example.com/myapp/api" + registryIntegrationId: UUID; // which registry integration + versioningStrategy: VersionStrategy; + deploymentTemplate: string; // which workflow template to use + defaultChannel: string; // "stable", "beta" + metadata: Record<string, any>; +} + +interface VersionStrategy { + type: "semver" | "date" | "sequential" | "manual"; + tagPattern?: string; // regex for tag extraction + semverExtract?: string; // regex capture group +} +``` + +--- + +### Module: `version-manager` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Tag/digest mapping;
version rules | +| **Dependencies** | `component-registry`, `connector-runtime` | +| **Data Entities** | `VersionMap`, `VersionRule`, `Channel` | +| **Events Produced** | `version.resolved`, `version.updated` | + +**Version Resolution**: +```typescript +interface VersionMap { + id: UUID; + componentId: UUID; + tag: string; // "v2.3.1" + digest: string; // "sha256:abc123..." + semver: string; // "2.3.1" + channel: string; // "stable" + prerelease: boolean; + buildMetadata: string; + resolvedAt: DateTime; + source: "auto" | "manual"; +} + +interface VersionRule { + id: UUID; + componentId: UUID; + pattern: string; // "^v(\\d+\\.\\d+\\.\\d+)$" + channel: string; // "stable" + prereleasePattern: string; // ".*-(alpha|beta|rc).*" +} +``` + +**Version Resolution Algorithm**: +1. Fetch tags from registry (via connector) +2. Apply version rules to extract semver +3. Resolve each tag to digest +4. Store in version map +5. Update channels ("latest stable", "latest beta") + +--- + +### Module: `release-manager` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Release bundle lifecycle; composition | +| **Dependencies** | `component-registry`, `version-manager` | +| **Data Entities** | `Release`, `ReleaseComponent` | +| **Events Produced** | `release.created`, `release.promoted`, `release.deprecated` | + +**Release Entity**: +```typescript +interface Release { + id: UUID; + tenantId: UUID; + name: string; // "myapp-v2.3.1" + displayName: string; // "MyApp 2.3.1" + components: ReleaseComponent[]; + sourceRef: SourceReference; + status: ReleaseStatus; + createdAt: DateTime; + createdBy: UUID; + deployedEnvironments: UUID[]; // where currently deployed + metadata: Record<string, any>; +} + +interface ReleaseComponent { + componentId: UUID; + componentName: string; + digest: string; // sha256:...
+ semver: string; // resolved semver + tag: string; // original tag (for display) + role: "primary" | "sidecar" | "init" | "migration"; +} + +interface SourceReference { + scmIntegrationId?: UUID; + commitSha?: string; + branch?: string; + ciIntegrationId?: UUID; + buildId?: string; + pipelineUrl?: string; +} + +type ReleaseStatus = + | "draft" // being composed + | "ready" // ready for promotion + | "promoting" // promotion in progress + | "deployed" // deployed to at least one env + | "deprecated" // marked as deprecated + | "archived"; // no longer active +``` + +**Release Creation Modes**: + +| Mode | Description | +|------|-------------| +| **Full Release** | All components, latest versions | +| **Partial Release** | Subset of components updated; others pinned from last deployment | +| **Pinned Release** | All versions explicitly specified | +| **Channel Release** | All components from specific channel ("beta") | + +--- + +### Module: `release-catalog` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Release history, search, comparison | +| **Dependencies** | `release-manager` | + +**Key Operations**: +``` +SearchReleases(filter, pagination) → Release[] +CompareReleases(releaseA, releaseB) → ReleaseDiff +GetReleaseHistory(componentId) → Release[] +GetReleaseLineage(releaseId) → ReleaseLineage // promotion path +``` + +**Release Comparison**: +```typescript +interface ReleaseDiff { + releaseA: UUID; + releaseB: UUID; + added: ComponentDiff[]; // Components in B not in A + removed: ComponentDiff[]; // Components in A not in B + changed: ComponentChange[]; // Components with different versions + unchanged: ComponentDiff[]; // Components with same version +} + +interface ComponentChange { + componentId: UUID; + componentName: string; + fromVersion: string; + toVersion: string; + fromDigest: string; + toDigest: string; +} +``` + +--- + +## Database Schema + +```sql +-- Components +CREATE TABLE release.components ( + id UUID PRIMARY 
KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + name VARCHAR(255) NOT NULL, + display_name VARCHAR(255) NOT NULL, + image_repository VARCHAR(500) NOT NULL, + registry_integration_id UUID REFERENCES release.integrations(id), + versioning_strategy JSONB NOT NULL DEFAULT '{"type": "semver"}', + deployment_template VARCHAR(255), + default_channel VARCHAR(50) NOT NULL DEFAULT 'stable', + metadata JSONB NOT NULL DEFAULT '{}', + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + UNIQUE (tenant_id, name) +); + +CREATE INDEX idx_components_tenant ON release.components(tenant_id); + +-- Version Maps +CREATE TABLE release.version_maps ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + component_id UUID NOT NULL REFERENCES release.components(id) ON DELETE CASCADE, + tag VARCHAR(255) NOT NULL, + digest VARCHAR(100) NOT NULL, + semver VARCHAR(50), + channel VARCHAR(50) NOT NULL DEFAULT 'stable', + prerelease BOOLEAN NOT NULL DEFAULT FALSE, + build_metadata VARCHAR(255), + resolved_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + source VARCHAR(50) NOT NULL DEFAULT 'auto', + UNIQUE (tenant_id, component_id, digest) +); + +CREATE INDEX idx_version_maps_component ON release.version_maps(component_id); +CREATE INDEX idx_version_maps_digest ON release.version_maps(digest); +CREATE INDEX idx_version_maps_semver ON release.version_maps(semver); + +-- Releases +CREATE TABLE release.releases ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + name VARCHAR(255) NOT NULL, + display_name VARCHAR(255) NOT NULL, + components JSONB NOT NULL, -- [{componentId, digest, semver, tag, role}] + source_ref JSONB, -- {scmIntegrationId, commitSha, ciIntegrationId, buildId} + status VARCHAR(50) NOT NULL DEFAULT 'draft', + metadata JSONB NOT NULL DEFAULT '{}', + created_at 
TIMESTAMPTZ NOT NULL DEFAULT NOW(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + created_by UUID REFERENCES users(id), + UNIQUE (tenant_id, name) +); + +CREATE INDEX idx_releases_tenant ON release.releases(tenant_id); +CREATE INDEX idx_releases_status ON release.releases(status); +CREATE INDEX idx_releases_created ON release.releases(created_at DESC); + +-- Release Environment State +CREATE TABLE release.release_environment_state ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + environment_id UUID NOT NULL REFERENCES release.environments(id) ON DELETE CASCADE, + release_id UUID NOT NULL REFERENCES release.releases(id), + status VARCHAR(50) NOT NULL, + deployed_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + deployed_by UUID REFERENCES users(id), + promotion_id UUID, + evidence_ref VARCHAR(255), + UNIQUE (tenant_id, environment_id) +); + +CREATE INDEX idx_release_env_state_env ON release.release_environment_state(environment_id); +CREATE INDEX idx_release_env_state_release ON release.release_environment_state(release_id); +``` + +--- + +## API Endpoints + +```yaml +# Components +POST /api/v1/components + Body: { name, displayName, imageRepository, registryIntegrationId, versioningStrategy?, defaultChannel? 
} + Response: Component + +GET /api/v1/components + Response: Component[] + +GET /api/v1/components/{id} + Response: Component + +PUT /api/v1/components/{id} + Response: Component + +DELETE /api/v1/components/{id} + Response: { deleted: true } + +POST /api/v1/components/{id}/sync-versions + Body: { forceRefresh?: boolean } + Response: { synced: number, versions: VersionMap[] } + +GET /api/v1/components/{id}/versions + Query: ?channel={stable|beta}&limit={n} + Response: VersionMap[] + +# Version Maps +POST /api/v1/version-maps + Body: { componentId, tag, semver, channel } # manual version assignment + Response: VersionMap + +GET /api/v1/version-maps + Query: ?componentId={uuid}&channel={channel} + Response: VersionMap[] + +# Releases +POST /api/v1/releases + Body: { + name: string, + displayName?: string, + components: [ + { componentId: UUID, version?: string, digest?: string, channel?: string } + ], + sourceRef?: SourceReference + } + Response: Release + +GET /api/v1/releases + Query: ?status={status}&componentId={uuid}&page={n}&pageSize={n} + Response: { data: Release[], meta: PaginationMeta } + +GET /api/v1/releases/{id} + Response: Release (with full component details) + +PUT /api/v1/releases/{id} + Body: { displayName?, metadata?, status? 
} + Response: Release + +DELETE /api/v1/releases/{id} + Response: { deleted: true } + +GET /api/v1/releases/{id}/state + Response: { environments: [{ environmentId, status, deployedAt }] } + +POST /api/v1/releases/{id}/deprecate + Response: Release + +GET /api/v1/releases/{id}/compare/{otherId} + Response: ReleaseDiff + +# Quick release creation +POST /api/v1/releases/from-latest + Body: { + name: string, + channel?: string, # default: stable + componentIds?: UUID[], # default: all + pinFrom?: { environmentId: UUID } # for partial release + } + Response: Release +``` + +--- + +## Release Identity: Digest-First Principle + +A core design invariant of the Release Orchestrator: + +``` +INVARIANT: A release is a set of OCI image digests (component -> digest mapping), never tags. +``` + +**Implementation Requirements**: +- Tags are convenience inputs for resolution +- Tags are resolved to digests at release creation time +- All downstream operations (promotion, deployment, rollback) use digests +- Digest mismatch at pull time = deployment failure (tamper detection) + +**Example**: +```json +{ + "id": "release-uuid", + "name": "myapp-v2.3.1", + "components": [ + { + "componentId": "api-component-uuid", + "componentName": "api", + "tag": "v2.3.1", + "digest": "sha256:abc123def456...", + "semver": "2.3.1", + "role": "primary" + }, + { + "componentId": "worker-component-uuid", + "componentName": "worker", + "tag": "v2.3.1", + "digest": "sha256:789xyz123abc...", + "semver": "2.3.1", + "role": "primary" + } + ] +} +``` + +--- + +## References + +- [Module Overview](overview.md) +- [Design Principles](../design/principles.md) +- [API Documentation](../api/releases.md) +- [Promotion Manager](promotion-manager.md) diff --git a/docs/modules/release-orchestrator/modules/workflow-engine.md b/docs/modules/release-orchestrator/modules/workflow-engine.md new file mode 100644 index 000000000..4dcafc894 --- /dev/null +++ b/docs/modules/release-orchestrator/modules/workflow-engine.md @@ 
-0,0 +1,590 @@ +# WORKFL: Workflow Engine + +**Purpose**: DAG-based workflow execution for deployments, approvals, and custom automation. + +## Modules + +### Module: `workflow-designer` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Template creation; DAG graph editor; validation | +| **Dependencies** | `step-registry` | +| **Data Entities** | `WorkflowTemplate`, `StepNode`, `StepEdge` | + +**Workflow Template Structure**: +```typescript +interface WorkflowTemplate { + id: UUID; + tenantId: UUID; + name: string; + displayName: string; + description: string; + version: number; + + // DAG structure + nodes: StepNode[]; + edges: StepEdge[]; + + // I/O + inputs: InputDefinition[]; + outputs: OutputDefinition[]; + + // Metadata + tags: string[]; + isBuiltin: boolean; + createdAt: DateTime; + createdBy: UUID; +} +``` + +--- + +### Module: `workflow-engine` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | DAG execution; state machine; pause/resume | +| **Dependencies** | `step-executor`, `step-registry` | +| **Data Entities** | `WorkflowRun`, `WorkflowState` | +| **Events Produced** | `workflow.started`, `workflow.paused`, `workflow.resumed`, `workflow.completed`, `workflow.failed` | + +**Workflow Execution Algorithm**: +```python +class WorkflowEngine: + def execute(self, workflow_run: WorkflowRun) -> None: + """Main workflow execution loop.""" + + # Initialize + workflow_run.status = "running" + workflow_run.started_at = now() + self.save(workflow_run) + + try: + while not self.is_terminal(workflow_run): + # Handle pause state + if workflow_run.status == "paused": + self.wait_for_resume(workflow_run) + continue + + # Get nodes ready for execution + ready_nodes = self.get_ready_nodes(workflow_run) + + if not ready_nodes: + # Check if we're waiting on approvals + if self.has_pending_approvals(workflow_run): + workflow_run.status = "paused" + self.save(workflow_run) + continue + + # Check if all nodes are 
complete + if self.all_nodes_complete(workflow_run): + break + + # Deadlock detection + raise WorkflowDeadlockError(workflow_run.id) + + # Execute ready nodes in parallel + futures = [] + for node in ready_nodes: + future = self.executor.submit( + self.execute_node, + workflow_run, + node + ) + futures.append((node, future)) + + # Wait for at least one to complete + completed = self.wait_any(futures) + + for node, result in completed: + step_run = self.get_step_run(workflow_run, node.id) + + if result.success: + step_run.status = "succeeded" + step_run.outputs = result.outputs + self.propagate_outputs(workflow_run, node, result.outputs) + else: + step_run.status = "failed" + step_run.error_message = result.error + + # Handle failure action + if node.on_failure == "fail": + workflow_run.status = "failed" + workflow_run.error_message = f"Step {node.name} failed: {result.error}" + self.cancel_pending_steps(workflow_run) + # Persist terminal state before exiting the run + step_run.completed_at = now() + self.save(step_run) + workflow_run.completed_at = now() + self.save(workflow_run) + return + elif node.on_failure == "rollback": + self.trigger_rollback(workflow_run, node) + elif node.on_failure.startswith("goto:"): + target = node.on_failure.split(":")[1] + self.add_ready_node(workflow_run, target) + # "continue" just continues to next nodes + + step_run.completed_at = now() + self.save(step_run) + + # Workflow completed successfully + workflow_run.status = "succeeded" + workflow_run.completed_at = now() + self.save(workflow_run) + + except WorkflowCancelledError: + workflow_run.status = "cancelled" + workflow_run.completed_at = now() + self.save(workflow_run) + except Exception as e: + workflow_run.status = "failed" + workflow_run.error_message = str(e) + workflow_run.completed_at = now() + self.save(workflow_run) +``` + +--- + +### Module: `step-executor` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Step dispatch; retry logic; timeout handling | +| **Dependencies** | `step-registry`, `plugin-sandbox` | +| **Data Entities** | `StepRun`, `StepResult` | +| **Events Produced** | `step.started`,
`step.progress`, `step.completed`, `step.failed`, `step.retrying` | + +**Step Node Structure**: +```typescript +interface StepNode { + id: string; // Unique within template (e.g., "deploy-api") + type: string; // Step type from registry + name: string; // Display name + config: Record; // Step-specific configuration + inputs: InputBinding[]; // Input value bindings + outputs: OutputBinding[]; // Output declarations + position: { x: number; y: number }; // UI position + + // Execution settings + timeout: number; // Seconds (default from step type) + retryPolicy: RetryPolicy; + onFailure: FailureAction; + condition?: string; // JS expression for conditional execution + + // Documentation + description?: string; + documentation?: string; +} + +type FailureAction = "fail" | "continue" | "rollback" | "goto:{nodeId}"; + +interface InputBinding { + name: string; // Input parameter name + source: InputSource; +} + +type InputSource = + | { type: "literal"; value: any } + | { type: "context"; path: string } // e.g., "release.name" + | { type: "output"; nodeId: string; outputName: string } + | { type: "secret"; secretName: string } + | { type: "expression"; expression: string }; // JS expression + +interface StepEdge { + id: string; + from: string; // Source node ID + to: string; // Target node ID + condition?: string; // Optional condition expression + label?: string; // Display label for conditional edges +} + +interface RetryPolicy { + maxRetries: number; + backoffType: "fixed" | "exponential"; + backoffSeconds: number; + retryableErrors: string[]; +} +``` + +--- + +### Module: `step-registry` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Built-in + plugin-provided step types | +| **Dependencies** | `plugin-registry` | +| **Data Entities** | `StepType`, `StepSchema` | + +**Built-in Step Types**: + +| Step Type | Category | Description | +|-----------|----------|-------------| +| `approval` | Control | Wait for human approval | +| 
`security-gate` | Gate | Evaluate security policy | +| `custom-gate` | Gate | Custom OPA policy evaluation | +| `deploy-docker` | Deploy | Deploy single container | +| `deploy-compose` | Deploy | Deploy Docker Compose stack | +| `deploy-ecs` | Deploy | Deploy to AWS ECS | +| `deploy-nomad` | Deploy | Deploy to HashiCorp Nomad | +| `health-check` | Verify | HTTP/TCP health check | +| `smoke-test` | Verify | Run smoke test suite | +| `execute-script` | Custom | Run C#/Bash script | +| `webhook` | Integration | Call external webhook | +| `trigger-ci` | Integration | Trigger CI pipeline | +| `wait-ci` | Integration | Wait for CI pipeline | +| `notify` | Notification | Send notification | +| `rollback` | Recovery | Rollback deployment | +| `traffic-shift` | Progressive | Shift traffic percentage | + +**Step Type Definition**: +```typescript +interface StepType { + type: string; // "deploy-compose" + displayName: string; // "Deploy Compose Stack" + description: string; + category: StepCategory; + icon: string; + + // Schema + configSchema: JSONSchema; // Step configuration schema + inputSchema: JSONSchema; // Required inputs schema + outputSchema: JSONSchema; // Produced outputs schema + + // Execution + executor: "builtin" | UUID; // builtin or plugin ID + defaultTimeout: number; + safeToRetry: boolean; + retryableErrors: string[]; + + // Documentation + documentation: string; + examples: StepExample[]; +} +``` + +--- + +## Workflow Run State Machine + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ WORKFLOW RUN STATE MACHINE │ +│ │ +│ ┌──────────┐ │ +│ │ CREATED │ │ +│ └────┬─────┘ │ +│ │ start() │ +│ ▼ │ +│ ┌─────────────────────────────┐ │ +│ │ │ │ +│ pause() ┌──┴──────────┐ │ │ +│ ┌────────►│ PAUSED │◄─────────┐ │ │ +│ │ └──────┬──────┘ │ │ │ +│ │ │ resume() │ │ │ +│ │ ▼ │ │ │ +│ │ ┌─────────────┐ │ │ │ +│ └─────────│ RUNNING │──────────┘ │ │ +│ └──────┬──────┘ (waiting for │ │ +│ │ approval) │ │ +│ 
┌────────────┼────────────┐ │ │ +│ │ │ │ │ │ +│ ▼ ▼ ▼ │ │ +│ ┌───────────┐ ┌───────────┐ ┌───────────┐ │ │ +│ │ SUCCEEDED │ │ FAILED │ │ CANCELLED │ │ │ +│ └───────────┘ └───────────┘ └───────────┘ │ │ +│ │ +│ Transitions: │ +│ - CREATED → RUNNING: start() │ +│ - RUNNING → PAUSED: pause(), waiting approval │ +│ - PAUSED → RUNNING: resume(), approval granted │ +│ - RUNNING → SUCCEEDED: all nodes complete │ +│ - RUNNING → FAILED: node fails with fail action │ +│ - RUNNING → CANCELLED: cancel() │ +│ - PAUSED → CANCELLED: cancel() │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +## Step Run State Machine + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ STEP RUN STATE MACHINE │ +│ │ +│ ┌──────────┐ │ +│ │ PENDING │ ◄──── Initial state; dependencies not met │ +│ └────┬─────┘ │ +│ │ dependencies met + condition true │ +│ ▼ │ +│ ┌──────────┐ │ +│ │ RUNNING │ ◄──── Step is executing │ +│ └────┬─────┘ │ +│ │ │ +│ ┌────┴────────────────┬─────────────────┐ │ +│ │ │ │ │ +│ ▼ ▼ ▼ │ +│ ┌───────────┐ ┌───────────┐ ┌───────────┐ │ +│ │ SUCCEEDED │ │ FAILED │ │ SKIPPED │ │ +│ └───────────┘ └─────┬─────┘ └───────────┘ │ +│ │ ▲ │ +│ │ │ condition false │ +│ ▼ │ │ +│ ┌───────────┐ │ │ +│ │ RETRYING │──────┘ (max retries exceeded) │ +│ └─────┬─────┘ │ +│ │ │ +│ │ retry attempt │ +│ └──────────────────┐ │ +│ │ │ +│ ▼ │ +│ ┌──────────┐ │ +│ │ RUNNING │ (retry) │ +│ └──────────┘ │ +│ │ +│ Additional transitions: │ +│ - Any state → CANCELLED: workflow cancelled │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +--- + +## Database Schema + +```sql +-- Workflow Templates +CREATE TABLE release.workflow_templates ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID REFERENCES tenants(id) ON DELETE CASCADE, + name VARCHAR(255) NOT NULL, + display_name VARCHAR(255) NOT NULL, + description TEXT, + version INTEGER NOT NULL DEFAULT 1, + nodes JSONB NOT 
NULL, + edges JSONB NOT NULL, + inputs JSONB NOT NULL DEFAULT '[]', + outputs JSONB NOT NULL DEFAULT '[]', + tags JSONB NOT NULL DEFAULT '[]', + is_builtin BOOLEAN NOT NULL DEFAULT FALSE, + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + created_by UUID REFERENCES users(id) +); + +CREATE INDEX idx_workflow_templates_tenant ON release.workflow_templates(tenant_id); +CREATE INDEX idx_workflow_templates_name ON release.workflow_templates(name); + +-- Workflow Runs +CREATE TABLE release.workflow_runs ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + template_id UUID NOT NULL REFERENCES release.workflow_templates(id), + template_version INTEGER NOT NULL, + status VARCHAR(50) NOT NULL DEFAULT 'created', + context JSONB NOT NULL, + inputs JSONB NOT NULL DEFAULT '{}', + outputs JSONB NOT NULL DEFAULT '{}', + error_message TEXT, + started_at TIMESTAMPTZ, + completed_at TIMESTAMPTZ, + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + created_by UUID REFERENCES users(id) +); + +CREATE INDEX idx_workflow_runs_tenant ON release.workflow_runs(tenant_id); +CREATE INDEX idx_workflow_runs_template ON release.workflow_runs(template_id); +CREATE INDEX idx_workflow_runs_status ON release.workflow_runs(status); + +-- Step Runs +CREATE TABLE release.step_runs ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + workflow_run_id UUID NOT NULL REFERENCES release.workflow_runs(id) ON DELETE CASCADE, + node_id VARCHAR(255) NOT NULL, + status VARCHAR(50) NOT NULL DEFAULT 'pending', + inputs JSONB NOT NULL DEFAULT '{}', + outputs JSONB NOT NULL DEFAULT '{}', + error_message TEXT, + logs TEXT, + attempt_number INTEGER NOT NULL DEFAULT 1, + started_at TIMESTAMPTZ, + completed_at TIMESTAMPTZ, + UNIQUE (workflow_run_id, node_id) +); + +CREATE INDEX idx_step_runs_workflow ON release.step_runs(workflow_run_id); +CREATE INDEX idx_step_runs_status ON release.step_runs(status); + +-- 
Step Registry +CREATE TABLE release.step_types ( + type VARCHAR(255) PRIMARY KEY, + display_name VARCHAR(255) NOT NULL, + description TEXT, + category VARCHAR(100) NOT NULL, + icon VARCHAR(255), + config_schema JSONB NOT NULL, + input_schema JSONB NOT NULL, + output_schema JSONB NOT NULL, + executor VARCHAR(255) NOT NULL DEFAULT 'builtin', + default_timeout INTEGER NOT NULL DEFAULT 300, + safe_to_retry BOOLEAN NOT NULL DEFAULT FALSE, + retryable_errors JSONB NOT NULL DEFAULT '[]', + documentation TEXT, + examples JSONB NOT NULL DEFAULT '[]', + plugin_id UUID REFERENCES release.plugins(id), + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW() +); + +CREATE INDEX idx_step_types_category ON release.step_types(category); +CREATE INDEX idx_step_types_plugin ON release.step_types(plugin_id); +``` + +--- + +## Workflow Template Example: Standard Deployment + +```json +{ + "id": "template-standard-deploy", + "name": "standard-deploy", + "displayName": "Standard Deployment", + "version": 1, + "inputs": [ + { "name": "releaseId", "type": "uuid", "required": true }, + { "name": "environmentId", "type": "uuid", "required": true }, + { "name": "promotionId", "type": "uuid", "required": true } + ], + "nodes": [ + { + "id": "approval", + "type": "approval", + "name": "Approval Gate", + "config": {}, + "inputs": [ + { "name": "promotionId", "source": { "type": "context", "path": "promotionId" } } + ], + "position": { "x": 100, "y": 100 } + }, + { + "id": "security-gate", + "type": "security-gate", + "name": "Security Verification", + "config": { + "blockOnCritical": true, + "blockOnHigh": true + }, + "inputs": [ + { "name": "releaseId", "source": { "type": "context", "path": "releaseId" } } + ], + "position": { "x": 100, "y": 200 } + }, + { + "id": "deploy-targets", + "type": "deploy-compose", + "name": "Deploy to Targets", + "config": { + "strategy": "rolling", + "parallelism": 2 + }, + "inputs": [ + { "name": "releaseId", "source": { "type": "context", "path": "releaseId" } }, + { 
"name": "environmentId", "source": { "type": "context", "path": "environmentId" } } + ], + "timeout": 600, + "retryPolicy": { + "maxRetries": 2, + "backoffType": "exponential", + "backoffSeconds": 30 + }, + "onFailure": "rollback", + "position": { "x": 100, "y": 400 } + }, + { + "id": "health-check", + "type": "health-check", + "name": "Health Verification", + "config": { + "type": "http", + "path": "/health", + "expectedStatus": 200, + "timeout": 30, + "retries": 5 + }, + "inputs": [ + { "name": "targets", "source": { "type": "output", "nodeId": "deploy-targets", "outputName": "deployedTargets" } } + ], + "onFailure": "rollback", + "position": { "x": 100, "y": 500 } + }, + { + "id": "notify-success", + "type": "notify", + "name": "Success Notification", + "config": { + "channel": "slack", + "template": "deployment-success" + }, + "onFailure": "continue", + "position": { "x": 100, "y": 700 } + }, + { + "id": "rollback-handler", + "type": "rollback", + "name": "Rollback Handler", + "config": { + "strategy": "to-previous" + }, + "inputs": [ + { "name": "deploymentJobId", "source": { "type": "output", "nodeId": "deploy-targets", "outputName": "jobId" } } + ], + "position": { "x": 300, "y": 450 } + } + ], + "edges": [ + { "id": "e1", "from": "approval", "to": "security-gate" }, + { "id": "e2", "from": "security-gate", "to": "deploy-targets" }, + { "id": "e3", "from": "deploy-targets", "to": "health-check" }, + { "id": "e4", "from": "health-check", "to": "notify-success" }, + { "id": "e5", "from": "deploy-targets", "to": "rollback-handler", "condition": "status === 'failed'" }, + { "id": "e6", "from": "health-check", "to": "rollback-handler", "condition": "status === 'failed'" } + ] +} +``` + +--- + +## API Endpoints + +See [API Documentation](../api/workflows.md) for full specification. 
+ +```yaml +# Workflow Templates +POST /api/v1/workflow-templates +GET /api/v1/workflow-templates +GET /api/v1/workflow-templates/{id} +PUT /api/v1/workflow-templates/{id} +DELETE /api/v1/workflow-templates/{id} +POST /api/v1/workflow-templates/{id}/validate + +# Step Registry +GET /api/v1/step-types +GET /api/v1/step-types/{type} + +# Workflow Runs +POST /api/v1/workflow-runs +GET /api/v1/workflow-runs +GET /api/v1/workflow-runs/{id} +POST /api/v1/workflow-runs/{id}/pause +POST /api/v1/workflow-runs/{id}/resume +POST /api/v1/workflow-runs/{id}/cancel +GET /api/v1/workflow-runs/{id}/steps +GET /api/v1/workflow-runs/{id}/steps/{nodeId} +GET /api/v1/workflow-runs/{id}/steps/{nodeId}/logs +GET /api/v1/workflow-runs/{id}/steps/{nodeId}/artifacts +``` + +--- + +## References + +- [Module Overview](overview.md) +- [Workflow Templates](../workflow/templates.md) +- [Execution State Machine](../workflow/execution.md) +- [API Documentation](../api/workflows.md) diff --git a/docs/modules/release-orchestrator/operations/metrics.md b/docs/modules/release-orchestrator/operations/metrics.md new file mode 100644 index 000000000..827b5ed7d --- /dev/null +++ b/docs/modules/release-orchestrator/operations/metrics.md @@ -0,0 +1,274 @@ +# Metrics Specification + +## Overview + +Release Orchestrator exposes Prometheus-compatible metrics for monitoring deployment health, performance, and operational status. 
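The metrics below follow the standard Prometheus text exposition format. A minimal hand-rolled sketch of a labeled counter, to illustrate what the `/metrics` output looks like (illustrative only — a real service would use an established client library such as prom-client rather than this):

```typescript
// Minimal Prometheus text-format exposition for a labeled counter.
class LabeledCounter {
  private values = new Map<string, number>();

  constructor(
    private name: string,
    private help: string,
    private labelNames: string[]
  ) {}

  inc(labels: Record<string, string>, delta = 1): void {
    // Serialize labels in declaration order to form a stable series key.
    const key = this.labelNames.map(l => `${l}="${labels[l]}"`).join(",");
    this.values.set(key, (this.values.get(key) ?? 0) + delta);
  }

  expose(): string {
    const lines = [`# HELP ${this.name} ${this.help}`, `# TYPE ${this.name} counter`];
    this.values.forEach((v, key) => lines.push(`${this.name}{${key}} ${v}`));
    return lines.join("\n");
  }
}
```

Incrementing `stella_releases_total` twice for one tenant would yield an exposition line such as `stella_releases_total{tenant="t1",status="active"} 2` beneath its `# HELP` and `# TYPE` headers.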
+ +## Core Metrics + +### Release Metrics + +| Metric | Type | Description | Labels | +|--------|------|-------------|--------| +| `stella_releases_total` | counter | Total releases created | `tenant`, `status` | +| `stella_releases_active` | gauge | Currently active releases | `tenant`, `status` | +| `stella_release_components_count` | histogram | Components per release | `tenant` | + +### Promotion Metrics + +| Metric | Type | Description | Labels | +|--------|------|-------------|--------| +| `stella_promotions_total` | counter | Total promotions | `tenant`, `env`, `status` | +| `stella_promotions_in_progress` | gauge | Promotions currently in progress | `tenant`, `env` | +| `stella_promotion_duration_seconds` | histogram | Time from request to completion | `tenant`, `env`, `status` | +| `stella_approval_pending_count` | gauge | Pending approvals | `tenant`, `env` | +| `stella_approval_duration_seconds` | histogram | Time to approve | `tenant`, `env` | + +### Deployment Metrics + +| Metric | Type | Description | Labels | +|--------|------|-------------|--------| +| `stella_deployments_total` | counter | Total deployments | `tenant`, `env`, `strategy`, `status` | +| `stella_deployment_duration_seconds` | histogram | Deployment duration | `tenant`, `env`, `strategy` | +| `stella_deployment_tasks_total` | counter | Total deployment tasks | `tenant`, `status` | +| `stella_deployment_task_duration_seconds` | histogram | Task duration | `target_type` | +| `stella_rollbacks_total` | counter | Total rollbacks | `tenant`, `env`, `reason` | + +### Agent Metrics + +| Metric | Type | Description | Labels | +|--------|------|-------------|--------| +| `stella_agents_connected` | gauge | Connected agents | `tenant` | +| `stella_agents_by_status` | gauge | Agents by status | `tenant`, `status` | +| `stella_agent_tasks_total` | counter | Tasks executed by agents | `agent`, `type`, `status` | +| `stella_agent_task_duration_seconds` | histogram | Agent task duration | `agent`, 
`type` | +| `stella_agent_heartbeat_age_seconds` | gauge | Seconds since last heartbeat | `agent` | +| `stella_agent_resource_cpu_percent` | gauge | Agent CPU usage | `agent` | +| `stella_agent_resource_memory_percent` | gauge | Agent memory usage | `agent` | + +### Workflow Metrics + +| Metric | Type | Description | Labels | +|--------|------|-------------|--------| +| `stella_workflow_runs_total` | counter | Workflow executions | `tenant`, `template`, `status` | +| `stella_workflow_runs_active` | gauge | Currently running workflows | `tenant`, `template` | +| `stella_workflow_duration_seconds` | histogram | Workflow duration | `template`, `status` | +| `stella_workflow_step_duration_seconds` | histogram | Step execution time | `step_type`, `status` | +| `stella_workflow_step_retries_total` | counter | Step retry count | `step_type` | + +### Target Metrics + +| Metric | Type | Description | Labels | +|--------|------|-------------|--------| +| `stella_targets_total` | gauge | Total targets | `tenant`, `env`, `type` | +| `stella_targets_by_health` | gauge | Targets by health status | `tenant`, `env`, `health` | +| `stella_target_drift_detected` | gauge | Targets with drift | `tenant`, `env` | + +### Integration Metrics + +| Metric | Type | Description | Labels | +|--------|------|-------------|--------| +| `stella_integrations_total` | gauge | Configured integrations | `tenant`, `type` | +| `stella_integration_health` | gauge | Integration health (1=healthy) | `tenant`, `integration` | +| `stella_integration_requests_total` | counter | Requests to integrations | `integration`, `operation`, `status` | +| `stella_integration_latency_seconds` | histogram | Integration request latency | `integration`, `operation` | + +### Gate Metrics + +| Metric | Type | Description | Labels | +|--------|------|-------------|--------| +| `stella_gate_evaluations_total` | counter | Gate evaluations | `tenant`, `gate_type`, `result` | +| `stella_gate_evaluation_duration_seconds` | 
histogram | Gate evaluation time | `gate_type` | +| `stella_gate_blocks_total` | counter | Blocked promotions by gate | `tenant`, `gate_type`, `env` | + +## API Metrics + +| Metric | Type | Description | Labels | +|--------|------|-------------|--------| +| `stella_http_requests_total` | counter | HTTP requests | `method`, `path`, `status` | +| `stella_http_request_duration_seconds` | histogram | Request latency | `method`, `path` | +| `stella_http_requests_in_flight` | gauge | Active requests | `method` | +| `stella_http_request_size_bytes` | histogram | Request size | `method`, `path` | +| `stella_http_response_size_bytes` | histogram | Response size | `method`, `path` | + +## Evidence Metrics + +| Metric | Type | Description | Labels | +|--------|------|-------------|--------| +| `stella_evidence_packets_total` | counter | Evidence packets generated | `tenant`, `type` | +| `stella_evidence_packet_size_bytes` | histogram | Evidence packet size | `type` | +| `stella_evidence_verification_total` | counter | Evidence verifications | `result` | + +## Prometheus Configuration + +```yaml +# prometheus.yml +global: + scrape_interval: 15s + evaluation_interval: 15s + +scrape_configs: + - job_name: 'stella-orchestrator' + static_configs: + - targets: ['stella-orchestrator:9090'] + metrics_path: /metrics + scheme: https + tls_config: + ca_file: /etc/prometheus/ca.crt + + - job_name: 'stella-agents' + kubernetes_sd_configs: + - role: pod + selectors: + - role: pod + label: "app.kubernetes.io/name=stella-agent" + relabel_configs: + - source_labels: [__meta_kubernetes_pod_label_agent_id] + target_label: agent_id +``` + +## Histogram Buckets + +### Duration Buckets (seconds) + +```yaml +# Short operations (API calls, gate evaluations) +short_duration_buckets: [0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10] + +# Medium operations (workflow steps) +medium_duration_buckets: [0.1, 0.5, 1, 2.5, 5, 10, 30, 60, 120, 300] + +# Long operations (deployments) +long_duration_buckets: 
[1, 5, 10, 30, 60, 120, 300, 600, 1200, 3600] +``` + +### Size Buckets (bytes) + +```yaml +# Request/response sizes +size_buckets: [100, 1000, 10000, 100000, 1000000, 10000000] + +# Evidence packet sizes +evidence_buckets: [1000, 10000, 100000, 500000, 1000000, 5000000] +``` + +## SLI Definitions + +### Availability SLI + +```promql +# API availability (99.9% target) +sum(rate(stella_http_requests_total{status!~"5.."}[5m])) +/ +sum(rate(stella_http_requests_total[5m])) +``` + +### Latency SLI + +```promql +# API latency P99 < 500ms +histogram_quantile(0.99, + sum(rate(stella_http_request_duration_seconds_bucket[5m])) by (le) +) +``` + +### Deployment Success SLI + +```promql +# Deployment success rate (99% target) +sum(rate(stella_deployments_total{status="succeeded"}[24h])) +/ +sum(rate(stella_deployments_total[24h])) +``` + +## Alert Rules + +```yaml +groups: + - name: stella-orchestrator + rules: + - alert: HighDeploymentFailureRate + expr: | + sum(rate(stella_deployments_total{status="failed"}[1h])) + / + sum(rate(stella_deployments_total[1h])) > 0.1 + for: 5m + labels: + severity: critical + annotations: + summary: High deployment failure rate + description: More than 10% of deployments failing in the last hour + + - alert: AgentOffline + expr: stella_agent_heartbeat_age_seconds > 120 + for: 2m + labels: + severity: warning + annotations: + summary: Agent {{ $labels.agent }} offline + description: Agent has not sent heartbeat for > 2 minutes + + - alert: PendingApprovalsStale + expr: | + stella_approval_pending_count > 0 + and + time() - stella_promotion_request_timestamp > 3600 + for: 5m + labels: + severity: warning + annotations: + summary: Stale pending approvals + description: Approvals pending for more than 1 hour + + - alert: IntegrationUnhealthy + expr: stella_integration_health == 0 + for: 5m + labels: + severity: warning + annotations: + summary: Integration {{ $labels.integration }} unhealthy + description: Integration health check failing + + - 
alert: HighAPILatency + expr: | + histogram_quantile(0.99, + sum(rate(stella_http_request_duration_seconds_bucket[5m])) by (le, path) + ) > 1 + for: 5m + labels: + severity: warning + annotations: + summary: High API latency on {{ $labels.path }} + description: P99 latency exceeds 1 second +``` + +## Grafana Dashboards + +### Main Dashboard Panels + +1. **Deployment Pipeline Overview** + - Promotions per environment (time series) + - Success/failure rates (gauge) + - Active deployments (stat) + +2. **Agent Health** + - Connected agents (stat) + - Agent status distribution (pie chart) + - Heartbeat age (table) + +3. **Gate Performance** + - Gate evaluation counts (bar chart) + - Block rate by gate type (time series) + - Evaluation latency (heatmap) + +4. **API Performance** + - Request rate (time series) + - Error rate (time series) + - Latency distribution (heatmap) + +## References + +- [Operations Overview](overview.md) +- [Logging](logging.md) +- [Tracing](tracing.md) +- [Alerting](alerting.md) diff --git a/docs/modules/release-orchestrator/operations/overview.md b/docs/modules/release-orchestrator/operations/overview.md new file mode 100644 index 000000000..d310f2137 --- /dev/null +++ b/docs/modules/release-orchestrator/operations/overview.md @@ -0,0 +1,508 @@ +# Operations Overview + +## Observability Stack + +Release Orchestrator provides comprehensive observability through metrics, logging, and distributed tracing. 
+ +``` + OBSERVABILITY ARCHITECTURE + + ┌─────────────────────────────────────────────────────────────────────────────┐ + │ RELEASE ORCHESTRATOR │ + │ │ + │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ + │ │ Metrics │ │ Logs │ │ Traces │ │ Events │ │ + │ │ Exporter │ │ Collector │ │ Exporter │ │ Publisher │ │ + │ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │ + │ │ │ │ │ │ + └─────────┼────────────────┼────────────────┼────────────────┼────────────────┘ + │ │ │ │ + ▼ ▼ ▼ ▼ + ┌─────────────────────────────────────────────────────────────────────────────┐ + │ OBSERVABILITY BACKENDS │ + │ │ + │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ + │ │ Prometheus │ │ Loki / │ │ Jaeger / │ │ Event │ │ + │ │ / Mimir │ │ Elasticsearch│ │ Tempo │ │ Bus │ │ + │ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │ + │ │ │ │ │ │ + │ └────────────────┴────────────────┴────────────────┘ │ + │ │ │ + │ ▼ │ + │ ┌─────────────────┐ │ + │ │ Grafana │ │ + │ │ Dashboards │ │ + │ └─────────────────┘ │ + │ │ + └─────────────────────────────────────────────────────────────────────────────┘ +``` + +## Metrics + +### Core Metrics + +| Metric | Type | Description | Labels | +|--------|------|-------------|--------| +| `stella_releases_total` | counter | Total releases created | `tenant`, `status` | +| `stella_promotions_total` | counter | Total promotions | `tenant`, `env`, `status` | +| `stella_deployments_total` | counter | Total deployments | `tenant`, `env`, `strategy` | +| `stella_deployment_duration_seconds` | histogram | Deployment duration | `tenant`, `env`, `strategy` | +| `stella_rollbacks_total` | counter | Total rollbacks | `tenant`, `env`, `reason` | +| `stella_agents_connected` | gauge | Connected agents | `tenant` | +| `stella_targets_total` | gauge | Total targets | `tenant`, `env`, `type` | +| `stella_workflow_runs_total` | counter | Workflow executions | `tenant`, `template`, `status` | +| 
`stella_workflow_step_duration_seconds` | histogram | Step execution time | `step_type` | +| `stella_approval_pending_count` | gauge | Pending approvals | `tenant`, `env` | +| `stella_approval_duration_seconds` | histogram | Time to approve | `tenant`, `env` | + +### API Metrics + +| Metric | Type | Description | Labels | +|--------|------|-------------|--------| +| `stella_http_requests_total` | counter | HTTP requests | `method`, `path`, `status` | +| `stella_http_request_duration_seconds` | histogram | Request latency | `method`, `path` | +| `stella_http_requests_in_flight` | gauge | Active requests | `method` | + +### Agent Metrics + +| Metric | Type | Description | Labels | +|--------|------|-------------|--------| +| `stella_agent_tasks_total` | counter | Tasks executed | `agent`, `type`, `status` | +| `stella_agent_task_duration_seconds` | histogram | Task duration | `agent`, `type` | +| `stella_agent_heartbeat_age_seconds` | gauge | Since last heartbeat | `agent` | + +### Prometheus Configuration + +```yaml +# prometheus.yml +scrape_configs: + - job_name: 'stella-orchestrator' + static_configs: + - targets: ['stella-orchestrator:9090'] + metrics_path: /metrics + scheme: https + tls_config: + ca_file: /etc/prometheus/ca.crt + + - job_name: 'stella-agents' + kubernetes_sd_configs: + - role: pod + selectors: + - role: pod + label: "app.kubernetes.io/name=stella-agent" + relabel_configs: + - source_labels: [__meta_kubernetes_pod_label_agent_id] + target_label: agent_id +``` + +## Logging + +### Log Format + +```json +{ + "timestamp": "2026-01-09T10:30:00.123Z", + "level": "info", + "message": "Deployment started", + "service": "deploy-orchestrator", + "version": "1.0.0", + "traceId": "abc123def456", + "spanId": "789ghi", + "tenantId": "tenant-uuid", + "correlationId": "corr-uuid", + "context": { + "deploymentJobId": "job-uuid", + "releaseId": "release-uuid", + "environmentId": "env-uuid" + } +} +``` + +### Log Levels + +| Level | Usage | +|-------|-------| +| 
`error` | Failures requiring attention | +| `warn` | Degraded operation, recoverable issues | +| `info` | Business events (deployment started, approval granted) | +| `debug` | Detailed operational info | +| `trace` | Very detailed debugging | + +### Structured Logging Configuration + +```typescript +// Logging configuration +const loggerConfig = { + level: process.env.LOG_LEVEL || 'info', + format: 'json', + outputs: [ + { + type: 'stdout', + format: 'json' + }, + { + type: 'file', + path: '/var/log/stella/orchestrator.log', + rotation: { + maxSize: '100MB', + maxFiles: 10 + } + } + ], + // Sensitive field masking + redact: [ + 'password', + 'token', + 'secret', + 'credentials', + 'authorization' + ] +}; +``` + +### Important Log Events + +| Event | Level | Description | +|-------|-------|-------------| +| `deployment.started` | info | Deployment job started | +| `deployment.completed` | info | Deployment successful | +| `deployment.failed` | error | Deployment failed | +| `rollback.initiated` | warn | Rollback triggered | +| `approval.granted` | info | Promotion approved | +| `approval.denied` | info | Promotion rejected | +| `agent.connected` | info | Agent came online | +| `agent.disconnected` | warn | Agent went offline | +| `security.gate.failed` | warn | Security check blocked | + +## Distributed Tracing + +### Trace Context Propagation + +```typescript +// Trace context in requests +interface TraceContext { + traceId: string; + spanId: string; + parentSpanId?: string; + sampled: boolean; + baggage?: Record<string, string>; +} + +// W3C Trace Context headers +// traceparent: 00-{traceId}-{spanId}-{flags} +// tracestate: stella=...
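
// The middleware below calls `this.parseTraceParent`; a minimal standalone
// sketch of such a helper (assumed here, not part of the spec) could be:
function parseTraceParent(
  header?: string
): { traceId: string; spanId: string; sampled: boolean } | undefined {
  if (!header) return undefined;
  const parts = header.split('-'); // version-traceId-spanId-flags
  if (parts.length !== 4) return undefined;
  const [, traceId, spanId, flags] = parts;
  // Bit 0 of the flags byte is the "sampled" flag per W3C Trace Context.
  return { traceId, spanId, sampled: (parseInt(flags, 16) & 0x01) === 1 };
}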
+ +// Example trace propagation +class TracingMiddleware { + handle(req: Request, res: Response, next: NextFunction): void { + const traceparent = req.headers['traceparent']; + const traceContext = this.parseTraceParent(traceparent); + + // Start span for this request + const span = this.tracer.startSpan('http.request', { + parent: traceContext, + attributes: { + 'http.method': req.method, + 'http.url': req.url, + 'http.user_agent': req.headers['user-agent'], + 'tenant.id': req.tenantId + } + }); + + // Attach to request for downstream use + req.span = span; + + res.on('finish', () => { + span.setAttribute('http.status_code', res.statusCode); + span.end(); + }); + + next(); + } +} +``` + +### Key Spans + +| Span Name | Description | Attributes | +|-----------|-------------|------------| +| `deployment.execute` | Full deployment | `release_id`, `environment` | +| `task.dispatch` | Task dispatch to agent | `target_id`, `agent_id` | +| `agent.execute` | Agent task execution | `task_type`, `duration` | +| `workflow.run` | Workflow execution | `template_id`, `status` | +| `workflow.step` | Individual step | `step_type`, `node_id` | +| `approval.wait` | Waiting for approval | `promotion_id`, `duration` | +| `gate.evaluate` | Gate evaluation | `gate_type`, `result` | + +### Jaeger Configuration + +```yaml +# jaeger-config.yaml +apiVersion: jaegertracing.io/v1 +kind: Jaeger +metadata: + name: stella-jaeger +spec: + strategy: production + collector: + maxReplicas: 5 + storage: + type: elasticsearch + options: + es: + server-urls: https://elasticsearch:9200 + secretName: jaeger-es-secret + ingress: + enabled: true +``` + +## Alerting + +### Alert Rules + +```yaml +# prometheus-rules.yaml +groups: + - name: stella.deployment + rules: + - alert: DeploymentFailureRateHigh + expr: | + sum(rate(stella_deployments_total{status="failed"}[5m])) / + sum(rate(stella_deployments_total[5m])) > 0.1 + for: 5m + labels: + severity: critical + annotations: + summary: "High deployment 
failure rate" + description: "More than 10% of deployments are failing" + + - alert: DeploymentDurationHigh + expr: | + histogram_quantile(0.95, sum(rate(stella_deployment_duration_seconds_bucket[5m])) by (le, tenant)) > 600 + for: 10m + labels: + severity: warning + annotations: + summary: "Deployment duration high" + description: "P95 deployment duration exceeds 10 minutes" + + - alert: RollbackRateHigh + expr: | + sum(rate(stella_rollbacks_total[1h])) > 3 + for: 5m + labels: + severity: warning + annotations: + summary: "High rollback rate" + description: "More than 3 rollbacks in the last hour" + + - name: stella.agents + rules: + - alert: AgentOffline + expr: | + stella_agent_heartbeat_age_seconds > 120 + for: 2m + labels: + severity: critical + annotations: + summary: "Agent offline" + description: "Agent {{ $labels.agent }} has not sent heartbeat for 2 minutes" + + - alert: AgentPoolLow + expr: | + count(stella_agents_connected{status="online"}) by (tenant) < 2 + for: 5m + labels: + severity: warning + annotations: + summary: "Low agent count" + description: "Fewer than 2 agents online for tenant {{ $labels.tenant }}" + + - name: stella.approvals + rules: + - alert: ApprovalBacklogHigh + expr: | + stella_approval_pending_count > 10 + for: 1h + labels: + severity: warning + annotations: + summary: "Approval backlog growing" + description: "More than 10 pending approvals for over an hour" + + - alert: ApprovalWaitLong + expr: | + histogram_quantile(0.90, stella_approval_duration_seconds_bucket) > 86400 + for: 1h + labels: + severity: info + annotations: + summary: "Long approval wait times" + description: "P90 approval wait time exceeds 24 hours" +``` + +### PagerDuty Integration + +```typescript +interface AlertManagerConfig { + receivers: [ + { + name: "stella-critical", + pagerduty_configs: [ + { + service_key: "${PAGERDUTY_SERVICE_KEY}", + severity: "critical" + } + ] + }, + { + name: "stella-warning", + slack_configs: [ + { + api_url: 
"${SLACK_WEBHOOK_URL}", + channel: "#stella-alerts", + send_resolved: true + } + ] + } + ], + route: { + receiver: "stella-warning", + routes: [ + { + match: { severity: "critical" }, + receiver: "stella-critical" + } + ] + } +} +``` + +## Dashboards + +### Deployment Dashboard + +Key panels: +- Deployment rate over time +- Success/failure ratio +- Average deployment duration +- Deployment duration histogram +- Active deployments by environment +- Recent deployment list + +### Agent Health Dashboard + +Key panels: +- Connected agents count +- Agent heartbeat status +- Tasks per agent +- Task success rate by agent +- Agent resource utilization + +### Approval Dashboard + +Key panels: +- Pending approvals count +- Approval response time +- Approvals by user +- Rejection reasons breakdown + +## Health Endpoints + +### Application Health + +```http +GET /health +``` + +Response: +```json +{ + "status": "healthy", + "version": "1.0.0", + "uptime": 86400, + "checks": { + "database": { "status": "healthy", "latency": 5 }, + "redis": { "status": "healthy", "latency": 2 }, + "vault": { "status": "healthy", "latency": 10 } + } +} +``` + +### Readiness Probe + +```http +GET /health/ready +``` + +### Liveness Probe + +```http +GET /health/live +``` + +## Performance Tuning + +### Database Connection Pool + +```typescript +const poolConfig = { + min: 5, + max: 20, + acquireTimeout: 30000, + idleTimeout: 600000, + connectionTimeout: 10000 +}; +``` + +### Cache Configuration + +```typescript +const cacheConfig = { + // Release cache + releases: { + ttl: 300, // 5 minutes + maxSize: 1000 + }, + // Target cache + targets: { + ttl: 60, // 1 minute + maxSize: 5000 + }, + // Workflow template cache + templates: { + ttl: 3600, // 1 hour + maxSize: 100 + } +}; +``` + +### Rate Limiting + +```typescript +const rateLimitConfig = { + // API rate limits + api: { + windowMs: 60000, // 1 minute + max: 1000, // requests per window + burst: 100 // burst allowance + }, + // Webhook rate limits + 
webhooks: { + windowMs: 60000, + max: 100 + }, + // Per-tenant limits + tenant: { + windowMs: 60000, + max: 500 + } +}; +``` + +## References + +- [Metrics Reference](metrics.md) +- [Logging Guide](logging.md) +- [Tracing Setup](tracing.md) +- [Alert Configuration](alerting.md) diff --git a/docs/modules/release-orchestrator/security/agent-security.md b/docs/modules/release-orchestrator/security/agent-security.md new file mode 100644 index 000000000..f2afabb27 --- /dev/null +++ b/docs/modules/release-orchestrator/security/agent-security.md @@ -0,0 +1,286 @@ +# Agent Security Model + +## Overview + +Agents are trusted components that execute deployment tasks on targets. Their security model ensures: +- Strong identity through mTLS certificates +- Minimal privilege through scoped task credentials +- Audit trail through signed task receipts +- Isolation through process sandboxing + +## Agent Registration Flow + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ AGENT REGISTRATION FLOW │ +│ │ +│ 1. Admin generates registration token (one-time use) │ +│ POST /api/v1/admin/agent-tokens │ +│ Response: { token: "reg_xxx", expiresAt: "..." } │ +│ │ +│ 2. Agent starts with registration token │ +│ ./stella-agent --register --token=reg_xxx │ +│ │ +│ 3. Agent requests mTLS certificate │ +│ POST /api/v1/agents/register │ +│ Headers: X-Registration-Token: reg_xxx │ +│ Body: { name, version, capabilities, csr } │ +│ Response: { agentId, certificate, caCertificate } │ +│ │ +│ 4. Agent establishes mTLS connection │ +│ Uses issued certificate for all subsequent requests │ +│ │ +│ 5. Agent requests short-lived JWT for task execution │ +│ POST /api/v1/agents/token (over mTLS) │ +│ Response: { token, expiresIn: 3600 } │ +│ │ +│ 6. 
Agent refreshes token before expiration │ +│ Token refresh only over mTLS connection │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +## mTLS Communication + +All agent-to-core communication uses mutual TLS: + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ AGENT COMMUNICATION SECURITY │ +│ │ +│ ┌──────────────┐ ┌──────────────┐ │ +│ │ AGENT │ │ STELLA CORE │ │ +│ └──────┬───────┘ └──────┬───────┘ │ +│ │ │ │ +│ │ mTLS (mutual TLS) │ │ +│ │ - Agent cert signed by Stella CA │ │ +│ │ - Server cert verified by Agent │ │ +│ │ - TLS 1.3 only │ │ +│ │ - Perfect forward secrecy │ │ +│ │◄────────────────────────────────────────►│ │ +│ │ │ │ +│ │ Encrypted payload │ │ +│ │ - Task payloads encrypted with │ │ +│ │ agent-specific key │ │ +│ │ - Logs encrypted in transit │ │ +│ │◄────────────────────────────────────────►│ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +### TLS Requirements + +| Requirement | Value | +|-------------|-------| +| Protocol | TLS 1.3 only | +| Cipher Suites | TLS_AES_256_GCM_SHA384, TLS_CHACHA20_POLY1305_SHA256 | +| Key Exchange | ECDHE with P-384 or X25519 | +| Certificate Key | RSA 4096-bit or ECDSA P-384 | +| Certificate Validity | 90 days (auto-renewed) | + +## Certificate Management + +### Certificate Structure + +```typescript +interface AgentCertificate { + subject: { + CN: string; // Agent name + O: string; // "Stella Ops" + OU: string; // Tenant ID + }; + serialNumber: string; + issuer: string; // Stella CA + validFrom: DateTime; + validTo: DateTime; + extensions: { + keyUsage: ["digitalSignature", "keyEncipherment"]; + extendedKeyUsage: ["clientAuth"]; + subjectAltName: string[]; // Agent ID as URI + }; +} +``` + +### Certificate Renewal + +Agents automatically renew certificates before expiration: +1. Agent detects certificate expiring within 30 days +2. Agent generates new CSR with same identity +3. 
Agent submits renewal request over existing mTLS connection +4. Authority issues new certificate +5. Agent transitions to new certificate seamlessly + +## Secrets Management + +Secrets are NEVER stored in the Stella database. Only vault references are stored. + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ SECRETS FLOW (NEVER STORED IN DB) │ +│ │ +│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ +│ │ VAULT │ │ STELLA CORE │ │ AGENT │ │ +│ │ (Source) │ │ (Broker) │ │ (Consumer) │ │ +│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │ +│ │ │ │ │ +│ │ │ Task requires secret │ │ +│ │ │ │ │ +│ │ Fetch with service │ │ │ +│ │ account token │ │ │ +│ │◄─────────────────────── │ │ +│ │ │ │ │ +│ │ Return secret │ │ │ +│ │ (wrapped, short TTL) │ │ │ +│ │────────────────────────► │ │ +│ │ │ │ │ +│ │ │ Embed in task payload │ │ +│ │ │ (encrypted) │ │ +│ │ │────────────────────────► │ +│ │ │ │ │ +│ │ │ │ Decrypt │ +│ │ │ │ Use for task │ +│ │ │ │ Discard │ +│ │ +│ Rules: │ +│ - Secrets NEVER stored in Stella database │ +│ - Only Vault references stored │ +│ - Secrets fetched at execution time only │ +│ - Secrets not logged (masked in logs) │ +│ - Secrets not persisted in agent memory beyond task scope │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +## Task Security + +### Task Assignment + +```typescript +interface AgentTask { + id: UUID; + type: TaskType; + targetId: UUID; + payload: TaskPayload; + credentials: EncryptedCredentials; // Encrypted with agent's public key + timeout: number; + priority: TaskPriority; + idempotencyKey: string; + assignedAt: DateTime; + expiresAt: DateTime; +} +``` + +### Credential Scoping + +Task credentials are: +- Scoped to specific target only +- Valid only for task duration +- Encrypted with agent's public key +- Logged when accessed (without values) + +### Task Execution Isolation + +Agents execute tasks with isolation: +```typescript +interface 
TaskExecutionContext { + // Process isolation + workingDirectory: string; // Unique per task + processUser: string; // Non-root user + networkNamespace: string; // If network isolation enabled + + // Resource limits + memoryLimit: number; // Bytes + cpuLimit: number; // Millicores + diskLimit: number; // Bytes + networkEgress: string[]; // Allowed destinations + + // Cleanup + cleanupOnComplete: boolean; + cleanupTimeout: number; +} +``` + +## Agent Capabilities + +Agents declare capabilities that determine what tasks they can execute: + +```typescript +interface AgentCapabilities { + docker?: DockerCapability; + compose?: ComposeCapability; + ssh?: SshCapability; + winrm?: WinrmCapability; + ecs?: EcsCapability; + nomad?: NomadCapability; +} + +interface DockerCapability { + version: string; + apiVersion: string; + runtimes: string[]; + registryAuth: boolean; +} + +interface ComposeCapability { + version: string; + fileFormats: string[]; +} +``` + +## Heartbeat Protocol + +```typescript +interface AgentHeartbeat { + agentId: UUID; + timestamp: DateTime; + status: "healthy" | "degraded"; + resourceUsage: { + cpuPercent: number; + memoryPercent: number; + diskPercent: number; + networkRxBytes: number; + networkTxBytes: number; + }; + activeTaskCount: number; + completedTasks: number; + failedTasks: number; + errors: string[]; + signature: string; // HMAC of heartbeat data +} +``` + +### Heartbeat Validation + +1. Verify signature matches expected HMAC +2. Check timestamp is within acceptable skew (30s) +3. Update agent status based on heartbeat content +4. Trigger alerts if heartbeat missing for >90s + +## Agent Revocation + +When an agent is compromised or decommissioned: + +1. Certificate added to CRL (Certificate Revocation List) +2. All pending tasks for agent cancelled +3. Agent removed from target assignments +4. Audit event logged +5. 
New agent can be registered with same name (new identity) + +## Security Checklist + +| Control | Implementation | +|---------|----------------| +| Identity | mTLS certificates signed by internal CA | +| Authentication | Certificate-based + short-lived JWT | +| Authorization | Task-scoped credentials | +| Encryption | TLS 1.3 for transport, envelope encryption for secrets | +| Isolation | Process sandboxing, resource limits | +| Audit | All task assignments and completions logged | +| Revocation | CRL for compromised agents | +| Secret handling | Vault integration, no persistence | + +## References + +- [Security Overview](overview.md) +- [Authentication & Authorization](auth.md) +- [Threat Model](threat-model.md) diff --git a/docs/modules/release-orchestrator/security/auth.md b/docs/modules/release-orchestrator/security/auth.md new file mode 100644 index 000000000..1ee0dc2f4 --- /dev/null +++ b/docs/modules/release-orchestrator/security/auth.md @@ -0,0 +1,305 @@ +# Authentication & Authorization + +## Authentication Methods + +### OAuth 2.0 for Human Users + +``` +┌──────────────────────────────────────────────────────────────────────────────┐ +│ OAUTH 2.0 AUTHORIZATION CODE FLOW │ +│ │ +│ ┌──────────┐ ┌──────────────┐ │ +│ │ Browser │ │ Authority │ │ +│ └────┬─────┘ └──────┬───────┘ │ +│ │ │ │ +│ │ 1. Login request │ │ +│ │ ────────────────────────────────────► │ │ +│ │ │ │ +│ │ 2. Redirect to IdP │ │ +│ │ ◄──────────────────────────────────── │ │ +│ │ │ │ +│ │ 3. User authenticates at IdP │ │ +│ │ ─────────────────────────────────► │ │ +│ │ │ │ +│ │ 4. IdP callback with code │ │ +│ │ ◄──────────────────────────────────── │ │ +│ │ │ │ +│ │ 5. Exchange code for tokens │ │ +│ │ ────────────────────────────────────► │ │ +│ │ │ │ +│ │ 6. 
Access token + refresh token │ │ +│ │ ◄──────────────────────────────────── │ │ +│ │ │ │ +└──────────────────────────────────────────────────────────────────────────────┘ +``` + +### mTLS for Agents + +Agents authenticate using mutual TLS with certificates issued by Stella's internal CA. + +**Registration Flow:** +1. Admin generates one-time registration token +2. Agent starts with registration token +3. Agent submits CSR (Certificate Signing Request) +4. Authority issues certificate signed by Stella CA +5. Agent uses certificate for all subsequent requests + +### API Keys for Service-to-Service + +External services can use API keys for programmatic access: +- Keys are tenant-scoped +- Keys can have restricted permissions +- Keys can have expiration dates +- Key usage is audited + +## JWT Token Structure + +### Access Token Claims + +```typescript +interface AccessTokenClaims { + // Standard claims + iss: string; // "https://authority.stella.local" + sub: string; // User ID + aud: string[]; // ["stella-api"] + exp: number; // Expiration timestamp + iat: number; // Issued at timestamp + jti: string; // Unique token ID + + // Custom claims + tenant_id: string; + roles: string[]; + permissions: Permission[]; + email?: string; + name?: string; +} +``` + +### Token Lifetimes + +| Token Type | Lifetime | Refresh | +|------------|----------|---------| +| Access Token | 15 minutes | Via refresh token | +| Refresh Token | 7 days | Rotated on use | +| Agent Token | 1 hour | Via mTLS connection | +| API Key | Configurable | Not refreshed | + +## Authorization Model + +### Resource Types + +```typescript +type ResourceType = + | "environment" + | "release" + | "promotion" + | "target" + | "agent" + | "workflow" + | "plugin" + | "integration" + | "evidence"; +``` + +### Action Types + +```typescript +type ActionType = + | "create" + | "read" + | "update" + | "delete" + | "execute" + | "approve" + | "deploy" + | "rollback"; +``` + +### Permission Structure + +```typescript 
+interface Permission {
+  resource: ResourceType;
+  action: ActionType;
+  scope?: PermissionScope;
+  conditions?: Condition[];
+}
+
+type PermissionScope =
+  | "*"                                  // All resources
+  | { environmentId: UUID }              // Specific environment
+  | { labels: Record<string, string> };  // Label-based
+```
+
+### Built-in Roles
+
+| Role | Description | Key Permissions |
+|------|-------------|-----------------|
+| `admin` | Full access | All permissions |
+| `release_manager` | Manage releases and promotions | Create releases, request promotions |
+| `deployer` | Execute deployments | Approve promotions (where allowed), view releases |
+| `approver` | Approve promotions | Approve promotions (SoD respected) |
+| `viewer` | Read-only access | Read all resources |
+| `agent` | Agent service account | Execute deployment tasks |
+
+### Role Definitions
+
+```typescript
+const roles = {
+  admin: {
+    permissions: [
+      { resource: "*", action: "*" }
+    ]
+  },
+  release_manager: {
+    permissions: [
+      { resource: "release", action: "create" },
+      { resource: "release", action: "read" },
+      { resource: "release", action: "update" },
+      { resource: "promotion", action: "create" },
+      { resource: "promotion", action: "read" },
+      { resource: "environment", action: "read" },
+      { resource: "workflow", action: "read" },
+      { resource: "workflow", action: "execute" }
+    ]
+  },
+  deployer: {
+    permissions: [
+      { resource: "release", action: "read" },
+      { resource: "promotion", action: "read" },
+      { resource: "promotion", action: "approve" },
+      { resource: "environment", action: "read" },
+      { resource: "target", action: "read" },
+      { resource: "agent", action: "read" }
+    ]
+  },
+  approver: {
+    permissions: [
+      { resource: "promotion", action: "read" },
+      { resource: "promotion", action: "approve" },
+      { resource: "release", action: "read" },
+      { resource: "environment", action: "read" }
+    ]
+  },
+  viewer: {
+    permissions: [
+      { resource: "*", action: "read" }
+    ]
+  }
+};
+```
+
+## Environment-Scoped Permissions
+
+Permissions can be scoped to specific environments:
+
+```typescript
+// User can approve promotions only to staging
+{
+  resource: "promotion",
+  action: "approve",
+  scope: { environmentId: "staging-env-id" }
+}
+
+// User can deploy only to targets with specific labels
+{
+  resource: "target",
+  action: "deploy",
+  scope: { labels: { "tier": "frontend" } }
+}
+```
+
+## Separation of Duties (SoD)
+
+When SoD is enabled for an environment:
+- The user who requested a promotion cannot approve it
+- The user who created a release cannot be the sole approver
+- Approval records include SoD verification status
+
+```typescript
+interface ApprovalValidation {
+  promotionId: UUID;
+  approverId: UUID;
+  requesterId: UUID;
+  sodRequired: boolean;
+  sodSatisfied: boolean;
+  validationResult: "valid" | "self_approval_denied" | "sod_violation";
+}
+```
+
+## Permission Checking Algorithm
+
+```typescript
+async function checkPermission(
+  userId: UUID,
+  resource: ResourceType,
+  action: ActionType,
+  resourceId?: UUID
+): Promise<boolean> {
+  // 1. Get user's roles and direct permissions
+  const userRoles = await getUserRoles(userId);
+  const userPermissions = await getUserPermissions(userId);
+
+  // 2. Expand role permissions
+  const rolePermissions = userRoles.flatMap(r => roles[r].permissions);
+  const allPermissions = [...rolePermissions, ...userPermissions];
+
+  // 3.
Check for matching permission
+  for (const perm of allPermissions) {
+    if (matchesResource(perm.resource, resource) &&
+        matchesAction(perm.action, action) &&
+        matchesScope(perm.scope, resourceId) &&
+        evaluateConditions(perm.conditions)) {
+      return true;
+    }
+  }
+
+  return false;
+}
+
+function matchesResource(pattern: string, resource: string): boolean {
+  return pattern === "*" || pattern === resource;
+}
+
+function matchesAction(pattern: string, action: string): boolean {
+  return pattern === "*" || pattern === action;
+}
+```
+
+## API Authorization Headers
+
+All API requests require:
+```http
+Authorization: Bearer <access-token>
+```
+
+For agent requests (over mTLS):
+```http
+X-Agent-Id: <agent-id>
+Authorization: Bearer <agent-token>
+```
+
+## Permission Denied Response
+
+```json
+{
+  "success": false,
+  "error": {
+    "code": "PERMISSION_DENIED",
+    "message": "User does not have permission to approve promotions to production",
+    "details": {
+      "resource": "promotion",
+      "action": "approve",
+      "scope": { "environmentId": "prod-env-id" },
+      "requiredRoles": ["admin", "approver"],
+      "userRoles": ["viewer"]
+    }
+  }
+}
+```
+
+## References
+
+- [Security Overview](overview.md)
+- [Agent Security](agent-security.md)
+- [Authority Module](../../../authority/architecture.md)
diff --git a/docs/modules/release-orchestrator/security/overview.md b/docs/modules/release-orchestrator/security/overview.md
new file mode 100644
index 000000000..3d5b8b3fd
--- /dev/null
+++ b/docs/modules/release-orchestrator/security/overview.md
@@ -0,0 +1,281 @@
+# Security Architecture Overview
+
+## Security Principles
+
+| Principle | Implementation |
+|-----------|----------------|
+| **Defense in depth** | Multiple layers: network, auth, authz, audit |
+| **Least privilege** | Role-based access; minimal permissions |
+| **Zero trust** | All requests authenticated; mTLS for agents |
+| **Secrets hygiene** | Secrets in vault; never in DB; ephemeral injection |
+| **Audit everything** | All mutations logged; evidence
trail | +| **Immutable evidence** | Evidence packets append-only; cryptographically signed | + +## Authentication Architecture + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ AUTHENTICATION ARCHITECTURE │ +│ │ +│ Human Users Service/Agent │ +│ ┌──────────┐ ┌──────────┐ │ +│ │ Browser │ │ Agent │ │ +│ └────┬─────┘ └────┬─────┘ │ +│ │ │ │ +│ │ OAuth 2.0 │ mTLS + JWT │ +│ │ Authorization Code │ │ +│ ▼ ▼ │ +│ ┌──────────────────────────────────────────────────────────────────┐ │ +│ │ AUTHORITY MODULE │ │ +│ │ │ │ +│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │ +│ │ │ OAuth 2.0 │ │ mTLS │ │ API Key │ │ │ +│ │ │ Provider │ │ Validator │ │ Validator │ │ │ +│ │ └─────────────┘ └─────────────┘ └─────────────┘ │ │ +│ │ │ │ +│ │ ┌─────────────────────────────────────────────────────────────┐ │ │ +│ │ │ TOKEN ISSUER │ │ │ +│ │ │ - Short-lived JWT (15 min) │ │ │ +│ │ │ - Contains: user_id, tenant_id, roles, permissions │ │ │ +│ │ │ - Signed with RS256 │ │ │ +│ │ └─────────────────────────────────────────────────────────────┘ │ │ +│ └──────────────────────────────────────────────────────────────────┘ │ +│ │ │ +│ ▼ │ +│ ┌──────────────────────────────────────────────────────────────────┐ │ +│ │ API GATEWAY │ │ +│ │ │ │ +│ │ - Validate JWT signature │ │ +│ │ - Check token expiration │ │ +│ │ - Extract tenant context │ │ +│ │ - Enforce rate limits │ │ +│ └──────────────────────────────────────────────────────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +## Authorization Model + +### Permission Structure + +```typescript +interface Permission { + resource: ResourceType; + action: ActionType; + scope?: ScopeType; + conditions?: Condition[]; +} + +type ResourceType = + | "environment" + | "release" + | "promotion" + | "target" + | "agent" + | "workflow" + | "plugin" + | "integration" + | "evidence"; + +type ActionType = + | "create" + | "read" + | "update" + | "delete" + 
| "execute"
+  | "approve"
+  | "deploy"
+  | "rollback";
+
+type ScopeType =
+  | "*"                                  // All resources
+  | { environmentId: UUID }              // Specific environment
+  | { labels: Record<string, string> };  // Label-based
+```
+
+### Role Definitions
+
+| Role | Permissions |
+|------|-------------|
+| `admin` | All permissions on all resources |
+| `release-manager` | Full access to releases, promotions; read environments/targets |
+| `deployer` | Read releases; create/read promotions; read targets |
+| `approver` | Read/approve promotions |
+| `viewer` | Read-only access to all resources |
+
+### Environment-Scoped Roles
+
+Roles can be scoped to specific environments:
+
+```typescript
+// Example: Production deployer can only deploy to production
+const prodDeployer = {
+  role: "deployer",
+  scope: { environmentId: "prod-environment-uuid" }
+};
+```
+
+## Policy Enforcement Points
+
+```
+┌─────────────────────────────────────────────────────────────────────────────┐
+│                          POLICY ENFORCEMENT POINTS                          │
+│                                                                             │
+│  ┌─────────────────────────────────────────────────────────────────────┐  │
+│  │                          API LAYER (PEP 1)                          │  │
+│  │  - Authenticate request                                             │  │
+│  │  - Check resource-level permissions                                 │  │
+│  │  - Enforce tenant isolation                                         │  │
+│  └─────────────────────────────────────────────────────────────────────┘  │
+│                                      │                                      │
+│                                      ▼                                      │
+│  ┌─────────────────────────────────────────────────────────────────────┐  │
+│  │                        SERVICE LAYER (PEP 2)                        │  │
+│  │  - Check business-level permissions                                 │  │
+│  │  - Validate separation of duties                                    │  │
+│  │  - Enforce approval policies                                        │  │
+│  └─────────────────────────────────────────────────────────────────────┘  │
+│                                      │                                      │
+│                                      ▼                                      │
+│  ┌─────────────────────────────────────────────────────────────────────┐  │
+│  │                       DECISION ENGINE (PEP 3)                       │  │
+│  │  - Evaluate security gates                                          │  │
+│  │  - Evaluate custom OPA policies                                     │  │
+│  │  - Produce signed decision records                                  │  │
+│  └─────────────────────────────────────────────────────────────────────┘  │
+│                                      │                                      │
+│                                      ▼                                      │
+│  
┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ DATA LAYER (PEP 4) │ │ +│ │ - Row-level security (tenant_id) │ │ +│ │ - Append-only enforcement (evidence) │ │ +│ │ - Encryption at rest │ │ +│ └─────────────────────────────────────────────────────────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +## Agent Security Model + +See [Agent Security](agent-security.md) for detailed agent security architecture. + +Key features: +- mTLS authentication with CA-signed certificates +- One-time registration tokens +- Short-lived JWT for task execution +- Encrypted task payloads +- Scoped credentials per task + +## Secrets Management + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ SECRETS FLOW (NEVER STORED IN DB) │ +│ │ +│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ +│ │ VAULT │ │ STELLA CORE │ │ AGENT │ │ +│ │ (Source) │ │ (Broker) │ │ (Consumer) │ │ +│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │ +│ │ │ │ │ +│ │ │ Task requires secret │ │ +│ │ │ │ │ +│ │ Fetch with service │ │ │ +│ │ account token │ │ │ +│ │◄─────────────────────── │ │ +│ │ │ │ │ +│ │ Return secret │ │ │ +│ │ (wrapped, short TTL) │ │ │ +│ │───────────────────────► │ │ +│ │ │ │ │ +│ │ │ Embed in task payload │ │ +│ │ │ (encrypted) │ │ +│ │ │───────────────────────► │ +│ │ │ │ │ +│ │ │ │ Decrypt │ +│ │ │ │ Use for task │ +│ │ │ │ Discard │ +│ │ +│ Rules: │ +│ - Secrets NEVER stored in Stella database │ +│ - Only Vault references stored │ +│ - Secrets fetched at execution time only │ +│ - Secrets not logged (masked in logs) │ +│ - Secrets not persisted in agent memory beyond task scope │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +## Threat Model + +| Threat | Attack Vector | Mitigation | +|--------|---------------|------------| +| **Credential theft** | Database breach | Secrets never in DB; only vault refs | +| 
**Token replay** | Stolen JWT | Short-lived tokens (15 min); refresh tokens rotated | +| **Agent impersonation** | Fake agent | mTLS with CA-signed certs; registration token one-time | +| **Digest tampering** | Modified image | Digest verification at pull time; mismatch = failure | +| **Evidence tampering** | Modified audit records | Append-only table; cryptographic signing | +| **Privilege escalation** | Compromised account | Role-based access; SoD enforcement; audit logs | +| **Supply chain attack** | Malicious plugin | Plugin sandbox; capability declarations; review process | +| **Lateral movement** | Compromised target | Short-lived task credentials; scoped permissions | +| **Data exfiltration** | Log/artifact theft | Encryption at rest; network segmentation | +| **Denial of service** | Resource exhaustion | Rate limiting; resource quotas; circuit breakers | + +## Audit Trail + +### Audit Event Structure + +```typescript +interface AuditEvent { + id: UUID; + timestamp: DateTime; + tenantId: UUID; + + // Actor + actorType: "user" | "agent" | "system" | "plugin"; + actorId: UUID; + actorName: string; + actorIp?: string; + + // Action + action: string; // "promotion.approved", "deployment.started" + resource: string; // "promotion" + resourceId: UUID; + + // Context + environmentId?: UUID; + releaseId?: UUID; + promotionId?: UUID; + + // Details + before?: object; // State before (for updates) + after?: object; // State after + metadata?: object; // Additional context + + // Integrity + previousEventHash: string; // Hash chain for tamper detection + eventHash: string; +} +``` + +### Audited Operations + +| Category | Operations | +|----------|------------| +| **Authentication** | Login, logout, token refresh, failed attempts | +| **Authorization** | Permission denied events | +| **Environments** | Create, update, delete, freeze window changes | +| **Releases** | Create, deprecate, archive | +| **Promotions** | Request, approve, reject, cancel | +| **Deployments** 
| Start, complete, fail, rollback | +| **Targets** | Register, update, delete, health changes | +| **Agents** | Register, heartbeat gaps, capability changes | +| **Integrations** | Create, update, delete, test | +| **Plugins** | Enable, disable, config changes | +| **Evidence** | Create (never update/delete) | + +## References + +- [Authentication & Authorization](auth.md) +- [Agent Security](agent-security.md) +- [Threat Model](threat-model.md) +- [Audit Trail](audit-trail.md) diff --git a/docs/modules/release-orchestrator/security/threat-model.md b/docs/modules/release-orchestrator/security/threat-model.md new file mode 100644 index 000000000..7879a0dbc --- /dev/null +++ b/docs/modules/release-orchestrator/security/threat-model.md @@ -0,0 +1,207 @@ +# Threat Model + +## Overview + +This document identifies threats to the Release Orchestrator and their mitigations. + +## Threat Categories + +### T1: Credential Theft + +| Aspect | Description | +|--------|-------------| +| **Threat** | Attacker gains access to credentials through database breach | +| **Attack Vector** | SQL injection, database backup theft, insider threat | +| **Assets at Risk** | Registry credentials, vault tokens, SSH keys | +| **Mitigation** | Secrets NEVER stored in database; only vault references stored | +| **Detection** | Anomalous vault access patterns, failed authentication attempts | + +### T2: Token Replay + +| Aspect | Description | +|--------|-------------| +| **Threat** | Attacker captures and reuses valid JWT tokens | +| **Attack Vector** | Man-in-the-middle, log file exposure, memory dump | +| **Assets at Risk** | User sessions, API access | +| **Mitigation** | Short-lived tokens (15 min), refresh token rotation, TLS everywhere | +| **Detection** | Token used from unusual IP, concurrent sessions | + +### T3: Agent Impersonation + +| Aspect | Description | +|--------|-------------| +| **Threat** | Attacker registers fake agent to receive deployment tasks | +| **Attack Vector** | 
Stolen registration token, certificate forgery | +| **Assets at Risk** | Deployment credentials, target access | +| **Mitigation** | One-time registration tokens, mTLS with CA-signed certs | +| **Detection** | Registration from unexpected network, capability mismatch | + +### T4: Digest Tampering + +| Aspect | Description | +|--------|-------------| +| **Threat** | Attacker modifies container image after release creation | +| **Attack Vector** | Registry compromise, man-in-the-middle at pull time | +| **Assets at Risk** | Application integrity, supply chain | +| **Mitigation** | Digest verification at pull time; mismatch = deployment failure | +| **Detection** | Pull failures due to digest mismatch | + +### T5: Evidence Tampering + +| Aspect | Description | +|--------|-------------| +| **Threat** | Attacker modifies audit records to hide malicious activity | +| **Attack Vector** | Database admin access, SQL injection | +| **Assets at Risk** | Audit integrity, compliance | +| **Mitigation** | Append-only table, cryptographic signing, no UPDATE/DELETE | +| **Detection** | Signature verification failure, hash chain break | + +### T6: Privilege Escalation + +| Aspect | Description | +|--------|-------------| +| **Threat** | User gains permissions beyond their role | +| **Attack Vector** | Role assignment exploit, permission bypass | +| **Assets at Risk** | Environment access, approval authority | +| **Mitigation** | Role-based access, SoD enforcement, audit logs | +| **Detection** | Unusual permission patterns, SoD violation attempts | + +### T7: Supply Chain Attack + +| Aspect | Description | +|--------|-------------| +| **Threat** | Malicious plugin injected into workflow | +| **Attack Vector** | Plugin repository compromise, typosquatting | +| **Assets at Risk** | All environments, all credentials | +| **Mitigation** | Plugin sandbox, capability declarations, signed manifests | +| **Detection** | Unexpected network egress, resource anomalies | + +### T8: Lateral 
Movement + +| Aspect | Description | +|--------|-------------| +| **Threat** | Attacker uses compromised target to access others | +| **Attack Vector** | Target compromise, credential reuse | +| **Assets at Risk** | Other targets, environments | +| **Mitigation** | Short-lived task credentials, scoped permissions | +| **Detection** | Cross-target credential use, unexpected connections | + +### T9: Data Exfiltration + +| Aspect | Description | +|--------|-------------| +| **Threat** | Attacker extracts logs, artifacts, or configuration | +| **Attack Vector** | API abuse, log aggregator compromise | +| **Assets at Risk** | Application data, deployment configurations | +| **Mitigation** | Encryption at rest, network segmentation, audit logging | +| **Detection** | Large data transfers, unusual API patterns | + +### T10: Denial of Service + +| Aspect | Description | +|--------|-------------| +| **Threat** | Attacker exhausts resources to prevent deployments | +| **Attack Vector** | API flooding, workflow loop, agent task spam | +| **Assets at Risk** | Service availability | +| **Mitigation** | Rate limiting, resource quotas, circuit breakers | +| **Detection** | Resource exhaustion alerts, traffic spikes | + +## STRIDE Analysis + +| Category | Threats | Primary Mitigations | +|----------|---------|---------------------| +| **Spoofing** | T3 Agent Impersonation | mTLS, registration tokens | +| **Tampering** | T4 Digest, T5 Evidence | Digest verification, append-only tables | +| **Repudiation** | Evidence manipulation | Signed evidence packets | +| **Information Disclosure** | T1 Credentials, T9 Exfiltration | Vault integration, encryption | +| **Denial of Service** | T10 Resource exhaustion | Rate limits, quotas | +| **Elevation of Privilege** | T6 Escalation | RBAC, SoD enforcement | + +## Trust Boundaries + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ TRUST BOUNDARIES │ +│ │ +│ 
┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ PUBLIC NETWORK (Untrusted) │ │ +│ │ │ │ +│ │ Internet, External Users, External Services │ │ +│ └─────────────────────────────────────────────────────────────────────┘ │ +│ │ │ +│ │ TLS + Authentication │ +│ ▼ │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ DMZ (Semi-trusted) │ │ +│ │ │ │ +│ │ API Gateway, Webhook Gateway │ │ +│ └─────────────────────────────────────────────────────────────────────┘ │ +│ │ │ +│ │ Internal mTLS │ +│ ▼ │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ INTERNAL NETWORK (Trusted) │ │ +│ │ │ │ +│ │ Stella Core Services, Database, Internal Vault │ │ +│ └─────────────────────────────────────────────────────────────────────┘ │ +│ │ │ +│ │ Agent mTLS │ +│ ▼ │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ DEPLOYMENT NETWORK (Controlled) │ │ +│ │ │ │ +│ │ Agents, Targets │ │ +│ └─────────────────────────────────────────────────────────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +## Data Classification + +| Classification | Examples | Protection Requirements | +|---------------|----------|------------------------| +| **Critical** | Vault credentials, signing keys | Hardware security, minimal access | +| **Sensitive** | User tokens, agent certificates | Encryption, access logging | +| **Internal** | Release configs, workflow definitions | Encryption at rest | +| **Public** | API documentation, release names | Integrity protection | + +## Security Controls Summary + +| Control | Implementation | Threats Addressed | +|---------|----------------|-------------------| +| mTLS | Agent communication | T3 | +| Short-lived tokens | 15-min access tokens | T2 | +| Vault integration | No secrets in DB | T1 | +| Digest verification | Pull-time validation | T4 | +| Append-only tables | Evidence immutability | T5 
| +| RBAC + SoD | Permission enforcement | T6 | +| Plugin sandbox | Resource limits, capability control | T7 | +| Scoped credentials | Task-specific access | T8 | +| Encryption | At rest and in transit | T9 | +| Rate limiting | API and resource quotas | T10 | + +## Incident Response + +### Detection Signals + +| Signal | Indicates | Response | +|--------|-----------|----------| +| Digest mismatch at pull | T4 Tampering | Halt deployment, investigate registry | +| Evidence signature failure | T5 Tampering | Preserve logs, forensic analysis | +| Unusual agent registration | T3 Impersonation | Revoke agent, review access | +| SoD violation attempt | T6 Escalation | Block action, alert admin | +| Plugin network egress | T7 Supply chain | Isolate plugin, review manifest | + +### Response Procedures + +1. **Contain** - Isolate affected component (revoke token, disable agent) +2. **Investigate** - Collect logs, evidence packets, audit trail +3. **Remediate** - Patch vulnerability, rotate credentials +4. **Recover** - Restore service, verify integrity +5. **Report** - Document incident, update threat model + +## References + +- [Security Overview](overview.md) +- [Agent Security](agent-security.md) +- [Audit Trail](audit-trail.md) diff --git a/docs/modules/release-orchestrator/test-structure.md b/docs/modules/release-orchestrator/test-structure.md new file mode 100644 index 000000000..aafaade6c --- /dev/null +++ b/docs/modules/release-orchestrator/test-structure.md @@ -0,0 +1,508 @@ +# Test Structure & Guidelines + +> Test organization, categorization, and patterns for Release Orchestrator modules. 
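The categories defined in this document (Unit, Integration, Acceptance) are selected on the command line via xUnit trait filters. A minimal sketch, assuming the standard `dotnet test --filter` trait syntax and an illustrative project path — the commands are printed here rather than executed:

```shell
# Print the category-selective invocations typically used in CI.
# Assumptions: tests carry xUnit [Trait("Category", ...)] attributes as shown
# in this document; the test project path is illustrative.
commands=$(for category in Unit Integration Acceptance; do
  printf 'dotnet test src/ReleaseOrchestrator/__Tests --filter "Category=%s"\n' "$category"
done)
echo "$commands"
```

Unit runs need no external services; the Integration and Acceptance filters assume a running Docker daemon for the Testcontainers-backed fixtures.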
+
+---
+
+## Test Directory Layout
+
+Release Orchestrator tests follow the Stella Ops standard test structure:
+
+```
+src/ReleaseOrchestrator/
+├── __Libraries/
+│   ├── StellaOps.ReleaseOrchestrator.Core/
+│   ├── StellaOps.ReleaseOrchestrator.Workflow/
+│   ├── StellaOps.ReleaseOrchestrator.Promotion/
+│   └── StellaOps.ReleaseOrchestrator.Deploy/
+├── __Tests/
+│   ├── StellaOps.ReleaseOrchestrator.Core.Tests/          # Unit tests for Core
+│   ├── StellaOps.ReleaseOrchestrator.Workflow.Tests/      # Unit tests for Workflow
+│   ├── StellaOps.ReleaseOrchestrator.Promotion.Tests/     # Unit tests for Promotion
+│   ├── StellaOps.ReleaseOrchestrator.Deploy.Tests/        # Unit tests for Deploy
+│   ├── StellaOps.ReleaseOrchestrator.Integration.Tests/   # Integration tests
+│   └── StellaOps.ReleaseOrchestrator.Acceptance.Tests/    # End-to-end tests
+└── StellaOps.ReleaseOrchestrator.WebService/
+```
+
+**Shared test infrastructure**:
+```
+src/__Tests/__Libraries/
+├── StellaOps.Infrastructure.Postgres.Testing/   # PostgreSQL Testcontainers fixtures
+└── StellaOps.Testing.Common/                    # Common test utilities
+```
+
+---
+
+## Test Categories
+
+Tests **MUST** be categorized using xUnit traits to enable selective execution:
+
+### Unit Tests
+
+```csharp
+[Trait("Category", "Unit")]
+public class PromotionValidatorTests
+{
+    [Fact]
+    public void Validate_MissingReleaseId_ReturnsFalse()
+    {
+        // Arrange
+        var validator = new PromotionValidator();
+        var promotion = new Promotion { ReleaseId = Guid.Empty };
+
+        // Act
+        var result = validator.Validate(promotion);
+
+        // Assert
+        Assert.False(result.IsValid);
+        Assert.Contains("ReleaseId is required", result.Errors);
+    }
+}
+```
+
+**Characteristics**:
+- No database, network, or file system access
+- Fast execution (< 100ms per test)
+- Isolated from external dependencies
+- Deterministic and repeatable
+
+### Integration Tests
+
+```csharp
+[Trait("Category", "Integration")]
+public class PromotionRepositoryTests : IClassFixture<PostgresFixture>
+{
+    private readonly PostgresFixture 
_fixture;
+
+    public PromotionRepositoryTests(PostgresFixture fixture)
+    {
+        _fixture = fixture;
+    }
+
+    [Fact]
+    public async Task SaveAsync_ValidPromotion_PersistsToDatabase()
+    {
+        // Arrange
+        await using var connection = _fixture.CreateConnection();
+        var repository = new PromotionRepository(connection, _fixture.TimeProvider);
+
+        var promotion = new Promotion
+        {
+            Id = Guid.NewGuid(),
+            TenantId = _fixture.DefaultTenantId,
+            ReleaseId = Guid.NewGuid(),
+            TargetEnvironmentId = Guid.NewGuid(),
+            Status = PromotionState.PendingApproval,
+            RequestedAt = _fixture.TimeProvider.GetUtcNow(),
+            RequestedBy = Guid.NewGuid()
+        };
+
+        // Act
+        await repository.SaveAsync(promotion, CancellationToken.None);
+
+        // Assert
+        var retrieved = await repository.GetByIdAsync(promotion.Id, CancellationToken.None);
+        Assert.NotNull(retrieved);
+        Assert.Equal(promotion.ReleaseId, retrieved.ReleaseId);
+    }
+}
+```
+
+**Characteristics**:
+- Uses Testcontainers for PostgreSQL
+- Requires Docker to be running
+- Slower execution (hundreds of ms per test)
+- Tests data access layer and database constraints
+
+### Acceptance Tests
+
+```csharp
+[Trait("Category", "Acceptance")]
+public class PromotionWorkflowTests : IClassFixture<WebApplicationFactory<Program>>
+{
+    private readonly WebApplicationFactory<Program> _factory;
+    private readonly HttpClient _client;
+
+    public PromotionWorkflowTests(WebApplicationFactory<Program> factory)
+    {
+        _factory = factory;
+        _client = factory.CreateClient();
+    }
+
+    [Fact]
+    public async Task PromotionWorkflow_EndToEnd_SuccessfullyDeploysRelease()
+    {
+        // Arrange: Create environment, release, and promotion
+        var envId = await CreateEnvironmentAsync("Production");
+        var releaseId = await CreateReleaseAsync("v2.3.1");
+
+        // Act: Request promotion
+        var promotionResponse = await _client.PostAsJsonAsync(
+            "/api/v1/promotions",
+            new { releaseId, targetEnvironmentId = envId });
+
+        promotionResponse.EnsureSuccessStatusCode();
+        var promotion = await promotionResponse.Content.ReadFromJsonAsync<Promotion>();
+
+        
// Act: Approve promotion + var approveResponse = await _client.PostAsync( + $"/api/v1/promotions/{promotion.Id}/approve", null); + + approveResponse.EnsureSuccessStatusCode(); + + // Assert: Verify deployment completed + var status = await GetPromotionStatusAsync(promotion.Id); + Assert.Equal("deployed", status.Status); + } +} +``` + +**Characteristics**: +- Tests full API surface and workflows +- Uses `WebApplicationFactory` for in-memory hosting +- Tests end-to-end scenarios +- May involve multiple services + +--- + +## PostgreSQL Test Fixtures + +### Testcontainers Fixture + +```csharp +public class PostgresFixture : IAsyncLifetime +{ + private PostgreSqlContainer? _container; + private NpgsqlConnection? _connection; + public TimeProvider TimeProvider { get; private set; } = null!; + public IGuidGenerator GuidGenerator { get; private set; } = null!; + public Guid DefaultTenantId { get; private set; } + + public async Task InitializeAsync() + { + // Start PostgreSQL container + _container = new PostgreSqlBuilder() + .WithImage("postgres:16") + .WithDatabase("stellaops_test") + .WithUsername("postgres") + .WithPassword("postgres") + .Build(); + + await _container.StartAsync(); + + // Create connection + _connection = new NpgsqlConnection(_container.GetConnectionString()); + await _connection.OpenAsync(); + + // Run migrations + await ApplyMigrationsAsync(); + + // Setup test infrastructure + TimeProvider = new ManualTimeProvider(); + GuidGenerator = new SequentialGuidGenerator(); + DefaultTenantId = Guid.Parse("00000000-0000-0000-0000-000000000001"); + + // Seed test data + await SeedTestDataAsync(); + } + + public NpgsqlConnection CreateConnection() + { + if (_container == null) + throw new InvalidOperationException("Container not initialized"); + + return new NpgsqlConnection(_container.GetConnectionString()); + } + + private async Task ApplyMigrationsAsync() + { + // Apply schema migrations + await ExecuteSqlFileAsync("schema/release-orchestrator-schema.sql"); 
+    }
+
+    private async Task SeedTestDataAsync()
+    {
+        // Create default tenant
+        await using var cmd = _connection!.CreateCommand();
+        cmd.CommandText = @"
+            INSERT INTO tenants (id, name, created_at)
+            VALUES (@id, @name, @created_at)
+            ON CONFLICT DO NOTHING";
+        cmd.Parameters.AddWithValue("id", DefaultTenantId);
+        cmd.Parameters.AddWithValue("name", "Test Tenant");
+        cmd.Parameters.AddWithValue("created_at", TimeProvider.GetUtcNow());
+        await cmd.ExecuteNonQueryAsync();
+    }
+
+    public async Task DisposeAsync()
+    {
+        if (_connection != null)
+        {
+            await _connection.DisposeAsync();
+        }
+
+        if (_container != null)
+        {
+            await _container.DisposeAsync();
+        }
+    }
+}
+```
+
+---
+
+## Test Patterns
+
+### Deterministic Time in Tests
+
+```csharp
+public class PromotionTimingTests
+{
+    [Fact]
+    public void CreatePromotion_SetsCorrectTimestamp()
+    {
+        // Arrange
+        var manualTime = new ManualTimeProvider();
+        manualTime.SetUtcNow(new DateTimeOffset(2026, 1, 10, 14, 30, 0, TimeSpan.Zero));
+
+        var guidGen = new SequentialGuidGenerator();
+        var manager = new PromotionManager(manualTime, guidGen);
+
+        // Act
+        var promotion = manager.CreatePromotion(
+            releaseId: Guid.Parse("00000000-0000-0000-0000-000000000001"),
+            targetEnvId: Guid.Parse("00000000-0000-0000-0000-000000000002")
+        );
+
+        // Assert
+        Assert.Equal(
+            new DateTimeOffset(2026, 1, 10, 14, 30, 0, TimeSpan.Zero),
+            promotion.RequestedAt
+        );
+    }
+}
+```
+
+### Testing CancellationToken Propagation
+
+```csharp
+public class PromotionCancellationTests
+{
+    [Fact]
+    public async Task ApprovePromotionAsync_CancellationRequested_ThrowsOperationCanceledException()
+    {
+        // Arrange
+        var cts = new CancellationTokenSource();
+        var repository = new Mock<IPromotionRepository>();
+
+        repository
+            .Setup(r => r.GetByIdAsync(It.IsAny<Guid>(), It.IsAny<CancellationToken>()))
+            .Returns(async (Guid id, CancellationToken ct) =>
+            {
+                await Task.Delay(100, ct); // Simulate delay
+                return new Promotion { Id = id };
+            });
+
+        var manager = new PromotionManager(repository.Object,
+            TimeProvider.System, new SystemGuidGenerator());
+
+        // Act & Assert
+        cts.Cancel(); // Cancel before operation completes
+
+        await Assert.ThrowsAsync<OperationCanceledException>(async () =>
+            await manager.ApprovePromotionAsync(Guid.NewGuid(), Guid.NewGuid(), cts.Token)
+        );
+    }
+}
+```
+
+### Testing Immutability
+
+```csharp
+public class ReleaseImmutabilityTests
+{
+    [Fact]
+    public void GetComponents_ReturnsImmutableCollection()
+    {
+        // Arrange
+        var release = new Release
+        {
+            Components = new Dictionary<string, ComponentDigest>
+            {
+                ["api"] = new ComponentDigest("registry.io/api", "sha256:abc123", "v1.0.0")
+            }.ToImmutableDictionary()
+        };
+
+        // Act
+        var components = release.Components;
+
+        // Assert: Attempting to modify throws
+        Assert.Throws<NotSupportedException>(() =>
+        {
+            var mutable = (IDictionary<string, ComponentDigest>)components;
+            mutable["web"] = new ComponentDigest("registry.io/web", "sha256:def456", "v1.0.0");
+        });
+    }
+}
+```
+
+### Testing Evidence Hash Determinism
+
+```csharp
+public class EvidenceHashDeterminismTests
+{
+    [Fact]
+    public void ComputeEvidenceHash_SameInputs_ProducesSameHash()
+    {
+        // Arrange
+        var decisionRecord = new DecisionRecord
+        {
+            PromotionId = Guid.Parse("00000000-0000-0000-0000-000000000001"),
+            DecidedAt = new DateTimeOffset(2026, 1, 10, 12, 0, 0, TimeSpan.Zero),
+            Outcome = "approved",
+            GateResults = ImmutableArray.Create(
+                new GateResult("security", "pass", null)
+            )
+        };
+
+        // Act: Compute hash multiple times
+        var hash1 = EvidenceHasher.ComputeHash(decisionRecord);
+        var hash2 = EvidenceHasher.ComputeHash(decisionRecord);
+
+        // Assert: Hashes are identical
+        Assert.Equal(hash1, hash2);
+    }
+}
+```
+
+---
+
+## Running Tests
+
+### Run All Tests
+
+```bash
+dotnet test src/StellaOps.sln
+```
+
+### Run Only Unit Tests
+
+```bash
+dotnet test src/StellaOps.sln --filter "Category=Unit"
+```
+
+### Run Only Integration Tests
+
+```bash
+dotnet test src/StellaOps.sln --filter "Category=Integration"
+```
+
+### Run Specific Test Class
+
+```bash
+dotnet test --filter 
"FullyQualifiedName~PromotionValidatorTests" +``` + +### Run with Coverage + +```bash +dotnet test src/StellaOps.sln --collect:"XPlat Code Coverage" +``` + +--- + +## Test Data Builders + +Use builder pattern for complex test data: + +```csharp +public class PromotionBuilder +{ + private Guid _id = Guid.NewGuid(); + private Guid _tenantId = Guid.NewGuid(); + private Guid _releaseId = Guid.NewGuid(); + private Guid _targetEnvId = Guid.NewGuid(); + private PromotionState _status = PromotionState.PendingApproval; + private DateTimeOffset _requestedAt = DateTimeOffset.UtcNow; + + public PromotionBuilder WithId(Guid id) + { + _id = id; + return this; + } + + public PromotionBuilder WithStatus(PromotionState status) + { + _status = status; + return this; + } + + public PromotionBuilder WithReleaseId(Guid releaseId) + { + _releaseId = releaseId; + return this; + } + + public Promotion Build() + { + return new Promotion + { + Id = _id, + TenantId = _tenantId, + ReleaseId = _releaseId, + TargetEnvironmentId = _targetEnvId, + Status = _status, + RequestedAt = _requestedAt, + RequestedBy = Guid.NewGuid() + }; + } +} + +// Usage in tests +[Fact] +public void ApprovePromotion_PendingStatus_TransitionsToApproved() +{ + var promotion = new PromotionBuilder() + .WithStatus(PromotionState.PendingApproval) + .Build(); + + // ... 
test logic +} +``` + +--- + +## Code Coverage Requirements + +- **Unit tests**: Aim for 80%+ coverage of business logic +- **Integration tests**: Cover all data access paths and constraints +- **Acceptance tests**: Cover critical user journeys + +**Exclusions from coverage**: +- Program.cs / Startup.cs configuration code +- DTOs and simple data classes +- Generated code + +--- + +## Summary Checklist + +Before merging: + +- [ ] All tests categorized with `[Trait("Category", "...")]` +- [ ] Unit tests use `TimeProvider` and `IGuidGenerator` for determinism +- [ ] Integration tests use `PostgresFixture` with Testcontainers +- [ ] `CancellationToken` propagation tested where applicable +- [ ] Evidence hash determinism verified +- [ ] No test reimplements production logic +- [ ] All tests pass locally and in CI +- [ ] Code coverage meets requirements + +--- + +## References + +- [Implementation Guide](./implementation-guide.md) — .NET implementation patterns +- [CLAUDE.md](../../../CLAUDE.md) — Stella Ops coding rules +- [PostgreSQL Testing Guide](../../infrastructure/Postgres.Testing/README.md) — Testcontainers setup +- [src/__Tests/AGENTS.md](../../../src/__Tests/AGENTS.md) — Global test infrastructure diff --git a/docs/modules/release-orchestrator/ui/overview.md b/docs/modules/release-orchestrator/ui/overview.md new file mode 100644 index 000000000..a16d4c0e2 --- /dev/null +++ b/docs/modules/release-orchestrator/ui/overview.md @@ -0,0 +1,332 @@ +# UI Overview + +## Status + +**Planned** - UI implementation has not started. 
+ +## Design Principles + +| Principle | Implementation | +|-----------|----------------| +| **Clarity** | Clear status indicators, intuitive navigation | +| **Real-time** | Live updates via WebSocket for deployments | +| **Actionable** | One-click approvals, quick actions | +| **Audit-friendly** | Full history visibility, evidence access | +| **Mobile-aware** | Responsive design for on-call scenarios | + +## Main Screens + +### Dashboard + +The main dashboard provides an at-a-glance view of deployment health across environments. + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ RELEASE ORCHESTRATOR [User] [Settings] │ +├─────────────────────────────────────────────────────────────────────────────┤ +│ │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ ENVIRONMENT PIPELINE │ │ +│ │ │ │ +│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │ +│ │ │ DEV │───►│ STAGING │───►│ UAT │───►│ PROD │ │ │ +│ │ │ v1.5.0 │ │ v1.4.2 │ │ v1.4.1 │ │ v1.4.0 │ │ │ +│ │ │ 3/3 OK │ │ 2/2 OK │ │ 2/2 OK │ │ 5/5 OK │ │ │ +│ │ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │ │ +│ │ │ │ +│ └─────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌──────────────────────────────┐ ┌──────────────────────────────┐ │ +│ │ PENDING APPROVALS (3) │ │ RECENT DEPLOYMENTS │ │ +│ │ │ │ │ │ +│ │ ● myapp → prod [Approve] │ │ ✓ api v1.5.0 → dev 2m │ │ +│ │ Requested by: John │ │ ✓ web v1.4.2 → staging 15m │ │ +│ │ 2 hours ago │ │ ✗ api v1.4.1 → uat 1h │ │ +│ │ │ │ ✓ web v1.4.0 → prod 2h │ │ +│ │ ● web → uat [Approve] │ │ │ │ +│ │ Requested by: Jane │ │ [View All] │ │ +│ │ 30 minutes ago │ │ │ │ +│ │ │ │ │ │ +│ └──────────────────────────────┘ └──────────────────────────────┘ │ +│ │ +│ ┌──────────────────────────────┐ ┌──────────────────────────────┐ │ +│ │ AGENT STATUS │ │ ACTIVE WORKFLOWS │ │ +│ │ │ │ │ │ +│ │ ● 12 Online │ │ ● Deploy api v1.5.0 │ │ +│ │ ○ 1 Offline │ │ Step: Health Check (3/5) │ 
│ +│ │ ◐ 2 Degraded │ │ │ │ +│ │ │ │ ● Promote web to UAT │ │ +│ │ [View Details] │ │ Step: Awaiting Approval │ │ +│ │ │ │ │ │ +│ └──────────────────────────────┘ └──────────────────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +### Releases View + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ RELEASES [+ Create Release] │ +├─────────────────────────────────────────────────────────────────────────────┤ +│ │ +│ Filter: [All ▼] Status: [All ▼] Search: [________________] │ +│ │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ NAME STATUS COMPONENTS ENVIRONMENTS CREATED │ │ +│ ├─────────────────────────────────────────────────────────────────────┤ │ +│ │ myapp-v1.5.0 Ready 3 dev 2h ago │ │ +│ │ myapp-v1.4.2 Deployed 3 staging, uat 1d ago │ │ +│ │ myapp-v1.4.1 Deployed 3 prod 3d ago │ │ +│ │ myapp-v1.4.0 Deprecated 3 - 1w ago │ │ +│ └─────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ RELEASE DETAIL: myapp-v1.5.0 [Promote ▼] │ │ +│ ├─────────────────────────────────────────────────────────────────────┤ │ +│ │ │ │ +│ │ Components: │ │ +│ │ ┌────────────────────────────────────────────────────────────┐ │ │ +│ │ │ api sha256:abc123... registry.io/myorg/api │ │ │ +│ │ │ web sha256:def456... registry.io/myorg/web │ │ │ +│ │ │ worker sha256:ghi789... 
registry.io/myorg/worker │ │ │ +│ │ └────────────────────────────────────────────────────────────┘ │ │ +│ │ │ │ +│ │ Source: https://github.com/myorg/myapp @ v1.5.0 │ │ +│ │ Created: 2h ago by john@example.com │ │ +│ │ │ │ +│ │ Promotion History: │ │ +│ │ dev (✓) → staging (pending) → uat (-) → prod (-) │ │ +│ │ │ │ +│ └─────────────────────────────────────────────────────────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +### Promotion Detail + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ PROMOTION: myapp-v1.5.0 → production │ +├─────────────────────────────────────────────────────────────────────────────┤ +│ │ +│ Status: PENDING APPROVAL [Approve] [Reject] │ +│ │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ GATE EVALUATION │ │ +│ ├─────────────────────────────────────────────────────────────────────┤ │ +│ │ │ │ +│ │ ✓ Security Gate Passed │ │ +│ │ No critical vulnerabilities │ │ +│ │ │ │ +│ │ ✓ Freeze Window Check Passed │ │ +│ │ No active freeze windows │ │ +│ │ │ │ +│ │ ◐ Approval Gate 1/2 Approvals │ │ +│ │ Jane approved 30m ago │ │ +│ │ Waiting for 1 more approval │ │ +│ │ │ │ +│ │ ○ Separation of Duties Pending │ │ +│ │ Requester: John (cannot approve) │ │ +│ │ │ │ +│ └─────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ PROMOTION TIMELINE │ │ +│ ├─────────────────────────────────────────────────────────────────────┤ │ +│ │ │ │ +│ │ 10:00 John requested promotion │ │ +│ │ 10:05 Security gate evaluated: PASSED │ │ +│ │ 10:05 Freeze check: PASSED │ │ +│ │ 10:30 Jane approved │ │ +│ │ 11:00 Waiting for additional approval... 
│ │ +│ │ │ │ +│ └─────────────────────────────────────────────────────────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +### Workflow Editor + +Visual editor for creating and modifying workflow templates. + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ WORKFLOW EDITOR: standard-deploy [Save] [Run] │ +├─────────────────────────────────────────────────────────────────────────────┤ +│ │ +│ ┌─────────────────┐ ┌─────────────────────────────────────────────────┐ │ +│ │ STEP PALETTE │ │ │ │ +│ │ │ │ │ │ +│ │ Control │ │ ┌──────────┐ │ │ +│ │ ├─ Approval │ │ │ Approval │ │ │ +│ │ ├─ Wait │ │ │ Gate │ │ │ +│ │ └─ Condition │ │ └────┬─────┘ │ │ +│ │ │ │ │ │ │ +│ │ Gates │ │ ▼ │ │ +│ │ ├─ Security │ │ ┌──────────┐ │ │ +│ │ ├─ Freeze │ │ │ Security │ │ │ +│ │ └─ Custom │ │ │ Gate │ │ │ +│ │ │ │ └────┬─────┘ │ │ +│ │ Deploy │ │ │ │ │ +│ │ ├─ Docker │ │ ▼ │ │ +│ │ ├─ Compose │ │ ┌──────────┐ │ │ +│ │ └─ ECS │ │ │ Deploy │ │ │ +│ │ │ │ │ Targets │ │ │ +│ │ Verify │ │ └────┬─────┘ │ │ +│ │ ├─ Health │ │ │ │ │ +│ │ └─ Smoke Test │ │ ┌────┴────┐ │ │ +│ │ │ │ │ │ │ │ +│ │ Notify │ │ ▼ ▼ │ │ +│ │ ├─ Slack │ │ ┌──────┐ ┌──────────┐ │ │ +│ │ └─ Email │ │ │Health│ │ Rollback │◄──[on failure] │ │ +│ │ │ │ │Check │ │ Handler │ │ │ +│ │ │ │ └──┬───┘ └────┬─────┘ │ │ +│ │ │ │ │ │ │ │ +│ │ │ │ ▼ ▼ │ │ +│ │ │ │ ┌──────┐ ┌──────────┐ │ │ +│ │ │ │ │Notify│ │ Notify │ │ │ +│ │ │ │ │Success│ │ Failure │ │ │ +│ │ │ │ └──────┘ └──────────┘ │ │ +│ │ │ │ │ │ +│ └─────────────────┘ └─────────────────────────────────────────────────┘ │ +│ │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ STEP PROPERTIES: Deploy Targets │ │ +│ ├─────────────────────────────────────────────────────────────────────┤ │ +│ │ Type: deploy-compose │ │ +│ │ Strategy: [Rolling ▼] │ │ +│ │ Parallelism: [2] │ │ +│ │ Timeout: [600] seconds │ │ +│ │ On Failure: [Rollback ▼] │ │ +│ 
└─────────────────────────────────────────────────────────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +### Deployment Live View + +Real-time view of an active deployment. + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ DEPLOYMENT: myapp-v1.5.0 → production [Abort]│ +├─────────────────────────────────────────────────────────────────────────────┤ +│ │ +│ Status: RUNNING Progress: ████████░░ 80% │ +│ Strategy: Rolling (batch 4/5) Duration: 5m 23s │ +│ │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ TARGET STATUS │ │ +│ ├─────────────────────────────────────────────────────────────────────┤ │ +│ │ │ │ +│ │ ✓ prod-host-1 sha256:abc123 Deployed Health: OK │ │ +│ │ ✓ prod-host-2 sha256:abc123 Deployed Health: OK │ │ +│ │ ✓ prod-host-3 sha256:abc123 Deployed Health: OK │ │ +│ │ ● prod-host-4 sha256:abc123 Deploying Health: Checking... │ │ +│ │ ○ prod-host-5 - Pending Health: - │ │ +│ │ │ │ +│ └─────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ LIVE LOGS: prod-host-4 │ │ +│ ├─────────────────────────────────────────────────────────────────────┤ │ +│ │ 10:25:15 Pulling image sha256:abc123... │ │ +│ │ 10:25:18 Image pulled successfully │ │ +│ │ 10:25:19 Stopping existing container... │ │ +│ │ 10:25:20 Starting new container... │ │ +│ │ 10:25:21 Container started │ │ +│ │ 10:25:22 Running health check... │ │ +│ │ 10:25:25 Health check passed (1/3) │ │ +│ │ 10:25:28 Health check passed (2/3) │ │ +│ │ ... 
│ │ +│ └─────────────────────────────────────────────────────────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +### Environment Management + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ ENVIRONMENTS [+ Add Environment] │ +├─────────────────────────────────────────────────────────────────────────────┤ +│ │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ NAME ORDER TARGETS CURRENT RELEASE APPROVALS STATUS │ │ +│ ├─────────────────────────────────────────────────────────────────────┤ │ +│ │ development 1 3 myapp-v1.5.0 0 Active │ │ +│ │ staging 2 2 myapp-v1.4.2 1 Active │ │ +│ │ uat 3 2 myapp-v1.4.1 1 Active │ │ +│ │ production 4 5 myapp-v1.4.0 2 + SoD Active │ │ +│ └─────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ ENVIRONMENT DETAIL: production [Edit] │ │ +│ ├─────────────────────────────────────────────────────────────────────┤ │ +│ │ │ │ +│ │ Approval Policy: │ │ +│ │ - Required approvals: 2 │ │ +│ │ - Separation of duties: Enabled │ │ +│ │ - Approver roles: release-manager, tech-lead │ │ +│ │ │ │ +│ │ Freeze Windows: │ │ +│ │ ┌────────────────────────────────────────────────────────────┐ │ │ +│ │ │ Holiday Freeze Dec 20 - Jan 5 Active [Remove] │ │ │ +│ │ │ Weekend Freeze Sat-Sun Active [Remove] │ │ │ +│ │ └────────────────────────────────────────────────────────────┘ │ │ +│ │ [+ Add Freeze Window] │ │ +│ │ │ │ +│ │ Targets: │ │ +│ │ ┌────────────────────────────────────────────────────────────┐ │ │ +│ │ │ prod-host-1 docker_host healthy sha256:abc... │ │ │ +│ │ │ prod-host-2 docker_host healthy sha256:abc... │ │ │ +│ │ │ prod-host-3 docker_host healthy sha256:abc... │ │ │ +│ │ │ prod-host-4 docker_host healthy sha256:abc... │ │ │ +│ │ │ prod-host-5 docker_host degraded sha256:abc... 
│ │ │ +│ │ └────────────────────────────────────────────────────────────┘ │ │ +│ │ │ │ +│ └─────────────────────────────────────────────────────────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +## Key Interactions + +### Approval Flow + +1. User sees pending approval notification on dashboard +2. Clicks to view promotion detail +3. Reviews gate evaluation results and change details +4. Clicks "Approve" or "Reject" with optional comment +5. System validates SoD requirements +6. Promotion advances or notification sent + +### Quick Promote + +1. From release detail, user clicks "Promote" +2. Selects target environment from dropdown +3. Confirms promotion request +4. System evaluates gates immediately +5. If auto-approved, deployment begins +6. If approval required, notification sent to approvers + +### Emergency Rollback + +1. From deployment history or alert, user clicks "Rollback" +2. System shows previous healthy version +3. User confirms rollback +4. System creates rollback deployment job +5. Real-time progress shown + +## Mobile Considerations + +- Responsive design for smaller screens +- Critical actions (approve/reject) accessible on mobile +- Push notifications for pending approvals +- Simplified views for monitoring on-the-go + +## References + +- [API Overview](../api/overview.md) +- [Workflow Templates](../workflow/templates.md) diff --git a/docs/modules/release-orchestrator/workflow/execution.md b/docs/modules/release-orchestrator/workflow/execution.md new file mode 100644 index 000000000..bfb161480 --- /dev/null +++ b/docs/modules/release-orchestrator/workflow/execution.md @@ -0,0 +1,591 @@ +# Workflow Execution + +## Overview + +The Workflow Engine executes workflow templates as DAGs (Directed Acyclic Graphs) of steps, managing state transitions, parallelism, retries, and failure handling. 
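The ready/running/completed bookkeeping described above reduces to a simple dependency rule. A minimal sketch (step names and the `ready_steps` helper are illustrative, not the engine's API): a step becomes ready once every step on an incoming edge has completed and it has not yet been dispatched.

```python
# Sketch of the DAG readiness rule (hypothetical step names).
# A step is ready when all of its dependencies (incoming edges)
# are completed and it has not been dispatched yet.

def ready_steps(edges, completed, dispatched):
    """Return the set of steps whose dependencies are all completed."""
    deps, nodes = {}, set()
    for src, dst in edges:
        nodes.update((src, dst))
        deps.setdefault(dst, set()).add(src)
    return {
        node for node in nodes
        if node not in dispatched and deps.get(node, set()) <= completed
    }

edges = [("build", "gate"), ("gate", "deploy"), ("deploy", "verify")]
print(sorted(ready_steps(edges, completed=set(), dispatched=set())))
# only "build" has no pending dependencies
```

The engine repeats this computation after every step completion, which is what lets independent branches of the DAG run in parallel.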
+ +## Execution Architecture + +``` + WORKFLOW EXECUTION ARCHITECTURE + + ┌─────────────────────────────────────────────────────────────────────────────┐ + │ WORKFLOW ENGINE │ + │ │ + │ ┌─────────────────────────────────────────────────────────────────────┐ │ + │ │ WORKFLOW RUNNER │ │ + │ │ │ │ + │ │ ┌────────────┐ ┌────────────┐ ┌────────────┐ │ │ + │ │ │ Template │───►│ Execution │───►│ Context │ │ │ + │ │ │ Parser │ │ Planner │ │ Builder │ │ │ + │ │ └────────────┘ └────────────┘ └────────────┘ │ │ + │ │ │ │ │ │ │ + │ │ └────────────────┼─────────────────┘ │ │ + │ │ ▼ │ │ + │ │ ┌─────────────────────────────────────────────────────────────┐ │ │ + │ │ │ DAG EXECUTOR │ │ │ + │ │ │ │ │ │ + │ │ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │ │ + │ │ │ │ Ready │ │ Running │ │ Waiting │ │ Completed│ │ │ │ + │ │ │ │ Queue │ │ Set │ │ Set │ │ Set │ │ │ │ + │ │ │ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │ │ │ + │ │ │ │ │ │ + │ │ │ ┌──────────────────────────────────────────────────────┐ │ │ │ + │ │ │ │ STEP DISPATCHER │ │ │ │ + │ │ │ └──────────────────────────────────────────────────────┘ │ │ │ + │ │ └─────────────────────────────────────────────────────────────┘ │ │ + │ └─────────────────────────────────────────────────────────────────────┘ │ + │ │ │ + │ ▼ │ + │ ┌─────────────────────────────────────────────────────────────────────┐ │ + │ │ STEP EXECUTOR POOL │ │ + │ │ │ │ + │ │ ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐ │ │ + │ │ │ Executor 1 │ │ Executor 2 │ │ Executor 3 │ │ Executor N │ │ │ + │ │ └────────────┘ └────────────┘ └────────────┘ └────────────┘ │ │ + │ │ │ │ + │ └─────────────────────────────────────────────────────────────────────┘ │ + │ │ + └─────────────────────────────────────────────────────────────────────────────┘ +``` + +## Workflow Run State Machine + +``` + WORKFLOW RUN STATES + + ┌──────────┐ + │ CREATED │ + └────┬─────┘ + │ start() + ▼ + ┌──────────┐ + │ RUNNING │◄──────────────────┐ + └────┬─────┘ │ + │ │ 
+ ┌───────────────────┼───────────────────┐ │ + │ │ │ │ + ▼ ▼ ▼ │ + ┌──────────┐ ┌──────────┐ ┌──────────┐│ + │ WAITING │ │ PAUSED │ │ FAILING ││ + │ APPROVAL │ │ │ │ ││ + └────┬─────┘ └────┬─────┘ └────┬─────┘│ + │ │ │ │ + │ approve() │ resume() │ │ + │ │ │ │ + └───────────────►──┴──────────────────┘ │ + │ │ + └─────────────────────────┘ + │ + ┌───────────────────────┼───────────────────┐ + │ │ │ + ▼ ▼ ▼ + ┌──────────┐ ┌──────────┐ ┌──────────┐ + │COMPLETED │ │ FAILED │ │ CANCELLED│ + └──────────┘ └──────────┘ └──────────┘ +``` + +### State Transitions + +| Current State | Event | Next State | Description | +|---------------|-------|------------|-------------| +| `created` | `start()` | `running` | Begin workflow execution | +| `running` | Step requires approval | `waiting_approval` | Pause for human approval | +| `running` | `pause()` | `paused` | Manual pause requested | +| `running` | Step fails | `failing` | Handle failure path | +| `running` | All steps complete | `completed` | Workflow success | +| `waiting_approval` | `approve()` | `running` | Resume after approval | +| `waiting_approval` | `reject()` | `failed` | Rejection ends workflow | +| `paused` | `resume()` | `running` | Resume execution | +| `paused` | `cancel()` | `cancelled` | Cancel workflow | +| `failing` | Rollback complete | `failed` | Failure handling done | +| `failing` | Rollback succeeds | `running` | Resume with fallback | + +## Step Execution State Machine + +``` + STEP STATES + + ┌──────────┐ + │ PENDING │ + └────┬─────┘ + │ schedule() + ▼ + ┌──────────┐ + │ QUEUED │ + └────┬─────┘ + │ dispatch() + ▼ + ┌──────────┐ + │ RUNNING │◄─────────┐ + └────┬─────┘ │ + │ │ retry() + ┌───────────────────┼───────────────┐│ + │ │ ││ + ▼ ▼ ▼│ + ┌──────────┐ ┌──────────┐ ┌──────────┐ + │SUCCEEDED │ │ FAILED │ │ RETRYING │ + └──────────┘ └────┬─────┘ └──────────┘ + │ + ▼ + ┌─────────────────────┐ + │ FAILURE HANDLER │ + │ ┌───────────────┐ │ + │ │ fail │──┼─► Mark workflow failing + │ │ continue │──┼─► 
Continue to next step
+          │  │ rollback      │──┼─► Trigger rollback path
+          │  │ goto:{nodeId} │──┼─► Jump to specific node
+          │  └───────────────┘  │
+          └─────────────────────┘
+```
+
+### Step States
+
+| State | Description |
+|-------|-------------|
+| `pending` | Step not yet ready (dependencies incomplete) |
+| `queued` | Ready for execution, waiting for executor |
+| `running` | Currently executing |
+| `succeeded` | Completed successfully |
+| `failed` | Failed after all retries exhausted |
+| `retrying` | Failed, waiting for retry |
+| `skipped` | Condition evaluated to false |
+
+## DAG Execution Algorithm
+
+```python
+class DAGExecutor:
+    def __init__(self, workflow_run: WorkflowRun):
+        self.run = workflow_run
+        self.template = workflow_run.template
+        self.pending = set(node.id for node in self.template.nodes)
+        self.running = set()
+        self.completed = set()
+        self.failed = set()
+        self.outputs = {}  # nodeId -> outputs
+
+    async def execute(self):
+        """Main execution loop."""
+        self.run.status = WorkflowStatus.RUNNING
+        self.run.started_at = datetime.utcnow()
+
+        while self.pending or self.running:
+            # Find ready nodes (all dependencies satisfied)
+            ready = self.find_ready_nodes()
+
+            # Dispatch ready nodes
+            for node_id in ready:
+                asyncio.create_task(self.execute_node(node_id))
+                self.pending.remove(node_id)
+                self.running.add(node_id)
+
+            # Wait for any node to complete
+            if self.running:
+                await self.wait_for_completion()
+
+            # Check for deadlock
+            if not ready and self.pending and not self.running:
+                raise DeadlockException(self.pending)
+
+        # Determine final status
+        if self.failed:
+            self.run.status = WorkflowStatus.FAILED
+        else:
+            self.run.status = WorkflowStatus.COMPLETED
+
+        self.run.completed_at = datetime.utcnow()
+
+    def find_ready_nodes(self) -> List[str]:
+        """Find nodes whose dependencies are all complete."""
+        ready = []
+        for node_id in self.pending:
+            node = self.template.get_node(node_id)
+
+            # Check condition
+            if node.condition:
+                if not 
self.evaluate_condition(node.condition): + self.mark_skipped(node_id) + continue + + # Check all incoming edges + incoming = self.template.get_incoming_edges(node_id) + dependencies_met = all( + edge.from_node in self.completed + for edge in incoming + if self.evaluate_edge_condition(edge) + ) + + if dependencies_met: + ready.append(node_id) + + return ready + + async def execute_node(self, node_id: str): + """Execute a single node.""" + node = self.template.get_node(node_id) + step_run = StepRun( + workflow_run_id=self.run.id, + node_id=node_id, + status=StepStatus.RUNNING + ) + + try: + # Resolve inputs + inputs = self.resolve_inputs(node) + + # Get step executor + executor = self.step_registry.get_executor(node.type) + + # Execute with timeout + async with asyncio.timeout(node.timeout): + outputs = await executor.execute(inputs, node.config) + + # Store outputs + self.outputs[node_id] = outputs + step_run.outputs = outputs + step_run.status = StepStatus.SUCCEEDED + + self.running.remove(node_id) + self.completed.add(node_id) + + except Exception as e: + await self.handle_step_failure(node, step_run, e) + + async def handle_step_failure(self, node, step_run, error): + """Handle step failure according to retry and failure policies.""" + step_run.attempt_number += 1 + + # Check retry policy + if step_run.attempt_number <= node.retry_policy.max_retries: + if self.is_retryable(error, node.retry_policy): + step_run.status = StepStatus.RETRYING + delay = self.calculate_backoff(node.retry_policy, step_run.attempt_number) + await asyncio.sleep(delay) + await self.execute_node(node.id) # Retry + return + + # No more retries - handle failure + step_run.status = StepStatus.FAILED + step_run.error = str(error) + + match node.on_failure: + case "fail": + self.run.status = WorkflowStatus.FAILING + self.failed.add(node.id) + case "continue": + self.completed.add(node.id) # Continue as if succeeded + case "rollback": + await self.trigger_rollback(node) + case _ if 
node.on_failure.startswith("goto:"): + target = node.on_failure.split(":")[1] + self.pending.add(target) # Add target to pending + + self.running.remove(node.id) +``` + +## Input Resolution + +Inputs to steps can come from multiple sources: + +```typescript +interface InputResolver { + resolve(binding: InputBinding, context: ExecutionContext): any; +} + +class StandardInputResolver implements InputResolver { + resolve(binding: InputBinding, context: ExecutionContext): any { + switch (binding.source.type) { + case "literal": + return binding.source.value; + + case "context": + // Navigate context path: "release.name" -> context.release.name + return this.navigatePath(context, binding.source.path); + + case "output": + // Get output from previous step + const stepOutputs = context.stepOutputs[binding.source.nodeId]; + return stepOutputs?.[binding.source.outputName]; + + case "secret": + // Fetch from vault (never cached) + return this.secretsClient.fetch(binding.source.secretName); + + case "expression": + // Evaluate JavaScript expression + return this.expressionEvaluator.evaluate( + binding.source.expression, + context + ); + } + } +} +``` + +## Execution Context + +The execution context provides data available to all steps: + +```typescript +interface ExecutionContext { + // Workflow identifiers + workflowRunId: UUID; + templateId: UUID; + templateVersion: number; + + // Input values + inputs: Record; + + // Domain objects (loaded at start) + release?: Release; + promotion?: Promotion; + environment?: Environment; + targets?: Target[]; + + // Step outputs (accumulated during execution) + stepOutputs: Record>; + + // Tenant context + tenantId: UUID; + userId: UUID; + + // Metadata + startedAt: DateTime; + correlationId: string; +} +``` + +## Concurrency Control + +### Parallelism Within Workflows + +```typescript +interface ParallelConfig { + maxConcurrency: number; // Max simultaneous steps + failFast: boolean; // Stop all on first failure +} + +// Example: 
Parallel deployment to multiple targets +const parallelDeploy: StepNode = { + id: "parallel-deploy", + type: "parallel", + config: { + maxConcurrency: 5, + failFast: false + }, + children: [ + { id: "deploy-target-1", type: "deploy-docker", ... }, + { id: "deploy-target-2", type: "deploy-docker", ... }, + { id: "deploy-target-3", type: "deploy-docker", ... }, + ] +}; +``` + +### Global Concurrency Limits + +```typescript +interface ConcurrencyLimits { + maxWorkflowsPerTenant: number; // Concurrent workflow runs + maxStepsPerWorkflow: number; // Concurrent steps per workflow + maxDeploymentsPerEnvironment: number; // Prevent deployment conflicts +} + +// Default limits +const defaults: ConcurrencyLimits = { + maxWorkflowsPerTenant: 10, + maxStepsPerWorkflow: 20, + maxDeploymentsPerEnvironment: 1 // One deployment at a time +}; +``` + +## Checkpoint and Resume + +Workflows support checkpointing for long-running executions: + +```typescript +interface WorkflowCheckpoint { + workflowRunId: UUID; + checkpointedAt: DateTime; + + // Execution state + pendingNodes: string[]; + completedNodes: string[]; + failedNodes: string[]; + + // Accumulated data + stepOutputs: Record>; + + // Context snapshot + contextSnapshot: ExecutionContext; +} + +class CheckpointManager { + // Save checkpoint after each step completion + async saveCheckpoint(run: WorkflowRun): Promise { + const checkpoint: WorkflowCheckpoint = { + workflowRunId: run.id, + checkpointedAt: new Date(), + pendingNodes: Array.from(run.executor.pending), + completedNodes: Array.from(run.executor.completed), + failedNodes: Array.from(run.executor.failed), + stepOutputs: run.executor.outputs, + contextSnapshot: run.context + }; + + await this.repository.save(checkpoint); + } + + // Resume from checkpoint after service restart + async resumeFromCheckpoint(workflowRunId: UUID): Promise { + const checkpoint = await this.repository.get(workflowRunId); + + const run = new WorkflowRun(); + run.executor.pending = new 
+      Set(checkpoint.pendingNodes);
+    run.executor.completed = new Set(checkpoint.completedNodes);
+    run.executor.failed = new Set(checkpoint.failedNodes);
+    run.executor.outputs = checkpoint.stepOutputs;
+    run.context = checkpoint.contextSnapshot;
+
+    // Resume execution
+    await run.executor.execute();
+    return run;
+  }
+}
+```
+
+## Timeout Handling
+
+```typescript
+interface TimeoutConfig {
+  stepTimeout: number;      // Per-step timeout (seconds)
+  workflowTimeout: number;  // Total workflow timeout (seconds)
+}
+
+class TimeoutHandler {
+  async executeWithTimeout<T>(
+    operation: (signal: AbortSignal) => Promise<T>,
+    timeoutSeconds: number,
+    onTimeout: () => Promise<void>
+  ): Promise<T> {
+    const controller = new AbortController();
+    const timeoutId = setTimeout(
+      () => controller.abort(),
+      timeoutSeconds * 1000
+    );
+
+    try {
+      // The operation must honor the abort signal for the timeout to take effect
+      const result = await operation(controller.signal);
+      clearTimeout(timeoutId);
+      return result;
+    } catch (error) {
+      if (error.name === 'AbortError') {
+        await onTimeout();
+        throw new TimeoutException(timeoutSeconds);
+      }
+      throw error;
+    }
+  }
+}
+```
+
+## Event Emission
+
+The workflow engine emits events for observability:
+
+```typescript
+type WorkflowEvent =
+  | { type: "workflow.started"; workflowRunId: UUID; templateId: UUID }
+  | { type: "workflow.completed"; workflowRunId: UUID; status: string }
+  | { type: "workflow.failed"; workflowRunId: UUID; error: string }
+  | { type: "step.started"; workflowRunId: UUID; nodeId: string }
+  | { type: "step.completed"; workflowRunId: UUID; nodeId: string; outputs: any }
+  | { type: "step.failed"; workflowRunId: UUID; nodeId: string; error: string }
+  | { type: "step.retrying"; workflowRunId: UUID; nodeId: string; attempt: number };
+
+class WorkflowEventEmitter {
+  private subscribers: Map<string, ((event: WorkflowEvent) => void)[]> = new Map();
+
+  emit(event: WorkflowEvent): void {
+    const handlers = this.subscribers.get(event.type) || [];
+    for (const handler of handlers) {
+      handler(event);
+    }
+
+    // Also emit to event bus for external consumers
+    this.eventBus.publish("workflow.events", event);
+  }
+}
+```
+
+## Execution Monitoring
+
+### Real-time Progress
+
+```typescript
+interface WorkflowProgress {
+  workflowRunId: UUID;
+  status: WorkflowStatus;
+
+  // Step progress
+  totalSteps: number;
+  completedSteps: number;
+  runningSteps: number;
+  failedSteps: number;
+
+  // Current activity
+  currentNodes: string[];
+
+  // Timing
+  startedAt: DateTime;
+  estimatedCompletion?: DateTime;
+
+  // Step details
+  steps: StepProgress[];
+}
+
+interface StepProgress {
+  nodeId: string;
+  nodeName: string;
+  status: StepStatus;
+  startedAt?: DateTime;
+  completedAt?: DateTime;
+  attempt: number;
+  logs?: string;
+}
+```
+
+### WebSocket Streaming
+
+```typescript
+// Client subscribes to workflow progress
+const ws = new WebSocket(`/api/v1/workflow-runs/${runId}/stream`);
+
+ws.onmessage = (event) => {
+  const progress: WorkflowProgress = JSON.parse(event.data);
+  updateUI(progress);
+};
+
+// Server streams updates
+class WorkflowStreamHandler {
+  async stream(runId: UUID, connection: WebSocket): Promise<void> {
+    const subscription = this.eventBus.subscribe(`workflow.${runId}.*`);
+
+    for await (const event of subscription) {
+      const progress = await this.buildProgress(runId);
+      connection.send(JSON.stringify(progress));
+
+      if (progress.status === 'completed' || progress.status === 'failed') {
+        break;
+      }
+    }
+
+    connection.close();
+  }
+}
+```
+
+## References
+
+- [Workflow Templates](templates.md)
+- [Workflow Engine Module](../modules/workflow-engine.md)
+- [Promotion Manager](../modules/promotion-manager.md)
diff --git a/docs/modules/release-orchestrator/workflow/promotion.md b/docs/modules/release-orchestrator/workflow/promotion.md
new file mode 100644
index 000000000..85248cb8e
--- /dev/null
+++ b/docs/modules/release-orchestrator/workflow/promotion.md
@@ -0,0 +1,405 @@
+# Promotion State Machine
+
+## Overview
+
+Promotions move releases through environments (Dev -> Staging -> Production).
The promotion state machine manages the lifecycle from request to completion. + +## Promotion States + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ PROMOTION STATE MACHINE │ +│ │ +│ ┌──────────────────┐ │ +│ │ PENDING_APPROVAL │ (initial) │ +│ └────────┬─────────┘ │ +│ │ │ +│ ┌──────────────────┼──────────────────┐ │ +│ │ │ │ │ +│ ▼ ▼ ▼ │ +│ ┌────────────────┐ ┌────────────────┐ ┌────────────────┐ │ +│ │ REJECTED │ │ PENDING_GATE │ │ CANCELLED │ │ +│ └────────────────┘ └────────┬───────┘ └────────────────┘ │ +│ │ │ +│ │ gates pass │ +│ ▼ │ +│ ┌────────────────┐ │ +│ │ APPROVED │ │ +│ └────────┬───────┘ │ +│ │ │ +│ │ start deployment │ +│ ▼ │ +│ ┌────────────────┐ │ +│ │ DEPLOYING │ │ +│ └────────┬───────┘ │ +│ │ │ +│ ┌──────────────────┼──────────────────┐ │ +│ │ │ │ │ +│ ▼ ▼ ▼ │ +│ ┌────────────────┐ ┌────────────────┐ ┌────────────────┐ │ +│ │ FAILED │ │ DEPLOYED │ │ ROLLED_BACK │ │ +│ └────────────────┘ └────────────────┘ └────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +## State Definitions + +| State | Description | +|-------|-------------| +| `pending_approval` | Awaiting human approval (if required) | +| `pending_gate` | Awaiting automated gate evaluation | +| `approved` | All approvals and gates satisfied; ready for deployment | +| `rejected` | Blocked by approval rejection or gate failure | +| `deploying` | Deployment in progress | +| `deployed` | Successfully deployed to target environment | +| `failed` | Deployment failed (not rolled back) | +| `cancelled` | Cancelled by user before completion | +| `rolled_back` | Deployment rolled back to previous version | + +## State Transitions + +### Valid Transitions + +```typescript +const validTransitions: Record = { + pending_approval: ["pending_gate", "approved", "rejected", "cancelled"], + pending_gate: ["approved", "rejected", "cancelled"], + approved: ["deploying", "cancelled"], + deploying: 
+    ["deployed", "failed", "rolled_back"],
+  rejected: [],             // terminal
+  cancelled: [],            // terminal
+  deployed: [],             // terminal (for this promotion)
+  failed: ["rolled_back"],  // can trigger rollback
+  rolled_back: []           // terminal
+};
+```
+
+### Transition Events
+
+```typescript
+interface PromotionTransition {
+  promotionId: UUID;
+  fromState: PromotionStatus;
+  toState: PromotionStatus;
+  trigger: TransitionTrigger;
+  triggeredBy: UUID;  // user or system
+  timestamp: DateTime;
+  details: object;
+}
+
+type TransitionTrigger =
+  | "approval_granted"
+  | "approval_rejected"
+  | "gate_passed"
+  | "gate_failed"
+  | "deployment_started"
+  | "deployment_completed"
+  | "deployment_failed"
+  | "rollback_triggered"
+  | "rollback_completed"
+  | "user_cancelled";
+```
+
+## Promotion Flow
+
+### 1. Request Promotion
+
+```typescript
+async function requestPromotion(request: PromotionRequest): Promise<Promotion> {
+  // Validate release exists and is ready
+  const release = await getRelease(request.releaseId);
+  if (release.status !== "ready" && release.status !== "deployed") {
+    throw new Error("Release not ready for promotion");
+  }
+
+  // Validate target environment
+  const environment = await getEnvironment(request.targetEnvironmentId);
+
+  // Check freeze windows
+  if (await isEnvironmentFrozen(environment.id)) {
+    throw new Error("Environment is frozen");
+  }
+
+  // Determine initial state
+  const requiresApproval = environment.requiredApprovals > 0;
+  const initialStatus = requiresApproval ? "pending_approval" : "pending_gate";
+
+  // Create promotion
+  const promotion = await createPromotion({
+    releaseId: request.releaseId,
+    sourceEnvironmentId: release.currentEnvironmentId,
+    targetEnvironmentId: environment.id,
+    status: initialStatus,
+    requestedBy: request.userId,
+    requestReason: request.reason
+  });
+
+  // Emit event
+  await emitEvent("promotion.requested", promotion);
+
+  return promotion;
+}
+```
+
+### 2.
+Approval Phase
+
+```typescript
+async function processApproval(
+  promotionId: UUID,
+  approverId: UUID,
+  action: "approve" | "reject",
+  comment?: string
+): Promise<Promotion> {
+  const promotion = await getPromotion(promotionId);
+  const environment = await getEnvironment(promotion.targetEnvironmentId);
+
+  // Validate approver can approve
+  await validateApproverPermission(approverId, environment.id);
+
+  // Check separation of duties
+  if (environment.requireSeparationOfDuties) {
+    if (approverId === promotion.requestedBy) {
+      throw new Error("Separation of duties violation: requester cannot approve");
+    }
+  }
+
+  // Record approval
+  await recordApproval({
+    promotionId,
+    approverId,
+    action,
+    comment
+  });
+
+  if (action === "reject") {
+    return await transitionState(promotion, "rejected", {
+      trigger: "approval_rejected",
+      triggeredBy: approverId,
+      details: { reason: comment }
+    });
+  }
+
+  // Check if all required approvals received
+  const approvalCount = await countApprovals(promotionId);
+  if (approvalCount >= environment.requiredApprovals) {
+    return await transitionState(promotion, "pending_gate", {
+      trigger: "approval_granted",
+      triggeredBy: approverId
+    });
+  }
+
+  return promotion;
+}
+```
+
+### 3.
+Gate Evaluation
+
+```typescript
+async function evaluateGates(
+  promotionId: UUID
+): Promise<{ passed: boolean; gateResults: GateResult[]; decisionRecord: DecisionRecord }> {
+  const promotion = await getPromotion(promotionId);
+  const environment = await getEnvironment(promotion.targetEnvironmentId);
+  const release = await getRelease(promotion.releaseId);
+
+  const gateResults: GateResult[] = [];
+
+  // Security gate
+  const securityResult = await evaluateSecurityGate(release, environment);
+  gateResults.push(securityResult);
+
+  // Custom policy gates
+  for (const policy of environment.policies) {
+    const policyResult = await evaluatePolicyGate(release, environment, policy);
+    gateResults.push(policyResult);
+  }
+
+  // Aggregate results
+  const allPassed = gateResults.every(g => g.passed);
+  const blockingFailures = gateResults.filter(g => !g.passed && g.blocking);
+
+  // Create decision record
+  const decisionRecord = await createDecisionRecord({
+    promotionId,
+    gateResults,
+    decision: allPassed ? "allow" : "block",
+    decidedAt: new Date()
+  });
+
+  // Transition state
+  if (allPassed) {
+    await transitionState(promotion, "approved", {
+      trigger: "gate_passed",
+      triggeredBy: "system",
+      details: { decisionRecordId: decisionRecord.id }
+    });
+  } else {
+    await transitionState(promotion, "rejected", {
+      trigger: "gate_failed",
+      triggeredBy: "system",
+      details: { blockingGates: blockingFailures }
+    });
+  }
+
+  return { passed: allPassed, gateResults, decisionRecord };
+}
+```
+
+### 4.
+Deployment Execution
+
+```typescript
+async function executeDeployment(promotionId: UUID): Promise<DeploymentJob> {
+  const promotion = await getPromotion(promotionId);
+
+  // Transition to deploying
+  await transitionState(promotion, "deploying", {
+    trigger: "deployment_started",
+    triggeredBy: "system"
+  });
+
+  // Generate artifacts
+  const artifacts = await generateArtifacts(promotion);
+
+  // Create deployment job
+  const job = await createDeploymentJob({
+    promotionId,
+    releaseId: promotion.releaseId,
+    environmentId: promotion.targetEnvironmentId,
+    artifacts
+  });
+
+  // Execute via workflow or direct
+  const workflowRun = await startDeploymentWorkflow(job);
+
+  // Update promotion with workflow reference
+  await updatePromotion(promotionId, { workflowRunId: workflowRun.id });
+
+  return job;
+}
+```
+
+### 5. Completion Handling
+
+```typescript
+async function handleDeploymentCompletion(
+  jobId: UUID,
+  status: "succeeded" | "failed"
+): Promise<Promotion> {
+  const job = await getDeploymentJob(jobId);
+  const promotion = await getPromotion(job.promotionId);
+
+  if (status === "succeeded") {
+    // Generate evidence packet
+    const evidence = await generateEvidencePacket(promotion, job);
+
+    // Update release environment state
+    await updateReleaseEnvironmentState({
+      releaseId: promotion.releaseId,
+      environmentId: promotion.targetEnvironmentId,
+      status: "deployed",
+      promotionId: promotion.id,
+      evidenceRef: evidence.id
+    });
+
+    return await transitionState(promotion, "deployed", {
+      trigger: "deployment_completed",
+      triggeredBy: "system",
+      details: { evidencePacketId: evidence.id }
+    });
+  } else {
+    return await transitionState(promotion, "failed", {
+      trigger: "deployment_failed",
+      triggeredBy: "system",
+      details: { jobId, error: job.errorMessage }
+    });
+  }
+}
+```
+
+## Decision Record
+
+Every promotion produces a decision record:
+
+```typescript
+interface DecisionRecord {
+  id: UUID;
+  promotionId: UUID;
+  decision: "allow" | "block";
+  decidedAt: DateTime;
+
// Inputs + release: { + id: UUID; + name: string; + components: Array<{ name: string; digest: string }>; + }; + environment: { + id: UUID; + name: string; + }; + + // Gate results + gateResults: Array<{ + gateName: string; + gateType: string; + passed: boolean; + blocking: boolean; + message: string; + details: object; + evaluatedAt: DateTime; + }>; + + // Approvals + approvals: Array<{ + approverId: UUID; + approverName: string; + action: "approved" | "rejected"; + comment?: string; + timestamp: DateTime; + }>; + + // Context + requester: { + id: UUID; + name: string; + }; + requestReason: string; + + // Signature + contentHash: string; + signature: string; +} +``` + +## API Endpoints + +```yaml +# Request promotion +POST /api/v1/promotions +Body: { releaseId, targetEnvironmentId, reason? } +Response: Promotion + +# Approve/reject promotion +POST /api/v1/promotions/{id}/approve +POST /api/v1/promotions/{id}/reject +Body: { comment? } +Response: Promotion + +# Cancel promotion +POST /api/v1/promotions/{id}/cancel +Response: Promotion + +# Get decision record +GET /api/v1/promotions/{id}/decision +Response: DecisionRecord + +# Preview gates (dry run) +POST /api/v1/promotions/preview-gates +Body: { releaseId, targetEnvironmentId } +Response: { wouldPass: boolean, gates: GateResult[] } +``` + +## References + +- [Workflow Templates](templates.md) +- [Workflow Execution](execution.md) +- [Evidence Schema](../appendices/evidence-schema.md) diff --git a/docs/modules/release-orchestrator/workflow/templates.md b/docs/modules/release-orchestrator/workflow/templates.md new file mode 100644 index 000000000..9d8e13d7f --- /dev/null +++ b/docs/modules/release-orchestrator/workflow/templates.md @@ -0,0 +1,327 @@ +# Workflow Template Structure + +## Overview + +Workflow templates define the DAG (Directed Acyclic Graph) of steps to execute during deployment, promotion, and other automated processes. 
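The "no cycles" requirement is what makes a template executable in dependency order. As a minimal sketch, not part of the specification above, the following assumes simplified `NodeRef`/`EdgeRef` shapes (stand-ins for the full `StepNode`/`StepEdge` interfaces) and shows how a topological sort over `nodes` and `edges` yields a runnable step order while rejecting cyclic templates:

```typescript
interface NodeRef { id: string; }
interface EdgeRef { from: string; to: string; }

// Kahn's algorithm: produce a valid execution order for a template's DAG,
// or throw when the graph contains a cycle (which validation must reject).
function topologicalOrder(nodes: NodeRef[], edges: EdgeRef[]): string[] {
  const inDegree = new Map<string, number>(nodes.map(n => [n.id, 0] as [string, number]));
  const successors = new Map<string, string[]>(nodes.map(n => [n.id, []] as [string, string[]]));

  for (const e of edges) {
    successors.get(e.from)!.push(e.to);
    inDegree.set(e.to, (inDegree.get(e.to) ?? 0) + 1);
  }

  // Roots: nodes with no incoming edges are ready to run first.
  const ready = [...inDegree.entries()]
    .filter(([, degree]) => degree === 0)
    .map(([id]) => id);
  const order: string[] = [];

  while (ready.length > 0) {
    const id = ready.shift()!;
    order.push(id);
    for (const next of successors.get(id) ?? []) {
      const remaining = inDegree.get(next)! - 1;
      inDegree.set(next, remaining);
      if (remaining === 0) ready.push(next);
    }
  }

  // Any node that never reaches in-degree 0 implies a cycle.
  if (order.length !== nodes.length) {
    throw new Error("Template graph contains a cycle");
  }
  return order;
}
```

The same traversal doubles as the "no cycles" check performed during template validation.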
+
+## Template Structure
+
+```typescript
+interface WorkflowTemplate {
+  id: UUID;
+  tenantId: UUID;
+  name: string;             // "standard-deploy"
+  displayName: string;      // "Standard Deployment"
+  description: string;
+  version: number;          // Auto-incremented
+
+  // DAG structure
+  nodes: StepNode[];
+  edges: StepEdge[];
+
+  // I/O definitions
+  inputs: InputDefinition[];
+  outputs: OutputDefinition[];
+
+  // Metadata
+  tags: string[];
+  isBuiltin: boolean;
+  createdAt: DateTime;
+  createdBy: UUID;
+}
+```
+
+## Node Types
+
+### Step Node
+
+```typescript
+interface StepNode {
+  id: string;                    // Unique within template (e.g., "deploy-api")
+  type: string;                  // Step type from registry
+  name: string;                  // Display name
+  config: Record<string, any>;   // Step-specific configuration
+  inputs: InputBinding[];        // Input value bindings
+  outputs: OutputBinding[];      // Output declarations
+  position: { x: number; y: number };  // UI position
+
+  // Execution settings
+  timeout: number;               // Seconds (default from step type)
+  retryPolicy: RetryPolicy;
+  onFailure: FailureAction;
+  condition?: string;            // JS expression for conditional execution
+
+  // Documentation
+  description?: string;
+  documentation?: string;
+}
+
+type FailureAction = "fail" | "continue" | "rollback" | `goto:${string}`;  // e.g., "goto:rollback-handler"
+
+interface RetryPolicy {
+  maxRetries: number;
+  backoffType: "fixed" | "exponential";
+  backoffSeconds: number;
+  retryableErrors: string[];
+}
+```
+
+### Input Bindings
+
+```typescript
+interface InputBinding {
+  name: string;        // Input parameter name
+  source: InputSource;
+}
+
+type InputSource =
+  | { type: "literal"; value: any }
+  | { type: "context"; path: string }  // e.g., "release.name"
+  | { type: "output"; nodeId: string; outputName: string }
+  | { type: "secret"; secretName: string }
+  | { type: "expression"; expression: string };  // JS expression
+```
+
+### Edge Types
+
+```typescript
+interface StepEdge {
+  id: string;
+  from: string;        // Source node ID
+  to: string;          // Target node ID
+  condition?: string;
// Optional condition expression + label?: string; // Display label for conditional edges +} +``` + +## Built-in Step Types + +### Control Steps + +| Type | Description | Config | +|------|-------------|--------| +| `approval` | Wait for human approval | `promotionId` | +| `wait` | Wait for specified duration | `durationSeconds` | +| `condition` | Branch based on condition | `expression` | +| `parallel` | Execute children in parallel | `maxConcurrency` | + +### Gate Steps + +| Type | Description | Config | +|------|-------------|--------| +| `security-gate` | Evaluate security policy | `blockOnCritical`, `blockOnHigh` | +| `custom-gate` | Custom OPA policy evaluation | `policyName` | +| `freeze-check` | Check freeze windows | - | +| `approval-check` | Check approval status | `requiredCount` | + +### Deploy Steps + +| Type | Description | Config | +|------|-------------|--------| +| `deploy-docker` | Deploy single container | `containerName`, `strategy` | +| `deploy-compose` | Deploy Docker Compose stack | `composePath`, `strategy` | +| `deploy-ecs` | Deploy to AWS ECS | `cluster`, `service` | +| `deploy-nomad` | Deploy to HashiCorp Nomad | `jobName` | + +### Verification Steps + +| Type | Description | Config | +|------|-------------|--------| +| `health-check` | HTTP/TCP health check | `type`, `path`, `expectedStatus` | +| `smoke-test` | Run smoke test suite | `testSuite`, `timeout` | +| `verify-digest` | Verify deployed digest | `expectedDigest` | + +### Integration Steps + +| Type | Description | Config | +|------|-------------|--------| +| `webhook` | Call external webhook | `url`, `method`, `headers` | +| `trigger-ci` | Trigger CI pipeline | `integrationId`, `pipelineId` | +| `wait-ci` | Wait for CI pipeline | `runId`, `timeout` | + +### Notification Steps + +| Type | Description | Config | +|------|-------------|--------| +| `notify` | Send notification | `channel`, `template` | +| `slack` | Send Slack message | `channel`, `message` | +| `email` | Send email 
| `recipients`, `template` | + +### Recovery Steps + +| Type | Description | Config | +|------|-------------|--------| +| `rollback` | Rollback deployment | `strategy`, `targetReleaseId` | +| `execute-script` | Run recovery script | `scriptType`, `scriptRef` | + +## Template Example: Standard Deployment + +```json +{ + "id": "template-standard-deploy", + "name": "standard-deploy", + "displayName": "Standard Deployment", + "version": 1, + "inputs": [ + { "name": "releaseId", "type": "uuid", "required": true }, + { "name": "environmentId", "type": "uuid", "required": true }, + { "name": "promotionId", "type": "uuid", "required": true } + ], + "nodes": [ + { + "id": "approval", + "type": "approval", + "name": "Approval Gate", + "config": {}, + "inputs": [ + { "name": "promotionId", "source": { "type": "context", "path": "promotionId" } } + ], + "position": { "x": 100, "y": 100 } + }, + { + "id": "security-gate", + "type": "security-gate", + "name": "Security Verification", + "config": { + "blockOnCritical": true, + "blockOnHigh": true + }, + "inputs": [ + { "name": "releaseId", "source": { "type": "context", "path": "releaseId" } } + ], + "position": { "x": 100, "y": 200 } + }, + { + "id": "pre-deploy-hook", + "type": "execute-script", + "name": "Pre-Deploy Hook", + "config": { + "scriptType": "csharp", + "scriptRef": "hooks/pre-deploy.csx" + }, + "inputs": [ + { "name": "release", "source": { "type": "context", "path": "release" } }, + { "name": "environment", "source": { "type": "context", "path": "environment" } } + ], + "timeout": 300, + "onFailure": "fail", + "position": { "x": 100, "y": 300 } + }, + { + "id": "deploy-targets", + "type": "deploy-compose", + "name": "Deploy to Targets", + "config": { + "strategy": "rolling", + "parallelism": 2 + }, + "inputs": [ + { "name": "releaseId", "source": { "type": "context", "path": "releaseId" } }, + { "name": "environmentId", "source": { "type": "context", "path": "environmentId" } } + ], + "timeout": 600, + 
"retryPolicy": { + "maxRetries": 2, + "backoffType": "exponential", + "backoffSeconds": 30 + }, + "onFailure": "rollback", + "position": { "x": 100, "y": 400 } + }, + { + "id": "health-check", + "type": "health-check", + "name": "Health Verification", + "config": { + "type": "http", + "path": "/health", + "expectedStatus": 200, + "timeout": 30, + "retries": 5 + }, + "inputs": [ + { "name": "targets", "source": { "type": "output", "nodeId": "deploy-targets", "outputName": "deployedTargets" } } + ], + "onFailure": "rollback", + "position": { "x": 100, "y": 500 } + }, + { + "id": "post-deploy-hook", + "type": "execute-script", + "name": "Post-Deploy Hook", + "config": { + "scriptType": "bash", + "inline": "echo 'Deployment complete'" + }, + "timeout": 300, + "onFailure": "continue", + "position": { "x": 100, "y": 600 } + }, + { + "id": "notify-success", + "type": "notify", + "name": "Success Notification", + "config": { + "channel": "slack", + "template": "deployment-success" + }, + "inputs": [ + { "name": "release", "source": { "type": "context", "path": "release" } }, + { "name": "environment", "source": { "type": "context", "path": "environment" } } + ], + "onFailure": "continue", + "position": { "x": 100, "y": 700 } + }, + { + "id": "rollback-handler", + "type": "rollback", + "name": "Rollback Handler", + "config": { + "strategy": "to-previous" + }, + "inputs": [ + { "name": "deploymentJobId", "source": { "type": "output", "nodeId": "deploy-targets", "outputName": "jobId" } } + ], + "position": { "x": 300, "y": 450 } + }, + { + "id": "notify-failure", + "type": "notify", + "name": "Failure Notification", + "config": { + "channel": "slack", + "template": "deployment-failure" + }, + "onFailure": "continue", + "position": { "x": 300, "y": 550 } + } + ], + "edges": [ + { "id": "e1", "from": "approval", "to": "security-gate" }, + { "id": "e2", "from": "security-gate", "to": "pre-deploy-hook" }, + { "id": "e3", "from": "pre-deploy-hook", "to": "deploy-targets" }, + { 
"id": "e4", "from": "deploy-targets", "to": "health-check" }, + { "id": "e5", "from": "health-check", "to": "post-deploy-hook" }, + { "id": "e6", "from": "post-deploy-hook", "to": "notify-success" }, + { "id": "e7", "from": "deploy-targets", "to": "rollback-handler", "condition": "status === 'failed'" }, + { "id": "e8", "from": "health-check", "to": "rollback-handler", "condition": "status === 'failed'" }, + { "id": "e9", "from": "rollback-handler", "to": "notify-failure" } + ] +} +``` + +## Template Validation + +Templates are validated for: + +1. **Structural validity**: Valid JSON/YAML, required fields present +2. **DAG validity**: No cycles, all edges reference valid nodes +3. **Type validity**: All step types exist in registry +4. **Schema validity**: Step configs match type schemas +5. **Input validity**: All required inputs are bindable + +## References + +- [Workflow Engine](../modules/workflow-engine.md) +- [Execution State Machine](execution.md) +- [Step Registry](../modules/workflow-engine.md#module-step-registry) diff --git a/docs/overview.md b/docs/overview.md index 6fdce8e04..e1f5b2063 100644 --- a/docs/overview.md +++ b/docs/overview.md @@ -1,39 +1,75 @@ -# Stella Ops – 2‑Minute Overview +# Stella Ops Suite — 2-Minute Overview -## The Problem We Solve +## What Stella Ops Suite Is -- **Supply-chain attacks exploded 742 % in three years;** regulated teams still need to scan hundreds of containers a day while disconnected from the public Internet. -- **Existing scanners trade freedom for SaaS:** no offline feeds, hidden quotas, noisy results that lack exploitability context. -- **Audit fatigue is real:** Policy decisions are opaque, replaying scans is guesswork, and trust hinges on external transparency logs you do not control. 
+**Stella Ops Suite is a centralized, auditable release control plane for non-Kubernetes container estates.** -## The Promise +It sits between your CI and your runtime targets, governs promotion across environments, enforces security and policy gates, and produces verifiable evidence for every release decision—while remaining plug-in friendly to any SCM/CI/registry/secrets stack. -Stella Ops delivers **deterministic, sovereign container security** that works the same online or fully air-gapped: +## The Problems We Solve -1. **Deterministic replay manifests** (SRM) prove every scan result, so auditors can rerun evidence and see the exact same outcome. -2. **Lattice policy engine + OpenVEX** keeps findings explainable; exploitability, attestation, and waivers merge into one verdict. -3. **Sovereign crypto profiles** let you anchor signatures to eIDAS, FIPS, GOST, or SM roots, mirror your feeds, and keep Sigstore-compatible transparency logs offline. +- **Release governance is fragmented:** CI tools run pipelines but lack central release authority; deployment tools promote but bolt on security as an afterthought. +- **Non-Kubernetes targets are second-class:** Docker hosts, Compose, ECS, and Nomad deployments lack the GitOps tooling that Kubernetes enjoys. +- **Security blocks releases without explanation:** Scanners find vulnerabilities but don't integrate with promotion workflows; teams bypass gates or ignore findings. +- **Audit trails are scattered:** Release decisions live in CI logs, approval emails, and Slack threads—not in a unified, cryptographically verifiable ledger. +- **Pricing punishes automation:** Per-project, per-seat, or per-deployment billing creates friction for teams that deploy frequently. 
-## Core Capability Clusters +## What Stella Ops Suite Does -| Cluster | What you get | Why it matters | -|---------|--------------|----------------| -| **SBOM-first scanning** | Delta-layer SBOM cache, sub‑5 s warm scans, Trivy/CycloneDX/SPDX ingestion + dependency cartographing | Speeds repeat scans 10× and keeps SBOMs the source of truth | -| **Explainable policy** | OpenVEX + lattice logic, policy engine for custom rule packs, waiver expirations | Reduces alert fatigue, supports alert muting beyond VEX, and shows why a finding blocks deploy | -| **Attestation & provenance** | DSSE bundles, optional Rekor mirror, DSSE → CLI/UI exports | Lets you prove integrity without relying on external services | -| **Offline operations** | Offline Update Kit bundles, mirrored feeds, quota tokens verified locally | Works for sovereign clouds, SCIFs, and heavily regulated sectors | -| **Governance & observability** | Structured audit trails, quota transparency, per-tenant metrics | Keeps compliance teams and operators in sync without extra tooling | +| Capability | Description | +|------------|-------------| +| **Release orchestration** | UI-driven promotion (Dev → Stage → Prod), approvals, policy gates, rollbacks; steps are hook-able with scripts and step providers | +| **Security decisioning as a gate** | Scan on build, evaluate on release, re-evaluate when vulnerability intelligence updates—without forcing re-scans | +| **OCI-digest-first releases** | A release is an immutable digest (or bundle of digests); track "what is deployed where" with integrity | +| **Toolchain-agnostic integrations** | Plug into any SCM, any CI, any registry, any secrets system; customers reuse their existing stack | +| **Auditability + standards** | Audit log + evidence packets (exportable), SBOM/VEX/attestation-friendly, standards-first approach | + +## Core Strengths + +| Strength | Why It Matters | +|----------|----------------| +| **Non-Kubernetes specialization** | Docker hosts, Compose, ECS, 
Nomad-style targets are first-class, not an afterthought | +| **Reproducibility** | Deterministic release decisions captured as evidence (inputs + policy hash + verdict + approvals) | +| **Attestability** | Produces and verifies release evidence/attestations (provenance, SBOM linkage, decision records) in standard formats | +| **Verity (integrity)** | Digest-based release identity; signature/provenance verification; tamper-evident audit trail | +| **Hybrid reachability** | Reachability-aware vulnerability prioritization (static + runtime signals) to reduce noise and focus on exploitable paths | +| **Cost that doesn't punish automation** | No per-project tax, no per-seat tax, no "deployments bill." Limits are only: (1) number of environments and (2) number of new digests analyzed per day | ## Who Benefits -| Persona | Outcome in week one | -|---------|--------------------| -| **Security engineering** | Deterministic replay + explain traces | cuts review time, keeps waivers honest | -| **Platform / SRE** | Fast scans, local registry, no Internet dependency | fits pipelines and air-gapped staging | -| **Compliance & risk** | Signed SBOMs, provable quotas, legal/attestation docs | supports audits without custom tooling | +| Persona | Outcome | +|---------|---------| +| **Release managers** | Central control plane for promotions; clear approval workflows; audit-ready evidence | +| **Security engineering** | Security gates integrated into release flow; reachability-aware prioritization; VEX support | +| **Platform / SRE** | Deploy to Docker/Compose/ECS/Nomad with agents or agentless; rollback with confidence | +| **Compliance & risk** | Every release decision is cryptographically signed and replayable; export compliance reports | +| **DevOps / CI owners** | Integrate via webhooks; keep existing CI/SCM/registry; add release governance without replacing tools | + +## Platform Capabilities + +### Operational Today + +- **Vulnerability scanning** with SBOM-first approach and 
delta-layer caching +- **Advisory ingestion** from multiple sources with aggregation-not-merge semantics +- **VEX support** for exploitability decisioning (OpenVEX + SPDX 3.0.1 relationships) +- **Policy engine** with lattice logic for explainable, deterministic verdicts +- **Attestation and signing** (DSSE/in-toto) with optional Sigstore Rekor transparency +- **Offline operations** via Offline Kit bundles for air-gapped deployments +- **Sovereign crypto profiles** (eIDAS, FIPS, GOST, SM) + +### Planned (Release Orchestration) + +- **Environment management** — Define Dev/Stage/Prod environments with freeze windows and approval policies +- **Release bundles** — Compose releases from component digests with semantic versioning +- **Promotion workflows** — DAG-based workflow engine with approvals, gates, and hooks +- **Deployment execution** — Agents for Docker, Compose, ECS, Nomad; agentless via SSH/WinRM +- **Progressive delivery** — A/B releases, canary deployments, traffic routing +- **Plugin system** — Three-surface plugin model for integrations, steps, and agents +- **Version stickers** — Tamper-evident deployment records on targets for drift detection ## Where to Go Next -- Ready to pull the containers? Head to [quickstart.md](quickstart.md). -- Want the capability detail? Browse the five cards in [key-features.md](key-features.md). -- Need to evaluate fit and build a rollout plan? Grab the [evaluation checklist](onboarding/evaluation-checklist.md). +- Ready to try it? Head to [quickstart.md](quickstart.md) +- Want capability details? Browse [key-features.md](key-features.md) +- Understand the architecture? See [ARCHITECTURE_OVERVIEW.md](ARCHITECTURE_OVERVIEW.md) +- Review the roadmap? 
Check [ROADMAP.md](ROADMAP.md) diff --git a/docs/product/VISION.md b/docs/product/VISION.md index 0da50e8d8..c135b7c0f 100755 --- a/docs/product/VISION.md +++ b/docs/product/VISION.md @@ -1,409 +1,299 @@ -#  3 · Product Vision — **Stella Ops** +# Product Vision — Stella Ops Suite -> Stella Ops isn't just another scanner—it's a different product category: **deterministic, evidence-linked vulnerability decisions** that survive auditors, regulators, and supply-chain propagation. +> Stella Ops Suite isn't just another scanner or deployment tool—it's a different product category: **a centralized, auditable release control plane** that gates releases using reachability-aware security and produces verifiable evidence for every decision. ## 1) Problem Statement & Goals -We ship containers. We need: -- **Authenticity & integrity** of build artifacts and metadata. -- **Provenance** attached to artifacts, not platforms. -- **Transparency** to detect tampering and retroactive edits. -- **Determinism & explainability** so scanner judgments can be replayed and justified. -- **Actionability** to separate theoretical from exploitable risk (VEX). -- **Minimal trust** across multi‑tenant and third‑party boundaries. +We ship containers to non-Kubernetes targets (Docker hosts, Compose, ECS, Nomad). We need: -**Non‑goals:** Building a new package manager, inventing new SBOM/attestation formats, or depending on closed standards. +- **Release governance** across environments (Dev → Stage → Prod) with approvals and audit trails. +- **Security as a gate, not a blocker** — integrate vulnerability decisions into promotion workflows. +- **Digest-based release identity** — immutable releases, not mutable tags. +- **Toolchain flexibility** — plug into any SCM, CI, registry, and secrets system. +- **Determinism & explainability** — release decisions can be replayed and justified. +- **Evidence packets** — every release decision links to concrete artifacts. 
+- **Non-Kubernetes first-class support** — Docker hosts, Compose, ECS, Nomad are not afterthoughts. +- **Pricing that doesn't punish automation** — no per-project, per-seat, or per-deployment taxes. + +**Non-goals:** Replacing CI systems, building Kubernetes deployments (use ArgoCD/Flux), or inventing new SBOM/attestation formats. --- -## 2) Golden Path (Minimal End‑to‑End Flow) +## 2) Golden Path (Release-Centric Flow) + +``` +Build → Scan → Create Release → Request Promotion → Gate Evaluation → Deploy → Evidence +``` ```mermaid flowchart LR - A[Source / Image / Rootfs] --> B[SBOM Producer\nCycloneDX 1.7] - B --> C[Signer\nin‑toto Attestation + DSSE] - C --> D[Transparency\nSigstore Rekor - optional but RECOMMENDED] - D --> E[Durable Storage\nSBOMs, Attestations, Proofs] - E --> F[Scanner\nPkg analyzers + Entry‑trace + Layer cache] - F --> G[VEX Authoring\nOpenVEX + SPDX 3.0.1 relationships] - G --> H[Policy Gate\nOPA/Rego: allow/deny + waivers] - H --> I[Artifacts Store\nReports, SARIF, VEX, Audit log] -```` + A[CI Build] --> B[OCI Registry\nPush by digest] + B --> C[Stella Scan\nSBOM + Vuln Analysis] + C --> D[Create Release\nDigest bundle] + D --> E[Request Promotion\nDev → Stage → Prod] + E --> F[Gate Evaluation\nSecurity + Policy + Approval] + F --> G{Decision} + G -->|Allow| H[Deploy to Targets\nDocker/Compose/ECS/Nomad] + G -->|Deny| I[Block with Explanation] + H --> J[Evidence Packet\nSigned + Stored] +``` -**Adopted standards (pinned for interoperability):** +### What Stella Ops Suite Does -* **SBOM:** CycloneDX **1.7** (JSON/XML; 1.6 accepted for ingest) -* **Attestation & signing:** **in‑toto Attestations** (Statement + Predicate) in **DSSE** envelopes -* **Transparency:** **Sigstore Rekor** (inclusion proofs, monitoring) -* **Exploitability:** **OpenVEX** (statuses & justifications) -* **Modeling & interop:** **SPDX 3.0.1** (relationships / VEX modeling) -* **Findings interchange (optional):** SARIF for analyzer output - -> Pinnings are *policy*, not 
claims about “latest”. We may update pins via normal change control. +| Capability | Description | +|------------|-------------| +| **Release orchestration** | UI-driven promotion (Dev → Stage → Prod), approvals, policy gates, rollbacks; steps are hook-able with scripts | +| **Security decisioning as a gate** | Scan on build, evaluate on release, re-evaluate on CVE updates—without forcing re-scans | +| **OCI-digest-first releases** | A release is an immutable digest (or bundle of digests); track "what is deployed where" | +| **Toolchain-agnostic integrations** | Plug into any SCM, any CI, any registry, any secrets system | +| **Auditability + standards** | Evidence packets (exportable), SBOM/VEX/attestation support, deterministic replay | --- -## 3) Security Invariants (What MUST Always Hold) +## 3) Design Principles & Invariants -1. **Artifact identity is content‑addressed.** +These principles are **inviolable** and MUST be reflected in all code, UI, documentation, and audit artifacts. + +### Principle 1: Release Identity via Digest + +``` +INVARIANT: A release is a set of OCI image digests (component → digest mapping), never tags. +``` + +- Tags are convenience inputs for resolution +- Tags are resolved to digests at release creation time +- All downstream operations (promotion, deployment, rollback) use digests +- Digest mismatch at pull time = deployment failure (tamper detection) + +### Principle 2: Determinism and Evidence + +``` +INVARIANT: Every deployment/promotion produces an immutable evidence record. 
+``` + +Evidence record contains: +- **Who**: User identity (from Authority) +- **What**: Release bundle (digests), target environment, target hosts +- **Why**: Policy evaluation result, approval records, decision reasons +- **How**: Generated artifacts (compose files, scripts), execution logs +- **When**: Timestamps for request, decision, execution, completion + +### Principle 3: Pluggable Everything, Stable Core + +``` +INVARIANT: Integrations are plugins; the core orchestration engine is stable. +``` + +Plugins contribute: +- Configuration screens (UI) +- Connector logic (runtime) +- Step node types (workflow) +- Doctor checks (diagnostics) +- Agent types (deployment) + +Core engine provides: +- Workflow execution (DAG processing) +- State machine management +- Evidence generation +- Policy evaluation +- Credential brokering + +### Principle 4: No Feature Gating + +``` +INVARIANT: All plans include all features. Limits are only: +- Number of environments +- Number of new digests analyzed per day +- Fair use on deployments +``` + +### Principle 5: Offline-First Operation + +``` +INVARIANT: All core operations MUST work in air-gapped environments. +``` + +- No runtime calls to external APIs for core decisions +- Vulnerability data synced via mirror bundles +- Plugins may require connectivity; core does not +- Evidence packets exportable for external audit + +### Principle 6: Immutable Generated Artifacts + +``` +INVARIANT: Every deployment generates and stores immutable artifacts. +``` + +Generated artifacts: +- `compose.stella.lock.yml`: Pinned digests, resolved env refs +- `deploy.stella.script.dll`: Compiled C# script (or hash reference) +- `release.evidence.json`: Decision record +- `stella.version.json`: Version sticker placed on target + +--- + +## 4) Security Invariants (Scanning & Attestation) + +These invariants from our scanning heritage remain core to the security gate: + +1. 
**Artifact identity is content-addressed.** + - All identities are SHA-256 digests of immutable blobs. - * All identities are SHA‑256 digests of immutable blobs (images, SBOMs, attestations). 2. **Every SBOM is signed.** + - SBOMs MUST be wrapped in **in-toto DSSE** attestations tied to the container digest. - * SBOMs MUST be wrapped in **in‑toto DSSE** attestations tied to the container digest. 3. **Provenance is attached, not implied.** + - Build metadata (who/where/how) MUST ride as attestations linked by digest. - * Build metadata (who/where/how) MUST ride as attestations linked by digest. 4. **Transparency FIRST mindset.** + - Signatures/attestations SHOULD be logged to **Rekor** and store inclusion proofs. - * Signatures/attestations SHOULD be logged to **Rekor** and store inclusion proofs. 5. **Determinism & replay.** + - Scans MUST be reproducible given: input digests, scanner version, DB snapshot, and config. - * Scans MUST be reproducible given: input digests, scanner version, DB snapshot, and config. 6. **Explainability.** + - Findings MUST show the *why*: package → file path → call-stack / entrypoint (when available). - * Findings MUST show the *why*: package → file path → call‑stack / entrypoint (when available). 7. **Exploitability over enumeration.** + - Risk MUST be communicated via **VEX** (OpenVEX), including **under_investigation** where appropriate. - * Risk MUST be communicated via **VEX** (OpenVEX), including **under_investigation** where appropriate. -8. **Least privilege & minimal trust.** - - * Build keys are short‑lived; scanners run on ephemeral, least‑privileged workers. -9. **Air‑gap friendly.** - - * Mirrors for vuln DBs and containers; all verification MUST work without public egress. -10. **No hidden blockers.** - -* Policy gates MUST be code‑reviewable (e.g., Rego) and auditable; waivers are attestations, not emails. +8. **Air-gap friendly.** + - Mirrors for vuln DBs and containers; all verification MUST work without public egress. 
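Invariants 1 and 5 above compose: if artifact identity is a SHA-256 digest over canonical bytes, then a replay with identical pinned inputs must reproduce the identical identity. A minimal illustration of that discipline, assuming a sorted-key canonical-JSON convention (the field names here are hypothetical, not the product schema):

```python
import hashlib
import json

def content_address(record: dict) -> str:
    # Canonical form: sorted keys, compact separators -> stable bytes -> SHA-256.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return "sha256:" + hashlib.sha256(payload).hexdigest()

# Hypothetical pinned scan inputs (invariant 5): image digest, scanner version,
# vuln DB snapshot date, and config hash fully determine the run.
pinned = {
    "image_digest": "sha256:" + "ab" * 32,   # placeholder digest
    "scanner_version": "3.2.1",
    "db_snapshot": "2025-10-01",
    "config_hash": "sha256:" + "cd" * 32,    # placeholder digest
}

# A replayed run with the same pinned inputs yields the same identity...
replay = dict(pinned)
assert content_address(replay) == content_address(pinned)

# ...and key ordering never changes the identity, because canonicalization sorts keys.
reordered = {k: pinned[k] for k in sorted(pinned, reverse=True)}
assert content_address(reordered) == content_address(pinned)
```

In practice, a fixed canonicalization (sorted keys, stable separators, explicit encoding) is what makes byte-for-byte comparison of signed payloads across independent runs meaningful.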
--- -## 4) Trust Boundaries & Roles +## 5) Adopted Standards - CI - CI -->|image digest| REG - REG -->|pull by digest| SB - SB --> AT --> TR --> REK - AT --> ST - REK --> ST - ST --> SCN --> POL --> ST - -``` --> - -* **Build/CI:** Holds signing capability (short‑lived keys or keyless signing). -* **Registry:** Source of truth for image bytes; access via digest only. -* **Scanner Pool:** Ephemeral nodes; content‑addressed caches; no shared mutable state. -* **Artifacts Store:** Immutable, WORM‑like storage for SBOMs, attestations, proofs, SARIF, VEX. +| Domain | Standard | Stella Pin | Notes | +|--------|----------|------------|-------| +| **SBOM** | CycloneDX | **1.7** | JSON or XML; 1.6 ingest supported | +| **Attestation** | in-toto | **Statement v1** | Predicates per use case | +| **Envelope** | DSSE | **v1** | Canonical JSON payloads | +| **Transparency** | Sigstore Rekor | **API stable** | Inclusion proof stored alongside artifacts | +| **VEX** | OpenVEX | **spec current** | Map to SPDX 3.0.1 relationships as needed | +| **Interop** | SPDX | **3.0.1** | Use for modeling & cross-ecosystem exchange | +| **Findings** | SARIF | **2.1.0** | Optional but recommended | --- -## 5) Data & Evidence We Persist +## 6) Competitive Positioning -| Artifact | MUST Persist | Why | -| -------------------- | ------------------------------------ | ---------------------------- | -| SBOM (CycloneDX 1.7) | Raw file + DSSE attestation | Reproducibility, audit | -| in‑toto Statement | Full JSON | Traceability | -| Rekor entry | UUID + inclusion proof | Tamper‑evidence | -| Scanner output | SARIF + raw notes | Triage & tooling interop | -| VEX | OpenVEX + links to findings | Noise reduction & compliance | -| Policy decisions | Input set + decision + rule versions | Governance & forensics | +### Why Stella Wins (One Line Each) -Retention follows our Compliance policy; default **≥ 18 months**. 
+- **CI/CD tools** (Actions/Jenkins/GitLab CI): great at running pipelines, weak at being a central release authority with audit-grade evidence. +- **CD orchestrators** (Octopus/Harness/Spinnaker): strong promotions, but security is bolt-on and pricing scales poorly. +- **Registries** (Harbor/JFrog): can store and scan, but don't provide release governance. +- **Scanners/CNAPP** (Trivy/Snyk/Aqua): scan well, but don't provide release orchestration. + +### Core Differentiators (Moats) + +1. **Non-Kubernetes Specialization** — Docker hosts, Compose, ECS, Nomad are first-class, not afterthoughts. + +2. **Signed Reachability** — Every reachability graph is sealed with DSSE; optional edge-bundle attestations for runtime/init/contested paths. + +3. **Deterministic Replay** — Scans and release decisions run bit-for-bit identical from frozen feeds and manifests. + +4. **Evidence-Linked Decisions** — Every gate evaluation produces a signed decision record with evidence refs. + +5. **Sovereign + Offline Operation** — FIPS/eIDAS/GOST/SM/PQC profiles and offline mirrors as first-class toggles. + +6. **Cost Model** — No per-seat, per-project, or per-deployment tax. Limits are environments + new digests/day. --- -## 6) Scanner Requirements (Determinism & Explainability) +## 7) Release Orchestration Architecture (Planned) -* **Inputs pinned:** image digest(s), SBOM(s), scanner version, vuln DB snapshot date, config hash. -* **Explainability:** show file paths, package coords (e.g., purl), and—when possible—**entry‑trace/call‑stack** from executable entrypoints to vulnerable symbol(s). -* **Caching:** content‑addressed per‑layer & per‑ecosystem caches; warming does not change decisions. -* **Unknowns:** output **under_investigation** where exploitability is not yet known; roll into VEX. -* **Interchange:** emit **SARIF** for IDE and pipeline consumption (optional but recommended). 
+### New Themes + +| Theme | Purpose | Key Modules | +|-------|---------|-------------| +| **INTHUB** | Integration hub | Integration Manager, Connection Profiles, Connector Runtime | +| **ENVMGR** | Environment management | Environment Manager, Target Registry, Agent Manager | +| **RELMAN** | Release management | Component Registry, Version Manager, Release Manager | +| **WORKFL** | Workflow engine | Workflow Designer, Workflow Engine, Step Executor | +| **PROMOT** | Promotion and approval | Promotion Manager, Approval Gateway, Decision Engine | +| **DEPLOY** | Deployment execution | Deploy Orchestrator, Target Executor, Artifact Generator | +| **AGENTS** | Deployment agents | Agent Core, Docker/Compose/ECS/Nomad agents | +| **PROGDL** | Progressive delivery | A/B Manager, Traffic Router, Canary Controller | +| **RELEVI** | Release evidence | Evidence Collector, Sticker Writer, Audit Exporter | +| **PLUGIN** | Plugin infrastructure | Plugin Registry, Plugin Loader, Plugin SDK | + +### Key Data Entities + +- **Environment**: Dev/Stage/Prod with freeze windows, approval policies +- **Target**: Deployment destination (Docker host, Compose host, ECS service, Nomad job) +- **Agent**: Deployment executor with capabilities and heartbeat +- **Component**: Logical service mapped to image repository +- **Release**: Bundle of component digests with semantic version +- **Promotion**: Request to move release between environments +- **Workflow**: DAG of steps for deployment execution +- **Evidence Packet**: Signed bundle of decision inputs and outputs --- -## 7) Policy Gate (OPA/Rego) — Examples +## 8) Existing Capabilities (Operational) -> Gate runs after scan + VEX merge. It treats VEX as first‑class input. 
+These themes power the security gate within release orchestration: -### 7.1 Deny unreconciled criticals that are exploitable - -```rego -package stella.policy - -default allow := false - -exploitable(v) { - v.severity == "CRITICAL" - v.exploitability == "affected" -} - -allow { - not exploitable_some -} - -exploitable_some { - some v in input.findings - exploitable(v) - not waived(v.id) -} - -waived(id) { - some w in input.vex - w.vuln_id == id - w.status == "not_affected" - w.justification != "" -} -``` - -### 7.2 Require Rekor inclusion for attestations - -```rego -package stella.policy - -violation[msg] { - some a in input.attestations - not a.rekor.inclusion_proof - msg := sprintf("Attestation %s lacks Rekor inclusion proof", [a.id]) -} -``` +| Theme | Purpose | Key Modules | +|-------|---------|-------------| +| **INGEST** | Advisory ingestion | Concelier, Advisory-AI | +| **VEXOPS** | VEX document handling | Excititor, VEX Lens, VEX Hub | +| **REASON** | Policy and decisioning | Policy Engine, OPA Runtime | +| **SCANENG** | Scanning and SBOM | Scanner, SBOM Service, Reachability | +| **EVIDENCE** | Evidence and attestation | Evidence Locker, Attestor, Export Center | +| **RUNTIME** | Runtime signals | Signals, Graph, Zastava | +| **JOBCTRL** | Job orchestration | Scheduler, Orchestrator, TaskRunner | +| **OBSERVE** | Observability | Notifier, Telemetry | +| **REPLAY** | Deterministic replay | Replay Engine | +| **DEVEXP** | Developer experience | CLI, Web UI, SDK | --- -## 8) Version Pins & Compatibility +## 9) Pricing Model -| Domain | Standard | Stella Pin | Notes | -| ------------ | -------------- | ---------------- | ------------------------------------------------ | -| SBOM | CycloneDX | **1.7** | JSON or XML accepted; 1.6 ingest supported | -| Attestation | in‑toto | **Statement v1** | Predicates per use case (e.g., sbom, provenance) | -| Envelope | DSSE | **v1** | Canonical JSON payloads | -| Transparency | Sigstore Rekor | **API stable** | Inclusion 
proof stored alongside artifacts | -| VEX | OpenVEX | **spec current** | Map to SPDX 3.0.1 relationships as needed | -| Interop | SPDX | **3.0.1** | Use for modeling & cross‑ecosystem exchange | -| Findings | SARIF | **2.1.0** | Optional but recommended | +**Principle:** Pay for scale, not for features or automation. + +| Plan | Price | Environments | New Digests/Day | Notes | +|------|-------|--------------|-----------------|-------| +| **Free** | $0/month | 3 | 333 | Full features, unlimited deployments (fair use) | +| **Pro** | $699/month | 33 | 3,333 | Same features | +| **Enterprise** | $1,999/month | Unlimited | Unlimited | Fair use on mirroring/audit bandwidth | --- -## 9) Minimal CLI Playbook (Illustrative) +## 10) Implementation Roadmap (Planned) -> Commands below are illustrative; wire them into CI with short‑lived credentials. - -```bash -# 1) Produce SBOM (CycloneDX 1.7) from image digest -syft registry:5000/myimg@sha256:... -o cyclonedx-json > sbom.cdx.json - -# 2) Create in‑toto DSSE attestation bound to the image digest -cosign attest --predicate sbom.cdx.json \ - --type https://stella-ops.org/attestations/sbom/1 \ - --key env://COSIGN_KEY \ - registry:5000/myimg@sha256:... - -# 3) (Optional but recommended) Rekor transparency -cosign sign --key env://COSIGN_KEY registry:5000/myimg@sha256:... -cosign verify-attestation --type ... --certificate-oidc-issuer https://token.actions... registry:5000/myimg@sha256:... > rekor-proof.json - -# 4) Scan (pinned DB snapshot) -stella-scan --image registry:5000/myimg@sha256:... 
\ - --sbom sbom.cdx.json \ - --db-snapshot 2025-10-01 \ - --out findings.sarif - -# 5) Emit VEX -stella-vex --from findings.sarif --policy vex-policy.yaml --out vex.json - -# 6) Gate -opa eval -i gate-input.json -d policy/ -f pretty "data.stella.policy.allow" -``` +| Phase | Focus | Key Deliverables | +|-------|-------|------------------| +| **Phase 1** | Foundation | Environment management, integration hub, release bundles | +| **Phase 2** | Workflow Engine | DAG execution, step registry, workflow templates | +| **Phase 3** | Promotion & Decision | Approval gateway, security gates, decision records | +| **Phase 4** | Deployment Execution | Docker/Compose agents, artifact generation, rollback | +| **Phase 5** | UI & Polish | Release dashboard, promotion UI, environment management | +| **Phase 6** | Progressive Delivery | A/B releases, canary, traffic routing | +| **Phase 7** | Extended Targets | ECS, Nomad, SSH/WinRM agentless | +| **Phase 8** | Plugin Ecosystem | Full plugin system, marketplace | --- -## 10) JSON Skeletons (Copy‑Ready) +## 11) Change Log -### 10.1 in‑toto Statement (DSSE payload) - -```json -{ - "_type": "https://in-toto.io/Statement/v1", - "subject": [ - { - "name": "registry:5000/myimg", - "digest": { "sha256": "IMAGE_DIGEST_SHA256" } - } - ], - "predicateType": "https://stella-ops.org/attestations/sbom/1", - "predicate": { - "sbomFormat": "CycloneDX", - "sbomVersion": "1.7", - "mediaType": "application/vnd.cyclonedx+json", - "location": "sha256:SBOM_BLOB_SHA256" - } -} -``` - -### 10.2 DSSE Envelope (wrapping the Statement) - -```json -{ - "payloadType": "application/vnd.in-toto+json", - "payload": "BASE64URL_OF_CANONICAL_STATEMENT_JSON", - "signatures": [ - { - "keyid": "KEY_ID_OR_CERT_ID", - "sig": "BASE64URL_SIGNATURE" - } - ] -} -``` - -### 10.3 OpenVEX (compact) - -```json -{ - "@context": "https://openvex.dev/ns/v0.2.0", - "author": "Stella Ops Security", - "timestamp": "2025-10-29T00:00:00Z", - "statements": [ - { - "vulnerability": 
"CVE-2025-0001", - "products": ["pkg:purl/example@1.2.3?arch=amd64"], - "status": "under_investigation", - "justification": "analysis_ongoing", - "timestamp": "2025-10-29T00:00:00Z" - } - ] -} -``` +| Version | Date | Note | +|---------|------|------| +| v2.0 | 09-Jan-2026 | Major revision: pivot to release control plane; scanning becomes gate | +| v1.4 | 29-Oct-2025 | Initial principles, golden path, policy examples, JSON skeletons | +| v1.3 | 12-Jul-2025 | Expanded ecosystem pillar, added metrics/integrations | +| v1.2 | 11-Jul-2025 | Restructured to link with WHY; merged principles | +| v1.1 | 11-Jul-2025 | Original OSS-only vision | +| v1.0 | 09-Jul-2025 | First public draft | --- -## 11) Handling “Unknowns” & Noise +## References -* Use **OpenVEX** statuses: `affected`, `not_affected`, `fixed`, `under_investigation`. -* Prefer **justifications** over free‑text. -* Time‑bound **waivers** are modeled as VEX with `not_affected` + justification or `affected` + compensating controls. -* Dashboards MUST surface counts separately for `under_investigation` so risk is visible. - ---- - -## 12) Operational Guidance - -**Key management** - -* Use **ephemeral OIDC** or short‑lived keys (HSM/KMS bound). -* Rotate signer identities at least quarterly; no shared long‑term keys in CI. - -**Caching & performance** - -* Layer caches keyed by digest + analyzer version. -* Pre‑warm vuln DB snapshots; mirror into air‑gapped envs. - -**Multi‑tenancy** - -* Strict tenant isolation for storage and compute. -* Rate‑limit and bound memory/CPU per scan job. - -**Auditing** - -* Every decision is a record: inputs, versions, rule commit, actor, result. -* Preserve Rekor inclusion proofs with the attestation record. - ---- - -## 13) Exceptions Process (Break‑glass) - -1. Open a tracked exception with: artifact digest, CVE(s), business justification, expiry. -2. Generate VEX entry reflecting the exception (`not_affected` with justification or `affected` with compensating controls). -3. 
Merge into policy inputs; **policy MUST read VEX**, not tickets. -4. Re‑review before expiry; exceptions cannot auto‑renew. - ---- - -## 14) Threat Model (Abbreviated) - -* **Tampering**: modified SBOMs/attestations → mitigated by DSSE + Rekor + WORM storage. -* **Confused deputy**: scanning a different image → mitigated by digest‑only pulls and subject digests in attestations. -* **TOCTOU / re‑tagging**: registry tags drift → mitigated by digest pinning everywhere. -* **Scanner poisoning**: unpinned DBs → mitigated by snapshotting and recording version/date. -* **Key compromise**: long‑lived CI keys → mitigated by OIDC keyless or short‑lived KMS keys. - ---- - -## 15) Implementation Checklist - -* [ ] SBOM producer emits CycloneDX 1.7; bound to image digest. -* [ ] in‑toto+DSSE signing wired in CI; Rekor logging enabled. -* [ ] Durable artifact store with WORM semantics. -* [ ] Scanner produces explainable findings; SARIF optional. -* [ ] OpenVEX emitted and archived; linked to findings & image. -* [ ] Policy gate enforced; waivers modeled as VEX; decisions logged. -* [ ] Air‑gap mirrors for registry and vuln DBs. -* [ ] Runbooks for key rotation, Rekor outage, and database rollback. - ---- - -## 16) Glossary - -* **SBOM**: Software Bill of Materials describing packages/components within an artifact. -* **Attestation**: Signed statement binding facts (predicate) to a subject (artifact) using in‑toto. -* **DSSE**: Envelope that signs the canonical payload detached from transport. -* **Transparency Log**: Append‑only log (e.g., Rekor) giving inclusion and temporal proofs. -* **VEX**: Vulnerability Exploitability eXchange expressing exploitability status & justification. - ---- - -## 9) Moats - - -**Four capabilities no competitor offers together:** - -1. **Signed Reachability** – Every reachability graph is sealed with DSSE; optional edge-bundle attestations for runtime/init/contested paths. -2. 
**Deterministic Replay** – Scans run bit-for-bit identical from frozen feeds and analyzer manifests. -3. **Explainable Policy (Lattice VEX)** – Evidence-linked VEX decisions with explicit "Unknown" state handling. -4. **Sovereign + Offline Operation** – FIPS/eIDAS/GOST/SM/PQC profiles and offline mirrors as first-class toggles. - -**Decision Capsules:** Every scan result is sealed in a Decision Capsule—a content-addressed bundle containing exact SBOM, vuln feed snapshots, reachability evidence, policy version, derived VEX, and signatures. Auditors can re-run any capsule bit-for-bit to verify the outcome. - -**Additional moat details:** -- **Deterministic replay:** Hash-stable scans with frozen feeds and analyzer manifests; replay packs verifiable offline. -- **Hybrid reachability attestations:** Graph-level DSSE always; selective edge-bundle DSSE for runtime/init/contested edges with Rekor caps. Both static call-graph edges and runtime-derived edges can be attested. -- **Lattice VEX engine (Evidence-Linked):** Trust algebra across advisories, runtime, reachability, waivers; explainable paths with proof-linked decisions. Unlike yes/no approaches, explicit "Unknown" state handling ensures incomplete data never leads to false safety. -- **Crypto sovereignty:** FIPS/eIDAS/GOST/SM/PQC profiles and offline mirrors as first-class configuration. -- **Proof graph:** DSSE + Rekor spanning SBOM, call-graph, VEX, Decision Capsules, replay manifests for chain-of-custody evidence. -- **VEX Propagation:** Generate vulnerability status attestations downstream consumers can automatically trust and ingest—scalable VEX sharing across the supply chain. - -See also: `docs/product/competitive-landscape.md` for vendor comparison and talking points. 
- ---- - - -## 8 · Change Log - -| Version | Date | Note (high‑level) | -| ------- | ----------- | ----------------------------------------------------------------------------------------------------- | -| v1.4 | 29-Oct-2025 | Initial principles, golden path, policy examples, and JSON skeletons. | -| v1.4 | 14‑Jul‑2025 | First public revision reflecting quarterly roadmap & KPI baseline. | -| v1.3 | 12‑Jul‑2025 | Expanded ecosystem pillar, added metrics/integrations, refined non-goals, community persona/feedback. | -| v1.2 | 11‑Jul‑2025 | Restructured to link with WHY; merged principles into Strategic Pillars; added review §7 | -| v1.1 | 11‑Jul‑2025 | Original OSS‑only vision | -| v1.0 | 09‑Jul‑2025 | First public draft | - -*(End of Product Vision v1.3)* +- [Overview](../overview.md) — 2-minute product summary +- [Architecture Overview](../ARCHITECTURE_OVERVIEW.md) — High-level architecture +- [Release Orchestrator Architecture](../modules/release-orchestrator/architecture.md) — Detailed orchestrator design +- [Competitive Landscape](competitive-landscape.md) — Vendor comparison +- [Roadmap](../ROADMAP.md) — Implementation priorities diff --git a/docs/product/competitive-landscape.md b/docs/product/competitive-landscape.md index b1ea5a754..1b68b54a2 100644 --- a/docs/product/competitive-landscape.md +++ b/docs/product/competitive-landscape.md @@ -1,8 +1,50 @@ # Competitive Landscape -> **TL;DR:** Stella Ops isn't a scanner that outputs findings. It's a platform that outputs **attestable decisions that can be replayed**. That difference survives auditors, regulators, and supply-chain propagation. +> **TL;DR:** Stella Ops Suite isn't a scanner or a deployment tool—it's a **release control plane** that gates releases using reachability-aware security and produces **attestable decisions that can be replayed**. Non-Kubernetes container estates finally get a central release authority. -Source: internal advisory "23-Nov-2025 - Stella Ops vs Competitors", updated Jan 2026. 
This summary distils a 15-vendor comparison into actionable positioning notes for sales/PMM and engineering prioritization. +Source: internal advisories "23-Nov-2025 - Stella Ops vs Competitors" and "09-Jan-2026 - Stella Ops Pivot", updated Jan 2026. This summary covers both release orchestration and security positioning. + +--- + +## The New Category: Release Control Plane + +**Stella Ops Suite** occupies a unique position by combining: +- Release orchestration (promotions, approvals, workflows) +- Security decisioning as a gate (not a blocker) +- Non-Kubernetes target specialization +- Evidence-linked decisions with deterministic replay + +### Why Competitors Can't Easily Catch Up (Release Orchestration) + +| Category | Representatives | What They Optimized For | Why They Can't Easily Catch Up | +|----------|----------------|------------------------|-------------------------------| +| **CI/CD Tools** | GitHub Actions, Jenkins, GitLab CI | Running pipelines, build automation | No central release authority; no audit-grade evidence; deployment is afterthought | +| **CD Orchestrators** | Octopus, Harness, Spinnaker | Deployment automation, Kubernetes | Security is bolt-on; non-K8s is second-class; pricing punishes automation | +| **Registries** | Harbor, JFrog Artifactory | Artifact storage, scanning | No release governance; no promotion workflows; no deployment execution | +| **Scanners/CNAPP** | Trivy, Snyk, Aqua | Vulnerability detection | No release orchestration; findings don't integrate with promotion gates | + +### Stella Ops Suite Positioning + +| vs. Category | Why Stella Wins | +|--------------|-----------------| +| **vs. CI/CD tools** | They run pipelines; we provide central release authority with audit-grade evidence | +| **vs. CD orchestrators** | They bolt on security; we integrate it as gates. They punish automation with per-project pricing; we don't | +| **vs. 
Registries** | They store and scan; we govern releases and orchestrate deployments | +| **vs. Scanners** | They output findings; we output release decisions with evidence packets | + +### Unique Differentiators (Release Orchestration) + +| Differentiator | What It Means | +|----------------|---------------| +| **Non-Kubernetes Specialization** | Docker hosts, Compose, ECS, Nomad are first-class—not afterthoughts | +| **Digest-First Release Identity** | Releases are immutable OCI digests, not mutable tags | +| **Security Gates in Promotion** | Scan on build, evaluate on release, re-evaluate on CVE updates | +| **Evidence Packets** | Every release decision is cryptographically signed and replayable | +| **Cost Model** | No per-seat, per-project, per-deployment tax. Environments + new digests/day | + +--- + +## Security Positioning (Original Analysis) ---