diff --git a/AGENTS.md b/AGENTS.md index be402f746..d054d5ed1 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -15,27 +15,20 @@ Unless explicitly told otherwise, assume you are working inside the StellaOps mo --- -### 1) What is StellaOps? +## Project Overview -**StellaOps** is a next-generation, sovereign container-security toolkit built for high-speed, offline operation and released under AGPL-3.0-or-later. +**Stella Ops Suite** is a self-hostable, sovereign release control plane for non-Kubernetes container estates, released under AGPL-3.0-or-later. It orchestrates environment promotions (Dev → Stage → Prod), gates releases using reachability-aware security and policy, and produces verifiable evidence for every release decision. -StellaOps is a self-hostable, sovereign container-security platform that makes proof—not promises—default. It binds every container digest to content-addressed SBOMs (SPDX 3.0.1 and CycloneDX 1.6), in-toto/DSSE attestations, and optional Sigstore Rekor transparency, then layers deterministic, replayable scanning with entry-trace and VEX-first decisioning. +The platform combines: +- **Release orchestration** — UI-driven promotion, approvals, policy gates, rollbacks; hook-able with scripts +- **Security decisioning as a gate** — Scan on build, evaluate on release, re-evaluate on CVE updates +- **OCI-digest-first releases** — Immutable digest-based release identity with "what is deployed where" tracking +- **Toolchain-agnostic integrations** — Plug into any SCM, CI, registry, and secrets system +- **Auditability + standards** — Evidence packets, SBOM/VEX/attestation support, deterministic replay -“Next-gen” means: +Existing capabilities (operational): Reproducible vulnerability scanning with VEX-first decisioning, SBOM generation (SPDX 2.2/2.3 and CycloneDX 1.7; SPDX 3.0.1 planned), in-toto/DSSE attestations, and optional Sigstore Rekor transparency. 
The platform is designed for offline/air-gapped operation with regional crypto support (eIDAS/FIPS/GOST/SM). -* Findings are reproducible and explainable. -* Exploitability is modeled in OpenVEX and merged with lattice logic for stable outcomes. -* The same workflow runs online or fully air-gapped. - -“Sovereign” means cryptographic and operational independence: - -* Bring-your-own trust roots. -* Regional crypto readiness (eIDAS/FIPS/GOST/SM). -* Offline bundles and post-quantum-ready modes. - -Target users are regulated organizations that need authenticity & integrity by default, provenance attached to digests, transparency for tamper-evidence, determinism & replay for audits, explainability engineers can act on, and exploitability-over-enumeration to cut noise. We minimize trust and blast radius with short-lived keys, least-privilege, and content-addressed caches; we stay air-gap friendly with mirrored feeds; and we keep governance honest with reviewable OPA/Rego policy gates and VEX-based waivers. - -More documentation is in `./docs/*.md`. Start with `docs/README.md` to discover available documentation. When needed, you may request specific documents to be provided (e.g., `docs/modules/scanner/architecture.md`). +Planned capabilities (release orchestration): Environment management, release bundles, promotion workflows, deployment execution (Docker/Compose/ECS/Nomad agents), progressive delivery (A/B, canary), and a three-surface plugin system. See `docs/modules/release-orchestrator/README.md` for the full specification. --- diff --git a/CLAUDE.md b/CLAUDE.md index b59dd10e1..f83bfb416 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -4,7 +4,18 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co ## Project Overview -StellaOps is a self-hostable, sovereign container-security platform released under AGPL-3.0-or-later. 
It provides reproducible vulnerability scanning with VEX-first decisioning, SBOM generation (SPDX 2.2/2.3 and CycloneDX 1.7; SPDX 3.0.1 planned), in-toto/DSSE attestations, and optional Sigstore Rekor transparency. The platform is designed for offline/air-gapped operation with regional crypto support (eIDAS/FIPS/GOST/SM). +**Stella Ops Suite** is a self-hostable, sovereign release control plane for non-Kubernetes container estates, released under AGPL-3.0-or-later. It orchestrates environment promotions (Dev → Stage → Prod), gates releases using reachability-aware security and policy, and produces verifiable evidence for every release decision. + +The platform combines: +- **Release orchestration** — UI-driven promotion, approvals, policy gates, rollbacks; hook-able with scripts +- **Security decisioning as a gate** — Scan on build, evaluate on release, re-evaluate on CVE updates +- **OCI-digest-first releases** — Immutable digest-based release identity with "what is deployed where" tracking +- **Toolchain-agnostic integrations** — Plug into any SCM, CI, registry, and secrets system +- **Auditability + standards** — Evidence packets, SBOM/VEX/attestation support, deterministic replay + +Existing capabilities (operational): Reproducible vulnerability scanning with VEX-first decisioning, SBOM generation (SPDX 2.2/2.3 and CycloneDX 1.7; SPDX 3.0.1 planned), in-toto/DSSE attestations, and optional Sigstore Rekor transparency. The platform is designed for offline/air-gapped operation with regional crypto support (eIDAS/FIPS/GOST/SM). + +Planned capabilities (release orchestration): Environment management, release bundles, promotion workflows, deployment execution (Docker/Compose/ECS/Nomad agents), progressive delivery (A/B, canary), and a three-surface plugin system. See `docs/modules/release-orchestrator/README.md` for the full specification. 
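The "OCI-digest-first" principle above — a release pins immutable digests, never mutable tags — can be sketched in a few lines. This is an illustrative Python sketch under assumed names (`ReleaseBundle`, `make_release` are hypothetical), not StellaOps code:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ReleaseBundle:
    """Hypothetical sketch: a release is an immutable set of (component, digest) pairs."""
    version: str
    components: tuple[tuple[str, str], ...]  # sorted ((name, oci_digest), ...)

def make_release(version: str, resolved: dict[str, str]) -> ReleaseBundle:
    # Reject anything that is not an immutable digest reference:
    # mutable tags must be resolved to digests before a release is cut.
    for name, ref in resolved.items():
        if not ref.startswith("sha256:"):
            raise ValueError(f"{name}: expected an OCI digest, got mutable ref {ref!r}")
    # Sort by component name so release identity is stable regardless of input order.
    return ReleaseBundle(version, tuple(sorted(resolved.items())))

bundle = make_release("1.4.0", {"worker": "sha256:9b1c", "api": "sha256:3f2a"})
print(bundle.components[0][0])  # "api" (sorted, deterministic)
```

Freezing the dataclass and sorting the component pairs gives the same release identity for the same inputs, which is what makes "what is deployed where" tracking auditable.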
## Build Commands diff --git a/docs/ARCHITECTURE_OVERVIEW.md b/docs/ARCHITECTURE_OVERVIEW.md index fcdfb8d52..2e456ad7f 100755 --- a/docs/ARCHITECTURE_OVERVIEW.md +++ b/docs/ARCHITECTURE_OVERVIEW.md @@ -1,41 +1,84 @@ # Architecture Overview (High-Level) -This document is the 10-minute tour for StellaOps: what components exist, how they fit together, and what "offline-first + deterministic + evidence-linked decisions" means in practice. +This document is the 10-minute tour for Stella Ops Suite: what components exist, how they fit together, and what "release control plane + security gates + evidence-linked decisions" means in practice. For the full reference map (services, boundaries, detailed flows), see `docs/ARCHITECTURE_REFERENCE.md`. +## What Stella Ops Suite Is + +**Stella Ops Suite is a centralized, auditable release control plane for non-Kubernetes container estates.** + +It sits between your CI and your runtime targets, governs promotion across environments, enforces security and policy gates, and produces verifiable evidence for every release decision. + +``` +CI Build → Registry → Stella (Scan + Release + Promote + Gate + Deploy) → Targets → Evidence +``` + ## Guiding Principles -- **SBOM-first:** scan and reason over SBOMs; fall back to unpacking only when needed. +- **Digest-first releases:** a release is an immutable set of OCI digests, never mutable tags. - **Deterministic replay:** the same inputs yield the same outputs (stable ordering, canonical hashing, UTC timestamps). -- **Evidence-linked decisions:** policy decisions link back to specific evidence artifacts (SBOM slices, advisory/VEX observations, reachability proofs, attestations). -- **Aggregation-not-merge:** upstream advisories and VEX are stored and exposed with provenance; conflicts are visible, not silently collapsed. -- **Offline-first:** the same workflow runs connected or air-gapped via Offline Kit snapshots and signed bundles. 
+- **Evidence-linked decisions:** every release decision links to concrete evidence artifacts (scan verdicts, approvals, policy evaluations). +- **Pluggable everything:** integrations are plugins; the core orchestration engine is stable. +- **Offline-first:** all core operations work in air-gapped environments. +- **No feature gating:** all plans include all features; limits are environments + new digests/day. -## System Map (What Runs) +## System Map + +### Release-Centric Flow ``` -Build -> Sign -> Store -> Scan -> Decide -> Attest -> Notify/Export +Build → Scan → Create Release → Request Promotion → Gate Evaluation → Deploy → Evidence + ↑ ↓ + └── Re-evaluate on CVE Updates ┘ ``` -At a high level, StellaOps is a set of services grouped by responsibility: +### Platform Themes -- **Identity and authorization:** Authority (OIDC/OAuth2, scopes/tenancy) -- **Scanning and SBOM:** Scanner WebService + Worker (facts generation) -- **Advisories:** Concelier (ingest/normalize/export vulnerability sources) -- **VEX:** Excititor + VEX Lens (VEX observations/linksets and exploration) -- **Decisioning:** Policy Engine surfaces (lattice-style explainable policy) -- **Signing and transparency:** Signer + Attestor (DSSE/in-toto and optional transparency) -- **Orchestration and delivery:** Scheduler, Notify, Export Center -- **Console:** Web UI for operators and auditors +Stella Ops Suite organizes capabilities into **themes** (functional areas): -| Tier | Services | Key responsibilities | +#### Existing Themes (Operational) + +| Theme | Purpose | Key Modules | +|-------|---------|-------------| +| **INGEST** | Advisory ingestion | Concelier, Advisory-AI | +| **VEXOPS** | VEX document handling | Excititor, VEX Lens, VEX Hub | +| **REASON** | Policy and decisioning | Policy Engine, OPA Runtime | +| **SCANENG** | Scanning and SBOM | Scanner, SBOM Service, Reachability | +| **EVIDENCE** | Evidence and attestation | Evidence Locker, Attestor, Export Center | +| **RUNTIME** | Runtime 
signals | Signals, Graph, Zastava | +| **JOBCTRL** | Job orchestration | Scheduler, Orchestrator, TaskRunner | +| **OBSERVE** | Observability | Notifier, Telemetry | +| **REPLAY** | Deterministic replay | Replay Engine | +| **DEVEXP** | Developer experience | CLI, Web UI, SDK | + +#### Planned Themes (Release Orchestration) + +| Theme | Purpose | Key Modules | +|-------|---------|-------------| +| **INTHUB** | Integration hub | Integration Manager, Connection Profiles, Connector Runtime, Doctor Checks | +| **ENVMGR** | Environment management | Environment Manager, Target Registry, Agent Manager, Inventory Sync | +| **RELMAN** | Release management | Component Registry, Version Manager, Release Manager, Release Catalog | +| **WORKFL** | Workflow engine | Workflow Designer, Workflow Engine, Step Executor, Step Registry | +| **PROMOT** | Promotion and approval | Promotion Manager, Approval Gateway, Decision Engine, Gate Registry | +| **DEPLOY** | Deployment execution | Deploy Orchestrator, Target Executor, Runner Executor, Artifact Generator, Rollback Manager | +| **AGENTS** | Deployment agents | Agent Core, Agent Docker, Agent Compose, Agent SSH, Agent WinRM, Agent ECS, Agent Nomad | +| **PROGDL** | Progressive delivery | A/B Manager, Traffic Router, Canary Controller, Rollout Strategy | +| **RELEVI** | Release evidence | Evidence Collector, Evidence Signer, Sticker Writer, Audit Exporter | +| **PLUGIN** | Plugin infrastructure | Plugin Registry, Plugin Loader, Plugin Sandbox, Plugin SDK | + +### Service Tiers + +| Tier | Services | Key Responsibilities | |------|----------|----------------------| -| **Edge / Identity** | `StellaOps.Authority` | Issues short-lived tokens (DPoP + mTLS), exposes OIDC device-code + auth-code flows, rotates JWKS. | -| **Scan & attest** | `StellaOps.Scanner` (API + Worker), `StellaOps.Signer`, `StellaOps.Attestor` | Accept SBOMs/images, drive analyzers, produce DSSE bundles, optionally log to a Rekor mirror. 
| -| **Evidence graph** | `StellaOps.Concelier`, `StellaOps.Excititor`, `StellaOps.Policy.Engine` | Ingest advisories/VEX, correlate linksets, run lattice policy and VEX-first decisioning. | -| **Experience** | `StellaOps.Web` (Console), `StellaOps.Cli`, `StellaOps.Notify`, `StellaOps.ExportCenter` | Operator UX, automation, notifications, and offline/mirror packaging. | -| **Data plane** | PostgreSQL, Valkey, RustFS/object storage (optional NATS JetStream) | Canonical store, counters/queues, and artifact storage with deterministic layouts. | +| **Edge / Identity** | `StellaOps.Authority` | Issues short-lived tokens (DPoP + mTLS), exposes OIDC flows, rotates JWKS | +| **Release Control** | `StellaOps.ReleaseManager`, `StellaOps.PromotionManager`, `StellaOps.WorkflowEngine` | Release bundles, promotion workflows, gate evaluation (planned) | +| **Integration Hub** | `StellaOps.IntegrationManager`, `StellaOps.ConnectorRuntime` | SCM/CI/Registry/Vault connectors (planned) | +| **Scan & Attest** | `StellaOps.Scanner`, `StellaOps.Signer`, `StellaOps.Attestor` | Accept SBOMs/images, produce DSSE bundles, transparency logging | +| **Evidence Graph** | `StellaOps.Concelier`, `StellaOps.Excititor`, `StellaOps.Policy.Engine` | Advisories/VEX, linksets, lattice policy | +| **Deployment** | `StellaOps.DeployOrchestrator`, `StellaOps.Agent.*` | Deployment execution to Docker/Compose/ECS/Nomad (planned) | +| **Experience** | `StellaOps.Web`, `StellaOps.Cli`, `StellaOps.Notify`, `StellaOps.ExportCenter` | Operator UX, automation, notifications | +| **Data Plane** | PostgreSQL, Valkey, RustFS/object storage | Canonical store, queues, artifact storage | ## Infrastructure (What Is Required) @@ -50,7 +93,9 @@ At a high level, StellaOps is a set of services grouped by responsibility: - **NATS JetStream:** optional messaging transport in some deployments. - **Transparency log services:** Rekor mirror (and CA services) when transparency is enabled. 
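The guiding principle "deterministic replay: the same inputs yield the same outputs (stable ordering, canonical hashing, UTC timestamps)" can be illustrated with a short sketch. The helper name `canonical_hash` is hypothetical, not the platform's actual implementation:

```python
import hashlib
import json

def canonical_hash(record: dict) -> str:
    """Canonical JSON (sorted keys, no extra whitespace) hashed with SHA-256."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Key order must not affect identity: both spellings hash identically,
# so two replays of the same evidence produce the same content address.
a = canonical_hash({"image": "sha256:3f2a", "verdict": "pass"})
b = canonical_hash({"verdict": "pass", "image": "sha256:3f2a"})
assert a == b
```

The same canonicalization idea underlies the data plane's "content-addressed storage with deterministic layouts": the address is a function of the content alone, never of insertion order or wall-clock time.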
-## End-to-End Flow (Typical) +## End-to-End Flows + +### Current: Vulnerability Scanning Flow 1. **Evidence enters** via Concelier and Excititor connectors (Aggregation-Only Contract). 2. **SBOM arrives** from CLI/CI; Scanner deduplicates layers and enqueues work. @@ -59,22 +104,64 @@ At a high level, StellaOps is a set of services grouped by responsibility: 5. **Signer + Attestor** wrap outputs into DSSE bundles and (optionally) anchor them in a Rekor mirror. 6. **Console/CLI/Export** surface findings and package verifiable evidence; Notify emits digests/incidents. -## Extension Points (Where You Customize) +### Planned: Release Orchestration Flow + +1. **CI pushes image** to registry by digest; triggers webhook to Stella. +2. **Stella scans** the new digest and stores the verdict. +3. **Release created** bundling component digests with semantic version. +4. **Promotion requested** to move release from Dev → Stage → Prod. +5. **Gate evaluation** checks: security verdict, approval count, freeze windows, custom policies. +6. **Decision record** produced with evidence refs and signed. +7. **Deployment executed** via agent to target (Docker/Compose/ECS/Nomad). +8. **Version sticker** written to target for drift detection. +9. **Evidence packet** sealed and stored. + +## Extension Points + +### Current Extension Points - **Scanner analyzers** (restart-time plug-ins) for ecosystem-specific parsing and facts extraction. - **Concelier connectors** for new advisory sources (preserving aggregation-only guardrails). - **Policy packs** for organization-specific gating and waivers/justifications. - **Export profiles** for output formats and offline bundle shapes. +### Planned Extension Points (Three-Surface Plugin Model) + +Plugins contribute through three surfaces: + +1. **Manifest** (static declaration): What the plugin provides (integrations, steps, agents, gates) +2. **Connector Runtime** (dynamic execution): gRPC interface for runtime operations +3. 
**Step Provider** (execution contract): Execution characteristics for workflow steps + +Plugin types: +- **Integration connectors:** SCM, CI, Registry, Vault, Target, Router +- **Step providers:** Custom workflow steps +- **Agent types:** New deployment target types +- **Gate providers:** Custom gate evaluations + ## Offline & Sovereign Notes - Offline Kit carries vulnerability feeds, container images, signatures, and verification material so the workflow stays identical when air-gapped. - Authority + token verification remain local; quota enforcement is verifiable offline. - Attestor can cache transparency proofs for offline verification. +- Evidence packets are exportable for external audit in air-gapped environments. +- All release decisions can be replayed with frozen inputs. + +## Key Architectural Decisions + +| Decision | Rationale | +|----------|-----------| +| **Digest-first release identity** | Tags are mutable; digests provide immutable release identity for audit | +| **3-surface plugin model** | Enables extensibility without core code changes | +| **Compiled C# scripts + sandboxed bash** | C# for complex orchestration; bash for simple hooks | +| **Agent + agentless execution** | Agent-based preferred for reliability; agentless for adoption | +| **Evidence packets for every decision** | Enables deterministic replay and audit-grade compliance | ## References -- `docs/ARCHITECTURE_REFERENCE.md` -- `docs/OFFLINE_KIT.md` -- `docs/API_CLI_REFERENCE.md` -- `docs/modules/platform/architecture-overview.md` +- `docs/ARCHITECTURE_REFERENCE.md` — Full reference map +- `docs/modules/release-orchestrator/architecture.md` — Release orchestrator design (planned) +- `docs/OFFLINE_KIT.md` — Air-gap operations +- `docs/API_CLI_REFERENCE.md` — API and CLI contracts +- `docs/modules/platform/architecture-overview.md` — Platform service design +- `docs/product/advisories/09-Jan-2026 - Stella Ops Orchestrator Architecture.md` — Full orchestrator specification diff --git 
a/docs/FEATURE_MATRIX.md b/docs/FEATURE_MATRIX.md index 9f5d80a18..91e0b7c76 100755 --- a/docs/FEATURE_MATRIX.md +++ b/docs/FEATURE_MATRIX.md @@ -1,30 +1,44 @@ -# 4 · Feature Matrix — **Stella Ops** -*(rev 4.0 · 24 Dec 2025)* +# Feature Matrix — Stella Ops Suite +*(rev 5.0 · 09 Jan 2026)* > **Looking for a quick read?** Check [`key-features.md`](key-features.md) for the short capability cards; this matrix keeps full tier-by-tier detail. --- -## Pricing Tiers Overview +## Product Evolution -| Tier | Scans/Day | Registration | Token Refresh | Target User | Price | -|------|-----------|--------------|---------------|-------------|-------| -| **Free** | 33 | None | 12h auto | Individual developer | $0 | -| **Community** | 333 | Required | 30d manual | Startups, small teams (<25) | $0 | -| **Enterprise** | 2,000+ | SSO/Contract | Annual | Organizations (25+), regulated | Contact Sales | +**Stella Ops Suite** is now a centralized, auditable release control plane for non-Kubernetes container estates. The platform combines release orchestration with security decisioning as a gate. -**Key Differences:** -- **Free → Community**: 10× quota, deep analysis, Helm/K8s, email alerts, requires registration -- **Community → Enterprise**: Scale (HA), multi-team (RBAC scopes), automation (CI/CD), support (SLA) +- **Release orchestration** — UI-driven promotion (Dev → Stage → Prod), approvals, policy gates, rollbacks +- **Security decisioning as a gate** — Scan on build, evaluate on release, re-evaluate on CVE updates +- **OCI-digest-first releases** — Immutable digest-based release identity +- **Evidence packets** — Every release decision is cryptographically signed and stored + +--- + +## Pricing Model + +**Principle:** Pay for scale, not for features or automation. No per-seat, per-project, or per-deployment taxes. 
+ +| Plan | Price | Environments | New Digests/Day | Deployments | Notes | +|------|-------|--------------|-----------------|-------------|-------| +| **Free** | $0/month | 3 | 333 | Unlimited (fair use) | Full features | +| **Pro** | $699/month | 33 | 3,333 | Unlimited (fair use) | Same features | +| **Enterprise** | $1,999/month | Unlimited | Unlimited | Unlimited | Fair use on mirroring/audit bandwidth | + +**Key Principles:** +- All plans include all features (no feature gating) +- Limits are environments + new digests analyzed per day +- Unlimited deployments with fair use policy --- ## Competitive Moat Features -*These differentiators are available across all tiers to build brand and adoption.* +*These differentiators are available across all plans.* -| Capability | Free | Community | Enterprise | Notes | -|------------|:----:|:---------:|:----------:|-------| +| Capability | Free | Pro | Enterprise | Notes | +|------------|:----:|:---:|:----------:|-------| | Signed Replayable Risk Verdicts | ✅ | ✅ | ✅ | Core differentiator | | Decision Capsules | ✅ | ✅ | ✅ | Audit-grade evidence bundles | | VEX Decisioning Engine | ✅ | ✅ | ✅ | Trust lattice + conflict resolution | @@ -32,6 +46,79 @@ | Smart-Diff (Semantic Risk Delta) | ✅ | ✅ | ✅ | Material change detection | | Unknowns as First-Class State | ✅ | ✅ | ✅ | Uncertainty budgets | | Deterministic Replay | ✅ | ✅ | ✅ | `stella replay srm.yaml` | +| Non-Kubernetes First-Class | ✅ | ✅ | ✅ | Docker/Compose/ECS/Nomad targets | +| Digest-First Release Identity | ✅ | ✅ | ✅ | Immutable releases | + +--- + +## Release Orchestration (Planned) + +*Release orchestration capabilities are planned for implementation. 
All plans will include all features.* + +| Capability | Free | Pro | Enterprise | Notes | +|------------|:----:|:---:|:----------:|-------| +| **Environment Management** | | | | | +| Environment CRUD | ⏳ | ⏳ | ⏳ | Dev/Stage/Prod definitions | +| Freeze Windows | ⏳ | ⏳ | ⏳ | Calendar-based blocking | +| Approval Policies | ⏳ | ⏳ | ⏳ | Per-environment rules | +| **Release Management** | | | | | +| Component Registry | ⏳ | ⏳ | ⏳ | Service → repository mapping | +| Release Bundles | ⏳ | ⏳ | ⏳ | Component → digest bundles | +| Semantic Versioning | ⏳ | ⏳ | ⏳ | SemVer release versions | +| Tag → Digest Resolution | ⏳ | ⏳ | ⏳ | Immutable digest pinning | +| **Promotion & Gates** | | | | | +| Promotion Workflows | ⏳ | ⏳ | ⏳ | Environment transitions | +| Security Gate | ⏳ | ⏳ | ⏳ | Scan verdict evaluation | +| Approval Gate | ⏳ | ⏳ | ⏳ | Human sign-off | +| Freeze Window Gate | ⏳ | ⏳ | ⏳ | Calendar enforcement | +| Policy Gate (OPA/Rego) | ⏳ | ⏳ | ⏳ | Custom rules | +| Decision Records | ⏳ | ⏳ | ⏳ | Evidence-linked decisions | +| **Deployment Execution** | | | | | +| Docker Host Agent | ⏳ | ⏳ | ⏳ | Direct container deployment | +| Compose Host Agent | ⏳ | ⏳ | ⏳ | Docker Compose deployment | +| SSH Agentless | ⏳ | ⏳ | ⏳ | Linux remote execution | +| WinRM Agentless | ⏳ | ⏳ | ⏳ | Windows remote execution | +| ECS Agent | ⏳ | ⏳ | ⏳ | AWS ECS deployment | +| Nomad Agent | ⏳ | ⏳ | ⏳ | HashiCorp Nomad deployment | +| Rollback | ⏳ | ⏳ | ⏳ | Previous version restore | +| **Progressive Delivery** | | | | | +| A/B Releases | ⏳ | ⏳ | ⏳ | Traffic splitting | +| Canary Deployments | ⏳ | ⏳ | ⏳ | Gradual rollout | +| Blue-Green | ⏳ | ⏳ | ⏳ | Zero-downtime switch | +| Traffic Routing Plugins | ⏳ | ⏳ | ⏳ | Nginx/HAProxy/Traefik/ALB | +| **Workflow Engine** | | | | | +| DAG Workflow Execution | ⏳ | ⏳ | ⏳ | Directed acyclic graphs | +| Step Registry | ⏳ | ⏳ | ⏳ | Built-in + custom steps | +| Workflow Templates | ⏳ | ⏳ | ⏳ | Reusable workflows | +| Script Steps (Bash/C#) | ⏳ | ⏳ | ⏳ | Custom 
automation | +| **Evidence & Audit** | | | | | +| Evidence Packets | ⏳ | ⏳ | ⏳ | Sealed decision bundles | +| Version Stickers | ⏳ | ⏳ | ⏳ | On-target deployment records | +| Audit Export | ⏳ | ⏳ | ⏳ | Compliance reporting | +| **Integrations** | | | | | +| GitHub Integration | ⏳ | ⏳ | ⏳ | SCM + webhooks | +| GitLab Integration | ⏳ | ⏳ | ⏳ | SCM + webhooks | +| Harbor Integration | ⏳ | ⏳ | ⏳ | Registry + scanning | +| HashiCorp Vault | ⏳ | ⏳ | ⏳ | Secrets management | +| AWS Secrets Manager | ⏳ | ⏳ | ⏳ | Secrets management | +| **Plugin System** | | | | | +| Plugin Manifest | ⏳ | ⏳ | ⏳ | Static declarations | +| Connector Runtime | ⏳ | ⏳ | ⏳ | Dynamic execution | +| Step Providers | ⏳ | ⏳ | ⏳ | Custom workflow steps | +| Agent Types | ⏳ | ⏳ | ⏳ | Custom deployment targets | + +--- + +## Plan Limits + +| Limit | Free | Pro | Enterprise | +|-------|:----:|:---:|:----------:| +| **Environments** | 3 | 33 | Unlimited | +| **New Digests/Day** | 333 | 3,333 | Unlimited | +| **Deployments** | Fair use | Fair use | Fair use | +| **Targets per Environment** | 10 | 100 | Unlimited | +| **Agents** | 3 | 33 | Unlimited | +| **Integrations** | 5 | 50 | Unlimited | --- diff --git a/docs/README.md b/docs/README.md index e402681af..d43a3c042 100755 --- a/docs/README.md +++ b/docs/README.md @@ -1,6 +1,13 @@ -# StellaOps Documentation +# Stella Ops Suite Documentation -StellaOps is a deterministic, offline-first container security platform: every verdict links back to concrete evidence (SBOM slices, advisory/VEX observations, reachability proofs, policy explain traces) and can be replayed for audits. +**Stella Ops Suite** is a centralized, auditable release control plane for non-Kubernetes container estates. It orchestrates environment promotions, gates releases using reachability-aware security and policy, and produces verifiable evidence for every decision. 
+ +The platform combines: +- **Release orchestration** — UI-driven promotion (Dev → Stage → Prod), approvals, policy gates, rollbacks +- **Security decisioning as a gate** — Scan on build, evaluate on release, re-evaluate on CVE updates +- **OCI-digest-first releases** — Immutable digest-based release identity with "what is deployed where" tracking +- **Toolchain-agnostic integrations** — Plug into any SCM, CI, registry, and secrets system +- **Auditability + standards** — Evidence packets, SBOM/VEX/attestation support, deterministic replay ## Two Levels of Documentation @@ -11,39 +18,98 @@ This documentation set is internal and does not keep compatibility stubs for old ## Start Here +### Product Understanding + | Goal | Open this | | --- | --- | | Understand the product in 2 minutes | [overview.md](overview.md) | -| Run a first scan (CLI) | [quickstart.md](quickstart.md) | | Browse capabilities | [key-features.md](key-features.md) | +| Feature matrix | [FEATURE_MATRIX.md](FEATURE_MATRIX.md) | +| Product vision | [product/VISION.md](product/VISION.md) | | Roadmap (priorities + definition of "done") | [ROADMAP.md](ROADMAP.md) | + +### Getting Started + +| Goal | Open this | +| --- | --- | +| Run a first scan (CLI) | [quickstart.md](quickstart.md) | +| Ingest advisories (Concelier + CLI) | [CONCELIER_CLI_QUICKSTART.md](CONCELIER_CLI_QUICKSTART.md) | +| Console (Web UI) operator guide | [UI_GUIDE.md](UI_GUIDE.md) | +| Offline / air-gap operations | [OFFLINE_KIT.md](OFFLINE_KIT.md) | + +### Architecture + +| Goal | Open this | +| --- | --- | | Architecture: high-level overview | [ARCHITECTURE_OVERVIEW.md](ARCHITECTURE_OVERVIEW.md) | | Architecture: full reference map | [ARCHITECTURE_REFERENCE.md](ARCHITECTURE_REFERENCE.md) | | Architecture: user flows (UML) | [technical/architecture/user-flows.md](technical/architecture/user-flows.md) | -| Architecture: module matrix (46 modules) | [technical/architecture/module-matrix.md](technical/architecture/module-matrix.md) | +| 
Architecture: module matrix | [technical/architecture/module-matrix.md](technical/architecture/module-matrix.md) | | Architecture: data flows | [technical/architecture/data-flows.md](technical/architecture/data-flows.md) | | Architecture: schema mapping | [technical/architecture/schema-mapping.md](technical/architecture/schema-mapping.md) | -| Offline / air-gap operations | [OFFLINE_KIT.md](OFFLINE_KIT.md) | -| Security deployment hardening | [SECURITY_HARDENING_GUIDE.md](SECURITY_HARDENING_GUIDE.md) | -| Ingest advisories (Concelier + CLI) | [CONCELIER_CLI_QUICKSTART.md](CONCELIER_CLI_QUICKSTART.md) | +| Release Orchestrator architecture | [modules/release-orchestrator/architecture.md](modules/release-orchestrator/architecture.md) | + +### Development & Operations + +| Goal | Open this | +| --- | --- | | Develop plugins/connectors | [PLUGIN_SDK_GUIDE.md](PLUGIN_SDK_GUIDE.md) | -| Console (Web UI) operator guide | [UI_GUIDE.md](UI_GUIDE.md) | +| Security deployment hardening | [SECURITY_HARDENING_GUIDE.md](SECURITY_HARDENING_GUIDE.md) | | VEX consensus and issuer trust | [VEX_CONSENSUS_GUIDE.md](VEX_CONSENSUS_GUIDE.md) | | Vulnerability Explorer guide | [VULNERABILITY_EXPLORER_GUIDE.md](VULNERABILITY_EXPLORER_GUIDE.md) | ## Detailed Indexes - **Technical index (everything):** [docs/technical/README.md](/docs/technical/) -- **End-to-end workflow flows:** [docs/flows/](/docs/flows/) (16 detailed flow documents) +- **End-to-end workflow flows:** [docs/flows/](/docs/flows/) - **Module dossiers:** [docs/modules/](/docs/modules/) - **API contracts and samples:** [docs/api/](/docs/api/) - **Architecture notes / ADRs:** [docs/technical/architecture/](/docs/technical/architecture/), [docs/technical/adr/](/docs/technical/adr/) -- **Operations and deployment:** [docs/operations/](/docs/operations/), [docs/deploy/](/docs/deploy/), [docs/deployment/](/docs/deployment/) +- **Operations and deployment:** [docs/operations/](/docs/operations/) - **Air-gap workflows:** 
[docs/modules/airgap/guides/](/docs/modules/airgap/guides/) - **Security deep dives:** [docs/security/](/docs/security/) - **Benchmarks and fixtures:** [docs/benchmarks/](/docs/benchmarks/), [docs/assets/](/docs/assets/) +- **Product advisories:** [docs/product/advisories/](/docs/product/advisories/) -## Notes +## Platform Themes -- The product is **offline-first**: docs and examples should avoid network dependencies and prefer deterministic fixtures. -- Feature exposure is configuration-driven; module dossiers define authoritative schemas and contracts per component. +Stella Ops Suite organizes capabilities into themes: + +### Existing Themes (Operational) + +| Theme | Purpose | Key Modules | +|-------|---------|-------------| +| **INGEST** | Advisory ingestion | Concelier, Advisory-AI | +| **VEXOPS** | VEX document handling | Excititor, VEX Lens, VEX Hub | +| **REASON** | Policy and decisioning | Policy Engine, OPA Runtime | +| **SCANENG** | Scanning and SBOM | Scanner, SBOM Service, Reachability | +| **EVIDENCE** | Evidence and attestation | Evidence Locker, Attestor, Export Center | +| **RUNTIME** | Runtime signals | Signals, Graph, Zastava | +| **JOBCTRL** | Job orchestration | Scheduler, Orchestrator, TaskRunner | +| **OBSERVE** | Observability | Notifier, Telemetry | +| **REPLAY** | Deterministic replay | Replay Engine | +| **DEVEXP** | Developer experience | CLI, Web UI, SDK | + +### Planned Themes (Release Orchestration) + +| Theme | Purpose | Key Modules | +|-------|---------|-------------| +| **INTHUB** | Integration hub | Integration Manager, Connection Profiles, Connector Runtime | +| **ENVMGR** | Environment management | Environment Manager, Target Registry, Agent Manager | +| **RELMAN** | Release management | Component Registry, Version Manager, Release Manager | +| **WORKFL** | Workflow engine | Workflow Designer, Workflow Engine, Step Executor | +| **PROMOT** | Promotion and approval | Promotion Manager, Approval Gateway, Decision Engine | +| 
**DEPLOY** | Deployment execution | Deploy Orchestrator, Target Executor, Artifact Generator | +| **AGENTS** | Deployment agents | Agent Core, Docker/Compose/ECS/Nomad agents | +| **PROGDL** | Progressive delivery | A/B Manager, Traffic Router, Canary Controller | +| **RELEVI** | Release evidence | Evidence Collector, Sticker Writer, Audit Exporter | +| **PLUGIN** | Plugin infrastructure | Plugin Registry, Plugin Loader, Plugin SDK | + +## Design Principles + +- **Offline-first**: All core operations work in air-gapped environments +- **Deterministic replay**: Same inputs yield same outputs (stable ordering, canonical hashing) +- **Evidence-linked decisions**: Every decision links to concrete evidence artifacts +- **Digest-first release identity**: Releases are immutable OCI digests, not mutable tags +- **Pluggable everything**: Integrations are plugins; core orchestration is stable +- **No feature gating**: All plans include all features; limits are environments + new digests/day diff --git a/docs/ROADMAP.md b/docs/ROADMAP.md index 94e0d1d14..937e5537b 100755 --- a/docs/ROADMAP.md +++ b/docs/ROADMAP.md @@ -1,34 +1,112 @@ # Roadmap -This repository is the source of truth for StellaOps direction. The roadmap is expressed as stable, evidence-based capability milestones (not calendar promises) so it stays correct during long audits and offline operation. +This repository is the source of truth for Stella Ops Suite direction. The roadmap is expressed as stable, evidence-based capability milestones (not calendar promises) so it stays correct during long audits and offline operation. -## How to read this -- **Now / Next / Later** are priority bands, not dates. -- A capability is "done" when the required evidence exists and is reproducible (see `docs/product/roadmap/maturity-model.md`). +## Strategic Direction -## Now (Foundation) -- Deterministic scan pipeline: image -> SBOMs (SPDX 3.0.1 + CycloneDX 1.7) with stable identifiers and replayable outputs. 
-- Advisory ingestion with offline-friendly mirrors, normalization, and deterministic merges. -- VEX-first triage: OpenVEX ingestion/consensus with explainable, stable verdicts. -- Policy gates: deterministic policy evaluation (OPA/Rego where applicable) with audit-friendly decision traces. -- Offline Kit workflows (bundle -> import -> verify) with signed artifacts and deterministic indexes. +**Stella Ops Suite** is evolving from a vulnerability scanning platform into a **centralized, auditable release control plane** for non-Kubernetes container estates. The existing scanning capabilities become security gates within release orchestration. -## Next (Hardening) -- Multi-tenant isolation (tenancy boundaries + RLS where applicable) and an audit trail built for replay. -- Signing and provenance hardening: DSSE/in-toto everywhere; configurable crypto profiles (FIPS/GOST/SM) where enabled. -- Determinism gates and replay tests in CI to prevent output drift across time and environments. +- **Release orchestration** — UI-driven promotion (Dev → Stage → Prod), approvals, policy gates, rollbacks +- **Security decisioning as a gate** — Scan on build, evaluate on release, re-evaluate on CVE updates +- **OCI-digest-first releases** — Immutable digest-based release identity +- **Non-Kubernetes specialization** — Docker hosts, Compose, ECS, Nomad as first-class targets -## Later (Ecosystem) -- Wider connector/plugin ecosystem, operator tooling, and SDKs. -- Expanded graph/reachability capabilities and export/pack formats for regulated environments. 
+## How to Read This -## Detailed breakdown -- `docs/product/roadmap/README.md` -- `docs/product/roadmap/maturity-model.md` +- **Operational** = capabilities that are implemented and working +- **Now / Next / Later** = priority bands for new development (not calendar dates) +- A capability is "done" when the required evidence exists and is reproducible (see `docs/product/roadmap/maturity-model.md`) -## Related high-level docs -- `docs/VISION.md` -- `docs/FEATURE_MATRIX.md` -- `docs/ARCHITECTURE_OVERVIEW.md` -- `docs/OFFLINE_KIT.md` -- `docs/key-features.md` +--- + +## Operational (Existing Capabilities) + +These capabilities are implemented and serve as the foundation for security gates: + +- **Deterministic scan pipeline** — Image → SBOMs (SPDX 2.2/2.3 + CycloneDX 1.7; SPDX 3.0.1 planned) with stable identifiers and replayable outputs +- **Advisory ingestion** — Offline-friendly mirrors, normalization, deterministic merges (Concelier) +- **VEX-first triage** — OpenVEX ingestion/consensus with explainable, stable verdicts (VEX Lens) +- **Policy gates** — Deterministic policy evaluation (OPA/Rego) with audit-friendly decision traces +- **Offline Kit workflows** — Bundle → import → verify with signed artifacts and deterministic indexes +- **Signing and provenance** — DSSE/in-toto attestations; configurable crypto profiles (FIPS/eIDAS/GOST/SM) +- **Determinism guarantees** — Replay tests in CI; frozen feeds; stable ordering + +--- + +## Now (Release Orchestration Foundation) + +Priority: Building the core release orchestration infrastructure. 
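The digest-first identity principle above ("releases are immutable OCI digests, not tags") can be sketched in a few lines. This is an illustrative sketch only — the type and member names are assumptions for explanation, not the actual StellaOps API.

```csharp
using System;
using System.Text.RegularExpressions;

// Sketch: a release is identified by an immutable OCI digest.
// Mutable tags are resolved once at release creation and never stored as identity.
public sealed record ReleaseIdentity(string Repository, string Digest)
{
    private static readonly Regex DigestPattern =
        new(@"^sha256:[a-f0-9]{64}$", RegexOptions.Compiled);

    public static ReleaseIdentity Create(string repository, string digest)
    {
        if (!DigestPattern.IsMatch(digest))
            throw new ArgumentException($"Not a pinned OCI digest: {digest}", nameof(digest));
        return new ReleaseIdentity(repository, digest);
    }

    // Canonical reference used downstream by deploy, evidence, and audit.
    public override string ToString() => $"{Repository}@{Digest}";
}
```

A tag-shaped reference (e.g. `registry.example/app:1.4.2`) is rejected here by design; the tag → digest resolution step happens once, upstream, against the registry.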
+ +### Phase 1: Foundation +- **Environment management** — Environment CRUD, freeze windows, approval policies +- **Integration hub** — Connection profiles, basic connectors (GitHub, Harbor) +- **Release bundles** — Component registry, release creation, tag → digest resolution +- **Database schemas** — Core release, environment, target tables + +### Phase 2: Workflow Engine +- **DAG execution** — Directed acyclic graph workflow processing +- **Step registry** — Built-in steps (script, approval, deploy, gate) +- **Workflow templates** — Reusable workflow definitions +- **Script execution** — C# compiled scripts + sandboxed bash + +--- + +## Next (Promotion & Deployment) + +Priority: Enabling end-to-end release flow. + +### Phase 3: Promotion & Decision +- **Approval gateway** — Approval collection, separation of duties +- **Security gates** — Integration with scan verdicts for gate evaluation +- **Decision engine** — Gate aggregation, decision record generation +- **Evidence packets** — Sealed, signed evidence bundles + +### Phase 4: Deployment Execution +- **Agent framework** — Core agent infrastructure, heartbeat, capability advertisement +- **Docker/Compose agents** — Agent-based deployment to Docker and Compose targets +- **Artifact generation** — `compose.stella.lock.yml`, deployment scripts +- **Rollback support** — Previous version restoration +- **Version stickers** — On-target deployment records for drift detection + +### Phase 5: UI & Polish +- **Release dashboard** — Release list, status, promotion history +- **Promotion UI** — Request, approve, track promotions +- **Environment management UI** — Environment configuration, freeze windows + +--- + +## Later (Advanced Capabilities) + +Priority: Expanding target support and delivery strategies. 
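The version stickers planned in Phase 4 above — on-target deployment records used for drift detection — can be sketched as a small serializable record. This is an illustrative sketch under stated assumptions: the real sticker schema belongs to the release-evidence spec, and the field names here are hypothetical.

```csharp
using System;
using System.Text.Json;

// Sketch: a record written to a well-known path on the deployment target.
// Drift detection later compares the sticker's digest against what the
// control plane believes is deployed there.
public sealed record VersionSticker(
    string Environment,
    string Component,
    string Digest,          // immutable OCI digest actually running on the target
    string DeploymentId,
    DateTimeOffset DeployedAt);

public static class StickerWriter
{
    public static string Serialize(VersionSticker sticker) =>
        JsonSerializer.Serialize(sticker, new JsonSerializerOptions { WriteIndented = true });
}
```

Because the sticker lives on the target itself, drift is detectable even when the control plane was offline during an out-of-band change.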
+ +### Phase 6: Progressive Delivery +- **A/B releases** — Traffic splitting between versions +- **Canary deployments** — Gradual rollout with health checks +- **Traffic routing plugins** — Nginx, HAProxy, Traefik, AWS ALB integration + +### Phase 7: Extended Targets +- **ECS agent** — AWS ECS service deployment +- **Nomad agent** — HashiCorp Nomad job deployment +- **SSH/WinRM agentless** — Remote execution without installed agent + +### Phase 8: Plugin Ecosystem +- **Full plugin system** — Three-surface plugin model (manifest, connector, step provider) +- **Plugin SDK** — Development kit for custom integrations +- **Additional connectors** — Expanded SCM, CI, registry, vault support + +--- + +## Detailed Breakdown + +- `docs/product/roadmap/README.md` — Detailed roadmap documentation +- `docs/product/roadmap/maturity-model.md` — Capability maturity definitions +- `docs/modules/release-orchestrator/architecture.md` — Release orchestrator architecture + +## Related Documents + +- [Product Vision](product/VISION.md) +- [Architecture Overview](ARCHITECTURE_OVERVIEW.md) +- [Feature Matrix](FEATURE_MATRIX.md) +- [Key Features](key-features.md) +- [Offline Kit](OFFLINE_KIT.md) +- [Release Orchestrator Specification](product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md) diff --git a/docs/implplan/SPRINT_20260110_100_000_INDEX_plugin_unification.md b/docs/implplan/SPRINT_20260110_100_000_INDEX_plugin_unification.md new file mode 100644 index 000000000..687201b88 --- /dev/null +++ b/docs/implplan/SPRINT_20260110_100_000_INDEX_plugin_unification.md @@ -0,0 +1,574 @@ +# SPRINT INDEX: Phase 100 - Plugin System Unification + +> **Epic:** Platform Foundation +> **Phase:** 100 - Plugin System Unification +> **Batch:** 100 +> **Status:** TODO +> **Successor:** [101_000_INDEX](SPRINT_20260110_101_000_INDEX_foundation.md) (Release Orchestrator Foundation) + +--- + +## Executive Summary + +Phase 100 establishes a **unified plugin architecture** for the 
entire Stella Ops platform. This phase reworks all existing plugin systems (Crypto, Auth, LLM, SCM, Scanner, Router, Concelier) into a single, cohesive model that supports: + +- **Trust-based execution** - Built-in plugins run in-process; untrusted plugins run sandboxed +- **Capability composition** - Plugins declare and implement multiple capabilities +- **Database-backed registry** - Centralized plugin management with health tracking +- **Full lifecycle management** - Discovery, loading, initialization, health monitoring, graceful shutdown +- **Multi-tenant isolation** - Per-tenant plugin instances with separate configurations + +This unification is a **prerequisite** for the Release Orchestrator (Phase 101+), which extends the plugin system with workflow steps, gates, and orchestration-specific connectors. + +--- + +## Strategic Rationale + +### Why Unify Now? + +1. **Technical Debt Reduction** - Seven disparate plugin patterns create a maintenance burden +2. **Security Posture** - Unified trust model enables consistent security enforcement +3. **Developer Experience** - Single SDK for all plugin development +4. **Observability** - Centralized registry enables unified health monitoring +5. 
**Future Extensibility** - Release Orchestrator requires robust plugin infrastructure + +### Current State Analysis + +| Plugin Type | Location | Interface | Pattern | Issues | +|-------------|----------|-----------|---------|--------| +| Crypto | `src/Cryptography/` | `ICryptoProvider` | Simple DI | No lifecycle, no health checks | +| Authority | `src/Authority/` | Various | Config-driven | Inconsistent interfaces | +| LLM | `src/AdvisoryAI/` | `ILlmProviderPlugin` | Priority selection | No isolation | +| SCM | `src/Integrations/` | `IScmConnectorPlugin` | Factory + auto-detect | No registry | +| Scanner | `src/Scanner/` | Analyzer interfaces | Pipeline | Tightly coupled | +| Router | `src/Router/` | `IRouterTransportPlugin` | Transport abstraction | No health tracking | +| Concelier | `src/Concelier/` | `IConcelierConnector` | Feed ingestion | No unified lifecycle | + +### Target State + +All plugins implement: +```csharp +public interface IPlugin : IAsyncDisposable +{ + PluginInfo Info { get; } + PluginTrustLevel TrustLevel { get; } + PluginCapabilities Capabilities { get; } + Task InitializeAsync(IPluginContext context, CancellationToken ct); + Task<PluginHealthStatus> HealthCheckAsync(CancellationToken ct); // status feeds the registry's health tracking +} +``` + +With capability-specific interfaces: +```csharp +// Crypto capability +public interface ICryptoCapability { ... } + +// Connector capability +public interface IConnectorCapability { ... } + +// Analysis capability +public interface IAnalysisCapability { ... } + +// Transport capability +public interface ITransportCapability { ... 
} +``` + +--- + +## Architecture Overview + +``` +┌─────────────────────────────────────────────────────────────────────────────────┐ +│ UNIFIED PLUGIN ARCHITECTURE │ +│ │ +│ ┌────────────────────────────────────────────────────────────────────────────┐ │ +│ │ StellaOps.Plugin.Abstractions │ │ +│ │ │ │ +│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │ +│ │ │ IPlugin │ │ PluginInfo │ │ TrustLevel │ │ Capabilities│ │ │ +│ │ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │ │ +│ │ │ │ +│ │ ┌─────────────────────────────────────────────────────────────────────┐ │ │ +│ │ │ Capability Interfaces │ │ │ +│ │ │ │ │ │ +│ │ │ ICryptoCapability IConnectorCapability IAnalysisCapability │ │ │ +│ │ │ IAuthCapability ITransportCapability ILlmCapability │ │ │ +│ │ │ IStepProviderCapability IGateProviderCapability │ │ │ +│ │ └─────────────────────────────────────────────────────────────────────┘ │ │ +│ └────────────────────────────────────────────────────────────────────────────┘ │ +│ │ │ +│ ▼ │ +│ ┌────────────────────────────────────────────────────────────────────────────┐ │ +│ │ StellaOps.Plugin.Host │ │ +│ │ │ │ +│ │ ┌───────────────────┐ ┌───────────────────┐ ┌───────────────────┐ │ │ +│ │ │ PluginDiscovery │ │ PluginLoader │ │ PluginRegistry │ │ │ +│ │ │ - File system │ │ - Assembly load │ │ - Database │ │ │ +│ │ │ - Manifest parse │ │ - Type activate │ │ - Health track │ │ │ +│ │ └───────────────────┘ └───────────────────┘ └───────────────────┘ │ │ +│ │ │ │ +│ │ ┌───────────────────┐ ┌───────────────────┐ ┌───────────────────┐ │ │ +│ │ │LifecycleManager │ │ PluginContext │ │ HealthMonitor │ │ │ +│ │ │ - State machine │ │ - Config bind │ │ - Periodic check │ │ │ +│ │ │ - Graceful stop │ │ - Service access │ │ - Alert on fail │ │ │ +│ │ └───────────────────┘ └───────────────────┘ └───────────────────┘ │ │ +│ └────────────────────────────────────────────────────────────────────────────┘ │ +│ │ │ +│ 
┌─────────────────────────┼─────────────────────────┐ │ +│ │ │ │ │ +│ ▼ ▼ ▼ │ +│ ┌────────────────────┐ ┌────────────────────┐ ┌────────────────────┐ │ +│ │ In-Process │ │ Isolated │ │ Sandboxed │ │ +│ │ Execution │ │ Execution │ │ Execution │ │ +│ │ │ │ │ │ │ │ +│ │ TrustLevel.BuiltIn│ │ TrustLevel.Trusted │ │TrustLevel.Untrusted│ │ +│ │ - Direct calls │ │ - AppDomain/ALC │ │ - Process isolation│ │ +│ │ - Shared memory │ │ - Resource limits │ │ - gRPC boundary │ │ +│ │ - No overhead │ │ - Moderate overhead│ │ - Full sandboxing │ │ +│ └────────────────────┘ └────────────────────┘ └────────────────────┘ │ +│ │ +│ ┌────────────────────────────────────────────────────────────────────────────┐ │ +│ │ StellaOps.Plugin.Sandbox │ │ +│ │ │ │ +│ │ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │ │ +│ │ │ ProcessManager │ │ ResourceLimiter │ │ NetworkPolicy │ │ │ +│ │ │ - Spawn/kill │ │ - CPU/memory │ │ - Allow/block │ │ │ +│ │ │ - Health watch │ │ - Disk/network │ │ - Rate limit │ │ │ +│ │ └─────────────────┘ └─────────────────┘ └─────────────────┘ │ │ +│ │ │ │ +│ │ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │ │ +│ │ │ GrpcBridge │ │ SecretProxy │ │ LogCollector │ │ │ +│ │ │ - Method call │ │ - Vault access │ │ - Structured │ │ │ +│ │ │ - Streaming │ │ - Scoped access │ │ - Rate limited │ │ │ +│ │ └─────────────────┘ └─────────────────┘ └─────────────────┘ │ │ +│ └────────────────────────────────────────────────────────────────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────────┘ + + REWORKED PLUGINS +┌─────────────────────────────────────────────────────────────────────────────────┐ +│ │ +│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ +│ │ Crypto │ │ Auth │ │ LLM │ │ SCM │ │ +│ │ Plugins │ │ Plugins │ │ Plugins │ │ Connectors │ │ +│ │ │ │ │ │ │ │ │ │ +│ │ - GOST │ │ - LDAP │ │ - llama │ │ - GitHub │ │ +│ │ - eIDAS │ │ - OIDC │ │ - ollama │ │ - GitLab │ │ +│ │ - 
SM2/3/4 │ │ - SAML │ │ - OpenAI │ │ - AzDO │ │ +│ │ - FIPS │ │ - Workforce │ │ - Claude │ │ - Gitea │ │ +│ │ - HSM │ │ │ │ │ │ │ │ +│ └──────────────┘ └──────────────┘ └──────────────┘ └──────────────┘ │ +│ │ +│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ +│ │ Scanner │ │ Router │ │ Concelier │ │ Future │ │ +│ │ Analyzers │ │ Transports │ │ Connectors │ │ Plugins │ │ +│ │ │ │ │ │ │ │ │ │ +│ │ - Go │ │ - TCP/TLS │ │ - NVD │ │ - Steps │ │ +│ │ - Java │ │ - UDP │ │ - OSV │ │ - Gates │ │ +│ │ - .NET │ │ - RabbitMQ │ │ - GHSA │ │ - CI │ │ +│ │ - Python │ │ - Valkey │ │ - Distros │ │ - Registry │ │ +│ │ - 7 more... │ │ │ │ │ │ - Vault │ │ +│ └──────────────┘ └──────────────┘ └──────────────┘ └──────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────────┘ +``` + +--- + +## Sprint Structure + +| Sprint ID | Title | Working Directory | Status | Dependencies | +|-----------|-------|-------------------|--------|--------------| +| 100_001 | Plugin Abstractions Library | `src/Plugin/StellaOps.Plugin.Abstractions/` | TODO | None | +| 100_002 | Plugin Host & Lifecycle Manager | `src/Plugin/StellaOps.Plugin.Host/` | TODO | 100_001 | +| 100_003 | Plugin Registry (Database) | `src/Plugin/StellaOps.Plugin.Registry/` | TODO | 100_001, 100_002 | +| 100_004 | Plugin Sandbox Infrastructure | `src/Plugin/StellaOps.Plugin.Sandbox/` | TODO | 100_001, 100_002 | +| 100_005 | Crypto Plugin Rework | `src/Cryptography/` | TODO | 100_001, 100_002, 100_003 | +| 100_006 | Auth Plugin Rework | `src/Authority/` | TODO | 100_001, 100_002, 100_003 | +| 100_007 | LLM Provider Rework | `src/AdvisoryAI/` | TODO | 100_001, 100_002, 100_003 | +| 100_008 | SCM Connector Rework | `src/Integrations/` | TODO | 100_001, 100_002, 100_003 | +| 100_009 | Scanner Analyzer Rework | `src/Scanner/` | TODO | 100_001, 100_002, 100_003 | +| 100_010 | Router Transport Rework | `src/Router/` | TODO | 100_001, 100_002, 100_003 | +| 100_011 | Concelier 
Connector Rework | `src/Concelier/` | TODO | 100_001, 100_002, 100_003 | +| 100_012 | Plugin SDK & Developer Experience | `src/Plugin/StellaOps.Plugin.Sdk/` | TODO | All above | + +--- + +## Database Schema + +### Core Tables + +```sql +-- Platform-wide plugin registry +CREATE TABLE platform.plugins ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + plugin_id VARCHAR(255) NOT NULL, -- e.g., "com.stellaops.crypto.gost" + name VARCHAR(255) NOT NULL, + version VARCHAR(50) NOT NULL, -- SemVer + vendor VARCHAR(255) NOT NULL, + description TEXT, + license_id VARCHAR(50), -- SPDX identifier + + -- Trust and security + trust_level VARCHAR(50) NOT NULL CHECK (trust_level IN ('builtin', 'trusted', 'untrusted')), + signature BYTEA, -- Plugin signature for verification + signing_key_id VARCHAR(255), + + -- Capabilities (bitmask stored as array for queryability) + capabilities TEXT[] NOT NULL DEFAULT '{}', -- ['crypto', 'connector.scm', 'analysis'] + capability_details JSONB NOT NULL DEFAULT '{}', -- Detailed capability metadata + + -- Source and deployment + source VARCHAR(50) NOT NULL CHECK (source IN ('bundled', 'installed', 'discovered')), + assembly_path VARCHAR(500), + entry_point VARCHAR(255), -- Type name for activation + + -- Lifecycle + status VARCHAR(50) NOT NULL DEFAULT 'discovered' CHECK (status IN ( + 'discovered', 'loading', 'initializing', 'active', + 'degraded', 'stopping', 'stopped', 'failed', 'unloading' + )), + status_message TEXT, + + -- Health + health_status VARCHAR(50) DEFAULT 'unknown' CHECK (health_status IN ( + 'unknown', 'healthy', 'degraded', 'unhealthy' + )), + last_health_check TIMESTAMPTZ, + health_check_failures INT NOT NULL DEFAULT 0, + + -- Metadata + manifest JSONB, -- Full plugin manifest + runtime_info JSONB, -- Runtime metrics, resource usage + + -- Audit + created_at TIMESTAMPTZ NOT NULL DEFAULT now(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT now(), + loaded_at TIMESTAMPTZ, + + UNIQUE(plugin_id, version) +); + +-- Plugin capability 
registry (denormalized for fast queries) +CREATE TABLE platform.plugin_capabilities ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + plugin_id UUID NOT NULL REFERENCES platform.plugins(id) ON DELETE CASCADE, + + capability_type VARCHAR(100) NOT NULL, -- 'crypto', 'connector.scm', 'analysis.java' + capability_id VARCHAR(255) NOT NULL, -- 'sign', 'github', 'maven-analyzer' + + -- Capability-specific metadata + config_schema JSONB, -- JSON Schema for configuration + input_schema JSONB, -- Input contract + output_schema JSONB, -- Output contract + + -- Discovery metadata + display_name VARCHAR(255), + description TEXT, + documentation_url VARCHAR(500), + + -- Runtime + is_enabled BOOLEAN NOT NULL DEFAULT TRUE, + + created_at TIMESTAMPTZ NOT NULL DEFAULT now(), + + UNIQUE(plugin_id, capability_type, capability_id) +); + +-- Plugin instances for multi-tenant scenarios +CREATE TABLE platform.plugin_instances ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + plugin_id UUID NOT NULL REFERENCES platform.plugins(id) ON DELETE CASCADE, + tenant_id UUID REFERENCES platform.tenants(id) ON DELETE CASCADE, -- NULL = global instance + + instance_name VARCHAR(255), -- Optional friendly name + config JSONB NOT NULL DEFAULT '{}', -- Tenant-specific configuration + secrets_path VARCHAR(500), -- Vault path for secrets + + -- Instance state + enabled BOOLEAN NOT NULL DEFAULT TRUE, + status VARCHAR(50) NOT NULL DEFAULT 'pending', + + -- Resource allocation (for sandboxed plugins) + resource_limits JSONB, -- CPU, memory, network limits + + -- Usage tracking + last_used_at TIMESTAMPTZ, + invocation_count BIGINT NOT NULL DEFAULT 0, + error_count BIGINT NOT NULL DEFAULT 0, + + created_at TIMESTAMPTZ NOT NULL DEFAULT now(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT now(), + + -- NULLS NOT DISTINCT (PostgreSQL 15+) so a NULL tenant_id/instance_name still collides; + -- expressions such as COALESCE are not allowed in a table-level UNIQUE constraint + UNIQUE NULLS NOT DISTINCT (plugin_id, tenant_id, instance_name) +); + +-- Plugin health history for trending +CREATE TABLE platform.plugin_health_history ( + id UUID NOT NULL DEFAULT gen_random_uuid(), -- no PRIMARY KEY: on a partitioned table it would have to include created_at + plugin_id 
UUID NOT NULL REFERENCES platform.plugins(id) ON DELETE CASCADE, + + checked_at TIMESTAMPTZ NOT NULL DEFAULT now(), + status VARCHAR(50) NOT NULL, + response_time_ms INT, + details JSONB, + + -- Partition by time for efficient cleanup + created_at TIMESTAMPTZ NOT NULL DEFAULT now() +) PARTITION BY RANGE (created_at); + +-- Indexes +CREATE INDEX idx_plugins_status ON platform.plugins(status) WHERE status != 'active'; +CREATE INDEX idx_plugins_trust_level ON platform.plugins(trust_level); +CREATE INDEX idx_plugins_capabilities ON platform.plugins USING GIN (capabilities); +CREATE INDEX idx_plugin_capabilities_type ON platform.plugin_capabilities(capability_type); +CREATE INDEX idx_plugin_capabilities_lookup ON platform.plugin_capabilities(capability_type, capability_id); +CREATE INDEX idx_plugin_instances_tenant ON platform.plugin_instances(tenant_id) WHERE tenant_id IS NOT NULL; +CREATE INDEX idx_plugin_instances_enabled ON platform.plugin_instances(plugin_id, enabled) WHERE enabled = TRUE; +CREATE INDEX idx_plugin_health_history_plugin ON platform.plugin_health_history(plugin_id, checked_at DESC); +``` + +--- + +## Trust Model + +### Trust Level Determination + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ TRUST LEVEL DETERMINATION │ +│ │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ Plugin Discovery │ │ +│ └─────────────────────────────────────────────────────────────────────┘ │ +│ │ │ +│ ▼ │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ Is bundled with platform? │ │ +│ └─────────────────────────────────────────────────────────────────────┘ │ +│ │ │ │ +│ YES NO │ +│ │ │ │ +│ ▼ ▼ │ +│ ┌─────────────────┐ ┌─────────────────────────────────────────┐ │ +│ │ TrustLevel. │ │ Has valid signature? 
│ │ +│ │ BuiltIn │ └─────────────────────────────────────────┘ │ +│ │ │ │ │ │ +│ │ - In-process │ YES NO │ +│ │ - No sandbox │ │ │ │ +│ │ - Full access │ ▼ ▼ │ +│ └─────────────────┘ ┌─────────────────────┐ ┌─────────────────────┐ │ +│ │ Signer in trusted │ │ TrustLevel. │ │ +│ │ vendor list? │ │ Untrusted │ │ +│ └─────────────────────┘ │ │ │ +│ │ │ │ - Process isolation│ │ +│ YES NO │ - Resource limits │ │ +│ │ │ │ - Network policy │ │ +│ ▼ ▼ │ - gRPC boundary │ │ +│ ┌─────────────────┐ │ └─────────────────────┘ │ +│ │ TrustLevel. │ │ │ +│ │ Trusted │◄───┘ │ +│ │ │ │ +│ │ - AppDomain │ │ +│ │ - Soft limits │ │ +│ │ - Monitored │ │ +│ └─────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +### Capability-Based Access Control + +Each capability grants specific permissions: + +| Capability | Permissions Granted | +|------------|---------------------| +| `crypto` | Access to key material, signing operations | +| `network` | Outbound HTTP/gRPC calls (host allowlist) | +| `filesystem.read` | Read-only access to specified paths | +| `filesystem.write` | Write access to plugin workspace | +| `secrets` | Access to vault secrets (scoped by policy) | +| `database` | Database connections (scoped by schema) | +| `process` | Spawn child processes (sandboxed only) | + +--- + +## Plugin Lifecycle + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ PLUGIN LIFECYCLE STATE MACHINE │ +│ │ +│ ┌──────────────┐ │ +│ │ Discovered │ │ +│ └──────┬───────┘ │ +│ │ load() │ +│ ▼ │ +│ ┌──────────────┐ │ +│ │ Loading │ │ +│ └──────┬───────┘ │ +│ │ assembly loaded │ +│ ▼ │ +│ ┌──────────────┐ │ +│ │ Initializing │ │ +│ └──────┬───────┘ │ +│ ┌──────────────┼──────────────┐ │ +│ │ success │ │ failure │ +│ ▼ │ ▼ │ +│ ┌──────────────┐ │ ┌──────────────┐ │ +│ │ Active │ │ │ Failed │ │ +│ └──────┬───────┘ │ └──────┬───────┘ │ +│ │ │ │ │ +│ ┌─────────────┼─────────────┐│ │ retry() │ +│ │ │ ││ │ │ +│ 
health fail stop() health degrade ▼ │ +│ │ │ ││ ┌──────────────┐ │ +│ ▼ │ ▼│ │ Loading │ (retry) │ +│ ┌──────────────┐ │ ┌──────────────┐└──────────────┘ │ +│ │ Unhealthy │ │ │ Degraded │ │ +│ └──────┬───────┘ │ └──────┬───────┘ │ +│ │ │ │ │ +│ auto-recover │ health ok │ +│ │ │ │ │ +│ └─────────────┼─────────────┘ │ +│ │ │ +│ ▼ │ +│ ┌──────────────┐ │ +│ │ Stopping │ │ +│ └──────┬───────┘ │ +│ │ cleanup complete │ +│ ▼ │ +│ ┌──────────────┐ │ +│ │ Stopped │ │ +│ └──────┬───────┘ │ +│ │ unload() │ +│ ▼ │ +│ ┌──────────────┐ │ +│ │ Unloading │ │ +│ └──────┬───────┘ │ +│ │ resources freed │ +│ ▼ │ +│ ┌──────────────┐ │ +│ │ (removed) │ │ +│ └──────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +--- + +## Migration Strategy + +### Phase Approach + +Each plugin type migration follows the same pattern: + +1. **Create New Implementation** - Implement `IPlugin` + capability interfaces +2. **Parallel Operation** - Both old and new implementations active +3. **Feature Parity Validation** - Automated tests verify identical behavior +4. **Gradual Cutover** - Configuration flag switches to new implementation +5. **Deprecation** - Old interfaces marked deprecated +6. 
**Removal** - Old implementations removed after transition period + +### Breaking Change Policy + +- **Internal interfaces** - Can be changed; update all internal consumers +- **Plugin SDK** - Maintain backward compatibility for one major version +- **Configuration** - Provide migration tooling for config format changes +- **Database** - Always use migrations; never break existing data + +--- + +## Deliverables Summary + +### Libraries Created + +| Library | Purpose | NuGet Package | +|---------|---------|---------------| +| `StellaOps.Plugin.Abstractions` | Core interfaces | `StellaOps.Plugin.Abstractions` | +| `StellaOps.Plugin.Host` | Plugin hosting | `StellaOps.Plugin.Host` | +| `StellaOps.Plugin.Registry` | Database registry | Internal | +| `StellaOps.Plugin.Sandbox` | Process isolation | Internal | +| `StellaOps.Plugin.Sdk` | Plugin development | `StellaOps.Plugin.Sdk` | +| `StellaOps.Plugin.Testing` | Test infrastructure | `StellaOps.Plugin.Testing` | + +### Plugins Reworked + +| Plugin Type | Count | Capability Interface | +|-------------|-------|----------------------| +| Crypto | 5 | `ICryptoCapability` | +| Auth | 4 | `IAuthCapability` | +| LLM | 4 | `ILlmCapability` | +| SCM | 4 | `IScmCapability` | +| Scanner | 11 | `IAnalysisCapability` | +| Router | 4 | `ITransportCapability` | +| Concelier | 8+ | `IFeedCapability` | + +--- + +## Success Criteria + +### Functional Requirements + +- [ ] All existing plugin functionality preserved +- [ ] All plugins implement unified `IPlugin` interface +- [ ] Database registry tracks all plugins +- [ ] Health checks report accurate status +- [ ] Trust levels correctly enforced +- [ ] Sandboxing works for untrusted plugins + +### Non-Functional Requirements + +- [ ] Plugin load time < 500ms (in-process) +- [ ] Plugin load time < 2s (sandboxed) +- [ ] Health check latency < 100ms +- [ ] No memory leaks in plugin lifecycle +- [ ] Graceful shutdown completes in < 10s + +### Quality Requirements + +- [ ] Unit test coverage 
>= 80% +- [ ] Integration test coverage >= 70% +- [ ] All public APIs documented +- [ ] Migration guide for each plugin type + +--- + +## Risk Assessment + +| Risk | Impact | Likelihood | Mitigation | +|------|--------|------------|------------| +| Breaking existing integrations | High | Medium | Comprehensive testing, gradual rollout | +| Performance regression | Medium | Low | Benchmarking, profiling | +| Sandbox escape vulnerability | Critical | Low | Security audit, penetration testing | +| Migration complexity | Medium | Medium | Clear documentation, tooling | +| Timeline overrun | Medium | Medium | Parallel workstreams, MVP scope | + +--- + +## Dependencies + +### External Dependencies + +| Dependency | Version | Purpose | +|------------|---------|---------| +| .NET 10 | Latest | Runtime | +| gRPC | 2.x | Sandbox communication | +| Npgsql | 8.x | Database access | +| System.Text.Json | Built-in | Manifest parsing | + +### Internal Dependencies + +| Dependency | Purpose | +|------------|---------| +| `StellaOps.Infrastructure.Postgres` | Database utilities | +| `StellaOps.Telemetry` | Logging, metrics | +| `StellaOps.HybridLogicalClock` | Event ordering | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Phase 100 index created | diff --git a/docs/implplan/SPRINT_20260110_100_000_INDEX_release_orchestrator.md b/docs/implplan/SPRINT_20260110_100_000_INDEX_release_orchestrator.md new file mode 100644 index 000000000..f3904de0a --- /dev/null +++ b/docs/implplan/SPRINT_20260110_100_000_INDEX_release_orchestrator.md @@ -0,0 +1,326 @@ +# SPRINT INDEX: Release Orchestrator Implementation + +> **Epic:** Stella Ops Suite - Release Control Plane +> **Batch:** 100 +> **Status:** Planning +> **Created:** 10-Jan-2026 +> **Source:** [Architecture Specification](../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md) + +--- + +## Overview + +This sprint batch implements the **Release Orchestrator** - transforming 
Stella Ops from a vulnerability scanning platform into **Stella Ops Suite**, a unified release control plane for non-Kubernetes container environments. + +### Business Value + +- **Unified release governance:** Single pane of glass for release lifecycle +- **Audit-grade evidence:** Cryptographically signed proof of every decision +- **Security as a gate:** Reachability-aware scanning integrated into promotion flow +- **Plugin extensibility:** Support for any SCM, CI, registry, and vault +- **Non-K8s first:** Docker, Compose, ECS, Nomad deployment targets + +### Key Principles + +1. **Digest-first release identity** - Releases are immutable OCI digests, not tags +2. **Evidence for every decision** - Every promotion/deployment produces sealed evidence +3. **Pluggable everything, stable core** - Integrations are plugins; core is stable +4. **No feature gating** - All plans include all features +5. **Offline-first operation** - Core works in air-gapped environments +6. **Immutable generated artifacts** - Every deployment generates stored artifacts + +--- + +## Implementation Phases + +| Phase | Batch | Title | Description | Track 
| +|-------|-------|-------|-------------|---------------| +| 1 | 101 | Foundation | Database schema, plugin infrastructure | Foundation | +| 2 | 102 | Integration Hub | Connector runtime, built-in integrations | Foundation | +| 3 | 103 | Environment Manager | Environments, targets, agent registration | Core | +| 4 | 104 | Release Manager | Components, versions, release bundles | Core | +| 5 | 105 | Workflow Engine | DAG execution, step registry | Core | +| 6 | 106 | Promotion & Gates | Approvals, security gates, decisions | Core | +| 7 | 107 | Deployment Execution | Deploy orchestrator, artifact generation | Core | +| 8 | 108 | Agents | Docker, Compose, SSH, WinRM agents | Deployment | +| 9 | 109 | Evidence & Audit | Evidence packets, version stickers | Audit | +| 10 | 110 | Progressive Delivery | A/B releases, canary, traffic routing | Advanced | +| 11 | 111 | UI Implementation | Dashboard, workflow editor, screens | Frontend | + +--- + +## Module Dependencies + +``` + ┌──────────────┐ + │ AUTHORITY │ (existing) + └──────┬───────┘ + │ + ┌──────────────────┼──────────────────┐ + │ │ │ + ▼ ▼ ▼ +┌───────────────┐ ┌───────────────┐ ┌───────────────┐ +│ PLUGIN │ │ INTHUB │ │ ENVMGR │ +│ (Batch 101) │ │ (Batch 102) │ │ (Batch 103) │ +└───────┬───────┘ └───────┬───────┘ └───────┬───────┘ + │ │ │ + └──────────┬───────┴──────────────────┘ + │ + ▼ + ┌───────────────┐ + │ RELMAN │ + │ (Batch 104) │ + └───────┬───────┘ + │ + ▼ + ┌───────────────┐ + │ WORKFL │ + │ (Batch 105) │ + └───────┬───────┘ + │ + ┌──────────┴──────────┐ + │ │ + ▼ ▼ +┌───────────────┐ ┌───────────────┐ +│ PROMOT │ │ DEPLOY │ +│ (Batch 106) │ │ (Batch 107) │ +└───────┬───────┘ └───────┬───────┘ + │ │ + │ ▼ + │ ┌───────────────┐ + │ │ AGENTS │ + │ │ (Batch 108) │ + │ └───────┬───────┘ + │ │ + └──────────┬──────────┘ + │ + ▼ + ┌───────────────┐ + │ RELEVI │ + │ (Batch 109) │ + └───────┬───────┘ + │ + ▼ + ┌───────────────┐ + │ PROGDL │ + │ (Batch 110) │ + └───────────────┘ +``` + +--- + +## Sprint Structure 
+
+### Phase 1: Foundation (Batch 101)
+
+| Sprint ID | Title | Module | Dependencies |
+|-----------|-------|--------|--------------|
+| 101_001 | Database Schema - Core Tables | DB | - |
+| 101_002 | Plugin Registry | PLUGIN | 101_001 |
+| 101_003 | Plugin Loader & Sandbox | PLUGIN | 101_002 |
+| 101_004 | Plugin SDK | PLUGIN | 101_003 |
+
+### Phase 2: Integration Hub (Batch 102)
+
+| Sprint ID | Title | Module | Dependencies |
+|-----------|-------|--------|--------------|
+| 102_001 | Integration Manager | INTHUB | 101_002 |
+| 102_002 | Connector Runtime | INTHUB | 102_001 |
+| 102_003 | Built-in SCM Connectors | INTHUB | 102_002 |
+| 102_004 | Built-in Registry Connectors | INTHUB | 102_002 |
+| 102_005 | Built-in Vault Connector | INTHUB | 102_002 |
+| 102_006 | Doctor Checks | INTHUB | 102_002 |
+
+### Phase 3: Environment Manager (Batch 103)
+
+| Sprint ID | Title | Module | Dependencies |
+|-----------|-------|--------|--------------|
+| 103_001 | Environment CRUD | ENVMGR | 101_001 |
+| 103_002 | Target Registry | ENVMGR | 103_001 |
+| 103_003 | Agent Manager - Core | ENVMGR | 103_002 |
+| 103_004 | Inventory Sync | ENVMGR | 103_002, 103_003 |
+
+### Phase 4: Release Manager (Batch 104)
+
+| Sprint ID | Title | Module | Dependencies |
+|-----------|-------|--------|--------------|
+| 104_001 | Component Registry | RELMAN | 102_004 |
+| 104_002 | Version Manager | RELMAN | 104_001 |
+| 104_003 | Release Manager | RELMAN | 104_002 |
+| 104_004 | Release Catalog | RELMAN | 104_003 |
+
+### Phase 5: Workflow Engine (Batch 105)
+
+| Sprint ID | Title | Module | Dependencies |
+|-----------|-------|--------|--------------|
+| 105_001 | Workflow Template Designer | WORKFL | 101_001 |
+| 105_002 | Step Registry | WORKFL | 101_002 |
+| 105_003 | Workflow Engine - DAG Executor | WORKFL | 105_001, 105_002 |
+| 105_004 | Step Executor | WORKFL | 105_003 |
+| 105_005 | Built-in Steps | WORKFL | 105_004 |
+
+### Phase 6: Promotion & Gates (Batch 106)
+
+| Sprint ID | Title | Module | Dependencies |
+|-----------|-------|--------|--------------|
+| 106_001 | Promotion Manager | PROMOT | 104_003, 103_001 |
+| 106_002 | Approval Gateway | PROMOT | 106_001 |
+| 106_003 | Gate Registry | PROMOT | 106_001 |
+| 106_004 | Security Gate | PROMOT | 106_003 |
+| 106_005 | Decision Engine | PROMOT | 106_002, 106_003 |
+
+### Phase 7: Deployment Execution (Batch 107)
+
+| Sprint ID | Title | Module | Dependencies |
+|-----------|-------|--------|--------------|
+| 107_001 | Deploy Orchestrator | DEPLOY | 105_003, 106_005 |
+| 107_002 | Target Executor | DEPLOY | 107_001, 103_002 |
+| 107_003 | Artifact Generator | DEPLOY | 107_001 |
+| 107_004 | Rollback Manager | DEPLOY | 107_002 |
+| 107_005 | Deployment Strategies | DEPLOY | 107_002 |
+
+### Phase 8: Agents (Batch 108)
+
+| Sprint ID | Title | Module | Dependencies |
+|-----------|-------|--------|--------------|
+| 108_001 | Agent Core Runtime | AGENTS | 103_003 |
+| 108_002 | Agent - Docker | AGENTS | 108_001 |
+| 108_003 | Agent - Compose | AGENTS | 108_002 |
+| 108_004 | Agent - SSH | AGENTS | 108_001 |
+| 108_005 | Agent - WinRM | AGENTS | 108_001 |
+
+### Phase 9: Evidence & Audit (Batch 109)
+
+| Sprint ID | Title | Module | Dependencies |
+|-----------|-------|--------|--------------|
+| 109_001 | Evidence Collector | RELEVI | 106_005, 107_001 |
+| 109_002 | Evidence Signer | RELEVI | 109_001 |
+| 109_003 | Version Sticker Writer | RELEVI | 107_002 |
+| 109_004 | Audit Exporter | RELEVI | 109_002 |
+
+### Phase 10: Progressive Delivery (Batch 110)
+
+| Sprint ID | Title | Module | Dependencies |
+|-----------|-------|--------|--------------|
+| 110_001 | A/B Release Manager | PROGDL | 107_005 |
+| 110_002 | Traffic Router Framework | PROGDL | 110_001 |
+| 110_003 | Canary Controller | PROGDL | 110_002 |
+| 110_004 | Router Plugin - Nginx | PROGDL | 110_002 |
+
+### Phase 11: UI Implementation (Batch 111)
+
+| Sprint ID | Title | Module | Dependencies |
+|-----------|-------|--------|--------------|
+| 111_001 | Dashboard - Overview | FE | 107_001 |
+| 111_002 | Environment Management UI | FE | 103_001 |
+| 111_003 | Release Management UI | FE | 104_003 |
+| 111_004 | Workflow Editor | FE | 105_001 |
+| 111_005 | Promotion & Approval UI | FE | 106_001 |
+| 111_006 | Deployment Monitoring UI | FE | 107_001 |
+| 111_007 | Evidence Viewer | FE | 109_002 |
+
+---
+
+## Documentation References
+
+All architecture documentation is available in:
+
+```
+docs/modules/release-orchestrator/
+├── README.md                      # Entry point
+├── design/
+│   ├── principles.md              # Design principles
+│   └── decisions.md               # ADRs
+├── modules/
+│   ├── overview.md                # Module landscape
+│   ├── integration-hub.md         # INTHUB spec
+│   ├── environment-manager.md     # ENVMGR spec
+│   ├── release-manager.md         # RELMAN spec
+│   ├── workflow-engine.md         # WORKFL spec
+│   ├── promotion-manager.md       # PROMOT spec
+│   ├── deploy-orchestrator.md     # DEPLOY spec
+│   ├── agents.md                  # AGENTS spec
+│   ├── progressive-delivery.md    # PROGDL spec
+│   ├── evidence.md                # RELEVI spec
+│   └── plugin-system.md           # PLUGIN spec
+├── data-model/
+│   ├── schema.md                  # PostgreSQL schema
+│   └── entities.md                # Entity definitions
+├── api/
+│   └── overview.md                # API design
+├── workflow/
+│   ├── templates.md               # Template spec
+│   ├── execution.md               # Execution state machine
+│   └── promotion.md               # Promotion state machine
+├── security/
+│   ├── overview.md                # Security architecture
+│   ├── auth.md                    # AuthN/AuthZ
+│   ├── agent-security.md          # Agent security
+│   └── threat-model.md            # Threat model
+├── deployment/
+│   ├── overview.md                # Deployment architecture
+│   ├── strategies.md              # Deployment strategies
+│   └── artifacts.md               # Artifact generation
+├── integrations/
+│   ├── overview.md                # Integration types
+│   ├── connectors.md              # Connector interface
+│   ├── webhooks.md                # Webhook architecture
+│   └── ci-cd.md                   # CI/CD patterns
+├── operations/
+│   ├── overview.md                # Observability
+│   └── metrics.md                 # Prometheus metrics
+├── ui/
+│   └── overview.md                # UI specification
+└── appendices/
+    ├── glossary.md                # Terms
+    ├── errors.md                  # Error codes
+    └── evidence-schema.md         # Evidence format
+```
+
+---
+
+## Technology Stack
+
+| Layer | Technology |
+|-------|------------|
+| Backend | .NET 10, C# preview |
+| Database | PostgreSQL 16+ |
+| Message Queue | RabbitMQ / Valkey |
+| Frontend | Angular 17 |
+| Agent Runtime | .NET AOT |
+| Plugin Runtime | gRPC, container sandbox |
+| Observability | OpenTelemetry, Prometheus |
+
+---
+
+## Risk Register
+
+| Risk | Impact | Mitigation |
+|------|--------|------------|
+| Plugin security vulnerabilities | High | Sandbox isolation, capability restrictions |
+| Agent compromise | High | mTLS, short-lived credentials, audit |
+| Evidence tampering | High | Append-only DB, cryptographic signing |
+| Registry unavailability | Medium | Connection pooling, caching, fallbacks |
+| Complex workflow failures | Medium | Comprehensive testing, rollback support |
+
+---
+
+## Success Criteria
+
+- [ ] Complete database schema for all 10 themes
+- [ ] Plugin system supports connector, step, gate types
+- [ ] At least 2 built-in connectors per integration type
+- [ ] Environment → Release → Promotion → Deploy flow works E2E
+- [ ] Evidence packet generated for every deployment
+- [ ] Agent deploys to Docker and Compose targets
+- [ ] UI shows pipeline overview, approval queues, deployment logs
+- [ ] Performance: <500ms API P99, <5min deployment for 10 targets
+
+---
+
+## Execution Log
+
+| Date | Entry |
+|------|-------|
+| 10-Jan-2026 | Sprint index created |
+| | Architecture documentation complete |
diff --git a/docs/implplan/SPRINT_20260110_100_001_PLUGIN_abstractions.md b/docs/implplan/SPRINT_20260110_100_001_PLUGIN_abstractions.md
new file mode 100644
index 000000000..e49a31d55
--- /dev/null
+++ b/docs/implplan/SPRINT_20260110_100_001_PLUGIN_abstractions.md
@@ -0,0 +1,1514 @@
+# SPRINT: Plugin Abstractions Library
+
+> **Sprint ID:** 100_001
+> **Module:** PLUGIN
+> **Phase:** 100 - Plugin System Unification
+> **Status:** TODO
+> **Parent:** [100_000_INDEX](SPRINT_20260110_100_000_INDEX_plugin_unification.md)
+
+---
+
+## Overview
+
+Create the foundational abstractions library that defines all plugin interfaces, types, and contracts. This library is the cornerstone of the unified plugin architecture and must be carefully designed for long-term stability.
+
+### Objectives
+
+- Define the core `IPlugin` interface and supporting types
+- Define capability interfaces for all plugin types
+- Define plugin context and lifecycle contracts
+- Define configuration and manifest models
+- Ensure backward-compatible extensibility patterns
+- Create comprehensive XML documentation
+
+### Working Directory
+
+```
+src/Plugin/
+├── StellaOps.Plugin.Abstractions/
+│   ├── StellaOps.Plugin.Abstractions.csproj
+│   ├── IPlugin.cs
+│   ├── PluginInfo.cs
+│   ├── PluginTrustLevel.cs
+│   ├── PluginCapabilities.cs
+│   ├── PluginStatus.cs
+│   ├── Context/
+│   │   ├── IPluginContext.cs
+│   │   ├── IPluginConfiguration.cs
+│   │   ├── IPluginLogger.cs
+│   │   └── IPluginServices.cs
+│   ├── Lifecycle/
+│   │   ├── PluginLifecycleState.cs
+│   │   ├── IPluginLifecycle.cs
+│   │   └── PluginLifecycleException.cs
+│   ├── Health/
+│   │   ├── HealthCheckResult.cs
+│   │   ├── HealthStatus.cs
+│   │   └── IHealthCheckable.cs
+│   ├── Capabilities/
+│   │   ├── ICryptoCapability.cs
+│   │   ├── IAuthCapability.cs
+│   │   ├── ILlmCapability.cs
+│   │   ├── IConnectorCapability.cs
+│   │   ├── IScmCapability.cs
+│   │   ├── IRegistryCapability.cs
+│   │   ├── IAnalysisCapability.cs
+│   │   ├── ITransportCapability.cs
+│   │   ├── IFeedCapability.cs
+│   │   ├── IStepProviderCapability.cs
+│   │   └── IGateProviderCapability.cs
+│   ├── Manifest/
+│   │   ├── PluginManifest.cs
+│   │   ├── ManifestCapabilityDeclaration.cs
+│   │   ├── ManifestDependency.cs
+│   │   └── ManifestResourceRequirements.cs
+│   ├── Execution/
+│   │   ├── ExecutionContext.cs
+│   │   ├── IExecutionBoundary.cs
+│   │   └── ExecutionResult.cs
+│   └── Attributes/
+│       ├── PluginAttribute.cs
+│       ├── CapabilityAttribute.cs
+│       ├── RequiresCapabilityAttribute.cs
+│       └── PluginVersionAttribute.cs
+└── __Tests/
+    └── StellaOps.Plugin.Abstractions.Tests/
+        ├── StellaOps.Plugin.Abstractions.Tests.csproj
+        ├── PluginInfoTests.cs
+        ├── PluginCapabilitiesTests.cs
+        ├── HealthCheckResultTests.cs
+        └── ManifestTests.cs
+```
+
+---
+
+## Deliverables
+
+### Core Plugin Interface
+
+```csharp
+// IPlugin.cs
+namespace StellaOps.Plugin.Abstractions;
+
+/// <summary>
+/// Core interface that all Stella Ops plugins must implement.
+/// Plugins provide one or more capabilities to the platform.
+/// </summary>
+/// <remarks>
+/// <para>
+/// The plugin lifecycle follows these phases:
+/// <list type="number">
+/// <item>Discovery - Plugin assembly found and manifest parsed</item>
+/// <item>Loading - Assembly loaded, types resolved</item>
+/// <item>Initialization - <see cref="InitializeAsync"/> called with context</item>
+/// <item>Active - Plugin servicing requests</item>
+/// <item>Shutdown - <see cref="IAsyncDisposable.DisposeAsync"/> called for cleanup</item>
+/// </list>
+/// </para>
+/// <para>
+/// Plugins declare their trust level, which determines execution context:
+/// <list type="bullet">
+/// <item><see cref="PluginTrustLevel.BuiltIn"/> - Runs in-process, full access</item>
+/// <item><see cref="PluginTrustLevel.Trusted"/> - Runs isolated, monitored</item>
+/// <item><see cref="PluginTrustLevel.Untrusted"/> - Runs sandboxed, restricted</item>
+/// </list>
+/// </para>
+/// </remarks>
+public interface IPlugin : IAsyncDisposable
+{
+    /// <summary>
+    /// Unique plugin metadata including ID, version, and vendor.
+    /// </summary>
+    PluginInfo Info { get; }
+
+    /// <summary>
+    /// Trust level determines the execution environment.
+    /// Bundled plugins return <see cref="PluginTrustLevel.BuiltIn"/>.
+    /// Third-party plugins typically return <see cref="PluginTrustLevel.Trusted"/>.
+    /// </summary>
+    PluginTrustLevel TrustLevel { get; }
+
+    /// <summary>
+    /// Capabilities this plugin provides. Used for discovery and routing.
+    /// </summary>
+    PluginCapabilities Capabilities { get; }
+
+    /// <summary>
+    /// Current lifecycle state of the plugin.
+    /// </summary>
+    PluginLifecycleState State { get; }
+
+    /// <summary>
+    /// Initialize the plugin with the provided context.
+    /// Called once after loading, before any capability methods.
+    /// </summary>
+    /// <param name="context">Provides configuration, logging, and service access.</param>
+    /// <param name="ct">Cancellation token for initialization timeout.</param>
+    /// <exception cref="PluginLifecycleException">If initialization fails.</exception>
+    Task InitializeAsync(IPluginContext context, CancellationToken ct);
+
+    /// <summary>
+    /// Perform a health check to verify the plugin is functioning correctly.
+    /// Called periodically by the plugin host.
+    /// </summary>
+    /// <param name="ct">Cancellation token for health check timeout.</param>
+    /// <returns>Health check result with status and optional diagnostics.</returns>
+    Task<HealthCheckResult> HealthCheckAsync(CancellationToken ct);
+}
+```
+
+### Plugin Info
+
+```csharp
+// PluginInfo.cs
+using System.Text.RegularExpressions;
+
+namespace StellaOps.Plugin.Abstractions;
+
+/// <summary>
+/// Immutable metadata identifying a plugin.
+/// </summary>
+/// <param name="Id">
+/// Reverse domain notation identifier, e.g., "com.stellaops.crypto.gost".
+/// Must be unique across all plugins.
+/// </param>
+/// <param name="Name">Human-readable display name.</param>
+/// <param name="Version">Semantic version string (Major.Minor.Patch[-PreRelease]).</param>
+/// <param name="Vendor">Organization or individual that created the plugin.</param>
+/// <param name="Description">Optional description of plugin functionality.</param>
+/// <param name="LicenseId">Optional SPDX license identifier.</param>
+/// <param name="ProjectUrl">Optional URL to project homepage or repository.</param>
+/// <param name="IconUrl">Optional URL to plugin icon (64x64 PNG recommended).</param>
+// The record is partial because the [GeneratedRegex] methods below are partial.
+public sealed partial record PluginInfo(
+    string Id,
+    string Name,
+    string Version,
+    string Vendor,
+    string? Description = null,
+    string? LicenseId = null,
+    string? ProjectUrl = null,
+    string? IconUrl = null)
+{
+    /// <summary>
+    /// Validates the plugin info and throws if invalid.
+    /// </summary>
+    /// <exception cref="ArgumentException">If any required field is invalid.</exception>
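+    // Illustrative usage of this record (sketch only, not part of the
+    // deliverable; the values below are hypothetical):
+    //
+    //   var info = new PluginInfo(
+    //       Id: "com.example.demo",
+    //       Name: "Demo Plugin",
+    //       Version: "1.0.0-beta.1",
+    //       Vendor: "Example Corp");
+    //   info.Validate(); // throws ArgumentException on a malformed Id or Version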
+    public void Validate()
+    {
+        if (string.IsNullOrWhiteSpace(Id))
+            throw new ArgumentException("Plugin ID is required", nameof(Id));
+
+        if (!PluginIdPattern().IsMatch(Id))
+            throw new ArgumentException(
+                "Plugin ID must be reverse domain notation (e.g., com.example.myplugin)",
+                nameof(Id));
+
+        if (string.IsNullOrWhiteSpace(Name))
+            throw new ArgumentException("Plugin name is required", nameof(Name));
+
+        if (string.IsNullOrWhiteSpace(Version))
+            throw new ArgumentException("Plugin version is required", nameof(Version));
+
+        if (!SemVerPattern().IsMatch(Version))
+            throw new ArgumentException(
+                "Plugin version must be valid SemVer (e.g., 1.0.0 or 1.0.0-beta.1)",
+                nameof(Version));
+
+        if (string.IsNullOrWhiteSpace(Vendor))
+            throw new ArgumentException("Plugin vendor is required", nameof(Vendor));
+    }
+
+    /// <summary>
+    /// Parses the version string into a comparable <see cref="System.Version"/>.
+    /// Pre-release suffixes are stripped for comparison.
+    /// </summary>
+    // System.Version is spelled out because the Version property shadows the type name here.
+    public System.Version ParsedVersion =>
+        System.Version.Parse(Version.Split('-')[0]);
+
+    [GeneratedRegex(@"^[a-z][a-z0-9]*(\.[a-z][a-z0-9]*)+$", RegexOptions.Compiled)]
+    private static partial Regex PluginIdPattern();
+
+    [GeneratedRegex(@"^\d+\.\d+\.\d+(-[a-zA-Z0-9\.]+)?$", RegexOptions.Compiled)]
+    private static partial Regex SemVerPattern();
+}
+```
+
+### Trust Level
+
+```csharp
+// PluginTrustLevel.cs
+namespace StellaOps.Plugin.Abstractions;
+
+/// <summary>
+/// Trust level determines plugin execution environment and permissions.
+/// </summary>
+public enum PluginTrustLevel
+{
+    /// <summary>
+    /// Plugin is bundled with the platform and fully trusted.
+    /// Executes in-process with full access to platform internals.
+    /// No sandboxing or resource limits applied.
+    /// </summary>
+    BuiltIn = 0,
+
+    /// <summary>
+    /// Plugin is signed by a trusted vendor.
+    /// Executes with moderate isolation (AssemblyLoadContext).
+    /// Soft resource limits applied, behavior monitored.
+    /// </summary>
+    Trusted = 1,
+
+    /// <summary>
+    /// Plugin is from an unknown or untrusted source.
+    /// Executes in isolated process with full sandboxing.
+    /// Hard resource limits, network restrictions, filesystem isolation.
+    /// Communication via gRPC over Unix domain socket.
+    /// </summary>
+    Untrusted = 2
+}
+
+/// <summary>
+/// Extension methods for <see cref="PluginTrustLevel"/>.
+/// </summary>
+public static class PluginTrustLevelExtensions
+{
+    /// <summary>
+    /// Returns true if the plugin requires process isolation.
+    /// </summary>
+    public static bool RequiresProcessIsolation(this PluginTrustLevel level) =>
+        level == PluginTrustLevel.Untrusted;
+
+    /// <summary>
+    /// Returns true if the plugin should have resource limits enforced.
+    /// </summary>
+    public static bool HasResourceLimits(this PluginTrustLevel level) =>
+        level >= PluginTrustLevel.Trusted;
+
+    /// <summary>
+    /// Returns true if the plugin can access platform internals directly.
+    /// </summary>
+    public static bool CanAccessInternals(this PluginTrustLevel level) =>
+        level == PluginTrustLevel.BuiltIn;
+}
+```
+
+### Plugin Capabilities
+
+```csharp
+// PluginCapabilities.cs
+namespace StellaOps.Plugin.Abstractions;
+
+/// <summary>
+/// Flags indicating plugin capabilities. Plugins may provide multiple capabilities.
+/// </summary>
+[Flags]
+public enum PluginCapabilities : long
+{
+    /// <summary>No capabilities declared.</summary>
+    None = 0,
+
+    // ========== Core Platform Capabilities (bits 0-9) ==========
+
+    /// <summary>Cryptographic operations (signing, verification, encryption).</summary>
+    Crypto = 1L << 0,
+
+    /// <summary>Authentication and authorization.</summary>
+    Auth = 1L << 1,
+
+    /// <summary>Large language model inference.</summary>
+    Llm = 1L << 2,
+
+    /// <summary>General secret management.</summary>
+    Secrets = 1L << 3,
+
+    // ========== Connector Capabilities (bits 10-19) ==========
+
+    /// <summary>Source control management (GitHub, GitLab, etc.).</summary>
+    Scm = 1L << 10,
+
+    /// <summary>Container registry operations.</summary>
+    Registry = 1L << 11,
+
+    /// <summary>Continuous integration systems.</summary>
+    Ci = 1L << 12,
+
+    /// <summary>Secret vault integration (HashiCorp, Azure KeyVault, etc.).</summary>
+    Vault = 1L << 13,
+
+    /// <summary>Notification delivery (email, Slack, Teams, webhooks).</summary>
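+    // Illustrative composition of capability flags (sketch only; relies on the
+    // Has/HasAny extension methods declared later in this file):
+    //
+    //   var caps = PluginCapabilities.Scm | PluginCapabilities.Network;
+    //   caps.Has(PluginCapabilities.Scm);              // true
+    //   caps.HasAny(PluginCapabilities.AllConnectors); // true (Scm is a connector bit)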
+    Notification = 1L << 14,
+
+    /// <summary>Issue tracking systems (Jira, GitHub Issues, etc.).</summary>
+    IssueTracker = 1L << 15,
+
+    // ========== Analysis Capabilities (bits 20-29) ==========
+
+    /// <summary>Source code or binary analysis.</summary>
+    Analysis = 1L << 20,
+
+    /// <summary>Vulnerability feed ingestion.</summary>
+    Feed = 1L << 21,
+
+    /// <summary>SBOM generation or parsing.</summary>
+    Sbom = 1L << 22,
+
+    // ========== Infrastructure Capabilities (bits 30-39) ==========
+
+    /// <summary>Message transport (TCP, UDP, AMQP, etc.).</summary>
+    Transport = 1L << 30,
+
+    /// <summary>Network access required.</summary>
+    Network = 1L << 31,
+
+    /// <summary>Filesystem read access.</summary>
+    FilesystemRead = 1L << 32,
+
+    /// <summary>Filesystem write access.</summary>
+    FilesystemWrite = 1L << 33,
+
+    /// <summary>Process spawning.</summary>
+    Process = 1L << 34,
+
+    // ========== Orchestrator Capabilities (bits 40-49) ==========
+
+    /// <summary>Workflow step provider.</summary>
+    StepProvider = 1L << 40,
+
+    /// <summary>Promotion gate provider.</summary>
+    GateProvider = 1L << 41,
+
+    /// <summary>Deployment target provider.</summary>
+    TargetProvider = 1L << 42,
+
+    /// <summary>Evidence collector.</summary>
+    EvidenceProvider = 1L << 43,
+
+    // ========== Composite Capabilities ==========
+
+    /// <summary>All connector capabilities.</summary>
+    AllConnectors = Scm | Registry | Ci | Vault | Notification | IssueTracker,
+
+    /// <summary>All orchestrator capabilities.</summary>
+    AllOrchestrator = StepProvider | GateProvider | TargetProvider | EvidenceProvider,
+
+    /// <summary>All infrastructure capabilities.</summary>
+    AllInfrastructure = Transport | Network | FilesystemRead | FilesystemWrite | Process
+}
+
+/// <summary>
+/// Extension methods for <see cref="PluginCapabilities"/>.
+/// </summary>
+public static class PluginCapabilitiesExtensions
+{
+    /// <summary>
+    /// Returns true if the plugin has the specified capability.
+    /// </summary>
+    public static bool Has(this PluginCapabilities capabilities, PluginCapabilities capability) =>
+        (capabilities & capability) == capability;
+
+    /// <summary>
+    /// Returns true if the plugin has any of the specified capabilities.
+    /// </summary>
+    public static bool HasAny(this PluginCapabilities capabilities, PluginCapabilities any) =>
+        (capabilities & any) != 0;
+
+    /// <summary>
+    /// Converts capabilities to a string array for database storage.
+    /// </summary>
+    public static string[] ToStringArray(this PluginCapabilities capabilities)
+    {
+        var result = new List<string>();
+        foreach (PluginCapabilities value in Enum.GetValues<PluginCapabilities>())
+        {
+            if (value != PluginCapabilities.None &&
+                !value.ToString().StartsWith("All") &&
+                capabilities.Has(value))
+            {
+                result.Add(value.ToString().ToLowerInvariant());
+            }
+        }
+        return result.ToArray();
+    }
+
+    /// <summary>
+    /// Parses capability strings back to flags.
+    /// </summary>
+    public static PluginCapabilities FromStringArray(string[] capabilities)
+    {
+        var result = PluginCapabilities.None;
+        foreach (var cap in capabilities)
+        {
+            if (Enum.TryParse<PluginCapabilities>(cap, ignoreCase: true, out var parsed))
+            {
+                result |= parsed;
+            }
+        }
+        return result;
+    }
+}
+```
+
+### Plugin Context
+
+```csharp
+// Context/IPluginContext.cs
+namespace StellaOps.Plugin.Abstractions.Context;
+
+/// <summary>
+/// Provides access to platform services and configuration during plugin execution.
+/// Passed to <see cref="IPlugin.InitializeAsync"/> and available throughout the plugin lifetime.
+/// </summary>
+public interface IPluginContext
+{
+    /// <summary>
+    /// Plugin-specific configuration bound from manifest and runtime settings.
+    /// </summary>
+    IPluginConfiguration Configuration { get; }
+
+    /// <summary>
+    /// Scoped logger for plugin diagnostics. Automatically tagged with the plugin ID.
+    /// </summary>
+    IPluginLogger Logger { get; }
+
+    /// <summary>
+    /// Service locator for accessing platform services.
+    /// Available services depend on plugin trust level and declared capabilities.
+    /// </summary>
+    IPluginServices Services { get; }
+
+    /// <summary>
+    /// Current tenant ID, if the plugin is running in a tenant context.
+    /// Null for global plugins.
+    /// </summary>
+    Guid? TenantId { get; }
+
+    /// <summary>
+    /// Unique instance ID for this plugin activation.
+    /// </summary>
+    Guid InstanceId { get; }
+
+    /// <summary>
+    /// Cancellation token that fires when shutdown is requested.
+    /// Plugins should monitor this and clean up gracefully.
+    /// </summary>
+    CancellationToken ShutdownToken { get; }
+
+    /// <summary>
+    /// Time provider for deterministic time operations.
+    /// </summary>
+    TimeProvider TimeProvider { get; }
+}
+
+/// <summary>
+/// Plugin-specific configuration access.
+/// </summary>
+public interface IPluginConfiguration
+{
+    /// <summary>
+    /// Gets a configuration value by key.
+    /// </summary>
+    /// <typeparam name="T">Target type for conversion.</typeparam>
+    /// <param name="key">Configuration key (dot-separated path).</param>
+    /// <param name="defaultValue">Default if key not found.</param>
+    T? GetValue<T>(string key, T? defaultValue = default);
+
+    /// <summary>
+    /// Binds a configuration section to a strongly-typed options class.
+    /// </summary>
+    /// <typeparam name="T">Options type with properties matching config keys.</typeparam>
+    /// <param name="sectionKey">Section key, or null for root.</param>
+    T Bind<T>(string? sectionKey = null) where T : class, new();
+
+    /// <summary>
+    /// Gets a secret value from the configured vault.
+    /// </summary>
+    /// <param name="secretName">Name of the secret.</param>
+    /// <param name="ct">Cancellation token.</param>
+    /// <returns>Secret value, or null if not found.</returns>
+    Task<string?> GetSecretAsync(string secretName, CancellationToken ct);
+}
+
+/// <summary>
+/// Plugin logging interface with structured logging support.
+/// </summary>
+public interface IPluginLogger
+{
+    void Log(LogLevel level, string message, params object[] args);
+    void Log(LogLevel level, Exception exception, string message, params object[] args);
+
+    void Debug(string message, params object[] args) => Log(LogLevel.Debug, message, args);
+    void Info(string message, params object[] args) => Log(LogLevel.Information, message, args);
+    void Warning(string message, params object[] args) => Log(LogLevel.Warning, message, args);
+    void Error(string message, params object[] args) => Log(LogLevel.Error, message, args);
+    void Error(Exception ex, string message, params object[] args) => Log(LogLevel.Error, ex, message, args);
+
+    /// <summary>
+    /// Creates a scoped logger with additional properties.
+    /// </summary>
+    IPluginLogger WithProperty(string name, object value);
+
+    /// <summary>
+    /// Creates a scoped logger for a specific operation.
+    /// </summary>
+    IPluginLogger ForOperation(string operationName);
+}
+
+/// <summary>
+/// Service locator for accessing platform services from plugins.
+/// </summary>
+public interface IPluginServices
+{
+    /// <summary>
+    /// Gets a required service. Throws if not available.
+    /// </summary>
+    /// <typeparam name="T">Service type.</typeparam>
+    /// <exception cref="InvalidOperationException">If the service is not available.</exception>
+    T GetRequiredService<T>() where T : class;
+
+    /// <summary>
+    /// Gets an optional service. Returns null if not available.
+    /// </summary>
+    T? GetService<T>() where T : class;
+
+    /// <summary>
+    /// Gets all registered implementations of a service.
+    /// </summary>
+    IEnumerable<T> GetServices<T>() where T : class;
+
+    /// <summary>
+    /// Creates a scoped service provider for the current operation.
+    /// Dispose the scope when the operation completes.
+    /// </summary>
+    IAsyncDisposable CreateScope(out IPluginServices scopedServices);
+}
+```
+
+### Health Check
+
+```csharp
+// Health/HealthCheckResult.cs
+namespace StellaOps.Plugin.Abstractions.Health;
+
+/// <summary>
+/// Result of a plugin health check.
+/// </summary>
+/// <param name="Status">Overall health status.</param>
+/// <param name="Message">Optional message describing the status.</param>
+/// <param name="Duration">Time taken to perform the health check.</param>
+/// <param name="Details">Additional diagnostic details.</param>
+// The Details value type was lost in the source; string values are assumed,
+// matching the factory methods below.
+public sealed record HealthCheckResult(
+    HealthStatus Status,
+    string? Message = null,
+    TimeSpan? Duration = null,
+    IReadOnlyDictionary<string, string>? Details = null)
+{
+    /// <summary>
+    /// Creates a healthy result.
+    /// </summary>
+    public static HealthCheckResult Healthy(string? message = null) =>
+        new(HealthStatus.Healthy, message);
+
+    /// <summary>
+    /// Creates a degraded result (functioning but impaired).
+    /// </summary>
+    public static HealthCheckResult Degraded(string message, IReadOnlyDictionary<string, string>? details = null) =>
+        new(HealthStatus.Degraded, message, Details: details);
+
+    /// <summary>
+    /// Creates an unhealthy result.
+    /// </summary>
+    public static HealthCheckResult Unhealthy(string message, IReadOnlyDictionary<string, string>? details = null) =>
+        new(HealthStatus.Unhealthy, message, Details: details);
+
+    /// <summary>
+    /// Creates an unhealthy result from an exception.
+    /// </summary>
+    public static HealthCheckResult Unhealthy(Exception exception) =>
+        new(HealthStatus.Unhealthy, exception.Message, Details: new Dictionary<string, string>
+        {
+            ["exceptionType"] = exception.GetType().Name,
+            ["stackTrace"] = exception.StackTrace ?? string.Empty
+        });
+}
+
+/// <summary>
+/// Health status values.
+/// </summary>
+public enum HealthStatus
+{
+    /// <summary>Plugin is healthy and fully operational.</summary>
+    Healthy = 0,
+
+    /// <summary>Plugin is functioning but with degraded performance or partial failures.</summary>
+    Degraded = 1,
+
+    /// <summary>Plugin is not functioning and cannot service requests.</summary>
+    Unhealthy = 2
+}
+```
+
+### Capability Interfaces - Crypto
+
+```csharp
+// Capabilities/ICryptoCapability.cs
+namespace StellaOps.Plugin.Abstractions.Capabilities;
+
+/// <summary>
+/// Capability interface for cryptographic operations.
+/// Implemented by plugins providing signing, verification, encryption, or hashing.
+/// </summary>
+public interface ICryptoCapability
+{
+    /// <summary>
+    /// Algorithms supported by this provider.
+    /// Format: "{family}-{variant}", e.g., "RSA-SHA256", "ECDSA-P256", "GOST-R34.10-2012".
+    /// </summary>
+    IReadOnlyList<string> SupportedAlgorithms { get; }
+
+    /// <summary>
+    /// Returns true if this provider can perform the specified operation with the given algorithm.
+    /// </summary>
+    bool CanHandle(CryptoOperation operation, string algorithm);
+
+    /// <summary>
+    /// Sign data using the specified algorithm and key.
+    /// </summary>
+    /// <param name="data">Data to sign.</param>
+    /// <param name="options">Signing options including algorithm and key reference.</param>
+    /// <param name="ct">Cancellation token.</param>
+    /// <returns>Signature bytes.</returns>
+    Task<byte[]> SignAsync(ReadOnlyMemory<byte> data, CryptoSignOptions options, CancellationToken ct);
+
+    /// <summary>
+    /// Verify a signature.
+    /// </summary>
+    /// <param name="data">Original data.</param>
+    /// <param name="signature">Signature to verify.</param>
+    /// <param name="options">Verification options including algorithm and key reference.</param>
+    /// <param name="ct">Cancellation token.</param>
+    /// <returns>True if the signature is valid.</returns>
+    Task<bool> VerifyAsync(ReadOnlyMemory<byte> data, ReadOnlyMemory<byte> signature, CryptoVerifyOptions options, CancellationToken ct);
+
+    /// <summary>
+    /// Encrypt data.
+    /// </summary>
+    Task<byte[]> EncryptAsync(ReadOnlyMemory<byte> data, CryptoEncryptOptions options, CancellationToken ct);
+
+    /// <summary>
+    /// Decrypt data.
+    /// </summary>
+    Task<byte[]> DecryptAsync(ReadOnlyMemory<byte> data, CryptoDecryptOptions options, CancellationToken ct);
+
+    /// <summary>
+    /// Compute hash of data.
+    /// </summary>
+    Task<byte[]> HashAsync(ReadOnlyMemory<byte> data, string algorithm, CancellationToken ct);
+}
+
+public enum CryptoOperation
+{
+    Sign,
+    Verify,
+    Encrypt,
+    Decrypt,
+    Hash
+}
+
+public sealed record CryptoSignOptions(
+    string Algorithm,
+    string KeyId,
+    string? KeyVersion = null,
+    IReadOnlyDictionary<string, string>? Metadata = null);
+
+public sealed record CryptoVerifyOptions(
+    string Algorithm,
+    string KeyId,
+    string? KeyVersion = null,
+    string? CertificateChain = null);
+
+public sealed record CryptoEncryptOptions(
+    string Algorithm,
+    string KeyId,
+    byte[]? Iv = null,
+    byte[]? Aad = null);
+
+public sealed record CryptoDecryptOptions(
+    string Algorithm,
+    string KeyId,
+    byte[]? Iv = null,
+    byte[]? Aad = null);
+```
+
+### Capability Interfaces - Connector
+
+```csharp
+// Capabilities/IConnectorCapability.cs
+namespace StellaOps.Plugin.Abstractions.Capabilities;
+
+/// <summary>
+/// Base capability for external system connectors.
+/// </summary>
+public interface IConnectorCapability
+{
+    /// <summary>
+    /// Connector type identifier, e.g., "scm.github", "registry.ecr", "vault.hashicorp".
+    /// </summary>
+    string ConnectorType { get; }
+
+    /// <summary>
+    /// Human-readable display name.
+    /// </summary>
+    string DisplayName { get; }
+
+    /// <summary>
+    /// Test the connection to the external system.
+    /// </summary>
+    Task<ConnectionTestResult> TestConnectionAsync(CancellationToken ct);
+
+    /// <summary>
+    /// Get current connection status and metadata.
+    /// </summary>
+    Task<ConnectionInfo> GetConnectionInfoAsync(CancellationToken ct);
+}
+
+public sealed record ConnectionTestResult(
+    bool Success,
+    string? Message = null,
+    TimeSpan? Latency = null,
+    IReadOnlyDictionary<string, string>? Details = null)
+{
+    public static ConnectionTestResult Succeeded(TimeSpan? latency = null) =>
+        new(true, "Connection successful", latency);
+
+    public static ConnectionTestResult Failed(string message, Exception? ex = null) =>
+        new(false, message, Details: ex != null ? new Dictionary<string, string>
+        {
+            ["exception"] = ex.GetType().Name,
+            ["exceptionMessage"] = ex.Message
+        } : null);
+}
+
+public sealed record ConnectionInfo(
+    string EndpointUrl,
+    string? AuthenticatedAs = null,
+    DateTimeOffset? ConnectedSince = null,
+    IReadOnlyDictionary<string, string>? Metadata = null);
+```
+
+### Capability Interfaces - SCM
+
+```csharp
+// Capabilities/IScmCapability.cs
+namespace StellaOps.Plugin.Abstractions.Capabilities;
+
+/// <summary>
+/// Capability interface for source control management systems.
+/// </summary>
+public interface IScmCapability : IConnectorCapability
+{
+    /// <summary>
+    /// SCM type (github, gitlab, azdo, gitea, bitbucket).
+    /// </summary>
+    string ScmType { get; }
+
+    /// <summary>
+    /// Returns true if this connector can handle the given repository URL.
+    /// Used for auto-detection.
+    /// </summary>
+    bool CanHandle(string repositoryUrl);
+
+    /// <summary>
+    /// List branches in a repository.
+    /// </summary>
+    Task<IReadOnlyList<ScmBranch>> ListBranchesAsync(string repositoryUrl, CancellationToken ct);
+
+    /// <summary>
+    /// List commits on a branch.
+    /// </summary>
+    Task<IReadOnlyList<ScmCommit>> ListCommitsAsync(
+        string repositoryUrl,
+        string branch,
+        int limit = 50,
+        CancellationToken ct = default);
+
+    /// <summary>
+    /// Get details of a specific commit.
+    /// </summary>
+    Task<ScmCommit> GetCommitAsync(string repositoryUrl, string commitSha, CancellationToken ct);
+
+    /// <summary>
+    /// Get file content at a specific ref.
+    /// </summary>
+    Task<ScmFileContent> GetFileAsync(
+        string repositoryUrl,
+        string filePath,
+        string? reference = null,
+        CancellationToken ct = default);
+
+    /// <summary>
+    /// Download a repository archive.
+    /// </summary>
+    // A Stream return is assumed; the result type was lost in the source.
+    Task<Stream> GetArchiveAsync(
+        string repositoryUrl,
+        string reference,
+        ArchiveFormat format = ArchiveFormat.TarGz,
+        CancellationToken ct = default);
+
+    /// <summary>
+    /// Create or update a webhook.
+    /// </summary>
+    Task<ScmWebhook> UpsertWebhookAsync(
+        string repositoryUrl,
+        ScmWebhookConfig config,
+        CancellationToken ct);
+
+    /// <summary>
+    /// Get current authenticated user info.
+    /// </summary>
+    Task<ScmUser> GetCurrentUserAsync(CancellationToken ct);
+}
+
+public sealed record ScmBranch(
+    string Name,
+    string CommitSha,
+    bool IsDefault,
+    bool IsProtected);
+
+public sealed record ScmCommit(
+    string Sha,
+    string Message,
+    string AuthorName,
+    string AuthorEmail,
+    DateTimeOffset AuthoredAt,
+    IReadOnlyList<string> ParentShas);
+
+public sealed record ScmFileContent(
+    string Path,
+    string Content,
+    string Encoding,
+    string Sha,
+    long Size);
+
+public sealed record ScmWebhook(
+    string Id,
+    string Url,
+    IReadOnlyList<string> Events,
+    bool Active);
+
+public sealed record ScmWebhookConfig(
+    string Url,
+    string Secret,
+    IReadOnlyList<string> Events);
+
+public sealed record ScmUser(
+    string Id,
+    string Username,
+    string? DisplayName,
+    string? Email,
+    string? AvatarUrl);
+
+public enum ArchiveFormat
+{
+    TarGz,
+    Zip
+}
+```
+
+### Capability Interfaces - Analysis
+
+```csharp
+// Capabilities/IAnalysisCapability.cs
+namespace StellaOps.Plugin.Abstractions.Capabilities;
+
+/// <summary>
+/// Capability interface for source code and binary analysis.
+/// Implemented by scanner analyzers for different languages/ecosystems.
+/// </summary>
+public interface IAnalysisCapability
+{
+    /// <summary>
+    /// Analysis type identifier, e.g., "maven", "npm", "go-mod", "dotnet".
+    /// </summary>
+    string AnalysisType { get; }
+
+    /// <summary>
+    /// File patterns this analyzer can process.
+    /// Glob patterns, e.g., ["pom.xml", "**/pom.xml", "*.jar"].
+    /// </summary>
+    IReadOnlyList<string> FilePatterns { get; }
+
+    /// <summary>
+    /// Languages/ecosystems this analyzer supports.
+    /// </summary>
+    IReadOnlyList<string> SupportedEcosystems { get; }
+
+    /// <summary>
+    /// Returns true if this analyzer can process the given file.
+    /// </summary>
+    bool CanAnalyze(string filePath, ReadOnlySpan<byte> fileHeader);
+
+    /// <summary>
+    /// Analyze a file or directory and extract dependency information.
+    /// </summary>
+    /// <param name="context">Analysis context with file access and configuration.</param>
+    /// <param name="ct">Cancellation token.</param>
+    /// <returns>Analysis result with discovered components.</returns>
+    Task<AnalysisResult> AnalyzeAsync(IAnalysisContext context, CancellationToken ct);
+}
+
+public interface IAnalysisContext
+{
+    /// <summary>Root path being analyzed.</summary>
+    string RootPath { get; }
+
+    /// <summary>Target file or directory for analysis.</summary>
+    string TargetPath { get; }
+
+    /// <summary>Read file contents.</summary>
+    // A byte[] result is assumed; the result type was lost in the source.
+    Task<byte[]> ReadFileAsync(string relativePath, CancellationToken ct);
+
+    /// <summary>List files matching a pattern.</summary>
+    Task<IReadOnlyList<string>> GlobAsync(string pattern, CancellationToken ct);
+
+    /// <summary>Check if a file exists.</summary>
+    Task<bool> FileExistsAsync(string relativePath, CancellationToken ct);
+
+    /// <summary>Analysis configuration.</summary>
+    IPluginConfiguration Configuration { get; }
+
+    /// <summary>Logger for diagnostics.</summary>
+    IPluginLogger Logger { get; }
+}
+
+public sealed record AnalysisResult(
+    bool Success,
+    IReadOnlyList<DiscoveredComponent> Components,
+    IReadOnlyList<AnalysisDiagnostic> Diagnostics,
+    AnalysisMetadata Metadata);
+
+public sealed record DiscoveredComponent(
+    string Name,
+    string Version,
+    string Ecosystem,
+    string? Purl,
+    string? Cpe,
+    ComponentType Type,
+    string? License,
+    IReadOnlyList<ComponentDependency> Dependencies,
+    IReadOnlyDictionary<string, string>? Metadata = null);
+
+public sealed record ComponentDependency(
+    string Name,
+    string? VersionConstraint,
+    DependencyScope Scope,
+    bool IsOptional);
+
+public sealed record AnalysisDiagnostic(
+    DiagnosticSeverity Severity,
+    string Code,
+    string Message,
+    string? FilePath = null,
+    int? Line = null);
+
+public sealed record AnalysisMetadata(
+    string AnalyzerType,
+    string AnalyzerVersion,
+    TimeSpan Duration,
+    int FilesProcessed);
+
+public enum ComponentType
+{
+    Library,
+    Framework,
+    Application,
+    OperatingSystem,
+    Device,
+    Container,
+    File
+}
+
+public enum DependencyScope
+{
+    Runtime,
+    Development,
+    Test,
+    Build,
+    Optional,
+    Provided
+}
+
+public enum DiagnosticSeverity
+{
+    Info,
+    Warning,
+    Error
+}
+```
+
+### Capability Interfaces - Step Provider
+
+```csharp
+// Capabilities/IStepProviderCapability.cs
+using System.Text.Json;
+
+namespace StellaOps.Plugin.Abstractions.Capabilities;
+
+/// <summary>
+/// Capability interface for workflow step providers.
+/// Plugins implementing this provide custom steps for release workflows.
+/// </summary>
+public interface IStepProviderCapability
+{
+    /// <summary>
+    /// Steps provided by this plugin.
+    /// </summary>
+    IReadOnlyList<StepDefinition> ProvidedSteps { get; }
+
+    /// <summary>
+    /// Create an executor for the specified step type.
+    /// </summary>
+    /// <param name="stepType">Step type from <see cref="ProvidedSteps"/>.</param>
+    /// <param name="ct">Cancellation token.</param>
+    Task<IStepExecutor> CreateExecutorAsync(string stepType, CancellationToken ct);
+}
+
+/// <summary>
+/// Definition of a step type provided by a plugin.
+/// </summary>
+public sealed record StepDefinition(
+    string Type,
+    string DisplayName,
+    string Description,
+    string Category,
+    JsonDocument ConfigSchema,
+    JsonDocument InputSchema,
+    JsonDocument OutputSchema,
+    IReadOnlyList<string> RequiredCapabilities);
+
+/// <summary>
+/// Executor for a workflow step.
+/// </summary>
+public interface IStepExecutor : IAsyncDisposable
+{
+    /// <summary>
+    /// Execute the step with streaming events.
+    /// </summary>
+    /// <param name="config">Step configuration from the workflow definition.</param>
+    /// <param name="inputs">Input values from previous steps or workflow inputs.</param>
+    /// <param name="context">Execution context with services and cancellation.</param>
+    /// <param name="ct">Cancellation token.</param>
+    /// <returns>Async stream of step events (logs, outputs, progress, result).</returns>
+    IAsyncEnumerable<StepEvent> ExecuteAsync(
+        JsonDocument config,
+        IReadOnlyDictionary<string, object> inputs,
+        IStepContext context,
+        CancellationToken ct);
+}
+
+/// <summary>
+/// Context provided to step executors during execution.
+/// </summary>
+public interface IStepContext
+{
+    /// <summary>Unique execution ID for this step run.</summary>
+    Guid ExecutionId { get; }
+
+    /// <summary>Workflow execution ID.</summary>
+    Guid WorkflowExecutionId { get; }
+
+    /// <summary>Deployment ID if step is part of deployment.</summary>
+    Guid? DeploymentId { get; }
+
+    /// <summary>Logger for step diagnostics.</summary>
+    IPluginLogger Logger { get; }
+
+    /// <summary>Secret access (scoped to step permissions).</summary>
+    Task<string?> GetSecretAsync(string secretName, CancellationToken ct);
+
+    /// <summary>Report progress (0-100).</summary>
+    Task ReportProgressAsync(int percentage, string? message = null, CancellationToken ct = default);
+}
+
+/// <summary>
+/// Event emitted during step execution.
+/// </summary>
+public abstract record StepEvent(DateTimeOffset Timestamp);
+
+public sealed record StepLogEvent(
+    DateTimeOffset Timestamp,
+    LogLevel Level,
+    string Message) : StepEvent(Timestamp);
+
+public sealed record StepOutputEvent(
+    DateTimeOffset Timestamp,
+    string Name,
+    object Value) : StepEvent(Timestamp);
+
+public sealed record StepProgressEvent(
+    DateTimeOffset Timestamp,
+    int Percentage,
+    string? Message = null) : StepEvent(Timestamp);
+
+public sealed record StepResultEvent(
+    DateTimeOffset Timestamp,
+    bool Success,
+    string? Message = null,
+    IReadOnlyDictionary<string, object?>? Outputs = null) : StepEvent(Timestamp);
+```
+
+### Capability Interfaces - Gate Provider
+
+```csharp
+// Capabilities/IGateProviderCapability.cs
+namespace StellaOps.Plugin.Abstractions.Capabilities;
+
+/// <summary>
+/// Capability interface for promotion gate providers.
+/// Plugins implementing this provide custom gates for release promotion.
+/// </summary>
+public interface IGateProviderCapability
+{
+    /// <summary>
+    /// Gates provided by this plugin.
+    /// </summary>
+    IReadOnlyList<GateDefinition> ProvidedGates { get; }
+
+    /// <summary>
+    /// Create an evaluator for the specified gate type.
+    /// </summary>
+    Task<IGateEvaluator> CreateEvaluatorAsync(string gateType, CancellationToken ct);
+}
+
+/// <summary>
+/// Definition of a gate type provided by a plugin.
+/// </summary>
+public sealed record GateDefinition(
+    string Type,
+    string DisplayName,
+    string Description,
+    string Category,
+    JsonDocument ConfigSchema,
+    bool SupportsAutoApprove,
+    bool SupportsManualOverride);
+
+/// <summary>
+/// Evaluator for a promotion gate.
+/// </summary>
+public interface IGateEvaluator : IAsyncDisposable
+{
+    /// <summary>
+    /// Evaluate the gate and return a decision.
+    /// </summary>
+    /// <param name="config">Gate configuration from workflow definition.</param>
+    /// <param name="context">Evaluation context with release and environment info.</param>
+    /// <param name="ct">Cancellation token.</param>
+    Task<GateResult> EvaluateAsync(
+        JsonDocument config,
+        IGateContext context,
+        CancellationToken ct);
+}
+
+/// <summary>
+/// Context provided to gate evaluators.
+/// </summary>
+public interface IGateContext
+{
+    /// <summary>Release being promoted.</summary>
+    GateReleaseInfo Release { get; }
+
+    /// <summary>Source environment.</summary>
+    GateEnvironmentInfo SourceEnvironment { get; }
+
+    /// <summary>Target environment.</summary>
+    GateEnvironmentInfo TargetEnvironment { get; }
+
+    /// <summary>User requesting promotion.</summary>
+    GateUserInfo RequestedBy { get; }
+
+    /// <summary>Previous gate results in this promotion.</summary>
+    IReadOnlyList<GateResult> PreviousGateResults { get; }
+
+    /// <summary>Logger for diagnostics.</summary>
+    IPluginLogger Logger { get; }
+}
+
+public sealed record GateReleaseInfo(
+    Guid Id,
+    string Name,
+    string Version,
+    IReadOnlyList<GateComponentInfo> Components);
+
+public sealed record GateComponentInfo(
+    string Name,
+    string Digest,
+    string? Version);
+
+public sealed record GateEnvironmentInfo(
+    Guid Id,
+    string Name,
+    int Tier);
+
+public sealed record GateUserInfo(
+    string Id,
+    string Username,
+    IReadOnlyList<string> Roles);
+
+/// <summary>
+/// Result of gate evaluation.
+/// </summary>
+public sealed record GateResult(
+    GateDecision Decision,
+    string? Message = null,
+    IReadOnlyList<GateFinding>? Findings = null,
+    IReadOnlyDictionary<string, string>? Evidence = null,
+    DateTimeOffset? ExpiresAt = null);
+
+public enum GateDecision
+{
+    /// <summary>Gate passed, promotion can proceed.</summary>
+    Passed,
+
+    /// <summary>Gate failed, promotion blocked.</summary>
+    Failed,
+
+    /// <summary>Gate requires manual review.</summary>
+    PendingReview,
+
+    /// <summary>Gate could not evaluate, retry later.</summary>
+    Inconclusive
+}
+
+public sealed record GateFinding(
+    GateFindingSeverity Severity,
+    string Code,
+    string Title,
+    string Description,
+    string? Remediation = null,
+    IReadOnlyDictionary<string, string>? Metadata = null);
+
+public enum GateFindingSeverity
+{
+    Info,
+    Low,
+    Medium,
+    High,
+    Critical
+}
+```
+
+### Plugin Manifest
+
+```csharp
+// Manifest/PluginManifest.cs
+namespace StellaOps.Plugin.Abstractions.Manifest;
+
+/// <summary>
+/// Plugin manifest describing plugin metadata, capabilities, and requirements.
+/// Typically loaded from plugin.yaml or plugin.json in the plugin package.
+/// </summary>
+public sealed record PluginManifest
+{
+    /// <summary>Plugin metadata.</summary>
+    public required PluginInfo Info { get; init; }
+
+    /// <summary>Plugin entry point type (fully qualified name).</summary>
+    public required string EntryPoint { get; init; }
+
+    /// <summary>Minimum platform version required.</summary>
+    public string? MinPlatformVersion { get; init; }
+
+    /// <summary>Maximum platform version supported.</summary>
+    public string? MaxPlatformVersion { get; init; }
+
+    /// <summary>Capabilities declared by this plugin.</summary>
+    public IReadOnlyList<ManifestCapabilityDeclaration> Capabilities { get; init; } = [];
+
+    /// <summary>Dependencies on other plugins.</summary>
+    public IReadOnlyList<ManifestDependency> Dependencies { get; init; } = [];
+
+    /// <summary>Resource requirements for sandboxed execution.</summary>
+    public ManifestResourceRequirements? ResourceRequirements { get; init; }
+
+    /// <summary>Network hosts the plugin needs to access.</summary>
+    public IReadOnlyList<string> RequiredHosts { get; init; } = [];
+
+    /// <summary>Configuration schema (JSON Schema).</summary>
+    public JsonDocument? ConfigSchema { get; init; }
+
+    /// <summary>Default configuration values.</summary>
+    public JsonDocument? DefaultConfig { get; init; }
+}
+
+/// <summary>
+/// Capability declaration in the manifest.
+/// </summary>
+public sealed record ManifestCapabilityDeclaration(
+    string Type,
+    string? Id = null,
+    JsonDocument? ConfigSchema = null,
+    IReadOnlyDictionary<string, string>? Metadata = null);
+
+/// <summary>
+/// Dependency on another plugin.
+/// </summary>
+public sealed record ManifestDependency(
+    string PluginId,
+    string? MinVersion = null,
+    string? MaxVersion = null,
+    bool Optional = false);
+
+/// <summary>
+/// Resource requirements for sandboxed plugins.
+/// </summary>
+public sealed record ManifestResourceRequirements(
+    int? MaxMemoryMb = null,
+    int? MaxCpuPercent = null,
+    int? MaxDiskMb = null,
+    int? MaxNetworkBandwidthMbps = null,
+    TimeSpan? InitializationTimeout = null,
+    TimeSpan? OperationTimeout = null);
+```
+
+### Plugin Attributes
+
+```csharp
+// Attributes/PluginAttribute.cs
+namespace StellaOps.Plugin.Abstractions.Attributes;
+
+/// <summary>
+/// Marks a class as a plugin entry point.
+/// </summary>
+[AttributeUsage(AttributeTargets.Class, AllowMultiple = false, Inherited = false)]
+public sealed class PluginAttribute : Attribute
+{
+    public string Id { get; }
+    public string Name { get; }
+    public string Version { get; }
+    public string Vendor { get; }
+    public string? Description { get; set; }
+    public string? LicenseId { get; set; }
+
+    public PluginAttribute(string id, string name, string version, string vendor)
+    {
+        Id = id;
+        Name = name;
+        Version = version;
+        Vendor = vendor;
+    }
+}
+
+/// <summary>
+/// Declares a capability provided by the plugin.
+/// </summary>
+[AttributeUsage(AttributeTargets.Class, AllowMultiple = true, Inherited = false)]
+public sealed class ProvidesCapabilityAttribute : Attribute
+{
+    public PluginCapabilities Capability { get; }
+    public string? CapabilityId { get; set; }
+
+    public ProvidesCapabilityAttribute(PluginCapabilities capability)
+    {
+        Capability = capability;
+    }
+}
+
+/// <summary>
+/// Declares a capability required by the plugin.
+/// </summary>
+[AttributeUsage(AttributeTargets.Class, AllowMultiple = true, Inherited = false)]
+public sealed class RequiresCapabilityAttribute : Attribute
+{
+    public PluginCapabilities Capability { get; }
+    public bool Optional { get; set; }
+
+    public RequiresCapabilityAttribute(PluginCapabilities capability)
+    {
+        Capability = capability;
+    }
+}
+
+/// <summary>
+/// Specifies the minimum platform version required.
+/// </summary>
+[AttributeUsage(AttributeTargets.Class, AllowMultiple = false, Inherited = false)]
+public sealed class RequiresPlatformVersionAttribute : Attribute
+{
+    public string MinVersion { get; }
+    public string? MaxVersion { get; set; }
+
+    public RequiresPlatformVersionAttribute(string minVersion)
+    {
+        MinVersion = minVersion;
+    }
+}
+```
+
+---
+
+## Acceptance Criteria
+
+- [ ] `IPlugin` interface defined with all lifecycle methods
+- [ ] `PluginInfo` record with validation
+- [ ] `PluginTrustLevel` enum with extension methods
+- [ ] `PluginCapabilities` flags enum with 40+ capabilities
+- [ ] `IPluginContext` and related interfaces
+- [ ] `HealthCheckResult` with factory methods
+- [ ] All capability interfaces defined:
+  - [ ] `ICryptoCapability` with sign/verify/encrypt/decrypt/hash
+  - [ ] `IConnectorCapability` with connection test
+  - [ ] `IScmCapability` with branches/commits/files/webhooks
+  - [ ] `IRegistryCapability` with repos/tags/manifests
+  - [ ] `IAnalysisCapability` with file patterns and analysis
+  - [ ] `ITransportCapability` with send/receive
+  - [ ] `IFeedCapability` with fetch/parse
+  - [ ] `ILlmCapability` with session creation
+  - [ ] `IAuthCapability` with authenticate/authorize
+  - [ ] `IStepProviderCapability` with step execution
+  - [ ] `IGateProviderCapability` with gate evaluation
+- [ ] `PluginManifest` model with all fields
+- [ ] Plugin attributes for decoration
+- [ ] XML documentation on all public members
+- [ ] Unit tests for all validation logic
+- [ ] Unit tests for enum extensions
+- [ ] Test coverage >= 90%
+
+---
+
+## Dependencies
+
+| Dependency | Type | Status |
+|------------|------|--------|
+| .NET 10 | External | Available |
+| System.Text.Json | External | Built-in |
+| Microsoft.Extensions.Logging.Abstractions | External | Available |
+
+---
+
+## Delivery Tracker
+
+| Deliverable | Status | Notes |
+|-------------|--------|-------|
+| IPlugin interface | TODO | |
+| PluginInfo record | TODO | |
+| PluginTrustLevel enum | TODO | |
+| PluginCapabilities enum | TODO | |
+| IPluginContext interfaces | TODO | |
+| Health types | TODO | |
+| ICryptoCapability | TODO | |
+| IConnectorCapability | TODO | |
+| IScmCapability | TODO | |
+| IRegistryCapability | TODO | |
+| IAnalysisCapability | TODO | |
+| ITransportCapability | TODO | |
+| IFeedCapability | TODO | |
+| ILlmCapability | TODO | |
+| IAuthCapability | TODO | |
+| IStepProviderCapability | TODO | |
+| IGateProviderCapability | TODO | |
+| PluginManifest model | TODO | |
+| Attributes | TODO | |
+| Unit tests | TODO | |
+
+---
+
+## Execution Log
+
+| Date | Entry |
+|------|-------|
+| 10-Jan-2026 | Sprint created |
diff --git a/docs/implplan/SPRINT_20260110_100_002_PLUGIN_host.md b/docs/implplan/SPRINT_20260110_100_002_PLUGIN_host.md
new file mode 100644
index 000000000..5fda5fda8
--- /dev/null
+++ b/docs/implplan/SPRINT_20260110_100_002_PLUGIN_host.md
@@ -0,0 +1,1173 @@
+# SPRINT: Plugin Host & Lifecycle Manager
+
+> **Sprint ID:** 100_002
+> **Module:** PLUGIN
+> **Phase:** 100 - Plugin System Unification
+> **Status:** TODO
+> **Parent:** [100_000_INDEX](SPRINT_20260110_100_000_INDEX_plugin_unification.md)
+
+---
+
+## Overview
+
+Implement the unified plugin host that manages plugin discovery, loading, initialization, lifecycle transitions, and shutdown. The host is the central coordinator for all plugins in the platform.
+
+### Objectives
+
+- Implement plugin discovery from filesystem and embedded assemblies
+- Implement assembly loading with isolation (AssemblyLoadContext)
+- Implement plugin lifecycle state machine
+- Implement graceful initialization and shutdown
+- Implement health monitoring and auto-recovery
+- Implement plugin dependency resolution
+
+### Working Directory
+
+```
+src/Plugin/
+├── StellaOps.Plugin.Host/
+│   ├── StellaOps.Plugin.Host.csproj
+│   ├── PluginHost.cs
+│   ├── IPluginHost.cs
+│   ├── PluginHostOptions.cs
+│   ├── Discovery/
+│   │   ├── IPluginDiscovery.cs
+│   │   ├── FileSystemPluginDiscovery.cs
+│   │   ├── EmbeddedPluginDiscovery.cs
+│   │   ├── CompositePluginDiscovery.cs
+│   │   └── PluginDiscoveryResult.cs
+│   ├── Loading/
+│   │   ├── IPluginLoader.cs
+│   │   ├── AssemblyPluginLoader.cs
+│   │   ├── PluginAssemblyLoadContext.cs
+│   │   └── PluginLoadResult.cs
+│   ├── Lifecycle/
+│   │   ├── IPluginLifecycleManager.cs
+│   │   ├── PluginLifecycleManager.cs
+│   │   ├── PluginStateMachine.cs
+│   │   └── PluginStateTransition.cs
+│   ├── Context/
+│   │   ├── PluginContext.cs
+│   │   ├── PluginConfiguration.cs
+│   │   ├── PluginLogger.cs
+│   │   └── PluginServices.cs
+│   ├── Health/
+│   │   ├── IPluginHealthMonitor.cs
+│   │   ├── PluginHealthMonitor.cs
+│   │   └── HealthCheckScheduler.cs
+│   ├── Dependencies/
+│   │   ├── IPluginDependencyResolver.cs
+│   │   ├── PluginDependencyResolver.cs
+│   │   └── DependencyGraph.cs
+│   └── Extensions/
+│       └── ServiceCollectionExtensions.cs
+└── __Tests/
+    └── StellaOps.Plugin.Host.Tests/
+        ├── PluginHostTests.cs
+        ├── PluginDiscoveryTests.cs
+        ├── PluginLoaderTests.cs
+        ├── LifecycleManagerTests.cs
+        └── DependencyResolverTests.cs
+```
+
+---
+
+## Deliverables
+
+### Plugin Host Interface
+
+```csharp
+// IPluginHost.cs
+namespace StellaOps.Plugin.Host;
+
+/// <summary>
+/// Central coordinator for plugin lifecycle management.
+/// </summary>
+public interface IPluginHost : IAsyncDisposable
+{
+    /// <summary>
+    /// All currently loaded plugins.
+    /// </summary>
+    IReadOnlyDictionary<string, LoadedPlugin> Plugins { get; }
+
+    /// <summary>
+    /// Discover and load all plugins from configured sources.
+    /// </summary>
+    Task StartAsync(CancellationToken ct);
+
+    /// <summary>
+    /// Gracefully stop all plugins and release resources.
+    /// </summary>
+    Task StopAsync(CancellationToken ct);
+
+    /// <summary>
+    /// Load a specific plugin from a source.
+    /// </summary>
+    Task<LoadedPlugin> LoadPluginAsync(PluginSource source, CancellationToken ct);
+
+    /// <summary>
+    /// Unload a specific plugin.
+    /// </summary>
+    Task UnloadPluginAsync(string pluginId, CancellationToken ct);
+
+    /// <summary>
+    /// Reload a plugin (unload then load).
+    /// </summary>
+    Task<LoadedPlugin> ReloadPluginAsync(string pluginId, CancellationToken ct);
+
+    /// <summary>
+    /// Get plugins with a specific capability.
+    /// </summary>
+    IEnumerable<T> GetPluginsWithCapability<T>() where T : class;
+
+    /// <summary>
+    /// Get a specific plugin by ID.
+    /// </summary>
+    LoadedPlugin? GetPlugin(string pluginId);
+
+    /// <summary>
+    /// Get a plugin capability instance.
+    /// </summary>
+    T? GetCapability<T>(string pluginId) where T : class;
+
+    /// <summary>
+    /// Event raised when a plugin state changes.
+    /// </summary>
+    event EventHandler<PluginStateChangedEventArgs>? PluginStateChanged;
+
+    /// <summary>
+    /// Event raised when a plugin health status changes.
+    /// </summary>
+    event EventHandler<PluginHealthChangedEventArgs>? PluginHealthChanged;
+}
+
+/// <summary>
+/// Represents a loaded plugin with its runtime state.
+/// Declared as a record so health updates can use non-destructive `with` mutation.
+/// </summary>
+public sealed record LoadedPlugin
+{
+    public required string PluginId { get; init; }
+    public required PluginInfo Info { get; init; }
+    public required IPlugin Instance { get; init; }
+    public required PluginTrustLevel TrustLevel { get; init; }
+    public required PluginCapabilities Capabilities { get; init; }
+    public required PluginLifecycleState State { get; init; }
+    public required HealthStatus HealthStatus { get; init; }
+    public required DateTimeOffset LoadedAt { get; init; }
+    public DateTimeOffset? LastHealthCheck { get; init; }
+    public PluginManifest? Manifest { get; init; }
+    public IPluginContext? Context { get; init; }
+}
+
+public sealed record PluginSource(
+    PluginSourceType Type,
+    string Location,
+    IReadOnlyDictionary<string, string>? Metadata = null);
+
+public enum PluginSourceType
+{
+    FileSystem,
+    Embedded,
+    Remote,
+    Database
+}
+
+public sealed class PluginStateChangedEventArgs : EventArgs
+{
+    public required string PluginId { get; init; }
+    public required PluginLifecycleState OldState { get; init; }
+    public required PluginLifecycleState NewState { get; init; }
+    public string? Reason { get; init; }
+}
+
+public sealed class PluginHealthChangedEventArgs : EventArgs
+{
+    public required string PluginId { get; init; }
+    public required HealthStatus OldStatus { get; init; }
+    public required HealthStatus NewStatus { get; init; }
+    public HealthCheckResult? CheckResult { get; init; }
+}
+```
+
+### Plugin Host Implementation
+
+```csharp
+// PluginHost.cs
+namespace StellaOps.Plugin.Host;
+
+public sealed class PluginHost : IPluginHost
+{
+    private readonly IPluginDiscovery _discovery;
+    private readonly IPluginLoader _loader;
+    private readonly IPluginLifecycleManager _lifecycle;
+    private readonly IPluginHealthMonitor _healthMonitor;
+    private readonly IPluginDependencyResolver _dependencyResolver;
+    private readonly IPluginRegistry? _registry;
+    private readonly PluginHostOptions _options;
+    private readonly ILogger<PluginHost> _logger;
+    private readonly TimeProvider _timeProvider;
+
+    private readonly ConcurrentDictionary<string, LoadedPlugin> _plugins = new();
+    private readonly SemaphoreSlim _loadLock = new(1, 1);
+    private CancellationTokenSource? _shutdownCts;
+
+    public IReadOnlyDictionary<string, LoadedPlugin> Plugins => _plugins;
+
+    public event EventHandler<PluginStateChangedEventArgs>? PluginStateChanged;
+    public event EventHandler<PluginHealthChangedEventArgs>? PluginHealthChanged;
+
+    public PluginHost(
+        IPluginDiscovery discovery,
+        IPluginLoader loader,
+        IPluginLifecycleManager lifecycle,
+        IPluginHealthMonitor healthMonitor,
+        IPluginDependencyResolver dependencyResolver,
+        IOptions<PluginHostOptions> options,
+        ILogger<PluginHost> logger,
+        TimeProvider timeProvider,
+        IPluginRegistry? registry = null)
+    {
+        _discovery = discovery;
+        _loader = loader;
+        _lifecycle = lifecycle;
+        _healthMonitor = healthMonitor;
+        _dependencyResolver = dependencyResolver;
+        _options = options.Value;
+        _logger = logger;
+        _timeProvider = timeProvider;
+        _registry = registry;
+
+        _healthMonitor.HealthChanged += OnPluginHealthChanged;
+    }
+
+    public async Task StartAsync(CancellationToken ct)
+    {
+        _shutdownCts = CancellationTokenSource.CreateLinkedTokenSource(ct);
+
+        _logger.LogInformation("Starting plugin host...");
+
+        // 1. Discover plugins
+        var discovered = await _discovery.DiscoverAsync(_options.PluginPaths, ct);
+        _logger.LogInformation("Discovered {Count} plugins", discovered.Count);
+
+        // 2. Resolve dependencies and determine load order
+        var loadOrder = _dependencyResolver.ResolveLoadOrder(discovered);
+
+        // 3. Load plugins in dependency order
+        foreach (var manifest in loadOrder)
+        {
+            try
+            {
+                await LoadPluginInternalAsync(manifest, ct);
+            }
+            catch (Exception ex)
+            {
+                _logger.LogError(ex, "Failed to load plugin {PluginId}", manifest.Info.Id);
+
+                if (_options.FailOnPluginLoadError)
+                    throw;
+            }
+        }
+
+        // 4.
+        // Start health monitoring
+        await _healthMonitor.StartAsync(_shutdownCts.Token);
+
+        _logger.LogInformation("Plugin host started with {Count} active plugins",
+            _plugins.Count(p => p.Value.State == PluginLifecycleState.Active));
+    }
+
+    public async Task StopAsync(CancellationToken ct)
+    {
+        _logger.LogInformation("Stopping plugin host...");
+
+        // Cancel ongoing operations
+        _shutdownCts?.Cancel();
+
+        // Stop health monitoring
+        await _healthMonitor.StopAsync(ct);
+
+        // Unload plugins in reverse dependency order
+        var unloadOrder = _dependencyResolver.ResolveUnloadOrder(_plugins.Values.Select(p => p.Manifest!));
+
+        foreach (var pluginId in unloadOrder)
+        {
+            try
+            {
+                await UnloadPluginInternalAsync(pluginId, ct);
+            }
+            catch (Exception ex)
+            {
+                _logger.LogError(ex, "Error unloading plugin {PluginId}", pluginId);
+            }
+        }
+
+        _logger.LogInformation("Plugin host stopped");
+    }
+
+    public async Task<LoadedPlugin> LoadPluginAsync(PluginSource source, CancellationToken ct)
+    {
+        await _loadLock.WaitAsync(ct);
+        try
+        {
+            // Discover manifest from source
+            var manifest = await _discovery.DiscoverSingleAsync(source, ct);
+
+            // Check if already loaded
+            if (_plugins.ContainsKey(manifest.Info.Id))
+                throw new InvalidOperationException($"Plugin {manifest.Info.Id} is already loaded");
+
+            return await LoadPluginInternalAsync(manifest, ct);
+        }
+        finally
+        {
+            _loadLock.Release();
+        }
+    }
+
+    private async Task<LoadedPlugin> LoadPluginInternalAsync(PluginManifest manifest, CancellationToken ct)
+    {
+        var pluginId = manifest.Info.Id;
+        _logger.LogDebug("Loading plugin {PluginId} v{Version}", pluginId, manifest.Info.Version);
+
+        // Transition to Loading state
+        await _lifecycle.TransitionAsync(pluginId, PluginLifecycleState.Loading, ct);
+        RaiseStateChanged(pluginId, PluginLifecycleState.Discovered, PluginLifecycleState.Loading);
+
+        try
+        {
+            // 1. Determine trust level
+            var trustLevel = await DetermineTrustLevelAsync(manifest, ct);
+
+            // 2.
+            // Load assembly and create instance
+            var loadResult = await _loader.LoadAsync(manifest, trustLevel, ct);
+
+            // 3. Create plugin context
+            var context = CreatePluginContext(manifest, trustLevel);
+
+            // 4. Transition to Initializing
+            await _lifecycle.TransitionAsync(pluginId, PluginLifecycleState.Initializing, ct);
+            RaiseStateChanged(pluginId, PluginLifecycleState.Loading, PluginLifecycleState.Initializing);
+
+            // 5. Initialize plugin
+            using var initCts = CancellationTokenSource.CreateLinkedTokenSource(ct);
+            initCts.CancelAfter(_options.InitializationTimeout);
+
+            await loadResult.Instance.InitializeAsync(context, initCts.Token);
+
+            // 6. Transition to Active
+            await _lifecycle.TransitionAsync(pluginId, PluginLifecycleState.Active, ct);
+
+            var loadedPlugin = new LoadedPlugin
+            {
+                PluginId = pluginId,
+                Info = manifest.Info,
+                Instance = loadResult.Instance,
+                TrustLevel = trustLevel,
+                Capabilities = loadResult.Instance.Capabilities,
+                State = PluginLifecycleState.Active,
+                HealthStatus = HealthStatus.Healthy,
+                LoadedAt = _timeProvider.GetUtcNow(),
+                Manifest = manifest,
+                Context = context
+            };
+
+            _plugins[pluginId] = loadedPlugin;
+
+            // 7. Register in database if available
+            if (_registry != null)
+            {
+                await _registry.RegisterAsync(loadedPlugin, ct);
+            }
+
+            // 8.
+            // Register with health monitor
+            _healthMonitor.RegisterPlugin(loadedPlugin);
+
+            RaiseStateChanged(pluginId, PluginLifecycleState.Initializing, PluginLifecycleState.Active);
+
+            _logger.LogInformation(
+                "Loaded plugin {PluginId} v{Version} with capabilities [{Capabilities}]",
+                pluginId, manifest.Info.Version, loadedPlugin.Capabilities);
+
+            return loadedPlugin;
+        }
+        catch (Exception ex)
+        {
+            _logger.LogError(ex, "Failed to load plugin {PluginId}", pluginId);
+            await _lifecycle.TransitionAsync(pluginId, PluginLifecycleState.Failed, ct);
+            RaiseStateChanged(pluginId, PluginLifecycleState.Initializing, PluginLifecycleState.Failed, ex.Message);
+            throw;
+        }
+    }
+
+    public async Task UnloadPluginAsync(string pluginId, CancellationToken ct)
+    {
+        await _loadLock.WaitAsync(ct);
+        try
+        {
+            await UnloadPluginInternalAsync(pluginId, ct);
+        }
+        finally
+        {
+            _loadLock.Release();
+        }
+    }
+
+    private async Task UnloadPluginInternalAsync(string pluginId, CancellationToken ct)
+    {
+        if (!_plugins.TryGetValue(pluginId, out var plugin))
+            return;
+
+        _logger.LogDebug("Unloading plugin {PluginId}", pluginId);
+
+        var oldState = plugin.State;
+
+        // Transition to Stopping
+        await _lifecycle.TransitionAsync(pluginId, PluginLifecycleState.Stopping, ct);
+        RaiseStateChanged(pluginId, oldState, PluginLifecycleState.Stopping);
+
+        try
+        {
+            // Unregister from health monitor
+            _healthMonitor.UnregisterPlugin(pluginId);
+
+            // Dispose plugin
+            using var disposeCts = CancellationTokenSource.CreateLinkedTokenSource(ct);
+            disposeCts.CancelAfter(_options.ShutdownTimeout);
+
+            await plugin.Instance.DisposeAsync();
+
+            // Transition to Stopped
+            await _lifecycle.TransitionAsync(pluginId, PluginLifecycleState.Stopped, ct);
+            RaiseStateChanged(pluginId, PluginLifecycleState.Stopping, PluginLifecycleState.Stopped);
+
+            // Unload assembly
+            await _loader.UnloadAsync(pluginId, ct);
+
+            // Remove from registry
+            _plugins.TryRemove(pluginId, out _);
+
+            if (_registry != null)
+            {
+                await _registry.UnregisterAsync(pluginId, ct);
+            }
+
+            _logger.LogInformation("Unloaded plugin {PluginId}", pluginId);
+        }
+        catch (Exception ex)
+        {
+            _logger.LogError(ex, "Error unloading plugin {PluginId}", pluginId);
+            throw;
+        }
+    }
+
+    public async Task<LoadedPlugin> ReloadPluginAsync(string pluginId, CancellationToken ct)
+    {
+        if (!_plugins.TryGetValue(pluginId, out var existing))
+            throw new InvalidOperationException($"Plugin {pluginId} is not loaded");
+
+        var manifest = existing.Manifest
+            ?? throw new InvalidOperationException($"Plugin {pluginId} has no manifest");
+
+        await UnloadPluginAsync(pluginId, ct);
+
+        // Small delay to allow resources to be released
+        await Task.Delay(100, ct);
+
+        return await LoadPluginInternalAsync(manifest, ct);
+    }
+
+    public IEnumerable<T> GetPluginsWithCapability<T>() where T : class
+    {
+        foreach (var plugin in _plugins.Values)
+        {
+            if (plugin.State == PluginLifecycleState.Active && plugin.Instance is T capability)
+            {
+                yield return capability;
+            }
+        }
+    }
+
+    public LoadedPlugin? GetPlugin(string pluginId) =>
+        _plugins.TryGetValue(pluginId, out var plugin) ? plugin : null;
+
+    public T? GetCapability<T>(string pluginId) where T : class =>
+        GetPlugin(pluginId)?.Instance as T;
+
+    private async Task<PluginTrustLevel> DetermineTrustLevelAsync(PluginManifest manifest, CancellationToken ct)
+    {
+        // Built-in plugins are always trusted
+        if (_options.BuiltInPluginIds.Contains(manifest.Info.Id))
+            return PluginTrustLevel.BuiltIn;
+
+        // Check signature if present
+        if (manifest.Signature != null)
+        {
+            var isValid = await VerifySignatureAsync(manifest, ct);
+            if (isValid && _options.TrustedVendors.Contains(manifest.Info.Vendor))
+                return PluginTrustLevel.Trusted;
+        }
+
+        // Check trusted plugins list
+        if (_options.TrustedPluginIds.Contains(manifest.Info.Id))
+            return PluginTrustLevel.Trusted;
+
+        // Default to untrusted
+        return PluginTrustLevel.Untrusted;
+    }
+
+    private async Task<bool> VerifySignatureAsync(PluginManifest manifest, CancellationToken ct)
+    {
+        // Signature verification implementation
+        // Uses crypto capability if available
+        await Task.CompletedTask; // Placeholder
+        return false;
+    }
+
+    private IPluginContext CreatePluginContext(PluginManifest manifest, PluginTrustLevel trustLevel)
+    {
+        return new PluginContext(
+            manifest,
+            trustLevel,
+            _options,
+            _logger,
+            _timeProvider,
+            _shutdownCts!.Token);
+    }
+
+    private void OnPluginHealthChanged(object? sender, PluginHealthChangedEventArgs e)
+    {
+        if (_plugins.TryGetValue(e.PluginId, out var plugin))
+        {
+            // Update plugin health status
+            var updated = plugin with { HealthStatus = e.NewStatus, LastHealthCheck = _timeProvider.GetUtcNow() };
+            _plugins[e.PluginId] = updated;
+
+            // Handle unhealthy plugins
+            if (e.NewStatus == HealthStatus.Unhealthy && _options.AutoRecoverUnhealthyPlugins)
+            {
+                _ = Task.Run(async () =>
+                {
+                    try
+                    {
+                        _logger.LogWarning("Plugin {PluginId} unhealthy, attempting recovery", e.PluginId);
+                        await ReloadPluginAsync(e.PluginId, CancellationToken.None);
+                    }
+                    catch (Exception ex)
+                    {
+                        _logger.LogError(ex, "Failed to recover plugin {PluginId}", e.PluginId);
+                    }
+                });
+            }
+        }
+
+        PluginHealthChanged?.Invoke(this, e);
+    }
+
+    private void RaiseStateChanged(string pluginId, PluginLifecycleState oldState, PluginLifecycleState newState, string? reason = null)
+    {
+        PluginStateChanged?.Invoke(this, new PluginStateChangedEventArgs
+        {
+            PluginId = pluginId,
+            OldState = oldState,
+            NewState = newState,
+            Reason = reason
+        });
+    }
+
+    public async ValueTask DisposeAsync()
+    {
+        await StopAsync(CancellationToken.None);
+        _shutdownCts?.Dispose();
+        _loadLock.Dispose();
+    }
+}
+```
+
+### Plugin Loader
+
+```csharp
+// Loading/AssemblyPluginLoader.cs
+namespace StellaOps.Plugin.Host.Loading;
+
+public sealed class AssemblyPluginLoader : IPluginLoader
+{
+    private readonly ConcurrentDictionary<string, PluginAssemblyLoadContext> _loadContexts = new();
+    private readonly ILogger<AssemblyPluginLoader> _logger;
+
+    public AssemblyPluginLoader(ILogger<AssemblyPluginLoader> logger)
+    {
+        _logger = logger;
+    }
+
+    public async Task<PluginLoadResult> LoadAsync(
+        PluginManifest manifest,
+        PluginTrustLevel trustLevel,
+        CancellationToken ct)
+    {
+        var assemblyPath = ResolveAssemblyPath(manifest);
+
+        _logger.LogDebug("Loading plugin assembly from {Path}", assemblyPath);
+
+        // Create isolated load context
+        var loadContext = new PluginAssemblyLoadContext(
+            manifest.Info.Id,
+            assemblyPath,
+            isCollectible: trustLevel != PluginTrustLevel.BuiltIn);
+
+        _loadContexts[manifest.Info.Id] = loadContext;
+
+        try
+        {
+            // Load the assembly
+            var assembly = loadContext.LoadFromAssemblyPath(assemblyPath);
+
+            // Find the entry point type
+            var entryPointType = assembly.GetType(manifest.EntryPoint)
+                ?? throw new PluginLoadException($"Entry point type '{manifest.EntryPoint}' not found");
+
+            // Verify it implements IPlugin
+            if (!typeof(IPlugin).IsAssignableFrom(entryPointType))
+                throw new PluginLoadException($"Entry point type '{manifest.EntryPoint}' does not implement IPlugin");
+
+            // Create instance
+            var instance = Activator.CreateInstance(entryPointType) as IPlugin
+                ?? throw new PluginLoadException($"Failed to create instance of '{manifest.EntryPoint}'");
+
+            return new PluginLoadResult(instance, assembly, loadContext);
+        }
+        catch (Exception ex)
+        {
+            // Cleanup on failure
+            _loadContexts.TryRemove(manifest.Info.Id, out _);
+            loadContext.Unload();
+            throw new PluginLoadException($"Failed to load plugin {manifest.Info.Id}", ex);
+        }
+    }
+
+    public async Task UnloadAsync(string pluginId, CancellationToken ct)
+    {
+        if (_loadContexts.TryRemove(pluginId, out var loadContext))
+        {
+            loadContext.Unload();
+
+            // Wait for GC to collect the assemblies
+            for (int i = 0; i < 10 && loadContext.IsAlive; i++)
+            {
+                GC.Collect();
+                GC.WaitForPendingFinalizers();
+                await Task.Delay(100, ct);
+            }
+
+            if (loadContext.IsAlive)
+            {
+                _logger.LogWarning("Plugin {PluginId} load context still alive after unload", pluginId);
+            }
+        }
+    }
+
+    private static string ResolveAssemblyPath(PluginManifest manifest)
+    {
+        // Implementation to resolve the main assembly path from manifest
+        // Could be relative to manifest location or absolute
+        return manifest.AssemblyPath ?? throw new PluginLoadException("Assembly path not specified in manifest");
+    }
+}
+
+public sealed class PluginAssemblyLoadContext : AssemblyLoadContext
+{
+    private readonly AssemblyDependencyResolver _resolver;
+    private readonly WeakReference _weakReference;
+
+    public bool IsAlive => _weakReference.IsAlive;
+
+    public PluginAssemblyLoadContext(string name, string pluginPath, bool isCollectible)
+        : base(name, isCollectible)
+    {
+        _resolver = new AssemblyDependencyResolver(pluginPath);
+        _weakReference = new WeakReference(this);
+    }
+
+    protected override Assembly? Load(AssemblyName assemblyName)
+    {
+        var assemblyPath = _resolver.ResolveAssemblyToPath(assemblyName);
+        if (assemblyPath != null)
+        {
+            return LoadFromAssemblyPath(assemblyPath);
+        }
+        return null;
+    }
+
+    protected override IntPtr LoadUnmanagedDll(string unmanagedDllName)
+    {
+        var libraryPath = _resolver.ResolveUnmanagedDllToPath(unmanagedDllName);
+        if (libraryPath != null)
+        {
+            return LoadUnmanagedDllFromPath(libraryPath);
+        }
+        return IntPtr.Zero;
+    }
+}
+
+public sealed record PluginLoadResult(
+    IPlugin Instance,
+    Assembly Assembly,
+    AssemblyLoadContext LoadContext);
+
+public class PluginLoadException : Exception
+{
+    public PluginLoadException(string message) : base(message) { }
+    public PluginLoadException(string message, Exception inner) : base(message, inner) { }
+}
+```
+
+### Plugin Discovery
+
+```csharp
+// Discovery/FileSystemPluginDiscovery.cs
+namespace StellaOps.Plugin.Host.Discovery;
+
+public sealed class FileSystemPluginDiscovery : IPluginDiscovery
+{
+    private readonly ILogger<FileSystemPluginDiscovery> _logger;
+    private static readonly string[] ManifestFileNames = ["plugin.yaml", "plugin.yml", "plugin.json"];
+
+    public FileSystemPluginDiscovery(ILogger<FileSystemPluginDiscovery> logger)
+    {
+        _logger = logger;
+    }
+
+    public async Task<IReadOnlyList<PluginManifest>> DiscoverAsync(
+        IEnumerable<string> searchPaths,
+        CancellationToken ct)
+    {
+        var manifests = new List<PluginManifest>();
+
+        foreach (var searchPath in searchPaths)
+        {
+            if (!Directory.Exists(searchPath))
+            {
+                _logger.LogWarning("Plugin search path does not exist: {Path}", searchPath);
+                continue;
+            }
+
+            _logger.LogDebug("Searching for plugins in {Path}", searchPath);
+
+            // Look for plugin directories (contain plugin.yaml/plugin.json)
+            foreach (var dir in Directory.EnumerateDirectories(searchPath))
+            {
+                ct.ThrowIfCancellationRequested();
+
+                var manifestPath = FindManifestFile(dir);
+                if (manifestPath == null)
+                    continue;
+
+                try
+                {
+                    var manifest = await ParseManifestAsync(manifestPath, ct);
+                    manifests.Add(manifest);
+                    _logger.LogDebug("Discovered plugin {PluginId} at {Path}", manifest.Info.Id, dir);
+                }
+                catch (Exception ex)
+                {
+                    _logger.LogWarning(ex, "Failed to parse manifest at {Path}", manifestPath);
+                }
+            }
+        }
+
+        return manifests;
+    }
+
+    public async Task<PluginManifest> DiscoverSingleAsync(PluginSource source, CancellationToken ct)
+    {
+        if (source.Type != PluginSourceType.FileSystem)
+            throw new ArgumentException($"Unsupported source type: {source.Type}");
+
+        var manifestPath = FindManifestFile(source.Location)
+            ?? throw new FileNotFoundException($"No plugin manifest found in {source.Location}");
+
+        return await ParseManifestAsync(manifestPath, ct);
+    }
+
+    private static string? FindManifestFile(string directory)
+    {
+        foreach (var fileName in ManifestFileNames)
+        {
+            var path = Path.Combine(directory, fileName);
+            if (File.Exists(path))
+                return path;
+        }
+        return null;
+    }
+
+    private static async Task<PluginManifest> ParseManifestAsync(string manifestPath, CancellationToken ct)
+    {
+        var content = await File.ReadAllTextAsync(manifestPath, ct);
+        var extension = Path.GetExtension(manifestPath).ToLowerInvariant();
+
+        return extension switch
+        {
+            ".yaml" or ".yml" => ParseYamlManifest(content, manifestPath),
+            ".json" => ParseJsonManifest(content, manifestPath),
+            _ => throw new InvalidOperationException($"Unknown manifest format: {extension}")
+        };
+    }
+
+    private static PluginManifest ParseYamlManifest(string content, string path)
+    {
+        var deserializer = new DeserializerBuilder()
+            .WithNamingConvention(CamelCaseNamingConvention.Instance)
+            .Build();
+
+        var manifestDto = deserializer.Deserialize<PluginManifestDto>(content);
+        return manifestDto.ToManifest(Path.GetDirectoryName(path)!);
+    }
+
+    private static PluginManifest ParseJsonManifest(string content, string path)
+    {
+        var manifestDto = JsonSerializer.Deserialize<PluginManifestDto>(content, new JsonSerializerOptions
+        {
+            PropertyNamingPolicy = JsonNamingPolicy.CamelCase
+        });
+
+        return manifestDto?.ToManifest(Path.GetDirectoryName(path)!)
+            ?? throw new InvalidOperationException("Failed to parse manifest JSON");
+    }
+}
+```
+
+### Health Monitor
+
+```csharp
+// Health/PluginHealthMonitor.cs
+namespace StellaOps.Plugin.Host.Health;
+
+public sealed class PluginHealthMonitor : IPluginHealthMonitor, IAsyncDisposable
+{
+    private readonly PluginHostOptions _options;
+    private readonly ILogger<PluginHealthMonitor> _logger;
+    private readonly TimeProvider _timeProvider;
+    private readonly ConcurrentDictionary<string, PluginHealthState> _healthStates = new();
+    private readonly Channel<string> _checkQueue;
+    private Task? _monitorTask;
+    private CancellationTokenSource? _cts;
+
+    public event EventHandler<PluginHealthChangedEventArgs>? HealthChanged;
+
+    public PluginHealthMonitor(
+        IOptions<PluginHostOptions> options,
+        ILogger<PluginHealthMonitor> logger,
+        TimeProvider timeProvider)
+    {
+        _options = options.Value;
+        _logger = logger;
+        _timeProvider = timeProvider;
+        _checkQueue = Channel.CreateBounded<string>(new BoundedChannelOptions(100)
+        {
+            FullMode = BoundedChannelFullMode.DropOldest
+        });
+    }
+
+    public async Task StartAsync(CancellationToken ct)
+    {
+        _cts = CancellationTokenSource.CreateLinkedTokenSource(ct);
+        _monitorTask = Task.Run(() => MonitorLoopAsync(_cts.Token), _cts.Token);
+        _logger.LogInformation("Plugin health monitor started");
+    }
+
+    public async Task StopAsync(CancellationToken ct)
+    {
+        _cts?.Cancel();
+        if (_monitorTask != null)
+        {
+            try
+            {
+                await _monitorTask.WaitAsync(ct);
+            }
+            catch (OperationCanceledException) { }
+        }
+        _logger.LogInformation("Plugin health monitor stopped");
+    }
+
+    public void RegisterPlugin(LoadedPlugin plugin)
+    {
+        _healthStates[plugin.PluginId] = new PluginHealthState
+        {
+            Plugin = plugin,
+            LastCheck = _timeProvider.GetUtcNow(),
+            Status = HealthStatus.Healthy,
+            ConsecutiveFailures = 0
+        };
+    }
+
+    public void UnregisterPlugin(string pluginId)
+    {
+        _healthStates.TryRemove(pluginId, out _);
+    }
+
+    public async Task<HealthCheckResult> CheckHealthAsync(string pluginId, CancellationToken ct)
+    {
+        if (!_healthStates.TryGetValue(pluginId, out var state))
+            return HealthCheckResult.Unhealthy("Plugin not registered");
+
+        return await PerformHealthCheckAsync(state, ct);
+    }
+
+    private async Task MonitorLoopAsync(CancellationToken ct)
+    {
+        var periodicTimer = new PeriodicTimer(_options.HealthCheckInterval);
+
+        while (!ct.IsCancellationRequested)
+        {
+            try
+            {
+                await periodicTimer.WaitForNextTickAsync(ct);
+
+                // Check all registered plugins
+                foreach (var kvp in _healthStates)
+                {
+                    ct.ThrowIfCancellationRequested();
+
+                    var state = kvp.Value;
+                    var timeSinceLastCheck = _timeProvider.GetUtcNow() - state.LastCheck;
+
+                    if (timeSinceLastCheck >= _options.HealthCheckInterval)
+                    {
+                        try
+                        {
+                            await
PerformHealthCheckAsync(state, ct); + } + catch (Exception ex) + { + _logger.LogError(ex, "Health check failed for plugin {PluginId}", kvp.Key); + } + } + } + } + catch (OperationCanceledException) + { + break; + } + catch (Exception ex) + { + _logger.LogError(ex, "Error in health monitor loop"); + } + } + } + + private async Task PerformHealthCheckAsync(PluginHealthState state, CancellationToken ct) + { + var plugin = state.Plugin; + var stopwatch = Stopwatch.StartNew(); + + try + { + using var timeoutCts = CancellationTokenSource.CreateLinkedTokenSource(ct); + timeoutCts.CancelAfter(_options.HealthCheckTimeout); + + var result = await plugin.Instance.HealthCheckAsync(timeoutCts.Token); + stopwatch.Stop(); + + result = result with { Duration = stopwatch.Elapsed }; + + // Update state + var oldStatus = state.Status; + state.Status = result.Status; + state.LastCheck = _timeProvider.GetUtcNow(); + state.LastResult = result; + + if (result.Status == HealthStatus.Healthy) + { + state.ConsecutiveFailures = 0; + } + else + { + state.ConsecutiveFailures++; + } + + // Raise event if status changed + if (oldStatus != result.Status) + { + HealthChanged?.Invoke(this, new PluginHealthChangedEventArgs + { + PluginId = plugin.PluginId, + OldStatus = oldStatus, + NewStatus = result.Status, + CheckResult = result + }); + } + + return result; + } + catch (OperationCanceledException) + { + var result = HealthCheckResult.Unhealthy("Health check timed out"); + state.ConsecutiveFailures++; + UpdateHealthStatus(state, result); + return result; + } + catch (Exception ex) + { + var result = HealthCheckResult.Unhealthy(ex); + state.ConsecutiveFailures++; + UpdateHealthStatus(state, result); + return result; + } + } + + private void UpdateHealthStatus(PluginHealthState state, HealthCheckResult result) + { + var oldStatus = state.Status; + state.Status = result.Status; + state.LastCheck = _timeProvider.GetUtcNow(); + state.LastResult = result; + + if (oldStatus != result.Status) + { + 
HealthChanged?.Invoke(this, new PluginHealthChangedEventArgs + { + PluginId = state.Plugin.PluginId, + OldStatus = oldStatus, + NewStatus = result.Status, + CheckResult = result + }); + } + } + + public async ValueTask DisposeAsync() + { + await StopAsync(CancellationToken.None); + _cts?.Dispose(); + } + + private sealed class PluginHealthState + { + public required LoadedPlugin Plugin { get; init; } + public DateTimeOffset LastCheck { get; set; } + public HealthStatus Status { get; set; } + public int ConsecutiveFailures { get; set; } + public HealthCheckResult? LastResult { get; set; } + } +} +``` + +### Service Collection Extensions + +```csharp +// Extensions/ServiceCollectionExtensions.cs +namespace StellaOps.Plugin.Host.Extensions; + +public static class ServiceCollectionExtensions +{ + public static IServiceCollection AddPluginHost( + this IServiceCollection services, + IConfiguration configuration) + { + // Bind options + services.Configure(configuration.GetSection("Plugins")); + + // Core services + services.AddSingleton(); + services.AddSingleton(); + services.AddSingleton(); + services.AddSingleton(); + services.AddSingleton(); + + // Plugin host + services.AddSingleton(); + + // Hosted service to start/stop plugin host + services.AddHostedService(); + + return services; + } + + public static IServiceCollection AddPluginRegistry(this IServiceCollection services) + { + services.AddScoped(); + return services; + } +} + +public sealed class PluginHostedService : IHostedService +{ + private readonly IPluginHost _pluginHost; + private readonly ILogger _logger; + + public PluginHostedService(IPluginHost pluginHost, ILogger logger) + { + _pluginHost = pluginHost; + _logger = logger; + } + + public async Task StartAsync(CancellationToken ct) + { + _logger.LogInformation("Starting plugin host..."); + await _pluginHost.StartAsync(ct); + } + + public async Task StopAsync(CancellationToken ct) + { + _logger.LogInformation("Stopping plugin host..."); + await 
_pluginHost.StopAsync(ct); + } +} +``` + +--- + +## Acceptance Criteria + +- [ ] `IPluginHost` interface with all methods +- [ ] Plugin discovery from filesystem +- [ ] Plugin discovery from embedded assemblies +- [ ] Assembly loading with `AssemblyLoadContext` isolation +- [ ] Plugin lifecycle state machine +- [ ] Graceful initialization with timeout +- [ ] Graceful shutdown with timeout +- [ ] Health monitoring with configurable interval +- [ ] Health status change events +- [ ] Auto-recovery for unhealthy plugins (optional) +- [ ] Dependency resolution for load order +- [ ] Hot reload support +- [ ] Service collection extensions +- [ ] Integration tests with test plugins +- [ ] Unit tests for all components +- [ ] Test coverage >= 80% + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 100_001 Plugin Abstractions | Internal | TODO | +| .NET 10 | External | Available | +| YamlDotNet | External | Available | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| IPluginHost interface | TODO | | +| PluginHost implementation | TODO | | +| FileSystemPluginDiscovery | TODO | | +| EmbeddedPluginDiscovery | TODO | | +| AssemblyPluginLoader | TODO | | +| PluginAssemblyLoadContext | TODO | | +| PluginLifecycleManager | TODO | | +| PluginHealthMonitor | TODO | | +| PluginDependencyResolver | TODO | | +| PluginContext | TODO | | +| ServiceCollectionExtensions | TODO | | +| Unit tests | TODO | | +| Integration tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_100_003_PLUGIN_registry.md b/docs/implplan/SPRINT_20260110_100_003_PLUGIN_registry.md new file mode 100644 index 000000000..9a7517e63 --- /dev/null +++ b/docs/implplan/SPRINT_20260110_100_003_PLUGIN_registry.md @@ -0,0 +1,762 @@ +# SPRINT: Plugin Registry (Database) + +> **Sprint ID:** 100_003 +> **Module:** PLUGIN +> 
**Phase:** 100 - Plugin System Unification +> **Status:** TODO +> **Parent:** [100_000_INDEX](SPRINT_20260110_100_000_INDEX_plugin_unification.md) + +--- + +## Overview + +Implement the database-backed plugin registry that persists plugin metadata, tracks health status, and supports multi-tenant plugin instances. The registry provides centralized plugin management and enables querying plugins by capability. + +### Objectives + +- Implement PostgreSQL-backed plugin registry +- Implement plugin capability indexing +- Implement tenant-specific plugin instances +- Implement health history tracking +- Implement plugin version management +- Provide migration scripts for schema creation + +### Working Directory + +``` +src/Plugin/ +├── StellaOps.Plugin.Registry/ +│ ├── StellaOps.Plugin.Registry.csproj +│ ├── IPluginRegistry.cs +│ ├── PostgresPluginRegistry.cs +│ ├── Models/ +│ │ ├── PluginRecord.cs +│ │ ├── PluginCapabilityRecord.cs +│ │ ├── PluginInstanceRecord.cs +│ │ └── PluginHealthRecord.cs +│ ├── Queries/ +│ │ ├── PluginQueries.cs +│ │ ├── CapabilityQueries.cs +│ │ └── InstanceQueries.cs +│ └── Migrations/ +│ └── 001_CreatePluginTables.sql +└── __Tests/ + └── StellaOps.Plugin.Registry.Tests/ + ├── PostgresPluginRegistryTests.cs + └── PluginQueryTests.cs +``` + +--- + +## Deliverables + +### Plugin Registry Interface + +```csharp +// IPluginRegistry.cs +namespace StellaOps.Plugin.Registry; + +/// +/// Database-backed plugin registry for persistent plugin management. +/// +public interface IPluginRegistry +{ + // ========== Plugin Management ========== + + /// + /// Register a loaded plugin in the database. + /// + Task RegisterAsync(LoadedPlugin plugin, CancellationToken ct); + + /// + /// Update plugin status. + /// + Task UpdateStatusAsync(string pluginId, PluginLifecycleState status, string? message = null, CancellationToken ct = default); + + /// + /// Update plugin health status. 
+    /// </summary>
+    Task UpdateHealthAsync(string pluginId, HealthStatus status, HealthCheckResult? result = null, CancellationToken ct = default);
+
+    /// <summary>
+    /// Unregister a plugin.
+    /// </summary>
+    Task UnregisterAsync(string pluginId, CancellationToken ct);
+
+    /// <summary>
+    /// Get plugin by ID.
+    /// </summary>
+    Task<PluginRecord?> GetAsync(string pluginId, CancellationToken ct);
+
+    /// <summary>
+    /// Get all registered plugins.
+    /// </summary>
+    Task<IReadOnlyList<PluginRecord>> GetAllAsync(CancellationToken ct);
+
+    /// <summary>
+    /// Get plugins by status.
+    /// </summary>
+    Task<IReadOnlyList<PluginRecord>> GetByStatusAsync(PluginLifecycleState status, CancellationToken ct);
+
+    // ========== Capability Queries ==========
+
+    /// <summary>
+    /// Get plugins with a specific capability.
+    /// </summary>
+    Task<IReadOnlyList<PluginRecord>> GetByCapabilityAsync(PluginCapabilities capability, CancellationToken ct);
+
+    /// <summary>
+    /// Get plugins providing a specific capability type/id.
+    /// </summary>
+    Task<IReadOnlyList<PluginRecord>> GetByCapabilityTypeAsync(string capabilityType, string? capabilityId = null, CancellationToken ct = default);
+
+    /// <summary>
+    /// Register plugin capabilities.
+    /// </summary>
+    Task RegisterCapabilitiesAsync(Guid pluginDbId, IEnumerable<PluginCapabilityRecord> capabilities, CancellationToken ct);
+
+    // ========== Instance Management ==========
+
+    /// <summary>
+    /// Create a tenant-specific plugin instance.
+    /// </summary>
+    Task<PluginInstanceRecord> CreateInstanceAsync(CreatePluginInstanceRequest request, CancellationToken ct);
+
+    /// <summary>
+    /// Get plugin instance.
+    /// </summary>
+    Task<PluginInstanceRecord?> GetInstanceAsync(Guid instanceId, CancellationToken ct);
+
+    /// <summary>
+    /// Get instances for a tenant.
+    /// </summary>
+    Task<IReadOnlyList<PluginInstanceRecord>> GetInstancesForTenantAsync(Guid tenantId, CancellationToken ct);
+
+    /// <summary>
+    /// Get instances for a plugin.
+    /// </summary>
+    Task<IReadOnlyList<PluginInstanceRecord>> GetInstancesForPluginAsync(string pluginId, CancellationToken ct);
+
+    /// <summary>
+    /// Update instance configuration.
+    /// </summary>
+    Task UpdateInstanceConfigAsync(Guid instanceId, JsonDocument config, CancellationToken ct);
+
+    /// <summary>
+    /// Enable/disable instance.
+    /// </summary>
+    Task SetInstanceEnabledAsync(Guid instanceId, bool enabled, CancellationToken ct);
+
+    /// <summary>
+    /// Delete instance.
+ /// + Task DeleteInstanceAsync(Guid instanceId, CancellationToken ct); + + // ========== Health History ========== + + /// + /// Record health check result. + /// + Task RecordHealthCheckAsync(string pluginId, HealthCheckResult result, CancellationToken ct); + + /// + /// Get health history for a plugin. + /// + Task> GetHealthHistoryAsync( + string pluginId, + DateTimeOffset since, + int limit = 100, + CancellationToken ct = default); +} + +public sealed record CreatePluginInstanceRequest( + string PluginId, + Guid? TenantId, + string? InstanceName, + JsonDocument Config, + string? SecretsPath = null, + JsonDocument? ResourceLimits = null); +``` + +### PostgreSQL Implementation + +```csharp +// PostgresPluginRegistry.cs +namespace StellaOps.Plugin.Registry; + +public sealed class PostgresPluginRegistry : IPluginRegistry +{ + private readonly NpgsqlDataSource _dataSource; + private readonly ILogger _logger; + private readonly TimeProvider _timeProvider; + + public PostgresPluginRegistry( + NpgsqlDataSource dataSource, + ILogger logger, + TimeProvider timeProvider) + { + _dataSource = dataSource; + _logger = logger; + _timeProvider = timeProvider; + } + + public async Task RegisterAsync(LoadedPlugin plugin, CancellationToken ct) + { + const string sql = """ + INSERT INTO platform.plugins ( + plugin_id, name, version, vendor, description, license_id, + trust_level, capabilities, capability_details, source, + assembly_path, entry_point, status, manifest, created_at, updated_at, loaded_at + ) VALUES ( + @plugin_id, @name, @version, @vendor, @description, @license_id, + @trust_level, @capabilities, @capability_details, @source, + @assembly_path, @entry_point, @status, @manifest, @now, @now, @now + ) + ON CONFLICT (plugin_id, version) DO UPDATE SET + status = @status, + updated_at = @now, + loaded_at = @now + RETURNING * + """; + + await using var conn = await _dataSource.OpenConnectionAsync(ct); + await using var cmd = new NpgsqlCommand(sql, conn); + + var now = 
_timeProvider.GetUtcNow(); + cmd.Parameters.AddWithValue("plugin_id", plugin.Info.Id); + cmd.Parameters.AddWithValue("name", plugin.Info.Name); + cmd.Parameters.AddWithValue("version", plugin.Info.Version); + cmd.Parameters.AddWithValue("vendor", plugin.Info.Vendor); + cmd.Parameters.AddWithValue("description", (object?)plugin.Info.Description ?? DBNull.Value); + cmd.Parameters.AddWithValue("license_id", (object?)plugin.Info.LicenseId ?? DBNull.Value); + cmd.Parameters.AddWithValue("trust_level", plugin.TrustLevel.ToString().ToLowerInvariant()); + cmd.Parameters.AddWithValue("capabilities", plugin.Capabilities.ToStringArray()); + cmd.Parameters.AddWithValue("capability_details", JsonSerializer.Serialize(new { })); + cmd.Parameters.AddWithValue("source", "installed"); + cmd.Parameters.AddWithValue("assembly_path", (object?)plugin.Manifest?.AssemblyPath ?? DBNull.Value); + cmd.Parameters.AddWithValue("entry_point", (object?)plugin.Manifest?.EntryPoint ?? DBNull.Value); + cmd.Parameters.AddWithValue("status", plugin.State.ToString().ToLowerInvariant()); + cmd.Parameters.AddWithValue("manifest", plugin.Manifest != null + ? JsonSerializer.Serialize(plugin.Manifest) + : DBNull.Value); + cmd.Parameters.AddWithValue("now", now); + + await using var reader = await cmd.ExecuteReaderAsync(ct); + if (await reader.ReadAsync(ct)) + { + var record = MapPluginRecord(reader); + + // Register capabilities + if (plugin.Manifest?.Capabilities.Count > 0) + { + var capRecords = plugin.Manifest.Capabilities.Select(c => new PluginCapabilityRecord + { + Id = Guid.NewGuid(), + PluginId = record.Id, + CapabilityType = c.Type, + CapabilityId = c.Id ?? 
c.Type, + ConfigSchema = c.ConfigSchema, + Metadata = c.Metadata, + IsEnabled = true, + CreatedAt = now + }); + + await RegisterCapabilitiesAsync(record.Id, capRecords, ct); + } + + _logger.LogDebug("Registered plugin {PluginId} with DB ID {DbId}", plugin.Info.Id, record.Id); + return record; + } + + throw new InvalidOperationException($"Failed to register plugin {plugin.Info.Id}"); + } + + public async Task UpdateStatusAsync(string pluginId, PluginLifecycleState status, string? message = null, CancellationToken ct = default) + { + const string sql = """ + UPDATE platform.plugins + SET status = @status, status_message = @message, updated_at = @now + WHERE plugin_id = @plugin_id + """; + + await using var conn = await _dataSource.OpenConnectionAsync(ct); + await using var cmd = new NpgsqlCommand(sql, conn); + + cmd.Parameters.AddWithValue("plugin_id", pluginId); + cmd.Parameters.AddWithValue("status", status.ToString().ToLowerInvariant()); + cmd.Parameters.AddWithValue("message", (object?)message ?? DBNull.Value); + cmd.Parameters.AddWithValue("now", _timeProvider.GetUtcNow()); + + await cmd.ExecuteNonQueryAsync(ct); + } + + public async Task UpdateHealthAsync(string pluginId, HealthStatus status, HealthCheckResult? 
result = null, CancellationToken ct = default) + { + const string sql = """ + UPDATE platform.plugins + SET health_status = @health_status, last_health_check = @now, updated_at = @now + WHERE plugin_id = @plugin_id + """; + + await using var conn = await _dataSource.OpenConnectionAsync(ct); + await using var cmd = new NpgsqlCommand(sql, conn); + + var now = _timeProvider.GetUtcNow(); + cmd.Parameters.AddWithValue("plugin_id", pluginId); + cmd.Parameters.AddWithValue("health_status", status.ToString().ToLowerInvariant()); + cmd.Parameters.AddWithValue("now", now); + + await cmd.ExecuteNonQueryAsync(ct); + + // Record health history + if (result != null) + { + await RecordHealthCheckAsync(pluginId, result, ct); + } + } + + public async Task UnregisterAsync(string pluginId, CancellationToken ct) + { + const string sql = "DELETE FROM platform.plugins WHERE plugin_id = @plugin_id"; + + await using var conn = await _dataSource.OpenConnectionAsync(ct); + await using var cmd = new NpgsqlCommand(sql, conn); + + cmd.Parameters.AddWithValue("plugin_id", pluginId); + await cmd.ExecuteNonQueryAsync(ct); + + _logger.LogDebug("Unregistered plugin {PluginId}", pluginId); + } + + public async Task GetAsync(string pluginId, CancellationToken ct) + { + const string sql = "SELECT * FROM platform.plugins WHERE plugin_id = @plugin_id"; + + await using var conn = await _dataSource.OpenConnectionAsync(ct); + await using var cmd = new NpgsqlCommand(sql, conn); + + cmd.Parameters.AddWithValue("plugin_id", pluginId); + + await using var reader = await cmd.ExecuteReaderAsync(ct); + return await reader.ReadAsync(ct) ? 
MapPluginRecord(reader) : null; + } + + public async Task> GetAllAsync(CancellationToken ct) + { + const string sql = "SELECT * FROM platform.plugins ORDER BY name"; + + await using var conn = await _dataSource.OpenConnectionAsync(ct); + await using var cmd = new NpgsqlCommand(sql, conn); + + var results = new List(); + await using var reader = await cmd.ExecuteReaderAsync(ct); + + while (await reader.ReadAsync(ct)) + { + results.Add(MapPluginRecord(reader)); + } + + return results; + } + + public async Task> GetByCapabilityAsync(PluginCapabilities capability, CancellationToken ct) + { + var capabilityStrings = capability.ToStringArray(); + + const string sql = """ + SELECT * FROM platform.plugins + WHERE capabilities && @capabilities + AND status = 'active' + ORDER BY name + """; + + await using var conn = await _dataSource.OpenConnectionAsync(ct); + await using var cmd = new NpgsqlCommand(sql, conn); + + cmd.Parameters.AddWithValue("capabilities", capabilityStrings); + + var results = new List(); + await using var reader = await cmd.ExecuteReaderAsync(ct); + + while (await reader.ReadAsync(ct)) + { + results.Add(MapPluginRecord(reader)); + } + + return results; + } + + public async Task> GetByCapabilityTypeAsync( + string capabilityType, + string? 
capabilityId = null, + CancellationToken ct = default) + { + var sql = """ + SELECT p.* FROM platform.plugins p + INNER JOIN platform.plugin_capabilities c ON c.plugin_id = p.id + WHERE c.capability_type = @capability_type + AND c.is_enabled = TRUE + AND p.status = 'active' + """; + + if (capabilityId != null) + { + sql += " AND c.capability_id = @capability_id"; + } + + sql += " ORDER BY p.name"; + + await using var conn = await _dataSource.OpenConnectionAsync(ct); + await using var cmd = new NpgsqlCommand(sql, conn); + + cmd.Parameters.AddWithValue("capability_type", capabilityType); + if (capabilityId != null) + { + cmd.Parameters.AddWithValue("capability_id", capabilityId); + } + + var results = new List(); + await using var reader = await cmd.ExecuteReaderAsync(ct); + + while (await reader.ReadAsync(ct)) + { + results.Add(MapPluginRecord(reader)); + } + + return results; + } + + public async Task RegisterCapabilitiesAsync( + Guid pluginDbId, + IEnumerable capabilities, + CancellationToken ct) + { + const string sql = """ + INSERT INTO platform.plugin_capabilities ( + id, plugin_id, capability_type, capability_id, + config_schema, metadata, is_enabled, created_at + ) VALUES ( + @id, @plugin_id, @capability_type, @capability_id, + @config_schema, @metadata, @is_enabled, @created_at + ) + ON CONFLICT (plugin_id, capability_type, capability_id) DO UPDATE SET + config_schema = @config_schema, + metadata = @metadata + """; + + await using var conn = await _dataSource.OpenConnectionAsync(ct); + await using var batch = new NpgsqlBatch(conn); + + foreach (var cap in capabilities) + { + var cmd = new NpgsqlBatchCommand(sql); + cmd.Parameters.AddWithValue("id", cap.Id); + cmd.Parameters.AddWithValue("plugin_id", pluginDbId); + cmd.Parameters.AddWithValue("capability_type", cap.CapabilityType); + cmd.Parameters.AddWithValue("capability_id", cap.CapabilityId); + cmd.Parameters.AddWithValue("config_schema", cap.ConfigSchema != null + ? 
JsonSerializer.Serialize(cap.ConfigSchema) + : DBNull.Value); + cmd.Parameters.AddWithValue("metadata", cap.Metadata != null + ? JsonSerializer.Serialize(cap.Metadata) + : DBNull.Value); + cmd.Parameters.AddWithValue("is_enabled", cap.IsEnabled); + cmd.Parameters.AddWithValue("created_at", cap.CreatedAt); + batch.BatchCommands.Add(cmd); + } + + await batch.ExecuteNonQueryAsync(ct); + } + + // ========== Instance Management ========== + + public async Task CreateInstanceAsync(CreatePluginInstanceRequest request, CancellationToken ct) + { + const string sql = """ + INSERT INTO platform.plugin_instances ( + plugin_id, tenant_id, instance_name, config, secrets_path, + resource_limits, enabled, status, created_at, updated_at + ) + SELECT p.id, @tenant_id, @instance_name, @config, @secrets_path, + @resource_limits, TRUE, 'pending', @now, @now + FROM platform.plugins p + WHERE p.plugin_id = @plugin_id + RETURNING * + """; + + await using var conn = await _dataSource.OpenConnectionAsync(ct); + await using var cmd = new NpgsqlCommand(sql, conn); + + var now = _timeProvider.GetUtcNow(); + cmd.Parameters.AddWithValue("plugin_id", request.PluginId); + cmd.Parameters.AddWithValue("tenant_id", (object?)request.TenantId ?? DBNull.Value); + cmd.Parameters.AddWithValue("instance_name", (object?)request.InstanceName ?? DBNull.Value); + cmd.Parameters.AddWithValue("config", JsonSerializer.Serialize(request.Config)); + cmd.Parameters.AddWithValue("secrets_path", (object?)request.SecretsPath ?? DBNull.Value); + cmd.Parameters.AddWithValue("resource_limits", request.ResourceLimits != null + ? 
JsonSerializer.Serialize(request.ResourceLimits) + : DBNull.Value); + cmd.Parameters.AddWithValue("now", now); + + await using var reader = await cmd.ExecuteReaderAsync(ct); + if (await reader.ReadAsync(ct)) + { + return MapInstanceRecord(reader); + } + + throw new InvalidOperationException($"Failed to create instance for plugin {request.PluginId}"); + } + + public async Task RecordHealthCheckAsync(string pluginId, HealthCheckResult result, CancellationToken ct) + { + const string sql = """ + INSERT INTO platform.plugin_health_history ( + plugin_id, checked_at, status, response_time_ms, details, created_at + ) + SELECT p.id, @checked_at, @status, @response_time_ms, @details, @checked_at + FROM platform.plugins p + WHERE p.plugin_id = @plugin_id + """; + + await using var conn = await _dataSource.OpenConnectionAsync(ct); + await using var cmd = new NpgsqlCommand(sql, conn); + + cmd.Parameters.AddWithValue("plugin_id", pluginId); + cmd.Parameters.AddWithValue("checked_at", _timeProvider.GetUtcNow()); + cmd.Parameters.AddWithValue("status", result.Status.ToString().ToLowerInvariant()); + cmd.Parameters.AddWithValue("response_time_ms", result.Duration?.TotalMilliseconds ?? 0); + cmd.Parameters.AddWithValue("details", result.Details != null + ? JsonSerializer.Serialize(result.Details) + : DBNull.Value); + + await cmd.ExecuteNonQueryAsync(ct); + } + + // ... additional method implementations ... + + private static PluginRecord MapPluginRecord(NpgsqlDataReader reader) => new() + { + Id = reader.GetGuid(reader.GetOrdinal("id")), + PluginId = reader.GetString(reader.GetOrdinal("plugin_id")), + Name = reader.GetString(reader.GetOrdinal("name")), + Version = reader.GetString(reader.GetOrdinal("version")), + Vendor = reader.GetString(reader.GetOrdinal("vendor")), + Description = reader.IsDBNull(reader.GetOrdinal("description")) ? 
null : reader.GetString(reader.GetOrdinal("description")),
+        TrustLevel = Enum.Parse<PluginTrustLevel>(reader.GetString(reader.GetOrdinal("trust_level")), ignoreCase: true),
+        Capabilities = PluginCapabilitiesExtensions.FromStringArray(reader.GetFieldValue<string[]>(reader.GetOrdinal("capabilities"))),
+        Status = Enum.Parse<PluginLifecycleState>(reader.GetString(reader.GetOrdinal("status")), ignoreCase: true),
+        HealthStatus = reader.IsDBNull(reader.GetOrdinal("health_status"))
+            ? HealthStatus.Unknown
+            : Enum.Parse<HealthStatus>(reader.GetString(reader.GetOrdinal("health_status")), ignoreCase: true),
+        CreatedAt = reader.GetFieldValue<DateTimeOffset>(reader.GetOrdinal("created_at")),
+        UpdatedAt = reader.GetFieldValue<DateTimeOffset>(reader.GetOrdinal("updated_at")),
+        LoadedAt = reader.IsDBNull(reader.GetOrdinal("loaded_at")) ? null : reader.GetFieldValue<DateTimeOffset>(reader.GetOrdinal("loaded_at"))
+    };
+
+    private static PluginInstanceRecord MapInstanceRecord(NpgsqlDataReader reader) => new()
+    {
+        Id = reader.GetGuid(reader.GetOrdinal("id")),
+        PluginId = reader.GetGuid(reader.GetOrdinal("plugin_id")),
+        TenantId = reader.IsDBNull(reader.GetOrdinal("tenant_id")) ? null : reader.GetGuid(reader.GetOrdinal("tenant_id")),
+        InstanceName = reader.IsDBNull(reader.GetOrdinal("instance_name")) ? null : reader.GetString(reader.GetOrdinal("instance_name")),
+        Config = JsonDocument.Parse(reader.GetString(reader.GetOrdinal("config"))),
+        SecretsPath = reader.IsDBNull(reader.GetOrdinal("secrets_path")) ?
null : reader.GetString(reader.GetOrdinal("secrets_path")), + Enabled = reader.GetBoolean(reader.GetOrdinal("enabled")), + Status = reader.GetString(reader.GetOrdinal("status")), + CreatedAt = reader.GetFieldValue(reader.GetOrdinal("created_at")), + UpdatedAt = reader.GetFieldValue(reader.GetOrdinal("updated_at")) + }; +} +``` + +### Database Migration + +```sql +-- Migrations/001_CreatePluginTables.sql + +-- Plugin registry table +CREATE TABLE IF NOT EXISTS platform.plugins ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + plugin_id VARCHAR(255) NOT NULL, + name VARCHAR(255) NOT NULL, + version VARCHAR(50) NOT NULL, + vendor VARCHAR(255) NOT NULL, + description TEXT, + license_id VARCHAR(50), + + -- Trust and security + trust_level VARCHAR(50) NOT NULL CHECK (trust_level IN ('builtin', 'trusted', 'untrusted')), + signature BYTEA, + signing_key_id VARCHAR(255), + + -- Capabilities + capabilities TEXT[] NOT NULL DEFAULT '{}', + capability_details JSONB NOT NULL DEFAULT '{}', + + -- Source and deployment + source VARCHAR(50) NOT NULL CHECK (source IN ('bundled', 'installed', 'discovered')), + assembly_path VARCHAR(500), + entry_point VARCHAR(255), + + -- Lifecycle + status VARCHAR(50) NOT NULL DEFAULT 'discovered' CHECK (status IN ( + 'discovered', 'loading', 'initializing', 'active', + 'degraded', 'stopping', 'stopped', 'failed', 'unloading' + )), + status_message TEXT, + + -- Health + health_status VARCHAR(50) DEFAULT 'unknown' CHECK (health_status IN ( + 'unknown', 'healthy', 'degraded', 'unhealthy' + )), + last_health_check TIMESTAMPTZ, + health_check_failures INT NOT NULL DEFAULT 0, + + -- Metadata + manifest JSONB, + runtime_info JSONB, + + -- Audit + created_at TIMESTAMPTZ NOT NULL DEFAULT now(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT now(), + loaded_at TIMESTAMPTZ, + + UNIQUE(plugin_id, version) +); + +-- Plugin capabilities +CREATE TABLE IF NOT EXISTS platform.plugin_capabilities ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + plugin_id UUID NOT 
NULL REFERENCES platform.plugins(id) ON DELETE CASCADE,
+
+    capability_type VARCHAR(100) NOT NULL,
+    capability_id VARCHAR(255) NOT NULL,
+
+    config_schema JSONB,
+    input_schema JSONB,
+    output_schema JSONB,
+
+    display_name VARCHAR(255),
+    description TEXT,
+    documentation_url VARCHAR(500),
+
+    is_enabled BOOLEAN NOT NULL DEFAULT TRUE,
+    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
+
+    UNIQUE(plugin_id, capability_type, capability_id)
+);
+
+-- Plugin instances (for multi-tenant)
+CREATE TABLE IF NOT EXISTS platform.plugin_instances (
+    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+    plugin_id UUID NOT NULL REFERENCES platform.plugins(id) ON DELETE CASCADE,
+    tenant_id UUID REFERENCES platform.tenants(id) ON DELETE CASCADE,
+
+    instance_name VARCHAR(255),
+    config JSONB NOT NULL DEFAULT '{}',
+    secrets_path VARCHAR(500),
+
+    enabled BOOLEAN NOT NULL DEFAULT TRUE,
+    status VARCHAR(50) NOT NULL DEFAULT 'pending',
+
+    resource_limits JSONB,
+
+    last_used_at TIMESTAMPTZ,
+    invocation_count BIGINT NOT NULL DEFAULT 0,
+    error_count BIGINT NOT NULL DEFAULT 0,
+
+    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
+    updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
+);
+
+-- UNIQUE table constraints cannot contain expressions in PostgreSQL, so the
+-- per-plugin/tenant uniqueness of (possibly unnamed) instances is enforced
+-- with a unique expression index instead.
+CREATE UNIQUE INDEX IF NOT EXISTS uq_plugin_instances_identity
+    ON platform.plugin_instances (plugin_id, tenant_id, COALESCE(instance_name, ''));
+
+-- Plugin health history (partitioned). Note: a primary key on a partitioned
+-- table must include the partition key column.
+CREATE TABLE IF NOT EXISTS platform.plugin_health_history (
+    id UUID NOT NULL DEFAULT gen_random_uuid(),
+    plugin_id UUID NOT NULL REFERENCES platform.plugins(id) ON DELETE CASCADE,
+
+    checked_at TIMESTAMPTZ NOT NULL DEFAULT now(),
+    status VARCHAR(50) NOT NULL,
+    response_time_ms INT,
+    details JSONB,
+
+    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
+
+    PRIMARY KEY (id, created_at)
+) PARTITION BY RANGE (created_at);
+
+-- Create a partition covering the last 30 days. Partition bounds are
+-- evaluated once at creation time, so a maintenance job must roll new
+-- partitions forward as time advances.
+CREATE TABLE IF NOT EXISTS platform.plugin_health_history_current
+    PARTITION OF platform.plugin_health_history
+    FOR VALUES FROM (CURRENT_DATE - INTERVAL '30 days') TO (CURRENT_DATE + INTERVAL '1 day');
+
+-- Indexes
+CREATE INDEX IF NOT EXISTS
idx_plugins_plugin_id ON platform.plugins(plugin_id); +CREATE INDEX IF NOT EXISTS idx_plugins_status ON platform.plugins(status) WHERE status != 'active'; +CREATE INDEX IF NOT EXISTS idx_plugins_trust_level ON platform.plugins(trust_level); +CREATE INDEX IF NOT EXISTS idx_plugins_capabilities ON platform.plugins USING GIN (capabilities); +CREATE INDEX IF NOT EXISTS idx_plugins_health ON platform.plugins(health_status) WHERE health_status != 'healthy'; + +CREATE INDEX IF NOT EXISTS idx_plugin_capabilities_type ON platform.plugin_capabilities(capability_type); +CREATE INDEX IF NOT EXISTS idx_plugin_capabilities_lookup ON platform.plugin_capabilities(capability_type, capability_id); +CREATE INDEX IF NOT EXISTS idx_plugin_capabilities_plugin ON platform.plugin_capabilities(plugin_id); + +CREATE INDEX IF NOT EXISTS idx_plugin_instances_tenant ON platform.plugin_instances(tenant_id) WHERE tenant_id IS NOT NULL; +CREATE INDEX IF NOT EXISTS idx_plugin_instances_plugin ON platform.plugin_instances(plugin_id); +CREATE INDEX IF NOT EXISTS idx_plugin_instances_enabled ON platform.plugin_instances(plugin_id, enabled) WHERE enabled = TRUE; + +CREATE INDEX IF NOT EXISTS idx_plugin_health_history_plugin ON platform.plugin_health_history(plugin_id, checked_at DESC); +``` + +--- + +## Acceptance Criteria + +- [ ] `IPluginRegistry` interface with all methods +- [ ] PostgreSQL implementation +- [ ] Plugin registration and unregistration +- [ ] Status updates +- [ ] Health updates and history +- [ ] Capability registration and queries +- [ ] Capability type/id lookup +- [ ] Instance creation +- [ ] Instance configuration updates +- [ ] Instance enable/disable +- [ ] Tenant-scoped instance queries +- [ ] Database migration scripts +- [ ] Partitioned health history table +- [ ] Integration tests with PostgreSQL +- [ ] Test coverage >= 80% + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 100_001 Plugin Abstractions | Internal | TODO | +| 
100_002 Plugin Host | Internal | TODO | +| PostgreSQL 16+ | External | Available | +| Npgsql 8.x | External | Available | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| IPluginRegistry interface | TODO | | +| PostgresPluginRegistry | TODO | | +| PluginRecord model | TODO | | +| PluginCapabilityRecord model | TODO | | +| PluginInstanceRecord model | TODO | | +| PluginHealthRecord model | TODO | | +| Database migration | TODO | | +| Integration tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_100_004_PLUGIN_sandbox.md b/docs/implplan/SPRINT_20260110_100_004_PLUGIN_sandbox.md new file mode 100644 index 000000000..0d1da0b94 --- /dev/null +++ b/docs/implplan/SPRINT_20260110_100_004_PLUGIN_sandbox.md @@ -0,0 +1,1134 @@ +# SPRINT: Plugin Sandbox Infrastructure + +> **Sprint ID:** 100_004 +> **Module:** PLUGIN +> **Phase:** 100 - Plugin System Unification +> **Status:** TODO +> **Parent:** [100_000_INDEX](SPRINT_20260110_100_000_INDEX_plugin_unification.md) + +--- + +## Overview + +Implement the plugin sandbox infrastructure that provides process isolation, resource limits, and security boundaries for untrusted plugins. The sandbox ensures that third-party plugins cannot compromise platform stability or security. 
+ +### Objectives + +- Implement process-based plugin isolation +- Implement resource limits (CPU, memory, disk, network) +- Implement gRPC communication bridge +- Implement network policy enforcement +- Implement filesystem isolation +- Implement secret proxy for controlled vault access + +### Working Directory + +``` +src/Plugin/ +├── StellaOps.Plugin.Sandbox/ +│ ├── StellaOps.Plugin.Sandbox.csproj +│ ├── ISandbox.cs +│ ├── ISandboxFactory.cs +│ ├── ProcessSandbox.cs +│ ├── SandboxConfiguration.cs +│ ├── Process/ +│ │ ├── PluginProcessManager.cs +│ │ ├── PluginProcessHost.cs +│ │ └── ProcessMonitor.cs +│ ├── Communication/ +│ │ ├── GrpcPluginBridge.cs +│ │ ├── PluginServiceImpl.cs +│ │ └── Proto/ +│ │ └── plugin_bridge.proto +│ ├── Resources/ +│ │ ├── IResourceLimiter.cs +│ │ ├── LinuxResourceLimiter.cs +│ │ ├── WindowsResourceLimiter.cs +│ │ └── ResourceUsage.cs +│ ├── Network/ +│ │ ├── INetworkPolicy.cs +│ │ ├── NetworkPolicyEnforcer.cs +│ │ └── AllowedHostsFilter.cs +│ ├── Filesystem/ +│ │ ├── IFilesystemPolicy.cs +│ │ ├── SandboxedFilesystem.cs +│ │ └── FilesystemMount.cs +│ └── Secrets/ +│ ├── ISecretProxy.cs +│ └── ScopedSecretProxy.cs +├── StellaOps.Plugin.Sandbox.Host/ +│ ├── StellaOps.Plugin.Sandbox.Host.csproj +│ ├── Program.cs +│ └── PluginHostService.cs +└── __Tests/ + └── StellaOps.Plugin.Sandbox.Tests/ + ├── ProcessSandboxTests.cs + ├── ResourceLimiterTests.cs + └── NetworkPolicyTests.cs +``` + +--- + +## Deliverables + +### Sandbox Interface + +```csharp +// ISandbox.cs +namespace StellaOps.Plugin.Sandbox; + +/// +/// Provides isolated execution environment for untrusted plugins. +/// +public interface ISandbox : IAsyncDisposable +{ + /// + /// Sandbox identifier. + /// + string Id { get; } + + /// + /// Current sandbox state. + /// + SandboxState State { get; } + + /// + /// Current resource usage. + /// + ResourceUsage CurrentUsage { get; } + + /// + /// Start the sandbox and load the plugin. 
+ /// + Task StartAsync(PluginManifest manifest, CancellationToken ct); + + /// + /// Stop the sandbox gracefully. + /// + Task StopAsync(TimeSpan timeout, CancellationToken ct); + + /// + /// Execute an operation in the sandbox. + /// + Task<object?> ExecuteAsync( + string operationName, + object? parameters, + TimeSpan timeout, + CancellationToken ct); + + /// + /// Execute a streaming operation in the sandbox. + /// + IAsyncEnumerable<StreamingEvent> ExecuteStreamingAsync( + string operationName, + object? parameters, + CancellationToken ct); + + /// + /// Perform health check on sandboxed plugin. + /// + Task<HealthCheckResult> HealthCheckAsync(CancellationToken ct); + + /// + /// Event raised when sandbox state changes. + /// + event EventHandler<SandboxStateChangedEventArgs>? StateChanged; + + /// + /// Event raised when resource limits are approached. + /// + event EventHandler<ResourceWarningEventArgs>? ResourceWarning; +} + +public enum SandboxState +{ + Created, + Starting, + Running, + Stopping, + Stopped, + Failed, + Killed +} + +public sealed class SandboxStateChangedEventArgs : EventArgs +{ + public required SandboxState OldState { get; init; } + public required SandboxState NewState { get; init; } + public string? Reason { get; init; } +} + +public sealed class ResourceWarningEventArgs : EventArgs +{ + public required ResourceType Resource { get; init; } + public required double CurrentUsagePercent { get; init; } + public required double ThresholdPercent { get; init; } +} + +public enum ResourceType +{ + Memory, + Cpu, + Disk, + Network +} +``` + +### Sandbox Configuration + +```csharp +// SandboxConfiguration.cs +namespace StellaOps.Plugin.Sandbox; + +/// +/// Configuration for plugin sandbox. +/// +public sealed record SandboxConfiguration +{ + /// + /// Resource limits for the sandbox. + /// + public required ResourceLimits ResourceLimits { get; init; } + + /// + /// Network policy for the sandbox. + /// + public required NetworkPolicy NetworkPolicy { get; init; } + + /// + /// Filesystem policy for the sandbox.
+ /// + public required FilesystemPolicy FilesystemPolicy { get; init; } + + /// + /// Timeouts for sandbox operations. + /// + public required SandboxTimeouts Timeouts { get; init; } + + /// + /// Whether to enable process isolation. + /// + public bool ProcessIsolation { get; init; } = true; + + /// + /// Working directory for the sandbox. + /// + public string? WorkingDirectory { get; init; } + + /// + /// Environment variables to pass to the sandbox. + /// + public IReadOnlyDictionary<string, string> EnvironmentVariables { get; init; } = + new Dictionary<string, string>(); + + /// + /// Default configuration for untrusted plugins. + /// + public static SandboxConfiguration Default => new() + { + ResourceLimits = new ResourceLimits + { + MaxMemoryMb = 512, + MaxCpuPercent = 25, + MaxDiskMb = 100, + MaxNetworkBandwidthMbps = 10 + }, + NetworkPolicy = new NetworkPolicy + { + AllowedHosts = new HashSet<string>(), + BlockedPorts = new HashSet<int> { 22, 3389, 5432, 27017, 6379 } + }, + FilesystemPolicy = new FilesystemPolicy + { + ReadOnlyPaths = new List<string>(), + WritablePaths = new List<string>(), + BlockedPaths = new List<string> { "/etc", "/var", "/root", "C:\\Windows" } + }, + Timeouts = new SandboxTimeouts + { + StartupTimeout = TimeSpan.FromSeconds(30), + OperationTimeout = TimeSpan.FromSeconds(60), + ShutdownTimeout = TimeSpan.FromSeconds(10), + HealthCheckTimeout = TimeSpan.FromSeconds(5) + } + }; +} + +public sealed record ResourceLimits +{ + public int MaxMemoryMb { get; init; } = 512; + public int MaxCpuPercent { get; init; } = 25; + public int MaxDiskMb { get; init; } = 100; + public int MaxNetworkBandwidthMbps { get; init; } = 10; + public int MaxOpenFiles { get; init; } = 1000; + public int MaxProcesses { get; init; } = 10; +} + +public sealed record NetworkPolicy +{ + public IReadOnlySet<string> AllowedHosts { get; init; } = new HashSet<string>(); + public IReadOnlySet<string> BlockedHosts { get; init; } = new HashSet<string>(); + public IReadOnlySet<int> AllowedPorts { get; init; } = new HashSet<int> { 80, 443 }; + public IReadOnlySet<int> BlockedPorts { get; init; } = new HashSet<int>(); + public bool AllowDns { get; init; } = true; + public int MaxConnectionsPerHost { get; init; } = 10; +} + +public sealed record FilesystemPolicy +{ + public IReadOnlyList<string> ReadOnlyPaths { get; init; } = new List<string>(); + public IReadOnlyList<string> WritablePaths { get; init; } = new List<string>(); + public IReadOnlyList<string> BlockedPaths { get; init; } = new List<string>(); + public long MaxWriteBytes { get; init; } = 100 * 1024 * 1024; // 100 MB +} + +public sealed record SandboxTimeouts +{ + public TimeSpan StartupTimeout { get; init; } = TimeSpan.FromSeconds(30); + public TimeSpan OperationTimeout { get; init; } = TimeSpan.FromSeconds(60); + public TimeSpan ShutdownTimeout { get; init; } = TimeSpan.FromSeconds(10); + public TimeSpan HealthCheckTimeout { get; init; } = TimeSpan.FromSeconds(5); +} +``` + +### Process Sandbox Implementation + +```csharp +// ProcessSandbox.cs +namespace StellaOps.Plugin.Sandbox; + +public sealed class ProcessSandbox : ISandbox +{ + private readonly SandboxConfiguration _config; + private readonly IPluginProcessManager _processManager; + private readonly IGrpcPluginBridge _bridge; + private readonly IResourceLimiter _resourceLimiter; + private readonly INetworkPolicyEnforcer _networkEnforcer; + private readonly ILogger _logger; + + private Process? _process; + private SandboxState _state = SandboxState.Created; + private ResourceUsage _currentUsage = new(); + + public string Id { get; } + public SandboxState State => _state; + public ResourceUsage CurrentUsage => _currentUsage; + + public event EventHandler<SandboxStateChangedEventArgs>? StateChanged; + public event EventHandler<ResourceWarningEventArgs>?
ResourceWarning; + + public ProcessSandbox( + string id, + SandboxConfiguration config, + IPluginProcessManager processManager, + IGrpcPluginBridge bridge, + IResourceLimiter resourceLimiter, + INetworkPolicyEnforcer networkEnforcer, + ILogger logger) + { + Id = id; + _config = config; + _processManager = processManager; + _bridge = bridge; + _resourceLimiter = resourceLimiter; + _networkEnforcer = networkEnforcer; + _logger = logger; + } + + public async Task StartAsync(PluginManifest manifest, CancellationToken ct) + { + TransitionState(SandboxState.Starting); + + try + { + // 1. Create isolated working directory + var workDir = PrepareWorkingDirectory(manifest); + + // 2. Configure resource limits + var resourceConfig = _resourceLimiter.CreateConfiguration(_config.ResourceLimits); + + // 3. Configure network policy + await _networkEnforcer.ApplyPolicyAsync(Id, _config.NetworkPolicy, ct); + + // 4. Start the plugin host process + var socketPath = GetSocketPath(); + _process = await _processManager.StartAsync(new ProcessStartRequest + { + PluginAssemblyPath = manifest.AssemblyPath!, + EntryPoint = manifest.EntryPoint, + WorkingDirectory = workDir, + SocketPath = socketPath, + ResourceConfiguration = resourceConfig, + EnvironmentVariables = _config.EnvironmentVariables + }, ct); + + // 5. Wait for the process to be ready + await WaitForReadyAsync(ct); + + // 6. Connect gRPC bridge + await _bridge.ConnectAsync(socketPath, ct); + + // 7. Initialize the plugin + await _bridge.InitializePluginAsync(manifest, ct); + + // 8. 
Start resource monitoring + StartResourceMonitoring(); + + TransitionState(SandboxState.Running); + + _logger.LogInformation("Sandbox {Id} started for plugin {PluginId}", + Id, manifest.Info.Id); + } + catch (Exception ex) + { + _logger.LogError(ex, "Failed to start sandbox {Id}", Id); + TransitionState(SandboxState.Failed, ex.Message); + throw; + } + } + + public async Task StopAsync(TimeSpan timeout, CancellationToken ct) + { + if (_state != SandboxState.Running) + return; + + TransitionState(SandboxState.Stopping); + + try + { + // 1. Signal graceful shutdown via gRPC + using var timeoutCts = CancellationTokenSource.CreateLinkedTokenSource(ct); + timeoutCts.CancelAfter(timeout); + + try + { + await _bridge.ShutdownPluginAsync(timeoutCts.Token); + } + catch (OperationCanceledException) + { + _logger.LogWarning("Sandbox {Id} did not shutdown gracefully, killing", Id); + } + + // 2. Disconnect bridge + await _bridge.DisconnectAsync(ct); + + // 3. Stop the process + await _processManager.StopAsync(_process!, timeout, ct); + + // 4. Cleanup network policy + await _networkEnforcer.RemovePolicyAsync(Id, ct); + + // 5. Cleanup working directory + CleanupWorkingDirectory(); + + TransitionState(SandboxState.Stopped); + + _logger.LogInformation("Sandbox {Id} stopped", Id); + } + catch (Exception ex) + { + _logger.LogError(ex, "Error stopping sandbox {Id}", Id); + TransitionState(SandboxState.Failed, ex.Message); + throw; + } + } + + public async Task<object?> ExecuteAsync( + string operationName, + object? parameters, + TimeSpan timeout, + CancellationToken ct) + { + EnsureRunning(); + + using var timeoutCts = CancellationTokenSource.CreateLinkedTokenSource(ct); + timeoutCts.CancelAfter(timeout); + + try + { + return await _bridge.InvokeAsync(operationName, parameters, timeoutCts.Token); + } + catch (OperationCanceledException) when (timeoutCts.IsCancellationRequested && !ct.IsCancellationRequested) + { + throw new TimeoutException($"Operation '{operationName}' timed out after {timeout}"); + } + } + + public async IAsyncEnumerable<StreamingEvent> ExecuteStreamingAsync( + string operationName, + object? parameters, + [EnumeratorCancellation] CancellationToken ct) + { + EnsureRunning(); + + await foreach (var evt in _bridge.InvokeStreamingAsync(operationName, parameters, ct)) + { + yield return evt; + } + } + + public async Task<HealthCheckResult> HealthCheckAsync(CancellationToken ct) + { + if (_state != SandboxState.Running) + { + return HealthCheckResult.Unhealthy($"Sandbox is in state {_state}"); + } + + try + { + using var timeoutCts = CancellationTokenSource.CreateLinkedTokenSource(ct); + timeoutCts.CancelAfter(_config.Timeouts.HealthCheckTimeout); + + var result = await _bridge.HealthCheckAsync(timeoutCts.Token); + + // Add resource usage to details + var details = new Dictionary<string, object?>(result.Details ?? new Dictionary<string, object?>()) + { + ["sandboxId"] = Id, + ["memoryUsageMb"] = _currentUsage.MemoryUsageMb, + ["cpuUsagePercent"] = _currentUsage.CpuUsagePercent + }; + + return result with { Details = details }; + } + catch (Exception ex) + { + return HealthCheckResult.Unhealthy(ex); + } + } + + private void EnsureRunning() + { + if (_state != SandboxState.Running) + { + throw new InvalidOperationException($"Sandbox is not running (state: {_state})"); + } + } + + private void TransitionState(SandboxState newState, string?
reason = null) + { + var oldState = _state; + _state = newState; + + StateChanged?.Invoke(this, new SandboxStateChangedEventArgs + { + OldState = oldState, + NewState = newState, + Reason = reason + }); + } + + private string PrepareWorkingDirectory(PluginManifest manifest) + { + var workDir = _config.WorkingDirectory + ?? Path.Combine(Path.GetTempPath(), "stellaops-sandbox", Id); + + if (Directory.Exists(workDir)) + Directory.Delete(workDir, recursive: true); + + Directory.CreateDirectory(workDir); + + // Copy plugin files to sandbox directory + var pluginDir = Path.GetDirectoryName(manifest.AssemblyPath)!; + CopyDirectory(pluginDir, workDir); + + return workDir; + } + + private void CleanupWorkingDirectory() + { + var workDir = _config.WorkingDirectory + ?? Path.Combine(Path.GetTempPath(), "stellaops-sandbox", Id); + + if (Directory.Exists(workDir)) + { + try + { + Directory.Delete(workDir, recursive: true); + } + catch (Exception ex) + { + _logger.LogWarning(ex, "Failed to cleanup sandbox directory {WorkDir}", workDir); + } + } + } + + private string GetSocketPath() + { + if (OperatingSystem.IsWindows()) + { + return $"\\\\.\\pipe\\stellaops-sandbox-{Id}"; + } + else + { + return Path.Combine(Path.GetTempPath(), $"stellaops-sandbox-{Id}.sock"); + } + } + + private async Task WaitForReadyAsync(CancellationToken ct) + { + using var timeoutCts = CancellationTokenSource.CreateLinkedTokenSource(ct); + timeoutCts.CancelAfter(_config.Timeouts.StartupTimeout); + + while (!timeoutCts.IsCancellationRequested) + { + if (_process?.HasExited == true) + { + throw new InvalidOperationException( + $"Plugin process exited with code {_process.ExitCode}"); + } + + if (File.Exists(GetSocketPath()) || OperatingSystem.IsWindows()) + { + // Try to connect + try + { + await _bridge.ConnectAsync(GetSocketPath(), timeoutCts.Token); + return; + } + catch + { + // Not ready yet + } + } + + await Task.Delay(100, timeoutCts.Token); + } + + throw new TimeoutException("Plugin process did not 
become ready in time"); + } + + private void StartResourceMonitoring() + { + _ = Task.Run(async () => + { + while (_state == SandboxState.Running) + { + try + { + _currentUsage = await _resourceLimiter.GetUsageAsync(_process!, default); + + // Check thresholds + CheckResourceThreshold(ResourceType.Memory, + _currentUsage.MemoryUsageMb, + _config.ResourceLimits.MaxMemoryMb); + + CheckResourceThreshold(ResourceType.Cpu, + _currentUsage.CpuUsagePercent, + _config.ResourceLimits.MaxCpuPercent); + + await Task.Delay(1000); + } + catch (Exception ex) + { + _logger.LogError(ex, "Error monitoring resources for sandbox {Id}", Id); + } + } + }); + } + + private void CheckResourceThreshold(ResourceType resource, double current, double max) + { + var percent = (current / max) * 100; + if (percent >= 80) + { + ResourceWarning?.Invoke(this, new ResourceWarningEventArgs + { + Resource = resource, + CurrentUsagePercent = percent, + ThresholdPercent = 80 + }); + } + } + + private static void CopyDirectory(string source, string destination) + { + foreach (var dir in Directory.GetDirectories(source, "*", SearchOption.AllDirectories)) + { + Directory.CreateDirectory(dir.Replace(source, destination)); + } + + foreach (var file in Directory.GetFiles(source, "*", SearchOption.AllDirectories)) + { + File.Copy(file, file.Replace(source, destination), overwrite: true); + } + } + + public async ValueTask DisposeAsync() + { + if (_state == SandboxState.Running) + { + await StopAsync(_config.Timeouts.ShutdownTimeout, CancellationToken.None); + } + + _bridge?.Dispose(); + } +} +``` + +### gRPC Plugin Bridge + +```protobuf +// Proto/plugin_bridge.proto +syntax = "proto3"; + +package stellaops.plugin.bridge; + +option csharp_namespace = "StellaOps.Plugin.Sandbox.Communication"; + +service PluginBridge { + // Lifecycle + rpc Initialize(InitializeRequest) returns (InitializeResponse); + rpc Shutdown(ShutdownRequest) returns (ShutdownResponse); + rpc HealthCheck(HealthCheckRequest) returns 
(HealthCheckResponse); + + // Operations + rpc Invoke(InvokeRequest) returns (InvokeResponse); + rpc InvokeStreaming(InvokeRequest) returns (stream StreamingEvent); + + // Logging + rpc StreamLogs(LogStreamRequest) returns (stream LogEntry); +} + +message InitializeRequest { + string manifest_json = 1; + string config_json = 2; +} + +message InitializeResponse { + bool success = 1; + string error = 2; +} + +message ShutdownRequest { + int32 timeout_ms = 1; +} + +message ShutdownResponse { + bool success = 1; +} + +message HealthCheckRequest {} + +message HealthCheckResponse { + string status = 1; // healthy, degraded, unhealthy + string message = 2; + int32 duration_ms = 3; + string details_json = 4; +} + +message InvokeRequest { + string operation = 1; + string parameters_json = 2; + int32 timeout_ms = 3; +} + +message InvokeResponse { + bool success = 1; + string result_json = 2; + string error = 3; +} + +message StreamingEvent { + string event_type = 1; + string payload_json = 2; + int64 timestamp_unix_ms = 3; +} + +message LogStreamRequest { + string min_level = 1; +} + +message LogEntry { + int64 timestamp_unix_ms = 1; + string level = 2; + string message = 3; + string properties_json = 4; +} +``` + +### Resource Limiter (Linux) + +```csharp +// Resources/LinuxResourceLimiter.cs +namespace StellaOps.Plugin.Sandbox.Resources; + +public sealed class LinuxResourceLimiter : IResourceLimiter +{ + private readonly ILogger _logger; + + public LinuxResourceLimiter(ILogger logger) + { + _logger = logger; + } + + public ResourceConfiguration CreateConfiguration(ResourceLimits limits) + { + return new ResourceConfiguration + { + // Memory limit using cgroups v2 + MemoryLimitBytes = limits.MaxMemoryMb * 1024L * 1024L, + + // CPU limit as percentage (cgroups cpu.max) + CpuQuotaUs = (long)(limits.MaxCpuPercent * 1000), // Per 100ms period + CpuPeriodUs = 100_000, // 100ms + + // Process limit + MaxProcesses = limits.MaxProcesses, + + // File descriptor limit + MaxOpenFiles 
= limits.MaxOpenFiles + }; + } + + public async Task ApplyLimitsAsync(Process process, ResourceConfiguration config, CancellationToken ct) + { + var cgroupPath = $"/sys/fs/cgroup/stellaops-sandbox/{process.Id}"; + + try + { + // Create cgroup for this process + Directory.CreateDirectory(cgroupPath); + + // Set memory limit + await File.WriteAllTextAsync( + Path.Combine(cgroupPath, "memory.max"), + config.MemoryLimitBytes.ToString(), + ct); + + // Set CPU limit + await File.WriteAllTextAsync( + Path.Combine(cgroupPath, "cpu.max"), + $"{config.CpuQuotaUs} {config.CpuPeriodUs}", + ct); + + // Set process limit + await File.WriteAllTextAsync( + Path.Combine(cgroupPath, "pids.max"), + config.MaxProcesses.ToString(), + ct); + + // Add process to cgroup + await File.WriteAllTextAsync( + Path.Combine(cgroupPath, "cgroup.procs"), + process.Id.ToString(), + ct); + + _logger.LogDebug("Applied cgroup limits for process {ProcessId}", process.Id); + } + catch (Exception ex) + { + _logger.LogError(ex, "Failed to apply cgroup limits for process {ProcessId}", process.Id); + throw; + } + } + + public async Task<ResourceUsage> GetUsageAsync(Process process, CancellationToken ct) + { + var cgroupPath = $"/sys/fs/cgroup/stellaops-sandbox/{process.Id}"; + + try + { + // Read memory usage + var memoryUsageStr = await File.ReadAllTextAsync( + Path.Combine(cgroupPath, "memory.current"), ct); + var memoryUsageBytes = long.Parse(memoryUsageStr.Trim()); + + // Read CPU usage + var cpuStatStr = await File.ReadAllTextAsync( + Path.Combine(cgroupPath, "cpu.stat"), ct); + var cpuUsageUs = ParseCpuStat(cpuStatStr); + + return new ResourceUsage + { + MemoryUsageMb = memoryUsageBytes / (1024.0 * 1024.0), + CpuUsagePercent = CalculateCpuPercent(cpuUsageUs), + ProcessCount = process.Threads.Count + }; + } + catch (Exception ex) + { + _logger.LogWarning(ex, "Failed to read resource usage for process {ProcessId}", process.Id); + return new ResourceUsage(); + } + } + + public async Task RemoveLimitsAsync(Process
process, CancellationToken ct) + { + var cgroupPath = $"/sys/fs/cgroup/stellaops-sandbox/{process.Id}"; + + try + { + if (Directory.Exists(cgroupPath)) + { + // Move process out of cgroup first + await File.WriteAllTextAsync( + "/sys/fs/cgroup/cgroup.procs", + process.Id.ToString(), + ct); + + Directory.Delete(cgroupPath); + } + } + catch (Exception ex) + { + _logger.LogWarning(ex, "Failed to cleanup cgroup for process {ProcessId}", process.Id); + } + } + + private static long ParseCpuStat(string stat) + { + foreach (var line in stat.Split('\n')) + { + if (line.StartsWith("usage_usec")) + { + return long.Parse(line.Split(' ')[1]); + } + } + return 0; + } + + private double CalculateCpuPercent(long cpuUsageUs) + { + // Simplified calculation - would need to track over time for accuracy + return 0; + } +} + +public sealed record ResourceConfiguration +{ + public long MemoryLimitBytes { get; init; } + public long CpuQuotaUs { get; init; } + public long CpuPeriodUs { get; init; } + public int MaxProcesses { get; init; } + public int MaxOpenFiles { get; init; } +} + +public sealed record ResourceUsage +{ + public double MemoryUsageMb { get; init; } + public double CpuUsagePercent { get; init; } + public int ProcessCount { get; init; } + public long DiskUsageBytes { get; init; } + public long NetworkBytesIn { get; init; } + public long NetworkBytesOut { get; init; } +} +``` + +### Network Policy Enforcer + +```csharp +// Network/NetworkPolicyEnforcer.cs +namespace StellaOps.Plugin.Sandbox.Network; + +public sealed class NetworkPolicyEnforcer : INetworkPolicyEnforcer +{ + private readonly ILogger _logger; + private readonly ConcurrentDictionary<string, NetworkPolicy> _activePolicies = new(); + + public NetworkPolicyEnforcer(ILogger logger) + { + _logger = logger; + } + + public async Task ApplyPolicyAsync(string sandboxId, NetworkPolicy policy, CancellationToken ct) + { + _activePolicies[sandboxId] = policy; + + if (OperatingSystem.IsLinux()) + { + await ApplyIptablesRulesAsync(sandboxId,
policy, ct); + } + else if (OperatingSystem.IsWindows()) + { + await ApplyWindowsFirewallRulesAsync(sandboxId, policy, ct); + } + + _logger.LogDebug("Applied network policy for sandbox {SandboxId}", sandboxId); + } + + public async Task RemovePolicyAsync(string sandboxId, CancellationToken ct) + { + if (_activePolicies.TryRemove(sandboxId, out _)) + { + if (OperatingSystem.IsLinux()) + { + await RemoveIptablesRulesAsync(sandboxId, ct); + } + else if (OperatingSystem.IsWindows()) + { + await RemoveWindowsFirewallRulesAsync(sandboxId, ct); + } + + _logger.LogDebug("Removed network policy for sandbox {SandboxId}", sandboxId); + } + } + + public bool IsAllowed(string sandboxId, string host, int port) + { + if (!_activePolicies.TryGetValue(sandboxId, out var policy)) + return false; + + // Check blocked ports + if (policy.BlockedPorts.Contains(port)) + return false; + + // Check allowed ports + if (policy.AllowedPorts.Count > 0 && !policy.AllowedPorts.Contains(port)) + return false; + + // Check blocked hosts + if (policy.BlockedHosts.Contains(host)) + return false; + + // Check allowed hosts (if specified, only these are allowed) + if (policy.AllowedHosts.Count > 0 && !policy.AllowedHosts.Contains(host)) + return false; + + return true; + } + + private async Task ApplyIptablesRulesAsync(string sandboxId, NetworkPolicy policy, CancellationToken ct) + { + var chain = $"STELLAOPS_SANDBOX_{sandboxId.Replace("-", "_").ToUpperInvariant()}"; + + // Create chain + await ExecuteCommandAsync("iptables", $"-N {chain}", ct); + + // Add rules for blocked ports + foreach (var port in policy.BlockedPorts) + { + await ExecuteCommandAsync("iptables", + $"-A {chain} -p tcp --dport {port} -j DROP", ct); + await ExecuteCommandAsync("iptables", + $"-A {chain} -p udp --dport {port} -j DROP", ct); + } + + // Add rules for allowed hosts only + if (policy.AllowedHosts.Count > 0) + { + foreach (var host in policy.AllowedHosts) + { + await ExecuteCommandAsync("iptables", + $"-A {chain} -d {host} 
-j ACCEPT", ct); + } + // Drop everything else + await ExecuteCommandAsync("iptables", + $"-A {chain} -j DROP", ct); + } + } + + private async Task RemoveIptablesRulesAsync(string sandboxId, CancellationToken ct) + { + var chain = $"STELLAOPS_SANDBOX_{sandboxId.Replace("-", "_").ToUpperInvariant()}"; + + await ExecuteCommandAsync("iptables", $"-F {chain}", ct); + await ExecuteCommandAsync("iptables", $"-X {chain}", ct); + } + + private async Task ApplyWindowsFirewallRulesAsync(string sandboxId, NetworkPolicy policy, CancellationToken ct) + { + // Use Windows Firewall with Advanced Security + foreach (var port in policy.BlockedPorts) + { + var ruleName = $"StellaOps-Sandbox-{sandboxId}-Block-{port}"; + await ExecuteCommandAsync("netsh", + $"advfirewall firewall add rule name=\"{ruleName}\" dir=out protocol=tcp remoteport={port} action=block", + ct); + } + } + + private async Task RemoveWindowsFirewallRulesAsync(string sandboxId, CancellationToken ct) + { + // netsh cannot delete rules by wildcard, and ProcessStartInfo arguments are + // not interpreted by a shell (so piping to findstr would not work); remove + // all rules created for this sandbox via PowerShell instead. + await ExecuteCommandAsync("powershell", + $"-NoProfile -Command \"Get-NetFirewallRule -DisplayName 'StellaOps-Sandbox-{sandboxId}-*' | Remove-NetFirewallRule\"", + ct); + } + + private static async Task ExecuteCommandAsync(string command, string args, CancellationToken ct) + { + using var process = new Process + { + StartInfo = new ProcessStartInfo + { + FileName = command, + Arguments = args, + RedirectStandardOutput = true, + RedirectStandardError = true, + UseShellExecute = false, + CreateNoWindow = true + } + }; + + process.Start(); + await process.WaitForExitAsync(ct); + + if (process.ExitCode != 0) + { + var error = await process.StandardError.ReadToEndAsync(ct); + throw new InvalidOperationException($"Command failed: {command} {args}\n{error}"); + } + } +} +``` + +--- + +## Acceptance Criteria + +- [ ] `ISandbox` interface with all methods +- [ ] `SandboxConfiguration` with defaults +- [ ] Process-based sandbox implementation +- [ ] gRPC communication bridge +- [ ] Plugin host process executable +- [ ] Resource limits (Linux cgroups
v2) +- [ ] Resource limits (Windows Job Objects) +- [ ] Network policy enforcement (iptables/Windows Firewall) +- [ ] Filesystem isolation +- [ ] Resource usage monitoring +- [ ] Resource warning events +- [ ] Graceful shutdown with timeout +- [ ] Process kill on timeout +- [ ] Unit tests for all components +- [ ] Integration tests with real processes +- [ ] Test coverage >= 80% + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 100_001 Plugin Abstractions | Internal | TODO | +| 100_002 Plugin Host | Internal | TODO | +| Grpc.AspNetCore | External | Available | +| .NET 10 | External | Available | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| ISandbox interface | TODO | | +| SandboxConfiguration | TODO | | +| ProcessSandbox | TODO | | +| GrpcPluginBridge | TODO | | +| plugin_bridge.proto | TODO | | +| PluginProcessManager | TODO | | +| LinuxResourceLimiter | TODO | | +| WindowsResourceLimiter | TODO | | +| NetworkPolicyEnforcer | TODO | | +| SandboxedFilesystem | TODO | | +| ScopedSecretProxy | TODO | | +| Plugin host executable | TODO | | +| Unit tests | TODO | | +| Integration tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_100_005_PLUGIN_crypto_rework.md b/docs/implplan/SPRINT_20260110_100_005_PLUGIN_crypto_rework.md new file mode 100644 index 000000000..d801ce48f --- /dev/null +++ b/docs/implplan/SPRINT_20260110_100_005_PLUGIN_crypto_rework.md @@ -0,0 +1,421 @@ +# SPRINT: Crypto Plugin Rework + +> **Sprint ID:** 100_005 +> **Module:** PLUGIN +> **Phase:** 100 - Plugin System Unification +> **Status:** TODO +> **Parent:** [100_000_INDEX](SPRINT_20260110_100_000_INDEX_plugin_unification.md) + +--- + +## Overview + +Rework all cryptographic providers (GOST, eIDAS, SM2/SM3/SM4, FIPS, HSM) to implement the unified plugin architecture with `IPlugin` and 
`ICryptoCapability` interfaces. + +### Objectives + +- Migrate GOST provider to unified plugin model +- Migrate eIDAS provider to unified plugin model +- Migrate SM2/SM3/SM4 provider to unified plugin model +- Migrate FIPS provider to unified plugin model +- Migrate HSM integration to unified plugin model +- Preserve all existing functionality +- Add health checks for all providers +- Add plugin manifests + +### Current State + +``` +src/Cryptography/ +├── StellaOps.Cryptography.Gost/ # GOST R 34.10-2012, R 34.11-2012 +├── StellaOps.Cryptography.Eidas/ # EU eIDAS qualified signatures +├── StellaOps.Cryptography.Sm/ # Chinese SM2/SM3/SM4 +├── StellaOps.Cryptography.Fips/ # US FIPS 140-2 compliant +└── StellaOps.Cryptography.Hsm/ # Hardware Security Module integration +``` + +### Target State + +``` +src/Cryptography/ +├── StellaOps.Cryptography.Plugin.Gost/ +│ ├── GostPlugin.cs # IPlugin implementation +│ ├── GostCryptoCapability.cs # ICryptoCapability implementation +│ ├── plugin.yaml # Plugin manifest +│ └── ... +├── StellaOps.Cryptography.Plugin.Eidas/ +├── StellaOps.Cryptography.Plugin.Sm/ +├── StellaOps.Cryptography.Plugin.Fips/ +└── StellaOps.Cryptography.Plugin.Hsm/ +``` + +--- + +## Deliverables + +### GOST Plugin Implementation + +```csharp +// GostPlugin.cs +namespace StellaOps.Cryptography.Plugin.Gost; + +[Plugin( + id: "com.stellaops.crypto.gost", + name: "GOST Cryptography Provider", + version: "1.0.0", + vendor: "Stella Ops")] +[ProvidesCapability(PluginCapabilities.Crypto, CapabilityId = "gost")] +public sealed class GostPlugin : IPlugin, ICryptoCapability +{ + private IPluginContext? _context; + private GostCryptoService? 
_cryptoService;
+
+    public PluginInfo Info => new(
+        Id: "com.stellaops.crypto.gost",
+        Name: "GOST Cryptography Provider",
+        Version: "1.0.0",
+        Vendor: "Stella Ops",
+        Description: "Russian GOST R 34.10-2012 and R 34.11-2012 cryptographic algorithms",
+        LicenseId: "AGPL-3.0-or-later");
+
+    public PluginTrustLevel TrustLevel => PluginTrustLevel.BuiltIn;
+
+    public PluginCapabilities Capabilities => PluginCapabilities.Crypto;
+
+    public PluginLifecycleState State { get; private set; } = PluginLifecycleState.Discovered;
+
+    // ICryptoCapability implementation
+    public IReadOnlyList<string> SupportedAlgorithms => new[]
+    {
+        "GOST-R34.10-2012-256",
+        "GOST-R34.10-2012-512",
+        "GOST-R34.11-2012-256",
+        "GOST-R34.11-2012-512",
+        "GOST-28147-89"
+    };
+
+    public async Task InitializeAsync(IPluginContext context, CancellationToken ct)
+    {
+        _context = context;
+        State = PluginLifecycleState.Initializing;
+
+        try
+        {
+            var options = context.Configuration.Bind<GostOptions>();
+            _cryptoService = new GostCryptoService(options, context.Logger);
+
+            await _cryptoService.InitializeAsync(ct);
+
+            State = PluginLifecycleState.Active;
+            context.Logger.Info("GOST cryptography provider initialized");
+        }
+        catch (Exception ex)
+        {
+            State = PluginLifecycleState.Failed;
+            context.Logger.Error(ex, "Failed to initialize GOST provider");
+            throw;
+        }
+    }
+
+    public async Task<HealthCheckResult> HealthCheckAsync(CancellationToken ct)
+    {
+        if (_cryptoService == null)
+            return HealthCheckResult.Unhealthy("Provider not initialized");
+
+        try
+        {
+            // Verify we can perform a test operation
+            var testData = "test"u8.ToArray();
+            var hash = await HashAsync(testData, "GOST-R34.11-2012-256", ct);
+
+            if (hash.Length != 32)
+                return HealthCheckResult.Degraded("Hash output size mismatch");
+
+            return HealthCheckResult.Healthy();
+        }
+        catch (Exception ex)
+        {
+            return HealthCheckResult.Unhealthy(ex);
+        }
+    }
+
+    public bool CanHandle(CryptoOperation operation, string algorithm)
+    {
+        return algorithm.StartsWith("GOST", StringComparison.OrdinalIgnoreCase) &&
+               SupportedAlgorithms.Contains(algorithm, StringComparer.OrdinalIgnoreCase);
+    }
+
+    public async Task<byte[]> SignAsync(
+        ReadOnlyMemory<byte> data,
+        CryptoSignOptions options,
+        CancellationToken ct)
+    {
+        EnsureInitialized();
+
+        _context!.Logger.Debug("Signing with algorithm {Algorithm}", options.Algorithm);
+
+        return await _cryptoService!.SignAsync(
+            data,
+            options.Algorithm,
+            options.KeyId,
+            options.KeyVersion,
+            ct);
+    }
+
+    public async Task<bool> VerifyAsync(
+        ReadOnlyMemory<byte> data,
+        ReadOnlyMemory<byte> signature,
+        CryptoVerifyOptions options,
+        CancellationToken ct)
+    {
+        EnsureInitialized();
+
+        return await _cryptoService!.VerifyAsync(
+            data,
+            signature,
+            options.Algorithm,
+            options.KeyId,
+            options.CertificateChain,
+            ct);
+    }
+
+    public async Task<byte[]> EncryptAsync(
+        ReadOnlyMemory<byte> data,
+        CryptoEncryptOptions options,
+        CancellationToken ct)
+    {
+        EnsureInitialized();
+
+        if (!options.Algorithm.Contains("28147", StringComparison.Ordinal))
+            throw new NotSupportedException($"Encryption not supported for {options.Algorithm}");
+
+        return await _cryptoService!.EncryptAsync(
+            data,
+            options.KeyId,
+            options.Iv,
+            ct);
+    }
+
+    public async Task<byte[]> DecryptAsync(
+        ReadOnlyMemory<byte> data,
+        CryptoDecryptOptions options,
+        CancellationToken ct)
+    {
+        EnsureInitialized();
+
+        return await _cryptoService!.DecryptAsync(
+            data,
+            options.KeyId,
+            options.Iv,
+            ct);
+    }
+
+    public async Task<byte[]> HashAsync(
+        ReadOnlyMemory<byte> data,
+        string algorithm,
+        CancellationToken ct)
+    {
+        EnsureInitialized();
+
+        return await _cryptoService!.HashAsync(data, algorithm, ct);
+    }
+
+    private void EnsureInitialized()
+    {
+        if (State != PluginLifecycleState.Active || _cryptoService == null)
+            throw new InvalidOperationException("GOST provider is not initialized");
+    }
+
+    public async ValueTask DisposeAsync()
+    {
+        if (_cryptoService != null)
+        {
+            await _cryptoService.DisposeAsync();
+            _cryptoService = null;
+        }
+        State = PluginLifecycleState.Stopped;
+    }
+}
+```
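
The dispatch pattern implied by `CanHandle` can be sketched in isolation. The `*Lite` types below are hypothetical, simplified stand-ins for the real `IPlugin`/`ICryptoCapability` abstractions, showing how a registry might route an algorithm name to the first capability that claims it:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for ICryptoCapability: just the routing surface.
public interface ICryptoCapabilityLite
{
    string CapabilityId { get; }
    bool CanHandle(string algorithm);
}

public sealed class GostCapabilityLite : ICryptoCapabilityLite
{
    private static readonly string[] Supported =
    {
        "GOST-R34.10-2012-256", "GOST-R34.10-2012-512",
        "GOST-R34.11-2012-256", "GOST-R34.11-2012-512", "GOST-28147-89"
    };

    public string CapabilityId => "gost";

    // Mirrors GostPlugin.CanHandle: cheap prefix check, then exact allow-list match.
    public bool CanHandle(string algorithm) =>
        algorithm.StartsWith("GOST", StringComparison.OrdinalIgnoreCase) &&
        Supported.Contains(algorithm, StringComparer.OrdinalIgnoreCase);
}

public static class CryptoRouterLite
{
    // First capability that claims the algorithm wins; null means no provider.
    public static string? Route(IEnumerable<ICryptoCapabilityLite> capabilities, string algorithm) =>
        capabilities.FirstOrDefault(c => c.CanHandle(algorithm))?.CapabilityId;
}
```

Under this sketch, registration order determines precedence between overlapping providers; the real registry may use an explicit priority instead.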
+
+### Plugin Manifest
+
+```yaml
+# plugin.yaml
+plugin:
+  id: com.stellaops.crypto.gost
+  name: GOST Cryptography Provider
+  version: 1.0.0
+  vendor: Stella Ops
+  description: Russian GOST R 34.10-2012 and R 34.11-2012 cryptographic algorithms
+  license: AGPL-3.0-or-later
+
+entryPoint: StellaOps.Cryptography.Plugin.Gost.GostPlugin
+
+minPlatformVersion: 1.0.0
+
+capabilities:
+  - type: crypto
+    id: gost
+    algorithms:
+      - GOST-R34.10-2012-256
+      - GOST-R34.10-2012-512
+      - GOST-R34.11-2012-256
+      - GOST-R34.11-2012-512
+      - GOST-28147-89
+
+configSchema:
+  type: object
+  properties:
+    keyStorePath:
+      type: string
+      description: Path to GOST key store
+    defaultKeyId:
+      type: string
+      description: Default key identifier for signing
+  required: []
+```
+
+### Shared Crypto Base Class
+
+```csharp
+// CryptoPluginBase.cs
+namespace StellaOps.Cryptography.Plugin;
+
+/// <summary>
+/// Base class for crypto plugins with common functionality.
+/// </summary>
+public abstract class CryptoPluginBase : IPlugin, ICryptoCapability
+{
+    protected IPluginContext? Context { get; private set; }
+
+    public abstract PluginInfo Info { get; }
+    public PluginTrustLevel TrustLevel => PluginTrustLevel.BuiltIn;
+    public PluginCapabilities Capabilities => PluginCapabilities.Crypto;
+    public PluginLifecycleState State { get; protected set; } = PluginLifecycleState.Discovered;
+
+    public abstract IReadOnlyList<string> SupportedAlgorithms { get; }
+
+    public async Task InitializeAsync(IPluginContext context, CancellationToken ct)
+    {
+        Context = context;
+        State = PluginLifecycleState.Initializing;
+
+        try
+        {
+            await InitializeCryptoServiceAsync(context, ct);
+            State = PluginLifecycleState.Active;
+            context.Logger.Info("{PluginName} initialized", Info.Name);
+        }
+        catch (Exception ex)
+        {
+            State = PluginLifecycleState.Failed;
+            context.Logger.Error(ex, "Failed to initialize {PluginName}", Info.Name);
+            throw;
+        }
+    }
+
+    protected abstract Task InitializeCryptoServiceAsync(IPluginContext context, CancellationToken ct);
+
+    public virtual async Task<HealthCheckResult> HealthCheckAsync(CancellationToken ct)
+    {
+        if (State != PluginLifecycleState.Active)
+            return HealthCheckResult.Unhealthy($"Plugin is in state {State}");
+
+        try
+        {
+            // Default health check: verify we can hash test data
+            var testData = "health-check-test"u8.ToArray();
+            var algorithm = SupportedAlgorithms.FirstOrDefault(a => a.Contains("256") || a.Contains("SHA"));
+
+            if (algorithm != null)
+            {
+                await HashAsync(testData, algorithm, ct);
+            }
+
+            return HealthCheckResult.Healthy();
+        }
+        catch (Exception ex)
+        {
+            return HealthCheckResult.Unhealthy(ex);
+        }
+    }
+
+    public abstract bool CanHandle(CryptoOperation operation, string algorithm);
+    public abstract Task<byte[]> SignAsync(ReadOnlyMemory<byte> data, CryptoSignOptions options, CancellationToken ct);
+    public abstract Task<bool> VerifyAsync(ReadOnlyMemory<byte> data, ReadOnlyMemory<byte> signature, CryptoVerifyOptions options, CancellationToken ct);
+    public abstract Task<byte[]> EncryptAsync(ReadOnlyMemory<byte> data, CryptoEncryptOptions options, CancellationToken ct);
+    public abstract Task<byte[]> DecryptAsync(ReadOnlyMemory<byte> data, CryptoDecryptOptions options, CancellationToken ct);
+    public abstract Task<byte[]> HashAsync(ReadOnlyMemory<byte> data, string algorithm, CancellationToken ct);
+
+    public abstract ValueTask DisposeAsync();
+
+    protected void EnsureActive()
+    {
+        if (State != PluginLifecycleState.Active)
+            throw new InvalidOperationException($"{Info.Name} is not active (state: {State})");
+    }
+}
+```
+
+### Migration Tasks
+
+| Provider | Current Interface | New Implementation | Status |
+|----------|-------------------|-------------------|--------|
+| GOST | `ICryptoProvider` | `GostPlugin : IPlugin, ICryptoCapability` | TODO |
+| eIDAS | `ICryptoProvider` | `EidasPlugin : IPlugin, ICryptoCapability` | TODO |
+| SM2/SM3/SM4 | `ICryptoProvider` | `SmPlugin : IPlugin, ICryptoCapability` | TODO |
+| FIPS | `ICryptoProvider` | `FipsPlugin : IPlugin, ICryptoCapability` | TODO |
+| HSM | `IHsmProvider` | `HsmPlugin : IPlugin, ICryptoCapability` | TODO |
+
+---
+
+## Acceptance Criteria
+
+- [ ] All 5 crypto providers implement `IPlugin`
+- [ ] All 5 crypto providers implement `ICryptoCapability`
+- [ ] All providers have plugin manifests
+- [ ] All existing crypto operations preserved
+- [ ] Health checks implemented for all providers
+- [ ] All providers discoverable by plugin host
+- [ ] All providers register in plugin registry
+- [ ] Backward-compatible configuration
+- [ ] Unit tests migrated/updated
+- [ ] Integration tests passing
+- [ ] Performance benchmarks comparable to original
+
+---
+
+## Dependencies
+
+| Dependency | Type | Status |
+|------------|------|--------|
+| 100_001 Plugin Abstractions | Internal | TODO |
+| 100_002 Plugin Host | Internal | TODO |
+| 100_003 Plugin Registry | Internal | TODO |
+| BouncyCastle | External | Available |
+| CryptoPro SDK | External | Available (GOST) |
+
+---
+
+## Delivery Tracker
+
+| Deliverable | Status | Notes |
+|-------------|--------|-------|
+| GostPlugin | TODO | |
+| EidasPlugin | TODO | |
+| SmPlugin | TODO | |
+| FipsPlugin | TODO | |
+| HsmPlugin | TODO | |
+| CryptoPluginBase | TODO | |
+| Plugin manifests (5) | TODO | |
+| Unit tests | TODO | |
+| Integration tests | TODO | |
+
+---
+
+## Execution Log
+
+| Date | Entry |
+|------|-------|
+| 10-Jan-2026 | Sprint created |
diff --git a/docs/implplan/SPRINT_20260110_100_006_PLUGIN_auth_rework.md b/docs/implplan/SPRINT_20260110_100_006_PLUGIN_auth_rework.md
new file mode 100644
index 000000000..3b8c9104f
--- /dev/null
+++ b/docs/implplan/SPRINT_20260110_100_006_PLUGIN_auth_rework.md
@@ -0,0 +1,455 @@
+# SPRINT: Auth Plugin Rework
+
+> **Sprint ID:** 100_006
+> **Module:** PLUGIN
+> **Phase:** 100 - Plugin System Unification
+> **Status:** TODO
+> **Parent:** [100_000_INDEX](SPRINT_20260110_100_000_INDEX_plugin_unification.md)
+
+---
+
+## Overview
+
+Rework all authentication providers (LDAP, OIDC, SAML, Workforce Identity) to implement the unified plugin architecture with `IPlugin` and `IAuthCapability` interfaces.
+
+### Objectives
+
+- Migrate LDAP provider to unified plugin model
+- Migrate OIDC providers (Azure AD, Okta, Google, etc.)
to unified plugin model
+- Migrate SAML provider to unified plugin model
+- Migrate Workforce Identity provider to unified plugin model
+- Preserve all existing authentication flows
+- Add health checks for all providers
+- Add plugin manifests
+
+### Current State
+
+```
+src/Authority/
+├── __Plugins/
+│   ├── StellaOps.Authority.Plugin.Ldap/
+│   ├── StellaOps.Authority.Plugin.Oidc/
+│   └── StellaOps.Authority.Plugin.Saml/
+└── __Libraries/
+    └── StellaOps.Authority.Identity/
+```
+
+### Target State
+
+Each auth plugin implements:
+- `IPlugin` - Core plugin interface with lifecycle
+- `IAuthCapability` - Authentication/authorization operations
+- Health checks for connectivity
+- Plugin manifest for discovery
+
+---
+
+## Deliverables
+
+### Auth Capability Interface
+
+```csharp
+// IAuthCapability.cs (added to 100_001 Abstractions)
+namespace StellaOps.Plugin.Abstractions.Capabilities;
+
+/// <summary>
+/// Capability interface for authentication and authorization.
+/// </summary>
+public interface IAuthCapability
+{
+    /// <summary>
+    /// Auth provider type (ldap, oidc, saml, workforce).
+    /// </summary>
+    string ProviderType { get; }
+
+    /// <summary>
+    /// Supported authentication methods.
+    /// </summary>
+    IReadOnlyList<string> SupportedMethods { get; }
+
+    /// <summary>
+    /// Authenticate a user with credentials.
+    /// </summary>
+    Task<AuthResult> AuthenticateAsync(AuthRequest request, CancellationToken ct);
+
+    /// <summary>
+    /// Validate an existing token/session.
+    /// </summary>
+    Task<ValidationResult> ValidateTokenAsync(string token, CancellationToken ct);
+
+    /// <summary>
+    /// Get user information.
+    /// </summary>
+    Task<UserInfo?> GetUserInfoAsync(string userId, CancellationToken ct);
+
+    /// <summary>
+    /// Get user's group memberships.
+    /// </summary>
+    Task<IReadOnlyList<GroupInfo>> GetUserGroupsAsync(string userId, CancellationToken ct);
+
+    /// <summary>
+    /// Check if user has specific permission.
+    /// </summary>
+    Task<bool> HasPermissionAsync(string userId, string permission, CancellationToken ct);
+
+    /// <summary>
+    /// Initiate SSO flow (for OIDC/SAML).
+    /// </summary>
+    Task<SsoInitiation?> InitiateSsoAsync(SsoRequest request, CancellationToken ct);
+
+    /// <summary>
+    /// Complete SSO callback.
+    /// </summary>
+    Task<AuthResult> CompleteSsoAsync(SsoCallback callback, CancellationToken ct);
+}
+
+public sealed record AuthRequest(
+    string Method,
+    string? Username,
+    string? Password,
+    string? Token,
+    IReadOnlyDictionary<string, string>? AdditionalData);
+
+public sealed record AuthResult(
+    bool Success,
+    string? UserId,
+    string? AccessToken,
+    string? RefreshToken,
+    DateTimeOffset? ExpiresAt,
+    IReadOnlyList<string>? Roles,
+    string? Error);
+
+public sealed record ValidationResult(
+    bool Valid,
+    string? UserId,
+    DateTimeOffset? ExpiresAt,
+    IReadOnlyList<string>? Claims,
+    string? Error);
+
+public sealed record UserInfo(
+    string Id,
+    string Username,
+    string? Email,
+    string? DisplayName,
+    IReadOnlyDictionary<string, string>? Attributes);
+
+public sealed record GroupInfo(
+    string Id,
+    string Name,
+    string? Description);
+
+public sealed record SsoRequest(
+    string RedirectUri,
+    string? State,
+    IReadOnlyList<string>? Scopes);
+
+public sealed record SsoInitiation(
+    string AuthorizationUrl,
+    string State,
+    string? CodeVerifier);
+
+public sealed record SsoCallback(
+    string? Code,
+    string? State,
+    string? Error,
+    string? CodeVerifier);
+```
+
+### LDAP Plugin Implementation
+
+```csharp
+// LdapPlugin.cs
+namespace StellaOps.Authority.Plugin.Ldap;
+
+[Plugin(
+    id: "com.stellaops.auth.ldap",
+    name: "LDAP Authentication Provider",
+    version: "1.0.0",
+    vendor: "Stella Ops")]
+[ProvidesCapability(PluginCapabilities.Auth, CapabilityId = "ldap")]
+public sealed class LdapPlugin : IPlugin, IAuthCapability
+{
+    private IPluginContext? _context;
+    private LdapConnection? _connection;
+    private LdapOptions?
_options;
+
+    public PluginInfo Info => new(
+        Id: "com.stellaops.auth.ldap",
+        Name: "LDAP Authentication Provider",
+        Version: "1.0.0",
+        Vendor: "Stella Ops",
+        Description: "LDAP/Active Directory authentication and user lookup");
+
+    public PluginTrustLevel TrustLevel => PluginTrustLevel.BuiltIn;
+    public PluginCapabilities Capabilities => PluginCapabilities.Auth | PluginCapabilities.Network;
+    public PluginLifecycleState State { get; private set; } = PluginLifecycleState.Discovered;
+
+    public string ProviderType => "ldap";
+    public IReadOnlyList<string> SupportedMethods => new[] { "password", "kerberos" };
+
+    public async Task InitializeAsync(IPluginContext context, CancellationToken ct)
+    {
+        _context = context;
+        State = PluginLifecycleState.Initializing;
+
+        _options = context.Configuration.Bind<LdapOptions>();
+
+        // Test connection
+        _connection = new LdapConnection(new LdapDirectoryIdentifier(_options.Server, _options.Port));
+        _connection.Credential = new NetworkCredential(_options.BindDn, _options.BindPassword);
+        _connection.AuthType = AuthType.Basic;
+        _connection.SessionOptions.SecureSocketLayer = _options.UseSsl;
+
+        await Task.Run(() => _connection.Bind(), ct);
+
+        State = PluginLifecycleState.Active;
+        context.Logger.Info("LDAP plugin connected to {Server}", _options.Server);
+    }
+
+    public async Task<HealthCheckResult> HealthCheckAsync(CancellationToken ct)
+    {
+        if (_connection == null)
+            return HealthCheckResult.Unhealthy("Not initialized");
+
+        try
+        {
+            // Perform a simple search to verify connectivity
+            var request = new SearchRequest(
+                _options!.BaseDn,
+                "(objectClass=*)",
+                SearchScope.Base,
+                "objectClass");
+
+            var response = await Task.Run(() =>
+                (SearchResponse)_connection.SendRequest(request), ct);
+
+            return response.Entries.Count > 0
+                ? HealthCheckResult.Healthy()
+                : HealthCheckResult.Degraded("Base DN search returned no results");
+        }
+        catch (Exception ex)
+        {
+            return HealthCheckResult.Unhealthy(ex);
+        }
+    }
+
+    public async Task<AuthResult> AuthenticateAsync(AuthRequest request, CancellationToken ct)
+    {
+        if (request.Method != "password" || string.IsNullOrEmpty(request.Username))
+            return new AuthResult(false, null, null, null, null, null, "Invalid auth method or missing username");
+
+        try
+        {
+            // Find user DN
+            var userDn = await FindUserDnAsync(request.Username, ct);
+            if (userDn == null)
+                return new AuthResult(false, null, null, null, null, null, "User not found");
+
+            // Attempt bind with user credentials
+            using var userConnection = new LdapConnection(
+                new LdapDirectoryIdentifier(_options!.Server, _options.Port));
+            userConnection.Credential = new NetworkCredential(userDn, request.Password);
+
+            await Task.Run(() => userConnection.Bind(), ct);
+
+            // Get user info and groups
+            var userInfo = await GetUserInfoAsync(request.Username, ct);
+            var groups = await GetUserGroupsAsync(request.Username, ct);
+
+            return new AuthResult(
+                Success: true,
+                UserId: request.Username,
+                AccessToken: null, // LDAP doesn't issue tokens
+                RefreshToken: null,
+                ExpiresAt: null,
+                Roles: groups.Select(g => g.Name).ToList(),
+                Error: null);
+        }
+        catch (LdapException ex)
+        {
+            _context?.Logger.Warning(ex, "LDAP authentication failed for {Username}", request.Username);
+            return new AuthResult(false, null, null, null, null, null, "Authentication failed");
+        }
+    }
+
+    public Task<ValidationResult> ValidateTokenAsync(string token, CancellationToken ct)
+    {
+        // LDAP doesn't use tokens
+        return Task.FromResult(new ValidationResult(false, null, null, null, "LDAP does not support token validation"));
+    }
+
+    public async Task<UserInfo?> GetUserInfoAsync(string userId, CancellationToken ct)
+    {
+        var userDn = await FindUserDnAsync(userId, ct);
+        if (userDn == null) return null;
+
+        var request = new SearchRequest(
+            userDn,
+            "(objectClass=*)",
+            SearchScope.Base,
+            "uid", "mail", "displayName", "cn", "sn", "givenName");
+
+        var response = await Task.Run(() =>
+            (SearchResponse)_connection!.SendRequest(request), ct);
+
+        if (response.Entries.Count == 0) return null;
+
+        var entry = response.Entries[0];
+        return new UserInfo(
+            Id: userId,
+            Username: GetAttribute(entry, "uid") ?? userId,
+            Email: GetAttribute(entry, "mail"),
+            DisplayName: GetAttribute(entry, "displayName") ?? GetAttribute(entry, "cn"),
+            Attributes: entry.Attributes.Cast<DirectoryAttribute>()
+                .ToDictionary(a => a.Name, a => a[0]?.ToString() ?? ""));
+    }
+
+    public async Task<IReadOnlyList<GroupInfo>> GetUserGroupsAsync(string userId, CancellationToken ct)
+    {
+        var userDn = await FindUserDnAsync(userId, ct);
+        if (userDn == null) return Array.Empty<GroupInfo>();
+
+        var request = new SearchRequest(
+            _options!.GroupBaseDn ?? _options.BaseDn,
+            $"(member={userDn})",
+            SearchScope.Subtree,
+            "cn", "description");
+
+        var response = await Task.Run(() =>
+            (SearchResponse)_connection!.SendRequest(request), ct);
+
+        return response.Entries.Cast<SearchResultEntry>()
+            .Select(e => new GroupInfo(
+                Id: e.DistinguishedName,
+                Name: GetAttribute(e, "cn") ?? e.DistinguishedName,
+                Description: GetAttribute(e, "description")))
+            .ToList();
+    }
+
+    public async Task<bool> HasPermissionAsync(string userId, string permission, CancellationToken ct)
+    {
+        var groups = await GetUserGroupsAsync(userId, ct);
+        // Permission checking would be based on group membership
+        return groups.Any(g => g.Name.Equals(permission, StringComparison.OrdinalIgnoreCase));
+    }
+
+    public Task<SsoInitiation?> InitiateSsoAsync(SsoRequest request, CancellationToken ct)
+    {
+        // LDAP doesn't support SSO initiation
+        return Task.FromResult<SsoInitiation?>(null);
+    }
+
+    public Task<AuthResult> CompleteSsoAsync(SsoCallback callback, CancellationToken ct)
+    {
+        return Task.FromResult(new AuthResult(false, null, null, null, null, null, "LDAP does not support SSO"));
+    }
+
+    private async Task<string?> FindUserDnAsync(string username, CancellationToken ct)
+    {
+        var filter = string.Format(_options!.UserFilter, username);
+        var request = new SearchRequest(
+            _options.BaseDn,
+            filter,
+            SearchScope.Subtree,
+            "distinguishedName");
+
+        var response = await Task.Run(() =>
+            (SearchResponse)_connection!.SendRequest(request), ct);
+
+        return response.Entries.Count > 0 ? response.Entries[0].DistinguishedName : null;
+    }
+
+    private static string? GetAttribute(SearchResultEntry entry, string name)
+    {
+        return entry.Attributes[name]?[0]?.ToString();
+    }
+
+    public ValueTask DisposeAsync()
+    {
+        _connection?.Dispose();
+        _connection = null;
+        State = PluginLifecycleState.Stopped;
+        return ValueTask.CompletedTask;
+    }
+}
+
+public sealed class LdapOptions
+{
+    public string Server { get; set; } = "localhost";
+    public int Port { get; set; } = 389;
+    public bool UseSsl { get; set; } = false;
+    public string BaseDn { get; set; } = "";
+    public string?
GroupBaseDn { get; set; } + public string BindDn { get; set; } = ""; + public string BindPassword { get; set; } = ""; + public string UserFilter { get; set; } = "(uid={0})"; +} +``` + +### Migration Tasks + +| Provider | Current Interface | New Implementation | Status | +|----------|-------------------|-------------------|--------| +| LDAP | Authority plugin interfaces | `LdapPlugin : IPlugin, IAuthCapability` | TODO | +| OIDC Generic | Authority plugin interfaces | `OidcPlugin : IPlugin, IAuthCapability` | TODO | +| Azure AD | Authority plugin interfaces | `AzureAdPlugin : OidcPlugin` | TODO | +| Okta | Authority plugin interfaces | `OktaPlugin : OidcPlugin` | TODO | +| Google | Authority plugin interfaces | `GooglePlugin : OidcPlugin` | TODO | +| SAML | Authority plugin interfaces | `SamlPlugin : IPlugin, IAuthCapability` | TODO | +| Workforce | Authority plugin interfaces | `WorkforcePlugin : IPlugin, IAuthCapability` | TODO | + +--- + +## Acceptance Criteria + +- [ ] All auth providers implement `IPlugin` +- [ ] All auth providers implement `IAuthCapability` +- [ ] All providers have plugin manifests +- [ ] LDAP bind/search operations work +- [ ] OIDC authorization flow works +- [ ] OIDC token validation works +- [ ] SAML assertion handling works +- [ ] SSO initiation/completion works +- [ ] User info retrieval works +- [ ] Group membership queries work +- [ ] Health checks for all providers +- [ ] Unit tests migrated/updated +- [ ] Integration tests passing + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 100_001 Plugin Abstractions | Internal | TODO | +| 100_002 Plugin Host | Internal | TODO | +| 100_003 Plugin Registry | Internal | TODO | +| System.DirectoryServices.Protocols | External | Available | +| Microsoft.IdentityModel.* | External | Available | +| ITfoxtec.Identity.Saml2 | External | Available | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| 
IAuthCapability interface | TODO | |
+| LdapPlugin | TODO | |
+| OidcPlugin (base) | TODO | |
+| AzureAdPlugin | TODO | |
+| OktaPlugin | TODO | |
+| GooglePlugin | TODO | |
+| SamlPlugin | TODO | |
+| WorkforcePlugin | TODO | |
+| Plugin manifests | TODO | |
+| Unit tests | TODO | |
+| Integration tests | TODO | |
+
+---
+
+## Execution Log
+
+| Date | Entry |
+|------|-------|
+| 10-Jan-2026 | Sprint created |
diff --git a/docs/implplan/SPRINT_20260110_100_007_PLUGIN_llm_rework.md b/docs/implplan/SPRINT_20260110_100_007_PLUGIN_llm_rework.md
new file mode 100644
index 000000000..204d49a3f
--- /dev/null
+++ b/docs/implplan/SPRINT_20260110_100_007_PLUGIN_llm_rework.md
@@ -0,0 +1,453 @@
+# SPRINT: LLM Provider Rework
+
+> **Sprint ID:** 100_007
+> **Module:** PLUGIN
+> **Phase:** 100 - Plugin System Unification
+> **Status:** TODO
+> **Parent:** [100_000_INDEX](SPRINT_20260110_100_000_INDEX_plugin_unification.md)
+
+---
+
+## Overview
+
+Rework all LLM providers (llama-server, ollama, OpenAI, Claude) to implement the unified plugin architecture with `IPlugin` and `ILlmCapability` interfaces.
+
+### Objectives
+
+- Migrate llama-server provider to unified plugin model
+- Migrate ollama provider to unified plugin model
+- Migrate OpenAI provider to unified plugin model
+- Migrate Claude provider to unified plugin model
+- Preserve priority-based provider selection
+- Add health checks with model availability
+- Add plugin manifests
+
+### Current State
+
+```
+src/AdvisoryAI/
+├── __Libraries/
+│   └── StellaOps.AdvisoryAI.Providers/
+│       ├── LlamaServerProvider.cs
+│       ├── OllamaProvider.cs
+│       ├── OpenAiProvider.cs
+│       └── ClaudeProvider.cs
+```
+
+---
+
+## Deliverables
+
+### LLM Capability Interface
+
+```csharp
+// ILlmCapability.cs
+namespace StellaOps.Plugin.Abstractions.Capabilities;
+
+/// <summary>
+/// Capability interface for Large Language Model inference.
+/// </summary>
+public interface ILlmCapability
+{
+    /// <summary>
+    /// Provider identifier (llama, ollama, openai, claude).
+    /// </summary>
+    string ProviderId { get; }
+
+    /// <summary>
+    /// Priority for provider selection (higher = preferred).
+    /// </summary>
+    int Priority { get; }
+
+    /// <summary>
+    /// Available models from this provider.
+    /// </summary>
+    IReadOnlyList<LlmModelInfo> AvailableModels { get; }
+
+    /// <summary>
+    /// Create an inference session.
+    /// </summary>
+    Task<ILlmSession> CreateSessionAsync(LlmSessionOptions options, CancellationToken ct);
+
+    /// <summary>
+    /// Check if provider can serve the specified model.
+    /// </summary>
+    Task<bool> CanServeModelAsync(string modelId, CancellationToken ct);
+
+    /// <summary>
+    /// Refresh available models list.
+    /// </summary>
+    Task RefreshModelsAsync(CancellationToken ct);
+}
+
+public interface ILlmSession : IAsyncDisposable
+{
+    /// <summary>
+    /// Session identifier.
+    /// </summary>
+    string SessionId { get; }
+
+    /// <summary>
+    /// Model being used.
+    /// </summary>
+    string ModelId { get; }
+
+    /// <summary>
+    /// Generate a completion.
+    /// </summary>
+    Task<LlmCompletion> CompleteAsync(LlmPrompt prompt, CancellationToken ct);
+
+    /// <summary>
+    /// Generate a streaming completion.
+    /// </summary>
+    IAsyncEnumerable<LlmCompletionChunk> CompleteStreamingAsync(LlmPrompt prompt, CancellationToken ct);
+
+    /// <summary>
+    /// Generate embeddings for text.
+    /// </summary>
+    Task<LlmEmbedding> EmbedAsync(string text, CancellationToken ct);
+}
+
+public sealed record LlmModelInfo(
+    string Id,
+    string Name,
+    string? Description,
+    long? ParameterCount,
+    int? ContextLength,
+    IReadOnlyList<string> Capabilities); // ["chat", "completion", "embedding"]
+
+public sealed record LlmSessionOptions(
+    string ModelId,
+    LlmParameters? Parameters = null,
+    string? SystemPrompt = null);
+
+public sealed record LlmParameters(
+    float? Temperature = null,
+    float? TopP = null,
+    int? MaxTokens = null,
+    float? FrequencyPenalty = null,
+    float? PresencePenalty = null,
+    IReadOnlyList<string>? StopSequences = null);
+
+public sealed record LlmPrompt(
+    IReadOnlyList<LlmMessage> Messages,
+    LlmParameters? ParameterOverrides = null);
+
+public sealed record LlmMessage(
+    LlmRole Role,
+    string Content);
+
+public enum LlmRole
+{
+    System,
+    User,
+    Assistant
+}
+
+public sealed record LlmCompletion(
+    string Content,
+    LlmUsage Usage,
+    string? FinishReason);
+
+public sealed record LlmCompletionChunk(
+    string Content,
+    bool IsComplete,
+    LlmUsage? Usage = null);
+
+public sealed record LlmUsage(
+    int PromptTokens,
+    int CompletionTokens,
+    int TotalTokens);
+
+public sealed record LlmEmbedding(
+    float[] Vector,
+    int Dimensions,
+    LlmUsage Usage);
+```
+
+### OpenAI Plugin Implementation
+
+```csharp
+// OpenAiPlugin.cs
+namespace StellaOps.AdvisoryAI.Plugin.OpenAi;
+
+[Plugin(
+    id: "com.stellaops.llm.openai",
+    name: "OpenAI LLM Provider",
+    version: "1.0.0",
+    vendor: "Stella Ops")]
+[ProvidesCapability(PluginCapabilities.Llm, CapabilityId = "openai")]
+[RequiresCapability(PluginCapabilities.Network)]
+public sealed class OpenAiPlugin : IPlugin, ILlmCapability
+{
+    private IPluginContext? _context;
+    private OpenAiClient? _client;
+    private List<LlmModelInfo> _models = new();
+
+    public PluginInfo Info => new(
+        Id: "com.stellaops.llm.openai",
+        Name: "OpenAI LLM Provider",
+        Version: "1.0.0",
+        Vendor: "Stella Ops",
+        Description: "OpenAI GPT models for AI-assisted advisory analysis");
+
+    public PluginTrustLevel TrustLevel => PluginTrustLevel.BuiltIn;
+    public PluginCapabilities Capabilities => PluginCapabilities.Llm | PluginCapabilities.Network;
+    public PluginLifecycleState State { get; private set; } = PluginLifecycleState.Discovered;
+
+    public string ProviderId => "openai";
+    public int Priority { get; private set; } = 10;
+    public IReadOnlyList<LlmModelInfo> AvailableModels => _models;
+
+    public async Task InitializeAsync(IPluginContext context, CancellationToken ct)
+    {
+        _context = context;
+        State = PluginLifecycleState.Initializing;
+
+        var options = context.Configuration.Bind<OpenAiOptions>();
+        var apiKey = await context.Configuration.GetSecretAsync("openai-api-key", ct)
+            ??
options.ApiKey;
+
+        if (string.IsNullOrEmpty(apiKey))
+        {
+            State = PluginLifecycleState.Failed;
+            throw new InvalidOperationException("OpenAI API key not configured");
+        }
+
+        _client = new OpenAiClient(apiKey, options.BaseUrl);
+        Priority = options.Priority;
+
+        await RefreshModelsAsync(ct);
+
+        State = PluginLifecycleState.Active;
+        context.Logger.Info("OpenAI plugin initialized with {ModelCount} models", _models.Count);
+    }
+
+    public async Task<HealthCheckResult> HealthCheckAsync(CancellationToken ct)
+    {
+        if (_client == null)
+            return HealthCheckResult.Unhealthy("Not initialized");
+
+        try
+        {
+            var models = await _client.ListModelsAsync(ct);
+            return HealthCheckResult.Healthy(details: new Dictionary<string, object>
+            {
+                ["modelCount"] = models.Count
+            });
+        }
+        catch (Exception ex)
+        {
+            return HealthCheckResult.Unhealthy(ex);
+        }
+    }
+
+    public async Task<ILlmSession> CreateSessionAsync(LlmSessionOptions options, CancellationToken ct)
+    {
+        EnsureActive();
+
+        if (!await CanServeModelAsync(options.ModelId, ct))
+            throw new InvalidOperationException($"Model {options.ModelId} not available");
+
+        return new OpenAiSession(_client!, options, _context!.Logger);
+    }
+
+    public Task<bool> CanServeModelAsync(string modelId, CancellationToken ct)
+    {
+        return Task.FromResult(_models.Any(m => m.Id.Equals(modelId, StringComparison.OrdinalIgnoreCase)));
+    }
+
+    public async Task RefreshModelsAsync(CancellationToken ct)
+    {
+        var models = await _client!.ListModelsAsync(ct);
+        _models = models
+            .Where(m => m.Id.StartsWith("gpt") || m.Id.Contains("embedding"))
+            .Select(m => new LlmModelInfo(
+                Id: m.Id,
+                Name: m.Id,
+                Description: null,
+                ParameterCount: null,
+                ContextLength: GetContextLength(m.Id),
+                Capabilities: GetModelCapabilities(m.Id)))
+            .ToList();
+    }
+
+    private static int? GetContextLength(string modelId) => modelId switch
+    {
+        var m when m.Contains("gpt-4-turbo") => 128000,
+        var m when m.Contains("gpt-4") => 8192,
+        var m when m.Contains("gpt-3.5-turbo-16k") => 16384,
+        var m when m.Contains("gpt-3.5") => 4096,
+        _ => null
+    };
+
+    private static List<string> GetModelCapabilities(string modelId)
+    {
+        if (modelId.Contains("embedding"))
+            return new List<string> { "embedding" };
+        return new List<string> { "chat", "completion" };
+    }
+
+    private void EnsureActive()
+    {
+        if (State != PluginLifecycleState.Active)
+            throw new InvalidOperationException($"OpenAI plugin is not active (state: {State})");
+    }
+
+    public ValueTask DisposeAsync()
+    {
+        _client?.Dispose();
+        State = PluginLifecycleState.Stopped;
+        return ValueTask.CompletedTask;
+    }
+}
+
+internal sealed class OpenAiSession : ILlmSession
+{
+    private readonly OpenAiClient _client;
+    private readonly LlmSessionOptions _options;
+    private readonly IPluginLogger _logger;
+
+    public string SessionId { get; } = Guid.NewGuid().ToString("N");
+    public string ModelId => _options.ModelId;
+
+    public OpenAiSession(OpenAiClient client, LlmSessionOptions options, IPluginLogger logger)
+    {
+        _client = client;
+        _options = options;
+        _logger = logger;
+    }
+
+    public async Task<LlmCompletion> CompleteAsync(LlmPrompt prompt, CancellationToken ct)
+    {
+        var request = BuildRequest(prompt);
+        var response = await _client.ChatCompleteAsync(request, ct);
+
+        return new LlmCompletion(
+            Content: response.Choices[0].Message.Content,
+            Usage: new LlmUsage(
+                response.Usage.PromptTokens,
+                response.Usage.CompletionTokens,
+                response.Usage.TotalTokens),
+            FinishReason: response.Choices[0].FinishReason);
+    }
+
+    public async IAsyncEnumerable<LlmCompletionChunk> CompleteStreamingAsync(
+        LlmPrompt prompt,
+        [EnumeratorCancellation] CancellationToken ct)
+    {
+        var request = BuildRequest(prompt);
+        request.Stream = true;
+
+        await foreach (var chunk in _client.ChatCompleteStreamAsync(request, ct))
+        {
+            yield return new LlmCompletionChunk(
+                Content: chunk.Choices[0].Delta?.Content ?? "",
+                IsComplete: chunk.Choices[0].FinishReason != null,
+                Usage: chunk.Usage != null ? new LlmUsage(
+                    chunk.Usage.PromptTokens,
+                    chunk.Usage.CompletionTokens,
+                    chunk.Usage.TotalTokens) : null);
+        }
+    }
+
+    public async Task<LlmEmbedding> EmbedAsync(string text, CancellationToken ct)
+    {
+        var response = await _client.EmbedAsync(text, "text-embedding-ada-002", ct);
+
+        return new LlmEmbedding(
+            Vector: response.Data[0].Embedding,
+            Dimensions: response.Data[0].Embedding.Length,
+            Usage: new LlmUsage(response.Usage.PromptTokens, 0, response.Usage.TotalTokens));
+    }
+
+    private ChatCompletionRequest BuildRequest(LlmPrompt prompt)
+    {
+        var messages = new List<ChatMessage>();
+
+        if (!string.IsNullOrEmpty(_options.SystemPrompt))
+        {
+            messages.Add(new ChatMessage("system", _options.SystemPrompt));
+        }
+
+        messages.AddRange(prompt.Messages.Select(m => new ChatMessage(
+            m.Role.ToString().ToLowerInvariant(),
+            m.Content)));
+
+        var parameters = prompt.ParameterOverrides ?? _options.Parameters ??
new LlmParameters(); + + return new ChatCompletionRequest + { + Model = ModelId, + Messages = messages, + Temperature = parameters.Temperature, + TopP = parameters.TopP, + MaxTokens = parameters.MaxTokens, + FrequencyPenalty = parameters.FrequencyPenalty, + PresencePenalty = parameters.PresencePenalty, + Stop = parameters.StopSequences?.ToArray() + }; + } + + public ValueTask DisposeAsync() => ValueTask.CompletedTask; +} +``` + +### Migration Tasks + +| Provider | Priority | New Implementation | Status | +|----------|----------|-------------------|--------| +| llama-server | 100 (local) | `LlamaServerPlugin : IPlugin, ILlmCapability` | TODO | +| ollama | 90 (local) | `OllamaPlugin : IPlugin, ILlmCapability` | TODO | +| Claude | 20 | `ClaudePlugin : IPlugin, ILlmCapability` | TODO | +| OpenAI | 10 | `OpenAiPlugin : IPlugin, ILlmCapability` | TODO | + +--- + +## Acceptance Criteria + +- [ ] All LLM providers implement `IPlugin` +- [ ] All LLM providers implement `ILlmCapability` +- [ ] Priority-based provider selection preserved +- [ ] Chat completion works +- [ ] Streaming completion works +- [ ] Embedding generation works +- [ ] Model listing works +- [ ] Health checks verify API connectivity +- [ ] Local providers (llama/ollama) check process availability +- [ ] Unit tests migrated/updated +- [ ] Integration tests with mock servers + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 100_001 Plugin Abstractions | Internal | TODO | +| 100_002 Plugin Host | Internal | TODO | +| OpenAI .NET SDK | External | Available | +| Anthropic SDK | External | Available | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| ILlmCapability interface | TODO | | +| LlamaServerPlugin | TODO | | +| OllamaPlugin | TODO | | +| OpenAiPlugin | TODO | | +| ClaudePlugin | TODO | | +| LlmProviderSelector | TODO | Priority-based selection | +| Plugin manifests | TODO | | +| Unit tests | TODO | | + 
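+For reference, the priority-based selection that `LlmProviderSelector` must preserve could look like the following. This is a minimal sketch, not the final implementation: the tuple shape, the `HealthCheckAsync`/`HealthStatus` members, and the hard-coded priority table are assumptions drawn from the migration table above and the 100_001 abstractions.
+
+```csharp
+// Hypothetical sketch: choose the highest-priority provider that reports healthy.
+public sealed class LlmProviderSelector
+{
+    // Priorities from the migration table; local providers win over hosted ones.
+    private static readonly Dictionary<string, int> Priorities = new()
+    {
+        ["llama-server"] = 100,
+        ["ollama"] = 90,
+        ["claude"] = 20,
+        ["openai"] = 10
+    };
+
+    public async Task<ILlmCapability?> SelectAsync(
+        IEnumerable<(string CapabilityId, IPlugin Plugin, ILlmCapability Llm)> providers,
+        CancellationToken ct)
+    {
+        foreach (var candidate in providers
+            .OrderByDescending(p => Priorities.GetValueOrDefault(p.CapabilityId)))
+        {
+            // Skip providers whose health check fails (e.g. local process not running).
+            var health = await candidate.Plugin.HealthCheckAsync(ct);
+            if (health.Status != HealthStatus.Unhealthy)
+                return candidate.Llm;
+        }
+
+        return null; // no usable provider
+    }
+}
+```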
+--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_100_008_PLUGIN_scm_rework.md b/docs/implplan/SPRINT_20260110_100_008_PLUGIN_scm_rework.md new file mode 100644 index 000000000..3aa251343 --- /dev/null +++ b/docs/implplan/SPRINT_20260110_100_008_PLUGIN_scm_rework.md @@ -0,0 +1,359 @@ +# SPRINT: SCM Connector Rework + +> **Sprint ID:** 100_008 +> **Module:** PLUGIN +> **Phase:** 100 - Plugin System Unification +> **Status:** TODO +> **Parent:** [100_000_INDEX](SPRINT_20260110_100_000_INDEX_plugin_unification.md) + +--- + +## Overview + +Rework all SCM connectors (GitHub, GitLab, Azure DevOps, Gitea, Bitbucket) to implement the unified plugin architecture with `IPlugin` and `IScmCapability` interfaces. + +### Objectives + +- Migrate GitHub connector to unified plugin model +- Migrate GitLab connector to unified plugin model +- Migrate Azure DevOps connector to unified plugin model +- Migrate Gitea connector to unified plugin model +- Add Bitbucket connector +- Preserve URL auto-detection +- Add health checks with API connectivity +- Add plugin manifests + +### Migration Tasks + +| Provider | Current Interface | New Implementation | Status | +|----------|-------------------|-------------------|--------| +| GitHub | `IScmConnectorPlugin` | `GitHubPlugin : IPlugin, IScmCapability` | TODO | +| GitLab | `IScmConnectorPlugin` | `GitLabPlugin : IPlugin, IScmCapability` | TODO | +| Azure DevOps | `IScmConnectorPlugin` | `AzureDevOpsPlugin : IPlugin, IScmCapability` | TODO | +| Gitea | `IScmConnectorPlugin` | `GiteaPlugin : IPlugin, IScmCapability` | TODO | +| Bitbucket | (new) | `BitbucketPlugin : IPlugin, IScmCapability` | TODO | + +--- + +## Deliverables + +### GitHub Plugin Implementation + +```csharp +// GitHubPlugin.cs +namespace StellaOps.Integrations.Plugin.GitHub; + +[Plugin( + id: "com.stellaops.scm.github", + name: "GitHub SCM Connector", + version: "1.0.0", + vendor: "Stella 
Ops")] +[ProvidesCapability(PluginCapabilities.Scm, CapabilityId = "github")] +public sealed class GitHubPlugin : IPlugin, IScmCapability +{ + private IPluginContext? _context; + private GitHubClient? _client; + private GitHubOptions? _options; + + public PluginInfo Info => new( + Id: "com.stellaops.scm.github", + Name: "GitHub SCM Connector", + Version: "1.0.0", + Vendor: "Stella Ops", + Description: "GitHub repository integration for source control operations"); + + public PluginTrustLevel TrustLevel => PluginTrustLevel.BuiltIn; + public PluginCapabilities Capabilities => PluginCapabilities.Scm | PluginCapabilities.Network; + public PluginLifecycleState State { get; private set; } = PluginLifecycleState.Discovered; + + public string ConnectorType => "scm.github"; + public string DisplayName => "GitHub"; + public string ScmType => "github"; + + private static readonly Regex GitHubUrlPattern = new( + @"^https?://(?:www\.)?github\.com/([^/]+)/([^/]+?)(?:\.git)?/?$", + RegexOptions.Compiled | RegexOptions.IgnoreCase); + + public async Task InitializeAsync(IPluginContext context, CancellationToken ct) + { + _context = context; + State = PluginLifecycleState.Initializing; + + _options = context.Configuration.Bind(); + + var token = await context.Configuration.GetSecretAsync("github-token", ct) + ?? 
_options.Token; + + _client = new GitHubClient(new ProductHeaderValue("StellaOps")) + { + Credentials = new Credentials(token) + }; + + if (!string.IsNullOrEmpty(_options.BaseUrl)) + { + _client = new GitHubClient( + new ProductHeaderValue("StellaOps"), + new Uri(_options.BaseUrl)) + { + Credentials = new Credentials(token) + }; + } + + State = PluginLifecycleState.Active; + context.Logger.Info("GitHub plugin initialized"); + } + + public async Task HealthCheckAsync(CancellationToken ct) + { + if (_client == null) + return HealthCheckResult.Unhealthy("Not initialized"); + + try + { + var user = await _client.User.Current(); + return HealthCheckResult.Healthy(details: new Dictionary + { + ["authenticatedAs"] = user.Login, + ["rateLimitRemaining"] = _client.GetLastApiInfo()?.RateLimit?.Remaining ?? -1 + }); + } + catch (Exception ex) + { + return HealthCheckResult.Unhealthy(ex); + } + } + + public bool CanHandle(string repositoryUrl) => GitHubUrlPattern.IsMatch(repositoryUrl); + + public async Task TestConnectionAsync(CancellationToken ct) + { + try + { + var sw = Stopwatch.StartNew(); + var user = await _client!.User.Current(); + sw.Stop(); + + return ConnectionTestResult.Succeeded(sw.Elapsed); + } + catch (Exception ex) + { + return ConnectionTestResult.Failed(ex.Message, ex); + } + } + + public async Task GetConnectionInfoAsync(CancellationToken ct) + { + var user = await _client!.User.Current(); + var apiInfo = _client.GetLastApiInfo(); + + return new ConnectionInfo( + EndpointUrl: _options?.BaseUrl ?? "https://api.github.com", + AuthenticatedAs: user.Login, + Metadata: new Dictionary + { + ["rateLimitRemaining"] = apiInfo?.RateLimit?.Remaining ?? -1, + ["rateLimitReset"] = apiInfo?.RateLimit?.Reset.ToString() ?? 
"" + }); + } + + public async Task> ListBranchesAsync(string repositoryUrl, CancellationToken ct) + { + var (owner, repo) = ParseRepositoryUrl(repositoryUrl); + var branches = await _client!.Repository.Branch.GetAll(owner, repo); + var defaultBranch = (await _client.Repository.Get(owner, repo)).DefaultBranch; + + return branches.Select(b => new ScmBranch( + Name: b.Name, + CommitSha: b.Commit.Sha, + IsDefault: b.Name == defaultBranch, + IsProtected: b.Protected)).ToList(); + } + + public async Task> ListCommitsAsync( + string repositoryUrl, + string branch, + int limit = 50, + CancellationToken ct = default) + { + var (owner, repo) = ParseRepositoryUrl(repositoryUrl); + var commits = await _client!.Repository.Commit.GetAll(owner, repo, + new CommitRequest { Sha = branch }, + new ApiOptions { PageSize = limit, PageCount = 1 }); + + return commits.Select(c => new ScmCommit( + Sha: c.Sha, + Message: c.Commit.Message, + AuthorName: c.Commit.Author.Name, + AuthorEmail: c.Commit.Author.Email, + AuthoredAt: c.Commit.Author.Date, + ParentShas: c.Parents.Select(p => p.Sha).ToList())).ToList(); + } + + public async Task GetCommitAsync(string repositoryUrl, string commitSha, CancellationToken ct) + { + var (owner, repo) = ParseRepositoryUrl(repositoryUrl); + var commit = await _client!.Repository.Commit.Get(owner, repo, commitSha); + + return new ScmCommit( + Sha: commit.Sha, + Message: commit.Commit.Message, + AuthorName: commit.Commit.Author.Name, + AuthorEmail: commit.Commit.Author.Email, + AuthoredAt: commit.Commit.Author.Date, + ParentShas: commit.Parents.Select(p => p.Sha).ToList()); + } + + public async Task GetFileAsync( + string repositoryUrl, + string filePath, + string? reference = null, + CancellationToken ct = default) + { + var (owner, repo) = ParseRepositoryUrl(repositoryUrl); + var content = await _client!.Repository.Content.GetAllContentsByRef(owner, repo, filePath, reference ?? 
"HEAD"); + var file = content.First(); + + return new ScmFileContent( + Path: file.Path, + Content: file.Content, + Encoding: file.Encoding.StringValue, + Sha: file.Sha, + Size: file.Size); + } + + public async Task GetArchiveAsync( + string repositoryUrl, + string reference, + ArchiveFormat format = ArchiveFormat.TarGz, + CancellationToken ct = default) + { + var (owner, repo) = ParseRepositoryUrl(repositoryUrl); + var archiveFormat = format == ArchiveFormat.Zip + ? Octokit.ArchiveFormat.Zipball + : Octokit.ArchiveFormat.Tarball; + + var bytes = await _client!.Repository.Content.GetArchive(owner, repo, archiveFormat, reference); + return new MemoryStream(bytes); + } + + public async Task UpsertWebhookAsync( + string repositoryUrl, + ScmWebhookConfig config, + CancellationToken ct) + { + var (owner, repo) = ParseRepositoryUrl(repositoryUrl); + + var existingHooks = await _client!.Repository.Hooks.GetAll(owner, repo); + var existing = existingHooks.FirstOrDefault(h => + h.Config.TryGetValue("url", out var url) && url == config.Url); + + if (existing != null) + { + var updated = await _client.Repository.Hooks.Edit(owner, repo, (int)existing.Id, + new EditRepositoryHook(config.Events.ToArray()) + { + Active = true, + Config = new Dictionary + { + ["url"] = config.Url, + ["secret"] = config.Secret, + ["content_type"] = "json" + } + }); + + return new ScmWebhook(updated.Id.ToString(), updated.Config["url"], updated.Events.ToList(), updated.Active); + } + + var created = await _client.Repository.Hooks.Create(owner, repo, new NewRepositoryHook("web", new Dictionary + { + ["url"] = config.Url, + ["secret"] = config.Secret, + ["content_type"] = "json" + }) + { + Events = config.Events.ToArray(), + Active = true + }); + + return new ScmWebhook(created.Id.ToString(), created.Config["url"], created.Events.ToList(), created.Active); + } + + public async Task GetCurrentUserAsync(CancellationToken ct) + { + var user = await _client!.User.Current(); + + return new ScmUser( + Id: 
user.Id.ToString(), + Username: user.Login, + DisplayName: user.Name, + Email: user.Email, + AvatarUrl: user.AvatarUrl); + } + + private static (string Owner, string Repo) ParseRepositoryUrl(string url) + { + var match = GitHubUrlPattern.Match(url); + if (!match.Success) + throw new ArgumentException($"Invalid GitHub repository URL: {url}"); + + return (match.Groups[1].Value, match.Groups[2].Value); + } + + public ValueTask DisposeAsync() + { + State = PluginLifecycleState.Stopped; + return ValueTask.CompletedTask; + } +} +``` + +--- + +## Acceptance Criteria + +- [ ] All SCM connectors implement `IPlugin` +- [ ] All SCM connectors implement `IScmCapability` +- [ ] URL auto-detection works for all providers +- [ ] Branch listing works +- [ ] Commit listing works +- [ ] File retrieval works +- [ ] Archive download works +- [ ] Webhook management works +- [ ] Health checks verify API connectivity +- [ ] Rate limit information exposed +- [ ] Unit tests migrated/updated +- [ ] Integration tests with mock APIs + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 100_001 Plugin Abstractions | Internal | TODO | +| 100_002 Plugin Host | Internal | TODO | +| Octokit | External | Available | +| GitLabApiClient | External | Available | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| GitHubPlugin | TODO | | +| GitLabPlugin | TODO | | +| AzureDevOpsPlugin | TODO | | +| GiteaPlugin | TODO | | +| BitbucketPlugin | TODO | New | +| ScmPluginBase | TODO | Shared base class | +| Plugin manifests | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_100_009_PLUGIN_scanner_rework.md b/docs/implplan/SPRINT_20260110_100_009_PLUGIN_scanner_rework.md new file mode 100644 index 000000000..a00a4457d --- /dev/null +++ 
b/docs/implplan/SPRINT_20260110_100_009_PLUGIN_scanner_rework.md @@ -0,0 +1,1156 @@ +# SPRINT: Scanner Analyzer Rework + +> **Sprint ID:** 100_009 +> **Module:** PLUGIN +> **Phase:** 100 - Plugin System Unification +> **Status:** TODO +> **Parent:** [100_000_INDEX](SPRINT_20260110_100_000_INDEX_plugin_unification.md) + +--- + +## Overview + +Rework all Scanner analyzers (11 language analyzers) to implement the unified plugin architecture with `IPlugin` and `IAnalysisCapability` interfaces. + +### Objectives + +- Migrate all 11 language analyzers to unified plugin model +- Migrate SBOM generators to unified plugin model +- Migrate binary analyzers to unified plugin model +- Preserve deterministic output guarantees +- Add health checks for analyzer availability +- Add plugin manifests +- Maintain backward compatibility with existing scan workflows + +### Current State + +``` +src/Scanner/ +├── __Libraries/ +│ └── StellaOps.Scanner.Analyzers/ +│ ├── Languages/ +│ │ ├── DotNetAnalyzer.cs # .NET/NuGet +│ │ ├── GoAnalyzer.cs # Go modules +│ │ ├── JavaAnalyzer.cs # Maven/Gradle +│ │ ├── JavaScriptAnalyzer.cs # npm/yarn/pnpm +│ │ ├── PythonAnalyzer.cs # pip/poetry/pipenv +│ │ ├── RubyAnalyzer.cs # Bundler/Gemfile +│ │ ├── RustAnalyzer.cs # Cargo +│ │ ├── PhpAnalyzer.cs # Composer +│ │ ├── SwiftAnalyzer.cs # Swift Package Manager +│ │ ├── CppAnalyzer.cs # Conan/vcpkg +│ │ └── ElixirAnalyzer.cs # Mix/Hex +│ ├── Binary/ +│ │ ├── ElfAnalyzer.cs +│ │ ├── PeAnalyzer.cs +│ │ └── MachOAnalyzer.cs +│ └── Sbom/ +│ ├── SpdxGenerator.cs +│ └── CycloneDxGenerator.cs +``` + +### Target State + +Each analyzer implements: +- `IPlugin` - Core plugin interface with lifecycle +- `IAnalysisCapability` - Analysis operations +- Health checks for tool availability +- Plugin manifest for discovery +- Deterministic output guarantees + +--- + +## Deliverables + +### Analysis Capability Interface + +```csharp +// IAnalysisCapability.cs +namespace StellaOps.Plugin.Abstractions.Capabilities; + +/// 
+/// Capability interface for container/dependency analysis. +/// +public interface IAnalysisCapability +{ + /// + /// Analyzer identifier (dotnet, go, java, etc.). + /// + string AnalyzerId { get; } + + /// + /// Analysis category (language, binary, sbom). + /// + AnalysisCategory Category { get; } + + /// + /// File patterns this analyzer can process. + /// + IReadOnlyList SupportedPatterns { get; } + + /// + /// Check if analyzer can handle specific file. + /// + bool CanAnalyze(string filePath); + + /// + /// Analyze a single file or directory. + /// + Task AnalyzeAsync( + AnalysisRequest request, + CancellationToken ct); + + /// + /// Batch analyze multiple targets. + /// + IAsyncEnumerable AnalyzeBatchAsync( + IReadOnlyList requests, + CancellationToken ct); +} + +public enum AnalysisCategory +{ + Language, + Binary, + Sbom, + Container +} + +public sealed record AnalysisRequest( + string TargetPath, + AnalysisOptions Options); + +public sealed record AnalysisOptions( + bool IncludeDevDependencies = false, + bool IncludeTransitive = true, + int MaxDepth = 100, + IReadOnlyList? ExcludePatterns = null, + IReadOnlyDictionary? Environment = null); + +public sealed record AnalysisResult( + string TargetPath, + string AnalyzerId, + bool Success, + IReadOnlyList Components, + IReadOnlyList Errors, + AnalysisMetadata Metadata); + +public sealed record DetectedComponent( + string Name, + string Version, + string? Purl, + ComponentType Type, + string? Ecosystem, + string? License, + string? SourceLocation, + IReadOnlyList DirectDependencies, + IReadOnlyDictionary? Metadata); + +public enum ComponentType +{ + Library, + Application, + Framework, + Container, + OperatingSystem, + Device, + File, + Data +} + +public sealed record AnalysisError( + string Code, + string Message, + string? FilePath, + int? 
Line, + AnalysisErrorSeverity Severity); + +public enum AnalysisErrorSeverity +{ + Warning, + Error, + Fatal +} + +public sealed record AnalysisMetadata( + DateTimeOffset AnalyzedAt, + TimeSpan Duration, + string AnalyzerVersion, + IReadOnlyDictionary? AdditionalInfo); +``` + +### Language Analyzer Base Class + +```csharp +// LanguageAnalyzerBase.cs +namespace StellaOps.Scanner.Plugin; + +/// +/// Base class for language-specific analyzers. +/// +public abstract class LanguageAnalyzerBase : IPlugin, IAnalysisCapability +{ + protected IPluginContext? Context { get; private set; } + + public abstract PluginInfo Info { get; } + public PluginTrustLevel TrustLevel => PluginTrustLevel.BuiltIn; + public PluginCapabilities Capabilities => PluginCapabilities.Analysis; + public PluginLifecycleState State { get; protected set; } = PluginLifecycleState.Discovered; + + public abstract string AnalyzerId { get; } + public AnalysisCategory Category => AnalysisCategory.Language; + public abstract IReadOnlyList SupportedPatterns { get; } + + public async Task InitializeAsync(IPluginContext context, CancellationToken ct) + { + Context = context; + State = PluginLifecycleState.Initializing; + + try + { + await InitializeAnalyzerAsync(context, ct); + State = PluginLifecycleState.Active; + context.Logger.Info("{AnalyzerId} analyzer initialized", AnalyzerId); + } + catch (Exception ex) + { + State = PluginLifecycleState.Failed; + context.Logger.Error(ex, "Failed to initialize {AnalyzerId} analyzer", AnalyzerId); + throw; + } + } + + protected virtual Task InitializeAnalyzerAsync(IPluginContext context, CancellationToken ct) + => Task.CompletedTask; + + public virtual async Task HealthCheckAsync(CancellationToken ct) + { + if (State != PluginLifecycleState.Active) + return HealthCheckResult.Unhealthy($"Analyzer is in state {State}"); + + try + { + // Check if required tools are available + var toolsAvailable = await CheckToolAvailabilityAsync(ct); + if (!toolsAvailable) + return 
HealthCheckResult.Degraded("Some analysis tools unavailable"); + + return HealthCheckResult.Healthy(); + } + catch (Exception ex) + { + return HealthCheckResult.Unhealthy(ex); + } + } + + protected virtual Task CheckToolAvailabilityAsync(CancellationToken ct) + => Task.FromResult(true); + + public virtual bool CanAnalyze(string filePath) + { + var fileName = Path.GetFileName(filePath); + return SupportedPatterns.Any(pattern => + FilePatternMatcher.Matches(fileName, pattern)); + } + + public abstract Task AnalyzeAsync( + AnalysisRequest request, + CancellationToken ct); + + public virtual async IAsyncEnumerable AnalyzeBatchAsync( + IReadOnlyList requests, + [EnumeratorCancellation] CancellationToken ct) + { + foreach (var request in requests) + { + ct.ThrowIfCancellationRequested(); + yield return await AnalyzeAsync(request, ct); + } + } + + protected void EnsureActive() + { + if (State != PluginLifecycleState.Active) + throw new InvalidOperationException($"{AnalyzerId} analyzer is not active (state: {State})"); + } + + public virtual ValueTask DisposeAsync() + { + State = PluginLifecycleState.Stopped; + return ValueTask.CompletedTask; + } +} +``` + +### .NET Analyzer Plugin Implementation + +```csharp +// DotNetAnalyzerPlugin.cs +namespace StellaOps.Scanner.Plugin.DotNet; + +[Plugin( + id: "com.stellaops.analyzer.dotnet", + name: ".NET Dependency Analyzer", + version: "1.0.0", + vendor: "Stella Ops")] +[ProvidesCapability(PluginCapabilities.Analysis, CapabilityId = "dotnet")] +public sealed class DotNetAnalyzerPlugin : LanguageAnalyzerBase +{ + private NuGetClient? _nugetClient; + private DotNetAnalyzerOptions? 
_options; + + public override PluginInfo Info => new( + Id: "com.stellaops.analyzer.dotnet", + Name: ".NET Dependency Analyzer", + Version: "1.0.0", + Vendor: "Stella Ops", + Description: "Analyzes .NET projects for NuGet dependencies"); + + public override string AnalyzerId => "dotnet"; + + public override IReadOnlyList SupportedPatterns => new[] + { + "*.csproj", + "*.fsproj", + "*.vbproj", + "*.sln", + "packages.config", + "*.deps.json", + "Directory.Packages.props", + "global.json" + }; + + protected override async Task InitializeAnalyzerAsync(IPluginContext context, CancellationToken ct) + { + _options = context.Configuration.Bind(); + + // Initialize NuGet client for metadata enrichment + _nugetClient = new NuGetClient( + _options.NuGetSources ?? new[] { "https://api.nuget.org/v3/index.json" }, + context.Logger); + + await _nugetClient.InitializeAsync(ct); + } + + protected override async Task CheckToolAvailabilityAsync(CancellationToken ct) + { + // Check if dotnet CLI is available + try + { + var result = await ProcessRunner.RunAsync("dotnet", "--version", ct); + return result.ExitCode == 0; + } + catch + { + return false; + } + } + + public override async Task AnalyzeAsync( + AnalysisRequest request, + CancellationToken ct) + { + EnsureActive(); + + var sw = Stopwatch.StartNew(); + var components = new List(); + var errors = new List(); + + try + { + Context!.Logger.Debug("Analyzing .NET project: {Path}", request.TargetPath); + + var fileType = DetermineFileType(request.TargetPath); + + components = fileType switch + { + DotNetFileType.Project => await AnalyzeProjectAsync(request, ct), + DotNetFileType.Solution => await AnalyzeSolutionAsync(request, ct), + DotNetFileType.PackagesConfig => await AnalyzePackagesConfigAsync(request, ct), + DotNetFileType.DepsJson => await AnalyzeDepsJsonAsync(request, ct), + DotNetFileType.DirectoryPackagesProps => await AnalyzeCentralPackageManagementAsync(request, ct), + _ => throw new NotSupportedException($"Unsupported 
file type: {request.TargetPath}") + }; + + // Enrich with NuGet metadata if enabled + if (_options!.EnrichMetadata) + { + components = await EnrichComponentsAsync(components, ct); + } + + // Sort for deterministic output + components = components + .OrderBy(c => c.Name, StringComparer.OrdinalIgnoreCase) + .ThenBy(c => c.Version, StringComparer.OrdinalIgnoreCase) + .ToList(); + + sw.Stop(); + + return new AnalysisResult( + TargetPath: request.TargetPath, + AnalyzerId: AnalyzerId, + Success: true, + Components: components, + Errors: errors, + Metadata: new AnalysisMetadata( + AnalyzedAt: Context.TimeProvider.GetUtcNow(), + Duration: sw.Elapsed, + AnalyzerVersion: Info.Version, + AdditionalInfo: new Dictionary + { + ["fileType"] = fileType.ToString(), + ["componentCount"] = components.Count + })); + } + catch (Exception ex) + { + sw.Stop(); + Context!.Logger.Error(ex, "Failed to analyze {Path}", request.TargetPath); + + errors.Add(new AnalysisError( + Code: "DOTNET001", + Message: ex.Message, + FilePath: request.TargetPath, + Line: null, + Severity: AnalysisErrorSeverity.Error)); + + return new AnalysisResult( + TargetPath: request.TargetPath, + AnalyzerId: AnalyzerId, + Success: false, + Components: components, + Errors: errors, + Metadata: new AnalysisMetadata( + AnalyzedAt: Context.TimeProvider.GetUtcNow(), + Duration: sw.Elapsed, + AnalyzerVersion: Info.Version, + AdditionalInfo: null)); + } + } + + private async Task> AnalyzeProjectAsync( + AnalysisRequest request, + CancellationToken ct) + { + var components = new List(); + + // Use dotnet list package --format json for accurate dependency resolution + var result = await ProcessRunner.RunAsync( + "dotnet", + $"list \"{request.TargetPath}\" package --format json" + + (request.Options.IncludeTransitive ? 
" --include-transitive" : ""), + ct, + workingDirectory: Path.GetDirectoryName(request.TargetPath)); + + if (result.ExitCode != 0) + { + // Fallback to parsing project file directly + return await ParseProjectFileAsync(request.TargetPath, ct); + } + + var packageData = JsonSerializer.Deserialize(result.Output); + if (packageData?.Projects == null) return components; + + foreach (var project in packageData.Projects) + { + foreach (var framework in project.Frameworks ?? Enumerable.Empty()) + { + foreach (var pkg in framework.TopLevelPackages ?? Enumerable.Empty()) + { + components.Add(CreateComponent(pkg, isDirect: true)); + } + + if (request.Options.IncludeTransitive) + { + foreach (var pkg in framework.TransitivePackages ?? Enumerable.Empty()) + { + components.Add(CreateComponent(pkg, isDirect: false)); + } + } + } + } + + return components; + } + + private async Task> AnalyzeSolutionAsync( + AnalysisRequest request, + CancellationToken ct) + { + var components = new List(); + + // Parse solution file to find all projects + var solutionContent = await File.ReadAllTextAsync(request.TargetPath, ct); + var projectPaths = SolutionParser.ExtractProjectPaths(solutionContent, Path.GetDirectoryName(request.TargetPath)!); + + foreach (var projectPath in projectPaths) + { + if (!File.Exists(projectPath)) continue; + + var projectRequest = request with { TargetPath = projectPath }; + var projectComponents = await AnalyzeProjectAsync(projectRequest, ct); + components.AddRange(projectComponents); + } + + // Deduplicate by name+version + return components + .GroupBy(c => (c.Name, c.Version)) + .Select(g => g.First()) + .ToList(); + } + + private async Task> AnalyzePackagesConfigAsync( + AnalysisRequest request, + CancellationToken ct) + { + var components = new List(); + var content = await File.ReadAllTextAsync(request.TargetPath, ct); + var doc = XDocument.Parse(content); + + foreach (var package in doc.Descendants("package")) + { + var id = package.Attribute("id")?.Value; + 
var version = package.Attribute("version")?.Value; + + if (string.IsNullOrEmpty(id) || string.IsNullOrEmpty(version)) continue; + + components.Add(new DetectedComponent( + Name: id, + Version: version, + Purl: $"pkg:nuget/{id}@{version}", + Type: ComponentType.Library, + Ecosystem: "nuget", + License: null, + SourceLocation: request.TargetPath, + DirectDependencies: Array.Empty(), + Metadata: null)); + } + + return components; + } + + private async Task> AnalyzeDepsJsonAsync( + AnalysisRequest request, + CancellationToken ct) + { + var components = new List(); + var content = await File.ReadAllTextAsync(request.TargetPath, ct); + var depsJson = JsonSerializer.Deserialize(content); + + if (depsJson?.Libraries == null) return components; + + foreach (var (key, library) in depsJson.Libraries) + { + var parts = key.Split('/'); + if (parts.Length != 2) continue; + + var name = parts[0]; + var version = parts[1]; + + // Skip runtime libraries + if (library.Type == "project") continue; + + components.Add(new DetectedComponent( + Name: name, + Version: version, + Purl: $"pkg:nuget/{name}@{version}", + Type: ComponentType.Library, + Ecosystem: "nuget", + License: null, + SourceLocation: request.TargetPath, + DirectDependencies: Array.Empty(), + Metadata: library.Sha512 != null + ? 
new Dictionary { ["sha512"] = library.Sha512 } + : null)); + } + + return components; + } + + private async Task> AnalyzeCentralPackageManagementAsync( + AnalysisRequest request, + CancellationToken ct) + { + var components = new List(); + var content = await File.ReadAllTextAsync(request.TargetPath, ct); + var doc = XDocument.Parse(content); + + foreach (var packageVersion in doc.Descendants("PackageVersion")) + { + var include = packageVersion.Attribute("Include")?.Value; + var version = packageVersion.Attribute("Version")?.Value; + + if (string.IsNullOrEmpty(include) || string.IsNullOrEmpty(version)) continue; + + components.Add(new DetectedComponent( + Name: include, + Version: version, + Purl: $"pkg:nuget/{include}@{version}", + Type: ComponentType.Library, + Ecosystem: "nuget", + License: null, + SourceLocation: request.TargetPath, + DirectDependencies: Array.Empty(), + Metadata: new Dictionary { ["centrallyManaged"] = "true" })); + } + + return components; + } + + private async Task> EnrichComponentsAsync( + List components, + CancellationToken ct) + { + var enriched = new List(); + + foreach (var component in components) + { + try + { + var metadata = await _nugetClient!.GetPackageMetadataAsync( + component.Name, + component.Version, + ct); + + enriched.Add(component with + { + License = metadata?.LicenseExpression ?? metadata?.LicenseUrl, + Metadata = MergeMetadata(component.Metadata, new Dictionary + { + ["description"] = metadata?.Description ?? "", + ["projectUrl"] = metadata?.ProjectUrl ?? "", + ["authors"] = metadata?.Authors ?? "" + }) + }); + } + catch + { + enriched.Add(component); + } + } + + return enriched; + } + + private DetectedComponent CreateComponent(PackageInfo pkg, bool isDirect) + { + return new DetectedComponent( + Name: pkg.Id, + Version: pkg.ResolvedVersion ?? pkg.RequestedVersion, + Purl: $"pkg:nuget/{pkg.Id}@{pkg.ResolvedVersion ?? 
pkg.RequestedVersion}", + Type: ComponentType.Library, + Ecosystem: "nuget", + License: null, + SourceLocation: null, + DirectDependencies: Array.Empty(), + Metadata: new Dictionary + { + ["isDirect"] = isDirect.ToString(), + ["requestedVersion"] = pkg.RequestedVersion ?? "" + }); + } + + private static DotNetFileType DetermineFileType(string path) + { + var fileName = Path.GetFileName(path); + return fileName.ToLowerInvariant() switch + { + "packages.config" => DotNetFileType.PackagesConfig, + "directory.packages.props" => DotNetFileType.DirectoryPackagesProps, + "global.json" => DotNetFileType.GlobalJson, + var f when f.EndsWith(".deps.json") => DotNetFileType.DepsJson, + var f when f.EndsWith(".sln") => DotNetFileType.Solution, + var f when f.EndsWith("proj") => DotNetFileType.Project, + _ => DotNetFileType.Unknown + }; + } + + private static IReadOnlyDictionary? MergeMetadata( + IReadOnlyDictionary? existing, + Dictionary additional) + { + if (existing == null) return additional; + var merged = new Dictionary(existing); + foreach (var (key, value) in additional) + { + merged[key] = value; + } + return merged; + } + + private enum DotNetFileType + { + Unknown, + Project, + Solution, + PackagesConfig, + DepsJson, + DirectoryPackagesProps, + GlobalJson + } + + public override async ValueTask DisposeAsync() + { + if (_nugetClient != null) + { + await _nugetClient.DisposeAsync(); + _nugetClient = null; + } + await base.DisposeAsync(); + } +} + +public sealed class DotNetAnalyzerOptions +{ + public string[]? 
NuGetSources { get; set; } + public bool EnrichMetadata { get; set; } = true; +} +``` + +### Go Analyzer Plugin Implementation + +```csharp +// GoAnalyzerPlugin.cs +namespace StellaOps.Scanner.Plugin.Go; + +[Plugin( + id: "com.stellaops.analyzer.go", + name: "Go Module Analyzer", + version: "1.0.0", + vendor: "Stella Ops")] +[ProvidesCapability(PluginCapabilities.Analysis, CapabilityId = "go")] +public sealed class GoAnalyzerPlugin : LanguageAnalyzerBase +{ + public override PluginInfo Info => new( + Id: "com.stellaops.analyzer.go", + Name: "Go Module Analyzer", + Version: "1.0.0", + Vendor: "Stella Ops", + Description: "Analyzes Go modules for dependencies"); + + public override string AnalyzerId => "go"; + + public override IReadOnlyList SupportedPatterns => new[] + { + "go.mod", + "go.sum", + "Gopkg.lock", + "Gopkg.toml", + "vendor/modules.txt" + }; + + protected override async Task CheckToolAvailabilityAsync(CancellationToken ct) + { + try + { + var result = await ProcessRunner.RunAsync("go", "version", ct); + return result.ExitCode == 0; + } + catch + { + return false; + } + } + + public override async Task AnalyzeAsync( + AnalysisRequest request, + CancellationToken ct) + { + EnsureActive(); + + var sw = Stopwatch.StartNew(); + var components = new List(); + var errors = new List(); + + try + { + var fileName = Path.GetFileName(request.TargetPath); + + if (fileName == "go.mod") + { + components = await AnalyzeGoModAsync(request, ct); + } + else if (fileName == "go.sum") + { + components = await AnalyzeGoSumAsync(request, ct); + } + else if (fileName == "Gopkg.lock") + { + components = await AnalyzeDepLockAsync(request, ct); + } + else if (fileName == "modules.txt") + { + components = await AnalyzeVendorModulesAsync(request, ct); + } + + // Sort for deterministic output + components = components + .OrderBy(c => c.Name, StringComparer.OrdinalIgnoreCase) + .ThenBy(c => c.Version, StringComparer.OrdinalIgnoreCase) + .ToList(); + + sw.Stop(); + + return new 
+            AnalysisResult(
+                TargetPath: request.TargetPath,
+                AnalyzerId: AnalyzerId,
+                Success: true,
+                Components: components,
+                Errors: errors,
+                Metadata: new AnalysisMetadata(
+                    AnalyzedAt: Context!.TimeProvider.GetUtcNow(),
+                    Duration: sw.Elapsed,
+                    AnalyzerVersion: Info.Version,
+                    AdditionalInfo: new Dictionary<string, object>
+                    {
+                        ["componentCount"] = components.Count
+                    }));
+        }
+        catch (Exception ex)
+        {
+            sw.Stop();
+            Context!.Logger.Error(ex, "Failed to analyze {Path}", request.TargetPath);
+
+            errors.Add(new AnalysisError(
+                Code: "GO001",
+                Message: ex.Message,
+                FilePath: request.TargetPath,
+                Line: null,
+                Severity: AnalysisErrorSeverity.Error));
+
+            return new AnalysisResult(
+                TargetPath: request.TargetPath,
+                AnalyzerId: AnalyzerId,
+                Success: false,
+                Components: components,
+                Errors: errors,
+                Metadata: new AnalysisMetadata(
+                    AnalyzedAt: Context.TimeProvider.GetUtcNow(),
+                    Duration: sw.Elapsed,
+                    AnalyzerVersion: Info.Version,
+                    AdditionalInfo: null));
+        }
+    }
+
+    private async Task<List<DetectedComponent>> AnalyzeGoModAsync(
+        AnalysisRequest request,
+        CancellationToken ct)
+    {
+        var components = new List<DetectedComponent>();
+        var workDir = Path.GetDirectoryName(request.TargetPath)!;
+
+        // Use go list for accurate dependency resolution
+        var result = await ProcessRunner.RunAsync(
+            "go",
+            "list -m -json all",
+            ct,
+            workingDirectory: workDir,
+            environment: new Dictionary<string, string>
+            {
+                ["GO111MODULE"] = "on"
+            });
+
+        if (result.ExitCode == 0)
+        {
+            // `go list -m -json all` emits a stream of pretty-printed JSON
+            // objects (one per module), not one object per line, so buffer
+            // lines until the closing brace at column 0 completes an object.
+            var buffer = new StringBuilder();
+            foreach (var line in result.Output.Split('\n'))
+            {
+                buffer.AppendLine(line);
+                if (line.TrimEnd('\r') != "}") continue;
+
+                var module = JsonSerializer.Deserialize<GoModule>(buffer.ToString());
+                buffer.Clear();
+                if (module?.Path == null || module.Main) continue;
+
+                components.Add(new DetectedComponent(
+                    Name: module.Path,
+                    Version: module.Version ?? "unknown",
+                    Purl: CreateGoPurl(module.Path, module.Version),
+                    Type: ComponentType.Library,
+                    Ecosystem: "go",
+                    License: null,
+                    SourceLocation: request.TargetPath,
+                    DirectDependencies: Array.Empty<string>(),
+                    Metadata: module.Replace != null
+                        ? new Dictionary<string, string>
+                        {
+                            ["replacedBy"] = module.Replace.Path ?? "",
+                            ["replacedVersion"] = module.Replace.Version ?? ""
+                        }
+                        : null));
+            }
+        }
+        else
+        {
+            // Fallback: parse go.mod directly
+            components = await ParseGoModFileAsync(request.TargetPath, ct);
+        }
+
+        return components;
+    }
+
+    private async Task<List<DetectedComponent>> ParseGoModFileAsync(
+        string path,
+        CancellationToken ct)
+    {
+        var components = new List<DetectedComponent>();
+        var lines = await File.ReadAllLinesAsync(path, ct);
+        var inRequireBlock = false;
+
+        foreach (var line in lines)
+        {
+            var trimmed = line.Trim();
+
+            if (trimmed.StartsWith("require ("))
+            {
+                inRequireBlock = true;
+                continue;
+            }
+            if (trimmed == ")")
+            {
+                inRequireBlock = false;
+                continue;
+            }
+
+            if (inRequireBlock || trimmed.StartsWith("require "))
+            {
+                var requireLine = inRequireBlock ? trimmed : trimmed[8..].Trim();
+                var parts = requireLine.Split(' ', StringSplitOptions.RemoveEmptyEntries);
+
+                if (parts.Length >= 2)
+                {
+                    var modulePath = parts[0];
+                    var version = parts[1];
+
+                    // Record whether the dependency is direct or transitive
+                    var isIndirect = requireLine.Contains("// indirect");
+
+                    components.Add(new DetectedComponent(
+                        Name: modulePath,
+                        Version: version,
+                        Purl: CreateGoPurl(modulePath, version),
+                        Type: ComponentType.Library,
+                        Ecosystem: "go",
+                        License: null,
+                        SourceLocation: path,
+                        DirectDependencies: Array.Empty<string>(),
+                        Metadata: new Dictionary<string, string>
+                        {
+                            ["isDirect"] = (!isIndirect).ToString()
+                        }));
+                }
+            }
+        }
+
+        return components;
+    }
+
+    private async Task<List<DetectedComponent>> AnalyzeGoSumAsync(
+        AnalysisRequest request,
+        CancellationToken ct)
+    {
+        var components = new List<DetectedComponent>();
+        var lines = await File.ReadAllLinesAsync(request.TargetPath, ct);
+
+        foreach (var line in lines)
+        {
+            var parts = line.Split(' ', StringSplitOptions.RemoveEmptyEntries);
+            if (parts.Length < 3) continue;
+
+            var modulePath = parts[0];
+
+            // go.sum carries two entries per module: one for the module zip
+            // and one for its go.mod file; skip the "/go.mod" entries before
+            // reading the version, otherwise the check can never fire.
+            if (parts[1].EndsWith("/go.mod")) continue;
+
+            var version = parts[1];
+            var hash = parts[2];
+
+            components.Add(new DetectedComponent(
+                Name: modulePath,
+                Version: version,
+                Purl: CreateGoPurl(modulePath, version),
+                Type: ComponentType.Library,
+                Ecosystem: "go",
+                License: null,
+                SourceLocation: request.TargetPath,
+                DirectDependencies: Array.Empty<string>(),
+                Metadata: new Dictionary<string, string>
+                {
+                    ["h1"] = hash
+                }));
+        }
+
+        // Deduplicate by module path + version
+        return components
+            .GroupBy(c => (c.Name, c.Version))
+            .Select(g => g.First())
+            .ToList();
+    }
+
+    private async Task<List<DetectedComponent>> AnalyzeDepLockAsync(
+        AnalysisRequest request,
+        CancellationToken ct)
+    {
+        var components = new List<DetectedComponent>();
+        var content = await File.ReadAllTextAsync(request.TargetPath, ct);
+
+        // Parse TOML lock file
+        var toml = Toml.Parse(content);
+
+        foreach (var project in
+            toml.GetTableArray("projects"))
+        {
+            var name = project.GetString("name");
+            var version = project.GetString("version");
+            var revision = project.GetString("revision");
+
+            if (string.IsNullOrEmpty(name)) continue;
+
+            components.Add(new DetectedComponent(
+                Name: name,
+                Version: version ?? revision ?? "unknown",
+                Purl: CreateGoPurl(name, version ?? revision),
+                Type: ComponentType.Library,
+                Ecosystem: "go",
+                License: null,
+                SourceLocation: request.TargetPath,
+                DirectDependencies: Array.Empty<string>(),
+                Metadata: revision != null
+                    ? new Dictionary<string, string> { ["revision"] = revision }
+                    : null));
+        }
+
+        return components;
+    }
+
+    private async Task<List<DetectedComponent>> AnalyzeVendorModulesAsync(
+        AnalysisRequest request,
+        CancellationToken ct)
+    {
+        var components = new List<DetectedComponent>();
+        var lines = await File.ReadAllLinesAsync(request.TargetPath, ct);
+
+        foreach (var line in lines)
+        {
+            if (!line.StartsWith("# ")) continue;
+
+            var parts = line[2..].Split(' ', StringSplitOptions.RemoveEmptyEntries);
+            if (parts.Length < 2) continue;
+
+            var modulePath = parts[0];
+            var version = parts[1];
+
+            components.Add(new DetectedComponent(
+                Name: modulePath,
+                Version: version,
+                Purl: CreateGoPurl(modulePath, version),
+                Type: ComponentType.Library,
+                Ecosystem: "go",
+                License: null,
+                SourceLocation: request.TargetPath,
+                DirectDependencies: Array.Empty<string>(),
+                Metadata: new Dictionary<string, string>
+                {
+                    ["vendored"] = "true"
+                }));
+        }
+
+        return components;
+    }
+
+    private static string CreateGoPurl(string modulePath, string? version)
+    {
+        // Go purls preserve the module path structure: percent-encode each
+        // path segment individually so the '/' separators survive.
+        var encoded = string.Join('/',
+            modulePath.ToLowerInvariant().Split('/').Select(Uri.EscapeDataString));
+        return version != null
+            ? $"pkg:golang/{encoded}@{version}"
+            : $"pkg:golang/{encoded}";
+    }
+
+    private sealed record GoModule(
+        string? Path,
+        string? Version,
+        bool Main,
+        GoModule?
+            Replace);
+}
+```
+
+### Migration Tasks
+
+| Analyzer | Current Interface | New Implementation | Status |
+|----------|-------------------|--------------------|--------|
+| DotNet | `ILanguageAnalyzer` | `DotNetAnalyzerPlugin : IPlugin, IAnalysisCapability` | TODO |
+| Go | `ILanguageAnalyzer` | `GoAnalyzerPlugin : IPlugin, IAnalysisCapability` | TODO |
+| Java | `ILanguageAnalyzer` | `JavaAnalyzerPlugin : IPlugin, IAnalysisCapability` | TODO |
+| JavaScript | `ILanguageAnalyzer` | `JavaScriptAnalyzerPlugin : IPlugin, IAnalysisCapability` | TODO |
+| Python | `ILanguageAnalyzer` | `PythonAnalyzerPlugin : IPlugin, IAnalysisCapability` | TODO |
+| Ruby | `ILanguageAnalyzer` | `RubyAnalyzerPlugin : IPlugin, IAnalysisCapability` | TODO |
+| Rust | `ILanguageAnalyzer` | `RustAnalyzerPlugin : IPlugin, IAnalysisCapability` | TODO |
+| PHP | `ILanguageAnalyzer` | `PhpAnalyzerPlugin : IPlugin, IAnalysisCapability` | TODO |
+| Swift | `ILanguageAnalyzer` | `SwiftAnalyzerPlugin : IPlugin, IAnalysisCapability` | TODO |
+| C++ | `ILanguageAnalyzer` | `CppAnalyzerPlugin : IPlugin, IAnalysisCapability` | TODO |
+| Elixir | `ILanguageAnalyzer` | `ElixirAnalyzerPlugin : IPlugin, IAnalysisCapability` | TODO |
+| ELF Binary | `IBinaryAnalyzer` | `ElfAnalyzerPlugin : IPlugin, IAnalysisCapability` | TODO |
+| PE Binary | `IBinaryAnalyzer` | `PeAnalyzerPlugin : IPlugin, IAnalysisCapability` | TODO |
+| Mach-O Binary | `IBinaryAnalyzer` | `MachOAnalyzerPlugin : IPlugin, IAnalysisCapability` | TODO |
+| SPDX Gen | `ISbomGenerator` | `SpdxGeneratorPlugin : IPlugin, ISbomCapability` | TODO |
+| CycloneDX Gen | `ISbomGenerator` | `CycloneDxGeneratorPlugin : IPlugin, ISbomCapability` | TODO |
+
+---
+
+## Acceptance Criteria
+
+- [ ] All 11 language analyzers implement `IPlugin`
+- [ ] All 11 language analyzers implement `IAnalysisCapability`
+- [ ] Binary analyzers (ELF, PE, Mach-O) implement plugin interfaces
+- [ ] SBOM generators implement plugin interfaces
+- [ ] Deterministic output maintained (sorted components)
+- [ ] Health checks verify tool availability
+- [ ] Plugin manifests for all analyzers
+- [ ] Backward compatibility with Scanner service
+- [ ] Unit tests migrated/updated
+- [ ] Integration tests with real packages
+
+---
+
+## Dependencies
+
+| Dependency | Type | Status |
+|------------|------|--------|
+| 100_001 Plugin Abstractions | Internal | TODO |
+| 100_002 Plugin Host | Internal | TODO |
+| NuGet.Protocol | External | Available |
+| Tomlyn | External | Available |
+
+---
+
+## Delivery Tracker
+
+| Deliverable | Status | Notes |
+|-------------|--------|-------|
+| IAnalysisCapability interface | TODO | |
+| LanguageAnalyzerBase | TODO | |
+| DotNetAnalyzerPlugin | TODO | |
+| GoAnalyzerPlugin | TODO | |
+| JavaAnalyzerPlugin | TODO | Maven + Gradle |
+| JavaScriptAnalyzerPlugin | TODO | npm + yarn + pnpm |
+| PythonAnalyzerPlugin | TODO | pip + poetry + pipenv |
+| RubyAnalyzerPlugin | TODO | Bundler |
+| RustAnalyzerPlugin | TODO | Cargo |
+| PhpAnalyzerPlugin | TODO | Composer |
+| SwiftAnalyzerPlugin | TODO | SPM |
+| CppAnalyzerPlugin | TODO | Conan + vcpkg |
+| ElixirAnalyzerPlugin | TODO | Mix/Hex |
+| ElfAnalyzerPlugin | TODO | |
+| PeAnalyzerPlugin | TODO | |
+| MachOAnalyzerPlugin | TODO | |
+| SpdxGeneratorPlugin | TODO | |
+| CycloneDxGeneratorPlugin | TODO | |
+| Plugin manifests | TODO | |
+| Unit tests | TODO | |
+
+---
+
+## Execution Log
+
+| Date | Entry |
+|------|-------|
+| 10-Jan-2026 | Sprint created |
diff --git a/docs/implplan/SPRINT_20260110_100_010_PLUGIN_router_rework.md b/docs/implplan/SPRINT_20260110_100_010_PLUGIN_router_rework.md
new file mode 100644
index 000000000..489eac6c8
--- /dev/null
+++ b/docs/implplan/SPRINT_20260110_100_010_PLUGIN_router_rework.md
@@ -0,0 +1,1129 @@
+# SPRINT: Router Transport Rework
+
+> **Sprint ID:** 100_010
+> **Module:** PLUGIN
+> **Phase:** 100 - Plugin System Unification
+> **Status:** TODO
+> **Parent:**
[100_000_INDEX](SPRINT_20260110_100_000_INDEX_plugin_unification.md) + +--- + +## Overview + +Rework all Router transport providers (TCP/TLS, UDP, RabbitMQ, Valkey) to implement the unified plugin architecture with `IPlugin` and `ITransportCapability` interfaces. + +### Objectives + +- Migrate TCP/TLS transport to unified plugin model +- Migrate UDP transport to unified plugin model +- Migrate RabbitMQ transport to unified plugin model +- Migrate Valkey (Redis) transport to unified plugin model +- Preserve message routing semantics +- Add health checks with connectivity verification +- Add plugin manifests +- Support hot-swap transport configuration + +### Current State + +``` +src/Router/ +├── __Libraries/ +│ └── StellaOps.Router.Core/ +│ └── Transports/ +│ ├── TcpTransport.cs +│ ├── TlsTransport.cs +│ ├── UdpTransport.cs +│ ├── RabbitMqTransport.cs +│ └── ValkeyTransport.cs +``` + +### Target State + +Each transport implements: +- `IPlugin` - Core plugin interface with lifecycle +- `ITransportCapability` - Message transport operations +- Health checks for connectivity +- Plugin manifest for discovery +- Connection pooling and resilience + +--- + +## Deliverables + +### Transport Capability Interface + +```csharp +// ITransportCapability.cs +namespace StellaOps.Plugin.Abstractions.Capabilities; + +/// +/// Capability interface for message transport. +/// +public interface ITransportCapability +{ + /// + /// Transport identifier (tcp, tls, udp, rabbitmq, valkey). + /// + string TransportId { get; } + + /// + /// Transport protocol type. + /// + TransportProtocol Protocol { get; } + + /// + /// Whether this transport supports pub/sub patterns. + /// + bool SupportsPubSub { get; } + + /// + /// Whether this transport supports request/reply patterns. + /// + bool SupportsRequestReply { get; } + + /// + /// Whether this transport supports message queuing. + /// + bool SupportsQueuing { get; } + + /// + /// Create a connection to a destination. 
+ /// + Task ConnectAsync( + TransportEndpoint endpoint, + TransportConnectionOptions options, + CancellationToken ct); + + /// + /// Create a listener on an endpoint. + /// + Task ListenAsync( + TransportEndpoint endpoint, + TransportListenerOptions options, + CancellationToken ct); + + /// + /// Subscribe to a topic/channel (for pub/sub transports). + /// + Task SubscribeAsync( + string topic, + TransportSubscriptionOptions options, + CancellationToken ct); +} + +public enum TransportProtocol +{ + Tcp, + Tls, + Udp, + Amqp, + Redis +} + +public sealed record TransportEndpoint( + string Host, + int Port, + string? Path = null, + IReadOnlyDictionary? Parameters = null); + +public sealed record TransportConnectionOptions( + TimeSpan ConnectTimeout = default, + TimeSpan ReadTimeout = default, + TimeSpan WriteTimeout = default, + int MaxRetries = 3, + TimeSpan RetryDelay = default, + bool KeepAlive = true, + int BufferSize = 65536, + TlsOptions? Tls = null); + +public sealed record TlsOptions( + string? CertificatePath = null, + string? CertificatePassword = null, + bool ValidateServerCertificate = true, + string? ServerName = null, + IReadOnlyList? AllowedCipherSuites = null); + +public sealed record TransportListenerOptions( + int Backlog = 100, + int MaxConnections = 1000, + TimeSpan IdleTimeout = default, + TlsOptions? Tls = null); + +public sealed record TransportSubscriptionOptions( + string? ConsumerGroup = null, + int PrefetchCount = 10, + bool AutoAck = false, + TimeSpan? AckTimeout = null); + +/// +/// Represents an active transport connection. +/// +public interface ITransportConnection : IAsyncDisposable +{ + /// + /// Connection identifier. + /// + string ConnectionId { get; } + + /// + /// Remote endpoint. + /// + TransportEndpoint RemoteEndpoint { get; } + + /// + /// Connection state. + /// + TransportConnectionState State { get; } + + /// + /// Send a message. 
+ /// + Task SendAsync(TransportMessage message, CancellationToken ct); + + /// + /// Receive a message (blocking). + /// + Task ReceiveAsync(CancellationToken ct); + + /// + /// Stream of incoming messages. + /// + IAsyncEnumerable ReceiveStreamAsync(CancellationToken ct); + + /// + /// Request-reply pattern. + /// + Task RequestAsync( + TransportMessage request, + TimeSpan timeout, + CancellationToken ct); +} + +public enum TransportConnectionState +{ + Connecting, + Connected, + Disconnected, + Failed +} + +/// +/// Represents a transport listener accepting connections. +/// +public interface ITransportListener : IAsyncDisposable +{ + /// + /// Local endpoint being listened on. + /// + TransportEndpoint LocalEndpoint { get; } + + /// + /// Accept incoming connections. + /// + IAsyncEnumerable AcceptAsync(CancellationToken ct); +} + +/// +/// Represents a pub/sub subscription. +/// +public interface ITransportSubscription : IAsyncDisposable +{ + /// + /// Subscription topic. + /// + string Topic { get; } + + /// + /// Stream of incoming messages. + /// + IAsyncEnumerable MessagesAsync(CancellationToken ct); + + /// + /// Acknowledge a message. + /// + Task AcknowledgeAsync(string messageId, CancellationToken ct); + + /// + /// Negative acknowledge (requeue) a message. + /// + Task NegativeAcknowledgeAsync(string messageId, CancellationToken ct); +} + +/// +/// Transport message envelope. +/// +public sealed record TransportMessage( + string Id, + ReadOnlyMemory Payload, + string? ContentType = null, + string? CorrelationId = null, + string? ReplyTo = null, + IReadOnlyDictionary? Headers = null, + DateTimeOffset? Timestamp = null, + TimeSpan? 
Ttl = null); +``` + +### TCP/TLS Transport Plugin Implementation + +```csharp +// TcpTransportPlugin.cs +namespace StellaOps.Router.Plugin.Tcp; + +[Plugin( + id: "com.stellaops.transport.tcp", + name: "TCP Transport", + version: "1.0.0", + vendor: "Stella Ops")] +[ProvidesCapability(PluginCapabilities.Transport, CapabilityId = "tcp")] +public sealed class TcpTransportPlugin : IPlugin, ITransportCapability +{ + private IPluginContext? _context; + private TcpTransportOptions? _options; + private readonly ConcurrentDictionary _connections = new(); + + public PluginInfo Info => new( + Id: "com.stellaops.transport.tcp", + Name: "TCP Transport", + Version: "1.0.0", + Vendor: "Stella Ops", + Description: "Raw TCP transport for high-performance internal messaging"); + + public PluginTrustLevel TrustLevel => PluginTrustLevel.BuiltIn; + public PluginCapabilities Capabilities => PluginCapabilities.Transport | PluginCapabilities.Network; + public PluginLifecycleState State { get; private set; } = PluginLifecycleState.Discovered; + + public string TransportId => "tcp"; + public TransportProtocol Protocol => TransportProtocol.Tcp; + public bool SupportsPubSub => false; + public bool SupportsRequestReply => true; + public bool SupportsQueuing => false; + + public Task InitializeAsync(IPluginContext context, CancellationToken ct) + { + _context = context; + State = PluginLifecycleState.Initializing; + + _options = context.Configuration.Bind(); + + State = PluginLifecycleState.Active; + context.Logger.Info("TCP transport initialized"); + + return Task.CompletedTask; + } + + public async Task HealthCheckAsync(CancellationToken ct) + { + if (State != PluginLifecycleState.Active) + return HealthCheckResult.Unhealthy($"Transport is in state {State}"); + + var activeConnections = _connections.Count(c => c.Value.State == TransportConnectionState.Connected); + + return HealthCheckResult.Healthy(details: new Dictionary + { + ["activeConnections"] = activeConnections, + ["totalConnections"] 
= _connections.Count + }); + } + + public async Task ConnectAsync( + TransportEndpoint endpoint, + TransportConnectionOptions options, + CancellationToken ct) + { + EnsureActive(); + + var client = new TcpClient(); + + try + { + var connectTimeout = options.ConnectTimeout != default + ? options.ConnectTimeout + : TimeSpan.FromSeconds(30); + + using var cts = CancellationTokenSource.CreateLinkedTokenSource(ct); + cts.CancelAfter(connectTimeout); + + await client.ConnectAsync(endpoint.Host, endpoint.Port, cts.Token); + + if (options.KeepAlive) + { + client.Client.SetSocketOption( + SocketOptionLevel.Socket, + SocketOptionName.KeepAlive, + true); + } + + client.ReceiveBufferSize = options.BufferSize; + client.SendBufferSize = options.BufferSize; + + var connection = new TcpTransportConnection( + client, + endpoint, + options, + _context!.Logger, + _context.TimeProvider); + + _connections[connection.ConnectionId] = connection; + + _context.Logger.Debug("TCP connection established: {ConnectionId} -> {Host}:{Port}", + connection.ConnectionId, endpoint.Host, endpoint.Port); + + return connection; + } + catch (Exception ex) + { + client.Dispose(); + _context!.Logger.Error(ex, "Failed to connect to {Host}:{Port}", + endpoint.Host, endpoint.Port); + throw; + } + } + + public async Task ListenAsync( + TransportEndpoint endpoint, + TransportListenerOptions options, + CancellationToken ct) + { + EnsureActive(); + + var listener = new TcpListener( + IPAddress.Parse(endpoint.Host), + endpoint.Port); + + listener.Server.SetSocketOption( + SocketOptionLevel.Socket, + SocketOptionName.ReuseAddress, + true); + + listener.Start(options.Backlog); + + var transportListener = new TcpTransportListener( + listener, + endpoint, + options, + _context!, + conn => _connections[conn.ConnectionId] = conn); + + _context.Logger.Info("TCP listener started on {Host}:{Port}", + endpoint.Host, endpoint.Port); + + return transportListener; + } + + public Task SubscribeAsync( + string topic, + 
TransportSubscriptionOptions options, + CancellationToken ct) + { + throw new NotSupportedException("TCP transport does not support pub/sub"); + } + + private void EnsureActive() + { + if (State != PluginLifecycleState.Active) + throw new InvalidOperationException($"TCP transport is not active (state: {State})"); + } + + public async ValueTask DisposeAsync() + { + foreach (var connection in _connections.Values) + { + await connection.DisposeAsync(); + } + _connections.Clear(); + State = PluginLifecycleState.Stopped; + } +} + +internal sealed class TcpTransportConnection : ITransportConnection +{ + private readonly TcpClient _client; + private readonly NetworkStream _stream; + private readonly TransportConnectionOptions _options; + private readonly IPluginLogger _logger; + private readonly TimeProvider _timeProvider; + private readonly SemaphoreSlim _sendLock = new(1, 1); + private readonly SemaphoreSlim _receiveLock = new(1, 1); + + public string ConnectionId { get; } = Guid.NewGuid().ToString("N"); + public TransportEndpoint RemoteEndpoint { get; } + public TransportConnectionState State { get; private set; } = TransportConnectionState.Connected; + + public TcpTransportConnection( + TcpClient client, + TransportEndpoint endpoint, + TransportConnectionOptions options, + IPluginLogger logger, + TimeProvider timeProvider) + { + _client = client; + _stream = client.GetStream(); + RemoteEndpoint = endpoint; + _options = options; + _logger = logger; + _timeProvider = timeProvider; + } + + public async Task SendAsync(TransportMessage message, CancellationToken ct) + { + await _sendLock.WaitAsync(ct); + try + { + var frame = FrameEncoder.Encode(message); + await _stream.WriteAsync(frame, ct); + await _stream.FlushAsync(ct); + } + finally + { + _sendLock.Release(); + } + } + + public async Task ReceiveAsync(CancellationToken ct) + { + await _receiveLock.WaitAsync(ct); + try + { + return await FrameDecoder.DecodeAsync(_stream, ct); + } + finally + { + 
_receiveLock.Release(); + } + } + + public async IAsyncEnumerable ReceiveStreamAsync( + [EnumeratorCancellation] CancellationToken ct) + { + while (!ct.IsCancellationRequested && State == TransportConnectionState.Connected) + { + TransportMessage message; + try + { + message = await ReceiveAsync(ct); + } + catch (IOException) when (ct.IsCancellationRequested) + { + yield break; + } + catch (Exception ex) + { + _logger.Error(ex, "Error receiving message on connection {ConnectionId}", ConnectionId); + State = TransportConnectionState.Disconnected; + yield break; + } + + yield return message; + } + } + + public async Task RequestAsync( + TransportMessage request, + TimeSpan timeout, + CancellationToken ct) + { + using var cts = CancellationTokenSource.CreateLinkedTokenSource(ct); + cts.CancelAfter(timeout); + + var correlationId = request.CorrelationId ?? Guid.NewGuid().ToString("N"); + var requestWithCorrelation = request with { CorrelationId = correlationId }; + + await SendAsync(requestWithCorrelation, cts.Token); + var response = await ReceiveAsync(cts.Token); + + if (response.CorrelationId != correlationId) + { + throw new InvalidOperationException( + $"Correlation ID mismatch: expected {correlationId}, got {response.CorrelationId}"); + } + + return response; + } + + public async ValueTask DisposeAsync() + { + State = TransportConnectionState.Disconnected; + _sendLock.Dispose(); + _receiveLock.Dispose(); + await _stream.DisposeAsync(); + _client.Dispose(); + } +} + +internal sealed class TcpTransportListener : ITransportListener +{ + private readonly TcpListener _listener; + private readonly TransportListenerOptions _options; + private readonly IPluginContext _context; + private readonly Action _onConnectionAccepted; + + public TransportEndpoint LocalEndpoint { get; } + + public TcpTransportListener( + TcpListener listener, + TransportEndpoint localEndpoint, + TransportListenerOptions options, + IPluginContext context, + Action onConnectionAccepted) + { + 
_listener = listener; + LocalEndpoint = localEndpoint; + _options = options; + _context = context; + _onConnectionAccepted = onConnectionAccepted; + } + + public async IAsyncEnumerable AcceptAsync( + [EnumeratorCancellation] CancellationToken ct) + { + while (!ct.IsCancellationRequested) + { + TcpClient client; + try + { + client = await _listener.AcceptTcpClientAsync(ct); + } + catch (OperationCanceledException) + { + yield break; + } + catch (SocketException ex) when (ex.SocketErrorCode == SocketError.OperationAborted) + { + yield break; + } + + var connection = new TcpTransportConnection( + client, + new TransportEndpoint( + ((IPEndPoint)client.Client.RemoteEndPoint!).Address.ToString(), + ((IPEndPoint)client.Client.RemoteEndPoint!).Port), + new TransportConnectionOptions(), + _context.Logger, + _context.TimeProvider); + + _onConnectionAccepted(connection); + + _context.Logger.Debug("Accepted TCP connection: {ConnectionId} from {RemoteEndpoint}", + connection.ConnectionId, connection.RemoteEndpoint); + + yield return connection; + } + } + + public ValueTask DisposeAsync() + { + _listener.Stop(); + return ValueTask.CompletedTask; + } +} + +public sealed class TcpTransportOptions +{ + public int DefaultBufferSize { get; set; } = 65536; + public TimeSpan DefaultConnectTimeout { get; set; } = TimeSpan.FromSeconds(30); + public bool EnableKeepAlive { get; set; } = true; +} +``` + +### RabbitMQ Transport Plugin Implementation + +```csharp +// RabbitMqTransportPlugin.cs +namespace StellaOps.Router.Plugin.RabbitMq; + +[Plugin( + id: "com.stellaops.transport.rabbitmq", + name: "RabbitMQ Transport", + version: "1.0.0", + vendor: "Stella Ops")] +[ProvidesCapability(PluginCapabilities.Transport, CapabilityId = "rabbitmq")] +[RequiresCapability(PluginCapabilities.Network)] +public sealed class RabbitMqTransportPlugin : IPlugin, ITransportCapability +{ + private IPluginContext? _context; + private IConnection? _connection; + private RabbitMqOptions? 
_options; + private readonly ConcurrentDictionary _channels = new(); + + public PluginInfo Info => new( + Id: "com.stellaops.transport.rabbitmq", + Name: "RabbitMQ Transport", + Version: "1.0.0", + Vendor: "Stella Ops", + Description: "RabbitMQ AMQP transport for reliable message queuing"); + + public PluginTrustLevel TrustLevel => PluginTrustLevel.BuiltIn; + public PluginCapabilities Capabilities => PluginCapabilities.Transport | PluginCapabilities.Network; + public PluginLifecycleState State { get; private set; } = PluginLifecycleState.Discovered; + + public string TransportId => "rabbitmq"; + public TransportProtocol Protocol => TransportProtocol.Amqp; + public bool SupportsPubSub => true; + public bool SupportsRequestReply => true; + public bool SupportsQueuing => true; + + public async Task InitializeAsync(IPluginContext context, CancellationToken ct) + { + _context = context; + State = PluginLifecycleState.Initializing; + + _options = context.Configuration.Bind(); + + var password = await context.Configuration.GetSecretAsync("rabbitmq-password", ct) + ?? 
_options.Password; + + var factory = new ConnectionFactory + { + HostName = _options.Host, + Port = _options.Port, + UserName = _options.Username, + Password = password, + VirtualHost = _options.VirtualHost, + AutomaticRecoveryEnabled = true, + NetworkRecoveryInterval = TimeSpan.FromSeconds(10), + RequestedHeartbeat = TimeSpan.FromSeconds(60) + }; + + if (_options.UseSsl) + { + factory.Ssl = new SslOption + { + Enabled = true, + ServerName = _options.Host + }; + } + + _connection = await Task.Run(() => factory.CreateConnection(), ct); + + State = PluginLifecycleState.Active; + context.Logger.Info("RabbitMQ transport connected to {Host}:{Port}", + _options.Host, _options.Port); + } + + public async Task HealthCheckAsync(CancellationToken ct) + { + if (_connection == null || !_connection.IsOpen) + return HealthCheckResult.Unhealthy("Connection not open"); + + try + { + using var channel = _connection.CreateModel(); + return HealthCheckResult.Healthy(details: new Dictionary + { + ["connected"] = _connection.IsOpen, + ["serverVersion"] = _connection.ServerProperties.TryGetValue("version", out var v) ? v : "unknown", + ["activeChannels"] = _channels.Count + }); + } + catch (Exception ex) + { + return HealthCheckResult.Unhealthy(ex); + } + } + + public Task ConnectAsync( + TransportEndpoint endpoint, + TransportConnectionOptions options, + CancellationToken ct) + { + EnsureActive(); + + var channel = _connection!.CreateModel(); + var queueName = endpoint.Path ?? 
throw new ArgumentException("Queue name required in endpoint path"); + + // Declare queue if it doesn't exist + channel.QueueDeclare( + queue: queueName, + durable: true, + exclusive: false, + autoDelete: false, + arguments: null); + + var connection = new RabbitMqConnection( + channel, + queueName, + endpoint, + _context!); + + _channels[connection.ConnectionId] = channel; + + return Task.FromResult(connection); + } + + public Task ListenAsync( + TransportEndpoint endpoint, + TransportListenerOptions options, + CancellationToken ct) + { + throw new NotSupportedException( + "RabbitMQ uses Subscribe for consuming messages, not Listen"); + } + + public Task SubscribeAsync( + string topic, + TransportSubscriptionOptions options, + CancellationToken ct) + { + EnsureActive(); + + var channel = _connection!.CreateModel(); + + // For topic-based routing, use fanout exchange + channel.ExchangeDeclare( + exchange: topic, + type: ExchangeType.Fanout, + durable: true); + + // Create exclusive queue for this subscription + var queueName = channel.QueueDeclare( + queue: "", + durable: false, + exclusive: true, + autoDelete: true).QueueName; + + channel.QueueBind(queueName, topic, ""); + + if (options.PrefetchCount > 0) + { + channel.BasicQos(0, (ushort)options.PrefetchCount, false); + } + + var subscription = new RabbitMqSubscription( + channel, + topic, + queueName, + options, + _context!); + + return Task.FromResult(subscription); + } + + private void EnsureActive() + { + if (State != PluginLifecycleState.Active || _connection == null || !_connection.IsOpen) + throw new InvalidOperationException("RabbitMQ transport is not active"); + } + + public async ValueTask DisposeAsync() + { + foreach (var channel in _channels.Values) + { + channel.Close(); + channel.Dispose(); + } + _channels.Clear(); + + _connection?.Close(); + _connection?.Dispose(); + _connection = null; + + State = PluginLifecycleState.Stopped; + } +} + +internal sealed class RabbitMqConnection : 
ITransportConnection +{ + private readonly IModel _channel; + private readonly string _queueName; + private readonly IPluginContext _context; + private readonly Channel _incomingMessages; + private readonly AsyncEventingBasicConsumer? _consumer; + + public string ConnectionId { get; } = Guid.NewGuid().ToString("N"); + public TransportEndpoint RemoteEndpoint { get; } + public TransportConnectionState State { get; private set; } = TransportConnectionState.Connected; + + public RabbitMqConnection( + IModel channel, + string queueName, + TransportEndpoint endpoint, + IPluginContext context) + { + _channel = channel; + _queueName = queueName; + RemoteEndpoint = endpoint; + _context = context; + _incomingMessages = Channel.CreateUnbounded(); + + // Start consumer + _consumer = new AsyncEventingBasicConsumer(_channel); + _consumer.Received += OnMessageReceived; + _channel.BasicConsume(_queueName, autoAck: false, _consumer); + } + + private async Task OnMessageReceived(object sender, BasicDeliverEventArgs e) + { + var message = new TransportMessage( + Id: e.BasicProperties.MessageId ?? e.DeliveryTag.ToString(), + Payload: e.Body.ToArray(), + ContentType: e.BasicProperties.ContentType, + CorrelationId: e.BasicProperties.CorrelationId, + ReplyTo: e.BasicProperties.ReplyTo, + Headers: e.BasicProperties.Headers?.ToDictionary( + h => h.Key, + h => Encoding.UTF8.GetString((byte[])h.Value)), + Timestamp: e.BasicProperties.Timestamp.UnixTime > 0 + ? DateTimeOffset.FromUnixTimeSeconds(e.BasicProperties.Timestamp.UnixTime) + : null, + Ttl: e.BasicProperties.Expiration != null + ? TimeSpan.FromMilliseconds(int.Parse(e.BasicProperties.Expiration)) + : null); + + await _incomingMessages.Writer.WriteAsync(message); + } + + public Task SendAsync(TransportMessage message, CancellationToken ct) + { + var properties = _channel.CreateBasicProperties(); + properties.MessageId = message.Id; + properties.ContentType = message.ContentType ?? 
"application/octet-stream"; + properties.CorrelationId = message.CorrelationId; + properties.ReplyTo = message.ReplyTo; + properties.DeliveryMode = 2; // Persistent + + if (message.Headers != null) + { + properties.Headers = message.Headers.ToDictionary( + h => h.Key, + h => (object)Encoding.UTF8.GetBytes(h.Value)); + } + + if (message.Timestamp.HasValue) + { + properties.Timestamp = new AmqpTimestamp(message.Timestamp.Value.ToUnixTimeSeconds()); + } + + if (message.Ttl.HasValue) + { + properties.Expiration = message.Ttl.Value.TotalMilliseconds.ToString(CultureInfo.InvariantCulture); + } + + _channel.BasicPublish( + exchange: "", + routingKey: _queueName, + basicProperties: properties, + body: message.Payload); + + return Task.CompletedTask; + } + + public async Task ReceiveAsync(CancellationToken ct) + { + return await _incomingMessages.Reader.ReadAsync(ct); + } + + public async IAsyncEnumerable ReceiveStreamAsync( + [EnumeratorCancellation] CancellationToken ct) + { + await foreach (var message in _incomingMessages.Reader.ReadAllAsync(ct)) + { + yield return message; + } + } + + public async Task RequestAsync( + TransportMessage request, + TimeSpan timeout, + CancellationToken ct) + { + var correlationId = request.CorrelationId ?? Guid.NewGuid().ToString("N"); + var replyQueue = _channel.QueueDeclare(queue: "", exclusive: true).QueueName; + + var requestWithReply = request with + { + CorrelationId = correlationId, + ReplyTo = replyQueue + }; + + var tcs = new TaskCompletionSource(); + + var consumer = new AsyncEventingBasicConsumer(_channel); + consumer.Received += async (_, e) => + { + if (e.BasicProperties.CorrelationId == correlationId) + { + var response = new TransportMessage( + Id: e.BasicProperties.MessageId ?? 
e.DeliveryTag.ToString(), + Payload: e.Body.ToArray(), + ContentType: e.BasicProperties.ContentType, + CorrelationId: e.BasicProperties.CorrelationId); + tcs.TrySetResult(response); + } + }; + + _channel.BasicConsume(replyQueue, autoAck: true, consumer); + await SendAsync(requestWithReply, ct); + + using var cts = CancellationTokenSource.CreateLinkedTokenSource(ct); + cts.CancelAfter(timeout); + + using var registration = cts.Token.Register(() => + tcs.TrySetCanceled(cts.Token)); + + return await tcs.Task; + } + + public ValueTask DisposeAsync() + { + State = TransportConnectionState.Disconnected; + _incomingMessages.Writer.Complete(); + _channel.Close(); + _channel.Dispose(); + return ValueTask.CompletedTask; + } +} + +internal sealed class RabbitMqSubscription : ITransportSubscription +{ + private readonly IModel _channel; + private readonly string _queueName; + private readonly TransportSubscriptionOptions _options; + private readonly IPluginContext _context; + private readonly Channel<(TransportMessage Message, ulong DeliveryTag)> _messages; + private readonly string _consumerTag; + + public string Topic { get; } + + public RabbitMqSubscription( + IModel channel, + string topic, + string queueName, + TransportSubscriptionOptions options, + IPluginContext context) + { + _channel = channel; + Topic = topic; + _queueName = queueName; + _options = options; + _context = context; + _messages = Channel.CreateUnbounded<(TransportMessage, ulong)>(); + + var consumer = new AsyncEventingBasicConsumer(_channel); + consumer.Received += OnMessageReceived; + _consumerTag = _channel.BasicConsume(_queueName, autoAck: options.AutoAck, consumer); + } + + private async Task OnMessageReceived(object sender, BasicDeliverEventArgs e) + { + var message = new TransportMessage( + Id: e.DeliveryTag.ToString(), + Payload: e.Body.ToArray(), + ContentType: e.BasicProperties.ContentType, + CorrelationId: e.BasicProperties.CorrelationId, + Headers: e.BasicProperties.Headers?.ToDictionary( + h 
=> h.Key,
+                h => Encoding.UTF8.GetString((byte[])h.Value)));
+
+        await _messages.Writer.WriteAsync((message, e.DeliveryTag));
+    }
+
+    public async IAsyncEnumerable<TransportMessage> MessagesAsync(
+        [EnumeratorCancellation] CancellationToken ct)
+    {
+        await foreach (var (message, _) in _messages.Reader.ReadAllAsync(ct))
+        {
+            yield return message;
+        }
+    }
+
+    public Task AcknowledgeAsync(string messageId, CancellationToken ct)
+    {
+        if (ulong.TryParse(messageId, out var deliveryTag))
+        {
+            _channel.BasicAck(deliveryTag, multiple: false);
+        }
+        return Task.CompletedTask;
+    }
+
+    public Task NegativeAcknowledgeAsync(string messageId, CancellationToken ct)
+    {
+        if (ulong.TryParse(messageId, out var deliveryTag))
+        {
+            _channel.BasicNack(deliveryTag, multiple: false, requeue: true);
+        }
+        return Task.CompletedTask;
+    }
+
+    public ValueTask DisposeAsync()
+    {
+        _channel.BasicCancel(_consumerTag);
+        _messages.Writer.Complete();
+        _channel.Close();
+        _channel.Dispose();
+        return ValueTask.CompletedTask;
+    }
+}
+
+public sealed class RabbitMqOptions
+{
+    public string Host { get; set; } = "localhost";
+    public int Port { get; set; } = 5672;
+    public string Username { get; set; } = "guest";
+    public string Password { get; set; } = "guest";
+    public string VirtualHost { get; set; } = "/";
+    public bool UseSsl { get; set; } = false;
+}
+```
+
+### Migration Tasks
+
+| Transport | Current Interface | New Implementation | Status |
+|-----------|-------------------|--------------------|--------|
+| TCP | `ITransport` | `TcpTransportPlugin : IPlugin, ITransportCapability` | TODO |
+| TLS | `ITransport` | `TlsTransportPlugin : IPlugin, ITransportCapability` | TODO |
+| UDP | `ITransport` | `UdpTransportPlugin : IPlugin, ITransportCapability` | TODO |
+| RabbitMQ | `ITransport` | `RabbitMqTransportPlugin : IPlugin, ITransportCapability` | TODO |
+| Valkey | `ITransport` | `ValkeyTransportPlugin : IPlugin, ITransportCapability` | TODO |
+
+---
+
+## Acceptance Criteria
+
+- [ ] All transports implement `IPlugin`
+- [ ] All transports implement `ITransportCapability`
+- [ ] TCP/TLS transports support connection pooling
+- [ ] RabbitMQ supports pub/sub and queuing
+- [ ] Valkey supports pub/sub
+- [ ] Health checks verify connectivity
+- [ ] Message framing is consistent
+- [ ] Request/reply pattern works
+- [ ] Graceful connection shutdown
+- [ ] Plugin manifests for all transports
+- [ ] Unit tests migrated/updated
+- [ ] Integration tests with real brokers
+
+---
+
+## Dependencies
+
+| Dependency | Type | Status |
+|------------|------|--------|
+| 100_001 Plugin Abstractions | Internal | TODO |
+| 100_002 Plugin Host | Internal | TODO |
+| RabbitMQ.Client | External | Available |
+| StackExchange.Redis | External | Available |
+
+---
+
+## Delivery Tracker
+
+| Deliverable | Status | Notes |
+|-------------|--------|-------|
+| ITransportCapability interface | TODO | |
+| TcpTransportPlugin | TODO | |
+| TlsTransportPlugin | TODO | |
+| UdpTransportPlugin | TODO | |
+| RabbitMqTransportPlugin | TODO | |
+| ValkeyTransportPlugin | TODO | |
+| FrameEncoder/Decoder | TODO | |
+| Plugin manifests | TODO | |
+| Unit tests | TODO | |
+| Integration tests | TODO | |
+
+---
+
+## Execution Log
+
+| Date | Entry |
+|------|-------|
+| 10-Jan-2026 | Sprint created |
diff --git a/docs/implplan/SPRINT_20260110_100_011_PLUGIN_concelier_rework.md b/docs/implplan/SPRINT_20260110_100_011_PLUGIN_concelier_rework.md
new file mode 100644
index 000000000..4a6b311ec
--- /dev/null
+++ b/docs/implplan/SPRINT_20260110_100_011_PLUGIN_concelier_rework.md
@@ -0,0 +1,1209 @@
+# SPRINT: Concelier Connector Rework
+
+> **Sprint ID:** 100_011
+> **Module:** PLUGIN
+> **Phase:** 100 - Plugin System Unification
+> **Status:** TODO
+> **Parent:** [100_000_INDEX](SPRINT_20260110_100_000_INDEX_plugin_unification.md)
+
+---
+
+## Overview
+
+Rework all Concelier vulnerability feed connectors to implement the unified plugin architecture with `IPlugin` and `IFeedCapability` interfaces.
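+For orientation, a minimal sketch of how a plugin host might drive a reworked connector through its lifecycle, assuming the `IPlugin`/`IFeedCapability` surface specified later in this sprint. The `context`, `ct`, and `lastCheckpointTime` variables are illustrative placeholders for whatever the host wires up at runtime:
+
+```csharp
+// Hypothetical host-side usage; actual resolution happens via plugin discovery.
+IPlugin plugin = new NvdConnectorPlugin();
+await plugin.InitializeAsync(context, ct);          // lifecycle: Discovered -> Active
+
+if (plugin is IFeedCapability feed)
+{
+    var health = await plugin.HealthCheckAsync(ct); // verifies feed availability
+    var result = await feed.SyncAsync(
+        new FeedSyncOptions(Since: lastCheckpointTime), ct);
+    // Persist result.NextCheckpoint so the next sync can be incremental.
+}
+
+await plugin.DisposeAsync();                        // lifecycle: -> Stopped
+```
+
+The checkpoint round-trip is what makes incremental/delta updates possible: a host that discards `NextCheckpoint` falls back to full syncs.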
+
+### Objectives
+
+- Migrate all vulnerability feed connectors to the unified plugin model
+- Migrate OVAL connectors (distro security data)
+- Migrate NVD/CVE connectors
+- Migrate OSV connectors (ecosystem advisories)
+- Migrate vendor advisory connectors
+- Preserve feed synchronization semantics
+- Add health checks with feed availability
+- Add plugin manifests
+- Support incremental/delta updates
+
+### Current State
+
+```
+src/Concelier/
+├── __Libraries/
+│   └── StellaOps.Concelier.Connectors/
+│       ├── Oval/
+│       │   ├── RedHatOvalConnector.cs
+│       │   ├── UbuntuOvalConnector.cs
+│       │   ├── DebianOvalConnector.cs
+│       │   ├── SuseOvalConnector.cs
+│       │   ├── OracleOvalConnector.cs
+│       │   ├── AlmaLinuxOvalConnector.cs
+│       │   ├── RockyLinuxOvalConnector.cs
+│       │   └── AlpineSecDbConnector.cs
+│       ├── Cve/
+│       │   ├── NvdConnector.cs
+│       │   ├── MitreConnector.cs
+│       │   └── CveListV5Connector.cs
+│       ├── Osv/
+│       │   ├── OsvConnector.cs
+│       │   ├── GhsaConnector.cs
+│       │   └── GitLabAdvisoriesConnector.cs
+│       ├── Vendor/
+│       │   ├── MicrosoftMsrcConnector.cs
+│       │   ├── AmazonInspectorConnector.cs
+│       │   └── CisaKevConnector.cs
+│       └── Mirror/
+│           └── MirrorFeedConnector.cs
+```
+
+### Target State
+
+Each connector implements:
+- `IPlugin` - Core plugin interface with lifecycle
+- `IFeedCapability` - Feed synchronization operations
+- Health checks for feed availability
+- Plugin manifest for discovery
+- Incremental update support
+- Deterministic output ordering
+
+---
+
+## Deliverables
+
+### Feed Capability Interface
+
+```csharp
+// IFeedCapability.cs
+namespace StellaOps.Plugin.Abstractions.Capabilities;
+
+/// <summary>
+/// Capability interface for vulnerability feed ingestion.
+/// </summary>
+public interface IFeedCapability
+{
+    /// <summary>
+    /// Feed identifier (nvd, ghsa, redhat-oval, etc.).
+    /// </summary>
+    string FeedId { get; }
+
+    /// <summary>
+    /// Feed type category.
+    /// </summary>
+    FeedType Type { get; }
+
+    /// <summary>
+    /// Supported advisory formats.
+    /// </summary>
+    IReadOnlyList<AdvisoryFormat> SupportedFormats { get; }
+
+    /// <summary>
+    /// Whether this feed supports incremental updates.
+    /// </summary>
+    bool SupportsIncremental { get; }
+
+    /// <summary>
+    /// Get feed metadata and statistics.
+    /// </summary>
+    Task<FeedMetadata> GetMetadataAsync(CancellationToken ct);
+
+    /// <summary>
+    /// Synchronize feed data.
+    /// </summary>
+    Task<FeedSyncResult> SyncAsync(FeedSyncOptions options, CancellationToken ct);
+
+    /// <summary>
+    /// Stream advisories incrementally.
+    /// </summary>
+    IAsyncEnumerable<Advisory> StreamAdvisoriesAsync(
+        FeedStreamOptions options,
+        CancellationToken ct);
+
+    /// <summary>
+    /// Get a specific advisory by ID.
+    /// </summary>
+    Task<Advisory?> GetAdvisoryAsync(string advisoryId, CancellationToken ct);
+}
+
+public enum FeedType
+{
+    Cve,
+    Oval,
+    Osv,
+    Vendor,
+    Kev,
+    Mirror
+}
+
+public enum AdvisoryFormat
+{
+    Cve5,
+    Oval,
+    Osv,
+    Csaf,
+    Vex,
+    Custom
+}
+
+public sealed record FeedMetadata(
+    string FeedId,
+    string Name,
+    string? Description,
+    DateTimeOffset? LastModified,
+    DateTimeOffset? LastSync,
+    long AdvisoryCount,
+    string? Version,
+    string? SourceUrl,
+    IReadOnlyDictionary<string, object>? AdditionalInfo);
+
+public sealed record FeedSyncOptions(
+    DateTimeOffset? Since = null,
+    DateTimeOffset? Until = null,
+    bool FullSync = false,
+    int? MaxItems = null,
+    IReadOnlyList<string>? FilterIds = null,
+    string? Checkpoint = null);
+
+public sealed record FeedSyncResult(
+    bool Success,
+    int ItemsProcessed,
+    int ItemsAdded,
+    int ItemsUpdated,
+    int ItemsRemoved,
+    DateTimeOffset SyncedAt,
+    TimeSpan Duration,
+    string? NextCheckpoint,
+    IReadOnlyList<FeedSyncError> Errors);
+
+public sealed record FeedSyncError(
+    string AdvisoryId,
+    string Message,
+    Exception? Exception);
+
+public sealed record FeedStreamOptions(
+    DateTimeOffset? ModifiedSince = null,
+    int BatchSize = 100,
+    string? StartAfter = null,
+    IReadOnlyList<string>? Ecosystems = null);
+
+/// <summary>
+/// Normalized advisory representation.
+/// </summary>
+public sealed record Advisory(
+    string Id,
+    string FeedId,
+    AdvisoryFormat SourceFormat,
+    string? Title,
+    string? Description,
+    AdvisorySeverity? Severity,
+    CvssScore? Cvss,
+    DateTimeOffset? Published,
+    DateTimeOffset? Modified,
+    IReadOnlyList<AffectedPackage> AffectedPackages,
+    IReadOnlyList<AdvisoryReference> References,
+    IReadOnlyList<string>? Aliases,
+    IReadOnlyDictionary<string, string>? Metadata,
+    ReadOnlyMemory<byte>? RawData);
+
+public enum AdvisorySeverity
+{
+    None,
+    Low,
+    Medium,
+    High,
+    Critical
+}
+
+public sealed record CvssScore(
+    string Version,
+    double Score,
+    string? Vector,
+    string? Severity);
+
+public sealed record AffectedPackage(
+    string Name,
+    string? Ecosystem,
+    string? Purl,
+    VersionRange? AffectedVersions,
+    string? FixedVersion,
+    PackageStatus Status);
+
+public sealed record VersionRange(
+    string? Start,
+    bool StartInclusive,
+    string? End,
+    bool EndInclusive);
+
+public enum PackageStatus
+{
+    Unknown,
+    Affected,
+    NotAffected,
+    Fixed,
+    UnderInvestigation
+}
+
+public sealed record AdvisoryReference(
+    string Url,
+    ReferenceType Type,
+    string? Description);
+
+public enum ReferenceType
+{
+    Advisory,
+    Article,
+    Report,
+    Fix,
+    Web,
+    Package,
+    Evidence
+}
+```
+
+### Feed Connector Base Class
+
+```csharp
+// FeedConnectorBase.cs
+namespace StellaOps.Concelier.Plugin;
+
+/// <summary>
+/// Base class for vulnerability feed connectors.
+/// </summary>
+public abstract class FeedConnectorBase : IPlugin, IFeedCapability
+{
+    protected IPluginContext? Context { get; private set; }
+    protected HttpClient?
HttpClient { get; private set; }
+
+    public abstract PluginInfo Info { get; }
+    public PluginTrustLevel TrustLevel => PluginTrustLevel.BuiltIn;
+    public PluginCapabilities Capabilities => PluginCapabilities.Feed | PluginCapabilities.Network;
+    public PluginLifecycleState State { get; protected set; } = PluginLifecycleState.Discovered;
+
+    public abstract string FeedId { get; }
+    public abstract FeedType Type { get; }
+    public abstract IReadOnlyList<AdvisoryFormat> SupportedFormats { get; }
+    public virtual bool SupportsIncremental => true;
+
+    public async Task InitializeAsync(IPluginContext context, CancellationToken ct)
+    {
+        Context = context;
+        State = PluginLifecycleState.Initializing;
+
+        try
+        {
+            HttpClient = context.HttpClientFactory.CreateClient(FeedId);
+            await InitializeConnectorAsync(context, ct);
+
+            State = PluginLifecycleState.Active;
+            context.Logger.Info("{FeedId} feed connector initialized", FeedId);
+        }
+        catch (Exception ex)
+        {
+            State = PluginLifecycleState.Failed;
+            context.Logger.Error(ex, "Failed to initialize {FeedId} connector", FeedId);
+            throw;
+        }
+    }
+
+    protected virtual Task InitializeConnectorAsync(IPluginContext context, CancellationToken ct)
+        => Task.CompletedTask;
+
+    public virtual async Task<HealthCheckResult> HealthCheckAsync(CancellationToken ct)
+    {
+        if (State != PluginLifecycleState.Active)
+            return HealthCheckResult.Unhealthy($"Connector is in state {State}");
+
+        try
+        {
+            var metadata = await GetMetadataAsync(ct);
+            return HealthCheckResult.Healthy(details: new Dictionary<string, object>
+            {
+                ["lastModified"] = metadata.LastModified?.ToString("O") ?? "unknown",
+                ["advisoryCount"] = metadata.AdvisoryCount
+            });
+        }
+        catch (Exception ex)
+        {
+            return HealthCheckResult.Unhealthy(ex);
+        }
+    }
+
+    public abstract Task<FeedMetadata> GetMetadataAsync(CancellationToken ct);
+    public abstract Task<FeedSyncResult> SyncAsync(FeedSyncOptions options, CancellationToken ct);
+    public abstract IAsyncEnumerable<Advisory> StreamAdvisoriesAsync(FeedStreamOptions options, CancellationToken ct);
+    public abstract Task<Advisory?> GetAdvisoryAsync(string advisoryId, CancellationToken ct);
+
+    protected void EnsureActive()
+    {
+        if (State != PluginLifecycleState.Active)
+            throw new InvalidOperationException($"{FeedId} connector is not active (state: {State})");
+    }
+
+    protected static AdvisorySeverity ParseSeverity(double? cvssScore) => cvssScore switch
+    {
+        null => AdvisorySeverity.None,
+        < 0.1 => AdvisorySeverity.None,
+        < 4.0 => AdvisorySeverity.Low,
+        < 7.0 => AdvisorySeverity.Medium,
+        < 9.0 => AdvisorySeverity.High,
+        _ => AdvisorySeverity.Critical
+    };
+
+    public virtual ValueTask DisposeAsync()
+    {
+        HttpClient?.Dispose();
+        HttpClient = null;
+        State = PluginLifecycleState.Stopped;
+        return ValueTask.CompletedTask;
+    }
+}
+```
+
+### NVD Connector Plugin Implementation
+
+```csharp
+// NvdConnectorPlugin.cs
+namespace StellaOps.Concelier.Plugin.Nvd;
+
+[Plugin(
+    id: "com.stellaops.feed.nvd",
+    name: "NVD CVE Feed",
+    version: "1.0.0",
+    vendor: "Stella Ops")]
+[ProvidesCapability(PluginCapabilities.Feed, CapabilityId = "nvd")]
+public sealed class NvdConnectorPlugin : FeedConnectorBase
+{
+    private NvdOptions? _options;
+    private string?
_apiKey; + + public override PluginInfo Info => new( + Id: "com.stellaops.feed.nvd", + Name: "NVD CVE Feed", + Version: "1.0.0", + Vendor: "Stella Ops", + Description: "NIST National Vulnerability Database CVE feed"); + + public override string FeedId => "nvd"; + public override FeedType Type => FeedType.Cve; + public override IReadOnlyList SupportedFormats => new[] { AdvisoryFormat.Cve5 }; + + protected override async Task InitializeConnectorAsync(IPluginContext context, CancellationToken ct) + { + _options = context.Configuration.Bind(); + _apiKey = await context.Configuration.GetSecretAsync("nvd-api-key", ct); + + if (string.IsNullOrEmpty(_apiKey)) + { + context.Logger.Warning("NVD API key not configured - rate limits will apply"); + } + } + + public override async Task GetMetadataAsync(CancellationToken ct) + { + EnsureActive(); + + // Query NVD statistics + var request = new HttpRequestMessage(HttpMethod.Get, + $"{_options!.BaseUrl}/cves/2.0?resultsPerPage=1"); + + AddApiKeyHeader(request); + + var response = await HttpClient!.SendAsync(request, ct); + response.EnsureSuccessStatusCode(); + + var content = await response.Content.ReadAsStringAsync(ct); + var result = JsonSerializer.Deserialize(content); + + return new FeedMetadata( + FeedId: FeedId, + Name: "NIST National Vulnerability Database", + Description: "Official US government CVE feed", + LastModified: result?.Timestamp, + LastSync: null, + AdvisoryCount: result?.TotalResults ?? 0, + Version: result?.Version, + SourceUrl: _options.BaseUrl, + AdditionalInfo: new Dictionary + { + ["format"] = result?.Format ?? 
"NVD_CVE" + }); + } + + public override async Task SyncAsync(FeedSyncOptions options, CancellationToken ct) + { + EnsureActive(); + + var sw = Stopwatch.StartNew(); + var errors = new List(); + var added = 0; + var updated = 0; + var processed = 0; + + try + { + await foreach (var advisory in StreamAdvisoriesAsync( + new FeedStreamOptions(ModifiedSince: options.Since), + ct)) + { + processed++; + + // Emit advisory for processing by caller + // In real implementation, this would invoke storage callback + + if (advisory.Published >= (options.Since ?? DateTimeOffset.MinValue)) + added++; + else + updated++; + + if (options.MaxItems.HasValue && processed >= options.MaxItems.Value) + break; + } + + sw.Stop(); + + return new FeedSyncResult( + Success: true, + ItemsProcessed: processed, + ItemsAdded: added, + ItemsUpdated: updated, + ItemsRemoved: 0, + SyncedAt: Context!.TimeProvider.GetUtcNow(), + Duration: sw.Elapsed, + NextCheckpoint: Context.TimeProvider.GetUtcNow().ToString("O"), + Errors: errors); + } + catch (Exception ex) + { + sw.Stop(); + Context!.Logger.Error(ex, "NVD sync failed"); + + return new FeedSyncResult( + Success: false, + ItemsProcessed: processed, + ItemsAdded: added, + ItemsUpdated: updated, + ItemsRemoved: 0, + SyncedAt: Context.TimeProvider.GetUtcNow(), + Duration: sw.Elapsed, + NextCheckpoint: null, + Errors: new[] { new FeedSyncError("sync", ex.Message, ex) }); + } + } + + public override async IAsyncEnumerable StreamAdvisoriesAsync( + FeedStreamOptions options, + [EnumeratorCancellation] CancellationToken ct) + { + EnsureActive(); + + var startIndex = 0; + var batchSize = Math.Min(options.BatchSize, 2000); // NVD max is 2000 + + while (!ct.IsCancellationRequested) + { + var url = BuildQueryUrl(options, startIndex, batchSize); + var request = new HttpRequestMessage(HttpMethod.Get, url); + AddApiKeyHeader(request); + + var response = await HttpClient!.SendAsync(request, ct); + + // Handle rate limiting + if (response.StatusCode == 
HttpStatusCode.TooManyRequests) + { + var retryAfter = response.Headers.RetryAfter?.Delta ?? TimeSpan.FromSeconds(30); + Context!.Logger.Warning("NVD rate limited, waiting {Seconds}s", retryAfter.TotalSeconds); + await Task.Delay(retryAfter, ct); + continue; + } + + response.EnsureSuccessStatusCode(); + + var content = await response.Content.ReadAsStringAsync(ct); + var result = JsonSerializer.Deserialize(content); + + if (result?.Vulnerabilities == null || result.Vulnerabilities.Count == 0) + yield break; + + foreach (var vuln in result.Vulnerabilities) + { + var advisory = MapToAdvisory(vuln); + if (advisory != null) + yield return advisory; + } + + startIndex += batchSize; + if (startIndex >= result.TotalResults) + yield break; + + // Rate limit delay (6 requests per minute without API key) + if (string.IsNullOrEmpty(_apiKey)) + { + await Task.Delay(TimeSpan.FromSeconds(10), ct); + } + } + } + + public override async Task GetAdvisoryAsync(string advisoryId, CancellationToken ct) + { + EnsureActive(); + + var request = new HttpRequestMessage(HttpMethod.Get, + $"{_options!.BaseUrl}/cves/2.0?cveId={advisoryId}"); + AddApiKeyHeader(request); + + var response = await HttpClient!.SendAsync(request, ct); + if (response.StatusCode == HttpStatusCode.NotFound) + return null; + + response.EnsureSuccessStatusCode(); + + var content = await response.Content.ReadAsStringAsync(ct); + var result = JsonSerializer.Deserialize(content); + + return result?.Vulnerabilities?.FirstOrDefault() is { } vuln + ? 
MapToAdvisory(vuln) + : null; + } + + private string BuildQueryUrl(FeedStreamOptions options, int startIndex, int batchSize) + { + var url = $"{_options!.BaseUrl}/cves/2.0?startIndex={startIndex}&resultsPerPage={batchSize}"; + + if (options.ModifiedSince.HasValue) + { + url += $"&lastModStartDate={options.ModifiedSince.Value:yyyy-MM-ddTHH:mm:ss.fff}Z"; + url += $"&lastModEndDate={Context!.TimeProvider.GetUtcNow():yyyy-MM-ddTHH:mm:ss.fff}Z"; + } + + return url; + } + + private void AddApiKeyHeader(HttpRequestMessage request) + { + if (!string.IsNullOrEmpty(_apiKey)) + { + request.Headers.Add("apiKey", _apiKey); + } + } + + private Advisory? MapToAdvisory(NvdVulnerability vuln) + { + var cve = vuln.Cve; + if (cve == null) return null; + + var cvss = ExtractCvss(cve); + + return new Advisory( + Id: cve.Id, + FeedId: FeedId, + SourceFormat: AdvisoryFormat.Cve5, + Title: cve.Id, + Description: cve.Descriptions?.FirstOrDefault(d => d.Lang == "en")?.Value, + Severity: ParseSeverity(cvss?.Score), + Cvss: cvss, + Published: cve.Published, + Modified: cve.LastModified, + AffectedPackages: MapAffectedPackages(cve.Configurations), + References: MapReferences(cve.References), + Aliases: null, + Metadata: new Dictionary + { + ["vulnStatus"] = cve.VulnStatus ?? "unknown", + ["source"] = "NVD" + }, + RawData: null); + } + + private CvssScore? 
ExtractCvss(NvdCve cve) + { + // Prefer CVSS 3.1, then 3.0, then 2.0 + var metrics = cve.Metrics; + if (metrics == null) return null; + + if (metrics.CvssMetricV31?.FirstOrDefault() is { } v31) + { + return new CvssScore( + Version: "3.1", + Score: v31.CvssData.BaseScore, + Vector: v31.CvssData.VectorString, + Severity: v31.CvssData.BaseSeverity); + } + + if (metrics.CvssMetricV30?.FirstOrDefault() is { } v30) + { + return new CvssScore( + Version: "3.0", + Score: v30.CvssData.BaseScore, + Vector: v30.CvssData.VectorString, + Severity: v30.CvssData.BaseSeverity); + } + + if (metrics.CvssMetricV2?.FirstOrDefault() is { } v2) + { + return new CvssScore( + Version: "2.0", + Score: v2.CvssData.BaseScore, + Vector: v2.CvssData.VectorString, + Severity: v2.BaseSeverity); + } + + return null; + } + + private IReadOnlyList MapAffectedPackages(IReadOnlyList? configs) + { + if (configs == null) return Array.Empty(); + + var packages = new List(); + + foreach (var config in configs) + { + foreach (var node in config.Nodes ?? Enumerable.Empty()) + { + foreach (var cpeMatch in node.CpeMatch ?? Enumerable.Empty()) + { + if (!cpeMatch.Vulnerable) continue; + + // Parse CPE to extract package info + var cpe = ParseCpe(cpeMatch.Criteria); + if (cpe == null) continue; + + packages.Add(new AffectedPackage( + Name: cpe.Product, + Ecosystem: cpe.Vendor, + Purl: null, // CPE doesn't map directly to PURL + AffectedVersions: new VersionRange( + Start: cpeMatch.VersionStartIncluding ?? cpeMatch.VersionStartExcluding, + StartInclusive: cpeMatch.VersionStartIncluding != null, + End: cpeMatch.VersionEndIncluding ?? 
cpeMatch.VersionEndExcluding, + EndInclusive: cpeMatch.VersionEndIncluding != null), + FixedVersion: null, + Status: PackageStatus.Affected)); + } + } + } + + // Sort for deterministic output + return packages + .OrderBy(p => p.Name, StringComparer.OrdinalIgnoreCase) + .ThenBy(p => p.Ecosystem, StringComparer.OrdinalIgnoreCase) + .ToList(); + } + + private IReadOnlyList MapReferences(IReadOnlyList? refs) + { + if (refs == null) return Array.Empty(); + + return refs + .Select(r => new AdvisoryReference( + Url: r.Url, + Type: MapReferenceType(r.Tags), + Description: r.Source)) + .OrderBy(r => r.Url, StringComparer.OrdinalIgnoreCase) + .ToList(); + } + + private static ReferenceType MapReferenceType(IReadOnlyList? tags) + { + if (tags == null || tags.Count == 0) return ReferenceType.Web; + + if (tags.Contains("Patch")) return ReferenceType.Fix; + if (tags.Contains("Vendor Advisory")) return ReferenceType.Advisory; + if (tags.Contains("Third Party Advisory")) return ReferenceType.Advisory; + if (tags.Contains("Exploit")) return ReferenceType.Evidence; + + return ReferenceType.Web; + } + + private static CpeInfo? ParseCpe(string cpe) + { + // CPE 2.3 format: cpe:2.3:a:vendor:product:version:... + var parts = cpe.Split(':'); + if (parts.Length < 5) return null; + + return new CpeInfo( + Part: parts[2], + Vendor: parts[3], + Product: parts[4], + Version: parts.Length > 5 ? parts[5] : null); + } + + private sealed record CpeInfo(string Part, string Vendor, string Product, string? Version); + + // NVD API response models + private sealed record NvdResponse( + int ResultsPerPage, + int StartIndex, + int TotalResults, + string? Format, + string? Version, + DateTimeOffset? Timestamp, + IReadOnlyList? Vulnerabilities); + + private sealed record NvdVulnerability(NvdCve? Cve); + + private sealed record NvdCve( + string Id, + string? VulnStatus, + DateTimeOffset? Published, + DateTimeOffset? LastModified, + IReadOnlyList? Descriptions, + NvdMetrics? Metrics, + IReadOnlyList? 
Configurations, + IReadOnlyList? References); + + private sealed record NvdDescription(string Lang, string Value); + + private sealed record NvdMetrics( + IReadOnlyList? CvssMetricV31, + IReadOnlyList? CvssMetricV30, + IReadOnlyList? CvssMetricV2); + + private sealed record NvdCvssMetricV31(NvdCvssData CvssData); + private sealed record NvdCvssMetricV30(NvdCvssData CvssData); + private sealed record NvdCvssMetricV2(NvdCvssDataV2 CvssData, string? BaseSeverity); + + private sealed record NvdCvssData(double BaseScore, string? VectorString, string? BaseSeverity); + private sealed record NvdCvssDataV2(double BaseScore, string? VectorString); + + private sealed record NvdConfiguration(IReadOnlyList? Nodes); + private sealed record NvdNode(IReadOnlyList? CpeMatch); + + private sealed record NvdCpeMatch( + bool Vulnerable, + string Criteria, + string? VersionStartIncluding, + string? VersionStartExcluding, + string? VersionEndIncluding, + string? VersionEndExcluding); + + private sealed record NvdReference(string Url, string? Source, IReadOnlyList? Tags); +} + +public sealed class NvdOptions +{ + public string BaseUrl { get; set; } = "https://services.nvd.nist.gov/rest/json"; +} +``` + +### Red Hat OVAL Connector Plugin Implementation + +```csharp +// RedHatOvalConnectorPlugin.cs +namespace StellaOps.Concelier.Plugin.Oval.RedHat; + +[Plugin( + id: "com.stellaops.feed.oval.redhat", + name: "Red Hat OVAL Feed", + version: "1.0.0", + vendor: "Stella Ops")] +[ProvidesCapability(PluginCapabilities.Feed, CapabilityId = "redhat-oval")] +public sealed class RedHatOvalConnectorPlugin : FeedConnectorBase +{ + private RedHatOvalOptions? 
_options; + + public override PluginInfo Info => new( + Id: "com.stellaops.feed.oval.redhat", + Name: "Red Hat OVAL Feed", + Version: "1.0.0", + Vendor: "Stella Ops", + Description: "Red Hat Enterprise Linux security advisories in OVAL format"); + + public override string FeedId => "redhat-oval"; + public override FeedType Type => FeedType.Oval; + public override IReadOnlyList SupportedFormats => new[] { AdvisoryFormat.Oval }; + + protected override Task InitializeConnectorAsync(IPluginContext context, CancellationToken ct) + { + _options = context.Configuration.Bind(); + return Task.CompletedTask; + } + + public override async Task GetMetadataAsync(CancellationToken ct) + { + EnsureActive(); + + // Check PULP repository for metadata + var response = await HttpClient!.GetAsync( + $"{_options!.BaseUrl}/PULP_MANIFEST", ct); + + if (!response.IsSuccessStatusCode) + { + return new FeedMetadata( + FeedId: FeedId, + Name: "Red Hat OVAL", + Description: "Red Hat Enterprise Linux security data", + LastModified: null, + LastSync: null, + AdvisoryCount: 0, + Version: null, + SourceUrl: _options.BaseUrl, + AdditionalInfo: null); + } + + var manifest = await response.Content.ReadAsStringAsync(ct); + var lines = manifest.Split('\n', StringSplitOptions.RemoveEmptyEntries); + + return new FeedMetadata( + FeedId: FeedId, + Name: "Red Hat OVAL", + Description: "Red Hat Enterprise Linux security data", + LastModified: response.Content.Headers.LastModified, + LastSync: null, + AdvisoryCount: lines.Length, + Version: null, + SourceUrl: _options.BaseUrl, + AdditionalInfo: new Dictionary + { + ["fileCount"] = lines.Length + }); + } + + public override async Task SyncAsync(FeedSyncOptions options, CancellationToken ct) + { + EnsureActive(); + + var sw = Stopwatch.StartNew(); + var errors = new List(); + var processed = 0; + var added = 0; + + try + { + // Get list of OVAL files from PULP manifest + var manifestResponse = await HttpClient!.GetAsync( + $"{_options!.BaseUrl}/PULP_MANIFEST", 
ct);
+        manifestResponse.EnsureSuccessStatusCode();
+
+        var manifest = await manifestResponse.Content.ReadAsStringAsync(ct);
+        var files = ParseManifest(manifest);
+
+        foreach (var file in files)
+        {
+            if (ct.IsCancellationRequested) break;
+
+            try
+            {
+                var ovalContent = await DownloadOvalFileAsync(file, ct);
+                var advisories = await ParseOvalContentAsync(ovalContent, ct);
+
+                foreach (var advisory in advisories)
+                {
+                    processed++;
+                    added++;
+                }
+            }
+            catch (Exception ex)
+            {
+                errors.Add(new FeedSyncError(file.Path, ex.Message, ex));
+            }
+        }
+
+        sw.Stop();
+
+        return new FeedSyncResult(
+            Success: errors.Count == 0,
+            ItemsProcessed: processed,
+            ItemsAdded: added,
+            ItemsUpdated: 0,
+            ItemsRemoved: 0,
+            SyncedAt: Context!.TimeProvider.GetUtcNow(),
+            Duration: sw.Elapsed,
+            NextCheckpoint: Context.TimeProvider.GetUtcNow().ToString("O"),
+            Errors: errors);
+    }
+    catch (Exception ex)
+    {
+        sw.Stop();
+        return new FeedSyncResult(
+            Success: false,
+            ItemsProcessed: processed,
+            ItemsAdded: added,
+            ItemsUpdated: 0,
+            ItemsRemoved: 0,
+            SyncedAt: Context!.TimeProvider.GetUtcNow(),
+            Duration: sw.Elapsed,
+            NextCheckpoint: null,
+            Errors: new[] { new FeedSyncError("manifest", ex.Message, ex) });
+    }
+}
+
+public override async IAsyncEnumerable<Advisory> StreamAdvisoriesAsync(
+    FeedStreamOptions options,
+    [EnumeratorCancellation] CancellationToken ct)
+{
+    EnsureActive();
+
+    // Get manifest
+    var manifestResponse = await HttpClient!.GetAsync(
+        $"{_options!.BaseUrl}/PULP_MANIFEST", ct);
+    manifestResponse.EnsureSuccessStatusCode();
+
+    var manifest = await manifestResponse.Content.ReadAsStringAsync(ct);
+    var files = ParseManifest(manifest);
+
+    foreach (var file in files)
+    {
+        if (ct.IsCancellationRequested) yield break;
+
+        var ovalContent = await DownloadOvalFileAsync(file, ct);
+        var advisories = await ParseOvalContentAsync(ovalContent, ct);
+
+        foreach (var advisory in advisories)
+        {
+            if (options.ModifiedSince.HasValue &&
+                advisory.Modified < options.ModifiedSince.Value)
+                continue;
+
+            yield return advisory;
+        }
+    }
+}
+
+public override Task<Advisory?> GetAdvisoryAsync(string advisoryId, CancellationToken ct)
+{
+    // OVAL doesn't support individual advisory lookup;
+    // we would need to search through all files.
+    return Task.FromResult<Advisory?>(null);
+}
+
+private IReadOnlyList<OvalFileInfo> ParseManifest(string manifest)
+{
+    var files = new List<OvalFileInfo>();
+
+    foreach (var line in manifest.Split('\n', StringSplitOptions.RemoveEmptyEntries))
+    {
+        var parts = line.Split(',');
+        if (parts.Length < 3) continue;
+
+        files.Add(new OvalFileInfo(
+            Path: parts[0].Trim(),
+            Checksum: parts[1].Trim(),
+            Size: long.Parse(parts[2].Trim())));
+    }
+
+    return files
+        .Where(f => f.Path.EndsWith(".xml") || f.Path.EndsWith(".xml.bz2"))
+        .OrderBy(f => f.Path, StringComparer.OrdinalIgnoreCase)
+        .ToList();
+}
+
+private async Task<string> DownloadOvalFileAsync(OvalFileInfo file, CancellationToken ct)
+{
+    var response = await HttpClient!.GetAsync($"{_options!.BaseUrl}/{file.Path}", ct);
+    response.EnsureSuccessStatusCode();
+
+    if (file.Path.EndsWith(".bz2"))
+    {
+        await using var stream = await response.Content.ReadAsStreamAsync(ct);
+        await using var decompressed = new BZip2InputStream(stream);
+        using var reader = new StreamReader(decompressed);
+        return await reader.ReadToEndAsync(ct);
+    }
+
+    return await response.Content.ReadAsStringAsync(ct);
+}
+
+private Task<IReadOnlyList<Advisory>> ParseOvalContentAsync(string content, CancellationToken ct)
+{
+    var advisories = new List<Advisory>();
+    var doc = XDocument.Parse(content);
+    var ns = doc.Root?.GetDefaultNamespace() ?? XNamespace.None;
+
+    var definitions = doc.Descendants(ns + "definition");
+
+    foreach (var def in definitions)
+    {
+        var id = def.Attribute("id")?.Value;
+        if (string.IsNullOrEmpty(id)) continue;
+
+        var metadata = def.Element(ns + "metadata");
+        var title = metadata?.Element(ns + "title")?.Value;
+        var description = metadata?.Element(ns + "description")?.Value;
+
+        var advisory = metadata?.Element(ns + "advisory");
+        var severity = advisory?.Element(ns + "severity")?.Value;
+        var issued = advisory?.Element(ns + "issued")?.Attribute("date")?.Value;
+        var updated = advisory?.Element(ns + "updated")?.Attribute("date")?.Value;
+
+        var cves = advisory?.Elements(ns + "cve")
+            .Select(c => c.Value)
+            .ToList() ?? new List<string>();
+
+        var references = metadata?.Elements(ns + "reference")
+            .Select(r => new AdvisoryReference(
+                Url: r.Attribute("ref_url")?.Value ?? "",
+                Type: ReferenceType.Advisory,
+                Description: r.Attribute("source")?.Value))
+            .Where(r => !string.IsNullOrEmpty(r.Url))
+            .ToList() ?? new List<AdvisoryReference>();
+
+        advisories.Add(new Advisory(
+            Id: id,
+            FeedId: FeedId,
+            SourceFormat: AdvisoryFormat.Oval,
+            Title: title,
+            Description: description,
+            Severity: MapOvalSeverity(severity),
+            Cvss: null,
+            Published: ParseDate(issued),
+            Modified: ParseDate(updated),
+            AffectedPackages: ParseAffectedPackages(def, ns),
+            References: references,
+            Aliases: cves,
+            Metadata: new Dictionary<string, string>
+            {
+                ["source"] = "Red Hat OVAL"
+            },
+            RawData: null));
+    }
+
+    // Sort for deterministic output
+    return Task.FromResult<IReadOnlyList<Advisory>>(advisories
+        .OrderBy(a => a.Id, StringComparer.OrdinalIgnoreCase)
+        .ToList());
+}
+
+private IReadOnlyList<AffectedPackage> ParseAffectedPackages(XElement definition, XNamespace ns)
+{
+    var packages = new List<AffectedPackage>();
+
+    // Parse criteria for RPM references
+    var criteria = definition.Descendants(ns + "criterion");
+
+    foreach (var criterion in criteria)
+    {
+        var comment = criterion.Attribute("comment")?.Value;
+        if (string.IsNullOrEmpty(comment)) continue;
+
+        // Parse patterns like "package-name is earlier than 0:1.2.3-4.el8"
+        var match = Regex.Match(comment, @"^(.+?)\s+is earlier than\s+(.+)$");
+        if (match.Success)
+        {
+            packages.Add(new AffectedPackage(
+                Name: match.Groups[1].Value,
+                Ecosystem: "rpm",
+                Purl: $"pkg:rpm/redhat/{match.Groups[1].Value}",
+                AffectedVersions: new VersionRange(
+                    Start: null,
+                    StartInclusive: false,
+                    End: match.Groups[2].Value,
+                    EndInclusive: false),
+                FixedVersion: match.Groups[2].Value,
+                Status: PackageStatus.Fixed));
+        }
+    }
+
+    return packages
+        .OrderBy(p => p.Name, StringComparer.OrdinalIgnoreCase)
+        .ToList();
+}
+
+private static AdvisorySeverity MapOvalSeverity(string? severity) => severity?.ToLowerInvariant() switch
+{
+    "critical" => AdvisorySeverity.Critical,
+    "important" => AdvisorySeverity.High,
+    "moderate" => AdvisorySeverity.Medium,
+    "low" => AdvisorySeverity.Low,
+    _ => AdvisorySeverity.None
+};
+
+private static DateTimeOffset? ParseDate(string? date)
+{
+    if (string.IsNullOrEmpty(date)) return null;
+    return DateTimeOffset.TryParse(date, out var result) ? result : null;
+}
+
+private sealed record OvalFileInfo(string Path, string Checksum, long Size);
+}
+
+public sealed class RedHatOvalOptions
+{
+    public string BaseUrl { get; set; } = "https://www.redhat.com/security/data/oval/v2";
+}
+```
+
+### Migration Tasks
+
+| Connector | Current Interface | New Implementation | Status |
+|-----------|-------------------|-------------------|--------|
+| NVD | `IFeedConnector` | `NvdConnectorPlugin : IPlugin, IFeedCapability` | TODO |
+| MITRE | `IFeedConnector` | `MitreConnectorPlugin : IPlugin, IFeedCapability` | TODO |
+| CVE List v5 | `IFeedConnector` | `CveListV5ConnectorPlugin : IPlugin, IFeedCapability` | TODO |
+| Red Hat OVAL | `IFeedConnector` | `RedHatOvalConnectorPlugin : IPlugin, IFeedCapability` | TODO |
+| Ubuntu OVAL | `IFeedConnector` | `UbuntuOvalConnectorPlugin : IPlugin, IFeedCapability` | TODO |
+| Debian OVAL | `IFeedConnector` | `DebianOvalConnectorPlugin : IPlugin, IFeedCapability` | TODO |
+| SUSE OVAL | `IFeedConnector` | `SuseOvalConnectorPlugin : IPlugin, IFeedCapability` | TODO |
+| Oracle OVAL | `IFeedConnector` | `OracleOvalConnectorPlugin : IPlugin, IFeedCapability` | TODO |
+| AlmaLinux OVAL | `IFeedConnector` | `AlmaLinuxOvalConnectorPlugin : IPlugin, IFeedCapability` | TODO |
+| Rocky Linux OVAL | `IFeedConnector` | `RockyLinuxOvalConnectorPlugin : IPlugin, IFeedCapability` | TODO |
+| Alpine SecDB | `IFeedConnector` | `AlpineSecDbConnectorPlugin : IPlugin, IFeedCapability` | TODO |
+| OSV | `IFeedConnector` | `OsvConnectorPlugin : IPlugin, IFeedCapability` | TODO |
+| GHSA | `IFeedConnector` | `GhsaConnectorPlugin : IPlugin, IFeedCapability` | TODO |
+| GitLab Advisories | `IFeedConnector` | `GitLabAdvisoriesConnectorPlugin : IPlugin, IFeedCapability` | TODO |
+| Microsoft MSRC | `IFeedConnector` | `MsrcConnectorPlugin : IPlugin, IFeedCapability` | TODO |
+| Amazon Inspector 
| `IFeedConnector` | `AmazonInspectorConnectorPlugin : IPlugin, IFeedCapability` | TODO |
+| CISA KEV | `IFeedConnector` | `CisaKevConnectorPlugin : IPlugin, IFeedCapability` | TODO |
+| Mirror | `IFeedConnector` | `MirrorFeedConnectorPlugin : IPlugin, IFeedCapability` | TODO |
+
+---
+
+## Acceptance Criteria
+
+- [ ] All feed connectors implement `IPlugin`
+- [ ] All feed connectors implement `IFeedCapability`
+- [ ] Incremental sync with checkpoints works
+- [ ] Full sync works for all feeds
+- [ ] Advisory streaming works
+- [ ] Deterministic output ordering maintained
+- [ ] Health checks verify feed availability
+- [ ] Rate limiting handled gracefully
+- [ ] Plugin manifests for all connectors
+- [ ] Air-gap mirror connector works
+- [ ] Unit tests migrated/updated
+- [ ] Integration tests with mock feeds
+
+---
+
+## Dependencies
+
+| Dependency | Type | Status |
+|------------|------|--------|
+| 100_001 Plugin Abstractions | Internal | TODO |
+| 100_002 Plugin Host | Internal | TODO |
+| SharpCompress | External | Available (bz2) |
+
+---
+
+## Delivery Tracker
+
+| Deliverable | Status | Notes |
+|-------------|--------|-------|
+| IFeedCapability interface | TODO | |
+| FeedConnectorBase | TODO | |
+| NvdConnectorPlugin | TODO | |
+| MitreConnectorPlugin | TODO | |
+| CveListV5ConnectorPlugin | TODO | |
+| RedHatOvalConnectorPlugin | TODO | |
+| UbuntuOvalConnectorPlugin | TODO | |
+| DebianOvalConnectorPlugin | TODO | |
+| SuseOvalConnectorPlugin | TODO | |
+| OracleOvalConnectorPlugin | TODO | |
+| AlmaLinuxOvalConnectorPlugin | TODO | |
+| RockyLinuxOvalConnectorPlugin | TODO | |
+| AlpineSecDbConnectorPlugin | TODO | |
+| OsvConnectorPlugin | TODO | |
+| GhsaConnectorPlugin | TODO | |
+| GitLabAdvisoriesConnectorPlugin | TODO | |
+| MsrcConnectorPlugin | TODO | |
+| AmazonInspectorConnectorPlugin | TODO | |
+| CisaKevConnectorPlugin | TODO | |
+| MirrorFeedConnectorPlugin | TODO | |
+| Plugin manifests | TODO | |
+| Unit tests | TODO | |
+
+---
+
+## Execution Log
+
+| Date | Entry |
+|------|-------|
+| 10-Jan-2026 | Sprint created |
diff --git a/docs/implplan/SPRINT_20260110_100_012_PLUGIN_sdk.md b/docs/implplan/SPRINT_20260110_100_012_PLUGIN_sdk.md
new file mode 100644
index 000000000..43291d8a5
--- /dev/null
+++ b/docs/implplan/SPRINT_20260110_100_012_PLUGIN_sdk.md
@@ -0,0 +1,1168 @@
+# SPRINT: Plugin SDK & Developer Experience
+
+> **Sprint ID:** 100_012
+> **Module:** PLUGIN
+> **Phase:** 100 - Plugin System Unification
+> **Status:** TODO
+> **Parent:** [100_000_INDEX](SPRINT_20260110_100_000_INDEX_plugin_unification.md)
+
+---
+
+## Overview
+
+Create a comprehensive Plugin SDK that provides developers with tools, templates, testing utilities, and documentation for building plugins for the Stella Ops platform.
+
+### Objectives
+
+- Create plugin project templates (dotnet new)
+- Build plugin development CLI tool
+- Create plugin testing framework
+- Build plugin packaging tooling
+- Create plugin validation tooling
+- Write comprehensive documentation
+- Provide sample plugins for each capability
+
+### Target Deliverables
+
+```
+src/
+├── Plugin/
+│   ├── StellaOps.Plugin.Sdk/        # SDK library
+│   ├── StellaOps.Plugin.Templates/  # dotnet new templates
+│   ├── StellaOps.Plugin.Testing/    # Testing utilities
+│   ├── StellaOps.Plugin.Cli/        # Plugin development CLI
+│   └── StellaOps.Plugin.Samples/    # Sample plugins
+```
+
+---
+
+## Deliverables
+
+### Plugin SDK Library
+
+```csharp
+// StellaOps.Plugin.Sdk - Main entry point for plugin developers
+
+namespace StellaOps.Plugin.Sdk;
+
+/// <summary>
+/// Base class for simplified plugin development.
+/// Provides common patterns and reduces boilerplate.
+/// </summary>
+public abstract class PluginBase : IPlugin
+{
+    private IPluginContext? _context;
+    protected IPluginLogger Logger => _context?.Logger ?? NullPluginLogger.Instance;
+    protected IPluginConfiguration Configuration => _context?.Configuration ?? 
EmptyConfiguration.Instance;
+    protected TimeProvider TimeProvider => _context?.TimeProvider ?? TimeProvider.System;
+
+    public abstract PluginInfo Info { get; }
+    public virtual PluginTrustLevel TrustLevel => PluginTrustLevel.Untrusted;
+    public abstract PluginCapabilities Capabilities { get; }
+    public PluginLifecycleState State { get; protected set; } = PluginLifecycleState.Discovered;
+
+    public async Task InitializeAsync(IPluginContext context, CancellationToken ct)
+    {
+        _context = context;
+        State = PluginLifecycleState.Initializing;
+
+        try
+        {
+            await OnInitializeAsync(context, ct);
+            State = PluginLifecycleState.Active;
+            Logger.Info("Plugin {PluginId} initialized successfully", Info.Id);
+        }
+        catch (Exception ex)
+        {
+            State = PluginLifecycleState.Failed;
+            Logger.Error(ex, "Plugin {PluginId} failed to initialize", Info.Id);
+            throw;
+        }
+    }
+
+    protected virtual Task OnInitializeAsync(IPluginContext context, CancellationToken ct)
+        => Task.CompletedTask;
+
+    public virtual Task<HealthCheckResult> HealthCheckAsync(CancellationToken ct)
+    {
+        return Task.FromResult(State == PluginLifecycleState.Active
+            ? HealthCheckResult.Healthy()
+            : HealthCheckResult.Unhealthy($"Plugin is in state {State}"));
+    }
+
+    public virtual async ValueTask DisposeAsync()
+    {
+        try
+        {
+            await OnDisposeAsync();
+        }
+        finally
+        {
+            State = PluginLifecycleState.Stopped;
+        }
+    }
+
+    protected virtual ValueTask OnDisposeAsync() => ValueTask.CompletedTask;
+}
+
+/// <summary>
+/// Fluent builder for creating PluginInfo.
+/// </summary>
+public sealed class PluginInfoBuilder
+{
+    private string _id = "";
+    private string _name = "";
+    private string _version = "1.0.0";
+    private string _vendor = "";
+    private string? _description;
+    private string? _licenseId;
+    private string? _homepage;
+    private string? _repository;
+    private readonly List<PluginDependency> _dependencies = new();
+    private readonly Dictionary<string, string> _metadata = new();
+
+    public PluginInfoBuilder WithId(string id)
+    {
+        _id = id;
+        return this;
+    }
+
+    public PluginInfoBuilder WithName(string name)
+    {
+        _name = name;
+        return this;
+    }
+
+    public PluginInfoBuilder WithVersion(string version)
+    {
+        _version = version;
+        return this;
+    }
+
+    public PluginInfoBuilder WithVendor(string vendor)
+    {
+        _vendor = vendor;
+        return this;
+    }
+
+    public PluginInfoBuilder WithDescription(string description)
+    {
+        _description = description;
+        return this;
+    }
+
+    public PluginInfoBuilder WithLicense(string licenseId)
+    {
+        _licenseId = licenseId;
+        return this;
+    }
+
+    public PluginInfoBuilder WithHomepage(string homepage)
+    {
+        _homepage = homepage;
+        return this;
+    }
+
+    public PluginInfoBuilder WithRepository(string repository)
+    {
+        _repository = repository;
+        return this;
+    }
+
+    public PluginInfoBuilder DependsOn(string pluginId, string? versionRange = null)
+    {
+        _dependencies.Add(new PluginDependency(pluginId, versionRange, false));
+        return this;
+    }
+
+    public PluginInfoBuilder OptionallyDependsOn(string pluginId, string? versionRange = null)
+    {
+        _dependencies.Add(new PluginDependency(pluginId, versionRange, true));
+        return this;
+    }
+
+    public PluginInfoBuilder WithMetadata(string key, string value)
+    {
+        _metadata[key] = value;
+        return this;
+    }
+
+    public PluginInfo Build()
+    {
+        if (string.IsNullOrEmpty(_id))
+            throw new InvalidOperationException("Plugin ID is required");
+        if (string.IsNullOrEmpty(_name))
+            throw new InvalidOperationException("Plugin name is required");
+
+        return new PluginInfo(
+            Id: _id,
+            Name: _name,
+            Version: _version,
+            Vendor: _vendor,
+            Description: _description,
+            LicenseId: _licenseId,
+            Homepage: _homepage,
+            Repository: _repository,
+            Dependencies: _dependencies,
+            Metadata: _metadata.Count > 0 ? _metadata : null);
+    }
+}
+
+/// <summary>
+/// Extension methods for common plugin operations.
+/// </summary>
+public static class PluginExtensions
+{
+    /// <summary>
+    /// Get configuration value with type conversion.
+    /// </summary>
+    public static T GetValue<T>(this IPluginConfiguration config, string key, T defaultValue = default!)
+    {
+        var value = config.GetValue(key);
+        if (value == null) return defaultValue;
+
+        return (T)Convert.ChangeType(value, typeof(T), CultureInfo.InvariantCulture);
+    }
+
+    /// <summary>
+    /// Get secret with caching.
+    /// </summary>
+    public static async Task<string?> GetCachedSecretAsync(
+        this IPluginConfiguration config,
+        string key,
+        TimeSpan cacheDuration,
+        CancellationToken ct)
+    {
+        // Implementation would cache secrets to reduce vault calls
+        return await config.GetSecretAsync(key, ct);
+    }
+
+    /// <summary>
+    /// Create a scoped logger for a specific operation.
+    /// </summary>
+    public static IDisposable BeginScope(this IPluginLogger logger, string operationName)
+    {
+        logger.Debug("Starting operation: {Operation}", operationName);
+        var sw = Stopwatch.StartNew();
+
+        return new ScopeDisposable(() =>
+        {
+            sw.Stop();
+            logger.Debug("Completed operation: {Operation} in {Elapsed}ms",
+                operationName, sw.ElapsedMilliseconds);
+        });
+    }
+
+    private sealed class ScopeDisposable(Action onDispose) : IDisposable
+    {
+        public void Dispose() => onDispose();
+    }
+}
+
+/// <summary>
+/// Attribute for marking plugin configuration properties.
+/// </summary>
+[AttributeUsage(AttributeTargets.Property)]
+public sealed class PluginConfigAttribute : Attribute
+{
+    public string? Key { get; set; }
+    public string? Description { get; set; }
+    public bool Required { get; set; }
+    public object? DefaultValue { get; set; }
+    public bool Secret { get; set; }
+}
+
+/// <summary>
+/// Options base class with validation support. 
+/// </summary>
+public abstract class PluginOptionsBase : IValidatableObject
+{
+    public virtual IEnumerable<ValidationResult> Validate(ValidationContext validationContext)
+    {
+        yield break;
+    }
+}
+```
+
+### Plugin Testing Framework
+
+```csharp
+// StellaOps.Plugin.Testing
+
+namespace StellaOps.Plugin.Testing;
+
+/// <summary>
+/// Test host for running plugins in isolation during testing.
+/// </summary>
+public sealed class PluginTestHost : IAsyncDisposable
+{
+    private readonly List<IPlugin> _plugins = new();
+    private readonly TestPluginContext _context;
+
+    public PluginTestHost(Action<PluginTestHostOptions>? configure = null)
+    {
+        var options = new PluginTestHostOptions();
+        configure?.Invoke(options);
+
+        _context = new TestPluginContext(options);
+    }
+
+    /// <summary>
+    /// Load and initialize a plugin.
+    /// </summary>
+    public async Task<T> LoadPluginAsync<T>(CancellationToken ct = default) where T : IPlugin, new()
+    {
+        var plugin = new T();
+        await plugin.InitializeAsync(_context, ct);
+        _plugins.Add(plugin);
+        return plugin;
+    }
+
+    /// <summary>
+    /// Load and initialize a plugin with custom configuration.
+    /// </summary>
+    public async Task<T> LoadPluginAsync<T>(
+        Dictionary<string, object> configuration,
+        CancellationToken ct = default) where T : IPlugin, new()
+    {
+        foreach (var (key, value) in configuration)
+        {
+            _context.Configuration.SetValue(key, value);
+        }
+
+        return await LoadPluginAsync<T>(ct);
+    }
+
+    /// <summary>
+    /// Get the test context for assertions.
+    /// </summary>
+    public TestPluginContext Context => _context;
+
+    /// <summary>
+    /// Verify plugin health.
+    /// </summary>
+    public async Task<HealthCheckResult> CheckHealthAsync<T>(T plugin, CancellationToken ct = default)
+        where T : IPlugin
+    {
+        return await plugin.HealthCheckAsync(ct);
+    }
+
+    public async ValueTask DisposeAsync()
+    {
+        foreach (var plugin in _plugins)
+        {
+            await plugin.DisposeAsync();
+        }
+        _plugins.Clear();
+    }
+}
+
+/// <summary>
+/// Options for configuring the test host.
+/// </summary>
+public sealed class PluginTestHostOptions
+{
+    public bool EnableLogging { get; set; } = true;
+    public LogLevel MinLogLevel { get; set; } = LogLevel.Debug;
+    public TimeProvider? TimeProvider { get; set; }
+    public Dictionary<string, string> Secrets { get; } = new();
+    public Dictionary<string, object> Configuration { get; } = new();
+}
+
+/// <summary>
+/// Test implementation of IPluginContext.
+/// </summary>
+public sealed class TestPluginContext : IPluginContext
+{
+    public TestPluginConfiguration Configuration { get; }
+    public TestPluginLogger Logger { get; }
+    public TimeProvider TimeProvider { get; }
+    public IHttpClientFactory HttpClientFactory { get; }
+    public IGuidGenerator GuidGenerator { get; }
+
+    IPluginConfiguration IPluginContext.Configuration => Configuration;
+    IPluginLogger IPluginContext.Logger => Logger;
+
+    public TestPluginContext(PluginTestHostOptions options)
+    {
+        Configuration = new TestPluginConfiguration(options.Configuration, options.Secrets);
+        Logger = new TestPluginLogger(options.MinLogLevel, options.EnableLogging);
+        TimeProvider = options.TimeProvider ?? new FakeTimeProvider(DateTimeOffset.UtcNow);
+        HttpClientFactory = new TestHttpClientFactory();
+        GuidGenerator = new SequentialGuidGenerator();
+    }
+}
+
+/// <summary>
+/// Test implementation of plugin configuration.
+/// </summary>
+public sealed class TestPluginConfiguration : IPluginConfiguration
+{
+    private readonly Dictionary<string, object> _values;
+    private readonly Dictionary<string, string> _secrets;
+
+    public TestPluginConfiguration(
+        Dictionary<string, object> values,
+        Dictionary<string, string> secrets)
+    {
+        _values = new Dictionary<string, object>(values);
+        _secrets = new Dictionary<string, string>(secrets);
+    }
+
+    public string? GetValue(string key)
+    {
+        return _values.TryGetValue(key, out var value) ? value?.ToString() : null;
+    }
+
+    public void SetValue(string key, object value)
+    {
+        _values[key] = value;
+    }
+
+    public T Bind<T>() where T : new()
+    {
+        var result = new T();
+        var properties = typeof(T).GetProperties();
+
+        foreach (var prop in properties)
+        {
+            var key = prop.Name;
+            var configAttr = prop.GetCustomAttribute<PluginConfigAttribute>();
+            if (configAttr?.Key != null)
+                key = configAttr.Key;
+
+            if (_values.TryGetValue(key, out var value))
+            {
+                prop.SetValue(result, Convert.ChangeType(value, prop.PropertyType));
+            }
+        }
+
+        return result;
+    }
+
+    public Task<string?> GetSecretAsync(string key, CancellationToken ct)
+    {
+        return Task.FromResult(_secrets.TryGetValue(key, out var secret) ? secret : null);
+    }
+
+    public void SetSecret(string key, string value)
+    {
+        _secrets[key] = value;
+    }
+}
+
+/// <summary>
+/// Test logger that captures log entries for assertions.
+/// </summary>
+public sealed class TestPluginLogger : IPluginLogger
+{
+    private readonly LogLevel _minLevel;
+    private readonly bool _enabled;
+    private readonly List<LogEntry> _entries = new();
+    private readonly object _lock = new();
+
+    public IReadOnlyList<LogEntry> Entries
+    {
+        get
+        {
+            lock (_lock) return _entries.ToList();
+        }
+    }
+
+    public TestPluginLogger(LogLevel minLevel, bool enabled)
+    {
+        _minLevel = minLevel;
+        _enabled = enabled;
+    }
+
+    public void Log(LogLevel level, string message, params object[] args)
+    {
+        if (!_enabled || level < _minLevel) return;
+
+        var formatted = args.Length > 0
+            ? string.Format(CultureInfo.InvariantCulture, message, args)
+            : message;
+
+        lock (_lock)
+        {
+            _entries.Add(new LogEntry(level, formatted, null));
+        }
+
+        if (_enabled)
+        {
+            Console.WriteLine($"[{level}] {formatted}");
+        }
+    }
+
+    public void Log(LogLevel level, Exception exception, string message, params object[] args)
+    {
+        if (!_enabled || level < _minLevel) return;
+
+        var formatted = args.Length > 0
+            ? 
string.Format(CultureInfo.InvariantCulture, message, args) + : message; + + lock (_lock) + { + _entries.Add(new LogEntry(level, formatted, exception)); + } + + if (_enabled) + { + Console.WriteLine($"[{level}] {formatted}"); + Console.WriteLine(exception); + } + } + + public void Debug(string message, params object[] args) => Log(LogLevel.Debug, message, args); + public void Info(string message, params object[] args) => Log(LogLevel.Information, message, args); + public void Warning(string message, params object[] args) => Log(LogLevel.Warning, message, args); + public void Warning(Exception ex, string message, params object[] args) => Log(LogLevel.Warning, ex, message, args); + public void Error(string message, params object[] args) => Log(LogLevel.Error, message, args); + public void Error(Exception ex, string message, params object[] args) => Log(LogLevel.Error, ex, message, args); + + public bool HasLoggedAtLevel(LogLevel level) => Entries.Any(e => e.Level == level); + public bool HasLoggedError() => HasLoggedAtLevel(LogLevel.Error); + public bool HasLoggedWarning() => HasLoggedAtLevel(LogLevel.Warning); + + public void Clear() + { + lock (_lock) _entries.Clear(); + } +} + +public sealed record LogEntry(LogLevel Level, string Message, Exception? Exception); + +/// +/// Fake time provider for deterministic testing. +/// +public sealed class FakeTimeProvider : TimeProvider +{ + private DateTimeOffset _now; + + public FakeTimeProvider(DateTimeOffset startTime) + { + _now = startTime; + } + + public override DateTimeOffset GetUtcNow() => _now; + + public void Advance(TimeSpan duration) => _now += duration; + public void SetTime(DateTimeOffset time) => _now = time; +} + +/// +/// Sequential GUID generator for deterministic testing. 
+/// +public sealed class SequentialGuidGenerator : IGuidGenerator +{ + private int _counter; + + public Guid NewGuid() + { + var counter = Interlocked.Increment(ref _counter); + var bytes = new byte[16]; + BitConverter.GetBytes(counter).CopyTo(bytes, 0); + return new Guid(bytes); + } + + public void Reset() => _counter = 0; +} + +/// +/// Test HTTP client factory with request recording. +/// +public sealed class TestHttpClientFactory : IHttpClientFactory +{ + private readonly Dictionary _handlers = new(); + private readonly List _requests = new(); + + public HttpClient CreateClient(string name) + { + if (!_handlers.TryGetValue(name, out var handler)) + { + handler = new MockHttpMessageHandler(_requests); + _handlers[name] = handler; + } + + return new HttpClient(handler); + } + + public void SetupResponse(string name, string url, HttpResponseMessage response) + { + if (!_handlers.TryGetValue(name, out var handler)) + { + handler = new MockHttpMessageHandler(_requests); + _handlers[name] = handler; + } + + handler.SetupResponse(url, response); + } + + public IReadOnlyList RecordedRequests => _requests; +} + +internal sealed class MockHttpMessageHandler : HttpMessageHandler +{ + private readonly List _requests; + private readonly Dictionary _responses = new(); + + public MockHttpMessageHandler(List requests) + { + _requests = requests; + } + + public void SetupResponse(string url, HttpResponseMessage response) + { + _responses[url] = response; + } + + protected override Task SendAsync( + HttpRequestMessage request, + CancellationToken cancellationToken) + { + _requests.Add(request); + + var url = request.RequestUri?.ToString() ?? ""; + if (_responses.TryGetValue(url, out var response)) + { + return Task.FromResult(response); + } + + return Task.FromResult(new HttpResponseMessage(HttpStatusCode.NotFound)); + } +} + +/// +/// xUnit test fixtures for plugin testing. 
+/// +public abstract class PluginTestBase : IAsyncLifetime where TPlugin : IPlugin, new() +{ + protected PluginTestHost Host { get; private set; } = null!; + protected TPlugin Plugin { get; private set; } = default!; + protected TestPluginContext Context => Host.Context; + + protected virtual void ConfigureHost(PluginTestHostOptions options) { } + protected virtual Dictionary GetConfiguration() => new(); + + public virtual async Task InitializeAsync() + { + Host = new PluginTestHost(ConfigureHost); + Plugin = await Host.LoadPluginAsync(GetConfiguration()); + } + + public virtual async Task DisposeAsync() + { + await Host.DisposeAsync(); + } +} +``` + +### Plugin CLI Tool + +```csharp +// StellaOps.Plugin.Cli - Command-line tool for plugin development + +namespace StellaOps.Plugin.Cli; + +/// +/// CLI commands for plugin development workflow. +/// +public static class PluginCliCommands +{ + /// + /// Create a new plugin project from template. + /// + [Command("new")] + public static class NewCommand + { + [CommandOption("--name", "-n", Description = "Plugin name")] + public required string Name { get; set; } + + [CommandOption("--capability", "-c", Description = "Plugin capability type")] + public PluginCapabilities Capability { get; set; } = PluginCapabilities.Custom; + + [CommandOption("--output", "-o", Description = "Output directory")] + public string Output { get; set; } = "."; + + public async Task ExecuteAsync() + { + Console.WriteLine($"Creating new plugin: {Name}"); + Console.WriteLine($"Capability: {Capability}"); + Console.WriteLine($"Output: {Output}"); + + var generator = new PluginProjectGenerator(); + await generator.GenerateAsync(Name, Capability, Output); + + Console.WriteLine("Plugin project created successfully!"); + return 0; + } + } + + /// + /// Validate a plugin manifest. 
+ /// + [Command("validate")] + public static class ValidateCommand + { + [CommandOption("--manifest", "-m", Description = "Path to plugin manifest")] + public string Manifest { get; set; } = "plugin.yaml"; + + public async Task ExecuteAsync() + { + Console.WriteLine($"Validating manifest: {Manifest}"); + + var validator = new PluginManifestValidator(); + var result = await validator.ValidateAsync(Manifest); + + if (result.IsValid) + { + Console.WriteLine("Manifest is valid."); + return 0; + } + + Console.WriteLine("Validation errors:"); + foreach (var error in result.Errors) + { + Console.WriteLine($" - {error}"); + } + return 1; + } + } + + /// + /// Package a plugin for distribution. + /// + [Command("pack")] + public static class PackCommand + { + [CommandOption("--project", "-p", Description = "Path to plugin project")] + public string Project { get; set; } = "."; + + [CommandOption("--output", "-o", Description = "Output directory for package")] + public string Output { get; set; } = "./packages"; + + [CommandOption("--configuration", "-c", Description = "Build configuration")] + public string Configuration { get; set; } = "Release"; + + public async Task ExecuteAsync() + { + Console.WriteLine($"Packaging plugin from: {Project}"); + + var packager = new PluginPackager(); + var package = await packager.PackAsync(Project, Output, Configuration); + + Console.WriteLine($"Package created: {package}"); + return 0; + } + } + + /// + /// Run a plugin locally for testing. + /// + [Command("run")] + public static class RunCommand + { + [CommandOption("--project", "-p", Description = "Path to plugin project")] + public string Project { get; set; } = "."; + + [CommandOption("--config", "-c", Description = "Path to configuration file")] + public string? 
Config { get; set; } + + public async Task ExecuteAsync() + { + Console.WriteLine($"Running plugin from: {Project}"); + + var runner = new PluginLocalRunner(); + await runner.RunAsync(Project, Config, CancellationToken.None); + + return 0; + } + } + + /// + /// Generate plugin manifest from code. + /// + [Command("manifest")] + public static class ManifestCommand + { + [CommandOption("--assembly", "-a", Description = "Path to plugin assembly")] + public required string Assembly { get; set; } + + [CommandOption("--output", "-o", Description = "Output path for manifest")] + public string Output { get; set; } = "plugin.yaml"; + + public async Task ExecuteAsync() + { + Console.WriteLine($"Generating manifest from: {Assembly}"); + + var generator = new ManifestGenerator(); + await generator.GenerateFromAssemblyAsync(Assembly, Output); + + Console.WriteLine($"Manifest generated: {Output}"); + return 0; + } + } + + /// + /// Test plugin in isolated environment. + /// + [Command("test")] + public static class TestCommand + { + [CommandOption("--project", "-p", Description = "Path to plugin project")] + public string Project { get; set; } = "."; + + [CommandOption("--filter", "-f", Description = "Test filter")] + public string? Filter { get; set; } + + public async Task ExecuteAsync() + { + Console.WriteLine($"Testing plugin: {Project}"); + + var tester = new PluginTestRunner(); + var result = await tester.RunAsync(Project, Filter); + + Console.WriteLine($"Tests: {result.Passed} passed, {result.Failed} failed"); + return result.Failed > 0 ? 1 : 0; + } + } +} + +/// +/// Generates plugin project structure from templates. 
+/// +public sealed class PluginProjectGenerator +{ + public async Task GenerateAsync(string name, PluginCapabilities capability, string outputDir) + { + var projectDir = Path.Combine(outputDir, name); + Directory.CreateDirectory(projectDir); + + // Generate .csproj + var csproj = GenerateCsproj(name, capability); + await File.WriteAllTextAsync(Path.Combine(projectDir, $"{name}.csproj"), csproj); + + // Generate main plugin class + var pluginClass = GeneratePluginClass(name, capability); + await File.WriteAllTextAsync(Path.Combine(projectDir, $"{ToPascalCase(name)}Plugin.cs"), pluginClass); + + // Generate plugin manifest + var manifest = GenerateManifest(name, capability); + await File.WriteAllTextAsync(Path.Combine(projectDir, "plugin.yaml"), manifest); + + // Generate options class + var options = GenerateOptions(name); + await File.WriteAllTextAsync(Path.Combine(projectDir, $"{ToPascalCase(name)}Options.cs"), options); + + // Generate test project + var testDir = Path.Combine(projectDir, "Tests"); + Directory.CreateDirectory(testDir); + + var testCsproj = GenerateTestCsproj(name); + await File.WriteAllTextAsync(Path.Combine(testDir, $"{name}.Tests.csproj"), testCsproj); + + var testClass = GenerateTestClass(name); + await File.WriteAllTextAsync(Path.Combine(testDir, $"{ToPascalCase(name)}PluginTests.cs"), testClass); + } + + private string GenerateCsproj(string name, PluginCapabilities capability) => $""" + + + net10.0 + enable + enable + true + + + + + + + + + + + + """; + + private string GeneratePluginClass(string name, PluginCapabilities capability) + { + var className = ToPascalCase(name); + var capabilityInterface = GetCapabilityInterface(capability); + + return $$""" + using StellaOps.Plugin.Abstractions; + using StellaOps.Plugin.Sdk; + + namespace {{className}}; + + [Plugin( + id: "com.example.{{name.ToLowerInvariant()}}", + name: "{{className}}", + version: "1.0.0", + vendor: "Your Company")] + [ProvidesCapability(PluginCapabilities.{{capability}}, 
                CapabilityId = "{{name.ToLowerInvariant()}}")]
+            public sealed class {{className}}Plugin : PluginBase{{(capabilityInterface != null ? $", {capabilityInterface}" : "")}}
+            {
+                private {{className}}Options? _options;
+
+                public override PluginInfo Info => new PluginInfoBuilder()
+                    .WithId("com.example.{{name.ToLowerInvariant()}}")
+                    .WithName("{{className}}")
+                    .WithVersion("1.0.0")
+                    .WithVendor("Your Company")
+                    .WithDescription("Description of your plugin")
+                    .Build();
+
+                public override PluginCapabilities Capabilities => PluginCapabilities.{{capability}};
+
+                protected override Task OnInitializeAsync(IPluginContext context, CancellationToken ct)
+                {
+                    _options = context.Configuration.Bind<{{className}}Options>();
+
+                    // Add your initialization logic here
+                    Logger.Info("{{className}} plugin initialized");
+                    return Task.CompletedTask;
+                }
+
+                public override Task<HealthCheckResult> HealthCheckAsync(CancellationToken ct)
+                {
+                    // Add your health check logic here
+                    return Task.FromResult(HealthCheckResult.Healthy());
+                }
+
+                protected override ValueTask OnDisposeAsync()
+                {
+                    // Add your cleanup logic here
+                    return ValueTask.CompletedTask;
+                }
+            }
+            """;
+    }
+
+    private string GenerateManifest(string name, PluginCapabilities capability) => $"""
+        plugin:
+          id: com.example.{name.ToLowerInvariant()}
+          name: {ToPascalCase(name)}
+          version: 1.0.0
+          vendor: Your Company
+          description: Description of your plugin
+          license: MIT
+
+        entryPoint: {ToPascalCase(name)}.{ToPascalCase(name)}Plugin
+
+        minPlatformVersion: 1.0.0
+
+        capabilities:
+          - type: {capability.ToString().ToLowerInvariant()}
+            id: {name.ToLowerInvariant()}
+
+        configSchema:
+          type: object
+          properties:
+            exampleSetting:
+              type: string
+              description: An example configuration setting
+          required: []
+        """;
+
+    private string GenerateOptions(string name) => $$"""
+        using System.ComponentModel.DataAnnotations;
+        using StellaOps.Plugin.Sdk;
+
+        namespace {{ToPascalCase(name)}};
+
+        public sealed class {{ToPascalCase(name)}}Options : PluginOptionsBase
+        {
+            [PluginConfig(Description = "An example configuration setting")]
+            public string? ExampleSetting { get; set; }
+        }
+        """;
+
+    private string GenerateTestCsproj(string name) => $"""
+        <Project Sdk="Microsoft.NET.Sdk">
+
+          <PropertyGroup>
+            <TargetFramework>net10.0</TargetFramework>
+            <Nullable>enable</Nullable>
+            <ImplicitUsings>enable</ImplicitUsings>
+            <IsPackable>false</IsPackable>
+          </PropertyGroup>
+
+          <ItemGroup>
+            <!-- TODO: reference xunit, Microsoft.NET.Test.Sdk, StellaOps.Plugin.Testing,
+                 and the plugin project under test -->
+          </ItemGroup>
+
+        </Project>
+        """;
+
+    private string GenerateTestClass(string name) => $$"""
+        using StellaOps.Plugin.Testing;
+        using Xunit;
+
+        namespace {{ToPascalCase(name)}}.Tests;
+
+        public class {{ToPascalCase(name)}}PluginTests : PluginTestBase<{{ToPascalCase(name)}}Plugin>
+        {
+            [Fact]
+            public void Plugin_Initializes_Successfully()
+            {
+                // Assert plugin is in active state after initialization
+                Assert.Equal(PluginLifecycleState.Active, Plugin.State);
+            }
+
+            [Fact]
+            public async Task HealthCheck_Returns_Healthy()
+            {
+                var result = await Host.CheckHealthAsync(Plugin);
+                Assert.Equal(HealthStatus.Healthy, result.Status);
+            }
+        }
+        """;
+
+    private static string? GetCapabilityInterface(PluginCapabilities capability) => capability switch
+    {
+        PluginCapabilities.Crypto => "ICryptoCapability",
+        PluginCapabilities.Auth => "IAuthCapability",
+        PluginCapabilities.Llm => "ILlmCapability",
+        PluginCapabilities.Scm => "IScmCapability",
+        PluginCapabilities.Analysis => "IAnalysisCapability",
+        PluginCapabilities.Transport => "ITransportCapability",
+        PluginCapabilities.Feed => "IFeedCapability",
+        PluginCapabilities.WorkflowStep => "IStepProviderCapability",
+        PluginCapabilities.PromotionGate => "IGateProviderCapability",
+        _ => null
+    };
+
+    private static string ToPascalCase(string name)
+    {
+        return string.Join("", name.Split(new[] { '-', '_' }, StringSplitOptions.RemoveEmptyEntries)
+            .Select(s => char.ToUpperInvariant(s[0]) + s[1..]));
+    }
+}
+
+///
+/// Validates plugin manifests.
+///
+public sealed class PluginManifestValidator
+{
+    public async Task<ValidationResult> ValidateAsync(string manifestPath)
+    {
+        var errors = new List<string>();
+
+        if (!File.Exists(manifestPath))
+        {
+            errors.Add($"Manifest file not found: {manifestPath}");
+            return new ValidationResult(false, errors);
+        }
+
+        var content = await File.ReadAllTextAsync(manifestPath);
+
+        try
+        {
+            var manifest = YamlDeserializer.Deserialize<PluginManifest>(content);
+
+            // Validate required fields
+            if (string.IsNullOrEmpty(manifest.Plugin?.Id))
+                errors.Add("Plugin ID is required");
+
+            if (string.IsNullOrEmpty(manifest.Plugin?.Name))
+                errors.Add("Plugin name is required");
+
+            if (string.IsNullOrEmpty(manifest.Plugin?.Version))
+                errors.Add("Plugin version is required");
+
+            if (string.IsNullOrEmpty(manifest.EntryPoint))
+                errors.Add("Entry point is required");
+
+            // Validate ID format
+            if (manifest.Plugin?.Id != null && !PluginIdPattern.IsMatch(manifest.Plugin.Id))
+                errors.Add("Plugin ID must be in reverse domain notation (e.g., com.example.plugin)");
+
+            // Validate version format
+            if (manifest.Plugin?.Version != null && !SemVerPattern.IsMatch(manifest.Plugin.Version))
+                errors.Add("Plugin version must be valid SemVer");
+        }
+        catch (Exception ex)
+        {
+            errors.Add($"Failed to parse manifest: {ex.Message}");
+        }
+
+        return new ValidationResult(errors.Count == 0, errors);
+    }
+
+    private static readonly Regex PluginIdPattern = new(@"^[a-z][a-z0-9]*(\.[a-z][a-z0-9]*)+$", RegexOptions.Compiled);
+    private static readonly Regex SemVerPattern = new(@"^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?(\+[0-9A-Za-z.-]+)?$", RegexOptions.Compiled);
+
+    public sealed record ValidationResult(bool IsValid, IReadOnlyList<string> Errors);
+}
+```
+
+### Sample Plugins
+
+```yaml
+# Directory structure for sample plugins
+samples/
+├── HelloWorldPlugin/        # Basic plugin example
+├── CustomStepPlugin/        # Workflow step example
+├── CustomGatePlugin/        # Promotion gate example
+├── WebhookReceiverPlugin/   # Webhook handling example
+└── MetricsCollectorPlugin/  # Metrics
capability example +``` + +--- + +## Acceptance Criteria + +- [ ] dotnet new templates work for all capability types +- [ ] Plugin CLI tool builds and runs +- [ ] `stellaops-plugin new` creates valid projects +- [ ] `stellaops-plugin validate` validates manifests +- [ ] `stellaops-plugin pack` creates distributable packages +- [ ] `stellaops-plugin test` runs plugin tests +- [ ] Testing framework provides all mock implementations +- [ ] Deterministic testing with FakeTimeProvider works +- [ ] HTTP request recording works in tests +- [ ] Sample plugins compile and pass tests +- [ ] Documentation is comprehensive +- [ ] API reference generated from XML docs + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 100_001 Plugin Abstractions | Internal | TODO | +| 100_002 Plugin Host | Internal | TODO | +| YamlDotNet | External | Available | +| McMaster.Extensions.CommandLineUtils | External | Available | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| StellaOps.Plugin.Sdk | TODO | | +| StellaOps.Plugin.Templates | TODO | | +| StellaOps.Plugin.Testing | TODO | | +| StellaOps.Plugin.Cli | TODO | | +| HelloWorldPlugin sample | TODO | | +| CustomStepPlugin sample | TODO | | +| CustomGatePlugin sample | TODO | | +| WebhookReceiverPlugin sample | TODO | | +| Developer documentation | TODO | | +| API reference | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_101_000_INDEX_foundation.md b/docs/implplan/SPRINT_20260110_101_000_INDEX_foundation.md new file mode 100644 index 000000000..a4c31dcec --- /dev/null +++ b/docs/implplan/SPRINT_20260110_101_000_INDEX_foundation.md @@ -0,0 +1,200 @@ +# SPRINT INDEX: Phase 1 - Foundation + +> **Epic:** Release Orchestrator +> **Phase:** 1 - Foundation +> **Batch:** 101 +> **Status:** TODO +> **Parent:** 
[100_000_INDEX](SPRINT_20260110_100_000_INDEX_release_orchestrator.md) +> **Prerequisites:** [100_000_INDEX - Plugin System Unification](SPRINT_20260110_100_000_INDEX_plugin_unification.md) (must be completed first) + +--- + +## Overview + +Phase 1 establishes the foundational infrastructure for the Release Orchestrator: database schema and Release Orchestrator-specific plugin extensions. The unified plugin system from Phase 100 provides the core plugin infrastructure; this phase builds on it with Release Orchestrator domain-specific capabilities. + +### Prerequisites + +**Phase 100 - Plugin System Unification** must be completed before starting Phase 101. Phase 100 provides: +- `IPlugin` base interface and lifecycle management +- `IPluginHost` and `PluginHost` implementation +- Database-backed plugin registry +- Plugin sandbox infrastructure +- Core capability interfaces (ICryptoCapability, IAuthCapability, etc.) +- Plugin SDK and developer tooling + +### Objectives + +- Create PostgreSQL schema for all release orchestration tables +- Extend plugin registry with Release Orchestrator-specific capability types +- Implement `IStepProviderCapability` for workflow steps +- Implement `IGateProviderCapability` for promotion gates +- Deliver built-in step and gate providers + +--- + +## Sprint Structure + +| Sprint ID | Title | Module | Status | Dependencies | +|-----------|-------|--------|--------|--------------| +| 101_001 | Database Schema - Core Tables | DB | TODO | Phase 100 complete | +| 101_002 | Plugin Registry Extensions | PLUGIN | TODO | 101_001, 100_003 | +| 101_003 | Loader & Sandbox Extensions | PLUGIN | TODO | 101_002, 100_002, 100_004 | +| 101_004 | SDK Extensions | PLUGIN | TODO | 101_003, 100_012 | + +> **Note:** Sprint numbers 101_002-101_004 now focus on Release Orchestrator-specific plugin extensions rather than duplicating the unified plugin infrastructure built in Phase 100. 
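To make the capability objectives concrete, here is a minimal sketch of how a gate capability could sit on top of the Phase 100 abstractions. Only the interface name `IGateProviderCapability` comes from the deliverables tables; every member shape, both records, and the `AlwaysPassGate` class are illustrative assumptions, not the final SDK surface:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical context/result records — field names loosely mirror columns
// on release.promotions and release.gate_results.
public sealed record GateContext(Guid PromotionId, Guid ReleaseId, Guid TargetEnvironmentId);
public sealed record GateResult(bool Passed, string? Reason);

public interface IGateProviderCapability
{
    // Identifier recorded alongside each evaluation (cf. gate_type in release.gate_results)
    string GateType { get; }

    Task<GateResult> EvaluateAsync(GateContext context, CancellationToken ct);
}

// Illustrative built-in gate: passes unconditionally. A real provider would
// inspect promotion state — e.g., block when a freeze window is active.
public sealed class AlwaysPassGate : IGateProviderCapability
{
    public string GateType => "always-pass";

    public Task<GateResult> EvaluateAsync(GateContext context, CancellationToken ct) =>
        Task.FromResult(new GateResult(Passed: true, Reason: null));
}
```

Keeping the verdict (`Passed`/`Reason`) separate from the provider means gate evaluations can be persisted per promotion regardless of which plugin produced them.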
+ +--- + +## Architecture + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ FOUNDATION LAYER │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ DATABASE SCHEMA (101_001) │ │ +│ │ │ │ +│ │ release.integration_types release.environments │ │ +│ │ release.integrations release.targets │ │ +│ │ release.components release.releases │ │ +│ │ release.workflow_templates release.promotions │ │ +│ │ release.deployment_jobs release.evidence_packets │ │ +│ │ release.plugins release.agents │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ PLUGIN SYSTEM │ │ +│ │ │ │ +│ │ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │ │ +│ │ │ Plugin Registry │ │ Plugin Loader │ │ Plugin Sandbox │ │ │ +│ │ │ (101_002) │ │ (101_003) │ │ (101_003) │ │ │ +│ │ │ │ │ │ │ │ │ │ +│ │ │ - Discovery │ │ - Load/Unload │ │ - Process │ │ │ +│ │ │ - Versioning │ │ - Health check │ │ isolation │ │ │ +│ │ │ - Dependencies │ │ - Hot reload │ │ - Resource │ │ │ +│ │ │ - Manifest │ │ - Lifecycle │ │ limits │ │ │ +│ │ └─────────────────┘ └─────────────────┘ └─────────────────┘ │ │ +│ │ │ │ +│ │ ┌─────────────────────────────────────────────────────────────────┐ │ │ +│ │ │ Plugin SDK (101_004) │ │ │ +│ │ │ │ │ │ +│ │ │ - Connector interfaces - Step provider interfaces │ │ │ +│ │ │ - Gate provider interfaces - Manifest builder │ │ │ +│ │ │ - Testing utilities - Documentation templates │ │ │ +│ │ └─────────────────────────────────────────────────────────────────┘ │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +--- + +## Deliverables Summary + +### 101_001: Database Schema + +| Deliverable | Type | Description | +|-------------|------|-------------| +| Migration 001 | SQL | 
Integration hub tables | +| Migration 002 | SQL | Environment tables | +| Migration 003 | SQL | Release management tables | +| Migration 004 | SQL | Workflow engine tables | +| Migration 005 | SQL | Promotion tables | +| Migration 006 | SQL | Deployment tables | +| Migration 007 | SQL | Agent tables | +| Migration 008 | SQL | Evidence tables | +| Migration 009 | SQL | Plugin tables | +| RLS Policies | SQL | Row-level security | +| Indexes | SQL | Performance indexes | + +### 101_002: Plugin Registry + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `IPluginRegistry` | Interface | Plugin discovery/versioning | +| `PluginRegistry` | Class | Implementation | +| `PluginManifest` | Record | Manifest schema | +| `PluginManifestValidator` | Class | Schema validation | +| `PluginVersion` | Record | SemVer handling | +| `PluginDependencyResolver` | Class | Dependency resolution | + +### 101_003: Plugin Loader & Sandbox + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `IPluginLoader` | Interface | Load/unload/reload | +| `PluginLoader` | Class | Implementation | +| `IPluginSandbox` | Interface | Isolation contract | +| `ContainerSandbox` | Class | Container-based isolation | +| `ProcessSandbox` | Class | Process-based isolation | +| `ResourceLimiter` | Class | CPU/memory limits | +| `PluginHealthMonitor` | Class | Health checking | + +### 101_004: Plugin SDK + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `StellaOps.Plugin.Sdk` | NuGet | SDK package | +| `IConnectorPlugin` | Interface | Connector contract | +| `IStepProvider` | Interface | Step contract | +| `IGateProvider` | Interface | Gate contract | +| `ManifestBuilder` | Class | Fluent manifest building | +| Plugin Templates | dotnet new | Project templates | +| Documentation | Markdown | SDK documentation | + +--- + +## Dependencies + +### Phase Dependencies + +| Phase | Purpose | Status | +|-------|---------|--------| +| 
**Phase 100 - Plugin System Unification** | Unified plugin infrastructure | TODO | +| 100_001 Plugin Abstractions | IPlugin, capabilities | TODO | +| 100_002 Plugin Host | Lifecycle management | TODO | +| 100_003 Plugin Registry | Database registry | TODO | +| 100_004 Plugin Sandbox | Process isolation | TODO | +| 100_012 Plugin SDK | Developer tooling | TODO | + +### External Dependencies + +| Dependency | Purpose | +|------------|---------| +| PostgreSQL 16+ | Database | +| Docker | Plugin sandbox (via Phase 100) | +| gRPC | Plugin communication (via Phase 100) | + +### Internal Dependencies + +| Module | Purpose | +|--------|---------| +| Authority | Tenant context, permissions | +| Telemetry | Metrics, tracing | +| StellaOps.Plugin.Abstractions | Core plugin interfaces (from Phase 100) | +| StellaOps.Plugin.Host | Plugin host (from Phase 100) | +| StellaOps.Plugin.Sdk | SDK library (from Phase 100) | + +--- + +## Acceptance Criteria + +- [ ] All database migrations execute successfully +- [ ] RLS policies enforce tenant isolation +- [ ] Plugin manifest validation covers all required fields +- [ ] Plugin loader can load, start, stop, and unload plugins +- [ ] Sandbox enforces resource limits +- [ ] SDK compiles to NuGet package +- [ ] Sample plugin builds and loads successfully +- [ ] Unit test coverage ≥80% +- [ ] Integration tests pass + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Phase 1 index created | +| 10-Jan-2026 | Added Phase 100 (Plugin System Unification) as prerequisite - plugin infrastructure now centralized | diff --git a/docs/implplan/SPRINT_20260110_101_001_DB_schema_core_tables.md b/docs/implplan/SPRINT_20260110_101_001_DB_schema_core_tables.md new file mode 100644 index 000000000..329a2ac65 --- /dev/null +++ b/docs/implplan/SPRINT_20260110_101_001_DB_schema_core_tables.md @@ -0,0 +1,617 @@ +# SPRINT: Database Schema - Core Tables + +> **Sprint ID:** 101_001 +> **Module:** DB +> **Phase:** 1 - Foundation +> 
**Status:** TODO
+> **Parent:** [101_000_INDEX](SPRINT_20260110_101_000_INDEX_foundation.md)
+
+---
+
+## Overview
+
+Create the PostgreSQL schema for all Release Orchestrator tables within the `release` schema. This sprint establishes the data model foundation for all subsequent modules.
+
+> **NORMATIVE:** This sprint MUST comply with [docs/db/SPECIFICATION.md](../db/SPECIFICATION.md), which defines the authoritative database design patterns for Stella Ops, including schema ownership, RLS policies, UUID generation, and JSONB conventions.
+
+### Objectives
+
+- Create `release` schema with RLS policies per SPECIFICATION.md
+- Implement core tables for all 10 platform themes
+- Add performance indexes and constraints
+- Create audit triggers for append-only tables
+- Use `require_current_tenant()` RLS helper pattern
+- Add generated columns for JSONB hot paths
+
+### Working Directory
+
+```
+src/Platform/__Libraries/StellaOps.Platform.Database/
+├── Migrations/
+│   └── Release/
+│       ├── 001_IntegrationHub.sql
+│       ├── 002_Environments.sql
+│       ├── 003_ReleaseManagement.sql
+│       ├── 004_Workflow.sql
+│       ├── 005_Promotion.sql
+│       ├── 006_Deployment.sql
+│       ├── 007_Agents.sql
+│       ├── 008_Evidence.sql
+│       └── 009_Plugin.sql
+└── ReleaseSchema/
+    ├── Tables/
+    ├── Indexes/
+    ├── Functions/
+    └── Policies/
+```
+
+---
+
+## Architecture Reference
+
+- [Data Model Schema](../modules/release-orchestrator/data-model/schema.md)
+- [Entity Definitions](../modules/release-orchestrator/data-model/entities.md)
+- [Security Overview](../modules/release-orchestrator/security/overview.md)
+
+---
+
+## Deliverables
+
+### Migration 001: Integration Hub Tables
+
+| Table | Description | Key Columns |
+|-------|-------------|-------------|
+| `release.integration_types` | Enum-like type registry | `id`, `name`, `category` |
+| `release.integrations` | Configured integrations | `id`, `tenant_id`, `type_id`, `name`, `config_encrypted` |
+| `release.integration_health_checks` | Health check history
| `id`, `integration_id`, `status`, `checked_at` | + +```sql +-- release.integration_types +CREATE TABLE release.integration_types ( + id TEXT PRIMARY KEY, + name TEXT NOT NULL, + category TEXT NOT NULL CHECK (category IN ('scm', 'ci', 'registry', 'vault', 'notify')), + description TEXT, + config_schema JSONB NOT NULL, + created_at TIMESTAMPTZ NOT NULL DEFAULT now() +); + +-- release.integrations +CREATE TABLE release.integrations ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id), + type_id TEXT NOT NULL REFERENCES release.integration_types(id), + name TEXT NOT NULL, + display_name TEXT NOT NULL, + config_encrypted BYTEA NOT NULL, + is_enabled BOOLEAN NOT NULL DEFAULT true, + health_status TEXT NOT NULL DEFAULT 'unknown', + last_health_check TIMESTAMPTZ, + created_at TIMESTAMPTZ NOT NULL DEFAULT now(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT now(), + created_by UUID NOT NULL, + UNIQUE (tenant_id, name) +); +``` + +### Migration 002: Environment Tables + +| Table | Description | Key Columns | +|-------|-------------|-------------| +| `release.environments` | Deployment environments | `id`, `tenant_id`, `name`, `order_index` | +| `release.targets` | Deployment targets | `id`, `environment_id`, `type`, `connection_config` | +| `release.freeze_windows` | Deployment freeze periods | `id`, `environment_id`, `start_at`, `end_at` | + +```sql +-- release.environments +CREATE TABLE release.environments ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id), + name TEXT NOT NULL, + display_name TEXT NOT NULL, + description TEXT, + order_index INT NOT NULL, + is_production BOOLEAN NOT NULL DEFAULT false, + required_approvals INT NOT NULL DEFAULT 0, + require_separation_of_duties BOOLEAN NOT NULL DEFAULT false, + auto_promote_from UUID REFERENCES release.environments(id), + deployment_timeout_seconds INT NOT NULL DEFAULT 600, + created_at TIMESTAMPTZ NOT NULL DEFAULT now(), + 
updated_at TIMESTAMPTZ NOT NULL DEFAULT now(), + created_by UUID NOT NULL, + UNIQUE (tenant_id, name), + UNIQUE (tenant_id, order_index) +); + +-- release.targets +CREATE TABLE release.targets ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id), + environment_id UUID NOT NULL REFERENCES release.environments(id), + name TEXT NOT NULL, + display_name TEXT NOT NULL, + type TEXT NOT NULL CHECK (type IN ('docker_host', 'compose_host', 'ecs_service', 'nomad_job')), + connection_config_encrypted BYTEA NOT NULL, + agent_id UUID, + health_status TEXT NOT NULL DEFAULT 'unknown', + last_health_check TIMESTAMPTZ, + last_sync_at TIMESTAMPTZ, + inventory_snapshot JSONB, + created_at TIMESTAMPTZ NOT NULL DEFAULT now(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT now(), + UNIQUE (tenant_id, environment_id, name) +); +``` + +### Migration 003: Release Management Tables + +| Table | Description | Key Columns | +|-------|-------------|-------------| +| `release.components` | Container components | `id`, `tenant_id`, `name`, `registry_integration_id` | +| `release.component_versions` | Version snapshots | `id`, `component_id`, `digest`, `semver` | +| `release.releases` | Release bundles | `id`, `tenant_id`, `name`, `status` | +| `release.release_components` | Release-component mapping | `release_id`, `component_version_id` | + +```sql +-- release.components +CREATE TABLE release.components ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id), + name TEXT NOT NULL, + display_name TEXT NOT NULL, + description TEXT, + registry_integration_id UUID NOT NULL REFERENCES release.integrations(id), + repository TEXT NOT NULL, + scm_integration_id UUID REFERENCES release.integrations(id), + scm_repository TEXT, + created_at TIMESTAMPTZ NOT NULL DEFAULT now(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT now(), + UNIQUE (tenant_id, name) +); + +-- release.releases +CREATE TABLE release.releases ( + id UUID 
PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id), + name TEXT NOT NULL, + display_name TEXT NOT NULL, + description TEXT, + status TEXT NOT NULL DEFAULT 'draft' CHECK (status IN ('draft', 'ready', 'promoting', 'deployed', 'deprecated')), + source_commit_sha TEXT, + source_branch TEXT, + ci_build_id TEXT, + ci_pipeline_url TEXT, + finalized_at TIMESTAMPTZ, + finalized_by UUID, + created_at TIMESTAMPTZ NOT NULL DEFAULT now(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT now(), + created_by UUID NOT NULL, + UNIQUE (tenant_id, name) +); +``` + +### Migration 004: Workflow Tables + +| Table | Description | Key Columns | +|-------|-------------|-------------| +| `release.workflow_templates` | DAG templates | `id`, `tenant_id`, `name`, `definition` | +| `release.workflow_runs` | Workflow executions | `id`, `template_id`, `status` | +| `release.workflow_steps` | Step definitions | `id`, `run_id`, `step_type`, `status` | + +```sql +-- release.workflow_templates +CREATE TABLE release.workflow_templates ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id), + name TEXT NOT NULL, + display_name TEXT NOT NULL, + description TEXT, + definition JSONB NOT NULL, + version INT NOT NULL DEFAULT 1, + is_active BOOLEAN NOT NULL DEFAULT true, + created_at TIMESTAMPTZ NOT NULL DEFAULT now(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT now(), + UNIQUE (tenant_id, name, version) +); + +-- release.workflow_runs +CREATE TABLE release.workflow_runs ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id), + template_id UUID NOT NULL REFERENCES release.workflow_templates(id), + template_version INT NOT NULL, + context_type TEXT NOT NULL, + context_id UUID NOT NULL, + status TEXT NOT NULL DEFAULT 'pending' CHECK (status IN ('pending', 'running', 'succeeded', 'failed', 'cancelled')), + started_at TIMESTAMPTZ, + completed_at TIMESTAMPTZ, + error_message TEXT, + created_at 
TIMESTAMPTZ NOT NULL DEFAULT now() +); +``` + +### Migration 005: Promotion Tables + +| Table | Description | Key Columns | +|-------|-------------|-------------| +| `release.promotions` | Promotion requests | `id`, `release_id`, `target_environment_id`, `status` | +| `release.approvals` | Approval records | `id`, `promotion_id`, `approver_id`, `decision` | +| `release.gate_results` | Gate evaluation results | `id`, `promotion_id`, `gate_type`, `passed` | + +```sql +-- release.promotions +CREATE TABLE release.promotions ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id), + release_id UUID NOT NULL REFERENCES release.releases(id), + source_environment_id UUID REFERENCES release.environments(id), + target_environment_id UUID NOT NULL REFERENCES release.environments(id), + status TEXT NOT NULL DEFAULT 'pending' CHECK (status IN ( + 'pending', 'awaiting_approval', 'approved', 'rejected', + 'deploying', 'deployed', 'failed', 'cancelled', 'rolled_back' + )), + requested_by UUID NOT NULL, + requested_at TIMESTAMPTZ NOT NULL DEFAULT now(), + request_reason TEXT, + decision TEXT CHECK (decision IN ('allow', 'block')), + decided_at TIMESTAMPTZ, + deployment_job_id UUID, + created_at TIMESTAMPTZ NOT NULL DEFAULT now(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT now() +); + +-- release.approvals (append-only) +CREATE TABLE release.approvals ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id), + promotion_id UUID NOT NULL REFERENCES release.promotions(id), + approver_id UUID NOT NULL, + decision TEXT NOT NULL CHECK (decision IN ('approved', 'rejected')), + comment TEXT, + created_at TIMESTAMPTZ NOT NULL DEFAULT now() + -- No updated_at - append only +); + +-- Prevent modifications to approvals +REVOKE UPDATE, DELETE ON release.approvals FROM app_role; +``` + +### Migration 006: Deployment Tables + +| Table | Description | Key Columns | +|-------|-------------|-------------| +| 
`release.deployment_jobs` | Deployment executions | `id`, `promotion_id`, `strategy`, `status` | +| `release.deployment_tasks` | Per-target tasks | `id`, `job_id`, `target_id`, `status` | +| `release.deployment_artifacts` | Generated artifacts | `id`, `job_id`, `type`, `storage_ref` | + +```sql +-- release.deployment_jobs +CREATE TABLE release.deployment_jobs ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id), + promotion_id UUID NOT NULL REFERENCES release.promotions(id), + strategy TEXT NOT NULL DEFAULT 'rolling' CHECK (strategy IN ('rolling', 'blue_green', 'canary', 'all_at_once')), + status TEXT NOT NULL DEFAULT 'pending' CHECK (status IN ( + 'pending', 'pulling', 'deploying', 'verifying', + 'succeeded', 'failed', 'rolling_back', 'rolled_back' + )), + started_at TIMESTAMPTZ, + completed_at TIMESTAMPTZ, + error_message TEXT, + created_at TIMESTAMPTZ NOT NULL DEFAULT now(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT now() +); + +-- release.deployment_tasks +CREATE TABLE release.deployment_tasks ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id), + job_id UUID NOT NULL REFERENCES release.deployment_jobs(id), + target_id UUID NOT NULL REFERENCES release.targets(id), + agent_id UUID, + status TEXT NOT NULL DEFAULT 'pending' CHECK (status IN ( + 'pending', 'assigned', 'pulling', 'deploying', + 'verifying', 'succeeded', 'failed' + )), + digest_deployed TEXT, + sticker_written BOOLEAN NOT NULL DEFAULT false, + started_at TIMESTAMPTZ, + completed_at TIMESTAMPTZ, + error_message TEXT, + created_at TIMESTAMPTZ NOT NULL DEFAULT now(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT now() +); +``` + +### Migration 007: Agent Tables + +| Table | Description | Key Columns | +|-------|-------------|-------------| +| `release.agents` | Registered agents | `id`, `tenant_id`, `name`, `status` | +| `release.agent_capabilities` | Agent capabilities | `agent_id`, `capability` | +| 
`release.agent_heartbeats` | Heartbeat history | `id`, `agent_id`, `received_at` | + +```sql +-- release.agents +CREATE TABLE release.agents ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id), + name TEXT NOT NULL, + display_name TEXT NOT NULL, + version TEXT NOT NULL, + status TEXT NOT NULL DEFAULT 'pending' CHECK (status IN ('pending', 'active', 'inactive', 'revoked')), + certificate_thumbprint TEXT, + certificate_expires_at TIMESTAMPTZ, + last_heartbeat_at TIMESTAMPTZ, + last_heartbeat_status JSONB, + registered_at TIMESTAMPTZ, + created_at TIMESTAMPTZ NOT NULL DEFAULT now(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT now(), + UNIQUE (tenant_id, name) +); + +-- release.agent_capabilities +CREATE TABLE release.agent_capabilities ( + agent_id UUID NOT NULL REFERENCES release.agents(id) ON DELETE CASCADE, + capability TEXT NOT NULL CHECK (capability IN ('docker', 'compose', 'ssh', 'winrm')), + config JSONB, + PRIMARY KEY (agent_id, capability) +); +``` + +### Migration 008: Evidence Tables + +| Table | Description | Key Columns | +|-------|-------------|-------------| +| `release.evidence_packets` | Signed evidence (append-only) | `id`, `promotion_id`, `type`, `content` | + +```sql +-- release.evidence_packets (append-only, immutable) +CREATE TABLE release.evidence_packets ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id), + promotion_id UUID NOT NULL REFERENCES release.promotions(id), + type TEXT NOT NULL CHECK (type IN ('release_decision', 'deployment', 'rollback', 'ab_promotion')), + version TEXT NOT NULL DEFAULT '1.0', + content JSONB NOT NULL, + content_hash TEXT NOT NULL, + signature TEXT NOT NULL, + signature_algorithm TEXT NOT NULL, + signer_key_ref TEXT NOT NULL, + generated_at TIMESTAMPTZ NOT NULL, + generator_version TEXT NOT NULL, + created_at TIMESTAMPTZ NOT NULL DEFAULT now() + -- No updated_at - packets are immutable +); + +-- Prevent modifications 
+REVOKE UPDATE, DELETE ON release.evidence_packets FROM app_role;
+
+-- Index for quick lookups
+CREATE INDEX idx_evidence_packets_promotion ON release.evidence_packets(promotion_id);
+CREATE INDEX idx_evidence_packets_type ON release.evidence_packets(tenant_id, type);
+```
+
+### Migration 009: Plugin Tables
+
+| Table | Description | Key Columns |
+|-------|-------------|-------------|
+| `release.plugins` | Registered plugins | `id`, `tenant_id`, `name`, `type` |
+| `release.plugin_versions` | Plugin versions | `id`, `plugin_id`, `version`, `manifest` |
+
+```sql
+-- release.plugins
+CREATE TABLE release.plugins (
+    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+    tenant_id UUID REFERENCES tenants(id), -- NULL for system plugins
+    name TEXT NOT NULL,
+    display_name TEXT NOT NULL,
+    description TEXT,
+    type TEXT NOT NULL CHECK (type IN ('connector', 'step', 'gate')),
+    is_builtin BOOLEAN NOT NULL DEFAULT false,
+    is_enabled BOOLEAN NOT NULL DEFAULT true,
+    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
+    updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
+);
+
+-- UNIQUE table constraints cannot contain expressions in PostgreSQL, so use a
+-- unique expression index; the COALESCE sentinel puts system plugins
+-- (tenant_id IS NULL) into a single shared name namespace
+CREATE UNIQUE INDEX uq_plugins_tenant_name
+    ON release.plugins (COALESCE(tenant_id, '00000000-0000-0000-0000-000000000000'::UUID), name);
+
+-- release.plugin_versions
+CREATE TABLE release.plugin_versions (
+    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+    plugin_id UUID NOT NULL REFERENCES release.plugins(id),
+    version TEXT NOT NULL,
+    manifest JSONB NOT NULL,
+    package_hash TEXT NOT NULL,
+    package_url TEXT,
+    is_active BOOLEAN NOT NULL DEFAULT false,
+    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
+    UNIQUE (plugin_id, version)
+);
+```
+
+### RLS Policies
+
+Following the pattern established in `docs/db/SPECIFICATION.md`, all RLS policies use the `require_current_tenant()` helper function for consistent tenant isolation.
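On the application side, `require_current_tenant()` reads the `app.current_tenant_id` session setting, so every connection must scope itself before touching `release.*` tables. A sketch of the per-transaction pattern (the UUID below is a placeholder, not a real tenant):

```sql
BEGIN;
-- SET LOCAL scopes the setting to this transaction only; it reverts on COMMIT/ROLLBACK
SET LOCAL app.current_tenant_id = '11111111-1111-1111-1111-111111111111';
SELECT id, name, status FROM release.releases;  -- RLS now filters to that tenant
COMMIT;
```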
+ +```sql +-- Create helper function per SPECIFICATION.md Section 2.3 +CREATE OR REPLACE FUNCTION release_app.require_current_tenant() +RETURNS UUID +LANGUAGE sql +STABLE +AS $$ + SELECT COALESCE( + NULLIF(current_setting('app.current_tenant_id', true), '')::UUID, + (SELECT id FROM shared.tenants WHERE is_default = true LIMIT 1) + ) +$$; + +-- Enable RLS on all tables +ALTER TABLE release.integrations ENABLE ROW LEVEL SECURITY; +ALTER TABLE release.environments ENABLE ROW LEVEL SECURITY; +ALTER TABLE release.targets ENABLE ROW LEVEL SECURITY; +ALTER TABLE release.components ENABLE ROW LEVEL SECURITY; +ALTER TABLE release.releases ENABLE ROW LEVEL SECURITY; +ALTER TABLE release.promotions ENABLE ROW LEVEL SECURITY; +ALTER TABLE release.approvals ENABLE ROW LEVEL SECURITY; +ALTER TABLE release.deployment_jobs ENABLE ROW LEVEL SECURITY; +ALTER TABLE release.deployment_tasks ENABLE ROW LEVEL SECURITY; +ALTER TABLE release.agents ENABLE ROW LEVEL SECURITY; +ALTER TABLE release.evidence_packets ENABLE ROW LEVEL SECURITY; +ALTER TABLE release.plugins ENABLE ROW LEVEL SECURITY; + +-- Standard tenant isolation policy using helper (example for integrations) +CREATE POLICY tenant_isolation ON release.integrations + USING (tenant_id = release_app.require_current_tenant()); + +-- Repeat pattern for all tenant-scoped tables +``` + +### Performance Indexes + +```sql +-- High-cardinality lookup indexes +CREATE INDEX idx_releases_tenant_status ON release.releases(tenant_id, status); +CREATE INDEX idx_promotions_tenant_status ON release.promotions(tenant_id, status); +CREATE INDEX idx_promotions_release ON release.promotions(release_id); +CREATE INDEX idx_deployment_jobs_promotion ON release.deployment_jobs(promotion_id); +CREATE INDEX idx_deployment_tasks_job ON release.deployment_tasks(job_id); +CREATE INDEX idx_agents_tenant_status ON release.agents(tenant_id, status); +CREATE INDEX idx_targets_environment ON release.targets(environment_id); +CREATE INDEX idx_targets_agent ON 
release.targets(agent_id); + +-- Partial indexes for active records +CREATE INDEX idx_promotions_pending ON release.promotions(tenant_id, target_environment_id) + WHERE status IN ('pending', 'awaiting_approval'); +CREATE INDEX idx_agents_active ON release.agents(tenant_id) + WHERE status = 'active'; +``` + +### Generated Columns for JSONB Hot Paths + +Per `docs/db/SPECIFICATION.md` Section 4.5, use generated columns for frequently-queried JSONB fields to enable efficient indexing and avoid repeated JSON parsing. + +```sql +-- Evidence packets: extract release_id for quick lookups +ALTER TABLE release.evidence_packets + ADD COLUMN release_id UUID GENERATED ALWAYS AS ( + (content->>'releaseId')::UUID + ) STORED; + +-- Evidence packets: extract what.type for filtering by evidence type +ALTER TABLE release.evidence_packets + ADD COLUMN evidence_what_type TEXT GENERATED ALWAYS AS ( + content->'what'->>'type' + ) STORED; + +-- Workflow templates: extract step count for UI display +ALTER TABLE release.workflow_templates + ADD COLUMN step_count INT GENERATED ALWAYS AS ( + jsonb_array_length(COALESCE(definition->'steps', '[]'::JSONB)) + ) STORED; + +-- Agents: extract primary capability from last heartbeat +ALTER TABLE release.agents + ADD COLUMN primary_capability TEXT GENERATED ALWAYS AS ( + last_heartbeat_status->>'primaryCapability' + ) STORED; + +-- Targets: extract deployed digest from inventory snapshot +ALTER TABLE release.targets + ADD COLUMN current_digest TEXT GENERATED ALWAYS AS ( + inventory_snapshot->>'digest' + ) STORED; + +-- Index generated columns for efficient queries +CREATE INDEX idx_evidence_packets_release ON release.evidence_packets(release_id); +CREATE INDEX idx_evidence_packets_what_type ON release.evidence_packets(tenant_id, evidence_what_type); +CREATE INDEX idx_targets_current_digest ON release.targets(current_digest) WHERE current_digest IS NOT NULL; +``` + +--- + +## Acceptance Criteria + +- [ ] All 9 migrations execute successfully in order 
+- [ ] Schema complies with docs/db/SPECIFICATION.md +- [ ] RLS policies use `require_current_tenant()` helper +- [ ] RLS policies enforce tenant isolation +- [ ] Append-only tables reject UPDATE/DELETE +- [ ] All foreign key constraints valid +- [ ] Performance indexes created +- [ ] Generated columns created for JSONB hot paths +- [ ] Schema documentation generated +- [ ] Migration rollback scripts created +- [ ] Integration tests pass with Testcontainers + +--- + +## Test Plan + +### Unit Tests + +| Test | Description | +|------|-------------| +| `MigrationOrderTest` | Verify migrations run in dependency order | +| `RlsPolicyTest` | Verify tenant isolation enforced | +| `AppendOnlyTest` | Verify UPDATE/DELETE rejected on evidence tables | +| `ForeignKeyTest` | Verify all FK constraints | + +### Integration Tests + +| Test | Description | +|------|-------------| +| `SchemaCreationTest` | Full schema creation on fresh database | +| `MigrationIdempotencyTest` | Migrations can be re-run safely | +| `PerformanceIndexTest` | Verify indexes used in common queries | + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| PostgreSQL 16+ | External | Available | +| `tenants` table | Internal | Exists | +| Testcontainers | Testing | Available | + +--- + +## Risks & Mitigations + +| Risk | Impact | Mitigation | +|------|--------|------------| +| Schema conflicts with existing tables | High | Use dedicated `release` schema | +| Migration performance on large DBs | Medium | Use concurrent index creation | +| RLS policy overhead | Low | Benchmark and optimize | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| Migration 001 - Integration Hub | TODO | | +| Migration 002 - Environments | TODO | | +| Migration 003 - Release Management | TODO | | +| Migration 004 - Workflow | TODO | | +| Migration 005 - Promotion | TODO | | +| Migration 006 - Deployment | TODO | | +| Migration 007 - Agents 
| TODO | | +| Migration 008 - Evidence | TODO | | +| Migration 009 - Plugin | TODO | | +| RLS Policies | TODO | Uses `require_current_tenant()` helper | +| Performance Indexes | TODO | | +| Generated Columns | TODO | JSONB hot paths for evidence, workflows, agents | +| Integration Tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | +| 10-Jan-2026 | Added reference to docs/db/SPECIFICATION.md as normative | +| 10-Jan-2026 | Added require_current_tenant() RLS helper pattern | +| 10-Jan-2026 | Added generated columns for JSONB hot paths (evidence_packets, workflow_templates, agents, targets) | diff --git a/docs/implplan/SPRINT_20260110_101_002_PLUGIN_registry.md b/docs/implplan/SPRINT_20260110_101_002_PLUGIN_registry.md new file mode 100644 index 000000000..1e483eead --- /dev/null +++ b/docs/implplan/SPRINT_20260110_101_002_PLUGIN_registry.md @@ -0,0 +1,938 @@ +# SPRINT: Plugin Registry Extensions for Release Orchestrator + +> **Sprint ID:** 101_002 +> **Module:** PLUGIN +> **Phase:** 1 - Foundation +> **Status:** TODO +> **Parent:** [101_000_INDEX](SPRINT_20260110_101_000_INDEX_foundation.md) +> **Prerequisites:** [100_003 Plugin Registry](SPRINT_20260110_100_003_PLUGIN_registry.md), [100_001 Plugin Abstractions](SPRINT_20260110_100_001_PLUGIN_abstractions.md) + +--- + +## Overview + +Extend the unified plugin registry (from Phase 100) with Release Orchestrator-specific capability types, including workflow step providers, promotion gate providers, and integration connectors. This sprint builds on top of the core `IPluginRegistry` infrastructure. + +> **Note:** The core plugin registry (`IPluginRegistry`, `PostgresPluginRegistry`, database schema) is implemented in Phase 100 sprint 100_003. This sprint adds Release Orchestrator domain-specific extensions. 
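+To make the division of responsibilities concrete, here is a minimal sketch of a third-party plugin contributing one promotion gate through the capability interfaces specified in the Deliverables below. The class name, gate type, and exact Phase 100 discovery contract are illustrative assumptions, not part of this specification:
+
+```csharp
+// Illustrative only: a hypothetical third-party gate provider.
+// Discovery/registration is handled by the Phase 100 plugin host; this class
+// only has to implement the capability interface defined in this sprint.
+public sealed class ChangeFreezeGatePlugin : IGateProviderCapability
+{
+    // Definition elided: would describe a "change-freeze" gate type and its config schema.
+    public IReadOnlyList<GateDefinition> GetGateDefinitions() => [];
+
+    public Task<GateResult> EvaluateGateAsync(GateEvaluationContext context, CancellationToken ct) =>
+        Task.FromResult(GateResult.Pass("No change freeze active for target environment."));
+
+    public Task<GateValidationResult> ValidateGateConfigAsync(
+        string gateType, JsonElement configuration, CancellationToken ct) =>
+        Task.FromResult(GateValidationResult.Success());
+}
+```
+
+Once loaded by the plugin host, such a gate becomes discoverable via `GateProviderRegistry` alongside built-in gates, with no changes to the orchestrator core.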
+
+### Objectives
+
+- Register Release Orchestrator capability interfaces with the plugin system
+- Define `IStepProviderCapability` for workflow steps
+- Define `IGateProviderCapability` for promotion gates
+- Define `IConnectorCapability` variants for Integration Hub
+- Create domain-specific registry queries
+- Add capability-specific validation
+
+### Working Directory
+
+```
+src/ReleaseOrchestrator/
+├── __Libraries/
+│   └── StellaOps.ReleaseOrchestrator.Plugin/
+│       ├── Capabilities/
+│       │   ├── IStepProviderCapability.cs
+│       │   ├── IGateProviderCapability.cs
+│       │   ├── IScmConnectorCapability.cs
+│       │   ├── IRegistryConnectorCapability.cs
+│       │   ├── IVaultConnectorCapability.cs
+│       │   ├── INotifyConnectorCapability.cs
+│       │   └── ICiConnectorCapability.cs
+│       ├── Registry/
+│       │   ├── ReleaseOrchestratorPluginRegistry.cs
+│       │   ├── StepProviderRegistry.cs
+│       │   ├── GateProviderRegistry.cs
+│       │   └── ConnectorRegistry.cs
+│       └── Models/
+│           ├── StepDefinition.cs
+│           ├── GateDefinition.cs
+│           └── ConnectorDefinition.cs
+└── __Tests/
+    └── StellaOps.ReleaseOrchestrator.Plugin.Tests/
+        ├── StepProviderRegistryTests.cs
+        ├── GateProviderRegistryTests.cs
+        └── ConnectorRegistryTests.cs
+```
+
+---
+
+## Architecture Reference
+
+- [Phase 100 Plugin System](SPRINT_20260110_100_000_INDEX_plugin_unification.md)
+- [Plugin System](../modules/release-orchestrator/modules/plugin-system.md)
+- [Workflow Engine](../modules/release-orchestrator/modules/workflow-engine.md)
+
+---
+
+## Deliverables
+
+### IStepProviderCapability Interface
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Plugin.Capabilities;
+
+/// <summary>
+/// Capability interface for workflow step providers.
+/// Plugins implementing this capability can provide custom workflow steps.
+/// </summary>
+public interface IStepProviderCapability
+{
+    /// <summary>
+    /// Get step definitions provided by this plugin.
+    /// </summary>
+    IReadOnlyList<StepDefinition> GetStepDefinitions();
+
+    /// <summary>
+    /// Execute a step.
+    /// </summary>
+    Task<StepResult> ExecuteStepAsync(StepExecutionContext context, CancellationToken ct);
+
+    /// <summary>
+    /// Validate step configuration before execution.
+    /// </summary>
+    Task<StepValidationResult> ValidateStepConfigAsync(
+        string stepType,
+        JsonElement configuration,
+        CancellationToken ct);
+
+    /// <summary>
+    /// Get step output schema for a step type.
+    /// </summary>
+    JsonSchema? GetOutputSchema(string stepType);
+}
+
+public sealed record StepDefinition(
+    string Type,
+    string DisplayName,
+    string Description,
+    string Category,
+    JsonSchema ConfigSchema,
+    JsonSchema OutputSchema,
+    IReadOnlyList<string> RequiredCapabilities,
+    bool SupportsRetry,
+    TimeSpan DefaultTimeout);
+
+public sealed record StepExecutionContext(
+    Guid StepId,
+    Guid WorkflowRunId,
+    Guid TenantId,
+    string StepType,
+    JsonElement Configuration,
+    IReadOnlyDictionary<string, JsonElement> Inputs,
+    IStepOutputWriter OutputWriter,
+    IPluginLogger Logger);
+
+public sealed record StepResult(
+    StepStatus Status,
+    IReadOnlyDictionary<string, JsonElement> Outputs,
+    string? ErrorMessage = null,
+    TimeSpan? Duration = null);
+
+public enum StepStatus
+{
+    Succeeded,
+    Failed,
+    Skipped,
+    TimedOut
+}
+
+public sealed record StepValidationResult(
+    bool IsValid,
+    IReadOnlyList<string> Errors)
+{
+    public static StepValidationResult Success() => new(true, []);
+    public static StepValidationResult Failure(params string[] errors) => new(false, errors);
+}
+```
+
+### IGateProviderCapability Interface
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Plugin.Capabilities;
+
+/// <summary>
+/// Capability interface for promotion gate providers.
+/// Plugins implementing this capability can provide custom promotion gates.
+/// </summary>
+public interface IGateProviderCapability
+{
+    /// <summary>
+    /// Get gate definitions provided by this plugin.
+    /// </summary>
+    IReadOnlyList<GateDefinition> GetGateDefinitions();
+
+    /// <summary>
+    /// Evaluate a gate for a promotion.
+    /// </summary>
+    Task<GateResult> EvaluateGateAsync(GateEvaluationContext context, CancellationToken ct);
+
+    /// <summary>
+    /// Validate gate configuration.
+    /// </summary>
+    Task<GateValidationResult> ValidateGateConfigAsync(
+        string gateType,
+        JsonElement configuration,
+        CancellationToken ct);
+}
+
+public sealed record GateDefinition(
+    string Type,
+    string DisplayName,
+    string Description,
+    string Category,
+    JsonSchema ConfigSchema,
+    bool IsBlocking,
+    bool SupportsOverride,
+    IReadOnlyList<string> RequiredPermissions);
+
+public sealed record GateEvaluationContext(
+    Guid GateId,
+    Guid PromotionId,
+    Guid ReleaseId,
+    Guid SourceEnvironmentId,
+    Guid TargetEnvironmentId,
+    Guid TenantId,
+    string GateType,
+    JsonElement Configuration,
+    ReleaseInfo Release,
+    EnvironmentInfo TargetEnvironment,
+    IPluginLogger Logger);
+
+public sealed record GateResult(
+    GateStatus Status,
+    string Message,
+    IReadOnlyDictionary<string, object> Details,
+    IReadOnlyList<GateEvidence>? Evidence = null)
+{
+    public static GateResult Pass(string message, IReadOnlyDictionary<string, object>? details = null) =>
+        new(GateStatus.Passed, message, details ?? new Dictionary<string, object>());
+
+    public static GateResult Fail(string message, IReadOnlyDictionary<string, object>? details = null) =>
+        new(GateStatus.Failed, message, details ?? new Dictionary<string, object>());
+
+    public static GateResult Warn(string message, IReadOnlyDictionary<string, object>? details = null) =>
+        new(GateStatus.Warning, message, details ?? new Dictionary<string, object>());
+
+    public static GateResult Pending(string message) =>
+        new(GateStatus.Pending, message, new Dictionary<string, object>());
+}
+
+public enum GateStatus
+{
+    Passed,
+    Failed,
+    Warning,
+    Pending,
+    Skipped
+}
+
+public sealed record GateEvidence(
+    string Type,
+    string Description,
+    JsonElement Data);
+
+public sealed record GateValidationResult(
+    bool IsValid,
+    IReadOnlyList<string> Errors)
+{
+    public static GateValidationResult Success() => new(true, []);
+    public static GateValidationResult Failure(params string[] errors) => new(false, errors);
+}
+```
+
+### Integration Connector Capability Interfaces
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Plugin.Capabilities;
+
+/// 
+/// Base interface for all Integration Hub connectors.
+/// +public interface IIntegrationConnectorCapability +{ + /// + /// Connector category (SCM, CI, Registry, Vault, Notify). + /// + ConnectorCategory Category { get; } + + /// + /// Connector type identifier. + /// + string ConnectorType { get; } + + /// + /// Human-readable display name. + /// + string DisplayName { get; } + + /// + /// Validate connector configuration. + /// + Task ValidateConfigAsync( + JsonElement config, + CancellationToken ct); + + /// + /// Test connection with current configuration. + /// + Task TestConnectionAsync( + ConnectorContext context, + CancellationToken ct); + + /// + /// Get connector capabilities. + /// + IReadOnlyList GetSupportedOperations(); +} + +public enum ConnectorCategory +{ + Scm, + Ci, + Registry, + Vault, + Notify +} + +public sealed record ConnectorContext( + Guid IntegrationId, + Guid TenantId, + JsonElement Configuration, + ISecretResolver SecretResolver, + IPluginLogger Logger); + +/// +/// Extended interface for SCM connectors in Release Orchestrator context. +/// Extends the base IScmCapability from Phase 100 with Release Orchestrator-specific operations. +/// +public interface IScmConnectorCapability : IIntegrationConnectorCapability +{ + /// + /// List repositories accessible by this integration. + /// + Task> ListRepositoriesAsync( + ConnectorContext context, + string? searchPattern = null, + CancellationToken ct = default); + + /// + /// Get commit information. + /// + Task GetCommitAsync( + ConnectorContext context, + string repository, + string commitSha, + CancellationToken ct = default); + + /// + /// Create a webhook for repository events. + /// + Task CreateWebhookAsync( + ConnectorContext context, + string repository, + IReadOnlyList events, + string callbackUrl, + CancellationToken ct = default); + + /// + /// Get release/tag information. 
+ /// + Task> ListReleasesAsync( + ConnectorContext context, + string repository, + CancellationToken ct = default); +} + +/// +/// Extended interface for container registry connectors. +/// +public interface IRegistryConnectorCapability : IIntegrationConnectorCapability +{ + /// + /// List repositories in the registry. + /// + Task> ListRepositoriesAsync( + ConnectorContext context, + string? prefix = null, + CancellationToken ct = default); + + /// + /// List tags for a repository. + /// + Task> ListTagsAsync( + ConnectorContext context, + string repository, + CancellationToken ct = default); + + /// + /// Resolve a tag to its digest. + /// + Task ResolveTagAsync( + ConnectorContext context, + string repository, + string tag, + CancellationToken ct = default); + + /// + /// Get image manifest. + /// + Task GetManifestAsync( + ConnectorContext context, + string repository, + string reference, + CancellationToken ct = default); + + /// + /// Generate pull credentials for an image. + /// + Task GetPullCredentialsAsync( + ConnectorContext context, + string repository, + CancellationToken ct = default); +} + +/// +/// Extended interface for vault/secrets connectors. +/// +public interface IVaultConnectorCapability : IIntegrationConnectorCapability +{ + /// + /// Get a secret value. + /// + Task GetSecretAsync( + ConnectorContext context, + string path, + CancellationToken ct = default); + + /// + /// List secrets at a path. + /// + Task> ListSecretsAsync( + ConnectorContext context, + string path, + CancellationToken ct = default); +} + +/// +/// Extended interface for notification connectors. +/// +public interface INotifyConnectorCapability : IIntegrationConnectorCapability +{ + /// + /// Send a notification. + /// + Task SendNotificationAsync( + ConnectorContext context, + Notification notification, + CancellationToken ct = default); + + /// + /// Get supported notification channels. 
+ /// + IReadOnlyList GetSupportedChannels(); +} + +/// +/// Extended interface for CI/CD system connectors. +/// +public interface ICiConnectorCapability : IIntegrationConnectorCapability +{ + /// + /// Trigger a pipeline/workflow. + /// + Task TriggerPipelineAsync( + ConnectorContext context, + PipelineTriggerRequest request, + CancellationToken ct = default); + + /// + /// Get pipeline status. + /// + Task GetPipelineStatusAsync( + ConnectorContext context, + string pipelineId, + CancellationToken ct = default); + + /// + /// List available pipelines. + /// + Task> ListPipelinesAsync( + ConnectorContext context, + string? repository = null, + CancellationToken ct = default); +} +``` + +### Step Provider Registry + +```csharp +namespace StellaOps.ReleaseOrchestrator.Plugin.Registry; + +/// +/// Registry for discovering and querying step providers. +/// Builds on top of the unified plugin registry from Phase 100. +/// +public interface IStepProviderRegistry +{ + /// + /// Get all registered step definitions. + /// + Task> GetAllStepsAsync(CancellationToken ct = default); + + /// + /// Get steps by category. + /// + Task> GetStepsByCategoryAsync( + string category, + CancellationToken ct = default); + + /// + /// Get a specific step definition. + /// + Task GetStepAsync(string stepType, CancellationToken ct = default); + + /// + /// Get the plugin that provides a step. + /// + Task GetStepProviderPluginAsync(string stepType, CancellationToken ct = default); + + /// + /// Execute a step using its provider. + /// + Task ExecuteStepAsync( + string stepType, + StepExecutionContext context, + CancellationToken ct = default); +} + +public sealed record RegisteredStep( + StepDefinition Definition, + string PluginId, + string PluginVersion, + bool IsBuiltIn); + +/// +/// Implementation that queries the unified plugin registry for step providers. 
+/// +public sealed class StepProviderRegistry : IStepProviderRegistry +{ + private readonly IPluginHost _pluginHost; + private readonly IPluginRegistry _pluginRegistry; + private readonly ILogger _logger; + + public StepProviderRegistry( + IPluginHost pluginHost, + IPluginRegistry pluginRegistry, + ILogger logger) + { + _pluginHost = pluginHost; + _pluginRegistry = pluginRegistry; + _logger = logger; + } + + public async Task> GetAllStepsAsync(CancellationToken ct = default) + { + var steps = new List(); + + // Query plugins with WorkflowStep capability + var stepProviders = await _pluginRegistry.QueryByCapabilityAsync( + PluginCapabilities.WorkflowStep, ct); + + foreach (var pluginInfo in stepProviders) + { + var plugin = _pluginHost.GetPlugin(pluginInfo.Id); + if (plugin is IStepProviderCapability stepProvider) + { + var definitions = stepProvider.GetStepDefinitions(); + foreach (var def in definitions) + { + steps.Add(new RegisteredStep( + Definition: def, + PluginId: pluginInfo.Id, + PluginVersion: pluginInfo.Version, + IsBuiltIn: plugin.TrustLevel == PluginTrustLevel.BuiltIn)); + } + } + } + + return steps; + } + + public async Task> GetStepsByCategoryAsync( + string category, + CancellationToken ct = default) + { + var allSteps = await GetAllStepsAsync(ct); + return allSteps.Where(s => + s.Definition.Category.Equals(category, StringComparison.OrdinalIgnoreCase)) + .ToList(); + } + + public async Task GetStepAsync(string stepType, CancellationToken ct = default) + { + var allSteps = await GetAllStepsAsync(ct); + return allSteps.FirstOrDefault(s => + s.Definition.Type.Equals(stepType, StringComparison.OrdinalIgnoreCase)); + } + + public async Task GetStepProviderPluginAsync(string stepType, CancellationToken ct = default) + { + var step = await GetStepAsync(stepType, ct); + if (step == null) return null; + + return _pluginHost.GetPlugin(step.PluginId); + } + + public async Task ExecuteStepAsync( + string stepType, + StepExecutionContext context, + 
CancellationToken ct = default) + { + var plugin = await GetStepProviderPluginAsync(stepType, ct); + if (plugin is not IStepProviderCapability stepProvider) + { + throw new InvalidOperationException($"No step provider found for type: {stepType}"); + } + + return await stepProvider.ExecuteStepAsync(context, ct); + } +} +``` + +### Gate Provider Registry + +```csharp +namespace StellaOps.ReleaseOrchestrator.Plugin.Registry; + +/// +/// Registry for discovering and querying gate providers. +/// +public interface IGateProviderRegistry +{ + /// + /// Get all registered gate definitions. + /// + Task> GetAllGatesAsync(CancellationToken ct = default); + + /// + /// Get gates by category. + /// + Task> GetGatesByCategoryAsync( + string category, + CancellationToken ct = default); + + /// + /// Get a specific gate definition. + /// + Task GetGateAsync(string gateType, CancellationToken ct = default); + + /// + /// Get the plugin that provides a gate. + /// + Task GetGateProviderPluginAsync(string gateType, CancellationToken ct = default); + + /// + /// Evaluate a gate using its provider. + /// + Task EvaluateGateAsync( + string gateType, + GateEvaluationContext context, + CancellationToken ct = default); +} + +public sealed record RegisteredGate( + GateDefinition Definition, + string PluginId, + string PluginVersion, + bool IsBuiltIn); + +/// +/// Implementation that queries the unified plugin registry for gate providers. 
+/// +public sealed class GateProviderRegistry : IGateProviderRegistry +{ + private readonly IPluginHost _pluginHost; + private readonly IPluginRegistry _pluginRegistry; + private readonly ILogger _logger; + + public GateProviderRegistry( + IPluginHost pluginHost, + IPluginRegistry pluginRegistry, + ILogger logger) + { + _pluginHost = pluginHost; + _pluginRegistry = pluginRegistry; + _logger = logger; + } + + public async Task> GetAllGatesAsync(CancellationToken ct = default) + { + var gates = new List(); + + var gateProviders = await _pluginRegistry.QueryByCapabilityAsync( + PluginCapabilities.PromotionGate, ct); + + foreach (var pluginInfo in gateProviders) + { + var plugin = _pluginHost.GetPlugin(pluginInfo.Id); + if (plugin is IGateProviderCapability gateProvider) + { + var definitions = gateProvider.GetGateDefinitions(); + foreach (var def in definitions) + { + gates.Add(new RegisteredGate( + Definition: def, + PluginId: pluginInfo.Id, + PluginVersion: pluginInfo.Version, + IsBuiltIn: plugin.TrustLevel == PluginTrustLevel.BuiltIn)); + } + } + } + + return gates; + } + + public async Task> GetGatesByCategoryAsync( + string category, + CancellationToken ct = default) + { + var allGates = await GetAllGatesAsync(ct); + return allGates.Where(g => + g.Definition.Category.Equals(category, StringComparison.OrdinalIgnoreCase)) + .ToList(); + } + + public async Task GetGateAsync(string gateType, CancellationToken ct = default) + { + var allGates = await GetAllGatesAsync(ct); + return allGates.FirstOrDefault(g => + g.Definition.Type.Equals(gateType, StringComparison.OrdinalIgnoreCase)); + } + + public async Task GetGateProviderPluginAsync(string gateType, CancellationToken ct = default) + { + var gate = await GetGateAsync(gateType, ct); + if (gate == null) return null; + + return _pluginHost.GetPlugin(gate.PluginId); + } + + public async Task EvaluateGateAsync( + string gateType, + GateEvaluationContext context, + CancellationToken ct = default) + { + var plugin = await 
GetGateProviderPluginAsync(gateType, ct); + if (plugin is not IGateProviderCapability gateProvider) + { + throw new InvalidOperationException($"No gate provider found for type: {gateType}"); + } + + return await gateProvider.EvaluateGateAsync(context, ct); + } +} +``` + +### Connector Registry + +```csharp +namespace StellaOps.ReleaseOrchestrator.Plugin.Registry; + +/// +/// Registry for discovering and querying integration connectors. +/// +public interface IConnectorRegistry +{ + /// + /// Get all registered connectors. + /// + Task> GetAllConnectorsAsync(CancellationToken ct = default); + + /// + /// Get connectors by category. + /// + Task> GetConnectorsByCategoryAsync( + ConnectorCategory category, + CancellationToken ct = default); + + /// + /// Get a specific connector. + /// + Task GetConnectorAsync(string connectorType, CancellationToken ct = default); + + /// + /// Get the plugin that provides a connector. + /// + Task GetConnectorPluginAsync(string connectorType, CancellationToken ct = default); +} + +public sealed record RegisteredConnector( + string Type, + string DisplayName, + ConnectorCategory Category, + string PluginId, + string PluginVersion, + IReadOnlyList SupportedOperations, + bool IsBuiltIn); + +/// +/// Implementation that queries the unified plugin registry for connectors. 
+/// +public sealed class ConnectorRegistry : IConnectorRegistry +{ + private readonly IPluginHost _pluginHost; + private readonly IPluginRegistry _pluginRegistry; + private readonly ILogger _logger; + + private static readonly Dictionary CategoryToCapability = new() + { + [ConnectorCategory.Scm] = PluginCapabilities.Scm, + [ConnectorCategory.Ci] = PluginCapabilities.Ci, + [ConnectorCategory.Registry] = PluginCapabilities.ContainerRegistry, + [ConnectorCategory.Vault] = PluginCapabilities.SecretsVault, + [ConnectorCategory.Notify] = PluginCapabilities.Notification + }; + + public ConnectorRegistry( + IPluginHost pluginHost, + IPluginRegistry pluginRegistry, + ILogger logger) + { + _pluginHost = pluginHost; + _pluginRegistry = pluginRegistry; + _logger = logger; + } + + public async Task> GetAllConnectorsAsync(CancellationToken ct = default) + { + var connectors = new List(); + + foreach (var (category, capability) in CategoryToCapability) + { + var categoryConnectors = await GetConnectorsByCategoryAsync(category, ct); + connectors.AddRange(categoryConnectors); + } + + return connectors; + } + + public async Task> GetConnectorsByCategoryAsync( + ConnectorCategory category, + CancellationToken ct = default) + { + var connectors = new List(); + + if (!CategoryToCapability.TryGetValue(category, out var capability)) + return connectors; + + var plugins = await _pluginRegistry.QueryByCapabilityAsync(capability, ct); + + foreach (var pluginInfo in plugins) + { + var plugin = _pluginHost.GetPlugin(pluginInfo.Id); + if (plugin is IIntegrationConnectorCapability connector) + { + connectors.Add(new RegisteredConnector( + Type: connector.ConnectorType, + DisplayName: connector.DisplayName, + Category: connector.Category, + PluginId: pluginInfo.Id, + PluginVersion: pluginInfo.Version, + SupportedOperations: connector.GetSupportedOperations(), + IsBuiltIn: plugin.TrustLevel == PluginTrustLevel.BuiltIn)); + } + } + + return connectors; + } + + public async Task 
GetConnectorAsync(string connectorType, CancellationToken ct = default) + { + var all = await GetAllConnectorsAsync(ct); + return all.FirstOrDefault(c => + c.Type.Equals(connectorType, StringComparison.OrdinalIgnoreCase)); + } + + public async Task GetConnectorPluginAsync(string connectorType, CancellationToken ct = default) + { + var connector = await GetConnectorAsync(connectorType, ct); + if (connector == null) return null; + + return _pluginHost.GetPlugin(connector.PluginId); + } +} +``` + +--- + +## Acceptance Criteria + +- [ ] `IStepProviderCapability` interface defined with full step lifecycle +- [ ] `IGateProviderCapability` interface defined with gate evaluation +- [ ] Integration connector interfaces defined for all categories (SCM, CI, Registry, Vault, Notify) +- [ ] `StepProviderRegistry` queries plugins from unified registry +- [ ] `GateProviderRegistry` queries plugins from unified registry +- [ ] `ConnectorRegistry` queries plugins from unified registry +- [ ] Step execution routing works through registry +- [ ] Gate evaluation routing works through registry +- [ ] Unit test coverage >= 90% + +--- + +## Test Plan + +### Unit Tests + +| Test | Description | +|------|-------------| +| `StepProviderRegistry_ReturnsStepsFromPlugins` | Registry queries plugins correctly | +| `GateProviderRegistry_ReturnsGatesFromPlugins` | Registry queries plugins correctly | +| `ConnectorRegistry_FiltersByCategory` | Category filtering works | +| `StepExecution_RoutesToCorrectPlugin` | Step execution routing | +| `GateEvaluation_RoutesToCorrectPlugin` | Gate evaluation routing | + +### Integration Tests + +| Test | Description | +|------|-------------| +| `BuiltInStepsAvailable` | Built-in steps discoverable | +| `BuiltInGatesAvailable` | Built-in gates discoverable | +| `ThirdPartyPluginIntegration` | Third-party plugins integrate | + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 100_001 Plugin Abstractions | Internal | 
TODO | +| 100_002 Plugin Host | Internal | TODO | +| 100_003 Plugin Registry | Internal | TODO | +| 101_001 Database Schema | Internal | TODO | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| IStepProviderCapability interface | TODO | | +| IGateProviderCapability interface | TODO | | +| IScmConnectorCapability interface | TODO | | +| IRegistryConnectorCapability interface | TODO | | +| IVaultConnectorCapability interface | TODO | | +| INotifyConnectorCapability interface | TODO | | +| ICiConnectorCapability interface | TODO | | +| StepProviderRegistry | TODO | | +| GateProviderRegistry | TODO | | +| ConnectorRegistry | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | +| 10-Jan-2026 | Refocused on Release Orchestrator-specific extensions (builds on Phase 100 core) | diff --git a/docs/implplan/SPRINT_20260110_101_003_PLUGIN_loader_sandbox.md b/docs/implplan/SPRINT_20260110_101_003_PLUGIN_loader_sandbox.md new file mode 100644 index 000000000..beed25d1b --- /dev/null +++ b/docs/implplan/SPRINT_20260110_101_003_PLUGIN_loader_sandbox.md @@ -0,0 +1,935 @@ +# SPRINT: Plugin Loader & Sandbox Extensions for Release Orchestrator + +> **Sprint ID:** 101_003 +> **Module:** PLUGIN +> **Phase:** 1 - Foundation +> **Status:** TODO +> **Parent:** [101_000_INDEX](SPRINT_20260110_101_000_INDEX_foundation.md) +> **Prerequisites:** [100_002 Plugin Host](SPRINT_20260110_100_002_PLUGIN_host.md), [100_004 Plugin Sandbox](SPRINT_20260110_100_004_PLUGIN_sandbox.md), [101_002 Registry Extensions](SPRINT_20260110_101_002_PLUGIN_registry.md) + +--- + +## Overview + +Extend the unified plugin host and sandbox (from Phase 100) with Release Orchestrator-specific execution contexts, service integrations, and domain-specific lifecycle management. This sprint builds on the core plugin infrastructure to add release orchestration capabilities. 
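+A recurring concern across these execution contexts is honoring per-step and per-gate timeouts while still respecting the caller's cancellation token. A minimal sketch of the pattern (an assumption for illustration, not prescribed by the spec; the helper name is hypothetical):
+
+```csharp
+// Sketch: run a plugin call under a timeout linked to the caller's token.
+// A timeout cancels only the linked token, so we can distinguish
+// "caller cancelled" from "step timed out".
+public static async Task<T> WithTimeoutAsync<T>(
+    Func<CancellationToken, Task<T>> action, TimeSpan timeout, CancellationToken ct)
+{
+    using var linked = CancellationTokenSource.CreateLinkedTokenSource(ct);
+    linked.CancelAfter(timeout);
+    try
+    {
+        return await action(linked.Token);
+    }
+    catch (OperationCanceledException) when (!ct.IsCancellationRequested)
+    {
+        throw new TimeoutException($"Step did not complete within {timeout}.");
+    }
+}
+```
+
+The executors below can wrap each `ExecuteStepAsync`/`EvaluateGateAsync` call in this pattern so a misbehaving plugin cannot stall a workflow run indefinitely.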
+ +> **Note:** The core plugin host (`IPluginHost`, `PluginHost`, lifecycle management) and sandbox infrastructure (`ISandbox`, `ProcessSandbox`, resource limits) are implemented in Phase 100 sprints 100_002 and 100_004. This sprint adds Release Orchestrator domain-specific extensions. + +### Objectives + +- Create Release Orchestrator plugin context extensions +- Implement step execution context with workflow integration +- Implement gate evaluation context with promotion integration +- Add connector context with tenant-aware secret resolution +- Integrate with Release Orchestrator services (secrets, evidence, notifications) +- Add domain-specific health monitoring + +### Working Directory + +``` +src/ReleaseOrchestrator/ +├── __Libraries/ +│ └── StellaOps.ReleaseOrchestrator.Plugin/ +│ ├── Context/ +│ │ ├── ReleaseOrchestratorPluginContext.cs +│ │ ├── StepExecutionContextBuilder.cs +│ │ ├── GateEvaluationContextBuilder.cs +│ │ └── ConnectorContextBuilder.cs +│ ├── Integration/ +│ │ ├── TenantSecretResolver.cs +│ │ ├── EvidenceCollector.cs +│ │ ├── NotificationBridge.cs +│ │ └── AuditLogger.cs +│ ├── Execution/ +│ │ ├── StepExecutor.cs +│ │ ├── GateEvaluator.cs +│ │ └── ConnectorInvoker.cs +│ └── Monitoring/ +│ ├── ReleaseOrchestratorPluginMonitor.cs +│ └── PluginMetricsCollector.cs +└── __Tests/ + └── StellaOps.ReleaseOrchestrator.Plugin.Tests/ + ├── StepExecutorTests.cs + ├── GateEvaluatorTests.cs + └── ConnectorInvokerTests.cs +``` + +--- + +## Architecture Reference + +- [Phase 100 Plugin System](SPRINT_20260110_100_000_INDEX_plugin_unification.md) +- [Workflow Engine](../modules/release-orchestrator/modules/workflow-engine.md) +- [Promotion Gates](../modules/release-orchestrator/modules/promotion-gates.md) + +--- + +## Deliverables + +### Release Orchestrator Plugin Context Extensions + +```csharp +namespace StellaOps.ReleaseOrchestrator.Plugin.Context; + +/// +/// Extended plugin context for Release Orchestrator with domain-specific services. 
+/// Wraps the base IPluginContext from Phase 100. +/// +public sealed class ReleaseOrchestratorPluginContext : IPluginContext +{ + private readonly IPluginContext _baseContext; + private readonly ITenantSecretResolver _secretResolver; + private readonly IEvidenceCollector _evidenceCollector; + private readonly INotificationBridge _notificationBridge; + private readonly IAuditLogger _auditLogger; + + public ReleaseOrchestratorPluginContext( + IPluginContext baseContext, + ITenantSecretResolver secretResolver, + IEvidenceCollector evidenceCollector, + INotificationBridge notificationBridge, + IAuditLogger auditLogger) + { + _baseContext = baseContext; + _secretResolver = secretResolver; + _evidenceCollector = evidenceCollector; + _notificationBridge = notificationBridge; + _auditLogger = auditLogger; + } + + // Delegate to base context + public IPluginConfiguration Configuration => _baseContext.Configuration; + public IPluginLogger Logger => _baseContext.Logger; + public TimeProvider TimeProvider => _baseContext.TimeProvider; + public IHttpClientFactory HttpClientFactory => _baseContext.HttpClientFactory; + public IGuidGenerator GuidGenerator => _baseContext.GuidGenerator; + + // Release Orchestrator-specific services + public ITenantSecretResolver SecretResolver => _secretResolver; + public IEvidenceCollector EvidenceCollector => _evidenceCollector; + public INotificationBridge NotificationBridge => _notificationBridge; + public IAuditLogger AuditLogger => _auditLogger; + + /// + /// Create a scoped context for a specific tenant. 
+    /// </summary>
+    public ReleaseOrchestratorPluginContext ForTenant(Guid tenantId)
+    {
+        return new ReleaseOrchestratorPluginContext(
+            _baseContext,
+            _secretResolver.ForTenant(tenantId),
+            _evidenceCollector.ForTenant(tenantId),
+            _notificationBridge.ForTenant(tenantId),
+            _auditLogger.ForTenant(tenantId));
+    }
+}
+```
+
+### Tenant-Aware Secret Resolution
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Plugin.Integration;
+
+/// <summary>
+/// Resolves secrets with tenant isolation and vault connector integration.
+/// </summary>
+public interface ITenantSecretResolver : ISecretResolver
+{
+    /// <summary>
+    /// Create a resolver scoped to a specific tenant.
+    /// </summary>
+    ITenantSecretResolver ForTenant(Guid tenantId);
+
+    /// <summary>
+    /// Resolve a secret using a specific vault integration.
+    /// </summary>
+    Task<string?> ResolveFromVaultAsync(
+        Guid integrationId,
+        string secretPath,
+        CancellationToken ct = default);
+
+    /// <summary>
+    /// Resolve secret references in configuration.
+    /// Handles patterns like ${vault:integration-id/path/to/secret}
+    /// </summary>
+    Task<JsonElement> ResolveConfigurationSecretsAsync(
+        JsonElement configuration,
+        CancellationToken ct = default);
+}
+
+public sealed class TenantSecretResolver : ITenantSecretResolver
+{
+    private readonly IConnectorRegistry _connectorRegistry;
+    private readonly IPluginHost _pluginHost;
+    private readonly ILogger<TenantSecretResolver> _logger;
+    private Guid? _tenantId;
+
+    public TenantSecretResolver(
+        IConnectorRegistry connectorRegistry,
+        IPluginHost pluginHost,
+        ILogger<TenantSecretResolver> logger)
+    {
+        _connectorRegistry = connectorRegistry;
+        _pluginHost = pluginHost;
+        _logger = logger;
+    }
+
+    public ITenantSecretResolver ForTenant(Guid tenantId)
+    {
+        return new TenantSecretResolver(_connectorRegistry, _pluginHost, _logger)
+        {
+            _tenantId = tenantId
+        };
+    }
+
+    public Task<string?> ResolveAsync(string key, CancellationToken ct = default)
+    {
+        // First try environment variables
+        var envValue = Environment.GetEnvironmentVariable(key);
+        if (envValue != null) return Task.FromResult<string?>(envValue);
+
+        // Then try the configured secrets store;
+        // implementation depends on deployment configuration.
+        return Task.FromResult<string?>(null);
+    }
+
+    public async Task<string?> ResolveFromVaultAsync(
+        Guid integrationId,
+        string secretPath,
+        CancellationToken ct = default)
+    {
+        if (_tenantId == null)
+            throw new InvalidOperationException("Tenant ID not set. Call ForTenant first.");
+
+        // Find vault connector for this integration
+        var connector = await _connectorRegistry.GetConnectorAsync("vault", ct);
+        if (connector == null)
+        {
+            _logger.LogWarning("No vault connector found for integration {IntegrationId}", integrationId);
+            return null;
+        }
+
+        var plugin = await _connectorRegistry.GetConnectorPluginAsync(connector.Type, ct);
+        if (plugin is not IVaultConnectorCapability vaultConnector)
+        {
+            _logger.LogWarning("Connector is not a vault connector");
+            return null;
+        }
+
+        var context = new ConnectorContext(
+            IntegrationId: integrationId,
+            TenantId: _tenantId.Value,
+            Configuration: JsonDocument.Parse("{}").RootElement, // Loaded from DB
+            SecretResolver: this,
+            Logger: new PluginLoggerAdapter(_logger));
+
+        var secret = await vaultConnector.GetSecretAsync(context, secretPath, ct);
+        return secret?.Value;
+    }
+
+    public async Task<JsonElement> ResolveConfigurationSecretsAsync(
+        JsonElement configuration,
+        CancellationToken ct = default)
+    {
+        // Parse and resolve ${vault:...} patterns in configuration
+        var json = configuration.GetRawText();
+        var pattern = new Regex(@"\$\{vault:([^/]+)/([^}]+)\}");
+
+        var matches = pattern.Matches(json);
+        foreach (Match match in matches)
+        {
+            var integrationId = Guid.Parse(match.Groups[1].Value);
+            var secretPath = match.Groups[2].Value;
+
+            var secretValue = await ResolveFromVaultAsync(integrationId, secretPath, ct);
+            if (secretValue != null)
+            {
+                // JSON-escape the secret before splicing it into the raw text;
+                // unescaped quotes or backslashes would corrupt the document.
+                json = json.Replace(match.Value, JsonEncodedText.Encode(secretValue).ToString());
+            }
+        }
+
+        return JsonDocument.Parse(json).RootElement;
+    }
+}
+```
+
+### Step Executor
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Plugin.Execution;
+
+/// <summary>
+/// Executes workflow steps with full context integration.
+/// </summary>
+public interface IStepExecutor
+{
+    /// <summary>
+    /// Execute a step in the context of a workflow run.
+    /// </summary>
+    Task<StepExecutionResult> ExecuteAsync(
+        StepExecutionRequest request,
+        CancellationToken ct = default);
+}
+
+public sealed record StepExecutionRequest(
+    Guid StepId,
+    Guid WorkflowRunId,
+    Guid TenantId,
+    string StepType,
+    JsonElement Configuration,
+    IReadOnlyDictionary<string, object> Inputs,
+    TimeSpan Timeout);
+
+public sealed record StepExecutionResult(
+    StepStatus Status,
+    IReadOnlyDictionary<string, object> Outputs,
+    TimeSpan Duration,
+    string? ErrorMessage,
+    IReadOnlyList<StepLogEntry> Logs,
+    EvidencePacket? Evidence);
+
+public sealed record StepLogEntry(
+    DateTimeOffset Timestamp,
+    LogLevel Level,
+    string Message);
+
+public sealed class StepExecutor : IStepExecutor
+{
+    private readonly IStepProviderRegistry _stepRegistry;
+    private readonly ITenantSecretResolver _secretResolver;
+    private readonly IEvidenceCollector _evidenceCollector;
+    private readonly IAuditLogger _auditLogger;
+    private readonly ILogger<StepExecutor> _logger;
+    private readonly TimeProvider _timeProvider;
+
+    public StepExecutor(
+        IStepProviderRegistry stepRegistry,
+        ITenantSecretResolver secretResolver,
+        IEvidenceCollector evidenceCollector,
+        IAuditLogger auditLogger,
+        ILogger<StepExecutor> logger,
+        TimeProvider timeProvider)
+    {
+        _stepRegistry = stepRegistry;
+        _secretResolver = secretResolver;
+        _evidenceCollector = evidenceCollector;
+        _auditLogger = auditLogger;
+        _logger = logger;
+        _timeProvider = timeProvider;
+    }
+
+    public async Task<StepExecutionResult> ExecuteAsync(
+        StepExecutionRequest request,
+        CancellationToken ct = default)
+    {
+        var startTime = _timeProvider.GetUtcNow();
+        var logs = new List<StepLogEntry>();
+        var outputWriter = new BufferedStepOutputWriter();
+
+        // Resolve secrets in configuration
+        var resolvedConfig = await _secretResolver
+            .ForTenant(request.TenantId)
+            .ResolveConfigurationSecretsAsync(request.Configuration, ct);
+
+        // Create execution context
+        var context = new StepExecutionContext(
+            StepId: request.StepId,
+            WorkflowRunId: request.WorkflowRunId,
+            TenantId: request.TenantId,
+            StepType: request.StepType,
+            Configuration: resolvedConfig,
+            Inputs: request.Inputs,
+            OutputWriter: outputWriter,
+            Logger: new StepLogger(logs, _logger));
+
+        // Log execution start
+        await _auditLogger.LogAsync(new AuditEntry(
+            EventType: "step.execution.started",
+            TenantId: request.TenantId,
+            ResourceType: "workflow_step",
+            ResourceId: request.StepId.ToString(),
+            Details: new Dictionary<string, object>
+            {
+                ["stepType"] = request.StepType,
+                ["workflowRunId"] = request.WorkflowRunId
+            }));
+
+        try
+        {
+            // Execute with timeout
+            
using var cts = CancellationTokenSource.CreateLinkedTokenSource(ct); + cts.CancelAfter(request.Timeout); + + var result = await _stepRegistry.ExecuteStepAsync( + request.StepType, + context, + cts.Token); + + var duration = _timeProvider.GetUtcNow() - startTime; + + // Collect evidence if step succeeded + EvidencePacket? evidence = null; + if (result.Status == StepStatus.Succeeded) + { + evidence = await _evidenceCollector + .ForTenant(request.TenantId) + .CollectStepEvidenceAsync(request.StepId, result, ct); + } + + // Log execution completion + await _auditLogger.LogAsync(new AuditEntry( + EventType: "step.execution.completed", + TenantId: request.TenantId, + ResourceType: "workflow_step", + ResourceId: request.StepId.ToString(), + Details: new Dictionary + { + ["stepType"] = request.StepType, + ["status"] = result.Status.ToString(), + ["durationMs"] = duration.TotalMilliseconds + })); + + return new StepExecutionResult( + Status: result.Status, + Outputs: result.Outputs, + Duration: duration, + ErrorMessage: result.ErrorMessage, + Logs: logs, + Evidence: evidence); + } + catch (OperationCanceledException) when (!ct.IsCancellationRequested) + { + var duration = _timeProvider.GetUtcNow() - startTime; + + await _auditLogger.LogAsync(new AuditEntry( + EventType: "step.execution.timeout", + TenantId: request.TenantId, + ResourceType: "workflow_step", + ResourceId: request.StepId.ToString(), + Details: new Dictionary + { + ["stepType"] = request.StepType, + ["timeoutMs"] = request.Timeout.TotalMilliseconds + })); + + return new StepExecutionResult( + Status: StepStatus.TimedOut, + Outputs: new Dictionary(), + Duration: duration, + ErrorMessage: $"Step timed out after {request.Timeout.TotalSeconds}s", + Logs: logs, + Evidence: null); + } + catch (Exception ex) + { + var duration = _timeProvider.GetUtcNow() - startTime; + + _logger.LogError(ex, "Step execution failed: {StepType}", request.StepType); + + await _auditLogger.LogAsync(new AuditEntry( + EventType: 
"step.execution.failed", + TenantId: request.TenantId, + ResourceType: "workflow_step", + ResourceId: request.StepId.ToString(), + Details: new Dictionary + { + ["stepType"] = request.StepType, + ["error"] = ex.Message + })); + + return new StepExecutionResult( + Status: StepStatus.Failed, + Outputs: new Dictionary(), + Duration: duration, + ErrorMessage: ex.Message, + Logs: logs, + Evidence: null); + } + } +} +``` + +### Gate Evaluator + +```csharp +namespace StellaOps.ReleaseOrchestrator.Plugin.Execution; + +/// +/// Evaluates promotion gates with full context integration. +/// +public interface IGateEvaluator +{ + /// + /// Evaluate a gate for a promotion. + /// + Task EvaluateAsync( + GateEvaluationRequest request, + CancellationToken ct = default); +} + +public sealed record GateEvaluationRequest( + Guid GateId, + Guid PromotionId, + Guid ReleaseId, + Guid SourceEnvironmentId, + Guid TargetEnvironmentId, + Guid TenantId, + string GateType, + JsonElement Configuration); + +public sealed record GateEvaluationResult( + GateStatus Status, + string Message, + IReadOnlyDictionary Details, + IReadOnlyList Evidence, + TimeSpan Duration, + bool CanOverride, + IReadOnlyList OverridePermissions); + +public sealed class GateEvaluator : IGateEvaluator +{ + private readonly IGateProviderRegistry _gateRegistry; + private readonly ITenantSecretResolver _secretResolver; + private readonly IEvidenceCollector _evidenceCollector; + private readonly IReleaseRepository _releaseRepository; + private readonly IEnvironmentRepository _environmentRepository; + private readonly IAuditLogger _auditLogger; + private readonly ILogger _logger; + private readonly TimeProvider _timeProvider; + + public GateEvaluator( + IGateProviderRegistry gateRegistry, + ITenantSecretResolver secretResolver, + IEvidenceCollector evidenceCollector, + IReleaseRepository releaseRepository, + IEnvironmentRepository environmentRepository, + IAuditLogger auditLogger, + ILogger logger, + TimeProvider timeProvider) 
+ { + _gateRegistry = gateRegistry; + _secretResolver = secretResolver; + _evidenceCollector = evidenceCollector; + _releaseRepository = releaseRepository; + _environmentRepository = environmentRepository; + _auditLogger = auditLogger; + _logger = logger; + _timeProvider = timeProvider; + } + + public async Task EvaluateAsync( + GateEvaluationRequest request, + CancellationToken ct = default) + { + var startTime = _timeProvider.GetUtcNow(); + + // Load release and environment info + var release = await _releaseRepository.GetAsync(request.ReleaseId, ct) + ?? throw new InvalidOperationException($"Release not found: {request.ReleaseId}"); + + var targetEnvironment = await _environmentRepository.GetAsync(request.TargetEnvironmentId, ct) + ?? throw new InvalidOperationException($"Environment not found: {request.TargetEnvironmentId}"); + + // Get gate definition + var gateDefinition = await _gateRegistry.GetGateAsync(request.GateType, ct); + if (gateDefinition == null) + { + throw new InvalidOperationException($"Gate type not found: {request.GateType}"); + } + + // Resolve secrets in configuration + var resolvedConfig = await _secretResolver + .ForTenant(request.TenantId) + .ResolveConfigurationSecretsAsync(request.Configuration, ct); + + // Create evaluation context + var context = new GateEvaluationContext( + GateId: request.GateId, + PromotionId: request.PromotionId, + ReleaseId: request.ReleaseId, + SourceEnvironmentId: request.SourceEnvironmentId, + TargetEnvironmentId: request.TargetEnvironmentId, + TenantId: request.TenantId, + GateType: request.GateType, + Configuration: resolvedConfig, + Release: release.ToReleaseInfo(), + TargetEnvironment: targetEnvironment.ToEnvironmentInfo(), + Logger: new GateLogger(_logger)); + + // Log evaluation start + await _auditLogger.LogAsync(new AuditEntry( + EventType: "gate.evaluation.started", + TenantId: request.TenantId, + ResourceType: "promotion_gate", + ResourceId: request.GateId.ToString(), + Details: new Dictionary + { + 
["gateType"] = request.GateType, + ["promotionId"] = request.PromotionId, + ["releaseId"] = request.ReleaseId + })); + + try + { + var result = await _gateRegistry.EvaluateGateAsync( + request.GateType, + context, + ct); + + var duration = _timeProvider.GetUtcNow() - startTime; + + // Collect evidence + var evidence = result.Evidence?.ToList() ?? new List(); + + // Add evaluation metadata as evidence + evidence.Add(new GateEvidence( + Type: "gate_evaluation_metadata", + Description: "Gate evaluation details", + Data: JsonSerializer.SerializeToElement(new + { + gateType = request.GateType, + evaluatedAt = _timeProvider.GetUtcNow(), + durationMs = duration.TotalMilliseconds, + status = result.Status.ToString() + }))); + + // Log evaluation completion + await _auditLogger.LogAsync(new AuditEntry( + EventType: "gate.evaluation.completed", + TenantId: request.TenantId, + ResourceType: "promotion_gate", + ResourceId: request.GateId.ToString(), + Details: new Dictionary + { + ["gateType"] = request.GateType, + ["status"] = result.Status.ToString(), + ["message"] = result.Message, + ["durationMs"] = duration.TotalMilliseconds + })); + + return new GateEvaluationResult( + Status: result.Status, + Message: result.Message, + Details: result.Details, + Evidence: evidence, + Duration: duration, + CanOverride: gateDefinition.Definition.SupportsOverride, + OverridePermissions: gateDefinition.Definition.RequiredPermissions); + } + catch (Exception ex) + { + var duration = _timeProvider.GetUtcNow() - startTime; + + _logger.LogError(ex, "Gate evaluation failed: {GateType}", request.GateType); + + await _auditLogger.LogAsync(new AuditEntry( + EventType: "gate.evaluation.failed", + TenantId: request.TenantId, + ResourceType: "promotion_gate", + ResourceId: request.GateId.ToString(), + Details: new Dictionary + { + ["gateType"] = request.GateType, + ["error"] = ex.Message + })); + + return new GateEvaluationResult( + Status: GateStatus.Failed, + Message: $"Gate evaluation error: 
{ex.Message}", + Details: new Dictionary { ["exception"] = ex.GetType().Name }, + Evidence: new List(), + Duration: duration, + CanOverride: gateDefinition.Definition.SupportsOverride, + OverridePermissions: gateDefinition.Definition.RequiredPermissions); + } + } +} +``` + +### Evidence Collector Integration + +```csharp +namespace StellaOps.ReleaseOrchestrator.Plugin.Integration; + +/// +/// Collects evidence from plugin executions for audit trails. +/// +public interface IEvidenceCollector +{ + /// + /// Create a collector scoped to a specific tenant. + /// + IEvidenceCollector ForTenant(Guid tenantId); + + /// + /// Collect evidence from a step execution. + /// + Task CollectStepEvidenceAsync( + Guid stepId, + StepResult result, + CancellationToken ct = default); + + /// + /// Collect evidence from a gate evaluation. + /// + Task CollectGateEvidenceAsync( + Guid gateId, + GateResult result, + CancellationToken ct = default); + + /// + /// Collect evidence from a connector operation. + /// + Task CollectConnectorEvidenceAsync( + Guid integrationId, + string operation, + JsonElement result, + CancellationToken ct = default); +} + +public sealed record EvidencePacket( + Guid Id, + string Type, + Guid TenantId, + DateTimeOffset CollectedAt, + string ContentDigest, + JsonElement Content, + IReadOnlyDictionary Metadata); +``` + +### Plugin Metrics Collector + +```csharp +namespace StellaOps.ReleaseOrchestrator.Plugin.Monitoring; + +/// +/// Collects metrics from Release Orchestrator plugin executions. +/// +public sealed class ReleaseOrchestratorPluginMonitor : IHostedService +{ + private readonly IPluginHost _pluginHost; + private readonly IStepProviderRegistry _stepRegistry; + private readonly IGateProviderRegistry _gateRegistry; + private readonly IConnectorRegistry _connectorRegistry; + private readonly IMeterFactory _meterFactory; + private readonly ILogger _logger; + private readonly TimeSpan _monitoringInterval = TimeSpan.FromSeconds(30); + private Timer? 
_timer;
+
+    private Meter _meter;
+    private Counter<long> _stepExecutionCounter;
+    private Counter<long> _gateEvaluationCounter;
+    private Counter<long> _connectorOperationCounter;
+    private Histogram<double> _stepExecutionDuration;
+    private Histogram<double> _gateEvaluationDuration;
+
+    public ReleaseOrchestratorPluginMonitor(
+        IPluginHost pluginHost,
+        IStepProviderRegistry stepRegistry,
+        IGateProviderRegistry gateRegistry,
+        IConnectorRegistry connectorRegistry,
+        IMeterFactory meterFactory,
+        ILogger<ReleaseOrchestratorPluginMonitor> logger)
+    {
+        _pluginHost = pluginHost;
+        _stepRegistry = stepRegistry;
+        _gateRegistry = gateRegistry;
+        _connectorRegistry = connectorRegistry;
+        _meterFactory = meterFactory;
+        _logger = logger;
+
+        InitializeMetrics();
+    }
+
+    private void InitializeMetrics()
+    {
+        _meter = _meterFactory.Create("StellaOps.ReleaseOrchestrator.Plugin");
+
+        _stepExecutionCounter = _meter.CreateCounter<long>(
+            "stellaops_step_executions_total",
+            description: "Total number of step executions");
+
+        _gateEvaluationCounter = _meter.CreateCounter<long>(
+            "stellaops_gate_evaluations_total",
+            description: "Total number of gate evaluations");
+
+        _connectorOperationCounter = _meter.CreateCounter<long>(
+            "stellaops_connector_operations_total",
+            description: "Total number of connector operations");
+
+        _stepExecutionDuration = _meter.CreateHistogram<double>(
+            "stellaops_step_execution_duration_ms",
+            unit: "ms",
+            description: "Step execution duration in milliseconds");
+
+        _gateEvaluationDuration = _meter.CreateHistogram<double>(
+            "stellaops_gate_evaluation_duration_ms",
+            unit: "ms",
+            description: "Gate evaluation duration in milliseconds");
+    }
+
+    public Task StartAsync(CancellationToken ct)
+    {
+        _timer = new Timer(
+            MonitorPlugins,
+            null,
+            TimeSpan.FromSeconds(10),
+            _monitoringInterval);
+
+        return Task.CompletedTask;
+    }
+
+    public Task StopAsync(CancellationToken ct)
+    {
+        _timer?.Change(Timeout.Infinite, 0);
+        return Task.CompletedTask;
+    }
+
+    private async void MonitorPlugins(object? 
state) + { + try + { + // Collect plugin health status + var plugins = _pluginHost.GetLoadedPlugins(); + foreach (var pluginInfo in plugins) + { + var plugin = _pluginHost.GetPlugin(pluginInfo.Id); + if (plugin != null) + { + var health = await plugin.HealthCheckAsync(CancellationToken.None); + _logger.LogDebug( + "Plugin {PluginId} health: {Status}", + pluginInfo.Id, + health.Status); + } + } + + // Count available steps, gates, connectors + var steps = await _stepRegistry.GetAllStepsAsync(); + var gates = await _gateRegistry.GetAllGatesAsync(); + var connectors = await _connectorRegistry.GetAllConnectorsAsync(); + + _logger.LogDebug( + "Plugin inventory: {Steps} steps, {Gates} gates, {Connectors} connectors", + steps.Count, gates.Count, connectors.Count); + } + catch (Exception ex) + { + _logger.LogError(ex, "Error monitoring plugins"); + } + } + + /// + /// Record a step execution metric. + /// + public void RecordStepExecution(string stepType, StepStatus status, TimeSpan duration) + { + _stepExecutionCounter.Add(1, + new KeyValuePair("step_type", stepType), + new KeyValuePair("status", status.ToString())); + + _stepExecutionDuration.Record(duration.TotalMilliseconds, + new KeyValuePair("step_type", stepType)); + } + + /// + /// Record a gate evaluation metric. + /// + public void RecordGateEvaluation(string gateType, GateStatus status, TimeSpan duration) + { + _gateEvaluationCounter.Add(1, + new KeyValuePair("gate_type", gateType), + new KeyValuePair("status", status.ToString())); + + _gateEvaluationDuration.Record(duration.TotalMilliseconds, + new KeyValuePair("gate_type", gateType)); + } + + /// + /// Record a connector operation metric. 
+    /// </summary>
+    public void RecordConnectorOperation(string connectorType, string operation, bool success)
+    {
+        _connectorOperationCounter.Add(1,
+            new KeyValuePair<string, object?>("connector_type", connectorType),
+            new KeyValuePair<string, object?>("operation", operation),
+            new KeyValuePair<string, object?>("success", success));
+    }
+}
+```
+
+---
+
+## Acceptance Criteria
+
+- [ ] `ReleaseOrchestratorPluginContext` wraps base context with domain services
+- [ ] `TenantSecretResolver` resolves secrets with tenant isolation
+- [ ] Secret reference patterns (`${vault:...}`) resolved in configuration
+- [ ] `StepExecutor` executes steps with full context integration
+- [ ] `GateEvaluator` evaluates gates with evidence collection
+- [ ] Audit logging for all plugin executions
+- [ ] Evidence collection for steps and gates
+- [ ] Plugin metrics exposed via OpenTelemetry
+- [ ] Unit test coverage >= 85%
+- [ ] Integration tests with mock plugins
+
+---
+
+## Test Plan
+
+### Unit Tests
+
+| Test | Description |
+|------|-------------|
+| `TenantSecretResolver_IsolatesTenants` | Tenant isolation works |
+| `SecretPattern_ResolvedInConfig` | `${vault:...}` patterns resolved |
+| `StepExecutor_RecordsAuditLogs` | Audit logging works |
+| `StepExecutor_CollectsEvidence` | Evidence collected on success |
+| `StepExecutor_HandlesTimeout` | Timeout handling works |
+| `GateEvaluator_ReturnsOverrideInfo` | Override permissions returned |
+| `PluginMonitor_CollectsMetrics` | Metrics recorded correctly |
+
+### Integration Tests
+
+| Test | Description |
+|------|-------------|
+| `StepExecutor_ExecutesBuiltInStep` | Built-in step execution |
+| `GateEvaluator_EvaluatesBuiltInGate` | Built-in gate evaluation |
+| `EvidenceCollector_PersistsEvidence` | Evidence persisted to storage |
+
+---
+
+## Dependencies
+
+| Dependency | Type | Status |
+|------------|------|--------|
+| 100_002 Plugin Host | Internal | TODO |
+| 100_004 Plugin Sandbox | Internal | TODO |
+| 101_002 Registry Extensions | Internal | TODO |
+| 101_001 Database Schema | Internal | TODO |
+
+---
+
+## Delivery Tracker
+
+| Deliverable | Status | Notes |
+|-------------|--------|-------|
+| ReleaseOrchestratorPluginContext | TODO | |
+| TenantSecretResolver | TODO | |
+| StepExecutor | TODO | |
+| GateEvaluator | TODO | |
+| ConnectorInvoker | TODO | |
+| EvidenceCollector | TODO | |
+| AuditLogger integration | TODO | |
+| ReleaseOrchestratorPluginMonitor | TODO | |
+| Unit tests | TODO | |
+| Integration tests | TODO | |
+
+---
+
+## Execution Log
+
+| Date | Entry |
+|------|-------|
+| 10-Jan-2026 | Sprint created |
+| 10-Jan-2026 | Refocused on Release Orchestrator-specific extensions (builds on Phase 100 core) |
diff --git a/docs/implplan/SPRINT_20260110_101_004_PLUGIN_sdk.md b/docs/implplan/SPRINT_20260110_101_004_PLUGIN_sdk.md
new file mode 100644
index 000000000..05a86a8b8
--- /dev/null
+++ b/docs/implplan/SPRINT_20260110_101_004_PLUGIN_sdk.md
@@ -0,0 +1,1120 @@
+# SPRINT: Plugin SDK Extensions for Release Orchestrator
+
+> **Sprint ID:** 101_004
+> **Module:** PLUGIN
+> **Phase:** 1 - Foundation
+> **Status:** TODO
+> **Parent:** [101_000_INDEX](SPRINT_20260110_101_000_INDEX_foundation.md)
+> **Prerequisites:** [100_012 Plugin SDK](SPRINT_20260110_100_012_PLUGIN_sdk.md), [101_002 Registry Extensions](SPRINT_20260110_101_002_PLUGIN_registry.md), [101_003 Loader Extensions](SPRINT_20260110_101_003_PLUGIN_loader_sandbox.md)
+
+---
+
+## Overview
+
+Extend the unified Plugin SDK (from Phase 100) with Release Orchestrator-specific base classes, project templates, and testing utilities for building workflow steps, promotion gates, and integration connectors.
+
+> **Note:** The core Plugin SDK (`StellaOps.Plugin.Sdk`, base classes, manifest builder, developer tools) is implemented in Phase 100 sprint 100_012. This sprint adds Release Orchestrator domain-specific templates and utilities.
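+
+As a sketch of the developer experience this sprint targets — assuming the `GatePluginBase` surface specified later in this document (`GateMetadata`, the `Pass`/`Fail` helpers); the gate type, schema helper, and deployment-window values below are illustrative, not part of the spec — a minimal third-party gate plugin might look like:
+
+```csharp
+// Hypothetical example against the planned SDK; not yet compilable.
+public sealed class BusinessHoursGatePlugin : GatePluginBase
+{
+    public override GatePluginMetadata GateMetadata { get; } = new(
+        GateType: "business-hours",
+        DisplayName: "Business Hours",
+        Description: "Blocks promotion outside a configured deployment window.",
+        Category: "scheduling",
+        ConfigSchema: JsonSchema.Empty); // assumed empty-schema helper
+
+    public override Task<GateResult> EvaluateGateAsync(
+        GateEvaluationContext context,
+        CancellationToken ct)
+    {
+        // Static window for illustration; a real plugin would read it
+        // from context.Configuration.
+        var hour = DateTimeOffset.UtcNow.Hour;
+        return Task.FromResult(hour is >= 9 and < 17
+            ? Pass("Within deployment window")
+            : Fail("Outside deployment window (09:00-17:00 UTC)"));
+    }
+}
+```
+
+The base class supplies registration, validation defaults, and result helpers, so a plugin author only implements metadata plus one evaluation method.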
+
+### Objectives
+
+- Create base classes for step, gate, and connector plugins
+- Create project templates for each plugin type
+- Build testing utilities for Release Orchestrator plugins
+- Create sample plugins demonstrating best practices
+- Add documentation and tutorials
+
+### Working Directory
+
+```
+src/ReleaseOrchestrator/
+├── __Libraries/
+│   └── StellaOps.ReleaseOrchestrator.Plugin.Sdk/
+│       ├── Contracts/
+│       │   ├── IStepPlugin.cs
+│       │   ├── IGatePlugin.cs
+│       │   └── IConnectorPlugin.cs
+│       ├── Base/
+│       │   ├── StepPluginBase.cs
+│       │   ├── GatePluginBase.cs
+│       │   ├── ScmConnectorPluginBase.cs
+│       │   ├── RegistryConnectorPluginBase.cs
+│       │   ├── VaultConnectorPluginBase.cs
+│       │   └── NotifyConnectorPluginBase.cs
+│       ├── Testing/
+│       │   ├── StepTestHost.cs
+│       │   ├── GateTestHost.cs
+│       │   ├── ConnectorTestHost.cs
+│       │   ├── MockReleaseContext.cs
+│       │   └── MockEnvironmentContext.cs
+│       └── StellaOps.ReleaseOrchestrator.Plugin.Sdk.csproj
+├── __Templates/
+│   └── stella-orchestrator-plugin/
+│       ├── template.json
+│       ├── StepPlugin/
+│       ├── GatePlugin/
+│       └── ConnectorPlugin/
+└── __Samples/
+    ├── StellaOps.Plugin.Sample.Step/
+    ├── StellaOps.Plugin.Sample.Gate/
+    └── StellaOps.Plugin.Sample.Connector/
+```
+
+---
+
+## Architecture Reference
+
+- [Phase 100 Plugin SDK](SPRINT_20260110_100_012_PLUGIN_sdk.md)
+- [Plugin System](../modules/release-orchestrator/modules/plugin-system.md)
+- [Workflow Engine](../modules/release-orchestrator/modules/workflow-engine.md)
+
+---
+
+## Deliverables
+
+### Step Plugin Contracts and Base Class
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Plugin.Sdk.Contracts;
+
+/// <summary>
+/// Complete interface for a workflow step plugin.
+/// Combines IPlugin from Phase 100 with IStepProviderCapability.
+/// </summary>
+public interface IStepPlugin : IPlugin, IStepProviderCapability
+{
+    /// <summary>
+    /// Step metadata for registration.
+    /// </summary>
+    StepPluginMetadata StepMetadata { get; }
+}
+
+public sealed record StepPluginMetadata(
+    string StepType,
+    string DisplayName,
+    string Description,
+    string Category,
+    JsonSchema ConfigSchema,
+    JsonSchema OutputSchema,
+    bool SupportsRetry = true,
+    TimeSpan? DefaultTimeout = null,
+    IReadOnlyList<string>? RequiredCapabilities = null);
+```
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Plugin.Sdk.Base;
+
+/// <summary>
+/// Base class for workflow step plugins with common functionality.
+/// </summary>
+public abstract class StepPluginBase : PluginBase, IStepPlugin
+{
+    private readonly List<StepDefinition> _stepDefinitions = new();
+
+    /// <summary>
+    /// Step metadata. Derived classes must implement.
+    /// </summary>
+    public abstract StepPluginMetadata StepMetadata { get; }
+
+    /// <summary>
+    /// Register additional step types (if plugin provides multiple).
+    /// </summary>
+    protected void RegisterStep(StepDefinition definition)
+    {
+        _stepDefinitions.Add(definition);
+    }
+
+    public IReadOnlyList<StepDefinition> GetStepDefinitions()
+    {
+        // Primary step from metadata
+        var primary = new StepDefinition(
+            Type: StepMetadata.StepType,
+            DisplayName: StepMetadata.DisplayName,
+            Description: StepMetadata.Description,
+            Category: StepMetadata.Category,
+            ConfigSchema: StepMetadata.ConfigSchema,
+            OutputSchema: StepMetadata.OutputSchema,
+            RequiredCapabilities: StepMetadata.RequiredCapabilities ?? [],
+            SupportsRetry: StepMetadata.SupportsRetry,
+            DefaultTimeout: StepMetadata.DefaultTimeout ?? TimeSpan.FromMinutes(5));
+
+        return new[] { primary }.Concat(_stepDefinitions).ToList();
+    }
+
+    /// <summary>
+    /// Execute the step. Derived classes must implement.
+    /// </summary>
+    public abstract Task<StepResult> ExecuteStepAsync(
+        StepExecutionContext context,
+        CancellationToken ct);
+
+    public virtual Task<StepValidationResult> ValidateStepConfigAsync(
+        string stepType,
+        JsonElement configuration,
+        CancellationToken ct)
+    {
+        // Default implementation - schema validation only
+        return Task.FromResult(StepValidationResult.Success());
+    }
+
+    public virtual JsonSchema? 
GetOutputSchema(string stepType) + { + if (stepType == StepMetadata.StepType) + return StepMetadata.OutputSchema; + + var definition = _stepDefinitions.FirstOrDefault(d => d.Type == stepType); + return definition?.OutputSchema; + } + + // Helper methods for derived classes + + /// + /// Write structured output. + /// + protected static Task WriteOutputAsync( + StepExecutionContext context, + string key, + object value) + { + return context.OutputWriter.WriteAsync(key, value); + } + + /// + /// Write log message. + /// + protected static void Log( + StepExecutionContext context, + LogLevel level, + string message, + params object[] args) + { + context.Logger.Log(level, message, args); + } + + /// + /// Create a success result. + /// + protected static StepResult Success(IReadOnlyDictionary? outputs = null) + { + return new StepResult( + Status: StepStatus.Succeeded, + Outputs: outputs ?? new Dictionary()); + } + + /// + /// Create a failure result. + /// + protected static StepResult Failure(string errorMessage) + { + return new StepResult( + Status: StepStatus.Failed, + Outputs: new Dictionary(), + ErrorMessage: errorMessage); + } + + /// + /// Create a skipped result. + /// + protected static StepResult Skipped(string reason) + { + return new StepResult( + Status: StepStatus.Skipped, + Outputs: new Dictionary + { + ["skipReason"] = reason + }); + } +} +``` + +### Gate Plugin Contracts and Base Class + +```csharp +namespace StellaOps.ReleaseOrchestrator.Plugin.Sdk.Contracts; + +/// +/// Complete interface for a promotion gate plugin. +/// +public interface IGatePlugin : IPlugin, IGateProviderCapability +{ + /// + /// Gate metadata for registration. + /// + GatePluginMetadata GateMetadata { get; } +} + +public sealed record GatePluginMetadata( + string GateType, + string DisplayName, + string Description, + string Category, + JsonSchema ConfigSchema, + bool IsBlocking = true, + bool SupportsOverride = true, + IReadOnlyList? 
RequiredPermissions = null); +``` + +```csharp +namespace StellaOps.ReleaseOrchestrator.Plugin.Sdk.Base; + +/// +/// Base class for promotion gate plugins. +/// +public abstract class GatePluginBase : PluginBase, IGatePlugin +{ + private readonly List _gateDefinitions = new(); + + public abstract GatePluginMetadata GateMetadata { get; } + + protected void RegisterGate(GateDefinition definition) + { + _gateDefinitions.Add(definition); + } + + public IReadOnlyList GetGateDefinitions() + { + var primary = new GateDefinition( + Type: GateMetadata.GateType, + DisplayName: GateMetadata.DisplayName, + Description: GateMetadata.Description, + Category: GateMetadata.Category, + ConfigSchema: GateMetadata.ConfigSchema, + IsBlocking: GateMetadata.IsBlocking, + SupportsOverride: GateMetadata.SupportsOverride, + RequiredPermissions: GateMetadata.RequiredPermissions ?? []); + + return new[] { primary }.Concat(_gateDefinitions).ToList(); + } + + /// + /// Evaluate the gate. Derived classes must implement. + /// + public abstract Task EvaluateGateAsync( + GateEvaluationContext context, + CancellationToken ct); + + public virtual Task ValidateGateConfigAsync( + string gateType, + JsonElement configuration, + CancellationToken ct) + { + return Task.FromResult(GateValidationResult.Success()); + } + + // Helper methods + + /// + /// Create a pass result. + /// + protected static GateResult Pass( + string message, + IReadOnlyDictionary? details = null, + IReadOnlyList? evidence = null) + { + return new GateResult( + Status: GateStatus.Passed, + Message: message, + Details: details ?? new Dictionary(), + Evidence: evidence); + } + + /// + /// Create a fail result. + /// + protected static GateResult Fail( + string message, + IReadOnlyDictionary? details = null, + IReadOnlyList? evidence = null) + { + return new GateResult( + Status: GateStatus.Failed, + Message: message, + Details: details ?? 
new Dictionary(), + Evidence: evidence); + } + + /// + /// Create a warning result (passes but with advisory). + /// + protected static GateResult Warn( + string message, + IReadOnlyDictionary? details = null, + IReadOnlyList? evidence = null) + { + return new GateResult( + Status: GateStatus.Warning, + Message: message, + Details: details ?? new Dictionary(), + Evidence: evidence); + } + + /// + /// Create a pending result (requires async evaluation). + /// + protected static GateResult Pending(string message) + { + return new GateResult( + Status: GateStatus.Pending, + Message: message, + Details: new Dictionary()); + } + + /// + /// Create evidence from JSON data. + /// + protected static GateEvidence CreateEvidence( + string type, + string description, + object data) + { + return new GateEvidence( + Type: type, + Description: description, + Data: JsonSerializer.SerializeToElement(data)); + } +} +``` + +### Connector Plugin Base Classes + +```csharp +namespace StellaOps.ReleaseOrchestrator.Plugin.Sdk.Base; + +/// +/// Base class for SCM connector plugins. +/// +public abstract class ScmConnectorPluginBase : PluginBase, IPlugin, IScmConnectorCapability +{ + public ConnectorCategory Category => ConnectorCategory.Scm; + + public abstract string ConnectorType { get; } + public abstract string DisplayName { get; } + + public abstract Task ValidateConfigAsync( + JsonElement config, + CancellationToken ct); + + public abstract Task TestConnectionAsync( + ConnectorContext context, + CancellationToken ct); + + public abstract IReadOnlyList GetSupportedOperations(); + + public abstract Task> ListRepositoriesAsync( + ConnectorContext context, + string? 
+        searchPattern = null,
+        CancellationToken ct = default);
+
+    public abstract Task<CommitInfo> GetCommitAsync(
+        ConnectorContext context,
+        string repository,
+        string commitSha,
+        CancellationToken ct = default);
+
+    public abstract Task<WebhookInfo> CreateWebhookAsync(
+        ConnectorContext context,
+        string repository,
+        IReadOnlyList<string> events,
+        string callbackUrl,
+        CancellationToken ct = default);
+
+    public abstract Task<IReadOnlyList<ScmReleaseInfo>> ListReleasesAsync(
+        ConnectorContext context,
+        string repository,
+        CancellationToken ct = default);
+
+    // Helper methods (protected internal so sibling base classes in this assembly can delegate to them)
+
+    /// <summary>
+    /// Get configuration value with type conversion.
+    /// </summary>
+    protected internal static T GetConfig<T>(JsonElement config, string key, T defaultValue = default!)
+    {
+        if (config.TryGetProperty(key, out var element))
+        {
+            try
+            {
+                return element.Deserialize<T>() ?? defaultValue;
+            }
+            catch
+            {
+                return defaultValue;
+            }
+        }
+        return defaultValue;
+    }
+
+    /// <summary>
+    /// Validate required configuration fields.
+    /// </summary>
+    protected internal static ConfigValidationResult ValidateRequired(
+        JsonElement config,
+        params string[] requiredFields)
+    {
+        var errors = new List<string>();
+
+        foreach (var field in requiredFields)
+        {
+            if (!config.TryGetProperty(field, out var element) ||
+                element.ValueKind == JsonValueKind.Null ||
+                (element.ValueKind == JsonValueKind.String && string.IsNullOrEmpty(element.GetString())))
+            {
+                errors.Add($"Required field '{field}' is missing or empty");
+            }
+        }
+
+        return errors.Count == 0
+            ? ConfigValidationResult.Success()
+            : ConfigValidationResult.Failure(errors.ToArray());
+    }
+
+    /// <summary>
+    /// Create a connection test result.
+    /// </summary>
+    protected internal static ConnectionTestResult ConnectionSuccess(TimeSpan responseTime)
+    {
+        return new ConnectionTestResult(true, "Connection successful", responseTime);
+    }
+
+    protected internal static ConnectionTestResult ConnectionFailed(string message)
+    {
+        return new ConnectionTestResult(false, message, TimeSpan.Zero);
+    }
+}
+
+/// <summary>
+/// Base class for registry connector plugins.
+/// </summary>
+public abstract class RegistryConnectorPluginBase : PluginBase, IPlugin, IRegistryConnectorCapability
+{
+    public ConnectorCategory Category => ConnectorCategory.Registry;
+
+    public abstract string ConnectorType { get; }
+    public abstract string DisplayName { get; }
+
+    public abstract Task<ConfigValidationResult> ValidateConfigAsync(
+        JsonElement config,
+        CancellationToken ct);
+
+    public abstract Task<ConnectionTestResult> TestConnectionAsync(
+        ConnectorContext context,
+        CancellationToken ct);
+
+    public abstract IReadOnlyList<string> GetSupportedOperations();
+
+    public abstract Task<IReadOnlyList<string>> ListRepositoriesAsync(
+        ConnectorContext context,
+        string? prefix = null,
+        CancellationToken ct = default);
+
+    public abstract Task<IReadOnlyList<string>> ListTagsAsync(
+        ConnectorContext context,
+        string repository,
+        CancellationToken ct = default);
+
+    public abstract Task<string> ResolveTagAsync(
+        ConnectorContext context,
+        string repository,
+        string tag,
+        CancellationToken ct = default);
+
+    public abstract Task<ManifestInfo> GetManifestAsync(
+        ConnectorContext context,
+        string repository,
+        string reference,
+        CancellationToken ct = default);
+
+    public abstract Task<RegistryCredentials> GetPullCredentialsAsync(
+        ConnectorContext context,
+        string repository,
+        CancellationToken ct = default);
+
+    // Helper methods delegated to ScmConnectorPluginBase
+    protected static T GetConfig<T>(JsonElement config, string key, T defaultValue = default!) =>
+        ScmConnectorPluginBase.GetConfig(config, key, defaultValue);
+
+    protected static ConfigValidationResult ValidateRequired(JsonElement config, params string[] fields) =>
+        ScmConnectorPluginBase.ValidateRequired(config, fields);
+
+    protected static ConnectionTestResult ConnectionSuccess(TimeSpan responseTime) =>
+        ScmConnectorPluginBase.ConnectionSuccess(responseTime);
+
+    protected static ConnectionTestResult ConnectionFailed(string message) =>
+        ScmConnectorPluginBase.ConnectionFailed(message);
+}
+```
+
+### Testing Utilities
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Plugin.Sdk.Testing;
+
+/// <summary>
+/// Test host for step plugin unit testing.
+/// </summary>
+public sealed class StepTestHost : IAsyncDisposable
+{
+    private readonly IStepPlugin _plugin;
+    private readonly IPluginContext _context;
+
+    private StepTestHost(IStepPlugin plugin, IPluginContext context)
+    {
+        _plugin = plugin;
+        _context = context;
+    }
+
+    /// <summary>
+    /// Create a test host for a step plugin.
+    /// </summary>
+    public static async Task<StepTestHost> CreateAsync<TPlugin>(
+        Action<MockPluginContext>? configureContext = null)
+        where TPlugin : IStepPlugin, new()
+    {
+        var context = new MockPluginContext();
+        configureContext?.Invoke(context);
+
+        var plugin = new TPlugin();
+        await plugin.InitializeAsync(context, CancellationToken.None);
+
+        return new StepTestHost(plugin, context);
+    }
+
+    /// <summary>
+    /// Execute the step with test inputs.
+    /// </summary>
+    public Task<StepResult> ExecuteAsync(
+        JsonElement? configuration = null,
+        IReadOnlyDictionary<string, object>? inputs = null,
+        CancellationToken ct = default)
+    {
+        var context = new StepExecutionContext(
+            StepId: Guid.NewGuid(),
+            WorkflowRunId: Guid.NewGuid(),
+            TenantId: Guid.NewGuid(),
+            StepType: _plugin.StepMetadata.StepType,
+            Configuration: configuration ?? JsonDocument.Parse("{}").RootElement,
+            Inputs: inputs ?? new Dictionary<string, object>(),
+            OutputWriter: new MockStepOutputWriter(),
+            Logger: new MockPluginLogger());
+
+        return _plugin.ExecuteStepAsync(context, ct);
+    }
+
+    /// <summary>
+    /// Execute with strongly-typed configuration.
+    /// </summary>
+    public Task<StepResult> ExecuteAsync<TConfig>(
+        TConfig configuration,
+        IReadOnlyDictionary<string, object>? inputs = null,
+        CancellationToken ct = default)
+    {
+        var configJson = JsonSerializer.SerializeToElement(configuration);
+        return ExecuteAsync(configJson, inputs, ct);
+    }
+
+    /// <summary>
+    /// Validate step configuration.
+    /// </summary>
+    public Task<StepValidationResult> ValidateConfigAsync(
+        JsonElement configuration,
+        CancellationToken ct = default)
+    {
+        return _plugin.ValidateStepConfigAsync(
+            _plugin.StepMetadata.StepType,
+            configuration,
+            ct);
+    }
+
+    public async ValueTask DisposeAsync()
+    {
+        await _plugin.DisposeAsync();
+    }
+}
+
+/// <summary>
+/// Test host for gate plugin unit testing.
+/// </summary>
+public sealed class GateTestHost : IAsyncDisposable
+{
+    private readonly IGatePlugin _plugin;
+    private readonly IPluginContext _context;
+
+    private GateTestHost(IGatePlugin plugin, IPluginContext context)
+    {
+        _plugin = plugin;
+        _context = context;
+    }
+
+    public static async Task<GateTestHost> CreateAsync<TPlugin>(
+        Action<MockPluginContext>? configureContext = null)
+        where TPlugin : IGatePlugin, new()
+    {
+        var context = new MockPluginContext();
+        configureContext?.Invoke(context);
+
+        var plugin = new TPlugin();
+        await plugin.InitializeAsync(context, CancellationToken.None);
+
+        return new GateTestHost(plugin, context);
+    }
+
+    /// <summary>
+    /// Evaluate the gate with test context.
+    /// </summary>
+    public Task<GateResult> EvaluateAsync(
+        JsonElement? configuration = null,
+        ReleaseInfo? release = null,
+        EnvironmentInfo? environment = null,
+        CancellationToken ct = default)
+    {
+        var context = new GateEvaluationContext(
+            GateId: Guid.NewGuid(),
+            PromotionId: Guid.NewGuid(),
+            ReleaseId: Guid.NewGuid(),
+            SourceEnvironmentId: Guid.NewGuid(),
+            TargetEnvironmentId: Guid.NewGuid(),
+            TenantId: Guid.NewGuid(),
+            GateType: _plugin.GateMetadata.GateType,
+            Configuration: configuration ?? JsonDocument.Parse("{}").RootElement,
+            Release: release ?? MockReleaseContext.Create(),
+            TargetEnvironment: environment ?? MockEnvironmentContext.Create(),
+            Logger: new MockPluginLogger());
+
+        return _plugin.EvaluateGateAsync(context, ct);
+    }
+
+    public async ValueTask DisposeAsync()
+    {
+        await _plugin.DisposeAsync();
+    }
+}
+
+/// <summary>
+/// Test host for connector plugin unit testing.
+/// </summary>
+public sealed class ConnectorTestHost<TConnector> : IAsyncDisposable
+    where TConnector : IPlugin, IIntegrationConnectorCapability
+{
+    private readonly TConnector _plugin;
+    private readonly MockPluginContext _context;
+
+    private ConnectorTestHost(TConnector plugin, MockPluginContext context)
+    {
+        _plugin = plugin;
+        _context = context;
+    }
+
+    public TConnector Plugin => _plugin;
+
+    public static async Task<ConnectorTestHost<TConnector>> CreateAsync(
+        TConnector plugin,
+        Action<MockPluginContext>? configureContext = null)
+    {
+        var context = new MockPluginContext();
+        configureContext?.Invoke(context);
+
+        await plugin.InitializeAsync(context, CancellationToken.None);
+
+        return new ConnectorTestHost<TConnector>(plugin, context);
+    }
+
+    /// <summary>
+    /// Create a connector context for testing.
+    /// </summary>
+    public ConnectorContext CreateContext(
+        JsonElement? configuration = null,
+        Dictionary<string, string>? secrets = null)
+    {
+        return new ConnectorContext(
+            IntegrationId: Guid.NewGuid(),
+            TenantId: Guid.NewGuid(),
+            Configuration: configuration ?? JsonDocument.Parse("{}").RootElement,
+            SecretResolver: new MockSecretResolver(secrets ?? new()),
+            Logger: new MockPluginLogger());
+    }
+
+    /// <summary>
+    /// Test connection with configuration.
+    /// </summary>
+    public Task<ConnectionTestResult> TestConnectionAsync(
+        JsonElement? configuration = null,
+        CancellationToken ct = default)
+    {
+        var context = CreateContext(configuration);
+        return _plugin.TestConnectionAsync(context, ct);
+    }
+
+    public async ValueTask DisposeAsync()
+    {
+        await _plugin.DisposeAsync();
+    }
+}
+
+/// <summary>
+/// Mock release context for testing.
+/// </summary>
+public static class MockReleaseContext
+{
+    public static ReleaseInfo Create(
+        string? version = null,
+        IReadOnlyList<ReleaseComponent>? components = null)
+    {
+        return new ReleaseInfo(
+            Id: Guid.NewGuid(),
+            Name: "test-release",
+            Version: version ?? "1.0.0",
+            CreatedAt: DateTimeOffset.UtcNow,
+            Components: components ?? new List<ReleaseComponent>
+            {
+                new ReleaseComponent(
+                    Name: "test-service",
+                    ImageReference: "registry.example.com/test:1.0.0",
+                    Digest: "sha256:abc123",
+                    Tags: new[] { "1.0.0", "latest" })
+            },
+            Metadata: new Dictionary<string, string>());
+    }
+}
+
+/// <summary>
+/// Mock environment context for testing.
+/// </summary>
+public static class MockEnvironmentContext
+{
+    public static EnvironmentInfo Create(
+        string? name = null,
+        string? tier = null)
+    {
+        return new EnvironmentInfo(
+            Id: Guid.NewGuid(),
+            Name: name ?? "test-env",
+            Tier: tier ?? "development",
+            Variables: new Dictionary<string, string>(),
+            Metadata: new Dictionary<string, string>());
+    }
+}
+```
+
+### Project Template
+
+```json
+// template.json
+{
+  "$schema": "http://json.schemastore.org/template",
+  "author": "Stella Ops",
+  "classifications": ["Stella", "Plugin", "Release Orchestrator", "Step", "Gate", "Connector"],
+  "identity": "StellaOps.ReleaseOrchestrator.Plugin.Templates",
+  "name": "Stella Ops Release Orchestrator Plugin",
+  "shortName": "stella-orchestrator-plugin",
+  "tags": {
+    "language": "C#",
+    "type": "project"
+  },
+  "sourceName": "StellaOps.Plugin.Template",
+  "preferNameDirectory": true,
+  "symbols": {
+    "pluginType": {
+      "type": "parameter",
+      "datatype": "choice",
+      "choices": [
+        { "choice": "step", "description": "Workflow step plugin" },
+        { "choice": "gate", "description": "Promotion gate plugin" },
+        { "choice": "scm", "description": "SCM connector plugin" },
+        { "choice": "registry", "description": "Container registry connector plugin" },
+        { "choice": "vault", "description": "Secrets vault connector plugin" },
+        { "choice": "notify", "description": "Notification connector plugin" }
+      ],
+      "defaultValue": "step",
+      "description": "Type of plugin to create"
+    },
+    "pluginName": {
+      "type": "parameter",
+      "datatype": "string",
+      "defaultValue":
+        "my-plugin",
+      "description": "Plugin identifier (lowercase, alphanumeric, hyphens)"
+    },
+    "author": {
+      "type": "parameter",
+      "datatype": "string",
+      "defaultValue": "Your Name",
+      "description": "Plugin author name"
+    }
+  },
+  "sources": [
+    {
+      "modifiers": [
+        { "condition": "(pluginType == 'step')", "include": ["StepPlugin/**/*"] },
+        { "condition": "(pluginType == 'gate')", "include": ["GatePlugin/**/*"] },
+        { "condition": "(pluginType == 'scm')", "include": ["ScmConnectorPlugin/**/*"] },
+        { "condition": "(pluginType == 'registry')", "include": ["RegistryConnectorPlugin/**/*"] },
+        { "condition": "(pluginType == 'vault')", "include": ["VaultConnectorPlugin/**/*"] },
+        { "condition": "(pluginType == 'notify')", "include": ["NotifyConnectorPlugin/**/*"] }
+      ]
+    }
+  ]
+}
+```
+
+### Sample Step Plugin
+
+```csharp
+// Sample: HttpRequestStep - executes HTTP requests as a workflow step
+namespace StellaOps.Plugin.Sample.Step;
+
+[Plugin(
+    id: "com.stellaops.step.http-request",
+    name: "HTTP Request Step",
+    version: "1.0.0",
+    vendor: "Stella Ops")]
+[ProvidesCapability(PluginCapabilities.WorkflowStep, CapabilityId = "http-request")]
+[RequiresCapability(PluginCapabilities.Network)]
+public sealed class HttpRequestStep : StepPluginBase
+{
+    private HttpClient? _httpClient;
+
+    public override PluginInfo Info => new(
+        Id: "com.stellaops.step.http-request",
+        Name: "HTTP Request Step",
+        Version: "1.0.0",
+        Vendor: "Stella Ops",
+        Description: "Execute HTTP requests as part of workflow");
+
+    public override PluginTrustLevel TrustLevel => PluginTrustLevel.BuiltIn;
+    public override PluginCapabilities Capabilities =>
+        PluginCapabilities.WorkflowStep | PluginCapabilities.Network;
+
+    public override StepPluginMetadata StepMetadata => new(
+        StepType: "http-request",
+        DisplayName: "HTTP Request",
+        Description: "Execute an HTTP request and capture the response",
+        Category: "Integration",
+        ConfigSchema: CreateConfigSchema(),
+        OutputSchema: CreateOutputSchema(),
+        SupportsRetry: true,
+        DefaultTimeout: TimeSpan.FromSeconds(30));
+
+    protected override Task InitializeCoreAsync(IPluginContext context, CancellationToken ct)
+    {
+        _httpClient = context.HttpClientFactory.CreateClient("HttpRequestStep");
+        return Task.CompletedTask;
+    }
+
+    public override async Task<StepResult> ExecuteStepAsync(
+        StepExecutionContext context,
+        CancellationToken ct)
+    {
+        var config = context.Configuration;
+
+        var method = GetConfig(config, "method", "GET");
+        var url = GetConfig(config, "url", "");
+        var headers = GetConfig<Dictionary<string, string>>(config, "headers", new());
+        var body = GetConfig<string?>(config, "body", null);
+        var expectedStatus = GetConfig<int?>(config, "expectedStatus", null);
+
+        if (string.IsNullOrEmpty(url))
+        {
+            return Failure("URL is required");
+        }
+
+        Log(context, LogLevel.Information, "Executing {Method} request to {Url}", method, url);
+
+        var request = new HttpRequestMessage(new HttpMethod(method), url);
+
+        foreach (var header in headers)
+        {
+            request.Headers.TryAddWithoutValidation(header.Key, header.Value);
+        }
+
+        if (!string.IsNullOrEmpty(body))
+        {
+            request.Content = new StringContent(body, Encoding.UTF8, "application/json");
+        }
+
+        var response = await _httpClient!.SendAsync(request, ct);
+
+        var responseBody = await response.Content.ReadAsStringAsync(ct);
+        var statusCode = (int)response.StatusCode;
+
+        await WriteOutputAsync(context, "statusCode", statusCode);
+        await WriteOutputAsync(context, "body", responseBody);
+        await WriteOutputAsync(context, "headers",
+            response.Headers.ToDictionary(h => h.Key, h => string.Join(", ", h.Value)));
+
+        if (expectedStatus.HasValue && statusCode != expectedStatus.Value)
+        {
+            return Failure($"Expected status {expectedStatus.Value} but got {statusCode}");
+        }
+
+        if (!response.IsSuccessStatusCode)
+        {
+            return Failure($"Request failed with status {statusCode}");
+        }
+
+        return Success(new Dictionary<string, object>
+        {
+            ["statusCode"] = statusCode,
+            ["body"] = responseBody
+        });
+    }
+
+    private static JsonSchema CreateConfigSchema()
+    {
+        return JsonSchema.Parse("""
+            {
+              "type": "object",
+              "properties": {
+                "method": { "type": "string", "enum": ["GET", "POST", "PUT", "DELETE", "PATCH"] },
+                "url": { "type": "string", "format": "uri" },
+                "headers": { "type": "object", "additionalProperties": { "type": "string" } },
+                "body": { "type": "string" },
+                "expectedStatus": { "type": "integer" }
+              },
+              "required": ["url"]
+            }
+            """);
+    }
+
+    private static JsonSchema CreateOutputSchema()
+    {
+        return JsonSchema.Parse("""
+            {
+              "type": "object",
+              "properties": {
+                "statusCode": { "type": "integer" },
+                "body": { "type": "string" },
+                "headers": { "type": "object" }
+              }
+            }
+            """);
+    }
+
+    private static T GetConfig<T>(JsonElement config, string key, T defaultValue)
+    {
+        if (config.TryGetProperty(key, out var element))
+        {
+            try { return element.Deserialize<T>() ?? defaultValue; }
+            catch { return defaultValue; }
+        }
+        return defaultValue;
+    }
+
+    public override ValueTask DisposeAsync()
+    {
+        _httpClient?.Dispose();
+        return base.DisposeAsync();
+    }
+}
+```
+
+---
+
+## NuGet Package Configuration
+
+```xml
+<Project Sdk="Microsoft.NET.Sdk">
+
+  <PropertyGroup>
+    <TargetFramework>net10.0</TargetFramework>
+    <Nullable>enable</Nullable>
+    <ImplicitUsings>enable</ImplicitUsings>
+    <LangVersion>preview</LangVersion>
+
+    <PackageId>StellaOps.ReleaseOrchestrator.Plugin.Sdk</PackageId>
+    <Version>1.0.0</Version>
+    <Authors>Stella Ops</Authors>
+    <Company>Stella Ops</Company>
+    <Description>SDK for building Stella Ops Release Orchestrator plugins (steps, gates, connectors)</Description>
+    <PackageTags>stellaops;plugin;step;gate;connector;release;orchestrator;workflow</PackageTags>
+    <PackageLicenseExpression>AGPL-3.0-or-later</PackageLicenseExpression>
+    <PackageProjectUrl>https://stellaops.io</PackageProjectUrl>
+    <RepositoryUrl>https://git.stella-ops.org/stella-ops.org/git.stella-ops.org</RepositoryUrl>
+    <RepositoryType>git</RepositoryType>
+    <PackageReadmeFile>README.md</PackageReadmeFile>
+    <GeneratePackageOnBuild>true</GeneratePackageOnBuild>
+  </PropertyGroup>
+
+  <ItemGroup>
+    <!-- ... -->
+  </ItemGroup>
+
+</Project>
+```
+
+---
+
+## Acceptance Criteria
+
+- [ ] `IStepPlugin` and `StepPluginBase` enable step development
+- [ ] `IGatePlugin` and `GatePluginBase` enable gate development
+- [ ] Connector base classes for all categories (SCM, Registry, Vault, Notify)
+- [ ] `StepTestHost` enables step unit testing
+- [ ] `GateTestHost` enables gate unit testing
+- [ ] `ConnectorTestHost` enables connector unit testing
+- [ ] Mock contexts for releases and environments
+- [ ] Project templates install via `dotnet new`
+- [ ] Sample plugins demonstrate best practices
+- [ ] SDK documentation complete
+- [ ] NuGet package builds
+
+---
+
+## Test Plan
+
+### Unit Tests
+
+| Test | Description |
+|------|-------------|
+| `StepPluginBase_ProvidesDefinitions` | Step definitions registered |
+| `GatePluginBase_ProvidesDefinitions` | Gate definitions registered |
+| `StepTestHost_ExecutesStep` | Test host works |
+| `GateTestHost_EvaluatesGate` | Test host works |
+| `MockContexts_ProvideDefaults` | Mock contexts usable |
+
+### Integration Tests
+
+| Test | Description |
+|------|-------------|
+| `SampleStepPlugin_ExecutesCorrectly` | HTTP request step works |
+| `TemplateProject_Builds` | Template generates valid project |
+
+---
+
+## Dependencies
+
+| Dependency | Type | Status |
+|------------|------|--------|
+| 100_012 Plugin SDK | Internal | TODO |
+| 101_002 Registry Extensions | Internal | TODO |
+| 101_003 Loader Extensions | Internal | TODO |
+
+---
+
+## Delivery Tracker
+
+| Deliverable | Status | Notes |
+|-------------|--------|-------|
+| IStepPlugin interface | TODO | |
+| StepPluginBase | TODO | |
+| IGatePlugin interface | TODO | |
+| GatePluginBase | TODO | |
+| Connector base classes (5) | TODO | |
+| StepTestHost | TODO | |
+| GateTestHost | TODO | |
+| ConnectorTestHost | TODO | |
+| Mock contexts | TODO | |
+| Project templates | TODO | |
+| Sample plugins (3) | TODO | |
+| NuGet package | TODO | |
+| SDK documentation | TODO | |
+
+---
+
+## Execution Log
+
+| Date | Entry |
+|------|-------|
+| 10-Jan-2026 | Sprint created |
+| 10-Jan-2026 | Refocused on Release Orchestrator-specific SDK extensions (builds on Phase 100 core) |
diff --git a/docs/implplan/SPRINT_20260110_102_000_INDEX_integration_hub.md b/docs/implplan/SPRINT_20260110_102_000_INDEX_integration_hub.md
new file mode 100644
index 000000000..cf648e2ec
--- /dev/null
+++ b/docs/implplan/SPRINT_20260110_102_000_INDEX_integration_hub.md
@@ -0,0 +1,201 @@
+# SPRINT INDEX: Phase 2 - Integration Hub
+
+> **Epic:** Release Orchestrator
+> **Phase:** 2 - Integration Hub
+> **Batch:** 102
+> **Status:** TODO
+> **Parent:** [100_000_INDEX](SPRINT_20260110_100_000_INDEX_release_orchestrator.md)
+
+---
+
+## Overview
+
+Phase 2 builds the Integration Hub - the system for connecting to external SCM, CI, Registry, Vault, and Notification services. Includes the connector runtime and built-in connectors.
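+The connector runtime in this phase wraps every outbound call in retry, circuit-breaker, and rate-limit policies. A minimal sketch of the retry-with-exponential-backoff piece (all names here are illustrative, standing in for the `ConnectorRetryPolicy` deliverable, not its final shape):
+
+```csharp
+using System;
+using System.Threading.Tasks;
+
+// Minimal retry-with-exponential-backoff helper. A real policy would also
+// add jitter and distinguish transient from permanent failures.
+public static class RetrySketch
+{
+    public static async Task<T> ExecuteAsync<T>(
+        Func<Task<T>> action,
+        int maxAttempts = 3,
+        TimeSpan? baseDelay = null)
+    {
+        var delay = baseDelay ?? TimeSpan.FromMilliseconds(200);
+        for (var attempt = 1; ; attempt++)
+        {
+            try
+            {
+                return await action();
+            }
+            catch (Exception) when (attempt < maxAttempts)
+            {
+                // Back off 1x, 2x, 4x... the base delay before the next attempt.
+                await Task.Delay(delay);
+                delay *= 2;
+            }
+        }
+    }
+}
+```
+
+A connector call would then be wrapped as `await RetrySketch.ExecuteAsync(() => connector.ListTagsAsync(...))`; the final exception is rethrown once `maxAttempts` is exhausted, letting the circuit breaker count the failure.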
+ +### Objectives + +- Implement Integration Manager for CRUD operations +- Build Connector Runtime for plugin execution +- Create built-in SCM connectors (GitHub, GitLab, Gitea) +- Create built-in Registry connectors (Docker Hub, Harbor, ACR, ECR, GCR) +- Create built-in Vault connector +- Implement Doctor checks for integration health + +--- + +## Sprint Structure + +| Sprint ID | Title | Module | Status | Dependencies | +|-----------|-------|--------|--------|--------------| +| 102_001 | Integration Manager | INTHUB | TODO | 101_002 | +| 102_002 | Connector Runtime | INTHUB | TODO | 102_001 | +| 102_003 | Built-in SCM Connectors | INTHUB | TODO | 102_002 | +| 102_004 | Built-in Registry Connectors | INTHUB | TODO | 102_002 | +| 102_005 | Built-in Vault Connector | INTHUB | TODO | 102_002 | +| 102_006 | Doctor Checks | INTHUB | TODO | 102_002 | + +--- + +## Architecture + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ INTEGRATION HUB │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ INTEGRATION MANAGER (102_001) │ │ +│ │ │ │ +│ │ - Integration CRUD - Config encryption │ │ +│ │ - Health status tracking - Integration events │ │ +│ │ - Tenant isolation - Audit logging │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ CONNECTOR RUNTIME (102_002) │ │ +│ │ │ │ +│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │ +│ │ │ Connector │ │ Connector │ │ Pool │ │ │ +│ │ │ Factory │ │ Pool │ │ Manager │ │ │ +│ │ └─────────────┘ └─────────────┘ └─────────────┘ │ │ +│ │ │ │ +│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │ +│ │ │ Retry │ │ Circuit │ │ Rate │ │ │ +│ │ │ Policy │ │ Breaker │ │ Limiter │ │ │ +│ │ └─────────────┘ └─────────────┘ └─────────────┘ │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ 
┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ BUILT-IN CONNECTORS │ │ +│ │ │ │ +│ │ ┌─────────────────────┐ ┌─────────────────────┐ │ │ +│ │ │ SCM (102_003) │ │ Registry (102_004) │ │ │ +│ │ │ │ │ │ │ │ +│ │ │ - GitHub │ │ - Docker Hub │ │ │ +│ │ │ - GitLab │ │ - Harbor │ │ │ +│ │ │ - Gitea │ │ - ACR / ECR / GCR │ │ │ +│ │ │ - Azure DevOps │ │ - Generic OCI │ │ │ +│ │ └─────────────────────┘ └─────────────────────┘ │ │ +│ │ │ │ +│ │ ┌─────────────────────┐ ┌─────────────────────┐ │ │ +│ │ │ Vault (102_005) │ │ Doctor (102_006) │ │ │ +│ │ │ │ │ │ │ │ +│ │ │ - HashiCorp Vault │ │ - Connectivity │ │ │ +│ │ │ - Azure Key Vault │ │ - Credentials │ │ │ +│ │ │ - AWS Secrets Mgr │ │ - Permissions │ │ │ +│ │ │ │ │ - Rate limits │ │ │ +│ │ └─────────────────────┘ └─────────────────────┘ │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +--- + +## Deliverables Summary + +### 102_001: Integration Manager + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `IIntegrationManager` | Interface | CRUD operations | +| `IntegrationManager` | Class | Implementation | +| `IntegrationStore` | Class | Database persistence | +| `IntegrationEncryption` | Class | Config encryption | +| `IntegrationEvents` | Events | Domain events | + +### 102_002: Connector Runtime + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `IConnectorFactory` | Interface | Creates connectors | +| `ConnectorFactory` | Class | Plugin-aware factory | +| `ConnectorPool` | Class | Connection pooling | +| `ConnectorRetryPolicy` | Class | Retry with backoff | +| `ConnectorCircuitBreaker` | Class | Fault tolerance | +| `ConnectorRateLimiter` | Class | Rate limiting | + +### 102_003: Built-in SCM Connectors + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `GitHubConnector` | 
Connector | GitHub.com / GHE | +| `GitLabConnector` | Connector | GitLab.com / Self-hosted | +| `GiteaConnector` | Connector | Gitea self-hosted | +| `AzureDevOpsConnector` | Connector | Azure DevOps Services | + +### 102_004: Built-in Registry Connectors + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `DockerHubConnector` | Connector | Docker Hub | +| `HarborConnector` | Connector | Harbor registry | +| `AcrConnector` | Connector | Azure Container Registry | +| `EcrConnector` | Connector | AWS ECR | +| `GcrConnector` | Connector | Google Container Registry | +| `GenericOciConnector` | Connector | Any OCI-compliant registry | + +### 102_005: Built-in Vault Connector + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `HashiCorpVaultConnector` | Connector | HashiCorp Vault | +| `AzureKeyVaultConnector` | Connector | Azure Key Vault | +| `AwsSecretsManagerConnector` | Connector | AWS Secrets Manager | + +### 102_006: Doctor Checks + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `IDoctorCheck` | Interface | Health check contract | +| `ConnectivityCheck` | Check | Network connectivity | +| `CredentialsCheck` | Check | Credential validity | +| `PermissionsCheck` | Check | Required permissions | +| `RateLimitCheck` | Check | Rate limit status | +| `DoctorReport` | Record | Aggregated results | + +--- + +## Dependencies + +### External Dependencies + +| Dependency | Purpose | +|------------|---------| +| Octokit | GitHub API client | +| GitLabApiClient | GitLab API client | +| AWSSDK.* | AWS service clients | +| Azure.* | Azure service clients | +| Docker.DotNet | Docker API client | + +### Internal Dependencies + +| Module | Purpose | +|--------|---------| +| 101_002 Plugin Registry | Plugin discovery | +| 101_003 Plugin Loader | Plugin execution | +| Authority | Tenant context, credentials | + +--- + +## Acceptance Criteria + +- [ ] Integration CRUD operations work +- [ ] 
Config encryption with tenant keys +- [ ] Connector factory creates correct instances +- [ ] Connection pooling reduces overhead +- [ ] Retry policy handles transient failures +- [ ] Circuit breaker prevents cascading failures +- [ ] All built-in SCM connectors work +- [ ] All built-in registry connectors work +- [ ] Vault connectors retrieve secrets +- [ ] Doctor checks identify issues +- [ ] Unit test coverage ≥80% +- [ ] Integration tests pass + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Phase 2 index created | diff --git a/docs/implplan/SPRINT_20260110_102_001_INTHUB_integration_manager.md b/docs/implplan/SPRINT_20260110_102_001_INTHUB_integration_manager.md new file mode 100644 index 000000000..8cd6d9e17 --- /dev/null +++ b/docs/implplan/SPRINT_20260110_102_001_INTHUB_integration_manager.md @@ -0,0 +1,328 @@ +# SPRINT: Integration Manager + +> **Sprint ID:** 102_001 +> **Module:** INTHUB +> **Phase:** 2 - Integration Hub +> **Status:** TODO +> **Parent:** [102_000_INDEX](SPRINT_20260110_102_000_INDEX_integration_hub.md) + +--- + +## Overview + +Implement the Integration Manager for creating, updating, and managing integrations with external systems. Includes encrypted configuration storage and tenant isolation. 
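+The encrypted-configuration storage can be illustrated with a minimal AES round trip that prepends the random IV to the ciphertext, the byte layout the `IntegrationEncryption` deliverable uses. This sketch substitutes a fixed test key for the per-tenant key resolution; all names are illustrative:
+
+```csharp
+using System;
+using System.Security.Cryptography;
+using System.Text;
+
+// Round trip of the encrypted-config blob layout: [16-byte IV][AES-CBC ciphertext].
+// Prepending the IV makes each blob self-contained for decryption.
+public static class ConfigCryptoSketch
+{
+    public static byte[] Encrypt(byte[] key, string json)
+    {
+        using var aes = Aes.Create();
+        aes.Key = key;
+        aes.GenerateIV();
+        using var enc = aes.CreateEncryptor();
+        var plaintext = Encoding.UTF8.GetBytes(json);
+        var ciphertext = enc.TransformFinalBlock(plaintext, 0, plaintext.Length);
+        var result = new byte[aes.IV.Length + ciphertext.Length];
+        aes.IV.CopyTo(result, 0);
+        ciphertext.CopyTo(result, aes.IV.Length);
+        return result;
+    }
+
+    public static string Decrypt(byte[] key, byte[] blob)
+    {
+        using var aes = Aes.Create();
+        aes.Key = key;
+        aes.IV = blob[..16]; // AES block size: the first 16 bytes are the IV
+        using var dec = aes.CreateDecryptor();
+        var rest = blob[16..];
+        var plaintext = dec.TransformFinalBlock(rest, 0, rest.Length);
+        return Encoding.UTF8.GetString(plaintext);
+    }
+}
+```
+
+Note that CBC alone is unauthenticated; whether to pair this with an HMAC or use AES-GCM instead is a decision for the implementation sprint.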
+ +### Objectives + +- CRUD operations for integrations +- Encrypted configuration storage +- Health status tracking +- Tenant isolation +- Domain events for integration changes + +### Working Directory + +``` +src/ReleaseOrchestrator/ +├── __Libraries/ +│ └── StellaOps.ReleaseOrchestrator.IntegrationHub/ +│ ├── Manager/ +│ │ ├── IIntegrationManager.cs +│ │ ├── IntegrationManager.cs +│ │ └── IntegrationValidator.cs +│ ├── Store/ +│ │ ├── IIntegrationStore.cs +│ │ ├── IntegrationStore.cs +│ │ └── IntegrationMapper.cs +│ ├── Encryption/ +│ │ ├── IIntegrationEncryption.cs +│ │ └── IntegrationEncryption.cs +│ ├── Events/ +│ │ ├── IntegrationCreated.cs +│ │ ├── IntegrationUpdated.cs +│ │ └── IntegrationDeleted.cs +│ └── Models/ +│ ├── Integration.cs +│ ├── IntegrationType.cs +│ └── HealthStatus.cs +└── __Tests/ + └── StellaOps.ReleaseOrchestrator.IntegrationHub.Tests/ + └── Manager/ +``` + +--- + +## Architecture Reference + +- [Integration Hub](../modules/release-orchestrator/modules/integration-hub.md) +- [Security Overview](../modules/release-orchestrator/security/overview.md) + +--- + +## Deliverables + +### IIntegrationManager Interface + +```csharp +namespace StellaOps.ReleaseOrchestrator.IntegrationHub.Manager; + +public interface IIntegrationManager +{ + Task CreateAsync( + CreateIntegrationRequest request, + CancellationToken ct = default); + + Task UpdateAsync( + Guid id, + UpdateIntegrationRequest request, + CancellationToken ct = default); + + Task DeleteAsync(Guid id, CancellationToken ct = default); + + Task GetAsync(Guid id, CancellationToken ct = default); + + Task GetByNameAsync( + string name, + CancellationToken ct = default); + + Task> ListAsync( + IntegrationFilter? 
filter = null, + CancellationToken ct = default); + + Task> ListByTypeAsync( + IntegrationType type, + CancellationToken ct = default); + + Task SetEnabledAsync(Guid id, bool enabled, CancellationToken ct = default); + + Task UpdateHealthAsync( + Guid id, + HealthStatus status, + CancellationToken ct = default); + + Task TestConnectionAsync( + Guid id, + CancellationToken ct = default); +} + +public sealed record CreateIntegrationRequest( + string Name, + string DisplayName, + IntegrationType Type, + JsonElement Configuration +); + +public sealed record UpdateIntegrationRequest( + string? DisplayName = null, + JsonElement? Configuration = null, + bool? IsEnabled = null +); + +public sealed record IntegrationFilter( + IntegrationType? Type = null, + bool? IsEnabled = null, + HealthStatus? HealthStatus = null +); +``` + +### Integration Model + +```csharp +namespace StellaOps.ReleaseOrchestrator.IntegrationHub.Models; + +public sealed record Integration +{ + public required Guid Id { get; init; } + public required Guid TenantId { get; init; } + public required string Name { get; init; } + public required string DisplayName { get; init; } + public required IntegrationType Type { get; init; } + public required bool IsEnabled { get; init; } + public required HealthStatus HealthStatus { get; init; } + public DateTimeOffset? LastHealthCheck { get; init; } + public DateTimeOffset CreatedAt { get; init; } + public DateTimeOffset UpdatedAt { get; init; } + public Guid CreatedBy { get; init; } + + // Configuration is decrypted on demand, not stored in memory + public JsonElement? 
Configuration { get; init; } +} + +public enum IntegrationType +{ + Scm, + Ci, + Registry, + Vault, + Notify +} + +public enum HealthStatus +{ + Unknown, + Healthy, + Degraded, + Unhealthy +} +``` + +### IntegrationEncryption + +```csharp +namespace StellaOps.ReleaseOrchestrator.IntegrationHub.Encryption; + +public interface IIntegrationEncryption +{ + Task EncryptAsync( + Guid tenantId, + JsonElement configuration, + CancellationToken ct = default); + + Task DecryptAsync( + Guid tenantId, + byte[] encryptedConfig, + CancellationToken ct = default); +} + +public sealed class IntegrationEncryption : IIntegrationEncryption +{ + private readonly ITenantKeyProvider _keyProvider; + + public IntegrationEncryption(ITenantKeyProvider keyProvider) + { + _keyProvider = keyProvider; + } + + public async Task EncryptAsync( + Guid tenantId, + JsonElement configuration, + CancellationToken ct = default) + { + var key = await _keyProvider.GetKeyAsync(tenantId, ct); + var json = configuration.GetRawText(); + var plaintext = Encoding.UTF8.GetBytes(json); + + using var aes = Aes.Create(); + aes.Key = key; + aes.GenerateIV(); + + using var encryptor = aes.CreateEncryptor(); + var ciphertext = encryptor.TransformFinalBlock( + plaintext, 0, plaintext.Length); + + // Prepend IV to ciphertext + var result = new byte[aes.IV.Length + ciphertext.Length]; + aes.IV.CopyTo(result, 0); + ciphertext.CopyTo(result, aes.IV.Length); + + return result; + } + + public async Task DecryptAsync( + Guid tenantId, + byte[] encryptedConfig, + CancellationToken ct = default) + { + var key = await _keyProvider.GetKeyAsync(tenantId, ct); + + using var aes = Aes.Create(); + aes.Key = key; + + // Extract IV from beginning + var iv = encryptedConfig[..16]; + var ciphertext = encryptedConfig[16..]; + aes.IV = iv; + + using var decryptor = aes.CreateDecryptor(); + var plaintext = decryptor.TransformFinalBlock( + ciphertext, 0, ciphertext.Length); + + var json = Encoding.UTF8.GetString(plaintext); + return 
JsonDocument.Parse(json).RootElement; + } +} +``` + +### Domain Events + +```csharp +namespace StellaOps.ReleaseOrchestrator.IntegrationHub.Events; + +public sealed record IntegrationCreated( + Guid IntegrationId, + Guid TenantId, + string Name, + IntegrationType Type, + DateTimeOffset CreatedAt, + Guid CreatedBy +) : IDomainEvent; + +public sealed record IntegrationUpdated( + Guid IntegrationId, + Guid TenantId, + IReadOnlyList<string> ChangedFields, + DateTimeOffset UpdatedAt, + Guid UpdatedBy +) : IDomainEvent; + +public sealed record IntegrationDeleted( + Guid IntegrationId, + Guid TenantId, + string Name, + DateTimeOffset DeletedAt, + Guid DeletedBy +) : IDomainEvent; + +public sealed record IntegrationHealthChanged( + Guid IntegrationId, + Guid TenantId, + HealthStatus OldStatus, + HealthStatus NewStatus, + DateTimeOffset ChangedAt +) : IDomainEvent; +``` + +--- + +## Acceptance Criteria + +- [ ] Create integration with encrypted config +- [ ] Update integration preserves encryption +- [ ] Delete integration removes all data +- [ ] List integrations with filtering +- [ ] Tenant isolation enforced +- [ ] Domain events published +- [ ] Health status tracked +- [ ] Connection test works +- [ ] Unit test coverage ≥85% + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 101_001 Database Schema | Internal | TODO | +| 101_002 Plugin Registry | Internal | TODO | +| Authority | Internal | Exists | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| IIntegrationManager | TODO | | +| IntegrationManager | TODO | | +| IntegrationStore | TODO | | +| IntegrationEncryption | TODO | | +| Domain events | TODO | | +| Unit tests | TODO | | +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_102_002_INTHUB_connector_runtime.md b/docs/implplan/SPRINT_20260110_102_002_INTHUB_connector_runtime.md new file mode 
100644 index 000000000..59e30ea04 --- /dev/null +++ b/docs/implplan/SPRINT_20260110_102_002_INTHUB_connector_runtime.md @@ -0,0 +1,522 @@ +# SPRINT: Connector Runtime + +> **Sprint ID:** 102_002 +> **Module:** INTHUB +> **Phase:** 2 - Integration Hub +> **Status:** TODO +> **Parent:** [102_000_INDEX](SPRINT_20260110_102_000_INDEX_integration_hub.md) + +--- + +## Overview + +Build the Connector Runtime that manages connector instantiation, pooling, and resilience patterns. It handles built-in and plugin connectors uniformly. + +### Objectives + +- Connector factory for creating instances +- Connection pooling for efficiency +- Retry policies with exponential backoff +- Circuit breaker for fault isolation +- Rate limiting per integration + +### Working Directory + +``` +src/ReleaseOrchestrator/ +├── __Libraries/ +│ └── StellaOps.ReleaseOrchestrator.IntegrationHub/ +│ └── Runtime/ +│ ├── IConnectorFactory.cs +│ ├── ConnectorFactory.cs +│ ├── ConnectorPool.cs +│ ├── ConnectorPoolManager.cs +│ ├── ConnectorRetryPolicy.cs +│ ├── ConnectorCircuitBreaker.cs +│ ├── ConnectorRateLimiter.cs +│ └── ConnectorContext.cs +└── __Tests/ + └── StellaOps.ReleaseOrchestrator.IntegrationHub.Tests/ + └── Runtime/ +``` + +--- + +## Deliverables + +### IConnectorFactory + +```csharp +namespace StellaOps.ReleaseOrchestrator.IntegrationHub.Runtime; + +public interface IConnectorFactory +{ + Task<IConnectorPlugin> CreateAsync( + Integration integration, + CancellationToken ct = default); + + Task<T> CreateAsync<T>( + Integration integration, + CancellationToken ct = default) where T : IConnectorPlugin; + + bool CanCreate(IntegrationType type, string? 
pluginName = null); + + IReadOnlyList<string> GetAvailableConnectors(IntegrationType type); +} + +public sealed class ConnectorFactory : IConnectorFactory +{ + private readonly IPluginRegistry _pluginRegistry; + private readonly IPluginLoader _pluginLoader; + private readonly IIntegrationEncryption _encryption; + private readonly IServiceProvider _serviceProvider; + private readonly ILogger<ConnectorFactory> _logger; + + private readonly Dictionary<string, Type> _builtInConnectors = new() + { + ["github"] = typeof(GitHubConnector), + ["gitlab"] = typeof(GitLabConnector), + ["gitea"] = typeof(GiteaConnector), + ["dockerhub"] = typeof(DockerHubConnector), + ["harbor"] = typeof(HarborConnector), + ["acr"] = typeof(AcrConnector), + ["ecr"] = typeof(EcrConnector), + ["gcr"] = typeof(GcrConnector), + ["hashicorp-vault"] = typeof(HashiCorpVaultConnector), + ["azure-keyvault"] = typeof(AzureKeyVaultConnector) + }; + + public async Task<IConnectorPlugin> CreateAsync( + Integration integration, + CancellationToken ct = default) + { + var config = await _encryption.DecryptAsync( + integration.TenantId, + integration.EncryptedConfig!, + ct); + + // Check built-in first + var connectorKey = GetConnectorKey(config); + if (_builtInConnectors.TryGetValue(connectorKey, out var type)) + { + return CreateBuiltIn(type, integration, config); + } + + // Fall back to plugin + var plugin = await _pluginLoader.GetPlugin(connectorKey); + if (plugin?.Sandbox is not null) + { + return CreateFromPlugin(plugin, integration, config); + } + + throw new ConnectorNotFoundException( + $"No connector found for type {connectorKey}"); + } +} +``` + +### ConnectorPool + +```csharp +namespace StellaOps.ReleaseOrchestrator.IntegrationHub.Runtime; + +public sealed class ConnectorPool : IAsyncDisposable +{ + private readonly Integration _integration; + private readonly IConnectorFactory _factory; + private readonly Channel<PooledConnector> _available; + private readonly ConcurrentDictionary<Guid, PooledConnector> _inUse = new(); + private readonly int _maxSize; + private int _currentSize; + + public 
ConnectorPool( + Integration integration, + IConnectorFactory factory, + int maxSize = 10) + { + _integration = integration; + _factory = factory; + _maxSize = maxSize; + _available = Channel.CreateBounded<PooledConnector>(maxSize); + } + + public async Task<PooledConnector> AcquireAsync( + CancellationToken ct = default) + { + // Try to get an existing pooled connector + if (_available.Reader.TryRead(out var existing)) + { + existing.MarkInUse(); + _inUse[existing.Id] = existing; + return existing; + } + + // Create new if under limit + if (Interlocked.Increment(ref _currentSize) <= _maxSize) + { + var connector = await _factory.CreateAsync(_integration, ct); + var pooled = new PooledConnector(connector, this); + pooled.MarkInUse(); + _inUse[pooled.Id] = pooled; + return pooled; + } + + Interlocked.Decrement(ref _currentSize); + + // Wait for one to become available + var released = await _available.Reader.ReadAsync(ct); + released.MarkInUse(); + _inUse[released.Id] = released; + return released; + } + + public void Release(PooledConnector connector) + { + _inUse.TryRemove(connector.Id, out _); + connector.MarkAvailable(); + + if (!_available.Writer.TryWrite(connector)) + { + // Pool full, dispose + connector.DisposeConnector(); + Interlocked.Decrement(ref _currentSize); + } + } + + public async ValueTask DisposeAsync() + { + _available.Writer.Complete(); + + await foreach (var connector in _available.Reader.ReadAllAsync()) + { + connector.DisposeConnector(); + } + + foreach (var (_, connector) in _inUse) + { + connector.DisposeConnector(); + } + } +} + +public sealed class PooledConnector : IAsyncDisposable +{ + private readonly ConnectorPool _pool; + private readonly IConnectorPlugin _connector; + + public Guid Id { get; } = Guid.NewGuid(); + public IConnectorPlugin Connector => _connector; + public bool InUse { get; private set; } + public DateTimeOffset LastUsed { get; private set; } + + internal PooledConnector(IConnectorPlugin connector, ConnectorPool pool) + { + _connector = connector; + _pool = pool; + } + + internal void 
MarkInUse() + { + InUse = true; + LastUsed = TimeProvider.System.GetUtcNow(); + } + + internal void MarkAvailable() => InUse = false; + + internal void DisposeConnector() + { + if (_connector is IDisposable disposable) + disposable.Dispose(); + } + + public ValueTask DisposeAsync() + { + _pool.Release(this); + return ValueTask.CompletedTask; + } +} +``` + +### ConnectorRetryPolicy + +```csharp +namespace StellaOps.ReleaseOrchestrator.IntegrationHub.Runtime; + +public sealed class ConnectorRetryPolicy +{ + private readonly int _maxRetries; + private readonly TimeSpan _baseDelay; + private readonly ILogger _logger; + + public ConnectorRetryPolicy( + int maxRetries = 3, + TimeSpan? baseDelay = null, + ILogger? logger = null) + { + _maxRetries = maxRetries; + _baseDelay = baseDelay ?? TimeSpan.FromMilliseconds(200); + _logger = logger ?? NullLogger.Instance; + } + + public async Task<T> ExecuteAsync<T>( + Func<CancellationToken, Task<T>> action, + CancellationToken ct = default) + { + var attempt = 0; + var exceptions = new List<Exception>(); + + while (true) + { + try + { + return await action(ct); + } + catch (Exception ex) when (IsTransient(ex) && attempt < _maxRetries) + { + exceptions.Add(ex); + attempt++; + + var delay = CalculateDelay(attempt); + _logger.LogWarning( + "Connector operation failed (attempt {Attempt}/{Max}), retrying in {Delay}ms: {Error}", + attempt, _maxRetries, delay.TotalMilliseconds, ex.Message); + + await Task.Delay(delay, ct); + } + catch (Exception ex) + { + exceptions.Add(ex); + throw new ConnectorRetryExhaustedException( + $"Operation failed after {attempt + 1} attempts", + new AggregateException(exceptions)); + } + } + } + + private TimeSpan CalculateDelay(int attempt) + { + // Exponential backoff with jitter + var exponential = Math.Pow(2, attempt - 1); + var jitter = Random.Shared.NextDouble() * 0.3 + 0.85; // 0.85-1.15 + return TimeSpan.FromMilliseconds( + _baseDelay.TotalMilliseconds * exponential * jitter); + } + + private static bool IsTransient(Exception ex) => + ex is 
HttpRequestException or + TimeoutException or + TaskCanceledException { CancellationToken.IsCancellationRequested: false } or + OperationCanceledException { CancellationToken.IsCancellationRequested: false }; +} +``` + +### ConnectorCircuitBreaker + +```csharp +namespace StellaOps.ReleaseOrchestrator.IntegrationHub.Runtime; + +public sealed class ConnectorCircuitBreaker +{ + private readonly int _failureThreshold; + private readonly TimeSpan _resetTimeout; + private readonly ILogger _logger; + + private int _failureCount; + private CircuitState _state = CircuitState.Closed; + private DateTimeOffset _lastFailure; + private DateTimeOffset _openedAt; + + public ConnectorCircuitBreaker( + int failureThreshold = 5, + TimeSpan? resetTimeout = null, + ILogger? logger = null) + { + _failureThreshold = failureThreshold; + _resetTimeout = resetTimeout ?? TimeSpan.FromMinutes(1); + _logger = logger ?? NullLogger.Instance; + } + + public CircuitState State => _state; + + public async Task<T> ExecuteAsync<T>( + Func<CancellationToken, Task<T>> action, + CancellationToken ct = default) + { + if (_state == CircuitState.Open) + { + if (ShouldAttemptReset()) + { + _state = CircuitState.HalfOpen; + _logger.LogInformation("Circuit breaker half-open, attempting reset"); + } + else + { + throw new CircuitBreakerOpenException( + $"Circuit breaker is open, retry after {_openedAt.Add(_resetTimeout)}"); + } + } + + try + { + var result = await action(ct); + OnSuccess(); + return result; + } + catch (Exception ex) when (!IsCritical(ex)) + { + OnFailure(); + throw; + } + } + + private void OnSuccess() + { + _failureCount = 0; + if (_state == CircuitState.HalfOpen) + { + _state = CircuitState.Closed; + _logger.LogInformation("Circuit breaker closed after successful request"); + } + } + + private void OnFailure() + { + _lastFailure = TimeProvider.System.GetUtcNow(); + _failureCount++; + + if (_state == CircuitState.HalfOpen) + { + _state = CircuitState.Open; + _openedAt = _lastFailure; + _logger.LogWarning("Circuit breaker 
opened after half-open failure"); + } + else if (_failureCount >= _failureThreshold) + { + _state = CircuitState.Open; + _openedAt = _lastFailure; + _logger.LogWarning( + "Circuit breaker opened after {Count} failures", + _failureCount); + } + } + + private bool ShouldAttemptReset() => + TimeProvider.System.GetUtcNow() >= _openedAt.Add(_resetTimeout); + + private static bool IsCritical(Exception ex) => + ex is OutOfMemoryException or StackOverflowException; +} + +public enum CircuitState +{ + Closed, + Open, + HalfOpen +} +``` + +### ConnectorRateLimiter + +```csharp +namespace StellaOps.ReleaseOrchestrator.IntegrationHub.Runtime; + +public sealed class ConnectorRateLimiter : IAsyncDisposable +{ + private readonly SemaphoreSlim _semaphore; + private readonly RateLimitLease[] _leases; + private readonly TimeSpan _window; + private readonly int _limit; + private int _leaseIndex; + + public ConnectorRateLimiter(int requestsPerWindow, TimeSpan window) + { + _limit = requestsPerWindow; + _window = window; + _semaphore = new SemaphoreSlim(requestsPerWindow, requestsPerWindow); + _leases = new RateLimitLease[requestsPerWindow]; + } + + public async Task AcquireAsync(CancellationToken ct = default) + { + await _semaphore.WaitAsync(ct); + + var index = Interlocked.Increment(ref _leaseIndex) % _limit; + var oldLease = _leases[index]; + + if (oldLease is not null) + { + var elapsed = TimeProvider.System.GetUtcNow() - oldLease.AcquiredAt; + if (elapsed < _window) + { + await Task.Delay(_window - elapsed, ct); + } + } + + var lease = new RateLimitLease(this); + _leases[index] = lease; + return lease; + } + + private void Release() => _semaphore.Release(); + + public ValueTask DisposeAsync() + { + _semaphore.Dispose(); + return ValueTask.CompletedTask; + } + + private sealed class RateLimitLease : IDisposable + { + private readonly ConnectorRateLimiter _limiter; + public DateTimeOffset AcquiredAt { get; } = TimeProvider.System.GetUtcNow(); + + public 
RateLimitLease(ConnectorRateLimiter limiter) => _limiter = limiter; + public void Dispose() => _limiter.Release(); + } +} +``` + +--- + +## Acceptance Criteria + +- [ ] Factory creates built-in connectors +- [ ] Factory creates plugin connectors +- [ ] Connection pooling works +- [ ] Retry policy retries transient failures +- [ ] Circuit breaker opens on failures +- [ ] Rate limiter enforces limits +- [ ] Metrics exposed for monitoring +- [ ] Unit test coverage ≥85% + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 102_001 Integration Manager | Internal | TODO | +| 101_003 Plugin Loader | Internal | TODO | +| Polly | NuGet | Available | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| IConnectorFactory | TODO | | +| ConnectorFactory | TODO | | +| ConnectorPool | TODO | | +| ConnectorRetryPolicy | TODO | | +| ConnectorCircuitBreaker | TODO | | +| ConnectorRateLimiter | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_102_003_INTHUB_scm_connectors.md b/docs/implplan/SPRINT_20260110_102_003_INTHUB_scm_connectors.md new file mode 100644 index 000000000..313e50f4b --- /dev/null +++ b/docs/implplan/SPRINT_20260110_102_003_INTHUB_scm_connectors.md @@ -0,0 +1,460 @@ +# SPRINT: Built-in SCM Connectors + +> **Sprint ID:** 102_003 +> **Module:** INTHUB +> **Phase:** 2 - Integration Hub +> **Status:** TODO +> **Parent:** [102_000_INDEX](SPRINT_20260110_102_000_INDEX_integration_hub.md) + +--- + +## Overview + +Implement built-in SCM (Source Control Management) connectors for GitHub, GitLab, Gitea, and Azure DevOps. Each connector implements the `IScmConnector` interface. 
+ +### Objectives + +- GitHub connector (GitHub.com and GitHub Enterprise) +- GitLab connector (GitLab.com and self-hosted) +- Gitea connector (self-hosted) +- Azure DevOps connector +- Webhook support for all connectors + +### Working Directory + +``` +src/ReleaseOrchestrator/ +├── __Libraries/ +│ └── StellaOps.ReleaseOrchestrator.IntegrationHub/ +│ └── Connectors/ +│ └── Scm/ +│ ├── GitHubConnector.cs +│ ├── GitLabConnector.cs +│ ├── GiteaConnector.cs +│ └── AzureDevOpsConnector.cs +└── __Tests/ + └── StellaOps.ReleaseOrchestrator.IntegrationHub.Tests/ + └── Connectors/ + └── Scm/ +``` + +--- + +## Deliverables + +### GitHubConnector + +```csharp +namespace StellaOps.ReleaseOrchestrator.IntegrationHub.Connectors.Scm; + +public sealed class GitHubConnector : IScmConnector +{ + public ConnectorCategory Category => ConnectorCategory.Scm; + public IReadOnlyList<string> Capabilities { get; } = [ + "repositories", "commits", "branches", "webhooks", "actions" + ]; + + private GitHubClient? _client; + private ConnectorContext? _context; + + public async Task InitializeAsync( + ConnectorContext context, + CancellationToken ct = default) + { + _context = context; + var config = ParseConfig(context.Configuration); + + var credentials = new Credentials( + config.Token ?? await ResolveTokenAsync(context, ct)); + + _client = new GitHubClient(new ProductHeaderValue("StellaOps")) + { + Credentials = credentials + }; + + if (!string.IsNullOrEmpty(config.BaseUrl)) + { + _client = new GitHubClient( + new ProductHeaderValue("StellaOps"), + new Uri(config.BaseUrl)) + { + Credentials = credentials + }; + } + } + + public async Task<ConfigValidationResult> ValidateConfigAsync( + JsonElement config, + CancellationToken ct = default) + { + var errors = new List<string>(); + var parsed = ParseConfig(config); + + if (string.IsNullOrEmpty(parsed.Token) && + string.IsNullOrEmpty(parsed.TokenSecretRef)) + { + errors.Add("Either 'token' or 'tokenSecretRef' is required"); + } + + return errors.Count == 0 + ? 
ConfigValidationResult.Success() + : ConfigValidationResult.Failure(errors.ToArray()); + } + + public async Task<ConnectionTestResult> TestConnectionAsync( + ConnectorContext context, + CancellationToken ct = default) + { + var sw = Stopwatch.StartNew(); + try + { + var user = await _client!.User.Current(); + return new ConnectionTestResult( + Success: true, + Message: $"Authenticated as {user.Login}", + ResponseTime: sw.Elapsed); + } + catch (Exception ex) + { + return new ConnectionTestResult( + Success: false, + Message: ex.Message, + ResponseTime: sw.Elapsed); + } + } + + public async Task<IReadOnlyList<ScmRepository>> ListRepositoriesAsync( + ConnectorContext context, + string? searchPattern = null, + CancellationToken ct = default) + { + var repos = await _client!.Repository.GetAllForCurrent(); + + var result = repos + .Where(r => searchPattern is null || + r.FullName.Contains(searchPattern, StringComparison.OrdinalIgnoreCase)) + .Select(r => new ScmRepository( + Id: r.Id.ToString(CultureInfo.InvariantCulture), + Name: r.Name, + FullName: r.FullName, + DefaultBranch: r.DefaultBranch, + CloneUrl: r.CloneUrl, + IsPrivate: r.Private)) + .ToList(); + + return result; + } + + public async Task<ScmCommit> GetCommitAsync( + ConnectorContext context, + string repository, + string commitSha, + CancellationToken ct = default) + { + var (owner, repo) = ParseRepository(repository); + var commit = await _client!.Repository.Commit.Get(owner, repo, commitSha); + + return new ScmCommit( + Sha: commit.Sha, + Message: commit.Commit.Message, + AuthorName: commit.Commit.Author.Name, + AuthorEmail: commit.Commit.Author.Email, + AuthoredAt: commit.Commit.Author.Date, + ParentSha: commit.Parents.FirstOrDefault()?.Sha); + } + + public async Task<WebhookRegistration> CreateWebhookAsync( + ConnectorContext context, + string repository, + IReadOnlyList<string> events, + string callbackUrl, + CancellationToken ct = default) + { + var (owner, repo) = ParseRepository(repository); + + var webhook = await _client!.Repository.Hooks.Create(owner, repo, new NewRepositoryHook( + 
"web", + new Dictionary<string, string> + { + ["url"] = callbackUrl, + ["content_type"] = "json", + ["secret"] = GenerateWebhookSecret() + }) + { + Events = events.ToArray(), + Active = true + }); + + return new WebhookRegistration( + Id: webhook.Id.ToString(CultureInfo.InvariantCulture), + Url: callbackUrl, + Secret: webhook.Config["secret"], + Events: events); + } + + private static (string Owner, string Repo) ParseRepository(string fullName) + { + var parts = fullName.Split('/'); + return (parts[0], parts[1]); + } +} + +internal sealed record GitHubConfig( + string? BaseUrl, + string? Token, + string? TokenSecretRef +); +``` + +### GitLabConnector + +```csharp +namespace StellaOps.ReleaseOrchestrator.IntegrationHub.Connectors.Scm; + +public sealed class GitLabConnector : IScmConnector +{ + public ConnectorCategory Category => ConnectorCategory.Scm; + public IReadOnlyList<string> Capabilities { get; } = [ + "repositories", "commits", "branches", "webhooks", "pipelines" + ]; + + private GitLabClient? _client; + + public async Task InitializeAsync( + ConnectorContext context, + CancellationToken ct = default) + { + var config = ParseConfig(context.Configuration); + var token = config.Token ?? + await context.SecretResolver.ResolveAsync(config.TokenSecretRef!, ct); + + var baseUrl = config.BaseUrl ?? "https://gitlab.com"; + _client = new GitLabClient(baseUrl, token); + } + + public async Task<IReadOnlyList<ScmRepository>> ListRepositoriesAsync( + ConnectorContext context, + string? 
searchPattern = null, + CancellationToken ct = default) + { + var projects = await _client!.Projects.GetAsync(new ProjectQueryOptions + { + Search = searchPattern, + Membership = true + }); + + return projects.Select(p => new ScmRepository( + Id: p.Id.ToString(CultureInfo.InvariantCulture), + Name: p.Name, + FullName: p.PathWithNamespace, + DefaultBranch: p.DefaultBranch, + CloneUrl: p.HttpUrlToRepo, + IsPrivate: p.Visibility == ProjectVisibility.Private)) + .ToList(); + } + + public async Task<ScmCommit> GetCommitAsync( + ConnectorContext context, + string repository, + string commitSha, + CancellationToken ct = default) + { + var commit = await _client!.Commits.GetAsync(repository, commitSha); + + return new ScmCommit( + Sha: commit.Id, + Message: commit.Message, + AuthorName: commit.AuthorName, + AuthorEmail: commit.AuthorEmail, + AuthoredAt: commit.AuthoredDate, + ParentSha: commit.ParentIds?.FirstOrDefault()); + } + + public async Task<WebhookRegistration> CreateWebhookAsync( + ConnectorContext context, + string repository, + IReadOnlyList<string> events, + string callbackUrl, + CancellationToken ct = default) + { + var secret = GenerateWebhookSecret(); + var hook = await _client!.Projects.CreateWebhookAsync(repository, new CreateWebhookRequest + { + Url = callbackUrl, + Token = secret, + PushEvents = events.Contains("push"), + TagPushEvents = events.Contains("tag_push"), + MergeRequestsEvents = events.Contains("merge_request"), + PipelineEvents = events.Contains("pipeline") + }); + + return new WebhookRegistration( + Id: hook.Id.ToString(CultureInfo.InvariantCulture), + Url: callbackUrl, + Secret: secret, + Events: events); + } +} +``` + +### GiteaConnector + +```csharp +namespace StellaOps.ReleaseOrchestrator.IntegrationHub.Connectors.Scm; + +public sealed class GiteaConnector : IScmConnector +{ + public ConnectorCategory Category => ConnectorCategory.Scm; + public IReadOnlyList<string> Capabilities { get; } = [ + "repositories", "commits", "branches", "webhooks" + ]; + + private HttpClient? 
_httpClient; + private string? _baseUrl; + + public async Task InitializeAsync( + ConnectorContext context, + CancellationToken ct = default) + { + var config = ParseConfig(context.Configuration); + var token = config.Token ?? + await context.SecretResolver.ResolveAsync(config.TokenSecretRef!, ct); + + _baseUrl = config.BaseUrl?.TrimEnd('/'); + + _httpClient = new HttpClient + { + BaseAddress = new Uri(_baseUrl + "/api/v1/") + }; + _httpClient.DefaultRequestHeaders.Authorization = + new AuthenticationHeaderValue("token", token); + } + + public async Task<IReadOnlyList<ScmRepository>> ListRepositoriesAsync( + ConnectorContext context, + string? searchPattern = null, + CancellationToken ct = default) + { + var response = await _httpClient!.GetAsync("user/repos", ct); + response.EnsureSuccessStatusCode(); + + // GiteaRepository: internal DTO mirroring the Gitea API response + var repos = await response.Content + .ReadFromJsonAsync<List<GiteaRepository>>(ct); + + return repos! + .Where(r => searchPattern is null || + r.FullName.Contains(searchPattern, StringComparison.OrdinalIgnoreCase)) + .Select(r => new ScmRepository( + Id: r.Id.ToString(CultureInfo.InvariantCulture), + Name: r.Name, + FullName: r.FullName, + DefaultBranch: r.DefaultBranch, + CloneUrl: r.CloneUrl, + IsPrivate: r.Private)) + .ToList(); + } + + // Additional methods... +} +``` + +### AzureDevOpsConnector + +```csharp +namespace StellaOps.ReleaseOrchestrator.IntegrationHub.Connectors.Scm; + +public sealed class AzureDevOpsConnector : IScmConnector +{ + public ConnectorCategory Category => ConnectorCategory.Scm; + public IReadOnlyList<string> Capabilities { get; } = [ + "repositories", "commits", "branches", "webhooks", "pipelines", "workitems" + ]; + + private VssConnection? _connection; + private GitHttpClient? _gitClient; + + public async Task InitializeAsync( + ConnectorContext context, + CancellationToken ct = default) + { + var config = ParseConfig(context.Configuration); + var pat = config.Pat ?? 
+ await context.SecretResolver.ResolveAsync(config.PatSecretRef!, ct); + + var credentials = new VssBasicCredential(string.Empty, pat); + _connection = new VssConnection(new Uri(config.OrganizationUrl), credentials); + _gitClient = await _connection.GetClientAsync<GitHttpClient>(ct); + } + + public async Task<IReadOnlyList<ScmRepository>> ListRepositoriesAsync( + ConnectorContext context, + string? searchPattern = null, + CancellationToken ct = default) + { + var repos = await _gitClient!.GetRepositoriesAsync(cancellationToken: ct); + + return repos + .Where(r => searchPattern is null || + r.Name.Contains(searchPattern, StringComparison.OrdinalIgnoreCase)) + .Select(r => new ScmRepository( + Id: r.Id.ToString(), + Name: r.Name, + FullName: $"{r.ProjectReference.Name}/{r.Name}", + DefaultBranch: r.DefaultBranch?.Replace("refs/heads/", "") ?? "main", + CloneUrl: r.RemoteUrl, + IsPrivate: true)) // Azure DevOps repos are always private to the org + .ToList(); + } + + // Additional methods... +} +``` + +--- + +## Acceptance Criteria + +- [ ] GitHub connector authenticates +- [ ] GitHub connector lists repositories +- [ ] GitHub connector creates webhooks +- [ ] GitLab connector works +- [ ] Gitea connector works +- [ ] Azure DevOps connector works +- [ ] All connectors handle errors gracefully +- [ ] Webhook secret generation is secure +- [ ] Config validation catches issues +- [ ] Integration tests pass + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 102_002 Connector Runtime | Internal | TODO | +| Octokit | NuGet | Available | +| GitLabApiClient | NuGet | Available | +| Microsoft.TeamFoundationServer.Client | NuGet | Available | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| GitHubConnector | TODO | | +| GitLabConnector | TODO | | +| GiteaConnector | TODO | | +| AzureDevOpsConnector | TODO | | +| Unit tests | TODO | | +| Integration tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| 
+| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_102_004_INTHUB_registry_connectors.md b/docs/implplan/SPRINT_20260110_102_004_INTHUB_registry_connectors.md new file mode 100644 index 000000000..8e65a1eba --- /dev/null +++ b/docs/implplan/SPRINT_20260110_102_004_INTHUB_registry_connectors.md @@ -0,0 +1,617 @@ +# SPRINT: Built-in Registry Connectors + +> **Sprint ID:** 102_004 +> **Module:** INTHUB +> **Phase:** 2 - Integration Hub +> **Status:** TODO +> **Parent:** [102_000_INDEX](SPRINT_20260110_102_000_INDEX_integration_hub.md) + +--- + +## Overview + +Implement built-in container registry connectors for Docker Hub, Harbor, ACR, ECR, GCR, and generic OCI registries. Each implements `IRegistryConnector`. + +### Objectives + +- Docker Hub connector with rate limit handling +- Harbor connector for self-hosted registries +- Azure Container Registry (ACR) connector +- AWS Elastic Container Registry (ECR) connector +- Google Container Registry (GCR) connector +- Generic OCI connector for any compliant registry + +### Working Directory + +``` +src/ReleaseOrchestrator/ +├── __Libraries/ +│ └── StellaOps.ReleaseOrchestrator.IntegrationHub/ +│ └── Connectors/ +│ └── Registry/ +│ ├── DockerHubConnector.cs +│ ├── HarborConnector.cs +│ ├── AcrConnector.cs +│ ├── EcrConnector.cs +│ ├── GcrConnector.cs +│ └── GenericOciConnector.cs +└── __Tests/ + └── StellaOps.ReleaseOrchestrator.IntegrationHub.Tests/ + └── Connectors/ + └── Registry/ +``` + +--- + +## Deliverables + +### DockerHubConnector + +```csharp +namespace StellaOps.ReleaseOrchestrator.IntegrationHub.Connectors.Registry; + +public sealed class DockerHubConnector : IRegistryConnector +{ + private const string RegistryUrl = "https://registry-1.docker.io"; + private const string AuthUrl = "https://auth.docker.io/token"; + private const string HubApiUrl = "https://hub.docker.com/v2"; + + public ConnectorCategory Category => ConnectorCategory.Registry; + public IReadOnlyList<string> Capabilities { get; } 
= [ + "list_repos", "list_tags", "resolve_tag", "get_manifest", "pull_credentials" + ]; + + private HttpClient? _httpClient; + private string? _username; + private string? _password; + + public async Task InitializeAsync( + ConnectorContext context, + CancellationToken ct = default) + { + var config = ParseConfig(context.Configuration); + _username = config.Username; + _password = config.Password ?? + await context.SecretResolver.ResolveAsync(config.PasswordSecretRef!, ct); + + _httpClient = new HttpClient(); + } + + public async Task<IReadOnlyList<ImageTag>> ListTagsAsync( + ConnectorContext context, + string repository, + CancellationToken ct = default) + { + var token = await GetAuthTokenAsync(repository, "pull", ct); + + var request = new HttpRequestMessage( + HttpMethod.Get, + $"{RegistryUrl}/v2/{repository}/tags/list"); + request.Headers.Authorization = + new AuthenticationHeaderValue("Bearer", token); + + var response = await _httpClient!.SendAsync(request, ct); + response.EnsureSuccessStatusCode(); + + // TagListResponse: internal DTO for the registry tags/list payload + var result = await response.Content + .ReadFromJsonAsync<TagListResponse>(ct); + + // Get tag details from Hub API + var tags = new List<ImageTag>(); + foreach (var tag in result!.Tags) + { + var detail = await GetTagDetailAsync(repository, tag, ct); + tags.Add(new ImageTag( + Name: tag, + Digest: detail?.Digest, + PushedAt: detail?.LastPushed)); + } + + return tags; + } + + public async Task<ImageDigest?> ResolveTagAsync( + ConnectorContext context, + string repository, + string tag, + CancellationToken ct = default) + { + var token = await GetAuthTokenAsync(repository, "pull", ct); + + var request = new HttpRequestMessage( + HttpMethod.Head, + $"{RegistryUrl}/v2/{repository}/manifests/{tag}"); + request.Headers.Authorization = + new AuthenticationHeaderValue("Bearer", token); + request.Headers.Accept.Add(new MediaTypeWithQualityHeaderValue( + "application/vnd.docker.distribution.manifest.v2+json")); + request.Headers.Accept.Add(new MediaTypeWithQualityHeaderValue( + "application/vnd.oci.image.manifest.v1+json")); + + var 
response = await _httpClient!.SendAsync(request, ct); + if (response.StatusCode == HttpStatusCode.NotFound) + return null; + + response.EnsureSuccessStatusCode(); + + var digest = response.Headers.GetValues("Docker-Content-Digest").First(); + var contentType = response.Content.Headers.ContentType?.MediaType ?? ""; + var size = response.Content.Headers.ContentLength ?? 0; + + return new ImageDigest(digest, contentType, size); + } + + public async Task<ImageManifest?> GetManifestAsync( + ConnectorContext context, + string repository, + string reference, + CancellationToken ct = default) + { + var token = await GetAuthTokenAsync(repository, "pull", ct); + + var request = new HttpRequestMessage( + HttpMethod.Get, + $"{RegistryUrl}/v2/{repository}/manifests/{reference}"); + request.Headers.Authorization = + new AuthenticationHeaderValue("Bearer", token); + request.Headers.Accept.Add(new MediaTypeWithQualityHeaderValue( + "application/vnd.docker.distribution.manifest.v2+json")); + request.Headers.Accept.Add(new MediaTypeWithQualityHeaderValue( + "application/vnd.oci.image.manifest.v1+json")); + + var response = await _httpClient!.SendAsync(request, ct); + if (response.StatusCode == HttpStatusCode.NotFound) + return null; + + response.EnsureSuccessStatusCode(); + + var json = await response.Content.ReadAsStringAsync(ct); + var digest = response.Headers.GetValues("Docker-Content-Digest").First(); + + return new ImageManifest( + Digest: digest, + MediaType: response.Content.Headers.ContentType?.MediaType ?? "", + Size: response.Content.Headers.ContentLength ?? 
0, + RawManifest: json); + } + + public Task GetPullCredentialsAsync( + ConnectorContext context, + string repository, + CancellationToken ct = default) + { + return Task.FromResult(new PullCredentials( + Registry: "docker.io", + Username: _username!, + Password: _password!, + ExpiresAt: DateTimeOffset.MaxValue)); + } + + private async Task GetAuthTokenAsync( + string repository, + string scope, + CancellationToken ct) + { + var url = $"{AuthUrl}?service=registry.docker.io&scope=repository:{repository}:{scope}"; + + var request = new HttpRequestMessage(HttpMethod.Get, url); + if (!string.IsNullOrEmpty(_username)) + { + var credentials = Convert.ToBase64String( + Encoding.UTF8.GetBytes($"{_username}:{_password}")); + request.Headers.Authorization = + new AuthenticationHeaderValue("Basic", credentials); + } + + var response = await _httpClient!.SendAsync(request, ct); + response.EnsureSuccessStatusCode(); + + var result = await response.Content.ReadFromJsonAsync(ct); + return result!.Token; + } +} +``` + +### AcrConnector + +```csharp +namespace StellaOps.ReleaseOrchestrator.IntegrationHub.Connectors.Registry; + +public sealed class AcrConnector : IRegistryConnector +{ + public ConnectorCategory Category => ConnectorCategory.Registry; + + private ContainerRegistryClient? _client; + private string? _registryUrl; + + public async Task InitializeAsync( + ConnectorContext context, + CancellationToken ct = default) + { + var config = ParseConfig(context.Configuration); + _registryUrl = config.RegistryUrl; + + // Support both service principal and managed identity + TokenCredential credential = config.AuthMethod switch + { + "service_principal" => new ClientSecretCredential( + config.TenantId, + config.ClientId, + config.ClientSecret ?? 
+ await context.SecretResolver.ResolveAsync(config.ClientSecretRef!, ct)), + + "managed_identity" => new ManagedIdentityCredential(), + + _ => new DefaultAzureCredential() + }; + + _client = new ContainerRegistryClient( + new Uri($"https://{_registryUrl}"), + credential); + } + + public async Task> ListRepositoriesAsync( + ConnectorContext context, + string? prefix = null, + CancellationToken ct = default) + { + var repos = new List(); + + await foreach (var name in _client!.GetRepositoryNamesAsync(ct)) + { + if (prefix is null || name.StartsWith(prefix, StringComparison.OrdinalIgnoreCase)) + { + repos.Add(new RegistryRepository(name)); + } + } + + return repos; + } + + public async Task> ListTagsAsync( + ConnectorContext context, + string repository, + CancellationToken ct = default) + { + var repo = _client!.GetRepository(repository); + var tags = new List(); + + await foreach (var manifest in repo.GetAllManifestPropertiesAsync(ct)) + { + foreach (var tag in manifest.Tags) + { + tags.Add(new ImageTag( + Name: tag, + Digest: manifest.Digest, + PushedAt: manifest.CreatedOn)); + } + } + + return tags; + } + + public async Task ResolveTagAsync( + ConnectorContext context, + string repository, + string tag, + CancellationToken ct = default) + { + try + { + var repo = _client!.GetRepository(repository); + var artifact = repo.GetArtifact(tag); + var manifest = await artifact.GetManifestPropertiesAsync(ct); + + return new ImageDigest( + Digest: manifest.Value.Digest, + MediaType: manifest.Value.MediaType ?? "", + Size: manifest.Value.SizeInBytes ?? 
0); + } + catch (RequestFailedException ex) when (ex.Status == 404) + { + return null; + } + } + + public async Task GetPullCredentialsAsync( + ConnectorContext context, + string repository, + CancellationToken ct = default) + { + // Get short-lived token for pull + var exchangeClient = new ContainerRegistryContentClient( + new Uri($"https://{_registryUrl}"), + repository, + new DefaultAzureCredential()); + + // Use refresh token exchange + return new PullCredentials( + Registry: _registryUrl!, + Username: "00000000-0000-0000-0000-000000000000", + Password: await GetAcrRefreshTokenAsync(ct), + ExpiresAt: DateTimeOffset.UtcNow.AddHours(1)); + } +} +``` + +### EcrConnector + +```csharp +namespace StellaOps.ReleaseOrchestrator.IntegrationHub.Connectors.Registry; + +public sealed class EcrConnector : IRegistryConnector +{ + public ConnectorCategory Category => ConnectorCategory.Registry; + + private AmazonECRClient? _ecrClient; + private string? _registryId; + + public async Task InitializeAsync( + ConnectorContext context, + CancellationToken ct = default) + { + var config = ParseConfig(context.Configuration); + + AWSCredentials credentials = config.AuthMethod switch + { + "access_key" => new BasicAWSCredentials( + config.AccessKeyId, + config.SecretAccessKey ?? + await context.SecretResolver.ResolveAsync(config.SecretAccessKeyRef!, ct)), + + "assume_role" => new AssumeRoleAWSCredentials( + new BasicAWSCredentials(config.AccessKeyId, config.SecretAccessKey), + config.RoleArn, + "StellaOps"), + + _ => new InstanceProfileAWSCredentials() + }; + + _ecrClient = new AmazonECRClient(credentials, RegionEndpoint.GetBySystemName(config.Region)); + _registryId = config.RegistryId; + } + + public async Task> ListRepositoriesAsync( + ConnectorContext context, + string? prefix = null, + CancellationToken ct = default) + { + var repos = new List(); + string? 
nextToken = null; + + do + { + var response = await _ecrClient!.DescribeRepositoriesAsync( + new DescribeRepositoriesRequest + { + RegistryId = _registryId, + NextToken = nextToken + }, ct); + + foreach (var repo in response.Repositories) + { + if (prefix is null || + repo.RepositoryName.StartsWith(prefix, StringComparison.OrdinalIgnoreCase)) + { + repos.Add(new RegistryRepository(repo.RepositoryName)); + } + } + + nextToken = response.NextToken; + } while (!string.IsNullOrEmpty(nextToken)); + + return repos; + } + + public async Task> ListTagsAsync( + ConnectorContext context, + string repository, + CancellationToken ct = default) + { + var tags = new List(); + string? nextToken = null; + + do + { + var response = await _ecrClient!.DescribeImagesAsync( + new DescribeImagesRequest + { + RegistryId = _registryId, + RepositoryName = repository, + NextToken = nextToken + }, ct); + + foreach (var image in response.ImageDetails) + { + foreach (var tag in image.ImageTags) + { + tags.Add(new ImageTag( + Name: tag, + Digest: image.ImageDigest, + PushedAt: image.ImagePushedAt)); + } + } + + nextToken = response.NextToken; + } while (!string.IsNullOrEmpty(nextToken)); + + return tags; + } + + public async Task GetPullCredentialsAsync( + ConnectorContext context, + string repository, + CancellationToken ct = default) + { + var response = await _ecrClient!.GetAuthorizationTokenAsync( + new GetAuthorizationTokenRequest + { + RegistryIds = _registryId is not null ? 
[_registryId] : null + }, ct); + + var auth = response.AuthorizationData.First(); + var decoded = Encoding.UTF8.GetString( + Convert.FromBase64String(auth.AuthorizationToken)); + var parts = decoded.Split(':'); + + return new PullCredentials( + Registry: new Uri(auth.ProxyEndpoint).Host, + Username: parts[0], + Password: parts[1], + ExpiresAt: auth.ExpiresAt); + } +} +``` + +### GenericOciConnector + +```csharp +namespace StellaOps.ReleaseOrchestrator.IntegrationHub.Connectors.Registry; + +/// +/// Generic OCI Distribution-compliant registry connector. +/// Works with any registry implementing OCI Distribution Spec. +/// +public sealed class GenericOciConnector : IRegistryConnector +{ + public ConnectorCategory Category => ConnectorCategory.Registry; + + private HttpClient? _httpClient; + private string? _registryUrl; + private string? _username; + private string? _password; + + public async Task InitializeAsync( + ConnectorContext context, + CancellationToken ct = default) + { + var config = ParseConfig(context.Configuration); + _registryUrl = config.RegistryUrl.TrimEnd('/'); + _username = config.Username; + _password = config.Password ?? + (config.PasswordSecretRef is not null + ? await context.SecretResolver.ResolveAsync(config.PasswordSecretRef, ct) + : null); + + _httpClient = new HttpClient(); + + if (!string.IsNullOrEmpty(_username)) + { + var credentials = Convert.ToBase64String( + Encoding.UTF8.GetBytes($"{_username}:{_password}")); + _httpClient.DefaultRequestHeaders.Authorization = + new AuthenticationHeaderValue("Basic", credentials); + } + } + + public async Task> ListRepositoriesAsync( + ConnectorContext context, + string? 
prefix = null, + CancellationToken ct = default) + { + var response = await _httpClient!.GetAsync( + $"{_registryUrl}/v2/_catalog", ct); + response.EnsureSuccessStatusCode(); + + var result = await response.Content + .ReadFromJsonAsync(ct); + + return result!.Repositories + .Where(r => prefix is null || + r.StartsWith(prefix, StringComparison.OrdinalIgnoreCase)) + .Select(r => new RegistryRepository(r)) + .ToList(); + } + + public async Task ResolveTagAsync( + ConnectorContext context, + string repository, + string tag, + CancellationToken ct = default) + { + var request = new HttpRequestMessage( + HttpMethod.Head, + $"{_registryUrl}/v2/{repository}/manifests/{tag}"); + + request.Headers.Accept.Add(new MediaTypeWithQualityHeaderValue( + "application/vnd.oci.image.manifest.v1+json")); + request.Headers.Accept.Add(new MediaTypeWithQualityHeaderValue( + "application/vnd.docker.distribution.manifest.v2+json")); + + var response = await _httpClient!.SendAsync(request, ct); + if (response.StatusCode == HttpStatusCode.NotFound) + return null; + + response.EnsureSuccessStatusCode(); + + var digest = response.Headers + .GetValues("Docker-Content-Digest") + .FirstOrDefault() ?? ""; + var mediaType = response.Content.Headers.ContentType?.MediaType ?? ""; + var size = response.Content.Headers.ContentLength ?? 0; + + return new ImageDigest(digest, mediaType, size); + } + + public Task GetPullCredentialsAsync( + ConnectorContext context, + string repository, + CancellationToken ct = default) + { + var uri = new Uri(_registryUrl!); + return Task.FromResult(new PullCredentials( + Registry: uri.Host, + Username: _username ?? "", + Password: _password ?? 
"", + ExpiresAt: DateTimeOffset.MaxValue)); + } +} +``` + +--- + +## Acceptance Criteria + +- [ ] Docker Hub connector works with rate limiting +- [ ] Harbor connector supports webhooks +- [ ] ACR connector uses Azure Identity +- [ ] ECR connector handles token refresh +- [ ] GCR connector uses GCP credentials +- [ ] Generic OCI connector works with any registry +- [ ] All connectors resolve tags to digests +- [ ] Pull credentials generated correctly +- [ ] Integration tests pass + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 102_002 Connector Runtime | Internal | TODO | +| Azure.Containers.ContainerRegistry | NuGet | Available | +| AWSSDK.ECR | NuGet | Available | +| Google.Cloud.ArtifactRegistry.V1 | NuGet | Available | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| DockerHubConnector | TODO | | +| HarborConnector | TODO | | +| AcrConnector | TODO | | +| EcrConnector | TODO | | +| GcrConnector | TODO | | +| GenericOciConnector | TODO | | +| Unit tests | TODO | | +| Integration tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_102_005_INTHUB_vault_connector.md b/docs/implplan/SPRINT_20260110_102_005_INTHUB_vault_connector.md new file mode 100644 index 000000000..cf54ed255 --- /dev/null +++ b/docs/implplan/SPRINT_20260110_102_005_INTHUB_vault_connector.md @@ -0,0 +1,503 @@ +# SPRINT: Built-in Vault Connector + +> **Sprint ID:** 102_005 +> **Module:** INTHUB +> **Phase:** 2 - Integration Hub +> **Status:** TODO +> **Parent:** [102_000_INDEX](SPRINT_20260110_102_000_INDEX_integration_hub.md) + +--- + +## Overview + +Implement built-in vault connectors for secrets management: HashiCorp Vault, Azure Key Vault, and AWS Secrets Manager. 
+
+### Objectives
+
+- HashiCorp Vault connector with multiple auth methods
+- Azure Key Vault connector with managed identity support
+- AWS Secrets Manager connector
+- Unified secret resolution interface
+
+### Working Directory
+
+```
+src/ReleaseOrchestrator/
+├── __Libraries/
+│   └── StellaOps.ReleaseOrchestrator.IntegrationHub/
+│       └── Connectors/
+│           └── Vault/
+│               ├── IVaultConnector.cs
+│               ├── HashiCorpVaultConnector.cs
+│               ├── AzureKeyVaultConnector.cs
+│               └── AwsSecretsManagerConnector.cs
+└── __Tests/
+```
+
+---
+
+## Deliverables
+
+### IVaultConnector Interface
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.IntegrationHub.Connectors.Vault;
+
+public interface IVaultConnector : IConnectorPlugin
+{
+    Task<string?> GetSecretAsync(
+        ConnectorContext context,
+        string path,
+        string? key = null,
+        CancellationToken ct = default);
+
+    Task<IReadOnlyDictionary<string, string>> GetSecretsAsync(
+        ConnectorContext context,
+        string path,
+        CancellationToken ct = default);
+
+    Task SetSecretAsync(
+        ConnectorContext context,
+        string path,
+        string key,
+        string value,
+        CancellationToken ct = default);
+
+    Task<IReadOnlyList<string>> ListSecretsAsync(
+        ConnectorContext context,
+        string? path = null,
+        CancellationToken ct = default);
+}
+```
+
+### HashiCorpVaultConnector
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.IntegrationHub.Connectors.Vault;
+
+public sealed class HashiCorpVaultConnector : IVaultConnector
+{
+    public ConnectorCategory Category => ConnectorCategory.Vault;
+
+    private VaultClient? _client;
+
+    public async Task InitializeAsync(
+        ConnectorContext context,
+        CancellationToken ct = default)
+    {
+        var config = ParseConfig(context.Configuration);
+
+        IAuthMethodInfo authMethod = config.AuthMethod switch
+        {
+            "token" => new TokenAuthMethodInfo(
+                config.Token ?? await context.SecretResolver.ResolveAsync(
+                    config.TokenSecretRef!, ct)),
+
+            "approle" => new AppRoleAuthMethodInfo(
+                config.RoleId,
+                config.SecretId ??
await context.SecretResolver.ResolveAsync( + config.SecretIdRef!, ct)), + + "kubernetes" => new KubernetesAuthMethodInfo( + config.Role, + await File.ReadAllTextAsync( + "/var/run/secrets/kubernetes.io/serviceaccount/token", ct)), + + _ => throw new ArgumentException($"Unknown auth method: {config.AuthMethod}") + }; + + var vaultSettings = new VaultClientSettings(config.Address, authMethod); + _client = new VaultClient(vaultSettings); + } + + public async Task GetSecretAsync( + ConnectorContext context, + string path, + string? key = null, + CancellationToken ct = default) + { + var config = ParseConfig(context.Configuration); + var mountPoint = config.MountPoint ?? "secret"; + + var secret = await _client!.V1.Secrets.KeyValue.V2 + .ReadSecretAsync(path, mountPoint: mountPoint); + + if (secret?.Data?.Data is null) + return null; + + if (key is null) + { + // Return first value if no key specified + return secret.Data.Data.Values.FirstOrDefault()?.ToString(); + } + + return secret.Data.Data.TryGetValue(key, out var value) + ? value?.ToString() + : null; + } + + public async Task> GetSecretsAsync( + ConnectorContext context, + string path, + CancellationToken ct = default) + { + var config = ParseConfig(context.Configuration); + var mountPoint = config.MountPoint ?? "secret"; + + var secret = await _client!.V1.Secrets.KeyValue.V2 + .ReadSecretAsync(path, mountPoint: mountPoint); + + if (secret?.Data?.Data is null) + return new Dictionary(); + + return secret.Data.Data + .ToDictionary( + kvp => kvp.Key, + kvp => kvp.Value?.ToString() ?? ""); + } + + public async Task SetSecretAsync( + ConnectorContext context, + string path, + string key, + string value, + CancellationToken ct = default) + { + var config = ParseConfig(context.Configuration); + var mountPoint = config.MountPoint ?? 
"secret"; + + // Get existing secrets to merge + var existing = await GetSecretsAsync(context, path, ct); + var data = new Dictionary( + existing.ToDictionary(k => k.Key, v => (object)v.Value)) + { + [key] = value + }; + + await _client!.V1.Secrets.KeyValue.V2 + .WriteSecretAsync(path, data, mountPoint: mountPoint); + } + + public async Task> ListSecretsAsync( + ConnectorContext context, + string? path = null, + CancellationToken ct = default) + { + var config = ParseConfig(context.Configuration); + var mountPoint = config.MountPoint ?? "secret"; + + var result = await _client!.V1.Secrets.KeyValue.V2 + .ReadSecretPathsAsync(path ?? "", mountPoint: mountPoint); + + return result?.Data?.Keys?.ToList() ?? []; + } +} +``` + +### AzureKeyVaultConnector + +```csharp +namespace StellaOps.ReleaseOrchestrator.IntegrationHub.Connectors.Vault; + +public sealed class AzureKeyVaultConnector : IVaultConnector +{ + public ConnectorCategory Category => ConnectorCategory.Vault; + + private SecretClient? _client; + + public async Task InitializeAsync( + ConnectorContext context, + CancellationToken ct = default) + { + var config = ParseConfig(context.Configuration); + + TokenCredential credential = config.AuthMethod switch + { + "service_principal" => new ClientSecretCredential( + config.TenantId, + config.ClientId, + config.ClientSecret ?? await context.SecretResolver.ResolveAsync( + config.ClientSecretRef!, ct)), + + "managed_identity" => new ManagedIdentityCredential( + config.ManagedIdentityClientId), + + _ => new DefaultAzureCredential() + }; + + _client = new SecretClient( + new Uri($"https://{config.VaultName}.vault.azure.net/"), + credential); + } + + public async Task GetSecretAsync( + ConnectorContext context, + string path, + string? key = null, + CancellationToken ct = default) + { + try + { + // Azure Key Vault uses flat namespace, path is the secret name + var secretName = key is not null ? 
$"{path}--{key}" : path; + var response = await _client!.GetSecretAsync(secretName, cancellationToken: ct); + return response.Value.Value; + } + catch (RequestFailedException ex) when (ex.Status == 404) + { + return null; + } + } + + public async Task> GetSecretsAsync( + ConnectorContext context, + string path, + CancellationToken ct = default) + { + var result = new Dictionary(); + + await foreach (var secret in _client!.GetPropertiesOfSecretsAsync(ct)) + { + if (secret.Name.StartsWith(path, StringComparison.OrdinalIgnoreCase)) + { + var value = await _client.GetSecretAsync(secret.Name, cancellationToken: ct); + var key = secret.Name[(path.Length + 2)..]; // Remove prefix and "--" + result[key] = value.Value.Value; + } + } + + return result; + } + + public async Task SetSecretAsync( + ConnectorContext context, + string path, + string key, + string value, + CancellationToken ct = default) + { + var secretName = $"{path}--{key}"; + await _client!.SetSecretAsync(secretName, value, ct); + } + + public async Task> ListSecretsAsync( + ConnectorContext context, + string? path = null, + CancellationToken ct = default) + { + var result = new List(); + + await foreach (var secret in _client!.GetPropertiesOfSecretsAsync(ct)) + { + if (path is null || secret.Name.StartsWith(path, StringComparison.OrdinalIgnoreCase)) + { + result.Add(secret.Name); + } + } + + return result; + } +} +``` + +### AwsSecretsManagerConnector + +```csharp +namespace StellaOps.ReleaseOrchestrator.IntegrationHub.Connectors.Vault; + +public sealed class AwsSecretsManagerConnector : IVaultConnector +{ + public ConnectorCategory Category => ConnectorCategory.Vault; + + private AmazonSecretsManagerClient? 
_client; + + public async Task InitializeAsync( + ConnectorContext context, + CancellationToken ct = default) + { + var config = ParseConfig(context.Configuration); + + AWSCredentials credentials = config.AuthMethod switch + { + "access_key" => new BasicAWSCredentials( + config.AccessKeyId, + config.SecretAccessKey ?? await context.SecretResolver.ResolveAsync( + config.SecretAccessKeyRef!, ct)), + + "assume_role" => new AssumeRoleAWSCredentials( + new BasicAWSCredentials(config.AccessKeyId, config.SecretAccessKey), + config.RoleArn, + "StellaOps"), + + _ => new InstanceProfileAWSCredentials() + }; + + _client = new AmazonSecretsManagerClient( + credentials, + RegionEndpoint.GetBySystemName(config.Region)); + } + + public async Task GetSecretAsync( + ConnectorContext context, + string path, + string? key = null, + CancellationToken ct = default) + { + try + { + var response = await _client!.GetSecretValueAsync( + new GetSecretValueRequest { SecretId = path }, ct); + + var secretValue = response.SecretString; + + if (key is null) + return secretValue; + + // AWS secrets can be JSON, try to parse + try + { + var json = JsonDocument.Parse(secretValue); + if (json.RootElement.TryGetProperty(key, out var prop)) + { + return prop.GetString(); + } + } + catch (JsonException) + { + // Not JSON, return full value + } + + return secretValue; + } + catch (ResourceNotFoundException) + { + return null; + } + } + + public async Task> GetSecretsAsync( + ConnectorContext context, + string path, + CancellationToken ct = default) + { + var secretValue = await GetSecretAsync(context, path, ct: ct); + if (secretValue is null) + return new Dictionary(); + + try + { + var json = JsonDocument.Parse(secretValue); + return json.RootElement.EnumerateObject() + .ToDictionary(p => p.Name, p => p.Value.GetString() ?? 
""); + } + catch (JsonException) + { + return new Dictionary { ["value"] = secretValue }; + } + } + + public async Task SetSecretAsync( + ConnectorContext context, + string path, + string key, + string value, + CancellationToken ct = default) + { + // Get existing secret to merge + var existing = await GetSecretsAsync(context, path, ct); + var data = new Dictionary(existing) { [key] = value }; + var json = JsonSerializer.Serialize(data); + + try + { + await _client!.UpdateSecretAsync( + new UpdateSecretRequest + { + SecretId = path, + SecretString = json + }, ct); + } + catch (ResourceNotFoundException) + { + await _client.CreateSecretAsync( + new CreateSecretRequest + { + Name = path, + SecretString = json + }, ct); + } + } + + public async Task> ListSecretsAsync( + ConnectorContext context, + string? path = null, + CancellationToken ct = default) + { + var result = new List(); + string? nextToken = null; + + do + { + var response = await _client!.ListSecretsAsync( + new ListSecretsRequest + { + NextToken = nextToken, + Filters = path is not null + ? 
[new Filter { Key = FilterNameStringType.Name, Values = [path] }]
+                        : null
+                }, ct);
+
+            result.AddRange(response.SecretList.Select(s => s.Name));
+            nextToken = response.NextToken;
+        } while (!string.IsNullOrEmpty(nextToken));
+
+        return result;
+    }
+}
+```
+
+---
+
+## Acceptance Criteria
+
+- [ ] HashiCorp Vault token auth works
+- [ ] HashiCorp Vault AppRole auth works
+- [ ] HashiCorp Vault Kubernetes auth works
+- [ ] Azure Key Vault service principal works
+- [ ] Azure Key Vault managed identity works
+- [ ] AWS Secrets Manager IAM auth works
+- [ ] All connectors read/write secrets
+- [ ] Secret listing works with path prefix
+- [ ] Integration tests pass
+
+---
+
+## Dependencies
+
+| Dependency | Type | Status |
+|------------|------|--------|
+| 102_002 Connector Runtime | Internal | TODO |
+| VaultSharp | NuGet | Available |
+| Azure.Security.KeyVault.Secrets | NuGet | Available |
+| AWSSDK.SecretsManager | NuGet | Available |
+
+---
+
+## Delivery Tracker
+
+| Deliverable | Status | Notes |
+|-------------|--------|-------|
+| IVaultConnector | TODO | |
+| HashiCorpVaultConnector | TODO | |
+| AzureKeyVaultConnector | TODO | |
+| AwsSecretsManagerConnector | TODO | |
+| Unit tests | TODO | |
+| Integration tests | TODO | |
+
+---
+
+## Execution Log
+
+| Date | Entry |
+|------|-------|
+| 10-Jan-2026 | Sprint created |
diff --git a/docs/implplan/SPRINT_20260110_102_006_INTHUB_doctor_checks.md b/docs/implplan/SPRINT_20260110_102_006_INTHUB_doctor_checks.md
new file mode 100644
index 000000000..14ca1cef6
--- /dev/null
+++ b/docs/implplan/SPRINT_20260110_102_006_INTHUB_doctor_checks.md
@@ -0,0 +1,605 @@
+# SPRINT: Doctor Checks
+
+> **Sprint ID:** 102_006
+> **Module:** INTHUB
+> **Phase:** 2 - Integration Hub
+> **Status:** TODO
+> **Parent:** [102_000_INDEX](SPRINT_20260110_102_000_INDEX_integration_hub.md)
+
+---
+
+## Overview
+
+Implement Doctor checks that diagnose integration health issues.
Checks validate connectivity, credentials, permissions, and rate limit status.
+
+### Objectives
+
+- Connectivity check for all integration types
+- Credential validation checks
+- Permission verification checks
+- Rate limit status checks
+- Aggregated health report generation
+
+### Working Directory
+
+```
+src/ReleaseOrchestrator/
+├── __Libraries/
+│   └── StellaOps.ReleaseOrchestrator.IntegrationHub/
+│       └── Doctor/
+│           ├── IDoctorCheck.cs
+│           ├── DoctorService.cs
+│           ├── Checks/
+│           │   ├── ConnectivityCheck.cs
+│           │   ├── CredentialsCheck.cs
+│           │   ├── PermissionsCheck.cs
+│           │   └── RateLimitCheck.cs
+│           └── Reports/
+│               ├── DoctorReport.cs
+│               └── CheckResult.cs
+└── __Tests/
+```
+
+---
+
+## Deliverables
+
+### IDoctorCheck Interface
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.IntegrationHub.Doctor;
+
+public interface IDoctorCheck
+{
+    string Name { get; }
+    string Description { get; }
+    CheckCategory Category { get; }
+
+    Task<CheckResult> ExecuteAsync(
+        Integration integration,
+        IConnectorPlugin connector,
+        CancellationToken ct = default);
+}
+
+public enum CheckCategory
+{
+    Connectivity,
+    Credentials,
+    Permissions,
+    RateLimit
+}
+
+public sealed record CheckResult(
+    string CheckName,
+    CheckStatus Status,
+    string Message,
+    IReadOnlyDictionary<string, object>? Details = null,
+    TimeSpan Duration = default
+)
+{
+    public static CheckResult Pass(string name, string message,
+        IReadOnlyDictionary<string, object>? details = null) =>
+        new(name, CheckStatus.Pass, message, details);
+
+    public static CheckResult Warn(string name, string message,
+        IReadOnlyDictionary<string, object>? details = null) =>
+        new(name, CheckStatus.Warning, message, details);
+
+    public static CheckResult Fail(string name, string message,
+        IReadOnlyDictionary<string, object>?
details = null) => + new(name, CheckStatus.Fail, message, details); + + public static CheckResult Skip(string name, string message) => + new(name, CheckStatus.Skipped, message); +} + +public enum CheckStatus +{ + Pass, + Warning, + Fail, + Skipped +} +``` + +### DoctorService + +```csharp +namespace StellaOps.ReleaseOrchestrator.IntegrationHub.Doctor; + +public sealed class DoctorService +{ + private readonly IIntegrationManager _integrationManager; + private readonly IConnectorFactory _connectorFactory; + private readonly IEnumerable _checks; + private readonly ILogger _logger; + + public DoctorService( + IIntegrationManager integrationManager, + IConnectorFactory connectorFactory, + IEnumerable checks, + ILogger logger) + { + _integrationManager = integrationManager; + _connectorFactory = connectorFactory; + _checks = checks; + _logger = logger; + } + + public async Task CheckIntegrationAsync( + Guid integrationId, + CancellationToken ct = default) + { + var integration = await _integrationManager.GetAsync(integrationId, ct) + ?? 
throw new IntegrationNotFoundException(integrationId); + + var connector = await _connectorFactory.CreateAsync(integration, ct); + var results = new List(); + + foreach (var check in _checks) + { + try + { + var sw = Stopwatch.StartNew(); + var result = await check.ExecuteAsync(integration, connector, ct); + results.Add(result with { Duration = sw.Elapsed }); + } + catch (Exception ex) + { + _logger.LogError(ex, + "Doctor check {CheckName} failed for integration {IntegrationId}", + check.Name, integrationId); + + results.Add(CheckResult.Fail( + check.Name, + $"Check threw exception: {ex.Message}")); + } + } + + return new DoctorReport( + IntegrationId: integrationId, + IntegrationName: integration.Name, + IntegrationType: integration.Type, + CheckedAt: TimeProvider.System.GetUtcNow(), + Results: results, + OverallStatus: DetermineOverallStatus(results)); + } + + public async Task> CheckAllIntegrationsAsync( + CancellationToken ct = default) + { + var integrations = await _integrationManager.ListAsync(ct: ct); + var reports = new List(); + + foreach (var integration in integrations) + { + var report = await CheckIntegrationAsync(integration.Id, ct); + reports.Add(report); + } + + return reports; + } + + private static HealthStatus DetermineOverallStatus( + IReadOnlyList results) + { + if (results.Any(r => r.Status == CheckStatus.Fail)) + return HealthStatus.Unhealthy; + + if (results.Any(r => r.Status == CheckStatus.Warning)) + return HealthStatus.Degraded; + + return HealthStatus.Healthy; + } +} + +public sealed record DoctorReport( + Guid IntegrationId, + string IntegrationName, + IntegrationType IntegrationType, + DateTimeOffset CheckedAt, + IReadOnlyList Results, + HealthStatus OverallStatus +); +``` + +### ConnectivityCheck + +```csharp +namespace StellaOps.ReleaseOrchestrator.IntegrationHub.Doctor.Checks; + +public sealed class ConnectivityCheck : IDoctorCheck +{ + public string Name => "connectivity"; + public string Description => "Verifies network 
connectivity to the integration endpoint"; + public CheckCategory Category => CheckCategory.Connectivity; + + public async Task ExecuteAsync( + Integration integration, + IConnectorPlugin connector, + CancellationToken ct = default) + { + try + { + var result = await connector.TestConnectionAsync( + new ConnectorContext( + integration.Id, + integration.TenantId, + default, // Config already loaded in connector + null!, + NullLogger.Instance), + ct); + + if (result.Success) + { + return CheckResult.Pass( + Name, + $"Connected successfully in {result.ResponseTime.TotalMilliseconds:F0}ms", + new Dictionary + { + ["response_time_ms"] = result.ResponseTime.TotalMilliseconds + }); + } + + return CheckResult.Fail( + Name, + $"Connection failed: {result.Message}"); + } + catch (HttpRequestException ex) + { + return CheckResult.Fail( + Name, + $"Network error: {ex.Message}", + new Dictionary + { + ["exception_type"] = ex.GetType().Name + }); + } + catch (TaskCanceledException) + { + return CheckResult.Fail(Name, "Connection timed out"); + } + } +} +``` + +### CredentialsCheck + +```csharp +namespace StellaOps.ReleaseOrchestrator.IntegrationHub.Doctor.Checks; + +public sealed class CredentialsCheck : IDoctorCheck +{ + public string Name => "credentials"; + public string Description => "Validates that credentials are valid and not expired"; + public CheckCategory Category => CheckCategory.Credentials; + + public async Task ExecuteAsync( + Integration integration, + IConnectorPlugin connector, + CancellationToken ct = default) + { + // First verify we can connect + var connectionResult = await connector.TestConnectionAsync( + CreateContext(integration), ct); + + if (!connectionResult.Success) + { + // Check if it's specifically a credential issue + if (IsCredentialError(connectionResult.Message)) + { + return CheckResult.Fail( + Name, + $"Invalid credentials: {connectionResult.Message}", + new Dictionary + { + ["error_type"] = "authentication_failed" + }); + } + + return 
CheckResult.Skip(
+                Name,
+                "Skipped: connectivity check failed first");
+        }
+
+        // Check for expiring credentials if applicable
+        if (connector is ICredentialExpiration credExpiration)
+        {
+            var expiration = await credExpiration.GetCredentialExpirationAsync(ct);
+            if (expiration.HasValue)
+            {
+                var remaining = expiration.Value - TimeProvider.System.GetUtcNow();
+
+                if (remaining < TimeSpan.Zero)
+                {
+                    return CheckResult.Fail(
+                        Name,
+                        "Credentials have expired",
+                        new Dictionary<string, object>
+                        {
+                            ["expired_at"] = expiration.Value.ToString("O")
+                        });
+                }
+
+                if (remaining < TimeSpan.FromDays(7))
+                {
+                    return CheckResult.Warn(
+                        Name,
+                        $"Credentials expire in {remaining.Days} days",
+                        new Dictionary<string, object>
+                        {
+                            ["expires_at"] = expiration.Value.ToString("O"),
+                            ["days_remaining"] = remaining.Days
+                        });
+                }
+            }
+        }
+
+        return CheckResult.Pass(Name, "Credentials are valid");
+    }
+
+    private static bool IsCredentialError(string? message)
+    {
+        if (message is null) return false;
+
+        var credentialKeywords = new[]
+        {
+            "401", "unauthorized", "authentication",
+            "invalid token", "invalid credentials",
+            "access denied", "forbidden"
+        };
+
+        return credentialKeywords.Any(k =>
+            message.Contains(k, StringComparison.OrdinalIgnoreCase));
+    }
+}
+```
+
+### PermissionsCheck
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.IntegrationHub.Doctor.Checks;
+
+public sealed class PermissionsCheck : IDoctorCheck
+{
+    public string Name => "permissions";
+    public string Description => "Verifies the integration has required permissions";
+    public CheckCategory Category => CheckCategory.Permissions;
+
+    public async Task<CheckResult> ExecuteAsync(
+        Integration integration,
+        IConnectorPlugin connector,
+        CancellationToken ct = default)
+    {
+        var requiredCapabilities = GetRequiredCapabilities(integration.Type);
+        var availableCapabilities = connector.GetCapabilities();
+
+        var missing = requiredCapabilities
+            .Except(availableCapabilities)
+            .ToList();
+
+        if (missing.Count > 0)
+        {
+            return CheckResult.Warn(
+                Name,
+                $"Missing capabilities: {string.Join(", ", missing)}",
+                new Dictionary<string, object>
+                {
+                    ["missing_capabilities"] = missing,
+                    ["available_capabilities"] = availableCapabilities
+                });
+        }
+
+        // Type-specific permission checks
+        var specificResult = integration.Type switch
+        {
+            IntegrationType.Scm => await CheckScmPermissionsAsync(
+                (IScmConnector)connector, ct),
+            IntegrationType.Registry => await CheckRegistryPermissionsAsync(
+                (IRegistryConnector)connector, ct),
+            IntegrationType.Vault => await CheckVaultPermissionsAsync(
+                (IVaultConnector)connector, ct),
+            _ => null
+        };
+
+        if (specificResult is not null && specificResult.Status != CheckStatus.Pass)
+        {
+            return specificResult;
+        }
+
+        return CheckResult.Pass(
+            Name,
+            "All required permissions available",
+            new Dictionary<string, object>
+            {
+                ["capabilities"] = availableCapabilities
+            });
+    }
+
+    private async Task<CheckResult?> CheckScmPermissionsAsync(
+        IScmConnector connector,
+        CancellationToken ct)
+    {
+        try
+        {
+            // Try to list repos to verify read access
+            var repos = await connector.ListRepositoriesAsync(
+                CreateContext(), null, ct);
+
+            return repos.Count == 0
+                ? CheckResult.Warn(Name, "No repositories accessible")
+                : null;
+        }
+        catch (Exception ex)
+        {
+            return CheckResult.Fail(
+                Name,
+                $"Cannot list repositories: {ex.Message}");
+        }
+    }
+
+    private async Task<CheckResult?> CheckRegistryPermissionsAsync(
+        IRegistryConnector connector,
+        CancellationToken ct)
+    {
+        try
+        {
+            var repos = await connector.ListRepositoriesAsync(
+                CreateContext(), null, ct);
+
+            return repos.Count == 0
+                ? CheckResult.Warn(Name, "No repositories accessible")
+                : null;
+        }
+        catch (Exception ex)
+        {
+            return CheckResult.Fail(
+                Name,
+                $"Cannot list repositories: {ex.Message}");
+        }
+    }
+
+    private async Task<CheckResult?> CheckVaultPermissionsAsync(
+        IVaultConnector connector,
+        CancellationToken ct)
+    {
+        try
+        {
+            _ = await connector.ListSecretsAsync(CreateContext(), ct: ct);
+            return null; // Can list = has permissions
+        }
+        catch (Exception ex)
+        {
+            return CheckResult.Fail(
+                Name,
+                $"Cannot list secrets: {ex.Message}");
+        }
+    }
+}
+```
+
+### RateLimitCheck
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.IntegrationHub.Doctor.Checks;
+
+public sealed class RateLimitCheck : IDoctorCheck
+{
+    public string Name => "rate_limit";
+    public string Description => "Checks remaining API rate limit quota";
+    public CheckCategory Category => CheckCategory.RateLimit;
+
+    public async Task<CheckResult> ExecuteAsync(
+        Integration integration,
+        IConnectorPlugin connector,
+        CancellationToken ct = default)
+    {
+        if (connector is not IRateLimitInfo rateLimitInfo)
+        {
+            return CheckResult.Skip(
+                Name,
+                "Connector does not expose rate limit information");
+        }
+
+        try
+        {
+            var info = await rateLimitInfo.GetRateLimitInfoAsync(ct);
+
+            var percentUsed = info.Limit > 0
+                ? (double)(info.Limit - info.Remaining) / info.Limit * 100
+                : 0;
+
+            var details = new Dictionary<string, object>
+            {
+                ["limit"] = info.Limit,
+                ["remaining"] = info.Remaining,
+                ["reset_at"] = info.ResetAt?.ToString("O") ?? "unknown",
+                ["percent_used"] = percentUsed
+            };
+
+            if (info.Remaining == 0)
+            {
+                return CheckResult.Fail(
+                    Name,
+                    $"Rate limit exhausted, resets at {info.ResetAt:HH:mm:ss}",
+                    details);
+            }
+
+            if (percentUsed > 80)
+            {
+                return CheckResult.Warn(
+                    Name,
+                    $"Rate limit {percentUsed:F0}% consumed ({info.Remaining}/{info.Limit} remaining)",
+                    details);
+            }
+
+            return CheckResult.Pass(
+                Name,
+                $"Rate limit healthy: {info.Remaining}/{info.Limit} remaining",
+                details);
+        }
+        catch (Exception ex)
+        {
+            return CheckResult.Skip(
+                Name,
+                $"Could not retrieve rate limit info: {ex.Message}");
+        }
+    }
+}
+
+public interface IRateLimitInfo
+{
+    Task<RateLimitStatus> GetRateLimitInfoAsync(CancellationToken ct = default);
+}
+
+public sealed record RateLimitStatus(
+    int Limit,
+    int Remaining,
+    DateTimeOffset? ResetAt
+);
+```
+
+---
+
+## Acceptance Criteria
+
+- [ ] Connectivity check works for all types
+- [ ] Credential check detects auth failures
+- [ ] Credential expiration warning works
+- [ ] Permission check verifies capabilities
+- [ ] Rate limit check warns on low quota
+- [ ] Doctor report aggregates all results
+- [ ] Checking all integrations at once works
+- [ ] Health status updates after checks
+- [ ] Unit test coverage ≥85%
+
+---
+
+## Dependencies
+
+| Dependency | Type | Status |
+|------------|------|--------|
+| 102_001 Integration Manager | Internal | TODO |
+| 102_002 Connector Runtime | Internal | TODO |
+
+---
+
+## Delivery Tracker
+
+| Deliverable | Status | Notes |
+|-------------|--------|-------|
+| IDoctorCheck interface | TODO | |
+| DoctorService | TODO | |
+| ConnectivityCheck | TODO | |
+| CredentialsCheck | TODO | |
+| PermissionsCheck | TODO | |
+| RateLimitCheck | TODO | |
+| Unit tests | TODO | |
+
+---
+
+## Execution Log
+
+| Date | Entry |
+|------|-------|
+| 10-Jan-2026 | Sprint created |
diff --git a/docs/implplan/SPRINT_20260110_103_000_INDEX_environment_manager.md b/docs/implplan/SPRINT_20260110_103_000_INDEX_environment_manager.md
new file mode 100644
index 000000000..7ffd35a39
--- /dev/null
+++ b/docs/implplan/SPRINT_20260110_103_000_INDEX_environment_manager.md
@@ -0,0 +1,197 @@
+# SPRINT INDEX: Phase 3 - Environment Manager
+
+> **Epic:** Release Orchestrator
+> **Phase:** 3 - Environment Manager
+> **Batch:** 103
+> **Status:** TODO
+> **Parent:** [100_000_INDEX](SPRINT_20260110_100_000_INDEX_release_orchestrator.md)
+
+---
+
+## Overview
+
+Phase 3 implements the Environment Manager, which manages deployment environments (Dev, Stage, Prod), targets within environments, and agent registration.
+
+### Objectives
+
+- Environment CRUD with promotion order
+- Target registry for deployment destinations
+- Agent registration and lifecycle
+- Inventory synchronization from targets
+
+---
+
+## Sprint Structure
+
+| Sprint ID | Title | Module | Status | Dependencies |
+|-----------|-------|--------|--------|--------------|
+| 103_001 | Environment CRUD | ENVMGR | TODO | 101_001 |
+| 103_002 | Target Registry | ENVMGR | TODO | 103_001 |
+| 103_003 | Agent Manager - Core | ENVMGR | TODO | 103_002 |
+| 103_004 | Inventory Sync | ENVMGR | TODO | 103_002, 103_003 |
+
+---
+
+## Architecture
+
+```
+┌─────────────────────────────────────────────────────────────┐
+│ ENVIRONMENT MANAGER                                         │
+│                                                             │
+│ ┌─────────────────────────────────────────────────────────┐ │
+│ │ ENVIRONMENT SERVICE (103_001)                           │ │
+│ │                                                         │ │
+│ │ - Create/Update/Delete environments                     │ │
+│ │ - Promotion order management                            │ │
+│ │ - Freeze window configuration                           │ │
+│ │ - Auto-promotion rules                                  │ │
+│ └─────────────────────────────────────────────────────────┘ │
+│                                                             │
+│ ┌─────────────────────────────────────────────────────────┐ │
+│ │ TARGET REGISTRY (103_002)                               │ │
+│ │                                                         │ │
+│ │ ┌───────────┐ ┌────────────┐ ┌───────────┐ ┌──────────┐ │ │
+│ │ │Docker Host│ │Compose Host│ │ECS Service│ │Nomad Job │ │ │
+│ │ └───────────┘ └────────────┘ └───────────┘ └──────────┘ │ │
+│ │                                                         │ │
+│ │ - Target registration    - Health monitoring            │ │
+│ │ - Connection validation  - Capability detection         │ │
+│ └─────────────────────────────────────────────────────────┘ │
+│                                                             │
+│ ┌─────────────────────────────────────────────────────────┐ │
+│ │ AGENT MANAGER (103_003)                                 │ │
+│ │                                                         │ │
+│ │ - Agent registration flow  - Certificate issuance       │ │
+│ │ - Heartbeat processing     - Capability registration    │ │
+│ │ - Agent lifecycle (active/inactive/revoked)             │ │
+│ └─────────────────────────────────────────────────────────┘ │
+│                                                             │
+│ ┌─────────────────────────────────────────────────────────┐ │
+│ │ INVENTORY SYNC (103_004)                                │ │
+│ │                                                         │ │
+│ │ - Pull current state from targets                       │ │
+│ │ - Detect drift from expected state                      │ │
+│ │ - Container inventory snapshot                          │ │
+│ └─────────────────────────────────────────────────────────┘ │
+│                                                             │
+└─────────────────────────────────────────────────────────────┘
+```
+
+---
+
+## Deliverables Summary
+
+### 103_001: Environment CRUD
+
+| Deliverable | Type | Description |
+|-------------|------|-------------|
+| `IEnvironmentService` | Interface | Environment operations |
+| `EnvironmentService` | Class | Implementation |
+| `Environment` | Model | Environment entity |
+| `FreezeWindow` | Model | Deployment freeze windows |
+| `EnvironmentValidator` | Class | Business rule validation |
+
+### 103_002: Target Registry
+
+| Deliverable | Type | Description |
+|-------------|------|-------------|
+| `ITargetRegistry` | Interface | Target registration |
+| `TargetRegistry` | Class | Implementation |
+| `Target` | Model | Deployment target entity |
+| `TargetType` | Enum | docker_host, compose_host, ecs_service, nomad_job |
+| `TargetHealthChecker` | Class | Health monitoring |
+
+### 103_003: Agent Manager - Core
+
+| Deliverable | Type | Description |
+|-------------|------|-------------|
+| `IAgentManager` | Interface | Agent lifecycle |
+| `AgentManager` | Class | Implementation |
+| `AgentRegistration` | Flow | One-time token registration |
+| `AgentCertificateService` | Class | mTLS certificate issuance |
+| `HeartbeatProcessor` | Class | Process agent heartbeats |
+
+### 103_004: Inventory Sync
+
+| Deliverable | Type | Description |
+|-------------|------|-------------|
+| `IInventorySyncService` | Interface | Sync operations |
+| `InventorySyncService` | Class | Implementation |
+| `InventorySnapshot` | Model | Container state snapshot |
+| `DriftDetector` | Class | Detect configuration drift |
+
+---
+
+## Key Interfaces
+
+```csharp
+public interface IEnvironmentService
+{
+    Task<Environment> CreateAsync(CreateEnvironmentRequest request, CancellationToken ct);
+    Task<Environment> UpdateAsync(Guid id, UpdateEnvironmentRequest request, CancellationToken ct);
+    Task DeleteAsync(Guid id, CancellationToken ct);
+    Task<Environment?> GetAsync(Guid id, CancellationToken ct);
+    Task<IReadOnlyList<Environment>> ListAsync(CancellationToken ct);
+    Task ReorderAsync(IReadOnlyList<Guid> orderedIds, CancellationToken ct);
+    Task<bool> IsFrozenAsync(Guid id, CancellationToken ct);
+}
+
+public interface ITargetRegistry
+{
+    Task<Target> RegisterAsync(RegisterTargetRequest request, CancellationToken ct);
+    Task<Target> UpdateAsync(Guid id, UpdateTargetRequest request, CancellationToken ct);
+    Task UnregisterAsync(Guid id, CancellationToken ct);
+    Task<Target?> GetAsync(Guid id, CancellationToken ct);
+    Task<IReadOnlyList<Target>> ListByEnvironmentAsync(Guid environmentId, CancellationToken ct);
+    Task UpdateHealthAsync(Guid id, HealthStatus status, CancellationToken ct);
+}
+
+public interface IAgentManager
+{
+    Task<RegistrationToken> CreateRegistrationTokenAsync(CreateTokenRequest request, CancellationToken ct);
+    Task<Agent> RegisterAsync(AgentRegistrationRequest request, CancellationToken ct);
+    Task ProcessHeartbeatAsync(AgentHeartbeat heartbeat, CancellationToken ct);
+    Task<Agent?> GetAsync(Guid id, CancellationToken ct);
+    Task RevokeAsync(Guid id, CancellationToken ct);
+}
+```
+
+---
+
+## Dependencies
+
+### External Dependencies
+
+| Dependency | Purpose |
+|------------|---------|
+| PostgreSQL 16+ | Database |
+| gRPC | Agent communication |
+
+### Internal Dependencies
+
+| Module | Purpose |
+|--------|---------|
+| 101_001 Database Schema | Tables |
+| Authority | Tenant context, PKI |
+
+---
+
+## Acceptance Criteria
+
+- [ ] Environment CRUD with ordering
+- [ ] Freeze window blocks deployments
+- [ ] Target types validated
+- [ ] Agent registration flow works
+- [ ] mTLS certificates issued
+- [ ] Heartbeats update status
+- [ ] Inventory snapshot captured
+- [ ] Drift detection works
+- [ ] Unit test coverage ≥80%
+
+---
+
+## Execution Log
+
+| Date | Entry |
+|------|-------|
+| 10-Jan-2026 | Phase 3 index created |
diff --git a/docs/implplan/SPRINT_20260110_103_001_ENVMGR_environment.md b/docs/implplan/SPRINT_20260110_103_001_ENVMGR_environment.md
new file mode 100644
index 000000000..ef571958e
--- /dev/null
+++ b/docs/implplan/SPRINT_20260110_103_001_ENVMGR_environment.md
@@ -0,0 +1,415 @@
+# SPRINT: Environment CRUD
+
+> **Sprint ID:** 103_001
+> **Module:** ENVMGR
+> **Phase:** 3 - Environment Manager
+> **Status:** TODO
+> **Parent:** [103_000_INDEX](SPRINT_20260110_103_000_INDEX_environment_manager.md)
+
+---
+
+## Overview
+
+Implement Environment CRUD operations including promotion order management, freeze windows, and auto-promotion configuration.
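+
+The promotion order above can be sketched as a pure function over `OrderIndex`. This is an illustrative sketch only: `Env` and `NextTarget` are hypothetical stand-ins for this sprint's `Environment` model and `GetNextPromotionTargetAsync`, which resolve the next environment from the store rather than from an in-memory list.
+
+```csharp
+// Illustrative sketch: resolve the next promotion target by OrderIndex.
+// `Env` is a hypothetical stand-in for the Environment model.
+using System;
+using System.Collections.Generic;
+using System.Linq;
+
+public sealed record Env(Guid Id, string Name, int OrderIndex);
+
+public static class PromotionOrder
+{
+    // The next target is the environment with the smallest OrderIndex
+    // strictly greater than the current one; null when current is last.
+    public static Env? NextTarget(IReadOnlyList<Env> all, Guid currentId)
+    {
+        var current = all.Single(e => e.Id == currentId);
+        return all
+            .Where(e => e.OrderIndex > current.OrderIndex)
+            .OrderBy(e => e.OrderIndex)
+            .FirstOrDefault();
+    }
+}
+```
+
+For example, with dev (0), stage (1), and prod (2), `NextTarget` from dev yields stage, while from prod it yields `null`, i.e. there is nowhere left to promote.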
+ +### Objectives + +- Create/Read/Update/Delete environments +- Manage promotion order (Dev → Stage → Prod) +- Configure freeze windows for deployment blocks +- Set up auto-promotion rules between environments + +### Working Directory + +``` +src/ReleaseOrchestrator/ +├── __Libraries/ +│ └── StellaOps.ReleaseOrchestrator.Environment/ +│ ├── Services/ +│ │ ├── IEnvironmentService.cs +│ │ ├── EnvironmentService.cs +│ │ └── EnvironmentValidator.cs +│ ├── Store/ +│ │ ├── IEnvironmentStore.cs +│ │ ├── EnvironmentStore.cs +│ │ └── EnvironmentMapper.cs +│ ├── FreezeWindow/ +│ │ ├── IFreezeWindowService.cs +│ │ ├── FreezeWindowService.cs +│ │ └── FreezeWindowChecker.cs +│ ├── Models/ +│ │ ├── Environment.cs +│ │ ├── FreezeWindow.cs +│ │ ├── EnvironmentConfig.cs +│ │ └── PromotionPolicy.cs +│ └── Events/ +│ ├── EnvironmentCreated.cs +│ ├── EnvironmentUpdated.cs +│ └── FreezeWindowActivated.cs +└── __Tests/ + └── StellaOps.ReleaseOrchestrator.Environment.Tests/ +``` + +--- + +## Architecture Reference + +- [Environment Manager](../modules/release-orchestrator/modules/environment-manager.md) +- [Data Model Schema](../modules/release-orchestrator/data-model/schema.md) + +--- + +## Deliverables + +### IEnvironmentService Interface + +```csharp +namespace StellaOps.ReleaseOrchestrator.Environment.Services; + +public interface IEnvironmentService +{ + Task CreateAsync(CreateEnvironmentRequest request, CancellationToken ct = default); + Task UpdateAsync(Guid id, UpdateEnvironmentRequest request, CancellationToken ct = default); + Task DeleteAsync(Guid id, CancellationToken ct = default); + Task GetAsync(Guid id, CancellationToken ct = default); + Task GetByNameAsync(string name, CancellationToken ct = default); + Task> ListAsync(CancellationToken ct = default); + Task> ListOrderedAsync(CancellationToken ct = default); + Task ReorderAsync(IReadOnlyList orderedIds, CancellationToken ct = default); + Task GetNextPromotionTargetAsync(Guid environmentId, CancellationToken ct = 
default); +} + +public sealed record CreateEnvironmentRequest( + string Name, + string DisplayName, + string? Description, + int OrderIndex, + bool IsProduction, + int RequiredApprovals, + bool RequireSeparationOfDuties, + Guid? AutoPromoteFrom, + int DeploymentTimeoutSeconds +); + +public sealed record UpdateEnvironmentRequest( + string? DisplayName = null, + string? Description = null, + int? OrderIndex = null, + bool? IsProduction = null, + int? RequiredApprovals = null, + bool? RequireSeparationOfDuties = null, + Guid? AutoPromoteFrom = null, + int? DeploymentTimeoutSeconds = null +); +``` + +### Environment Model + +```csharp +namespace StellaOps.ReleaseOrchestrator.Environment.Models; + +public sealed record Environment +{ + public required Guid Id { get; init; } + public required Guid TenantId { get; init; } + public required string Name { get; init; } + public required string DisplayName { get; init; } + public string? Description { get; init; } + public required int OrderIndex { get; init; } + public required bool IsProduction { get; init; } + public required int RequiredApprovals { get; init; } + public required bool RequireSeparationOfDuties { get; init; } + public Guid? 
AutoPromoteFrom { get; init; } + public required int DeploymentTimeoutSeconds { get; init; } + public DateTimeOffset CreatedAt { get; init; } + public DateTimeOffset UpdatedAt { get; init; } + public Guid CreatedBy { get; init; } +} +``` + +### IFreezeWindowService Interface + +```csharp +namespace StellaOps.ReleaseOrchestrator.Environment.FreezeWindow; + +public interface IFreezeWindowService +{ + Task CreateAsync(CreateFreezeWindowRequest request, CancellationToken ct = default); + Task UpdateAsync(Guid id, UpdateFreezeWindowRequest request, CancellationToken ct = default); + Task DeleteAsync(Guid id, CancellationToken ct = default); + Task> ListByEnvironmentAsync(Guid environmentId, CancellationToken ct = default); + Task IsEnvironmentFrozenAsync(Guid environmentId, CancellationToken ct = default); + Task GetActiveFreezeWindowAsync(Guid environmentId, CancellationToken ct = default); + Task GrantExemptionAsync(Guid freezeWindowId, GrantExemptionRequest request, CancellationToken ct = default); +} + +public sealed record FreezeWindow +{ + public required Guid Id { get; init; } + public required Guid EnvironmentId { get; init; } + public required string Name { get; init; } + public required DateTimeOffset StartAt { get; init; } + public required DateTimeOffset EndAt { get; init; } + public string? Reason { get; init; } + public bool IsRecurring { get; init; } + public string? RecurrenceRule { get; init; } // iCal RRULE format + public DateTimeOffset CreatedAt { get; init; } + public Guid CreatedBy { get; init; } +} + +public sealed record CreateFreezeWindowRequest( + Guid EnvironmentId, + string Name, + DateTimeOffset StartAt, + DateTimeOffset EndAt, + string? Reason, + bool IsRecurring = false, + string? 
RecurrenceRule = null +); +``` + +### EnvironmentValidator + +```csharp +namespace StellaOps.ReleaseOrchestrator.Environment.Services; + +public sealed class EnvironmentValidator +{ + private readonly IEnvironmentStore _store; + + public EnvironmentValidator(IEnvironmentStore store) + { + _store = store; + } + + public async Task ValidateCreateAsync( + CreateEnvironmentRequest request, + CancellationToken ct = default) + { + var errors = new List(); + + // Name format validation + if (!IsValidEnvironmentName(request.Name)) + { + errors.Add("Environment name must be lowercase alphanumeric with hyphens, 2-32 characters"); + } + + // Check for duplicate name + var existing = await _store.GetByNameAsync(request.Name, ct); + if (existing is not null) + { + errors.Add($"Environment with name '{request.Name}' already exists"); + } + + // Check for duplicate order index + var existingOrder = await _store.GetByOrderIndexAsync(request.OrderIndex, ct); + if (existingOrder is not null) + { + errors.Add($"Environment with order index {request.OrderIndex} already exists"); + } + + // Validate auto-promote reference + if (request.AutoPromoteFrom.HasValue) + { + var sourceEnv = await _store.GetAsync(request.AutoPromoteFrom.Value, ct); + if (sourceEnv is null) + { + errors.Add("Auto-promote source environment not found"); + } + else if (sourceEnv.OrderIndex >= request.OrderIndex) + { + errors.Add("Auto-promote source must have lower order index (earlier in pipeline)"); + } + } + + // Production environment validation + if (request.IsProduction && request.RequiredApprovals < 1) + { + errors.Add("Production environments must require at least 1 approval"); + } + + return errors.Count == 0 + ? 
ValidationResult.Success() + : ValidationResult.Failure(errors); + } + + public async Task ValidateReorderAsync( + IReadOnlyList orderedIds, + CancellationToken ct = default) + { + var errors = new List(); + var allEnvironments = await _store.ListAsync(ct); + + // Check all environments are included + var existingIds = allEnvironments.Select(e => e.Id).ToHashSet(); + var providedIds = orderedIds.ToHashSet(); + + if (!existingIds.SetEquals(providedIds)) + { + errors.Add("Reorder must include all existing environments exactly once"); + } + + // Check no duplicates + if (orderedIds.Count != orderedIds.Distinct().Count()) + { + errors.Add("Reorder list contains duplicate environment IDs"); + } + + // Validate auto-promote chains don't break + foreach (var env in allEnvironments.Where(e => e.AutoPromoteFrom.HasValue)) + { + var sourceIndex = orderedIds.ToList().IndexOf(env.AutoPromoteFrom!.Value); + var targetIndex = orderedIds.ToList().IndexOf(env.Id); + + if (sourceIndex >= targetIndex) + { + errors.Add($"Reorder would break auto-promote chain: {env.Name} must come after its source"); + } + } + + return errors.Count == 0 + ? 
ValidationResult.Success() + : ValidationResult.Failure(errors); + } + + private static bool IsValidEnvironmentName(string name) => + Regex.IsMatch(name, @"^[a-z][a-z0-9-]{1,31}$"); +} +``` + +### Domain Events + +```csharp +namespace StellaOps.ReleaseOrchestrator.Environment.Events; + +public sealed record EnvironmentCreated( + Guid EnvironmentId, + Guid TenantId, + string Name, + int OrderIndex, + bool IsProduction, + DateTimeOffset CreatedAt, + Guid CreatedBy +) : IDomainEvent; + +public sealed record EnvironmentUpdated( + Guid EnvironmentId, + Guid TenantId, + IReadOnlyList ChangedFields, + DateTimeOffset UpdatedAt, + Guid UpdatedBy +) : IDomainEvent; + +public sealed record EnvironmentDeleted( + Guid EnvironmentId, + Guid TenantId, + string Name, + DateTimeOffset DeletedAt, + Guid DeletedBy +) : IDomainEvent; + +public sealed record FreezeWindowActivated( + Guid FreezeWindowId, + Guid EnvironmentId, + Guid TenantId, + DateTimeOffset StartAt, + DateTimeOffset EndAt, + string? Reason +) : IDomainEvent; + +public sealed record FreezeWindowDeactivated( + Guid FreezeWindowId, + Guid EnvironmentId, + Guid TenantId, + DateTimeOffset EndedAt +) : IDomainEvent; +``` + +### Documentation Deliverables + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `docs/modules/release-orchestrator/api/environments.md` (partial) | Markdown | API endpoint documentation for environment management (CRUD, freeze windows) | + +--- + +## Acceptance Criteria + +### Code +- [ ] Create environment with all fields +- [ ] Update environment preserves audit fields +- [ ] Delete environment checks for targets/releases +- [ ] List environments returns ordered by OrderIndex +- [ ] Reorder validates chain integrity +- [ ] Auto-promote reference validated +- [ ] Freeze window blocks deployments +- [ ] Freeze window exemptions work +- [ ] Recurring freeze windows calculated correctly +- [ ] Domain events published +- [ ] Unit test coverage ≥85% + +### Documentation +- [ ] 
API documentation created for environment endpoints +- [ ] All environment CRUD endpoints documented with request/response schemas +- [ ] Freeze window endpoints documented +- [ ] Cross-references to environment-manager.md added + +--- + +## Test Plan + +### Unit Tests + +| Test | Description | +|------|-------------| +| `CreateEnvironment_ValidRequest_Succeeds` | Valid creation works | +| `CreateEnvironment_DuplicateName_Fails` | Duplicate name rejected | +| `CreateEnvironment_DuplicateOrder_Fails` | Duplicate order rejected | +| `UpdateEnvironment_ValidRequest_Succeeds` | Update works | +| `DeleteEnvironment_WithTargets_Fails` | Cannot delete with children | +| `Reorder_AllEnvironments_Succeeds` | Reorder works | +| `Reorder_BreaksAutoPromote_Fails` | Chain validation works | +| `IsFrozen_ActiveWindow_ReturnsTrue` | Freeze detection works | +| `IsFrozen_WithExemption_ReturnsFalse` | Exemption works | + +### Integration Tests + +| Test | Description | +|------|-------------| +| `EnvironmentLifecycle_E2E` | Full CRUD cycle | +| `FreezeWindowRecurrence_E2E` | Recurring windows | + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 101_001 Database Schema | Internal | TODO | +| Authority | Internal | Exists | +| ICal.Net | NuGet | Available | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| IEnvironmentService | TODO | | +| EnvironmentService | TODO | | +| EnvironmentValidator | TODO | | +| IFreezeWindowService | TODO | | +| FreezeWindowService | TODO | | +| FreezeWindowChecker | TODO | | +| Domain events | TODO | | +| Unit tests | TODO | | +| Integration tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | +| 11-Jan-2026 | Added documentation deliverable: api/environments.md (partial) | diff --git a/docs/implplan/SPRINT_20260110_103_002_ENVMGR_target_registry.md 
b/docs/implplan/SPRINT_20260110_103_002_ENVMGR_target_registry.md new file mode 100644 index 000000000..302226cd2 --- /dev/null +++ b/docs/implplan/SPRINT_20260110_103_002_ENVMGR_target_registry.md @@ -0,0 +1,421 @@ +# SPRINT: Target Registry + +> **Sprint ID:** 103_002 +> **Module:** ENVMGR +> **Phase:** 3 - Environment Manager +> **Status:** TODO +> **Parent:** [103_000_INDEX](SPRINT_20260110_103_000_INDEX_environment_manager.md) + +--- + +## Overview + +Implement the Target Registry for managing deployment targets within environments. Targets represent where containers are deployed (Docker hosts, Compose hosts, ECS services, Nomad jobs). + +### Objectives + +- Register deployment targets in environments +- Support multiple target types (docker_host, compose_host, ecs_service, nomad_job) +- Validate target connection configurations +- Track target health status +- Manage target-agent associations + +### Working Directory + +``` +src/ReleaseOrchestrator/ +├── __Libraries/ +│ └── StellaOps.ReleaseOrchestrator.Environment/ +│ ├── Target/ +│ │ ├── ITargetRegistry.cs +│ │ ├── TargetRegistry.cs +│ │ ├── TargetValidator.cs +│ │ └── TargetConnectionTester.cs +│ ├── Store/ +│ │ ├── ITargetStore.cs +│ │ └── TargetStore.cs +│ ├── Health/ +│ │ ├── ITargetHealthChecker.cs +│ │ ├── TargetHealthChecker.cs +│ │ └── HealthCheckScheduler.cs +│ └── Models/ +│ ├── Target.cs +│ ├── TargetType.cs +│ ├── TargetConfig.cs +│ └── TargetHealthStatus.cs +└── __Tests/ + └── StellaOps.ReleaseOrchestrator.Environment.Tests/ + └── Target/ +``` + +--- + +## Architecture Reference + +- [Environment Manager](../modules/release-orchestrator/modules/environment-manager.md) +- [Agents](../modules/release-orchestrator/modules/agents.md) + +--- + +## Deliverables + +### ITargetRegistry Interface + +```csharp +namespace StellaOps.ReleaseOrchestrator.Environment.Target; + +public interface ITargetRegistry +{ + Task RegisterAsync(RegisterTargetRequest request, CancellationToken ct = default); + Task 
UpdateAsync(Guid id, UpdateTargetRequest request, CancellationToken ct = default); + Task UnregisterAsync(Guid id, CancellationToken ct = default); + Task GetAsync(Guid id, CancellationToken ct = default); + Task GetByNameAsync(Guid environmentId, string name, CancellationToken ct = default); + Task> ListByEnvironmentAsync(Guid environmentId, CancellationToken ct = default); + Task> ListByAgentAsync(Guid agentId, CancellationToken ct = default); + Task> ListHealthyAsync(Guid environmentId, CancellationToken ct = default); + Task AssignAgentAsync(Guid targetId, Guid agentId, CancellationToken ct = default); + Task UnassignAgentAsync(Guid targetId, CancellationToken ct = default); + Task UpdateHealthAsync(Guid id, HealthStatus status, string? message, CancellationToken ct = default); + Task TestConnectionAsync(Guid id, CancellationToken ct = default); +} + +public sealed record RegisterTargetRequest( + Guid EnvironmentId, + string Name, + string DisplayName, + TargetType Type, + TargetConnectionConfig ConnectionConfig, + Guid? AgentId = null +); + +public sealed record UpdateTargetRequest( + string? DisplayName = null, + TargetConnectionConfig? ConnectionConfig = null, + Guid? AgentId = null +); +``` + +### Target Model + +```csharp +namespace StellaOps.ReleaseOrchestrator.Environment.Models; + +public sealed record Target +{ + public required Guid Id { get; init; } + public required Guid TenantId { get; init; } + public required Guid EnvironmentId { get; init; } + public required string Name { get; init; } + public required string DisplayName { get; init; } + public required TargetType Type { get; init; } + public Guid? AgentId { get; init; } + public required HealthStatus HealthStatus { get; init; } + public string? HealthMessage { get; init; } + public DateTimeOffset? LastHealthCheck { get; init; } + public DateTimeOffset? LastSyncAt { get; init; } + public InventorySnapshot? 
InventorySnapshot { get; init; } + public DateTimeOffset CreatedAt { get; init; } + public DateTimeOffset UpdatedAt { get; init; } +} + +public enum TargetType +{ + DockerHost, + ComposeHost, + EcsService, + NomadJob +} + +public enum HealthStatus +{ + Unknown, + Healthy, + Degraded, + Unhealthy, + Unreachable +} +``` + +### TargetConnectionConfig + +```csharp +namespace StellaOps.ReleaseOrchestrator.Environment.Models; + +public abstract record TargetConnectionConfig +{ + public abstract TargetType TargetType { get; } +} + +public sealed record DockerHostConfig : TargetConnectionConfig +{ + public override TargetType TargetType => TargetType.DockerHost; + public required string Host { get; init; } + public int Port { get; init; } = 2376; + public bool UseTls { get; init; } = true; + public string? CaCertSecretRef { get; init; } + public string? ClientCertSecretRef { get; init; } + public string? ClientKeySecretRef { get; init; } +} + +public sealed record ComposeHostConfig : TargetConnectionConfig +{ + public override TargetType TargetType => TargetType.ComposeHost; + public required string Host { get; init; } + public int Port { get; init; } = 2376; + public bool UseTls { get; init; } = true; + public required string ComposeProjectPath { get; init; } + public string? ComposeFile { get; init; } = "docker-compose.yml"; + public string? CaCertSecretRef { get; init; } + public string? ClientCertSecretRef { get; init; } + public string? ClientKeySecretRef { get; init; } +} + +public sealed record EcsServiceConfig : TargetConnectionConfig +{ + public override TargetType TargetType => TargetType.EcsService; + public required string Region { get; init; } + public required string ClusterArn { get; init; } + public required string ServiceName { get; init; } + public string? RoleArn { get; init; } + public string? AccessKeyIdSecretRef { get; init; } + public string? 
SecretAccessKeySecretRef { get; init; } +} + +public sealed record NomadJobConfig : TargetConnectionConfig +{ + public override TargetType TargetType => TargetType.NomadJob; + public required string Address { get; init; } + public required string Namespace { get; init; } + public required string JobId { get; init; } + public string? TokenSecretRef { get; init; } + public bool UseTls { get; init; } = true; +} +``` + +### TargetHealthChecker + +```csharp +namespace StellaOps.ReleaseOrchestrator.Environment.Health; + +public interface ITargetHealthChecker +{ + Task CheckAsync(Target target, CancellationToken ct = default); +} + +public sealed class TargetHealthChecker : ITargetHealthChecker +{ + private readonly ITargetConnectionTester _connectionTester; + private readonly IAgentManager _agentManager; + private readonly ILogger _logger; + + public async Task CheckAsync( + Target target, + CancellationToken ct = default) + { + var sw = Stopwatch.StartNew(); + + try + { + // If target has assigned agent, check via agent + if (target.AgentId.HasValue) + { + return await CheckViaAgentAsync(target, ct); + } + + // Otherwise, check directly + return await CheckDirectlyAsync(target, ct); + } + catch (Exception ex) + { + _logger.LogWarning(ex, + "Health check failed for target {TargetId}", + target.Id); + + return new HealthCheckResult( + Status: HealthStatus.Unreachable, + Message: ex.Message, + Duration: sw.Elapsed, + CheckedAt: TimeProvider.System.GetUtcNow() + ); + } + } + + private async Task CheckViaAgentAsync( + Target target, + CancellationToken ct) + { + var agent = await _agentManager.GetAsync(target.AgentId!.Value, ct); + if (agent is null || agent.Status != AgentStatus.Active) + { + return new HealthCheckResult( + Status: HealthStatus.Unreachable, + Message: "Assigned agent is not active", + Duration: TimeSpan.Zero, + CheckedAt: TimeProvider.System.GetUtcNow() + ); + } + + // Dispatch health check task to agent + var result = await _agentManager.ExecuteTaskAsync( 
+ target.AgentId!.Value, + new HealthCheckTask(target.Id, target.Type), + ct); + + return ParseAgentHealthResult(result); + } + + private async Task CheckDirectlyAsync( + Target target, + CancellationToken ct) + { + var testResult = await _connectionTester.TestAsync(target, ct); + + return new HealthCheckResult( + Status: testResult.Success ? HealthStatus.Healthy : HealthStatus.Unreachable, + Message: testResult.Message, + Duration: testResult.Duration, + CheckedAt: TimeProvider.System.GetUtcNow() + ); + } +} + +public sealed record HealthCheckResult( + HealthStatus Status, + string? Message, + TimeSpan Duration, + DateTimeOffset CheckedAt +); +``` + +### HealthCheckScheduler + +```csharp +namespace StellaOps.ReleaseOrchestrator.Environment.Health; + +public sealed class HealthCheckScheduler : IHostedService, IDisposable +{ + private readonly ITargetRegistry _targetRegistry; + private readonly ITargetHealthChecker _healthChecker; + private readonly ILogger _logger; + private readonly TimeSpan _checkInterval = TimeSpan.FromMinutes(1); + private Timer? _timer; + + public Task StartAsync(CancellationToken ct) + { + _timer = new Timer( + DoHealthChecks, + null, + TimeSpan.FromSeconds(30), + _checkInterval); + + return Task.CompletedTask; + } + + public Task StopAsync(CancellationToken ct) + { + _timer?.Change(Timeout.Infinite, 0); + return Task.CompletedTask; + } + + private async void DoHealthChecks(object? 
state) + { + try + { + var environments = await _environmentService.ListAsync(); + + foreach (var env in environments) + { + var targets = await _targetRegistry.ListByEnvironmentAsync(env.Id); + + foreach (var target in targets) + { + try + { + var result = await _healthChecker.CheckAsync(target); + await _targetRegistry.UpdateHealthAsync( + target.Id, + result.Status, + result.Message); + } + catch (Exception ex) + { + _logger.LogError(ex, + "Failed to update health for target {TargetId}", + target.Id); + } + } + } + } + catch (Exception ex) + { + _logger.LogError(ex, "Health check scheduler failed"); + } + } + + public void Dispose() => _timer?.Dispose(); +} +``` + +### Documentation Deliverables + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `docs/modules/release-orchestrator/api/environments.md` (partial) | Markdown | API endpoint documentation for target management (target groups, targets, health) | + +--- + +## Acceptance Criteria + +### Code +- [ ] Register target with connection config +- [ ] Update target preserves encrypted config +- [ ] Unregister checks for active deployments +- [ ] List targets by environment works +- [ ] List healthy targets filters correctly +- [ ] Assign/unassign agent works +- [ ] Connection test validates config +- [ ] Health check updates status +- [ ] Scheduled health checks run +- [ ] Unit test coverage ≥85% + +### Documentation +- [ ] API documentation created for target endpoints +- [ ] All target CRUD endpoints documented +- [ ] Target group endpoints documented +- [ ] Health check endpoints documented + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 103_001 Environment CRUD | Internal | TODO | +| 101_001 Database Schema | Internal | TODO | +| Docker.DotNet | NuGet | Available | +| AWSSDK.ECS | NuGet | Available | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| ITargetRegistry | TODO | | +| 
TargetRegistry | TODO | | +| TargetValidator | TODO | | +| TargetConnectionTester | TODO | | +| ITargetHealthChecker | TODO | | +| TargetHealthChecker | TODO | | +| HealthCheckScheduler | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | +| 11-Jan-2026 | Added documentation deliverable: api/environments.md (partial - targets) | diff --git a/docs/implplan/SPRINT_20260110_103_003_ENVMGR_agent_manager.md b/docs/implplan/SPRINT_20260110_103_003_ENVMGR_agent_manager.md new file mode 100644 index 000000000..a60f8787d --- /dev/null +++ b/docs/implplan/SPRINT_20260110_103_003_ENVMGR_agent_manager.md @@ -0,0 +1,554 @@ +# SPRINT: Agent Manager - Core + +> **Sprint ID:** 103_003 +> **Module:** ENVMGR +> **Phase:** 3 - Environment Manager +> **Status:** TODO +> **Parent:** [103_000_INDEX](SPRINT_20260110_103_000_INDEX_environment_manager.md) + +--- + +## Overview + +Implement the Agent Manager for registering, authenticating, and managing deployment agents. Agents are secure executors that run on target hosts. 
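The lifecycle states this sprint manages (pending → active → stale/inactive → revoked) can be captured as a small transition table. The transition set below is an illustrative sketch distilled from the objectives and the `AgentStatus` enum, not a specification:

```csharp
using System;
using System.Collections.Generic;

public enum AgentStatus { Pending, Active, Inactive, Stale, Revoked }

public static class AgentLifecycle
{
    // Transition table implied by the sprint objectives; treating Revoked as
    // terminal is an assumption ("permanently disabled").
    private static readonly IReadOnlyDictionary<AgentStatus, AgentStatus[]> Allowed =
        new Dictionary<AgentStatus, AgentStatus[]>
        {
            [AgentStatus.Pending]  = [AgentStatus.Active, AgentStatus.Revoked],
            [AgentStatus.Active]   = [AgentStatus.Inactive, AgentStatus.Stale, AgentStatus.Revoked],
            [AgentStatus.Stale]    = [AgentStatus.Active, AgentStatus.Revoked],
            [AgentStatus.Inactive] = [AgentStatus.Active, AgentStatus.Revoked],
            [AgentStatus.Revoked]  = [],
        };

    public static bool CanTransition(AgentStatus from, AgentStatus to) =>
        Array.IndexOf(Allowed[from], to) >= 0;
}
```

Guarding every status write with a check like this keeps invalid flows (e.g. reactivating a revoked agent) out of the store regardless of which code path performs the update.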
+ +### Objectives + +- One-time token generation for agent registration +- Agent registration with certificate issuance +- Heartbeat processing and status tracking +- Agent capability registration +- Agent lifecycle management (active/inactive/revoked) + +### Working Directory + +``` +src/ReleaseOrchestrator/ +├── __Libraries/ +│ └── StellaOps.ReleaseOrchestrator.Agent/ +│ ├── Manager/ +│ │ ├── IAgentManager.cs +│ │ ├── AgentManager.cs +│ │ └── AgentValidator.cs +│ ├── Registration/ +│ │ ├── IAgentRegistration.cs +│ │ ├── AgentRegistration.cs +│ │ ├── RegistrationTokenService.cs +│ │ └── RegistrationToken.cs +│ ├── Certificate/ +│ │ ├── IAgentCertificateService.cs +│ │ ├── AgentCertificateService.cs +│ │ └── CertificateTemplate.cs +│ ├── Heartbeat/ +│ │ ├── IHeartbeatProcessor.cs +│ │ ├── HeartbeatProcessor.cs +│ │ └── HeartbeatTimeoutMonitor.cs +│ ├── Capability/ +│ │ ├── AgentCapability.cs +│ │ └── CapabilityRegistry.cs +│ └── Models/ +│ ├── Agent.cs +│ ├── AgentStatus.cs +│ └── AgentHeartbeat.cs +└── __Tests/ +``` + +--- + +## Architecture Reference + +- [Agent Security](../modules/release-orchestrator/security/agent-security.md) +- [Agents](../modules/release-orchestrator/modules/agents.md) + +--- + +## Deliverables + +### IAgentManager Interface + +```csharp +namespace StellaOps.ReleaseOrchestrator.Agent.Manager; + +public interface IAgentManager +{ + // Registration + Task CreateRegistrationTokenAsync( + CreateRegistrationTokenRequest request, + CancellationToken ct = default); + + Task RegisterAsync( + AgentRegistrationRequest request, + CancellationToken ct = default); + + // Lifecycle + Task GetAsync(Guid id, CancellationToken ct = default); + Task GetByNameAsync(string name, CancellationToken ct = default); + Task> ListAsync(AgentFilter? 
filter = null, CancellationToken ct = default); + Task<IReadOnlyList<Agent>> ListActiveAsync(CancellationToken ct = default); + Task ActivateAsync(Guid id, CancellationToken ct = default); + Task DeactivateAsync(Guid id, CancellationToken ct = default); + Task MarkStaleAsync(Guid id, CancellationToken ct = default); + Task RevokeAsync(Guid id, string reason, CancellationToken ct = default); + + // Heartbeat + Task ProcessHeartbeatAsync(AgentHeartbeat heartbeat, CancellationToken ct = default); + + // Certificate + Task RenewCertificateAsync(Guid id, CancellationToken ct = default); + + // Task execution + Task ExecuteTaskAsync( + Guid agentId, + AgentTask task, + CancellationToken ct = default); +} + +public sealed record CreateRegistrationTokenRequest( + string AgentName, + string DisplayName, + IReadOnlyList<AgentCapability> Capabilities, + TimeSpan? ValidFor = null +); + +public sealed record AgentRegistrationRequest( + string Token, + string AgentVersion, + string Hostname, + IReadOnlyDictionary<string, string> Labels +); +``` + +### Agent Model + +```csharp +namespace StellaOps.ReleaseOrchestrator.Agent.Models; + +public sealed record Agent +{ + public required Guid Id { get; init; } + public required Guid TenantId { get; init; } + public required string Name { get; init; } + public required string DisplayName { get; init; } + public required string Version { get; init; } + public string? Hostname { get; init; } + public required AgentStatus Status { get; init; } + public required ImmutableArray<AgentCapability> Capabilities { get; init; } + public ImmutableDictionary<string, string> Labels { get; init; } = ImmutableDictionary<string, string>.Empty; + public string? CertificateThumbprint { get; init; } + public DateTimeOffset? CertificateExpiresAt { get; init; } + public DateTimeOffset? LastHeartbeatAt { get; init; } + public AgentResourceStatus? LastResourceStatus { get; init; } + public DateTimeOffset?
RegisteredAt { get; init; } + public DateTimeOffset CreatedAt { get; init; } + public DateTimeOffset UpdatedAt { get; init; } +} + +public enum AgentStatus +{ + Pending, // Token created, not yet registered + Active, // Registered and healthy + Inactive, // Manually deactivated + Stale, // Missed heartbeats + Revoked // Permanently disabled +} + +public enum AgentCapability +{ + Docker, + Compose, + Ssh, + WinRm +} + +public sealed record AgentResourceStatus( + double CpuPercent, + long MemoryUsedBytes, + long MemoryTotalBytes, + long DiskUsedBytes, + long DiskTotalBytes +); +``` + +### RegistrationTokenService + +```csharp +namespace StellaOps.ReleaseOrchestrator.Agent.Registration; + +public sealed class RegistrationTokenService +{ + private readonly IAgentStore _store; + private readonly TimeProvider _timeProvider; + private readonly IGuidGenerator _guidGenerator; + + private static readonly TimeSpan DefaultTokenValidity = TimeSpan.FromHours(24); + + public async Task CreateAsync( + CreateRegistrationTokenRequest request, + CancellationToken ct = default) + { + // Validate agent name is unique + var existing = await _store.GetByNameAsync(request.AgentName, ct); + if (existing is not null) + { + throw new AgentAlreadyExistsException(request.AgentName); + } + + var token = GenerateSecureToken(); + var validity = request.ValidFor ?? 
DefaultTokenValidity; + var expiresAt = _timeProvider.GetUtcNow().Add(validity); + + var registrationToken = new RegistrationToken + { + Id = _guidGenerator.NewGuid(), + Token = token, + AgentName = request.AgentName, + DisplayName = request.DisplayName, + Capabilities = request.Capabilities.ToImmutableArray(), + ExpiresAt = expiresAt, + CreatedAt = _timeProvider.GetUtcNow(), + IsUsed = false + }; + + await _store.SaveRegistrationTokenAsync(registrationToken, ct); + + return registrationToken; + } + + public async Task ValidateAndConsumeAsync( + string token, + CancellationToken ct = default) + { + var registrationToken = await _store.GetRegistrationTokenAsync(token, ct); + + if (registrationToken is null) + { + return null; + } + + if (registrationToken.IsUsed) + { + throw new RegistrationTokenAlreadyUsedException(token); + } + + if (registrationToken.ExpiresAt < _timeProvider.GetUtcNow()) + { + throw new RegistrationTokenExpiredException(token); + } + + // Mark as used + await _store.MarkRegistrationTokenUsedAsync(registrationToken.Id, ct); + + return registrationToken; + } + + private static string GenerateSecureToken() + { + var bytes = RandomNumberGenerator.GetBytes(32); + return Convert.ToBase64String(bytes) + .Replace("+", "-") + .Replace("/", "_") + .TrimEnd('='); + } +} + +public sealed record RegistrationToken +{ + public required Guid Id { get; init; } + public required string Token { get; init; } + public required string AgentName { get; init; } + public required string DisplayName { get; init; } + public required ImmutableArray Capabilities { get; init; } + public required DateTimeOffset ExpiresAt { get; init; } + public required DateTimeOffset CreatedAt { get; init; } + public required bool IsUsed { get; init; } +} +``` + +### AgentCertificateService + +```csharp +namespace StellaOps.ReleaseOrchestrator.Agent.Certificate; + +public interface IAgentCertificateService +{ + Task IssueAsync(Agent agent, CancellationToken ct = default); + Task 
RenewAsync(Agent agent, CancellationToken ct = default); + Task RevokeAsync(Agent agent, CancellationToken ct = default); + Task ValidateAsync(string thumbprint, CancellationToken ct = default); +} + +public sealed class AgentCertificateService : IAgentCertificateService +{ + private readonly ICertificateAuthority _ca; + private readonly IAgentStore _store; + private readonly TimeProvider _timeProvider; + + private static readonly TimeSpan CertificateValidity = TimeSpan.FromHours(24); + + public async Task IssueAsync( + Agent agent, + CancellationToken ct = default) + { + var now = _timeProvider.GetUtcNow(); + var notAfter = now.Add(CertificateValidity); + + var subject = new X500DistinguishedName( + $"CN={agent.Name}, O=StellaOps Agent, OU={agent.TenantId}"); + + var certificate = await _ca.IssueCertificateAsync( + subject: subject, + notBefore: now, + notAfter: notAfter, + keyUsage: X509KeyUsageFlags.DigitalSignature | X509KeyUsageFlags.KeyEncipherment, + extendedKeyUsage: [Oids.ClientAuthentication], + ct: ct); + + var agentCertificate = new AgentCertificate + { + Thumbprint = certificate.Thumbprint, + SubjectName = certificate.Subject, + NotBefore = now, + NotAfter = notAfter, + CertificatePem = certificate.ExportCertificatePem(), + PrivateKeyPem = certificate.GetRSAPrivateKey()!.ExportRSAPrivateKeyPem() + }; + + // Update agent with new certificate + await _store.UpdateCertificateAsync( + agent.Id, + agentCertificate.Thumbprint, + notAfter, + ct); + + return agentCertificate; + } + + public async Task RenewAsync( + Agent agent, + CancellationToken ct = default) + { + // Revoke old certificate if exists + if (!string.IsNullOrEmpty(agent.CertificateThumbprint)) + { + await _ca.RevokeCertificateAsync(agent.CertificateThumbprint, ct); + } + + // Issue new certificate + return await IssueAsync(agent, ct); + } + + public async Task RevokeAsync(Agent agent, CancellationToken ct = default) + { + if (!string.IsNullOrEmpty(agent.CertificateThumbprint)) + { + await 
_ca.RevokeCertificateAsync(agent.CertificateThumbprint, ct); + await _store.ClearCertificateAsync(agent.Id, ct); + } + } +} + +public sealed record AgentCertificate +{ + public required string Thumbprint { get; init; } + public required string SubjectName { get; init; } + public required DateTimeOffset NotBefore { get; init; } + public required DateTimeOffset NotAfter { get; init; } + public required string CertificatePem { get; init; } + public required string PrivateKeyPem { get; init; } +} +``` + +### HeartbeatProcessor + +```csharp +namespace StellaOps.ReleaseOrchestrator.Agent.Heartbeat; + +public interface IHeartbeatProcessor +{ + Task ProcessAsync(AgentHeartbeat heartbeat, CancellationToken ct = default); +} + +public sealed class HeartbeatProcessor : IHeartbeatProcessor +{ + private readonly IAgentStore _store; + private readonly TimeProvider _timeProvider; + private readonly ILogger _logger; + + public async Task ProcessAsync( + AgentHeartbeat heartbeat, + CancellationToken ct = default) + { + var agent = await _store.GetAsync(heartbeat.AgentId, ct); + if (agent is null) + { + _logger.LogWarning( + "Received heartbeat from unknown agent {AgentId}", + heartbeat.AgentId); + return; + } + + if (agent.Status == AgentStatus.Revoked) + { + _logger.LogWarning( + "Received heartbeat from revoked agent {AgentName}", + agent.Name); + return; + } + + // Update last heartbeat + await _store.UpdateHeartbeatAsync( + heartbeat.AgentId, + _timeProvider.GetUtcNow(), + heartbeat.ResourceStatus, + ct); + + // If agent was stale, reactivate it + if (agent.Status == AgentStatus.Stale) + { + await _store.UpdateStatusAsync( + heartbeat.AgentId, + AgentStatus.Active, + ct); + + _logger.LogInformation( + "Agent {AgentName} recovered from stale state", + agent.Name); + } + } +} + +public sealed record AgentHeartbeat( + Guid AgentId, + string Version, + AgentResourceStatus ResourceStatus, + IReadOnlyList RunningTasks, + DateTimeOffset Timestamp +); +``` + +### 
HeartbeatTimeoutMonitor + +```csharp +namespace StellaOps.ReleaseOrchestrator.Agent.Heartbeat; + +public sealed class HeartbeatTimeoutMonitor : IHostedService, IDisposable +{ + private readonly IAgentManager _agentManager; + private readonly ILogger _logger; + private readonly TimeSpan _checkInterval = TimeSpan.FromSeconds(30); + private readonly TimeSpan _heartbeatTimeout = TimeSpan.FromMinutes(2); + private Timer? _timer; + + public Task StartAsync(CancellationToken ct) + { + _timer = new Timer( + CheckForTimeouts, + null, + TimeSpan.FromMinutes(1), + _checkInterval); + + return Task.CompletedTask; + } + + public Task StopAsync(CancellationToken ct) + { + _timer?.Change(Timeout.Infinite, 0); + return Task.CompletedTask; + } + + private async void CheckForTimeouts(object? state) + { + try + { + var agents = await _agentManager.ListActiveAsync(); + var now = TimeProvider.System.GetUtcNow(); + + foreach (var agent in agents) + { + if (agent.LastHeartbeatAt is null) + continue; + + var timeSinceHeartbeat = now - agent.LastHeartbeatAt.Value; + + if (timeSinceHeartbeat > _heartbeatTimeout) + { + _logger.LogWarning( + "Agent {AgentName} missed heartbeat (last: {LastHeartbeat})", + agent.Name, + agent.LastHeartbeatAt); + + await _agentManager.MarkStaleAsync(agent.Id); + } + } + } + catch (Exception ex) + { + _logger.LogError(ex, "Heartbeat timeout check failed"); + } + } + + public void Dispose() => _timer?.Dispose(); +} +``` + +### Documentation Deliverables + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `docs/modules/release-orchestrator/api/agents.md` | Markdown | API endpoint documentation for agent registration, heartbeat, task management | + +--- + +## Acceptance Criteria + +### Code +- [ ] Registration token created with expiry +- [ ] Token can only be used once +- [ ] Agent registered with certificate +- [ ] mTLS certificate issued correctly +- [ ] Certificate renewed before expiry +- [ ] Heartbeat updates agent status +- [ ] 
Stale agents detected after timeout +- [ ] Revoked agents cannot send heartbeats +- [ ] Agent capabilities stored correctly +- [ ] Unit test coverage ≥85% + +### Documentation +- [ ] API documentation file created (api/agents.md) +- [ ] Agent registration endpoint documented +- [ ] Heartbeat endpoint documented +- [ ] Task endpoints documented +- [ ] mTLS flow documented + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 103_002 Target Registry | Internal | TODO | +| 101_001 Database Schema | Internal | TODO | +| Authority | Internal | Exists | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| IAgentManager | TODO | | +| AgentManager | TODO | | +| RegistrationTokenService | TODO | | +| IAgentCertificateService | TODO | | +| AgentCertificateService | TODO | | +| HeartbeatProcessor | TODO | | +| HeartbeatTimeoutMonitor | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | +| 11-Jan-2026 | Added documentation deliverable: api/agents.md | diff --git a/docs/implplan/SPRINT_20260110_103_004_ENVMGR_inventory_sync.md b/docs/implplan/SPRINT_20260110_103_004_ENVMGR_inventory_sync.md new file mode 100644 index 000000000..ae5d2b3e7 --- /dev/null +++ b/docs/implplan/SPRINT_20260110_103_004_ENVMGR_inventory_sync.md @@ -0,0 +1,385 @@ +# SPRINT: Inventory Sync + +> **Sprint ID:** 103_004 +> **Module:** ENVMGR +> **Phase:** 3 - Environment Manager +> **Status:** TODO +> **Parent:** [103_000_INDEX](SPRINT_20260110_103_000_INDEX_environment_manager.md) + +--- + +## Overview + +Implement Inventory Sync for capturing current container state from targets and detecting configuration drift. 
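Drift detection ultimately reduces to comparing image digests, and the digest a runtime reports may be formatted differently from the one recorded at deploy time (with or without a repository prefix, different case). A minimal normalization sketch; the formatting rules handled here are assumptions, not part of the spec:

```csharp
using System;

public static class DigestCompare
{
    // Reduces "registry/repo@sha256:abc…", "sha256:abc…", or bare hex to lowercase
    // hex so equality checks are not fooled by formatting differences (assumed rules).
    public static string Normalize(string digestRef)
    {
        var at = digestRef.IndexOf('@');
        var d = at >= 0 ? digestRef[(at + 1)..] : digestRef;
        const string prefix = "sha256:";
        return (d.StartsWith(prefix, StringComparison.OrdinalIgnoreCase) ? d[prefix.Length..] : d)
            .ToLowerInvariant();
    }

    public static bool SameImage(string expected, string actual) =>
        Normalize(expected) == Normalize(actual);
}
```

Normalizing before comparison avoids false `DigestMismatch` drift items when the agent and the release record happen to serialize the same digest differently.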
+ +### Objectives + +- Pull current container state from targets +- Create inventory snapshots +- Detect drift from expected/deployed state +- Support scheduled and on-demand sync + +### Working Directory + +``` +src/ReleaseOrchestrator/ +├── __Libraries/ +│ └── StellaOps.ReleaseOrchestrator.Environment/ +│ └── Inventory/ +│ ├── IInventorySyncService.cs +│ ├── InventorySyncService.cs +│ ├── InventoryCollector.cs +│ ├── DriftDetector.cs +│ ├── InventorySnapshot.cs +│ └── SyncScheduler.cs +└── __Tests/ +``` + +--- + +## Deliverables + +### IInventorySyncService Interface + +```csharp +namespace StellaOps.ReleaseOrchestrator.Environment.Inventory; + +public interface IInventorySyncService +{ + Task SyncTargetAsync(Guid targetId, CancellationToken ct = default); + Task> SyncEnvironmentAsync(Guid environmentId, CancellationToken ct = default); + Task GetLatestSnapshotAsync(Guid targetId, CancellationToken ct = default); + Task DetectDriftAsync(Guid targetId, CancellationToken ct = default); + Task DetectDriftAsync(Guid targetId, Guid releaseId, CancellationToken ct = default); +} +``` + +### InventorySnapshot Model + +```csharp +namespace StellaOps.ReleaseOrchestrator.Environment.Inventory; + +public sealed record InventorySnapshot +{ + public required Guid Id { get; init; } + public required Guid TargetId { get; init; } + public required DateTimeOffset CollectedAt { get; init; } + public required ImmutableArray Containers { get; init; } + public required ImmutableArray Networks { get; init; } + public required ImmutableArray Volumes { get; init; } + public string? CollectionError { get; init; } +} + +public sealed record ContainerInfo( + string Id, + string Name, + string Image, + string ImageDigest, + string Status, + ImmutableDictionary Labels, + ImmutableArray Ports, + DateTimeOffset CreatedAt, + DateTimeOffset? 
StartedAt +); + +public sealed record NetworkInfo( + string Id, + string Name, + string Driver, + ImmutableArray ConnectedContainers +); + +public sealed record VolumeInfo( + string Name, + string Driver, + string Mountpoint +); + +public sealed record PortMapping( + int PrivatePort, + int? PublicPort, + string Type +); +``` + +### DriftDetector + +```csharp +namespace StellaOps.ReleaseOrchestrator.Environment.Inventory; + +public sealed class DriftDetector +{ + public DriftReport Detect( + InventorySnapshot currentState, + ExpectedState expectedState) + { + var drifts = new List(); + + // Check for missing containers + foreach (var expected in expectedState.Containers) + { + var actual = currentState.Containers + .FirstOrDefault(c => c.Name == expected.Name); + + if (actual is null) + { + drifts.Add(new DriftItem( + Type: DriftType.Missing, + Resource: "container", + Name: expected.Name, + Expected: expected.ImageDigest, + Actual: null, + Message: $"Container '{expected.Name}' not found" + )); + continue; + } + + // Check digest mismatch + if (actual.ImageDigest != expected.ImageDigest) + { + drifts.Add(new DriftItem( + Type: DriftType.DigestMismatch, + Resource: "container", + Name: expected.Name, + Expected: expected.ImageDigest, + Actual: actual.ImageDigest, + Message: $"Container '{expected.Name}' has different image digest" + )); + } + + // Check status + if (actual.Status != "running") + { + drifts.Add(new DriftItem( + Type: DriftType.StatusMismatch, + Resource: "container", + Name: expected.Name, + Expected: "running", + Actual: actual.Status, + Message: $"Container '{expected.Name}' is not running" + )); + } + } + + // Check for unexpected containers + var expectedNames = expectedState.Containers.Select(c => c.Name).ToHashSet(); + foreach (var actual in currentState.Containers) + { + if (!expectedNames.Contains(actual.Name) && + !IsSystemContainer(actual.Name)) + { + drifts.Add(new DriftItem( + Type: DriftType.Unexpected, + Resource: "container", + Name: 
actual.Name, + Expected: null, + Actual: actual.ImageDigest, + Message: $"Unexpected container '{actual.Name}' found" + )); + } + } + + return new DriftReport( + TargetId: currentState.TargetId, + DetectedAt: TimeProvider.System.GetUtcNow(), + HasDrift: drifts.Count > 0, + Drifts: drifts.ToImmutableArray() + ); + } + + private static bool IsSystemContainer(string name) => + name.StartsWith("stella-agent") || + name.StartsWith("k8s_") || + name.StartsWith("rancher-"); +} + +public sealed record DriftReport( + Guid TargetId, + DateTimeOffset DetectedAt, + bool HasDrift, + ImmutableArray Drifts +); + +public sealed record DriftItem( + DriftType Type, + string Resource, + string Name, + string? Expected, + string? Actual, + string Message +); + +public enum DriftType +{ + Missing, + Unexpected, + DigestMismatch, + StatusMismatch, + ConfigMismatch +} +``` + +### InventoryCollector + +```csharp +namespace StellaOps.ReleaseOrchestrator.Environment.Inventory; + +public sealed class InventoryCollector +{ + private readonly IAgentManager _agentManager; + private readonly ILogger _logger; + + public async Task CollectAsync( + Target target, + CancellationToken ct = default) + { + var sw = Stopwatch.StartNew(); + + try + { + if (target.AgentId is null) + { + throw new InvalidOperationException( + $"Target {target.Name} has no assigned agent"); + } + + var agent = await _agentManager.GetAsync(target.AgentId.Value, ct); + if (agent?.Status != AgentStatus.Active) + { + throw new InvalidOperationException( + $"Agent for target {target.Name} is not active"); + } + + // Dispatch inventory collection task to agent + var result = await _agentManager.ExecuteTaskAsync( + target.AgentId.Value, + new InventoryCollectionTask(target.Id, target.Type), + ct); + + return ParseInventoryResult(target.Id, result); + } + catch (Exception ex) + { + _logger.LogError(ex, + "Failed to collect inventory from target {TargetName}", + target.Name); + + return new InventorySnapshot + { + Id = 
Guid.NewGuid(), + TargetId = target.Id, + CollectedAt = TimeProvider.System.GetUtcNow(), + Containers = [], + Networks = [], + Volumes = [], + CollectionError = ex.Message + }; + } + } +} +``` + +### SyncScheduler + +```csharp +namespace StellaOps.ReleaseOrchestrator.Environment.Inventory; + +public sealed class SyncScheduler : IHostedService, IDisposable +{ + private readonly IInventorySyncService _syncService; + private readonly IEnvironmentService _environmentService; + private readonly ILogger _logger; + private readonly TimeSpan _syncInterval = TimeSpan.FromMinutes(5); + private Timer? _timer; + + public Task StartAsync(CancellationToken ct) + { + _timer = new Timer( + DoSync, + null, + TimeSpan.FromMinutes(2), + _syncInterval); + + return Task.CompletedTask; + } + + public Task StopAsync(CancellationToken ct) + { + _timer?.Change(Timeout.Infinite, 0); + return Task.CompletedTask; + } + + private async void DoSync(object? state) + { + try + { + var environments = await _environmentService.ListAsync(); + + foreach (var env in environments) + { + try + { + await _syncService.SyncEnvironmentAsync(env.Id); + } + catch (Exception ex) + { + _logger.LogError(ex, + "Failed to sync environment {EnvironmentName}", + env.Name); + } + } + } + catch (Exception ex) + { + _logger.LogError(ex, "Inventory sync scheduler failed"); + } + } + + public void Dispose() => _timer?.Dispose(); +} +``` + +--- + +## Acceptance Criteria + +- [ ] Collect inventory from Docker targets +- [ ] Collect inventory from Compose targets +- [ ] Store inventory snapshots +- [ ] Detect missing containers +- [ ] Detect digest mismatches +- [ ] Detect unexpected containers +- [ ] Generate drift report +- [ ] Scheduled sync runs periodically +- [ ] On-demand sync works +- [ ] Unit test coverage ≥85% + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 103_002 Target Registry | Internal | TODO | +| 103_003 Agent Manager | Internal | TODO | + +--- + +## Delivery 
Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| IInventorySyncService | TODO | | +| InventorySyncService | TODO | | +| InventoryCollector | TODO | | +| DriftDetector | TODO | | +| SyncScheduler | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_104_000_INDEX_release_manager.md b/docs/implplan/SPRINT_20260110_104_000_INDEX_release_manager.md new file mode 100644 index 000000000..fd0a97447 --- /dev/null +++ b/docs/implplan/SPRINT_20260110_104_000_INDEX_release_manager.md @@ -0,0 +1,200 @@ +# SPRINT INDEX: Phase 4 - Release Manager + +> **Epic:** Release Orchestrator +> **Phase:** 4 - Release Manager +> **Batch:** 104 +> **Status:** TODO +> **Parent:** [100_000_INDEX](SPRINT_20260110_100_000_INDEX_release_orchestrator.md) + +--- + +## Overview + +Phase 4 implements the Release Manager - handling components (container images), versions, release bundles, and the release catalog. 
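The status lifecycle in the diagram below (finalize, promote, then deployed or deprecated) is small enough to express as an exhaustive transition check. This is one possible shape for the `ReleaseStatusMachine` deliverable of 104_004, a sketch rather than its actual implementation:

```csharp
public enum ReleaseStatus { Draft, Ready, Promoting, Deployed, Deprecated }

public static class ReleaseTransitions
{
    // Mirrors the lifecycle diagram in this index: finalize (draft→ready),
    // promote (ready→promoting), then deployed or deprecated. The
    // deployed→deprecated edge is an assumption supporting the
    // "deprecation prevents promotion" acceptance criterion.
    public static bool CanTransition(ReleaseStatus from, ReleaseStatus to) => (from, to) switch
    {
        (ReleaseStatus.Draft, ReleaseStatus.Ready) => true,
        (ReleaseStatus.Ready, ReleaseStatus.Promoting) => true,
        (ReleaseStatus.Promoting, ReleaseStatus.Deployed) => true,
        (ReleaseStatus.Promoting, ReleaseStatus.Deprecated) => true,
        (ReleaseStatus.Deployed, ReleaseStatus.Deprecated) => true,
        _ => false,
    };
}
```

Because deprecated has no outgoing edges, a deprecated release can never re-enter the promoting state, which is exactly the invariant the catalog must enforce.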
+ +### Objectives + +- Component registry for tracking container images +- Version management with digest-first identity +- Release bundle creation (multiple components) +- Release catalog with status lifecycle + +--- + +## Sprint Structure + +| Sprint ID | Title | Module | Status | Dependencies | +|-----------|-------|--------|--------|--------------| +| 104_001 | Component Registry | RELMAN | TODO | 102_004 | +| 104_002 | Version Manager | RELMAN | TODO | 104_001 | +| 104_003 | Release Manager | RELMAN | TODO | 104_002 | +| 104_004 | Release Catalog | RELMAN | TODO | 104_003 | + +--- + +## Architecture + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ RELEASE MANAGER │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ COMPONENT REGISTRY (104_001) │ │ +│ │ │ │ +│ │ Component ──────────────────────────────────────────────┐ │ │ +│ │ │ id: UUID │ │ │ +│ │ │ name: "api" │ │ │ +│ │ │ registry: acr.example.io │ │ │ +│ │ │ repository: myorg/api │ │ │ +│ │ └───────────────────────────────────────────────────────┘ │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ VERSION MANAGER (104_002) │ │ +│ │ │ │ +│ │ ComponentVersion ───────────────────────────────────────┐ │ │ +│ │ │ digest: sha256:abc123... 
│ │ │ +│ │ │ semver: 2.3.1 │ ◄── SOURCE │ │ +│ │ │ tag: v2.3.1 │ OF TRUTH│ │ +│ │ │ discovered_at: 2026-01-10T10:00:00Z │ │ │ +│ │ └───────────────────────────────────────────────────────┘ │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ RELEASE MANAGER (104_003) │ │ +│ │ │ │ +│ │ Release Bundle ─────────────────────────────────────────┐ │ │ +│ │ │ name: "myapp-v2.3.1" │ │ │ +│ │ │ status: ready │ │ │ +│ │ │ components: [ │ │ │ +│ │ │ { component: api, version: sha256:abc... } │ │ │ +│ │ │ { component: worker, version: sha256:def... } │ │ │ +│ │ │ ] │ │ │ +│ │ └───────────────────────────────────────────────────────┘ │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ RELEASE CATALOG (104_004) │ │ +│ │ │ │ +│ │ Status Lifecycle: │ │ +│ │ ┌──────┐ finalize ┌───────┐ promote ┌──────────┐ │ │ +│ │ │draft │──────────►│ ready │─────────►│promoting │ │ │ +│ │ └──────┘ └───────┘ └────┬─────┘ │ │ +│ │ │ │ │ +│ │ ┌──────────┴──────────┐ │ │ +│ │ ▼ ▼ │ │ +│ │ ┌──────────┐ ┌────────────┐ │ │ +│ │ │ deployed │ │ deprecated │ │ │ +│ │ └──────────┘ └────────────┘ │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +--- + +## Deliverables Summary + +### 104_001: Component Registry + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `IComponentRegistry` | Interface | Component CRUD | +| `ComponentRegistry` | Class | Implementation | +| `Component` | Model | Container component entity | +| `ComponentDiscovery` | Class | Auto-discover from registry | + +### 104_002: Version Manager + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `IVersionManager` | Interface | 
Version tracking | +| `VersionManager` | Class | Implementation | +| `ComponentVersion` | Model | Digest-first version | +| `VersionResolver` | Class | Tag → Digest resolution | +| `VersionWatcher` | Service | Watch for new versions | + +### 104_003: Release Manager + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `IReleaseManager` | Interface | Release operations | +| `ReleaseManager` | Class | Implementation | +| `Release` | Model | Release bundle entity | +| `ReleaseComponent` | Model | Release-component mapping | +| `ReleaseFinalizer` | Class | Finalize and lock release | + +### 104_004: Release Catalog + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `IReleaseCatalog` | Interface | Catalog queries | +| `ReleaseCatalog` | Class | Implementation | +| `ReleaseStatusMachine` | Class | Status transitions | +| `ReleaseHistory` | Service | Track release history | + +--- + +## Key Interfaces + +```csharp +public interface IComponentRegistry +{ + Task CreateAsync(CreateComponentRequest request, CancellationToken ct); + Task GetAsync(Guid id, CancellationToken ct); + Task> ListAsync(CancellationToken ct); + Task> GetVersionsAsync(Guid componentId, CancellationToken ct); +} + +public interface IVersionManager +{ + Task ResolveAsync(Guid componentId, string tagOrDigest, CancellationToken ct); + Task GetByDigestAsync(Guid componentId, string digest, CancellationToken ct); + Task> ListLatestAsync(Guid componentId, int count, CancellationToken ct); +} + +public interface IReleaseManager +{ + Task CreateAsync(CreateReleaseRequest request, CancellationToken ct); + Task AddComponentAsync(Guid releaseId, AddReleaseComponentRequest request, CancellationToken ct); + Task FinalizeAsync(Guid releaseId, CancellationToken ct); + Task GetAsync(Guid releaseId, CancellationToken ct); +} + +public interface IReleaseCatalog +{ + Task> ListAsync(ReleaseFilter? 
filter, CancellationToken ct); + Task GetLatestDeployedAsync(Guid environmentId, CancellationToken ct); + Task GetHistoryAsync(Guid releaseId, CancellationToken ct); +} +``` + +--- + +## Dependencies + +| Module | Purpose | +|--------|---------| +| 102_004 Registry Connectors | Tag resolution | +| 101_001 Database Schema | Tables | + +--- + +## Acceptance Criteria + +- [ ] Component registration works +- [ ] Tag resolves to immutable digest +- [ ] Release bundles multiple components +- [ ] Release finalization locks versions +- [ ] Status transitions validated +- [ ] Release history tracked +- [ ] Deprecation prevents promotion +- [ ] Unit test coverage ≥80% + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Phase 4 index created | diff --git a/docs/implplan/SPRINT_20260110_104_001_RELMAN_component_registry.md b/docs/implplan/SPRINT_20260110_104_001_RELMAN_component_registry.md new file mode 100644 index 000000000..65b937df4 --- /dev/null +++ b/docs/implplan/SPRINT_20260110_104_001_RELMAN_component_registry.md @@ -0,0 +1,535 @@ +# SPRINT: Component Registry + +> **Sprint ID:** 104_001 +> **Module:** RELMAN +> **Phase:** 4 - Release Manager +> **Status:** TODO +> **Parent:** [104_000_INDEX](SPRINT_20260110_104_000_INDEX_release_manager.md) + +--- + +## Overview + +Implement the Component Registry for tracking container images as deployable components. 
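Component names end up embedded in image references and release bundles, so `ComponentValidator` will need a naming rule at registration time. A sketch under assumed constraints (lowercase, DNS-label style, 2–63 characters); the real validator's rules are not specified in this sprint:

```csharp
using System.Text.RegularExpressions;

public static class ComponentNameRules
{
    // Assumed constraint: lowercase alphanumerics and hyphens, starting and
    // ending with an alphanumeric, 2–63 characters total.
    private static readonly Regex Pattern =
        new(@"^[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?$", RegexOptions.Compiled);

    public static bool IsValidName(string name) =>
        name.Length >= 2 && name.Length <= 63 && Pattern.IsMatch(name);
}
```

Rejecting invalid names in `RegisterAsync` keeps the uniqueness lookup (`GetByNameAsync`) and any derived artifact names well-formed downstream.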
+ +### Objectives + +- Register container components with registry/repository metadata +- Discover components from connected registries +- Track component configurations and labels +- Support component lifecycle (active/deprecated) + +### Working Directory + +``` +src/ReleaseOrchestrator/ +├── __Libraries/ +│ └── StellaOps.ReleaseOrchestrator.Release/ +│ ├── Component/ +│ │ ├── IComponentRegistry.cs +│ │ ├── ComponentRegistry.cs +│ │ ├── ComponentValidator.cs +│ │ └── ComponentDiscovery.cs +│ ├── Store/ +│ │ ├── IComponentStore.cs +│ │ └── ComponentStore.cs +│ └── Models/ +│ ├── Component.cs +│ ├── ComponentConfig.cs +│ └── ComponentStatus.cs +└── __Tests/ + └── StellaOps.ReleaseOrchestrator.Release.Tests/ + └── Component/ +``` + +--- + +## Architecture Reference + +- [Release Manager](../modules/release-orchestrator/modules/release-manager.md) +- [Data Model Schema](../modules/release-orchestrator/data-model/schema.md) + +--- + +## Deliverables + +### IComponentRegistry Interface + +```csharp +namespace StellaOps.ReleaseOrchestrator.Release.Component; + +public interface IComponentRegistry +{ + Task<Component> RegisterAsync(RegisterComponentRequest request, CancellationToken ct = default); + Task<Component> UpdateAsync(Guid id, UpdateComponentRequest request, CancellationToken ct = default); + Task<Component?> GetAsync(Guid id, CancellationToken ct = default); + Task<Component?> GetByNameAsync(string name, CancellationToken ct = default); + Task<IReadOnlyList<Component>> ListAsync(ComponentFilter? filter = null, CancellationToken ct = default); + Task<IReadOnlyList<Component>> ListActiveAsync(CancellationToken ct = default); + Task DeprecateAsync(Guid id, string reason, CancellationToken ct = default); + Task ReactivateAsync(Guid id, CancellationToken ct = default); + Task DeleteAsync(Guid id, CancellationToken ct = default); +} + +public sealed record RegisterComponentRequest( + string Name, + string DisplayName, + string RegistryUrl, + string Repository, + string? Description = null, + IReadOnlyDictionary<string, string>? Labels = null, + ComponentConfig?
Config = null +); + +public sealed record UpdateComponentRequest( + string? DisplayName = null, + string? Description = null, + IReadOnlyDictionary<string, string>? Labels = null, + ComponentConfig? Config = null +); + +public sealed record ComponentFilter( + string? NameContains = null, + string? RegistryUrl = null, + ComponentStatus? Status = null, + IReadOnlyDictionary<string, string>? Labels = null +); +``` + +### Component Model + +```csharp +namespace StellaOps.ReleaseOrchestrator.Release.Models; + +public sealed record Component +{ + public required Guid Id { get; init; } + public required Guid TenantId { get; init; } + public required string Name { get; init; } + public required string DisplayName { get; init; } + public string? Description { get; init; } + public required string RegistryUrl { get; init; } + public required string Repository { get; init; } + public required ComponentStatus Status { get; init; } + public string? DeprecationReason { get; init; } + public ImmutableDictionary<string, string> Labels { get; init; } = ImmutableDictionary<string, string>.Empty; + public ComponentConfig? Config { get; init; } + public DateTimeOffset CreatedAt { get; init; } + public DateTimeOffset UpdatedAt { get; init; } + public Guid CreatedBy { get; init; } + + public string FullImageRef => $"{RegistryUrl}/{Repository}"; +} + +public enum ComponentStatus +{ + Active, + Deprecated +} + +public sealed record ComponentConfig +{ + public string? DefaultTag { get; init; } + public bool WatchForNewVersions { get; init; } = true; + public string? TagPattern { get; init; } // Regex for valid tags + public int?
RetainVersionCount { get; init; } + public ImmutableArray RequiredLabels { get; init; } = []; +} +``` + +### ComponentRegistry Implementation + +```csharp +namespace StellaOps.ReleaseOrchestrator.Release.Component; + +public sealed class ComponentRegistry : IComponentRegistry +{ + private readonly IComponentStore _store; + private readonly IComponentValidator _validator; + private readonly IRegistryConnectorFactory _registryFactory; + private readonly IEventPublisher _eventPublisher; + private readonly TimeProvider _timeProvider; + private readonly IGuidGenerator _guidGenerator; + private readonly ILogger _logger; + + public async Task RegisterAsync( + RegisterComponentRequest request, + CancellationToken ct = default) + { + // Validate request + var validation = await _validator.ValidateRegisterAsync(request, ct); + if (!validation.IsValid) + { + throw new ComponentValidationException(validation.Errors); + } + + // Verify registry connectivity + var connector = await _registryFactory.GetConnectorAsync(request.RegistryUrl, ct); + var exists = await connector.RepositoryExistsAsync(request.Repository, ct); + if (!exists) + { + throw new RepositoryNotFoundException(request.RegistryUrl, request.Repository); + } + + var component = new Component + { + Id = _guidGenerator.NewGuid(), + TenantId = _tenantContext.TenantId, + Name = request.Name, + DisplayName = request.DisplayName, + Description = request.Description, + RegistryUrl = request.RegistryUrl, + Repository = request.Repository, + Status = ComponentStatus.Active, + Labels = request.Labels?.ToImmutableDictionary() ?? 
ImmutableDictionary.Empty, + Config = request.Config, + CreatedAt = _timeProvider.GetUtcNow(), + UpdatedAt = _timeProvider.GetUtcNow(), + CreatedBy = _userContext.UserId + }; + + await _store.SaveAsync(component, ct); + + await _eventPublisher.PublishAsync(new ComponentRegistered( + component.Id, + component.TenantId, + component.Name, + component.FullImageRef, + _timeProvider.GetUtcNow() + ), ct); + + _logger.LogInformation( + "Registered component {ComponentName} ({FullRef})", + component.Name, + component.FullImageRef); + + return component; + } + + public async Task DeprecateAsync( + Guid id, + string reason, + CancellationToken ct = default) + { + var component = await _store.GetAsync(id, ct) + ?? throw new ComponentNotFoundException(id); + + if (component.Status == ComponentStatus.Deprecated) + { + return; + } + + var updated = component with + { + Status = ComponentStatus.Deprecated, + DeprecationReason = reason, + UpdatedAt = _timeProvider.GetUtcNow() + }; + + await _store.SaveAsync(updated, ct); + + await _eventPublisher.PublishAsync(new ComponentDeprecated( + component.Id, + component.TenantId, + component.Name, + reason, + _timeProvider.GetUtcNow() + ), ct); + } + + public async Task DeleteAsync(Guid id, CancellationToken ct = default) + { + var component = await _store.GetAsync(id, ct) + ?? 
throw new ComponentNotFoundException(id); + + // Check for existing releases using this component + var hasReleases = await _store.HasReleasesAsync(id, ct); + if (hasReleases) + { + throw new ComponentInUseException(id, + "Cannot delete component with existing releases"); + } + + await _store.DeleteAsync(id, ct); + + await _eventPublisher.PublishAsync(new ComponentDeleted( + component.Id, + component.TenantId, + component.Name, + _timeProvider.GetUtcNow() + ), ct); + } +} +``` + +### ComponentDiscovery + +```csharp +namespace StellaOps.ReleaseOrchestrator.Release.Component; + +public sealed class ComponentDiscovery +{ + private readonly IRegistryConnectorFactory _registryFactory; + private readonly IComponentRegistry _registry; + private readonly ILogger<ComponentDiscovery> _logger; + + public async Task<IReadOnlyList<DiscoveredComponent>> DiscoverAsync( + string registryUrl, + string repositoryPattern, + CancellationToken ct = default) + { + var connector = await _registryFactory.GetConnectorAsync(registryUrl, ct); + var repositories = await connector.ListRepositoriesAsync(repositoryPattern, ct); + + var discovered = new List<DiscoveredComponent>(); + + foreach (var repo in repositories) + { + var existing = await _registry.GetByNameAsync( + NormalizeComponentName(repo), ct); + + discovered.Add(new DiscoveredComponent( + RegistryUrl: registryUrl, + Repository: repo, + SuggestedName: NormalizeComponentName(repo), + AlreadyRegistered: existing is not null, + ExistingComponentId: existing?.Id + )); + } + + return discovered.AsReadOnly(); + } + + public async Task<IReadOnlyList<Component>> ImportDiscoveredAsync( + IReadOnlyList<DiscoveredComponent> components, + CancellationToken ct = default) + { + var imported = new List<Component>(); + + foreach (var discovered in components.Where(c => !c.AlreadyRegistered)) + { + try + { + var component = await _registry.RegisterAsync( + new RegisterComponentRequest( + Name: discovered.SuggestedName, + DisplayName: FormatDisplayName(discovered.SuggestedName), + RegistryUrl: discovered.RegistryUrl, + Repository: discovered.Repository + ), ct); + + 
imported.Add(component); + } + catch (Exception ex) + { + _logger.LogWarning(ex, + "Failed to import discovered component {Repository}", + discovered.Repository); + } + } + + return imported.AsReadOnly(); + } + + private static string NormalizeComponentName(string repository) => + repository.Replace("/", "-").ToLowerInvariant(); + + private static string FormatDisplayName(string name) => + CultureInfo.InvariantCulture.TextInfo.ToTitleCase( + name.Replace("-", " ").Replace("_", " ")); +} + +public sealed record DiscoveredComponent( + string RegistryUrl, + string Repository, + string SuggestedName, + bool AlreadyRegistered, + Guid? ExistingComponentId +); +``` + +### ComponentValidator + +```csharp +namespace StellaOps.ReleaseOrchestrator.Release.Component; + +public sealed class ComponentValidator : IComponentValidator +{ + private readonly IComponentStore _store; + + public async Task ValidateRegisterAsync( + RegisterComponentRequest request, + CancellationToken ct = default) + { + var errors = new List(); + + // Name validation + if (!IsValidComponentName(request.Name)) + { + errors.Add("Component name must be lowercase alphanumeric with hyphens, 2-64 characters"); + } + + // Check for duplicate name + var existing = await _store.GetByNameAsync(request.Name, ct); + if (existing is not null) + { + errors.Add($"Component with name '{request.Name}' already exists"); + } + + // Check for duplicate registry/repository combination + var duplicate = await _store.GetByRegistryAndRepositoryAsync( + request.RegistryUrl, + request.Repository, + ct); + if (duplicate is not null) + { + errors.Add($"Component already registered for {request.RegistryUrl}/{request.Repository}"); + } + + // Validate registry URL format + if (!Uri.TryCreate($"https://{request.RegistryUrl}", UriKind.Absolute, out _)) + { + errors.Add("Invalid registry URL format"); + } + + // Validate tag pattern if specified + if (request.Config?.TagPattern is not null) + { + try + { + _ = new 
Regex(request.Config.TagPattern); + } + catch (RegexParseException) + { + errors.Add("Invalid tag pattern regex"); + } + } + + return errors.Count == 0 + ? ValidationResult.Success() + : ValidationResult.Failure(errors); + } + + private static bool IsValidComponentName(string name) => + Regex.IsMatch(name, @"^[a-z][a-z0-9-]{1,63}$"); +} +``` + +### Domain Events + +```csharp +namespace StellaOps.ReleaseOrchestrator.Release.Events; + +public sealed record ComponentRegistered( + Guid ComponentId, + Guid TenantId, + string Name, + string FullImageRef, + DateTimeOffset RegisteredAt +) : IDomainEvent; + +public sealed record ComponentUpdated( + Guid ComponentId, + Guid TenantId, + IReadOnlyList ChangedFields, + DateTimeOffset UpdatedAt +) : IDomainEvent; + +public sealed record ComponentDeprecated( + Guid ComponentId, + Guid TenantId, + string Name, + string Reason, + DateTimeOffset DeprecatedAt +) : IDomainEvent; + +public sealed record ComponentDeleted( + Guid ComponentId, + Guid TenantId, + string Name, + DateTimeOffset DeletedAt +) : IDomainEvent; +``` + +### Documentation Deliverables + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `docs/modules/release-orchestrator/api/releases.md` (partial) | Markdown | API endpoint documentation for component registry (list, create, update components) | + +--- + +## Acceptance Criteria + +### Code +- [ ] Register component with registry/repository +- [ ] Validate registry connectivity on register +- [ ] Check for duplicate components +- [ ] List components with filters +- [ ] Deprecate component with reason +- [ ] Reactivate deprecated component +- [ ] Delete component (only if no releases) +- [ ] Discover components from registry +- [ ] Import discovered components +- [ ] Unit test coverage >=85% + +### Documentation +- [ ] Component API endpoints documented +- [ ] List/Get/Create/Update/Delete component endpoints included +- [ ] Component version strategy schema documented + +--- + +## Test 
Plan + +### Unit Tests + +| Test | Description | +|------|-------------| +| `RegisterComponent_ValidRequest_Succeeds` | Registration works | +| `RegisterComponent_DuplicateName_Fails` | Duplicate name rejected | +| `RegisterComponent_InvalidRegistry_Fails` | Bad registry rejected | +| `DeprecateComponent_Active_Succeeds` | Deprecation works | +| `DeleteComponent_WithReleases_Fails` | In-use check works | +| `DiscoverComponents_ReturnsRepositories` | Discovery works | +| `ImportDiscovered_CreatesComponents` | Import works | + +### Integration Tests + +| Test | Description | +|------|-------------| +| `ComponentLifecycle_E2E` | Full CRUD cycle | +| `RegistryDiscovery_E2E` | Discovery from real registry | + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 102_004 Registry Connectors | Internal | TODO | +| 101_001 Database Schema | Internal | TODO | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| IComponentRegistry | TODO | | +| ComponentRegistry | TODO | | +| ComponentValidator | TODO | | +| ComponentDiscovery | TODO | | +| IComponentStore | TODO | | +| ComponentStore | TODO | | +| Domain events | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | +| 11-Jan-2026 | Added documentation deliverable: api/releases.md (partial - components) | diff --git a/docs/implplan/SPRINT_20260110_104_002_RELMAN_version_manager.md b/docs/implplan/SPRINT_20260110_104_002_RELMAN_version_manager.md new file mode 100644 index 000000000..32bb8ccc8 --- /dev/null +++ b/docs/implplan/SPRINT_20260110_104_002_RELMAN_version_manager.md @@ -0,0 +1,541 @@ +# SPRINT: Version Manager + +> **Sprint ID:** 104_002 +> **Module:** RELMAN +> **Phase:** 4 - Release Manager +> **Status:** TODO +> **Parent:** [104_000_INDEX](SPRINT_20260110_104_000_INDEX_release_manager.md) + +--- + +## Overview + +Implement the Version Manager for 
digest-first version tracking of container images. + +### Objectives + +- Resolve tags to immutable digests +- Track component versions with metadata +- Watch for new versions from registries +- Support semantic versioning extraction + +### Working Directory + +``` +src/ReleaseOrchestrator/ +├── __Libraries/ +│ └── StellaOps.ReleaseOrchestrator.Release/ +│ ├── Version/ +│ │ ├── IVersionManager.cs +│ │ ├── VersionManager.cs +│ │ ├── VersionResolver.cs +│ │ ├── VersionWatcher.cs +│ │ └── SemVerExtractor.cs +│ ├── Store/ +│ │ ├── IVersionStore.cs +│ │ └── VersionStore.cs +│ └── Models/ +│ ├── ComponentVersion.cs +│ └── VersionMetadata.cs +└── __Tests/ + └── StellaOps.ReleaseOrchestrator.Release.Tests/ + └── Version/ +``` + +--- + +## Architecture Reference + +- [Release Manager](../modules/release-orchestrator/modules/release-manager.md) +- [Data Model Schema](../modules/release-orchestrator/data-model/schema.md) + +--- + +## Deliverables + +### IVersionManager Interface + +```csharp +namespace StellaOps.ReleaseOrchestrator.Release.Version; + +public interface IVersionManager +{ + Task<ComponentVersion> ResolveAsync( + Guid componentId, + string tagOrDigest, + CancellationToken ct = default); + + Task<ComponentVersion?> GetByDigestAsync( + Guid componentId, + string digest, + CancellationToken ct = default); + + Task<ComponentVersion?> GetLatestAsync( + Guid componentId, + CancellationToken ct = default); + + Task<IReadOnlyList<ComponentVersion>> ListAsync( + Guid componentId, + VersionFilter? filter = null, + CancellationToken ct = default); + + Task<IReadOnlyList<ComponentVersion>> ListLatestAsync( + Guid componentId, + int count = 10, + CancellationToken ct = default); + + Task<ComponentVersion> RecordVersionAsync( + RecordVersionRequest request, + CancellationToken ct = default); + + Task<bool> DigestExistsAsync( + Guid componentId, + string digest, + CancellationToken ct = default); +} + +public sealed record VersionFilter( + string? DigestPrefix = null, + string? TagContains = null, + SemanticVersion? MinVersion = null, + SemanticVersion? MaxVersion = null, + DateTimeOffset?
DiscoveredAfter = null, + DateTimeOffset? DiscoveredBefore = null +); + +public sealed record RecordVersionRequest( + Guid ComponentId, + string Digest, + string? Tag = null, + SemanticVersion? SemVer = null, + VersionMetadata? Metadata = null +); +``` + +### ComponentVersion Model + +```csharp +namespace StellaOps.ReleaseOrchestrator.Release.Models; + +public sealed record ComponentVersion +{ + public required Guid Id { get; init; } + public required Guid ComponentId { get; init; } + public required string Digest { get; init; } // sha256:abc123... + public string? Tag { get; init; } // v2.3.1, latest, etc. + public SemanticVersion? SemVer { get; init; } // Parsed semantic version + public VersionMetadata Metadata { get; init; } = new(); + public required DateTimeOffset DiscoveredAt { get; init; } + public DateTimeOffset? BuiltAt { get; init; } + public Guid? DiscoveredBy { get; init; } // User or system + + public string ShortDigest => Digest.Length > 19 + ? Digest[7..19] // sha256: prefix + 12 chars + : Digest; +} + +public sealed record SemanticVersion( + int Major, + int Minor, + int Patch, + string? Prerelease = null, + string? BuildMetadata = null +) : IComparable<SemanticVersion> +{ + public override string ToString() + { + var version = $"{Major}.{Minor}.{Patch}"; + if (Prerelease is not null) + version += $"-{Prerelease}"; + if (BuildMetadata is not null) + version += $"+{BuildMetadata}"; + return version; + } + + public int CompareTo(SemanticVersion?
other) + { + if (other is null) return 1; + + var majorCmp = Major.CompareTo(other.Major); + if (majorCmp != 0) return majorCmp; + + var minorCmp = Minor.CompareTo(other.Minor); + if (minorCmp != 0) return minorCmp; + + var patchCmp = Patch.CompareTo(other.Patch); + if (patchCmp != 0) return patchCmp; + + // Prerelease versions have lower precedence + if (Prerelease is null && other.Prerelease is not null) return 1; + if (Prerelease is not null && other.Prerelease is null) return -1; + + return string.Compare(Prerelease, other.Prerelease, + StringComparison.OrdinalIgnoreCase); + } +} + +public sealed record VersionMetadata +{ + public long? SizeBytes { get; init; } + public string? Architecture { get; init; } + public string? Os { get; init; } + public string? Author { get; init; } + public DateTimeOffset? CreatedAt { get; init; } + public ImmutableDictionary<string, string> Labels { get; init; } = + ImmutableDictionary<string, string>.Empty; + public ImmutableArray<LayerInfo> Layers { get; init; } = []; +} +``` + +### VersionResolver + +```csharp +namespace StellaOps.ReleaseOrchestrator.Release.Version; + +public sealed class VersionResolver +{ + private readonly IRegistryConnectorFactory _registryFactory; + private readonly IComponentStore _componentStore; + private readonly ILogger<VersionResolver> _logger; + + public async Task<ResolvedVersion> ResolveAsync( + Guid componentId, + string tagOrDigest, + CancellationToken ct = default) + { + var component = await _componentStore.GetAsync(componentId, ct) + ??
throw new ComponentNotFoundException(componentId); + + var connector = await _registryFactory.GetConnectorAsync( + component.RegistryUrl, ct); + + // Check if already a digest + if (IsDigest(tagOrDigest)) + { + var manifest = await connector.GetManifestAsync( + component.Repository, + tagOrDigest, + ct); + + return new ResolvedVersion( + Digest: tagOrDigest, + Tag: null, + Manifest: manifest, + ResolvedAt: TimeProvider.System.GetUtcNow() + ); + } + + // Resolve tag to digest + var tag = tagOrDigest; + var resolvedDigest = await connector.ResolveTagAsync( + component.Repository, + tag, + ct); + + var manifestData = await connector.GetManifestAsync( + component.Repository, + resolvedDigest, + ct); + + _logger.LogDebug( + "Resolved {Component}:{Tag} to {Digest}", + component.Name, + tag, + resolvedDigest); + + return new ResolvedVersion( + Digest: resolvedDigest, + Tag: tag, + Manifest: manifestData, + ResolvedAt: TimeProvider.System.GetUtcNow() + ); + } + + private static bool IsDigest(string value) => + value.StartsWith("sha256:", StringComparison.OrdinalIgnoreCase) && + value.Length == 71; // sha256: + 64 hex chars +} + +public sealed record ResolvedVersion( + string Digest, + string? Tag, + ManifestData Manifest, + DateTimeOffset ResolvedAt +); + +public sealed record ManifestData( + string MediaType, + long TotalSize, + string? Architecture, + string? Os, + IReadOnlyList Layers, + IReadOnlyDictionary Labels, + DateTimeOffset? 
CreatedAt +); + +public sealed record LayerInfo( + string Digest, + long Size, + string MediaType +); +``` + +### VersionWatcher + +```csharp +namespace StellaOps.ReleaseOrchestrator.Release.Version; + +public sealed class VersionWatcher : IHostedService, IDisposable +{ + private readonly IComponentRegistry _componentRegistry; + private readonly IVersionManager _versionManager; + private readonly IRegistryConnectorFactory _registryFactory; + private readonly IEventPublisher _eventPublisher; + private readonly ILogger _logger; + private readonly TimeSpan _pollInterval = TimeSpan.FromMinutes(5); + private Timer? _timer; + + public Task StartAsync(CancellationToken ct) + { + _timer = new Timer( + PollForNewVersions, + null, + TimeSpan.FromMinutes(1), + _pollInterval); + + return Task.CompletedTask; + } + + public Task StopAsync(CancellationToken ct) + { + _timer?.Change(Timeout.Infinite, 0); + return Task.CompletedTask; + } + + private async void PollForNewVersions(object? state) + { + try + { + var components = await _componentRegistry.ListActiveAsync(); + + foreach (var component in components) + { + if (component.Config?.WatchForNewVersions != true) + continue; + + await CheckForNewVersionsAsync(component); + } + } + catch (Exception ex) + { + _logger.LogError(ex, "Version watch poll failed"); + } + } + + private async Task CheckForNewVersionsAsync(Component component) + { + try + { + var connector = await _registryFactory.GetConnectorAsync( + component.RegistryUrl); + + var tags = await connector.ListTagsAsync( + component.Repository, + component.Config?.TagPattern); + + foreach (var tag in tags) + { + var digest = await connector.ResolveTagAsync( + component.Repository, + tag); + + var exists = await _versionManager.DigestExistsAsync( + component.Id, + digest); + + if (!exists) + { + var version = await _versionManager.RecordVersionAsync( + new RecordVersionRequest( + ComponentId: component.Id, + Digest: digest, + Tag: tag, + SemVer: SemVerExtractor.TryParse(tag) 
+ )); + + await _eventPublisher.PublishAsync(new NewVersionDiscovered( + component.Id, + component.TenantId, + component.Name, + version.Digest, + tag, + TimeProvider.System.GetUtcNow() + )); + + _logger.LogInformation( + "Discovered new version for {Component}: {Tag} ({Digest})", + component.Name, + tag, + version.ShortDigest); + } + } + } + catch (Exception ex) + { + _logger.LogWarning(ex, + "Failed to check versions for component {Component}", + component.Name); + } + } + + public void Dispose() => _timer?.Dispose(); +} +``` + +### SemVerExtractor + +```csharp +namespace StellaOps.ReleaseOrchestrator.Release.Version; + +public static class SemVerExtractor +{ + private static readonly Regex SemVerPattern = new( + @"^v?(?<major>\d+)\.(?<minor>\d+)\.(?<patch>\d+)" + + @"(?:-(?<prerelease>[0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*))?" + + @"(?:\+(?<build>[0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*))?$", + RegexOptions.Compiled | RegexOptions.CultureInvariant); + + public static SemanticVersion? TryParse(string? tag) + { + if (string.IsNullOrEmpty(tag)) + return null; + + var match = SemVerPattern.Match(tag); + if (!match.Success) + return null; + + return new SemanticVersion( + Major: int.Parse(match.Groups["major"].Value, CultureInfo.InvariantCulture), + Minor: int.Parse(match.Groups["minor"].Value, CultureInfo.InvariantCulture), + Patch: int.Parse(match.Groups["patch"].Value, CultureInfo.InvariantCulture), + Prerelease: match.Groups["prerelease"].Success + ? match.Groups["prerelease"].Value + : null, + BuildMetadata: match.Groups["build"].Success + ? match.Groups["build"].Value + : null + ); + } + + public static bool IsValidSemVer(string tag) => + SemVerPattern.IsMatch(tag); +} +``` + +### Domain Events + +```csharp +namespace StellaOps.ReleaseOrchestrator.Release.Events; + +public sealed record NewVersionDiscovered( + Guid ComponentId, + Guid TenantId, + string ComponentName, + string Digest, + string?
Tag, + DateTimeOffset DiscoveredAt +) : IDomainEvent; + +public sealed record VersionResolved( + Guid ComponentId, + Guid TenantId, + string Tag, + string Digest, + DateTimeOffset ResolvedAt +) : IDomainEvent; +``` + +### Documentation Deliverables + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `docs/modules/release-orchestrator/api/releases.md` (partial) | Markdown | API endpoint documentation for version resolution (tag to digest, version maps) | + +--- + +## Acceptance Criteria + +### Code +- [ ] Resolve tag to digest +- [ ] Resolve digest returns same digest +- [ ] Record new version with metadata +- [ ] Extract semantic version from tag +- [ ] Watch for new versions +- [ ] Filter versions by criteria +- [ ] Get latest version for component +- [ ] List versions with pagination +- [ ] Unit test coverage >=85% + +### Documentation +- [ ] Version API endpoints documented +- [ ] Tag resolution endpoint documented +- [ ] Version map listing documented +- [ ] Digest-first principle explained + +--- + +## Test Plan + +### Unit Tests + +| Test | Description | +|------|-------------| +| `ResolveTag_ReturnsDigest` | Tag resolution works | +| `ResolveDigest_ReturnsSameDigest` | Digest passthrough works | +| `RecordVersion_StoresMetadata` | Recording works | +| `SemVerExtractor_ParsesValid` | SemVer parsing works | +| `SemVerExtractor_RejectsInvalid` | Invalid tags rejected | +| `GetLatest_ReturnsNewest` | Latest selection works | +| `ListVersions_AppliesFilter` | Filtering works | + +### Integration Tests + +| Test | Description | +|------|-------------| +| `VersionResolution_E2E` | Full resolution flow | +| `VersionWatcher_E2E` | Discovery polling | + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 104_001 Component Registry | Internal | TODO | +| 102_004 Registry Connectors | Internal | TODO | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| 
+| IVersionManager | TODO | | +| VersionManager | TODO | | +| VersionResolver | TODO | | +| VersionWatcher | TODO | | +| SemVerExtractor | TODO | | +| ComponentVersion model | TODO | | +| IVersionStore | TODO | | +| VersionStore | TODO | | +| Domain events | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | +| 11-Jan-2026 | Added documentation deliverable: api/releases.md (partial - versions) | diff --git a/docs/implplan/SPRINT_20260110_104_003_RELMAN_release_manager.md b/docs/implplan/SPRINT_20260110_104_003_RELMAN_release_manager.md new file mode 100644 index 000000000..7814b824d --- /dev/null +++ b/docs/implplan/SPRINT_20260110_104_003_RELMAN_release_manager.md @@ -0,0 +1,643 @@ +# SPRINT: Release Manager + +> **Sprint ID:** 104_003 +> **Module:** RELMAN +> **Phase:** 4 - Release Manager +> **Status:** TODO +> **Parent:** [104_000_INDEX](SPRINT_20260110_104_000_INDEX_release_manager.md) + +--- + +## Overview + +Implement the Release Manager for creating and managing release bundles containing multiple component versions. 
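The release lifecycle this sprint builds on (Draft is editable; finalization locks component digests and moves the release to Ready; promotion and deployment follow; Deprecated releases must not be promoted) can be summarized as a small transition table. The Python sketch below is illustrative only — the sprint itself specifies just Draft → Ready (finalize) and deprecation, so the remaining transitions are an assumed reading of the `ReleaseStatus` enum, and the real deliverable is the C# `ReleaseStatusMachine`:

```python
from enum import Enum

class ReleaseStatus(Enum):
    DRAFT = "draft"          # components may still be added/removed
    READY = "ready"          # finalized: component digests are locked
    PROMOTING = "promoting"  # promotion in progress
    DEPLOYED = "deployed"    # deployed to at least one environment
    DEPRECATED = "deprecated"

# Assumed legal transitions (only Draft -> Ready and * -> Deprecated are
# fixed by this sprint; the rest is a plausible reading of the status list).
TRANSITIONS = {
    ReleaseStatus.DRAFT: {ReleaseStatus.READY},
    ReleaseStatus.READY: {ReleaseStatus.PROMOTING, ReleaseStatus.DEPRECATED},
    ReleaseStatus.PROMOTING: {ReleaseStatus.DEPLOYED, ReleaseStatus.READY},
    ReleaseStatus.DEPLOYED: {ReleaseStatus.PROMOTING, ReleaseStatus.DEPRECATED},
    ReleaseStatus.DEPRECATED: set(),
}

def can_transition(src: ReleaseStatus, dst: ReleaseStatus) -> bool:
    return dst in TRANSITIONS[src]

assert can_transition(ReleaseStatus.DRAFT, ReleaseStatus.READY)
# Finalization is one-way: a Ready release never returns to Draft.
assert not can_transition(ReleaseStatus.READY, ReleaseStatus.DRAFT)
```

The one-way Draft → Ready edge is what makes a finalized release a trustworthy promotion unit: after finalization, the component set and digests referenced by the manifest can no longer change.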
+ +### Objectives + +- Create release bundles with multiple components +- Add/remove components from draft releases +- Finalize releases to lock component versions +- Generate release manifests + +### Working Directory + +``` +src/ReleaseOrchestrator/ +├── __Libraries/ +│ └── StellaOps.ReleaseOrchestrator.Release/ +│ ├── Manager/ +│ │ ├── IReleaseManager.cs +│ │ ├── ReleaseManager.cs +│ │ ├── ReleaseValidator.cs +│ │ ├── ReleaseFinalizer.cs +│ │ └── ReleaseManifestGenerator.cs +│ ├── Store/ +│ │ ├── IReleaseStore.cs +│ │ └── ReleaseStore.cs +│ └── Models/ +│ ├── Release.cs +│ ├── ReleaseComponent.cs +│ ├── ReleaseStatus.cs +│ └── ReleaseManifest.cs +└── __Tests/ + └── StellaOps.ReleaseOrchestrator.Release.Tests/ + └── Manager/ +``` + +--- + +## Architecture Reference + +- [Release Manager](../modules/release-orchestrator/modules/release-manager.md) +- [Data Model Schema](../modules/release-orchestrator/data-model/schema.md) + +--- + +## Deliverables + +### IReleaseManager Interface + +```csharp +namespace StellaOps.ReleaseOrchestrator.Release.Manager; + +public interface IReleaseManager +{ + // CRUD + Task<Release> CreateAsync(CreateReleaseRequest request, CancellationToken ct = default); + Task<Release> UpdateAsync(Guid id, UpdateReleaseRequest request, CancellationToken ct = default); + Task<Release?> GetAsync(Guid id, CancellationToken ct = default); + Task<Release?> GetByNameAsync(string name, CancellationToken ct = default); + Task DeleteAsync(Guid id, CancellationToken ct = default); + + // Component management + Task<Release> AddComponentAsync(Guid releaseId, AddComponentRequest request, CancellationToken ct = default); + Task<Release> UpdateComponentAsync(Guid releaseId, Guid componentId, UpdateReleaseComponentRequest request, CancellationToken ct = default); + Task<Release> RemoveComponentAsync(Guid releaseId, Guid componentId, CancellationToken ct = default); + + // Lifecycle + Task<Release> FinalizeAsync(Guid id, CancellationToken ct = default); + Task DeprecateAsync(Guid id, string reason, CancellationToken ct = default); + + 
// Manifest + Task<ReleaseManifest> GetManifestAsync(Guid id, CancellationToken ct = default); +} + +public sealed record CreateReleaseRequest( + string Name, + string DisplayName, + string? Description = null, + IReadOnlyList<AddComponentRequest>? Components = null +); + +public sealed record UpdateReleaseRequest( + string? DisplayName = null, + string? Description = null +); + +public sealed record AddComponentRequest( + Guid ComponentId, + string VersionRef, // Tag or digest + IReadOnlyDictionary<string, string>? Config = null +); + +public sealed record UpdateReleaseComponentRequest( + string? VersionRef = null, + IReadOnlyDictionary<string, string>? Config = null +); +``` + +### Release Model + +```csharp +namespace StellaOps.ReleaseOrchestrator.Release.Models; + +public sealed record Release +{ + public required Guid Id { get; init; } + public required Guid TenantId { get; init; } + public required string Name { get; init; } + public required string DisplayName { get; init; } + public string? Description { get; init; } + public required ReleaseStatus Status { get; init; } + public required ImmutableArray<ReleaseComponent> Components { get; init; } + public string? ManifestDigest { get; init; } // Set on finalization + public DateTimeOffset? FinalizedAt { get; init; } + public Guid? FinalizedBy { get; init; } + public string? DeprecationReason { get; init; } + public DateTimeOffset?
DeprecatedAt { get; init; } + public DateTimeOffset CreatedAt { get; init; } + public DateTimeOffset UpdatedAt { get; init; } + public Guid CreatedBy { get; init; } + + public bool IsDraft => Status == ReleaseStatus.Draft; + public bool IsFinalized => Status != ReleaseStatus.Draft; +} + +public enum ReleaseStatus +{ + Draft, // Can be modified + Ready, // Finalized, can be promoted + Promoting, // Currently being promoted + Deployed, // Deployed to at least one environment + Deprecated // Should not be used +} + +public sealed record ReleaseComponent +{ + public required Guid Id { get; init; } + public required Guid ComponentId { get; init; } + public required string ComponentName { get; init; } + public required string Digest { get; init; } + public string? Tag { get; init; } + public string? SemVer { get; init; } + public ImmutableDictionary Config { get; init; } = + ImmutableDictionary.Empty; + public int OrderIndex { get; init; } +} +``` + +### ReleaseManager Implementation + +```csharp +namespace StellaOps.ReleaseOrchestrator.Release.Manager; + +public sealed class ReleaseManager : IReleaseManager +{ + private readonly IReleaseStore _store; + private readonly IReleaseValidator _validator; + private readonly IVersionManager _versionManager; + private readonly IComponentRegistry _componentRegistry; + private readonly IReleaseFinalizer _finalizer; + private readonly IEventPublisher _eventPublisher; + private readonly TimeProvider _timeProvider; + private readonly IGuidGenerator _guidGenerator; + private readonly ILogger _logger; + + public async Task CreateAsync( + CreateReleaseRequest request, + CancellationToken ct = default) + { + var validation = await _validator.ValidateCreateAsync(request, ct); + if (!validation.IsValid) + { + throw new ReleaseValidationException(validation.Errors); + } + + var release = new Release + { + Id = _guidGenerator.NewGuid(), + TenantId = _tenantContext.TenantId, + Name = request.Name, + DisplayName = request.DisplayName, + 
Description = request.Description, + Status = ReleaseStatus.Draft, + Components = [], + CreatedAt = _timeProvider.GetUtcNow(), + UpdatedAt = _timeProvider.GetUtcNow(), + CreatedBy = _userContext.UserId + }; + + await _store.SaveAsync(release, ct); + + // Add initial components if provided + if (request.Components?.Count > 0) + { + foreach (var compRequest in request.Components) + { + release = await AddComponentInternalAsync(release, compRequest, ct); + } + } + + await _eventPublisher.PublishAsync(new ReleaseCreated( + release.Id, + release.TenantId, + release.Name, + _timeProvider.GetUtcNow() + ), ct); + + return release; + } + + public async Task<Release> AddComponentAsync( + Guid releaseId, + AddComponentRequest request, + CancellationToken ct = default) + { + var release = await _store.GetAsync(releaseId, ct) + ?? throw new ReleaseNotFoundException(releaseId); + + if (!release.IsDraft) + { + throw new ReleaseNotEditableException(releaseId, + "Cannot modify finalized release"); + } + + return await AddComponentInternalAsync(release, request, ct); + } + + private async Task<Release> AddComponentInternalAsync( + Release release, + AddComponentRequest request, + CancellationToken ct) + { + // Check component exists + var component = await _componentRegistry.GetAsync(request.ComponentId, ct) + ?? throw new ComponentNotFoundException(request.ComponentId); + + // Check for duplicate component + if (release.Components.Any(c => c.ComponentId == request.ComponentId)) + { + throw new DuplicateReleaseComponentException(release.Id, request.ComponentId); + } + + // Resolve version + var version = await _versionManager.ResolveAsync( + request.ComponentId, + request.VersionRef, + ct); + + var releaseComponent = new ReleaseComponent + { + Id = _guidGenerator.NewGuid(), + ComponentId = component.Id, + ComponentName = component.Name, + Digest = version.Digest, + Tag = version.Tag, + SemVer = version.SemVer?.ToString(), + Config = request.Config?.ToImmutableDictionary() ??
+ ImmutableDictionary<string, string>.Empty, + OrderIndex = release.Components.Length + }; + + var updatedRelease = release with + { + Components = release.Components.Add(releaseComponent), + UpdatedAt = _timeProvider.GetUtcNow() + }; + + await _store.SaveAsync(updatedRelease, ct); + + _logger.LogInformation( + "Added component {Component}@{Digest} to release {Release}", + component.Name, + version.ShortDigest, + release.Name); + + return updatedRelease; + } + + public async Task<Release> FinalizeAsync( + Guid id, + CancellationToken ct = default) + { + var release = await _store.GetAsync(id, ct) + ?? throw new ReleaseNotFoundException(id); + + if (!release.IsDraft) + { + throw new ReleaseAlreadyFinalizedException(id); + } + + // Validate release is complete + var validation = await _validator.ValidateFinalizeAsync(release, ct); + if (!validation.IsValid) + { + throw new ReleaseValidationException(validation.Errors); + } + + // Generate manifest and digest + var (manifest, manifestDigest) = await _finalizer.FinalizeAsync(release, ct); + + var finalizedRelease = release with + { + Status = ReleaseStatus.Ready, + ManifestDigest = manifestDigest, + FinalizedAt = _timeProvider.GetUtcNow(), + FinalizedBy = _userContext.UserId, + UpdatedAt = _timeProvider.GetUtcNow() + }; + + await _store.SaveAsync(finalizedRelease, ct); + await _store.SaveManifestAsync(id, manifest, ct); + + await _eventPublisher.PublishAsync(new ReleaseFinalized( + release.Id, + release.TenantId, + release.Name, + manifestDigest, + release.Components.Length, + _timeProvider.GetUtcNow() + ), ct); + + _logger.LogInformation( + "Finalized release {Release} with {Count} components (manifest: {Digest})", + release.Name, + release.Components.Length, + manifestDigest[..16]); + + return finalizedRelease; + } + + public async Task DeleteAsync(Guid id, CancellationToken ct = default) + { + var release = await _store.GetAsync(id, ct) + ??
throw new ReleaseNotFoundException(id); + + if (!release.IsDraft) + { + throw new ReleaseNotEditableException(id, + "Cannot delete finalized release"); + } + + await _store.DeleteAsync(id, ct); + + await _eventPublisher.PublishAsync(new ReleaseDeleted( + release.Id, + release.TenantId, + release.Name, + _timeProvider.GetUtcNow() + ), ct); + } +} +``` + +### ReleaseFinalizer + +```csharp +namespace StellaOps.ReleaseOrchestrator.Release.Manager; + +public sealed class ReleaseFinalizer : IReleaseFinalizer +{ + private readonly IReleaseManifestGenerator _manifestGenerator; + private readonly ILogger<ReleaseFinalizer> _logger; + + public async Task<(ReleaseManifest Manifest, string Digest)> FinalizeAsync( + Release release, + CancellationToken ct = default) + { + // Generate canonical manifest + var manifest = await _manifestGenerator.GenerateAsync(release, ct); + + // Compute digest of canonical JSON + var canonicalJson = CanonicalJsonSerializer.Serialize(manifest); + var digest = ComputeDigest(canonicalJson); + + _logger.LogDebug( + "Generated manifest for release {Release}: {Digest}", + release.Name, + digest); + + return (manifest, digest); + } + + private static string ComputeDigest(string content) + { + var bytes = Encoding.UTF8.GetBytes(content); + var hash = SHA256.HashData(bytes); + return $"sha256:{Convert.ToHexString(hash).ToLowerInvariant()}"; + } +} +``` + +### ReleaseManifest + +```csharp +namespace StellaOps.ReleaseOrchestrator.Release.Models; + +public sealed record ReleaseManifest +{ + public string SchemaVersion { get; init; } = "1.0"; + public required string Name { get; init; } + public required string DisplayName { get; init; } + public string?
Description { get; init; } + public required DateTimeOffset FinalizedAt { get; init; } + public required string FinalizedBy { get; init; } + public required ImmutableArray<ManifestComponent> Components { get; init; } + public required ManifestMetadata Metadata { get; init; } +} + +public sealed record ManifestComponent( + string Name, + string Registry, + string Repository, + string Digest, + string? Tag, + string? SemVer, + int Order +); + +public sealed record ManifestMetadata( + string TenantId, + string CreatedBy, + DateTimeOffset CreatedAt, + int TotalComponents, + long? TotalSizeBytes +); +``` + +### ReleaseValidator + +```csharp +namespace StellaOps.ReleaseOrchestrator.Release.Manager; + +public sealed class ReleaseValidator : IReleaseValidator +{ + private readonly IReleaseStore _store; + + public async Task<ValidationResult> ValidateCreateAsync( + CreateReleaseRequest request, + CancellationToken ct = default) + { + var errors = new List<string>(); + + // Name format validation + if (!IsValidReleaseName(request.Name)) + { + errors.Add("Release name must be lowercase alphanumeric with hyphens, 2-64 characters"); + } + + // Check for duplicate name + var existing = await _store.GetByNameAsync(request.Name, ct); + if (existing is not null) + { + errors.Add($"Release with name '{request.Name}' already exists"); + } + + return errors.Count == 0 + ?
ValidationResult.Success() + : ValidationResult.Failure(errors); + } + + public async Task<ValidationResult> ValidateFinalizeAsync( + Release release, + CancellationToken ct = default) + { + var errors = new List<string>(); + + // Must have at least one component + if (release.Components.Length == 0) + { + errors.Add("Release must have at least one component"); + } + + // All components must have valid digests + foreach (var component in release.Components) + { + if (string.IsNullOrEmpty(component.Digest)) + { + errors.Add($"Component {component.ComponentName} has no digest"); + } + else if (!IsValidDigest(component.Digest)) + { + errors.Add($"Component {component.ComponentName} has invalid digest format"); + } + } + + return errors.Count == 0 + ? ValidationResult.Success() + : ValidationResult.Failure(errors); + } + + private static bool IsValidReleaseName(string name) => + Regex.IsMatch(name, @"^[a-z][a-z0-9-]{1,63}$"); + + private static bool IsValidDigest(string digest) => + digest.StartsWith("sha256:", StringComparison.OrdinalIgnoreCase) && + digest.Length == 71; +} +``` + +### Domain Events + +```csharp +namespace StellaOps.ReleaseOrchestrator.Release.Events; + +public sealed record ReleaseCreated( + Guid ReleaseId, + Guid TenantId, + string Name, + DateTimeOffset CreatedAt +) : IDomainEvent; + +public sealed record ReleaseComponentAdded( + Guid ReleaseId, + Guid TenantId, + Guid ComponentId, + string ComponentName, + string Digest, + DateTimeOffset AddedAt +) : IDomainEvent; + +public sealed record ReleaseFinalized( + Guid ReleaseId, + Guid TenantId, + string Name, + string ManifestDigest, + int ComponentCount, + DateTimeOffset FinalizedAt +) : IDomainEvent; + +public sealed record ReleaseDeprecated( + Guid ReleaseId, + Guid TenantId, + string Name, + string Reason, + DateTimeOffset DeprecatedAt +) : IDomainEvent; + +public sealed record ReleaseDeleted( + Guid ReleaseId, + Guid TenantId, + string Name, + DateTimeOffset DeletedAt +) : IDomainEvent; +``` + +### Documentation Deliverables
+ +| Deliverable | Type | Description | +|-------------|------|-------------| +| `docs/modules/release-orchestrator/api/releases.md` (partial) | Markdown | API endpoint documentation for release management (create, quick create, compare) | + +--- + +## Acceptance Criteria + +### Code +- [ ] Create draft release +- [ ] Add components to draft release +- [ ] Remove components from draft release +- [ ] Finalize release locks versions +- [ ] Cannot modify finalized release +- [ ] Generate release manifest +- [ ] Compute manifest digest +- [ ] Deprecate release +- [ ] Delete only draft releases +- [ ] Unit test coverage >=85% + +### Documentation +- [ ] Release API endpoints documented +- [ ] Create release endpoint documented with full schema +- [ ] Quick create release endpoint documented +- [ ] Compare releases endpoint documented +- [ ] Release creation modes explained + +--- + +## Test Plan + +### Unit Tests + +| Test | Description | +|------|-------------| +| `CreateRelease_ValidRequest_Succeeds` | Creation works | +| `AddComponent_ToDraft_Succeeds` | Add component works | +| `AddComponent_ToFinalized_Fails` | Finalized protection works | +| `AddComponent_Duplicate_Fails` | Duplicate check works | +| `FinalizeRelease_GeneratesManifest` | Finalization works | +| `FinalizeRelease_NoComponents_Fails` | Validation works | +| `DeleteRelease_Draft_Succeeds` | Draft deletion works | +| `DeleteRelease_Finalized_Fails` | Finalized protection works | + +### Integration Tests + +| Test | Description | +|------|-------------| +| `ReleaseLifecycle_E2E` | Full create-add-finalize flow | +| `ManifestGeneration_E2E` | Manifest correctness | + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 104_002 Version Manager | Internal | TODO | +| 104_001 Component Registry | Internal | TODO | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| IReleaseManager | TODO | | +| ReleaseManager | TODO | 
| +| ReleaseValidator | TODO | | +| ReleaseFinalizer | TODO | | +| ReleaseManifestGenerator | TODO | | +| Release model | TODO | | +| IReleaseStore | TODO | | +| ReleaseStore | TODO | | +| Domain events | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | +| 11-Jan-2026 | Added documentation deliverable: api/releases.md (partial - releases) | diff --git a/docs/implplan/SPRINT_20260110_104_004_RELMAN_release_catalog.md b/docs/implplan/SPRINT_20260110_104_004_RELMAN_release_catalog.md new file mode 100644 index 000000000..11ef0caf0 --- /dev/null +++ b/docs/implplan/SPRINT_20260110_104_004_RELMAN_release_catalog.md @@ -0,0 +1,623 @@ +# SPRINT: Release Catalog + +> **Sprint ID:** 104_004 +> **Module:** RELMAN +> **Phase:** 4 - Release Manager +> **Status:** TODO +> **Parent:** [104_000_INDEX](SPRINT_20260110_104_000_INDEX_release_manager.md) + +--- + +## Overview + +Implement the Release Catalog for querying releases and tracking deployment history. 
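To make the read side concrete before the interface listings below, here is a hedged usage sketch of the catalog surface this sprint specifies. The consumer method, the `"finalizedAt"` sort key, and the exact generic shapes are illustrative assumptions, not deliverables of this sprint.

```csharp
// Illustrative consumer (assumed shapes): page through finalized releases
// and summarize the component drift between the two most recent ones.
public static async Task SummarizeLatestDriftAsync(
    IReleaseCatalog catalog,
    CancellationToken ct)
{
    var page = await catalog.ListPagedAsync(
        new ReleaseFilter(Status: ReleaseStatus.Ready),
        // Sort key string is hypothetical; SortDescending defaults to true.
        new PaginationParams(PageNumber: 1, PageSize: 2, SortBy: "finalizedAt"),
        ct);

    if (page.Items.Count < 2)
        return; // nothing to compare yet

    // With descending sort, Items[1] is the older release (source)
    // and Items[0] the newer one (target).
    var diff = await catalog.CompareAsync(page.Items[1].Id, page.Items[0].Id, ct);

    Console.WriteLine(
        $"{diff.TotalChanges} change(s): {diff.Added.Length} added, " +
        $"{diff.Changed.Length} changed, {diff.Removed.Length} removed");
}
```

Note the sketch only exercises query and comparison paths; it performs no writes, matching the catalog's read-only role.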
+ +### Objectives + +- Query releases with filtering and pagination +- Track release status transitions +- Maintain deployment history per environment +- Support release comparison + +### Working Directory + +``` +src/ReleaseOrchestrator/ +├── __Libraries/ +│ └── StellaOps.ReleaseOrchestrator.Release/ +│ ├── Catalog/ +│ │ ├── IReleaseCatalog.cs +│ │ ├── ReleaseCatalog.cs +│ │ ├── ReleaseStatusMachine.cs +│ │ └── ReleaseComparer.cs +│ ├── History/ +│ │ ├── IReleaseHistory.cs +│ │ ├── ReleaseHistory.cs +│ │ └── DeploymentRecord.cs +│ └── Models/ +│ ├── ReleaseDeploymentHistory.cs +│ └── ReleaseComparison.cs +└── __Tests/ + └── StellaOps.ReleaseOrchestrator.Release.Tests/ + └── Catalog/ +``` + +--- + +## Architecture Reference + +- [Release Manager](../modules/release-orchestrator/modules/release-manager.md) +- [Data Model Schema](../modules/release-orchestrator/data-model/schema.md) + +--- + +## Deliverables + +### IReleaseCatalog Interface + +```csharp +namespace StellaOps.ReleaseOrchestrator.Release.Catalog; + +public interface IReleaseCatalog +{ + // Queries + Task<IReadOnlyList<Release>> ListAsync( + ReleaseFilter? filter = null, + CancellationToken ct = default); + + Task<PagedResult<Release>> ListPagedAsync( + ReleaseFilter? filter, + PaginationParams pagination, + CancellationToken ct = default); + + Task<Release?> GetLatestAsync(CancellationToken ct = default); + + Task<Release?> GetLatestDeployedAsync( + Guid environmentId, + CancellationToken ct = default); + + Task<IReadOnlyList<Release>> GetDeployedReleasesAsync( + Guid environmentId, + CancellationToken ct = default); + + // History + Task<ReleaseDeploymentHistory> GetHistoryAsync( + Guid releaseId, + CancellationToken ct = default); + + Task<IReadOnlyList<DeploymentRecord>> GetEnvironmentHistoryAsync( + Guid environmentId, + int limit = 50, + CancellationToken ct = default); + + // Comparison + Task<ReleaseComparison> CompareAsync( + Guid sourceReleaseId, + Guid targetReleaseId, + CancellationToken ct = default); +} + +public sealed record ReleaseFilter( + string? NameContains = null, + ReleaseStatus? Status = null, + Guid? ComponentId = null, + DateTimeOffset?
CreatedAfter = null, + DateTimeOffset? CreatedBefore = null, + DateTimeOffset? FinalizedAfter = null, + DateTimeOffset? FinalizedBefore = null, + bool? HasDeployments = null +); + +public sealed record PaginationParams( + int PageNumber = 1, + int PageSize = 20, + string? SortBy = null, + bool SortDescending = true +); + +public sealed record PagedResult<T>( + IReadOnlyList<T> Items, + int TotalCount, + int PageNumber, + int PageSize, + int TotalPages +); +``` + +### ReleaseDeploymentHistory + +```csharp +namespace StellaOps.ReleaseOrchestrator.Release.Models; + +public sealed record ReleaseDeploymentHistory +{ + public required Guid ReleaseId { get; init; } + public required string ReleaseName { get; init; } + public required ImmutableArray<EnvironmentDeployment> Deployments { get; init; } + public DateTimeOffset? FirstDeployedAt { get; init; } + public DateTimeOffset? LastDeployedAt { get; init; } + public int TotalDeployments { get; init; } +} + +public sealed record EnvironmentDeployment( + Guid EnvironmentId, + string EnvironmentName, + DeploymentStatus Status, + DateTimeOffset DeployedAt, + Guid DeployedBy, + DateTimeOffset? ReplacedAt, + Guid? ReplacedByReleaseId +); + +public sealed record DeploymentRecord +{ + public required Guid Id { get; init; } + public required Guid EnvironmentId { get; init; } + public required Guid ReleaseId { get; init; } + public required string ReleaseName { get; init; } + public required DeploymentStatus Status { get; init; } + public required DateTimeOffset DeployedAt { get; init; } + public required Guid DeployedBy { get; init; } + public TimeSpan? Duration { get; init; } + public string?
Notes { get; init; } +} + +public enum DeploymentStatus +{ + Current, // Currently deployed + Replaced, // Was deployed, replaced by newer + RolledBack, // Was rolled back from + Failed // Deployment failed +} +``` + +### ReleaseStatusMachine + +```csharp +namespace StellaOps.ReleaseOrchestrator.Release.Catalog; + +public sealed class ReleaseStatusMachine +{ + private static readonly ImmutableDictionary<ReleaseStatus, ImmutableArray<ReleaseStatus>> ValidTransitions = + new Dictionary<ReleaseStatus, ImmutableArray<ReleaseStatus>> + { + [ReleaseStatus.Draft] = [ReleaseStatus.Ready], + [ReleaseStatus.Ready] = [ReleaseStatus.Promoting, ReleaseStatus.Deprecated], + [ReleaseStatus.Promoting] = [ReleaseStatus.Ready, ReleaseStatus.Deployed], + [ReleaseStatus.Deployed] = [ReleaseStatus.Promoting, ReleaseStatus.Deprecated], + [ReleaseStatus.Deprecated] = [] // Terminal state + }.ToImmutableDictionary(); + + public bool CanTransition(ReleaseStatus from, ReleaseStatus to) + { + if (!ValidTransitions.TryGetValue(from, out var validTargets)) + return false; + + return validTargets.Contains(to); + } + + public ValidationResult ValidateTransition(ReleaseStatus from, ReleaseStatus to) + { + if (CanTransition(from, to)) + return ValidationResult.Success(); + + return ValidationResult.Failure( + $"Invalid status transition from {from} to {to}"); + } + + public IReadOnlyList<ReleaseStatus> GetValidTransitions(ReleaseStatus current) + { + return ValidTransitions.TryGetValue(current, out var targets) + ? targets + : []; + } +} +``` + +### ReleaseCatalog Implementation + +```csharp +namespace StellaOps.ReleaseOrchestrator.Release.Catalog; + +public sealed class ReleaseCatalog : IReleaseCatalog +{ + private readonly IReleaseStore _releaseStore; + private readonly IDeploymentStore _deploymentStore; + private readonly IEnvironmentService _environmentService; + private readonly ILogger<ReleaseCatalog> _logger; + + public async Task<PagedResult<Release>> ListPagedAsync( + ReleaseFilter?
filter, + PaginationParams pagination, + CancellationToken ct = default) + { + var (releases, totalCount) = await _releaseStore.QueryAsync( + filter, + pagination.PageNumber, + pagination.PageSize, + pagination.SortBy, + pagination.SortDescending, + ct); + + var totalPages = (int)Math.Ceiling((double)totalCount / pagination.PageSize); + + return new PagedResult<Release>( + Items: releases, + TotalCount: totalCount, + PageNumber: pagination.PageNumber, + PageSize: pagination.PageSize, + TotalPages: totalPages + ); + } + + public async Task<Release?> GetLatestDeployedAsync( + Guid environmentId, + CancellationToken ct = default) + { + var deployment = await _deploymentStore.GetCurrentDeploymentAsync( + environmentId, ct); + + if (deployment is null) + return null; + + return await _releaseStore.GetAsync(deployment.ReleaseId, ct); + } + + public async Task<ReleaseDeploymentHistory> GetHistoryAsync( + Guid releaseId, + CancellationToken ct = default) + { + var release = await _releaseStore.GetAsync(releaseId, ct) + ?? throw new ReleaseNotFoundException(releaseId); + + var deployments = await _deploymentStore.GetDeploymentsForReleaseAsync( + releaseId, ct); + + var environments = await _environmentService.ListAsync(ct); + var envLookup = environments.ToDictionary(e => e.Id); + + var envDeployments = deployments + .Select(d => new EnvironmentDeployment( + EnvironmentId: d.EnvironmentId, + EnvironmentName: envLookup.TryGetValue(d.EnvironmentId, out var env) + ?
env.Name + : "Unknown", + Status: d.Status, + DeployedAt: d.DeployedAt, + DeployedBy: d.DeployedBy, + ReplacedAt: d.ReplacedAt, + ReplacedByReleaseId: d.ReplacedByReleaseId + )) + .ToImmutableArray(); + + return new ReleaseDeploymentHistory + { + ReleaseId = release.Id, + ReleaseName = release.Name, + Deployments = envDeployments, + FirstDeployedAt = envDeployments.MinBy(d => d.DeployedAt)?.DeployedAt, + LastDeployedAt = envDeployments.MaxBy(d => d.DeployedAt)?.DeployedAt, + TotalDeployments = envDeployments.Length + }; + } +} +``` + +### ReleaseComparer + +```csharp +namespace StellaOps.ReleaseOrchestrator.Release.Catalog; + +public sealed class ReleaseComparer +{ + public ReleaseComparison Compare(Release source, Release target) + { + var sourceComponents = source.Components.ToDictionary(c => c.ComponentId); + var targetComponents = target.Components.ToDictionary(c => c.ComponentId); + + var added = new List<ComponentChange>(); + var removed = new List<ComponentChange>(); + var changed = new List<ComponentChange>(); + var unchanged = new List<ComponentChange>(); + + // Find added and changed components + foreach (var (componentId, targetComp) in targetComponents) + { + if (!sourceComponents.TryGetValue(componentId, out var sourceComp)) + { + added.Add(new ComponentChange( + ComponentId: componentId, + ComponentName: targetComp.ComponentName, + ChangeType: ComponentChangeType.Added, + OldDigest: null, + NewDigest: targetComp.Digest, + OldTag: null, + NewTag: targetComp.Tag + )); + } + else if (sourceComp.Digest != targetComp.Digest) + { + changed.Add(new ComponentChange( + ComponentId: componentId, + ComponentName: targetComp.ComponentName, + ChangeType: ComponentChangeType.Changed, + OldDigest: sourceComp.Digest, + NewDigest: targetComp.Digest, + OldTag: sourceComp.Tag, + NewTag: targetComp.Tag + )); + } + else + { + unchanged.Add(new ComponentChange( + ComponentId: componentId, + ComponentName: targetComp.ComponentName, + ChangeType: ComponentChangeType.Unchanged, + OldDigest: sourceComp.Digest, + NewDigest: targetComp.Digest, +
OldTag: sourceComp.Tag, + NewTag: targetComp.Tag + )); + } + } + + // Find removed components + foreach (var (componentId, sourceComp) in sourceComponents) + { + if (!targetComponents.ContainsKey(componentId)) + { + removed.Add(new ComponentChange( + ComponentId: componentId, + ComponentName: sourceComp.ComponentName, + ChangeType: ComponentChangeType.Removed, + OldDigest: sourceComp.Digest, + NewDigest: null, + OldTag: sourceComp.Tag, + NewTag: null + )); + } + } + + return new ReleaseComparison( + SourceReleaseId: source.Id, + SourceReleaseName: source.Name, + TargetReleaseId: target.Id, + TargetReleaseName: target.Name, + Added: added.ToImmutableArray(), + Removed: removed.ToImmutableArray(), + Changed: changed.ToImmutableArray(), + Unchanged: unchanged.ToImmutableArray(), + HasChanges: added.Count > 0 || removed.Count > 0 || changed.Count > 0 + ); + } +} + +public sealed record ReleaseComparison( + Guid SourceReleaseId, + string SourceReleaseName, + Guid TargetReleaseId, + string TargetReleaseName, + ImmutableArray<ComponentChange> Added, + ImmutableArray<ComponentChange> Removed, + ImmutableArray<ComponentChange> Changed, + ImmutableArray<ComponentChange> Unchanged, + bool HasChanges +) +{ + public int TotalChanges => Added.Length + Removed.Length + Changed.Length; +} + +public sealed record ComponentChange( + Guid ComponentId, + string ComponentName, + ComponentChangeType ChangeType, + string? OldDigest, + string? NewDigest, + string? OldTag, + string?
NewTag +); + +public enum ComponentChangeType +{ + Added, + Removed, + Changed, + Unchanged +} +``` + +### ReleaseHistory Service + +```csharp +namespace StellaOps.ReleaseOrchestrator.Release.History; + +public interface IReleaseHistory +{ + Task RecordDeploymentAsync( + Guid releaseId, + Guid environmentId, + Guid deploymentId, + CancellationToken ct = default); + + Task RecordReplacementAsync( + Guid oldReleaseId, + Guid newReleaseId, + Guid environmentId, + CancellationToken ct = default); + + Task RecordRollbackAsync( + Guid fromReleaseId, + Guid toReleaseId, + Guid environmentId, + CancellationToken ct = default); +} + +public sealed class ReleaseHistory : IReleaseHistory +{ + private readonly IDeploymentStore _store; + private readonly IReleaseStore _releaseStore; + private readonly ReleaseStatusMachine _statusMachine; + private readonly IEventPublisher _eventPublisher; + private readonly TimeProvider _timeProvider; + private readonly ILogger<ReleaseHistory> _logger; + + public async Task RecordDeploymentAsync( + Guid releaseId, + Guid environmentId, + Guid deploymentId, + CancellationToken ct = default) + { + var release = await _releaseStore.GetAsync(releaseId, ct) + ??
throw new ReleaseNotFoundException(releaseId); + + // Mark any existing deployment as replaced + var currentDeployment = await _store.GetCurrentDeploymentAsync( + environmentId, ct); + + if (currentDeployment is not null) + { + await _store.MarkReplacedAsync( + currentDeployment.Id, + releaseId, + _timeProvider.GetUtcNow(), + ct); + } + + // Update release status if first deployment + if (release.Status == ReleaseStatus.Ready || + release.Status == ReleaseStatus.Promoting) + { + var updatedRelease = release with + { + Status = ReleaseStatus.Deployed, + UpdatedAt = _timeProvider.GetUtcNow() + }; + await _releaseStore.SaveAsync(updatedRelease, ct); + } + + _logger.LogInformation( + "Recorded deployment of release {Release} to environment {Environment}", + release.Name, + environmentId); + } + + public async Task RecordRollbackAsync( + Guid fromReleaseId, + Guid toReleaseId, + Guid environmentId, + CancellationToken ct = default) + { + // Mark the from-deployment as rolled back + var currentDeployment = await _store.GetCurrentDeploymentAsync( + environmentId, ct); + + if (currentDeployment?.ReleaseId == fromReleaseId) + { + await _store.MarkRolledBackAsync( + currentDeployment.Id, + _timeProvider.GetUtcNow(), + ct); + } + + await _eventPublisher.PublishAsync(new ReleaseRolledBack( + FromReleaseId: fromReleaseId, + ToReleaseId: toReleaseId, + EnvironmentId: environmentId, + RolledBackAt: _timeProvider.GetUtcNow() + ), ct); + } +} +``` + +### Domain Events + +```csharp +namespace StellaOps.ReleaseOrchestrator.Release.Events; + +public sealed record ReleaseStatusChanged( + Guid ReleaseId, + Guid TenantId, + ReleaseStatus OldStatus, + ReleaseStatus NewStatus, + DateTimeOffset ChangedAt +) : IDomainEvent; + +public sealed record ReleaseRolledBack( + Guid FromReleaseId, + Guid ToReleaseId, + Guid EnvironmentId, + DateTimeOffset RolledBackAt +) : IDomainEvent; +``` + +--- + +## Acceptance Criteria + +- [ ] List releases with filtering +- [ ] Paginate release list +- [ ] Get 
latest deployed release for environment +- [ ] Track deployment history +- [ ] Record status transitions +- [ ] Compare two releases +- [ ] Identify added/removed/changed components +- [ ] Record rollback history +- [ ] Status machine validates transitions +- [ ] Unit test coverage >=85% + +--- + +## Test Plan + +### Unit Tests + +| Test | Description | +|------|-------------| +| `ListReleases_WithFilter_ReturnsFiltered` | Filtering works | +| `ListReleases_Paginated_ReturnsPaged` | Pagination works | +| `GetLatestDeployed_ReturnsCorrect` | Latest lookup works | +| `CompareReleases_DetectsChanges` | Comparison works | +| `StatusMachine_ValidTransitions` | Valid transitions work | +| `StatusMachine_InvalidTransitions_Rejected` | Invalid rejected | +| `RecordDeployment_UpdatesHistory` | History recording works | + +### Integration Tests + +| Test | Description | +|------|-------------| +| `ReleaseCatalog_E2E` | Full query/history flow | +| `ReleaseComparison_E2E` | Comparison accuracy | + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 104_003 Release Manager | Internal | TODO | +| 103_001 Environment CRUD | Internal | TODO | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| IReleaseCatalog | TODO | | +| ReleaseCatalog | TODO | | +| ReleaseStatusMachine | TODO | | +| ReleaseComparer | TODO | | +| IReleaseHistory | TODO | | +| ReleaseHistory | TODO | | +| IDeploymentStore | TODO | | +| DeploymentStore | TODO | | +| Domain events | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_105_000_INDEX_workflow_engine.md b/docs/implplan/SPRINT_20260110_105_000_INDEX_workflow_engine.md new file mode 100644 index 000000000..3befbbbe1 --- /dev/null +++ b/docs/implplan/SPRINT_20260110_105_000_INDEX_workflow_engine.md @@ -0,0 +1,263 @@ +# SPRINT INDEX: Phase 5 - 
Workflow Engine + +> **Epic:** Release Orchestrator +> **Phase:** 5 - Workflow Engine +> **Batch:** 105 +> **Status:** TODO +> **Parent:** [100_000_INDEX](SPRINT_20260110_100_000_INDEX_release_orchestrator.md) + +--- + +## Overview + +Phase 5 implements the Workflow Engine - DAG-based workflow execution for deployments, promotions, and custom automation. + +### Objectives + +- Workflow template designer with YAML/JSON DSL +- Step registry for built-in and plugin steps +- DAG executor with parallel and sequential execution +- Step executor with retry and timeout handling +- Built-in steps (script, approval, notification) + +--- + +## Sprint Structure + +| Sprint ID | Title | Module | Status | Dependencies | +|-----------|-------|--------|--------|--------------| +| 105_001 | Workflow Template Designer | WORKFL | TODO | 101_001 | +| 105_002 | Step Registry | WORKFL | TODO | 101_002 | +| 105_003 | Workflow Engine - DAG Executor | WORKFL | TODO | 105_001, 105_002 | +| 105_004 | Step Executor | WORKFL | TODO | 105_003 | +| 105_005 | Built-in Steps | WORKFL | TODO | 105_004 | + +--- + +## Architecture + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ WORKFLOW ENGINE │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ WORKFLOW TEMPLATE (105_001) │ │ +│ │ │ │ +│ │ name: deploy-to-production │ │ +│ │ steps: │ │ +│ │ - id: security-scan │ │ +│ │ type: security-gate │ │ +│ │ - id: approval │ │ +│ │ type: approval │ │ +│ │ dependsOn: [security-scan] │ │ +│ │ - id: deploy │ │ +│ │ type: deploy │ │ +│ │ dependsOn: [approval] │ │ +│ │ - id: notify │ │ +│ │ type: notify │ │ +│ │ dependsOn: [deploy] │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ STEP REGISTRY (105_002) │ │ +│ │ │ │ +│ │ Built-in Steps: Plugin Steps: │ │ +│ │ ├── script ├── custom-gate │ │ +│ │ ├── approval ├── 
jira-update │ │ +│ │ ├── notify ├── terraform-apply │ │ +│ │ ├── wait └── k8s-rollout │ │ +│ │ ├── security-gate │ │ +│ │ └── deploy │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ DAG EXECUTOR (105_003) │ │ +│ │ │ │ +│ │ ┌─────────────┐ │ │ +│ │ │security-scan│ │ │ +│ │ └──────┬──────┘ │ │ +│ │ │ │ │ +│ │ ▼ │ │ +│ │ ┌─────────────┐ │ │ +│ │ │ approval │ │ │ +│ │ └──────┬──────┘ │ │ +│ │ │ │ │ +│ │ ┌────┴────┐ │ │ +│ │ ▼ ▼ │ │ +│ │ ┌──────┐ ┌──────┐ (parallel) │ │ +│ │ │deploy│ │smoke │ │ │ +│ │ └──┬───┘ └──┬───┘ │ │ +│ │ └────┬───┘ │ │ +│ │ ▼ │ │ +│ │ ┌─────────────┐ │ │ +│ │ │ notify │ │ │ +│ │ └─────────────┘ │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +--- + +## Deliverables Summary + +### 105_001: Workflow Template Designer + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `WorkflowTemplate` | Model | Template entity | +| `WorkflowParser` | Class | YAML/JSON parser | +| `WorkflowValidator` | Class | DAG validation | +| `TemplateStore` | Class | Persistence | + +### 105_002: Step Registry + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `IStepRegistry` | Interface | Step lookup | +| `StepRegistry` | Class | Implementation | +| `StepDefinition` | Model | Step metadata | +| `StepSchema` | Class | Config schema | + +### 105_003: DAG Executor + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `IWorkflowEngine` | Interface | Execution control | +| `WorkflowEngine` | Class | Implementation | +| `DagScheduler` | Class | Step scheduling | +| `WorkflowRun` | Model | Execution state | + +### 105_004: Step Executor + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `IStepExecutor` | Interface | Step 
execution | + | `StepExecutor` | Class | Implementation | + | `StepContext` | Model | Execution context | + | `StepRetryPolicy` | Class | Retry handling | + +### 105_005: Built-in Steps + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `ScriptStep` | Step | Execute shell scripts | +| `ApprovalStep` | Step | Manual approval | +| `NotifyStep` | Step | Send notifications | +| `WaitStep` | Step | Time delay | +| `SecurityGateStep` | Step | Security check | +| `DeployStep` | Step | Deployment trigger | + +--- + +## Key Interfaces + +```csharp +public interface IWorkflowEngine +{ + Task<WorkflowRun> StartAsync(Guid templateId, WorkflowContext context, CancellationToken ct); + Task ResumeAsync(Guid runId, CancellationToken ct); + Task CancelAsync(Guid runId, CancellationToken ct); + Task<WorkflowRun?> GetRunAsync(Guid runId, CancellationToken ct); +} + +public interface IStepExecutor +{ + Task<StepResult> ExecuteAsync(StepDefinition step, StepContext context, CancellationToken ct); +} + +public interface IStepRegistry +{ + void RegisterBuiltIn<T>(string type) where T : IStepProvider; + Task<IStepProvider?> GetAsync(string type, CancellationToken ct); + IReadOnlyList<StepDefinition> GetAllDefinitions(); +} +``` + +--- + +## Workflow DSL Example + +```yaml +name: production-deployment +version: 1 +triggers: + - type: promotion + environment: production + +steps: + - id: security-check + type: security-gate + config: + maxCritical: 0 + maxHigh: 5 + + - id: lead-approval + type: approval + dependsOn: [security-check] + config: + approvers: ["@release-managers"] + minApprovals: 1 + + - id: deploy + type: deploy + dependsOn: [lead-approval] + config: + strategy: rolling + batchSize: 25% + + - id: smoke-test + type: script + dependsOn: [deploy] + config: + script: ./scripts/smoke-test.sh + timeout: 300 + + - id: notify-success + type: notify + dependsOn: [smoke-test] + condition: success() + config: + channel: slack + message: "Deployment to production succeeded" + + - id: notify-failure + type: notify + dependsOn:
[smoke-test] + condition: failure() + config: + channel: slack + message: "Deployment to production FAILED" +``` + +--- + +## Dependencies + +| Module | Purpose | +|--------|---------| +| 101_002 Plugin Registry | Plugin steps | +| 101_003 Plugin Loader | Execute plugin steps | +| 106_* Promotion | Gate integration | + +--- + +## Acceptance Criteria + +- [ ] Workflow templates parse correctly +- [ ] DAG cycle detection works +- [ ] Parallel steps execute concurrently +- [ ] Step dependencies respected +- [ ] Retry policy works +- [ ] Timeout cancels steps +- [ ] Built-in steps functional +- [ ] Workflow state persisted +- [ ] Unit test coverage ≥80% + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Phase 5 index created | diff --git a/docs/implplan/SPRINT_20260110_105_001_WORKFL_workflow_template.md b/docs/implplan/SPRINT_20260110_105_001_WORKFL_workflow_template.md new file mode 100644 index 000000000..41eee5953 --- /dev/null +++ b/docs/implplan/SPRINT_20260110_105_001_WORKFL_workflow_template.md @@ -0,0 +1,686 @@ +# SPRINT: Workflow Template Designer + +> **Sprint ID:** 105_001 +> **Module:** WORKFL +> **Phase:** 5 - Workflow Engine +> **Status:** TODO +> **Parent:** [105_000_INDEX](SPRINT_20260110_105_000_INDEX_workflow_engine.md) + +--- + +## Overview + +Implement the Workflow Template Designer for defining deployment and automation workflows using YAML/JSON DSL. 
+ +### Objectives + +- Define workflow template data model +- Parse YAML/JSON workflow definitions +- Validate DAG structure (no cycles) +- Store and version workflow templates + +### Working Directory + +``` +src/ReleaseOrchestrator/ +├── __Libraries/ +│ └── StellaOps.ReleaseOrchestrator.Workflow/ +│ ├── Template/ +│ │ ├── IWorkflowTemplateService.cs +│ │ ├── WorkflowTemplateService.cs +│ │ ├── WorkflowParser.cs +│ │ ├── WorkflowValidator.cs +│ │ └── DagBuilder.cs +│ ├── Store/ +│ │ ├── IWorkflowTemplateStore.cs +│ │ └── WorkflowTemplateStore.cs +│ └── Models/ +│ ├── WorkflowTemplate.cs +│ ├── WorkflowStep.cs +│ ├── StepConfig.cs +│ └── WorkflowTrigger.cs +└── __Tests/ + └── StellaOps.ReleaseOrchestrator.Workflow.Tests/ + └── Template/ +``` + +--- + +## Architecture Reference + +- [Workflow Engine](../modules/release-orchestrator/modules/workflow-engine.md) +- [Data Model Schema](../modules/release-orchestrator/data-model/schema.md) + +--- + +## Deliverables + +### IWorkflowTemplateService Interface + +```csharp +namespace StellaOps.ReleaseOrchestrator.Workflow.Template; + +public interface IWorkflowTemplateService +{ + Task<WorkflowTemplate> CreateAsync(CreateWorkflowTemplateRequest request, CancellationToken ct = default); + Task<WorkflowTemplate> UpdateAsync(Guid id, UpdateWorkflowTemplateRequest request, CancellationToken ct = default); + Task<WorkflowTemplate?> GetAsync(Guid id, CancellationToken ct = default); + Task<WorkflowTemplate?> GetByNameAsync(string name, CancellationToken ct = default); + Task<WorkflowTemplate?> GetByNameAndVersionAsync(string name, int version, CancellationToken ct = default); + Task<IReadOnlyList<WorkflowTemplate>> ListAsync(WorkflowTemplateFilter? 
filter = null, CancellationToken ct = default); + Task PublishAsync(Guid id, CancellationToken ct = default); + Task DeprecateAsync(Guid id, CancellationToken ct = default); + Task DeleteAsync(Guid id, CancellationToken ct = default); + Task<WorkflowValidationResult> ValidateAsync(string content, WorkflowFormat format, CancellationToken ct = default); +} + +public sealed record CreateWorkflowTemplateRequest( + string Name, + string DisplayName, + string Content, + WorkflowFormat Format, + string? Description = null +); + +public sealed record UpdateWorkflowTemplateRequest( + string? DisplayName = null, + string? Content = null, + string? Description = null +); + +public enum WorkflowFormat +{ + Yaml, + Json +} +``` + +### WorkflowTemplate Model + +```csharp +namespace StellaOps.ReleaseOrchestrator.Workflow.Models; + +public sealed record WorkflowTemplate +{ + public required Guid Id { get; init; } + public required Guid TenantId { get; init; } + public required string Name { get; init; } + public required string DisplayName { get; init; } + public string? Description { get; init; } + public required int Version { get; init; } + public required WorkflowTemplateStatus Status { get; init; } + public required string Content { get; init; } + public required WorkflowFormat Format { get; init; } + public required ImmutableArray<WorkflowStep> Steps { get; init; } + public required ImmutableArray<WorkflowTrigger> Triggers { get; init; } + public ImmutableDictionary<string, object> Variables { get; init; } = + ImmutableDictionary<string, object>.Empty; + public DateTimeOffset CreatedAt { get; init; } + public DateTimeOffset UpdatedAt { get; init; } + public DateTimeOffset? PublishedAt { get; init; } + public Guid CreatedBy { get; init; } +} + +public enum WorkflowTemplateStatus +{ + Draft, + Published, + Deprecated +} + +public sealed record WorkflowStep +{ + public required string Id { get; init; } + public required string Type { get; init; } + public string? DisplayName { get; init; } + public ImmutableArray<string> DependsOn { get; init; } = []; + public string? 
Condition { get; init; } + public ImmutableDictionary Config { get; init; } = + ImmutableDictionary.Empty; + public TimeSpan? Timeout { get; init; } + public RetryConfig? Retry { get; init; } + public bool ContinueOnError { get; init; } = false; +} + +public sealed record RetryConfig( + int MaxAttempts = 3, + TimeSpan InitialDelay = default, + double BackoffMultiplier = 2.0 +) +{ + public TimeSpan InitialDelay { get; init; } = InitialDelay == default + ? TimeSpan.FromSeconds(5) + : InitialDelay; +} + +public sealed record WorkflowTrigger +{ + public required TriggerType Type { get; init; } + public Guid? EnvironmentId { get; init; } + public string? EnvironmentName { get; init; } + public string? CronExpression { get; init; } + public ImmutableDictionary Filters { get; init; } = + ImmutableDictionary.Empty; +} + +public enum TriggerType +{ + Manual, + Promotion, + Schedule, + Webhook, + NewVersion +} +``` + +### WorkflowParser + +```csharp +namespace StellaOps.ReleaseOrchestrator.Workflow.Template; + +public sealed class WorkflowParser +{ + private readonly ILogger _logger; + + public ParsedWorkflow Parse(string content, WorkflowFormat format) + { + return format switch + { + WorkflowFormat.Yaml => ParseYaml(content), + WorkflowFormat.Json => ParseJson(content), + _ => throw new ArgumentOutOfRangeException(nameof(format)) + }; + } + + private ParsedWorkflow ParseYaml(string content) + { + var deserializer = new DeserializerBuilder() + .WithNamingConvention(CamelCaseNamingConvention.Instance) + .Build(); + + try + { + var raw = deserializer.Deserialize(content); + return MapToWorkflow(raw); + } + catch (YamlException ex) + { + throw new WorkflowParseException($"YAML parse error at line {ex.Start.Line}: {ex.Message}", ex); + } + } + + private ParsedWorkflow ParseJson(string content) + { + try + { + var raw = JsonSerializer.Deserialize(content, + new JsonSerializerOptions + { + PropertyNamingPolicy = JsonNamingPolicy.CamelCase, + ReadCommentHandling = 
JsonCommentHandling.Skip + }); + + return MapToWorkflow(raw!); + } + catch (JsonException ex) + { + throw new WorkflowParseException($"JSON parse error: {ex.Message}", ex); + } + } + + private ParsedWorkflow MapToWorkflow(RawWorkflowDefinition raw) + { + var steps = raw.Steps.Select(s => new WorkflowStep + { + Id = s.Id, + Type = s.Type, + DisplayName = s.DisplayName, + DependsOn = s.DependsOn?.ToImmutableArray() ?? [], + Condition = s.Condition, + Config = s.Config?.ToImmutableDictionary() ?? ImmutableDictionary.Empty, + Timeout = s.Timeout.HasValue ? TimeSpan.FromSeconds(s.Timeout.Value) : null, + Retry = s.Retry is not null ? new RetryConfig( + s.Retry.MaxAttempts ?? 3, + TimeSpan.FromSeconds(s.Retry.InitialDelaySeconds ?? 5), + s.Retry.BackoffMultiplier ?? 2.0 + ) : null, + ContinueOnError = s.ContinueOnError ?? false + }).ToImmutableArray(); + + var triggers = raw.Triggers?.Select(t => new WorkflowTrigger + { + Type = Enum.Parse(t.Type, ignoreCase: true), + EnvironmentName = t.Environment, + CronExpression = t.Cron, + Filters = t.Filters?.ToImmutableDictionary() ?? ImmutableDictionary.Empty + }).ToImmutableArray() ?? []; + + return new ParsedWorkflow( + Name: raw.Name, + Version: raw.Version ?? 1, + Steps: steps, + Triggers: triggers, + Variables: raw.Variables?.ToImmutableDictionary() ?? 
ImmutableDictionary<string, object>.Empty + ); + } +} + +public sealed record ParsedWorkflow( + string Name, + int Version, + ImmutableArray<WorkflowStep> Steps, + ImmutableArray<WorkflowTrigger> Triggers, + ImmutableDictionary<string, object> Variables +); +``` + +### WorkflowValidator + +```csharp +namespace StellaOps.ReleaseOrchestrator.Workflow.Template; + +public sealed class WorkflowValidator +{ + private readonly IStepRegistry _stepRegistry; + + public async Task<WorkflowValidationResult> ValidateAsync( + ParsedWorkflow workflow, + CancellationToken ct = default) + { + var errors = new List<ValidationError>(); + var warnings = new List<ValidationWarning>(); + + // Validate workflow name + if (!IsValidWorkflowName(workflow.Name)) + { + errors.Add(new ValidationError( + "workflow.name", + "Workflow name must be lowercase alphanumeric with hyphens, 2-64 characters")); + } + + // Validate steps exist + if (workflow.Steps.Length == 0) + { + errors.Add(new ValidationError( + "workflow.steps", + "Workflow must have at least one step")); + } + + // Validate step IDs are unique + var stepIds = workflow.Steps.Select(s => s.Id).ToList(); + var duplicates = stepIds.GroupBy(id => id) + .Where(g => g.Count() > 1) + .Select(g => g.Key); + + foreach (var dup in duplicates) + { + errors.Add(new ValidationError( + $"steps.{dup}", + $"Duplicate step ID: {dup}")); + } + + // Validate step types exist + foreach (var step in workflow.Steps) + { + var stepDef = await _stepRegistry.GetAsync(step.Type, ct); + if (stepDef is null) + { + errors.Add(new ValidationError( + $"steps.{step.Id}.type", + $"Unknown step type: {step.Type}")); + } + } + + // Validate dependencies exist + var stepIdSet = stepIds.ToHashSet(); + foreach (var step in workflow.Steps) + { + foreach (var dep in step.DependsOn) + { + if (!stepIdSet.Contains(dep)) + { + errors.Add(new ValidationError( + $"steps.{step.Id}.dependsOn", + $"Unknown dependency: {dep}")); + } + } + } + + // Validate DAG has no cycles + var cycleError = DetectCycles(workflow.Steps); + if (cycleError is not null) + { + errors.Add(cycleError); + } + + // Validate 
triggers + foreach (var (trigger, index) in workflow.Triggers.Select((t, i) => (t, i))) + { + if (trigger.Type == TriggerType.Schedule && + string.IsNullOrEmpty(trigger.CronExpression)) + { + errors.Add(new ValidationError( + $"triggers[{index}].cron", + "Schedule trigger requires cron expression")); + } + } + + // Check for unreachable steps (warning only) + var reachable = FindReachableSteps(workflow.Steps); + var unreachable = stepIdSet.Except(reachable); + foreach (var stepId in unreachable) + { + warnings.Add(new ValidationWarning( + $"steps.{stepId}", + $"Step {stepId} is not reachable from any entry step")); + } + + return new WorkflowValidationResult( + IsValid: errors.Count == 0, + Errors: errors.ToImmutableArray(), + Warnings: warnings.ToImmutableArray() + ); + } + + private static ValidationError? DetectCycles(ImmutableArray<WorkflowStep> steps) + { + var visited = new HashSet<string>(); + var recursionStack = new HashSet<string>(); + var stepMap = steps.ToDictionary(s => s.Id); + + foreach (var step in steps) + { + if (HasCycle(step.Id, stepMap, visited, recursionStack, out var cycle)) + { + return new ValidationError( + "workflow.steps", + $"Circular dependency detected: {string.Join(" -> ", cycle)}"); + } + } + + return null; + } + + private static bool HasCycle( + string stepId, + Dictionary<string, WorkflowStep> stepMap, + HashSet<string> visited, + HashSet<string> recursionStack, + out List<string> cycle) + { + cycle = []; + + if (recursionStack.Contains(stepId)) + { + cycle.Add(stepId); + return true; + } + + if (visited.Contains(stepId)) + return false; + + visited.Add(stepId); + recursionStack.Add(stepId); + + if (stepMap.TryGetValue(stepId, out var step)) + { + foreach (var dep in step.DependsOn) + { + if (HasCycle(dep, stepMap, visited, recursionStack, out cycle)) + { + cycle.Insert(0, stepId); + return true; + } + } + } + + recursionStack.Remove(stepId); + return false; + } + + private static HashSet<string> FindReachableSteps(ImmutableArray<WorkflowStep> steps) + { + // Steps with no dependencies are entry points + 
var reachable = steps.Where(s => s.DependsOn.Length == 0) + .Select(s => s.Id) + .ToHashSet(); + + // Grow to a fixpoint: a step becomes reachable once any of + // its dependencies is reachable. + bool changed; + do + { + changed = false; + foreach (var step in steps) + { + if (!reachable.Contains(step.Id) && + step.DependsOn.Any(reachable.Contains)) + { + reachable.Add(step.Id); + changed = true; + } + } + } while (changed); + + return reachable; + } + + private static bool IsValidWorkflowName(string name) => + Regex.IsMatch(name, @"^[a-z][a-z0-9-]{1,63}$"); +} + +public sealed record WorkflowValidationResult( + bool IsValid, + ImmutableArray<ValidationError> Errors, + ImmutableArray<ValidationWarning> Warnings +); + +public sealed record ValidationError(string Path, string Message); +public sealed record ValidationWarning(string Path, string Message); +``` + +### DagBuilder + +```csharp +namespace StellaOps.ReleaseOrchestrator.Workflow.Template; + +public sealed class DagBuilder +{ + public WorkflowDag Build(ImmutableArray<WorkflowStep> steps) + { + var nodes = new Dictionary<string, DagNode>(); + + // Create nodes + foreach (var step in steps) + { + nodes[step.Id] = new DagNode(step.Id, step); + } + + // Build edges + foreach (var step in steps) + { + var node = nodes[step.Id]; + foreach (var dep in step.DependsOn) + { + if (nodes.TryGetValue(dep, out var depNode)) + { + node.Dependencies.Add(depNode); + depNode.Dependents.Add(node); + } + } + } + + // Find entry nodes (no dependencies) + var entryNodes = nodes.Values + .Where(n => n.Dependencies.Count == 0) + .ToImmutableArray(); + + // Compute topological order + var order = TopologicalSort(nodes.Values); + + return new WorkflowDag( + Nodes: nodes.Values.ToImmutableArray(), + EntryNodes: entryNodes, + TopologicalOrder: order + ); + } + + private static ImmutableArray<DagNode> TopologicalSort(IEnumerable<DagNode> nodes) + { + var sorted = new List<DagNode>(); + var visited = new HashSet<string>(); + var nodeList = nodes.ToList(); + + void Visit(DagNode node) + { + if (visited.Contains(node.Id)) + return; + + visited.Add(node.Id); + + foreach (var dep in node.Dependencies) + { + Visit(dep); + } + + sorted.Add(node); + } + + foreach (var node 
in nodeList) + { + Visit(node); + } + + return sorted.ToImmutableArray(); + } +} + +public sealed record WorkflowDag( + ImmutableArray<DagNode> Nodes, + ImmutableArray<DagNode> EntryNodes, + ImmutableArray<DagNode> TopologicalOrder +); + +public sealed class DagNode +{ + public string Id { get; } + public WorkflowStep Step { get; } + public List<DagNode> Dependencies { get; } = []; + public List<DagNode> Dependents { get; } = []; + + public DagNode(string id, WorkflowStep step) + { + Id = id; + Step = step; + } + + public bool IsReady(IReadOnlySet<string> completedSteps) => + Dependencies.All(d => completedSteps.Contains(d.Id)); +} +``` + +### Domain Events + +```csharp +namespace StellaOps.ReleaseOrchestrator.Workflow.Events; + +public sealed record WorkflowTemplateCreated( + Guid TemplateId, + Guid TenantId, + string Name, + int Version, + int StepCount, + DateTimeOffset CreatedAt +) : IDomainEvent; + +public sealed record WorkflowTemplatePublished( + Guid TemplateId, + Guid TenantId, + string Name, + int Version, + DateTimeOffset PublishedAt +) : IDomainEvent; + +public sealed record WorkflowTemplateDeprecated( + Guid TemplateId, + Guid TenantId, + string Name, + DateTimeOffset DeprecatedAt +) : IDomainEvent; +``` + +### Documentation Deliverables + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `docs/modules/release-orchestrator/api/workflows.md` (partial) | Markdown | API endpoint documentation for workflow templates (CRUD, validate) | + +--- + +## Acceptance Criteria + +### Code + +- [ ] Parse YAML workflow definitions +- [ ] Parse JSON workflow definitions +- [ ] Validate step types exist +- [ ] Detect circular dependencies +- [ ] Validate dependencies exist +- [ ] Create workflow templates +- [ ] Version workflow templates +- [ ] Publish workflow templates +- [ ] Deprecate workflow templates +- [ ] Unit test coverage >=85% + +### Documentation +- [ ] Workflow template API endpoints documented +- [ ] Template validation endpoint documented +- [ ] Full workflow template JSON 
schema included +- [ ] DAG validation rules documented + +--- + +## Test Plan + +### Unit Tests + +| Test | Description | +|------|-------------| +| `ParseYaml_ValidWorkflow_Succeeds` | YAML parsing works | +| `ParseJson_ValidWorkflow_Succeeds` | JSON parsing works | +| `Validate_CyclicDependency_Fails` | Cycle detection works | +| `Validate_MissingDependency_Fails` | Dependency check works | +| `Validate_UnknownStepType_Fails` | Step type check works | +| `DagBuilder_CreatesTopologicalOrder` | DAG building works | +| `CreateTemplate_StoresContent` | Creation works | +| `PublishTemplate_ChangesStatus` | Publishing works | + +### Integration Tests + +| Test | Description | +|------|-------------| +| `WorkflowTemplateLifecycle_E2E` | Full CRUD cycle | +| `WorkflowParsing_E2E` | Real workflow files | + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 101_001 Database Schema | Internal | TODO | +| 105_002 Step Registry | Internal | TODO | +| YamlDotNet | NuGet | Available | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| IWorkflowTemplateService | TODO | | +| WorkflowTemplateService | TODO | | +| WorkflowParser | TODO | | +| WorkflowValidator | TODO | | +| DagBuilder | TODO | | +| WorkflowTemplate model | TODO | | +| IWorkflowTemplateStore | TODO | | +| WorkflowTemplateStore | TODO | | +| Domain events | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | +| 11-Jan-2026 | Added documentation deliverable: api/workflows.md (partial - templates) | diff --git a/docs/implplan/SPRINT_20260110_105_002_WORKFL_step_registry.md b/docs/implplan/SPRINT_20260110_105_002_WORKFL_step_registry.md new file mode 100644 index 000000000..25e7b1ab0 --- /dev/null +++ b/docs/implplan/SPRINT_20260110_105_002_WORKFL_step_registry.md @@ -0,0 +1,564 @@ +# SPRINT: Step Registry + +> **Sprint ID:** 105_002 +> **Module:** 
WORKFL +> **Phase:** 5 - Workflow Engine +> **Status:** TODO +> **Parent:** [105_000_INDEX](SPRINT_20260110_105_000_INDEX_workflow_engine.md) + +--- + +## Overview + +Implement the Step Registry for managing built-in and plugin workflow steps. + +### Objectives + +- Register built-in step types +- Load plugin step types dynamically +- Define step schemas for configuration validation +- Provide step discovery and documentation + +### Working Directory + +``` +src/ReleaseOrchestrator/ +├── __Libraries/ +│ └── StellaOps.ReleaseOrchestrator.Workflow/ +│ ├── Steps/ +│ │ ├── IStepRegistry.cs +│ │ ├── StepRegistry.cs +│ │ ├── IStepProvider.cs +│ │ ├── StepDefinition.cs +│ │ └── StepSchema.cs +│ ├── Steps.BuiltIn/ +│ │ └── (see 105_005) +│ └── Steps.Plugin/ +│ ├── IPluginStepLoader.cs +│ └── PluginStepLoader.cs +└── __Tests/ + └── StellaOps.ReleaseOrchestrator.Workflow.Tests/ + └── Steps/ +``` + +--- + +## Architecture Reference + +- [Workflow Engine](../modules/release-orchestrator/modules/workflow-engine.md) +- [Plugin System](../modules/release-orchestrator/plugins/step-plugins.md) + +--- + +## Deliverables + +### IStepRegistry Interface + +```csharp +namespace StellaOps.ReleaseOrchestrator.Workflow.Steps; + +public interface IStepRegistry +{ + void RegisterBuiltIn(string type) where T : class, IStepProvider; + void RegisterPlugin(StepDefinition definition, IStepProvider provider); + Task GetProviderAsync(string type, CancellationToken ct = default); + StepDefinition? 
GetDefinition(string type); + IReadOnlyList<StepDefinition> GetAllDefinitions(); + IReadOnlyList<StepDefinition> GetBuiltInDefinitions(); + IReadOnlyList<StepDefinition> GetPluginDefinitions(); + bool IsRegistered(string type); +} + +public interface IStepProvider +{ + string Type { get; } + string DisplayName { get; } + string Description { get; } + StepSchema ConfigSchema { get; } + StepCapabilities Capabilities { get; } + + Task<StepResult> ExecuteAsync(StepContext context, CancellationToken ct = default); + Task<ValidationResult> ValidateConfigAsync(IReadOnlyDictionary<string, object> config, CancellationToken ct = default); +} +``` + +### StepDefinition Model + +```csharp +namespace StellaOps.ReleaseOrchestrator.Workflow.Steps; + +public sealed record StepDefinition +{ + public required string Type { get; init; } + public required string DisplayName { get; init; } + public required string Description { get; init; } + public required StepCategory Category { get; init; } + public required StepSource Source { get; init; } + public string? PluginId { get; init; } + public required StepSchema ConfigSchema { get; init; } + public required StepCapabilities Capabilities { get; init; } + public string? DocumentationUrl { get; init; } + public string? 
IconUrl { get; init; } + public ImmutableArray Examples { get; init; } = []; +} + +public enum StepCategory +{ + Deployment, + Gate, + Approval, + Notification, + Script, + Integration, + Utility +} + +public enum StepSource +{ + BuiltIn, + Plugin +} + +public sealed record StepCapabilities +{ + public bool SupportsRetry { get; init; } = true; + public bool SupportsTimeout { get; init; } = true; + public bool SupportsCondition { get; init; } = true; + public bool RequiresAgent { get; init; } = false; + public bool IsAsync { get; init; } = false; // Requires callback to complete + public ImmutableArray RequiredPermissions { get; init; } = []; +} + +public sealed record StepExample( + string Name, + string Description, + ImmutableDictionary Config +); +``` + +### StepSchema + +```csharp +namespace StellaOps.ReleaseOrchestrator.Workflow.Steps; + +public sealed record StepSchema +{ + public ImmutableArray Properties { get; init; } = []; + public ImmutableArray Required { get; init; } = []; + + public ValidationResult Validate(IReadOnlyDictionary config) + { + var errors = new List(); + + // Check required properties + foreach (var required in Required) + { + if (!config.ContainsKey(required) || config[required] is null) + { + errors.Add($"Required property '{required}' is missing"); + } + } + + // Validate property types + foreach (var prop in Properties) + { + if (config.TryGetValue(prop.Name, out var value) && value is not null) + { + var propError = ValidateProperty(prop, value); + if (propError is not null) + { + errors.Add(propError); + } + } + } + + return errors.Count == 0 + ? ValidationResult.Success() + : ValidationResult.Failure(errors); + } + + private static string? 
ValidateProperty(StepProperty prop, object value) + { + return prop.Type switch + { + StepPropertyType.String when value is not string => + $"Property '{prop.Name}' must be a string", + + StepPropertyType.Integer when !IsInteger(value) => + $"Property '{prop.Name}' must be an integer", + + StepPropertyType.Number when !IsNumber(value) => + $"Property '{prop.Name}' must be a number", + + StepPropertyType.Boolean when value is not bool => + $"Property '{prop.Name}' must be a boolean", + + StepPropertyType.Array when value is not IEnumerable => + $"Property '{prop.Name}' must be an array", + + StepPropertyType.Object when value is not IDictionary => + $"Property '{prop.Name}' must be an object", + + _ => null + }; + } + + private static bool IsInteger(object value) => + value is int or long or short or byte; + + private static bool IsNumber(object value) => + value is int or long or short or byte or float or double or decimal; +} + +public sealed record StepProperty +{ + public required string Name { get; init; } + public required StepPropertyType Type { get; init; } + public string? Description { get; init; } + public object? Default { get; init; } + public ImmutableArray? Enum { get; init; } + public int? MinValue { get; init; } + public int? MaxValue { get; init; } + public int? MinLength { get; init; } + public int? MaxLength { get; init; } + public string? 
Pattern { get; init; } +} + +public enum StepPropertyType +{ + String, + Integer, + Number, + Boolean, + Array, + Object, + Secret +} +``` + +### StepRegistry Implementation + +```csharp +namespace StellaOps.ReleaseOrchestrator.Workflow.Steps; + +public sealed class StepRegistry : IStepRegistry +{ + private readonly ConcurrentDictionary<string, (StepDefinition Definition, IStepProvider Provider)> _steps = new(); + private readonly IServiceProvider _serviceProvider; + private readonly ILogger<StepRegistry> _logger; + + public StepRegistry(IServiceProvider serviceProvider, ILogger<StepRegistry> logger) + { + _serviceProvider = serviceProvider; + _logger = logger; + } + + public void RegisterBuiltIn<T>(string type) where T : class, IStepProvider + { + var provider = _serviceProvider.GetRequiredService<T>(); + + var definition = new StepDefinition + { + Type = type, + DisplayName = provider.DisplayName, + Description = provider.Description, + Category = InferCategory(type), + Source = StepSource.BuiltIn, + ConfigSchema = provider.ConfigSchema, + Capabilities = provider.Capabilities + }; + + if (!_steps.TryAdd(type, (definition, provider))) + { + throw new InvalidOperationException($"Step type '{type}' is already registered"); + } + + _logger.LogInformation("Registered built-in step: {Type}", type); + } + + public void RegisterPlugin(StepDefinition definition, IStepProvider provider) + { + if (definition.Source != StepSource.Plugin) + { + throw new ArgumentException("Definition must have Plugin source"); + } + + if (!_steps.TryAdd(definition.Type, (definition, provider))) + { + throw new InvalidOperationException($"Step type '{definition.Type}' is already registered"); + } + + _logger.LogInformation( + "Registered plugin step: {Type} from {PluginId}", + definition.Type, + definition.PluginId); + } + + public Task<IStepProvider?> GetProviderAsync(string type, CancellationToken ct = default) + { + return _steps.TryGetValue(type, out var entry) + ? Task.FromResult<IStepProvider?>(entry.Provider) + : Task.FromResult<IStepProvider?>(null); + } + + public StepDefinition? 
GetDefinition(string type) + { + return _steps.TryGetValue(type, out var entry) + ? entry.Definition + : null; + } + + public IReadOnlyList GetAllDefinitions() + { + return _steps.Values.Select(e => e.Definition).ToList().AsReadOnly(); + } + + public IReadOnlyList GetBuiltInDefinitions() + { + return _steps.Values + .Where(e => e.Definition.Source == StepSource.BuiltIn) + .Select(e => e.Definition) + .ToList() + .AsReadOnly(); + } + + public IReadOnlyList GetPluginDefinitions() + { + return _steps.Values + .Where(e => e.Definition.Source == StepSource.Plugin) + .Select(e => e.Definition) + .ToList() + .AsReadOnly(); + } + + public bool IsRegistered(string type) => _steps.ContainsKey(type); + + private static StepCategory InferCategory(string type) => + type switch + { + "deploy" or "rollback" => StepCategory.Deployment, + "security-gate" or "policy-gate" => StepCategory.Gate, + "approval" => StepCategory.Approval, + "notify" => StepCategory.Notification, + "script" => StepCategory.Script, + "wait" => StepCategory.Utility, + _ => StepCategory.Integration + }; +} +``` + +### PluginStepLoader + +```csharp +namespace StellaOps.ReleaseOrchestrator.Workflow.Steps.Plugin; + +public interface IPluginStepLoader +{ + Task LoadPluginStepsAsync(CancellationToken ct = default); + Task ReloadPluginAsync(string pluginId, CancellationToken ct = default); +} + +public sealed class PluginStepLoader : IPluginStepLoader +{ + private readonly IPluginLoader _pluginLoader; + private readonly IStepRegistry _stepRegistry; + private readonly ILogger _logger; + + public async Task LoadPluginStepsAsync(CancellationToken ct = default) + { + var plugins = await _pluginLoader.GetPluginsAsync(ct); + + foreach (var plugin in plugins) + { + try + { + await LoadPluginAsync(plugin, ct); + } + catch (Exception ex) + { + _logger.LogError(ex, + "Failed to load step plugin {PluginId}", + plugin.Manifest.Id); + } + } + } + + private async Task LoadPluginAsync(LoadedPlugin plugin, CancellationToken ct) + { 
+ var stepProviders = plugin.Instance.GetStepProviders(); + + foreach (var provider in stepProviders) + { + var definition = new StepDefinition + { + Type = provider.Type, + DisplayName = provider.DisplayName, + Description = provider.Description, + Category = plugin.Instance.Category, + Source = StepSource.Plugin, + PluginId = plugin.Manifest.Id, + ConfigSchema = provider.ConfigSchema, + Capabilities = provider.Capabilities, + DocumentationUrl = plugin.Manifest.DocumentationUrl + }; + + _stepRegistry.RegisterPlugin(definition, provider); + + _logger.LogInformation( + "Loaded step '{Type}' from plugin '{PluginId}'", + provider.Type, + plugin.Manifest.Id); + } + } + + public async Task ReloadPluginAsync(string pluginId, CancellationToken ct = default) + { + // Unregister existing steps from this plugin + var existingDefs = _stepRegistry.GetPluginDefinitions() + .Where(d => d.PluginId == pluginId) + .ToList(); + + // Note: Full unregistration would require registry modification + // For now, just log and reload (new registration will override) + + _logger.LogInformation("Reloading step plugin {PluginId}", pluginId); + + var plugin = await _pluginLoader.GetPluginAsync(pluginId, ct); + if (plugin is not null) + { + await LoadPluginAsync(plugin, ct); + } + } +} + +public interface IStepPlugin +{ + StepCategory Category { get; } + IReadOnlyList GetStepProviders(); +} +``` + +### StepRegistryInitializer + +```csharp +namespace StellaOps.ReleaseOrchestrator.Workflow.Steps; + +public sealed class StepRegistryInitializer : IHostedService +{ + private readonly IStepRegistry _registry; + private readonly IPluginStepLoader _pluginLoader; + private readonly ILogger _logger; + + public async Task StartAsync(CancellationToken ct) + { + _logger.LogInformation("Initializing step registry"); + + // Register built-in steps + _registry.RegisterBuiltIn("script"); + _registry.RegisterBuiltIn("approval"); + _registry.RegisterBuiltIn("notify"); + _registry.RegisterBuiltIn("wait"); + 
_registry.RegisterBuiltIn("security-gate"); + _registry.RegisterBuiltIn("deploy"); + _registry.RegisterBuiltIn("rollback"); + + _logger.LogInformation( + "Registered {Count} built-in steps", + _registry.GetBuiltInDefinitions().Count); + + // Load plugin steps + await _pluginLoader.LoadPluginStepsAsync(ct); + + _logger.LogInformation( + "Loaded {Count} plugin steps", + _registry.GetPluginDefinitions().Count); + } + + public Task StopAsync(CancellationToken ct) => Task.CompletedTask; +} +``` + +### Documentation Deliverables + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `docs/modules/release-orchestrator/api/workflows.md` (partial) | Markdown | API endpoint documentation for step registry (list available steps, get step schema) | + +--- + +## Acceptance Criteria + +### Code + +- [ ] Register built-in step types +- [ ] Load plugin step types +- [ ] Validate step configurations against schema +- [ ] Get step provider by type +- [ ] List all step definitions +- [ ] Filter by built-in vs plugin +- [ ] Step schema validation works +- [ ] Required property validation works +- [ ] Unit test coverage >=85% + +### Documentation +- [ ] Step registry API endpoints documented +- [ ] List steps endpoint documented (GET /api/v1/steps) +- [ ] Built-in step types listed +- [ ] Plugin-provided step discovery explained + +--- + +## Test Plan + +### Unit Tests + +| Test | Description | +|------|-------------| +| `RegisterBuiltIn_AddsStep` | Registration works | +| `RegisterPlugin_AddsStep` | Plugin registration works | +| `GetProvider_ReturnsProvider` | Lookup works | +| `GetDefinition_ReturnsDefinition` | Definition lookup works | +| `SchemaValidation_RequiredMissing_Fails` | Required check works | +| `SchemaValidation_WrongType_Fails` | Type check works | +| `SchemaValidation_Valid_Succeeds` | Valid config passes | + +### Integration Tests + +| Test | Description | +|------|-------------| +| `StepRegistryInit_E2E` | Full initialization | +| 
`PluginStepLoading_E2E` | Plugin step loading | + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 101_002 Plugin Registry | Internal | TODO | +| 101_003 Plugin Loader | Internal | TODO | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| IStepRegistry | TODO | | +| StepRegistry | TODO | | +| IStepProvider | TODO | | +| StepDefinition | TODO | | +| StepSchema | TODO | | +| IPluginStepLoader | TODO | | +| PluginStepLoader | TODO | | +| StepRegistryInitializer | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | +| 11-Jan-2026 | Added documentation deliverable: api/workflows.md (partial - step registry) | diff --git a/docs/implplan/SPRINT_20260110_105_003_WORKFL_dag_executor.md b/docs/implplan/SPRINT_20260110_105_003_WORKFL_dag_executor.md new file mode 100644 index 000000000..cb0a79f48 --- /dev/null +++ b/docs/implplan/SPRINT_20260110_105_003_WORKFL_dag_executor.md @@ -0,0 +1,734 @@ +# SPRINT: Workflow Engine - DAG Executor + +> **Sprint ID:** 105_003 +> **Module:** WORKFL +> **Phase:** 5 - Workflow Engine +> **Status:** TODO +> **Parent:** [105_000_INDEX](SPRINT_20260110_105_000_INDEX_workflow_engine.md) + +--- + +## Overview + +Implement the DAG Executor for orchestrating workflow step execution with parallel and sequential support. 
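The scheduling behavior this sprint targets can be previewed with a small sketch (Python for brevity; the actual deliverable is the C# `DagScheduler`, and names here are illustrative): repeatedly collect the steps whose dependencies have all completed, dispatch that "wave" concurrently, and mark it done.

```python
def execution_waves(steps):
    """Group step ids into 'waves': every step in a wave has all of its
    dependencies satisfied by earlier waves, so a wave may run in parallel.

    `steps` maps a step id to the list of ids it depends on. Assumes the
    DAG has already passed cycle validation (105_001).
    """
    remaining = {sid: set(deps) for sid, deps in steps.items()}
    done, waves = set(), []
    while remaining:
        # A step is ready once all of its dependencies are done.
        ready = [sid for sid, deps in remaining.items() if deps <= done]
        if not ready:
            raise ValueError("cycle or missing dependency")
        waves.append(sorted(ready))
        done.update(ready)
        for sid in ready:
            del remaining[sid]
    return waves
```

For the index's example DAG (security-scan → approval → {deploy, smoke} → notify) this yields four waves, with deploy and smoke grouped for concurrent execution.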
+
+### Objectives
+
+- Start workflow runs from templates
+- Schedule steps based on DAG dependencies
+- Execute parallel steps concurrently
+- Track workflow run state
+- Support pause/resume/cancel
+
+### Working Directory
+
+```
+src/ReleaseOrchestrator/
+├── __Libraries/
+│   └── StellaOps.ReleaseOrchestrator.Workflow/
+│       ├── Engine/
+│       │   ├── IWorkflowEngine.cs
+│       │   ├── WorkflowEngine.cs
+│       │   ├── DagScheduler.cs
+│       │   └── WorkflowRuntime.cs
+│       ├── State/
+│       │   ├── IWorkflowStateManager.cs
+│       │   ├── WorkflowStateManager.cs
+│       │   └── WorkflowCheckpoint.cs
+│       ├── Store/
+│       │   ├── IWorkflowRunStore.cs
+│       │   └── WorkflowRunStore.cs
+│       └── Models/
+│           ├── WorkflowRun.cs
+│           ├── StepRun.cs
+│           └── WorkflowContext.cs
+└── __Tests/
+    └── StellaOps.ReleaseOrchestrator.Workflow.Tests/
+        └── Engine/
+```
+
+---
+
+## Architecture Reference
+
+- [Workflow Engine](../modules/release-orchestrator/modules/workflow-engine.md)
+- [Data Model Schema](../modules/release-orchestrator/data-model/schema.md)
+
+---
+
+## Deliverables
+
+### IWorkflowEngine Interface
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Workflow.Engine;
+
+public interface IWorkflowEngine
+{
+    Task<WorkflowRun> StartAsync(
+        Guid templateId,
+        WorkflowContext context,
+        CancellationToken ct = default);
+
+    Task<WorkflowRun> StartFromTemplateAsync(
+        WorkflowTemplate template,
+        WorkflowContext context,
+        CancellationToken ct = default);
+
+    Task ResumeAsync(Guid runId, CancellationToken ct = default);
+    Task PauseAsync(Guid runId, CancellationToken ct = default);
+    Task CancelAsync(Guid runId, string? reason = null, CancellationToken ct = default);
+    Task<WorkflowRun?> GetRunAsync(Guid runId, CancellationToken ct = default);
+    Task<IReadOnlyList<WorkflowRun>> ListRunsAsync(WorkflowRunFilter? filter = null, CancellationToken ct = default);
+    Task RetryStepAsync(Guid runId, string stepId, CancellationToken ct = default);
+    Task SkipStepAsync(Guid runId, string stepId, CancellationToken ct = default);
+}
+```
+
+### WorkflowRun Model
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Workflow.Models;
+
+public sealed record WorkflowRun
+{
+    public required Guid Id { get; init; }
+    public required Guid TenantId { get; init; }
+    public required Guid TemplateId { get; init; }
+    public required string TemplateName { get; init; }
+    public required int TemplateVersion { get; init; }
+    public required WorkflowRunStatus Status { get; init; }
+    public required WorkflowContext Context { get; init; }
+    public required ImmutableArray<StepRun> Steps { get; init; }
+    public string? FailureReason { get; init; }
+    public string? CancelReason { get; init; }
+    public DateTimeOffset StartedAt { get; init; }
+    public DateTimeOffset? CompletedAt { get; init; }
+    public DateTimeOffset? PausedAt { get; init; }
+    public TimeSpan? Duration => CompletedAt.HasValue
+        ? CompletedAt.Value - StartedAt
+        : null;
+    public Guid StartedBy { get; init; }
+
+    public bool IsTerminal => Status is
+        WorkflowRunStatus.Completed or
+        WorkflowRunStatus.Failed or
+        WorkflowRunStatus.Cancelled;
+}
+
+public enum WorkflowRunStatus
+{
+    Pending,
+    Running,
+    Paused,
+    WaitingForApproval,
+    Completed,
+    Failed,
+    Cancelled
+}
+
+public sealed record StepRun
+{
+    public required string StepId { get; init; }
+    public required string StepType { get; init; }
+    public required StepRunStatus Status { get; init; }
+    public int AttemptCount { get; init; }
+    public DateTimeOffset? StartedAt { get; init; }
+    public DateTimeOffset? CompletedAt { get; init; }
+    public StepResult? Result { get; init; }
+    public string?
Error { get; init; } + public ImmutableArray Attempts { get; init; } = []; +} + +public enum StepRunStatus +{ + Pending, + Ready, + Running, + WaitingForCallback, + Completed, + Failed, + Skipped, + Cancelled +} + +public sealed record StepAttempt( + int AttemptNumber, + DateTimeOffset StartedAt, + DateTimeOffset? CompletedAt, + StepResult? Result, + string? Error +); + +public sealed record WorkflowContext +{ + public Guid? ReleaseId { get; init; } + public Guid? EnvironmentId { get; init; } + public Guid? PromotionId { get; init; } + public Guid? DeploymentId { get; init; } + public ImmutableDictionary Variables { get; init; } = + ImmutableDictionary.Empty; + public ImmutableDictionary Outputs { get; init; } = + ImmutableDictionary.Empty; +} +``` + +### WorkflowEngine Implementation + +```csharp +namespace StellaOps.ReleaseOrchestrator.Workflow.Engine; + +public sealed class WorkflowEngine : IWorkflowEngine +{ + private readonly IWorkflowTemplateService _templateService; + private readonly IWorkflowRunStore _runStore; + private readonly IWorkflowStateManager _stateManager; + private readonly IDagScheduler _scheduler; + private readonly IStepExecutor _stepExecutor; + private readonly IEventPublisher _eventPublisher; + private readonly TimeProvider _timeProvider; + private readonly IGuidGenerator _guidGenerator; + private readonly ILogger _logger; + + public async Task StartAsync( + Guid templateId, + WorkflowContext context, + CancellationToken ct = default) + { + var template = await _templateService.GetAsync(templateId, ct) + ?? 
throw new WorkflowTemplateNotFoundException(templateId); + + if (template.Status != WorkflowTemplateStatus.Published) + { + throw new WorkflowTemplateNotPublishedException(templateId); + } + + return await StartFromTemplateAsync(template, context, ct); + } + + public async Task StartFromTemplateAsync( + WorkflowTemplate template, + WorkflowContext context, + CancellationToken ct = default) + { + var now = _timeProvider.GetUtcNow(); + + var stepRuns = template.Steps.Select(step => new StepRun + { + StepId = step.Id, + StepType = step.Type, + Status = StepRunStatus.Pending, + AttemptCount = 0 + }).ToImmutableArray(); + + var run = new WorkflowRun + { + Id = _guidGenerator.NewGuid(), + TenantId = _tenantContext.TenantId, + TemplateId = template.Id, + TemplateName = template.Name, + TemplateVersion = template.Version, + Status = WorkflowRunStatus.Pending, + Context = context, + Steps = stepRuns, + StartedAt = now, + StartedBy = _userContext.UserId + }; + + await _runStore.SaveAsync(run, ct); + + await _eventPublisher.PublishAsync(new WorkflowRunStarted( + run.Id, + run.TenantId, + run.TemplateName, + run.Context.ReleaseId, + run.Context.EnvironmentId, + now + ), ct); + + _logger.LogInformation( + "Started workflow run {RunId} from template {TemplateName}", + run.Id, + template.Name); + + // Start execution + _ = ExecuteAsync(run.Id, template, ct); + + return run; + } + + private async Task ExecuteAsync( + Guid runId, + WorkflowTemplate template, + CancellationToken ct) + { + try + { + var dag = new DagBuilder().Build(template.Steps); + + await _stateManager.SetStatusAsync(runId, WorkflowRunStatus.Running, ct); + + while (!ct.IsCancellationRequested) + { + var run = await _runStore.GetAsync(runId, ct); + if (run is null || run.IsTerminal) + break; + + if (run.Status == WorkflowRunStatus.Paused) + { + await Task.Delay(TimeSpan.FromSeconds(1), ct); + continue; + } + + // Get ready steps + var completedStepIds = run.Steps + .Where(s => s.Status == StepRunStatus.Completed 
|| s.Status == StepRunStatus.Skipped) + .Select(s => s.StepId) + .ToHashSet(); + + var readySteps = _scheduler.GetReadySteps(dag, completedStepIds, run.Steps); + + if (readySteps.Count == 0) + { + // Check if all steps are complete or if we're stuck + var pendingSteps = run.Steps.Where(s => + s.Status is StepRunStatus.Pending or + StepRunStatus.Ready or + StepRunStatus.Running or + StepRunStatus.WaitingForCallback); + + if (!pendingSteps.Any()) + { + // All steps complete + await _stateManager.CompleteAsync(runId, ct); + break; + } + + // Waiting for async steps + await Task.Delay(TimeSpan.FromSeconds(1), ct); + continue; + } + + // Execute ready steps in parallel + var tasks = readySteps.Select(step => + ExecuteStepAsync(runId, step, run.Context, ct)); + + await Task.WhenAll(tasks); + } + } + catch (OperationCanceledException) when (ct.IsCancellationRequested) + { + _logger.LogInformation("Workflow run {RunId} cancelled", runId); + } + catch (Exception ex) + { + _logger.LogError(ex, "Workflow run {RunId} failed", runId); + await _stateManager.FailAsync(runId, ex.Message, ct); + } + } + + private async Task ExecuteStepAsync( + Guid runId, + WorkflowStep step, + WorkflowContext context, + CancellationToken ct) + { + try + { + await _stateManager.SetStepStatusAsync(runId, step.Id, StepRunStatus.Running, ct); + + var stepContext = new StepContext + { + RunId = runId, + StepId = step.Id, + StepType = step.Type, + Config = step.Config, + WorkflowContext = context, + Timeout = step.Timeout, + RetryConfig = step.Retry + }; + + var result = await _stepExecutor.ExecuteAsync(step, stepContext, ct); + + if (result.IsSuccess) + { + await _stateManager.CompleteStepAsync(runId, step.Id, result, ct); + } + else if (result.RequiresCallback) + { + await _stateManager.SetStepStatusAsync(runId, step.Id, + StepRunStatus.WaitingForCallback, ct); + } + else + { + await _stateManager.FailStepAsync(runId, step.Id, result.Error ?? 
"Unknown error", ct); + } + } + catch (Exception ex) + { + _logger.LogError(ex, "Step {StepId} failed in run {RunId}", step.Id, runId); + await _stateManager.FailStepAsync(runId, step.Id, ex.Message, ct); + } + } + + public async Task CancelAsync(Guid runId, string? reason = null, CancellationToken ct = default) + { + var run = await _runStore.GetAsync(runId, ct) + ?? throw new WorkflowRunNotFoundException(runId); + + if (run.IsTerminal) + { + throw new WorkflowRunAlreadyTerminalException(runId); + } + + await _stateManager.CancelAsync(runId, reason, ct); + + await _eventPublisher.PublishAsync(new WorkflowRunCancelled( + runId, + run.TenantId, + run.TemplateName, + reason, + _timeProvider.GetUtcNow() + ), ct); + } +} +``` + +### DagScheduler + +```csharp +namespace StellaOps.ReleaseOrchestrator.Workflow.Engine; + +public interface IDagScheduler +{ + IReadOnlyList GetReadySteps( + WorkflowDag dag, + IReadOnlySet completedStepIds, + ImmutableArray stepRuns); +} + +public sealed class DagScheduler : IDagScheduler +{ + public IReadOnlyList GetReadySteps( + WorkflowDag dag, + IReadOnlySet completedStepIds, + ImmutableArray stepRuns) + { + var readySteps = new List(); + var runningOrWaiting = stepRuns + .Where(s => s.Status is StepRunStatus.Running or StepRunStatus.WaitingForCallback) + .Select(s => s.StepId) + .ToHashSet(); + + foreach (var node in dag.Nodes) + { + var stepRun = stepRuns.FirstOrDefault(s => s.StepId == node.Id); + if (stepRun is null) + continue; + + // Skip if already running, complete, or failed + if (stepRun.Status != StepRunStatus.Pending && + stepRun.Status != StepRunStatus.Ready) + continue; + + // Check if all dependencies are complete + if (node.IsReady(completedStepIds)) + { + // Evaluate condition if present + if (ShouldExecute(node.Step, stepRuns)) + { + readySteps.Add(node.Step); + } + } + } + + return readySteps.AsReadOnly(); + } + + private static bool ShouldExecute(WorkflowStep step, ImmutableArray stepRuns) + { + if 
(string.IsNullOrEmpty(step.Condition)) + return true; + + // Evaluate simple conditions + return step.Condition switch + { + "success()" => stepRuns + .Where(s => step.DependsOn.Contains(s.StepId)) + .All(s => s.Status == StepRunStatus.Completed), + + "failure()" => stepRuns + .Where(s => step.DependsOn.Contains(s.StepId)) + .Any(s => s.Status == StepRunStatus.Failed), + + "always()" => true, + + _ => true // Default to execute for unrecognized conditions + }; + } +} +``` + +### WorkflowStateManager + +```csharp +namespace StellaOps.ReleaseOrchestrator.Workflow.State; + +public interface IWorkflowStateManager +{ + Task SetStatusAsync(Guid runId, WorkflowRunStatus status, CancellationToken ct = default); + Task SetStepStatusAsync(Guid runId, string stepId, StepRunStatus status, CancellationToken ct = default); + Task CompleteAsync(Guid runId, CancellationToken ct = default); + Task FailAsync(Guid runId, string reason, CancellationToken ct = default); + Task CancelAsync(Guid runId, string? reason, CancellationToken ct = default); + Task CompleteStepAsync(Guid runId, string stepId, StepResult result, CancellationToken ct = default); + Task FailStepAsync(Guid runId, string stepId, string error, CancellationToken ct = default); +} + +public sealed class WorkflowStateManager : IWorkflowStateManager +{ + private readonly IWorkflowRunStore _store; + private readonly IEventPublisher _eventPublisher; + private readonly TimeProvider _timeProvider; + private readonly ILogger _logger; + + public async Task CompleteStepAsync( + Guid runId, + string stepId, + StepResult result, + CancellationToken ct = default) + { + var run = await _store.GetAsync(runId, ct) + ?? 
throw new WorkflowRunNotFoundException(runId); + + var updatedSteps = run.Steps.Select(s => + { + if (s.StepId != stepId) + return s; + + return s with + { + Status = StepRunStatus.Completed, + CompletedAt = _timeProvider.GetUtcNow(), + Result = result, + AttemptCount = s.AttemptCount + 1 + }; + }).ToImmutableArray(); + + var updatedRun = run with { Steps = updatedSteps }; + await _store.SaveAsync(updatedRun, ct); + + await _eventPublisher.PublishAsync(new WorkflowStepCompleted( + runId, + stepId, + result.Outputs, + _timeProvider.GetUtcNow() + ), ct); + + _logger.LogInformation( + "Step {StepId} completed in run {RunId}", + stepId, + runId); + } + + public async Task FailStepAsync( + Guid runId, + string stepId, + string error, + CancellationToken ct = default) + { + var run = await _store.GetAsync(runId, ct) + ?? throw new WorkflowRunNotFoundException(runId); + + var step = run.Steps.FirstOrDefault(s => s.StepId == stepId); + if (step is null) + return; + + // Check if we should retry + var template = await GetStepDefinition(run.TemplateId, stepId, ct); + var shouldRetry = template?.Retry is not null && + step.AttemptCount < template.Retry.MaxAttempts; + + var updatedSteps = run.Steps.Select(s => + { + if (s.StepId != stepId) + return s; + + return s with + { + Status = shouldRetry ? 
StepRunStatus.Pending : StepRunStatus.Failed, + Error = error, + AttemptCount = s.AttemptCount + 1 + }; + }).ToImmutableArray(); + + var updatedRun = run with { Steps = updatedSteps }; + + // If step failed and no retry, fail the workflow + if (!shouldRetry && !step.ContinueOnError) + { + updatedRun = updatedRun with + { + Status = WorkflowRunStatus.Failed, + FailureReason = $"Step {stepId} failed: {error}", + CompletedAt = _timeProvider.GetUtcNow() + }; + } + + await _store.SaveAsync(updatedRun, ct); + + if (!shouldRetry) + { + await _eventPublisher.PublishAsync(new WorkflowStepFailed( + runId, + stepId, + error, + _timeProvider.GetUtcNow() + ), ct); + } + } +} +``` + +### Domain Events + +```csharp +namespace StellaOps.ReleaseOrchestrator.Workflow.Events; + +public sealed record WorkflowRunStarted( + Guid RunId, + Guid TenantId, + string TemplateName, + Guid? ReleaseId, + Guid? EnvironmentId, + DateTimeOffset StartedAt +) : IDomainEvent; + +public sealed record WorkflowRunCompleted( + Guid RunId, + Guid TenantId, + string TemplateName, + TimeSpan Duration, + DateTimeOffset CompletedAt +) : IDomainEvent; + +public sealed record WorkflowRunFailed( + Guid RunId, + Guid TenantId, + string TemplateName, + string Reason, + DateTimeOffset FailedAt +) : IDomainEvent; + +public sealed record WorkflowRunCancelled( + Guid RunId, + Guid TenantId, + string TemplateName, + string? Reason, + DateTimeOffset CancelledAt +) : IDomainEvent; + +public sealed record WorkflowStepCompleted( + Guid RunId, + string StepId, + IReadOnlyDictionary? 
Outputs, + DateTimeOffset CompletedAt +) : IDomainEvent; + +public sealed record WorkflowStepFailed( + Guid RunId, + string StepId, + string Error, + DateTimeOffset FailedAt +) : IDomainEvent; +``` + +### Documentation Deliverables + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `docs/modules/release-orchestrator/api/workflows.md` (partial) | Markdown | API endpoint documentation for workflow runs (start, pause, resume, cancel) | + +--- + +## Acceptance Criteria + +### Code + +- [ ] Start workflow from template +- [ ] Execute steps in dependency order +- [ ] Execute independent steps in parallel +- [ ] Track workflow run state +- [ ] Pause/resume workflow +- [ ] Cancel workflow +- [ ] Retry failed step +- [ ] Skip step +- [ ] Evaluate step conditions +- [ ] Unit test coverage >=85% + +### Documentation +- [ ] Workflow run API endpoints documented +- [ ] Start workflow run endpoint documented +- [ ] Pause/Resume/Cancel endpoints documented +- [ ] Run status response schema included + +--- + +## Test Plan + +### Unit Tests + +| Test | Description | +|------|-------------| +| `StartWorkflow_CreatesRun` | Start creates run | +| `DagScheduler_RespectsOrdering` | Dependencies respected | +| `DagScheduler_ParallelSteps` | Parallel execution | +| `ExecuteStep_Success_CompletesStep` | Success handling | +| `ExecuteStep_Failure_RetriesOrFails` | Failure handling | +| `Cancel_StopsExecution` | Cancellation works | +| `Condition_Success_ExecutesStep` | Condition evaluation | +| `Condition_Failure_SkipsStep` | Conditional skip | + +### Integration Tests + +| Test | Description | +|------|-------------| +| `WorkflowExecution_E2E` | Full workflow run | +| `WorkflowRetry_E2E` | Retry behavior | + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 105_001 Workflow Template | Internal | TODO | +| 105_002 Step Registry | Internal | TODO | +| 105_004 Step Executor | Internal | TODO | + +--- + +## Delivery 
Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| IWorkflowEngine | TODO | | +| WorkflowEngine | TODO | | +| IDagScheduler | TODO | | +| DagScheduler | TODO | | +| IWorkflowStateManager | TODO | | +| WorkflowStateManager | TODO | | +| WorkflowRun model | TODO | | +| IWorkflowRunStore | TODO | | +| WorkflowRunStore | TODO | | +| Domain events | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | +| 11-Jan-2026 | Added documentation deliverable: api/workflows.md (partial - workflow runs) | diff --git a/docs/implplan/SPRINT_20260110_105_004_WORKFL_step_executor.md b/docs/implplan/SPRINT_20260110_105_004_WORKFL_step_executor.md new file mode 100644 index 000000000..d159dd1f5 --- /dev/null +++ b/docs/implplan/SPRINT_20260110_105_004_WORKFL_step_executor.md @@ -0,0 +1,615 @@ +# SPRINT: Step Executor + +> **Sprint ID:** 105_004 +> **Module:** WORKFL +> **Phase:** 5 - Workflow Engine +> **Status:** TODO +> **Parent:** [105_000_INDEX](SPRINT_20260110_105_000_INDEX_workflow_engine.md) + +--- + +## Overview + +Implement the Step Executor for executing individual workflow steps with retry and timeout handling. 
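As orientation for the retry and timeout semantics implemented in this sprint, a step definition carrying these settings might look like the sketch below. The property names `maxAttempts`, `initialDelay`, and `backoffMultiplier` mirror the `RetryConfig` members referenced by `StepRetryPolicy`; the YAML shape and duration literals are illustrative assumptions.

```yaml
# Hypothetical step definition; values flow into StepContext.Timeout
# and StepContext.RetryConfig.
- id: deploy-api
  type: deploy
  timeout: 10m           # exceeded -> StepResultStatus.TimedOut
  retry:
    maxAttempts: 3       # RetryConfig.MaxAttempts
    initialDelay: 5s     # RetryConfig.InitialDelay
    backoffMultiplier: 2 # delays of ~5s, ~10s (plus/minus 20% jitter, capped at 5 minutes)
```

Note that timed-out attempts are retryable: `StepRetryPolicy` treats both `Failed` and `TimedOut` results as candidates for another attempt until `maxAttempts` is reached.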
+
+### Objectives
+
+- Execute steps with configuration validation
+- Apply retry policies with exponential backoff
+- Handle step timeouts
+- Manage step execution context
+- Support async steps with callbacks
+
+### Working Directory
+
+```
+src/ReleaseOrchestrator/
+├── __Libraries/
+│   └── StellaOps.ReleaseOrchestrator.Workflow/
+│       ├── Executor/
+│       │   ├── IStepExecutor.cs
+│       │   ├── StepExecutor.cs
+│       │   ├── StepContext.cs
+│       │   ├── StepResult.cs
+│       │   ├── StepRetryPolicy.cs
+│       │   └── StepTimeoutHandler.cs
+│       └── Callback/
+│           ├── IStepCallbackHandler.cs
+│           ├── StepCallbackHandler.cs
+│           └── CallbackToken.cs
+└── __Tests/
+    └── StellaOps.ReleaseOrchestrator.Workflow.Tests/
+        └── Executor/
+```
+
+---
+
+## Architecture Reference
+
+- [Workflow Engine](../modules/release-orchestrator/modules/workflow-engine.md)
+
+---
+
+## Deliverables
+
+### IStepExecutor Interface
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Workflow.Executor;
+
+public interface IStepExecutor
+{
+    Task<StepResult> ExecuteAsync(
+        WorkflowStep step,
+        StepContext context,
+        CancellationToken ct = default);
+}
+```
+
+### StepContext Model
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Workflow.Executor;
+
+public sealed record StepContext
+{
+    public required Guid RunId { get; init; }
+    public required string StepId { get; init; }
+    public required string StepType { get; init; }
+    public required ImmutableDictionary<string, object> Config { get; init; }
+    public required WorkflowContext WorkflowContext { get; init; }
+    public TimeSpan? Timeout { get; init; }
+    public RetryConfig? RetryConfig { get; init; }
+    public int AttemptNumber { get; init; } = 1;
+
+    // For variable interpolation
+    public string Interpolate(string template)
+    {
+        var result = template;
+
+        foreach (var (key, value) in WorkflowContext.Variables)
+        {
+            result = result.Replace($"${{variables.{key}}}", value);
+        }
+
+        foreach (var (key, value) in WorkflowContext.Outputs)
+        {
+            result = result.Replace($"${{outputs.{key}}}", value?.ToString() ??
""); + } + + // Built-in variables + result = result.Replace("${run.id}", RunId.ToString()); + result = result.Replace("${step.id}", StepId); + result = result.Replace("${step.attempt}", AttemptNumber.ToString(CultureInfo.InvariantCulture)); + + if (WorkflowContext.ReleaseId.HasValue) + result = result.Replace("${release.id}", WorkflowContext.ReleaseId.Value.ToString()); + + if (WorkflowContext.EnvironmentId.HasValue) + result = result.Replace("${environment.id}", WorkflowContext.EnvironmentId.Value.ToString()); + + return result; + } +} +``` + +### StepResult Model + +```csharp +namespace StellaOps.ReleaseOrchestrator.Workflow.Executor; + +public sealed record StepResult +{ + public required StepResultStatus Status { get; init; } + public string? Error { get; init; } + public ImmutableDictionary Outputs { get; init; } = + ImmutableDictionary.Empty; + public TimeSpan Duration { get; init; } + public bool RequiresCallback { get; init; } + public string? CallbackToken { get; init; } + public DateTimeOffset? CallbackExpiresAt { get; init; } + + public bool IsSuccess => Status == StepResultStatus.Success; + public bool IsFailure => Status == StepResultStatus.Failed; + + public static StepResult Success( + ImmutableDictionary? outputs = null, + TimeSpan duration = default) => + new() + { + Status = StepResultStatus.Success, + Outputs = outputs ?? 
ImmutableDictionary.Empty, + Duration = duration + }; + + public static StepResult Failed(string error, TimeSpan duration = default) => + new() + { + Status = StepResultStatus.Failed, + Error = error, + Duration = duration + }; + + public static StepResult WaitingForCallback( + string callbackToken, + DateTimeOffset expiresAt) => + new() + { + Status = StepResultStatus.WaitingForCallback, + RequiresCallback = true, + CallbackToken = callbackToken, + CallbackExpiresAt = expiresAt + }; + + public static StepResult Skipped(string reason) => + new() + { + Status = StepResultStatus.Skipped, + Error = reason + }; +} + +public enum StepResultStatus +{ + Success, + Failed, + Skipped, + WaitingForCallback, + TimedOut, + Cancelled +} +``` + +### StepExecutor Implementation + +```csharp +namespace StellaOps.ReleaseOrchestrator.Workflow.Executor; + +public sealed class StepExecutor : IStepExecutor +{ + private readonly IStepRegistry _stepRegistry; + private readonly IStepRetryPolicy _retryPolicy; + private readonly IStepTimeoutHandler _timeoutHandler; + private readonly ILogger _logger; + private readonly TimeProvider _timeProvider; + + public async Task ExecuteAsync( + WorkflowStep step, + StepContext context, + CancellationToken ct = default) + { + var sw = Stopwatch.StartNew(); + + _logger.LogInformation( + "Executing step {StepId} (type: {StepType}, attempt: {Attempt})", + step.Id, + step.Type, + context.AttemptNumber); + + try + { + // Get step provider + var provider = await _stepRegistry.GetProviderAsync(step.Type, ct); + if (provider is null) + { + return StepResult.Failed($"Unknown step type: {step.Type}", sw.Elapsed); + } + + // Validate configuration + var validation = await provider.ValidateConfigAsync(context.Config, ct); + if (!validation.IsValid) + { + return StepResult.Failed( + $"Invalid configuration: {string.Join(", ", validation.Errors)}", + sw.Elapsed); + } + + // Apply timeout if configured + using var timeoutCts = context.Timeout.HasValue + ? 
new CancellationTokenSource(context.Timeout.Value) + : new CancellationTokenSource(); + + using var linkedCts = CancellationTokenSource.CreateLinkedTokenSource( + ct, timeoutCts.Token); + + try + { + var result = await provider.ExecuteAsync(context, linkedCts.Token); + result = result with { Duration = sw.Elapsed }; + + _logger.LogInformation( + "Step {StepId} completed with status {Status} in {Duration}ms", + step.Id, + result.Status, + sw.ElapsedMilliseconds); + + return result; + } + catch (OperationCanceledException) when (timeoutCts.IsCancellationRequested) + { + _logger.LogWarning( + "Step {StepId} timed out after {Timeout}", + step.Id, + context.Timeout); + + return new StepResult + { + Status = StepResultStatus.TimedOut, + Error = $"Step timed out after {context.Timeout}", + Duration = sw.Elapsed + }; + } + } + catch (OperationCanceledException) when (ct.IsCancellationRequested) + { + return new StepResult + { + Status = StepResultStatus.Cancelled, + Duration = sw.Elapsed + }; + } + catch (Exception ex) + { + _logger.LogError(ex, + "Step {StepId} failed with exception", + step.Id); + + return StepResult.Failed(ex.Message, sw.Elapsed); + } + } +} +``` + +### StepRetryPolicy + +```csharp +namespace StellaOps.ReleaseOrchestrator.Workflow.Executor; + +public interface IStepRetryPolicy +{ + bool ShouldRetry(StepResult result, RetryConfig? config, int attemptNumber); + TimeSpan GetDelay(RetryConfig config, int attemptNumber); +} + +public sealed class StepRetryPolicy : IStepRetryPolicy +{ + private static readonly HashSet RetryableStatuses = new() + { + StepResultStatus.Failed, + StepResultStatus.TimedOut + }; + + public bool ShouldRetry(StepResult result, RetryConfig? 
config, int attemptNumber) + { + if (config is null) + return false; + + if (!RetryableStatuses.Contains(result.Status)) + return false; + + if (attemptNumber >= config.MaxAttempts) + return false; + + return true; + } + + public TimeSpan GetDelay(RetryConfig config, int attemptNumber) + { + // Exponential backoff with jitter + var baseDelay = config.InitialDelay.TotalMilliseconds; + var exponentialDelay = baseDelay * Math.Pow(config.BackoffMultiplier, attemptNumber - 1); + + // Add jitter (+-20%) + var jitter = exponentialDelay * (Random.Shared.NextDouble() * 0.4 - 0.2); + var totalDelayMs = exponentialDelay + jitter; + + // Cap at 5 minutes + var cappedDelay = Math.Min(totalDelayMs, TimeSpan.FromMinutes(5).TotalMilliseconds); + + return TimeSpan.FromMilliseconds(cappedDelay); + } +} +``` + +### StepCallbackHandler + +```csharp +namespace StellaOps.ReleaseOrchestrator.Workflow.Callback; + +public interface IStepCallbackHandler +{ + Task CreateCallbackAsync( + Guid runId, + string stepId, + TimeSpan? expiresIn = null, + CancellationToken ct = default); + + Task ProcessCallbackAsync( + string token, + CallbackPayload payload, + CancellationToken ct = default); + + Task ValidateCallbackAsync( + string token, + CancellationToken ct = default); +} + +public sealed class StepCallbackHandler : IStepCallbackHandler +{ + private readonly ICallbackStore _store; + private readonly IWorkflowStateManager _stateManager; + private readonly TimeProvider _timeProvider; + private readonly ILogger _logger; + + private static readonly TimeSpan DefaultExpiry = TimeSpan.FromHours(24); + + public async Task CreateCallbackAsync( + Guid runId, + string stepId, + TimeSpan? expiresIn = null, + CancellationToken ct = default) + { + var token = GenerateSecureToken(); + var expiry = _timeProvider.GetUtcNow().Add(expiresIn ?? 
DefaultExpiry); + + var callback = new PendingCallback + { + Token = token, + RunId = runId, + StepId = stepId, + CreatedAt = _timeProvider.GetUtcNow(), + ExpiresAt = expiry + }; + + await _store.SaveAsync(callback, ct); + + return new CallbackToken(token, expiry); + } + + public async Task ProcessCallbackAsync( + string token, + CallbackPayload payload, + CancellationToken ct = default) + { + var pending = await _store.GetByTokenAsync(token, ct); + if (pending is null) + { + return CallbackResult.Failed("Invalid callback token"); + } + + if (pending.ExpiresAt < _timeProvider.GetUtcNow()) + { + return CallbackResult.Failed("Callback token expired"); + } + + if (pending.ProcessedAt.HasValue) + { + return CallbackResult.Failed("Callback already processed"); + } + + // Mark as processed + pending = pending with { ProcessedAt = _timeProvider.GetUtcNow() }; + await _store.SaveAsync(pending, ct); + + // Update step with callback result + var result = payload.Success + ? StepResult.Success(payload.Outputs?.ToImmutableDictionary()) + : StepResult.Failed(payload.Error ?? "Callback indicated failure"); + + await _stateManager.CompleteStepAsync(pending.RunId, pending.StepId, result, ct); + + _logger.LogInformation( + "Processed callback for step {StepId} in run {RunId}", + pending.StepId, + pending.RunId); + + return CallbackResult.Succeeded(pending.RunId, pending.StepId); + } + + private static string GenerateSecureToken() + { + var bytes = RandomNumberGenerator.GetBytes(32); + return Convert.ToBase64String(bytes) + .Replace("+", "-") + .Replace("/", "_") + .TrimEnd('='); + } +} + +public sealed record CallbackToken( + string Token, + DateTimeOffset ExpiresAt +); + +public sealed record CallbackPayload( + bool Success, + string? Error = null, + IReadOnlyDictionary? Outputs = null +); + +public sealed record CallbackResult( + bool IsSuccess, + string? Error = null, + Guid? RunId = null, + string? 
StepId = null +) +{ + public static CallbackResult Succeeded(Guid runId, string stepId) => + new(true, RunId: runId, StepId: stepId); + + public static CallbackResult Failed(string error) => + new(false, Error: error); +} + +public sealed record PendingCallback +{ + public required string Token { get; init; } + public required Guid RunId { get; init; } + public required string StepId { get; init; } + public required DateTimeOffset CreatedAt { get; init; } + public required DateTimeOffset ExpiresAt { get; init; } + public DateTimeOffset? ProcessedAt { get; init; } +} +``` + +### StepTimeoutHandler + +```csharp +namespace StellaOps.ReleaseOrchestrator.Workflow.Executor; + +public interface IStepTimeoutHandler +{ + Task MonitorTimeoutsAsync(CancellationToken ct = default); +} + +public sealed class StepTimeoutHandler : IStepTimeoutHandler, IHostedService +{ + private readonly IWorkflowRunStore _runStore; + private readonly IWorkflowStateManager _stateManager; + private readonly ICallbackStore _callbackStore; + private readonly TimeProvider _timeProvider; + private readonly ILogger _logger; + private Timer? 
_timer; + + public Task StartAsync(CancellationToken ct) + { + _timer = new Timer( + _ => _ = MonitorTimeoutsAsync(ct), + null, + TimeSpan.FromSeconds(30), + TimeSpan.FromSeconds(30)); + + return Task.CompletedTask; + } + + public Task StopAsync(CancellationToken ct) + { + _timer?.Change(Timeout.Infinite, 0); + return Task.CompletedTask; + } + + public async Task MonitorTimeoutsAsync(CancellationToken ct = default) + { + try + { + var now = _timeProvider.GetUtcNow(); + + // Check for expired callbacks + var expiredCallbacks = await _callbackStore.GetExpiredAsync(now, ct); + foreach (var callback in expiredCallbacks) + { + _logger.LogWarning( + "Callback expired for step {StepId} in run {RunId}", + callback.StepId, + callback.RunId); + + await _stateManager.FailStepAsync( + callback.RunId, + callback.StepId, + "Callback timed out", + ct); + } + } + catch (Exception ex) + { + _logger.LogError(ex, "Error monitoring step timeouts"); + } + } + + public void Dispose() => _timer?.Dispose(); +} +``` + +--- + +## Acceptance Criteria + +- [ ] Execute steps with configuration +- [ ] Validate step configuration before execution +- [ ] Apply timeout to step execution +- [ ] Retry failed steps with exponential backoff +- [ ] Create callback tokens for async steps +- [ ] Process callbacks and complete steps +- [ ] Monitor and fail expired callbacks +- [ ] Interpolate variables in configuration +- [ ] Unit test coverage >=85% + +--- + +## Test Plan + +### Unit Tests + +| Test | Description | +|------|-------------| +| `ExecuteStep_Success_ReturnsSuccess` | Success case | +| `ExecuteStep_InvalidConfig_Fails` | Config validation | +| `ExecuteStep_Timeout_ReturnsTimedOut` | Timeout handling | +| `RetryPolicy_ShouldRetry_ReturnsTrue` | Retry logic | +| `RetryPolicy_MaxAttempts_ReturnsFalse` | Max attempts | +| `GetDelay_ExponentialBackoff` | Backoff calculation | +| `Callback_ValidToken_Succeeds` | Callback processing | +| `Callback_ExpiredToken_Fails` | Expiry handling | +| 
`Interpolate_ReplacesVariables` | Variable interpolation | + +### Integration Tests + +| Test | Description | +|------|-------------| +| `StepExecution_E2E` | Full execution flow | +| `StepCallback_E2E` | Async callback flow | + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 105_002 Step Registry | Internal | TODO | +| 105_003 DAG Executor | Internal | TODO | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| IStepExecutor | TODO | | +| StepExecutor | TODO | | +| StepContext | TODO | | +| StepResult | TODO | | +| IStepRetryPolicy | TODO | | +| StepRetryPolicy | TODO | | +| IStepCallbackHandler | TODO | | +| StepCallbackHandler | TODO | | +| IStepTimeoutHandler | TODO | | +| StepTimeoutHandler | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_105_005_WORKFL_builtin_steps.md b/docs/implplan/SPRINT_20260110_105_005_WORKFL_builtin_steps.md new file mode 100644 index 000000000..4ce74aa4b --- /dev/null +++ b/docs/implplan/SPRINT_20260110_105_005_WORKFL_builtin_steps.md @@ -0,0 +1,771 @@ +# SPRINT: Built-in Steps + +> **Sprint ID:** 105_005 +> **Module:** WORKFL +> **Phase:** 5 - Workflow Engine +> **Status:** TODO +> **Parent:** [105_000_INDEX](SPRINT_20260110_105_000_INDEX_workflow_engine.md) + +--- + +## Overview + +Implement the built-in workflow steps for common deployment and automation tasks. + +### Objectives + +- Implement core workflow steps required for v1 release +- Define complete step type catalog (16 types total) +- Document deferral strategy for post-v1 and plugin-based steps + +### Step Type Catalog + +The Release Orchestrator supports 16 built-in step types. This sprint implements the **7 core types** required for v1; remaining types are deferred to post-v1 or delivered via the plugin SDK. 
+ +| Step Type | v1 Status | Description | +|-----------|-----------|-------------| +| `script` | **v1** | Execute shell scripts on target host | +| `approval` | **v1** | Request manual approval before proceeding | +| `notify` | **v1** | Send notifications via configured channels | +| `wait` | **v1** | Pause execution for duration/until time | +| `security-gate` | **v1** | Check vulnerability thresholds | +| `deploy` | **v1** | Trigger deployment to target environment | +| `rollback` | **v1** | Rollback to previous release version | +| `http` | Post-v1 | Make HTTP requests (API calls, webhooks) | +| `smoke-test` | Post-v1 | Run smoke tests against deployed service | +| `health-check` | Post-v1 | Custom health check beyond deploy step | +| `database-migrate` | Post-v1 | Run database migrations via agent | +| `feature-flag` | Plugin | Toggle feature flags (LaunchDarkly, Split, etc.) | +| `cache-invalidate` | Plugin | Invalidate CDN/cache (CloudFront, Fastly, etc.) | +| `metric-check` | Plugin | Query metrics (Prometheus, DataDog, etc.) | +| `dns-switch` | Plugin | Update DNS records for blue-green | +| `custom` | Plugin | User-defined plugin steps | + +> **Deferral Strategy:** Post-v1 types will be implemented in Release Orchestrator 1.1. Plugin types are delivered via the Plugin SDK (`StellaOps.Plugin.Sdk`) and can be developed by users or third parties using `IStepProviderCapability`. 
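+
+The v1 split above means the workflow engine ships with exactly seven `IStepProvider` implementations behind one registry. A minimal wiring sketch, assuming standard `Microsoft.Extensions.DependencyInjection`; the `AddBuiltInSteps` extension method and the singleton lifetime are illustrative assumptions, not a committed API:
+
+```csharp
+// Sketch: register the seven v1 built-in providers so the step registry
+// (sprint 105_002) can resolve each one by its Type string.
+// AddBuiltInSteps is a hypothetical extension method, not the final API.
+public static class BuiltInStepRegistration
+{
+    public static IServiceCollection AddBuiltInSteps(this IServiceCollection services)
+    {
+        services.AddSingleton<IStepProvider, ScriptStepProvider>();
+        services.AddSingleton<IStepProvider, ApprovalStepProvider>();
+        services.AddSingleton<IStepProvider, NotifyStepProvider>();
+        services.AddSingleton<IStepProvider, WaitStepProvider>();
+        services.AddSingleton<IStepProvider, SecurityGateStepProvider>();
+        services.AddSingleton<IStepProvider, DeployStepProvider>();
+        services.AddSingleton<IStepProvider, RollbackStepProvider>();
+        return services;
+    }
+}
+```
+
+Registering every provider under the single `IStepProvider` service type lets the registry enumerate all of them and index by `Type`, which is also the natural extension point for plugin-delivered step types.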
+
+### v1 Core Step Objectives
+
+- Script step for executing shell commands
+- Approval step for manual gates
+- Notify step for sending notifications
+- Wait step for time delays
+- Security gate step for vulnerability checks
+- Deploy step for triggering deployments
+- Rollback step for reverting releases
+
+### Working Directory
+
+```
+src/ReleaseOrchestrator/
+├── __Libraries/
+│   └── StellaOps.ReleaseOrchestrator.Workflow/
+│       └── Steps.BuiltIn/
+│           ├── ScriptStepProvider.cs
+│           ├── ApprovalStepProvider.cs
+│           ├── NotifyStepProvider.cs
+│           ├── WaitStepProvider.cs
+│           ├── SecurityGateStepProvider.cs
+│           ├── DeployStepProvider.cs
+│           └── RollbackStepProvider.cs
+└── __Tests/
+    └── StellaOps.ReleaseOrchestrator.Workflow.Tests/
+        └── Steps.BuiltIn/
+```
+
+---
+
+## Architecture Reference
+
+- [Workflow Engine](../modules/release-orchestrator/modules/workflow-engine.md)
+- [Step Plugins](../modules/release-orchestrator/plugins/step-plugins.md)
+
+---
+
+## Deliverables
+
+### ScriptStepProvider
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Workflow.Steps.BuiltIn;
+
+public sealed class ScriptStepProvider : IStepProvider
+{
+    private readonly IAgentManager _agentManager;
+    private readonly ILogger<ScriptStepProvider> _logger;
+
+    public string Type => "script";
+    public string DisplayName => "Script";
+    public string Description => "Execute a shell script or command on target host";
+
+    public StepSchema ConfigSchema => new()
+    {
+        Properties =
+        [
+            new StepProperty { Name = "script", Type = StepPropertyType.String, Description = "Script content or path" },
+            new StepProperty { Name = "shell", Type = StepPropertyType.String, Default = "bash", Description = "Shell to use (bash, sh, powershell)" },
+            new StepProperty { Name = "workingDir", Type = StepPropertyType.String, Description = "Working directory" },
+            new StepProperty { Name = "environment", Type = StepPropertyType.Object, Description = "Environment variables" },
+            new StepProperty { Name = "timeout", Type = StepPropertyType.Integer, Default = 300, Description = "Timeout in seconds" },
+            new StepProperty { Name = "failOnNonZero", Type = StepPropertyType.Boolean, Default = true, Description = "Fail if exit code is non-zero" }
+        ],
+        Required = ["script"]
+    };
+
+    public StepCapabilities Capabilities => new()
+    {
+        SupportsRetry = true,
+        SupportsTimeout = true,
+        RequiresAgent = true
+    };
+
+    public async Task<StepResult> ExecuteAsync(StepContext context, CancellationToken ct)
+    {
+        var script = context.Interpolate(context.Config.GetValueOrDefault("script")?.ToString() ?? "");
+        var shell = context.Config.GetValueOrDefault("shell")?.ToString() ?? "bash";
+        var workingDir = context.Config.GetValueOrDefault("workingDir")?.ToString();
+        var timeout = context.Config.GetValueOrDefault("timeout") is int t ? t : 300;
+        var failOnNonZero = context.Config.GetValueOrDefault("failOnNonZero") as bool? ?? true;
+
+        var environmentId = context.WorkflowContext.EnvironmentId
+            ?? throw new InvalidOperationException("Script step requires an environment");
+
+        // Get agent for target
+        var agent = await GetAgentForEnvironment(environmentId, ct);
+
+        var task = new ScriptExecutionTask
+        {
+            Script = script,
+            Shell = shell,
+            WorkingDirectory = workingDir,
+            TimeoutSeconds = timeout,
+            Environment = ExtractEnvironment(context.Config)
+        };
+
+        var result = await _agentManager.ExecuteTaskAsync(agent.Id, task, ct);
+
+        if (result.ExitCode != 0 && failOnNonZero)
+        {
+            return StepResult.Failed(
+                $"Script exited with code {result.ExitCode}: {result.Stderr}");
+        }
+
+        return StepResult.Success(new Dictionary<string, object?>
+        {
+            ["exitCode"] = result.ExitCode,
+            ["stdout"] = result.Stdout,
+            ["stderr"] = result.Stderr
+        }.ToImmutableDictionary());
+    }
+
+    public Task<ValidationResult> ValidateConfigAsync(
+        IReadOnlyDictionary<string, object?> config,
+        CancellationToken ct) =>
+        Task.FromResult(ConfigSchema.Validate(config));
+}
+```
+
+### ApprovalStepProvider
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Workflow.Steps.BuiltIn;
+
+public sealed 
class ApprovalStepProvider : IStepProvider
+{
+    private readonly IApprovalService _approvalService;
+    private readonly IStepCallbackHandler _callbackHandler;
+    private readonly ILogger<ApprovalStepProvider> _logger;
+
+    public string Type => "approval";
+    public string DisplayName => "Approval";
+    public string Description => "Request manual approval before proceeding";
+
+    public StepSchema ConfigSchema => new()
+    {
+        Properties =
+        [
+            new StepProperty { Name = "approvers", Type = StepPropertyType.Array, Description = "List of approver user IDs or group names" },
+            new StepProperty { Name = "minApprovals", Type = StepPropertyType.Integer, Default = 1, Description = "Minimum approvals required" },
+            new StepProperty { Name = "message", Type = StepPropertyType.String, Description = "Message to display to approvers" },
+            new StepProperty { Name = "timeout", Type = StepPropertyType.Integer, Default = 86400, Description = "Timeout in seconds (default 24h)" },
+            new StepProperty { Name = "autoApproveOnTimeout", Type = StepPropertyType.Boolean, Default = false, Description = "Auto-approve if timeout reached" }
+        ],
+        Required = ["approvers"]
+    };
+
+    public StepCapabilities Capabilities => new()
+    {
+        SupportsRetry = false,
+        SupportsTimeout = true,
+        IsAsync = true
+    };
+
+    public async Task<StepResult> ExecuteAsync(StepContext context, CancellationToken ct)
+    {
+        var approvers = context.Config.GetValueOrDefault("approvers") as IEnumerable<object>
+            ?? throw new InvalidOperationException("Approvers required");
+        var minApprovals = context.Config.GetValueOrDefault("minApprovals") as int? ?? 1;
+        var message = context.Interpolate(
+            context.Config.GetValueOrDefault("message")?.ToString() ?? "Approval required");
+        var timeoutSeconds = context.Config.GetValueOrDefault("timeout") as int? ?? 86400;
+
+        // Create callback token
+        var callback = await _callbackHandler.CreateCallbackAsync(
+            context.RunId,
+            context.StepId,
+            TimeSpan.FromSeconds(timeoutSeconds),
+            ct);
+
+        // Create approval request
+        var approval = await _approvalService.CreateAsync(new CreateApprovalRequest
+        {
+            WorkflowRunId = context.RunId,
+            StepId = context.StepId,
+            Message = message,
+            Approvers = approvers.Select(a => a.ToString()!).ToList(),
+            MinApprovals = minApprovals,
+            ExpiresAt = callback.ExpiresAt,
+            CallbackToken = callback.Token,
+            ReleaseId = context.WorkflowContext.ReleaseId,
+            EnvironmentId = context.WorkflowContext.EnvironmentId
+        }, ct);
+
+        _logger.LogInformation(
+            "Created approval request {ApprovalId} for step {StepId}",
+            approval.Id,
+            context.StepId);
+
+        return StepResult.WaitingForCallback(callback.Token, callback.ExpiresAt);
+    }
+
+    public Task<ValidationResult> ValidateConfigAsync(
+        IReadOnlyDictionary<string, object?> config,
+        CancellationToken ct) =>
+        Task.FromResult(ConfigSchema.Validate(config));
+}
+```
+
+### NotifyStepProvider
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Workflow.Steps.BuiltIn;
+
+public sealed class NotifyStepProvider : IStepProvider
+{
+    private readonly INotificationService _notificationService;
+    private readonly ILogger<NotifyStepProvider> _logger;
+
+    public string Type => "notify";
+    public string DisplayName => "Notify";
+    public string Description => "Send notifications via configured channels";
+
+    public StepSchema ConfigSchema => new()
+    {
+        Properties =
+        [
+            new StepProperty { Name = "channel", Type = StepPropertyType.String, Description = "Notification channel (slack, teams, email, webhook)" },
+            new StepProperty { Name = "message", Type = StepPropertyType.String, Description = "Message content" },
+            new StepProperty { Name = "title", Type = StepPropertyType.String, Description = "Message title" },
+            new StepProperty { Name = "recipients", Type = StepPropertyType.Array, Description = "Recipient addresses/channels" },
+            new StepProperty { Name = "severity", 
Type = StepPropertyType.String, Default = "info", Description = "Message severity (info, warning, error)" },
+            new StepProperty { Name = "template", Type = StepPropertyType.String, Description = "Named template to use" }
+        ],
+        Required = ["channel", "message"]
+    };
+
+    public StepCapabilities Capabilities => new()
+    {
+        SupportsRetry = true,
+        SupportsTimeout = true
+    };
+
+    public async Task<StepResult> ExecuteAsync(StepContext context, CancellationToken ct)
+    {
+        var channel = context.Config.GetValueOrDefault("channel")?.ToString()
+            ?? throw new InvalidOperationException("Channel required");
+        var message = context.Interpolate(
+            context.Config.GetValueOrDefault("message")?.ToString() ?? "");
+        var title = context.Interpolate(
+            context.Config.GetValueOrDefault("title")?.ToString() ?? "Workflow Notification");
+        var severity = context.Config.GetValueOrDefault("severity")?.ToString() ?? "info";
+
+        var recipients = context.Config.GetValueOrDefault("recipients") as IEnumerable<object>;
+
+        var notification = new NotificationRequest
+        {
+            Channel = channel,
+            Title = title,
+            Message = message,
+            Severity = Enum.Parse<NotificationSeverity>(severity, ignoreCase: true),
+            Recipients = recipients?.Select(r => r.ToString()!).ToList(),
+            Metadata = new Dictionary<string, string>
+            {
+                ["workflowRunId"] = context.RunId.ToString(),
+                ["stepId"] = context.StepId,
+                ["releaseId"] = context.WorkflowContext.ReleaseId?.ToString() ?? "",
+                ["environmentId"] = context.WorkflowContext.EnvironmentId?.ToString() ?? ""
+            }
+        };
+
+        await _notificationService.SendAsync(notification, ct);
+
+        _logger.LogInformation(
+            "Sent {Channel} notification for step {StepId}",
+            channel,
+            context.StepId);
+
+        return StepResult.Success();
+    }
+
+    public Task<ValidationResult> ValidateConfigAsync(
+        IReadOnlyDictionary<string, object?> config,
+        CancellationToken ct) =>
+        Task.FromResult(ConfigSchema.Validate(config));
+}
+```
+
+### WaitStepProvider
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Workflow.Steps.BuiltIn;
+
+public sealed class WaitStepProvider : IStepProvider
+{
+    public string Type => "wait";
+    public string DisplayName => "Wait";
+    public string Description => "Pause workflow execution for specified duration";
+
+    public StepSchema ConfigSchema => new()
+    {
+        Properties =
+        [
+            new StepProperty { Name = "duration", Type = StepPropertyType.Integer, Description = "Wait duration in seconds" },
+            new StepProperty { Name = "until", Type = StepPropertyType.String, Description = "Wait until specific time (ISO 8601)" }
+        ]
+    };
+
+    public StepCapabilities Capabilities => new()
+    {
+        SupportsRetry = false,
+        SupportsTimeout = false
+    };
+
+    public async Task<StepResult> ExecuteAsync(StepContext context, CancellationToken ct)
+    {
+        TimeSpan waitDuration;
+
+        if (context.Config.TryGetValue("duration", out var durationObj) && durationObj is int duration)
+        {
+            waitDuration = TimeSpan.FromSeconds(duration);
+        }
+        else if (context.Config.TryGetValue("until", out var untilObj) &&
+                 untilObj is string untilStr &&
+                 DateTimeOffset.TryParse(untilStr, CultureInfo.InvariantCulture,
+                     DateTimeStyles.AssumeUniversal, out var until))
+        {
+            waitDuration = until - TimeProvider.System.GetUtcNow();
+            if (waitDuration < TimeSpan.Zero)
+                waitDuration = TimeSpan.Zero;
+        }
+        else
+        {
+            return StepResult.Failed("Either 'duration' or 'until' must be specified");
+        }
+
+        if (waitDuration > TimeSpan.Zero)
+        {
+            await Task.Delay(waitDuration, ct);
+        }
+
+        return StepResult.Success(new Dictionary<string, object?>
+        {
+            ["waitedSeconds"] = 
(int)waitDuration.TotalSeconds
+        }.ToImmutableDictionary());
+    }
+
+    public Task<ValidationResult> ValidateConfigAsync(
+        IReadOnlyDictionary<string, object?> config,
+        CancellationToken ct)
+    {
+        if (!config.ContainsKey("duration") && !config.ContainsKey("until"))
+        {
+            return Task.FromResult(ValidationResult.Failure(
+                "Either 'duration' or 'until' must be specified"));
+        }
+        return Task.FromResult(ValidationResult.Success());
+    }
+}
+```
+
+### SecurityGateStepProvider
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Workflow.Steps.BuiltIn;
+
+public sealed class SecurityGateStepProvider : IStepProvider
+{
+    private readonly IReleaseManager _releaseManager;
+    private readonly IScannerService _scannerService;
+    private readonly ILogger<SecurityGateStepProvider> _logger;
+
+    public string Type => "security-gate";
+    public string DisplayName => "Security Gate";
+    public string Description => "Check security vulnerabilities meet thresholds";
+
+    public StepSchema ConfigSchema => new()
+    {
+        Properties =
+        [
+            new StepProperty { Name = "maxCritical", Type = StepPropertyType.Integer, Default = 0, Description = "Maximum critical vulnerabilities allowed" },
+            new StepProperty { Name = "maxHigh", Type = StepPropertyType.Integer, Default = 5, Description = "Maximum high vulnerabilities allowed" },
+            new StepProperty { Name = "maxMedium", Type = StepPropertyType.Integer, Description = "Maximum medium vulnerabilities allowed" },
+            new StepProperty { Name = "requireScan", Type = StepPropertyType.Boolean, Default = true, Description = "Require scan to exist" },
+            new StepProperty { Name = "maxAge", Type = StepPropertyType.Integer, Default = 86400, Description = "Max scan age in seconds" }
+        ]
+    };
+
+    public StepCapabilities Capabilities => new()
+    {
+        SupportsRetry = true,
+        SupportsTimeout = true,
+        RequiredPermissions = ["release:read", "scanner:read"]
+    };
+
+    public async Task<StepResult> ExecuteAsync(StepContext context, CancellationToken ct)
+    {
+        var releaseId = context.WorkflowContext.ReleaseId
+            ?? throw new InvalidOperationException("Security gate requires a release");
+
+        var maxCritical = context.Config.GetValueOrDefault("maxCritical") as int? ?? 0;
+        var maxHigh = context.Config.GetValueOrDefault("maxHigh") as int? ?? 5;
+        var maxMedium = context.Config.GetValueOrDefault("maxMedium") as int?;
+        var requireScan = context.Config.GetValueOrDefault("requireScan") as bool? ?? true;
+        var maxAgeSeconds = context.Config.GetValueOrDefault("maxAge") as int? ?? 86400;
+
+        var release = await _releaseManager.GetAsync(releaseId, ct)
+            ?? throw new ReleaseNotFoundException(releaseId);
+
+        var violations = new List<string>();
+        var totalCritical = 0;
+        var totalHigh = 0;
+        var totalMedium = 0;
+
+        foreach (var component in release.Components)
+        {
+            var scanResult = await _scannerService.GetLatestScanAsync(
+                component.Digest, ct);
+
+            if (scanResult is null)
+            {
+                if (requireScan)
+                {
+                    violations.Add($"No scan found for {component.ComponentName}");
+                }
+                continue;
+            }
+
+            var scanAge = TimeProvider.System.GetUtcNow() - scanResult.CompletedAt;
+            if (scanAge.TotalSeconds > maxAgeSeconds)
+            {
+                violations.Add($"Scan for {component.ComponentName} is too old ({scanAge.TotalHours:F1}h)");
+            }
+
+            totalCritical += scanResult.CriticalCount;
+            totalHigh += scanResult.HighCount;
+            totalMedium += scanResult.MediumCount;
+        }
+
+        // Check thresholds
+        if (totalCritical > maxCritical)
+        {
+            violations.Add($"Critical vulnerabilities ({totalCritical}) exceed threshold ({maxCritical})");
+        }
+
+        if (totalHigh > maxHigh)
+        {
+            violations.Add($"High vulnerabilities ({totalHigh}) exceed threshold ({maxHigh})");
+        }
+
+        if (maxMedium.HasValue && totalMedium > maxMedium.Value)
+        {
+            violations.Add($"Medium vulnerabilities ({totalMedium}) exceed threshold ({maxMedium})");
+        }
+
+        if (violations.Count > 0)
+        {
+            return StepResult.Failed(string.Join("; ", violations));
+        }
+
+        _logger.LogInformation(
+            "Security gate passed for release {ReleaseId}: {Critical}C/{High}H/{Medium}M",
+            releaseId,
+            
totalCritical,
+            totalHigh,
+            totalMedium);
+
+        return StepResult.Success(new Dictionary<string, object?>
+        {
+            ["criticalCount"] = totalCritical,
+            ["highCount"] = totalHigh,
+            ["mediumCount"] = totalMedium,
+            ["componentsScanned"] = release.Components.Length
+        }.ToImmutableDictionary());
+    }
+
+    public Task<ValidationResult> ValidateConfigAsync(
+        IReadOnlyDictionary<string, object?> config,
+        CancellationToken ct) =>
+        Task.FromResult(ConfigSchema.Validate(config));
+}
+```
+
+### DeployStepProvider
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Workflow.Steps.BuiltIn;
+
+public sealed class DeployStepProvider : IStepProvider
+{
+    private readonly IDeploymentService _deploymentService;
+    private readonly IStepCallbackHandler _callbackHandler;
+    private readonly ILogger<DeployStepProvider> _logger;
+
+    public string Type => "deploy";
+    public string DisplayName => "Deploy";
+    public string Description => "Deploy release to target environment";
+
+    public StepSchema ConfigSchema => new()
+    {
+        Properties =
+        [
+            new StepProperty { Name = "strategy", Type = StepPropertyType.String, Default = "rolling", Description = "Deployment strategy (rolling, blue-green, canary)" },
+            new StepProperty { Name = "batchSize", Type = StepPropertyType.String, Default = "25%", Description = "Batch size for rolling deploys" },
+            new StepProperty { Name = "timeout", Type = StepPropertyType.Integer, Default = 3600, Description = "Deployment timeout in seconds" },
+            new StepProperty { Name = "healthCheck", Type = StepPropertyType.Boolean, Default = true, Description = "Wait for health checks" },
+            new StepProperty { Name = "rollbackOnFailure", Type = StepPropertyType.Boolean, Default = true, Description = "Auto-rollback on failure" }
+        ]
+    };
+
+    public StepCapabilities Capabilities => new()
+    {
+        SupportsRetry = false, // Deployments should not auto-retry
+        SupportsTimeout = true,
+        IsAsync = true,
+        RequiresAgent = true,
+        RequiredPermissions = ["deployment:create", "environment:deploy"]
+    };
+
+    public async Task<StepResult> ExecuteAsync(StepContext context, CancellationToken ct)
+    {
+        var releaseId = context.WorkflowContext.ReleaseId
+            ?? throw new InvalidOperationException("Deploy step requires a release");
+        var environmentId = context.WorkflowContext.EnvironmentId
+            ?? throw new InvalidOperationException("Deploy step requires an environment");
+
+        var strategy = context.Config.GetValueOrDefault("strategy")?.ToString() ?? "rolling";
+        var batchSize = context.Config.GetValueOrDefault("batchSize")?.ToString() ?? "25%";
+        var timeoutSeconds = context.Config.GetValueOrDefault("timeout") as int? ?? 3600;
+        var healthCheck = context.Config.GetValueOrDefault("healthCheck") as bool? ?? true;
+        var rollbackOnFailure = context.Config.GetValueOrDefault("rollbackOnFailure") as bool? ?? true;
+
+        // Create callback for deployment completion
+        var callback = await _callbackHandler.CreateCallbackAsync(
+            context.RunId,
+            context.StepId,
+            TimeSpan.FromSeconds(timeoutSeconds),
+            ct);
+
+        // Start deployment
+        var deployment = await _deploymentService.CreateAsync(new CreateDeploymentRequest
+        {
+            ReleaseId = releaseId,
+            EnvironmentId = environmentId,
+            Strategy = Enum.Parse<DeploymentStrategy>(strategy, ignoreCase: true),
+            BatchSize = batchSize,
+            WaitForHealthCheck = healthCheck,
+            RollbackOnFailure = rollbackOnFailure,
+            WorkflowRunId = context.RunId,
+            CallbackToken = callback.Token
+        }, ct);
+
+        _logger.LogInformation(
+            "Started deployment {DeploymentId} for release {ReleaseId} to environment {EnvironmentId}",
+            deployment.Id,
+            releaseId,
+            environmentId);
+
+        return StepResult.WaitingForCallback(callback.Token, callback.ExpiresAt);
+    }
+
+    public Task<ValidationResult> ValidateConfigAsync(
+        IReadOnlyDictionary<string, object?> config,
+        CancellationToken ct) =>
+        Task.FromResult(ConfigSchema.Validate(config));
+}
+```
+
+### RollbackStepProvider
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Workflow.Steps.BuiltIn;
+
+public sealed class RollbackStepProvider : IStepProvider
+{
+    private readonly IDeploymentService _deploymentService;
+    private readonly IReleaseCatalog _releaseCatalog;
+    private readonly IStepCallbackHandler _callbackHandler;
+    private readonly ILogger<RollbackStepProvider> _logger;
+
+    public string Type => "rollback";
+    public string DisplayName => "Rollback";
+    public string Description => "Rollback to previous release version";
+
+    public StepSchema ConfigSchema => new()
+    {
+        Properties =
+        [
+            new StepProperty { Name = "targetRelease", Type = StepPropertyType.String, Description = "Specific release ID to rollback to (optional)" },
+            new StepProperty { Name = "skipCount", Type = StepPropertyType.Integer, Default = 1, Description = "Number of releases to skip back" },
+            new StepProperty { Name = "timeout", Type = StepPropertyType.Integer, Default = 1800, Description = "Rollback timeout in seconds" }
+        ]
+    };
+
+    public StepCapabilities Capabilities => new()
+    {
+        SupportsRetry = false,
+        SupportsTimeout = true,
+        IsAsync = true,
+        RequiredPermissions = ["deployment:rollback"]
+    };
+
+    public async Task<StepResult> ExecuteAsync(StepContext context, CancellationToken ct)
+    {
+        var environmentId = context.WorkflowContext.EnvironmentId
+            ?? throw new InvalidOperationException("Rollback step requires an environment");
+
+        var targetReleaseId = context.Config.GetValueOrDefault("targetRelease")?.ToString();
+        var skipCount = context.Config.GetValueOrDefault("skipCount") as int? ?? 1;
+        var timeoutSeconds = context.Config.GetValueOrDefault("timeout") as int? ?? 
1800;
+
+        Guid rollbackToReleaseId;
+
+        if (!string.IsNullOrEmpty(targetReleaseId))
+        {
+            rollbackToReleaseId = Guid.Parse(targetReleaseId);
+        }
+        else
+        {
+            // Get deployment history and find previous release
+            var history = await _releaseCatalog.GetEnvironmentHistoryAsync(
+                environmentId, skipCount + 1, ct);
+
+            if (history.Count <= skipCount)
+            {
+                return StepResult.Failed("No previous release to rollback to");
+            }
+
+            rollbackToReleaseId = history[skipCount].ReleaseId;
+        }
+
+        var callback = await _callbackHandler.CreateCallbackAsync(
+            context.RunId,
+            context.StepId,
+            TimeSpan.FromSeconds(timeoutSeconds),
+            ct);
+
+        var deployment = await _deploymentService.RollbackAsync(new RollbackRequest
+        {
+            EnvironmentId = environmentId,
+            TargetReleaseId = rollbackToReleaseId,
+            WorkflowRunId = context.RunId,
+            CallbackToken = callback.Token
+        }, ct);
+
+        _logger.LogInformation(
+            "Started rollback to release {ReleaseId} in environment {EnvironmentId}",
+            rollbackToReleaseId,
+            environmentId);
+
+        return StepResult.WaitingForCallback(callback.Token, callback.ExpiresAt);
+    }
+
+    public Task<ValidationResult> ValidateConfigAsync(
+        IReadOnlyDictionary<string, object?> config,
+        CancellationToken ct) =>
+        Task.FromResult(ValidationResult.Success());
+}
+```
+
+---
+
+## Acceptance Criteria
+
+- [ ] Script step executes commands via agent
+- [ ] Approval step creates approval request
+- [ ] Approval completes via callback
+- [ ] Notify step sends to configured channels
+- [ ] Wait step delays execution
+- [ ] Security gate checks vulnerability thresholds
+- [ ] Deploy step triggers deployment
+- [ ] Rollback step reverts to previous release
+- [ ] All steps validate configuration
+- [ ] Unit test coverage >=85%
+
+---
+
+## Test Plan
+
+### Unit Tests
+
+| Test | Description |
+|------|-------------|
+| `ScriptStep_ExecutesViaAgent` | Script execution |
+| `ApprovalStep_CreatesRequest` | Approval creation |
+| `ApprovalStep_CallbackCompletes` | Callback handling |
+| `NotifyStep_SendsNotification` | 
Notification sending | +| `WaitStep_DelaysExecution` | Wait behavior | +| `SecurityGate_PassesThreshold` | Pass case | +| `SecurityGate_FailsThreshold` | Fail case | +| `DeployStep_TriggersDeployment` | Deployment trigger | +| `RollbackStep_FindsPreviousRelease` | Rollback logic | + +### Integration Tests + +| Test | Description | +|------|-------------| +| `ScriptStep_E2E` | Full script execution | +| `ApprovalWorkflow_E2E` | Approval flow | +| `DeploymentWorkflow_E2E` | Deploy flow | + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 105_004 Step Executor | Internal | TODO | +| 103_003 Agent Manager | Internal | TODO | +| 107_* Deployment Execution | Internal | TODO | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| ScriptStepProvider | TODO | | +| ApprovalStepProvider | TODO | | +| NotifyStepProvider | TODO | | +| WaitStepProvider | TODO | | +| SecurityGateStepProvider | TODO | | +| DeployStepProvider | TODO | | +| RollbackStepProvider | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | +| 10-Jan-2026 | Added Step Type Catalog (16 types) with v1/post-v1/plugin deferral strategy | diff --git a/docs/implplan/SPRINT_20260110_106_000_INDEX_promotion_gates.md b/docs/implplan/SPRINT_20260110_106_000_INDEX_promotion_gates.md new file mode 100644 index 000000000..b97eae6f4 --- /dev/null +++ b/docs/implplan/SPRINT_20260110_106_000_INDEX_promotion_gates.md @@ -0,0 +1,254 @@ +# SPRINT INDEX: Phase 6 - Promotion & Gates + +> **Epic:** Release Orchestrator +> **Phase:** 6 - Promotion & Gates +> **Batch:** 106 +> **Status:** TODO +> **Parent:** [100_000_INDEX](SPRINT_20260110_100_000_INDEX_release_orchestrator.md) + +--- + +## Overview + +Phase 6 implements the Promotion system - managing release promotions between environments with approval workflows and policy gates. 
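+
+The phase's core invariant, that a promotion proceeds only when every blocking gate passes and the approval quorum is met, can be sketched as a pure combination rule. The tuple shape and string outcomes below are illustrative stand-ins for the `GateResult` and `DecisionRecord` models this phase delivers, not their final form:
+
+```csharp
+// Illustrative sketch of the gate/approval combination rule applied by the
+// Decision Engine (106_005). Types and names here are assumptions.
+public static class DecisionRulesSketch
+{
+    public static string Decide(
+        IReadOnlyList<(bool Passed, bool Blocking)> gateResults,
+        int approvalsReceived,
+        int approvalsRequired)
+    {
+        // A single failing blocking gate vetoes the promotion outright.
+        if (gateResults.Any(g => g.Blocking && !g.Passed))
+            return "deny";
+
+        // Gates passed but quorum not yet met: remain in awaiting_approval.
+        if (approvalsReceived < approvalsRequired)
+            return "pending";
+
+        return "allow";
+    }
+}
+```
+
+Non-blocking gate failures are deliberately excluded from the veto so advisory gates can surface warnings in the decision record without halting the promotion.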
+ +### Objectives + +- Promotion manager for promotion requests +- Approval gateway with multi-approver support +- Gate registry for built-in and plugin gates +- Security gate with vulnerability thresholds +- Decision engine combining gates and approvals + +--- + +## Sprint Structure + +| Sprint ID | Title | Module | Status | Dependencies | +|-----------|-------|--------|--------|--------------| +| 106_001 | Promotion Manager | PROMOT | TODO | 104_003, 103_001 | +| 106_002 | Approval Gateway | PROMOT | TODO | 106_001 | +| 106_003 | Gate Registry | PROMOT | TODO | 106_001 | +| 106_004 | Security Gate | PROMOT | TODO | 106_003 | +| 106_005 | Decision Engine | PROMOT | TODO | 106_002, 106_003 | + +--- + +## Architecture + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ PROMOTION & GATES │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ PROMOTION MANAGER (106_001) │ │ +│ │ │ │ +│ │ Promotion Request ──────────────────────────────────────┐ │ │ +│ │ │ release_id: uuid │ │ │ +│ │ │ source_environment: staging │ │ │ +│ │ │ target_environment: production │ │ │ +│ │ │ requested_by: user-123 │ │ │ +│ │ │ reason: "Release v2.3.1 for Q1 launch" │ │ │ +│ │ └───────────────────────────────────────────────────────┘ │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ APPROVAL GATEWAY (106_002) │ │ +│ │ │ │ +│ │ ┌─────────────────────────────────────────────────────────────────┐ │ │ +│ │ │ Approval Flow │ │ │ +│ │ │ │ │ │ +│ │ │ pending ──► awaiting_approval ──┬──► approved ──► deploying │ │ │ +│ │ │ │ │ │ │ +│ │ │ └──► rejected │ │ │ +│ │ │ │ │ │ +│ │ │ Separation of Duties: requester ≠ approver │ │ │ +│ │ │ Multi-approval: 2 of 3 approvers required │ │ │ +│ │ └─────────────────────────────────────────────────────────────────┘ │ │ +│ 
└───────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ GATE REGISTRY (106_003) │ │ +│ │ │ │ +│ │ Built-in Gates: Plugin Gates: │ │ +│ │ ├── security-gate ├── compliance-gate │ │ +│ │ ├── approval-gate ├── change-window-gate │ │ +│ │ ├── freeze-window-gate └── custom-policy-gate │ │ +│ │ └── manual-gate │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ SECURITY GATE (106_004) │ │ +│ │ │ │ +│ │ Config: Result: │ │ +│ │ ├── max_critical: 0 ├── passed: false │ │ +│ │ ├── max_high: 5 ├── blocking: true │ │ +│ │ ├── max_medium: -1 ├── message: "3 critical vulns found" │ │ +│ │ └── require_sbom: true └── details: { critical: 3, high: 2 } │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ DECISION ENGINE (106_005) │ │ +│ │ │ │ +│ │ Input: Promotion + Gates + Approvals │ │ +│ │ │ │ +│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │ +│ │ │ Security Gate│ │Approval Gate │ │ Freeze Gate │ │ │ +│ │ │ ✓ PASS │ │ ✓ PASS │ │ ✓ PASS │ │ │ +│ │ └──────────────┘ └──────────────┘ └──────────────┘ │ │ +│ │ │ │ │ +│ │ ▼ │ │ +│ │ Decision: ALLOW ───────────────────────────────────────────────────►│ │ +│ │ │ │ +│ │ Decision Record: { gates: [...], approvals: [...], decision: allow } │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +--- + +## Deliverables Summary + +### 106_001: Promotion Manager + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `IPromotionManager` | Interface | Promotion operations | +| `PromotionManager` | Class | Implementation | +| `Promotion` | Model | 
Promotion entity |
+| `PromotionValidator` | Class | Business rules |
+
+### 106_002: Approval Gateway
+
+| Deliverable | Type | Description |
+|-------------|------|-------------|
+| `IApprovalGateway` | Interface | Approval operations |
+| `ApprovalGateway` | Class | Implementation |
+| `Approval` | Model | Approval record |
+| `SeparationOfDuties` | Class | SoD enforcement |
+
+### 106_003: Gate Registry
+
+| Deliverable | Type | Description |
+|-------------|------|-------------|
+| `IGateRegistry` | Interface | Gate lookup |
+| `GateRegistry` | Class | Implementation |
+| `GateDefinition` | Model | Gate metadata |
+| `GateEvaluator` | Class | Execute gates |
+
+### 106_004: Security Gate
+
+| Deliverable | Type | Description |
+|-------------|------|-------------|
+| `SecurityGate` | Gate | Vulnerability threshold gate |
+| `SecurityGateConfig` | Config | Threshold configuration |
+| `VulnerabilityCounter` | Class | Count by severity |
+
+### 106_005: Decision Engine
+
+| Deliverable | Type | Description |
+|-------------|------|-------------|
+| `IDecisionEngine` | Interface | Decision evaluation |
+| `DecisionEngine` | Class | Implementation |
+| `DecisionRecord` | Model | Decision with evidence |
+| `DecisionRules` | Class | Gate combination rules |
+
+---
+
+## Key Interfaces
+
+```csharp
+public interface IPromotionManager
+{
+    Task<Promotion> RequestAsync(PromotionRequest request, CancellationToken ct);
+    Task ApproveAsync(Guid promotionId, ApprovalRequest request, CancellationToken ct);
+    Task RejectAsync(Guid promotionId, RejectionRequest request, CancellationToken ct);
+    Task CancelAsync(Guid promotionId, CancellationToken ct);
+    Task<Promotion?> GetAsync(Guid promotionId, CancellationToken ct);
+    Task<IReadOnlyList<Promotion>> ListPendingAsync(Guid? 
environmentId, CancellationToken ct);
+}
+
+public interface IDecisionEngine
+{
+    Task<DecisionRecord> EvaluateAsync(Guid promotionId, CancellationToken ct);
+    Task<GateResult> EvaluateGateAsync(Guid promotionId, string gateName, CancellationToken ct);
+}
+
+public interface IGateProvider
+{
+    string GateType { get; }
+    Task<GateResult> EvaluateAsync(GateContext context, CancellationToken ct);
+}
+```
+
+---
+
+## Promotion State Machine
+
+```
+┌─────────┐
+│ pending │
+└────┬────┘
+     │ submit
+     ▼
+┌───────────────────┐
+│ awaiting_approval │◄──────────────┐
+└─────────┬─────────┘               │
+          │                        │
+    ┌─────┴─────┐           more approvers
+    │           │              needed
+    ▼           ▼                  │
+┌────────┐ ┌────────┐              │
+│approved│ │rejected│              │
+└───┬────┘ └────────┘              │
+    │                              │
+    │ gates pass                   │
+    ▼                              │
+┌──────────┐                       │
+│ deploying│───────────────────────┘
+└────┬─────┘       rollback
+     │
+     ├──────────────┐
+     ▼              ▼
+┌────────┐     ┌────────┐
+│deployed│     │ failed │
+└────────┘     └───┬────┘
+                   │
+                   ▼
+             ┌───────────┐
+             │rolled_back│
+             └───────────┘
+```
+
+---
+
+## Dependencies
+
+| Module | Purpose |
+|--------|---------|
+| 104_003 Release Manager | Release to promote |
+| 103_001 Environment CRUD | Target environment |
+| Scanner | Security data |
+
+---
+
+## Acceptance Criteria
+
+- [ ] Promotion request created
+- [ ] Approval flow works
+- [ ] Separation of duties enforced
+- [ ] Multiple approvers supported
+- [ ] Security gate blocks on vulns
+- [ ] Freeze window blocks promotions
+- [ ] Decision record captured
+- [ ] Gate results aggregated
+- [ ] Unit test coverage ≥80%
+
+---
+
+## Execution Log
+
+| Date | Entry |
+|------|-------|
+| 10-Jan-2026 | Phase 6 index created |
diff --git a/docs/implplan/SPRINT_20260110_106_001_PROMOT_promotion_manager.md b/docs/implplan/SPRINT_20260110_106_001_PROMOT_promotion_manager.md
new file mode 100644
index 000000000..365c647ca
--- /dev/null
+++ b/docs/implplan/SPRINT_20260110_106_001_PROMOT_promotion_manager.md
@@ -0,0 +1,600 @@
+# SPRINT: Promotion Manager
+
+> **Sprint ID:** 106_001
+> **Module:** PROMOT
+> **Phase:** 6 - Promotion & Gates
+> **Status:** TODO
+> **Parent:** [106_000_INDEX](SPRINT_20260110_106_000_INDEX_promotion_gates.md)
+
+---
+
+## Overview
+
+Implement the Promotion Manager for handling release promotion requests between environments.
+
+### Objectives
+
+- Create promotion requests with release and environment
+- Validate promotion prerequisites
+- Track promotion lifecycle states
+- Support promotion cancellation
+
+### Working Directory
+
+```
+src/ReleaseOrchestrator/
+├── __Libraries/
+│   └── StellaOps.ReleaseOrchestrator.Promotion/
+│       ├── Manager/
+│       │   ├── IPromotionManager.cs
+│       │   ├── PromotionManager.cs
+│       │   ├── PromotionValidator.cs
+│       │   └── PromotionStateMachine.cs
+│       ├── Store/
+│       │   ├── IPromotionStore.cs
+│       │   └── PromotionStore.cs
+│       └── Models/
+│           ├── Promotion.cs
+│           ├── PromotionStatus.cs
+│           └── PromotionRequest.cs
+└── __Tests/
+    └── StellaOps.ReleaseOrchestrator.Promotion.Tests/
+        └── Manager/
+```
+
+---
+
+## Architecture Reference
+
+- [Promotion Manager](../modules/release-orchestrator/modules/promotion-manager.md)
+- [Data Model Schema](../modules/release-orchestrator/data-model/schema.md)
+
+---
+
+## Deliverables
+
+### IPromotionManager Interface
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Promotion.Manager;
+
+public interface IPromotionManager
+{
+    Task<Promotion> RequestAsync(CreatePromotionRequest request, CancellationToken ct = default);
+    Task<Promotion> SubmitAsync(Guid promotionId, CancellationToken ct = default);
+    Task<Promotion?> GetAsync(Guid promotionId, CancellationToken ct = default);
+    Task<IReadOnlyList<Promotion>> ListAsync(PromotionFilter? filter = null, CancellationToken ct = default);
+    Task<IReadOnlyList<Promotion>> ListPendingApprovalsAsync(Guid? environmentId = null, CancellationToken ct = default);
+    Task<IReadOnlyList<Promotion>> ListByReleaseAsync(Guid releaseId, CancellationToken ct = default);
+    Task<Promotion> CancelAsync(Guid promotionId, string?
reason = null, CancellationToken ct = default);
+    Task<Promotion> UpdateStatusAsync(Guid promotionId, PromotionStatus status, CancellationToken ct = default);
+}
+
+public sealed record CreatePromotionRequest(
+    Guid ReleaseId,
+    Guid SourceEnvironmentId,
+    Guid TargetEnvironmentId,
+    string? Reason = null,
+    bool AutoSubmit = false
+);
+
+public sealed record PromotionFilter(
+    Guid? ReleaseId = null,
+    Guid? SourceEnvironmentId = null,
+    Guid? TargetEnvironmentId = null,
+    PromotionStatus? Status = null,
+    Guid? RequestedBy = null,
+    DateTimeOffset? RequestedAfter = null,
+    DateTimeOffset? RequestedBefore = null
+);
+```
+
+### Promotion Model
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Promotion.Models;
+
+public sealed record Promotion
+{
+    public required Guid Id { get; init; }
+    public required Guid TenantId { get; init; }
+    public required Guid ReleaseId { get; init; }
+    public required string ReleaseName { get; init; }
+    public required Guid SourceEnvironmentId { get; init; }
+    public required string SourceEnvironmentName { get; init; }
+    public required Guid TargetEnvironmentId { get; init; }
+    public required string TargetEnvironmentName { get; init; }
+    public required PromotionStatus Status { get; init; }
+    public string? Reason { get; init; }
+    public string? RejectionReason { get; init; }
+    public string? CancellationReason { get; init; }
+    public string? FailureReason { get; init; }
+    public ImmutableArray<ApprovalRecord> Approvals { get; init; } = [];
+    public ImmutableArray<GateResult> GateResults { get; init; } = [];
+    public Guid? DeploymentId { get; init; }
+    public DateTimeOffset RequestedAt { get; init; }
+    public DateTimeOffset? SubmittedAt { get; init; }
+    public DateTimeOffset? ApprovedAt { get; init; }
+    public DateTimeOffset? DeployedAt { get; init; }
+    public DateTimeOffset?
CompletedAt { get; init; } + public Guid RequestedBy { get; init; } + public string RequestedByName { get; init; } = ""; + + public bool IsActive => Status is + PromotionStatus.Pending or + PromotionStatus.AwaitingApproval or + PromotionStatus.Approved or + PromotionStatus.Deploying; + + public bool IsTerminal => Status is + PromotionStatus.Deployed or + PromotionStatus.Rejected or + PromotionStatus.Cancelled or + PromotionStatus.Failed or + PromotionStatus.RolledBack; +} + +public enum PromotionStatus +{ + Pending, // Created, not yet submitted + AwaitingApproval, // Submitted, waiting for approvals + Approved, // Approvals complete, ready to deploy + Deploying, // Deployment in progress + Deployed, // Successfully deployed + Rejected, // Approval rejected + Cancelled, // Cancelled by requester + Failed, // Deployment failed + RolledBack // Rolled back after failure +} + +public sealed record ApprovalRecord( + Guid UserId, + string UserName, + ApprovalDecision Decision, + string? Comment, + DateTimeOffset DecidedAt +); + +public enum ApprovalDecision +{ + Approved, + Rejected +} + +public sealed record GateResult( + string GateName, + string GateType, + bool Passed, + bool Blocking, + string? 
Message,
+    ImmutableDictionary<string, object?> Details,
+    DateTimeOffset EvaluatedAt
+);
+```
+
+### PromotionManager Implementation
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Promotion.Manager;
+
+public sealed class PromotionManager : IPromotionManager
+{
+    private readonly IPromotionStore _store;
+    private readonly IPromotionValidator _validator;
+    private readonly PromotionStateMachine _stateMachine;
+    private readonly IReleaseManager _releaseManager;
+    private readonly IEnvironmentService _environmentService;
+    private readonly IEventPublisher _eventPublisher;
+    private readonly ITenantContext _tenantContext;   // ambient tenant context (referenced below; assumed contract)
+    private readonly IUserContext _userContext;       // ambient user context (referenced below; assumed contract)
+    private readonly TimeProvider _timeProvider;
+    private readonly IGuidGenerator _guidGenerator;
+    private readonly ILogger<PromotionManager> _logger;
+
+    public async Task<Promotion> RequestAsync(
+        CreatePromotionRequest request,
+        CancellationToken ct = default)
+    {
+        // Validate request
+        var validation = await _validator.ValidateRequestAsync(request, ct);
+        if (!validation.IsValid)
+        {
+            throw new PromotionValidationException(validation.Errors);
+        }
+
+        // Get release and environments
+        var release = await _releaseManager.GetAsync(request.ReleaseId, ct)
+            ?? throw new ReleaseNotFoundException(request.ReleaseId);
+
+        var sourceEnv = await _environmentService.GetAsync(request.SourceEnvironmentId, ct)
+            ?? throw new EnvironmentNotFoundException(request.SourceEnvironmentId);
+
+        var targetEnv = await _environmentService.GetAsync(request.TargetEnvironmentId, ct)
+            ??
throw new EnvironmentNotFoundException(request.TargetEnvironmentId);
+
+        var now = _timeProvider.GetUtcNow();
+
+        var promotion = new Promotion
+        {
+            Id = _guidGenerator.NewGuid(),
+            TenantId = _tenantContext.TenantId,
+            ReleaseId = release.Id,
+            ReleaseName = release.Name,
+            SourceEnvironmentId = sourceEnv.Id,
+            SourceEnvironmentName = sourceEnv.Name,
+            TargetEnvironmentId = targetEnv.Id,
+            TargetEnvironmentName = targetEnv.Name,
+            Status = PromotionStatus.Pending,
+            Reason = request.Reason,
+            RequestedAt = now,
+            RequestedBy = _userContext.UserId,
+            RequestedByName = _userContext.UserName
+        };
+
+        await _store.SaveAsync(promotion, ct);
+
+        await _eventPublisher.PublishAsync(new PromotionRequested(
+            promotion.Id,
+            promotion.TenantId,
+            promotion.ReleaseName,
+            promotion.SourceEnvironmentName,
+            promotion.TargetEnvironmentName,
+            now,
+            _userContext.UserId
+        ), ct);
+
+        _logger.LogInformation(
+            "Created promotion {PromotionId} for release {Release} to {Environment}",
+            promotion.Id,
+            release.Name,
+            targetEnv.Name);
+
+        // Auto-submit if requested
+        if (request.AutoSubmit)
+        {
+            promotion = await SubmitAsync(promotion.Id, ct);
+        }
+
+        return promotion;
+    }
+
+    public async Task<Promotion> SubmitAsync(Guid promotionId, CancellationToken ct = default)
+    {
+        var promotion = await _store.GetAsync(promotionId, ct)
+            ?? throw new PromotionNotFoundException(promotionId);
+
+        _stateMachine.ValidateTransition(promotion.Status, PromotionStatus.AwaitingApproval);
+
+        var updatedPromotion = promotion with
+        {
+            Status = PromotionStatus.AwaitingApproval,
+            SubmittedAt = _timeProvider.GetUtcNow()
+        };
+
+        await _store.SaveAsync(updatedPromotion, ct);
+
+        await _eventPublisher.PublishAsync(new PromotionSubmitted(
+            promotionId,
+            promotion.TenantId,
+            promotion.TargetEnvironmentId,
+            _timeProvider.GetUtcNow()
+        ), ct);
+
+        return updatedPromotion;
+    }
+
+    public async Task<Promotion> CancelAsync(
+        Guid promotionId,
+        string?
reason = null,
+        CancellationToken ct = default)
+    {
+        var promotion = await _store.GetAsync(promotionId, ct)
+            ?? throw new PromotionNotFoundException(promotionId);
+
+        if (promotion.IsTerminal)
+        {
+            throw new PromotionAlreadyTerminalException(promotionId);
+        }
+
+        // Only requester or admin can cancel
+        if (promotion.RequestedBy != _userContext.UserId &&
+            !_userContext.IsInRole("admin"))
+        {
+            throw new UnauthorizedPromotionActionException(promotionId, "cancel");
+        }
+
+        var updatedPromotion = promotion with
+        {
+            Status = PromotionStatus.Cancelled,
+            CancellationReason = reason,
+            CompletedAt = _timeProvider.GetUtcNow()
+        };
+
+        await _store.SaveAsync(updatedPromotion, ct);
+
+        await _eventPublisher.PublishAsync(new PromotionCancelled(
+            promotionId,
+            promotion.TenantId,
+            reason,
+            _timeProvider.GetUtcNow()
+        ), ct);
+
+        return updatedPromotion;
+    }
+
+    public async Task<IReadOnlyList<Promotion>> ListPendingApprovalsAsync(
+        Guid? environmentId = null,
+        CancellationToken ct = default)
+    {
+        var filter = new PromotionFilter(
+            Status: PromotionStatus.AwaitingApproval,
+            TargetEnvironmentId: environmentId
+        );
+
+        return await _store.ListAsync(filter, ct);
+    }
+}
+```
+
+### PromotionValidator
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Promotion.Manager;
+
+public sealed class PromotionValidator : IPromotionValidator
+{
+    private readonly IReleaseManager _releaseManager;
+    private readonly IEnvironmentService _environmentService;
+    private readonly IFreezeWindowService _freezeWindowService;
+    private readonly IPromotionStore _promotionStore;
+
+    public async Task<ValidationResult> ValidateRequestAsync(
+        CreatePromotionRequest request,
+        CancellationToken ct = default)
+    {
+        var errors = new List<string>();
+
+        // Check release exists and is finalized
+        var release = await _releaseManager.GetAsync(request.ReleaseId, ct);
+        if (release is null)
+        {
+            errors.Add($"Release {request.ReleaseId} not found");
+        }
+        else if (release.Status == ReleaseStatus.Draft)
+        {
+            errors.Add("Cannot promote a draft
release"); + } + else if (release.Status == ReleaseStatus.Deprecated) + { + errors.Add("Cannot promote a deprecated release"); + } + + // Check environments exist + var sourceEnv = await _environmentService.GetAsync(request.SourceEnvironmentId, ct); + var targetEnv = await _environmentService.GetAsync(request.TargetEnvironmentId, ct); + + if (sourceEnv is null) + { + errors.Add($"Source environment {request.SourceEnvironmentId} not found"); + } + if (targetEnv is null) + { + errors.Add($"Target environment {request.TargetEnvironmentId} not found"); + } + + // Validate environment order (target must be after source) + if (sourceEnv is not null && targetEnv is not null) + { + if (sourceEnv.OrderIndex >= targetEnv.OrderIndex) + { + errors.Add("Target environment must be later in promotion order than source"); + } + } + + // Check for freeze window on target + if (targetEnv is not null) + { + var isFrozen = await _freezeWindowService.IsEnvironmentFrozenAsync(targetEnv.Id, ct); + if (isFrozen) + { + errors.Add($"Target environment {targetEnv.Name} is currently frozen"); + } + } + + // Check for existing active promotion + var existingPromotions = await _promotionStore.ListAsync(new PromotionFilter( + ReleaseId: request.ReleaseId, + TargetEnvironmentId: request.TargetEnvironmentId + ), ct); + + if (existingPromotions.Any(p => p.IsActive)) + { + errors.Add("An active promotion already exists for this release and environment"); + } + + return errors.Count == 0 + ? 
ValidationResult.Success()
+            : ValidationResult.Failure(errors);
+    }
+}
+```
+
+### PromotionStateMachine
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Promotion.Manager;
+
+public sealed class PromotionStateMachine
+{
+    private static readonly ImmutableDictionary<PromotionStatus, ImmutableArray<PromotionStatus>> ValidTransitions =
+        new Dictionary<PromotionStatus, ImmutableArray<PromotionStatus>>
+        {
+            [PromotionStatus.Pending] = [PromotionStatus.AwaitingApproval, PromotionStatus.Cancelled],
+            [PromotionStatus.AwaitingApproval] = [PromotionStatus.Approved, PromotionStatus.Rejected, PromotionStatus.Cancelled],
+            [PromotionStatus.Approved] = [PromotionStatus.Deploying, PromotionStatus.Cancelled],
+            [PromotionStatus.Deploying] = [PromotionStatus.Deployed, PromotionStatus.Failed],
+            [PromotionStatus.Failed] = [PromotionStatus.RolledBack, PromotionStatus.AwaitingApproval],
+            [PromotionStatus.Deployed] = [],
+            [PromotionStatus.Rejected] = [],
+            [PromotionStatus.Cancelled] = [],
+            [PromotionStatus.RolledBack] = []
+        }.ToImmutableDictionary();
+
+    public bool CanTransition(PromotionStatus from, PromotionStatus to)
+    {
+        return ValidTransitions.TryGetValue(from, out var targets) &&
+               targets.Contains(to);
+    }
+
+    public void ValidateTransition(PromotionStatus from, PromotionStatus to)
+    {
+        if (!CanTransition(from, to))
+        {
+            throw new InvalidPromotionTransitionException(from, to);
+        }
+    }
+
+    public IReadOnlyList<PromotionStatus> GetValidTransitions(PromotionStatus current)
+    {
+        return ValidTransitions.TryGetValue(current, out var targets)
+            ?
targets + : []; + } +} +``` + +### Domain Events + +```csharp +namespace StellaOps.ReleaseOrchestrator.Promotion.Events; + +public sealed record PromotionRequested( + Guid PromotionId, + Guid TenantId, + string ReleaseName, + string SourceEnvironment, + string TargetEnvironment, + DateTimeOffset RequestedAt, + Guid RequestedBy +) : IDomainEvent; + +public sealed record PromotionSubmitted( + Guid PromotionId, + Guid TenantId, + Guid TargetEnvironmentId, + DateTimeOffset SubmittedAt +) : IDomainEvent; + +public sealed record PromotionApproved( + Guid PromotionId, + Guid TenantId, + int ApprovalCount, + DateTimeOffset ApprovedAt +) : IDomainEvent; + +public sealed record PromotionRejected( + Guid PromotionId, + Guid TenantId, + Guid RejectedBy, + string Reason, + DateTimeOffset RejectedAt +) : IDomainEvent; + +public sealed record PromotionCancelled( + Guid PromotionId, + Guid TenantId, + string? Reason, + DateTimeOffset CancelledAt +) : IDomainEvent; + +public sealed record PromotionDeployed( + Guid PromotionId, + Guid TenantId, + Guid DeploymentId, + DateTimeOffset DeployedAt +) : IDomainEvent; +``` + +### Documentation Deliverables + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `docs/modules/release-orchestrator/api/promotions.md` (partial) | Markdown | API endpoint documentation for promotion requests (create, list, get, cancel) | + +--- + +## Acceptance Criteria + +### Code + +- [ ] Create promotion request +- [ ] Validate release is finalized +- [ ] Validate environment order +- [ ] Check for freeze window +- [ ] Prevent duplicate active promotions +- [ ] Submit promotion for approval +- [ ] Cancel promotion +- [ ] State machine validates transitions +- [ ] List pending approvals +- [ ] Unit test coverage >=85% + +### Documentation +- [ ] Promotion API endpoints documented +- [ ] Create promotion request documented with full schema +- [ ] List/Get/Cancel promotion endpoints documented +- [ ] Promotion state machine referenced + 
+--- + +## Test Plan + +### Unit Tests + +| Test | Description | +|------|-------------| +| `RequestPromotion_ValidRequest_Succeeds` | Creation works | +| `RequestPromotion_DraftRelease_Fails` | Draft release rejected | +| `RequestPromotion_FrozenEnvironment_Fails` | Freeze check works | +| `RequestPromotion_DuplicateActive_Fails` | Duplicate check works | +| `SubmitPromotion_ChangesStatus` | Submission works | +| `CancelPromotion_ByRequester_Succeeds` | Cancellation works | +| `StateMachine_ValidTransition_Succeeds` | State transitions | +| `StateMachine_InvalidTransition_Fails` | Invalid transitions blocked | + +### Integration Tests + +| Test | Description | +|------|-------------| +| `PromotionLifecycle_E2E` | Full promotion flow | + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 104_003 Release Manager | Internal | TODO | +| 103_001 Environment CRUD | Internal | TODO | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| IPromotionManager | TODO | | +| PromotionManager | TODO | | +| PromotionValidator | TODO | | +| PromotionStateMachine | TODO | | +| Promotion model | TODO | | +| IPromotionStore | TODO | | +| PromotionStore | TODO | | +| Domain events | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | +| 11-Jan-2026 | Added documentation deliverable: api/promotions.md (partial - promotions) | diff --git a/docs/implplan/SPRINT_20260110_106_002_PROMOT_approval_gateway.md b/docs/implplan/SPRINT_20260110_106_002_PROMOT_approval_gateway.md new file mode 100644 index 000000000..c61f7f318 --- /dev/null +++ b/docs/implplan/SPRINT_20260110_106_002_PROMOT_approval_gateway.md @@ -0,0 +1,648 @@ +# SPRINT: Approval Gateway + +> **Sprint ID:** 106_002 +> **Module:** PROMOT +> **Phase:** 6 - Promotion & Gates +> **Status:** TODO +> **Parent:** 
[106_000_INDEX](SPRINT_20260110_106_000_INDEX_promotion_gates.md)
+
+---
+
+## Overview
+
+Implement the Approval Gateway for managing approval workflows with multi-approver and separation of duties support.
+
+### Objectives
+
+- Process approval/rejection decisions
+- Enforce separation of duties (requester != approver)
+- Support multi-approver requirements
+- Track approval history
+
+### Working Directory
+
+```
+src/ReleaseOrchestrator/
+├── __Libraries/
+│   └── StellaOps.ReleaseOrchestrator.Promotion/
+│       ├── Approval/
+│       │   ├── IApprovalGateway.cs
+│       │   ├── ApprovalGateway.cs
+│       │   ├── SeparationOfDutiesEnforcer.cs
+│       │   ├── ApprovalEligibilityChecker.cs
+│       │   └── ApprovalNotifier.cs
+│       ├── Store/
+│       │   ├── IApprovalStore.cs
+│       │   └── ApprovalStore.cs
+│       └── Models/
+│           ├── Approval.cs
+│           └── ApprovalConfig.cs
+└── __Tests/
+    └── StellaOps.ReleaseOrchestrator.Promotion.Tests/
+        └── Approval/
+```
+
+---
+
+## Architecture Reference
+
+- [Promotion Manager](../modules/release-orchestrator/modules/promotion-manager.md)
+
+---
+
+## Deliverables
+
+### IApprovalGateway Interface
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Promotion.Approval;
+
+public interface IApprovalGateway
+{
+    Task<ApprovalResult> ApproveAsync(Guid promotionId, ApprovalRequest request, CancellationToken ct = default);
+    Task<ApprovalResult> RejectAsync(Guid promotionId, RejectionRequest request, CancellationToken ct = default);
+    Task<ApprovalStatus> GetStatusAsync(Guid promotionId, CancellationToken ct = default);
+    Task<IReadOnlyList<Approval>> GetHistoryAsync(Guid promotionId, CancellationToken ct = default);
+    Task<IReadOnlyList<EligibleApprover>> GetEligibleApproversAsync(Guid promotionId, CancellationToken ct = default);
+    Task<bool> CanUserApproveAsync(Guid promotionId, Guid userId, CancellationToken ct = default);
+}
+
+public sealed record ApprovalRequest(
+    string? Comment = null
+);
+
+public sealed record RejectionRequest(
+    string Reason
+);
+
+public sealed record ApprovalResult(
+    bool Success,
+    ApprovalStatus Status,
+    string?
Message = null
+);
+
+public sealed record ApprovalStatus(
+    int RequiredApprovals,
+    int CurrentApprovals,
+    bool IsApproved,
+    bool IsRejected,
+    IReadOnlyList<ApprovalRecord> Approvals
+);
+
+public sealed record EligibleApprover(
+    Guid UserId,
+    string UserName,
+    string? Email,
+    bool HasAlreadyDecided,
+    ApprovalDecision? Decision
+);
+```
+
+### Approval Model
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Promotion.Models;
+
+public sealed record Approval
+{
+    public required Guid Id { get; init; }
+    public required Guid PromotionId { get; init; }
+    public required Guid UserId { get; init; }
+    public required string UserName { get; init; }
+    public required ApprovalDecision Decision { get; init; }
+    public string? Comment { get; init; }
+    public required DateTimeOffset DecidedAt { get; init; }
+}
+
+public sealed record ApprovalConfig
+{
+    public required int RequiredApprovals { get; init; }
+    public required bool RequireSeparationOfDuties { get; init; }
+    public ImmutableArray<Guid> ApproverUserIds { get; init; } = [];
+    public ImmutableArray<string> ApproverGroupNames { get; init; } = [];
+    public TimeSpan?
Timeout { get; init; }
+    public bool AutoApproveOnTimeout { get; init; } = false;
+}
+```
+
+### ApprovalGateway Implementation
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Promotion.Approval;
+
+public sealed class ApprovalGateway : IApprovalGateway
+{
+    private readonly IPromotionStore _promotionStore;
+    private readonly IApprovalStore _approvalStore;
+    private readonly IEnvironmentService _environmentService;
+    private readonly SeparationOfDutiesEnforcer _sodEnforcer;
+    private readonly ApprovalEligibilityChecker _eligibilityChecker;
+    private readonly IPromotionManager _promotionManager;
+    private readonly IEventPublisher _eventPublisher;
+    private readonly IUserContext _userContext;   // ambient user context (referenced below; assumed contract)
+    private readonly TimeProvider _timeProvider;
+    private readonly IGuidGenerator _guidGenerator;
+    private readonly ILogger<ApprovalGateway> _logger;
+
+    public async Task<ApprovalResult> ApproveAsync(
+        Guid promotionId,
+        ApprovalRequest request,
+        CancellationToken ct = default)
+    {
+        var promotion = await _promotionStore.GetAsync(promotionId, ct)
+            ?? throw new PromotionNotFoundException(promotionId);
+
+        if (promotion.Status != PromotionStatus.AwaitingApproval)
+        {
+            return new ApprovalResult(false, await GetStatusAsync(promotionId, ct),
+                "Promotion is not awaiting approval");
+        }
+
+        // Check eligibility
+        var canApprove = await CanUserApproveAsync(promotionId, _userContext.UserId, ct);
+        if (!canApprove)
+        {
+            return new ApprovalResult(false, await GetStatusAsync(promotionId, ct),
+                "User is not eligible to approve this promotion");
+        }
+
+        // Record approval
+        var approval = new Approval
+        {
+            Id = _guidGenerator.NewGuid(),
+            PromotionId = promotionId,
+            UserId = _userContext.UserId,
+            UserName = _userContext.UserName,
+            Decision = ApprovalDecision.Approved,
+            Comment = request.Comment,
+            DecidedAt = _timeProvider.GetUtcNow()
+        };
+
+        await _approvalStore.SaveAsync(approval, ct);
+
+        // Update promotion with new approval
+        var updatedApprovals = promotion.Approvals.Add(new ApprovalRecord(
+            approval.UserId,
+            approval.UserName,
approval.Decision,
+            approval.Comment,
+            approval.DecidedAt
+        ));
+
+        var updatedPromotion = promotion with { Approvals = updatedApprovals };
+
+        // Check if we have enough approvals
+        var config = await GetApprovalConfigAsync(promotion.TargetEnvironmentId, ct);
+        var approvalCount = updatedApprovals.Count(a => a.Decision == ApprovalDecision.Approved);
+
+        if (approvalCount >= config.RequiredApprovals)
+        {
+            updatedPromotion = updatedPromotion with
+            {
+                Status = PromotionStatus.Approved,
+                ApprovedAt = _timeProvider.GetUtcNow()
+            };
+
+            await _eventPublisher.PublishAsync(new PromotionApproved(
+                promotionId,
+                promotion.TenantId,
+                approvalCount,
+                _timeProvider.GetUtcNow()
+            ), ct);
+        }
+
+        await _promotionStore.SaveAsync(updatedPromotion, ct);
+
+        _logger.LogInformation(
+            "User {User} approved promotion {PromotionId} ({Current}/{Required})",
+            _userContext.UserName,
+            promotionId,
+            approvalCount,
+            config.RequiredApprovals);
+
+        return new ApprovalResult(true, await GetStatusAsync(promotionId, ct));
+    }
+
+    public async Task<ApprovalResult> RejectAsync(
+        Guid promotionId,
+        RejectionRequest request,
+        CancellationToken ct = default)
+    {
+        var promotion = await _promotionStore.GetAsync(promotionId, ct)
+            ??
throw new PromotionNotFoundException(promotionId);
+
+        if (promotion.Status != PromotionStatus.AwaitingApproval)
+        {
+            return new ApprovalResult(false, await GetStatusAsync(promotionId, ct),
+                "Promotion is not awaiting approval");
+        }
+
+        var canApprove = await CanUserApproveAsync(promotionId, _userContext.UserId, ct);
+        if (!canApprove)
+        {
+            return new ApprovalResult(false, await GetStatusAsync(promotionId, ct),
+                "User is not eligible to reject this promotion");
+        }
+
+        var approval = new Approval
+        {
+            Id = _guidGenerator.NewGuid(),
+            PromotionId = promotionId,
+            UserId = _userContext.UserId,
+            UserName = _userContext.UserName,
+            Decision = ApprovalDecision.Rejected,
+            Comment = request.Reason,
+            DecidedAt = _timeProvider.GetUtcNow()
+        };
+
+        await _approvalStore.SaveAsync(approval, ct);
+
+        var updatedApprovals = promotion.Approvals.Add(new ApprovalRecord(
+            approval.UserId,
+            approval.UserName,
+            approval.Decision,
+            approval.Comment,
+            approval.DecidedAt
+        ));
+
+        var updatedPromotion = promotion with
+        {
+            Status = PromotionStatus.Rejected,
+            RejectionReason = request.Reason,
+            Approvals = updatedApprovals,
+            CompletedAt = _timeProvider.GetUtcNow()
+        };
+
+        await _promotionStore.SaveAsync(updatedPromotion, ct);
+
+        await _eventPublisher.PublishAsync(new PromotionRejected(
+            promotionId,
+            promotion.TenantId,
+            _userContext.UserId,
+            request.Reason,
+            _timeProvider.GetUtcNow()
+        ), ct);
+
+        _logger.LogInformation(
+            "User {User} rejected promotion {PromotionId}: {Reason}",
+            _userContext.UserName,
+            promotionId,
+            request.Reason);
+
+        return new ApprovalResult(true, await GetStatusAsync(promotionId, ct));
+    }
+
+    public async Task<bool> CanUserApproveAsync(
+        Guid promotionId,
+        Guid userId,
+        CancellationToken ct = default)
+    {
+        var promotion = await _promotionStore.GetAsync(promotionId, ct);
+        if (promotion is null)
+            return false;
+
+        // Check separation of duties
+        var config = await GetApprovalConfigAsync(promotion.TargetEnvironmentId, ct);
+        if
(config.RequireSeparationOfDuties && promotion.RequestedBy == userId)
+        {
+            return false;
+        }
+
+        // Check if user already approved/rejected
+        if (promotion.Approvals.Any(a => a.UserId == userId))
+        {
+            return false;
+        }
+
+        // Check if user is in approvers list
+        return await _eligibilityChecker.IsEligibleAsync(
+            userId, config.ApproverUserIds, config.ApproverGroupNames, ct);
+    }
+
+    private async Task<ApprovalConfig> GetApprovalConfigAsync(
+        Guid environmentId,
+        CancellationToken ct)
+    {
+        var environment = await _environmentService.GetAsync(environmentId, ct)
+            ?? throw new EnvironmentNotFoundException(environmentId);
+
+        return new ApprovalConfig
+        {
+            RequiredApprovals = environment.RequiredApprovals,
+            RequireSeparationOfDuties = environment.RequireSeparationOfDuties,
+            // ApproverUserIds and ApproverGroupNames from environment config
+        };
+    }
+}
+```
+
+### SeparationOfDutiesEnforcer
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Promotion.Approval;
+
+public sealed class SeparationOfDutiesEnforcer
+{
+    private readonly ILogger<SeparationOfDutiesEnforcer> _logger;
+
+    public ValidationResult Validate(
+        Promotion promotion,
+        Guid approvingUserId,
+        ApprovalConfig config)
+    {
+        if (!config.RequireSeparationOfDuties)
+        {
+            return ValidationResult.Success();
+        }
+
+        var errors = new List<string>();
+
+        // Requester cannot approve their own promotion
+        if (promotion.RequestedBy == approvingUserId)
+        {
+            errors.Add("Separation of duties: requester cannot approve their own promotion");
+        }
+
+        // Check previous approvals don't include this user
+        if (promotion.Approvals.Any(a => a.UserId == approvingUserId))
+        {
+            errors.Add("User has already provided an approval decision");
+        }
+
+        if (errors.Count > 0)
+        {
+            _logger.LogWarning(
+                "Separation of duties violation for promotion {PromotionId} by user {UserId}: {Errors}",
+                promotion.Id,
+                approvingUserId,
+                string.Join("; ", errors));
+        }
+
+        return errors.Count == 0
+            ?
ValidationResult.Success()
+            : ValidationResult.Failure(errors);
+    }
+}
+```
+
+### ApprovalEligibilityChecker
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Promotion.Approval;
+
+public sealed class ApprovalEligibilityChecker
+{
+    private readonly IUserService _userService;
+    private readonly IGroupService _groupService;
+
+    public async Task<bool> IsEligibleAsync(
+        Guid userId,
+        ImmutableArray<Guid> approverUserIds,
+        ImmutableArray<string> approverGroupNames,
+        CancellationToken ct = default)
+    {
+        // If no specific approvers configured, any authenticated user can approve
+        if (approverUserIds.Length == 0 && approverGroupNames.Length == 0)
+        {
+            return true;
+        }
+
+        // Check if user is directly in approvers list
+        if (approverUserIds.Contains(userId))
+        {
+            return true;
+        }
+
+        // Check if user is in any approver group
+        if (approverGroupNames.Length > 0)
+        {
+            var userGroups = await _groupService.GetUserGroupsAsync(userId, ct);
+            if (userGroups.Any(g => approverGroupNames.Contains(g.Name)))
+            {
+                return true;
+            }
+        }
+
+        return false;
+    }
+
+    public async Task<IReadOnlyList<EligibleApprover>> GetEligibleApproversAsync(
+        Guid promotionId,
+        ApprovalConfig config,
+        ImmutableArray<ApprovalRecord> existingApprovals,
+        CancellationToken ct = default)
+    {
+        var eligibleUsers = new List<EligibleApprover>();
+
+        // Get users from direct list
+        foreach (var userId in config.ApproverUserIds)
+        {
+            var user = await _userService.GetAsync(userId, ct);
+            if (user is not null)
+            {
+                var existingApproval = existingApprovals.FirstOrDefault(a => a.UserId == userId);
+                eligibleUsers.Add(new EligibleApprover(
+                    userId,
+                    user.Name,
+                    user.Email,
+                    existingApproval is not null,
+                    existingApproval?.Decision
+                ));
+            }
+        }
+
+        // Get users from groups
+        foreach (var groupName in config.ApproverGroupNames)
+        {
+            var groupMembers = await _groupService.GetMembersAsync(groupName, ct);
+            foreach (var member in groupMembers)
+            {
+                if (!eligibleUsers.Any(u => u.UserId == member.Id))
+                {
+                    var existingApproval = existingApprovals.FirstOrDefault(a => a.UserId ==
member.Id);
+                    eligibleUsers.Add(new EligibleApprover(
+                        member.Id,
+                        member.Name,
+                        member.Email,
+                        existingApproval is not null,
+                        existingApproval?.Decision
+                    ));
+                }
+            }
+        }
+
+        return eligibleUsers.AsReadOnly();
+    }
+}
+```
+
+### ApprovalNotifier
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Promotion.Approval;
+
+public sealed class ApprovalNotifier
+{
+    private readonly INotificationService _notificationService;
+    private readonly ApprovalEligibilityChecker _eligibilityChecker;
+    private readonly ILogger<ApprovalNotifier> _logger;
+
+    public async Task NotifyApprovalRequestedAsync(
+        Promotion promotion,
+        ApprovalConfig config,
+        CancellationToken ct = default)
+    {
+        var eligibleApprovers = await _eligibilityChecker.GetEligibleApproversAsync(
+            promotion.Id, config, promotion.Approvals, ct);
+
+        var pendingApprovers = eligibleApprovers
+            .Where(a => !a.HasAlreadyDecided)
+            .ToList();
+
+        if (pendingApprovers.Count == 0)
+        {
+            _logger.LogWarning(
+                "No eligible approvers found for promotion {PromotionId}",
+                promotion.Id);
+            return;
+        }
+
+        var notification = new NotificationRequest
+        {
+            Channel = "email",
+            Title = $"Approval Required: {promotion.ReleaseName} to {promotion.TargetEnvironmentName}",
+            Message = BuildApprovalMessage(promotion),
+            Recipients = pendingApprovers.Where(a => a.Email is not null).Select(a => a.Email!).ToList(),
+            Metadata = new Dictionary<string, string>
+            {
+                ["promotionId"] = promotion.Id.ToString(),
+                ["releaseId"] = promotion.ReleaseId.ToString(),
+                ["targetEnvironment"] = promotion.TargetEnvironmentName
+            }
+        };
+
+        await _notificationService.SendAsync(notification, ct);
+
+        _logger.LogInformation(
+            "Sent approval notification for promotion {PromotionId} to {Count} approvers",
+            promotion.Id,
+            pendingApprovers.Count);
+    }
+
+    private static string BuildApprovalMessage(Promotion promotion) =>
+        $"Release '{promotion.ReleaseName}' is requesting promotion from " +
+        $"{promotion.SourceEnvironmentName} to {promotion.TargetEnvironmentName}.\n\n" +
$"Requested by: {promotion.RequestedByName}\n" + + $"Reason: {promotion.Reason ?? "No reason provided"}\n\n" + + $"Please review and approve or reject this promotion."; +} +``` + +### Domain Events + +```csharp +namespace StellaOps.ReleaseOrchestrator.Promotion.Events; + +public sealed record ApprovalDecisionRecorded( + Guid PromotionId, + Guid TenantId, + Guid UserId, + string UserName, + ApprovalDecision Decision, + DateTimeOffset DecidedAt +) : IDomainEvent; + +public sealed record ApprovalThresholdMet( + Guid PromotionId, + Guid TenantId, + int ApprovalCount, + int RequiredApprovals, + DateTimeOffset MetAt +) : IDomainEvent; +``` + +### Documentation Deliverables + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `docs/modules/release-orchestrator/api/promotions.md` (partial) | Markdown | API endpoint documentation for approvals (approve, reject, SoD enforcement) | + +--- + +## Acceptance Criteria + +### Code + +- [ ] Approve promotion with comment +- [ ] Reject promotion with reason +- [ ] Enforce separation of duties +- [ ] Support multi-approver requirements +- [ ] Check user eligibility +- [ ] List eligible approvers +- [ ] Track approval history +- [ ] Notify approvers on request +- [ ] Unit test coverage >=85% + +### Documentation +- [ ] Approval API endpoints documented +- [ ] Approve promotion endpoint documented (POST /api/v1/promotions/{id}/approve) +- [ ] Reject promotion endpoint documented +- [ ] Separation of duties rules explained +- [ ] Approval record schema included + +--- + +## Test Plan + +### Unit Tests + +| Test | Description | +|------|-------------| +| `Approve_ValidUser_Succeeds` | Approval works | +| `Approve_Requester_FailsSoD` | SoD enforcement | +| `Approve_AlreadyDecided_Fails` | Duplicate check | +| `Approve_ThresholdMet_ApprovesPromotion` | Threshold logic | +| `Reject_SetsStatusRejected` | Rejection works | +| `CanUserApprove_InGroup_ReturnsTrue` | Group membership | +| 
`GetEligibleApprovers_ReturnsCorrectList` | Eligibility list | + +### Integration Tests + +| Test | Description | +|------|-------------| +| `ApprovalWorkflow_E2E` | Full approval flow | +| `MultiApprover_E2E` | Multi-approver scenario | + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 106_001 Promotion Manager | Internal | TODO | +| Authority | Internal | Exists | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| IApprovalGateway | TODO | | +| ApprovalGateway | TODO | | +| SeparationOfDutiesEnforcer | TODO | | +| ApprovalEligibilityChecker | TODO | | +| ApprovalNotifier | TODO | | +| Approval model | TODO | | +| IApprovalStore | TODO | | +| ApprovalStore | TODO | | +| Domain events | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | +| 11-Jan-2026 | Added documentation deliverable: api/promotions.md (partial - approvals) | diff --git a/docs/implplan/SPRINT_20260110_106_003_PROMOT_gate_registry.md b/docs/implplan/SPRINT_20260110_106_003_PROMOT_gate_registry.md new file mode 100644 index 000000000..123a130f2 --- /dev/null +++ b/docs/implplan/SPRINT_20260110_106_003_PROMOT_gate_registry.md @@ -0,0 +1,727 @@ +# SPRINT: Gate Registry + +> **Sprint ID:** 106_003 +> **Module:** PROMOT +> **Phase:** 6 - Promotion & Gates +> **Status:** TODO +> **Parent:** [106_000_INDEX](SPRINT_20260110_106_000_INDEX_promotion_gates.md) + +--- + +## Overview + +Implement the Gate Registry for managing built-in and plugin promotion gates. + +### Objectives + +- Register built-in gate types (8 types total) +- Load plugin gate types via `IGateProviderCapability` +- Execute gates in promotion context +- Track gate evaluation results + +### Gate Type Catalog + +The Release Orchestrator supports 8 built-in promotion gates. All gates implement `IGateProvider`. 

| Gate Type | Category | Blocking | Sprint | Description |
|-----------|----------|----------|--------|-------------|
| `security-gate` | Security | Yes | 106_004 | Blocks if vulnerabilities exceed thresholds |
| `policy-gate` | Compliance | Yes | 106_003 | Evaluates policy rules (OPA/Rego) |
| `freeze-window-gate` | Operational | Yes | 106_003 | Blocks during freeze windows |
| `manual-gate` | Operational | Yes | 106_003 | Requires manual confirmation |
| `approval-gate` | Compliance | Yes | 106_003 | Requires multi-party approval (N of M) |
| `schedule-gate` | Operational | Yes | 106_003 | Deployment window restrictions |
| `dependency-gate` | Quality | No | 106_003 | Checks that upstream dependencies are healthy |
| `metric-gate` | Quality | Configurable | Plugin | SLO/error-rate threshold checks |

> **Note:** `metric-gate` is delivered as a plugin reference implementation via the Plugin SDK because it requires integration with external metrics systems (Prometheus, Datadog, etc.). See 101_004 for plugin SDK details.

### Gate Categories

- **Security:** Gates that block based on security findings (vulnerabilities, compliance)
- **Compliance:** Gates that enforce organizational policies and approvals
- **Quality:** Gates that check service health and dependencies
- **Operational:** Gates that manage deployment timing and manual interventions
- **Custom:** User-defined plugin gates

### This Sprint's Scope

This sprint (106_003) implements the Gate Registry and the following built-in gates:

- `freeze-window-gate` (blocking)
- `manual-gate` (blocking)
- `policy-gate` (blocking)
- `approval-gate` (blocking)
- `schedule-gate` (blocking)
- `dependency-gate` (non-blocking)

> **Note:** `security-gate` is detailed in sprint 106_004.
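
To make the catalog concrete, a promotion policy for a target environment might attach several gates from the table above, each with per-gate configuration. The JSON below is a hypothetical illustration only — the surrounding shape and field names (`environment`, `gates`, `type`, `config`) are assumptions for this sketch; the real per-gate configuration surface is defined by each provider's `GateConfigSchema`:

```json
{
  "environment": "prod",
  "gates": [
    { "type": "freeze-window-gate", "config": { "allowExemptions": true } },
    { "type": "approval-gate", "config": { "requiredApprovals": 2 } },
    { "type": "security-gate", "config": { "maxCritical": 0, "maxHigh": 5 } },
    { "type": "dependency-gate", "config": {} }
  ]
}
```

Gate order in such a policy is not semantically meaningful here, since `GateEvaluator.EvaluateAllAsync` evaluates gates concurrently.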

### Working Directory

```
src/ReleaseOrchestrator/
├── __Libraries/
│   └── StellaOps.ReleaseOrchestrator.Promotion/
│       ├── Gate/
│       │   ├── IGateRegistry.cs
│       │   ├── GateRegistry.cs
│       │   ├── IGateProvider.cs
│       │   ├── GateEvaluator.cs
│       │   └── GateContext.cs
│       ├── Gate.BuiltIn/
│       │   ├── FreezeWindowGate.cs
│       │   ├── ManualGate.cs
│       │   ├── PolicyGate.cs
│       │   ├── ApprovalGate.cs
│       │   ├── ScheduleGate.cs
│       │   └── DependencyGate.cs
│       └── Models/
│           ├── GateDefinition.cs
│           ├── GateResult.cs
│           └── GateConfig.cs
└── __Tests/
    └── StellaOps.ReleaseOrchestrator.Promotion.Tests/
        └── Gate/
```

---

## Architecture Reference

- [Promotion Manager](../modules/release-orchestrator/modules/promotion-manager.md)
- [Plugin System](../modules/release-orchestrator/plugins/gate-plugins.md)

---

## Deliverables

### IGateRegistry Interface

```csharp
namespace StellaOps.ReleaseOrchestrator.Promotion.Gate;

public interface IGateRegistry
{
    void RegisterBuiltIn<T>(string gateName) where T : class, IGateProvider;
    void RegisterPlugin(GateDefinition definition, IGateProvider provider);
    Task<IGateProvider?> GetProviderAsync(string gateName, CancellationToken ct = default);
    GateDefinition? GetDefinition(string gateName);
    IReadOnlyList<GateDefinition> GetAllDefinitions();
    IReadOnlyList<GateDefinition> GetBuiltInDefinitions();
    IReadOnlyList<GateDefinition> GetPluginDefinitions();
    bool IsRegistered(string gateName);
}

public interface IGateProvider
{
    string GateName { get; }
    string DisplayName { get; }
    string Description { get; }
    GateConfigSchema ConfigSchema { get; }
    bool IsBlocking { get; }

    Task<GateResult> EvaluateAsync(GateContext context, CancellationToken ct = default);
    Task<ValidationResult> ValidateConfigAsync(IReadOnlyDictionary<string, object?> config, CancellationToken ct = default);
}
```

### GateDefinition Model

```csharp
namespace StellaOps.ReleaseOrchestrator.Promotion.Models;

public sealed record GateDefinition
{
    public required string GateName { get; init; }
    public required string DisplayName { get; init; }
    public required string Description { get; init; }
    public required GateCategory Category { get; init; }
    public required GateSource Source { get; init; }
    public string? PluginId { get; init; }
    public required GateConfigSchema ConfigSchema { get; init; }
    public required bool IsBlocking { get; init; }
    public string? DocumentationUrl { get; init; }
}

public enum GateCategory
{
    Security,
    Compliance,
    Quality,
    Operational,
    Custom
}

public enum GateSource
{
    BuiltIn,
    Plugin
}

public sealed record GateConfigSchema
{
    public ImmutableArray<GateConfigProperty> Properties { get; init; } = [];
    public ImmutableArray<string> Required { get; init; } = [];
}

public sealed record GateConfigProperty(
    string Name,
    GatePropertyType Type,
    string Description,
    object? Default = null
);

public enum GatePropertyType
{
    String,
    Integer,
    Boolean,
    Array,
    Object
}
```

### GateResult Model

```csharp
namespace StellaOps.ReleaseOrchestrator.Promotion.Models;

public sealed record GateResult
{
    public required string GateName { get; init; }
    public required string GateType { get; init; }
    public required bool Passed { get; init; }
    public required bool Blocking { get; init; }
    public string? Message { get; init; }
    public ImmutableDictionary<string, object?> Details { get; init; } =
        ImmutableDictionary<string, object?>.Empty;
    public required DateTimeOffset EvaluatedAt { get; init; }
    public TimeSpan Duration { get; init; }

    public static GateResult Pass(
        string gateName,
        string gateType,
        string? message = null,
        ImmutableDictionary<string, object?>? details = null) =>
        new()
        {
            GateName = gateName,
            GateType = gateType,
            Passed = true,
            Blocking = false,
            Message = message,
            Details = details ?? ImmutableDictionary<string, object?>.Empty,
            EvaluatedAt = TimeProvider.System.GetUtcNow()
        };

    public static GateResult Fail(
        string gateName,
        string gateType,
        string message,
        bool blocking = true,
        ImmutableDictionary<string, object?>? details = null) =>
        new()
        {
            GateName = gateName,
            GateType = gateType,
            Passed = false,
            Blocking = blocking,
            Message = message,
            Details = details ?? ImmutableDictionary<string, object?>.Empty,
            EvaluatedAt = TimeProvider.System.GetUtcNow()
        };
}
```

### GateContext Model

```csharp
namespace StellaOps.ReleaseOrchestrator.Promotion.Gate;

public sealed record GateContext
{
    public required Guid PromotionId { get; init; }
    public required Guid ReleaseId { get; init; }
    public required string ReleaseName { get; init; }
    public required Guid SourceEnvironmentId { get; init; }
    public required Guid TargetEnvironmentId { get; init; }
    public required string TargetEnvironmentName { get; init; }
    public required ImmutableDictionary<string, object?> Config { get; init; }
    public required Guid RequestedBy { get; init; }
    public required DateTimeOffset RequestedAt { get; init; }
}
```

### GateRegistry Implementation

```csharp
namespace StellaOps.ReleaseOrchestrator.Promotion.Gate;

public sealed class GateRegistry : IGateRegistry
{
    private readonly ConcurrentDictionary<string, (GateDefinition Definition, IGateProvider Provider)> _gates = new();
    private readonly IServiceProvider _serviceProvider;
    private readonly ILogger<GateRegistry> _logger;

    public void RegisterBuiltIn<T>(string gateName) where T : class, IGateProvider
    {
        var provider = _serviceProvider.GetRequiredService<T>();

        var definition = new GateDefinition
        {
            GateName = gateName,
            DisplayName = provider.DisplayName,
            Description = provider.Description,
            Category = InferCategory(gateName),
            Source = GateSource.BuiltIn,
            ConfigSchema = provider.ConfigSchema,
            IsBlocking = provider.IsBlocking
        };

        if (!_gates.TryAdd(gateName, (definition, provider)))
        {
            throw new InvalidOperationException($"Gate '{gateName}' is already registered");
        }

        _logger.LogInformation("Registered built-in gate: {GateName}", gateName);
    }

    public void RegisterPlugin(GateDefinition definition, IGateProvider provider)
    {
        if (definition.Source != GateSource.Plugin)
        {
            throw new ArgumentException("Definition must have Plugin source");
        }

        if (!_gates.TryAdd(definition.GateName, (definition, provider)))
        {
            throw new InvalidOperationException($"Gate '{definition.GateName}' is already registered");
        }

        _logger.LogInformation(
            "Registered plugin gate: {GateName} from {PluginId}",
            definition.GateName,
            definition.PluginId);
    }

    public Task<IGateProvider?> GetProviderAsync(string gateName, CancellationToken ct = default)
    {
        return _gates.TryGetValue(gateName, out var entry)
            ? Task.FromResult<IGateProvider?>(entry.Provider)
            : Task.FromResult<IGateProvider?>(null);
    }

    public GateDefinition? GetDefinition(string gateName)
    {
        return _gates.TryGetValue(gateName, out var entry)
            ? entry.Definition
            : null;
    }

    public IReadOnlyList<GateDefinition> GetAllDefinitions()
    {
        return _gates.Values.Select(e => e.Definition).ToList().AsReadOnly();
    }

    public IReadOnlyList<GateDefinition> GetBuiltInDefinitions()
    {
        return _gates.Values
            .Where(e => e.Definition.Source == GateSource.BuiltIn)
            .Select(e => e.Definition)
            .ToList()
            .AsReadOnly();
    }

    public IReadOnlyList<GateDefinition> GetPluginDefinitions()
    {
        return _gates.Values
            .Where(e => e.Definition.Source == GateSource.Plugin)
            .Select(e => e.Definition)
            .ToList()
            .AsReadOnly();
    }

    public bool IsRegistered(string gateName) => _gates.ContainsKey(gateName);

    private static GateCategory InferCategory(string gateName) =>
        gateName switch
        {
            "security-gate" => GateCategory.Security,
            "freeze-window-gate" => GateCategory.Operational,
            "policy-gate" => GateCategory.Compliance,
            "manual-gate" => GateCategory.Operational,
            "approval-gate" => GateCategory.Compliance,
            "schedule-gate" => GateCategory.Operational,
            "dependency-gate" => GateCategory.Quality,
            _ => GateCategory.Custom
        };
}
```

### GateEvaluator

```csharp
namespace StellaOps.ReleaseOrchestrator.Promotion.Gate;

public sealed class GateEvaluator
{
    private readonly IGateRegistry _registry;
    private readonly ILogger<GateEvaluator> _logger;
    private readonly TimeProvider _timeProvider;

    public async Task<GateResult> EvaluateAsync(
        string gateName,
        GateContext context,
        CancellationToken ct = default)
    {
        var sw = Stopwatch.StartNew();

        var provider = await _registry.GetProviderAsync(gateName, ct);
        if (provider is null)
        {
            return GateResult.Fail(
                gateName,
                "unknown",
                $"Unknown gate type: {gateName}",
                blocking: true);
        }

        try
        {
            _logger.LogDebug(
                "Evaluating gate {GateName} for promotion {PromotionId}",
                gateName,
                context.PromotionId);

            var result = await provider.EvaluateAsync(context, ct);
            result = result with { Duration = sw.Elapsed };

            _logger.LogInformation(
                "Gate {GateName} for promotion {PromotionId}: {Result} in {Duration}ms",
                gateName,
                context.PromotionId,
                result.Passed ? "PASSED" : "FAILED",
                sw.ElapsedMilliseconds);

            return result;
        }
        catch (Exception ex)
        {
            _logger.LogError(ex,
                "Gate {GateName} evaluation failed for promotion {PromotionId}",
                gateName,
                context.PromotionId);

            return GateResult.Fail(
                gateName,
                provider.GateName,
                $"Gate evaluation failed: {ex.Message}",
                blocking: provider.IsBlocking);
        }
    }

    public async Task<IReadOnlyList<GateResult>> EvaluateAllAsync(
        IReadOnlyList<string> gateNames,
        GateContext context,
        CancellationToken ct = default)
    {
        var tasks = gateNames.Select(name => EvaluateAsync(name, context, ct));
        var results = await Task.WhenAll(tasks);
        return results.ToList().AsReadOnly();
    }
}
```

### FreezeWindowGate (Built-in)

```csharp
namespace StellaOps.ReleaseOrchestrator.Promotion.Gate.BuiltIn;

public sealed class FreezeWindowGate : IGateProvider
{
    private readonly IFreezeWindowService _freezeWindowService;

    public string GateName => "freeze-window-gate";
    public string DisplayName => "Freeze Window Gate";
    public string Description => "Blocks promotion during freeze windows";
    public bool IsBlocking => true;

    public GateConfigSchema ConfigSchema => new()
    {
        Properties =
        [
            new GateConfigProperty(
                "allowExemptions",
                GatePropertyType.Boolean,
                "Allow exemptions to bypass freeze",
                Default: true)
        ]
    };

    public async Task<GateResult> EvaluateAsync(GateContext context, CancellationToken ct)
    {
        var activeFreezeWindow = await _freezeWindowService.GetActiveFreezeWindowAsync(
            context.TargetEnvironmentId, ct);

        if (activeFreezeWindow is null)
        {
            return GateResult.Pass(
                GateName,
                GateName,
                "No active freeze window");
        }

        // Check for exemption
        var allowExemptions = context.Config.GetValueOrDefault("allowExemptions") as bool? ?? true;
        if (allowExemptions)
        {
            var hasExemption = await _freezeWindowService.HasExemptionAsync(
                activeFreezeWindow.Id, context.RequestedBy, ct);

            if (hasExemption)
            {
                return GateResult.Pass(
                    GateName,
                    GateName,
                    "Freeze window active but user has exemption",
                    new Dictionary<string, object?>
                    {
                        ["freezeWindowId"] = activeFreezeWindow.Id,
                        ["exemptionGranted"] = true
                    }.ToImmutableDictionary());
            }
        }

        return GateResult.Fail(
            GateName,
            GateName,
            $"Environment is frozen: {activeFreezeWindow.Name}",
            blocking: true,
            new Dictionary<string, object?>
            {
                ["freezeWindowId"] = activeFreezeWindow.Id,
                ["freezeWindowName"] = activeFreezeWindow.Name,
                ["endsAt"] = activeFreezeWindow.EndAt.ToString("O")
            }.ToImmutableDictionary());
    }

    public Task<ValidationResult> ValidateConfigAsync(
        IReadOnlyDictionary<string, object?> config,
        CancellationToken ct) =>
        Task.FromResult(ValidationResult.Success());
}
```

### ManualGate (Built-in)

```csharp
namespace StellaOps.ReleaseOrchestrator.Promotion.Gate.BuiltIn;

public sealed class ManualGate : IGateProvider
{
    private readonly IPromotionStore _promotionStore;
    private readonly IStepCallbackHandler _callbackHandler;

    public string GateName => "manual-gate";
    public string DisplayName => "Manual Gate";
    public string Description => "Requires manual confirmation to proceed";
    public bool IsBlocking => true;

    public GateConfigSchema ConfigSchema => new()
    {
        Properties =
        [
            new GateConfigProperty(
                "message",
                GatePropertyType.String,
                "Message to display for manual confirmation"),
            new GateConfigProperty(
                "confirmers",
                GatePropertyType.Array,
                "User IDs or group names who can confirm"),
            new GateConfigProperty(
                "timeout",
                GatePropertyType.Integer,
                "Timeout in seconds",
                Default: 86400)
        ],
        Required = ["message"]
    };

    public async Task<GateResult> EvaluateAsync(GateContext context, CancellationToken ct)
    {
        var message = context.Config.GetValueOrDefault("message")?.ToString() ?? "Manual confirmation required";
        var timeoutSeconds = context.Config.GetValueOrDefault("timeout") as int? ?? 86400;

        // Create callback for manual confirmation
        var callback = await _callbackHandler.CreateCallbackAsync(
            context.PromotionId,
            "manual-gate",
            TimeSpan.FromSeconds(timeoutSeconds),
            ct);

        // Return a result indicating we're waiting
        return new GateResult
        {
            GateName = GateName,
            GateType = GateName,
            Passed = false,
            Blocking = true,
            Message = message,
            Details = new Dictionary<string, object?>
            {
                ["callbackToken"] = callback.Token,
                ["expiresAt"] = callback.ExpiresAt.ToString("O"),
                ["waitingForConfirmation"] = true
            }.ToImmutableDictionary(),
            EvaluatedAt = TimeProvider.System.GetUtcNow()
        };
    }

    public Task<ValidationResult> ValidateConfigAsync(
        IReadOnlyDictionary<string, object?> config,
        CancellationToken ct) =>
        Task.FromResult(ValidationResult.Success());
}
```

### GateRegistryInitializer

```csharp
namespace StellaOps.ReleaseOrchestrator.Promotion.Gate;

public sealed class GateRegistryInitializer : IHostedService
{
    private readonly IGateRegistry _registry;
    private readonly IPluginLoader _pluginLoader;
    private readonly ILogger<GateRegistryInitializer> _logger;

    public async Task StartAsync(CancellationToken ct)
    {
        _logger.LogInformation("Initializing gate registry");

        // Register built-in gates (6 gates in this sprint, security-gate in 106_004)
        _registry.RegisterBuiltIn<FreezeWindowGate>("freeze-window-gate");
        _registry.RegisterBuiltIn<ManualGate>("manual-gate");
        _registry.RegisterBuiltIn<PolicyGate>("policy-gate");
        _registry.RegisterBuiltIn<ApprovalGate>("approval-gate");
        _registry.RegisterBuiltIn<ScheduleGate>("schedule-gate");
        _registry.RegisterBuiltIn<DependencyGate>("dependency-gate");

        _logger.LogInformation(
            "Registered {Count} built-in gates",
            _registry.GetBuiltInDefinitions().Count);

        // Load plugin gates
        var plugins = await _pluginLoader.GetPluginsAsync(ct);
        foreach (var plugin in plugins)
        {
            try
            {
                var providers = plugin.Instance.GetGateProviders();
                foreach (var provider in providers)
                {
                    var definition = new GateDefinition
                    {
                        GateName = provider.GateName,
                        DisplayName = provider.DisplayName,
                        Description = provider.Description,
                        Category = GateCategory.Custom,
                        Source = GateSource.Plugin,
                        PluginId = plugin.Manifest.Id,
                        ConfigSchema = provider.ConfigSchema,
                        IsBlocking = provider.IsBlocking
                    };

                    _registry.RegisterPlugin(definition, provider);
                }
            }
            catch (Exception ex)
            {
                _logger.LogError(ex,
                    "Failed to load gate plugin {PluginId}",
                    plugin.Manifest.Id);
            }
        }

        _logger.LogInformation(
            "Loaded {Count} plugin gates",
            _registry.GetPluginDefinitions().Count);
    }

    public Task StopAsync(CancellationToken ct) => Task.CompletedTask;
}
```

---

## Acceptance Criteria

- [ ] Register built-in gate types
- [ ] Load plugin gate types
- [ ] Evaluate individual gate
- [ ] Evaluate all gates for promotion
- [ ] Freeze window gate blocks during freeze
- [ ] Manual gate waits for confirmation
- [ ] Track gate results
- [ ] Validate gate configuration
- [ ] Unit test coverage >= 85%

---

## Test Plan

### Unit Tests

| Test | Description |
|------|-------------|
| `RegisterBuiltIn_AddsGate` | Registration works |
| `RegisterPlugin_AddsGate` | Plugin registration works |
| `GetProvider_ReturnsProvider` | Lookup works |
| `EvaluateGate_ReturnsResult` | Evaluation works |
| `FreezeWindowGate_ActiveFreeze_Fails` | Freeze gate logic |
| `FreezeWindowGate_NoFreeze_Passes` | No freeze passes |
| `ManualGate_CreatesCallback` | Manual gate logic |

### Integration Tests

| Test | Description |
|------|-------------|
| `GateRegistryInit_E2E` | Full initialization |
| `PluginGateLoading_E2E` | Plugin gate loading |

---

## Dependencies

| Dependency | Type | Status |
|------------|------|--------|
| 106_001 Promotion Manager | Internal | TODO |
| 101_002 Plugin Registry | Internal | TODO |
| 103_001 Environment (Freeze Windows) | Internal | TODO |

---

## Delivery Tracker

| Deliverable | Status | Notes |
|-------------|--------|-------|
| IGateRegistry | TODO | |
| GateRegistry | TODO | |
| IGateProvider | TODO | |
| GateEvaluator | TODO | |
| GateContext | TODO | |
| FreezeWindowGate | TODO | Blocks during freeze windows |
| ManualGate | TODO | Manual confirmation |
| PolicyGate | TODO | OPA/Rego policy evaluation |
| ApprovalGate | TODO | Multi-party approval (N of M) |
| ScheduleGate | TODO | Deployment window restrictions |
| DependencyGate | TODO | Upstream dependency checks |
| GateRegistryInitializer | TODO | |
| Unit tests | TODO | |

---

## Execution Log

| Date | Entry |
|------|-------|
| 10-Jan-2026 | Sprint created |
| 10-Jan-2026 | Added Gate Type Catalog (8 types) with categories and sprint assignments |

diff --git a/docs/implplan/SPRINT_20260110_106_004_PROMOT_security_gate.md b/docs/implplan/SPRINT_20260110_106_004_PROMOT_security_gate.md
new file mode 100644
index 000000000..fa82ea190
--- /dev/null
+++ b/docs/implplan/SPRINT_20260110_106_004_PROMOT_security_gate.md
@@ -0,0 +1,576 @@

# SPRINT: Security Gate

> **Sprint ID:** 106_004
> **Module:** PROMOT
> **Phase:** 6 - Promotion & Gates
> **Status:** TODO
> **Parent:** [106_000_INDEX](SPRINT_20260110_106_000_INDEX_promotion_gates.md)

---

## Overview

Implement the Security Gate for blocking promotions based on vulnerability thresholds.
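
As a concrete picture of the configuration surface this sprint targets, the snippet below shows how a `security-gate` entry might look in a promotion policy. The property names mirror the `GateConfigSchema` defined in this sprint's deliverables; the surrounding `type`/`config` wrapper is an assumed shape for illustration, not a finalized schema:

```json
{
  "type": "security-gate",
  "config": {
    "maxCritical": 0,
    "maxHigh": 5,
    "maxMedium": 20,
    "requireSbom": true,
    "maxScanAge": 24,
    "applyVexExceptions": true,
    "blockOnKnownExploited": true
  }
}
```

Omitting `maxMedium`/`maxLow` leaves those severities unlimited, per the defaults in `SecurityGateConfig`.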

### Objectives

- Check vulnerability counts against thresholds
- Support severity-based limits (critical, high, medium)
- Require SBOM presence
- Integrate with the Scanner service
- Support VEX-based exceptions

### Working Directory

```
src/ReleaseOrchestrator/
├── __Libraries/
│   └── StellaOps.ReleaseOrchestrator.Promotion/
│       └── Gate.Security/
│           ├── SecurityGate.cs
│           ├── SecurityGateConfig.cs
│           ├── VulnerabilityCounter.cs
│           ├── VexExceptionChecker.cs
│           └── SbomRequirementChecker.cs
└── __Tests/
    └── StellaOps.ReleaseOrchestrator.Promotion.Tests/
        └── Gate.Security/
```

---

## Architecture Reference

- [Security Gate](../modules/release-orchestrator/modules/gates/security-gate.md)
- [Scanner Integration](../modules/scanner/integration.md)

---

## Deliverables

### SecurityGate

```csharp
namespace StellaOps.ReleaseOrchestrator.Promotion.Gate.Security;

public sealed class SecurityGate : IGateProvider
{
    private readonly IReleaseManager _releaseManager;
    private readonly IScannerService _scannerService;
    private readonly VulnerabilityCounter _vulnCounter;
    private readonly VexExceptionChecker _vexChecker;
    private readonly SbomRequirementChecker _sbomChecker;
    private readonly ILogger<SecurityGate> _logger;

    public string GateName => "security-gate";
    public string DisplayName => "Security Gate";
    public string Description => "Enforces vulnerability thresholds for release promotion";
    public bool IsBlocking => true;

    public GateConfigSchema ConfigSchema => new()
    {
        Properties =
        [
            new GateConfigProperty("maxCritical", GatePropertyType.Integer, "Maximum critical vulnerabilities allowed", Default: 0),
            new GateConfigProperty("maxHigh", GatePropertyType.Integer, "Maximum high vulnerabilities allowed", Default: 5),
            new GateConfigProperty("maxMedium", GatePropertyType.Integer, "Maximum medium vulnerabilities allowed (null = unlimited)"),
            new GateConfigProperty("maxLow", GatePropertyType.Integer, "Maximum low vulnerabilities allowed (null = unlimited)"),
            new GateConfigProperty("requireSbom", GatePropertyType.Boolean, "Require SBOM for all components", Default: true),
            new GateConfigProperty("maxScanAge", GatePropertyType.Integer, "Maximum scan age in hours", Default: 24),
            new GateConfigProperty("applyVexExceptions", GatePropertyType.Boolean, "Apply VEX exceptions to counts", Default: true),
            new GateConfigProperty("blockOnKnownExploited", GatePropertyType.Boolean, "Block on KEV vulnerabilities", Default: true)
        ]
    };

    public async Task<GateResult> EvaluateAsync(GateContext context, CancellationToken ct)
    {
        var config = ParseConfig(context.Config);

        var release = await _releaseManager.GetAsync(context.ReleaseId, ct)
            ?? throw new ReleaseNotFoundException(context.ReleaseId);

        var violations = new List<string>();
        var details = new Dictionary<string, object?>();
        var totalVulns = new VulnerabilityCounts();

        foreach (var component in release.Components)
        {
            // Check SBOM requirement
            if (config.RequireSbom)
            {
                var hasSbom = await _sbomChecker.HasSbomAsync(component.Digest, ct);
                if (!hasSbom)
                {
                    violations.Add($"Component {component.ComponentName} has no SBOM");
                }
            }

            // Get scan results
            var scan = await _scannerService.GetLatestScanAsync(component.Digest, ct);
            if (scan is null)
            {
                if (config.RequireSbom)
                {
                    violations.Add($"Component {component.ComponentName} has no security scan");
                }
                continue;
            }

            // Check scan age
            var scanAge = TimeProvider.System.GetUtcNow() - scan.CompletedAt;
            if (scanAge.TotalHours > config.MaxScanAgeHours)
            {
                violations.Add($"Component {component.ComponentName} scan is too old ({scanAge.TotalHours:F1}h)");
            }

            // Count vulnerabilities
            var vulnCounts = await _vulnCounter.CountAsync(
                scan,
                config.ApplyVexExceptions ? component.Digest : null,
                ct);

            totalVulns = totalVulns.Add(vulnCounts);

            // Check for known exploited vulnerabilities
            if (config.BlockOnKnownExploited && vulnCounts.KnownExploitedCount > 0)
            {
                violations.Add(
                    $"Component {component.ComponentName} has {vulnCounts.KnownExploitedCount} known exploited vulnerabilities");
            }

            details[$"component_{component.ComponentName}"] = new Dictionary<string, object?>
            {
                ["critical"] = vulnCounts.Critical,
                ["high"] = vulnCounts.High,
                ["medium"] = vulnCounts.Medium,
                ["low"] = vulnCounts.Low,
                ["knownExploited"] = vulnCounts.KnownExploitedCount,
                ["scanAge"] = scanAge.TotalHours
            };
        }

        // Check thresholds
        if (totalVulns.Critical > config.MaxCritical)
        {
            violations.Add($"Critical vulnerabilities ({totalVulns.Critical}) exceed threshold ({config.MaxCritical})");
        }

        if (totalVulns.High > config.MaxHigh)
        {
            violations.Add($"High vulnerabilities ({totalVulns.High}) exceed threshold ({config.MaxHigh})");
        }

        if (config.MaxMedium.HasValue && totalVulns.Medium > config.MaxMedium.Value)
        {
            violations.Add($"Medium vulnerabilities ({totalVulns.Medium}) exceed threshold ({config.MaxMedium})");
        }

        if (config.MaxLow.HasValue && totalVulns.Low > config.MaxLow.Value)
        {
            violations.Add($"Low vulnerabilities ({totalVulns.Low}) exceed threshold ({config.MaxLow})");
        }

        details["totals"] = new Dictionary<string, object?>
        {
            ["critical"] = totalVulns.Critical,
            ["high"] = totalVulns.High,
            ["medium"] = totalVulns.Medium,
            ["low"] = totalVulns.Low,
            ["knownExploited"] = totalVulns.KnownExploitedCount,
            ["componentsScanned"] = release.Components.Length
        };

        details["thresholds"] = new Dictionary<string, object?>
        {
            ["maxCritical"] = config.MaxCritical,
            ["maxHigh"] = config.MaxHigh,
            ["maxMedium"] = config.MaxMedium ?? -1,
            ["maxLow"] = config.MaxLow ?? -1
        };

        if (violations.Count > 0)
        {
            _logger.LogWarning(
                "Security gate failed for release {ReleaseId}: {Violations}",
                context.ReleaseId,
                string.Join("; ", violations));

            return GateResult.Fail(
                GateName,
                GateName,
                string.Join("; ", violations),
                blocking: true,
                details.ToImmutableDictionary());
        }

        _logger.LogInformation(
            "Security gate passed for release {ReleaseId}: {Critical}C/{High}H/{Medium}M/{Low}L",
            context.ReleaseId,
            totalVulns.Critical,
            totalVulns.High,
            totalVulns.Medium,
            totalVulns.Low);

        return GateResult.Pass(
            GateName,
            GateName,
            "All security thresholds met",
            details.ToImmutableDictionary());
    }

    private static SecurityGateConfig ParseConfig(ImmutableDictionary<string, object?> config) =>
        new()
        {
            MaxCritical = config.GetValueOrDefault("maxCritical") as int? ?? 0,
            MaxHigh = config.GetValueOrDefault("maxHigh") as int? ?? 5,
            MaxMedium = config.GetValueOrDefault("maxMedium") as int?,
            MaxLow = config.GetValueOrDefault("maxLow") as int?,
            RequireSbom = config.GetValueOrDefault("requireSbom") as bool? ?? true,
            MaxScanAgeHours = config.GetValueOrDefault("maxScanAge") as int? ?? 24,
            ApplyVexExceptions = config.GetValueOrDefault("applyVexExceptions") as bool? ?? true,
            BlockOnKnownExploited = config.GetValueOrDefault("blockOnKnownExploited") as bool? ?? true
        };

    public Task<ValidationResult> ValidateConfigAsync(
        IReadOnlyDictionary<string, object?> config,
        CancellationToken ct)
    {
        var errors = new List<string>();

        if (config.TryGetValue("maxCritical", out var maxCritical) &&
            maxCritical is int mc && mc < 0)
        {
            errors.Add("maxCritical cannot be negative");
        }

        if (config.TryGetValue("maxScanAge", out var maxScanAge) &&
            maxScanAge is int msa && msa < 1)
        {
            errors.Add("maxScanAge must be at least 1 hour");
        }

        return Task.FromResult(errors.Count == 0
            ? ValidationResult.Success()
            : ValidationResult.Failure(errors));
    }
}
```

### SecurityGateConfig

```csharp
namespace StellaOps.ReleaseOrchestrator.Promotion.Gate.Security;

public sealed record SecurityGateConfig
{
    public int MaxCritical { get; init; } = 0;
    public int MaxHigh { get; init; } = 5;
    public int? MaxMedium { get; init; }
    public int? MaxLow { get; init; }
    public bool RequireSbom { get; init; } = true;
    public int MaxScanAgeHours { get; init; } = 24;
    public bool ApplyVexExceptions { get; init; } = true;
    public bool BlockOnKnownExploited { get; init; } = true;
}
```

### VulnerabilityCounter

```csharp
namespace StellaOps.ReleaseOrchestrator.Promotion.Gate.Security;

public sealed class VulnerabilityCounter
{
    private readonly IVexService _vexService;
    private readonly IKevService _kevService;

    public async Task<VulnerabilityCounts> CountAsync(
        ScanResult scan,
        string? digestForVex = null,
        CancellationToken ct = default)
    {
        var counts = new VulnerabilityCounts
        {
            Critical = scan.CriticalCount,
            High = scan.HighCount,
            Medium = scan.MediumCount,
            Low = scan.LowCount
        };

        // Count known exploited
        var kevVulns = await _kevService.GetKevVulnerabilitiesAsync(
            scan.Vulnerabilities.Select(v => v.CveId), ct);
        counts = counts with { KnownExploitedCount = kevVulns.Count };

        // Apply VEX exceptions if requested
        if (digestForVex is not null)
        {
            var vexDocs = await _vexService.GetVexForDigestAsync(digestForVex, ct);
            counts = ApplyVexExceptions(counts, scan.Vulnerabilities, vexDocs);
        }

        return counts;
    }

    private static VulnerabilityCounts ApplyVexExceptions(
        VulnerabilityCounts counts,
        IReadOnlyList<Vulnerability> vulnerabilities,
        IReadOnlyList<VexDocument> vexDocs)
    {
        var exceptedCves = vexDocs
            .SelectMany(v => v.Statements)
            .Where(s => s.Status == VexStatus.NotAffected || s.Status == VexStatus.Fixed)
            .Select(s => s.VulnerabilityId)
            .ToHashSet();

        var adjustedCounts = counts;

        foreach (var vuln in vulnerabilities)
        {
            if (exceptedCves.Contains(vuln.CveId))
            {
                adjustedCounts = vuln.Severity switch
                {
                    VulnerabilitySeverity.Critical => adjustedCounts with { Critical = adjustedCounts.Critical - 1 },
                    VulnerabilitySeverity.High => adjustedCounts with { High = adjustedCounts.High - 1 },
                    VulnerabilitySeverity.Medium => adjustedCounts with { Medium = adjustedCounts.Medium - 1 },
                    VulnerabilitySeverity.Low => adjustedCounts with { Low = adjustedCounts.Low - 1 },
                    _ => adjustedCounts
                };
            }
        }

        return adjustedCounts;
    }
}

public sealed record VulnerabilityCounts
{
    public int Critical { get; init; }
    public int High { get; init; }
    public int Medium { get; init; }
    public int Low { get; init; }
    public int KnownExploitedCount { get; init; }

    public int Total => Critical + High + Medium + Low;

    public VulnerabilityCounts Add(VulnerabilityCounts other) =>
        new()
        {
            Critical = Critical + other.Critical,
            High = High + other.High,
            Medium = Medium + other.Medium,
            Low = Low + other.Low,
            KnownExploitedCount = KnownExploitedCount + other.KnownExploitedCount
        };
}
```

### VexExceptionChecker

```csharp
namespace StellaOps.ReleaseOrchestrator.Promotion.Gate.Security;

public sealed class VexExceptionChecker
{
    private readonly IVexService _vexService;
    private readonly ILogger<VexExceptionChecker> _logger;

    public async Task<VexExceptionResult> CheckAsync(
        string digest,
        string cveId,
        CancellationToken ct = default)
    {
        var vexDocs = await _vexService.GetVexForDigestAsync(digest, ct);

        foreach (var doc in vexDocs)
        {
            var statement = doc.Statements
                .FirstOrDefault(s => s.VulnerabilityId == cveId);

            if (statement is null)
                continue;

            if (statement.Status == VexStatus.NotAffected)
            {
                return new VexExceptionResult(
                    IsExcepted: true,
                    Reason: statement.Justification ?? "Not affected",
                    VexDocumentId: doc.Id,
                    VexStatus: VexStatus.NotAffected
                );
            }

            if (statement.Status == VexStatus.Fixed)
            {
                return new VexExceptionResult(
                    IsExcepted: true,
                    Reason: statement.ActionStatement ?? "Fixed",
                    VexDocumentId: doc.Id,
                    VexStatus: VexStatus.Fixed
                );
            }
        }

        return new VexExceptionResult(
            IsExcepted: false,
            Reason: null,
            VexDocumentId: null,
            VexStatus: null
        );
    }
}

public sealed record VexExceptionResult(
    bool IsExcepted,
    string? Reason,
    Guid? VexDocumentId,
    VexStatus? VexStatus
);
```

### SbomRequirementChecker

```csharp
namespace StellaOps.ReleaseOrchestrator.Promotion.Gate.Security;

public sealed class SbomRequirementChecker
{
    private readonly ISbomService _sbomService;
    private readonly ILogger<SbomRequirementChecker> _logger;

    public async Task<bool> HasSbomAsync(string digest, CancellationToken ct = default)
    {
        var sbom = await _sbomService.GetByDigestAsync(digest, ct);
        return sbom is not null;
    }

    public async Task<SbomValidationResult> ValidateSbomAsync(
        string digest,
        CancellationToken ct = default)
    {
        var sbom = await _sbomService.GetByDigestAsync(digest, ct);

        if (sbom is null)
        {
            return new SbomValidationResult(
                HasSbom: false,
                IsValid: false,
                Errors: ["No SBOM found for digest"]
            );
        }

        var errors = new List<string>();

        // Check SBOM has components
        if (sbom.Components.Length == 0)
        {
            errors.Add("SBOM has no components");
        }

        // Check SBOM format
        if (string.IsNullOrEmpty(sbom.Format))
        {
            errors.Add("SBOM format not specified");
        }

        // Check SBOM is not too old (optional)
        var sbomAge = TimeProvider.System.GetUtcNow() - sbom.GeneratedAt;
        if (sbomAge.TotalDays > 90)
        {
            errors.Add($"SBOM is {sbomAge.TotalDays:F0} days old");
        }

        return new SbomValidationResult(
            HasSbom: true,
            IsValid: errors.Count == 0,
            Errors: errors.ToImmutableArray(),
            SbomId: sbom.Id,
            Format: sbom.Format,
            ComponentCount: sbom.Components.Length,
            GeneratedAt: sbom.GeneratedAt
        );
} +} + +public sealed record SbomValidationResult( + bool HasSbom, + bool IsValid, + ImmutableArray Errors, + Guid? SbomId = null, + string? Format = null, + int? ComponentCount = null, + DateTimeOffset? GeneratedAt = null +); +``` + +--- + +## Acceptance Criteria + +- [ ] Check vulnerability counts against thresholds +- [ ] Block on critical vulnerabilities above threshold +- [ ] Block on high vulnerabilities above threshold +- [ ] Support optional medium/low thresholds +- [ ] Require SBOM presence +- [ ] Check scan age +- [ ] Apply VEX exceptions +- [ ] Block on known exploited (KEV) vulnerabilities +- [ ] Return detailed gate result +- [ ] Unit test coverage >=85% + +--- + +## Test Plan + +### Unit Tests + +| Test | Description | +|------|-------------| +| `Evaluate_BelowThreshold_Passes` | Pass case | +| `Evaluate_CriticalAboveThreshold_Fails` | Critical block | +| `Evaluate_HighAboveThreshold_Fails` | High block | +| `Evaluate_NoSbom_Fails` | SBOM requirement | +| `Evaluate_OldScan_Fails` | Scan age check | +| `VulnCounter_AppliesVexExceptions` | VEX logic | +| `VulnCounter_CountsKev` | KEV counting | +| `SbomChecker_ValidatesSbom` | SBOM validation | + +### Integration Tests + +| Test | Description | +|------|-------------| +| `SecurityGate_E2E` | Full gate evaluation | +| `VexException_E2E` | VEX integration | + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 106_003 Gate Registry | Internal | TODO | +| Scanner | Internal | Exists | +| VexService | Internal | Exists | +| SbomService | Internal | Exists | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| SecurityGate | TODO | | +| SecurityGateConfig | TODO | | +| VulnerabilityCounter | TODO | | +| VexExceptionChecker | TODO | | +| SbomRequirementChecker | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git 
a/docs/implplan/SPRINT_20260110_106_005_PROMOT_decision_engine.md b/docs/implplan/SPRINT_20260110_106_005_PROMOT_decision_engine.md
new file mode 100644
index 000000000..05588bfe6
--- /dev/null
+++ b/docs/implplan/SPRINT_20260110_106_005_PROMOT_decision_engine.md
@@ -0,0 +1,626 @@
+# SPRINT: Decision Engine
+
+> **Sprint ID:** 106_005
+> **Module:** PROMOT
+> **Phase:** 6 - Promotion & Gates
+> **Status:** TODO
+> **Parent:** [106_000_INDEX](SPRINT_20260110_106_000_INDEX_promotion_gates.md)
+
+---
+
+## Overview
+
+Implement the Decision Engine for combining gate results and approvals into final promotion decisions.
+
+### Objectives
+
+- Evaluate all configured gates
+- Combine gate results with approval status
+- Generate decision records with evidence
+- Support configurable decision rules
+
+### Working Directory
+
+```
+src/ReleaseOrchestrator/
+├── __Libraries/
+│   └── StellaOps.ReleaseOrchestrator.Promotion/
+│       ├── Decision/
+│       │   ├── IDecisionEngine.cs
+│       │   ├── DecisionEngine.cs
+│       │   ├── DecisionRules.cs
+│       │   ├── DecisionRecorder.cs
+│       │   └── DecisionNotifier.cs
+│       └── Models/
+│           ├── DecisionResult.cs
+│           ├── DecisionRecord.cs
+│           └── EnvironmentGateConfig.cs
+└── __Tests/
+    └── StellaOps.ReleaseOrchestrator.Promotion.Tests/
+        └── Decision/
+```
+
+---
+
+## Architecture Reference
+
+- [Promotion Manager](../modules/release-orchestrator/modules/promotion-manager.md)
+
+---
+
+## Deliverables
+
+### IDecisionEngine Interface
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Promotion.Decision;
+
+public interface IDecisionEngine
+{
+    Task<DecisionResult> EvaluateAsync(Guid promotionId, CancellationToken ct = default);
+    Task<GateResult> EvaluateGateAsync(Guid promotionId, string gateName, CancellationToken ct = default);
+    Task<ImmutableArray<GateResult>> EvaluateAllGatesAsync(Guid promotionId, CancellationToken ct = default);
+    Task<DecisionRecord> GetDecisionRecordAsync(Guid promotionId, CancellationToken ct = default);
+    Task<IReadOnlyList<DecisionRecord>> GetDecisionHistoryAsync(Guid promotionId, CancellationToken ct = default);
+}
+```
+
+### DecisionResult Model
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Promotion.Models;
+
+public sealed record DecisionResult
+{
+    public required Guid PromotionId { get; init; }
+    public required DecisionOutcome Outcome { get; init; }
+    public required bool CanProceed { get; init; }
+    public string? BlockingReason { get; init; }
+    public required ImmutableArray<GateResult> GateResults { get; init; }
+    public required ApprovalStatus ApprovalStatus { get; init; }
+    public required DateTimeOffset EvaluatedAt { get; init; }
+    public TimeSpan Duration { get; init; }
+
+    public IEnumerable<GateResult> PassedGates =>
+        GateResults.Where(g => g.Passed);
+
+    public IEnumerable<GateResult> FailedGates =>
+        GateResults.Where(g => !g.Passed);
+
+    public IEnumerable<GateResult> BlockingFailedGates =>
+        GateResults.Where(g => !g.Passed && g.Blocking);
+}
+
+public enum DecisionOutcome
+{
+    Allow,            // All gates passed, approvals complete
+    Deny,             // Blocking gate failed
+    PendingApproval,  // Gates passed, awaiting approvals
+    PendingGate,      // Async gate awaiting callback
+    Error             // Evaluation error
+}
+```
+
+### DecisionRecord Model
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Promotion.Models;
+
+public sealed record DecisionRecord
+{
+    public required Guid Id { get; init; }
+    public required Guid PromotionId { get; init; }
+    public required Guid TenantId { get; init; }
+    public required DecisionOutcome Outcome { get; init; }
+    public required string OutcomeReason { get; init; }
+    public required ImmutableArray<GateResult> GateResults { get; init; }
+    public required ImmutableArray<Approval> Approvals { get; init; }
+    public required EnvironmentGateConfig GateConfig { get; init; }
+    public required DateTimeOffset EvaluatedAt { get; init; }
+    public required Guid EvaluatedBy { get; init; } // System or user
+    public string? EvidenceDigest { get; init; }
+}
+
+public sealed record EnvironmentGateConfig
+{
+    public required Guid EnvironmentId { get; init; }
+    public required ImmutableArray<string> RequiredGates { get; init; }
+    public required int RequiredApprovals { get; init; }
+    public required bool RequireSeparationOfDuties { get; init; }
+    public required bool AllGatesMustPass { get; init; }
+}
+```
+
+### DecisionEngine Implementation
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Promotion.Decision;
+
+public sealed class DecisionEngine : IDecisionEngine
+{
+    private readonly IPromotionStore _promotionStore;
+    private readonly IEnvironmentService _environmentService;
+    private readonly IGateRegistry _gateRegistry;
+    private readonly GateEvaluator _gateEvaluator;
+    private readonly IApprovalGateway _approvalGateway;
+    private readonly DecisionRules _decisionRules;
+    private readonly DecisionRecorder _decisionRecorder;
+    private readonly IEventPublisher _eventPublisher;
+    private readonly TimeProvider _timeProvider;
+    private readonly ILogger<DecisionEngine> _logger;
+
+    public async Task<DecisionResult> EvaluateAsync(
+        Guid promotionId,
+        CancellationToken ct = default)
+    {
+        var sw = Stopwatch.StartNew();
+
+        var promotion = await _promotionStore.GetAsync(promotionId, ct)
+            ?? throw new PromotionNotFoundException(promotionId);
+
+        var gateConfig = await GetGateConfigAsync(promotion.TargetEnvironmentId, ct);
+
+        // Evaluate all required gates
+        var gateContext = BuildGateContext(promotion);
+        var gateResults = await EvaluateGatesAsync(gateConfig.RequiredGates, gateContext, ct);
+
+        // Get approval status
+        var approvalStatus = await _approvalGateway.GetStatusAsync(promotionId, ct);
+
+        // Apply decision rules
+        var outcome = _decisionRules.Evaluate(gateResults, approvalStatus, gateConfig);
+
+        var result = new DecisionResult
+        {
+            PromotionId = promotionId,
+            Outcome = outcome.Decision,
+            CanProceed = outcome.CanProceed,
+            BlockingReason = outcome.BlockingReason,
+            GateResults = gateResults,
+            ApprovalStatus = approvalStatus,
+            EvaluatedAt = _timeProvider.GetUtcNow(),
+            Duration = sw.Elapsed
+        };
+
+        // Record decision
+        await _decisionRecorder.RecordAsync(promotion, result, gateConfig, ct);
+
+        // Update promotion gate results
+        var updatedPromotion = promotion with { GateResults = gateResults };
+        await _promotionStore.SaveAsync(updatedPromotion, ct);
+
+        // Publish event
+        await _eventPublisher.PublishAsync(new PromotionDecisionMade(
+            promotionId,
+            promotion.TenantId,
+            result.Outcome,
+            result.CanProceed,
+            gateResults.Count(g => g.Passed),
+            gateResults.Count(g => !g.Passed),
+            _timeProvider.GetUtcNow()
+        ), ct);
+
+        _logger.LogInformation(
+            "Decision for promotion {PromotionId}: {Outcome} (proceed={CanProceed}) in {Duration}ms",
+            promotionId,
+            result.Outcome,
+            result.CanProceed,
+            sw.ElapsedMilliseconds);
+
+        return result;
+    }
+
+    private async Task<ImmutableArray<GateResult>> EvaluateGatesAsync(
+        ImmutableArray<string> gateNames,
+        GateContext context,
+        CancellationToken ct)
+    {
+        // Evaluate gates in parallel
+        var tasks = gateNames.Select(name =>
+            _gateEvaluator.EvaluateAsync(name, context, ct));
+
+        var gateResults = await Task.WhenAll(tasks);
+        return gateResults.ToImmutableArray();
+    }
+
+    private async Task<EnvironmentGateConfig> GetGateConfigAsync(
+        Guid environmentId,
+        CancellationToken ct)
+    {
+        var environment = await _environmentService.GetAsync(environmentId, ct)
+            ?? throw new EnvironmentNotFoundException(environmentId);
+
+        // Get configured gates for this environment
+        var configuredGates = await _environmentService.GetGatesAsync(environmentId, ct);
+
+        return new EnvironmentGateConfig
+        {
+            EnvironmentId = environmentId,
+            RequiredGates = configuredGates.Select(g => g.GateName).ToImmutableArray(),
+            RequiredApprovals = environment.RequiredApprovals,
+            RequireSeparationOfDuties = environment.RequireSeparationOfDuties,
+            AllGatesMustPass = true // Configurable in future
+        };
+    }
+
+    private static GateContext BuildGateContext(Promotion promotion) =>
+        new()
+        {
+            PromotionId = promotion.Id,
+            ReleaseId = promotion.ReleaseId,
+            ReleaseName = promotion.ReleaseName,
+            SourceEnvironmentId = promotion.SourceEnvironmentId,
+            TargetEnvironmentId = promotion.TargetEnvironmentId,
+            TargetEnvironmentName = promotion.TargetEnvironmentName,
+            Config = ImmutableDictionary<string, object?>.Empty,
+            RequestedBy = promotion.RequestedBy,
+            RequestedAt = promotion.RequestedAt
+        };
+
+    public async Task<DecisionRecord> GetDecisionRecordAsync(
+        Guid promotionId,
+        CancellationToken ct = default)
+    {
+        return await _decisionRecorder.GetLatestAsync(promotionId, ct)
+            ?? throw new DecisionRecordNotFoundException(promotionId);
+    }
+}
+```
+
+### DecisionRules
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Promotion.Decision;
+
+public sealed class DecisionRules
+{
+    public DecisionOutcomeResult Evaluate(
+        ImmutableArray<GateResult> gateResults,
+        ApprovalStatus approvalStatus,
+        EnvironmentGateConfig config)
+    {
+        // Check for blocking gate failures first
+        var blockingFailures = gateResults.Where(g => !g.Passed && g.Blocking).ToList();
+        if (blockingFailures.Count > 0)
+        {
+            return new DecisionOutcomeResult(
+                Decision: DecisionOutcome.Deny,
+                CanProceed: false,
+                BlockingReason: $"Blocked by gates: {string.Join(", ", blockingFailures.Select(g => g.GateName))}"
+            );
+        }
+
+        // Check for async gates waiting for callback
+        var pendingGates = gateResults
+            .Where(g => !g.Passed && g.Details.ContainsKey("waitingForConfirmation"))
+            .ToList();
+
+        if (pendingGates.Count > 0)
+        {
+            return new DecisionOutcomeResult(
+                Decision: DecisionOutcome.PendingGate,
+                CanProceed: false,
+                BlockingReason: $"Waiting for: {string.Join(", ", pendingGates.Select(g => g.GateName))}"
+            );
+        }
+
+        // Check if all gates must pass
+        if (config.AllGatesMustPass)
+        {
+            var failedGates = gateResults.Where(g => !g.Passed).ToList();
+            if (failedGates.Count > 0)
+            {
+                return new DecisionOutcomeResult(
+                    Decision: DecisionOutcome.Deny,
+                    CanProceed: false,
+                    BlockingReason: $"Failed gates: {string.Join(", ", failedGates.Select(g => g.GateName))}"
+                );
+            }
+        }
+
+        // Check approval status
+        if (approvalStatus.IsRejected)
+        {
+            return new DecisionOutcomeResult(
+                Decision: DecisionOutcome.Deny,
+                CanProceed: false,
+                BlockingReason: "Promotion was rejected"
+            );
+        }
+
+        if (!approvalStatus.IsApproved && config.RequiredApprovals > 0)
+        {
+            return new DecisionOutcomeResult(
+                Decision: DecisionOutcome.PendingApproval,
+                CanProceed: false,
+                BlockingReason: $"Awaiting approvals: {approvalStatus.CurrentApprovals}/{config.RequiredApprovals}"
+            );
+        }
+
+        // All checks passed
+        return new DecisionOutcomeResult(
+            Decision: DecisionOutcome.Allow,
+            CanProceed: true,
+            BlockingReason: null
+        );
+    }
+}
+
+public sealed record DecisionOutcomeResult(
+    DecisionOutcome Decision,
+    bool CanProceed,
+    string? BlockingReason
+);
+```
+
+### DecisionRecorder
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Promotion.Decision;
+
+public sealed class DecisionRecorder
+{
+    private readonly IDecisionRecordStore _store;
+    private readonly TimeProvider _timeProvider;
+    private readonly IGuidGenerator _guidGenerator;
+    private readonly ILogger<DecisionRecorder> _logger;
+
+    public async Task RecordAsync(
+        Promotion promotion,
+        DecisionResult result,
+        EnvironmentGateConfig config,
+        CancellationToken ct = default)
+    {
+        var record = new DecisionRecord
+        {
+            Id = _guidGenerator.NewGuid(),
+            PromotionId = promotion.Id,
+            TenantId = promotion.TenantId,
+            Outcome = result.Outcome,
+            OutcomeReason = result.BlockingReason ?? "All requirements met",
+            GateResults = result.GateResults,
+            Approvals = promotion.Approvals,
+            GateConfig = config,
+            EvaluatedAt = _timeProvider.GetUtcNow(),
+            EvaluatedBy = Guid.Empty, // System evaluation
+            EvidenceDigest = ComputeEvidenceDigest(result)
+        };
+
+        await _store.SaveAsync(record, ct);
+
+        _logger.LogDebug(
+            "Recorded decision {DecisionId} for promotion {PromotionId}: {Outcome}",
+            record.Id,
+            promotion.Id,
+            result.Outcome);
+    }
+
+    public async Task<DecisionRecord?> GetLatestAsync(
+        Guid promotionId,
+        CancellationToken ct = default)
+    {
+        return await _store.GetLatestAsync(promotionId, ct);
+    }
+
+    public async Task<IReadOnlyList<DecisionRecord>> GetHistoryAsync(
+        Guid promotionId,
+        CancellationToken ct = default)
+    {
+        return await _store.ListByPromotionAsync(promotionId, ct);
+    }
+
+    private static string ComputeEvidenceDigest(DecisionResult result)
+    {
+        // Create canonical representation and hash
+        var evidence = new
+        {
+            result.PromotionId,
+            result.Outcome,
+            result.EvaluatedAt,
+            Gates = result.GateResults.Select(g => new
+            {
+                g.GateName,
+                g.Passed,
+                g.Message
+            }).OrderBy(g => g.GateName),
+            Approvals = result.ApprovalStatus.Approvals.Select(a => new
+            {
+                a.UserId,
+                a.Decision,
+                a.DecidedAt
+            }).OrderBy(a => a.DecidedAt)
+        };
+
+        var json = CanonicalJsonSerializer.Serialize(evidence);
+        var hash = SHA256.HashData(Encoding.UTF8.GetBytes(json));
+        return $"sha256:{Convert.ToHexString(hash).ToLowerInvariant()}";
+    }
+}
+```
+
+### DecisionNotifier
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Promotion.Decision;
+
+public sealed class DecisionNotifier
+{
+    private readonly INotificationService _notificationService;
+    private readonly IPromotionStore _promotionStore;
+    private readonly ILogger<DecisionNotifier> _logger;
+
+    public async Task NotifyDecisionAsync(
+        DecisionResult result,
+        CancellationToken ct = default)
+    {
+        var promotion = await _promotionStore.GetAsync(result.PromotionId, ct);
+        if (promotion is null)
+            return;
+
+        var notification = result.Outcome switch
+        {
+            DecisionOutcome.Allow => BuildAllowNotification(promotion, result),
+            DecisionOutcome.Deny => BuildDenyNotification(promotion, result),
+            DecisionOutcome.PendingApproval => BuildPendingApprovalNotification(promotion, result),
+            _ => null
+        };
+
+        if (notification is not null)
+        {
+            await _notificationService.SendAsync(notification, ct);
+
+            _logger.LogInformation(
+                "Sent {Outcome} notification for promotion {PromotionId}",
+                result.Outcome,
+                result.PromotionId);
+        }
+    }
+
+    private static NotificationRequest BuildAllowNotification(
+        Promotion promotion,
+        DecisionResult result) =>
+        new()
+        {
+            Channel = "slack",
+            Title = $"Promotion Approved: {promotion.ReleaseName}",
+            Message = $"Release '{promotion.ReleaseName}' has been approved for deployment to {promotion.TargetEnvironmentName}.",
+            Severity = NotificationSeverity.Info,
+            Metadata = new Dictionary<string, string>
+            {
+                ["promotionId"] = promotion.Id.ToString(),
+                ["outcome"] = "allow"
+            }
+        };
+
+    private static NotificationRequest BuildDenyNotification(
+        Promotion promotion,
+        DecisionResult result) =>
+        new()
+        {
+            Channel = "slack",
+            Title = $"Promotion Blocked: {promotion.ReleaseName}",
+            Message = $"Release '{promotion.ReleaseName}' promotion to {promotion.TargetEnvironmentName} was blocked.\n\nReason: {result.BlockingReason}",
+            Severity = NotificationSeverity.Warning,
+            Metadata = new Dictionary<string, string>
+            {
+                ["promotionId"] = promotion.Id.ToString(),
+                ["outcome"] = "deny"
+            }
+        };
+
+    private static NotificationRequest BuildPendingApprovalNotification(
+        Promotion promotion,
+        DecisionResult result) =>
+        new()
+        {
+            Channel = "slack",
+            Title = $"Approval Required: {promotion.ReleaseName}",
+            Message = $"Release '{promotion.ReleaseName}' is awaiting approval for deployment to {promotion.TargetEnvironmentName}.\n\n{result.BlockingReason}",
+            Severity = NotificationSeverity.Info,
+            Metadata = new Dictionary<string, string>
+            {
+                ["promotionId"] = promotion.Id.ToString(),
+                ["outcome"] = "pending_approval"
+            }
+        };
+}
+```
+
+### Domain Events
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Promotion.Events;
+
+public sealed record PromotionDecisionMade(
+    Guid PromotionId,
+    Guid TenantId,
+    DecisionOutcome Outcome,
+    bool CanProceed,
+    int PassedGates,
+    int FailedGates,
+    DateTimeOffset DecidedAt
+) : IDomainEvent;
+
+public sealed record PromotionReadyForDeployment(
+    Guid PromotionId,
+    Guid TenantId,
+    Guid ReleaseId,
+    Guid TargetEnvironmentId,
+    DateTimeOffset ReadyAt
+) : IDomainEvent;
+```
+
+---
+
+## Acceptance Criteria
+
+- [ ] Evaluate all configured gates
+- [ ] Combine gate results with approvals
+- [ ] Deny on blocking gate failure
+- [ ] Pending on approval required
+- [ ] Allow when all requirements met
+- [ ] Record decision with evidence
+- [ ] Compute evidence digest
+- [ ] Notify on decision
+- [ ] Support decision history
+- [ ] Unit test coverage >=85%
+
+---
+
+## Test Plan
+
+### Unit Tests
+
+| Test | Description |
+|------|-------------|
+| `Evaluate_AllGatesPass_AllApprovals_Allows` | Allow case |
+| `Evaluate_BlockingGateFails_Denies` | Deny case |
+| `Evaluate_PendingApprovals_ReturnsPending` | Pending case |
+| `DecisionRules_AllMustPass_AnyFails_Denies` | Rule logic |
+| `DecisionRecorder_ComputesDigest` | Evidence hash |
+| `DecisionRecorder_SavesHistory` | History tracking |
+
+### Integration Tests
+
+| Test | Description |
+|------|-------------|
+| `DecisionEngine_E2E` | Full evaluation flow |
+| `DecisionHistory_E2E` | Multiple decisions |
+
+---
+
+## Dependencies
+
+| Dependency | Type | Status |
+|------------|------|--------|
+| 106_002 Approval Gateway | Internal | TODO |
+| 106_003 Gate Registry | Internal | TODO |
+| 106_004 Security Gate | Internal | TODO |
+
+---
+
+## Delivery Tracker
+
+| Deliverable | Status | Notes |
+|-------------|--------|-------|
+| IDecisionEngine | TODO | |
+| DecisionEngine | TODO | |
+| DecisionRules | TODO | |
+| DecisionRecorder | TODO | |
+| DecisionNotifier | TODO | |
+| DecisionResult model | TODO | |
+| DecisionRecord model | TODO | |
+| IDecisionRecordStore | TODO | |
+| DecisionRecordStore | TODO | |
+| Domain events | TODO | |
+| Unit tests | TODO | |
+
+---
+
+## Execution Log
+
+| Date | Entry |
+|------|-------|
+| 10-Jan-2026 | Sprint created |
diff --git a/docs/implplan/SPRINT_20260110_107_000_INDEX_deployment_execution.md b/docs/implplan/SPRINT_20260110_107_000_INDEX_deployment_execution.md
new file mode 100644
index 000000000..71abb47f8
--- /dev/null
+++ b/docs/implplan/SPRINT_20260110_107_000_INDEX_deployment_execution.md
@@ -0,0 +1,254 @@
+# SPRINT INDEX: Phase 7 - Deployment Execution
+
+> **Epic:** Release Orchestrator
+> **Phase:** 7 - Deployment Execution
+> **Batch:** 107
+> **Status:** TODO
+> **Parent:** [100_000_INDEX](SPRINT_20260110_100_000_INDEX_release_orchestrator.md)
+
+---
+
+## Overview
+
+Phase 7 implements the Deployment Execution system - orchestrating the actual deployment of releases to targets via agents.
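+
+As a rough sketch of the batch planning behind the rolling strategy covered in this phase (illustrative only — the `PlanRollingBatches` helper, its percentage-string `batchSize`, and bare `Guid` target ids are assumptions, not deliverables of any sprint below):
+
+```csharp
+// Illustrative sketch: split target ids into rolling batches by percentage.
+// The real parsing/validation belongs to the 107_005 strategy implementations.
+public static class RollingBatchSketch
+{
+    public static IReadOnlyList<IReadOnlyList<Guid>> PlanRollingBatches(
+        IReadOnlyList<Guid> targets, string batchSize = "25%")
+    {
+        var percent = int.Parse(batchSize.TrimEnd('%'));
+        // Round up so every target lands in some batch, and never batch zero targets.
+        var perBatch = Math.Max(1, (int)Math.Ceiling(targets.Count * percent / 100.0));
+
+        var batches = new List<IReadOnlyList<Guid>>();
+        for (var i = 0; i < targets.Count; i += perBatch)
+        {
+            batches.Add(targets.Skip(i).Take(perBatch).ToList());
+        }
+        return batches;
+    }
+}
+```
+
+With 10 targets and "25%" this yields four batches of 3/3/3/1, matching the incremental fill shown in the strategy diagram below.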
+ +### Objectives + +- Deploy orchestrator coordinates multi-target deployments +- Target executor dispatches tasks to agents +- Artifact generator creates deployment artifacts +- Rollback manager handles failure recovery +- Deployment strategies (rolling, blue-green, canary) + +--- + +## Sprint Structure + +| Sprint ID | Title | Module | Status | Dependencies | +|-----------|-------|--------|--------|--------------| +| 107_001 | Deploy Orchestrator | DEPLOY | TODO | 105_003, 106_005 | +| 107_002 | Target Executor | DEPLOY | TODO | 107_001, 103_002 | +| 107_003 | Artifact Generator | DEPLOY | TODO | 107_001 | +| 107_004 | Rollback Manager | DEPLOY | TODO | 107_002 | +| 107_005 | Deployment Strategies | DEPLOY | TODO | 107_002 | + +--- + +## Architecture + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ DEPLOYMENT EXECUTION │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ DEPLOY ORCHESTRATOR (107_001) │ │ +│ │ │ │ +│ │ ┌─────────────────────────────────────────────────────────────────┐ │ │ +│ │ │ Deployment Job │ │ │ +│ │ │ promotion_id: uuid │ │ │ +│ │ │ strategy: rolling │ │ │ +│ │ │ targets: [target-1, target-2, target-3] │ │ │ +│ │ │ status: deploying │ │ │ +│ │ └─────────────────────────────────────────────────────────────────┘ │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ TARGET EXECUTOR (107_002) │ │ +│ │ │ │ +│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │ +│ │ │ Target 1 │ │ Target 2 │ │ Target 3 │ │ │ +│ │ │ ✓ Done │ │ ⟳ Running │ │ ○ Pending │ │ │ +│ │ └─────────────┘ └─────────────┘ └─────────────┘ │ │ +│ │ │ │ +│ │ Task dispatch via gRPC to agents │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ ARTIFACT GENERATOR 
(107_003) │ │ +│ │ │ │ +│ │ Generated artifacts for each deployment: │ │ +│ │ ├── compose.stella.lock.yml (digested compose file) │ │ +│ │ ├── stella.version.json (version sticker) │ │ +│ │ └── deployment-manifest.json (full deployment record) │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ ROLLBACK MANAGER (107_004) │ │ +│ │ │ │ +│ │ On failure: │ │ +│ │ 1. Stop pending tasks │ │ +│ │ 2. Rollback completed targets to previous version │ │ +│ │ 3. Generate rollback evidence │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ DEPLOYMENT STRATEGIES (107_005) │ │ +│ │ │ │ +│ │ Rolling: [■■□□□] → [■■■□□] → [■■■■□] → [■■■■■] │ │ +│ │ Blue-Green: [■■■■■] ──swap──► [□□□□□] (instant cutover) │ │ +│ │ Canary: [■□□□□] → [■■□□□] → [■■■□□] → [■■■■■] (gradual) │ │ +│ │ All-at-once: [□□□□□] → [■■■■■] (simultaneous) │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +--- + +## Deliverables Summary + +### 107_001: Deploy Orchestrator + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `IDeployOrchestrator` | Interface | Deployment coordination | +| `DeployOrchestrator` | Class | Implementation | +| `DeploymentJob` | Model | Job entity | +| `DeploymentScheduler` | Class | Task scheduling | + +### 107_002: Target Executor + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `ITargetExecutor` | Interface | Target deployment | +| `TargetExecutor` | Class | Implementation | +| `DeploymentTask` | Model | Per-target task | +| `AgentDispatcher` | Class | gRPC task dispatch | + +### 107_003: Artifact Generator + +| Deliverable | Type | Description | 
+|-------------|------|-------------|
+| `IArtifactGenerator` | Interface | Artifact creation |
+| `ComposeLockGenerator` | Class | Digest-locked compose |
+| `VersionStickerGenerator` | Class | stella.version.json |
+| `DeploymentManifestGenerator` | Class | Full manifest |
+
+### 107_004: Rollback Manager
+
+| Deliverable | Type | Description |
+|-------------|------|-------------|
+| `IRollbackManager` | Interface | Rollback operations |
+| `RollbackManager` | Class | Implementation |
+| `RollbackPlan` | Model | Rollback strategy |
+| `RollbackExecutor` | Class | Execute rollback |
+
+### 107_005: Deployment Strategies
+
+| Deliverable | Type | Description |
+|-------------|------|-------------|
+| `IDeploymentStrategy` | Interface | Strategy contract |
+| `RollingStrategy` | Strategy | Rolling deployment |
+| `BlueGreenStrategy` | Strategy | Blue-green deployment |
+| `CanaryStrategy` | Strategy | Canary deployment |
+| `AllAtOnceStrategy` | Strategy | Simultaneous deployment |
+
+---
+
+## Key Interfaces
+
+```csharp
+public interface IDeployOrchestrator
+{
+    Task<DeploymentJob> StartAsync(Guid promotionId, DeploymentOptions options, CancellationToken ct);
+    Task<DeploymentJob?> GetJobAsync(Guid jobId, CancellationToken ct);
+    Task CancelAsync(Guid jobId, CancellationToken ct);
+    Task<DeploymentJob> WaitForCompletionAsync(Guid jobId, CancellationToken ct);
+}
+
+public interface ITargetExecutor
+{
+    Task<DeploymentTask> DeployToTargetAsync(Guid jobId, Guid targetId, DeploymentPayload payload, CancellationToken ct);
+    Task<DeploymentTask?> GetTaskAsync(Guid taskId, CancellationToken ct);
+}
+
+public interface IDeploymentStrategy
+{
+    string Name { get; }
+    Task<IReadOnlyList<DeploymentBatch>> PlanAsync(DeploymentJob job, CancellationToken ct);
+    Task<bool> ShouldProceedAsync(DeploymentBatch completedBatch, CancellationToken ct);
+}
+
+public interface IRollbackManager
+{
+    Task<RollbackPlan> PlanAsync(Guid jobId, CancellationToken ct);
+    Task ExecuteAsync(RollbackPlan plan, CancellationToken ct);
+}
+```
+
+---
+
+## Deployment Flow
+
+```
+┌──────────────────────────────────────────────────────────────────────────────┐ +│ DEPLOYMENT FLOW │ +│ │ +│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ +│ │ Promotion │───►│ Decision │───►│ Deploy │───►│ Generate │ │ +│ │ Approved │ │ Allow │ │ Start │ │ Artifacts │ │ +│ └─────────────┘ └─────────────┘ └─────────────┘ └──────┬──────┘ │ +│ │ │ +│ ┌─────────────────────────────────────────────┘ │ +│ │ │ +│ ▼ │ +│ ┌─────────────────────────────────────────────────────────────────────────┐│ +│ │ Strategy Execution ││ +│ │ ││ +│ │ Batch 1 Batch 2 Batch 3 ││ +│ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ ││ +│ │ │Target-1 │ ──► │Target-2 │ ──► │Target-3 │ ││ +│ │ │ ✓ Done │ │ ✓ Done │ │ ⟳ Active │ ││ +│ │ └─────────┘ └─────────┘ └─────────┘ ││ +│ │ │ │ │ ││ +│ │ ▼ ▼ ▼ ││ +│ │ Health Check Health Check Health Check ││ +│ │ │ │ │ ││ +│ │ ▼ ▼ ▼ ││ +│ │ Write Sticker Write Sticker Write Sticker ││ +│ └─────────────────────────────────────────────────────────────────────────┘│ +│ │ +│ ┌─────────────────────────────────────────────────────┐ │ +│ │ On Failure │ │ +│ │ │ │ +│ │ 1. Stop pending batches │ │ +│ │ 2. Rollback completed targets │ │ +│ │ 3. Generate rollback evidence │ │ +│ │ 4. 
Update promotion status │ │ +│ └─────────────────────────────────────────────────────┘ │ +│ │ +└──────────────────────────────────────────────────────────────────────────────┘ +``` + +--- + +## Dependencies + +| Module | Purpose | +|--------|---------| +| 105_003 Workflow Engine | Workflow execution | +| 106_005 Decision Engine | Deployment approval | +| 103_002 Target Registry | Target information | +| 108_* Agents | Task execution | + +--- + +## Acceptance Criteria + +- [ ] Deployment job created from promotion +- [ ] Tasks dispatched to agents +- [ ] Rolling deployment works +- [ ] Blue-green deployment works +- [ ] Canary deployment works +- [ ] Artifacts generated for each target +- [ ] Rollback restores previous version +- [ ] Health checks gate progression +- [ ] Unit test coverage ≥80% + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Phase 7 index created | diff --git a/docs/implplan/SPRINT_20260110_107_001_DEPLOY_orchestrator.md b/docs/implplan/SPRINT_20260110_107_001_DEPLOY_orchestrator.md new file mode 100644 index 000000000..666a70b9a --- /dev/null +++ b/docs/implplan/SPRINT_20260110_107_001_DEPLOY_orchestrator.md @@ -0,0 +1,410 @@ +# SPRINT: Deploy Orchestrator + +> **Sprint ID:** 107_001 +> **Module:** DEPLOY +> **Phase:** 7 - Deployment Execution +> **Status:** TODO +> **Parent:** [107_000_INDEX](SPRINT_20260110_107_000_INDEX_deployment_execution.md) + +--- + +## Overview + +Implement the Deploy Orchestrator for coordinating multi-target deployments. 
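+
+For orientation, a hedged usage sketch of the orchestrator API this sprint specifies follows; the `DeployOnDecision` caller, DI wiring, and timeout value are assumptions for illustration, not deliverables:
+
+```csharp
+// Illustrative caller; IDeployOrchestrator and the option types come from this sprint.
+public sealed class DeployOnDecision
+{
+    private readonly IDeployOrchestrator _orchestrator;
+
+    public DeployOnDecision(IDeployOrchestrator orchestrator) => _orchestrator = orchestrator;
+
+    public async Task<DeploymentJob> DeployAsync(Guid promotionId, CancellationToken ct)
+    {
+        var options = new DeploymentOptions(
+            Strategy: DeploymentStrategy.Rolling,
+            BatchSize: "25%",
+            RollbackOnFailure: true);
+
+        var job = await _orchestrator.StartAsync(promotionId, options, ct);
+
+        // Block until the job reaches a terminal status (Completed/Failed/Cancelled).
+        return await _orchestrator.WaitForCompletionAsync(job.Id, timeout: TimeSpan.FromMinutes(30), ct);
+    }
+}
+```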
+
+### Objectives
+
+- Create deployment jobs from approved promotions
+- Coordinate deployment across multiple targets
+- Track deployment progress and status
+- Support deployment cancellation
+
+### Working Directory
+
+```
+src/ReleaseOrchestrator/
+├── __Libraries/
+│   └── StellaOps.ReleaseOrchestrator.Deployment/
+│       ├── Orchestrator/
+│       │   ├── IDeployOrchestrator.cs
+│       │   ├── DeployOrchestrator.cs
+│       │   ├── DeploymentCoordinator.cs
+│       │   └── DeploymentScheduler.cs
+│       ├── Store/
+│       │   ├── IDeploymentJobStore.cs
+│       │   └── DeploymentJobStore.cs
+│       └── Models/
+│           ├── DeploymentJob.cs
+│           ├── DeploymentOptions.cs
+│           └── DeploymentStatus.cs
+└── __Tests/
+```
+
+---
+
+## Deliverables
+
+### IDeployOrchestrator Interface
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Deployment.Orchestrator;
+
+public interface IDeployOrchestrator
+{
+    Task<DeploymentJob> StartAsync(Guid promotionId, DeploymentOptions options, CancellationToken ct = default);
+    Task<DeploymentJob?> GetJobAsync(Guid jobId, CancellationToken ct = default);
+    Task<IReadOnlyList<DeploymentJob>> ListJobsAsync(DeploymentJobFilter? filter = null, CancellationToken ct = default);
+    Task CancelAsync(Guid jobId, string? reason = null, CancellationToken ct = default);
+    Task<DeploymentJob> WaitForCompletionAsync(Guid jobId, TimeSpan? timeout = null, CancellationToken ct = default);
+    Task<DeploymentProgress> GetProgressAsync(Guid jobId, CancellationToken ct = default);
+}
+
+public sealed record DeploymentOptions(
+    DeploymentStrategy Strategy = DeploymentStrategy.Rolling,
+    string? BatchSize = "25%",
+    bool WaitForHealthCheck = true,
+    bool RollbackOnFailure = true,
+    TimeSpan? Timeout = null,
+    Guid? WorkflowRunId = null,
+    string? CallbackToken = null
+);
+
+public enum DeploymentStrategy
+{
+    Rolling,
+    BlueGreen,
+    Canary,
+    AllAtOnce
+}
+```
+
+### DeploymentJob Model
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Deployment.Models;
+
+public sealed record DeploymentJob
+{
+    public required Guid Id { get; init; }
+    public required Guid TenantId { get; init; }
+    public required Guid PromotionId { get; init; }
+    public required Guid ReleaseId { get; init; }
+    public required string ReleaseName { get; init; }
+    public required Guid EnvironmentId { get; init; }
+    public required string EnvironmentName { get; init; }
+    public required DeploymentStatus Status { get; init; }
+    public required DeploymentStrategy Strategy { get; init; }
+    public required DeploymentOptions Options { get; init; }
+    public required ImmutableArray<DeploymentTask> Tasks { get; init; }
+    public string? FailureReason { get; init; }
+    public string? CancelReason { get; init; }
+    public DateTimeOffset StartedAt { get; init; }
+    public DateTimeOffset? CompletedAt { get; init; }
+    public Guid StartedBy { get; init; }
+    public Guid? RollbackJobId { get; init; }
+
+    public TimeSpan? Duration => CompletedAt.HasValue
+        ? CompletedAt.Value - StartedAt
+        : null;
+
+    public int CompletedTaskCount => Tasks.Count(t => t.Status == DeploymentTaskStatus.Completed);
+    public int TotalTaskCount => Tasks.Length;
+    public double ProgressPercent => TotalTaskCount > 0
+        ? (double)CompletedTaskCount / TotalTaskCount * 100
+        : 0;
+}
+
+public enum DeploymentStatus
+{
+    Pending,
+    Running,
+    Completed,
+    Failed,
+    Cancelled,
+    RollingBack,
+    RolledBack
+}
+
+public sealed record DeploymentTask
+{
+    public required Guid Id { get; init; }
+    public required Guid TargetId { get; init; }
+    public required string TargetName { get; init; }
+    public required int BatchIndex { get; init; }
+    public required DeploymentTaskStatus Status { get; init; }
+    public string? AgentId { get; init; }
+    public DateTimeOffset? StartedAt { get; init; }
+    public DateTimeOffset? CompletedAt { get; init; }
+    public string? Error { get; init; }
+    public ImmutableDictionary<string, string> Result { get; init; } = ImmutableDictionary<string, string>.Empty;
+}
+
+public enum DeploymentTaskStatus
+{
+    Pending,
+    Running,
+    Completed,
+    Failed,
+    Skipped,
+    Cancelled
+}
+```
+
+### DeployOrchestrator Implementation
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Deployment.Orchestrator;
+
+public sealed class DeployOrchestrator : IDeployOrchestrator
+{
+    private readonly IDeploymentJobStore _jobStore;
+    private readonly IPromotionManager _promotionManager;
+    private readonly IReleaseManager _releaseManager;
+    private readonly ITargetRegistry _targetRegistry;
+    private readonly IDeploymentStrategyFactory _strategyFactory;
+    private readonly ITargetExecutor _targetExecutor;
+    private readonly IArtifactGenerator _artifactGenerator;
+    private readonly IEventPublisher _eventPublisher;
+    private readonly ITenantContext _tenantContext;
+    private readonly IUserContext _userContext;
+    private readonly TimeProvider _timeProvider;
+    private readonly IGuidGenerator _guidGenerator;
+    private readonly ILogger<DeployOrchestrator> _logger;
+
+    public async Task<DeploymentJob> StartAsync(
+        Guid promotionId,
+        DeploymentOptions options,
+        CancellationToken ct = default)
+    {
+        var promotion = await _promotionManager.GetAsync(promotionId, ct)
+            ?? throw new PromotionNotFoundException(promotionId);
+
+        if (promotion.Status != PromotionStatus.Approved)
+        {
+            throw new PromotionNotApprovedException(promotionId);
+        }
+
+        var release = await _releaseManager.GetAsync(promotion.ReleaseId, ct)
+            ?? throw new ReleaseNotFoundException(promotion.ReleaseId);
+
+        var targets = await _targetRegistry.ListHealthyAsync(promotion.TargetEnvironmentId, ct);
+        if (targets.Count == 0)
+        {
+            throw new NoHealthyTargetsException(promotion.TargetEnvironmentId);
+        }
+
+        // Create deployment tasks for each target
+        var tasks = targets.Select(target => new DeploymentTask
+        {
+            Id = _guidGenerator.NewGuid(),
+            TargetId = target.Id,
+            TargetName = target.Name,
+            BatchIndex = 0, // Will be set by strategy
+            Status = DeploymentTaskStatus.Pending
+        }).ToImmutableArray();
+
+        var job = new DeploymentJob
+        {
+            Id = _guidGenerator.NewGuid(),
+            TenantId = _tenantContext.TenantId,
+            PromotionId = promotionId,
+            ReleaseId = release.Id,
+            ReleaseName = release.Name,
+            EnvironmentId = promotion.TargetEnvironmentId,
+            EnvironmentName = promotion.TargetEnvironmentName,
+            Status = DeploymentStatus.Pending,
+            Strategy = options.Strategy,
+            Options = options,
+            Tasks = tasks,
+            StartedAt = _timeProvider.GetUtcNow(),
+            StartedBy = _userContext.UserId
+        };
+
+        await _jobStore.SaveAsync(job, ct);
+
+        // Update promotion status
+        await _promotionManager.UpdateStatusAsync(promotionId, PromotionStatus.Deploying, ct);
+
+        await _eventPublisher.PublishAsync(new DeploymentJobStarted(
+            job.Id,
+            job.TenantId,
+            job.ReleaseName,
+            job.EnvironmentName,
+            job.Strategy,
+            targets.Count,
+            _timeProvider.GetUtcNow()
+        ), ct);
+
+        _logger.LogInformation(
+            "Started deployment job {JobId} for release {Release} to {Environment} with {TargetCount} targets",
+            job.Id, release.Name, promotion.TargetEnvironmentName, targets.Count);
+
+        // Start deployment execution (fire-and-forget; failures are recorded on the job)
+        _ = ExecuteDeploymentAsync(job.Id, ct);
+
+        return job;
+    }
+
+    private async Task ExecuteDeploymentAsync(Guid jobId, CancellationToken ct)
+    {
+        try
+        {
+            var job = await _jobStore.GetAsync(jobId, ct);
+            if (job is null) return;
+
+            job = job with { Status = DeploymentStatus.Running };
+            await _jobStore.SaveAsync(job, ct);
+
+            // Get strategy and plan batches
+            var strategy = _strategyFactory.Create(job.Strategy);
+            var batches = await strategy.PlanAsync(job, ct);
+
+            // Execute batches
+            foreach (var batch in batches)
+            {
+                job = await _jobStore.GetAsync(jobId, ct);
+                if (job is null || job.Status == DeploymentStatus.Cancelled) break;
+
+                await ExecuteBatchAsync(job, batch, ct);
+
+                // Check if should continue
+                if (!await strategy.ShouldProceedAsync(batch, ct))
+                {
+                    _logger.LogWarning("Strategy halted deployment after batch {BatchIndex}", batch.Index);
+                    break;
+                }
+            }
+
+            // Complete or fail
+            job = await _jobStore.GetAsync(jobId, ct);
+            if (job is not null && job.Status == DeploymentStatus.Running)
+            {
+                var allCompleted = job.Tasks.All(t => t.Status == DeploymentTaskStatus.Completed);
+                job = job with
+                {
+                    Status = allCompleted ? DeploymentStatus.Completed : DeploymentStatus.Failed,
+                    CompletedAt = _timeProvider.GetUtcNow()
+                };
+                await _jobStore.SaveAsync(job, ct);
+
+                await NotifyCompletionAsync(job, ct);
+            }
+        }
+        catch (Exception ex)
+        {
+            _logger.LogError(ex, "Deployment job {JobId} failed", jobId);
+            await FailJobAsync(jobId, ex.Message, ct);
+        }
+    }
+
+    private async Task ExecuteBatchAsync(DeploymentJob job, DeploymentBatch batch, CancellationToken ct)
+    {
+        _logger.LogInformation("Executing batch {BatchIndex} with {TaskCount} tasks",
+            batch.Index, batch.TaskIds.Count);
+
+        // Generate artifacts
+        var payload = await _artifactGenerator.GeneratePayloadAsync(job, ct);
+
+        // Execute tasks in parallel within batch
+        var tasks = batch.TaskIds.Select(taskId =>
+            _targetExecutor.DeployToTargetAsync(job.Id, taskId, payload, ct));
+
+        await Task.WhenAll(tasks);
+    }
+
+    public async Task CancelAsync(Guid jobId, string? reason = null, CancellationToken ct = default)
+    {
+        var job = await _jobStore.GetAsync(jobId, ct)
+            ?? throw new DeploymentJobNotFoundException(jobId);
+
+        if (job.Status != DeploymentStatus.Running && job.Status != DeploymentStatus.Pending)
+        {
+            throw new DeploymentJobNotCancellableException(jobId);
+        }
+
+        job = job with
+        {
+            Status = DeploymentStatus.Cancelled,
+            CancelReason = reason,
+            CompletedAt = _timeProvider.GetUtcNow()
+        };
+
+        await _jobStore.SaveAsync(job, ct);
+
+        await _eventPublisher.PublishAsync(new DeploymentJobCancelled(
+            jobId, job.TenantId, reason, _timeProvider.GetUtcNow()
+        ), ct);
+    }
+
+    public async Task<DeploymentProgress> GetProgressAsync(Guid jobId, CancellationToken ct = default)
+    {
+        var job = await _jobStore.GetAsync(jobId, ct)
+            ?? throw new DeploymentJobNotFoundException(jobId);
+
+        return new DeploymentProgress(
+            JobId: job.Id,
+            Status: job.Status,
+            TotalTargets: job.TotalTaskCount,
+            CompletedTargets: job.CompletedTaskCount,
+            FailedTargets: job.Tasks.Count(t => t.Status == DeploymentTaskStatus.Failed),
+            PendingTargets: job.Tasks.Count(t => t.Status == DeploymentTaskStatus.Pending),
+            ProgressPercent: job.ProgressPercent,
+            CurrentBatch: job.Tasks.Where(t => t.Status == DeploymentTaskStatus.Running).Select(t => t.BatchIndex).FirstOrDefault()
+        );
+    }
+
+    // NotifyCompletionAsync and FailJobAsync helpers omitted for brevity.
+}
+
+public sealed record DeploymentProgress(
+    Guid JobId,
+    DeploymentStatus Status,
+    int TotalTargets,
+    int CompletedTargets,
+    int FailedTargets,
+    int PendingTargets,
+    double ProgressPercent,
+    int CurrentBatch
+);
+```
+
+---
+
+## Acceptance Criteria
+
+- [ ] Create deployment job from promotion
+- [ ] Coordinate multi-target deployment
+- [ ] Track task progress per target
+- [ ] Cancel running deployment
+- [ ] Wait for deployment completion
+- [ ] Report deployment progress
+- [ ] Handle deployment failures
+- [ ] Unit test coverage >=85%
+
+---
+
+## Dependencies
+
+| Dependency | Type | Status |
+|------------|------|--------|
+| 106_005 Decision Engine | Internal | TODO |
+| 103_002 Target Registry | Internal | TODO |
+| 107_002 Target Executor | Internal | TODO |
+
+---
+
+## Delivery Tracker
+
+| Deliverable | Status | Notes |
+|-------------|--------|-------|
+| IDeployOrchestrator | TODO | |
+| DeployOrchestrator | TODO | |
+| DeploymentCoordinator | TODO | |
+| DeploymentScheduler | TODO | |
+| DeploymentJob model | TODO | |
+| IDeploymentJobStore | TODO | |
+| Unit tests | TODO | |
+
+---
+
+## Execution Log
+
+| Date | Entry |
+|------|-------|
+| 10-Jan-2026 | Sprint created |
diff --git a/docs/implplan/SPRINT_20260110_107_002_DEPLOY_target_executor.md b/docs/implplan/SPRINT_20260110_107_002_DEPLOY_target_executor.md
new file mode 100644
index 000000000..a4fd21d52
--- /dev/null
+++ b/docs/implplan/SPRINT_20260110_107_002_DEPLOY_target_executor.md
@@ -0,0 +1,367 @@
+# SPRINT: Target Executor
+
+> **Sprint ID:** 107_002
+> **Module:** DEPLOY
+> **Phase:** 7 - Deployment Execution
+> **Status:** TODO
+> **Parent:** [107_000_INDEX](SPRINT_20260110_107_000_INDEX_deployment_execution.md)
+
+---
+
+## Overview
+
+Implement the Target Executor for dispatching deployment tasks to agents.
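+
+For orientation, a sketch of how a batch of tasks is expected to be fanned out through this executor (hypothetical helper; the `ITargetExecutor` surface is specified under Deliverables below):
+
+```csharp
+// Sketch only: jobId, taskIds, and payload come from the Deploy Orchestrator (107_001).
+public static async Task<bool> DeployBatchAsync(
+    ITargetExecutor executor,
+    Guid jobId,
+    IReadOnlyList<Guid> taskIds,
+    DeploymentPayload payload,
+    CancellationToken ct)
+{
+    // Targets within a batch deploy in parallel; batches run sequentially.
+    var tasks = taskIds.Select(id => executor.DeployToTargetAsync(jobId, id, payload, ct));
+    var results = await Task.WhenAll(tasks);
+
+    // The batch gates progression: proceed only if every target completed.
+    return results.All(t => t.Status == DeploymentTaskStatus.Completed);
+}
+```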
+ +### Objectives + +- Dispatch deployment tasks to agents via gRPC +- Track task execution status +- Handle task timeouts and retries +- Collect task results and logs + +### Working Directory + +``` +src/ReleaseOrchestrator/ +├── __Libraries/ +│ └── StellaOps.ReleaseOrchestrator.Deployment/ +│ ├── Executor/ +│ │ ├── ITargetExecutor.cs +│ │ ├── TargetExecutor.cs +│ │ ├── AgentDispatcher.cs +│ │ └── TaskResultCollector.cs +│ └── Models/ +│ ├── DeploymentPayload.cs +│ └── TaskResult.cs +└── __Tests/ +``` + +--- + +## Deliverables + +### ITargetExecutor Interface + +```csharp +namespace StellaOps.ReleaseOrchestrator.Deployment.Executor; + +public interface ITargetExecutor +{ + Task DeployToTargetAsync( + Guid jobId, + Guid taskId, + DeploymentPayload payload, + CancellationToken ct = default); + + Task GetTaskAsync(Guid taskId, CancellationToken ct = default); + Task CancelTaskAsync(Guid taskId, CancellationToken ct = default); + Task GetTaskLogsAsync(Guid taskId, CancellationToken ct = default); +} + +public sealed record DeploymentPayload +{ + public required Guid ReleaseId { get; init; } + public required string ReleaseName { get; init; } + public required ImmutableArray Components { get; init; } + public required string ComposeLock { get; init; } + public required string VersionSticker { get; init; } + public required string DeploymentManifest { get; init; } + public ImmutableDictionary Variables { get; init; } = ImmutableDictionary.Empty; +} + +public sealed record DeploymentComponent( + string Name, + string Image, + string Digest, + ImmutableDictionary Config +); +``` + +### TargetExecutor Implementation + +```csharp +namespace StellaOps.ReleaseOrchestrator.Deployment.Executor; + +public sealed class TargetExecutor : ITargetExecutor +{ + private readonly IDeploymentJobStore _jobStore; + private readonly ITargetRegistry _targetRegistry; + private readonly IAgentManager _agentManager; + private readonly AgentDispatcher _dispatcher; + private readonly 
TaskResultCollector _resultCollector; + private readonly IEventPublisher _eventPublisher; + private readonly TimeProvider _timeProvider; + private readonly ILogger _logger; + + public async Task DeployToTargetAsync( + Guid jobId, + Guid taskId, + DeploymentPayload payload, + CancellationToken ct = default) + { + var job = await _jobStore.GetAsync(jobId, ct) + ?? throw new DeploymentJobNotFoundException(jobId); + + var task = job.Tasks.FirstOrDefault(t => t.Id == taskId) + ?? throw new DeploymentTaskNotFoundException(taskId); + + var target = await _targetRegistry.GetAsync(task.TargetId, ct) + ?? throw new TargetNotFoundException(task.TargetId); + + if (target.AgentId is null) + { + throw new NoAgentAssignedException(target.Id); + } + + var agent = await _agentManager.GetAsync(target.AgentId.Value, ct); + if (agent?.Status != AgentStatus.Active) + { + throw new AgentNotActiveException(target.AgentId.Value); + } + + // Update task status + task = task with + { + Status = DeploymentTaskStatus.Running, + AgentId = agent.Id.ToString(), + StartedAt = _timeProvider.GetUtcNow() + }; + + await UpdateTaskAsync(job, task, ct); + + await _eventPublisher.PublishAsync(new DeploymentTaskStarted( + taskId, jobId, target.Name, agent.Name, _timeProvider.GetUtcNow() + ), ct); + + try + { + // Dispatch to agent + var agentTask = BuildAgentTask(target, payload); + var result = await _dispatcher.DispatchAsync(agent.Id, agentTask, ct); + + // Collect results + task = await _resultCollector.CollectAsync(task, result, ct); + + if (task.Status == DeploymentTaskStatus.Completed) + { + await _eventPublisher.PublishAsync(new DeploymentTaskCompleted( + taskId, jobId, target.Name, task.CompletedAt!.Value - task.StartedAt!.Value, + _timeProvider.GetUtcNow() + ), ct); + } + else + { + await _eventPublisher.PublishAsync(new DeploymentTaskFailed( + taskId, jobId, target.Name, task.Error ?? 
"Unknown error", + _timeProvider.GetUtcNow() + ), ct); + } + } + catch (Exception ex) + { + _logger.LogError(ex, "Deployment task {TaskId} failed for target {Target}", taskId, target.Name); + + task = task with + { + Status = DeploymentTaskStatus.Failed, + Error = ex.Message, + CompletedAt = _timeProvider.GetUtcNow() + }; + + await _eventPublisher.PublishAsync(new DeploymentTaskFailed( + taskId, jobId, target.Name, ex.Message, _timeProvider.GetUtcNow() + ), ct); + } + + await UpdateTaskAsync(job, task, ct); + return task; + } + + private static AgentDeploymentTask BuildAgentTask(Target target, DeploymentPayload payload) + { + return new AgentDeploymentTask + { + Type = target.Type switch + { + TargetType.DockerHost => AgentTaskType.DockerDeploy, + TargetType.ComposeHost => AgentTaskType.ComposeDeploy, + _ => throw new UnsupportedTargetTypeException(target.Type) + }, + Payload = new AgentDeploymentPayload + { + Components = payload.Components.Select(c => new AgentComponent + { + Name = c.Name, + Image = $"{c.Image}@{c.Digest}", + Config = c.Config + }).ToList(), + ComposeLock = payload.ComposeLock, + VersionSticker = payload.VersionSticker, + Variables = payload.Variables + } + }; + } + + private async Task UpdateTaskAsync(DeploymentJob job, DeploymentTask updatedTask, CancellationToken ct) + { + var tasks = job.Tasks.Select(t => t.Id == updatedTask.Id ? 
updatedTask : t).ToImmutableArray(); + var updatedJob = job with { Tasks = tasks }; + await _jobStore.SaveAsync(updatedJob, ct); + } +} +``` + +### AgentDispatcher + +```csharp +namespace StellaOps.ReleaseOrchestrator.Deployment.Executor; + +public sealed class AgentDispatcher +{ + private readonly IAgentManager _agentManager; + private readonly ILogger _logger; + private readonly TimeSpan _defaultTimeout = TimeSpan.FromMinutes(30); + + public async Task DispatchAsync( + Guid agentId, + AgentDeploymentTask task, + CancellationToken ct = default) + { + _logger.LogDebug("Dispatching task to agent {AgentId}", agentId); + + using var timeoutCts = new CancellationTokenSource(_defaultTimeout); + using var linkedCts = CancellationTokenSource.CreateLinkedTokenSource(ct, timeoutCts.Token); + + try + { + var result = await _agentManager.ExecuteTaskAsync(agentId, task, linkedCts.Token); + + _logger.LogDebug( + "Agent {AgentId} completed task with status {Status}", + agentId, + result.Success ? "success" : "failure"); + + return result; + } + catch (OperationCanceledException) when (timeoutCts.IsCancellationRequested) + { + throw new AgentTaskTimeoutException(agentId, _defaultTimeout); + } + } +} + +public sealed record AgentDeploymentTask +{ + public required AgentTaskType Type { get; init; } + public required AgentDeploymentPayload Payload { get; init; } +} + +public enum AgentTaskType +{ + DockerDeploy, + ComposeDeploy, + DockerRollback, + ComposeRollback +} + +public sealed record AgentDeploymentPayload +{ + public required IReadOnlyList Components { get; init; } + public required string ComposeLock { get; init; } + public required string VersionSticker { get; init; } + public IReadOnlyDictionary Variables { get; init; } = new Dictionary(); +} + +public sealed record AgentComponent +{ + public required string Name { get; init; } + public required string Image { get; init; } + public IReadOnlyDictionary Config { get; init; } = new Dictionary(); +} + +public sealed record 
AgentTaskResult +{ + public bool Success { get; init; } + public string? Error { get; init; } + public IReadOnlyDictionary Outputs { get; init; } = new Dictionary(); + public string? Logs { get; init; } + public TimeSpan Duration { get; init; } +} +``` + +### TaskResultCollector + +```csharp +namespace StellaOps.ReleaseOrchestrator.Deployment.Executor; + +public sealed class TaskResultCollector +{ + private readonly TimeProvider _timeProvider; + private readonly ILogger _logger; + + public Task CollectAsync( + DeploymentTask task, + AgentTaskResult result, + CancellationToken ct = default) + { + var updatedTask = task with + { + Status = result.Success ? DeploymentTaskStatus.Completed : DeploymentTaskStatus.Failed, + Error = result.Error, + CompletedAt = _timeProvider.GetUtcNow(), + Result = result.Outputs.ToImmutableDictionary() + }; + + _logger.LogDebug( + "Collected result for task {TaskId}: {Status}", + task.Id, + updatedTask.Status); + + return Task.FromResult(updatedTask); + } +} +``` + +--- + +## Acceptance Criteria + +- [ ] Dispatch tasks to agents via gRPC +- [ ] Track task execution status +- [ ] Handle task timeouts +- [ ] Collect task results +- [ ] Collect task logs +- [ ] Cancel running tasks +- [ ] Support Docker and Compose targets +- [ ] Unit test coverage >=85% + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 107_001 Deploy Orchestrator | Internal | TODO | +| 103_002 Target Registry | Internal | TODO | +| 103_003 Agent Manager | Internal | TODO | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| ITargetExecutor | TODO | | +| TargetExecutor | TODO | | +| AgentDispatcher | TODO | | +| TaskResultCollector | TODO | | +| DeploymentPayload | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_107_003_DEPLOY_artifact_generator.md 
b/docs/implplan/SPRINT_20260110_107_003_DEPLOY_artifact_generator.md new file mode 100644 index 000000000..19a776b28 --- /dev/null +++ b/docs/implplan/SPRINT_20260110_107_003_DEPLOY_artifact_generator.md @@ -0,0 +1,461 @@ +# SPRINT: Artifact Generator + +> **Sprint ID:** 107_003 +> **Module:** DEPLOY +> **Phase:** 7 - Deployment Execution +> **Status:** TODO +> **Parent:** [107_000_INDEX](SPRINT_20260110_107_000_INDEX_deployment_execution.md) + +--- + +## Overview + +Implement the Artifact Generator for creating deployment artifacts including digest-locked compose files and version stickers. + +### Objectives + +- Generate digest-locked compose files +- Create version sticker files (stella.version.json) +- Generate deployment manifests +- Support multiple artifact formats + +### Working Directory + +``` +src/ReleaseOrchestrator/ +├── __Libraries/ +│ └── StellaOps.ReleaseOrchestrator.Deployment/ +│ └── Artifact/ +│ ├── IArtifactGenerator.cs +│ ├── ArtifactGenerator.cs +│ ├── ComposeLockGenerator.cs +│ ├── VersionStickerGenerator.cs +│ └── DeploymentManifestGenerator.cs +└── __Tests/ +``` + +--- + +## Deliverables + +### IArtifactGenerator Interface + +```csharp +namespace StellaOps.ReleaseOrchestrator.Deployment.Artifact; + +public interface IArtifactGenerator +{ + Task GeneratePayloadAsync(DeploymentJob job, CancellationToken ct = default); + Task GenerateComposeLockAsync(Release release, CancellationToken ct = default); + Task GenerateVersionStickerAsync(Release release, DeploymentJob job, CancellationToken ct = default); + Task GenerateDeploymentManifestAsync(DeploymentJob job, CancellationToken ct = default); +} +``` + +### ComposeLockGenerator + +```csharp +namespace StellaOps.ReleaseOrchestrator.Deployment.Artifact; + +public sealed class ComposeLockGenerator +{ + private readonly ILogger _logger; + + public string Generate(Release release, ComposeTemplate? 
template = null) + { + var services = new Dictionary(); + + foreach (var component in release.Components.OrderBy(c => c.OrderIndex)) + { + var service = new Dictionary + { + ["image"] = $"{GetFullImageRef(component)}@{component.Digest}", + ["labels"] = new Dictionary + { + ["stella.release.id"] = release.Id.ToString(), + ["stella.release.name"] = release.Name, + ["stella.component.id"] = component.ComponentId.ToString(), + ["stella.component.name"] = component.ComponentName, + ["stella.digest"] = component.Digest + } + }; + + // Add config from component + foreach (var (key, value) in component.Config) + { + service[key] = value; + } + + services[component.ComponentName] = service; + } + + var compose = new Dictionary + { + ["version"] = "3.8", + ["services"] = services, + ["x-stella"] = new Dictionary + { + ["release"] = new Dictionary + { + ["id"] = release.Id.ToString(), + ["name"] = release.Name, + ["manifestDigest"] = release.ManifestDigest ?? "" + }, + ["generated"] = TimeProvider.System.GetUtcNow().ToString("O") + } + }; + + // Merge with template if provided + if (template is not null) + { + compose = MergeWithTemplate(compose, template); + } + + var yaml = new SerializerBuilder() + .WithNamingConvention(CamelCaseNamingConvention.Instance) + .Build() + .Serialize(compose); + + _logger.LogDebug( + "Generated compose.stella.lock.yml for release {Release} with {Count} services", + release.Name, + services.Count); + + return yaml; + } + + private static string GetFullImageRef(ReleaseComponent component) + { + // Component config should include registry info + var registry = component.Config.GetValueOrDefault("registry", ""); + var repository = component.Config.GetValueOrDefault("repository", component.ComponentName); + return string.IsNullOrEmpty(registry) ? 
repository : $"{registry}/{repository}"; + } + + private static Dictionary MergeWithTemplate( + Dictionary generated, + ComposeTemplate template) + { + // Deep merge template with generated config + // Template provides networks, volumes, etc. + var merged = new Dictionary(generated); + + if (template.Networks is not null) + merged["networks"] = template.Networks; + + if (template.Volumes is not null) + merged["volumes"] = template.Volumes; + + // Merge service configs from template + if (template.ServiceDefaults is not null && merged["services"] is Dictionary services) + { + foreach (var (serviceName, serviceConfig) in services) + { + if (serviceConfig is Dictionary config) + { + foreach (var (key, value) in template.ServiceDefaults) + { + if (!config.ContainsKey(key)) + { + config[key] = value; + } + } + } + } + } + + return merged; + } +} + +public sealed record ComposeTemplate( + IReadOnlyDictionary? Networks, + IReadOnlyDictionary? Volumes, + IReadOnlyDictionary? ServiceDefaults +); +``` + +### VersionStickerGenerator + +```csharp +namespace StellaOps.ReleaseOrchestrator.Deployment.Artifact; + +public sealed class VersionStickerGenerator +{ + private readonly TimeProvider _timeProvider; + private readonly ILogger _logger; + + public string Generate(Release release, DeploymentJob job, Target target) + { + var sticker = new VersionSticker + { + SchemaVersion = "1.0", + Release = new ReleaseInfo + { + Id = release.Id.ToString(), + Name = release.Name, + ManifestDigest = release.ManifestDigest, + FinalizedAt = release.FinalizedAt?.ToString("O") + }, + Deployment = new DeploymentInfo + { + JobId = job.Id.ToString(), + EnvironmentId = job.EnvironmentId.ToString(), + EnvironmentName = job.EnvironmentName, + TargetId = target.Id.ToString(), + TargetName = target.Name, + Strategy = job.Strategy.ToString(), + DeployedAt = _timeProvider.GetUtcNow().ToString("O") + }, + Components = release.Components.Select(c => new ComponentInfo + { + Name = c.ComponentName, + Digest = 
c.Digest, + Tag = c.Tag, + SemVer = c.SemVer + }).ToList() + }; + + var json = JsonSerializer.Serialize(sticker, new JsonSerializerOptions + { + WriteIndented = true, + PropertyNamingPolicy = JsonNamingPolicy.CamelCase + }); + + _logger.LogDebug( + "Generated stella.version.json for release {Release} on target {Target}", + release.Name, + target.Name); + + return json; + } +} + +public sealed class VersionSticker +{ + public required string SchemaVersion { get; set; } + public required ReleaseInfo Release { get; set; } + public required DeploymentInfo Deployment { get; set; } + public required IReadOnlyList Components { get; set; } +} + +public sealed class ReleaseInfo +{ + public required string Id { get; set; } + public required string Name { get; set; } + public string? ManifestDigest { get; set; } + public string? FinalizedAt { get; set; } +} + +public sealed class DeploymentInfo +{ + public required string JobId { get; set; } + public required string EnvironmentId { get; set; } + public required string EnvironmentName { get; set; } + public required string TargetId { get; set; } + public required string TargetName { get; set; } + public required string Strategy { get; set; } + public required string DeployedAt { get; set; } +} + +public sealed class ComponentInfo +{ + public required string Name { get; set; } + public required string Digest { get; set; } + public string? Tag { get; set; } + public string? 
SemVer { get; set; } +} +``` + +### DeploymentManifestGenerator + +```csharp +namespace StellaOps.ReleaseOrchestrator.Deployment.Artifact; + +public sealed class DeploymentManifestGenerator +{ + private readonly IReleaseManager _releaseManager; + private readonly IEnvironmentService _environmentService; + private readonly IPromotionManager _promotionManager; + private readonly TimeProvider _timeProvider; + private readonly ILogger _logger; + + public async Task GenerateAsync(DeploymentJob job, CancellationToken ct = default) + { + var release = await _releaseManager.GetAsync(job.ReleaseId, ct); + var environment = await _environmentService.GetAsync(job.EnvironmentId, ct); + var promotion = await _promotionManager.GetAsync(job.PromotionId, ct); + + var manifest = new DeploymentManifest + { + SchemaVersion = "1.0", + Deployment = new DeploymentMetadata + { + JobId = job.Id.ToString(), + Strategy = job.Strategy.ToString(), + StartedAt = job.StartedAt.ToString("O"), + StartedBy = job.StartedBy.ToString() + }, + Release = new ReleaseMetadata + { + Id = release!.Id.ToString(), + Name = release.Name, + ManifestDigest = release.ManifestDigest, + FinalizedAt = release.FinalizedAt?.ToString("O"), + Components = release.Components.Select(c => new ComponentMetadata + { + Id = c.ComponentId.ToString(), + Name = c.ComponentName, + Digest = c.Digest, + Tag = c.Tag, + SemVer = c.SemVer + }).ToList() + }, + Environment = new EnvironmentMetadata + { + Id = environment!.Id.ToString(), + Name = environment.Name, + IsProduction = environment.IsProduction + }, + Promotion = promotion is not null ? 
new PromotionMetadata + { + Id = promotion.Id.ToString(), + RequestedBy = promotion.RequestedBy.ToString(), + RequestedAt = promotion.RequestedAt.ToString("O"), + Approvals = promotion.Approvals.Select(a => new ApprovalMetadata + { + UserId = a.UserId.ToString(), + UserName = a.UserName, + Decision = a.Decision.ToString(), + DecidedAt = a.DecidedAt.ToString("O") + }).ToList(), + GateResults = promotion.GateResults.Select(g => new GateResultMetadata + { + GateName = g.GateName, + Passed = g.Passed, + Message = g.Message + }).ToList() + } : null, + Targets = job.Tasks.Select(t => new TargetMetadata + { + Id = t.TargetId.ToString(), + Name = t.TargetName, + Status = t.Status.ToString() + }).ToList(), + GeneratedAt = _timeProvider.GetUtcNow().ToString("O") + }; + + var json = JsonSerializer.Serialize(manifest, new JsonSerializerOptions + { + WriteIndented = true, + PropertyNamingPolicy = JsonNamingPolicy.CamelCase + }); + + _logger.LogDebug("Generated deployment manifest for job {JobId}", job.Id); + + return json; + } +} + +// Manifest models +public sealed class DeploymentManifest +{ + public required string SchemaVersion { get; set; } + public required DeploymentMetadata Deployment { get; set; } + public required ReleaseMetadata Release { get; set; } + public required EnvironmentMetadata Environment { get; set; } + public PromotionMetadata? Promotion { get; set; } + public required IReadOnlyList Targets { get; set; } + public required string GeneratedAt { get; set; } +} + +// Additional metadata classes abbreviated for brevity... 
+``` + +### ArtifactGenerator (Coordinator) + +```csharp +namespace StellaOps.ReleaseOrchestrator.Deployment.Artifact; + +public sealed class ArtifactGenerator : IArtifactGenerator +{ + private readonly IReleaseManager _releaseManager; + private readonly ComposeLockGenerator _composeLockGenerator; + private readonly VersionStickerGenerator _versionStickerGenerator; + private readonly DeploymentManifestGenerator _manifestGenerator; + private readonly ILogger _logger; + + public async Task GeneratePayloadAsync( + DeploymentJob job, + CancellationToken ct = default) + { + var release = await _releaseManager.GetAsync(job.ReleaseId, ct) + ?? throw new ReleaseNotFoundException(job.ReleaseId); + + var composeLock = await GenerateComposeLockAsync(release, ct); + var versionSticker = await GenerateVersionStickerAsync(release, job, ct); + var manifest = await GenerateDeploymentManifestAsync(job, ct); + + var components = release.Components.Select(c => new DeploymentComponent( + c.ComponentName, + c.Config.GetValueOrDefault("image", c.ComponentName), + c.Digest, + c.Config + )).ToImmutableArray(); + + return new DeploymentPayload + { + ReleaseId = release.Id, + ReleaseName = release.Name, + Components = components, + ComposeLock = composeLock, + VersionSticker = versionSticker, + DeploymentManifest = manifest + }; + } +} +``` + +--- + +## Acceptance Criteria + +- [ ] Generate digest-locked compose files +- [ ] All images use digest references +- [ ] Generate stella.version.json stickers +- [ ] Generate deployment manifests +- [ ] Include all required metadata +- [ ] Merge with compose templates +- [ ] JSON/YAML formats valid +- [ ] Unit test coverage >=85% + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 107_001 Deploy Orchestrator | Internal | TODO | +| 104_003 Release Manager | Internal | TODO | +| YamlDotNet | NuGet | Available | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| 
+| IArtifactGenerator | TODO | |
+| ArtifactGenerator | TODO | |
+| ComposeLockGenerator | TODO | |
+| VersionStickerGenerator | TODO | |
+| DeploymentManifestGenerator | TODO | |
+| Unit tests | TODO | |
+
+---
+
+## Execution Log
+
+| Date | Entry |
+|------|-------|
+| 10-Jan-2026 | Sprint created |
diff --git a/docs/implplan/SPRINT_20260110_107_004_DEPLOY_rollback_manager.md b/docs/implplan/SPRINT_20260110_107_004_DEPLOY_rollback_manager.md
new file mode 100644
index 000000000..b9c81f0de
--- /dev/null
+++ b/docs/implplan/SPRINT_20260110_107_004_DEPLOY_rollback_manager.md
@@ -0,0 +1,461 @@
+# SPRINT: Rollback Manager
+
+> **Sprint ID:** 107_004
+> **Module:** DEPLOY
+> **Phase:** 7 - Deployment Execution
+> **Status:** TODO
+> **Parent:** [107_000_INDEX](SPRINT_20260110_107_000_INDEX_deployment_execution.md)
+
+---
+
+## Overview
+
+Implement the Rollback Manager for handling deployment failure recovery.
+
+### Objectives
+
+- Plan rollback strategy for failed deployments
+- Execute rollback to previous release
+- Track rollback progress and status
+- Generate rollback evidence
+
+### Working Directory
+
+```
+src/ReleaseOrchestrator/
+├── __Libraries/
+│   └── StellaOps.ReleaseOrchestrator.Deployment/
+│       └── Rollback/
+│           ├── IRollbackManager.cs
+│           ├── RollbackManager.cs
+│           ├── RollbackPlanner.cs
+│           ├── RollbackExecutor.cs
+│           └── RollbackEvidenceGenerator.cs
+└── __Tests/
+```
+
+---
+
+## Deliverables
+
+### IRollbackManager Interface
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Deployment.Rollback;
+
+public interface IRollbackManager
+{
+    Task<RollbackPlan> PlanAsync(Guid jobId, CancellationToken ct = default);
+    Task<DeploymentJob> ExecuteAsync(RollbackPlan plan, CancellationToken ct = default);
+    Task<DeploymentJob> ExecuteAsync(Guid jobId, CancellationToken ct = default);
+    Task<RollbackPlan?> GetPlanAsync(Guid jobId, CancellationToken ct = default);
+    Task<bool> CanRollbackAsync(Guid jobId, CancellationToken ct = default);
+}
+
+public sealed record RollbackPlan
+{
+    public required Guid Id { get; init; }
+    public required Guid FailedJobId { get; init; }
+    public required Guid TargetReleaseId { get; init; }
+    public required string TargetReleaseName { get; init; }
+    public required ImmutableArray<RollbackTarget> Targets { get; init; }
+    public required RollbackStrategy Strategy { get; init; }
+    public required DateTimeOffset PlannedAt { get; init; }
+}
+
+public enum RollbackStrategy
+{
+    RedeployPrevious,   // Redeploy the previous release
+    RestoreSnapshot,    // Restore from snapshot if available
+    Manual              // Requires manual intervention
+}
+
+public sealed record RollbackTarget(
+    Guid TargetId,
+    string TargetName,
+    string CurrentDigest,
+    string RollbackToDigest,
+    RollbackTargetStatus Status
+);
+
+public enum RollbackTargetStatus
+{
+    Pending,
+    RollingBack,
+    RolledBack,
+    Failed,
+    Skipped
+}
+```
+
+### RollbackManager Implementation
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Deployment.Rollback;
+
+public sealed class RollbackManager : IRollbackManager
+{
+    private readonly IDeploymentJobStore _jobStore;
+    private readonly IReleaseHistory _releaseHistory;
+    private readonly IReleaseManager _releaseManager;
+    private readonly ITargetExecutor _targetExecutor;
+    private readonly IArtifactGenerator _artifactGenerator;
+    private readonly RollbackPlanner _planner;
+    private readonly RollbackEvidenceGenerator _evidenceGenerator;
+    private readonly IEventPublisher _eventPublisher;
+    private readonly TimeProvider _timeProvider;
+    private readonly IGuidGenerator _guidGenerator;
+    private readonly ILogger<RollbackManager> _logger;
+
+    public async Task<RollbackPlan> PlanAsync(Guid jobId, CancellationToken ct = default)
+    {
+        var job = await _jobStore.GetAsync(jobId, ct)
+            ??
+                throw new DeploymentJobNotFoundException(jobId);
+
+        if (job.Status != DeploymentStatus.Failed)
+        {
+            throw new RollbackNotRequiredException(jobId);
+        }
+
+        // Find previous successful deployment
+        var previousRelease = await _releaseHistory.GetPreviousDeployedAsync(
+            job.EnvironmentId, job.ReleaseId, ct);
+
+        if (previousRelease is null)
+        {
+            throw new NoPreviousReleaseException(job.EnvironmentId);
+        }
+
+        var plan = await _planner.CreatePlanAsync(job, previousRelease, ct);
+
+        _logger.LogInformation(
+            "Created rollback plan {PlanId} for job {JobId}: rollback to {Release}",
+            plan.Id, jobId, previousRelease.Name);
+
+        return plan;
+    }
+
+    public async Task<DeploymentJob> ExecuteAsync(
+        RollbackPlan plan,
+        CancellationToken ct = default)
+    {
+        var failedJob = await _jobStore.GetAsync(plan.FailedJobId, ct)
+            ?? throw new DeploymentJobNotFoundException(plan.FailedJobId);
+
+        var targetRelease = await _releaseManager.GetAsync(plan.TargetReleaseId, ct)
+            ?? throw new ReleaseNotFoundException(plan.TargetReleaseId);
+
+        // Update original job to rolling back
+        failedJob = failedJob with { Status = DeploymentStatus.RollingBack };
+        await _jobStore.SaveAsync(failedJob, ct);
+
+        await _eventPublisher.PublishAsync(new RollbackStarted(
+            plan.Id, plan.FailedJobId, plan.TargetReleaseId,
+            plan.TargetReleaseName, plan.Targets.Length, _timeProvider.GetUtcNow()
+        ), ct);
+
+        try
+        {
+            // Generate rollback payload
+            var payload = await _artifactGenerator.GeneratePayloadAsync(
+                new DeploymentJob
+                {
+                    Id = _guidGenerator.NewGuid(),
+                    TenantId = failedJob.TenantId,
+                    PromotionId = failedJob.PromotionId,
+                    ReleaseId = targetRelease.Id,
+                    ReleaseName = targetRelease.Name,
+                    EnvironmentId = failedJob.EnvironmentId,
+                    EnvironmentName = failedJob.EnvironmentName,
+                    Status = DeploymentStatus.Running,
+                    Strategy = DeploymentStrategy.AllAtOnce,
+                    Options = new DeploymentOptions(),
+                    Tasks = [],
+                    StartedAt = _timeProvider.GetUtcNow(),
+                    StartedBy = Guid.Empty
+                }, ct);
+
+            // Execute rollback on
+            // each target
+            foreach (var target in plan.Targets)
+            {
+                if (target.Status != RollbackTargetStatus.Pending)
+                    continue;
+
+                try
+                {
+                    await ExecuteTargetRollbackAsync(failedJob, target, payload, ct);
+                }
+                catch (Exception ex)
+                {
+                    _logger.LogError(ex,
+                        "Rollback failed for target {Target}",
+                        target.TargetName);
+                }
+            }
+
+            // Update job status
+            failedJob = failedJob with
+            {
+                Status = DeploymentStatus.RolledBack,
+                RollbackJobId = plan.Id,
+                CompletedAt = _timeProvider.GetUtcNow()
+            };
+            await _jobStore.SaveAsync(failedJob, ct);
+
+            // Generate evidence
+            await _evidenceGenerator.GenerateAsync(plan, failedJob, ct);
+
+            await _eventPublisher.PublishAsync(new RollbackCompleted(
+                plan.Id, plan.FailedJobId, plan.TargetReleaseName,
+                _timeProvider.GetUtcNow()
+            ), ct);
+
+            _logger.LogInformation(
+                "Rollback completed for job {JobId} to release {Release}",
+                plan.FailedJobId, targetRelease.Name);
+
+            return failedJob;
+        }
+        catch (Exception ex)
+        {
+            _logger.LogError(ex, "Rollback failed for job {JobId}", plan.FailedJobId);
+
+            failedJob = failedJob with
+            {
+                Status = DeploymentStatus.Failed,
+                FailureReason = $"Rollback failed: {ex.Message}"
+            };
+            await _jobStore.SaveAsync(failedJob, ct);
+
+            await _eventPublisher.PublishAsync(new RollbackFailed(
+                plan.Id, plan.FailedJobId, ex.Message, _timeProvider.GetUtcNow()
+            ), ct);
+
+            throw;
+        }
+    }
+
+    private async Task ExecuteTargetRollbackAsync(
+        DeploymentJob job,
+        RollbackTarget target,
+        DeploymentPayload payload,
+        CancellationToken ct)
+    {
+        // Digests may be empty when the planner could not resolve them, so log
+        // them whole rather than slicing with an unguarded [..16].
+        _logger.LogInformation(
+            "Rolling back target {Target} from {Current} to {Previous}",
+            target.TargetName,
+            target.CurrentDigest,
+            target.RollbackToDigest);
+
+        // Create a rollback task
+        var task = new DeploymentTask
+        {
+            Id = _guidGenerator.NewGuid(),
+            TargetId = target.TargetId,
+            TargetName = target.TargetName,
+            BatchIndex = 0,
+            Status = DeploymentTaskStatus.Pending
+        };
+
+        await _targetExecutor.DeployToTargetAsync(job.Id, task.Id, payload, ct);
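+        // Sketch (not part of the spec above): a post-rollback verification
+        // step could re-query the inventory snapshot for this target and
+        // confirm a container with target.RollbackToDigest is actually running
+        // before the plan marks the target RolledBack. The commented check
+        // below is hypothetical and assumes the planner's IInventorySyncService
+        // were also injected here:
+        //
+        //   var snapshot = await _inventoryService.GetLatestSnapshotAsync(target.TargetId, ct);
+        //   var verified = snapshot?.Containers
+        //       .Any(c => c.ImageDigest == target.RollbackToDigest) ?? false;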
+    }
+
+    public async Task<bool> CanRollbackAsync(Guid jobId, CancellationToken ct = default)
+    {
+        var job = await _jobStore.GetAsync(jobId, ct);
+        if (job is null)
+            return false;
+
+        if (job.Status != DeploymentStatus.Failed)
+            return false;
+
+        var previousRelease = await _releaseHistory.GetPreviousDeployedAsync(
+            job.EnvironmentId, job.ReleaseId, ct);
+
+        return previousRelease is not null;
+    }
+}
+```
+
+### RollbackPlanner
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Deployment.Rollback;
+
+public sealed class RollbackPlanner
+{
+    private readonly IInventorySyncService _inventoryService;
+    private readonly TimeProvider _timeProvider;
+    private readonly IGuidGenerator _guidGenerator;
+
+    public async Task<RollbackPlan> CreatePlanAsync(
+        DeploymentJob failedJob,
+        Release targetRelease,
+        CancellationToken ct = default)
+    {
+        var targets = new List<RollbackTarget>();
+
+        foreach (var task in failedJob.Tasks)
+        {
+            // Get current state from inventory
+            var snapshot = await _inventoryService.GetLatestSnapshotAsync(task.TargetId, ct);
+
+            var currentDigest = snapshot?.Containers
+                .FirstOrDefault(c => IsDeployedComponent(c, failedJob.ReleaseName))
+                ?.ImageDigest ?? "";
+
+            var rollbackDigest = targetRelease.Components
+                .FirstOrDefault(c => MatchesTarget(c, task))
+                ?.Digest ?? "";
+
+            targets.Add(new RollbackTarget(
+                TargetId: task.TargetId,
+                TargetName: task.TargetName,
+                CurrentDigest: currentDigest,
+                RollbackToDigest: rollbackDigest,
+                Status: task.Status == DeploymentTaskStatus.Completed
+                    ?
+                      RollbackTargetStatus.Pending
+                    : RollbackTargetStatus.Skipped
+            ));
+        }
+
+        return new RollbackPlan
+        {
+            Id = _guidGenerator.NewGuid(),
+            FailedJobId = failedJob.Id,
+            TargetReleaseId = targetRelease.Id,
+            TargetReleaseName = targetRelease.Name,
+            Targets = targets.ToImmutableArray(),
+            Strategy = RollbackStrategy.RedeployPrevious,
+            PlannedAt = _timeProvider.GetUtcNow()
+        };
+    }
+
+    private static bool IsDeployedComponent(ContainerInfo container, string releaseName) =>
+        container.Labels.GetValueOrDefault("stella.release.name") == releaseName;
+
+    private static bool MatchesTarget(ReleaseComponent component, DeploymentTask task) =>
+        component.ComponentName == task.TargetName;
+}
+```
+
+### RollbackEvidenceGenerator
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Deployment.Rollback;
+
+public sealed class RollbackEvidenceGenerator
+{
+    private readonly IEvidencePacketService _evidenceService;
+    private readonly TimeProvider _timeProvider;
+    private readonly ILogger<RollbackEvidenceGenerator> _logger;
+
+    public async Task GenerateAsync(
+        RollbackPlan plan,
+        DeploymentJob job,
+        CancellationToken ct = default)
+    {
+        var evidence = new RollbackEvidence
+        {
+            PlanId = plan.Id.ToString(),
+            FailedJobId = plan.FailedJobId.ToString(),
+            TargetReleaseId = plan.TargetReleaseId.ToString(),
+            TargetReleaseName = plan.TargetReleaseName,
+            RollbackStrategy = plan.Strategy.ToString(),
+            PlannedAt = plan.PlannedAt.ToString("O"),
+            ExecutedAt = _timeProvider.GetUtcNow().ToString("O"),
+            Targets = plan.Targets.Select(t => new RollbackTargetEvidence
+            {
+                TargetId = t.TargetId.ToString(),
+                TargetName = t.TargetName,
+                FromDigest = t.CurrentDigest,
+                ToDigest = t.RollbackToDigest,
+                Status = t.Status.ToString()
+            }).ToList(),
+            OriginalFailure = job.FailureReason
+        };
+
+        var packet = await _evidenceService.CreatePacketAsync(new CreateEvidencePacketRequest
+        {
+            Type = EvidenceType.Rollback,
+            SubjectId = plan.FailedJobId,
+            Content = JsonSerializer.Serialize(evidence),
+            Metadata = new
+                Dictionary<string, string>
+                {
+                    ["rollbackPlanId"] = plan.Id.ToString(),
+                    ["targetRelease"] = plan.TargetReleaseName,
+                    ["environment"] = job.EnvironmentName
+                }
+        }, ct);
+
+        _logger.LogInformation(
+            "Generated rollback evidence packet {PacketId} for job {JobId}",
+            packet.Id, plan.FailedJobId);
+    }
+}
+
+public sealed class RollbackEvidence
+{
+    public required string PlanId { get; set; }
+    public required string FailedJobId { get; set; }
+    public required string TargetReleaseId { get; set; }
+    public required string TargetReleaseName { get; set; }
+    public required string RollbackStrategy { get; set; }
+    public required string PlannedAt { get; set; }
+    public required string ExecutedAt { get; set; }
+    public required IReadOnlyList<RollbackTargetEvidence> Targets { get; set; }
+    public string? OriginalFailure { get; set; }
+}
+
+public sealed class RollbackTargetEvidence
+{
+    public required string TargetId { get; set; }
+    public required string TargetName { get; set; }
+    public required string FromDigest { get; set; }
+    public required string ToDigest { get; set; }
+    public required string Status { get; set; }
+}
+```
+
+---
+
+## Acceptance Criteria
+
+- [ ] Plan rollback from failed deployment
+- [ ] Find previous successful release
+- [ ] Execute rollback on completed targets
+- [ ] Skip targets not yet deployed
+- [ ] Track rollback progress
+- [ ] Generate rollback evidence
+- [ ] Update deployment status
+- [ ] Unit test coverage >=85%
+
+---
+
+## Dependencies
+
+| Dependency | Type | Status |
+|------------|------|--------|
+| 107_002 Target Executor | Internal | TODO |
+| 104_004 Release Catalog | Internal | TODO |
+| 109_002 Evidence Packets | Internal | TODO |
+
+---
+
+## Delivery Tracker
+
+| Deliverable | Status | Notes |
+|-------------|--------|-------|
+| IRollbackManager | TODO | |
+| RollbackManager | TODO | |
+| RollbackPlanner | TODO | |
+| RollbackEvidenceGenerator | TODO | |
+| Unit tests | TODO | |
+
+---
+
+## Execution Log
+
+| Date | Entry |
+|------|-------|
+| 10-Jan-2026 | Sprint created |
diff --git a/docs/implplan/SPRINT_20260110_107_005_DEPLOY_strategies.md b/docs/implplan/SPRINT_20260110_107_005_DEPLOY_strategies.md
new file mode 100644
index 000000000..711191e0b
--- /dev/null
+++ b/docs/implplan/SPRINT_20260110_107_005_DEPLOY_strategies.md
@@ -0,0 +1,460 @@
+# SPRINT: Deployment Strategies
+
+> **Sprint ID:** 107_005
+> **Module:** DEPLOY
+> **Phase:** 7 - Deployment Execution
+> **Status:** TODO
+> **Parent:** [107_000_INDEX](SPRINT_20260110_107_000_INDEX_deployment_execution.md)
+
+---
+
+## Overview
+
+Implement the deployment strategies that control how a release is rolled out across targets.
+
+### Objectives
+
+- Rolling deployment strategy
+- Blue-green deployment strategy
+- Canary deployment strategy
+- All-at-once deployment strategy
+- Strategy factory for selection
+
+### Working Directory
+
+```
+src/ReleaseOrchestrator/
+├── __Libraries/
+│   └── StellaOps.ReleaseOrchestrator.Deployment/
+│       └── Strategy/
+│           ├── IDeploymentStrategy.cs
+│           ├── DeploymentStrategyFactory.cs
+│           ├── RollingStrategy.cs
+│           ├── BlueGreenStrategy.cs
+│           ├── CanaryStrategy.cs
+│           └── AllAtOnceStrategy.cs
+└── __Tests/
+```
+
+---
+
+## Deliverables
+
+### IDeploymentStrategy Interface
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Deployment.Strategy;
+
+public interface IDeploymentStrategy
+{
+    string Name { get; }
+    Task<IReadOnlyList<DeploymentBatch>> PlanAsync(DeploymentJob job, CancellationToken ct = default);
+    Task<bool> ShouldProceedAsync(DeploymentBatch completedBatch, CancellationToken ct = default);
+}
+
+public sealed record DeploymentBatch(
+    int Index,
+    ImmutableArray<Guid> TaskIds,
+    BatchRequirements Requirements
+);
+
+public sealed record BatchRequirements(
+    bool WaitForHealthCheck = true,
+    TimeSpan?
+        HealthCheckTimeout = null,
+    double MinSuccessRate = 1.0
+);
+```
+
+### RollingStrategy
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Deployment.Strategy;
+
+public sealed class RollingStrategy : IDeploymentStrategy
+{
+    private readonly ITargetHealthChecker _healthChecker;
+    private readonly ILogger<RollingStrategy> _logger;
+
+    public string Name => "rolling";
+
+    public Task<IReadOnlyList<DeploymentBatch>> PlanAsync(
+        DeploymentJob job,
+        CancellationToken ct = default)
+    {
+        var batchSize = ParseBatchSize(job.Options.BatchSize, job.Tasks.Length);
+        var batches = new List<DeploymentBatch>();
+
+        var taskIds = job.Tasks.Select(t => t.Id).ToList();
+        var batchIndex = 0;
+
+        while (taskIds.Count > 0)
+        {
+            var batchTaskIds = taskIds.Take(batchSize).ToImmutableArray();
+            taskIds = taskIds.Skip(batchSize).ToList();
+
+            batches.Add(new DeploymentBatch(
+                Index: batchIndex++,
+                TaskIds: batchTaskIds,
+                Requirements: new BatchRequirements(
+                    WaitForHealthCheck: job.Options.WaitForHealthCheck,
+                    HealthCheckTimeout: TimeSpan.FromMinutes(5)
+                )
+            ));
+        }
+
+        _logger.LogInformation(
+            "Rolling strategy planned {BatchCount} batches of ~{BatchSize} targets",
+            batches.Count, batchSize);
+
+        return Task.FromResult<IReadOnlyList<DeploymentBatch>>(batches);
+    }
+
+    public async Task<bool> ShouldProceedAsync(
+        DeploymentBatch completedBatch,
+        CancellationToken ct = default)
+    {
+        if (!completedBatch.Requirements.WaitForHealthCheck)
+            return true;
+
+        // Check health of deployed targets
+        foreach (var taskId in completedBatch.TaskIds)
+        {
+            var isHealthy = await _healthChecker.CheckTaskHealthAsync(taskId, ct);
+            if (!isHealthy)
+            {
+                _logger.LogWarning(
+                    "Task {TaskId} in batch {BatchIndex} is unhealthy, halting rollout",
+                    taskId, completedBatch.Index);
+                return false;
+            }
+        }
+
+        return true;
+    }
+
+    private static int ParseBatchSize(string?
+        batchSizeSpec, int totalTargets)
+    {
+        if (string.IsNullOrEmpty(batchSizeSpec))
+            return Math.Max(1, totalTargets / 4);
+
+        if (batchSizeSpec.EndsWith('%'))
+        {
+            var percent = int.Parse(batchSizeSpec.TrimEnd('%'), CultureInfo.InvariantCulture);
+            return Math.Max(1, totalTargets * percent / 100);
+        }
+
+        return int.Parse(batchSizeSpec, CultureInfo.InvariantCulture);
+    }
+}
+```
+
+### BlueGreenStrategy
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Deployment.Strategy;
+
+public sealed class BlueGreenStrategy : IDeploymentStrategy
+{
+    private readonly ITargetHealthChecker _healthChecker;
+    private readonly ITrafficRouter _trafficRouter;
+    private readonly ILogger<BlueGreenStrategy> _logger;
+
+    public string Name => "blue-green";
+
+    public Task<IReadOnlyList<DeploymentBatch>> PlanAsync(
+        DeploymentJob job,
+        CancellationToken ct = default)
+    {
+        // Blue-green deploys to all targets at once (the "green" set),
+        // then switches traffic from "blue" to "green"
+        var batches = new List<DeploymentBatch>
+        {
+            // Phase 1: Deploy to green (all targets)
+            new DeploymentBatch(
+                Index: 0,
+                TaskIds: job.Tasks.Select(t => t.Id).ToImmutableArray(),
+                Requirements: new BatchRequirements(
+                    WaitForHealthCheck: true,
+                    HealthCheckTimeout: TimeSpan.FromMinutes(10),
+                    MinSuccessRate: 1.0 // All must succeed
+                )
+            )
+        };
+
+        _logger.LogInformation(
+            "Blue-green strategy: deploy all {Count} targets, then switch traffic",
+            job.Tasks.Length);
+
+        return Task.FromResult<IReadOnlyList<DeploymentBatch>>(batches);
+    }
+
+    public async Task<bool> ShouldProceedAsync(
+        DeploymentBatch completedBatch,
+        CancellationToken ct = default)
+    {
+        // All targets must be healthy before switching traffic
+        foreach (var taskId in completedBatch.TaskIds)
+        {
+            var isHealthy = await _healthChecker.CheckTaskHealthAsync(taskId, ct);
+            if (!isHealthy)
+            {
+                _logger.LogWarning(
+                    "Blue-green: target {TaskId} unhealthy, not switching traffic",
+                    taskId);
+                return false;
+            }
+        }
+
+        // Switch traffic to new deployment
+        _logger.LogInformation("Blue-green: switching traffic to new deployment");
+        //
+        // Traffic switching handled externally based on deployment type
+
+        return true;
+    }
+}
+```
+
+### CanaryStrategy
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Deployment.Strategy;
+
+public sealed class CanaryStrategy : IDeploymentStrategy
+{
+    private readonly ITargetHealthChecker _healthChecker;
+    private readonly IMetricsCollector _metricsCollector;
+    private readonly ILogger<CanaryStrategy> _logger;
+
+    public string Name => "canary";
+
+    public Task<IReadOnlyList<DeploymentBatch>> PlanAsync(
+        DeploymentJob job,
+        CancellationToken ct = default)
+    {
+        var tasks = job.Tasks.ToList();
+        var batches = new List<DeploymentBatch>();
+
+        if (tasks.Count == 0)
+            return Task.FromResult<IReadOnlyList<DeploymentBatch>>(batches);
+
+        // Canary phase: 1 target (or min 5% if many targets)
+        var canarySize = Math.Max(1, tasks.Count / 20);
+        batches.Add(new DeploymentBatch(
+            Index: 0,
+            TaskIds: tasks.Take(canarySize).Select(t => t.Id).ToImmutableArray(),
+            Requirements: new BatchRequirements(
+                WaitForHealthCheck: true,
+                HealthCheckTimeout: TimeSpan.FromMinutes(10),
+                MinSuccessRate: 1.0
+            )
+        ));
+        tasks = tasks.Skip(canarySize).ToList();
+
+        // Gradual rollout: 25% increments
+        var batchIndex = 1;
+        var incrementSize = Math.Max(1, (tasks.Count + 3) / 4);
+
+        while (tasks.Count > 0)
+        {
+            var batchTasks = tasks.Take(incrementSize).ToList();
+            tasks = tasks.Skip(incrementSize).ToList();
+
+            batches.Add(new DeploymentBatch(
+                Index: batchIndex++,
+                TaskIds: batchTasks.Select(t => t.Id).ToImmutableArray(),
+                Requirements: new BatchRequirements(
+                    WaitForHealthCheck: true,
+                    MinSuccessRate: 0.95 // Allow some failures in later batches
+                )
+            ));
+        }
+
+        _logger.LogInformation(
+            "Canary strategy: {CanarySize} canary, then {Batches} batches",
+            canarySize, batches.Count - 1);
+
+        return Task.FromResult<IReadOnlyList<DeploymentBatch>>(batches);
+    }
+
+    public async Task<bool> ShouldProceedAsync(
+        DeploymentBatch completedBatch,
+        CancellationToken ct = default)
+    {
+        // Check health
+        var healthyCount = 0;
+        foreach (var taskId in completedBatch.TaskIds)
+        {
+            if (await
+                _healthChecker.CheckTaskHealthAsync(taskId, ct))
+                healthyCount++;
+        }
+
+        var successRate = (double)healthyCount / completedBatch.TaskIds.Length;
+        if (successRate < completedBatch.Requirements.MinSuccessRate)
+        {
+            _logger.LogWarning(
+                "Canary batch {Index}: success rate {Rate:P0} below threshold {Required:P0}",
+                completedBatch.Index, successRate, completedBatch.Requirements.MinSuccessRate);
+            return false;
+        }
+
+        // For canary batch (index 0), also check metrics
+        if (completedBatch.Index == 0)
+        {
+            var metrics = await _metricsCollector.GetCanaryMetricsAsync(
+                completedBatch.TaskIds, ct);
+
+            if (metrics.ErrorRate > 0.05)
+            {
+                _logger.LogWarning(
+                    "Canary error rate {Rate:P1} exceeds threshold",
+                    metrics.ErrorRate);
+                return false;
+            }
+        }
+
+        return true;
+    }
+}
+
+public sealed record CanaryMetrics(
+    double ErrorRate,
+    double Latency99th,
+    int RequestCount
+);
+```
+
+### AllAtOnceStrategy
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Deployment.Strategy;
+
+public sealed class AllAtOnceStrategy : IDeploymentStrategy
+{
+    private readonly ITargetHealthChecker _healthChecker;
+    private readonly ILogger<AllAtOnceStrategy> _logger;
+
+    public string Name => "all-at-once";
+
+    public Task<IReadOnlyList<DeploymentBatch>> PlanAsync(
+        DeploymentJob job,
+        CancellationToken ct = default)
+    {
+        var batches = new List<DeploymentBatch>
+        {
+            new DeploymentBatch(
+                Index: 0,
+                TaskIds: job.Tasks.Select(t => t.Id).ToImmutableArray(),
+                Requirements: new BatchRequirements(
+                    WaitForHealthCheck: job.Options.WaitForHealthCheck,
+                    MinSuccessRate: 0.8 // Allow some failures
+                )
+            )
+        };
+
+        _logger.LogInformation(
+            "All-at-once strategy: deploying to all {Count} targets simultaneously",
+            job.Tasks.Length);
+
+        return Task.FromResult<IReadOnlyList<DeploymentBatch>>(batches);
+    }
+
+    public Task<bool> ShouldProceedAsync(
+        DeploymentBatch completedBatch,
+        CancellationToken ct = default)
+    {
+        // Single batch, always "proceed" (nothing to proceed to)
+        return Task.FromResult(true);
+    }
+}
+```
+
+### DeploymentStrategyFactory
+
+```csharp
+namespace
+StellaOps.ReleaseOrchestrator.Deployment.Strategy;
+
+public interface IDeploymentStrategyFactory
+{
+    IDeploymentStrategy Create(DeploymentStrategy strategy);
+    IReadOnlyList<string> GetAvailableStrategies();
+}
+
+public sealed class DeploymentStrategyFactory : IDeploymentStrategyFactory
+{
+    private readonly IServiceProvider _serviceProvider;
+    private readonly ILogger<DeploymentStrategyFactory> _logger;
+
+    private static readonly Dictionary<DeploymentStrategy, Type> StrategyTypes = new()
+    {
+        [DeploymentStrategy.Rolling] = typeof(RollingStrategy),
+        [DeploymentStrategy.BlueGreen] = typeof(BlueGreenStrategy),
+        [DeploymentStrategy.Canary] = typeof(CanaryStrategy),
+        [DeploymentStrategy.AllAtOnce] = typeof(AllAtOnceStrategy)
+    };
+
+    public IDeploymentStrategy Create(DeploymentStrategy strategy)
+    {
+        if (!StrategyTypes.TryGetValue(strategy, out var type))
+        {
+            throw new UnsupportedStrategyException(strategy);
+        }
+
+        var instance = _serviceProvider.GetRequiredService(type) as IDeploymentStrategy;
+        if (instance is null)
+        {
+            throw new StrategyCreationException(strategy);
+        }
+
+        _logger.LogDebug("Created {Strategy} deployment strategy", strategy);
+        return instance;
+    }
+
+    public IReadOnlyList<string> GetAvailableStrategies() =>
+        StrategyTypes.Keys.Select(s => s.ToString()).ToList().AsReadOnly();
+}
+```
+
+---
+
+## Acceptance Criteria
+
+- [ ] Rolling strategy batches targets
+- [ ] Rolling strategy checks health between batches
+- [ ] Blue-green deploys all then switches
+- [ ] Canary deploys incrementally
+- [ ] Canary checks metrics after canary batch
+- [ ] All-at-once deploys simultaneously
+- [ ] Strategy factory creates correct type
+- [ ] Batch size parsing works
+- [ ] Unit test coverage >=85%
+
+---
+
+## Dependencies
+
+| Dependency | Type | Status |
+|------------|------|--------|
+| 107_002 Target Executor | Internal | TODO |
+| 103_002 Target Registry | Internal | TODO |
+
+---
+
+## Delivery Tracker
+
+| Deliverable | Status | Notes |
+|-------------|--------|-------|
+| IDeploymentStrategy | TODO | |
DeploymentStrategyFactory | TODO | | +| RollingStrategy | TODO | | +| BlueGreenStrategy | TODO | | +| CanaryStrategy | TODO | | +| AllAtOnceStrategy | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_108_000_INDEX_agents.md b/docs/implplan/SPRINT_20260110_108_000_INDEX_agents.md new file mode 100644 index 000000000..d1e45c1bd --- /dev/null +++ b/docs/implplan/SPRINT_20260110_108_000_INDEX_agents.md @@ -0,0 +1,291 @@ +# SPRINT INDEX: Phase 8 - Agents + +> **Epic:** Release Orchestrator +> **Phase:** 8 - Agents +> **Batch:** 108 +> **Status:** TODO +> **Parent:** [100_000_INDEX](SPRINT_20260110_100_000_INDEX_release_orchestrator.md) + +--- + +## Overview + +Phase 8 implements the deployment Agents - lightweight, secure executors that run on target hosts to perform container operations. + +### Objectives + +- Agent core runtime with gRPC communication +- Docker agent for standalone containers +- Compose agent for docker-compose deployments +- SSH agent for remote execution +- WinRM agent for Windows hosts +- ECS agent for AWS Elastic Container Service +- Nomad agent for HashiCorp Nomad + +--- + +## Sprint Structure + +| Sprint ID | Title | Module | Status | Dependencies | +|-----------|-------|--------|--------|--------------| +| 108_001 | Agent Core Runtime | AGENTS | TODO | 103_003 | +| 108_002 | Agent - Docker | AGENTS | TODO | 108_001 | +| 108_003 | Agent - Compose | AGENTS | TODO | 108_002 | +| 108_004 | Agent - SSH | AGENTS | TODO | 108_001 | +| 108_005 | Agent - WinRM | AGENTS | TODO | 108_001 | +| 108_006 | Agent - ECS | AGENTS | TODO | 108_001 | +| 108_007 | Agent - Nomad | AGENTS | TODO | 108_001 | + +--- + +## Architecture + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ AGENT SYSTEM │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ AGENT CORE RUNTIME 
(108_001) │ │ +│ │ │ │ +│ │ ┌─────────────────────────────────────────────────────────────────┐ │ │ +│ │ │ Stella Agent │ │ │ +│ │ │ │ │ │ +│ │ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │ │ +│ │ │ │ gRPC │ │ Task Queue │ │ Heartbeat │ │ │ │ +│ │ │ │ Server │ │ Executor │ │ Service │ │ │ │ +│ │ │ └──────────────┘ └──────────────┘ └──────────────┘ │ │ │ +│ │ │ │ │ │ +│ │ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │ │ +│ │ │ │ Credential │ │ Log │ │ Metrics │ │ │ │ +│ │ │ │ Resolver │ │ Streamer │ │ Reporter │ │ │ │ +│ │ │ └──────────────┘ └──────────────┘ └──────────────┘ │ │ │ +│ │ └─────────────────────────────────────────────────────────────────┘ │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌─────────────────────────┐ ┌─────────────────────────┐ │ +│ │ DOCKER AGENT (108_002)│ │ COMPOSE AGENT (108_003)│ │ +│ │ │ │ │ │ +│ │ - docker pull │ │ - docker compose pull │ │ +│ │ - docker run │ │ - docker compose up │ │ +│ │ - docker stop │ │ - docker compose down │ │ +│ │ - docker rm │ │ - service health check │ │ +│ │ - health check │ │ - volume management │ │ +│ │ - log streaming │ │ - network management │ │ +│ └─────────────────────────┘ └─────────────────────────┘ │ +│ │ +│ ┌─────────────────────────┐ ┌─────────────────────────┐ │ +│ │ SSH AGENT (108_004) │ │ WINRM AGENT (108_005) │ │ +│ │ │ │ │ │ +│ │ - Remote Docker ops │ │ - Windows containers │ │ +│ │ - Remote script exec │ │ - IIS management │ │ +│ │ - File transfer │ │ - Windows services │ │ +│ │ - SSH key auth │ │ - PowerShell execution │ │ +│ └─────────────────────────┘ └─────────────────────────┘ │ +│ │ +│ ┌─────────────────────────┐ ┌─────────────────────────┐ │ +│ │ ECS AGENT (108_006) │ │ NOMAD AGENT (108_007) │ │ +│ │ │ │ │ │ +│ │ - ECS service deploy │ │ - Nomad job deploy │ │ +│ │ - Task execution │ │ - Job scaling │ │ +│ │ - Service scaling │ │ - Allocation health │ │ +│ │ - CloudWatch logs │ │ - Log streaming │ │ +│ │ - Fargate + 
EC2 │ │ - Multiple drivers │ │ +│ └─────────────────────────┘ └─────────────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +--- + +## Deliverables Summary + +### 108_001: Agent Core Runtime + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `AgentHost` | Service | Main agent process | +| `GrpcAgentServer` | gRPC | Task receiver | +| `TaskExecutor` | Class | Task execution | +| `HeartbeatService` | Service | Health reporting | +| `CredentialResolver` | Class | Secret resolution | +| `LogStreamer` | Class | Log forwarding | + +### 108_002: Agent - Docker + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `DockerCapability` | Capability | Docker operations | +| `DockerPullTask` | Task | Pull images | +| `DockerRunTask` | Task | Create/start containers | +| `DockerStopTask` | Task | Stop containers | +| `DockerHealthCheck` | Task | Container health | + +### 108_003: Agent - Compose + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `ComposeCapability` | Capability | Compose operations | +| `ComposePullTask` | Task | Pull compose images | +| `ComposeUpTask` | Task | Deploy compose stack | +| `ComposeDownTask` | Task | Remove compose stack | +| `ComposeScaleTask` | Task | Scale services | + +### 108_004: Agent - SSH + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `SshCapability` | Capability | SSH operations | +| `SshExecuteTask` | Task | Remote command execution | +| `SshFileTransferTask` | Task | SCP file transfer | +| `SshTunnelTask` | Task | SSH tunneling | + +### 108_005: Agent - WinRM + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `WinRmCapability` | Capability | WinRM operations | +| `PowerShellTask` | Task | PowerShell execution | +| `WindowsServiceTask` | Task | Service management | +| `WindowsContainerTask` | Task | Windows container ops | + +### 
108_006: Agent - ECS + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `EcsCapability` | Capability | AWS ECS operations | +| `EcsDeployServiceTask` | Task | Deploy/update ECS services | +| `EcsRunTaskTask` | Task | Run one-off ECS tasks | +| `EcsStopTaskTask` | Task | Stop running tasks | +| `EcsScaleServiceTask` | Task | Scale services | +| `EcsHealthCheckTask` | Task | Service health check | +| `CloudWatchLogStreamer` | Class | Log streaming | + +### 108_007: Agent - Nomad + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `NomadCapability` | Capability | Nomad operations | +| `NomadDeployJobTask` | Task | Deploy Nomad jobs | +| `NomadStopJobTask` | Task | Stop jobs | +| `NomadScaleJobTask` | Task | Scale task groups | +| `NomadHealthCheckTask` | Task | Job health check | +| `NomadDispatchJobTask` | Task | Dispatch parameterized jobs | +| `NomadLogStreamer` | Class | Allocation log streaming | + +--- + +## Agent Protocol (gRPC) + +```protobuf +syntax = "proto3"; +package stella.agent.v1; + +service AgentService { + // Task execution + rpc ExecuteTask(TaskRequest) returns (stream TaskProgress); + rpc CancelTask(CancelTaskRequest) returns (CancelTaskResponse); + + // Health and status + rpc Heartbeat(HeartbeatRequest) returns (HeartbeatResponse); + rpc GetStatus(StatusRequest) returns (StatusResponse); + + // Logs + rpc StreamLogs(LogStreamRequest) returns (stream LogEntry); +} + +message TaskRequest { + string task_id = 1; + string task_type = 2; + bytes payload = 3; + map credentials = 4; +} + +message TaskProgress { + string task_id = 1; + TaskState state = 2; + int32 progress_percent = 3; + string message = 4; + bytes result = 5; +} + +enum TaskState { + PENDING = 0; + RUNNING = 1; + SUCCEEDED = 2; + FAILED = 3; + CANCELLED = 4; +} +``` + +--- + +## Agent Security + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ AGENT SECURITY MODEL │ +│ │ +│ Registration 
Flow: │ +│ ┌─────────────┐ 1. Get token ┌─────────────┐ │ +│ │ Admin │ ───────────────► │ Orchestrator │ │ +│ └─────────────┘ └──────┬──────┘ │ +│ │ 2. Generate one-time token │ +│ ▼ │ +│ ┌─────────────┐ 3. Register ┌─────────────┐ │ +│ │ Agent │ ───────────────► │ Orchestrator │ │ +│ │ (token) │ └──────┬──────┘ │ +│ └─────────────┘ │ 4. Issue mTLS certificate │ +│ ▼ │ +│ ┌─────────────┐ 5. Connect ┌─────────────┐ │ +│ │ Agent │ ◄───────────────►│ Orchestrator │ │ +│ │ (mTLS) │ (gRPC) └─────────────┘ │ +│ └─────────────┘ │ +│ │ +│ Security Controls: │ +│ - mTLS with short-lived certificates (24h) │ +│ - Capability-based authorization │ +│ - Task-scoped credentials (never stored) │ +│ - Audit logging of all operations │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +--- + +## Dependencies + +| Module | Purpose | +|--------|---------| +| 103_003 Agent Manager | Registration | +| 107_002 Target Executor | Task dispatch | +| Docker.DotNet | Docker API | +| AWSSDK.ECS | AWS ECS API | +| AWSSDK.CloudWatchLogs | AWS CloudWatch Logs | +| Nomad.Api (custom) | Nomad HTTP API | + +--- + +## Acceptance Criteria + +- [ ] Agent registers with one-time token +- [ ] mTLS established after registration +- [ ] Heartbeat updates agent status +- [ ] Docker pull/run/stop works +- [ ] Compose up/down works +- [ ] SSH remote execution works +- [ ] WinRM PowerShell works +- [ ] ECS service deploy/scale works +- [ ] ECS task run/stop works +- [ ] Nomad job deploy/stop works +- [ ] Nomad job scaling works +- [ ] Log streaming works (Docker, CloudWatch, Nomad) +- [ ] Credentials resolved at runtime +- [ ] Unit test coverage ≥80% + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Phase 8 index created | +| 10-Jan-2026 | Added ECS agent (108_006) and Nomad agent (108_007) sprints per feature completeness review | diff --git a/docs/implplan/SPRINT_20260110_108_001_AGENTS_core_runtime.md 
b/docs/implplan/SPRINT_20260110_108_001_AGENTS_core_runtime.md
new file mode 100644
index 000000000..a38938d28
--- /dev/null
+++ b/docs/implplan/SPRINT_20260110_108_001_AGENTS_core_runtime.md
@@ -0,0 +1,776 @@
+# SPRINT: Agent Core Runtime
+
+> **Sprint ID:** 108_001
+> **Module:** AGENTS
+> **Phase:** 8 - Agents
+> **Status:** TODO
+> **Parent:** [108_000_INDEX](SPRINT_20260110_108_000_INDEX_agents.md)
+
+---
+
+## Overview
+
+Implement the Agent Core Runtime - the foundational process that runs on target hosts to receive and execute deployment tasks.
+
+### Objectives
+
+- Agent host process with lifecycle management
+- gRPC server for task reception
+- Heartbeat service for health reporting
+- Credential resolution at runtime
+- Log streaming to orchestrator
+- Capability registration system
+
+### Working Directory
+
+```
+src/ReleaseOrchestrator/
+├── __Agents/
+│   └── StellaOps.Agent.Core/
+│       ├── AgentHost.cs
+│       ├── AgentConfiguration.cs
+│       ├── GrpcAgentServer.cs
+│       ├── TaskExecutor.cs
+│       ├── HeartbeatService.cs
+│       ├── CredentialResolver.cs
+│       ├── LogStreamer.cs
+│       └── CapabilityRegistry.cs
+└── __Tests/
+```
+
+---
+
+## Deliverables
+
+### AgentConfiguration
+
+```csharp
+namespace StellaOps.Agent.Core;
+
+public sealed class AgentConfiguration
+{
+    public required string AgentId { get; set; }
+    public required string AgentName { get; set; }
+    public required string OrchestratorUrl { get; set; }
+    public required string CertificatePath { get; set; }
+    public required string PrivateKeyPath { get; set; }
+    public required string CaCertificatePath { get; set; }
+    public int GrpcPort { get; set; } = 50051;
+    public TimeSpan HeartbeatInterval { get; set; } = TimeSpan.FromSeconds(30);
+    public TimeSpan TaskTimeout { get; set; } = TimeSpan.FromMinutes(30);
+    public IReadOnlyList<string> EnabledCapabilities { get; set; } = [];
+}
+```
+
+### IAgentCapability Interface
+
+```csharp
+namespace StellaOps.Agent.Core;
+
+public interface IAgentCapability
+{
+    string Name { get; 
}
+    string Version { get; }
+    IReadOnlyList<string> SupportedTaskTypes { get; }
+    Task<bool> InitializeAsync(CancellationToken ct = default);
+    Task<TaskResult> ExecuteAsync(AgentTask task, CancellationToken ct = default);
+    Task<CapabilityHealthStatus> CheckHealthAsync(CancellationToken ct = default);
+}
+
+public sealed record CapabilityHealthStatus(
+    bool IsHealthy,
+    string? Message = null,
+    IReadOnlyDictionary<string, object>? Details = null
+);
+```
+
+### CapabilityRegistry
+
+```csharp
+namespace StellaOps.Agent.Core;
+
+public sealed class CapabilityRegistry
+{
+    private readonly Dictionary<string, IAgentCapability> _capabilities = new();
+    private readonly ILogger _logger;
+
+    public void Register(IAgentCapability capability)
+    {
+        if (_capabilities.ContainsKey(capability.Name))
+        {
+            throw new CapabilityAlreadyRegisteredException(capability.Name);
+        }
+
+        _capabilities[capability.Name] = capability;
+        _logger.LogInformation(
+            "Registered capability {Name} v{Version} with tasks: {Tasks}",
+            capability.Name,
+            capability.Version,
+            string.Join(", ", capability.SupportedTaskTypes));
+    }
+
+    public IAgentCapability? 
GetForTaskType(string taskType)
+    {
+        return _capabilities.Values
+            .FirstOrDefault(c => c.SupportedTaskTypes.Contains(taskType));
+    }
+
+    public IReadOnlyList<CapabilityInfo> GetCapabilities()
+    {
+        return _capabilities.Values.Select(c => new CapabilityInfo(
+            c.Name,
+            c.Version,
+            c.SupportedTaskTypes.ToImmutableArray()
+        )).ToList().AsReadOnly();
+    }
+
+    public async Task InitializeAllAsync(CancellationToken ct = default)
+    {
+        foreach (var (name, capability) in _capabilities)
+        {
+            var success = await capability.InitializeAsync(ct);
+            if (!success)
+            {
+                _logger.LogWarning("Capability {Name} failed to initialize", name);
+            }
+        }
+    }
+}
+
+public sealed record CapabilityInfo(
+    string Name,
+    string Version,
+    ImmutableArray<string> SupportedTaskTypes
+);
+```
+
+### AgentTask Model
+
+```csharp
+namespace StellaOps.Agent.Core;
+
+public sealed record AgentTask
+{
+    public required Guid Id { get; init; }
+    public required string TaskType { get; init; }
+    public required string Payload { get; init; }
+    public required IReadOnlyDictionary<string, string> Credentials { get; init; }
+    public required IReadOnlyDictionary<string, string> Variables { get; init; }
+    public DateTimeOffset ReceivedAt { get; init; }
+    public TimeSpan Timeout { get; init; } = TimeSpan.FromMinutes(30);
+}
+
+public sealed record TaskResult
+{
+    public required Guid TaskId { get; init; }
+    public required bool Success { get; init; }
+    public string? Error { get; init; }
+    public IReadOnlyDictionary<string, object> Outputs { get; init; } = new Dictionary<string, object>();
+    public DateTimeOffset CompletedAt { get; init; }
+    public TimeSpan Duration { get; init; }
+}
+```
+
+### TaskExecutor
+
+```csharp
+namespace StellaOps.Agent.Core;
+
+public sealed class TaskExecutor
+{
+    private readonly CapabilityRegistry _capabilities;
+    private readonly CredentialResolver _credentialResolver;
+    private readonly ILogger _logger;
+    private readonly ConcurrentDictionary<Guid, CancellationTokenSource> _runningTasks = new();
+
+    public async Task<TaskResult> ExecuteAsync(
+        AgentTask task,
+        IProgress<TaskProgress>? 
progress = null, + CancellationToken ct = default) + { + var capability = _capabilities.GetForTaskType(task.TaskType) + ?? throw new UnsupportedTaskTypeException(task.TaskType); + + using var taskCts = new CancellationTokenSource(task.Timeout); + using var linkedCts = CancellationTokenSource.CreateLinkedTokenSource(ct, taskCts.Token); + + _runningTasks[task.Id] = linkedCts; + + var stopwatch = Stopwatch.StartNew(); + + try + { + _logger.LogInformation( + "Executing task {TaskId} of type {TaskType}", + task.Id, task.TaskType); + + progress?.Report(new TaskProgress(task.Id, TaskState.Running, 0, "Starting")); + + // Resolve credentials + var resolvedTask = await ResolveCredentialsAsync(task, linkedCts.Token); + + // Execute via capability + var result = await capability.ExecuteAsync(resolvedTask, linkedCts.Token); + + progress?.Report(new TaskProgress( + task.Id, + result.Success ? TaskState.Succeeded : TaskState.Failed, + 100, + result.Success ? "Completed" : result.Error ?? "Failed")); + + _logger.LogInformation( + "Task {TaskId} completed with status {Status} in {Duration}ms", + task.Id, + result.Success ? 
"success" : "failure",
+                stopwatch.ElapsedMilliseconds);
+
+            return result with { Duration = stopwatch.Elapsed };
+        }
+        catch (OperationCanceledException) when (taskCts.IsCancellationRequested)
+        {
+            _logger.LogWarning("Task {TaskId} timed out after {Timeout}", task.Id, task.Timeout);
+
+            progress?.Report(new TaskProgress(task.Id, TaskState.Failed, 0, "Timeout"));
+
+            return new TaskResult
+            {
+                TaskId = task.Id,
+                Success = false,
+                Error = $"Task timed out after {task.Timeout}",
+                CompletedAt = DateTimeOffset.UtcNow,
+                Duration = stopwatch.Elapsed
+            };
+        }
+        catch (OperationCanceledException)
+        {
+            _logger.LogInformation("Task {TaskId} was cancelled", task.Id);
+
+            progress?.Report(new TaskProgress(task.Id, TaskState.Cancelled, 0, "Cancelled"));
+
+            return new TaskResult
+            {
+                TaskId = task.Id,
+                Success = false,
+                Error = "Task was cancelled",
+                CompletedAt = DateTimeOffset.UtcNow,
+                Duration = stopwatch.Elapsed
+            };
+        }
+        catch (Exception ex)
+        {
+            _logger.LogError(ex, "Task {TaskId} failed with exception", task.Id);
+
+            progress?.Report(new TaskProgress(task.Id, TaskState.Failed, 0, ex.Message));
+
+            return new TaskResult
+            {
+                TaskId = task.Id,
+                Success = false,
+                Error = ex.Message,
+                CompletedAt = DateTimeOffset.UtcNow,
+                Duration = stopwatch.Elapsed
+            };
+        }
+        finally
+        {
+            _runningTasks.TryRemove(task.Id, out _);
+        }
+    }
+
+    public bool CancelTask(Guid taskId)
+    {
+        if (_runningTasks.TryGetValue(taskId, out var cts))
+        {
+            cts.Cancel();
+            return true;
+        }
+        return false;
+    }
+
+    private async Task<AgentTask> ResolveCredentialsAsync(AgentTask task, CancellationToken ct)
+    {
+        var resolvedCredentials = new Dictionary<string, string>();
+
+        foreach (var (key, value) in task.Credentials)
+        {
+            resolvedCredentials[key] = await _credentialResolver.ResolveAsync(value, ct);
+        }
+
+        return task with { Credentials = resolvedCredentials };
+    }
+}
+
+public sealed record TaskProgress(
+    Guid TaskId,
+    TaskState State,
+    int ProgressPercent,
+    string Message
+);
+
+public enum TaskState
+{
+    
Pending,
+    Running,
+    Succeeded,
+    Failed,
+    Cancelled
+}
+```
+
+### CredentialResolver
+
+```csharp
+namespace StellaOps.Agent.Core;
+
+public sealed class CredentialResolver
+{
+    private readonly IEnumerable<ICredentialProvider> _providers;
+    private readonly ILogger _logger;
+
+    public async Task<string> ResolveAsync(string reference, CancellationToken ct = default)
+    {
+        // Reference format: provider://path
+        // e.g., env://DB_PASSWORD, file:///etc/secrets/api-key, vault://secrets/myapp/apikey
+
+        var parsed = ParseReference(reference);
+        if (parsed is null)
+        {
+            // Not a reference, return as-is (literal value)
+            return reference;
+        }
+
+        var provider = _providers.FirstOrDefault(p => p.Scheme == parsed.Scheme)
+            ?? throw new UnknownCredentialProviderException(parsed.Scheme);
+
+        var value = await provider.GetSecretAsync(parsed.Path, ct);
+        if (value is null)
+        {
+            throw new CredentialNotFoundException(reference);
+        }
+
+        _logger.LogDebug("Resolved credential reference {Scheme}://***", parsed.Scheme);
+        return value;
+    }
+
+    private static CredentialReference? 
ParseReference(string reference)
+    {
+        if (string.IsNullOrEmpty(reference))
+            return null;
+
+        var match = Regex.Match(reference, @"^([a-z]+)://(.+)$");
+        if (!match.Success)
+            return null;
+
+        return new CredentialReference(match.Groups[1].Value, match.Groups[2].Value);
+    }
+}
+
+public interface ICredentialProvider
+{
+    string Scheme { get; }
+    Task<string?> GetSecretAsync(string path, CancellationToken ct = default);
+}
+
+public sealed class EnvironmentCredentialProvider : ICredentialProvider
+{
+    public string Scheme => "env";
+
+    public Task<string?> GetSecretAsync(string path, CancellationToken ct = default)
+    {
+        return Task.FromResult(Environment.GetEnvironmentVariable(path));
+    }
+}
+
+public sealed class FileCredentialProvider : ICredentialProvider
+{
+    public string Scheme => "file";
+
+    public async Task<string?> GetSecretAsync(string path, CancellationToken ct = default)
+    {
+        if (!File.Exists(path))
+            return null;
+
+        return (await File.ReadAllTextAsync(path, ct)).Trim();
+    }
+}
+
+internal sealed record CredentialReference(string Scheme, string Path);
+```
+
+### HeartbeatService
+
+```csharp
+namespace StellaOps.Agent.Core;
+
+public sealed class HeartbeatService : BackgroundService
+{
+    private readonly AgentConfiguration _config;
+    private readonly CapabilityRegistry _capabilities;
+    private readonly IOrchestratorClient _orchestratorClient;
+    private readonly ILogger _logger;
+
+    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
+    {
+        _logger.LogInformation("Heartbeat service started");
+
+        while (!stoppingToken.IsCancellationRequested)
+        {
+            try
+            {
+                await SendHeartbeatAsync(stoppingToken);
+            }
+            catch (Exception ex)
+            {
+                _logger.LogWarning(ex, "Failed to send heartbeat");
+            }
+
+            await Task.Delay(_config.HeartbeatInterval, stoppingToken);
+        }
+    }
+
+    private async Task SendHeartbeatAsync(CancellationToken ct)
+    {
+        var capabilities = _capabilities.GetCapabilities();
+        var health = await CheckCapabilityHealthAsync(ct);
+
+        var heartbeat = new 
AgentHeartbeat
+        {
+            AgentId = _config.AgentId,
+            Timestamp = DateTimeOffset.UtcNow,
+            Status = health.AllHealthy ? AgentStatus.Active : AgentStatus.Degraded,
+            Capabilities = capabilities,
+            SystemInfo = GetSystemInfo(),
+            RunningTaskCount = GetRunningTaskCount(),
+            HealthDetails = health.Details
+        };
+
+        await _orchestratorClient.SendHeartbeatAsync(heartbeat, ct);
+
+        _logger.LogDebug(
+            "Heartbeat sent: status={Status}, tasks={TaskCount}",
+            heartbeat.Status,
+            heartbeat.RunningTaskCount);
+    }
+
+    private async Task<HealthCheckResult> CheckCapabilityHealthAsync(CancellationToken ct)
+    {
+        var details = new Dictionary<string, object>();
+        var allHealthy = true;
+
+        foreach (var capability in _capabilities.GetCapabilities())
+        {
+            var cap = _capabilities.GetForTaskType(capability.SupportedTaskTypes.First());
+            if (cap is null) continue;
+
+            var health = await cap.CheckHealthAsync(ct);
+            details[capability.Name] = new { health.IsHealthy, health.Message };
+            allHealthy = allHealthy && health.IsHealthy;
+        }
+
+        return new HealthCheckResult(allHealthy, details);
+    }
+
+    private static SystemInfo GetSystemInfo()
+    {
+        return new SystemInfo
+        {
+            Hostname = Environment.MachineName,
+            OsDescription = RuntimeInformation.OSDescription,
+            ProcessorCount = Environment.ProcessorCount,
+            MemoryBytes = GC.GetGCMemoryInfo().TotalAvailableMemoryBytes
+        };
+    }
+
+    private int GetRunningTaskCount()
+    {
+        // Implementation would get from TaskExecutor
+        return 0;
+    }
+}
+
+public sealed record AgentHeartbeat
+{
+    public required string AgentId { get; init; }
+    public required DateTimeOffset Timestamp { get; init; }
+    public required AgentStatus Status { get; init; }
+    public required IReadOnlyList<CapabilityInfo> Capabilities { get; init; }
+    public required SystemInfo SystemInfo { get; init; }
+    public int RunningTaskCount { get; init; }
+    public IReadOnlyDictionary<string, object>? 
HealthDetails { get; init; }
+}
+
+public sealed record SystemInfo
+{
+    public required string Hostname { get; init; }
+    public required string OsDescription { get; init; }
+    public required int ProcessorCount { get; init; }
+    public required long MemoryBytes { get; init; }
+}
+
+public enum AgentStatus
+{
+    Inactive,
+    Active,
+    Degraded,
+    Disconnected
+}
+
+internal sealed record HealthCheckResult(
+    bool AllHealthy,
+    IReadOnlyDictionary<string, object> Details
+);
+```
+
+### LogStreamer
+
+```csharp
+namespace StellaOps.Agent.Core;
+
+public sealed class LogStreamer : IAsyncDisposable
+{
+    private readonly IOrchestratorClient _orchestratorClient;
+    private readonly Channel<LogEntry> _logChannel;
+    private readonly ILogger _logger;
+    private readonly CancellationTokenSource _cts = new();
+    private readonly Task _streamTask;
+
+    public LogStreamer(IOrchestratorClient orchestratorClient, ILogger logger)
+    {
+        _orchestratorClient = orchestratorClient;
+        _logger = logger;
+        _logChannel = Channel.CreateBounded<LogEntry>(new BoundedChannelOptions(10000)
+        {
+            FullMode = BoundedChannelFullMode.DropOldest
+        });
+
+        _streamTask = StreamLogsAsync(_cts.Token);
+    }
+
+    public void Log(Guid taskId, LogLevel level, string message)
+    {
+        var entry = new LogEntry
+        {
+            TaskId = taskId,
+            Timestamp = DateTimeOffset.UtcNow,
+            Level = level,
+            Message = message
+        };
+
+        if (!_logChannel.Writer.TryWrite(entry))
+        {
+            _logger.LogWarning("Log channel full, dropping log entry");
+        }
+    }
+
+    private async Task StreamLogsAsync(CancellationToken ct)
+    {
+        var batch = new List<LogEntry>();
+        var batchTimeout = TimeSpan.FromMilliseconds(100);
+
+        while (!ct.IsCancellationRequested)
+        {
+            try
+            {
+                // Collect logs for batching
+                using var timeoutCts = new CancellationTokenSource(batchTimeout);
+                using var linkedCts = CancellationTokenSource.CreateLinkedTokenSource(ct, timeoutCts.Token);
+
+                while (batch.Count < 100)
+                {
+                    if (_logChannel.Reader.TryRead(out var entry))
+                    {
+                        batch.Add(entry);
+                    }
+                    else
+                    {
+                        await 
_logChannel.Reader.WaitToReadAsync(linkedCts.Token); + } + } + } + catch (OperationCanceledException) when (!ct.IsCancellationRequested) + { + // Timeout, send what we have + } + + if (batch.Count > 0) + { + try + { + await _orchestratorClient.SendLogsAsync(batch, ct); + batch.Clear(); + } + catch (Exception ex) + { + _logger.LogWarning(ex, "Failed to send logs, will retry"); + } + } + } + } + + public async ValueTask DisposeAsync() + { + _cts.Cancel(); + await _streamTask; + _cts.Dispose(); + } +} + +public sealed record LogEntry +{ + public required Guid TaskId { get; init; } + public required DateTimeOffset Timestamp { get; init; } + public required LogLevel Level { get; init; } + public required string Message { get; init; } +} +``` + +### AgentHost + +```csharp +namespace StellaOps.Agent.Core; + +public sealed class AgentHost : IHostedService +{ + private readonly AgentConfiguration _config; + private readonly CapabilityRegistry _capabilities; + private readonly GrpcAgentServer _grpcServer; + private readonly HeartbeatService _heartbeatService; + private readonly IOrchestratorClient _orchestratorClient; + private readonly ILogger _logger; + + public async Task StartAsync(CancellationToken cancellationToken) + { + _logger.LogInformation( + "Starting Stella Agent {Name} ({Id})", + _config.AgentName, + _config.AgentId); + + // Initialize capabilities + await _capabilities.InitializeAllAsync(cancellationToken); + + // Connect to orchestrator + await _orchestratorClient.ConnectAsync(cancellationToken); + + // Start gRPC server + await _grpcServer.StartAsync(cancellationToken); + + _logger.LogInformation( + "Agent started on port {Port} with {Count} capabilities", + _config.GrpcPort, + _capabilities.GetCapabilities().Count); + } + + public async Task StopAsync(CancellationToken cancellationToken) + { + _logger.LogInformation("Stopping Stella Agent"); + + await _grpcServer.StopAsync(cancellationToken); + await _orchestratorClient.DisconnectAsync(cancellationToken); + 
+ _logger.LogInformation("Agent stopped"); + } +} +``` + +### GrpcAgentServer + +```csharp +namespace StellaOps.Agent.Core; + +public sealed class GrpcAgentServer +{ + private readonly AgentConfiguration _config; + private readonly TaskExecutor _taskExecutor; + private readonly LogStreamer _logStreamer; + private readonly ILogger _logger; + private Server? _server; + + public Task StartAsync(CancellationToken ct = default) + { + var serverCredentials = BuildServerCredentials(); + + _server = new Server + { + Services = { AgentService.BindService(new AgentServiceImpl(_taskExecutor, _logStreamer)) }, + Ports = { new ServerPort("0.0.0.0", _config.GrpcPort, serverCredentials) } + }; + + _server.Start(); + _logger.LogInformation("gRPC server started on port {Port}", _config.GrpcPort); + + return Task.CompletedTask; + } + + public async Task StopAsync(CancellationToken ct = default) + { + if (_server is not null) + { + await _server.ShutdownAsync(); + _logger.LogInformation("gRPC server stopped"); + } + } + + private ServerCredentials BuildServerCredentials() + { + var cert = File.ReadAllText(_config.CertificatePath); + var key = File.ReadAllText(_config.PrivateKeyPath); + var caCert = File.ReadAllText(_config.CaCertificatePath); + + var keyCertPair = new KeyCertificatePair(cert, key); + + return new SslServerCredentials( + new[] { keyCertPair }, + caCert, + SslClientCertificateRequestType.RequestAndRequireAndVerify); + } +} +``` + +--- + +## Acceptance Criteria + +- [ ] Agent process starts and runs as service +- [ ] gRPC server accepts mTLS connections +- [ ] Capabilities register at startup +- [ ] Tasks execute via correct capability +- [ ] Task cancellation works +- [ ] Heartbeat sends to orchestrator +- [ ] Credentials resolve at runtime +- [ ] Logs stream to orchestrator +- [ ] Unit test coverage >=85% + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 103_003 Agent Manager | Internal | TODO | +| Grpc.AspNetCore | NuGet | 
Available | +| Google.Protobuf | NuGet | Available | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| AgentConfiguration | TODO | | +| IAgentCapability | TODO | | +| CapabilityRegistry | TODO | | +| TaskExecutor | TODO | | +| CredentialResolver | TODO | | +| HeartbeatService | TODO | | +| LogStreamer | TODO | | +| AgentHost | TODO | | +| GrpcAgentServer | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_108_002_AGENTS_docker.md b/docs/implplan/SPRINT_20260110_108_002_AGENTS_docker.md new file mode 100644 index 000000000..328c10d0b --- /dev/null +++ b/docs/implplan/SPRINT_20260110_108_002_AGENTS_docker.md @@ -0,0 +1,936 @@ +# SPRINT: Agent - Docker + +> **Sprint ID:** 108_002 +> **Module:** AGENTS +> **Phase:** 8 - Agents +> **Status:** TODO +> **Parent:** [108_000_INDEX](SPRINT_20260110_108_000_INDEX_agents.md) + +--- + +## Overview + +Implement the Docker Agent capability for managing standalone Docker containers on target hosts. 
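+
+For concreteness, here is a purely illustrative `docker.run` task payload as the agent might receive it. The image name, ports, and volume paths are hypothetical, and the field names assume the default System.Text.Json binding of the `RunPayload` record defined later in this sprint:
+
+```json
+{
+  "Image": "registry.example.internal/myapp",
+  "Name": "myapp-web",
+  "Environment": {
+    "ASPNETCORE_URLS": "http://+:8080",
+    "DB_PASSWORD": "${DB_PASSWORD}"
+  },
+  "Ports": ["8080:8080/tcp"],
+  "Volumes": ["/srv/myapp/data:/data"],
+  "RestartPolicy": { "Name": "always", "MaximumRetryCount": 0 }
+}
+```
+
+The `${DB_PASSWORD}` placeholder would be substituted from the task's variable map by `SubstituteVariables` at deploy time, so the raw payload never carries the secret value.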
+
+### Objectives
+
+- Docker image pull operations
+- Container creation and start
+- Container stop and removal
+- Container health checking
+- Log streaming from containers
+- Registry authentication
+
+### Working Directory
+
+```
+src/ReleaseOrchestrator/
+├── __Agents/
+│   └── StellaOps.Agent.Docker/
+│       ├── DockerCapability.cs
+│       ├── Tasks/
+│       │   ├── DockerPullTask.cs
+│       │   ├── DockerRunTask.cs
+│       │   ├── DockerStopTask.cs
+│       │   ├── DockerRemoveTask.cs
+│       │   └── DockerHealthCheckTask.cs
+│       ├── DockerClientFactory.cs
+│       └── ContainerLogStreamer.cs
+└── __Tests/
+```
+
+---
+
+## Deliverables
+
+### DockerCapability
+
+```csharp
+namespace StellaOps.Agent.Docker;
+
+public sealed class DockerCapability : IAgentCapability
+{
+    private readonly IDockerClient _dockerClient;
+    private readonly ILogger _logger;
+    private readonly Dictionary<string, Func<AgentTask, CancellationToken, Task<TaskResult>>> _taskHandlers;
+
+    public string Name => "docker";
+    public string Version => "1.0.0";
+
+    public IReadOnlyList<string> SupportedTaskTypes => new[]
+    {
+        "docker.pull",
+        "docker.run",
+        "docker.stop",
+        "docker.remove",
+        "docker.health-check",
+        "docker.logs"
+    };
+
+    public DockerCapability(IDockerClient dockerClient, ILogger logger)
+    {
+        _dockerClient = dockerClient;
+        _logger = logger;
+
+        _taskHandlers = new Dictionary<string, Func<AgentTask, CancellationToken, Task<TaskResult>>>
+        {
+            ["docker.pull"] = ExecutePullAsync,
+            ["docker.run"] = ExecuteRunAsync,
+            ["docker.stop"] = ExecuteStopAsync,
+            ["docker.remove"] = ExecuteRemoveAsync,
+            ["docker.health-check"] = ExecuteHealthCheckAsync
+        };
+    }
+
+    public async Task<bool> InitializeAsync(CancellationToken ct = default)
+    {
+        try
+        {
+            var version = await _dockerClient.System.GetVersionAsync(ct);
+            _logger.LogInformation(
+                "Docker capability initialized: Docker {Version} on {OS}",
+                version.Version,
+                version.Os);
+            return true;
+        }
+        catch (Exception ex)
+        {
+            _logger.LogError(ex, "Failed to initialize Docker capability");
+            return false;
+        }
+    }
+
+    public async Task<TaskResult> ExecuteAsync(AgentTask task, CancellationToken ct = default)
+    {
+        if 
(!_taskHandlers.TryGetValue(task.TaskType, out var handler))
+        {
+            throw new UnsupportedTaskTypeException(task.TaskType);
+        }
+
+        return await handler(task, ct);
+    }
+
+    public async Task<CapabilityHealthStatus> CheckHealthAsync(CancellationToken ct = default)
+    {
+        try
+        {
+            await _dockerClient.System.PingAsync(ct);
+            return new CapabilityHealthStatus(true, "Docker daemon responding");
+        }
+        catch (Exception ex)
+        {
+            return new CapabilityHealthStatus(false, $"Docker daemon not responding: {ex.Message}");
+        }
+    }
+
+    private Task<TaskResult> ExecutePullAsync(AgentTask task, CancellationToken ct) =>
+        new DockerPullTask(_dockerClient, _logger).ExecuteAsync(task, ct);
+
+    private Task<TaskResult> ExecuteRunAsync(AgentTask task, CancellationToken ct) =>
+        new DockerRunTask(_dockerClient, _logger).ExecuteAsync(task, ct);
+
+    private Task<TaskResult> ExecuteStopAsync(AgentTask task, CancellationToken ct) =>
+        new DockerStopTask(_dockerClient, _logger).ExecuteAsync(task, ct);
+
+    private Task<TaskResult> ExecuteRemoveAsync(AgentTask task, CancellationToken ct) =>
+        new DockerRemoveTask(_dockerClient, _logger).ExecuteAsync(task, ct);
+
+    private Task<TaskResult> ExecuteHealthCheckAsync(AgentTask task, CancellationToken ct) =>
+        new DockerHealthCheckTask(_dockerClient, _logger).ExecuteAsync(task, ct);
+}
+```
+
+### DockerPullTask
+
+```csharp
+namespace StellaOps.Agent.Docker.Tasks;
+
+public sealed class DockerPullTask
+{
+    private readonly IDockerClient _dockerClient;
+    private readonly ILogger _logger;
+
+    public DockerPullTask(IDockerClient dockerClient, ILogger logger)
+    {
+        _dockerClient = dockerClient;
+        _logger = logger;
+    }
+
+    public sealed record PullPayload
+    {
+        public required string Image { get; init; }
+        public string? Tag { get; init; }
+        public string? Digest { get; init; }
+        public string? Registry { get; init; }
+    }
+
+    public async Task<TaskResult> ExecuteAsync(AgentTask task, CancellationToken ct)
+    {
+        var payload = JsonSerializer.Deserialize<PullPayload>(task.Payload)
+            ?? 
throw new InvalidPayloadException("docker.pull");
+
+        var imageRef = BuildImageReference(payload);
+
+        _logger.LogInformation("Pulling image {Image}", imageRef);
+
+        try
+        {
+            // Get registry credentials if provided
+            AuthConfig? authConfig = null;
+            if (task.Credentials.TryGetValue("registry.username", out var username) &&
+                task.Credentials.TryGetValue("registry.password", out var password))
+            {
+                authConfig = new AuthConfig
+                {
+                    Username = username,
+                    Password = password,
+                    ServerAddress = payload.Registry ?? "https://index.docker.io/v1/"
+                };
+            }
+
+            await _dockerClient.Images.CreateImageAsync(
+                new ImagesCreateParameters
+                {
+                    FromImage = imageRef
+                },
+                authConfig,
+                new Progress<JSONMessage>(msg =>
+                {
+                    if (!string.IsNullOrEmpty(msg.Status))
+                    {
+                        _logger.LogDebug("Pull progress: {Status}", msg.Status);
+                    }
+                }),
+                ct);
+
+            // Verify the image was pulled
+            var images = await _dockerClient.Images.ListImagesAsync(
+                new ImagesListParameters
+                {
+                    Filters = new Dictionary<string, IDictionary<string, bool>>
+                    {
+                        ["reference"] = new Dictionary<string, bool> { [imageRef] = true }
+                    }
+                },
+                ct);
+
+            if (images.Count == 0)
+            {
+                throw new ImagePullException(imageRef, "Image not found after pull");
+            }
+
+            var pulledImage = images.First();
+
+            _logger.LogInformation(
+                "Successfully pulled image {Image} (ID: {Id})",
+                imageRef,
+                pulledImage.ID[..12]);
+
+            return new TaskResult
+            {
+                TaskId = task.Id,
+                Success = true,
+                Outputs = new Dictionary<string, object>
+                {
+                    ["imageId"] = pulledImage.ID,
+                    ["size"] = pulledImage.Size,
+                    ["digest"] = payload.Digest ?? 
ExtractDigest(pulledImage)
+                },
+                CompletedAt = DateTimeOffset.UtcNow
+            };
+        }
+        catch (DockerApiException ex)
+        {
+            _logger.LogError(ex, "Failed to pull image {Image}", imageRef);
+
+            return new TaskResult
+            {
+                TaskId = task.Id,
+                Success = false,
+                Error = $"Failed to pull image: {ex.Message}",
+                CompletedAt = DateTimeOffset.UtcNow
+            };
+        }
+    }
+
+    private static string BuildImageReference(PullPayload payload)
+    {
+        var image = payload.Image;
+
+        if (!string.IsNullOrEmpty(payload.Registry))
+        {
+            image = $"{payload.Registry}/{image}";
+        }
+
+        if (!string.IsNullOrEmpty(payload.Digest))
+        {
+            return $"{image}@{payload.Digest}";
+        }
+
+        if (!string.IsNullOrEmpty(payload.Tag))
+        {
+            return $"{image}:{payload.Tag}";
+        }
+
+        return $"{image}:latest";
+    }
+
+    private static string ExtractDigest(ImagesListResponse image)
+    {
+        return image.RepoDigests.FirstOrDefault()?.Split('@').LastOrDefault() ?? "";
+    }
+}
+```
+
+### DockerRunTask
+
+```csharp
+namespace StellaOps.Agent.Docker.Tasks;
+
+public sealed class DockerRunTask
+{
+    private readonly IDockerClient _dockerClient;
+    private readonly ILogger _logger;
+
+    public DockerRunTask(IDockerClient dockerClient, ILogger logger)
+    {
+        _dockerClient = dockerClient;
+        _logger = logger;
+    }
+
+    public sealed record RunPayload
+    {
+        public required string Image { get; init; }
+        public required string Name { get; init; }
+        public IReadOnlyDictionary<string, string>? Environment { get; init; }
+        public IReadOnlyList<string>? Ports { get; init; }
+        public IReadOnlyList<string>? Volumes { get; init; }
+        public IReadOnlyDictionary<string, string>? Labels { get; init; }
+        public string? Network { get; init; }
+        public IReadOnlyList<string>? Command { get; init; }
+        public ContainerHealthConfig? HealthCheck { get; init; }
+        public bool AutoRemove { get; init; }
+        public RestartPolicy? 
RestartPolicy { get; init; }
+    }
+
+    public sealed record ContainerHealthConfig
+    {
+        public required IReadOnlyList<string> Test { get; init; }
+        public TimeSpan Interval { get; init; } = TimeSpan.FromSeconds(30);
+        public TimeSpan Timeout { get; init; } = TimeSpan.FromSeconds(10);
+        public int Retries { get; init; } = 3;
+        public TimeSpan StartPeriod { get; init; } = TimeSpan.FromSeconds(0);
+    }
+
+    public sealed record RestartPolicy
+    {
+        public string Name { get; init; } = "no";
+        public int MaximumRetryCount { get; init; }
+    }
+
+    public async Task<TaskResult> ExecuteAsync(AgentTask task, CancellationToken ct)
+    {
+        var payload = JsonSerializer.Deserialize<RunPayload>(task.Payload)
+            ?? throw new InvalidPayloadException("docker.run");
+
+        _logger.LogInformation(
+            "Creating container {Name} from image {Image}",
+            payload.Name,
+            payload.Image);
+
+        try
+        {
+            // Check if container already exists
+            var existingContainers = await _dockerClient.Containers.ListContainersAsync(
+                new ContainersListParameters
+                {
+                    All = true,
+                    Filters = new Dictionary<string, IDictionary<string, bool>>
+                    {
+                        ["name"] = new Dictionary<string, bool> { [payload.Name] = true }
+                    }
+                },
+                ct);
+
+            if (existingContainers.Any())
+            {
+                var existing = existingContainers.First();
+                _logger.LogInformation(
+                    "Container {Name} already exists (ID: {Id}), removing",
+                    payload.Name,
+                    existing.ID[..12]);
+
+                await _dockerClient.Containers.StopContainerAsync(existing.ID, new ContainerStopParameters(), ct);
+                await _dockerClient.Containers.RemoveContainerAsync(existing.ID, new ContainerRemoveParameters(), ct);
+            }
+
+            // Merge labels with Stella metadata
+            var labels = new Dictionary<string, string>(payload.Labels ?? 
new Dictionary<string, string>());
+            labels["stella.managed"] = "true";
+            labels["stella.task.id"] = task.Id.ToString();
+
+            // Build create parameters
+            var createParams = new CreateContainerParameters
+            {
+                Image = payload.Image,
+                Name = payload.Name,
+                Env = BuildEnvironment(payload.Environment, task.Variables),
+                Labels = labels,
+                Cmd = payload.Command?.ToList(),
+                HostConfig = new HostConfig
+                {
+                    PortBindings = ParsePortBindings(payload.Ports),
+                    Binds = payload.Volumes?.ToList(),
+                    NetworkMode = payload.Network,
+                    AutoRemove = payload.AutoRemove,
+                    RestartPolicy = payload.RestartPolicy is not null
+                        ? new Docker.DotNet.Models.RestartPolicy
+                        {
+                            Name = Enum.Parse<RestartPolicyKind>(payload.RestartPolicy.Name, ignoreCase: true),
+                            MaximumRetryCount = payload.RestartPolicy.MaximumRetryCount
+                        }
+                        : null
+                },
+                Healthcheck = payload.HealthCheck is not null
+                    ? new HealthConfig
+                    {
+                        Test = payload.HealthCheck.Test.ToList(),
+                        Interval = (long)payload.HealthCheck.Interval.TotalNanoseconds,
+                        Timeout = (long)payload.HealthCheck.Timeout.TotalNanoseconds,
+                        Retries = payload.HealthCheck.Retries,
+                        StartPeriod = (long)payload.HealthCheck.StartPeriod.TotalNanoseconds
+                    }
+                    : null
+            };
+
+            // Create container
+            var createResponse = await _dockerClient.Containers.CreateContainerAsync(createParams, ct);
+
+            _logger.LogInformation(
+                "Created container {Name} (ID: {Id})",
+                payload.Name,
+                createResponse.ID[..12]);
+
+            // Start container
+            var started = await _dockerClient.Containers.StartContainerAsync(
+                createResponse.ID,
+                new ContainerStartParameters(),
+                ct);
+
+            if (!started)
+            {
+                throw new ContainerStartException(payload.Name, "Container failed to start");
+            }
+
+            // Get container info
+            var containerInfo = await _dockerClient.Containers.InspectContainerAsync(createResponse.ID, ct);
+
+            _logger.LogInformation(
+                "Started container {Name} (State: {State})",
+                payload.Name,
+                containerInfo.State.Status);
+
+            return new TaskResult
+            {
+                TaskId = task.Id,
+                Success = true,
+                Outputs = new 
Dictionary<string, object> + { + ["containerId"] = createResponse.ID, + ["containerName"] = payload.Name, + ["state"] = containerInfo.State.Status, + ["ipAddress"] = containerInfo.NetworkSettings.IPAddress + }, + CompletedAt = DateTimeOffset.UtcNow + }; + } + catch (DockerApiException ex) + { + _logger.LogError(ex, "Failed to create/start container {Name}", payload.Name); + + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = $"Failed to create/start container: {ex.Message}", + CompletedAt = DateTimeOffset.UtcNow + }; + } + } + + private static List<string> BuildEnvironment( + IReadOnlyDictionary<string, string>? env, + IReadOnlyDictionary<string, string> variables) + { + var result = new List<string>(); + + if (env is not null) + { + foreach (var (key, value) in env) + { + // Substitute variables in values + var resolvedValue = SubstituteVariables(value, variables); + result.Add($"{key}={resolvedValue}"); + } + } + + return result; + } + + private static string SubstituteVariables(string value, IReadOnlyDictionary<string, string> variables) + { + return Regex.Replace(value, @"\$\{([^}]+)\}", match => + { + var varName = match.Groups[1].Value; + return variables.TryGetValue(varName, out var varValue) ? varValue : match.Value; + }); + } + + private static IDictionary<string, IList<PortBinding>> ParsePortBindings(IReadOnlyList<string>? ports) + { + var bindings = new Dictionary<string, IList<PortBinding>>(); + + if (ports is null) + return bindings; + + foreach (var port in ports) + { + // Format: hostPort:containerPort or hostPort:containerPort/protocol + var parts = port.Split(':'); + if (parts.Length != 2) + continue; + + var hostPort = parts[0]; + var containerPortWithProtocol = parts[1]; + var containerPort = containerPortWithProtocol.Contains('/') + ? 
containerPortWithProtocol + : $"{containerPortWithProtocol}/tcp"; + + bindings[containerPort] = new List<PortBinding> + { + new() { HostPort = hostPort } + }; + + return bindings; + } +} +``` + +### DockerStopTask + +```csharp +namespace StellaOps.Agent.Docker.Tasks; + +public sealed class DockerStopTask +{ + private readonly IDockerClient _dockerClient; + private readonly ILogger<DockerStopTask> _logger; + + public sealed record StopPayload + { + public string? ContainerId { get; init; } + public string? ContainerName { get; init; } + public TimeSpan Timeout { get; init; } = TimeSpan.FromSeconds(30); + } + + public async Task<TaskResult> ExecuteAsync(AgentTask task, CancellationToken ct) + { + var payload = JsonSerializer.Deserialize<StopPayload>(task.Payload) + ?? throw new InvalidPayloadException("docker.stop"); + + var containerId = await ResolveContainerIdAsync(payload, ct); + if (containerId is null) + { + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = "Container not found", + CompletedAt = DateTimeOffset.UtcNow + }; + } + + _logger.LogInformation("Stopping container {ContainerId}", containerId[..12]); + + try + { + var stopped = await _dockerClient.Containers.StopContainerAsync( + containerId, + new ContainerStopParameters + { + WaitBeforeKillSeconds = (uint)payload.Timeout.TotalSeconds + }, + ct); + + if (stopped) + { + _logger.LogInformation("Container {ContainerId} stopped", containerId[..12]); + } + else + { + _logger.LogWarning("Container {ContainerId} was already stopped", containerId[..12]); + } + + return new TaskResult + { + TaskId = task.Id, + Success = true, + Outputs = new Dictionary<string, object> + { + ["containerId"] = containerId, + ["wasRunning"] = stopped + }, + CompletedAt = DateTimeOffset.UtcNow + }; + } + catch (DockerApiException ex) + { + _logger.LogError(ex, "Failed to stop container {ContainerId}", containerId[..12]); + + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = $"Failed to stop container: {ex.Message}", + CompletedAt = 
DateTimeOffset.UtcNow + }; + } + } + + private async Task<string?> ResolveContainerIdAsync(StopPayload payload, CancellationToken ct) + { + if (!string.IsNullOrEmpty(payload.ContainerId)) + { + return payload.ContainerId; + } + + if (!string.IsNullOrEmpty(payload.ContainerName)) + { + var containers = await _dockerClient.Containers.ListContainersAsync( + new ContainersListParameters + { + All = true, + Filters = new Dictionary<string, IDictionary<string, bool>> + { + ["name"] = new Dictionary<string, bool> { [payload.ContainerName] = true } + } + }, + ct); + + return containers.FirstOrDefault()?.ID; + } + + return null; + } +} +``` + +### DockerHealthCheckTask + +```csharp +namespace StellaOps.Agent.Docker.Tasks; + +public sealed class DockerHealthCheckTask +{ + private readonly IDockerClient _dockerClient; + private readonly ILogger<DockerHealthCheckTask> _logger; + + public sealed record HealthCheckPayload + { + public string? ContainerId { get; init; } + public string? ContainerName { get; init; } + public TimeSpan Timeout { get; init; } = TimeSpan.FromSeconds(30); + public bool WaitForHealthy { get; init; } = true; + } + + public async Task<TaskResult> ExecuteAsync(AgentTask task, CancellationToken ct) + { + var payload = JsonSerializer.Deserialize<HealthCheckPayload>(task.Payload) + ?? 
throw new InvalidPayloadException("docker.health-check"); + + var containerId = await ResolveContainerIdAsync(payload, ct); + if (containerId is null) + { + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = "Container not found", + CompletedAt = DateTimeOffset.UtcNow + }; + } + + _logger.LogInformation("Checking health of container {ContainerId}", containerId[..12]); + + try + { + using var timeoutCts = new CancellationTokenSource(payload.Timeout); + using var linkedCts = CancellationTokenSource.CreateLinkedTokenSource(ct, timeoutCts.Token); + + while (!linkedCts.IsCancellationRequested) + { + var containerInfo = await _dockerClient.Containers.InspectContainerAsync(containerId, linkedCts.Token); + + if (containerInfo.State.Status != "running") + { + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = $"Container not running (state: {containerInfo.State.Status})", + Outputs = new Dictionary<string, object> + { + ["state"] = containerInfo.State.Status + }, + CompletedAt = DateTimeOffset.UtcNow + }; + } + + var health = containerInfo.State.Health; + if (health is null) + { + // No health check configured, container is running + return new TaskResult + { + TaskId = task.Id, + Success = true, + Outputs = new Dictionary<string, object> + { + ["containerId"] = containerId, + ["state"] = "running", + ["healthCheck"] = "none" + }, + CompletedAt = DateTimeOffset.UtcNow + }; + } + + if (health.Status == "healthy") + { + _logger.LogInformation("Container {ContainerId} is healthy", containerId[..12]); + + return new TaskResult + { + TaskId = task.Id, + Success = true, + Outputs = new Dictionary<string, object> + { + ["containerId"] = containerId, + ["state"] = "running", + ["healthStatus"] = "healthy", + ["failingStreak"] = health.FailingStreak + }, + CompletedAt = DateTimeOffset.UtcNow + }; + } + + if (health.Status == "unhealthy") + { + var lastLog = health.Log.LastOrDefault(); + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = $"Container unhealthy: 
{lastLog?.Output ?? "unknown"}", + Outputs = new Dictionary<string, object> + { + ["containerId"] = containerId, + ["healthStatus"] = "unhealthy", + ["failingStreak"] = health.FailingStreak + }, + CompletedAt = DateTimeOffset.UtcNow + }; + } + + if (!payload.WaitForHealthy) + { + return new TaskResult + { + TaskId = task.Id, + Success = true, + Outputs = new Dictionary<string, object> + { + ["containerId"] = containerId, + ["healthStatus"] = health.Status + }, + CompletedAt = DateTimeOffset.UtcNow + }; + } + + // Wait before checking again + await Task.Delay(TimeSpan.FromSeconds(2), linkedCts.Token); + } + + throw new OperationCanceledException(); + } + catch (OperationCanceledException) + { + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = $"Health check timed out after {payload.Timeout}", + CompletedAt = DateTimeOffset.UtcNow + }; + } + } + + private async Task<string?> ResolveContainerIdAsync(HealthCheckPayload payload, CancellationToken ct) + { + if (!string.IsNullOrEmpty(payload.ContainerId)) + { + return payload.ContainerId; + } + + if (!string.IsNullOrEmpty(payload.ContainerName)) + { + var containers = await _dockerClient.Containers.ListContainersAsync( + new ContainersListParameters + { + All = true, + Filters = new Dictionary<string, IDictionary<string, bool>> + { + ["name"] = new Dictionary<string, bool> { [payload.ContainerName] = true } + } + }, + ct); + + return containers.FirstOrDefault()?.ID; + } + + return null; + } +} +``` + +### ContainerLogStreamer + +```csharp +namespace StellaOps.Agent.Docker; + +public sealed class ContainerLogStreamer +{ + private readonly IDockerClient _dockerClient; + private readonly LogStreamer _logStreamer; + private readonly ILogger<ContainerLogStreamer> _logger; + + public async Task StreamLogsAsync( + Guid taskId, + string containerId, + CancellationToken ct = default) + { + try + { + var stream = await _dockerClient.Containers.GetContainerLogsAsync( + containerId, + false, + new ContainerLogsParameters + { + Follow = true, + ShowStdout = true, + ShowStderr = true, + Timestamps = true + }, + ct); + 
+ using var reader = new StreamReader(stream); + + while (!ct.IsCancellationRequested) + { + var line = await reader.ReadLineAsync(ct); + if (line is null) + break; + + var (level, message) = ParseLogLine(line); + _logStreamer.Log(taskId, level, message); + } + } + catch (OperationCanceledException) + { + // Expected when task completes + } + catch (Exception ex) + { + _logger.LogWarning(ex, "Error streaming logs for container {ContainerId}", containerId[..12]); + } + } + + private static (LogLevel Level, string Message) ParseLogLine(string line) + { + // Docker log format includes stream type marker + // First 8 bytes are header: [stream_type, 0, 0, 0, size (4 bytes)] + // For text streams, we just parse the content + + var level = LogLevel.Information; + + // Simple heuristic for log level detection + if (line.Contains("ERROR", StringComparison.OrdinalIgnoreCase) || + line.Contains("FATAL", StringComparison.OrdinalIgnoreCase)) + { + level = LogLevel.Error; + } + else if (line.Contains("WARN", StringComparison.OrdinalIgnoreCase)) + { + level = LogLevel.Warning; + } + else if (line.Contains("DEBUG", StringComparison.OrdinalIgnoreCase)) + { + level = LogLevel.Debug; + } + + return (level, line); + } +} +``` + +### Documentation Deliverables + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `docs/modules/release-orchestrator/deployment/agent-based.md` (partial) | Markdown | Agent-based deployment documentation (Docker agent with 9 operations) | + +--- + +## Acceptance Criteria + +### Code + +- [ ] Pull images with digest references +- [ ] Pull from authenticated registries +- [ ] Create containers with environment variables +- [ ] Create containers with port mappings +- [ ] Create containers with volume mounts +- [ ] Start containers successfully +- [ ] Stop containers gracefully +- [ ] Remove containers +- [ ] Check container health status +- [ ] Wait for health check to pass +- [ ] Stream container logs +- [ ] Unit test coverage >=85% 
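+To make the >=85% coverage target tractable, start with the pure helpers in DockerRunTask (variable substitution, port parsing), which need no Docker daemon. A minimal xUnit sketch, assuming those helpers are promoted from private to internal and exposed to the test assembly via `InternalsVisibleTo` (both are assumptions, not part of the current design):
+
+```csharp
+public class DockerRunTaskHelperTests
+{
+    [Fact]
+    public void SubstituteVariables_ReplacesKnownAndKeepsUnknownPlaceholders()
+    {
+        var vars = new Dictionary<string, string> { ["TAG"] = "1.2.3" };
+        // Known variable is substituted; unknown placeholder stays verbatim.
+        Assert.Equal("app:1.2.3", DockerRunTask.SubstituteVariables("app:${TAG}", vars));
+        Assert.Equal("${MISSING}", DockerRunTask.SubstituteVariables("${MISSING}", vars));
+    }
+
+    [Fact]
+    public void ParsePortBindings_DefaultsProtocolToTcp()
+    {
+        var bindings = DockerRunTask.ParsePortBindings(new[] { "8080:80" });
+        // A container port without a protocol suffix is normalized to "/tcp".
+        Assert.Equal("8080", bindings["80/tcp"][0].HostPort);
+    }
+}
+```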
+ +### Documentation +- [ ] Docker agent documentation section created +- [ ] All Docker operations documented (pull, run, stop, remove, health check, logs) +- [ ] C# implementation code included +- [ ] Digest verification flow documented + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 108_001 Agent Core Runtime | Internal | TODO | +| Docker.DotNet | NuGet | Available | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| DockerCapability | TODO | | +| DockerPullTask | TODO | | +| DockerRunTask | TODO | | +| DockerStopTask | TODO | | +| DockerRemoveTask | TODO | | +| DockerHealthCheckTask | TODO | | +| ContainerLogStreamer | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | +| 11-Jan-2026 | Added documentation deliverable: deployment/agent-based.md (partial - Docker) | diff --git a/docs/implplan/SPRINT_20260110_108_003_AGENTS_compose.md b/docs/implplan/SPRINT_20260110_108_003_AGENTS_compose.md new file mode 100644 index 000000000..6af05eccb --- /dev/null +++ b/docs/implplan/SPRINT_20260110_108_003_AGENTS_compose.md @@ -0,0 +1,976 @@ +# SPRINT: Agent - Compose + +> **Sprint ID:** 108_003 +> **Module:** AGENTS +> **Phase:** 8 - Agents +> **Status:** TODO +> **Parent:** [108_000_INDEX](SPRINT_20260110_108_000_INDEX_agents.md) + +--- + +## Overview + +Implement the Compose Agent capability for managing docker-compose stacks on target hosts. 
+ +### Objectives + +- Compose stack deployment (up) +- Compose stack teardown (down) +- Service scaling +- Stack health checking +- Compose file management with digest-locked references + +### Working Directory + +``` +src/ReleaseOrchestrator/ +├── __Agents/ +│ └── StellaOps.Agent.Compose/ +│ ├── ComposeCapability.cs +│ ├── Tasks/ +│ │ ├── ComposePullTask.cs +│ │ ├── ComposeUpTask.cs +│ │ ├── ComposeDownTask.cs +│ │ ├── ComposeScaleTask.cs +│ │ └── ComposeHealthCheckTask.cs +│ ├── ComposeFileManager.cs +│ └── ComposeExecutor.cs +└── __Tests/ +``` + +--- + +## Deliverables + +### ComposeCapability + +```csharp +namespace StellaOps.Agent.Compose; + +public sealed class ComposeCapability : IAgentCapability +{ + private readonly ComposeExecutor _executor; + private readonly ComposeFileManager _fileManager; + private readonly ILogger<ComposeCapability> _logger; + private readonly Dictionary<string, Func<AgentTask, CancellationToken, Task<TaskResult>>> _taskHandlers; + + public string Name => "compose"; + public string Version => "1.0.0"; + + public IReadOnlyList<string> SupportedTaskTypes => new[] + { + "compose.pull", + "compose.up", + "compose.down", + "compose.scale", + "compose.health-check", + "compose.ps" + }; + + public ComposeCapability( + ComposeExecutor executor, + ComposeFileManager fileManager, + ILogger<ComposeCapability> logger) + { + _executor = executor; + _fileManager = fileManager; + _logger = logger; + + _taskHandlers = new Dictionary<string, Func<AgentTask, CancellationToken, Task<TaskResult>>> + { + ["compose.pull"] = ExecutePullAsync, + ["compose.up"] = ExecuteUpAsync, + ["compose.down"] = ExecuteDownAsync, + ["compose.scale"] = ExecuteScaleAsync, + ["compose.health-check"] = ExecuteHealthCheckAsync, + ["compose.ps"] = ExecutePsAsync + }; + } + + public async Task<bool> InitializeAsync(CancellationToken ct = default) + { + try + { + var version = await _executor.GetVersionAsync(ct); + _logger.LogInformation("Compose capability initialized: {Version}", version); + return true; + } + catch (Exception ex) + { + _logger.LogError(ex, "Failed to initialize Compose capability"); + return false; + } + } + + public async 
Task<TaskResult> ExecuteAsync(AgentTask task, CancellationToken ct = default) + { + if (!_taskHandlers.TryGetValue(task.TaskType, out var handler)) + { + throw new UnsupportedTaskTypeException(task.TaskType); + } + + return await handler(task, ct); + } + + public async Task<CapabilityHealthStatus> CheckHealthAsync(CancellationToken ct = default) + { + try + { + await _executor.GetVersionAsync(ct); + return new CapabilityHealthStatus(true, "Docker Compose available"); + } + catch (Exception ex) + { + return new CapabilityHealthStatus(false, $"Docker Compose not available: {ex.Message}"); + } + } + + private Task<TaskResult> ExecutePullAsync(AgentTask task, CancellationToken ct) => + new ComposePullTask(_executor, _fileManager, _logger).ExecuteAsync(task, ct); + + private Task<TaskResult> ExecuteUpAsync(AgentTask task, CancellationToken ct) => + new ComposeUpTask(_executor, _fileManager, _logger).ExecuteAsync(task, ct); + + private Task<TaskResult> ExecuteDownAsync(AgentTask task, CancellationToken ct) => + new ComposeDownTask(_executor, _fileManager, _logger).ExecuteAsync(task, ct); + + private Task<TaskResult> ExecuteScaleAsync(AgentTask task, CancellationToken ct) => + new ComposeScaleTask(_executor, _logger).ExecuteAsync(task, ct); + + private Task<TaskResult> ExecuteHealthCheckAsync(AgentTask task, CancellationToken ct) => + new ComposeHealthCheckTask(_executor, _logger).ExecuteAsync(task, ct); + + private Task<TaskResult> ExecutePsAsync(AgentTask task, CancellationToken ct) => + new ComposePsTask(_executor, _logger).ExecuteAsync(task, ct); +} +``` + +### ComposeExecutor + +```csharp +namespace StellaOps.Agent.Compose; + +public sealed class ComposeExecutor +{ + private readonly string _composeCommand; + private readonly ILogger<ComposeExecutor> _logger; + + public ComposeExecutor(ILogger<ComposeExecutor> logger) + { + _logger = logger; + // Detect docker compose v2 vs docker-compose v1 + _composeCommand = DetectComposeCommand(); + } + + public async Task<string> GetVersionAsync(CancellationToken ct = default) + { + var result = await ExecuteAsync("version --short", null, ct); + return 
result.StandardOutput.Trim(); + } + + public async Task<ComposeResult> PullAsync( + string projectDir, + string composeFile, + IReadOnlyDictionary<string, string>? credentials = null, + CancellationToken ct = default) + { + var args = $"-f {composeFile} pull"; + return await ExecuteAsync(args, projectDir, ct, BuildEnvironment(credentials)); + } + + public async Task<ComposeResult> UpAsync( + string projectDir, + string composeFile, + ComposeUpOptions options, + CancellationToken ct = default) + { + var args = $"-f {composeFile} up -d"; + + if (options.ForceRecreate) + args += " --force-recreate"; + + if (options.RemoveOrphans) + args += " --remove-orphans"; + + if (options.NoStart) + args += " --no-start"; + + if (options.Services?.Count > 0) + args += " " + string.Join(" ", options.Services); + + return await ExecuteAsync(args, projectDir, ct, options.Environment); + } + + public async Task<ComposeResult> DownAsync( + string projectDir, + string composeFile, + ComposeDownOptions options, + CancellationToken ct = default) + { + var args = $"-f {composeFile} down"; + + if (options.RemoveVolumes) + args += " -v"; + + if (options.RemoveOrphans) + args += " --remove-orphans"; + + if (options.Timeout.HasValue) + args += $" -t {(int)options.Timeout.Value.TotalSeconds}"; + + return await ExecuteAsync(args, projectDir, ct); + } + + public async Task<ComposeResult> ScaleAsync( + string projectDir, + string composeFile, + IReadOnlyDictionary<string, int> scaling, + CancellationToken ct = default) + { + var scaleArgs = string.Join(" ", scaling.Select(kv => $"{kv.Key}={kv.Value}")); + var args = $"-f {composeFile} up -d --no-recreate --scale {scaleArgs}"; + return await ExecuteAsync(args, projectDir, ct); + } + + public async Task<ComposeResult> PsAsync( + string projectDir, + string composeFile, + bool all = false, + CancellationToken ct = default) + { + var args = $"-f {composeFile} ps --format json"; + if (all) + args += " -a"; + + return await ExecuteAsync(args, projectDir, ct); + } + + private async Task<ComposeResult> ExecuteAsync( + string arguments, + string? 
workingDirectory, + CancellationToken ct, + IReadOnlyDictionary<string, string>? environment = null) + { + var psi = new ProcessStartInfo + { + FileName = _composeCommand.Split(' ')[0], + Arguments = _composeCommand.Contains(' ') + ? $"{_composeCommand.Substring(_composeCommand.IndexOf(' ') + 1)} {arguments}" + : arguments, + WorkingDirectory = workingDirectory ?? Environment.CurrentDirectory, + RedirectStandardOutput = true, + RedirectStandardError = true, + UseShellExecute = false, + CreateNoWindow = true + }; + + if (environment is not null) + { + foreach (var (key, value) in environment) + { + psi.Environment[key] = value; + } + } + + _logger.LogDebug("Executing: {Command} {Args}", psi.FileName, psi.Arguments); + + using var process = new Process { StartInfo = psi }; + var stdout = new StringBuilder(); + var stderr = new StringBuilder(); + + process.OutputDataReceived += (_, e) => + { + if (e.Data is not null) + stdout.AppendLine(e.Data); + }; + + process.ErrorDataReceived += (_, e) => + { + if (e.Data is not null) + stderr.AppendLine(e.Data); + }; + + process.Start(); + process.BeginOutputReadLine(); + process.BeginErrorReadLine(); + + await process.WaitForExitAsync(ct); + + var result = new ComposeResult( + process.ExitCode == 0, + process.ExitCode, + stdout.ToString(), + stderr.ToString()); + + if (!result.Success) + { + _logger.LogWarning( + "Compose command failed with exit code {ExitCode}: {Stderr}", + result.ExitCode, + result.StandardError); + } + + return result; + } + + private static string DetectComposeCommand() + { + // Try docker compose (v2) first + try + { + var psi = new ProcessStartInfo + { + FileName = "docker", + Arguments = "compose version", + RedirectStandardOutput = true, + RedirectStandardError = true, + UseShellExecute = false, + CreateNoWindow = true + }; + + using var process = Process.Start(psi); + process?.WaitForExit(5000); + if (process?.ExitCode == 0) + { + return "docker compose"; + } + } + catch { } + + // Fall back to docker-compose (v1) + 
return "docker-compose"; + } + + private static IReadOnlyDictionary<string, string>? BuildEnvironment( + IReadOnlyDictionary<string, string>? credentials) + { + if (credentials is null) + return null; + + var env = new Dictionary<string, string>(); + + if (credentials.TryGetValue("registry.username", out var user)) + env["DOCKER_REGISTRY_USER"] = user; + + if (credentials.TryGetValue("registry.password", out var pass)) + env["DOCKER_REGISTRY_PASSWORD"] = pass; + + return env; + } +} + +public sealed record ComposeResult( + bool Success, + int ExitCode, + string StandardOutput, + string StandardError +); + +public sealed record ComposeUpOptions +{ + public bool ForceRecreate { get; init; } + public bool RemoveOrphans { get; init; } = true; + public bool NoStart { get; init; } + public IReadOnlyList<string>? Services { get; init; } + public IReadOnlyDictionary<string, string>? Environment { get; init; } +} + +public sealed record ComposeDownOptions +{ + public bool RemoveVolumes { get; init; } + public bool RemoveOrphans { get; init; } = true; + public TimeSpan? Timeout { get; init; } +} +``` + +### ComposeFileManager + +```csharp +namespace StellaOps.Agent.Compose; + +public sealed class ComposeFileManager +{ + private readonly string _deploymentRoot; + private readonly ILogger<ComposeFileManager> _logger; + + public ComposeFileManager(AgentConfiguration config, ILogger<ComposeFileManager> logger) + { + _deploymentRoot = config.DeploymentRoot ?? 
"/var/lib/stella-agent/deployments"; + _logger = logger; + } + + public async Task<string> WriteComposeFileAsync( + string projectName, + string composeLockContent, + string versionStickerContent, + CancellationToken ct = default) + { + var projectDir = Path.Combine(_deploymentRoot, projectName); + Directory.CreateDirectory(projectDir); + + // Write compose.stella.lock.yml + var composeFile = Path.Combine(projectDir, "compose.stella.lock.yml"); + await File.WriteAllTextAsync(composeFile, composeLockContent, ct); + _logger.LogDebug("Wrote compose file: {Path}", composeFile); + + // Write stella.version.json + var versionFile = Path.Combine(projectDir, "stella.version.json"); + await File.WriteAllTextAsync(versionFile, versionStickerContent, ct); + _logger.LogDebug("Wrote version sticker: {Path}", versionFile); + + return projectDir; + } + + public string GetProjectDirectory(string projectName) + { + return Path.Combine(_deploymentRoot, projectName); + } + + public string GetComposeFilePath(string projectName) + { + return Path.Combine(GetProjectDirectory(projectName), "compose.stella.lock.yml"); + } + + public async Task<string?> GetVersionStickerAsync(string projectName, CancellationToken ct = default) + { + var path = Path.Combine(GetProjectDirectory(projectName), "stella.version.json"); + if (!File.Exists(path)) + return null; + + return await File.ReadAllTextAsync(path, ct); + } + + public async Task BackupExistingAsync(string projectName, CancellationToken ct = default) + { + var projectDir = GetProjectDirectory(projectName); + if (!Directory.Exists(projectDir)) + return; + + var backupDir = Path.Combine(projectDir, ".backup", DateTime.UtcNow.ToString("yyyyMMdd-HHmmss")); + Directory.CreateDirectory(backupDir); + + foreach (var file in Directory.GetFiles(projectDir, "*.*")) + { + var fileName = Path.GetFileName(file); + if (fileName.StartsWith(".")) + continue; + + File.Copy(file, Path.Combine(backupDir, fileName)); + } + + _logger.LogDebug("Backed up existing deployment to 
{BackupDir}", backupDir); + } + + public async Task CleanupAsync(string projectName, CancellationToken ct = default) + { + var projectDir = GetProjectDirectory(projectName); + if (Directory.Exists(projectDir)) + { + Directory.Delete(projectDir, recursive: true); + _logger.LogDebug("Cleaned up project directory: {Path}", projectDir); + } + } +} +``` + +### ComposeUpTask + +```csharp +namespace StellaOps.Agent.Compose.Tasks; + +public sealed class ComposeUpTask +{ + private readonly ComposeExecutor _executor; + private readonly ComposeFileManager _fileManager; + private readonly ILogger<ComposeUpTask> _logger; + + public sealed record UpPayload + { + public required string ProjectName { get; init; } + public required string ComposeLock { get; init; } + public required string VersionSticker { get; init; } + public bool ForceRecreate { get; init; } = true; + public bool RemoveOrphans { get; init; } = true; + public IReadOnlyList<string>? Services { get; init; } + public IReadOnlyDictionary<string, string>? Environment { get; init; } + public bool BackupExisting { get; init; } = true; + } + + public async Task<TaskResult> ExecuteAsync(AgentTask task, CancellationToken ct) + { + var payload = JsonSerializer.Deserialize<UpPayload>(task.Payload) + ?? 
throw new InvalidPayloadException("compose.up"); + + _logger.LogInformation("Deploying compose stack: {Project}", payload.ProjectName); + + try + { + // Backup existing deployment + if (payload.BackupExisting) + { + await _fileManager.BackupExistingAsync(payload.ProjectName, ct); + } + + // Write compose files + var projectDir = await _fileManager.WriteComposeFileAsync( + payload.ProjectName, + payload.ComposeLock, + payload.VersionSticker, + ct); + + var composeFile = _fileManager.GetComposeFilePath(payload.ProjectName); + + // Pull images first + _logger.LogInformation("Pulling images for {Project}", payload.ProjectName); + var pullResult = await _executor.PullAsync( + projectDir, + composeFile, + task.Credentials as IReadOnlyDictionary<string, string>, + ct); + + if (!pullResult.Success) + { + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = $"Failed to pull images: {pullResult.StandardError}", + CompletedAt = DateTimeOffset.UtcNow + }; + } + + // Deploy the stack + _logger.LogInformation("Starting compose stack: {Project}", payload.ProjectName); + var upResult = await _executor.UpAsync( + projectDir, + composeFile, + new ComposeUpOptions + { + ForceRecreate = payload.ForceRecreate, + RemoveOrphans = payload.RemoveOrphans, + Services = payload.Services, + Environment = MergeEnvironment(payload.Environment, task.Variables) + }, + ct); + + if (!upResult.Success) + { + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = $"Failed to deploy stack: {upResult.StandardError}", + CompletedAt = DateTimeOffset.UtcNow + }; + } + + // Get running services + var psResult = await _executor.PsAsync(projectDir, composeFile, ct: ct); + var services = ParseServicesFromPs(psResult.StandardOutput); + + _logger.LogInformation( + "Deployed compose stack {Project} with {Count} services", + payload.ProjectName, + services.Count); + + return new TaskResult + { + TaskId = task.Id, + Success = true, + Outputs = new Dictionary<string, object> + { + ["projectName"] = 
payload.ProjectName, + ["projectDir"] = projectDir, + ["services"] = services + }, + CompletedAt = DateTimeOffset.UtcNow + }; + } + catch (Exception ex) + { + _logger.LogError(ex, "Failed to deploy compose stack {Project}", payload.ProjectName); + + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = ex.Message, + CompletedAt = DateTimeOffset.UtcNow + }; + } + } + + private static IReadOnlyDictionary<string, string>? MergeEnvironment( + IReadOnlyDictionary<string, string>? env, + IReadOnlyDictionary<string, string> variables) + { + if (env is null && variables.Count == 0) + return null; + + var merged = new Dictionary<string, string>(variables); + if (env is not null) + { + foreach (var (key, value) in env) + { + merged[key] = value; + } + } + return merged; + } + + private static IReadOnlyList<ServiceStatus> ParseServicesFromPs(string output) + { + if (string.IsNullOrWhiteSpace(output)) + return []; + + try + { + var services = new List<ServiceStatus>(); + foreach (var line in output.Split('\n', StringSplitOptions.RemoveEmptyEntries)) + { + var service = JsonSerializer.Deserialize<JsonElement>(line); + services.Add(new ServiceStatus( + service.GetProperty("Name").GetString() ?? "", + service.GetProperty("Service").GetString() ?? "", + service.GetProperty("State").GetString() ?? "", + service.GetProperty("Health").GetString() + )); + } + return services; + } + catch + { + return []; + } + } +} + +public sealed record ServiceStatus( + string Name, + string Service, + string State, + string? Health +); +``` + +### ComposeDownTask + +```csharp +namespace StellaOps.Agent.Compose.Tasks; + +public sealed class ComposeDownTask +{ + private readonly ComposeExecutor _executor; + private readonly ComposeFileManager _fileManager; + private readonly ILogger<ComposeDownTask> _logger; + + public sealed record DownPayload + { + public required string ProjectName { get; init; } + public bool RemoveVolumes { get; init; } + public bool RemoveOrphans { get; init; } = true; + public bool CleanupFiles { get; init; } + public TimeSpan? 
Timeout { get; init; } = TimeSpan.FromSeconds(30); + } + + public async Task<TaskResult> ExecuteAsync(AgentTask task, CancellationToken ct) + { + var payload = JsonSerializer.Deserialize<DownPayload>(task.Payload) + ?? throw new InvalidPayloadException("compose.down"); + + _logger.LogInformation("Stopping compose stack: {Project}", payload.ProjectName); + + try + { + var projectDir = _fileManager.GetProjectDirectory(payload.ProjectName); + var composeFile = _fileManager.GetComposeFilePath(payload.ProjectName); + + if (!File.Exists(composeFile)) + { + _logger.LogWarning( + "Compose file not found for project {Project}, skipping down", + payload.ProjectName); + + return new TaskResult + { + TaskId = task.Id, + Success = true, + Outputs = new Dictionary<string, object> + { + ["projectName"] = payload.ProjectName, + ["skipped"] = true, + ["reason"] = "Compose file not found" + }, + CompletedAt = DateTimeOffset.UtcNow + }; + } + + var result = await _executor.DownAsync( + projectDir, + composeFile, + new ComposeDownOptions + { + RemoveVolumes = payload.RemoveVolumes, + RemoveOrphans = payload.RemoveOrphans, + Timeout = payload.Timeout + }, + ct); + + if (!result.Success) + { + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = $"Failed to stop stack: {result.StandardError}", + CompletedAt = DateTimeOffset.UtcNow + }; + } + + // Cleanup files if requested + if (payload.CleanupFiles) + { + await _fileManager.CleanupAsync(payload.ProjectName, ct); + } + + _logger.LogInformation("Stopped compose stack: {Project}", payload.ProjectName); + + return new TaskResult + { + TaskId = task.Id, + Success = true, + Outputs = new Dictionary<string, object> + { + ["projectName"] = payload.ProjectName, + ["removedVolumes"] = payload.RemoveVolumes, + ["cleanedFiles"] = payload.CleanupFiles + }, + CompletedAt = DateTimeOffset.UtcNow + }; + } + catch (Exception ex) + { + _logger.LogError(ex, "Failed to stop compose stack {Project}", payload.ProjectName); + + return new TaskResult + { + TaskId = task.Id, + Success = false, 
+ Error = ex.Message, + CompletedAt = DateTimeOffset.UtcNow + }; + } + } +} +``` + +### ComposeHealthCheckTask + +```csharp +namespace StellaOps.Agent.Compose.Tasks; + +public sealed class ComposeHealthCheckTask +{ + private readonly ComposeExecutor _executor; + private readonly ILogger<ComposeHealthCheckTask> _logger; + + public sealed record HealthCheckPayload + { + public required string ProjectName { get; init; } + public string? ComposeFile { get; init; } + public IReadOnlyList<string>? Services { get; init; } + public TimeSpan Timeout { get; init; } = TimeSpan.FromMinutes(5); + public bool WaitForHealthy { get; init; } = true; + } + + public async Task<TaskResult> ExecuteAsync(AgentTask task, CancellationToken ct) + { + var payload = JsonSerializer.Deserialize<HealthCheckPayload>(task.Payload) + ?? throw new InvalidPayloadException("compose.health-check"); + + _logger.LogInformation("Checking health of compose stack: {Project}", payload.ProjectName); + + try + { + var projectDir = Path.Combine("/var/lib/stella-agent/deployments", payload.ProjectName); + var composeFile = payload.ComposeFile ?? 
Path.Combine(projectDir, "compose.stella.lock.yml"); + + using var timeoutCts = new CancellationTokenSource(payload.Timeout); + using var linkedCts = CancellationTokenSource.CreateLinkedTokenSource(ct, timeoutCts.Token); + + while (!linkedCts.IsCancellationRequested) + { + var psResult = await _executor.PsAsync(projectDir, composeFile, ct: linkedCts.Token); + var services = ParseServices(psResult.StandardOutput); + + // Filter to requested services if specified + if (payload.Services?.Count > 0) + { + services = services.Where(s => payload.Services.Contains(s.Service)).ToList(); + } + + var allRunning = services.All(s => s.State == "running"); + var allHealthy = services.All(s => + s.Health is null || s.Health == "healthy" || s.Health == ""); + + if (allRunning && allHealthy) + { + _logger.LogInformation( + "Compose stack {Project} is healthy ({Count} services)", + payload.ProjectName, + services.Count); + + return new TaskResult + { + TaskId = task.Id, + Success = true, + Outputs = new Dictionary<string, object> + { + ["projectName"] = payload.ProjectName, + ["services"] = services, + ["allHealthy"] = true + }, + CompletedAt = DateTimeOffset.UtcNow + }; + } + + var unhealthyServices = services.Where(s => + s.State != "running" || (s.Health is not null && s.Health != "healthy" && s.Health != "")); + + if (!payload.WaitForHealthy) + { + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = "Some services are unhealthy", + Outputs = new Dictionary<string, object> + { + ["projectName"] = payload.ProjectName, + ["services"] = services, + ["unhealthyServices"] = unhealthyServices.ToList() + }, + CompletedAt = DateTimeOffset.UtcNow + }; + } + + await Task.Delay(TimeSpan.FromSeconds(5), linkedCts.Token); + } + + throw new OperationCanceledException(); + } + catch (OperationCanceledException) + { + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = $"Health check timed out after {payload.Timeout}", + CompletedAt = DateTimeOffset.UtcNow + }; + } + catch 
(Exception ex) + { + _logger.LogError(ex, "Health check failed for stack {Project}", payload.ProjectName); + + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = ex.Message, + CompletedAt = DateTimeOffset.UtcNow + }; + } + } + + private static List ParseServices(string output) + { + var services = new List(); + + if (string.IsNullOrWhiteSpace(output)) + return services; + + foreach (var line in output.Split('\n', StringSplitOptions.RemoveEmptyEntries)) + { + try + { + var service = JsonSerializer.Deserialize(line); + services.Add(new ServiceStatus( + service.GetProperty("Name").GetString() ?? "", + service.GetProperty("Service").GetString() ?? "", + service.GetProperty("State").GetString() ?? "", + service.TryGetProperty("Health", out var health) ? health.GetString() : null + )); + } + catch { } + } + + return services; + } +} +``` + +### Documentation Deliverables + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `docs/modules/release-orchestrator/deployment/agent-based.md` (partial) | Markdown | Agent-based deployment documentation (Compose agent with 8 operations) | + +--- + +## Acceptance Criteria + +### Code + +- [ ] Deploy compose stack from compose.stella.lock.yml +- [ ] Pull images before deployment +- [ ] Support authenticated registries +- [ ] Force recreate containers option +- [ ] Remove orphan containers +- [ ] Stop and remove compose stack +- [ ] Optionally remove volumes on down +- [ ] Scale services up/down +- [ ] Check health of all services +- [ ] Wait for services to become healthy +- [ ] Backup existing deployment before update +- [ ] Unit test coverage >=85% + +### Documentation +- [ ] Compose agent documentation section created +- [ ] All Compose operations documented (pull, up, down, scale, health, backup) +- [ ] TypeScript implementation code included +- [ ] Compose lock file usage documented + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 
108_001 Agent Core Runtime | Internal | TODO | +| 108_002 Agent - Docker | Internal | TODO | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| ComposeCapability | TODO | | +| ComposeExecutor | TODO | | +| ComposeFileManager | TODO | | +| ComposePullTask | TODO | | +| ComposeUpTask | TODO | | +| ComposeDownTask | TODO | | +| ComposeScaleTask | TODO | | +| ComposeHealthCheckTask | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | +| 11-Jan-2026 | Added documentation deliverable: deployment/agent-based.md (partial - Compose) | diff --git a/docs/implplan/SPRINT_20260110_108_004_AGENTS_ssh.md b/docs/implplan/SPRINT_20260110_108_004_AGENTS_ssh.md new file mode 100644 index 000000000..2dfb59e0d --- /dev/null +++ b/docs/implplan/SPRINT_20260110_108_004_AGENTS_ssh.md @@ -0,0 +1,813 @@ +# SPRINT: Agent - SSH + +> **Sprint ID:** 108_004 +> **Module:** AGENTS +> **Phase:** 8 - Agents +> **Status:** TODO +> **Parent:** [108_000_INDEX](SPRINT_20260110_108_000_INDEX_agents.md) + +--- + +## Overview + +Implement the SSH Agent capability for remote command execution and file transfer via SSH. 
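+For orientation, a hypothetical `ssh.execute` task payload might look like the following. Field names mirror the `ExecutePayload` record under Deliverables; the host, user, and command values are illustrative only, and credential material is never carried in the payload — it arrives separately through the task's credential map (`ssh.password` / `ssh.privateKey` / `ssh.passphrase`):
+
+```json
+{
+  "host": "app-01.example.internal",
+  "port": 22,
+  "username": "deploy",
+  "command": "docker compose up -d",
+  "workingDirectory": "/opt/deployments/web",
+  "timeout": "00:10:00",
+  "combineOutput": true
+}
+```
+
+The `"00:10:00"` timeout assumes `System.Text.Json`'s default constant (`c`) `TimeSpan` format.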
+ +### Objectives + +- Remote command execution via SSH +- File transfer (SCP/SFTP) +- SSH key authentication +- SSH tunneling for remote Docker/Compose operations +- Connection pooling for efficiency + +### Working Directory + +``` +src/ReleaseOrchestrator/ +├── __Agents/ +│ └── StellaOps.Agent.Ssh/ +│ ├── SshCapability.cs +│ ├── Tasks/ +│ │ ├── SshExecuteTask.cs +│ │ ├── SshFileTransferTask.cs +│ │ ├── SshTunnelTask.cs +│ │ └── SshDockerProxyTask.cs +│ ├── SshConnectionPool.cs +│ └── SshClientFactory.cs +└── __Tests/ +``` + +--- + +## Deliverables + +### SshCapability + +```csharp +namespace StellaOps.Agent.Ssh; + +public sealed class SshCapability : IAgentCapability +{ + private readonly SshConnectionPool _connectionPool; + private readonly ILogger _logger; + private readonly Dictionary<string, Func<AgentTask, CancellationToken, Task<TaskResult>>> _taskHandlers; + + public string Name => "ssh"; + public string Version => "1.0.0"; + + public IReadOnlyList<string> SupportedTaskTypes => new[] + { + "ssh.execute", + "ssh.upload", + "ssh.download", + "ssh.tunnel", + "ssh.docker-proxy" + }; + + public SshCapability(SshConnectionPool connectionPool, ILogger logger) + { + _connectionPool = connectionPool; + _logger = logger; + + _taskHandlers = new Dictionary<string, Func<AgentTask, CancellationToken, Task<TaskResult>>> + { + ["ssh.execute"] = ExecuteCommandAsync, + ["ssh.upload"] = UploadFileAsync, + ["ssh.download"] = DownloadFileAsync, + ["ssh.tunnel"] = CreateTunnelAsync, + ["ssh.docker-proxy"] = DockerProxyAsync + }; + } + + public Task<bool> InitializeAsync(CancellationToken ct = default) + { + // SSH capability is always available if SSH.NET is loaded + _logger.LogInformation("SSH capability initialized"); + return Task.FromResult(true); + } + + public async Task<TaskResult> ExecuteAsync(AgentTask task, CancellationToken ct = default) + { + if (!_taskHandlers.TryGetValue(task.TaskType, out var handler)) + { + throw new UnsupportedTaskTypeException(task.TaskType); + } + + return await handler(task, ct); + } + + public Task<CapabilityHealthStatus> CheckHealthAsync(CancellationToken ct = default) + { + return Task.FromResult(new
CapabilityHealthStatus(true, "SSH capability available")); + } + + private Task<TaskResult> ExecuteCommandAsync(AgentTask task, CancellationToken ct) => + new SshExecuteTask(_connectionPool, _logger).ExecuteAsync(task, ct); + + private Task<TaskResult> UploadFileAsync(AgentTask task, CancellationToken ct) => + new SshFileTransferTask(_connectionPool, _logger).UploadAsync(task, ct); + + private Task<TaskResult> DownloadFileAsync(AgentTask task, CancellationToken ct) => + new SshFileTransferTask(_connectionPool, _logger).DownloadAsync(task, ct); + + private Task<TaskResult> CreateTunnelAsync(AgentTask task, CancellationToken ct) => + new SshTunnelTask(_connectionPool, _logger).ExecuteAsync(task, ct); + + private Task<TaskResult> DockerProxyAsync(AgentTask task, CancellationToken ct) => + new SshDockerProxyTask(_connectionPool, _logger).ExecuteAsync(task, ct); +} +``` + +### SshConnectionPool + +```csharp +namespace StellaOps.Agent.Ssh; + +public sealed class SshConnectionPool : IAsyncDisposable +{ + private readonly ConcurrentDictionary<string, PooledConnection> _connections = new(); + private readonly TimeSpan _connectionTimeout = TimeSpan.FromMinutes(10); + private readonly ILogger _logger; + private readonly Timer _cleanupTimer; + + public SshConnectionPool(ILogger logger) + { + _logger = logger; + _cleanupTimer = new Timer(CleanupExpiredConnections, null, TimeSpan.FromMinutes(1), TimeSpan.FromMinutes(1)); + } + + public async Task<SshClient> GetConnectionAsync( + SshConnectionInfo connectionInfo, + CancellationToken ct = default) + { + var key = connectionInfo.GetConnectionKey(); + + if (_connections.TryGetValue(key, out var pooled) && pooled.Client.IsConnected) + { + pooled.LastUsed = DateTimeOffset.UtcNow; + return pooled.Client; + } + + var client = await CreateConnectionAsync(connectionInfo, ct); + _connections[key] = new PooledConnection(client, DateTimeOffset.UtcNow); + + return client; + } + + private async Task<SshClient> CreateConnectionAsync( + SshConnectionInfo info, + CancellationToken ct) + { + var authMethods = new List<AuthenticationMethod>(); + + // Private key
authentication + if (!string.IsNullOrEmpty(info.PrivateKey)) + { + var keyFile = string.IsNullOrEmpty(info.PrivateKeyPassphrase) + ? new PrivateKeyFile(new MemoryStream(Encoding.UTF8.GetBytes(info.PrivateKey))) + : new PrivateKeyFile(new MemoryStream(Encoding.UTF8.GetBytes(info.PrivateKey)), info.PrivateKeyPassphrase); + + authMethods.Add(new PrivateKeyAuthenticationMethod(info.Username, keyFile)); + } + + // Password authentication + if (!string.IsNullOrEmpty(info.Password)) + { + authMethods.Add(new PasswordAuthenticationMethod(info.Username, info.Password)); + } + + var connectionInfo = new ConnectionInfo( + info.Host, + info.Port, + info.Username, + authMethods.ToArray()); + + var client = new SshClient(connectionInfo); + + await Task.Run(() => client.Connect(), ct); + + _logger.LogDebug( + "SSH connection established to {User}@{Host}:{Port}", + info.Username, + info.Host, + info.Port); + + return client; + } + + public void ReleaseConnection(string connectionKey) + { + // Connection stays in pool for reuse + if (_connections.TryGetValue(connectionKey, out var pooled)) + { + pooled.LastUsed = DateTimeOffset.UtcNow; + } + } + + private void CleanupExpiredConnections(object? 
state) + { + var expired = _connections + .Where(kv => DateTimeOffset.UtcNow - kv.Value.LastUsed > _connectionTimeout) + .ToList(); + + foreach (var (key, pooled) in expired) + { + if (_connections.TryRemove(key, out _)) + { + try + { + pooled.Client.Disconnect(); + pooled.Client.Dispose(); + _logger.LogDebug("Closed expired SSH connection: {Key}", key); + } + catch (Exception ex) + { + _logger.LogWarning(ex, "Error closing SSH connection: {Key}", key); + } + } + } + } + + public async ValueTask DisposeAsync() + { + _cleanupTimer.Dispose(); + + foreach (var (_, pooled) in _connections) + { + try + { + pooled.Client.Disconnect(); + pooled.Client.Dispose(); + } + catch { /* best-effort cleanup on dispose */ } + } + + _connections.Clear(); + } + + private sealed class PooledConnection + { + public SshClient Client { get; } + public DateTimeOffset LastUsed { get; set; } + + public PooledConnection(SshClient client, DateTimeOffset lastUsed) + { + Client = client; + LastUsed = lastUsed; + } + } +} + +public sealed record SshConnectionInfo +{ + public required string Host { get; init; } + public int Port { get; init; } = 22; + public required string Username { get; init; } + public string? Password { get; init; } + public string? PrivateKey { get; init; } + public string? PrivateKeyPassphrase { get; init; } + + public string GetConnectionKey() => $"{Username}@{Host}:{Port}"; +} +``` + +### SshExecuteTask + +```csharp +namespace StellaOps.Agent.Ssh.Tasks; + +public sealed class SshExecuteTask +{ + private readonly SshConnectionPool _connectionPool; + private readonly ILogger _logger; + + public sealed record ExecutePayload + { + public required string Host { get; init; } + public int Port { get; init; } = 22; + public required string Username { get; init; } + public required string Command { get; init; } + public IReadOnlyDictionary<string, string>? Environment { get; init; } + public string?
WorkingDirectory { get; init; } + public TimeSpan Timeout { get; init; } = TimeSpan.FromMinutes(10); + public bool CombineOutput { get; init; } = true; + } + + public async Task<TaskResult> ExecuteAsync(AgentTask task, CancellationToken ct) + { + var payload = JsonSerializer.Deserialize<ExecutePayload>(task.Payload) + ?? throw new InvalidPayloadException("ssh.execute"); + + var connectionInfo = new SshConnectionInfo + { + Host = payload.Host, + Port = payload.Port, + Username = payload.Username, + Password = task.Credentials.GetValueOrDefault("ssh.password"), + PrivateKey = task.Credentials.GetValueOrDefault("ssh.privateKey"), + PrivateKeyPassphrase = task.Credentials.GetValueOrDefault("ssh.passphrase") + }; + + _logger.LogInformation( + "Executing SSH command on {User}@{Host}", + payload.Username, + payload.Host); + + try + { + var client = await _connectionPool.GetConnectionAsync(connectionInfo, ct); + + // Build command with environment and working directory + var fullCommand = BuildCommand(payload); + + using var command = client.CreateCommand(fullCommand); + command.CommandTimeout = payload.Timeout; + + var asyncResult = command.BeginExecute(); + + // Wait for completion with cancellation support + while (!asyncResult.IsCompleted) + { + ct.ThrowIfCancellationRequested(); + await Task.Delay(100, ct); + } + + var result = command.EndExecute(asyncResult); + + var exitCode = command.ExitStatus; + var stdout = result; + var stderr = command.Error; + + _logger.LogInformation( + "SSH command completed with exit code {ExitCode}", + exitCode); + + return new TaskResult + { + TaskId = task.Id, + Success = exitCode == 0, + Error = exitCode != 0 ? stderr : null, + Outputs = new Dictionary<string, object> + { + ["exitCode"] = exitCode, + ["stdout"] = stdout, + ["stderr"] = stderr, + ["output"] = payload.CombineOutput ?
$"{stdout}\n{stderr}" : stdout + }, + CompletedAt = DateTimeOffset.UtcNow + }; + } + catch (SshException ex) + { + _logger.LogError(ex, "SSH command failed on {Host}", payload.Host); + + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = $"SSH error: {ex.Message}", + CompletedAt = DateTimeOffset.UtcNow + }; + } + } + + private static string BuildCommand(ExecutePayload payload) + { + var parts = new List<string>(); + + // Set environment variables + if (payload.Environment is not null) + { + foreach (var (key, value) in payload.Environment) + { + parts.Add($"export {key}='{EscapeShellString(value)}'"); + } + } + + // Change to working directory + if (!string.IsNullOrEmpty(payload.WorkingDirectory)) + { + parts.Add($"cd '{EscapeShellString(payload.WorkingDirectory)}'"); + } + + parts.Add(payload.Command); + + return string.Join(" && ", parts); + } + + private static string EscapeShellString(string value) + { + return value.Replace("'", "'\"'\"'"); + } +} +``` + +### SshFileTransferTask + +```csharp +namespace StellaOps.Agent.Ssh.Tasks; + +public sealed class SshFileTransferTask +{ + private readonly SshConnectionPool _connectionPool; + private readonly ILogger _logger; + + public sealed record UploadPayload + { + public required string Host { get; init; } + public int Port { get; init; } = 22; + public required string Username { get; init; } + public required string LocalPath { get; init; } + public required string RemotePath { get; init; } + public bool CreateDirectory { get; init; } = true; + public int Permissions { get; init; } = 0b110_100_100; // octal 0644 (rw-r--r--); C# has no octal literals + } + + public sealed record DownloadPayload + { + public required string Host { get; init; } + public int Port { get; init; } = 22; + public required string Username { get; init; } + public required string RemotePath { get; init; } + public required string LocalPath { get; init; } + public bool CreateDirectory { get; init; } = true; + } + + public async Task<TaskResult> UploadAsync(AgentTask task, CancellationToken ct) + { + 
var payload = JsonSerializer.Deserialize<UploadPayload>(task.Payload) + ?? throw new InvalidPayloadException("ssh.upload"); + + var connectionInfo = BuildConnectionInfo(payload.Host, payload.Port, payload.Username, task.Credentials); + + _logger.LogInformation( + "Uploading {Local} to {User}@{Host}:{Remote}", + payload.LocalPath, + payload.Username, + payload.Host, + payload.RemotePath); + + try + { + var client = await _connectionPool.GetConnectionAsync(connectionInfo, ct); + + using var sftp = new SftpClient(client.ConnectionInfo); + await Task.Run(() => sftp.Connect(), ct); + + // Create parent directory if needed + if (payload.CreateDirectory) + { + var parentDir = Path.GetDirectoryName(payload.RemotePath)?.Replace('\\', '/'); + if (!string.IsNullOrEmpty(parentDir)) + { + await CreateRemoteDirectoryAsync(sftp, parentDir, ct); + } + } + + // Upload file + await using var localFile = File.OpenRead(payload.LocalPath); + await Task.Run(() => sftp.UploadFile(localFile, payload.RemotePath), ct); + + // Set permissions + sftp.ChangePermissions(payload.RemotePath, (short)payload.Permissions); + + var fileInfo = sftp.GetAttributes(payload.RemotePath); + + sftp.Disconnect(); + + _logger.LogInformation( + "Uploaded {Size} bytes to {Remote}", + fileInfo.Size, + payload.RemotePath); + + return new TaskResult + { + TaskId = task.Id, + Success = true, + Outputs = new Dictionary<string, object> + { + ["remotePath"] = payload.RemotePath, + ["size"] = fileInfo.Size, + ["permissions"] = payload.Permissions + }, + CompletedAt = DateTimeOffset.UtcNow + }; + } + catch (SftpPathNotFoundException ex) + { + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = $"Remote path not found: {ex.Message}", + CompletedAt = DateTimeOffset.UtcNow + }; + } + catch (Exception ex) + { + _logger.LogError(ex, "Failed to upload file to {Host}", payload.Host); + + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = ex.Message, + CompletedAt = DateTimeOffset.UtcNow + }; + } + } + + public 
async Task<TaskResult> DownloadAsync(AgentTask task, CancellationToken ct) + { + var payload = JsonSerializer.Deserialize<DownloadPayload>(task.Payload) + ?? throw new InvalidPayloadException("ssh.download"); + + var connectionInfo = BuildConnectionInfo(payload.Host, payload.Port, payload.Username, task.Credentials); + + _logger.LogInformation( + "Downloading {User}@{Host}:{Remote} to {Local}", + payload.Username, + payload.Host, + payload.RemotePath, + payload.LocalPath); + + try + { + var client = await _connectionPool.GetConnectionAsync(connectionInfo, ct); + + using var sftp = new SftpClient(client.ConnectionInfo); + await Task.Run(() => sftp.Connect(), ct); + + // Create local directory if needed + if (payload.CreateDirectory) + { + var localDir = Path.GetDirectoryName(payload.LocalPath); + if (!string.IsNullOrEmpty(localDir)) + { + Directory.CreateDirectory(localDir); + } + } + + // Download file + var remoteAttributes = sftp.GetAttributes(payload.RemotePath); + await using var localFile = File.Create(payload.LocalPath); + await Task.Run(() => sftp.DownloadFile(payload.RemotePath, localFile), ct); + + sftp.Disconnect(); + + _logger.LogInformation( + "Downloaded {Size} bytes to {Local}", + remoteAttributes.Size, + payload.LocalPath); + + return new TaskResult + { + TaskId = task.Id, + Success = true, + Outputs = new Dictionary<string, object> + { + ["localPath"] = payload.LocalPath, + ["size"] = remoteAttributes.Size + }, + CompletedAt = DateTimeOffset.UtcNow + }; + } + catch (SftpPathNotFoundException) + { + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = $"Remote file not found: {payload.RemotePath}", + CompletedAt = DateTimeOffset.UtcNow + }; + } + catch (Exception ex) + { + _logger.LogError(ex, "Failed to download file from {Host}", payload.Host); + + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = ex.Message, + CompletedAt = DateTimeOffset.UtcNow + }; + } + } + + private static async Task CreateRemoteDirectoryAsync(SftpClient sftp, string path, 
CancellationToken ct) + { + var parts = path.Split('/').Where(p => !string.IsNullOrEmpty(p)).ToList(); + var current = ""; + + foreach (var part in parts) + { + current = $"{current}/{part}"; + + try + { + var attrs = sftp.GetAttributes(current); + if (!attrs.IsDirectory) + { + throw new InvalidOperationException($"Path exists but is not a directory: {current}"); + } + } + catch (SftpPathNotFoundException) + { + await Task.Run(() => sftp.CreateDirectory(current), ct); + } + } + } + + private static SshConnectionInfo BuildConnectionInfo( + string host, + int port, + string username, + IReadOnlyDictionary<string, string> credentials) + { + return new SshConnectionInfo + { + Host = host, + Port = port, + Username = username, + Password = credentials.GetValueOrDefault("ssh.password"), + PrivateKey = credentials.GetValueOrDefault("ssh.privateKey"), + PrivateKeyPassphrase = credentials.GetValueOrDefault("ssh.passphrase") + }; + } +} +``` + +### SshTunnelTask + +```csharp +namespace StellaOps.Agent.Ssh.Tasks; + +public sealed class SshTunnelTask +{ + private readonly SshConnectionPool _connectionPool; + private readonly ILogger _logger; + + public sealed record TunnelPayload + { + public required string Host { get; init; } + public int Port { get; init; } = 22; + public required string Username { get; init; } + public required int LocalPort { get; init; } + public required string RemoteHost { get; init; } + public required int RemotePort { get; init; } + public TimeSpan Duration { get; init; } = TimeSpan.FromMinutes(30); + } + + public async Task<TaskResult> ExecuteAsync(AgentTask task, CancellationToken ct) + { + var payload = JsonSerializer.Deserialize<TunnelPayload>(task.Payload) + ?? 
throw new InvalidPayloadException("ssh.tunnel"); + + var connectionInfo = new SshConnectionInfo + { + Host = payload.Host, + Port = payload.Port, + Username = payload.Username, + Password = task.Credentials.GetValueOrDefault("ssh.password"), + PrivateKey = task.Credentials.GetValueOrDefault("ssh.privateKey"), + PrivateKeyPassphrase = task.Credentials.GetValueOrDefault("ssh.passphrase") + }; + + _logger.LogInformation( + "Creating SSH tunnel: localhost:{Local} -> {User}@{Host} -> {RemoteHost}:{RemotePort}", + payload.LocalPort, + payload.Username, + payload.Host, + payload.RemoteHost, + payload.RemotePort); + + try + { + var client = await _connectionPool.GetConnectionAsync(connectionInfo, ct); + + var tunnel = new ForwardedPortLocal( + "127.0.0.1", + (uint)payload.LocalPort, + payload.RemoteHost, + (uint)payload.RemotePort); + + client.AddForwardedPort(tunnel); + tunnel.Start(); + + _logger.LogInformation( + "SSH tunnel established: localhost:{Local} -> {RemoteHost}:{RemotePort}", + payload.LocalPort, + payload.RemoteHost, + payload.RemotePort); + + // Keep tunnel open for specified duration + using var durationCts = new CancellationTokenSource(payload.Duration); + using var linkedCts = CancellationTokenSource.CreateLinkedTokenSource(ct, durationCts.Token); + + try + { + await Task.Delay(payload.Duration, linkedCts.Token); + } + catch (OperationCanceledException) when (durationCts.IsCancellationRequested) + { + // Duration expired, normal completion + } + + tunnel.Stop(); + client.RemoveForwardedPort(tunnel); + + return new TaskResult + { + TaskId = task.Id, + Success = true, + Outputs = new Dictionary<string, object> + { + ["localPort"] = payload.LocalPort, + ["remoteHost"] = payload.RemoteHost, + ["remotePort"] = payload.RemotePort, + ["duration"] = payload.Duration.ToString() + }, + CompletedAt = DateTimeOffset.UtcNow + }; + } + catch (Exception ex) + { + _logger.LogError(ex, "Failed to create SSH tunnel to {Host}", payload.Host); + + return new TaskResult + { + TaskId = 
task.Id, + Success = false, + Error = ex.Message, + CompletedAt = DateTimeOffset.UtcNow + }; + } + } +} +``` + +### Documentation Deliverables + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `docs/modules/release-orchestrator/deployment/agentless.md` (partial) | Markdown | Agentless deployment documentation (SSH remote executor) | + +--- + +## Acceptance Criteria + +### Code + +- [ ] Execute remote commands via SSH +- [ ] Support password authentication +- [ ] Support private key authentication +- [ ] Support passphrase-protected keys +- [ ] Upload files via SFTP +- [ ] Download files via SFTP +- [ ] Create remote directories automatically +- [ ] Set file permissions on upload +- [ ] Create SSH tunnels for port forwarding +- [ ] Connection pooling for efficiency +- [ ] Timeout handling for commands +- [ ] Unit test coverage >=85% + +### Documentation +- [ ] SSH remote executor documentation created +- [ ] All SSH operations documented (execute, upload, download, tunnel) +- [ ] SFTP file transfer flow documented +- [ ] SSH key authentication documented +- [ ] C# implementation included + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 108_001 Agent Core Runtime | Internal | TODO | +| SSH.NET | NuGet | Available | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| SshCapability | TODO | | +| SshConnectionPool | TODO | | +| SshExecuteTask | TODO | | +| SshFileTransferTask | TODO | | +| SshTunnelTask | TODO | | +| SshDockerProxyTask | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | +| 11-Jan-2026 | Added documentation deliverable: deployment/agentless.md (partial - SSH) | diff --git a/docs/implplan/SPRINT_20260110_108_005_AGENTS_winrm.md b/docs/implplan/SPRINT_20260110_108_005_AGENTS_winrm.md new file mode 100644 index 000000000..20e6f542b --- 
/dev/null +++ b/docs/implplan/SPRINT_20260110_108_005_AGENTS_winrm.md @@ -0,0 +1,915 @@ +# SPRINT: Agent - WinRM + +> **Sprint ID:** 108_005 +> **Module:** AGENTS +> **Phase:** 8 - Agents +> **Status:** TODO +> **Parent:** [108_000_INDEX](SPRINT_20260110_108_000_INDEX_agents.md) + +--- + +## Overview + +Implement the WinRM Agent capability for remote Windows management via WinRM/PowerShell. + +### Objectives + +- Remote PowerShell execution via WinRM +- Windows service management +- Windows container operations +- File transfer to Windows hosts +- NTLM and Kerberos authentication + +### Working Directory + +``` +src/ReleaseOrchestrator/ +├── __Agents/ +│ └── StellaOps.Agent.WinRM/ +│ ├── WinRmCapability.cs +│ ├── Tasks/ +│ │ ├── PowerShellTask.cs +│ │ ├── WindowsServiceTask.cs +│ │ ├── WindowsContainerTask.cs +│ │ └── WinRmFileTransferTask.cs +│ ├── WinRmConnectionPool.cs +│ └── PowerShellRunner.cs +└── __Tests/ +``` + +--- + +## Deliverables + +### WinRmCapability + +```csharp +namespace StellaOps.Agent.WinRM; + +public sealed class WinRmCapability : IAgentCapability +{ + private readonly WinRmConnectionPool _connectionPool; + private readonly ILogger _logger; + private readonly Dictionary<string, Func<AgentTask, CancellationToken, Task<TaskResult>>> _taskHandlers; + + public string Name => "winrm"; + public string Version => "1.0.0"; + + public IReadOnlyList<string> SupportedTaskTypes => new[] + { + "winrm.powershell", + "winrm.service.start", + "winrm.service.stop", + "winrm.service.restart", + "winrm.service.status", + "winrm.container.deploy", + "winrm.file.upload", + "winrm.file.download" + }; + + public WinRmCapability(WinRmConnectionPool connectionPool, ILogger logger) + { + _connectionPool = connectionPool; + _logger = logger; + + _taskHandlers = new Dictionary<string, Func<AgentTask, CancellationToken, Task<TaskResult>>> + { + ["winrm.powershell"] = ExecutePowerShellAsync, + ["winrm.service.start"] = StartServiceAsync, + ["winrm.service.stop"] = StopServiceAsync, + ["winrm.service.restart"] = RestartServiceAsync, + ["winrm.service.status"] = GetServiceStatusAsync, + 
["winrm.container.deploy"] = DeployContainerAsync, + ["winrm.file.upload"] = UploadFileAsync, + ["winrm.file.download"] = DownloadFileAsync + }; + } + + public Task<bool> InitializeAsync(CancellationToken ct = default) + { + _logger.LogInformation("WinRM capability initialized"); + return Task.FromResult(true); + } + + public async Task<TaskResult> ExecuteAsync(AgentTask task, CancellationToken ct = default) + { + if (!_taskHandlers.TryGetValue(task.TaskType, out var handler)) + { + throw new UnsupportedTaskTypeException(task.TaskType); + } + + return await handler(task, ct); + } + + public Task<CapabilityHealthStatus> CheckHealthAsync(CancellationToken ct = default) + { + return Task.FromResult(new CapabilityHealthStatus(true, "WinRM capability available")); + } + + private Task<TaskResult> ExecutePowerShellAsync(AgentTask task, CancellationToken ct) => + new PowerShellTask(_connectionPool, _logger).ExecuteAsync(task, ct); + + private Task<TaskResult> StartServiceAsync(AgentTask task, CancellationToken ct) => + new WindowsServiceTask(_connectionPool, _logger).StartAsync(task, ct); + + private Task<TaskResult> StopServiceAsync(AgentTask task, CancellationToken ct) => + new WindowsServiceTask(_connectionPool, _logger).StopAsync(task, ct); + + private Task<TaskResult> RestartServiceAsync(AgentTask task, CancellationToken ct) => + new WindowsServiceTask(_connectionPool, _logger).RestartAsync(task, ct); + + private Task<TaskResult> GetServiceStatusAsync(AgentTask task, CancellationToken ct) => + new WindowsServiceTask(_connectionPool, _logger).GetStatusAsync(task, ct); + + private Task<TaskResult> DeployContainerAsync(AgentTask task, CancellationToken ct) => + new WindowsContainerTask(_connectionPool, _logger).DeployAsync(task, ct); + + private Task<TaskResult> UploadFileAsync(AgentTask task, CancellationToken ct) => + new WinRmFileTransferTask(_connectionPool, _logger).UploadAsync(task, ct); + + private Task<TaskResult> DownloadFileAsync(AgentTask task, CancellationToken ct) => + new WinRmFileTransferTask(_connectionPool, _logger).DownloadAsync(task, ct); +} +``` + +### WinRmConnectionPool + +```csharp 
+namespace StellaOps.Agent.WinRM; + +public sealed class WinRmConnectionPool : IAsyncDisposable +{ + private readonly ConcurrentDictionary<string, PooledSession> _sessions = new(); + private readonly TimeSpan _sessionTimeout = TimeSpan.FromMinutes(10); + private readonly ILogger _logger; + private readonly Timer _cleanupTimer; + + public WinRmConnectionPool(ILogger logger) + { + _logger = logger; + _cleanupTimer = new Timer(CleanupExpiredSessions, null, TimeSpan.FromMinutes(1), TimeSpan.FromMinutes(1)); + } + + public async Task<WSManSession> GetSessionAsync( + WinRmConnectionInfo connectionInfo, + CancellationToken ct = default) + { + var key = connectionInfo.GetConnectionKey(); + + if (_sessions.TryGetValue(key, out var pooled) && pooled.IsValid) + { + pooled.LastUsed = DateTimeOffset.UtcNow; + return pooled.Session; + } + + var session = await CreateSessionAsync(connectionInfo, ct); + _sessions[key] = new PooledSession(session, DateTimeOffset.UtcNow); + + return session; + } + + private async Task<WSManSession> CreateSessionAsync( + WinRmConnectionInfo info, + CancellationToken ct) + { + var sessionOptions = new WSManSessionOptions + { + DestinationHost = info.Host, + DestinationPort = info.Port, + UseSSL = info.UseSSL, + AuthenticationMechanism = info.AuthMechanism, + Credential = new NetworkCredential(info.Username, info.Password, info.Domain) + }; + + var session = await Task.Run(() => new WSManSession(sessionOptions), ct); + + _logger.LogDebug( + "WinRM session established to {Host}:{Port}", + info.Host, + info.Port); + + return session; + } + + private void CleanupExpiredSessions(object? 
state) + { + var expired = _sessions + .Where(kv => DateTimeOffset.UtcNow - kv.Value.LastUsed > _sessionTimeout) + .ToList(); + + foreach (var (key, pooled) in expired) + { + if (_sessions.TryRemove(key, out _)) + { + try + { + pooled.Session.Dispose(); + _logger.LogDebug("Closed expired WinRM session: {Key}", key); + } + catch (Exception ex) + { + _logger.LogWarning(ex, "Error closing WinRM session: {Key}", key); + } + } + } + } + + public async ValueTask DisposeAsync() + { + _cleanupTimer.Dispose(); + + foreach (var (_, pooled) in _sessions) + { + try + { + pooled.Session.Dispose(); + } + catch { /* best-effort cleanup on dispose */ } + } + + _sessions.Clear(); + } + + private sealed class PooledSession + { + public WSManSession Session { get; } + public DateTimeOffset LastUsed { get; set; } + public bool IsValid => !Session.IsDisposed; + + public PooledSession(WSManSession session, DateTimeOffset lastUsed) + { + Session = session; + LastUsed = lastUsed; + } + } +} + +public sealed record WinRmConnectionInfo +{ + public required string Host { get; init; } + public int Port { get; init; } = 5985; + public bool UseSSL { get; init; } + public required string Username { get; init; } + public required string Password { get; init; } + public string? Domain { get; init; } + public WinRmAuthMechanism AuthMechanism { get; init; } = WinRmAuthMechanism.Negotiate; + + public string GetConnectionKey() => $"{Domain ?? ""}\\{Username}@{Host}:{Port}"; +} + +public enum WinRmAuthMechanism +{ + Basic, + Negotiate, + Kerberos, + CredSSP +} +``` + +### PowerShellTask + +```csharp +namespace StellaOps.Agent.WinRM.Tasks; + +public sealed class PowerShellTask +{ + private readonly WinRmConnectionPool _connectionPool; + private readonly ILogger _logger; + + public sealed record PowerShellPayload + { + public required string Host { get; init; } + public int Port { get; init; } = 5985; + public bool UseSSL { get; init; } + public required string Username { get; init; } + public string? 
Domain { get; init; } + public required string Script { get; init; } + public IReadOnlyDictionary<string, object>? Parameters { get; init; } + public TimeSpan Timeout { get; init; } = TimeSpan.FromMinutes(10); + public bool NoProfile { get; init; } = true; + } + + public async Task<TaskResult> ExecuteAsync(AgentTask task, CancellationToken ct) + { + var payload = JsonSerializer.Deserialize<PowerShellPayload>(task.Payload) + ?? throw new InvalidPayloadException("winrm.powershell"); + + var connectionInfo = new WinRmConnectionInfo + { + Host = payload.Host, + Port = payload.Port, + UseSSL = payload.UseSSL, + Username = payload.Username, + Password = task.Credentials.GetValueOrDefault("winrm.password") ?? "", + Domain = payload.Domain + }; + + _logger.LogInformation( + "Executing PowerShell script on {Host}", + payload.Host); + + try + { + var session = await _connectionPool.GetSessionAsync(connectionInfo, ct); + + // Build the script with parameters + var script = BuildScript(payload); + + using var timeoutCts = new CancellationTokenSource(payload.Timeout); + using var linkedCts = CancellationTokenSource.CreateLinkedTokenSource(ct, timeoutCts.Token); + + var result = await Task.Run(() => + { + using var shell = session.CreatePowerShellShell(); + + if (payload.NoProfile) + { + shell.AddScript("$PSDefaultParameterValues['*:NoProfile'] = $true"); + } + + shell.AddScript(script); + + // Add parameters + if (payload.Parameters is not null) + { + foreach (var (key, value) in payload.Parameters) + { + shell.AddParameter(key, value); + } + } + + var output = shell.Invoke(); + + return new PowerShellResult + { + Output = output.Select(o => o.ToString()).ToList(), + HadErrors = shell.HadErrors, + Errors = shell.Streams.Error.Select(e => e.ToString()).ToList() + }; + }, linkedCts.Token); + + _logger.LogInformation( + "PowerShell script completed (errors: {HadErrors})", + result.HadErrors); + + return new TaskResult + { + TaskId = task.Id, + Success = !result.HadErrors, + Error = result.HadErrors ? 
string.Join("\n", result.Errors) : null, + Outputs = new Dictionary<string, object> + { + ["output"] = result.Output, + ["errors"] = result.Errors, + ["hadErrors"] = result.HadErrors + }, + CompletedAt = DateTimeOffset.UtcNow + }; + } + catch (OperationCanceledException) + { + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = $"PowerShell execution timed out after {payload.Timeout}", + CompletedAt = DateTimeOffset.UtcNow + }; + } + catch (Exception ex) + { + _logger.LogError(ex, "PowerShell execution failed on {Host}", payload.Host); + + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = ex.Message, + CompletedAt = DateTimeOffset.UtcNow + }; + } + } + + private static string BuildScript(PowerShellPayload payload) + { + // Wrap script in error handling + return $@" +$ErrorActionPreference = 'Stop' +try {{ + {payload.Script} +}} catch {{ + Write-Error $_.Exception.Message + throw +}}"; + } + + private sealed record PowerShellResult + { + public required IReadOnlyList<string> Output { get; init; } + public required bool HadErrors { get; init; } + public required IReadOnlyList<string> Errors { get; init; } + } +} +``` + +### WindowsServiceTask + +```csharp +namespace StellaOps.Agent.WinRM.Tasks; + +public sealed class WindowsServiceTask +{ + private readonly WinRmConnectionPool _connectionPool; + private readonly ILogger _logger; + + public sealed record ServicePayload + { + public required string Host { get; init; } + public int Port { get; init; } = 5985; + public bool UseSSL { get; init; } + public required string Username { get; init; } + public string? 
Domain { get; init; } + public required string ServiceName { get; init; } + public TimeSpan Timeout { get; init; } = TimeSpan.FromMinutes(5); + } + + public async Task StartAsync(AgentTask task, CancellationToken ct) + { + return await ExecuteServiceCommandAsync(task, "Start-Service", ct); + } + + public async Task StopAsync(AgentTask task, CancellationToken ct) + { + return await ExecuteServiceCommandAsync(task, "Stop-Service", ct); + } + + public async Task RestartAsync(AgentTask task, CancellationToken ct) + { + return await ExecuteServiceCommandAsync(task, "Restart-Service", ct); + } + + public async Task GetStatusAsync(AgentTask task, CancellationToken ct) + { + var payload = JsonSerializer.Deserialize(task.Payload) + ?? throw new InvalidPayloadException("winrm.service.status"); + + var connectionInfo = BuildConnectionInfo(payload, task.Credentials); + + _logger.LogInformation( + "Getting service status for {Service} on {Host}", + payload.ServiceName, + payload.Host); + + try + { + var session = await _connectionPool.GetSessionAsync(connectionInfo, ct); + + var script = $@" +$service = Get-Service -Name '{EscapeString(payload.ServiceName)}' -ErrorAction Stop +@{{ + Name = $service.Name + DisplayName = $service.DisplayName + Status = $service.Status.ToString() + StartType = $service.StartType.ToString() + CanStop = $service.CanStop + CanPauseAndContinue = $service.CanPauseAndContinue +}} | ConvertTo-Json"; + + var result = await ExecutePowerShellAsync(session, script, payload.Timeout, ct); + + if (result.HadErrors) + { + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = string.Join("\n", result.Errors), + CompletedAt = DateTimeOffset.UtcNow + }; + } + + var serviceInfo = JsonSerializer.Deserialize(string.Join("", result.Output)); + + return new TaskResult + { + TaskId = task.Id, + Success = true, + Outputs = new Dictionary + { + ["serviceName"] = serviceInfo?.Name ?? payload.ServiceName, + ["displayName"] = serviceInfo?.DisplayName ?? 
"", + ["status"] = serviceInfo?.Status ?? "Unknown", + ["startType"] = serviceInfo?.StartType ?? "Unknown" + }, + CompletedAt = DateTimeOffset.UtcNow + }; + } + catch (Exception ex) + { + _logger.LogError(ex, "Failed to get service status on {Host}", payload.Host); + + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = ex.Message, + CompletedAt = DateTimeOffset.UtcNow + }; + } + } + + private async Task ExecuteServiceCommandAsync( + AgentTask task, + string command, + CancellationToken ct) + { + var payload = JsonSerializer.Deserialize(task.Payload) + ?? throw new InvalidPayloadException("winrm.service"); + + var connectionInfo = BuildConnectionInfo(payload, task.Credentials); + + _logger.LogInformation( + "Executing {Command} for {Service} on {Host}", + command, + payload.ServiceName, + payload.Host); + + try + { + var session = await _connectionPool.GetSessionAsync(connectionInfo, ct); + + var script = $@" +{command} -Name '{EscapeString(payload.ServiceName)}' -ErrorAction Stop +$service = Get-Service -Name '{EscapeString(payload.ServiceName)}' +$service.Status.ToString()"; + + var result = await ExecutePowerShellAsync(session, script, payload.Timeout, ct); + + if (result.HadErrors) + { + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = string.Join("\n", result.Errors), + CompletedAt = DateTimeOffset.UtcNow + }; + } + + var status = string.Join("", result.Output).Trim(); + + _logger.LogInformation( + "Service {Service} is now {Status}", + payload.ServiceName, + status); + + return new TaskResult + { + TaskId = task.Id, + Success = true, + Outputs = new Dictionary + { + ["serviceName"] = payload.ServiceName, + ["status"] = status + }, + CompletedAt = DateTimeOffset.UtcNow + }; + } + catch (Exception ex) + { + _logger.LogError(ex, "Service command failed on {Host}", payload.Host); + + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = ex.Message, + CompletedAt = DateTimeOffset.UtcNow + }; + } 
+    }
+
+    private static async Task<PowerShellResult> ExecutePowerShellAsync(
+        WSManSession session,
+        string script,
+        TimeSpan timeout,
+        CancellationToken ct)
+    {
+        using var timeoutCts = new CancellationTokenSource(timeout);
+        using var linkedCts = CancellationTokenSource.CreateLinkedTokenSource(ct, timeoutCts.Token);
+
+        return await Task.Run(() =>
+        {
+            using var shell = session.CreatePowerShellShell();
+            shell.AddScript(script);
+
+            var output = shell.Invoke();
+
+            return new PowerShellResult
+            {
+                Output = output.Select(o => o.ToString()).ToList(),
+                HadErrors = shell.HadErrors,
+                Errors = shell.Streams.Error.Select(e => e.ToString()).ToList()
+            };
+        }, linkedCts.Token);
+    }
+
+    private static WinRmConnectionInfo BuildConnectionInfo(
+        ServicePayload payload,
+        IReadOnlyDictionary<string, string> credentials)
+    {
+        return new WinRmConnectionInfo
+        {
+            Host = payload.Host,
+            Port = payload.Port,
+            UseSSL = payload.UseSSL,
+            Username = payload.Username,
+            Password = credentials.GetValueOrDefault("winrm.password") ?? "",
+            Domain = payload.Domain
+        };
+    }
+
+    private static string EscapeString(string value)
+    {
+        return value.Replace("'", "''");
+    }
+
+    private sealed record ServiceInfo
+    {
+        public string? Name { get; init; }
+        public string? DisplayName { get; init; }
+        public string? Status { get; init; }
+        public string? StartType { get; init; }
+    }
+
+    private sealed record PowerShellResult
+    {
+        public required IReadOnlyList<string> Output { get; init; }
+        public required bool HadErrors { get; init; }
+        public required IReadOnlyList<string> Errors { get; init; }
+    }
+}
+```
+
+### WindowsContainerTask
+
+```csharp
+namespace StellaOps.Agent.WinRM.Tasks;
+
+public sealed class WindowsContainerTask
+{
+    private readonly WinRmConnectionPool _connectionPool;
+    private readonly ILogger<WindowsContainerTask> _logger;
+
+    public sealed record ContainerPayload
+    {
+        public required string Host { get; init; }
+        public int Port { get; init; } = 5985;
+        public bool UseSSL { get; init; }
+        public required string Username { get; init; }
+        public string? Domain { get; init; }
+        public required string Image { get; init; }
+        public required string Name { get; init; }
+        public IReadOnlyDictionary<string, string>? Environment { get; init; }
+        public IReadOnlyList<string>? Ports { get; init; }
+        public IReadOnlyList<string>? Volumes { get; init; }
+        public string? Network { get; init; }
+        public bool RemoveExisting { get; init; } = true;
+        public TimeSpan Timeout { get; init; } = TimeSpan.FromMinutes(10);
+    }
+
+    public async Task<TaskResult> DeployAsync(AgentTask task, CancellationToken ct)
+    {
+        var payload = JsonSerializer.Deserialize<ContainerPayload>(task.Payload)
+            ?? throw new InvalidPayloadException("winrm.container.deploy");
+
+        var connectionInfo = new WinRmConnectionInfo
+        {
+            Host = payload.Host,
+            Port = payload.Port,
+            UseSSL = payload.UseSSL,
+            Username = payload.Username,
+            Password = task.Credentials.GetValueOrDefault("winrm.password") ??
"", + Domain = payload.Domain + }; + + _logger.LogInformation( + "Deploying Windows container {Name} on {Host}", + payload.Name, + payload.Host); + + try + { + var session = await _connectionPool.GetSessionAsync(connectionInfo, ct); + + // Build deployment script + var script = BuildDeploymentScript(payload, task.Credentials); + + using var timeoutCts = new CancellationTokenSource(payload.Timeout); + using var linkedCts = CancellationTokenSource.CreateLinkedTokenSource(ct, timeoutCts.Token); + + var result = await Task.Run(() => + { + using var shell = session.CreatePowerShellShell(); + shell.AddScript(script); + + var output = shell.Invoke(); + + return new PowerShellResult + { + Output = output.Select(o => o.ToString()).ToList(), + HadErrors = shell.HadErrors, + Errors = shell.Streams.Error.Select(e => e.ToString()).ToList() + }; + }, linkedCts.Token); + + if (result.HadErrors) + { + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = string.Join("\n", result.Errors), + CompletedAt = DateTimeOffset.UtcNow + }; + } + + // Parse container ID from output + var containerId = ParseContainerId(result.Output); + + _logger.LogInformation( + "Windows container {Name} deployed (ID: {Id})", + payload.Name, + containerId?[..12] ?? "unknown"); + + return new TaskResult + { + TaskId = task.Id, + Success = true, + Outputs = new Dictionary + { + ["containerId"] = containerId ?? 
"", + ["containerName"] = payload.Name, + ["image"] = payload.Image + }, + CompletedAt = DateTimeOffset.UtcNow + }; + } + catch (Exception ex) + { + _logger.LogError(ex, "Container deployment failed on {Host}", payload.Host); + + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = ex.Message, + CompletedAt = DateTimeOffset.UtcNow + }; + } + } + + private static string BuildDeploymentScript( + ContainerPayload payload, + IReadOnlyDictionary credentials) + { + var sb = new StringBuilder(); + + // Registry login if credentials provided + if (credentials.TryGetValue("registry.username", out var regUser) && + credentials.TryGetValue("registry.password", out var regPass)) + { + var registry = payload.Image.Contains('/') ? payload.Image.Split('/')[0] : ""; + if (!string.IsNullOrEmpty(registry) && registry.Contains('.')) + { + sb.AppendLine($@" +$securePassword = ConvertTo-SecureString '{EscapeString(regPass)}' -AsPlainText -Force +$credential = New-Object System.Management.Automation.PSCredential('{EscapeString(regUser)}', $securePassword) +docker login {registry} --username $credential.UserName --password $credential.GetNetworkCredential().Password"); + } + } + + // Remove existing container if requested + if (payload.RemoveExisting) + { + sb.AppendLine($@" +$existing = docker ps -a --filter 'name=^{payload.Name}$' --format '{{{{.ID}}}}' +if ($existing) {{ + docker stop $existing 2>&1 | Out-Null + docker rm $existing 2>&1 | Out-Null +}}"); + } + + // Pull image + sb.AppendLine($"docker pull '{EscapeString(payload.Image)}'"); + + // Build run command + var runArgs = new List { "docker run -d", $"--name '{EscapeString(payload.Name)}'" }; + + // Environment variables + if (payload.Environment is not null) + { + foreach (var (key, value) in payload.Environment) + { + runArgs.Add($"-e '{EscapeString(key)}={EscapeString(value)}'"); + } + } + + // Port mappings + if (payload.Ports is not null) + { + foreach (var port in payload.Ports) + { + 
runArgs.Add($"-p {port}"); + } + } + + // Volume mounts + if (payload.Volumes is not null) + { + foreach (var volume in payload.Volumes) + { + runArgs.Add($"-v '{EscapeString(volume)}'"); + } + } + + // Network + if (!string.IsNullOrEmpty(payload.Network)) + { + runArgs.Add($"--network '{EscapeString(payload.Network)}'"); + } + + runArgs.Add($"'{EscapeString(payload.Image)}'"); + + sb.AppendLine(string.Join(" `\n ", runArgs)); + + return sb.ToString(); + } + + private static string? ParseContainerId(IReadOnlyList output) + { + return output.LastOrDefault(l => l.Length >= 12 && l.All(c => char.IsLetterOrDigit(c))); + } + + private static string EscapeString(string value) + { + return value.Replace("'", "''"); + } + + private sealed record PowerShellResult + { + public required IReadOnlyList Output { get; init; } + public required bool HadErrors { get; init; } + public required IReadOnlyList Errors { get; init; } + } +} +``` + +--- + +## Acceptance Criteria + +- [ ] Execute PowerShell scripts remotely +- [ ] Support NTLM authentication +- [ ] Support Kerberos authentication +- [ ] Start Windows services +- [ ] Stop Windows services +- [ ] Restart Windows services +- [ ] Get Windows service status +- [ ] Deploy Windows containers via remote Docker +- [ ] Upload files to Windows hosts +- [ ] Download files from Windows hosts +- [ ] Connection pooling for efficiency +- [ ] Timeout handling +- [ ] Unit test coverage >=85% + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 108_001 Agent Core Runtime | Internal | TODO | +| System.Management.Automation | NuGet | Available | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| WinRmCapability | TODO | | +| WinRmConnectionPool | TODO | | +| PowerShellTask | TODO | | +| WindowsServiceTask | TODO | | +| WindowsContainerTask | TODO | | +| WinRmFileTransferTask | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | 
Entry |
+|------|-------|
+| 10-Jan-2026 | Sprint created |
diff --git a/docs/implplan/SPRINT_20260110_108_006_AGENTS_ecs.md b/docs/implplan/SPRINT_20260110_108_006_AGENTS_ecs.md
new file mode 100644
index 000000000..cbc11be6a
--- /dev/null
+++ b/docs/implplan/SPRINT_20260110_108_006_AGENTS_ecs.md
@@ -0,0 +1,961 @@
+# SPRINT: Agent - ECS
+
+> **Sprint ID:** 108_006
+> **Module:** AGENTS
+> **Phase:** 8 - Agents
+> **Status:** TODO
+> **Parent:** [108_000_INDEX](SPRINT_20260110_108_000_INDEX_agents.md)
+
+---
+
+## Overview
+
+Implement the ECS Agent capability for managing AWS Elastic Container Service deployments on ECS clusters (Fargate or EC2 launch types).
+
+### Objectives
+
+- ECS service deployments (create, update, delete)
+- ECS task execution (run tasks, stop tasks)
+- Task definition registration
+- Service scaling operations
+- Deployment health monitoring
+- Log streaming via CloudWatch Logs
+- Support for Fargate and EC2 launch types
+
+### Working Directory
+
+```
+src/ReleaseOrchestrator/
+├── __Agents/
+│   └── StellaOps.Agent.Ecs/
+│       ├── EcsCapability.cs
+│       ├── Tasks/
+│       │   ├── EcsDeployServiceTask.cs
+│       │   ├── EcsRunTaskTask.cs
+│       │   ├── EcsStopTaskTask.cs
+│       │   ├── EcsScaleServiceTask.cs
+│       │   ├── EcsRegisterTaskDefinitionTask.cs
+│       │   └── EcsHealthCheckTask.cs
+│       ├── EcsClientFactory.cs
+│       └── CloudWatchLogStreamer.cs
+└── __Tests/
+    └── StellaOps.Agent.Ecs.Tests/
+```
+
+---
+
+## Deliverables
+
+### EcsCapability
+
+```csharp
+namespace StellaOps.Agent.Ecs;
+
+public sealed class EcsCapability : IAgentCapability
+{
+    private readonly IAmazonECS _ecsClient;
+    private readonly IAmazonCloudWatchLogs _logsClient;
+    private readonly ILogger<EcsCapability> _logger;
+    private readonly Dictionary<string, Func<AgentTask, CancellationToken, Task<TaskResult>>> _taskHandlers;
+
+    public string Name => "ecs";
+    public string Version => "1.0.0";
+
+    public IReadOnlyList<string> SupportedTaskTypes => new[]
+    {
+        "ecs.deploy-service",
+        "ecs.run-task",
+        "ecs.stop-task",
+        "ecs.scale-service",
+        "ecs.register-task-definition",
+        "ecs.health-check",
+        "ecs.describe-service" // NOTE: advertised but no handler is registered below yet
+    };
+
+    public EcsCapability(
+        IAmazonECS ecsClient,
+        IAmazonCloudWatchLogs logsClient,
+        ILogger<EcsCapability> logger)
+    {
+        _ecsClient = ecsClient;
+        _logsClient = logsClient;
+        _logger = logger;
+
+        _taskHandlers = new Dictionary<string, Func<AgentTask, CancellationToken, Task<TaskResult>>>
+        {
+            ["ecs.deploy-service"] = ExecuteDeployServiceAsync,
+            ["ecs.run-task"] = ExecuteRunTaskAsync,
+            ["ecs.stop-task"] = ExecuteStopTaskAsync,
+            ["ecs.scale-service"] = ExecuteScaleServiceAsync,
+            ["ecs.register-task-definition"] = ExecuteRegisterTaskDefinitionAsync,
+            ["ecs.health-check"] = ExecuteHealthCheckAsync
+        };
+    }
+
+    public async Task<bool> InitializeAsync(CancellationToken ct = default)
+    {
+        try
+        {
+            // Verify AWS credentials and ECS access
+            await _ecsClient.ListClustersAsync(new ListClustersRequest
+            {
+                MaxResults = 1
+            }, ct);
+
+            _logger.LogInformation("ECS capability initialized; ECS API reachable");
+
+            return true;
+        }
+        catch (Exception ex)
+        {
+            _logger.LogError(ex, "Failed to initialize ECS capability");
+            return false;
+        }
+    }
+
+    public async Task<TaskResult> ExecuteAsync(AgentTask task, CancellationToken ct = default)
+    {
+        if (!_taskHandlers.TryGetValue(task.TaskType, out var handler))
+        {
+            throw new UnsupportedTaskTypeException(task.TaskType);
+        }
+
+        return await handler(task, ct);
+    }
+
+    public async Task<CapabilityHealthStatus> CheckHealthAsync(CancellationToken ct = default)
+    {
+        try
+        {
+            await _ecsClient.ListClustersAsync(new ListClustersRequest { MaxResults = 1 }, ct);
+            return new CapabilityHealthStatus(true, "ECS API responding");
+        }
+        catch (Exception ex)
+        {
+            return new CapabilityHealthStatus(false, $"ECS API not responding: {ex.Message}");
+        }
+    }
+
+    private Task<TaskResult> ExecuteDeployServiceAsync(AgentTask task, CancellationToken ct) =>
+        new EcsDeployServiceTask(_ecsClient, _logger).ExecuteAsync(task, ct);
+
+    private Task<TaskResult> ExecuteRunTaskAsync(AgentTask task, CancellationToken ct) =>
+        new EcsRunTaskTask(_ecsClient,
_logger).ExecuteAsync(task, ct); + + private Task ExecuteStopTaskAsync(AgentTask task, CancellationToken ct) => + new EcsStopTaskTask(_ecsClient, _logger).ExecuteAsync(task, ct); + + private Task ExecuteScaleServiceAsync(AgentTask task, CancellationToken ct) => + new EcsScaleServiceTask(_ecsClient, _logger).ExecuteAsync(task, ct); + + private Task ExecuteRegisterTaskDefinitionAsync(AgentTask task, CancellationToken ct) => + new EcsRegisterTaskDefinitionTask(_ecsClient, _logger).ExecuteAsync(task, ct); + + private Task ExecuteHealthCheckAsync(AgentTask task, CancellationToken ct) => + new EcsHealthCheckTask(_ecsClient, _logger).ExecuteAsync(task, ct); +} +``` + +### EcsDeployServiceTask + +```csharp +namespace StellaOps.Agent.Ecs.Tasks; + +public sealed class EcsDeployServiceTask +{ + private readonly IAmazonECS _ecsClient; + private readonly ILogger _logger; + + public sealed record DeployServicePayload + { + public required string Cluster { get; init; } + public required string ServiceName { get; init; } + public required string TaskDefinition { get; init; } + public int DesiredCount { get; init; } = 1; + public string? LaunchType { get; init; } // FARGATE or EC2 + public NetworkConfiguration? NetworkConfig { get; init; } + public LoadBalancerConfiguration? LoadBalancer { get; init; } + public DeploymentConfiguration? DeploymentConfig { get; init; } + public IReadOnlyDictionary? Tags { get; init; } + public bool ForceNewDeployment { get; init; } = true; + public TimeSpan DeploymentTimeout { get; init; } = TimeSpan.FromMinutes(10); + } + + public sealed record NetworkConfiguration + { + public required IReadOnlyList Subnets { get; init; } + public IReadOnlyList? 
SecurityGroups { get; init; } + public bool AssignPublicIp { get; init; } = false; + } + + public sealed record LoadBalancerConfiguration + { + public required string TargetGroupArn { get; init; } + public required string ContainerName { get; init; } + public required int ContainerPort { get; init; } + } + + public sealed record DeploymentConfiguration + { + public int MaximumPercent { get; init; } = 200; + public int MinimumHealthyPercent { get; init; } = 100; + public DeploymentCircuitBreaker? CircuitBreaker { get; init; } + } + + public sealed record DeploymentCircuitBreaker + { + public bool Enable { get; init; } = true; + public bool Rollback { get; init; } = true; + } + + public async Task ExecuteAsync(AgentTask task, CancellationToken ct) + { + var payload = JsonSerializer.Deserialize(task.Payload) + ?? throw new InvalidPayloadException("ecs.deploy-service"); + + _logger.LogInformation( + "Deploying ECS service {Service} to cluster {Cluster} with task definition {TaskDef}", + payload.ServiceName, + payload.Cluster, + payload.TaskDefinition); + + try + { + // Check if service exists + var existingService = await GetServiceAsync(payload.Cluster, payload.ServiceName, ct); + + if (existingService is not null) + { + return await UpdateServiceAsync(task.Id, payload, ct); + } + else + { + return await CreateServiceAsync(task.Id, payload, ct); + } + } + catch (AmazonECSException ex) + { + _logger.LogError(ex, "Failed to deploy ECS service {Service}", payload.ServiceName); + + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = $"Failed to deploy service: {ex.Message}", + CompletedAt = DateTimeOffset.UtcNow + }; + } + } + + private async Task GetServiceAsync(string cluster, string serviceName, CancellationToken ct) + { + try + { + var response = await _ecsClient.DescribeServicesAsync(new DescribeServicesRequest + { + Cluster = cluster, + Services = new List { serviceName } + }, ct); + + return response.Services.FirstOrDefault(s => s.Status != 
"INACTIVE"); + } + catch + { + return null; + } + } + + private async Task CreateServiceAsync( + Guid taskId, + DeployServicePayload payload, + CancellationToken ct) + { + _logger.LogInformation("Creating new ECS service {Service}", payload.ServiceName); + + var request = new CreateServiceRequest + { + Cluster = payload.Cluster, + ServiceName = payload.ServiceName, + TaskDefinition = payload.TaskDefinition, + DesiredCount = payload.DesiredCount, + LaunchType = string.IsNullOrEmpty(payload.LaunchType) ? null : new LaunchType(payload.LaunchType), + DeploymentConfiguration = payload.DeploymentConfig is not null + ? new Amazon.ECS.Model.DeploymentConfiguration + { + MaximumPercent = payload.DeploymentConfig.MaximumPercent, + MinimumHealthyPercent = payload.DeploymentConfig.MinimumHealthyPercent, + DeploymentCircuitBreaker = payload.DeploymentConfig.CircuitBreaker is not null + ? new Amazon.ECS.Model.DeploymentCircuitBreaker + { + Enable = payload.DeploymentConfig.CircuitBreaker.Enable, + Rollback = payload.DeploymentConfig.CircuitBreaker.Rollback + } + : null + } + : null, + Tags = payload.Tags?.Select(kv => new Tag { Key = kv.Key, Value = kv.Value }).ToList() + }; + + if (payload.NetworkConfig is not null) + { + request.NetworkConfiguration = new Amazon.ECS.Model.NetworkConfiguration + { + AwsvpcConfiguration = new AwsVpcConfiguration + { + Subnets = payload.NetworkConfig.Subnets.ToList(), + SecurityGroups = payload.NetworkConfig.SecurityGroups?.ToList(), + AssignPublicIp = payload.NetworkConfig.AssignPublicIp ? 
AssignPublicIp.ENABLED : AssignPublicIp.DISABLED + } + }; + } + + if (payload.LoadBalancer is not null) + { + request.LoadBalancers = new List + { + new() + { + TargetGroupArn = payload.LoadBalancer.TargetGroupArn, + ContainerName = payload.LoadBalancer.ContainerName, + ContainerPort = payload.LoadBalancer.ContainerPort + } + }; + } + + var createResponse = await _ecsClient.CreateServiceAsync(request, ct); + var service = createResponse.Service; + + _logger.LogInformation( + "Created ECS service {Service} (ARN: {Arn})", + payload.ServiceName, + service.ServiceArn); + + // Wait for deployment to stabilize + var stable = await WaitForServiceStableAsync( + payload.Cluster, + payload.ServiceName, + payload.DeploymentTimeout, + ct); + + return new TaskResult + { + TaskId = taskId, + Success = stable, + Error = stable ? null : "Service did not stabilize within timeout", + Outputs = new Dictionary + { + ["serviceArn"] = service.ServiceArn, + ["serviceName"] = service.ServiceName, + ["taskDefinition"] = service.TaskDefinition, + ["runningCount"] = service.RunningCount, + ["desiredCount"] = service.DesiredCount, + ["deploymentStatus"] = stable ? 
"COMPLETED" : "TIMED_OUT" + }, + CompletedAt = DateTimeOffset.UtcNow + }; + } + + private async Task UpdateServiceAsync( + Guid taskId, + DeployServicePayload payload, + CancellationToken ct) + { + _logger.LogInformation( + "Updating existing ECS service {Service} to task definition {TaskDef}", + payload.ServiceName, + payload.TaskDefinition); + + var request = new UpdateServiceRequest + { + Cluster = payload.Cluster, + Service = payload.ServiceName, + TaskDefinition = payload.TaskDefinition, + DesiredCount = payload.DesiredCount, + ForceNewDeployment = payload.ForceNewDeployment + }; + + if (payload.DeploymentConfig is not null) + { + request.DeploymentConfiguration = new Amazon.ECS.Model.DeploymentConfiguration + { + MaximumPercent = payload.DeploymentConfig.MaximumPercent, + MinimumHealthyPercent = payload.DeploymentConfig.MinimumHealthyPercent, + DeploymentCircuitBreaker = payload.DeploymentConfig.CircuitBreaker is not null + ? new Amazon.ECS.Model.DeploymentCircuitBreaker + { + Enable = payload.DeploymentConfig.CircuitBreaker.Enable, + Rollback = payload.DeploymentConfig.CircuitBreaker.Rollback + } + : null + }; + } + + var updateResponse = await _ecsClient.UpdateServiceAsync(request, ct); + var service = updateResponse.Service; + + _logger.LogInformation( + "Updated ECS service {Service}, deployment ID: {DeploymentId}", + payload.ServiceName, + service.Deployments.FirstOrDefault()?.Id ?? "unknown"); + + // Wait for deployment to stabilize + var stable = await WaitForServiceStableAsync( + payload.Cluster, + payload.ServiceName, + payload.DeploymentTimeout, + ct); + + return new TaskResult + { + TaskId = taskId, + Success = stable, + Error = stable ? 
null : "Service did not stabilize within timeout", + Outputs = new Dictionary + { + ["serviceArn"] = service.ServiceArn, + ["serviceName"] = service.ServiceName, + ["taskDefinition"] = service.TaskDefinition, + ["runningCount"] = service.RunningCount, + ["desiredCount"] = service.DesiredCount, + ["deploymentId"] = service.Deployments.FirstOrDefault()?.Id ?? "", + ["deploymentStatus"] = stable ? "COMPLETED" : "TIMED_OUT" + }, + CompletedAt = DateTimeOffset.UtcNow + }; + } + + private async Task WaitForServiceStableAsync( + string cluster, + string serviceName, + TimeSpan timeout, + CancellationToken ct) + { + _logger.LogInformation("Waiting for service {Service} to stabilize", serviceName); + + using var timeoutCts = new CancellationTokenSource(timeout); + using var linkedCts = CancellationTokenSource.CreateLinkedTokenSource(ct, timeoutCts.Token); + + try + { + while (!linkedCts.IsCancellationRequested) + { + var response = await _ecsClient.DescribeServicesAsync(new DescribeServicesRequest + { + Cluster = cluster, + Services = new List { serviceName } + }, linkedCts.Token); + + var service = response.Services.FirstOrDefault(); + if (service is null) + { + _logger.LogWarning("Service {Service} not found during stabilization check", serviceName); + return false; + } + + var primaryDeployment = service.Deployments.FirstOrDefault(d => d.Status == "PRIMARY"); + if (primaryDeployment is null) + { + await Task.Delay(TimeSpan.FromSeconds(10), linkedCts.Token); + continue; + } + + if (primaryDeployment.RunningCount == primaryDeployment.DesiredCount && + service.Deployments.Count == 1) + { + _logger.LogInformation( + "Service {Service} stabilized with {Count} running tasks", + serviceName, + primaryDeployment.RunningCount); + return true; + } + + _logger.LogDebug( + "Service {Service} not stable: running={Running}, desired={Desired}, deployments={Deployments}", + serviceName, + primaryDeployment.RunningCount, + primaryDeployment.DesiredCount, + service.Deployments.Count); + + 
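+                // Stability criterion (same acceptor as the `aws ecs wait services-stable` waiter):
+                // the PRIMARY deployment is fully scaled (RunningCount == DesiredCount) and it is
+                // the only remaining deployment, i.e. tasks from the previous revision have drained.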
await Task.Delay(TimeSpan.FromSeconds(10), linkedCts.Token); + } + } + catch (OperationCanceledException) when (timeoutCts.IsCancellationRequested) + { + _logger.LogWarning("Service {Service} stabilization timed out after {Timeout}", serviceName, timeout); + } + + return false; + } +} +``` + +### EcsRunTaskTask + +```csharp +namespace StellaOps.Agent.Ecs.Tasks; + +public sealed class EcsRunTaskTask +{ + private readonly IAmazonECS _ecsClient; + private readonly ILogger _logger; + + public sealed record RunTaskPayload + { + public required string Cluster { get; init; } + public required string TaskDefinition { get; init; } + public int Count { get; init; } = 1; + public string? LaunchType { get; init; } + public NetworkConfiguration? NetworkConfig { get; init; } + public IReadOnlyList? Overrides { get; init; } + public string? Group { get; init; } + public IReadOnlyDictionary? Tags { get; init; } + public bool WaitForCompletion { get; init; } = true; + public TimeSpan CompletionTimeout { get; init; } = TimeSpan.FromMinutes(30); + } + + public sealed record ContainerOverride + { + public required string Name { get; init; } + public IReadOnlyList? Command { get; init; } + public IReadOnlyDictionary? Environment { get; init; } + public int? Cpu { get; init; } + public int? Memory { get; init; } + } + + public async Task ExecuteAsync(AgentTask task, CancellationToken ct) + { + var payload = JsonSerializer.Deserialize(task.Payload) + ?? throw new InvalidPayloadException("ecs.run-task"); + + _logger.LogInformation( + "Running ECS task from definition {TaskDef} on cluster {Cluster}", + payload.TaskDefinition, + payload.Cluster); + + try + { + var request = new RunTaskRequest + { + Cluster = payload.Cluster, + TaskDefinition = payload.TaskDefinition, + Count = payload.Count, + LaunchType = string.IsNullOrEmpty(payload.LaunchType) ? 
null : new LaunchType(payload.LaunchType), + Group = payload.Group, + Tags = payload.Tags?.Select(kv => new Tag { Key = kv.Key, Value = kv.Value }).ToList() + }; + + if (payload.NetworkConfig is not null) + { + request.NetworkConfiguration = new Amazon.ECS.Model.NetworkConfiguration + { + AwsvpcConfiguration = new AwsVpcConfiguration + { + Subnets = payload.NetworkConfig.Subnets.ToList(), + SecurityGroups = payload.NetworkConfig.SecurityGroups?.ToList(), + AssignPublicIp = payload.NetworkConfig.AssignPublicIp ? AssignPublicIp.ENABLED : AssignPublicIp.DISABLED + } + }; + } + + if (payload.Overrides is not null) + { + request.Overrides = new TaskOverride + { + ContainerOverrides = payload.Overrides.Select(o => new Amazon.ECS.Model.ContainerOverride + { + Name = o.Name, + Command = o.Command?.ToList(), + Environment = o.Environment?.Select(kv => new Amazon.ECS.Model.KeyValuePair + { + Name = kv.Key, + Value = kv.Value + }).ToList(), + Cpu = o.Cpu, + Memory = o.Memory + }).ToList() + }; + } + + var runResponse = await _ecsClient.RunTaskAsync(request, ct); + + if (runResponse.Failures.Any()) + { + var failure = runResponse.Failures.First(); + _logger.LogError( + "Failed to run ECS task: {Reason} (ARN: {Arn})", + failure.Reason, + failure.Arn); + + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = $"Failed to run task: {failure.Reason}", + CompletedAt = DateTimeOffset.UtcNow + }; + } + + var ecsTasks = runResponse.Tasks; + var taskArns = ecsTasks.Select(t => t.TaskArn).ToList(); + + _logger.LogInformation( + "Started {Count} ECS task(s): {TaskArns}", + ecsTasks.Count, + string.Join(", ", taskArns.Select(a => a.Split('/').Last()))); + + if (!payload.WaitForCompletion) + { + return new TaskResult + { + TaskId = task.Id, + Success = true, + Outputs = new Dictionary + { + ["taskArns"] = taskArns, + ["taskCount"] = ecsTasks.Count, + ["status"] = "RUNNING" + }, + CompletedAt = DateTimeOffset.UtcNow + }; + } + + // Wait for tasks to complete + var 
(completed, exitCodes) = await WaitForTasksAsync( + payload.Cluster, + taskArns, + payload.CompletionTimeout, + ct); + + var allSucceeded = completed && exitCodes.All(e => e == 0); + + return new TaskResult + { + TaskId = task.Id, + Success = allSucceeded, + Error = allSucceeded ? null : $"Task(s) failed with exit codes: {string.Join(", ", exitCodes)}", + Outputs = new Dictionary + { + ["taskArns"] = taskArns, + ["taskCount"] = ecsTasks.Count, + ["exitCodes"] = exitCodes, + ["status"] = allSucceeded ? "SUCCEEDED" : "FAILED" + }, + CompletedAt = DateTimeOffset.UtcNow + }; + } + catch (AmazonECSException ex) + { + _logger.LogError(ex, "Failed to run ECS task from {TaskDef}", payload.TaskDefinition); + + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = $"Failed to run task: {ex.Message}", + CompletedAt = DateTimeOffset.UtcNow + }; + } + } + + private async Task<(bool Completed, List ExitCodes)> WaitForTasksAsync( + string cluster, + List taskArns, + TimeSpan timeout, + CancellationToken ct) + { + using var timeoutCts = new CancellationTokenSource(timeout); + using var linkedCts = CancellationTokenSource.CreateLinkedTokenSource(ct, timeoutCts.Token); + + var exitCodes = new List(); + + try + { + while (!linkedCts.IsCancellationRequested) + { + var response = await _ecsClient.DescribeTasksAsync(new DescribeTasksRequest + { + Cluster = cluster, + Tasks = taskArns + }, linkedCts.Token); + + var allStopped = response.Tasks.All(t => t.LastStatus == "STOPPED"); + if (allStopped) + { + exitCodes = response.Tasks + .SelectMany(t => t.Containers.Select(c => c.ExitCode ?? 
-1)) + .ToList(); + return (true, exitCodes); + } + + await Task.Delay(TimeSpan.FromSeconds(10), linkedCts.Token); + } + } + catch (OperationCanceledException) when (timeoutCts.IsCancellationRequested) + { + _logger.LogWarning("Task completion wait timed out after {Timeout}", timeout); + } + + return (false, exitCodes); + } +} +``` + +### EcsHealthCheckTask + +```csharp +namespace StellaOps.Agent.Ecs.Tasks; + +public sealed class EcsHealthCheckTask +{ + private readonly IAmazonECS _ecsClient; + private readonly ILogger _logger; + + public sealed record HealthCheckPayload + { + public required string Cluster { get; init; } + public required string ServiceName { get; init; } + public int MinHealthyPercent { get; init; } = 100; + public TimeSpan Timeout { get; init; } = TimeSpan.FromMinutes(5); + } + + public async Task<TaskResult> ExecuteAsync(AgentTask task, CancellationToken ct) + { + var payload = JsonSerializer.Deserialize<HealthCheckPayload>(task.Payload) + ?? throw new InvalidPayloadException("ecs.health-check"); + + _logger.LogInformation( + "Checking health of ECS service {Service} in cluster {Cluster}", + payload.ServiceName, + payload.Cluster); + + try + { + using var timeoutCts = new CancellationTokenSource(payload.Timeout); + using var linkedCts = CancellationTokenSource.CreateLinkedTokenSource(ct, timeoutCts.Token); + + while (!linkedCts.IsCancellationRequested) + { + var response = await _ecsClient.DescribeServicesAsync(new DescribeServicesRequest + { + Cluster = payload.Cluster, + Services = new List<string> { payload.ServiceName } + }, linkedCts.Token); + + var service = response.Services.FirstOrDefault(); + if (service is null) + { + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = "Service not found", + CompletedAt = DateTimeOffset.UtcNow + }; + } + + var healthyPercent = service.DesiredCount > 0 + ?
(service.RunningCount * 100) / service.DesiredCount + : 0; + + if (healthyPercent >= payload.MinHealthyPercent && service.Deployments.Count == 1) + { + _logger.LogInformation( + "Service {Service} is healthy: {Running}/{Desired} tasks running ({Percent}%)", + payload.ServiceName, + service.RunningCount, + service.DesiredCount, + healthyPercent); + + return new TaskResult + { + TaskId = task.Id, + Success = true, + Outputs = new Dictionary + { + ["serviceName"] = service.ServiceName, + ["runningCount"] = service.RunningCount, + ["desiredCount"] = service.DesiredCount, + ["healthyPercent"] = healthyPercent, + ["status"] = service.Status, + ["deployments"] = service.Deployments.Count + }, + CompletedAt = DateTimeOffset.UtcNow + }; + } + + _logger.LogDebug( + "Service {Service} health check: {Running}/{Desired} ({Percent}%), waiting...", + payload.ServiceName, + service.RunningCount, + service.DesiredCount, + healthyPercent); + + await Task.Delay(TimeSpan.FromSeconds(10), linkedCts.Token); + } + + throw new OperationCanceledException(); + } + catch (OperationCanceledException) + { + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = $"Health check timed out after {payload.Timeout}", + CompletedAt = DateTimeOffset.UtcNow + }; + } + catch (AmazonECSException ex) + { + _logger.LogError(ex, "Failed to check health of ECS service {Service}", payload.ServiceName); + + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = $"Health check failed: {ex.Message}", + CompletedAt = DateTimeOffset.UtcNow + }; + } + } +} +``` + +### CloudWatchLogStreamer + +```csharp +namespace StellaOps.Agent.Ecs; + +public sealed class CloudWatchLogStreamer +{ + private readonly IAmazonCloudWatchLogs _logsClient; + private readonly LogStreamer _logStreamer; + private readonly ILogger _logger; + + public async Task StreamLogsAsync( + Guid taskId, + string logGroupName, + string logStreamName, + CancellationToken ct = default) + { + string? 
nextToken = null; + + try + { + while (!ct.IsCancellationRequested) + { + var request = new GetLogEventsRequest + { + LogGroupName = logGroupName, + LogStreamName = logStreamName, + StartFromHead = true, + NextToken = nextToken + }; + + var response = await _logsClient.GetLogEventsAsync(request, ct); + + foreach (var logEvent in response.Events) + { + var level = DetectLogLevel(logEvent.Message); + _logStreamer.Log(taskId, level, logEvent.Message); + } + + if (response.NextForwardToken == nextToken) + { + // No new logs, wait before polling again + await Task.Delay(TimeSpan.FromSeconds(2), ct); + } + + nextToken = response.NextForwardToken; + } + } + catch (OperationCanceledException) + { + // Expected when task completes + } + catch (Exception ex) + { + _logger.LogWarning( + ex, + "Error streaming logs from {LogGroup}/{LogStream}", + logGroupName, + logStreamName); + } + } + + private static LogLevel DetectLogLevel(string message) + { + if (message.Contains("ERROR", StringComparison.OrdinalIgnoreCase) || + message.Contains("FATAL", StringComparison.OrdinalIgnoreCase)) + { + return LogLevel.Error; + } + + if (message.Contains("WARN", StringComparison.OrdinalIgnoreCase)) + { + return LogLevel.Warning; + } + + if (message.Contains("DEBUG", StringComparison.OrdinalIgnoreCase)) + { + return LogLevel.Debug; + } + + return LogLevel.Information; + } +} +``` + +--- + +## Acceptance Criteria + +- [ ] Deploy new ECS services (Fargate and EC2 launch types) +- [ ] Update existing ECS services with new task definitions +- [ ] Run one-off ECS tasks +- [ ] Stop running ECS tasks +- [ ] Scale ECS services up/down +- [ ] Register new task definitions +- [ ] Check service health and stability +- [ ] Wait for deployments to complete +- [ ] Stream logs from CloudWatch +- [ ] Support network configuration (VPC, subnets, security groups) +- [ ] Support load balancer integration +- [ ] Support deployment circuit breaker +- [ ] Unit test coverage >= 85% + +--- + +## Dependencies + +| 
Dependency | Type | Status | +|------------|------|--------| +| 108_001 Agent Core Runtime | Internal | TODO | +| AWSSDK.ECS | NuGet | Available | +| AWSSDK.CloudWatchLogs | NuGet | Available | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| EcsCapability | TODO | | +| EcsDeployServiceTask | TODO | | +| EcsRunTaskTask | TODO | | +| EcsStopTaskTask | TODO | | +| EcsScaleServiceTask | TODO | | +| EcsRegisterTaskDefinitionTask | TODO | | +| EcsHealthCheckTask | TODO | | +| CloudWatchLogStreamer | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_108_007_AGENTS_nomad.md b/docs/implplan/SPRINT_20260110_108_007_AGENTS_nomad.md new file mode 100644 index 000000000..a36fd5fec --- /dev/null +++ b/docs/implplan/SPRINT_20260110_108_007_AGENTS_nomad.md @@ -0,0 +1,900 @@ +# SPRINT: Agent - Nomad + +> **Sprint ID:** 108_007 +> **Module:** AGENTS +> **Phase:** 8 - Agents +> **Status:** TODO +> **Parent:** [108_000_INDEX](SPRINT_20260110_108_000_INDEX_agents.md) + +--- + +## Overview + +Implement the Nomad Agent capability for managing HashiCorp Nomad job deployments, supporting Docker, raw_exec, and other Nomad task drivers. 
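
As the Dependencies section below notes, HashiCorp ships no official .NET SDK for Nomad, so the `NomadClient` used throughout this sprint will wrap Nomad's HTTP API directly. A minimal sketch of what that wrapper could look like, using only `HttpClient` — the class and method names here are illustrative assumptions, not the final `NomadClient` surface:

```csharp
using System.Net;

// Hypothetical sketch: thin wrapper over the Nomad HTTP API (v1).
// Real endpoints used: GET /v1/job/{id}; auth via the X-Nomad-Token header.
public sealed class NomadHttpClient
{
    private readonly HttpClient _http;

    public NomadHttpClient(Uri address, string? token = null)
    {
        _http = new HttpClient { BaseAddress = address };
        if (token is not null)
        {
            // Nomad ACL token header, per the Nomad HTTP API docs.
            _http.DefaultRequestHeaders.Add("X-Nomad-Token", token);
        }
    }

    /// Returns the raw JSON for a job, or null if the job does not exist.
    public async Task<string?> GetJobAsync(string jobId, CancellationToken ct = default)
    {
        var response = await _http.GetAsync($"/v1/job/{Uri.EscapeDataString(jobId)}", ct);
        if (response.StatusCode == HttpStatusCode.NotFound)
        {
            return null;
        }

        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync(ct);
    }
}
```

Higher-level groups (`Jobs`, `Evaluations`, `Allocations`) referenced in the deliverables below would be built on top of this kind of primitive.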
+ +### Objectives + +- Nomad job deployments (register, run, stop) +- Job scaling operations +- Deployment monitoring and health checks +- Allocation status tracking +- Log streaming from allocations +- Support for multiple task drivers (docker, raw_exec, java) +- Constraint and affinity configuration + +### Working Directory + +``` +src/ReleaseOrchestrator/ +├── __Agents/ +│ └── StellaOps.Agent.Nomad/ +│ ├── NomadCapability.cs +│ ├── Tasks/ +│ │ ├── NomadDeployJobTask.cs +│ │ ├── NomadStopJobTask.cs +│ │ ├── NomadScaleJobTask.cs +│ │ ├── NomadJobStatusTask.cs +│ │ └── NomadHealthCheckTask.cs +│ ├── NomadClientFactory.cs +│ └── NomadLogStreamer.cs +└── __Tests/ + └── StellaOps.Agent.Nomad.Tests/ +``` + +--- + +## Deliverables + +### NomadCapability + +```csharp +namespace StellaOps.Agent.Nomad; + +public sealed class NomadCapability : IAgentCapability +{ + private readonly NomadClient _nomadClient; + private readonly ILogger _logger; + private readonly Dictionary>> _taskHandlers; + + public string Name => "nomad"; + public string Version => "1.0.0"; + + public IReadOnlyList SupportedTaskTypes => new[] + { + "nomad.deploy-job", + "nomad.stop-job", + "nomad.scale-job", + "nomad.job-status", + "nomad.health-check", + "nomad.dispatch-job" + }; + + public NomadCapability(NomadClient nomadClient, ILogger logger) + { + _nomadClient = nomadClient; + _logger = logger; + + _taskHandlers = new Dictionary>> + { + ["nomad.deploy-job"] = ExecuteDeployJobAsync, + ["nomad.stop-job"] = ExecuteStopJobAsync, + ["nomad.scale-job"] = ExecuteScaleJobAsync, + ["nomad.job-status"] = ExecuteJobStatusAsync, + ["nomad.health-check"] = ExecuteHealthCheckAsync, + ["nomad.dispatch-job"] = ExecuteDispatchJobAsync + }; + } + + public async Task InitializeAsync(CancellationToken ct = default) + { + try + { + var status = await _nomadClient.Agent.GetSelfAsync(ct); + _logger.LogInformation( + "Nomad capability initialized, connected to {Region} region (version {Version})", + 
status.Stats["nomad"]["region"], + status.Stats["nomad"]["version"]); + return true; + } + catch (Exception ex) + { + _logger.LogError(ex, "Failed to initialize Nomad capability"); + return false; + } + } + + public async Task<TaskResult> ExecuteAsync(AgentTask task, CancellationToken ct = default) + { + if (!_taskHandlers.TryGetValue(task.TaskType, out var handler)) + { + throw new UnsupportedTaskTypeException(task.TaskType); + } + + return await handler(task, ct); + } + + public async Task<CapabilityHealthStatus> CheckHealthAsync(CancellationToken ct = default) + { + try + { + var status = await _nomadClient.Agent.GetSelfAsync(ct); + return new CapabilityHealthStatus(true, $"Nomad agent responding ({status.Stats["nomad"]["region"]})"); + } + catch (Exception ex) + { + return new CapabilityHealthStatus(false, $"Nomad agent not responding: {ex.Message}"); + } + } + + private Task<TaskResult> ExecuteDeployJobAsync(AgentTask task, CancellationToken ct) => + new NomadDeployJobTask(_nomadClient, _logger).ExecuteAsync(task, ct); + + private Task<TaskResult> ExecuteStopJobAsync(AgentTask task, CancellationToken ct) => + new NomadStopJobTask(_nomadClient, _logger).ExecuteAsync(task, ct); + + private Task<TaskResult> ExecuteScaleJobAsync(AgentTask task, CancellationToken ct) => + new NomadScaleJobTask(_nomadClient, _logger).ExecuteAsync(task, ct); + + private Task<TaskResult> ExecuteJobStatusAsync(AgentTask task, CancellationToken ct) => + new NomadJobStatusTask(_nomadClient, _logger).ExecuteAsync(task, ct); + + private Task<TaskResult> ExecuteHealthCheckAsync(AgentTask task, CancellationToken ct) => + new NomadHealthCheckTask(_nomadClient, _logger).ExecuteAsync(task, ct); + + private Task<TaskResult> ExecuteDispatchJobAsync(AgentTask task, CancellationToken ct) => + new NomadDispatchJobTask(_nomadClient, _logger).ExecuteAsync(task, ct); +} +``` + +### NomadDeployJobTask + +```csharp +namespace StellaOps.Agent.Nomad.Tasks; + +public sealed class NomadDeployJobTask +{ + private readonly NomadClient _nomadClient; + private readonly ILogger _logger; + + public sealed record
DeployJobPayload + { + /// + /// Job specification in HCL or JSON format. + /// + public string? JobSpec { get; init; } + + /// + /// Pre-parsed job definition (alternative to JobSpec). + /// + public JobDefinition? Job { get; init; } + + /// + /// Variables to substitute in job spec. + /// + public IReadOnlyDictionary? Variables { get; init; } + + /// + /// Nomad namespace. + /// + public string? Namespace { get; init; } + + /// + /// Region to deploy to. + /// + public string? Region { get; init; } + + /// + /// Whether to wait for deployment to complete. + /// + public bool WaitForDeployment { get; init; } = true; + + /// + /// Deployment completion timeout. + /// + public TimeSpan DeploymentTimeout { get; init; } = TimeSpan.FromMinutes(10); + + /// + /// If true, job is run in detached mode (fire and forget). + /// + public bool Detach { get; init; } = false; + } + + public sealed record JobDefinition + { + public required string ID { get; init; } + public required string Name { get; init; } + public string Type { get; init; } = "service"; + public string? Namespace { get; init; } + public string? Region { get; init; } + public int Priority { get; init; } = 50; + public IReadOnlyList? Datacenters { get; init; } + public IReadOnlyList? TaskGroups { get; init; } + public UpdateStrategy? Update { get; init; } + public IReadOnlyDictionary? Meta { get; init; } + public IReadOnlyList? Constraints { get; init; } + } + + public sealed record TaskGroupDefinition + { + public required string Name { get; init; } + public int Count { get; init; } = 1; + public IReadOnlyList? Tasks { get; init; } + public IReadOnlyList? Networks { get; init; } + public IReadOnlyList? Services { get; init; } + public RestartPolicy? RestartPolicy { get; init; } + public EphemeralDisk? EphemeralDisk { get; init; } + } + + public sealed record TaskDefinition + { + public required string Name { get; init; } + public required string Driver { get; init; } // docker, raw_exec, java, etc. 
+ public required IReadOnlyDictionary Config { get; init; } + public ResourceRequirements? Resources { get; init; } + public IReadOnlyDictionary? Env { get; init; } + public IReadOnlyList? Templates { get; init; } + public IReadOnlyList? Artifacts { get; init; } + public LogConfig? Logs { get; init; } + } + + public sealed record UpdateStrategy + { + public int MaxParallel { get; init; } = 1; + public string HealthCheck { get; init; } = "checks"; + public TimeSpan MinHealthyTime { get; init; } = TimeSpan.FromSeconds(10); + public TimeSpan HealthyDeadline { get; init; } = TimeSpan.FromMinutes(5); + public bool AutoRevert { get; init; } = false; + public bool AutoPromote { get; init; } = false; + public int Canary { get; init; } = 0; + } + + public async Task ExecuteAsync(AgentTask task, CancellationToken ct) + { + var payload = JsonSerializer.Deserialize(task.Payload) + ?? throw new InvalidPayloadException("nomad.deploy-job"); + + Job nomadJob; + + if (!string.IsNullOrEmpty(payload.JobSpec)) + { + // Parse HCL or JSON job spec + var parseResponse = await _nomadClient.Jobs.ParseJobAsync( + payload.JobSpec, + payload.Variables?.ToDictionary(kv => kv.Key, kv => kv.Value), + ct); + nomadJob = parseResponse; + } + else if (payload.Job is not null) + { + nomadJob = ConvertToNomadJob(payload.Job); + } + else + { + throw new InvalidPayloadException("nomad.deploy-job", "Either JobSpec or Job must be provided"); + } + + _logger.LogInformation( + "Deploying Nomad job {JobId} to region {Region}", + nomadJob.ID, + payload.Region ?? 
"default"); + + try + { + // Register the job + var registerResponse = await _nomadClient.Jobs.RegisterAsync( + nomadJob, + new WriteOptions + { + Namespace = payload.Namespace, + Region = payload.Region + }, + ct); + + _logger.LogInformation( + "Registered Nomad job {JobId}, evaluation ID: {EvalId}", + nomadJob.ID, + registerResponse.EvalID); + + if (payload.Detach) + { + return new TaskResult + { + TaskId = task.Id, + Success = true, + Outputs = new Dictionary + { + ["jobId"] = nomadJob.ID, + ["evalId"] = registerResponse.EvalID, + ["status"] = "DETACHED" + }, + CompletedAt = DateTimeOffset.UtcNow + }; + } + + if (!payload.WaitForDeployment) + { + // Just wait for evaluation to complete + var evaluation = await WaitForEvaluationAsync( + registerResponse.EvalID, + payload.Namespace, + TimeSpan.FromMinutes(2), + ct); + + return new TaskResult + { + TaskId = task.Id, + Success = evaluation.Status == "complete", + Outputs = new Dictionary + { + ["jobId"] = nomadJob.ID, + ["evalId"] = registerResponse.EvalID, + ["evalStatus"] = evaluation.Status, + ["status"] = evaluation.Status == "complete" ? "EVALUATED" : "EVAL_FAILED" + }, + CompletedAt = DateTimeOffset.UtcNow + }; + } + + // Wait for deployment to complete + var deployment = await WaitForDeploymentAsync( + nomadJob.ID, + payload.Namespace, + payload.DeploymentTimeout, + ct); + + var success = deployment?.Status == "successful"; + + return new TaskResult + { + TaskId = task.Id, + Success = success, + Error = success ? null : $"Deployment failed: {deployment?.StatusDescription ?? "unknown"}", + Outputs = new Dictionary + { + ["jobId"] = nomadJob.ID, + ["evalId"] = registerResponse.EvalID, + ["deploymentId"] = deployment?.ID ?? "", + ["deploymentStatus"] = deployment?.Status ?? "unknown", + ["status"] = success ? 
"DEPLOYED" : "DEPLOYMENT_FAILED" + }, + CompletedAt = DateTimeOffset.UtcNow + }; + } + catch (NomadApiException ex) + { + _logger.LogError(ex, "Failed to deploy Nomad job {JobId}", nomadJob.ID); + + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = $"Failed to deploy job: {ex.Message}", + CompletedAt = DateTimeOffset.UtcNow + }; + } + } + + private async Task WaitForEvaluationAsync( + string evalId, + string? ns, + TimeSpan timeout, + CancellationToken ct) + { + using var timeoutCts = new CancellationTokenSource(timeout); + using var linkedCts = CancellationTokenSource.CreateLinkedTokenSource(ct, timeoutCts.Token); + + while (!linkedCts.IsCancellationRequested) + { + var evaluation = await _nomadClient.Evaluations.GetAsync( + evalId, + new QueryOptions { Namespace = ns }, + linkedCts.Token); + + if (evaluation.Status is "complete" or "failed" or "canceled") + { + return evaluation; + } + + _logger.LogDebug("Evaluation {EvalId} status: {Status}", evalId, evaluation.Status); + await Task.Delay(TimeSpan.FromSeconds(2), linkedCts.Token); + } + + throw new OperationCanceledException("Evaluation wait timed out"); + } + + private async Task WaitForDeploymentAsync( + string jobId, + string? ns, + TimeSpan timeout, + CancellationToken ct) + { + using var timeoutCts = new CancellationTokenSource(timeout); + using var linkedCts = CancellationTokenSource.CreateLinkedTokenSource(ct, timeoutCts.Token); + + Deployment? 
deployment = null; + + while (!linkedCts.IsCancellationRequested) + { + var deployments = await _nomadClient.Jobs.GetDeploymentsAsync( + jobId, + new QueryOptions { Namespace = ns }, + linkedCts.Token); + + deployment = deployments.FirstOrDefault(); + if (deployment is null) + { + await Task.Delay(TimeSpan.FromSeconds(2), linkedCts.Token); + continue; + } + + if (deployment.Status is "successful" or "failed" or "cancelled") + { + return deployment; + } + + _logger.LogDebug( + "Deployment {DeploymentId} status: {Status}", + deployment.ID, + deployment.Status); + + await Task.Delay(TimeSpan.FromSeconds(5), linkedCts.Token); + } + + return deployment; + } + + private static Job ConvertToNomadJob(JobDefinition def) + { + return new Job + { + ID = def.ID, + Name = def.Name, + Type = def.Type, + Namespace = def.Namespace, + Region = def.Region, + Priority = def.Priority, + Datacenters = def.Datacenters?.ToList(), + Meta = def.Meta?.ToDictionary(kv => kv.Key, kv => kv.Value), + TaskGroups = def.TaskGroups?.Select(tg => new TaskGroup + { + Name = tg.Name, + Count = tg.Count, + Tasks = tg.Tasks?.Select(t => new Task + { + Name = t.Name, + Driver = t.Driver, + Config = t.Config?.ToDictionary(kv => kv.Key, kv => kv.Value), + Env = t.Env?.ToDictionary(kv => kv.Key, kv => kv.Value), + Resources = t.Resources is not null ? new Resources + { + CPU = t.Resources.CPU, + MemoryMB = t.Resources.MemoryMB + } : null + }).ToList() + }).ToList(), + Update = def.Update is not null ? 
new UpdateStrategy + { + MaxParallel = def.Update.MaxParallel, + HealthCheck = def.Update.HealthCheck, + MinHealthyTime = (long)def.Update.MinHealthyTime.TotalNanoseconds, + HealthyDeadline = (long)def.Update.HealthyDeadline.TotalNanoseconds, + AutoRevert = def.Update.AutoRevert, + AutoPromote = def.Update.AutoPromote, + Canary = def.Update.Canary + } : null + }; + } +} +``` + +### NomadStopJobTask + +```csharp +namespace StellaOps.Agent.Nomad.Tasks; + +public sealed class NomadStopJobTask +{ + private readonly NomadClient _nomadClient; + private readonly ILogger _logger; + + public sealed record StopJobPayload + { + public required string JobId { get; init; } + public string? Namespace { get; init; } + public string? Region { get; init; } + public bool Purge { get; init; } = false; + public bool Global { get; init; } = false; + } + + public async Task<TaskResult> ExecuteAsync(AgentTask task, CancellationToken ct) + { + var payload = JsonSerializer.Deserialize<StopJobPayload>(task.Payload) + ?? throw new InvalidPayloadException("nomad.stop-job"); + + _logger.LogInformation( + "Stopping Nomad job {JobId} (purge: {Purge})", + payload.JobId, + payload.Purge); + + try + { + var response = await _nomadClient.Jobs.DeregisterAsync( + payload.JobId, + payload.Purge, + payload.Global, + new WriteOptions + { + Namespace = payload.Namespace, + Region = payload.Region + }, + ct); + + _logger.LogInformation( + "Stopped Nomad job {JobId}, evaluation ID: {EvalId}", + payload.JobId, + response.EvalID); + + return new TaskResult + { + TaskId = task.Id, + Success = true, + Outputs = new Dictionary<string, object> + { + ["jobId"] = payload.JobId, + ["evalId"] = response.EvalID, + ["purged"] = payload.Purge + }, + CompletedAt = DateTimeOffset.UtcNow + }; + } + catch (NomadApiException ex) + { + _logger.LogError(ex, "Failed to stop Nomad job {JobId}", payload.JobId); + + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = $"Failed to stop job: {ex.Message}", + CompletedAt = DateTimeOffset.UtcNow + }; + } + 
} +} +``` + +### NomadScaleJobTask + +```csharp +namespace StellaOps.Agent.Nomad.Tasks; + +public sealed class NomadScaleJobTask +{ + private readonly NomadClient _nomadClient; + private readonly ILogger _logger; + + public sealed record ScaleJobPayload + { + public required string JobId { get; init; } + public required string TaskGroup { get; init; } + public required int Count { get; init; } + public string? Namespace { get; init; } + public string? Region { get; init; } + public string? Reason { get; init; } + public bool PolicyOverride { get; init; } = false; + } + + public async Task<TaskResult> ExecuteAsync(AgentTask task, CancellationToken ct) + { + var payload = JsonSerializer.Deserialize<ScaleJobPayload>(task.Payload) + ?? throw new InvalidPayloadException("nomad.scale-job"); + + _logger.LogInformation( + "Scaling Nomad job {JobId} task group {TaskGroup} to {Count}", + payload.JobId, + payload.TaskGroup, + payload.Count); + + try + { + var response = await _nomadClient.Jobs.ScaleAsync( + payload.JobId, + payload.TaskGroup, + payload.Count, + payload.Reason ??
$"Scaled by Stella Ops (task: {task.Id})", + payload.PolicyOverride, + new WriteOptions + { + Namespace = payload.Namespace, + Region = payload.Region + }, + ct); + + _logger.LogInformation( + "Scaled Nomad job {JobId} task group {TaskGroup} to {Count}, evaluation ID: {EvalId}", + payload.JobId, + payload.TaskGroup, + payload.Count, + response.EvalID); + + return new TaskResult + { + TaskId = task.Id, + Success = true, + Outputs = new Dictionary + { + ["jobId"] = payload.JobId, + ["taskGroup"] = payload.TaskGroup, + ["count"] = payload.Count, + ["evalId"] = response.EvalID + }, + CompletedAt = DateTimeOffset.UtcNow + }; + } + catch (NomadApiException ex) + { + _logger.LogError( + ex, + "Failed to scale Nomad job {JobId} task group {TaskGroup}", + payload.JobId, + payload.TaskGroup); + + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = $"Failed to scale job: {ex.Message}", + CompletedAt = DateTimeOffset.UtcNow + }; + } + } +} +``` + +### NomadHealthCheckTask + +```csharp +namespace StellaOps.Agent.Nomad.Tasks; + +public sealed class NomadHealthCheckTask +{ + private readonly NomadClient _nomadClient; + private readonly ILogger _logger; + + public sealed record HealthCheckPayload + { + public required string JobId { get; init; } + public string? Namespace { get; init; } + public string? Region { get; init; } + public int MinHealthyAllocations { get; init; } = 1; + public TimeSpan Timeout { get; init; } = TimeSpan.FromMinutes(5); + } + + public async Task ExecuteAsync(AgentTask task, CancellationToken ct) + { + var payload = JsonSerializer.Deserialize(task.Payload) + ?? 
throw new InvalidPayloadException("nomad.health-check"); + + _logger.LogInformation( + "Checking health of Nomad job {JobId}", + payload.JobId); + + try + { + using var timeoutCts = new CancellationTokenSource(payload.Timeout); + using var linkedCts = CancellationTokenSource.CreateLinkedTokenSource(ct, timeoutCts.Token); + + while (!linkedCts.IsCancellationRequested) + { + var allocations = await _nomadClient.Jobs.GetAllocationsAsync( + payload.JobId, + new QueryOptions + { + Namespace = payload.Namespace, + Region = payload.Region + }, + linkedCts.Token); + + var runningAllocations = allocations + .Where(a => a.ClientStatus == "running") + .ToList(); + + var healthyCount = runningAllocations + .Count(a => a.DeploymentStatus?.Healthy == true); + + if (healthyCount >= payload.MinHealthyAllocations) + { + _logger.LogInformation( + "Nomad job {JobId} is healthy: {Healthy}/{Total} allocations healthy", + payload.JobId, + healthyCount, + runningAllocations.Count); + + return new TaskResult + { + TaskId = task.Id, + Success = true, + Outputs = new Dictionary + { + ["jobId"] = payload.JobId, + ["healthyAllocations"] = healthyCount, + ["totalAllocations"] = allocations.Count, + ["runningAllocations"] = runningAllocations.Count + }, + CompletedAt = DateTimeOffset.UtcNow + }; + } + + _logger.LogDebug( + "Nomad job {JobId} health check: {Healthy}/{MinRequired} healthy, waiting...", + payload.JobId, + healthyCount, + payload.MinHealthyAllocations); + + await Task.Delay(TimeSpan.FromSeconds(5), linkedCts.Token); + } + + throw new OperationCanceledException(); + } + catch (OperationCanceledException) + { + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = $"Health check timed out after {payload.Timeout}", + CompletedAt = DateTimeOffset.UtcNow + }; + } + catch (NomadApiException ex) + { + _logger.LogError(ex, "Failed to check health of Nomad job {JobId}", payload.JobId); + + return new TaskResult + { + TaskId = task.Id, + Success = false, + Error = 
$"Health check failed: {ex.Message}", + CompletedAt = DateTimeOffset.UtcNow + }; + } + } +} +``` + +### NomadLogStreamer + +```csharp +namespace StellaOps.Agent.Nomad; + +public sealed class NomadLogStreamer +{ + private readonly NomadClient _nomadClient; + private readonly LogStreamer _logStreamer; + private readonly ILogger _logger; + + public async Task StreamLogsAsync( + Guid taskId, + string allocationId, + string taskName, + string logType, // "stdout" or "stderr" + CancellationToken ct = default) + { + try + { + var stream = await _nomadClient.Allocations.GetLogsAsync( + allocationId, + taskName, + logType, + follow: true, + ct); + + using var reader = new StreamReader(stream); + + while (!ct.IsCancellationRequested) + { + var line = await reader.ReadLineAsync(ct); + if (line is null) + break; + + var level = logType == "stderr" ? LogLevel.Error : LogLevel.Information; + + // Override level based on content heuristics + if (logType == "stdout") + { + level = DetectLogLevel(line); + } + + _logStreamer.Log(taskId, level, line); + } + } + catch (OperationCanceledException) + { + // Expected when task completes + } + catch (Exception ex) + { + _logger.LogWarning( + ex, + "Error streaming logs for allocation {AllocationId} task {TaskName}", + allocationId, + taskName); + } + } + + private static LogLevel DetectLogLevel(string message) + { + if (message.Contains("ERROR", StringComparison.OrdinalIgnoreCase) || + message.Contains("FATAL", StringComparison.OrdinalIgnoreCase)) + { + return LogLevel.Error; + } + + if (message.Contains("WARN", StringComparison.OrdinalIgnoreCase)) + { + return LogLevel.Warning; + } + + if (message.Contains("DEBUG", StringComparison.OrdinalIgnoreCase)) + { + return LogLevel.Debug; + } + + return LogLevel.Information; + } +} +``` + +--- + +## Acceptance Criteria + +- [ ] Register and deploy Nomad jobs from HCL or JSON spec +- [ ] Register and deploy Nomad jobs from structured JobDefinition +- [ ] Stop Nomad jobs (with optional purge) +- [ 
] Scale Nomad job task groups +- [ ] Check job health and allocation status +- [ ] Wait for deployments to complete +- [ ] Dispatch parameterized batch jobs +- [ ] Stream logs from allocations +- [ ] Support Docker task driver +- [ ] Support raw_exec task driver +- [ ] Support job constraints and affinities +- [ ] Support update strategies (rolling, canary) +- [ ] Unit test coverage >= 85% + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 108_001 Agent Core Runtime | Internal | TODO | +| Nomad.Api (or custom HTTP client) | NuGet/Custom | TODO | + +> **Note:** HashiCorp does not provide an official .NET SDK for Nomad. Implementation will use a custom HTTP client wrapper or community library. + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| NomadCapability | TODO | | +| NomadDeployJobTask | TODO | | +| NomadStopJobTask | TODO | | +| NomadScaleJobTask | TODO | | +| NomadJobStatusTask | TODO | | +| NomadHealthCheckTask | TODO | | +| NomadDispatchJobTask | TODO | | +| NomadLogStreamer | TODO | | +| NomadClient wrapper | TODO | Custom HTTP client | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_109_000_INDEX_evidence_audit.md b/docs/implplan/SPRINT_20260110_109_000_INDEX_evidence_audit.md new file mode 100644 index 000000000..14455249f --- /dev/null +++ b/docs/implplan/SPRINT_20260110_109_000_INDEX_evidence_audit.md @@ -0,0 +1,243 @@ +# SPRINT INDEX: Phase 9 - Evidence & Audit + +> **Epic:** Release Orchestrator +> **Phase:** 9 - Evidence & Audit +> **Batch:** 109 +> **Status:** TODO +> **Parent:** [100_000_INDEX](SPRINT_20260110_100_000_INDEX_release_orchestrator.md) + +--- + +## Overview + +Phase 9 implements the Evidence & Audit system - generating cryptographically signed, immutable evidence packets for every deployment decision. 
+ +### Objectives + +- Evidence collector gathers deployment context +- Evidence signer creates tamper-proof signatures +- Version sticker writer records deployment state +- Audit exporter generates compliance reports + +--- + +## Sprint Structure + +| Sprint ID | Title | Module | Status | Dependencies | +|-----------|-------|--------|--------|--------------| +| 109_001 | Evidence Collector | RELEVI | TODO | 106_005, 107_001 | +| 109_002 | Evidence Signer | RELEVI | TODO | 109_001 | +| 109_003 | Version Sticker Writer | RELEVI | TODO | 107_002 | +| 109_004 | Audit Exporter | RELEVI | TODO | 109_002 | + +--- + +## Architecture + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ EVIDENCE & AUDIT │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ EVIDENCE COLLECTOR (109_001) │ │ +│ │ │ │ +│ │ Collects from: │ │ +│ │ ├── Release bundle (components, digests, source refs) │ │ +│ │ ├── Promotion (requester, approvers, gates) │ │ +│ │ ├── Deployment (targets, tasks, artifacts) │ │ +│ │ └── Decision (gate results, freeze window status) │ │ +│ │ │ │ +│ │ ┌─────────────────────────────────────────────────────────────────┐ │ │ +│ │ │ Evidence Packet │ │ │ +│ │ │ { │ │ │ +│ │ │ "type": "deployment", │ │ │ +│ │ │ "release": { ... }, │ │ │ +│ │ │ "environment": { ... }, │ │ │ +│ │ │ "actors": { requester, approvers, deployer }, │ │ │ +│ │ │ "decision": { gates, freeze_check, sod }, │ │ │ +│ │ │ "execution": { tasks, artifacts, metrics } │ │ │ +│ │ │ } │ │ │ +│ │ └─────────────────────────────────────────────────────────────────┘ │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ EVIDENCE SIGNER (109_002) │ │ +│ │ │ │ +│ │ 1. Canonicalize JSON (RFC 8785) │ │ +│ │ 2. Hash content (SHA-256) │ │ +│ │ 3. Sign hash with signing key (RS256 or ES256) │ │ +│ │ 4. 
Store in append-only table │ │
+│ │                                                                   │ │
+│ │  ┌─────────────────────────────────────────────────────────────┐  │ │
+│ │  │ {                                                           │  │ │
+│ │  │   "content": { ... },                                       │  │ │
+│ │  │   "contentHash": "sha256:abc...",                           │  │ │
+│ │  │   "signature": "base64...",                                 │  │ │
+│ │  │   "signatureAlgorithm": "RS256",                            │  │ │
+│ │  │   "signerKeyRef": "stella/signing/prod-key-2026"            │  │ │
+│ │  │ }                                                           │  │ │
+│ │  └─────────────────────────────────────────────────────────────┘  │ │
+│ └───────────────────────────────────────────────────────────────────┘ │
+│                                                                       │
+│ ┌───────────────────────────────────────────────────────────────────┐ │
+│ │                 VERSION STICKER WRITER (109_003)                  │ │
+│ │                                                                   │ │
+│ │  stella.version.json written to each target:                      │ │
+│ │  {                                                                │ │
+│ │    "release": "myapp-v2.3.1",                                     │ │
+│ │    "deployment_id": "uuid",                                       │ │
+│ │    "deployed_at": "2026-01-10T14:35:00Z",                         │ │
+│ │    "components": [                                                │ │
+│ │      { "name": "api", "digest": "sha256:..." },                   │ │
+│ │      { "name": "worker", "digest": "sha256:..." }                 │ │
+│ │    ],                                                             │ │
+│ │    "evidence_id": "evid-uuid"                                     │ │
+│ │  }                                                                │ │
+│ └───────────────────────────────────────────────────────────────────┘ │
+│                                                                       │
+│ ┌───────────────────────────────────────────────────────────────────┐ │
+│ │                     AUDIT EXPORTER (109_004)                      │ │
+│ │                                                                   │ │
+│ │  Export formats:                                                  │ │
+│ │  ├── JSON  - Machine-readable, full detail                        │ │
+│ │  ├── PDF   - Human-readable compliance reports                    │ │
+│ │  ├── CSV   - Spreadsheet analysis                                 │ │
+│ │  └── SLSA  - SLSA provenance format                               │ │
+│ └───────────────────────────────────────────────────────────────────┘ │
+│                                                                       │
+└───────────────────────────────────────────────────────────────────────┘
+```
+
+---
+
+## Deliverables Summary
+
+### 109_001: Evidence Collector
+
+| Deliverable | Type | Description |
+|-------------|------|-------------|
+| `IEvidenceCollector` | Interface | Evidence collection |
+| `EvidenceCollector` | Class | Implementation |
+| `EvidenceContent` | Model | Evidence structure |
+| `ContentBuilder` | Class | Build evidence sections |
+
+### 109_002: Evidence Signer
+
+| Deliverable | Type | Description |
+|-------------|------|-------------|
+| `IEvidenceSigner` | Interface | Signing operations |
+| `EvidenceSigner` | Class | Implementation |
+| `CanonicalJsonSerializer` | Class | RFC 8785 canonicalization |
+| `SigningKeyProvider` | Class | Key management |
+
+### 109_003: Version Sticker Writer
+
+| Deliverable | Type | Description |
+|-------------|------|-------------|
+| `IVersionStickerWriter` | Interface | Sticker writing |
+| `VersionStickerWriter` | Class | Implementation |
+| `VersionSticker` | Model | Sticker structure |
+| `StickerAgent Task` | Task | Agent writes sticker |
+
+### 109_004: Audit Exporter
+
+| Deliverable | Type | Description |
+|-------------|------|-------------|
+| `IAuditExporter` | Interface | Export operations |
+| `JsonExporter` | Exporter | JSON format |
+| `PdfExporter` | Exporter | PDF format |
+| `CsvExporter` | Exporter | CSV format |
+| `SlsaExporter` | Exporter | SLSA format |
+
+---
+
+## Key Interfaces
+
+```csharp
+public interface IEvidenceCollector
+{
+    Task<EvidencePacket> CollectAsync(Guid promotionId, EvidenceType type, CancellationToken ct);
+}
+
+public interface IEvidenceSigner
+{
+    Task<SignedEvidencePacket> SignAsync(EvidencePacket packet, CancellationToken ct);
+    Task<bool> VerifyAsync(SignedEvidencePacket packet, CancellationToken ct);
+}
+
+public interface IVersionStickerWriter
+{
+    Task WriteAsync(Guid deploymentTaskId, VersionSticker sticker, CancellationToken ct);
+    Task<VersionSticker?> ReadAsync(Guid targetId, CancellationToken ct);
+}
+
+public interface IAuditExporter
+{
+    Task<AuditExportResult> ExportAsync(AuditExportRequest request, CancellationToken ct);
+    IReadOnlyList<AuditExportFormat> SupportedFormats { get; }
+}
+```
+
+---
+
+## Evidence Lifecycle
+
+```
+┌─────────────────────────────────────────────────────────────────────────┐
+│                           EVIDENCE LIFECYCLE                            │
+│                                                                         │
+│  ┌─────────────┐   ┌─────────────┐   ┌─────────────┐   ┌─────────────┐  │
+│  │  Promotion  │──►│   Collect   │──►│    Sign     │──►│    Store    │  │
+│  │  Complete   │   │  Evidence   │   │  Evidence   │   │ (immutable) │  │
+│  └─────────────┘   └─────────────┘   └─────────────┘   └─────────────┘  │
+│                                             │                           │
+│               ┌─────────────────────────────┴──────────┐                │
+│               ▼                                        ▼                │
+│        ┌─────────────┐                          ┌─────────────┐         │
+│        │   Export    │                          │   Verify    │         │
+│        │ (on-demand) │                          │ (on-demand) │         │
+│        └─────────────┘                          └─────────────┘         │
+│               │                                        │                │
+│   ┌───────────┼───────────┬───────────┐                │                │
+│   ▼           ▼           ▼           ▼                ▼                │
+│ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐      ┌───────────┐          │
+│ │  JSON  │ │  PDF   │ │  CSV   │ │  SLSA  │      │ Verified  │          │
+│ └────────┘ └────────┘ └────────┘ └────────┘      │  Report   │          │
+│                                                  └───────────┘          │
+│                                                                         │
+└─────────────────────────────────────────────────────────────────────────┘
+```
+
+---
+
+## Dependencies
+
+| Module | Purpose |
+|--------|---------|
+| 106_005 Decision Engine | Decision data |
+| 107_001 Deploy Orchestrator | Deployment data |
+| 107_002 Target Executor | Task data |
+| Signer | Cryptographic signing |
+
+---
+
+## Acceptance Criteria
+
+- [ ] Evidence collected for all promotions
+- [ ] Evidence signed with platform key
+- [ ] Signature verification works
+- [ ] Append-only storage enforced
+- [ ] Version sticker written to targets
+- [ ] JSON export works
+- [ ] PDF export readable
+- [ ] SLSA format compliant
+- [ ] Unit test coverage ≥80%
+
+---
+
+## Execution Log
+
+| Date | Entry |
+|------|-------|
+| 10-Jan-2026 | Phase 9 index created |
diff --git a/docs/implplan/SPRINT_20260110_109_001_RELEVI_evidence_collector.md b/docs/implplan/SPRINT_20260110_109_001_RELEVI_evidence_collector.md
new file mode 100644
index 000000000..0851221bb
--- /dev/null
+++ b/docs/implplan/SPRINT_20260110_109_001_RELEVI_evidence_collector.md
@@ -0,0 +1,597 @@
+# SPRINT: Evidence Collector
+
+> **Sprint ID:** 109_001
+> **Module:** RELEVI
+> **Phase:** 9 - Evidence & Audit
+> **Status:** TODO
+> **Parent:** [109_000_INDEX](SPRINT_20260110_109_000_INDEX_evidence_audit.md)
+
+---
+
+## Overview
+
+Implement the Evidence Collector for gathering deployment decision context into cryptographically sealed evidence packets.
+
+### Objectives
+
+- Collect evidence from release, promotion, and deployment data
+- Build comprehensive evidence packets
+- Track evidence dependencies and lineage
+- Store evidence in append-only store
+
+### Working Directory
+
+```
+src/ReleaseOrchestrator/
+├── __Libraries/
+│   └── StellaOps.ReleaseOrchestrator.Evidence/
+│       ├── Collector/
+│       │   ├── IEvidenceCollector.cs
+│       │   ├── EvidenceCollector.cs
+│       │   ├── ContentBuilder.cs
+│       │   └── Collectors/
+│       │       ├── ReleaseEvidenceCollector.cs
+│       │       ├── PromotionEvidenceCollector.cs
+│       │       ├── DeploymentEvidenceCollector.cs
+│       │       └── DecisionEvidenceCollector.cs
+│       ├── Models/
+│       │   ├── EvidencePacket.cs
+│       │   ├── EvidenceContent.cs
+│       │   └── EvidenceType.cs
+│       └── Store/
+│           ├── IEvidenceStore.cs
+│           └── EvidenceStore.cs
+└── __Tests/
+```
+
+---
+
+## Deliverables
+
+### IEvidenceCollector Interface
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Evidence.Collector;
+
+public interface IEvidenceCollector
+{
+    Task<EvidencePacket> CollectAsync(
+        Guid subjectId,
+        EvidenceType type,
+        CancellationToken ct = default);
+
+    Task<EvidencePacket> CollectDeploymentEvidenceAsync(
+        Guid deploymentJobId,
+        CancellationToken ct = default);
+
+    Task<EvidencePacket> CollectPromotionEvidenceAsync(
+        Guid promotionId,
+        CancellationToken ct = default);
+}
+
+public enum EvidenceType
+{
+    Promotion,
+    Deployment,
+    Rollback,
+    GateDecision,
+    Approval
+}
+```
+
+### EvidencePacket Model
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Evidence.Models;
+
+public sealed record EvidencePacket
+{
+    public required Guid Id { get; init; }
+    public required Guid TenantId { get; init; }
+    public required EvidenceType Type { get; init; }
+    public required Guid SubjectId { get; init; }
+    public required string SubjectType { get; init; }
+    public required EvidenceContent Content { get; init; }
+    public required ImmutableArray<Guid> DependsOn { get; init; }
+    public required DateTimeOffset CollectedAt { get; init; }
+    public required string CollectorVersion { get; init; }
+}
+
+public sealed record EvidenceContent
+{
+    public required ReleaseEvidence? Release { get; init; }
+    public required PromotionEvidence? Promotion { get; init; }
+    public required DeploymentEvidence? Deployment { get; init; }
+    public required DecisionEvidence? Decision { get; init; }
+    public required ImmutableDictionary<string, string> Metadata { get; init; }
+}
+
+public sealed record ReleaseEvidence
+{
+    public required Guid ReleaseId { get; init; }
+    public required string ReleaseName { get; init; }
+    public string? ManifestDigest { get; init; }
+    public DateTimeOffset? FinalizedAt { get; init; }
+    public required ImmutableArray<ComponentEvidence> Components { get; init; }
+}
+
+public sealed record ComponentEvidence
+{
+    public required Guid ComponentId { get; init; }
+    public required string ComponentName { get; init; }
+    public required string Digest { get; init; }
+    public string? Tag { get; init; }
+    public string? SemVer { get; init; }
+    public string? SourceRef { get; init; }
+    public string? SbomDigest { get; init; }
+}
+
+public sealed record PromotionEvidence
+{
+    public required Guid PromotionId { get; init; }
+    public required Guid SourceEnvironmentId { get; init; }
+    public required string SourceEnvironmentName { get; init; }
+    public required Guid TargetEnvironmentId { get; init; }
+    public required string TargetEnvironmentName { get; init; }
+    public required ActorEvidence Requester { get; init; }
+    public required ImmutableArray<ApprovalEvidence> Approvals { get; init; }
+    public required DateTimeOffset RequestedAt { get; init; }
+    public DateTimeOffset? ApprovedAt { get; init; }
+}
+
+public sealed record ActorEvidence
+{
+    public required Guid UserId { get; init; }
+    public required string UserName { get; init; }
+    public required string UserEmail { get; init; }
+    public ImmutableArray<string> Groups { get; init; } = [];
+}
+
+public sealed record ApprovalEvidence
+{
+    public required ActorEvidence Approver { get; init; }
+    public required string Decision { get; init; }
+    public string? Comment { get; init; }
+    public required DateTimeOffset DecidedAt { get; init; }
+}
+
+public sealed record DeploymentEvidence
+{
+    public required Guid DeploymentJobId { get; init; }
+    public required string Strategy { get; init; }
+    public required DateTimeOffset StartedAt { get; init; }
+    public DateTimeOffset? CompletedAt { get; init; }
+    public required string Status { get; init; }
+    public required ImmutableArray<TaskEvidence> Tasks { get; init; }
+    public required ImmutableArray<ArtifactEvidence> Artifacts { get; init; }
+}
+
+public sealed record TaskEvidence
+{
+    public required Guid TaskId { get; init; }
+    public required Guid TargetId { get; init; }
+    public required string TargetName { get; init; }
+    public required string Status { get; init; }
+    public DateTimeOffset? StartedAt { get; init; }
+    public DateTimeOffset? CompletedAt { get; init; }
+    public string? Error { get; init; }
+}
+
+public sealed record ArtifactEvidence
+{
+    public required string ArtifactType { get; init; }
+    public required string Digest { get; init; }
+    public required string Location { get; init; }
+}
+
+public sealed record DecisionEvidence
+{
+    public required ImmutableArray<GateResultEvidence> GateResults { get; init; }
+    public required FreezeCheckEvidence FreezeCheck { get; init; }
+    public required SodCheckEvidence SodCheck { get; init; }
+    public required string FinalDecision { get; init; }
+    public required DateTimeOffset DecidedAt { get; init; }
+}
+
+public sealed record GateResultEvidence
+{
+    public required string GateName { get; init; }
+    public required string GateType { get; init; }
+    public required bool Passed { get; init; }
+    public string? Message { get; init; }
+    public ImmutableDictionary<string, string>? Details { get; init; }
+    public required DateTimeOffset EvaluatedAt { get; init; }
+}
+
+public sealed record FreezeCheckEvidence
+{
+    public required bool Checked { get; init; }
+    public required bool FreezeActive { get; init; }
+    public string? FreezeReason { get; init; }
+    public bool Overridden { get; init; }
+    public ActorEvidence? OverriddenBy { get; init; }
+}
+
+public sealed record SodCheckEvidence
+{
+    public required bool Required { get; init; }
+    public required bool Satisfied { get; init; }
+    public string? Violation { get; init; }
+}
+```
+
+### EvidenceCollector Implementation
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Evidence.Collector;
+
+public sealed class EvidenceCollector : IEvidenceCollector
+{
+    private readonly ReleaseEvidenceCollector _releaseCollector;
+    private readonly PromotionEvidenceCollector _promotionCollector;
+    private readonly DeploymentEvidenceCollector _deploymentCollector;
+    private readonly DecisionEvidenceCollector _decisionCollector;
+    private readonly IEvidenceStore _evidenceStore;
+    private readonly TimeProvider _timeProvider;
+    private readonly IGuidGenerator _guidGenerator;
+    private readonly ITenantContext _tenantContext;
+    private readonly ILogger<EvidenceCollector> _logger;
+
+    private const string CollectorVersion = "1.0.0";
+
+    public async Task<EvidencePacket> CollectAsync(
+        Guid subjectId,
+        EvidenceType type,
+        CancellationToken ct = default)
+    {
+        return type switch
+        {
+            EvidenceType.Promotion => await CollectPromotionEvidenceAsync(subjectId, ct),
+            EvidenceType.Deployment => await CollectDeploymentEvidenceAsync(subjectId, ct),
+            _ => throw new UnsupportedEvidenceTypeException(type)
+        };
+    }
+
+    public async Task<EvidencePacket> CollectDeploymentEvidenceAsync(
+        Guid deploymentJobId,
+        CancellationToken ct = default)
+    {
+        _logger.LogInformation(
+            "Collecting deployment evidence for job {JobId}",
+            deploymentJobId);
+
+        // Collect all evidence sections
+        var deploymentEvidence = await _deploymentCollector.CollectAsync(deploymentJobId, ct);
+        var releaseEvidence = await _releaseCollector.CollectAsync(deploymentEvidence.ReleaseId, ct);
+        var promotionEvidence = await _promotionCollector.CollectAsync(deploymentEvidence.PromotionId, ct);
+        var decisionEvidence = await _decisionCollector.CollectAsync(deploymentEvidence.PromotionId, ct);
+
+        var content = new EvidenceContent
+        {
+            Release = releaseEvidence,
+            Promotion = promotionEvidence,
+            Deployment = deploymentEvidence.ToEvidence(),
+            Decision = decisionEvidence,
+            Metadata = ImmutableDictionary<string, string>.Empty
+                .Add("platform", "stella-ops")
+                .Add("collectorVersion", CollectorVersion)
+        };
+
+        var packet = new EvidencePacket
+        {
+            Id = _guidGenerator.NewGuid(),
+            TenantId = _tenantContext.TenantId,
+            Type = EvidenceType.Deployment,
+            SubjectId = deploymentJobId,
+            SubjectType = "DeploymentJob",
+            Content = content,
+            DependsOn = await GetDependentEvidenceAsync(deploymentJobId, ct),
+            CollectedAt = _timeProvider.GetUtcNow(),
+            CollectorVersion = CollectorVersion
+        };
+
+        // Store the packet
+        await _evidenceStore.StoreAsync(packet, ct);
+
+        _logger.LogInformation(
+            "Collected deployment evidence {PacketId} for job {JobId}",
+            packet.Id,
+            deploymentJobId);
+
+        return packet;
+    }
+
+    public async Task<EvidencePacket> CollectPromotionEvidenceAsync(
+        Guid promotionId,
+        CancellationToken ct = default)
+    {
+        _logger.LogInformation(
+            "Collecting promotion evidence for {PromotionId}",
+            promotionId);
+
+        var promotionEvidence = await _promotionCollector.CollectAsync(promotionId, ct);
+        var releaseEvidence = await _releaseCollector.CollectAsync(promotionEvidence.ReleaseId, ct);
+        var decisionEvidence = await _decisionCollector.CollectAsync(promotionId, ct);
+
+        var content = new EvidenceContent
+        {
+            Release = releaseEvidence,
+            Promotion = promotionEvidence.ToEvidence(),
+            Deployment = null,
+            Decision = decisionEvidence,
+            Metadata = ImmutableDictionary<string, string>.Empty
+                .Add("platform", "stella-ops")
+                .Add("collectorVersion", CollectorVersion)
+        };
+
+        var packet = new EvidencePacket
+        {
+            Id = _guidGenerator.NewGuid(),
+            TenantId = _tenantContext.TenantId,
+            Type = EvidenceType.Promotion,
+            SubjectId = promotionId,
+            SubjectType = "Promotion",
+            Content = content,
+            DependsOn = ImmutableArray<Guid>.Empty,
+            CollectedAt = _timeProvider.GetUtcNow(),
+            CollectorVersion = CollectorVersion
+        };
+
+        await _evidenceStore.StoreAsync(packet, ct);
+
+        _logger.LogInformation(
+            "Collected promotion evidence {PacketId} for {PromotionId}",
+            packet.Id,
+            promotionId);
+
+        return packet;
+    }
+
+    private async Task<ImmutableArray<Guid>> GetDependentEvidenceAsync(
+        Guid deploymentJobId,
+        CancellationToken ct)
+    {
+        // Find promotion evidence that this deployment depends on
+        var promotion = await _promotionCollector.GetPromotionForJobAsync(deploymentJobId, ct);
+        if (promotion is null)
+            return ImmutableArray<Guid>.Empty;
+
+        var promotionEvidence = await _evidenceStore.GetBySubjectAsync(
+            promotion.Id,
+            EvidenceType.Promotion,
+            ct);
+
+        if (promotionEvidence is null)
+            return ImmutableArray<Guid>.Empty;
+
+        return ImmutableArray.Create(promotionEvidence.Id);
+    }
+}
+```
+
+### ContentBuilder
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Evidence.Collector;
+
+public sealed class ContentBuilder
+{
+    public static ReleaseEvidence BuildReleaseEvidence(Release release)
+    {
+        return new ReleaseEvidence
+        {
+            ReleaseId = release.Id,
+            ReleaseName = release.Name,
+            ManifestDigest = release.ManifestDigest,
+            FinalizedAt = release.FinalizedAt,
+            Components = release.Components.Select(c => new ComponentEvidence
+            {
+                ComponentId = c.ComponentId,
+                ComponentName = c.ComponentName,
+                Digest = c.Digest,
+                Tag = c.Tag,
+                SemVer = c.SemVer,
+                SourceRef = c.Config.GetValueOrDefault("sourceRef"),
+                SbomDigest = c.Config.GetValueOrDefault("sbomDigest")
+            }).ToImmutableArray()
+        };
+    }
+
+    public static PromotionEvidence BuildPromotionEvidence(
+        Promotion promotion,
+        IReadOnlyList<Approval> approvals,
+        IReadOnlyList<User> users)
+    {
+        var userLookup = users.ToDictionary(u => u.Id);
+
+        return new PromotionEvidence
+        {
+            PromotionId = promotion.Id,
+            SourceEnvironmentId = promotion.SourceEnvironmentId,
+            SourceEnvironmentName = promotion.SourceEnvironmentName,
+            TargetEnvironmentId = promotion.TargetEnvironmentId,
+            TargetEnvironmentName = promotion.TargetEnvironmentName,
+            Requester = BuildActorEvidence(promotion.RequestedBy, userLookup),
+            Approvals = approvals.Select(a => new ApprovalEvidence
+            {
+                Approver = BuildActorEvidence(a.UserId, userLookup),
+                Decision = a.Decision.ToString(),
+                Comment = a.Comment,
+                DecidedAt = a.DecidedAt
+            }).ToImmutableArray(),
+            RequestedAt = promotion.RequestedAt,
+            ApprovedAt = promotion.ApprovedAt
+        };
+    }
+
+    public static DeploymentEvidence BuildDeploymentEvidence(
+        DeploymentJob job,
+        IReadOnlyList<DeploymentArtifact> artifacts)
+    {
+        return new DeploymentEvidence
+        {
+            DeploymentJobId = job.Id,
+            Strategy = job.Strategy.ToString(),
+            StartedAt = job.StartedAt,
+            CompletedAt = job.CompletedAt,
+            Status = job.Status.ToString(),
+            Tasks = job.Tasks.Select(t => new TaskEvidence
+            {
+                TaskId = t.Id,
+                TargetId = t.TargetId,
+                TargetName = t.TargetName,
+                Status = t.Status.ToString(),
+                StartedAt = t.StartedAt,
+                CompletedAt = t.CompletedAt,
+                Error = t.Error
+            }).ToImmutableArray(),
+            Artifacts = artifacts.Select(a => new ArtifactEvidence
+            {
+                ArtifactType = a.Type,
+                Digest = a.Digest,
+                Location = a.Location
+            }).ToImmutableArray()
+        };
+    }
+
+    public static DecisionEvidence BuildDecisionEvidence(
+        DecisionRecord decision,
+        IReadOnlyList<GateResult> gateResults)
+    {
+        return new DecisionEvidence
+        {
+            GateResults = gateResults.Select(g => new GateResultEvidence
+            {
+                GateName = g.GateName,
+                GateType = g.GateType,
+                Passed = g.Passed,
+                Message = g.Message,
+                Details = g.Details?.ToImmutableDictionary(),
+                EvaluatedAt = g.EvaluatedAt
+            }).ToImmutableArray(),
+            FreezeCheck = new FreezeCheckEvidence
+            {
+                Checked = true,
+                FreezeActive = decision.FreezeActive,
+                FreezeReason = decision.FreezeReason,
+                Overridden = decision.FreezeOverridden,
+                OverriddenBy = decision.FreezeOverriddenBy is not null
+                    ? new ActorEvidence
+                    {
+                        UserId = decision.FreezeOverriddenBy.Value,
+                        UserName = decision.FreezeOverriddenByName ?? "",
+                        UserEmail = ""
+                    }
+                    : null
+            },
+            SodCheck = new SodCheckEvidence
+            {
+                Required = decision.SodRequired,
+                Satisfied = decision.SodSatisfied,
+                Violation = decision.SodViolation
+            },
+            FinalDecision = decision.FinalDecision.ToString(),
+            DecidedAt = decision.DecidedAt
+        };
+    }
+
+    private static ActorEvidence BuildActorEvidence(
+        Guid userId,
+        Dictionary<Guid, User> userLookup)
+    {
+        if (userLookup.TryGetValue(userId, out var user))
+        {
+            return new ActorEvidence
+            {
+                UserId = user.Id,
+                UserName = user.Name,
+                UserEmail = user.Email,
+                Groups = user.Groups.ToImmutableArray()
+            };
+        }
+
+        return new ActorEvidence
+        {
+            UserId = userId,
+            UserName = "Unknown",
+            UserEmail = "",
+            Groups = ImmutableArray<string>.Empty
+        };
+    }
+}
+```
+
+### IEvidenceStore Interface
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Evidence.Store;
+
+public interface IEvidenceStore
+{
+    Task StoreAsync(EvidencePacket packet, CancellationToken ct = default);
+    Task<EvidencePacket?> GetAsync(Guid packetId, CancellationToken ct = default);
+    Task<EvidencePacket?> GetBySubjectAsync(Guid subjectId, EvidenceType type, CancellationToken ct = default);
+    Task<IReadOnlyList<EvidencePacket>> ListAsync(EvidenceQueryFilter filter, CancellationToken ct = default);
+    Task<bool> ExistsAsync(Guid packetId, CancellationToken ct = default);
+}
+
+public sealed record EvidenceQueryFilter
+{
+    public Guid? TenantId { get; init; }
+    public EvidenceType? Type { get; init; }
+    public DateTimeOffset? FromDate { get; init; }
+    public DateTimeOffset? ToDate { get; init; }
+    public int Limit { get; init; } = 100;
+    public int Offset { get; init; } = 0;
+}
+```
+
+---
+
+## Acceptance Criteria
+
+- [ ] Collect release evidence with all components
+- [ ] Collect promotion evidence with approvals
+- [ ] Collect deployment evidence with all tasks
+- [ ] Collect decision evidence with gate results
+- [ ] Build comprehensive evidence packets
+- [ ] Track evidence dependencies
+- [ ] Store evidence in append-only store
+- [ ] Query evidence by subject
+- [ ] Unit test coverage ≥85%
+
+---
+
+## Dependencies
+
+| Dependency | Type | Status |
+|------------|------|--------|
+| 106_005 Decision Engine | Internal | TODO |
+| 107_001 Deploy Orchestrator | Internal | TODO |
+| 104_003 Release Manager | Internal | TODO |
+
+---
+
+## Delivery Tracker
+
+| Deliverable | Status | Notes |
+|-------------|--------|-------|
+| IEvidenceCollector | TODO | |
+| EvidenceCollector | TODO | |
+| ContentBuilder | TODO | |
+| EvidencePacket model | TODO | |
+| ReleaseEvidenceCollector | TODO | |
+| PromotionEvidenceCollector | TODO | |
+| DeploymentEvidenceCollector | TODO | |
+| DecisionEvidenceCollector | TODO | |
+| IEvidenceStore | TODO | |
+| EvidenceStore | TODO | |
+| Unit tests | TODO | |
+
+---
+
+## Execution Log
+
+| Date | Entry |
+|------|-------|
+| 10-Jan-2026 | Sprint created |
diff --git a/docs/implplan/SPRINT_20260110_109_002_RELEVI_evidence_signer.md b/docs/implplan/SPRINT_20260110_109_002_RELEVI_evidence_signer.md
new file mode 100644
index 000000000..a300817e6
--- /dev/null
+++ b/docs/implplan/SPRINT_20260110_109_002_RELEVI_evidence_signer.md
@@ -0,0 +1,626 @@
+# SPRINT: Evidence Signer
+
+> **Sprint ID:** 109_002
+> **Module:** RELEVI
+> **Phase:** 9 - Evidence & Audit
+> **Status:** TODO
+> **Parent:** [109_000_INDEX](SPRINT_20260110_109_000_INDEX_evidence_audit.md)
+
+---
+
+## Overview
+
+Implement the Evidence Signer for creating cryptographically signed, tamper-proof evidence packets.
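The signer's core flow is canonicalize → hash → sign → verify. As a hedged sketch of that round trip (not the shipped `EvidenceSigner`): it uses an ephemeral ES256 key where the real implementation resolves keys through `ISigningKeyProvider`, and a short JSON literal stands in for real RFC 8785 canonical output.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Stand-in for CanonicalJsonSerializer output (assumed already canonical).
var canonicalJson = "{\"collectorVersion\":\"1.0.0\",\"type\":\"Promotion\"}";

// 1. Hash the canonical bytes; record the prefixed digest stored on the packet.
var hash = SHA256.HashData(Encoding.UTF8.GetBytes(canonicalJson));
var contentHash = $"sha256:{Convert.ToHexString(hash).ToLowerInvariant()}";

// 2. Sign the digest (ES256 = ECDSA over P-256 with SHA-256). Ephemeral key
//    for illustration only; production keys come from the key vault.
using var ecdsa = ECDsa.Create(ECCurve.NamedCurves.nistP256);
var signature = ecdsa.SignHash(hash);

// 3. Verify: succeeds for the original digest, fails once any byte changes.
var valid = ecdsa.VerifyHash(hash, signature);
hash[0] ^= 0xFF; // simulate tampering with the stored content
var tamperedStillValid = ecdsa.VerifyHash(hash, signature);

Console.WriteLine($"{contentHash} valid={valid} tamperedValid={tamperedStillValid}");
```

Because only the digest is signed, verification must recompute the hash from the canonical content first — which is why `VerifyWithDetailsAsync` reports content-hash and signature validity separately.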
+
+### Objectives
+
+- Canonicalize JSON using RFC 8785
+- Hash evidence content with SHA-256
+- Sign with RS256 or ES256 algorithms
+- Verify signatures on demand
+- Support key rotation
+
+### Working Directory
+
+```
+src/ReleaseOrchestrator/
+├── __Libraries/
+│   └── StellaOps.ReleaseOrchestrator.Evidence/
+│       └── Signing/
+│           ├── IEvidenceSigner.cs
+│           ├── EvidenceSigner.cs
+│           ├── CanonicalJsonSerializer.cs
+│           ├── SigningKeyProvider.cs
+│           ├── SignedEvidencePacket.cs
+│           └── Algorithms/
+│               ├── Rs256Signer.cs
+│               └── Es256Signer.cs
+└── __Tests/
+```
+
+---
+
+## Deliverables
+
+### IEvidenceSigner Interface
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Evidence.Signing;
+
+public interface IEvidenceSigner
+{
+    Task<SignedEvidencePacket> SignAsync(
+        EvidencePacket packet,
+        CancellationToken ct = default);
+
+    Task<bool> VerifyAsync(
+        SignedEvidencePacket signedPacket,
+        CancellationToken ct = default);
+
+    Task<VerificationResult> VerifyWithDetailsAsync(
+        SignedEvidencePacket signedPacket,
+        CancellationToken ct = default);
+}
+
+public sealed record SignedEvidencePacket
+{
+    public required Guid Id { get; init; }
+    public required EvidencePacket Content { get; init; }
+    public required string ContentHash { get; init; }
+    public required string Signature { get; init; }
+    public required string SignatureAlgorithm { get; init; }
+    public required string SignerKeyRef { get; init; }
+    public required DateTimeOffset SignedAt { get; init; }
+}
+
+public sealed record VerificationResult
+{
+    public required bool IsValid { get; init; }
+    public required bool SignatureValid { get; init; }
+    public required bool ContentHashValid { get; init; }
+    public required bool KeyValid { get; init; }
+    public string? Error { get; init; }
+    public DateTimeOffset VerifiedAt { get; init; }
+}
+```
+
+### CanonicalJsonSerializer
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Evidence.Signing;
+
+/// <summary>
+/// RFC 8785 (JCS) compliant JSON canonicalizer.
+/// </summary>
+public static class CanonicalJsonSerializer
+{
+    public static string Serialize(object value)
+    {
+        // Convert to JsonElement for processing
+        var json = JsonSerializer.Serialize(value, new JsonSerializerOptions
+        {
+            PropertyNamingPolicy = null, // Preserve property names
+            WriteIndented = false
+        });
+
+        var element = JsonDocument.Parse(json).RootElement;
+        return Canonicalize(element);
+    }
+
+    public static string Serialize(EvidencePacket packet)
+    {
+        // Use explicit ordering for evidence packets
+        var orderedContent = new SortedDictionary<string, object?>
+        {
+            ["id"] = packet.Id.ToString(),
+            ["tenantId"] = packet.TenantId.ToString(),
+            ["type"] = packet.Type.ToString(),
+            ["subjectId"] = packet.SubjectId.ToString(),
+            ["subjectType"] = packet.SubjectType,
+            ["content"] = SerializeContent(packet.Content),
+            ["dependsOn"] = packet.DependsOn.Select(d => d.ToString()).ToArray(),
+            ["collectedAt"] = FormatTimestamp(packet.CollectedAt),
+            ["collectorVersion"] = packet.CollectorVersion
+        };
+
+        return SerializeOrdered(orderedContent);
+    }
+
+    private static string Canonicalize(JsonElement element)
+    {
+        return element.ValueKind switch
+        {
+            JsonValueKind.Object => CanonicalizeObject(element),
+            JsonValueKind.Array => CanonicalizeArray(element),
+            JsonValueKind.String => CanonicalizeString(element),
+            JsonValueKind.Number => CanonicalizeNumber(element),
+            JsonValueKind.True => "true",
+            JsonValueKind.False => "false",
+            JsonValueKind.Null => "null",
+            _ => throw new InvalidOperationException($"Unsupported JSON type: {element.ValueKind}")
+        };
+    }
+
+    private static string CanonicalizeObject(JsonElement element)
+    {
+        // RFC 8785: Sort properties by Unicode code point order
+        var properties = element.EnumerateObject()
+            .OrderBy(p => p.Name, StringComparer.Ordinal)
+            .Select(p => $"\"{EscapeString(p.Name)}\":{Canonicalize(p.Value)}");
+
+        return "{" + string.Join(",", properties) + "}";
+    }
+
+    private static string CanonicalizeArray(JsonElement element)
+    {
+        var items = element.EnumerateArray()
+            .Select(Canonicalize);
+
+        return "[" + string.Join(",", items) + "]";
+    }
+
+    private static string CanonicalizeString(JsonElement element)
+    {
+        return "\"" + EscapeString(element.GetString() ?? "") + "\"";
+    }
+
+    private static string CanonicalizeNumber(JsonElement element)
+    {
+        // RFC 8785: Numbers are serialized without exponent notation
+        // and without trailing zeros
+        if (element.TryGetInt64(out var longValue))
+        {
+            return longValue.ToString(CultureInfo.InvariantCulture);
+        }
+
+        if (element.TryGetDouble(out var doubleValue))
+        {
+            // Format without exponent, minimal precision
+            return FormatDouble(doubleValue);
+        }
+
+        return element.GetRawText();
+    }
+
+    private static string FormatDouble(double value)
+    {
+        if (double.IsNaN(value) || double.IsInfinity(value))
+        {
+            throw new InvalidOperationException("NaN and Infinity not allowed in canonical JSON");
+        }
+
+        // Use G17 for full precision, then normalize
+        var str = value.ToString("G17", CultureInfo.InvariantCulture);
+
+        // Remove exponent notation if present
+        if (str.Contains('E') || str.Contains('e'))
+        {
+            var d = double.Parse(str, CultureInfo.InvariantCulture);
+            str = d.ToString("F15", CultureInfo.InvariantCulture).TrimEnd('0').TrimEnd('.');
+        }
+
+        return str;
+    }
+
+    private static string EscapeString(string value)
+    {
+        var sb = new StringBuilder();
+
+        foreach (var c in value)
+        {
+            switch (c)
+            {
+                case '"': sb.Append("\\\""); break;
+                case '\\': sb.Append("\\\\"); break;
+                case '\b': sb.Append("\\b"); break;
+                case '\f': sb.Append("\\f"); break;
+                case '\n': sb.Append("\\n"); break;
+                case '\r': sb.Append("\\r"); break;
+                case '\t': sb.Append("\\t"); break;
+                default:
+                    if (c < 0x20)
+                    {
+                        sb.Append($"\\u{(int)c:x4}");
+                    }
+                    else
+                    {
+                        sb.Append(c);
+                    }
+                    break;
+            }
+        }
+
+        return sb.ToString();
+    }
+
+    private static string FormatTimestamp(DateTimeOffset timestamp)
+    {
+        return timestamp.ToUniversalTime().ToString("yyyy-MM-ddTHH:mm:ss.fffZ", CultureInfo.InvariantCulture);
+    }
+
+    private static object SerializeContent(EvidenceContent content)
+    {
+        // Serialize with sorted keys
+        return new SortedDictionary<string, object?>
+        {
+            ["decision"] = content.Decision,
+            ["deployment"] = content.Deployment,
+            ["metadata"] = content.Metadata,
+            ["promotion"] = content.Promotion,
+            ["release"] = content.Release
+        };
+    }
+
+    private static string SerializeOrdered(SortedDictionary<string, object?> dict)
+    {
+        return JsonSerializer.Serialize(dict, new JsonSerializerOptions
+        {
+            PropertyNamingPolicy = null,
+            WriteIndented = false,
+            DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull
+        });
+    }
+}
+```
+
+### EvidenceSigner Implementation
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Evidence.Signing;
+
+public sealed class EvidenceSigner : IEvidenceSigner
+{
+    private readonly ISigningKeyProvider _keyProvider;
+    private readonly ISignedEvidenceStore _signedStore;
+    private readonly TimeProvider _timeProvider;
+    private readonly ILogger<EvidenceSigner> _logger;
+
+    public async Task<SignedEvidencePacket> SignAsync(
+        EvidencePacket packet,
+        CancellationToken ct = default)
+    {
+        _logger.LogDebug("Signing evidence packet {PacketId}", packet.Id);
+
+        // Get signing key
+        var key = await _keyProvider.GetCurrentSigningKeyAsync(ct);
+
+        // Canonicalize content
+        var canonicalJson = CanonicalJsonSerializer.Serialize(packet);
+
+        // Compute content hash
+        var contentHashBytes = SHA256.HashData(Encoding.UTF8.GetBytes(canonicalJson));
+        var contentHash = $"sha256:{Convert.ToHexString(contentHashBytes).ToLowerInvariant()}";
+
+        // Sign the hash
+        var signatureBytes = await SignHashAsync(key, contentHashBytes, ct);
+        var signature = Convert.ToBase64String(signatureBytes);
+
+        var signedPacket = new SignedEvidencePacket
+        {
+            Id = packet.Id,
+            Content = packet,
+            ContentHash = contentHash,
+            Signature = signature,
+            SignatureAlgorithm = key.Algorithm,
+            SignerKeyRef = key.KeyRef,
+            SignedAt = _timeProvider.GetUtcNow()
+        };
+
+        // Store signed packet
+        await _signedStore.StoreAsync(signedPacket, ct);
+
+        _logger.LogInformation(
+            "Signed evidence packet {PacketId} with key {KeyRef}",
+            packet.Id,
+            key.KeyRef);
+
+        return signedPacket;
+    }
+
+    public async Task<bool> VerifyAsync(
+        SignedEvidencePacket signedPacket,
+        CancellationToken ct = default)
+    {
+        var result = await VerifyWithDetailsAsync(signedPacket, ct);
+        return result.IsValid;
+    }
+
+    public async Task<VerificationResult> VerifyWithDetailsAsync(
+        SignedEvidencePacket signedPacket,
+        CancellationToken ct = default)
+    {
+        _logger.LogDebug("Verifying evidence packet {PacketId}", signedPacket.Id);
+
+        try
+        {
+            // Get the signing key
+            var key = await _keyProvider.GetKeyByRefAsync(signedPacket.SignerKeyRef, ct);
+            if (key is null)
+            {
+                return new VerificationResult
+                {
+                    IsValid = false,
+                    SignatureValid = false,
+                    ContentHashValid = false,
+                    KeyValid = false,
+                    Error = $"Signing key not found: {signedPacket.SignerKeyRef}",
+                    VerifiedAt = _timeProvider.GetUtcNow()
+                };
+            }
+
+            // Verify content hash
+            var canonicalJson = CanonicalJsonSerializer.Serialize(signedPacket.Content);
+            var computedHashBytes = SHA256.HashData(Encoding.UTF8.GetBytes(canonicalJson));
+            var computedHash = $"sha256:{Convert.ToHexString(computedHashBytes).ToLowerInvariant()}";
+
+            var contentHashValid = signedPacket.ContentHash == computedHash;
+
+            if (!contentHashValid)
+            {
+                _logger.LogWarning(
+                    "Content hash mismatch for packet {PacketId}: expected {Expected}, got {Actual}",
+                    signedPacket.Id,
+                    signedPacket.ContentHash,
+                    computedHash);
+
+                return new VerificationResult
+                {
+                    IsValid = false,
+                    SignatureValid = false,
+                    ContentHashValid = false,
+                    KeyValid = true,
+                    Error = "Content hash mismatch - evidence may have been tampered with",
+                    VerifiedAt = _timeProvider.GetUtcNow()
+                };
+            }
+
+            // Verify signature
+            var signatureBytes = Convert.FromBase64String(signedPacket.Signature);
+            var signatureValid = await VerifySignatureAsync(key, computedHashBytes, signatureBytes, ct);
+
+            if (!signatureValid)
+            {
+                _logger.LogWarning(
+                    "Signature verification failed for packet {PacketId}",
+                    signedPacket.Id);
+
+                return new VerificationResult
+                {
+                    IsValid = false,
+                    SignatureValid = false,
+                    ContentHashValid = true,
+                    KeyValid = true,
+                    Error = "Signature verification failed",
+                    VerifiedAt = _timeProvider.GetUtcNow()
+                };
+            }
+
+            _logger.LogDebug("Evidence packet {PacketId} verified successfully", signedPacket.Id);
+
+            return new VerificationResult
+            {
+                IsValid = true,
+                SignatureValid = true,
+                ContentHashValid = true,
+                KeyValid = true,
+                VerifiedAt = _timeProvider.GetUtcNow()
+            };
+        }
+        catch (Exception ex)
+        {
+            _logger.LogError(ex, "Error verifying evidence packet {PacketId}", signedPacket.Id);
+
+            return new VerificationResult
+            {
+                IsValid = false,
+                SignatureValid = false,
+                ContentHashValid = false,
+                KeyValid = false,
+                Error = ex.Message,
+                VerifiedAt = _timeProvider.GetUtcNow()
+            };
+        }
+    }
+
+    private async Task<byte[]> SignHashAsync(
+        SigningKey key,
+        byte[] hash,
+        CancellationToken ct)
+    {
+        return key.Algorithm switch
+        {
+            "RS256" => await SignRs256Async(key, hash, ct),
+            "ES256" => await SignEs256Async(key, hash, ct),
+            _ => throw new UnsupportedAlgorithmException(key.Algorithm)
+        };
+    }
+
+    private Task<byte[]> SignRs256Async(SigningKey key, byte[] hash, CancellationToken ct)
+    {
+        using var rsa = RSA.Create();
+        rsa.ImportFromPem(key.PrivateKey);
+
+        var signature = rsa.SignHash(hash, HashAlgorithmName.SHA256, RSASignaturePadding.Pkcs1);
+        return Task.FromResult(signature);
+    }
+
+    private Task<byte[]> SignEs256Async(SigningKey key, byte[] hash, CancellationToken ct)
+    {
+        using var ecdsa = ECDsa.Create();
+        ecdsa.ImportFromPem(key.PrivateKey);
+
+        var signature = ecdsa.SignHash(hash);
+        return Task.FromResult(signature);
+    }
+
+    private Task<bool> VerifySignatureAsync(
+        SigningKey key,
+        byte[] hash,
+        byte[] signature,
+        CancellationToken ct)
+    {
+        return key.Algorithm switch
+        {
+            "RS256" => Task.FromResult(VerifyRs256(key, hash, signature)),
+            "ES256" => Task.FromResult(VerifyEs256(key, hash, signature)),
+            _ => throw new UnsupportedAlgorithmException(key.Algorithm)
+        };
+    }
+
+    private static bool VerifyRs256(SigningKey key, byte[] hash, byte[] signature)
+    {
+        using var rsa = RSA.Create();
+        rsa.ImportFromPem(key.PublicKey);
+
+        return rsa.VerifyHash(hash, signature, HashAlgorithmName.SHA256, RSASignaturePadding.Pkcs1);
+    }
+
+    private static bool VerifyEs256(SigningKey key, byte[] hash, byte[] signature)
+    {
+        using var ecdsa = ECDsa.Create();
+        ecdsa.ImportFromPem(key.PublicKey);
+
+        return ecdsa.VerifyHash(hash, signature);
+    }
+}
+```
+
+### SigningKeyProvider
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Evidence.Signing;
+
+public interface ISigningKeyProvider
+{
+    Task<SigningKey> GetCurrentSigningKeyAsync(CancellationToken ct = default);
+    Task<SigningKey?> GetKeyByRefAsync(string keyRef, CancellationToken ct = default);
+    Task<IReadOnlyList<SigningKeyInfo>> ListKeysAsync(CancellationToken ct = default);
+}
+
+public sealed class SigningKeyProvider : ISigningKeyProvider
+{
+    private readonly IKeyVaultClient _keyVault;
+    private readonly SigningConfiguration _config;
+    private readonly ILogger<SigningKeyProvider> _logger;
+
+    public async Task<SigningKey> GetCurrentSigningKeyAsync(CancellationToken ct = default)
+    {
+        var keyRef = _config.CurrentKeyRef;
+        var key = await GetKeyByRefAsync(keyRef, ct)
+            ?? throw new SigningKeyNotFoundException(keyRef);
+
+        return key;
+    }
+
+    public async Task<SigningKey?> GetKeyByRefAsync(string keyRef, CancellationToken ct = default)
+    {
+        try
+        {
+            var vaultKey = await _keyVault.GetKeyAsync(keyRef, ct);
+            if (vaultKey is null)
+                return null;
+
+            return new SigningKey
+            {
+                KeyRef = keyRef,
+                Algorithm = vaultKey.Algorithm,
+                PublicKey = vaultKey.PublicKey,
+                PrivateKey = vaultKey.PrivateKey,
+                CreatedAt = vaultKey.CreatedAt,
+                ExpiresAt = vaultKey.ExpiresAt
+            };
+        }
+        catch (Exception ex)
+        {
+            _logger.LogWarning(ex, "Failed to get signing key {KeyRef}", keyRef);
+            return null;
+        }
+    }
+
+    public async Task<IReadOnlyList<SigningKeyInfo>> ListKeysAsync(CancellationToken ct = default)
+    {
+        var keys = await _keyVault.ListKeysAsync(_config.KeyPrefix, ct);
+
+        return keys.Select(k => new SigningKeyInfo
+        {
+            KeyRef = k.KeyRef,
+            Algorithm = k.Algorithm,
+            CreatedAt = k.CreatedAt,
+            ExpiresAt = k.ExpiresAt,
+            IsCurrent = k.KeyRef == _config.CurrentKeyRef
+        }).ToList().AsReadOnly();
+    }
+}
+
+public sealed record SigningKey
+{
+    public required string KeyRef { get; init; }
+    public required string Algorithm { get; init; }
+    public required string PublicKey { get; init; }
+    public required string PrivateKey { get; init; }
+    public required DateTimeOffset CreatedAt { get; init; }
+    public DateTimeOffset? ExpiresAt { get; init; }
+}
+
+public sealed record SigningKeyInfo
+{
+    public required string KeyRef { get; init; }
+    public required string Algorithm { get; init; }
+    public required DateTimeOffset CreatedAt { get; init; }
+    public DateTimeOffset?
ExpiresAt { get; init; } + public bool IsCurrent { get; init; } +} + +public sealed class SigningConfiguration +{ + public required string CurrentKeyRef { get; set; } + public string KeyPrefix { get; set; } = "stella/signing/"; + public string DefaultAlgorithm { get; set; } = "RS256"; +} +``` + +--- + +## Acceptance Criteria + +- [ ] Canonicalize JSON per RFC 8785 +- [ ] Hash content with SHA-256 +- [ ] Sign with RS256 algorithm +- [ ] Sign with ES256 algorithm +- [ ] Verify signatures +- [ ] Detect content tampering +- [ ] Support key rotation +- [ ] Store signed packets immutably +- [ ] Unit test coverage >=85% + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 109_001 Evidence Collector | Internal | TODO | +| Signer service | Internal | Existing | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| IEvidenceSigner | TODO | | +| EvidenceSigner | TODO | | +| CanonicalJsonSerializer | TODO | | +| SigningKeyProvider | TODO | | +| Rs256Signer | TODO | | +| Es256Signer | TODO | | +| SignedEvidencePacket | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_109_003_RELEVI_version_sticker.md b/docs/implplan/SPRINT_20260110_109_003_RELEVI_version_sticker.md new file mode 100644 index 000000000..033d4cdff --- /dev/null +++ b/docs/implplan/SPRINT_20260110_109_003_RELEVI_version_sticker.md @@ -0,0 +1,538 @@ +# SPRINT: Version Sticker Writer + +> **Sprint ID:** 109_003 +> **Module:** RELEVI +> **Phase:** 9 - Evidence & Audit +> **Status:** TODO +> **Parent:** [109_000_INDEX](SPRINT_20260110_109_000_INDEX_evidence_audit.md) + +--- + +## Overview + +Implement the Version Sticker Writer for recording deployment state as stella.version.json files on each target. 
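+A concrete example makes the sticker format easier to review. The following is an illustrative sketch of the file this writer produces, using the camelCase serialization and field names from the `VersionSticker` model defined later in this document; all identifiers and values below are made up, and null-valued optional fields (e.g. `promotionId`) are omitted per the serializer's `WhenWritingNull` setting:
+
+```json
+{
+  "schemaVersion": "1.0",
+  "release": "payments-2026.01.3",
+  "releaseId": "1b4e28ba-2fa1-11d2-883f-0016d3cca427",
+  "deploymentId": "6f9619ff-8b86-d011-b42d-00cf4fc964ff",
+  "environmentId": "3f2504e0-4f89-11d3-9a0c-0305e82c3301",
+  "environmentName": "prod",
+  "targetId": "9c858901-8a57-4791-81fe-4c455b099bc9",
+  "targetName": "prod-host-01",
+  "deployedAt": "2026-01-10T12:00:00+00:00",
+  "components": [
+    {
+      "name": "payments-api",
+      "digest": "sha256:9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08",
+      "tag": "2026.01.3",
+      "semVer": "2026.1.3",
+      "image": "registry.internal/payments-api"
+    }
+  ],
+  "evidenceId": "7c9e6679-7425-40de-944b-e07fc1f90ae7",
+  "metadata": {
+    "platform": "stella-ops",
+    "platformVersion": "1.0.0",
+    "deploymentStrategy": "Rolling"
+  }
+}
+```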
+
+### Objectives
+
+- Generate version sticker content
+- Write stickers to targets via agents
+- Read stickers from targets for verification
+- Track sticker state across deployments
+
+### Working Directory
+
+```
+src/ReleaseOrchestrator/
+├── __Libraries/
+│   └── StellaOps.ReleaseOrchestrator.Evidence/
+│       └── Sticker/
+│           ├── IVersionStickerWriter.cs
+│           ├── VersionStickerWriter.cs
+│           ├── VersionStickerGenerator.cs
+│           ├── StickerAgentTask.cs
+│           └── Models/
+│               ├── VersionSticker.cs
+│               └── StickerWriteResult.cs
+└── __Tests/
+```
+
+---
+
+## Deliverables
+
+### VersionSticker Model
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Evidence.Sticker.Models;
+
+public sealed record VersionSticker
+{
+    public required string SchemaVersion { get; init; } = "1.0";
+    public required string Release { get; init; }
+    public required Guid ReleaseId { get; init; }
+    public required Guid DeploymentId { get; init; }
+    public required Guid EnvironmentId { get; init; }
+    public required string EnvironmentName { get; init; }
+    public required Guid TargetId { get; init; }
+    public required string TargetName { get; init; }
+    public required DateTimeOffset DeployedAt { get; init; }
+    public required ImmutableArray<ComponentSticker> Components { get; init; }
+    public required Guid EvidenceId { get; init; }
+    public string? EvidenceDigest { get; init; }
+    public required StickerMetadata Metadata { get; init; }
+}
+
+public sealed record ComponentSticker
+{
+    public required string Name { get; init; }
+    public required string Digest { get; init; }
+    public string? Tag { get; init; }
+    public string? SemVer { get; init; }
+    public string? Image { get; init; }
+}
+
+public sealed record StickerMetadata
+{
+    public required string Platform { get; init; } = "stella-ops";
+    public required string PlatformVersion { get; init; }
+    public required string DeploymentStrategy { get; init; }
+    public Guid? PromotionId { get; init; }
+    public string? SourceEnvironment { get; init; }
+    public ImmutableDictionary<string, string> CustomLabels { get; init; } = ImmutableDictionary<string, string>.Empty;
+}
+```
+
+### IVersionStickerWriter Interface
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Evidence.Sticker;
+
+public interface IVersionStickerWriter
+{
+    Task<StickerWriteResult> WriteAsync(
+        Guid deploymentTaskId,
+        VersionSticker sticker,
+        CancellationToken ct = default);
+
+    Task<VersionSticker?> ReadAsync(
+        Guid targetId,
+        CancellationToken ct = default);
+
+    Task<IReadOnlyList<StickerWriteResult>> WriteAllAsync(
+        Guid deploymentJobId,
+        CancellationToken ct = default);
+
+    Task<StickerValidationResult> ValidateAsync(
+        Guid targetId,
+        Guid expectedReleaseId,
+        CancellationToken ct = default);
+}
+
+public sealed record StickerWriteResult
+{
+    public required Guid TargetId { get; init; }
+    public required string TargetName { get; init; }
+    public required bool Success { get; init; }
+    public string? Error { get; init; }
+    public string? StickerPath { get; init; }
+    public DateTimeOffset WrittenAt { get; init; }
+}
+
+public sealed record StickerValidationResult
+{
+    public required Guid TargetId { get; init; }
+    public required bool Valid { get; init; }
+    public required bool StickerExists { get; init; }
+    public required bool ReleaseMatches { get; init; }
+    public required bool ComponentsMatch { get; init; }
+    public Guid? ActualReleaseId { get; init; }
+    public IReadOnlyList<string>? MismatchedComponents { get; init; }
+}
+```
+
+### VersionStickerGenerator
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Evidence.Sticker;
+
+public sealed class VersionStickerGenerator
+{
+    private readonly IReleaseManager _releaseManager;
+    private readonly IDeploymentJobStore _jobStore;
+    private readonly TimeProvider _timeProvider;
+    private readonly ILogger<VersionStickerGenerator> _logger;
+
+    private const string PlatformVersion = "1.0.0";
+
+    public async Task<VersionSticker> GenerateAsync(
+        DeploymentJob job,
+        DeploymentTask task,
+        Guid evidenceId,
+        CancellationToken ct = default)
+    {
+        var release = await _releaseManager.GetAsync(job.ReleaseId, ct)
+            ?? throw new ReleaseNotFoundException(job.ReleaseId);
+
+        var components = release.Components.Select(c => new ComponentSticker
+        {
+            Name = c.ComponentName,
+            Digest = c.Digest,
+            Tag = c.Tag,
+            SemVer = c.SemVer,
+            Image = c.Config.GetValueOrDefault("image")
+        }).ToImmutableArray();
+
+        var sticker = new VersionSticker
+        {
+            SchemaVersion = "1.0",
+            Release = release.Name,
+            ReleaseId = release.Id,
+            DeploymentId = job.Id,
+            EnvironmentId = job.EnvironmentId,
+            EnvironmentName = job.EnvironmentName,
+            TargetId = task.TargetId,
+            TargetName = task.TargetName,
+            DeployedAt = _timeProvider.GetUtcNow(),
+            Components = components,
+            EvidenceId = evidenceId,
+            Metadata = new StickerMetadata
+            {
+                Platform = "stella-ops",
+                PlatformVersion = PlatformVersion,
+                DeploymentStrategy = job.Strategy.ToString(),
+                PromotionId = job.PromotionId
+            }
+        };
+
+        _logger.LogDebug(
+            "Generated version sticker for release {Release} on target {Target}",
+            release.Name,
+            task.TargetName);
+
+        return sticker;
+    }
+
+    public string Serialize(VersionSticker sticker)
+    {
+        return JsonSerializer.Serialize(sticker, new JsonSerializerOptions
+        {
+            WriteIndented = true,
+            PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
+            DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull
+        });
+    }
+
+    public VersionSticker? Deserialize(string json)
+    {
+        try
+        {
+            return JsonSerializer.Deserialize<VersionSticker>(json, new JsonSerializerOptions
+            {
+                PropertyNamingPolicy = JsonNamingPolicy.CamelCase
+            });
+        }
+        catch (JsonException ex)
+        {
+            _logger.LogWarning(ex, "Failed to deserialize version sticker");
+            return null;
+        }
+    }
+}
+```
+
+### VersionStickerWriter Implementation
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Evidence.Sticker;
+
+public sealed class VersionStickerWriter : IVersionStickerWriter
+{
+    private readonly IDeploymentJobStore _jobStore;
+    private readonly ITargetExecutor _targetExecutor;
+    private readonly VersionStickerGenerator _stickerGenerator;
+    private readonly IEvidenceCollector _evidenceCollector;
+    private readonly TimeProvider _timeProvider;
+    private readonly ILogger<VersionStickerWriter> _logger;
+
+    private const string StickerFileName = "stella.version.json";
+
+    public async Task<StickerWriteResult> WriteAsync(
+        Guid deploymentTaskId,
+        VersionSticker sticker,
+        CancellationToken ct = default)
+    {
+        _logger.LogDebug(
+            "Writing version sticker to target {Target}",
+            sticker.TargetName);
+
+        try
+        {
+            var stickerJson = _stickerGenerator.Serialize(sticker);
+
+            // Create agent task to write sticker
+            var agentTask = new StickerAgentTask
+            {
+                TargetId = sticker.TargetId,
+                FileName = StickerFileName,
+                Content = stickerJson,
+                Location = GetStickerLocation(sticker)
+            };
+
+            var result = await _targetExecutor.ExecuteStickerWriteAsync(agentTask, ct);
+
+            if (result.Success)
+            {
+                _logger.LogInformation(
+                    "Wrote version sticker to target {Target} at {Path}",
+                    sticker.TargetName,
+                    result.StickerPath);
+            }
+            else
+            {
+                _logger.LogWarning(
+                    "Failed to write version sticker to target {Target}: {Error}",
+                    sticker.TargetName,
+                    result.Error);
+            }
+
+            return new StickerWriteResult
+            {
+                TargetId = sticker.TargetId,
+                TargetName = sticker.TargetName,
+                Success = result.Success,
+                Error = result.Error,
+                StickerPath = result.StickerPath,
+                WrittenAt = _timeProvider.GetUtcNow()
+            };
+        }
+        catch (Exception ex)
+        {
+            _logger.LogError(ex,
+                "Error writing version sticker to target {Target}",
+                sticker.TargetName);
+
+            return new StickerWriteResult
+            {
+                TargetId = sticker.TargetId,
+                TargetName = sticker.TargetName,
+                Success = false,
+                Error = ex.Message,
+                WrittenAt = _timeProvider.GetUtcNow()
+            };
+        }
+    }
+
+    public async Task<IReadOnlyList<StickerWriteResult>> WriteAllAsync(
+        Guid deploymentJobId,
+        CancellationToken ct = default)
+    {
+        var job = await _jobStore.GetAsync(deploymentJobId, ct)
+            ?? throw new DeploymentJobNotFoundException(deploymentJobId);
+
+        // Collect evidence first
+        var evidence = await _evidenceCollector.CollectDeploymentEvidenceAsync(deploymentJobId, ct);
+
+        var results = new List<StickerWriteResult>();
+
+        foreach (var task in job.Tasks)
+        {
+            if (task.Status != DeploymentTaskStatus.Completed)
+            {
+                results.Add(new StickerWriteResult
+                {
+                    TargetId = task.TargetId,
+                    TargetName = task.TargetName,
+                    Success = false,
+                    Error = $"Task not completed (status: {task.Status})",
+                    WrittenAt = _timeProvider.GetUtcNow()
+                });
+                continue;
+            }
+
+            var sticker = await _stickerGenerator.GenerateAsync(job, task, evidence.Id, ct);
+            var result = await WriteAsync(task.Id, sticker, ct);
+            results.Add(result);
+        }
+
+        _logger.LogInformation(
+            "Wrote version stickers for job {JobId}: {Success}/{Total} succeeded",
+            deploymentJobId,
+            results.Count(r => r.Success),
+            results.Count);
+
+        return results.AsReadOnly();
+    }
+
+    public async Task<VersionSticker?> ReadAsync(
+        Guid targetId,
+        CancellationToken ct = default)
+    {
+        try
+        {
+            var agentTask = new StickerReadAgentTask
+            {
+                TargetId = targetId,
+                FileName = StickerFileName
+            };
+
+            var result = await _targetExecutor.ExecuteStickerReadAsync(agentTask, ct);
+
+            if (!result.Success || string.IsNullOrEmpty(result.Content))
+            {
+                _logger.LogDebug("No version sticker found on target {TargetId}", targetId);
+                return null;
+            }
+
+            return _stickerGenerator.Deserialize(result.Content);
+        }
+        catch (Exception ex)
+        {
+            _logger.LogWarning(ex, "Error reading version sticker from target {TargetId}", targetId);
+            return null;
+        }
+    }
+
+    public async Task<StickerValidationResult> ValidateAsync(
+        Guid targetId,
+        Guid expectedReleaseId,
+        CancellationToken ct = default)
+    {
+        var sticker = await ReadAsync(targetId, ct);
+
+        if (sticker is null)
+        {
+            return new StickerValidationResult
+            {
+                TargetId = targetId,
+                Valid = false,
+                StickerExists = false,
+                ReleaseMatches = false,
+                ComponentsMatch = false
+            };
+        }
+
+        var releaseMatches = sticker.ReleaseId == expectedReleaseId;
+
+        // If release doesn't match, we can't validate components
+        if (!releaseMatches)
+        {
+            return new StickerValidationResult
+            {
+                TargetId = targetId,
+                Valid = false,
+                StickerExists = true,
+                ReleaseMatches = false,
+                ComponentsMatch = false,
+                ActualReleaseId = sticker.ReleaseId
+            };
+        }
+
+        // Validate components against actual running containers
+        var validation = await ValidateComponentsAsync(targetId, sticker.Components, ct);
+
+        return new StickerValidationResult
+        {
+            TargetId = targetId,
+            Valid = validation.AllMatch,
+            StickerExists = true,
+            ReleaseMatches = true,
+            ComponentsMatch = validation.AllMatch,
+            ActualReleaseId = sticker.ReleaseId,
+            MismatchedComponents = validation.Mismatches
+        };
+    }
+
+    private async Task<(bool AllMatch, IReadOnlyList<string> Mismatches)> ValidateComponentsAsync(
+        Guid targetId,
+        ImmutableArray<ComponentSticker> expectedComponents,
+        CancellationToken ct)
+    {
+        var mismatches = new List<string>();
+
+        // Query actual container digests from target
+        var actualContainers = await _targetExecutor.GetRunningContainersAsync(targetId, ct);
+
+        foreach (var expected in expectedComponents)
+        {
+            var actual = actualContainers.FirstOrDefault(c => c.Name == expected.Name);
+
+            if (actual is null)
+            {
+                mismatches.Add($"{expected.Name}: not running");
+            }
+            else if (actual.Digest != expected.Digest)
+            {
+                mismatches.Add($"{expected.Name}: digest mismatch (expected {expected.Digest[..16]}, got {actual.Digest[..16]})");
+            }
+        }
+
+        return (mismatches.Count == 0, mismatches.AsReadOnly());
+    }
+
+    private static string GetStickerLocation(VersionSticker sticker)
+    {
+        // Default to /var/lib/stella-agent/<deploymentId>/
+        return $"/var/lib/stella-agent/{sticker.DeploymentId}/";
+    }
+}
+```
+
+### StickerAgentTask
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Evidence.Sticker;
+
+public sealed record StickerAgentTask
+{
+    public required Guid TargetId { get; init; }
+    public required string FileName { get; init; }
+    public required string Content { get; init; }
+    public required string Location { get; init; }
+}
+
+public sealed record StickerReadAgentTask
+{
+    public required Guid TargetId { get; init; }
+    public required string FileName { get; init; }
+    public string? Location { get; init; }
+}
+
+public sealed record StickerWriteAgentResult
+{
+    public required bool Success { get; init; }
+    public string? Error { get; init; }
+    public string? StickerPath { get; init; }
+}
+
+public sealed record StickerReadAgentResult
+{
+    public required bool Success { get; init; }
+    public string? Content { get; init; }
+    public string? Error { get; init; }
+}
+```
+
+---
+
+## Acceptance Criteria
+
+- [ ] Generate version sticker with all components
+- [ ] Serialize sticker as valid JSON
+- [ ] Write sticker to target via agent
+- [ ] Write stickers for all completed tasks
+- [ ] Read sticker from target
+- [ ] Validate sticker against expected release
+- [ ] Validate components against running containers
+- [ ] Detect digest mismatches
+- [ ] Unit test coverage >=85%
+
+---
+
+## Dependencies
+
+| Dependency | Type | Status |
+|------------|------|--------|
+| 107_002 Target Executor | Internal | TODO |
+| 109_001 Evidence Collector | Internal | TODO |
+| 108_001 Agent Core Runtime | Internal | TODO |
+
+---
+
+## Delivery Tracker
+
+| Deliverable | Status | Notes |
+|-------------|--------|-------|
+| IVersionStickerWriter | TODO | |
+| VersionStickerWriter | TODO | |
+| VersionStickerGenerator | TODO | |
+| VersionSticker model | TODO | |
+| StickerAgentTask | TODO | |
+| Unit tests | TODO | |
+
+---
+
+## Execution Log
+
+| Date | Entry |
+|------|-------|
+| 10-Jan-2026 | Sprint created |
diff --git a/docs/implplan/SPRINT_20260110_109_004_RELEVI_audit_exporter.md b/docs/implplan/SPRINT_20260110_109_004_RELEVI_audit_exporter.md
new file mode 100644
index 000000000..8739750da
--- /dev/null
+++ b/docs/implplan/SPRINT_20260110_109_004_RELEVI_audit_exporter.md
@@ -0,0 +1,706 @@
+# SPRINT: Audit Exporter
+
+> **Sprint ID:** 109_004
+> **Module:** RELEVI
+> **Phase:** 9 - Evidence & Audit
+> **Status:** TODO
+> **Parent:** [109_000_INDEX](SPRINT_20260110_109_000_INDEX_evidence_audit.md)
+
+---
+
+## Overview
+
+Implement the Audit Exporter for generating compliance reports in multiple formats from signed evidence packets.
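+To make the intended call pattern concrete, here is a hedged usage sketch of the `IAuditExporter` API specified in this document. The dependency-injection wiring and file handling are illustrative assumptions, not part of the spec; `exporter` and `ct` are presumed to come from the surrounding host:
+
+```csharp
+// Illustrative only: assumes an IAuditExporter resolved from DI and an ambient CancellationToken.
+var request = new AuditExportRequest
+{
+    Format = ExportFormat.Csv,
+    FromDate = DateTimeOffset.UtcNow.AddDays(-30),
+    ToDate = DateTimeOffset.UtcNow,
+    IncludeVerification = true,
+    ReportTitle = "Monthly deployment audit"
+};
+
+var result = await exporter.ExportAsync(request, ct);
+if (result.Success && result.Content is not null)
+{
+    // Persist the export stream under the generated file name.
+    await using var file = File.Create(result.FileName);
+    await result.Content.CopyToAsync(file, ct);
+}
+```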
+ +### Objectives + +- Export to JSON for machine processing +- Export to PDF for human-readable reports +- Export to CSV for spreadsheet analysis +- Export to SLSA provenance format +- Batch export for audit periods + +### Working Directory + +``` +src/ReleaseOrchestrator/ +├── __Libraries/ +│ └── StellaOps.ReleaseOrchestrator.Evidence/ +│ └── Export/ +│ ├── IAuditExporter.cs +│ ├── AuditExporter.cs +│ ├── Exporters/ +│ │ ├── JsonExporter.cs +│ │ ├── PdfExporter.cs +│ │ ├── CsvExporter.cs +│ │ └── SlsaExporter.cs +│ └── Models/ +│ ├── AuditExportRequest.cs +│ └── ExportFormat.cs +└── __Tests/ +``` + +--- + +## Deliverables + +### IAuditExporter Interface + +```csharp +namespace StellaOps.ReleaseOrchestrator.Evidence.Export; + +public interface IAuditExporter +{ + Task ExportAsync( + AuditExportRequest request, + CancellationToken ct = default); + + IReadOnlyList SupportedFormats { get; } + + Task ExportToStreamAsync( + AuditExportRequest request, + CancellationToken ct = default); +} + +public sealed record AuditExportRequest +{ + public required ExportFormat Format { get; init; } + public Guid? TenantId { get; init; } + public Guid? EnvironmentId { get; init; } + public DateTimeOffset? FromDate { get; init; } + public DateTimeOffset? ToDate { get; init; } + public IReadOnlyList? EvidenceIds { get; init; } + public IReadOnlyList? Types { get; init; } + public bool IncludeVerification { get; init; } = true; + public bool IncludeSignatures { get; init; } = false; + public string? 
ReportTitle { get; init; } +} + +public enum ExportFormat +{ + Json, + Pdf, + Csv, + Slsa +} + +public sealed record ExportResult +{ + public required bool Success { get; init; } + public required ExportFormat Format { get; init; } + public required string FileName { get; init; } + public required string ContentType { get; init; } + public required long SizeBytes { get; init; } + public required int EvidenceCount { get; init; } + public required DateTimeOffset GeneratedAt { get; init; } + public Stream? Content { get; init; } + public string? Error { get; init; } +} +``` + +### AuditExporter Implementation + +```csharp +namespace StellaOps.ReleaseOrchestrator.Evidence.Export; + +public sealed class AuditExporter : IAuditExporter +{ + private readonly ISignedEvidenceStore _evidenceStore; + private readonly IEvidenceSigner _evidenceSigner; + private readonly IEnumerable _exporters; + private readonly TimeProvider _timeProvider; + private readonly ILogger _logger; + + public IReadOnlyList SupportedFormats => + _exporters.Select(e => e.Format).ToList().AsReadOnly(); + + public async Task ExportAsync( + AuditExportRequest request, + CancellationToken ct = default) + { + _logger.LogInformation( + "Starting audit export: format={Format}, from={From}, to={To}", + request.Format, + request.FromDate, + request.ToDate); + + var exporter = _exporters.FirstOrDefault(e => e.Format == request.Format) + ?? throw new UnsupportedExportFormatException(request.Format); + + try + { + // Query evidence + var evidence = await QueryEvidenceAsync(request, ct); + + if (evidence.Count == 0) + { + return new ExportResult + { + Success = false, + Format = request.Format, + FileName = "", + ContentType = "", + SizeBytes = 0, + EvidenceCount = 0, + GeneratedAt = _timeProvider.GetUtcNow(), + Error = "No evidence found matching the criteria" + }; + } + + // Verify evidence if requested + var verificationResults = request.IncludeVerification + ? 
await VerifyAllAsync(evidence, ct) + : null; + + // Export + var stream = await exporter.ExportAsync(evidence, verificationResults, request, ct); + + var fileName = GenerateFileName(request); + var contentType = exporter.ContentType; + + _logger.LogInformation( + "Audit export completed: {Count} evidence packets, {Size} bytes", + evidence.Count, + stream.Length); + + return new ExportResult + { + Success = true, + Format = request.Format, + FileName = fileName, + ContentType = contentType, + SizeBytes = stream.Length, + EvidenceCount = evidence.Count, + GeneratedAt = _timeProvider.GetUtcNow(), + Content = stream + }; + } + catch (Exception ex) + { + _logger.LogError(ex, "Audit export failed"); + + return new ExportResult + { + Success = false, + Format = request.Format, + FileName = "", + ContentType = "", + SizeBytes = 0, + EvidenceCount = 0, + GeneratedAt = _timeProvider.GetUtcNow(), + Error = ex.Message + }; + } + } + + public async Task ExportToStreamAsync( + AuditExportRequest request, + CancellationToken ct = default) + { + var result = await ExportAsync(request, ct); + + if (!result.Success || result.Content is null) + { + throw new ExportFailedException(result.Error ?? 
"Unknown error"); + } + + return result.Content; + } + + private async Task> QueryEvidenceAsync( + AuditExportRequest request, + CancellationToken ct) + { + if (request.EvidenceIds?.Count > 0) + { + var packets = new List(); + foreach (var id in request.EvidenceIds) + { + var packet = await _evidenceStore.GetAsync(id, ct); + if (packet is not null) + { + packets.Add(packet); + } + } + return packets.AsReadOnly(); + } + + var filter = new SignedEvidenceQueryFilter + { + TenantId = request.TenantId, + FromDate = request.FromDate, + ToDate = request.ToDate, + Types = request.Types + }; + + return await _evidenceStore.ListAsync(filter, ct); + } + + private async Task> VerifyAllAsync( + IReadOnlyList evidence, + CancellationToken ct) + { + var results = new Dictionary(); + + foreach (var packet in evidence) + { + var result = await _evidenceSigner.VerifyWithDetailsAsync(packet, ct); + results[packet.Id] = result; + } + + return results.AsReadOnly(); + } + + private string GenerateFileName(AuditExportRequest request) + { + var timestamp = _timeProvider.GetUtcNow().ToString("yyyyMMdd-HHmmss", CultureInfo.InvariantCulture); + var extension = request.Format switch + { + ExportFormat.Json => "json", + ExportFormat.Pdf => "pdf", + ExportFormat.Csv => "csv", + ExportFormat.Slsa => "slsa.json", + _ => "dat" + }; + + return $"audit-export-{timestamp}.{extension}"; + } +} +``` + +### IFormatExporter Interface + +```csharp +namespace StellaOps.ReleaseOrchestrator.Evidence.Export; + +public interface IFormatExporter +{ + ExportFormat Format { get; } + string ContentType { get; } + + Task ExportAsync( + IReadOnlyList evidence, + IReadOnlyDictionary? 
verificationResults, + AuditExportRequest request, + CancellationToken ct = default); +} +``` + +### JsonExporter + +```csharp +namespace StellaOps.ReleaseOrchestrator.Evidence.Export.Exporters; + +public sealed class JsonExporter : IFormatExporter +{ + public ExportFormat Format => ExportFormat.Json; + public string ContentType => "application/json"; + + public async Task ExportAsync( + IReadOnlyList evidence, + IReadOnlyDictionary? verificationResults, + AuditExportRequest request, + CancellationToken ct = default) + { + var export = new JsonAuditExport + { + SchemaVersion = "1.0", + GeneratedAt = DateTimeOffset.UtcNow.ToString("O"), + ReportTitle = request.ReportTitle ?? "Audit Export", + Query = new QueryInfo + { + FromDate = request.FromDate?.ToString("O"), + ToDate = request.ToDate?.ToString("O"), + TenantId = request.TenantId?.ToString(), + EnvironmentId = request.EnvironmentId?.ToString(), + Types = request.Types?.Select(t => t.ToString()).ToList() + }, + Summary = new ExportSummary + { + TotalEvidence = evidence.Count, + ByType = evidence.GroupBy(e => e.Content.Type) + .ToDictionary(g => g.Key.ToString(), g => g.Count()), + VerificationSummary = verificationResults is not null + ? new VerificationSummary + { + TotalVerified = verificationResults.Count, + AllValid = verificationResults.Values.All(v => v.IsValid), + FailedCount = verificationResults.Values.Count(v => !v.IsValid) + } + : null + }, + Evidence = evidence.Select(e => new EvidenceEntry + { + Id = e.Id.ToString(), + Type = e.Content.Type.ToString(), + SubjectId = e.Content.SubjectId.ToString(), + CollectedAt = e.Content.CollectedAt.ToString("O"), + SignedAt = e.SignedAt.ToString("O"), + ContentHash = request.IncludeSignatures ? e.ContentHash : null, + Signature = request.IncludeSignatures ? e.Signature : null, + SignatureAlgorithm = request.IncludeSignatures ? 
e.SignatureAlgorithm : null, + SignerKeyRef = e.SignerKeyRef, + Verification = verificationResults?.TryGetValue(e.Id, out var v) == true + ? new VerificationEntry + { + IsValid = v.IsValid, + SignatureValid = v.SignatureValid, + ContentHashValid = v.ContentHashValid, + Error = v.Error + } + : null, + Content = e.Content + }).ToList() + }; + + var stream = new MemoryStream(); + await JsonSerializer.SerializeAsync(stream, export, new JsonSerializerOptions + { + WriteIndented = true, + PropertyNamingPolicy = JsonNamingPolicy.CamelCase, + DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull + }, ct); + + stream.Position = 0; + return stream; + } +} + +// JSON export models +public sealed class JsonAuditExport +{ + public required string SchemaVersion { get; init; } + public required string GeneratedAt { get; init; } + public required string ReportTitle { get; init; } + public required QueryInfo Query { get; init; } + public required ExportSummary Summary { get; init; } + public required IReadOnlyList Evidence { get; init; } +} + +public sealed class QueryInfo +{ + public string? FromDate { get; init; } + public string? ToDate { get; init; } + public string? TenantId { get; init; } + public string? EnvironmentId { get; init; } + public IReadOnlyList? Types { get; init; } +} + +public sealed class ExportSummary +{ + public required int TotalEvidence { get; init; } + public required IReadOnlyDictionary ByType { get; init; } + public VerificationSummary? 
VerificationSummary { get; init; } +} + +public sealed class VerificationSummary +{ + public required int TotalVerified { get; init; } + public required bool AllValid { get; init; } + public required int FailedCount { get; init; } +} + +public sealed class EvidenceEntry +{ + public required string Id { get; init; } + public required string Type { get; init; } + public required string SubjectId { get; init; } + public required string CollectedAt { get; init; } + public required string SignedAt { get; init; } + public string? ContentHash { get; init; } + public string? Signature { get; init; } + public string? SignatureAlgorithm { get; init; } + public required string SignerKeyRef { get; init; } + public VerificationEntry? Verification { get; init; } + public required EvidencePacket Content { get; init; } +} + +public sealed class VerificationEntry +{ + public required bool IsValid { get; init; } + public required bool SignatureValid { get; init; } + public required bool ContentHashValid { get; init; } + public string? Error { get; init; } +} +``` + +### CsvExporter + +```csharp +namespace StellaOps.ReleaseOrchestrator.Evidence.Export.Exporters; + +public sealed class CsvExporter : IFormatExporter +{ + public ExportFormat Format => ExportFormat.Csv; + public string ContentType => "text/csv"; + + public Task ExportAsync( + IReadOnlyList evidence, + IReadOnlyDictionary? verificationResults, + AuditExportRequest request, + CancellationToken ct = default) + { + var stream = new MemoryStream(); + using var writer = new StreamWriter(stream, Encoding.UTF8, leaveOpen: true); + + // Write header + writer.WriteLine("EvidenceId,Type,SubjectId,ReleaseName,EnvironmentName,CollectedAt,SignedAt,SignerKeyRef,IsValid,VerificationError"); + + // Write data rows + foreach (var packet in evidence) + { + var verification = verificationResults?.TryGetValue(packet.Id, out var v) == true ? 
v : null; + + var row = new[] + { + packet.Id.ToString(), + packet.Content.Type.ToString(), + packet.Content.SubjectId.ToString(), + EscapeCsv(packet.Content.Content.Release?.ReleaseName ?? ""), + EscapeCsv(packet.Content.Content.Deployment?.DeploymentJobId.ToString() ?? + packet.Content.Content.Promotion?.TargetEnvironmentName ?? ""), + packet.Content.CollectedAt.ToString("O"), + packet.SignedAt.ToString("O"), + packet.SignerKeyRef, + verification?.IsValid.ToString() ?? "", + EscapeCsv(verification?.Error ?? "") + }; + + writer.WriteLine(string.Join(",", row)); + } + + writer.Flush(); + stream.Position = 0; + + return Task.FromResult(stream); + } + + private static string EscapeCsv(string value) + { + if (string.IsNullOrEmpty(value)) + return ""; + + if (value.Contains(',') || value.Contains('"') || value.Contains('\n')) + { + return $"\"{value.Replace("\"", "\"\"")}\""; + } + + return value; + } +} +``` + +### SlsaExporter + +```csharp +namespace StellaOps.ReleaseOrchestrator.Evidence.Export.Exporters; + +public sealed class SlsaExporter : IFormatExporter +{ + public ExportFormat Format => ExportFormat.Slsa; + public string ContentType => "application/vnd.in-toto+json"; + + public async Task ExportAsync( + IReadOnlyList evidence, + IReadOnlyDictionary? 
verificationResults, + AuditExportRequest request, + CancellationToken ct = default) + { + // Export as SLSA Provenance v1.0 format + var provenances = evidence + .Where(e => e.Content.Type == EvidenceType.Deployment) + .Select(e => BuildSlsaProvenance(e)) + .ToList(); + + var stream = new MemoryStream(); + + // Write as NDJSON (one provenance per line) + using var writer = new StreamWriter(stream, Encoding.UTF8, leaveOpen: true); + + foreach (var provenance in provenances) + { + var json = JsonSerializer.Serialize(provenance, new JsonSerializerOptions + { + PropertyNamingPolicy = JsonNamingPolicy.CamelCase, + DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull + }); + await writer.WriteLineAsync(json); + } + + await writer.FlushAsync(ct); + stream.Position = 0; + + return stream; + } + + private static SlsaProvenance BuildSlsaProvenance(SignedEvidencePacket packet) + { + var deployment = packet.Content.Content.Deployment; + var release = packet.Content.Content.Release; + + return new SlsaProvenance + { + Type = "https://in-toto.io/Statement/v1", + Subject = release?.Components.Select(c => new SlsaSubject + { + Name = c.ComponentName, + Digest = new Dictionary + { + ["sha256"] = c.Digest.Replace("sha256:", "") + } + }).ToList() ?? 
[], + PredicateType = "https://slsa.dev/provenance/v1", + Predicate = new SlsaPredicate + { + BuildDefinition = new SlsaBuildDefinition + { + BuildType = "https://stella-ops.io/DeploymentProvenanceV1", + ExternalParameters = new Dictionary<string, object> + { + ["deployment"] = new + { + jobId = deployment?.DeploymentJobId.ToString(), + strategy = deployment?.Strategy, + environment = packet.Content.Content.Promotion?.TargetEnvironmentName + } + }, + InternalParameters = new Dictionary<string, object> + { + ["evidenceId"] = packet.Id.ToString(), + ["collectedAt"] = packet.Content.CollectedAt.ToString("O") + }, + ResolvedDependencies = release?.Components.Select(c => new SlsaResourceDescriptor + { + Name = c.ComponentName, + Uri = $"oci://{c.ComponentName}@{c.Digest}", + Digest = new Dictionary<string, string> + { + ["sha256"] = c.Digest.Replace("sha256:", "") + } + }).ToList() ?? [] + }, + RunDetails = new SlsaRunDetails + { + Builder = new SlsaBuilder + { + Id = "https://stella-ops.io/ReleaseOrchestrator", + Version = new Dictionary<string, string> + { + ["stella-ops"] = packet.Content.CollectorVersion + } + }, + Metadata = new SlsaMetadata + { + InvocationId = packet.Content.SubjectId.ToString(), + StartedOn = deployment?.StartedAt.ToString("O"), + FinishedOn = deployment?.CompletedAt?.ToString("O") + } + } + } + }; + } +} + +// SLSA Provenance models +public sealed class SlsaProvenance +{ + [JsonPropertyName("_type")] + public required string Type { get; init; } + public required IReadOnlyList<SlsaSubject> Subject { get; init; } + public required string PredicateType { get; init; } + public required SlsaPredicate Predicate { get; init; } +} + +public sealed class SlsaSubject +{ + public required string Name { get; init; } + public required IReadOnlyDictionary<string, string> Digest { get; init; } +} + +public sealed class SlsaPredicate +{ + public required SlsaBuildDefinition BuildDefinition { get; init; } + public required SlsaRunDetails RunDetails { get; init; } +} + +public sealed class SlsaBuildDefinition +{ + public required string BuildType { get;
init; } + public required IReadOnlyDictionary<string, object> ExternalParameters { get; init; } + public required IReadOnlyDictionary<string, object> InternalParameters { get; init; } + public required IReadOnlyList<SlsaResourceDescriptor> ResolvedDependencies { get; init; } +} + +public sealed class SlsaResourceDescriptor +{ + public required string Name { get; init; } + public required string Uri { get; init; } + public required IReadOnlyDictionary<string, string> Digest { get; init; } +} + +public sealed class SlsaRunDetails +{ + public required SlsaBuilder Builder { get; init; } + public required SlsaMetadata Metadata { get; init; } +} + +public sealed class SlsaBuilder +{ + public required string Id { get; init; } + public required IReadOnlyDictionary<string, string> Version { get; init; } +} + +public sealed class SlsaMetadata +{ + public required string InvocationId { get; init; } + public string? StartedOn { get; init; } + public string? FinishedOn { get; init; } +} +``` + +--- + +## Acceptance Criteria + +- [ ] Export evidence as JSON +- [ ] Export evidence as PDF +- [ ] Export evidence as CSV +- [ ] Export evidence as SLSA provenance +- [ ] Include verification results +- [ ] Filter by date range +- [ ] Filter by evidence type +- [ ] Generate meaningful file names +- [ ] SLSA format compliant with spec +- [ ] Unit test coverage >=85% + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 109_001 Evidence Collector | Internal | TODO | +| 109_002 Evidence Signer | Internal | TODO | +| QuestPDF | NuGet | Available | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| IAuditExporter | TODO | | +| AuditExporter | TODO | | +| JsonExporter | TODO | | +| PdfExporter | TODO | | +| CsvExporter | TODO | | +| SlsaExporter | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_110_000_INDEX_progressive_delivery.md
b/docs/implplan/SPRINT_20260110_110_000_INDEX_progressive_delivery.md new file mode 100644 index 000000000..a63d4def6 --- /dev/null +++ b/docs/implplan/SPRINT_20260110_110_000_INDEX_progressive_delivery.md @@ -0,0 +1,250 @@ +# SPRINT INDEX: Phase 10 - Progressive Delivery + +> **Epic:** Release Orchestrator +> **Phase:** 10 - Progressive Delivery +> **Batch:** 110 +> **Status:** TODO +> **Parent:** [100_000_INDEX](SPRINT_20260110_100_000_INDEX_release_orchestrator.md) + +--- + +## Overview + +Phase 10 implements Progressive Delivery - A/B releases, canary deployments, and traffic routing for gradual rollouts. + +### Objectives + +- A/B release manager for parallel versions +- Traffic router framework abstraction +- Canary controller for gradual promotion +- Router plugin for Nginx (reference implementation) + +--- + +## Sprint Structure + +| Sprint ID | Title | Module | Status | Dependencies | +|-----------|-------|--------|--------|--------------| +| 110_001 | A/B Release Manager | PROGDL | TODO | 107_005 | +| 110_002 | Traffic Router Framework | PROGDL | TODO | 110_001 | +| 110_003 | Canary Controller | PROGDL | TODO | 110_002 | +| 110_004 | Router Plugin - Nginx | PROGDL | TODO | 110_002 | + +--- + +## Architecture + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ PROGRESSIVE DELIVERY │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ A/B RELEASE MANAGER (110_001) │ │ +│ │ │ │ +│ │ ┌─────────────────────────────────────────────────────────────────┐ │ │ +│ │ │ A/B Release │ │ │ +│ │ │ │ │ │ +│ │ │ Control (current): sha256:abc123 ──► 80% traffic │ │ │ +│ │ │ Treatment (new): sha256:def456 ──► 20% traffic │ │ │ +│ │ │ │ │ │ +│ │ │ Status: active │ │ │ +│ │ │ Decision: pending │ │ │ +│ │ └─────────────────────────────────────────────────────────────────┘ │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ 
┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ TRAFFIC ROUTER FRAMEWORK (110_002) │ │ +│ │ │ │ +│ │ ITrafficRouter │ │ +│ │ ├── SetWeights(control: 80, treatment: 20) │ │ +│ │ ├── SetHeaderRouting(x-canary: true → treatment) │ │ +│ │ ├── SetCookieRouting(ab_group: B → treatment) │ │ +│ │ └── GetCurrentRouting() → RoutingConfig │ │ +│ │ │ │ +│ │ Implementations: │ │ +│ │ ├── NginxRouter │ │ +│ │ ├── HaproxyRouter │ │ +│ │ ├── TraefikRouter │ │ +│ │ └── AwsAlbRouter │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ CANARY CONTROLLER (110_003) │ │ +│ │ │ │ +│ │ Canary Progression: │ │ +│ │ │ │ +│ │ Step 1: 5% ──────┐ │ │ +│ │ Step 2: 10% ─────┤ │ │ +│ │ Step 3: 25% ─────┼──► Auto-advance if metrics pass │ │ +│ │ Step 4: 50% ─────┤ │ │ +│ │ Step 5: 100% ────┘ │ │ +│ │ │ │ +│ │ Rollback triggers: │ │ +│ │ ├── Error rate > threshold │ │ +│ │ ├── Latency P99 > threshold │ │ +│ │ └── Manual intervention │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ NGINX ROUTER PLUGIN (110_004) │ │ +│ │ │ │ +│ │ upstream control { │ │ +│ │ server app-v1:8080 weight=80; │ │ +│ │ } │ │ +│ │ upstream treatment { │ │ +│ │ server app-v2:8080 weight=20; │ │ +│ │ } │ │ +│ │ │ │ +│ │ # Header-based routing │ │ +│ │ if ($http_x_canary = "true") { │ │ +│ │ proxy_pass http://treatment; │ │ +│ │ } │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +--- + +## Deliverables Summary + +### 110_001: A/B Release Manager + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `IAbReleaseManager` | Interface | A/B operations | +| `AbReleaseManager` | Class | Implementation | +| 
`AbRelease` | Model | A/B release entity | +| `AbDecision` | Model | Promotion decision | + +### 110_002: Traffic Router Framework + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `ITrafficRouter` | Interface | Router abstraction | +| `RoutingConfig` | Model | Current routing state | +| `WeightedRouting` | Strategy | Percentage-based | +| `HeaderRouting` | Strategy | Header-based | +| `CookieRouting` | Strategy | Cookie-based | + +### 110_003: Canary Controller + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `ICanaryController` | Interface | Canary operations | +| `CanaryController` | Class | Implementation | +| `CanaryStep` | Model | Progression step | +| `CanaryMetrics` | Model | Health metrics | +| `AutoRollback` | Class | Automatic rollback | + +### 110_004: Router Plugin - Nginx + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `NginxRouter` | Router | Nginx implementation | +| `NginxConfigGenerator` | Class | Config generation | +| `NginxReloader` | Class | Hot reload | +| `NginxMetrics` | Class | Status parsing | + +--- + +## Key Interfaces + +```csharp +public interface IAbReleaseManager +{ + Task<AbRelease> CreateAsync(CreateAbReleaseRequest request, CancellationToken ct); + Task<AbRelease> UpdateWeightsAsync(Guid id, int controlWeight, int treatmentWeight, CancellationToken ct); + Task<AbRelease> PromoteAsync(Guid id, AbDecision decision, CancellationToken ct); + Task<AbRelease> RollbackAsync(Guid id, CancellationToken ct); + Task<AbRelease?> GetAsync(Guid id, CancellationToken ct); +} + +public interface ITrafficRouter +{ + string RouterType { get; } + Task ApplyAsync(RoutingConfig config, CancellationToken ct); + Task<RoutingConfig> GetCurrentAsync(CancellationToken ct); + Task<bool> HealthCheckAsync(CancellationToken ct); +} + +public interface ICanaryController +{ + Task StartAsync(Guid releaseId, CanaryConfig config, CancellationToken ct); + Task AdvanceAsync(Guid canaryId, CancellationToken ct); + Task RollbackAsync(Guid canaryId,
string reason, CancellationToken ct); + Task CompleteAsync(Guid canaryId, CancellationToken ct); +} +``` + +--- + +## Canary Flow + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ CANARY FLOW │ +│ │ +│ ┌─────────────┐ │ +│ │ Start │ │ +│ │ Canary │ │ +│ └──────┬──────┘ │ +│ │ │ +│ ▼ │ +│ ┌─────────────┐ metrics ┌─────────────┐ pass ┌─────────────┐ │ +│ │ Step 1 │ ────────────►│ Analyze │ ──────────►│ Step 2 │ │ +│ │ 5% │ │ Metrics │ │ 10% │ │ +│ └─────────────┘ └──────┬──────┘ └──────┬──────┘ │ +│ │ │ │ +│ │ fail │ │ +│ ▼ ▼ │ +│ ┌─────────────┐ ... continue │ +│ │ Rollback │ │ │ +│ │ to Control │ │ │ +│ └─────────────┘ │ │ +│ ▼ │ +│ ┌─────────────┐ │ +│ │ Step N │ │ +│ │ 100% │ │ +│ └──────┬──────┘ │ +│ │ │ +│ ▼ │ +│ ┌─────────────┐ │ +│ │ Complete │ │ +│ │ Promote │ │ +│ └─────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +--- + +## Dependencies + +| Module | Purpose | +|--------|---------| +| 107_005 Deployment Strategies | Base deployment | +| 107_002 Target Executor | Deploy versions | +| Telemetry | Metrics collection | + +--- + +## Acceptance Criteria + +- [ ] A/B release created +- [ ] Traffic weights applied +- [ ] Header-based routing works +- [ ] Canary progression advances +- [ ] Auto-rollback on metrics failure +- [ ] Nginx config generated +- [ ] Nginx hot reload works +- [ ] Evidence captured for A/B +- [ ] Unit test coverage ≥80% + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Phase 10 index created | diff --git a/docs/implplan/SPRINT_20260110_110_001_PROGDL_ab_release_manager.md b/docs/implplan/SPRINT_20260110_110_001_PROGDL_ab_release_manager.md new file mode 100644 index 000000000..f5b594ea6 --- /dev/null +++ b/docs/implplan/SPRINT_20260110_110_001_PROGDL_ab_release_manager.md @@ -0,0 +1,613 @@ +# SPRINT: A/B Release Manager + +> **Sprint ID:** 110_001 +> **Module:** PROGDL +> **Phase:** 10 - Progressive Delivery +> 
**Status:** TODO +> **Parent:** [110_000_INDEX](SPRINT_20260110_110_000_INDEX_progressive_delivery.md) + +--- + +## Overview + +Implement the A/B Release Manager for running parallel versions with traffic splitting. + +### Objectives + +- Create A/B releases with control and treatment versions +- Manage traffic weight distribution +- Track A/B experiment metrics +- Promote or rollback based on results + +### Working Directory + +``` +src/ReleaseOrchestrator/ +├── __Libraries/ +│ └── StellaOps.ReleaseOrchestrator.Progressive/ +│ ├── AbRelease/ +│ │ ├── IAbReleaseManager.cs +│ │ ├── AbReleaseManager.cs +│ │ ├── AbReleaseStore.cs +│ │ └── AbMetricsCollector.cs +│ └── Models/ +│ ├── AbRelease.cs +│ ├── AbDecision.cs +│ └── AbMetrics.cs +└── __Tests/ +``` + +--- + +## Deliverables + +### AbRelease Model + +```csharp +namespace StellaOps.ReleaseOrchestrator.Progressive.Models; + +public sealed record AbRelease +{ + public required Guid Id { get; init; } + public required Guid TenantId { get; init; } + public required Guid EnvironmentId { get; init; } + public required string EnvironmentName { get; init; } + public required AbVersion Control { get; init; } + public required AbVersion Treatment { get; init; } + public required int ControlWeight { get; init; } + public required int TreatmentWeight { get; init; } + public required AbReleaseStatus Status { get; init; } + public required DateTimeOffset CreatedAt { get; init; } + public required Guid CreatedBy { get; init; } + public DateTimeOffset? StartedAt { get; init; } + public DateTimeOffset? CompletedAt { get; init; } + public AbDecision? Decision { get; init; } + public AbMetrics? 
LatestMetrics { get; init; } +} + +public sealed record AbVersion +{ + public required Guid ReleaseId { get; init; } + public required string ReleaseName { get; init; } + public required string Variant { get; init; } // "control" or "treatment" + public required ImmutableArray<AbComponent> Components { get; init; } + public required ImmutableArray<Guid> TargetIds { get; init; } +} + +public sealed record AbComponent +{ + public required string Name { get; init; } + public required string Digest { get; init; } + public string? Endpoint { get; init; } +} + +public enum AbReleaseStatus +{ + Draft, + Deploying, + Active, + Paused, + Promoting, + RollingBack, + Completed, + Failed +} + +public sealed record AbDecision +{ + public required AbDecisionType Type { get; init; } + public required string Reason { get; init; } + public required Guid DecidedBy { get; init; } + public required DateTimeOffset DecidedAt { get; init; } + public AbMetrics? MetricsAtDecision { get; init; } +} + +public enum AbDecisionType +{ + PromoteTreatment, + KeepControl, + ExtendExperiment +} + +public sealed record AbMetrics +{ + public required DateTimeOffset CollectedAt { get; init; } + public required AbVariantMetrics ControlMetrics { get; init; } + public required AbVariantMetrics TreatmentMetrics { get; init; } + public double?
StatisticalSignificance { get; init; } +} + +public sealed record AbVariantMetrics +{ + public required long RequestCount { get; init; } + public required double ErrorRate { get; init; } + public required double LatencyP50 { get; init; } + public required double LatencyP95 { get; init; } + public required double LatencyP99 { get; init; } + public ImmutableDictionary<string, double> CustomMetrics { get; init; } = ImmutableDictionary<string, double>.Empty; +} +``` + +### IAbReleaseManager Interface + +```csharp +namespace StellaOps.ReleaseOrchestrator.Progressive.AbRelease; + +public interface IAbReleaseManager +{ + Task<AbRelease> CreateAsync( + CreateAbReleaseRequest request, + CancellationToken ct = default); + + Task<AbRelease> StartAsync( + Guid id, + CancellationToken ct = default); + + Task<AbRelease> UpdateWeightsAsync( + Guid id, + int controlWeight, + int treatmentWeight, + CancellationToken ct = default); + + Task<AbRelease> PauseAsync( + Guid id, + string? reason = null, + CancellationToken ct = default); + + Task<AbRelease> ResumeAsync( + Guid id, + CancellationToken ct = default); + + Task<AbRelease> PromoteAsync( + Guid id, + AbDecision decision, + CancellationToken ct = default); + + Task<AbRelease> RollbackAsync( + Guid id, + string reason, + CancellationToken ct = default); + + Task<AbRelease?> GetAsync( + Guid id, + CancellationToken ct = default); + + Task<IReadOnlyList<AbRelease>> ListAsync( + AbReleaseFilter? filter = null, + CancellationToken ct = default); + + Task<AbMetrics> GetLatestMetricsAsync( + Guid id, + CancellationToken ct = default); +} + +public sealed record CreateAbReleaseRequest +{ + public required Guid EnvironmentId { get; init; } + public required Guid ControlReleaseId { get; init; } + public required Guid TreatmentReleaseId { get; init; } + public int InitialControlWeight { get; init; } = 90; + public int InitialTreatmentWeight { get; init; } = 10; + public IReadOnlyList<Guid>? ControlTargetIds { get; init; } + public IReadOnlyList<Guid>?
TreatmentTargetIds { get; init; } + public AbRoutingMode RoutingMode { get; init; } = AbRoutingMode.Weighted; +} + +public enum AbRoutingMode +{ + Weighted, // Random distribution by weight + HeaderBased, // Route by header value + CookieBased, // Route by cookie value + UserIdBased // Route by user ID hash +} + +public sealed record AbReleaseFilter +{ + public Guid? EnvironmentId { get; init; } + public AbReleaseStatus? Status { get; init; } + public DateTimeOffset? FromDate { get; init; } + public DateTimeOffset? ToDate { get; init; } +} +``` + +### AbReleaseManager Implementation + +```csharp +namespace StellaOps.ReleaseOrchestrator.Progressive.AbRelease; + +public sealed class AbReleaseManager : IAbReleaseManager +{ + private readonly IAbReleaseStore _store; + private readonly IReleaseManager _releaseManager; + private readonly IDeployOrchestrator _deployOrchestrator; + private readonly ITrafficRouter _trafficRouter; + private readonly AbMetricsCollector _metricsCollector; + private readonly IEventPublisher _eventPublisher; + private readonly TimeProvider _timeProvider; + private readonly IGuidGenerator _guidGenerator; + private readonly ITenantContext _tenantContext; + private readonly IUserContext _userContext; + private readonly ILogger<AbReleaseManager> _logger; + + public async Task<AbRelease> CreateAsync( + CreateAbReleaseRequest request, + CancellationToken ct = default) + { + var controlRelease = await _releaseManager.GetAsync(request.ControlReleaseId, ct) + ?? throw new ReleaseNotFoundException(request.ControlReleaseId); + + var treatmentRelease = await _releaseManager.GetAsync(request.TreatmentReleaseId, ct) + ??
throw new ReleaseNotFoundException(request.TreatmentReleaseId); + + ValidateWeights(request.InitialControlWeight, request.InitialTreatmentWeight); + + var abRelease = new AbRelease + { + Id = _guidGenerator.NewGuid(), + TenantId = _tenantContext.TenantId, + EnvironmentId = request.EnvironmentId, + EnvironmentName = controlRelease.Components.First().Config.GetValueOrDefault("environment", ""), + Control = new AbVersion + { + ReleaseId = controlRelease.Id, + ReleaseName = controlRelease.Name, + Variant = "control", + Components = controlRelease.Components.Select(c => new AbComponent + { + Name = c.ComponentName, + Digest = c.Digest + }).ToImmutableArray(), + TargetIds = request.ControlTargetIds?.ToImmutableArray() ?? ImmutableArray<Guid>.Empty + }, + Treatment = new AbVersion + { + ReleaseId = treatmentRelease.Id, + ReleaseName = treatmentRelease.Name, + Variant = "treatment", + Components = treatmentRelease.Components.Select(c => new AbComponent + { + Name = c.ComponentName, + Digest = c.Digest + }).ToImmutableArray(), + TargetIds = request.TreatmentTargetIds?.ToImmutableArray() ??
ImmutableArray<Guid>.Empty + }, + ControlWeight = request.InitialControlWeight, + TreatmentWeight = request.InitialTreatmentWeight, + Status = AbReleaseStatus.Draft, + CreatedAt = _timeProvider.GetUtcNow(), + CreatedBy = _userContext.UserId + }; + + await _store.SaveAsync(abRelease, ct); + + await _eventPublisher.PublishAsync(new AbReleaseCreated( + abRelease.Id, + abRelease.Control.ReleaseName, + abRelease.Treatment.ReleaseName, + abRelease.EnvironmentId, + _timeProvider.GetUtcNow() + ), ct); + + _logger.LogInformation( + "Created A/B release {Id}: control={Control}, treatment={Treatment}", + abRelease.Id, + controlRelease.Name, + treatmentRelease.Name); + + return abRelease; + } + + public async Task<AbRelease> StartAsync(Guid id, CancellationToken ct = default) + { + var abRelease = await GetRequiredAsync(id, ct); + + if (abRelease.Status != AbReleaseStatus.Draft) + { + throw new InvalidAbReleaseStateException(id, abRelease.Status, "Cannot start - not in Draft status"); + } + + // Deploy both versions + abRelease = abRelease with { Status = AbReleaseStatus.Deploying }; + await _store.SaveAsync(abRelease, ct); + + try + { + // Deploy control version to control targets + await DeployVersionAsync(abRelease.Control, abRelease.EnvironmentId, ct); + + // Deploy treatment version to treatment targets + await DeployVersionAsync(abRelease.Treatment, abRelease.EnvironmentId, ct); + + // Configure traffic routing + await _trafficRouter.ApplyAsync(new RoutingConfig + { + AbReleaseId = abRelease.Id, + ControlEndpoints = abRelease.Control.Components.Select(c => c.Endpoint ?? "").ToList(), + TreatmentEndpoints = abRelease.Treatment.Components.Select(c => c.Endpoint ??
"").ToList(), + ControlWeight = abRelease.ControlWeight, + TreatmentWeight = abRelease.TreatmentWeight + }, ct); + + abRelease = abRelease with + { + Status = AbReleaseStatus.Active, + StartedAt = _timeProvider.GetUtcNow() + }; + await _store.SaveAsync(abRelease, ct); + + await _eventPublisher.PublishAsync(new AbReleaseStarted( + abRelease.Id, + abRelease.ControlWeight, + abRelease.TreatmentWeight, + _timeProvider.GetUtcNow() + ), ct); + + _logger.LogInformation( + "Started A/B release {Id}: {ControlWeight}% control, {TreatmentWeight}% treatment", + id, + abRelease.ControlWeight, + abRelease.TreatmentWeight); + + return abRelease; + } + catch (Exception ex) + { + _logger.LogError(ex, "Failed to start A/B release {Id}", id); + + abRelease = abRelease with + { + Status = AbReleaseStatus.Failed, + Decision = new AbDecision + { + Type = AbDecisionType.KeepControl, + Reason = $"Deployment failed: {ex.Message}", + DecidedBy = _userContext.UserId, + DecidedAt = _timeProvider.GetUtcNow() + } + }; + await _store.SaveAsync(abRelease, ct); + + throw; + } + } + + public async Task<AbRelease> UpdateWeightsAsync( + Guid id, + int controlWeight, + int treatmentWeight, + CancellationToken ct = default) + { + var abRelease = await GetRequiredAsync(id, ct); + + if (abRelease.Status != AbReleaseStatus.Active && abRelease.Status != AbReleaseStatus.Paused) + { + throw new InvalidAbReleaseStateException(id, abRelease.Status, "Cannot update weights - not active or paused"); + } + + ValidateWeights(controlWeight, treatmentWeight); + + // Update traffic routing + await _trafficRouter.ApplyAsync(new RoutingConfig + { + AbReleaseId = abRelease.Id, + ControlEndpoints = abRelease.Control.Components.Select(c => c.Endpoint ?? "").ToList(), + TreatmentEndpoints = abRelease.Treatment.Components.Select(c => c.Endpoint ??
"").ToList(), + ControlWeight = controlWeight, + TreatmentWeight = treatmentWeight + }, ct); + + abRelease = abRelease with + { + ControlWeight = controlWeight, + TreatmentWeight = treatmentWeight + }; + await _store.SaveAsync(abRelease, ct); + + await _eventPublisher.PublishAsync(new AbReleaseWeightsUpdated( + abRelease.Id, + controlWeight, + treatmentWeight, + _timeProvider.GetUtcNow() + ), ct); + + _logger.LogInformation( + "Updated A/B release {Id} weights: {ControlWeight}% control, {TreatmentWeight}% treatment", + id, + controlWeight, + treatmentWeight); + + return abRelease; + } + + public async Task<AbRelease> PromoteAsync( + Guid id, + AbDecision decision, + CancellationToken ct = default) + { + var abRelease = await GetRequiredAsync(id, ct); + + if (abRelease.Status != AbReleaseStatus.Active) + { + throw new InvalidAbReleaseStateException(id, abRelease.Status, "Cannot promote - not active"); + } + + abRelease = abRelease with { Status = AbReleaseStatus.Promoting }; + await _store.SaveAsync(abRelease, ct); + + try + { + var winningRelease = decision.Type == AbDecisionType.PromoteTreatment + ? abRelease.Treatment + : abRelease.Control; + + // Route 100% traffic to winner + await _trafficRouter.ApplyAsync(new RoutingConfig + { + AbReleaseId = abRelease.Id, + ControlEndpoints = winningRelease.Components.Select(c => c.Endpoint ??
"").ToList(), + TreatmentEndpoints = [], + ControlWeight = 100, + TreatmentWeight = 0 + }, ct); + + // Collect final metrics + var finalMetrics = await _metricsCollector.CollectAsync(abRelease, ct); + + abRelease = abRelease with + { + Status = AbReleaseStatus.Completed, + CompletedAt = _timeProvider.GetUtcNow(), + Decision = decision with { MetricsAtDecision = finalMetrics }, + LatestMetrics = finalMetrics + }; + await _store.SaveAsync(abRelease, ct); + + await _eventPublisher.PublishAsync(new AbReleaseCompleted( + abRelease.Id, + decision.Type, + winningRelease.ReleaseName, + _timeProvider.GetUtcNow() + ), ct); + + _logger.LogInformation( + "A/B release {Id} completed: winner={Winner}", + id, + winningRelease.ReleaseName); + + return abRelease; + } + catch (Exception ex) + { + _logger.LogError(ex, "Failed to promote A/B release {Id}", id); + + abRelease = abRelease with { Status = AbReleaseStatus.Active }; + await _store.SaveAsync(abRelease, ct); + + throw; + } + } + + public async Task<AbRelease> RollbackAsync( + Guid id, + string reason, + CancellationToken ct = default) + { + var abRelease = await GetRequiredAsync(id, ct); + + if (abRelease.Status != AbReleaseStatus.Active && abRelease.Status != AbReleaseStatus.Promoting) + { + throw new InvalidAbReleaseStateException(id, abRelease.Status, "Cannot rollback"); + } + + abRelease = abRelease with { Status = AbReleaseStatus.RollingBack }; + await _store.SaveAsync(abRelease, ct); + + // Route 100% to control + await _trafficRouter.ApplyAsync(new RoutingConfig + { + AbReleaseId = abRelease.Id, + ControlEndpoints = abRelease.Control.Components.Select(c => c.Endpoint ??
"").ToList(), + TreatmentEndpoints = [], + ControlWeight = 100, + TreatmentWeight = 0 + }, ct); + + abRelease = abRelease with + { + Status = AbReleaseStatus.Completed, + CompletedAt = _timeProvider.GetUtcNow(), + Decision = new AbDecision + { + Type = AbDecisionType.KeepControl, + Reason = $"Rollback: {reason}", + DecidedBy = _userContext.UserId, + DecidedAt = _timeProvider.GetUtcNow() + } + }; + await _store.SaveAsync(abRelease, ct); + + _logger.LogInformation("Rolled back A/B release {Id}: {Reason}", id, reason); + + return abRelease; + } + + public async Task<AbMetrics> GetLatestMetricsAsync(Guid id, CancellationToken ct = default) + { + var abRelease = await GetRequiredAsync(id, ct); + return await _metricsCollector.CollectAsync(abRelease, ct); + } + + private async Task DeployVersionAsync(AbVersion version, Guid environmentId, CancellationToken ct) + { + // Create a deployment for this version + // Implementation depends on target assignment strategy + } + + private async Task<AbRelease> GetRequiredAsync(Guid id, CancellationToken ct) + { + return await _store.GetAsync(id, ct) + ??
throw new AbReleaseNotFoundException(id); + } + + private static void ValidateWeights(int controlWeight, int treatmentWeight) + { + if (controlWeight < 0 || treatmentWeight < 0) + { + throw new InvalidWeightException("Weights must be non-negative"); + } + + if (controlWeight + treatmentWeight != 100) + { + throw new InvalidWeightException("Weights must sum to 100"); + } + } +} +``` + +--- + +## Acceptance Criteria + +- [ ] Create A/B release with control and treatment versions +- [ ] Validate weights sum to 100 +- [ ] Deploy both versions on start +- [ ] Configure traffic routing +- [ ] Update weights dynamically +- [ ] Pause and resume experiments +- [ ] Promote treatment version +- [ ] Rollback to control version +- [ ] Collect metrics for both variants +- [ ] Unit test coverage >=85% + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 107_005 Deployment Strategies | Internal | TODO | +| 104_003 Release Manager | Internal | TODO | +| 110_002 Traffic Router Framework | Internal | TODO | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| IAbReleaseManager | TODO | | +| AbReleaseManager | TODO | | +| AbRelease model | TODO | | +| AbDecision model | TODO | | +| AbMetrics model | TODO | | +| AbReleaseStore | TODO | | +| AbMetricsCollector | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_110_002_PROGDL_traffic_router.md b/docs/implplan/SPRINT_20260110_110_002_PROGDL_traffic_router.md new file mode 100644 index 000000000..5d843b144 --- /dev/null +++ b/docs/implplan/SPRINT_20260110_110_002_PROGDL_traffic_router.md @@ -0,0 +1,520 @@ +# SPRINT: Traffic Router Framework + +> **Sprint ID:** 110_002 +> **Module:** PROGDL +> **Phase:** 10 - Progressive Delivery +> **Status:** TODO +> **Parent:** 
[110_000_INDEX](SPRINT_20260110_110_000_INDEX_progressive_delivery.md) + +--- + +## Overview + +Implement the Traffic Router Framework providing abstractions for traffic splitting across load balancers. + +### Objectives + +- Define traffic router interface for plugins +- Support weighted routing (percentage-based) +- Support header-based routing +- Support cookie-based routing +- Track routing state and transitions + +### Working Directory + +``` +src/ReleaseOrchestrator/ +├── __Libraries/ +│ └── StellaOps.ReleaseOrchestrator.Progressive/ +│ └── Router/ +│ ├── ITrafficRouter.cs +│ ├── TrafficRouterRegistry.cs +│ ├── RoutingConfig.cs +│ ├── Strategies/ +│ │ ├── WeightedRouting.cs +│ │ ├── HeaderRouting.cs +│ │ └── CookieRouting.cs +│ └── Store/ +│ ├── IRoutingStateStore.cs +│ └── RoutingStateStore.cs +└── __Tests/ +``` + +--- + +## Deliverables + +### ITrafficRouter Interface + +```csharp +namespace StellaOps.ReleaseOrchestrator.Progressive.Router; + +public interface ITrafficRouter +{ + string RouterType { get; } + IReadOnlyList<RoutingStrategy> SupportedStrategies { get; } + + Task<bool> IsAvailableAsync(CancellationToken ct = default); + + Task ApplyAsync( + RoutingConfig config, + CancellationToken ct = default); + + Task<RoutingConfig?> GetCurrentAsync( + Guid contextId, + CancellationToken ct = default); + + Task RemoveAsync( + Guid contextId, + CancellationToken ct = default); + + Task<bool> HealthCheckAsync(CancellationToken ct = default); + + Task<RouterMetrics?> GetMetricsAsync( + Guid contextId, + CancellationToken ct = default); +} + +public sealed record RoutingConfig +{ + public required Guid AbReleaseId { get; init; } + public required IReadOnlyList<string> ControlEndpoints { get; init; } + public required IReadOnlyList<string> TreatmentEndpoints { get; init; } + public required int ControlWeight { get; init; } + public required int TreatmentWeight { get; init; } + public RoutingStrategy Strategy { get; init; } = RoutingStrategy.Weighted; + public HeaderRoutingConfig? HeaderRouting { get; init; } + public CookieRoutingConfig?
CookieRouting { get; init; } + public IReadOnlyDictionary<string, string> Metadata { get; init; } = new Dictionary<string, string>(); +} + +public enum RoutingStrategy +{ + Weighted, + HeaderBased, + CookieBased, + Combined +} + +public sealed record HeaderRoutingConfig +{ + public required string HeaderName { get; init; } + public required string TreatmentValue { get; init; } + public bool FallbackToWeighted { get; init; } = true; +} + +public sealed record CookieRoutingConfig +{ + public required string CookieName { get; init; } + public required string TreatmentValue { get; init; } + public bool FallbackToWeighted { get; init; } = true; +} + +public sealed record RouterMetrics +{ + public required long ControlRequests { get; init; } + public required long TreatmentRequests { get; init; } + public required double ControlErrorRate { get; init; } + public required double TreatmentErrorRate { get; init; } + public required double ControlLatencyP50 { get; init; } + public required double TreatmentLatencyP50 { get; init; } + public required DateTimeOffset CollectedAt { get; init; } +} +``` + +### TrafficRouterRegistry + +```csharp +namespace StellaOps.ReleaseOrchestrator.Progressive.Router; + +public sealed class TrafficRouterRegistry +{ + private readonly Dictionary<string, ITrafficRouter> _routers = new(); + private readonly ILogger<TrafficRouterRegistry> _logger; + + public void Register(ITrafficRouter router) + { + if (_routers.ContainsKey(router.RouterType)) + { + throw new RouterAlreadyRegisteredException(router.RouterType); + } + + _routers[router.RouterType] = router; + _logger.LogInformation( + "Registered traffic router: {Type} with strategies: {Strategies}", + router.RouterType, + string.Join(", ", router.SupportedStrategies)); + } + + public ITrafficRouter? Get(string routerType) + { + return _routers.TryGetValue(routerType, out var router) ? router : null; + } + + public ITrafficRouter GetRequired(string routerType) + { + return Get(routerType) + ??
throw new RouterNotFoundException(routerType); + } + + public IReadOnlyList<RouterInfo> GetAvailable() + { + return _routers.Values.Select(r => new RouterInfo + { + Type = r.RouterType, + SupportedStrategies = r.SupportedStrategies + }).ToList().AsReadOnly(); + } + + public async Task<IReadOnlyList<RouterHealthStatus>> CheckHealthAsync(CancellationToken ct = default) + { + var results = new List<RouterHealthStatus>(); + + foreach (var (type, router) in _routers) + { + try + { + var isHealthy = await router.HealthCheckAsync(ct); + results.Add(new RouterHealthStatus(type, isHealthy, null)); + } + catch (Exception ex) + { + results.Add(new RouterHealthStatus(type, false, ex.Message)); + } + } + + return results.AsReadOnly(); + } +} + +public sealed record RouterInfo +{ + public required string Type { get; init; } + public required IReadOnlyList<string> SupportedStrategies { get; init; } +} + +public sealed record RouterHealthStatus( + string RouterType, + bool IsHealthy, + string? Error +); +``` + +### RoutingStateStore + +```csharp +namespace StellaOps.ReleaseOrchestrator.Progressive.Router.Store; + +public interface IRoutingStateStore +{ + Task SaveAsync(RoutingState state, CancellationToken ct = default); + Task<RoutingState?> GetAsync(Guid contextId, CancellationToken ct = default); + Task<IReadOnlyList<RoutingState>> ListActiveAsync(CancellationToken ct = default); + Task DeleteAsync(Guid contextId, CancellationToken ct = default); + Task<IReadOnlyList<RoutingTransition>> GetHistoryAsync(Guid contextId, CancellationToken ct = default); +} + +public sealed record RoutingState +{ + public required Guid ContextId { get; init; } + public required string RouterType { get; init; } + public required RoutingConfig Config { get; init; } + public required RoutingStateStatus Status { get; init; } + public required DateTimeOffset AppliedAt { get; init; } + public required DateTimeOffset LastVerifiedAt { get; init; } + public string? 
Error { get; init; } +} + +public enum RoutingStateStatus +{ + Pending, + Applied, + Verified, + Failed, + Removed +} + +public sealed record RoutingTransition +{ + public required Guid ContextId { get; init; } + public required RoutingConfig FromConfig { get; init; } + public required RoutingConfig ToConfig { get; init; } + public required string Reason { get; init; } + public required Guid TriggeredBy { get; init; } + public required DateTimeOffset TransitionedAt { get; init; } +} +``` + +### WeightedRouting Strategy + +```csharp +namespace StellaOps.ReleaseOrchestrator.Progressive.Router.Strategies; + +public sealed class WeightedRouting +{ + public static UpstreamConfig Generate(RoutingConfig config) + { + var controlUpstream = new UpstreamDefinition + { + Name = $"control-{config.AbReleaseId:N}", + Servers = config.ControlEndpoints.Select(e => new UpstreamServer + { + Address = e, + Weight = config.ControlWeight + }).ToList() + }; + + var treatmentUpstream = new UpstreamDefinition + { + Name = $"treatment-{config.AbReleaseId:N}", + Servers = config.TreatmentEndpoints.Select(e => new UpstreamServer + { + Address = e, + Weight = config.TreatmentWeight + }).ToList() + }; + + return new UpstreamConfig + { + ContextId = config.AbReleaseId, + Upstreams = new[] { controlUpstream, treatmentUpstream }.ToList(), + DefaultUpstream = controlUpstream.Name, + SplitConfig = new SplitConfig + { + ControlUpstream = controlUpstream.Name, + TreatmentUpstream = treatmentUpstream.Name, + ControlWeight = config.ControlWeight, + TreatmentWeight = config.TreatmentWeight + } + }; + } +} + +public sealed record UpstreamConfig +{ + public required Guid ContextId { get; init; } + public required IReadOnlyList<UpstreamDefinition> Upstreams { get; init; } + public required string DefaultUpstream { get; init; } + public SplitConfig? SplitConfig { get; init; } + public HeaderMatchConfig? HeaderConfig { get; init; } + public CookieMatchConfig? 
CookieConfig { get; init; } +} + +public sealed record UpstreamDefinition +{ + public required string Name { get; init; } + public required IReadOnlyList<UpstreamServer> Servers { get; init; } +} + +public sealed record UpstreamServer +{ + public required string Address { get; init; } + public int Weight { get; init; } = 1; + public int MaxFails { get; init; } = 3; + public TimeSpan FailTimeout { get; init; } = TimeSpan.FromSeconds(30); +} + +public sealed record SplitConfig +{ + public required string ControlUpstream { get; init; } + public required string TreatmentUpstream { get; init; } + public required int ControlWeight { get; init; } + public required int TreatmentWeight { get; init; } +} +``` + +### HeaderRouting Strategy + +```csharp +namespace StellaOps.ReleaseOrchestrator.Progressive.Router.Strategies; + +public sealed class HeaderRouting +{ + public static HeaderMatchConfig Generate(RoutingConfig config) + { + if (config.HeaderRouting is null) + { + throw new InvalidOperationException("Header routing config is required"); + } + + return new HeaderMatchConfig + { + HeaderName = config.HeaderRouting.HeaderName, + Matches = new[] + { + new HeaderMatch + { + Value = config.HeaderRouting.TreatmentValue, + Upstream = $"treatment-{config.AbReleaseId:N}" + } + }.ToList(), + DefaultUpstream = $"control-{config.AbReleaseId:N}", + FallbackToWeighted = config.HeaderRouting.FallbackToWeighted + }; + } +} + +public sealed record HeaderMatchConfig +{ + public required string HeaderName { get; init; } + public required IReadOnlyList<HeaderMatch> Matches { get; init; } + public required string DefaultUpstream { get; init; } + public bool FallbackToWeighted { get; init; } +} + +public sealed record HeaderMatch +{ + public required string Value { get; init; } + public required string Upstream { get; init; } +} +``` + +### CookieRouting Strategy + +```csharp +namespace StellaOps.ReleaseOrchestrator.Progressive.Router.Strategies; + +public sealed class CookieRouting +{ + public static CookieMatchConfig 
Generate(RoutingConfig config) + { + if (config.CookieRouting is null) + { + throw new InvalidOperationException("Cookie routing config is required"); + } + + return new CookieMatchConfig + { + CookieName = config.CookieRouting.CookieName, + Matches = new[] + { + new CookieMatch + { + Value = config.CookieRouting.TreatmentValue, + Upstream = $"treatment-{config.AbReleaseId:N}" + } + }.ToList(), + DefaultUpstream = $"control-{config.AbReleaseId:N}", + FallbackToWeighted = config.CookieRouting.FallbackToWeighted, + SetCookieOnFirstRequest = true + }; + } +} + +public sealed record CookieMatchConfig +{ + public required string CookieName { get; init; } + public required IReadOnlyList<CookieMatch> Matches { get; init; } + public required string DefaultUpstream { get; init; } + public bool FallbackToWeighted { get; init; } + public bool SetCookieOnFirstRequest { get; init; } + public TimeSpan CookieExpiry { get; init; } = TimeSpan.FromDays(30); +} + +public sealed record CookieMatch +{ + public required string Value { get; init; } + public required string Upstream { get; init; } +} +``` + +### Routing Config Validator + +```csharp +namespace StellaOps.ReleaseOrchestrator.Progressive.Router; + +public static class RoutingConfigValidator +{ + public static ValidationResult Validate(RoutingConfig config) + { + var errors = new List<string>(); + + // Validate weights + if (config.ControlWeight + config.TreatmentWeight != 100) + { + errors.Add("Weights must sum to 100"); + } + + if (config.ControlWeight < 0 || config.TreatmentWeight < 0) + { + errors.Add("Weights must be non-negative"); + } + + // Validate endpoints + if (config.ControlEndpoints.Count == 0 && config.ControlWeight > 0) + { + errors.Add("Control endpoints required when control weight > 0"); + } + + if (config.TreatmentEndpoints.Count == 0 && config.TreatmentWeight > 0) + { + errors.Add("Treatment endpoints required when treatment weight > 0"); + } + + // Validate strategy-specific config + if (config.Strategy == 
RoutingStrategy.HeaderBased && config.HeaderRouting is null) + { + errors.Add("Header routing config required for header-based strategy"); + } + + if (config.Strategy == RoutingStrategy.CookieBased && config.CookieRouting is null) + { + errors.Add("Cookie routing config required for cookie-based strategy"); + } + + return new ValidationResult(errors.Count == 0, errors.AsReadOnly()); + } +} + +public sealed record ValidationResult( + bool IsValid, + IReadOnlyList<string> Errors +); +``` + +--- + +## Acceptance Criteria + +- [ ] Define traffic router interface +- [ ] Register and discover routers +- [ ] Generate weighted routing config +- [ ] Generate header-based routing config +- [ ] Generate cookie-based routing config +- [ ] Validate routing configurations +- [ ] Store routing state transitions +- [ ] Query active routing states +- [ ] Health check router implementations +- [ ] Unit test coverage >=85% + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 110_001 A/B Release Manager | Internal | TODO | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| ITrafficRouter | TODO | | +| TrafficRouterRegistry | TODO | | +| RoutingConfig | TODO | | +| WeightedRouting | TODO | | +| HeaderRouting | TODO | | +| CookieRouting | TODO | | +| RoutingStateStore | TODO | | +| RoutingConfigValidator | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_110_003_PROGDL_canary_controller.md b/docs/implplan/SPRINT_20260110_110_003_PROGDL_canary_controller.md new file mode 100644 index 000000000..ae423a0c4 --- /dev/null +++ b/docs/implplan/SPRINT_20260110_110_003_PROGDL_canary_controller.md @@ -0,0 +1,702 @@ +# SPRINT: Canary Controller + +> **Sprint ID:** 110_003 +> **Module:** PROGDL +> **Phase:** 10 - Progressive Delivery +> **Status:** TODO +> **Parent:** 
[110_000_INDEX](SPRINT_20260110_110_000_INDEX_progressive_delivery.md) + +--- + +## Overview + +Implement the Canary Controller for gradual traffic promotion with automatic rollback based on metrics. + +### Objectives + +- Define canary progression steps +- Auto-advance based on metrics analysis +- Auto-rollback on metric threshold breach +- Manual intervention support +- Configurable promotion schedules + +### Working Directory + +``` +src/ReleaseOrchestrator/ +├── __Libraries/ +│ └── StellaOps.ReleaseOrchestrator.Progressive/ +│ └── Canary/ +│ ├── ICanaryController.cs +│ ├── CanaryController.cs +│ ├── CanaryProgressionEngine.cs +│ ├── CanaryMetricsAnalyzer.cs +│ ├── AutoRollback.cs +│ └── Models/ +│ ├── CanaryRelease.cs +│ ├── CanaryStep.cs +│ └── CanaryConfig.cs +└── __Tests/ +``` + +--- + +## Deliverables + +### CanaryRelease Model + +```csharp +namespace StellaOps.ReleaseOrchestrator.Progressive.Canary.Models; + +public sealed record CanaryRelease +{ + public required Guid Id { get; init; } + public required Guid TenantId { get; init; } + public required Guid ReleaseId { get; init; } + public required string ReleaseName { get; init; } + public required Guid EnvironmentId { get; init; } + public required string EnvironmentName { get; init; } + public required CanaryConfig Config { get; init; } + public required ImmutableArray<CanaryStep> Steps { get; init; } + public required int CurrentStepIndex { get; init; } + public required CanaryStatus Status { get; init; } + public required DateTimeOffset CreatedAt { get; init; } + public required Guid CreatedBy { get; init; } + public DateTimeOffset? StartedAt { get; init; } + public DateTimeOffset? CurrentStepStartedAt { get; init; } + public DateTimeOffset? CompletedAt { get; init; } + public CanaryRollbackInfo? RollbackInfo { get; init; } + public CanaryMetrics? 
LatestMetrics { get; init; } +} + +public enum CanaryStatus +{ + Pending, + Running, + WaitingForMetrics, + Advancing, + Paused, + RollingBack, + Completed, + Failed, + Cancelled +} + +public sealed record CanaryConfig +{ + public required ImmutableArray<CanaryStepConfig> StepConfigs { get; init; } + public required CanaryMetricThresholds Thresholds { get; init; } + public TimeSpan MetricsWindowDuration { get; init; } = TimeSpan.FromMinutes(5); + public TimeSpan MinStepDuration { get; init; } = TimeSpan.FromMinutes(10); + public bool AutoAdvance { get; init; } = true; + public bool AutoRollback { get; init; } = true; + public int MetricCheckIntervalSeconds { get; init; } = 60; +} + +public sealed record CanaryStepConfig +{ + public required int StepIndex { get; init; } + public required int TrafficPercentage { get; init; } + public TimeSpan? MinDuration { get; init; } + public TimeSpan? MaxDuration { get; init; } + public bool RequireManualApproval { get; init; } +} + +public sealed record CanaryStep +{ + public required int Index { get; init; } + public required int TrafficPercentage { get; init; } + public required CanaryStepStatus Status { get; init; } + public DateTimeOffset? StartedAt { get; init; } + public DateTimeOffset? CompletedAt { get; init; } + public CanaryMetrics? MetricsAtStart { get; init; } + public CanaryMetrics? MetricsAtEnd { get; init; } + public string? Notes { get; init; } +} + +public enum CanaryStepStatus +{ + Pending, + Running, + WaitingApproval, + Completed, + Skipped, + Failed +} + +public sealed record CanaryMetricThresholds +{ + public double MaxErrorRate { get; init; } = 0.05; // 5% + public double MaxLatencyP99Ms { get; init; } = 1000; // 1 second + public double? MaxLatencyP95Ms { get; init; } + public double? 
MaxLatencyP50Ms { get; init; } + public double MinSuccessRate { get; init; } = 0.95; // 95% + public ImmutableDictionary<string, double> CustomThresholds { get; init; } = ImmutableDictionary<string, double>.Empty; +} + +public sealed record CanaryMetrics +{ + public required DateTimeOffset CollectedAt { get; init; } + public required TimeSpan WindowDuration { get; init; } + public required long RequestCount { get; init; } + public required double ErrorRate { get; init; } + public required double SuccessRate { get; init; } + public required double LatencyP50Ms { get; init; } + public required double LatencyP95Ms { get; init; } + public required double LatencyP99Ms { get; init; } + public ImmutableDictionary<string, double> CustomMetrics { get; init; } = ImmutableDictionary<string, double>.Empty; +} + +public sealed record CanaryRollbackInfo +{ + public required string Reason { get; init; } + public required bool WasAutomatic { get; init; } + public required int RolledBackFromStep { get; init; } + public required CanaryMetrics? MetricsAtRollback { get; init; } + public required DateTimeOffset RolledBackAt { get; init; } + public Guid? TriggeredBy { get; init; } +} +``` + +### ICanaryController Interface + +```csharp +namespace StellaOps.ReleaseOrchestrator.Progressive.Canary; + +public interface ICanaryController +{ + Task<CanaryRelease> StartAsync( + Guid releaseId, + CanaryConfig config, + CancellationToken ct = default); + + Task<CanaryRelease> AdvanceAsync( + Guid canaryId, + CancellationToken ct = default); + + Task<CanaryRelease> PauseAsync( + Guid canaryId, + string? reason = null, + CancellationToken ct = default); + + Task<CanaryRelease> ResumeAsync( + Guid canaryId, + CancellationToken ct = default); + + Task<CanaryRelease> RollbackAsync( + Guid canaryId, + string reason, + CancellationToken ct = default); + + Task<CanaryRelease> CompleteAsync( + Guid canaryId, + CancellationToken ct = default); + + Task<CanaryRelease> ApproveStepAsync( + Guid canaryId, + int stepIndex, + string? 
comment = null, + CancellationToken ct = default); + + Task<CanaryRelease?> GetAsync( + Guid canaryId, + CancellationToken ct = default); + + Task<IReadOnlyList<CanaryRelease>> ListActiveAsync( + Guid? environmentId = null, + CancellationToken ct = default); + + Task<CanaryMetrics> GetCurrentMetricsAsync( + Guid canaryId, + CancellationToken ct = default); +} +``` + +### CanaryController Implementation + +```csharp +namespace StellaOps.ReleaseOrchestrator.Progressive.Canary; + +public sealed class CanaryController : ICanaryController +{ + private readonly ICanaryStore _store; + private readonly IReleaseManager _releaseManager; + private readonly ITrafficRouter _trafficRouter; + private readonly CanaryProgressionEngine _progressionEngine; + private readonly CanaryMetricsAnalyzer _metricsAnalyzer; + private readonly IEventPublisher _eventPublisher; + private readonly TimeProvider _timeProvider; + private readonly IGuidGenerator _guidGenerator; + private readonly ITenantContext _tenantContext; + private readonly IUserContext _userContext; + private readonly ILogger<CanaryController> _logger; + + public async Task<CanaryRelease> StartAsync( + Guid releaseId, + CanaryConfig config, + CancellationToken ct = default) + { + var release = await _releaseManager.GetAsync(releaseId, ct) + ?? throw new ReleaseNotFoundException(releaseId); + + ValidateConfig(config); + + var steps = BuildSteps(config); + + var canary = new CanaryRelease + { + Id = _guidGenerator.NewGuid(), + TenantId = _tenantContext.TenantId, + ReleaseId = release.Id, + ReleaseName = release.Name, + EnvironmentId = release.EnvironmentId ?? Guid.Empty, + EnvironmentName = release.EnvironmentName ?? 
"", + Config = config, + Steps = steps, + CurrentStepIndex = 0, + Status = CanaryStatus.Pending, + CreatedAt = _timeProvider.GetUtcNow(), + CreatedBy = _userContext.UserId + }; + + await _store.SaveAsync(canary, ct); + + _logger.LogInformation( + "Created canary release {Id} for release {Release} with {StepCount} steps", + canary.Id, + release.Name, + steps.Length); + + // Start first step + return await StartStepAsync(canary, 0, ct); + } + + public async Task AdvanceAsync( + Guid canaryId, + CancellationToken ct = default) + { + var canary = await GetRequiredAsync(canaryId, ct); + + if (canary.Status != CanaryStatus.Running && canary.Status != CanaryStatus.WaitingForMetrics) + { + throw new InvalidCanaryStateException(canaryId, canary.Status, "Cannot advance"); + } + + var currentStep = canary.Steps[canary.CurrentStepIndex]; + if (currentStep.Status == CanaryStepStatus.WaitingApproval) + { + throw new CanaryStepAwaitingApprovalException(canaryId, canary.CurrentStepIndex); + } + + // Check if there are more steps + var nextStepIndex = canary.CurrentStepIndex + 1; + if (nextStepIndex >= canary.Steps.Length) + { + // All steps complete, finalize + return await CompleteAsync(canaryId, ct); + } + + // Collect metrics for current step + var currentMetrics = await _metricsAnalyzer.CollectAsync(canary, ct); + + // Update current step as completed + var updatedSteps = canary.Steps.SetItem(canary.CurrentStepIndex, currentStep with + { + Status = CanaryStepStatus.Completed, + CompletedAt = _timeProvider.GetUtcNow(), + MetricsAtEnd = currentMetrics + }); + + canary = canary with + { + Steps = updatedSteps, + LatestMetrics = currentMetrics + }; + + await _store.SaveAsync(canary, ct); + + // Start next step + return await StartStepAsync(canary, nextStepIndex, ct); + } + + public async Task RollbackAsync( + Guid canaryId, + string reason, + CancellationToken ct = default) + { + var canary = await GetRequiredAsync(canaryId, ct); + + if (canary.Status == CanaryStatus.Completed || 
canary.Status == CanaryStatus.Failed) + { + throw new InvalidCanaryStateException(canaryId, canary.Status, "Cannot rollback completed canary"); + } + + canary = canary with { Status = CanaryStatus.RollingBack }; + await _store.SaveAsync(canary, ct); + + _logger.LogWarning( + "Rolling back canary {Id} from step {Step}: {Reason}", + canaryId, + canary.CurrentStepIndex, + reason); + + try + { + // Route 100% traffic back to baseline + await _trafficRouter.ApplyAsync(new RoutingConfig + { + AbReleaseId = canary.Id, + ControlEndpoints = await GetBaselineEndpointsAsync(canary, ct), + TreatmentEndpoints = [], + ControlWeight = 100, + TreatmentWeight = 0 + }, ct); + + var currentMetrics = await _metricsAnalyzer.CollectAsync(canary, ct); + + canary = canary with + { + Status = CanaryStatus.Failed, + CompletedAt = _timeProvider.GetUtcNow(), + RollbackInfo = new CanaryRollbackInfo + { + Reason = reason, + WasAutomatic = false, + RolledBackFromStep = canary.CurrentStepIndex, + MetricsAtRollback = currentMetrics, + RolledBackAt = _timeProvider.GetUtcNow(), + TriggeredBy = _userContext.UserId + } + }; + + await _store.SaveAsync(canary, ct); + + await _eventPublisher.PublishAsync(new CanaryRolledBack( + canary.Id, + canary.ReleaseName, + reason, + canary.CurrentStepIndex, + _timeProvider.GetUtcNow() + ), ct); + + return canary; + } + catch (Exception ex) + { + _logger.LogError(ex, "Failed to rollback canary {Id}", canaryId); + throw; + } + } + + public async Task<CanaryRelease> CompleteAsync( + Guid canaryId, + CancellationToken ct = default) + { + var canary = await GetRequiredAsync(canaryId, ct); + + // Allow the same states AdvanceAsync accepts, since it delegates here on the final step + if (canary.Status != CanaryStatus.Running && canary.Status != CanaryStatus.WaitingForMetrics) + { + throw new InvalidCanaryStateException(canaryId, canary.Status, "Cannot complete"); + } + + // Verify we're at 100% traffic + var currentStep = canary.Steps[canary.CurrentStepIndex]; + if (currentStep.TrafficPercentage != 100) + { + throw new CanaryNotAtFullTrafficException(canaryId, currentStep.TrafficPercentage); + } + + var finalMetrics = await 
_metricsAnalyzer.CollectAsync(canary, ct); + + // Mark final step complete + var updatedSteps = canary.Steps.SetItem(canary.CurrentStepIndex, currentStep with + { + Status = CanaryStepStatus.Completed, + CompletedAt = _timeProvider.GetUtcNow(), + MetricsAtEnd = finalMetrics + }); + + canary = canary with + { + Steps = updatedSteps, + Status = CanaryStatus.Completed, + CompletedAt = _timeProvider.GetUtcNow(), + LatestMetrics = finalMetrics + }; + + await _store.SaveAsync(canary, ct); + + await _eventPublisher.PublishAsync(new CanaryCompleted( + canary.Id, + canary.ReleaseName, + canary.Steps.Length, + _timeProvider.GetUtcNow() + ), ct); + + _logger.LogInformation( + "Canary {Id} completed successfully after {StepCount} steps", + canaryId, + canary.Steps.Length); + + return canary; + } + + private async Task<CanaryRelease> StartStepAsync( + CanaryRelease canary, + int stepIndex, + CancellationToken ct) + { + var step = canary.Steps[stepIndex]; + var stepConfig = canary.Config.StepConfigs[stepIndex]; + + _logger.LogInformation( + "Starting canary step {Step} at {Percentage}% traffic", + stepIndex, + step.TrafficPercentage); + + // Apply traffic routing + await _trafficRouter.ApplyAsync(new RoutingConfig + { + AbReleaseId = canary.Id, + ControlEndpoints = await GetBaselineEndpointsAsync(canary, ct), + TreatmentEndpoints = await GetCanaryEndpointsAsync(canary, ct), + ControlWeight = 100 - step.TrafficPercentage, + TreatmentWeight = step.TrafficPercentage + }, ct); + + var startMetrics = await _metricsAnalyzer.CollectAsync(canary, ct); + + var status = stepConfig.RequireManualApproval + ? 
CanaryStepStatus.WaitingApproval + : CanaryStepStatus.Running; + + var updatedSteps = canary.Steps.SetItem(stepIndex, step with + { + Status = status, + StartedAt = _timeProvider.GetUtcNow(), + MetricsAtStart = startMetrics + }); + + canary = canary with + { + Steps = updatedSteps, + CurrentStepIndex = stepIndex, + CurrentStepStartedAt = _timeProvider.GetUtcNow(), + Status = status == CanaryStepStatus.WaitingApproval + ? CanaryStatus.WaitingForMetrics + : CanaryStatus.Running, + StartedAt = canary.StartedAt ?? _timeProvider.GetUtcNow(), + LatestMetrics = startMetrics + }; + + await _store.SaveAsync(canary, ct); + + await _eventPublisher.PublishAsync(new CanaryStepStarted( + canary.Id, + stepIndex, + step.TrafficPercentage, + _timeProvider.GetUtcNow() + ), ct); + + return canary; + } + + private static ImmutableArray<CanaryStep> BuildSteps(CanaryConfig config) + { + return config.StepConfigs.Select(c => new CanaryStep + { + Index = c.StepIndex, + TrafficPercentage = c.TrafficPercentage, + Status = CanaryStepStatus.Pending + }).ToImmutableArray(); + } + + private static void ValidateConfig(CanaryConfig config) + { + if (config.StepConfigs.Length == 0) + { + throw new InvalidCanaryConfigException("At least one step is required"); + } + + var lastPercentage = 0; + foreach (var step in config.StepConfigs.OrderBy(s => s.StepIndex)) + { + if (step.TrafficPercentage <= lastPercentage) + { + throw new InvalidCanaryConfigException("Traffic percentage must increase with each step"); + } + lastPercentage = step.TrafficPercentage; + } + + if (lastPercentage != 100) + { + throw new InvalidCanaryConfigException("Final step must have 100% traffic"); + } + } + + private async Task<CanaryRelease> GetRequiredAsync(Guid id, CancellationToken ct) + { + return await _store.GetAsync(id, ct) + ?? 
throw new CanaryNotFoundException(id); + } + + private Task<IReadOnlyList<string>> GetBaselineEndpointsAsync(CanaryRelease canary, CancellationToken ct) + { + // Implementation to get baseline/stable version endpoints + return Task.FromResult<IReadOnlyList<string>>(new List<string>()); + } + + private Task<IReadOnlyList<string>> GetCanaryEndpointsAsync(CanaryRelease canary, CancellationToken ct) + { + // Implementation to get canary version endpoints + return Task.FromResult<IReadOnlyList<string>>(new List<string>()); + } +} +``` + +### AutoRollback + +```csharp +namespace StellaOps.ReleaseOrchestrator.Progressive.Canary; + +public sealed class AutoRollback : BackgroundService +{ + private readonly ICanaryController _canaryController; + private readonly CanaryMetricsAnalyzer _metricsAnalyzer; + private readonly ICanaryStore _store; + private readonly ILogger<AutoRollback> _logger; + + protected override async Task ExecuteAsync(CancellationToken stoppingToken) + { + _logger.LogInformation("Auto-rollback service started"); + + while (!stoppingToken.IsCancellationRequested) + { + try + { + await CheckActiveCanariesAsync(stoppingToken); + } + catch (Exception ex) + { + _logger.LogError(ex, "Error in auto-rollback check"); + } + + await Task.Delay(TimeSpan.FromSeconds(30), stoppingToken); + } + } + + private async Task CheckActiveCanariesAsync(CancellationToken ct) + { + var activeCanaries = await _store.ListByStatusAsync(CanaryStatus.Running, ct); + + foreach (var canary in activeCanaries) + { + if (!canary.Config.AutoRollback) + continue; + + try + { + await CheckAndRollbackIfNeededAsync(canary, ct); + } + catch (Exception ex) + { + _logger.LogError(ex, + "Error checking canary {Id} for auto-rollback", + canary.Id); + } + } + } + + private async Task CheckAndRollbackIfNeededAsync(CanaryRelease canary, CancellationToken ct) + { + var metrics = await _metricsAnalyzer.CollectAsync(canary, ct); + var thresholds = canary.Config.Thresholds; + + var violations = new List<string>(); + + if (metrics.ErrorRate > thresholds.MaxErrorRate) + { + violations.Add($"Error rate {metrics.ErrorRate:P1} exceeds 
threshold {thresholds.MaxErrorRate:P1}"); + } + + if (metrics.LatencyP99Ms > thresholds.MaxLatencyP99Ms) + { + violations.Add($"P99 latency {metrics.LatencyP99Ms:F0}ms exceeds threshold {thresholds.MaxLatencyP99Ms:F0}ms"); + } + + if (metrics.SuccessRate < thresholds.MinSuccessRate) + { + violations.Add($"Success rate {metrics.SuccessRate:P1} below threshold {thresholds.MinSuccessRate:P1}"); + } + + // Check custom thresholds + foreach (var (metricName, threshold) in thresholds.CustomThresholds) + { + if (metrics.CustomMetrics.TryGetValue(metricName, out var value) && value > threshold) + { + violations.Add($"Custom metric {metricName} ({value:F2}) exceeds threshold ({threshold:F2})"); + } + } + + if (violations.Count > 0) + { + var reason = $"Auto-rollback triggered: {string.Join("; ", violations)}"; + + _logger.LogWarning( + "Auto-rolling back canary {Id}: {Reason}", + canary.Id, + reason); + + await _canaryController.RollbackAsync(canary.Id, reason, ct); + } + } +} +``` + +--- + +## Acceptance Criteria + +- [ ] Create canary with progression steps +- [ ] Start canary at initial traffic percentage +- [ ] Advance through steps automatically +- [ ] Wait for manual approval when configured +- [ ] Rollback on metric threshold breach +- [ ] Auto-rollback runs in background +- [ ] Complete canary at 100% traffic +- [ ] Pause and resume canary +- [ ] Track metrics at each step +- [ ] Unit test coverage >=85% + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 110_001 A/B Release Manager | Internal | TODO | +| 110_002 Traffic Router Framework | Internal | TODO | +| Telemetry | External | Existing | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| ICanaryController | TODO | | +| CanaryController | TODO | | +| CanaryProgressionEngine | TODO | | +| CanaryMetricsAnalyzer | TODO | | +| AutoRollback | TODO | | +| CanaryRelease model | TODO | | +| CanaryStep model | TODO | | +| Unit 
tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_110_004_PROGDL_nginx_router.md b/docs/implplan/SPRINT_20260110_110_004_PROGDL_nginx_router.md new file mode 100644 index 000000000..a588926da --- /dev/null +++ b/docs/implplan/SPRINT_20260110_110_004_PROGDL_nginx_router.md @@ -0,0 +1,762 @@ +# SPRINT: Router Plugin - Nginx + +> **Sprint ID:** 110_004 +> **Module:** PROGDL +> **Phase:** 10 - Progressive Delivery +> **Status:** TODO +> **Parent:** [110_000_INDEX](SPRINT_20260110_110_000_INDEX_progressive_delivery.md) + +--- + +## Overview + +Implement the Nginx traffic router plugin as the **reference implementation** for progressive delivery traffic splitting. This plugin serves as the primary built-in router and as a template for additional router plugins. + +### Router Plugin Catalog + +The Release Orchestrator supports multiple traffic router implementations via the `ITrafficRouter` interface: + +| Router | Status | Description | +|--------|--------|-------------| +| **Nginx** | **v1 Built-in** | Reference implementation (this sprint) | +| HAProxy | Plugin Example | Sample implementation for plugin developers | +| Traefik | Plugin Example | Sample implementation for plugin developers | +| AWS ALB | Plugin Example | Sample implementation for plugin developers | +| Envoy | Post-v1 | Planned for future release | + +> **Plugin Developer Note:** HAProxy, Traefik, and AWS ALB are provided as reference examples in the Plugin SDK (`StellaOps.Plugin.Sdk`) to demonstrate how third parties can implement the `ITrafficRouter` interface. These examples can be found in `src/ReleaseOrchestrator/__Plugins/StellaOps.Plugin.Sdk/Examples/Routers/`. Organizations can implement their own routers for Istio, Linkerd, Kong, or any other traffic management system. 
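
Since the reference router targets plain Nginx, the objectives below reduce to emitting a handful of well-known directives. As a hypothetical sketch of the kind of config the plugin is expected to generate (upstream names, the `$ab1_*` variables, and the opt-in header are illustrative, not part of the spec):

```nginx
# Weighted 90/10 split for one A/B context (identifiers illustrative).
upstream control_ab1   { server 10.0.0.10:8080; }
upstream treatment_ab1 { server 10.0.0.20:8080; }

# split_clients hashes the key into stable buckets: ~10% of clients
# land on the treatment upstream, the remainder on control.
split_clients "${remote_addr}${http_user_agent}" $ab1_split {
    10%     treatment_ab1;
    *       control_ab1;
}

# Header-based override: an explicit variant header wins; otherwise
# fall back to the weighted split (FallbackToWeighted = true).
map $http_x_variant $ab1_target {
    "treatment"  treatment_ab1;
    default      $ab1_split;
}

server {
    listen 80;
    location / {
        proxy_pass http://$ab1_target;
    }
}
```

With `FallbackToWeighted` disabled, the `map` default would name the control upstream directly instead of `$ab1_split`.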
+ +### Objectives + +- Generate Nginx upstream configurations +- Generate Nginx split_clients config for weighted routing +- Support header-based routing via map directives +- Hot reload Nginx configuration +- Parse Nginx status for metrics +- Serve as reference implementation for custom router plugins + +### Working Directory + +``` +src/ReleaseOrchestrator/ +├── __Libraries/ +│ └── StellaOps.ReleaseOrchestrator.Progressive/ +│ └── Routers/ +│ └── Nginx/ +│ ├── NginxRouter.cs +│ ├── NginxConfigGenerator.cs +│ ├── NginxReloader.cs +│ ├── NginxStatusParser.cs +│ └── Templates/ +│ ├── upstream.conf.template +│ ├── split_clients.conf.template +│ └── location.conf.template +└── __Tests/ +``` + +--- + +## Deliverables + +### NginxRouter + +```csharp +namespace StellaOps.ReleaseOrchestrator.Progressive.Routers.Nginx; + +public sealed class NginxRouter : ITrafficRouter +{ + private readonly NginxConfigGenerator _configGenerator; + private readonly NginxReloader _reloader; + private readonly NginxStatusParser _statusParser; + private readonly NginxConfiguration _config; + private readonly IRoutingStateStore _stateStore; + private readonly TimeProvider _timeProvider; + private readonly ILogger<NginxRouter> _logger; + + public string RouterType => "nginx"; + + public IReadOnlyList<string> SupportedStrategies => new[] + { + "weighted", + "header-based", + "cookie-based", + "combined" + }; + + public async Task<bool> IsAvailableAsync(CancellationToken ct = default) + { + try + { + var configTest = await TestConfigAsync(ct); + return configTest; + } + catch + { + return false; + } + } + + public async Task ApplyAsync( + RoutingConfig config, + CancellationToken ct = default) + { + _logger.LogInformation( + "Applying Nginx routing config for {ContextId}: {Control}%/{Treatment}%", + config.AbReleaseId, + config.ControlWeight, + config.TreatmentWeight); + + try + { + // Generate configuration files + var nginxConfig = _configGenerator.Generate(config); + + // Write configuration files + await 
WriteConfigFilesAsync(nginxConfig, ct); + + // Test configuration + var testResult = await TestConfigAsync(ct); + if (!testResult) + { + throw new NginxConfigurationException("Configuration test failed"); + } + + // Reload Nginx + await _reloader.ReloadAsync(ct); + + // Store state + await _stateStore.SaveAsync(new RoutingState + { + ContextId = config.AbReleaseId, + RouterType = RouterType, + Config = config, + Status = RoutingStateStatus.Applied, + AppliedAt = _timeProvider.GetUtcNow(), + LastVerifiedAt = _timeProvider.GetUtcNow() + }, ct); + + _logger.LogInformation( + "Successfully applied Nginx config for {ContextId}", + config.AbReleaseId); + } + catch (Exception ex) + { + _logger.LogError(ex, + "Failed to apply Nginx config for {ContextId}", + config.AbReleaseId); + + await _stateStore.SaveAsync(new RoutingState + { + ContextId = config.AbReleaseId, + RouterType = RouterType, + Config = config, + Status = RoutingStateStatus.Failed, + AppliedAt = _timeProvider.GetUtcNow(), + LastVerifiedAt = _timeProvider.GetUtcNow(), + Error = ex.Message + }, ct); + + throw; + } + } + + public async Task<RoutingConfig> GetCurrentAsync( + Guid contextId, + CancellationToken ct = default) + { + var state = await _stateStore.GetAsync(contextId, ct); + if (state is null) + { + throw new RoutingConfigNotFoundException(contextId); + } + + return state.Config; + } + + public async Task RemoveAsync( + Guid contextId, + CancellationToken ct = default) + { + _logger.LogInformation("Removing Nginx config for {ContextId}", contextId); + + // Remove all three configuration files written by WriteConfigFilesAsync + foreach (var prefix in new[] { "upstream", "routing", "location" }) + { + var configPath = Path.Combine(_config.ConfigDirectory, $"{prefix}-{contextId:N}.conf"); + if (File.Exists(configPath)) + { + File.Delete(configPath); + } + } + + // Reload Nginx + await _reloader.ReloadAsync(ct); + + // Update state + await _stateStore.DeleteAsync(contextId, ct); + } + + public async Task<bool> HealthCheckAsync(CancellationToken ct = default) + { + try + { + // Check Nginx is running + var isRunning = await CheckNginxRunningAsync(ct); + if (!isRunning) + return false; + + 
+            // Check config is valid
+            var configValid = await TestConfigAsync(ct);
+            return configValid;
+        }
+        catch
+        {
+            return false;
+        }
+    }
+
+    public async Task<RouterMetrics> GetMetricsAsync(
+        Guid contextId,
+        CancellationToken ct = default)
+    {
+        var statusUrl = $"{_config.StatusEndpoint}/status";
+        return await _statusParser.ParseAsync(statusUrl, contextId, ct);
+    }
+
+    private async Task WriteConfigFilesAsync(NginxConfig config, CancellationToken ct)
+    {
+        var basePath = _config.ConfigDirectory;
+        Directory.CreateDirectory(basePath);
+
+        // Write upstream config
+        var upstreamPath = Path.Combine(basePath, $"upstream-{config.ContextId:N}.conf");
+        await File.WriteAllTextAsync(upstreamPath, config.UpstreamConfig, ct);
+
+        // Write routing config
+        var routingPath = Path.Combine(basePath, $"routing-{config.ContextId:N}.conf");
+        await File.WriteAllTextAsync(routingPath, config.RoutingConfig, ct);
+
+        // Write location config
+        var locationPath = Path.Combine(basePath, $"location-{config.ContextId:N}.conf");
+        await File.WriteAllTextAsync(locationPath, config.LocationConfig, ct);
+    }
+
+    private async Task<bool> TestConfigAsync(CancellationToken ct)
+    {
+        var result = await ExecuteNginxCommandAsync("-t", ct);
+        return result.ExitCode == 0;
+    }
+
+    private Task<bool> CheckNginxRunningAsync(CancellationToken ct)
+    {
+        // Probe for a live Nginx master process. Do NOT use "nginx -s reload"
+        // as a liveness check: it reloads the server as a side effect.
+        var processes = Process.GetProcessesByName("nginx");
+        try
+        {
+            return Task.FromResult(processes.Length > 0);
+        }
+        finally
+        {
+            foreach (var p in processes)
+            {
+                p.Dispose();
+            }
+        }
+    }
+
+    private async Task<ProcessResult> ExecuteNginxCommandAsync(string args, CancellationToken ct)
+    {
+        var psi = new ProcessStartInfo
+        {
+            FileName = _config.NginxPath,
+            Arguments = args,
+            RedirectStandardOutput = true,
+            RedirectStandardError = true,
+            UseShellExecute = false
+        };
+
+        using var process = Process.Start(psi);
+        if (process is null)
+        {
+            return new ProcessResult(-1, "", "Failed to start process");
+        }
+
+        // Start draining both streams before waiting so a full pipe buffer
+        // cannot deadlock the child process.
+        var stdoutTask = process.StandardOutput.ReadToEndAsync(ct);
+        var stderrTask = process.StandardError.ReadToEndAsync(ct);
+        await process.WaitForExitAsync(ct);
+
+        return new ProcessResult(process.ExitCode, await stdoutTask, await stderrTask);
+    }
+
+    private string GetConfigPath(Guid contextId)
+    {
+        return Path.Combine(_config.ConfigDirectory, $"routing-{contextId:N}.conf");
+    }
+}
+
+public sealed record ProcessResult(int ExitCode, string Stdout, string Stderr);
+```
+
+### NginxConfigGenerator
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Progressive.Routers.Nginx;
+
+public sealed class NginxConfigGenerator
+{
+    private readonly NginxConfiguration _config;
+
+    public NginxConfig Generate(RoutingConfig config)
+    {
+        var contextId = config.AbReleaseId.ToString("N");
+
+        var upstreamConfig = GenerateUpstreams(config, contextId);
+        var routingConfig = GenerateRouting(config, contextId);
+        var locationConfig = GenerateLocation(config, contextId);
+
+        return new NginxConfig
+        {
+            ContextId = config.AbReleaseId,
+            UpstreamConfig = upstreamConfig,
+            RoutingConfig = routingConfig,
+            LocationConfig = locationConfig
+        };
+    }
+
+    private string GenerateUpstreams(RoutingConfig config, string contextId)
+    {
+        var sb = new StringBuilder();
+
+        // Control upstream
+        sb.AppendLine($"upstream control_{contextId} {{");
+        foreach (var endpoint in config.ControlEndpoints)
+        {
+            sb.AppendLine($"    server {endpoint};");
+        }
+        sb.AppendLine("}");
+        sb.AppendLine();
+
+        // Treatment upstream
+        if (config.TreatmentEndpoints.Count > 0)
+        {
+            sb.AppendLine($"upstream treatment_{contextId} {{");
+            foreach (var endpoint in config.TreatmentEndpoints)
+            {
+                sb.AppendLine($"    server {endpoint};");
+            }
+            sb.AppendLine("}");
+        }
+
+        return sb.ToString();
+    }
+
+    private string GenerateRouting(RoutingConfig config, string contextId)
+    {
+        var sb = new StringBuilder();
+
+        switch (config.Strategy)
+        {
+            case RoutingStrategy.Weighted:
+                sb.Append(GenerateWeightedRouting(config, contextId));
+                break;
+
+            case RoutingStrategy.HeaderBased:
+                sb.Append(GenerateHeaderRouting(config, contextId));
+                break;
+
+            case RoutingStrategy.CookieBased:
+                sb.Append(GenerateCookieRouting(config,
contextId));
+                break;
+
+            case RoutingStrategy.Combined:
+                sb.Append(GenerateCombinedRouting(config, contextId));
+                break;
+        }
+
+        return sb.ToString();
+    }
+
+    private string GenerateWeightedRouting(RoutingConfig config, string contextId)
+    {
+        var sb = new StringBuilder();
+
+        // Use split_clients for weighted distribution.
+        // NOTE: keying on $request_id assigns per request, not per user;
+        // use a sticky key (e.g. a cookie) if session affinity is required.
+        sb.AppendLine($"split_clients \"$request_id\" $ab_upstream_{contextId} {{");
+        sb.AppendLine($"    {config.ControlWeight}% control_{contextId};");
+        sb.AppendLine($"    * treatment_{contextId};");
+        sb.AppendLine("}");
+
+        return sb.ToString();
+    }
+
+    private string GenerateHeaderRouting(RoutingConfig config, string contextId)
+    {
+        var header = config.HeaderRouting!;
+        var sb = new StringBuilder();
+
+        // Use map for header-based routing
+        sb.AppendLine($"map $http_{NormalizeHeaderName(header.HeaderName)} $ab_upstream_{contextId} {{");
+        sb.AppendLine($"    default control_{contextId};");
+        sb.AppendLine($"    \"{header.TreatmentValue}\" treatment_{contextId};");
+        sb.AppendLine("}");
+
+        return sb.ToString();
+    }
+
+    private string GenerateCookieRouting(RoutingConfig config, string contextId)
+    {
+        var cookie = config.CookieRouting!;
+        var sb = new StringBuilder();
+
+        // Use map for cookie-based routing
+        sb.AppendLine($"map $cookie_{cookie.CookieName} $ab_upstream_{contextId} {{");
+        sb.AppendLine($"    default control_{contextId};");
+        sb.AppendLine($"    \"{cookie.TreatmentValue}\" treatment_{contextId};");
+        sb.AppendLine("}");
+
+        return sb.ToString();
+    }
+
+    private string GenerateCombinedRouting(RoutingConfig config, string contextId)
+    {
+        var sb = new StringBuilder();
+
+        // Check header first, then cookie, then weighted
+        sb.AppendLine($"# Combined routing for {contextId}");
+
+        if (config.HeaderRouting is not null)
+        {
+            var header = config.HeaderRouting;
+            sb.AppendLine($"map $http_{NormalizeHeaderName(header.HeaderName)} $ab_header_{contextId} {{");
+            sb.AppendLine($"    default \"\";");
+            sb.AppendLine($"    \"{header.TreatmentValue}\" treatment_{contextId};");
+            sb.AppendLine("}");
+            sb.AppendLine();
+        }
+
+        if (config.CookieRouting is not null)
+        {
+            var cookie = config.CookieRouting;
+            sb.AppendLine($"map $cookie_{cookie.CookieName} $ab_cookie_{contextId} {{");
+            sb.AppendLine($"    default \"\";");
+            sb.AppendLine($"    \"{cookie.TreatmentValue}\" treatment_{contextId};");
+            sb.AppendLine("}");
+            sb.AppendLine();
+        }
+
+        // Weighted fallback
+        sb.AppendLine($"split_clients \"$request_id\" $ab_weighted_{contextId} {{");
+        sb.AppendLine($"    {config.ControlWeight}% control_{contextId};");
+        sb.AppendLine($"    * treatment_{contextId};");
+        sb.AppendLine("}");
+        sb.AppendLine();
+
+        // Combined decision: a header or cookie match forces treatment,
+        // otherwise fall back to the weighted assignment.
+        sb.AppendLine($"map $ab_header_{contextId}$ab_cookie_{contextId} $ab_upstream_{contextId} {{");
+        sb.AppendLine($"    default $ab_weighted_{contextId};");
+        sb.AppendLine($"    \"~treatment_\" treatment_{contextId};");
+        sb.AppendLine("}");
+
+        return sb.ToString();
+    }
+
+    private string GenerateLocation(RoutingConfig config, string contextId)
+    {
+        var sb = new StringBuilder();
+
+        sb.AppendLine($"# Location for A/B release {config.AbReleaseId}");
+        sb.AppendLine($"location @ab_{contextId} {{");
+        sb.AppendLine($"    proxy_pass http://$ab_upstream_{contextId};");
+        sb.AppendLine($"    proxy_set_header Host $host;");
+        sb.AppendLine($"    proxy_set_header X-Real-IP $remote_addr;");
+        sb.AppendLine($"    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;");
+        sb.AppendLine($"    proxy_set_header X-AB-Variant $ab_upstream_{contextId};");
+        sb.AppendLine($"    proxy_set_header X-AB-Release-Id {config.AbReleaseId};");
+        sb.AppendLine("}");
+
+        return sb.ToString();
+    }
+
+    private static string NormalizeHeaderName(string headerName)
+    {
+        // Convert header name to Nginx variable format:
+        // X-Canary-Test -> x_canary_test
+        return headerName.ToLowerInvariant().Replace('-', '_');
+    }
+}
+
+public sealed record NginxConfig
+{
+    public required Guid ContextId { get; init; }
+    public required string UpstreamConfig { get; init;
}
+    public required string RoutingConfig { get; init; }
+    public required string LocationConfig { get; init; }
+}
+```
+
+### NginxReloader
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Progressive.Routers.Nginx;
+
+public sealed class NginxReloader
+{
+    private readonly NginxConfiguration _config;
+    private readonly ILogger<NginxReloader> _logger;
+    private readonly SemaphoreSlim _reloadLock = new(1, 1);
+
+    public async Task ReloadAsync(CancellationToken ct = default)
+    {
+        await _reloadLock.WaitAsync(ct);
+
+        try
+        {
+            _logger.LogDebug("Reloading Nginx configuration");
+
+            var result = await ExecuteAsync("-s reload", ct);
+
+            if (result.ExitCode != 0)
+            {
+                _logger.LogError(
+                    "Nginx reload failed: {Stderr}",
+                    result.Stderr);
+                throw new NginxReloadException(result.Stderr);
+            }
+
+            // Give workers a moment to pick up the new configuration
+            await Task.Delay(TimeSpan.FromMilliseconds(500), ct);
+
+            // Re-test the configuration on disk; note this validates the
+            // config, it does not prove the workers finished reloading.
+            var testResult = await ExecuteAsync("-t", ct);
+            if (testResult.ExitCode != 0)
+            {
+                throw new NginxReloadException("Post-reload test failed");
+            }
+
+            _logger.LogInformation("Nginx configuration reloaded successfully");
+        }
+        finally
+        {
+            _reloadLock.Release();
+        }
+    }
+
+    public async Task<bool> TestConfigAsync(CancellationToken ct = default)
+    {
+        var result = await ExecuteAsync("-t", ct);
+        return result.ExitCode == 0;
+    }
+
+    private async Task<ProcessResult> ExecuteAsync(string args, CancellationToken ct)
+    {
+        var psi = new ProcessStartInfo
+        {
+            FileName = _config.NginxPath,
+            Arguments = args,
+            RedirectStandardOutput = true,
+            RedirectStandardError = true,
+            UseShellExecute = false,
+            CreateNoWindow = true
+        };
+
+        using var process = new Process { StartInfo = psi };
+        process.Start();
+
+        // Start draining both streams before waiting to avoid pipe-buffer deadlock
+        var stdoutTask = process.StandardOutput.ReadToEndAsync(ct);
+        var stderrTask = process.StandardError.ReadToEndAsync(ct);
+
+        await process.WaitForExitAsync(ct);
+
+        return new ProcessResult(process.ExitCode, await stdoutTask, await stderrTask);
+    }
+}
+
+public sealed class NginxReloadException : Exception
+{
+    public
NginxReloadException(string message) : base(message) { }
+}
+```
+
+### NginxStatusParser
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Progressive.Routers.Nginx;
+
+public sealed class NginxStatusParser
+{
+    private readonly HttpClient _httpClient;
+    private readonly TimeProvider _timeProvider;
+    private readonly ILogger<NginxStatusParser> _logger;
+
+    public async Task<RouterMetrics> ParseAsync(
+        string statusUrl,
+        Guid contextId,
+        CancellationToken ct = default)
+    {
+        try
+        {
+            // Get Nginx stub_status or extended status
+            var response = await _httpClient.GetStringAsync(statusUrl, ct);
+
+            // Parse the status response
+            var status = ParseStatusResponse(response);
+
+            // Get upstream-specific metrics if available
+            var upstreamMetrics = await GetUpstreamMetricsAsync(contextId, ct);
+
+            return new RouterMetrics
+            {
+                ControlRequests = upstreamMetrics.ControlRequests,
+                TreatmentRequests = upstreamMetrics.TreatmentRequests,
+                ControlErrorRate = upstreamMetrics.ControlErrorRate,
+                TreatmentErrorRate = upstreamMetrics.TreatmentErrorRate,
+                ControlLatencyP50 = upstreamMetrics.ControlLatencyP50,
+                TreatmentLatencyP50 = upstreamMetrics.TreatmentLatencyP50,
+                CollectedAt = _timeProvider.GetUtcNow()
+            };
+        }
+        catch (Exception ex)
+        {
+            _logger.LogWarning(ex, "Failed to parse Nginx status from {Url}", statusUrl);
+
+            return new RouterMetrics
+            {
+                ControlRequests = 0,
+                TreatmentRequests = 0,
+                ControlErrorRate = 0,
+                TreatmentErrorRate = 0,
+                ControlLatencyP50 = 0,
+                TreatmentLatencyP50 = 0,
+                CollectedAt = _timeProvider.GetUtcNow()
+            };
+        }
+    }
+
+    private NginxStatus ParseStatusResponse(string response)
+    {
+        // Parse stub_status format:
+        // Active connections: 43
+        // server accepts handled requests
+        //  7368 7368 10993
+        // Reading: 0 Writing: 1 Waiting: 42
+
+        var lines = response.Split('\n', StringSplitOptions.RemoveEmptyEntries);
+        var status = new NginxStatus();
+
+        for (var lineIndex = 0; lineIndex < lines.Length; lineIndex++)
+        {
+            var line = lines[lineIndex];
+
+            if (line.StartsWith("Active connections:"))
+            {
+                var value = line.Split(':')[1].Trim();
+                status.ActiveConnections = int.Parse(value, CultureInfo.InvariantCulture);
+            }
+            else if (line.StartsWith("server accepts handled requests") && lineIndex + 1 < lines.Length)
+            {
+                // The three counters appear on the line after the header
+                var counters = lines[lineIndex + 1].Split(' ', StringSplitOptions.RemoveEmptyEntries);
+                if (counters.Length >= 3)
+                {
+                    status.Accepts = int.Parse(counters[0], CultureInfo.InvariantCulture);
+                    status.Handled = int.Parse(counters[1], CultureInfo.InvariantCulture);
+                    status.Requests = int.Parse(counters[2], CultureInfo.InvariantCulture);
+                }
+            }
+            else if (line.Contains("Reading:"))
+            {
+                var parts = line.Split(new[] { ' ', ':' }, StringSplitOptions.RemoveEmptyEntries);
+                for (var i = 0; i < parts.Length; i++)
+                {
+                    switch (parts[i])
+                    {
+                        case "Reading":
+                            status.Reading = int.Parse(parts[i + 1], CultureInfo.InvariantCulture);
+                            break;
+                        case "Writing":
+                            status.Writing = int.Parse(parts[i + 1], CultureInfo.InvariantCulture);
+                            break;
+                        case "Waiting":
+                            status.Waiting = int.Parse(parts[i + 1], CultureInfo.InvariantCulture);
+                            break;
+                    }
+                }
+            }
+        }
+
+        return status;
+    }
+
+    private Task<UpstreamMetrics> GetUpstreamMetricsAsync(Guid contextId, CancellationToken ct)
+    {
+        // This would typically query the Nginx Plus API or a metrics exporter.
+        // For open-source Nginx, use access-log analysis or Prometheus metrics.
+
+        return Task.FromResult(new UpstreamMetrics
+        {
+            ControlRequests = 0,
+            TreatmentRequests = 0,
+            ControlErrorRate = 0,
+            TreatmentErrorRate = 0,
+            ControlLatencyP50 = 0,
+            TreatmentLatencyP50 = 0
+        });
+    }
+}
+
+internal sealed class NginxStatus
+{
+    public int ActiveConnections { get; set; }
+    public int Accepts { get; set; }
+    public int Handled { get; set; }
+    public int Requests { get; set; }
+    public int Reading { get; set; }
+    public int Writing { get; set; }
+    public int Waiting { get; set; }
+}
+
+internal sealed class UpstreamMetrics
+{
+    public long ControlRequests { get; set; }
+    public long TreatmentRequests { get; set; }
+    public double ControlErrorRate { get; set; }
+    public double TreatmentErrorRate { get; set; }
+    public double ControlLatencyP50 { get; set; }
+    public double TreatmentLatencyP50 { get; set; }
+}
+```
+
+### NginxConfiguration
+
+```csharp
+namespace StellaOps.ReleaseOrchestrator.Progressive.Routers.Nginx;
+
+public sealed class NginxConfiguration
+{
+    public string NginxPath { get; set; } = "/usr/sbin/nginx";
+    public string ConfigDirectory { get; set; } =
"/etc/nginx/conf.d/stella-ab"; + public string StatusEndpoint { get; set; } = "http://127.0.0.1:8080"; + public TimeSpan ReloadTimeout { get; set; } = TimeSpan.FromSeconds(30); + public bool TestConfigBeforeReload { get; set; } = true; + public bool BackupConfigOnChange { get; set; } = true; +} +``` + +--- + +## Acceptance Criteria + +- [ ] Generate upstream configuration +- [ ] Generate split_clients for weighted routing +- [ ] Generate map for header-based routing +- [ ] Generate map for cookie-based routing +- [ ] Support combined routing strategies +- [ ] Test configuration before reload +- [ ] Hot reload Nginx configuration +- [ ] Parse Nginx status for metrics +- [ ] Handle reload failures gracefully +- [ ] Unit test coverage >=85% + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 110_002 Traffic Router Framework | Internal | TODO | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| NginxRouter | TODO | | +| NginxConfigGenerator | TODO | | +| NginxReloader | TODO | | +| NginxStatusParser | TODO | | +| NginxConfiguration | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | +| 10-Jan-2026 | Added Router Plugin Catalog with HAProxy/Traefik/ALB as plugin reference examples | diff --git a/docs/implplan/SPRINT_20260110_111_000_INDEX_ui_implementation.md b/docs/implplan/SPRINT_20260110_111_000_INDEX_ui_implementation.md new file mode 100644 index 000000000..937adf5f7 --- /dev/null +++ b/docs/implplan/SPRINT_20260110_111_000_INDEX_ui_implementation.md @@ -0,0 +1,300 @@ +# SPRINT INDEX: Phase 11 - UI Implementation + +> **Epic:** Release Orchestrator +> **Phase:** 11 - UI Implementation +> **Batch:** 111 +> **Status:** TODO +> **Parent:** [100_000_INDEX](SPRINT_20260110_100_000_INDEX_release_orchestrator.md) + +--- + +## Overview + +Phase 11 implements the frontend UI for the Release 
Orchestrator - Angular-based dashboards, management screens, and workflow editors. + +### Objectives + +- Dashboard with pipeline overview +- Environment management UI +- Release management UI +- Visual workflow editor +- Promotion and approval UI +- Deployment monitoring UI +- Evidence viewer + +--- + +## Sprint Structure + +| Sprint ID | Title | Module | Status | Dependencies | +|-----------|-------|--------|--------|--------------| +| 111_001 | Dashboard - Overview | FE | TODO | 107_001 | +| 111_002 | Environment Management UI | FE | TODO | 103_001 | +| 111_003 | Release Management UI | FE | TODO | 104_003 | +| 111_004 | Workflow Editor | FE | TODO | 105_001 | +| 111_005 | Promotion & Approval UI | FE | TODO | 106_001 | +| 111_006 | Deployment Monitoring UI | FE | TODO | 107_001 | +| 111_007 | Evidence Viewer | FE | TODO | 109_002 | + +--- + +## Architecture + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ UI IMPLEMENTATION │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ DASHBOARD (111_001) │ │ +│ │ │ │ +│ │ ┌─────────────────────────────────────────────────────────────────┐ │ │ +│ │ │ Pipeline Overview │ │ │ +│ │ │ ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐ │ │ │ +│ │ │ │ DEV │──►│STAGE │──►│ UAT │──►│ PROD │ │ │ │ +│ │ │ │ ✓ 3 │ │ ✓ 2 │ │ ⟳ 1 │ │ ○ 0 │ │ │ │ +│ │ │ └──────┘ └──────┘ └──────┘ └──────┘ │ │ │ +│ │ └─────────────────────────────────────────────────────────────────┘ │ │ +│ │ │ │ +│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │ +│ │ │ Pending │ │ Active │ │ Recent │ │ │ +│ │ │ Approvals │ │ Deployments │ │ Releases │ │ │ +│ │ │ (5) │ │ (2) │ │ (12) │ │ │ +│ │ └─────────────┘ └─────────────┘ └─────────────┘ │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ ENVIRONMENT MANAGEMENT (111_002) │ │ +│ │ │ │ +│ │ Environments │ Environment: Production │ 
│ +│ │ ├── Development │ ┌───────────────────────────────────────────────┐ │ │ +│ │ ├── Staging │ │ Targets (4) │ Settings │ │ │ +│ │ ├── UAT │ │ ├── prod-web-01 │ Required Approvals: 2 │ │ │ +│ │ └── Production◄─┤ │ ├── prod-web-02 │ Freeze Windows: 1 │ │ │ +│ │ │ │ ├── prod-api-01 │ Auto-promote: disabled │ │ │ +│ │ │ │ └── prod-api-02 │ SoD: enabled │ │ │ +│ │ │ └───────────────────────────────────────────────┘ │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ WORKFLOW EDITOR (111_004) │ │ +│ │ │ │ +│ │ ┌─────────────────────────────────────────────────────────────────┐ │ │ +│ │ │ Visual DAG Editor │ │ │ +│ │ │ │ │ │ +│ │ │ ┌──────────┐ │ │ │ +│ │ │ │ Security │ │ │ │ +│ │ │ │ Gate │ │ │ │ +│ │ │ └────┬─────┘ │ │ │ +│ │ │ │ │ │ │ +│ │ │ ┌────▼─────┐ │ │ │ +│ │ │ │ Approval │ [ Step Palette ] │ │ │ +│ │ │ └────┬─────┘ ├── Script │ │ │ +│ │ │ │ ├── Approval │ │ │ +│ │ │ ┌────────┼────────┐ ├── Deploy │ │ │ +│ │ │ ▼ ▼ ├── Notify │ │ │ +│ │ │ ┌──────┐ ┌──────┐└── Gate │ │ │ +│ │ │ │Deploy│ │Smoke │ │ │ │ +│ │ │ └──┬───┘ └──┬───┘ │ │ │ +│ │ │ └───────┬───────┘ │ │ │ +│ │ │ ▼ │ │ │ +│ │ │ ┌─────────┐ │ │ │ +│ │ │ │ Notify │ │ │ │ +│ │ │ └─────────┘ │ │ │ +│ │ └─────────────────────────────────────────────────────────────────┘ │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ DEPLOYMENT MONITORING (111_006) │ │ +│ │ │ │ +│ │ Deployment: myapp-v2.3.1 → Production │ │ +│ │ ┌─────────────────────────────────────────────────────────────────┐ │ │ +│ │ │ Progress: 75% ████████████████████░░░░░░░ │ │ │ +│ │ │ │ │ │ +│ │ │ Target Status Duration Agent │ │ │ +│ │ │ prod-web-01 ✓ Done 2m 15s agent-01 │ │ │ +│ │ │ prod-web-02 ✓ Done 2m 08s agent-01 │ │ │ +│ │ │ prod-api-01 ⟳ Running 1m 45s agent-02 │ │ │ +│ │ │ prod-api-02 ○ Pending - - │ │ 
│ +│ │ │ │ │ │ +│ │ │ [View Logs] [Cancel] [Rollback] │ │ │ +│ │ └─────────────────────────────────────────────────────────────────┘ │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +--- + +## Deliverables Summary + +### 111_001: Dashboard - Overview + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `DashboardComponent` | Angular | Main dashboard | +| `PipelineOverview` | Component | Environment pipeline | +| `PendingApprovals` | Component | Approval queue | +| `ActiveDeployments` | Component | Running deployments | +| `RecentReleases` | Component | Release list | + +### 111_002: Environment Management UI + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `EnvironmentListComponent` | Angular | Environment list | +| `EnvironmentDetailComponent` | Angular | Environment detail | +| `TargetListComponent` | Component | Target management | +| `FreezeWindowEditor` | Component | Freeze window config | + +### 111_003: Release Management UI + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `ReleaseListComponent` | Angular | Release catalog | +| `ReleaseDetailComponent` | Angular | Release detail | +| `CreateReleaseWizard` | Component | Release creation | +| `ComponentSelector` | Component | Add components | + +### 111_004: Workflow Editor + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `WorkflowEditorComponent` | Angular | DAG editor | +| `StepPalette` | Component | Available steps | +| `StepConfigPanel` | Component | Step configuration | +| `DagCanvas` | Component | Visual DAG | +| `YamlEditor` | Component | Raw YAML editing | + +### 111_005: Promotion & Approval UI + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `PromotionRequestComponent` | Angular | Request promotion | +| 
`ApprovalQueueComponent` | Angular | Pending approvals | +| `ApprovalDetailComponent` | Angular | Approval action | +| `GateResultsPanel` | Component | Gate status | + +### 111_006: Deployment Monitoring UI + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `DeploymentMonitorComponent` | Angular | Deployment status | +| `TargetProgressList` | Component | Per-target progress | +| `LogStreamViewer` | Component | Real-time logs | +| `RollbackDialog` | Component | Rollback confirmation | + +### 111_007: Evidence Viewer + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `EvidenceListComponent` | Angular | Evidence packets | +| `EvidenceDetailComponent` | Angular | Evidence detail | +| `EvidenceVerifier` | Component | Verify signature | +| `ExportDialog` | Component | Export options | + +--- + +## Component Library + +```typescript +// Shared UI Components +@NgModule({ + declarations: [ + // Layout + PageHeaderComponent, + SideNavComponent, + BreadcrumbsComponent, + + // Data Display + StatusBadgeComponent, + ProgressBarComponent, + TimelineComponent, + DataTableComponent, + + // Forms + SearchInputComponent, + FilterPanelComponent, + JsonEditorComponent, + + // Feedback + ToastNotificationComponent, + ConfirmDialogComponent, + LoadingSpinnerComponent, + + // Domain + EnvironmentBadgeComponent, + ReleaseStatusComponent, + GateStatusIconComponent, + DigestDisplayComponent + ] +}) +export class SharedUiModule {} +``` + +--- + +## State Management + +```typescript +// NgRx Store Structure +interface AppState { + environments: EnvironmentsState; + releases: ReleasesState; + promotions: PromotionsState; + deployments: DeploymentsState; + evidence: EvidenceState; + ui: UiState; +} + +// Actions Pattern +export const EnvironmentActions = createActionGroup({ + source: 'Environments', + events: { + 'Load Environments': emptyProps(), + 'Load Environments Success': props<{ environments: Environment[] }>(), + 'Load 
Environments Failure': props<{ error: string }>(), + 'Select Environment': props<{ id: string }>(), + 'Create Environment': props<{ request: CreateEnvironmentRequest }>(), + 'Update Environment': props<{ id: string; request: UpdateEnvironmentRequest }>(), + 'Delete Environment': props<{ id: string }>() + } +}); +``` + +--- + +## Dependencies + +| Module | Purpose | +|--------|---------| +| All backend APIs | Data source | +| Angular 17 | Framework | +| NgRx | State management | +| PrimeNG | UI components | +| Monaco Editor | YAML/JSON editing | +| D3.js | DAG visualization | + +--- + +## Acceptance Criteria + +- [ ] Dashboard loads quickly (<2s) +- [ ] Environment CRUD works +- [ ] Target health displayed +- [ ] Release creation wizard works +- [ ] Workflow editor saves correctly +- [ ] DAG visualization renders +- [ ] Approval flow works end-to-end +- [ ] Deployment progress updates real-time +- [ ] Log streaming works +- [ ] Evidence verification shows result +- [ ] Export downloads file +- [ ] Responsive on tablet/desktop + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Phase 11 index created | diff --git a/docs/implplan/SPRINT_20260110_111_001_FE_dashboard_overview.md b/docs/implplan/SPRINT_20260110_111_001_FE_dashboard_overview.md new file mode 100644 index 000000000..556c874c6 --- /dev/null +++ b/docs/implplan/SPRINT_20260110_111_001_FE_dashboard_overview.md @@ -0,0 +1,792 @@ +# SPRINT: Dashboard - Overview + +> **Sprint ID:** 111_001 +> **Module:** FE +> **Phase:** 11 - UI Implementation +> **Status:** TODO +> **Parent:** [111_000_INDEX](SPRINT_20260110_111_000_INDEX_ui_implementation.md) + +--- + +## Overview + +Implement the main Release Orchestrator dashboard providing at-a-glance visibility into pipeline status, pending approvals, active deployments, and recent releases. 
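+
+The at-a-glance counts this dashboard surfaces (pending approvals, active deployments, recent releases) reduce to a pure aggregation over the loaded data. A minimal sketch — the `DashboardSummary` shape and the `summarizeDashboard` helper are illustrative names, not part of this spec:
+
+```typescript
+// Illustrative subset of the dashboard models (field names assumed).
+interface PendingApproval {
+  id: string;
+  urgency: 'low' | 'normal' | 'high' | 'critical';
+}
+
+interface ActiveDeployment {
+  id: string;
+  progress: number; // 0..100
+}
+
+interface RecentRelease {
+  id: string;
+  status: string;
+}
+
+interface DashboardSummary {
+  pendingApprovals: number;
+  criticalApprovals: number;
+  activeDeployments: number;
+  recentReleases: number;
+}
+
+// Pure aggregation feeding the dashboard's summary tiles; keeping it pure
+// makes it trivial to unit-test and to reuse from an NgRx selector.
+function summarizeDashboard(
+  approvals: PendingApproval[],
+  deployments: ActiveDeployment[],
+  releases: RecentRelease[],
+): DashboardSummary {
+  return {
+    pendingApprovals: approvals.length,
+    criticalApprovals: approvals.filter(a => a.urgency === 'critical').length,
+    activeDeployments: deployments.length,
+    recentReleases: releases.length,
+  };
+}
+```
+
+Wired into the store, this kind of helper would typically live behind a selector so the tile counts stay derived from the same lists the panels render.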
+ +### Objectives + +- Pipeline overview showing environments progression +- Pending approvals count and quick actions +- Active deployments with progress indicators +- Recent releases list +- Real-time updates via SignalR + +### Working Directory + +``` +src/Web/StellaOps.Web/ +├── src/app/features/release-orchestrator/ +│ └── dashboard/ +│ ├── dashboard.component.ts +│ ├── dashboard.component.html +│ ├── dashboard.component.scss +│ ├── dashboard.routes.ts +│ ├── components/ +│ │ ├── pipeline-overview/ +│ │ │ ├── pipeline-overview.component.ts +│ │ │ ├── pipeline-overview.component.html +│ │ │ └── pipeline-overview.component.scss +│ │ ├── pending-approvals/ +│ │ │ ├── pending-approvals.component.ts +│ │ │ ├── pending-approvals.component.html +│ │ │ └── pending-approvals.component.scss +│ │ ├── active-deployments/ +│ │ │ ├── active-deployments.component.ts +│ │ │ ├── active-deployments.component.html +│ │ │ └── active-deployments.component.scss +│ │ └── recent-releases/ +│ │ ├── recent-releases.component.ts +│ │ ├── recent-releases.component.html +│ │ └── recent-releases.component.scss +│ └── services/ +│ └── dashboard.service.ts +└── src/app/store/release-orchestrator/ + └── dashboard/ + ├── dashboard.actions.ts + ├── dashboard.reducer.ts + ├── dashboard.effects.ts + └── dashboard.selectors.ts +``` + +--- + +## Deliverables + +### Dashboard Component + +```typescript +// dashboard.component.ts +import { Component, OnInit, OnDestroy, inject, ChangeDetectionStrategy } from '@angular/core'; +import { CommonModule } from '@angular/common'; +import { Store } from '@ngrx/store'; +import { Subject, takeUntil } from 'rxjs'; +import { PipelineOverviewComponent } from './components/pipeline-overview/pipeline-overview.component'; +import { PendingApprovalsComponent } from './components/pending-approvals/pending-approvals.component'; +import { ActiveDeploymentsComponent } from './components/active-deployments/active-deployments.component'; +import { RecentReleasesComponent } 
from './components/recent-releases/recent-releases.component'; +import { DashboardActions } from '@store/release-orchestrator/dashboard/dashboard.actions'; +import * as DashboardSelectors from '@store/release-orchestrator/dashboard/dashboard.selectors'; + +@Component({ + selector: 'so-dashboard', + standalone: true, + imports: [ + CommonModule, + PipelineOverviewComponent, + PendingApprovalsComponent, + ActiveDeploymentsComponent, + RecentReleasesComponent + ], + templateUrl: './dashboard.component.html', + styleUrl: './dashboard.component.scss', + changeDetection: ChangeDetectionStrategy.OnPush +}) +export class DashboardComponent implements OnInit, OnDestroy { + private readonly store = inject(Store); + private readonly destroy$ = new Subject(); + + readonly loading$ = this.store.select(DashboardSelectors.selectLoading); + readonly error$ = this.store.select(DashboardSelectors.selectError); + readonly pipelineData$ = this.store.select(DashboardSelectors.selectPipelineData); + readonly pendingApprovals$ = this.store.select(DashboardSelectors.selectPendingApprovals); + readonly activeDeployments$ = this.store.select(DashboardSelectors.selectActiveDeployments); + readonly recentReleases$ = this.store.select(DashboardSelectors.selectRecentReleases); + readonly lastUpdated$ = this.store.select(DashboardSelectors.selectLastUpdated); + + ngOnInit(): void { + this.store.dispatch(DashboardActions.loadDashboard()); + this.store.dispatch(DashboardActions.subscribeToUpdates()); + } + + ngOnDestroy(): void { + this.store.dispatch(DashboardActions.unsubscribeFromUpdates()); + this.destroy$.next(); + this.destroy$.complete(); + } + + onRefresh(): void { + this.store.dispatch(DashboardActions.loadDashboard()); + } +} +``` + +```html + +
+<!-- dashboard.component.html (illustrative sketch; bindings follow the component above) -->
+<div class="dashboard">
+  <header class="dashboard__header">
+    <h1>Release Orchestrator</h1>
+    <div class="dashboard__actions">
+      <span class="dashboard__updated" *ngIf="lastUpdated$ | async as lastUpdated">
+        Last updated: {{ lastUpdated | date:'medium' }}
+      </span>
+      <button type="button" (click)="onRefresh()" [disabled]="loading$ | async">
+        Refresh
+      </button>
+    </div>
+  </header>
+
+  <div class="dashboard__error" *ngIf="error$ | async as error">
+    {{ error }}
+  </div>
+
+  <so-pipeline-overview
+    [data]="pipelineData$ | async"
+    [loading]="(loading$ | async) ?? false" />
+
+  <div class="dashboard__grid">
+    <so-pending-approvals
+      [approvals]="pendingApprovals$ | async"
+      [loading]="(loading$ | async) ?? false" />
+
+    <so-active-deployments
+      [deployments]="activeDeployments$ | async"
+      [loading]="(loading$ | async) ?? false" />
+
+    <so-recent-releases
+      [releases]="recentReleases$ | async"
+      [loading]="(loading$ | async) ?? false" />
+  </div>
+</div>
+``` + +### Pipeline Overview Component + +```typescript +// pipeline-overview.component.ts +import { Component, Input, ChangeDetectionStrategy } from '@angular/core'; +import { CommonModule } from '@angular/common'; +import { RouterModule } from '@angular/router'; + +export interface PipelineEnvironment { + id: string; + name: string; + order: number; + releaseCount: number; + pendingCount: number; + healthStatus: 'healthy' | 'degraded' | 'unhealthy' | 'unknown'; +} + +export interface PipelineData { + environments: PipelineEnvironment[]; + connections: Array<{ from: string; to: string }>; +} + +@Component({ + selector: 'so-pipeline-overview', + standalone: true, + imports: [CommonModule, RouterModule], + templateUrl: './pipeline-overview.component.html', + styleUrl: './pipeline-overview.component.scss', + changeDetection: ChangeDetectionStrategy.OnPush +}) +export class PipelineOverviewComponent { + @Input() data: PipelineData | null = null; + @Input() loading = false; + + getStatusIcon(status: string): string { + switch (status) { + case 'healthy': return 'pi-check-circle'; + case 'degraded': return 'pi-exclamation-triangle'; + case 'unhealthy': return 'pi-times-circle'; + default: return 'pi-question-circle'; + } + } + + getStatusClass(status: string): string { + return `env-card--${status}`; + } +} +``` + +```html + + +``` + +### Pending Approvals Component + +```typescript +// pending-approvals.component.ts +import { Component, Input, Output, EventEmitter, ChangeDetectionStrategy } from '@angular/core'; +import { CommonModule } from '@angular/common'; +import { RouterModule } from '@angular/router'; + +export interface PendingApproval { + id: string; + releaseId: string; + releaseName: string; + sourceEnvironment: string; + targetEnvironment: string; + requestedBy: string; + requestedAt: Date; + urgency: 'low' | 'normal' | 'high' | 'critical'; +} + +@Component({ + selector: 'so-pending-approvals', + standalone: true, + imports: [CommonModule, RouterModule], + 
+  templateUrl: './pending-approvals.component.html',
+  styleUrl: './pending-approvals.component.scss',
+  changeDetection: ChangeDetectionStrategy.OnPush
+})
+export class PendingApprovalsComponent {
+  @Input() approvals: PendingApproval[] | null = null;
+  @Input() loading = false;
+  @Output() approve = new EventEmitter<string>();
+  @Output() reject = new EventEmitter<string>();
+
+  getUrgencyClass(urgency: string): string {
+    return `approval--${urgency}`;
+  }
+
+  onQuickApprove(event: Event, id: string): void {
+    event.preventDefault();
+    event.stopPropagation();
+    this.approve.emit(id);
+  }
+
+  onQuickReject(event: Event, id: string): void {
+    event.preventDefault();
+    event.stopPropagation();
+    this.reject.emit(id);
+  }
+}
+```
+
+```html
+<!-- pending-approvals.component.html (illustrative sketch) -->
+<div class="panel pending-approvals">
+  <div class="panel__header">
+    <h2>Pending Approvals</h2>
+  </div>
+  <ul *ngIf="approvals?.length; else empty">
+    <li *ngFor="let approval of approvals" [ngClass]="getUrgencyClass(approval.urgency)">
+      <span>
+        {{ approval.releaseName }}:
+        {{ approval.sourceEnvironment }} → {{ approval.targetEnvironment }}
+      </span>
+      <span>Requested by {{ approval.requestedBy }}, {{ approval.requestedAt | date:'short' }}</span>
+      <button type="button" (click)="onQuickApprove($event, approval.id)">Approve</button>
+      <button type="button" (click)="onQuickReject($event, approval.id)">Reject</button>
+    </li>
+  </ul>
+  <ng-template #empty>
+    <p class="panel__empty">No pending approvals</p>
+  </ng-template>
+</div>
+```
+
+### Active Deployments Component
+
+```typescript
+// active-deployments.component.ts
+import { Component, Input, ChangeDetectionStrategy } from '@angular/core';
+import { CommonModule } from '@angular/common';
+import { RouterModule } from '@angular/router';
+
+export interface ActiveDeployment {
+  id: string;
+  releaseId: string;
+  releaseName: string;
+  environment: string;
+  progress: number;
+  status: 'running' | 'paused' | 'waiting';
+  startedAt: Date;
+  completedTargets: number;
+  totalTargets: number;
+}
+
+@Component({
+  selector: 'so-active-deployments',
+  standalone: true,
+  imports: [CommonModule, RouterModule],
+  templateUrl: './active-deployments.component.html',
+  styleUrl: './active-deployments.component.scss',
+  changeDetection: ChangeDetectionStrategy.OnPush
+})
+export class ActiveDeploymentsComponent {
+  @Input() deployments: ActiveDeployment[] | null = null;
+  @Input() loading = false;
+
+  getStatusIcon(status: string): string {
+    switch (status) {
+      case 'running': return 'pi-spin pi-spinner';
+      case 'paused': return 'pi-pause';
+      case 'waiting': return 'pi-clock';
+      default: return 'pi-question';
+    }
+  }
+
+  getDuration(startedAt: Date): string {
+    const diff = Date.now() - new Date(startedAt).getTime();
+    const minutes = Math.floor(diff / 60000);
+    const seconds = Math.floor((diff % 60000) / 1000);
+    return `${minutes}m ${seconds}s`;
+  }
+}
+```
+
+```html
+<!-- active-deployments.component.html: template markup lost in extraction -->
+```
+
+### Recent Releases Component
+
+```typescript
+// recent-releases.component.ts
+import { Component, Input, ChangeDetectionStrategy } from '@angular/core';
+import { CommonModule } from '@angular/common';
+import { RouterModule } from '@angular/router';
+
+export interface RecentRelease {
+  id: string;
+  name: string;
+  version: string;
+  status: 'draft' | 'ready' | 'deploying' | 'deployed' | 'failed' | 'rolled_back';
+  currentEnvironment: string | null;
+  createdAt: Date;
+  createdBy: string;
+  componentCount: number;
+}
+
+@Component({
+  selector: 'so-recent-releases',
+  standalone: true,
+  imports: [CommonModule, RouterModule],
+  templateUrl: './recent-releases.component.html',
+  styleUrl: './recent-releases.component.scss',
+  changeDetection: ChangeDetectionStrategy.OnPush
+})
+export class RecentReleasesComponent {
+  @Input() releases: RecentRelease[] | null = null;
+  @Input() loading = false;
+
+  getStatusBadgeClass(status: string): string {
+    const map: Record<string, string> = {
+      draft: 'badge--secondary',
+      ready: 'badge--info',
+      deploying: 'badge--warning',
+      deployed: 'badge--success',
+      failed: 'badge--danger',
+      rolled_back: 'badge--warning'
+    };
+    return map[status] || 'badge--secondary';
+  }
+
+  formatStatus(status: string): string {
+    return status.replace('_', ' ').replace(/\b\w/g, c => c.toUpperCase());
+  }
+}
+```
+
+```html
+<!-- recent-releases.component.html: markup lost in extraction. Recoverable content:
+     header "Recent Releases" with a "View all" link;
+     table columns Release | Status | Environment | Components | Created;
+     row cells {{ release.name }} {{ release.version }}, {{ formatStatus(release.status) }},
+     {{ release.currentEnvironment || '-' }}, {{ release.componentCount }},
+     {{ release.createdAt | date:'short' }};
+     empty state "No releases found". -->
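The store and service sections below fold SignalR partial updates into dashboard state one slice at a time, stamping `lastUpdated` on every change. As a framework-free illustration of that merge semantics (the `applyUpdate` helper and the trimmed state shape here are illustrative, not part of the spec):

```typescript
// Minimal stand-in for the dashboard state (illustrative shape only).
interface MiniDashboardState {
  pendingApprovals: string[];
  activeDeployments: string[];
  lastUpdated: Date | null;
}

// Merge a partial update immutably and stamp the update time -- the same
// pattern each updateApprovals/updateDeployments reducer case follows.
function applyUpdate(
  state: MiniDashboardState,
  patch: Partial<Omit<MiniDashboardState, 'lastUpdated'>>
): MiniDashboardState {
  return { ...state, ...patch, lastUpdated: new Date() };
}

const initial: MiniDashboardState = {
  pendingApprovals: [],
  activeDeployments: [],
  lastUpdated: null
};
const next = applyUpdate(initial, { pendingApprovals: ['apr-1'] });
```

Slices not named in the patch (`activeDeployments` above) are carried over unchanged, which is what lets each SignalR stream update independently.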
+```
+
+### Dashboard Store
+
+```typescript
+// dashboard.actions.ts
+import { createActionGroup, emptyProps, props } from '@ngrx/store';
+import { PipelineData, PendingApproval, ActiveDeployment, RecentRelease } from '../models';
+
+export const DashboardActions = createActionGroup({
+  source: 'Dashboard',
+  events: {
+    'Load Dashboard': emptyProps(),
+    'Load Dashboard Success': props<{
+      pipelineData: PipelineData;
+      pendingApprovals: PendingApproval[];
+      activeDeployments: ActiveDeployment[];
+      recentReleases: RecentRelease[];
+    }>(),
+    'Load Dashboard Failure': props<{ error: string }>(),
+    'Subscribe To Updates': emptyProps(),
+    'Unsubscribe From Updates': emptyProps(),
+    'Update Pipeline': props<{ pipelineData: PipelineData }>(),
+    'Update Approvals': props<{ approvals: PendingApproval[] }>(),
+    'Update Deployments': props<{ deployments: ActiveDeployment[] }>(),
+    'Update Releases': props<{ releases: RecentRelease[] }>(),
+  }
+});
+
+// dashboard.reducer.ts
+import { createReducer, on } from '@ngrx/store';
+import { DashboardActions } from './dashboard.actions';
+import { PipelineData, PendingApproval, ActiveDeployment, RecentRelease } from '../models';
+
+export interface DashboardState {
+  pipelineData: PipelineData | null;
+  pendingApprovals: PendingApproval[];
+  activeDeployments: ActiveDeployment[];
+  recentReleases: RecentRelease[];
+  loading: boolean;
+  error: string | null;
+  lastUpdated: Date | null;
+}
+
+const initialState: DashboardState = {
+  pipelineData: null,
+  pendingApprovals: [],
+  activeDeployments: [],
+  recentReleases: [],
+  loading: false,
+  error: null,
+  lastUpdated: null
+};
+
+export const dashboardReducer = createReducer(
+  initialState,
+  on(DashboardActions.loadDashboard, (state) => ({
+    ...state,
+    loading: true,
+    error: null
+  })),
+  on(DashboardActions.loadDashboardSuccess, (state, { pipelineData, pendingApprovals, activeDeployments, recentReleases }) => ({
+    ...state,
+    pipelineData,
+    pendingApprovals,
+    activeDeployments,
+    recentReleases,
+    loading: false,
+    lastUpdated: new Date()
+  })),
+  on(DashboardActions.loadDashboardFailure, (state, { error }) => ({
+    ...state,
+    loading: false,
+    error
+  })),
+  on(DashboardActions.updatePipeline, (state, { pipelineData }) => ({
+    ...state,
+    pipelineData,
+    lastUpdated: new Date()
+  })),
+  on(DashboardActions.updateApprovals, (state, { approvals }) => ({
+    ...state,
+    pendingApprovals: approvals,
+    lastUpdated: new Date()
+  })),
+  on(DashboardActions.updateDeployments, (state, { deployments }) => ({
+    ...state,
+    activeDeployments: deployments,
+    lastUpdated: new Date()
+  })),
+  on(DashboardActions.updateReleases, (state, { releases }) => ({
+    ...state,
+    recentReleases: releases,
+    lastUpdated: new Date()
+  }))
+);
+
+// dashboard.selectors.ts
+import { createFeatureSelector, createSelector } from '@ngrx/store';
+import { DashboardState } from './dashboard.reducer';
+
+export const selectDashboardState = createFeatureSelector<DashboardState>('dashboard');
+
+export const selectLoading = createSelector(selectDashboardState, state => state.loading);
+export const selectError = createSelector(selectDashboardState, state => state.error);
+export const selectPipelineData = createSelector(selectDashboardState, state => state.pipelineData);
+export const selectPendingApprovals = createSelector(selectDashboardState, state => state.pendingApprovals);
+export const selectActiveDeployments = createSelector(selectDashboardState, state => state.activeDeployments);
+export const selectRecentReleases = createSelector(selectDashboardState, state => state.recentReleases);
+export const selectLastUpdated = createSelector(selectDashboardState, state => state.lastUpdated);
+export const selectPendingApprovalCount = createSelector(selectPendingApprovals, approvals => approvals.length);
+export const selectActiveDeploymentCount = createSelector(selectActiveDeployments, deployments => deployments.length);
+```
+
+### Dashboard Service
+
+```typescript
+// dashboard.service.ts
+import { Injectable, inject } from '@angular/core';
+import {
+  HttpClient } from '@angular/common/http';
+import { Observable, Subject, takeUntil } from 'rxjs';
+import { HubConnection, HubConnectionBuilder } from '@microsoft/signalr';
+import { environment } from '@env/environment';
+import { PipelineData, PendingApproval, ActiveDeployment, RecentRelease } from '../models';
+
+export interface DashboardData {
+  pipelineData: PipelineData;
+  pendingApprovals: PendingApproval[];
+  activeDeployments: ActiveDeployment[];
+  recentReleases: RecentRelease[];
+}
+
+@Injectable({ providedIn: 'root' })
+export class DashboardService {
+  private readonly http = inject(HttpClient);
+  private readonly baseUrl = `${environment.apiUrl}/api/v1/release-orchestrator/dashboard`;
+  private hubConnection: HubConnection | null = null;
+  private readonly updates$ = new Subject<Partial<DashboardData>>();
+
+  getDashboardData(): Observable<DashboardData> {
+    return this.http.get<DashboardData>(this.baseUrl);
+  }
+
+  subscribeToUpdates(): Observable<Partial<DashboardData>> {
+    if (!this.hubConnection) {
+      this.hubConnection = new HubConnectionBuilder()
+        .withUrl(`${environment.apiUrl}/hubs/dashboard`)
+        .withAutomaticReconnect()
+        .build();
+
+      this.hubConnection.on('PipelineUpdated', (data) => {
+        this.updates$.next({ pipelineData: data });
+      });
+
+      this.hubConnection.on('ApprovalsUpdated', (data) => {
+        this.updates$.next({ pendingApprovals: data });
+      });
+
+      this.hubConnection.on('DeploymentsUpdated', (data) => {
+        this.updates$.next({ activeDeployments: data });
+      });
+
+      this.hubConnection.on('ReleasesUpdated', (data) => {
+        this.updates$.next({ recentReleases: data });
+      });
+
+      this.hubConnection.start().catch(err => console.error('SignalR connection error:', err));
+    }
+
+    return this.updates$.asObservable();
+  }
+
+  unsubscribeFromUpdates(): void {
+    if (this.hubConnection) {
+      this.hubConnection.stop();
+      this.hubConnection = null;
+    }
+  }
+}
+```
+
+### Documentation Deliverables
+
+| Deliverable | Type | Description |
+|-------------|------|-------------|
+| `docs/modules/release-orchestrator/api/websockets.md` | Markdown | WebSocket/SSE endpoint documentation for real-time updates (workflow runs, deployments, dashboard metrics, agent tasks) |
+| `docs/modules/release-orchestrator/ui/dashboard.md` | Markdown | Dashboard specification with layout, metrics, TypeScript interfaces |
+
+---
+
+## Acceptance Criteria
+
+### Code
+
+- [ ] Dashboard loads within 2 seconds
+- [ ] Pipeline overview shows all environments
+- [ ] Environment health status displayed correctly
+- [ ] Pending approvals show count badge
+- [ ] Quick approve/reject actions work
+- [ ] Active deployments show progress
+- [ ] Recent releases table paginated
+- [ ] Real-time updates via SignalR
+- [ ] Loading skeletons shown during fetch
+- [ ] Error messages displayed appropriately
+- [ ] Responsive layout on tablet/desktop
+- [ ] Unit test coverage >=80%
+
+### Documentation
+
+- [ ] WebSocket API documentation file created
+- [ ] All 4 real-time streams documented (workflow, deployment, dashboard, agent)
+- [ ] WebSocket authentication flow documented
+- [ ] Message format schemas included
+- [ ] Dashboard specification file created
+- [ ] Dashboard layout diagram included
+- [ ] Metrics TypeScript interfaces documented
+
+---
+
+## Dependencies
+
+| Dependency | Type | Status |
+|------------|------|--------|
+| 107_001 Platform API Gateway | Internal | TODO |
+| Angular 17 | External | Available |
+| NgRx 17 | External | Available |
+| PrimeNG 17 | External | Available |
+| SignalR Client | External | Available |
+
+---
+
+## Delivery Tracker
+
+| Deliverable | Status | Notes |
+|-------------|--------|-------|
+| DashboardComponent | TODO | |
+| PipelineOverviewComponent | TODO | |
+| PendingApprovalsComponent | TODO | |
+| ActiveDeploymentsComponent | TODO | |
+| RecentReleasesComponent | TODO | |
+| Dashboard NgRx Store | TODO | |
+| DashboardService | TODO | |
+| SignalR integration | TODO | |
+| Unit tests | TODO | |
+
+---
+
+## Execution Log
+
+| Date | Entry |
+|------|-------|
+| 10-Jan-2026 | Sprint created |
+| 11-Jan-2026 | Added documentation deliverables: api/websockets.md, ui/dashboard.md |
diff --git a/docs/implplan/SPRINT_20260110_111_002_FE_environment_management_ui.md b/docs/implplan/SPRINT_20260110_111_002_FE_environment_management_ui.md
new file mode 100644
index 000000000..ebdc141e9
--- /dev/null
+++ b/docs/implplan/SPRINT_20260110_111_002_FE_environment_management_ui.md
@@ -0,0 +1,993 @@
+# SPRINT: Environment Management UI
+
+> **Sprint ID:** 111_002
+> **Module:** FE
+> **Phase:** 11 - UI Implementation
+> **Status:** TODO
+> **Parent:** [111_000_INDEX](SPRINT_20260110_111_000_INDEX_ui_implementation.md)
+
+---
+
+## Overview
+
+Implement the Environment Management UI providing CRUD operations for environments, target management, freeze window configuration, and environment settings.
+
+### Objectives
+
+- Environment list with hierarchy visualization
+- Environment detail with targets and settings
+- Target management (add/remove/health)
+- Freeze window editor
+- Environment settings configuration
+
+### Working Directory
+
+```
+src/Web/StellaOps.Web/
+├── src/app/features/release-orchestrator/
+│   └── environments/
+│       ├── environment-list/
+│       │   ├── environment-list.component.ts
+│       │   ├── environment-list.component.html
+│       │   └── environment-list.component.scss
+│       ├── environment-detail/
+│       │   ├── environment-detail.component.ts
+│       │   ├── environment-detail.component.html
+│       │   └── environment-detail.component.scss
+│       ├── components/
+│       │   ├── target-list/
+│       │   ├── target-form/
+│       │   ├── freeze-window-editor/
+│       │   ├── environment-settings/
+│       │   └── environment-form/
+│       ├── services/
+│       │   └── environment.service.ts
+│       └── environments.routes.ts
+└── src/app/store/release-orchestrator/
+    └── environments/
+        ├── environments.actions.ts
+        ├── environments.reducer.ts
+        ├── environments.effects.ts
+        └── environments.selectors.ts
+```
+
+---
+
+## Deliverables
+
+### Environment List Component
+
+```typescript
+// environment-list.component.ts
+import { Component, OnInit, inject, ChangeDetectionStrategy, signal, computed } from
+  '@angular/core';
+import { CommonModule } from '@angular/common';
+import { RouterModule } from '@angular/router';
+import { Store } from '@ngrx/store';
+import { DialogService, DynamicDialogRef } from 'primeng/dynamicdialog';
+import { ConfirmationService } from 'primeng/api';
+import { EnvironmentActions } from '@store/release-orchestrator/environments/environments.actions';
+import * as EnvironmentSelectors from '@store/release-orchestrator/environments/environments.selectors';
+import { EnvironmentFormComponent } from '../components/environment-form/environment-form.component';
+
+export interface Environment {
+  id: string;
+  name: string;
+  description: string;
+  order: number;
+  isProduction: boolean;
+  targetCount: number;
+  healthyTargetCount: number;
+  requiresApproval: boolean;
+  requiredApprovers: number;
+  freezeWindowCount: number;
+  activeFreezeWindow: boolean;
+  createdAt: Date;
+  updatedAt: Date;
+}
+
+@Component({
+  selector: 'so-environment-list',
+  standalone: true,
+  imports: [CommonModule, RouterModule],
+  templateUrl: './environment-list.component.html',
+  styleUrl: './environment-list.component.scss',
+  changeDetection: ChangeDetectionStrategy.OnPush,
+  providers: [DialogService, ConfirmationService]
+})
+export class EnvironmentListComponent implements OnInit {
+  private readonly store = inject(Store);
+  private readonly dialogService = inject(DialogService);
+  private readonly confirmationService = inject(ConfirmationService);
+
+  readonly environments$ = this.store.select(EnvironmentSelectors.selectAllEnvironments);
+  readonly loading$ = this.store.select(EnvironmentSelectors.selectLoading);
+  readonly error$ = this.store.select(EnvironmentSelectors.selectError);
+
+  searchTerm = signal('');
+
+  ngOnInit(): void {
+    this.store.dispatch(EnvironmentActions.loadEnvironments());
+  }
+
+  onSearch(term: string): void {
+    this.searchTerm.set(term);
+  }
+
+  onCreate(): void {
+    const ref = this.dialogService.open(EnvironmentFormComponent, {
+      header: 'Create Environment',
+      width: '600px',
+      data: { mode: 'create' }
+    });
+
+    ref.onClose.subscribe((result) => {
+      if (result) {
+        this.store.dispatch(EnvironmentActions.createEnvironment({ request: result }));
+      }
+    });
+  }
+
+  onDelete(env: Environment): void {
+    this.confirmationService.confirm({
+      message: `Are you sure you want to delete "${env.name}"? This action cannot be undone.`,
+      header: 'Delete Environment',
+      icon: 'pi pi-exclamation-triangle',
+      acceptButtonStyleClass: 'p-button-danger',
+      accept: () => {
+        this.store.dispatch(EnvironmentActions.deleteEnvironment({ id: env.id }));
+      }
+    });
+  }
+
+  getHealthPercentage(env: Environment): number {
+    if (env.targetCount === 0) return 100;
+    return Math.round((env.healthyTargetCount / env.targetCount) * 100);
+  }
+
+  getHealthClass(env: Environment): string {
+    const pct = this.getHealthPercentage(env);
+    if (pct >= 90) return 'health--good';
+    if (pct >= 70) return 'health--warning';
+    return 'health--critical';
+  }
+}
+```
+
+```html
+<!-- environment-list.component.html: markup lost in extraction. Recoverable content:
+     page header "Environments" with a search input and a create button;
+     per-environment cards showing #{{ env.order }}, {{ env.name }}, a "Production" tag,
+     {{ env.description }}, and stats {{ env.targetCount }} Targets,
+     {{ getHealthPercentage(env) }}% Healthy, {{ env.requiredApprovers }} Approvers;
+     empty state "No environments yet" /
+     "Create your first environment to start managing releases." -->
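Each environment carries an `order` field, and the promotion pipeline (Dev → Stage → Prod) is simply the environments sorted by that field. A sketch of deriving the next promotion target from `order` (the `nextEnvironment` helper is illustrative, not part of the spec):

```typescript
// Trimmed environment shape; only the fields needed for ordering.
interface EnvStage { id: string; name: string; order: number; }

// Sort by order, then return the environment that follows the given one,
// or null when it is already last in the pipeline (or unknown).
function nextEnvironment(envs: EnvStage[], currentId: string): EnvStage | null {
  const sorted = [...envs].sort((a, b) => a.order - b.order);
  const idx = sorted.findIndex(e => e.id === currentId);
  if (idx === -1 || idx === sorted.length - 1) return null;
  return sorted[idx + 1];
}

const stages: EnvStage[] = [
  { id: 'p', name: 'Prod', order: 3 },
  { id: 'd', name: 'Dev', order: 1 },
  { id: 's', name: 'Stage', order: 2 },
];
```

Sorting a copy keeps the input array untouched, which matters when the list comes straight out of the store.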
+```
+
+### Environment Detail Component
+
+```typescript
+// environment-detail.component.ts
+import { Component, OnInit, inject, ChangeDetectionStrategy } from '@angular/core';
+import { CommonModule } from '@angular/common';
+import { ActivatedRoute, RouterModule } from '@angular/router';
+import { Store } from '@ngrx/store';
+import { TabViewModule } from 'primeng/tabview';
+import { TargetListComponent } from '../components/target-list/target-list.component';
+import { FreezeWindowEditorComponent } from '../components/freeze-window-editor/freeze-window-editor.component';
+import { EnvironmentSettingsComponent } from '../components/environment-settings/environment-settings.component';
+import { EnvironmentActions } from '@store/release-orchestrator/environments/environments.actions';
+import * as EnvironmentSelectors from '@store/release-orchestrator/environments/environments.selectors';
+
+@Component({
+  selector: 'so-environment-detail',
+  standalone: true,
+  imports: [
+    CommonModule,
+    RouterModule,
+    TabViewModule,
+    TargetListComponent,
+    FreezeWindowEditorComponent,
+    EnvironmentSettingsComponent
+  ],
+  templateUrl: './environment-detail.component.html',
+  styleUrl: './environment-detail.component.scss',
+  changeDetection: ChangeDetectionStrategy.OnPush
+})
+export class EnvironmentDetailComponent implements OnInit {
+  private readonly store = inject(Store);
+  private readonly route = inject(ActivatedRoute);
+
+  readonly environment$ = this.store.select(EnvironmentSelectors.selectSelectedEnvironment);
+  readonly targets$ = this.store.select(EnvironmentSelectors.selectSelectedEnvironmentTargets);
+  readonly freezeWindows$ = this.store.select(EnvironmentSelectors.selectSelectedEnvironmentFreezeWindows);
+  readonly loading$ = this.store.select(EnvironmentSelectors.selectLoading);
+
+  activeTabIndex = 0;
+
+  ngOnInit(): void {
+    const id = this.route.snapshot.paramMap.get('id');
+    if (id) {
+      this.store.dispatch(EnvironmentActions.loadEnvironment({ id
+      }));
+      this.store.dispatch(EnvironmentActions.loadEnvironmentTargets({ environmentId: id }));
+      this.store.dispatch(EnvironmentActions.loadFreezeWindows({ environmentId: id }));
+    }
+  }
+
+  onTabChange(index: number): void {
+    this.activeTabIndex = index;
+  }
+}
+```
+
+```html
+<!-- environment-detail.component.html: markup lost in extraction. Recoverable content:
+     breadcrumb "Environments" → {{ env.name }};
+     header {{ env.name }} with a "Production" tag and {{ env.description }};
+     stat cards {{ env.targetCount }} Deployment Targets,
+     {{ env.requiredApprovers }} Required Approvers,
+     {{ env.freezeWindowCount }} Freeze Windows;
+     a tab view hosting the target list, freeze window editor, and settings panels. -->
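The detail header counts healthy targets, while the dashboard's `PipelineEnvironment.healthStatus` is one of healthy/degraded/unhealthy/unknown. One plausible way to roll individual target health up into that environment-level status (the thresholds and helper name are assumptions, not taken from the spec):

```typescript
type TargetHealth = 'healthy' | 'unhealthy' | 'unknown';
type EnvHealth = 'healthy' | 'degraded' | 'unhealthy' | 'unknown';

// All known targets healthy -> healthy; none healthy -> unhealthy;
// a mix -> degraded; no targets (or all unknown) -> unknown.
function aggregateHealth(targets: TargetHealth[]): EnvHealth {
  const known = targets.filter(t => t !== 'unknown');
  if (known.length === 0) return 'unknown';
  const healthy = known.filter(t => t === 'healthy').length;
  if (healthy === known.length) return 'healthy';
  if (healthy === 0) return 'unhealthy';
  return 'degraded';
}
```

Filtering out `unknown` first keeps a single unreachable agent from flipping an otherwise healthy environment to degraded.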
+```
+
+### Target List Component
+
+```typescript
+// target-list.component.ts
+import { Component, Input, inject, ChangeDetectionStrategy } from '@angular/core';
+import { CommonModule } from '@angular/common';
+import { Store } from '@ngrx/store';
+import { DialogService } from 'primeng/dynamicdialog';
+import { ConfirmationService, MessageService } from 'primeng/api';
+import { TargetFormComponent } from '../target-form/target-form.component';
+import { EnvironmentActions } from '@store/release-orchestrator/environments/environments.actions';
+
+export interface DeploymentTarget {
+  id: string;
+  environmentId: string;
+  name: string;
+  type: 'docker_host' | 'compose_host' | 'ecs_service' | 'nomad_job';
+  agentId: string | null;
+  agentStatus: 'connected' | 'disconnected' | 'unknown';
+  healthStatus: 'healthy' | 'unhealthy' | 'unknown';
+  lastHealthCheck: Date | null;
+  metadata: Record<string, string>;
+  createdAt: Date;
+}
+
+@Component({
+  selector: 'so-target-list',
+  standalone: true,
+  imports: [CommonModule],
+  templateUrl: './target-list.component.html',
+  styleUrl: './target-list.component.scss',
+  changeDetection: ChangeDetectionStrategy.OnPush,
+  providers: [DialogService, ConfirmationService, MessageService]
+})
+export class TargetListComponent {
+  @Input() targets: DeploymentTarget[] | null = null;
+  @Input() environmentId: string = '';
+  @Input() loading = false;
+
+  private readonly store = inject(Store);
+  private readonly dialogService = inject(DialogService);
+  private readonly confirmationService = inject(ConfirmationService);
+
+  onAddTarget(): void {
+    const ref = this.dialogService.open(TargetFormComponent, {
+      header: 'Add Deployment Target',
+      width: '600px',
+      data: { environmentId: this.environmentId, mode: 'create' }
+    });
+
+    ref.onClose.subscribe((result) => {
+      if (result) {
+        this.store.dispatch(EnvironmentActions.addTarget({
+          environmentId: this.environmentId,
+          request: result
+        }));
+      }
+    });
+  }
+
+  onEditTarget(target: DeploymentTarget):
+  void {
+    const ref = this.dialogService.open(TargetFormComponent, {
+      header: 'Edit Deployment Target',
+      width: '600px',
+      data: { environmentId: this.environmentId, target, mode: 'edit' }
+    });
+
+    ref.onClose.subscribe((result) => {
+      if (result) {
+        this.store.dispatch(EnvironmentActions.updateTarget({
+          environmentId: this.environmentId,
+          targetId: target.id,
+          request: result
+        }));
+      }
+    });
+  }
+
+  onRemoveTarget(target: DeploymentTarget): void {
+    this.confirmationService.confirm({
+      message: `Remove target "${target.name}" from this environment?`,
+      header: 'Remove Target',
+      icon: 'pi pi-exclamation-triangle',
+      accept: () => {
+        this.store.dispatch(EnvironmentActions.removeTarget({
+          environmentId: this.environmentId,
+          targetId: target.id
+        }));
+      }
+    });
+  }
+
+  onHealthCheck(target: DeploymentTarget): void {
+    this.store.dispatch(EnvironmentActions.checkTargetHealth({
+      environmentId: this.environmentId,
+      targetId: target.id
+    }));
+  }
+
+  getTypeIcon(type: string): string {
+    const icons: Record<string, string> = {
+      docker_host: 'pi-box',
+      compose_host: 'pi-th-large',
+      ecs_service: 'pi-cloud',
+      nomad_job: 'pi-sitemap'
+    };
+    return icons[type] || 'pi-server';
+  }
+
+  getHealthClass(status: string): string {
+    return `health-badge--${status}`;
+  }
+
+  getAgentStatusClass(status: string): string {
+    return `agent-status--${status}`;
+  }
+}
+```
+
+```html
+<!-- target-list.component.html: markup lost in extraction. Recoverable content:
+     header "Deployment Targets" with an add button;
+     table columns Name | Type | Agent | Health | Last Check | Actions;
+     row cells {{ target.name }}, {{ target.type | titlecase }},
+     {{ target.agentId || 'Not assigned' }}, {{ target.healthStatus | titlecase }},
+     {{ target.lastHealthCheck | date:'short' }};
+     empty state "No deployment targets configured". -->
+```
+
+### Freeze Window Editor Component
+
+```typescript
+// freeze-window-editor.component.ts
+import { Component, Input, inject, ChangeDetectionStrategy } from '@angular/core';
+import { CommonModule } from '@angular/common';
+import { FormBuilder, FormGroup, ReactiveFormsModule, Validators } from '@angular/forms';
+import { Store } from '@ngrx/store';
+import { DialogService } from 'primeng/dynamicdialog';
+import { EnvironmentActions } from '@store/release-orchestrator/environments/environments.actions';
+
+export interface FreezeWindow {
+  id: string;
+  environmentId: string;
+  name: string;
+  reason: string;
+  startTime: Date;
+  endTime: Date;
+  recurrence: 'none' | 'daily' | 'weekly' | 'monthly';
+  isActive: boolean;
+  createdBy: string;
+  createdAt: Date;
+}
+
+@Component({
+  selector: 'so-freeze-window-editor',
+  standalone: true,
+  imports: [CommonModule, ReactiveFormsModule],
+  templateUrl: './freeze-window-editor.component.html',
+  styleUrl: './freeze-window-editor.component.scss',
+  changeDetection: ChangeDetectionStrategy.OnPush,
+  providers: [DialogService]
+})
+export class FreezeWindowEditorComponent {
+  @Input() freezeWindows: FreezeWindow[] | null = null;
+  @Input() environmentId: string = '';
+  @Input() loading = false;
+
+  private readonly store = inject(Store);
+  private readonly fb = inject(FormBuilder);
+  private readonly dialogService = inject(DialogService);
+
+  showForm = false;
+  editingId: string | null = null;
+
+  form: FormGroup = this.fb.group({
+    name: ['', [Validators.required, Validators.maxLength(100)]],
+    reason: ['', [Validators.required, Validators.maxLength(500)]],
+    startTime: [null, Validators.required],
+    endTime: [null, Validators.required],
+    recurrence: ['none']
+  });
+
+  recurrenceOptions = [
+    { label: 'None (One-time)', value: 'none' },
+    { label: 'Daily', value: 'daily' },
+    { label: 'Weekly', value: 'weekly' },
+    { label: 'Monthly', value: 'monthly' }
+  ];
+
+  onAdd(): void {
+    this.showForm = true;
+    this.editingId = null;
+    this.form.reset({ recurrence: 'none' });
+  }
+
+  onEdit(window: FreezeWindow): void {
+    this.showForm = true;
+    this.editingId = window.id;
+    this.form.patchValue({
+      name: window.name,
+      reason: window.reason,
+      startTime: new Date(window.startTime),
+      endTime: new Date(window.endTime),
+      recurrence: window.recurrence
+    });
+  }
+
+  onCancel(): void {
+    this.showForm = false;
+    this.editingId = null;
+    this.form.reset();
+  }
+
+  onSave(): void {
+    if (this.form.invalid) return;
+
+    const value = this.form.value;
+    if (this.editingId) {
+      this.store.dispatch(EnvironmentActions.updateFreezeWindow({
+        environmentId: this.environmentId,
+        windowId: this.editingId,
+        request: value
+      }));
+    } else {
+      this.store.dispatch(EnvironmentActions.createFreezeWindow({
+        environmentId: this.environmentId,
+        request: value
+      }));
+    }
+
+    this.onCancel();
+  }
+
+  onDelete(window: FreezeWindow): void {
+    this.store.dispatch(EnvironmentActions.deleteFreezeWindow({
+      environmentId: this.environmentId,
+      windowId: window.id
+    }));
+  }
+
+  isActiveNow(window: FreezeWindow): boolean {
+    const now = new Date();
+    return new Date(window.startTime) <= now && now <= new Date(window.endTime);
+  }
+
+  getRecurrenceLabel(value: string): string {
+    return this.recurrenceOptions.find(o => o.value === value)?.label || value;
+  }
+}
+```
+
+```html
+<!-- freeze-window-editor.component.html: markup lost in extraction. Recoverable content:
+     header "Freeze Windows" with an add button;
+     inline form for name, reason, start/end time, and recurrence;
+     window cards showing {{ window.name }} with an "Active" tag, {{ window.reason }},
+     {{ window.startTime | date:'medium' }} - {{ window.endTime | date:'medium' }}
+     ({{ getRecurrenceLabel(window.recurrence) }});
+     empty state "No freeze windows configured". -->
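The editor's `isActiveNow` compares only the literal start/end instants, so a weekly window would stop matching after its first occurrence even though the form persists a recurrence. A sketch of a recurrence-aware check (the fixed-interval arithmetic is a simplifying assumption; monthly is approximated as 30 days, where real code should use calendar math):

```typescript
type Recurrence = 'none' | 'daily' | 'weekly' | 'monthly';

const PERIOD_MS: Record<Exclude<Recurrence, 'none'>, number> = {
  daily: 24 * 60 * 60 * 1000,
  weekly: 7 * 24 * 60 * 60 * 1000,
  monthly: 30 * 24 * 60 * 60 * 1000, // approximation, not calendar-accurate
};

// Fold "now" back into the first occurrence's window by whole periods,
// then do the same start <= t <= end comparison isActiveNow uses.
function isFrozen(now: Date, start: Date, end: Date, recurrence: Recurrence): boolean {
  let t = now.getTime();
  const s = start.getTime();
  const e = end.getTime();
  if (recurrence !== 'none' && t > e) {
    const period = PERIOD_MS[recurrence];
    t = s + ((t - s) % period);
  }
  return s <= t && t <= e;
}
```

For a one-time window the behaviour is identical to the component's `isActiveNow`.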
+```
+
+### Environment Settings Component
+
+```typescript
+// environment-settings.component.ts
+import { Component, Input, inject, OnChanges, SimpleChanges, ChangeDetectionStrategy } from '@angular/core';
+import { CommonModule } from '@angular/common';
+import { FormBuilder, FormGroup, ReactiveFormsModule, Validators } from '@angular/forms';
+import { Store } from '@ngrx/store';
+import { EnvironmentActions } from '@store/release-orchestrator/environments/environments.actions';
+
+@Component({
+  selector: 'so-environment-settings',
+  standalone: true,
+  imports: [CommonModule, ReactiveFormsModule],
+  templateUrl: './environment-settings.component.html',
+  styleUrl: './environment-settings.component.scss',
+  changeDetection: ChangeDetectionStrategy.OnPush
+})
+export class EnvironmentSettingsComponent implements OnChanges {
+  @Input() environment: Environment | null = null;
+  @Input() loading = false;
+
+  private readonly store = inject(Store);
+  private readonly fb = inject(FormBuilder);
+
+  form: FormGroup = this.fb.group({
+    requiresApproval: [true],
+    requiredApprovers: [1, [Validators.min(0), Validators.max(10)]],
+    autoPromoteOnSuccess: [false],
+    separationOfDuties: [false],
+    notifyOnPromotion: [true],
+    notifyOnDeployment: [true],
+    notifyOnFailure: [true],
+    webhookUrl: [''],
+    maxConcurrentDeployments: [1, [Validators.min(1), Validators.max(100)]],
+    deploymentTimeout: [3600, [Validators.min(60), Validators.max(86400)]]
+  });
+
+  ngOnChanges(changes: SimpleChanges): void {
+    if (changes['environment'] && this.environment) {
+      this.form.patchValue({
+        requiresApproval: this.environment.requiresApproval,
+        requiredApprovers: this.environment.requiredApprovers,
+        autoPromoteOnSuccess: this.environment.autoPromoteOnSuccess,
+        separationOfDuties: this.environment.separationOfDuties,
+        notifyOnPromotion: this.environment.notifyOnPromotion,
+        notifyOnDeployment: this.environment.notifyOnDeployment,
+        notifyOnFailure: this.environment.notifyOnFailure,
+        webhookUrl: this.environment.webhookUrl || '',
+        maxConcurrentDeployments: this.environment.maxConcurrentDeployments,
+        deploymentTimeout: this.environment.deploymentTimeout
+      });
+    }
+  }
+
+  onSave(): void {
+    if (this.form.invalid || !this.environment) return;
+
+    this.store.dispatch(EnvironmentActions.updateEnvironmentSettings({
+      id: this.environment.id,
+      settings: this.form.value
+    }));
+  }
+
+  onReset(): void {
+    if (this.environment) {
+      this.ngOnChanges({ environment: { currentValue: this.environment } } as any);
+    }
+  }
+}
+```
+
+```html
+<!-- environment-settings.component.html: markup lost in extraction. Recoverable content:
+     form sections "Approval Settings", "Notifications", and "Deployment Limits";
+     timeout hint {{ form.get('deploymentTimeout')?.value / 60 | number:'1.0-0' }} minutes;
+     save and reset actions. -->
+``` + +### Documentation Deliverables + +| Deliverable | Type | Description | +|-------------|------|-------------| +| `docs/modules/release-orchestrator/ui/screens.md` (partial) | Markdown | Key UI screens reference (environment overview, release detail, "Why Blocked?" modal) | + +--- + +## Acceptance Criteria + +### Code + +- [ ] Environment list displays all environments +- [ ] Environment cards show health status +- [ ] Create environment dialog works +- [ ] Delete environment with confirmation +- [ ] Environment detail loads correctly +- [ ] Target list shows all targets +- [ ] Add/edit/remove targets works +- [ ] Target health check triggers +- [ ] Freeze window CRUD operations work +- [ ] Freeze window recurrence saves +- [ ] Active freeze window highlighted +- [ ] Environment settings save correctly +- [ ] Form validation works +- [ ] Unit test coverage >=80% + +### Documentation + +- [ ] UI screens specification file created +- [ ] Environment overview screen documented with ASCII mockup +- [ ] Target management screens documented +- [ ] Freeze window editor documented +- [ ] All screen wireframes included + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 103_001 Environment Model | Internal | TODO | +| Angular 17 | External | Available | +| NgRx 17 | External | Available | +| PrimeNG 17 | External | Available | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| EnvironmentListComponent | TODO | | +| EnvironmentDetailComponent | TODO | | +| TargetListComponent | TODO | | +| TargetFormComponent | TODO | | +| FreezeWindowEditorComponent | TODO | | +| EnvironmentSettingsComponent | TODO | | +| Environment NgRx Store | TODO | | +| EnvironmentService | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | +| 11-Jan-2026 | Added documentation deliverable: ui/screens.md (partial - 
environment screens) | diff --git a/docs/implplan/SPRINT_20260110_111_003_FE_release_management_ui.md b/docs/implplan/SPRINT_20260110_111_003_FE_release_management_ui.md new file mode 100644 index 000000000..0ecbc0faa --- /dev/null +++ b/docs/implplan/SPRINT_20260110_111_003_FE_release_management_ui.md @@ -0,0 +1,931 @@ +# SPRINT: Release Management UI + +> **Sprint ID:** 111_003 +> **Module:** FE +> **Phase:** 11 - UI Implementation +> **Status:** TODO +> **Parent:** [111_000_INDEX](SPRINT_20260110_111_000_INDEX_ui_implementation.md) + +--- + +## Overview + +Implement the Release Management UI providing release catalog, release detail views, release creation wizard, and component selection functionality. + +### Objectives + +- Release catalog with filtering and search +- Release detail view with components +- Create release wizard (multi-step) +- Component selector with registry integration +- Release status tracking +- Release bundle comparison + +### Working Directory + +``` +src/Web/StellaOps.Web/ +├── src/app/features/release-orchestrator/ +│ └── releases/ +│ ├── release-list/ +│ │ ├── release-list.component.ts +│ │ ├── release-list.component.html +│ │ └── release-list.component.scss +│ ├── release-detail/ +│ │ ├── release-detail.component.ts +│ │ ├── release-detail.component.html +│ │ └── release-detail.component.scss +│ ├── create-release/ +│ │ ├── create-release.component.ts +│ │ ├── steps/ +│ │ │ ├── basic-info-step/ +│ │ │ ├── component-selection-step/ +│ │ │ ├── configuration-step/ +│ │ │ └── review-step/ +│ │ └── create-release.routes.ts +│ ├── components/ +│ │ ├── component-selector/ +│ │ ├── component-list/ +│ │ ├── release-timeline/ +│ │ └── release-comparison/ +│ └── releases.routes.ts +└── src/app/store/release-orchestrator/ + └── releases/ + ├── releases.actions.ts + ├── releases.reducer.ts + ├── releases.effects.ts + └── releases.selectors.ts +``` + +--- + +## Deliverables + +### Release List Component + +```typescript +// 
release-list.component.ts
+import { Component, OnInit, inject, signal, ChangeDetectionStrategy } from '@angular/core';
+import { CommonModule } from '@angular/common';
+import { RouterModule } from '@angular/router';
+import { FormsModule } from '@angular/forms';
+import { Store } from '@ngrx/store';
+import { ReleaseActions } from '@store/release-orchestrator/releases/releases.actions';
+import * as ReleaseSelectors from '@store/release-orchestrator/releases/releases.selectors';
+
+export interface Release {
+  id: string;
+  name: string;
+  version: string;
+  description: string;
+  status: 'draft' | 'ready' | 'deploying' | 'deployed' | 'failed' | 'rolled_back';
+  currentEnvironment: string | null;
+  targetEnvironment: string | null;
+  componentCount: number;
+  createdAt: Date;
+  createdBy: string;
+  updatedAt: Date;
+  deployedAt: Date | null;
+}
+
+@Component({
+  selector: 'so-release-list',
+  standalone: true,
+  imports: [CommonModule, RouterModule, FormsModule],
+  templateUrl: './release-list.component.html',
+  styleUrl: './release-list.component.scss',
+  changeDetection: ChangeDetectionStrategy.OnPush
+})
+export class ReleaseListComponent implements OnInit {
+  private readonly store = inject(Store);
+
+  readonly releases$ = this.store.select(ReleaseSelectors.selectFilteredReleases);
+  readonly loading$ = this.store.select(ReleaseSelectors.selectLoading);
+  readonly totalCount$ = this.store.select(ReleaseSelectors.selectTotalCount);
+
+  searchTerm = signal('');
+  statusFilter = signal<string[]>([]);
+  environmentFilter = signal<string | null>(null);
+  sortField = signal('createdAt');
+  sortOrder = signal<'asc' | 'desc'>('desc');
+
+  readonly statusOptions = [
+    { label: 'Draft', value: 'draft' },
+    { label: 'Ready', value: 'ready' },
+    { label: 'Deploying', value: 'deploying' },
+    { label: 'Deployed', value: 'deployed' },
+    { label: 'Failed', value: 'failed' },
+    { label: 'Rolled Back', value: 'rolled_back' }
+  ];
+
+  ngOnInit(): void {
+    this.loadReleases();
+  }
+
+  loadReleases(): void {
+    this.store.dispatch(ReleaseActions.loadReleases({
+      filter: {
+        search: this.searchTerm(),
+        statuses: this.statusFilter(),
+        environment: this.environmentFilter(),
+        sortField: this.sortField(),
+        sortOrder: this.sortOrder()
+      }
+    }));
+  }
+
+  onSearch(term: string): void {
+    this.searchTerm.set(term);
+    this.loadReleases();
+  }
+
+  onStatusFilterChange(statuses: string[]): void {
+    this.statusFilter.set(statuses);
+    this.loadReleases();
+  }
+
+  onSort(field: string): void {
+    if (this.sortField() === field) {
+      this.sortOrder.set(this.sortOrder() === 'asc' ? 'desc' : 'asc');
+    } else {
+      this.sortField.set(field);
+      this.sortOrder.set('desc');
+    }
+    this.loadReleases();
+  }
+
+  getStatusClass(status: string): string {
+    const classes: Record<string, string> = {
+      draft: 'badge--secondary',
+      ready: 'badge--info',
+      deploying: 'badge--warning',
+      deployed: 'badge--success',
+      failed: 'badge--danger',
+      rolled_back: 'badge--warning'
+    };
+    return classes[status] || 'badge--secondary';
+  }
+
+  formatStatus(status: string): string {
+    return status.replace('_', ' ').replace(/\b\w/g, c => c.toUpperCase());
+  }
+}
+```
+
+```html
+<!-- release-list.component.html -->
+<div class="release-list">
+  <header class="release-list__header">
+    <h1>Releases</h1>
+    <a class="btn btn--primary" [routerLink]="['/releases/new']">
+      <i class="pi pi-plus"></i> Create Release
+    </a>
+  </header>
+
+  <div class="release-list__filters">
+    <input type="text" placeholder="Search releases..."
+           [ngModel]="searchTerm()" (ngModelChange)="onSearch($event)" />
+    <select multiple [ngModel]="statusFilter()" (ngModelChange)="onStatusFilterChange($event)">
+      <option *ngFor="let opt of statusOptions" [value]="opt.value">{{ opt.label }}</option>
+    </select>
+  </div>
+
+  <ng-container *ngIf="releases$ | async as releases">
+    <table class="release-list__table" *ngIf="releases.length > 0">
+      <thead>
+        <tr>
+          <th (click)="onSort('name')">Release</th>
+          <th (click)="onSort('status')">Status</th>
+          <th>Environment</th>
+          <th>Components</th>
+          <th (click)="onSort('createdAt')">Created</th>
+          <th>Actions</th>
+        </tr>
+      </thead>
+      <tbody>
+        <tr *ngFor="let release of releases">
+          <td>
+            <span class="release-name">{{ release.name }}</span>
+            <span class="release-version">{{ release.version }}</span>
+            <p class="release-description">{{ release.description }}</p>
+          </td>
+          <td>
+            <span class="badge" [ngClass]="getStatusClass(release.status)">
+              {{ formatStatus(release.status) }}
+            </span>
+          </td>
+          <td>
+            <span *ngIf="release.currentEnvironment; else noEnv">{{ release.currentEnvironment }}</span>
+            <ng-template #noEnv>-</ng-template>
+          </td>
+          <td>{{ release.componentCount }}</td>
+          <td>
+            {{ release.createdAt | date:'short' }}
+            <span class="muted">by {{ release.createdBy }}</span>
+          </td>
+          <td>
+            <a class="btn btn--icon" [routerLink]="['/releases', release.id]">
+              <i class="pi pi-eye"></i>
+            </a>
+          </td>
+        </tr>
+      </tbody>
+    </table>
+
+    <div class="release-list__empty" *ngIf="releases.length === 0 && !(loading$ | async)">
+      <i class="pi pi-inbox"></i>
+      <h3>No releases found</h3>
+      <p>Create your first release to get started.</p>
+    </div>
+  </ng-container>
+</div>
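+```
+
+The filter object dispatched by `loadReleases()` can also be exercised client-side in unit tests. A minimal sketch (the `ReleaseRow`/`ReleaseFilter` shapes and the `applyReleaseFilter` helper are illustrative assumptions, not part of the store contract):
+
+```typescript
+interface ReleaseRow {
+  name: string;
+  status: string;
+  currentEnvironment: string | null;
+}
+
+interface ReleaseFilter {
+  search: string;
+  statuses: string[];
+  environment: string | null;
+}
+
+// Mirrors the filter criteria that loadReleases() sends to the effects layer.
+function applyReleaseFilter(rows: ReleaseRow[], f: ReleaseFilter): ReleaseRow[] {
+  return rows.filter(r =>
+    // free-text search matches the release name, case-insensitively
+    (!f.search || r.name.toLowerCase().includes(f.search.toLowerCase())) &&
+    // an empty status list means "all statuses"
+    (f.statuses.length === 0 || f.statuses.includes(r.status)) &&
+    // environment filter only applies when set
+    (f.environment === null || r.currentEnvironment === f.environment));
+}
+
+const rows: ReleaseRow[] = [
+  { name: 'api', status: 'deployed', currentEnvironment: 'prod' },
+  { name: 'web', status: 'draft', currentEnvironment: null }
+];
+console.assert(
+  applyReleaseFilter(rows, { search: '', statuses: ['deployed'], environment: null }).length === 1
+);
+```
+
+```html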
+```
+
+### Release Detail Component
+
+```typescript
+// release-detail.component.ts
+import { Component, OnInit, inject, ChangeDetectionStrategy } from '@angular/core';
+import { CommonModule } from '@angular/common';
+import { ActivatedRoute, RouterModule } from '@angular/router';
+import { Store } from '@ngrx/store';
+import { ConfirmationService } from 'primeng/api';
+import { ComponentListComponent } from '../components/component-list/component-list.component';
+import { ReleaseTimelineComponent } from '../components/release-timeline/release-timeline.component';
+import { ReleaseActions } from '@store/release-orchestrator/releases/releases.actions';
+import * as ReleaseSelectors from '@store/release-orchestrator/releases/releases.selectors';
+import { Release } from '../release-list/release-list.component';
+
+export interface ReleaseComponent {
+  id: string;
+  name: string;
+  imageRef: string;
+  digest: string;
+  tag: string | null;
+  version: string;
+  type: 'container' | 'helm' | 'script';
+  configOverrides: Record<string, unknown>;
+}
+
+export interface ReleaseEvent {
+  id: string;
+  type: 'created' | 'promoted' | 'approved' | 'rejected' | 'deployed' | 'failed' | 'rolled_back';
+  environment: string | null;
+  actor: string;
+  message: string;
+  timestamp: Date;
+  metadata: Record<string, unknown>;
+}
+
+@Component({
+  selector: 'so-release-detail',
+  standalone: true,
+  imports: [CommonModule, RouterModule, ComponentListComponent, ReleaseTimelineComponent],
+  templateUrl: './release-detail.component.html',
+  styleUrl: './release-detail.component.scss',
+  changeDetection: ChangeDetectionStrategy.OnPush,
+  providers: [ConfirmationService]
+})
+export class ReleaseDetailComponent implements OnInit {
+  private readonly store = inject(Store);
+  private readonly route = inject(ActivatedRoute);
+  private readonly confirmationService = inject(ConfirmationService);
+
+  readonly release$ = this.store.select(ReleaseSelectors.selectSelectedRelease);
+  readonly components$ = this.store.select(ReleaseSelectors.selectSelectedReleaseComponents);
+  readonly events$ =
this.store.select(ReleaseSelectors.selectSelectedReleaseEvents);
+  readonly loading$ = this.store.select(ReleaseSelectors.selectLoading);
+
+  activeTab = 'components';
+
+  ngOnInit(): void {
+    const id = this.route.snapshot.paramMap.get('id');
+    if (id) {
+      this.store.dispatch(ReleaseActions.loadRelease({ id }));
+      this.store.dispatch(ReleaseActions.loadReleaseComponents({ releaseId: id }));
+      this.store.dispatch(ReleaseActions.loadReleaseEvents({ releaseId: id }));
+    }
+  }
+
+  onPromote(release: Release): void {
+    this.store.dispatch(ReleaseActions.requestPromotion({ releaseId: release.id }));
+  }
+
+  onDeploy(release: Release): void {
+    this.confirmationService.confirm({
+      message: `Deploy "${release.name}" to ${release.targetEnvironment}?`,
+      header: 'Confirm Deployment',
+      icon: 'pi pi-exclamation-triangle',
+      accept: () => {
+        this.store.dispatch(ReleaseActions.deploy({ releaseId: release.id }));
+      }
+    });
+  }
+
+  onRollback(release: Release): void {
+    this.confirmationService.confirm({
+      message: `Rollback "${release.name}" from ${release.currentEnvironment}? This will restore the previous release.`,
+      header: 'Confirm Rollback',
+      icon: 'pi pi-exclamation-triangle',
+      acceptButtonStyleClass: 'p-button-danger',
+      accept: () => {
+        this.store.dispatch(ReleaseActions.rollback({ releaseId: release.id }));
+      }
+    });
+  }
+
+  canPromote(release: Release): boolean {
+    return release.status === 'ready' || release.status === 'deployed';
+  }
+
+  canDeploy(release: Release): boolean {
+    return release.status === 'ready' && release.targetEnvironment !== null;
+  }
+
+  canRollback(release: Release): boolean {
+    return release.status === 'deployed' || release.status === 'failed';
+  }
+
+  getStatusClass(status: string): string {
+    const classes: Record<string, string> = {
+      draft: 'badge--secondary',
+      ready: 'badge--info',
+      deploying: 'badge--warning',
+      deployed: 'badge--success',
+      failed: 'badge--danger',
+      rolled_back: 'badge--warning'
+    };
+    return classes[status] || 'badge--secondary';
+  }
+}
+```
+
+```html
+<!-- release-detail.component.html -->
+<div class="release-detail" *ngIf="release$ | async as release">
+  <nav class="breadcrumb">
+    <a [routerLink]="['/releases']">Releases</a>
+    <i class="pi pi-angle-right"></i>
+    <span>{{ release.name }}</span>
+  </nav>
+
+  <header class="release-detail__header">
+    <div class="release-detail__title">
+      <h1>{{ release.name }}</h1>
+      <span class="release-version">{{ release.version }}</span>
+      <span class="badge" [ngClass]="getStatusClass(release.status)">
+        {{ release.status | titlecase }}
+      </span>
+    </div>
+
+    <p class="release-detail__description">{{ release.description }}</p>
+
+    <div class="release-detail__actions">
+      <button class="btn btn--primary" *ngIf="canPromote(release)" (click)="onPromote(release)">
+        <i class="pi pi-arrow-right"></i> Promote
+      </button>
+      <button class="btn btn--primary" *ngIf="canDeploy(release)" (click)="onDeploy(release)">
+        <i class="pi pi-cloud-upload"></i> Deploy
+      </button>
+      <button class="btn btn--danger" *ngIf="canRollback(release)" (click)="onRollback(release)">
+        <i class="pi pi-undo"></i> Rollback
+      </button>
+    </div>
+
+    <dl class="release-detail__meta">
+      <div>
+        <i class="pi pi-user"></i>
+        <span>Created by {{ release.createdBy }}</span>
+      </div>
+      <div>
+        <i class="pi pi-calendar"></i>
+        <span>{{ release.createdAt | date:'medium' }}</span>
+      </div>
+      <div *ngIf="release.currentEnvironment">
+        <i class="pi pi-map-marker"></i>
+        <span>Currently in {{ release.currentEnvironment }}</span>
+      </div>
+      <div *ngIf="release.targetEnvironment">
+        <i class="pi pi-flag"></i>
+        <span>Target: {{ release.targetEnvironment }}</span>
+      </div>
+    </dl>
+  </header>
+
+  <nav class="tabs">
+    <button [class.active]="activeTab === 'components'" (click)="activeTab = 'components'">Components</button>
+    <button [class.active]="activeTab === 'timeline'" (click)="activeTab = 'timeline'">Timeline</button>
+    <button [class.active]="activeTab === 'raw'" (click)="activeTab = 'raw'">Raw</button>
+  </nav>
+
+  <section [ngSwitch]="activeTab">
+    <so-component-list *ngSwitchCase="'components'" [components]="components$ | async"></so-component-list>
+    <so-release-timeline *ngSwitchCase="'timeline'" [events]="events$ | async"></so-release-timeline>
+    <pre *ngSwitchCase="'raw'">{{ release | json }}</pre>
+  </section>
+</div>
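+```
+
+The `canPromote`/`canDeploy`/`canRollback` guards encode a small status machine. Collected in one place for unit tests (a sketch; the action table is inferred from the guards above, not a separate specification):
+
+```typescript
+type ReleaseStatus = 'draft' | 'ready' | 'deploying' | 'deployed' | 'failed' | 'rolled_back';
+
+// Allowed actions per status, matching the component's guard methods:
+// promote needs ready/deployed, deploy needs ready (plus a target
+// environment), rollback needs deployed/failed.
+const allowedActions: Record<ReleaseStatus, string[]> = {
+  draft: [],
+  ready: ['promote', 'deploy'],
+  deploying: [],
+  deployed: ['promote', 'rollback'],
+  failed: ['rollback'],
+  rolled_back: []
+};
+
+function canPerform(status: ReleaseStatus, action: string): boolean {
+  return allowedActions[status].includes(action);
+}
+
+console.assert(canPerform('ready', 'deploy'));
+console.assert(!canPerform('draft', 'promote'));
+```
+
+```html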
+
+
+```
+
+### Create Release Wizard
+
+```typescript
+// create-release.component.ts
+import { Component, inject, signal, ChangeDetectionStrategy } from '@angular/core';
+import { CommonModule } from '@angular/common';
+import { Router } from '@angular/router';
+import { Store } from '@ngrx/store';
+import { StepsModule } from 'primeng/steps';
+import { MenuItem } from 'primeng/api';
+import { BasicInfoStepComponent } from './steps/basic-info-step/basic-info-step.component';
+import { ComponentSelectionStepComponent } from './steps/component-selection-step/component-selection-step.component';
+import { ConfigurationStepComponent } from './steps/configuration-step/configuration-step.component';
+import { ReviewStepComponent } from './steps/review-step/review-step.component';
+import { ReleaseActions } from '@store/release-orchestrator/releases/releases.actions';
+import { ReleaseComponent } from '../release-detail/release-detail.component';
+
+export interface CreateReleaseData {
+  basicInfo: {
+    name: string;
+    version: string;
+    description: string;
+  };
+  components: ReleaseComponent[];
+  configuration: {
+    targetEnvironment: string;
+    deploymentStrategy: string;
+    configOverrides: Record<string, unknown>;
+  };
+}
+
+@Component({
+  selector: 'so-create-release',
+  standalone: true,
+  imports: [
+    CommonModule,
+    StepsModule,
+    BasicInfoStepComponent,
+    ComponentSelectionStepComponent,
+    ConfigurationStepComponent,
+    ReviewStepComponent
+  ],
+  templateUrl: './create-release.component.html',
+  styleUrl: './create-release.component.scss',
+  changeDetection: ChangeDetectionStrategy.OnPush
+})
+export class CreateReleaseComponent {
+  private readonly store = inject(Store);
+  private readonly router = inject(Router);
+
+  activeIndex = signal(0);
+  releaseData = signal<Partial<CreateReleaseData>>({});
+
+  readonly steps: MenuItem[] = [
+    { label: 'Basic Info' },
+    { label: 'Components' },
+    { label: 'Configuration' },
+    { label: 'Review' }
+  ];
+
+  onBasicInfoComplete(data: CreateReleaseData['basicInfo']): void {
+    this.releaseData.update(current => ({ ...current, basicInfo: data }));
this.activeIndex.set(1); + } + + onComponentsComplete(components: ReleaseComponent[]): void { + this.releaseData.update(current => ({ ...current, components })); + this.activeIndex.set(2); + } + + onConfigurationComplete(config: CreateReleaseData['configuration']): void { + this.releaseData.update(current => ({ ...current, configuration: config })); + this.activeIndex.set(3); + } + + onBack(): void { + this.activeIndex.update(i => Math.max(0, i - 1)); + } + + onCancel(): void { + this.router.navigate(['/releases']); + } + + onSubmit(): void { + const data = this.releaseData(); + if (data.basicInfo && data.components && data.configuration) { + this.store.dispatch(ReleaseActions.createRelease({ + request: { + name: data.basicInfo.name, + version: data.basicInfo.version, + description: data.basicInfo.description, + components: data.components, + targetEnvironment: data.configuration.targetEnvironment, + deploymentStrategy: data.configuration.deploymentStrategy, + configOverrides: data.configuration.configOverrides + } + })); + } + } +} +``` + +```html + +
+<!-- create-release.component.html -->
+<div class="create-release">
+  <header class="create-release__header">
+    <h1>Create Release</h1>
+  </header>
+
+  <p-steps [model]="steps" [activeIndex]="activeIndex()" [readonly]="true"></p-steps>
+
+  <div class="create-release__step" [ngSwitch]="activeIndex()">
+    <so-basic-info-step *ngSwitchCase="0"
+                        (completed)="onBasicInfoComplete($event)"
+                        (cancelled)="onCancel()"></so-basic-info-step>
+    <so-component-selection-step *ngSwitchCase="1"
+                                 (completed)="onComponentsComplete($event)"
+                                 (back)="onBack()"></so-component-selection-step>
+    <so-configuration-step *ngSwitchCase="2"
+                           (completed)="onConfigurationComplete($event)"
+                           (back)="onBack()"></so-configuration-step>
+    <so-review-step *ngSwitchCase="3"
+                    [data]="releaseData()"
+                    (submitted)="onSubmit()"
+                    (back)="onBack()"></so-review-step>
+  </div>
+</div>
+```
+
+### Component Selector Component
+
+```typescript
+// component-selector.component.ts
+import { Component, Input, Output, EventEmitter, signal, ChangeDetectionStrategy } from '@angular/core';
+import { CommonModule } from '@angular/common';
+import { FormsModule } from '@angular/forms';
+import { debounceTime, Subject } from 'rxjs';
+import { ReleaseComponent } from '../../release-detail/release-detail.component';
+
+export interface RegistryImage {
+  name: string;
+  repository: string;
+  tags: string[];
+  digests: Array<{ tag: string; digest: string; pushedAt: Date }>;
+  lastPushed: Date;
+}
+
+@Component({
+  selector: 'so-component-selector',
+  standalone: true,
+  imports: [CommonModule, FormsModule],
+  templateUrl: './component-selector.component.html',
+  styleUrl: './component-selector.component.scss',
+  changeDetection: ChangeDetectionStrategy.OnPush
+})
+export class ComponentSelectorComponent {
+  @Input() selectedComponents: ReleaseComponent[] = [];
+  @Output() selectionChange = new EventEmitter<ReleaseComponent[]>();
+  @Output() close = new EventEmitter<void>();
+
+  private readonly searchSubject = new Subject<string>();
+
+  searchTerm = signal('');
+  searchResults = signal<RegistryImage[]>([]);
+  loading = signal(false);
+  selectedImage = signal<RegistryImage | null>(null);
+  selectedDigest = signal<string | null>(null);
+
+  constructor() {
+    this.searchSubject.pipe(
+      debounceTime(300)
+    ).subscribe(term => this.search(term));
+  }
+
+  onSearchInput(term: string): void {
+    this.searchTerm.set(term);
+    this.searchSubject.next(term);
+  }
+
+  private search(term: string): void {
+    if (term.length < 2) {
+      this.searchResults.set([]);
+      return;
+    }
+
+    this.loading.set(true);
+    // API call would go here
+    // For now, simulating with timeout
+    setTimeout(() => {
+      this.searchResults.set([
+        {
+          name: term,
+          repository: `registry.example.com/${term}`,
+          tags: ['latest', 'v1.0.0', 'v1.1.0'],
+          digests: [
+            { tag: 'latest', digest: 'sha256:abc123...', pushedAt: new Date() },
+            { tag: 'v1.0.0', digest: 'sha256:def456...', pushedAt: new Date() }
+          ],
+          lastPushed: new Date()
+        }
]); + this.loading.set(false); + }, 500); + } + + onSelectImage(image: RegistryImage): void { + this.selectedImage.set(image); + this.selectedDigest.set(null); + } + + onSelectDigest(digest: string): void { + this.selectedDigest.set(digest); + } + + onAddComponent(): void { + const image = this.selectedImage(); + const digest = this.selectedDigest(); + + if (!image || !digest) return; + + const digestInfo = image.digests.find(d => d.digest === digest); + const component: ReleaseComponent = { + id: crypto.randomUUID(), + name: image.name, + imageRef: image.repository, + digest: digest, + tag: digestInfo?.tag || null, + version: digestInfo?.tag || digest.substring(7, 19), + type: 'container', + configOverrides: {} + }; + + const updated = [...this.selectedComponents, component]; + this.selectionChange.emit(updated); + this.selectedImage.set(null); + this.selectedDigest.set(null); + } + + onRemoveComponent(id: string): void { + const updated = this.selectedComponents.filter(c => c.id !== id); + this.selectionChange.emit(updated); + } + + isAlreadySelected(image: RegistryImage, digest: string): boolean { + return this.selectedComponents.some( + c => c.imageRef === image.repository && c.digest === digest + ); + } + + formatDigest(digest: string): string { + return digest.substring(0, 19) + '...'; + } +} +``` + +```html + +
+<!-- component-selector.component.html -->
+<div class="component-selector">
+  <div class="component-selector__search">
+    <input type="text" placeholder="Search registry images..."
+           [ngModel]="searchTerm()" (ngModelChange)="onSearchInput($event)" />
+    <i class="pi pi-spin pi-spinner" *ngIf="loading()"></i>
+  </div>
+
+  <div class="component-selector__empty"
+       *ngIf="!loading() && searchTerm().length >= 2 && searchResults().length === 0">
+    No images found matching "{{ searchTerm() }}"
+  </div>
+
+  <ul class="component-selector__results">
+    <li *ngFor="let image of searchResults()"
+        [class.selected]="selectedImage() === image"
+        (click)="onSelectImage(image)">
+      <div class="image-header">
+        <i class="pi pi-box"></i>
+        <span class="image-name">{{ image.name }}</span>
+      </div>
+      <span class="image-repository">{{ image.repository }}</span>
+
+      <table class="image-digests" *ngIf="selectedImage() === image">
+        <tr *ngFor="let d of image.digests"
+            [class.selected]="selectedDigest() === d.digest"
+            (click)="onSelectDigest(d.digest)">
+          <td>{{ d.tag || 'untagged' }}</td>
+          <td class="digest">{{ formatDigest(d.digest) }}</td>
+          <td>{{ d.pushedAt | date:'short' }}</td>
+          <td><span class="badge" *ngIf="isAlreadySelected(image, d.digest)">Added</span></td>
+        </tr>
+      </table>
+    </li>
+  </ul>
+
+  <button class="btn btn--primary"
+          [disabled]="!selectedImage() || !selectedDigest()"
+          (click)="onAddComponent()">
+    Add Component
+  </button>
+
+  <section class="component-selector__selected">
+    <h3>Selected Components ({{ selectedComponents.length }})</h3>
+    <p class="empty" *ngIf="selectedComponents.length === 0">
+      No components selected yet
+    </p>
+    <ul>
+      <li *ngFor="let comp of selectedComponents">
+        <span class="name">{{ comp.name }}</span>
+        <span class="version">{{ comp.version }}</span>
+        <span class="digest">{{ formatDigest(comp.digest) }}</span>
+        <button class="btn btn--icon" (click)="onRemoveComponent(comp.id)">
+          <i class="pi pi-times"></i>
+        </button>
+      </li>
+    </ul>
+  </section>
+
+  <button class="btn btn--secondary" (click)="close.emit()">Close</button>
+</div>
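+```
+
+`onAddComponent()` falls back to a digest slice when a component has no tag. The two string helpers are easy to pin down with tests; a standalone sketch of the same logic:
+
+```typescript
+// deriveVersion mirrors onAddComponent(): prefer the tag, else a
+// 12-character slice of the digest hex (skipping the "sha256:" prefix,
+// i.e. characters 7..18 of the digest string).
+function deriveVersion(tag: string | null, digest: string): string {
+  return tag ?? digest.substring(7, 19);
+}
+
+// formatDigest mirrors the component's display helper: first 19
+// characters ("sha256:" plus 12 hex chars) followed by an ellipsis.
+function formatDigest(digest: string): string {
+  return digest.substring(0, 19) + '...';
+}
+
+console.assert(deriveVersion('v1.0.0', 'sha256:abcdef') === 'v1.0.0');
+console.assert(deriveVersion(null, 'sha256:0123456789abcdef') === '0123456789ab');
+```
+
+```html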
+```
+
+### Release Timeline Component
+
+```typescript
+// release-timeline.component.ts
+import { Component, Input, ChangeDetectionStrategy } from '@angular/core';
+import { CommonModule } from '@angular/common';
+import { ReleaseEvent } from '../../release-detail/release-detail.component';
+
+@Component({
+  selector: 'so-release-timeline',
+  standalone: true,
+  imports: [CommonModule],
+  templateUrl: './release-timeline.component.html',
+  styleUrl: './release-timeline.component.scss',
+  changeDetection: ChangeDetectionStrategy.OnPush
+})
+export class ReleaseTimelineComponent {
+  @Input() events: ReleaseEvent[] | null = null;
+
+  getEventIcon(type: string): string {
+    const icons: Record<string, string> = {
+      created: 'pi-plus-circle',
+      promoted: 'pi-arrow-right',
+      approved: 'pi-check-circle',
+      rejected: 'pi-times-circle',
+      deployed: 'pi-cloud-upload',
+      failed: 'pi-exclamation-triangle',
+      rolled_back: 'pi-undo'
+    };
+    return icons[type] || 'pi-circle';
+  }
+
+  getEventClass(type: string): string {
+    const classes: Record<string, string> = {
+      created: 'event--info',
+      promoted: 'event--info',
+      approved: 'event--success',
+      rejected: 'event--danger',
+      deployed: 'event--success',
+      failed: 'event--danger',
+      rolled_back: 'event--warning'
+    };
+    return classes[type] || 'event--default';
+  }
+}
+```
+
+```html
+<!-- release-timeline.component.html -->
+<div class="release-timeline">
+  <div class="release-timeline__empty" *ngIf="!events || events.length === 0">
+    <i class="pi pi-history"></i>
+    <p>No events yet</p>
+  </div>
+
+  <ol class="release-timeline__events" *ngIf="events && events.length > 0">
+    <li *ngFor="let event of events" class="event" [ngClass]="getEventClass(event.type)">
+      <span class="event__icon">
+        <i class="pi" [ngClass]="getEventIcon(event.type)"></i>
+      </span>
+      <div class="event__body">
+        <div class="event__header">
+          <span class="event__type">{{ event.type | titlecase }}</span>
+          <span class="badge" *ngIf="event.environment">
+            {{ event.environment }}
+          </span>
+        </div>
+        <p class="event__message">{{ event.message }}</p>
+        <footer class="event__meta">
+          <span>{{ event.actor }}</span>
+          <span>{{ event.timestamp | date:'medium' }}</span>
+        </footer>
+      </div>
+    </li>
+  </ol>
+</div>
+``` + +--- + +## Acceptance Criteria + +- [ ] Release list displays all releases +- [ ] Filtering by status works +- [ ] Filtering by environment works +- [ ] Search finds releases by name +- [ ] Sorting by columns works +- [ ] Pagination works correctly +- [ ] Release detail loads correctly +- [ ] Component list displays properly +- [ ] Timeline shows release events +- [ ] Promote action works +- [ ] Deploy action with confirmation +- [ ] Rollback action with confirmation +- [ ] Create wizard completes all steps +- [ ] Component selector searches registry +- [ ] Component selector adds by digest +- [ ] Review step shows all data +- [ ] Unit test coverage >=80% + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 104_003 Release Bundle | Internal | TODO | +| Angular 17 | External | Available | +| NgRx 17 | External | Available | +| PrimeNG 17 | External | Available | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| ReleaseListComponent | TODO | | +| ReleaseDetailComponent | TODO | | +| CreateReleaseComponent | TODO | | +| BasicInfoStepComponent | TODO | | +| ComponentSelectionStepComponent | TODO | | +| ConfigurationStepComponent | TODO | | +| ReviewStepComponent | TODO | | +| ComponentSelectorComponent | TODO | | +| ComponentListComponent | TODO | | +| ReleaseTimelineComponent | TODO | | +| Release NgRx Store | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_111_004_FE_workflow_editor.md b/docs/implplan/SPRINT_20260110_111_004_FE_workflow_editor.md new file mode 100644 index 000000000..d1d1a3c31 --- /dev/null +++ b/docs/implplan/SPRINT_20260110_111_004_FE_workflow_editor.md @@ -0,0 +1,1207 @@ +# SPRINT: Workflow Editor + +> **Sprint ID:** 111_004 +> **Module:** FE +> **Phase:** 11 - UI Implementation +> **Status:** TODO +> **Parent:** 
[111_000_INDEX](SPRINT_20260110_111_000_INDEX_ui_implementation.md) + +--- + +## Overview + +Implement the Visual Workflow Editor providing DAG-based workflow design, step palette, step configuration panel, and YAML editing capabilities. + +### Objectives + +- Visual DAG editor with drag-and-drop +- Step palette with available step types +- Step configuration panel +- Connection validation +- YAML view with syntax highlighting +- Import/export workflow definitions + +### Working Directory + +``` +src/Web/StellaOps.Web/ +├── src/app/features/release-orchestrator/ +│ └── workflows/ +│ ├── workflow-list/ +│ │ ├── workflow-list.component.ts +│ │ ├── workflow-list.component.html +│ │ └── workflow-list.component.scss +│ ├── workflow-editor/ +│ │ ├── workflow-editor.component.ts +│ │ ├── workflow-editor.component.html +│ │ └── workflow-editor.component.scss +│ ├── components/ +│ │ ├── dag-canvas/ +│ │ │ ├── dag-canvas.component.ts +│ │ │ ├── dag-canvas.component.html +│ │ │ └── dag-canvas.component.scss +│ │ ├── step-palette/ +│ │ │ ├── step-palette.component.ts +│ │ │ └── step-palette.component.html +│ │ ├── step-config-panel/ +│ │ │ ├── step-config-panel.component.ts +│ │ │ └── step-config-panel.component.html +│ │ ├── step-node/ +│ │ │ ├── step-node.component.ts +│ │ │ └── step-node.component.html +│ │ └── yaml-editor/ +│ │ ├── yaml-editor.component.ts +│ │ └── yaml-editor.component.html +│ ├── services/ +│ │ ├── workflow.service.ts +│ │ └── dag-layout.service.ts +│ └── workflows.routes.ts +└── src/app/store/release-orchestrator/ + └── workflows/ + ├── workflows.actions.ts + ├── workflows.reducer.ts + └── workflows.selectors.ts +``` + +--- + +## Deliverables + +### Workflow Editor Component + +```typescript +// workflow-editor.component.ts +import { Component, OnInit, inject, signal, computed, ChangeDetectionStrategy } from '@angular/core'; +import { CommonModule } from '@angular/common'; +import { ActivatedRoute, Router } from '@angular/router'; +import { Store } from 
'@ngrx/store';
+import { ConfirmationService, MessageService } from 'primeng/api';
+import { DagCanvasComponent } from '../components/dag-canvas/dag-canvas.component';
+import { StepPaletteComponent } from '../components/step-palette/step-palette.component';
+import { StepConfigPanelComponent } from '../components/step-config-panel/step-config-panel.component';
+import { YamlEditorComponent } from '../components/yaml-editor/yaml-editor.component';
+import { WorkflowActions } from '@store/release-orchestrator/workflows/workflows.actions';
+import * as WorkflowSelectors from '@store/release-orchestrator/workflows/workflows.selectors';
+
+export interface WorkflowStep {
+  id: string;
+  type: 'script' | 'approval' | 'deploy' | 'notify' | 'gate' | 'wait' | 'parallel' | 'manual' | 'webhook';
+  name: string;
+  config: Record<string, unknown>;
+  position: { x: number; y: number };
+  dependencies: string[];
+}
+
+export interface Workflow {
+  id: string;
+  name: string;
+  description: string;
+  version: number;
+  steps: WorkflowStep[];
+  connections: Array<{ from: string; to: string }>;
+  triggers: WorkflowTrigger[];
+  isDraft: boolean;
+  createdAt: Date;
+  updatedAt: Date;
+}
+
+export interface WorkflowTrigger {
+  type: 'manual' | 'promotion' | 'schedule' | 'webhook';
+  config: Record<string, unknown>;
+}
+
+@Component({
+  selector: 'so-workflow-editor',
+  standalone: true,
+  imports: [
+    CommonModule,
+    DagCanvasComponent,
+    StepPaletteComponent,
+    StepConfigPanelComponent,
+    YamlEditorComponent
+  ],
+  templateUrl: './workflow-editor.component.html',
+  styleUrl: './workflow-editor.component.scss',
+  changeDetection: ChangeDetectionStrategy.OnPush,
+  providers: [ConfirmationService, MessageService]
+})
+export class WorkflowEditorComponent implements OnInit {
+  private readonly store = inject(Store);
+  private readonly route = inject(ActivatedRoute);
+  private readonly router = inject(Router);
+  private readonly confirmationService = inject(ConfirmationService);
+  private readonly messageService =
inject(MessageService);
+
+  readonly workflow$ = this.store.select(WorkflowSelectors.selectCurrentWorkflow);
+  readonly loading$ = this.store.select(WorkflowSelectors.selectLoading);
+  readonly validationErrors$ = this.store.select(WorkflowSelectors.selectValidationErrors);
+  readonly isDirty$ = this.store.select(WorkflowSelectors.selectIsDirty);
+
+  viewMode = signal<'visual' | 'yaml'>('visual');
+  selectedStepId = signal<string | null>(null);
+  showPalette = signal(true);
+  zoom = signal(100);
+
+  readonly selectedStep = computed(() => {
+    const stepId = this.selectedStepId();
+    // Would use actual selector in real implementation
+    return null;
+  });
+
+  ngOnInit(): void {
+    const id = this.route.snapshot.paramMap.get('id');
+    if (id && id !== 'new') {
+      this.store.dispatch(WorkflowActions.loadWorkflow({ id }));
+    } else {
+      this.store.dispatch(WorkflowActions.createNewWorkflow());
+    }
+  }
+
+  onStepAdded(step: Partial<WorkflowStep>): void {
+    this.store.dispatch(WorkflowActions.addStep({ step: step as WorkflowStep }));
+  }
+
+  onStepSelected(stepId: string | null): void {
+    this.selectedStepId.set(stepId);
+  }
+
+  onStepUpdated(step: WorkflowStep): void {
+    this.store.dispatch(WorkflowActions.updateStep({ step }));
+  }
+
+  onStepDeleted(stepId: string): void {
+    this.confirmationService.confirm({
+      message: 'Delete this step? All connections to this step will also be removed.',
+      header: 'Delete Step',
+      icon: 'pi pi-exclamation-triangle',
+      accept: () => {
+        this.store.dispatch(WorkflowActions.deleteStep({ stepId }));
+        this.selectedStepId.set(null);
+      }
+    });
+  }
+
+  onConnectionAdded(connection: { from: string; to: string }): void {
+    this.store.dispatch(WorkflowActions.addConnection({ connection }));
+  }
+
+  onConnectionRemoved(connection: { from: string; to: string }): void {
+    this.store.dispatch(WorkflowActions.removeConnection({ connection }));
+  }
+
+  onStepMoved(stepId: string, position: { x: number; y: number }): void {
+    this.store.dispatch(WorkflowActions.moveStep({ stepId, position }));
+  }
+
+  onYamlChanged(yaml: string): void {
+    this.store.dispatch(WorkflowActions.updateFromYaml({ yaml }));
+  }
+
+  onSave(): void {
+    this.store.dispatch(WorkflowActions.saveWorkflow());
+  }
+
+  onValidate(): void {
+    this.store.dispatch(WorkflowActions.validateWorkflow());
+  }
+
+  onPublish(): void {
+    this.confirmationService.confirm({
+      message: 'Publish this workflow? It will become available for use in releases.',
+      header: 'Publish Workflow',
+      accept: () => {
+        this.store.dispatch(WorkflowActions.publishWorkflow());
+      }
+    });
+  }
+
+  onZoomIn(): void {
+    this.zoom.update(z => Math.min(200, z + 10));
+  }
+
+  onZoomOut(): void {
+    this.zoom.update(z => Math.max(25, z - 10));
+  }
+
+  onZoomReset(): void {
+    this.zoom.set(100);
+  }
+
+  onToggleView(): void {
+    this.viewMode.update(m => m === 'visual' ? 'yaml' : 'visual');
+  }
+
+  canDeactivate(): boolean {
+    // Would check isDirty$ and prompt for confirmation
+    return true;
+  }
+}
+```
+
+```html
+<!-- workflow-editor.component.html -->
+<div class="workflow-editor" *ngIf="workflow$ | async as workflow">
+  <header class="workflow-editor__toolbar">
+    <div class="toolbar__title">
+      <h1>{{ workflow.name }}</h1>
+      <span class="badge badge--secondary" *ngIf="workflow.isDraft">Draft</span>
+      <span class="badge badge--warning" *ngIf="isDirty$ | async">Unsaved</span>
+    </div>
+
+    <div class="toolbar__actions">
+      <button class="btn" (click)="onToggleView()">
+        {{ viewMode() === 'visual' ? 'YAML' : 'Visual' }}
+      </button>
+      <button class="btn" (click)="onValidate()">Validate</button>
+      <button class="btn btn--primary" (click)="onSave()">Save</button>
+      <button class="btn btn--primary" (click)="onPublish()" [disabled]="!workflow.isDraft">Publish</button>
+    </div>
+  </header>
+
+  <div class="workflow-editor__errors" *ngIf="(validationErrors$ | async)?.length">
+    <ul>
+      <li *ngFor="let error of validationErrors$ | async">• {{ error }}</li>
+    </ul>
+  </div>
+
+  <div class="workflow-editor__body">
+    <so-step-palette *ngIf="showPalette() && viewMode() === 'visual'"
+                     (stepDragStart)="onStepAdded($event)"></so-step-palette>
+
+    <div class="workflow-editor__canvas" *ngIf="viewMode() === 'visual'">
+      <div class="canvas__zoom">
+        <button class="btn btn--icon" (click)="onZoomOut()"><i class="pi pi-minus"></i></button>
+        <span (click)="onZoomReset()">{{ zoom() }}%</span>
+        <button class="btn btn--icon" (click)="onZoomIn()"><i class="pi pi-plus"></i></button>
+      </div>
+
+      <so-dag-canvas [workflow]="workflow"
+                     [zoom]="zoom()"
+                     [selectedStepId]="selectedStepId()"
+                     (stepSelected)="onStepSelected($event)"
+                     (stepMoved)="onStepMoved($event.stepId, $event.position)"
+                     (stepDeleted)="onStepDeleted($event)"
+                     (connectionAdded)="onConnectionAdded($event)"
+                     (connectionRemoved)="onConnectionRemoved($event)"></so-dag-canvas>
+    </div>
+
+    <so-yaml-editor *ngIf="viewMode() === 'yaml'"
+                    (yamlChanged)="onYamlChanged($event)"></so-yaml-editor>
+
+    <so-step-config-panel *ngIf="selectedStepId()"
+                          [step]="selectedStep()"
+                          (stepUpdated)="onStepUpdated($event)"
+                          (close)="onStepSelected(null)"></so-step-config-panel>
+  </div>
+</div>
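+```
+
+Connection validation (one of the sprint objectives) must reject edges that would introduce a cycle into the DAG. A minimal sketch of that check (the `wouldCreateCycle` name and `Edge` shape are assumptions, not a defined API):
+
+```typescript
+interface Edge { from: string; to: string }
+
+// Returns true if adding `candidate` to `edges` would create a cycle,
+// i.e. if candidate.from is already reachable from candidate.to.
+function wouldCreateCycle(edges: Edge[], candidate: Edge): boolean {
+  const adjacency = new Map<string, string[]>();
+  for (const e of edges) {
+    adjacency.set(e.from, [...(adjacency.get(e.from) ?? []), e.to]);
+  }
+  // Depth-first search from the candidate's target node.
+  const stack = [candidate.to];
+  const seen = new Set<string>();
+  while (stack.length > 0) {
+    const node = stack.pop()!;
+    if (node === candidate.from) return true;
+    if (seen.has(node)) continue;
+    seen.add(node);
+    stack.push(...(adjacency.get(node) ?? []));
+  }
+  return false;
+}
+
+console.assert(wouldCreateCycle(
+  [{ from: 'a', to: 'b' }, { from: 'b', to: 'c' }], { from: 'c', to: 'a' }));
+console.assert(!wouldCreateCycle(
+  [{ from: 'a', to: 'b' }], { from: 'a', to: 'c' }));
+```
+
+```html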
+ + + +``` + +### DAG Canvas Component + +```typescript +// dag-canvas.component.ts +import { Component, Input, Output, EventEmitter, ElementRef, ViewChild, + AfterViewInit, OnChanges, SimpleChanges, inject, ChangeDetectionStrategy } from '@angular/core'; +import { CommonModule } from '@angular/common'; +import * as d3 from 'd3'; +import { DagLayoutService } from '../../services/dag-layout.service'; +import { StepNodeComponent } from '../step-node/step-node.component'; + +interface DagNode { + id: string; + x: number; + y: number; + width: number; + height: number; + step: WorkflowStep; +} + +interface DagEdge { + from: string; + to: string; + path: string; +} + +@Component({ + selector: 'so-dag-canvas', + standalone: true, + imports: [CommonModule, StepNodeComponent], + templateUrl: './dag-canvas.component.html', + styleUrl: './dag-canvas.component.scss', + changeDetection: ChangeDetectionStrategy.OnPush +}) +export class DagCanvasComponent implements AfterViewInit, OnChanges { + @Input() workflow: Workflow | null = null; + @Input() zoom = 100; + @Input() selectedStepId: string | null = null; + + @Output() stepSelected = new EventEmitter(); + @Output() stepMoved = new EventEmitter<{ stepId: string; position: { x: number; y: number } }>(); + @Output() stepDeleted = new EventEmitter(); + @Output() connectionAdded = new EventEmitter<{ from: string; to: string }>(); + @Output() connectionRemoved = new EventEmitter<{ from: string; to: string }>(); + + @ViewChild('svgContainer') svgContainer!: ElementRef; + @ViewChild('canvasContainer') canvasContainer!: ElementRef; + + private readonly layoutService = inject(DagLayoutService); + + nodes: DagNode[] = []; + edges: DagEdge[] = []; + connectingFrom: string | null = null; + mousePosition = { x: 0, y: 0 }; + + ngAfterViewInit(): void { + this.setupDragDrop(); + this.setupPanZoom(); + } + + ngOnChanges(changes: SimpleChanges): void { + if (changes['workflow'] && this.workflow) { + this.computeLayout(); + } + } + + private 
computeLayout(): void { + if (!this.workflow) return; + + const layout = this.layoutService.computeLayout( + this.workflow.steps, + this.workflow.connections + ); + + this.nodes = layout.nodes; + this.edges = layout.edges; + } + + private setupDragDrop(): void { + // D3 drag behavior for nodes + const drag = d3.drag() + .on('start', (event, d) => { + d3.select(event.sourceEvent.target.closest('.dag-node')).raise(); + }) + .on('drag', (event, d) => { + d.x = event.x; + d.y = event.y; + this.updateNodePosition(d); + this.updateEdges(); + }) + .on('end', (event, d) => { + this.stepMoved.emit({ + stepId: d.id, + position: { x: d.x, y: d.y } + }); + }); + + // Apply to all node elements + d3.select(this.svgContainer.nativeElement) + .selectAll('.dag-node') + .call(drag); + } + + private setupPanZoom(): void { + const svg = d3.select(this.svgContainer.nativeElement); + const zoom = d3.zoom() + .scaleExtent([0.25, 2]) + .on('zoom', (event) => { + svg.select('.canvas-content') + .attr('transform', event.transform.toString()); + }); + + svg.call(zoom); + } + + private updateNodePosition(node: DagNode): void { + d3.select(this.svgContainer.nativeElement) + .select(`#node-${node.id}`) + .attr('transform', `translate(${node.x}, ${node.y})`); + } + + private updateEdges(): void { + this.edges = this.layoutService.computeEdgePaths(this.nodes, this.workflow?.connections || []); + } + + onNodeClick(node: DagNode, event: MouseEvent): void { + event.stopPropagation(); + this.stepSelected.emit(node.id); + } + + onCanvasClick(): void { + this.stepSelected.emit(null); + this.connectingFrom = null; + } + + onStartConnection(nodeId: string): void { + this.connectingFrom = nodeId; + } + + onEndConnection(nodeId: string): void { + if (this.connectingFrom && this.connectingFrom !== nodeId) { + this.connectionAdded.emit({ + from: this.connectingFrom, + to: nodeId + }); + } + this.connectingFrom = null; + } + + onEdgeClick(edge: DagEdge, event: MouseEvent): void { + event.stopPropagation(); + 
// Could show context menu for edge deletion + } + + onDeleteEdge(edge: DagEdge): void { + this.connectionRemoved.emit({ from: edge.from, to: edge.to }); + } + + onMouseMove(event: MouseEvent): void { + if (this.connectingFrom) { + const rect = this.canvasContainer.nativeElement.getBoundingClientRect(); + this.mousePosition = { + x: event.clientX - rect.left, + y: event.clientY - rect.top + }; + } + } + + getNodeTransform(node: DagNode): string { + return `translate(${node.x}, ${node.y})`; + } +} +``` + +```html + +
+<!-- dag-canvas.component.html -->
+<div #canvasContainer class="dag-canvas" (mousemove)="onMouseMove($event)" (click)="onCanvasClick()">
+  <svg #svgContainer width="100%" height="100%">
+    <defs>
+      <marker id="arrowhead" markerWidth="8" markerHeight="8" refX="8" refY="4" orient="auto">
+        <path d="M0,0 L8,4 L0,8 Z" />
+      </marker>
+    </defs>
+
+    <g class="canvas-content">
+      <path *ngFor="let edge of edges"
+            class="dag-edge"
+            [attr.d]="edge.path"
+            marker-end="url(#arrowhead)"
+            (click)="onEdgeClick(edge, $event)" />
+
+      <g *ngFor="let node of nodes"
+         class="dag-node"
+         [attr.id]="'node-' + node.id"
+         [attr.transform]="getNodeTransform(node)"
+         [class.selected]="node.id === selectedStepId"
+         (click)="onNodeClick(node, $event)">
+        <foreignObject [attr.width]="node.width" [attr.height]="node.height">
+          <so-step-node [step]="node.step"
+                        (connectStart)="onStartConnection(node.id)"
+                        (connectEnd)="onEndConnection(node.id)"></so-step-node>
+        </foreignObject>
+      </g>
+    </g>
+  </svg>
+
+  <div class="dag-canvas__hint" *ngIf="connectingFrom">
+    Release to connect
+  </div>
+</div>
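+```
+
+`DagLayoutService.computeLayout` is referenced but not yet specified. One conventional approach is a layered layout: assign each step a layer equal to its longest path from a root, then place layers left-to-right. A minimal layering sketch, assuming the workflow is acyclic (the `computeLayers` helper is an illustrative assumption, not the service's contract):
+
+```typescript
+interface Edge { from: string; to: string }
+
+// Assigns each step id a layer (longest path from a root) so that every
+// edge points from a lower layer to a strictly higher one.
+function computeLayers(ids: string[], edges: Edge[]): Map<string, number> {
+  const layer = new Map<string, number>(ids.map(id => [id, 0]));
+  const indegree = new Map<string, number>(ids.map(id => [id, 0]));
+  for (const e of edges) indegree.set(e.to, (indegree.get(e.to) ?? 0) + 1);
+
+  // Kahn-style traversal: start from nodes with no incoming edges.
+  const queue = ids.filter(id => (indegree.get(id) ?? 0) === 0);
+  while (queue.length > 0) {
+    const id = queue.shift()!;
+    for (const e of edges.filter(e => e.from === id)) {
+      layer.set(e.to, Math.max(layer.get(e.to)!, layer.get(id)! + 1));
+      indegree.set(e.to, indegree.get(e.to)! - 1);
+      if (indegree.get(e.to) === 0) queue.push(e.to);
+    }
+  }
+  return layer;
+}
+
+const layers = computeLayers(['build', 'test', 'deploy'],
+  [{ from: 'build', to: 'test' }, { from: 'test', to: 'deploy' }]);
+console.assert(layers.get('deploy') === 2);
+```
+
+```html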
+```
+
+### Step Palette Component
+
+```typescript
+// step-palette.component.ts
+import { Component, Output, EventEmitter, ChangeDetectionStrategy } from '@angular/core';
+import { CommonModule } from '@angular/common';
+import { DragDropModule, CdkDragEnd } from '@angular/cdk/drag-drop';
+import { WorkflowStep } from '../../workflow-editor/workflow-editor.component';
+
+export interface StepTemplate {
+  type: string;
+  name: string;
+  icon: string;
+  description: string;
+  category: 'actions' | 'control' | 'integration';
+  defaultConfig: Record<string, unknown>;
+}
+
+@Component({
+  selector: 'so-step-palette',
+  standalone: true,
+  imports: [CommonModule, DragDropModule],
+  templateUrl: './step-palette.component.html',
+  styleUrl: './step-palette.component.scss',
+  changeDetection: ChangeDetectionStrategy.OnPush
+})
+export class StepPaletteComponent {
+  @Output() stepDragStart = new EventEmitter<Partial<WorkflowStep>>();
+
+  readonly stepTemplates: StepTemplate[] = [
+    // Actions
+    {
+      type: 'script',
+      name: 'Script',
+      icon: 'pi-code',
+      description: 'Execute a shell script',
+      category: 'actions',
+      defaultConfig: { script: '', timeout: 300 }
+    },
+    {
+      type: 'deploy',
+      name: 'Deploy',
+      icon: 'pi-cloud-upload',
+      description: 'Deploy to target',
+      category: 'actions',
+      defaultConfig: { strategy: 'rolling', targetSelector: '' }
+    },
+    {
+      type: 'notify',
+      name: 'Notify',
+      icon: 'pi-bell',
+      description: 'Send notification',
+      category: 'actions',
+      defaultConfig: { channel: 'slack', message: '' }
+    },
+    // Control
+    {
+      type: 'approval',
+      name: 'Approval',
+      icon: 'pi-check-circle',
+      description: 'Wait for approval',
+      category: 'control',
+      defaultConfig: { approvers: [], minApprovals: 1 }
+    },
+    {
+      type: 'gate',
+      name: 'Gate',
+      icon: 'pi-shield',
+      description: 'Conditional gate',
+      category: 'control',
+      defaultConfig: { condition: '', failAction: 'stop' }
+    },
+    {
+      type: 'wait',
+      name: 'Wait',
+      icon: 'pi-clock',
+      description: 'Wait for duration',
+      category: 'control',
+      defaultConfig: { duration: 60 }
+    },
+    {
+      type: 'parallel',
+      name: 'Parallel',
+      icon: 'pi-arrows-h',
+      description: 'Run steps in parallel',
+      category: 'control',
+      defaultConfig: { maxConcurrency: 0 }
+    },
+    {
+      type: 'manual',
+      name: 'Manual',
+      icon: 'pi-user',
+      description: 'Manual intervention',
+      category: 'control',
+      defaultConfig: { instructions: '' }
+    },
+    // Integration
+    {
+      type: 'webhook',
+      name: 'Webhook',
+      icon: 'pi-link',
+      description: 'Call external webhook',
+      category: 'integration',
+      defaultConfig: { url: '', method: 'POST', headers: {} }
+    }
+  ];
+
+  readonly categories = [
+    { key: 'actions', label: 'Actions' },
+    { key: 'control', label: 'Control Flow' },
+    { key: 'integration', label: 'Integration' }
+  ];
+
+  getStepsByCategory(category: string): StepTemplate[] {
+    return this.stepTemplates.filter(s => s.category === category);
+  }
+
+  onDragEnd(event: CdkDragEnd, template: StepTemplate): void {
+    const step: Partial<WorkflowStep> = {
+      id: crypto.randomUUID(),
+      type: template.type as WorkflowStep['type'],
+      name: template.name,
+      config: { ...template.defaultConfig },
+      position: {
+        x: event.dropPoint.x,
+        y: event.dropPoint.y
+      },
+      dependencies: []
+    };
+
+    this.stepDragStart.emit(step);
+  }
+}
+```
+
+```html
+<!-- step-palette.component.html -->
+

Steps

+ +
+

{{ category.label }}

+
+
+
+ +
+
+ {{ template.name }} + {{ template.description }} +
+
+ + {{ template.name }} +
+
+
+
+
+```
+
+### Step Config Panel Component
+
+```typescript
+// step-config-panel.component.ts
+import { Component, Input, Output, EventEmitter, OnChanges, SimpleChanges,
+         inject, ChangeDetectionStrategy } from '@angular/core';
+import { CommonModule } from '@angular/common';
+import { FormBuilder, FormGroup, ReactiveFormsModule } from '@angular/forms';
+
+@Component({
+  selector: 'so-step-config-panel',
+  standalone: true,
+  imports: [CommonModule, ReactiveFormsModule],
+  templateUrl: './step-config-panel.component.html',
+  styleUrl: './step-config-panel.component.scss',
+  changeDetection: ChangeDetectionStrategy.OnPush
+})
+export class StepConfigPanelComponent implements OnChanges {
+  @Input() step: WorkflowStep | null = null;
+  @Output() stepUpdated = new EventEmitter<WorkflowStep>();
+  @Output() close = new EventEmitter<void>();
+
+  private readonly fb = inject(FormBuilder);
+
+  form: FormGroup = this.fb.group({
+    name: [''],
+    config: this.fb.group({})
+  });
+
+  ngOnChanges(changes: SimpleChanges): void {
+    if (changes['step'] && this.step) {
+      this.buildForm();
+    }
+  }
+
+  private buildForm(): void {
+    if (!this.step) return;
+
+    // Build dynamic form based on step type
+    const configGroup = this.getConfigFormGroup(this.step.type, this.step.config);
+
+    this.form = this.fb.group({
+      name: [this.step.name],
+      config: configGroup
+    });
+  }
+
+  private getConfigFormGroup(type: string, config: Record<string, unknown>): FormGroup {
+    switch (type) {
+      case 'script':
+        return this.fb.group({
+          script: [config['script'] || ''],
+          timeout: [config['timeout'] || 300],
+          shell: [config['shell'] || '/bin/bash']
+        });
+
+      case 'deploy':
+        return this.fb.group({
+          strategy: [config['strategy'] || 'rolling'],
+          targetSelector: [config['targetSelector'] || ''],
+          healthCheckPath: [config['healthCheckPath'] || '/health'],
+          rollbackOnFailure: [config['rollbackOnFailure'] ?? true]
+        });
+
+      case 'approval':
+        return this.fb.group({
+          approvers: [config['approvers'] || []],
+          minApprovals: [config['minApprovals'] || 1],
+          timeout: [config['timeout'] || 86400],
+          autoReject: [config['autoReject'] ?? false]
+        });
+
+      case 'gate':
+        return this.fb.group({
+          condition: [config['condition'] || ''],
+          failAction: [config['failAction'] || 'stop'],
+          retryCount: [config['retryCount'] || 0],
+          retryDelay: [config['retryDelay'] || 60]
+        });
+
+      case 'notify':
+        return this.fb.group({
+          channel: [config['channel'] || 'slack'],
+          message: [config['message'] || ''],
+          recipients: [config['recipients'] || []]
+        });
+
+      case 'wait':
+        return this.fb.group({
+          duration: [config['duration'] || 60],
+          until: [config['until'] || null]
+        });
+
+      case 'webhook':
+        return this.fb.group({
+          url: [config['url'] || ''],
+          method: [config['method'] || 'POST'],
+          headers: [config['headers'] || {}],
+          body: [config['body'] || ''],
+          expectedStatus: [config['expectedStatus'] || 200]
+        });
+
+      default:
+        return this.fb.group({});
+    }
+  }
+
+  onSave(): void {
+    if (!this.step || this.form.invalid) return;
+
+    const updated: WorkflowStep = {
+      ...this.step,
+      name: this.form.value.name,
+      config: this.form.value.config
+    };
+
+    this.stepUpdated.emit(updated);
+  }
+
+  getStepIcon(): string {
+    const icons: Record<string, string> = {
+      script: 'pi-code',
+      deploy: 'pi-cloud-upload',
+      approval: 'pi-check-circle',
+      gate: 'pi-shield',
+      notify: 'pi-bell',
+      wait: 'pi-clock',
+      parallel: 'pi-arrows-h',
+      manual: 'pi-user',
+      webhook: 'pi-link'
+    };
+    return icons[this.step?.type || ''] || 'pi-circle';
+  }
+}
+```
+
+```html
+
+
+
+ + {{ step?.type | titlecase }} Configuration +
+ +
+ +
+
+ + +
+ + + + +
+ + +
+
+ + +
+
+ + +
+
+ + + +
+ + +
+
+ + +
+
+ + +
+
+ + +
+
+ + + +
+ + +
+
+ + +
+
+ + +
+
+ + + +
+ + +
+
+ + +
+
+ + + +
+ + +
+
+ + +
+
+ + + +
+ + +
+
+
+ +
+ +
+
+
+```
+
+### YAML Editor Component
+
+```typescript
+// yaml-editor.component.ts
+import { Component, Input, Output, EventEmitter, AfterViewInit, ViewChild,
+         ElementRef, OnChanges, SimpleChanges, ChangeDetectionStrategy } from '@angular/core';
+import { CommonModule } from '@angular/common';
+import * as yaml from 'js-yaml';
+import * as monaco from 'monaco-editor';
+
+@Component({
+  selector: 'so-yaml-editor',
+  standalone: true,
+  imports: [CommonModule],
+  templateUrl: './yaml-editor.component.html',
+  styleUrl: './yaml-editor.component.scss',
+  changeDetection: ChangeDetectionStrategy.OnPush
+})
+export class YamlEditorComponent implements AfterViewInit, OnChanges {
+  @Input() workflow: Workflow | null = null;
+  @Output() yamlChanged = new EventEmitter<string>();
+
+  @ViewChild('editorContainer') editorContainer!: ElementRef;
+
+  private editor: monaco.editor.IStandaloneCodeEditor | null = null;
+  private ignoreChange = false;
+
+  parseError: string | null = null;
+
+  ngAfterViewInit(): void {
+    this.initEditor();
+  }
+
+  ngOnChanges(changes: SimpleChanges): void {
+    if (changes['workflow'] && this.editor && this.workflow) {
+      this.ignoreChange = true;
+      const yamlContent = this.workflowToYaml(this.workflow);
+      this.editor.setValue(yamlContent);
+      this.ignoreChange = false;
+    }
+  }
+
+  private initEditor(): void {
+    this.editor = monaco.editor.create(this.editorContainer.nativeElement, {
+      value: this.workflow ? this.workflowToYaml(this.workflow) : '',
+      language: 'yaml',
+      theme: 'vs-dark',
+      automaticLayout: true,
+      minimap: { enabled: false },
+      lineNumbers: 'on',
+      scrollBeyondLastLine: false,
+      wordWrap: 'on',
+      fontSize: 14,
+      tabSize: 2
+    });
+
+    this.editor.onDidChangeModelContent(() => {
+      if (this.ignoreChange) return;
+
+      const content = this.editor?.getValue() || '';
+      this.validateAndEmit(content);
+    });
+  }
+
+  private validateAndEmit(content: string): void {
+    try {
+      yaml.load(content);
+      this.parseError = null;
+      this.yamlChanged.emit(content);
+    } catch (e: any) {
+      this.parseError = e.message;
+    }
+  }
+
+  private workflowToYaml(workflow: Workflow): string {
+    const yamlObj = {
+      name: workflow.name,
+      description: workflow.description,
+      triggers: workflow.triggers,
+      steps: workflow.steps.map(step => ({
+        id: step.id,
+        name: step.name,
+        type: step.type,
+        depends_on: step.dependencies.length > 0 ? step.dependencies : undefined,
+        config: step.config
+      }))
+    };
+
+    return yaml.dump(yamlObj, {
+      indent: 2,
+      lineWidth: 120,
+      noRefs: true
+    });
+  }
+
+  onFormat(): void {
+    if (!this.editor) return;
+
+    try {
+      const content = this.editor.getValue();
+      const parsed = yaml.load(content);
+      const formatted = yaml.dump(parsed, { indent: 2, lineWidth: 120 });
+      this.ignoreChange = true;
+      this.editor.setValue(formatted);
+      this.ignoreChange = false;
+    } catch (e) {
+      // Ignore formatting errors
+    }
+  }
+
+  onCopy(): void {
+    if (this.editor) {
+      navigator.clipboard.writeText(this.editor.getValue());
+    }
+  }
+}
+```
+
+```html
+
+
+ + + + + {{ parseError }} + +
+
+
+```
+
+### Documentation Deliverables
+
+| Deliverable | Type | Description |
+|-------------|------|-------------|
+| `docs/modules/release-orchestrator/ui/workflow-editor.md` | Markdown | Workflow editor specification (graph editor, DAG visualization, Monaco integration) |
+
+---
+
+## Acceptance Criteria
+
+### Code
+
+- [ ] Workflow list displays all workflows
+- [ ] Create new workflow initializes empty canvas
+- [ ] Load existing workflow displays DAG
+- [ ] Drag step from palette to canvas
+- [ ] Connect steps by dragging
+- [ ] Delete steps and connections
+- [ ] Step configuration panel updates step
+- [ ] Save workflow persists changes
+- [ ] Publish workflow marks as non-draft
+- [ ] YAML view shows valid YAML
+- [ ] YAML changes reflect in visual view
+- [ ] Syntax highlighting in YAML editor
+- [ ] Zoom in/out works
+- [ ] Pan canvas works
+- [ ] Validation errors displayed
+- [ ] Unit test coverage >=80%
+
+### Documentation
+
+- [ ] Workflow editor specification file created
+- [ ] Graph editor component interface documented
+- [ ] DAG visualization documented (D3.js integration)
+- [ ] Run visualization overlay documented
+- [ ] WebSocket integration for real-time updates documented
+- [ ] YAML editor bidirectional sync documented
+
+---
+
+## Dependencies
+
+| Dependency | Type | Status |
+|------------|------|--------|
+| 105_001 Workflow DAG | Internal | TODO |
+| Angular 17 | External | Available |
+| D3.js | External | Available |
+| Monaco Editor | External | Available |
+| js-yaml | External | Available |
+
+---
+
+## Delivery Tracker
+
+| Deliverable | Status | Notes |
+|-------------|--------|-------|
+| WorkflowListComponent | TODO | |
+| WorkflowEditorComponent | TODO | |
+| DagCanvasComponent | TODO | |
+| StepPaletteComponent | TODO | |
+| StepConfigPanelComponent | TODO | |
+| StepNodeComponent | TODO | |
+| YamlEditorComponent | TODO | |
+| DagLayoutService | TODO | |
+| Workflow NgRx Store | TODO | |
+| Unit tests | TODO | |
+
+---
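The `DagLayoutService` listed in the Delivery Tracker has no specification in this sprint. A minimal layered-layout sketch is given below as a starting point; the `LayoutStep` shape (an `id` plus a `dependencies` array) is assumed from the step model used by the palette component, and the layering algorithm itself (place roots in layer 0, then each step one layer past its deepest dependency) is a hypothetical choice, not the mandated implementation.

```typescript
// Hypothetical layer assignment for the DAG canvas: a step's layer is
// 0 if it has no dependencies, otherwise 1 + the deepest dependency's
// layer. Throws if the workflow contains a cycle (nothing placeable).
interface LayoutStep {
  id: string;
  dependencies: string[];
}

function assignLayers(steps: LayoutStep[]): Map<string, number> {
  const layers = new Map<string, number>();
  let remaining = steps.slice();

  while (remaining.length > 0) {
    // A step is placeable once all of its dependencies have layers.
    const placeable = remaining.filter(s =>
      s.dependencies.every(d => layers.has(d)));

    if (placeable.length === 0) {
      throw new Error('Workflow contains a dependency cycle');
    }

    for (const step of placeable) {
      const depth = step.dependencies.length === 0
        ? 0
        : Math.max(...step.dependencies.map(d => layers.get(d)!)) + 1;
      layers.set(step.id, depth);
    }

    remaining = remaining.filter(s => !layers.has(s.id));
  }

  return layers;
}
```

The resulting layer index maps directly to an x-coordinate column on the canvas; vertical ordering within a layer is left to the service.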
+ +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | +| 11-Jan-2026 | Added documentation deliverable: ui/workflow-editor.md | diff --git a/docs/implplan/SPRINT_20260110_111_005_FE_promotion_approval_ui.md b/docs/implplan/SPRINT_20260110_111_005_FE_promotion_approval_ui.md new file mode 100644 index 000000000..2d3407ce1 --- /dev/null +++ b/docs/implplan/SPRINT_20260110_111_005_FE_promotion_approval_ui.md @@ -0,0 +1,991 @@ +# SPRINT: Promotion & Approval UI + +> **Sprint ID:** 111_005 +> **Module:** FE +> **Phase:** 11 - UI Implementation +> **Status:** TODO +> **Parent:** [111_000_INDEX](SPRINT_20260110_111_000_INDEX_ui_implementation.md) + +--- + +## Overview + +Implement the Promotion and Approval UI providing promotion request creation, approval queue management, approval detail views, and gate results display. + +### Objectives + +- Promotion request form with gate preview +- Approval queue with filtering +- Approval detail with gate results +- Approve/reject with comments +- Batch approval support +- Approval history + +### Working Directory + +``` +src/Web/StellaOps.Web/ +├── src/app/features/release-orchestrator/ +│ └── approvals/ +│ ├── promotion-request/ +│ │ ├── promotion-request.component.ts +│ │ ├── promotion-request.component.html +│ │ └── promotion-request.component.scss +│ ├── approval-queue/ +│ │ ├── approval-queue.component.ts +│ │ ├── approval-queue.component.html +│ │ └── approval-queue.component.scss +│ ├── approval-detail/ +│ │ ├── approval-detail.component.ts +│ │ ├── approval-detail.component.html +│ │ └── approval-detail.component.scss +│ ├── components/ +│ │ ├── gate-results-panel/ +│ │ ├── approval-form/ +│ │ ├── approval-history/ +│ │ └── approver-list/ +│ └── approvals.routes.ts +└── src/app/store/release-orchestrator/ + └── approvals/ + ├── approvals.actions.ts + ├── approvals.reducer.ts + └── approvals.selectors.ts +``` + +--- + +## Deliverables + +### Promotion Request Component + +```typescript 
+// promotion-request.component.ts
+import { Component, OnInit, inject, signal, ChangeDetectionStrategy } from '@angular/core';
+import { CommonModule } from '@angular/common';
+import { ActivatedRoute, Router } from '@angular/router';
+import { FormBuilder, FormGroup, ReactiveFormsModule, Validators } from '@angular/forms';
+import { Store } from '@ngrx/store';
+import { GateResultsPanelComponent } from '../components/gate-results-panel/gate-results-panel.component';
+import { PromotionActions } from '@store/release-orchestrator/approvals/approvals.actions';
+import * as ApprovalSelectors from '@store/release-orchestrator/approvals/approvals.selectors';
+
+export interface GateResult {
+  gateId: string;
+  gateName: string;
+  type: 'security' | 'policy' | 'quality' | 'custom';
+  status: 'passed' | 'failed' | 'warning' | 'pending' | 'skipped';
+  message: string;
+  details: Record<string, unknown>;
+  evaluatedAt: Date;
+}
+
+export interface PromotionPreview {
+  releaseId: string;
+  releaseName: string;
+  sourceEnvironment: string;
+  targetEnvironment: string;
+  gateResults: GateResult[];
+  allGatesPassed: boolean;
+  requiredApprovers: number;
+  estimatedDeployTime: number;
+  warnings: string[];
+}
+
+@Component({
+  selector: 'so-promotion-request',
+  standalone: true,
+  imports: [CommonModule, ReactiveFormsModule, GateResultsPanelComponent],
+  templateUrl: './promotion-request.component.html',
+  styleUrl: './promotion-request.component.scss',
+  changeDetection: ChangeDetectionStrategy.OnPush
+})
+export class PromotionRequestComponent implements OnInit {
+  private readonly store = inject(Store);
+  private readonly route = inject(ActivatedRoute);
+  private readonly router = inject(Router);
+  private readonly fb = inject(FormBuilder);
+
+  readonly preview$ = this.store.select(ApprovalSelectors.selectPromotionPreview);
+  readonly loading$ = this.store.select(ApprovalSelectors.selectLoading);
+  readonly submitting$ = this.store.select(ApprovalSelectors.selectSubmitting);
+  readonly environments$ = this.store.select(ApprovalSelectors.selectAvailableEnvironments);
+
+  form: FormGroup = this.fb.group({
+    targetEnvironment: ['', Validators.required],
+    urgency: ['normal'],
+    justification: ['', [Validators.required, Validators.minLength(10)]],
+    notifyApprovers: [true],
+    scheduledTime: [null]
+  });
+
+  urgencyOptions = [
+    { label: 'Low', value: 'low' },
+    { label: 'Normal', value: 'normal' },
+    { label: 'High', value: 'high' },
+    { label: 'Critical', value: 'critical' }
+  ];
+
+  releaseId: string = '';
+
+  ngOnInit(): void {
+    this.releaseId = this.route.snapshot.paramMap.get('releaseId') || '';
+    if (this.releaseId) {
+      this.store.dispatch(PromotionActions.loadAvailableEnvironments({ releaseId: this.releaseId }));
+    }
+
+    // Watch for target environment changes to fetch preview
+    this.form.get('targetEnvironment')?.valueChanges.subscribe(envId => {
+      if (envId) {
+        this.store.dispatch(PromotionActions.loadPromotionPreview({
+          releaseId: this.releaseId,
+          targetEnvironmentId: envId
+        }));
+      }
+    });
+  }
+
+  onSubmit(): void {
+    if (this.form.invalid) return;
+
+    this.store.dispatch(PromotionActions.submitPromotionRequest({
+      releaseId: this.releaseId,
+      request: {
+        targetEnvironmentId: this.form.value.targetEnvironment,
+        urgency: this.form.value.urgency,
+        justification: this.form.value.justification,
+        notifyApprovers: this.form.value.notifyApprovers,
+        scheduledTime: this.form.value.scheduledTime
+      }
+    }));
+  }
+
+  onCancel(): void {
+    this.router.navigate(['/releases', this.releaseId]);
+  }
+
+  hasFailedGates(preview: PromotionPreview): boolean {
+    return preview.gateResults.some(g => g.status === 'failed');
+  }
+
+  getFailedGatesCount(preview: PromotionPreview): number {
+    return preview.gateResults.filter(g => g.status === 'failed').length;
+  }
+}
+```
+
+```html
+
+
+

Request Promotion

+
+ +
+
+
+

Promotion Details

+ +
+ + + +
+ +
+ + + +
+ +
+ + + Minimum 10 characters required +
+ +
+ + + + Leave empty to deploy as soon as approved +
+ +
+ + +
+
+ + +
+

Gate Evaluation Preview

+ +
+
+ + All gates passed + + {{ getFailedGatesCount(preview) }} gate(s) failed + +
+
+ Required approvers: {{ preview.requiredApprovers }} + Est. deploy time: {{ preview.estimatedDeployTime }}s +
+
+ +
+ + +
    +
  • {{ warning }}
  • +
+
+
+
+ + +
+ +
+ + +
+
+
+
+```
+
+### Approval Queue Component
+
+```typescript
+// approval-queue.component.ts
+import { Component, OnInit, inject, signal, ChangeDetectionStrategy } from '@angular/core';
+import { CommonModule } from '@angular/common';
+import { RouterModule } from '@angular/router';
+import { FormsModule } from '@angular/forms';
+import { Store } from '@ngrx/store';
+import { ConfirmationService } from 'primeng/api';
+import { ApprovalActions } from '@store/release-orchestrator/approvals/approvals.actions';
+import * as ApprovalSelectors from '@store/release-orchestrator/approvals/approvals.selectors';
+
+export interface ApprovalRequest {
+  id: string;
+  releaseId: string;
+  releaseName: string;
+  releaseVersion: string;
+  sourceEnvironment: string;
+  targetEnvironment: string;
+  requestedBy: string;
+  requestedAt: Date;
+  urgency: 'low' | 'normal' | 'high' | 'critical';
+  justification: string;
+  status: 'pending' | 'approved' | 'rejected' | 'expired';
+  currentApprovals: number;
+  requiredApprovals: number;
+  gatesPassed: boolean;
+  scheduledTime: Date | null;
+  expiresAt: Date;
+}
+
+@Component({
+  selector: 'so-approval-queue',
+  standalone: true,
+  imports: [CommonModule, RouterModule, FormsModule],
+  templateUrl: './approval-queue.component.html',
+  styleUrl: './approval-queue.component.scss',
+  changeDetection: ChangeDetectionStrategy.OnPush,
+  providers: [ConfirmationService]
+})
+export class ApprovalQueueComponent implements OnInit {
+  private readonly store = inject(Store);
+  private readonly confirmationService = inject(ConfirmationService);
+
+  readonly approvals$ = this.store.select(ApprovalSelectors.selectFilteredApprovals);
+  readonly loading$ = this.store.select(ApprovalSelectors.selectLoading);
+  readonly selectedIds = signal<Set<string>>(new Set<string>());
+
+  statusFilter = signal<string[]>(['pending']);
+  urgencyFilter = signal<string[]>([]);
+  environmentFilter = signal<string | null>(null);
+
+  readonly statusOptions = [
+    { label: 'Pending', value: 'pending' },
+    { label: 'Approved', value: 'approved' },
+    { label: 'Rejected', value: 'rejected' },
+    { label: 'Expired', value: 'expired' }
+  ];
+
+  readonly urgencyOptions = [
+    { label: 'Low', value: 'low' },
+    { label: 'Normal', value: 'normal' },
+    { label: 'High', value: 'high' },
+    { label: 'Critical', value: 'critical' }
+  ];
+
+  ngOnInit(): void {
+    this.loadApprovals();
+  }
+
+  loadApprovals(): void {
+    this.store.dispatch(ApprovalActions.loadApprovals({
+      filter: {
+        statuses: this.statusFilter(),
+        urgencies: this.urgencyFilter(),
+        environment: this.environmentFilter()
+      }
+    }));
+  }
+
+  onStatusFilterChange(statuses: string[]): void {
+    this.statusFilter.set(statuses);
+    this.loadApprovals();
+  }
+
+  onToggleSelect(id: string): void {
+    this.selectedIds.update(ids => {
+      const newIds = new Set(ids);
+      if (newIds.has(id)) {
+        newIds.delete(id);
+      } else {
+        newIds.add(id);
+      }
+      return newIds;
+    });
+  }
+
+  onSelectAll(approvals: ApprovalRequest[]): void {
+    const pendingIds = approvals
+      .filter(a => a.status === 'pending')
+      .map(a => a.id);
+    this.selectedIds.set(new Set(pendingIds));
+  }
+
+  onDeselectAll(): void {
+    this.selectedIds.set(new Set());
+  }
+
+  onBatchApprove(): void {
+    const ids = Array.from(this.selectedIds());
+    if (ids.length === 0) return;
+
+    this.confirmationService.confirm({
+      message: `Approve ${ids.length} promotion request(s)?`,
+      header: 'Batch Approve',
+      accept: () => {
+        this.store.dispatch(ApprovalActions.batchApprove({ ids, comment: 'Batch approved' }));
+        this.selectedIds.set(new Set());
+      }
+    });
+  }
+
+  onBatchReject(): void {
+    const ids = Array.from(this.selectedIds());
+    if (ids.length === 0) return;
+
+    this.confirmationService.confirm({
+      message: `Reject ${ids.length} promotion request(s)?`,
+      header: 'Batch Reject',
+      acceptButtonStyleClass: 'p-button-danger',
+      accept: () => {
+        this.store.dispatch(ApprovalActions.batchReject({ ids, comment: 'Batch rejected' }));
+        this.selectedIds.set(new Set());
+      }
+    });
+  }
+
+  getUrgencyClass(urgency: string): string {
+    const classes: Record<string, string> = {
+      low: 'urgency--low',
+      normal: 'urgency--normal',
+      high: 'urgency--high',
+      critical: 'urgency--critical'
+    };
+    return classes[urgency] || '';
+  }
+
+  getStatusClass(status: string): string {
+    const classes: Record<string, string> = {
+      pending: 'badge--warning',
+      approved: 'badge--success',
+      rejected: 'badge--danger',
+      expired: 'badge--secondary'
+    };
+    return classes[status] || '';
+  }
+
+  isExpiringSoon(approval: ApprovalRequest): boolean {
+    const hoursUntilExpiry = (new Date(approval.expiresAt).getTime() - Date.now()) / 3600000;
+    return approval.status === 'pending' && hoursUntilExpiry < 4;
+  }
+}
+```
+
+```html
+
+
+

Approval Queue

+
+ + +
+
+ +
+ {{ selectedIds().size }} selected + + + +
+ +
+ + + + + + + + Release + Promotion + Urgency + Status + Approvals + Requested + Actions + + + + + + + + + + + {{ approval.releaseName }} + {{ approval.releaseVersion }} + + + + + {{ approval.sourceEnvironment }} + + {{ approval.targetEnvironment }} + + + + + {{ approval.urgency | titlecase }} + + + + + {{ approval.status | titlecase }} + + + + + + + + {{ approval.currentApprovals }}/{{ approval.requiredApprovals }} + + + + {{ approval.requestedAt | date:'short' }} + by {{ approval.requestedBy }} + + + + + + + + + +
+ +

No approvals found

+

There are no promotion requests matching your filters.

+
+ + +
+
+
+ + + + +
+ + +``` + +### Approval Detail Component + +```typescript +// approval-detail.component.ts +import { Component, OnInit, inject, ChangeDetectionStrategy } from '@angular/core'; +import { CommonModule } from '@angular/common'; +import { ActivatedRoute, RouterModule } from '@angular/router'; +import { FormBuilder, FormGroup, ReactiveFormsModule, Validators } from '@angular/forms'; +import { Store } from '@ngrx/store'; +import { GateResultsPanelComponent } from '../components/gate-results-panel/gate-results-panel.component'; +import { ApprovalHistoryComponent } from '../components/approval-history/approval-history.component'; +import { ApproverListComponent } from '../components/approver-list/approver-list.component'; +import { ApprovalActions } from '@store/release-orchestrator/approvals/approvals.actions'; +import * as ApprovalSelectors from '@store/release-orchestrator/approvals/approvals.selectors'; + +export interface ApprovalAction { + id: string; + approvalId: string; + action: 'approved' | 'rejected'; + actor: string; + comment: string; + timestamp: Date; +} + +export interface ApprovalDetail extends ApprovalRequest { + gateResults: GateResult[]; + actions: ApprovalAction[]; + approvers: Array<{ + id: string; + name: string; + email: string; + hasApproved: boolean; + approvedAt: Date | null; + }>; + releaseComponents: Array<{ + name: string; + version: string; + digest: string; + }>; +} + +@Component({ + selector: 'so-approval-detail', + standalone: true, + imports: [ + CommonModule, + RouterModule, + ReactiveFormsModule, + GateResultsPanelComponent, + ApprovalHistoryComponent, + ApproverListComponent + ], + templateUrl: './approval-detail.component.html', + styleUrl: './approval-detail.component.scss', + changeDetection: ChangeDetectionStrategy.OnPush +}) +export class ApprovalDetailComponent implements OnInit { + private readonly store = inject(Store); + private readonly route = inject(ActivatedRoute); + private readonly fb = inject(FormBuilder); + + 
readonly approval$ = this.store.select(ApprovalSelectors.selectCurrentApproval);
+  readonly loading$ = this.store.select(ApprovalSelectors.selectLoading);
+  readonly canApprove$ = this.store.select(ApprovalSelectors.selectCanApprove);
+  readonly submitting$ = this.store.select(ApprovalSelectors.selectSubmitting);
+
+  approvalForm: FormGroup = this.fb.group({
+    comment: ['', Validators.required]
+  });
+
+  showApprovalForm = false;
+  pendingAction: 'approve' | 'reject' | null = null;
+
+  ngOnInit(): void {
+    const id = this.route.snapshot.paramMap.get('id');
+    if (id) {
+      this.store.dispatch(ApprovalActions.loadApproval({ id }));
+    }
+  }
+
+  onStartApprove(): void {
+    this.pendingAction = 'approve';
+    this.showApprovalForm = true;
+  }
+
+  onStartReject(): void {
+    this.pendingAction = 'reject';
+    this.showApprovalForm = true;
+  }
+
+  onCancelAction(): void {
+    this.pendingAction = null;
+    this.showApprovalForm = false;
+    this.approvalForm.reset();
+  }
+
+  onSubmitAction(approvalId: string): void {
+    if (this.approvalForm.invalid || !this.pendingAction) return;
+
+    if (this.pendingAction === 'approve') {
+      this.store.dispatch(ApprovalActions.approve({
+        id: approvalId,
+        comment: this.approvalForm.value.comment
+      }));
+    } else {
+      this.store.dispatch(ApprovalActions.reject({
+        id: approvalId,
+        comment: this.approvalForm.value.comment
+      }));
+    }
+
+    this.onCancelAction();
+  }
+
+  getUrgencyClass(urgency: string): string {
+    return `urgency--${urgency}`;
+  }
+
+  getStatusClass(status: string): string {
+    const classes: Record<string, string> = {
+      pending: 'badge--warning',
+      approved: 'badge--success',
+      rejected: 'badge--danger',
+      expired: 'badge--secondary'
+    };
+    return classes[status] || '';
+  }
+
+  getTimeRemaining(expiresAt: Date): string {
+    const ms = new Date(expiresAt).getTime() - Date.now();
+    if (ms <= 0) return 'Expired';
+
+    const hours = Math.floor(ms / 3600000);
+    const minutes = Math.floor((ms % 3600000) / 60000);
+
+    if (hours > 24) {
+      return `${Math.floor(hours / 24)}d ${hours % 24}h`;
+    }
+    return `${hours}h ${minutes}m`;
+  }
+}
+```
+
+```html
+
+
+
+ Approvals + + {{ approval.releaseName }} +
+ +
+
+

+ Promotion Request + + {{ approval.status | titlecase }} + +

+

+ {{ approval.sourceEnvironment }} + + {{ approval.targetEnvironment }} +

+
+ +
+ + +
+
+ +
+
+ + {{ approval.urgency | titlecase }} urgency +
+
+ + Requested by {{ approval.requestedBy }} +
+
+ + {{ approval.requestedAt | date:'medium' }} +
+
+ + Expires in {{ getTimeRemaining(approval.expiresAt) }} +
+
+
+ + +
+
+

{{ pendingAction === 'approve' ? 'Approve' : 'Reject' }} Promotion

+
+
+ + +
+
+ + +
+
+
+
+ +
+
+ +
+

Justification

+

{{ approval.justification }}

+
+ + +
+

+ Gate Evaluation + + {{ approval.gatesPassed ? 'All Passed' : 'Some Failed' }} + +

+ +
+ + +
+

Release Components

+
+
+ {{ comp.name }} + {{ comp.version }} + {{ comp.digest | slice:0:19 }}... +
+
+
+
+ + +
+
+ +
+ +
+```
+
+### Gate Results Panel Component
+
+```typescript
+// gate-results-panel.component.ts
+import { Component, Input, ChangeDetectionStrategy } from '@angular/core';
+import { CommonModule } from '@angular/common';
+
+@Component({
+  selector: 'so-gate-results-panel',
+  standalone: true,
+  imports: [CommonModule],
+  templateUrl: './gate-results-panel.component.html',
+  styleUrl: './gate-results-panel.component.scss',
+  changeDetection: ChangeDetectionStrategy.OnPush
+})
+export class GateResultsPanelComponent {
+  @Input() results: GateResult[] = [];
+  @Input() showDetails = true;
+
+  expandedGates = new Set<string>();
+
+  getGateIcon(status: string): string {
+    const icons: Record<string, string> = {
+      passed: 'pi-check-circle',
+      failed: 'pi-times-circle',
+      warning: 'pi-exclamation-triangle',
+      pending: 'pi-spin pi-spinner',
+      skipped: 'pi-minus-circle'
+    };
+    return icons[status] || 'pi-question-circle';
+  }
+
+  getGateClass(status: string): string {
+    return `gate--${status}`;
+  }
+
+  getTypeIcon(type: string): string {
+    const icons: Record<string, string> = {
+      security: 'pi-shield',
+      policy: 'pi-book',
+      quality: 'pi-chart-bar',
+      custom: 'pi-cog'
+    };
+    return icons[type] || 'pi-circle';
+  }
+
+  toggleExpand(gateId: string): void {
+    if (this.expandedGates.has(gateId)) {
+      this.expandedGates.delete(gateId);
+    } else {
+      this.expandedGates.add(gateId);
+    }
+  }
+
+  isExpanded(gateId: string): boolean {
+    return this.expandedGates.has(gateId);
+  }
+}
+```
+
+```html
+
+
+ No gates configured for this promotion +
+ +
+
+
+ +
+
+ + + {{ gate.gateName }} + + {{ gate.message }} +
+
+ +
+
+ +
+
+ {{ item.key }} + {{ item.value | json }} +
+
+ Evaluated at {{ gate.evaluatedAt | date:'medium' }} +
+
+
+
+``` + +--- + +## Acceptance Criteria + +- [ ] Promotion request form validates input +- [ ] Target environment dropdown populated +- [ ] Gate preview loads on environment select +- [ ] Failed gates block submission (with override option) +- [ ] Approval queue shows pending requests +- [ ] Filtering by status works +- [ ] Filtering by urgency works +- [ ] Batch selection works +- [ ] Batch approve/reject works +- [ ] Approval detail loads correctly +- [ ] Approve action with comment +- [ ] Reject action with comment +- [ ] Gate results display correctly +- [ ] Approval progress shows correctly +- [ ] Approver list shows who approved +- [ ] Approval history timeline +- [ ] Unit test coverage >=80% + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 106_001 Promotion Request | Internal | TODO | +| Angular 17 | External | Available | +| NgRx 17 | External | Available | +| PrimeNG 17 | External | Available | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| PromotionRequestComponent | TODO | | +| ApprovalQueueComponent | TODO | | +| ApprovalDetailComponent | TODO | | +| GateResultsPanelComponent | TODO | | +| ApprovalFormComponent | TODO | | +| ApprovalHistoryComponent | TODO | | +| ApproverListComponent | TODO | | +| Approval NgRx Store | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_111_006_FE_deployment_monitoring_ui.md b/docs/implplan/SPRINT_20260110_111_006_FE_deployment_monitoring_ui.md new file mode 100644 index 000000000..bf59f6cb7 --- /dev/null +++ b/docs/implplan/SPRINT_20260110_111_006_FE_deployment_monitoring_ui.md @@ -0,0 +1,895 @@ +# SPRINT: Deployment Monitoring UI + +> **Sprint ID:** 111_006 +> **Module:** FE +> **Phase:** 11 - UI Implementation +> **Status:** TODO +> **Parent:** 
[111_000_INDEX](SPRINT_20260110_111_000_INDEX_ui_implementation.md) + +--- + +## Overview + +Implement the Deployment Monitoring UI providing real-time deployment status, per-target progress tracking, live log streaming, and rollback capabilities. + +### Objectives + +- Deployment status overview +- Per-target progress tracking +- Real-time log streaming +- Deployment actions (pause, resume, cancel) +- Rollback confirmation dialog +- Deployment history + +### Working Directory + +``` +src/Web/StellaOps.Web/ +├── src/app/features/release-orchestrator/ +│ └── deployments/ +│ ├── deployment-list/ +│ │ ├── deployment-list.component.ts +│ │ ├── deployment-list.component.html +│ │ └── deployment-list.component.scss +│ ├── deployment-monitor/ +│ │ ├── deployment-monitor.component.ts +│ │ ├── deployment-monitor.component.html +│ │ └── deployment-monitor.component.scss +│ ├── components/ +│ │ ├── target-progress-list/ +│ │ ├── log-stream-viewer/ +│ │ ├── deployment-timeline/ +│ │ ├── rollback-dialog/ +│ │ └── deployment-metrics/ +│ └── deployments.routes.ts +└── src/app/store/release-orchestrator/ + └── deployments/ + ├── deployments.actions.ts + ├── deployments.reducer.ts + └── deployments.selectors.ts +``` + +--- + +## Deliverables + +### Deployment Monitor Component + +```typescript +// deployment-monitor.component.ts +import { Component, OnInit, OnDestroy, inject, signal, ChangeDetectionStrategy } from '@angular/core'; +import { CommonModule } from '@angular/common'; +import { ActivatedRoute, RouterModule } from '@angular/router'; +import { Store } from '@ngrx/store'; +import { Subject, takeUntil } from 'rxjs'; +import { ConfirmationService, MessageService } from 'primeng/api'; +import { TargetProgressListComponent } from '../components/target-progress-list/target-progress-list.component'; +import { LogStreamViewerComponent } from '../components/log-stream-viewer/log-stream-viewer.component'; +import { DeploymentTimelineComponent } from 
'../components/deployment-timeline/deployment-timeline.component'; +import { DeploymentMetricsComponent } from '../components/deployment-metrics/deployment-metrics.component'; +import { RollbackDialogComponent } from '../components/rollback-dialog/rollback-dialog.component'; +import { DeploymentActions } from '@store/release-orchestrator/deployments/deployments.actions'; +import * as DeploymentSelectors from '@store/release-orchestrator/deployments/deployments.selectors'; + +export interface Deployment { + id: string; + releaseId: string; + releaseName: string; + releaseVersion: string; + environmentId: string; + environmentName: string; + status: 'pending' | 'running' | 'paused' | 'completed' | 'failed' | 'cancelled' | 'rolling_back'; + strategy: 'rolling' | 'blue_green' | 'canary' | 'all_at_once'; + progress: number; + startedAt: Date; + completedAt: Date | null; + initiatedBy: string; + targets: DeploymentTarget[]; +} + +export interface DeploymentTarget { + id: string; + name: string; + type: string; + status: 'pending' | 'running' | 'completed' | 'failed' | 'skipped'; + progress: number; + startedAt: Date | null; + completedAt: Date | null; + duration: number | null; + agentId: string; + error: string | null; +} + +export interface DeploymentEvent { + id: string; + type: 'started' | 'target_started' | 'target_completed' | 'target_failed' | 'paused' | 'resumed' | 'completed' | 'failed' | 'cancelled' | 'rollback_started'; + targetId: string | null; + targetName: string | null; + message: string; + timestamp: Date; +} + +@Component({ + selector: 'so-deployment-monitor', + standalone: true, + imports: [ + CommonModule, + RouterModule, + TargetProgressListComponent, + LogStreamViewerComponent, + DeploymentTimelineComponent, + DeploymentMetricsComponent, + RollbackDialogComponent + ], + templateUrl: './deployment-monitor.component.html', + styleUrl: './deployment-monitor.component.scss', + changeDetection: ChangeDetectionStrategy.OnPush, + providers: 
[ConfirmationService, MessageService] +}) +export class DeploymentMonitorComponent implements OnInit, OnDestroy { + private readonly store = inject(Store); + private readonly route = inject(ActivatedRoute); + private readonly confirmationService = inject(ConfirmationService); + private readonly destroy$ = new Subject<void>(); + + readonly deployment$ = this.store.select(DeploymentSelectors.selectCurrentDeployment); + readonly targets$ = this.store.select(DeploymentSelectors.selectDeploymentTargets); + readonly events$ = this.store.select(DeploymentSelectors.selectDeploymentEvents); + readonly logs$ = this.store.select(DeploymentSelectors.selectDeploymentLogs); + readonly metrics$ = this.store.select(DeploymentSelectors.selectDeploymentMetrics); + readonly loading$ = this.store.select(DeploymentSelectors.selectLoading); + + selectedTargetId = signal<string | null>(null); + showRollbackDialog = signal(false); + activeTab = signal<'logs' | 'timeline' | 'metrics'>('logs'); + + ngOnInit(): void { + const id = this.route.snapshot.paramMap.get('id'); + if (id) { + this.store.dispatch(DeploymentActions.loadDeployment({ id })); + this.store.dispatch(DeploymentActions.subscribeToUpdates({ deploymentId: id })); + } + } + + ngOnDestroy(): void { + this.store.dispatch(DeploymentActions.unsubscribeFromUpdates()); + this.destroy$.next(); + this.destroy$.complete(); + } + + onPause(deployment: Deployment): void { + this.confirmationService.confirm({ + message: 'Pause the deployment? In-progress targets will complete, but no new targets will start.', + header: 'Pause Deployment', + accept: () => { + this.store.dispatch(DeploymentActions.pause({ deploymentId: deployment.id })); + } + }); + } + + onResume(deployment: Deployment): void { + this.store.dispatch(DeploymentActions.resume({ deploymentId: deployment.id })); + } + + onCancel(deployment: Deployment): void { + this.confirmationService.confirm({ + message: 'Cancel the deployment? In-progress targets will complete, but no new targets will start. 
This cannot be undone.', + header: 'Cancel Deployment', + acceptButtonStyleClass: 'p-button-danger', + accept: () => { + this.store.dispatch(DeploymentActions.cancel({ deploymentId: deployment.id })); + } + }); + } + + onRollback(): void { + this.showRollbackDialog.set(true); + } + + onRollbackConfirm(options: { targetIds?: string[]; reason: string }): void { + const id = this.route.snapshot.paramMap.get('id'); + if (id) { + this.store.dispatch(DeploymentActions.rollback({ + deploymentId: id, + targetIds: options.targetIds, + reason: options.reason + })); + } + this.showRollbackDialog.set(false); + } + + onTargetSelect(targetId: string | null): void { + this.selectedTargetId.set(targetId); + if (targetId) { + const id = this.route.snapshot.paramMap.get('id'); + if (id) { + this.store.dispatch(DeploymentActions.loadTargetLogs({ + deploymentId: id, + targetId + })); + } + } + } + + onRetryTarget(targetId: string): void { + const id = this.route.snapshot.paramMap.get('id'); + if (id) { + this.store.dispatch(DeploymentActions.retryTarget({ + deploymentId: id, + targetId + })); + } + } + + getStatusIcon(status: string): string { + const icons: Record<string, string> = { + pending: 'pi-clock', + running: 'pi-spin pi-spinner', + paused: 'pi-pause', + completed: 'pi-check-circle', + failed: 'pi-times-circle', + cancelled: 'pi-ban', + rolling_back: 'pi-spin pi-undo' + }; + return icons[status] || 'pi-question'; + } + + getStatusClass(status: string): string { + const classes: Record<string, string> = { + pending: 'status--pending', + running: 'status--running', + paused: 'status--paused', + completed: 'status--success', + failed: 'status--danger', + cancelled: 'status--cancelled', + rolling_back: 'status--warning' + }; + return classes[status] || ''; + } + + getDuration(deployment: Deployment): string { + const start = new Date(deployment.startedAt).getTime(); + const end = deployment.completedAt + ? 
new Date(deployment.completedAt).getTime() + : Date.now(); + const seconds = Math.floor((end - start) / 1000); + const minutes = Math.floor(seconds / 60); + const remainingSeconds = seconds % 60; + return `${minutes}m ${remainingSeconds}s`; + } + + canPause(deployment: Deployment): boolean { + return deployment.status === 'running'; + } + + canResume(deployment: Deployment): boolean { + return deployment.status === 'paused'; + } + + canCancel(deployment: Deployment): boolean { + return ['running', 'paused', 'pending'].includes(deployment.status); + } + + canRollback(deployment: Deployment): boolean { + return ['completed', 'failed'].includes(deployment.status); + } +} +``` + +```html + +
+
+
+ Deployments + + {{ deployment.releaseName }} +
+ +
+
+

+ + {{ deployment.releaseName }} + {{ deployment.releaseVersion }} +

+

+ Deploying to {{ deployment.environmentName }} + using {{ deployment.strategy | titlecase }} strategy +

+
+ +
+ + + + +
+
+ +
+
+ {{ deployment.progress }}% complete + Duration: {{ getDuration(deployment) }} +
+ + +
+ +
+
+ {{ (targets$ | async)?.length || 0 }} + Total Targets +
+
+ + {{ ((targets$ | async) || []) | filter:'status':'completed' | count }} + + Completed +
+
+ + {{ ((targets$ | async) || []) | filter:'status':'running' | count }} + + Running +
+
+ + {{ ((targets$ | async) || []) | filter:'status':'failed' | count }} + + Failed +
+
+
+ +
+ + +
+
+ + + +
+ +
+ + + + + + + + +
+
+
+
+ + + + + +``` + +### Target Progress List Component + +```typescript +// target-progress-list.component.ts +import { Component, Input, Output, EventEmitter, ChangeDetectionStrategy } from '@angular/core'; +import { CommonModule } from '@angular/common'; +import { DeploymentTarget } from '../../deployment-monitor/deployment-monitor.component'; + +@Component({ + selector: 'so-target-progress-list', + standalone: true, + imports: [CommonModule], + templateUrl: './target-progress-list.component.html', + styleUrl: './target-progress-list.component.scss', + changeDetection: ChangeDetectionStrategy.OnPush +}) +export class TargetProgressListComponent { + @Input() targets: DeploymentTarget[] | null = null; + @Input() selectedTargetId: string | null = null; + @Output() targetSelect = new EventEmitter<string | null>(); + @Output() retryTarget = new EventEmitter<string>(); + + getStatusIcon(status: string): string { + const icons: Record<string, string> = { + pending: 'pi-clock', + running: 'pi-spin pi-spinner', + completed: 'pi-check-circle', + failed: 'pi-times-circle', + skipped: 'pi-minus-circle' + }; + return icons[status] || 'pi-question'; + } + + getStatusClass(status: string): string { + return `target--${status}`; + } + + getTypeIcon(type: string): string { + const icons: Record<string, string> = { + docker_host: 'pi-box', + compose_host: 'pi-th-large', + ecs_service: 'pi-cloud', + nomad_job: 'pi-sitemap' + }; + return icons[type] || 'pi-server'; + } + + formatDuration(ms: number | null): string { + if (ms === null) return '-'; + const seconds = Math.floor(ms / 1000); + const minutes = Math.floor(seconds / 60); + const remainingSeconds = seconds % 60; + if (minutes > 0) { + return `${minutes}m ${remainingSeconds}s`; + } + return `${seconds}s`; + } + + onSelect(targetId: string): void { + if (this.selectedTargetId === targetId) { + this.targetSelect.emit(null); + } else { + this.targetSelect.emit(targetId); + } + } + + onRetry(event: Event, targetId: string): void { + event.stopPropagation(); + this.retryTarget.emit(targetId); + } +} +``` + +```html + +
+

Deployment Targets

+ +
+
+
+ +
+ +
+
+ + {{ target.name }} +
+
+ +
+
+ + {{ formatDuration(target.duration) }} + + Agent: {{ target.agentId }} +
+
+ {{ target.error }} +
+
+ +
+ +
+
+
+ +
+ +

No targets

+
+
+``` + +### Log Stream Viewer Component + +```typescript +// log-stream-viewer.component.ts +import { Component, Input, ViewChild, ElementRef, AfterViewChecked, + OnChanges, SimpleChanges, signal, ChangeDetectionStrategy } from '@angular/core'; +import { CommonModule } from '@angular/common'; +import { FormsModule } from '@angular/forms'; + +export interface LogEntry { + timestamp: Date; + level: 'debug' | 'info' | 'warn' | 'error'; + source: string; + targetId: string | null; + message: string; +} + +@Component({ + selector: 'so-log-stream-viewer', + standalone: true, + imports: [CommonModule, FormsModule], + templateUrl: './log-stream-viewer.component.html', + styleUrl: './log-stream-viewer.component.scss', + changeDetection: ChangeDetectionStrategy.OnPush +}) +export class LogStreamViewerComponent implements AfterViewChecked, OnChanges { + @Input() logs: LogEntry[] | null = null; + @Input() targetId: string | null = null; + + @ViewChild('logContainer') logContainer!: ElementRef; + + autoScroll = signal(true); + searchTerm = signal(''); + levelFilter = signal(['debug', 'info', 'warn', 'error']); + private shouldScroll = false; + + readonly levelOptions = [ + { label: 'Debug', value: 'debug' }, + { label: 'Info', value: 'info' }, + { label: 'Warn', value: 'warn' }, + { label: 'Error', value: 'error' } + ]; + + ngOnChanges(changes: SimpleChanges): void { + if (changes['logs'] && this.autoScroll()) { + this.shouldScroll = true; + } + } + + ngAfterViewChecked(): void { + if (this.shouldScroll && this.logContainer) { + this.scrollToBottom(); + this.shouldScroll = false; + } + } + + get filteredLogs(): LogEntry[] { + if (!this.logs) return []; + + return this.logs.filter(log => { + // Filter by target + if (this.targetId && log.targetId !== this.targetId) { + return false; + } + + // Filter by level + if (!this.levelFilter().includes(log.level)) { + return false; + } + + // Filter by search term + const term = this.searchTerm().toLowerCase(); + if (term && 
!log.message.toLowerCase().includes(term)) { + return false; + } + + return true; + }); + } + + getLevelClass(level: string): string { + return `log-entry--${level}`; + } + + formatTimestamp(timestamp: Date): string { + return new Date(timestamp).toISOString().split('T')[1].slice(0, 12); + } + + scrollToBottom(): void { + if (this.logContainer) { + const el = this.logContainer.nativeElement; + el.scrollTop = el.scrollHeight; + } + } + + onClear(): void { + this.searchTerm.set(''); + } + + onCopy(): void { + const text = this.filteredLogs + .map(log => `${this.formatTimestamp(log.timestamp)} [${log.level.toUpperCase()}] ${log.message}`) + .join('\n'); + navigator.clipboard.writeText(text); + } + + onDownload(): void { + const text = this.filteredLogs + .map(log => JSON.stringify(log)) + .join('\n'); + const blob = new Blob([text], { type: 'application/x-ndjson' }); + const url = URL.createObjectURL(blob); + const a = document.createElement('a'); + a.href = url; + a.download = `deployment-logs-${Date.now()}.ndjson`; + a.click(); + URL.revokeObjectURL(url); + } +} +``` + +```html + +
+
+ + + + + + + + +
+ + + + +
+
+ +
+
+ +

No logs to display

+
+ +
+ + {{ log.level | uppercase }} + {{ log.source }} + {{ log.message }} +
+
+
+``` + +### Rollback Dialog Component + +```typescript +// rollback-dialog.component.ts +import { Component, Input, Output, EventEmitter, inject, signal, ChangeDetectionStrategy } from '@angular/core'; +import { CommonModule } from '@angular/common'; +import { FormBuilder, FormGroup, ReactiveFormsModule, Validators } from '@angular/forms'; +import { Deployment, DeploymentTarget } from '../../deployment-monitor/deployment-monitor.component'; + +@Component({ + selector: 'so-rollback-dialog', + standalone: true, + imports: [CommonModule, ReactiveFormsModule], + templateUrl: './rollback-dialog.component.html', + styleUrl: './rollback-dialog.component.scss', + changeDetection: ChangeDetectionStrategy.OnPush +}) +export class RollbackDialogComponent { + @Input() deployment: Deployment | null = null; + @Input() targets: DeploymentTarget[] | null = null; + @Output() confirm = new EventEmitter<{ targetIds?: string[]; reason: string }>(); + @Output() cancel = new EventEmitter<void>(); + + private readonly fb = inject(FormBuilder); + + rollbackType = signal<'all' | 'selected'>('all'); + selectedTargets = signal<Set<string>>(new Set()); + + form: FormGroup = this.fb.group({ + reason: ['', [Validators.required, Validators.minLength(10)]] + }); + + get completedTargets(): DeploymentTarget[] { + return (this.targets || []).filter(t => t.status === 'completed'); + } + + onToggleTarget(targetId: string): void { + this.selectedTargets.update(set => { + const newSet = new Set(set); + if (newSet.has(targetId)) { + newSet.delete(targetId); + } else { + newSet.add(targetId); + } + return newSet; + }); + } + + onSelectAll(): void { + this.selectedTargets.set(new Set(this.completedTargets.map(t => t.id))); + } + + onDeselectAll(): void { + this.selectedTargets.set(new Set()); + } + + onConfirm(): void { + if (this.form.invalid) return; + + const targetIds = this.rollbackType() === 'selected' + ? Array.from(this.selectedTargets()) + : undefined; + + this.confirm.emit({ + targetIds, + reason: this.form.value.reason + }); + } +} +``` + +```html + +
+
+
+

+ + Rollback Deployment +

+ +
+ +
+ + +
+

Rollback Scope

+
+
+ + + +
+
+ + + +
+
+
+ +
+
+ Select targets to roll back ({{ selectedTargets().size }} selected) + +
+
+
+ + + +
+
+
+ +
+
+ + + Minimum 10 characters required +
+
+
+ +
+ + +
+
+
+``` + +--- + +## Acceptance Criteria + +- [ ] Deployment list shows all deployments +- [ ] Deployment monitor loads correctly +- [ ] Real-time progress updates via SignalR +- [ ] Target list shows all targets with status +- [ ] Target selection shows target-specific logs +- [ ] Log streaming works in real-time +- [ ] Log filtering by level works +- [ ] Log search works +- [ ] Pause deployment works +- [ ] Resume deployment works +- [ ] Cancel deployment with confirmation +- [ ] Rollback dialog opens +- [ ] Rollback scope selection works +- [ ] Rollback reason required +- [ ] Retry failed target works +- [ ] Timeline shows deployment events +- [ ] Metrics display correctly +- [ ] Unit test coverage >=80% + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 107_001 Platform API Gateway | Internal | TODO | +| Angular 17 | External | Available | +| NgRx 17 | External | Available | +| PrimeNG 17 | External | Available | +| SignalR Client | External | Available | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| DeploymentListComponent | TODO | | +| DeploymentMonitorComponent | TODO | | +| TargetProgressListComponent | TODO | | +| LogStreamViewerComponent | TODO | | +| DeploymentTimelineComponent | TODO | | +| DeploymentMetricsComponent | TODO | | +| RollbackDialogComponent | TODO | | +| Deployment NgRx Store | TODO | | +| SignalR integration | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/implplan/SPRINT_20260110_111_007_FE_evidence_viewer.md b/docs/implplan/SPRINT_20260110_111_007_FE_evidence_viewer.md new file mode 100644 index 000000000..68151df41 --- /dev/null +++ b/docs/implplan/SPRINT_20260110_111_007_FE_evidence_viewer.md @@ -0,0 +1,1109 @@ +# SPRINT: Evidence Viewer + +> **Sprint ID:** 111_007 +> **Module:** FE +> **Phase:** 11 - UI Implementation +> **Status:** 
TODO +> **Parent:** [111_000_INDEX](SPRINT_20260110_111_000_INDEX_ui_implementation.md) + +--- + +## Overview + +Implement the Evidence Viewer UI providing evidence packet browsing, detailed evidence inspection, cryptographic signature verification, and evidence export capabilities. + +### Objectives + +- Evidence packet list with filtering +- Evidence detail view with content inspection +- Cryptographic signature verification +- Evidence export to multiple formats +- Evidence comparison view +- Audit trail integration + +### Working Directory + +``` +src/Web/StellaOps.Web/ +├── src/app/features/release-orchestrator/ +│ └── evidence/ +│ ├── evidence-list/ +│ │ ├── evidence-list.component.ts +│ │ ├── evidence-list.component.html +│ │ └── evidence-list.component.scss +│ ├── evidence-detail/ +│ │ ├── evidence-detail.component.ts +│ │ ├── evidence-detail.component.html +│ │ └── evidence-detail.component.scss +│ ├── components/ +│ │ ├── evidence-verifier/ +│ │ ├── evidence-content-viewer/ +│ │ ├── export-dialog/ +│ │ ├── evidence-timeline/ +│ │ └── signature-panel/ +│ └── evidence.routes.ts +└── src/app/store/release-orchestrator/ + └── evidence/ + ├── evidence.actions.ts + ├── evidence.reducer.ts + └── evidence.selectors.ts +``` + +--- + +## Deliverables + +### Evidence List Component + +```typescript +// evidence-list.component.ts +import { Component, OnInit, inject, signal, ChangeDetectionStrategy } from '@angular/core'; +import { CommonModule } from '@angular/common'; +import { RouterModule } from '@angular/router'; +import { FormsModule } from '@angular/forms'; +import { Store } from '@ngrx/store'; +import { EvidenceActions } from '@store/release-orchestrator/evidence/evidence.actions'; +import * as EvidenceSelectors from '@store/release-orchestrator/evidence/evidence.selectors'; + +export interface EvidencePacket { + id: string; + deploymentId: string; + releaseId: string; + releaseName: string; + releaseVersion: string; + environmentId: string; + environmentName: 
string; + status: 'pending' | 'complete' | 'failed'; + signatureStatus: 'unsigned' | 'valid' | 'invalid' | 'expired'; + contentHash: string; + signedAt: Date | null; + signedBy: string | null; + createdAt: Date; + size: number; + contentTypes: string[]; +} + +@Component({ + selector: 'so-evidence-list', + standalone: true, + imports: [CommonModule, RouterModule, FormsModule], + templateUrl: './evidence-list.component.html', + styleUrl: './evidence-list.component.scss', + changeDetection: ChangeDetectionStrategy.OnPush +}) +export class EvidenceListComponent implements OnInit { + private readonly store = inject(Store); + + readonly evidencePackets$ = this.store.select(EvidenceSelectors.selectFilteredEvidence); + readonly loading$ = this.store.select(EvidenceSelectors.selectLoading); + readonly totalCount$ = this.store.select(EvidenceSelectors.selectTotalCount); + + searchTerm = signal(''); + signatureFilter = signal<string[]>([]); + environmentFilter = signal<string | null>(null); + dateRange = signal<[Date, Date] | null>(null); + + readonly signatureOptions = [ + { label: 'Valid', value: 'valid' }, + { label: 'Invalid', value: 'invalid' }, + { label: 'Unsigned', value: 'unsigned' }, + { label: 'Expired', value: 'expired' } + ]; + + ngOnInit(): void { + this.loadEvidence(); + } + + loadEvidence(): void { + this.store.dispatch(EvidenceActions.loadEvidence({ + filter: { + search: this.searchTerm(), + signatureStatuses: this.signatureFilter(), + environment: this.environmentFilter(), + dateRange: this.dateRange() + } + })); + } + + onSearch(term: string): void { + this.searchTerm.set(term); + this.loadEvidence(); + } + + onSignatureFilterChange(statuses: string[]): void { + this.signatureFilter.set(statuses); + this.loadEvidence(); + } + + onDateRangeChange(range: [Date, Date] | null): void { + this.dateRange.set(range); + this.loadEvidence(); + } + + getSignatureIcon(status: string): string { + const icons: Record<string, string> = { + valid: 'pi-verified', + invalid: 'pi-times-circle', + unsigned: 
'pi-minus-circle', + expired: 'pi-clock' + }; + return icons[status] || 'pi-question-circle'; + } + + getSignatureClass(status: string): string { + const classes: Record<string, string> = { + valid: 'signature--valid', + invalid: 'signature--invalid', + unsigned: 'signature--unsigned', + expired: 'signature--expired' + }; + return classes[status] || ''; + } + + formatSize(bytes: number): string { + if (bytes < 1024) return `${bytes} B`; + if (bytes < 1024 * 1024) return `${(bytes / 1024).toFixed(1)} KB`; + return `${(bytes / (1024 * 1024)).toFixed(1)} MB`; + } + + onExportMultiple(ids: string[]): void { + this.store.dispatch(EvidenceActions.exportMultiple({ ids, format: 'zip' })); + } +} +``` + +```html + +
+
+

Evidence Packets

+
+ +
+
+ +
+ + + + + + + + + + +
+ +
+ + + + + + + Release + Environment + Signature + Contents + Size + Created + Actions + + + + + + + + + + {{ evidence.releaseName }} + {{ evidence.releaseVersion }} + + + {{ evidence.environmentName }} + + + + {{ evidence.signatureStatus | titlecase }} + + + +
+ + {{ type }} + +
+ + {{ formatSize(evidence.size) }} + {{ evidence.createdAt | date:'short' }} + + + + + +
+ + + +
+ +

No evidence packets found

+

Evidence packets are generated during deployments.

+
+ + +
+
+
+ + + + +
+``` + +### Evidence Detail Component + +```typescript +// evidence-detail.component.ts +import { Component, OnInit, inject, signal, ChangeDetectionStrategy } from '@angular/core'; +import { CommonModule } from '@angular/common'; +import { ActivatedRoute, RouterModule } from '@angular/router'; +import { Store } from '@ngrx/store'; +import { EvidenceVerifierComponent } from '../components/evidence-verifier/evidence-verifier.component'; +import { EvidenceContentViewerComponent } from '../components/evidence-content-viewer/evidence-content-viewer.component'; +import { SignaturePanelComponent } from '../components/signature-panel/signature-panel.component'; +import { ExportDialogComponent } from '../components/export-dialog/export-dialog.component'; +import { EvidenceTimelineComponent } from '../components/evidence-timeline/evidence-timeline.component'; +import { EvidenceActions } from '@store/release-orchestrator/evidence/evidence.actions'; +import * as EvidenceSelectors from '@store/release-orchestrator/evidence/evidence.selectors'; + +export interface EvidenceDetail extends EvidencePacket { + content: EvidenceContent; + signature: EvidenceSignature | null; + verificationResult: VerificationResult | null; +} + +export interface EvidenceContent { + metadata: { + deploymentId: string; + releaseId: string; + environmentId: string; + startedAt: string; + completedAt: string; + initiatedBy: string; + outcome: string; + }; + release: { + name: string; + version: string; + components: Array<{ + name: string; + digest: string; + version: string; + }>; + }; + workflow: { + id: string; + name: string; + version: number; + stepsExecuted: number; + stepsFailed: number; + }; + targets: Array<{ + id: string; + name: string; + type: string; + outcome: string; + duration: number; + }>; + approvals: Array<{ + approver: string; + action: string; + timestamp: string; + comment: string; + }>; + gateResults: Array<{ + gateId: string; + gateName: string; + status: string; + evaluatedAt: 
string; + }>; + artifacts: Array<{ + name: string; + type: string; + digest: string; + size: number; + }>; +} + +export interface EvidenceSignature { + algorithm: string; + keyId: string; + signature: string; + signedAt: Date; + signedBy: string; + certificate: string | null; +} + +export interface VerificationResult { + valid: boolean; + message: string; + details: { + signatureValid: boolean; + contentHashValid: boolean; + certificateValid: boolean; + timestampValid: boolean; + }; + verifiedAt: Date; +} + +@Component({ + selector: 'so-evidence-detail', + standalone: true, + imports: [ + CommonModule, + RouterModule, + EvidenceVerifierComponent, + EvidenceContentViewerComponent, + SignaturePanelComponent, + ExportDialogComponent, + EvidenceTimelineComponent + ], + templateUrl: './evidence-detail.component.html', + styleUrl: './evidence-detail.component.scss', + changeDetection: ChangeDetectionStrategy.OnPush +}) +export class EvidenceDetailComponent implements OnInit { + private readonly store = inject(Store); + private readonly route = inject(ActivatedRoute); + + readonly evidence$ = this.store.select(EvidenceSelectors.selectCurrentEvidence); + readonly loading$ = this.store.select(EvidenceSelectors.selectLoading); + readonly verifying$ = this.store.select(EvidenceSelectors.selectVerifying); + + showExportDialog = signal(false); + activeTab = signal<'overview' | 'content' | 'signature' | 'timeline'>('overview'); + + ngOnInit(): void { + const id = this.route.snapshot.paramMap.get('id'); + if (id) { + this.store.dispatch(EvidenceActions.loadEvidence({ id })); + } + } + + onVerify(): void { + const id = this.route.snapshot.paramMap.get('id'); + if (id) { + this.store.dispatch(EvidenceActions.verifyEvidence({ id })); + } + } + + onExport(): void { + this.showExportDialog.set(true); + } + + onExportConfirm(options: { format: string; includeSignature: boolean }): void { + const id = this.route.snapshot.paramMap.get('id'); + if (id) { + 
this.store.dispatch(EvidenceActions.exportEvidence({ + id, + format: options.format, + includeSignature: options.includeSignature + })); + } + this.showExportDialog.set(false); + } + + onDownloadRaw(): void { + const id = this.route.snapshot.paramMap.get('id'); + if (id) { + this.store.dispatch(EvidenceActions.downloadRaw({ id })); + } + } + + getSignatureIcon(status: string): string { + const icons: Record<string, string> = { + valid: 'pi-verified', + invalid: 'pi-times-circle', + unsigned: 'pi-minus-circle', + expired: 'pi-clock' + }; + return icons[status] || 'pi-question-circle'; + } + + getSignatureClass(status: string): string { + return `signature--${status}`; + } + + getOutcomeClass(outcome: string): string { + const classes: Record<string, string> = { + success: 'outcome--success', + failure: 'outcome--failure', + partial: 'outcome--warning', + cancelled: 'outcome--secondary' + }; + return classes[outcome] || ''; + } +} +``` + +```html + +
+
+
+ Evidence + + {{ evidence.releaseName }} +
+ +
+
+

+ Evidence Packet + + + {{ evidence.signatureStatus | titlecase }} + +

+

+ {{ evidence.releaseName }} {{ evidence.releaseVersion }} + deployed to {{ evidence.environmentName }} +

+
+ +
+ + +
+
+ +
+
+ Content Hash + {{ evidence.contentHash }} +
+
+ Created + {{ evidence.createdAt | date:'medium' }} +
+
+ Signed + + {{ evidence.signedAt | date:'medium' }} + by {{ evidence.signedBy }} + +
+
+
+ + +
+ +
+ {{ evidence.verificationResult.valid ? 'Verification Passed' : 'Verification Failed' }} + {{ evidence.verificationResult.message }} +
+ + Verified {{ evidence.verificationResult.verifiedAt | date:'medium' }} + +
+ +
+ + + + +
+ +
+ +
+
+
+

Deployment Summary

+
+
+ {{ evidence.content.metadata.outcome | titlecase }} +
+
+
+ {{ evidence.content.targets.length }} + Targets +
+
+ {{ evidence.content.workflow.stepsExecuted }} + Steps +
+
+ {{ evidence.content.approvals.length }} + Approvals +
+
+
+
+ +
+

Release Components

+
+
+ {{ comp.name }} + {{ comp.version }} + {{ comp.digest | slice:0:19 }}... +
+
+
+ +
+

Gate Results

+
+
+ + {{ gate.gateName }} + {{ gate.status | titlecase }} +
+
+
+ +
+

Artifacts

+
+
+ + {{ artifact.name }} + {{ artifact.type }} + {{ artifact.digest | slice:0:12 }}... +
+
+
+
+
+ + + + + + + + + + + + +
+
+ + + + +
+ +
+``` + +### Evidence Verifier Component + +```typescript +// evidence-verifier.component.ts +import { Component, Input, Output, EventEmitter, ChangeDetectionStrategy } from '@angular/core'; +import { CommonModule } from '@angular/common'; +import { VerificationResult } from '../../evidence-detail/evidence-detail.component'; + +@Component({ + selector: 'so-evidence-verifier', + standalone: true, + imports: [CommonModule], + templateUrl: './evidence-verifier.component.html', + styleUrl: './evidence-verifier.component.scss', + changeDetection: ChangeDetectionStrategy.OnPush +}) +export class EvidenceVerifierComponent { + @Input() result: VerificationResult | null = null; + @Input() verifying = false; + @Output() verify = new EventEmitter<void>(); + + getCheckIcon(passed: boolean): string { + return passed ? 'pi-check-circle' : 'pi-times-circle'; + } + + getCheckClass(passed: boolean): string { + return passed ? 'check--passed' : 'check--failed'; + } +} +``` + +```html + +
+
+ +

Verify the cryptographic signature and content integrity of this evidence packet.

+ +
+ +
+ +

Verifying evidence...

+
+ +
+
+ +

{{ result.valid ? 'Verification Passed' : 'Verification Failed' }}

+
+ +

{{ result.message }}

+ +
+
+ + Signature verification +
+
+ + Content hash verification +
+
+ + Certificate validation +
+
+ + Timestamp validation +
+
+ + +
+
+``` + +### Export Dialog Component + +```typescript +// export-dialog.component.ts +import { Component, Output, EventEmitter, signal, ChangeDetectionStrategy } from '@angular/core'; +import { CommonModule } from '@angular/common'; +import { FormsModule } from '@angular/forms'; + +@Component({ + selector: 'so-export-dialog', + standalone: true, + imports: [CommonModule, FormsModule], + templateUrl: './export-dialog.component.html', + styleUrl: './export-dialog.component.scss', + changeDetection: ChangeDetectionStrategy.OnPush +}) +export class ExportDialogComponent { + @Output() confirm = new EventEmitter<{ format: string; includeSignature: boolean }>(); + @Output() cancel = new EventEmitter<void>(); + + selectedFormat = signal('json'); + includeSignature = signal(true); + + readonly formatOptions = [ + { + value: 'json', + label: 'JSON', + description: 'Raw JSON evidence packet', + icon: 'pi-file' + }, + { + value: 'pdf', + label: 'PDF Report', + description: 'Human-readable PDF document', + icon: 'pi-file-pdf' + }, + { + value: 'csv', + label: 'CSV', + description: 'Spreadsheet-compatible format', + icon: 'pi-file-excel' + }, + { + value: 'slsa', + label: 'SLSA Provenance', + description: 'SLSA v1.0 provenance format', + icon: 'pi-shield' + } + ]; + + onConfirm(): void { + this.confirm.emit({ + format: this.selectedFormat(), + includeSignature: this.includeSignature() + }); + } +} +``` + +```html + +
+
+
+

+ + Export Evidence +

+ +
+ +
+
+

Export Format

+
+
+
+ +
+
+ {{ format.label }} + {{ format.description }} +
+
+ +
+
+
+
+ +
+

Options

+
+ + + +
+
+
+ +
+ + +
+
+
+``` + +### Signature Panel Component + +```typescript +// signature-panel.component.ts +import { Component, Input, ChangeDetectionStrategy } from '@angular/core'; +import { CommonModule } from '@angular/common'; + +@Component({ + selector: 'so-signature-panel', + standalone: true, + imports: [CommonModule], + templateUrl: './signature-panel.component.html', + styleUrl: './signature-panel.component.scss', + changeDetection: ChangeDetectionStrategy.OnPush +}) +export class SignaturePanelComponent { + @Input() signature: EvidenceSignature | null = null; + @Input() verificationResult: VerificationResult | null = null; + + showFullSignature = false; + showFullCertificate = false; + + formatSignature(sig: string): string { + if (this.showFullSignature) return sig; + return sig.length > 64 ? sig.substring(0, 64) + '...' : sig; + } + + onCopySignature(): void { + if (this.signature) { + navigator.clipboard.writeText(this.signature.signature); + } + } + + onCopyCertificate(): void { + if (this.signature?.certificate) { + navigator.clipboard.writeText(this.signature.certificate); + } + } +} +``` + +```html + +
+<!-- signature-panel.component.html -->
+<div class="signature-panel">
+  <!-- Shown when the evidence packet carries no signature -->
+  <div class="unsigned-notice" *ngIf="!signature">
+    <i class="pi pi-exclamation-triangle"></i>
+    <h3>Unsigned Evidence</h3>
+    <p>This evidence packet has not been cryptographically signed.</p>
+  </div>
+
+  <ng-container *ngIf="signature">
+    <section class="signature-details">
+      <h3>Signature Details</h3>
+      <div class="detail-grid">
+        <div class="detail-row">
+          <span class="detail-label">Algorithm</span>
+          <span class="detail-value">{{ signature.algorithm }}</span>
+        </div>
+        <div class="detail-row">
+          <span class="detail-label">Key ID</span>
+          <span class="detail-value">{{ signature.keyId }}</span>
+        </div>
+        <div class="detail-row">
+          <span class="detail-label">Signed At</span>
+          <span class="detail-value">{{ signature.signedAt | date:'medium' }}</span>
+        </div>
+        <div class="detail-row">
+          <span class="detail-label">Signed By</span>
+          <span class="detail-value">{{ signature.signedBy }}</span>
+        </div>
+      </div>
+    </section>
+
+    <section class="signature-value">
+      <div class="section-header">
+        <h3>Signature Value</h3>
+        <button type="button" (click)="showFullSignature = !showFullSignature">
+          {{ showFullSignature ? 'Collapse' : 'Show full' }}
+        </button>
+        <button type="button" (click)="onCopySignature()">
+          <i class="pi pi-copy"></i> Copy
+        </button>
+      </div>
+      <code class="signature-text">{{ formatSignature(signature.signature) }}</code>
+    </section>
+
+    <section class="certificate" *ngIf="signature.certificate">
+      <div class="section-header">
+        <h3>Certificate</h3>
+        <button type="button" (click)="showFullCertificate = !showFullCertificate">
+          {{ showFullCertificate ? 'Collapse' : 'Show full' }}
+        </button>
+        <button type="button" (click)="onCopyCertificate()">
+          <i class="pi pi-copy"></i> Copy
+        </button>
+      </div>
+      <pre class="certificate-text" [class.truncated]="!showFullCertificate">{{ signature.certificate }}</pre>
+    </section>
+
+    <section class="verification-status" *ngIf="verificationResult">
+      <h3>Verification Status</h3>
+      <div class="verification-result"
+           [class.valid]="verificationResult.valid"
+           [class.invalid]="!verificationResult.valid">
+        <span class="status-label">{{ verificationResult.valid ? 'Valid Signature' : 'Invalid Signature' }}</span>
+        <span class="status-message">{{ verificationResult.message }}</span>
+      </div>
+    </section>
+  </ng-container>
+</div>
+``` + +### Evidence Content Viewer Component + +```typescript +// evidence-content-viewer.component.ts +import { Component, Input, signal, ChangeDetectionStrategy } from '@angular/core'; +import { CommonModule } from '@angular/common'; + +@Component({ + selector: 'so-evidence-content-viewer', + standalone: true, + imports: [CommonModule], + templateUrl: './evidence-content-viewer.component.html', + styleUrl: './evidence-content-viewer.component.scss', + changeDetection: ChangeDetectionStrategy.OnPush +}) +export class EvidenceContentViewerComponent { + @Input() content: EvidenceContent | null = null; + + viewMode = signal<'formatted' | 'raw'>('formatted'); + + get rawJson(): string { + return JSON.stringify(this.content, null, 2); + } + + onCopy(): void { + navigator.clipboard.writeText(this.rawJson); + } +} +``` + +```html + +
+<!-- evidence-content-viewer.component.html -->
+<div class="content-viewer" *ngIf="content">
+  <div class="viewer-toolbar">
+    <div class="view-toggle">
+      <button type="button" [class.active]="viewMode() === 'formatted'" (click)="viewMode.set('formatted')">Formatted</button>
+      <button type="button" [class.active]="viewMode() === 'raw'" (click)="viewMode.set('raw')">Raw</button>
+    </div>
+    <button type="button" class="copy-button" (click)="onCopy()">
+      <i class="pi pi-copy"></i> Copy JSON
+    </button>
+  </div>
+
+  <ng-container *ngIf="viewMode() === 'formatted'">
+    <section class="metadata">
+      <h3>Metadata</h3>
+      <div class="metadata-grid">
+        <div class="metadata-item" *ngFor="let item of content.metadata | keyvalue">
+          <span class="metadata-key">{{ item.key }}</span>
+          <span class="metadata-value">{{ item.value }}</span>
+        </div>
+      </div>
+    </section>
+
+    <section class="targets">
+      <h3>Targets ({{ content.targets.length }})</h3>
+      <table class="targets-table">
+        <thead>
+          <tr>
+            <th>Name</th>
+            <th>Type</th>
+            <th>Outcome</th>
+            <th>Duration</th>
+          </tr>
+        </thead>
+        <tbody>
+          <tr *ngFor="let target of content.targets">
+            <td>{{ target.name }}</td>
+            <td>{{ target.type }}</td>
+            <td>
+              <span class="outcome-badge" [attr.data-outcome]="target.outcome">{{ target.outcome }}</span>
+            </td>
+            <td>{{ target.duration }}ms</td>
+          </tr>
+        </tbody>
+      </table>
+    </section>
+
+    <section class="approvals">
+      <h3>Approvals ({{ content.approvals.length }})</h3>
+      <div class="approval-item" *ngFor="let approval of content.approvals">
+        <div class="approval-header">
+          <span class="approver">{{ approval.approver }}</span>
+          <span class="action-badge" [attr.data-action]="approval.action">{{ approval.action }}</span>
+        </div>
+        <p class="approval-comment" *ngIf="approval.comment">{{ approval.comment }}</p>
+        <span class="approval-timestamp">{{ approval.timestamp | date:'medium' }}</span>
+      </div>
+    </section>
+  </ng-container>
+
+  <pre class="raw-json" *ngIf="viewMode() === 'raw'">{{ rawJson }}</pre>
+</div>
+``` + +--- + +## Acceptance Criteria + +- [ ] Evidence list displays all packets +- [ ] Filtering by signature status works +- [ ] Filtering by date range works +- [ ] Search finds evidence by release/environment +- [ ] Evidence detail loads correctly +- [ ] Overview tab shows summary +- [ ] Content tab shows formatted/raw view +- [ ] Signature tab shows signature details +- [ ] Verification triggers and shows result +- [ ] Export dialog opens +- [ ] Export to JSON works +- [ ] Export to PDF works +- [ ] Export to CSV works +- [ ] Export to SLSA format works +- [ ] Copy signature/certificate works +- [ ] Download raw evidence works +- [ ] Timeline shows evidence events +- [ ] Unit test coverage >=80% + +--- + +## Dependencies + +| Dependency | Type | Status | +|------------|------|--------| +| 109_002 Evidence Signer | Internal | TODO | +| Angular 17 | External | Available | +| NgRx 17 | External | Available | +| PrimeNG 17 | External | Available | + +--- + +## Delivery Tracker + +| Deliverable | Status | Notes | +|-------------|--------|-------| +| EvidenceListComponent | TODO | | +| EvidenceDetailComponent | TODO | | +| EvidenceVerifierComponent | TODO | | +| EvidenceContentViewerComponent | TODO | | +| ExportDialogComponent | TODO | | +| SignaturePanelComponent | TODO | | +| EvidenceTimelineComponent | TODO | | +| Evidence NgRx Store | TODO | | +| Unit tests | TODO | | + +--- + +## Execution Log + +| Date | Entry | +|------|-------| +| 10-Jan-2026 | Sprint created | diff --git a/docs/key-features.md b/docs/key-features.md index e0121829b..ad0f70c4f 100644 --- a/docs/key-features.md +++ b/docs/key-features.md @@ -1,91 +1,186 @@ -# Key Features – Capability Cards +# Key Features — Capability Cards -> **Core Thesis:** Stella Ops isn't a scanner that outputs findings. It's a platform that outputs **attestable decisions that can be replayed**. That difference survives auditors, regulators, and supply-chain propagation. 
+> **Core Thesis:** Stella Ops Suite isn't a scanner or a deployment tool—it's a **release control plane** that produces **attestable decisions that can be replayed**. Security is a gate, not a blocker. Evidence survives auditors, regulators, and supply-chain propagation. -> **Looking for the complete feature catalog?** See [`full-features-list.md`](full-features-list.md) for the comprehensive list of all platform capabilities, or [`FEATURE_MATRIX.md`](FEATURE_MATRIX.md) for tier-by-tier availability. +> **Looking for the complete feature catalog?** See [`full-features-list.md`](full-features-list.md) for the comprehensive list, or [`FEATURE_MATRIX.md`](FEATURE_MATRIX.md) for tier-by-tier availability. --- ## At a Glance -| What Competitors Do | What Stella Ops Does | -|--------------------|---------------------| -| Output findings | Output decisions with proof chains | +| What Competitors Do | What Stella Ops Suite Does | +|--------------------|---------------------------| +| CI/CD runs pipelines | Central release authority across environments | +| Deployment tools promote | Promotion with integrated security gates | +| Scanners output findings | Security gates output decisions with proof chains | | VEX as suppression file | VEX as logical claim system (K4 lattice) | -| Reachability as badge | Reachability as signed proof | -| "+3 CVEs" reports | "Exploitability dropped 41%" semantic deltas | -| Hide unknowns | Surface and score unknowns | +| Release identity via tags | Release identity via immutable digests | +| Per-seat/per-project pricing | Pay for environments + new digests/day | | Online-first | Offline-first with full parity | --- -Each card below pairs the headline capability with the evidence that backs it and why it matters day to day. +## Release Orchestration (Planned) -## 0. Decision Capsules — Audit-Grade Evidence Bundles +### 0. 
Release Control Plane -**The core moat capability.** Every scan result is sealed in a **Decision Capsule**—a content-addressed bundle containing everything needed to reproduce and verify the vulnerability decision. +**The new core capability.** Stella Ops Suite becomes the central release authority between CI and runtime targets. + +| Capability | What It Does | +|------------|--------------| +| **Environment management** | Define Dev/Stage/Prod with freeze windows and approval policies | +| **Release bundles** | Compose releases from component OCI digests with semantic versioning | +| **Promotion workflows** | DAG-based workflow engine with approvals, gates, and hooks | +| **Security gates** | Scan on build, evaluate on release, re-evaluate on CVE updates | +| **Deployment execution** | Deploy to Docker/Compose/ECS/Nomad via agents or agentless | +| **Evidence packets** | Every release decision is cryptographically signed and stored | + +**Why it matters:** Non-Kubernetes container teams finally get a central release authority with audit-grade evidence—without replacing their existing CI/SCM/registry stack. + +### 1. Digest-First Release Identity + +**Tags are mutable; digests are truth.** A release is an immutable set of OCI digests, resolved at release creation time. + +``` +Release: myapp-v2.3.1 +Components: + api: sha256:abc123... + worker: sha256:def456... + frontend: sha256:789ghi... +``` + +**What this enables:** +- Tamper detection at pull time (digest mismatch = deployment failure) +- Audit trail of exactly what was deployed +- Rollback to known-good digests, not "latest" tags +- "What is deployed where" tracking with integrity + +**Modules (planned):** `ReleaseManager`, `ComponentRegistry`, `VersionManager` + +### 2. Promotion Workflows with Security Gates + +**Security integrated into release flow, not bolted on.** Promotion requests trigger gate evaluation before deployment. 
+ +| Gate Type | What It Checks | +|-----------|---------------| +| **Security gate** | Reachable critical/high vulnerabilities | +| **Approval gate** | Required approval count, separation of duties | +| **Freeze window gate** | Environment freeze windows | +| **Policy gate** | Custom OPA/Rego policies | +| **Previous environment gate** | Release deployed to prior environment | + +**Decision records include:** +- All gate results with pass/fail reasons +- Evidence refs (scan verdicts, approval records) +- Policy hash + inputs hash for replay +- "Why blocked?" explainability + +**Modules (planned):** `PromotionManager`, `ApprovalGateway`, `DecisionEngine` + +### 3. Deployment Execution + +**Deploy to non-Kubernetes targets as first-class citizens.** Agent-based or agentless deployment to Docker hosts, Compose, ECS, Nomad. + +| Target Type | Deployment Method | +|-------------|-------------------| +| **Docker host** | Agent pulls and starts containers | +| **Compose host** | Agent writes `compose.stella.lock.yml` and runs `docker-compose up` | +| **ECS service** | Agent updates task definition and service | +| **Nomad job** | Agent updates job spec and submits | +| **SSH remote** | Agentless via SSH (Linux) | +| **WinRM remote** | Agentless via WinRM (Windows) | + +**Generated artifacts:** +- `compose.stella.lock.yml`: Pinned digests, resolved environment refs +- `stella.version.json`: Version sticker on target for drift detection +- `release.evidence.json`: Decision record + +**Modules (planned):** `DeployOrchestrator`, `Agent.*`, `ArtifactGenerator` + +### 4. Progressive Delivery + +**A/B releases and canary deployments.** Gradual rollout with automatic rollback on health failure. 
+ +| Strategy | Description | +|----------|-------------| +| **Immediate** | 0% → 100% instantly | +| **Canary** | 10% → 25% → 50% → 100% with health checks | +| **Blue-green** | Deploy to B, switch traffic, retire A | +| **Rolling** | 10% at a time with health checks | + +**Traffic routing plugins:** Nginx, HAProxy, Traefik, AWS ALB + +**Modules (planned):** `ABManager`, `TrafficRouter`, `CanaryController` + +### 5. Plugin System (Three-Surface Model) + +**Extensible without core code changes.** Plugins contribute through three surfaces. + +| Surface | What It Does | +|---------|--------------| +| **Manifest** | Declares what the plugin provides (integrations, steps, agents) | +| **Connector runtime** | gRPC interface for runtime operations | +| **Step provider** | Execution characteristics for workflow steps | + +**Plugin types:** +- **Integration connectors:** SCM (GitHub, GitLab), CI (Actions, Jenkins), Registry (Harbor, ECR), Vault (HashiCorp, AWS Secrets) +- **Step providers:** Custom workflow steps +- **Agent types:** New deployment targets +- **Gate providers:** Custom gate evaluations + +**Modules (planned):** `PluginRegistry`, `PluginLoader`, `PluginSandbox`, `PluginSDK` + +--- + +## Security Capabilities (Operational) + +### 6. Decision Capsules — Audit-Grade Evidence Bundles + +**Every scan and release decision is sealed.** A Decision Capsule is a content-addressed bundle containing everything needed to reproduce and verify the decision. 
| Component | What's Included | |-----------|----------------| -| **Inputs** | Exact SBOM, frozen feed snapshots (with Merkle roots), policy version, lattice rules | +| **Inputs** | Exact SBOM, frozen feed snapshots (with Merkle roots), policy version | | **Evidence** | Reachability proofs (static + runtime), VEX statements, binary fingerprints | | **Outputs** | Verdicts, risk scores, remediation paths | | **Signatures** | DSSE envelopes over all of the above | -**Why it matters:** Six months from now, an auditor can run `stella replay srm.yaml --assert-digest ` and get *identical* results. This is what "audit-grade assurance" actually means. +**Why it matters:** Auditors can replay any decision bit-for-bit. This is what "audit-grade assurance" actually means. -**No competitor offers this.** Trivy, Grype, Snyk—none can replay a past scan bit-for-bit because they don't freeze feeds or produce deterministic manifests. +**Modules:** `EvidenceLocker`, `Attestor`, `Replay` -## 1. Delta SBOM Engine +### 7. Lattice Policy + OpenVEX (K4 Logic) -**Performance without sacrificing determinism.** Layer-aware ingestion keeps the SBOM catalog content-addressed; rescans only fetch new layers. +**VEX as a logical claim system, not a suppression file.** The policy engine uses Belnap K4 four-valued logic. -- **Speed:** Warm scans < 1 second; CI/CD pipelines stay fast -- **Determinism:** Replay Manifest (SRM) captures exact analyzer inputs/outputs per layer -- **Evidence:** Binary crosswalk via Build-ID mapping; `bin:{sha256}` fallbacks for stripped binaries +| State | Meaning | +|-------|---------| +| **Unknown (bottom)** | No information | +| **True** | Positive assertion | +| **False** | Negative assertion | +| **Conflict (top)** | Contradictory assertions | -**Modules:** `Scanner`, `SbomService`, `BinaryIndex` +**Why it matters:** When vendor says "not_affected" but runtime shows the function was called, you have a *conflict*—not a false positive. 
---- +**Modules:** `VexLens`, `TrustLatticeEngine`, `Policy` -## 2. Lattice Policy + OpenVEX (K4 Logic) +### 8. Signed Reachability Proofs -**VEX as a logical claim system, not a suppression file.** The policy engine uses **Belnap K4 four-valued logic** (Unknown, True, False, Conflict) to merge SBOM, advisories, VEX, and waivers. +**Proof of exploitability, not just a badge.** Every reachability graph is sealed with DSSE. -| What Competitors Do | What Stella Does | -|--------------------|------------------| -| VEX filters findings (boolean) | VEX is logical claims with trust weighting | -| Conflicts hidden | Conflicts are explicit state (⊤) | -| "Vendor says not_affected" = done | Vendor + runtime + reachability merged; conflicts surfaced | -| Unknown = assume safe | Unknown = first-class state with risk implications | +| Layer | What It Proves | +|-------|---------------| +| **Static** | Call graph shows path from entrypoint → vulnerable function | +| **Binary** | Compiled binary contains the symbol | +| **Runtime** | Process actually executed the code path | -**Why it matters:** When vendor says "not_affected" but your runtime shows the function was called, you have a *conflict*—not a false positive. The lattice preserves this for policy resolution. +**Why it matters:** "Here's the exact call path" vs "potentially reachable." Signed, not claimed. -**Modules:** `VexLens`, `TrustLatticeEngine`, `Excititor` (110+ tests passing) +**Modules:** `ReachGraph`, `PathWitnessBuilder` ---- +### 9. Deterministic Replay -## 3. Sovereign Crypto Profiles - -**Regional compliance without code changes.** FIPS, eIDAS, GOST, SM, and PQC (post-quantum) profiles are configuration toggles, not recompiles. 
- -| Profile | Algorithms | Use Case | -|---------|-----------|----------| -| **FIPS-140-3** | ECDSA P-256, RSA-PSS | US federal requirements | -| **eIDAS** | ETSI TS 119 312 | EU qualified signatures | -| **GOST-2012** | GOST R 34.10-2012 | Russian Federation | -| **SM2** | GM/T 0003.2-2012 | People's Republic of China | -| **PQC** | Dilithium, Falcon | Post-quantum readiness | - -**Why it matters:** Multi-signature DSSE envelopes (sign with FIPS *and* GOST) for cross-jurisdiction compliance. No competitor offers this. - -**Modules:** `Cryptography`, `CryptoProfile`, `RootPack` - ---- - -## 4. Deterministic Replay - -**The audit-grade guarantee.** Every scan produces a DSSE + SRM bundle that can be replayed with `stella replay srm.yaml`. +**The audit-grade guarantee.** Every scan produces a DSSE + SRM bundle that can be replayed. ```bash # Six months later, prove what you knew @@ -93,212 +188,62 @@ stella replay srm.yaml --assert-digest sha256:abc123... # Output: PASS - identical result ``` -**What's frozen:** -- Feed snapshots (NVD, KEV, EPSS, distro advisories) with content hashes -- Analyzer versions and configs -- Policy rules and lattice state -- Random seeds for deterministic ordering +**What's frozen:** Feed snapshots, analyzer versions, policy rules, random seeds. -**Why it matters:** This is what "audit-grade" actually means. Not "we logged it" but "you can re-run it." +**Modules:** `Replay`, `Scanner`, `Policy` + +### 10. Sovereign Crypto Profiles + +**Regional compliance without code changes.** FIPS, eIDAS, GOST, SM, and PQC profiles are configuration toggles. + +| Profile | Use Case | +|---------|----------| +| **FIPS-140-3** | US federal | +| **eIDAS** | EU qualified signatures | +| **GOST-2012** | Russian Federation | +| **SM2** | People's Republic of China | +| **PQC** | Post-quantum readiness | + +**Modules:** `Cryptography`, `CryptoProfile` + +### 11. 
Offline Operations (Air-Gap Parity) + +**Full functionality without network.** Offline Update Kits bundle everything needed. + +| Component | Offline Method | +|-----------|----------------| +| Feed updates | Sealed bundle with Merkle roots | +| Crypto verification | Embedded revocation lists | +| Transparency logging | Local transparency mirror | + +**Modules:** `AirGap.Controller`, `TrustStore` --- -## 5. Offline Operations (Air-Gap Parity) +## Competitive Moats Summary -**Full functionality without network.** Offline Update Kits bundle everything needed for air-gapped operation. +**Six capabilities no competitor offers together:** -| Component | Online | Offline | -|-----------|--------|---------| -| Feed updates | Live | Sealed bundle with Merkle roots | -| Crypto verification | OCSP/CRL | Embedded revocation lists | -| Transparency logging | Rekor | Local transparency mirror | -| Trust roots | Live TSL | RootPack bundles | +| # | Capability | Category | +|---|-----------|----------| +| 1 | **Non-Kubernetes Specialization** | Release orchestration | +| 2 | **Digest-First Release Identity** | Release orchestration | +| 3 | **Security Gates in Promotion Flow** | Release orchestration | +| 4 | **Signed Reachability Proofs** | Security | +| 5 | **Deterministic Replay** | Security | +| 6 | **Sovereign + Offline Operation** | Operations | -**Why it matters:** Air-gapped environments get *identical* results to connected, not degraded. Competitors offer partial offline (cached feeds) but not epistemic parity (sealed, reproducible knowledge state). - -**Modules:** `AirGap.Controller`, `TrustStore`, `EgressPolicy` - ---- - -## 6. Signed Reachability Proofs - -**Proof of exploitability, not just a badge.** Every reachability graph is sealed with DSSE; optional edge-bundle attestations for contested paths. 
- -| Layer | What It Proves | Attestation | -|-------|---------------|-------------| -| **Static** | Call graph says function is reachable | Graph-level DSSE | -| **Binary** | Compiled binary contains the symbol | Build-ID mapping | -| **Runtime** | Process actually executed the code path | Edge-bundle DSSE (optional) | - -**Why it matters:** Not "potentially reachable" but "here's the exact call path from `main()` to `vulnerable_function()`." You can quarantine or dispute individual edges, not just all-or-nothing. - -**No competitor signs reachability graphs.** They claim reachability; we *prove* it. - -**Modules:** `ReachGraph`, `PathWitnessBuilder`, `CompositeGateDetector` - ---- - -## 7. Semantic Smart-Diff - -**Diff security meaning, not CVE counts.** Compare reachability graphs, policy outcomes, and trust weights between releases. - -``` -Before: 5 critical CVEs (3 reachable) -After: 7 critical CVEs (1 reachable) - -Smart-Diff output: "Exploitability DECREASED by 67% despite +2 CVEs" -``` - -**What's compared:** -- Reachability graph deltas -- VEX state changes -- Policy outcome changes -- Trust weight shifts - -**Why it matters:** "+3 CVEs" tells you nothing. "Reachable attack surface dropped by half" tells you everything. - -**Modules:** `MaterialRiskChangeDetector`, `RiskStateSnapshot`, `Scanner.ReachabilityDrift` - ---- - -## 8. Unknowns as First-Class State - -**Uncertainty is risk—we surface and score it.** Explicit modeling of what we *don't* know, with policy implications. - -| Band | Meaning | Policy Action | -|------|---------|---------------| -| **HOT** | High uncertainty + exploit pressure | Immediate investigation | -| **WARM** | Moderate uncertainty | Scheduled review | -| **COLD** | Low uncertainty | Decay toward resolution | -| **RESOLVED** | Uncertainty eliminated | No action | - -**Why it matters:** Competitors hide unknowns (assume safe). 
We track them with decay algorithms, blast-radius containment, and policy budgets ("fail if unknowns > N"). - -**Modules:** `UnknownStateLedger`, `Policy`, `Signals` - ---- - -## 9. Three-Layer Reachability Proofs - -**Structural false positive elimination.** All three layers must align for exploitability to be confirmed. - -``` -Layer 1 (Static): Call graph shows path from entrypoint → vulnerable function -Layer 2 (Binary): Compiled binary contains the symbol with matching offset -Layer 3 (Runtime): eBPF probe confirms function was actually executed -``` - -**Confidence tiers:** -- **Confirmed** — All three layers agree -- **Likely** — Static + binary agree; no runtime data -- **Present** — Package present; no reachability evidence -- **Unreachable** — Static analysis proves no path exists - -**Why it matters:** False positives become *structurally impossible*, not heuristically reduced. - -**Modules:** `Scanner.VulnSurfaces`, `PathWitnessBuilder` - ---- - -## 10. Competitive Moats Summary - -**Four capabilities no competitor offers together:** - -| # | Capability | Why It's Hard to Copy | -|---|-----------|----------------------| -| 1 | **Signed Reachability** | Requires three-layer instrumentation + cryptographic binding | -| 2 | **Deterministic Replay** | Requires content-addressed evidence + feed snapshotting | -| 3 | **K4 Lattice VEX** | Requires rethinking VEX from suppression to claims | -| 4 | **Sovereign Offline** | Requires pluggable crypto + offline trust roots | - -**Reference:** `docs/product/competitive-landscape.md`, `docs/product/moat-strategy-summary.md` - ---- - -## 11. Trust Algebra Engine (K4 Lattice) - -**Formal conflict resolution, not naive precedence.** The lattice engine uses Belnap K4 four-valued logic to aggregate heterogeneous security assertions. 
- -| State | Meaning | Example | -|-------|---------|---------| -| **Unknown (⊥)** | No information | New package, no VEX yet | -| **True (T)** | Positive assertion | "This CVE affects this package" | -| **False (F)** | Negative assertion | "This CVE does not affect this package" | -| **Conflict (⊤)** | Contradictory assertions | Vendor says not_affected; runtime says called | - -**Security Atoms (six orthogonal propositions):** -- PRESENT, APPLIES, REACHABLE, MITIGATED, FIXED, MISATTRIBUTED - -**Why it matters:** Unlike naive precedence (vendor > distro > scanner), we: -- Preserve conflicts as explicit state, not hidden -- Track critical unknowns separately from ancillary ones -- Produce deterministic, explainable dispositions - -**Modules:** `TrustLatticeEngine`, `Policy` (110+ tests passing) - ---- - -## 12. Deterministic Task Packs - -**Auditable automation.** TaskRunner executes declarative Task Packs with plan-hash binding, approvals, and DSSE evidence bundles. - -- **Plan-hash binding:** Task pack execution is tied to specific plan versions -- **Approval gates:** Human sign-off required before execution -- **Sealed mode:** Air-gap compatible execution -- **Evidence bundles:** DSSE-signed results for audit trails - -**Why it matters:** Same workflows online or offline, with provable provenance. - -**Reference:** `docs/modules/packs-registry/guides/spec.md`, `docs/modules/taskrunner/architecture.md` - ---- - -## 13. Evidence-Grade Testing - -**Determinism as a continuous guarantee.** CI lanes that make reproducibility continuously provable. - -| Test Type | What It Proves | -|----------|---------------| -| **Determinism tests** | Same inputs → same outputs | -| **Offline parity tests** | Air-gapped = connected results | -| **Contract stability tests** | APIs don't break | -| **Golden fixture tests** | Historical scans still replay | - -**Why it matters:** Regression-proof audits. Evidence, not assumptions, drives releases. 
- -**Reference:** `docs/technical/testing/testing-strategy-models.md`, `docs/TEST_SUITE_OVERVIEW.md` +**Pricing moat:** No per-seat, per-project, or per-deployment tax. Limits are environments + new digests/day. --- ## Quick Reference -### Key Commands - -```bash -# Determinism proof -stella scan --image --srm-out a.yaml -stella scan --image --srm-out b.yaml -diff a.yaml b.yaml # Identical - -# Replay proof -stella replay srm.yaml --assert-digest - -# Reachability proof -stella graph show --cve CVE-XXXX-YYYY --artifact - -# VEX evaluation -stella vex evaluate --artifact - -# Offline scan -stella rootpack import bundle.tar.gz -stella scan --offline --image -``` - ### Key Documents -- **Competitive Landscape**: `docs/product/competitive-landscape.md` -- **Moat Strategy**: `docs/product/moat-strategy-summary.md` -- **Proof Architecture**: `docs/modules/platform/proof-driven-moats-architecture.md` -- **Vision**: `docs/VISION.md` -- **Architecture Overview**: `docs/ARCHITECTURE_OVERVIEW.md` -- **Quickstart**: `docs/quickstart.md` +- **Product Vision**: [`docs/product/VISION.md`](product/VISION.md) +- **Architecture Overview**: [`docs/ARCHITECTURE_OVERVIEW.md`](ARCHITECTURE_OVERVIEW.md) +- **Release Orchestrator Architecture**: [`docs/modules/release-orchestrator/architecture.md`](modules/release-orchestrator/architecture.md) +- **Competitive Landscape**: [`docs/product/competitive-landscape.md`](product/competitive-landscape.md) +- **Quickstart**: [`docs/quickstart.md`](quickstart.md) +- **Feature Matrix**: [`docs/FEATURE_MATRIX.md`](FEATURE_MATRIX.md) diff --git a/docs/modules/release-orchestrator/README.md b/docs/modules/release-orchestrator/README.md new file mode 100644 index 000000000..7761711fb --- /dev/null +++ b/docs/modules/release-orchestrator/README.md @@ -0,0 +1,137 @@ +# Release Orchestrator + +> Central release control plane for non-Kubernetes container estates. 
+ +**Status:** Planned (not yet implemented) +**Source:** [Full Architecture Specification](../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md) + +## Purpose + +The Release Orchestrator extends Stella Ops from a vulnerability scanning platform into **Stella Ops Suite** — a unified release control plane for non-Kubernetes container environments. It integrates: + +- **Existing capabilities**: SBOM generation, reachability-aware vulnerability analysis, VEX support, policy engine, evidence locker, deterministic replay +- **New capabilities**: Environment management, release orchestration, promotion workflows, deployment execution, progressive delivery, audit-grade release governance + +## Scope + +| In Scope | Out of Scope | +|----------|--------------| +| Non-K8s container deployments (Docker, Compose, ECS, Nomad) | Kubernetes deployments (use ArgoCD, Flux) | +| Release identity via OCI digests | Tag-based release identity | +| Plugin-extensible integrations | Hard-coded vendor integrations | +| SSH/WinRM + agent-based deployment | Cloud-native serverless deployments | +| L4/L7 traffic management via router plugins | Built-in service mesh | + +## Documentation Structure + +### Design & Principles +- [Design Principles](design/principles.md) — Core principles and invariants +- [Key Decisions](design/decisions.md) — Architectural decision record + +### Implementation +- [Implementation Guide](implementation-guide.md) — .NET 10 patterns and best practices +- [Test Structure](test-structure.md) — Test organization and guidelines + +### Module Architecture +- [Module Overview](modules/overview.md) — All modules and themes +- [Integration Hub (INTHUB)](modules/integration-hub.md) — External integrations +- [Environment Manager (ENVMGR)](modules/environment-manager.md) — Environments and targets +- [Release Manager (RELMAN)](modules/release-manager.md) — Release bundles and versions +- [Workflow Engine 
(WORKFL)](modules/workflow-engine.md) — DAG execution +- [Promotion Manager (PROMOT)](modules/promotion-manager.md) — Approvals and gates +- [Deploy Orchestrator (DEPLOY)](modules/deploy-orchestrator.md) — Deployment execution +- [Agents (AGENTS)](modules/agents.md) — Deployment agents +- [Progressive Delivery (PROGDL)](modules/progressive-delivery.md) — A/B and canary +- [Release Evidence (RELEVI)](modules/evidence.md) — Evidence packets +- [Plugin System (PLUGIN)](modules/plugin-system.md) — Plugin infrastructure + +### Data Model +- [Database Schema](data-model/schema.md) — PostgreSQL schema specification +- [Entity Definitions](data-model/entities.md) — Entity descriptions + +### API Specification +- [API Overview](api/overview.md) — API design principles +- [Environment APIs](api/environments.md) — Environment endpoints +- [Release APIs](api/releases.md) — Release endpoints +- [Promotion APIs](api/promotions.md) — Promotion endpoints +- [Workflow APIs](api/workflows.md) — Workflow endpoints +- [Agent APIs](api/agents.md) — Agent endpoints +- [WebSocket APIs](api/websockets.md) — Real-time endpoints + +### Workflow Engine +- [Template Structure](workflow/templates.md) — Workflow template specification +- [Execution State Machine](workflow/execution.md) — Workflow state machine +- [Promotion State Machine](workflow/promotion.md) — Promotion state machine + +### Security +- [Security Overview](security/overview.md) — Security principles +- [Authentication & Authorization](security/auth.md) — AuthN/AuthZ +- [Agent Security](security/agent-security.md) — Agent security model +- [Threat Model](security/threat-model.md) — Threats and mitigations +- [Audit Trail](security/audit-trail.md) — Audit logging + +### Integrations +- [Integration Overview](integrations/overview.md) — Integration types +- [Connector Interface](integrations/connectors.md) — Connector specification +- [Webhook Architecture](integrations/webhooks.md) — Webhook handling +- [CI/CD 
Patterns](integrations/ci-cd.md) — CI/CD integration patterns + +### Deployment +- [Deployment Overview](deployment/overview.md) — Architecture overview +- [Deployment Strategies](deployment/strategies.md) — Deployment strategies +- [Agent-Based Deployment](deployment/agent-based.md) — Agent deployment +- [Agentless Deployment](deployment/agentless.md) — SSH/WinRM deployment +- [Artifact Generation](deployment/artifacts.md) — Generated artifacts + +### Progressive Delivery +- [Progressive Overview](progressive-delivery/overview.md) — Progressive delivery architecture +- [A/B Releases](progressive-delivery/ab-releases.md) — A/B release models +- [Canary Controller](progressive-delivery/canary.md) — Canary implementation +- [Router Plugins](progressive-delivery/routers.md) — Traffic routing plugins + +### UI/UX +- [Dashboard Specification](ui/dashboard.md) — Dashboard screens +- [Workflow Editor](ui/workflow-editor.md) — Workflow editor +- [Screen Reference](ui/screens.md) — Key UI screens + +### Operations +- [Metrics](operations/metrics.md) — Metrics specification +- [Logging](operations/logging.md) — Logging patterns +- [Tracing](operations/tracing.md) — Distributed tracing +- [Alerting](operations/alerting.md) — Alert rules + +### Implementation +- [Roadmap](roadmap.md) — Implementation phases +- [Resource Requirements](roadmap.md#resource-requirements) — Sizing + +### Appendices +- [Glossary](appendices/glossary.md) — Term definitions +- [Configuration Reference](appendices/config.md) — Configuration options +- [Error Codes](appendices/errors.md) — API error codes +- [Evidence Schema](appendices/evidence-schema.md) — Evidence packet format + +## Quick Reference + +### Key Principles + +1. **Digest-first release identity** — Releases are immutable OCI digests, not tags +2. **Evidence for every decision** — Every promotion/deployment produces sealed evidence +3. **Pluggable everything, stable core** — Integrations are plugins; core is stable +4. 
**No feature gating** — All plans include all features +5. **Offline-first operation** — Core works in air-gapped environments +6. **Immutable generated artifacts** — Every deployment generates stored artifacts + +### Platform Themes + +| Theme | Purpose | +|-------|---------| +| **INTHUB** | Integration hub — external system connections | +| **ENVMGR** | Environment management — environments, targets, agents | +| **RELMAN** | Release management — components, versions, releases | +| **WORKFL** | Workflow engine — DAG execution, steps | +| **PROMOT** | Promotion — approvals, gates, decisions | +| **DEPLOY** | Deployment — execution, artifacts, rollback | +| **AGENTS** | Agents — Docker, Compose, ECS, Nomad | +| **PROGDL** | Progressive delivery — A/B, canary | +| **RELEVI** | Evidence — packets, stickers, audit | +| **PLUGIN** | Plugins — registry, loader, SDK | diff --git a/docs/modules/release-orchestrator/api/agents.md b/docs/modules/release-orchestrator/api/agents.md new file mode 100644 index 000000000..2dfc44134 --- /dev/null +++ b/docs/modules/release-orchestrator/api/agents.md @@ -0,0 +1,274 @@ +# Agent APIs + +> API endpoints for agent registration, lifecycle management, and task coordination. + +**Status:** Planned (not yet implemented) +**Source:** [Architecture Advisory Section 6.3.2](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md) +**Related Modules:** [Agents Module](../modules/agents.md), [Agent Security](../security/agent-security.md) + +## Overview + +The Agent API provides endpoints for registering deployment agents, managing their lifecycle, and coordinating task execution. Agents use mTLS for secure communication after initial registration. + +--- + +## Registration Endpoints + +### Register Agent + +**Endpoint:** `POST /api/v1/agents/register` + +Registers a new agent with the orchestrator. Requires a one-time registration token. 
+ +**Headers:** +``` +X-Agent-Token: {registration-token} +``` + +**Request:** +```json +{ + "name": "agent-prod-01", + "version": "1.0.0", + "capabilities": ["docker", "compose"], + "labels": { + "datacenter": "us-east-1", + "role": "deployment" + } +} +``` + +**Response:** `201 Created` +```json +{ + "agentId": "uuid", + "token": "jwt-token-for-subsequent-requests", + "config": { + "heartbeatInterval": 30, + "taskPollInterval": 5, + "logLevel": "info" + }, + "certificate": { + "cert": "-----BEGIN CERTIFICATE-----...", + "key": "-----BEGIN PRIVATE KEY-----...", + "ca": "-----BEGIN CERTIFICATE-----...", + "expiresAt": "2026-01-11T14:23:45Z" + } +} +``` + +**Notes:** +- Registration token is single-use and expires after 24 hours +- After registration, agent must use mTLS for all subsequent requests +- Certificate is short-lived (24h) and must be renewed via heartbeat + +--- + +## Lifecycle Endpoints + +### List Agents + +**Endpoint:** `GET /api/v1/agents` + +**Query Parameters:** +- `status` (string): Filter by status (`online`, `offline`, `degraded`) +- `capability` (string): Filter by capability (`docker`, `compose`, `ssh`, `winrm`, `ecs`, `nomad`) + +**Response:** `200 OK` +```json +[ + { + "id": "uuid", + "name": "agent-prod-01", + "version": "1.0.0", + "status": "online", + "capabilities": ["docker", "compose"], + "lastHeartbeat": "2026-01-10T14:23:45Z", + "resourceUsage": { + "cpu": 15.5, + "memory": 45.2 + } + } +] +``` + +### Get Agent + +**Endpoint:** `GET /api/v1/agents/{id}` + +**Response:** `200 OK` - Full agent details including assigned targets + +### Update Agent + +**Endpoint:** `PUT /api/v1/agents/{id}` + +**Request:** +```json +{ + "labels": { + "datacenter": "us-west-2" + }, + "capabilities": ["docker", "compose", "ssh"] +} +``` + +**Response:** `200 OK` - Updated agent + +### Delete Agent + +**Endpoint:** `DELETE /api/v1/agents/{id}` + +Revokes agent credentials and removes registration. 
+ +**Response:** `200 OK` +```json +{ "deleted": true } +``` + +--- + +## Heartbeat Endpoints + +### Send Heartbeat + +**Endpoint:** `POST /api/v1/agents/{id}/heartbeat` + +Agents must send heartbeats at the configured interval to maintain online status and receive pending tasks. + +**Request:** +```json +{ + "status": "healthy", + "resourceUsage": { + "cpu": 15.5, + "memory": 45.2, + "disk": 60.0 + }, + "capabilities": ["docker", "compose"], + "runningTasks": 2 +} +``` + +**Response:** `200 OK` +```json +{ + "tasks": [ + { + "taskId": "uuid", + "taskType": "docker.pull", + "payload": { + "image": "myapp", + "tag": "v2.3.1", + "digest": "sha256:abc123..." + }, + "credentials": { + "registry.username": "user", + "registry.password": "token" + }, + "timeout": 300 + } + ], + "certificateRenewal": { + "cert": "-----BEGIN CERTIFICATE-----...", + "expiresAt": "2026-01-11T14:23:45Z" + } +} +``` + +**Notes:** +- Certificate renewal is included when current certificate is within 1 hour of expiration +- Tasks array contains pending work for the agent +- Missing heartbeats for 3 intervals marks agent as `offline` + +--- + +## Task Endpoints + +### Complete Task + +**Endpoint:** `POST /api/v1/agents/{id}/tasks/{taskId}/complete` + +Reports task completion status back to the orchestrator. + +**Request:** +```json +{ + "success": true, + "result": { + "imageId": "sha256:abc123...", + "containerId": "container-uuid" + }, + "logs": [ + { "timestamp": "2026-01-10T14:23:45Z", "level": "info", "message": "Pulling image..." }, + { "timestamp": "2026-01-10T14:23:50Z", "level": "info", "message": "Image pulled successfully" } + ] +} +``` + +**Response:** `200 OK` +```json +{ "acknowledged": true } +``` + +### Get Pending Tasks + +**Endpoint:** `GET /api/v1/agents/{id}/tasks` + +Alternative to heartbeat for polling pending tasks. 
+ +**Response:** `200 OK` +```json +{ + "tasks": [ + { + "taskId": "uuid", + "taskType": "docker.run", + "priority": 10, + "createdAt": "2026-01-10T14:20:00Z" + } + ] +} +``` + +--- + +## WebSocket Endpoints + +### Task Stream + +**Endpoint:** `WS /api/v1/agents/{id}/task-stream` + +Real-time task assignment stream for agents. + +**Messages (Server to Agent):** +```json +{ "type": "task_assigned", "task": { "taskId": "uuid", "taskType": "docker.pull", ... } } +{ "type": "task_cancelled", "taskId": "uuid" } +``` + +**Messages (Agent to Server):** +```json +{ "type": "task_progress", "taskId": "uuid", "progress": 50, "message": "Pulling layer 3/5" } +{ "type": "task_log", "taskId": "uuid", "level": "info", "message": "..." } +``` + +--- + +## Error Responses + +| Status Code | Description | +|-------------|-------------| +| `401` | Invalid or expired registration token | +| `403` | Agent not authorized for this operation | +| `404` | Agent not found | +| `409` | Agent name already registered | +| `503` | Agent offline or unreachable | + +--- + +## See Also + +- [Environments API](environments.md) +- [Agents Module](../modules/agents.md) +- [Agent Security](../security/agent-security.md) +- [WebSocket APIs](websockets.md) diff --git a/docs/modules/release-orchestrator/api/environments.md b/docs/modules/release-orchestrator/api/environments.md new file mode 100644 index 000000000..8a5cb1f67 --- /dev/null +++ b/docs/modules/release-orchestrator/api/environments.md @@ -0,0 +1,289 @@ +# Environment Management APIs + +> API endpoints for managing environments, targets, agents, freeze windows, and inventory. 
+ +**Status:** Planned (not yet implemented) +**Source:** [Architecture Advisory Section 6.3.2](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md) +**Related Modules:** [Environment Manager](../modules/environment-manager.md), [Agents](../modules/agents.md) + +## Overview + +The Environment Management API provides CRUD operations for environments, target groups, deployment targets, agents, freeze windows, and inventory synchronization. All endpoints require authentication and respect tenant isolation via Row-Level Security. + +--- + +## Environment Endpoints + +### Create Environment + +**Endpoint:** `POST /api/v1/environments` + +**Request:** +```json +{ + "name": "production", + "displayName": "Production", + "orderIndex": 3, + "config": { + "deploymentTimeout": 600, + "healthCheckInterval": 30 + }, + "requiredApprovals": 2, + "requireSod": true, + "promotionPolicy": "default" +} +``` + +**Response:** `201 Created` +```json +{ + "id": "uuid", + "name": "production", + "displayName": "Production", + "orderIndex": 3, + "isProduction": true, + "requiredApprovals": 2, + "requireSeparationOfDuties": true, + "createdAt": "2026-01-10T14:23:45Z" +} +``` + +### List Environments + +**Endpoint:** `GET /api/v1/environments` + +**Query Parameters:** +- `includeState` (boolean): Include current release state + +**Response:** `200 OK` +```json +[ + { + "id": "uuid", + "name": "development", + "displayName": "Development", + "orderIndex": 1, + "currentRelease": { + "id": "release-uuid", + "name": "myapp-v2.3.1", + "deployedAt": "2026-01-09T10:00:00Z" + } + } +] +``` + +### Get Environment + +**Endpoint:** `GET /api/v1/environments/{id}` + +**Response:** `200 OK` - Full environment details + +### Update Environment + +**Endpoint:** `PUT /api/v1/environments/{id}` + +**Request:** Partial environment object + +**Response:** `200 OK` - Updated environment + +### Delete Environment + +**Endpoint:** `DELETE /api/v1/environments/{id}` + 
+**Response:** `200 OK` +```json +{ "deleted": true } +``` + +--- + +## Freeze Window Endpoints + +### Create Freeze Window + +**Endpoint:** `POST /api/v1/environments/{envId}/freeze-windows` + +**Request:** +```json +{ + "start": "2026-01-15T00:00:00Z", + "end": "2026-01-20T00:00:00Z", + "reason": "Holiday freeze", + "exceptions": ["user-uuid-1", "user-uuid-2"] +} +``` + +**Response:** `201 Created` +```json +{ + "id": "uuid", + "environmentId": "env-uuid", + "start": "2026-01-15T00:00:00Z", + "end": "2026-01-20T00:00:00Z", + "reason": "Holiday freeze", + "createdBy": "user-uuid" +} +``` + +### List Freeze Windows + +**Endpoint:** `GET /api/v1/environments/{envId}/freeze-windows` + +**Query Parameters:** +- `active` (boolean): Filter to active freeze windows only + +**Response:** `200 OK` - Array of freeze windows + +### Delete Freeze Window + +**Endpoint:** `DELETE /api/v1/environments/{envId}/freeze-windows/{windowId}` + +**Response:** `200 OK` +```json +{ "deleted": true } +``` + +--- + +## Target Group Endpoints + +### Create Target Group + +**Endpoint:** `POST /api/v1/environments/{envId}/target-groups` + +### List Target Groups + +**Endpoint:** `GET /api/v1/environments/{envId}/target-groups` + +### Get Target Group + +**Endpoint:** `GET /api/v1/target-groups/{id}` + +### Update Target Group + +**Endpoint:** `PUT /api/v1/target-groups/{id}` + +### Delete Target Group + +**Endpoint:** `DELETE /api/v1/target-groups/{id}` + +--- + +## Target Endpoints + +### Create Target + +**Endpoint:** `POST /api/v1/targets` + +**Request:** +```json +{ + "environmentId": "env-uuid", + "targetGroupId": "group-uuid", + "name": "prod-web-01", + "targetType": "docker_host", + "connection": { + "host": "192.168.1.100", + "port": 2375, + "tlsEnabled": true + }, + "labels": { + "role": "web", + "datacenter": "us-east-1" + }, + "deploymentDirectory": "/opt/deployments" +} +``` + +**Response:** `201 Created` +```json +{ + "id": "uuid", + "name": "prod-web-01", + "targetType": 
"docker_host", + "healthStatus": "unknown", + "createdAt": "2026-01-10T14:23:45Z" +} +``` + +### List Targets + +**Endpoint:** `GET /api/v1/targets` + +**Query Parameters:** +- `environmentId` (UUID): Filter by environment +- `targetType` (string): Filter by type (`docker_host`, `compose_host`, `ecs_service`, `nomad_job`) +- `labels` (JSON): Filter by labels +- `healthStatus` (string): Filter by health status + +**Response:** `200 OK` - Array of targets + +### Get Target + +**Endpoint:** `GET /api/v1/targets/{id}` + +### Update Target + +**Endpoint:** `PUT /api/v1/targets/{id}` + +### Delete Target + +**Endpoint:** `DELETE /api/v1/targets/{id}` + +### Trigger Health Check + +**Endpoint:** `POST /api/v1/targets/{id}/health-check` + +**Response:** `200 OK` +```json +{ + "status": "healthy", + "message": "Docker daemon responding", + "checkedAt": "2026-01-10T14:23:45Z" +} +``` + +### Get Version Sticker + +**Endpoint:** `GET /api/v1/targets/{id}/sticker` + +**Response:** `200 OK` +```json +{ + "releaseId": "uuid", + "releaseName": "myapp-v2.3.1", + "components": [ + { + "componentId": "uuid", + "componentName": "api", + "digest": "sha256:abc123..." + } + ], + "deployedAt": "2026-01-09T10:00:00Z", + "deployedBy": "user-uuid" +} +``` + +### Check Drift + +**Endpoint:** `GET /api/v1/targets/{id}/drift` + +**Response:** `200 OK` +```json +{ + "hasDrift": true, + "expected": { "releaseId": "uuid", "digest": "sha256:abc..." }, + "actual": { "digest": "sha256:def..." }, + "differences": [ + { "component": "api", "expected": "sha256:abc...", "actual": "sha256:def..." 
} + ] +} +``` + +--- + +## See Also + +- [Agents API](agents.md) +- [Environment Manager Module](../modules/environment-manager.md) +- [Agent Security](../security/agent-security.md) diff --git a/docs/modules/release-orchestrator/api/overview.md b/docs/modules/release-orchestrator/api/overview.md new file mode 100644 index 000000000..4bb8f857b --- /dev/null +++ b/docs/modules/release-orchestrator/api/overview.md @@ -0,0 +1,299 @@ +# API Overview + +**Version**: v1 +**Base Path**: `/api/v1` + +## Design Principles + +| Principle | Implementation | +|-----------|----------------| +| **RESTful** | Resource-oriented URLs, standard HTTP methods | +| **Versioned** | `/api/v1/...` prefix; breaking changes require version bump | +| **Consistent** | Standard response envelope, error format, pagination | +| **Authenticated** | OAuth 2.0 Bearer tokens via Authority module | +| **Tenant-scoped** | Tenant ID from token; all operations scoped to tenant | +| **Audited** | All mutating operations logged with user/timestamp | + +## Authentication + +All API requests require a valid JWT Bearer token: + +```http +Authorization: Bearer <token> +``` + +Tokens are issued by the Authority module and contain: +- `user_id`: User identifier +- `tenant_id`: Tenant scope +- `roles`: User roles +- `permissions`: Specific permissions + +## Standard Response Envelope + +### Success Response + +```typescript +interface ApiResponse<T> { + success: true; + data: T; + meta?: { + pagination?: PaginationMeta; + requestId: string; + timestamp: string; + }; +} +``` + +### Error Response + +```typescript +interface ApiErrorResponse { + success: false; + error: { + code: string; // e.g., "PROMOTION_BLOCKED" + message: string; // Human-readable message + details?: object; // Additional context + validationErrors?: ValidationError[]; + }; + meta: { + requestId: string; + timestamp: string; + }; +} + +interface ValidationError { + field: string; + message: string; + code: string; +} +``` + +### Pagination + 
+```typescript +interface PaginationMeta { + page: number; + pageSize: number; + totalItems: number; + totalPages: number; + hasNext: boolean; + hasPrevious: boolean; +} +``` + +## HTTP Status Codes + +| Code | Description | +|------|-------------| +| `200` | Success | +| `201` | Created | +| `204` | No Content | +| `400` | Bad Request - validation error | +| `401` | Unauthorized - invalid/missing token | +| `403` | Forbidden - insufficient permissions | +| `404` | Not Found | +| `409` | Conflict - resource state conflict | +| `422` | Unprocessable Entity - business rule violation | +| `429` | Too Many Requests - rate limited | +| `500` | Internal Server Error | + +## Common Query Parameters + +| Parameter | Type | Description | +|-----------|------|-------------| +| `page` | integer | Page number (1-indexed) | +| `pageSize` | integer | Items per page (max 100) | +| `sort` | string | Sort field (prefix `-` for descending) | +| `filter` | string | JSON filter expression | + +## API Modules + +### Integration Hub (INTHUB) + +``` +GET /api/v1/integration-types +GET /api/v1/integration-types/{typeId} +POST /api/v1/integrations +GET /api/v1/integrations +GET /api/v1/integrations/{id} +PUT /api/v1/integrations/{id} +DELETE /api/v1/integrations/{id} +POST /api/v1/integrations/{id}/test +POST /api/v1/integrations/{id}/discover +GET /api/v1/integrations/{id}/health +``` + +### Environment & Inventory (ENVMGR) + +``` +POST /api/v1/environments +GET /api/v1/environments +GET /api/v1/environments/{id} +PUT /api/v1/environments/{id} +DELETE /api/v1/environments/{id} +POST /api/v1/environments/{envId}/freeze-windows +GET /api/v1/environments/{envId}/freeze-windows +DELETE /api/v1/environments/{envId}/freeze-windows/{windowId} +POST /api/v1/targets +GET /api/v1/targets +GET /api/v1/targets/{id} +PUT /api/v1/targets/{id} +DELETE /api/v1/targets/{id} +POST /api/v1/targets/{id}/health-check +GET /api/v1/targets/{id}/sticker +GET /api/v1/targets/{id}/drift +POST 
/api/v1/agents/register +GET /api/v1/agents +GET /api/v1/agents/{id} +PUT /api/v1/agents/{id} +DELETE /api/v1/agents/{id} +POST /api/v1/agents/{id}/heartbeat +``` + +### Release Management (RELMAN) + +``` +POST /api/v1/components +GET /api/v1/components +GET /api/v1/components/{id} +PUT /api/v1/components/{id} +DELETE /api/v1/components/{id} +POST /api/v1/components/{id}/sync-versions +GET /api/v1/components/{id}/versions +POST /api/v1/releases +GET /api/v1/releases +GET /api/v1/releases/{id} +PUT /api/v1/releases/{id} +DELETE /api/v1/releases/{id} +GET /api/v1/releases/{id}/state +POST /api/v1/releases/{id}/deprecate +GET /api/v1/releases/{id}/compare/{otherId} +POST /api/v1/releases/from-latest +``` + +### Workflow Engine (WORKFL) + +``` +POST /api/v1/workflow-templates +GET /api/v1/workflow-templates +GET /api/v1/workflow-templates/{id} +PUT /api/v1/workflow-templates/{id} +DELETE /api/v1/workflow-templates/{id} +POST /api/v1/workflow-templates/{id}/validate +GET /api/v1/step-types +GET /api/v1/step-types/{type} +POST /api/v1/workflow-runs +GET /api/v1/workflow-runs +GET /api/v1/workflow-runs/{id} +POST /api/v1/workflow-runs/{id}/pause +POST /api/v1/workflow-runs/{id}/resume +POST /api/v1/workflow-runs/{id}/cancel +GET /api/v1/workflow-runs/{id}/steps +GET /api/v1/workflow-runs/{id}/steps/{nodeId} +GET /api/v1/workflow-runs/{id}/steps/{nodeId}/logs +GET /api/v1/workflow-runs/{id}/steps/{nodeId}/artifacts +``` + +### Promotion & Approval (PROMOT) + +``` +POST /api/v1/promotions +GET /api/v1/promotions +GET /api/v1/promotions/{id} +POST /api/v1/promotions/{id}/approve +POST /api/v1/promotions/{id}/reject +POST /api/v1/promotions/{id}/cancel +GET /api/v1/promotions/{id}/decision +GET /api/v1/promotions/{id}/approvals +GET /api/v1/promotions/{id}/evidence +POST /api/v1/promotions/preview-gates +POST /api/v1/approval-policies +GET /api/v1/approval-policies +GET /api/v1/my/pending-approvals +``` + +### Deployment (DEPLOY) + +``` +GET /api/v1/deployment-jobs +GET 
/api/v1/deployment-jobs/{id} +GET /api/v1/deployment-jobs/{id}/tasks +GET /api/v1/deployment-jobs/{id}/tasks/{taskId} +GET /api/v1/deployment-jobs/{id}/tasks/{taskId}/logs +GET /api/v1/deployment-jobs/{id}/artifacts +GET /api/v1/deployment-jobs/{id}/artifacts/{artifactId} +POST /api/v1/rollbacks +GET /api/v1/rollbacks +``` + +### Progressive Delivery (PROGDL) + +``` +POST /api/v1/ab-releases +GET /api/v1/ab-releases +GET /api/v1/ab-releases/{id} +POST /api/v1/ab-releases/{id}/start +POST /api/v1/ab-releases/{id}/advance +POST /api/v1/ab-releases/{id}/promote +POST /api/v1/ab-releases/{id}/rollback +GET /api/v1/ab-releases/{id}/traffic +GET /api/v1/ab-releases/{id}/health +GET /api/v1/rollout-strategies +``` + +### Release Evidence (RELEVI) + +``` +GET /api/v1/evidence-packets +GET /api/v1/evidence-packets/{id} +GET /api/v1/evidence-packets/{id}/download +POST /api/v1/audit-reports +GET /api/v1/audit-reports/{id} +GET /api/v1/audit-reports/{id}/download +GET /api/v1/version-stickers +GET /api/v1/version-stickers/{id} +``` + +### Plugin Infrastructure (PLUGIN) + +``` +GET /api/v1/plugins +GET /api/v1/plugins/{id} +POST /api/v1/plugins/{id}/enable +POST /api/v1/plugins/{id}/disable +GET /api/v1/plugins/{id}/health +POST /api/v1/plugin-instances +GET /api/v1/plugin-instances +PUT /api/v1/plugin-instances/{id} +DELETE /api/v1/plugin-instances/{id} +``` + +## WebSocket Endpoints + +``` +WS /api/v1/workflow-runs/{id}/stream +WS /api/v1/deployment-jobs/{id}/stream +WS /api/v1/agents/{id}/task-stream +WS /api/v1/dashboard/stream +``` + +## Rate Limits + +| Tier | Requests/minute | Burst | +|------|-----------------|-------| +| Standard | 1000 | 100 | +| Premium | 5000 | 500 | + +Rate limit headers: +- `X-RateLimit-Limit`: Request limit +- `X-RateLimit-Remaining`: Remaining requests +- `X-RateLimit-Reset`: Reset timestamp + +## References + +- [Environments API](environments.md) +- [Releases API](releases.md) +- [Promotions API](promotions.md) +- [Workflows 
API](workflows.md) +- [Agents API](agents.md) +- [WebSocket API](websockets.md) diff --git a/docs/modules/release-orchestrator/api/promotions.md b/docs/modules/release-orchestrator/api/promotions.md new file mode 100644 index 000000000..10bb21d72 --- /dev/null +++ b/docs/modules/release-orchestrator/api/promotions.md @@ -0,0 +1,317 @@ +# Promotion & Approval APIs + +> API endpoints for managing promotions, approvals, and gate evaluations. + +**Status:** Planned (not yet implemented) +**Source:** [Architecture Advisory Section 6.3.5](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md) +**Related Modules:** [Promotion Manager Module](../modules/promotion-manager.md), [Workflow Promotion](../workflow/promotion.md) + +## Overview + +The Promotion API provides endpoints for requesting release promotions between environments, managing approvals, and evaluating promotion gates. Promotions enforce separation of duties (SoD) and require configured approvals before deployment proceeds. + +--- + +## Promotion Endpoints + +### Create Promotion Request + +**Endpoint:** `POST /api/v1/promotions` + +Initiates a promotion request for a release to a target environment. 
+ +**Request:** +```json +{ + "releaseId": "uuid", + "targetEnvironmentId": "uuid", + "reason": "Deploying v2.3.1 with critical bug fix" +} +``` + +**Response:** `201 Created` +```json +{ + "id": "uuid", + "releaseId": "uuid", + "releaseName": "myapp-v2.3.1", + "sourceEnvironmentId": "uuid", + "sourceEnvironmentName": "Staging", + "targetEnvironmentId": "uuid", + "targetEnvironmentName": "Production", + "status": "pending", + "requestedBy": "user-uuid", + "requestedAt": "2026-01-10T14:23:45Z", + "reason": "Deploying v2.3.1 with critical bug fix" +} +``` + +**Status Flow:** +``` +pending -> awaiting_approval -> approved -> deploying -> deployed + -> rejected + -> cancelled + -> failed + -> rolled_back +``` + +### List Promotions + +**Endpoint:** `GET /api/v1/promotions` + +**Query Parameters:** +- `status` (string): Filter by status +- `releaseId` (UUID): Filter by release +- `environmentId` (UUID): Filter by target environment +- `page` (number): Page number + +**Response:** `200 OK` +```json +{ + "data": [ + { + "id": "uuid", + "releaseName": "myapp-v2.3.1", + "targetEnvironmentName": "Production", + "status": "awaiting_approval", + "requestedAt": "2026-01-10T14:23:45Z" + } + ], + "meta": { "page": 1, "totalCount": 25 } +} +``` + +### Get Promotion + +**Endpoint:** `GET /api/v1/promotions/{id}` + +**Response:** `200 OK` - Full promotion with decision record and approvals + +### Approve Promotion + +**Endpoint:** `POST /api/v1/promotions/{id}/approve` + +**Request:** +```json +{ + "comment": "Approved after reviewing security scan results" +} +``` + +**Response:** `200 OK` +```json +{ + "id": "uuid", + "status": "approved", + "approvalCount": 2, + "requiredApprovals": 2, + "decidedAt": "2026-01-10T14:30:00Z" +} +``` + +**Notes:** +- Separation of Duties (SoD): The user who requested the promotion cannot approve it if `requireSod` is enabled on the environment +- Multi-party approval: Promotion proceeds when `approvalCount >= requiredApprovals` + +### Reject 
Promotion + +**Endpoint:** `POST /api/v1/promotions/{id}/reject` + +**Request:** +```json +{ + "reason": "Security vulnerabilities not addressed" +} +``` + +**Response:** `200 OK` - Updated promotion with `status: rejected` + +### Cancel Promotion + +**Endpoint:** `POST /api/v1/promotions/{id}/cancel` + +Cancels a pending or awaiting_approval promotion. + +**Response:** `200 OK` - Updated promotion with `status: cancelled` + +--- + +## Decision & Evidence Endpoints + +### Get Decision Record + +**Endpoint:** `GET /api/v1/promotions/{id}/decision` + +Returns the full decision record including gate evaluations. + +**Response:** `200 OK` +```json +{ + "promotionId": "uuid", + "decision": "allow", + "decidedAt": "2026-01-10T14:30:00Z", + "gates": [ + { + "gateName": "security-gate", + "passed": true, + "details": { + "criticalCount": 0, + "highCount": 3, + "maxCritical": 0, + "maxHigh": 5 + } + }, + { + "gateName": "freeze-window-gate", + "passed": true, + "details": { + "activeFreezeWindow": null + } + } + ], + "approvals": [ + { + "approverId": "uuid", + "approverName": "John Doe", + "decision": "approved", + "comment": "LGTM", + "approvedAt": "2026-01-10T14:28:00Z" + } + ] +} +``` + +### Get Approvals + +**Endpoint:** `GET /api/v1/promotions/{id}/approvals` + +**Response:** `200 OK` - Array of approval records + +### Get Evidence Packet + +**Endpoint:** `GET /api/v1/promotions/{id}/evidence` + +Returns the signed evidence packet for the promotion decision. + +**Response:** `200 OK` +```json +{ + "id": "uuid", + "type": "release_decision", + "version": "1.0", + "content": { ... }, + "contentHash": "sha256:abc...", + "signature": "base64-signature", + "signatureAlgorithm": "ECDSA-P256-SHA256", + "signerKeyRef": "key-id", + "generatedAt": "2026-01-10T14:30:00Z" +} +``` + +--- + +## Gate Preview Endpoints + +### Preview Gate Evaluation + +**Endpoint:** `POST /api/v1/promotions/preview-gates` + +Evaluates gates without creating a promotion (dry run). 
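The aggregate `wouldPass` value follows from the per-gate results: a promotion passes when every *blocking* gate passes, while non-blocking gates are informational. These semantics are inferred from the response shape; the helper is a non-normative sketch:

```typescript
interface GateResult {
  gateName: string;
  passed: boolean;
  blocking: boolean;
  message: string;
}

// Assumed semantics: only a failing *blocking* gate fails the
// evaluation; failing non-blocking gates are advisory.
function wouldPass(gates: GateResult[]): boolean {
  return gates.every((g) => g.passed || !g.blocking);
}
```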
+ +**Request:** +```json +{ + "releaseId": "uuid", + "targetEnvironmentId": "uuid" +} +``` + +**Response:** `200 OK` +```json +{ + "wouldPass": false, + "gates": [ + { + "gateName": "security-gate", + "passed": false, + "blocking": true, + "message": "3 critical vulnerabilities exceed threshold (max: 0)" + }, + { + "gateName": "freeze-window-gate", + "passed": true, + "blocking": false, + "message": "No active freeze window" + } + ] +} +``` + +--- + +## Approval Policy Endpoints + +### Create Approval Policy + +**Endpoint:** `POST /api/v1/approval-policies` + +**Request:** +```json +{ + "name": "production-policy", + "environmentId": "uuid", + "requiredApprovals": 2, + "approverGroups": ["release-managers", "sre-team"], + "requireSeparationOfDuties": true, + "autoExpireHours": 24 +} +``` + +### List Approval Policies + +**Endpoint:** `GET /api/v1/approval-policies` + +### Get Approval Policy + +**Endpoint:** `GET /api/v1/approval-policies/{id}` + +### Update Approval Policy + +**Endpoint:** `PUT /api/v1/approval-policies/{id}` + +### Delete Approval Policy + +**Endpoint:** `DELETE /api/v1/approval-policies/{id}` + +--- + +## Current User Endpoints + +### Get My Pending Approvals + +**Endpoint:** `GET /api/v1/my/pending-approvals` + +Returns promotions awaiting approval from the current user. 
+ +**Response:** `200 OK` - Array of promotions + +--- + +## Error Responses + +| Status Code | Description | +|-------------|-------------| +| `400` | Invalid promotion request | +| `403` | User cannot approve (SoD violation or not in approver list) | +| `404` | Promotion not found | +| `409` | Promotion already decided | +| `422` | Gate evaluation failed | + +--- + +## See Also + +- [Workflows API](workflows.md) +- [Releases API](releases.md) +- [Promotion Manager Module](../modules/promotion-manager.md) +- [Security Gates](../modules/promotion-manager.md#security-gate) diff --git a/docs/modules/release-orchestrator/api/releases.md b/docs/modules/release-orchestrator/api/releases.md new file mode 100644 index 000000000..44c690704 --- /dev/null +++ b/docs/modules/release-orchestrator/api/releases.md @@ -0,0 +1,345 @@ +# Release Management APIs + +> API endpoints for managing components, versions, and release bundles. + +**Status:** Planned (not yet implemented) +**Source:** [Architecture Advisory Section 6.3.3](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md) +**Related Modules:** [Release Manager Module](../modules/release-manager.md), [Integration Hub](../modules/integration-hub.md) + +## Overview + +The Release Management API provides endpoints for managing container components, version tracking, and release bundle creation. All releases are identified by immutable OCI digests, ensuring cryptographic verification throughout the deployment pipeline. + +> **Design Principle:** Release identity is established via digest, not tag. Tags are human-friendly aliases; digests are the source of truth. + +--- + +## Component Endpoints + +### Create Component + +**Endpoint:** `POST /api/v1/components` + +Registers a new container component for release management. 
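Because release identity is digest-based, client code should treat tags as display-only aliases and pin by digest. A non-normative sketch of the distinction (the validation regex is our assumption, matching the `sha256:` form used in these examples):

```typescript
// A release component is addressed by an immutable OCI digest; the
// tag is only a human-friendly alias. Regex is an assumed sha256 form.
const DIGEST_RE = /^sha256:[a-f0-9]{64}$/;

function isPinnedByDigest(ref: { tag?: string; digest?: string }): boolean {
  return ref.digest !== undefined && DIGEST_RE.test(ref.digest);
}
```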
+ +**Request:** +```json +{ + "name": "api", + "displayName": "API Service", + "imageRepository": "myorg/api", + "registryIntegrationId": "uuid", + "versioningStrategy": "semver", + "defaultChannel": "stable" +} +``` + +**Response:** `201 Created` +```json +{ + "id": "uuid", + "name": "api", + "displayName": "API Service", + "imageRepository": "myorg/api", + "registryIntegrationId": "uuid", + "versioningStrategy": "semver", + "createdAt": "2026-01-10T14:23:45Z" +} +``` + +### List Components + +**Endpoint:** `GET /api/v1/components` + +**Response:** `200 OK` - Array of components + +### Get Component + +**Endpoint:** `GET /api/v1/components/{id}` + +### Update Component + +**Endpoint:** `PUT /api/v1/components/{id}` + +### Delete Component + +**Endpoint:** `DELETE /api/v1/components/{id}` + +### Sync Versions + +**Endpoint:** `POST /api/v1/components/{id}/sync-versions` + +Triggers a refresh of available versions from the container registry. + +**Request:** +```json +{ + "forceRefresh": true +} +``` + +**Response:** `200 OK` +```json +{ + "synced": 15, + "versions": [ + { + "tag": "v2.3.1", + "digest": "sha256:abc123...", + "semver": "2.3.1", + "channel": "stable", + "pushedAt": "2026-01-09T10:00:00Z" + } + ] +} +``` + +### List Component Versions + +**Endpoint:** `GET /api/v1/components/{id}/versions` + +**Query Parameters:** +- `channel` (string): Filter by channel (`stable`, `beta`, `rc`) +- `limit` (number): Maximum versions to return + +**Response:** `200 OK` - Array of version maps + +--- + +## Version Map Endpoints + +### Create Version Map + +**Endpoint:** `POST /api/v1/version-maps` + +Manually assign a semver and channel to a tag/digest. 
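Where no manual mapping exists, one plausible convention — an assumption for illustration, not part of this spec — is to derive the channel from the semver prerelease identifier:

```typescript
// Hypothetical channel inference: "2.3.1" -> stable,
// "2.4.0-rc.1" -> rc, "2.4.0-beta.2" -> beta. This convention is
// an assumption, not defined by the API.
function inferChannel(semver: string): "stable" | "beta" | "rc" {
  const prerelease = semver.split("-")[1];
  if (!prerelease) return "stable";
  if (prerelease.startsWith("rc")) return "rc";
  return "beta";
}
```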
+ +**Request:** +```json +{ + "componentId": "uuid", + "tag": "v2.3.1", + "semver": "2.3.1", + "channel": "stable" +} +``` + +**Response:** `201 Created` + +### List Version Maps + +**Endpoint:** `GET /api/v1/version-maps` + +**Query Parameters:** +- `componentId` (UUID): Filter by component +- `channel` (string): Filter by channel + +--- + +## Release Endpoints + +### Create Release + +**Endpoint:** `POST /api/v1/releases` + +Creates a new release bundle with specified component versions. + +**Request:** +```json +{ + "name": "myapp-v2.3.1", + "displayName": "My App 2.3.1", + "components": [ + { "componentId": "uuid", "version": "2.3.1" }, + { "componentId": "uuid", "digest": "sha256:def456..." }, + { "componentId": "uuid", "channel": "stable" } + ], + "sourceRef": { + "scmIntegrationId": "uuid", + "repository": "myorg/myapp", + "branch": "main", + "commitSha": "abc123" + } +} +``` + +**Response:** `201 Created` +```json +{ + "id": "uuid", + "name": "myapp-v2.3.1", + "displayName": "My App 2.3.1", + "status": "draft", + "components": [ + { + "componentId": "uuid", + "componentName": "api", + "version": "2.3.1", + "digest": "sha256:abc123...", + "channel": "stable" + } + ], + "createdAt": "2026-01-10T14:23:45Z", + "createdBy": "user-uuid" +} +``` + +### Create Release from Latest + +**Endpoint:** `POST /api/v1/releases/from-latest` + +Convenience endpoint to create a release from the latest versions of all (or specified) components. 
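Resolving "latest" requires an ordering over versions, and the comparison must be numeric per segment rather than lexicographic (`2.10.0` is newer than `2.9.1`). A minimal sketch, deliberately omitting prerelease precedence, which a real implementation would need:

```typescript
// Compare two "major.minor.patch" versions numerically; returns > 0
// when a is newer. Prerelease ordering is omitted in this sketch.
function compareSemver(a: string, b: string): number {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  for (let i = 0; i < 3; i++) {
    if (pa[i] !== pb[i]) return pa[i] - pb[i];
  }
  return 0;
}

function latest(versions: string[]): string | undefined {
  return [...versions].sort(compareSemver).pop();
}
```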
+ +**Request:** +```json +{ + "name": "myapp-latest", + "channel": "stable", + "componentIds": ["uuid1", "uuid2"], + "pinFrom": { + "environmentId": "uuid" + } +} +``` + +**Response:** `201 Created` - Release with resolved digests + +### List Releases + +**Endpoint:** `GET /api/v1/releases` + +**Query Parameters:** +- `status` (string): Filter by status (`draft`, `ready`, `promoting`, `deployed`, `deprecated`) +- `componentId` (UUID): Filter by component inclusion +- `page` (number): Page number +- `pageSize` (number): Items per page + +**Response:** `200 OK` +```json +{ + "data": [ + { + "id": "uuid", + "name": "myapp-v2.3.1", + "status": "deployed", + "componentCount": 3, + "createdAt": "2026-01-10T14:23:45Z" + } + ], + "meta": { + "page": 1, + "pageSize": 20, + "totalCount": 150, + "totalPages": 8 + } +} +``` + +### Get Release + +**Endpoint:** `GET /api/v1/releases/{id}` + +**Response:** `200 OK` - Full release with component details + +### Update Release + +**Endpoint:** `PUT /api/v1/releases/{id}` + +**Request:** +```json +{ + "displayName": "Updated Display Name", + "metadata": { "key": "value" }, + "status": "ready" +} +``` + +### Delete Release + +**Endpoint:** `DELETE /api/v1/releases/{id}` + +### Get Release State + +**Endpoint:** `GET /api/v1/releases/{id}/state` + +Returns the deployment state of a release across environments. + +**Response:** `200 OK` +```json +{ + "environments": [ + { + "environmentId": "uuid", + "environmentName": "Development", + "status": "deployed", + "deployedAt": "2026-01-09T10:00:00Z" + }, + { + "environmentId": "uuid", + "environmentName": "Staging", + "status": "deployed", + "deployedAt": "2026-01-10T08:00:00Z" + }, + { + "environmentId": "uuid", + "environmentName": "Production", + "status": "not_deployed" + } + ] +} +``` + +### Deprecate Release + +**Endpoint:** `POST /api/v1/releases/{id}/deprecate` + +Marks a release as deprecated, preventing new promotions. 
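Comparing two releases (the *Compare Releaseses* endpoint in this section) reduces to a keyed diff over component digests; a non-normative sketch:

```typescript
interface ReleaseComponent {
  componentId: string;
  componentName: string;
  digest: string;
}

// Diff two releases by componentId: present only in `to` -> added,
// present only in `from` -> removed, digest differs -> changed.
function diffReleases(from: ReleaseComponent[], to: ReleaseComponent[]) {
  const fromById = new Map(from.map((c) => [c.componentId, c]));
  const toById = new Map(to.map((c) => [c.componentId, c]));
  return {
    added: to.filter((c) => !fromById.has(c.componentId)),
    removed: from.filter((c) => !toById.has(c.componentId)),
    changed: to.filter((c) => {
      const prev = fromById.get(c.componentId);
      return prev !== undefined && prev.digest !== c.digest;
    }),
  };
}
```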
+ +**Response:** `200 OK` - Updated release with `status: deprecated` + +### Compare Releases + +**Endpoint:** `GET /api/v1/releases/{id}/compare/{otherId}` + +Compares two releases to identify component differences. + +**Response:** `200 OK` +```json +{ + "added": [ + { "componentId": "uuid", "componentName": "worker" } + ], + "removed": [ + { "componentId": "uuid", "componentName": "legacy-service" } + ], + "changed": [ + { + "component": "api", + "fromVersion": "2.3.0", + "toVersion": "2.3.1", + "fromDigest": "sha256:old...", + "toDigest": "sha256:new..." + } + ] +} +``` + +--- + +## Error Responses + +| Status Code | Description | +|-------------|-------------| +| `400` | Invalid release configuration | +| `404` | Release or component not found | +| `409` | Release name already exists | +| `422` | Cannot resolve component version | + +--- + +## See Also + +- [Promotions API](promotions.md) +- [Release Manager Module](../modules/release-manager.md) +- [Integration Hub](../modules/integration-hub.md) +- [Design Principles](../design/principles.md) diff --git a/docs/modules/release-orchestrator/api/websockets.md b/docs/modules/release-orchestrator/api/websockets.md new file mode 100644 index 000000000..df1fa0088 --- /dev/null +++ b/docs/modules/release-orchestrator/api/websockets.md @@ -0,0 +1,374 @@ +# Real-Time APIs (WebSocket/SSE) + +> WebSocket and Server-Sent Events endpoints for real-time updates. + +**Status:** Planned (not yet implemented) +**Source:** [Architecture Advisory Section 6.4](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md) +**Related Modules:** [Workflow Execution](../workflow/execution.md), [UI Dashboard](../ui/dashboard.md) + +## Overview + +The Release Orchestrator provides real-time streaming endpoints for workflow runs, deployment progress, agent tasks, and dashboard metrics. These endpoints support both WebSocket connections and Server-Sent Events (SSE) for browser compatibility. 
+ +--- + +## Authentication + +All WebSocket and SSE connections require authentication via JWT token: + +**WebSocket:** Token in query parameter or first message +``` +ws://api/v1/workflow-runs/{id}/stream?token=jwt-token +``` + +**SSE:** Token in Authorization header +``` +GET /api/v1/dashboard/stream +Authorization: Bearer jwt-token +``` + +--- + +## Workflow Run Stream + +**Endpoint:** `WS /api/v1/workflow-runs/{id}/stream` + +Streams real-time updates for a workflow run including step progress and logs. + +### Message Types (Server to Client) + +**Step Started:** +```json +{ + "type": "step_started", + "nodeId": "security-check", + "stepType": "security-gate", + "timestamp": "2026-01-10T14:23:45Z" +} +``` + +**Step Progress:** +```json +{ + "type": "step_progress", + "nodeId": "deploy", + "progress": 50, + "message": "Deploying to target 3/6" +} +``` + +**Step Log:** +```json +{ + "type": "step_log", + "nodeId": "deploy", + "line": "Pulling image sha256:abc123...", + "level": "info", + "timestamp": "2026-01-10T14:23:50Z" +} +``` + +**Step Completed:** +```json +{ + "type": "step_completed", + "nodeId": "security-check", + "status": "succeeded", + "outputs": { + "criticalCount": 0, + "highCount": 3 + }, + "duration": 5.2, + "timestamp": "2026-01-10T14:23:50Z" +} +``` + +**Workflow Completed:** +```json +{ + "type": "workflow_completed", + "status": "succeeded", + "duration": 125.5, + "outputs": { + "deploymentId": "uuid" + }, + "timestamp": "2026-01-10T14:25:50Z" +} +``` + +--- + +## Deployment Job Stream + +**Endpoint:** `WS /api/v1/deployment-jobs/{id}/stream` + +Streams real-time updates for deployment job execution. 
+ +### Message Types (Server to Client) + +**Task Started:** +```json +{ + "type": "task_started", + "taskId": "uuid", + "targetId": "uuid", + "targetName": "prod-web-01", + "taskType": "docker.pull", + "timestamp": "2026-01-10T14:23:45Z" +} +``` + +**Task Progress:** +```json +{ + "type": "task_progress", + "taskId": "uuid", + "progress": 75, + "message": "Pulling layer 4/5" +} +``` + +**Task Log:** +```json +{ + "type": "task_log", + "taskId": "uuid", + "line": "Container started successfully", + "level": "info" +} +``` + +**Task Completed:** +```json +{ + "type": "task_completed", + "taskId": "uuid", + "targetId": "uuid", + "status": "succeeded", + "duration": 45.2, + "result": { + "containerId": "abc123", + "digest": "sha256:..." + }, + "timestamp": "2026-01-10T14:24:30Z" +} +``` + +**Job Completed:** +```json +{ + "type": "job_completed", + "status": "succeeded", + "targetsDeployed": 4, + "targetsFailed": 0, + "duration": 180.5, + "timestamp": "2026-01-10T14:26:45Z" +} +``` + +--- + +## Agent Task Stream + +**Endpoint:** `WS /api/v1/agents/{id}/task-stream` + +Bidirectional stream for agent task assignment and progress reporting. + +### Message Types (Server to Agent) + +**Task Assigned:** +```json +{ + "type": "task_assigned", + "task": { + "taskId": "uuid", + "taskType": "docker.pull", + "payload": { + "image": "myapp", + "digest": "sha256:abc123..." + }, + "credentials": { + "registry.username": "user", + "registry.password": "token" + }, + "timeout": 300 + } +} +``` + +**Task Cancelled:** +```json +{ + "type": "task_cancelled", + "taskId": "uuid", + "reason": "Deployment cancelled by user" +} +``` + +### Message Types (Agent to Server) + +**Task Progress:** +```json +{ + "type": "task_progress", + "taskId": "uuid", + "progress": 50, + "message": "Pulling image layer 3/5" +} +``` + +**Task Log:** +```json +{ + "type": "task_log", + "taskId": "uuid", + "level": "info", + "message": "Image layer downloaded: sha256:def456..." 
+} +``` + +**Task Completed:** +```json +{ + "type": "task_completed", + "taskId": "uuid", + "success": true, + "result": { + "imageId": "sha256:abc123..." + } +} +``` + +--- + +## Dashboard Metrics Stream + +**Endpoint:** `WS /api/v1/dashboard/stream` + +Streams real-time dashboard metrics and alerts. + +### Message Types (Server to Client) + +**Metric Update:** +```json +{ + "type": "metric_update", + "metrics": { + "pipelineStatus": [ + { "environmentId": "uuid", "name": "Production", "health": "healthy" } + ], + "pendingApprovals": 3, + "activeDeployments": 1, + "recentReleases": 12, + "systemHealth": { + "agentsOnline": 8, + "agentsTotal": 10, + "queueDepth": 5 + } + }, + "timestamp": "2026-01-10T14:23:45Z" +} +``` + +**Alert:** +```json +{ + "type": "alert", + "alert": { + "id": "uuid", + "severity": "warning", + "title": "Deployment Failed", + "message": "Deployment to Production failed: health check timeout", + "resourceType": "deployment", + "resourceId": "uuid", + "timestamp": "2026-01-10T14:23:45Z" + } +} +``` + +**Promotion Update:** +```json +{ + "type": "promotion_update", + "promotion": { + "id": "uuid", + "releaseName": "myapp-v2.3.1", + "targetEnvironment": "Production", + "status": "awaiting_approval", + "requestedBy": "John Doe" + } +} +``` + +--- + +## Connection Management + +### Reconnection + +Clients should implement exponential backoff reconnection: + +```javascript +const connect = (retryCount = 0) => { + const ws = new WebSocket(url); + + ws.onclose = () => { + const delay = Math.min(1000 * Math.pow(2, retryCount), 30000); + setTimeout(() => connect(retryCount + 1), delay); + }; + + ws.onopen = () => { + retryCount = 0; + }; +}; +``` + +### Heartbeat + +WebSocket connections receive periodic heartbeat messages: + +```json +{ + "type": "heartbeat", + "timestamp": "2026-01-10T14:23:45Z" +} +``` + +Clients should respond with: +```json +{ + "type": "pong" +} +``` + +Connections without pong response within 30 seconds are terminated. 
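A minimal client-side frame handler for this keepalive protocol might look like the sketch below: it answers heartbeats with a pong and hands every other well-formed message to the application. The `Outcome` type and function name are illustrative, not part of the API:

```typescript
interface StreamMessage { type: string; [key: string]: unknown }

type Outcome =
  | { kind: "reply"; message: { type: "pong" } }  // send back over the socket
  | { kind: "deliver"; message: StreamMessage }   // hand to the application
  | { kind: "ignore" };                           // malformed frame, drop it

// Decides what to do with one raw WebSocket frame.
function handleFrame(raw: string): Outcome {
  let message: StreamMessage;
  try {
    message = JSON.parse(raw);
  } catch {
    return { kind: "ignore" };
  }
  if (message.type === "heartbeat") {
    return { kind: "reply", message: { type: "pong" } };
  }
  return { kind: "deliver", message };
}
```

The pong reply must be sent promptly: per the protocol above, a connection with no pong within 30 seconds is terminated.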
+ +--- + +## Error Messages + +```json +{ + "type": "error", + "code": "unauthorized", + "message": "Token expired", + "timestamp": "2026-01-10T14:23:45Z" +} +``` + +| Error Code | Description | +|------------|-------------| +| `unauthorized` | Invalid or expired token | +| `forbidden` | No access to resource | +| `not_found` | Resource not found | +| `rate_limited` | Too many connections | +| `internal_error` | Server error | + +--- + +## See Also + +- [Workflows API](workflows.md) +- [Agents API](agents.md) +- [UI Dashboard](../ui/dashboard.md) +- [Workflow Execution](../workflow/execution.md) diff --git a/docs/modules/release-orchestrator/api/workflows.md b/docs/modules/release-orchestrator/api/workflows.md new file mode 100644 index 000000000..27076c029 --- /dev/null +++ b/docs/modules/release-orchestrator/api/workflows.md @@ -0,0 +1,354 @@ +# Workflow APIs + +> API endpoints for managing workflow templates, step registry, and workflow runs. + +**Status:** Planned (not yet implemented) +**Source:** [Architecture Advisory Section 6.3.4](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md) +**Related Modules:** [Workflow Engine Module](../modules/workflow-engine.md), [Workflow Templates](../workflow/templates.md) + +## Overview + +The Workflow API provides endpoints for managing workflow templates (DAG definitions), discovering available step types, and executing workflow runs. Workflows are directed acyclic graphs (DAGs) of steps that orchestrate promotions, deployments, and other automation tasks. 
+ +--- + +## Workflow Template Endpoints + +### Create Workflow Template + +**Endpoint:** `POST /api/v1/workflow-templates` + +**Request:** +```json +{ + "name": "standard-promotion", + "displayName": "Standard Promotion Workflow", + "description": "Default workflow for promoting releases", + "nodes": [ + { + "id": "security-check", + "type": "security-gate", + "name": "Security Check", + "config": { + "maxCritical": 0, + "maxHigh": 5 + }, + "position": { "x": 100, "y": 100 } + }, + { + "id": "approval", + "type": "approval", + "name": "Manager Approval", + "config": { + "approvers": ["manager-group"], + "minApprovals": 1 + }, + "position": { "x": 300, "y": 100 } + }, + { + "id": "deploy", + "type": "deploy", + "name": "Deploy to Target", + "config": { + "strategy": "rolling", + "batchSize": "25%" + }, + "position": { "x": 500, "y": 100 } + } + ], + "edges": [ + { "from": "security-check", "to": "approval" }, + { "from": "approval", "to": "deploy" } + ], + "inputs": [ + { "name": "releaseId", "type": "uuid", "required": true }, + { "name": "environmentId", "type": "uuid", "required": true } + ], + "outputs": [ + { "name": "deploymentId", "type": "uuid" } + ] +} +``` + +**Response:** `201 Created` +```json +{ + "id": "uuid", + "name": "standard-promotion", + "displayName": "Standard Promotion Workflow", + "version": 1, + "nodeCount": 3, + "isActive": true, + "createdAt": "2026-01-10T14:23:45Z" +} +``` + +### List Workflow Templates + +**Endpoint:** `GET /api/v1/workflow-templates` + +**Query Parameters:** +- `includeBuiltin` (boolean): Include system-provided templates +- `tags` (string): Filter by tags + +**Response:** `200 OK` - Array of workflow templates + +### Get Workflow Template + +**Endpoint:** `GET /api/v1/workflow-templates/{id}` + +**Response:** `200 OK` - Full template with nodes and edges + +### Update Workflow Template + +**Endpoint:** `PUT /api/v1/workflow-templates/{id}` + +Creates a new version of the template. 
+ +**Request:** Partial or full template definition + +**Response:** `200 OK` - New version of template + +### Delete Workflow Template + +**Endpoint:** `DELETE /api/v1/workflow-templates/{id}` + +**Response:** `200 OK` +```json +{ "deleted": true } +``` + +### Validate Workflow Template + +**Endpoint:** `POST /api/v1/workflow-templates/{id}/validate` + +Validates a template with sample inputs. + +**Request:** +```json +{ + "inputs": { + "releaseId": "sample-uuid", + "environmentId": "sample-uuid" + } +} +``` + +**Response:** `200 OK` +```json +{ + "valid": true, + "errors": [] +} +``` + +Or on validation failure: +```json +{ + "valid": false, + "errors": [ + { "nodeId": "deploy", "field": "config.strategy", "message": "Invalid strategy: unknown" }, + { "type": "dag", "message": "Cycle detected: node-a -> node-b -> node-a" } + ] +} +``` + +--- + +## Step Registry Endpoints + +### List Step Types + +**Endpoint:** `GET /api/v1/step-types` + +Lists all available step types from core and plugins. + +**Query Parameters:** +- `category` (string): Filter by category (`deployment`, `gate`, `notification`, `utility`) +- `provider` (string): Filter by provider (`builtin`, `plugin-id`) + +**Response:** `200 OK` +```json +[ + { + "type": "script", + "displayName": "Script", + "description": "Execute shell script on target", + "category": "utility", + "provider": "builtin", + "configSchema": { ... } + }, + { + "type": "security-gate", + "displayName": "Security Gate", + "description": "Check vulnerability thresholds", + "category": "gate", + "provider": "builtin", + "configSchema": { ... 
} + } +] +``` + +### Get Step Type + +**Endpoint:** `GET /api/v1/step-types/{type}` + +**Response:** `200 OK` - Full step type with configuration schema + +--- + +## Workflow Run Endpoints + +### Start Workflow Run + +**Endpoint:** `POST /api/v1/workflow-runs` + +**Request:** +```json +{ + "templateId": "uuid", + "context": { + "releaseId": "uuid", + "environmentId": "uuid", + "variables": { + "deploymentTimeout": 600 + } + } +} +``` + +**Response:** `201 Created` +```json +{ + "id": "uuid", + "templateId": "uuid", + "templateVersion": 1, + "status": "running", + "startedAt": "2026-01-10T14:23:45Z" +} +``` + +### List Workflow Runs + +**Endpoint:** `GET /api/v1/workflow-runs` + +**Query Parameters:** +- `status` (string): Filter by status (`pending`, `running`, `succeeded`, `failed`, `cancelled`) +- `templateId` (UUID): Filter by template +- `page` (number): Page number + +**Response:** `200 OK` +```json +{ + "data": [ + { + "id": "uuid", + "templateName": "standard-promotion", + "status": "running", + "progress": 66, + "startedAt": "2026-01-10T14:23:45Z" + } + ], + "meta": { "page": 1, "totalCount": 50 } +} +``` + +### Get Workflow Run + +**Endpoint:** `GET /api/v1/workflow-runs/{id}` + +**Response:** `200 OK` - Full run with step statuses + +### Pause Workflow Run + +**Endpoint:** `POST /api/v1/workflow-runs/{id}/pause` + +Pauses a running workflow at the next step boundary. + +**Response:** `200 OK` - Updated workflow run + +### Resume Workflow Run + +**Endpoint:** `POST /api/v1/workflow-runs/{id}/resume` + +Resumes a paused workflow. + +**Response:** `200 OK` - Updated workflow run + +### Cancel Workflow Run + +**Endpoint:** `POST /api/v1/workflow-runs/{id}/cancel` + +Cancels a running or paused workflow. 
+ +**Response:** `200 OK` - Updated workflow run + +### List Step Runs + +**Endpoint:** `GET /api/v1/workflow-runs/{id}/steps` + +**Response:** `200 OK` +```json +[ + { + "nodeId": "security-check", + "stepType": "security-gate", + "status": "succeeded", + "startedAt": "2026-01-10T14:23:45Z", + "completedAt": "2026-01-10T14:23:50Z" + }, + { + "nodeId": "approval", + "stepType": "approval", + "status": "running", + "startedAt": "2026-01-10T14:23:50Z" + } +] +``` + +### Get Step Run + +**Endpoint:** `GET /api/v1/workflow-runs/{id}/steps/{nodeId}` + +**Response:** `200 OK` - Step run with logs + +### Get Step Logs + +**Endpoint:** `GET /api/v1/workflow-runs/{id}/steps/{nodeId}/logs` + +**Query Parameters:** +- `follow` (boolean): Stream logs in real-time via SSE + +**Response:** `200 OK` - Log content or SSE stream + +### List Step Artifacts + +**Endpoint:** `GET /api/v1/workflow-runs/{id}/steps/{nodeId}/artifacts` + +**Response:** `200 OK` - Array of artifacts + +### Download Artifact + +**Endpoint:** `GET /api/v1/workflow-runs/{id}/steps/{nodeId}/artifacts/{artifactId}` + +**Response:** Binary download + +--- + +## Error Responses + +| Status Code | Description | +|-------------|-------------| +| `400` | Invalid workflow template | +| `404` | Template or run not found | +| `409` | Workflow already running | +| `422` | DAG validation failed | + +--- + +## See Also + +- [WebSocket APIs](websockets.md) - Real-time workflow updates +- [Workflow Engine Module](../modules/workflow-engine.md) +- [Workflow Templates](../workflow/templates.md) +- [Workflow Execution](../workflow/execution.md) diff --git a/docs/modules/release-orchestrator/appendices/config.md b/docs/modules/release-orchestrator/appendices/config.md new file mode 100644 index 000000000..e34f43389 --- /dev/null +++ b/docs/modules/release-orchestrator/appendices/config.md @@ -0,0 +1,224 @@ +# Configuration Reference + +> Environment variables and OPA policy examples for the Release Orchestrator. 
+ +**Status:** Planned (not yet implemented) +**Source:** [Architecture Advisory Section 15.2](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md) +**Related Modules:** [Security Overview](../security/overview.md), [Promotion Manager](../modules/promotion-manager.md) +**Sprint:** [101_001 Foundation](../../../../implplan/SPRINT_20260110_101_001_DB_schema_core_tables.md) + +## Overview + +This document provides the configuration reference for the Release Orchestrator, including environment variables and OPA policy examples. + +--- + +## Environment Variables + +### Core Configuration + +```bash +# Database +STELLA_DATABASE_URL=postgresql://user:pass@host:5432/stella +STELLA_REDIS_URL=redis://host:6379 +STELLA_SECRET_KEY=base64-encoded-32-bytes +STELLA_LOG_LEVEL=info +STELLA_LOG_FORMAT=json +``` + +### Authentication (Authority) + +```bash +# OAuth/OIDC +STELLA_OAUTH_ISSUER=https://auth.example.com +STELLA_OAUTH_CLIENT_ID=stella-app +STELLA_OAUTH_CLIENT_SECRET=secret +``` + +### Agents + +```bash +# Agent TLS +STELLA_AGENT_LISTEN_PORT=8443 +STELLA_AGENT_TLS_CERT=/path/to/cert.pem +STELLA_AGENT_TLS_KEY=/path/to/key.pem +STELLA_AGENT_CA_CERT=/path/to/ca.pem +``` + +### Plugins + +```bash +# Plugin configuration +STELLA_PLUGIN_DIR=/var/stella/plugins +STELLA_PLUGIN_SANDBOX_MEMORY=512m +STELLA_PLUGIN_SANDBOX_CPU=1 +``` + +### Integrations + +```bash +# Vault integration +STELLA_VAULT_ADDR=https://vault.example.com +STELLA_VAULT_TOKEN=hvs.xxx +``` + +--- + +## Full Configuration File + +```yaml +# stella-config.yaml + +database: + url: postgresql://user:pass@host:5432/stella + pool_size: 20 + ssl_mode: require + +redis: + url: redis://host:6379 + prefix: stella + +auth: + issuer: https://auth.example.com + client_id: stella-app + client_secret_ref: vault://secrets/oauth-client-secret + +agents: + listen_port: 8443 + tls: + cert_path: /etc/stella/agent.crt + key_path: /etc/stella/agent.key + ca_path: /etc/stella/ca.crt + 
  heartbeat_interval: 30
+  task_timeout: 600
+
+plugins:
+  directory: /var/stella/plugins
+  sandbox:
+    memory: 512m
+    cpu: 1
+    network: restricted
+
+evidence:
+  storage_path: /var/stella/evidence
+  signing_key_ref: vault://secrets/evidence-signing-key
+  retention_days: 2555 # 7 years
+
+logging:
+  level: info
+  format: json
+  output: stdout
+
+telemetry:
+  enabled: true
+  otlp_endpoint: otel-collector:4317
+  service_name: stella-release-orchestrator
+```
+
+---
+
+## OPA Policy Examples
+
+### Security Gate Policy
+
+```rego
+# security_gate.rego
+package stella.gates.security
+
+default allow = false
+
+# Allow only when no deny rule fires. Testing
+# `components[_].security.reachable_critical == 0` directly would pass when
+# *any* single component is clean, not when *all* of them are.
+allow {
+    count(deny) == 0
+}
+
+deny[msg] {
+    component := input.release.components[_]
+    component.security.reachable_critical > 0
+    msg := sprintf("Component %s has %d reachable critical vulnerabilities",
+        [component.name, component.security.reachable_critical])
+}
+
+deny[msg] {
+    component := input.release.components[_]
+    component.security.reachable_high > 0
+    msg := sprintf("Component %s has %d reachable high vulnerabilities",
+        [component.name, component.security.reachable_high])
+}
+```
+
+### Approval Gate Policy
+
+```rego
+# approval_gate.rego
+package stella.gates.approval
+
+import future.keywords.in
+
+default allow = false
+
+allow {
+    # Count only explicit approvals, not rejections or comments.
+    approved := [a | a := input.approvals[_]; a.action == "approved"]
+    count(approved) >= input.environment.required_approvals
+    separation_of_duties_met
+}
+
+separation_of_duties_met {
+    not input.environment.require_sod
+}
+
+separation_of_duties_met {
+    input.environment.require_sod
+    approver_ids := {a.approver_id | a := input.approvals[_]; a.action == "approved"}
+    not input.promotion.requested_by in approver_ids
+}
+```
+
+### Freeze Window Gate Policy
+
+```rego
+# freeze_window_gate.rego
+package stella.gates.freeze
+
+import future.keywords.in
+
+default allow = true
+
+allow = false {
+    window := input.environment.freeze_windows[_]
+    time.now_ns() >= time.parse_rfc3339_ns(window.start)
+    time.now_ns() <= time.parse_rfc3339_ns(window.end)
+    not input.promotion.requested_by in window.exceptions
+}
+```
+
+---
+
+## API Error Codes
+
+| Code | HTTP Status | Description |
+|------|-------------|-------------|
+| `RELEASE_NOT_FOUND` | 404 | Release with 
specified ID does not exist | +| `ENVIRONMENT_NOT_FOUND` | 404 | Environment with specified ID does not exist | +| `PROMOTION_BLOCKED` | 403 | Promotion blocked by policy gates | +| `APPROVAL_REQUIRED` | 403 | Additional approvals required | +| `FREEZE_WINDOW_ACTIVE` | 403 | Environment is in freeze window | +| `DIGEST_MISMATCH` | 400 | Image digest does not match expected | +| `AGENT_OFFLINE` | 503 | Required agent is offline | +| `WORKFLOW_FAILED` | 500 | Workflow execution failed | +| `PLUGIN_ERROR` | 500 | Plugin returned an error | +| `QUOTA_EXCEEDED` | 429 | Digest analysis quota exceeded | +| `VALIDATION_ERROR` | 400 | Request validation failed | +| `UNAUTHORIZED` | 401 | Authentication required | +| `FORBIDDEN` | 403 | Insufficient permissions | + +--- + +## Default Values + +| Setting | Default | Description | +|---------|---------|-------------| +| Agent heartbeat interval | 30s | Frequency of agent heartbeats | +| Task timeout | 600s | Maximum time for agent task | +| Deployment batch size | 25% | Percentage of targets per batch | +| Health check timeout | 60s | Timeout for health checks | +| Evidence retention | 7 years | Audit compliance requirement | +| Max workflow steps | 50 | Maximum steps per workflow | +| Max parallel tasks | 10 | Per-agent concurrent tasks | + +--- + +## See Also + +- [Security Overview](../security/overview.md) +- [Promotion Manager](../modules/promotion-manager.md) +- [Database Schema](../data-model/schema.md) +- [Glossary](glossary.md) diff --git a/docs/modules/release-orchestrator/appendices/errors.md b/docs/modules/release-orchestrator/appendices/errors.md new file mode 100644 index 000000000..3b829ddc7 --- /dev/null +++ b/docs/modules/release-orchestrator/appendices/errors.md @@ -0,0 +1,296 @@ +# API Error Codes + +## Overview + +All API errors follow a consistent format with error codes for programmatic handling. 
+ +## Error Response Format + +```typescript +interface ApiErrorResponse { + success: false; + error: { + code: string; // Machine-readable error code + message: string; // Human-readable message + details?: object; // Additional context + validationErrors?: ValidationError[]; + }; + meta: { + requestId: string; + timestamp: string; + }; +} + +interface ValidationError { + field: string; + message: string; + code: string; +} +``` + +## Error Code Categories + +| Prefix | Category | HTTP Status Range | +|--------|----------|-------------------| +| `AUTH_` | Authentication | 401 | +| `PERM_` | Authorization/Permission | 403 | +| `VAL_` | Validation | 400 | +| `RES_` | Resource | 404, 409 | +| `ENV_` | Environment | 422 | +| `REL_` | Release | 422 | +| `PROM_` | Promotion | 422 | +| `DEPLOY_` | Deployment | 422 | +| `GATE_` | Gate | 422 | +| `AGT_` | Agent | 422 | +| `INT_` | Integration | 422 | +| `WF_` | Workflow | 422 | +| `SYS_` | System | 500 | + +## Authentication Errors (401) + +| Code | Message | Description | +|------|---------|-------------| +| `AUTH_TOKEN_MISSING` | Authentication token required | No token provided | +| `AUTH_TOKEN_INVALID` | Invalid authentication token | Token cannot be parsed | +| `AUTH_TOKEN_EXPIRED` | Authentication token expired | Token has expired | +| `AUTH_TOKEN_REVOKED` | Authentication token revoked | Token has been revoked | +| `AUTH_AGENT_CERT_INVALID` | Invalid agent certificate | Agent mTLS cert invalid | +| `AUTH_AGENT_CERT_EXPIRED` | Agent certificate expired | Agent cert has expired | +| `AUTH_API_KEY_INVALID` | Invalid API key | API key not recognized | + +## Permission Errors (403) + +| Code | Message | Description | +|------|---------|-------------| +| `PERM_DENIED` | Permission denied | Generic permission denial | +| `PERM_RESOURCE_DENIED` | Access to resource denied | Cannot access specific resource | +| `PERM_ACTION_DENIED` | Action not permitted | Cannot perform specific action | +| `PERM_SCOPE_DENIED` | Outside 
permitted scope | Action outside user's scope | +| `PERM_SOD_VIOLATION` | Separation of duties violation | SoD prevents action | +| `PERM_SELF_APPROVAL` | Cannot approve own request | Self-approval not allowed | +| `PERM_TENANT_MISMATCH` | Tenant mismatch | Resource belongs to different tenant | + +## Validation Errors (400) + +| Code | Message | Description | +|------|---------|-------------| +| `VAL_REQUIRED_FIELD` | Required field missing | Field is required | +| `VAL_INVALID_FORMAT` | Invalid field format | Field format incorrect | +| `VAL_INVALID_VALUE` | Invalid field value | Value not in allowed set | +| `VAL_TOO_LONG` | Field value too long | Exceeds max length | +| `VAL_TOO_SHORT` | Field value too short | Below min length | +| `VAL_INVALID_UUID` | Invalid UUID format | Not a valid UUID | +| `VAL_INVALID_DIGEST` | Invalid digest format | Not a valid OCI digest | +| `VAL_INVALID_SEMVER` | Invalid semver format | Not valid semantic version | +| `VAL_INVALID_JSON` | Invalid JSON | Request body not valid JSON | +| `VAL_SCHEMA_MISMATCH` | Schema validation failed | Doesn't match schema | + +## Resource Errors (404, 409) + +| Code | Message | HTTP | Description | +|------|---------|------|-------------| +| `RES_NOT_FOUND` | Resource not found | 404 | Generic not found | +| `RES_ENVIRONMENT_NOT_FOUND` | Environment not found | 404 | Environment doesn't exist | +| `RES_RELEASE_NOT_FOUND` | Release not found | 404 | Release doesn't exist | +| `RES_PROMOTION_NOT_FOUND` | Promotion not found | 404 | Promotion doesn't exist | +| `RES_TARGET_NOT_FOUND` | Target not found | 404 | Target doesn't exist | +| `RES_AGENT_NOT_FOUND` | Agent not found | 404 | Agent doesn't exist | +| `RES_CONFLICT` | Resource conflict | 409 | Resource state conflict | +| `RES_ALREADY_EXISTS` | Resource already exists | 409 | Duplicate resource | +| `RES_VERSION_CONFLICT` | Version conflict | 409 | Optimistic lock failure | + +## Environment Errors (422) + +| Code | Message | Description | 
+|------|---------|-------------| +| `ENV_FROZEN` | Environment is frozen | Deployment blocked by freeze window | +| `ENV_FREEZE_ACTIVE` | Active freeze window | Cannot modify during freeze | +| `ENV_INVALID_ORDER` | Invalid environment order | Order index conflict | +| `ENV_CIRCULAR_PROMOTION` | Circular promotion path | Auto-promote creates cycle | +| `ENV_QUOTA_EXCEEDED` | Environment quota exceeded | Max environments reached | + +## Release Errors (422) + +| Code | Message | Description | +|------|---------|-------------| +| `REL_ALREADY_FINALIZED` | Release already finalized | Cannot modify finalized release | +| `REL_NOT_READY` | Release not ready | Release not in ready state | +| `REL_DIGEST_MISMATCH` | Digest mismatch | Resolved digest differs | +| `REL_TAG_NOT_FOUND` | Tag not found in registry | Cannot resolve tag | +| `REL_COMPONENT_MISSING` | Component not found | Referenced component missing | +| `REL_INVALID_STATUS_TRANSITION` | Invalid status transition | Status change not allowed | +| `REL_DEPRECATED` | Release deprecated | Cannot promote deprecated release | + +## Promotion Errors (422) + +| Code | Message | Description | +|------|---------|-------------| +| `PROM_ALREADY_EXISTS` | Promotion already pending | Duplicate promotion request | +| `PROM_NOT_PENDING` | Promotion not pending | Cannot approve/reject | +| `PROM_ALREADY_APPROVED` | Promotion already approved | Already approved | +| `PROM_ALREADY_REJECTED` | Promotion already rejected | Already rejected | +| `PROM_ALREADY_CANCELLED` | Promotion already cancelled | Already cancelled | +| `PROM_DEPLOYING` | Promotion is deploying | Cannot cancel during deploy | +| `PROM_INVALID_STATE` | Invalid promotion state | State doesn't allow action | +| `PROM_APPROVER_REQUIRED` | Additional approvers required | Insufficient approvals | +| `PROM_SKIP_ENVIRONMENT` | Cannot skip environments | Must promote sequentially | + +## Deployment Errors (422) + +| Code | Message | Description | 
+|------|---------|-------------| +| `DEPLOY_IN_PROGRESS` | Deployment in progress | Another deployment running | +| `DEPLOY_NO_TARGETS` | No targets available | No targets in environment | +| `DEPLOY_TARGET_UNHEALTHY` | Target unhealthy | Target failed health check | +| `DEPLOY_AGENT_UNAVAILABLE` | Agent unavailable | Required agent offline | +| `DEPLOY_ARTIFACT_MISSING` | Deployment artifact missing | Required artifact not found | +| `DEPLOY_TIMEOUT` | Deployment timeout | Exceeded timeout | +| `DEPLOY_PULL_FAILED` | Image pull failed | Cannot pull container image | +| `DEPLOY_DIGEST_VERIFICATION_FAILED` | Digest verification failed | Image tampered | +| `DEPLOY_HEALTH_CHECK_FAILED` | Health check failed | Post-deploy health failed | +| `DEPLOY_ROLLBACK_IN_PROGRESS` | Rollback in progress | Already rolling back | +| `DEPLOY_NOTHING_TO_ROLLBACK` | Nothing to rollback | No previous deployment | + +## Gate Errors (422) + +| Code | Message | Description | +|------|---------|-------------| +| `GATE_EVALUATION_FAILED` | Gate evaluation failed | Gate cannot be evaluated | +| `GATE_SECURITY_BLOCKED` | Blocked by security gate | Security policy violation | +| `GATE_POLICY_BLOCKED` | Blocked by policy gate | Custom policy violation | +| `GATE_APPROVAL_BLOCKED` | Blocked pending approval | Awaiting approval | +| `GATE_TIMEOUT` | Gate evaluation timeout | Evaluation exceeded timeout | + +## Agent Errors (422) + +| Code | Message | Description | +|------|---------|-------------| +| `AGT_REGISTRATION_FAILED` | Agent registration failed | Cannot register agent | +| `AGT_TOKEN_INVALID` | Invalid registration token | Bad or expired token | +| `AGT_TOKEN_USED` | Registration token already used | One-time token reused | +| `AGT_CERTIFICATE_FAILED` | Certificate issuance failed | Cannot issue certificate | +| `AGT_OFFLINE` | Agent offline | Agent not responding | +| `AGT_CAPABILITY_MISSING` | Missing capability | Agent lacks required capability | +| `AGT_TASK_FAILED` | Task 
execution failed | Agent task failed | +| `AGT_HEARTBEAT_TIMEOUT` | Heartbeat timeout | Agent heartbeat overdue | + +## Integration Errors (422) + +| Code | Message | Description | +|------|---------|-------------| +| `INT_CONNECTION_FAILED` | Connection failed | Cannot connect to integration | +| `INT_AUTH_FAILED` | Authentication failed | Integration auth failed | +| `INT_RATE_LIMITED` | Rate limited | Integration rate limit hit | +| `INT_TIMEOUT` | Integration timeout | Request timeout | +| `INT_INVALID_RESPONSE` | Invalid response | Unexpected response format | +| `INT_RESOURCE_NOT_FOUND` | External resource not found | Registry/SCM resource missing | + +## Workflow Errors (422) + +| Code | Message | Description | +|------|---------|-------------| +| `WF_TEMPLATE_NOT_FOUND` | Workflow template not found | Template doesn't exist | +| `WF_TEMPLATE_INVALID` | Invalid workflow template | Template validation failed | +| `WF_CYCLE_DETECTED` | Cycle detected in workflow | DAG contains cycle | +| `WF_STEP_FAILED` | Workflow step failed | Step execution failed | +| `WF_ALREADY_RUNNING` | Workflow already running | Duplicate workflow run | +| `WF_INVALID_STATE` | Invalid workflow state | Cannot perform action | +| `WF_EXPRESSION_ERROR` | Expression evaluation error | Bad expression | + +## System Errors (500) + +| Code | Message | Description | +|------|---------|-------------| +| `SYS_INTERNAL_ERROR` | Internal server error | Unexpected error | +| `SYS_DATABASE_ERROR` | Database error | Database operation failed | +| `SYS_STORAGE_ERROR` | Storage error | Storage operation failed | +| `SYS_VAULT_ERROR` | Vault error | Secret retrieval failed | +| `SYS_QUEUE_ERROR` | Queue error | Message queue failed | +| `SYS_SERVICE_UNAVAILABLE` | Service unavailable | Dependency unavailable | +| `SYS_OVERLOADED` | System overloaded | Capacity exceeded | + +## Example Error Responses + +### Validation Error + +```json +{ + "success": false, + "error": { + "code": "VAL_REQUIRED_FIELD", 
+ "message": "Validation failed", + "validationErrors": [ + { + "field": "releaseId", + "message": "Release ID is required", + "code": "VAL_REQUIRED_FIELD" + }, + { + "field": "targetEnvironmentId", + "message": "Invalid UUID format", + "code": "VAL_INVALID_UUID" + } + ] + }, + "meta": { + "requestId": "req-12345", + "timestamp": "2026-01-10T14:30:00Z" + } +} +``` + +### Permission Error + +```json +{ + "success": false, + "error": { + "code": "PERM_SOD_VIOLATION", + "message": "Separation of duties violation: requester cannot approve their own promotion", + "details": { + "promotionId": "promo-uuid", + "requesterId": "user-uuid", + "approverId": "user-uuid", + "environmentId": "env-uuid", + "requiresSoD": true + } + }, + "meta": { + "requestId": "req-12345", + "timestamp": "2026-01-10T14:30:00Z" + } +} +``` + +### Gate Block Error + +```json +{ + "success": false, + "error": { + "code": "GATE_SECURITY_BLOCKED", + "message": "Promotion blocked by security gate", + "details": { + "gateName": "security-gate", + "releaseId": "rel-uuid", + "targetEnvironment": "production", + "violations": [ + { + "type": "critical_vulnerability", + "count": 3, + "threshold": 0 + } + ] + } + }, + "meta": { + "requestId": "req-12345", + "timestamp": "2026-01-10T14:30:00Z" + } +} +``` + +## References + +- [API Overview](../api/overview.md) +- [Security Overview](../security/overview.md) diff --git a/docs/modules/release-orchestrator/appendices/evidence-schema.md b/docs/modules/release-orchestrator/appendices/evidence-schema.md new file mode 100644 index 000000000..8d50c7320 --- /dev/null +++ b/docs/modules/release-orchestrator/appendices/evidence-schema.md @@ -0,0 +1,549 @@ +# Evidence Packet Schema + +## Overview + +Evidence packets are cryptographically signed, immutable records of deployment decisions and outcomes. They provide audit-grade proof of who did what, when, and why. 
+ +## Evidence Packet Types + +| Type | Description | Generated When | +|------|-------------|----------------| +| `release_decision` | Promotion decision evidence | Promotion approved/rejected | +| `deployment` | Deployment execution evidence | Deployment completes | +| `rollback` | Rollback evidence | Rollback completes | +| `ab_promotion` | A/B release promotion evidence | A/B promotion completes | + +## Schema Definition + +### Evidence Packet Structure + +```typescript +interface EvidencePacket { + // Identification + id: UUID; + version: "1.0"; + type: EvidencePacketType; + + // Metadata + generatedAt: DateTime; + generatorVersion: string; + tenantId: UUID; + + // Content + content: EvidenceContent; + + // Integrity + contentHash: string; // SHA-256 of canonical JSON content + signature: string; // Base64-encoded signature + signatureAlgorithm: string; // "RS256", "ES256" + signerKeyRef: string; // Reference to signing key +} + +type EvidencePacketType = + | "release_decision" + | "deployment" + | "rollback" + | "ab_promotion"; +``` + +### Evidence Content + +```typescript +interface EvidenceContent { + // What was released + release: ReleaseEvidence; + + // Where it was released + environment: EnvironmentEvidence; + + // Who requested and approved + actors: ActorEvidence; + + // Why it was allowed + decision: DecisionEvidence; + + // How it was executed (deployment only) + execution?: ExecutionEvidence; + + // Previous state (for rollback) + previous?: PreviousStateEvidence; +} +``` + +### Release Evidence + +```typescript +interface ReleaseEvidence { + id: UUID; + name: string; + displayName: string; + createdAt: DateTime; + createdBy: ActorRef; + + components: Array<{ + id: UUID; + name: string; + digest: string; + semver: string; + tag: string; + role: "primary" | "sidecar" | "init" | "migration"; + }>; + + sourceRef?: { + scmIntegrationId?: UUID; + repository?: string; + commitSha?: string; + branch?: string; + ciIntegrationId?: UUID; + buildId?: string; 
+ pipelineUrl?: string; + }; +} +``` + +### Environment Evidence + +```typescript +interface EnvironmentEvidence { + id: UUID; + name: string; + displayName: string; + orderIndex: number; + + targets: Array<{ + id: UUID; + name: string; + type: string; + healthStatus: string; + }>; + + configuration: { + requiredApprovals: number; + requireSeparationOfDuties: boolean; + promotionPolicy?: string; + deploymentTimeout: number; + }; +} +``` + +### Actor Evidence + +```typescript +interface ActorEvidence { + requester: ActorRef; + requestReason: string; + requestedAt: DateTime; + + approvers: Array<{ + actor: ActorRef; + action: "approved" | "rejected"; + comment?: string; + timestamp: DateTime; + roles: string[]; + }>; + + deployer?: { + agent: AgentRef; + triggeredBy: ActorRef; + startedAt: DateTime; + }; +} + +interface ActorRef { + id: UUID; + type: "user" | "system" | "agent"; + name: string; + email?: string; +} + +interface AgentRef { + id: UUID; + name: string; + version: string; +} +``` + +### Decision Evidence + +```typescript +interface DecisionEvidence { + promotionId: UUID; + decision: "allow" | "block"; + decidedAt: DateTime; + + gateResults: Array<{ + gateName: string; + gateType: string; + passed: boolean; + blocking: boolean; + message: string; + evaluatedAt: DateTime; + details: object; + }>; + + freezeWindowCheck: { + checked: boolean; + windowActive: boolean; + windowId?: UUID; + exemption?: { + grantedBy: UUID; + reason: string; + }; + }; + + separationOfDuties: { + required: boolean; + satisfied: boolean; + requesterIds: UUID[]; + approverIds: UUID[]; + }; +} +``` + +### Execution Evidence + +```typescript +interface ExecutionEvidence { + deploymentJobId: UUID; + strategy: string; + startedAt: DateTime; + completedAt: DateTime; + status: "succeeded" | "failed" | "rolled_back"; + + tasks: Array<{ + targetId: UUID; + targetName: string; + agentId: UUID; + status: string; + startedAt: DateTime; + completedAt: DateTime; + digest: string; + 
stickerWritten: boolean; + error?: string; + }>; + + artifacts: Array<{ + name: string; + type: string; + sha256: string; + storageRef: string; + }>; + + metrics: { + totalTasks: number; + succeededTasks: number; + failedTasks: number; + totalDurationSeconds: number; + }; +} +``` + +### Previous State Evidence + +```typescript +interface PreviousStateEvidence { + releaseId: UUID; + releaseName: string; + deployedAt: DateTime; + deployedBy: ActorRef; + components: Array<{ + name: string; + digest: string; + }>; +} +``` + +## Example Evidence Packet + +```json +{ + "id": "evid-12345-uuid", + "version": "1.0", + "type": "deployment", + "generatedAt": "2026-01-10T14:35:00Z", + "generatorVersion": "stella-evidence-generator@1.5.0", + "tenantId": "tenant-uuid", + + "content": { + "release": { + "id": "rel-uuid", + "name": "myapp-v2.3.1", + "displayName": "MyApp v2.3.1", + "createdAt": "2026-01-10T10:00:00Z", + "createdBy": { + "id": "user-uuid", + "type": "user", + "name": "John Doe", + "email": "john@example.com" + }, + "components": [ + { + "id": "comp-api-uuid", + "name": "api", + "digest": "sha256:abc123def456...", + "semver": "2.3.1", + "tag": "v2.3.1", + "role": "primary" + }, + { + "id": "comp-worker-uuid", + "name": "worker", + "digest": "sha256:789xyz...", + "semver": "2.3.1", + "tag": "v2.3.1", + "role": "primary" + } + ], + "sourceRef": { + "repository": "github.com/myorg/myapp", + "commitSha": "abc123", + "branch": "main", + "buildId": "build-456" + } + }, + + "environment": { + "id": "env-prod-uuid", + "name": "production", + "displayName": "Production", + "orderIndex": 2, + "targets": [ + { + "id": "target-1-uuid", + "name": "prod-web-01", + "type": "compose_host", + "healthStatus": "healthy" + }, + { + "id": "target-2-uuid", + "name": "prod-web-02", + "type": "compose_host", + "healthStatus": "healthy" + } + ], + "configuration": { + "requiredApprovals": 2, + "requireSeparationOfDuties": true, + "deploymentTimeout": 600 + } + }, + + "actors": { + 
"requester": { + "id": "user-john-uuid", + "type": "user", + "name": "John Doe", + "email": "john@example.com" + }, + "requestReason": "Release v2.3.1 with performance improvements", + "requestedAt": "2026-01-10T12:00:00Z", + "approvers": [ + { + "actor": { + "id": "user-jane-uuid", + "type": "user", + "name": "Jane Smith", + "email": "jane@example.com" + }, + "action": "approved", + "comment": "LGTM, tests passed", + "timestamp": "2026-01-10T13:00:00Z", + "roles": ["release_manager"] + }, + { + "actor": { + "id": "user-bob-uuid", + "type": "user", + "name": "Bob Johnson", + "email": "bob@example.com" + }, + "action": "approved", + "comment": "Approved for production", + "timestamp": "2026-01-10T13:30:00Z", + "roles": ["approver"] + } + ], + "deployer": { + "agent": { + "id": "agent-prod-uuid", + "name": "prod-agent-01", + "version": "1.5.0" + }, + "triggeredBy": { + "id": "system", + "type": "system", + "name": "Stella Orchestrator" + }, + "startedAt": "2026-01-10T14:00:00Z" + } + }, + + "decision": { + "promotionId": "promo-uuid", + "decision": "allow", + "decidedAt": "2026-01-10T13:55:00Z", + "gateResults": [ + { + "gateName": "security-gate", + "gateType": "security", + "passed": true, + "blocking": true, + "message": "No critical or high vulnerabilities", + "evaluatedAt": "2026-01-10T13:50:00Z", + "details": { + "critical": 0, + "high": 0, + "medium": 5, + "low": 12 + } + }, + { + "gateName": "approval-gate", + "gateType": "approval", + "passed": true, + "blocking": true, + "message": "2/2 required approvals received", + "evaluatedAt": "2026-01-10T13:55:00Z", + "details": { + "required": 2, + "received": 2 + } + } + ], + "freezeWindowCheck": { + "checked": true, + "windowActive": false + }, + "separationOfDuties": { + "required": true, + "satisfied": true, + "requesterIds": ["user-john-uuid"], + "approverIds": ["user-jane-uuid", "user-bob-uuid"] + } + }, + + "execution": { + "deploymentJobId": "job-uuid", + "strategy": "rolling", + "startedAt": 
"2026-01-10T14:00:00Z", + "completedAt": "2026-01-10T14:35:00Z", + "status": "succeeded", + "tasks": [ + { + "targetId": "target-1-uuid", + "targetName": "prod-web-01", + "agentId": "agent-prod-uuid", + "status": "succeeded", + "startedAt": "2026-01-10T14:00:00Z", + "completedAt": "2026-01-10T14:15:00Z", + "digest": "sha256:abc123def456...", + "stickerWritten": true + }, + { + "targetId": "target-2-uuid", + "targetName": "prod-web-02", + "agentId": "agent-prod-uuid", + "status": "succeeded", + "startedAt": "2026-01-10T14:20:00Z", + "completedAt": "2026-01-10T14:35:00Z", + "digest": "sha256:abc123def456...", + "stickerWritten": true + } + ], + "artifacts": [ + { + "name": "compose.stella.lock.yml", + "type": "compose-lock", + "sha256": "checksum...", + "storageRef": "s3://artifacts/job-uuid/compose.stella.lock.yml" + } + ], + "metrics": { + "totalTasks": 2, + "succeededTasks": 2, + "failedTasks": 0, + "totalDurationSeconds": 2100 + } + } + }, + + "contentHash": "sha256:content-hash...", + "signature": "base64-signature...", + "signatureAlgorithm": "RS256", + "signerKeyRef": "stella/signing/prod-key-2026" +} +``` + +## Signature Verification + +```typescript +async function verifyEvidencePacket(packet: EvidencePacket): Promise { + // 1. Verify content hash + const canonicalContent = canonicalize(packet.content); + const computedHash = sha256(canonicalContent); + + if (computedHash !== packet.contentHash) { + return { valid: false, error: "Content hash mismatch" }; + } + + // 2. Get signing key + const publicKey = await getPublicKey(packet.signerKeyRef); + + // 3. 
Verify signature + const signatureValid = await verify( + packet.signature, + packet.contentHash, + publicKey, + packet.signatureAlgorithm + ); + + if (!signatureValid) { + return { valid: false, error: "Invalid signature" }; + } + + return { valid: true }; +} +``` + +## Storage + +Evidence packets are stored in an append-only table: + +```sql +CREATE TABLE release.evidence_packets ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id), + promotion_id UUID NOT NULL REFERENCES release.promotions(id), + type TEXT NOT NULL, + version TEXT NOT NULL DEFAULT '1.0', + content JSONB NOT NULL, + content_hash TEXT NOT NULL, + signature TEXT NOT NULL, + signature_algorithm TEXT NOT NULL, + signer_key_ref TEXT NOT NULL, + generated_at TIMESTAMPTZ NOT NULL, + generator_version TEXT NOT NULL, + created_at TIMESTAMPTZ NOT NULL DEFAULT now() + -- Note: No updated_at - packets are immutable +); + +-- Prevent modifications +REVOKE UPDATE, DELETE ON release.evidence_packets FROM app_role; +``` + +## Export Formats + +Evidence packets can be exported in multiple formats: + +| Format | Use Case | +|--------|----------| +| JSON | API consumption, archival | +| PDF | Human-readable compliance reports | +| CSV | Spreadsheet analysis | +| SLSA | SLSA provenance format | + +## References + +- [Security Overview](../security/overview.md) +- [Deployment Artifacts](../deployment/artifacts.md) +- [Audit Trail](../security/audit-trail.md) diff --git a/docs/modules/release-orchestrator/appendices/glossary.md b/docs/modules/release-orchestrator/appendices/glossary.md new file mode 100644 index 000000000..0a59a5e95 --- /dev/null +++ b/docs/modules/release-orchestrator/appendices/glossary.md @@ -0,0 +1,235 @@ +# Glossary + +## Core Concepts + +### Agent +A software component installed on deployment targets that receives and executes deployment tasks. Agents communicate with the orchestrator via mTLS and execute deployments locally on the target. 
+ +### Approval +A human decision to authorize a promotion request. Approvals may require multiple approvers and enforce separation of duties. + +### Approval Policy +Rules defining who can approve promotions to specific environments, including required approval counts and SoD requirements. + +### Blue-Green Deployment +A deployment strategy using two identical production environments. Traffic switches from "blue" (current) to "green" (new) after validation. + +### Canary Deployment +A deployment strategy that gradually rolls out changes to a small subset of targets before full deployment, allowing validation with real traffic. + +### Channel +A version stream for components (e.g., "stable", "beta", "nightly"). Each channel tracks the latest compatible version. + +### Component +A deployable unit mapped to a container image repository. Components have versions tracked via digest. + +### Compose Lock +A Docker Compose file with all image references pinned to specific digests, ensuring reproducible deployments. + +### Connector +A plugin that integrates Release Orchestrator with external systems (registries, CI/CD, notifications, etc.). + +### Decision Record +An immutable record of all gate evaluations and conditions considered when making a promotion decision. + +### Deployment Job +A unit of work representing the deployment of a release to an environment. Contains multiple deployment tasks. + +### Deployment Task +A single target-level deployment operation within a deployment job. + +### Digest +A cryptographic hash (SHA-256) that uniquely identifies a container image. Format: `sha256:abc123...` + +### Drift +A mismatch between the expected deployed version (from version sticker) and the actual running version on a target. + +### Environment +A logical grouping of deployment targets representing a stage in the promotion pipeline (e.g., dev, staging, production). 
+ +### Evidence Packet +An immutable, cryptographically signed record of deployment decisions and outcomes for audit purposes. + +### Freeze Window +A time period during which deployments to an environment are blocked (e.g., holiday code freeze). + +### Gate +A checkpoint in the promotion workflow that must pass before deployment proceeds. Types include security gates, approval gates, and custom policy gates. + +### Promotion +The process of moving a release from one environment to another, subject to gates and approvals. + +### Release +A versioned bundle of component digests representing a deployable unit. Releases are immutable once created. + +### Rolling Deployment +A deployment strategy that updates targets in batches, maintaining availability throughout the process. + +### Rollback +The process of reverting to a previous release version when a deployment fails or causes issues. + +### Security Gate +An automated gate that evaluates security policies (vulnerability thresholds, compliance requirements) before allowing promotion. + +### Separation of Duties (SoD) +A security principle requiring that the person who requests a promotion cannot be the same person who approves it. + +### Step +A single unit of work within a workflow template. Steps have types (deploy, approve, notify, etc.) and can have dependencies. + +### Target +A specific deployment destination (host, service, container) within an environment. + +### Tenant +An isolated organizational unit with its own environments, releases, and configurations. Multi-tenancy ensures data isolation. + +### Version Map +A mapping of image tags to digests for a component, allowing tag-based references while maintaining digest-based deployments. + +### Version Sticker +Metadata placed on deployment targets indicating the currently deployed release and digest. + +### Workflow +A DAG (Directed Acyclic Graph) of steps defining the deployment process, including gates, approvals, and verification. 
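Because a workflow must remain acyclic, engines typically validate the step graph before any step runs (surfacing `WF_CYCLE_DETECTED` otherwise). A minimal TypeScript sketch using Kahn's algorithm — the simplified `Step` shape with a `dependsOn` list is an assumption for illustration:

```typescript
interface Step {
  id: string;
  dependsOn: string[];
}

// Kahn's algorithm: returns a valid execution order for the DAG,
// or null if the step graph contains a cycle (WF_CYCLE_DETECTED).
function topologicalOrder(steps: Step[]): string[] | null {
  const indegree = new Map<string, number>(steps.map((s) => [s.id, 0] as [string, number]));
  const dependents = new Map<string, string[]>();
  for (const s of steps) {
    for (const dep of s.dependsOn) {
      indegree.set(s.id, (indegree.get(s.id) ?? 0) + 1);
      dependents.set(dep, [...(dependents.get(dep) ?? []), s.id]);
    }
  }
  // Start with steps that have no unmet dependencies.
  const ready = steps.filter((s) => (indegree.get(s.id) ?? 0) === 0).map((s) => s.id);
  const order: string[] = [];
  while (ready.length > 0) {
    const id = ready.shift()!;
    order.push(id);
    for (const next of dependents.get(id) ?? []) {
      const d = (indegree.get(next) ?? 0) - 1;
      indegree.set(next, d);
      if (d === 0) ready.push(next);
    }
  }
  // If any step never became ready, a cycle exists.
  return order.length === steps.length ? order : null;
}
```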
+ +### Workflow Template +A reusable workflow definition that can be customized for specific deployment scenarios. + +## Module Abbreviations + +| Abbreviation | Full Name | Description | +|--------------|-----------|-------------| +| INTHUB | Integration Hub | External system integration | +| ENVMGR | Environment Manager | Environment and target management | +| RELMAN | Release Management | Component and release management | +| WORKFL | Workflow Engine | Workflow execution | +| PROMOT | Promotion & Approval | Promotion and approval handling | +| DEPLOY | Deployment Execution | Deployment orchestration | +| AGENTS | Deployment Agents | Agent management | +| PROGDL | Progressive Delivery | A/B and canary releases | +| RELEVI | Release Evidence | Audit and compliance | +| PLUGIN | Plugin Infrastructure | Plugin system | + +## Deployment Strategies + +| Strategy | Description | +|----------|-------------| +| All-at-once | Deploy to all targets simultaneously | +| Rolling | Deploy in batches with availability | +| Canary | Gradual rollout with metrics validation | +| Blue-Green | Parallel environment with traffic switch | + +## Status Values + +### Promotion Status + +| Status | Description | +|--------|-------------| +| `pending` | Promotion created, not yet evaluated | +| `pending_approval` | Waiting for human approval | +| `approved` | Approved, ready for deployment | +| `rejected` | Rejected by approver | +| `deploying` | Deployment in progress | +| `completed` | Successfully deployed | +| `failed` | Deployment failed | +| `cancelled` | Cancelled by user | + +### Deployment Job Status + +| Status | Description | +|--------|-------------| +| `pending` | Job created, not started | +| `preparing` | Generating artifacts | +| `running` | Tasks executing | +| `completing` | Verifying deployment | +| `completed` | Successfully completed | +| `failed` | Deployment failed | +| `rolling_back` | Rollback in progress | +| `rolled_back` | Rollback completed | + +### Agent 
Status + +| Status | Description | +|--------|-------------| +| `online` | Agent connected and healthy | +| `offline` | Agent not connected | +| `degraded` | Agent connected but reporting issues | + +### Target Health Status + +| Status | Description | +|--------|-------------| +| `healthy` | Target responding correctly | +| `unhealthy` | Target failing health checks | +| `unknown` | Health status not determined | + +## API Error Codes + +| Code | Description | +|------|-------------| +| `RELEASE_NOT_FOUND` | Release ID does not exist | +| `ENVIRONMENT_NOT_FOUND` | Environment ID does not exist | +| `PROMOTION_BLOCKED` | Promotion blocked by gate or freeze | +| `APPROVAL_REQUIRED` | Promotion requires approval | +| `INSUFFICIENT_APPROVALS` | Not enough approvals | +| `SOD_VIOLATION` | Separation of duties violated | +| `FREEZE_WINDOW_ACTIVE` | Environment in freeze window | +| `SECURITY_GATE_FAILED` | Security requirements not met | +| `NO_AGENT_AVAILABLE` | No agent available for target | +| `DEPLOYMENT_IN_PROGRESS` | Another deployment running | +| `ROLLBACK_NOT_POSSIBLE` | No previous version to rollback to | + +## Integration Types + +| Type | Category | Description | +|------|----------|-------------| +| `docker-registry` | Registry | Docker Registry v2 | +| `ecr` | Registry | AWS ECR | +| `acr` | Registry | Azure Container Registry | +| `gcr` | Registry | Google Container Registry | +| `harbor` | Registry | Harbor Registry | +| `gitlab-ci` | CI/CD | GitLab CI/CD | +| `github-actions` | CI/CD | GitHub Actions | +| `jenkins` | CI/CD | Jenkins | +| `slack` | Notification | Slack | +| `teams` | Notification | Microsoft Teams | +| `email` | Notification | Email (SMTP) | +| `hashicorp-vault` | Secrets | HashiCorp Vault | +| `prometheus` | Metrics | Prometheus | + +## Workflow Step Types + +| Type | Category | Description | +|------|----------|-------------| +| `approval` | Control | Wait for human approval | +| `wait` | Control | Wait for duration | +| `condition` 
| Control | Branch based on condition | +| `parallel` | Control | Execute children in parallel | +| `security-gate` | Gate | Evaluate security policy | +| `custom-gate` | Gate | Custom OPA policy | +| `freeze-check` | Gate | Check freeze windows | +| `deploy-docker` | Deploy | Deploy single container | +| `deploy-compose` | Deploy | Deploy Compose stack | +| `health-check` | Verify | HTTP/TCP health check | +| `smoke-test` | Verify | Run smoke tests | +| `notify` | Notify | Send notification | +| `webhook` | Integration | Call external webhook | +| `trigger-ci` | Integration | Trigger CI pipeline | +| `rollback` | Recovery | Rollback deployment | + +## Security Terms + +| Term | Description | +|------|-------------| +| mTLS | Mutual TLS - both client and server authenticate with certificates | +| JWT | JSON Web Token - used for API authentication | +| RBAC | Role-Based Access Control | +| OPA | Open Policy Agent - policy evaluation engine | +| SoD | Separation of Duties | +| PEP | Policy Enforcement Point | + +## References + +- [Design Principles](../design/principles.md) +- [API Overview](../api/overview.md) +- [Security Overview](../security/overview.md) diff --git a/docs/modules/release-orchestrator/architecture.md b/docs/modules/release-orchestrator/architecture.md new file mode 100644 index 000000000..5af298546 --- /dev/null +++ b/docs/modules/release-orchestrator/architecture.md @@ -0,0 +1,410 @@ +# Release Orchestrator Architecture + +> Technical architecture specification for the Release Orchestrator — Stella Ops Suite's central release control plane for non-Kubernetes container estates. + +**Status:** Planned (not yet implemented) + +## Overview + +The Release Orchestrator transforms Stella Ops Suite from a vulnerability scanning platform into a centralized, auditable release control plane. 
It sits between CI systems and runtime targets, governing promotion across environments, enforcing security and policy gates, and producing verifiable evidence for every release decision. + +### Core Value Proposition + +- **Release orchestration** — UI-driven promotion (Dev → Stage → Prod), approvals, policy gates, rollbacks +- **Security decisioning as a gate** — Scan on build, evaluate on release, re-evaluate on CVE updates +- **OCI-digest-first releases** — Immutable digest-based release identity +- **Toolchain-agnostic integrations** — Plug into any SCM, CI, registry, secrets system +- **Auditability + standards** — Evidence packets, SBOM/VEX/attestation support, deterministic replay + +## Design Principles + +1. **Digest-First Release Identity** — A release is an immutable set of OCI digests, never mutable tags. Tags are resolved to digests at release creation time. + +2. **Pluggable Everything, Stable Core** — Integrations are plugins; the core orchestration engine is stable. Plugins contribute UI screens, connector logic, step types, and agent types. + +3. **Evidence for Every Decision** — Every deployment/promotion produces an immutable evidence record containing who, what, why, how, and when. + +4. **No Feature Gating** — All plans include all features. Limits are only: environments, new digests/day, fair use on deployments. + +5. **Offline-First Operation** — All core operations work in air-gapped environments. Plugins may require connectivity; core does not. + +6. **Immutable Generated Artifacts** — Every deployment generates and stores immutable artifacts (compose lockfiles, scripts, evidence). 
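Principle 1 can be made concrete with a small sketch: a release draft is only sealed once every component reference is already digest-pinned, and the sealed digest set is frozen. The `sealRelease` helper and the reference-format check below are illustrative assumptions, not the actual implementation:

```typescript
// Digest-first identity (Principle 1): accept only immutable
// "repo@sha256:<64 hex>" references when sealing a release; mutable
// tag references must be resolved to digests before this point.
const DIGEST_REF = /^[^\s@]+@sha256:[0-9a-f]{64}$/;

interface ReleaseDraft {
  version: string;
  components: Record<string, string>; // component name -> image reference
}

function sealRelease(draft: ReleaseDraft): Readonly<ReleaseDraft> {
  for (const [name, ref] of Object.entries(draft.components)) {
    if (!DIGEST_REF.test(ref)) {
      throw new Error(`component "${name}" is not pinned to a digest: ${ref}`);
    }
  }
  // Freeze so the digest set cannot be mutated after creation.
  return Object.freeze({ ...draft, components: Object.freeze({ ...draft.components }) });
}
```

Anything still referenced by tag fails fast at release creation, which is exactly where the spec says tag-to-digest resolution must have already happened.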
+ +## Platform Themes + +The Release Orchestrator introduces ten new functional themes: + +| Theme | Purpose | Key Modules | +|-------|---------|-------------| +| **INTHUB** | Integration hub | Integration Manager, Connection Profiles, Connector Runtime | +| **ENVMGR** | Environment management | Environment Manager, Target Registry, Agent Manager | +| **RELMAN** | Release management | Component Registry, Version Manager, Release Manager | +| **WORKFL** | Workflow engine | Workflow Designer, Workflow Engine, Step Executor | +| **PROMOT** | Promotion and approval | Promotion Manager, Approval Gateway, Decision Engine | +| **DEPLOY** | Deployment execution | Deploy Orchestrator, Target Executor, Artifact Generator | +| **AGENTS** | Deployment agents | Agent Core, Docker/Compose/ECS/Nomad agents | +| **PROGDL** | Progressive delivery | A/B Manager, Traffic Router, Canary Controller | +| **RELEVI** | Release evidence | Evidence Collector, Sticker Writer, Audit Exporter | +| **PLUGIN** | Plugin infrastructure | Plugin Registry, Plugin Loader, Plugin SDK | + +## Components + +``` +ReleaseOrchestrator/ +├── __Libraries/ +│ ├── StellaOps.ReleaseOrchestrator.Core/ # Core domain models +│ ├── StellaOps.ReleaseOrchestrator.Workflow/ # DAG workflow engine +│ ├── StellaOps.ReleaseOrchestrator.Promotion/ # Promotion logic +│ ├── StellaOps.ReleaseOrchestrator.Deploy/ # Deployment coordination +│ ├── StellaOps.ReleaseOrchestrator.Evidence/ # Evidence generation +│ ├── StellaOps.ReleaseOrchestrator.Plugin/ # Plugin infrastructure +│ └── StellaOps.ReleaseOrchestrator.Integration/ # Integration connectors +├── StellaOps.ReleaseOrchestrator.WebService/ # HTTP API +├── StellaOps.ReleaseOrchestrator.Worker/ # Background processing +├── StellaOps.Agent.Core/ # Agent base framework +├── StellaOps.Agent.Docker/ # Docker host agent +├── StellaOps.Agent.Compose/ # Docker Compose agent +├── StellaOps.Agent.SSH/ # SSH agentless executor +├── StellaOps.Agent.WinRM/ # WinRM agentless executor 
+├── StellaOps.Agent.ECS/ # AWS ECS agent +├── StellaOps.Agent.Nomad/ # HashiCorp Nomad agent +└── __Tests/ + └── StellaOps.ReleaseOrchestrator.*.Tests/ +``` + +## Data Flow + +### Release Orchestration Flow + +``` +CI Build → Registry Push → Webhook → Stella Scan → Create Release → +Request Promotion → Gate Evaluation → Decision Record → +Deploy via Agent → Version Sticker → Evidence Packet +``` + +### Detailed Flow + +1. **CI pushes image** to registry by digest; triggers webhook to Stella +2. **Stella scans** the new digest (if not already scanned); stores verdict +3. **Release created** bundling component digests with semantic version +4. **Promotion requested** to move release from source → target environment +5. **Gate evaluation** runs: security verdict, approval count, freeze windows, custom policies +6. **Decision record** produced with evidence refs and signed +7. **Deployment executed** via agent to target (Docker/Compose/ECS/Nomad) +8. **Version sticker** written to target for drift detection +9. **Evidence packet** sealed and stored + +## Key Abstractions + +### Environment + +```csharp +public sealed record Environment +{ + public required Guid Id { get; init; } + public required Guid TenantId { get; init; } + public required string Name { get; init; } // "dev", "stage", "prod" + public required string Slug { get; init; } // URL-safe identifier + public required int PromotionOrder { get; init; } // 1, 2, 3... 
+ public required FreezeWindow[] FreezeWindows { get; init; } + public required ApprovalPolicy ApprovalPolicy { get; init; } + public required bool IsProduction { get; init; } + public EnvironmentState State { get; init; } // Active, Frozen, Retired +} +``` + +### Release + +```csharp +public sealed record Release +{ + public required Guid Id { get; init; } + public required Guid TenantId { get; init; } + public required string Version { get; init; } // SemVer: "2.3.1" + public required string Name { get; init; } // Display name + public required ImmutableDictionary<string, ComponentDigest> Components { get; init; } + public required string SourceRef { get; init; } // Git SHA or tag + public required DateTimeOffset CreatedAt { get; init; } + public required Guid CreatedBy { get; init; } + public ReleaseState State { get; init; } // Draft, Active, Deprecated +} + +public sealed record ComponentDigest +{ + public required string Repository { get; init; } // registry.example.com/app/api + public required string Digest { get; init; } // sha256:abc123... + public required string? ResolvedFromTag { get; init; } // Optional: "v2.3.1" +} +``` + +### Promotion + +```csharp +public sealed record Promotion +{ + public required Guid Id { get; init; } + public required Guid TenantId { get; init; } + public required Guid ReleaseId { get; init; } + public required Guid SourceEnvironmentId { get; init; } + public required Guid TargetEnvironmentId { get; init; } + public required Guid RequestedBy { get; init; } + public required DateTimeOffset RequestedAt { get; init; } + public PromotionState State { get; init; } // Pending, Approved, Rejected, Deployed, RolledBack + public required ImmutableArray<GateResult> GateResults { get; init; } + public required ImmutableArray<Approval> Approvals { get; init; } + public required DecisionRecord?
Decision { get; init; } +} +``` + +### Workflow + +```csharp +public sealed record Workflow +{ + public required Guid Id { get; init; } + public required string Name { get; init; } + public required ImmutableArray<WorkflowStep> Steps { get; init; } + public required ImmutableDictionary<string, string[]> DependencyGraph { get; init; } +} + +public sealed record WorkflowStep +{ + public required string Id { get; init; } + public required string Type { get; init; } // "script", "approval", "deploy", "gate" + public required StepProvider Provider { get; init; } + public required ImmutableDictionary<string, string> Config { get; init; } + public required string[] DependsOn { get; init; } + public StepState State { get; init; } +} +``` + +### Target + +```csharp +public sealed record Target +{ + public required Guid Id { get; init; } + public required Guid TenantId { get; init; } + public required Guid EnvironmentId { get; init; } + public required string Name { get; init; } + public required TargetType Type { get; init; } // DockerHost, ComposeHost, ECSService, NomadJob + public required ImmutableDictionary<string, string> Labels { get; init; } + public required Guid?
AgentId { get; init; } // Null for agentless + public required TargetState State { get; init; } + public required HealthStatus Health { get; init; } +} + +public enum TargetType +{ + DockerHost, + ComposeHost, + ECSService, + NomadJob, + SSHRemote, + WinRMRemote +} +``` + +### Agent + +```csharp +public sealed record Agent +{ + public required Guid Id { get; init; } + public required Guid TenantId { get; init; } + public required string Name { get; init; } + public required string Version { get; init; } + public required ImmutableArray<string> Capabilities { get; init; } + public required DateTimeOffset LastHeartbeat { get; init; } + public required AgentState State { get; init; } // Online, Offline, Degraded + public required ImmutableDictionary<string, string> Labels { get; init; } +} +``` + +## Database Schema + +| Table | Purpose | +|-------|---------| +| `release.environments` | Environment definitions with freeze windows | +| `release.targets` | Deployment targets within environments | +| `release.agents` | Registered deployment agents | +| `release.components` | Component definitions (service → repository mapping) | +| `release.releases` | Release bundles (version → component digests) | +| `release.promotions` | Promotion requests and state | +| `release.approvals` | Approval records | +| `release.workflows` | Workflow templates | +| `release.workflow_runs` | Workflow execution state | +| `release.deployment_jobs` | Deployment job records | +| `release.evidence_packets` | Sealed evidence records | +| `release.integrations` | Integration configurations | +| `release.plugins` | Plugin registrations | + +## Gate Types + +| Gate | Purpose | Evaluation | +|------|---------|------------| +| **Security** | Check scan verdict | Query latest scan for release digest; block on critical/high reachable | +| **Approval** | Human sign-off | Count approvals; check SoD rules | +| **FreezeWindow** | Calendar-based blocking | Check target environment freeze windows | +| **PreviousEnvironment** |
Require prior deployment | Verify release deployed to source environment | +| **Policy** | Custom OPA/Rego rules | Evaluate policy with promotion context | +| **HealthCheck** | Target health | Verify target is healthy before deploy | + +## Plugin System (Three-Surface Model) + +Plugins contribute through three surfaces: + +### 1. Manifest (Static Declaration) + +```yaml +# plugin-manifest.yaml +name: github-integration +version: 1.0.0 +provider: StellaOps.Integration.GitHub.Plugin +capabilities: + integrations: + - type: scm + id: github + displayName: GitHub + steps: + - type: github-status + displayName: Update GitHub Status + gates: + - type: github-check + displayName: GitHub Check Required +``` + +### 2. Connector Runtime (Dynamic Execution) + +```csharp +// Result types (ConnectionTestResult, DiscoveredResource, CommitInfo) are representative. +public interface IIntegrationConnector +{ + Task<ConnectionTestResult> TestConnectionAsync(CancellationToken ct); + Task<HealthStatus> GetHealthAsync(CancellationToken ct); + Task<IReadOnlyList<DiscoveredResource>> DiscoverResourcesAsync(string resourceType, CancellationToken ct); +} + +public interface ISCMConnector : IIntegrationConnector +{ + Task<CommitInfo> GetCommitAsync(string commitRef, CancellationToken ct); + Task CreateCommitStatusAsync(string commit, CommitStatus status, CancellationToken ct); +} + +public interface IRegistryConnector : IIntegrationConnector +{ + Task<string> ResolveDigestAsync(string imageRef, CancellationToken ct); + Task<bool> VerifyDigestAsync(string imageRef, string expectedDigest, CancellationToken ct); +} +``` + +### 3. Step Provider (Execution Contract) + +```csharp +public interface IStepProvider +{ + StepExecutionCharacteristics Characteristics { get; } + Task<StepResult> ExecuteAsync(StepContext context, CancellationToken ct); + Task RollbackAsync(StepContext context, CancellationToken ct); +} + +public sealed record StepExecutionCharacteristics +{ + public bool IsIdempotent { get; init; } + public bool SupportsRollback { get; init; } + public TimeSpan DefaultTimeout { get; init; } + public ResourceRequirements Resources { get; init; } +} +``` + +## Invariants + +1.
**Release identity is immutable** — Once created, a release's component digests cannot be changed. Create a new release instead. + +2. **Promotions are append-only** — Promotion state transitions are recorded; no edits or deletions. + +3. **Evidence packets are sealed** — Evidence is cryptographically signed and stored immutably. + +4. **Digest verification at deploy time** — Agents verify image digests at pull time; mismatch fails deployment. + +5. **Separation of duties enforced** — Requester cannot be sole approver for production promotions. + +6. **Workflow execution is deterministic** — Same inputs produce same execution order and outputs. + +## Error Handling + +- **Transient failures** — Retry with exponential backoff; circuit breaker for repeated failures +- **Agent disconnection** — Mark agent offline; reassign pending tasks to other agents +- **Deployment failure** — Automatic rollback if configured; otherwise mark promotion as failed +- **Gate failure** — Block promotion; require manual intervention or re-evaluation + +## Observability + +### Metrics + +- `release_promotions_total` — Counter by environment and outcome +- `release_deployments_duration_seconds` — Histogram of deployment times +- `release_gate_evaluations_total` — Counter by gate type and result +- `release_agents_online` — Gauge of online agents +- `release_workflow_steps_duration_seconds` — Histogram by step type + +### Traces + +- `promotion.request` — Span for promotion request handling +- `gate.evaluate` — Span for each gate evaluation +- `deployment.execute` — Span for deployment execution +- `agent.task` — Span for agent task execution + +### Logs + +- Structured logs with correlation IDs +- Promotion ID, release ID, environment ID in all relevant logs +- Sensitive data (secrets, credentials) masked + +## Security Considerations + +### Agent Security + +- **mTLS authentication** — Agents authenticate with CA-signed certificates +- **Short-lived credentials** — Task credentials expire 
after execution +- **Capability-based authorization** — Agents only receive tasks matching their capabilities +- **Heartbeat monitoring** — Detect and flag agent disconnections + +### Secrets Management + +- **Never stored in database** — Only vault references stored +- **Fetched at execution time** — Secrets retrieved just-in-time for deployment +- **Short-lived** — Dynamic credentials with minimal TTL +- **Masked in logs** — Secret values never logged + +### Plugin Sandbox + +- **Resource limits** — CPU, memory, timeout limits per plugin +- **Capability restrictions** — Plugins declare required capabilities +- **Network isolation** — Optional network restrictions for plugins + +## Performance Characteristics + +- **Promotion evaluation** — < 5 seconds for typical gate evaluation +- **Deployment latency** — Dominated by image pull time; orchestration overhead < 10 seconds +- **Agent heartbeat** — 30-second interval; offline detection within 90 seconds +- **Workflow step timeout** — Configurable; default 5 minutes per step + +## Implementation Roadmap + +| Phase | Focus | Key Deliverables | +|-------|-------|------------------| +| **Phase 1** | Foundation | Environment management, integration hub, release bundles | +| **Phase 2** | Workflow Engine | DAG execution, step registry, workflow templates | +| **Phase 3** | Promotion & Decision | Approval gateway, security gates, decision records | +| **Phase 4** | Deployment Execution | Docker/Compose agents, artifact generation, rollback | +| **Phase 5** | UI & Polish | Release dashboard, promotion UI, environment management | +| **Phase 6** | Progressive Delivery | A/B releases, canary, traffic routing | +| **Phase 7** | Extended Targets | ECS, Nomad, SSH/WinRM agentless | +| **Phase 8** | Plugin Ecosystem | Full plugin system, marketplace | + +## References + +- [Product Vision](../../product/VISION.md) +- [Architecture Overview](../../ARCHITECTURE_OVERVIEW.md) +- [Full Orchestrator 
Specification](../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md) +- [Competitive Landscape](../../product/competitive-landscape.md) diff --git a/docs/modules/release-orchestrator/data-model/entities.md b/docs/modules/release-orchestrator/data-model/entities.md new file mode 100644 index 000000000..b538fb7a0 --- /dev/null +++ b/docs/modules/release-orchestrator/data-model/entities.md @@ -0,0 +1,343 @@ +# Entity Definitions + +This document describes the core entities in the Release Orchestrator data model. + +## Entity Relationship Overview + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ ENTITY RELATIONSHIPS │ +│ │ +│ ┌──────────┐ ┌──────────────┐ ┌────────────┐ │ +│ │ Tenant │───────│ Environment │───────│ Target │ │ +│ └──────────┘ └──────────────┘ └────────────┘ │ +│ │ │ │ │ +│ │ │ │ │ +│ ▼ ▼ ▼ │ +│ ┌──────────┐ ┌──────────────┐ ┌────────────┐ │ +│ │ Component│ │ Approval │ │ Agent │ │ +│ └──────────┘ │ Policy │ └────────────┘ │ +│ │ └──────────────┘ │ │ +│ │ │ │ │ +│ ▼ │ ▼ │ +│ ┌──────────┐ │ ┌─────────────┐ │ +│ │ Version │ │ │ Deployment │ │ +│ │ Map │ │ │ Task │ │ +│ └──────────┘ │ └─────────────┘ │ +│ │ │ │ │ +│ │ │ │ │ +│ ▼ │ ▼ │ +│ ┌─────────────────────────┼─────────────────────────────┐ │ +│ │ │ │ │ +│ │ ┌──────────┐ ┌─────▼─────┐ ┌─────────────┐ │ │ +│ │ │ Release │─────│ Promotion │─────│ Deployment │ │ │ +│ │ └──────────┘ └───────────┘ │ Job │ │ │ +│ │ │ │ └─────────────┘ │ │ +│ │ │ │ │ │ │ +│ │ │ ▼ │ │ │ +│ │ │ ┌───────────┐ │ │ │ +│ │ │ │ Approval │ │ │ │ +│ │ │ └───────────┘ │ │ │ +│ │ │ │ │ │ │ +│ │ │ ▼ ▼ │ │ +│ │ │ ┌───────────┐ ┌───────────┐ │ │ +│ │ │ │ Decision │ │ Generated │ │ │ +│ │ │ │ Record │ │ Artifacts │ │ │ +│ │ │ └───────────┘ └───────────┘ │ │ +│ │ │ │ │ │ │ +│ │ │ └────────┬────────┘ │ │ +│ │ │ │ │ │ +│ │ │ ▼ │ │ +│ │ │ ┌───────────┐ │ │ +│ │ └───────────────────►│ Evidence │◄────────────┘ │ +│ │ │ Packet │ │ +│ │ └───────────┘ │ +│ │ │ │ +│ │ ▼ │ +│ 
│ ┌───────────┐ │ +│ │ │ Version │ │ +│ │ │ Sticker │ │ +│ │ └───────────┘ │ +│ │ │ +│ └─────────────────────────────────────────────────────────────────────────┘ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +## Core Entities + +### Environment + +Represents a deployment target environment (dev, staging, production). + +| Field | Type | Description | +|-------|------|-------------| +| `id` | UUID | Primary key | +| `tenant_id` | UUID | Tenant reference | +| `name` | string | Unique name (e.g., "prod") | +| `display_name` | string | Display name (e.g., "Production") | +| `order_index` | integer | Promotion order | +| `config` | JSONB | Environment configuration | +| `freeze_windows` | JSONB | Active freeze windows | +| `required_approvals` | integer | Approvals needed for promotion | +| `require_sod` | boolean | Require separation of duties | +| `created_at` | timestamp | Creation time | + +### Target + +Represents a deployment target (host, service). + +| Field | Type | Description | +|-------|------|-------------| +| `id` | UUID | Primary key | +| `tenant_id` | UUID | Tenant reference | +| `environment_id` | UUID | Environment reference | +| `name` | string | Target name | +| `target_type` | string | Type (docker_host, compose_host, etc.) | +| `connection` | JSONB | Connection configuration | +| `labels` | JSONB | Target labels | +| `health_status` | string | Current health status | +| `current_digest` | string | Currently deployed digest | + +### Agent + +Represents a deployment agent. 
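A concrete agent record might look like the following (a hypothetical sketch illustrating the `capabilities` and `labels` JSONB shapes and the status values; the field table below is authoritative):

```json
{
  "name": "edge-agent-01",
  "version": "1.4.2",
  "capabilities": ["docker", "compose"],
  "labels": { "region": "eu-central" },
  "status": "online",
  "last_heartbeat": "2026-01-09T12:00:00Z"
}
```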
+ +| Field | Type | Description | +|-------|------|-------------| +| `id` | UUID | Primary key | +| `tenant_id` | UUID | Tenant reference | +| `name` | string | Agent name | +| `version` | string | Agent version | +| `capabilities` | JSONB | Agent capabilities | +| `status` | string | online/offline/degraded | +| `last_heartbeat` | timestamp | Last heartbeat time | + +### Component + +Represents a deployable component (maps to an image repository). + +| Field | Type | Description | +|-------|------|-------------| +| `id` | UUID | Primary key | +| `tenant_id` | UUID | Tenant reference | +| `name` | string | Component name | +| `display_name` | string | Display name | +| `image_repository` | string | Image repository URL | +| `versioning_strategy` | JSONB | How versions are determined | +| `default_channel` | string | Default version channel | + +### Version Map + +Maps image tags to digests and semantic versions. + +| Field | Type | Description | +|-------|------|-------------| +| `id` | UUID | Primary key | +| `component_id` | UUID | Component reference | +| `tag` | string | Image tag | +| `digest` | string | Image digest (sha256:...) | +| `semver` | string | Semantic version | +| `channel` | string | Version channel (stable, beta) | + +### Release + +A versioned bundle of component digests. + +| Field | Type | Description | +|-------|------|-------------| +| `id` | UUID | Primary key | +| `tenant_id` | UUID | Tenant reference | +| `name` | string | Release name | +| `display_name` | string | Display name | +| `components` | JSONB | Component/digest mappings | +| `source_ref` | JSONB | Source code reference | +| `status` | string | draft/ready/deployed/deprecated | +| `created_by` | UUID | Creator user reference | + +### Promotion + +A request to promote a release to an environment. 
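For orientation, the `decision_record` JSONB described below might capture per-gate outcomes in a shape like this (illustrative only; the authoritative structure is produced by the gate evaluator):

```json
{
  "decision": "approved",
  "evaluatedAt": "2026-01-09T12:00:00Z",
  "gates": [
    { "type": "Security", "result": "pass", "detail": "no reachable critical/high findings" },
    { "type": "Approval", "result": "pass", "detail": "2/2 approvals; SoD satisfied" },
    { "type": "FreezeWindow", "result": "pass", "detail": "no active freeze window" }
  ]
}
```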
+ +| Field | Type | Description | +|-------|------|-------------| +| `id` | UUID | Primary key | +| `tenant_id` | UUID | Tenant reference | +| `release_id` | UUID | Release reference | +| `source_environment_id` | UUID | Source environment (nullable) | +| `target_environment_id` | UUID | Target environment | +| `status` | string | Promotion status | +| `decision_record` | JSONB | Gate evaluation results | +| `workflow_run_id` | UUID | Associated workflow run | +| `requested_by` | UUID | Requesting user | +| `requested_at` | timestamp | Request time | + +### Approval + +An approval or rejection of a promotion. + +| Field | Type | Description | +|-------|------|-------------| +| `id` | UUID | Primary key | +| `promotion_id` | UUID | Promotion reference | +| `approver_id` | UUID | Approving user | +| `action` | string | approved/rejected | +| `comment` | string | Approval comment | +| `approved_at` | timestamp | Approval time | + +### Deployment Job + +A deployment execution job. + +| Field | Type | Description | +|-------|------|-------------| +| `id` | UUID | Primary key | +| `promotion_id` | UUID | Promotion reference | +| `release_id` | UUID | Release reference | +| `environment_id` | UUID | Environment reference | +| `status` | string | Job status | +| `strategy` | string | Deployment strategy | +| `artifacts` | JSONB | Generated artifacts | +| `rollback_of` | UUID | If rollback, original job | + +### Deployment Task + +A task to deploy to a single target. + +| Field | Type | Description | +|-------|------|-------------| +| `id` | UUID | Primary key | +| `job_id` | UUID | Job reference | +| `target_id` | UUID | Target reference | +| `digest` | string | Digest to deploy | +| `status` | string | Task status | +| `agent_id` | UUID | Assigned agent | +| `logs` | text | Execution logs | +| `previous_digest` | string | Previous digest (for rollback) | + +### Evidence Packet + +Immutable audit evidence for a promotion/deployment. 
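A sketch of a sealed packet's content-addressed envelope (field values are hypothetical; `content_hash` is the SHA-256 of the canonicalized `content`, and `signature` covers the hash):

```json
{
  "packet_type": "release_decision",
  "content": { "promotionId": "…", "gates": ["…"] },
  "content_hash": "sha256:4f2a…",
  "signature": "base64:MEUCIQ…",
  "signer_key_ref": "vault://keys/release-evidence/current"
}
```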
+ +| Field | Type | Description | +|-------|------|-------------| +| `id` | UUID | Primary key | +| `promotion_id` | UUID | Promotion reference | +| `packet_type` | string | Type of evidence | +| `content` | JSONB | Evidence content | +| `content_hash` | string | SHA-256 of content | +| `signature` | string | Cryptographic signature | +| `signer_key_ref` | string | Signing key reference | +| `created_at` | timestamp | Creation time (no update) | + +### Version Sticker + +Version marker placed on deployment targets. + +| Field | Type | Description | +|-------|------|-------------| +| `id` | UUID | Primary key | +| `target_id` | UUID | Target reference | +| `release_id` | UUID | Release reference | +| `promotion_id` | UUID | Promotion reference | +| `sticker_content` | JSONB | Sticker JSON content | +| `content_hash` | string | Content hash | +| `written_at` | timestamp | Write time | +| `drift_detected` | boolean | Drift detection flag | + +## Workflow Entities + +### Workflow Template + +A reusable workflow definition. + +| Field | Type | Description | +|-------|------|-------------| +| `id` | UUID | Primary key | +| `tenant_id` | UUID | Tenant reference (null for builtin) | +| `name` | string | Template name | +| `version` | integer | Template version | +| `nodes` | JSONB | Step nodes | +| `edges` | JSONB | Step edges | +| `inputs` | JSONB | Input definitions | +| `outputs` | JSONB | Output definitions | +| `is_builtin` | boolean | Is built-in template | + +### Workflow Run + +An execution of a workflow template. 
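As an illustration, a run's `context` and `inputs` JSONB might carry the release information a workflow needs (a hypothetical shape; templates define their own inputs):

```json
{
  "status": "running",
  "context": {
    "releaseId": "…",
    "targetEnvironment": "prod",
    "variables": { "strategy": "all-at-once" }
  },
  "inputs": { "promotionId": "…" }
}
```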
+ +| Field | Type | Description | +|-------|------|-------------| +| `id` | UUID | Primary key | +| `template_id` | UUID | Template reference | +| `template_version` | integer | Template version at execution | +| `status` | string | Run status | +| `context` | JSONB | Execution context | +| `inputs` | JSONB | Input values | +| `outputs` | JSONB | Output values | +| `started_at` | timestamp | Start time | +| `completed_at` | timestamp | Completion time | + +### Step Run + +Execution of a single step within a workflow run. + +| Field | Type | Description | +|-------|------|-------------| +| `id` | UUID | Primary key | +| `workflow_run_id` | UUID | Workflow run reference | +| `node_id` | string | Node ID from template | +| `status` | string | Step status | +| `inputs` | JSONB | Resolved inputs | +| `outputs` | JSONB | Produced outputs | +| `logs` | text | Execution logs | +| `attempt_number` | integer | Retry attempt number | + +## Plugin Entities + +### Plugin + +A registered plugin. + +| Field | Type | Description | +|-------|------|-------------| +| `id` | UUID | Primary key | +| `plugin_id` | string | Unique plugin identifier | +| `version` | string | Plugin version | +| `vendor` | string | Plugin vendor | +| `manifest` | JSONB | Plugin manifest | +| `status` | string | Plugin status | +| `entrypoint` | string | Plugin entrypoint path | + +### Plugin Instance + +A tenant-specific plugin configuration. + +| Field | Type | Description | +|-------|------|-------------| +| `id` | UUID | Primary key | +| `plugin_id` | UUID | Plugin reference | +| `tenant_id` | UUID | Tenant reference | +| `config` | JSONB | Tenant configuration | +| `enabled` | boolean | Is enabled for tenant | + +## Integration Entities + +### Integration + +A configured external integration. 
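An illustrative integration row for the GitHub plugin from the manifest example earlier (names and the vault path are hypothetical; note that only a `credential_ref` is stored, never the secret itself):

```json
{
  "type_id": "github",
  "name": "corp-github",
  "config": { "apiUrl": "https://github.example.com/api/v3" },
  "credential_ref": "vault://secrets/release/integrations/corp-github",
  "health_status": "healthy"
}
```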
+ +| Field | Type | Description | +|-------|------|-------------| +| `id` | UUID | Primary key | +| `tenant_id` | UUID | Tenant reference | +| `type_id` | string | Integration type | +| `name` | string | Integration name | +| `config` | JSONB | Integration configuration | +| `credential_ref` | string | Vault credential reference | +| `health_status` | string | Connection health | + +## References + +- [Database Schema](schema.md) +- [Module Overview](../modules/overview.md) diff --git a/docs/modules/release-orchestrator/data-model/schema.md b/docs/modules/release-orchestrator/data-model/schema.md new file mode 100644 index 000000000..68539d111 --- /dev/null +++ b/docs/modules/release-orchestrator/data-model/schema.md @@ -0,0 +1,631 @@ +# Database Schema (PostgreSQL) + +This document specifies the complete PostgreSQL schema for the Release Orchestrator. + +## Schema Organization + +All release orchestration tables reside in the `release` schema: + +```sql +CREATE SCHEMA IF NOT EXISTS release; +SET search_path TO release, public; +``` + +## Core Tables + +### Tenant and Authority Extensions + +```sql +-- Extended: Add release-related permissions +ALTER TABLE permissions ADD COLUMN IF NOT EXISTS + resource_type VARCHAR(50) CHECK (resource_type IN ( + 'environment', 'release', 'promotion', 'target', 'workflow', 'plugin' + )); +``` + +--- + +## Integration Hub + +```sql +CREATE TABLE integration_types ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + name VARCHAR(100) NOT NULL UNIQUE, + category VARCHAR(50) NOT NULL CHECK (category IN ( + 'scm', 'ci', 'registry', 'vault', 'target', 'router' + )), + plugin_id UUID REFERENCES plugins(id), + config_schema JSONB NOT NULL, + secrets_schema JSONB NOT NULL, + is_builtin BOOLEAN NOT NULL DEFAULT FALSE, + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW() +); + +CREATE TABLE integrations ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + integration_type_id UUID 
NOT NULL REFERENCES integration_types(id), + name VARCHAR(255) NOT NULL, + config JSONB NOT NULL, + credential_ref VARCHAR(500), -- Vault path or encrypted ref + status VARCHAR(50) NOT NULL DEFAULT 'unknown' CHECK (status IN ( + 'healthy', 'degraded', 'unhealthy', 'unknown' + )), + last_health_check TIMESTAMPTZ, + last_health_message TEXT, + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + created_by UUID REFERENCES users(id), + UNIQUE (tenant_id, name) +); + +CREATE INDEX idx_integrations_tenant ON integrations(tenant_id); +CREATE INDEX idx_integrations_type ON integrations(integration_type_id); + +CREATE TABLE connection_profiles ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + user_id UUID NOT NULL REFERENCES users(id), + integration_type_id UUID NOT NULL REFERENCES integration_types(id), + name VARCHAR(255) NOT NULL, + config_defaults JSONB NOT NULL, + is_default BOOLEAN NOT NULL DEFAULT FALSE, + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + UNIQUE (tenant_id, user_id, integration_type_id, name) +); +``` + +--- + +## Environment & Inventory + +```sql +CREATE TABLE environments ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + name VARCHAR(100) NOT NULL, + display_name VARCHAR(255) NOT NULL, + order_index INTEGER NOT NULL, + config JSONB NOT NULL DEFAULT '{}', + freeze_windows JSONB NOT NULL DEFAULT '[]', + required_approvals INTEGER NOT NULL DEFAULT 0, + require_sod BOOLEAN NOT NULL DEFAULT FALSE, + auto_promote_from UUID REFERENCES environments(id), + promotion_policy VARCHAR(255), + deployment_timeout INTEGER NOT NULL DEFAULT 600, + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + UNIQUE (tenant_id, name) +); + +CREATE INDEX idx_environments_tenant ON 
environments(tenant_id); +CREATE INDEX idx_environments_order ON environments(tenant_id, order_index); + +CREATE TABLE target_groups ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + environment_id UUID NOT NULL REFERENCES environments(id) ON DELETE CASCADE, + name VARCHAR(255) NOT NULL, + labels JSONB NOT NULL DEFAULT '{}', + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + UNIQUE (tenant_id, environment_id, name) +); + +CREATE TABLE targets ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + environment_id UUID NOT NULL REFERENCES environments(id) ON DELETE CASCADE, + target_group_id UUID REFERENCES target_groups(id), + name VARCHAR(255) NOT NULL, + target_type VARCHAR(100) NOT NULL, + connection JSONB NOT NULL, + capabilities JSONB NOT NULL DEFAULT '[]', + labels JSONB NOT NULL DEFAULT '{}', + deployment_directory VARCHAR(500), + health_status VARCHAR(50) NOT NULL DEFAULT 'unknown' CHECK (health_status IN ( + 'healthy', 'degraded', 'unhealthy', 'unknown' + )), + last_health_check TIMESTAMPTZ, + current_digest VARCHAR(100), + agent_id UUID, -- FK added below; agents is created after targets + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + UNIQUE (tenant_id, environment_id, name) +); + +CREATE INDEX idx_targets_tenant_env ON targets(tenant_id, environment_id); +CREATE INDEX idx_targets_type ON targets(target_type); +CREATE INDEX idx_targets_labels ON targets USING GIN (labels); + +CREATE TABLE agents ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + name VARCHAR(255) NOT NULL, + version VARCHAR(50) NOT NULL, + capabilities JSONB NOT NULL DEFAULT '[]', + labels JSONB NOT NULL DEFAULT '{}', + status VARCHAR(50) NOT NULL DEFAULT 'offline' CHECK (status IN ( + 'online', 'offline', 'degraded' + )), + last_heartbeat TIMESTAMPTZ, +
resource_usage JSONB, + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + UNIQUE (tenant_id, name) +); + +CREATE INDEX idx_agents_tenant ON agents(tenant_id); +CREATE INDEX idx_agents_status ON agents(status); +CREATE INDEX idx_agents_capabilities ON agents USING GIN (capabilities); + +-- Add FK to targets (agents is defined after targets) +ALTER TABLE targets + ADD CONSTRAINT fk_targets_agent + FOREIGN KEY (agent_id) REFERENCES agents(id); +``` + +--- + +## Release Management + +```sql +CREATE TABLE components ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + name VARCHAR(255) NOT NULL, + display_name VARCHAR(255) NOT NULL, + image_repository VARCHAR(500) NOT NULL, + registry_integration_id UUID REFERENCES integrations(id), + versioning_strategy JSONB NOT NULL DEFAULT '{"type": "semver"}', + deployment_template VARCHAR(255), + default_channel VARCHAR(50) NOT NULL DEFAULT 'stable', + metadata JSONB NOT NULL DEFAULT '{}', + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + UNIQUE (tenant_id, name) +); + +CREATE INDEX idx_components_tenant ON components(tenant_id); + +CREATE TABLE version_maps ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + component_id UUID NOT NULL REFERENCES components(id) ON DELETE CASCADE, + tag VARCHAR(255) NOT NULL, + digest VARCHAR(100) NOT NULL, + semver VARCHAR(50), + channel VARCHAR(50) NOT NULL DEFAULT 'stable', + prerelease BOOLEAN NOT NULL DEFAULT FALSE, + build_metadata VARCHAR(255), + resolved_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + source VARCHAR(50) NOT NULL DEFAULT 'auto' CHECK (source IN ('auto', 'manual')), + UNIQUE (tenant_id, component_id, digest) +); + +CREATE INDEX idx_version_maps_component ON version_maps(component_id); +CREATE INDEX idx_version_maps_digest ON version_maps(digest); +CREATE INDEX idx_version_maps_semver ON version_maps(semver); + +CREATE TABLE releases ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL
REFERENCES tenants(id) ON DELETE CASCADE, + name VARCHAR(255) NOT NULL, + display_name VARCHAR(255) NOT NULL, + components JSONB NOT NULL, -- [{componentId, digest, semver, tag, role}] + source_ref JSONB, -- {scmIntegrationId, commitSha, ciIntegrationId, buildId} + status VARCHAR(50) NOT NULL DEFAULT 'draft' CHECK (status IN ( + 'draft', 'ready', 'promoting', 'deployed', 'deprecated', 'archived' + )), + metadata JSONB NOT NULL DEFAULT '{}', + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + created_by UUID REFERENCES users(id), + UNIQUE (tenant_id, name) +); + +CREATE INDEX idx_releases_tenant ON releases(tenant_id); +CREATE INDEX idx_releases_status ON releases(status); +CREATE INDEX idx_releases_created ON releases(created_at DESC); + +CREATE TABLE release_environment_state ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + environment_id UUID NOT NULL REFERENCES environments(id) ON DELETE CASCADE, + release_id UUID NOT NULL REFERENCES releases(id), + status VARCHAR(50) NOT NULL CHECK (status IN ( + 'deployed', 'deploying', 'failed', 'rolling_back', 'rolled_back' + )), + deployed_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + deployed_by UUID REFERENCES users(id), + promotion_id UUID, -- will reference promotions + evidence_ref VARCHAR(255), + UNIQUE (tenant_id, environment_id) +); + +CREATE INDEX idx_release_env_state_env ON release_environment_state(environment_id); +CREATE INDEX idx_release_env_state_release ON release_environment_state(release_id); +``` + +--- + +## Workflow Engine + +```sql +CREATE TABLE workflow_templates ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID REFERENCES tenants(id) ON DELETE CASCADE, -- NULL for builtin + name VARCHAR(255) NOT NULL, + display_name VARCHAR(255) NOT NULL, + description TEXT, + version INTEGER NOT NULL DEFAULT 1, + nodes JSONB NOT NULL, + edges JSONB NOT NULL, + inputs JSONB NOT NULL 
DEFAULT '[]', + outputs JSONB NOT NULL DEFAULT '[]', + is_builtin BOOLEAN NOT NULL DEFAULT FALSE, + tags JSONB NOT NULL DEFAULT '[]', + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + created_by UUID REFERENCES users(id), + UNIQUE (tenant_id, name, version) +); + +CREATE INDEX idx_workflow_templates_tenant ON workflow_templates(tenant_id); +CREATE INDEX idx_workflow_templates_builtin ON workflow_templates(is_builtin); + +CREATE TABLE workflow_runs ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + template_id UUID NOT NULL REFERENCES workflow_templates(id), + template_version INTEGER NOT NULL, + status VARCHAR(50) NOT NULL DEFAULT 'created' CHECK (status IN ( + 'created', 'running', 'paused', 'succeeded', 'failed', 'cancelled' + )), + context JSONB NOT NULL, -- inputs, variables, release info + outputs JSONB, + started_at TIMESTAMPTZ, + completed_at TIMESTAMPTZ, + error_message TEXT, + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + triggered_by UUID REFERENCES users(id) +); + +CREATE INDEX idx_workflow_runs_tenant ON workflow_runs(tenant_id); +CREATE INDEX idx_workflow_runs_status ON workflow_runs(status); +CREATE INDEX idx_workflow_runs_template ON workflow_runs(template_id); + +CREATE TABLE step_runs ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + workflow_run_id UUID NOT NULL REFERENCES workflow_runs(id) ON DELETE CASCADE, + node_id VARCHAR(100) NOT NULL, + step_type VARCHAR(100) NOT NULL, + status VARCHAR(50) NOT NULL DEFAULT 'pending' CHECK (status IN ( + 'pending', 'running', 'succeeded', 'failed', 'skipped', 'retrying', 'cancelled' + )), + inputs JSONB NOT NULL, + config JSONB NOT NULL, + outputs JSONB, + started_at TIMESTAMPTZ, + completed_at TIMESTAMPTZ, + attempt_number INTEGER NOT NULL DEFAULT 1, + error_message TEXT, + error_type VARCHAR(100), + logs TEXT, + artifacts JSONB NOT NULL DEFAULT '[]', + t_hlc BIGINT, -- Hybrid 
Logical Clock for ordering (optional) + ts_wall TIMESTAMPTZ, -- Wall-clock timestamp for debugging (optional) + UNIQUE (workflow_run_id, node_id, attempt_number) +); + +CREATE INDEX idx_step_runs_workflow ON step_runs(workflow_run_id); +CREATE INDEX idx_step_runs_status ON step_runs(status); +``` + +--- + +## Promotion & Approval + +```sql +CREATE TABLE promotions ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + release_id UUID NOT NULL REFERENCES releases(id), + source_environment_id UUID REFERENCES environments(id), + target_environment_id UUID NOT NULL REFERENCES environments(id), + status VARCHAR(50) NOT NULL DEFAULT 'pending_approval' CHECK (status IN ( + 'pending_approval', 'pending_gate', 'approved', 'rejected', + 'deploying', 'deployed', 'failed', 'cancelled', 'rolled_back' + )), + decision_record JSONB, + workflow_run_id UUID REFERENCES workflow_runs(id), + requested_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + requested_by UUID NOT NULL REFERENCES users(id), + request_reason TEXT, + decided_at TIMESTAMPTZ, + started_at TIMESTAMPTZ, + completed_at TIMESTAMPTZ, + evidence_packet_id UUID, + t_hlc BIGINT, -- Hybrid Logical Clock for ordering (optional) + ts_wall TIMESTAMPTZ -- Wall-clock timestamp for debugging (optional) +); + +CREATE INDEX idx_promotions_tenant ON promotions(tenant_id); +CREATE INDEX idx_promotions_release ON promotions(release_id); +CREATE INDEX idx_promotions_status ON promotions(status); +CREATE INDEX idx_promotions_target_env ON promotions(target_environment_id); + +-- Add FK to release_environment_state +ALTER TABLE release_environment_state + ADD CONSTRAINT fk_release_env_state_promotion + FOREIGN KEY (promotion_id) REFERENCES promotions(id); + +CREATE TABLE approvals ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + promotion_id UUID NOT NULL REFERENCES promotions(id) ON DELETE CASCADE, + 
approver_id UUID NOT NULL REFERENCES users(id), + action VARCHAR(50) NOT NULL CHECK (action IN ('approved', 'rejected')), + comment TEXT, + approved_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + approver_role VARCHAR(255), + approver_groups JSONB NOT NULL DEFAULT '[]' +); + +CREATE INDEX idx_approvals_promotion ON approvals(promotion_id); +CREATE INDEX idx_approvals_approver ON approvals(approver_id); + +CREATE TABLE approval_policies ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + environment_id UUID NOT NULL REFERENCES environments(id) ON DELETE CASCADE, + required_count INTEGER NOT NULL DEFAULT 1, + required_roles JSONB NOT NULL DEFAULT '[]', + required_groups JSONB NOT NULL DEFAULT '[]', + require_sod BOOLEAN NOT NULL DEFAULT FALSE, + allow_self_approval BOOLEAN NOT NULL DEFAULT FALSE, + expiration_minutes INTEGER NOT NULL DEFAULT 1440, + UNIQUE (tenant_id, environment_id) +); +``` + +--- + +## Deployment + +```sql +CREATE TABLE deployment_jobs ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + promotion_id UUID NOT NULL REFERENCES promotions(id), + release_id UUID NOT NULL REFERENCES releases(id), + environment_id UUID NOT NULL REFERENCES environments(id), + status VARCHAR(50) NOT NULL DEFAULT 'pending' CHECK (status IN ( + 'pending', 'running', 'succeeded', 'failed', 'cancelled', 'rolling_back', 'rolled_back' + )), + strategy VARCHAR(50) NOT NULL DEFAULT 'all-at-once', + started_at TIMESTAMPTZ, + completed_at TIMESTAMPTZ, + artifacts JSONB NOT NULL DEFAULT '[]', + rollback_of UUID REFERENCES deployment_jobs(id), + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + t_hlc BIGINT, -- Hybrid Logical Clock for ordering (optional) + ts_wall TIMESTAMPTZ -- Wall-clock timestamp for debugging (optional) +); + +CREATE INDEX idx_deployment_jobs_promotion ON deployment_jobs(promotion_id); +CREATE INDEX idx_deployment_jobs_status ON 
deployment_jobs(status); + +CREATE TABLE deployment_tasks ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + job_id UUID NOT NULL REFERENCES deployment_jobs(id) ON DELETE CASCADE, + target_id UUID NOT NULL REFERENCES targets(id), + digest VARCHAR(100) NOT NULL, + status VARCHAR(50) NOT NULL DEFAULT 'pending' CHECK (status IN ( + 'pending', 'running', 'succeeded', 'failed', 'cancelled', 'skipped' + )), + agent_id UUID REFERENCES agents(id), + started_at TIMESTAMPTZ, + completed_at TIMESTAMPTZ, + exit_code INTEGER, + logs TEXT, + previous_digest VARCHAR(100), + sticker_written BOOLEAN NOT NULL DEFAULT FALSE, + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW() +); + +CREATE INDEX idx_deployment_tasks_job ON deployment_tasks(job_id); +CREATE INDEX idx_deployment_tasks_target ON deployment_tasks(target_id); +CREATE INDEX idx_deployment_tasks_status ON deployment_tasks(status); + +CREATE TABLE generated_artifacts ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + deployment_job_id UUID REFERENCES deployment_jobs(id) ON DELETE CASCADE, + artifact_type VARCHAR(50) NOT NULL CHECK (artifact_type IN ( + 'compose_lock', 'script', 'sticker', 'evidence', 'config' + )), + name VARCHAR(255) NOT NULL, + content_hash VARCHAR(100) NOT NULL, + content BYTEA, -- for small artifacts + storage_ref VARCHAR(500), -- for large artifacts (S3, etc.) 
+ created_at TIMESTAMPTZ NOT NULL DEFAULT NOW() +); + +CREATE INDEX idx_generated_artifacts_job ON generated_artifacts(deployment_job_id); +``` + +--- + +## Progressive Delivery + +```sql +CREATE TABLE ab_releases ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + environment_id UUID NOT NULL REFERENCES environments(id), + name VARCHAR(255) NOT NULL, + variations JSONB NOT NULL, -- [{name, releaseId, targetGroupId, trafficPercentage}] + active_variation VARCHAR(50) NOT NULL DEFAULT 'A', + traffic_split JSONB NOT NULL, + rollout_strategy JSONB NOT NULL, + status VARCHAR(50) NOT NULL DEFAULT 'created' CHECK (status IN ( + 'created', 'deploying', 'running', 'promoting', 'completed', 'rolled_back' + )), + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + completed_at TIMESTAMPTZ, + created_by UUID REFERENCES users(id) +); + +CREATE INDEX idx_ab_releases_tenant_env ON ab_releases(tenant_id, environment_id); +CREATE INDEX idx_ab_releases_status ON ab_releases(status); + +CREATE TABLE canary_stages ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + ab_release_id UUID NOT NULL REFERENCES ab_releases(id) ON DELETE CASCADE, + stage_number INTEGER NOT NULL, + traffic_percentage INTEGER NOT NULL, + status VARCHAR(50) NOT NULL DEFAULT 'pending' CHECK (status IN ( + 'pending', 'running', 'succeeded', 'failed', 'skipped' + )), + health_threshold DECIMAL(5,2), + duration_seconds INTEGER, + require_approval BOOLEAN NOT NULL DEFAULT FALSE, + started_at TIMESTAMPTZ, + completed_at TIMESTAMPTZ, + health_result JSONB, + UNIQUE (ab_release_id, stage_number) +); +``` + +--- + +## Release Evidence + +```sql +CREATE TABLE evidence_packets ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + promotion_id UUID NOT NULL REFERENCES promotions(id), + packet_type VARCHAR(50) NOT NULL CHECK (packet_type IN ( + 'release_decision', 'deployment', 'rollback', 
'ab_promotion' + )), + content JSONB NOT NULL, + content_hash VARCHAR(100) NOT NULL, + signature TEXT, + signer_key_ref VARCHAR(255), + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW() + -- Note: No UPDATE or DELETE allowed (append-only) +); + +CREATE INDEX idx_evidence_packets_promotion ON evidence_packets(promotion_id); +CREATE INDEX idx_evidence_packets_created ON evidence_packets(created_at DESC); + +-- Append-only enforcement via trigger +CREATE OR REPLACE FUNCTION prevent_evidence_modification() +RETURNS TRIGGER AS $$ +BEGIN + RAISE EXCEPTION 'Evidence packets are immutable and cannot be modified or deleted'; +END; +$$ LANGUAGE plpgsql; + +CREATE TRIGGER evidence_packets_immutable +BEFORE UPDATE OR DELETE ON evidence_packets +FOR EACH ROW EXECUTE FUNCTION prevent_evidence_modification(); + +CREATE TABLE version_stickers ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + target_id UUID NOT NULL REFERENCES targets(id), + deployment_job_id UUID REFERENCES deployment_jobs(id), + release_id UUID NOT NULL REFERENCES releases(id), + digest VARCHAR(100) NOT NULL, + sticker_content JSONB NOT NULL, + written_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + verified_at TIMESTAMPTZ, + verification_status VARCHAR(50) CHECK (verification_status IN ('valid', 'mismatch', 'missing')) +); + +CREATE INDEX idx_version_stickers_target ON version_stickers(target_id); +CREATE INDEX idx_version_stickers_release ON version_stickers(release_id); +``` + +--- + +## Plugin Infrastructure + +```sql +CREATE TABLE plugins ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + name VARCHAR(255) NOT NULL UNIQUE, + display_name VARCHAR(255) NOT NULL, + version VARCHAR(50) NOT NULL, + description TEXT, + manifest JSONB NOT NULL, + status VARCHAR(50) NOT NULL DEFAULT 'inactive' CHECK (status IN ( + 'active', 'inactive', 'error' + )), + error_message TEXT, + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + updated_at TIMESTAMPTZ NOT NULL 
DEFAULT NOW() +); + +CREATE TABLE plugin_instances ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + plugin_id UUID NOT NULL REFERENCES plugins(id), + config JSONB NOT NULL DEFAULT '{}', + enabled BOOLEAN NOT NULL DEFAULT TRUE, + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + UNIQUE (tenant_id, plugin_id) +); + +CREATE INDEX idx_plugin_instances_tenant ON plugin_instances(tenant_id); +``` + +--- + +## Hybrid Logical Clock (HLC) for Distributed Ordering + +**Optional Enhancement**: For strict distributed ordering and multi-region support, the following tables include optional `t_hlc` (Hybrid Logical Clock timestamp) and `ts_wall` (wall-clock timestamp) columns: + +- `promotions` — Promotion state transitions +- `deployment_jobs` — Deployment task ordering +- `step_runs` — Workflow step execution ordering + +**When to use HLC**: +- Multi-region deployments requiring strict causal ordering +- Deterministic replay across distributed systems +- Timeline event ordering in audit logs + +**HLC Schema**: +```sql +t_hlc BIGINT -- HLC timestamp (monotonic, skew-tolerant) +ts_wall TIMESTAMPTZ -- Wall-clock timestamp (informational) +``` + +**Usage**: +- `t_hlc` is generated by `IHybridLogicalClock.Tick()` on state transitions +- `ts_wall` is populated by `TimeProvider.GetUtcNow()` for debugging +- Index on `t_hlc` for ordering queries: `CREATE INDEX idx_promotions_hlc ON promotions(t_hlc);` + +**Reference**: See [Implementation Guide](../implementation-guide.md#hybrid-logical-clock-hlc-for-distributed-ordering) for HLC usage patterns.
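The tick rule can be sketched in a few lines. This is a minimal illustration of HLC packing only; the real `IHybridLogicalClock` implementation and its bit layout are assumptions, not the shipped interface.

```typescript
// Minimal HLC sketch: pack wall-clock milliseconds into the high bits
// and a logical counter into the low 16 bits. The result fits a BIGINT
// column and stays monotonic even if the wall clock stalls or skews.
class HybridLogicalClock {
  private lastPhysical = 0n;
  private counter = 0n;

  constructor(private now: () => number = () => Date.now()) {}

  tick(): bigint {
    const physical = BigInt(this.now());
    if (physical > this.lastPhysical) {
      this.lastPhysical = physical; // clock advanced: reset the counter
      this.counter = 0n;
    } else {
      this.counter += 1n; // same/earlier millisecond: bump the counter
    }
    return (this.lastPhysical << 16n) | this.counter;
  }
}

// Two ticks in the same millisecond still order correctly:
const hlc = new HybridLogicalClock(() => 1736500000000); // frozen clock
const t1 = hlc.tick();
const t2 = hlc.tick();
console.log(t2 > t1); // true
```

Values produced this way sort correctly with a plain `ORDER BY t_hlc`, which is why a single BIGINT column suffices.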
+ +--- + +## Row-Level Security (Multi-Tenancy) + +All tables with `tenant_id` should have RLS enabled: + +```sql +-- Enable RLS on all release tables +ALTER TABLE integrations ENABLE ROW LEVEL SECURITY; +ALTER TABLE environments ENABLE ROW LEVEL SECURITY; +ALTER TABLE targets ENABLE ROW LEVEL SECURITY; +ALTER TABLE releases ENABLE ROW LEVEL SECURITY; +ALTER TABLE promotions ENABLE ROW LEVEL SECURITY; +-- ... etc. + +-- Example policy +CREATE POLICY tenant_isolation ON integrations + FOR ALL + USING (tenant_id = current_setting('app.tenant_id')::UUID); +``` diff --git a/docs/modules/release-orchestrator/deployment/agent-based.md b/docs/modules/release-orchestrator/deployment/agent-based.md new file mode 100644 index 000000000..1909ede15 --- /dev/null +++ b/docs/modules/release-orchestrator/deployment/agent-based.md @@ -0,0 +1,403 @@ +# Agent-Based Deployment + +> Agent-based deployment using Docker and Compose agents for executing tasks on targets. + +**Status:** Planned (not yet implemented) +**Source:** [Architecture Advisory Section 10.3](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md) +**Related Modules:** [Agents Module](../modules/agents.md), [Deploy Orchestrator](../modules/deploy-orchestrator.md) +**Sprints:** [108_002 Docker Agent](../../../../implplan/SPRINT_20260110_108_002_AGENTS_docker.md), [108_003 Compose Agent](../../../../implplan/SPRINT_20260110_108_003_AGENTS_compose.md) + +## Overview + +Agent-based deployment uses lightweight agents installed on target hosts to execute deployment tasks. Agents communicate with the orchestrator over mTLS and receive tasks through heartbeat polling or WebSocket streams. 
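The heartbeat path reduces to a poll-and-report loop. The sketch below is illustrative only; the polling function, task shape, and reporting call are placeholders rather than the final agent protocol.

```typescript
// Illustrative heartbeat step: poll the orchestrator for pending tasks,
// execute each one locally, and report the outcome. All names here are
// placeholders for whatever transport the agent uses (mTLS HTTP poll
// or WebSocket push).
interface PendingTask { id: string }

async function heartbeatOnce(
  poll: () => Promise<PendingTask[]>,            // fetch assigned tasks
  execute: (t: PendingTask) => Promise<boolean>, // run the task locally
  report: (id: string, ok: boolean) => Promise<void>, // send result back
): Promise<number> {
  const tasks = await poll();
  for (const task of tasks) {
    const ok = await execute(task);
    await report(task.id, ok);
  }
  return tasks.length; // number of tasks handled this beat
}
```

A real agent would wrap `heartbeatOnce` in a timer loop with a jittered interval, and switch to the WebSocket stream when the orchestrator offers one.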
+ +--- + +## Agent Task Protocol + +### Task Payload Structure + +```typescript +// Task assignment (Core -> Agent) +interface AgentTask { + id: UUID; + type: TaskType; + targetId: UUID; + payload: TaskPayload; + credentials: EncryptedCredentials; + timeout: number; + priority: TaskPriority; + idempotencyKey: string; + assignedAt: DateTime; + expiresAt: DateTime; +} + +type TaskType = + | "deploy" + | "rollback" + | "health-check" + | "inspect" + | "execute-command" + | "upload-files" + | "write-sticker" + | "read-sticker"; + +interface DeployTaskPayload { + image: string; + digest: string; + config: DeployConfig; + artifacts: ArtifactReference[]; + previousDigest?: string; + hooks: { + preDeploy?: HookConfig; + postDeploy?: HookConfig; + }; +} +``` + +### Task Result Structure + +```typescript +// Task result (Agent -> Core) +interface TaskResult { + taskId: UUID; + success: boolean; + startedAt: DateTime; + completedAt: DateTime; + + // Success details + outputs?: Record<string, unknown>; + artifacts?: ArtifactReference[]; + + // Failure details + error?: string; + errorType?: string; + retriable?: boolean; + + // Logs + logs: string; + + // Metrics + metrics: { + pullDurationMs?: number; + deployDurationMs?: number; + healthCheckDurationMs?: number; + }; +} +``` + +--- + +## Docker Agent Implementation + +The Docker agent deploys single containers to Docker hosts with digest verification. + +### Docker Agent Capabilities + +- Pull images with digest verification +- Create and start containers +- Stop and remove containers +- Health check monitoring +- Version sticker management +- Rollback to previous container + +### Deploy Task Flow + +```typescript +class DockerAgent implements TargetExecutor { + private docker: Docker; + + async deploy(task: DeployTaskPayload): Promise<TaskResult> { + const { image, digest, config, previousDigest } = task; + const containerName = config.containerName; + + // 1.
Pull image and verify digest + this.log(`Pulling image ${image}@${digest}`); + await this.docker.pull(image, { digest }); + + const pulledDigest = await this.getImageDigest(image); + if (pulledDigest !== digest) { + throw new DigestMismatchError( + `Expected digest ${digest}, got ${pulledDigest}. Possible tampering detected.` + ); + } + + // 2. Run pre-deploy hook + if (task.hooks?.preDeploy) { + await this.runHook(task.hooks.preDeploy, "pre-deploy"); + } + + // 3. Stop and rename existing container + const existingContainer = await this.findContainer(containerName); + if (existingContainer) { + this.log(`Stopping existing container ${containerName}`); + await existingContainer.stop({ t: 10 }); + await existingContainer.rename(`${containerName}-previous-${Date.now()}`); + } + + // 4. Create new container + this.log(`Creating container ${containerName} from ${image}@${digest}`); + const container = await this.docker.createContainer({ + name: containerName, + Image: `${image}@${digest}`, // Always use digest, not tag + Env: this.buildEnvVars(config.environment), + HostConfig: { + PortBindings: this.buildPortBindings(config.ports), + Binds: this.buildBindMounts(config.volumes), + RestartPolicy: { Name: config.restartPolicy || "unless-stopped" }, + Memory: config.memoryLimit, + CpuQuota: config.cpuLimit, + }, + Labels: { + "stella.release.id": config.releaseId, + "stella.release.name": config.releaseName, + "stella.digest": digest, + "stella.deployed.at": new Date().toISOString(), + }, + }); + + // 5. Start container + this.log(`Starting container ${containerName}`); + await container.start(); + + // 6. 
Wait for container to be healthy (if health check configured) + if (config.healthCheck) { + this.log(`Waiting for container health check`); + const healthy = await this.waitForHealthy(container, config.healthCheck.timeout); + if (!healthy) { + // Rollback to previous container + await this.rollbackContainer(containerName, existingContainer); + throw new HealthCheckFailedError(`Container ${containerName} failed health check`); + } + } + + // 7. Run post-deploy hook + if (task.hooks?.postDeploy) { + await this.runHook(task.hooks.postDeploy, "post-deploy"); + } + + // 8. Cleanup previous container + if (existingContainer && config.cleanupPrevious !== false) { + this.log(`Removing previous container`); + await existingContainer.remove({ force: true }); + } + + return { + success: true, + containerId: container.id, + previousDigest: previousDigest, + logs: this.getLogs(), + durationMs: this.getDuration(), + }; + } +} +``` + +### Rollback Implementation + +```typescript +async rollback(task: RollbackTaskPayload): Promise<TaskResult> { + const { containerName, targetDigest } = task; + + // Find previous container or use specified digest + if (targetDigest) { + // Deploy specific digest + return this.deploy({ + ...task, + digest: targetDigest, + }); + } + + // Find and restore previous container + const previousContainer = await this.findContainer(`${containerName}-previous-*`); + if (!previousContainer) { + throw new RollbackError(`No previous container found for ${containerName}`); + } + + // Stop current, rename, start previous + const currentContainer = await this.findContainer(containerName); + if (currentContainer) { + await currentContainer.stop({ t: 10 }); + await currentContainer.rename(`${containerName}-failed-${Date.now()}`); + } + + await previousContainer.rename(containerName); + await previousContainer.start(); + + return { + success: true, + containerId: previousContainer.id, + logs: this.getLogs(), + durationMs: this.getDuration(), + }; +} +``` + +### Version Sticker
Management + +```typescript +async writeSticker(sticker: VersionSticker): Promise<void> { + const stickerPath = this.config.stickerPath || "/var/stella/version.json"; + const stickerContent = JSON.stringify(sticker, null, 2); + + // Write to host filesystem or container volume + if (this.config.stickerLocation === "volume") { + // Write to shared volume + await this.docker.run("alpine", [ + "sh", "-c", + `echo '${stickerContent}' > ${stickerPath}` + ], { + HostConfig: { + Binds: [`${this.config.stickerVolume}:/var/stella`] + } + }); + } else { + // Write directly to host + fs.writeFileSync(stickerPath, stickerContent); + } +} +``` + +--- + +## Compose Agent Implementation + +The Compose agent deploys multi-container applications defined in Docker Compose files. + +### Compose Agent Capabilities + +- Pull images for all services +- Verify digests for all services +- Deploy using compose lock files +- Health check all services +- Rollback to previous deployment +- Version sticker management + +### Deploy Task Flow + +```typescript +class ComposeAgent implements TargetExecutor { + async deploy(task: DeployTaskPayload): Promise<TaskResult> { + const { artifacts, config } = task; + const deployDir = config.deploymentDirectory; + + // 1. Write compose lock file + const composeLock = artifacts.find(a => a.type === "compose_lock"); + const composeContent = await this.fetchArtifact(composeLock); + + const composePath = path.join(deployDir, "compose.stella.lock.yml"); + await fs.writeFile(composePath, composeContent); + + // 2. Write any additional config files + for (const artifact of artifacts.filter(a => a.type === "config")) { + const content = await this.fetchArtifact(artifact); + await fs.writeFile(path.join(deployDir, artifact.name), content); + } + + // 3. Run pre-deploy hook + if (task.hooks?.preDeploy) { + await this.runHook(task.hooks.preDeploy, deployDir); + } + + // 4.
Pull images + this.log("Pulling images..."); + const pullResult = await this.runCompose(deployDir, ["pull"]); + if (!pullResult.success) { + throw new Error(`Failed to pull images: ${pullResult.stderr}`); + } + + // 5. Verify digests + await this.verifyDigests(composePath, config.expectedDigests); + + // 6. Deploy + this.log("Deploying services..."); + const upResult = await this.runCompose(deployDir, [ + "up", "-d", + "--remove-orphans", + "--force-recreate" + ]); + + if (!upResult.success) { + throw new Error(`Failed to deploy: ${upResult.stderr}`); + } + + // 7. Wait for services to be healthy + if (config.healthCheck) { + this.log("Waiting for services to be healthy..."); + const healthy = await this.waitForServicesHealthy( + deployDir, + config.healthCheck.timeout + ); + + if (!healthy) { + // Rollback + await this.rollbackToBackup(deployDir); + throw new HealthCheckFailedError("Services failed health check"); + } + } + + // 8. Run post-deploy hook + if (task.hooks?.postDeploy) { + await this.runHook(task.hooks.postDeploy, deployDir); + } + + // 9. 
Write version sticker + await this.writeSticker(config.sticker, deployDir); + + return { + success: true, + logs: this.getLogs(), + durationMs: this.getDuration(), + }; + } +} +``` + +### Digest Verification + +```typescript +private async verifyDigests( + composePath: string, + expectedDigests: Record<string, string> +): Promise<void> { + const composeContent = yaml.parse(await fs.readFile(composePath, "utf-8")); + + for (const [service, expectedDigest] of Object.entries(expectedDigests)) { + const serviceConfig = composeContent.services[service]; + if (!serviceConfig) { + throw new Error(`Service ${service} not found in compose file`); + } + + const image = serviceConfig.image; + if (!image.includes("@sha256:")) { + throw new Error(`Service ${service} image not pinned to digest: ${image}`); + } + + const actualDigest = image.split("@")[1]; + if (actualDigest !== expectedDigest) { + throw new DigestMismatchError( + `Service ${service}: expected ${expectedDigest}, got ${actualDigest}` + ); + } + } +} +``` + +--- + +## Security Considerations + +1. **Digest Verification:** All deployments verify image digests before execution +2. **Credential Encryption:** Credentials are encrypted in transit and at rest +3. **mTLS Communication:** All agent-server communication uses mutual TLS +4. **Hook Sandboxing:** Pre/post-deploy hooks run in isolated environments +5.
**Audit Logging:** All deployment actions are logged with actor context + +--- + +## See Also + +- [Agents Module](../modules/agents.md) +- [Agent Security](../security/agent-security.md) +- [Deployment Orchestrator](../modules/deploy-orchestrator.md) +- [Agentless Deployment](agentless.md) diff --git a/docs/modules/release-orchestrator/deployment/agentless.md b/docs/modules/release-orchestrator/deployment/agentless.md new file mode 100644 index 000000000..87d7d88de --- /dev/null +++ b/docs/modules/release-orchestrator/deployment/agentless.md @@ -0,0 +1,427 @@ +# Agentless Deployment (SSH/WinRM) + +> Agentless deployment using SSH and WinRM for remote execution without installing agents. + +**Status:** Planned (not yet implemented) +**Source:** [Architecture Advisory Section 10.4](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md) +**Related Modules:** [Agents Module](../modules/agents.md), [Deploy Orchestrator](../modules/deploy-orchestrator.md) +**Sprints:** [108_004 SSH Agent](../../../../implplan/SPRINT_20260110_108_004_AGENTS_ssh.md), [108_005 WinRM Agent](../../../../implplan/SPRINT_20260110_108_005_AGENTS_winrm.md) + +## Overview + +Agentless deployment enables deployment to targets without requiring a pre-installed agent. The orchestrator connects directly to targets using SSH (Linux/Unix) or WinRM (Windows) to execute deployment commands. 
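Which executor handles a target, and on which port, can be illustrated with a small resolver. The helper and field names are hypothetical (they follow the configuration examples later on this page); 5985 is the standard WinRM plain-HTTP port.

```typescript
// Illustrative endpoint resolution for agentless targets. Defaults:
// SSH on 22; WinRM on 5986 (HTTPS) or 5985 (plain HTTP).
type AgentlessTarget =
  | { type: "ssh"; host: string; port?: number }
  | { type: "winrm"; host: string; port?: number; useHttps?: boolean };

function resolveEndpoint(t: AgentlessTarget): { host: string; port: number } {
  if (t.type === "ssh") {
    return { host: t.host, port: t.port ?? 22 };
  }
  const winrmDefault = t.useHttps === false ? 5985 : 5986; // HTTPS unless disabled
  return { host: t.host, port: t.port ?? winrmDefault };
}

console.log(resolveEndpoint({ type: "ssh", host: "192.168.1.100" }).port); // 22
console.log(resolveEndpoint({ type: "winrm", host: "192.168.1.200", useHttps: true }).port); // 5986
```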
+ +--- + +## SSH Remote Executor + +### Capabilities + +- SSH key-based authentication +- File transfer via SFTP +- Remote command execution +- Docker operations over SSH +- Script execution +- Backup and rollback + +### Connection Management + +```typescript +class SSHRemoteExecutor implements TargetExecutor { + private ssh: SSHClient; + + async connect(config: SSHConnectionConfig): Promise<void> { + const privateKey = await this.secrets.getSecret(config.privateKeyRef); + + this.ssh = new SSHClient(); + await this.ssh.connect({ + host: config.host, + port: config.port || 22, + username: config.username, + privateKey: privateKey.value, + readyTimeout: config.connectionTimeout || 30000, + keepaliveInterval: 10000, + }); + } +} +``` + +### Deploy Task Flow + +```typescript +async deploy(task: DeployTaskPayload): Promise<TaskResult> { + const { artifacts, config } = task; + const deployDir = config.deploymentDirectory; + + try { + // 1. Ensure deployment directory exists + await this.exec(`mkdir -p ${deployDir}`); + await this.exec(`mkdir -p ${deployDir}/.stella-backup`); + + // 2. Backup current deployment + await this.exec(`cp -r ${deployDir}/* ${deployDir}/.stella-backup/ 2>/dev/null || true`); + + // 3. Upload artifacts + for (const artifact of artifacts) { + const content = await this.fetchArtifact(artifact); + const remotePath = path.join(deployDir, artifact.name); + await this.uploadFile(content, remotePath); + } + + // 4. Run pre-deploy hook + if (task.hooks?.preDeploy) { + await this.runRemoteHook(task.hooks.preDeploy, deployDir); + } + + // 5.
Execute deployment script + const deployScript = artifacts.find(a => a.type === "deploy_script"); + if (deployScript) { + const scriptPath = path.join(deployDir, deployScript.name); + await this.exec(`chmod +x ${scriptPath}`); + + const result = await this.exec(scriptPath, { + cwd: deployDir, + timeout: config.deploymentTimeout, + env: config.environment, + }); + + if (result.exitCode !== 0) { + throw new DeploymentError(`Deploy script failed: ${result.stderr}`); + } + } + + // 6. Run post-deploy hook + if (task.hooks?.postDeploy) { + await this.runRemoteHook(task.hooks.postDeploy, deployDir); + } + + // 7. Health check + if (config.healthCheck) { + const healthy = await this.runHealthCheck(config.healthCheck); + if (!healthy) { + await this.rollback(task); + throw new HealthCheckFailedError("Health check failed"); + } + } + + // 8. Write version sticker + await this.writeSticker(config.sticker, deployDir); + + // 9. Cleanup backup + await this.exec(`rm -rf ${deployDir}/.stella-backup`); + + return { + success: true, + logs: this.getLogs(), + durationMs: this.getDuration(), + }; + + } finally { + this.ssh.end(); + } +} +``` + +### Command Execution + +```typescript +private async exec( + command: string, + options?: ExecOptions +): Promise<{ exitCode: number; stdout: string; stderr: string }> { + return new Promise((resolve, reject) => { + const timeout = options?.timeout || 60000; + let stdout = ""; + let stderr = ""; + + this.ssh.exec(command, { cwd: options?.cwd }, (err, stream) => { + if (err) { + reject(err); + return; + } + + const timer = setTimeout(() => { + stream.close(); + reject(new TimeoutError(`Command timed out after ${timeout}ms`)); + }, timeout); + + stream.on("data", (data: Buffer) => { + stdout += data.toString(); + this.log(data.toString()); + }); + + stream.stderr.on("data", (data: Buffer) => { + stderr += data.toString(); + this.log(`[stderr] ${data.toString()}`); + }); + + stream.on("close", (code: number) => { + clearTimeout(timer); + resolve({ exitCode: code, stdout, stderr }); + }); + }); +
}); +} +``` + +### File Upload via SFTP + +```typescript +private async uploadFile(content: Buffer | string, remotePath: string): Promise<void> { + return new Promise((resolve, reject) => { + this.ssh.sftp((err, sftp) => { + if (err) { + reject(err); + return; + } + + const writeStream = sftp.createWriteStream(remotePath); + writeStream.on("close", () => resolve()); + writeStream.on("error", reject); + writeStream.end(content); + }); + }); +} +``` + +### Rollback + +```typescript +async rollback(task: RollbackTaskPayload): Promise<TaskResult> { + const deployDir = task.config.deploymentDirectory; + + // Restore from backup + await this.exec(`rm -rf ${deployDir}/*`); + await this.exec(`cp -r ${deployDir}/.stella-backup/* ${deployDir}/`); + + // Re-run deployment from backup + const deployScript = path.join(deployDir, "deploy.sh"); + await this.exec(deployScript, { cwd: deployDir }); + + return { + success: true, + logs: this.getLogs(), + durationMs: this.getDuration(), + }; +} +``` + +--- + +## WinRM Remote Executor + +### Capabilities + +- NTLM/Kerberos authentication +- PowerShell script execution +- File transfer via base64 encoding +- Windows container operations +- Windows service management + +### Connection Management + +```typescript +class WinRMRemoteExecutor implements TargetExecutor { + private winrm: WinRMClient; + + async connect(config: WinRMConnectionConfig): Promise<void> { + const credential = await this.secrets.getSecret(config.credentialRef); + + this.winrm = new WinRMClient({ + host: config.host, + port: config.port || 5986, + username: credential.username, + password: credential.password, + protocol: config.useHttps ? "https" : "http", + authentication: config.authType || "ntlm", // ntlm, kerberos, basic + }); + + await this.winrm.openShell(); + } +} +``` + +### Deploy Task Flow + +```typescript +async deploy(task: DeployTaskPayload): Promise<TaskResult> { + const { artifacts, config } = task; + const deployDir = config.deploymentDirectory; + + try { + // 1.
Ensure deployment directory exists + await this.execPowerShell(` + if (-not (Test-Path "${deployDir}")) { + New-Item -ItemType Directory -Path "${deployDir}" -Force + } + if (-not (Test-Path "${deployDir}\\.stella-backup")) { + New-Item -ItemType Directory -Path "${deployDir}\\.stella-backup" -Force + } + `); + + // 2. Backup current deployment + await this.execPowerShell(` + Get-ChildItem "${deployDir}" -Exclude ".stella-backup" | + Copy-Item -Destination "${deployDir}\\.stella-backup" -Recurse -Force + `); + + // 3. Upload artifacts + for (const artifact of artifacts) { + const content = await this.fetchArtifact(artifact); + const remotePath = `${deployDir}\\${artifact.name}`; + await this.uploadFile(content, remotePath); + } + + // 4. Run pre-deploy hook + if (task.hooks?.preDeploy) { + await this.runRemoteHook(task.hooks.preDeploy, deployDir); + } + + // 5. Execute deployment script + const deployScript = artifacts.find(a => a.type === "deploy_script"); + if (deployScript) { + const scriptPath = `${deployDir}\\${deployScript.name}`; + + const result = await this.execPowerShell(` + Set-Location "${deployDir}" + & "${scriptPath}" + exit $LASTEXITCODE + `, { timeout: config.deploymentTimeout }); + + if (result.exitCode !== 0) { + throw new DeploymentError(`Deploy script failed: ${result.stderr}`); + } + } + + // 6. Run post-deploy hook + if (task.hooks?.postDeploy) { + await this.runRemoteHook(task.hooks.postDeploy, deployDir); + } + + // 7. Health check + if (config.healthCheck) { + const healthy = await this.runHealthCheck(config.healthCheck); + if (!healthy) { + await this.rollback(task); + throw new HealthCheckFailedError("Health check failed"); + } + } + + // 8. Write version sticker + await this.writeSticker(config.sticker, deployDir); + + // 9. 
Cleanup backup + await this.execPowerShell(` + Remove-Item -Path "${deployDir}\\.stella-backup" -Recurse -Force + `); + + return { + success: true, + logs: this.getLogs(), + durationMs: this.getDuration(), + }; + + } finally { + this.winrm.closeShell(); + } +} +``` + +### PowerShell Execution + +```typescript +private async execPowerShell( + script: string, + options?: ExecOptions +): Promise<{ exitCode: number; stdout: string; stderr: string }> { + const encoded = Buffer.from(script, "utf16le").toString("base64"); + return this.winrm.runCommand( + `powershell -EncodedCommand ${encoded}`, + { timeout: options?.timeout || 60000 } + ); +} +``` + +### File Upload + +```typescript +private async uploadFile(content: Buffer | string, remotePath: string): Promise<void> { + // Use PowerShell to write file content + const base64Content = Buffer.from(content).toString("base64"); + + await this.execPowerShell(` + $bytes = [Convert]::FromBase64String("${base64Content}") + [IO.File]::WriteAllBytes("${remotePath}", $bytes) + `); +} +``` + +--- + +## Security Considerations + +### SSH Security + +1. **Key-Based Authentication:** Always use SSH keys, never passwords +2. **Key Rotation:** Regularly rotate SSH keys +3. **Bastion Hosts:** Use jump hosts for network isolation +4. **Connection Timeouts:** Enforce strict connection timeouts +5. **Known Hosts:** Verify host fingerprints + +### WinRM Security + +1. **HTTPS Required:** Always use WinRM over HTTPS in production +2. **Certificate Validation:** Validate server certificates +3. **Kerberos Preferred:** Use Kerberos when available, NTLM as fallback +4. **Credential Protection:** Store credentials in vault +5.
**Session Cleanup:** Always close sessions after use + +--- + +## Configuration Examples + +### SSH Target Configuration + +```yaml +target: + name: web-server-01 + type: ssh + connection: + host: 192.168.1.100 + port: 22 + username: deploy + privateKeyRef: vault://ssh-keys/deploy-key + deployment: + directory: /opt/myapp + healthCheck: + command: curl -f http://localhost:8080/health + timeout: 30 +``` + +### WinRM Target Configuration + +```yaml +target: + name: windows-server-01 + type: winrm + connection: + host: 192.168.1.200 + port: 5986 + useHttps: true + authType: kerberos + credentialRef: vault://windows-creds/deploy-user + deployment: + directory: C:\Apps\MyApp + healthCheck: + command: Invoke-WebRequest -Uri http://localhost:8080/health -UseBasicParsing + timeout: 30 +``` + +--- + +## See Also + +- [Agent-Based Deployment](agent-based.md) +- [Agents Module](../modules/agents.md) +- [Deployment Orchestrator](../modules/deploy-orchestrator.md) +- [Security Overview](../security/overview.md) diff --git a/docs/modules/release-orchestrator/deployment/artifacts.md b/docs/modules/release-orchestrator/deployment/artifacts.md new file mode 100644 index 000000000..95f8c910f --- /dev/null +++ b/docs/modules/release-orchestrator/deployment/artifacts.md @@ -0,0 +1,308 @@ +# Artifact Generation + +## Overview + +Every deployment generates immutable artifacts that enable reproducibility, audit, and rollback. + +## Generated Artifacts + +### 1. Compose Lock File + +**File:** `compose.stella.lock.yml` + +A Docker Compose file with all image references pinned to specific digests. + +```yaml +# compose.stella.lock.yml +# Generated by Stella Ops - DO NOT EDIT +# Release: myapp-v2.3.1 +# Generated: 2026-01-10T14:30:00Z +# Generator: stella-artifact-generator@1.5.0 + +version: "3.8" + +services: + api: + image: registry.example.com/myapp/api@sha256:abc123... 
+ # Original tag: v2.3.1 + deploy: + replicas: 2 + environment: + - DATABASE_URL=${DATABASE_URL} + - REDIS_URL=${REDIS_URL} + labels: + stella.component.id: "comp-api-uuid" + stella.release.id: "rel-uuid" + stella.digest: "sha256:abc123..." + + worker: + image: registry.example.com/myapp/worker@sha256:def456... + # Original tag: v2.3.1 + deploy: + replicas: 1 + labels: + stella.component.id: "comp-worker-uuid" + stella.release.id: "rel-uuid" + stella.digest: "sha256:def456..." + +# Stella metadata +x-stella: + release: + id: "rel-uuid" + name: "myapp-v2.3.1" + created_at: "2026-01-10T14:00:00Z" + environment: + id: "env-uuid" + name: "production" + deployment: + id: "deploy-uuid" + started_at: "2026-01-10T14:30:00Z" + checksums: + sha256: "checksum-of-this-file" +``` + +### 2. Version Sticker + +**File:** `stella.version.json` + +Metadata file placed on deployment targets indicating current deployment state. + +```json +{ + "version": "1.0", + "generatedAt": "2026-01-10T14:35:00Z", + "generator": "stella-artifact-generator@1.5.0", + + "release": { + "id": "rel-uuid", + "name": "myapp-v2.3.1", + "createdAt": "2026-01-10T14:00:00Z", + "components": [ + { + "name": "api", + "digest": "sha256:abc123...", + "semver": "2.3.1", + "tag": "v2.3.1" + }, + { + "name": "worker", + "digest": "sha256:def456...", + "semver": "2.3.1", + "tag": "v2.3.1" + } + ] + }, + + "deployment": { + "id": "deploy-uuid", + "promotionId": "promo-uuid", + "environmentId": "env-uuid", + "environmentName": "production", + "targetId": "target-uuid", + "targetName": "prod-web-01", + "strategy": "rolling", + "startedAt": "2026-01-10T14:30:00Z", + "completedAt": "2026-01-10T14:35:00Z" + }, + + "deployer": { + "userId": "user-uuid", + "userName": "john.doe", + "agentId": "agent-uuid", + "agentName": "prod-agent-01" + }, + + "previous": { + "releaseId": "prev-rel-uuid", + "releaseName": "myapp-v2.3.0", + "digest": "sha256:789..." 
+ }, + + "signature": "base64-encoded-signature", + "signatureAlgorithm": "RS256", + "signerKeyRef": "stella/signing/prod-key-2026" +} +``` + +### 3. Evidence Packet + +**File:** Evidence stored in database (exportable as JSON/PDF) + +See [Evidence Schema](../appendices/evidence-schema.md) for full specification. + +### 4. Deployment Script (Optional) + +**File:** `deploy.stella.script.dll` or `deploy.stella.sh` + +When deployments use C# or shell scripts with hooks: + +```csharp +// deploy.stella.csx (source, compiled to DLL) +#r "nuget: StellaOps.Sdk, 1.0.0" + +using StellaOps.Sdk; + +// Pre-deploy hook +await Context.RunPreDeployHook(async (ctx) => { + await ctx.ExecuteCommand("./scripts/backup-database.sh"); + await ctx.HealthCheck("/ready", timeout: 30); +}); + +// Deploy +await Context.Deploy(); + +// Post-deploy hook +await Context.RunPostDeployHook(async (ctx) => { + await ctx.ExecuteCommand("./scripts/warm-cache.sh"); + await ctx.Notify("slack", "Deployment complete"); +}); +``` + +## Artifact Storage + +### Storage Structure + +``` +artifacts/ +├── {tenant_id}/ +│ ├── {deployment_id}/ +│ │ ├── compose.stella.lock.yml +│ │ ├── deploy.stella.script.dll (if applicable) +│ │ ├── deploy.stella.script.csx (source) +│ │ ├── manifest.json +│ │ └── checksums.sha256 +│ └── ... +└── ... +``` + +### Manifest File + +```json +{ + "version": "1.0", + "deploymentId": "deploy-uuid", + "createdAt": "2026-01-10T14:30:00Z", + "artifacts": [ + { + "name": "compose.stella.lock.yml", + "type": "compose-lock", + "size": 2048, + "sha256": "abc123..." + }, + { + "name": "deploy.stella.script.dll", + "type": "script-compiled", + "size": 8192, + "sha256": "def456..." 
+ } + ], + "totalSize": 10240, + "signature": "base64-signature" +} +``` + +## Artifact Generation Process + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ ARTIFACT GENERATION FLOW │ +│ │ +│ ┌─────────────────┐ │ +│ │ Promotion │ │ +│ │ Approved │ │ +│ └────────┬────────┘ │ +│ │ │ +│ ▼ │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ ARTIFACT GENERATOR │ │ +│ │ │ │ +│ │ 1. Load release bundle (components, digests) │ │ +│ │ 2. Load environment configuration (variables, secrets refs) │ │ +│ │ 3. Load workflow template (hooks, scripts) │ │ +│ │ 4. Generate compose.stella.lock.yml │ │ +│ │ 5. Compile scripts (if any) │ │ +│ │ 6. Generate version sticker template │ │ +│ │ 7. Compute checksums │ │ +│ │ 8. Sign artifacts │ │ +│ │ 9. Store in artifact storage │ │ +│ │ │ │ +│ └────────────────────────────┬────────────────────────────────────────┘ │ +│ │ │ +│ ▼ │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ DEPLOYMENT ORCHESTRATOR │ │ +│ │ │ │ +│ │ Artifacts distributed to targets via agents │ │ +│ └─────────────────────────────────────────────────────────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +## Artifact Properties + +### Immutability + +Once generated, artifacts are never modified: +- Content-addressed storage (hash in path/metadata) +- No overwrite capability +- Append-only storage pattern + +### Integrity + +All artifacts are: +- Checksummed (SHA-256) +- Signed with deployment key +- Verifiable at deployment time + +### Retention + +| Environment | Retention Period | +|-------------|------------------| +| Development | 30 days | +| Staging | 90 days | +| Production | 7 years (compliance) | + +## API Operations + +```yaml +# List artifacts for deployment +GET /api/v1/deployment-jobs/{id}/artifacts +Response: Artifact[] + +# Download specific artifact +GET 
/api/v1/deployment-jobs/{id}/artifacts/{name} +Response: binary + +# Get artifact manifest +GET /api/v1/deployment-jobs/{id}/artifacts/manifest +Response: ArtifactManifest + +# Verify artifact integrity +POST /api/v1/deployment-jobs/{id}/artifacts/{name}/verify +Response: { valid: boolean, checksum: string, signature: string } +``` + +## Drift Detection + +Version stickers enable drift detection: + +```typescript +interface DriftCheck { + targetId: UUID; + expectedSticker: VersionSticker; + actualSticker: VersionSticker | null; + driftDetected: boolean; + driftType?: "missing" | "corrupted" | "mismatch"; + details?: { + expectedDigest: string; + actualDigest: string; + field: string; + }; +} +``` + +## References + +- [Deployment Overview](overview.md) +- [Deployment Strategies](strategies.md) +- [Evidence Schema](../appendices/evidence-schema.md) diff --git a/docs/modules/release-orchestrator/deployment/overview.md b/docs/modules/release-orchestrator/deployment/overview.md new file mode 100644 index 000000000..e15aa58ad --- /dev/null +++ b/docs/modules/release-orchestrator/deployment/overview.md @@ -0,0 +1,671 @@ +# Deployment Overview + +## Purpose + +The Deployment system executes the actual deployment of releases to target environments, managing deployment jobs, tasks, artifact generation, and rollback capabilities. 
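The job and task model described here is status-driven; as an illustrative sketch (the transition table itself is an assumption of this example, not normative — only the `JobStatus` names come from this spec), job statuses can be guarded like this:

```typescript
// Illustrative sketch only: a minimal guard for deployment-job status
// transitions. The status names mirror the JobStatus union in this spec;
// the allowed-transition table is an example, not a normative definition.
type JobStatus =
  | "pending" | "preparing" | "running" | "completing"
  | "completed" | "failed" | "rolling_back" | "rolled_back";

const transitions: Record<JobStatus, JobStatus[]> = {
  pending: ["preparing"],
  preparing: ["running", "failed"],
  running: ["completing", "failed", "rolling_back"],
  completing: ["completed", "failed"],
  completed: [],
  failed: ["rolling_back"],
  rolling_back: ["rolled_back", "failed"],
  rolled_back: []
};

// Returns true when the move from one status to the next is permitted.
function canTransition(from: JobStatus, to: JobStatus): boolean {
  return transitions[from].includes(to);
}
```

A job manager can call such a guard before persisting a status change, so illegal jumps (e.g. `pending` straight to `completed`) are rejected early.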
+ +## Deployment Architecture + +``` + DEPLOYMENT ARCHITECTURE + + ┌─────────────────────────────────────────────────────────────────────────────┐ + │ DEPLOY ORCHESTRATOR │ + │ │ + │ ┌─────────────────────────────────────────────────────────────────────┐ │ + │ │ DEPLOYMENT JOB MANAGER │ │ + │ │ │ │ + │ │ Promotion ───► Create Job ───► Plan Tasks ───► Execute Tasks │ │ + │ │ │ │ + │ └─────────────────────────────────────────────────────────────────────┘ │ + │ │ │ + │ ┌───────────────┼───────────────┐ │ + │ │ │ │ │ + │ ▼ ▼ ▼ │ + │ ┌─────────────────────┐ ┌─────────────────┐ ┌─────────────────────┐ │ + │ │ TARGET EXECUTOR │ │ RUNNER EXECUTOR │ │ ARTIFACT GENERATOR │ │ + │ │ │ │ │ │ │ │ + │ │ - Task dispatch │ │ - Agent tasks │ │ - Compose files │ │ + │ │ - Status tracking │ │ - SSH tasks │ │ - Env configs │ │ + │ │ - Log aggregation │ │ - API tasks │ │ - Manifests │ │ + │ └─────────────────────┘ └─────────────────┘ └─────────────────────┘ │ + │ │ │ + └─────────────────────────────────────────────────────────────────────────────┘ + │ + ┌────────────────────────────┼────────────────────────────┐ + │ │ │ + ▼ ▼ ▼ + ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ + │ Agent │ │ Agentless │ │ API │ + │ Execution │ │ Execution │ │ Execution │ + │ │ │ │ │ │ + │ Docker, │ │ SSH, │ │ ECS, │ + │ Compose │ │ WinRM │ │ Nomad │ + └─────────────┘ └─────────────┘ └─────────────┘ +``` + +## Deployment Flow + +### Standard Deployment Flow + +``` + DEPLOYMENT FLOW + + Promotion Deployment Task Agent/Target + Approved Job Execution + │ │ │ │ + │ Create Job │ │ │ + ├───────────────►│ │ │ + │ │ │ │ + │ │ Generate │ │ + │ │ Artifacts │ │ + │ ├────────────────►│ │ + │ │ │ │ + │ │ Create Tasks │ │ + │ │ per Target │ │ + │ ├────────────────►│ │ + │ │ │ │ + │ │ │ Dispatch Task │ + │ │ ├────────────────►│ + │ │ │ │ + │ │ │ Execute │ + │ │ │ (Pull, Deploy) │ + │ │ │ │ + │ │ │ Report Status │ + │ │ │◄────────────────┤ + │ │ │ │ + │ │ Aggregate │ │ + │ │ Results │ │ + │ │◄────────────────┤ │ + │ │ │ 
│ + │ Job Complete │ │ │ + │◄───────────────┤ │ │ + │ │ │ │ +``` + +## Deployment Job + +### Job Entity + +```typescript +interface DeploymentJob { + id: UUID; + promotionId: UUID; + releaseId: UUID; + environmentId: UUID; + + // Execution configuration + strategy: DeploymentStrategy; + parallelism: number; + + // Status tracking + status: JobStatus; + startedAt?: DateTime; + completedAt?: DateTime; + + // Artifacts + artifacts: GeneratedArtifact[]; + + // Rollback reference + rollbackOf?: UUID; // If this is a rollback job + previousJobId?: UUID; // Previous successful job + + // Tasks + tasks: DeploymentTask[]; +} + +type JobStatus = + | "pending" + | "preparing" + | "running" + | "completing" + | "completed" + | "failed" + | "rolling_back" + | "rolled_back"; + +type DeploymentStrategy = + | "all-at-once" + | "rolling" + | "canary" + | "blue-green"; +``` + +### Job State Machine + +``` + JOB STATE MACHINE + + ┌──────────┐ + │ PENDING │ + └────┬─────┘ + │ start() + ▼ + ┌──────────┐ + │PREPARING │ + │ │ + │ Generate │ + │ artifacts│ + └────┬─────┘ + │ + ▼ + ┌──────────┐ + │ RUNNING │◄────────────────┐ + │ │ │ + │ Execute │ │ + │ tasks │ │ + └────┬─────┘ │ + │ │ + ┌───────────────┼───────────────┐ │ + │ │ │ │ + ▼ ▼ ▼ │ + ┌──────────┐ ┌──────────┐ ┌──────────┐ │ + │COMPLETING│ │ FAILED │ │ ROLLING │ │ + │ │ │ │ │ BACK │──┘ + │ Verify │ │ │ │ │ + │ health │ │ │ │ │ + └────┬─────┘ └────┬─────┘ └────┬─────┘ + │ │ │ + ▼ │ ▼ + ┌──────────┐ │ ┌──────────┐ + │COMPLETED │ │ │ ROLLED │ + └──────────┘ │ │ BACK │ + │ └──────────┘ + │ + ▼ + [Failure + handling] +``` + +## Deployment Task + +### Task Entity + +```typescript +interface DeploymentTask { + id: UUID; + jobId: UUID; + targetId: UUID; + + // What to deploy + componentId: UUID; + digest: string; + + // Execution + status: TaskStatus; + agentId?: UUID; + startedAt?: DateTime; + completedAt?: DateTime; + + // Results + logs: string; + previousDigest?: string; // For rollback + error?: string; + + // Retry tracking + 
attemptNumber: number;
+  maxAttempts: number;
+}
+
+type TaskStatus =
+  | "pending"
+  | "queued"
+  | "dispatched"
+  | "running"
+  | "verifying"
+  | "succeeded"
+  | "failed"
+  | "retrying";
+```
+
+### Task Dispatch
+
+```typescript
+class TaskDispatcher {
+  async dispatchTask(task: DeploymentTask): Promise<void> {
+    const target = await this.targetRepository.get(task.targetId);
+
+    switch (target.executionModel) {
+      case "agent":
+        await this.dispatchToAgent(task, target);
+        break;
+
+      case "ssh":
+        await this.dispatchViaSsh(task, target);
+        break;
+
+      case "api":
+        await this.dispatchViaApi(task, target);
+        break;
+    }
+  }
+
+  private async dispatchToAgent(
+    task: DeploymentTask,
+    target: Target
+  ): Promise<void> {
+    // Find available agent for target
+    const agent = await this.agentManager.findAgentForTarget(target);
+
+    if (!agent) {
+      throw new NoAgentAvailableError(target.id);
+    }
+
+    // Create task payload
+    const payload: AgentTaskPayload = {
+      taskId: task.id,
+      targetId: target.id,
+      action: "deploy",
+      digest: task.digest,
+      config: target.connection,
+      credentials: await this.fetchTaskCredentials(target)
+    };
+
+    // Dispatch to agent
+    await this.agentClient.dispatchTask(agent.id, payload);
+
+    // Update task status
+    task.status = "dispatched";
+    task.agentId = agent.id;
+    await this.taskRepository.update(task);
+  }
+}
+```
+
+## Generated Artifacts
+
+### Artifact Types
+
+| Type | Description | Format |
+|------|-------------|--------|
+| `compose-file` | Docker Compose file | YAML |
+| `compose-lock` | Pinned compose file | YAML |
+| `env-file` | Environment variables | .env |
+| `systemd-unit` | Systemd service unit | .service |
+| `nginx-config` | Nginx configuration | .conf |
+| `manifest` | Deployment manifest | JSON |
+
+### Compose Lock Generation
+
+```typescript
+interface ComposeLock {
+  version: string;
+  services: Record<string, LockedService>;
+  generated: {
+    releaseId: string;
+    promotionId: string;
+    timestamp: string;
+    digest: string; // Hash of this file
+  };
+}
+
+interface LockedService {
+  image: string; // Full image reference with digest
+  environment?: Record<string, string>;
+  labels: Record<string, string>;
+}
+
+class ComposeArtifactGenerator {
+  async generateLock(
+    release: Release,
+    target: Target,
+    template: ComposeTemplate
+  ): Promise<ComposeLock> {
+    const services: Record<string, LockedService> = {};
+
+    for (const [serviceName, serviceConfig] of Object.entries(template.services)) {
+      // Find component for this service
+      const componentDigest = release.components.find(
+        c => c.name === serviceConfig.componentName
+      );
+
+      if (!componentDigest) {
+        throw new Error(`No component found for service ${serviceName}`);
+      }
+
+      // Build locked image reference
+      const imageRef = `${componentDigest.repository}@${componentDigest.digest}`;
+
+      services[serviceName] = {
+        image: imageRef,
+        environment: {
+          ...serviceConfig.environment,
+          STELLA_RELEASE_ID: release.id,
+          STELLA_DIGEST: componentDigest.digest
+        },
+        labels: {
+          "stella.release.id": release.id,
+          "stella.component.name": componentDigest.name,
+          "stella.digest": componentDigest.digest,
+          "stella.deployed.at": new Date().toISOString()
+        }
+      };
+    }
+
+    const lock: ComposeLock = {
+      version: "3.8",
+      services,
+      generated: {
+        releaseId: release.id,
+        promotionId: target.promotionId,
+        timestamp: new Date().toISOString(),
+        digest: "" // Computed below
+      }
+    };
+
+    // Compute content hash (yaml = any YAML serializer, e.g. the yaml npm package)
+    const content = yaml.stringify(lock);
+    lock.generated.digest = crypto.createHash("sha256").update(content).digest("hex");
+
+    return lock;
+  }
+}
+```
+
+## Deployment Execution
+
+### Execution Models
+
+| Model | Description | Use Case |
+|-------|-------------|----------|
+| `agent` | Stella agent on target | Docker hosts, servers |
+| `ssh` | SSH-based agentless | Unix servers |
+| `winrm` | WinRM-based agentless | Windows servers |
+| `api` | API-based | ECS, Nomad, K8s |
+
+### Agent-Based Execution
+
+```typescript
+class AgentExecutor {
+  async execute(task: DeploymentTask): Promise<TaskResult> {
+    const agent = await 
this.agentManager.get(task.agentId);
+    const target = await this.targetRepository.get(task.targetId);
+
+    // Prepare task payload with secrets
+    const payload: TaskPayload = {
+      taskId: task.id,
+      targetId: target.id,
+      action: "deploy",
+      digest: task.digest,
+      config: target.connection,
+      artifacts: await this.getArtifacts(task.jobId),
+      credentials: await this.secretsManager.fetchForTask(target)
+    };
+
+    // Dispatch to agent
+    const taskRef = await this.agentClient.dispatchTask(agent.id, payload);
+
+    // Wait for completion
+    const result = await this.waitForTaskCompletion(taskRef, task.timeout);
+
+    return result;
+  }
+
+  private async waitForTaskCompletion(
+    taskRef: TaskReference,
+    timeout: number
+  ): Promise<TaskResult> {
+    const deadline = Date.now() + timeout * 1000;
+
+    while (Date.now() < deadline) {
+      const status = await this.agentClient.getTaskStatus(taskRef);
+
+      if (status.completed) {
+        return {
+          success: status.success,
+          logs: status.logs,
+          deployedDigest: status.deployedDigest,
+          error: status.error
+        };
+      }
+
+      await sleep(1000);
+    }
+
+    throw new TimeoutError(`Task did not complete within ${timeout} seconds`);
+  }
+}
+```
+
+### SSH-Based Execution
+
+```typescript
+class SshExecutor {
+  async execute(task: DeploymentTask): Promise<TaskResult> {
+    const target = await this.targetRepository.get(task.targetId);
+    const sshConfig = target.connection as SshConnectionConfig;
+
+    // Get SSH credentials from vault
+    const creds = await this.secretsManager.fetchSshCredentials(
+      sshConfig.credentialRef
+    );
+
+    // Connect via SSH
+    const ssh = new NodeSSH();
+    await ssh.connect({
+      host: sshConfig.host,
+      port: sshConfig.port || 22,
+      username: creds.username,
+      privateKey: creds.privateKey
+    });
+
+    try {
+      // Upload artifacts
+      const artifacts = await this.getArtifacts(task.jobId);
+      for (const artifact of artifacts) {
+        await ssh.putFile(artifact.localPath, artifact.remotePath);
+      }
+
+      // Execute deployment script
+      const result = await ssh.execCommand(
+        
this.buildDeployCommand(task, target),
+        { cwd: sshConfig.workDir }
+      );
+
+      return {
+        success: result.code === 0,
+        logs: `${result.stdout}\n${result.stderr}`,
+        error: result.code !== 0 ? result.stderr : undefined
+      };
+    } finally {
+      ssh.dispose();
+    }
+  }
+
+  private buildDeployCommand(task: DeploymentTask, target: Target): string {
+    // Build deployment command based on target type
+    switch (target.targetType) {
+      case "compose_host":
+        return `cd ${target.connection.workDir} && docker-compose pull && docker-compose up -d`;
+
+      case "docker_host":
+        // Remove the stopped container before re-running, otherwise the name is still taken
+        return `docker pull ${task.digest} && docker stop ${target.containerName} && docker rm ${target.containerName} && docker run -d --name ${target.containerName} ${task.digest}`;
+
+      default:
+        throw new Error(`Unsupported target type: ${target.targetType}`);
+    }
+  }
+}
+```
+
+## Health Verification
+
+```typescript
+interface HealthCheckConfig {
+  type: "http" | "tcp" | "command";
+  timeout: number;
+  retries: number;
+  interval: number;
+
+  // HTTP-specific
+  path?: string;
+  expectedStatus?: number;
+  expectedBody?: string;
+
+  // TCP-specific
+  port?: number;
+
+  // Command-specific
+  command?: string;
+}
+
+class HealthVerifier {
+  async verify(
+    target: Target,
+    config: HealthCheckConfig
+  ): Promise<HealthCheckResult> {
+    let lastError: Error | undefined;
+
+    for (let attempt = 0; attempt < config.retries; attempt++) {
+      try {
+        const result = await this.performCheck(target, config);
+
+        if (result.healthy) {
+          return result;
+        }
+
+        lastError = new Error(result.message);
+      } catch (error) {
+        lastError = error as Error;
+      }
+
+      if (attempt < config.retries - 1) {
+        await sleep(config.interval * 1000);
+      }
+    }
+
+    return {
+      healthy: false,
+      message: lastError?.message || "Health check failed",
+      attempts: config.retries
+    };
+  }
+
+  private async performCheck(
+    target: Target,
+    config: HealthCheckConfig
+  ): Promise<HealthCheckResult> {
+    switch (config.type) {
+      case "http":
+        return this.httpCheck(target, config);
+
+      case "tcp":
+        return this.tcpCheck(target, config);
+
+      
case "command":
+        return this.commandCheck(target, config);
+    }
+  }
+
+  private async httpCheck(
+    target: Target,
+    config: HealthCheckConfig
+  ): Promise<HealthCheckResult> {
+    const url = `${target.healthEndpoint}${config.path || "/health"}`;
+
+    try {
+      const response = await fetch(url, {
+        signal: AbortSignal.timeout(config.timeout * 1000)
+      });
+
+      const healthy = response.status === (config.expectedStatus || 200);
+
+      return {
+        healthy,
+        message: healthy ? "OK" : `Status ${response.status}`,
+        statusCode: response.status
+      };
+    } catch (error) {
+      return {
+        healthy: false,
+        message: (error as Error).message
+      };
+    }
+  }
+}
+```
+
+## Rollback Management
+
+```typescript
+class RollbackManager {
+  async initiateRollback(
+    jobId: UUID,
+    reason: string
+  ): Promise<DeploymentJob> {
+    const failedJob = await this.jobRepository.get(jobId);
+    const previousJob = await this.findPreviousSuccessfulJob(
+      failedJob.environmentId,
+      failedJob.releaseId
+    );
+
+    if (!previousJob) {
+      throw new NoRollbackTargetError(jobId);
+    }
+
+    // Create rollback job
+    const rollbackJob: DeploymentJob = {
+      id: uuidv4(),
+      promotionId: failedJob.promotionId,
+      releaseId: previousJob.releaseId, // Previous release
+      environmentId: failedJob.environmentId,
+      strategy: "all-at-once", // Fast rollback
+      parallelism: 10,
+      status: "pending",
+      rollbackOf: jobId,
+      previousJobId: previousJob.id,
+      artifacts: [],
+      tasks: []
+    };
+
+    // Create tasks to restore previous state
+    for (const task of failedJob.tasks) {
+      const previousTask = previousJob.tasks.find(
+        t => t.targetId === task.targetId
+      );
+
+      if (previousTask) {
+        rollbackJob.tasks.push({
+          id: uuidv4(),
+          jobId: rollbackJob.id,
+          targetId: task.targetId,
+          componentId: previousTask.componentId,
+          // Redeploy what the previous successful job shipped
+          digest: previousTask.digest || task.previousDigest!,
+          status: "pending",
+          logs: "",
+          attemptNumber: 0,
+          maxAttempts: 3
+        });
+      }
+    }
+
+    await this.jobRepository.save(rollbackJob);
+
+    // Execute rollback
+    await this.executeJob(rollbackJob);
+
+    
return rollbackJob;
+  }
+
+  private async findPreviousSuccessfulJob(
+    environmentId: UUID,
+    excludeReleaseId: UUID
+  ): Promise<DeploymentJob | null> {
+    return this.jobRepository.findOne({
+      environmentId,
+      status: "completed",
+      releaseId: { $ne: excludeReleaseId }
+    }, {
+      orderBy: { completedAt: "desc" }
+    });
+  }
+}
+```
+
+## References
+
+- [Deployment Strategies](strategies.md)
+- [Agent-Based Deployment](agent-based.md)
+- [Agentless Deployment](agentless.md)
+- [Generated Artifacts](artifacts.md)
+- [Deploy Orchestrator Module](../modules/deploy-orchestrator.md)
diff --git a/docs/modules/release-orchestrator/deployment/strategies.md b/docs/modules/release-orchestrator/deployment/strategies.md
new file mode 100644
index 000000000..f787dfc08
--- /dev/null
+++ b/docs/modules/release-orchestrator/deployment/strategies.md
@@ -0,0 +1,656 @@
+# Deployment Strategies
+
+## Overview
+
+Release Orchestrator supports multiple deployment strategies to balance deployment speed, risk, and availability requirements.
+
+## Strategy Comparison
+
+| Strategy | Description | Risk Level | Downtime | Rollback Speed |
+|----------|-------------|------------|----------|----------------|
+| All-at-once | Deploy to all targets simultaneously | High | Brief | Fast |
+| Rolling | Deploy to targets in batches | Medium | None | Medium |
+| Canary | Deploy to subset, then expand | Low | None | Fast |
+| Blue-Green | Deploy to parallel environment | Low | None | Instant |
+
+## All-at-Once Strategy
+
+### Description
+
+Deploys to all targets simultaneously. Simple and fast, but highest risk. 
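The strategy executors in this specification fan tasks out through a concurrency-limited map, written as `pMap` (in the spirit of the p-map npm package, which is not imported explicitly in the snippets). A minimal self-contained sketch of such a helper, under that assumption:

```typescript
// Illustrative sketch only: a concurrency-limited async map resembling the
// `pMap` helper used by the executors in this spec (cf. the p-map package).
async function pMap<T, R>(
  items: T[],
  mapper: (item: T) => Promise<R>,
  opts: { concurrency: number }
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;

  // Each worker repeatedly claims the next unprocessed index; claiming is
  // safe because index bookkeeping is synchronous on the single JS thread.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await mapper(items[i]);
    }
  }

  const workerCount = Math.max(1, Math.min(opts.concurrency, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

At most `concurrency` mappers run at once, and results come back in input order, which is what the executors rely on when matching results to tasks.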
+
+```
+                    ALL-AT-ONCE DEPLOYMENT
+
+     Time 0                      Time 1
+   ┌─────────────────┐        ┌─────────────────┐
+   │ Target 1 [v1]   │        │ Target 1 [v2]   │
+   ├─────────────────┤        ├─────────────────┤
+   │ Target 2 [v1]   │  ───►  │ Target 2 [v2]   │
+   ├─────────────────┤        ├─────────────────┤
+   │ Target 3 [v1]   │        │ Target 3 [v2]   │
+   └─────────────────┘        └─────────────────┘
+```
+
+### Configuration
+
+```typescript
+interface AllAtOnceConfig {
+  strategy: "all-at-once";
+
+  // Concurrency limit (0 = unlimited)
+  maxConcurrent: number;
+
+  // Health check after deployment
+  healthCheck: HealthCheckConfig;
+
+  // Failure behavior
+  failureBehavior: "rollback" | "continue" | "pause";
+}
+
+// Example
+const config: AllAtOnceConfig = {
+  strategy: "all-at-once",
+  maxConcurrent: 0,
+  healthCheck: {
+    type: "http",
+    path: "/health",
+    timeout: 30,
+    retries: 3,
+    interval: 10
+  },
+  failureBehavior: "rollback"
+};
+```
+
+### Execution
+
+```typescript
+class AllAtOnceExecutor {
+  async execute(job: DeploymentJob, config: AllAtOnceConfig): Promise<void> {
+    const tasks = job.tasks;
+    const concurrency = config.maxConcurrent || tasks.length;
+
+    // Execute all tasks with concurrency limit (pMap: concurrency-limited map)
+    const results = await pMap(
+      tasks,
+      async (task) => {
+        try {
+          await this.executeTask(task);
+          return { taskId: task.id, success: true };
+        } catch (error) {
+          return { taskId: task.id, success: false, error };
+        }
+      },
+      { concurrency }
+    );
+
+    // Check for failures
+    const failures = results.filter(r => !r.success);
+
+    if (failures.length > 0) {
+      if (config.failureBehavior === "rollback") {
+        await this.rollbackAll(job);
+        throw new DeploymentFailedError(failures);
+      } else if (config.failureBehavior === "pause") {
+        job.status = "failed";
+        throw new DeploymentFailedError(failures);
+      }
+      // "continue" - proceed despite failures
+    }
+
+    // Health check all targets
+    await this.verifyAllTargets(job, config.healthCheck);
+  }
+}
+```
+
+### Use Cases
+
+- Development environments
+- Small deployments
+- Time-critical 
updates
+- Stateless services with fast startup
+
+## Rolling Strategy
+
+### Description
+
+Deploys to targets in configurable batches, maintaining availability throughout.
+
+```
+                 ROLLING DEPLOYMENT (batch size: 1)
+
+     Time 0            Time 1            Time 2            Time 3
+   ┌─────────────┐   ┌─────────────┐   ┌─────────────┐   ┌─────────────┐
+   │ T1 [v1]     │   │ T1 [v2] ✓   │   │ T1 [v2] ✓   │   │ T1 [v2] ✓   │
+   ├─────────────┤   ├─────────────┤   ├─────────────┤   ├─────────────┤
+   │ T2 [v1]     │──►│ T2 [v1]     │──►│ T2 [v2] ✓   │──►│ T2 [v2] ✓   │
+   ├─────────────┤   ├─────────────┤   ├─────────────┤   ├─────────────┤
+   │ T3 [v1]     │   │ T3 [v1]     │   │ T3 [v1]     │   │ T3 [v2] ✓   │
+   └─────────────┘   └─────────────┘   └─────────────┘   └─────────────┘
+```
+
+### Configuration
+
+```typescript
+interface RollingConfig {
+  strategy: "rolling";
+
+  // Batch configuration
+  batchSize: number;          // Targets per batch
+  batchPercent?: number;      // Alternative: percentage of targets
+
+  // Timing
+  batchDelay: number;         // Seconds between batches
+  stabilizationTime: number;  // Wait after health check passes
+
+  // Health check
+  healthCheck: HealthCheckConfig;
+
+  // Failure handling
+  maxFailedBatches: number;   // Failures before stopping
+  failureBehavior: "rollback" | "pause" | "skip";
+
+  // Ordering
+  targetOrder: "default" | "shuffle" | "priority";
+}
+
+// Example
+const config: RollingConfig = {
+  strategy: "rolling",
+  batchSize: 2,
+  batchDelay: 30,
+  stabilizationTime: 60,
+  healthCheck: {
+    type: "http",
+    path: "/health",
+    timeout: 30,
+    retries: 5,
+    interval: 10
+  },
+  maxFailedBatches: 1,
+  failureBehavior: "rollback",
+  targetOrder: "default"
+};
+```
+
+### Execution
+
+```typescript
+class RollingExecutor {
+  async execute(job: DeploymentJob, config: RollingConfig): Promise<void> {
+    const tasks = this.orderTasks(job.tasks, config.targetOrder);
+    const batches = this.createBatches(tasks, config);
+    let failedBatches = 0;
+    const completedTasks: DeploymentTask[] = [];
+
+    for (const batch of batches) {
+      this.emitProgress(job, {
+        phase: 
"deploying", + currentBatch: batches.indexOf(batch) + 1, + totalBatches: batches.length, + completedTargets: completedTasks.length, + totalTargets: tasks.length + }); + + // Execute batch + const results = await Promise.all( + batch.map(task => this.executeTask(task)) + ); + + // Check batch results + const failures = results.filter(r => !r.success); + + if (failures.length > 0) { + failedBatches++; + + if (failedBatches > config.maxFailedBatches) { + if (config.failureBehavior === "rollback") { + await this.rollbackCompleted(completedTasks); + } + throw new DeploymentFailedError(failures); + } + + if (config.failureBehavior === "pause") { + job.status = "failed"; + throw new DeploymentFailedError(failures); + } + // "skip" - continue to next batch + } + + // Health check batch targets + await this.verifyBatch(batch, config.healthCheck); + + // Wait for stabilization + if (config.stabilizationTime > 0) { + await sleep(config.stabilizationTime * 1000); + } + + completedTasks.push(...batch); + + // Wait before next batch + if (batches.indexOf(batch) < batches.length - 1) { + await sleep(config.batchDelay * 1000); + } + } + } + + private createBatches( + tasks: DeploymentTask[], + config: RollingConfig + ): DeploymentTask[][] { + const batchSize = config.batchPercent + ? Math.ceil(tasks.length * config.batchPercent / 100) + : config.batchSize; + + const batches: DeploymentTask[][] = []; + for (let i = 0; i < tasks.length; i += batchSize) { + batches.push(tasks.slice(i, i + batchSize)); + } + + return batches; + } +} +``` + +### Use Cases + +- Production deployments +- High-availability requirements +- Large target counts +- Services requiring gradual rollout + +## Canary Strategy + +### Description + +Deploys to a small subset of targets first, validates, then expands to remaining targets. 
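Canary stage sizing is plain percentage math over the ordered target list; a tiny illustrative helper (the function name is ours, not part of the spec) mirroring the `Math.ceil(tasks.length * stage.percentage / 100)` expression the executor uses:

```typescript
// Illustrative sketch only: how many targets a canary stage covers.
// Ceiling ensures a non-zero stage (at least one canary target) whenever
// the percentage is non-zero and there is at least one target.
function stageTargetCount(totalTargets: number, percentage: number): number {
  return Math.ceil((totalTargets * percentage) / 100);
}
```

Because each stage takes a prefix of the same ordered list, a later stage always includes the targets of every earlier stage, so only the newly covered targets need a fresh deploy.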
+ +``` + CANARY DEPLOYMENT + + Phase 1: Canary (10%) Phase 2: Expand (50%) Phase 3: Full (100%) + + ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ + │ T1 [v2] ✓ │ ◄─canary │ T1 [v2] ✓ │ │ T1 [v2] ✓ │ + ├─────────────┤ ├─────────────┤ ├─────────────┤ + │ T2 [v1] │ │ T2 [v2] ✓ │ │ T2 [v2] ✓ │ + ├─────────────┤ ├─────────────┤ ├─────────────┤ + │ T3 [v1] │ │ T3 [v2] ✓ │ │ T3 [v2] ✓ │ + ├─────────────┤ ├─────────────┤ ├─────────────┤ + │ T4 [v1] │ │ T4 [v2] ✓ │ │ T4 [v2] ✓ │ + ├─────────────┤ ├─────────────┤ ├─────────────┤ + │ T5 [v1] │ │ T5 [v1] │ │ T5 [v2] ✓ │ + └─────────────┘ └─────────────┘ └─────────────┘ + + │ │ │ + ▼ ▼ ▼ + Health Check Health Check Health Check + Error Rate Check Error Rate Check Error Rate Check +``` + +### Configuration + +```typescript +interface CanaryConfig { + strategy: "canary"; + + // Canary stages + stages: CanaryStage[]; + + // Canary selection + canarySelector: "random" | "labeled" | "first"; + canaryLabel?: string; // Label for canary targets + + // Automatic vs manual progression + autoProgress: boolean; + + // Health and metrics checks + healthCheck: HealthCheckConfig; + metricsCheck?: MetricsCheckConfig; +} + +interface CanaryStage { + name: string; + percentage: number; // Target percentage + duration: number; // Minimum time at this stage (seconds) + autoProgress: boolean; // Auto-advance after duration +} + +interface MetricsCheckConfig { + integrationId: UUID; // Metrics integration + queries: MetricQuery[]; + failureThreshold: number; // Percentage deviation to fail +} + +interface MetricQuery { + name: string; + query: string; // PromQL or similar + operator: "lt" | "gt" | "eq"; + threshold: number; +} + +// Example +const config: CanaryConfig = { + strategy: "canary", + stages: [ + { name: "canary", percentage: 10, duration: 300, autoProgress: false }, + { name: "expand", percentage: 50, duration: 300, autoProgress: true }, + { name: "full", percentage: 100, duration: 0, autoProgress: true } + ], + canarySelector: 
"labeled",
+  canaryLabel: "canary=true",
+  autoProgress: false,
+  healthCheck: {
+    type: "http",
+    path: "/health",
+    timeout: 30,
+    retries: 5,
+    interval: 10
+  },
+  metricsCheck: {
+    integrationId: "prometheus-uuid",
+    queries: [
+      {
+        name: "error_rate",
+        query: "rate(http_requests_total{status=~\"5..\"}[5m]) / rate(http_requests_total[5m])",
+        operator: "lt",
+        threshold: 0.01 // Less than 1% error rate
+      }
+    ],
+    failureThreshold: 10
+  }
+};
+```
+
+### Execution
+
+```typescript
+class CanaryExecutor {
+  async execute(job: DeploymentJob, config: CanaryConfig): Promise<void> {
+    const tasks = this.orderTasks(job.tasks, config);
+
+    for (const stage of config.stages) {
+      const targetCount = Math.ceil(tasks.length * stage.percentage / 100);
+      const stageTasks = tasks.slice(0, targetCount);
+      const newTasks = stageTasks.filter(t => t.status === "pending");
+
+      this.emitProgress(job, {
+        phase: "canary",
+        stage: stage.name,
+        percentage: stage.percentage,
+        targets: stageTasks.length
+      });
+
+      // Deploy to new targets in this stage
+      await Promise.all(newTasks.map(task => this.executeTask(task)));
+
+      // Health check stage targets
+      await this.verifyTargets(stageTasks, config.healthCheck);
+
+      // Metrics check if configured
+      if (config.metricsCheck) {
+        await this.checkMetrics(stageTasks, config.metricsCheck);
+      }
+
+      // Wait for stage duration
+      if (stage.duration > 0) {
+        await this.waitWithMonitoring(
+          stageTasks,
+          stage.duration,
+          config.metricsCheck
+        );
+      }
+
+      // Wait for manual approval if not auto-progress
+      if (!stage.autoProgress && stage.percentage < 100) {
+        await this.waitForApproval(job, stage.name);
+      }
+    }
+  }
+
+  private async checkMetrics(
+    targets: DeploymentTask[],
+    config: MetricsCheckConfig
+  ): Promise<void> {
+    const metricsClient = await this.getMetricsClient(config.integrationId);
+
+    for (const query of config.queries) {
+      const result = await metricsClient.query(query.query);
+
+      const passed = this.evaluateMetric(result, query);
+
+      if 
(!passed) { + throw new CanaryMetricsFailedError(query.name, result, query.threshold); + } + } + } +} +``` + +### Use Cases + +- Risk-sensitive deployments +- Services with real user traffic +- Deployments with metrics-based validation +- Gradual feature rollouts + +## Blue-Green Strategy + +### Description + +Deploys to a parallel "green" environment while "blue" continues serving traffic, then switches. + +``` + BLUE-GREEN DEPLOYMENT + + Phase 1: Deploy Green Phase 2: Switch Traffic + + ┌─────────────────────────┐ ┌─────────────────────────┐ + │ Load Balancer │ │ Load Balancer │ + │ │ │ │ │ │ + │ ▼ │ │ ▼ │ + │ ┌─────────────┐ │ │ ┌─────────────┐ │ + │ │ Blue [v1] │◄─active│ │ │ Blue [v1] │ │ + │ │ T1, T2, T3 │ │ │ │ T1, T2, T3 │ │ + │ └─────────────┘ │ │ └─────────────┘ │ + │ │ │ │ + │ ┌─────────────┐ │ │ ┌─────────────┐ │ + │ │ Green [v2] │◄─deploy│ │ │ Green [v2] │◄─active│ + │ │ T4, T5, T6 │ │ │ │ T4, T5, T6 │ │ + │ └─────────────┘ │ │ └─────────────┘ │ + │ │ │ │ + └─────────────────────────┘ └─────────────────────────┘ +``` + +### Configuration + +```typescript +interface BlueGreenConfig { + strategy: "blue-green"; + + // Environment labels + blueLabel: string; // Label for blue targets + greenLabel: string; // Label for green targets + + // Traffic routing + routerIntegration: UUID; // Router/LB integration + routingConfig: RoutingConfig; + + // Validation + healthCheck: HealthCheckConfig; + warmupTime: number; // Seconds to warm up green + validationTests?: string[]; // Test suites to run + + // Switchover + switchoverMode: "instant" | "gradual"; + gradualSteps?: number[]; // Percentage steps for gradual + + // Rollback + keepBlueActive: number; // Seconds to keep blue ready +} + +// Example +const config: BlueGreenConfig = { + strategy: "blue-green", + blueLabel: "deployment=blue", + greenLabel: "deployment=green", + routerIntegration: "nginx-lb-uuid", + routingConfig: { + upstreamName: "myapp", + healthEndpoint: "/health" + }, + healthCheck: { + type: 
"http", + path: "/health", + timeout: 30, + retries: 5, + interval: 10 + }, + warmupTime: 60, + validationTests: ["smoke-test-suite"], + switchoverMode: "instant", + keepBlueActive: 1800 // 30 minutes +}; +``` + +### Execution + +```typescript +class BlueGreenExecutor { + async execute(job: DeploymentJob, config: BlueGreenConfig): Promise { + // Identify blue and green targets + const { blue, green } = this.categorizeTargets(job.tasks, config); + + // Phase 1: Deploy to green + this.emitProgress(job, { phase: "deploying-green" }); + + await Promise.all(green.map(task => this.executeTask(task))); + + // Health check green targets + await this.verifyTargets(green, config.healthCheck); + + // Warmup period + if (config.warmupTime > 0) { + this.emitProgress(job, { phase: "warming-up" }); + await sleep(config.warmupTime * 1000); + } + + // Run validation tests + if (config.validationTests?.length) { + this.emitProgress(job, { phase: "validating" }); + await this.runValidationTests(green, config.validationTests); + } + + // Phase 2: Switch traffic + this.emitProgress(job, { phase: "switching-traffic" }); + + if (config.switchoverMode === "instant") { + await this.instantSwitchover(config, blue, green); + } else { + await this.gradualSwitchover(config, blue, green); + } + + // Verify traffic routing + await this.verifyRouting(green, config); + + // Schedule blue decommission + if (config.keepBlueActive > 0) { + this.scheduleBlueDecommission(blue, config.keepBlueActive); + } + } + + private async instantSwitchover( + config: BlueGreenConfig, + blue: DeploymentTask[], + green: DeploymentTask[] + ): Promise { + const router = await this.getRouter(config.routerIntegration); + + // Update upstream to green targets + await router.updateUpstream(config.routingConfig.upstreamName, { + servers: green.map(t => ({ + address: t.target.address, + weight: 1 + })) + }); + + // Remove blue from rotation + await router.removeServers( + config.routingConfig.upstreamName, + blue.map(t => 
t.target.address) + ) + } + + private async gradualSwitchover( + config: BlueGreenConfig, + blue: DeploymentTask[], + green: DeploymentTask[] + ): Promise<void> { + const router = await this.getRouter(config.routerIntegration); + const steps = config.gradualSteps || [25, 50, 75, 100]; + + for (const percentage of steps) { + await router.setTrafficSplit(config.routingConfig.upstreamName, { + blue: 100 - percentage, + green: percentage + }); + + // Monitor for errors + await this.monitorTraffic(30); + } + } +} +``` + +### Use Cases + +- Zero-downtime deployments +- Database migration deployments +- High-stakes production updates +- Instant rollback requirements + +## Strategy Selection Guide + +``` + STRATEGY SELECTION + + START + │ + ▼ + ┌────────────────────────┐ + │ Zero downtime needed? │ + └───────────┬────────────┘ + │ + No │ Yes + │ │ │ + ▼ │ ▼ + ┌──────────┐ │ ┌───────────────────┐ + │ All-at- │ │ │ Metrics-based │ + │ once │ │ │ validation needed?│ + └──────────┘ │ └─────────┬─────────┘ + │ │ + │ No │ Yes + │ │ │ │ + │ ▼ │ ▼ + │ ┌──────────┐│ ┌──────────┐ + │ │ Instant ││ │ Canary │ + │ │ rollback? ││ │ │ + │ └────┬─────┘│ └──────────┘ + │ │ │ + │ No │ Yes │ + │ │ │ │ │ + │ ▼ │ ▼ │ + │┌──────┐│┌────┴─────┐ + ││Rolling│││Blue-Green│ + │└──────┘│└──────────┘ + │ │ + └───────┘ +``` + +## References + +- [Deployment Overview](overview.md) +- [Progressive Delivery](../modules/progressive-delivery.md) +- [Rollback Management](overview.md#rollback-management) diff --git a/docs/modules/release-orchestrator/design/decisions.md b/docs/modules/release-orchestrator/design/decisions.md new file mode 100644 index 000000000..5b1f1386e --- /dev/null +++ b/docs/modules/release-orchestrator/design/decisions.md @@ -0,0 +1,249 @@ +# Key Architectural Decisions + +This document records significant architectural decisions and their rationale. 
+ +## ADR-001: Digest-First Release Identity + +**Status:** Accepted + +**Context:** +Container images can be referenced by tags (e.g., `v1.2.3`) or digests (e.g., `sha256:abc123...`). Tags are mutable - the same tag can point to different images over time. + +**Decision:** +All releases are identified by immutable OCI digests, never tags. Tags are accepted as input but immediately resolved to digests at release creation time. + +**Consequences:** +- Releases are immutable and reproducible +- Digest mismatch at pull time indicates tampering (deployment fails) +- Rollback targets specific digest, not "previous tag" +- Requires registry integration for tag resolution +- Users see both tag (friendly) and digest (authoritative) in UI + +--- + +## ADR-002: Evidence for Every Decision + +**Status:** Accepted + +**Context:** +Compliance and audit requirements demand proof of what was deployed, when, by whom, and why. + +**Decision:** +Every promotion and deployment produces a cryptographically signed evidence packet that is immutable and append-only. + +**Consequences:** +- Evidence table has no UPDATE/DELETE permissions +- Evidence enables audit-grade compliance reporting +- Evidence enables deterministic replay (same inputs + policy = same decision) +- Evidence packets are exportable for external audit systems +- Storage requirements increase over time + +--- + +## ADR-003: Plugin Architecture for Integrations + +**Status:** Accepted + +**Context:** +Organizations use diverse toolchains (registries, CI/CD, vaults, notification systems). Hard-coding integrations limits adoption. + +**Decision:** +All integrations are implemented as plugins via a three-surface contract (Manifest, Connector Runtime, Step Provider). Core orchestration is stable and plugin-agnostic. 
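As an illustration of that three-surface contract (the interface and field names below are assumptions for this sketch, not the shipped plugin API):

```typescript
// Illustrative sketch only; the real contract is defined by the plugin system spec.
interface PluginManifest {
  id: string;
  version: string;                // versioned contract, so core can reject incompatible plugins
  requiresConnectivity: boolean;  // declared up front, per the offline-first principle
}

interface ConnectorRuntime {
  checkHealth(): Promise<"healthy" | "degraded" | "unreachable">;
}

interface StepProvider {
  stepTypes(): string[];          // workflow node types this plugin contributes
}

interface Plugin {
  manifest: PluginManifest;
  connector?: ConnectorRuntime;   // optional surfaces: a plugin may contribute only some
  steps?: StepProvider;
}

// Core keys plugins by manifest id and never hard-codes a vendor.
function register(registry: Map<string, Plugin>, plugin: Plugin): Map<string, Plugin> {
  registry.set(plugin.manifest.id, plugin);
  return registry;
}
```

In this shape, core only ever touches the manifest and the two runtime surfaces behind their interfaces, which is what keeps plugin failures confined to the sandbox boundary.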
+ +**Consequences:** +- Core has no hard-coded vendor integrations +- New integrations can be added without core changes +- Plugin failures cannot crash core (sandbox isolation) +- Plugin interface must be versioned and stable +- Additional complexity in plugin lifecycle management + +--- + +## ADR-004: No Feature Gating + +**Status:** Accepted + +**Context:** +Enterprise software often gates security features behind premium tiers, creating "pay for security" anti-patterns. + +**Decision:** +All plans include all features. Pricing is based only on: +- Number of environments +- New digests analyzed per day +- Fair use on deployments + +**Consequences:** +- No feature flags tied to billing tier +- Transparent pricing without feature fragmentation +- May limit revenue optimization per customer +- Quota enforcement must be clear and user-friendly + +--- + +## ADR-005: Offline-First Operation + +**Status:** Accepted + +**Context:** +Many organizations operate in air-gapped or restricted network environments. Dependency on external services limits adoption. + +**Decision:** +All core operations must work in air-gapped environments. External data is synced via mirror bundles. Plugins may require connectivity; core does not. + +**Consequences:** +- No runtime calls to external APIs for core decisions +- Advisory data synced via offline bundles +- Plugin connectivity requirements are declared in manifest +- Evidence packets exportable for external submission +- Additional complexity in data synchronization + +--- + +## ADR-006: Agent-Based and Agentless Deployment + +**Status:** Accepted + +**Context:** +Some organizations prefer agents for security isolation; others prefer agentless for simplicity. + +**Decision:** +Support both agent-based (persistent daemon on targets) and agentless (SSH/WinRM on demand) deployment models. 
+ +**Consequences:** +- Agent provides better performance and reliability +- Agentless reduces infrastructure footprint +- Unified task model abstracts deployment details +- Security model must handle both patterns +- Higher testing matrix + +--- + +## ADR-007: PostgreSQL as Primary Database + +**Status:** Accepted + +**Context:** +Database choice affects scalability, operations, and feature availability. + +**Decision:** +PostgreSQL (16+) as the primary database with: +- Per-module schema isolation +- Row-level security for multi-tenancy +- JSONB for flexible configuration +- Append-only triggers for evidence tables + +**Consequences:** +- Proven scalability and reliability +- Rich feature set (JSONB, RLS, triggers) +- Single database technology to operate +- Requires PostgreSQL expertise +- Schema migrations must be carefully managed + +--- + +## ADR-008: Workflow Engine with DAG Execution + +**Status:** Accepted + +**Context:** +Deployment workflows need conditional logic, parallel execution, error handling, and rollback support. + +**Decision:** +Implement a DAG-based workflow engine where: +- Workflows are templates with nodes (steps) and edges (dependencies) +- Steps execute when all dependencies are satisfied +- Expressions reference previous step outputs +- Built-in support for approval, retry, timeout, and rollback + +**Consequences:** +- Flexible workflow composition +- Visual representation in UI +- Complex error handling scenarios supported +- Learning curve for workflow authors +- Expression engine security considerations + +--- + +## ADR-009: Separation of Duties Enforcement + +**Status:** Accepted + +**Context:** +Compliance requires that the person requesting a change cannot be the same person approving it. + +**Decision:** +Separation of Duties (SoD) is enforced at the approval gateway level, preventing self-approval when SoD is enabled for an environment. 
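A minimal sketch of that gateway check (the types and names here are illustrative, not the implemented API):

```typescript
interface ApprovalRequest {
  promotionId: string;
  requesterId: string;  // who requested the promotion
  approverId: string;   // who is attempting to approve it
}

// When SoD is enabled for the environment, the requester may not approve their own promotion.
function isApprovalAllowed(req: ApprovalRequest, sodEnabled: boolean): boolean {
  if (!sodEnabled) return true;
  return req.approverId !== req.requesterId;
}

const selfApproval: ApprovalRequest = { promotionId: "p-1", requesterId: "u-1", approverId: "u-1" };
const peerApproval: ApprovalRequest = { promotionId: "p-1", requesterId: "u-1", approverId: "u-2" };
console.log(isApprovalAllowed(selfApproval, true));  // false (blocked by SoD)
console.log(isApprovalAllowed(peerApproval, true));  // true
```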
+ +**Consequences:** +- Prevents single-person deployment to sensitive environments +- Configurable per environment +- May slow down deployments +- Requires minimum team size for SoD-enabled environments + +--- + +## ADR-010: Version Stickers for Drift Detection + +**Status:** Accepted + +**Context:** +Knowing what's actually deployed on targets is essential for audit and troubleshooting. + +**Decision:** +Every deployment writes a `stella.version.json` sticker file on the target containing release ID, digests, deployment timestamp, and deployer identity. + +**Consequences:** +- Enables drift detection (expected vs actual) +- Provides audit trail on target hosts +- Enables accurate "what's deployed where" queries +- Requires file access on targets +- Sticker corruption/deletion must be handled + +--- + +## ADR-011: Security Gate Integration + +**Status:** Accepted + +**Context:** +Security scanning exists as a separate concern; release orchestration should leverage but not duplicate it. + +**Decision:** +Security scanning remains in existing modules (Scanner, VEX). Release orchestration consumes scan results through a security gate that evaluates vulnerability thresholds. + +**Consequences:** +- Clear separation of concerns +- Existing scanning investment preserved +- Gate configuration determines block thresholds +- Requires API integration with scanning modules +- Policy engine evaluates security verdicts + +--- + +## ADR-012: gRPC for Agent Communication + +**Status:** Accepted + +**Context:** +Agent communication requires efficient, bidirectional, and secure data transfer. 
+ +**Decision:** +Use gRPC for agent communication with: +- mTLS for transport security +- Bidirectional streaming for logs and progress +- Protocol buffers for efficient serialization + +**Consequences:** +- Efficient binary protocol +- Strong typing via protobuf +- Built-in streaming support +- Requires gRPC infrastructure +- Firewall considerations for gRPC traffic + +--- + +## References + +- [Design Principles](principles.md) +- [Security Architecture](../security/overview.md) +- [Plugin System](../modules/plugin-system.md) diff --git a/docs/modules/release-orchestrator/design/principles.md b/docs/modules/release-orchestrator/design/principles.md new file mode 100644 index 000000000..61163dccf --- /dev/null +++ b/docs/modules/release-orchestrator/design/principles.md @@ -0,0 +1,221 @@ +# Design Principles & Invariants + +> These principles are **inviolable** and MUST be reflected in all code, UI, documentation, and audit artifacts. + +## Core Principles + +### Principle 1: Release Identity via Digest + +``` +INVARIANT: A release is a set of OCI image digests (component → digest mapping), never tags. +``` + +- Tags are convenience inputs for resolution +- Tags are resolved to digests at release creation time +- All downstream operations (promotion, deployment, rollback) use digests +- Digest mismatch at pull time = deployment failure (tamper detection) + +**Implementation Requirements:** +- Release creation API accepts tags but immediately resolves to digests +- All internal references use `sha256:` prefixed digests +- Agent deployment verifies digest at pull time +- Rollback targets specific digest, not "previous tag" + +### Principle 2: Determinism and Evidence + +``` +INVARIANT: Every deployment/promotion produces an immutable evidence record. 
+``` + +Evidence record contains: +- **Who**: User identity (from Authority) +- **What**: Release bundle (digests), target environment, target hosts +- **Why**: Policy evaluation result, approval records, decision reasons +- **How**: Generated artifacts (compose files, scripts), execution logs +- **When**: Timestamps for request, decision, execution, completion + +Evidence enables: +- Audit-grade compliance reporting +- Deterministic replay (same inputs + policy → same decision) +- "Why blocked?" explainability + +**Implementation Requirements:** +- Evidence is generated synchronously with decision +- Evidence is signed before storage +- Evidence table is append-only (no UPDATE/DELETE) +- Evidence includes hash of all inputs for replay verification + +### Principle 3: Pluggable Everything, Stable Core + +``` +INVARIANT: Integrations are plugins; the core orchestration engine is stable. +``` + +**Plugins contribute:** +- Configuration screens (UI) +- Connector logic (runtime) +- Step node types (workflow) +- Doctor checks (diagnostics) +- Agent types (deployment) + +**Core engine provides:** +- Workflow execution (DAG processing) +- State machine management +- Evidence generation +- Policy evaluation +- Credential brokering + +**Implementation Requirements:** +- Core has no hard-coded integrations +- Plugin interface is versioned and stable +- Plugin failures cannot crash core +- Core provides fallback behavior when plugins unavailable + +### Principle 4: No Feature Gating + +``` +INVARIANT: All plans include all features. 
Limits are only: +- Number of environments +- Number of new digests analyzed per day +- Fair use on deployments +``` + +This prevents: +- "Pay for security" anti-pattern +- Per-project/per-seat billing landmines +- Feature fragmentation across tiers + +**Implementation Requirements:** +- No feature flags tied to billing tier +- Quota enforcement is transparent (clear error messages) +- Usage metrics exposed for customer visibility +- Overage handling is graceful (soft limits with warnings) + +### Principle 5: Offline-First Operation + +``` +INVARIANT: All core operations MUST work in air-gapped environments. +``` + +Implications: +- No runtime calls to external APIs for core decisions +- Vulnerability data synced via mirror bundles +- Plugins may require connectivity; core does not +- Evidence packets exportable for external audit + +**Implementation Requirements:** +- Core decision logic has no external HTTP calls +- All external data is pre-synced and cached +- Plugin connectivity requirements are declared in manifest +- Offline mode is explicit configuration, not degraded fallback + +### Principle 6: Immutable Generated Artifacts + +``` +INVARIANT: Every deployment generates and stores immutable artifacts. 
+``` + +Generated artifacts: +- `compose.stella.lock.yml`: Pinned digests, resolved env refs +- `deploy.stella.script.dll`: Compiled C# script (or hash reference) +- `release.evidence.json`: Decision record +- `stella.version.json`: Version sticker placed on target + +Version sticker enables: +- Drift detection (expected vs actual) +- Audit trail on target host +- Rollback reference + +**Implementation Requirements:** +- Artifacts are content-addressed (hash in filename or metadata) +- Artifacts are stored before deployment execution +- Artifact storage is immutable (no overwrites) +- Version sticker is atomic write on target + +--- + +## Architectural Invariants (Enforced by Design) + +These invariants are enforced through database constraints, code architecture, and operational controls. + +| Invariant | Enforcement Mechanism | +|-----------|----------------------| +| Digests are immutable | Database constraint: digest column is unique, no updates | +| Evidence packets are append-only | Evidence table has no UPDATE/DELETE permissions | +| Secrets never in database | Vault integration; only references stored | +| Plugins cannot bypass policy | Policy evaluation in core, not plugin | +| Multi-tenant isolation | `tenant_id` FK on all tables; row-level security | +| Workflow state is auditable | State transitions logged; no direct state manipulation | +| Approvals are tamper-evident | Approval records are signed and append-only | + +### Database Enforcement + +```sql +-- Example: Evidence table with no UPDATE/DELETE +CREATE TABLE release.evidence_packets ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id), + promotion_id UUID NOT NULL REFERENCES release.promotions(id), + content_hash TEXT NOT NULL, + content JSONB NOT NULL, + signature TEXT NOT NULL, + created_at TIMESTAMPTZ NOT NULL DEFAULT now() + -- No updated_at column; immutable by design +); + +-- Revoke UPDATE/DELETE from application role +REVOKE UPDATE, DELETE 
ON release.evidence_packets FROM app_role; +``` + +### Code Architecture Enforcement + +```csharp +// Policy evaluation is ALWAYS in core, never delegated to plugins +public sealed class PromotionDecisionEngine +{ + // Plugins provide gate implementations, but core orchestrates evaluation + public async Task<PromotionDecision> EvaluateAsync( + Promotion promotion, + IReadOnlyList<IPromotionGate> gates, + CancellationToken ct) + { + // Core controls evaluation order and aggregation + var results = new List<GateResult>(); + foreach (var gate in gates) + { + // Plugin provides evaluation logic + var result = await gate.EvaluateAsync(promotion, ct); + results.Add(result); + + // Core decides how to aggregate (plugins cannot override) + if (result.IsBlocking && _policy.FailFast) + break; + } + + // Core makes final decision + return _decisionAggregator.Aggregate(results); + } +} +``` + +--- + +## Document Conventions + +Throughout the Release Orchestrator documentation: + +- **MUST**: Mandatory requirement; non-compliance is a bug +- **SHOULD**: Recommended but not mandatory; deviation requires justification +- **MAY**: Optional; implementation decision +- **Entity names**: `PascalCase` (e.g., `ReleaseBundle`) +- **Table names**: `snake_case` (e.g., `release_bundles`) +- **API paths**: `/api/v1/resource-name` +- **Module names**: `kebab-case` (e.g., `release-manager`) + +--- + +## References + +- [Key Architectural Decisions](decisions.md) +- [Module Architecture](../modules/overview.md) +- [Security Architecture](../security/overview.md) diff --git a/docs/modules/release-orchestrator/implementation-guide.md b/docs/modules/release-orchestrator/implementation-guide.md new file mode 100644 index 000000000..f9b806b32 --- /dev/null +++ b/docs/modules/release-orchestrator/implementation-guide.md @@ -0,0 +1,602 @@ +# Implementation Guide + +> .NET 10 implementation patterns and best practices for Release Orchestrator modules. 
+ +**Target Audience**: Development team implementing Release Orchestrator modules +**Prerequisites**: Familiarity with [CLAUDE.md](../../../CLAUDE.md) coding rules + +--- + +## Overview + +This guide supplements the architecture documentation with .NET 10-specific implementation patterns required for all Release Orchestrator modules. These patterns ensure: + +- Deterministic behavior for evidence reproducibility +- Testability through dependency injection +- Compliance with Stella Ops coding standards +- Performance and reliability + +--- + +## Code Quality Requirements + +### Compiler Configuration + +All Release Orchestrator projects **MUST** enforce warnings as errors: + +```xml +<PropertyGroup> + <TreatWarningsAsErrors>true</TreatWarningsAsErrors> + <Nullable>enable</Nullable> + <ImplicitUsings>disable</ImplicitUsings> +</PropertyGroup> +``` + +**Rationale**: Warnings indicate potential bugs, regressions, or code quality drift. Treating them as errors prevents them from being ignored. + +--- + +## Determinism & Time Handling + +### TimeProvider Injection + +**Never** use `DateTime.UtcNow`, `DateTimeOffset.UtcNow`, or `DateTimeOffset.Now` directly. Always inject `TimeProvider`. 
+ +```csharp +// ❌ BAD - non-deterministic, hard to test +public class PromotionManager +{ + public Promotion CreatePromotion(Guid releaseId, Guid targetEnvId) + { + return new Promotion + { + Id = Guid.NewGuid(), + ReleaseId = releaseId, + TargetEnvironmentId = targetEnvId, + RequestedAt = DateTimeOffset.UtcNow // ❌ Hard-coded time + }; + } +} + +// ✅ GOOD - injectable, testable, deterministic +public class PromotionManager +{ + private readonly TimeProvider _timeProvider; + private readonly IGuidGenerator _guidGenerator; + + public PromotionManager(TimeProvider timeProvider, IGuidGenerator guidGenerator) + { + _timeProvider = timeProvider; + _guidGenerator = guidGenerator; + } + + public Promotion CreatePromotion(Guid releaseId, Guid targetEnvId) + { + return new Promotion + { + Id = _guidGenerator.NewGuid(), + ReleaseId = releaseId, + TargetEnvironmentId = targetEnvId, + RequestedAt = _timeProvider.GetUtcNow() // ✅ Injected, testable + }; + } +} +``` + +**Registration**: +```csharp +// Production: use system time +services.AddSingleton(TimeProvider.System); + +// Testing: use manual time for deterministic tests +var manualTime = new ManualTimeProvider(); +manualTime.SetUtcNow(new DateTimeOffset(2026, 1, 10, 12, 0, 0, TimeSpan.Zero)); +services.AddSingleton(manualTime); +``` + +--- + +### GUID Generation + +**Never** use `Guid.NewGuid()` directly. Always inject `IGuidGenerator`. 
+ +```csharp +// ❌ BAD +var releaseId = Guid.NewGuid(); + +// ✅ GOOD +var releaseId = _guidGenerator.NewGuid(); +``` + +**Interface**: +```csharp +public interface IGuidGenerator +{ + Guid NewGuid(); +} + +// Production implementation +public sealed class SystemGuidGenerator : IGuidGenerator +{ + public Guid NewGuid() => Guid.NewGuid(); +} + +// Deterministic test implementation +public sealed class SequentialGuidGenerator : IGuidGenerator +{ + private int _counter; + + public Guid NewGuid() + { + var bytes = new byte[16]; + BitConverter.GetBytes(_counter++).CopyTo(bytes, 0); + return new Guid(bytes); + } +} +``` + +--- + +## Async & Cancellation + +### CancellationToken Propagation + +**Always** propagate `CancellationToken` through async call chains. Never use `CancellationToken.None` except at entry points where no token is available. + +```csharp +// ❌ BAD - ignores cancellation +public async Task<Promotion> ApprovePromotionAsync(Guid promotionId, Guid userId, CancellationToken ct) +{ + var promotion = await _repository.GetByIdAsync(promotionId, CancellationToken.None); // ❌ Wrong + + promotion.Approvals.Add(new Approval + { + ApproverId = userId, + ApprovedAt = _timeProvider.GetUtcNow() + }); + + await _repository.SaveAsync(promotion, CancellationToken.None); // ❌ Wrong + await Task.Delay(1000); // ❌ Missing ct + + return promotion; +} + +// ✅ GOOD - propagates cancellation +public async Task<Promotion> ApprovePromotionAsync(Guid promotionId, Guid userId, CancellationToken ct) +{ + var promotion = await _repository.GetByIdAsync(promotionId, ct); // ✅ Propagated + + promotion.Approvals.Add(new Approval + { + ApproverId = userId, + ApprovedAt = _timeProvider.GetUtcNow() + }); + + await _repository.SaveAsync(promotion, ct); // ✅ Propagated + await Task.Delay(1000, ct); // ✅ Cancellable + + return promotion; +} +``` + +--- + +## HTTP Client Usage + +### IHttpClientFactory for Connector Runtime + +**Never** instantiate `HttpClient` directly. 
Always use `IHttpClientFactory` with configured timeouts and resilience policies. + +```csharp +// ❌ BAD - direct instantiation risks socket exhaustion +public class GitHubConnector +{ + public async Task<string> GetCommitAsync(string sha) + { + using var client = new HttpClient(); // ❌ Socket exhaustion risk + var response = await client.GetAsync($"https://api.github.com/commits/{sha}"); + return await response.Content.ReadAsStringAsync(); + } +} + +// ✅ GOOD - factory with resilience +public class GitHubConnector +{ + private readonly IHttpClientFactory _httpClientFactory; + + public GitHubConnector(IHttpClientFactory httpClientFactory) + { + _httpClientFactory = httpClientFactory; + } + + public async Task<string> GetCommitAsync(string sha, CancellationToken ct) + { + var client = _httpClientFactory.CreateClient("GitHub"); + var response = await client.GetAsync($"/commits/{sha}", ct); + response.EnsureSuccessStatusCode(); + return await response.Content.ReadAsStringAsync(ct); + } +} +``` + +**Registration with resilience**: +```csharp +services.AddHttpClient("GitHub", client => +{ + client.BaseAddress = new Uri("https://api.github.com"); + client.Timeout = TimeSpan.FromSeconds(30); + client.DefaultRequestHeaders.Add("User-Agent", "StellaOps/1.0"); +}) +.AddStandardResilienceHandler(options => +{ + options.Retry.MaxRetryAttempts = 3; + options.CircuitBreaker.SamplingDuration = TimeSpan.FromSeconds(30); + options.TotalRequestTimeout.Timeout = TimeSpan.FromMinutes(1); +}); +``` + +--- + +## Culture & Formatting + +### Invariant Culture for Parsing + +**Always** use `CultureInfo.InvariantCulture` for parsing and formatting dates, numbers, and any string that will be persisted, hashed, or compared. 
+ +```csharp +// ❌ BAD - culture-sensitive +var percentage = double.Parse(input); +var formatted = value.ToString("P2"); +var dateStr = date.ToString("yyyy-MM-dd"); + +// ✅ GOOD - invariant culture +var percentage = double.Parse(input, CultureInfo.InvariantCulture); +var formatted = value.ToString("P2", CultureInfo.InvariantCulture); +var dateStr = date.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture); +``` + +--- + +## JSON Handling + +### RFC 8785 Canonical JSON for Evidence + +For evidence packets and decision records that will be hashed or signed, use **RFC 8785-compliant** canonical JSON serialization. + +```csharp +// ❌ BAD - non-canonical JSON +var json = JsonSerializer.Serialize(decisionRecord, new JsonSerializerOptions +{ + Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping, + PropertyNamingPolicy = JsonNamingPolicy.CamelCase +}); +var hash = ComputeHash(json); // ❌ Non-deterministic + +// ✅ GOOD - use shared canonicalizer +var canonicalJson = CanonicalJsonSerializer.Serialize(decisionRecord); +var hash = ComputeHash(canonicalJson); // ✅ Deterministic +``` + +**Canonical JSON Requirements**: +- Keys sorted alphabetically +- Minimal escaping per RFC 8785 spec +- No exponent notation for numbers +- No trailing/leading zeros +- No whitespace + +--- + +## Database Interaction + +### DateTimeOffset for PostgreSQL timestamptz + +PostgreSQL `timestamptz` columns **MUST** be read and written as `DateTimeOffset`, not `DateTime`. 
+ +```csharp +// ❌ BAD - loses offset information +await using var reader = await command.ExecuteReaderAsync(ct); +while (await reader.ReadAsync(ct)) +{ + var createdAt = reader.GetDateTime(reader.GetOrdinal("created_at")); // ❌ Loses offset +} + +// ✅ GOOD - preserves offset +await using var reader = await command.ExecuteReaderAsync(ct); +while (await reader.ReadAsync(ct)) +{ + var createdAt = reader.GetFieldValue<DateTimeOffset>(reader.GetOrdinal("created_at")); // ✅ Correct +} +``` + +**Insertion**: +```csharp +// ✅ Always use UTC DateTimeOffset +var createdAt = _timeProvider.GetUtcNow(); // Returns DateTimeOffset +await command.ExecuteNonQueryAsync(ct); +``` + +--- + +## Hybrid Logical Clock (HLC) for Distributed Ordering + +For distributed ordering and audit-safe sequencing, use `IHybridLogicalClock` from `StellaOps.HybridLogicalClock`. + +**When to use HLC**: +- Promotion state transitions +- Workflow step execution ordering +- Deployment task sequencing +- Timeline event ordering + +```csharp +public class PromotionStateTransition +{ + private readonly IHybridLogicalClock _hlc; + private readonly TimeProvider _timeProvider; + + public async Task TransitionStateAsync( + Promotion promotion, + PromotionState newState, + CancellationToken ct) + { + var transition = new StateTransition + { + PromotionId = promotion.Id, + FromState = promotion.Status, + ToState = newState, + THlc = _hlc.Tick(), // ✅ Monotonic, skew-tolerant ordering + TsWall = _timeProvider.GetUtcNow(), // ✅ Informational timestamp + TransitionedBy = _currentUser.Id + }; + + await _repository.RecordTransitionAsync(transition, ct); + } +} +``` + +**HLC State Persistence**: +```csharp +// Service startup +public async Task StartAsync(CancellationToken ct) +{ + await _hlc.InitializeFromStateAsync(ct); // Restore monotonicity +} + +// Service shutdown +public async Task StopAsync(CancellationToken ct) +{ + await _hlc.PersistStateAsync(ct); // Persist HLC state +} +``` + +--- + +## Configuration & Options + +### 
Options Validation at Startup + +Use `ValidateDataAnnotations()` and `ValidateOnStart()` for all options classes. + +```csharp +// Options class +public sealed class PromotionManagerOptions +{ + [Required] + [Range(1, 10)] + public int MaxConcurrentPromotions { get; set; } = 3; + + [Required] + [Range(1, 3600)] + public int ApprovalExpirationSeconds { get; set; } = 1440; +} + +// Registration with validation +services.AddOptions<PromotionManagerOptions>() + .Bind(configuration.GetSection("PromotionManager")) + .ValidateDataAnnotations() + .ValidateOnStart(); + +// Complex validation +public class PromotionManagerOptionsValidator : IValidateOptions<PromotionManagerOptions> +{ + public ValidateOptionsResult Validate(string? name, PromotionManagerOptions options) + { + if (options.MaxConcurrentPromotions <= 0) + return ValidateOptionsResult.Fail("MaxConcurrentPromotions must be positive"); + + return ValidateOptionsResult.Success; + } +} + +services.AddSingleton<IValidateOptions<PromotionManagerOptions>, PromotionManagerOptionsValidator>(); +``` + +--- + +## Immutability & Collections + +### Return Immutable Collections from Public APIs + +Public APIs **MUST** return `IReadOnlyList<T>`, `ImmutableArray<T>`, or defensive copies. Never expose mutable backing stores. + +```csharp +// ❌ BAD - exposes mutable backing store +public class ReleaseManager +{ + private readonly List<Component> _components = new(); + + public List<Component> Components => _components; // ❌ Callers can mutate! +} + +// ✅ GOOD - immutable return +public class ReleaseManager +{ + private readonly List<Component> _components = new(); + + public IReadOnlyList<Component> Components => _components.AsReadOnly(); // ✅ Read-only + + // Or using ImmutableArray + public ImmutableArray<Component> GetComponents() => _components.ToImmutableArray(); +} +``` + +--- + +## Error Handling + +### No Silent Stubs + +Placeholder code **MUST** throw `NotImplementedException` or return an explicit error. Never return success from unimplemented paths. 
+ +```csharp +// ❌ BAD - silent stub masks missing implementation +public async Task<Result> DeployToNomadAsync(Deployment deployment, CancellationToken ct) +{ + // TODO: implement Nomad deployment + return Result.Success(); // ❌ Ships broken feature! +} + +// ✅ GOOD - explicit failure +public async Task<Result> DeployToNomadAsync(Deployment deployment, CancellationToken ct) +{ + throw new NotImplementedException( + "Nomad deployment not yet implemented. See SPRINT_20260115_003_AGENTS_nomad_support.md"); +} + +// ✅ Alternative: return unsupported result +public async Task<Result> DeployToNomadAsync(Deployment deployment, CancellationToken ct) +{ + return Result.Failure("Nomad deployment target not yet supported. Use Docker or Compose."); +} +``` + +--- + +## Caching + +### Bounded Caches with Eviction + +**Do not** use `ConcurrentDictionary` or `Dictionary` for caching without eviction policies. Use bounded caches with TTL/LRU eviction. + +```csharp +// ❌ BAD - unbounded growth +public class VersionMapCache +{ + private readonly ConcurrentDictionary<string, DigestMapping> _cache = new(); + + public void Add(string tag, DigestMapping mapping) + { + _cache[tag] = mapping; // ❌ Never evicts, memory grows forever + } +} + +// ✅ GOOD - bounded with eviction +public class VersionMapCache +{ + private readonly MemoryCache _cache; + + public VersionMapCache() + { + _cache = new MemoryCache(new MemoryCacheOptions + { + SizeLimit = 10_000 // Max 10k entries + }); + } + + public void Add(string tag, DigestMapping mapping) + { + _cache.Set(tag, mapping, new MemoryCacheEntryOptions + { + Size = 1, + SlidingExpiration = TimeSpan.FromHours(1) // ✅ 1 hour TTL + }); + } + + public DigestMapping? 
Get(string tag) => _cache.Get<DigestMapping>(tag);
+}
+```
+
+**Cache TTL Recommendations**:
+- **Integration health checks**: 5 minutes
+- **Version maps (tag → digest)**: 1 hour
+- **Environment configs**: 30 minutes
+- **Agent capabilities**: 10 minutes
+
+---
+
+## Testing
+
+### Test Helpers Must Call Production Code
+
+Test helpers **MUST** call production code, not reimplement algorithms. Only mock I/O and network boundaries.
+
+```csharp
+// ❌ BAD - test reimplements production logic
+public static string ComputeEvidenceHash(DecisionRecord record)
+{
+    // Custom hash implementation in test
+    var json = JsonSerializer.Serialize(record); // ❌ Different from production!
+    return Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(json)));
+}
+
+// ✅ GOOD - test uses production code
+public static string ComputeEvidenceHash(DecisionRecord record)
+{
+    // Calls production EvidenceHasher
+    return EvidenceHasher.ComputeHash(record); // ✅ Same as production
+}
+```
+
+---
+
+## Path Resolution
+
+### Explicit CLI Options for Paths
+
+**Do not** derive paths from `AppContext.BaseDirectory` with parent directory walks. Use explicit CLI options or environment variables.
+
+```csharp
+// ❌ BAD - fragile parent walks
+var repoRoot = Path.GetFullPath(Path.Combine(
+    AppContext.BaseDirectory, "..", "..", "..", ".."));
+
+// ✅ GOOD - explicit option with fallback
+[Option("--repo-root", Description = "Repository root path")]
+public string? RepoRoot { get; set; }
+
+public string GetRepoRoot() =>
+    RepoRoot
+    ?? Environment.GetEnvironmentVariable("STELLAOPS_REPO_ROOT")
+    ?? throw new InvalidOperationException(
+        "Repository root not specified. 
Use --repo-root or set STELLAOPS_REPO_ROOT."); +``` + +--- + +## Summary Checklist + +Before submitting a pull request, verify: + +- [ ] `TreatWarningsAsErrors` enabled in project file +- [ ] All timestamps use `TimeProvider`, never `DateTime.UtcNow` +- [ ] All GUIDs use `IGuidGenerator`, never `Guid.NewGuid()` +- [ ] `CancellationToken` propagated through all async methods +- [ ] HTTP clients use `IHttpClientFactory`, never `new HttpClient()` +- [ ] Culture-invariant parsing for all formatted strings +- [ ] Canonical JSON for evidence/decision records +- [ ] `DateTimeOffset` for all PostgreSQL `timestamptz` columns +- [ ] HLC used for distributed ordering where applicable +- [ ] Options classes validated at startup with `ValidateOnStart()` +- [ ] Public APIs return immutable collections +- [ ] No silent stubs; unimplemented code throws `NotImplementedException` +- [ ] Caches have bounded size and TTL eviction +- [ ] Tests exercise production code, not reimplementations + +--- + +## References + +- [CLAUDE.md](../../../CLAUDE.md) — Stella Ops coding rules +- [Test Structure](./test-structure.md) — Test organization guidelines +- [Database Schema](./data-model/schema.md) — Schema patterns +- [HLC Documentation](../../eventing/event-envelope-schema.md) — Event ordering with HLC diff --git a/docs/modules/release-orchestrator/integrations/ci-cd.md b/docs/modules/release-orchestrator/integrations/ci-cd.md new file mode 100644 index 000000000..7d0b65dbc --- /dev/null +++ b/docs/modules/release-orchestrator/integrations/ci-cd.md @@ -0,0 +1,643 @@ +# CI/CD Integration + +## Overview + +Release Orchestrator integrates with CI/CD systems to: +- Receive build completion notifications +- Trigger additional pipelines during deployment +- Create releases from CI artifacts +- Report deployment status back to CI systems + +## Integration Patterns + +### Pattern 1: CI Triggers Release + +``` + CI TRIGGERS RELEASE + + ┌────────────┐ ┌────────────┐ ┌────────────────────┐ + │ CI/CD │ 
│ Container │ │ Release │ + │ System │ │ Registry │ │ Orchestrator │ + └─────┬──────┘ └─────┬──────┘ └─────────┬──────────┘ + │ │ │ + │ Build & Push │ │ + │─────────────────►│ │ + │ │ │ + │ │ Webhook: image pushed + │ │─────────────────────►│ + │ │ │ + │ │ │ Create/Update + │ │ │ Version Map + │ │ │ + │ │ │ Auto-create + │ │ │ Release (if configured) + │ │ │ + │ API: Create Release (optional) │ + │────────────────────────────────────────►│ + │ │ │ + │ │ │ Start Promotion + │ │ │ Workflow + │ │ │ +``` + +### Pattern 2: Orchestrator Triggers CI + +``` + ORCHESTRATOR TRIGGERS CI + + ┌────────────────────┐ ┌────────────┐ ┌────────────┐ + │ Release │ │ CI/CD │ │ Target │ + │ Orchestrator │ │ System │ │ Systems │ + └─────────┬──────────┘ └─────┬──────┘ └─────┬──────┘ + │ │ │ + │ Pre-deploy: Trigger │ │ + │ Integration Tests │ │ + │─────────────────────►│ │ + │ │ │ + │ │ Run Tests │ + │ │─────────────────►│ + │ │ │ + │ Wait for completion │ │ + │◄─────────────────────│ │ + │ │ │ + │ If passed: Deploy │ │ + │─────────────────────────────────────────► + │ │ │ +``` + +### Pattern 3: Bidirectional Integration + +``` + BIDIRECTIONAL INTEGRATION + + ┌────────────┐ ┌────────────────────┐ + │ CI/CD │◄───────────────────────►│ Release │ + │ System │ │ Orchestrator │ + └─────┬──────┘ └─────────┬──────────┘ + │ │ + │══════════════════════════════════════════│ + │ Events (both directions) │ + │══════════════════════════════════════════│ + │ │ + │ CI Events: │ + │ - Pipeline completed │ + │ - Tests passed/failed │ + │ - Artifacts ready │ + │ │ + │ Orchestrator Events: │ + │ - Deployment started │ + │ - Deployment completed │ + │ - Rollback initiated │ + │ │ +``` + +## CI/CD System Configuration + +### GitLab CI Integration + +```yaml +# .gitlab-ci.yml +stages: + - build + - push + - release + +variables: + STELLA_API_URL: https://stella.example.com/api/v1 + COMPONENT_NAME: myapp + +build: + stage: build + script: + - docker build -t $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA . 
+ +push: + stage: push + script: + - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA + - docker tag $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA $CI_REGISTRY_IMAGE:$CI_COMMIT_TAG + - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_TAG + rules: + - if: $CI_COMMIT_TAG + +release: + stage: release + image: curlimages/curl:latest + script: + - | + # Get image digest + DIGEST=$(docker inspect --format='{{index .RepoDigests 0}}' $CI_REGISTRY_IMAGE:$CI_COMMIT_TAG | cut -d@ -f2) + + # Create release in Stella + curl -X POST "$STELLA_API_URL/releases" \ + -H "Authorization: Bearer $STELLA_TOKEN" \ + -H "Content-Type: application/json" \ + -d "{ + \"name\": \"$COMPONENT_NAME-$CI_COMMIT_TAG\", + \"components\": [{ + \"componentId\": \"$STELLA_COMPONENT_ID\", + \"digest\": \"$DIGEST\" + }], + \"sourceRef\": { + \"type\": \"git\", + \"repository\": \"$CI_PROJECT_URL\", + \"commit\": \"$CI_COMMIT_SHA\", + \"tag\": \"$CI_COMMIT_TAG\" + } + }" + rules: + - if: $CI_COMMIT_TAG +``` + +### GitHub Actions Integration + +```yaml +# .github/workflows/release.yml +name: Release to Stella + +on: + push: + tags: + - 'v*' + +jobs: + build-and-release: + runs-on: ubuntu-latest + + steps: + - uses: actions/checkout@v4 + + - name: Set up Docker Buildx + uses: docker/setup-buildx-action@v3 + + - name: Login to Container Registry + uses: docker/login-action@v3 + with: + registry: ghcr.io + username: ${{ github.actor }} + password: ${{ secrets.GITHUB_TOKEN }} + + - name: Build and push + uses: docker/build-push-action@v5 + with: + push: true + tags: | + ghcr.io/${{ github.repository }}:${{ github.sha }} + ghcr.io/${{ github.repository }}:${{ github.ref_name }} + + - name: Get image digest + id: digest + run: | + DIGEST=$(docker inspect --format='{{index .RepoDigests 0}}' ghcr.io/${{ github.repository }}:${{ github.ref_name }} | cut -d@ -f2) + echo "digest=$DIGEST" >> $GITHUB_OUTPUT + + - name: Create Stella Release + uses: stella-ops/create-release-action@v1 + with: + stella-url: ${{ vars.STELLA_API_URL }} + 
stella-token: ${{ secrets.STELLA_TOKEN }} + release-name: ${{ github.event.repository.name }}-${{ github.ref_name }} + components: | + - componentId: ${{ vars.STELLA_COMPONENT_ID }} + digest: ${{ steps.digest.outputs.digest }} + source-ref: | + type: git + repository: ${{ github.server_url }}/${{ github.repository }} + commit: ${{ github.sha }} + tag: ${{ github.ref_name }} +``` + +### Jenkins Integration + +```groovy +// Jenkinsfile +pipeline { + agent any + + environment { + STELLA_API_URL = 'https://stella.example.com/api/v1' + STELLA_TOKEN = credentials('stella-api-token') + REGISTRY = 'registry.example.com' + IMAGE_NAME = 'myorg/myapp' + } + + stages { + stage('Build') { + steps { + script { + docker.build("${REGISTRY}/${IMAGE_NAME}:${env.BUILD_TAG}") + } + } + } + + stage('Push') { + steps { + script { + docker.withRegistry("https://${REGISTRY}", 'registry-creds') { + docker.image("${REGISTRY}/${IMAGE_NAME}:${env.BUILD_TAG}").push() + } + } + } + } + + stage('Create Release') { + when { + tag pattern: "v\\d+\\.\\d+\\.\\d+", comparator: "REGEXP" + } + steps { + script { + def digest = sh( + script: "docker inspect --format='{{index .RepoDigests 0}}' ${REGISTRY}/${IMAGE_NAME}:${env.TAG_NAME} | cut -d@ -f2", + returnStdout: true + ).trim() + + def response = httpRequest( + url: "${STELLA_API_URL}/releases", + httpMode: 'POST', + contentType: 'APPLICATION_JSON', + customHeaders: [[name: 'Authorization', value: "Bearer ${STELLA_TOKEN}"]], + requestBody: """ + { + "name": "${IMAGE_NAME}-${env.TAG_NAME}", + "components": [{ + "componentId": "${env.STELLA_COMPONENT_ID}", + "digest": "${digest}" + }], + "sourceRef": { + "type": "git", + "repository": "${env.GIT_URL}", + "commit": "${env.GIT_COMMIT}", + "tag": "${env.TAG_NAME}" + } + } + """ + ) + + echo "Release created: ${response.content}" + } + } + } + } + + post { + success { + // Notify Stella of successful build + httpRequest( + url: "${STELLA_API_URL}/webhooks/ci-status", + httpMode: 'POST', + contentType: 
'APPLICATION_JSON', + customHeaders: [[name: 'Authorization', value: "Bearer ${STELLA_TOKEN}"]], + requestBody: """ + { + "buildId": "${env.BUILD_ID}", + "status": "success", + "commit": "${env.GIT_COMMIT}" + } + """ + ) + } + } +} +``` + +## Workflow Step Integration + +### Trigger CI Pipeline Step + +```typescript +// Step type: trigger-ci +interface TriggerCIConfig { + integrationId: UUID; // CI integration reference + pipelineId: string; // Pipeline to trigger + ref?: string; // Branch/tag reference + variables?: Record; + waitForCompletion: boolean; + timeout?: number; +} + +class TriggerCIStep implements IStepExecutor { + async execute( + inputs: StepInputs, + config: TriggerCIConfig, + context: ExecutionContext + ): Promise { + const connector = await this.getConnector(config.integrationId); + + // Trigger pipeline + const run = await connector.triggerPipeline( + config.pipelineId, + { + ref: config.ref || context.release?.sourceRef?.tag, + variables: { + ...config.variables, + STELLA_RELEASE_ID: context.release?.id, + STELLA_PROMOTION_ID: context.promotion?.id, + STELLA_ENVIRONMENT: context.environment?.name + } + } + ); + + if (!config.waitForCompletion) { + return { + pipelineRunId: run.id, + status: run.status, + webUrl: run.webUrl + }; + } + + // Wait for completion + const finalStatus = await this.waitForPipeline( + connector, + run.id, + config.timeout || 3600 + ); + + if (finalStatus.status !== "success") { + throw new StepError( + `Pipeline failed with status: ${finalStatus.status}`, + { pipelineRunId: run.id, status: finalStatus } + ); + } + + return { + pipelineRunId: run.id, + status: finalStatus.status, + webUrl: run.webUrl + }; + } + + private async waitForPipeline( + connector: ICICDConnector, + runId: string, + timeout: number + ): Promise { + const deadline = Date.now() + timeout * 1000; + + while (Date.now() < deadline) { + const run = await connector.getPipelineRun(runId); + + if (run.status === "success" || run.status === "failed" || 
run.status === "cancelled") { + return run; + } + + await sleep(10000); // Poll every 10 seconds + } + + throw new TimeoutError(`Pipeline did not complete within ${timeout} seconds`); + } +} +``` + +### Wait for CI Step + +```typescript +// Step type: wait-ci +interface WaitCIConfig { + integrationId: UUID; + runId?: string; // If known, or from input + runIdInput?: string; // Input name containing run ID + timeout: number; + failOnError: boolean; +} + +class WaitCIStep implements IStepExecutor { + async execute( + inputs: StepInputs, + config: WaitCIConfig, + context: ExecutionContext + ): Promise { + const runId = config.runId || inputs[config.runIdInput!]; + + if (!runId) { + throw new StepError("Pipeline run ID not provided"); + } + + const connector = await this.getConnector(config.integrationId); + + const finalStatus = await this.waitForPipeline( + connector, + runId, + config.timeout + ); + + const success = finalStatus.status === "success"; + + if (!success && config.failOnError) { + throw new StepError( + `Pipeline failed with status: ${finalStatus.status}`, + { pipelineRunId: runId, status: finalStatus } + ); + } + + return { + status: finalStatus.status, + success, + pipelineRun: finalStatus + }; + } +} +``` + +## Deployment Status Reporting + +### GitHub Deployment Status + +```typescript +class GitHubStatusReporter { + async reportDeploymentStart( + integration: Integration, + deployment: DeploymentContext + ): Promise { + const client = await this.getClient(integration); + + // Create deployment + const { data: ghDeployment } = await client.repos.createDeployment({ + owner: deployment.repository.owner, + repo: deployment.repository.name, + ref: deployment.sourceRef.commit, + environment: deployment.environment.name, + auto_merge: false, + required_contexts: [], + payload: { + stellaReleaseId: deployment.release.id, + stellaPromotionId: deployment.promotion.id + } + }); + + // Set status to in_progress + await client.repos.createDeploymentStatus({ + 
owner: deployment.repository.owner, + repo: deployment.repository.name, + deployment_id: ghDeployment.id, + state: "in_progress", + log_url: `${this.stellaUrl}/deployments/${deployment.jobId}`, + description: "Deployment in progress" + }); + + // Store deployment ID for later status update + await this.storeMapping(deployment.jobId, ghDeployment.id); + } + + async reportDeploymentComplete( + integration: Integration, + deployment: DeploymentContext, + success: boolean + ): Promise { + const client = await this.getClient(integration); + const ghDeploymentId = await this.getMapping(deployment.jobId); + + await client.repos.createDeploymentStatus({ + owner: deployment.repository.owner, + repo: deployment.repository.name, + deployment_id: ghDeploymentId, + state: success ? "success" : "failure", + log_url: `${this.stellaUrl}/deployments/${deployment.jobId}`, + environment_url: deployment.environment.url, + description: success + ? "Deployment completed successfully" + : "Deployment failed" + }); + } +} +``` + +### GitLab Pipeline Status + +```typescript +class GitLabStatusReporter { + async reportDeploymentStatus( + integration: Integration, + deployment: DeploymentContext, + state: "running" | "success" | "failed" | "canceled" + ): Promise { + const client = await this.getClient(integration); + + await client.post( + `/projects/${integration.config.projectId}/statuses/${deployment.sourceRef.commit}`, + { + state, + ref: deployment.sourceRef.tag || deployment.sourceRef.branch, + name: `stella/${deployment.environment.name}`, + target_url: `${this.stellaUrl}/deployments/${deployment.jobId}`, + description: this.getDescription(state, deployment) + } + ); + } + + private getDescription(state: string, deployment: DeploymentContext): string { + switch (state) { + case "running": + return `Deploying to ${deployment.environment.name}`; + case "success": + return `Deployed to ${deployment.environment.name}`; + case "failed": + return `Deployment to 
${deployment.environment.name} failed`; + case "canceled": + return `Deployment to ${deployment.environment.name} cancelled`; + default: + return ""; + } + } +} +``` + +## API for CI Systems + +### Create Release from CI + +```http +POST /api/v1/releases +Authorization: Bearer +Content-Type: application/json + +{ + "name": "myapp-v1.2.0", + "components": [ + { + "componentId": "component-uuid", + "digest": "sha256:abc123..." + } + ], + "sourceRef": { + "type": "git", + "repository": "https://github.com/myorg/myapp", + "commit": "abc123def456", + "tag": "v1.2.0", + "branch": "main" + }, + "metadata": { + "buildId": "12345", + "buildUrl": "https://ci.example.com/builds/12345", + "triggeredBy": "ci-pipeline" + } +} +``` + +### Report Build Status + +```http +POST /api/v1/ci-events/build-complete +Authorization: Bearer +Content-Type: application/json + +{ + "integrationId": "integration-uuid", + "buildId": "12345", + "status": "success", + "commit": "abc123def456", + "artifacts": [ + { + "name": "myapp", + "digest": "sha256:abc123...", + "repository": "registry.example.com/myorg/myapp" + } + ], + "testResults": { + "passed": 150, + "failed": 0, + "skipped": 5 + } +} +``` + +## Service Account for CI + +### Creating CI Service Account + +```http +POST /api/v1/service-accounts +Authorization: Bearer +Content-Type: application/json + +{ + "name": "ci-pipeline", + "description": "Service account for CI/CD integration", + "roles": ["release-creator"], + "permissions": [ + { "resource": "release", "action": "create" }, + { "resource": "component", "action": "read" }, + { "resource": "version-map", "action": "read" } + ], + "expiresIn": "365d" +} +``` + +Response: +```json +{ + "success": true, + "data": { + "id": "sa-uuid", + "name": "ci-pipeline", + "token": "stella_sa_xxxxxxxxxxxxx", + "expiresAt": "2027-01-09T00:00:00Z" + } +} +``` + +## References + +- [Integrations Overview](overview.md) +- [Connectors](connectors.md) +- [Webhooks](webhooks.md) +- [Workflow 
Templates](../workflow/templates.md) diff --git a/docs/modules/release-orchestrator/integrations/connectors.md b/docs/modules/release-orchestrator/integrations/connectors.md new file mode 100644 index 000000000..007cc7505 --- /dev/null +++ b/docs/modules/release-orchestrator/integrations/connectors.md @@ -0,0 +1,900 @@ +# Connector Development + +## Overview + +Connectors are the integration layer between Release Orchestrator and external systems. Each connector implements a standard interface for its integration type. + +## Connector Architecture + +``` + CONNECTOR ARCHITECTURE + + ┌─────────────────────────────────────────────────────────────────────────────┐ + │ CONNECTOR RUNTIME │ + │ │ + │ ┌─────────────────────────────────────────────────────────────────────┐ │ + │ │ CONNECTOR INTERFACE │ │ + │ │ │ │ + │ │ ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐ │ │ + │ │ │ getCapabilities()│ │ ping() │ │ authenticate() │ │ │ + │ │ └──────────────────┘ └──────────────────┘ └──────────────────┘ │ │ + │ │ │ │ + │ │ ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐ │ │ + │ │ │ discover() │ │ execute() │ │ healthCheck() │ │ │ + │ │ └──────────────────┘ └──────────────────┘ └──────────────────┘ │ │ + │ │ │ │ + │ └─────────────────────────────────────────────────────────────────────┘ │ + │ │ │ + │ ▼ │ + │ ┌─────────────────────────────────────────────────────────────────────┐ │ + │ │ CONNECTOR IMPLEMENTATIONS │ │ + │ │ │ │ + │ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │ + │ │ │ Registry │ │ CI/CD │ │ Notification│ │ Secret │ │ │ + │ │ │ Connectors │ │ Connectors │ │ Connectors │ │ Connectors │ │ │ + │ │ │ │ │ │ │ │ │ │ │ │ + │ │ │ - Docker │ │ - GitLab │ │ - Slack │ │ - Vault │ │ │ + │ │ │ - ECR │ │ - GitHub │ │ - Teams │ │ - AWS SM │ │ │ + │ │ │ - ACR │ │ - Jenkins │ │ - Email │ │ - Azure KV │ │ │ + │ │ │ - Harbor │ │ - Azure DO │ │ - PagerDuty │ │ │ │ │ + │ │ └─────────────┘ └─────────────┘ └─────────────┘ 
└─────────────┘  │   │
+  │  │                                                                   │   │
+  │  └─────────────────────────────────────────────────────────────────────┘ │
+  │                                                                           │
+  └───────────────────────────────────────────────────────────────────────────┘
+```
+
+## Base Connector Interface
+
+```typescript
+interface IConnector {
+  // Metadata
+  readonly typeId: string;
+  readonly displayName: string;
+  readonly version: string;
+  readonly capabilities: ConnectorCapabilities;
+
+  // Lifecycle
+  initialize(config: IntegrationConfig): Promise<void>;
+  dispose(): Promise<void>;
+
+  // Health
+  ping(config: IntegrationConfig): Promise<void>;
+  healthCheck(config: IntegrationConfig, creds: Credential): Promise<HealthStatus>;
+
+  // Authentication
+  authenticate(config: IntegrationConfig, creds: Credential): Promise<AuthContext>;
+
+  // Discovery (optional)
+  discover?(
+    config: IntegrationConfig,
+    authContext: AuthContext,
+    resourceType: string,
+    filter?: DiscoveryFilter
+  ): Promise<DiscoveredResource[]>;
+}
+
+interface ConnectorCapabilities {
+  discovery: boolean;
+  webhooks: boolean;
+  streaming: boolean;
+  batchOperations: boolean;
+  customActions: string[];
+}
+```
+
+## Registry Connectors
+
+### IRegistryConnector
+
+```typescript
+interface IRegistryConnector extends IConnector {
+  // Repository operations
+  listRepositories(authContext: AuthContext): Promise<Repository[]>;
+
+  // Tag operations
+  listTags(authContext: AuthContext, repository: string): Promise<Tag[]>;
+  getManifest(authContext: AuthContext, repository: string, reference: string): Promise<Manifest>;
+  getDigest(authContext: AuthContext, repository: string, tag: string): Promise<string>;
+
+  // Image operations
+  imageExists(authContext: AuthContext, repository: string, digest: string): Promise<boolean>;
+  getImageMetadata(authContext: AuthContext, repository: string, digest: string): Promise<ImageMetadata>;
+}
+
+interface Repository {
+  name: string;
+  fullName: string;
+  tagCount?: number;
+  lastUpdated?: DateTime;
+}
+
+interface Tag {
+  name: string;
+  digest: string;
+  createdAt?: DateTime;
+  size?: number;
+}
+
+interface ImageMetadata {
+  digest: string;
+  mediaType: 
string; + size: number; + architecture: string; + os: string; + created: DateTime; + labels: Record; + layers: LayerInfo[]; +} +``` + +### Docker Registry Connector + +```typescript +class DockerRegistryConnector implements IRegistryConnector { + readonly typeId = "docker-registry"; + readonly displayName = "Docker Registry"; + readonly version = "1.0.0"; + readonly capabilities: ConnectorCapabilities = { + discovery: true, + webhooks: true, + streaming: false, + batchOperations: false, + customActions: [] + }; + + private httpClient: HttpClient; + + async initialize(config: DockerRegistryConfig): Promise { + this.httpClient = new HttpClient({ + baseUrl: config.url, + timeout: config.timeout || 30000, + insecureSkipVerify: config.insecureSkipVerify + }); + } + + async ping(config: DockerRegistryConfig): Promise { + const response = await this.httpClient.get("/v2/"); + if (response.status !== 200 && response.status !== 401) { + throw new Error(`Registry unavailable: ${response.status}`); + } + } + + async authenticate( + config: DockerRegistryConfig, + creds: BasicCredential + ): Promise { + // Get auth challenge from /v2/ + const challenge = await this.getAuthChallenge(); + + if (challenge.type === "bearer") { + // OAuth2 token flow + const token = await this.getToken(challenge, creds); + return { type: "bearer", token }; + } else { + // Basic auth + return { + type: "basic", + credentials: Buffer.from(`${creds.username}:${creds.password}`).toString("base64") + }; + } + } + + async getDigest( + authContext: AuthContext, + repository: string, + tag: string + ): Promise { + const response = await this.httpClient.head( + `/v2/${repository}/manifests/${tag}`, + { + headers: { + ...this.authHeader(authContext), + Accept: "application/vnd.docker.distribution.manifest.v2+json" + } + } + ); + + const digest = response.headers.get("docker-content-digest"); + if (!digest) { + throw new Error("No digest header in response"); + } + + return digest; + } + + async 
getImageMetadata( + authContext: AuthContext, + repository: string, + digest: string + ): Promise { + // Fetch manifest + const manifest = await this.getManifest(authContext, repository, digest); + + // Fetch config blob + const configDigest = manifest.config.digest; + const configResponse = await this.httpClient.get( + `/v2/${repository}/blobs/${configDigest}`, + { headers: this.authHeader(authContext) } + ); + + const config = await configResponse.json(); + + return { + digest, + mediaType: manifest.mediaType, + size: manifest.config.size, + architecture: config.architecture, + os: config.os, + created: new Date(config.created), + labels: config.config?.Labels || {}, + layers: manifest.layers.map(l => ({ + digest: l.digest, + size: l.size, + mediaType: l.mediaType + })) + }; + } +} +``` + +### ECR Connector + +```typescript +class ECRConnector implements IRegistryConnector { + readonly typeId = "ecr"; + readonly displayName = "AWS ECR"; + readonly version = "1.0.0"; + readonly capabilities: ConnectorCapabilities = { + discovery: true, + webhooks: false, + streaming: false, + batchOperations: true, + customActions: ["createRepository", "setLifecyclePolicy"] + }; + + private ecrClient: ECRClient; + + async initialize(config: ECRConfig): Promise { + this.ecrClient = new ECRClient({ + region: config.region, + credentials: { + accessKeyId: config.accessKeyId, + secretAccessKey: config.secretAccessKey + } + }); + } + + async authenticate( + config: ECRConfig, + creds: AWSCredential + ): Promise { + const command = new GetAuthorizationTokenCommand({}); + const response = await this.ecrClient.send(command); + + const authData = response.authorizationData?.[0]; + if (!authData?.authorizationToken) { + throw new Error("Failed to get ECR authorization token"); + } + + return { + type: "bearer", + token: authData.authorizationToken, + expiresAt: authData.expiresAt + }; + } + + async listRepositories(authContext: AuthContext): Promise { + const repositories: Repository[] = 
[];
+    let nextToken: string | undefined;
+
+    do {
+      const command = new DescribeRepositoriesCommand({
+        nextToken
+      });
+      const response = await this.ecrClient.send(command);
+
+      for (const repo of response.repositories || []) {
+        repositories.push({
+          name: repo.repositoryName!,
+          fullName: repo.repositoryUri!,
+          lastUpdated: repo.createdAt
+        });
+      }
+
+      nextToken = response.nextToken;
+    } while (nextToken);
+
+    return repositories;
+  }
+}
+```
+
+## CI/CD Connectors
+
+### ICICDConnector
+
+```typescript
+interface ICICDConnector extends IConnector {
+  // Pipeline operations
+  listPipelines(authContext: AuthContext): Promise<Pipeline[]>;
+  getPipeline(authContext: AuthContext, pipelineId: string): Promise<Pipeline>;
+
+  // Trigger operations
+  triggerPipeline(
+    authContext: AuthContext,
+    pipelineId: string,
+    params: TriggerParams
+  ): Promise<PipelineRun>;
+
+  // Run operations
+  getPipelineRun(authContext: AuthContext, runId: string): Promise<PipelineRun>;
+  cancelPipelineRun(authContext: AuthContext, runId: string): Promise<void>;
+  getPipelineRunLogs(authContext: AuthContext, runId: string): Promise<string>;
+}
+
+interface Pipeline {
+  id: string;
+  name: string;
+  ref?: string;
+  webUrl?: string;
+}
+
+interface TriggerParams {
+  ref?: string;              // Branch/tag
+  variables?: Record<string, string>;
+}
+
+interface PipelineRun {
+  id: string;
+  pipelineId: string;
+  status: PipelineStatus;
+  ref?: string;
+  webUrl?: string;
+  startedAt?: DateTime;
+  finishedAt?: DateTime;
+}
+
+type PipelineStatus =
+  | "pending"
+  | "running"
+  | "success"
+  | "failed"
+  | "cancelled";
+```
+
+### GitLab CI Connector
+
+```typescript
+class GitLabCIConnector implements ICICDConnector {
+  readonly typeId = "gitlab-ci";
+  readonly displayName = "GitLab CI/CD";
+  readonly version = "1.0.0";
+  readonly capabilities: ConnectorCapabilities = {
+    discovery: true,
+    webhooks: true,
+    streaming: false,
+    batchOperations: false,
+    customActions: ["retryPipeline"]
+  };
+
+  private apiClient: GitLabClient;
+
+  async initialize(config: GitLabCIConfig): Promise<void> { 
this.apiClient = new GitLabClient({ + baseUrl: config.url, + projectId: config.projectId + }); + } + + async authenticate( + config: GitLabCIConfig, + creds: TokenCredential + ): Promise { + // Validate token with user endpoint + this.apiClient.setToken(creds.token); + await this.apiClient.get("/user"); + + return { + type: "bearer", + token: creds.token + }; + } + + async triggerPipeline( + authContext: AuthContext, + pipelineId: string, + params: TriggerParams + ): Promise { + const response = await this.apiClient.post( + `/projects/${this.projectId}/pipeline`, + { + ref: params.ref || this.defaultBranch, + variables: Object.entries(params.variables || {}).map(([key, value]) => ({ + key, + value, + variable_type: "env_var" + })) + }, + { headers: { Authorization: `Bearer ${authContext.token}` } } + ); + + return { + id: response.id.toString(), + pipelineId: pipelineId, + status: this.mapStatus(response.status), + ref: response.ref, + webUrl: response.web_url, + startedAt: response.started_at ? new Date(response.started_at) : undefined + }; + } + + async getPipelineRun( + authContext: AuthContext, + runId: string + ): Promise { + const response = await this.apiClient.get( + `/projects/${this.projectId}/pipelines/${runId}`, + { headers: { Authorization: `Bearer ${authContext.token}` } } + ); + + return { + id: response.id.toString(), + pipelineId: response.id.toString(), + status: this.mapStatus(response.status), + ref: response.ref, + webUrl: response.web_url, + startedAt: response.started_at ? new Date(response.started_at) : undefined, + finishedAt: response.finished_at ? 
new Date(response.finished_at) : undefined + }; + } + + private mapStatus(gitlabStatus: string): PipelineStatus { + const statusMap: Record = { + created: "pending", + waiting_for_resource: "pending", + preparing: "pending", + pending: "pending", + running: "running", + success: "success", + failed: "failed", + canceled: "cancelled", + skipped: "cancelled", + manual: "pending" + }; + return statusMap[gitlabStatus] || "pending"; + } +} +``` + +## Notification Connectors + +### INotificationConnector + +```typescript +interface INotificationConnector extends IConnector { + // Channel operations + listChannels(authContext: AuthContext): Promise; + + // Send operations + sendMessage( + authContext: AuthContext, + channel: string, + message: NotificationMessage + ): Promise; + + sendTemplate( + authContext: AuthContext, + channel: string, + templateId: string, + data: Record + ): Promise; +} + +interface Channel { + id: string; + name: string; + type: string; +} + +interface NotificationMessage { + text: string; + title?: string; + color?: string; + fields?: MessageField[]; + actions?: MessageAction[]; +} + +interface MessageField { + name: string; + value: string; + inline?: boolean; +} + +interface MessageAction { + type: "button" | "link"; + text: string; + url?: string; + style?: "primary" | "danger" | "default"; +} +``` + +### Slack Connector + +```typescript +class SlackConnector implements INotificationConnector { + readonly typeId = "slack"; + readonly displayName = "Slack"; + readonly version = "1.0.0"; + readonly capabilities: ConnectorCapabilities = { + discovery: true, + webhooks: true, + streaming: false, + batchOperations: false, + customActions: ["addReaction", "updateMessage"] + }; + + private slackClient: WebClient; + + async initialize(config: SlackConfig): Promise { + // Client initialized in authenticate + } + + async authenticate( + config: SlackConfig, + creds: TokenCredential + ): Promise { + this.slackClient = new WebClient(creds.token); + + // 
Test authentication + const result = await this.slackClient.auth.test(); + if (!result.ok) { + throw new Error("Slack authentication failed"); + } + + return { + type: "bearer", + token: creds.token, + teamId: result.team_id, + userId: result.user_id + }; + } + + async listChannels(authContext: AuthContext): Promise { + const channels: Channel[] = []; + let cursor: string | undefined; + + do { + const result = await this.slackClient.conversations.list({ + types: "public_channel,private_channel", + cursor + }); + + for (const channel of result.channels || []) { + channels.push({ + id: channel.id!, + name: channel.name!, + type: channel.is_private ? "private" : "public" + }); + } + + cursor = result.response_metadata?.next_cursor; + } while (cursor); + + return channels; + } + + async sendMessage( + authContext: AuthContext, + channel: string, + message: NotificationMessage + ): Promise { + const blocks = this.buildBlocks(message); + + const result = await this.slackClient.chat.postMessage({ + channel, + text: message.text, + blocks, + attachments: message.color ? [{ + color: message.color, + blocks + }] : undefined + }); + + return { + messageId: result.ts!, + channel: result.channel!, + success: result.ok + }; + } + + private buildBlocks(message: NotificationMessage): KnownBlock[] { + const blocks: KnownBlock[] = []; + + if (message.title) { + blocks.push({ + type: "header", + text: { + type: "plain_text", + text: message.title + } + }); + } + + blocks.push({ + type: "section", + text: { + type: "mrkdwn", + text: message.text + } + }); + + if (message.fields?.length) { + blocks.push({ + type: "section", + fields: message.fields.map(f => ({ + type: "mrkdwn", + text: `*${f.name}*\n${f.value}` + })) + }); + } + + if (message.actions?.length) { + blocks.push({ + type: "actions", + elements: message.actions.map(a => ({ + type: "button", + text: { + type: "plain_text", + text: a.text + }, + url: a.url, + style: a.style === "danger" ? 
"danger" : a.style === "primary" ? "primary" : undefined
+        }))
+      });
+    }
+
+    return blocks;
+  }
+}
+```
+
+## Secret Store Connectors
+
+### ISecretConnector
+
+```typescript
+interface ISecretConnector extends IConnector {
+  // Secret operations
+  getSecret(
+    authContext: AuthContext,
+    path: string,
+    key?: string
+  ): Promise<SecretValue>;
+
+  listSecrets(
+    authContext: AuthContext,
+    path: string
+  ): Promise<string[]>;
+}
+
+interface SecretValue {
+  value: string;
+  version?: string;
+  createdAt?: DateTime;
+  expiresAt?: DateTime;
+}
+```
+
+### HashiCorp Vault Connector
+
+```typescript
+class VaultConnector implements ISecretConnector {
+  readonly typeId = "hashicorp-vault";
+  readonly displayName = "HashiCorp Vault";
+  readonly version = "1.0.0";
+  readonly capabilities: ConnectorCapabilities = {
+    discovery: true,
+    webhooks: false,
+    streaming: false,
+    batchOperations: false,
+    customActions: ["renewToken"]
+  };
+
+  private vaultClient: VaultClient;
+  private mountPath: string;
+
+  async initialize(config: VaultConfig): Promise<void> {
+    this.mountPath = config.mountPath;
+    this.vaultClient = new VaultClient({
+      endpoint: config.url,
+      namespace: config.namespace
+    });
+  }
+
+  async authenticate(
+    config: VaultConfig,
+    creds: Credential
+  ): Promise<AuthContext> {
+    let token: string;
+
+    switch (config.authMethod) {
+      case "token":
+        token = (creds as TokenCredential).token;
+        break;
+
+      case "approle": {
+        const approle = creds as AppRoleCredential;
+        const result = await this.vaultClient.auth.approle.login({
+          role_id: approle.roleId,
+          secret_id: approle.secretId
+        });
+        token = result.auth.client_token;
+        break;
+      }
+
+      case "kubernetes": {
+        const k8s = creds as KubernetesCredential;
+        const k8sResult = await this.vaultClient.auth.kubernetes.login({
+          role: k8s.role,
+          jwt: k8s.serviceAccountToken
+        });
+        token = k8sResult.auth.client_token;
+        break;
+      }
+
+      default:
+        throw new Error(`Unsupported auth method: ${config.authMethod}`);
+    }
+
+    this.vaultClient.token = token;
+
+    return {
+      type: "bearer",
+      token,
+      renewable: true
+    };
+  }
+
+  async getSecret(
+    authContext: AuthContext,
+    path: 
string,
+    key?: string
+  ): Promise<SecretValue> {
+    const result = await this.vaultClient.kv.v2.read({
+      mount_path: this.mountPath,
+      path
+    });
+
+    const data = result.data.data;
+    const value = key ? data[key] : JSON.stringify(data);
+
+    return {
+      value,
+      version: result.data.metadata.version.toString(),
+      createdAt: new Date(result.data.metadata.created_time)
+    };
+  }
+
+  async listSecrets(
+    authContext: AuthContext,
+    path: string
+  ): Promise<string[]> {
+    const result = await this.vaultClient.kv.v2.list({
+      mount_path: this.mountPath,
+      path
+    });
+
+    return result.data.keys;
+  }
+}
+```
+
+## Custom Connector Development
+
+### Plugin Structure
+
+```
+my-connector/
+  ├── manifest.yaml
+  ├── src/
+  │   ├── connector.ts
+  │   ├── config.ts
+  │   └── types.ts
+  └── package.json
+```
+
+### Manifest
+
+```yaml
+# manifest.yaml
+id: my-custom-connector
+version: 1.0.0
+name: My Custom Connector
+description: Custom connector for XYZ service
+author: Your Name
+
+connector:
+  typeId: my-service
+  displayName: My Service
+  entrypoint: ./src/connector.js
+
+  capabilities:
+    discovery: true
+    webhooks: false
+    streaming: false
+    batchOperations: false
+
+  config_schema:
+    type: object
+    properties:
+      url:
+        type: string
+        format: uri
+        description: Service URL
+      timeout:
+        type: integer
+        default: 30000
+    required:
+      - url
+
+  credential_types:
+    - api-key
+    - oauth2
+```
+
+### Implementation
+
+```typescript
+// connector.ts
+import { IConnector, ConnectorCapabilities } from "@stella-ops/connector-sdk";
+
+export class MyConnector implements IConnector {
+  readonly typeId = "my-service";
+  readonly displayName = "My Service";
+  readonly version = "1.0.0";
+  readonly capabilities: ConnectorCapabilities = {
+    discovery: true,
+    webhooks: false,
+    streaming: false,
+    batchOperations: false,
+    customActions: []
+  };
+
+  async initialize(config: MyConfig): Promise<void> {
+    // Initialize your connector
+  }
+
+  async dispose(): Promise<void> {
+    // Cleanup resources
+  }
+
+  async ping(config: MyConfig): 
Promise<void> {
+    // Check connectivity
+  }
+
+  async healthCheck(config: MyConfig, creds: Credential): Promise<HealthCheckResult> {
+    // Full health check
+  }
+
+  async authenticate(config: MyConfig, creds: Credential): Promise<AuthContext> {
+    // Authenticate and return context
+  }
+
+  async discover(
+    config: MyConfig,
+    authContext: AuthContext,
+    resourceType: string,
+    filter?: DiscoveryFilter
+  ): Promise<Resource[]> {
+    // Discover resources
+  }
+}
+
+// Export connector factory
+export default function createConnector(): IConnector {
+  return new MyConnector();
+}
+```
+
+## References
+
+- [Integrations Overview](overview.md)
+- [Webhooks](webhooks.md)
+- [Plugin System](../modules/plugin-system.md)
diff --git a/docs/modules/release-orchestrator/integrations/overview.md b/docs/modules/release-orchestrator/integrations/overview.md
new file mode 100644
index 000000000..5574ceec8
--- /dev/null
+++ b/docs/modules/release-orchestrator/integrations/overview.md
@@ -0,0 +1,412 @@
+# Integrations Overview
+
+## Purpose
+
+The Integration Hub (INTHUB) provides a unified interface for connecting Release Orchestrator to external systems including container registries, CI/CD pipelines, notification services, secret stores, and metrics providers.
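Conceptually, the hub keeps a registry of connector types keyed by `typeId` and instantiates a connector on demand, regardless of whether it talks to a registry, a CI system, or a secret store. A minimal sketch of that idea follows; the class and method names here are illustrative, not the actual SDK surface:

```typescript
// Illustrative sketch of a connector-type registry. Names are hypothetical.
interface ConnectorStub {
  typeId: string;
  displayName: string;
  ping(): Promise<boolean>;
}

class ConnectorTypeRegistry {
  private factories = new Map<string, () => ConnectorStub>();

  register(typeId: string, factory: () => ConnectorStub): void {
    if (this.factories.has(typeId)) {
      throw new Error(`Connector type already registered: ${typeId}`);
    }
    this.factories.set(typeId, factory);
  }

  create(typeId: string): ConnectorStub {
    const factory = this.factories.get(typeId);
    if (!factory) throw new Error(`Unknown connector type: ${typeId}`);
    return factory();
  }

  listTypes(): string[] {
    return [...this.factories.keys()].sort();
  }
}

// Register two very different systems behind the same interface
const registry = new ConnectorTypeRegistry();
registry.register("docker-registry", () => ({
  typeId: "docker-registry",
  displayName: "Docker Registry v2",
  ping: async () => true,
}));
registry.register("slack", () => ({
  typeId: "slack",
  displayName: "Slack",
  ping: async () => true,
}));
```

The point of the single keyed registry is that callers (health monitor, discovery service, deployment pipeline) never branch on integration category; they only ask for a `typeId`.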
+ +## Integration Architecture + +``` + INTEGRATION HUB ARCHITECTURE + + ┌─────────────────────────────────────────────────────────────────────────────┐ + │ INTEGRATION HUB │ + │ │ + │ ┌─────────────────────────────────────────────────────────────────────┐ │ + │ │ INTEGRATION MANAGER │ │ + │ │ │ │ + │ │ ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐ │ │ + │ │ │ Type │ │ Instance │ │ Health │ │ Discovery │ │ │ + │ │ │ Registry │ │ Manager │ │ Monitor │ │ Service │ │ │ + │ │ └────────────┘ └────────────┘ └────────────┘ └────────────┘ │ │ + │ │ │ │ + │ └─────────────────────────────────────────────────────────────────────┘ │ + │ │ │ + │ ▼ │ + │ ┌─────────────────────────────────────────────────────────────────────┐ │ + │ │ CONNECTOR RUNTIME │ │ + │ │ │ │ + │ │ ┌──────────────────────────────────────────────────────────────┐ │ │ + │ │ │ CONNECTOR POOL │ │ │ + │ │ │ │ │ │ + │ │ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │ │ + │ │ │ │ Docker │ │ GitLab │ │ Slack │ │ Vault │ │ │ │ + │ │ │ │ Registry │ │ CI │ │ │ │ │ │ │ │ + │ │ │ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │ │ │ + │ │ │ │ │ │ + │ │ └──────────────────────────────────────────────────────────────┘ │ │ + │ │ │ │ + │ └─────────────────────────────────────────────────────────────────────┘ │ + │ │ + └─────────────────────────────────────────────────────────────────────────────┘ + │ + ┌─────────────┬─────────────────┼─────────────────┬─────────────┐ + │ │ │ │ │ + ▼ ▼ ▼ ▼ ▼ + ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ + │Container│ │ CI/CD │ │ Notifi- │ │ Secret │ │ Metrics │ + │Registry │ │ Systems │ │ cations │ │ Stores │ │ Systems │ + └─────────┘ └─────────┘ └─────────┘ └─────────┘ └─────────┘ +``` + +## Integration Types + +### Container Registries + +| Type ID | Description | Discovery Support | +|---------|-------------|-------------------| +| `docker-registry` | Docker Registry v2 API | Yes | +| `docker-hub` | Docker Hub | Yes | +| `gcr` | Google Container 
Registry | Yes | +| `ecr` | AWS Elastic Container Registry | Yes | +| `acr` | Azure Container Registry | Yes | +| `ghcr` | GitHub Container Registry | Yes | +| `harbor` | Harbor Registry | Yes | +| `jfrog` | JFrog Artifactory | Yes | +| `nexus` | Sonatype Nexus | Yes | +| `quay` | Quay.io | Yes | + +### CI/CD Systems + +| Type ID | Description | Trigger Support | +|---------|-------------|-----------------| +| `gitlab-ci` | GitLab CI/CD | Yes | +| `github-actions` | GitHub Actions | Yes | +| `jenkins` | Jenkins | Yes | +| `azure-devops` | Azure DevOps Pipelines | Yes | +| `circleci` | CircleCI | Yes | +| `teamcity` | TeamCity | Yes | +| `drone` | Drone CI | Yes | + +### Notification Services + +| Type ID | Description | Features | +|---------|-------------|----------| +| `slack` | Slack | Channels, threads, reactions | +| `teams` | Microsoft Teams | Channels, cards | +| `email` | Email (SMTP) | Templates, attachments | +| `webhook` | Generic Webhook | JSON payloads | +| `pagerduty` | PagerDuty | Incidents, alerts | +| `opsgenie` | OpsGenie | Alerts, on-call | + +### Secret Stores + +| Type ID | Description | Features | +|---------|-------------|----------| +| `hashicorp-vault` | HashiCorp Vault | KV, Transit, PKI | +| `aws-secrets-manager` | AWS Secrets Manager | Rotation, versioning | +| `azure-key-vault` | Azure Key Vault | Keys, secrets, certs | +| `gcp-secret-manager` | GCP Secret Manager | Versions, labels | + +### Metrics & Monitoring + +| Type ID | Description | Use Case | +|---------|-------------|----------| +| `prometheus` | Prometheus | Canary metrics | +| `datadog` | Datadog | APM, logs, metrics | +| `newrelic` | New Relic | APM, infra monitoring | +| `dynatrace` | Dynatrace | Full-stack monitoring | + +## Integration Configuration + +### Integration Entity + +```typescript +interface Integration { + id: UUID; + tenantId: UUID; + typeId: string; // e.g., "docker-registry" + name: string; // Display name + description?: string; + + // Connection 
configuration
+  config: IntegrationConfig;
+
+  // Credential reference (stored in vault)
+  credentialRef: string;
+
+  // Health tracking
+  healthStatus: "healthy" | "degraded" | "unhealthy" | "unknown";
+  lastHealthCheck?: DateTime;
+
+  // Metadata
+  labels: Record<string, string>;
+  createdAt: DateTime;
+  updatedAt: DateTime;
+}
+
+interface IntegrationConfig {
+  // Common fields
+  url?: string;
+  timeout?: number;
+  retries?: number;
+
+  // Type-specific fields
+  [key: string]: any;
+}
+```
+
+### Type-Specific Configuration
+
+```typescript
+// Docker Registry
+interface DockerRegistryConfig extends IntegrationConfig {
+  url: string;                   // https://registry.example.com
+  repository?: string;           // Optional default repository
+  insecureSkipVerify?: boolean;  // Skip TLS verification
+}
+
+// GitLab CI
+interface GitLabCIConfig extends IntegrationConfig {
+  url: string;                   // https://gitlab.example.com
+  projectId: string;             // Project ID or path
+  defaultBranch?: string;        // Default ref for triggers
+}
+
+// Slack
+interface SlackConfig extends IntegrationConfig {
+  workspace?: string;            // Workspace identifier
+  defaultChannel?: string;       // Default channel for notifications
+  iconEmoji?: string;            // Bot icon
+}
+
+// HashiCorp Vault
+interface VaultConfig extends IntegrationConfig {
+  url: string;                   // https://vault.example.com
+  namespace?: string;            // Vault namespace
+  mountPath: string;             // Secret mount path
+  authMethod: "token" | "approle" | "kubernetes";
+}
+```
+
+## Credential Management
+
+Credentials are never stored in the Release Orchestrator database. Instead, references to external secret stores are used.
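One way such a reference could be resolved is to first split it into its scheme, integration, path, and key parts, then fetch the secret through the referenced integration. The parser below is an illustrative sketch (the helper name and return shape are assumptions, not the actual implementation):

```typescript
// Illustrative parser for references of the form
//   vault://<integration-id>/<path/to/secret>#<key>
// Only the reference is persisted; the secret value is fetched at use time.
interface ParsedCredentialRef {
  scheme: string;         // e.g. "vault"
  integrationId: string;  // which secret-store integration to use
  path: string;           // path within the secret store
  key?: string;           // optional key inside the secret
}

function parseCredentialRef(ref: string): ParsedCredentialRef {
  const match = /^([a-z0-9-]+):\/\/([^/]+)\/([^#]+)(?:#(.+))?$/.exec(ref);
  if (!match) {
    throw new Error(`Malformed credential reference: ${ref}`);
  }
  const [, scheme, integrationId, path, key] = match;
  return { scheme, integrationId, path, key };
}
```

For example, `parseCredentialRef("vault://vault-1/ci/gitlab")` yields the integration `vault-1` and path `ci/gitlab` with no key, meaning the whole secret is requested.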
+ +### Credential Reference Format + +``` +vault://vault-integration-id/path/to/secret#key + └─────────┬────────┘ └─────┬─────┘ └┬┘ + Vault ID Secret path Key +``` + +### Credential Types + +```typescript +type CredentialType = + | "basic" // Username/password + | "token" // Bearer token + | "api-key" // API key + | "oauth2" // OAuth2 credentials + | "service-account" // GCP/K8s service account + | "certificate"; // Client certificate + +interface CredentialReference { + type: CredentialType; + ref: string; // Vault reference +} + +// Examples +const dockerCreds: CredentialReference = { + type: "basic", + ref: "vault://vault-1/docker/registry.example.com#credentials" +}; + +const gitlabToken: CredentialReference = { + type: "token", + ref: "vault://vault-1/ci/gitlab#access_token" +}; +``` + +## Health Monitoring + +### Health Check Types + +| Check Type | Description | Frequency | +|------------|-------------|-----------| +| `connectivity` | TCP/HTTP connectivity | 1 min | +| `authentication` | Credential validity | 5 min | +| `functionality` | Full operation test | 15 min | + +### Health Check Flow + +```typescript +interface HealthCheckResult { + integrationId: UUID; + checkType: string; + status: "healthy" | "degraded" | "unhealthy"; + latencyMs: number; + message?: string; + checkedAt: DateTime; +} + +class IntegrationHealthMonitor { + async checkHealth(integration: Integration): Promise { + const connector = this.connectorPool.get(integration.typeId); + const startTime = Date.now(); + + try { + // Connectivity check + await connector.ping(integration.config); + + // Authentication check + const creds = await this.fetchCredentials(integration.credentialRef); + await connector.authenticate(integration.config, creds); + + return { + integrationId: integration.id, + checkType: "full", + status: "healthy", + latencyMs: Date.now() - startTime, + checkedAt: new Date() + }; + } catch (error) { + return { + integrationId: integration.id, + checkType: "full", + status: 
this.classifyError(error),
+        latencyMs: Date.now() - startTime,
+        message: error.message,
+        checkedAt: new Date()
+      };
+    }
+  }
+}
+```
+
+## Discovery Service
+
+Integrations can discover resources from connected systems.
+
+### Discovery Operations
+
+```typescript
+interface DiscoveryService {
+  // Discover available repositories
+  discoverRepositories(integrationId: UUID): Promise<Repository[]>;
+
+  // Discover tags/versions
+  discoverTags(integrationId: UUID, repository: string): Promise<string[]>;
+
+  // Discover pipelines
+  discoverPipelines(integrationId: UUID): Promise<Pipeline[]>;
+
+  // Discover notification channels
+  discoverChannels(integrationId: UUID): Promise<Channel[]>;
+}
+
+// Example: Discover Docker repositories
+const repos = await discoveryService.discoverRepositories(dockerIntegrationId);
+// Returns: [{ name: "myapp", tags: ["latest", "v1.0.0", ...] }, ...]
+```
+
+### Discovery Caching
+
+```typescript
+interface DiscoveryCache {
+  key: string;          // integration_id:resource_type
+  data: any;
+  discoveredAt: DateTime;
+  ttlSeconds: number;
+}
+
+// Cache TTLs by resource type
+const cacheTTLs = {
+  repositories: 3600,   // 1 hour
+  tags: 300,            // 5 minutes
+  pipelines: 3600,      // 1 hour
+  channels: 86400       // 24 hours
+};
+```
+
+## API Reference
+
+### Create Integration
+
+```http
+POST /api/v1/integrations
+Content-Type: application/json
+
+{
+  "typeId": "docker-registry",
+  "name": "Production Registry",
+  "config": {
+    "url": "https://registry.example.com",
+    "repository": "myorg"
+  },
+  "credentialRef": "vault://vault-1/docker/prod-registry#credentials",
+  "labels": {
+    "environment": "production"
+  }
+}
+```
+
+### Test Integration
+
+```http
+POST /api/v1/integrations/{id}/test
+```
+
+Response:
+```json
+{
+  "success": true,
+  "data": {
+    "connectivityTest": { "status": "passed", "latencyMs": 45 },
+    "authenticationTest": { "status": "passed", "latencyMs": 120 },
+    "functionalityTest": { "status": "passed", "latencyMs": 230 }
+  }
+}
+```
+
+### Discover Resources
+
+```http
+POST 
/api/v1/integrations/{id}/discover +Content-Type: application/json + +{ + "resourceType": "repositories", + "filter": { + "namePattern": "myapp-*" + } +} +``` + +## Error Handling + +### Integration Errors + +| Error Code | Description | Retry Strategy | +|------------|-------------|----------------| +| `INTEGRATION_NOT_FOUND` | Integration ID not found | No retry | +| `INTEGRATION_UNHEALTHY` | Integration health check failing | Backoff retry | +| `CREDENTIAL_FETCH_FAILED` | Cannot fetch credentials | Retry with backoff | +| `CONNECTION_REFUSED` | Cannot connect to endpoint | Retry with backoff | +| `AUTHENTICATION_FAILED` | Invalid credentials | No retry | +| `RATE_LIMITED` | Too many requests | Retry after delay | + +### Circuit Breaker + +```typescript +interface CircuitBreakerConfig { + failureThreshold: number; // Failures before opening + successThreshold: number; // Successes to close + timeout: number; // Time in open state (ms) +} + +// Default configuration +const defaultCircuitBreaker: CircuitBreakerConfig = { + failureThreshold: 5, + successThreshold: 3, + timeout: 60000 +}; +``` + +## References + +- [Connectors](connectors.md) +- [Webhooks](webhooks.md) +- [CI/CD Integration](ci-cd.md) +- [Integration Hub Module](../modules/integration-hub.md) diff --git a/docs/modules/release-orchestrator/integrations/webhooks.md b/docs/modules/release-orchestrator/integrations/webhooks.md new file mode 100644 index 000000000..23aae6ee2 --- /dev/null +++ b/docs/modules/release-orchestrator/integrations/webhooks.md @@ -0,0 +1,627 @@ +# Webhooks + +## Overview + +Release Orchestrator supports both inbound webhooks (receiving events from external systems) and outbound webhooks (sending events to external systems). 
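Both directions rely on the same shared-secret HMAC scheme: the sender signs the raw request body, and the receiver recomputes the signature and compares it in constant time. A minimal sketch using Node's `crypto` module (function names are illustrative; hex-encoded `hmac-sha256` matches the examples later in this document):

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Sender side: HMAC the exact bytes that go on the wire.
function signPayload(body: string, secret: string): string {
  return createHmac("sha256", secret).update(body).digest("hex");
}

// Receiver side: recompute and compare in constant time.
function verifyPayload(body: string, signature: string, secret: string): boolean {
  const expected = Buffer.from(signPayload(body, secret), "hex");
  const received = Buffer.from(signature, "hex");
  // timingSafeEqual throws on length mismatch, so reject short/long input first
  if (expected.length !== received.length) return false;
  return timingSafeEqual(expected, received);
}
```

Note that signing must happen over the serialized body as sent; re-serializing parsed JSON on the receiving side can reorder keys and invalidate the signature.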
+ +## Inbound Webhooks + +### Webhook Types + +| Type | Source | Triggers | +|------|--------|----------| +| `registry-push` | Container registries | Image push events | +| `ci-pipeline` | CI/CD systems | Pipeline completion | +| `github-app` | GitHub | PR, push, workflow events | +| `gitlab-webhook` | GitLab | Pipeline, push, MR events | +| `generic` | Any system | Custom payloads | + +### Registry Push Webhook + +Receives events when new images are pushed to registries. + +``` +POST /api/v1/webhooks/registry/{integrationId} +Content-Type: application/json + +# Docker Hub +{ + "push_data": { + "tag": "v1.2.0", + "images": ["sha256:abc123..."], + "pushed_at": 1704067200 + }, + "repository": { + "name": "myapp", + "namespace": "myorg", + "repo_url": "https://hub.docker.com/r/myorg/myapp" + } +} + +# Harbor +{ + "type": "PUSH_ARTIFACT", + "occur_at": 1704067200, + "event_data": { + "repository": { + "name": "myapp", + "repo_full_name": "myorg/myapp" + }, + "resources": [{ + "digest": "sha256:abc123...", + "tag": "v1.2.0" + }] + } +} +``` + +### Webhook Handler + +```typescript +interface WebhookHandler { + handleRegistryPush( + integrationId: UUID, + payload: RegistryPushPayload + ): Promise; + + handleCIPipeline( + integrationId: UUID, + payload: CIPipelinePayload + ): Promise; +} + +class RegistryWebhookHandler implements WebhookHandler { + async handleRegistryPush( + integrationId: UUID, + payload: RegistryPushPayload + ): Promise { + // Normalize payload from different registries + const normalized = this.normalizePayload(payload); + + // Find matching component + const component = await this.componentRegistry.findByRepository( + normalized.repository + ); + + if (!component) { + return { + success: true, + action: "ignored", + reason: "No matching component" + }; + } + + // Update version map + await this.versionManager.addVersion({ + componentId: component.id, + tag: normalized.tag, + digest: normalized.digest, + channel: this.determineChannel(normalized.tag) + 
}); + + // Check for auto-release triggers + const triggers = await this.getTriggers(component.id, normalized.tag); + for (const trigger of triggers) { + await this.triggerRelease(trigger, normalized); + } + + return { + success: true, + action: "processed", + componentId: component.id, + versionsAdded: 1, + triggersActivated: triggers.length + }; + } + + private normalizePayload(payload: any): NormalizedPushEvent { + // Detect registry type and normalize + if (payload.push_data) { + // Docker Hub format + return { + repository: `${payload.repository.namespace}/${payload.repository.name}`, + tag: payload.push_data.tag, + digest: payload.push_data.images[0], + pushedAt: new Date(payload.push_data.pushed_at * 1000) + }; + } + + if (payload.type === "PUSH_ARTIFACT") { + // Harbor format + return { + repository: payload.event_data.repository.repo_full_name, + tag: payload.event_data.resources[0].tag, + digest: payload.event_data.resources[0].digest, + pushedAt: new Date(payload.occur_at * 1000) + }; + } + + // Generic format + return payload as NormalizedPushEvent; + } +} +``` + +### Webhook Authentication + +```typescript +interface WebhookAuth { + // Signature validation + validateSignature( + payload: Buffer, + signature: string, + secret: string, + algorithm: SignatureAlgorithm + ): boolean; + + // Token validation + validateToken( + token: string, + expectedToken: string + ): boolean; +} + +type SignatureAlgorithm = "hmac-sha256" | "hmac-sha1"; + +class WebhookAuthenticator implements WebhookAuth { + validateSignature( + payload: Buffer, + signature: string, + secret: string, + algorithm: SignatureAlgorithm + ): boolean { + const algo = algorithm === "hmac-sha256" ? 
"sha256" : "sha1"; + const expected = crypto + .createHmac(algo, secret) + .update(payload) + .digest("hex"); + + // Constant-time comparison + return crypto.timingSafeEqual( + Buffer.from(signature), + Buffer.from(expected) + ); + } +} +``` + +### Webhook Configuration + +```typescript +interface WebhookConfig { + id: UUID; + integrationId: UUID; + type: WebhookType; + + // Security + secretRef: string; // Vault reference for signature secret + signatureHeader?: string; // Header containing signature + signatureAlgorithm?: SignatureAlgorithm; + + // Processing + enabled: boolean; + filters?: WebhookFilter[]; // Filter events + + // Retry + retryPolicy: RetryPolicy; +} + +interface WebhookFilter { + field: string; // JSONPath to field + operator: "equals" | "contains" | "matches"; + value: string; +} + +// Example: Only process tags matching semver +const semverFilter: WebhookFilter = { + field: "$.tag", + operator: "matches", + value: "^v\\d+\\.\\d+\\.\\d+$" +}; +``` + +## Outbound Webhooks + +### Event Types + +| Event | Description | Payload | +|-------|-------------|---------| +| `release.created` | New release created | Release details | +| `promotion.requested` | Promotion requested | Promotion details | +| `promotion.approved` | Promotion approved | Approval details | +| `promotion.rejected` | Promotion rejected | Rejection details | +| `deployment.started` | Deployment started | Job details | +| `deployment.completed` | Deployment completed | Job details, results | +| `deployment.failed` | Deployment failed | Job details, error | +| `rollback.initiated` | Rollback initiated | Rollback details | + +### Webhook Subscription + +```typescript +interface WebhookSubscription { + id: UUID; + tenantId: UUID; + name: string; + + // Target + url: string; + method: "POST" | "PUT"; + headers?: Record; + + // Authentication + authType: "none" | "basic" | "bearer" | "signature"; + credentialRef?: string; + signatureSecret?: string; + + // Events + events: string[]; // 
Event types to subscribe to
+  filters?: EventFilter[];  // Filter events
+
+  // Delivery
+  retryPolicy: RetryPolicy;
+  timeout: number;
+
+  // Status
+  enabled: boolean;
+  lastDelivery?: DateTime;
+  lastStatus?: number;
+}
+
+interface EventFilter {
+  field: string;
+  operator: string;
+  value: any;
+}
+```
+
+### Webhook Delivery
+
+```typescript
+interface WebhookPayload {
+  id: string;         // Delivery ID
+  timestamp: string;  // ISO-8601
+  event: string;      // Event type
+  tenantId: string;
+  data: Record<string, unknown>;  // Event-specific data
+}
+
+class WebhookDeliveryService {
+  async deliver(
+    subscription: WebhookSubscription,
+    event: DomainEvent
+  ): Promise<DeliveryResult> {
+    const payload: WebhookPayload = {
+      id: uuidv4(),
+      timestamp: new Date().toISOString(),
+      event: event.type,
+      tenantId: subscription.tenantId,
+      data: this.buildEventData(event)
+    };
+
+    const headers = this.buildHeaders(subscription, payload);
+    const body = JSON.stringify(payload);
+
+    // Attempt delivery with retries
+    return this.deliverWithRetry(subscription, headers, body);
+  }
+
+  private buildHeaders(
+    subscription: WebhookSubscription,
+    payload: WebhookPayload
+  ): Record<string, string> {
+    const headers: Record<string, string> = {
+      "Content-Type": "application/json",
+      "X-Stella-Event": payload.event,
+      "X-Stella-Delivery": payload.id,
+      "X-Stella-Timestamp": payload.timestamp,
+      ...subscription.headers
+    };
+
+    // Add signature if configured
+    if (subscription.authType === "signature") {
+      const signature = this.computeSignature(
+        JSON.stringify(payload),
+        subscription.signatureSecret!
+ ); + headers["X-Stella-Signature"] = signature; + } + + return headers; + } + + private async deliverWithRetry( + subscription: WebhookSubscription, + headers: Record, + body: string + ): Promise { + const policy = subscription.retryPolicy; + let lastError: Error | undefined; + + for (let attempt = 0; attempt <= policy.maxRetries; attempt++) { + try { + const response = await fetch(subscription.url, { + method: subscription.method, + headers, + body, + signal: AbortSignal.timeout(subscription.timeout) + }); + + // Record delivery + await this.recordDelivery(subscription.id, { + attempt, + statusCode: response.status, + success: response.ok + }); + + if (response.ok) { + return { success: true, statusCode: response.status, attempts: attempt + 1 }; + } + + // Non-retryable status codes + if (response.status >= 400 && response.status < 500) { + return { + success: false, + statusCode: response.status, + attempts: attempt + 1, + error: `Client error: ${response.status}` + }; + } + + lastError = new Error(`Server error: ${response.status}`); + } catch (error) { + lastError = error as Error; + } + + // Wait before retry + if (attempt < policy.maxRetries) { + const delay = this.calculateDelay(policy, attempt); + await sleep(delay); + } + } + + return { + success: false, + attempts: policy.maxRetries + 1, + error: lastError?.message + }; + } +} +``` + +### Delivery Logging + +```typescript +interface WebhookDeliveryLog { + id: UUID; + subscriptionId: UUID; + deliveryId: string; + + // Request + url: string; + method: string; + headers: Record; + body: string; + + // Response + statusCode?: number; + responseBody?: string; + responseTime: number; + + // Result + success: boolean; + attempt: number; + error?: string; + + // Timing + createdAt: DateTime; +} +``` + +## Webhook API + +### Register Subscription + +```http +POST /api/v1/webhook-subscriptions +Content-Type: application/json + +{ + "name": "Deployment Notifications", + "url": 
"https://api.example.com/webhooks/stella", + "method": "POST", + "authType": "signature", + "signatureSecret": "my-secret-key", + "events": [ + "deployment.started", + "deployment.completed", + "deployment.failed" + ], + "filters": [ + { + "field": "data.environment.name", + "operator": "equals", + "value": "production" + } + ], + "retryPolicy": { + "maxRetries": 3, + "backoffType": "exponential", + "backoffSeconds": 10 + }, + "timeout": 30000 +} +``` + +### Test Subscription + +```http +POST /api/v1/webhook-subscriptions/{id}/test +Content-Type: application/json + +{ + "event": "deployment.completed" +} +``` + +Response: +```json +{ + "success": true, + "data": { + "deliveryId": "d1234567-...", + "statusCode": 200, + "responseTime": 245, + "response": "OK" + } +} +``` + +### List Deliveries + +```http +GET /api/v1/webhook-subscriptions/{id}/deliveries?page=1&pageSize=20 +``` + +## Event Payloads + +### deployment.completed + +```json +{ + "id": "delivery-uuid", + "timestamp": "2026-01-09T10:30:00Z", + "event": "deployment.completed", + "tenantId": "tenant-uuid", + "data": { + "deploymentJob": { + "id": "job-uuid", + "status": "completed" + }, + "release": { + "id": "release-uuid", + "name": "myapp-v1.2.0", + "components": [ + { + "name": "api", + "digest": "sha256:abc123..." 
+ } + ] + }, + "environment": { + "id": "env-uuid", + "name": "production" + }, + "promotion": { + "id": "promo-uuid", + "requestedBy": "user@example.com" + }, + "targets": [ + { + "id": "target-uuid", + "name": "prod-host-1", + "status": "succeeded" + } + ], + "timing": { + "startedAt": "2026-01-09T10:25:00Z", + "completedAt": "2026-01-09T10:30:00Z", + "durationSeconds": 300 + } + } +} +``` + +### promotion.requested + +```json +{ + "id": "delivery-uuid", + "timestamp": "2026-01-09T10:00:00Z", + "event": "promotion.requested", + "tenantId": "tenant-uuid", + "data": { + "promotion": { + "id": "promo-uuid", + "status": "pending_approval" + }, + "release": { + "id": "release-uuid", + "name": "myapp-v1.2.0" + }, + "sourceEnvironment": { + "id": "staging-uuid", + "name": "staging" + }, + "targetEnvironment": { + "id": "prod-uuid", + "name": "production" + }, + "requestedBy": { + "id": "user-uuid", + "email": "user@example.com", + "name": "John Doe" + }, + "approvalRequired": { + "count": 2, + "currentApprovals": 0 + } + } +} +``` + +## Security Considerations + +### Signature Verification + +Receivers should verify webhook signatures: + +```python +import hmac +import hashlib + +def verify_signature(payload: bytes, signature: str, secret: str) -> bool: + expected = hmac.new( + secret.encode(), + payload, + hashlib.sha256 + ).hexdigest() + + return hmac.compare_digest(signature, expected) + +# In webhook handler +@app.route("/webhooks/stella", methods=["POST"]) +def handle_webhook(): + signature = request.headers.get("X-Stella-Signature") + if not verify_signature(request.data, signature, WEBHOOK_SECRET): + return "Invalid signature", 401 + + payload = request.json + # Process event... 
+``` + +### IP Allowlisting + +Configure firewall rules to only accept webhooks from Stella IP ranges: +- Document IP ranges in deployment configuration +- Use VPN or private networking where possible + +### Replay Protection + +Check delivery timestamps to prevent replay attacks: + +```python +from datetime import datetime, timedelta + +MAX_TIMESTAMP_AGE = timedelta(minutes=5) + +def check_timestamp(timestamp_str: str) -> bool: + timestamp = datetime.fromisoformat(timestamp_str.replace("Z", "+00:00")) + now = datetime.now(timestamp.tzinfo) + return abs(now - timestamp) < MAX_TIMESTAMP_AGE +``` + +## References + +- [Integrations Overview](overview.md) +- [Connectors](connectors.md) +- [CI/CD Integration](ci-cd.md) diff --git a/docs/modules/release-orchestrator/modules/agents.md b/docs/modules/release-orchestrator/modules/agents.md new file mode 100644 index 000000000..5b07ab936 --- /dev/null +++ b/docs/modules/release-orchestrator/modules/agents.md @@ -0,0 +1,597 @@ +# AGENTS: Deployment Agents + +**Purpose**: Lightweight deployment agents for target execution. + +## Agent Types + +| Agent Type | Transport | Target Types | +|------------|-----------|--------------| +| `agent-docker` | gRPC | Docker hosts | +| `agent-compose` | gRPC | Docker Compose hosts | +| `agent-ssh` | SSH | Linux remote hosts | +| `agent-winrm` | WinRM | Windows remote hosts | +| `agent-ecs` | AWS API | AWS ECS services | +| `agent-nomad` | Nomad API | HashiCorp Nomad jobs | + +## Modules + +### Module: `agent-core` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Shared agent runtime; task execution framework | +| **Protocol** | gRPC for communication with Stella Core | +| **Security** | mTLS authentication; short-lived JWT for tasks | + +**Agent Lifecycle**: +1. Agent starts with registration token +2. Agent registers with capabilities and labels +3. Agent sends heartbeats (default: 30s interval) +4. Agent receives tasks from Stella Core +5. 
Agent reports task completion/failure + +**Agent Task Protocol**: +```typescript +// Task assignment (Core → Agent) +interface AgentTask { + id: UUID; + type: TaskType; + targetId: UUID; + payload: TaskPayload; + credentials: EncryptedCredentials; + timeout: number; + priority: TaskPriority; + idempotencyKey: string; + assignedAt: DateTime; + expiresAt: DateTime; +} + +type TaskType = + | "deploy" + | "rollback" + | "health-check" + | "inspect" + | "execute-command" + | "upload-files" + | "write-sticker" + | "read-sticker"; + +interface DeployTaskPayload { + image: string; + digest: string; + config: DeployConfig; + artifacts: ArtifactReference[]; + previousDigest?: string; + hooks: { + preDeploy?: HookConfig; + postDeploy?: HookConfig; + }; +} + +// Task result (Agent → Core) +interface TaskResult { + taskId: UUID; + success: boolean; + startedAt: DateTime; + completedAt: DateTime; + + // Success details + outputs?: Record; + artifacts?: ArtifactReference[]; + + // Failure details + error?: string; + errorType?: string; + retriable?: boolean; + + // Logs + logs: string; + + // Metrics + metrics: { + pullDurationMs?: number; + deployDurationMs?: number; + healthCheckDurationMs?: number; + }; +} +``` + +--- + +### Module: `agent-docker` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Docker container deployment | +| **Dependencies** | Docker Engine API | +| **Capabilities** | `docker.deploy`, `docker.rollback`, `docker.inspect` | + +**Docker Agent Implementation**: +```typescript +class DockerAgent implements TargetExecutor { + private docker: Docker; + + async deploy(task: DeployTaskPayload): Promise { + const { image, digest, config, previousDigest } = task; + const containerName = config.containerName; + + // 1. 
Pull image and verify digest + this.log(`Pulling image ${image}@${digest}`); + await this.docker.pull(image, { digest }); + + const pulledDigest = await this.getImageDigest(image); + if (pulledDigest !== digest) { + throw new DigestMismatchError( + `Expected digest ${digest}, got ${pulledDigest}. Possible tampering detected.` + ); + } + + // 2. Run pre-deploy hook + if (task.hooks?.preDeploy) { + await this.runHook(task.hooks.preDeploy, "pre-deploy"); + } + + // 3. Stop and rename existing container + const existingContainer = await this.findContainer(containerName); + if (existingContainer) { + this.log(`Stopping existing container ${containerName}`); + await existingContainer.stop({ t: 10 }); + await existingContainer.rename(`${containerName}-previous-${Date.now()}`); + } + + // 4. Create new container + this.log(`Creating container ${containerName} from ${image}@${digest}`); + const container = await this.docker.createContainer({ + name: containerName, + Image: `${image}@${digest}`, // Always use digest, not tag + Env: this.buildEnvVars(config.environment), + HostConfig: { + PortBindings: this.buildPortBindings(config.ports), + Binds: this.buildBindMounts(config.volumes), + RestartPolicy: { Name: config.restartPolicy || "unless-stopped" }, + Memory: config.memoryLimit, + CpuQuota: config.cpuLimit, + }, + Labels: { + "stella.release.id": config.releaseId, + "stella.release.name": config.releaseName, + "stella.digest": digest, + "stella.deployed.at": new Date().toISOString(), + }, + }); + + // 5. Start container + this.log(`Starting container ${containerName}`); + await container.start(); + + // 6. 
Wait for container to be healthy
+    if (config.healthCheck) {
+      this.log(`Waiting for container health check`);
+      const healthy = await this.waitForHealthy(container, config.healthCheck.timeout);
+      if (!healthy) {
+        await this.rollbackContainer(containerName, existingContainer);
+        throw new HealthCheckFailedError(`Container ${containerName} failed health check`);
+      }
+    }
+
+    // 7. Run post-deploy hook
+    if (task.hooks?.postDeploy) {
+      await this.runHook(task.hooks.postDeploy, "post-deploy");
+    }
+
+    // 8. Cleanup previous container
+    if (existingContainer && config.cleanupPrevious !== false) {
+      this.log(`Removing previous container`);
+      await existingContainer.remove({ force: true });
+    }
+
+    return {
+      success: true,
+      containerId: container.id,
+      previousDigest: previousDigest,
+    };
+  }
+
+  async rollback(task: RollbackTaskPayload): Promise<DeployResult> {
+    const { containerName, targetDigest } = task;
+
+    if (targetDigest) {
+      // Deploy specific digest
+      return this.deploy({ ...task, digest: targetDigest });
+    }
+
+    // Find and restore previous container
+    const previousContainer = await this.findContainer(`${containerName}-previous-*`);
+    if (!previousContainer) {
+      throw new RollbackError(`No previous container found for ${containerName}`);
+    }
+
+    const currentContainer = await this.findContainer(containerName);
+    if (currentContainer) {
+      await currentContainer.stop({ t: 10 });
+      await currentContainer.rename(`${containerName}-failed-${Date.now()}`);
+    }
+
+    await previousContainer.rename(containerName);
+    await previousContainer.start();
+
+    return { success: true, containerId: previousContainer.id };
+  }
+
+  async writeSticker(sticker: VersionSticker): Promise<void> {
+    const stickerPath = this.config.stickerPath || "/var/stella/version.json";
+    const stickerContent = JSON.stringify(sticker, null, 2);
+
+    if (this.config.stickerLocation === "volume") {
+      await this.docker.run("alpine", [
+        "sh", "-c",
+        `echo '${stickerContent}' > ${stickerPath}`
+      ], {
+        HostConfig: {
Binds: [`${this.config.stickerVolume}:/var/stella`] }
+      });
+    } else {
+      fs.writeFileSync(stickerPath, stickerContent);
+    }
+  }
+}
+```
+
+---
+
+### Module: `agent-compose`
+
+| Aspect | Specification |
+|--------|---------------|
+| **Responsibility** | Docker Compose stack deployment |
+| **Dependencies** | Docker Compose CLI |
+| **Capabilities** | `compose.deploy`, `compose.rollback`, `compose.inspect` |
+
+**Compose Agent Implementation**:
+```typescript
+class ComposeAgent implements TargetExecutor {
+  async deploy(task: DeployTaskPayload): Promise<DeployResult> {
+    const { artifacts, config } = task;
+    const deployDir = config.deploymentDirectory;
+
+    // 1. Write compose lock file
+    const composeLock = artifacts.find(a => a.type === "compose_lock");
+    const composeContent = await this.fetchArtifact(composeLock);
+    const composePath = path.join(deployDir, "compose.stella.lock.yml");
+    await fs.writeFile(composePath, composeContent);
+
+    // 2. Run pre-deploy hook
+    if (task.hooks?.preDeploy) {
+      await this.runHook(task.hooks.preDeploy, deployDir);
+    }
+
+    // 3. Pull images
+    this.log("Pulling images...");
+    await this.runCompose(deployDir, ["pull"]);
+
+    // 4. Verify digests
+    await this.verifyDigests(composePath, config.expectedDigests);
+
+    // 5. Deploy
+    this.log("Deploying services...");
+    await this.runCompose(deployDir, ["up", "-d", "--remove-orphans", "--force-recreate"]);
+
+    // 6. Wait for services to be healthy
+    if (config.healthCheck) {
+      const healthy = await this.waitForServicesHealthy(deployDir, config.healthCheck.timeout);
+      if (!healthy) {
+        await this.rollbackToBackup(deployDir);
+        throw new HealthCheckFailedError("Services failed health check");
+      }
+    }
+
+    // 7. Run post-deploy hook
+    if (task.hooks?.postDeploy) {
+      await this.runHook(task.hooks.postDeploy, deployDir);
+    }
+
+    // 8.
Write version sticker
+    await this.writeSticker(config.sticker, deployDir);
+
+    return { success: true };
+  }
+
+  private async verifyDigests(
+    composePath: string,
+    expectedDigests: Record<string, string>
+  ): Promise<void> {
+    const composeContent = yaml.parse(await fs.readFile(composePath, "utf-8"));
+
+    for (const [service, expectedDigest] of Object.entries(expectedDigests)) {
+      const serviceConfig = composeContent.services[service];
+      if (!serviceConfig) {
+        throw new Error(`Service ${service} not found in compose file`);
+      }
+
+      const image = serviceConfig.image;
+      if (!image.includes("@sha256:")) {
+        throw new Error(`Service ${service} image not pinned to digest: ${image}`);
+      }
+
+      const actualDigest = image.split("@")[1];
+      if (actualDigest !== expectedDigest) {
+        throw new DigestMismatchError(
+          `Service ${service}: expected ${expectedDigest}, got ${actualDigest}`
+        );
+      }
+    }
+  }
+}
+```
+
+---
+
+### Module: `agent-ssh`
+
+| Aspect | Specification |
+|--------|---------------|
+| **Responsibility** | SSH remote execution (agentless) |
+| **Dependencies** | SSH client library |
+| **Capabilities** | `ssh.deploy`, `ssh.execute`, `ssh.upload` |
+
+**SSH Remote Executor**:
+```typescript
+class SSHRemoteExecutor implements TargetExecutor {
+  async connect(config: SSHConnectionConfig): Promise<void> {
+    const privateKey = await this.secrets.getSecret(config.privateKeyRef);
+
+    this.ssh = new SSHClient();
+    await this.ssh.connect({
+      host: config.host,
+      port: config.port || 22,
+      username: config.username,
+      privateKey: privateKey.value,
+      readyTimeout: config.connectionTimeout || 30000,
+    });
+  }
+
+  async deploy(task: DeployTaskPayload): Promise<DeployResult> {
+    const { artifacts, config } = task;
+    const deployDir = config.deploymentDirectory;
+
+    try {
+      // 1. Ensure deployment directory exists
+      await this.exec(`mkdir -p ${deployDir}`);
+      await this.exec(`mkdir -p ${deployDir}/.stella-backup`);
+
+      // 2.
Backup current deployment + await this.exec(`cp -r ${deployDir}/* ${deployDir}/.stella-backup/ 2>/dev/null || true`); + + // 3. Upload artifacts + for (const artifact of artifacts) { + const content = await this.fetchArtifact(artifact); + const remotePath = path.join(deployDir, artifact.name); + await this.uploadFile(content, remotePath); + } + + // 4. Run pre-deploy hook + if (task.hooks?.preDeploy) { + await this.runRemoteHook(task.hooks.preDeploy, deployDir); + } + + // 5. Execute deployment script + const deployScript = artifacts.find(a => a.type === "deploy_script"); + if (deployScript) { + const scriptPath = path.join(deployDir, deployScript.name); + await this.exec(`chmod +x ${scriptPath}`); + const result = await this.exec(scriptPath, { cwd: deployDir, timeout: config.deploymentTimeout }); + if (result.exitCode !== 0) { + throw new DeploymentError(`Deploy script failed: ${result.stderr}`); + } + } + + // 6. Run post-deploy hook + if (task.hooks?.postDeploy) { + await this.runRemoteHook(task.hooks.postDeploy, deployDir); + } + + // 7. Health check + if (config.healthCheck) { + const healthy = await this.runHealthCheck(config.healthCheck); + if (!healthy) { + await this.rollback(task); + throw new HealthCheckFailedError("Health check failed"); + } + } + + // 8. Write version sticker + await this.writeSticker(config.sticker, deployDir); + + // 9. 
Cleanup backup + await this.exec(`rm -rf ${deployDir}/.stella-backup`); + + return { success: true }; + } finally { + this.ssh.end(); + } + } +} +``` + +--- + +### Module: `agent-winrm` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | WinRM remote execution (agentless) | +| **Dependencies** | WinRM client library | +| **Capabilities** | `winrm.deploy`, `winrm.execute`, `winrm.upload` | +| **Authentication** | NTLM, Kerberos, Basic | + +--- + +### Module: `agent-ecs` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | AWS ECS service deployment | +| **Dependencies** | AWS SDK | +| **Capabilities** | `ecs.deploy`, `ecs.rollback`, `ecs.inspect` | + +--- + +### Module: `agent-nomad` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | HashiCorp Nomad job deployment | +| **Dependencies** | Nomad API client | +| **Capabilities** | `nomad.deploy`, `nomad.rollback`, `nomad.inspect` | + +--- + +## Agent Security Model + +### Registration Flow + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ AGENT REGISTRATION FLOW │ +│ │ +│ 1. Admin generates registration token (one-time use) │ +│ POST /api/v1/admin/agent-tokens │ +│ → { token: "reg_xxx", expiresAt: "..." } │ +│ │ +│ 2. Agent starts with registration token │ +│ ./stella-agent --register --token=reg_xxx │ +│ │ +│ 3. Agent requests mTLS certificate │ +│ POST /api/v1/agents/register │ +│ Headers: X-Registration-Token: reg_xxx │ +│ Body: { name, version, capabilities, csr } │ +│ → { agentId, certificate, caCertificate } │ +│ │ +│ 4. Agent establishes mTLS connection │ +│ Uses issued certificate for all subsequent requests │ +│ │ +│ 5. Agent requests short-lived JWT for task execution │ +│ POST /api/v1/agents/token (over mTLS) │ +│ → { token, expiresIn: 3600 } // 1 hour │ +│ │ +│ 6. 
Agent refreshes token before expiration │ +│ Token refresh only over mTLS connection │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +### Communication Security + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ AGENT COMMUNICATION SECURITY │ +│ │ +│ ┌──────────────┐ ┌──────────────┐ │ +│ │ AGENT │ │ STELLA CORE │ │ +│ └──────┬───────┘ └──────┬───────┘ │ +│ │ │ │ +│ │ mTLS (mutual TLS) │ │ +│ │ - Agent cert signed by Stella CA │ │ +│ │ - Server cert verified by Agent │ │ +│ │ - TLS 1.3 only │ │ +│ │ - Perfect forward secrecy │ │ +│ │◄───────────────────────────────────────►│ │ +│ │ │ │ +│ │ Encrypted payload │ │ +│ │ - Task payloads encrypted with │ │ +│ │ agent-specific key │ │ +│ │ - Logs encrypted in transit │ │ +│ │◄───────────────────────────────────────►│ │ +│ │ │ │ +│ │ Heartbeat + capability refresh │ │ +│ │ - Every 30 seconds │ │ +│ │ - Signed with agent key │ │ +│ │─────────────────────────────────────────►│ │ +│ │ │ │ +│ │ Task assignment │ │ +│ │ - Contains short-lived credentials │ │ +│ │ - Scoped to specific target │ │ +│ │ - Expires after task timeout │ │ +│ │◄─────────────────────────────────────────│ │ +│ │ │ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +--- + +## Database Schema + +```sql +-- Agents +CREATE TABLE release.agents ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + name VARCHAR(255) NOT NULL, + version VARCHAR(50) NOT NULL, + capabilities JSONB NOT NULL DEFAULT '[]', + labels JSONB NOT NULL DEFAULT '{}', + status VARCHAR(50) NOT NULL DEFAULT 'offline' CHECK (status IN ( + 'online', 'offline', 'degraded' + )), + last_heartbeat TIMESTAMPTZ, + resource_usage JSONB, + certificate_fingerprint VARCHAR(64), + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + UNIQUE (tenant_id, name) +); + +CREATE 
INDEX idx_agents_tenant ON release.agents(tenant_id); +CREATE INDEX idx_agents_status ON release.agents(status); +CREATE INDEX idx_agents_capabilities ON release.agents USING GIN (capabilities); +``` + +--- + +## API Endpoints + +```yaml +# Agent Registration +POST /api/v1/agents/register + Headers: X-Registration-Token: {token} + Body: { name, version, capabilities, csr } + Response: { agentId, certificate, caCertificate } + +# Agent Management +GET /api/v1/agents + Query: ?status={online|offline|degraded}&capability={type} + Response: Agent[] + +GET /api/v1/agents/{id} + Response: Agent + +PUT /api/v1/agents/{id} + Body: { labels?, capabilities? } + Response: Agent + +DELETE /api/v1/agents/{id} + Response: { deleted: true } + +# Agent Communication +POST /api/v1/agents/{id}/heartbeat + Body: { status, resourceUsage, capabilities } + Response: { tasks: AgentTask[] } + +POST /api/v1/agents/{id}/tasks/{taskId}/complete + Body: { success, result, logs } + Response: { acknowledged: true } + +# WebSocket for real-time task stream +WS /api/v1/agents/{id}/task-stream + Messages: + - { type: "task_assigned", task: AgentTask } + - { type: "task_cancelled", taskId } +``` + +--- + +## References + +- [Module Overview](overview.md) +- [Deploy Orchestrator](deploy-orchestrator.md) +- [Agent Security](../security/agent-security.md) +- [API Documentation](../api/agents.md) diff --git a/docs/modules/release-orchestrator/modules/deploy-orchestrator.md b/docs/modules/release-orchestrator/modules/deploy-orchestrator.md new file mode 100644 index 000000000..4d67023df --- /dev/null +++ b/docs/modules/release-orchestrator/modules/deploy-orchestrator.md @@ -0,0 +1,477 @@ +# DEPLOY: Deployment Execution + +**Purpose**: Orchestrate deployment jobs, execute on targets, manage rollbacks, and generate artifacts. 
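The deployment lifecycle this document specifies is driven by a small status state machine. As an illustrative sketch only (the transition table is an assumption derived from the `DeploymentStatus` values defined below, not the shipped implementation):

```python
# Hypothetical sketch: legal DeploymentStatus transitions.
# The exact edge set is an assumption inferred from this spec.
TRANSITIONS: dict[str, set[str]] = {
    "pending": {"running", "cancelled"},
    "running": {"succeeded", "failed", "cancelled"},
    "failed": {"rolling_back"},
    "rolling_back": {"rolled_back"},
}

TERMINAL = {"succeeded", "cancelled", "rolled_back"}


def can_transition(current: str, target: str) -> bool:
    """Return True if a deployment job may move from `current` to `target`."""
    return target in TRANSITIONS.get(current, set())
```

A guard like this keeps job updates monotonic: terminal states have no outgoing edges, so a rolled-back job can never silently re-enter `running`.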
+ +## Modules + +### Module: `deploy-orchestrator` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Deployment job coordination; strategy execution | +| **Dependencies** | `target-executor`, `artifact-generator`, `agent-manager` | +| **Data Entities** | `DeploymentJob`, `DeploymentTask` | +| **Events Produced** | `deployment.started`, `deployment.task_started`, `deployment.task_completed`, `deployment.completed`, `deployment.failed` | + +**Deployment Job Entity**: +```typescript +interface DeploymentJob { + id: UUID; + tenantId: UUID; + promotionId: UUID; + releaseId: UUID; + environmentId: UUID; + status: DeploymentStatus; + strategy: DeploymentStrategy; + startedAt: DateTime; + completedAt: DateTime; + artifacts: GeneratedArtifact[]; + rollbackOf: UUID | null; // If this is a rollback job + tasks: DeploymentTask[]; +} + +type DeploymentStatus = + | "pending" // Waiting to start + | "running" // Deployment in progress + | "succeeded" // All tasks succeeded + | "failed" // One or more tasks failed + | "cancelled" // User cancelled + | "rolling_back" // Rollback in progress + | "rolled_back"; // Rollback complete + +interface DeploymentTask { + id: UUID; + jobId: UUID; + targetId: UUID; + digest: string; + status: TaskStatus; + agentId: UUID | null; + startedAt: DateTime; + completedAt: DateTime; + exitCode: number | null; + logs: string; + previousDigest: string | null; + stickerWritten: boolean; +} + +type TaskStatus = + | "pending" + | "running" + | "succeeded" + | "failed" + | "cancelled" + | "skipped"; +``` + +--- + +### Module: `target-executor` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Target-specific deployment logic | +| **Dependencies** | `agent-manager`, `connector-runtime` | +| **Protocol** | gRPC for agents, SSH/WinRM for agentless | + +**Executor Types**: + +| Type | Transport | Use Case | +|------|-----------|----------| +| `agent-docker` | gRPC | Docker hosts with agent | +| 
`agent-compose` | gRPC | Compose hosts with agent |
+| `ssh-remote` | SSH | Agentless Linux hosts |
+| `winrm-remote` | WinRM | Agentless Windows hosts |
+| `ecs-api` | AWS API | AWS ECS services |
+| `nomad-api` | Nomad API | HashiCorp Nomad jobs |
+
+---
+
+### Module: `runner-executor`
+
+| Aspect | Specification |
+|--------|---------------|
+| **Responsibility** | Script/hook execution in sandbox |
+| **Dependencies** | `plugin-sandbox` |
+| **Supported Scripts** | C# (.csx), Bash, PowerShell |
+
+**Hook Types**:
+- `pre-deploy`: Run before deployment starts
+- `post-deploy`: Run after deployment succeeds
+- `on-failure`: Run when deployment fails
+- `on-rollback`: Run during rollback
+
+---
+
+### Module: `artifact-generator`
+
+| Aspect | Specification |
+|--------|---------------|
+| **Responsibility** | Generate immutable deployment artifacts |
+| **Dependencies** | `release-manager`, `environment-manager` |
+| **Data Entities** | `GeneratedArtifact`, `ComposeLock`, `VersionSticker` |
+
+**Generated Artifacts**:
+
+| Artifact Type | Description |
+|---------------|-------------|
+| `compose_lock` | `compose.stella.lock.yml` - Pinned digests |
+| `script` | Compiled deployment script |
+| `sticker` | `stella.version.json` - Version marker |
+| `evidence` | Decision and execution evidence |
+| `config` | Environment-specific config files |
+
+**Compose Lock File Generation**:
+```typescript
+class ComposeLockGenerator {
+  async generate(
+    release: Release,
+    environment: Environment,
+    targets: Target[]
+  ): Promise<GeneratedArtifact> {
+
+    const services: Record<string, unknown> = {};
+
+    for (const component of release.components) {
+      services[component.componentName] = {
+        // CRITICAL: Always use digest, never tag
+        image: `${component.imageRepository}@${component.digest}`,
+
+        // Environment variables
+        environment: this.mergeEnvironment(
+          environment.config.variables,
+          this.buildStellaEnv(release, environment)
+        ),
+
+        // Labels for Stella tracking
+        labels: {
+          "stella.release.id":
release.id, + "stella.release.name": release.name, + "stella.component.name": component.componentName, + "stella.component.digest": component.digest, + "stella.environment": environment.name, + "stella.deployed.at": new Date().toISOString(), + }, + }; + } + + const composeLock = { + version: "3.8", + services, + "x-stella": { + release_id: release.id, + release_name: release.name, + environment: environment.name, + generated_at: new Date().toISOString(), + inputs_hash: this.computeInputsHash(release, environment), + components: release.components.map(c => ({ + name: c.componentName, + digest: c.digest, + semver: c.semver, + })), + }, + }; + + const content = yaml.stringify(composeLock); + const hash = crypto.createHash("sha256").update(content).digest("hex"); + + return { + type: "compose_lock", + name: "compose.stella.lock.yml", + content: Buffer.from(content), + contentHash: `sha256:${hash}`, + }; + } +} +``` + +**Version Sticker Generation**: +```typescript +interface VersionSticker { + stella_version: "1.0"; + release_id: UUID; + release_name: string; + components: Array<{ + name: string; + digest: string; + semver: string; + tag: string; + image_repository: string; + }>; + environment: string; + environment_id: UUID; + deployed_at: string; + deployed_by: UUID; + promotion_id: UUID; + workflow_run_id: UUID; + evidence_packet_id: UUID; + evidence_packet_hash: string; + orchestrator_version: string; + source_ref?: { + commit_sha: string; + branch: string; + repository: string; + }; +} +``` + +--- + +### Module: `rollback-manager` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Rollback orchestration; previous state recovery | +| **Dependencies** | `deploy-orchestrator`, `target-registry` | + +**Rollback Strategies**: + +| Strategy | Description | +|----------|-------------| +| `to-previous` | Roll back to last successful deployment | +| `to-release` | Roll back to specific release ID | +| `to-sticker` | Roll back to version in 
sticker on target | + +**Rollback Flow**: +1. Identify rollback target (previous release or specified) +2. Create rollback deployment job +3. Execute deployment with rollback artifacts +4. Update target state and sticker +5. Record rollback evidence + +--- + +## Deployment Strategies + +### All-at-Once +Deploy to all targets simultaneously. + +```typescript +interface AllAtOnceConfig { + parallelism: number; // Max concurrent deployments (0 = unlimited) + continueOnFailure: boolean; // Continue if some targets fail + failureThreshold: number; // Max failures before abort +} +``` + +### Rolling +Deploy to targets sequentially with health checks. + +```typescript +interface RollingConfig { + batchSize: number; // Targets per batch + batchDelay: number; // Seconds between batches + healthCheckBetweenBatches: boolean; + rollbackOnFailure: boolean; + maxUnavailable: number; // Max targets unavailable at once +} +``` + +### Canary +Deploy to subset, verify, then proceed. + +```typescript +interface CanaryConfig { + canaryTargets: number; // Number or percentage for canary + canaryDuration: number; // Seconds to run canary + healthThreshold: number; // Required health percentage + autoPromote: boolean; // Auto-proceed if healthy + requireApproval: boolean; // Require manual approval +} +``` + +### Blue-Green +Deploy to B, switch traffic, retire A. 
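The gradual variant of this switch is a loop over weight steps with a health gate at each step. A minimal sketch under assumed `set_weights`/`check_health` callbacks (both hypothetical, not part of this spec's API):

```python
def shift_traffic(steps, set_weights, check_health):
    """Shift traffic to the green group step by step, reverting
    all traffic to blue if any health gate fails.

    steps: percentages of traffic for green, e.g. [10, 25, 50, 100]
    """
    for green_pct in steps:
        set_weights(green=green_pct, blue=100 - green_pct)
        if not check_health():
            set_weights(green=0, blue=100)  # instant revert to blue
            return False
    return True  # green serves 100%; blue can be retired
```

With `gradualShiftSteps: [10, 25, 50, 100]` this walks green from 10% to full traffic, and a single failed health check sends everything back to blue.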
+ +```typescript +interface BlueGreenConfig { + targetGroupA: UUID; // Current (blue) target group + targetGroupB: UUID; // New (green) target group + trafficShiftType: "instant" | "gradual"; + gradualShiftSteps?: number[]; // e.g., [10, 25, 50, 100] + rollbackOnHealthFailure: boolean; +} +``` + +--- + +## Rolling Deployment Algorithm + +```python +class RollingDeploymentExecutor: + def execute(self, job: DeploymentJob, config: RollingConfig) -> DeploymentResult: + targets = self.get_targets(job.environment_id) + batches = self.create_batches(targets, config.batch_size) + + deployed_targets = [] + failed_targets = [] + + for batch_index, batch in enumerate(batches): + self.log(f"Starting batch {batch_index + 1} of {len(batches)}") + + # Deploy batch in parallel + batch_results = self.deploy_batch(job, batch) + + for target, result in batch_results: + if result.success: + deployed_targets.append(target) + # Write version sticker + self.write_sticker(target, job.release) + else: + failed_targets.append(target) + + if config.rollback_on_failure: + # Rollback all deployed targets + self.rollback_targets(deployed_targets, job.previous_release) + return DeploymentResult( + success=False, + error=f"Batch {batch_index + 1} failed, rolled back", + deployed=deployed_targets, + failed=failed_targets, + rolled_back=deployed_targets + ) + + # Health check between batches + if config.health_check_between_batches and batch_index < len(batches) - 1: + health_result = self.check_batch_health(deployed_targets[-len(batch):]) + + if not health_result.healthy: + if config.rollback_on_failure: + self.rollback_targets(deployed_targets, job.previous_release) + return DeploymentResult( + success=False, + error=f"Health check failed after batch {batch_index + 1}", + deployed=deployed_targets, + failed=failed_targets, + rolled_back=deployed_targets + ) + + # Delay between batches + if config.batch_delay > 0 and batch_index < len(batches) - 1: + time.sleep(config.batch_delay) + + return 
DeploymentResult( + success=len(failed_targets) == 0, + deployed=deployed_targets, + failed=failed_targets + ) +``` + +--- + +## Database Schema + +```sql +-- Deployment Jobs +CREATE TABLE release.deployment_jobs ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + promotion_id UUID NOT NULL REFERENCES release.promotions(id), + release_id UUID NOT NULL REFERENCES release.releases(id), + environment_id UUID NOT NULL REFERENCES release.environments(id), + status VARCHAR(50) NOT NULL DEFAULT 'pending' CHECK (status IN ( + 'pending', 'running', 'succeeded', 'failed', 'cancelled', 'rolling_back', 'rolled_back' + )), + strategy VARCHAR(50) NOT NULL DEFAULT 'all-at-once', + started_at TIMESTAMPTZ, + completed_at TIMESTAMPTZ, + artifacts JSONB NOT NULL DEFAULT '[]', + rollback_of UUID REFERENCES release.deployment_jobs(id), + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW() +); + +CREATE INDEX idx_deployment_jobs_promotion ON release.deployment_jobs(promotion_id); +CREATE INDEX idx_deployment_jobs_status ON release.deployment_jobs(status); + +-- Deployment Tasks +CREATE TABLE release.deployment_tasks ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + job_id UUID NOT NULL REFERENCES release.deployment_jobs(id) ON DELETE CASCADE, + target_id UUID NOT NULL REFERENCES release.targets(id), + digest VARCHAR(100) NOT NULL, + status VARCHAR(50) NOT NULL DEFAULT 'pending' CHECK (status IN ( + 'pending', 'running', 'succeeded', 'failed', 'cancelled', 'skipped' + )), + agent_id UUID REFERENCES release.agents(id), + started_at TIMESTAMPTZ, + completed_at TIMESTAMPTZ, + exit_code INTEGER, + logs TEXT, + previous_digest VARCHAR(100), + sticker_written BOOLEAN NOT NULL DEFAULT FALSE, + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW() +); + +CREATE INDEX idx_deployment_tasks_job ON release.deployment_tasks(job_id); +CREATE INDEX idx_deployment_tasks_target ON release.deployment_tasks(target_id); +CREATE INDEX 
idx_deployment_tasks_status ON release.deployment_tasks(status); + +-- Generated Artifacts +CREATE TABLE release.generated_artifacts ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + deployment_job_id UUID REFERENCES release.deployment_jobs(id) ON DELETE CASCADE, + artifact_type VARCHAR(50) NOT NULL CHECK (artifact_type IN ( + 'compose_lock', 'script', 'sticker', 'evidence', 'config' + )), + name VARCHAR(255) NOT NULL, + content_hash VARCHAR(100) NOT NULL, + content BYTEA, -- for small artifacts + storage_ref VARCHAR(500), -- for large artifacts (S3, etc.) + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW() +); + +CREATE INDEX idx_generated_artifacts_job ON release.generated_artifacts(deployment_job_id); +``` + +--- + +## API Endpoints + +```yaml +# Deployment Jobs (mostly read-only; created by promotions) +GET /api/v1/deployment-jobs + Query: ?promotionId={uuid}&status={status}&environmentId={uuid} + Response: DeploymentJob[] + +GET /api/v1/deployment-jobs/{id} + Response: DeploymentJob (with tasks) + +GET /api/v1/deployment-jobs/{id}/tasks + Response: DeploymentTask[] + +GET /api/v1/deployment-jobs/{id}/tasks/{taskId} + Response: DeploymentTask (with logs) + +GET /api/v1/deployment-jobs/{id}/tasks/{taskId}/logs + Query: ?follow=true + Response: string | SSE stream + +GET /api/v1/deployment-jobs/{id}/artifacts + Response: GeneratedArtifact[] + +GET /api/v1/deployment-jobs/{id}/artifacts/{artifactId} + Response: binary (download) + +# Rollback +POST /api/v1/rollbacks + Body: { + environmentId: UUID, + strategy: "to-previous" | "to-release" | "to-sticker", + targetReleaseId?: UUID # for to-release strategy + } + Response: DeploymentJob (rollback job) + +GET /api/v1/rollbacks + Query: ?environmentId={uuid} + Response: DeploymentJob[] (rollback jobs only) +``` + +--- + +## References + +- [Module Overview](overview.md) +- [Agents Specification](agents.md) +- [Deployment 
Strategies](../deployment/strategies.md) +- [Artifact Generation](../deployment/artifacts.md) +- [API Documentation](../api/deployments.md) diff --git a/docs/modules/release-orchestrator/modules/environment-manager.md b/docs/modules/release-orchestrator/modules/environment-manager.md new file mode 100644 index 000000000..3b5b70a3c --- /dev/null +++ b/docs/modules/release-orchestrator/modules/environment-manager.md @@ -0,0 +1,418 @@ +# ENVMGR: Environment & Inventory Manager + +**Purpose**: Model environments, targets, agents, and their relationships. + +## Modules + +### Module: `environment-manager` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Environment CRUD, ordering, configuration, freeze windows | +| **Dependencies** | `authority` | +| **Data Entities** | `Environment`, `EnvironmentConfig`, `FreezeWindow` | +| **Events Produced** | `environment.created`, `environment.updated`, `environment.freeze_started`, `environment.freeze_ended` | + +**Key Operations**: +``` +CreateEnvironment(name, displayName, orderIndex, config) → Environment +UpdateEnvironment(id, config) → Environment +DeleteEnvironment(id) → void +SetFreezeWindow(environmentId, start, end, reason, exceptions) → FreezeWindow +ClearFreezeWindow(environmentId, windowId) → void +ListEnvironments(tenantId) → Environment[] +GetEnvironmentState(id) → EnvironmentState +``` + +**Environment Entity**: +```typescript +interface Environment { + id: UUID; + tenantId: UUID; + name: string; // "dev", "stage", "prod" + displayName: string; // "Development" + orderIndex: number; // 0, 1, 2 for promotion order + config: EnvironmentConfig; + freezeWindows: FreezeWindow[]; + requiredApprovals: number; // 0 for dev, 1+ for prod + requireSeparationOfDuties: boolean; + autoPromoteFrom: UUID | null; // auto-promote from this env + promotionPolicy: string; // OPA policy name + createdAt: DateTime; + updatedAt: DateTime; +} + +interface EnvironmentConfig { + variables: Record; // 
env-specific variables + secrets: SecretReference[]; // vault references + registryOverrides: RegistryOverride[]; // per-env registry + agentLabels: string[]; // required agent labels + deploymentTimeout: number; // seconds + healthCheckConfig: HealthCheckConfig; +} + +interface FreezeWindow { + id: UUID; + start: DateTime; + end: DateTime; + reason: string; + createdBy: UUID; + exceptions: UUID[]; // users who can override +} +``` + +--- + +### Module: `target-registry` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Deployment target inventory; capability tracking | +| **Dependencies** | `environment-manager`, `agent-manager` | +| **Data Entities** | `Target`, `TargetGroup`, `TargetCapability` | +| **Events Produced** | `target.created`, `target.updated`, `target.deleted`, `target.health_changed` | + +**Target Types** (plugin-provided): + +| Type | Description | +|------|-------------| +| `docker_host` | Single Docker host | +| `compose_host` | Docker Compose host | +| `ssh_remote` | Generic SSH target | +| `winrm_remote` | Windows remote target | +| `ecs_service` | AWS ECS service | +| `nomad_job` | HashiCorp Nomad job | + +**Target Entity**: +```typescript +interface Target { + id: UUID; + tenantId: UUID; + environmentId: UUID; + name: string; // "prod-web-01" + targetType: string; // "docker_host" + connection: TargetConnection; // type-specific + capabilities: TargetCapability[]; + labels: Record; // for grouping + healthStatus: HealthStatus; + lastHealthCheck: DateTime; + deploymentDirectory: string; // where artifacts are placed + currentDigest: string | null; // what's currently deployed + agentId: UUID | null; // assigned agent +} + +interface TargetConnection { + // Common fields + host: string; + port: number; + + // Type-specific (examples) + // docker_host: + dockerSocket?: string; + tlsCert?: SecretReference; + + // ssh_remote: + username?: string; + privateKey?: SecretReference; + + // ecs_service: + cluster?: 
string; + service?: string; + region?: string; + roleArn?: string; +} + +interface TargetGroup { + id: UUID; + tenantId: UUID; + environmentId: UUID; + name: string; + labels: Record; + createdAt: DateTime; +} +``` + +--- + +### Module: `agent-manager` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Agent registration, heartbeat, capability advertisement | +| **Dependencies** | `authority` (for agent tokens) | +| **Data Entities** | `Agent`, `AgentCapability`, `AgentHeartbeat` | +| **Events Produced** | `agent.registered`, `agent.online`, `agent.offline`, `agent.capability_changed` | + +**Agent Lifecycle**: +1. Agent starts, requests registration token from Authority +2. Agent registers with capabilities and labels +3. Agent sends heartbeats (default: 30s interval) +4. Agent pulls tasks from task queue +5. Agent reports task completion/failure + +**Agent Entity**: +```typescript +interface Agent { + id: UUID; + tenantId: UUID; + name: string; + version: string; + capabilities: AgentCapability[]; + labels: Record; + status: "online" | "offline" | "degraded"; + lastHeartbeat: DateTime; + assignedTargets: UUID[]; + resourceUsage: ResourceUsage; +} + +interface AgentCapability { + type: string; // "docker", "compose", "ssh", "winrm" + version: string; // capability version + config: object; // capability-specific config +} + +interface ResourceUsage { + cpuPercent: number; + memoryPercent: number; + diskPercent: number; + activeTasks: number; +} +``` + +**Agent Registration Protocol**: +``` +1. Admin generates registration token (one-time use) + POST /api/v1/admin/agent-tokens + → { token: "reg_xxx", expiresAt: "..." } + +2. Agent starts with registration token + ./stella-agent --register --token=reg_xxx + +3. Agent requests mTLS certificate + POST /api/v1/agents/register + Headers: X-Registration-Token: reg_xxx + Body: { name, version, capabilities, csr } + → { agentId, certificate, caCertificate } + +4. 
Agent establishes mTLS connection + Uses issued certificate for all subsequent requests + +5. Agent requests short-lived JWT for task execution + POST /api/v1/agents/token (over mTLS) + → { token, expiresIn: 3600 } // 1 hour +``` + +--- + +### Module: `inventory-sync` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Drift detection; expected vs actual state reconciliation | +| **Dependencies** | `target-registry`, `agent-manager` | +| **Events Produced** | `inventory.drift_detected`, `inventory.reconciled` | + +**Drift Detection Process**: +1. Read `stella.version.json` from target deployment directory +2. Compare with expected state in database +3. Flag discrepancies (digest mismatch, missing sticker, unexpected files) +4. Report on dashboard + +**Drift Detection Types**: + +| Drift Type | Description | Severity | +|------------|-------------|----------| +| `digest_mismatch` | Running digest differs from expected | Critical | +| `missing_sticker` | No version sticker found on target | Warning | +| `stale_sticker` | Sticker timestamp older than last deployment | Warning | +| `orphan_container` | Container not managed by Stella | Info | +| `extra_files` | Unexpected files in deployment directory | Info | + +--- + +## Cache Eviction Policies + +Environment configurations and target states are cached to improve performance. 
**All caches MUST have bounded size and TTL-based eviction**: + +| Cache Type | Purpose | TTL | Max Size | Eviction Strategy | +|-----------|---------|-----|----------|-------------------| +| **Environment Configs** | Environment configuration data | 30 minutes | 500 entries | Sliding expiration | +| **Target Health** | Target health status | 5 minutes | 2,000 entries | Sliding expiration | +| **Agent Capabilities** | Agent capability advertisement | 10 minutes | 1,000 entries | Sliding expiration | +| **Freeze Windows** | Active freeze window checks | 15 minutes | 100 entries | Absolute expiration | + +**Implementation**: +```csharp +public class EnvironmentConfigCache +{ + private readonly MemoryCache _cache; + + public EnvironmentConfigCache() + { + _cache = new MemoryCache(new MemoryCacheOptions + { + SizeLimit = 500 // Max 500 environment configs + }); + } + + public void CacheConfig(Guid environmentId, EnvironmentConfig config) + { + _cache.Set(environmentId, config, new MemoryCacheEntryOptions + { + Size = 1, + SlidingExpiration = TimeSpan.FromMinutes(30) // 30-minute TTL + }); + } + + public EnvironmentConfig? GetCachedConfig(Guid environmentId) + => _cache.Get<EnvironmentConfig>(environmentId); + + public void InvalidateConfig(Guid environmentId) + => _cache.Remove(environmentId); +} +``` + +**Cache Invalidation**: +- Environment configs: Invalidate on update +- Target health: Invalidate on health check or deployment +- Agent capabilities: Invalidate on capability change event +- Freeze windows: Invalidate on window creation/deletion + +**Reference**: See [Implementation Guide](../implementation-guide.md#caching) for cache implementation patterns.
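
The eviction table above distinguishes sliding expiration (reads extend an entry's lifetime) from absolute expiration (entries expire at insert time plus TTL, as used for freeze-window checks). A minimal TypeScript sketch of a size-bounded TTL cache supporting both modes; the class name and API are illustrative, not part of the specification:

```typescript
// Illustrative bounded TTL cache, not the production implementation.
// "absolute" entries expire at insert-time + TTL regardless of reads;
// "sliding" entries have their expiry pushed forward on every read.
type EvictionMode = "absolute" | "sliding";

interface Entry<V> {
  value: V;
  expiresAt: number; // epoch ms
}

class BoundedTtlCache<K, V> {
  private entries = new Map<K, Entry<V>>();

  constructor(
    private maxSize: number,
    private ttlMs: number,
    private mode: EvictionMode,
    private now: () => number = Date.now, // injectable clock for testing
  ) {}

  set(key: K, value: V): void {
    // Evict the oldest entry when the size bound is reached
    // (Map preserves insertion order).
    if (!this.entries.has(key) && this.entries.size >= this.maxSize) {
      const oldest = this.entries.keys().next().value as K;
      this.entries.delete(oldest);
    }
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  get(key: K): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // lazy TTL eviction on read
      return undefined;
    }
    if (this.mode === "sliding") {
      entry.expiresAt = this.now() + this.ttlMs; // reads extend the lifetime
    }
    return entry.value;
  }
}
```

Production code would also want periodic sweeping of expired entries, since lazy eviction alone lets stale entries count against the size bound until they are read.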
+ +--- + +## Database Schema + +```sql +-- Environments +CREATE TABLE release.environments ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + name VARCHAR(100) NOT NULL, + display_name VARCHAR(255) NOT NULL, + order_index INTEGER NOT NULL, + config JSONB NOT NULL DEFAULT '{}', + freeze_windows JSONB NOT NULL DEFAULT '[]', + required_approvals INTEGER NOT NULL DEFAULT 0, + require_sod BOOLEAN NOT NULL DEFAULT FALSE, + auto_promote_from UUID REFERENCES release.environments(id), + promotion_policy VARCHAR(255), + deployment_timeout INTEGER NOT NULL DEFAULT 600, + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + UNIQUE (tenant_id, name) +); + +CREATE INDEX idx_environments_tenant ON release.environments(tenant_id); +CREATE INDEX idx_environments_order ON release.environments(tenant_id, order_index); + +-- Target Groups +CREATE TABLE release.target_groups ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + environment_id UUID NOT NULL REFERENCES release.environments(id) ON DELETE CASCADE, + name VARCHAR(255) NOT NULL, + labels JSONB NOT NULL DEFAULT '{}', + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + UNIQUE (tenant_id, environment_id, name) +); + +-- Targets +CREATE TABLE release.targets ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + environment_id UUID NOT NULL REFERENCES release.environments(id) ON DELETE CASCADE, + target_group_id UUID REFERENCES release.target_groups(id), + name VARCHAR(255) NOT NULL, + target_type VARCHAR(100) NOT NULL, + connection JSONB NOT NULL, + capabilities JSONB NOT NULL DEFAULT '[]', + labels JSONB NOT NULL DEFAULT '{}', + deployment_directory VARCHAR(500), + health_status VARCHAR(50) NOT NULL DEFAULT 'unknown', + last_health_check TIMESTAMPTZ, + current_digest VARCHAR(100), + 
agent_id UUID REFERENCES release.agents(id), + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + UNIQUE (tenant_id, environment_id, name) +); + +CREATE INDEX idx_targets_tenant_env ON release.targets(tenant_id, environment_id); +CREATE INDEX idx_targets_type ON release.targets(target_type); +CREATE INDEX idx_targets_labels ON release.targets USING GIN (labels); + +-- Agents +CREATE TABLE release.agents ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + name VARCHAR(255) NOT NULL, + version VARCHAR(50) NOT NULL, + capabilities JSONB NOT NULL DEFAULT '[]', + labels JSONB NOT NULL DEFAULT '{}', + status VARCHAR(50) NOT NULL DEFAULT 'offline', + last_heartbeat TIMESTAMPTZ, + resource_usage JSONB, + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + UNIQUE (tenant_id, name) +); + +CREATE INDEX idx_agents_tenant ON release.agents(tenant_id); +CREATE INDEX idx_agents_status ON release.agents(status); +CREATE INDEX idx_agents_capabilities ON release.agents USING GIN (capabilities); +``` + +--- + +## API Endpoints + +```yaml +# Environments +POST /api/v1/environments +GET /api/v1/environments +GET /api/v1/environments/{id} +PUT /api/v1/environments/{id} +DELETE /api/v1/environments/{id} + +# Freeze Windows +POST /api/v1/environments/{envId}/freeze-windows +GET /api/v1/environments/{envId}/freeze-windows +DELETE /api/v1/environments/{envId}/freeze-windows/{windowId} + +# Target Groups +POST /api/v1/environments/{envId}/target-groups +GET /api/v1/environments/{envId}/target-groups +GET /api/v1/target-groups/{id} +PUT /api/v1/target-groups/{id} +DELETE /api/v1/target-groups/{id} + +# Targets +POST /api/v1/targets +GET /api/v1/targets +GET /api/v1/targets/{id} +PUT /api/v1/targets/{id} +DELETE /api/v1/targets/{id} +POST /api/v1/targets/{id}/health-check +GET /api/v1/targets/{id}/sticker +GET 
/api/v1/targets/{id}/drift + +# Agents +POST /api/v1/agents/register +GET /api/v1/agents +GET /api/v1/agents/{id} +PUT /api/v1/agents/{id} +DELETE /api/v1/agents/{id} +POST /api/v1/agents/{id}/heartbeat +POST /api/v1/agents/{id}/tasks/{taskId}/complete +``` + +--- + +## References + +- [Module Overview](overview.md) +- [Agent Specification](agents.md) +- [API Documentation](../api/environments.md) +- [Agent Security](../security/agent-security.md) diff --git a/docs/modules/release-orchestrator/modules/evidence.md b/docs/modules/release-orchestrator/modules/evidence.md new file mode 100644 index 000000000..38bc30410 --- /dev/null +++ b/docs/modules/release-orchestrator/modules/evidence.md @@ -0,0 +1,575 @@ +# RELEVI: Release Evidence + +**Purpose**: Cryptographically sealed evidence packets for audit-grade release governance. + +## Modules + +### Module: `evidence-collector` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Evidence aggregation; packet composition | +| **Dependencies** | `promotion-manager`, `deploy-orchestrator`, `decision-engine` | +| **Data Entities** | `EvidencePacket`, `EvidenceContent` | +| **Events Produced** | `evidence.collected`, `evidence.packet_created` | + +**Evidence Packet Structure**: +```typescript +interface EvidencePacket { + id: UUID; + tenantId: UUID; + promotionId: UUID; + packetType: EvidencePacketType; + content: EvidenceContent; + contentHash: string; // SHA-256 of content + signature: string; // Cryptographic signature + signerKeyRef: string; // Reference to signing key + createdAt: DateTime; + // Note: No updatedAt - packets are immutable +} + +type EvidencePacketType = + | "release_decision" // Promotion decision evidence + | "deployment" // Deployment execution evidence + | "rollback" // Rollback evidence + | "ab_promotion"; // A/B promotion evidence + +interface EvidenceContent { + // Metadata + version: "1.0"; + generatedAt: DateTime; + generatorVersion: string; + + // What + release: { 
+ id: UUID; + name: string; + components: Array<{ + name: string; + digest: string; + semver: string; + imageRepository: string; + }>; + sourceRef: SourceReference | null; + }; + + // Where + environment: { + id: UUID; + name: string; + targets: Array<{ + id: UUID; + name: string; + type: string; + }>; + }; + + // Who + actors: { + requester: { + id: UUID; + name: string; + email: string; + }; + approvers: Array<{ + id: UUID; + name: string; + action: string; + at: DateTime; + comment: string | null; + }>; + }; + + // Why + decision: { + result: "allow" | "deny"; + gates: Array<{ + type: string; + name: string; + status: string; + message: string; + details: Record<string, unknown>; + }>; + reasons: string[]; + }; + + // How + execution: { + workflowRunId: UUID | null; + deploymentJobId: UUID | null; + artifacts: Array<{ + type: string; + name: string; + contentHash: string; + }>; + logs: string | null; // Compressed/truncated + }; + + // When + timeline: { + requestedAt: DateTime; + decidedAt: DateTime | null; + startedAt: DateTime | null; + completedAt: DateTime | null; + }; + + // Integrity + inputsHash: string; // Hash of all inputs for replay + previousEvidenceId: UUID | null; // Chain to previous evidence +} +``` + +--- + +### Module: `evidence-signer` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Cryptographic signing of evidence packets | +| **Dependencies** | `authority`, `vault` (for key storage) | +| **Algorithms** | RS256, ES256, Ed25519 | + +**Signing Process**: +```typescript +class EvidenceSigner { + async sign(content: EvidenceContent): Promise<SignedEvidence> { + // 1. Canonicalize content (RFC 8785) + const canonicalJson = canonicalize(content); + + // 2. Compute content hash + const contentHash = crypto + .createHash("sha256") + .update(canonicalJson) + .digest("hex"); + + // 3. Get signing key from vault + const keyRef = await this.getActiveSigningKey(); + const privateKey = await this.vault.getPrivateKey(keyRef); + + // 4.
Sign the content hash + const signature = await this.signWithKey(contentHash, privateKey); + + return { + content, + contentHash: `sha256:${contentHash}`, + signature: base64Encode(signature), + signerKeyRef: keyRef, + algorithm: this.config.signatureAlgorithm, + }; + } + + async verify(packet: EvidencePacket): Promise<VerificationResult> { + // 1. Canonicalize stored content + const canonicalJson = canonicalize(packet.content); + + // 2. Verify content hash + const computedHash = crypto + .createHash("sha256") + .update(canonicalJson) + .digest("hex"); + + if (`sha256:${computedHash}` !== packet.contentHash) { + return { valid: false, error: "Content hash mismatch" }; + } + + // 3. Get public key + const publicKey = await this.vault.getPublicKey(packet.signerKeyRef); + + // 4. Verify signature + const signatureValid = await this.verifySignature( + computedHash, + base64Decode(packet.signature), + publicKey + ); + + return { + valid: signatureValid, + signerKeyRef: packet.signerKeyRef, + signedAt: packet.createdAt, + }; + } +} +``` + +--- + +### Module: `sticker-writer` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Version sticker generation and placement | +| **Dependencies** | `deploy-orchestrator`, `agent-manager` | +| **Data Entities** | `VersionSticker` | + +**Version Sticker Schema**: +```typescript +interface VersionSticker { + stella_version: "1.0"; + + // Release identity + release_id: UUID; + release_name: string; + + // Component details + components: Array<{ + name: string; + digest: string; + semver: string; + tag: string; + image_repository: string; + }>; + + // Deployment context + environment: string; + environment_id: UUID; + deployed_at: string; // ISO 8601 + deployed_by: UUID; + + // Traceability + promotion_id: UUID; + workflow_run_id: UUID; + + // Evidence chain + evidence_packet_id: UUID; + evidence_packet_hash: string; + policy_decision_hash: string; + + // Orchestrator info + orchestrator_version: string; + + // Source
reference + source_ref?: { + commit_sha: string; + branch: string; + repository: string; + }; +} +``` + +**Sticker Placement**: +- Written to `/var/stella/version.json` on each target +- Atomic write (write to temp, rename) +- Read during drift detection +- Verified against expected state + +--- + +### Module: `audit-exporter` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Compliance report generation; evidence export | +| **Dependencies** | `evidence-collector` | +| **Export Formats** | JSON, PDF, CSV | + +**Audit Report Types**: + +| Report Type | Description | +|-------------|-------------| +| `release_audit` | Full audit trail for a release | +| `environment_audit` | All deployments to an environment | +| `compliance_summary` | Summary for compliance review | +| `change_log` | Chronological change log | + +**Report Generation**: +```typescript +interface AuditReportRequest { + type: AuditReportType; + scope: { + releaseId?: UUID; + environmentId?: UUID; + from?: DateTime; + to?: DateTime; + }; + format: "json" | "pdf" | "csv"; + options?: { + includeDecisionDetails: boolean; + includeApproverDetails: boolean; + includeLogs: boolean; + includeArtifacts: boolean; + }; +} + +interface AuditReport { + id: UUID; + type: AuditReportType; + scope: ReportScope; + generatedAt: DateTime; + generatedBy: UUID; + + summary: { + totalPromotions: number; + successfulDeployments: number; + failedDeployments: number; + rollbacks: number; + averageDeploymentTime: number; + }; + + entries: AuditEntry[]; + + // For compliance + signatureChain: { + valid: boolean; + verifiedPackets: number; + invalidPackets: number; + }; +} +``` + +--- + +## Immutability Enforcement + +Evidence packets are append-only. 
This is enforced at multiple levels: + +### Database Level +```sql +-- Evidence packets table with no UPDATE/DELETE +CREATE TABLE release.evidence_packets ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + promotion_id UUID NOT NULL REFERENCES release.promotions(id), + packet_type VARCHAR(50) NOT NULL CHECK (packet_type IN ( + 'release_decision', 'deployment', 'rollback', 'ab_promotion' + )), + content JSONB NOT NULL, + content_hash VARCHAR(100) NOT NULL, + signature TEXT, + signer_key_ref VARCHAR(255), + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW() + -- Note: No updated_at column; immutable by design +); + +-- Append-only enforcement via trigger +CREATE OR REPLACE FUNCTION prevent_evidence_modification() +RETURNS TRIGGER AS $$ +BEGIN + RAISE EXCEPTION 'Evidence packets are immutable and cannot be modified or deleted'; +END; +$$ LANGUAGE plpgsql; + +CREATE TRIGGER evidence_packets_immutable +BEFORE UPDATE OR DELETE ON release.evidence_packets +FOR EACH ROW EXECUTE FUNCTION prevent_evidence_modification(); + +-- Revoke UPDATE/DELETE from application role +REVOKE UPDATE, DELETE ON release.evidence_packets FROM app_role; + +-- Version stickers table +CREATE TABLE release.version_stickers ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + target_id UUID NOT NULL REFERENCES release.targets(id), + release_id UUID NOT NULL REFERENCES release.releases(id), + promotion_id UUID NOT NULL REFERENCES release.promotions(id), + sticker_content JSONB NOT NULL, + content_hash VARCHAR(100) NOT NULL, + written_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + verified_at TIMESTAMPTZ, + drift_detected BOOLEAN NOT NULL DEFAULT FALSE +); + +CREATE INDEX idx_version_stickers_target ON release.version_stickers(target_id); +CREATE INDEX idx_version_stickers_release ON release.version_stickers(release_id); +CREATE INDEX idx_evidence_packets_promotion ON
release.evidence_packets(promotion_id); +CREATE INDEX idx_evidence_packets_created ON release.evidence_packets(created_at DESC); +``` + +### Application Level +```csharp +// Evidence service enforces immutability +public sealed class EvidenceService +{ + // Only Create method - no Update or Delete + public async Task<EvidencePacket> CreateAsync( + EvidenceContent content, + CancellationToken ct) + { + // Sign content + var signed = await _signer.SignAsync(content, ct); + + // Store (append-only) + var packet = new EvidencePacket + { + Id = Guid.NewGuid(), + TenantId = content.TenantId, + PromotionId = content.PromotionId, + PacketType = content.PacketType, + Content = content, + ContentHash = signed.ContentHash, + Signature = signed.Signature, + SignerKeyRef = signed.SignerKeyRef, + CreatedAt = DateTime.UtcNow, + }; + + await _repository.InsertAsync(packet, ct); + return packet; + } + + // Read methods only + public Task<EvidencePacket?> GetAsync(Guid id, CancellationToken ct); + public Task<IReadOnlyList<EvidencePacket>> ListAsync( + EvidenceFilter filter, CancellationToken ct); + public Task<VerificationResult> VerifyAsync( + Guid id, CancellationToken ct); + + // No Update or Delete methods exist +} +``` + +--- + +## Evidence Chain + +Evidence packets form a verifiable chain: + +``` +┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ +│ Evidence #1 │ │ Evidence #2 │ │ Evidence #3 │ +│ (Dev Deploy) │────►│ (Stage Deploy) │────►│ (Prod Deploy) │ +│ │ │ │ │ │ +│ prevEvidenceId: │ │ prevEvidenceId: │ │ prevEvidenceId: │ +│ null │ │ #1 │ │ #2 │ +│ │ │ │ │ │ +│ contentHash: │ │ contentHash: │ │ contentHash: │ +│ sha256:abc... │ │ sha256:def... │ │ sha256:ghi... │ +└─────────────────┘ └─────────────────┘ └─────────────────┘ +``` + +**Chain Verification**: +```typescript +async function verifyEvidenceChain(releaseId: UUID): Promise<ChainVerificationResult> { + const packets = await getPacketsForRelease(releaseId); + const results: PacketVerificationResult[] = []; + + let previousHash: string | null = null; + + for (const packet of packets) { + // 1.
Verify packet signature + const signatureValid = await verifySignature(packet); + + // 2. Verify content hash + const contentValid = await verifyContentHash(packet); + + // 3. Verify chain link + const chainValid = packet.content.previousEvidenceId === null + ? previousHash === null + : await verifyPreviousLink(packet, previousHash); + + results.push({ + packetId: packet.id, + signatureValid, + contentValid, + chainValid, + valid: signatureValid && contentValid && chainValid, + }); + + previousHash = packet.contentHash; + } + + return { + valid: results.every(r => r.valid), + packets: results, + }; +} +``` + +--- + +## API Endpoints + +```yaml +# Evidence Packets +GET /api/v1/evidence-packets + Query: ?promotionId={uuid}&type={type}&from={date}&to={date} + Response: EvidencePacket[] + +GET /api/v1/evidence-packets/{id} + Response: EvidencePacket (full content) + +GET /api/v1/evidence-packets/{id}/verify + Response: VerificationResult + +GET /api/v1/evidence-packets/{id}/download + Query: ?format={json|pdf} + Response: binary + +# Evidence Chain +GET /api/v1/releases/{id}/evidence-chain + Response: EvidenceChain + +GET /api/v1/releases/{id}/evidence-chain/verify + Response: ChainVerificationResult + +# Audit Reports +POST /api/v1/audit-reports + Body: { + type: "release" | "environment" | "compliance", + scope: { releaseId?, environmentId?, from?, to? }, + format: "json" | "pdf" | "csv" + } + Response: { reportId: UUID, status: "generating" } + +GET /api/v1/audit-reports/{id} + Response: { status, downloadUrl? 
} + +GET /api/v1/audit-reports/{id}/download + Response: binary + +# Version Stickers +GET /api/v1/version-stickers + Query: ?targetId={uuid}&releaseId={uuid} + Response: VersionSticker[] + +GET /api/v1/version-stickers/{id} + Response: VersionSticker +``` + +--- + +## Deterministic Replay + +Evidence packets enable deterministic replay: given the same inputs and policy version, the same decision is produced: + +```typescript +async function replayDecision(evidencePacket: EvidencePacket): Promise<ReplayResult> { + const content = evidencePacket.content; + + // 1. Verify inputs hash + const currentInputsHash = computeInputsHash( + content.release, + content.environment, + content.decision.gates + ); + + if (currentInputsHash !== content.inputsHash) { + return { valid: false, error: "Inputs have changed since original decision" }; + } + + // 2. Re-evaluate decision with same inputs + const replayedDecision = await evaluateDecision( + content.release, + content.environment, + { asOf: content.timeline.decidedAt } // Use policy version from that time + ); + + // 3. Compare decisions + const decisionsMatch = replayedDecision.result === content.decision.result; + + return { + valid: decisionsMatch, + originalDecision: content.decision.result, + replayedDecision: replayedDecision.result, + differences: decisionsMatch ?
[] : computeDifferences(content.decision, replayedDecision), + }; +} +``` + +--- + +## References + +- [Module Overview](overview.md) +- [Design Principles](../design/principles.md) +- [Security Architecture](../security/overview.md) +- [Evidence Schema](../appendices/evidence-schema.md) diff --git a/docs/modules/release-orchestrator/modules/integration-hub.md b/docs/modules/release-orchestrator/modules/integration-hub.md new file mode 100644 index 000000000..7db9acf90 --- /dev/null +++ b/docs/modules/release-orchestrator/modules/integration-hub.md @@ -0,0 +1,373 @@ +# INTHUB: Integration Hub + +**Purpose**: Central management of all external integrations (SCM, CI, registries, vaults, targets). + +## Modules + +### Module: `integration-manager` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | CRUD for integration instances; plugin type registry | +| **Dependencies** | `plugin-registry`, `authority` (for credentials) | +| **Data Entities** | `Integration`, `IntegrationType`, `IntegrationCredential` | +| **Events Produced** | `integration.created`, `integration.updated`, `integration.deleted`, `integration.health_changed` | +| **Events Consumed** | `plugin.registered`, `plugin.unregistered` | + +**Key Operations**: +``` +CreateIntegration(type, name, config, credentials) → Integration +UpdateIntegration(id, config, credentials) → Integration +DeleteIntegration(id) → void +TestConnection(id) → ConnectionTestResult +DiscoverResources(id, resourceType) → Resource[] +GetIntegrationHealth(id) → HealthStatus +ListIntegrations(filter) → Integration[] +``` + +**Integration Entity**: +```typescript +interface Integration { + id: UUID; + tenantId: UUID; + type: string; // "scm.github", "registry.harbor" + name: string; // user-defined name + config: IntegrationConfig; // type-specific config + credentialId: UUID; // reference to vault + healthStatus: HealthStatus; + lastHealthCheck: DateTime; + createdAt: DateTime; + updatedAt: DateTime; +} + 
+interface IntegrationConfig { + endpoint: string; + authMode: "token" | "oauth" | "mtls" | "iam"; + timeout: number; + retryPolicy: RetryPolicy; + customHeaders?: Record<string, string>; + // Type-specific fields added by plugin + [key: string]: any; +} +``` + +--- + +### Module: `connection-profiles` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Default settings management; "last used" pattern | +| **Dependencies** | `integration-manager` | +| **Data Entities** | `ConnectionProfile`, `ProfileTemplate` | + +**Behavior**: When a user adds a new integration instance: +1. Wizard defaults to last used endpoint, auth mode, network settings +2. Secrets are **never** auto-reused (explicit confirmation required) +3. User can save as named profile for reuse + +**Profile Entity**: +```typescript +interface ConnectionProfile { + id: UUID; + tenantId: UUID; + name: string; // "Production GitHub" + integrationType: string; + defaultConfig: Partial<IntegrationConfig>; + isDefault: boolean; + lastUsedAt: DateTime; + createdBy: UUID; +} +``` + +--- + +### Module: `connector-runtime` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Execute plugin connector logic in controlled environment | +| **Dependencies** | `plugin-loader`, `plugin-sandbox` | +| **Protocol** | gRPC (preferred) or HTTP/REST | + +**Connector Interface** (implemented by plugins): +```protobuf +service Connector { + // Connection management + rpc TestConnection(TestConnectionRequest) returns (TestConnectionResponse); + rpc GetHealth(HealthRequest) returns (HealthResponse); + + // Resource discovery + rpc DiscoverResources(DiscoverRequest) returns (DiscoverResponse); + rpc ListRepositories(ListReposRequest) returns (ListReposResponse); + rpc ListBranches(ListBranchesRequest) returns (ListBranchesResponse); + rpc ListTags(ListTagsRequest) returns (ListTagsResponse); + + // Registry operations + rpc ResolveTagToDigest(ResolveRequest) returns (ResolveResponse); + rpc
FetchManifest(ManifestRequest) returns (ManifestResponse); + rpc VerifyDigest(VerifyRequest) returns (VerifyResponse); + + // Secrets operations + rpc GetSecretsRef(SecretsRequest) returns (SecretsResponse); + rpc FetchSecret(FetchSecretRequest) returns (FetchSecretResponse); + + // Workflow step execution + rpc ExecuteStep(StepRequest) returns (stream StepResponse); + rpc CancelStep(CancelRequest) returns (CancelResponse); +} +``` + +**Request/Response Types**: +```protobuf +message TestConnectionRequest { + string integration_id = 1; + map<string, string> config = 2; + string credential_ref = 3; +} + +message TestConnectionResponse { + bool success = 1; + string error_message = 2; + map<string, string> details = 3; + int64 latency_ms = 4; +} + +message ResolveRequest { + string integration_id = 1; + string image_ref = 2; // "myapp:v2.3.1" +} + +message ResolveResponse { + string digest = 1; // "sha256:abc123..." + string manifest_type = 2; + int64 size_bytes = 3; + google.protobuf.Timestamp pushed_at = 4; +} +``` + +--- + +### Module: `doctor-checks` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Integration health diagnostics; troubleshooting | +| **Dependencies** | `integration-manager`, `connector-runtime` | + +**Doctor Check Types**: + +| Check | Purpose | Pass Criteria | +|-------|---------|---------------| +| **Connectivity** | Can reach endpoint | TCP connect succeeds | +| **TLS** | Certificate valid | Chain validates, not expired | +| **Authentication** | Credentials valid | Auth request succeeds | +| **Authorization** | Permissions sufficient | Required scopes present | +| **Version** | API version supported | Version in supported range | +| **Rate Limit** | Quota available | >10% remaining | +| **Latency** | Response time acceptable | <5s p99 | + +**Doctor Check Output**: +```typescript +interface DoctorCheckResult { + checkType: string; + status: "pass" | "warn" | "fail"; + message: string; + details: Record<string, unknown>; + suggestions: string[]; + runAt:
DateTime; + durationMs: number; +} + +interface DoctorReport { + integrationId: UUID; + overallStatus: "healthy" | "degraded" | "unhealthy"; + checks: DoctorCheckResult[]; + generatedAt: DateTime; +} +``` + +--- + +## Cache Eviction Policies + +Integration health status and connector results are cached to reduce load on external systems. **All caches MUST have bounded size and TTL-based eviction**: + +| Cache Type | Purpose | TTL | Max Size | Eviction Strategy | +|-----------|---------|-----|----------|-------------------| +| **Health Checks** | Integration health status | 5 minutes | 1,000 entries | Sliding expiration | +| **Connection Tests** | Test connection results | 2 minutes | 500 entries | Sliding expiration | +| **Resource Discovery** | Discovered resources (repos, tags) | 10 minutes | 5,000 entries | Sliding expiration | +| **Tag Resolution** | Tag → digest mappings | 1 hour | 10,000 entries | Absolute expiration | + +**Implementation**: +```csharp +public class IntegrationHealthCache +{ + private readonly MemoryCache _cache; + + public IntegrationHealthCache() + { + _cache = new MemoryCache(new MemoryCacheOptions + { + SizeLimit = 1_000 // Max 1,000 integration health entries + }); + } + + public void CacheHealthStatus(Guid integrationId, HealthStatus status) + { + _cache.Set(integrationId, status, new MemoryCacheEntryOptions + { + Size = 1, + SlidingExpiration = TimeSpan.FromMinutes(5) // 5-minute TTL + }); + } + + public HealthStatus? GetCachedHealthStatus(Guid integrationId) + => _cache.Get<HealthStatus>(integrationId); +} +``` + +**Reference**: See [Implementation Guide](../implementation-guide.md#caching) for cache implementation patterns.
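
The `DoctorReport.overallStatus` field shown earlier has to be derived from the individual `DoctorCheckResult` statuses. A minimal TypeScript roll-up sketch; the mapping (any fail means unhealthy, any warn means degraded, otherwise healthy) is an assumed convention, not taken from the specification:

```typescript
// Illustrative roll-up of per-check statuses into an overall report status.
// The precedence fail > warn > pass is an assumption.
type CheckStatus = "pass" | "warn" | "fail";
type OverallStatus = "healthy" | "degraded" | "unhealthy";

function rollUp(checks: { status: CheckStatus }[]): OverallStatus {
  if (checks.some(c => c.status === "fail")) return "unhealthy"; // any hard failure dominates
  if (checks.some(c => c.status === "warn")) return "degraded";  // warnings degrade, never fail
  return "healthy";
}
```

A report with no checks at all rolls up to `"healthy"` under this sketch; a real implementation might prefer to treat that case as `"unknown"`.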
+ +--- + +## Integration Types + +The following integration types are supported (via plugins): + +### SCM Integrations + +| Type | Plugin | Capabilities | +|------|--------|--------------| +| `scm.github` | Built-in | repos, branches, commits, webhooks, status | +| `scm.gitlab` | Built-in | repos, branches, commits, webhooks, pipelines | +| `scm.bitbucket` | Plugin | repos, branches, commits, webhooks | +| `scm.azure_repos` | Plugin | repos, branches, commits, pipelines | + +### Registry Integrations + +| Type | Plugin | Capabilities | +|------|--------|--------------| +| `registry.harbor` | Built-in | repos, tags, digests, scanning status | +| `registry.ecr` | Plugin | repos, tags, digests, IAM auth | +| `registry.gcr` | Plugin | repos, tags, digests | +| `registry.dockerhub` | Plugin | repos, tags, digests | +| `registry.ghcr` | Plugin | repos, tags, digests | +| `registry.acr` | Plugin | repos, tags, digests | + +### Vault Integrations + +| Type | Plugin | Capabilities | +|------|--------|--------------| +| `vault.hashicorp` | Built-in | KV, transit, PKI | +| `vault.aws_secrets` | Plugin | secrets, IAM auth | +| `vault.azure_keyvault` | Plugin | secrets, certificates | +| `vault.gcp_secrets` | Plugin | secrets, IAM auth | + +### CI Integrations + +| Type | Plugin | Capabilities | +|------|--------|--------------| +| `ci.github_actions` | Built-in | workflows, runs, artifacts, status | +| `ci.gitlab_ci` | Built-in | pipelines, jobs, artifacts | +| `ci.jenkins` | Plugin | jobs, builds, artifacts | +| `ci.azure_pipelines` | Plugin | pipelines, runs, artifacts | + +### Router Integrations (for Progressive Delivery) + +| Type | Plugin | Capabilities | +|------|--------|--------------| +| `router.nginx` | Plugin | upstream config, reload | +| `router.haproxy` | Plugin | backend config, reload | +| `router.traefik` | Plugin | dynamic config | +| `router.aws_alb` | Plugin | target groups, listener rules | + +--- + +## Database Schema + +```sql +-- Integration types 
(populated by plugins) +CREATE TABLE release.integration_types ( + id TEXT PRIMARY KEY, -- "scm.github" + plugin_id UUID REFERENCES release.plugins(id), + display_name TEXT NOT NULL, + description TEXT, + icon_url TEXT, + config_schema JSONB NOT NULL, -- JSON Schema for config + capabilities TEXT[] NOT NULL, -- ["repos", "webhooks", "status"] + created_at TIMESTAMPTZ NOT NULL DEFAULT now() +); + +-- Integration instances +CREATE TABLE release.integrations ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id), + type_id TEXT NOT NULL REFERENCES release.integration_types(id), + name TEXT NOT NULL, + config JSONB NOT NULL, + credential_ref TEXT NOT NULL, -- vault reference + health_status TEXT NOT NULL DEFAULT 'unknown', + last_health_check TIMESTAMPTZ, + created_at TIMESTAMPTZ NOT NULL DEFAULT now(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT now(), + created_by UUID NOT NULL REFERENCES users(id), + UNIQUE(tenant_id, name) +); + +-- Connection profiles +CREATE TABLE release.connection_profiles ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id), + name TEXT NOT NULL, + integration_type TEXT NOT NULL, + default_config JSONB NOT NULL, + is_default BOOLEAN NOT NULL DEFAULT false, + last_used_at TIMESTAMPTZ, + created_by UUID NOT NULL REFERENCES users(id), + created_at TIMESTAMPTZ NOT NULL DEFAULT now(), + UNIQUE(tenant_id, name) +); + +-- Doctor check history +CREATE TABLE release.doctor_checks ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + integration_id UUID NOT NULL REFERENCES release.integrations(id), + check_type TEXT NOT NULL, + status TEXT NOT NULL, + message TEXT, + details JSONB, + duration_ms INTEGER NOT NULL, + run_at TIMESTAMPTZ NOT NULL DEFAULT now() +); + +CREATE INDEX idx_doctor_checks_integration ON release.doctor_checks(integration_id, run_at DESC); +``` + +--- + +## API Endpoints + +See [API Documentation](../api/overview.md) for full specification. 
+ +``` +GET /api/v1/integration-types # List available types +GET /api/v1/integration-types/{type} # Get type details + +GET /api/v1/integrations # List integrations +POST /api/v1/integrations # Create integration +GET /api/v1/integrations/{id} # Get integration +PUT /api/v1/integrations/{id} # Update integration +DELETE /api/v1/integrations/{id} # Delete integration +POST /api/v1/integrations/{id}/test # Test connection +GET /api/v1/integrations/{id}/health # Get health status +POST /api/v1/integrations/{id}/doctor # Run doctor checks +GET /api/v1/integrations/{id}/resources # Discover resources + +GET /api/v1/connection-profiles # List profiles +POST /api/v1/connection-profiles # Create profile +GET /api/v1/connection-profiles/{id} # Get profile +PUT /api/v1/connection-profiles/{id} # Update profile +DELETE /api/v1/connection-profiles/{id} # Delete profile +``` diff --git a/docs/modules/release-orchestrator/modules/overview.md b/docs/modules/release-orchestrator/modules/overview.md new file mode 100644 index 000000000..2e81a99bc --- /dev/null +++ b/docs/modules/release-orchestrator/modules/overview.md @@ -0,0 +1,203 @@ +# Module Landscape Overview + +The Stella Ops Suite comprises existing modules (vulnerability scanning) and new modules (release orchestration). Modules are organized into **themes** (functional areas). 
+ +## Architecture Diagram + +``` +┌─────────────────────────────────────────────────────────────────────────────────┐ +│ STELLA OPS SUITE │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────────┐ │ +│ │ EXISTING THEMES (Vulnerability) │ │ +│ │ │ │ +│ │ INGEST VEXOPS REASON SCANENG EVIDENCE │ │ +│ │ ├─concelier ├─excititor ├─policy ├─scanner ├─locker │ │ +│ │ └─advisory-ai └─linksets └─opa-runtime ├─sbom-gen ├─export │ │ +│ │ └─reachability └─timeline │ │ +│ │ │ │ +│ │ RUNTIME JOBCTRL OBSERVE REPLAY DEVEXP │ │ +│ │ ├─signals ├─scheduler ├─notifier └─replay-core ├─cli │ │ +│ │ ├─graph ├─orchestrator └─telemetry ├─web-ui │ │ +│ │ └─zastava └─task-runner └─sdk │ │ +│ └───────────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────────┐ │ +│ │ NEW THEMES (Release Orchestration) │ │ +│ │ │ │ +│ │ INTHUB (Integration Hub) │ │ +│ │ ├─integration-manager Central registry of configured integrations │ │ +│ │ ├─connection-profiles Default settings + credential management │ │ +│ │ ├─connector-runtime Plugin connector execution environment │ │ +│ │ └─doctor-checks Integration health diagnostics │ │ +│ │ │ │ +│ │ ENVMGR (Environment & Inventory) │ │ +│ │ ├─environment-manager Environment CRUD, ordering, config │ │ +│ │ ├─target-registry Deployment targets (hosts/services) │ │ +│ │ ├─agent-manager Agent registration, health, capabilities │ │ +│ │ └─inventory-sync Drift detection, state reconciliation │ │ +│ │ │ │ +│ │ RELMAN (Release Management) │ │ +│ │ ├─component-registry Image repos → components mapping │ │ +│ │ ├─version-manager Tag/digest → semver mapping │ │ +│ │ ├─release-manager Release bundle lifecycle │ │ +│ │ └─release-catalog Release history, search, compare │ │ +│ │ │ │ +│ │ WORKFL (Workflow Engine) │ │ +│ │ ├─workflow-designer Template creation, step graph editor │ │ +│ │ ├─workflow-engine DAG execution, state machine │ │ +│ │ 
├─step-executor Step dispatch, retry, timeout │ │ +│ │ └─step-registry Built-in + plugin-provided steps │ │ +│ │ │ │ +│ │ PROMOT (Promotion & Approval) │ │ +│ │ ├─promotion-manager Promotion request lifecycle │ │ +│ │ ├─approval-gateway Approval collection, SoD enforcement │ │ +│ │ ├─decision-engine Gate evaluation, policy integration │ │ +│ │ └─gate-registry Built-in + custom gates │ │ +│ │ │ │ +│ │ DEPLOY (Deployment Execution) │ │ +│ │ ├─deploy-orchestrator Deployment job coordination │ │ +│ │ ├─target-executor Target-specific deployment logic │ │ +│ │ ├─runner-executor Script/hook execution sandbox │ │ +│ │ ├─artifact-generator Compose/script artifact generation │ │ +│ │ └─rollback-manager Rollback orchestration │ │ +│ │ │ │ +│ │ AGENTS (Deployment Agents) │ │ +│ │ ├─agent-core Shared agent runtime │ │ +│ │ ├─agent-docker Docker host agent │ │ +│ │ ├─agent-compose Docker Compose agent │ │ +│ │ ├─agent-ssh SSH remote executor │ │ +│ │ ├─agent-winrm WinRM remote executor │ │ +│ │ ├─agent-ecs AWS ECS agent │ │ +│ │ └─agent-nomad HashiCorp Nomad agent │ │ +│ │ │ │ +│ │ PROGDL (Progressive Delivery) │ │ +│ │ ├─ab-manager A/B release coordination │ │ +│ │ ├─traffic-router Router plugin orchestration │ │ +│ │ ├─canary-controller Canary ramp automation │ │ +│ │ └─rollout-strategy Strategy templates │ │ +│ │ │ │ +│ │ RELEVI (Release Evidence) │ │ +│ │ ├─evidence-collector Evidence aggregation │ │ +│ │ ├─evidence-signer Cryptographic signing │ │ +│ │ ├─sticker-writer Version sticker generation │ │ +│ │ └─audit-exporter Compliance report generation │ │ +│ │ │ │ +│ │ PLUGIN (Plugin Infrastructure) │ │ +│ │ ├─plugin-registry Plugin discovery, versioning │ │ +│ │ ├─plugin-loader Plugin lifecycle management │ │ +│ │ ├─plugin-sandbox Isolation, resource limits │ │ +│ │ └─plugin-sdk SDK for plugin development │ │ +│ └───────────────────────────────────────────────────────────────────────────┘ │ +└─────────────────────────────────────────────────────────────────────────────────┘ 
+``` + +## Theme Summary + +### Existing Themes (Vulnerability Scanning) + +| Theme | Purpose | Key Modules | +|-------|---------|-------------| +| **INGEST** | Advisory ingestion | concelier, advisory-ai | +| **VEXOPS** | VEX document handling | excititor, linksets | +| **REASON** | Policy and decisioning | policy, opa-runtime | +| **SCANENG** | Scanning and SBOM | scanner, sbom-gen, reachability | +| **EVIDENCE** | Evidence and attestation | locker, export, timeline | +| **RUNTIME** | Runtime signals | signals, graph, zastava | +| **JOBCTRL** | Job orchestration | scheduler, orchestrator, task-runner | +| **OBSERVE** | Observability | notifier, telemetry | +| **REPLAY** | Deterministic replay | replay-core | +| **DEVEXP** | Developer experience | cli, web-ui, sdk | + +### New Themes (Release Orchestration) + +| Theme | Purpose | Key Modules | Documentation | +|-------|---------|-------------|---------------| +| **INTHUB** | Integration hub | integration-manager, connection-profiles, connector-runtime, doctor-checks | [Details](integration-hub.md) | +| **ENVMGR** | Environment & inventory | environment-manager, target-registry, agent-manager, inventory-sync | [Details](environment-manager.md) | +| **RELMAN** | Release management | component-registry, version-manager, release-manager, release-catalog | [Details](release-manager.md) | +| **WORKFL** | Workflow engine | workflow-designer, workflow-engine, step-executor, step-registry | [Details](workflow-engine.md) | +| **PROMOT** | Promotion & approval | promotion-manager, approval-gateway, decision-engine, gate-registry | [Details](promotion-manager.md) | +| **DEPLOY** | Deployment execution | deploy-orchestrator, target-executor, runner-executor, artifact-generator, rollback-manager | [Details](deploy-orchestrator.md) | +| **AGENTS** | Deployment agents | agent-core, agent-docker, agent-compose, agent-ssh, agent-winrm, agent-ecs, agent-nomad | [Details](agents.md) | +| **PROGDL** | Progressive delivery | 
ab-manager, traffic-router, canary-controller, rollout-strategy | [Details](progressive-delivery.md) | +| **RELEVI** | Release evidence | evidence-collector, evidence-signer, sticker-writer, audit-exporter | [Details](evidence.md) | +| **PLUGIN** | Plugin infrastructure | plugin-registry, plugin-loader, plugin-sandbox, plugin-sdk | [Details](plugin-system.md) | + +## Module Dependencies + +``` + ┌──────────────┐ + │ AUTHORITY │ + └──────┬───────┘ + │ + ┌──────────────────┼──────────────────┐ + │ │ │ + ▼ ▼ ▼ +┌───────────────┐ ┌───────────────┐ ┌───────────────┐ +│ INTHUB │ │ ENVMGR │ │ PLUGIN │ +│ (Integrations)│ │ (Environments)│ │ (Plugins) │ +└───────┬───────┘ └───────┬───────┘ └───────┬───────┘ + │ │ │ + └──────────┬───────┴──────────────────┘ + │ + ▼ + ┌───────────────┐ + │ RELMAN │ + │ (Releases) │ + └───────┬───────┘ + │ + ▼ + ┌───────────────┐ + │ WORKFL │ + │ (Workflows) │ + └───────┬───────┘ + │ + ┌──────────┴──────────┐ + │ │ + ▼ ▼ +┌───────────────┐ ┌───────────────┐ +│ PROMOT │ │ DEPLOY │ +│ (Promotion) │ │ (Deployment) │ +└───────┬───────┘ └───────┬───────┘ + │ │ + │ ▼ + │ ┌───────────────┐ + │ │ AGENTS │ + │ │ (Agents) │ + │ └───────┬───────┘ + │ │ + └──────────┬──────────┘ + │ + ▼ + ┌───────────────┐ + │ RELEVI │ + │ (Evidence) │ + └───────────────┘ +``` + +## Communication Patterns + +| Pattern | Usage | +|---------|-------| +| **Synchronous API** | User-initiated operations (CRUD, queries) | +| **Event Bus** | Cross-module notifications (domain events) | +| **Task Queue** | Long-running operations (deployments, syncs) | +| **WebSocket/SSE** | Real-time UI updates | +| **gRPC Streams** | Agent communication | + +## Database Schema Organization + +Each theme owns a PostgreSQL schema: + +| Schema | Owner Theme | +|--------|-------------| +| `release.integrations` | INTHUB | +| `release.environments` | ENVMGR | +| `release.components` | RELMAN | +| `release.workflows` | WORKFL | +| `release.promotions` | PROMOT | +| `release.deployments` | DEPLOY | +| 
`release.agents` | AGENTS | +| `release.evidence` | RELEVI | +| `release.plugins` | PLUGIN | diff --git a/docs/modules/release-orchestrator/modules/plugin-system.md b/docs/modules/release-orchestrator/modules/plugin-system.md new file mode 100644 index 000000000..4671d5537 --- /dev/null +++ b/docs/modules/release-orchestrator/modules/plugin-system.md @@ -0,0 +1,629 @@ +# PLUGIN: Plugin Infrastructure + +**Purpose**: Extensible plugin system for integrations, steps, and custom functionality. + +## Architecture Overview + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ PLUGIN ARCHITECTURE │ +│ │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ PLUGIN REGISTRY │ │ +│ │ │ │ +│ │ - Plugin discovery and versioning │ │ +│ │ - Manifest validation │ │ +│ │ - Dependency resolution │ │ +│ └──────────────────────────────┬──────────────────────────────────────┘ │ +│ │ │ +│ ▼ │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ PLUGIN LOADER │ │ +│ │ │ │ +│ │ - Lifecycle management (load, start, stop, unload) │ │ +│ │ - Health monitoring │ │ +│ │ - Hot reload support │ │ +│ └──────────────────────────────┬──────────────────────────────────────┘ │ +│ │ │ +│ ▼ │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ PLUGIN SANDBOX │ │ +│ │ │ │ +│ │ - Process isolation │ │ +│ │ - Resource limits (CPU, memory, network) │ │ +│ │ - Capability enforcement │ │ +│ └─────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ Plugin Types: │ +│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ +│ │ Connector │ │ Step │ │ Gate │ │ Agent │ │ +│ │ Plugins │ │ Providers │ │ Providers │ │ Plugins │ │ +│ └──────────────┘ └──────────────┘ └──────────────┘ └──────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +## Modules + +### Module: `plugin-registry` + +| Aspect | 
Specification | +|--------|---------------| +| **Responsibility** | Plugin discovery; versioning; manifest management | +| **Data Entities** | `Plugin`, `PluginManifest`, `PluginVersion` | +| **Events Produced** | `plugin.discovered`, `plugin.registered`, `plugin.unregistered` | + +**Plugin Entity**: +```typescript +interface Plugin { + id: UUID; + pluginId: string; // "com.example.my-connector" + version: string; // "1.2.3" + vendor: string; + license: string; + manifest: PluginManifest; + status: PluginStatus; + entrypoint: string; // Path to plugin executable/module + lastHealthCheck: DateTime; + healthMessage: string | null; + installedAt: DateTime; + updatedAt: DateTime; +} + +type PluginStatus = + | "discovered" // Found but not loaded + | "loaded" // Loaded but not active + | "active" // Running and healthy + | "stopped" // Manually stopped + | "failed" // Failed to load or crashed + | "degraded"; // Running but with issues +``` + +--- + +### Module: `plugin-loader` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Plugin lifecycle management | +| **Dependencies** | `plugin-registry`, `plugin-sandbox` | +| **Events Produced** | `plugin.loaded`, `plugin.started`, `plugin.stopped`, `plugin.failed` | + +**Plugin Lifecycle**: +``` +┌──────────────┐ +│ DISCOVERED │ ──── Plugin found in registry +└──────┬───────┘ + │ load() + ▼ +┌──────────────┐ +│ LOADED │ ──── Plugin validated and prepared +└──────┬───────┘ + │ start() + ▼ +┌──────────────┐ ┌──────────────┐ +│ ACTIVE │ ──── │ DEGRADED │ ◄── Health issues +└──────┬───────┘ └──────────────┘ + │ stop() │ + ▼ │ +┌──────────────┐ │ +│ STOPPED │ ◄───────────┘ manual stop +└──────────────┘ + + │ unload() + ▼ +┌──────────────┐ +│ UNLOADED │ +└──────────────┘ +``` + +**Lifecycle Operations**: +```typescript +interface PluginLoader { + // Discovery + discover(): Promise; + refresh(): Promise; + + // Lifecycle + load(pluginId: string): Promise; + start(pluginId: string): Promise; + 
stop(pluginId: string): Promise; + unload(pluginId: string): Promise; + restart(pluginId: string): Promise; + + // Health + checkHealth(pluginId: string): Promise; + getStatus(pluginId: string): Promise; + + // Hot reload + reload(pluginId: string): Promise; +} +``` + +--- + +### Module: `plugin-sandbox` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Isolation; resource limits; security | +| **Enforcement** | Process isolation, capability-based security | + +**Sandbox Configuration**: +```typescript +interface SandboxConfig { + // Process isolation + processIsolation: boolean; // Run in separate process + containerIsolation: boolean; // Run in container + + // Resource limits + resourceLimits: { + maxMemoryMb: number; // Memory limit + maxCpuPercent: number; // CPU limit + maxDiskMb: number; // Disk quota + maxNetworkBandwidth: number; // Network bandwidth limit + }; + + // Network restrictions + networkPolicy: { + allowedHosts: string[]; // Allowed outbound hosts + blockedHosts: string[]; // Blocked hosts + allowOutbound: boolean; // Allow any outbound + }; + + // Filesystem restrictions + filesystemPolicy: { + readOnlyPaths: string[]; + writablePaths: string[]; + blockedPaths: string[]; + }; + + // Timeouts + timeouts: { + initializationMs: number; + operationMs: number; + shutdownMs: number; + }; +} +``` + +**Capability Enforcement**: +```typescript +interface PluginCapabilities { + // Integration capabilities + integrations: { + scm: boolean; + ci: boolean; + registry: boolean; + vault: boolean; + router: boolean; + }; + + // Step capabilities + steps: { + deploy: boolean; + gate: boolean; + notify: boolean; + custom: boolean; + }; + + // System capabilities + system: { + network: boolean; + filesystem: boolean; + secrets: boolean; + database: boolean; + }; +} +``` + +--- + +### Module: `plugin-sdk` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | SDK for plugin development | +| **Languages** | 
C#, TypeScript, Go | + +**Plugin SDK Interface**: +```typescript +// Base plugin interface +interface StellaPlugin { + // Lifecycle + initialize(config: PluginConfig): Promise; + start(): Promise; + stop(): Promise; + dispose(): Promise; + + // Health + getHealth(): Promise; + + // Metadata + getManifest(): PluginManifest; +} + +// Connector plugin interface +interface ConnectorPlugin extends StellaPlugin { + createConnector(config: ConnectorConfig): Promise; +} + +// Step provider plugin interface +interface StepProviderPlugin extends StellaPlugin { + getStepTypes(): StepType[]; + executeStep( + stepType: string, + config: StepConfig, + inputs: StepInputs, + context: StepContext + ): AsyncGenerator; +} + +// Gate provider plugin interface +interface GateProviderPlugin extends StellaPlugin { + getGateTypes(): GateType[]; + evaluateGate( + gateType: string, + config: GateConfig, + context: GateContext + ): Promise; +} +``` + +--- + +## Three-Surface Plugin Model + +Plugins contribute to the system through three distinct surfaces: + +### 1. 
Manifest Surface (Static) + +The plugin manifest declares: +- Plugin identity and version +- Required capabilities +- Provided integrations/steps/gates +- Configuration schema +- UI components (optional) + +```yaml +# plugin.stella.yaml +plugin: + id: "com.example.jenkins-connector" + version: "1.0.0" + vendor: "Example Corp" + license: "Apache-2.0" + description: "Jenkins CI integration for Stella Ops" + +capabilities: + required: + - network + optional: + - secrets + +provides: + integrations: + - type: "ci.jenkins" + displayName: "Jenkins" + configSchema: "./schemas/jenkins-config.json" + capabilities: + - "pipelines" + - "builds" + - "artifacts" + + steps: + - type: "jenkins-trigger" + displayName: "Trigger Jenkins Build" + category: "integration" + configSchema: "./schemas/jenkins-trigger-config.json" + inputSchema: "./schemas/jenkins-trigger-input.json" + outputSchema: "./schemas/jenkins-trigger-output.json" + +ui: + configScreen: "./ui/config.html" + icon: "./assets/jenkins-icon.svg" + +dependencies: + stellaCore: ">=1.0.0" +``` + +### 2. 
Connector Runtime Surface (Dynamic) + +Plugins implement connector interfaces for runtime operations: + +```typescript +// Jenkins connector implementation +class JenkinsConnector implements CIConnector { + private client: JenkinsClient; + + async initialize(config: ConnectorConfig, secrets: SecretHandle[]): Promise { + const apiToken = await this.getSecret(secrets, "api_token"); + this.client = new JenkinsClient({ + baseUrl: config.endpoint, + username: config.username, + apiToken: apiToken, + }); + } + + async testConnection(): Promise { + try { + const crumb = await this.client.getCrumb(); + return { success: true, message: "Connected to Jenkins" }; + } catch (error) { + return { success: false, message: error.message }; + } + } + + async listPipelines(): Promise { + const jobs = await this.client.getJobs(); + return jobs.map(job => ({ + id: job.name, + name: job.displayName, + url: job.url, + lastBuild: job.lastBuild?.number, + })); + } + + async triggerPipeline(pipelineId: string, params: object): Promise { + const queueItem = await this.client.build(pipelineId, params); + return { + id: queueItem.id.toString(), + pipelineId, + status: "queued", + startedAt: new Date(), + }; + } + + async getPipelineRun(runId: string): Promise { + const build = await this.client.getBuild(runId); + return { + id: build.number.toString(), + pipelineId: build.job, + status: this.mapStatus(build.result), + startedAt: new Date(build.timestamp), + completedAt: build.result ? new Date(build.timestamp + build.duration) : null, + }; + } +} +``` + +### 3. 
Step Provider Surface (Execution) + +Plugins implement step execution logic: + +```typescript +// Jenkins trigger step implementation +class JenkinsTriggerStep implements StepExecutor { + async *execute( + config: StepConfig, + inputs: StepInputs, + context: StepContext + ): AsyncGenerator { + const connector = await context.getConnector(config.integrationId); + + yield { type: "log", line: `Triggering Jenkins job: ${config.jobName}` }; + + // Trigger build + const run = await connector.triggerPipeline(config.jobName, inputs.parameters); + yield { type: "output", name: "buildId", value: run.id }; + yield { type: "log", line: `Build queued: ${run.id}` }; + + // Wait for completion if configured + if (config.waitForCompletion) { + yield { type: "log", line: "Waiting for build to complete..." }; + + while (true) { + const status = await connector.getPipelineRun(run.id); + + if (status.status === "succeeded") { + yield { type: "output", name: "status", value: "succeeded" }; + yield { type: "result", success: true }; + return; + } + + if (status.status === "failed") { + yield { type: "output", name: "status", value: "failed" }; + yield { type: "result", success: false, message: "Build failed" }; + return; + } + + yield { type: "progress", progress: 50, message: `Build running: ${status.status}` }; + await sleep(config.pollIntervalSeconds * 1000); + } + } + + yield { type: "result", success: true }; + } +} +``` + +--- + +## Database Schema + +```sql +-- Plugins +CREATE TABLE release.plugins ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + plugin_id VARCHAR(255) NOT NULL UNIQUE, + version VARCHAR(50) NOT NULL, + vendor VARCHAR(255) NOT NULL, + license VARCHAR(100), + manifest JSONB NOT NULL, + status VARCHAR(50) NOT NULL DEFAULT 'discovered' CHECK (status IN ( + 'discovered', 'loaded', 'active', 'stopped', 'failed', 'degraded' + )), + entrypoint VARCHAR(500) NOT NULL, + last_health_check TIMESTAMPTZ, + health_message TEXT, + installed_at TIMESTAMPTZ NOT NULL DEFAULT 
NOW(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW() +); + +CREATE INDEX idx_plugins_status ON release.plugins(status); + +-- Plugin Instances (per-tenant configuration) +CREATE TABLE release.plugin_instances ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + plugin_id UUID NOT NULL REFERENCES release.plugins(id) ON DELETE CASCADE, + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + config JSONB NOT NULL DEFAULT '{}', + enabled BOOLEAN NOT NULL DEFAULT TRUE, + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW() +); + +CREATE INDEX idx_plugin_instances_tenant ON release.plugin_instances(tenant_id); + +-- Integration types (populated by plugins) +CREATE TABLE release.integration_types ( + id TEXT PRIMARY KEY, -- "scm.github", "ci.jenkins" + plugin_id UUID REFERENCES release.plugins(id), + display_name TEXT NOT NULL, + description TEXT, + icon_url TEXT, + config_schema JSONB NOT NULL, -- JSON Schema for config + capabilities TEXT[] NOT NULL, -- ["repos", "webhooks", "status"] + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW() +); +``` + +--- + +## API Endpoints + +```yaml +# Plugin Registry +GET /api/v1/plugins + Query: ?status={status}&capability={type} + Response: Plugin[] + +GET /api/v1/plugins/{id} + Response: Plugin (with manifest) + +POST /api/v1/plugins/{id}/enable + Response: Plugin + +POST /api/v1/plugins/{id}/disable + Response: Plugin + +GET /api/v1/plugins/{id}/health + Response: { status, message, diagnostics[] } + +# Plugin Instances (per-tenant config) +POST /api/v1/plugin-instances + Body: { pluginId: UUID, config: object } + Response: PluginInstance + +GET /api/v1/plugin-instances + Response: PluginInstance[] + +PUT /api/v1/plugin-instances/{id} + Body: { config: object, enabled: boolean } + Response: PluginInstance + +DELETE /api/v1/plugin-instances/{id} + Response: { deleted: true } +``` + +--- + +## Plugin Security + +### Capability Declaration + +Plugins must declare all required capabilities in their manifest. 
The system enforces: + +1. **Network Access**: Plugins can only access declared hosts +2. **Secret Access**: Plugins receive secrets through controlled injection +3. **Database Access**: No direct database access; API only +4. **Filesystem Access**: Limited to declared paths + +### Sandbox Enforcement + +```typescript +// Plugin execution is sandboxed +class PluginSandbox { + async execute( + plugin: Plugin, + operation: () => Promise + ): Promise { + // 1. Verify capabilities + this.verifyCapabilities(plugin); + + // 2. Set resource limits + const limits = this.getResourceLimits(plugin); + await this.applyLimits(limits); + + // 3. Create isolated context + const context = await this.createIsolatedContext(plugin); + + try { + // 4. Execute with timeout + return await this.withTimeout( + operation(), + plugin.manifest.timeouts.operationMs + ); + } catch (error) { + // 5. Log and handle errors + await this.handlePluginError(plugin, error); + throw error; + } finally { + // 6. Cleanup + await context.dispose(); + } + } +} +``` + +### Plugin Failures Cannot Crash Core + +```csharp +// Core orchestration is protected from plugin failures +public sealed class PromotionDecisionEngine +{ + public async Task EvaluateAsync( + Promotion promotion, + IReadOnlyList gates, + CancellationToken ct) + { + var results = new List(); + + foreach (var gate in gates) + { + try + { + // Plugin provides evaluation logic + var result = await gate.EvaluateAsync(promotion, ct); + results.Add(result); + } + catch (Exception ex) + { + // Plugin failure is logged but doesn't crash core + _logger.LogError(ex, "Gate {GateType} failed", gate.Type); + results.Add(new GateResult + { + GateType = gate.Type, + Status = GateStatus.Failed, + Message = $"Gate evaluation failed: {ex.Message}", + IsBlocking = gate.IsBlocking, + }); + } + + // Core decides how to aggregate (plugins cannot override) + if (results.Last().IsBlocking && _policy.FailFast) + break; + } + + // Core makes final decision + return 
_decisionAggregator.Aggregate(results); + } +} +``` + +--- + +## References + +- [Module Overview](overview.md) +- [Integration Hub](integration-hub.md) +- [Workflow Engine](workflow-engine.md) +- [Connector Interface](../integrations/connectors.md) diff --git a/docs/modules/release-orchestrator/modules/progressive-delivery.md b/docs/modules/release-orchestrator/modules/progressive-delivery.md new file mode 100644 index 000000000..03c02fe1f --- /dev/null +++ b/docs/modules/release-orchestrator/modules/progressive-delivery.md @@ -0,0 +1,471 @@ +# PROGDL: Progressive Delivery + +**Purpose**: A/B releases, canary deployments, and traffic management. + +## Architecture Overview + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ PROGRESSIVE DELIVERY ARCHITECTURE │ +│ │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ A/B RELEASE MANAGER │ │ +│ │ │ │ +│ │ - Create A/B release with variations │ │ +│ │ - Manage traffic split configuration │ │ +│ │ - Coordinate rollout stages │ │ +│ │ - Handle promotion/rollback │ │ +│ └──────────────────────────────┬──────────────────────────────────────┘ │ +│ │ │ +│ ┌──────────────────┴──────────────────┐ │ +│ │ │ │ +│ ▼ ▼ │ +│ ┌───────────────────────┐ ┌───────────────────────┐ │ +│ │ TARGET-GROUP A/B │ │ ROUTER-BASED A/B │ │ +│ │ │ │ │ │ +│ │ Deploy to groups │ │ Configure traffic │ │ +│ │ by labels/membership │ │ via load balancer │ │ +│ │ │ │ │ │ +│ │ Good for: │ │ Good for: │ │ +│ │ - Background workers │ │ - Web/API traffic │ │ +│ │ - Batch processors │ │ - Customer-facing │ │ +│ │ - Internal services │ │ - L7 routing │ │ +│ └───────────────────────┘ └───────────────────────┘ │ +│ │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ CANARY CONTROLLER │ │ +│ │ │ │ +│ │ - Execute rollout stages │ │ +│ │ - Monitor health metrics │ │ +│ │ - Auto-advance or pause │ │ +│ │ - Trigger rollback on failure │ │ +│ 
└─────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ TRAFFIC ROUTER INTEGRATION │ │ +│ │ │ │ +│ │ Plugin-based integration with: │ │ +│ │ - Nginx (config generation + reload) │ │ +│ │ - HAProxy (config generation + reload) │ │ +│ │ - Traefik (dynamic config API) │ │ +│ │ - AWS ALB (target group weights) │ │ +│ │ - Custom (webhook) │ │ +│ └─────────────────────────────────────────────────────────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +## Modules + +### Module: `ab-manager` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | A/B release lifecycle; variation management | +| **Dependencies** | `release-manager`, `environment-manager`, `deploy-orchestrator` | +| **Data Entities** | `ABRelease`, `Variation`, `TrafficSplit` | +| **Events Produced** | `ab.created`, `ab.started`, `ab.stage_advanced`, `ab.promoted`, `ab.rolled_back` | + +**A/B Release Entity**: +```typescript +interface ABRelease { + id: UUID; + tenantId: UUID; + environmentId: UUID; + name: string; + variations: Variation[]; + activeVariation: string; // "A" or "B" + trafficSplit: TrafficSplit; + rolloutStrategy: RolloutStrategy; + status: ABReleaseStatus; + createdAt: DateTime; + completedAt: DateTime | null; + createdBy: UUID; +} + +interface Variation { + name: string; // "A", "B" + releaseId: UUID; + targetGroupId: UUID | null; // for target-group based A/B + trafficPercentage: number; + deploymentJobId: UUID | null; +} + +interface TrafficSplit { + type: "percentage" | "sticky" | "header"; + percentages: Record; // {"A": 90, "B": 10} + stickyKey?: string; // cookie or header name + headerMatch?: { // for header-based routing + header: string; + values: Record; // value -> variation + }; +} + +type ABReleaseStatus = + | "created" // Configured, not started + | "deploying" // Deploying variations + | 
"running" // Active with traffic split + | "promoting" // Promoting winner to 100% + | "completed" // Successfully completed + | "rolled_back"; // Rolled back to original +``` + +**A/B Release Models**: + +| Model | Description | Use Case | +|-------|-------------|----------| +| **Target-Group A/B** | Deploy different releases to different target groups | Background workers, internal services | +| **Router-Based A/B** | Use load balancer to split traffic | Web/API traffic, customer-facing | +| **Hybrid A/B** | Combination of both | Complex deployments | + +--- + +### Module: `traffic-router` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Router plugin orchestration; traffic shifting | +| **Dependencies** | `integration-manager`, `connector-runtime` | +| **Protocol** | Plugin-specific (API calls, config generation) | + +**Router Connector Interface**: +```typescript +interface RouterConnector extends BaseConnector { + // Traffic management + configureRoute(config: RouteConfig): Promise; + getTrafficDistribution(): Promise; + shiftTraffic(from: string, to: string, percentage: number): Promise; + + // Configuration + reloadConfig(): Promise; + validateConfig(config: string): Promise; +} + +interface RouteConfig { + upstream: string; + backends: Array<{ + name: string; + targets: string[]; + weight: number; + }>; + healthCheck?: { + path: string; + interval: number; + timeout: number; + }; +} + +interface TrafficDistribution { + backends: Array<{ + name: string; + weight: number; + healthyTargets: number; + totalTargets: number; + }>; + timestamp: DateTime; +} +``` + +**Router Plugins**: + +| Plugin | Capabilities | +|--------|-------------| +| `router.nginx` | Config generation, reload via signal/API | +| `router.haproxy` | Config generation, reload via socket | +| `router.traefik` | Dynamic config API | +| `router.aws_alb` | Target group weights via AWS API | +| `router.custom` | Webhook-based custom integration | + +--- + +### 
Module: `canary-controller` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Canary ramp automation; health monitoring | +| **Dependencies** | `ab-manager`, `traffic-router` | +| **Data Entities** | `CanaryStage`, `HealthResult` | +| **Events Produced** | `canary.stage_started`, `canary.stage_passed`, `canary.stage_failed` | + +**Canary Stage Entity**: +```typescript +interface CanaryStage { + id: UUID; + abReleaseId: UUID; + stageNumber: number; + trafficPercentage: number; + status: CanaryStageStatus; + healthThreshold: number; // Required health % to pass + durationSeconds: number; // How long to run stage + requireApproval: boolean; // Require manual approval + startedAt: DateTime | null; + completedAt: DateTime | null; + healthResult: HealthResult | null; +} + +type CanaryStageStatus = + | "pending" + | "running" + | "succeeded" + | "failed" + | "skipped"; + +interface HealthResult { + healthy: boolean; + healthPercentage: number; + metrics: { + successRate: number; + errorRate: number; + latencyP50: number; + latencyP99: number; + }; + samples: number; + evaluatedAt: DateTime; +} +``` + +**Canary Rollout Execution**: +```typescript +class CanaryController { + async executeRollout(abRelease: ABRelease): Promise { + const stages = abRelease.rolloutStrategy.stages; + + for (const stage of stages) { + this.log(`Starting canary stage ${stage.stageNumber}: ${stage.trafficPercentage}%`); + + // 1. Shift traffic to canary percentage + await this.trafficRouter.shiftTraffic( + abRelease.variations[0].name, // baseline + abRelease.variations[1].name, // canary + stage.trafficPercentage + ); + + // 2. Update stage status + stage.status = "running"; + stage.startedAt = new Date(); + await this.save(stage); + + // 3. Wait for stage duration + await this.waitForDuration(stage.durationSeconds); + + // 4. 
Evaluate health
+      const healthResult = await this.evaluateHealth(abRelease, stage);
+      stage.healthResult = healthResult;
+
+      if (!healthResult.healthy || healthResult.healthPercentage < stage.healthThreshold) {
+        stage.status = "failed";
+        await this.save(stage);
+
+        // Rollback
+        await this.rollback(abRelease);
+        throw new CanaryFailedError(`Stage ${stage.stageNumber} failed health check`);
+      }
+
+      // 5. Check if approval required
+      if (stage.requireApproval) {
+        await this.waitForApproval(abRelease, stage);
+      }
+
+      stage.status = "succeeded";
+      stage.completedAt = new Date();
+      await this.save(stage);
+
+      // 6. Check for auto-advance
+      if (!abRelease.rolloutStrategy.autoAdvance) {
+        await this.waitForManualAdvance(abRelease);
+      }
+    }
+
+    // All stages passed - promote canary to 100%
+    await this.promote(abRelease, abRelease.variations[1].name);
+  }
+
+  private async evaluateHealth(abRelease: ABRelease, stage: CanaryStage): Promise<HealthResult> {
+    // Collect metrics from targets
+    const canaryVariation = abRelease.variations.find(v => v.name === "B");
+    const targets = await this.getTargets(canaryVariation.targetGroupId);
+
+    let healthyCount = 0;
+    let totalLatency = 0;
+    let errorCount = 0;
+
+    for (const target of targets) {
+      const health = await this.checkTargetHealth(target);
+      if (health.healthy) healthyCount++;
+      totalLatency += health.latencyMs;
+      if (health.errorRate > 0) errorCount++;
+    }
+
+    return {
+      healthy: healthyCount >= targets.length * (stage.healthThreshold / 100),
+      healthPercentage: (healthyCount / targets.length) * 100,
+      metrics: {
+        successRate: ((targets.length - errorCount) / targets.length) * 100,
+        errorRate: (errorCount / targets.length) * 100,
+        latencyP50: totalLatency / targets.length,
+        latencyP99: totalLatency / targets.length * 1.5, // simplified
+      },
+      samples: targets.length,
+      evaluatedAt: new Date(),
+    };
+  }
+}
+```
+
+---
+
+### Module: `rollout-strategy`
+
+| Aspect | Specification |
+|--------|---------------|
+| **Responsibility** 
| Strategy templates; configuration | +| **Data Entities** | `RolloutStrategyTemplate` | + +**Built-in Strategy Templates**: + +| Template | Stages | Description | +|----------|--------|-------------| +| `canary-10-25-50-100` | 4 | Standard canary: 10%, 25%, 50%, 100% | +| `canary-1-5-10-50-100` | 5 | Conservative: 1%, 5%, 10%, 50%, 100% | +| `blue-green-instant` | 2 | Deploy 100% to green, instant switch | +| `blue-green-gradual` | 4 | Gradual shift: 25%, 50%, 75%, 100% | + +**Rollout Strategy Definition**: +```typescript +interface RolloutStrategy { + id: UUID; + name: string; + stages: Array<{ + trafficPercentage: number; + durationSeconds: number; + healthThreshold: number; + requireApproval: boolean; + }>; + autoAdvance: boolean; + rollbackOnFailure: boolean; + healthCheckInterval: number; +} + +// Example: Standard Canary +const standardCanary: RolloutStrategy = { + name: "canary-10-25-50-100", + stages: [ + { trafficPercentage: 10, durationSeconds: 300, healthThreshold: 95, requireApproval: false }, + { trafficPercentage: 25, durationSeconds: 600, healthThreshold: 95, requireApproval: false }, + { trafficPercentage: 50, durationSeconds: 900, healthThreshold: 95, requireApproval: true }, + { trafficPercentage: 100, durationSeconds: 0, healthThreshold: 95, requireApproval: false }, + ], + autoAdvance: true, + rollbackOnFailure: true, + healthCheckInterval: 30, +}; +``` + +--- + +## Database Schema + +```sql +-- A/B Releases +CREATE TABLE release.ab_releases ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + environment_id UUID NOT NULL REFERENCES release.environments(id), + name VARCHAR(255) NOT NULL, + variations JSONB NOT NULL, -- [{name, releaseId, targetGroupId, trafficPercentage}] + active_variation VARCHAR(50) NOT NULL DEFAULT 'A', + traffic_split JSONB NOT NULL, + rollout_strategy JSONB NOT NULL, + status VARCHAR(50) NOT NULL DEFAULT 'created' CHECK (status IN ( + 'created', 
'deploying', 'running', 'promoting', 'completed', 'rolled_back' + )), + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + completed_at TIMESTAMPTZ, + created_by UUID REFERENCES users(id) +); + +CREATE INDEX idx_ab_releases_tenant_env ON release.ab_releases(tenant_id, environment_id); +CREATE INDEX idx_ab_releases_status ON release.ab_releases(status); + +-- Canary Stages +CREATE TABLE release.canary_stages ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + ab_release_id UUID NOT NULL REFERENCES release.ab_releases(id) ON DELETE CASCADE, + stage_number INTEGER NOT NULL, + traffic_percentage INTEGER NOT NULL, + status VARCHAR(50) NOT NULL DEFAULT 'pending' CHECK (status IN ( + 'pending', 'running', 'succeeded', 'failed', 'skipped' + )), + health_threshold DECIMAL(5,2), + duration_seconds INTEGER, + require_approval BOOLEAN NOT NULL DEFAULT FALSE, + started_at TIMESTAMPTZ, + completed_at TIMESTAMPTZ, + health_result JSONB, + UNIQUE (ab_release_id, stage_number) +); +``` + +--- + +## API Endpoints + +```yaml +# A/B Releases +POST /api/v1/ab-releases + Body: { + environmentId: UUID, + name: string, + variations: [ + { name: "A", releaseId: UUID, targetGroupId?: UUID }, + { name: "B", releaseId: UUID, targetGroupId?: UUID } + ], + trafficSplit: TrafficSplit, + rolloutStrategy: RolloutStrategy + } + Response: ABRelease + +GET /api/v1/ab-releases + Query: ?environmentId={uuid}&status={status} + Response: ABRelease[] + +GET /api/v1/ab-releases/{id} + Response: ABRelease (with stages) + +POST /api/v1/ab-releases/{id}/start + Response: ABRelease + +POST /api/v1/ab-releases/{id}/advance + Body: { stageNumber?: number } # advance to next or specific stage + Response: ABRelease + +POST /api/v1/ab-releases/{id}/promote + Body: { variation: "A" | "B" } # promote to 100% + Response: ABRelease + +POST /api/v1/ab-releases/{id}/rollback + Response: ABRelease + +GET /api/v1/ab-releases/{id}/traffic + Response: { currentSplit: TrafficDistribution, history: TrafficHistory[] } + +GET 
/api/v1/ab-releases/{id}/health + Response: { variations: [{ name, healthStatus, metrics }] } + +# Rollout Strategies +GET /api/v1/rollout-strategies + Response: RolloutStrategyTemplate[] + +GET /api/v1/rollout-strategies/{id} + Response: RolloutStrategyTemplate +``` + +--- + +## References + +- [Module Overview](overview.md) +- [Deploy Orchestrator](deploy-orchestrator.md) +- [A/B Releases](../progressive-delivery/ab-releases.md) +- [Canary Controller](../progressive-delivery/canary.md) +- [Router Plugins](../progressive-delivery/routers.md) diff --git a/docs/modules/release-orchestrator/modules/promotion-manager.md b/docs/modules/release-orchestrator/modules/promotion-manager.md new file mode 100644 index 000000000..40e331f4e --- /dev/null +++ b/docs/modules/release-orchestrator/modules/promotion-manager.md @@ -0,0 +1,433 @@ +# PROMOT: Promotion & Approval Manager + +**Purpose**: Manage promotion requests, approvals, gates, and decision records. + +## Modules + +### Module: `promotion-manager` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Promotion request lifecycle; state management | +| **Dependencies** | `release-manager`, `environment-manager`, `workflow-engine` | +| **Data Entities** | `Promotion`, `PromotionState` | +| **Events Produced** | `promotion.requested`, `promotion.approved`, `promotion.rejected`, `promotion.started`, `promotion.completed`, `promotion.failed`, `promotion.rolled_back` | + +**Key Operations**: +``` +RequestPromotion(releaseId, targetEnvironmentId, reason) → Promotion +ApprovePromotion(promotionId, comment) → Promotion +RejectPromotion(promotionId, reason) → Promotion +CancelPromotion(promotionId) → Promotion +GetPromotionStatus(promotionId) → PromotionState +GetDecisionRecord(promotionId) → DecisionRecord +``` + +**Promotion Entity**: +```typescript +interface Promotion { + id: UUID; + tenantId: UUID; + releaseId: UUID; + sourceEnvironmentId: UUID | null; // null for first deployment + 
targetEnvironmentId: UUID; + status: PromotionStatus; + decisionRecord: DecisionRecord; + workflowRunId: UUID | null; + requestedAt: DateTime; + requestedBy: UUID; + requestReason: string; + decidedAt: DateTime | null; + startedAt: DateTime | null; + completedAt: DateTime | null; + evidencePacketId: UUID | null; +} + +type PromotionStatus = + | "pending_approval" // Waiting for human approval + | "pending_gate" // Waiting for gate evaluation + | "approved" // Ready for deployment + | "rejected" // Blocked by approval or gate + | "deploying" // Deployment in progress + | "deployed" // Successfully deployed + | "failed" // Deployment failed + | "cancelled" // User cancelled + | "rolled_back"; // Rolled back after failure +``` + +--- + +### Module: `approval-gateway` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Approval collection; separation of duties enforcement | +| **Dependencies** | `authority` (for user/group lookup) | +| **Data Entities** | `Approval`, `ApprovalPolicy` | +| **Events Produced** | `approval.granted`, `approval.denied` | + +**Approval Policy Entity**: +```typescript +interface ApprovalPolicy { + id: UUID; + tenantId: UUID; + environmentId: UUID; + requiredCount: number; // Minimum approvals required + requiredRoles: string[]; // At least one approver must have role + requiredGroups: string[]; // At least one approver must be in group + requireSeparationOfDuties: boolean; // Requester cannot approve + allowSelfApproval: boolean; // Override SoD for specific users + expirationMinutes: number; // Approval expires after N minutes +} + +interface Approval { + id: UUID; + tenantId: UUID; + promotionId: UUID; + approverId: UUID; + action: "approved" | "rejected"; + comment: string; + approvedAt: DateTime; + approverRole: string; + approverGroups: string[]; +} +``` + +**Separation of Duties (SoD) Rules**: +1. Requester cannot approve their own promotion (if `requireSeparationOfDuties` is true) +2. 
Same user cannot approve twice
+3. At least N different users must approve (based on `requiredCount`)
+4. At least one approver must match `requiredRoles` if specified
+5. At least one approver must be in `requiredGroups` if specified
+
+---
+
+### Module: `decision-engine`
+
+| Aspect | Specification |
+|--------|---------------|
+| **Responsibility** | Gate evaluation; policy integration; decision record generation |
+| **Dependencies** | `gate-registry`, `policy` (OPA integration), `scanner` (security data) |
+| **Data Entities** | `DecisionRecord`, `GateResult` |
+| **Events Produced** | `decision.evaluated`, `decision.recorded` |
+
+**Decision Record Structure**:
+```typescript
+interface DecisionRecord {
+  promotionId: UUID;
+  evaluatedAt: DateTime;
+  decision: "allow" | "deny" | "pending";
+
+  // What was evaluated
+  release: {
+    id: UUID;
+    name: string;
+    components: Array<{
+      name: string;
+      digest: string;
+      semver: string;
+    }>;
+  };
+
+  environment: {
+    id: UUID;
+    name: string;
+    requiredApprovals: number;
+    freezeWindow: boolean;
+  };
+
+  // Gate evaluation results
+  gates: GateResult[];
+
+  // Approval status
+  approvalStatus: {
+    required: number;
+    received: number;
+    approvers: Array<{
+      userId: UUID;
+      action: string;
+      at: DateTime;
+    }>;
+    sodViolation: boolean;
+  };
+
+  // Reason for decision
+  reasons: string[];
+
+  // Hash of all inputs for replay verification
+  inputsHash: string;
+}
+
+interface GateResult {
+  gateType: string;
+  gateName: string;
+  status: "passed" | "failed" | "warning" | "skipped";
+  message: string;
+  details: Record<string, any>;
+  evaluatedAt: DateTime;
+  durationMs: number;
+}
+```
+
+**Gate Evaluation Order**:
+1. **Freeze Window Check**: Is environment in freeze?
+2. **Approval Check**: All required approvals received?
+3. **Security Gate**: No blocking vulnerabilities?
+4. **Custom Policy Gates**: All OPA policies pass?
+5. **Integration Gates**: External system checks pass? 
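
The ordering above can be sketched as a short-circuiting pipeline: evaluate each gate in sequence, record a result for it, and deny the promotion as soon as a blocking gate fails. The evaluator functions and sample statuses below are illustrative assumptions, not part of the specification:

```typescript
type GateStatus = "passed" | "failed" | "warning" | "skipped";

interface SimpleGateResult {
  gateType: string;
  status: GateStatus;
  message: string;
}

// Hypothetical evaluator shape; real gates would receive promotion context.
interface GateEvaluator {
  type: string;
  blocking: boolean;
  evaluate: () => GateStatus;
}

// Gates listed in the order the decision engine applies them.
const gates: GateEvaluator[] = [
  { type: "freeze-window", blocking: true, evaluate: () => "passed" },
  { type: "approval", blocking: true, evaluate: () => "passed" },
  { type: "security-scan", blocking: true, evaluate: () => "failed" }, // simulate a blocking CVE
  { type: "custom-opa", blocking: true, evaluate: () => "passed" },
  { type: "webhook", blocking: false, evaluate: () => "passed" },
];

function evaluateGates(
  ordered: GateEvaluator[]
): { decision: "allow" | "deny"; results: SimpleGateResult[] } {
  const results: SimpleGateResult[] = [];
  for (const gate of ordered) {
    const status = gate.evaluate();
    results.push({ gateType: gate.type, status, message: `${gate.type}: ${status}` });
    // A failing blocking gate denies the promotion; remaining gates are not run.
    if (gate.blocking && status === "failed") {
      return { decision: "deny", results };
    }
  }
  return { decision: "allow", results };
}

const record = evaluateGates(gates);
```

With the sample statuses above, the security gate fails, so the promotion is denied after three gates and the OPA and webhook gates never run.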
+ +--- + +### Module: `gate-registry` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Built-in + custom gate registration | +| **Dependencies** | `plugin-registry` | +| **Data Entities** | `GateDefinition`, `GateConfig` | + +**Built-in Gates**: + +| Gate Type | Description | +|-----------|-------------| +| `freeze-window` | Check if environment is in freeze | +| `approval` | Check if required approvals received | +| `security-scan` | Check for blocking vulnerabilities | +| `scan-freshness` | Check if scan is recent enough | +| `digest-verification` | Verify digests haven't changed | +| `environment-sequence` | Enforce promotion order | +| `custom-opa` | Custom OPA/Rego policy | +| `webhook` | External webhook gate | + +**Gate Definition**: +```typescript +interface GateDefinition { + type: string; + displayName: string; + description: string; + configSchema: JSONSchema; + evaluator: "builtin" | UUID; // builtin or plugin ID + blocking: boolean; // Can block promotion + cacheable: boolean; // Can cache result + cacheTtlSeconds: number; +} +``` + +--- + +## Promotion State Machine + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ PROMOTION STATE MACHINE │ +│ │ +│ ┌───────────────┐ │ +│ │ REQUESTED │ ◄──── User requests promotion │ +│ └───────┬───────┘ │ +│ │ │ +│ ▼ │ +│ ┌───────────────┐ ┌───────────────┐ │ +│ │ PENDING │─────►│ REJECTED │ ◄──── Approver rejects │ +│ │ APPROVAL │ └───────────────┘ │ +│ └───────┬───────┘ │ +│ │ approval received │ +│ ▼ │ +│ ┌───────────────┐ ┌───────────────┐ │ +│ │ PENDING │─────►│ REJECTED │ ◄──── Gate fails │ +│ │ GATE │ └───────────────┘ │ +│ └───────┬───────┘ │ +│ │ all gates pass │ +│ ▼ │ +│ ┌───────────────┐ │ +│ │ APPROVED │ ◄──── Ready for deployment │ +│ └───────┬───────┘ │ +│ │ workflow starts │ +│ ▼ │ +│ ┌───────────────┐ ┌───────────────┐ ┌───────────────┐ │ +│ │ DEPLOYING │─────►│ FAILED │─────►│ ROLLED_BACK │ │ +│ └───────┬───────┘ 
└───────────────┘ └───────────────┘ │ +│ │ │ +│ │ deployment complete │ +│ ▼ │ +│ ┌───────────────┐ │ +│ │ DEPLOYED │ ◄──── Success! │ +│ └───────────────┘ │ +│ │ +│ Additional transitions: │ +│ - Any non-terminal → CANCELLED: user cancels │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +--- + +## Database Schema + +```sql +-- Promotions +CREATE TABLE release.promotions ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + release_id UUID NOT NULL REFERENCES release.releases(id), + source_environment_id UUID REFERENCES release.environments(id), + target_environment_id UUID NOT NULL REFERENCES release.environments(id), + status VARCHAR(50) NOT NULL DEFAULT 'pending_approval' CHECK (status IN ( + 'pending_approval', 'pending_gate', 'approved', 'rejected', + 'deploying', 'deployed', 'failed', 'cancelled', 'rolled_back' + )), + decision_record JSONB, + workflow_run_id UUID REFERENCES release.workflow_runs(id), + requested_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + requested_by UUID NOT NULL REFERENCES users(id), + request_reason TEXT, + decided_at TIMESTAMPTZ, + started_at TIMESTAMPTZ, + completed_at TIMESTAMPTZ, + evidence_packet_id UUID +); + +CREATE INDEX idx_promotions_tenant ON release.promotions(tenant_id); +CREATE INDEX idx_promotions_release ON release.promotions(release_id); +CREATE INDEX idx_promotions_status ON release.promotions(status); +CREATE INDEX idx_promotions_target_env ON release.promotions(target_environment_id); + +-- Approvals +CREATE TABLE release.approvals ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + promotion_id UUID NOT NULL REFERENCES release.promotions(id) ON DELETE CASCADE, + approver_id UUID NOT NULL REFERENCES users(id), + action VARCHAR(50) NOT NULL CHECK (action IN ('approved', 'rejected')), + comment TEXT, + approved_at TIMESTAMPTZ NOT NULL DEFAULT 
NOW(), + approver_role VARCHAR(255), + approver_groups JSONB NOT NULL DEFAULT '[]' +); + +CREATE INDEX idx_approvals_promotion ON release.approvals(promotion_id); +CREATE INDEX idx_approvals_approver ON release.approvals(approver_id); + +-- Approval Policies +CREATE TABLE release.approval_policies ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + environment_id UUID NOT NULL REFERENCES release.environments(id) ON DELETE CASCADE, + required_count INTEGER NOT NULL DEFAULT 1, + required_roles JSONB NOT NULL DEFAULT '[]', + required_groups JSONB NOT NULL DEFAULT '[]', + require_sod BOOLEAN NOT NULL DEFAULT FALSE, + allow_self_approval BOOLEAN NOT NULL DEFAULT FALSE, + expiration_minutes INTEGER NOT NULL DEFAULT 1440, + UNIQUE (tenant_id, environment_id) +); +``` + +--- + +## API Endpoints + +```yaml +# Promotions +POST /api/v1/promotions + Body: { releaseId, targetEnvironmentId, reason? } + Response: Promotion + +GET /api/v1/promotions + Query: ?status={status}&releaseId={uuid}&environmentId={uuid}&page={n} + Response: { data: Promotion[], meta: PaginationMeta } + +GET /api/v1/promotions/{id} + Response: Promotion (with decision record, approvals) + +POST /api/v1/promotions/{id}/approve + Body: { comment? 
} + Response: Promotion + +POST /api/v1/promotions/{id}/reject + Body: { reason } + Response: Promotion + +POST /api/v1/promotions/{id}/cancel + Response: Promotion + +GET /api/v1/promotions/{id}/decision + Response: DecisionRecord + +GET /api/v1/promotions/{id}/approvals + Response: Approval[] + +GET /api/v1/promotions/{id}/evidence + Response: EvidencePacket + +# Gate Evaluation Preview +POST /api/v1/promotions/preview-gates + Body: { releaseId, targetEnvironmentId } + Response: { wouldPass: boolean, gates: GateResult[] } + +# Approval Policies +POST /api/v1/approval-policies +GET /api/v1/approval-policies +GET /api/v1/approval-policies/{id} +PUT /api/v1/approval-policies/{id} +DELETE /api/v1/approval-policies/{id} + +# Pending Approvals (for current user) +GET /api/v1/my/pending-approvals + Response: Promotion[] +``` + +--- + +## Security Gate Integration + +The security gate evaluates the release against vulnerability data from the Scanner module: + +```typescript +interface SecurityGateConfig { + blockOnCritical: boolean; // Block if any critical severity + blockOnHigh: boolean; // Block if any high severity + maxCritical: number; // Max allowed critical (0 for strict) + maxHigh: number; // Max allowed high + requireFreshScan: boolean; // Require scan within N hours + scanFreshnessHours: number; // How recent scan must be + allowExceptions: boolean; // Allow VEX exceptions + requireVexJustification: boolean; // Require VEX for exceptions +} + +interface SecurityGateResult { + passed: boolean; + summary: { + critical: number; + high: number; + medium: number; + low: number; + }; + blocking: Array<{ + cve: string; + severity: string; + component: string; + digest: string; + fixAvailable: boolean; + }>; + exceptions: Array<{ + cve: string; + vexStatus: string; + justification: string; + }>; + scanAge: { + component: string; + scannedAt: DateTime; + ageHours: number; + fresh: boolean; + }[]; +} +``` + +--- + +## References + +- [Module Overview](overview.md) +- 
[Workflow Engine](workflow-engine.md)
+- [Security Architecture](../security/overview.md)
+- [API Documentation](../api/promotions.md)
diff --git a/docs/modules/release-orchestrator/modules/release-manager.md b/docs/modules/release-orchestrator/modules/release-manager.md
new file mode 100644
index 000000000..c43b68f47
--- /dev/null
+++ b/docs/modules/release-orchestrator/modules/release-manager.md
@@ -0,0 +1,406 @@
+# RELMAN: Release Management
+
+**Purpose**: Manage components, versions, and release bundles.
+
+## Modules
+
+### Module: `component-registry`
+
+| Aspect | Specification |
+|--------|---------------|
+| **Responsibility** | Map image repositories to logical components |
+| **Dependencies** | `integration-manager` (for registry access) |
+| **Data Entities** | `Component`, `ComponentVersion` |
+| **Events Produced** | `component.created`, `component.updated`, `component.deleted` |
+
+**Key Operations**:
+```
+CreateComponent(name, displayName, imageRepository, registryId) → Component
+UpdateComponent(id, config) → Component
+DeleteComponent(id) → void
+SyncVersions(componentId, forceRefresh) → VersionMap[]
+ListComponents(tenantId) → Component[]
+```
+
+**Component Entity**:
+```typescript
+interface Component {
+  id: UUID;
+  tenantId: UUID;
+  name: string; // "api", "worker", "frontend"
+  displayName: string; // "API Service"
+  imageRepository: string; // "registry.example.com/myapp/api"
+  registryIntegrationId: UUID; // which registry integration
+  versioningStrategy: VersionStrategy;
+  deploymentTemplate: string; // which workflow template to use
+  defaultChannel: string; // "stable", "beta"
+  metadata: Record<string, any>;
+}
+
+interface VersionStrategy {
+  type: "semver" | "date" | "sequential" | "manual";
+  tagPattern?: string; // regex for tag extraction
+  semverExtract?: string; // regex capture group
+}
+```
+
+---
+
+### Module: `version-manager`
+
+| Aspect | Specification |
+|--------|---------------|
+| **Responsibility** | Tag/digest mapping; 
version rules |
+| **Dependencies** | `component-registry`, `connector-runtime` |
+| **Data Entities** | `VersionMap`, `VersionRule`, `Channel` |
+| **Events Produced** | `version.resolved`, `version.updated` |
+
+**Version Resolution**:
+```typescript
+interface VersionMap {
+  id: UUID;
+  componentId: UUID;
+  tag: string; // "v2.3.1"
+  digest: string; // "sha256:abc123..."
+  semver: string; // "2.3.1"
+  channel: string; // "stable"
+  prerelease: boolean;
+  buildMetadata: string;
+  resolvedAt: DateTime;
+  source: "auto" | "manual";
+}
+
+interface VersionRule {
+  id: UUID;
+  componentId: UUID;
+  pattern: string; // "^v(\\d+\\.\\d+\\.\\d+)$"
+  channel: string; // "stable"
+  prereleasePattern: string; // ".*-(alpha|beta|rc).*"
+}
+```
+
+**Version Resolution Algorithm**:
+1. Fetch tags from registry (via connector)
+2. Apply version rules to extract semver
+3. Resolve each tag to digest
+4. Store in version map
+5. Update channels ("latest stable", "latest beta")
+
+---
+
+### Module: `release-manager`
+
+| Aspect | Specification |
+|--------|---------------|
+| **Responsibility** | Release bundle lifecycle; composition |
+| **Dependencies** | `component-registry`, `version-manager` |
+| **Data Entities** | `Release`, `ReleaseComponent` |
+| **Events Produced** | `release.created`, `release.promoted`, `release.deprecated` |
+
+**Release Entity**:
+```typescript
+interface Release {
+  id: UUID;
+  tenantId: UUID;
+  name: string; // "myapp-v2.3.1"
+  displayName: string; // "MyApp 2.3.1"
+  components: ReleaseComponent[];
+  sourceRef: SourceReference;
+  status: ReleaseStatus;
+  createdAt: DateTime;
+  createdBy: UUID;
+  deployedEnvironments: UUID[]; // where currently deployed
+  metadata: Record<string, any>;
+}
+
+interface ReleaseComponent {
+  componentId: UUID;
+  componentName: string;
+  digest: string; // sha256:... 
+ semver: string; // resolved semver + tag: string; // original tag (for display) + role: "primary" | "sidecar" | "init" | "migration"; +} + +interface SourceReference { + scmIntegrationId?: UUID; + commitSha?: string; + branch?: string; + ciIntegrationId?: UUID; + buildId?: string; + pipelineUrl?: string; +} + +type ReleaseStatus = + | "draft" // being composed + | "ready" // ready for promotion + | "promoting" // promotion in progress + | "deployed" // deployed to at least one env + | "deprecated" // marked as deprecated + | "archived"; // no longer active +``` + +**Release Creation Modes**: + +| Mode | Description | +|------|-------------| +| **Full Release** | All components, latest versions | +| **Partial Release** | Subset of components updated; others pinned from last deployment | +| **Pinned Release** | All versions explicitly specified | +| **Channel Release** | All components from specific channel ("beta") | + +--- + +### Module: `release-catalog` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Release history, search, comparison | +| **Dependencies** | `release-manager` | + +**Key Operations**: +``` +SearchReleases(filter, pagination) → Release[] +CompareReleases(releaseA, releaseB) → ReleaseDiff +GetReleaseHistory(componentId) → Release[] +GetReleaseLineage(releaseId) → ReleaseLineage // promotion path +``` + +**Release Comparison**: +```typescript +interface ReleaseDiff { + releaseA: UUID; + releaseB: UUID; + added: ComponentDiff[]; // Components in B not in A + removed: ComponentDiff[]; // Components in A not in B + changed: ComponentChange[]; // Components with different versions + unchanged: ComponentDiff[]; // Components with same version +} + +interface ComponentChange { + componentId: UUID; + componentName: string; + fromVersion: string; + toVersion: string; + fromDigest: string; + toDigest: string; +} +``` + +--- + +## Database Schema + +```sql +-- Components +CREATE TABLE release.components ( + id UUID PRIMARY 
KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + name VARCHAR(255) NOT NULL, + display_name VARCHAR(255) NOT NULL, + image_repository VARCHAR(500) NOT NULL, + registry_integration_id UUID REFERENCES release.integrations(id), + versioning_strategy JSONB NOT NULL DEFAULT '{"type": "semver"}', + deployment_template VARCHAR(255), + default_channel VARCHAR(50) NOT NULL DEFAULT 'stable', + metadata JSONB NOT NULL DEFAULT '{}', + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + UNIQUE (tenant_id, name) +); + +CREATE INDEX idx_components_tenant ON release.components(tenant_id); + +-- Version Maps +CREATE TABLE release.version_maps ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + component_id UUID NOT NULL REFERENCES release.components(id) ON DELETE CASCADE, + tag VARCHAR(255) NOT NULL, + digest VARCHAR(100) NOT NULL, + semver VARCHAR(50), + channel VARCHAR(50) NOT NULL DEFAULT 'stable', + prerelease BOOLEAN NOT NULL DEFAULT FALSE, + build_metadata VARCHAR(255), + resolved_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + source VARCHAR(50) NOT NULL DEFAULT 'auto', + UNIQUE (tenant_id, component_id, digest) +); + +CREATE INDEX idx_version_maps_component ON release.version_maps(component_id); +CREATE INDEX idx_version_maps_digest ON release.version_maps(digest); +CREATE INDEX idx_version_maps_semver ON release.version_maps(semver); + +-- Releases +CREATE TABLE release.releases ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + name VARCHAR(255) NOT NULL, + display_name VARCHAR(255) NOT NULL, + components JSONB NOT NULL, -- [{componentId, digest, semver, tag, role}] + source_ref JSONB, -- {scmIntegrationId, commitSha, ciIntegrationId, buildId} + status VARCHAR(50) NOT NULL DEFAULT 'draft', + metadata JSONB NOT NULL DEFAULT '{}', + created_at 
TIMESTAMPTZ NOT NULL DEFAULT NOW(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + created_by UUID REFERENCES users(id), + UNIQUE (tenant_id, name) +); + +CREATE INDEX idx_releases_tenant ON release.releases(tenant_id); +CREATE INDEX idx_releases_status ON release.releases(status); +CREATE INDEX idx_releases_created ON release.releases(created_at DESC); + +-- Release Environment State +CREATE TABLE release.release_environment_state ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + environment_id UUID NOT NULL REFERENCES release.environments(id) ON DELETE CASCADE, + release_id UUID NOT NULL REFERENCES release.releases(id), + status VARCHAR(50) NOT NULL, + deployed_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + deployed_by UUID REFERENCES users(id), + promotion_id UUID, + evidence_ref VARCHAR(255), + UNIQUE (tenant_id, environment_id) +); + +CREATE INDEX idx_release_env_state_env ON release.release_environment_state(environment_id); +CREATE INDEX idx_release_env_state_release ON release.release_environment_state(release_id); +``` + +--- + +## API Endpoints + +```yaml +# Components +POST /api/v1/components + Body: { name, displayName, imageRepository, registryIntegrationId, versioningStrategy?, defaultChannel? 
} + Response: Component + +GET /api/v1/components + Response: Component[] + +GET /api/v1/components/{id} + Response: Component + +PUT /api/v1/components/{id} + Response: Component + +DELETE /api/v1/components/{id} + Response: { deleted: true } + +POST /api/v1/components/{id}/sync-versions + Body: { forceRefresh?: boolean } + Response: { synced: number, versions: VersionMap[] } + +GET /api/v1/components/{id}/versions + Query: ?channel={stable|beta}&limit={n} + Response: VersionMap[] + +# Version Maps +POST /api/v1/version-maps + Body: { componentId, tag, semver, channel } # manual version assignment + Response: VersionMap + +GET /api/v1/version-maps + Query: ?componentId={uuid}&channel={channel} + Response: VersionMap[] + +# Releases +POST /api/v1/releases + Body: { + name: string, + displayName?: string, + components: [ + { componentId: UUID, version?: string, digest?: string, channel?: string } + ], + sourceRef?: SourceReference + } + Response: Release + +GET /api/v1/releases + Query: ?status={status}&componentId={uuid}&page={n}&pageSize={n} + Response: { data: Release[], meta: PaginationMeta } + +GET /api/v1/releases/{id} + Response: Release (with full component details) + +PUT /api/v1/releases/{id} + Body: { displayName?, metadata?, status? 
} + Response: Release + +DELETE /api/v1/releases/{id} + Response: { deleted: true } + +GET /api/v1/releases/{id}/state + Response: { environments: [{ environmentId, status, deployedAt }] } + +POST /api/v1/releases/{id}/deprecate + Response: Release + +GET /api/v1/releases/{id}/compare/{otherId} + Response: ReleaseDiff + +# Quick release creation +POST /api/v1/releases/from-latest + Body: { + name: string, + channel?: string, # default: stable + componentIds?: UUID[], # default: all + pinFrom?: { environmentId: UUID } # for partial release + } + Response: Release +``` + +--- + +## Release Identity: Digest-First Principle + +A core design invariant of the Release Orchestrator: + +``` +INVARIANT: A release is a set of OCI image digests (component -> digest mapping), never tags. +``` + +**Implementation Requirements**: +- Tags are convenience inputs for resolution +- Tags are resolved to digests at release creation time +- All downstream operations (promotion, deployment, rollback) use digests +- Digest mismatch at pull time = deployment failure (tamper detection) + +**Example**: +```json +{ + "id": "release-uuid", + "name": "myapp-v2.3.1", + "components": [ + { + "componentId": "api-component-uuid", + "componentName": "api", + "tag": "v2.3.1", + "digest": "sha256:abc123def456...", + "semver": "2.3.1", + "role": "primary" + }, + { + "componentId": "worker-component-uuid", + "componentName": "worker", + "tag": "v2.3.1", + "digest": "sha256:789xyz123abc...", + "semver": "2.3.1", + "role": "primary" + } + ] +} +``` + +--- + +## References + +- [Module Overview](overview.md) +- [Design Principles](../design/principles.md) +- [API Documentation](../api/releases.md) +- [Promotion Manager](promotion-manager.md) diff --git a/docs/modules/release-orchestrator/modules/workflow-engine.md b/docs/modules/release-orchestrator/modules/workflow-engine.md new file mode 100644 index 000000000..4dcafc894 --- /dev/null +++ b/docs/modules/release-orchestrator/modules/workflow-engine.md @@ 
-0,0 +1,590 @@ +# WORKFL: Workflow Engine + +**Purpose**: DAG-based workflow execution for deployments, approvals, and custom automation. + +## Modules + +### Module: `workflow-designer` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Template creation; DAG graph editor; validation | +| **Dependencies** | `step-registry` | +| **Data Entities** | `WorkflowTemplate`, `StepNode`, `StepEdge` | + +**Workflow Template Structure**: +```typescript +interface WorkflowTemplate { + id: UUID; + tenantId: UUID; + name: string; + displayName: string; + description: string; + version: number; + + // DAG structure + nodes: StepNode[]; + edges: StepEdge[]; + + // I/O + inputs: InputDefinition[]; + outputs: OutputDefinition[]; + + // Metadata + tags: string[]; + isBuiltin: boolean; + createdAt: DateTime; + createdBy: UUID; +} +``` + +--- + +### Module: `workflow-engine` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | DAG execution; state machine; pause/resume | +| **Dependencies** | `step-executor`, `step-registry` | +| **Data Entities** | `WorkflowRun`, `WorkflowState` | +| **Events Produced** | `workflow.started`, `workflow.paused`, `workflow.resumed`, `workflow.completed`, `workflow.failed` | + +**Workflow Execution Algorithm**: +```python +class WorkflowEngine: + def execute(self, workflow_run: WorkflowRun) -> None: + """Main workflow execution loop.""" + + # Initialize + workflow_run.status = "running" + workflow_run.started_at = now() + self.save(workflow_run) + + try: + while not self.is_terminal(workflow_run): + # Handle pause state + if workflow_run.status == "paused": + self.wait_for_resume(workflow_run) + continue + + # Get nodes ready for execution + ready_nodes = self.get_ready_nodes(workflow_run) + + if not ready_nodes: + # Check if we're waiting on approvals + if self.has_pending_approvals(workflow_run): + workflow_run.status = "paused" + self.save(workflow_run) + continue + + # Check if all nodes are 
complete
+                    if self.all_nodes_complete(workflow_run):
+                        break
+
+                    # Deadlock detection
+                    raise WorkflowDeadlockError(workflow_run.id)
+
+                # Execute ready nodes in parallel
+                futures = []
+                for node in ready_nodes:
+                    future = self.executor.submit(
+                        self.execute_node,
+                        workflow_run,
+                        node
+                    )
+                    futures.append((node, future))
+
+                # Wait for at least one to complete
+                completed = self.wait_any(futures)
+
+                for node, result in completed:
+                    step_run = self.get_step_run(workflow_run, node.id)
+
+                    if result.success:
+                        step_run.status = "succeeded"
+                        step_run.outputs = result.outputs
+                        self.propagate_outputs(workflow_run, node, result.outputs)
+                    else:
+                        step_run.status = "failed"
+                        step_run.error_message = result.error
+
+                        # Handle failure action
+                        if node.on_failure == "fail":
+                            workflow_run.status = "failed"
+                            workflow_run.error_message = f"Step {node.name} failed: {result.error}"
+                            workflow_run.completed_at = now()
+                            self.cancel_pending_steps(workflow_run)
+                            step_run.completed_at = now()
+                            self.save(step_run)
+                            self.save(workflow_run)
+                            return
+                        elif node.on_failure == "rollback":
+                            self.trigger_rollback(workflow_run, node)
+                        elif node.on_failure.startswith("goto:"):
+                            target = node.on_failure.split(":", 1)[1]
+                            self.add_ready_node(workflow_run, target)
+                        # "continue" just continues to the next nodes
+
+                    step_run.completed_at = now()
+                    self.save(step_run)
+
+            # Workflow completed successfully
+            workflow_run.status = "succeeded"
+            workflow_run.completed_at = now()
+            self.save(workflow_run)
+
+        except WorkflowCancelledError:
+            workflow_run.status = "cancelled"
+            workflow_run.completed_at = now()
+            self.save(workflow_run)
+        except Exception as e:
+            workflow_run.status = "failed"
+            workflow_run.error_message = str(e)
+            workflow_run.completed_at = now()
+            self.save(workflow_run)
+```
+
+---
+
+### Module: `step-executor`
+
+| Aspect | Specification |
+|--------|---------------|
+| **Responsibility** | Step dispatch; retry logic; timeout handling |
+| **Dependencies** | `step-registry`, `plugin-sandbox` |
+| **Data Entities** | `StepRun`, `StepResult` |
+| **Events Produced** | `step.started`, 
`step.progress`, `step.completed`, `step.failed`, `step.retrying` |
+
+**Step Node Structure**:
+```typescript
+interface StepNode {
+  id: string;                          // Unique within template (e.g., "deploy-api")
+  type: string;                        // Step type from registry
+  name: string;                        // Display name
+  config: Record<string, unknown>;     // Step-specific configuration
+  inputs: InputBinding[];              // Input value bindings
+  outputs: OutputBinding[];            // Output declarations
+  position: { x: number; y: number };  // UI position
+
+  // Execution settings
+  timeout: number;                     // Seconds (default from step type)
+  retryPolicy: RetryPolicy;
+  onFailure: FailureAction;
+  condition?: string;                  // JS expression for conditional execution
+
+  // Documentation
+  description?: string;
+  documentation?: string;
+}
+
+type FailureAction = "fail" | "continue" | "rollback" | `goto:${string}`;
+
+interface InputBinding {
+  name: string;                        // Input parameter name
+  source: InputSource;
+}
+
+type InputSource =
+  | { type: "literal"; value: any }
+  | { type: "context"; path: string }                       // e.g., "release.name"
+  | { type: "output"; nodeId: string; outputName: string }
+  | { type: "secret"; secretName: string }
+  | { type: "expression"; expression: string };             // JS expression
+
+interface StepEdge {
+  id: string;
+  from: string;                        // Source node ID
+  to: string;                          // Target node ID
+  condition?: string;                  // Optional condition expression
+  label?: string;                      // Display label for conditional edges
+}
+
+interface RetryPolicy {
+  maxRetries: number;
+  backoffType: "fixed" | "exponential";
+  backoffSeconds: number;
+  retryableErrors: string[];
+}
+```
+
+---
+
+### Module: `step-registry`
+
+| Aspect | Specification |
+|--------|---------------|
+| **Responsibility** | Built-in + plugin-provided step types |
+| **Dependencies** | `plugin-registry` |
+| **Data Entities** | `StepType`, `StepSchema` |
+
+**Built-in Step Types**:
+
+| Step Type | Category | Description |
+|-----------|----------|-------------|
+| `approval` | Control | Wait for human approval |
+| 
`security-gate` | Gate | Evaluate security policy | +| `custom-gate` | Gate | Custom OPA policy evaluation | +| `deploy-docker` | Deploy | Deploy single container | +| `deploy-compose` | Deploy | Deploy Docker Compose stack | +| `deploy-ecs` | Deploy | Deploy to AWS ECS | +| `deploy-nomad` | Deploy | Deploy to HashiCorp Nomad | +| `health-check` | Verify | HTTP/TCP health check | +| `smoke-test` | Verify | Run smoke test suite | +| `execute-script` | Custom | Run C#/Bash script | +| `webhook` | Integration | Call external webhook | +| `trigger-ci` | Integration | Trigger CI pipeline | +| `wait-ci` | Integration | Wait for CI pipeline | +| `notify` | Notification | Send notification | +| `rollback` | Recovery | Rollback deployment | +| `traffic-shift` | Progressive | Shift traffic percentage | + +**Step Type Definition**: +```typescript +interface StepType { + type: string; // "deploy-compose" + displayName: string; // "Deploy Compose Stack" + description: string; + category: StepCategory; + icon: string; + + // Schema + configSchema: JSONSchema; // Step configuration schema + inputSchema: JSONSchema; // Required inputs schema + outputSchema: JSONSchema; // Produced outputs schema + + // Execution + executor: "builtin" | UUID; // builtin or plugin ID + defaultTimeout: number; + safeToRetry: boolean; + retryableErrors: string[]; + + // Documentation + documentation: string; + examples: StepExample[]; +} +``` + +--- + +## Workflow Run State Machine + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ WORKFLOW RUN STATE MACHINE │ +│ │ +│ ┌──────────┐ │ +│ │ CREATED │ │ +│ └────┬─────┘ │ +│ │ start() │ +│ ▼ │ +│ ┌─────────────────────────────┐ │ +│ │ │ │ +│ pause() ┌──┴──────────┐ │ │ +│ ┌────────►│ PAUSED │◄─────────┐ │ │ +│ │ └──────┬──────┘ │ │ │ +│ │ │ resume() │ │ │ +│ │ ▼ │ │ │ +│ │ ┌─────────────┐ │ │ │ +│ └─────────│ RUNNING │──────────┘ │ │ +│ └──────┬──────┘ (waiting for │ │ +│ │ approval) │ │ +│ 
┌────────────┼────────────┐ │ │ +│ │ │ │ │ │ +│ ▼ ▼ ▼ │ │ +│ ┌───────────┐ ┌───────────┐ ┌───────────┐ │ │ +│ │ SUCCEEDED │ │ FAILED │ │ CANCELLED │ │ │ +│ └───────────┘ └───────────┘ └───────────┘ │ │ +│ │ +│ Transitions: │ +│ - CREATED → RUNNING: start() │ +│ - RUNNING → PAUSED: pause(), waiting approval │ +│ - PAUSED → RUNNING: resume(), approval granted │ +│ - RUNNING → SUCCEEDED: all nodes complete │ +│ - RUNNING → FAILED: node fails with fail action │ +│ - RUNNING → CANCELLED: cancel() │ +│ - PAUSED → CANCELLED: cancel() │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +## Step Run State Machine + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ STEP RUN STATE MACHINE │ +│ │ +│ ┌──────────┐ │ +│ │ PENDING │ ◄──── Initial state; dependencies not met │ +│ └────┬─────┘ │ +│ │ dependencies met + condition true │ +│ ▼ │ +│ ┌──────────┐ │ +│ │ RUNNING │ ◄──── Step is executing │ +│ └────┬─────┘ │ +│ │ │ +│ ┌────┴────────────────┬─────────────────┐ │ +│ │ │ │ │ +│ ▼ ▼ ▼ │ +│ ┌───────────┐ ┌───────────┐ ┌───────────┐ │ +│ │ SUCCEEDED │ │ FAILED │ │ SKIPPED │ │ +│ └───────────┘ └─────┬─────┘ └───────────┘ │ +│ │ ▲ │ +│ │ │ condition false │ +│ ▼ │ │ +│ ┌───────────┐ │ │ +│ │ RETRYING │──────┘ (max retries exceeded) │ +│ └─────┬─────┘ │ +│ │ │ +│ │ retry attempt │ +│ └──────────────────┐ │ +│ │ │ +│ ▼ │ +│ ┌──────────┐ │ +│ │ RUNNING │ (retry) │ +│ └──────────┘ │ +│ │ +│ Additional transitions: │ +│ - Any state → CANCELLED: workflow cancelled │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +--- + +## Database Schema + +```sql +-- Workflow Templates +CREATE TABLE release.workflow_templates ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID REFERENCES tenants(id) ON DELETE CASCADE, + name VARCHAR(255) NOT NULL, + display_name VARCHAR(255) NOT NULL, + description TEXT, + version INTEGER NOT NULL DEFAULT 1, + nodes JSONB NOT 
NULL, + edges JSONB NOT NULL, + inputs JSONB NOT NULL DEFAULT '[]', + outputs JSONB NOT NULL DEFAULT '[]', + tags JSONB NOT NULL DEFAULT '[]', + is_builtin BOOLEAN NOT NULL DEFAULT FALSE, + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + created_by UUID REFERENCES users(id) +); + +CREATE INDEX idx_workflow_templates_tenant ON release.workflow_templates(tenant_id); +CREATE INDEX idx_workflow_templates_name ON release.workflow_templates(name); + +-- Workflow Runs +CREATE TABLE release.workflow_runs ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + template_id UUID NOT NULL REFERENCES release.workflow_templates(id), + template_version INTEGER NOT NULL, + status VARCHAR(50) NOT NULL DEFAULT 'created', + context JSONB NOT NULL, + inputs JSONB NOT NULL DEFAULT '{}', + outputs JSONB NOT NULL DEFAULT '{}', + error_message TEXT, + started_at TIMESTAMPTZ, + completed_at TIMESTAMPTZ, + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + created_by UUID REFERENCES users(id) +); + +CREATE INDEX idx_workflow_runs_tenant ON release.workflow_runs(tenant_id); +CREATE INDEX idx_workflow_runs_template ON release.workflow_runs(template_id); +CREATE INDEX idx_workflow_runs_status ON release.workflow_runs(status); + +-- Step Runs +CREATE TABLE release.step_runs ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + workflow_run_id UUID NOT NULL REFERENCES release.workflow_runs(id) ON DELETE CASCADE, + node_id VARCHAR(255) NOT NULL, + status VARCHAR(50) NOT NULL DEFAULT 'pending', + inputs JSONB NOT NULL DEFAULT '{}', + outputs JSONB NOT NULL DEFAULT '{}', + error_message TEXT, + logs TEXT, + attempt_number INTEGER NOT NULL DEFAULT 1, + started_at TIMESTAMPTZ, + completed_at TIMESTAMPTZ, + UNIQUE (workflow_run_id, node_id) +); + +CREATE INDEX idx_step_runs_workflow ON release.step_runs(workflow_run_id); +CREATE INDEX idx_step_runs_status ON release.step_runs(status); + +-- 
Step Registry +CREATE TABLE release.step_types ( + type VARCHAR(255) PRIMARY KEY, + display_name VARCHAR(255) NOT NULL, + description TEXT, + category VARCHAR(100) NOT NULL, + icon VARCHAR(255), + config_schema JSONB NOT NULL, + input_schema JSONB NOT NULL, + output_schema JSONB NOT NULL, + executor VARCHAR(255) NOT NULL DEFAULT 'builtin', + default_timeout INTEGER NOT NULL DEFAULT 300, + safe_to_retry BOOLEAN NOT NULL DEFAULT FALSE, + retryable_errors JSONB NOT NULL DEFAULT '[]', + documentation TEXT, + examples JSONB NOT NULL DEFAULT '[]', + plugin_id UUID REFERENCES release.plugins(id), + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW() +); + +CREATE INDEX idx_step_types_category ON release.step_types(category); +CREATE INDEX idx_step_types_plugin ON release.step_types(plugin_id); +``` + +--- + +## Workflow Template Example: Standard Deployment + +```json +{ + "id": "template-standard-deploy", + "name": "standard-deploy", + "displayName": "Standard Deployment", + "version": 1, + "inputs": [ + { "name": "releaseId", "type": "uuid", "required": true }, + { "name": "environmentId", "type": "uuid", "required": true }, + { "name": "promotionId", "type": "uuid", "required": true } + ], + "nodes": [ + { + "id": "approval", + "type": "approval", + "name": "Approval Gate", + "config": {}, + "inputs": [ + { "name": "promotionId", "source": { "type": "context", "path": "promotionId" } } + ], + "position": { "x": 100, "y": 100 } + }, + { + "id": "security-gate", + "type": "security-gate", + "name": "Security Verification", + "config": { + "blockOnCritical": true, + "blockOnHigh": true + }, + "inputs": [ + { "name": "releaseId", "source": { "type": "context", "path": "releaseId" } } + ], + "position": { "x": 100, "y": 200 } + }, + { + "id": "deploy-targets", + "type": "deploy-compose", + "name": "Deploy to Targets", + "config": { + "strategy": "rolling", + "parallelism": 2 + }, + "inputs": [ + { "name": "releaseId", "source": { "type": "context", "path": "releaseId" } }, + { 
"name": "environmentId", "source": { "type": "context", "path": "environmentId" } } + ], + "timeout": 600, + "retryPolicy": { + "maxRetries": 2, + "backoffType": "exponential", + "backoffSeconds": 30 + }, + "onFailure": "rollback", + "position": { "x": 100, "y": 400 } + }, + { + "id": "health-check", + "type": "health-check", + "name": "Health Verification", + "config": { + "type": "http", + "path": "/health", + "expectedStatus": 200, + "timeout": 30, + "retries": 5 + }, + "inputs": [ + { "name": "targets", "source": { "type": "output", "nodeId": "deploy-targets", "outputName": "deployedTargets" } } + ], + "onFailure": "rollback", + "position": { "x": 100, "y": 500 } + }, + { + "id": "notify-success", + "type": "notify", + "name": "Success Notification", + "config": { + "channel": "slack", + "template": "deployment-success" + }, + "onFailure": "continue", + "position": { "x": 100, "y": 700 } + }, + { + "id": "rollback-handler", + "type": "rollback", + "name": "Rollback Handler", + "config": { + "strategy": "to-previous" + }, + "inputs": [ + { "name": "deploymentJobId", "source": { "type": "output", "nodeId": "deploy-targets", "outputName": "jobId" } } + ], + "position": { "x": 300, "y": 450 } + } + ], + "edges": [ + { "id": "e1", "from": "approval", "to": "security-gate" }, + { "id": "e2", "from": "security-gate", "to": "deploy-targets" }, + { "id": "e3", "from": "deploy-targets", "to": "health-check" }, + { "id": "e4", "from": "health-check", "to": "notify-success" }, + { "id": "e5", "from": "deploy-targets", "to": "rollback-handler", "condition": "status === 'failed'" }, + { "id": "e6", "from": "health-check", "to": "rollback-handler", "condition": "status === 'failed'" } + ] +} +``` + +--- + +## API Endpoints + +See [API Documentation](../api/workflows.md) for full specification. 
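To make the run lifecycle concrete, here is a minimal client sketch against the workflow-run endpoints listed in this section. Only the URL paths come from the endpoint listing; the request/response field names (`templateId`, `id`, `status`) are assumptions, and the HTTP transport is injected as plain callables so the polling logic is testable without a server.

```python
import time
from typing import Callable, Dict

# Hypothetical transport hooks (assumptions, not the shipped client):
#   post(path, body) -> parsed JSON response dict
#   get(path)        -> parsed JSON response dict
PostFn = Callable[[str, dict], Dict]
GetFn = Callable[[str], Dict]

# Terminal statuses from the workflow run state machine.
TERMINAL = {"succeeded", "failed", "cancelled"}


def start_workflow_run(post: PostFn, template_id: str, inputs: dict) -> str:
    """POST /api/v1/workflow-runs; returns the new run's ID."""
    resp = post("/api/v1/workflow-runs", {"templateId": template_id, "inputs": inputs})
    return resp["id"]


def wait_for_terminal(get: GetFn, run_id: str,
                      poll_seconds: float = 5.0, sleep=time.sleep) -> str:
    """Poll GET /api/v1/workflow-runs/{id} until the run reaches a terminal status."""
    while True:
        status = get(f"/api/v1/workflow-runs/{run_id}")["status"]
        if status in TERMINAL:
            return status
        sleep(poll_seconds)
```

Injecting `sleep` keeps the poll loop deterministic under test; a real client would also handle pauses (`paused` is non-terminal, so the loop correctly keeps waiting through approval gates).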
+ +```yaml +# Workflow Templates +POST /api/v1/workflow-templates +GET /api/v1/workflow-templates +GET /api/v1/workflow-templates/{id} +PUT /api/v1/workflow-templates/{id} +DELETE /api/v1/workflow-templates/{id} +POST /api/v1/workflow-templates/{id}/validate + +# Step Registry +GET /api/v1/step-types +GET /api/v1/step-types/{type} + +# Workflow Runs +POST /api/v1/workflow-runs +GET /api/v1/workflow-runs +GET /api/v1/workflow-runs/{id} +POST /api/v1/workflow-runs/{id}/pause +POST /api/v1/workflow-runs/{id}/resume +POST /api/v1/workflow-runs/{id}/cancel +GET /api/v1/workflow-runs/{id}/steps +GET /api/v1/workflow-runs/{id}/steps/{nodeId} +GET /api/v1/workflow-runs/{id}/steps/{nodeId}/logs +GET /api/v1/workflow-runs/{id}/steps/{nodeId}/artifacts +``` + +--- + +## References + +- [Module Overview](overview.md) +- [Workflow Templates](../workflow/templates.md) +- [Execution State Machine](../workflow/execution.md) +- [API Documentation](../api/workflows.md) diff --git a/docs/modules/release-orchestrator/operations/alerting.md b/docs/modules/release-orchestrator/operations/alerting.md new file mode 100644 index 000000000..0eb7c88fb --- /dev/null +++ b/docs/modules/release-orchestrator/operations/alerting.md @@ -0,0 +1,246 @@ +# Alerting Rules + +> Prometheus alerting rules for the Release Orchestrator. + +**Status:** Planned (not yet implemented) +**Source:** [Architecture Advisory Section 13.4](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md) +**Related Modules:** [Metrics](metrics.md), [Observability Overview](overview.md) + +## Overview + +The Release Orchestrator provides Prometheus alerting rules for monitoring promotions, deployments, agents, and integrations. 
+ +--- + +## High Priority Alerts + +### Security Gate Block Rate + +```yaml +- alert: PromotionGateBlockRate + expr: | + rate(stella_security_gate_results_total{result="blocked"}[1h]) / + rate(stella_security_gate_results_total[1h]) > 0.5 + for: 15m + labels: + severity: warning + annotations: + summary: "High rate of security gate blocks" + description: "More than 50% of promotions are being blocked by security gates" +``` + +### Deployment Failure Rate + +```yaml +- alert: DeploymentFailureRate + expr: | + rate(stella_deployments_total{status="failed"}[1h]) / + rate(stella_deployments_total[1h]) > 0.1 + for: 10m + labels: + severity: critical + annotations: + summary: "High deployment failure rate" + description: "More than 10% of deployments are failing" +``` + +### Agent Offline + +```yaml +- alert: AgentOffline + expr: | + stella_agents_status{status="offline"} == 1 + for: 5m + labels: + severity: warning + annotations: + summary: "Agent offline" + description: "Agent {{ $labels.agent_id }} has been offline for 5 minutes" +``` + +### Promotion Stuck + +```yaml +- alert: PromotionStuck + expr: | + time() - stella_promotion_start_time{status="deploying"} > 1800 + for: 5m + labels: + severity: warning + annotations: + summary: "Promotion stuck in deploying state" + description: "Promotion {{ $labels.promotion_id }} has been deploying for more than 30 minutes" +``` + +### Integration Unhealthy + +```yaml +- alert: IntegrationUnhealthy + expr: | + stella_integration_health{status="unhealthy"} == 1 + for: 10m + labels: + severity: warning + annotations: + summary: "Integration unhealthy" + description: "Integration {{ $labels.integration_name }} has been unhealthy for 10 minutes" +``` + +--- + +## Medium Priority Alerts + +### Workflow Step Timeout + +```yaml +- alert: WorkflowStepTimeout + expr: | + stella_workflow_step_duration_seconds > 600 + for: 1m + labels: + severity: warning + annotations: + summary: "Workflow step taking too long" + description: "Step {{ 
$labels.step_type }} in workflow {{ $labels.workflow_run_id }} has been running for more than 10 minutes" +``` + +### Evidence Generation Failure + +```yaml +- alert: EvidenceGenerationFailure + expr: | + rate(stella_evidence_generation_failures_total[1h]) > 0 + for: 5m + labels: + severity: warning + annotations: + summary: "Evidence generation failures" + description: "Evidence generation is failing, affecting audit compliance" +``` + +### Target Health Degraded + +```yaml +- alert: TargetHealthDegraded + expr: | + stella_target_health{status!="healthy"} == 1 + for: 5m + labels: + severity: warning + annotations: + summary: "Target health degraded" + description: "Target {{ $labels.target_name }} is reporting {{ $labels.status }}" +``` + +### Approval Timeout + +```yaml +- alert: ApprovalTimeout + expr: | + time() - stella_promotion_approval_requested_time > 86400 + for: 1h + labels: + severity: warning + annotations: + summary: "Promotion awaiting approval for too long" + description: "Promotion {{ $labels.promotion_id }} has been waiting for approval for more than 24 hours" +``` + +--- + +## Low Priority Alerts + +### Database Connection Pool + +```yaml +- alert: DatabaseConnectionPoolExhausted + expr: | + stella_db_connection_pool_available < 5 + for: 5m + labels: + severity: warning + annotations: + summary: "Database connection pool running low" + description: "Only {{ $value }} database connections available" +``` + +### Plugin Error Rate + +```yaml +- alert: PluginErrorRate + expr: | + rate(stella_plugin_errors_total[5m]) > 1 + for: 5m + labels: + severity: warning + annotations: + summary: "Plugin errors detected" + description: "Plugin {{ $labels.plugin_id }} is experiencing errors" +``` + +--- + +## Alert Routing + +### Example AlertManager Configuration + +```yaml +# alertmanager.yaml +route: + receiver: default + group_by: [alertname, severity] + group_wait: 30s + group_interval: 5m + repeat_interval: 4h + + routes: + - match: + severity: critical + 
receiver: pagerduty + continue: true + + - match: + severity: warning + receiver: slack + +receivers: + - name: default + webhook_configs: + - url: http://webhook.example.com/alerts + + - name: pagerduty + pagerduty_configs: + - service_key: ${PAGERDUTY_KEY} + severity: critical + + - name: slack + slack_configs: + - channel: '#alerts' + api_url: ${SLACK_WEBHOOK_URL} + title: '{{ .CommonAnnotations.summary }}' + text: '{{ .CommonAnnotations.description }}' +``` + +--- + +## Dashboard Integration + +### Grafana Alert Panels + +Recommended dashboard panels for alerts: + +| Panel | Query | +|-------|-------| +| Active Alerts | `count(ALERTS{alertstate="firing"})` | +| Alert History | `count_over_time(ALERTS{alertstate="firing"}[24h])` | +| By Severity | `count(ALERTS{alertstate="firing"}) by (severity)` | +| By Component | `count(ALERTS{alertstate="firing"}) by (alertname)` | + +--- + +## See Also + +- [Metrics](metrics.md) +- [Observability Overview](overview.md) +- [Logging](logging.md) +- [Tracing](tracing.md) diff --git a/docs/modules/release-orchestrator/operations/logging.md b/docs/modules/release-orchestrator/operations/logging.md new file mode 100644 index 000000000..b310e4537 --- /dev/null +++ b/docs/modules/release-orchestrator/operations/logging.md @@ -0,0 +1,220 @@ +# Logging Specification + +> Structured logging format and categories for the Release Orchestrator. + +**Status:** Planned (not yet implemented) +**Source:** [Architecture Advisory Section 13.2](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md) +**Related Modules:** [Observability Overview](overview.md), [Tracing](tracing.md) + +## Overview + +The Release Orchestrator uses structured JSON logging with consistent format, correlation IDs, and context propagation for all components. 
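A minimal sketch of emitting this format from Python's standard `logging` module. The field set is a subset of the schema shown in this document; the `extra=` convention for attaching `context`/`details`/trace fields and the level-name mapping are assumptions, not the shipped implementation.

```python
import json
import logging
from datetime import datetime, timezone

# Map Python level names onto the document's level table ("warn", not "warning").
LEVEL_MAP = {"WARNING": "warn"}


class JsonLogFormatter(logging.Formatter):
    """Render log records as one structured JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.now(timezone.utc)
                .isoformat(timespec="milliseconds").replace("+00:00", "Z"),
            "level": LEVEL_MAP.get(record.levelname, record.levelname.lower()),
            "module": record.name,
            "message": record.getMessage(),
            # Fields passed via `extra=` land as attributes on the record.
            "context": getattr(record, "context", {}),
            "details": getattr(record, "details", {}),
            "trace_id": getattr(record, "trace_id", None),
            "span_id": getattr(record, "span_id", None),
        }
        return json.dumps(entry)


logger = logging.getLogger("promotion-manager")
handler = logging.StreamHandler()
handler.setFormatter(JsonLogFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info(
    "Promotion approved",
    extra={"context": {"environment": "prod"},
           "details": {"decision": "allow"},
           "trace_id": "abc123"},
)
```

In a real service the trace and span IDs would be injected from the active W3C Trace Context rather than passed by hand.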
+ +--- + +## Structured Log Format + +### JSON Schema + +```json +{ + "timestamp": "2026-01-09T14:32:15.123Z", + "level": "info", + "module": "promotion-manager", + "message": "Promotion approved", + "context": { + "tenant_id": "uuid", + "promotion_id": "uuid", + "release_id": "uuid", + "environment": "prod", + "user_id": "uuid" + }, + "details": { + "approvals_count": 2, + "gates_passed": ["security", "approval", "freeze"], + "decision": "allow" + }, + "trace_id": "abc123", + "span_id": "def456", + "duration_ms": 45 +} +``` + +--- + +## Log Levels + +| Level | Usage | +|-------|-------| +| `error` | Errors requiring attention; failures that impact functionality | +| `warn` | Potential issues; degraded functionality; approaching limits | +| `info` | Significant events; state changes; audit-relevant actions | +| `debug` | Detailed debugging info; request/response bodies | +| `trace` | Very detailed tracing; internal state; performance profiling | + +--- + +## Log Categories + +| Category | Examples | +|----------|----------| +| `api` | Request received, response sent, validation errors | +| `promotion` | Promotion requested, approved, rejected, completed | +| `deployment` | Deployment started, task assigned, completed, failed | +| `security` | Gate evaluation, vulnerability found, policy violation | +| `agent` | Agent registered, heartbeat, task execution | +| `workflow` | Workflow started, step executed, completed | +| `integration` | Integration tested, resource discovered, webhook received | + +--- + +## Logging Examples + +### API Request + +```json +{ + "timestamp": "2026-01-09T14:32:15.123Z", + "level": "info", + "module": "api", + "message": "Request received", + "context": { + "tenant_id": "uuid", + "user_id": "uuid" + }, + "details": { + "method": "POST", + "path": "/api/v1/promotions", + "status": 201, + "duration_ms": 125 + }, + "trace_id": "abc123", + "span_id": "def456" +} +``` + +### Promotion Event + +```json +{ + "timestamp": 
"2026-01-09T14:32:15.123Z", + "level": "info", + "module": "promotion-manager", + "message": "Promotion approved", + "context": { + "tenant_id": "uuid", + "promotion_id": "uuid", + "release_id": "uuid", + "environment": "prod", + "user_id": "uuid" + }, + "details": { + "approvals_count": 2, + "gates_passed": ["security", "approval", "freeze"], + "decision": "allow" + }, + "trace_id": "abc123", + "span_id": "def456", + "duration_ms": 45 +} +``` + +### Security Gate Failure + +```json +{ + "timestamp": "2026-01-09T14:32:15.123Z", + "level": "warn", + "module": "security", + "message": "Security gate blocked promotion", + "context": { + "tenant_id": "uuid", + "promotion_id": "uuid", + "release_id": "uuid", + "environment": "prod" + }, + "details": { + "gate_name": "security-gate", + "reason": "Critical vulnerability found", + "vulnerabilities": { + "critical": 1, + "high": 3 + } + }, + "trace_id": "abc123", + "span_id": "def456" +} +``` + +--- + +## Sensitive Data Masking + +The following fields are automatically masked in logs: + +| Field Type | Masking Strategy | +|------------|------------------| +| Passwords | Not logged | +| API Keys | First 4 and last 4 chars only | +| Tokens | Hash only | +| PII | Redacted | +| Credentials | Not logged | + +### Example + +```json +{ + "message": "Authentication succeeded", + "details": { + "api_key": "sk_l...abcd", + "token_hash": "sha256:abc123..." 
+ } +} +``` + +--- + +## Correlation IDs + +All logs include correlation IDs for request tracing: + +| Field | Description | +|-------|-------------| +| `trace_id` | W3C Trace Context trace ID | +| `span_id` | Current operation span ID | +| `correlation_id` | Business-level correlation (optional) | + +--- + +## Log Aggregation + +Recommended log aggregation setup: + +```yaml +# Fluent Bit configuration +[INPUT] + Name tail + Path /var/log/stella/*.log + Parser json + +[FILTER] + Name nest + Match * + Operation lift + Nested_under context + +[OUTPUT] + Name opensearch + Match * + Host opensearch.example.com + Index stella-logs +``` + +--- + +## See Also + +- [Observability Overview](overview.md) +- [Tracing](tracing.md) +- [Alerting](alerting.md) +- [Security Overview](../security/overview.md) diff --git a/docs/modules/release-orchestrator/operations/metrics.md b/docs/modules/release-orchestrator/operations/metrics.md new file mode 100644 index 000000000..827b5ed7d --- /dev/null +++ b/docs/modules/release-orchestrator/operations/metrics.md @@ -0,0 +1,274 @@ +# Metrics Specification + +## Overview + +Release Orchestrator exposes Prometheus-compatible metrics for monitoring deployment health, performance, and operational status. 
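To make the metric-and-label scheme concrete, here is a dependency-free toy sketch of a labeled counter rendered in the Prometheus text exposition format. A real deployment would use an official Prometheus client library; only the metric and label names below are taken from the tables in this document.

```python
from collections import defaultdict


class LabeledCounter:
    """Minimal stand-in for a Prometheus counter with labels."""

    def __init__(self, name: str, label_names: list):
        self.name = name
        self.label_names = label_names
        self._values = defaultdict(float)  # label-value tuple -> counter value

    def inc(self, amount: float = 1.0, **labels) -> None:
        # One time series per distinct combination of label values.
        key = tuple(labels[n] for n in self.label_names)
        self._values[key] += amount

    def render(self) -> str:
        """Render all series in the Prometheus text exposition format."""
        lines = [f"# TYPE {self.name} counter"]
        for key, value in sorted(self._values.items()):
            pairs = ",".join(f'{n}="{v}"' for n, v in zip(self.label_names, key))
            lines.append(f"{self.name}{{{pairs}}} {value}")
        return "\n".join(lines)


promotions = LabeledCounter("stella_promotions_total", ["tenant", "env", "status"])
promotions.inc(tenant="acme", env="prod", status="succeeded")
promotions.inc(tenant="acme", env="prod", status="failed")
print(promotions.render())
```

Each distinct label combination becomes its own time series, which is why high-cardinality labels (e.g. per-request IDs) are deliberately absent from the tables below.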
+ +## Core Metrics + +### Release Metrics + +| Metric | Type | Description | Labels | +|--------|------|-------------|--------| +| `stella_releases_total` | counter | Total releases created | `tenant`, `status` | +| `stella_releases_active` | gauge | Currently active releases | `tenant`, `status` | +| `stella_release_components_count` | histogram | Components per release | `tenant` | + +### Promotion Metrics + +| Metric | Type | Description | Labels | +|--------|------|-------------|--------| +| `stella_promotions_total` | counter | Total promotions | `tenant`, `env`, `status` | +| `stella_promotions_in_progress` | gauge | Promotions currently in progress | `tenant`, `env` | +| `stella_promotion_duration_seconds` | histogram | Time from request to completion | `tenant`, `env`, `status` | +| `stella_approval_pending_count` | gauge | Pending approvals | `tenant`, `env` | +| `stella_approval_duration_seconds` | histogram | Time to approve | `tenant`, `env` | + +### Deployment Metrics + +| Metric | Type | Description | Labels | +|--------|------|-------------|--------| +| `stella_deployments_total` | counter | Total deployments | `tenant`, `env`, `strategy`, `status` | +| `stella_deployment_duration_seconds` | histogram | Deployment duration | `tenant`, `env`, `strategy` | +| `stella_deployment_tasks_total` | counter | Total deployment tasks | `tenant`, `status` | +| `stella_deployment_task_duration_seconds` | histogram | Task duration | `target_type` | +| `stella_rollbacks_total` | counter | Total rollbacks | `tenant`, `env`, `reason` | + +### Agent Metrics + +| Metric | Type | Description | Labels | +|--------|------|-------------|--------| +| `stella_agents_connected` | gauge | Connected agents | `tenant` | +| `stella_agents_by_status` | gauge | Agents by status | `tenant`, `status` | +| `stella_agent_tasks_total` | counter | Tasks executed by agents | `agent`, `type`, `status` | +| `stella_agent_task_duration_seconds` | histogram | Agent task duration | `agent`, 
`type` | +| `stella_agent_heartbeat_age_seconds` | gauge | Seconds since last heartbeat | `agent` | +| `stella_agent_resource_cpu_percent` | gauge | Agent CPU usage | `agent` | +| `stella_agent_resource_memory_percent` | gauge | Agent memory usage | `agent` | + +### Workflow Metrics + +| Metric | Type | Description | Labels | +|--------|------|-------------|--------| +| `stella_workflow_runs_total` | counter | Workflow executions | `tenant`, `template`, `status` | +| `stella_workflow_runs_active` | gauge | Currently running workflows | `tenant`, `template` | +| `stella_workflow_duration_seconds` | histogram | Workflow duration | `template`, `status` | +| `stella_workflow_step_duration_seconds` | histogram | Step execution time | `step_type`, `status` | +| `stella_workflow_step_retries_total` | counter | Step retry count | `step_type` | + +### Target Metrics + +| Metric | Type | Description | Labels | +|--------|------|-------------|--------| +| `stella_targets_total` | gauge | Total targets | `tenant`, `env`, `type` | +| `stella_targets_by_health` | gauge | Targets by health status | `tenant`, `env`, `health` | +| `stella_target_drift_detected` | gauge | Targets with drift | `tenant`, `env` | + +### Integration Metrics + +| Metric | Type | Description | Labels | +|--------|------|-------------|--------| +| `stella_integrations_total` | gauge | Configured integrations | `tenant`, `type` | +| `stella_integration_health` | gauge | Integration health (1=healthy) | `tenant`, `integration` | +| `stella_integration_requests_total` | counter | Requests to integrations | `integration`, `operation`, `status` | +| `stella_integration_latency_seconds` | histogram | Integration request latency | `integration`, `operation` | + +### Gate Metrics + +| Metric | Type | Description | Labels | +|--------|------|-------------|--------| +| `stella_gate_evaluations_total` | counter | Gate evaluations | `tenant`, `gate_type`, `result` | +| `stella_gate_evaluation_duration_seconds` | 
histogram | Gate evaluation time | `gate_type` | +| `stella_gate_blocks_total` | counter | Blocked promotions by gate | `tenant`, `gate_type`, `env` | + +## API Metrics + +| Metric | Type | Description | Labels | +|--------|------|-------------|--------| +| `stella_http_requests_total` | counter | HTTP requests | `method`, `path`, `status` | +| `stella_http_request_duration_seconds` | histogram | Request latency | `method`, `path` | +| `stella_http_requests_in_flight` | gauge | Active requests | `method` | +| `stella_http_request_size_bytes` | histogram | Request size | `method`, `path` | +| `stella_http_response_size_bytes` | histogram | Response size | `method`, `path` | + +## Evidence Metrics + +| Metric | Type | Description | Labels | +|--------|------|-------------|--------| +| `stella_evidence_packets_total` | counter | Evidence packets generated | `tenant`, `type` | +| `stella_evidence_packet_size_bytes` | histogram | Evidence packet size | `type` | +| `stella_evidence_verification_total` | counter | Evidence verifications | `result` | + +## Prometheus Configuration + +```yaml +# prometheus.yml +global: + scrape_interval: 15s + evaluation_interval: 15s + +scrape_configs: + - job_name: 'stella-orchestrator' + static_configs: + - targets: ['stella-orchestrator:9090'] + metrics_path: /metrics + scheme: https + tls_config: + ca_file: /etc/prometheus/ca.crt + + - job_name: 'stella-agents' + kubernetes_sd_configs: + - role: pod + selectors: + - role: pod + label: "app.kubernetes.io/name=stella-agent" + relabel_configs: + - source_labels: [__meta_kubernetes_pod_label_agent_id] + target_label: agent_id +``` + +## Histogram Buckets + +### Duration Buckets (seconds) + +```yaml +# Short operations (API calls, gate evaluations) +short_duration_buckets: [0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10] + +# Medium operations (workflow steps) +medium_duration_buckets: [0.1, 0.5, 1, 2.5, 5, 10, 30, 60, 120, 300] + +# Long operations (deployments) +long_duration_buckets: 
[1, 5, 10, 30, 60, 120, 300, 600, 1200, 3600] +``` + +### Size Buckets (bytes) + +```yaml +# Request/response sizes +size_buckets: [100, 1000, 10000, 100000, 1000000, 10000000] + +# Evidence packet sizes +evidence_buckets: [1000, 10000, 100000, 500000, 1000000, 5000000] +``` + +## SLI Definitions + +### Availability SLI + +```promql +# API availability (99.9% target) +sum(rate(stella_http_requests_total{status!~"5.."}[5m])) +/ +sum(rate(stella_http_requests_total[5m])) +``` + +### Latency SLI + +```promql +# API latency P99 < 500ms +histogram_quantile(0.99, + sum(rate(stella_http_request_duration_seconds_bucket[5m])) by (le) +) +``` + +### Deployment Success SLI + +```promql +# Deployment success rate (99% target) +sum(rate(stella_deployments_total{status="succeeded"}[24h])) +/ +sum(rate(stella_deployments_total[24h])) +``` + +## Alert Rules + +```yaml +groups: + - name: stella-orchestrator + rules: + - alert: HighDeploymentFailureRate + expr: | + sum(rate(stella_deployments_total{status="failed"}[1h])) + / + sum(rate(stella_deployments_total[1h])) > 0.1 + for: 5m + labels: + severity: critical + annotations: + summary: High deployment failure rate + description: More than 10% of deployments failing in the last hour + + - alert: AgentOffline + expr: stella_agent_heartbeat_age_seconds > 120 + for: 2m + labels: + severity: warning + annotations: + summary: Agent {{ $labels.agent }} offline + description: Agent has not sent heartbeat for > 2 minutes + + - alert: PendingApprovalsStale + expr: | + stella_approval_pending_count > 0 + and + time() - stella_promotion_request_timestamp > 3600 + for: 5m + labels: + severity: warning + annotations: + summary: Stale pending approvals + description: Approvals pending for more than 1 hour + + - alert: IntegrationUnhealthy + expr: stella_integration_health == 0 + for: 5m + labels: + severity: warning + annotations: + summary: Integration {{ $labels.integration }} unhealthy + description: Integration health check failing + + - 
alert: HighAPILatency + expr: | + histogram_quantile(0.99, + sum(rate(stella_http_request_duration_seconds_bucket[5m])) by (le, path) + ) > 1 + for: 5m + labels: + severity: warning + annotations: + summary: High API latency on {{ $labels.path }} + description: P99 latency exceeds 1 second +``` + +## Grafana Dashboards + +### Main Dashboard Panels + +1. **Deployment Pipeline Overview** + - Promotions per environment (time series) + - Success/failure rates (gauge) + - Active deployments (stat) + +2. **Agent Health** + - Connected agents (stat) + - Agent status distribution (pie chart) + - Heartbeat age (table) + +3. **Gate Performance** + - Gate evaluation counts (bar chart) + - Block rate by gate type (time series) + - Evaluation latency (heatmap) + +4. **API Performance** + - Request rate (time series) + - Error rate (time series) + - Latency distribution (heatmap) + +## References + +- [Operations Overview](overview.md) +- [Logging](logging.md) +- [Tracing](tracing.md) +- [Alerting](alerting.md) diff --git a/docs/modules/release-orchestrator/operations/overview.md b/docs/modules/release-orchestrator/operations/overview.md new file mode 100644 index 000000000..d310f2137 --- /dev/null +++ b/docs/modules/release-orchestrator/operations/overview.md @@ -0,0 +1,508 @@ +# Operations Overview + +## Observability Stack + +Release Orchestrator provides comprehensive observability through metrics, logging, and distributed tracing. 
+ +``` + OBSERVABILITY ARCHITECTURE + + ┌─────────────────────────────────────────────────────────────────────────────┐ + │ RELEASE ORCHESTRATOR │ + │ │ + │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ + │ │ Metrics │ │ Logs │ │ Traces │ │ Events │ │ + │ │ Exporter │ │ Collector │ │ Exporter │ │ Publisher │ │ + │ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │ + │ │ │ │ │ │ + └─────────┼────────────────┼────────────────┼────────────────┼────────────────┘ + │ │ │ │ + ▼ ▼ ▼ ▼ + ┌─────────────────────────────────────────────────────────────────────────────┐ + │ OBSERVABILITY BACKENDS │ + │ │ + │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ + │ │ Prometheus │ │ Loki / │ │ Jaeger / │ │ Event │ │ + │ │ / Mimir │ │ Elasticsearch│ │ Tempo │ │ Bus │ │ + │ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │ + │ │ │ │ │ │ + │ └────────────────┴────────────────┴────────────────┘ │ + │ │ │ + │ ▼ │ + │ ┌─────────────────┐ │ + │ │ Grafana │ │ + │ │ Dashboards │ │ + │ └─────────────────┘ │ + │ │ + └─────────────────────────────────────────────────────────────────────────────┘ +``` + +## Metrics + +### Core Metrics + +| Metric | Type | Description | Labels | +|--------|------|-------------|--------| +| `stella_releases_total` | counter | Total releases created | `tenant`, `status` | +| `stella_promotions_total` | counter | Total promotions | `tenant`, `env`, `status` | +| `stella_deployments_total` | counter | Total deployments | `tenant`, `env`, `strategy` | +| `stella_deployment_duration_seconds` | histogram | Deployment duration | `tenant`, `env`, `strategy` | +| `stella_rollbacks_total` | counter | Total rollbacks | `tenant`, `env`, `reason` | +| `stella_agents_connected` | gauge | Connected agents | `tenant` | +| `stella_targets_total` | gauge | Total targets | `tenant`, `env`, `type` | +| `stella_workflow_runs_total` | counter | Workflow executions | `tenant`, `template`, `status` | +| 
`stella_workflow_step_duration_seconds` | histogram | Step execution time | `step_type` | +| `stella_approval_pending_count` | gauge | Pending approvals | `tenant`, `env` | +| `stella_approval_duration_seconds` | histogram | Time to approve | `tenant`, `env` | + +### API Metrics + +| Metric | Type | Description | Labels | +|--------|------|-------------|--------| +| `stella_http_requests_total` | counter | HTTP requests | `method`, `path`, `status` | +| `stella_http_request_duration_seconds` | histogram | Request latency | `method`, `path` | +| `stella_http_requests_in_flight` | gauge | Active requests | `method` | + +### Agent Metrics + +| Metric | Type | Description | Labels | +|--------|------|-------------|--------| +| `stella_agent_tasks_total` | counter | Tasks executed | `agent`, `type`, `status` | +| `stella_agent_task_duration_seconds` | histogram | Task duration | `agent`, `type` | +| `stella_agent_heartbeat_age_seconds` | gauge | Since last heartbeat | `agent` | + +### Prometheus Configuration + +```yaml +# prometheus.yml +scrape_configs: + - job_name: 'stella-orchestrator' + static_configs: + - targets: ['stella-orchestrator:9090'] + metrics_path: /metrics + scheme: https + tls_config: + ca_file: /etc/prometheus/ca.crt + + - job_name: 'stella-agents' + kubernetes_sd_configs: + - role: pod + selectors: + - role: pod + label: "app.kubernetes.io/name=stella-agent" + relabel_configs: + - source_labels: [__meta_kubernetes_pod_label_agent_id] + target_label: agent_id +``` + +## Logging + +### Log Format + +```json +{ + "timestamp": "2026-01-09T10:30:00.123Z", + "level": "info", + "message": "Deployment started", + "service": "deploy-orchestrator", + "version": "1.0.0", + "traceId": "abc123def456", + "spanId": "789ghi", + "tenantId": "tenant-uuid", + "correlationId": "corr-uuid", + "context": { + "deploymentJobId": "job-uuid", + "releaseId": "release-uuid", + "environmentId": "env-uuid" + } +} +``` + +### Log Levels + +| Level | Usage | +|-------|-------| +| 
`error` | Failures requiring attention | +| `warn` | Degraded operation, recoverable issues | +| `info` | Business events (deployment started, approval granted) | +| `debug` | Detailed operational info | +| `trace` | Very detailed debugging | + +### Structured Logging Configuration + +```typescript +// Logging configuration +const loggerConfig = { + level: process.env.LOG_LEVEL || 'info', + format: 'json', + outputs: [ + { + type: 'stdout', + format: 'json' + }, + { + type: 'file', + path: '/var/log/stella/orchestrator.log', + rotation: { + maxSize: '100MB', + maxFiles: 10 + } + } + ], + // Sensitive field masking + redact: [ + 'password', + 'token', + 'secret', + 'credentials', + 'authorization' + ] +}; +``` + +### Important Log Events + +| Event | Level | Description | +|-------|-------|-------------| +| `deployment.started` | info | Deployment job started | +| `deployment.completed` | info | Deployment successful | +| `deployment.failed` | error | Deployment failed | +| `rollback.initiated` | warn | Rollback triggered | +| `approval.granted` | info | Promotion approved | +| `approval.denied` | info | Promotion rejected | +| `agent.connected` | info | Agent came online | +| `agent.disconnected` | warn | Agent went offline | +| `security.gate.failed` | warn | Security check blocked | + +## Distributed Tracing + +### Trace Context Propagation + +```typescript +// Trace context in requests +interface TraceContext { + traceId: string; + spanId: string; + parentSpanId?: string; + sampled: boolean; + baggage?: Record<string, string>; +} + +// W3C Trace Context headers +// traceparent: 00-{traceId}-{spanId}-{flags} +// tracestate: stella=...
+ +// Example trace propagation +class TracingMiddleware { + handle(req: Request, res: Response, next: NextFunction): void { + const traceparent = req.headers['traceparent']; + const traceContext = this.parseTraceParent(traceparent); + + // Start span for this request + const span = this.tracer.startSpan('http.request', { + parent: traceContext, + attributes: { + 'http.method': req.method, + 'http.url': req.url, + 'http.user_agent': req.headers['user-agent'], + 'tenant.id': req.tenantId + } + }); + + // Attach to request for downstream use + req.span = span; + + res.on('finish', () => { + span.setAttribute('http.status_code', res.statusCode); + span.end(); + }); + + next(); + } +} +``` + +### Key Spans + +| Span Name | Description | Attributes | +|-----------|-------------|------------| +| `deployment.execute` | Full deployment | `release_id`, `environment` | +| `task.dispatch` | Task dispatch to agent | `target_id`, `agent_id` | +| `agent.execute` | Agent task execution | `task_type`, `duration` | +| `workflow.run` | Workflow execution | `template_id`, `status` | +| `workflow.step` | Individual step | `step_type`, `node_id` | +| `approval.wait` | Waiting for approval | `promotion_id`, `duration` | +| `gate.evaluate` | Gate evaluation | `gate_type`, `result` | + +### Jaeger Configuration + +```yaml +# jaeger-config.yaml +apiVersion: jaegertracing.io/v1 +kind: Jaeger +metadata: + name: stella-jaeger +spec: + strategy: production + collector: + maxReplicas: 5 + storage: + type: elasticsearch + options: + es: + server-urls: https://elasticsearch:9200 + secretName: jaeger-es-secret + ingress: + enabled: true +``` + +## Alerting + +### Alert Rules + +```yaml +# prometheus-rules.yaml +groups: + - name: stella.deployment + rules: + - alert: DeploymentFailureRateHigh + expr: | + sum(rate(stella_deployments_total{status="failed"}[5m])) / + sum(rate(stella_deployments_total[5m])) > 0.1 + for: 5m + labels: + severity: critical + annotations: + summary: "High deployment 
failure rate" + description: "More than 10% of deployments are failing" + + - alert: DeploymentDurationHigh + expr: | + histogram_quantile(0.95, sum(rate(stella_deployment_duration_seconds_bucket[5m])) by (le, tenant)) > 600 + for: 10m + labels: + severity: warning + annotations: + summary: "Deployment duration high" + description: "P95 deployment duration exceeds 10 minutes" + + - alert: RollbackRateHigh + expr: | + sum(rate(stella_rollbacks_total[1h])) > 3 + for: 5m + labels: + severity: warning + annotations: + summary: "High rollback rate" + description: "More than 3 rollbacks in the last hour" + + - name: stella.agents + rules: + - alert: AgentOffline + expr: | + stella_agent_heartbeat_age_seconds > 120 + for: 2m + labels: + severity: critical + annotations: + summary: "Agent offline" + description: "Agent {{ $labels.agent }} has not sent heartbeat for 2 minutes" + + - alert: AgentPoolLow + expr: | + count(stella_agents_connected{status="online"}) by (tenant) < 2 + for: 5m + labels: + severity: warning + annotations: + summary: "Low agent count" + description: "Fewer than 2 agents online for tenant {{ $labels.tenant }}" + + - name: stella.approvals + rules: + - alert: ApprovalBacklogHigh + expr: | + stella_approval_pending_count > 10 + for: 1h + labels: + severity: warning + annotations: + summary: "Approval backlog growing" + description: "More than 10 pending approvals for over an hour" + + - alert: ApprovalWaitLong + expr: | + histogram_quantile(0.90, stella_approval_duration_seconds_bucket) > 86400 + for: 1h + labels: + severity: info + annotations: + summary: "Long approval wait times" + description: "P90 approval wait time exceeds 24 hours" +``` + +### PagerDuty Integration + +```typescript +interface AlertManagerConfig { + receivers: [ + { + name: "stella-critical", + pagerduty_configs: [ + { + service_key: "${PAGERDUTY_SERVICE_KEY}", + severity: "critical" + } + ] + }, + { + name: "stella-warning", + slack_configs: [ + { + api_url: 
"${SLACK_WEBHOOK_URL}", + channel: "#stella-alerts", + send_resolved: true + } + ] + } + ], + route: { + receiver: "stella-warning", + routes: [ + { + match: { severity: "critical" }, + receiver: "stella-critical" + } + ] + } +} +``` + +## Dashboards + +### Deployment Dashboard + +Key panels: +- Deployment rate over time +- Success/failure ratio +- Average deployment duration +- Deployment duration histogram +- Active deployments by environment +- Recent deployment list + +### Agent Health Dashboard + +Key panels: +- Connected agents count +- Agent heartbeat status +- Tasks per agent +- Task success rate by agent +- Agent resource utilization + +### Approval Dashboard + +Key panels: +- Pending approvals count +- Approval response time +- Approvals by user +- Rejection reasons breakdown + +## Health Endpoints + +### Application Health + +```http +GET /health +``` + +Response: +```json +{ + "status": "healthy", + "version": "1.0.0", + "uptime": 86400, + "checks": { + "database": { "status": "healthy", "latency": 5 }, + "redis": { "status": "healthy", "latency": 2 }, + "vault": { "status": "healthy", "latency": 10 } + } +} +``` + +### Readiness Probe + +```http +GET /health/ready +``` + +### Liveness Probe + +```http +GET /health/live +``` + +## Performance Tuning + +### Database Connection Pool + +```typescript +const poolConfig = { + min: 5, + max: 20, + acquireTimeout: 30000, + idleTimeout: 600000, + connectionTimeout: 10000 +}; +``` + +### Cache Configuration + +```typescript +const cacheConfig = { + // Release cache + releases: { + ttl: 300, // 5 minutes + maxSize: 1000 + }, + // Target cache + targets: { + ttl: 60, // 1 minute + maxSize: 5000 + }, + // Workflow template cache + templates: { + ttl: 3600, // 1 hour + maxSize: 100 + } +}; +``` + +### Rate Limiting + +```typescript +const rateLimitConfig = { + // API rate limits + api: { + windowMs: 60000, // 1 minute + max: 1000, // requests per window + burst: 100 // burst allowance + }, + // Webhook rate limits + 
webhooks: { + windowMs: 60000, + max: 100 + }, + // Per-tenant limits + tenant: { + windowMs: 60000, + max: 500 + } +}; +``` + +## References + +- [Metrics Reference](metrics.md) +- [Logging Guide](logging.md) +- [Tracing Setup](tracing.md) +- [Alert Configuration](alerting.md) diff --git a/docs/modules/release-orchestrator/operations/tracing.md b/docs/modules/release-orchestrator/operations/tracing.md new file mode 100644 index 000000000..32f8ddad6 --- /dev/null +++ b/docs/modules/release-orchestrator/operations/tracing.md @@ -0,0 +1,222 @@ +# Distributed Tracing Specification + +> OpenTelemetry-based distributed tracing for the Release Orchestrator. + +**Status:** Planned (not yet implemented) +**Source:** [Architecture Advisory Section 13.3](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md) +**Related Modules:** [Observability Overview](overview.md), [Logging](logging.md) + +## Overview + +The Release Orchestrator uses OpenTelemetry for distributed tracing, enabling end-to-end visibility of promotion workflows, deployments, and agent tasks. 
+ +--- + +## Trace Context Propagation + +### W3C Trace Context + +```typescript +// Trace context structure +interface TraceContext { + traceId: string; // 32-char hex + spanId: string; // 16-char hex + parentSpanId?: string; + sampled: boolean; + baggage: Record<string, string>; +} + +// Propagation headers +const TRACE_HEADERS = { + W3C_TRACEPARENT: "traceparent", + W3C_TRACESTATE: "tracestate", + BAGGAGE: "baggage", +}; + +// Example traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01 +``` + +### Header Format + +``` +traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01 + ^ ^ ^ ^ + | | | | + | trace-id (32 hex) span-id (16 hex) flags + version +``` + +--- + +## Key Traces + +| Operation | Span Name | Attributes | +|-----------|-----------|------------| +| Promotion request | `promotion.request` | promotion_id, release_id, environment | +| Gate evaluation | `promotion.evaluate_gates` | gate_names, result | +| Workflow execution | `workflow.execute` | workflow_run_id, template_name | +| Step execution | `workflow.step.{type}` | step_run_id, node_id, inputs | +| Deployment job | `deployment.execute` | job_id, environment, strategy | +| Agent task | `agent.task.{type}` | task_id, agent_id, target_id | +| Plugin call | `plugin.{method}` | plugin_id, method, duration | + +--- + +## Trace Hierarchy + +### Promotion Flow + +``` +promotion.request (root) ++-- promotion.evaluate_gates +| +-- gate.security +| +-- gate.approval +| +-- gate.freeze_window +| ++-- workflow.execute +| +-- workflow.step.security-check +| +-- workflow.step.approval +| +-- workflow.step.deploy +| +-- deployment.execute +| +-- deployment.assign_tasks +| +-- agent.task.pull +| +-- agent.task.deploy +| +-- agent.task.health_check +| ++-- evidence.generate + +-- evidence.sign +``` + +--- + +## Span Attributes + +### Common Attributes + +| Attribute | Type | Description | +|-----------|------|-------------| +| `tenant.id` | string | Tenant UUID | +| `user.id` | string | User UUID (if
authenticated) | +| `release.id` | string | Release UUID | +| `environment.name` | string | Environment name | +| `error` | boolean | Whether error occurred | +| `error.type` | string | Error type/class | + +### Promotion Attributes + +| Attribute | Type | Description | +|-----------|------|-------------| +| `promotion.id` | string | Promotion UUID | +| `promotion.status` | string | Current status | +| `promotion.gates` | string[] | Gates evaluated | +| `promotion.decision` | string | allow/deny | + +### Deployment Attributes + +| Attribute | Type | Description | +|-----------|------|-------------| +| `deployment.job_id` | string | Deployment job UUID | +| `deployment.strategy` | string | Deployment strategy | +| `deployment.target_count` | int | Number of targets | +| `deployment.batch_size` | int | Batch size | + +### Agent Task Attributes + +| Attribute | Type | Description | +|-----------|------|-------------| +| `task.id` | string | Task UUID | +| `task.type` | string | Task type | +| `agent.id` | string | Agent UUID | +| `target.id` | string | Target UUID | + +--- + +## OpenTelemetry Configuration + +### SDK Configuration + +```yaml +# otel-config.yaml +service: + name: stella-release-orchestrator + version: ${VERSION} + +exporters: + otlp: + endpoint: otel-collector:4317 + protocol: grpc + +processors: + batch: + timeout: 10s + send_batch_size: 1024 + +resource: + attributes: + - key: service.namespace + value: stella-ops + - key: deployment.environment + value: ${ENVIRONMENT} +``` + +### Environment Variables + +```bash +OTEL_SERVICE_NAME=stella-release-orchestrator +OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317 +OTEL_EXPORTER_OTLP_PROTOCOL=grpc +OTEL_TRACES_SAMPLER=parentbased_traceidratio +OTEL_TRACES_SAMPLER_ARG=0.1 +``` + +--- + +## Sampling Strategy + +| Environment | Sampling Rate | Reason | +|-------------|---------------|--------| +| Development | 100% | Full visibility | +| Staging | 100% | Full visibility | +| Production | 10% | 
Cost/performance | +| Production (errors) | 100% | Always sample errors | + +--- + +## Example Trace + +```json +{ + "traceId": "4bf92f3577b34da6a3ce929d0e0e4736", + "spans": [ + { + "spanId": "00f067aa0ba902b7", + "name": "promotion.request", + "duration_ms": 5234, + "attributes": { + "promotion.id": "promo-123", + "release.id": "rel-456", + "environment.name": "production" + } + }, + { + "spanId": "00f067aa0ba902b8", + "parentSpanId": "00f067aa0ba902b7", + "name": "gate.security", + "duration_ms": 234, + "attributes": { + "gate.result": "passed", + "vulnerabilities.critical": 0 + } + } + ] +} +``` + +--- + +## See Also + +- [Observability Overview](overview.md) +- [Logging](logging.md) +- [Metrics](metrics.md) +- [Alerting](alerting.md) diff --git a/docs/modules/release-orchestrator/progressive-delivery/ab-releases.md b/docs/modules/release-orchestrator/progressive-delivery/ab-releases.md new file mode 100644 index 000000000..4b1075b04 --- /dev/null +++ b/docs/modules/release-orchestrator/progressive-delivery/ab-releases.md @@ -0,0 +1,266 @@ +# A/B Release Models + +> Two models for A/B releases: target-group based and router-based traffic splitting. + +**Status:** Planned (not yet implemented) +**Source:** [Architecture Advisory Section 11.2](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md) +**Related Modules:** [Progressive Delivery Module](../modules/progressive-delivery.md), [Traffic Router](routers.md) +**Sprint:** [110_001 A/B Release Manager](../../../../implplan/SPRINT_20260110_110_001_PROGDL_ab_release_manager.md) + +## Overview + +Stella Ops supports two distinct models for A/B releases: + +1. **Target-Group A/B:** Scale different target groups to shift workload +2. **Router-Based A/B:** Use traffic routers to split requests between variations + +Each model has different use cases, trade-offs, and implementation requirements. 
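One practical difference between the two models is how a stage translates into request share. In the target-group model, group B's share of work is roughly proportional to its count of active targets, not to the stage percentage itself. A small sketch, assuming work is spread uniformly across all active targets (function and type names are illustrative):

```typescript
// Estimate group B's share of total work for a target-group A/B stage.
// Assumption: work is distributed uniformly across all active targets.
interface ScaleStage {
  groupAPercentage: number; // % of group A targets active
  groupBPercentage: number; // % of group B targets active
}

function estimateGroupBShare(stage: ScaleStage, sizeA: number, sizeB: number): number {
  const activeA = Math.round((sizeA * stage.groupAPercentage) / 100);
  const activeB = Math.round((sizeB * stage.groupBPercentage) / 100);
  const total = activeA + activeB;
  return total === 0 ? 0 : activeB / total;
}
```

For example, a stage that keeps 100% of group A and 10% of group B active, with ten targets per group, gives group B 1 of 11 active targets (about 9% of work) rather than 10% — a nuance worth remembering when interpreting canary metrics for this model.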
+ +--- + +## Model 1: Target-Group A/B + +Target-group A/B splits traffic by scaling different groups of targets. Suitable for worker services, background processors, and scenarios where sticky sessions are not required. + +### Configuration + +```typescript +interface TargetGroupABConfig { + type: "target-group"; + + // Group definitions (select by id, or by labels when no id is given) + groupA: { + targetGroupId?: UUID; + labels?: Record<string, string>; + }; + groupB: { + targetGroupId?: UUID; + labels?: Record<string, string>; + }; + + // Rollout by scaling groups + rolloutStrategy: { + type: "scale-groups"; + stages: ScaleStage[]; + }; +} + +interface ScaleStage { + name: string; + groupAPercentage: number; // Percentage of group A targets active + groupBPercentage: number; // Percentage of group B targets active + duration?: number; // Auto-advance after duration (seconds) + healthThreshold?: number; // Required health % to advance + requireApproval?: boolean; +} +``` + +### Example: Worker Service Canary + +```typescript +const workerCanaryConfig: TargetGroupABConfig = { + type: "target-group", + groupA: { labels: { "worker-group": "A" } }, + groupB: { labels: { "worker-group": "B" } }, + rolloutStrategy: { + type: "scale-groups", + stages: [ + // Stage 1: 100% A, 10% B (canary) + { name: "canary", groupAPercentage: 100, groupBPercentage: 10, + duration: 300, healthThreshold: 95 }, + // Stage 2: 100% A, 50% B + { name: "expand", groupAPercentage: 100, groupBPercentage: 50, + duration: 600, healthThreshold: 95 }, + // Stage 3: 50% A, 100% B + { name: "shift", groupAPercentage: 50, groupBPercentage: 100, + duration: 600, healthThreshold: 95 }, + // Stage 4: 0% A, 100% B (complete) + { name: "complete", groupAPercentage: 0, groupBPercentage: 100, + requireApproval: true }, + ], + }, +}; +``` + +### Use Cases + +- Background job processors +- Worker services without external traffic +- Infrastructure-level splitting +- Static traffic distribution +- Hardware-based variants + +--- + +## Model 2: Router-Based A/B + +Router-based A/B uses traffic
routers (Nginx, HAProxy, ALB) to split incoming requests between variations. Suitable for APIs, web services, and scenarios requiring sticky sessions. + +### Configuration + +```typescript +interface RouterBasedABConfig { + type: "router-based"; + + // Router integration + routerIntegrationId: UUID; + + // Upstream configuration + upstreamName: string; + variationA: { + targets: string[]; + serviceName?: string; + }; + variationB: { + targets: string[]; + serviceName?: string; + }; + + // Traffic split configuration + trafficSplit: TrafficSplitConfig; + + // Rollout strategy + rolloutStrategy: RouterRolloutStrategy; +} + +interface TrafficSplitConfig { + type: "weight" | "header" | "cookie" | "tenant" | "composite"; + + // Weight-based (percentage) + weights?: { A: number; B: number }; + + // Header-based + headerName?: string; + headerValueA?: string; + headerValueB?: string; + + // Cookie-based + cookieName?: string; + cookieValueA?: string; + cookieValueB?: string; + + // Tenant-based (by host/path) + tenantRules?: TenantRule[]; +} +``` + +### Rollout Strategy + +```typescript +interface RouterRolloutStrategy { + type: "manual" | "time-based" | "health-based" | "composite"; + stages: RouterRolloutStage[]; +} + +interface RouterRolloutStage { + name: string; + trafficPercentageB: number; // % of traffic to variation B + + // Advancement criteria + duration?: number; // Auto-advance after duration + healthThreshold?: number; // Required health % + errorRateThreshold?: number; // Max error rate % + latencyThreshold?: number; // Max p99 latency ms + requireApproval?: boolean; + + // Optional: specific routing rules for this stage + routingOverrides?: TrafficSplitConfig; +} +``` + +### Example: API Canary with Health-Based Advancement + +```typescript +const apiCanaryConfig: RouterBasedABConfig = { + type: "router-based", + routerIntegrationId: "nginx-prod", + upstreamName: "api-backend", + variationA: { serviceName: "api-v1" }, + variationB: { serviceName: "api-v2" 
}, + trafficSplit: { type: "weight", weights: { A: 100, B: 0 } }, + rolloutStrategy: { + type: "health-based", + stages: [ + { name: "canary-10", trafficPercentageB: 10, + duration: 300, healthThreshold: 99, errorRateThreshold: 1 }, + { name: "canary-25", trafficPercentageB: 25, + duration: 600, healthThreshold: 99, errorRateThreshold: 1 }, + { name: "canary-50", trafficPercentageB: 50, + duration: 900, healthThreshold: 99, errorRateThreshold: 1 }, + { name: "promote", trafficPercentageB: 100, + requireApproval: true }, + ], + }, +}; +``` + +### Use Cases + +- API services with external traffic +- Web applications with user sessions +- Dynamic traffic distribution +- User-based variants (A/B testing) +- Feature flags and gradual rollouts + +--- + +## Routing Strategies + +### Weight-Based Routing + +Splits traffic by percentage across variations. + +```yaml +trafficSplit: + type: weight + weights: + A: 90 + B: 10 +``` + +### Header-Based Routing + +Routes based on request header values. + +```yaml +trafficSplit: + type: header + headerName: X-Feature-Flag + headerValueA: "control" + headerValueB: "experiment" +``` + +### Cookie-Based Routing + +Routes based on cookie values for sticky sessions. 
+ +```yaml +trafficSplit: + type: cookie + cookieName: ab_variation + cookieValueA: "A" + cookieValueB: "B" +``` + +--- + +## Comparison Matrix + +| Aspect | Target-Group A/B | Router-Based A/B | +|--------|------------------|------------------| +| **Traffic Control** | By scaling targets | By routing rules | +| **Sticky Sessions** | Not supported | Supported | +| **Granularity** | Target-level | Request-level | +| **External Traffic** | Not required | Required | +| **Infrastructure** | Target groups | Traffic router | +| **Use Case** | Workers, batch jobs | APIs, web apps | +| **Rollback Speed** | Slower (scaling) | Immediate (routing) | + +--- + +## See Also + +- [Progressive Delivery Module](../modules/progressive-delivery.md) +- [Canary Controller](canary.md) +- [Router Plugins](routers.md) +- [Deployment Strategies](../deployment/strategies.md) diff --git a/docs/modules/release-orchestrator/progressive-delivery/canary.md b/docs/modules/release-orchestrator/progressive-delivery/canary.md new file mode 100644 index 000000000..f0a59fa09 --- /dev/null +++ b/docs/modules/release-orchestrator/progressive-delivery/canary.md @@ -0,0 +1,270 @@ +# Canary Controller + +> Automated canary deployment controller with health-based stage advancement and automatic rollback. + +**Status:** Planned (not yet implemented) +**Source:** [Architecture Advisory Section 11.3](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md) +**Related Modules:** [Progressive Delivery Module](../modules/progressive-delivery.md), [Deployment Strategies](../deployment/strategies.md) +**Sprint:** [110_003 Canary Controller](../../../../implplan/SPRINT_20260110_110_003_PROGDL_canary_controller.md) + +## Overview + +The Canary Controller automates progressive rollout of new versions by gradually shifting traffic, monitoring health metrics, and automatically rolling back if issues are detected. 
+ +--- + +## Canary State Machine + +### States + +``` +CREATED -> DEPLOYING -> EVALUATING -> PROMOTING/ROLLING_BACK -> COMPLETED +``` + +| State | Description | +|-------|-------------| +| `CREATED` | Canary release defined, not started | +| `DEPLOYING` | Deploying variation B to targets | +| `EVALUATING` | Monitoring health metrics at current stage | +| `PROMOTING` | Advancing to next stage | +| `ROLLING_BACK` | Reverting to variation A | +| `COMPLETED` | Final state (promoted or rolled back) | + +--- + +## Implementation + +### Canary Controller Class + +```typescript +class CanaryController { + async executeRollout(abRelease: ABRelease): Promise<void> { + const strategy = abRelease.rolloutStrategy; + + for (let i = 0; i < strategy.stages.length; i++) { + const stage = strategy.stages[i]; + const stageRecord = await this.startStage(abRelease, stage, i); + + try { + // 1. Apply traffic configuration for this stage + await this.applyStageTraffic(abRelease, stage); + this.emit("canary.stage_started", { abRelease, stage, stageNumber: i }); + + // 2. Wait for stage completion based on criteria + const result = await this.waitForStageCompletion(abRelease, stage); + + if (!result.success) { + // Health check failed - rollback + this.log(`Stage ${stage.name} failed health check: ${result.reason}`); + await this.rollback(abRelease, result.reason); + return; + } + + // 3.
Check if approval required + if (stage.requireApproval) { + this.log(`Stage ${stage.name} requires approval`); + await this.pauseForApproval(abRelease, stage); + + // Wait for approval + const approval = await this.waitForApproval(abRelease, stage); + if (!approval.approved) { + await this.rollback(abRelease, "Approval denied"); + return; + } + } + + await this.completeStage(stageRecord, "succeeded"); + this.emit("canary.stage_completed", { abRelease, stage, stageNumber: i }); + + } catch (error) { + await this.completeStage(stageRecord, "failed", error.message); + await this.rollback(abRelease, error.message); + return; + } + } + + // Rollout complete + await this.completeRollout(abRelease); + this.emit("canary.promoted", { abRelease }); + } +} +``` + +### Stage Completion Logic + +```typescript +private async waitForStageCompletion( + abRelease: ABRelease, + stage: RolloutStage +): Promise<{ success: boolean; reason?: string }> { + + const startTime = Date.now(); + const checkInterval = 30000; // 30 seconds + + while (true) { + // Check health metrics + const health = await this.checkHealth(abRelease, stage); + + if (!health.healthy) { + return { + success: false, + reason: `Health check failed: ${health.reason}` + }; + } + + // Check error rate (if threshold configured) + if (stage.errorRateThreshold !== undefined) { + const errorRate = await this.getErrorRate(abRelease); + if (errorRate > stage.errorRateThreshold) { + return { + success: false, + reason: `Error rate ${errorRate}% exceeds threshold ${stage.errorRateThreshold}%` + }; + } + } + + // Check latency (if threshold configured) + if (stage.latencyThreshold !== undefined) { + const latency = await this.getP99Latency(abRelease); + if (latency > stage.latencyThreshold) { + return { + success: false, + reason: `P99 latency ${latency}ms exceeds threshold ${stage.latencyThreshold}ms` + }; + } + } + + // Check duration (auto-advance) + if (stage.duration !== undefined) { + const elapsed = (Date.now() - startTime) / 1000; + if (elapsed >=
stage.duration) { + return { success: true }; + } + } + + // Wait before next check + await sleep(checkInterval); + } +} +``` + +### Traffic Application + +```typescript +private async applyStageTraffic(abRelease: ABRelease, stage: RolloutStage): Promise<void> { + if (abRelease.config.type === "router-based") { + const router = await this.getRouterConnector(abRelease.config.routerIntegrationId); + + await router.shiftTraffic( + abRelease.config.variationA.serviceName, + abRelease.config.variationB.serviceName, + stage.trafficPercentageB + ); + + } else if (abRelease.config.type === "target-group") { + // Scale target groups + await this.scaleTargetGroup( + abRelease.config.groupA, + stage.groupAPercentage + ); + await this.scaleTargetGroup( + abRelease.config.groupB, + stage.groupBPercentage + ); + } +} +``` + +### Rollback + +```typescript +async rollback(abRelease: ABRelease, reason: string): Promise<void> { + this.log(`Rolling back A/B release: ${reason}`); + this.emit("canary.rollback_started", { abRelease, reason }); + + if (abRelease.config.type === "router-based") { + // Shift all traffic back to A + const router = await this.getRouterConnector(abRelease.config.routerIntegrationId); + await router.shiftTraffic( + abRelease.config.variationB.serviceName, + abRelease.config.variationA.serviceName, + 100 + ); + + } else if (abRelease.config.type === "target-group") { + // Scale B to 0, A to 100 + await this.scaleTargetGroup(abRelease.config.groupA, 100); + await this.scaleTargetGroup(abRelease.config.groupB, 0); + } + + abRelease.status = "rolled_back"; + await this.save(abRelease); + + this.emit("canary.rolled_back", { abRelease, reason }); +} +``` + +--- + +## Configuration + +### Canary Stages + +```yaml +rolloutStrategy: + type: health-based + stages: + - name: canary-5 + trafficPercentageB: 5 + duration: 300 # 5 minutes + healthThreshold: 99 + errorRateThreshold: 0.5 + + - name: canary-25 + trafficPercentageB: 25 + duration: 600 # 10 minutes + healthThreshold: 99 +
errorRateThreshold: 1.0 + + - name: canary-50 + trafficPercentageB: 50 + duration: 900 # 15 minutes + healthThreshold: 99 + errorRateThreshold: 1.0 + + - name: promote + trafficPercentageB: 100 + requireApproval: true +``` + +### Health Metrics + +| Metric | Description | Typical Threshold | +|--------|-------------|-------------------| +| Success Rate | % of successful requests | > 99% | +| Error Rate | % of failed requests | < 1% | +| P99 Latency | 99th percentile response time | < 500ms | +| Health Check | Container/service health | Healthy | + +--- + +## Events + +The canary controller emits events for observability: + +| Event | Description | +|-------|-------------| +| `canary.stage_started` | Stage execution began | +| `canary.stage_completed` | Stage completed successfully | +| `canary.rollback_started` | Rollback initiated | +| `canary.rolled_back` | Rollback completed | +| `canary.promoted` | Full promotion completed | + +--- + +## See Also + +- [Progressive Delivery Module](../modules/progressive-delivery.md) +- [A/B Release Models](ab-releases.md) +- [Router Plugins](routers.md) +- [Metrics](../operations/metrics.md) diff --git a/docs/modules/release-orchestrator/progressive-delivery/routers.md b/docs/modules/release-orchestrator/progressive-delivery/routers.md new file mode 100644 index 000000000..ba1d0715c --- /dev/null +++ b/docs/modules/release-orchestrator/progressive-delivery/routers.md @@ -0,0 +1,348 @@ +# Router Plugins + +> Traffic router plugins for progressive delivery (Nginx, AWS ALB, and custom implementations). 
+ +**Status:** Planned (not yet implemented) +**Source:** [Architecture Advisory Section 11.4](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md) +**Related Modules:** [Progressive Delivery Module](../modules/progressive-delivery.md), [Plugin System](../modules/plugin-system.md) +**Sprint:** [110_004 Router Plugins](../../../../implplan/SPRINT_20260110_110_004_PROGDL_nginx_router.md) + +## Overview + +Router plugins enable traffic shifting for progressive delivery. The orchestrator ships with an Nginx router plugin for v1, with HAProxy, Traefik, and AWS ALB available as additional plugins. + +--- + +## Router Plugin Interface + +All router plugins implement the `TrafficRouterPlugin` interface: + +```typescript +interface TrafficRouterPlugin { + // Configuration + configureRoute(config: RouteConfig): Promise<void>; + + // Traffic operations + shiftTraffic(from: string, to: string, percentage: number): Promise<void>; + getTrafficDistribution(): Promise<TrafficDistribution>; + + // Health + validateConfig(): Promise<ValidationResult>; + reload(): Promise<void>; +} + +interface ValidationResult { + valid: boolean; + error?: string; +} + +interface RouteConfig { + upstream: string; + serverName: string; + variations: Variation[]; + splitType: "weight" | "header" | "cookie"; + headerName?: string; + headerValueB?: string; + stickySession?: boolean; + stickyDuration?: number; +} + +interface Variation { + name: string; + targets: string[]; + weight: number; +} + +interface TrafficDistribution { + variations: { + name: string; + percentage: number; + targets: string[]; + }[]; +} +``` + +--- + +## Nginx Router Plugin (v1 Built-in) + +The Nginx plugin generates and manages Nginx configuration for traffic splitting.
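For orientation, here is the shape of configuration the plugin emits for a weight-based 90/10 split. This is a sketch only; the upstream name, backend addresses, and server name are illustrative:

```nginx
# Per-variation upstreams, one per variation (addresses illustrative)
upstream api_A {
  server 10.0.0.10:8080;
}

upstream api_B {
  server 10.0.0.20:8080;
}

# Combined upstream for weight-based routing: nginx distributes requests
# in proportion to server weights (here 90% to A, 10% to B)
upstream api {
  server 10.0.0.10:8080 weight=90;
  server 10.0.0.20:8080 weight=10;
}

server {
  listen 80;
  server_name api.example.internal;

  location / {
    proxy_pass http://api;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
  }
}
```

Shifting traffic amounts to rewriting the `weight=` values and reloading nginx, which is exactly what the plugin code below automates.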
+ +### Implementation + +```typescript +class NginxRouterPlugin implements TrafficRouterPlugin { + async configureRoute(config: RouteConfig): Promise<void> { + const upstreamConfig = this.generateUpstreamConfig(config); + const serverConfig = this.generateServerConfig(config); + + // Write configuration files + await this.writeConfig( + `/etc/nginx/conf.d/upstream-${config.upstream}.conf`, + upstreamConfig + ); + await this.writeConfig( + `/etc/nginx/conf.d/server-${config.upstream}.conf`, + serverConfig + ); + + // Validate configuration + const validation = await this.validateConfig(); + if (!validation.valid) { + throw new Error(`Nginx config validation failed: ${validation.error}`); + } + + // Reload nginx + await this.reload(); + } +} +``` + +### Upstream Configuration + +```typescript +private generateUpstreamConfig(config: RouteConfig): string { + const lines: string[] = []; + + for (const variation of config.variations) { + lines.push(`upstream ${config.upstream}_${variation.name} {`); + + for (const target of variation.targets) { + lines.push(` server ${target};`); + } + + lines.push(`}`); + lines.push(``); + } + + // Combined upstream with weights (for percentage-based routing) + if (config.splitType === "weight") { + lines.push(`upstream ${config.upstream} {`); + + for (const variation of config.variations) { + const weight = variation.weight; + for (const target of variation.targets) { + lines.push(` server ${target} weight=${weight};`); + } + } + + lines.push(`}`); + } + + return lines.join("\n"); +} +``` + +### Server Configuration + +```typescript +private generateServerConfig(config: RouteConfig): string { + if (config.splitType === "header" || config.splitType === "cookie") { + // Split block based on header/cookie + return ` +map $http_${config.headerName || "x-variation"} $${config.upstream}_backend { + default ${config.upstream}_A; + "${config.headerValueB || "B"}" ${config.upstream}_B; +} + +server { + listen 80; + server_name ${config.serverName}; + 
+ + location / { + proxy_pass http://$${config.upstream}_backend; + proxy_set_header Host $host; + proxy_set_header X-Real-IP $remote_addr; + } +} +`; + } else { + // Weight-based (default) + return ` +server { + listen 80; + server_name ${config.serverName}; + + location / { + proxy_pass http://${config.upstream}; + proxy_set_header Host $host; + proxy_set_header X-Real-IP $remote_addr; + } +} +`; + } +} +``` + +### Traffic Shifting + +```typescript +async shiftTraffic(from: string, to: string, percentage: number): Promise<void> { + const config = await this.getCurrentConfig(); + + // Update weights + for (const variation of config.variations) { + if (variation.name === to) { + variation.weight = percentage; + } else { + variation.weight = 100 - percentage; + } + } + + await this.configureRoute(config); +} + +async getTrafficDistribution(): Promise<TrafficDistribution> { + // Parse current nginx config to get weights + const config = await this.parseCurrentConfig(); + + return { + variations: config.variations.map(v => ({ + name: v.name, + percentage: v.weight, + targets: v.targets, + })), + }; +} +``` + +--- + +## AWS ALB Router Plugin + +The AWS ALB plugin manages weighted target groups for traffic splitting.
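Both the Nginx plugin above and the ALB plugin below apply the same weight arithmetic in `shiftTraffic`: the `to` variation receives `percentage`, and every other variation receives the remainder. As a standalone sketch (the helper name is hypothetical) — note this arithmetic is only sound for exactly two variations; with three or more, giving each remaining variation `100 - percentage` would over-allocate traffic:

```typescript
interface WeightedVariation {
  name: string;
  weight: number; // percentage of traffic, 0-100
}

// Hypothetical pure helper mirroring the update loop in shiftTraffic:
// the "to" variation gets `percentage`, the other gets the remainder.
// Only valid for exactly two variations.
function shiftWeights(
  variations: WeightedVariation[],
  to: string,
  percentage: number
): WeightedVariation[] {
  return variations.map(v => ({
    ...v,
    weight: v.name === to ? percentage : 100 - percentage,
  }));
}
```

For example, `shiftWeights([{ name: "A", weight: 100 }, { name: "B", weight: 0 }], "B", 25)` yields A=75/B=25, and a full rollback is simply a shift of 100% back to A.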
+ +### Implementation + +```typescript +class AWSALBRouterPlugin implements TrafficRouterPlugin { + private alb: AWS.ELBv2; + + async configureRoute(config: RouteConfig): Promise<void> { + const listenerArn = config.listenerArn; + + // Create/update target groups for each variation + const targetGroupArns: Record<string, string> = {}; + + for (const variation of config.variations) { + const tgArn = await this.ensureTargetGroup( + `${config.upstream}-${variation.name}`, + variation.targets + ); + targetGroupArns[variation.name] = tgArn; + } + + // Update listener rule with weighted target groups + await this.alb.modifyRule({ + RuleArn: config.ruleArn, + Actions: [{ + Type: "forward", + ForwardConfig: { + TargetGroups: config.variations.map(v => ({ + TargetGroupArn: targetGroupArns[v.name], + Weight: v.weight, + })), + TargetGroupStickinessConfig: { + Enabled: config.stickySession || false, + DurationSeconds: config.stickyDuration || 3600, + }, + }, + }], + }).promise(); + } + + async shiftTraffic(from: string, to: string, percentage: number): Promise<void> { + const rule = await this.getRule(); + const forwardConfig = rule.Actions[0].ForwardConfig; + + // Update weights + for (const tg of forwardConfig.TargetGroups) { + if (tg.TargetGroupArn.includes(`-${to}`)) { + tg.Weight = percentage; + } else { + tg.Weight = 100 - percentage; + } + } + + await this.alb.modifyRule({ + RuleArn: rule.RuleArn, + Actions: rule.Actions, + }).promise(); + } + + async getTrafficDistribution(): Promise<TrafficDistribution> { + const rule = await this.getRule(); + const forwardConfig = rule.Actions[0].ForwardConfig; + + const variations = []; + for (const tg of forwardConfig.TargetGroups) { + const targets = await this.getTargetGroupTargets(tg.TargetGroupArn); + const name = tg.TargetGroupArn.split("-").pop(); + + variations.push({ + name, + percentage: tg.Weight, + targets: targets.map(t => t.Id), + }); + } + + return { variations }; + } +} +``` + +--- + +## Router Plugin Catalog + +| Plugin | Status | Description | 
+|--------|--------|-------------| +| Nginx | v1 Built-in | Configuration-based weight/header routing | +| HAProxy | Plugin | Runtime API for traffic management | +| Traefik | Plugin | Dynamic configuration via API | +| AWS ALB | Plugin | Weighted target groups | +| Envoy | Planned | xDS API integration | + +--- + +## Creating Custom Router Plugins + +To create a custom router plugin: + +1. **Implement Interface:** Create a class implementing `TrafficRouterPlugin` +2. **Register Plugin:** Add to plugin registry with capabilities +3. **Configuration Schema:** Define JSON Schema for plugin config +4. **Health Checks:** Implement connection testing +5. **Rollback Support:** Handle traffic reversion on failures + +### Example Plugin Manifest + +```yaml +plugin: + name: my-router + version: 1.0.0 + type: router + +capabilities: + - traffic-routing + - weight-based + - header-based + +config: + type: object + properties: + endpoint: + type: string + description: Router API endpoint + auth: + type: object + properties: + type: + enum: [basic, token] + credentialRef: + type: string +``` + +--- + +## See Also + +- [Progressive Delivery Module](../modules/progressive-delivery.md) +- [Plugin System](../modules/plugin-system.md) +- [Canary Controller](canary.md) +- [A/B Release Models](ab-releases.md) diff --git a/docs/modules/release-orchestrator/roadmap.md b/docs/modules/release-orchestrator/roadmap.md new file mode 100644 index 000000000..cae87d8c0 --- /dev/null +++ b/docs/modules/release-orchestrator/roadmap.md @@ -0,0 +1,246 @@ +# Implementation Roadmap + +> Phased delivery plan for the Release Orchestrator implementation. 
+ +**Status:** Planned +**Source:** [Architecture Advisory Section 14](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md) +**Related:** [Implementation Guide](implementation-guide.md), [Test Structure](test-structure.md) + +## Overview + +The Release Orchestrator is delivered in 8 phases over 34 weeks, progressively building from foundational infrastructure to full plugin ecosystem support. + +--- + +## Phased Delivery Plan + +### Phase 1: Foundation (Weeks 1-4) + +**Goal:** Core infrastructure and basic release management + +| Week | Deliverables | +|------|--------------| +| Week 1 | Database schema migration; INTHUB integration-manager; connection-profiles | +| Week 2 | ENVMGR environment-manager; target-registry (basic) | +| Week 3 | RELMAN component-registry; version-manager; release-manager | +| Week 4 | Basic release CRUD APIs; CLI commands; integration tests | + +**Exit Criteria:** +- Can create environments with config +- Can register components with image repos +- Can create releases with pinned digests +- Can list/search releases + +**Certified Path:** Manual release creation; no deployment yet + +--- + +### Phase 2: Workflow Engine (Weeks 5-8) + +**Goal:** Workflow execution capability + +| Week | Deliverables | +|------|--------------| +| Week 5 | WORKFL step-registry; built-in step types (approval, policy-gate, notify) | +| Week 6 | WORKFL workflow-designer; workflow template CRUD | +| Week 7 | WORKFL workflow-engine; DAG execution; state machine | +| Week 8 | Step executor; retry logic; timeout handling; workflow run APIs | + +**Exit Criteria:** +- Can create workflow templates via API +- Can execute workflows with approval steps +- Workflow state machine handles all transitions +- Step retries work correctly + +**Certified Path:** Approval-only workflows; no deployment execution yet + +--- + +### Phase 3: Promotion & Decision (Weeks 9-12) + +**Goal:** Promotion workflow with security gates + +| Week | 
Deliverables | +|------|--------------| +| Week 9 | PROMOT promotion-manager; approval-gateway | +| Week 10 | PROMOT decision-engine; security gate integration with SCANENG | +| Week 11 | Gate registry; freeze window gate; SoD enforcement | +| Week 12 | Promotion APIs; "Why blocked?" endpoint; decision record | + +**Exit Criteria:** +- Can request promotion +- Security gates evaluate scan verdicts +- Approval workflow enforces SoD +- Decision record captures gate results + +**Certified Path:** Promotions with security + approval gates; no deployment yet + +--- + +### Phase 4: Deployment Execution (Weeks 13-18) + +**Goal:** Deploy to Docker/Compose targets + +| Week | Deliverables | +|------|--------------| +| Week 13 | AGENTS agent-core; agent registration; heartbeat | +| Week 14 | AGENTS agent-docker; Docker host deployment | +| Week 15 | AGENTS agent-compose; Compose deployment | +| Week 16 | DEPLOY deploy-orchestrator; artifact-generator | +| Week 17 | DEPLOY rollback-manager; version sticker writing | +| Week 18 | RELEVI evidence-collector; evidence-signer; audit-exporter | + +**Exit Criteria:** +- Agents can register and receive tasks +- Docker deployment works with digest verification +- Compose deployment writes lock files +- Rollback restores previous version +- Evidence packets generated for deployments + +**Certified Path:** Full promotion -> deployment flow for Docker/Compose + +--- + +### Phase 5: UI & Polish (Weeks 19-22) + +**Goal:** Web console for release orchestration + +| Week | Deliverables | +|------|--------------| +| Week 19 | Dashboard components; metrics widgets | +| Week 20 | Environment overview; release detail screens | +| Week 21 | Workflow editor (graph); run visualization | +| Week 22 | Promotion UI; approval queue; "Why blocked?" 
modal | + +**Exit Criteria:** +- Dashboard shows operational metrics +- Can manage environments/releases via UI +- Can create/edit workflows in graph editor +- Can approve promotions via UI + +**Certified Path:** Complete v1 user experience + +--- + +### Phase 6: Progressive Delivery (Weeks 23-26) + +**Goal:** A/B releases and canary deployments + +| Week | Deliverables | +|------|--------------| +| Week 23 | PROGDL ab-manager; target-group A/B | +| Week 24 | PROGDL canary-controller; stage execution | +| Week 25 | PROGDL traffic-router; Nginx plugin | +| Week 26 | Canary UI; traffic visualization; health monitoring | + +**Exit Criteria:** +- Can create A/B release with variations +- Canary controller advances stages based on health +- Traffic router shifts weights +- Rollback on health failure works + +**Certified Path:** Target-group A/B; Nginx router-based A/B + +--- + +### Phase 7: Extended Targets (Weeks 27-30) + +**Goal:** ECS and Nomad support; SSH/WinRM agentless + +| Week | Deliverables | +|------|--------------| +| Week 27 | AGENTS agent-ssh; SSH remote executor | +| Week 28 | AGENTS agent-winrm; WinRM remote executor | +| Week 29 | AGENTS agent-ecs; ECS deployment | +| Week 30 | AGENTS agent-nomad; Nomad deployment | + +**Exit Criteria:** +- SSH deployment works with script execution +- WinRM deployment works with PowerShell +- ECS task definition updates work +- Nomad job submissions work + +**Certified Path:** All target types operational + +--- + +### Phase 8: Plugin Ecosystem (Weeks 31-34) + +**Goal:** Full plugin system; external integrations + +| Week | Deliverables | +|------|--------------| +| Week 31 | PLUGIN plugin-registry; plugin-loader | +| Week 32 | PLUGIN plugin-sandbox; plugin-sdk | +| Week 33 | GitHub plugin; GitLab plugin | +| Week 34 | Jenkins plugin; Vault plugin | + +**Exit Criteria:** +- Can install and configure plugins +- Plugins can contribute step types +- Plugins can contribute integrations +- Plugin sandbox enforces limits + 
+**Certified Path:** GitHub + Harbor + Docker/Compose + Vault + +--- + +## Resource Requirements + +### Team Structure + +| Role | Count | Responsibilities | +|------|-------|------------------| +| Tech Lead | 1 | Architecture decisions; code review; unblocking | +| Backend Engineers | 4 | Module development; API implementation | +| Frontend Engineers | 2 | Web console; dashboard; workflow editor | +| DevOps Engineer | 1 | CI/CD; infrastructure; agent deployment | +| QA Engineer | 1 | Test automation; integration testing | +| Technical Writer | 0.5 | Documentation; API docs; user guides | + +### Infrastructure Requirements + +| Component | Specification | +|-----------|---------------| +| PostgreSQL | Primary database; 16+ recommended; read replicas for scale | +| Redis | Job queues; caching; session storage | +| Object Storage | S3-compatible; evidence packets; large artifacts | +| Container Runtime | Docker; for plugin sandboxes | +| Kubernetes | Optional; for Stella core deployment (not required for targets) | + +--- + +## Risk Mitigation + +| Risk | Likelihood | Impact | Mitigation | +|------|------------|--------|------------| +| Agent security complexity | High | High | Early security review; penetration testing; mTLS implementation in Phase 4 | +| Workflow state machine edge cases | Medium | High | Comprehensive state transition tests; chaos testing | +| Plugin sandbox escapes | Low | Critical | Security audit; capability restrictions; resource limits | +| Database migration issues | Medium | Medium | Staged rollout; rollback scripts; data validation | +| UI performance with large workflows | Medium | Medium | Virtual rendering; lazy loading; performance testing | +| Integration compatibility | High | Medium | Abstract connector interface; extensive integration tests | + +--- + +## Success Metrics + +| Phase | Key Metrics | +|-------|-------------| +| Phase 1 | Release creation time < 5s; API latency p99 < 200ms | +| Phase 2 | Workflow execution reliability 
> 99.9% | +| Phase 3 | Gate evaluation time < 500ms; SoD enforcement 100% | +| Phase 4 | Deployment success rate > 99%; rollback time < 60s | +| Phase 5 | UI initial load < 2s; real-time update latency < 1s | +| Phase 6 | Canary rollback trigger time < 30s | +| Phase 7 | All target type coverage with unified API | +| Phase 8 | Plugin sandbox isolation verified by security audit | + +--- + +## References + +- [Sprint Index](../../implplan/SPRINT_20260110_100_000_INDEX_release_orchestrator.md) +- [Implementation Guide](implementation-guide.md) +- [Test Structure](test-structure.md) +- [Architecture Overview](architecture.md) diff --git a/docs/modules/release-orchestrator/security/agent-security.md b/docs/modules/release-orchestrator/security/agent-security.md new file mode 100644 index 000000000..f2afabb27 --- /dev/null +++ b/docs/modules/release-orchestrator/security/agent-security.md @@ -0,0 +1,286 @@ +# Agent Security Model + +## Overview + +Agents are trusted components that execute deployment tasks on targets. Their security model ensures: +- Strong identity through mTLS certificates +- Minimal privilege through scoped task credentials +- Audit trail through signed task receipts +- Isolation through process sandboxing + +## Agent Registration Flow + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ AGENT REGISTRATION FLOW │ +│ │ +│ 1. Admin generates registration token (one-time use) │ +│ POST /api/v1/admin/agent-tokens │ +│ Response: { token: "reg_xxx", expiresAt: "..." } │ +│ │ +│ 2. Agent starts with registration token │ +│ ./stella-agent --register --token=reg_xxx │ +│ │ +│ 3. Agent requests mTLS certificate │ +│ POST /api/v1/agents/register │ +│ Headers: X-Registration-Token: reg_xxx │ +│ Body: { name, version, capabilities, csr } │ +│ Response: { agentId, certificate, caCertificate } │ +│ │ +│ 4. Agent establishes mTLS connection │ +│ Uses issued certificate for all subsequent requests │ +│ │ +│ 5. 
Agent requests short-lived JWT for task execution │ +│ POST /api/v1/agents/token (over mTLS) │ +│ Response: { token, expiresIn: 3600 } │ +│ │ +│ 6. Agent refreshes token before expiration │ +│ Token refresh only over mTLS connection │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +## mTLS Communication + +All agent-to-core communication uses mutual TLS: + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ AGENT COMMUNICATION SECURITY │ +│ │ +│ ┌──────────────┐ ┌──────────────┐ │ +│ │ AGENT │ │ STELLA CORE │ │ +│ └──────┬───────┘ └──────┬───────┘ │ +│ │ │ │ +│ │ mTLS (mutual TLS) │ │ +│ │ - Agent cert signed by Stella CA │ │ +│ │ - Server cert verified by Agent │ │ +│ │ - TLS 1.3 only │ │ +│ │ - Perfect forward secrecy │ │ +│ │◄────────────────────────────────────────►│ │ +│ │ │ │ +│ │ Encrypted payload │ │ +│ │ - Task payloads encrypted with │ │ +│ │ agent-specific key │ │ +│ │ - Logs encrypted in transit │ │ +│ │◄────────────────────────────────────────►│ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +### TLS Requirements + +| Requirement | Value | +|-------------|-------| +| Protocol | TLS 1.3 only | +| Cipher Suites | TLS_AES_256_GCM_SHA384, TLS_CHACHA20_POLY1305_SHA256 | +| Key Exchange | ECDHE with P-384 or X25519 | +| Certificate Key | RSA 4096-bit or ECDSA P-384 | +| Certificate Validity | 90 days (auto-renewed) | + +## Certificate Management + +### Certificate Structure + +```typescript +interface AgentCertificate { + subject: { + CN: string; // Agent name + O: string; // "Stella Ops" + OU: string; // Tenant ID + }; + serialNumber: string; + issuer: string; // Stella CA + validFrom: DateTime; + validTo: DateTime; + extensions: { + keyUsage: ["digitalSignature", "keyEncipherment"]; + extendedKeyUsage: ["clientAuth"]; + subjectAltName: string[]; // Agent ID as URI + }; +} +``` + +### Certificate Renewal + +Agents automatically 
renew certificates before expiration: +1. Agent detects certificate expiring within 30 days +2. Agent generates new CSR with same identity +3. Agent submits renewal request over existing mTLS connection +4. Authority issues new certificate +5. Agent transitions to new certificate seamlessly + +## Secrets Management + +Secrets are NEVER stored in the Stella database. Only vault references are stored. + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ SECRETS FLOW (NEVER STORED IN DB) │ +│ │ +│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ +│ │ VAULT │ │ STELLA CORE │ │ AGENT │ │ +│ │ (Source) │ │ (Broker) │ │ (Consumer) │ │ +│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │ +│ │ │ │ │ +│ │ │ Task requires secret │ │ +│ │ │ │ │ +│ │ Fetch with service │ │ │ +│ │ account token │ │ │ +│ │◄─────────────────────── │ │ +│ │ │ │ │ +│ │ Return secret │ │ │ +│ │ (wrapped, short TTL) │ │ │ +│ │────────────────────────► │ │ +│ │ │ │ │ +│ │ │ Embed in task payload │ │ +│ │ │ (encrypted) │ │ +│ │ │────────────────────────► │ +│ │ │ │ │ +│ │ │ │ Decrypt │ +│ │ │ │ Use for task │ +│ │ │ │ Discard │ +│ │ +│ Rules: │ +│ - Secrets NEVER stored in Stella database │ +│ - Only Vault references stored │ +│ - Secrets fetched at execution time only │ +│ - Secrets not logged (masked in logs) │ +│ - Secrets not persisted in agent memory beyond task scope │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +## Task Security + +### Task Assignment + +```typescript +interface AgentTask { + id: UUID; + type: TaskType; + targetId: UUID; + payload: TaskPayload; + credentials: EncryptedCredentials; // Encrypted with agent's public key + timeout: number; + priority: TaskPriority; + idempotencyKey: string; + assignedAt: DateTime; + expiresAt: DateTime; +} +``` + +### Credential Scoping + +Task credentials are: +- Scoped to specific target only +- Valid only for task duration +- Encrypted with agent's public 
key +- Logged when accessed (without values) + +### Task Execution Isolation + +Agents execute tasks with isolation: +```typescript +interface TaskExecutionContext { + // Process isolation + workingDirectory: string; // Unique per task + processUser: string; // Non-root user + networkNamespace: string; // If network isolation enabled + + // Resource limits + memoryLimit: number; // Bytes + cpuLimit: number; // Millicores + diskLimit: number; // Bytes + networkEgress: string[]; // Allowed destinations + + // Cleanup + cleanupOnComplete: boolean; + cleanupTimeout: number; +} +``` + +## Agent Capabilities + +Agents declare capabilities that determine what tasks they can execute: + +```typescript +interface AgentCapabilities { + docker?: DockerCapability; + compose?: ComposeCapability; + ssh?: SshCapability; + winrm?: WinrmCapability; + ecs?: EcsCapability; + nomad?: NomadCapability; +} + +interface DockerCapability { + version: string; + apiVersion: string; + runtimes: string[]; + registryAuth: boolean; +} + +interface ComposeCapability { + version: string; + fileFormats: string[]; +} +``` + +## Heartbeat Protocol + +```typescript +interface AgentHeartbeat { + agentId: UUID; + timestamp: DateTime; + status: "healthy" | "degraded"; + resourceUsage: { + cpuPercent: number; + memoryPercent: number; + diskPercent: number; + networkRxBytes: number; + networkTxBytes: number; + }; + activeTaskCount: number; + completedTasks: number; + failedTasks: number; + errors: string[]; + signature: string; // HMAC of heartbeat data +} +``` + +### Heartbeat Validation + +1. Verify signature matches expected HMAC +2. Check timestamp is within acceptable skew (30s) +3. Update agent status based on heartbeat content +4. Trigger alerts if heartbeat missing for >90s + +## Agent Revocation + +When an agent is compromised or decommissioned: + +1. Certificate added to CRL (Certificate Revocation List) +2. All pending tasks for agent cancelled +3. Agent removed from target assignments +4. 
Audit event logged +5. New agent can be registered with same name (new identity) + +## Security Checklist + +| Control | Implementation | +|---------|----------------| +| Identity | mTLS certificates signed by internal CA | +| Authentication | Certificate-based + short-lived JWT | +| Authorization | Task-scoped credentials | +| Encryption | TLS 1.3 for transport, envelope encryption for secrets | +| Isolation | Process sandboxing, resource limits | +| Audit | All task assignments and completions logged | +| Revocation | CRL for compromised agents | +| Secret handling | Vault integration, no persistence | + +## References + +- [Security Overview](overview.md) +- [Authentication & Authorization](auth.md) +- [Threat Model](threat-model.md) diff --git a/docs/modules/release-orchestrator/security/audit-trail.md b/docs/modules/release-orchestrator/security/audit-trail.md new file mode 100644 index 000000000..91b1cdcfb --- /dev/null +++ b/docs/modules/release-orchestrator/security/audit-trail.md @@ -0,0 +1,239 @@ +# Audit Trail + +> Audit event structure and audited operations for compliance and forensics. + +**Status:** Planned (not yet implemented) +**Source:** [Architecture Advisory Section 8.5](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md) +**Related Modules:** [Evidence Module](../modules/evidence.md), [Security Overview](overview.md) +**Sprints:** [109_001 Evidence Collector](../../../../implplan/SPRINT_20260110_109_001_RELEVI_evidence_collector.md) + +## Overview + +The Release Orchestrator maintains a tamper-evident audit trail of all security-relevant operations. Audit events are cryptographically chained to detect tampering. 
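The chaining scheme is a simple hash chain: each event embeds the hash of its predecessor, so modifying or removing any recorded event invalidates every later link. A runnable sketch of the idea, with the hashed field set reduced for brevity (the full payload, shown in `computeEventHash` later in this page, also covers actor, tenant, and resource fields):

```typescript
import { createHash } from "node:crypto";

// Reduced event shape for illustration only.
interface MiniEvent {
  id: string;
  action: string;
  previousEventHash: string;
  eventHash: string;
}

// Hash the recorded fields plus the predecessor's hash.
function hashOf(e: Omit<MiniEvent, "eventHash">): string {
  const payload = JSON.stringify({
    id: e.id,
    action: e.action,
    previousEventHash: e.previousEventHash,
  });
  return "sha256:" + createHash("sha256").update(payload).digest("hex");
}

// Append events, linking each to its predecessor's hash.
function buildChain(entries: { id: string; action: string }[]): MiniEvent[] {
  const chain: MiniEvent[] = [];
  let prev = "sha256:genesis"; // sentinel for the first event
  for (const entry of entries) {
    const partial = { ...entry, previousEventHash: prev };
    const event = { ...partial, eventHash: hashOf(partial) };
    chain.push(event);
    prev = event.eventHash;
  }
  return chain;
}

// Recompute every hash and check each link against its predecessor.
function verifyChain(chain: MiniEvent[]): boolean {
  let prev = "sha256:genesis";
  for (const e of chain) {
    if (e.previousEventHash !== prev) return false;
    if (hashOf(e) !== e.eventHash) return false;
    prev = e.eventHash;
  }
  return true;
}
```

Tampering with any event's recorded fields (or deleting an event from the middle of the chain) makes `verifyChain` fail from that point on.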
+ +--- + +## Audit Event Structure + +### TypeScript Interface + +```typescript +interface AuditEvent { + id: UUID; + timestamp: DateTime; + tenantId: UUID; + + // Actor + actorType: "user" | "agent" | "system" | "plugin"; + actorId: UUID; + actorName: string; + actorIp?: string; + + // Action + action: string; // "promotion.approved", "deployment.started" + resource: string; // "promotion" + resourceId: UUID; + + // Context + environmentId?: UUID; + releaseId?: UUID; + promotionId?: UUID; + + // Details + before?: object; // State before (for updates) + after?: object; // State after + metadata?: object; // Additional context + + // Integrity + previousEventHash: string; // Hash chain for tamper detection + eventHash: string; +} +``` + +--- + +## Audited Operations + +| Category | Operations | +|----------|------------| +| **Authentication** | Login, logout, token refresh, failed attempts | +| **Authorization** | Permission denied events | +| **Environments** | Create, update, delete, freeze window changes | +| **Releases** | Create, deprecate, archive | +| **Promotions** | Request, approve, reject, cancel | +| **Deployments** | Start, complete, fail, rollback | +| **Targets** | Register, update, delete, health changes | +| **Agents** | Register, heartbeat gaps, capability changes | +| **Integrations** | Create, update, delete, test | +| **Plugins** | Enable, disable, config changes | +| **Evidence** | Create (never update/delete) | + +--- + +## Hash Chain + +### Chain Verification + +The audit trail uses SHA-256 hash chaining for tamper detection: + +```typescript +interface HashChainEntry { + eventId: UUID; + eventHash: string; + previousEventHash: string; +} + +function computeEventHash(event: AuditEvent): string { + const payload = JSON.stringify({ + id: event.id, + timestamp: event.timestamp, + tenantId: event.tenantId, + actorType: event.actorType, + actorId: event.actorId, + action: event.action, + resource: event.resource, + resourceId: event.resourceId, + 
previousEventHash: event.previousEventHash, + }); + + return sha256(payload); +} + +function verifyChain(events: AuditEvent[]): VerificationResult { + for (let i = 1; i < events.length; i++) { + const current = events[i]; + const previous = events[i - 1]; + + if (current.previousEventHash !== previous.eventHash) { + return { + valid: false, + brokenAt: i, + reason: "Hash chain broken" + }; + } + + const computed = computeEventHash(current); + if (computed !== current.eventHash) { + return { + valid: false, + brokenAt: i, + reason: "Event hash mismatch" + }; + } + } + + return { valid: true }; +} +``` + +--- + +## Example Audit Events + +### Promotion Approved + +```json +{ + "id": "evt-123", + "timestamp": "2026-01-09T14:32:15Z", + "tenantId": "tenant-uuid", + "actorType": "user", + "actorId": "user-uuid", + "actorName": "jane@example.com", + "actorIp": "192.168.1.100", + "action": "promotion.approved", + "resource": "promotion", + "resourceId": "promo-uuid", + "environmentId": "env-uuid", + "releaseId": "rel-uuid", + "promotionId": "promo-uuid", + "before": { + "status": "pending" + }, + "after": { + "status": "approved", + "approvals": 2 + }, + "metadata": { + "comment": "LGTM" + }, + "previousEventHash": "sha256:abc...", + "eventHash": "sha256:def..." +} +``` + +### Deployment Started + +```json +{ + "id": "evt-124", + "timestamp": "2026-01-09T14:32:20Z", + "tenantId": "tenant-uuid", + "actorType": "system", + "actorId": "system", + "actorName": "deployment-orchestrator", + "action": "deployment.started", + "resource": "deployment", + "resourceId": "deploy-uuid", + "environmentId": "env-uuid", + "releaseId": "rel-uuid", + "promotionId": "promo-uuid", + "after": { + "status": "deploying", + "strategy": "rolling", + "targetCount": 5 + }, + "previousEventHash": "sha256:def...", + "eventHash": "sha256:ghi..." 
+} +``` + +--- + +## Retention Policy + +| Environment | Retention Period | +|-------------|------------------| +| All tenants | 7 years (compliance) | +| After tenant deletion | 7 years (legal hold) | +| Archive format | NDJSON, signed | + +--- + +## Export Format + +Audit events can be exported for compliance reporting: + +```bash +# Export audit trail for a date range +GET /api/v1/audit/export? + start=2026-01-01T00:00:00Z& + end=2026-01-31T23:59:59Z& + format=ndjson +``` + +Response includes signed digest for verification: + +```json +{ + "export": { + "startDate": "2026-01-01T00:00:00Z", + "endDate": "2026-01-31T23:59:59Z", + "eventCount": 15234, + "firstEventHash": "sha256:abc...", + "lastEventHash": "sha256:xyz...", + "downloadUrl": "https://..." + }, + "signature": "base64-signature", + "signedAt": "2026-02-01T00:00:00Z" +} +``` + +--- + +## See Also + +- [Security Overview](overview.md) +- [Evidence](../modules/evidence.md) +- [Logging](../operations/logging.md) +- [Evidence Schema](../appendices/evidence-schema.md) diff --git a/docs/modules/release-orchestrator/security/auth.md b/docs/modules/release-orchestrator/security/auth.md new file mode 100644 index 000000000..1ee0dc2f4 --- /dev/null +++ b/docs/modules/release-orchestrator/security/auth.md @@ -0,0 +1,305 @@ +# Authentication & Authorization + +## Authentication Methods + +### OAuth 2.0 for Human Users + +``` +┌──────────────────────────────────────────────────────────────────────────────┐ +│ OAUTH 2.0 AUTHORIZATION CODE FLOW │ +│ │ +│ ┌──────────┐ ┌──────────────┐ │ +│ │ Browser │ │ Authority │ │ +│ └────┬─────┘ └──────┬───────┘ │ +│ │ │ │ +│ │ 1. Login request │ │ +│ │ ────────────────────────────────────► │ │ +│ │ │ │ +│ │ 2. Redirect to IdP │ │ +│ │ ◄──────────────────────────────────── │ │ +│ │ │ │ +│ │ 3. User authenticates at IdP │ │ +│ │ ─────────────────────────────────► │ │ +│ │ │ │ +│ │ 4. IdP callback with code │ │ +│ │ ◄──────────────────────────────────── │ │ +│ │ │ │ +│ │ 5. 
Exchange code for tokens │ │ +│ │ ────────────────────────────────────► │ │ +│ │ │ │ +│ │ 6. Access token + refresh token │ │ +│ │ ◄──────────────────────────────────── │ │ +│ │ │ │ +└──────────────────────────────────────────────────────────────────────────────┘ +``` + +### mTLS for Agents + +Agents authenticate using mutual TLS with certificates issued by Stella's internal CA. + +**Registration Flow:** +1. Admin generates one-time registration token +2. Agent starts with registration token +3. Agent submits CSR (Certificate Signing Request) +4. Authority issues certificate signed by Stella CA +5. Agent uses certificate for all subsequent requests + +### API Keys for Service-to-Service + +External services can use API keys for programmatic access: +- Keys are tenant-scoped +- Keys can have restricted permissions +- Keys can have expiration dates +- Key usage is audited + +## JWT Token Structure + +### Access Token Claims + +```typescript +interface AccessTokenClaims { + // Standard claims + iss: string; // "https://authority.stella.local" + sub: string; // User ID + aud: string[]; // ["stella-api"] + exp: number; // Expiration timestamp + iat: number; // Issued at timestamp + jti: string; // Unique token ID + + // Custom claims + tenant_id: string; + roles: string[]; + permissions: Permission[]; + email?: string; + name?: string; +} +``` + +### Token Lifetimes + +| Token Type | Lifetime | Refresh | +|------------|----------|---------| +| Access Token | 15 minutes | Via refresh token | +| Refresh Token | 7 days | Rotated on use | +| Agent Token | 1 hour | Via mTLS connection | +| API Key | Configurable | Not refreshed | + +## Authorization Model + +### Resource Types + +```typescript +type ResourceType = + | "environment" + | "release" + | "promotion" + | "target" + | "agent" + | "workflow" + | "plugin" + | "integration" + | "evidence"; +``` + +### Action Types + +```typescript +type ActionType = + | "create" + | "read" + | "update" + | "delete" + | "execute" + | 
"approve" + | "deploy" + | "rollback"; +``` + +### Permission Structure + +```typescript +interface Permission { + resource: ResourceType; + action: ActionType; + scope?: PermissionScope; + conditions?: Condition[]; +} + +type PermissionScope = + | "*" // All resources + | { environmentId: UUID } // Specific environment + | { labels: Record }; // Label-based +``` + +### Built-in Roles + +| Role | Description | Key Permissions | +|------|-------------|-----------------| +| `admin` | Full access | All permissions | +| `release_manager` | Manage releases and promotions | Create releases, request promotions | +| `deployer` | Execute deployments | Approve promotions (where allowed), view releases | +| `approver` | Approve promotions | Approve promotions (SoD respected) | +| `viewer` | Read-only access | Read all resources | +| `agent` | Agent service account | Execute deployment tasks | + +### Role Definitions + +```typescript +const roles = { + admin: { + permissions: [ + { resource: "*", action: "*" } + ] + }, + release_manager: { + permissions: [ + { resource: "release", action: "create" }, + { resource: "release", action: "read" }, + { resource: "release", action: "update" }, + { resource: "promotion", action: "create" }, + { resource: "promotion", action: "read" }, + { resource: "environment", action: "read" }, + { resource: "workflow", action: "read" }, + { resource: "workflow", action: "execute" } + ] + }, + deployer: { + permissions: [ + { resource: "release", action: "read" }, + { resource: "promotion", action: "read" }, + { resource: "promotion", action: "approve" }, + { resource: "environment", action: "read" }, + { resource: "target", action: "read" }, + { resource: "agent", action: "read" } + ] + }, + approver: { + permissions: [ + { resource: "promotion", action: "read" }, + { resource: "promotion", action: "approve" }, + { resource: "release", action: "read" }, + { resource: "environment", action: "read" } + ] + }, + viewer: { + permissions: [ + { 
resource: "*", action: "read" } + ] + } +}; +``` + +## Environment-Scoped Permissions + +Permissions can be scoped to specific environments: + +```typescript +// User can approve promotions only to staging +{ + resource: "promotion", + action: "approve", + scope: { environmentId: "staging-env-id" } +} + +// User can deploy only to targets with specific labels +{ + resource: "target", + action: "deploy", + scope: { labels: { "tier": "frontend" } } +} +``` + +## Separation of Duties (SoD) + +When SoD is enabled for an environment: +- The user who requested a promotion cannot approve it +- The user who created a release cannot be the sole approver +- Approval records include SoD verification status + +```typescript +interface ApprovalValidation { + promotionId: UUID; + approverId: UUID; + requesterId: UUID; + sodRequired: boolean; + sodSatisfied: boolean; + validationResult: "valid" | "self_approval_denied" | "sod_violation"; +} +``` + +## Permission Checking Algorithm + +```typescript +async function checkPermission( + userId: UUID, + resource: ResourceType, + action: ActionType, + resourceId?: UUID +): Promise { + // 1. Get user's roles and direct permissions + const userRoles = await getUserRoles(userId); + const userPermissions = await getUserPermissions(userId); + + // 2. Expand role permissions + const rolePermissions = userRoles.flatMap(r => roles[r].permissions); + const allPermissions = [...rolePermissions, ...userPermissions]; + + // 3. 
Check for matching permission + for (const perm of allPermissions) { + if (matchesResource(perm.resource, resource) && + matchesAction(perm.action, action) && + matchesScope(perm.scope, resourceId) && + evaluateConditions(perm.conditions)) { + return true; + } + } + + return false; +} + +function matchesResource(pattern: string, resource: string): boolean { + return pattern === "*" || pattern === resource; +} + +function matchesAction(pattern: string, action: string): boolean { + return pattern === "*" || pattern === action; +} +``` + +## API Authorization Headers + +All API requests require: +```http +Authorization: Bearer +``` + +For agent requests (over mTLS): +```http +X-Agent-Id: +Authorization: Bearer +``` + +## Permission Denied Response + +```json +{ + "success": false, + "error": { + "code": "PERMISSION_DENIED", + "message": "User does not have permission to approve promotions to production", + "details": { + "resource": "promotion", + "action": "approve", + "scope": { "environmentId": "prod-env-id" }, + "requiredRoles": ["admin", "approver"], + "userRoles": ["viewer"] + } + } +} +``` + +## References + +- [Security Overview](overview.md) +- [Agent Security](agent-security.md) +- [Authority Module](../../../authority/architecture.md) diff --git a/docs/modules/release-orchestrator/security/overview.md b/docs/modules/release-orchestrator/security/overview.md new file mode 100644 index 000000000..3d5b8b3fd --- /dev/null +++ b/docs/modules/release-orchestrator/security/overview.md @@ -0,0 +1,281 @@ +# Security Architecture Overview + +## Security Principles + +| Principle | Implementation | +|-----------|----------------| +| **Defense in depth** | Multiple layers: network, auth, authz, audit | +| **Least privilege** | Role-based access; minimal permissions | +| **Zero trust** | All requests authenticated; mTLS for agents | +| **Secrets hygiene** | Secrets in vault; never in DB; ephemeral injection | +| **Audit everything** | All mutations logged; evidence 
trail | +| **Immutable evidence** | Evidence packets append-only; cryptographically signed | + +## Authentication Architecture + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ AUTHENTICATION ARCHITECTURE │ +│ │ +│ Human Users Service/Agent │ +│ ┌──────────┐ ┌──────────┐ │ +│ │ Browser │ │ Agent │ │ +│ └────┬─────┘ └────┬─────┘ │ +│ │ │ │ +│ │ OAuth 2.0 │ mTLS + JWT │ +│ │ Authorization Code │ │ +│ ▼ ▼ │ +│ ┌──────────────────────────────────────────────────────────────────┐ │ +│ │ AUTHORITY MODULE │ │ +│ │ │ │ +│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │ +│ │ │ OAuth 2.0 │ │ mTLS │ │ API Key │ │ │ +│ │ │ Provider │ │ Validator │ │ Validator │ │ │ +│ │ └─────────────┘ └─────────────┘ └─────────────┘ │ │ +│ │ │ │ +│ │ ┌─────────────────────────────────────────────────────────────┐ │ │ +│ │ │ TOKEN ISSUER │ │ │ +│ │ │ - Short-lived JWT (15 min) │ │ │ +│ │ │ - Contains: user_id, tenant_id, roles, permissions │ │ │ +│ │ │ - Signed with RS256 │ │ │ +│ │ └─────────────────────────────────────────────────────────────┘ │ │ +│ └──────────────────────────────────────────────────────────────────┘ │ +│ │ │ +│ ▼ │ +│ ┌──────────────────────────────────────────────────────────────────┐ │ +│ │ API GATEWAY │ │ +│ │ │ │ +│ │ - Validate JWT signature │ │ +│ │ - Check token expiration │ │ +│ │ - Extract tenant context │ │ +│ │ - Enforce rate limits │ │ +│ └──────────────────────────────────────────────────────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +## Authorization Model + +### Permission Structure + +```typescript +interface Permission { + resource: ResourceType; + action: ActionType; + scope?: ScopeType; + conditions?: Condition[]; +} + +type ResourceType = + | "environment" + | "release" + | "promotion" + | "target" + | "agent" + | "workflow" + | "plugin" + | "integration" + | "evidence"; + +type ActionType = + | "create" + | "read" + | "update" + | "delete" + 
| "execute" + | "approve" + | "deploy" + | "rollback"; + +type ScopeType = + | "*" // All resources + | { environmentId: UUID } // Specific environment + | { labels: Record }; // Label-based +``` + +### Role Definitions + +| Role | Permissions | +|------|-------------| +| `admin` | All permissions on all resources | +| `release-manager` | Full access to releases, promotions; read environments/targets | +| `deployer` | Read releases; create/read promotions; read targets | +| `approver` | Read/approve promotions | +| `viewer` | Read-only access to all resources | + +### Environment-Scoped Roles + +Roles can be scoped to specific environments: + +```typescript +// Example: Production deployer can only deploy to production +const prodDeployer = { + role: "deployer", + scope: { environmentId: "prod-environment-uuid" } +}; +``` + +## Policy Enforcement Points + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ POLICY ENFORCEMENT POINTS │ +│ │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ API LAYER (PEP 1) │ │ +│ │ - Authenticate request │ │ +│ │ - Check resource-level permissions │ │ +│ │ - Enforce tenant isolation │ │ +│ └─────────────────────────────────────────────────────────────────────┘ │ +│ │ │ +│ ▼ │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ SERVICE LAYER (PEP 2) │ │ +│ │ - Check business-level permissions │ │ +│ │ - Validate separation of duties │ │ +│ │ - Enforce approval policies │ │ +│ └─────────────────────────────────────────────────────────────────────┘ │ +│ │ │ +│ ▼ │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ DECISION ENGINE (PEP 3) │ │ +│ │ - Evaluate security gates │ │ +│ │ - Evaluate custom OPA policies │ │ +│ │ - Produce signed decision records │ │ +│ └─────────────────────────────────────────────────────────────────────┘ │ +│ │ │ +│ ▼ │ +│ 
┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ DATA LAYER (PEP 4) │ │ +│ │ - Row-level security (tenant_id) │ │ +│ │ - Append-only enforcement (evidence) │ │ +│ │ - Encryption at rest │ │ +│ └─────────────────────────────────────────────────────────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +## Agent Security Model + +See [Agent Security](agent-security.md) for detailed agent security architecture. + +Key features: +- mTLS authentication with CA-signed certificates +- One-time registration tokens +- Short-lived JWT for task execution +- Encrypted task payloads +- Scoped credentials per task + +## Secrets Management + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ SECRETS FLOW (NEVER STORED IN DB) │ +│ │ +│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ +│ │ VAULT │ │ STELLA CORE │ │ AGENT │ │ +│ │ (Source) │ │ (Broker) │ │ (Consumer) │ │ +│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │ +│ │ │ │ │ +│ │ │ Task requires secret │ │ +│ │ │ │ │ +│ │ Fetch with service │ │ │ +│ │ account token │ │ │ +│ │◄─────────────────────── │ │ +│ │ │ │ │ +│ │ Return secret │ │ │ +│ │ (wrapped, short TTL) │ │ │ +│ │───────────────────────► │ │ +│ │ │ │ │ +│ │ │ Embed in task payload │ │ +│ │ │ (encrypted) │ │ +│ │ │───────────────────────► │ +│ │ │ │ │ +│ │ │ │ Decrypt │ +│ │ │ │ Use for task │ +│ │ │ │ Discard │ +│ │ +│ Rules: │ +│ - Secrets NEVER stored in Stella database │ +│ - Only Vault references stored │ +│ - Secrets fetched at execution time only │ +│ - Secrets not logged (masked in logs) │ +│ - Secrets not persisted in agent memory beyond task scope │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +## Threat Model + +| Threat | Attack Vector | Mitigation | +|--------|---------------|------------| +| **Credential theft** | Database breach | Secrets never in DB; only vault refs | +| 
**Token replay** | Stolen JWT | Short-lived tokens (15 min); refresh tokens rotated | +| **Agent impersonation** | Fake agent | mTLS with CA-signed certs; registration token one-time | +| **Digest tampering** | Modified image | Digest verification at pull time; mismatch = failure | +| **Evidence tampering** | Modified audit records | Append-only table; cryptographic signing | +| **Privilege escalation** | Compromised account | Role-based access; SoD enforcement; audit logs | +| **Supply chain attack** | Malicious plugin | Plugin sandbox; capability declarations; review process | +| **Lateral movement** | Compromised target | Short-lived task credentials; scoped permissions | +| **Data exfiltration** | Log/artifact theft | Encryption at rest; network segmentation | +| **Denial of service** | Resource exhaustion | Rate limiting; resource quotas; circuit breakers | + +## Audit Trail + +### Audit Event Structure + +```typescript +interface AuditEvent { + id: UUID; + timestamp: DateTime; + tenantId: UUID; + + // Actor + actorType: "user" | "agent" | "system" | "plugin"; + actorId: UUID; + actorName: string; + actorIp?: string; + + // Action + action: string; // "promotion.approved", "deployment.started" + resource: string; // "promotion" + resourceId: UUID; + + // Context + environmentId?: UUID; + releaseId?: UUID; + promotionId?: UUID; + + // Details + before?: object; // State before (for updates) + after?: object; // State after + metadata?: object; // Additional context + + // Integrity + previousEventHash: string; // Hash chain for tamper detection + eventHash: string; +} +``` + +### Audited Operations + +| Category | Operations | +|----------|------------| +| **Authentication** | Login, logout, token refresh, failed attempts | +| **Authorization** | Permission denied events | +| **Environments** | Create, update, delete, freeze window changes | +| **Releases** | Create, deprecate, archive | +| **Promotions** | Request, approve, reject, cancel | +| **Deployments** 
| Start, complete, fail, rollback | +| **Targets** | Register, update, delete, health changes | +| **Agents** | Register, heartbeat gaps, capability changes | +| **Integrations** | Create, update, delete, test | +| **Plugins** | Enable, disable, config changes | +| **Evidence** | Create (never update/delete) | + +## References + +- [Authentication & Authorization](auth.md) +- [Agent Security](agent-security.md) +- [Threat Model](threat-model.md) +- [Audit Trail](audit-trail.md) diff --git a/docs/modules/release-orchestrator/security/threat-model.md b/docs/modules/release-orchestrator/security/threat-model.md new file mode 100644 index 000000000..7879a0dbc --- /dev/null +++ b/docs/modules/release-orchestrator/security/threat-model.md @@ -0,0 +1,207 @@ +# Threat Model + +## Overview + +This document identifies threats to the Release Orchestrator and their mitigations. + +## Threat Categories + +### T1: Credential Theft + +| Aspect | Description | +|--------|-------------| +| **Threat** | Attacker gains access to credentials through database breach | +| **Attack Vector** | SQL injection, database backup theft, insider threat | +| **Assets at Risk** | Registry credentials, vault tokens, SSH keys | +| **Mitigation** | Secrets NEVER stored in database; only vault references stored | +| **Detection** | Anomalous vault access patterns, failed authentication attempts | + +### T2: Token Replay + +| Aspect | Description | +|--------|-------------| +| **Threat** | Attacker captures and reuses valid JWT tokens | +| **Attack Vector** | Man-in-the-middle, log file exposure, memory dump | +| **Assets at Risk** | User sessions, API access | +| **Mitigation** | Short-lived tokens (15 min), refresh token rotation, TLS everywhere | +| **Detection** | Token used from unusual IP, concurrent sessions | + +### T3: Agent Impersonation + +| Aspect | Description | +|--------|-------------| +| **Threat** | Attacker registers fake agent to receive deployment tasks | +| **Attack Vector** | 
Stolen registration token, certificate forgery | +| **Assets at Risk** | Deployment credentials, target access | +| **Mitigation** | One-time registration tokens, mTLS with CA-signed certs | +| **Detection** | Registration from unexpected network, capability mismatch | + +### T4: Digest Tampering + +| Aspect | Description | +|--------|-------------| +| **Threat** | Attacker modifies container image after release creation | +| **Attack Vector** | Registry compromise, man-in-the-middle at pull time | +| **Assets at Risk** | Application integrity, supply chain | +| **Mitigation** | Digest verification at pull time; mismatch = deployment failure | +| **Detection** | Pull failures due to digest mismatch | + +### T5: Evidence Tampering + +| Aspect | Description | +|--------|-------------| +| **Threat** | Attacker modifies audit records to hide malicious activity | +| **Attack Vector** | Database admin access, SQL injection | +| **Assets at Risk** | Audit integrity, compliance | +| **Mitigation** | Append-only table, cryptographic signing, no UPDATE/DELETE | +| **Detection** | Signature verification failure, hash chain break | + +### T6: Privilege Escalation + +| Aspect | Description | +|--------|-------------| +| **Threat** | User gains permissions beyond their role | +| **Attack Vector** | Role assignment exploit, permission bypass | +| **Assets at Risk** | Environment access, approval authority | +| **Mitigation** | Role-based access, SoD enforcement, audit logs | +| **Detection** | Unusual permission patterns, SoD violation attempts | + +### T7: Supply Chain Attack + +| Aspect | Description | +|--------|-------------| +| **Threat** | Malicious plugin injected into workflow | +| **Attack Vector** | Plugin repository compromise, typosquatting | +| **Assets at Risk** | All environments, all credentials | +| **Mitigation** | Plugin sandbox, capability declarations, signed manifests | +| **Detection** | Unexpected network egress, resource anomalies | + +### T8: Lateral 
Movement + +| Aspect | Description | +|--------|-------------| +| **Threat** | Attacker uses compromised target to access others | +| **Attack Vector** | Target compromise, credential reuse | +| **Assets at Risk** | Other targets, environments | +| **Mitigation** | Short-lived task credentials, scoped permissions | +| **Detection** | Cross-target credential use, unexpected connections | + +### T9: Data Exfiltration + +| Aspect | Description | +|--------|-------------| +| **Threat** | Attacker extracts logs, artifacts, or configuration | +| **Attack Vector** | API abuse, log aggregator compromise | +| **Assets at Risk** | Application data, deployment configurations | +| **Mitigation** | Encryption at rest, network segmentation, audit logging | +| **Detection** | Large data transfers, unusual API patterns | + +### T10: Denial of Service + +| Aspect | Description | +|--------|-------------| +| **Threat** | Attacker exhausts resources to prevent deployments | +| **Attack Vector** | API flooding, workflow loop, agent task spam | +| **Assets at Risk** | Service availability | +| **Mitigation** | Rate limiting, resource quotas, circuit breakers | +| **Detection** | Resource exhaustion alerts, traffic spikes | + +## STRIDE Analysis + +| Category | Threats | Primary Mitigations | +|----------|---------|---------------------| +| **Spoofing** | T3 Agent Impersonation | mTLS, registration tokens | +| **Tampering** | T4 Digest, T5 Evidence | Digest verification, append-only tables | +| **Repudiation** | Evidence manipulation | Signed evidence packets | +| **Information Disclosure** | T1 Credentials, T9 Exfiltration | Vault integration, encryption | +| **Denial of Service** | T10 Resource exhaustion | Rate limits, quotas | +| **Elevation of Privilege** | T6 Escalation | RBAC, SoD enforcement | + +## Trust Boundaries + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ TRUST BOUNDARIES │ +│ │ +│ 
┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ PUBLIC NETWORK (Untrusted) │ │ +│ │ │ │ +│ │ Internet, External Users, External Services │ │ +│ └─────────────────────────────────────────────────────────────────────┘ │ +│ │ │ +│ │ TLS + Authentication │ +│ ▼ │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ DMZ (Semi-trusted) │ │ +│ │ │ │ +│ │ API Gateway, Webhook Gateway │ │ +│ └─────────────────────────────────────────────────────────────────────┘ │ +│ │ │ +│ │ Internal mTLS │ +│ ▼ │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ INTERNAL NETWORK (Trusted) │ │ +│ │ │ │ +│ │ Stella Core Services, Database, Internal Vault │ │ +│ └─────────────────────────────────────────────────────────────────────┘ │ +│ │ │ +│ │ Agent mTLS │ +│ ▼ │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ DEPLOYMENT NETWORK (Controlled) │ │ +│ │ │ │ +│ │ Agents, Targets │ │ +│ └─────────────────────────────────────────────────────────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +## Data Classification + +| Classification | Examples | Protection Requirements | +|---------------|----------|------------------------| +| **Critical** | Vault credentials, signing keys | Hardware security, minimal access | +| **Sensitive** | User tokens, agent certificates | Encryption, access logging | +| **Internal** | Release configs, workflow definitions | Encryption at rest | +| **Public** | API documentation, release names | Integrity protection | + +## Security Controls Summary + +| Control | Implementation | Threats Addressed | +|---------|----------------|-------------------| +| mTLS | Agent communication | T3 | +| Short-lived tokens | 15-min access tokens | T2 | +| Vault integration | No secrets in DB | T1 | +| Digest verification | Pull-time validation | T4 | +| Append-only tables | Evidence immutability | T5 
| +| RBAC + SoD | Permission enforcement | T6 | +| Plugin sandbox | Resource limits, capability control | T7 | +| Scoped credentials | Task-specific access | T8 | +| Encryption | At rest and in transit | T9 | +| Rate limiting | API and resource quotas | T10 | + +## Incident Response + +### Detection Signals + +| Signal | Indicates | Response | +|--------|-----------|----------| +| Digest mismatch at pull | T4 Tampering | Halt deployment, investigate registry | +| Evidence signature failure | T5 Tampering | Preserve logs, forensic analysis | +| Unusual agent registration | T3 Impersonation | Revoke agent, review access | +| SoD violation attempt | T6 Escalation | Block action, alert admin | +| Plugin network egress | T7 Supply chain | Isolate plugin, review manifest | + +### Response Procedures + +1. **Contain** - Isolate affected component (revoke token, disable agent) +2. **Investigate** - Collect logs, evidence packets, audit trail +3. **Remediate** - Patch vulnerability, rotate credentials +4. **Recover** - Restore service, verify integrity +5. **Report** - Document incident, update threat model + +## References + +- [Security Overview](overview.md) +- [Agent Security](agent-security.md) +- [Audit Trail](audit-trail.md) diff --git a/docs/modules/release-orchestrator/test-structure.md b/docs/modules/release-orchestrator/test-structure.md new file mode 100644 index 000000000..aafaade6c --- /dev/null +++ b/docs/modules/release-orchestrator/test-structure.md @@ -0,0 +1,508 @@ +# Test Structure & Guidelines + +> Test organization, categorization, and patterns for Release Orchestrator modules. 
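One pattern worth highlighting up front: pure decision logic — such as the separation-of-duties check described in the security docs — should be extracted into dependency-free functions so it can be unit-tested without containers or databases. A minimal sketch of that check (TypeScript here for brevity; `validateApproval` and the `ApprovalCheck` shape are illustrative, not part of the specification):

```typescript
type ValidationResult = "valid" | "self_approval_denied" | "sod_violation";

interface ApprovalCheck {
  requesterId: string;        // who requested the promotion
  approverId: string;         // who is attempting to approve
  releaseCreatorId: string;   // who created the release
  otherApproverIds: string[]; // approvals already recorded
  sodRequired: boolean;
}

// Mirrors the SoD rules from the security docs: the requester may not
// approve their own promotion, and the release creator may not end up
// as the sole approver.
function validateApproval(check: ApprovalCheck): ValidationResult {
  if (!check.sodRequired) {
    return "valid";
  }
  if (check.approverId === check.requesterId) {
    return "self_approval_denied";
  }
  const independentApprovers = check.otherApproverIds.filter(
    (id) => id !== check.releaseCreatorId
  );
  if (check.approverId === check.releaseCreatorId && independentApprovers.length === 0) {
    return "sod_violation";
  }
  return "valid";
}
```

Logic in this form needs only fast, deterministic unit tests of the kind described below, while the surrounding persistence and API plumbing is covered by integration and acceptance tests.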
+
+---
+
+## Test Directory Layout
+
+Release Orchestrator tests follow the Stella Ops standard test structure:
+
+```
+src/ReleaseOrchestrator/
+├── __Libraries/
+│   ├── StellaOps.ReleaseOrchestrator.Core/
+│   ├── StellaOps.ReleaseOrchestrator.Workflow/
+│   ├── StellaOps.ReleaseOrchestrator.Promotion/
+│   └── StellaOps.ReleaseOrchestrator.Deploy/
+├── __Tests/
+│   ├── StellaOps.ReleaseOrchestrator.Core.Tests/         # Unit tests for Core
+│   ├── StellaOps.ReleaseOrchestrator.Workflow.Tests/     # Unit tests for Workflow
+│   ├── StellaOps.ReleaseOrchestrator.Promotion.Tests/    # Unit tests for Promotion
+│   ├── StellaOps.ReleaseOrchestrator.Deploy.Tests/       # Unit tests for Deploy
+│   ├── StellaOps.ReleaseOrchestrator.Integration.Tests/  # Integration tests
+│   └── StellaOps.ReleaseOrchestrator.Acceptance.Tests/   # End-to-end tests
+└── StellaOps.ReleaseOrchestrator.WebService/
+```
+
+**Shared test infrastructure**:
+```
+src/__Tests/__Libraries/
+├── StellaOps.Infrastructure.Postgres.Testing/   # PostgreSQL Testcontainers fixtures
+└── StellaOps.Testing.Common/                    # Common test utilities
+```
+
+---
+
+## Test Categories
+
+Tests **MUST** be categorized using xUnit traits to enable selective execution:
+
+### Unit Tests
+
+```csharp
+[Trait("Category", "Unit")]
+public class PromotionValidatorTests
+{
+    [Fact]
+    public void Validate_MissingReleaseId_ReturnsFalse()
+    {
+        // Arrange
+        var validator = new PromotionValidator();
+        var promotion = new Promotion { ReleaseId = Guid.Empty };
+
+        // Act
+        var result = validator.Validate(promotion);
+
+        // Assert
+        Assert.False(result.IsValid);
+        Assert.Contains("ReleaseId is required", result.Errors);
+    }
+}
+```
+
+**Characteristics**:
+- No database, network, or file system access
+- Fast execution (< 100ms per test)
+- Isolated from external dependencies
+- Deterministic and repeatable
+
+### Integration Tests
+
+```csharp
+[Trait("Category", "Integration")]
+public class PromotionRepositoryTests : IClassFixture<PostgresFixture>
+{
+    private readonly PostgresFixture
_fixture;
+
+    public PromotionRepositoryTests(PostgresFixture fixture)
+    {
+        _fixture = fixture;
+    }
+
+    [Fact]
+    public async Task SaveAsync_ValidPromotion_PersistsToDatabase()
+    {
+        // Arrange
+        await using var connection = _fixture.CreateConnection();
+        var repository = new PromotionRepository(connection, _fixture.TimeProvider);
+
+        var promotion = new Promotion
+        {
+            Id = Guid.NewGuid(),
+            TenantId = _fixture.DefaultTenantId,
+            ReleaseId = Guid.NewGuid(),
+            TargetEnvironmentId = Guid.NewGuid(),
+            Status = PromotionState.PendingApproval,
+            RequestedAt = _fixture.TimeProvider.GetUtcNow(),
+            RequestedBy = Guid.NewGuid()
+        };
+
+        // Act
+        await repository.SaveAsync(promotion, CancellationToken.None);
+
+        // Assert
+        var retrieved = await repository.GetByIdAsync(promotion.Id, CancellationToken.None);
+        Assert.NotNull(retrieved);
+        Assert.Equal(promotion.ReleaseId, retrieved.ReleaseId);
+    }
+}
+```
+
+**Characteristics**:
+- Uses Testcontainers for PostgreSQL
+- Requires Docker to be running
+- Slower execution (hundreds of ms per test)
+- Tests data access layer and database constraints
+
+### Acceptance Tests
+
+```csharp
+[Trait("Category", "Acceptance")]
+public class PromotionWorkflowTests : IClassFixture<WebApplicationFactory<Program>>
+{
+    private readonly WebApplicationFactory<Program> _factory;
+    private readonly HttpClient _client;
+
+    public PromotionWorkflowTests(WebApplicationFactory<Program> factory)
+    {
+        _factory = factory;
+        _client = factory.CreateClient();
+    }
+
+    [Fact]
+    public async Task PromotionWorkflow_EndToEnd_SuccessfullyDeploysRelease()
+    {
+        // Arrange: Create environment, release, and promotion
+        var envId = await CreateEnvironmentAsync("Production");
+        var releaseId = await CreateReleaseAsync("v2.3.1");
+
+        // Act: Request promotion
+        var promotionResponse = await _client.PostAsJsonAsync(
+            "/api/v1/promotions",
+            new { releaseId, targetEnvironmentId = envId });
+
+        promotionResponse.EnsureSuccessStatusCode();
+        var promotion = await promotionResponse.Content.ReadFromJsonAsync<PromotionDto>();
+
// Act: Approve promotion + var approveResponse = await _client.PostAsync( + $"/api/v1/promotions/{promotion.Id}/approve", null); + + approveResponse.EnsureSuccessStatusCode(); + + // Assert: Verify deployment completed + var status = await GetPromotionStatusAsync(promotion.Id); + Assert.Equal("deployed", status.Status); + } +} +``` + +**Characteristics**: +- Tests full API surface and workflows +- Uses `WebApplicationFactory` for in-memory hosting +- Tests end-to-end scenarios +- May involve multiple services + +--- + +## PostgreSQL Test Fixtures + +### Testcontainers Fixture + +```csharp +public class PostgresFixture : IAsyncLifetime +{ + private PostgreSqlContainer? _container; + private NpgsqlConnection? _connection; + public TimeProvider TimeProvider { get; private set; } = null!; + public IGuidGenerator GuidGenerator { get; private set; } = null!; + public Guid DefaultTenantId { get; private set; } + + public async Task InitializeAsync() + { + // Start PostgreSQL container + _container = new PostgreSqlBuilder() + .WithImage("postgres:16") + .WithDatabase("stellaops_test") + .WithUsername("postgres") + .WithPassword("postgres") + .Build(); + + await _container.StartAsync(); + + // Create connection + _connection = new NpgsqlConnection(_container.GetConnectionString()); + await _connection.OpenAsync(); + + // Run migrations + await ApplyMigrationsAsync(); + + // Setup test infrastructure + TimeProvider = new ManualTimeProvider(); + GuidGenerator = new SequentialGuidGenerator(); + DefaultTenantId = Guid.Parse("00000000-0000-0000-0000-000000000001"); + + // Seed test data + await SeedTestDataAsync(); + } + + public NpgsqlConnection CreateConnection() + { + if (_container == null) + throw new InvalidOperationException("Container not initialized"); + + return new NpgsqlConnection(_container.GetConnectionString()); + } + + private async Task ApplyMigrationsAsync() + { + // Apply schema migrations + await ExecuteSqlFileAsync("schema/release-orchestrator-schema.sql"); 
+    }
+
+    private async Task SeedTestDataAsync()
+    {
+        // Create default tenant
+        await using var cmd = _connection!.CreateCommand();
+        cmd.CommandText = @"
+            INSERT INTO tenants (id, name, created_at)
+            VALUES (@id, @name, @created_at)
+            ON CONFLICT DO NOTHING";
+        cmd.Parameters.AddWithValue("id", DefaultTenantId);
+        cmd.Parameters.AddWithValue("name", "Test Tenant");
+        cmd.Parameters.AddWithValue("created_at", TimeProvider.GetUtcNow());
+        await cmd.ExecuteNonQueryAsync();
+    }
+
+    public async Task DisposeAsync()
+    {
+        if (_connection != null)
+        {
+            await _connection.DisposeAsync();
+        }
+
+        if (_container != null)
+        {
+            await _container.DisposeAsync();
+        }
+    }
+}
+```
+
+---
+
+## Test Patterns
+
+### Deterministic Time in Tests
+
+```csharp
+public class PromotionTimingTests
+{
+    [Fact]
+    public void CreatePromotion_SetsCorrectTimestamp()
+    {
+        // Arrange
+        var manualTime = new ManualTimeProvider();
+        manualTime.SetUtcNow(new DateTimeOffset(2026, 1, 10, 14, 30, 0, TimeSpan.Zero));
+
+        var guidGen = new SequentialGuidGenerator();
+        var manager = new PromotionManager(manualTime, guidGen);
+
+        // Act
+        var promotion = manager.CreatePromotion(
+            releaseId: Guid.Parse("00000000-0000-0000-0000-000000000001"),
+            targetEnvId: Guid.Parse("00000000-0000-0000-0000-000000000002")
+        );
+
+        // Assert
+        Assert.Equal(
+            new DateTimeOffset(2026, 1, 10, 14, 30, 0, TimeSpan.Zero),
+            promotion.RequestedAt
+        );
+    }
+}
+```
+
+### Testing CancellationToken Propagation
+
+```csharp
+public class PromotionCancellationTests
+{
+    [Fact]
+    public async Task ApprovePromotionAsync_CancellationRequested_ThrowsOperationCanceledException()
+    {
+        // Arrange
+        var cts = new CancellationTokenSource();
+        var repository = new Mock<IPromotionRepository>();
+
+        repository
+            .Setup(r => r.GetByIdAsync(It.IsAny<Guid>(), It.IsAny<CancellationToken>()))
+            .Returns(async (Guid id, CancellationToken ct) =>
+            {
+                await Task.Delay(100, ct); // Simulate delay
+                return new Promotion { Id = id };
+            });
+
+        var manager = new PromotionManager(repository.Object,
TimeProvider.System, new SystemGuidGenerator());
+
+        // Act & Assert
+        cts.Cancel(); // Cancel before operation completes
+
+        await Assert.ThrowsAsync<OperationCanceledException>(async () =>
+            await manager.ApprovePromotionAsync(Guid.NewGuid(), Guid.NewGuid(), cts.Token)
+        );
+    }
+}
+```
+
+### Testing Immutability
+
+```csharp
+public class ReleaseImmutabilityTests
+{
+    [Fact]
+    public void GetComponents_ReturnsImmutableCollection()
+    {
+        // Arrange
+        var release = new Release
+        {
+            Components = new Dictionary<string, ComponentDigest>
+            {
+                ["api"] = new ComponentDigest("registry.io/api", "sha256:abc123", "v1.0.0")
+            }.ToImmutableDictionary()
+        };
+
+        // Act
+        var components = release.Components;
+
+        // Assert: Attempting to modify throws
+        Assert.Throws<NotSupportedException>(() =>
+        {
+            var mutable = (IDictionary<string, ComponentDigest>)components;
+            mutable["web"] = new ComponentDigest("registry.io/web", "sha256:def456", "v1.0.0");
+        });
+    }
+}
+```
+
+### Testing Evidence Hash Determinism
+
+```csharp
+public class EvidenceHashDeterminismTests
+{
+    [Fact]
+    public void ComputeEvidenceHash_SameInputs_ProducesSameHash()
+    {
+        // Arrange
+        var decisionRecord = new DecisionRecord
+        {
+            PromotionId = Guid.Parse("00000000-0000-0000-0000-000000000001"),
+            DecidedAt = new DateTimeOffset(2026, 1, 10, 12, 0, 0, TimeSpan.Zero),
+            Outcome = "approved",
+            GateResults = ImmutableArray.Create(
+                new GateResult("security", "pass", null)
+            )
+        };
+
+        // Act: Compute hash multiple times
+        var hash1 = EvidenceHasher.ComputeHash(decisionRecord);
+        var hash2 = EvidenceHasher.ComputeHash(decisionRecord);
+
+        // Assert: Hashes are identical
+        Assert.Equal(hash1, hash2);
+    }
+}
+```
+
+---
+
+## Running Tests
+
+### Run All Tests
+
+```bash
+dotnet test src/StellaOps.sln
+```
+
+### Run Only Unit Tests
+
+```bash
+dotnet test src/StellaOps.sln --filter "Category=Unit"
+```
+
+### Run Only Integration Tests
+
+```bash
+dotnet test src/StellaOps.sln --filter "Category=Integration"
+```
+
+### Run Specific Test Class
+
+```bash
+dotnet test --filter
"FullyQualifiedName~PromotionValidatorTests"
+```
+
+### Run with Coverage
+
+```bash
+dotnet test src/StellaOps.sln --collect:"XPlat Code Coverage"
+```
+
+---
+
+## Test Data Builders
+
+Use the builder pattern for complex test data:
+
+```csharp
+public class PromotionBuilder
+{
+    private Guid _id = Guid.NewGuid();
+    private Guid _tenantId = Guid.NewGuid();
+    private Guid _releaseId = Guid.NewGuid();
+    private Guid _targetEnvId = Guid.NewGuid();
+    private PromotionState _status = PromotionState.PendingApproval;
+    private DateTimeOffset _requestedAt = DateTimeOffset.UtcNow;
+
+    public PromotionBuilder WithId(Guid id)
+    {
+        _id = id;
+        return this;
+    }
+
+    public PromotionBuilder WithStatus(PromotionState status)
+    {
+        _status = status;
+        return this;
+    }
+
+    public PromotionBuilder WithReleaseId(Guid releaseId)
+    {
+        _releaseId = releaseId;
+        return this;
+    }
+
+    public Promotion Build()
+    {
+        return new Promotion
+        {
+            Id = _id,
+            TenantId = _tenantId,
+            ReleaseId = _releaseId,
+            TargetEnvironmentId = _targetEnvId,
+            Status = _status,
+            RequestedAt = _requestedAt,
+            RequestedBy = Guid.NewGuid()
+        };
+    }
+}
+
+// Usage in tests
+[Fact]
+public void ApprovePromotion_PendingStatus_TransitionsToApproved()
+{
+    var promotion = new PromotionBuilder()
+        .WithStatus(PromotionState.PendingApproval)
+        .Build();
+
+    // ...
test logic +} +``` + +--- + +## Code Coverage Requirements + +- **Unit tests**: Aim for 80%+ coverage of business logic +- **Integration tests**: Cover all data access paths and constraints +- **Acceptance tests**: Cover critical user journeys + +**Exclusions from coverage**: +- Program.cs / Startup.cs configuration code +- DTOs and simple data classes +- Generated code + +--- + +## Summary Checklist + +Before merging: + +- [ ] All tests categorized with `[Trait("Category", "...")]` +- [ ] Unit tests use `TimeProvider` and `IGuidGenerator` for determinism +- [ ] Integration tests use `PostgresFixture` with Testcontainers +- [ ] `CancellationToken` propagation tested where applicable +- [ ] Evidence hash determinism verified +- [ ] No test reimplements production logic +- [ ] All tests pass locally and in CI +- [ ] Code coverage meets requirements + +--- + +## References + +- [Implementation Guide](./implementation-guide.md) — .NET implementation patterns +- [CLAUDE.md](../../../CLAUDE.md) — Stella Ops coding rules +- [PostgreSQL Testing Guide](../../infrastructure/Postgres.Testing/README.md) — Testcontainers setup +- [src/__Tests/AGENTS.md](../../../src/__Tests/AGENTS.md) — Global test infrastructure diff --git a/docs/modules/release-orchestrator/ui/dashboard.md b/docs/modules/release-orchestrator/ui/dashboard.md new file mode 100644 index 000000000..a42fdd2ad --- /dev/null +++ b/docs/modules/release-orchestrator/ui/dashboard.md @@ -0,0 +1,207 @@ +# Dashboard Specification + +> Main dashboard layout and metrics specification for the Release Orchestrator UI. 
+ +**Status:** Planned (not yet implemented) +**Source:** [Architecture Advisory Section 12.1](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md) +**Related Modules:** [WebSocket APIs](../api/websockets.md), [Metrics](../operations/metrics.md) +**Sprint:** [111_001 Dashboard Overview](../../../../implplan/SPRINT_20260110_111_001_FE_dashboard_overview.md) + +## Overview + +The dashboard provides a real-time overview of security posture, release operations, estate health, and compliance status. + +--- + +## Dashboard Layout + +``` ++-----------------------------------------------------------------------------+ +| STELLA OPS SUITE | +| +-----+ [User Menu v] | +| |Logo | Dashboard Releases Environments Workflows Integrations | ++-----------------------------------------------------------------------------+ +| | +| +-------------------------------+ +-----------------------------------+ | +| | SECURITY POSTURE | | RELEASE OPERATIONS | | +| | | | | | +| | +---------+ +---------+ | | +---------+ +---------+ | | +| | |Critical | | High | | | |In Flight| |Completed| | | +| | | 0 * | | 3 * | | | | 2 | | 47 | | | +| | |reachable| |reachable| | | |deploys | | today | | | +| | +---------+ +---------+ | | +---------+ +---------+ | | +| | | | | | +| | Blocked: 2 releases | | Pending Approval: 3 | | +| | Risk Drift: 1 env | | Failed (24h): 1 | | +| | | | | | +| +-------------------------------+ +-----------------------------------+ | +| | +| +-------------------------------+ +-----------------------------------+ | +| | ESTATE HEALTH | | COMPLIANCE/AUDIT | | +| | | | | | +| | Agents: 12 online, 1 offline| | Evidence Complete: 98% | | +| | Targets: 45/47 healthy | | Policy Changes: 2 (this week) | | +| | Drift Detected: 2 targets | | Audit Exports: 5 (this month) | | +| | | | | | +| +-------------------------------+ +-----------------------------------+ | +| | +| +-----------------------------------------------------------------------+ | +| | 
RECENT ACTIVITY | | +| | | | +| | * 14:32 myapp-v2.3.1 deployed to prod (jane@example.com) | | +| | o 14:28 myapp-v2.3.1 promoted to stage (auto) | | +| | * 14:15 api-v1.2.0 blocked: critical vuln CVE-2024-1234 | | +| | o 13:45 worker-v3.0.0 release created (john@example.com) | | +| | * 13:30 Target prod-web-03 health: degraded | | +| | | | +| +-----------------------------------------------------------------------+ | +| | ++-----------------------------------------------------------------------------+ +``` + +--- + +## Dashboard Metrics + +### TypeScript Interfaces + +```typescript +interface DashboardMetrics { + // Security Posture + security: { + criticalReachable: number; + highReachable: number; + blockedReleases: number; + riskDriftEnvironments: number; + digestsAnalyzedToday: number; + digestQuota: number; + }; + + // Release Operations + operations: { + deploymentsInFlight: number; + deploymentsCompletedToday: number; + deploymentsFailed24h: number; + pendingApprovals: number; + averageDeployTime: number; // seconds + }; + + // Estate Health + estate: { + agentsOnline: number; + agentsOffline: number; + agentsDegraded: number; + targetsHealthy: number; + targetsUnhealthy: number; + targetsDrift: number; + }; + + // Compliance/Audit + compliance: { + evidenceCompleteness: number; // percentage + policyChangesThisWeek: number; + auditExportsThisMonth: number; + lastExportDate: DateTime; + }; +} +``` + +--- + +## Dashboard Panels + +### 1. Security Posture Panel + +Displays current security state across all releases: + +| Metric | Description | +|--------|-------------| +| Critical Reachable | Critical vulnerabilities with confirmed reachability | +| High Reachable | High severity vulnerabilities with confirmed reachability | +| Blocked Releases | Releases blocked by security gates | +| Risk Drift | Environments with changed risk since deployment | + +### 2. 
Release Operations Panel + +Shows active deployment operations: + +| Metric | Description | +|--------|-------------| +| In Flight | Deployments currently in progress | +| Completed Today | Successful deployments in last 24h | +| Pending Approval | Promotions awaiting approval | +| Failed (24h) | Failed deployments in last 24h | + +### 3. Estate Health Panel + +Displays agent and target health: + +| Metric | Description | +|--------|-------------| +| Agents Online | Number of agents reporting healthy | +| Agents Offline | Agents that missed heartbeats | +| Targets Healthy | Targets passing health checks | +| Drift Detected | Targets with version drift | + +### 4. Compliance/Audit Panel + +Shows audit and compliance status: + +| Metric | Description | +|--------|-------------| +| Evidence Complete | % of deployments with full evidence | +| Policy Changes | Policy modifications this week | +| Audit Exports | Evidence exports this month | + +--- + +## Real-Time Updates + +### WebSocket Integration + +```typescript +interface DashboardStreamMessage { + type: "metric_update" | "activity" | "alert"; + timestamp: DateTime; + payload: MetricUpdate | ActivityEvent | Alert; +} + +// Subscribe to dashboard stream +const ws = new WebSocket("/api/v1/dashboard/stream"); + +ws.onmessage = (event) => { + const message: DashboardStreamMessage = JSON.parse(event.data); + + switch (message.type) { + case "metric_update": + updateMetrics(message.payload); + break; + case "activity": + addActivityItem(message.payload); + break; + case "alert": + showAlert(message.payload); + break; + } +}; +``` + +--- + +## Performance Targets + +| Metric | Target | +|--------|--------| +| Initial Load | < 2 seconds | +| Metric Refresh | Every 30 seconds | +| WebSocket Reconnect | Exponential backoff (1s, 2s, 4s, ... 
30s max) | +| Activity History | Last 50 events | + +--- + +## See Also + +- [WebSocket APIs](../api/websockets.md) +- [Metrics](../operations/metrics.md) +- [Workflow Editor](workflow-editor.md) +- [Key Screens](screens.md) diff --git a/docs/modules/release-orchestrator/ui/overview.md b/docs/modules/release-orchestrator/ui/overview.md new file mode 100644 index 000000000..a16d4c0e2 --- /dev/null +++ b/docs/modules/release-orchestrator/ui/overview.md @@ -0,0 +1,332 @@ +# UI Overview + +## Status + +**Planned** - UI implementation has not started. + +## Design Principles + +| Principle | Implementation | +|-----------|----------------| +| **Clarity** | Clear status indicators, intuitive navigation | +| **Real-time** | Live updates via WebSocket for deployments | +| **Actionable** | One-click approvals, quick actions | +| **Audit-friendly** | Full history visibility, evidence access | +| **Mobile-aware** | Responsive design for on-call scenarios | + +## Main Screens + +### Dashboard + +The main dashboard provides an at-a-glance view of deployment health across environments. 
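
The "Real-time" design principle above assumes the dashboard's WebSocket connection recovers from drops on its own. A minimal reconnect sketch could look like the following; the 1s/2s/4s backoff capped at 30s and the `/api/v1/dashboard/stream` path come from the dashboard spec's performance targets, while the wrapper and handler names are illustrative:

```typescript
// Exponential backoff: 1s, 2s, 4s, ... capped at 30s
function reconnectDelayMs(attempt: number): number {
  return Math.min(1000 * 2 ** attempt, 30_000);
}

// Reconnecting wrapper around the dashboard stream (function and handler names illustrative)
function connectDashboardStream(
  url: string,
  onMessage: (data: unknown) => void,
  attempt = 0,
): void {
  const ws = new WebSocket(url);
  ws.onopen = () => { attempt = 0; };                 // reset backoff once connected
  ws.onmessage = (event) => onMessage(JSON.parse(event.data));
  ws.onclose = () => {
    // Schedule a reconnect with exponentially increasing delay
    setTimeout(() => connectDashboardStream(url, onMessage, attempt + 1),
               reconnectDelayMs(attempt));
  };
}
```

Resetting the attempt counter on a successful open keeps a long-lived session from paying the maximum delay after a single blip.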
+ +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ RELEASE ORCHESTRATOR [User] [Settings] │ +├─────────────────────────────────────────────────────────────────────────────┤ +│ │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ ENVIRONMENT PIPELINE │ │ +│ │ │ │ +│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │ +│ │ │ DEV │───►│ STAGING │───►│ UAT │───►│ PROD │ │ │ +│ │ │ v1.5.0 │ │ v1.4.2 │ │ v1.4.1 │ │ v1.4.0 │ │ │ +│ │ │ 3/3 OK │ │ 2/2 OK │ │ 2/2 OK │ │ 5/5 OK │ │ │ +│ │ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │ │ +│ │ │ │ +│ └─────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌──────────────────────────────┐ ┌──────────────────────────────┐ │ +│ │ PENDING APPROVALS (3) │ │ RECENT DEPLOYMENTS │ │ +│ │ │ │ │ │ +│ │ ● myapp → prod [Approve] │ │ ✓ api v1.5.0 → dev 2m │ │ +│ │ Requested by: John │ │ ✓ web v1.4.2 → staging 15m │ │ +│ │ 2 hours ago │ │ ✗ api v1.4.1 → uat 1h │ │ +│ │ │ │ ✓ web v1.4.0 → prod 2h │ │ +│ │ ● web → uat [Approve] │ │ │ │ +│ │ Requested by: Jane │ │ [View All] │ │ +│ │ 30 minutes ago │ │ │ │ +│ │ │ │ │ │ +│ └──────────────────────────────┘ └──────────────────────────────┘ │ +│ │ +│ ┌──────────────────────────────┐ ┌──────────────────────────────┐ │ +│ │ AGENT STATUS │ │ ACTIVE WORKFLOWS │ │ +│ │ │ │ │ │ +│ │ ● 12 Online │ │ ● Deploy api v1.5.0 │ │ +│ │ ○ 1 Offline │ │ Step: Health Check (3/5) │ │ +│ │ ◐ 2 Degraded │ │ │ │ +│ │ │ │ ● Promote web to UAT │ │ +│ │ [View Details] │ │ Step: Awaiting Approval │ │ +│ │ │ │ │ │ +│ └──────────────────────────────┘ └──────────────────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +### Releases View + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ RELEASES [+ Create Release] │ +├─────────────────────────────────────────────────────────────────────────────┤ +│ │ +│ Filter: [All ▼] 
Status: [All ▼] Search: [________________] │ +│ │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ NAME STATUS COMPONENTS ENVIRONMENTS CREATED │ │ +│ ├─────────────────────────────────────────────────────────────────────┤ │ +│ │ myapp-v1.5.0 Ready 3 dev 2h ago │ │ +│ │ myapp-v1.4.2 Deployed 3 staging, uat 1d ago │ │ +│ │ myapp-v1.4.1 Deployed 3 prod 3d ago │ │ +│ │ myapp-v1.4.0 Deprecated 3 - 1w ago │ │ +│ └─────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ RELEASE DETAIL: myapp-v1.5.0 [Promote ▼] │ │ +│ ├─────────────────────────────────────────────────────────────────────┤ │ +│ │ │ │ +│ │ Components: │ │ +│ │ ┌────────────────────────────────────────────────────────────┐ │ │ +│ │ │ api sha256:abc123... registry.io/myorg/api │ │ │ +│ │ │ web sha256:def456... registry.io/myorg/web │ │ │ +│ │ │ worker sha256:ghi789... registry.io/myorg/worker │ │ │ +│ │ └────────────────────────────────────────────────────────────┘ │ │ +│ │ │ │ +│ │ Source: https://github.com/myorg/myapp @ v1.5.0 │ │ +│ │ Created: 2h ago by john@example.com │ │ +│ │ │ │ +│ │ Promotion History: │ │ +│ │ dev (✓) → staging (pending) → uat (-) → prod (-) │ │ +│ │ │ │ +│ └─────────────────────────────────────────────────────────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +### Promotion Detail + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ PROMOTION: myapp-v1.5.0 → production │ +├─────────────────────────────────────────────────────────────────────────────┤ +│ │ +│ Status: PENDING APPROVAL [Approve] [Reject] │ +│ │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ GATE EVALUATION │ │ +│ ├─────────────────────────────────────────────────────────────────────┤ │ +│ │ │ │ +│ │ ✓ Security Gate Passed │ │ +│ │ No critical 
vulnerabilities │ │ +│ │ │ │ +│ │ ✓ Freeze Window Check Passed │ │ +│ │ No active freeze windows │ │ +│ │ │ │ +│ │ ◐ Approval Gate 1/2 Approvals │ │ +│ │ Jane approved 30m ago │ │ +│ │ Waiting for 1 more approval │ │ +│ │ │ │ +│ │ ○ Separation of Duties Pending │ │ +│ │ Requester: John (cannot approve) │ │ +│ │ │ │ +│ └─────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ PROMOTION TIMELINE │ │ +│ ├─────────────────────────────────────────────────────────────────────┤ │ +│ │ │ │ +│ │ 10:00 John requested promotion │ │ +│ │ 10:05 Security gate evaluated: PASSED │ │ +│ │ 10:05 Freeze check: PASSED │ │ +│ │ 10:30 Jane approved │ │ +│ │ 11:00 Waiting for additional approval... │ │ +│ │ │ │ +│ └─────────────────────────────────────────────────────────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +### Workflow Editor + +Visual editor for creating and modifying workflow templates. 
+ +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ WORKFLOW EDITOR: standard-deploy [Save] [Run] │ +├─────────────────────────────────────────────────────────────────────────────┤ +│ │ +│ ┌─────────────────┐ ┌─────────────────────────────────────────────────┐ │ +│ │ STEP PALETTE │ │ │ │ +│ │ │ │ │ │ +│ │ Control │ │ ┌──────────┐ │ │ +│ │ ├─ Approval │ │ │ Approval │ │ │ +│ │ ├─ Wait │ │ │ Gate │ │ │ +│ │ └─ Condition │ │ └────┬─────┘ │ │ +│ │ │ │ │ │ │ +│ │ Gates │ │ ▼ │ │ +│ │ ├─ Security │ │ ┌──────────┐ │ │ +│ │ ├─ Freeze │ │ │ Security │ │ │ +│ │ └─ Custom │ │ │ Gate │ │ │ +│ │ │ │ └────┬─────┘ │ │ +│ │ Deploy │ │ │ │ │ +│ │ ├─ Docker │ │ ▼ │ │ +│ │ ├─ Compose │ │ ┌──────────┐ │ │ +│ │ └─ ECS │ │ │ Deploy │ │ │ +│ │ │ │ │ Targets │ │ │ +│ │ Verify │ │ └────┬─────┘ │ │ +│ │ ├─ Health │ │ │ │ │ +│ │ └─ Smoke Test │ │ ┌────┴────┐ │ │ +│ │ │ │ │ │ │ │ +│ │ Notify │ │ ▼ ▼ │ │ +│ │ ├─ Slack │ │ ┌──────┐ ┌──────────┐ │ │ +│ │ └─ Email │ │ │Health│ │ Rollback │◄──[on failure] │ │ +│ │ │ │ │Check │ │ Handler │ │ │ +│ │ │ │ └──┬───┘ └────┬─────┘ │ │ +│ │ │ │ │ │ │ │ +│ │ │ │ ▼ ▼ │ │ +│ │ │ │ ┌──────┐ ┌──────────┐ │ │ +│ │ │ │ │Notify│ │ Notify │ │ │ +│ │ │ │ │Success│ │ Failure │ │ │ +│ │ │ │ └──────┘ └──────────┘ │ │ +│ │ │ │ │ │ +│ └─────────────────┘ └─────────────────────────────────────────────────┘ │ +│ │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ STEP PROPERTIES: Deploy Targets │ │ +│ ├─────────────────────────────────────────────────────────────────────┤ │ +│ │ Type: deploy-compose │ │ +│ │ Strategy: [Rolling ▼] │ │ +│ │ Parallelism: [2] │ │ +│ │ Timeout: [600] seconds │ │ +│ │ On Failure: [Rollback ▼] │ │ +│ └─────────────────────────────────────────────────────────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +### Deployment Live View + +Real-time view of an active deployment. 
+ +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ DEPLOYMENT: myapp-v1.5.0 → production [Abort]│ +├─────────────────────────────────────────────────────────────────────────────┤ +│ │ +│ Status: RUNNING Progress: ████████░░ 80% │ +│ Strategy: Rolling (batch 4/5) Duration: 5m 23s │ +│ │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ TARGET STATUS │ │ +│ ├─────────────────────────────────────────────────────────────────────┤ │ +│ │ │ │ +│ │ ✓ prod-host-1 sha256:abc123 Deployed Health: OK │ │ +│ │ ✓ prod-host-2 sha256:abc123 Deployed Health: OK │ │ +│ │ ✓ prod-host-3 sha256:abc123 Deployed Health: OK │ │ +│ │ ● prod-host-4 sha256:abc123 Deploying Health: Checking... │ │ +│ │ ○ prod-host-5 - Pending Health: - │ │ +│ │ │ │ +│ └─────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ LIVE LOGS: prod-host-4 │ │ +│ ├─────────────────────────────────────────────────────────────────────┤ │ +│ │ 10:25:15 Pulling image sha256:abc123... │ │ +│ │ 10:25:18 Image pulled successfully │ │ +│ │ 10:25:19 Stopping existing container... │ │ +│ │ 10:25:20 Starting new container... │ │ +│ │ 10:25:21 Container started │ │ +│ │ 10:25:22 Running health check... │ │ +│ │ 10:25:25 Health check passed (1/3) │ │ +│ │ 10:25:28 Health check passed (2/3) │ │ +│ │ ... 
│ │ +│ └─────────────────────────────────────────────────────────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +### Environment Management + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ ENVIRONMENTS [+ Add Environment] │ +├─────────────────────────────────────────────────────────────────────────────┤ +│ │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ NAME ORDER TARGETS CURRENT RELEASE APPROVALS STATUS │ │ +│ ├─────────────────────────────────────────────────────────────────────┤ │ +│ │ development 1 3 myapp-v1.5.0 0 Active │ │ +│ │ staging 2 2 myapp-v1.4.2 1 Active │ │ +│ │ uat 3 2 myapp-v1.4.1 1 Active │ │ +│ │ production 4 5 myapp-v1.4.0 2 + SoD Active │ │ +│ └─────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ ENVIRONMENT DETAIL: production [Edit] │ │ +│ ├─────────────────────────────────────────────────────────────────────┤ │ +│ │ │ │ +│ │ Approval Policy: │ │ +│ │ - Required approvals: 2 │ │ +│ │ - Separation of duties: Enabled │ │ +│ │ - Approver roles: release-manager, tech-lead │ │ +│ │ │ │ +│ │ Freeze Windows: │ │ +│ │ ┌────────────────────────────────────────────────────────────┐ │ │ +│ │ │ Holiday Freeze Dec 20 - Jan 5 Active [Remove] │ │ │ +│ │ │ Weekend Freeze Sat-Sun Active [Remove] │ │ │ +│ │ └────────────────────────────────────────────────────────────┘ │ │ +│ │ [+ Add Freeze Window] │ │ +│ │ │ │ +│ │ Targets: │ │ +│ │ ┌────────────────────────────────────────────────────────────┐ │ │ +│ │ │ prod-host-1 docker_host healthy sha256:abc... │ │ │ +│ │ │ prod-host-2 docker_host healthy sha256:abc... │ │ │ +│ │ │ prod-host-3 docker_host healthy sha256:abc... │ │ │ +│ │ │ prod-host-4 docker_host healthy sha256:abc... │ │ │ +│ │ │ prod-host-5 docker_host degraded sha256:abc... 
│ │ │ +│ │ └────────────────────────────────────────────────────────────┘ │ │ +│ │ │ │ +│ └─────────────────────────────────────────────────────────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +## Key Interactions + +### Approval Flow + +1. User sees pending approval notification on dashboard +2. Clicks to view promotion detail +3. Reviews gate evaluation results and change details +4. Clicks "Approve" or "Reject" with optional comment +5. System validates SoD requirements +6. Promotion advances or notification sent + +### Quick Promote + +1. From release detail, user clicks "Promote" +2. Selects target environment from dropdown +3. Confirms promotion request +4. System evaluates gates immediately +5. If auto-approved, deployment begins +6. If approval required, notification sent to approvers + +### Emergency Rollback + +1. From deployment history or alert, user clicks "Rollback" +2. System shows previous healthy version +3. User confirms rollback +4. System creates rollback deployment job +5. Real-time progress shown + +## Mobile Considerations + +- Responsive design for smaller screens +- Critical actions (approve/reject) accessible on mobile +- Push notifications for pending approvals +- Simplified views for monitoring on-the-go + +## References + +- [API Overview](../api/overview.md) +- [Workflow Templates](../workflow/templates.md) diff --git a/docs/modules/release-orchestrator/ui/screens.md b/docs/modules/release-orchestrator/ui/screens.md new file mode 100644 index 000000000..1b0eb805d --- /dev/null +++ b/docs/modules/release-orchestrator/ui/screens.md @@ -0,0 +1,232 @@ +# Key UI Screens + +> Specification for key UI screens: Environment Overview, Release Detail, and Why Blocked modal. 
+ +**Status:** Planned (not yet implemented) +**Source:** [Architecture Advisory Section 12.3](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md) +**Related Modules:** [Environment Manager](../modules/environment-manager.md), [Release Manager](../modules/release-manager.md) +**Sprints:** [111_002 - 111_007](../../../../implplan/) + +## Overview + +This document specifies the key UI screens for release orchestration. + +--- + +## Environment Overview Screen + +The environment overview shows the deployment pipeline and current state of each environment. + +``` ++-----------------------------------------------------------------------------+ +| ENVIRONMENTS [+ New Environment] | ++-----------------------------------------------------------------------------+ +| | +| +------------------------------------------------------------------------+ | +| | ENVIRONMENT PIPELINE | | +| | | | +| | +---------+ +---------+ +---------+ +---------+ | | +| | | DEV | ---> | TEST | ---> | STAGE | ---> | PROD | | | +| | | | | | | | | | | | +| | | v2.4.0 | | v2.3.1 | | v2.3.1 | | v2.3.0 | | | +| | | * 5 min | | * 2h | | * 1d | | * 3d | | | +| | +---------+ +---------+ +---------+ +---------+ | | +| | | | +| +------------------------------------------------------------------------+ | +| | +| +------------------------------------------------------------------------+ | +| | PRODUCTION [Manage] [View] | | +| | | | +| | Current Release: myapp-v2.3.0 | | +| | Deployed: 3 days ago by jane@example.com | | +| | Targets: 5 healthy, 0 unhealthy | | +| | | | +| | +---------------------------------------------------------------+ | | +| | | Pending Promotion: myapp-v2.3.1 [Review] | | | +| | | Waiting: 2 approvals (1/2) | | | +| | | Security: V All gates pass | | | +| | +---------------------------------------------------------------+ | | +| | | | +| | Freeze Windows: None active | | +| | Required Approvals: 2 | | +| | | | +| 
+------------------------------------------------------------------------+ | +| | ++-----------------------------------------------------------------------------+ +``` + +### Features + +- **Environment Pipeline:** Visual flow showing version progression +- **Environment Cards:** Detailed view of each environment +- **Target Health:** Real-time target health indicators +- **Pending Promotions:** Promotions awaiting action +- **Freeze Windows:** Active and scheduled freeze windows +- **Approval Status:** Current approval count vs required + +--- + +## Release Detail Screen + +The release detail screen shows all information about a specific release. + +``` ++-----------------------------------------------------------------------------+ +| RELEASE: myapp-v2.3.1 | +| Created: 2 hours ago by jane@example.com | ++-----------------------------------------------------------------------------+ +| | +| [Overview] [Components] [Security] [Deployments] [Evidence] | +| | +| +------------------------------------------------------------------------+ | +| | COMPONENTS | | +| | | | +| | +------------------------------------------------------------------+ | | +| | | api | | | +| | | Version: 2.3.1 Digest: sha256:abc123... | | | +| | | Security: V 0 critical, 0 high (0 reachable) | | | +| | | Image: registry.example.com/myapp/api@sha256:abc123 | | | +| | +------------------------------------------------------------------+ | | +| | | | +| | +------------------------------------------------------------------+ | | +| | | worker | | | +| | | Version: 2.3.1 Digest: sha256:def456... 
| | | +| | | Security: V 0 critical, 0 high (0 reachable) | | | +| | | Image: registry.example.com/myapp/worker@sha256:def456 | | | +| | +------------------------------------------------------------------+ | | +| | | | +| +------------------------------------------------------------------------+ | +| | +| +------------------------------------------------------------------------+ | +| | DEPLOYMENT STATUS | | +| | | | +| | dev *--------------------------------------------* Deployed (2h) | | +| | test *--------------------------------------------* Deployed (1h) | | +| | stage o--------------------------------------------* Deploying... | | +| | prod o Not deployed | | +| | | | +| +------------------------------------------------------------------------+ | +| | +| [Promote to Stage v] [Compare with Production] [Download Evidence] | +| | ++-----------------------------------------------------------------------------+ +``` + +### Tabs + +1. **Overview:** Release metadata and summary +2. **Components:** Component list with digests and versions +3. **Security:** Vulnerability summary and reachability analysis +4. **Deployments:** Deployment history across environments +5. **Evidence:** Evidence packets for compliance + +### Features + +- **Digest Display:** Full OCI digests for each component +- **Security Summary:** Vulnerability counts by severity +- **Deployment Timeline:** Visual progress across environments +- **Quick Actions:** Promote, compare, and export options + +--- + +## "Why Blocked?" Modal + +The "Why Blocked?" modal explains why a promotion cannot proceed. + +``` ++-----------------------------------------------------------------------------+ +| WHY IS THIS PROMOTION BLOCKED? 
[Close] | ++-----------------------------------------------------------------------------+ +| | +| Release: myapp-v2.4.0 -> Production | +| | +| +------------------------------------------------------------------------+ | +| | X SECURITY GATE FAILED | | +| | | | +| | Component 'api' has 1 critical reachable vulnerability: | | +| | | | +| | - CVE-2024-1234 (Critical, CVSS 9.8) | | +| | Package: log4j 2.14.0 | | +| | Reachability: V Confirmed reachable via api/logging/Logger.java | | +| | Fixed in: 2.17.1 | | +| | [View Details] [View Evidence] | | +| | | | +| | Remediation: Update log4j to version 2.17.1 or later | | +| | | | +| +------------------------------------------------------------------------+ | +| | +| +------------------------------------------------------------------------+ | +| | V APPROVAL GATE PASSED | | +| | | | +| | Required: 2 approvals | | +| | Received: 2 approvals | | +| | - john@example.com (2h ago): "LGTM" | | +| | - sarah@example.com (1h ago): "Approved for prod" | | +| | | | +| +------------------------------------------------------------------------+ | +| | +| +------------------------------------------------------------------------+ | +| | V FREEZE WINDOW GATE PASSED | | +| | | | +| | No active freeze windows for production | | +| | | | +| +------------------------------------------------------------------------+ | +| | +| Policy evaluated at: 2026-01-09T14:32:15Z | +| Policy hash: sha256:789xyz... 
| +| [View Full Decision Record] | +| | ++-----------------------------------------------------------------------------+ +``` + +### Features + +- **Gate-by-Gate Status:** Shows each gate with pass/fail status +- **Failure Details:** Specific information about why a gate failed +- **Vulnerability Details:** CVE info, package, version, and remediation +- **Reachability Evidence:** Links to reachability analysis +- **Approval History:** List of approvers and their comments +- **Override Mechanism:** Request override for authorized users +- **Decision Record:** Link to full evidence packet + +--- + +## Navigation Structure + +``` +Dashboard ++-- Releases +| +-- [Release Detail] +| +-- Create Release +| +-- Compare Releases +| ++-- Environments +| +-- [Environment Overview] +| +-- Create Environment +| +-- Manage Targets +| ++-- Workflows +| +-- [Workflow Editor] +| +-- Workflow Runs +| +-- Step Types +| ++-- Integrations +| +-- Connectors +| +-- Plugins +| +-- Vault +| ++-- Settings + +-- Users & Teams + +-- Policies + +-- Audit Log +``` + +--- + +## See Also + +- [Dashboard](dashboard.md) +- [Workflow Editor](workflow-editor.md) +- [Environment Manager](../modules/environment-manager.md) +- [Release Manager](../modules/release-manager.md) +- [Promotion Manager](../modules/promotion-manager.md) diff --git a/docs/modules/release-orchestrator/ui/workflow-editor.md b/docs/modules/release-orchestrator/ui/workflow-editor.md new file mode 100644 index 000000000..06da0830e --- /dev/null +++ b/docs/modules/release-orchestrator/ui/workflow-editor.md @@ -0,0 +1,296 @@ +# Workflow Editor Specification + +> Visual workflow editor for creating and editing DAG-based workflow templates. 
+
+**Status:** Planned (not yet implemented)
+**Source:** [Architecture Advisory Section 12.2](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
+**Related Modules:** [Workflow Engine](../modules/workflow-engine.md), [Workflow Templates](../workflow/templates.md)
+**Sprint:** [111_004 Workflow Editor](../../../../implplan/SPRINT_20260110_111_004_FE_workflow_editor.md)
+
+## Overview
+
+The workflow editor provides a visual graph editor for creating and editing workflow templates. It supports drag-and-drop node placement, connection creation, real-time run visualization, and bidirectional YAML synchronization.
+
+---
+
+## Graph Editor Component
+
+### Editor State
+
+```typescript
+interface WorkflowEditorState {
+  template: WorkflowTemplate;
+  selectedNode: string | null;
+  selectedEdge: string | null;
+  zoom: number;
+  pan: { x: number; y: number };
+  mode: "select" | "pan" | "connect";
+  clipboard: StepNode[] | null;
+  undoStack: WorkflowTemplate[];
+  redoStack: WorkflowTemplate[];
+}
+
+interface WorkflowEditorProps {
+  template: WorkflowTemplate;
+  stepTypes: StepType[];
+  readOnly: boolean;
+  onSave: (template: WorkflowTemplate) => void;
+  onValidate: (template: WorkflowTemplate) => ValidationResult;
+}
+```
+
+### Node Renderer
+
+```typescript
+interface NodeRendererProps {
+  node: StepNode;
+  stepType: StepType;
+  status?: StepRunStatus;  // For run visualization
+  selected: boolean;
+  onSelect: () => void;
+  onMove: (position: Position) => void;
+  onConnect: (sourceHandle: string) => void;
+}
+
+const NodeRenderer: React.FC<NodeRendererProps> = ({
+  node, stepType, status, selected, onSelect
+}) => {
+  const statusColor = getStatusColor(status);
+
+  return (
+    <div
+      className={selected ? "step-node selected" : "step-node"}
+      style={{ borderColor: statusColor }}
+      onClick={onSelect}
+    >
+
+      {/* Node header */}
+      <div className="node-header">
+        <Icon name={stepType.icon} />
+        <span className="node-name">{node.name}</span>
+        {status && <StatusBadge status={status} />}
+      </div>
+
+      {/* Node body */}
+      <div className="node-body">
+        <span className="step-type">{stepType.name}</span>
+        {node.timeout && <span className="timeout">T {node.timeout}s</span>}
+      </div>
+
+      {/* Connection handles */}
+      <Handle type="target" position="top" />
+      <Handle type="source" position="bottom" />
+
+      {/* Conditional indicator */}
+      {node.condition && (
+        <div className="condition-badge">
+          <Icon name="branch" />
+        </div>
+      )}
+    </div>
+  );
+};
+```
+
+---
+
+## Run Visualization Overlay
+
+### Real-Time Execution Display
+
+```typescript
+interface RunVisualizationProps {
+  template: WorkflowTemplate;
+  workflowRun: WorkflowRun;
+  stepRuns: StepRun[];
+  onNodeClick: (nodeId: string) => void;
+}
+
+const RunVisualization: React.FC<RunVisualizationProps> = ({
+  template, workflowRun, stepRuns, onNodeClick
+}) => {
+  const [selectedNode, setSelectedNode] = useState<string | null>(null);
+
+  // WebSocket for real-time updates
+  const { subscribe, unsubscribe } = useWorkflowStream(workflowRun.id);
+
+  useEffect(() => {
+    const handlers = {
+      'step_started': (data) => updateStepStatus(data.nodeId, 'running'),
+      'step_completed': (data) => updateStepStatus(data.nodeId, data.status),
+      'step_log': (data) => appendLog(data.nodeId, data.line),
+    };
+
+    subscribe(handlers);
+    return () => unsubscribe();
+  }, [workflowRun.id]);
+
+  return (
+    <div className="run-visualization">
+
+      {/* Workflow graph with status overlay */}
+      <WorkflowGraph
+        template={template}
+        nodeRenderer={(node) => (
+          <NodeRenderer
+            node={node}
+            stepType={getStepType(node.type)}
+            status={getStepStatus(stepRuns, node.id)}
+            selected={selectedNode === node.id}
+            onSelect={() => setSelectedNode(node.id)}
+          />
+        )}
+        edgeRenderer={(edge) => (
+          <EdgeRenderer edge={edge} />
+        )}
+      />
+
+      {/* Log panel */}
+      {selectedNode && (
+        <StepLogPanel nodeId={selectedNode} />
+      )}
+
+      {/* Progress bar */}
+      <WorkflowProgressBar workflowRun={workflowRun} stepRuns={stepRuns} />
+    </div>
+  );
+};
+```
+
+### Status Indicators
+
+| Status | Visual |
+|--------|--------|
+| Pending | Gray circle |
+| Running | Blue spinner |
+| Success | Green checkmark |
+| Failed | Red X |
+| Skipped | Yellow dash |
+
+---
+
+## Canvas Operations
+
+### Drag and Drop
+
+- Drag steps from palette to canvas
+- Drop creates new node at position
+- Connect nodes by dragging from source to target handle
+- Multi-select with Shift+click or box selection
+
+### Validation
+
+The editor performs real-time validation:
+
+- **DAG Cycle Detection:** Prevent circular dependencies
+- **Orphan Node Detection:** Warn about unconnected nodes
+- **Required Inputs:** Highlight missing required configuration
+- **Type Compatibility:** Validate edge connections between compatible types
+
+### Zoom and Pan
+
+| Action | Control |
+|--------|---------|
+| Zoom In | Ctrl + Mouse Wheel Up |
+| Zoom Out | Ctrl + Mouse Wheel Down |
+| Fit View | Ctrl + 0 |
+| Pan | Middle Mouse Drag / Space + Drag |
+| Reset | Ctrl + R |
+
+---
+
+## YAML Editor Mode
+
+### Monaco Editor Integration
+
+The editor supports a bidirectional YAML mode for power users:
+
+```typescript
+interface YAMLEditorProps {
+  template: WorkflowTemplate;
+  onChange: (template: WorkflowTemplate) => void;
+  onValidate: (yaml: string) => ValidationResult;
+}
+
+const YAMLEditor: React.FC<YAMLEditorProps> = ({ template, onChange, onValidate }) => {
+  const [yaml, setYaml] = useState(templateToYaml(template));
+
+  return (
+    <MonacoEditor
+      language="yaml"
+      value={yaml}
+      onChange={(value) => {
+        setYaml(value);
+        const result = onValidate(value);
+        if (result.valid) {
+          onChange(yamlToTemplate(value));
+        }
+      }}
+      options={{
+        minimap: { enabled: false },
+        lineNumbers: 'on',
+        scrollBeyondLastLine: false,
+      }}
+    />
+  );
+};
+```
+
+### Bidirectional Sync
+
+Changes in either view (graph or YAML) are synchronized:
+
+- Graph changes update YAML immediately
+- Valid YAML changes update graph
+- Invalid YAML shows error markers without updating graph
+
+---
+
+## Step Palette
+
+### Available Step Types
+
+The 
palette shows all available step types from core and plugins: + +```typescript +interface StepPaletteProps { + stepTypes: StepType[]; + onDragStart: (stepType: string) => void; + filter: string; +} + +const categories = [ + { name: "Deployment", types: ["deploy", "rollback"] }, + { name: "Gates", types: ["security-gate", "approval", "freeze-window-gate"] }, + { name: "Utility", types: ["script", "wait", "notify"] }, + { name: "Plugins", types: [] }, // Dynamically loaded +]; +``` + +--- + +## Keyboard Shortcuts + +| Shortcut | Action | +|----------|--------| +| Ctrl + S | Save template | +| Ctrl + Z | Undo | +| Ctrl + Shift + Z | Redo | +| Delete | Delete selected | +| Ctrl + C | Copy selected | +| Ctrl + V | Paste | +| Ctrl + A | Select all | +| Escape | Deselect / Cancel | + +--- + +## See Also + +- [Workflow Templates](../workflow/templates.md) +- [Workflow APIs](../api/workflows.md) +- [Dashboard](dashboard.md) +- [Key Screens](screens.md) diff --git a/docs/modules/release-orchestrator/workflow/execution.md b/docs/modules/release-orchestrator/workflow/execution.md new file mode 100644 index 000000000..bfb161480 --- /dev/null +++ b/docs/modules/release-orchestrator/workflow/execution.md @@ -0,0 +1,591 @@ +# Workflow Execution + +## Overview + +The Workflow Engine executes workflow templates as DAGs (Directed Acyclic Graphs) of steps, managing state transitions, parallelism, retries, and failure handling. 
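The engine's job, as described above, reduces to repeatedly finding "ready" steps (all dependencies completed) and dispatching them until the DAG drains. The sketch below is a minimal, synchronous illustration of that scheduling core, including the deadlock case that arises from a cycle; the function name and `(from, to)` edge format are illustrative, not the engine's actual API:

```python
# Minimal sketch of DAG scheduling: run every step whose dependencies have
# completed; an empty "ready" set while work remains signals a cycle, which
# a real engine would surface as a deadlock error.
def execute_dag(nodes, edges):
    """nodes: list of step ids; edges: list of (from_id, to_id). Returns completion order."""
    deps = {n: set() for n in nodes}
    for src, dst in edges:
        deps[dst].add(src)

    pending, completed, order = set(nodes), set(), []
    while pending:
        ready = sorted(n for n in pending if deps[n] <= completed)
        if not ready:
            raise RuntimeError(f"deadlock: unreachable steps {sorted(pending)}")
        for node in ready:  # a real engine dispatches each ready wave in parallel
            order.append(node)
            completed.add(node)
            pending.remove(node)
    return order

print(execute_dag(["build", "test", "deploy"], [("build", "test"), ("test", "deploy")]))
# -> ['build', 'test', 'deploy']
```

A real run additionally evaluates step conditions and edge conditions before marking a node ready, as the executor below shows.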
+ +## Execution Architecture + +``` + WORKFLOW EXECUTION ARCHITECTURE + + ┌─────────────────────────────────────────────────────────────────────────────┐ + │ WORKFLOW ENGINE │ + │ │ + │ ┌─────────────────────────────────────────────────────────────────────┐ │ + │ │ WORKFLOW RUNNER │ │ + │ │ │ │ + │ │ ┌────────────┐ ┌────────────┐ ┌────────────┐ │ │ + │ │ │ Template │───►│ Execution │───►│ Context │ │ │ + │ │ │ Parser │ │ Planner │ │ Builder │ │ │ + │ │ └────────────┘ └────────────┘ └────────────┘ │ │ + │ │ │ │ │ │ │ + │ │ └────────────────┼─────────────────┘ │ │ + │ │ ▼ │ │ + │ │ ┌─────────────────────────────────────────────────────────────┐ │ │ + │ │ │ DAG EXECUTOR │ │ │ + │ │ │ │ │ │ + │ │ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │ │ + │ │ │ │ Ready │ │ Running │ │ Waiting │ │ Completed│ │ │ │ + │ │ │ │ Queue │ │ Set │ │ Set │ │ Set │ │ │ │ + │ │ │ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │ │ │ + │ │ │ │ │ │ + │ │ │ ┌──────────────────────────────────────────────────────┐ │ │ │ + │ │ │ │ STEP DISPATCHER │ │ │ │ + │ │ │ └──────────────────────────────────────────────────────┘ │ │ │ + │ │ └─────────────────────────────────────────────────────────────┘ │ │ + │ └─────────────────────────────────────────────────────────────────────┘ │ + │ │ │ + │ ▼ │ + │ ┌─────────────────────────────────────────────────────────────────────┐ │ + │ │ STEP EXECUTOR POOL │ │ + │ │ │ │ + │ │ ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐ │ │ + │ │ │ Executor 1 │ │ Executor 2 │ │ Executor 3 │ │ Executor N │ │ │ + │ │ └────────────┘ └────────────┘ └────────────┘ └────────────┘ │ │ + │ │ │ │ + │ └─────────────────────────────────────────────────────────────────────┘ │ + │ │ + └─────────────────────────────────────────────────────────────────────────────┘ +``` + +## Workflow Run State Machine + +``` + WORKFLOW RUN STATES + + ┌──────────┐ + │ CREATED │ + └────┬─────┘ + │ start() + ▼ + ┌──────────┐ + │ RUNNING │◄──────────────────┐ + └────┬─────┘ │ + │ │ 
+ ┌───────────────────┼───────────────────┐ │ + │ │ │ │ + ▼ ▼ ▼ │ + ┌──────────┐ ┌──────────┐ ┌──────────┐│ + │ WAITING │ │ PAUSED │ │ FAILING ││ + │ APPROVAL │ │ │ │ ││ + └────┬─────┘ └────┬─────┘ └────┬─────┘│ + │ │ │ │ + │ approve() │ resume() │ │ + │ │ │ │ + └───────────────►──┴──────────────────┘ │ + │ │ + └─────────────────────────┘ + │ + ┌───────────────────────┼───────────────────┐ + │ │ │ + ▼ ▼ ▼ + ┌──────────┐ ┌──────────┐ ┌──────────┐ + │COMPLETED │ │ FAILED │ │ CANCELLED│ + └──────────┘ └──────────┘ └──────────┘ +``` + +### State Transitions + +| Current State | Event | Next State | Description | +|---------------|-------|------------|-------------| +| `created` | `start()` | `running` | Begin workflow execution | +| `running` | Step requires approval | `waiting_approval` | Pause for human approval | +| `running` | `pause()` | `paused` | Manual pause requested | +| `running` | Step fails | `failing` | Handle failure path | +| `running` | All steps complete | `completed` | Workflow success | +| `waiting_approval` | `approve()` | `running` | Resume after approval | +| `waiting_approval` | `reject()` | `failed` | Rejection ends workflow | +| `paused` | `resume()` | `running` | Resume execution | +| `paused` | `cancel()` | `cancelled` | Cancel workflow | +| `failing` | Rollback complete | `failed` | Failure handling done | +| `failing` | Rollback succeeds | `running` | Resume with fallback | + +## Step Execution State Machine + +``` + STEP STATES + + ┌──────────┐ + │ PENDING │ + └────┬─────┘ + │ schedule() + ▼ + ┌──────────┐ + │ QUEUED │ + └────┬─────┘ + │ dispatch() + ▼ + ┌──────────┐ + │ RUNNING │◄─────────┐ + └────┬─────┘ │ + │ │ retry() + ┌───────────────────┼───────────────┐│ + │ │ ││ + ▼ ▼ ▼│ + ┌──────────┐ ┌──────────┐ ┌──────────┐ + │SUCCEEDED │ │ FAILED │ │ RETRYING │ + └──────────┘ └────┬─────┘ └──────────┘ + │ + ▼ + ┌─────────────────────┐ + │ FAILURE HANDLER │ + │ ┌───────────────┐ │ + │ │ fail │──┼─► Mark workflow failing + │ │ continue │──┼─► 
Continue to next step
+  │  │ rollback      │──┼─► Trigger rollback path
+  │  │ goto:{nodeId} │──┼─► Jump to specific node
+  │  └───────────────┘  │
+  └─────────────────────┘
+```
+
+### Step States
+
+| State | Description |
+|-------|-------------|
+| `pending` | Step not yet ready (dependencies incomplete) |
+| `queued` | Ready for execution, waiting for executor |
+| `running` | Currently executing |
+| `succeeded` | Completed successfully |
+| `failed` | Failed after all retries exhausted |
+| `retrying` | Failed, waiting for retry |
+| `skipped` | Condition evaluated to false |
+
+## DAG Execution Algorithm
+
+```python
+class DAGExecutor:
+    def __init__(self, workflow_run: WorkflowRun):
+        self.run = workflow_run
+        self.template = workflow_run.template
+        self.pending = set(node.id for node in self.template.nodes)
+        self.running = set()
+        self.completed = set()
+        self.failed = set()
+        self.outputs = {}  # nodeId -> outputs
+
+    async def execute(self):
+        """Main execution loop."""
+        self.run.status = WorkflowStatus.RUNNING
+        self.run.started_at = datetime.utcnow()
+
+        while self.pending or self.running:
+            # Find ready nodes (all dependencies satisfied)
+            ready = self.find_ready_nodes()
+
+            # Dispatch ready nodes
+            for node_id in ready:
+                asyncio.create_task(self.execute_node(node_id))
+                self.pending.remove(node_id)
+                self.running.add(node_id)
+
+            # Wait for any node to complete
+            if self.running:
+                await self.wait_for_completion()
+
+            # Check for deadlock
+            if not ready and self.pending and not self.running:
+                raise DeadlockException(self.pending)
+
+        # Determine final status
+        if self.failed:
+            self.run.status = WorkflowStatus.FAILED
+        else:
+            self.run.status = WorkflowStatus.COMPLETED
+
+        self.run.completed_at = datetime.utcnow()
+
+    def find_ready_nodes(self) -> List[str]:
+        """Find nodes whose dependencies are all complete."""
+        ready = []
+        for node_id in list(self.pending):  # iterate a copy: mark_skipped() mutates pending
+            node = self.template.get_node(node_id)
+
+            # Check condition
+            if node.condition:
+                if not 
self.evaluate_condition(node.condition): + self.mark_skipped(node_id) + continue + + # Check all incoming edges + incoming = self.template.get_incoming_edges(node_id) + dependencies_met = all( + edge.from_node in self.completed + for edge in incoming + if self.evaluate_edge_condition(edge) + ) + + if dependencies_met: + ready.append(node_id) + + return ready + + async def execute_node(self, node_id: str): + """Execute a single node.""" + node = self.template.get_node(node_id) + step_run = StepRun( + workflow_run_id=self.run.id, + node_id=node_id, + status=StepStatus.RUNNING + ) + + try: + # Resolve inputs + inputs = self.resolve_inputs(node) + + # Get step executor + executor = self.step_registry.get_executor(node.type) + + # Execute with timeout + async with asyncio.timeout(node.timeout): + outputs = await executor.execute(inputs, node.config) + + # Store outputs + self.outputs[node_id] = outputs + step_run.outputs = outputs + step_run.status = StepStatus.SUCCEEDED + + self.running.remove(node_id) + self.completed.add(node_id) + + except Exception as e: + await self.handle_step_failure(node, step_run, e) + + async def handle_step_failure(self, node, step_run, error): + """Handle step failure according to retry and failure policies.""" + step_run.attempt_number += 1 + + # Check retry policy + if step_run.attempt_number <= node.retry_policy.max_retries: + if self.is_retryable(error, node.retry_policy): + step_run.status = StepStatus.RETRYING + delay = self.calculate_backoff(node.retry_policy, step_run.attempt_number) + await asyncio.sleep(delay) + await self.execute_node(node.id) # Retry + return + + # No more retries - handle failure + step_run.status = StepStatus.FAILED + step_run.error = str(error) + + match node.on_failure: + case "fail": + self.run.status = WorkflowStatus.FAILING + self.failed.add(node.id) + case "continue": + self.completed.add(node.id) # Continue as if succeeded + case "rollback": + await self.trigger_rollback(node) + case _ if 
node.on_failure.startswith("goto:"):
+                target = node.on_failure.split(":")[1]
+                self.pending.add(target)  # Add target to pending
+
+        self.running.remove(node.id)
+```
+
+## Input Resolution
+
+Inputs to steps can come from multiple sources:
+
+```typescript
+interface InputResolver {
+  resolve(binding: InputBinding, context: ExecutionContext): any;
+}
+
+class StandardInputResolver implements InputResolver {
+  resolve(binding: InputBinding, context: ExecutionContext): any {
+    switch (binding.source.type) {
+      case "literal":
+        return binding.source.value;
+
+      case "context":
+        // Navigate context path: "release.name" -> context.release.name
+        return this.navigatePath(context, binding.source.path);
+
+      case "output":
+        // Get output from previous step
+        const stepOutputs = context.stepOutputs[binding.source.nodeId];
+        return stepOutputs?.[binding.source.outputName];
+
+      case "secret":
+        // Fetch from vault (never cached)
+        return this.secretsClient.fetch(binding.source.secretName);
+
+      case "expression":
+        // Evaluate JavaScript expression
+        return this.expressionEvaluator.evaluate(
+          binding.source.expression,
+          context
+        );
+    }
+  }
+}
+```
+
+## Execution Context
+
+The execution context provides data available to all steps:
+
+```typescript
+interface ExecutionContext {
+  // Workflow identifiers
+  workflowRunId: UUID;
+  templateId: UUID;
+  templateVersion: number;
+
+  // Input values
+  inputs: Record<string, any>;
+
+  // Domain objects (loaded at start)
+  release?: Release;
+  promotion?: Promotion;
+  environment?: Environment;
+  targets?: Target[];
+
+  // Step outputs (accumulated during execution)
+  stepOutputs: Record<string, Record<string, any>>;
+
+  // Tenant context
+  tenantId: UUID;
+  userId: UUID;
+
+  // Metadata
+  startedAt: DateTime;
+  correlationId: string;
+}
+```
+
+## Concurrency Control
+
+### Parallelism Within Workflows
+
+```typescript
+interface ParallelConfig {
+  maxConcurrency: number;   // Max simultaneous steps
+  failFast: boolean;        // Stop all on first failure
+}
+
+// Example: 
Parallel deployment to multiple targets
+const parallelDeploy: StepNode = {
+  id: "parallel-deploy",
+  type: "parallel",
+  config: {
+    maxConcurrency: 5,
+    failFast: false
+  },
+  children: [
+    { id: "deploy-target-1", type: "deploy-docker", ... },
+    { id: "deploy-target-2", type: "deploy-docker", ... },
+    { id: "deploy-target-3", type: "deploy-docker", ... },
+  ]
+};
+```
+
+### Global Concurrency Limits
+
+```typescript
+interface ConcurrencyLimits {
+  maxWorkflowsPerTenant: number;         // Concurrent workflow runs
+  maxStepsPerWorkflow: number;           // Concurrent steps per workflow
+  maxDeploymentsPerEnvironment: number;  // Prevent deployment conflicts
+}
+
+// Default limits
+const defaults: ConcurrencyLimits = {
+  maxWorkflowsPerTenant: 10,
+  maxStepsPerWorkflow: 20,
+  maxDeploymentsPerEnvironment: 1  // One deployment at a time
+};
+```
+
+## Checkpoint and Resume
+
+Workflows support checkpointing for long-running executions:
+
+```typescript
+interface WorkflowCheckpoint {
+  workflowRunId: UUID;
+  checkpointedAt: DateTime;
+
+  // Execution state
+  pendingNodes: string[];
+  completedNodes: string[];
+  failedNodes: string[];
+
+  // Accumulated data
+  stepOutputs: Record<string, Record<string, any>>;
+
+  // Context snapshot
+  contextSnapshot: ExecutionContext;
+}
+
+class CheckpointManager {
+  // Save checkpoint after each step completion
+  async saveCheckpoint(run: WorkflowRun): Promise<void> {
+    const checkpoint: WorkflowCheckpoint = {
+      workflowRunId: run.id,
+      checkpointedAt: new Date(),
+      pendingNodes: Array.from(run.executor.pending),
+      completedNodes: Array.from(run.executor.completed),
+      failedNodes: Array.from(run.executor.failed),
+      stepOutputs: run.executor.outputs,
+      contextSnapshot: run.context
+    };
+
+    await this.repository.save(checkpoint);
+  }
+
+  // Resume from checkpoint after service restart
+  async resumeFromCheckpoint(workflowRunId: UUID): Promise<WorkflowRun> {
+    const checkpoint = await this.repository.get(workflowRunId);
+
+    const run = new WorkflowRun();
+    run.executor.pending = new 
Set(checkpoint.pendingNodes);
+    run.executor.completed = new Set(checkpoint.completedNodes);
+    run.executor.failed = new Set(checkpoint.failedNodes);
+    run.executor.outputs = checkpoint.stepOutputs;
+    run.context = checkpoint.contextSnapshot;
+
+    // Resume execution
+    await run.executor.execute();
+    return run;
+  }
+}
+```
+
+## Timeout Handling
+
+```typescript
+interface TimeoutConfig {
+  stepTimeout: number;      // Per-step timeout (seconds)
+  workflowTimeout: number;  // Total workflow timeout (seconds)
+}
+
+class TimeoutHandler {
+  async executeWithTimeout<T>(
+    operation: () => Promise<T>,
+    timeoutSeconds: number,
+    onTimeout: () => Promise<void>
+  ): Promise<T> {
+    const controller = new AbortController();
+    const timeoutId = setTimeout(
+      () => controller.abort(),
+      timeoutSeconds * 1000
+    );
+
+    try {
+      const result = await operation();
+      clearTimeout(timeoutId);
+      return result;
+    } catch (error) {
+      if (error.name === 'AbortError') {
+        await onTimeout();
+        throw new TimeoutException(timeoutSeconds);
+      }
+      throw error;
+    }
+  }
+}
+```
+
+## Event Emission
+
+The workflow engine emits events for observability:
+
+```typescript
+type WorkflowEvent =
+  | { type: "workflow.started"; workflowRunId: UUID; templateId: UUID }
+  | { type: "workflow.completed"; workflowRunId: UUID; status: string }
+  | { type: "workflow.failed"; workflowRunId: UUID; error: string }
+  | { type: "step.started"; workflowRunId: UUID; nodeId: string }
+  | { type: "step.completed"; workflowRunId: UUID; nodeId: string; outputs: any }
+  | { type: "step.failed"; workflowRunId: UUID; nodeId: string; error: string }
+  | { type: "step.retrying"; workflowRunId: UUID; nodeId: string; attempt: number };
+
+class WorkflowEventEmitter {
+  private subscribers: Map<string, ((event: WorkflowEvent) => void)[]> = new Map();
+
+  emit(event: WorkflowEvent): void {
+    const handlers = this.subscribers.get(event.type) || [];
+    for (const handler of handlers) {
+      handler(event);
+    }
+
+    // Also emit to event bus for external consumers
+    
this.eventBus.publish("workflow.events", event);
+  }
+}
+```
+
+## Execution Monitoring
+
+### Real-time Progress
+
+```typescript
+interface WorkflowProgress {
+  workflowRunId: UUID;
+  status: WorkflowStatus;
+
+  // Step progress
+  totalSteps: number;
+  completedSteps: number;
+  runningSteps: number;
+  failedSteps: number;
+
+  // Current activity
+  currentNodes: string[];
+
+  // Timing
+  startedAt: DateTime;
+  estimatedCompletion?: DateTime;
+
+  // Step details
+  steps: StepProgress[];
+}
+
+interface StepProgress {
+  nodeId: string;
+  nodeName: string;
+  status: StepStatus;
+  startedAt?: DateTime;
+  completedAt?: DateTime;
+  attempt: number;
+  logs?: string;
+}
+```
+
+### WebSocket Streaming
+
+```typescript
+// Client subscribes to workflow progress
+const ws = new WebSocket(`/api/v1/workflow-runs/${runId}/stream`);
+
+ws.onmessage = (event) => {
+  const progress: WorkflowProgress = JSON.parse(event.data);
+  updateUI(progress);
+};
+
+// Server streams updates
+class WorkflowStreamHandler {
+  async stream(runId: UUID, connection: WebSocket): Promise<void> {
+    const subscription = this.eventBus.subscribe(`workflow.${runId}.*`);
+
+    for await (const event of subscription) {
+      const progress = await this.buildProgress(runId);
+      connection.send(JSON.stringify(progress));
+
+      if (progress.status === 'completed' || progress.status === 'failed') {
+        break;
+      }
+    }
+
+    connection.close();
+  }
+}
+```
+
+## References
+
+- [Workflow Templates](templates.md)
+- [Workflow Engine Module](../modules/workflow-engine.md)
+- [Promotion Manager](../modules/promotion-manager.md)
diff --git a/docs/modules/release-orchestrator/workflow/promotion.md b/docs/modules/release-orchestrator/workflow/promotion.md
new file mode 100644
index 000000000..85248cb8e
--- /dev/null
+++ b/docs/modules/release-orchestrator/workflow/promotion.md
@@ -0,0 +1,405 @@
+# Promotion State Machine
+
+## Overview
+
+Promotions move releases through environments (Dev -> Staging -> Production). 
The promotion state machine manages the lifecycle from request to completion.
+
+## Promotion States
+
+```
+┌─────────────────────────────────────────────────────────────────────────────┐
+│                         PROMOTION STATE MACHINE                             │
+│                                                                             │
+│                        ┌──────────────────┐                                 │
+│                        │ PENDING_APPROVAL │ (initial)                       │
+│                        └────────┬─────────┘                                 │
+│                                 │                                           │
+│              ┌──────────────────┼──────────────────┐                        │
+│              │                  │                  │                        │
+│              ▼                  ▼                  ▼                        │
+│     ┌────────────────┐ ┌────────────────┐ ┌────────────────┐               │
+│     │    REJECTED    │ │  PENDING_GATE  │ │   CANCELLED    │               │
+│     └────────────────┘ └────────┬───────┘ └────────────────┘               │
+│                                 │                                           │
+│                                 │ gates pass                                │
+│                                 ▼                                           │
+│                        ┌────────────────┐                                   │
+│                        │    APPROVED    │                                   │
+│                        └────────┬───────┘                                   │
+│                                 │                                           │
+│                                 │ start deployment                          │
+│                                 ▼                                           │
+│                        ┌────────────────┐                                   │
+│                        │   DEPLOYING    │                                   │
+│                        └────────┬───────┘                                   │
+│                                 │                                           │
+│              ┌──────────────────┼──────────────────┐                        │
+│              │                  │                  │                        │
+│              ▼                  ▼                  ▼                        │
+│     ┌────────────────┐ ┌────────────────┐ ┌────────────────┐               │
+│     │     FAILED     │ │    DEPLOYED    │ │  ROLLED_BACK   │               │
+│     └────────────────┘ └────────────────┘ └────────────────┘               │
+│                                                                             │
+└─────────────────────────────────────────────────────────────────────────────┘
+```
+
+## State Definitions
+
+| State | Description |
+|-------|-------------|
+| `pending_approval` | Awaiting human approval (if required) |
+| `pending_gate` | Awaiting automated gate evaluation |
+| `approved` | All approvals and gates satisfied; ready for deployment |
+| `rejected` | Blocked by approval rejection or gate failure |
+| `deploying` | Deployment in progress |
+| `deployed` | Successfully deployed to target environment |
+| `failed` | Deployment failed (not rolled back) |
+| `cancelled` | Cancelled by user before completion |
+| `rolled_back` | Deployment rolled back to previous version |
+
+## State Transitions
+
+### Valid Transitions
+
+```typescript
+const validTransitions: Record<PromotionStatus, PromotionStatus[]> = {
+  pending_approval: ["pending_gate", "approved", "rejected", "cancelled"],
+  pending_gate: ["approved", "rejected", "cancelled"],
+  approved: ["deploying", "cancelled"],
+  deploying: 
["deployed", "failed", "rolled_back"],
+  rejected: [],                // terminal
+  cancelled: [],               // terminal
+  deployed: [],                // terminal (for this promotion)
+  failed: ["rolled_back"],     // can trigger rollback
+  rolled_back: []              // terminal
+};
+```
+
+### Transition Events
+
+```typescript
+interface PromotionTransition {
+  promotionId: UUID;
+  fromState: PromotionStatus;
+  toState: PromotionStatus;
+  trigger: TransitionTrigger;
+  triggeredBy: UUID;        // user or system
+  timestamp: DateTime;
+  details: object;
+}
+
+type TransitionTrigger =
+  | "approval_granted"
+  | "approval_rejected"
+  | "gate_passed"
+  | "gate_failed"
+  | "deployment_started"
+  | "deployment_completed"
+  | "deployment_failed"
+  | "rollback_triggered"
+  | "rollback_completed"
+  | "user_cancelled";
+```
+
+## Promotion Flow
+
+### 1. Request Promotion
+
+```typescript
+async function requestPromotion(request: PromotionRequest): Promise<Promotion> {
+  // Validate release exists and is ready
+  const release = await getRelease(request.releaseId);
+  if (release.status !== "ready" && release.status !== "deployed") {
+    throw new Error("Release not ready for promotion");
+  }
+
+  // Validate target environment
+  const environment = await getEnvironment(request.targetEnvironmentId);
+
+  // Check freeze windows
+  if (await isEnvironmentFrozen(environment.id)) {
+    throw new Error("Environment is frozen");
+  }
+
+  // Determine initial state
+  const requiresApproval = environment.requiredApprovals > 0;
+  const initialStatus = requiresApproval ? "pending_approval" : "pending_gate";
+
+  // Create promotion
+  const promotion = await createPromotion({
+    releaseId: request.releaseId,
+    sourceEnvironmentId: release.currentEnvironmentId,
+    targetEnvironmentId: environment.id,
+    status: initialStatus,
+    requestedBy: request.userId,
+    requestReason: request.reason
+  });
+
+  // Emit event
+  await emitEvent("promotion.requested", promotion);
+
+  return promotion;
+}
+```
+
+### 2. 
Approval Phase
+
+```typescript
+async function processApproval(
+  promotionId: UUID,
+  approverId: UUID,
+  action: "approve" | "reject",
+  comment?: string
+): Promise<Promotion> {
+  const promotion = await getPromotion(promotionId);
+  const environment = await getEnvironment(promotion.targetEnvironmentId);
+
+  // Validate approver can approve
+  await validateApproverPermission(approverId, environment.id);
+
+  // Check separation of duties
+  if (environment.requireSeparationOfDuties) {
+    if (approverId === promotion.requestedBy) {
+      throw new Error("Separation of duties violation: requester cannot approve");
+    }
+  }
+
+  // Record approval
+  await recordApproval({
+    promotionId,
+    approverId,
+    action,
+    comment
+  });
+
+  if (action === "reject") {
+    return await transitionState(promotion, "rejected", {
+      trigger: "approval_rejected",
+      triggeredBy: approverId,
+      details: { reason: comment }
+    });
+  }
+
+  // Check if all required approvals received
+  const approvalCount = await countApprovals(promotionId);
+  if (approvalCount >= environment.requiredApprovals) {
+    return await transitionState(promotion, "pending_gate", {
+      trigger: "approval_granted",
+      triggeredBy: approverId
+    });
+  }
+
+  return promotion;
+}
+```
+
+### 3. 
Gate Evaluation
+
+```typescript
+async function evaluateGates(promotionId: UUID): Promise<{ passed: boolean; gateResults: GateResult[]; decisionRecord: DecisionRecord }> {
+  const promotion = await getPromotion(promotionId);
+  const environment = await getEnvironment(promotion.targetEnvironmentId);
+  const release = await getRelease(promotion.releaseId);
+
+  const gateResults: GateResult[] = [];
+
+  // Security gate
+  const securityResult = await evaluateSecurityGate(release, environment);
+  gateResults.push(securityResult);
+
+  // Custom policy gates
+  for (const policy of environment.policies) {
+    const policyResult = await evaluatePolicyGate(release, environment, policy);
+    gateResults.push(policyResult);
+  }
+
+  // Aggregate results
+  const allPassed = gateResults.every(g => g.passed);
+  const blockingFailures = gateResults.filter(g => !g.passed && g.blocking);
+
+  // Create decision record
+  const decisionRecord = await createDecisionRecord({
+    promotionId,
+    gateResults,
+    decision: allPassed ? "allow" : "block",
+    decidedAt: new Date()
+  });
+
+  // Transition state
+  if (allPassed) {
+    await transitionState(promotion, "approved", {
+      trigger: "gate_passed",
+      triggeredBy: "system",
+      details: { decisionRecordId: decisionRecord.id }
+    });
+  } else {
+    await transitionState(promotion, "rejected", {
+      trigger: "gate_failed",
+      triggeredBy: "system",
+      details: { blockingGates: blockingFailures }
+    });
+  }
+
+  return { passed: allPassed, gateResults, decisionRecord };
+}
+```
+
+### 4. 
Deployment Execution
+
+```typescript
+async function executeDeployment(promotionId: UUID): Promise<DeploymentJob> {
+  const promotion = await getPromotion(promotionId);
+
+  // Transition to deploying
+  await transitionState(promotion, "deploying", {
+    trigger: "deployment_started",
+    triggeredBy: "system"
+  });
+
+  // Generate artifacts
+  const artifacts = await generateArtifacts(promotion);
+
+  // Create deployment job
+  const job = await createDeploymentJob({
+    promotionId,
+    releaseId: promotion.releaseId,
+    environmentId: promotion.targetEnvironmentId,
+    artifacts
+  });
+
+  // Execute via workflow or direct
+  const workflowRun = await startDeploymentWorkflow(job);
+
+  // Update promotion with workflow reference
+  await updatePromotion(promotionId, { workflowRunId: workflowRun.id });
+
+  return job;
+}
+```
+
+### 5. Completion Handling
+
+```typescript
+async function handleDeploymentCompletion(
+  jobId: UUID,
+  status: "succeeded" | "failed"
+): Promise<Promotion> {
+  const job = await getDeploymentJob(jobId);
+  const promotion = await getPromotion(job.promotionId);
+
+  if (status === "succeeded") {
+    // Generate evidence packet
+    const evidence = await generateEvidencePacket(promotion, job);
+
+    // Update release environment state
+    await updateReleaseEnvironmentState({
+      releaseId: promotion.releaseId,
+      environmentId: promotion.targetEnvironmentId,
+      status: "deployed",
+      promotionId: promotion.id,
+      evidenceRef: evidence.id
+    });
+
+    return await transitionState(promotion, "deployed", {
+      trigger: "deployment_completed",
+      triggeredBy: "system",
+      details: { evidencePacketId: evidence.id }
+    });
+  } else {
+    return await transitionState(promotion, "failed", {
+      trigger: "deployment_failed",
+      triggeredBy: "system",
+      details: { jobId, error: job.errorMessage }
+    });
+  }
+}
+```
+
+## Decision Record
+
+Every promotion produces a decision record:
+
+```typescript
+interface DecisionRecord {
+  id: UUID;
+  promotionId: UUID;
+  decision: "allow" | "block";
+  decidedAt: DateTime;
+
+  
// Inputs + release: { + id: UUID; + name: string; + components: Array<{ name: string; digest: string }>; + }; + environment: { + id: UUID; + name: string; + }; + + // Gate results + gateResults: Array<{ + gateName: string; + gateType: string; + passed: boolean; + blocking: boolean; + message: string; + details: object; + evaluatedAt: DateTime; + }>; + + // Approvals + approvals: Array<{ + approverId: UUID; + approverName: string; + action: "approved" | "rejected"; + comment?: string; + timestamp: DateTime; + }>; + + // Context + requester: { + id: UUID; + name: string; + }; + requestReason: string; + + // Signature + contentHash: string; + signature: string; +} +``` + +## API Endpoints + +```yaml +# Request promotion +POST /api/v1/promotions +Body: { releaseId, targetEnvironmentId, reason? } +Response: Promotion + +# Approve/reject promotion +POST /api/v1/promotions/{id}/approve +POST /api/v1/promotions/{id}/reject +Body: { comment? } +Response: Promotion + +# Cancel promotion +POST /api/v1/promotions/{id}/cancel +Response: Promotion + +# Get decision record +GET /api/v1/promotions/{id}/decision +Response: DecisionRecord + +# Preview gates (dry run) +POST /api/v1/promotions/preview-gates +Body: { releaseId, targetEnvironmentId } +Response: { wouldPass: boolean, gates: GateResult[] } +``` + +## References + +- [Workflow Templates](templates.md) +- [Workflow Execution](execution.md) +- [Evidence Schema](../appendices/evidence-schema.md) diff --git a/docs/modules/release-orchestrator/workflow/templates.md b/docs/modules/release-orchestrator/workflow/templates.md new file mode 100644 index 000000000..9d8e13d7f --- /dev/null +++ b/docs/modules/release-orchestrator/workflow/templates.md @@ -0,0 +1,327 @@ +# Workflow Template Structure + +## Overview + +Workflow templates define the DAG (Directed Acyclic Graph) of steps to execute during deployment, promotion, and other automated processes. 
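To make the structure concrete before the field-by-field definitions, here is a minimal, hypothetical two-step template in the nodes-plus-edges shape this spec uses, with a small helper that derives each step's dependencies from the edge list (field names follow the spec; the values and helper are invented for illustration):

```python
# Hypothetical two-step template: a security gate that must pass before a deploy.
template = {
    "name": "standard-deploy",
    "nodes": [
        {"id": "security-gate", "type": "security-gate", "config": {}},
        {"id": "deploy-api", "type": "deploy-docker", "config": {"service": "api"}},
    ],
    "edges": [
        {"id": "e1", "from": "security-gate", "to": "deploy-api"},
    ],
}

def dependencies(template):
    """Map each node id to the set of node ids it waits on (incoming edges)."""
    deps = {node["id"]: set() for node in template["nodes"]}
    for edge in template["edges"]:
        deps[edge["to"]].add(edge["from"])
    return deps

print(dependencies(template))
# -> {'security-gate': set(), 'deploy-api': {'security-gate'}}
```

The executor's "ready" computation described in [Workflow Execution](execution.md) is exactly this dependency map checked against the set of completed nodes.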
+
+## Template Structure
+
+```typescript
+interface WorkflowTemplate {
+  id: UUID;
+  tenantId: UUID;
+  name: string;        // "standard-deploy"
+  displayName: string; // "Standard Deployment"
+  description: string;
+  version: number;     // Auto-incremented
+
+  // DAG structure
+  nodes: StepNode[];
+  edges: StepEdge[];
+
+  // I/O definitions
+  inputs: InputDefinition[];
+  outputs: OutputDefinition[];
+
+  // Metadata
+  tags: string[];
+  isBuiltin: boolean;
+  createdAt: DateTime;
+  createdBy: UUID;
+}
+```
+
+## Node Types
+
+### Step Node
+
+```typescript
+interface StepNode {
+  id: string;                      // Unique within template (e.g., "deploy-api")
+  type: string;                    // Step type from registry
+  name: string;                    // Display name
+  config: Record<string, unknown>; // Step-specific configuration
+  inputs: InputBinding[];          // Input value bindings
+  outputs: OutputBinding[];        // Output declarations
+  position: { x: number; y: number }; // UI position
+
+  // Execution settings
+  timeout: number;                 // Seconds (default from step type)
+  retryPolicy: RetryPolicy;
+  onFailure: FailureAction;
+  condition?: string;              // JS expression for conditional execution
+
+  // Documentation
+  description?: string;
+  documentation?: string;
+}
+
+type FailureAction = "fail" | "continue" | "rollback" | `goto:${string}`; // e.g., "goto:rollback-handler"
+
+interface RetryPolicy {
+  maxRetries: number;
+  backoffType: "fixed" | "exponential";
+  backoffSeconds: number;
+  retryableErrors: string[];
+}
+```
+
+### Input Bindings
+
+```typescript
+interface InputBinding {
+  name: string; // Input parameter name
+  source: InputSource;
+}
+
+type InputSource =
+  | { type: "literal"; value: any }
+  | { type: "context"; path: string } // e.g., "release.name"
+  | { type: "output"; nodeId: string; outputName: string }
+  | { type: "secret"; secretName: string }
+  | { type: "expression"; expression: string }; // JS expression
+```
+
+### Edge Types
+
+```typescript
+interface StepEdge {
+  id: string;
+  from: string; // Source node ID
+  to: string;   // Target node ID
+  condition?: string; 
// Optional condition expression + label?: string; // Display label for conditional edges +} +``` + +## Built-in Step Types + +### Control Steps + +| Type | Description | Config | +|------|-------------|--------| +| `approval` | Wait for human approval | `promotionId` | +| `wait` | Wait for specified duration | `durationSeconds` | +| `condition` | Branch based on condition | `expression` | +| `parallel` | Execute children in parallel | `maxConcurrency` | + +### Gate Steps + +| Type | Description | Config | +|------|-------------|--------| +| `security-gate` | Evaluate security policy | `blockOnCritical`, `blockOnHigh` | +| `custom-gate` | Custom OPA policy evaluation | `policyName` | +| `freeze-check` | Check freeze windows | - | +| `approval-check` | Check approval status | `requiredCount` | + +### Deploy Steps + +| Type | Description | Config | +|------|-------------|--------| +| `deploy-docker` | Deploy single container | `containerName`, `strategy` | +| `deploy-compose` | Deploy Docker Compose stack | `composePath`, `strategy` | +| `deploy-ecs` | Deploy to AWS ECS | `cluster`, `service` | +| `deploy-nomad` | Deploy to HashiCorp Nomad | `jobName` | + +### Verification Steps + +| Type | Description | Config | +|------|-------------|--------| +| `health-check` | HTTP/TCP health check | `type`, `path`, `expectedStatus` | +| `smoke-test` | Run smoke test suite | `testSuite`, `timeout` | +| `verify-digest` | Verify deployed digest | `expectedDigest` | + +### Integration Steps + +| Type | Description | Config | +|------|-------------|--------| +| `webhook` | Call external webhook | `url`, `method`, `headers` | +| `trigger-ci` | Trigger CI pipeline | `integrationId`, `pipelineId` | +| `wait-ci` | Wait for CI pipeline | `runId`, `timeout` | + +### Notification Steps + +| Type | Description | Config | +|------|-------------|--------| +| `notify` | Send notification | `channel`, `template` | +| `slack` | Send Slack message | `channel`, `message` | +| `email` | Send email 
| `recipients`, `template` | + +### Recovery Steps + +| Type | Description | Config | +|------|-------------|--------| +| `rollback` | Rollback deployment | `strategy`, `targetReleaseId` | +| `execute-script` | Run recovery script | `scriptType`, `scriptRef` | + +## Template Example: Standard Deployment + +```json +{ + "id": "template-standard-deploy", + "name": "standard-deploy", + "displayName": "Standard Deployment", + "version": 1, + "inputs": [ + { "name": "releaseId", "type": "uuid", "required": true }, + { "name": "environmentId", "type": "uuid", "required": true }, + { "name": "promotionId", "type": "uuid", "required": true } + ], + "nodes": [ + { + "id": "approval", + "type": "approval", + "name": "Approval Gate", + "config": {}, + "inputs": [ + { "name": "promotionId", "source": { "type": "context", "path": "promotionId" } } + ], + "position": { "x": 100, "y": 100 } + }, + { + "id": "security-gate", + "type": "security-gate", + "name": "Security Verification", + "config": { + "blockOnCritical": true, + "blockOnHigh": true + }, + "inputs": [ + { "name": "releaseId", "source": { "type": "context", "path": "releaseId" } } + ], + "position": { "x": 100, "y": 200 } + }, + { + "id": "pre-deploy-hook", + "type": "execute-script", + "name": "Pre-Deploy Hook", + "config": { + "scriptType": "csharp", + "scriptRef": "hooks/pre-deploy.csx" + }, + "inputs": [ + { "name": "release", "source": { "type": "context", "path": "release" } }, + { "name": "environment", "source": { "type": "context", "path": "environment" } } + ], + "timeout": 300, + "onFailure": "fail", + "position": { "x": 100, "y": 300 } + }, + { + "id": "deploy-targets", + "type": "deploy-compose", + "name": "Deploy to Targets", + "config": { + "strategy": "rolling", + "parallelism": 2 + }, + "inputs": [ + { "name": "releaseId", "source": { "type": "context", "path": "releaseId" } }, + { "name": "environmentId", "source": { "type": "context", "path": "environmentId" } } + ], + "timeout": 600, + 
"retryPolicy": { + "maxRetries": 2, + "backoffType": "exponential", + "backoffSeconds": 30 + }, + "onFailure": "rollback", + "position": { "x": 100, "y": 400 } + }, + { + "id": "health-check", + "type": "health-check", + "name": "Health Verification", + "config": { + "type": "http", + "path": "/health", + "expectedStatus": 200, + "timeout": 30, + "retries": 5 + }, + "inputs": [ + { "name": "targets", "source": { "type": "output", "nodeId": "deploy-targets", "outputName": "deployedTargets" } } + ], + "onFailure": "rollback", + "position": { "x": 100, "y": 500 } + }, + { + "id": "post-deploy-hook", + "type": "execute-script", + "name": "Post-Deploy Hook", + "config": { + "scriptType": "bash", + "inline": "echo 'Deployment complete'" + }, + "timeout": 300, + "onFailure": "continue", + "position": { "x": 100, "y": 600 } + }, + { + "id": "notify-success", + "type": "notify", + "name": "Success Notification", + "config": { + "channel": "slack", + "template": "deployment-success" + }, + "inputs": [ + { "name": "release", "source": { "type": "context", "path": "release" } }, + { "name": "environment", "source": { "type": "context", "path": "environment" } } + ], + "onFailure": "continue", + "position": { "x": 100, "y": 700 } + }, + { + "id": "rollback-handler", + "type": "rollback", + "name": "Rollback Handler", + "config": { + "strategy": "to-previous" + }, + "inputs": [ + { "name": "deploymentJobId", "source": { "type": "output", "nodeId": "deploy-targets", "outputName": "jobId" } } + ], + "position": { "x": 300, "y": 450 } + }, + { + "id": "notify-failure", + "type": "notify", + "name": "Failure Notification", + "config": { + "channel": "slack", + "template": "deployment-failure" + }, + "onFailure": "continue", + "position": { "x": 300, "y": 550 } + } + ], + "edges": [ + { "id": "e1", "from": "approval", "to": "security-gate" }, + { "id": "e2", "from": "security-gate", "to": "pre-deploy-hook" }, + { "id": "e3", "from": "pre-deploy-hook", "to": "deploy-targets" }, + { 
"id": "e4", "from": "deploy-targets", "to": "health-check" }, + { "id": "e5", "from": "health-check", "to": "post-deploy-hook" }, + { "id": "e6", "from": "post-deploy-hook", "to": "notify-success" }, + { "id": "e7", "from": "deploy-targets", "to": "rollback-handler", "condition": "status === 'failed'" }, + { "id": "e8", "from": "health-check", "to": "rollback-handler", "condition": "status === 'failed'" }, + { "id": "e9", "from": "rollback-handler", "to": "notify-failure" } + ] +} +``` + +## Template Validation + +Templates are validated for: + +1. **Structural validity**: Valid JSON/YAML, required fields present +2. **DAG validity**: No cycles, all edges reference valid nodes +3. **Type validity**: All step types exist in registry +4. **Schema validity**: Step configs match type schemas +5. **Input validity**: All required inputs are bindable + +## References + +- [Workflow Engine](../modules/workflow-engine.md) +- [Execution State Machine](execution.md) +- [Step Registry](../modules/workflow-engine.md#module-step-registry) diff --git a/docs/overview.md b/docs/overview.md index 6fdce8e04..e1f5b2063 100644 --- a/docs/overview.md +++ b/docs/overview.md @@ -1,39 +1,75 @@ -# Stella Ops – 2‑Minute Overview +# Stella Ops Suite — 2-Minute Overview -## The Problem We Solve +## What Stella Ops Suite Is -- **Supply-chain attacks exploded 742 % in three years;** regulated teams still need to scan hundreds of containers a day while disconnected from the public Internet. -- **Existing scanners trade freedom for SaaS:** no offline feeds, hidden quotas, noisy results that lack exploitability context. -- **Audit fatigue is real:** Policy decisions are opaque, replaying scans is guesswork, and trust hinges on external transparency logs you do not control. 
+**Stella Ops Suite is a centralized, auditable release control plane for non-Kubernetes container estates.** -## The Promise +It sits between your CI and your runtime targets, governs promotion across environments, enforces security and policy gates, and produces verifiable evidence for every release decision—while remaining plug-in friendly to any SCM/CI/registry/secrets stack. -Stella Ops delivers **deterministic, sovereign container security** that works the same online or fully air-gapped: +## The Problems We Solve -1. **Deterministic replay manifests** (SRM) prove every scan result, so auditors can rerun evidence and see the exact same outcome. -2. **Lattice policy engine + OpenVEX** keeps findings explainable; exploitability, attestation, and waivers merge into one verdict. -3. **Sovereign crypto profiles** let you anchor signatures to eIDAS, FIPS, GOST, or SM roots, mirror your feeds, and keep Sigstore-compatible transparency logs offline. +- **Release governance is fragmented:** CI tools run pipelines but lack central release authority; deployment tools promote but bolt on security as an afterthought. +- **Non-Kubernetes targets are second-class:** Docker hosts, Compose, ECS, and Nomad deployments lack the GitOps tooling that Kubernetes enjoys. +- **Security blocks releases without explanation:** Scanners find vulnerabilities but don't integrate with promotion workflows; teams bypass gates or ignore findings. +- **Audit trails are scattered:** Release decisions live in CI logs, approval emails, and Slack threads—not in a unified, cryptographically verifiable ledger. +- **Pricing punishes automation:** Per-project, per-seat, or per-deployment billing creates friction for teams that deploy frequently. 
-## Core Capability Clusters +## What Stella Ops Suite Does -| Cluster | What you get | Why it matters | -|---------|--------------|----------------| -| **SBOM-first scanning** | Delta-layer SBOM cache, sub‑5 s warm scans, Trivy/CycloneDX/SPDX ingestion + dependency cartographing | Speeds repeat scans 10× and keeps SBOMs the source of truth | -| **Explainable policy** | OpenVEX + lattice logic, policy engine for custom rule packs, waiver expirations | Reduces alert fatigue, supports alert muting beyond VEX, and shows why a finding blocks deploy | -| **Attestation & provenance** | DSSE bundles, optional Rekor mirror, DSSE → CLI/UI exports | Lets you prove integrity without relying on external services | -| **Offline operations** | Offline Update Kit bundles, mirrored feeds, quota tokens verified locally | Works for sovereign clouds, SCIFs, and heavily regulated sectors | -| **Governance & observability** | Structured audit trails, quota transparency, per-tenant metrics | Keeps compliance teams and operators in sync without extra tooling | +| Capability | Description | +|------------|-------------| +| **Release orchestration** | UI-driven promotion (Dev → Stage → Prod), approvals, policy gates, rollbacks; steps are hook-able with scripts and step providers | +| **Security decisioning as a gate** | Scan on build, evaluate on release, re-evaluate when vulnerability intelligence updates—without forcing re-scans | +| **OCI-digest-first releases** | A release is an immutable digest (or bundle of digests); track "what is deployed where" with integrity | +| **Toolchain-agnostic integrations** | Plug into any SCM, any CI, any registry, any secrets system; customers reuse their existing stack | +| **Auditability + standards** | Audit log + evidence packets (exportable), SBOM/VEX/attestation-friendly, standards-first approach | + +## Core Strengths + +| Strength | Why It Matters | +|----------|----------------| +| **Non-Kubernetes specialization** | Docker hosts, Compose, ECS, 
Nomad-style targets are first-class, not an afterthought | +| **Reproducibility** | Deterministic release decisions captured as evidence (inputs + policy hash + verdict + approvals) | +| **Attestability** | Produces and verifies release evidence/attestations (provenance, SBOM linkage, decision records) in standard formats | +| **Verity (integrity)** | Digest-based release identity; signature/provenance verification; tamper-evident audit trail | +| **Hybrid reachability** | Reachability-aware vulnerability prioritization (static + runtime signals) to reduce noise and focus on exploitable paths | +| **Cost that doesn't punish automation** | No per-project tax, no per-seat tax, no "deployments bill." Limits are only: (1) number of environments and (2) number of new digests analyzed per day | ## Who Benefits -| Persona | Outcome in week one | -|---------|--------------------| -| **Security engineering** | Deterministic replay + explain traces | cuts review time, keeps waivers honest | -| **Platform / SRE** | Fast scans, local registry, no Internet dependency | fits pipelines and air-gapped staging | -| **Compliance & risk** | Signed SBOMs, provable quotas, legal/attestation docs | supports audits without custom tooling | +| Persona | Outcome | +|---------|---------| +| **Release managers** | Central control plane for promotions; clear approval workflows; audit-ready evidence | +| **Security engineering** | Security gates integrated into release flow; reachability-aware prioritization; VEX support | +| **Platform / SRE** | Deploy to Docker/Compose/ECS/Nomad with agents or agentless; rollback with confidence | +| **Compliance & risk** | Every release decision is cryptographically signed and replayable; export compliance reports | +| **DevOps / CI owners** | Integrate via webhooks; keep existing CI/SCM/registry; add release governance without replacing tools | + +## Platform Capabilities + +### Operational Today + +- **Vulnerability scanning** with SBOM-first approach and 
delta-layer caching +- **Advisory ingestion** from multiple sources with aggregation-not-merge semantics +- **VEX support** for exploitability decisioning (OpenVEX + SPDX 3.0.1 relationships) +- **Policy engine** with lattice logic for explainable, deterministic verdicts +- **Attestation and signing** (DSSE/in-toto) with optional Sigstore Rekor transparency +- **Offline operations** via Offline Kit bundles for air-gapped deployments +- **Sovereign crypto profiles** (eIDAS, FIPS, GOST, SM) + +### Planned (Release Orchestration) + +- **Environment management** — Define Dev/Stage/Prod environments with freeze windows and approval policies +- **Release bundles** — Compose releases from component digests with semantic versioning +- **Promotion workflows** — DAG-based workflow engine with approvals, gates, and hooks +- **Deployment execution** — Agents for Docker, Compose, ECS, Nomad; agentless via SSH/WinRM +- **Progressive delivery** — A/B releases, canary deployments, traffic routing +- **Plugin system** — Three-surface plugin model for integrations, steps, and agents +- **Version stickers** — Tamper-evident deployment records on targets for drift detection ## Where to Go Next -- Ready to pull the containers? Head to [quickstart.md](quickstart.md). -- Want the capability detail? Browse the five cards in [key-features.md](key-features.md). -- Need to evaluate fit and build a rollout plan? Grab the [evaluation checklist](onboarding/evaluation-checklist.md). +- Ready to try it? Head to [quickstart.md](quickstart.md) +- Want capability details? Browse [key-features.md](key-features.md) +- Understand the architecture? See [ARCHITECTURE_OVERVIEW.md](ARCHITECTURE_OVERVIEW.md) +- Review the roadmap? 
Check [ROADMAP.md](ROADMAP.md) diff --git a/docs/product/VISION.md b/docs/product/VISION.md index 0da50e8d8..c135b7c0f 100755 --- a/docs/product/VISION.md +++ b/docs/product/VISION.md @@ -1,409 +1,299 @@ -#  3 · Product Vision — **Stella Ops** +# Product Vision — Stella Ops Suite -> Stella Ops isn't just another scanner—it's a different product category: **deterministic, evidence-linked vulnerability decisions** that survive auditors, regulators, and supply-chain propagation. +> Stella Ops Suite isn't just another scanner or deployment tool—it's a different product category: **a centralized, auditable release control plane** that gates releases using reachability-aware security and produces verifiable evidence for every decision. ## 1) Problem Statement & Goals -We ship containers. We need: -- **Authenticity & integrity** of build artifacts and metadata. -- **Provenance** attached to artifacts, not platforms. -- **Transparency** to detect tampering and retroactive edits. -- **Determinism & explainability** so scanner judgments can be replayed and justified. -- **Actionability** to separate theoretical from exploitable risk (VEX). -- **Minimal trust** across multi‑tenant and third‑party boundaries. +We ship containers to non-Kubernetes targets (Docker hosts, Compose, ECS, Nomad). We need: -**Non‑goals:** Building a new package manager, inventing new SBOM/attestation formats, or depending on closed standards. +- **Release governance** across environments (Dev → Stage → Prod) with approvals and audit trails. +- **Security as a gate, not a blocker** — integrate vulnerability decisions into promotion workflows. +- **Digest-based release identity** — immutable releases, not mutable tags. +- **Toolchain flexibility** — plug into any SCM, CI, registry, and secrets system. +- **Determinism & explainability** — release decisions can be replayed and justified. +- **Evidence packets** — every release decision links to concrete artifacts. 
+- **Non-Kubernetes first-class support** — Docker hosts, Compose, ECS, Nomad are not afterthoughts. +- **Pricing that doesn't punish automation** — no per-project, per-seat, or per-deployment taxes. + +**Non-goals:** Replacing CI systems, building Kubernetes deployments (use ArgoCD/Flux), or inventing new SBOM/attestation formats. --- -## 2) Golden Path (Minimal End‑to‑End Flow) +## 2) Golden Path (Release-Centric Flow) + +``` +Build → Scan → Create Release → Request Promotion → Gate Evaluation → Deploy → Evidence +``` ```mermaid flowchart LR - A[Source / Image / Rootfs] --> B[SBOM Producer\nCycloneDX 1.7] - B --> C[Signer\nin‑toto Attestation + DSSE] - C --> D[Transparency\nSigstore Rekor - optional but RECOMMENDED] - D --> E[Durable Storage\nSBOMs, Attestations, Proofs] - E --> F[Scanner\nPkg analyzers + Entry‑trace + Layer cache] - F --> G[VEX Authoring\nOpenVEX + SPDX 3.0.1 relationships] - G --> H[Policy Gate\nOPA/Rego: allow/deny + waivers] - H --> I[Artifacts Store\nReports, SARIF, VEX, Audit log] -```` + A[CI Build] --> B[OCI Registry\nPush by digest] + B --> C[Stella Scan\nSBOM + Vuln Analysis] + C --> D[Create Release\nDigest bundle] + D --> E[Request Promotion\nDev → Stage → Prod] + E --> F[Gate Evaluation\nSecurity + Policy + Approval] + F --> G{Decision} + G -->|Allow| H[Deploy to Targets\nDocker/Compose/ECS/Nomad] + G -->|Deny| I[Block with Explanation] + H --> J[Evidence Packet\nSigned + Stored] +``` -**Adopted standards (pinned for interoperability):** +### What Stella Ops Suite Does -* **SBOM:** CycloneDX **1.7** (JSON/XML; 1.6 accepted for ingest) -* **Attestation & signing:** **in‑toto Attestations** (Statement + Predicate) in **DSSE** envelopes -* **Transparency:** **Sigstore Rekor** (inclusion proofs, monitoring) -* **Exploitability:** **OpenVEX** (statuses & justifications) -* **Modeling & interop:** **SPDX 3.0.1** (relationships / VEX modeling) -* **Findings interchange (optional):** SARIF for analyzer output - -> Pinnings are *policy*, not 
claims about “latest”. We may update pins via normal change control. +| Capability | Description | +|------------|-------------| +| **Release orchestration** | UI-driven promotion (Dev → Stage → Prod), approvals, policy gates, rollbacks; steps are hook-able with scripts | +| **Security decisioning as a gate** | Scan on build, evaluate on release, re-evaluate on CVE updates—without forcing re-scans | +| **OCI-digest-first releases** | A release is an immutable digest (or bundle of digests); track "what is deployed where" | +| **Toolchain-agnostic integrations** | Plug into any SCM, any CI, any registry, any secrets system | +| **Auditability + standards** | Evidence packets (exportable), SBOM/VEX/attestation support, deterministic replay | --- -## 3) Security Invariants (What MUST Always Hold) +## 3) Design Principles & Invariants -1. **Artifact identity is content‑addressed.** +These principles are **inviolable** and MUST be reflected in all code, UI, documentation, and audit artifacts. + +### Principle 1: Release Identity via Digest + +``` +INVARIANT: A release is a set of OCI image digests (component → digest mapping), never tags. +``` + +- Tags are convenience inputs for resolution +- Tags are resolved to digests at release creation time +- All downstream operations (promotion, deployment, rollback) use digests +- Digest mismatch at pull time = deployment failure (tamper detection) + +### Principle 2: Determinism and Evidence + +``` +INVARIANT: Every deployment/promotion produces an immutable evidence record. 
+``` + +Evidence record contains: +- **Who**: User identity (from Authority) +- **What**: Release bundle (digests), target environment, target hosts +- **Why**: Policy evaluation result, approval records, decision reasons +- **How**: Generated artifacts (compose files, scripts), execution logs +- **When**: Timestamps for request, decision, execution, completion + +### Principle 3: Pluggable Everything, Stable Core + +``` +INVARIANT: Integrations are plugins; the core orchestration engine is stable. +``` + +Plugins contribute: +- Configuration screens (UI) +- Connector logic (runtime) +- Step node types (workflow) +- Doctor checks (diagnostics) +- Agent types (deployment) + +Core engine provides: +- Workflow execution (DAG processing) +- State machine management +- Evidence generation +- Policy evaluation +- Credential brokering + +### Principle 4: No Feature Gating + +``` +INVARIANT: All plans include all features. Limits are only: +- Number of environments +- Number of new digests analyzed per day +- Fair use on deployments +``` + +### Principle 5: Offline-First Operation + +``` +INVARIANT: All core operations MUST work in air-gapped environments. +``` + +- No runtime calls to external APIs for core decisions +- Vulnerability data synced via mirror bundles +- Plugins may require connectivity; core does not +- Evidence packets exportable for external audit + +### Principle 6: Immutable Generated Artifacts + +``` +INVARIANT: Every deployment generates and stores immutable artifacts. +``` + +Generated artifacts: +- `compose.stella.lock.yml`: Pinned digests, resolved env refs +- `deploy.stella.script.dll`: Compiled C# script (or hash reference) +- `release.evidence.json`: Decision record +- `stella.version.json`: Version sticker placed on target + +--- + +## 4) Security Invariants (Scanning & Attestation) + +These invariants from our scanning heritage remain core to the security gate: + +1. 
**Artifact identity is content-addressed.** + - All identities are SHA-256 digests of immutable blobs. - * All identities are SHA‑256 digests of immutable blobs (images, SBOMs, attestations). 2. **Every SBOM is signed.** + - SBOMs MUST be wrapped in **in-toto DSSE** attestations tied to the container digest. - * SBOMs MUST be wrapped in **in‑toto DSSE** attestations tied to the container digest. 3. **Provenance is attached, not implied.** + - Build metadata (who/where/how) MUST ride as attestations linked by digest. - * Build metadata (who/where/how) MUST ride as attestations linked by digest. 4. **Transparency FIRST mindset.** + - Signatures/attestations SHOULD be logged to **Rekor** and store inclusion proofs. - * Signatures/attestations SHOULD be logged to **Rekor** and store inclusion proofs. 5. **Determinism & replay.** + - Scans MUST be reproducible given: input digests, scanner version, DB snapshot, and config. - * Scans MUST be reproducible given: input digests, scanner version, DB snapshot, and config. 6. **Explainability.** + - Findings MUST show the *why*: package → file path → call-stack / entrypoint (when available). - * Findings MUST show the *why*: package → file path → call‑stack / entrypoint (when available). 7. **Exploitability over enumeration.** + - Risk MUST be communicated via **VEX** (OpenVEX), including **under_investigation** where appropriate. - * Risk MUST be communicated via **VEX** (OpenVEX), including **under_investigation** where appropriate. -8. **Least privilege & minimal trust.** - - * Build keys are short‑lived; scanners run on ephemeral, least‑privileged workers. -9. **Air‑gap friendly.** - - * Mirrors for vuln DBs and containers; all verification MUST work without public egress. -10. **No hidden blockers.** - -* Policy gates MUST be code‑reviewable (e.g., Rego) and auditable; waivers are attestations, not emails. +8. **Air-gap friendly.** + - Mirrors for vuln DBs and containers; all verification MUST work without public egress. 
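The determinism and content-addressing invariants above can be made concrete by hashing the pinned scan inputs into a single content-addressed identity: same digests, scanner version, DB snapshot, and config always yield the same identity. A minimal sketch; the `ScanInputs` field names are illustrative assumptions, not the platform's actual schema:

```typescript
import { createHash } from "node:crypto";

// Illustrative pinned inputs for a reproducible scan (invariants 1 and 5).
interface ScanInputs {
  imageDigest: string;    // e.g., "sha256:…"
  scannerVersion: string;
  dbSnapshot: string;     // vuln DB snapshot date
  configHash: string;
}

// Canonicalize by sorting keys so the same inputs always hash identically,
// regardless of the order the fields were assembled in.
function scanIdentity(inputs: ScanInputs): string {
  const canonical = JSON.stringify(
    Object.fromEntries(Object.entries(inputs).sort(([a], [b]) => a.localeCompare(b)))
  );
  return "sha256:" + createHash("sha256").update(canonical).digest("hex");
}
```

Any change to a pinned input produces a different identity, which is what lets an auditor detect that a "replayed" scan did not actually run against the same inputs.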
--- -## 4) Trust Boundaries & Roles +## 5) Adopted Standards - CI - CI -->|image digest| REG - REG -->|pull by digest| SB - SB --> AT --> TR --> REK - AT --> ST - REK --> ST - ST --> SCN --> POL --> ST - -``` --> - -* **Build/CI:** Holds signing capability (short‑lived keys or keyless signing). -* **Registry:** Source of truth for image bytes; access via digest only. -* **Scanner Pool:** Ephemeral nodes; content‑addressed caches; no shared mutable state. -* **Artifacts Store:** Immutable, WORM‑like storage for SBOMs, attestations, proofs, SARIF, VEX. +| Domain | Standard | Stella Pin | Notes | +|--------|----------|------------|-------| +| **SBOM** | CycloneDX | **1.7** | JSON or XML; 1.6 ingest supported | +| **Attestation** | in-toto | **Statement v1** | Predicates per use case | +| **Envelope** | DSSE | **v1** | Canonical JSON payloads | +| **Transparency** | Sigstore Rekor | **API stable** | Inclusion proof stored alongside artifacts | +| **VEX** | OpenVEX | **spec current** | Map to SPDX 3.0.1 relationships as needed | +| **Interop** | SPDX | **3.0.1** | Use for modeling & cross-ecosystem exchange | +| **Findings** | SARIF | **2.1.0** | Optional but recommended | --- -## 5) Data & Evidence We Persist +## 6) Competitive Positioning -| Artifact | MUST Persist | Why | -| -------------------- | ------------------------------------ | ---------------------------- | -| SBOM (CycloneDX 1.7) | Raw file + DSSE attestation | Reproducibility, audit | -| in‑toto Statement | Full JSON | Traceability | -| Rekor entry | UUID + inclusion proof | Tamper‑evidence | -| Scanner output | SARIF + raw notes | Triage & tooling interop | -| VEX | OpenVEX + links to findings | Noise reduction & compliance | -| Policy decisions | Input set + decision + rule versions | Governance & forensics | +### Why Stella Wins (One Line Each) -Retention follows our Compliance policy; default **≥ 18 months**. 
+- **CI/CD tools** (Actions/Jenkins/GitLab CI): great at running pipelines, weak at being a central release authority with audit-grade evidence. +- **CD orchestrators** (Octopus/Harness/Spinnaker): strong promotions, but security is bolt-on and pricing scales poorly. +- **Registries** (Harbor/JFrog): can store and scan, but don't provide release governance. +- **Scanners/CNAPP** (Trivy/Snyk/Aqua): scan well, but don't provide release orchestration. + +### Core Differentiators (Moats) + +1. **Non-Kubernetes Specialization** — Docker hosts, Compose, ECS, Nomad are first-class, not afterthoughts. + +2. **Signed Reachability** — Every reachability graph is sealed with DSSE; optional edge-bundle attestations for runtime/init/contested paths. + +3. **Deterministic Replay** — Scans and release decisions run bit-for-bit identical from frozen feeds and manifests. + +4. **Evidence-Linked Decisions** — Every gate evaluation produces a signed decision record with evidence refs. + +5. **Sovereign + Offline Operation** — FIPS/eIDAS/GOST/SM/PQC profiles and offline mirrors as first-class toggles. + +6. **Cost Model** — No per-seat, per-project, or per-deployment tax. Limits are environments + new digests/day. --- -## 6) Scanner Requirements (Determinism & Explainability) +## 7) Release Orchestration Architecture (Planned) -* **Inputs pinned:** image digest(s), SBOM(s), scanner version, vuln DB snapshot date, config hash. -* **Explainability:** show file paths, package coords (e.g., purl), and—when possible—**entry‑trace/call‑stack** from executable entrypoints to vulnerable symbol(s). -* **Caching:** content‑addressed per‑layer & per‑ecosystem caches; warming does not change decisions. -* **Unknowns:** output **under_investigation** where exploitability is not yet known; roll into VEX. -* **Interchange:** emit **SARIF** for IDE and pipeline consumption (optional but recommended). 
+### New Themes + +| Theme | Purpose | Key Modules | +|-------|---------|-------------| +| **INTHUB** | Integration hub | Integration Manager, Connection Profiles, Connector Runtime | +| **ENVMGR** | Environment management | Environment Manager, Target Registry, Agent Manager | +| **RELMAN** | Release management | Component Registry, Version Manager, Release Manager | +| **WORKFL** | Workflow engine | Workflow Designer, Workflow Engine, Step Executor | +| **PROMOT** | Promotion and approval | Promotion Manager, Approval Gateway, Decision Engine | +| **DEPLOY** | Deployment execution | Deploy Orchestrator, Target Executor, Artifact Generator | +| **AGENTS** | Deployment agents | Agent Core, Docker/Compose/ECS/Nomad agents | +| **PROGDL** | Progressive delivery | A/B Manager, Traffic Router, Canary Controller | +| **RELEVI** | Release evidence | Evidence Collector, Sticker Writer, Audit Exporter | +| **PLUGIN** | Plugin infrastructure | Plugin Registry, Plugin Loader, Plugin SDK | + +### Key Data Entities + +- **Environment**: Dev/Stage/Prod with freeze windows, approval policies +- **Target**: Deployment destination (Docker host, Compose host, ECS service, Nomad job) +- **Agent**: Deployment executor with capabilities and heartbeat +- **Component**: Logical service mapped to image repository +- **Release**: Bundle of component digests with semantic version +- **Promotion**: Request to move release between environments +- **Workflow**: DAG of steps for deployment execution +- **Evidence Packet**: Signed bundle of decision inputs and outputs --- -## 7) Policy Gate (OPA/Rego) — Examples +## 8) Existing Capabilities (Operational) -> Gate runs after scan + VEX merge. It treats VEX as first‑class input. 
+These themes power the security gate within release orchestration: -### 7.1 Deny unreconciled criticals that are exploitable - -```rego -package stella.policy - -default allow := false - -exploitable(v) { - v.severity == "CRITICAL" - v.exploitability == "affected" -} - -allow { - not exploitable_some -} - -exploitable_some { - some v in input.findings - exploitable(v) - not waived(v.id) -} - -waived(id) { - some w in input.vex - w.vuln_id == id - w.status == "not_affected" - w.justification != "" -} -``` - -### 7.2 Require Rekor inclusion for attestations - -```rego -package stella.policy - -violation[msg] { - some a in input.attestations - not a.rekor.inclusion_proof - msg := sprintf("Attestation %s lacks Rekor inclusion proof", [a.id]) -} -``` +| Theme | Purpose | Key Modules | +|-------|---------|-------------| +| **INGEST** | Advisory ingestion | Concelier, Advisory-AI | +| **VEXOPS** | VEX document handling | Excititor, VEX Lens, VEX Hub | +| **REASON** | Policy and decisioning | Policy Engine, OPA Runtime | +| **SCANENG** | Scanning and SBOM | Scanner, SBOM Service, Reachability | +| **EVIDENCE** | Evidence and attestation | Evidence Locker, Attestor, Export Center | +| **RUNTIME** | Runtime signals | Signals, Graph, Zastava | +| **JOBCTRL** | Job orchestration | Scheduler, Orchestrator, TaskRunner | +| **OBSERVE** | Observability | Notifier, Telemetry | +| **REPLAY** | Deterministic replay | Replay Engine | +| **DEVEXP** | Developer experience | CLI, Web UI, SDK | --- -## 8) Version Pins & Compatibility +## 9) Pricing Model -| Domain | Standard | Stella Pin | Notes | -| ------------ | -------------- | ---------------- | ------------------------------------------------ | -| SBOM | CycloneDX | **1.7** | JSON or XML accepted; 1.6 ingest supported | -| Attestation | in‑toto | **Statement v1** | Predicates per use case (e.g., sbom, provenance) | -| Envelope | DSSE | **v1** | Canonical JSON payloads | -| Transparency | Sigstore Rekor | **API stable** | Inclusion 
proof stored alongside artifacts | -| VEX | OpenVEX | **spec current** | Map to SPDX 3.0.1 relationships as needed | -| Interop | SPDX | **3.0.1** | Use for modeling & cross‑ecosystem exchange | -| Findings | SARIF | **2.1.0** | Optional but recommended | +**Principle:** Pay for scale, not for features or automation. + +| Plan | Price | Environments | New Digests/Day | Notes | +|------|-------|--------------|-----------------|-------| +| **Free** | $0/month | 3 | 333 | Full features, unlimited deployments (fair use) | +| **Pro** | $699/month | 33 | 3,333 | Same features | +| **Enterprise** | $1,999/month | Unlimited | Unlimited | Fair use on mirroring/audit bandwidth | --- -## 9) Minimal CLI Playbook (Illustrative) +## 10) Implementation Roadmap (Planned) -> Commands below are illustrative; wire them into CI with short‑lived credentials. - -```bash -# 1) Produce SBOM (CycloneDX 1.7) from image digest -syft registry:5000/myimg@sha256:... -o cyclonedx-json > sbom.cdx.json - -# 2) Create in‑toto DSSE attestation bound to the image digest -cosign attest --predicate sbom.cdx.json \ - --type https://stella-ops.org/attestations/sbom/1 \ - --key env://COSIGN_KEY \ - registry:5000/myimg@sha256:... - -# 3) (Optional but recommended) Rekor transparency -cosign sign --key env://COSIGN_KEY registry:5000/myimg@sha256:... -cosign verify-attestation --type ... --certificate-oidc-issuer https://token.actions... registry:5000/myimg@sha256:... > rekor-proof.json - -# 4) Scan (pinned DB snapshot) -stella-scan --image registry:5000/myimg@sha256:... 
\ - --sbom sbom.cdx.json \ - --db-snapshot 2025-10-01 \ - --out findings.sarif - -# 5) Emit VEX -stella-vex --from findings.sarif --policy vex-policy.yaml --out vex.json - -# 6) Gate -opa eval -i gate-input.json -d policy/ -f pretty "data.stella.policy.allow" -``` +| Phase | Focus | Key Deliverables | +|-------|-------|------------------| +| **Phase 1** | Foundation | Environment management, integration hub, release bundles | +| **Phase 2** | Workflow Engine | DAG execution, step registry, workflow templates | +| **Phase 3** | Promotion & Decision | Approval gateway, security gates, decision records | +| **Phase 4** | Deployment Execution | Docker/Compose agents, artifact generation, rollback | +| **Phase 5** | UI & Polish | Release dashboard, promotion UI, environment management | +| **Phase 6** | Progressive Delivery | A/B releases, canary, traffic routing | +| **Phase 7** | Extended Targets | ECS, Nomad, SSH/WinRM agentless | +| **Phase 8** | Plugin Ecosystem | Full plugin system, marketplace | --- -## 10) JSON Skeletons (Copy‑Ready) +## 11) Change Log -### 10.1 in‑toto Statement (DSSE payload) - -```json -{ - "_type": "https://in-toto.io/Statement/v1", - "subject": [ - { - "name": "registry:5000/myimg", - "digest": { "sha256": "IMAGE_DIGEST_SHA256" } - } - ], - "predicateType": "https://stella-ops.org/attestations/sbom/1", - "predicate": { - "sbomFormat": "CycloneDX", - "sbomVersion": "1.7", - "mediaType": "application/vnd.cyclonedx+json", - "location": "sha256:SBOM_BLOB_SHA256" - } -} -``` - -### 10.2 DSSE Envelope (wrapping the Statement) - -```json -{ - "payloadType": "application/vnd.in-toto+json", - "payload": "BASE64URL_OF_CANONICAL_STATEMENT_JSON", - "signatures": [ - { - "keyid": "KEY_ID_OR_CERT_ID", - "sig": "BASE64URL_SIGNATURE" - } - ] -} -``` - -### 10.3 OpenVEX (compact) - -```json -{ - "@context": "https://openvex.dev/ns/v0.2.0", - "author": "Stella Ops Security", - "timestamp": "2025-10-29T00:00:00Z", - "statements": [ - { - "vulnerability": 
"CVE-2025-0001", - "products": ["pkg:purl/example@1.2.3?arch=amd64"], - "status": "under_investigation", - "justification": "analysis_ongoing", - "timestamp": "2025-10-29T00:00:00Z" - } - ] -} -``` +| Version | Date | Note | +|---------|------|------| +| v2.0 | 09-Jan-2026 | Major revision: pivot to release control plane; scanning becomes gate | +| v1.4 | 29-Oct-2025 | Initial principles, golden path, policy examples, JSON skeletons | +| v1.3 | 12-Jul-2025 | Expanded ecosystem pillar, added metrics/integrations | +| v1.2 | 11-Jul-2025 | Restructured to link with WHY; merged principles | +| v1.1 | 11-Jul-2025 | Original OSS-only vision | +| v1.0 | 09-Jul-2025 | First public draft | --- -## 11) Handling “Unknowns” & Noise +## References -* Use **OpenVEX** statuses: `affected`, `not_affected`, `fixed`, `under_investigation`. -* Prefer **justifications** over free‑text. -* Time‑bound **waivers** are modeled as VEX with `not_affected` + justification or `affected` + compensating controls. -* Dashboards MUST surface counts separately for `under_investigation` so risk is visible. - ---- - -## 12) Operational Guidance - -**Key management** - -* Use **ephemeral OIDC** or short‑lived keys (HSM/KMS bound). -* Rotate signer identities at least quarterly; no shared long‑term keys in CI. - -**Caching & performance** - -* Layer caches keyed by digest + analyzer version. -* Pre‑warm vuln DB snapshots; mirror into air‑gapped envs. - -**Multi‑tenancy** - -* Strict tenant isolation for storage and compute. -* Rate‑limit and bound memory/CPU per scan job. - -**Auditing** - -* Every decision is a record: inputs, versions, rule commit, actor, result. -* Preserve Rekor inclusion proofs with the attestation record. - ---- - -## 13) Exceptions Process (Break‑glass) - -1. Open a tracked exception with: artifact digest, CVE(s), business justification, expiry. -2. Generate VEX entry reflecting the exception (`not_affected` with justification or `affected` with compensating controls). -3. 
Merge into policy inputs; **policy MUST read VEX**, not tickets. -4. Re‑review before expiry; exceptions cannot auto‑renew. - ---- - -## 14) Threat Model (Abbreviated) - -* **Tampering**: modified SBOMs/attestations → mitigated by DSSE + Rekor + WORM storage. -* **Confused deputy**: scanning a different image → mitigated by digest‑only pulls and subject digests in attestations. -* **TOCTOU / re‑tagging**: registry tags drift → mitigated by digest pinning everywhere. -* **Scanner poisoning**: unpinned DBs → mitigated by snapshotting and recording version/date. -* **Key compromise**: long‑lived CI keys → mitigated by OIDC keyless or short‑lived KMS keys. - ---- - -## 15) Implementation Checklist - -* [ ] SBOM producer emits CycloneDX 1.7; bound to image digest. -* [ ] in‑toto+DSSE signing wired in CI; Rekor logging enabled. -* [ ] Durable artifact store with WORM semantics. -* [ ] Scanner produces explainable findings; SARIF optional. -* [ ] OpenVEX emitted and archived; linked to findings & image. -* [ ] Policy gate enforced; waivers modeled as VEX; decisions logged. -* [ ] Air‑gap mirrors for registry and vuln DBs. -* [ ] Runbooks for key rotation, Rekor outage, and database rollback. - ---- - -## 16) Glossary - -* **SBOM**: Software Bill of Materials describing packages/components within an artifact. -* **Attestation**: Signed statement binding facts (predicate) to a subject (artifact) using in‑toto. -* **DSSE**: Envelope that signs the canonical payload detached from transport. -* **Transparency Log**: Append‑only log (e.g., Rekor) giving inclusion and temporal proofs. -* **VEX**: Vulnerability Exploitability eXchange expressing exploitability status & justification. - ---- - -## 9) Moats - - -**Four capabilities no competitor offers together:** - -1. **Signed Reachability** – Every reachability graph is sealed with DSSE; optional edge-bundle attestations for runtime/init/contested paths. -2. 
**Deterministic Replay** – Scans run bit-for-bit identical from frozen feeds and analyzer manifests. -3. **Explainable Policy (Lattice VEX)** – Evidence-linked VEX decisions with explicit "Unknown" state handling. -4. **Sovereign + Offline Operation** – FIPS/eIDAS/GOST/SM/PQC profiles and offline mirrors as first-class toggles. - -**Decision Capsules:** Every scan result is sealed in a Decision Capsule—a content-addressed bundle containing exact SBOM, vuln feed snapshots, reachability evidence, policy version, derived VEX, and signatures. Auditors can re-run any capsule bit-for-bit to verify the outcome. - -**Additional moat details:** -- **Deterministic replay:** Hash-stable scans with frozen feeds and analyzer manifests; replay packs verifiable offline. -- **Hybrid reachability attestations:** Graph-level DSSE always; selective edge-bundle DSSE for runtime/init/contested edges with Rekor caps. Both static call-graph edges and runtime-derived edges can be attested. -- **Lattice VEX engine (Evidence-Linked):** Trust algebra across advisories, runtime, reachability, waivers; explainable paths with proof-linked decisions. Unlike yes/no approaches, explicit "Unknown" state handling ensures incomplete data never leads to false safety. -- **Crypto sovereignty:** FIPS/eIDAS/GOST/SM/PQC profiles and offline mirrors as first-class configuration. -- **Proof graph:** DSSE + Rekor spanning SBOM, call-graph, VEX, Decision Capsules, replay manifests for chain-of-custody evidence. -- **VEX Propagation:** Generate vulnerability status attestations downstream consumers can automatically trust and ingest—scalable VEX sharing across the supply chain. - -See also: `docs/product/competitive-landscape.md` for vendor comparison and talking points. 
- ---- - - -## 8 · Change Log - -| Version | Date | Note (high‑level) | -| ------- | ----------- | ----------------------------------------------------------------------------------------------------- | -| v1.4 | 29-Oct-2025 | Initial principles, golden path, policy examples, and JSON skeletons. | -| v1.4 | 14‑Jul‑2025 | First public revision reflecting quarterly roadmap & KPI baseline. | -| v1.3 | 12‑Jul‑2025 | Expanded ecosystem pillar, added metrics/integrations, refined non-goals, community persona/feedback. | -| v1.2 | 11‑Jul‑2025 | Restructured to link with WHY; merged principles into Strategic Pillars; added review §7 | -| v1.1 | 11‑Jul‑2025 | Original OSS‑only vision | -| v1.0 | 09‑Jul‑2025 | First public draft | - -*(End of Product Vision v1.3)* +- [Overview](../overview.md) — 2-minute product summary +- [Architecture Overview](../ARCHITECTURE_OVERVIEW.md) — High-level architecture +- [Release Orchestrator Architecture](../modules/release-orchestrator/architecture.md) — Detailed orchestrator design +- [Competitive Landscape](competitive-landscape.md) — Vendor comparison +- [Roadmap](../ROADMAP.md) — Implementation priorities diff --git a/docs/product/competitive-landscape.md b/docs/product/competitive-landscape.md index b1ea5a754..1b68b54a2 100644 --- a/docs/product/competitive-landscape.md +++ b/docs/product/competitive-landscape.md @@ -1,8 +1,50 @@ # Competitive Landscape -> **TL;DR:** Stella Ops isn't a scanner that outputs findings. It's a platform that outputs **attestable decisions that can be replayed**. That difference survives auditors, regulators, and supply-chain propagation. +> **TL;DR:** Stella Ops Suite isn't a scanner or a deployment tool—it's a **release control plane** that gates releases using reachability-aware security and produces **attestable decisions that can be replayed**. Non-Kubernetes container estates finally get a central release authority. -Source: internal advisory "23-Nov-2025 - Stella Ops vs Competitors", updated Jan 2026. 
This summary distils a 15-vendor comparison into actionable positioning notes for sales/PMM and engineering prioritization.
+Source: internal advisories "23-Nov-2025 - Stella Ops vs Competitors" and "09-Jan-2026 - Stella Ops Pivot", updated Jan 2026. This summary covers both release orchestration and security positioning.
+
+---
+
+## The New Category: Release Control Plane
+
+**Stella Ops Suite** occupies a unique position by combining:
+- Release orchestration (promotions, approvals, workflows)
+- Security decisioning as a gate (not a blocker)
+- Non-Kubernetes target specialization
+- Evidence-linked decisions with deterministic replay
+
+### Why Competitors Can't Easily Catch Up (Release Orchestration)
+
+| Category | Representatives | What They Optimized For | Why They Can't Easily Catch Up |
+|----------|----------------|------------------------|-------------------------------|
+| **CI/CD Tools** | GitHub Actions, Jenkins, GitLab CI | Running pipelines, build automation | No central release authority; no audit-grade evidence; deployment is an afterthought |
+| **CD Orchestrators** | Octopus, Harness, Spinnaker | Deployment automation, Kubernetes | Security is bolt-on; non-K8s is second-class; pricing punishes automation |
+| **Registries** | Harbor, JFrog Artifactory | Artifact storage, scanning | No release governance; no promotion workflows; no deployment execution |
+| **Scanners/CNAPP** | Trivy, Snyk, Aqua | Vulnerability detection | No release orchestration; findings don't integrate with promotion gates |
+
+### Stella Ops Suite Positioning
+
+| vs. Category | Why Stella Wins |
+|--------------|-----------------|
+| **vs. CI/CD tools** | They run pipelines; we provide central release authority with audit-grade evidence |
+| **vs. CD orchestrators** | They bolt on security; we integrate it as gates. They punish automation with per-project pricing; we don't |
+| **vs. 
Registries** | They store and scan; we govern releases and orchestrate deployments | +| **vs. Scanners** | They output findings; we output release decisions with evidence packets | + +### Unique Differentiators (Release Orchestration) + +| Differentiator | What It Means | +|----------------|---------------| +| **Non-Kubernetes Specialization** | Docker hosts, Compose, ECS, Nomad are first-class—not afterthoughts | +| **Digest-First Release Identity** | Releases are immutable OCI digests, not mutable tags | +| **Security Gates in Promotion** | Scan on build, evaluate on release, re-evaluate on CVE updates | +| **Evidence Packets** | Every release decision is cryptographically signed and replayable | +| **Cost Model** | No per-seat, per-project, per-deployment tax. Environments + new digests/day | + +--- + +## Security Positioning (Original Analysis) ---