# Integration Catalog Architecture
> **Module:** Integrations (`src/Integrations/StellaOps.Integrations.WebService`)
> **Sprint:** SPRINT_20251229_010_PLATFORM_integration_catalog_core
> **Last Updated:** 2025-12-30
---
## Overview
The Integration Catalog is a centralized registry for managing external integrations in StellaOps. It provides a unified API for configuring, testing, and monitoring connections to registries, SCM providers, CI systems, runtime hosts, and feed sources.
**Architecture Note:** Integration Catalog is a dedicated service (`src/Integrations`), NOT part of Gateway. Gateway handles HTTP ingress/routing only. Integration domain logic, plugins, and persistence live in the Integrations module.
## Directory Structure
```
src/Integrations/
├── StellaOps.Integrations.WebService/        # ASP.NET Core host
├── __Libraries/
│   ├── StellaOps.Integrations.Core/          # Domain models, enums, events
│   ├── StellaOps.Integrations.Contracts/     # Plugin contracts and DTOs
│   └── StellaOps.Integrations.Persistence/   # PostgreSQL repositories
└── __Plugins/
    ├── StellaOps.Integrations.Plugin.GitHubApp/
    ├── StellaOps.Integrations.Plugin.Harbor/
    └── StellaOps.Integrations.Plugin.InMemory/
```
## Plugin Architecture
Each integration provider ships as a plugin that implements `IIntegrationConnectorPlugin`:
```csharp
public interface IIntegrationConnectorPlugin : IAvailabilityPlugin
{
    IntegrationType Type { get; }
    IntegrationProvider Provider { get; }
    Task<TestConnectionResult> TestConnectionAsync(IntegrationConfig config, CancellationToken ct);
    Task<HealthCheckResult> CheckHealthAsync(IntegrationConfig config, CancellationToken ct);
}
```
Plugins are loaded at startup from:
1. The configured `PluginsDirectory` (default: `plugins/`)
2. The WebService assembly (for built-in plugins)
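
A minimal sketch of what a connector and a lookup over loaded plugins might look like. The enum members, record shapes, the `Name` member on `IAvailabilityPlugin`, and the `EchoConnectorPlugin` and `ConnectorCatalog` types are all assumptions made so the example compiles on its own; the real definitions live in the Core and Contracts libraries and may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-ins so the sketch is self-contained.
public enum IntegrationType { Registry, Scm, Ci, Host, Feed, Artifact }
public enum IntegrationProvider { InMemory, Harbor, GitHubApp }
public sealed record IntegrationConfig(string? BaseUrl, string? AuthRef);
public sealed record TestConnectionResult(bool Success, string Message);
public sealed record HealthCheckResult(bool Healthy, string Message);

// Assumed shape: the source does not show IAvailabilityPlugin's members.
public interface IAvailabilityPlugin
{
    string Name { get; }
}

public interface IIntegrationConnectorPlugin : IAvailabilityPlugin
{
    IntegrationType Type { get; }
    IntegrationProvider Provider { get; }
    Task<TestConnectionResult> TestConnectionAsync(IntegrationConfig config, CancellationToken ct);
    Task<HealthCheckResult> CheckHealthAsync(IntegrationConfig config, CancellationToken ct);
}

// Illustrative connector that never touches the network.
public sealed class EchoConnectorPlugin : IIntegrationConnectorPlugin
{
    public string Name => "echo";
    public IntegrationType Type => IntegrationType.Registry;
    public IntegrationProvider Provider => IntegrationProvider.InMemory;

    public Task<TestConnectionResult> TestConnectionAsync(IntegrationConfig config, CancellationToken ct)
    {
        ct.ThrowIfCancellationRequested();
        bool ok = !string.IsNullOrWhiteSpace(config.BaseUrl);
        return Task.FromResult(new TestConnectionResult(ok, ok ? "BaseUrl present" : "BaseUrl missing"));
    }

    public Task<HealthCheckResult> CheckHealthAsync(IntegrationConfig config, CancellationToken ct)
    {
        ct.ThrowIfCancellationRequested();
        return Task.FromResult(new HealthCheckResult(true, "in-memory connector is always healthy"));
    }
}

// Hypothetical lookup that indexes loaded plugins by (Type, Provider).
public sealed class ConnectorCatalog
{
    private readonly Dictionary<(IntegrationType, IntegrationProvider), IIntegrationConnectorPlugin> _byKey;

    public ConnectorCatalog(IEnumerable<IIntegrationConnectorPlugin> plugins) =>
        _byKey = plugins.ToDictionary(p => (p.Type, p.Provider));

    public IIntegrationConnectorPlugin? Find(IntegrationType type, IntegrationProvider provider) =>
        _byKey.TryGetValue((type, provider), out var plugin) ? plugin : null;
}
```

Keying the lookup on the `(Type, Provider)` pair means a request for an unsupported combination returns `null` rather than falling back to an unrelated connector.
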
## Integration Types
| Type | Description | Examples |
|------|-------------|----------|
| **Registry** | Container image registries | Docker Hub, Harbor, ECR, ACR, GCR, GHCR, Quay, Artifactory |
| **SCM** | Source code management | GitHub, GitLab, Gitea, Bitbucket, Azure DevOps |
| **CI** | Continuous integration | GitHub Actions, GitLab CI, Gitea Actions, Jenkins, CircleCI |
| **Host** | Runtime observation | Zastava (eBPF, ETW, dyld probes) |
| **Feed** | Vulnerability feeds | Concelier, Excititor mirrors |
| **Artifact** | SBOM/VEX uploads | Direct artifact submission |
## Entity Schema
```csharp
public sealed class Integration
{
    // Identity
    public Guid IntegrationId { get; init; }
    public string TenantId { get; init; }
    public string Name { get; init; }
    public string? Description { get; set; }

    // Classification
    public IntegrationType Type { get; init; }
    public IntegrationProvider Provider { get; init; }

    // Configuration
    public string? BaseUrl { get; set; }
    public string? AuthRef { get; set; }  // Never raw secrets
    public JsonDocument Configuration { get; set; }

    // Organization
    public string? Environment { get; set; }  // prod, staging, dev
    public string? Tags { get; set; }
    public string? OwnerId { get; set; }

    // Lifecycle
    public IntegrationStatus Status { get; private set; }
    public bool Paused { get; private set; }
    public string? PauseReason { get; private set; }

    // Health
    public DateTimeOffset? LastTestedAt { get; private set; }
    public bool? LastTestSuccess { get; private set; }
    public int ConsecutiveFailures { get; private set; }

    // Audit
    public DateTimeOffset CreatedAt { get; init; }
    public string CreatedBy { get; init; }
    public DateTimeOffset? ModifiedAt { get; private set; }
    public string? ModifiedBy { get; private set; }
    public int Version { get; private set; }
}
```
## Lifecycle States
```
┌─────────┐
│  Draft  │
└─────────┘
     │
     │ SubmitForVerification()
     ▼
┌─────────────────────┐
│ PendingVerification │
└─────────────────────┘
     │
     │ Test Success
     ▼
┌──────────┐
│  Active  │ ◄──── Resume() ────┐
└──────────┘                    │
     │                          │
 Consecutive               ┌─────────┐
 Failures ≥ 3              │ Paused  │
     │                     └─────────┘
     ▼                          ▲
┌───────────┐                   │
│ Degraded  │ ──── Pause() ─────┘
└───────────┘
     │
 Failures ≥ 5
     │
     ▼
┌──────────┐
│  Failed  │
└──────────┘
```
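
The transitions in the diagram can be sketched as a small state machine. This is an illustration, not the module's implementation. `IntegrationLifecycle` and `RecordTestResult` are hypothetical names, and the status enum is inferred from the diagram. Three behaviours are assumptions the source does not state: a passing test clears the failure streak, `Resume()` resets the counter, and test results are ignored in `Draft`, `Paused`, and `Failed`. `Pause()` is allowed only from `Degraded` because that is the only pause transition drawn.

```csharp
using System;

// Status names read off the diagram; the real IntegrationStatus enum may differ.
public enum IntegrationStatus { Draft, PendingVerification, Active, Degraded, Paused, Failed }

// Hypothetical helper; thresholds 3 and 5 come from the diagram.
public sealed class IntegrationLifecycle
{
    public const int DegradedThreshold = 3;
    public const int FailedThreshold = 5;

    public IntegrationStatus Status { get; private set; } = IntegrationStatus.Draft;
    public int ConsecutiveFailures { get; private set; }

    public void SubmitForVerification()
    {
        Require(IntegrationStatus.Draft);
        Status = IntegrationStatus.PendingVerification;
    }

    public void RecordTestResult(bool success)
    {
        switch (Status)
        {
            case IntegrationStatus.PendingVerification:
                // A passing test activates the integration; a failing one leaves it pending.
                if (success) Status = IntegrationStatus.Active;
                break;

            case IntegrationStatus.Active:
            case IntegrationStatus.Degraded:
                if (success)
                {
                    // Assumption: one passing test clears the failure streak.
                    ConsecutiveFailures = 0;
                    Status = IntegrationStatus.Active;
                    break;
                }
                ConsecutiveFailures++;
                if (ConsecutiveFailures >= FailedThreshold) Status = IntegrationStatus.Failed;
                else if (ConsecutiveFailures >= DegradedThreshold) Status = IntegrationStatus.Degraded;
                break;

            default:
                // Draft, Paused, Failed: results are ignored in this sketch.
                break;
        }
    }

    public void Pause()
    {
        Require(IntegrationStatus.Degraded);
        Status = IntegrationStatus.Paused;
    }

    public void Resume()
    {
        Require(IntegrationStatus.Paused);
        ConsecutiveFailures = 0; // Assumption: resuming starts a fresh streak.
        Status = IntegrationStatus.Active;
    }

    private void Require(IntegrationStatus expected)
    {
        if (Status != expected)
            throw new InvalidOperationException($"Expected {expected} but integration is {Status}.");
    }
}
```
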
## API Endpoints
Base path: `/api/v1/integrations`
| Method | Path | Scope | Description |
|--------|------|-------|-------------|
| GET | `/` | `integrations.read` | List integrations with filtering |
| GET | `/{id}` | `integrations.read` | Get integration by ID |
| POST | `/` | `integrations.admin` | Create integration |
| PUT | `/{id}` | `integrations.admin` | Update integration |
| DELETE | `/{id}` | `integrations.admin` | Delete integration |
| POST | `/{id}/test` | `integrations.admin` | Test connection |
| POST | `/{id}/pause` | `integrations.admin` | Pause integration |
| POST | `/{id}/resume` | `integrations.admin` | Resume integration |
| POST | `/{id}/activate` | `integrations.admin` | Activate integration |
| GET | `/{id}/health` | `integrations.read` | Get health status |
## AuthRef Pattern
**Critical:** The Integration Catalog never stores raw credentials. All secrets are referenced via `AuthRef` strings that point to Authority's secret store.
```
AuthRef format: ref://<scope>/<provider>/<key>
Example: ref://integrations/github/acme-org-token
```
The AuthRef is resolved at runtime when making API calls to the integration provider. This ensures:
1. Secrets are stored centrally with proper encryption
2. Secret rotation doesn't require integration updates
3. Audit trails track secret access separately
4. Offline bundles can use different AuthRefs
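
To make the format concrete, here is a hedged sketch of parsing and validating an AuthRef string. The `AuthRef` record and its `TryParse` method are hypothetical, since the source documents only the string format. The sketch also assumes exactly three non-empty segments, so a key containing `/` would be rejected.

```csharp
using System;

// Hypothetical value type for ref://<scope>/<provider>/<key>.
public sealed record AuthRef(string Scope, string Provider, string Key)
{
    private const string Prefix = "ref://";

    public static bool TryParse(string? value, out AuthRef? authRef)
    {
        authRef = null;
        if (string.IsNullOrEmpty(value) || !value.StartsWith(Prefix, StringComparison.Ordinal))
            return false;

        // Assumption: exactly three non-empty segments follow the prefix.
        string[] parts = value.Substring(Prefix.Length).Split('/');
        if (parts.Length != 3 || Array.Exists(parts, string.IsNullOrWhiteSpace))
            return false;

        authRef = new AuthRef(parts[0], parts[1], parts[2]);
        return true;
    }

    // Round-trips back to the documented string form.
    public override string ToString() => $"{Prefix}{Scope}/{Provider}/{Key}";
}
```

Rejecting anything that does not start with `ref://` gives the API a cheap guard against a raw token being pasted into the `AuthRef` field by mistake.
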
## Event Pipeline
Integration lifecycle events are published for consumption by Scheduler, Orchestrator, Signals, and Notify:
| Event | Trigger | Consumers |
|-------|---------|-----------|
| `integration.created` | New integration | Scheduler (schedule health checks) |
| `integration.updated` | Configuration change | Scheduler (reschedule) |
| `integration.deleted` | Integration removed | Scheduler (cancel jobs) |
| `integration.paused` | Operator paused | Orchestrator (pause jobs) |
| `integration.resumed` | Operator resumed | Orchestrator (resume jobs) |
| `integration.healthy` | Test passed | Signals (status update) |
| `integration.unhealthy` | Test failed | Signals, Notify (alert) |
## Audit Trail
All integration actions are logged:
- Create/Update/Delete with actor and timestamp
- Connection tests with success/failure
- Pause/Resume with reason and ticket reference
- Activate with approver
Audit logs are stored in the append-only audit store for compliance.
## Determinism & Offline
- Integration lists are ordered deterministically by name
- Timestamps are UTC ISO-8601
- Pagination uses stable cursor semantics
- Health polling respects offline mode (skip network checks)
- Feed integrations support allowlists for air-gap environments
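
The ordering and timestamp rules above might be implemented along these lines. `IntegrationSummary` and `DeterministicListing` are hypothetical names. The ordinal comparer and the ID tiebreaker are assumptions; the source says only that lists are ordered deterministically by name.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical projection used only for this example.
public sealed record IntegrationSummary(Guid IntegrationId, string Name, DateTimeOffset CreatedAt);

public static class DeterministicListing
{
    // Ordinal comparison avoids culture-dependent ordering; the ID tiebreaker
    // (an assumption) keeps the order stable when two names are equal.
    public static IReadOnlyList<IntegrationSummary> Order(IEnumerable<IntegrationSummary> items) =>
        items.OrderBy(i => i.Name, StringComparer.Ordinal)
             .ThenBy(i => i.IntegrationId)
             .ToList();

    // The round-trip ("O") format on a UTC value yields ISO-8601 with a +00:00 offset.
    public static string FormatUtc(DateTimeOffset value) =>
        value.ToUniversalTime().ToString("O", CultureInfo.InvariantCulture);
}
```

Ordinal ordering sorts uppercase letters before lowercase ones, so `Zeta` precedes `alpha`. If case-insensitive ordering is wanted, `StringComparer.OrdinalIgnoreCase` is a drop-in alternative that remains culture-independent.
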
## RBAC Scopes
| Scope | Permission |
|-------|------------|
| `integrations.read` | View integrations and health |
| `integrations.admin` | Create, update, delete, test, pause, resume, activate |
## Future Extensions
1. **Provider-specific testers**: HTTP health checks, registry auth validation, SCM webhook verification
2. **PostgreSQL persistence**: Replace in-memory repository for production
3. **Messaging events**: Publish to Valkey/Kafka instead of no-op
4. **Health history**: Track uptime percentage and latency over time
5. **Bulk operations**: Import/export integrations for environment promotion