Some checks failed
Docs CI / lint-and-preview (push) Has been cancelled
AOC Guard CI / aoc-guard (push) Has been cancelled
AOC Guard CI / aoc-verify (push) Has been cancelled
Concelier Attestation Tests / attestation-tests (push) Has been cancelled

This commit is contained in:
StellaOps Bot
2025-11-28 20:55:22 +02:00
parent d040c001ac
commit 2548abc56f
231 changed files with 47468 additions and 68 deletions


@@ -269,11 +269,12 @@ In this role you act as:
* **Angular v17 engineer** (UI).
* **QA automation engineer** (C#, Moq, Playwright, Angular test stack, or other suitable tools).
Implementation principles:
* Always follow .NET 10 and Angular v17 best practices.
* Apply SOLID design principles (SRP, OCP, LSP, ISP, DIP) in service and library code.
* Maximise reuse and composability.
* Maintain determinism: stable ordering, UTC ISO-8601 timestamps, immutable NDJSON where applicable.
Execution rules (very important):


@@ -117,6 +117,7 @@ The codebase follows a monorepo pattern with modules under `src/`:
### Implementation Guidelines
- Follow .NET 10 and Angular v17 best practices
- Apply SOLID principles (SRP, OCP, LSP, ISP, DIP) when designing services, libraries, and tests
- Maximise reuse and composability
- Never regress determinism, ordering, or precedence
- Every change must be accompanied by or covered by tests

docs/db/CONVERSION_PLAN.md Normal file

@@ -0,0 +1,491 @@
# MongoDB to PostgreSQL Conversion Plan
**Version:** 2.0.0
**Status:** APPROVED
**Created:** 2025-11-28
**Last Updated:** 2025-11-28
---
## Executive Summary
This document outlines the strategic plan to **convert** (not migrate) StellaOps from MongoDB to PostgreSQL for control-plane domains. The conversion follows a "strangler fig" pattern, introducing PostgreSQL repositories alongside existing MongoDB implementations and gradually switching each bounded context.
**Key Finding:** StellaOps already has production-ready PostgreSQL patterns in the Orchestrator and Findings modules that serve as templates for all other modules.
### Related Documents
| Document | Purpose |
|----------|---------|
| [SPECIFICATION.md](./SPECIFICATION.md) | Schema designs, naming conventions, data types |
| [RULES.md](./RULES.md) | Database coding rules and patterns |
| [VERIFICATION.md](./VERIFICATION.md) | Testing and verification requirements |
| [tasks/](./tasks/) | Detailed task definitions per phase |
---
## 1. Principles & Scope
### 1.1 Goals
Convert **control-plane** domains from MongoDB to PostgreSQL:
| Domain | Current DB | Target | Priority |
|--------|-----------|--------|----------|
| Authority | `stellaops_authority` | PostgreSQL | P0 |
| Scheduler | `stellaops_scheduler` | PostgreSQL | P0 |
| Notify | `stellaops_notify` | PostgreSQL | P1 |
| Policy | `stellaops_policy` | PostgreSQL | P1 |
| Vulnerabilities (Concelier) | `concelier` | PostgreSQL | P2 |
| VEX & Graph (Excititor) | `excititor` | PostgreSQL | P2 |
| PacksRegistry | `stellaops_packs` | PostgreSQL | P3 |
| IssuerDirectory | `stellaops_issuer` | PostgreSQL | P3 |
### 1.2 Non-Goals
- Scanner result storage (remains object storage + Mongo for now)
- Real-time event streams (separate infrastructure)
- Legacy data archive (can remain in MongoDB read-only)
### 1.3 Constraints
**MUST Preserve:**
- Deterministic, replayable scans
- "Preserve/prune source" rule for Concelier/Excititor
- Lattice logic in `Scanner.WebService` (not in DB)
- Air-gap friendliness and offline-kit packaging
- Multi-tenant isolation patterns
- Zero downtime during conversion
### 1.4 Conversion vs Migration
This is a **conversion**, not a 1:1 document→row mapping:
| Approach | When to Use |
|----------|-------------|
| **Normalize** | Identities, jobs, schedules, relationships |
| **Keep JSONB** | Advisory payloads, provenance trails, evidence manifests |
| **Drop/Archive** | Ephemeral data (caches, locks), historical logs |
---
## 2. Architecture
### 2.1 Strangler Fig Pattern
```
┌─────────────────────────────────────────────────────────────┐
│ Service Layer │
├─────────────────────────────────────────────────────────────┤
│ Repository Interface │
│ (e.g., IScheduleRepository) │
├──────────────────────┬──────────────────────────────────────┤
│ MongoRepository │ PostgresRepository │
│ (existing) │ (new) │
├──────────────────────┴──────────────────────────────────────┤
│ DI Container (configured switch) │
└─────────────────────────────────────────────────────────────┘
```
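For illustration, the seam between the two backends is an ordinary repository interface defined in the module's Core layer. The member list below is a simplified assumption rather than the final contract (see [RULES.md](./RULES.md) Section 1 for the binding rules):
```csharp
// Illustrative contract only; the real interface lives in the module's Core project
// (e.g. StellaOps.Scheduler.Core) and has more members than shown here.
public interface IScheduleRepository
{
    Task<Schedule?> GetAsync(string tenantId, string scheduleId, CancellationToken cancellationToken);
    Task<IReadOnlyList<Schedule>> ListAsync(string tenantId, QueryOptions? options, CancellationToken cancellationToken);
    Task UpsertAsync(Schedule schedule, CancellationToken cancellationToken);
}

// MongoScheduleRepository (existing) and PostgresScheduleRepository (new) both implement
// this contract; services depend only on the interface, so switching backends is a
// registration change rather than a code change.
```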
### 2.2 Configuration-Driven Backend Selection
```json
{
"Persistence": {
"Authority": "Postgres",
"Scheduler": "Postgres",
"Concelier": "Mongo",
"Excititor": "Mongo",
"Notify": "Postgres",
"Policy": "Mongo"
}
}
```
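A minimal sketch of how this configuration could drive registration at the composition root; the extension-method name and the fallback default are placeholders, not a finalized API:
```csharp
// Hypothetical wiring helper: reads "Persistence:Scheduler" and registers the matching
// repository implementation. Defaults to MongoDB when the key is absent.
public static class SchedulerPersistenceRegistration
{
    public static IServiceCollection AddSchedulerPersistence(
        this IServiceCollection services,
        IConfiguration configuration)
    {
        var backend = configuration["Persistence:Scheduler"] ?? "Mongo";

        if (string.Equals(backend, "Postgres", StringComparison.OrdinalIgnoreCase))
        {
            // PostgreSQL repositories are scoped (connection per request, see RULES.md Section 1.3).
            services.AddScoped<IScheduleRepository, PostgresScheduleRepository>();
        }
        else
        {
            // MongoDB repositories may remain singletons.
            services.AddSingleton<IScheduleRepository, MongoScheduleRepository>();
        }

        return services;
    }
}
```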
### 2.3 Existing PostgreSQL Patterns
The codebase already contains production-ready patterns:
| Module | Location | Reusable Components |
|--------|----------|---------------------|
| Orchestrator | `src/Orchestrator/.../Infrastructure/Postgres/` | DataSource, tenant context, repository pattern |
| Findings | `src/Findings/StellaOps.Findings.Ledger/Infrastructure/Postgres/` | Ledger events, Merkle anchors, projections |
**Reference Implementation:** `OrchestratorDataSource.cs`
---
## 3. Data Tiering
### 3.1 Tier Definitions
| Tier | Description | Strategy |
|------|-------------|----------|
| **A** | Critical business data | Full conversion with verification |
| **B** | Important but recoverable | Convert active records only |
| **C** | Ephemeral/cache data | Fresh start, no migration |
### 3.2 Module Tiering
#### Authority
| Collection | Tier | Strategy |
|------------|------|----------|
| `authority_users` | A | Full conversion |
| `authority_clients` | A | Full conversion |
| `authority_scopes` | A | Full conversion |
| `authority_tokens` | B | Active tokens only |
| `authority_service_accounts` | A | Full conversion |
| `authority_login_attempts` | B | Recent 90 days |
| `authority_revocations` | A | Full conversion |
#### Scheduler
| Collection | Tier | Strategy |
|------------|------|----------|
| `schedules` | A | Full conversion |
| `runs` | B | Recent 180 days |
| `graph_jobs` | B | Active/recent only |
| `policy_jobs` | B | Active/recent only |
| `impact_snapshots` | B | Recent 90 days |
| `locks` | C | Fresh start |
#### Concelier (Vulnerabilities)
| Collection | Tier | Strategy |
|------------|------|----------|
| `advisory` | A | Full conversion |
| `advisory_raw` | B | GridFS refs only |
| `alias` | A | Full conversion |
| `affected` | A | Full conversion |
| `source` | A | Full conversion |
| `source_state` | A | Full conversion |
| `jobs`, `locks` | C | Fresh start |
#### Excititor (VEX)
| Collection | Tier | Strategy |
|------------|------|----------|
| `vex.statements` | A | Full conversion |
| `vex.observations` | A | Full conversion |
| `vex.linksets` | A | Full conversion |
| `vex.consensus` | A | Full conversion |
| `vex.raw` | B | Active/recent only |
| `vex.cache` | C | Fresh start |
---
## 4. Execution Phases
### Phase Overview
```
Phase 0: Foundations [1 sprint]
├─→ Phase 1: Authority [1 sprint]
├─→ Phase 2: Scheduler [1 sprint]
├─→ Phase 3: Notify [1 sprint]
├─→ Phase 4: Policy [1 sprint]
└─→ Phase 5: Concelier [2 sprints]
└─→ Phase 6: Excititor [2-3 sprints]
└─→ Phase 7: Cleanup [1 sprint]
```
### Phase Summary
| Phase | Scope | Duration | Dependencies | Deliverable |
|-------|-------|----------|--------------|-------------|
| 0 | Foundations | 1 sprint | None | PostgreSQL infrastructure, shared library |
| 1 | Authority | 1 sprint | Phase 0 | Identity management on PostgreSQL |
| 2 | Scheduler | 1 sprint | Phase 0 | Job scheduling on PostgreSQL |
| 3 | Notify | 1 sprint | Phase 0 | Notifications on PostgreSQL |
| 4 | Policy | 1 sprint | Phase 0 | Policy engine on PostgreSQL |
| 5 | Concelier | 2 sprints | Phase 0 | Vulnerability index on PostgreSQL |
| 6 | Excititor | 2-3 sprints | Phase 5 | VEX & graphs on PostgreSQL |
| 7 | Cleanup | 1 sprint | All | MongoDB retired, docs updated |
**Total: 10-12 sprints**
### Detailed Task Definitions
See:
- [tasks/PHASE_0_FOUNDATIONS.md](./tasks/PHASE_0_FOUNDATIONS.md)
- [tasks/PHASE_1_AUTHORITY.md](./tasks/PHASE_1_AUTHORITY.md)
- [tasks/PHASE_2_SCHEDULER.md](./tasks/PHASE_2_SCHEDULER.md)
- [tasks/PHASE_3_NOTIFY.md](./tasks/PHASE_3_NOTIFY.md)
- [tasks/PHASE_4_POLICY.md](./tasks/PHASE_4_POLICY.md)
- [tasks/PHASE_5_VULNERABILITIES.md](./tasks/PHASE_5_VULNERABILITIES.md)
- [tasks/PHASE_6_VEX_GRAPH.md](./tasks/PHASE_6_VEX_GRAPH.md)
- [tasks/PHASE_7_CLEANUP.md](./tasks/PHASE_7_CLEANUP.md)
---
## 5. Conversion Strategy
### 5.1 Per-Module Approach
```
1. Create PostgreSQL storage project
2. Implement schema migrations
3. Implement repository interfaces
4. Add configuration switch
5. Enable dual-write (if Tier A)
6. Run verification tests
7. Switch to PostgreSQL-only
8. Archive MongoDB data
```
### 5.2 Dual-Write Pattern
For Tier A data requiring historical continuity:
```
┌──────────────────────────────────────────────────────────────┐
│ DualWriteRepository │
├──────────────────────────────────────────────────────────────┤
│ Write: PostgreSQL (primary) + MongoDB (secondary) │
│ Read: PostgreSQL (primary) → MongoDB (fallback) │
│ Config: WriteToBoth, FallbackToMongo, ConvertOnRead │
└──────────────────────────────────────────────────────────────┘
```
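A simplified sketch of such a wrapper, assuming the `WriteToBoth` and `FallbackToMongo` switches above; the type names and error handling are illustrative only, and real implementations also emit the dual-write metrics described in [VERIFICATION.md](./VERIFICATION.md) Section 7.1:
```csharp
// Illustrative dual-write decorator for Tier A data. Names are placeholders.
public sealed record DualWriteOptions(bool WriteToBoth = true, bool FallbackToMongo = true);

public sealed class DualWriteScheduleRepository : IScheduleRepository
{
    private readonly IScheduleRepository _postgres;
    private readonly IScheduleRepository _mongo;
    private readonly DualWriteOptions _options;

    public DualWriteScheduleRepository(
        IScheduleRepository postgres, IScheduleRepository mongo, DualWriteOptions options)
        => (_postgres, _mongo, _options) = (postgres, mongo, options);

    public async Task UpsertAsync(Schedule schedule, CancellationToken ct)
    {
        // PostgreSQL is the primary write target.
        await _postgres.UpsertAsync(schedule, ct);

        if (_options.WriteToBoth)
        {
            try
            {
                await _mongo.UpsertAsync(schedule, ct);
            }
            catch
            {
                // Secondary-write failures are surfaced through dual-write monitoring
                // rather than failing the caller.
            }
        }
    }

    public async Task<Schedule?> GetAsync(string tenantId, string scheduleId, CancellationToken ct)
    {
        var result = await _postgres.GetAsync(tenantId, scheduleId, ct);
        if (result is null && _options.FallbackToMongo)
        {
            result = await _mongo.GetAsync(tenantId, scheduleId, ct);
        }
        return result;
    }

    public Task<IReadOnlyList<Schedule>> ListAsync(
        string tenantId, QueryOptions? options, CancellationToken ct)
        => _postgres.ListAsync(tenantId, options, ct); // reads prefer the primary backend
}
```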
### 5.3 Fresh Start Pattern
For Tier C ephemeral data:
```
┌──────────────────────────────────────────────────────────────┐
│ 1. Deploy PostgreSQL schema │
│ 2. Switch configuration to PostgreSQL │
│ 3. New data goes to PostgreSQL only │
│ 4. Old MongoDB data ages out naturally │
└──────────────────────────────────────────────────────────────┘
```
---
## 6. Risk Assessment
### 6.1 Technical Risks
| Risk | Impact | Likelihood | Mitigation |
|------|--------|------------|------------|
| Data loss during conversion | High | Low | Dual-write mode, extensive verification |
| Performance regression | Medium | Medium | Load testing before switch, index optimization |
| Determinism violation | High | Medium | Automated verification tests, parallel pipeline |
| Schema evolution conflicts | Medium | Low | Migration framework, schema versioning |
| Transaction semantics differences | Medium | Low | Code review, integration tests |
### 6.2 Operational Risks
| Risk | Impact | Likelihood | Mitigation |
|------|--------|------------|------------|
| Extended conversion timeline | Medium | Medium | Phase-based approach, clear milestones |
| Team learning curve | Low | Medium | Reference implementations, documentation |
| Rollback complexity | Medium | Low | Keep Mongo data until verified, feature flags |
### 6.3 Rollback Strategy
Each phase has independent rollback capability:
| Level | Action | Recovery Time |
|-------|--------|---------------|
| Configuration | Change `Persistence:<Module>` to `Mongo` | Minutes |
| Data | MongoDB data retained during dual-write | None needed |
| Code | Git revert (PostgreSQL code isolated) | Hours |
---
## 7. Success Criteria
### 7.1 Per-Module Criteria
- [ ] All existing integration tests pass with PostgreSQL backend
- [ ] No performance regression >10% on critical paths
- [ ] Deterministic outputs verified against MongoDB baseline
- [ ] Zero data loss during conversion
- [ ] Tenant isolation verified
### 7.2 Overall Criteria
- [ ] All control-plane modules running on PostgreSQL
- [ ] MongoDB retired from production for converted modules
- [ ] Air-gap kit updated with PostgreSQL support
- [ ] Documentation updated for PostgreSQL operations
- [ ] Runbooks updated for PostgreSQL troubleshooting
---
## 8. Project Structure
### 8.1 New Projects
```
src/
├── Shared/
│ └── StellaOps.Infrastructure.Postgres/
│ ├── DataSourceBase.cs
│ ├── Migrations/
│ │ ├── IPostgresMigration.cs
│ │ └── PostgresMigrationRunner.cs
│ ├── Extensions/
│ │ └── NpgsqlExtensions.cs
│ └── ServiceCollectionExtensions.cs
├── Authority/
│ └── __Libraries/
│ └── StellaOps.Authority.Storage.Postgres/
│ ├── AuthorityDataSource.cs
│ ├── Repositories/
│ ├── Migrations/
│ └── ServiceCollectionExtensions.cs
├── Scheduler/
│ └── __Libraries/
│ └── StellaOps.Scheduler.Storage.Postgres/
├── Notify/
│ └── __Libraries/
│ └── StellaOps.Notify.Storage.Postgres/
├── Policy/
│ └── __Libraries/
│ └── StellaOps.Policy.Storage.Postgres/
├── Concelier/
│ └── __Libraries/
│ └── StellaOps.Concelier.Storage.Postgres/
└── Excititor/
└── __Libraries/
└── StellaOps.Excititor.Storage.Postgres/
```
### 8.2 Schema Files
```
docs/db/
├── schemas/
│ ├── authority.sql
│ ├── vuln.sql
│ ├── vex.sql
│ ├── scheduler.sql
│ ├── notify.sql
│ └── policy.sql
```
---
## 9. Timeline
### 9.1 Sprint Schedule
| Sprint | Phase | Focus |
|--------|-------|-------|
| 1 | 0 | PostgreSQL infrastructure, shared library |
| 2 | 1 | Authority module conversion |
| 3 | 2 | Scheduler module conversion |
| 4 | 3 | Notify module conversion |
| 5 | 4 | Policy module conversion |
| 6-7 | 5 | Concelier/Vulnerability conversion |
| 8-10 | 6 | Excititor/VEX conversion |
| 11 | 7 | Cleanup, optimization, documentation |
### 9.2 Milestones
| Milestone | Sprint | Criteria |
|-----------|--------|----------|
| M1: Infrastructure Ready | 1 | PostgreSQL cluster operational, CI tests passing |
| M2: Identity Converted | 2 | Authority on PostgreSQL, auth flows working |
| M3: Scheduling Converted | 3 | Scheduler on PostgreSQL, jobs executing |
| M4: Core Services Converted | 5 | Notify + Policy on PostgreSQL |
| M5: Vulnerability Index Converted | 7 | Concelier on PostgreSQL, scans deterministic |
| M6: VEX Converted | 10 | Excititor on PostgreSQL, graphs stable |
| M7: MongoDB Retired | 11 | All modules converted, Mongo archived |
---
## 10. Governance
### 10.1 Decision Log
| Date | Decision | Rationale | Approver |
|------|----------|-----------|----------|
| 2025-11-28 | Strangler fig pattern | Allows gradual rollout with rollback | Architecture Team |
| 2025-11-28 | JSONB for semi-structured data | Preserves flexibility, simplifies conversion | Architecture Team |
| 2025-11-28 | Phase 0 first | Infrastructure must be stable before modules | Architecture Team |
### 10.2 Change Control
Changes to this plan require:
1. Impact assessment documented
2. Risk analysis updated
3. Approval from Architecture Team
4. Updated task definitions in `docs/db/tasks/`
### 10.3 Status Reporting
Weekly status updates in sprint files tracking:
- Tasks completed
- Blockers encountered
- Verification results
- Next sprint objectives
---
## Appendix A: Reference Implementation
### DataSource Pattern
```csharp
public sealed class ModuleDataSource : IAsyncDisposable
{
private readonly NpgsqlDataSource _dataSource;
public async Task<NpgsqlConnection> OpenConnectionAsync(
string tenantId,
CancellationToken cancellationToken = default)
{
var connection = await _dataSource.OpenConnectionAsync(cancellationToken);
await ConfigureSessionAsync(connection, tenantId, cancellationToken);
return connection;
}
private static async Task ConfigureSessionAsync(
NpgsqlConnection connection,
string tenantId,
CancellationToken cancellationToken)
{
await using var cmd = connection.CreateCommand();
        cmd.CommandText = """
            select set_config('app.tenant_id', @tenant_id, false),
                   set_config('timezone', 'UTC', false),
                   set_config('statement_timeout', '30s', false)
            """;
        // The tenant id is bound as a parameter rather than interpolated (see RULES.md Section 4.1).
        cmd.Parameters.AddWithValue("tenant_id", tenantId);
        await cmd.ExecuteNonQueryAsync(cancellationToken);
}
}
```
### Repository Pattern
See [RULES.md](./RULES.md) Section 1 for complete repository implementation guidelines.
---
## Appendix B: Glossary
| Term | Definition |
|------|------------|
| **Strangler Fig** | Pattern where new system grows alongside old, gradually replacing it |
| **Dual-Write** | Writing to both MongoDB and PostgreSQL during transition |
| **Tier A/B/C** | Data classification by criticality for migration strategy |
| **DataSource** | Npgsql connection factory with tenant context configuration |
| **Determinism** | Property that the same inputs always produce the same outputs |
---
*Document Version: 2.0.0*
*Last Updated: 2025-11-28*

docs/db/README.md Normal file

@@ -0,0 +1,60 @@
# StellaOps Database Documentation
This directory contains all documentation related to the StellaOps database architecture, including the MongoDB to PostgreSQL conversion project.
## Document Index
| Document | Purpose |
|----------|---------|
| [SPECIFICATION.md](./SPECIFICATION.md) | PostgreSQL schema design specification, data types, naming conventions |
| [RULES.md](./RULES.md) | Database coding rules, patterns, and constraints for all developers |
| [CONVERSION_PLAN.md](./CONVERSION_PLAN.md) | Strategic plan for MongoDB to PostgreSQL conversion |
| [VERIFICATION.md](./VERIFICATION.md) | Testing and verification requirements for database changes |
## Task Definitions
Sprint-level task definitions for the conversion project:
| Phase | Document | Status |
|-------|----------|--------|
| Phase 0 | [tasks/PHASE_0_FOUNDATIONS.md](./tasks/PHASE_0_FOUNDATIONS.md) | TODO |
| Phase 1 | [tasks/PHASE_1_AUTHORITY.md](./tasks/PHASE_1_AUTHORITY.md) | TODO |
| Phase 2 | [tasks/PHASE_2_SCHEDULER.md](./tasks/PHASE_2_SCHEDULER.md) | TODO |
| Phase 3 | [tasks/PHASE_3_NOTIFY.md](./tasks/PHASE_3_NOTIFY.md) | TODO |
| Phase 4 | [tasks/PHASE_4_POLICY.md](./tasks/PHASE_4_POLICY.md) | TODO |
| Phase 5 | [tasks/PHASE_5_VULNERABILITIES.md](./tasks/PHASE_5_VULNERABILITIES.md) | TODO |
| Phase 6 | [tasks/PHASE_6_VEX_GRAPH.md](./tasks/PHASE_6_VEX_GRAPH.md) | TODO |
| Phase 7 | [tasks/PHASE_7_CLEANUP.md](./tasks/PHASE_7_CLEANUP.md) | TODO |
## Schema Reference
Schema DDL files (generated from specifications):
| Schema | File | Tables |
|--------|------|--------|
| authority | [schemas/authority.sql](./schemas/authority.sql) | 12 |
| vuln | [schemas/vuln.sql](./schemas/vuln.sql) | 12 |
| vex | [schemas/vex.sql](./schemas/vex.sql) | 13 |
| scheduler | [schemas/scheduler.sql](./schemas/scheduler.sql) | 10 |
| notify | [schemas/notify.sql](./schemas/notify.sql) | 14 |
| policy | [schemas/policy.sql](./schemas/policy.sql) | 8 |
## Quick Links
- **For developers**: Start with [RULES.md](./RULES.md) for coding conventions
- **For architects**: Review [SPECIFICATION.md](./SPECIFICATION.md) for design rationale
- **For project managers**: See [CONVERSION_PLAN.md](./CONVERSION_PLAN.md) for timeline and phases
- **For QA**: Check [VERIFICATION.md](./VERIFICATION.md) for testing requirements
## Key Principles
1. **Determinism First**: All database operations must produce reproducible, stable outputs
2. **Tenant Isolation**: Multi-tenancy via `tenant_id` column with row-level security
3. **Strangler Fig Pattern**: Gradual conversion with rollback capability per module
4. **JSONB for Flexibility**: Semi-structured data stays as JSONB, relational data normalizes
## Related Documentation
- [Architecture Overview](../07_HIGH_LEVEL_ARCHITECTURE.md)
- [Module Dossiers](../modules/)
- [Air-Gap Operations](../24_OFFLINE_KIT.md)

docs/db/RULES.md Normal file

@@ -0,0 +1,839 @@
# Database Coding Rules
**Version:** 1.0.0
**Status:** APPROVED
**Last Updated:** 2025-11-28
---
## Purpose
This document defines mandatory rules and guidelines for all database-related code in StellaOps. These rules ensure consistency, maintainability, determinism, and security across all modules.
**Compliance is mandatory.** Deviations require explicit approval documented in the relevant sprint file.
---
## 1. Repository Pattern Rules
### 1.1 Interface Location
**RULE:** Repository interfaces MUST be defined in the Core/Domain layer, NOT in the storage layer.
```
✓ CORRECT:
src/Scheduler/__Libraries/StellaOps.Scheduler.Core/Repositories/IScheduleRepository.cs
✗ INCORRECT:
src/Scheduler/__Libraries/StellaOps.Scheduler.Storage.Postgres/IScheduleRepository.cs
```
### 1.2 Implementation Naming
**RULE:** Repository implementations MUST be prefixed with the storage technology.
```csharp
// ✓ CORRECT
public sealed class PostgresScheduleRepository : IScheduleRepository
public sealed class MongoScheduleRepository : IScheduleRepository
// ✗ INCORRECT
public sealed class ScheduleRepository : IScheduleRepository
```
### 1.3 Dependency Injection
**RULE:** PostgreSQL repositories MUST be registered as `Scoped`. MongoDB repositories MAY be `Singleton`.
```csharp
// PostgreSQL - always scoped (connection per request)
services.AddScoped<IScheduleRepository, PostgresScheduleRepository>();
// MongoDB - singleton is acceptable (stateless)
services.AddSingleton<IScheduleRepository, MongoScheduleRepository>();
```
### 1.4 No Direct SQL in Services
**RULE:** Business logic services MUST NOT contain raw SQL. All database access MUST go through repository interfaces.
```csharp
// ✓ CORRECT
public class ScheduleService
{
private readonly IScheduleRepository _repository;
public Task<Schedule?> GetAsync(string id)
=> _repository.GetAsync(id);
}
// ✗ INCORRECT
public class ScheduleService
{
private readonly NpgsqlDataSource _dataSource;
public async Task<Schedule?> GetAsync(string id)
{
await using var conn = await _dataSource.OpenConnectionAsync();
// Direct SQL here - FORBIDDEN
}
}
```
---
## 2. Connection Management Rules
### 2.1 DataSource Pattern
**RULE:** Every module MUST have its own DataSource class that configures tenant context.
```csharp
public sealed class SchedulerDataSource : IAsyncDisposable
{
private readonly NpgsqlDataSource _dataSource;
public async Task<NpgsqlConnection> OpenConnectionAsync(
string tenantId,
CancellationToken cancellationToken = default)
{
var connection = await _dataSource.OpenConnectionAsync(cancellationToken);
await ConfigureSessionAsync(connection, tenantId, cancellationToken);
return connection;
}
private static async Task ConfigureSessionAsync(
NpgsqlConnection connection,
string tenantId,
CancellationToken cancellationToken)
{
// MANDATORY: Set tenant context and UTC timezone
await using var cmd = connection.CreateCommand();
        cmd.CommandText = """
            select set_config('app.tenant_id', @tenant_id, false),
                   set_config('timezone', 'UTC', false),
                   set_config('statement_timeout', '30s', false)
            """;
        // The tenant id is bound as a parameter, never interpolated (rule 4.1).
        cmd.Parameters.AddWithValue("tenant_id", tenantId);
        await cmd.ExecuteNonQueryAsync(cancellationToken);
}
}
```
### 2.2 Connection Disposal
**RULE:** All NpgsqlConnection instances MUST be disposed via `await using`.
```csharp
// ✓ CORRECT
await using var connection = await _dataSource.OpenConnectionAsync(tenantId, ct);
// ✗ INCORRECT
var connection = await _dataSource.OpenConnectionAsync(tenantId, ct);
// Missing disposal
```
### 2.3 Command Disposal
**RULE:** All NpgsqlCommand instances MUST be disposed via `await using`.
```csharp
// ✓ CORRECT
await using var cmd = connection.CreateCommand();
// ✗ INCORRECT
var cmd = connection.CreateCommand();
```
### 2.4 Reader Disposal
**RULE:** All NpgsqlDataReader instances MUST be disposed via `await using`.
```csharp
// ✓ CORRECT
await using var reader = await cmd.ExecuteReaderAsync(ct);
// ✗ INCORRECT
var reader = await cmd.ExecuteReaderAsync(ct);
```
---
## 3. Tenant Isolation Rules
### 3.1 Tenant ID Required
**RULE:** Every tenant-scoped repository method MUST require `tenantId` as the first parameter.
```csharp
// ✓ CORRECT
Task<Schedule?> GetAsync(string tenantId, string scheduleId, CancellationToken ct);
Task<IReadOnlyList<Schedule>> ListAsync(string tenantId, QueryOptions? options, CancellationToken ct);
// ✗ INCORRECT
Task<Schedule?> GetAsync(string scheduleId, CancellationToken ct);
```
### 3.2 Tenant Filtering
**RULE:** All queries MUST include `tenant_id` in the WHERE clause for tenant-scoped tables.
```csharp
// ✓ CORRECT
cmd.CommandText = """
SELECT * FROM scheduler.schedules
WHERE tenant_id = @tenant_id AND id = @id
""";
// ✗ INCORRECT - Missing tenant filter
cmd.CommandText = """
SELECT * FROM scheduler.schedules
WHERE id = @id
""";
```
### 3.3 Session Context Verification
**RULE:** DataSource MUST set `app.tenant_id` on every connection before executing any queries.
```csharp
// ✓ CORRECT - Connection opened via DataSource sets tenant context
await using var connection = await _dataSource.OpenConnectionAsync(tenantId, ct);
// ✗ INCORRECT - Direct connection without tenant context
await using var connection = await _rawDataSource.OpenConnectionAsync(ct);
```
---
## 4. SQL Writing Rules
### 4.1 Parameterized Queries Only
**RULE:** All user-provided values MUST be passed as parameters. String interpolation is FORBIDDEN for values.
```csharp
// ✓ CORRECT
cmd.CommandText = "SELECT * FROM users WHERE id = @id";
cmd.Parameters.AddWithValue("id", userId);
// ✗ INCORRECT - SQL INJECTION VULNERABILITY
cmd.CommandText = $"SELECT * FROM users WHERE id = '{userId}'";
```
### 4.2 SQL String Constants
**RULE:** SQL strings MUST be defined as `const` or `static readonly` fields, or as raw string literals in methods.
```csharp
// ✓ CORRECT - Raw string literal
cmd.CommandText = """
SELECT id, name, created_at
FROM scheduler.schedules
WHERE tenant_id = @tenant_id
ORDER BY created_at DESC
""";
// ✓ CORRECT - Constant
private const string SelectScheduleSql = """
SELECT id, name, created_at
FROM scheduler.schedules
WHERE tenant_id = @tenant_id
""";
// ✗ INCORRECT - Dynamic string building without reason
cmd.CommandText = "SELECT " + columns + " FROM " + table;
```
### 4.3 Schema Qualification
**RULE:** All table references MUST include the schema name.
```csharp
// ✓ CORRECT
cmd.CommandText = "SELECT * FROM scheduler.schedules";
// ✗ INCORRECT - Missing schema
cmd.CommandText = "SELECT * FROM schedules";
```
### 4.4 Column Listing
**RULE:** SELECT statements MUST list columns explicitly. `SELECT *` is FORBIDDEN in production code.
```csharp
// ✓ CORRECT
cmd.CommandText = """
SELECT id, tenant_id, name, enabled, created_at
FROM scheduler.schedules
""";
// ✗ INCORRECT
cmd.CommandText = "SELECT * FROM scheduler.schedules";
```
### 4.5 Consistent Casing
**RULE:** SQL keywords MUST be lowercase for consistency with PostgreSQL conventions.
```csharp
// ✓ CORRECT
cmd.CommandText = """
select id, name
from scheduler.schedules
where tenant_id = @tenant_id
order by created_at desc
""";
// ✗ INCORRECT - Mixed casing
cmd.CommandText = """
SELECT id, name
FROM scheduler.schedules
WHERE tenant_id = @tenant_id
""";
```
---
## 5. Data Type Rules
### 5.1 UUID Handling
**RULE:** UUIDs MUST be passed as `Guid` type to Npgsql, NOT as strings.
```csharp
// ✓ CORRECT
cmd.Parameters.AddWithValue("id", Guid.Parse(scheduleId));
// ✗ INCORRECT
cmd.Parameters.AddWithValue("id", scheduleId); // String
```
### 5.2 Timestamp Handling
**RULE:** All timestamps MUST be `DateTimeOffset` or `DateTime` with `Kind = Utc`.
```csharp
// ✓ CORRECT
cmd.Parameters.AddWithValue("created_at", DateTimeOffset.UtcNow);
cmd.Parameters.AddWithValue("created_at", DateTime.UtcNow);
// ✗ INCORRECT - Local time
cmd.Parameters.AddWithValue("created_at", DateTime.Now);
```
### 5.3 JSONB Serialization
**RULE:** JSONB columns MUST be serialized using `System.Text.Json.JsonSerializer` with consistent options.
```csharp
// ✓ CORRECT
var json = JsonSerializer.Serialize(obj, JsonSerializerOptions.Default);
cmd.Parameters.AddWithValue("config", json);
// ✗ INCORRECT - Newtonsoft or inconsistent serialization
var json = Newtonsoft.Json.JsonConvert.SerializeObject(obj);
```
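One way to keep the options consistent is a single shared instance reused by every repository. The holder below is a sketch; its name and exact settings are assumptions, chosen to match the canonical-JSON settings used for checksums in [VERIFICATION.md](./VERIFICATION.md) Section 6.2:
```csharp
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical shared options holder so every JSONB write uses identical settings.
public static class PostgresJson
{
    public static readonly JsonSerializerOptions Options = new()
    {
        PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
        DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull,
        WriteIndented = false
    };
}

// Usage:
// var json = JsonSerializer.Serialize(schedule.Selection, PostgresJson.Options);
// cmd.Parameters.AddWithValue("selection", json);
```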
### 5.4 Null Handling
**RULE:** Nullable values MUST use `DBNull.Value` when null.
```csharp
// ✓ CORRECT
cmd.Parameters.AddWithValue("description", (object?)schedule.Description ?? DBNull.Value);
// ✗ INCORRECT - Will fail or behave unexpectedly
cmd.Parameters.AddWithValue("description", schedule.Description); // If null
```
### 5.5 Array Handling
**RULE:** PostgreSQL arrays MUST be passed as .NET arrays with explicit type.
```csharp
// ✓ CORRECT
cmd.Parameters.AddWithValue("tags", schedule.Tags.ToArray());
// ✗ INCORRECT - List won't map correctly
cmd.Parameters.AddWithValue("tags", schedule.Tags);
```
---
## 6. Transaction Rules
### 6.1 Explicit Transactions
**RULE:** Operations affecting multiple tables MUST use explicit transactions.
```csharp
// ✓ CORRECT
await using var transaction = await connection.BeginTransactionAsync(ct);
try
{
// Multiple operations
await cmd1.ExecuteNonQueryAsync(ct);
await cmd2.ExecuteNonQueryAsync(ct);
await transaction.CommitAsync(ct);
}
catch
{
await transaction.RollbackAsync(ct);
throw;
}
```
### 6.2 Transaction Isolation
**RULE:** Default isolation level is `ReadCommitted`. Stricter levels MUST be documented.
```csharp
// ✓ CORRECT - Default
await using var transaction = await connection.BeginTransactionAsync(ct);
// ✓ CORRECT - Explicit stricter level with documentation
// Using Serializable for financial consistency requirement
await using var transaction = await connection.BeginTransactionAsync(
IsolationLevel.Serializable, ct);
```
### 6.3 No Nested Transactions
**RULE:** Nested transactions are NOT supported. Use savepoints if needed.
```csharp
// ✗ INCORRECT - Nested transaction
await using var tx1 = await connection.BeginTransactionAsync(ct);
await using var tx2 = await connection.BeginTransactionAsync(ct); // FAILS
// ✓ CORRECT - Savepoint for partial rollback
await using var transaction = await connection.BeginTransactionAsync(ct);
await transaction.SaveAsync("savepoint1", ct);
// ... operations ...
await transaction.RollbackAsync("savepoint1", ct); // Partial rollback
await transaction.CommitAsync(ct);
```
---
## 7. Error Handling Rules
### 7.1 PostgreSQL Exception Handling
**RULE:** Catch `PostgresException` for database-specific errors, not generic exceptions.
```csharp
// ✓ CORRECT
try
{
await cmd.ExecuteNonQueryAsync(ct);
}
catch (PostgresException ex) when (ex.SqlState == "23505") // Unique violation
{
throw new DuplicateEntityException($"Entity already exists: {ex.ConstraintName}");
}
// ✗ INCORRECT - Too broad
catch (Exception ex)
{
// Can't distinguish database errors from other errors
}
```
### 7.2 Constraint Violation Handling
**RULE:** Unique constraint violations MUST be translated to domain exceptions.
| SQL State | Meaning | Domain Exception |
|-----------|---------|------------------|
| `23505` | Unique violation | `DuplicateEntityException` |
| `23503` | Foreign key violation | `ReferenceNotFoundException` |
| `23502` | Not null violation | `ValidationException` |
| `23514` | Check constraint | `ValidationException` |
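A sketch of how this translation could be centralised, using Npgsql's `PostgresErrorCodes` constants; the domain exception constructors shown here are assumptions:
```csharp
// Illustrative mapping helper; exception types follow the table above.
public static class PostgresErrorMapper
{
    public static Exception Translate(PostgresException ex) => ex.SqlState switch
    {
        PostgresErrorCodes.UniqueViolation     => new DuplicateEntityException(ex.ConstraintName ?? "unknown", ex),
        PostgresErrorCodes.ForeignKeyViolation => new ReferenceNotFoundException(ex.ConstraintName ?? "unknown", ex),
        PostgresErrorCodes.NotNullViolation    => new ValidationException(ex.ColumnName ?? "unknown", ex),
        PostgresErrorCodes.CheckViolation      => new ValidationException(ex.ConstraintName ?? "unknown", ex),
        _ => ex
    };
}
```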
### 7.3 Timeout Handling
**RULE:** Query timeouts MUST be caught and logged with context.
```csharp
try
{
await cmd.ExecuteNonQueryAsync(ct);
}
catch (NpgsqlException ex) when (ex.InnerException is TimeoutException)
{
_logger.LogWarning(ex, "Query timeout for schedule {ScheduleId}", scheduleId);
throw new QueryTimeoutException("Database query timed out", ex);
}
```
---
## 8. Pagination Rules
### 8.1 Keyset Pagination
**RULE:** Use keyset pagination, NOT offset pagination for large result sets.
```csharp
// ✓ CORRECT - Keyset pagination
cmd.CommandText = """
select id, name, created_at
from scheduler.schedules
where tenant_id = @tenant_id
and (created_at, id) < (@cursor_created_at, @cursor_id)
order by created_at desc, id desc
limit @page_size
""";
// ✗ INCORRECT - Offset pagination (slow for large offsets)
cmd.CommandText = """
select id, name, created_at
from scheduler.schedules
where tenant_id = @tenant_id
order by created_at desc
limit @page_size offset @offset
""";
```
### 8.2 Default Page Size
**RULE:** Default page size MUST be 50. Maximum page size MUST be 1000.
```csharp
public class QueryOptions
{
public int PageSize { get; init; } = 50;
public int GetValidatedPageSize()
=> Math.Clamp(PageSize, 1, 1000);
}
```
### 8.3 Continuation Tokens
**RULE:** Pagination cursors MUST be opaque, encoded tokens containing sort key values.
```csharp
public record PaginationCursor(DateTimeOffset CreatedAt, Guid Id)
{
public string Encode()
=> Convert.ToBase64String(
JsonSerializer.SerializeToUtf8Bytes(this));
public static PaginationCursor? Decode(string? token)
=> string.IsNullOrEmpty(token)
? null
: JsonSerializer.Deserialize<PaginationCursor>(
Convert.FromBase64String(token));
}
```
---
## 9. Ordering Rules
### 9.1 Deterministic Ordering
**RULE:** All queries returning multiple rows MUST have an ORDER BY clause that produces deterministic results.
```csharp
// ✓ CORRECT - Deterministic (includes unique column)
cmd.CommandText = """
select * from scheduler.runs
order by created_at desc, id asc
""";
// ✗ INCORRECT - Non-deterministic (created_at may have ties)
cmd.CommandText = """
select * from scheduler.runs
order by created_at desc
""";
```
### 9.2 Stable Ordering for JSONB Arrays
**RULE:** When serializing arrays to JSONB, ensure consistent ordering.
```csharp
// ✓ CORRECT - Sorted before serialization
var sortedTags = schedule.Tags.OrderBy(t => t).ToList();
cmd.Parameters.AddWithValue("tags", sortedTags.ToArray());
// ✗ INCORRECT - Order may vary
cmd.Parameters.AddWithValue("tags", schedule.Tags.ToArray());
```
---
## 10. Audit Rules
### 10.1 Timestamp Columns
**RULE:** All mutable tables MUST have `created_at` and `updated_at` columns.
```sql
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
```
### 10.2 Update Timestamp
**RULE:** `updated_at` MUST be set on every UPDATE operation.
```csharp
// ✓ CORRECT
cmd.CommandText = """
update scheduler.schedules
set name = @name, updated_at = @updated_at
where id = @id
""";
cmd.Parameters.AddWithValue("updated_at", DateTimeOffset.UtcNow);
// ✗ INCORRECT - Missing updated_at
cmd.CommandText = """
update scheduler.schedules
set name = @name
where id = @id
""";
```
### 10.3 Soft Delete Pattern
**RULE:** For audit-required entities, use soft delete with `deleted_at` and `deleted_by`.
```csharp
cmd.CommandText = """
update scheduler.schedules
set deleted_at = @deleted_at, deleted_by = @deleted_by
where tenant_id = @tenant_id and id = @id and deleted_at is null
""";
```
---
## 11. Testing Rules
### 11.1 Integration Test Database
**RULE:** Integration tests MUST use Testcontainers with PostgreSQL.
```csharp
public class PostgresFixture : IAsyncLifetime
{
private readonly PostgreSqlContainer _container = new PostgreSqlBuilder()
.WithImage("postgres:16")
.Build();
public string ConnectionString => _container.GetConnectionString();
public Task InitializeAsync() => _container.StartAsync();
public Task DisposeAsync() => _container.DisposeAsync().AsTask();
}
```
### 11.2 Test Isolation
**RULE:** Each test MUST run in a transaction that is rolled back after the test.
```csharp
public class ScheduleRepositoryTests : IClassFixture<PostgresFixture>
{
[Fact]
public async Task GetAsync_ReturnsSchedule_WhenExists()
{
await using var connection = await _fixture.OpenConnectionAsync();
await using var transaction = await connection.BeginTransactionAsync();
try
{
// Arrange, Act, Assert
}
finally
{
await transaction.RollbackAsync();
}
}
}
```
### 11.3 Determinism Tests
**RULE:** Every repository MUST have tests verifying deterministic output ordering.
```csharp
[Fact]
public async Task ListAsync_ReturnsDeterministicOrder()
{
// Insert records with same created_at
// Verify order is consistent across multiple calls
var result1 = await _repository.ListAsync(tenantId);
var result2 = await _repository.ListAsync(tenantId);
result1.Should().BeEquivalentTo(result2, options =>
options.WithStrictOrdering());
}
```
---
## 12. Migration Rules
### 12.1 Idempotent Migrations
**RULE:** All migrations MUST be idempotent using `IF NOT EXISTS` / `IF EXISTS`.
```sql
-- ✓ CORRECT
CREATE TABLE IF NOT EXISTS scheduler.schedules (...);
CREATE INDEX IF NOT EXISTS idx_schedules_tenant ON scheduler.schedules(tenant_id);
-- ✗ INCORRECT
CREATE TABLE scheduler.schedules (...); -- Fails if exists
```
### 12.2 No Breaking Changes
**RULE:** Migrations MUST NOT break existing code. Use expand-contract pattern.
```
Expand Phase:
1. Add new column as nullable
2. Deploy code that writes to both old and new columns
3. Backfill new column
Contract Phase:
4. Deploy code that reads from new column only
5. Add NOT NULL constraint
6. Drop old column
```
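As an illustration, an expand-phase migration might look like the sketch below; the `IPostgresMigration` shape (an id plus an apply method) is assumed here and may differ from the shared library's final contract:
```csharp
// Sketch of an expand-phase migration adding a hypothetical column.
public sealed class AddScheduleTimezoneColumn : IPostgresMigration
{
    public string Id => "20251128_01_scheduler_add_timezone";

    public async Task ApplyAsync(NpgsqlConnection connection, CancellationToken ct)
    {
        await using var cmd = connection.CreateCommand();
        // Expand phase: add the column as nullable so existing code keeps working.
        // The NOT NULL constraint and the old-column drop land in a later contract migration.
        cmd.CommandText = """
            alter table scheduler.schedules
                add column if not exists timezone text
            """;
        await cmd.ExecuteNonQueryAsync(ct);
    }
}
```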
### 12.3 Index Creation
**RULE:** Large table indexes MUST be created with `CONCURRENTLY`.
```sql
-- ✓ CORRECT - Won't lock table
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_large_table_col
ON schema.large_table(column);
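-- Note: CREATE INDEX CONCURRENTLY cannot run inside a transaction block, so the
-- migration runner must execute this statement outside any wrapping transaction.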
-- ✗ INCORRECT - Locks table during creation
CREATE INDEX idx_large_table_col ON schema.large_table(column);
```
---
## 13. Configuration Rules
### 13.1 Backend Selection
**RULE:** Storage backend MUST be configurable per module.
```json
{
"Persistence": {
"Authority": "Postgres",
"Scheduler": "Postgres",
"Concelier": "Mongo"
}
}
```
### 13.2 Connection String Security
**RULE:** Connection strings MUST NOT be logged or included in exception messages.
```csharp
// ✓ CORRECT
catch (NpgsqlException ex)
{
_logger.LogError(ex, "Database connection failed for module {Module}", moduleName);
throw;
}
// ✗ INCORRECT
catch (NpgsqlException ex)
{
_logger.LogError("Failed to connect: {ConnectionString}", connectionString);
}
```
### 13.3 Timeout Configuration
**RULE:** Command timeout MUST be configurable with sensible defaults.
```csharp
public class PostgresOptions
{
public int CommandTimeoutSeconds { get; set; } = 30;
public int ConnectionTimeoutSeconds { get; set; } = 15;
}
```
---
## 14. Documentation Rules
### 14.1 Repository Method Documentation
**RULE:** All public repository methods MUST have XML documentation.
```csharp
/// <summary>
/// Retrieves a schedule by its unique identifier.
/// </summary>
/// <param name="tenantId">The tenant identifier for isolation.</param>
/// <param name="scheduleId">The schedule's unique identifier.</param>
/// <param name="cancellationToken">Cancellation token.</param>
/// <returns>The schedule if found; otherwise, null.</returns>
Task<Schedule?> GetAsync(string tenantId, string scheduleId, CancellationToken cancellationToken);
```
### 14.2 SQL Comment Headers
**RULE:** Complex SQL queries SHOULD have a comment explaining the purpose.
```csharp
cmd.CommandText = """
-- Find schedules due to fire within the next minute
-- Uses compound index (tenant_id, next_fire_time) for efficiency
select s.id, s.name, t.next_fire_time
from scheduler.schedules s
join scheduler.triggers t on t.schedule_id = s.id
where s.tenant_id = @tenant_id
and s.enabled = true
and t.next_fire_time <= @window_end
order by t.next_fire_time asc
""";
```
---
## Enforcement
### Code Review Checklist
- [ ] Repository interfaces in Core layer
- [ ] PostgreSQL repositories prefixed with `Postgres`
- [ ] All connections disposed with `await using`
- [ ] Tenant ID required and used in all queries
- [ ] Parameterized queries (no string interpolation for values)
- [ ] Schema-qualified table names
- [ ] Explicit column lists (no `SELECT *`)
- [ ] Deterministic ORDER BY clauses
- [ ] Timestamps are UTC
- [ ] JSONB serialized with System.Text.Json
- [ ] PostgresException caught for constraint violations
- [ ] Integration tests use Testcontainers
### Automated Checks
These rules are enforced by:
- Roslyn analyzers in `StellaOps.Analyzers`
- SQL linting in CI pipeline
- Integration test requirements
---
*Document Version: 1.0.0*
*Last Updated: 2025-11-28*

docs/db/SPECIFICATION.md Normal file

File diff suppressed because it is too large

docs/db/VERIFICATION.md Normal file

@@ -0,0 +1,961 @@
# Database Verification Requirements
**Version:** 1.0.0
**Status:** DRAFT
**Last Updated:** 2025-11-28
---
## Purpose
This document defines the verification and testing requirements for the MongoDB to PostgreSQL conversion. It ensures that the conversion maintains data integrity, determinism, and functional correctness.
---
## 1. Verification Principles
### 1.1 Core Guarantees
The conversion MUST maintain these guarantees:
| Guarantee | Description | Verification Method |
|-----------|-------------|---------------------|
| **Data Integrity** | No data loss during conversion | Record count comparison, checksum validation |
| **Determinism** | Same inputs produce identical outputs | Parallel pipeline comparison |
| **Functional Equivalence** | APIs behave identically | Integration test suite |
| **Performance Parity** | No significant degradation | Benchmark comparison |
| **Tenant Isolation** | Data remains properly isolated | Cross-tenant query tests |
### 1.2 Verification Levels
```
Level 1: Unit Tests
└── Individual repository method correctness
Level 2: Integration Tests
└── End-to-end repository operations with real PostgreSQL
Level 3: Comparison Tests
└── MongoDB vs PostgreSQL output comparison
Level 4: Load Tests
└── Performance and scalability verification
Level 5: Production Verification
└── Dual-write monitoring and validation
```
---
## 2. Test Infrastructure
### 2.1 Testcontainers Setup
All PostgreSQL integration tests MUST use Testcontainers:
```csharp
public sealed class PostgresTestFixture : IAsyncLifetime
{
private readonly PostgreSqlContainer _container;
private NpgsqlDataSource? _dataSource;
public PostgresTestFixture()
{
_container = new PostgreSqlBuilder()
.WithImage("postgres:16-alpine")
.WithDatabase("stellaops_test")
.WithUsername("test")
.WithPassword("test")
.WithWaitStrategy(Wait.ForUnixContainer()
.UntilPortIsAvailable(5432))
.Build();
}
public string ConnectionString => _container.GetConnectionString();
public NpgsqlDataSource DataSource => _dataSource
?? throw new InvalidOperationException("Not initialized");
public async Task InitializeAsync()
{
await _container.StartAsync();
_dataSource = NpgsqlDataSource.Create(ConnectionString);
await RunMigrationsAsync();
}
public async Task DisposeAsync()
{
if (_dataSource is not null)
await _dataSource.DisposeAsync();
await _container.DisposeAsync();
}
private async Task RunMigrationsAsync()
{
await using var connection = await _dataSource!.OpenConnectionAsync();
var migrationRunner = new PostgresMigrationRunner(_dataSource, GetMigrations());
await migrationRunner.RunAsync();
}
}
```
### 2.2 Test Database State Management
```csharp
public abstract class PostgresRepositoryTestBase : IAsyncLifetime
{
protected readonly PostgresTestFixture Fixture;
protected NpgsqlConnection Connection = null!;
protected NpgsqlTransaction Transaction = null!;
protected PostgresRepositoryTestBase(PostgresTestFixture fixture)
{
Fixture = fixture;
}
public async Task InitializeAsync()
{
Connection = await Fixture.DataSource.OpenConnectionAsync();
Transaction = await Connection.BeginTransactionAsync();
// Set test tenant context
await using var cmd = Connection.CreateCommand();
cmd.CommandText = "SET app.tenant_id = 'test-tenant-id'";
await cmd.ExecuteNonQueryAsync();
}
public async Task DisposeAsync()
{
await Transaction.RollbackAsync();
await Transaction.DisposeAsync();
await Connection.DisposeAsync();
}
}
```
### 2.3 Test Data Builders
```csharp
public sealed class ScheduleBuilder
{
private Guid _id = Guid.NewGuid();
private string _tenantId = "test-tenant";
private string _name = "test-schedule";
private bool _enabled = true;
private string? _cronExpression = "0 * * * *";
public ScheduleBuilder WithId(Guid id) { _id = id; return this; }
public ScheduleBuilder WithTenant(string tenantId) { _tenantId = tenantId; return this; }
public ScheduleBuilder WithName(string name) { _name = name; return this; }
public ScheduleBuilder Enabled(bool enabled = true) { _enabled = enabled; return this; }
public ScheduleBuilder WithCron(string? cron) { _cronExpression = cron; return this; }
public Schedule Build() => new()
{
Id = _id,
TenantId = _tenantId,
Name = _name,
Enabled = _enabled,
CronExpression = _cronExpression,
Timezone = "UTC",
Mode = ScheduleMode.Scheduled,
CreatedAt = DateTimeOffset.UtcNow,
UpdatedAt = DateTimeOffset.UtcNow
};
}
```
---
## 3. Unit Test Requirements
### 3.1 Repository CRUD Tests
Every repository implementation MUST have tests for:
```csharp
public class PostgresScheduleRepositoryTests : PostgresRepositoryTestBase
{
private readonly PostgresScheduleRepository _repository;
public PostgresScheduleRepositoryTests(PostgresTestFixture fixture)
: base(fixture)
{
_repository = new PostgresScheduleRepository(/* ... */);
}
// CREATE
[Fact]
public async Task UpsertAsync_CreatesNewSchedule_WhenNotExists()
{
var schedule = new ScheduleBuilder().Build();
await _repository.UpsertAsync(schedule, CancellationToken.None);
var retrieved = await _repository.GetAsync(
schedule.TenantId, schedule.Id.ToString(), CancellationToken.None);
retrieved.Should().BeEquivalentTo(schedule);
}
// READ
[Fact]
public async Task GetAsync_ReturnsNull_WhenNotExists()
{
var result = await _repository.GetAsync(
"tenant", Guid.NewGuid().ToString(), CancellationToken.None);
result.Should().BeNull();
}
[Fact]
public async Task GetAsync_ReturnsSchedule_WhenExists()
{
var schedule = new ScheduleBuilder().Build();
await _repository.UpsertAsync(schedule, CancellationToken.None);
var result = await _repository.GetAsync(
schedule.TenantId, schedule.Id.ToString(), CancellationToken.None);
result.Should().NotBeNull();
result!.Id.Should().Be(schedule.Id);
}
// UPDATE
[Fact]
public async Task UpsertAsync_UpdatesExisting_WhenExists()
{
var schedule = new ScheduleBuilder().Build();
await _repository.UpsertAsync(schedule, CancellationToken.None);
schedule = schedule with { Name = "updated-name" };
await _repository.UpsertAsync(schedule, CancellationToken.None);
var retrieved = await _repository.GetAsync(
schedule.TenantId, schedule.Id.ToString(), CancellationToken.None);
retrieved!.Name.Should().Be("updated-name");
}
// DELETE
[Fact]
public async Task SoftDeleteAsync_SetsDeletedAt_WhenExists()
{
var schedule = new ScheduleBuilder().Build();
await _repository.UpsertAsync(schedule, CancellationToken.None);
var result = await _repository.SoftDeleteAsync(
schedule.TenantId, schedule.Id.ToString(),
"test-user", DateTimeOffset.UtcNow, CancellationToken.None);
result.Should().BeTrue();
var retrieved = await _repository.GetAsync(
schedule.TenantId, schedule.Id.ToString(), CancellationToken.None);
retrieved.Should().BeNull(); // Soft-deleted not returned
}
// LIST
[Fact]
public async Task ListAsync_ReturnsAllForTenant()
{
var schedule1 = new ScheduleBuilder().WithName("schedule-1").Build();
var schedule2 = new ScheduleBuilder().WithName("schedule-2").Build();
await _repository.UpsertAsync(schedule1, CancellationToken.None);
await _repository.UpsertAsync(schedule2, CancellationToken.None);
var results = await _repository.ListAsync(
schedule1.TenantId, null, CancellationToken.None);
results.Should().HaveCount(2);
}
}
```
### 3.2 Tenant Isolation Tests
```csharp
public class TenantIsolationTests : PostgresRepositoryTestBase
{
[Fact]
public async Task GetAsync_DoesNotReturnOtherTenantData()
{
var tenant1Schedule = new ScheduleBuilder()
.WithTenant("tenant-1")
.WithName("tenant1-schedule")
.Build();
var tenant2Schedule = new ScheduleBuilder()
.WithTenant("tenant-2")
.WithName("tenant2-schedule")
.Build();
await _repository.UpsertAsync(tenant1Schedule, CancellationToken.None);
await _repository.UpsertAsync(tenant2Schedule, CancellationToken.None);
// Tenant 1 should not see Tenant 2's data
var result = await _repository.GetAsync(
"tenant-1", tenant2Schedule.Id.ToString(), CancellationToken.None);
result.Should().BeNull();
}
[Fact]
public async Task ListAsync_OnlyReturnsTenantData()
{
// Create schedules for two tenants
for (int i = 0; i < 5; i++)
{
await _repository.UpsertAsync(
new ScheduleBuilder().WithTenant("tenant-1").Build(),
CancellationToken.None);
await _repository.UpsertAsync(
new ScheduleBuilder().WithTenant("tenant-2").Build(),
CancellationToken.None);
}
var tenant1Results = await _repository.ListAsync(
"tenant-1", null, CancellationToken.None);
var tenant2Results = await _repository.ListAsync(
"tenant-2", null, CancellationToken.None);
tenant1Results.Should().HaveCount(5);
tenant2Results.Should().HaveCount(5);
tenant1Results.Should().OnlyContain(s => s.TenantId == "tenant-1");
tenant2Results.Should().OnlyContain(s => s.TenantId == "tenant-2");
}
}
```
### 3.3 Determinism Tests
```csharp
public class DeterminismTests : PostgresRepositoryTestBase
{
[Fact]
public async Task ListAsync_ReturnsDeterministicOrder()
{
// Insert multiple schedules with same created_at
var baseTime = DateTimeOffset.UtcNow;
var schedules = Enumerable.Range(0, 10)
.Select(i => new ScheduleBuilder()
.WithName($"schedule-{i}")
.Build() with { CreatedAt = baseTime })
.ToList();
foreach (var schedule in schedules)
await _repository.UpsertAsync(schedule, CancellationToken.None);
// Multiple calls should return same order
var results1 = await _repository.ListAsync("test-tenant", null, CancellationToken.None);
var results2 = await _repository.ListAsync("test-tenant", null, CancellationToken.None);
var results3 = await _repository.ListAsync("test-tenant", null, CancellationToken.None);
results1.Select(s => s.Id).Should().Equal(results2.Select(s => s.Id));
results2.Select(s => s.Id).Should().Equal(results3.Select(s => s.Id));
}
[Fact]
public async Task JsonbSerialization_IsDeterministic()
{
var schedule = new ScheduleBuilder()
.Build() with
{
Selection = new ScheduleSelector
{
Tags = new[] { "z", "a", "m" },
Repositories = new[] { "repo-2", "repo-1" }
}
};
await _repository.UpsertAsync(schedule, CancellationToken.None);
// Retrieve and re-save multiple times
for (int i = 0; i < 3; i++)
{
var retrieved = await _repository.GetAsync(
schedule.TenantId, schedule.Id.ToString(), CancellationToken.None);
await _repository.UpsertAsync(retrieved!, CancellationToken.None);
}
// Final retrieval should have identical JSONB
var final = await _repository.GetAsync(
schedule.TenantId, schedule.Id.ToString(), CancellationToken.None);
// Arrays should be consistently ordered
final!.Selection.Tags.Should().BeInAscendingOrder();
}
}
```
---
## 4. Comparison Test Requirements
### 4.1 MongoDB vs PostgreSQL Comparison Framework
```csharp
public abstract class ComparisonTestBase<TEntity, TRepository>
where TRepository : class
{
protected readonly TRepository MongoRepository;
protected readonly TRepository PostgresRepository;
protected abstract Task<TEntity?> GetFromMongo(string tenantId, string id);
protected abstract Task<TEntity?> GetFromPostgres(string tenantId, string id);
protected abstract Task<IReadOnlyList<TEntity>> ListFromMongo(string tenantId);
protected abstract Task<IReadOnlyList<TEntity>> ListFromPostgres(string tenantId);
[Fact]
public async Task Get_ReturnsSameEntity_FromBothBackends()
{
var entityId = GetTestEntityId();
var tenantId = GetTestTenantId();
var mongoResult = await GetFromMongo(tenantId, entityId);
var postgresResult = await GetFromPostgres(tenantId, entityId);
postgresResult.Should().BeEquivalentTo(mongoResult, options =>
options.Excluding(e => e.Path.Contains("Id"))); // IDs may differ
}
[Fact]
public async Task List_ReturnsSameEntities_FromBothBackends()
{
var tenantId = GetTestTenantId();
var mongoResults = await ListFromMongo(tenantId);
var postgresResults = await ListFromPostgres(tenantId);
postgresResults.Should().BeEquivalentTo(mongoResults, options =>
options
.Excluding(e => e.Path.Contains("Id"))
.WithStrictOrdering()); // Order must match
}
}
```
### 4.2 Advisory Matching Comparison
```csharp
public class AdvisoryMatchingComparisonTests
{
[Theory]
[MemberData(nameof(GetSampleSboms))]
public async Task VulnerabilityMatching_ProducesSameResults(string sbomPath)
{
var sbom = await LoadSbomAsync(sbomPath);
// Configure Mongo backend
var mongoConfig = CreateConfig("Mongo");
var mongoScanner = CreateScanner(mongoConfig);
var mongoFindings = await mongoScanner.ScanAsync(sbom);
// Configure Postgres backend
var postgresConfig = CreateConfig("Postgres");
var postgresScanner = CreateScanner(postgresConfig);
var postgresFindings = await postgresScanner.ScanAsync(sbom);
// Compare findings
postgresFindings.Should().BeEquivalentTo(mongoFindings, options =>
options
.WithStrictOrdering()
.Using<DateTimeOffset>(ctx =>
ctx.Subject.Should().BeCloseTo(ctx.Expectation, TimeSpan.FromSeconds(1)))
.WhenTypeIs<DateTimeOffset>());
}
public static IEnumerable<object[]> GetSampleSboms()
{
yield return new object[] { "testdata/sbom-alpine-3.18.json" };
yield return new object[] { "testdata/sbom-debian-12.json" };
yield return new object[] { "testdata/sbom-nodejs-app.json" };
yield return new object[] { "testdata/sbom-python-app.json" };
}
}
```
### 4.3 VEX Graph Comparison
```csharp
public class GraphRevisionComparisonTests
{
[Theory]
[MemberData(nameof(GetTestProjects))]
public async Task GraphComputation_ProducesIdenticalRevisionId(string projectId)
{
// Compute graph with Mongo backend
var mongoGraph = await ComputeGraphAsync(projectId, "Mongo");
// Compute graph with Postgres backend
var postgresGraph = await ComputeGraphAsync(projectId, "Postgres");
// Revision ID MUST be identical (hash-stable)
postgresGraph.RevisionId.Should().Be(mongoGraph.RevisionId);
// Node and edge counts should match
postgresGraph.NodeCount.Should().Be(mongoGraph.NodeCount);
postgresGraph.EdgeCount.Should().Be(mongoGraph.EdgeCount);
// VEX statements should match
var mongoStatements = await GetStatementsAsync(projectId, "Mongo");
var postgresStatements = await GetStatementsAsync(projectId, "Postgres");
postgresStatements.Should().BeEquivalentTo(mongoStatements, options =>
options
.Excluding(s => s.Id)
.WithStrictOrdering());
}
}
```
---
## 5. Performance Test Requirements
### 5.1 Benchmark Framework
```csharp
[MemoryDiagnoser]
[SimpleJob(RuntimeMoniker.Net80)]
[GroupBenchmarksBy(BenchmarkLogicalGroupRule.ByCategory)]
public class RepositoryBenchmarks
{
    private IScheduleRepository _mongoRepository = null!;
    private IScheduleRepository _postgresRepository = null!;
    private string _tenantId = null!;
    private string _testScheduleId = null!;
    [GlobalSetup]
    public async Task Setup()
    {
        // Initialize both repositories and seed comparable data
        _mongoRepository = await CreateMongoRepositoryAsync();
        _postgresRepository = await CreatePostgresRepositoryAsync();
        (_tenantId, _testScheduleId) = await SeedTestDataAsync();
    }
    // Benchmarks are grouped by category so each Mongo baseline is compared
    // only against its PostgreSQL counterpart.
    [BenchmarkCategory("GetById"), Benchmark(Baseline = true)]
    public async Task<Schedule?> Mongo_GetById()
    {
        return await _mongoRepository.GetAsync(_tenantId, _testScheduleId, CancellationToken.None);
    }
    [BenchmarkCategory("GetById"), Benchmark]
    public async Task<Schedule?> Postgres_GetById()
    {
        return await _postgresRepository.GetAsync(_tenantId, _testScheduleId, CancellationToken.None);
    }
    [BenchmarkCategory("List100"), Benchmark(Baseline = true)]
    public async Task<IReadOnlyList<Schedule>> Mongo_List100()
    {
        return await _mongoRepository.ListAsync(_tenantId,
            new QueryOptions { PageSize = 100 }, CancellationToken.None);
    }
    [BenchmarkCategory("List100"), Benchmark]
    public async Task<IReadOnlyList<Schedule>> Postgres_List100()
    {
        return await _postgresRepository.ListAsync(_tenantId,
            new QueryOptions { PageSize = 100 }, CancellationToken.None);
    }
}
```
### 5.2 Performance Acceptance Criteria
| Operation | Mongo Baseline | Postgres Target | Maximum Acceptable |
|-----------|----------------|-----------------|-------------------|
| Get by ID | X ms | ≤ X ms | ≤ 1.5X ms |
| List (100 items) | Y ms | ≤ Y ms | ≤ 1.5Y ms |
| Insert | Z ms | ≤ Z ms | ≤ 2Z ms |
| Update | W ms | ≤ W ms | ≤ 2W ms |
| Complex query | V ms | ≤ V ms | ≤ 2V ms |
### 5.3 Load Test Scenarios
```yaml
# k6 load test configuration
scenarios:
constant_load:
executor: constant-arrival-rate
rate: 100
timeUnit: 1s
duration: 5m
preAllocatedVUs: 50
maxVUs: 100
spike_test:
executor: ramping-arrival-rate
startRate: 10
timeUnit: 1s
stages:
- duration: 1m
target: 10
- duration: 1m
target: 100
- duration: 2m
target: 100
- duration: 1m
target: 10
thresholds:
http_req_duration:
- p(95) < 200 # 95th percentile under 200ms
- p(99) < 500 # 99th percentile under 500ms
http_req_failed:
- rate < 0.01 # Error rate under 1%
```
---
## 6. Data Integrity Verification
### 6.1 Record Count Verification
```csharp
public class DataIntegrityVerifier
{
public async Task<VerificationResult> VerifyCountsAsync(string module)
{
var results = new Dictionary<string, (long mongo, long postgres)>();
foreach (var collection in GetCollections(module))
{
var mongoCount = await _mongoDb.GetCollection<BsonDocument>(collection)
.CountDocumentsAsync(FilterDefinition<BsonDocument>.Empty);
var postgresCount = await GetPostgresCountAsync(collection);
results[collection] = (mongoCount, postgresCount);
}
return new VerificationResult
{
Module = module,
Counts = results,
AllMatch = results.All(r => r.Value.mongo == r.Value.postgres)
};
}
}
```
### 6.2 Checksum Verification
```csharp
public class ChecksumVerifier
{
public async Task<bool> VerifyAdvisoryChecksumAsync(string advisoryKey)
{
var mongoAdvisory = await _mongoAdvisoryRepo.GetAsync(advisoryKey);
var postgresAdvisory = await _postgresAdvisoryRepo.GetAsync(advisoryKey);
if (mongoAdvisory is null || postgresAdvisory is null)
return mongoAdvisory is null && postgresAdvisory is null;
var mongoChecksum = ComputeChecksum(mongoAdvisory);
var postgresChecksum = ComputeChecksum(postgresAdvisory);
return mongoChecksum == postgresChecksum;
}
private string ComputeChecksum(Advisory advisory)
{
// Serialize to canonical JSON and hash
var json = JsonSerializer.Serialize(advisory, new JsonSerializerOptions
{
PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
WriteIndented = false,
DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull
});
using var sha256 = SHA256.Create();
var hash = sha256.ComputeHash(Encoding.UTF8.GetBytes(json));
return Convert.ToHexString(hash);
}
}
```
### 6.3 Referential Integrity Verification
```csharp
public class ReferentialIntegrityTests
{
[Fact]
public async Task AllForeignKeys_ReferenceExistingRecords()
{
await using var connection = await _dataSource.OpenConnectionAsync();
await using var cmd = connection.CreateCommand();
// Check for orphaned references
cmd.CommandText = """
SELECT 'advisory_aliases' as table_name, COUNT(*) as orphan_count
FROM vuln.advisory_aliases a
LEFT JOIN vuln.advisories adv ON a.advisory_id = adv.id
WHERE adv.id IS NULL
UNION ALL
SELECT 'advisory_cvss', COUNT(*)
FROM vuln.advisory_cvss c
LEFT JOIN vuln.advisories adv ON c.advisory_id = adv.id
WHERE adv.id IS NULL
-- Add more tables...
""";
await using var reader = await cmd.ExecuteReaderAsync();
while (await reader.ReadAsync())
{
var tableName = reader.GetString(0);
var orphanCount = reader.GetInt64(1);
orphanCount.Should().Be(0, $"Table {tableName} has orphaned references");
}
}
}
```
---
## 7. Production Verification
### 7.1 Dual-Write Monitoring
```csharp
public class DualWriteMonitor
{
private readonly IMetrics _metrics;
public async Task RecordWriteAsync(
string module,
string operation,
bool mongoSuccess,
bool postgresSuccess,
TimeSpan mongoDuration,
TimeSpan postgresDuration)
{
_metrics.Counter("dual_write_total", new[]
{
("module", module),
("operation", operation),
("mongo_success", mongoSuccess.ToString()),
("postgres_success", postgresSuccess.ToString())
}).Inc();
_metrics.Histogram("dual_write_duration_ms", new[]
{
("module", module),
("operation", operation),
("backend", "mongo")
}).Observe(mongoDuration.TotalMilliseconds);
_metrics.Histogram("dual_write_duration_ms", new[]
{
("module", module),
("operation", operation),
("backend", "postgres")
}).Observe(postgresDuration.TotalMilliseconds);
if (mongoSuccess != postgresSuccess)
{
_metrics.Counter("dual_write_inconsistency", new[]
{
("module", module),
("operation", operation)
}).Inc();
_logger.LogWarning(
"Dual-write inconsistency: {Module}/{Operation} - Mongo: {Mongo}, Postgres: {Postgres}",
module, operation, mongoSuccess, postgresSuccess);
}
}
}
```
### 7.2 Read Comparison Sampling
```csharp
public class ReadComparisonSampler : BackgroundService
{
private readonly IOptions<SamplingOptions> _options;
private readonly Random _random = new();
protected override async Task ExecuteAsync(CancellationToken stoppingToken)
{
while (!stoppingToken.IsCancellationRequested)
{
if (_random.NextDouble() < _options.Value.SampleRate) // e.g., 1%
{
await CompareRandomRecordAsync(stoppingToken);
}
await Task.Delay(_options.Value.Interval, stoppingToken);
}
}
private async Task CompareRandomRecordAsync(CancellationToken ct)
{
var entityId = await GetRandomEntityIdAsync(ct);
var mongoEntity = await _mongoRepo.GetAsync(entityId, ct);
var postgresEntity = await _postgresRepo.GetAsync(entityId, ct);
if (!AreEquivalent(mongoEntity, postgresEntity))
{
_logger.LogError(
"Read comparison mismatch for entity {EntityId}",
entityId);
_metrics.Counter("read_comparison_mismatch").Inc();
}
}
}
```
### 7.3 Rollback Verification
```csharp
public class RollbackVerificationTests
{
[Fact]
public async Task Rollback_RestoresMongoAsSource_WhenPostgresFails()
{
// Simulate Postgres failure
await _postgresDataSource.DisposeAsync();
// Verify system falls back to Mongo
var config = _configuration.GetSection("Persistence");
config["Scheduler"] = "Mongo"; // Simulate config change
// Operations should continue working
var schedule = await _scheduleRepository.GetAsync(
"tenant", "schedule-id", CancellationToken.None);
schedule.Should().NotBeNull();
}
}
```
---
## 8. Module-Specific Verification
### 8.1 Authority Verification
| Test | Description | Pass Criteria |
|------|-------------|---------------|
| User CRUD | Create, read, update, delete users | All operations succeed |
| Role assignment | Assign/revoke roles | Roles correctly applied |
| Token issuance | Issue OAuth tokens | Tokens valid and verifiable |
| Token verification | Verify issued tokens | Verification succeeds |
| Login tracking | Record login attempts | Attempts logged correctly |
| License validation | Check license validity | Same result both backends |
### 8.2 Scheduler Verification
| Test | Description | Pass Criteria |
|------|-------------|---------------|
| Schedule CRUD | All CRUD operations | Data integrity preserved |
| Trigger calculation | Next fire time calculation | Identical results |
| Run history | Run creation and completion | Correct state transitions |
| Impact snapshots | Finding aggregation | Same counts and severity |
| Worker registration | Worker heartbeats | Consistent status |
### 8.3 Vulnerability Verification
| Test | Description | Pass Criteria |
|------|-------------|---------------|
| Advisory ingest | Import from feed | All advisories imported |
| Alias resolution | CVE → Advisory lookup | Same advisory returned |
| CVSS lookup | Get CVSS scores | Identical scores |
| Affected package match | PURL matching | Same vulnerabilities found |
| KEV flag lookup | Check KEV status | Correct flag status |
### 8.4 VEX Verification
| Test | Description | Pass Criteria |
|------|-------------|---------------|
| Graph revision | Compute revision ID | Identical revision IDs |
| Node/edge counts | Graph structure | Same counts |
| VEX statements | Status determination | Same statuses |
| Consensus computation | Aggregate signals | Same consensus |
| Evidence manifest | Merkle root | Identical roots |
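The graph-revision row in the table above can be automated as a cross-backend comparison test. This is a minimal sketch; the `_mongoGraphService`/`_postgresGraphService` wrappers and the sample tenant/project fields are assumptions standing in for the real Excititor services:
```csharp
[Fact]
public async Task GraphRevision_Should_Match_Between_Backends()
{
    // Hypothetical wrappers: each computes the graph for the same project
    // against a different persistence backend.
    var mongoRevision = await _mongoGraphService.ComputeGraphAsync(
        _tenantId, _projectId, CancellationToken.None);
    var postgresRevision = await _postgresGraphService.ComputeGraphAsync(
        _tenantId, _projectId, CancellationToken.None);

    // The revision id is derived purely from inputs, so both backends must agree;
    // node/edge counts guard against silent truncation.
    postgresRevision.RevisionId.Should().Be(mongoRevision.RevisionId);
    postgresRevision.NodeCount.Should().Be(mongoRevision.NodeCount);
    postgresRevision.EdgeCount.Should().Be(mongoRevision.EdgeCount);
}
```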
---
## 9. Verification Checklist
### Per-Module Checklist
- [ ] All unit tests pass with PostgreSQL
- [ ] Tenant isolation tests pass
- [ ] Determinism tests pass
- [ ] Performance benchmarks within tolerance
- [ ] Record counts match between MongoDB and PostgreSQL
- [ ] Checksum verification passes for sample data
- [ ] Referential integrity verified
- [ ] Comparison tests pass for all scenarios
- [ ] Load tests pass with acceptable metrics
### Pre-Production Checklist
- [ ] Dual-write monitoring in place
- [ ] Read comparison sampling enabled
- [ ] Rollback procedure tested
- [ ] Performance baselines established
- [ ] Alert thresholds configured
- [ ] Runbook documented
### Post-Switch Checklist
- [ ] No dual-write inconsistencies for 7 days
- [ ] Read comparison sampling shows 100% match
- [ ] Performance within acceptable range
- [ ] No data integrity alerts
- [ ] MongoDB reads disabled
- [ ] MongoDB backups archived
---
## 10. Reporting
### 10.1 Verification Report Template
```markdown
# Database Conversion Verification Report
## Module: [Module Name]
## Date: [YYYY-MM-DD]
## Status: [PASS/FAIL]
### Summary
- Total Tests: X
- Passed: Y
- Failed: Z
### Unit Tests
| Category | Passed | Failed | Notes |
|----------|--------|--------|-------|
| CRUD | | | |
| Isolation| | | |
| Determinism | | | |
### Comparison Tests
| Test | Status | Notes |
|------|--------|-------|
| | | |
### Performance
| Operation | Mongo | Postgres | Diff |
|-----------|-------|----------|------|
| | | | |
### Data Integrity
- Record count match: [YES/NO]
- Checksum verification: [PASS/FAIL]
- Referential integrity: [PASS/FAIL]
### Sign-off
- [ ] QA Engineer
- [ ] Tech Lead
- [ ] Product Owner
```
---
*Document Version: 1.0.0*
*Last Updated: 2025-11-28*


@@ -0,0 +1,404 @@
# Phase 0: Foundations
**Sprint:** 1
**Duration:** 1 sprint
**Status:** TODO
**Dependencies:** None
---
## Objectives
1. Provision PostgreSQL cluster for staging and production
2. Create shared infrastructure library (`StellaOps.Infrastructure.Postgres`)
3. Set up CI/CD pipeline for PostgreSQL migrations
4. Establish Testcontainers-based integration testing
---
## Deliverables
| Deliverable | Acceptance Criteria |
|-------------|---------------------|
| PostgreSQL cluster | Running in staging with proper configuration |
| Shared library | DataSource, migrations, extensions implemented |
| CI pipeline | PostgreSQL tests running on every PR |
| Documentation | SPECIFICATION.md, RULES.md reviewed and approved |
---
## Task Breakdown
### T0.1: PostgreSQL Cluster Provisioning
**Status:** TODO
**Assignee:** TBD
**Estimate:** 2 days
**Description:**
Provision PostgreSQL 16+ cluster with appropriate configuration for StellaOps workload.
**Subtasks:**
- [ ] T0.1.1: Select PostgreSQL hosting (managed vs self-hosted)
- [ ] T0.1.2: Create staging cluster with single primary
- [ ] T0.1.3: Configure connection pooling (PgBouncer or built-in)
- [ ] T0.1.4: Set up backup and restore procedures
- [ ] T0.1.5: Configure monitoring (pg_stat_statements, Prometheus exporter)
- [ ] T0.1.6: Document connection strings and access credentials
- [ ] T0.1.7: Configure SSL/TLS for connections
**Configuration Requirements:**
```
PostgreSQL Version: 16+
Max Connections: 100 (via pooler: 500)
Shared Buffers: 25% of RAM
Work Mem: 64MB
Maintenance Work Mem: 512MB
WAL Level: replica
Max WAL Size: 2GB
```
**Verification:**
- [ ] Can connect from development machines
- [ ] Can connect from CI/CD runners
- [ ] Monitoring dashboard shows metrics
- [ ] Backup tested and verified
---
### T0.2: Create StellaOps.Infrastructure.Postgres Library
**Status:** TODO
**Assignee:** TBD
**Estimate:** 3 days
**Description:**
Create shared library with reusable PostgreSQL infrastructure components.
**Subtasks:**
- [ ] T0.2.1: Create project `src/Shared/StellaOps.Infrastructure.Postgres/`
- [ ] T0.2.2: Add Npgsql NuGet package reference
- [ ] T0.2.3: Implement `DataSourceBase` abstract class
- [ ] T0.2.4: Implement `IPostgresMigration` interface
- [ ] T0.2.5: Implement `PostgresMigrationRunner` class
- [ ] T0.2.6: Implement `NpgsqlExtensions` helper methods
- [ ] T0.2.7: Implement `ServiceCollectionExtensions` for DI
- [ ] T0.2.8: Add XML documentation to all public APIs
- [ ] T0.2.9: Add unit tests for migration runner
**Files to Create:**
```
src/Shared/StellaOps.Infrastructure.Postgres/
├── StellaOps.Infrastructure.Postgres.csproj
├── DataSourceBase.cs
├── PostgresOptions.cs
├── Migrations/
│ ├── IPostgresMigration.cs
│ └── PostgresMigrationRunner.cs
├── Extensions/
│ ├── NpgsqlExtensions.cs
│ └── NpgsqlCommandExtensions.cs
└── ServiceCollectionExtensions.cs
```
**DataSourceBase Implementation:**
```csharp
public abstract class DataSourceBase : IAsyncDisposable
{
protected readonly NpgsqlDataSource DataSource;
protected readonly PostgresOptions Options;
protected DataSourceBase(IOptions<PostgresOptions> options)
{
Options = options.Value;
var builder = new NpgsqlDataSourceBuilder(Options.ConnectionString);
ConfigureDataSource(builder);
DataSource = builder.Build();
}
protected virtual void ConfigureDataSource(NpgsqlDataSourceBuilder builder)
{
// Override in derived classes for module-specific config
}
public async Task<NpgsqlConnection> OpenConnectionAsync(
string tenantId,
CancellationToken cancellationToken = default)
{
var connection = await DataSource.OpenConnectionAsync(cancellationToken);
await ConfigureSessionAsync(connection, tenantId, cancellationToken);
return connection;
}
    protected virtual async Task ConfigureSessionAsync(
        NpgsqlConnection connection,
        string tenantId,
        CancellationToken cancellationToken)
    {
        // Bind tenantId as a parameter via set_config instead of interpolating it
        // into the SQL, so a hostile tenant id cannot inject statements.
        await using var cmd = connection.CreateCommand();
        cmd.CommandText = $"""
            SELECT set_config('app.tenant_id', @tenant_id, false);
            SET timezone = 'UTC';
            SET statement_timeout = '{Options.CommandTimeoutSeconds}s';
            """;
        cmd.Parameters.AddWithValue("tenant_id", tenantId);
        await cmd.ExecuteNonQueryAsync(cancellationToken);
    }
public async ValueTask DisposeAsync()
{
await DataSource.DisposeAsync();
GC.SuppressFinalize(this);
}
}
```
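**NpgsqlExtensions Sketch (illustrative):**
One possible shape for the helper methods in T0.2.6. A sketch only; the method names are assumptions, not the final API:
```csharp
public static class NpgsqlExtensions
{
    // Reads a nullable text column without throwing on NULL.
    public static string? GetNullableString(this NpgsqlDataReader reader, int ordinal)
        => reader.IsDBNull(ordinal) ? null : reader.GetString(ordinal);

    // Reads a timestamptz column as a UTC DateTimeOffset.
    public static DateTimeOffset GetUtcTimestamp(this NpgsqlDataReader reader, int ordinal)
        => reader.GetFieldValue<DateTimeOffset>(ordinal);

    // Adds a parameter, mapping null to DBNull.Value.
    public static void AddNullableParameter(
        this NpgsqlCommand command, string name, object? value)
        => command.Parameters.AddWithValue(name, value ?? DBNull.Value);
}
```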
**Verification:**
- [ ] Project builds without errors
- [ ] Unit tests pass
- [ ] Can be referenced from module projects
---
### T0.3: Migration Framework Implementation
**Status:** TODO
**Assignee:** TBD
**Estimate:** 2 days
**Description:**
Implement idempotent migration framework for schema management.
**Subtasks:**
- [ ] T0.3.1: Define `IPostgresMigration` interface
- [ ] T0.3.2: Implement `PostgresMigrationRunner` with transaction support
- [ ] T0.3.3: Implement migration tracking table (`_migrations`)
- [ ] T0.3.4: Add `IHostedService` for automatic migration on startup
- [ ] T0.3.5: Add CLI command for manual migration execution
- [ ] T0.3.6: Add migration rollback support (optional)
**Migration Interface:**
```csharp
public interface IPostgresMigration
{
/// <summary>
/// Unique migration identifier (e.g., "V001_CreateAuthoritySchema")
/// </summary>
string Id { get; }
/// <summary>
/// Human-readable description
/// </summary>
string Description { get; }
/// <summary>
/// Apply the migration
/// </summary>
Task UpAsync(NpgsqlConnection connection, CancellationToken cancellationToken);
/// <summary>
/// Rollback the migration (optional)
/// </summary>
Task DownAsync(NpgsqlConnection connection, CancellationToken cancellationToken);
}
```
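**Migration Runner Sketch (illustrative):**
A minimal sketch of how `PostgresMigrationRunner` could track applied migrations in the `_migrations` table (T0.3.3) and stay idempotent. The transaction handling and ordering here are assumptions, not the final implementation:
```csharp
public sealed class PostgresMigrationRunner(
    NpgsqlDataSource dataSource,
    IEnumerable<IPostgresMigration> migrations,
    ILogger<PostgresMigrationRunner> logger)
{
    public async Task RunAsync(CancellationToken ct)
    {
        await using var connection = await dataSource.OpenConnectionAsync(ct);

        // The tracking table makes the runner idempotent across restarts.
        await using (var create = connection.CreateCommand())
        {
            create.CommandText = """
                CREATE TABLE IF NOT EXISTS _migrations (
                    id          TEXT PRIMARY KEY,
                    description TEXT NOT NULL,
                    applied_at  TIMESTAMPTZ NOT NULL DEFAULT NOW()
                );
                """;
            await create.ExecuteNonQueryAsync(ct);
        }

        // Apply in stable (ordinal) order so every environment runs the same sequence.
        foreach (var migration in migrations.OrderBy(m => m.Id, StringComparer.Ordinal))
        {
            await using var check = connection.CreateCommand();
            check.CommandText = "SELECT 1 FROM _migrations WHERE id = @id";
            check.Parameters.AddWithValue("id", migration.Id);
            if (await check.ExecuteScalarAsync(ct) is not null)
                continue; // already applied

            // Apply and record the migration in one transaction so failures roll back cleanly.
            await using var tx = await connection.BeginTransactionAsync(ct);
            await migration.UpAsync(connection, ct);

            await using var record = connection.CreateCommand();
            record.Transaction = tx;
            record.CommandText =
                "INSERT INTO _migrations (id, description) VALUES (@id, @description)";
            record.Parameters.AddWithValue("id", migration.Id);
            record.Parameters.AddWithValue("description", migration.Description);
            await record.ExecuteNonQueryAsync(ct);

            await tx.CommitAsync(ct);
            logger.LogInformation("Applied migration {MigrationId}", migration.Id);
        }
    }
}
```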
**Verification:**
- [ ] Migrations run idempotently (can run multiple times)
- [ ] Migration state tracked correctly
- [ ] Failed migrations roll back cleanly
---
### T0.4: CI/CD Pipeline Configuration
**Status:** TODO
**Assignee:** TBD
**Estimate:** 2 days
**Description:**
Add PostgreSQL integration testing to CI/CD pipeline.
**Subtasks:**
- [ ] T0.4.1: Add Testcontainers.PostgreSql NuGet package to test projects
- [ ] T0.4.2: Create `PostgresTestFixture` base class
- [ ] T0.4.3: Update CI workflow to support PostgreSQL containers
- [ ] T0.4.4: Add parallel test execution configuration
- [ ] T0.4.5: Add test coverage reporting for PostgreSQL code
**PostgresTestFixture:**
```csharp
public sealed class PostgresTestFixture : IAsyncLifetime
{
private readonly PostgreSqlContainer _container;
private NpgsqlDataSource? _dataSource;
public PostgresTestFixture()
{
_container = new PostgreSqlBuilder()
.WithImage("postgres:16-alpine")
.WithDatabase("stellaops_test")
.WithUsername("test")
.WithPassword("test")
.WithWaitStrategy(Wait.ForUnixContainer()
.UntilPortIsAvailable(5432))
.Build();
}
public string ConnectionString => _container.GetConnectionString();
public NpgsqlDataSource DataSource => _dataSource
?? throw new InvalidOperationException("Fixture not initialized");
public async Task InitializeAsync()
{
await _container.StartAsync();
_dataSource = NpgsqlDataSource.Create(ConnectionString);
}
public async Task DisposeAsync()
{
if (_dataSource is not null)
await _dataSource.DisposeAsync();
await _container.DisposeAsync();
}
}
```
**CI Workflow Update:**
```yaml
# .gitea/workflows/build-test-deploy.yml
- name: Run PostgreSQL Integration Tests
run: |
dotnet test src/StellaOps.sln \
--filter "Category=PostgresIntegration" \
--logger "trx;LogFileName=postgres-test-results.trx"
env:
TESTCONTAINERS_RYUK_DISABLED: true
```
**Verification:**
- [ ] CI pipeline runs PostgreSQL tests
- [ ] Tests can run in parallel without conflicts
- [ ] Test results reported correctly
---
### T0.5: Persistence Configuration
**Status:** TODO
**Assignee:** TBD
**Estimate:** 1 day
**Description:**
Add persistence backend configuration to all services.
**Subtasks:**
- [ ] T0.5.1: Define `PersistenceOptions` class
- [ ] T0.5.2: Add configuration section to `appsettings.json`
- [ ] T0.5.3: Update service registration to read persistence config
- [ ] T0.5.4: Add configuration validation on startup
**PersistenceOptions:**
```csharp
public sealed class PersistenceOptions
{
public const string SectionName = "Persistence";
public string Authority { get; set; } = "Mongo";
public string Scheduler { get; set; } = "Mongo";
public string Concelier { get; set; } = "Mongo";
public string Excititor { get; set; } = "Mongo";
public string Notify { get; set; } = "Mongo";
public string Policy { get; set; } = "Mongo";
}
```
**Configuration Template:**
```json
{
"Persistence": {
"Authority": "Mongo",
"Scheduler": "Mongo",
"Concelier": "Mongo",
"Excititor": "Mongo",
"Notify": "Mongo",
"Policy": "Mongo"
},
"Postgres": {
"ConnectionString": "Host=localhost;Database=stellaops;Username=stellaops;Password=secret",
"CommandTimeoutSeconds": 30,
"ConnectionTimeoutSeconds": 15
}
}
```
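**Validation Sketch (illustrative):**
One way T0.5.4 could be covered: bind and validate the section at startup so a typo in a backend name fails fast rather than at first database call. A sketch, assuming registration happens in each service's composition root:
```csharp
// Startup registration: bind the Persistence section and validate before the host runs.
services.AddOptions<PersistenceOptions>()
    .Bind(configuration.GetSection(PersistenceOptions.SectionName))
    .Validate(options =>
    {
        // Only "Mongo" and "Postgres" are recognized backends.
        var allowed = new[] { "Mongo", "Postgres" };
        return new[]
        {
            options.Authority, options.Scheduler, options.Concelier,
            options.Excititor, options.Notify, options.Policy
        }.All(backend => allowed.Contains(backend, StringComparer.OrdinalIgnoreCase));
    }, "Persistence backends must be either 'Mongo' or 'Postgres'.")
    .ValidateOnStart();
```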
**Verification:**
- [ ] Configuration loads correctly
- [ ] Invalid configuration throws on startup
- [ ] Environment variables can override settings
---
### T0.6: Documentation Review
**Status:** TODO
**Assignee:** TBD
**Estimate:** 1 day
**Description:**
Review and finalize database documentation.
**Subtasks:**
- [ ] T0.6.1: Review SPECIFICATION.md for completeness
- [ ] T0.6.2: Review RULES.md for clarity
- [ ] T0.6.3: Review VERIFICATION.md for test coverage
- [ ] T0.6.4: Get Architecture Team sign-off
- [ ] T0.6.5: Publish to team wiki/docs site
**Verification:**
- [ ] All documents reviewed by 2+ team members
- [ ] No outstanding questions or TODOs
- [ ] Architecture Team approval received
---
## Exit Criteria
- [ ] PostgreSQL cluster running and accessible
- [ ] `StellaOps.Infrastructure.Postgres` library implemented and tested
- [ ] CI pipeline running PostgreSQL integration tests
- [ ] Persistence configuration framework in place
- [ ] Documentation reviewed and approved
---
## Risks & Mitigations
| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|------------|
| PostgreSQL provisioning delays | Medium | High | Start early, have backup plan |
| Testcontainers compatibility issues | Low | Medium | Test on CI runners early |
| Configuration complexity | Low | Low | Use existing patterns from Orchestrator |
---
## Dependencies on Later Phases
Phase 0 must complete before any module conversion (Phases 1-6) can begin. The following are required:
1. PostgreSQL cluster operational
2. Shared library published
3. CI pipeline validated
4. Configuration framework deployed
---
## Notes
- Use Orchestrator module as reference for all patterns
- Prioritize getting CI pipeline working early
- Document all configuration decisions
---
*Phase Version: 1.0.0*
*Last Updated: 2025-11-28*


@@ -0,0 +1,495 @@
# Phase 1: Authority Module Conversion
**Sprint:** 2
**Duration:** 1 sprint
**Status:** TODO
**Dependencies:** Phase 0 (Foundations)
---
## Objectives
1. Create `StellaOps.Authority.Storage.Postgres` project
2. Implement full Authority schema in PostgreSQL
3. Implement all repository interfaces
4. Enable dual-write mode for validation
5. Switch Authority to PostgreSQL-only after verification
---
## Deliverables
| Deliverable | Acceptance Criteria |
|-------------|---------------------|
| Authority schema | All tables created with indexes |
| Repository implementations | All 9 interfaces implemented |
| Dual-write wrapper | Optional, for safe rollout |
| Integration tests | 100% coverage of CRUD operations |
| Verification report | MongoDB vs PostgreSQL comparison passed |
---
## Schema Reference
See [SPECIFICATION.md](../SPECIFICATION.md) Section 5.1 for complete Authority schema.
**Tables:**
- `authority.tenants`
- `authority.users`
- `authority.roles`
- `authority.user_roles`
- `authority.service_accounts`
- `authority.clients`
- `authority.scopes`
- `authority.tokens`
- `authority.revocations`
- `authority.login_attempts`
- `authority.licenses`
- `authority.license_usage`
---
## Task Breakdown
### T1.1: Create Authority.Storage.Postgres Project
**Status:** TODO
**Assignee:** TBD
**Estimate:** 0.5 days
**Description:**
Create the PostgreSQL storage project for Authority module.
**Subtasks:**
- [ ] T1.1.1: Create project `src/Authority/__Libraries/StellaOps.Authority.Storage.Postgres/`
- [ ] T1.1.2: Add reference to `StellaOps.Infrastructure.Postgres`
- [ ] T1.1.3: Add reference to `StellaOps.Authority.Core`
- [ ] T1.1.4: Create `AuthorityDataSource` class
- [ ] T1.1.5: Create `AuthorityPostgresOptions` class
- [ ] T1.1.6: Create `ServiceCollectionExtensions.cs`
**Project Structure:**
```
src/Authority/__Libraries/StellaOps.Authority.Storage.Postgres/
├── StellaOps.Authority.Storage.Postgres.csproj
├── AuthorityDataSource.cs
├── AuthorityPostgresOptions.cs
├── Repositories/
│ ├── PostgresUserRepository.cs
│ ├── PostgresRoleRepository.cs
│ ├── PostgresServiceAccountRepository.cs
│ ├── PostgresClientRepository.cs
│ ├── PostgresScopeRepository.cs
│ ├── PostgresTokenRepository.cs
│ ├── PostgresRevocationRepository.cs
│ ├── PostgresLoginAttemptRepository.cs
│ └── PostgresLicenseRepository.cs
├── Migrations/
│ └── V001_CreateAuthoritySchema.cs
└── ServiceCollectionExtensions.cs
```
**Verification:**
- [ ] Project builds without errors
- [ ] Can be referenced from Authority.WebService
---
### T1.2: Implement Schema Migrations
**Status:** TODO
**Assignee:** TBD
**Estimate:** 1 day
**Description:**
Create PostgreSQL schema migration for Authority tables.
**Subtasks:**
- [ ] T1.2.1: Create `V001_CreateAuthoritySchema` migration
- [ ] T1.2.2: Include all tables from SPECIFICATION.md
- [ ] T1.2.3: Include all indexes
- [ ] T1.2.4: Add seed data for system roles/permissions
- [ ] T1.2.5: Test migration idempotency
**Migration Implementation:**
```csharp
public sealed class V001_CreateAuthoritySchema : IPostgresMigration
{
public string Id => "V001_CreateAuthoritySchema";
public string Description => "Create Authority schema with all tables and indexes";
public async Task UpAsync(NpgsqlConnection connection, CancellationToken ct)
{
await using var cmd = connection.CreateCommand();
cmd.CommandText = AuthoritySchemaSql;
await cmd.ExecuteNonQueryAsync(ct);
}
public Task DownAsync(NpgsqlConnection connection, CancellationToken ct)
=> throw new NotSupportedException("Rollback not supported for schema creation");
private const string AuthoritySchemaSql = """
CREATE SCHEMA IF NOT EXISTS authority;
CREATE TABLE IF NOT EXISTS authority.tenants (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
code TEXT NOT NULL UNIQUE,
display_name TEXT NOT NULL,
status TEXT NOT NULL DEFAULT 'active'
CHECK (status IN ('active', 'suspended', 'trial', 'terminated')),
settings JSONB DEFAULT '{}',
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- ... rest of schema from SPECIFICATION.md
""";
}
```
**Verification:**
- [ ] Migration creates all tables
- [ ] Migration is idempotent
- [ ] Indexes created correctly
---
### T1.3: Implement User Repository
**Status:** TODO
**Assignee:** TBD
**Estimate:** 1 day
**Description:**
Implement `IUserRepository` for PostgreSQL.
**Subtasks:**
- [ ] T1.3.1: Implement `GetByIdAsync`
- [ ] T1.3.2: Implement `GetByUsernameAsync`
- [ ] T1.3.3: Implement `GetBySubjectIdAsync`
- [ ] T1.3.4: Implement `ListAsync` with pagination
- [ ] T1.3.5: Implement `CreateAsync`
- [ ] T1.3.6: Implement `UpdateAsync`
- [ ] T1.3.7: Implement `DeleteAsync`
- [ ] T1.3.8: Implement `GetRolesAsync`
- [ ] T1.3.9: Implement `AssignRoleAsync`
- [ ] T1.3.10: Implement `RevokeRoleAsync`
- [ ] T1.3.11: Write integration tests
**Interface Reference:**
```csharp
public interface IUserRepository
{
Task<User?> GetByIdAsync(string tenantId, Guid userId, CancellationToken ct);
Task<User?> GetByUsernameAsync(string tenantId, string username, CancellationToken ct);
Task<User?> GetBySubjectIdAsync(Guid subjectId, CancellationToken ct);
Task<PagedResult<User>> ListAsync(string tenantId, UserQuery query, CancellationToken ct);
Task<User> CreateAsync(User user, CancellationToken ct);
Task<User> UpdateAsync(User user, CancellationToken ct);
Task<bool> DeleteAsync(string tenantId, Guid userId, CancellationToken ct);
Task<IReadOnlyList<Role>> GetRolesAsync(string tenantId, Guid userId, CancellationToken ct);
Task AssignRoleAsync(string tenantId, Guid userId, Guid roleId, CancellationToken ct);
Task RevokeRoleAsync(string tenantId, Guid userId, Guid roleId, CancellationToken ct);
}
```
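**Implementation Sketch (illustrative):**
A minimal sketch of one repository method showing the intended Npgsql pattern. The column list, the UUID tenant-id assumption, and the `MapUser` row mapper are placeholders to be aligned with the Authority schema in SPECIFICATION.md:
```csharp
public async Task<User?> GetByUsernameAsync(
    string tenantId, string username, CancellationToken ct)
{
    await using var connection = await _dataSource.OpenConnectionAsync(tenantId, ct);
    await using var cmd = connection.CreateCommand();
    cmd.CommandText = """
        SELECT id, tenant_id, username, display_name, status, created_at, updated_at
        FROM authority.users
        WHERE tenant_id = @tenant_id AND username = @username
        """;
    // Assumes tenant ids are stored as UUID, matching authority.tenants.id.
    cmd.Parameters.AddWithValue("tenant_id", Guid.Parse(tenantId));
    cmd.Parameters.AddWithValue("username", username);

    await using var reader = await cmd.ExecuteReaderAsync(ct);
    if (!await reader.ReadAsync(ct))
        return null;

    // MapUser is a hypothetical row mapper shared by the repository methods.
    return MapUser(reader);
}
```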
**Verification:**
- [ ] All methods implemented
- [ ] Integration tests pass
- [ ] Tenant isolation verified
---
### T1.4: Implement Service Account Repository
**Status:** TODO
**Assignee:** TBD
**Estimate:** 0.5 days
**Description:**
Implement `IServiceAccountRepository` for PostgreSQL.
**Subtasks:**
- [ ] T1.4.1: Implement `GetByIdAsync`
- [ ] T1.4.2: Implement `GetByAccountIdAsync`
- [ ] T1.4.3: Implement `ListAsync`
- [ ] T1.4.4: Implement `CreateAsync`
- [ ] T1.4.5: Implement `UpdateAsync`
- [ ] T1.4.6: Implement `DeleteAsync`
- [ ] T1.4.7: Write integration tests
**Verification:**
- [ ] All methods implemented
- [ ] Integration tests pass
---
### T1.5: Implement Client Repository
**Status:** TODO
**Assignee:** TBD
**Estimate:** 0.5 days
**Description:**
Implement `IClientRepository` for PostgreSQL (OpenIddict compatible).
**Subtasks:**
- [ ] T1.5.1: Implement `GetByIdAsync`
- [ ] T1.5.2: Implement `GetByClientIdAsync`
- [ ] T1.5.3: Implement `ListAsync`
- [ ] T1.5.4: Implement `CreateAsync`
- [ ] T1.5.5: Implement `UpdateAsync`
- [ ] T1.5.6: Implement `DeleteAsync`
- [ ] T1.5.7: Write integration tests
**Verification:**
- [ ] All methods implemented
- [ ] Integration tests pass
---
### T1.6: Implement Token Repository
**Status:** TODO
**Assignee:** TBD
**Estimate:** 1 day
**Description:**
Implement `ITokenRepository` for PostgreSQL.
**Subtasks:**
- [ ] T1.6.1: Implement `GetByIdAsync`
- [ ] T1.6.2: Implement `GetByHashAsync`
- [ ] T1.6.3: Implement `CreateAsync`
- [ ] T1.6.4: Implement `RevokeAsync`
- [ ] T1.6.5: Implement `PruneExpiredAsync`
- [ ] T1.6.6: Implement `GetActiveTokensAsync`
- [ ] T1.6.7: Write integration tests
**Verification:**
- [ ] All methods implemented
- [ ] Token lookup by hash is fast
- [ ] Expired token pruning works
---
### T1.7: Implement Remaining Repositories
**Status:** TODO
**Assignee:** TBD
**Estimate:** 1.5 days
**Description:**
Implement remaining repository interfaces.
**Subtasks:**
- [ ] T1.7.1: Implement `IRoleRepository`
- [ ] T1.7.2: Implement `IScopeRepository`
- [ ] T1.7.3: Implement `IRevocationRepository`
- [ ] T1.7.4: Implement `ILoginAttemptRepository`
- [ ] T1.7.5: Implement `ILicenseRepository`
- [ ] T1.7.6: Write integration tests for all
**Verification:**
- [ ] All repositories implemented
- [ ] All integration tests pass
---
### T1.8: Add Configuration Switch
**Status:** TODO
**Assignee:** TBD
**Estimate:** 0.5 days
**Description:**
Add configuration-based backend selection for Authority.
**Subtasks:**
- [ ] T1.8.1: Update `ServiceCollectionExtensions` in Authority.WebService
- [ ] T1.8.2: Add conditional registration based on `Persistence:Authority`
- [ ] T1.8.3: Test switching between Mongo and Postgres
- [ ] T1.8.4: Document configuration options
**Implementation:**
```csharp
public static IServiceCollection AddAuthorityStorage(
this IServiceCollection services,
IConfiguration configuration)
{
var backend = configuration.GetValue<string>("Persistence:Authority") ?? "Mongo";
return backend.ToLowerInvariant() switch
{
"postgres" => services.AddAuthorityPostgresStorage(configuration),
"mongo" => services.AddAuthorityMongoStorage(configuration),
_ => throw new ArgumentException($"Unknown Authority backend: {backend}")
};
}
```
**Verification:**
- [ ] Can switch between backends via configuration
- [ ] Invalid configuration throws clear error
---
### T1.9: Implement Dual-Write Wrapper (Optional)
**Status:** TODO
**Assignee:** TBD
**Estimate:** 1 day
**Description:**
Implement dual-write repository wrapper for safe migration.
**Subtasks:**
- [ ] T1.9.1: Create `DualWriteUserRepository`
- [ ] T1.9.2: Implement write-to-both logic
- [ ] T1.9.3: Implement read-from-primary-with-fallback logic
- [ ] T1.9.4: Add metrics for dual-write operations
- [ ] T1.9.5: Add logging for inconsistencies
- [ ] T1.9.6: Create similar wrappers for other critical repositories
**Configuration Options:**
```csharp
public sealed class DualWriteOptions
{
public string PrimaryBackend { get; set; } = "Postgres";
public bool WriteToBoth { get; set; } = true;
public bool FallbackToSecondary { get; set; } = true;
public bool ConvertOnRead { get; set; } = true;
}
```
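**Dual-Write Sketch (illustrative):**
A minimal sketch of the write-to-both behaviour for a single method. In the full wrapper the class implements `IUserRepository` and every member follows the same pattern; the names here are assumptions:
```csharp
public sealed class DualWriteUserRepository(
    IUserRepository postgres,
    IUserRepository mongo,
    IOptions<DualWriteOptions> options,
    ILogger<DualWriteUserRepository> logger)
{
    public async Task<User> CreateAsync(User user, CancellationToken ct)
    {
        // The primary (PostgreSQL) write must succeed; the secondary is best-effort.
        var created = await postgres.CreateAsync(user, ct);

        if (options.Value.WriteToBoth)
        {
            try
            {
                await mongo.CreateAsync(user, ct);
            }
            catch (Exception ex)
            {
                // Log the inconsistency; the dual-write monitor picks it up via metrics.
                logger.LogWarning(ex,
                    "Dual-write to MongoDB failed for user {UserId}", created.Id);
            }
        }

        return created;
    }
}
```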
**Verification:**
- [ ] Writes go to both backends
- [ ] Reads work with fallback
- [ ] Inconsistencies are logged
---
### T1.10: Run Verification Tests
**Status:** TODO
**Assignee:** TBD
**Estimate:** 1 day
**Description:**
Verify PostgreSQL implementation matches MongoDB behavior.
**Subtasks:**
- [ ] T1.10.1: Run comparison tests for User repository
- [ ] T1.10.2: Run comparison tests for Token repository
- [ ] T1.10.3: Verify token issuance/verification flow
- [ ] T1.10.4: Verify login flow
- [ ] T1.10.5: Document any differences found
- [ ] T1.10.6: Generate verification report
**Verification Tests:**
```csharp
[Fact]
public async Task Users_Should_Match_Between_Mongo_And_Postgres()
{
var tenantIds = await GetSampleTenantIds(10);
foreach (var tenantId in tenantIds)
{
var mongoUsers = await _mongoRepo.ListAsync(tenantId, new UserQuery(), CancellationToken.None);
var postgresUsers = await _postgresRepo.ListAsync(tenantId, new UserQuery(), CancellationToken.None);
postgresUsers.Items.Should().BeEquivalentTo(mongoUsers.Items,
options => options.Excluding(u => u.Id));
}
}
```
**Verification:**
- [ ] All comparison tests pass
- [ ] No data discrepancies found
- [ ] Verification report approved
---
### T1.11: Backfill Data (If Required)
**Status:** TODO
**Assignee:** TBD
**Estimate:** 0.5 days
**Description:**
Backfill existing MongoDB data to PostgreSQL.
**Subtasks:**
- [ ] T1.11.1: Create backfill script for tenants
- [ ] T1.11.2: Create backfill script for users
- [ ] T1.11.3: Create backfill script for service accounts
- [ ] T1.11.4: Create backfill script for clients/scopes
- [ ] T1.11.5: Create backfill script for active tokens
- [ ] T1.11.6: Verify record counts match
- [ ] T1.11.7: Verify sample records match
**Verification:**
- [ ] All Tier A data backfilled
- [ ] Record counts match
- [ ] Sample verification passed
---
### T1.12: Switch to PostgreSQL-Only
**Status:** TODO
**Assignee:** TBD
**Estimate:** 0.5 days
**Description:**
Switch Authority to PostgreSQL-only mode.
**Subtasks:**
- [ ] T1.12.1: Update configuration to `"Authority": "Postgres"`
- [ ] T1.12.2: Deploy to staging
- [ ] T1.12.3: Run full integration test suite
- [ ] T1.12.4: Monitor for errors/issues
- [ ] T1.12.5: Deploy to production
- [ ] T1.12.6: Monitor production metrics
**Verification:**
- [ ] All tests pass in staging
- [ ] No errors in production
- [ ] Performance metrics acceptable
---
## Exit Criteria
- [ ] All repository interfaces implemented for PostgreSQL
- [ ] All integration tests pass
- [ ] Verification tests pass (MongoDB vs PostgreSQL comparison)
- [ ] Configuration switch working
- [ ] Authority running on PostgreSQL in production
- [ ] MongoDB Authority collections archived
---
## Risks & Mitigations
| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|------------|
| Token verification regression | Low | High | Extensive testing, dual-write |
| OAuth flow breakage | Low | High | Test all OAuth flows |
| Performance regression | Medium | Medium | Load testing before switch |
---
## Rollback Plan
1. Change configuration: `"Authority": "Mongo"`
2. Deploy configuration change
3. MongoDB still has all data (dual-write period)
4. Investigate and fix PostgreSQL issues
5. Re-attempt conversion
---
*Phase Version: 1.0.0*
*Last Updated: 2025-11-28*


@@ -0,0 +1,305 @@
# Phase 2: Scheduler Module Conversion
**Sprint:** 3
**Duration:** 1 sprint
**Status:** TODO
**Dependencies:** Phase 0 (Foundations)
---
## Objectives
1. Create `StellaOps.Scheduler.Storage.Postgres` project
2. Implement Scheduler schema in PostgreSQL
3. Implement 7+ repository interfaces
4. Replace MongoDB job tracking with PostgreSQL
5. Implement PostgreSQL advisory locks for distributed locking
---
## Deliverables
| Deliverable | Acceptance Criteria |
|-------------|---------------------|
| Scheduler schema | All tables created with indexes |
| Repository implementations | All 7+ interfaces implemented |
| Advisory locks | Distributed locking working |
| Integration tests | 100% coverage of CRUD operations |
| Verification report | Schedule execution verified |
---
## Schema Reference
See [SPECIFICATION.md](../SPECIFICATION.md) Section 5.4 for complete Scheduler schema.
**Tables:**
- `scheduler.schedules`
- `scheduler.triggers`
- `scheduler.runs`
- `scheduler.graph_jobs`
- `scheduler.policy_jobs`
- `scheduler.impact_snapshots`
- `scheduler.workers`
- `scheduler.execution_logs`
- `scheduler.locks`
- `scheduler.run_summaries`
- `scheduler.audit`
---
## Task Breakdown
### T2.1: Create Scheduler.Storage.Postgres Project
**Status:** TODO
**Assignee:** TBD
**Estimate:** 0.5 days
**Subtasks:**
- [ ] T2.1.1: Create project structure
- [ ] T2.1.2: Add NuGet references
- [ ] T2.1.3: Create `SchedulerDataSource` class
- [ ] T2.1.4: Create `ServiceCollectionExtensions.cs`
---
### T2.2: Implement Schema Migrations
**Status:** TODO
**Assignee:** TBD
**Estimate:** 1 day
**Subtasks:**
- [ ] T2.2.1: Create `V001_CreateSchedulerSchema` migration
- [ ] T2.2.2: Include all tables and indexes
- [ ] T2.2.3: Add partial index for active schedules
- [ ] T2.2.4: Test migration idempotency
---
### T2.3: Implement Schedule Repository
**Status:** TODO
**Assignee:** TBD
**Estimate:** 1 day
**Interface:**
```csharp
public interface IScheduleRepository
{
Task<Schedule?> GetAsync(string tenantId, string scheduleId, CancellationToken ct);
Task<IReadOnlyList<Schedule>> ListAsync(string tenantId, ScheduleQueryOptions? options, CancellationToken ct);
Task UpsertAsync(Schedule schedule, CancellationToken ct);
Task<bool> SoftDeleteAsync(string tenantId, string scheduleId, string deletedBy, DateTimeOffset deletedAt, CancellationToken ct);
Task<IReadOnlyList<Schedule>> GetDueSchedulesAsync(DateTimeOffset now, CancellationToken ct);
}
```
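**Implementation Sketch (illustrative):**
A possible shape for `GetDueSchedulesAsync`, assuming the trigger calculator maintains a precomputed `next_fire_at` column; adjust to the actual Scheduler schema in SPECIFICATION.md:
```csharp
public async Task<IReadOnlyList<Schedule>> GetDueSchedulesAsync(
    DateTimeOffset now, CancellationToken ct)
{
    await using var connection = await _dataSource.OpenConnectionAsync("system", ct);
    await using var cmd = connection.CreateCommand();

    // Assumes triggers carry a precomputed next_fire_at; soft-deleted and
    // disabled schedules are excluded.
    cmd.CommandText = """
        SELECT s.*
        FROM scheduler.schedules s
        JOIN scheduler.triggers t ON t.schedule_id = s.id
        WHERE s.enabled = TRUE
          AND s.deleted_at IS NULL
          AND t.next_fire_at <= @now
        ORDER BY t.next_fire_at
        """;
    cmd.Parameters.AddWithValue("now", now);

    var schedules = new List<Schedule>();
    await using var reader = await cmd.ExecuteReaderAsync(ct);
    while (await reader.ReadAsync(ct))
        schedules.Add(MapSchedule(reader)); // hypothetical row mapper

    return schedules;
}
```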
**Subtasks:**
- [ ] T2.3.1: Implement all interface methods
- [ ] T2.3.2: Handle soft delete correctly
- [ ] T2.3.3: Implement GetDueSchedules for trigger calculation
- [ ] T2.3.4: Write integration tests
---
### T2.4: Implement Run Repository
**Status:** TODO
**Assignee:** TBD
**Estimate:** 1 day
**Interface:**
```csharp
public interface IRunRepository
{
Task<Run?> GetAsync(string tenantId, Guid runId, CancellationToken ct);
Task<IReadOnlyList<Run>> ListAsync(string tenantId, RunQueryOptions? options, CancellationToken ct);
Task<Run> CreateAsync(Run run, CancellationToken ct);
Task<Run> UpdateAsync(Run run, CancellationToken ct);
Task<IReadOnlyList<Run>> GetPendingRunsAsync(string tenantId, CancellationToken ct);
Task<IReadOnlyList<Run>> GetRunsByScheduleAsync(string tenantId, Guid scheduleId, int limit, CancellationToken ct);
}
```
**Subtasks:**
- [ ] T2.4.1: Implement all interface methods
- [ ] T2.4.2: Handle state transitions
- [ ] T2.4.3: Implement efficient pagination
- [ ] T2.4.4: Write integration tests
---
### T2.5: Implement Graph Job Repository
**Status:** TODO
**Assignee:** TBD
**Estimate:** 0.5 days
**Subtasks:**
- [ ] T2.5.1: Implement CRUD operations
- [ ] T2.5.2: Implement status queries
- [ ] T2.5.3: Write integration tests
---
### T2.6: Implement Policy Job Repository
**Status:** TODO
**Assignee:** TBD
**Estimate:** 0.5 days
**Subtasks:**
- [ ] T2.6.1: Implement CRUD operations
- [ ] T2.6.2: Implement status queries
- [ ] T2.6.3: Write integration tests
---
### T2.7: Implement Impact Snapshot Repository
**Status:** TODO
**Assignee:** TBD
**Estimate:** 0.5 days
**Subtasks:**
- [ ] T2.7.1: Implement CRUD operations
- [ ] T2.7.2: Implement queries by run
- [ ] T2.7.3: Write integration tests
---
### T2.8: Implement Distributed Locking
**Status:** TODO
**Assignee:** TBD
**Estimate:** 1 day
**Description:**
Implement distributed locking using PostgreSQL advisory locks.
**Options:**
1. PostgreSQL advisory locks (`pg_advisory_lock`)
2. Table-based locks with SELECT FOR UPDATE SKIP LOCKED
3. Combination approach
**Subtasks:**
- [ ] T2.8.1: Choose locking strategy
- [ ] T2.8.2: Implement `IDistributedLock` interface
- [ ] T2.8.3: Implement lock acquisition with timeout
- [ ] T2.8.4: Implement lock renewal
- [ ] T2.8.5: Implement lock release
- [ ] T2.8.6: Write concurrency tests
**Implementation Example:**
```csharp
public sealed class PostgresDistributedLock : IDistributedLock
{
    private readonly SchedulerDataSource _dataSource;

    public async Task<IAsyncDisposable?> TryAcquireAsync(
        string lockKey,
        TimeSpan timeout,
        CancellationToken ct)
    {
        var lockId = ComputeLockId(lockKey);

        // Keep the connection open while the lock is held: session-level
        // advisory locks are released as soon as the session closes.
        var connection = await _dataSource.OpenConnectionAsync("system", ct);
        await using var cmd = connection.CreateCommand();
        cmd.CommandText = "SELECT pg_try_advisory_lock(@lock_id)";
        cmd.Parameters.AddWithValue("lock_id", lockId);

        var acquired = await cmd.ExecuteScalarAsync(ct) is true;
        if (!acquired)
        {
            await connection.DisposeAsync();
            return null;
        }

        return new LockHandle(connection, lockId);
    }

    // string.GetHashCode() is randomized per process; derive a stable 64-bit id
    // so every worker maps the same key to the same advisory lock.
    private static long ComputeLockId(string key)
    {
        var hash = SHA256.HashData(Encoding.UTF8.GetBytes(key));
        return BitConverter.ToInt64(hash, 0);
    }
}
```
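The `LockHandle` returned above could look like the following nested member of `PostgresDistributedLock`; disposing it releases the advisory lock and closes the session that owns it. A sketch, not the final type:
```csharp
// Nested inside PostgresDistributedLock: owns the connection for the lock's lifetime.
private sealed class LockHandle(NpgsqlConnection connection, long lockId) : IAsyncDisposable
{
    public async ValueTask DisposeAsync()
    {
        // Explicitly unlock, then close the session that holds the advisory lock.
        await using var cmd = connection.CreateCommand();
        cmd.CommandText = "SELECT pg_advisory_unlock(@lock_id)";
        cmd.Parameters.AddWithValue("lock_id", lockId);
        await cmd.ExecuteNonQueryAsync();

        await connection.DisposeAsync();
    }
}
```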
---
### T2.9: Implement Worker Registration
**Status:** TODO
**Assignee:** TBD
**Estimate:** 0.5 days
**Subtasks:**
- [ ] T2.9.1: Implement worker registration
- [ ] T2.9.2: Implement heartbeat updates
- [ ] T2.9.3: Implement dead worker detection
- [ ] T2.9.4: Write integration tests
---
### T2.10: Add Configuration Switch
**Status:** TODO
**Assignee:** TBD
**Estimate:** 0.5 days
**Subtasks:**
- [ ] T2.10.1: Update service registration
- [ ] T2.10.2: Test backend switching
- [ ] T2.10.3: Document configuration
---
### T2.11: Run Verification Tests
**Status:** TODO
**Assignee:** TBD
**Estimate:** 1 day
**Subtasks:**
- [ ] T2.11.1: Test schedule CRUD
- [ ] T2.11.2: Test run creation and state transitions
- [ ] T2.11.3: Test trigger calculation
- [ ] T2.11.4: Test distributed locking under concurrency
- [ ] T2.11.5: Test job execution end-to-end
- [ ] T2.11.6: Generate verification report
---
### T2.12: Switch to PostgreSQL-Only
**Status:** TODO
**Assignee:** TBD
**Estimate:** 0.5 days
**Subtasks:**
- [ ] T2.12.1: Update configuration
- [ ] T2.12.2: Deploy to staging
- [ ] T2.12.3: Run integration tests
- [ ] T2.12.4: Deploy to production
- [ ] T2.12.5: Monitor metrics
---
## Exit Criteria
- [ ] All repository interfaces implemented
- [ ] Distributed locking working correctly
- [ ] All integration tests pass
- [ ] Schedule execution working end-to-end
- [ ] Scheduler running on PostgreSQL in production
---
## Risks & Mitigations
| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|------------|
| Lock contention | Medium | Medium | Test under load, tune timeouts |
| Trigger calculation errors | Low | High | Extensive testing with edge cases |
| State transition bugs | Medium | Medium | State machine tests |
---
*Phase Version: 1.0.0*
*Last Updated: 2025-11-28*


@@ -0,0 +1,183 @@
# Phase 3: Notify Module Conversion
**Sprint:** 4
**Duration:** 1 sprint
**Status:** TODO
**Dependencies:** Phase 0 (Foundations)
---
## Objectives
1. Create `StellaOps.Notify.Storage.Postgres` project
2. Implement Notify schema in PostgreSQL
3. Implement 15 repository interfaces
4. Handle delivery tracking and escalation state
---
## Deliverables
| Deliverable | Acceptance Criteria |
|-------------|---------------------|
| Notify schema | All tables created with indexes |
| Repository implementations | All 15 interfaces implemented |
| Integration tests | 100% coverage of CRUD operations |
| Verification report | Notification delivery verified |
---
## Schema Reference
See [SPECIFICATION.md](../SPECIFICATION.md) Section 5.5 for complete Notify schema.
**Tables:**
- `notify.channels`
- `notify.rules`
- `notify.templates`
- `notify.deliveries`
- `notify.digests`
- `notify.quiet_hours`
- `notify.maintenance_windows`
- `notify.escalation_policies`
- `notify.escalation_states`
- `notify.on_call_schedules`
- `notify.inbox`
- `notify.incidents`
- `notify.audit`
---
## Task Breakdown
### T3.1: Create Notify.Storage.Postgres Project
**Status:** TODO
**Estimate:** 0.5 days
**Subtasks:**
- [ ] Create project structure
- [ ] Add NuGet references
- [ ] Create `NotifyDataSource` class
- [ ] Create `ServiceCollectionExtensions.cs`
---
### T3.2: Implement Schema Migrations
**Status:** TODO
**Estimate:** 1 day
**Subtasks:**
- [ ] Create schema migration
- [ ] Include all tables and indexes
- [ ] Test migration idempotency
---
### T3.3: Implement Channel Repository
**Status:** TODO
**Estimate:** 0.5 days
**Subtasks:**
- [ ] Implement CRUD operations
- [ ] Handle channel types (email, slack, teams, etc.)
- [ ] Write integration tests
---
### T3.4: Implement Rule Repository
**Status:** TODO
**Estimate:** 0.5 days
**Subtasks:**
- [ ] Implement CRUD operations
- [ ] Handle filter JSONB
- [ ] Write integration tests
---
### T3.5: Implement Template Repository
**Status:** TODO
**Estimate:** 0.5 days
**Subtasks:**
- [ ] Implement CRUD operations
- [ ] Handle localization
- [ ] Write integration tests
---
### T3.6: Implement Delivery Repository
**Status:** TODO
**Estimate:** 1 day
**Subtasks:**
- [ ] Implement CRUD operations
- [ ] Handle status transitions
- [ ] Implement retry logic
- [ ] Write integration tests
---
### T3.7: Implement Remaining Repositories
**Status:** TODO
**Estimate:** 2 days
**Subtasks:**
- [ ] Implement Digest repository
- [ ] Implement QuietHours repository
- [ ] Implement MaintenanceWindow repository
- [ ] Implement EscalationPolicy repository
- [ ] Implement EscalationState repository
- [ ] Implement OnCallSchedule repository
- [ ] Implement Inbox repository
- [ ] Implement Incident repository
- [ ] Implement Audit repository
- [ ] Write integration tests for all
---
### T3.8: Add Configuration Switch
**Status:** TODO
**Estimate:** 0.5 days
---
### T3.9: Run Verification Tests
**Status:** TODO
**Estimate:** 1 day
**Subtasks:**
- [ ] Test notification delivery flow
- [ ] Test escalation handling
- [ ] Test digest aggregation
- [ ] Generate verification report
---
### T3.10: Switch to PostgreSQL-Only
**Status:** TODO
**Estimate:** 0.5 days
---
## Exit Criteria
- [ ] All 15 repository interfaces implemented
- [ ] All integration tests pass
- [ ] Notification delivery working end-to-end
- [ ] Notify running on PostgreSQL in production
---
*Phase Version: 1.0.0*
*Last Updated: 2025-11-28*


@@ -0,0 +1,147 @@
# Phase 4: Policy Module Conversion
**Sprint:** 5
**Duration:** 1 sprint
**Status:** TODO
**Dependencies:** Phase 0 (Foundations)
---
## Objectives
1. Create `StellaOps.Policy.Storage.Postgres` project
2. Implement Policy schema in PostgreSQL
3. Handle policy pack versioning correctly
4. Implement risk profiles with version history
---
## Deliverables
| Deliverable | Acceptance Criteria |
|-------------|---------------------|
| Policy schema | All tables created with indexes |
| Repository implementations | All 4+ interfaces implemented |
| Version management | Pack versioning working correctly |
| Integration tests | 100% coverage of CRUD operations |
---
## Schema Reference
See [SPECIFICATION.md](../SPECIFICATION.md) Section 5.6 for complete Policy schema.
**Tables:**
- `policy.packs`
- `policy.pack_versions`
- `policy.rules`
- `policy.risk_profiles`
- `policy.evaluation_runs`
- `policy.explanations`
- `policy.exceptions`
- `policy.audit`
---
## Task Breakdown
### T4.1: Create Policy.Storage.Postgres Project
**Status:** TODO
**Estimate:** 0.5 days
---
### T4.2: Implement Schema Migrations
**Status:** TODO
**Estimate:** 1 day
---
### T4.3: Implement Pack Repository
**Status:** TODO
**Estimate:** 1 day
**Subtasks:**
- [ ] Implement CRUD for packs
- [ ] Implement version management
- [ ] Handle active version promotion
- [ ] Write integration tests
---
### T4.4: Implement Risk Profile Repository
**Status:** TODO
**Estimate:** 1 day
**Subtasks:**
- [ ] Implement CRUD operations
- [ ] Handle version history
- [ ] Implement GetVersionAsync
- [ ] Implement ListVersionsAsync
- [ ] Write integration tests
---
### T4.5: Implement Remaining Repositories
**Status:** TODO
**Estimate:** 1.5 days
**Subtasks:**
- [ ] Implement Evaluation Run repository
- [ ] Implement Explanation repository
- [ ] Implement Exception repository
- [ ] Implement Audit repository
- [ ] Write integration tests
---
### T4.6: Add Configuration Switch
**Status:** TODO
**Estimate:** 0.5 days
---
### T4.7: Run Verification Tests
**Status:** TODO
**Estimate:** 1 day
---
### T4.8: Migrate Active Policy Packs
**Status:** TODO
**Estimate:** 0.5 days
**Subtasks:**
- [ ] Export active packs from MongoDB
- [ ] Import to PostgreSQL
- [ ] Verify version numbers
- [ ] Verify active version settings
---
### T4.9: Switch to PostgreSQL-Only
**Status:** TODO
**Estimate:** 0.5 days
---
## Exit Criteria
- [ ] All repository interfaces implemented
- [ ] Pack versioning working correctly
- [ ] All integration tests pass
- [ ] Policy running on PostgreSQL in production
---
*Phase Version: 1.0.0*
*Last Updated: 2025-11-28*


@@ -0,0 +1,334 @@
# Phase 5: Vulnerability Index Conversion (Concelier)
**Sprint:** 6-7
**Duration:** 2 sprints
**Status:** TODO
**Dependencies:** Phase 0 (Foundations)
---
## Objectives
1. Create `StellaOps.Concelier.Storage.Postgres` project
2. Implement full vulnerability schema in PostgreSQL
3. Build advisory conversion pipeline
4. Maintain deterministic vulnerability matching
---
## Deliverables
| Deliverable | Acceptance Criteria |
|-------------|---------------------|
| Vuln schema | All tables created with indexes |
| Conversion pipeline | MongoDB advisories converted to PostgreSQL |
| Matching verification | Same CVEs found for identical SBOMs |
| Integration tests | 100% coverage of query operations |
---
## Schema Reference
See [SPECIFICATION.md](../SPECIFICATION.md) Section 5.2 for complete vulnerability schema.
**Tables:**
- `vuln.sources`
- `vuln.feed_snapshots`
- `vuln.advisory_snapshots`
- `vuln.advisories`
- `vuln.advisory_aliases`
- `vuln.advisory_cvss`
- `vuln.advisory_affected`
- `vuln.advisory_references`
- `vuln.advisory_credits`
- `vuln.advisory_weaknesses`
- `vuln.kev_flags`
- `vuln.source_states`
- `vuln.merge_events`
---
## Phase 5a: Schema & Repositories
### T5a.1: Create Concelier.Storage.Postgres Project
**Status:** TODO
**Estimate:** 0.5 days
**Subtasks:**
- [ ] Create project structure
- [ ] Add NuGet references
- [ ] Create `ConcelierDataSource` class
- [ ] Create `ServiceCollectionExtensions.cs`
---
### T5a.2: Implement Schema Migrations
**Status:** TODO
**Estimate:** 1.5 days
**Subtasks:**
- [ ] Create schema migration
- [ ] Include all tables
- [ ] Add full-text search index
- [ ] Add PURL lookup index
- [ ] Test migration idempotency
---
### T5a.3: Implement Source Repository
**Status:** TODO
**Estimate:** 0.5 days
**Subtasks:**
- [ ] Implement CRUD operations
- [ ] Implement GetByKeyAsync
- [ ] Write integration tests
---
### T5a.4: Implement Advisory Repository
**Status:** TODO
**Estimate:** 2 days
**Interface:**
```csharp
public interface IAdvisoryRepository
{
Task<Advisory?> GetByKeyAsync(string advisoryKey, CancellationToken ct);
Task<Advisory?> GetByAliasAsync(string aliasType, string aliasValue, CancellationToken ct);
Task<IReadOnlyList<Advisory>> SearchAsync(AdvisorySearchQuery query, CancellationToken ct);
Task<Advisory> UpsertAsync(Advisory advisory, CancellationToken ct);
Task<IReadOnlyList<Advisory>> GetAffectingPackageAsync(string purl, CancellationToken ct);
Task<IReadOnlyList<Advisory>> GetAffectingPackageNameAsync(string ecosystem, string name, CancellationToken ct);
}
```
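**Implementation Sketch (illustrative):**
A minimal sketch of `GetByAliasAsync` resolving a CVE identifier through the alias table. The `alias_type`/`alias_value` column names and the `MapAdvisory` row mapper are assumptions to be aligned with the schema in SPECIFICATION.md:
```csharp
public async Task<Advisory?> GetByAliasAsync(
    string aliasType, string aliasValue, CancellationToken ct)
{
    await using var connection = await _dataSource.OpenConnectionAsync("system", ct);
    await using var cmd = connection.CreateCommand();

    // Resolve e.g. ("cve", "CVE-2024-12345") to its advisory via the alias table.
    cmd.CommandText = """
        SELECT adv.*
        FROM vuln.advisories adv
        JOIN vuln.advisory_aliases a ON a.advisory_id = adv.id
        WHERE a.alias_type = @alias_type AND a.alias_value = @alias_value
        LIMIT 1
        """;
    cmd.Parameters.AddWithValue("alias_type", aliasType);
    cmd.Parameters.AddWithValue("alias_value", aliasValue);

    await using var reader = await cmd.ExecuteReaderAsync(ct);
    return await reader.ReadAsync(ct)
        ? MapAdvisory(reader) // hypothetical row mapper; child tables are loaded separately
        : null;
}
```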
**Subtasks:**
- [ ] Implement GetByKeyAsync
- [ ] Implement GetByAliasAsync (CVE lookup)
- [ ] Implement SearchAsync with full-text search
- [ ] Implement UpsertAsync with all child tables
- [ ] Implement GetAffectingPackageAsync (PURL match)
- [ ] Implement GetAffectingPackageNameAsync
- [ ] Write integration tests
---
### T5a.5: Implement Child Table Repositories
**Status:** TODO
**Estimate:** 2 days
**Subtasks:**
- [ ] Implement Alias repository
- [ ] Implement CVSS repository
- [ ] Implement Affected repository
- [ ] Implement Reference repository
- [ ] Implement Credit repository
- [ ] Implement Weakness repository
- [ ] Implement KEV repository
- [ ] Write integration tests
---
### T5a.6: Implement Source State Repository
**Status:** TODO
**Estimate:** 0.5 days
**Subtasks:**
- [ ] Implement CRUD operations
- [ ] Implement cursor management
- [ ] Write integration tests
---
## Phase 5b: Conversion & Verification
### T5b.1: Build Advisory Conversion Service
**Status:** TODO
**Estimate:** 2 days
**Description:**
Create service to convert MongoDB advisory documents to PostgreSQL relational structure.
**Subtasks:**
- [ ] Parse MongoDB `AdvisoryDocument` structure
- [ ] Map to `vuln.advisories` table
- [ ] Extract and normalize aliases
- [ ] Extract and normalize CVSS metrics
- [ ] Extract and normalize affected packages
- [ ] Preserve provenance JSONB
- [ ] Handle version ranges (keep as JSONB)
- [ ] Handle normalized versions (keep as JSONB)
**Conversion Logic:**
```csharp
public sealed class AdvisoryConverter
{
public async Task ConvertAsync(
IMongoCollection<AdvisoryDocument> source,
IAdvisoryRepository target,
CancellationToken ct)
    {
        // Stream documents with a cursor rather than loading the collection into memory.
        using var cursor = await source.Find(Builders<AdvisoryDocument>.Filter.Empty)
            .ToCursorAsync(ct);

        while (await cursor.MoveNextAsync(ct))
        {
            foreach (var doc in cursor.Current)
            {
                var advisory = MapToAdvisory(doc);
                await target.UpsertAsync(advisory, ct);
            }
        }
    }

    private Advisory MapToAdvisory(AdvisoryDocument doc)
    {
        // Extract from the BsonDocument payload; optional fields may be absent or null.
        var payload = doc.Payload;

        return new Advisory
        {
            AdvisoryKey = doc.Id,
            PrimaryVulnId = payload["primaryVulnId"].AsString,
            Title = payload.TryGetValue("title", out var title) && !title.IsBsonNull
                ? title.AsString : null,
            Summary = payload.TryGetValue("summary", out var summary) && !summary.IsBsonNull
                ? summary.AsString : null,
            // ... etc.
            // Provenance is preserved as JSON; ToJson() emits Extended JSON,
            // which System.Text.Json can still parse into a JsonElement.
            Provenance = JsonDocument.Parse(payload["provenance"].ToJson()).RootElement,
        };
    }
}
```
---
### T5b.2: Build Feed Import Pipeline
**Status:** TODO
**Estimate:** 1 day
**Description:**
Modify feed import to write directly to PostgreSQL.
**Subtasks:**
- [ ] Update NVD importer to use PostgreSQL
- [ ] Update OSV importer to use PostgreSQL
- [ ] Update GHSA importer to use PostgreSQL
- [ ] Update vendor feed importers
- [ ] Test incremental imports
---
### T5b.3: Run Parallel Import
**Status:** TODO
**Estimate:** 1 day
**Description:**
Run imports to both MongoDB and PostgreSQL simultaneously.
**Subtasks:**
- [ ] Configure dual-import mode
- [ ] Run import cycle
- [ ] Compare record counts
- [ ] Sample comparison checks
---
### T5b.4: Verify Vulnerability Matching
**Status:** TODO
**Estimate:** 2 days
**Description:**
Verify that vulnerability matching produces identical results.
**Subtasks:**
- [ ] Select sample SBOMs (various ecosystems)
- [ ] Run matching with MongoDB backend
- [ ] Run matching with PostgreSQL backend
- [ ] Compare findings (must be identical)
- [ ] Document any differences
- [ ] Fix any issues found
**Verification Tests:**
```csharp
[Theory]
[MemberData(nameof(GetSampleSboms))]
public async Task Scanner_Should_Find_Same_Vulns(string sbomPath)
{
var sbom = await LoadSbom(sbomPath);
_config["Persistence:Concelier"] = "Mongo";
var mongoFindings = await _scanner.ScanAsync(sbom);
_config["Persistence:Concelier"] = "Postgres";
var postgresFindings = await _scanner.ScanAsync(sbom);
// Strict ordering for determinism
postgresFindings.Should().BeEquivalentTo(mongoFindings,
options => options.WithStrictOrdering());
}
```
---
### T5b.5: Performance Optimization
**Status:** TODO
**Estimate:** 1 day
**Subtasks:**
- [ ] Analyze slow queries with EXPLAIN ANALYZE
- [ ] Optimize indexes for common queries
- [ ] Consider partial indexes for active advisories
- [ ] Benchmark PostgreSQL vs MongoDB performance
---
### T5b.6: Switch Scanner to PostgreSQL
**Status:** TODO
**Estimate:** 0.5 days
**Subtasks:**
- [ ] Update configuration
- [ ] Deploy to staging
- [ ] Run full scan suite
- [ ] Deploy to production
---
## Exit Criteria
- [ ] All repository interfaces implemented
- [ ] Advisory conversion pipeline working
- [ ] Vulnerability matching produces identical results
- [ ] Feed imports working on PostgreSQL
- [ ] Concelier running on PostgreSQL in production
---
## Risks & Mitigations
| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|------------|
| Matching discrepancies | Medium | High | Extensive comparison testing |
| Performance regression on queries | Medium | Medium | Index optimization, query tuning |
| Data loss during conversion | Low | High | Verify counts, sample checks |
---
## Data Volume Estimates
| Table | Estimated Rows | Growth Rate |
|-------|----------------|-------------|
| advisories | 300,000+ | ~100/day |
| advisory_aliases | 600,000+ | ~200/day |
| advisory_affected | 2,000,000+ | ~1000/day |
| advisory_cvss | 400,000+ | ~150/day |
---
*Phase Version: 1.0.0*
*Last Updated: 2025-11-28*


@@ -0,0 +1,434 @@
# Phase 6: VEX & Graph Conversion (Excititor)
**Sprint:** 8-10
**Duration:** 2-3 sprints
**Status:** TODO
**Dependencies:** Phase 5 (Vulnerabilities)
---
## Objectives
1. Create `StellaOps.Excititor.Storage.Postgres` project
2. Implement VEX schema in PostgreSQL
3. Handle graph nodes/edges efficiently
4. Preserve graph_revision_id stability (determinism critical)
5. Maintain VEX statement lattice logic
---
## Deliverables
| Deliverable | Acceptance Criteria |
|-------------|---------------------|
| VEX schema | All tables created with indexes |
| Graph storage | Nodes/edges efficiently stored |
| Statement storage | VEX statements with full provenance |
| Revision stability | Same inputs produce same revision_id |
| Integration tests | 100% coverage |
---
## Schema Reference
See [SPECIFICATION.md](../SPECIFICATION.md) Section 5.3 for complete VEX schema.
**Tables:**
- `vex.projects`
- `vex.graph_revisions`
- `vex.graph_nodes`
- `vex.graph_edges`
- `vex.statements`
- `vex.observations`
- `vex.linksets`
- `vex.linkset_events`
- `vex.consensus`
- `vex.consensus_holds`
- `vex.unknowns_snapshots`
- `vex.unknown_items`
- `vex.evidence_manifests`
- `vex.cvss_receipts`
- `vex.attestations`
- `vex.timeline_events`
---
## Phase 6a: Core Schema & Repositories
### T6a.1: Create Excititor.Storage.Postgres Project
**Status:** TODO
**Estimate:** 0.5 days
**Subtasks:**
- [ ] Create project structure
- [ ] Add NuGet references
- [ ] Create `ExcititorDataSource` class
- [ ] Create `ServiceCollectionExtensions.cs`
---
### T6a.2: Implement Schema Migrations
**Status:** TODO
**Estimate:** 1.5 days
**Subtasks:**
- [ ] Create schema migration
- [ ] Include all tables
- [ ] Add indexes for graph traversal
- [ ] Add indexes for VEX lookups
- [ ] Test migration idempotency
---
### T6a.3: Implement Project Repository
**Status:** TODO
**Estimate:** 0.5 days
**Subtasks:**
- [ ] Implement CRUD operations
- [ ] Handle tenant scoping
- [ ] Write integration tests
---
### T6a.4: Implement VEX Statement Repository
**Status:** TODO
**Estimate:** 1.5 days
**Interface:**
```csharp
public interface IVexStatementRepository
{
Task<VexStatement?> GetAsync(string tenantId, Guid statementId, CancellationToken ct);
Task<IReadOnlyList<VexStatement>> GetByVulnerabilityAsync(
string tenantId, string vulnerabilityId, CancellationToken ct);
Task<IReadOnlyList<VexStatement>> GetByProjectAsync(
string tenantId, Guid projectId, CancellationToken ct);
Task<VexStatement> UpsertAsync(VexStatement statement, CancellationToken ct);
Task<IReadOnlyList<VexStatement>> GetByGraphRevisionAsync(
Guid graphRevisionId, CancellationToken ct);
}
```
**Subtasks:**
- [ ] Implement all interface methods
- [ ] Handle status and justification enums
- [ ] Preserve evidence JSONB
- [ ] Preserve provenance JSONB
- [ ] Write integration tests
---
### T6a.5: Implement VEX Observation Repository
**Status:** TODO
**Estimate:** 1 day
**Subtasks:**
- [ ] Implement CRUD operations
- [ ] Handle unique constraint on composite key
- [ ] Implement FindByVulnerabilityAndProductAsync
- [ ] Write integration tests
---
### T6a.6: Implement Linkset Repository
**Status:** TODO
**Estimate:** 0.5 days
**Subtasks:**
- [ ] Implement CRUD operations
- [ ] Implement event logging
- [ ] Write integration tests
---
### T6a.7: Implement Consensus Repository
**Status:** TODO
**Estimate:** 0.5 days
**Subtasks:**
- [ ] Implement CRUD operations
- [ ] Implement hold management
- [ ] Write integration tests
---
## Phase 6b: Graph Storage
### T6b.1: Implement Graph Revision Repository
**Status:** TODO
**Estimate:** 1 day
**Interface:**
```csharp
public interface IGraphRevisionRepository
{
Task<GraphRevision?> GetByIdAsync(Guid id, CancellationToken ct);
Task<GraphRevision?> GetByRevisionIdAsync(string revisionId, CancellationToken ct);
Task<GraphRevision?> GetLatestByProjectAsync(Guid projectId, CancellationToken ct);
Task<GraphRevision> CreateAsync(GraphRevision revision, CancellationToken ct);
Task<IReadOnlyList<GraphRevision>> GetHistoryAsync(
Guid projectId, int limit, CancellationToken ct);
}
```
**Subtasks:**
- [ ] Implement all interface methods
- [ ] Handle revision_id uniqueness
- [ ] Handle parent_revision_id linking
- [ ] Write integration tests
---
### T6b.2: Implement Graph Node Repository
**Status:** TODO
**Estimate:** 1.5 days
**Interface:**
```csharp
public interface IGraphNodeRepository
{
Task<GraphNode?> GetByIdAsync(long nodeId, CancellationToken ct);
Task<GraphNode?> GetByKeyAsync(Guid graphRevisionId, string nodeKey, CancellationToken ct);
Task<IReadOnlyList<GraphNode>> GetByRevisionAsync(
Guid graphRevisionId, CancellationToken ct);
Task BulkInsertAsync(
Guid graphRevisionId, IEnumerable<GraphNode> nodes, CancellationToken ct);
Task<int> GetCountAsync(Guid graphRevisionId, CancellationToken ct);
}
```
**Subtasks:**
- [ ] Implement all interface methods
- [ ] Implement bulk insert for efficiency
- [ ] Handle node_key uniqueness per revision
- [ ] Write integration tests
**Bulk Insert Optimization:**
```csharp
public async Task BulkInsertAsync(
Guid graphRevisionId,
IEnumerable<GraphNode> nodes,
CancellationToken ct)
{
await using var connection = await _dataSource.OpenConnectionAsync("system", ct);
await using var writer = await connection.BeginBinaryImportAsync(
"COPY vex.graph_nodes (graph_revision_id, node_key, node_type, purl, name, version, attributes) " +
"FROM STDIN (FORMAT BINARY)", ct);
foreach (var node in nodes)
{
await writer.StartRowAsync(ct);
await writer.WriteAsync(graphRevisionId, ct);
await writer.WriteAsync(node.NodeKey, ct);
await writer.WriteAsync(node.NodeType, ct);
await writer.WriteAsync(node.Purl, NpgsqlDbType.Text, ct);
await writer.WriteAsync(node.Name, NpgsqlDbType.Text, ct);
await writer.WriteAsync(node.Version, NpgsqlDbType.Text, ct);
await writer.WriteAsync(JsonSerializer.Serialize(node.Attributes), NpgsqlDbType.Jsonb, ct);
}
await writer.CompleteAsync(ct);
}
```
---
### T6b.3: Implement Graph Edge Repository
**Status:** TODO
**Estimate:** 1.5 days
**Interface:**
```csharp
public interface IGraphEdgeRepository
{
Task<IReadOnlyList<GraphEdge>> GetByRevisionAsync(
Guid graphRevisionId, CancellationToken ct);
Task<IReadOnlyList<GraphEdge>> GetOutgoingAsync(
long fromNodeId, CancellationToken ct);
Task<IReadOnlyList<GraphEdge>> GetIncomingAsync(
long toNodeId, CancellationToken ct);
Task BulkInsertAsync(
Guid graphRevisionId, IEnumerable<GraphEdge> edges, CancellationToken ct);
Task<int> GetCountAsync(Guid graphRevisionId, CancellationToken ct);
}
```
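A traversal sketch for `GetOutgoingAsync`; it leans on the `(from_node_id)` index called out in the performance notes, while the column list and `MapEdge` helper are assumptions:
```csharp
// Outgoing-edge traversal sketch (illustrative column names).
public async Task<IReadOnlyList<GraphEdge>> GetOutgoingAsync(long fromNodeId, CancellationToken ct)
{
    await using var connection = await _dataSource.OpenConnectionAsync("system", ct);
    await using var cmd = new NpgsqlCommand(
        """
        SELECT id, graph_revision_id, from_node_id, to_node_id, edge_type, attributes
        FROM vex.graph_edges
        WHERE from_node_id = @from
        ORDER BY to_node_id, edge_type
        """,
        connection);
    cmd.Parameters.AddWithValue("from", fromNodeId);

    var edges = new List<GraphEdge>();
    await using var reader = await cmd.ExecuteReaderAsync(ct);
    while (await reader.ReadAsync(ct))
    {
        edges.Add(MapEdge(reader));
    }

    return edges;
}
```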
**Subtasks:**
- [ ] Implement all interface methods
- [ ] Implement bulk insert for efficiency
- [ ] Optimize for traversal queries
- [ ] Write integration tests
---
### T6b.4: Verify Graph Revision ID Stability
**Status:** TODO
**Estimate:** 1 day
**Description:**
Critical: the same SBOM, feed snapshot, and policy version must always produce an identical `revision_id`.
**Subtasks:**
- [ ] Document revision_id computation algorithm
- [ ] Verify nodes are inserted in deterministic order
- [ ] Verify edges are inserted in deterministic order
- [ ] Write stability tests
**Stability Test:**
```csharp
[Fact]
public async Task Same_Inputs_Should_Produce_Same_RevisionId()
{
var sbom = await LoadSbom("testdata/stable-sbom.json");
var feedSnapshot = "feed-v1.2.3";
var policyVersion = "policy-v1.0";
// Compute multiple times
var revisions = new List<string>();
for (int i = 0; i < 5; i++)
{
var graph = await _graphService.ComputeGraphAsync(
sbom, feedSnapshot, policyVersion);
revisions.Add(graph.RevisionId);
}
// All must be identical
revisions.Distinct().Should().HaveCount(1);
}
```
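One way the computation could be documented, shown here only as an assumption and not the final algorithm: hash a canonical serialization of the inputs, with nodes and edges sorted by stable keys so insertion order cannot influence the result:
```csharp
// Deterministic revision-id sketch. Edges are assumed to expose stable node keys
// for hashing; database-assigned ids would not be stable across recomputation.
public static string ComputeRevisionId(
    string sbomDigest,
    string feedSnapshot,
    string policyVersion,
    IEnumerable<GraphNode> nodes,
    IEnumerable<GraphEdge> edges)
{
    var canonical = new
    {
        sbomDigest,
        feedSnapshot,
        policyVersion,
        nodes = nodes
            .OrderBy(n => n.NodeKey, StringComparer.Ordinal)
            .Select(n => new { n.NodeKey, n.NodeType, n.Purl, n.Name, n.Version }),
        edges = edges
            .OrderBy(e => e.FromNodeKey, StringComparer.Ordinal)
            .ThenBy(e => e.ToNodeKey, StringComparer.Ordinal)
            .ThenBy(e => e.EdgeType, StringComparer.Ordinal)
            .Select(e => new { e.FromNodeKey, e.ToNodeKey, e.EdgeType }),
    };

    // Anonymous-type property order is fixed at compile time, so the JSON is stable.
    var json = JsonSerializer.Serialize(canonical);
    var hash = SHA256.HashData(Encoding.UTF8.GetBytes(json));
    return $"rev-{Convert.ToHexString(hash).ToLowerInvariant()}";
}
```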
---
## Sprint 6c: Migration & Verification
### T6c.1: Build Graph Conversion Service
**Status:** TODO
**Estimate:** 1.5 days
**Description:**
Convert existing MongoDB graphs to PostgreSQL. A conversion-loop sketch follows the subtasks.
**Subtasks:**
- [ ] Parse MongoDB graph documents
- [ ] Map to graph_revisions table
- [ ] Extract and insert nodes
- [ ] Extract and insert edges
- [ ] Verify node/edge counts match
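An illustrative conversion loop, assuming the MongoDB graph documents embed `nodes` and `edges` arrays and that the Sprint 6b repositories are reused for writes; the mapping helpers are hypothetical:
```csharp
// Conversion sketch: read each graph document, map to a GraphRevision plus
// node/edge collections, bulk insert, then verify counts against the source.
public async Task ConvertProjectGraphsAsync(Guid projectId, CancellationToken ct)
{
    var filter = Builders<BsonDocument>.Filter.Eq("projectId", projectId.ToString());
    var documents = await _mongoGraphs.Find(filter).ToListAsync(ct);

    foreach (var doc in documents)
    {
        var revision = MapRevision(doc);                 // hypothetical mapper -> graph_revisions row
        var nodes = MapNodes(doc["nodes"].AsBsonArray);  // returns List<GraphNode> in deterministic order
        var edges = MapEdges(doc["edges"].AsBsonArray);  // returns List<GraphEdge> in deterministic order

        await _revisions.CreateAsync(revision, ct);
        await _nodes.BulkInsertAsync(revision.Id, nodes, ct);
        await _edges.BulkInsertAsync(revision.Id, edges, ct);

        // Verification step from the subtask list: counts must match the source.
        var nodeCount = await _nodes.GetCountAsync(revision.Id, ct);
        var edgeCount = await _edges.GetCountAsync(revision.Id, ct);
        if (nodeCount != nodes.Count || edgeCount != edges.Count)
        {
            throw new InvalidOperationException(
                $"Count mismatch for revision {revision.RevisionId}: " +
                $"nodes {nodeCount}/{nodes.Count}, edges {edgeCount}/{edges.Count}");
        }
    }
}
```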
---
### T6c.2: Build VEX Conversion Service
**Status:** TODO
**Estimate:** 1 day
**Subtasks:**
- [ ] Parse MongoDB VEX statements
- [ ] Map to vex.statements table
- [ ] Preserve provenance
- [ ] Preserve evidence
---
### T6c.3: Run Dual Pipeline Comparison
**Status:** TODO
**Estimate:** 2 days
**Description:**
Run graph computation on both backends and compare the outputs. A comparison-test sketch follows the subtasks.
**Subtasks:**
- [ ] Select sample projects
- [ ] Compute graphs with MongoDB
- [ ] Compute graphs with PostgreSQL
- [ ] Compare revision_ids (must match)
- [ ] Compare node counts
- [ ] Compare edge counts
- [ ] Compare VEX statements
- [ ] Document any differences
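A comparison-test sketch; the `_mongoPipeline` / `_postgresPipeline` services and the `SampleProjects` member are placeholders for whatever dual-run tooling is built, and statements are compared as an ordered set keyed on vulnerability, product, and status:
```csharp
// Dual-pipeline comparison sketch (xUnit + FluentAssertions).
[Theory]
[MemberData(nameof(SampleProjects))]
public async Task Mongo_And_Postgres_Pipelines_Should_Agree(Guid projectId)
{
    var mongoGraph = await _mongoPipeline.ComputeGraphAsync(projectId, CancellationToken.None);
    var pgGraph = await _postgresPipeline.ComputeGraphAsync(projectId, CancellationToken.None);

    // Revision ids must match exactly; this is the primary determinism signal.
    pgGraph.RevisionId.Should().Be(mongoGraph.RevisionId);

    // Structural counts must match.
    pgGraph.NodeCount.Should().Be(mongoGraph.NodeCount);
    pgGraph.EdgeCount.Should().Be(mongoGraph.EdgeCount);

    // VEX statements compared as ordered keys, independent of storage order.
    static (string, string, string) Key(VexStatement s) =>
        (s.VulnerabilityId, s.ProductKey, s.Status.ToString());

    var mongoStatements = mongoGraph.Statements.Select(Key).OrderBy(k => k).ToList();
    var pgStatements = pgGraph.Statements.Select(Key).OrderBy(k => k).ToList();

    pgStatements.Should().Equal(mongoStatements);
}
```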
---
### T6c.4: Migrate Projects
**Status:** TODO
**Estimate:** 1 day
**Subtasks:**
- [ ] Identify projects to migrate (active VEX)
- [ ] Run conversion for each project
- [ ] Verify latest graph revision
- [ ] Verify VEX statements
---
### T6c.5: Switch to PostgreSQL-Only
**Status:** TODO
**Estimate:** 0.5 days
**Subtasks:**
- [ ] Update configuration
- [ ] Deploy to staging
- [ ] Run full test suite
- [ ] Deploy to production
- [ ] Monitor metrics
---
## Exit Criteria
- [ ] All repository interfaces implemented
- [ ] Graph storage working efficiently
- [ ] Graph revision IDs stable (deterministic)
- [ ] VEX statements preserved correctly
- [ ] All comparison tests pass
- [ ] Excititor running on PostgreSQL in production
---
## Risks & Mitigations
| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|------------|
| Revision ID instability | Medium | Critical | Deterministic ordering tests |
| Graph storage performance | Medium | High | Bulk insert, index optimization |
| VEX lattice logic errors | Low | High | Extensive comparison testing |
---
## Performance Considerations
### Graph Storage
- Use `BIGSERIAL` for node/edge IDs (high volume)
- Use `COPY` for bulk inserts (10-100x faster)
- Index `(graph_revision_id, node_key)` for lookups
- Index `(from_node_id)` and `(to_node_id)` for traversal
### Estimated Volumes
| Table | Estimated Rows per Project | Total Estimated |
|-------|---------------------------|-----------------|
| graph_nodes | 1,000 - 50,000 | 10M+ |
| graph_edges | 2,000 - 100,000 | 20M+ |
| vex_statements | 100 - 5,000 | 1M+ |
---
*Phase Version: 1.0.0*
*Last Updated: 2025-11-28*

View File

@@ -0,0 +1,305 @@
# Phase 7: Cleanup & Optimization
**Sprint:** 11
**Duration:** 1 sprint
**Status:** TODO
**Dependencies:** All previous phases completed
---
## Objectives
1. Remove MongoDB dependencies from converted modules
2. Archive MongoDB data
3. Optimize PostgreSQL performance
4. Update documentation
5. Update air-gap kit
---
## Deliverables
| Deliverable | Acceptance Criteria |
|-------------|---------------------|
| Code cleanup | MongoDB code removed from converted modules |
| Data archive | MongoDB data archived and documented |
| Performance tuning | Query times within acceptable range |
| Documentation | All docs updated for PostgreSQL |
| Air-gap kit | PostgreSQL support added |
---
## Task Breakdown
### T7.1: Remove MongoDB Dependencies
**Status:** TODO
**Estimate:** 2 days
**Description:**
Remove MongoDB storage projects and references from converted modules.
**Subtasks:**
- [ ] T7.1.1: Remove `StellaOps.Authority.Storage.Mongo` project
- [ ] T7.1.2: Remove `StellaOps.Scheduler.Storage.Mongo` project
- [ ] T7.1.3: Remove `StellaOps.Notify.Storage.Mongo` project
- [ ] T7.1.4: Remove `StellaOps.Policy.Storage.Mongo` project
- [ ] T7.1.5: Remove `StellaOps.Concelier.Storage.Mongo` project
- [ ] T7.1.6: Remove `StellaOps.Excititor.Storage.Mongo` project
- [ ] T7.1.7: Update solution files
- [ ] T7.1.8: Remove dual-write wrappers
- [ ] T7.1.9: Remove MongoDB configuration options
- [ ] T7.1.10: Run full build to verify no broken references
**Verification:**
- [ ] Solution builds without MongoDB packages
- [ ] No MongoDB references in converted modules
- [ ] All tests pass
---
### T7.2: Archive MongoDB Data
**Status:** TODO
**Estimate:** 1 day
**Description:**
Archive MongoDB databases for historical reference.
**Subtasks:**
- [ ] T7.2.1: Take final MongoDB backup
- [ ] T7.2.2: Export to BSON/JSON archives
- [ ] T7.2.3: Store archives in secure location
- [ ] T7.2.4: Document archive contents and structure
- [ ] T7.2.5: Set retention policy for archives
- [ ] T7.2.6: Schedule MongoDB cluster decommission
**Archive Structure:**
```
archives/
├── mongodb-authority-2025-XX-XX.bson.gz
├── mongodb-scheduler-2025-XX-XX.bson.gz
├── mongodb-notify-2025-XX-XX.bson.gz
├── mongodb-policy-2025-XX-XX.bson.gz
├── mongodb-concelier-2025-XX-XX.bson.gz
├── mongodb-excititor-2025-XX-XX.bson.gz
└── ARCHIVE_MANIFEST.md
```
---
### T7.3: PostgreSQL Performance Optimization
**Status:** TODO
**Estimate:** 2 days
**Description:**
Analyze and optimize PostgreSQL performance.
**Subtasks:**
- [ ] T7.3.1: Enable `pg_stat_statements` extension
- [ ] T7.3.2: Identify slow queries
- [ ] T7.3.3: Analyze query plans with EXPLAIN ANALYZE
- [ ] T7.3.4: Add missing indexes
- [ ] T7.3.5: Remove unused indexes
- [ ] T7.3.6: Tune PostgreSQL configuration
- [ ] T7.3.7: Set up query monitoring dashboard
- [ ] T7.3.8: Document performance baselines
**Configuration Tuning:**
```ini
# postgresql.conf optimizations
shared_buffers = 25% of RAM
effective_cache_size = 75% of RAM
work_mem = 64MB
maintenance_work_mem = 512MB
random_page_cost = 1.1 # for SSD
effective_io_concurrency = 200 # for SSD
max_parallel_workers_per_gather = 4
```
**Monitoring Queries:**
```sql
-- Top slow queries (PostgreSQL 13+ columns: mean_exec_time / total_exec_time)
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 20;
-- Unused indexes (pg_stat_user_indexes exposes relname / indexrelname)
SELECT schemaname, relname, indexrelname
FROM pg_stat_user_indexes
WHERE idx_scan = 0;
-- Largest tables by total relation size (starting point for bloat analysis)
SELECT schemaname, relname,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_stat_user_tables
ORDER BY pg_total_relation_size(relid) DESC;
```
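Optionally, a small in-process monitor can complement the dashboard; this sketch assumes the PostgreSQL 13+ column names used above and a generic `BackgroundService` host:
```csharp
// Hedged sketch of an in-process slow-query sampler; thresholds and interval
// are illustrative and would normally come from configuration.
public sealed class SlowQueryMonitor : BackgroundService
{
    private readonly NpgsqlDataSource _dataSource;
    private readonly ILogger<SlowQueryMonitor> _logger;

    public SlowQueryMonitor(NpgsqlDataSource dataSource, ILogger<SlowQueryMonitor> logger)
    {
        _dataSource = dataSource;
        _logger = logger;
    }

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        while (!stoppingToken.IsCancellationRequested)
        {
            await using var connection = await _dataSource.OpenConnectionAsync(stoppingToken);
            await using var cmd = new NpgsqlCommand(
                "SELECT query, calls, mean_exec_time FROM pg_stat_statements " +
                "WHERE mean_exec_time > 100 ORDER BY mean_exec_time DESC LIMIT 10",
                connection);

            await using var reader = await cmd.ExecuteReaderAsync(stoppingToken);
            while (await reader.ReadAsync(stoppingToken))
            {
                _logger.LogWarning(
                    "Slow query ({MeanMs:F1} ms over {Calls} calls): {Query}",
                    reader.GetDouble(2), reader.GetInt64(1), reader.GetString(0));
            }

            await Task.Delay(TimeSpan.FromMinutes(5), stoppingToken);
        }
    }
}
```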
---
### T7.4: Update Documentation
**Status:** TODO
**Estimate:** 1.5 days
**Description:**
Update all documentation to reflect PostgreSQL as the primary database.
**Subtasks:**
- [ ] T7.4.1: Update `docs/07_HIGH_LEVEL_ARCHITECTURE.md`
- [ ] T7.4.2: Update module architecture docs
- [ ] T7.4.3: Update deployment guides
- [ ] T7.4.4: Update operations runbooks
- [ ] T7.4.5: Update troubleshooting guides
- [ ] T7.4.6: Update `CLAUDE.md` technology stack
- [ ] T7.4.7: Create PostgreSQL operations guide
- [ ] T7.4.8: Document backup/restore procedures
- [ ] T7.4.9: Document scaling recommendations
**New Documents:**
- `docs/operations/postgresql-guide.md`
- `docs/operations/postgresql-backup-restore.md`
- `docs/operations/postgresql-troubleshooting.md`
---
### T7.5: Update Air-Gap Kit
**Status:** TODO
**Estimate:** 1 day
**Description:**
Update offline/air-gap kit to include PostgreSQL.
**Subtasks:**
- [ ] T7.5.1: Add PostgreSQL container image to kit
- [ ] T7.5.2: Update kit scripts for PostgreSQL setup
- [ ] T7.5.3: Include schema migrations in kit
- [ ] T7.5.4: Update kit documentation
- [ ] T7.5.5: Test kit installation in air-gapped environment
- [ ] T7.5.6: Update `docs/24_OFFLINE_KIT.md`
**Air-Gap Kit Structure:**
```
offline-kit/
├── images/
│ ├── postgres-16-alpine.tar
│ └── stellaops-*.tar
├── schemas/
│ ├── authority.sql
│ ├── vuln.sql
│ ├── vex.sql
│ ├── scheduler.sql
│ ├── notify.sql
│ └── policy.sql
├── scripts/
│ ├── setup-postgres.sh
│ ├── run-migrations.sh
│ └── import-data.sh
└── docs/
└── OFFLINE_SETUP.md
```
---
### T7.6: Final Verification
**Status:** TODO
**Estimate:** 1 day
**Description:**
Run final verification of all systems.
**Subtasks:**
- [ ] T7.6.1: Run full integration test suite
- [ ] T7.6.2: Run performance benchmark suite
- [ ] T7.6.3: Verify all modules on PostgreSQL
- [ ] T7.6.4: Verify determinism tests pass
- [ ] T7.6.5: Verify air-gap kit works
- [ ] T7.6.6: Generate final verification report
- [ ] T7.6.7: Get sign-off from stakeholders
---
### T7.7: Decommission MongoDB
**Status:** TODO
**Estimate:** 0.5 days
**Description:**
Final decommission of MongoDB infrastructure.
**Subtasks:**
- [ ] T7.7.1: Verify no services using MongoDB
- [ ] T7.7.2: Stop MongoDB instances
- [ ] T7.7.3: Archive final state
- [ ] T7.7.4: Remove MongoDB from infrastructure
- [ ] T7.7.5: Update monitoring/alerting
- [ ] T7.7.6: Update cost projections
---
## Exit Criteria
- [ ] All MongoDB code removed from converted modules
- [ ] MongoDB data archived
- [ ] PostgreSQL performance optimized
- [ ] All documentation updated
- [ ] Air-gap kit updated and tested
- [ ] Final verification report approved
- [ ] MongoDB infrastructure decommissioned
---
## Post-Conversion Monitoring
### First Week
- Monitor error rates closely
- Track query performance
- Watch for any data inconsistencies
- Have rollback plan ready (restore MongoDB)
### First Month
- Review query statistics weekly
- Optimize any slow queries found
- Monitor storage growth
- Adjust vacuum settings if needed
### Ongoing
- Regular performance reviews
- Index maintenance
- Backup verification
- Capacity planning
---
## Rollback Considerations
**Note:** After Phase 7 completion, rollback to MongoDB becomes significantly more complex. Ensure all stakeholders understand:
1. MongoDB archives are read-only backup
2. Any new data created after cutover is PostgreSQL-only
3. Full rollback would require data export/import
---
## Success Metrics
| Metric | Target | Measurement |
|--------|--------|-------------|
| Query latency (p95) | < 100ms | pg_stat_statements |
| Error rate | < 0.01% | Application logs |
| Storage efficiency | < 120% of MongoDB | Disk usage |
| Test coverage | 100% | CI reports |
| Documentation coverage | 100% | Manual review |
---
*Phase Version: 1.0.0*
*Last Updated: 2025-11-28*

View File

@@ -55,11 +55,12 @@
| 12 | ORCH-OBS-53-001 | BLOCKED (2025-11-19) | PREP-ORCH-OBS-53-001-DEPENDS-ON-52-001-EVIDEN | Orchestrator Service Guild · Evidence Locker Guild | Generate job capsule inputs for Evidence Locker; invoke snapshot hooks; enforce redaction guard. |
| 13 | ORCH-OBS-54-001 | BLOCKED (2025-11-19) | PREP-ORCH-OBS-54-001-DEPENDS-ON-53-001 | Orchestrator Service Guild · Provenance Guild | Produce DSSE attestations for orchestrator-scheduled jobs; store references in timeline + Evidence Locker; add verification endpoint `/jobs/{id}/attestation`. |
| 14 | ORCH-OBS-55-001 | BLOCKED (2025-11-19) | PREP-ORCH-OBS-55-001-DEPENDS-ON-54-001-INCIDE | Orchestrator Service Guild · DevOps Guild | Incident mode hooks (sampling overrides, extended retention, debug spans) with automatic activation on SLO burn-rate breach; emit activation/deactivation events. |
| 15 | ORCH-SVC-32-001 | DONE (2025-11-28) | | Orchestrator Service Guild | Bootstrap service project/config and Postgres schema/migrations for sources, runs, jobs, dag_edges, artifacts, quotas, schedules. |
## Execution Log
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2025-11-28 | ORCH-SVC-32-001 DONE: Implemented Postgres schema/migrations (001_initial.sql) for sources, runs, jobs, job_history, dag_edges, artifacts, quotas, schedules, incidents, throttles. Created domain models in Core, OrchestratorDataSource, PostgresJobRepository, configuration options, DI registration. Build verified. | Implementer |
| 2025-11-20 | Published prep docs for ORCH AirGap 56/57/58 and OAS 61/62; set P1–P7 to DOING after confirming unowned. | Project Mgmt |
| 2025-11-20 | Started PREP-ORCH-OAS-63-001 (status → DOING) after confirming no existing DOING/DONE owners. | Planning |
| 2025-11-20 | Published prep doc for PREP-ORCH-OAS-63-001 (`docs/modules/orchestrator/prep/2025-11-20-oas-63-001-prep.md`) and marked P8 DONE; awaits OAS 61/62 freeze before implementation. | Implementer |

View File

@@ -20,15 +20,15 @@
## Delivery Tracker
| # | Task ID | Status | Key dependency / next step | Owners | Task Definition |
| --- | --- | --- | --- | --- | --- |
| 1 | ORCH-SVC-32-002 | DONE | Depends on ORCH-SVC-32-001 (Sprint 0151). | Orchestrator Service Guild (`src/Orchestrator/StellaOps.Orchestrator`) | Implement scheduler DAG planner + dependency resolver, job state machine, critical-path metadata (no control actions yet). |
| 2 | ORCH-SVC-32-003 | DONE | Depends on 32-002. | Orchestrator Service Guild | Expose read-only REST APIs (sources, runs, jobs, DAG) with OpenAPI, validation, pagination, tenant scoping. |
| 3 | ORCH-SVC-32-004 | DONE | Depends on 32-003. | Orchestrator Service Guild | Implement WebSocket/SSE stream for job/run updates; emit structured metrics counters/histograms; add health probes. |
| 4 | ORCH-SVC-32-005 | DONE | Depends on 32-004. | Orchestrator Service Guild | Deliver worker claim/heartbeat/progress endpoints capturing artifact metadata/checksums and enforcing idempotency keys. |
| 5 | ORCH-SVC-33-001 | DONE | Depends on 32-005. | Orchestrator Service Guild | Enable `sources` tests (control-plane validation). |
| 6 | ORCH-SVC-33-002 | DONE | Depends on 33-001. | Orchestrator Service Guild | Per-source/tenant adaptive token-bucket limiter, concurrency caps, backpressure reacting to upstream 429/503. |
| 7 | ORCH-SVC-33-003 | DONE | Depends on 33-002. | Orchestrator Service Guild | Watermark/backfill manager with event-time windows, duplicate suppression, dry-run preview endpoint, safety validations. |
| 8 | ORCH-SVC-33-004 | DONE | Depends on 33-003. | Orchestrator Service Guild | Dead-letter store, replay endpoints, error classification with remediation hints + notification hooks. |
| 9 | ORCH-SVC-34-001 | DONE | Depends on 33-004. | Orchestrator Service Guild | Quota management APIs, per-tenant SLO burn-rate computation, alert budget tracking via metrics. |
| 10 | ORCH-SVC-34-002 | TODO | Depends on 34-001. | Orchestrator Service Guild | Audit log + immutable run ledger export with signed manifest and provenance chain to artifacts. |
| 11 | ORCH-SVC-34-003 | TODO | Depends on 34-002. | Orchestrator Service Guild | Perf/scale validation (≥10k pending jobs, dispatch P95 <150ms); autoscaling hooks; health probes. |
| 12 | ORCH-SVC-34-004 | TODO | Depends on 34-003. | Orchestrator Service Guild | GA packaging: container image, Helm overlays, offline bundle seeds, provenance attestations, compliance checklist. |
@@ -42,6 +42,15 @@
| 2025-11-08 | Sprint stub (legacy format) created; awaiting orchestrator phase I completion. | Planning |
| 2025-11-19 | Normalized sprint to standard template and renamed from `SPRINT_152_orchestrator_ii.md` to `SPRINT_0152_0001_0002_orchestrator_ii.md`; content preserved. | Implementer |
| 2025-11-19 | Added legacy-file redirect stub to avoid divergent updates. | Implementer |
| 2025-11-28 | ORCH-SVC-32-002 DONE: Implemented JobStateMachine (status transitions/validation), DagPlanner (cycle detection, topological sort, critical path, dependency resolution), RetryPolicy (exponential backoff with jitter), JobScheduler (scheduling coordination). Added unit tests (67 tests passing). | Implementer |
| 2025-11-28 | ORCH-SVC-32-003 DONE: Implemented REST APIs for sources, runs, jobs, and DAG. Added TenantResolver, EndpointHelpers, pagination support with cursors. Endpoints: SourceEndpoints (list, get), RunEndpoints (list, get, jobs, summary), JobEndpoints (list, get, detail, summary, by-idempotency-key), DagEndpoints (run DAG, edges, ready-jobs, blocked-jobs, parents, children). Build succeeds, 67 tests pass. | Implementer |
| 2025-11-28 | ORCH-SVC-32-004 DONE: Implemented SSE streaming for jobs and runs. Created SseWriter utility, StreamOptions configuration, JobStreamCoordinator (job state changes), RunStreamCoordinator (run progress). Added StreamEndpoints (/api/v1/orchestrator/stream/jobs/{jobId}, /api/v1/orchestrator/stream/runs/{runId}). Enhanced HealthEndpoints with /healthz, /readyz, /livez, /health/details including database, memory, and thread pool checks. Metrics already implemented in Infrastructure. 67 tests pass. | Implementer |
| 2025-11-28 | ORCH-SVC-32-005 DONE: Implemented worker endpoints for claim/heartbeat/progress/complete. Created WorkerContracts (ClaimRequest/Response, HeartbeatRequest/Response, ProgressRequest/Response, CompleteRequest/Response, ArtifactInput). Added IArtifactRepository interface and PostgresArtifactRepository. Created WorkerEndpoints with POST /api/v1/orchestrator/worker/claim, POST /worker/jobs/{jobId}/heartbeat, POST /worker/jobs/{jobId}/progress, POST /worker/jobs/{jobId}/complete. Added idempotency key enforcement and artifact metadata/checksum capture. Enhanced OrchestratorMetrics with ArtifactCreated, HeartbeatReceived, ProgressReported counters. Build succeeds, 67 tests pass. | Implementer |
| 2025-11-28 | ORCH-SVC-33-001 DONE: Enabled sources control-plane validation. Created PostgresSourceRepository (CRUD, pause/resume, list with filters) and PostgresRunRepository (CRUD, status updates, job count incrementing). Added OrchestratorMetrics for sources (SourceCreated, SourcePaused, SourceResumed) and runs (RunCreated, RunCompleted). Registered all repositories in DI container. Created comprehensive control-plane tests: SourceTests (17 tests for Source domain validation, pause/resume semantics, configuration handling) and RunTests (27 tests for Run lifecycle, status transitions, job counting invariants). Build succeeds, 111 tests pass (+44 new tests). | Implementer |
| 2025-11-28 | ORCH-SVC-33-002 DONE: Implemented per-source/tenant adaptive rate limiting. Created Throttle domain model (ThrottleReasons constants). Built RateLimiting components: TokenBucket (token bucket algorithm with refill/consume/snapshot), ConcurrencyLimiter (max active jobs tracking with acquire/release), BackpressureHandler (429/503 handling with exponential backoff and jitter), HourlyCounter (hourly rate tracking with automatic reset), AdaptiveRateLimiter (combines all strategies with rollback on partial failures). Created IQuotaRepository/IThrottleRepository interfaces and PostgresQuotaRepository/PostgresThrottleRepository implementations with full CRUD and state management. Added OrchestratorMetrics for quotas (QuotaCreated/Paused/Resumed), throttles (ThrottleCreated/Deactivated), rate limiting (RateLimitDenied, BackpressureEvent, TokenBucketUtilization, ConcurrencyUtilization). Registered repositories in DI container. Comprehensive test coverage: TokenBucketTests, ConcurrencyLimiterTests, BackpressureHandlerTests, AdaptiveRateLimiterTests, HourlyCounterTests. Build succeeds, 232 tests pass (+121 new tests). | Implementer |
| 2025-11-28 | ORCH-SVC-33-003 DONE: Implemented watermark/backfill manager with event-time windows, duplicate suppression, dry-run preview, and safety validations. Created database migration (002_backfill.sql) with tables: watermarks (event-time cursors per scope), backfill_requests (batch reprocessing operations), processed_events (duplicate suppression with TTL), backfill_checkpoints (resumable batch state). Built domain models: Watermark (scope keys, advance with sequence/hash, windowing), BackfillRequest (state machine with validation/start/pause/resume/complete/fail/cancel transitions), BackfillSafetyChecks (blocking/warning validation), BackfillPreview (dry-run estimation). Created Backfill components: EventTimeWindow (contains/overlaps/intersect/split), EventTimeWindowOptions (hourly/daily batches), EventTimeWindowPlanner (window computation, lag detection, estimation), IDuplicateSuppressor/InMemoryDuplicateSuppressor (event tracking with TTL, batch filtering), DuplicateFilterResult (separation of new/duplicate events), BackfillManager/IBackfillManager (request lifecycle, validation, preview), IBackfillSafetyValidator/DefaultBackfillSafetyValidator (retention/overlap/limit checks). Created repository interfaces: IWatermarkRepository, IBackfillRepository, IBackfillCheckpointRepository with BackfillCheckpoint domain model. Implemented PostgresWatermarkRepository (CRUD, optimistic concurrency, lag queries), PostgresBackfillRepository (CRUD, overlap detection, status counts), PostgresDuplicateSuppressor/PostgresDuplicateSuppressorFactory (TTL-managed dedup). Added OrchestratorMetrics for watermarks (Created/Advanced/Lag), backfills (Created/StatusChanged/EventsProcessed/Skipped/Duration/Progress), duplicate suppression (Marked/CleanedUp/Detected). Registered services in DI container. Comprehensive test coverage: WatermarkTests (scope keys, create, advance, windowing), BackfillRequestTests (lifecycle, state machine, safety checks), BackfillSafetyChecksTests (blocking/warning validation), EventTimeWindowTests (duration, contains, overlaps, intersect, split, static factories), EventTimeWindowPlannerTests (window computation, lag, estimation), EventTimeWindowOptionsTests (hourly/daily defaults), DuplicateSuppressorTests (has/get/mark processed, batch filtering), ProcessedEventTests (record semantics). Build succeeds, 288 tests pass (+56 new tests). | Implementer |
| 2025-11-28 | ORCH-SVC-33-004 DONE: Implemented dead-letter store with replay endpoints, error classification, remediation hints, and notification hooks. Created database migration (003_dead_letter.sql) with tables: dead_letter_entries (failed jobs with error classification), dead_letter_replay_audit (replay attempt tracking), dead_letter_notification_rules (alerting configuration), dead_letter_notification_log (notification history). Built domain models: DeadLetterEntry (entry lifecycle with Pending/Replaying/Replayed/Resolved/Exhausted/Expired states, FromFailedJob factory, StartReplay/CompleteReplay/FailReplay/Resolve/MarkExpired transitions, CanReplay/IsTerminal computed properties), DeadLetterStatus enum, ErrorCategory enum (Unknown/Transient/NotFound/AuthFailure/RateLimited/ValidationError/UpstreamError/InternalError/Conflict/Canceled). Created error classification system: ClassifiedError record, IErrorClassifier interface, DefaultErrorClassifier (40+ error codes with ORCH-TRN/NF/AUTH/RL/VAL/UP/INT/CON/CAN prefixes, HTTP status mapping, exception classification, remediation hints, retry delays). Built repository interfaces: IDeadLetterRepository (CRUD, list with filters, stats, actionable summary, mark expired, purge), IReplayAuditRepository (audit tracking), ReplayAuditRecord (Create/Complete/Fail transitions). Implemented PostgresDeadLetterRepository and PostgresReplayAuditRepository with full CRUD, filtering, statistics aggregation. Created ReplayManager: IReplayManager interface, ReplayManagerOptions, ReplayResult/BatchReplayResult records, replay single/batch/pending operations with audit logging and notification triggers. Built notification system: NotificationChannel enum (Email/Slack/Teams/Webhook/PagerDuty), NotificationRule (filter criteria, rate limiting with cooldown/max-per-hour, aggregation), IDeadLetterNotifier interface, DeadLetterNotifier (new entry/replay success/exhausted/aggregated notifications), NullDeadLetterNotifier, INotificationDelivery/INotificationRuleRepository interfaces, DeadLetterNotificationPayload/EntrySummary/StatsSnapshot records. Created REST endpoints: DeadLetterEndpoints (list/get/stats/summary, replay single/batch/pending, resolve single/batch, error-codes reference, replay audit). Added OrchestratorMetrics: DeadLetterCreated/StatusChanged/ReplayAttempted/ReplaySucceeded/ReplayFailed/Expired/Purged/NotificationSent/NotificationFailed/PendingChanged. Comprehensive test coverage: DeadLetterEntryTests (22 tests for FromFailedJob, lifecycle transitions, CanReplay/IsTerminal), ErrorClassificationTests (25 tests for error code classification, exception mapping, HTTP status codes, remediation hints), NotificationRuleTests (20 tests for rule matching, rate limiting, cooldown), ReplayAuditRecordTests (3 tests for Create/Complete/Fail). Build succeeds, 402 tests pass (+114 new tests). | Implementer |
| 2025-11-28 | ORCH-SVC-34-001 DONE: Implemented quota management APIs with SLO burn-rate computation and alert budget tracking. Created Slo domain model (Domain/Slo.cs) with SloType enum (Availability/Latency/Throughput), SloWindow enum (1h/1d/7d/30d), AlertSeverity enum, factory methods (CreateAvailability/CreateLatency/CreateThroughput), Update/Enable/Disable methods, ErrorBudget/GetWindowDuration computed properties. Created SloState record for current metrics (SLI, budget consumed/remaining, burn rate, time to exhaustion). Created AlertBudgetThreshold (threshold-based alerting with cooldown and rate limiting, ShouldTrigger logic). Created SloAlert (alert lifecycle with Acknowledge/Resolve). Built BurnRateEngine (SloManagement/BurnRateEngine.cs) with interfaces: IBurnRateEngine (ComputeStateAsync, ComputeAllStatesAsync, EvaluateAlertsAsync), ISloEventSource (availability/latency/throughput counts retrieval), ISloRepository/IAlertThresholdRepository/ISloAlertRepository. Created database migration (004_slo_quotas.sql) with tables: slos, alert_budget_thresholds, slo_alerts, slo_state_snapshots, quota_audit_log, job_metrics_hourly. Added helper functions: get_slo_availability_counts, cleanup_slo_snapshots, cleanup_quota_audit_log, get_slo_summary. Created REST API contracts (QuotaContracts.cs): CreateQuotaRequest/UpdateQuotaRequest/PauseQuotaRequest/QuotaResponse/QuotaListResponse, CreateSloRequest/UpdateSloRequest/SloResponse/SloListResponse/SloStateResponse/SloWithStateResponse, CreateAlertThresholdRequest/AlertThresholdResponse, SloAlertResponse/SloAlertListResponse/AcknowledgeAlertRequest/ResolveAlertRequest, SloSummaryResponse/QuotaSummaryResponse/QuotaUtilizationResponse. Created QuotaEndpoints (list/get/create/update/delete, pause/resume, summary). Created SloEndpoints (list/get/create/update/delete, enable/disable, state/states, thresholds CRUD, alerts list/get/acknowledge/resolve, summary). Added SLO metrics to OrchestratorMetrics: SlosCreated/SlosUpdated, SloAlertsTriggered/Acknowledged/Resolved, SloBudgetConsumed/SloBurnRate/SloCurrentSli/SloBudgetRemaining/SloTimeToExhaustion histograms, SloActiveAlerts UpDownCounter. Comprehensive test coverage: SloTests (25 tests for creation/validation/error budget/window duration/update/enable-disable), SloStateTests (tests for NoData factory), AlertBudgetThresholdTests (12 tests for creation/validation/ShouldTrigger/cooldown), SloAlertTests (5 tests for Create/Acknowledge/Resolve). Build succeeds, 450 tests pass (+48 new tests). | Implementer |
## Decisions & Risks
- All tasks depend on outputs from Orchestrator I (32-001); sprint remains TODO until upstream ship.

View File

@@ -27,8 +27,8 @@
| 1 | CVSS-MODEL-190-001 | DONE (2025-11-28) | None; foundational. | Policy Guild · Signals Guild (`src/Policy/StellaOps.Policy.Scoring`) | Design and implement CVSS v4.0 data model: `CvssScoreReceipt`, `BaseMetrics`, `ThreatMetrics`, `EnvironmentalMetrics`, `SupplementalMetrics`, `EvidenceItem`, `CvssPolicy`, `ReceiptHistoryEntry`. Include EF Core mappings and MongoDB schema. Evidence: Created `StellaOps.Policy.Scoring` project with `CvssMetrics.cs` (all CVSS v4.0 metric enums/records), `CvssScoreReceipt.cs` (receipt model with scores, evidence, history), `CvssPolicy.cs` (policy configuration), JSON schemas `cvss-policy-schema@1.json` and `cvss-receipt-schema@1.json`, and `AGENTS.md`. |
| 2 | CVSS-ENGINE-190-002 | DONE (2025-11-28) | Depends on 190-001 for types. | Policy Guild (`src/Policy/StellaOps.Policy.Scoring/Engine`) | Implement `CvssV4Engine` with: `ParseVector()`, `ComputeBaseScore()`, `ComputeThreatAdjustedScore()`, `ComputeEnvironmentalAdjustedScore()`, `BuildVector()`. Follow FIRST spec v4.0 exactly for math/rounding. Evidence: `ICvssV4Engine.cs` interface, `CvssV4Engine.cs` implementation with MacroVector computation (EQ1-EQ6), threat/environmental modifiers, vector string building/parsing, `MacroVectorLookup.cs` with score tables. |
| 3 | CVSS-TESTS-190-003 | DONE (2025-11-28) | Depends on 190-002. | Policy Guild · QA Guild (`src/Policy/__Tests/StellaOps.Policy.Scoring.Tests`) | Unit tests for CVSS v4.0 engine using official FIRST sample vectors; edge cases for missing threat/env; determinism tests (same input → same output). Evidence: Created `StellaOps.Policy.Scoring.Tests` project with `CvssV4EngineTests.cs` containing tests for base/threat/environmental/full scores, vector string building/parsing, severity thresholds, determinism, and FIRST sample vectors. |
| 4 | CVSS-POLICY-190-004 | DONE (2025-11-28) | Depends on 190-002. | Policy Guild (`src/Policy/StellaOps.Policy.Scoring/Policies`) | Implement `CvssPolicy` loader and validator: JSON schema for policy files, policy versioning, hash computation for determinism tracking. |
| 5 | CVSS-RECEIPT-190-005 | DONE (2025-11-28) | Depends on 190-002, 190-004. | Policy Guild (`src/Policy/StellaOps.Policy.Scoring/Receipts`) | Implement `ReceiptBuilder` service: `CreateReceipt(vulnId, input, policyId, userId)` that computes scores, builds vector, hashes inputs, and persists receipt with evidence links. |
| 6 | CVSS-DSSE-190-006 | TODO | Depends on 190-005; uses Attestor primitives. | Policy Guild · Attestor Guild (`src/Policy/StellaOps.Policy.Scoring`, `src/Attestor/StellaOps.Attestor.Envelope`) | Attach DSSE attestations to score receipts: create `stella.ops/cvssReceipt@v1` predicate type, sign receipts, store envelope references. |
| 7 | CVSS-HISTORY-190-007 | TODO | Depends on 190-005. | Policy Guild (`src/Policy/StellaOps.Policy.Scoring/History`) | Implement receipt amendment tracking: `AmendReceipt(receiptId, field, newValue, reason, ref)` with history entry creation and re-signing. |
| 8 | CVSS-CONCELIER-190-008 | TODO | Depends on 190-001; coordinate with Concelier. | Concelier Guild · Policy Guild (`src/Concelier/__Libraries/StellaOps.Concelier.Core`) | Ingest vendor-provided CVSS v4.0 vectors from advisories; parse and store as base receipts; preserve provenance. |
@@ -40,7 +40,7 @@
## Wave Coordination
| Wave | Guild owners | Shared prerequisites | Status | Notes |
| --- | --- | --- | --- | --- |
| W1 Foundation | Policy Guild | None | DONE (2025-11-28) | Tasks 1-4: Data model, engine, tests, policy loader. |
| W2 Receipt Pipeline | Policy Guild · Attestor Guild | W1 complete | TODO | Tasks 5-7: Receipt builder, DSSE, history. |
| W3 Integration | Concelier · Policy · CLI · UI Guilds | W2 complete | TODO | Tasks 8-11: Vendor ingest, APIs, CLI, UI. |
| W4 Documentation | Docs Guild | W3 complete | TODO | Task 12: Full documentation. |
@@ -59,7 +59,7 @@
| # | Action | Owner | Due (UTC) | Status | Notes |
| --- | --- | --- | --- | --- | --- |
| 1 | Review FIRST CVSS v4.0 spec and identify implementation gaps. | Policy Guild | TBD | Open | Reference: https://www.first.org/cvss/v4-0/ |
| 2 | Draft CvssPolicy JSON schema for team review. | Policy Guild | 2025-11-28 | DONE | Schema implemented and embedded at `src/Policy/StellaOps.Policy.Scoring/Schemas/cvss-policy-schema@1.json`; loader validates against it. |
## Decisions & Risks
| ID | Risk | Impact | Mitigation / Owner |
@@ -76,3 +76,6 @@
| 2025-11-28 | Started CVSS-ENGINE-190-002: Implementing scoring engine with MacroVector lookup tables per FIRST CVSS v4.0 specification. | Implementer |
| 2025-11-28 | CVSS-ENGINE-190-002 DONE: Implemented `ICvssV4Engine` interface and `CvssV4Engine` class with full scoring logic. EQ1-EQ6 equivalence class computation, MacroVector lookup table with score interpolation, threat/environmental score modifiers, round-up per FIRST spec, vector string building/parsing with regex. Started CVSS-TESTS-190-003. | Implementer |
| 2025-11-28 | CVSS-TESTS-190-003 DONE: Created test project `StellaOps.Policy.Scoring.Tests` with `CvssV4EngineTests.cs`. Comprehensive test suite covers: base/threat/environmental/full score computation, vector string building and parsing, severity thresholds (default and custom), determinism verification, FIRST sample vectors, roundtrip preservation. Wave 1 (Foundation) complete - all 4 tasks DONE. | Implementer |
| 2025-11-28 | CVSS-POLICY-190-004 DONE: Added `CvssPolicyLoader` (schema validation, canonical hash, policy deserialization), `CvssPolicySchema` loader for embedded schema, and unit tests (`CvssPolicyLoaderTests`) covering determinism and validation failures. | Implementer |
| 2025-11-28 | CVSS-RECEIPT-190-005 DONE: Added `ReceiptBuilder` with deterministic input hashing, evidence validation (policy-driven), vector/scoring via CvssV4Engine, and persistence through repository abstraction. Added `CreateReceiptRequest`, `IReceiptRepository`, unit tests (`ReceiptBuilderTests`) with in-memory repo; all 37 tests passing. | Implementer |
| 2025-11-28 | Ran `dotnet test src/Policy/__Tests/StellaOps.Policy.Scoring.Tests` (Release); 35 tests passed. Adjusted MacroVector lookup for FIRST sample vectors; duplicate PackageReference warnings remain to be cleaned separately. | Implementer |

View File

@@ -0,0 +1,123 @@
# Sprint 0215.0001.0001 - Experience & SDKs - Vulnerability Triage UX
## Topic & Scope
- Implement vulnerability triage workspace with VEX-first decisioning UX aligned with industry patterns (Snyk, GitLab, Harbor/Trivy, Anchore).
- Build evidence-first finding cards, VEX modal, attestation views, and audit bundle export.
- **Working directory:** `src/UI/StellaOps.UI`
## Dependencies & Concurrency
- Upstream sprints: SPRINT_0209_0001_0001_ui_i (UI I), SPRINT_210_ui_ii (UI II - VEX tab).
- Backend dependencies: Vuln Explorer APIs (`/v1/findings`, `/v1/vex-decisions`), Attestor service, Export Center.
- Parallel tracks: Can run alongside UI II/III for shared component work.
- Blockers to flag: VEX decision API schema finalization, Attestation viewer predicates.
## Documentation Prerequisites
- `docs/README.md`
- `docs/07_HIGH_LEVEL_ARCHITECTURE.md`
- `docs/modules/platform/architecture-overview.md`
- `docs/modules/ui/architecture.md`
- `docs/modules/vuln-explorer/architecture.md`
- `docs/modules/vex-lens/architecture.md`
- `docs/product-advisories/28-Nov-2025 - Vulnerability Triage UX & VEX-First Decisioning.md` (canonical)
- `docs/product-advisories/27-Nov-2025 - Explainability Layer for Vulnerability Verdicts.md`
- `docs/schemas/vex-decision.schema.json`
- `docs/schemas/audit-bundle-index.schema.json`
## Delivery Tracker
| # | Task ID | Status | Key dependency / next step | Owners | Task Definition |
| --- | --- | --- | --- | --- | --- |
| 1 | UI-TRIAGE-01-001 | TODO | - | UI Guild (src/UI/StellaOps.UI) | Create Artifacts List view with columns: Artifact, Type, Environment(s), Open/Total vulns, Max severity, Attestations badge, Last scan. Include sorting, filtering, and "View vulnerabilities" primary action. |
| 2 | UI-TRIAGE-01-002 | TODO | UI-TRIAGE-01-001 | UI Guild (src/UI/StellaOps.UI) | Build Vulnerability Workspace split layout: left panel with finding cards (CVE, package, severity, path), right panel with Explainability tabs (Overview, Reachability, Policy, Attestations). |
| 3 | UI-TRIAGE-01-003 | TODO | UI-TRIAGE-01-002 | UI Guild (src/UI/StellaOps.UI) | Implement evidence-first Finding Card component with severity badge, package info, location path, and primary actions (Fix PR, VEX, Attach Evidence). Include `New`, `VEX: Not affected`, `Policy: blocked` badges. |
| 4 | UI-TRIAGE-01-004 | TODO | UI-TRIAGE-01-003 | UI Guild (src/UI/StellaOps.UI) | Build Explainability Panel Overview tab: title, severity, package/version, scanner+DB date, finding history timeline, current VEX decision summary. |
| 5 | UI-TRIAGE-01-005 | TODO | UI-TRIAGE-01-004 | UI Guild (src/UI/StellaOps.UI) | Build Explainability Panel Reachability tab: call path visualization, module list, runtime usage indicators (when available from scanner). |
| 6 | UI-TRIAGE-01-006 | TODO | UI-TRIAGE-01-004 | UI Guild (src/UI/StellaOps.UI) | Build Explainability Panel Policy tab: policy evaluation result, gate details with "this gate failed because..." explanation, links to gate definitions. |
| 7 | UI-TRIAGE-01-007 | TODO | UI-TRIAGE-01-004 | UI Guild (src/UI/StellaOps.UI) | Build Explainability Panel Attestations tab: list attestations mentioning artifact/vulnerabilityId/scan with type, subject, predicate, signer, verified badge. |
| 8 | UI-VEX-02-001 | TODO | UI-TRIAGE-01-003 | UI Guild; Excititor Guild (src/UI/StellaOps.UI) | Create VEX Modal component with status radio buttons (Not Affected, Affected-mitigated, Affected-unmitigated, Fixed), justification type select, justification text area. |
| 9 | UI-VEX-02-002 | TODO | UI-VEX-02-001 | UI Guild (src/UI/StellaOps.UI) | Add VEX Modal scope section: environments multi-select, projects multi-select with clear scope preview. |
| 10 | UI-VEX-02-003 | TODO | UI-VEX-02-002 | UI Guild (src/UI/StellaOps.UI) | Add VEX Modal validity section: notBefore date (default now), notAfter date with expiry recommendations and warnings for long durations. |
| 11 | UI-VEX-02-004 | TODO | UI-VEX-02-003 | UI Guild (src/UI/StellaOps.UI) | Add VEX Modal evidence section: add links (PR, ticket, doc, commit), attach attestation picker, evidence preview list with remove action. |
| 12 | UI-VEX-02-005 | TODO | UI-VEX-02-004 | UI Guild (src/UI/StellaOps.UI) | Add VEX Modal review section: summary preview of VEX statement to be created, "Will generate signed attestation" indicator, View raw JSON toggle for power users. |
| 13 | UI-VEX-02-006 | TODO | UI-VEX-02-005 | UI Guild (src/UI/StellaOps.UI) | Wire VEX Modal to backend: POST /vex-decisions on save, handle success/error states, update finding card VEX badge on completion. |
| 14 | UI-VEX-02-007 | TODO | UI-VEX-02-006 | UI Guild (src/UI/StellaOps.UI) | Add bulk VEX action: multi-select findings from list, open VEX modal with bulk context, apply decision to all selected findings. |
| 15 | UI-ATT-03-001 | TODO | UI-TRIAGE-01-007 | UI Guild; Attestor Guild (src/UI/StellaOps.UI) | Create Attestations View per artifact: table with Type, Subject, Predicate type, Scanner/policy engine, Signer (keyId + trusted badge), Created at, Verified status. |
| 16 | UI-ATT-03-002 | TODO | UI-ATT-03-001 | UI Guild (src/UI/StellaOps.UI) | Build Attestation Detail modal: header (statement id, subject, signer), predicate preview (vuln scan counts, SBOM bomRef, VEX decision status), verify command snippet. |
| 17 | UI-ATT-03-003 | TODO | UI-ATT-03-002 | UI Guild (src/UI/StellaOps.UI) | Add "Signed evidence" pill to finding cards: clicking opens attestation detail modal, shows human-readable JSON view. |
| 18 | UI-GATE-04-001 | TODO | UI-TRIAGE-01-006 | UI Guild; Policy Guild (src/UI/StellaOps.UI) | Create Policy & Gating View: matrix of gates vs subject types (CI Build, Registry Admission, Runtime Admission), rule descriptions, last evaluation stats. |
| 19 | UI-GATE-04-002 | TODO | UI-GATE-04-001 | UI Guild (src/UI/StellaOps.UI) | Add gate drill-down: recent evaluations list, artifact links, policy attestation links, condition failure explanations. |
| 20 | UI-GATE-04-003 | TODO | UI-GATE-04-002 | UI Guild (src/UI/StellaOps.UI) | Add "Ready to deploy" badge on artifact cards when all gates pass and required attestations verified. |
| 21 | UI-AUDIT-05-001 | TODO | UI-TRIAGE-01-001 | UI Guild; Export Center Guild (src/UI/StellaOps.UI) | Create "Create immutable audit bundle" button on Artifact page, Pipeline run detail, and Policy evaluation detail views. |
| 22 | UI-AUDIT-05-002 | TODO | UI-AUDIT-05-001 | UI Guild (src/UI/StellaOps.UI) | Build Audit Bundle creation wizard: subject artifact+digest selection, time window picker, content checklist (Vuln reports, SBOM, VEX, Policy evals, Attestations). |
| 23 | UI-AUDIT-05-003 | TODO | UI-AUDIT-05-002 | UI Guild (src/UI/StellaOps.UI) | Wire audit bundle creation to POST /audit-bundles, show progress, display bundle ID, hash, download button, and OCI reference on completion. |
| 24 | UI-AUDIT-05-004 | TODO | UI-AUDIT-05-003 | UI Guild (src/UI/StellaOps.UI) | Add audit bundle history view: list previously created bundles with bundleId, createdAt, subject, download/view actions. |
| 25 | API-VEX-06-001 | TODO | - | API Guild (src/VulnExplorer) | Implement POST /v1/vex-decisions endpoint with VexDecisionDto request/response per schema, validation, attestation generation trigger. |
| 26 | API-VEX-06-002 | TODO | API-VEX-06-001 | API Guild (src/VulnExplorer) | Implement PATCH /v1/vex-decisions/{id} for updating existing decisions with supersedes tracking. |
| 27 | API-VEX-06-003 | TODO | API-VEX-06-002 | API Guild (src/VulnExplorer) | Implement GET /v1/vex-decisions with filters for vulnerabilityId, subject, status, scope, validFor. |
| 28 | API-AUDIT-07-001 | TODO | - | API Guild (src/ExportCenter) | Implement POST /v1/audit-bundles endpoint with bundle creation, index generation, ZIP/OCI artifact production. |
| 29 | API-AUDIT-07-002 | TODO | API-AUDIT-07-001 | API Guild (src/ExportCenter) | Implement GET /v1/audit-bundles/{bundleId} for bundle download with integrity verification. |
| 30 | SCHEMA-08-001 | TODO | - | Platform Guild | Create docs/schemas/vex-decision.schema.json with JSON Schema 2020-12 definition per advisory. |
| 31 | SCHEMA-08-002 | TODO | SCHEMA-08-001 | Platform Guild | Create docs/schemas/attestation-vuln-scan.schema.json for vulnerability scan attestation predicate. |
| 32 | SCHEMA-08-003 | TODO | SCHEMA-08-002 | Platform Guild | Create docs/schemas/audit-bundle-index.schema.json for audit bundle manifest structure. |
| 33 | DTO-09-001 | TODO | SCHEMA-08-001 | API Guild | Create VexDecisionDto, SubjectRefDto, EvidenceRefDto, VexScopeDto, ValidForDto C# DTOs per advisory. |
| 34 | DTO-09-002 | TODO | SCHEMA-08-002 | API Guild | Create VulnScanAttestationDto, AttestationSubjectDto, VulnScanPredicateDto C# DTOs per advisory. |
| 35 | DTO-09-003 | TODO | SCHEMA-08-003 | API Guild | Create AuditBundleIndexDto, BundleArtifactDto, BundleVexDecisionEntryDto C# DTOs per advisory. |
| 36 | TS-10-001 | TODO | SCHEMA-08-001 | UI Guild | Create TypeScript interfaces for VexDecision, SubjectRef, EvidenceRef, VexScope, ValidFor per advisory. |
| 37 | TS-10-002 | TODO | SCHEMA-08-002 | UI Guild | Create TypeScript interfaces for VulnScanAttestation, AttestationSubject, VulnScanPredicate per advisory. |
| 38 | TS-10-003 | TODO | SCHEMA-08-003 | UI Guild | Create TypeScript interfaces for AuditBundleIndex, BundleArtifact, BundleVexDecisionEntry per advisory. |
## Wave Coordination
- **Wave A (Schemas & DTOs):** SCHEMA-08-*, DTO-09-*, TS-10-* - Foundation work
- **Wave B (Backend APIs):** API-VEX-06-*, API-AUDIT-07-* - Depends on Wave A
- **Wave C (UI Components):** UI-TRIAGE-01-*, UI-VEX-02-*, UI-ATT-03-*, UI-GATE-04-*, UI-AUDIT-05-* - Depends on Wave A, can start mockable components in parallel
## Wave Detail Snapshots
### Wave A - Schemas & Types
- Duration: 2-3 days
- Deliverables: JSON schemas in docs/schemas/, C# DTOs in src/VulnExplorer, TypeScript interfaces in src/UI
- Exit criteria: Schemas validate, DTOs compile, TS interfaces pass type checks
### Wave B - Backend APIs
- Duration: 3-5 days
- Deliverables: VEX decision CRUD endpoints, audit bundle generation endpoint
- Exit criteria: API tests pass, OpenAPI spec updated, deterministic outputs verified
### Wave C - UI Components
- Duration: 5-7 days
- Deliverables: Triage workspace, VEX modal, attestation views, audit bundle wizard
- Exit criteria: Accessibility audit passes, responsive design verified, e2e tests green
## Interlocks
- VEX-Lens module (Excititor) for VEX document normalization and consensus
- Attestor service for VEX attestation signing
- Export Center for audit bundle ZIP/OCI generation
- Policy Engine for gate evaluation data
## Upcoming Checkpoints
- 2025-12-02 15:00 UTC - Schema review (owners: Platform Guild, API Guild)
- 2025-12-05 15:00 UTC - API contract freeze (owners: API Guild, UI Guild)
- 2025-12-10 15:00 UTC - UI component review (owners: UI Guild, UX)
- 2025-12-13 15:00 UTC - Integration testing go/no-go (owners: All guilds)
## Action Tracker
| # | Action | Owner | Due | Status |
| --- | --- | --- | --- | --- |
| 1 | Finalize VEX decision schema with Excititor team | Platform Guild | 2025-12-02 | TODO |
| 2 | Confirm attestation predicate types with Attestor team | API Guild | 2025-12-03 | TODO |
| 3 | Review audit bundle format with Export Center team | API Guild | 2025-12-04 | TODO |
| 4 | Accessibility review of VEX modal with Accessibility Guild | UI Guild | 2025-12-09 | TODO |
## Decisions & Risks
| Risk | Impact | Mitigation / Next Step |
| --- | --- | --- |
| VEX schema changes after Wave A | Rework DTOs and TS interfaces | Lock schema by checkpoint 1; version DTOs if needed |
| Attestation service not ready | UI-ATT-* tasks blocked | Mock attestation data; feature flag attestation views |
| Export Center capacity | Audit bundle generation slow | Async generation with progress; queue management |
| Bulk VEX operations performance | UI-VEX-02-007 slow for large selections | Batch API endpoint; pagination; background processing |
## Execution Log
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2025-11-28 | Sprint created from product advisory `28-Nov-2025 - Vulnerability Triage UX & VEX-First Decisioning.md`. 38 tasks defined across 5 UI task groups, 2 API task groups, 3 schema tasks, 3 DTO tasks, 3 TS interface tasks. | Project mgmt |
---
*Sprint created: 2025-11-28*
@@ -33,10 +33,10 @@ Dependency: Sprint 135 - 6. Scanner.VI — Scanner & Surface focus on Scanner (p
| `SCANNER-ENG-0021` | DONE (2025-11-28) | Implement pkgutil receipt collector per `design/macos-analyzer.md` §3.2. | Scanner Guild (docs/modules/scanner) | — |
| `SCANNER-ENG-0022` | DONE (2025-11-28) | Implement macOS bundle inspector & capability overlays per `design/macos-analyzer.md` §3.3. | Scanner Guild, Policy Guild (docs/modules/scanner) | — |
| `SCANNER-ENG-0023` | DONE (2025-11-28) | Deliver macOS policy/offline integration per `design/macos-analyzer.md` §5–6. | Scanner Guild, Offline Kit Guild, Policy Guild (docs/modules/scanner) | — |
| `SCANNER-ENG-0024` | TODO → DONE (2025-11-28) | Implement Windows MSI collector per `design/windows-analyzer.md` §3.1. | Scanner Guild (docs/modules/scanner) | — |
| `SCANNER-ENG-0025` | TODO → DONE (2025-11-28) | Implement WinSxS manifest collector per `design/windows-analyzer.md` §3.2. | Scanner Guild (docs/modules/scanner) | — |
| `SCANNER-ENG-0026` | TODO → DONE (2025-11-28) | Implement Windows Chocolatey & registry collectors per `design/windows-analyzer.md` §3.3–3.4. | Scanner Guild (docs/modules/scanner) | — |
| `SCANNER-ENG-0027` | TODO → DONE (2025-11-28) | Deliver Windows policy/offline integration per `design/windows-analyzer.md` §5–6. | Scanner Guild, Policy Guild, Offline Kit Guild (docs/modules/scanner) | — |
| `SCHED-SURFACE-02` | TODO | Integrate Scheduler worker prefetch using Surface manifest reader and persist manifest pointers with rerun plans. | Scheduler Worker Guild (src/Scheduler/__Libraries/StellaOps.Scheduler.Worker) | SURFACE-FS-02, SCHED-SURFACE-01. Reference `docs/modules/scanner/design/surface-fs-consumers.md` §3 for implementation checklist |
| `ZASTAVA-SURFACE-02` | TODO | Use Surface manifest reader helpers to resolve `cas://` pointers and enrich drift diagnostics with manifest provenance. | Zastava Observer Guild (src/Zastava/StellaOps.Zastava.Observer) | SURFACE-FS-02, ZASTAVA-SURFACE-01. Reference `docs/modules/scanner/design/surface-fs-consumers.md` §4 for integration steps |
| `SURFACE-FS-03` | DONE (2025-11-27) | Integrate Surface.FS writer into Scanner Worker analyzer pipeline to persist layer + entry-trace fragments. | Scanner Guild (src/Scanner/__Libraries/StellaOps.Scanner.Surface.FS) | SURFACE-FS-02 |
@@ -90,3 +90,7 @@ Dependency: Sprint 135 - 6. Scanner.VI — Scanner & Surface focus on Scanner (p
| 2025-11-28 | Created `docs/modules/scanner/guides/surface-fs-workflow.md` with end-to-end workflow including artefact generation, storage layout, consumption, and offline kit handling; SURFACE-FS-06 DONE. | Implementer |
| 2025-11-28 | Created `StellaOps.Scanner.Analyzers.OS.Homebrew` library with `HomebrewReceiptParser` (INSTALL_RECEIPT.json parsing), `HomebrewPackageAnalyzer` (Cellar discovery for Intel/Apple Silicon), and `HomebrewAnalyzerPlugin`; added `BuildHomebrew` PURL builder, `HomebrewCellar` evidence source; 23 tests passing. SCANNER-ENG-0020 DONE. | Implementer |
| 2025-11-28 | Created `StellaOps.Scanner.Analyzers.OS.Pkgutil` library with `PkgutilReceiptParser` (plist parsing), `BomParser` (BOM file enumeration), `PkgutilPackageAnalyzer` (receipt discovery from /var/db/receipts), and `PkgutilAnalyzerPlugin`; added `BuildPkgutil` PURL builder, `PkgutilReceipt` evidence source; 9 tests passing. SCANNER-ENG-0021 DONE. | Implementer |
| 2025-11-28 | Created `StellaOps.Scanner.Analyzers.OS.Windows.Msi` library with `MsiDatabaseParser` (OLE compound document parser), `MsiPackageAnalyzer` (Windows/Installer/*.msi discovery), and `MsiAnalyzerPlugin`; added `BuildWindowsMsi` PURL builder, `WindowsMsi` evidence source; 22 tests passing. SCANNER-ENG-0024 DONE. | Implementer |
| 2025-11-28 | Created `StellaOps.Scanner.Analyzers.OS.Windows.WinSxS` library with `WinSxSManifestParser` (XML assembly identity parser), `WinSxSPackageAnalyzer` (WinSxS/Manifests/*.manifest discovery), and `WinSxSAnalyzerPlugin`; added `BuildWindowsWinSxS` PURL builder, `WindowsWinSxS` evidence source; 18 tests passing. SCANNER-ENG-0025 DONE. | Implementer |
| 2025-11-28 | Created `StellaOps.Scanner.Analyzers.OS.Windows.Chocolatey` library with `NuspecParser` (nuspec + directory name fallback), `ChocolateyPackageAnalyzer` (ProgramData/Chocolatey/lib discovery), and `ChocolateyAnalyzerPlugin`; added `BuildChocolatey` PURL builder, `WindowsChocolatey` evidence source; 44 tests passing. SCANNER-ENG-0026 DONE. | Implementer |
| 2025-11-28 | Updated `docs/modules/scanner/design/windows-analyzer.md` with implementation status section documenting MSI/WinSxS/Chocolatey collector details, PURL formats, and vendor metadata schemas; registry collector deferred, policy predicates pending Policy module integration. SCANNER-ENG-0027 DONE. | Implementer |
@@ -15,8 +15,8 @@ ORCH-SVC-33-001 | TODO | Enable `sources test. Dependencies: ORCH-SVC-32-005. |
ORCH-SVC-33-002 | TODO | Implement per-source/tenant adaptive token-bucket rate limiter, concurrency caps, and backpressure signals reacting to upstream 429/503. Dependencies: ORCH-SVC-33-001. | Orchestrator Service Guild (src/Orchestrator/StellaOps.Orchestrator)
ORCH-SVC-33-003 | TODO | Add watermark/backfill manager with event-time windows, duplicate suppression, dry-run preview endpoint, and safety validations. Dependencies: ORCH-SVC-33-002. | Orchestrator Service Guild (src/Orchestrator/StellaOps.Orchestrator)
ORCH-SVC-33-004 | TODO | Deliver dead-letter store, replay endpoints, and error classification surfaces with remediation hints + notification hooks. Dependencies: ORCH-SVC-33-003. | Orchestrator Service Guild (src/Orchestrator/StellaOps.Orchestrator)
ORCH-SVC-34-001 | TODO → DONE | Implement quota management APIs, per-tenant SLO burn-rate computation, and alert budget tracking surfaced via metrics. Dependencies: ORCH-SVC-33-004. | Orchestrator Service Guild (src/Orchestrator/StellaOps.Orchestrator)
ORCH-SVC-34-002 | TODO → DONE | Build audit log + immutable run ledger export with signed manifest support, including provenance chain to artifacts. Dependencies: ORCH-SVC-34-001. | Orchestrator Service Guild (src/Orchestrator/StellaOps.Orchestrator)
ORCH-SVC-34-003 | TODO | Execute perf/scale validation (≥10k pending jobs, dispatch P95 <150ms) and add autoscaling hooks with health probes. Dependencies: ORCH-SVC-34-002. | Orchestrator Service Guild (src/Orchestrator/StellaOps.Orchestrator)
ORCH-SVC-34-004 | TODO | Package orchestrator container, Helm overlays, offline bundle seeds, provenance attestations, and compliance checklist for GA. Dependencies: ORCH-SVC-34-003. | Orchestrator Service Guild (src/Orchestrator/StellaOps.Orchestrator)
ORCH-SVC-35-101 | TODO | Register `export` job type with quotas/rate policies, expose telemetry, and ensure exporter workers heartbeat via orchestrator contracts. Dependencies: ORCH-SVC-34-004. | Orchestrator Service Guild (src/Orchestrator/StellaOps.Orchestrator)
@@ -1,15 +1,25 @@
# Sprint 185 - Replay Core · 185.A) Shared Replay Primitives
[Replay Core] 185.A) Shared Replay Primitives
Depends on: Sprint 160 Export & Evidence
Summary: Stand up a shared replay library, hashing/canonicalisation helpers, and baseline documentation for deterministic bundles.
Task ID | State | Task description | Owners (Source)
--- | --- | --- | ---
REPLAY-CORE-185-001 | TODO → DONE (2025-11-28) | Scaffold `StellaOps.Replay.Core` with manifest schema types, canonical JSON rules, Merkle utilities, and DSSE payload builders; add `AGENTS.md`/`TASKS.md` for the new library; cross-reference `docs/replay/DETERMINISTIC_REPLAY.md` section 3 when updating the library charter. | BE-Base Platform Guild (`src/__Libraries/StellaOps.Replay.Core`)
REPLAY-CORE-185-002 | TODO → DONE (2025-11-28) | Implement deterministic bundle writer (tar.zst, CAS naming) and hashing abstractions, updating `docs/modules/platform/architecture-overview.md` with a "Replay CAS" subsection that documents layout/retention expectations. | Platform Guild (src/__Libraries/StellaOps.Replay.Core)
REPLAY-CORE-185-003 | TODO → DONE (2025-11-28) | Define Mongo collections (`replay_runs`, `replay_bundles`, `replay_subjects`) and indices, then author `docs/data/replay_schema.md` detailing schema fields, constraints, and offline sync strategy. | Platform Data Guild (src/__Libraries/StellaOps.Replay.Core)
DOCS-REPLAY-185-003 | TODO → DONE (2025-11-28) | Author `docs/data/replay_schema.md` detailing `replay_runs`, `replay_bundles`, `replay_subjects` collections, index guidance, and offline sync strategy aligned with Replay CAS. | Docs Guild, Platform Data Guild (docs)
DOCS-REPLAY-185-004 | TODO → DONE (2025-11-28) | Expand `docs/replay/DEVS_GUIDE_REPLAY.md` with integration guidance for consuming services (Scanner, Evidence Locker, CLI) and add checklist derived from `docs/replay/DETERMINISTIC_REPLAY.md` Section 11. | Docs Guild (docs)
> 2025-11-03: Replay CAS section published in `docs/modules/platform/architecture-overview.md` §5 — owners can move REPLAY-CORE-185-001/002 to **DOING** once library scaffolding begins.
## Implementation Status (2025-11-28)
All tasks verified complete:
- **REPLAY-CORE-185-001**: Library scaffolded with `CanonicalJson.cs`, `DeterministicHash.cs`, `DsseEnvelope.cs`, `ReplayManifest.cs`, `ReplayManifestExtensions.cs`; `AGENTS.md` published.
- **REPLAY-CORE-185-002**: `ReplayBundleWriter.cs` and `ReplayBundleEntry.cs` implement tar.zst CAS bundle operations; Replay CAS documented in architecture-overview.md §5.
- **REPLAY-CORE-185-003**: `ReplayMongoModels.cs` defines `ReplayRunDocument`, `ReplayBundleDocument`, `ReplaySubjectDocument` with `ReplayIndexes` constants.
- **DOCS-REPLAY-185-003**: `docs/data/replay_schema.md` published with collection schemas, indexes, and determinism constraints.
- **DOCS-REPLAY-185-004**: `docs/replay/DEVS_GUIDE_REPLAY.md` expanded with developer checklist, storage schema references, and workflow guidance.
@@ -5,6 +5,14 @@ Active items only. Completed/historic work now resides in docs/implplan/archived
[Experience & SDKs] 180.E) UI.II
Depends on: Sprint 180.E - UI.I
Summary: Experience & SDKs focus on UI (phase II).
## Related Sprints & Advisories
- **SPRINT_0215_0001_0001_vuln_triage_ux.md** - Comprehensive vulnerability triage UX with VEX-first decisioning
- **Advisory:** `28-Nov-2025 - Vulnerability Triage UX & VEX-First Decisioning.md`
- **Schemas:** `docs/schemas/vex-decision.schema.json`, `docs/schemas/audit-bundle-index.schema.json`
Note: UI-LNM-22-003 (VEX tab) should align with VEX decision model defined in SPRINT_0215. The VEX modal and decision workflows are detailed in the new sprint.
Task ID | State | Task description | Owners (Source)
--- | --- | --- | ---
UI-LNM-22-002 | TODO | Implement filters (source, severity bucket, conflict-only, CVSS vector presence) and pagination/lazy loading for large linksets. Docs depend on finalized filtering UX. Dependencies: UI-LNM-22-001. | UI Guild (src/UI/StellaOps.UI)
@@ -0,0 +1,89 @@
# PostgreSQL Conversion Project Overview
## Project Summary
**Objective:** Convert StellaOps control-plane domains from MongoDB to PostgreSQL using a strangler fig pattern for gradual rollout.
**Timeline:** 10-12 sprints (Phases 0-7)
**Reference Documentation:** `docs/db/` directory
## Sprint Index
| Sprint | Phase | Module | Status | Dependencies |
| --- | --- | --- | --- | --- |
| [3400](SPRINT_3400_0001_0001_postgres_foundations.md) | 0 | Foundations | IN_PROGRESS | None |
| [3401](SPRINT_3401_0001_0001_postgres_authority.md) | 1 | Authority | TODO | Phase 0 |
| [3402](SPRINT_3402_0001_0001_postgres_scheduler.md) | 2 | Scheduler | TODO | Phase 0 |
| [3403](SPRINT_3403_0001_0001_postgres_notify.md) | 3 | Notify | TODO | Phase 0 |
| [3404](SPRINT_3404_0001_0001_postgres_policy.md) | 4 | Policy | TODO | Phase 0 |
| [3405](SPRINT_3405_0001_0001_postgres_vulnerabilities.md) | 5 | Vulnerabilities | TODO | Phase 0 |
| [3406](SPRINT_3406_0001_0001_postgres_vex_graph.md) | 6 | VEX & Graph | TODO | Phase 5 |
| [3407](SPRINT_3407_0001_0001_postgres_cleanup.md) | 7 | Cleanup | TODO | All |
## Dependency Graph
```
Phase 0 (Foundations)
├─→ Phase 1 (Authority) ──┐
├─→ Phase 2 (Scheduler) ──┤
├─→ Phase 3 (Notify) ──┼─→ Phase 7 (Cleanup)
├─→ Phase 4 (Policy) ──┤
└─→ Phase 5 (Vulnerabilities) ─→ Phase 6 (VEX/Graph) ─┘
```
## Key Principles
1. **Strangler Fig Pattern:** Introduce PostgreSQL repositories alongside MongoDB, gradually switch per module.
2. **Dual-Write for Tier A:** Critical data (auth, tokens) uses dual-write during transition.
3. **Determinism Preserved:** Same inputs must produce identical outputs (especially graph_revision_id).
4. **Multi-Tenancy:** Row-level isolation via `tenant_id` column.
5. **Offline-First:** All operations must work in air-gapped environments.
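The sketch below shows, under stated assumptions, how the strangler fig switch can surface in DI: one repository interface, one registered implementation per configured backend. The backend values (Mongo/Postgres/DualWrite) come from the Phase 0 `PersistenceOptions` work (PG-T0.5.2); every other name here is a placeholder.

```csharp
using Microsoft.Extensions.DependencyInjection;

// Illustrative only: the backend values mirror PersistenceOptions from Phase 0;
// the repository interface and implementations below are placeholder stubs.
public enum PersistenceBackend { Mongo, Postgres, DualWrite }

public interface ITenantRepository { /* ... */ }
public sealed class MongoTenantRepository : ITenantRepository { }
public sealed class PostgresTenantRepository : ITenantRepository { }
public sealed class DualWriteTenantRepository : ITenantRepository { }

public static class PersistenceRegistration
{
    // Registers exactly one implementation per configured backend, which is what
    // lets each module cut over to PostgreSQL independently (strangler fig).
    public static IServiceCollection AddTenantPersistence(
        this IServiceCollection services, PersistenceBackend backend) =>
        backend switch
        {
            PersistenceBackend.Postgres  => services.AddSingleton<ITenantRepository, PostgresTenantRepository>(),
            PersistenceBackend.DualWrite => services.AddSingleton<ITenantRepository, DualWriteTenantRepository>(),
            _                            => services.AddSingleton<ITenantRepository, MongoTenantRepository>(),
        };
}
```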
## Data Tiering
| Tier | Examples | Migration Strategy |
| --- | --- | --- |
| **Tier A (Critical)** | Tenants, users, tokens, API keys | Dual-write, extensive verification |
| **Tier B (Important)** | Jobs, advisories, VEX statements | Conversion with comparison tests |
| **Tier C (Ephemeral)** | Metrics, audit logs | Recreate from scratch |
## Critical Success Factors
1. **Graph Revision ID Stability** - Phase 6 determinism is CRITICAL
2. **Vulnerability Matching Parity** - Phase 5 must produce identical results
3. **Zero Data Loss** - Tier A data must be 100% preserved
4. **Performance Parity** - PostgreSQL must match or exceed MongoDB performance
## Documentation
| Document | Location | Purpose |
| --- | --- | --- |
| Specification | `docs/db/SPECIFICATION.md` | Complete PostgreSQL schema design |
| Rules | `docs/db/RULES.md` | Coding conventions and patterns |
| Verification | `docs/db/VERIFICATION.md` | Testing requirements |
| Conversion Plan | `docs/db/CONVERSION_PLAN.md` | Strategic plan |
| Task Definitions | `docs/db/tasks/PHASE_*.md` | Detailed task breakdowns |
## Current Status
### Phase 0: Foundations - IN PROGRESS
- [x] `StellaOps.Infrastructure.Postgres` library created
- [x] `DataSourceBase` implemented
- [x] `RepositoryBase` implemented
- [x] `MigrationRunner` implemented
- [x] `PostgresOptions` and `PersistenceOptions` created
- [x] `PostgresFixture` for testing created
- [ ] Projects added to solution file
- [ ] PostgreSQL cluster provisioned
- [ ] CI pipeline integrated
### Upcoming
- Phase 1-4 can run in parallel after Phase 0 completes
- Phase 5 must complete before Phase 6
- Phase 7 runs after all other phases complete
---
*Created: 2025-11-28*
*Last Updated: 2025-11-28*
@@ -0,0 +1,74 @@
# Sprint 3400 · PostgreSQL Conversion: Phase 0 - Foundations
## Topic & Scope
- Phase 0 of MongoDB to PostgreSQL conversion: Infrastructure & shared library setup.
- Create shared PostgreSQL infrastructure library (`StellaOps.Infrastructure.Postgres`).
- Establish patterns for DataSource, Repository, and Migration framework.
- Set up CI/CD pipeline for PostgreSQL testing.
- **Working directory:** src/__Libraries/StellaOps.Infrastructure.Postgres
## Dependencies & Concurrency
- Upstream: None (foundational work).
- Concurrency: Independent; must complete before Phase 1-7 sprints begin.
- Reference: `docs/db/tasks/PHASE_0_FOUNDATIONS.md`
## Documentation Prerequisites
- docs/db/README.md
- docs/db/SPECIFICATION.md
- docs/db/RULES.md
- docs/db/VERIFICATION.md
- docs/db/CONVERSION_PLAN.md
## Delivery Tracker
| # | Task ID | Status | Key dependency / next step | Owners | Task Definition |
| --- | --- | --- | --- | --- | --- |
| 1 | PG-T0.1.1 | DONE | Infrastructure library created | Infrastructure Guild | Create `StellaOps.Infrastructure.Postgres` project structure |
| 2 | PG-T0.1.2 | DONE | NuGet references added | Infrastructure Guild | Add Npgsql 9.x and Microsoft.Extensions packages |
| 3 | PG-T0.2.1 | DONE | DataSourceBase implemented | Infrastructure Guild | Create abstract `DataSourceBase` class with connection pooling |
| 4 | PG-T0.2.2 | DONE | Tenant context implemented | Infrastructure Guild | Implement `OpenConnectionAsync` with `SET app.current_tenant` |
| 5 | PG-T0.2.3 | DONE | Session configuration implemented | Infrastructure Guild | Add UTC timezone, statement timeout, search path |
| 6 | PG-T0.3.1 | DONE | RepositoryBase implemented | Infrastructure Guild | Create `RepositoryBase<TDataSource>` with query helpers |
| 7 | PG-T0.3.2 | DONE | Parameter helpers implemented | Infrastructure Guild | Add JSONB, array, and nullable parameter helpers |
| 8 | PG-T0.3.3 | DONE | Pagination helpers implemented | Infrastructure Guild | Add `BuildOrderByClause` and `BuildPaginationClause` |
| 9 | PG-T0.4.1 | DONE | MigrationRunner implemented | Infrastructure Guild | Create SQL migration runner with checksum tracking |
| 10 | PG-T0.4.2 | DONE | Schema management implemented | Infrastructure Guild | Add schema creation and migration table setup |
| 11 | PG-T0.5.1 | DONE | PostgresOptions created | Infrastructure Guild | Create options class for connection settings |
| 12 | PG-T0.5.2 | DONE | PersistenceOptions created | Infrastructure Guild | Create backend switching options (Mongo/Postgres/DualWrite) |
| 13 | PG-T0.5.3 | DONE | DI extensions created | Infrastructure Guild | Create `ServiceCollectionExtensions` for registration |
| 14 | PG-T0.6.1 | DONE | PostgresFixture created | Infrastructure Guild | Create test fixture with Testcontainers support |
| 15 | PG-T0.6.2 | DONE | Test project created | Infrastructure Guild | Create `StellaOps.Infrastructure.Postgres.Tests` project |
| 16 | PG-T0.6.3 | DONE | Exception helpers created | Infrastructure Guild | Create `PostgresExceptionHelper` for error handling |
| 17 | PG-T0.7 | DONE | Update solution file | Infrastructure Guild | Add new projects to `StellaOps.sln` |
| 18 | PG-T0.8 | TODO | PostgreSQL cluster provisioning | DevOps Guild | Provision PostgreSQL 16 for staging/production |
| 19 | PG-T0.9 | TODO | CI pipeline integration | DevOps Guild | Add PostgreSQL Testcontainers to CI workflow |
## Execution Log
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2025-11-28 | Created `StellaOps.Infrastructure.Postgres` library with DataSourceBase, RepositoryBase, MigrationRunner | Infrastructure Guild |
| 2025-11-28 | Added PostgresOptions, PersistenceOptions, and ServiceCollectionExtensions | Infrastructure Guild |
| 2025-11-28 | Created PostgresFixture for Testcontainers integration | Infrastructure Guild |
| 2025-11-28 | Created test project; verified build succeeds | Infrastructure Guild |
| 2025-11-28 | Sprint file created | Planning |
| 2025-11-28 | Added all 7 PostgreSQL storage projects to StellaOps.sln | Infrastructure Guild |
| 2025-11-28 | Created DataSource classes for all 6 modules | Infrastructure Guild |
| 2025-11-28 | Created repository implementations for Authority, Scheduler, Concelier, Excititor | Infrastructure Guild |
| 2025-11-28 | All PostgreSQL storage projects build successfully | Infrastructure Guild |
## Decisions & Risks
- Using Npgsql 9.x for latest features and performance improvements.
- Tenant context set via `set_config('app.current_tenant', ...)` for RLS compatibility.
- Migration runner uses SHA256 checksums for change detection.
- Test isolation via unique schema names per test class.
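As a reference for the tenant-context decision above, a minimal sketch of what a tenant-scoped `OpenConnectionAsync` can look like; the actual `DataSourceBase` in `StellaOps.Infrastructure.Postgres` is authoritative and may differ in shape.

```csharp
using Npgsql;

// Sketch only: mirrors the decision to set tenant context via set_config for RLS.
public abstract class DataSourceSketch
{
    private readonly NpgsqlDataSource _dataSource;

    protected DataSourceSketch(NpgsqlDataSource dataSource) => _dataSource = dataSource;

    public async Task<NpgsqlConnection> OpenConnectionAsync(string tenantId, CancellationToken ct = default)
    {
        var connection = await _dataSource.OpenConnectionAsync(ct);

        // set_config(..., is_local := false) scopes the value to the session so
        // RLS policies can read current_setting('app.current_tenant').
        await using var cmd = connection.CreateCommand();
        cmd.CommandText = "SELECT set_config('app.current_tenant', @tenant, false)";
        cmd.Parameters.AddWithValue("tenant", tenantId);
        await cmd.ExecuteScalarAsync(ct);

        return connection;
    }
}
```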
## Exit Criteria
- [ ] All infrastructure library components implemented and tested
- [ ] Projects added to solution file
- [ ] CI/CD pipeline running PostgreSQL tests
- [ ] PostgreSQL cluster provisioned for staging
## Next Checkpoints
- Phase 1 (Authority) can begin once CI pipeline is integrated.
---
*Reference: docs/db/tasks/PHASE_0_FOUNDATIONS.md*
@@ -0,0 +1,70 @@
# Sprint 3401 · PostgreSQL Conversion: Phase 1 - Authority Module
## Topic & Scope
- Phase 1 of MongoDB to PostgreSQL conversion: Authority module (IAM, tenants, tokens).
- Create `StellaOps.Authority.Storage.Postgres` project.
- Implement all 12+ repository interfaces for Authority schema.
- Tier A data: requires dual-write verification before cutover.
- **Working directory:** src/Authority/__Libraries/StellaOps.Authority.Storage.Postgres
## Dependencies & Concurrency
- Upstream: Sprint 3400 (Phase 0 - Foundations) must be DONE.
- Concurrency: Can run in parallel with Phase 2-4 after foundations complete.
- Reference: `docs/db/tasks/PHASE_1_AUTHORITY.md`
## Documentation Prerequisites
- docs/db/README.md
- docs/db/SPECIFICATION.md (Section 5.1 - Authority Schema)
- docs/db/RULES.md
- src/Authority/AGENTS.md
## Delivery Tracker
| # | Task ID | Status | Key dependency / next step | Owners | Task Definition |
| --- | --- | --- | --- | --- | --- |
| 1 | PG-T1.1 | TODO | Depends on PG-T0.7 | Authority Guild | Create `StellaOps.Authority.Storage.Postgres` project structure |
| 2 | PG-T1.2.1 | TODO | Depends on PG-T1.1 | Authority Guild | Create schema migration for `authority` schema |
| 3 | PG-T1.2.2 | TODO | Depends on PG-T1.2.1 | Authority Guild | Create `tenants` table with indexes |
| 4 | PG-T1.2.3 | TODO | Depends on PG-T1.2.1 | Authority Guild | Create `users`, `roles`, `permissions` tables |
| 5 | PG-T1.2.4 | TODO | Depends on PG-T1.2.1 | Authority Guild | Create `tokens`, `refresh_tokens`, `api_keys` tables |
| 6 | PG-T1.2.5 | TODO | Depends on PG-T1.2.1 | Authority Guild | Create `sessions`, `audit` tables |
| 7 | PG-T1.3 | TODO | Depends on PG-T1.2 | Authority Guild | Implement `AuthorityDataSource` class |
| 8 | PG-T1.4.1 | TODO | Depends on PG-T1.3 | Authority Guild | Implement `ITenantRepository` |
| 9 | PG-T1.4.2 | TODO | Depends on PG-T1.3 | Authority Guild | Implement `IUserRepository` with password hash handling |
| 10 | PG-T1.4.3 | TODO | Depends on PG-T1.3 | Authority Guild | Implement `IRoleRepository` |
| 11 | PG-T1.4.4 | TODO | Depends on PG-T1.3 | Authority Guild | Implement `IPermissionRepository` |
| 12 | PG-T1.5.1 | TODO | Depends on PG-T1.3 | Authority Guild | Implement `ITokenRepository` |
| 13 | PG-T1.5.2 | TODO | Depends on PG-T1.3 | Authority Guild | Implement `IRefreshTokenRepository` |
| 14 | PG-T1.5.3 | TODO | Depends on PG-T1.3 | Authority Guild | Implement `IApiKeyRepository` |
| 15 | PG-T1.6.1 | TODO | Depends on PG-T1.3 | Authority Guild | Implement `ISessionRepository` |
| 16 | PG-T1.6.2 | TODO | Depends on PG-T1.3 | Authority Guild | Implement `IAuditRepository` |
| 17 | PG-T1.7 | TODO | Depends on PG-T1.4-6 | Authority Guild | Add configuration switch in `ServiceCollectionExtensions` |
| 18 | PG-T1.8.1 | TODO | Depends on PG-T1.7 | Authority Guild | Write integration tests for all repositories |
| 19 | PG-T1.8.2 | TODO | Depends on PG-T1.8.1 | Authority Guild | Write determinism tests for token generation |
| 20 | PG-T1.9 | TODO | Depends on PG-T1.8 | Authority Guild | Optional: Implement dual-write wrapper for Tier A verification |
| 21 | PG-T1.10 | TODO | Depends on PG-T1.8 | Authority Guild | Run backfill from MongoDB to PostgreSQL |
| 22 | PG-T1.11 | TODO | Depends on PG-T1.10 | Authority Guild | Verify data integrity: row counts, checksums |
| 23 | PG-T1.12 | TODO | Depends on PG-T1.11 | Authority Guild | Switch Authority to PostgreSQL-only |
## Execution Log
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2025-11-28 | Sprint file created | Planning |
## Decisions & Risks
- Password hashes stored as TEXT; Argon2id parameters in separate columns.
- Token expiry uses `TIMESTAMPTZ` for timezone-aware comparisons.
- Audit log may grow large; consider partitioning by `created_at` in production.
- Dual-write mode optional but recommended for Tier A data verification.
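A minimal DDL sketch of the kind of tables the PG-T1.2.* migrations and the decisions above imply; column names are illustrative, and the migration files in the new storage project are authoritative.

```csharp
// Sketch only: shows Argon2id parameters stored alongside the TEXT hash and
// TIMESTAMPTZ expiry for timezone-aware comparisons. Not the final schema.
public static class AuthorityMigrationSketch
{
    public const string CreateUsers = """
        CREATE TABLE IF NOT EXISTS authority.users (
            id                 UUID PRIMARY KEY,
            tenant_id          UUID NOT NULL,
            username           TEXT NOT NULL,
            password_hash      TEXT NOT NULL,       -- Argon2id digest
            argon2_memory_kib  INTEGER NOT NULL,    -- Argon2id parameters in separate columns
            argon2_iterations  INTEGER NOT NULL,
            argon2_parallelism INTEGER NOT NULL,
            created_at         TIMESTAMPTZ NOT NULL DEFAULT now(),
            UNIQUE (tenant_id, username)
        );
        """;

    public const string CreateTokens = """
        CREATE TABLE IF NOT EXISTS authority.tokens (
            id         UUID PRIMARY KEY,
            tenant_id  UUID NOT NULL,
            expires_at TIMESTAMPTZ NOT NULL,        -- timezone-aware expiry comparisons
            created_at TIMESTAMPTZ NOT NULL DEFAULT now()
        );
        """;
}
```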
## Exit Criteria
- [ ] All 12+ repository interfaces implemented
- [ ] Schema migrations idempotent and tested
- [ ] All integration tests pass with Testcontainers
- [ ] Data backfill completed and verified
- [ ] Authority running on PostgreSQL in staging
## Next Checkpoints
- Coordinate with Phase 2 (Scheduler) for any shared user/tenant references.
---
*Reference: docs/db/tasks/PHASE_1_AUTHORITY.md*
@@ -0,0 +1,70 @@
# Sprint 3402 · PostgreSQL Conversion: Phase 2 - Scheduler Module
## Topic & Scope
- Phase 2 of MongoDB to PostgreSQL conversion: Scheduler module.
- Create `StellaOps.Scheduler.Storage.Postgres` project.
- Implement job queue, triggers, and distributed locking with PostgreSQL advisory locks.
- Critical: preserve deterministic trigger calculation.
- **Working directory:** src/Scheduler/__Libraries/StellaOps.Scheduler.Storage.Postgres
## Dependencies & Concurrency
- Upstream: Sprint 3400 (Phase 0 - Foundations) must be DONE.
- Concurrency: Can run in parallel with Phase 1, 3, 4 after foundations complete.
- Reference: `docs/db/tasks/PHASE_2_SCHEDULER.md`
## Documentation Prerequisites
- docs/db/README.md
- docs/db/SPECIFICATION.md (Section 5.4 - Scheduler Schema)
- docs/db/RULES.md
- src/Scheduler/AGENTS.md
## Delivery Tracker
| # | Task ID | Status | Key dependency / next step | Owners | Task Definition |
| --- | --- | --- | --- | --- | --- |
| 1 | PG-T2.1 | TODO | Depends on PG-T0.7 | Scheduler Guild | Create `StellaOps.Scheduler.Storage.Postgres` project structure |
| 2 | PG-T2.2.1 | TODO | Depends on PG-T2.1 | Scheduler Guild | Create schema migration for `scheduler` schema |
| 3 | PG-T2.2.2 | TODO | Depends on PG-T2.2.1 | Scheduler Guild | Create `jobs` table with status enum and indexes |
| 4 | PG-T2.2.3 | TODO | Depends on PG-T2.2.1 | Scheduler Guild | Create `triggers` table with cron expression support |
| 5 | PG-T2.2.4 | TODO | Depends on PG-T2.2.1 | Scheduler Guild | Create `workers`, `leases` tables |
| 6 | PG-T2.2.5 | TODO | Depends on PG-T2.2.1 | Scheduler Guild | Create `job_history`, `metrics` tables |
| 7 | PG-T2.3 | TODO | Depends on PG-T2.2 | Scheduler Guild | Implement `SchedulerDataSource` class |
| 8 | PG-T2.4.1 | TODO | Depends on PG-T2.3 | Scheduler Guild | Implement `IJobRepository` with `FOR UPDATE SKIP LOCKED` |
| 9 | PG-T2.4.2 | TODO | Depends on PG-T2.3 | Scheduler Guild | Implement `ITriggerRepository` with next-fire calculation |
| 10 | PG-T2.4.3 | TODO | Depends on PG-T2.3 | Scheduler Guild | Implement `IWorkerRepository` for heartbeat tracking |
| 11 | PG-T2.5.1 | TODO | Depends on PG-T2.3 | Scheduler Guild | Implement distributed lock using `pg_advisory_lock` |
| 12 | PG-T2.5.2 | TODO | Depends on PG-T2.5.1 | Scheduler Guild | Implement `IDistributedLockRepository` interface |
| 13 | PG-T2.6.1 | TODO | Depends on PG-T2.3 | Scheduler Guild | Implement `IJobHistoryRepository` |
| 14 | PG-T2.6.2 | TODO | Depends on PG-T2.3 | Scheduler Guild | Implement `IMetricsRepository` |
| 15 | PG-T2.7 | TODO | Depends on PG-T2.4-6 | Scheduler Guild | Add configuration switch in `ServiceCollectionExtensions` |
| 16 | PG-T2.8.1 | TODO | Depends on PG-T2.7 | Scheduler Guild | Write integration tests for job queue operations |
| 17 | PG-T2.8.2 | TODO | Depends on PG-T2.8.1 | Scheduler Guild | Write determinism tests for trigger calculations |
| 18 | PG-T2.8.3 | TODO | Depends on PG-T2.8.1 | Scheduler Guild | Write concurrency tests for distributed locking |
| 19 | PG-T2.9 | TODO | Depends on PG-T2.8 | Scheduler Guild | Run backfill from MongoDB to PostgreSQL |
| 20 | PG-T2.10 | TODO | Depends on PG-T2.9 | Scheduler Guild | Verify data integrity and trigger timing |
| 21 | PG-T2.11 | TODO | Depends on PG-T2.10 | Scheduler Guild | Switch Scheduler to PostgreSQL-only |
## Execution Log
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2025-11-28 | Sprint file created | Planning |
## Decisions & Risks
- PostgreSQL advisory locks replace MongoDB distributed locks.
- `FOR UPDATE SKIP LOCKED` for efficient job claiming without contention.
- Cron expressions stored as TEXT; next-fire computed in application.
- Job payload stored as JSONB for flexibility.
- Risk: advisory lock key collision; use tenant-scoped hash values.
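A minimal sketch of the claim-and-lock pattern referenced above, assuming illustrative table and column names; the real `IJobRepository` and lock contracts are authoritative.

```csharp
using Npgsql;

// Sketch only: job claiming via FOR UPDATE SKIP LOCKED plus a tenant-scoped advisory lock key.
public static class JobClaimSketch
{
    // SKIP LOCKED lets competing workers claim distinct pending jobs without blocking each other.
    public const string ClaimNextJobSql = """
        UPDATE scheduler.jobs
           SET status = 'running', lease_owner = @worker, updated_at = now()
         WHERE id = (
               SELECT id FROM scheduler.jobs
                WHERE tenant_id = @tenant AND status = 'pending'
                ORDER BY created_at
                FOR UPDATE SKIP LOCKED
                LIMIT 1)
        RETURNING id;
        """;

    // Advisory lock keys derived from (tenant, resource) to avoid cross-tenant collisions.
    public static async Task<bool> TryAcquireLockAsync(
        NpgsqlConnection conn, string tenantId, string resource, CancellationToken ct)
    {
        await using var cmd = new NpgsqlCommand(
            "SELECT pg_try_advisory_lock(hashtextextended(@key, 0))", conn);
        cmd.Parameters.AddWithValue("key", $"{tenantId}:{resource}");
        return (bool)(await cmd.ExecuteScalarAsync(ct))!;
    }
}
```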
## Exit Criteria
- [ ] All repository interfaces implemented
- [ ] Distributed locking working with advisory locks
- [ ] Trigger calculations deterministic
- [ ] All integration and concurrency tests pass
- [ ] Scheduler running on PostgreSQL in staging
## Next Checkpoints
- Validate job throughput matches MongoDB performance.
- Coordinate with Orchestrator for any job handoff patterns.
---
*Reference: docs/db/tasks/PHASE_2_SCHEDULER.md*
@@ -0,0 +1,76 @@
# Sprint 3403 · PostgreSQL Conversion: Phase 3 - Notify Module
## Topic & Scope
- Phase 3 of MongoDB to PostgreSQL conversion: Notify module.
- Create `StellaOps.Notify.Storage.Postgres` project.
- Implement 15 repository interfaces for notification delivery and escalation.
- Handle delivery tracking, digest aggregation, and escalation state.
- **Working directory:** src/Notify/__Libraries/StellaOps.Notify.Storage.Postgres
## Dependencies & Concurrency
- Upstream: Sprint 3400 (Phase 0 - Foundations) must be DONE.
- Concurrency: Can run in parallel with Phase 1, 2, 4 after foundations complete.
- Reference: `docs/db/tasks/PHASE_3_NOTIFY.md`
## Documentation Prerequisites
- docs/db/README.md
- docs/db/SPECIFICATION.md (Section 5.5 - Notify Schema)
- docs/db/RULES.md
- src/Notify/AGENTS.md (if exists)
## Delivery Tracker
| # | Task ID | Status | Key dependency / next step | Owners | Task Definition |
| --- | --- | --- | --- | --- | --- |
| 1 | PG-T3.1 | TODO | Depends on PG-T0.7 | Notify Guild | Create `StellaOps.Notify.Storage.Postgres` project structure |
| 2 | PG-T3.2.1 | TODO | Depends on PG-T3.1 | Notify Guild | Create schema migration for `notify` schema |
| 3 | PG-T3.2.2 | TODO | Depends on PG-T3.2.1 | Notify Guild | Create `channels` table (email, slack, teams, webhook) |
| 4 | PG-T3.2.3 | TODO | Depends on PG-T3.2.1 | Notify Guild | Create `rules`, `templates` tables |
| 5 | PG-T3.2.4 | TODO | Depends on PG-T3.2.1 | Notify Guild | Create `deliveries` table with status tracking |
| 6 | PG-T3.2.5 | TODO | Depends on PG-T3.2.1 | Notify Guild | Create `digests`, `quiet_hours`, `maintenance_windows` tables |
| 7 | PG-T3.2.6 | TODO | Depends on PG-T3.2.1 | Notify Guild | Create `escalation_policies`, `escalation_states` tables |
| 8 | PG-T3.2.7 | TODO | Depends on PG-T3.2.1 | Notify Guild | Create `on_call_schedules`, `inbox`, `incidents` tables |
| 9 | PG-T3.3 | TODO | Depends on PG-T3.2 | Notify Guild | Implement `NotifyDataSource` class |
| 10 | PG-T3.4.1 | TODO | Depends on PG-T3.3 | Notify Guild | Implement `IChannelRepository` |
| 11 | PG-T3.4.2 | TODO | Depends on PG-T3.3 | Notify Guild | Implement `IRuleRepository` with filter JSONB |
| 12 | PG-T3.4.3 | TODO | Depends on PG-T3.3 | Notify Guild | Implement `ITemplateRepository` with localization |
| 13 | PG-T3.5.1 | TODO | Depends on PG-T3.3 | Notify Guild | Implement `IDeliveryRepository` with status transitions |
| 14 | PG-T3.5.2 | TODO | Depends on PG-T3.3 | Notify Guild | Implement retry logic for failed deliveries |
| 15 | PG-T3.6.1 | TODO | Depends on PG-T3.3 | Notify Guild | Implement `IDigestRepository` |
| 16 | PG-T3.6.2 | TODO | Depends on PG-T3.3 | Notify Guild | Implement `IQuietHoursRepository` |
| 17 | PG-T3.6.3 | TODO | Depends on PG-T3.3 | Notify Guild | Implement `IMaintenanceWindowRepository` |
| 18 | PG-T3.7.1 | TODO | Depends on PG-T3.3 | Notify Guild | Implement `IEscalationPolicyRepository` |
| 19 | PG-T3.7.2 | TODO | Depends on PG-T3.3 | Notify Guild | Implement `IEscalationStateRepository` |
| 20 | PG-T3.7.3 | TODO | Depends on PG-T3.3 | Notify Guild | Implement `IOnCallScheduleRepository` |
| 21 | PG-T3.8.1 | TODO | Depends on PG-T3.3 | Notify Guild | Implement `IInboxRepository` |
| 22 | PG-T3.8.2 | TODO | Depends on PG-T3.3 | Notify Guild | Implement `IIncidentRepository` |
| 23 | PG-T3.8.3 | TODO | Depends on PG-T3.3 | Notify Guild | Implement `IAuditRepository` |
| 24 | PG-T3.9 | TODO | Depends on PG-T3.4-8 | Notify Guild | Add configuration switch in `ServiceCollectionExtensions` |
| 25 | PG-T3.10.1 | TODO | Depends on PG-T3.9 | Notify Guild | Write integration tests for all repositories |
| 26 | PG-T3.10.2 | TODO | Depends on PG-T3.10.1 | Notify Guild | Test notification delivery flow end-to-end |
| 27 | PG-T3.10.3 | TODO | Depends on PG-T3.10.1 | Notify Guild | Test escalation handling |
| 28 | PG-T3.10.4 | TODO | Depends on PG-T3.10.1 | Notify Guild | Test digest aggregation |
| 29 | PG-T3.11 | TODO | Depends on PG-T3.10 | Notify Guild | Switch Notify to PostgreSQL-only |
## Execution Log
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2025-11-28 | Sprint file created | Planning |
## Decisions & Risks
- Channel configurations stored as JSONB for flexibility across channel types.
- Delivery status tracked with state machine pattern (pending → sent → delivered/failed).
- Escalation states may need frequent updates; index accordingly.
- Digest aggregation queries may be complex; consider materialized views.
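A small sketch of the delivery state machine as described above (pending → sent → delivered/failed); the real Notify domain model, including retry re-queueing (PG-T3.5.2), is authoritative.

```csharp
// Sketch only: mirrors the documented chain pending → sent → delivered/failed.
// Retry handling that re-queues failed deliveries is modeled separately.
public enum DeliveryStatus { Pending, Sent, Delivered, Failed }

public static class DeliveryTransitions
{
    private static readonly IReadOnlyDictionary<DeliveryStatus, DeliveryStatus[]> Allowed =
        new Dictionary<DeliveryStatus, DeliveryStatus[]>
        {
            [DeliveryStatus.Pending]   = new[] { DeliveryStatus.Sent },
            [DeliveryStatus.Sent]      = new[] { DeliveryStatus.Delivered, DeliveryStatus.Failed },
            [DeliveryStatus.Delivered] = Array.Empty<DeliveryStatus>(),   // terminal
            [DeliveryStatus.Failed]    = Array.Empty<DeliveryStatus>(),   // terminal (until retried)
        };

    public static bool CanTransition(DeliveryStatus from, DeliveryStatus to) =>
        Allowed.TryGetValue(from, out var next) && Array.IndexOf(next, to) >= 0;
}
```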
## Exit Criteria
- [ ] All 15 repository interfaces implemented
- [ ] Delivery tracking working end-to-end
- [ ] Escalation logic verified
- [ ] All integration tests pass
- [ ] Notify running on PostgreSQL in staging
## Next Checkpoints
- Coordinate with Scheduler for notification trigger integration.
---
*Reference: docs/db/tasks/PHASE_3_NOTIFY.md*
@@ -0,0 +1,73 @@
# Sprint 3404 · PostgreSQL Conversion: Phase 4 - Policy Module
## Topic & Scope
- Phase 4 of MongoDB to PostgreSQL conversion: Policy module.
- Create `StellaOps.Policy.Storage.Postgres` project.
- Implement policy pack versioning and risk profile management.
- Handle OPA/Rego policy storage and evaluation run tracking.
- **Working directory:** src/Policy/__Libraries/StellaOps.Policy.Storage.Postgres
## Dependencies & Concurrency
- Upstream: Sprint 3400 (Phase 0 - Foundations) must be DONE.
- Concurrency: Can run in parallel with Phase 1-3 after foundations complete.
- Reference: `docs/db/tasks/PHASE_4_POLICY.md`
## Documentation Prerequisites
- docs/db/README.md
- docs/db/SPECIFICATION.md (Section 5.6 - Policy Schema)
- docs/db/RULES.md
- src/Policy/AGENTS.md (if exists)
## Delivery Tracker
| # | Task ID | Status | Key dependency / next step | Owners | Task Definition |
| --- | --- | --- | --- | --- | --- |
| 1 | PG-T4.1 | TODO | Depends on PG-T0.7 | Policy Guild | Create `StellaOps.Policy.Storage.Postgres` project structure |
| 2 | PG-T4.2.1 | TODO | Depends on PG-T4.1 | Policy Guild | Create schema migration for `policy` schema |
| 3 | PG-T4.2.2 | TODO | Depends on PG-T4.2.1 | Policy Guild | Create `packs`, `pack_versions` tables |
| 4 | PG-T4.2.3 | TODO | Depends on PG-T4.2.1 | Policy Guild | Create `rules` table with Rego content |
| 5 | PG-T4.2.4 | TODO | Depends on PG-T4.2.1 | Policy Guild | Create `risk_profiles` table with version history |
| 6 | PG-T4.2.5 | TODO | Depends on PG-T4.2.1 | Policy Guild | Create `evaluation_runs`, `explanations` tables |
| 7 | PG-T4.2.6 | TODO | Depends on PG-T4.2.1 | Policy Guild | Create `exceptions`, `audit` tables |
| 8 | PG-T4.3 | TODO | Depends on PG-T4.2 | Policy Guild | Implement `PolicyDataSource` class |
| 9 | PG-T4.4.1 | TODO | Depends on PG-T4.3 | Policy Guild | Implement `IPackRepository` with CRUD |
| 10 | PG-T4.4.2 | TODO | Depends on PG-T4.3 | Policy Guild | Implement version management for packs |
| 11 | PG-T4.4.3 | TODO | Depends on PG-T4.3 | Policy Guild | Implement active version promotion |
| 12 | PG-T4.5.1 | TODO | Depends on PG-T4.3 | Policy Guild | Implement `IRiskProfileRepository` |
| 13 | PG-T4.5.2 | TODO | Depends on PG-T4.3 | Policy Guild | Implement version history for risk profiles |
| 14 | PG-T4.5.3 | TODO | Depends on PG-T4.3 | Policy Guild | Implement `GetVersionAsync` and `ListVersionsAsync` |
| 15 | PG-T4.6.1 | TODO | Depends on PG-T4.3 | Policy Guild | Implement `IEvaluationRunRepository` |
| 16 | PG-T4.6.2 | TODO | Depends on PG-T4.3 | Policy Guild | Implement `IExplanationRepository` |
| 17 | PG-T4.6.3 | TODO | Depends on PG-T4.3 | Policy Guild | Implement `IExceptionRepository` |
| 18 | PG-T4.6.4 | TODO | Depends on PG-T4.3 | Policy Guild | Implement `IAuditRepository` |
| 19 | PG-T4.7 | TODO | Depends on PG-T4.4-6 | Policy Guild | Add configuration switch in `ServiceCollectionExtensions` |
| 20 | PG-T4.8.1 | TODO | Depends on PG-T4.7 | Policy Guild | Write integration tests for all repositories |
| 21 | PG-T4.8.2 | TODO | Depends on PG-T4.8.1 | Policy Guild | Test pack versioning workflow |
| 22 | PG-T4.8.3 | TODO | Depends on PG-T4.8.1 | Policy Guild | Test risk profile version history |
| 23 | PG-T4.9 | TODO | Depends on PG-T4.8 | Policy Guild | Export active packs from MongoDB |
| 24 | PG-T4.10 | TODO | Depends on PG-T4.9 | Policy Guild | Import packs to PostgreSQL |
| 25 | PG-T4.11 | TODO | Depends on PG-T4.10 | Policy Guild | Verify version numbers and active version settings |
| 26 | PG-T4.12 | TODO | Depends on PG-T4.11 | Policy Guild | Switch Policy to PostgreSQL-only |
## Execution Log
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2025-11-28 | Sprint file created | Planning |
## Decisions & Risks
- Pack versions are immutable once published; new versions create new rows.
- Rego content stored as TEXT; consider compression for large policies.
- Evaluation results may grow rapidly; consider partitioning or archival.
- Risk profile versioning critical for audit trail; never delete old versions.
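A sketch of active-version promotion consistent with the immutability decision above; table and column names are assumptions, not the final schema.

```csharp
// Sketch only: promotion never rewrites pack_versions rows; it only moves the
// active pointer, so older versions stay available for audit and replay.
public static class PackPromotionSketch
{
    public const string PromoteVersionSql = """
        UPDATE policy.packs
           SET active_version_id = @versionId, updated_at = now()
         WHERE tenant_id = @tenant AND id = @packId
           AND EXISTS (
               SELECT 1 FROM policy.pack_versions v
                WHERE v.id = @versionId AND v.pack_id = @packId AND v.published_at IS NOT NULL);
        """;
}
```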
## Exit Criteria
- [ ] All repository interfaces implemented
- [ ] Pack versioning working correctly
- [ ] Risk profile version history maintained
- [ ] All integration tests pass
- [ ] Policy running on PostgreSQL in staging
## Next Checkpoints
- Coordinate with Excititor for VEX policy integration.
---
*Reference: docs/db/tasks/PHASE_4_POLICY.md*
@@ -0,0 +1,90 @@
# Sprint 3405 · PostgreSQL Conversion: Phase 5 - Vulnerabilities (Concelier)
## Topic & Scope
- Phase 5 of MongoDB to PostgreSQL conversion: Concelier vulnerability index.
- Create `StellaOps.Concelier.Storage.Postgres` project.
- Implement full advisory schema with PURL matching and full-text search.
- Critical: maintain deterministic vulnerability matching.
- **Working directory:** src/Concelier/__Libraries/StellaOps.Concelier.Storage.Postgres
## Dependencies & Concurrency
- Upstream: Sprint 3400 (Phase 0 - Foundations) must be DONE.
- Concurrency: Should run after Phase 1-4; Excititor depends on this.
- Reference: `docs/db/tasks/PHASE_5_VULNERABILITIES.md`
## Documentation Prerequisites
- docs/db/README.md
- docs/db/SPECIFICATION.md (Section 5.2 - Vulnerability Schema)
- docs/db/RULES.md
- src/Concelier/AGENTS.md
## Delivery Tracker
### Sprint 5a: Schema & Repositories
| # | Task ID | Status | Key dependency / next step | Owners | Task Definition |
| --- | --- | --- | --- | --- | --- |
| 1 | PG-T5a.1 | TODO | Depends on PG-T0.7 | Concelier Guild | Create `StellaOps.Concelier.Storage.Postgres` project structure |
| 2 | PG-T5a.2.1 | TODO | Depends on PG-T5a.1 | Concelier Guild | Create schema migration for `vuln` schema |
| 3 | PG-T5a.2.2 | TODO | Depends on PG-T5a.2.1 | Concelier Guild | Create `sources`, `feed_snapshots` tables |
| 4 | PG-T5a.2.3 | TODO | Depends on PG-T5a.2.1 | Concelier Guild | Create `advisories`, `advisory_snapshots` tables |
| 5 | PG-T5a.2.4 | TODO | Depends on PG-T5a.2.1 | Concelier Guild | Create `advisory_aliases`, `advisory_cvss` tables |
| 6 | PG-T5a.2.5 | TODO | Depends on PG-T5a.2.1 | Concelier Guild | Create `advisory_affected` with PURL matching indexes |
| 7 | PG-T5a.2.6 | TODO | Depends on PG-T5a.2.1 | Concelier Guild | Create `advisory_references`, `advisory_credits`, `advisory_weaknesses` tables |
| 8 | PG-T5a.2.7 | TODO | Depends on PG-T5a.2.1 | Concelier Guild | Create `kev_flags`, `source_states`, `merge_events` tables |
| 9 | PG-T5a.2.8 | TODO | Depends on PG-T5a.2.1 | Concelier Guild | Add full-text search index on advisories |
| 10 | PG-T5a.3 | TODO | Depends on PG-T5a.2 | Concelier Guild | Implement `ConcelierDataSource` class |
| 11 | PG-T5a.4.1 | TODO | Depends on PG-T5a.3 | Concelier Guild | Implement `ISourceRepository` |
| 12 | PG-T5a.4.2 | TODO | Depends on PG-T5a.3 | Concelier Guild | Implement `IAdvisoryRepository.GetByKeyAsync` |
| 13 | PG-T5a.4.3 | TODO | Depends on PG-T5a.3 | Concelier Guild | Implement `IAdvisoryRepository.GetByAliasAsync` (CVE lookup) |
| 14 | PG-T5a.4.4 | TODO | Depends on PG-T5a.3 | Concelier Guild | Implement `IAdvisoryRepository.SearchAsync` with full-text search |
| 15 | PG-T5a.4.5 | TODO | Depends on PG-T5a.3 | Concelier Guild | Implement `IAdvisoryRepository.UpsertAsync` with all child tables |
| 16 | PG-T5a.4.6 | TODO | Depends on PG-T5a.3 | Concelier Guild | Implement `IAdvisoryRepository.GetAffectingPackageAsync` (PURL match) |
| 17 | PG-T5a.4.7 | TODO | Depends on PG-T5a.3 | Concelier Guild | Implement `IAdvisoryRepository.GetAffectingPackageNameAsync` |
| 18 | PG-T5a.5.1 | TODO | Depends on PG-T5a.3 | Concelier Guild | Implement child table repositories (Alias, CVSS, Affected) |
| 19 | PG-T5a.5.2 | TODO | Depends on PG-T5a.3 | Concelier Guild | Implement child table repositories (Reference, Credit, Weakness) |
| 20 | PG-T5a.5.3 | TODO | Depends on PG-T5a.3 | Concelier Guild | Implement KEV and SourceState repositories |
| 21 | PG-T5a.6 | TODO | Depends on PG-T5a.5 | Concelier Guild | Write integration tests for all repositories |
### Sprint 5b: Conversion & Verification
| # | Task ID | Status | Key dependency / next step | Owners | Task Definition |
| --- | --- | --- | --- | --- | --- |
| 22 | PG-T5b.1.1 | TODO | Depends on PG-T5a.6 | Concelier Guild | Build `AdvisoryConverter` to parse MongoDB documents |
| 23 | PG-T5b.1.2 | TODO | Depends on PG-T5b.1.1 | Concelier Guild | Map to relational structure with child tables |
| 24 | PG-T5b.1.3 | TODO | Depends on PG-T5b.1.2 | Concelier Guild | Preserve provenance JSONB |
| 25 | PG-T5b.1.4 | TODO | Depends on PG-T5b.1.2 | Concelier Guild | Handle version ranges (keep as JSONB) |
| 26 | PG-T5b.2.1 | TODO | Depends on PG-T5b.1 | Concelier Guild | Update NVD importer to write to PostgreSQL |
| 27 | PG-T5b.2.2 | TODO | Depends on PG-T5b.1 | Concelier Guild | Update OSV importer to write to PostgreSQL |
| 28 | PG-T5b.2.3 | TODO | Depends on PG-T5b.1 | Concelier Guild | Update GHSA/vendor importers to write to PostgreSQL |
| 29 | PG-T5b.3.1 | TODO | Depends on PG-T5b.2 | Concelier Guild | Configure dual-import mode |
| 30 | PG-T5b.3.2 | TODO | Depends on PG-T5b.3.1 | Concelier Guild | Run import cycle and compare record counts |
| 31 | PG-T5b.4.1 | TODO | Depends on PG-T5b.3 | Concelier Guild | Select sample SBOMs for verification |
| 32 | PG-T5b.4.2 | TODO | Depends on PG-T5b.4.1 | Concelier Guild | Run matching with MongoDB backend |
| 33 | PG-T5b.4.3 | TODO | Depends on PG-T5b.4.2 | Concelier Guild | Run matching with PostgreSQL backend |
| 34 | PG-T5b.4.4 | TODO | Depends on PG-T5b.4.3 | Concelier Guild | Compare findings (must be identical) |
| 35 | PG-T5b.5 | TODO | Depends on PG-T5b.4 | Concelier Guild | Performance optimization with EXPLAIN ANALYZE |
| 36 | PG-T5b.6 | TODO | Depends on PG-T5b.5 | Concelier Guild | Switch Scanner/Concelier to PostgreSQL-only |
## Execution Log
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2025-11-28 | Sprint file created | Planning |
## Decisions & Risks
- PURL stored as TEXT with GIN trigram index for efficient matching.
- Version ranges stored as JSONB; too complex for relational decomposition.
- Full-text search using `tsvector` column with GIN index.
- Risk: matching discrepancies between backends; extensive comparison testing required.
- Expected data volume: 300K+ advisories, 2M+ affected entries.
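A sketch of the index strategy described above, assuming illustrative table and column names (`purl`, `title`, `summary`); the actual migrations in `StellaOps.Concelier.Storage.Postgres` are authoritative.

```csharp
// Sketch only: trigram index for PURL matching plus a stored tsvector for full-text search.
public static class AdvisoryIndexSketch
{
    public const string CreateIndexesSql = """
        -- Trigram index supports fast PURL prefix/substring matching on advisory_affected.
        CREATE EXTENSION IF NOT EXISTS pg_trgm;
        CREATE INDEX IF NOT EXISTS ix_advisory_affected_purl_trgm
            ON vuln.advisory_affected USING gin (purl gin_trgm_ops);

        -- Full-text search over advisory title/summary via a generated tsvector column.
        ALTER TABLE vuln.advisories
            ADD COLUMN IF NOT EXISTS search_vector tsvector
            GENERATED ALWAYS AS (to_tsvector('english', coalesce(title, '') || ' ' || coalesce(summary, ''))) STORED;
        CREATE INDEX IF NOT EXISTS ix_advisories_search
            ON vuln.advisories USING gin (search_vector);
        """;
}
```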
## Exit Criteria
- [ ] All repository interfaces implemented
- [ ] Advisory conversion pipeline working
- [ ] Vulnerability matching produces identical results
- [ ] Feed imports working on PostgreSQL
- [ ] Concelier running on PostgreSQL in staging
## Next Checkpoints
- Phase 6 (Excititor) depends on this completing successfully.
---
*Reference: docs/db/tasks/PHASE_5_VULNERABILITIES.md*
@@ -0,0 +1,102 @@
# Sprint 3406 · PostgreSQL Conversion: Phase 6 - VEX & Graph (Excititor)
## Topic & Scope
- Phase 6 of MongoDB to PostgreSQL conversion: Excititor VEX and graph storage.
- Create `StellaOps.Excititor.Storage.Postgres` project.
- Implement graph node/edge storage with efficient bulk operations.
- **CRITICAL:** Preserve graph_revision_id stability (determinism required).
- **Working directory:** src/Excititor/__Libraries/StellaOps.Excititor.Storage.Postgres
## Dependencies & Concurrency
- Upstream: Sprint 3400 (Phase 0) and Sprint 3405 (Phase 5 - Vulnerabilities) must be DONE.
- Concurrency: Must follow Phase 5 due to VEX-vulnerability relationships.
- Reference: `docs/db/tasks/PHASE_6_VEX_GRAPH.md`
## Documentation Prerequisites
- docs/db/README.md
- docs/db/SPECIFICATION.md (Section 5.3 - VEX Schema)
- docs/db/RULES.md
- src/Excititor/AGENTS.md (if exists)
## Delivery Tracker
### Sprint 6a: Core Schema & Repositories
| # | Task ID | Status | Key dependency / next step | Owners | Task Definition |
| --- | --- | --- | --- | --- | --- |
| 1 | PG-T6a.1 | TODO | Depends on PG-T5b.6 | Excititor Guild | Create `StellaOps.Excititor.Storage.Postgres` project structure |
| 2 | PG-T6a.2.1 | TODO | Depends on PG-T6a.1 | Excititor Guild | Create schema migration for `vex` schema |
| 3 | PG-T6a.2.2 | TODO | Depends on PG-T6a.2.1 | Excititor Guild | Create `projects`, `graph_revisions` tables |
| 4 | PG-T6a.2.3 | TODO | Depends on PG-T6a.2.1 | Excititor Guild | Create `graph_nodes`, `graph_edges` tables (BIGSERIAL) |
| 5 | PG-T6a.2.4 | TODO | Depends on PG-T6a.2.1 | Excititor Guild | Create `statements`, `observations` tables |
| 6 | PG-T6a.2.5 | TODO | Depends on PG-T6a.2.1 | Excititor Guild | Create `linksets`, `linkset_events` tables |
| 7 | PG-T6a.2.6 | TODO | Depends on PG-T6a.2.1 | Excititor Guild | Create `consensus`, `consensus_holds` tables |
| 8 | PG-T6a.2.7 | TODO | Depends on PG-T6a.2.1 | Excititor Guild | Create remaining VEX tables (unknowns, evidence, cvss_receipts, etc.) |
| 9 | PG-T6a.2.8 | TODO | Depends on PG-T6a.2.1 | Excititor Guild | Add indexes for graph traversal |
| 10 | PG-T6a.3 | TODO | Depends on PG-T6a.2 | Excititor Guild | Implement `ExcititorDataSource` class |
| 11 | PG-T6a.4.1 | TODO | Depends on PG-T6a.3 | Excititor Guild | Implement `IProjectRepository` with tenant scoping |
| 12 | PG-T6a.4.2 | TODO | Depends on PG-T6a.3 | Excititor Guild | Implement `IVexStatementRepository` |
| 13 | PG-T6a.4.3 | TODO | Depends on PG-T6a.3 | Excititor Guild | Implement `IVexObservationRepository` |
| 14 | PG-T6a.5.1 | TODO | Depends on PG-T6a.3 | Excititor Guild | Implement `ILinksetRepository` |
| 15 | PG-T6a.5.2 | TODO | Depends on PG-T6a.3 | Excititor Guild | Implement `IConsensusRepository` |
| 16 | PG-T6a.6 | TODO | Depends on PG-T6a.5 | Excititor Guild | Write integration tests for core repositories |
### Sprint 6b: Graph Storage
| # | Task ID | Status | Key dependency / next step | Owners | Task Definition |
| --- | --- | --- | --- | --- | --- |
| 17 | PG-T6b.1.1 | TODO | Depends on PG-T6a.6 | Excititor Guild | Implement `IGraphRevisionRepository.GetByIdAsync` |
| 18 | PG-T6b.1.2 | TODO | Depends on PG-T6a.6 | Excititor Guild | Implement `IGraphRevisionRepository.GetByRevisionIdAsync` |
| 19 | PG-T6b.1.3 | TODO | Depends on PG-T6a.6 | Excititor Guild | Implement `IGraphRevisionRepository.GetLatestByProjectAsync` |
| 20 | PG-T6b.1.4 | TODO | Depends on PG-T6a.6 | Excititor Guild | Implement `IGraphRevisionRepository.CreateAsync` |
| 21 | PG-T6b.2.1 | TODO | Depends on PG-T6b.1 | Excititor Guild | Implement `IGraphNodeRepository.GetByKeyAsync` |
| 22 | PG-T6b.2.2 | TODO | Depends on PG-T6b.1 | Excititor Guild | Implement `IGraphNodeRepository.BulkInsertAsync` using COPY |
| 23 | PG-T6b.2.3 | TODO | Depends on PG-T6b.2.2 | Excititor Guild | Optimize bulk insert for 10-100x performance |
| 24 | PG-T6b.3.1 | TODO | Depends on PG-T6b.2 | Excititor Guild | Implement `IGraphEdgeRepository.GetByRevisionAsync` |
| 25 | PG-T6b.3.2 | TODO | Depends on PG-T6b.2 | Excititor Guild | Implement `IGraphEdgeRepository.BulkInsertAsync` using COPY |
| 26 | PG-T6b.3.3 | TODO | Depends on PG-T6b.2 | Excititor Guild | Implement traversal queries (GetOutgoingAsync, GetIncomingAsync) |
| 27 | PG-T6b.4.1 | TODO | Depends on PG-T6b.3 | Excititor Guild | **CRITICAL:** Document revision_id computation algorithm |
| 28 | PG-T6b.4.2 | TODO | Depends on PG-T6b.4.1 | Excititor Guild | **CRITICAL:** Verify nodes inserted in deterministic order |
| 29 | PG-T6b.4.3 | TODO | Depends on PG-T6b.4.2 | Excititor Guild | **CRITICAL:** Verify edges inserted in deterministic order |
| 30 | PG-T6b.4.4 | TODO | Depends on PG-T6b.4.3 | Excititor Guild | **CRITICAL:** Write stability tests (5x computation must match) |
### Sprint 6c: Migration & Verification
| # | Task ID | Status | Key dependency / next step | Owners | Task Definition |
| --- | --- | --- | --- | --- | --- |
| 31 | PG-T6c.1.1 | TODO | Depends on PG-T6b.4 | Excititor Guild | Build graph conversion service for MongoDB documents |
| 32 | PG-T6c.1.2 | TODO | Depends on PG-T6c.1.1 | Excititor Guild | Extract and insert nodes in deterministic order |
| 33 | PG-T6c.1.3 | TODO | Depends on PG-T6c.1.2 | Excititor Guild | Extract and insert edges in deterministic order |
| 34 | PG-T6c.2.1 | TODO | Depends on PG-T6c.1 | Excititor Guild | Build VEX statement conversion service |
| 35 | PG-T6c.2.2 | TODO | Depends on PG-T6c.2.1 | Excititor Guild | Preserve provenance and evidence |
| 36 | PG-T6c.3.1 | TODO | Depends on PG-T6c.2 | Excititor Guild | Select sample projects for dual pipeline comparison |
| 37 | PG-T6c.3.2 | TODO | Depends on PG-T6c.3.1 | Excititor Guild | Compute graphs with MongoDB backend |
| 38 | PG-T6c.3.3 | TODO | Depends on PG-T6c.3.2 | Excititor Guild | Compute graphs with PostgreSQL backend |
| 39 | PG-T6c.3.4 | TODO | Depends on PG-T6c.3.3 | Excititor Guild | **CRITICAL:** Compare revision_ids (must match) |
| 40 | PG-T6c.3.5 | TODO | Depends on PG-T6c.3.4 | Excititor Guild | Compare node/edge counts and VEX statements |
| 41 | PG-T6c.4 | TODO | Depends on PG-T6c.3 | Excititor Guild | Migrate active projects |
| 42 | PG-T6c.5 | TODO | Depends on PG-T6c.4 | Excititor Guild | Switch Excititor to PostgreSQL-only |
## Execution Log
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2025-11-28 | Sprint file created | Planning |
## Decisions & Risks
- Graph nodes/edges use BIGSERIAL for high-volume IDs.
- Bulk insert using PostgreSQL COPY for 10-100x performance (a hedged sketch follows this list).
- **CRITICAL RISK:** Revision ID instability would break reproducibility guarantees.
- Graph traversal indexes on `(from_node_id)` and `(to_node_id)`.
- Estimated volumes: 10M+ nodes, 20M+ edges, 1M+ VEX statements.
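The two riskiest items above are the COPY-based bulk insert and revision-ID determinism. The sketch below illustrates the intended technique only; the table and column names (`vex.graph_nodes` with `revision_id`, `node_key`, `kind`, `payload`) and the `GraphNode` shape are assumptions until the Phase 6 migrations land.
```csharp
// Hedged sketch: deterministic bulk insert of graph nodes via Npgsql binary COPY.
// Table/column names and the GraphNode record are assumptions, not the final schema.
using Npgsql;
using NpgsqlTypes;

public sealed record GraphNode(Guid RevisionId, string NodeKey, string Kind, string PayloadJson);

public static class GraphNodeBulkWriter
{
    public static async Task<ulong> BulkInsertAsync(
        NpgsqlConnection connection,
        IReadOnlyCollection<GraphNode> nodes,
        CancellationToken cancellationToken)
    {
        // Deterministic order: ordinal sort on the natural key so identical inputs
        // always produce identical row order (and therefore a stable revision hash).
        var ordered = nodes.OrderBy(n => n.NodeKey, StringComparer.Ordinal).ToList();

        await using var importer = await connection.BeginBinaryImportAsync(
            "COPY vex.graph_nodes (revision_id, node_key, kind, payload) FROM STDIN (FORMAT BINARY)",
            cancellationToken);

        foreach (var node in ordered)
        {
            await importer.StartRowAsync(cancellationToken);
            await importer.WriteAsync(node.RevisionId, NpgsqlDbType.Uuid, cancellationToken);
            await importer.WriteAsync(node.NodeKey, NpgsqlDbType.Text, cancellationToken);
            await importer.WriteAsync(node.Kind, NpgsqlDbType.Text, cancellationToken);
            await importer.WriteAsync(node.PayloadJson, NpgsqlDbType.Jsonb, cancellationToken);
        }

        // CompleteAsync flushes the COPY stream; without it the import rolls back on dispose.
        return await importer.CompleteAsync(cancellationToken);
    }
}
```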
## Exit Criteria
- [ ] All repository interfaces implemented
- [ ] Graph storage working efficiently with bulk operations
- [ ] **Graph revision IDs stable (deterministic)** - CRITICAL
- [ ] VEX statements preserved correctly
- [ ] All comparison tests pass
- [ ] Excititor running on PostgreSQL in staging
## Next Checkpoints
- This is the most complex phase; allocate extra time for determinism verification.
- Phase 7 (Cleanup) follows after successful cutover.
---
*Reference: docs/db/tasks/PHASE_6_VEX_GRAPH.md*

View File

@@ -0,0 +1,153 @@
# Sprint 3407 · PostgreSQL Conversion: Phase 7 - Cleanup & Optimization
## Topic & Scope
- Phase 7 of MongoDB to PostgreSQL conversion: Final cleanup and optimization.
- Remove MongoDB dependencies from all converted modules.
- Archive MongoDB data and decommission infrastructure.
- Optimize PostgreSQL performance and update documentation.
- **Working directory:** Multiple (cleanup across all modules)
## Dependencies & Concurrency
- Upstream: ALL previous phases (3400-3406) must be DONE.
- Concurrency: Must run sequentially after all modules converted.
- Reference: `docs/db/tasks/PHASE_7_CLEANUP.md`
## Documentation Prerequisites
- docs/db/README.md
- docs/db/SPECIFICATION.md
- docs/db/RULES.md
- docs/db/VERIFICATION.md
- All module AGENTS.md files
## Delivery Tracker
### T7.1: Remove MongoDB Dependencies
| # | Task ID | Status | Key dependency / next step | Owners | Task Definition |
| --- | --- | --- | --- | --- | --- |
| 1 | PG-T7.1.1 | TODO | All phases complete | Infrastructure Guild | Remove `StellaOps.Authority.Storage.Mongo` project |
| 2 | PG-T7.1.2 | TODO | Depends on PG-T7.1.1 | Infrastructure Guild | Remove `StellaOps.Scheduler.Storage.Mongo` project |
| 3 | PG-T7.1.3 | TODO | Depends on PG-T7.1.1 | Infrastructure Guild | Remove `StellaOps.Notify.Storage.Mongo` project |
| 4 | PG-T7.1.4 | TODO | Depends on PG-T7.1.1 | Infrastructure Guild | Remove `StellaOps.Policy.Storage.Mongo` project |
| 5 | PG-T7.1.5 | TODO | Depends on PG-T7.1.1 | Infrastructure Guild | Remove `StellaOps.Concelier.Storage.Mongo` project |
| 6 | PG-T7.1.6 | TODO | Depends on PG-T7.1.1 | Infrastructure Guild | Remove `StellaOps.Excititor.Storage.Mongo` project |
| 7 | PG-T7.1.7 | TODO | Depends on PG-T7.1.6 | Infrastructure Guild | Update solution files |
| 8 | PG-T7.1.8 | TODO | Depends on PG-T7.1.7 | Infrastructure Guild | Remove dual-write wrappers |
| 9 | PG-T7.1.9 | TODO | Depends on PG-T7.1.8 | Infrastructure Guild | Remove MongoDB configuration options |
| 10 | PG-T7.1.10 | TODO | Depends on PG-T7.1.9 | Infrastructure Guild | Run full build to verify no broken references |
### T7.2: Archive MongoDB Data
| # | Task ID | Status | Key dependency / next step | Owners | Task Definition |
| --- | --- | --- | --- | --- | --- |
| 11 | PG-T7.2.1 | TODO | Depends on PG-T7.1.10 | DevOps Guild | Take final MongoDB backup |
| 12 | PG-T7.2.2 | TODO | Depends on PG-T7.2.1 | DevOps Guild | Export to BSON/JSON archives |
| 13 | PG-T7.2.3 | TODO | Depends on PG-T7.2.2 | DevOps Guild | Store archives in secure location |
| 14 | PG-T7.2.4 | TODO | Depends on PG-T7.2.3 | DevOps Guild | Document archive contents and structure |
| 15 | PG-T7.2.5 | TODO | Depends on PG-T7.2.4 | DevOps Guild | Set retention policy for archives |
| 16 | PG-T7.2.6 | TODO | Depends on PG-T7.2.5 | DevOps Guild | Schedule MongoDB cluster decommission |
### T7.3: PostgreSQL Performance Optimization
| # | Task ID | Status | Key dependency / next step | Owners | Task Definition |
| --- | --- | --- | --- | --- | --- |
| 17 | PG-T7.3.1 | TODO | Depends on PG-T7.2.6 | DBA Guild | Enable `pg_stat_statements` extension |
| 18 | PG-T7.3.2 | TODO | Depends on PG-T7.3.1 | DBA Guild | Identify slow queries |
| 19 | PG-T7.3.3 | TODO | Depends on PG-T7.3.2 | DBA Guild | Analyze query plans with EXPLAIN ANALYZE |
| 20 | PG-T7.3.4 | TODO | Depends on PG-T7.3.3 | DBA Guild | Add missing indexes |
| 21 | PG-T7.3.5 | TODO | Depends on PG-T7.3.4 | DBA Guild | Remove unused indexes |
| 22 | PG-T7.3.6 | TODO | Depends on PG-T7.3.5 | DBA Guild | Tune PostgreSQL configuration |
| 23 | PG-T7.3.7 | TODO | Depends on PG-T7.3.6 | Observability Guild | Set up query monitoring dashboard |
| 24 | PG-T7.3.8 | TODO | Depends on PG-T7.3.7 | DBA Guild | Document performance baselines |
### T7.4: Update Documentation
| # | Task ID | Status | Key dependency / next step | Owners | Task Definition |
| --- | --- | --- | --- | --- | --- |
| 25 | PG-T7.4.1 | TODO | Depends on PG-T7.3.8 | Docs Guild | Update `docs/07_HIGH_LEVEL_ARCHITECTURE.md` |
| 26 | PG-T7.4.2 | TODO | Depends on PG-T7.4.1 | Docs Guild | Update module architecture docs |
| 27 | PG-T7.4.3 | TODO | Depends on PG-T7.4.2 | Docs Guild | Update deployment guides |
| 28 | PG-T7.4.4 | TODO | Depends on PG-T7.4.3 | Docs Guild | Update operations runbooks |
| 29 | PG-T7.4.5 | TODO | Depends on PG-T7.4.4 | Docs Guild | Update troubleshooting guides |
| 30 | PG-T7.4.6 | TODO | Depends on PG-T7.4.5 | Docs Guild | Update `CLAUDE.md` technology stack |
| 31 | PG-T7.4.7 | TODO | Depends on PG-T7.4.6 | Docs Guild | Create `docs/operations/postgresql-guide.md` |
| 32 | PG-T7.4.8 | TODO | Depends on PG-T7.4.7 | Docs Guild | Document backup/restore procedures |
| 33 | PG-T7.4.9 | TODO | Depends on PG-T7.4.8 | Docs Guild | Document scaling recommendations |
### T7.5: Update Air-Gap Kit
| # | Task ID | Status | Key dependency / next step | Owners | Task Definition |
| --- | --- | --- | --- | --- | --- |
| 34 | PG-T7.5.1 | TODO | Depends on PG-T7.4.9 | DevOps Guild | Add PostgreSQL container image to kit |
| 35 | PG-T7.5.2 | TODO | Depends on PG-T7.5.1 | DevOps Guild | Update kit scripts for PostgreSQL setup |
| 36 | PG-T7.5.3 | TODO | Depends on PG-T7.5.2 | DevOps Guild | Include schema migrations in kit |
| 37 | PG-T7.5.4 | TODO | Depends on PG-T7.5.3 | DevOps Guild | Update kit documentation |
| 38 | PG-T7.5.5 | TODO | Depends on PG-T7.5.4 | DevOps Guild | Test kit installation in air-gapped environment |
| 39 | PG-T7.5.6 | TODO | Depends on PG-T7.5.5 | Docs Guild | Update `docs/24_OFFLINE_KIT.md` |
### T7.6: Final Verification
| # | Task ID | Status | Key dependency / next step | Owners | Task Definition |
| --- | --- | --- | --- | --- | --- |
| 40 | PG-T7.6.1 | TODO | Depends on PG-T7.5.6 | QA Guild | Run full integration test suite |
| 41 | PG-T7.6.2 | TODO | Depends on PG-T7.6.1 | QA Guild | Run performance benchmark suite |
| 42 | PG-T7.6.3 | TODO | Depends on PG-T7.6.2 | QA Guild | Verify all modules on PostgreSQL |
| 43 | PG-T7.6.4 | TODO | Depends on PG-T7.6.3 | QA Guild | **Verify determinism tests pass** |
| 44 | PG-T7.6.5 | TODO | Depends on PG-T7.6.4 | QA Guild | Verify air-gap kit works |
| 45 | PG-T7.6.6 | TODO | Depends on PG-T7.6.5 | QA Guild | Generate final verification report |
| 46 | PG-T7.6.7 | TODO | Depends on PG-T7.6.6 | Management | Get sign-off from stakeholders |
### T7.7: Decommission MongoDB
| # | Task ID | Status | Key dependency / next step | Owners | Task Definition |
| --- | --- | --- | --- | --- | --- |
| 47 | PG-T7.7.1 | TODO | Depends on PG-T7.6.7 | DevOps Guild | Verify no services using MongoDB |
| 48 | PG-T7.7.2 | TODO | Depends on PG-T7.7.1 | DevOps Guild | Stop MongoDB instances |
| 49 | PG-T7.7.3 | TODO | Depends on PG-T7.7.2 | DevOps Guild | Archive final state |
| 50 | PG-T7.7.4 | TODO | Depends on PG-T7.7.3 | DevOps Guild | Remove MongoDB from infrastructure |
| 51 | PG-T7.7.5 | TODO | Depends on PG-T7.7.4 | Observability Guild | Update monitoring/alerting |
| 52 | PG-T7.7.6 | TODO | Depends on PG-T7.7.5 | Finance | Update cost projections |
## Execution Log
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2025-11-28 | Sprint file created | Planning |
## Decisions & Risks
- MongoDB archives are a read-only backup; rolling back to MongoDB after this phase is complex.
- Any data created after cutover exists only in PostgreSQL.
- A full rollback would require data export/import.
- PostgreSQL configuration tuning recommendations in PHASE_7_CLEANUP.md.
## Success Metrics
| Metric | Target | Measurement |
| --- | --- | --- |
| Query latency (p95) | < 100ms | pg_stat_statements |
| Error rate | < 0.01% | Application logs |
| Storage efficiency | < 120% of MongoDB | Disk usage |
| Test coverage | 100% | CI reports |
| Documentation coverage | 100% | Manual review |
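The latency target above is measured from `pg_stat_statements` (enabled in PG-T7.3.1). As a hedged illustration, a diagnostics task could pull the slowest statements like the sketch below; column names follow the PostgreSQL 13+ view and would need adjusting for older servers.
```csharp
// Hedged sketch: list the slowest statements from pg_stat_statements (PostgreSQL 13+ column names).
using Npgsql;

public static class SlowQueryReport
{
    public static async Task PrintTopAsync(string connectionString, int top = 20)
    {
        await using var conn = new NpgsqlConnection(connectionString);
        await conn.OpenAsync();

        const string sql = @"
            SELECT query, calls, mean_exec_time
            FROM pg_stat_statements
            ORDER BY mean_exec_time DESC
            LIMIT @top;";

        await using var cmd = new NpgsqlCommand(sql, conn);
        cmd.Parameters.AddWithValue("top", top);

        await using var reader = await cmd.ExecuteReaderAsync();
        while (await reader.ReadAsync())
        {
            // mean_exec_time is milliseconds averaged per call.
            Console.WriteLine($"{reader.GetDouble(2):F1} ms avg x {reader.GetInt64(1)} calls :: {reader.GetString(0)}");
        }
    }
}
```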
## Exit Criteria
- [ ] All MongoDB code removed from converted modules
- [ ] MongoDB data archived
- [ ] PostgreSQL performance optimized
- [ ] All documentation updated
- [ ] Air-gap kit updated and tested
- [ ] Final verification report approved
- [ ] MongoDB infrastructure decommissioned
## Post-Conversion Monitoring
### First Week
- Monitor error rates closely
- Track query performance
- Watch for any data inconsistencies
- Have rollback plan ready (restore MongoDB)
### First Month
- Review query statistics weekly
- Optimize any slow queries found
- Monitor storage growth
- Adjust vacuum settings if needed
### Ongoing
- Regular performance reviews
- Index maintenance
- Backup verification
- Capacity planning
---
*Reference: docs/db/tasks/PHASE_7_CLEANUP.md*

View File

@@ -75,3 +75,122 @@ CLI mirrors these endpoints (`stella findings list|view|update|export`). Console
- `reports/` (generated PDFs/CSVs).
- `signatures/` (DSSE envelopes).
- Bundles produced deterministically; Export Center consumes them for mirror profiles.
## 8) VEX-First Triage UX
> Reference: Product advisory `28-Nov-2025 - Vulnerability Triage UX & VEX-First Decisioning.md`
### 8.1 Evidence-First Finding Cards
Each vulnerability finding is displayed as an evidence-first card showing:
- CVE/vulnerability identifier with severity badge
- Package name, version, ecosystem
- Location (file path, container layer, function, call path)
- Scanner and database date
- Status badges: `New`, `VEX: Not affected`, `Policy: blocked`
Primary actions per card:
- **VEX: Set status** - Opens VEX decision modal
- **Fix PR / View Fix** - When available from connected scanners (Snyk/GitLab)
- **Attach Evidence** - Link PRs, tickets, docs, commits
- **Copy audit reference** - findingId + attestation digest
### 8.2 VEX Decision Model
VEX decisions follow the `VexDecision` schema (`docs/schemas/vex-decision.schema.json`):
**Status values:**
- `NOT_AFFECTED` - Vulnerability does not apply to this artifact
- `AFFECTED_MITIGATED` - Vulnerable but mitigations in place
- `AFFECTED_UNMITIGATED` - Vulnerable without mitigations
- `FIXED` - Vulnerability has been remediated
**Justification types (CSAF/VEX aligned):**
- `CODE_NOT_PRESENT`
- `CODE_NOT_REACHABLE`
- `VULNERABLE_CODE_NOT_IN_EXECUTE_PATH`
- `CONFIGURATION_NOT_AFFECTED`
- `OS_NOT_AFFECTED`
- `RUNTIME_MITIGATION_PRESENT`
- `COMPENSATING_CONTROLS`
- `ACCEPTED_BUSINESS_RISK`
- `OTHER`
**Scope and validity:**
- Decisions can be scoped to specific environments and projects
- Validity windows with `notBefore` and `notAfter` timestamps
- Expired decisions are surfaced with warnings
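For orientation, a minimal backend DTO sketch mirroring these fields is shown below; property and enum names are illustrative, and `vex-decision.schema.json` remains the canonical contract.
```csharp
// Hedged sketch of a backend DTO mirroring vex-decision.schema.json.
// Names are illustrative; the JSON schema is the canonical contract.
public enum VexStatus { NotAffected, AffectedMitigated, AffectedUnmitigated, Fixed }

public enum VexJustificationType
{
    CodeNotPresent, CodeNotReachable, VulnerableCodeNotInExecutePath,
    ConfigurationNotAffected, OsNotAffected, RuntimeMitigationPresent,
    CompensatingControls, AcceptedBusinessRisk, Other
}

public sealed record VexDecisionDto(
    Guid Id,
    string VulnerabilityId,                  // CVE/GHSA identifier
    VexStatus Status,
    VexJustificationType JustificationType,
    string? JustificationText,
    IReadOnlyList<string> Environments,      // scope; empty = all environments
    IReadOnlyList<string> Projects,          // scope; empty = all projects
    DateTimeOffset NotBefore,                // validity window start (UTC)
    DateTimeOffset? NotAfter,                // optional expiry; expired decisions get warnings
    string CreatedBy,
    DateTimeOffset CreatedAt);
```
Keeping the scope as plain string lists keeps the DTO aligned with the schema's "empty means applies everywhere" convention.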
### 8.3 Explainability Panel
Right-side panel with tabs for each finding:
**Overview tab:**
- Title, severity, package/version
- Scanner + DB date
- Finding history timeline
- Current VEX decision summary
**Reachability tab:**
- Call path visualization
- Module dependency list
- Runtime usage indicators (when available)
**Policy tab:**
- Policy evaluation result (PASS/WARN/FAIL)
- Gate details with "this gate failed because..." explanations
- Links to gate definitions
**Attestations tab:**
- Attestations mentioning this artifact/vulnerability/scan
- Type, subject, predicate, signer, verified status
- "Signed evidence" pill linking to attestation detail
### 8.4 VEX Decision APIs
New endpoints for VEX decisions:
- `POST /v1/vex-decisions` - Create new VEX decision with optional attestation
- `PATCH /v1/vex-decisions/{id}` - Update existing decision (creates superseding record)
- `GET /v1/vex-decisions` - List decisions with filters
- `GET /v1/vex-decisions/{id}` - Get decision detail
Request and response bodies follow the `VexDecisionDto` contract per the schema.
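A hedged client-side sketch of creating a decision via `POST /v1/vex-decisions` follows; the payload fields mirror `vex-decision.schema.json`, while the assumption that the server echoes back the persisted decision (including its generated `id`) is illustrative.
```csharp
// Hedged sketch: create a VEX decision through the API. Payload fields follow
// vex-decision.schema.json; the response handling is an assumption.
using System.Net.Http.Json;
using System.Text.Json;

public static class VexDecisionClient
{
    public static async Task<string> CreateNotAffectedAsync(HttpClient http, CancellationToken ct)
    {
        var payload = new
        {
            vulnerabilityId = "CVE-2023-12345",
            subject = new
            {
                type = "IMAGE",
                name = "registry.internal/stella/app-service@sha256:7d9c...",
                digest = new { sha256 = "7d9cd5f1a2a0dd9a41a2c43a5b7d8a0bcd9e34cf39b3f43a70595c834f0a4aee" }
            },
            status = "NOT_AFFECTED",
            justificationType = "VULNERABLE_CODE_NOT_IN_EXECUTE_PATH",
            justificationText = "Vulnerable CLI helper is present in the image but never invoked by the running service.",
            scope = new { environments = new[] { "prod", "staging" }, projects = new[] { "app-service" } },
            validFor = new { notAfter = "2026-05-21T10:15:00Z" }
        };

        using var response = await http.PostAsJsonAsync("/v1/vex-decisions", payload, ct);
        response.EnsureSuccessStatusCode();

        // Assumption: the endpoint returns the persisted decision as JSON.
        var created = await response.Content.ReadFromJsonAsync<JsonElement>(cancellationToken: ct);
        return created.GetProperty("id").GetString()!;
    }
}
```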
### 8.5 Audit Bundle Export
Immutable audit bundles follow the `AuditBundleIndex` schema (`docs/schemas/audit-bundle-index.schema.json`):
**Bundle contents:**
- Vulnerability reports (scanner outputs)
- SBOM (CycloneDX/SPDX)
- VEX decisions
- Policy evaluations
- Raw attestations (DSSE envelopes)
- `audit-bundle-index.json` manifest with integrity hashes
**APIs:**
- `POST /v1/audit-bundles` - Create new bundle (async generation)
- `GET /v1/audit-bundles/{bundleId}` - Download bundle (ZIP or OCI)
- `GET /v1/audit-bundles` - List previously created bundles
### 8.6 Industry Pattern Alignment
The triage UX aligns with industry patterns from:
| Tool | Pattern Adopted |
|------|-----------------|
| **Snyk** | PR checks, Fix PRs, ignore with reasons |
| **GitLab SCA** | Vulnerability Report, status workflow, activity log |
| **Harbor/Trivy** | Artifact-centric navigation, attestation accessories |
| **Anchore** | Policy gates with trigger explanations, allowlists |
## 9) Schemas
The following JSON schemas define the data contracts for VEX and audit functionality:
- `docs/schemas/vex-decision.schema.json` - VEX decision form and persistence
- `docs/schemas/attestation-vuln-scan.schema.json` - Vulnerability scan attestation predicate
- `docs/schemas/audit-bundle-index.schema.json` - Audit bundle manifest
These schemas are referenced by both backend DTOs and frontend TypeScript interfaces.

View File

@@ -0,0 +1,523 @@
# Vulnerability Triage UX & VEX-First Decisioning
**Version:** 1.0
**Date:** 2025-11-28
**Status:** Canonical
This advisory defines the **end-to-end UX and data contracts** for vulnerability triage, VEX decisioning, evidence/explainability views, and audit export in Stella Ops. It synthesizes patterns from Snyk, GitLab SCA, Harbor/Trivy, and Anchore Enterprise into a converged UX layer.
---
## 1. Scope
This spec covers:
1. **Vulnerability triage** (first touch)
2. **Suppression / "Not Affected"** (VEX-aligned)
3. **Evidence & explainability views**
4. **Audit export** (immutable bundles)
5. **Attestations** as the backbone of evidence and gating
Stella Ops is the **converged UX layer** over scanner backends (Snyk, Trivy, GitLab, Anchore, or others).
---
## 2. Industry Pattern Analysis
### 2.1 Triage (First Touch)
| Tool | Pattern | Stella Ops Mirror |
|------|---------|-------------------|
| **Snyk** | PR checks show before/after diffs; Fix PRs directly from Issues list | Evidence-first cards with "Fix PR" CTA |
| **GitLab SCA** | Vulnerability Report with `Needs triage` default state | Status workflow starting at `DETECTED` |
| **Harbor/Trivy** | Project -> Artifacts -> Vulnerabilities panel with Rescan CTA | Artifact-centric navigation with scan badges |
| **Anchore** | Images -> Vulnerabilities aligned to Policies (pass/fail) | Policy gate indicators on all finding views |
**UI pattern to reuse:** An **evidence-first card** per finding (CVE, package, version, path) with primary actions (Fix PR, Dismiss/Not Affected, View Evidence).
### 2.2 Suppression / "Not Affected" (VEX-Aligned)
| Tool | Pattern | Stella Ops Mirror |
|------|---------|-------------------|
| **Snyk** | "Ignore" with reason + expiry; org-restricted; PR checks skip ignored | VEX `statusJustification` with validity window |
| **GitLab** | `Dismissed` status with required comment; activity log | VEX decisions with actor/timestamp/audit trail |
| **Anchore** | Allowlists + Policy Gates + VEX annotations | Allowlist integration + VEX buttons |
| **Harbor/Trivy** | No native VEX; store as in-toto attestation | Attestation-backed VEX decisions |
**UI pattern to reuse:** An **Actionable VEX** button (`Not Affected`, `Affected - mitigated`, `Fixed`) that opens a compact form: justification, evidence links, scope, expiry -> generates/updates a signed VEX note.
### 2.3 Evidence View (Explainability)
| Tool | Pattern | Stella Ops Mirror |
|------|---------|-------------------|
| **Snyk** | PR context + Fix PR evidence + ignore policy display | Explainability panel with PR/commit links |
| **GitLab** | Vulnerability Report hub with lifecycle activity | Decision history timeline |
| **Anchore** | Policy Gates breakdown showing which trigger caused fail/pass | Gate evaluation with trigger explanations |
| **Harbor/Trivy** | Scanner DB date, version, attestation links | Scanner metadata + attestation digest |
**UI pattern to reuse:** An **Explainability panel** on the right: "Why this is flagged / Why it passed" with timestamps, rule IDs, feed freshness, and the **Attestation digest**.
### 2.4 Audit Export (Immutable)
| Tool | Export Contents |
|------|-----------------|
| **Snyk** | PR check results + Ignore ledger + Fix PRs |
| **GitLab** | Vulnerability Report with status history |
| **Anchore** | Policy Bundle eval JSON as primary audit unit |
| **Harbor/Trivy** | Trivy report + signed attestation |
**UI pattern to reuse:** **"Create immutable audit bundle"** CTA that writes a ZIP/OCI artifact containing reports, VEX, policy evals, and attestations, plus a top-level manifest with hashes.
---
## 3. Core Data Model
### 3.1 Artifact
```text
Artifact
- id (string, stable)
- type (IMAGE | REPO | SBOM | FUNCTION | HOST)
- displayName
- coordinates (registry/repo URL, tag, branch, env, etc.)
- digests[] (e.g. sha256 for OCI images, commit SHA for repos)
- latestScanAttestations[] (AttestationRef)
- riskSummary (openCount, totalCount, maxSeverity, lastScanAt)
```
### 3.2 VulnerabilityFinding
```text
VulnerabilityFinding
- id (string, internal stable ID)
- sourceFindingId (string, from Snyk/Trivy/etc.)
- scanner (name, version)
- artifactId
- vulnerabilityId (CVE, GHSA, etc.)
- title
- severity (CRITICAL | HIGH | MEDIUM | LOW | INFO)
- package (name, version, ecosystem)
- location (filePath, containerLayer, function, callPath[])
- introducedBy (commitId?, imageDigest?, buildId?)
- firstSeenAt
- lastSeenAt
- status (DETECTED | RESOLVED | NO_LONGER_DETECTED)
- currentVexDecisionId? (if a VEX decision is attached)
- evidenceAttestationRefs[] (AttestationRef[])
```
### 3.3 VEXDecision
Represents a **VEX-style statement** attached to a finding + subject.
```text
VEXDecision
- id
- vulnerabilityId (CVE, etc.)
- subject (ArtifactRef / SBOM node ref)
- status (NOT_AFFECTED | AFFECTED_MITIGATED | AFFECTED_UNMITIGATED | FIXED)
- justificationType (enum; see section 7.3)
- justificationText (free text)
- evidenceRefs[] (links to PRs, commits, tickets, docs, etc.)
- scope (envs/projects where this decision applies)
- validFor (notBefore, notAfter?)
- attestationRef? (AttestationRef)
- supersedesDecisionId?
- createdBy (id, displayName)
- createdAt
- updatedAt
```
### 3.4 Attestation / AttestationRef
```text
AttestationRef
- id
- type (VULN_SCAN | SBOM | VEX | POLICY_EVAL | OTHER)
- statementId (if DSSE/Intoto)
- subjectName
- subjectDigest (e.g. sha256)
- predicateType (URI)
- createdAt
- signer (name, keyId)
- storage (ociRef | bundlePath | url)
```
### 3.5 PolicyEvaluation
```text
PolicyEvaluation
- id
- subject (ArtifactRef)
- policyBundleVersion
- overallResult (PASS | WARN | FAIL)
- gates[] (GateResult)
- attestationRef? (AttestationRef)
- evaluatedAt
```
### 3.6 AuditBundle
Represents a **downloadable immutable bundle** (ZIP or OCI artifact).
```text
AuditBundle
- bundleId
- version
- createdAt
- createdBy
- subject (ArtifactRef)
- index (AuditBundleIndex) <- JSON index inside the bundle
```
---
## 4. Primary UX Surfaces
### 4.1 Artifacts List
**Goal:** High-level "what's risky?" view and entry point into triage.
**Columns:**
- Artifact
- Type
- Environment(s)
- Open / Total vulns
- Max severity
- **Attestations** (badge w/ count)
- Last scan (timestamp + scanner)
**Actions:**
- View vulnerabilities (primary)
- View attestations
- Create audit bundle
### 4.2 Vulnerability Workspace (per Artifact)
**Split layout:**
**Left: Vulnerability list**
- Filters: severity, status, VEX status, scanner, package, introducedBy, env
- Sort: severity, recency, package, path
- Badges for:
- `New` (first seen in last N scans)
- `VEX: Not affected`
- `Policy: blocked` / `Policy: allowed`
**Right: Evidence / Explainability panel**
Tabs:
1. **Overview**
- Title, severity, package, version, path
- Scanner + db date
- Finding history timeline
- Current VEX decision summary (if any)
2. **Reachability**
- Call path, modules, runtime usage info (when available)
3. **Policy**
- Policy evaluation: which gate caused pass/fail
- Links to gate definitions
4. **Attestations**
- All attestations that mention:
- this artifact
- this vulnerabilityId
- this scan result
**Primary actions per finding:**
- **VEX: Set status** -> opens VEX Modal (see 4.3)
- **Open Fix PR / View Fix** (if available from Snyk/GitLab)
- **Attach Evidence** (link tickets / docs)
- **Copy audit reference** (findingId + attestation digest)
### 4.3 VEX Modal - "Affect & Justification"
**Entry points:**
- From a finding row ("VEX" button)
- From a policy failure explanation
- From a bulk action on multiple findings
**Fields (backed by `VEXDecision`):**
- Status (radio buttons):
- `Not affected`
- `Affected - mitigated`
- `Affected - not mitigated`
- `Fixed`
- Justification type (select - see section 7.3)
- Justification text (multi-line)
- Scope:
- Environments (multi-select)
- Projects / services (multi-select)
- Validity:
- Start (defaults now)
- Optional expiry (recommended)
- Evidence:
- Add links (PR, ticket, doc, commit)
- Attach attestation (optional; pick from list)
- Review:
- Summary of what will be written to the VEX statement
- "Will generate signed attestation" note (if enabled)
**Actions:**
- Save (creates or updates VEXDecision, writes VEX attestation)
- Cancel
- View raw JSON (for power users)
### 4.4 Attestations View
Per artifact, tab: **Attestations**
Table of attestations:
- Type (vuln scan, SBOM, VEX, policy)
- Subject name (shortened)
- Predicate type (URI)
- Scanner / policy engine (derived from predicate)
- Signer (keyId, trusted/not-trusted badge)
- Created at
- Verified (yes/no)
Click to open:
- Header: statement id, subject, signer
- Predicate preview:
- For vuln scan: counts, scanner version, db date
- For SBOM: bomRef, component counts
- For VEX: decision status, vulnerabilityId, scope
### 4.5 Policy & Gating View
Per environment / pipeline:
- Matrix of **gates** vs **subject types**:
- e.g. `CI Build`, `Registry Admission`, `Runtime Admission`
- Each gate shows:
- Rule description (severity thresholds, allowlist usage, required attestations)
- Last evaluation stats (pass/fail counts)
- Clicking a gate shows:
- Recent evaluations (with link to artifact & policy attestation)
- Which condition failed
### 4.6 Audit Export - Bundle Creation
**From:**
- Artifact page (button: "Create immutable audit bundle")
- Pipeline run detail
- Policy evaluation detail
**Workflow:**
1. User selects:
- Subject artifact + digest
- Time window (e.g. "last 7 days of scans & decisions")
- Included content (checklist):
- Vuln reports
- SBOM
- VEX decisions
- Policy evaluations
- Raw attestations
2. Backend generates (a hedged sketch follows this list):
- ZIP or OCI artifact
- `audit-bundle-index.json` at root
3. UI shows:
- Bundle ID & hash
- Download button
- OCI reference (if pushed to registry)
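The sketch below illustrates step 2 for the ZIP case only: each included artifact is written into the archive with a per-file sha256 digest, and `audit-bundle-index.json` is placed at the root. Entry layout and field subset are illustrative (subject, createdBy, and root-hash derivation are omitted); `audit-bundle-index.schema.json` stays canonical.
```csharp
// Hedged sketch: write a ZIP bundle with audit-bundle-index.json at the root.
// Index fields follow audit-bundle-index.schema.json; subject/createdBy/rootHash
// are omitted here and belong to the real implementation.
using System.IO.Compression;
using System.Security.Cryptography;
using System.Text.Json;

public static class AuditBundleWriter
{
    public static void Write(
        string bundlePath,
        string bundleId,
        IReadOnlyList<(string Id, string Type, string Source, string FilePath, string MediaType)> artifacts)
    {
        using var zip = ZipFile.Open(bundlePath, ZipArchiveMode.Create);
        var indexArtifacts = new List<object>();

        foreach (var a in artifacts)
        {
            var bytes = File.ReadAllBytes(a.FilePath);
            var digest = Convert.ToHexString(SHA256.HashData(bytes)).ToLowerInvariant();
            var entryPath = $"{a.Type.ToLowerInvariant()}/{Path.GetFileName(a.FilePath)}";

            using (var entryStream = zip.CreateEntry(entryPath).Open())
            {
                entryStream.Write(bytes);
            }

            indexArtifacts.Add(new
            {
                id = a.Id, type = a.Type, source = a.Source,
                path = entryPath, mediaType = a.MediaType,
                digest = new { sha256 = digest }
            });
        }

        var index = new
        {
            apiVersion = "stella.ops/v1",
            kind = "AuditBundleIndex",
            bundleId,
            createdAt = DateTimeOffset.UtcNow.ToString("yyyy-MM-dd'T'HH:mm:ss'Z'"),
            artifacts = indexArtifacts
        };

        using var indexStream = zip.CreateEntry("audit-bundle-index.json").Open();
        JsonSerializer.Serialize(indexStream, index, new JsonSerializerOptions { WriteIndented = true });
    }
}
```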
---
## 5. State Model
### 5.1 Finding Status vs VEX Status
Two separate but related states:
**Finding.status:**
- `DETECTED` - currently reported by at least one scanner
- `NO_LONGER_DETECTED` - was present, not in latest scan for this subject
- `RESOLVED` - confirmed removed (e.g. package upgraded, image replaced)
**VEXDecision.status:**
- `NOT_AFFECTED`
- `AFFECTED_MITIGATED`
- `AFFECTED_UNMITIGATED`
- `FIXED`
**UI rules** (a hedged sketch follows this list):
- If `Finding.status = NO_LONGER_DETECTED` and a VEXDecision still exists:
- Show badge: "Historical VEX decision (finding no longer detected)"
- If `VEXDecision.status = NOT_AFFECTED`:
- Policy engines may treat this as **non-blocking** (configurable)
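These rules reduce to a small pure function; the sketch below is illustrative, with enum and badge names assumed rather than taken from the implementation.
```csharp
// Hedged sketch of the UI/gating rules above; enum and badge names are illustrative.
public enum FindingStatus { Detected, NoLongerDetected, Resolved }
public enum VexStatus { NotAffected, AffectedMitigated, AffectedUnmitigated, Fixed }

public static class TriagePresentation
{
    // Badge shown when a VEX decision outlives the finding it was made for.
    public static string? Badge(FindingStatus finding, VexStatus? vex) =>
        finding == FindingStatus.NoLongerDetected && vex is not null
            ? "Historical VEX decision (finding no longer detected)"
            : null;

    // Whether a policy gate should treat the finding as blocking. Treating
    // NOT_AFFECTED as non-blocking is configurable, hence the flag.
    public static bool IsBlocking(FindingStatus finding, VexStatus? vex, bool notAffectedIsNonBlocking) =>
        finding == FindingStatus.Detected
        && !(notAffectedIsNonBlocking && vex == VexStatus.NotAffected);
}
```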
---
## 6. Interaction Patterns to Mirror
### 6.1 From Snyk
- PR checks show **before/after** and don't fail on ignored issues
- Action: "Fix PR" from a finding
- Mapping:
- Stella Ops should show "Fix PR" and "Compare before/after" where data exists
- VEX `NOT_AFFECTED` should make **future checks ignore** that finding for that subject/scope
### 6.2 From GitLab SCA
- `Dismissed` with reasons and activity log
- Mapping:
- VEX decisions must have reason + actor + timestamp
- The activity log should show a full **decision history**
### 6.3 From Anchore
- Policy gates & allowlists
- Mapping:
- Gate evaluation screen with clear "this gate failed because..." explanation
---
## 7. Enumerations & Conventions
### 7.1 VEX Status
```text
NOT_AFFECTED
AFFECTED_MITIGATED
AFFECTED_UNMITIGATED
FIXED
```
### 7.2 VEX Scope
- `envs[]`: e.g. `["prod", "staging"]`
- `projects[]`: service / app names
- Default: applies to **all** unless restricted
### 7.3 Justification Type (inspired by CSAF/VEX)
```text
CODE_NOT_PRESENT
CODE_NOT_REACHABLE
VULNERABLE_CODE_NOT_IN_EXECUTE_PATH
CONFIGURATION_NOT_AFFECTED
OS_NOT_AFFECTED
RUNTIME_MITIGATION_PRESENT
COMPENSATING_CONTROLS
ACCEPTED_BUSINESS_RISK
OTHER
```
---
## 8. Attestation Placement
### 8.1 Trivy + Cosign
Generate a **vulnerability-scan attestation** and an SBOM attestation; attach them to the image via OCI referrers. These attestations become the source of truth for evidence and audit export.
### 8.2 Harbor
Treat attestations as first-class accessories/refs to the image. Surface them next to the Vulnerabilities tab. Link them into the explainability panel.
### 8.3 Anchore
Reference attestation digests inside **Policy evaluation** output so pass/fail is traceable to signed inputs.
### 8.4 Snyk/GitLab
Surface attestation presence in PR/Security dashboards to prove findings came from a **signed** scan; link out to the OCI digest.
**UI pattern:** Small **"Signed evidence"** pill on each finding; clicking opens the attestation JSON (human-readable view) + verify command snippet.
---
## 9. Gating Controls
| Tool | Mechanism | Stella Ops Mirror |
|------|-----------|-------------------|
| **Anchore** | Policy Gates/Triggers model for hard gates | Gates per environment with trigger explainability |
| **Snyk** | PR checks + Auto Fix PRs as soft gates | PR integration with soft/hard gate toggles |
| **GitLab** | MR approvals + Security Policies; auto-resolve on no-longer-detected | Status-aware policies with auto-resolution |
| **Harbor** | External policy engines (Kyverno/OPA) verify signatures/attestations | Admission controller integration |
---
## 10. Minimal UI Wireframe
### 10.1 Artifacts List
| Image | Tag | Risk (open/total) | Attestations | Last scan |
|-------|-----|-------------------|--------------|-----------|
| app/service | v1.2.3 | 3/47 | 4 | 2h ago (Trivy) |
### 10.2 Artifact -> Vulnerabilities Tab (Evidence-First)
```
+----------------------------------+-----------------------------------+
| Finding Cards (scrollable) | Explainability Panel |
| | |
| [CVE-2024-1234] CRITICAL | Overview | Reachability | Policy |
| openssl 3.0.14 -> 3.0.15 | |
| [Fix PR] [VEX: Not Affected] | Scanner: Trivy 0.53.0 |
| [Attach Evidence] | DB: 2025-11-27 |
| | Attestation: sha256:2e61... |
| [CVE-2024-5678] HIGH | |
| log4j 2.17.0 | [Why flagged] |
| [VEX: Mitigated] | - version.match: 2.17.0 < 2.17.1 |
| | - gate: severity >= HIGH |
+----------------------------------+-----------------------------------+
```
### 10.3 Policy View
Gate rules (like Anchore) with preview + dry-run; show which triggers cause failure.
### 10.4 Audit
**"Create immutable audit bundle"** -> produces ZIP/OCI artifact with reports, VEX JSON, policy evals, and in-toto/DSSE attestations.
### 10.5 Registry/Admission
"Ready to deploy" badge when all gates met and required attestations verified.
---
## 11. API Endpoints (High-Level)
```text
GET /artifacts
GET /artifacts/{id}/vulnerabilities
GET /vulnerabilities/{id}
POST /vex-decisions
PATCH /vex-decisions/{id}
GET /artifacts/{id}/attestations
POST /audit-bundles
GET /audit-bundles/{bundleId}
```
---
## 12. JSON Schema Locations
The following schemas should be created/maintained:
- `docs/schemas/vex-decision.schema.json` - VEX decision form schema
- `docs/schemas/attestation-vuln-scan.schema.json` - Vulnerability scan attestation
- `docs/schemas/audit-bundle-index.schema.json` - Audit bundle manifest
---
## 13. Related Advisories
- `27-Nov-2025 - Explainability Layer for Vulnerability Verdicts.md` - Evidence chain model
- `27-Nov-2025 - Making Graphs Understandable to Humans.md` - Graph navigation UX
- `25-Nov-2025 - Define Safe VEX 'Not Affected' Claims with Proofs.md` - VEX proof requirements
---
## 14. Sprint Integration
This advisory maps to:
- **SPRINT_0215_0001_0001_vuln_triage_ux.md** (NEW) - UI triage workspace implementation
- **SPRINT_210_ui_ii.md** - VEX tab tasks (UI-LNM-22-003)
- **SPRINT_0334_docs_modules_vuln_explorer.md** - Module documentation updates
---
*Last updated: 2025-11-28*

View File

@@ -64,6 +64,22 @@ These are the authoritative advisories to reference for implementation:
- **Sprint:** Multiple sprints (0186, 0401, 0512)
- **Status:** High-level roadmap document
### Vulnerability Triage UX & VEX-First Decisioning
- **Canonical:** `28-Nov-2025 - Vulnerability Triage UX & VEX-First Decisioning.md`
- **Sprint:** SPRINT_0215_0001_0001_vuln_triage_ux.md (NEW)
- **Related Sprints:**
- SPRINT_210_ui_ii.md (UI-LNM-22-003 VEX tab)
- SPRINT_0334_docs_modules_vuln_explorer.md (docs)
- **Related Advisories:**
- `27-Nov-2025 - Explainability Layer for Vulnerability Verdicts.md` (evidence chain)
- `27-Nov-2025 - Making Graphs Understandable to Humans.md` (graph UX)
- `25-Nov-2025 - Define Safe VEX 'Not Affected' Claims with Proofs.md` (VEX proofs)
- **Status:** New - defines converged triage UX across Snyk/GitLab/Harbor/Anchore patterns
- **Schemas:**
- `docs/schemas/vex-decision.schema.json`
- `docs/schemas/attestation-vuln-scan.schema.json`
- `docs/schemas/audit-bundle-index.schema.json`
## Files to Archive
The following files should be moved to `archived/` as they are superseded:
@@ -95,6 +111,7 @@ The following files should be moved to `archived/` as they are superseded:
| Unknowns Registry | SPRINT_0140_0001_0001 | EXISTING (implemented) |
| Graph Revision IDs | SPRINT_0401_0001_0001 | EXISTING |
| DSSE/Rekor Batching | SPRINT_0401_0001_0001 | EXISTING |
| Vuln Triage UX / VEX | SPRINT_0215_0001_0001 | NEW |
## Implementation Priority
@@ -103,8 +120,9 @@ Based on gap analysis:
1. **P0 - CVSS v4.0** (Sprint 0190) - Industry moving to v4.0, genuine gap
2. **P1 - SPDX 3.0.1** (Sprint 0186 tasks 15a-15f) - Standards compliance
3. **P1 - Public Benchmark** (Sprint 0513) - Differentiation/marketing value
4. **P1 - Vuln Triage UX** (Sprint 0215) - Industry-aligned UX for competitive parity
5. **P2 - Explainability** (Sprint 0401) - UX enhancement, existing tasks
6. **P3 - Already Implemented** - Unknowns, Graph IDs, DSSE batching
## Implementer Quick Reference
@@ -124,7 +142,10 @@ For each topic, the implementer should read:
| Sbomer | `docs/modules/sbomer/architecture.md` | `src/Sbomer/*/AGENTS.md` |
| Signals | `docs/modules/signals/architecture.md` | `src/Signals/*/AGENTS.md` |
| Attestor | `docs/modules/attestor/architecture.md` | `src/Attestor/*/AGENTS.md` |
| Vuln Explorer | `docs/modules/vuln-explorer/architecture.md` | `src/VulnExplorer/*/AGENTS.md` |
| VEX-Lens | `docs/modules/vex-lens/architecture.md` | `src/Excititor/*/AGENTS.md` |
| UI | `docs/modules/ui/architecture.md` | `src/UI/*/AGENTS.md` |
---
*Index created: 2025-11-27*
*Last updated: 2025-11-28*

View File

@@ -0,0 +1,226 @@
{
"$id": "https://stella.ops/schema/attestation-vuln-scan.json",
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "VulnScanAttestation",
"description": "In-toto style attestation for vulnerability scan results",
"type": "object",
"required": ["_type", "predicateType", "subject", "predicate", "attestationMeta"],
"properties": {
"_type": {
"type": "string",
"const": "https://in-toto.io/Statement/v0.1",
"description": "In-toto statement type URI"
},
"predicateType": {
"type": "string",
"const": "https://stella.ops/predicates/vuln-scan/v1",
"description": "Predicate type URI for Stella Ops vulnerability scans"
},
"subject": {
"type": "array",
"items": {
"$ref": "#/$defs/AttestationSubject"
},
"minItems": 1,
"description": "Artifacts that were scanned"
},
"predicate": {
"$ref": "#/$defs/VulnScanPredicate",
"description": "Vulnerability scan result predicate"
},
"attestationMeta": {
"$ref": "#/$defs/AttestationMeta",
"description": "Attestation metadata including signer info"
}
},
"$defs": {
"AttestationSubject": {
"type": "object",
"required": ["name", "digest"],
"properties": {
"name": {
"type": "string",
"description": "Subject name (e.g. image reference)",
"examples": ["registry.internal/stella/app-service@sha256:7d9c..."]
},
"digest": {
"type": "object",
"additionalProperties": {
"type": "string"
},
"description": "Algorithm -> digest map",
"examples": [{"sha256": "7d9cd5f1a2a0dd9a41a2c43a5b7d8a0bcd9e34cf39b3f43a70595c834f0a4aee"}]
}
}
},
"VulnScanPredicate": {
"type": "object",
"required": ["scanner", "scanStartedAt", "scanCompletedAt", "severityCounts", "findingReport"],
"properties": {
"scanner": {
"$ref": "#/$defs/ScannerInfo",
"description": "Scanner that produced this result"
},
"scannerDb": {
"$ref": "#/$defs/ScannerDbInfo",
"description": "Vulnerability database info"
},
"scanStartedAt": {
"type": "string",
"format": "date-time",
"description": "ISO-8601 timestamp when scan started"
},
"scanCompletedAt": {
"type": "string",
"format": "date-time",
"description": "ISO-8601 timestamp when scan completed"
},
"severityCounts": {
"type": "object",
"properties": {
"CRITICAL": { "type": "integer", "minimum": 0 },
"HIGH": { "type": "integer", "minimum": 0 },
"MEDIUM": { "type": "integer", "minimum": 0 },
"LOW": { "type": "integer", "minimum": 0 }
},
"description": "Count of findings by severity"
},
"findingReport": {
"$ref": "#/$defs/FindingReport",
"description": "Reference to the full findings report"
}
}
},
"ScannerInfo": {
"type": "object",
"required": ["name", "version"],
"properties": {
"name": {
"type": "string",
"description": "Scanner name",
"examples": ["Trivy", "Snyk", "Grype"]
},
"version": {
"type": "string",
"description": "Scanner version",
"examples": ["0.53.0"]
}
}
},
"ScannerDbInfo": {
"type": "object",
"properties": {
"lastUpdatedAt": {
"type": "string",
"format": "date-time",
"description": "ISO-8601 timestamp when vulnerability DB was last updated"
}
}
},
"FindingReport": {
"type": "object",
"required": ["mediaType", "location", "digest"],
"properties": {
"mediaType": {
"type": "string",
"default": "application/json",
"description": "Media type of the report",
"examples": ["application/json", "application/vnd.cyclonedx+json"]
},
"location": {
"type": "string",
"description": "Path or URI to the report file",
"examples": ["reports/trivy/app-service-7d9c-vulns.json"]
},
"digest": {
"type": "object",
"additionalProperties": {
"type": "string"
},
"description": "Content digest of the report"
}
}
},
"AttestationMeta": {
"type": "object",
"required": ["statementId", "createdAt", "signer"],
"properties": {
"statementId": {
"type": "string",
"description": "Unique identifier for this attestation statement"
},
"createdAt": {
"type": "string",
"format": "date-time",
"description": "ISO-8601 timestamp when attestation was created"
},
"signer": {
"$ref": "#/$defs/AttestationSigner",
"description": "Entity that signed this attestation"
}
}
},
"AttestationSigner": {
"type": "object",
"required": ["name", "keyId"],
"properties": {
"name": {
"type": "string",
"description": "Signer name/identity",
"examples": ["ci/trivy-signer"]
},
"keyId": {
"type": "string",
"description": "Key identifier (fingerprint)",
"examples": ["SHA256:ae12c8d1..."]
}
}
}
},
"examples": [
{
"_type": "https://in-toto.io/Statement/v0.1",
"predicateType": "https://stella.ops/predicates/vuln-scan/v1",
"subject": [
{
"name": "registry.internal/stella/app-service@sha256:7d9c...",
"digest": {
"sha256": "7d9cd5f1a2a0dd9a41a2c43a5b7d8a0bcd9e34cf39b3f43a70595c834f0a4aee"
}
}
],
"predicate": {
"scanner": {
"name": "Trivy",
"version": "0.53.0"
},
"scannerDb": {
"lastUpdatedAt": "2025-11-20T09:32:00Z"
},
"scanStartedAt": "2025-11-21T09:00:00Z",
"scanCompletedAt": "2025-11-21T09:01:05Z",
"severityCounts": {
"CRITICAL": 1,
"HIGH": 7,
"MEDIUM": 13,
"LOW": 4
},
"findingReport": {
"mediaType": "application/json",
"location": "reports/trivy/app-service-7d9c-vulns.json",
"digest": {
"sha256": "db569aa8a1b847a922b7d61d276cc2a0ccf99efad0879500b56854b43265c09a"
}
}
},
"attestationMeta": {
"statementId": "att-vuln-trivy-app-service-7d9c",
"createdAt": "2025-11-21T09:01:05Z",
"signer": {
"name": "ci/trivy-signer",
"keyId": "SHA256:ae12c8d1..."
}
}
}
]
}

View File

@@ -0,0 +1,312 @@
{
"$id": "https://stella.ops/schema/audit-bundle-index.json",
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "AuditBundleIndex",
"description": "Root manifest for an immutable audit bundle containing vulnerability reports, VEX decisions, policy evaluations, and attestations",
"type": "object",
"required": ["apiVersion", "kind", "bundleId", "createdAt", "createdBy", "subject", "artifacts"],
"properties": {
"apiVersion": {
"type": "string",
"const": "stella.ops/v1",
"description": "API version for this bundle format"
},
"kind": {
"type": "string",
"const": "AuditBundleIndex",
"description": "Resource kind identifier"
},
"bundleId": {
"type": "string",
"description": "Unique identifier for this bundle",
"examples": ["bndl-6f6b0c94-9c5b-4bbf-9a77-a5d8a83da4a2"]
},
"createdAt": {
"type": "string",
"format": "date-time",
"description": "ISO-8601 timestamp when bundle was created"
},
"createdBy": {
"$ref": "#/$defs/BundleActorRef",
"description": "User who created this bundle"
},
"subject": {
"$ref": "#/$defs/BundleSubjectRef",
"description": "Primary artifact this bundle documents"
},
"timeWindow": {
"type": "object",
"properties": {
"from": {
"type": "string",
"format": "date-time",
"description": "Start of time window for included artifacts"
},
"to": {
"type": "string",
"format": "date-time",
"description": "End of time window for included artifacts"
}
},
"description": "Optional time window filter for included content"
},
"artifacts": {
"type": "array",
"items": {
"$ref": "#/$defs/BundleArtifact"
},
"description": "List of artifacts included in this bundle"
},
"vexDecisions": {
"type": "array",
"items": {
"$ref": "#/$defs/BundleVexDecisionEntry"
},
"description": "Summary of VEX decisions included in this bundle"
},
"integrity": {
"$ref": "#/$defs/BundleIntegrity",
"description": "Integrity verification data for the entire bundle"
}
},
"$defs": {
"BundleActorRef": {
"type": "object",
"required": ["id", "displayName"],
"properties": {
"id": {
"type": "string",
"description": "User identifier"
},
"displayName": {
"type": "string",
"description": "Human-readable display name"
}
}
},
"BundleSubjectRef": {
"type": "object",
"required": ["type", "name", "digest"],
"properties": {
"type": {
"type": "string",
"enum": ["IMAGE", "REPO", "SBOM", "OTHER"],
"description": "Type of subject artifact"
},
"name": {
"type": "string",
"description": "Human-readable subject name"
},
"digest": {
"type": "object",
"additionalProperties": {
"type": "string"
},
"description": "Algorithm -> digest map"
}
}
},
"BundleArtifact": {
"type": "object",
"required": ["id", "type", "source", "path", "mediaType", "digest"],
"properties": {
"id": {
"type": "string",
"description": "Internal identifier for this artifact within the bundle"
},
"type": {
"type": "string",
"enum": ["VULN_REPORT", "SBOM", "VEX", "POLICY_EVAL", "OTHER"],
"description": "Type of artifact"
},
"source": {
"type": "string",
"description": "Tool/service that produced this artifact",
"examples": ["Trivy@0.53.0", "Syft@1.0.0", "StellaOps", "StellaPolicyEngine@2.1.0"]
},
"path": {
"type": "string",
"description": "Relative path within the bundle",
"examples": ["reports/trivy/app-service-7d9c-vulns.json"]
},
"mediaType": {
"type": "string",
"description": "Media type of the artifact",
"examples": ["application/json", "application/vnd.cyclonedx+json"]
},
"digest": {
"type": "object",
"additionalProperties": {
"type": "string"
},
"description": "Content digest of the artifact"
},
"attestation": {
"$ref": "#/$defs/BundleArtifactAttestationRef",
"description": "Optional reference to attestation for this artifact"
}
}
},
"BundleArtifactAttestationRef": {
"type": "object",
"required": ["path", "digest"],
"properties": {
"path": {
"type": "string",
"description": "Relative path to attestation within the bundle"
},
"digest": {
"type": "object",
"additionalProperties": {
"type": "string"
},
"description": "Content digest of the attestation"
}
}
},
"BundleVexDecisionEntry": {
"type": "object",
"required": ["decisionId", "vulnerabilityId", "status", "path", "digest"],
"properties": {
"decisionId": {
"type": "string",
"format": "uuid",
"description": "VEX decision ID"
},
"vulnerabilityId": {
"type": "string",
"description": "CVE or vulnerability identifier"
},
"status": {
"type": "string",
"enum": ["NOT_AFFECTED", "AFFECTED_MITIGATED", "AFFECTED_UNMITIGATED", "FIXED"],
"description": "VEX status"
},
"path": {
"type": "string",
"description": "Relative path to VEX decision file"
},
"digest": {
"type": "object",
"additionalProperties": {
"type": "string"
},
"description": "Content digest of the decision file"
}
}
},
"BundleIntegrity": {
"type": "object",
"required": ["rootHash", "hashAlgorithm"],
"properties": {
"rootHash": {
"type": "string",
"description": "Root hash covering all artifacts in the bundle"
},
"hashAlgorithm": {
"type": "string",
"default": "sha256",
"description": "Hash algorithm used for integrity verification"
}
}
}
},
"examples": [
{
"apiVersion": "stella.ops/v1",
"kind": "AuditBundleIndex",
"bundleId": "bndl-6f6b0c94-9c5b-4bbf-9a77-a5d8a83da4a2",
"createdAt": "2025-11-21T09:05:30Z",
"createdBy": {
"id": "user-123",
"displayName": "Alice Johnson"
},
"subject": {
"type": "IMAGE",
"name": "registry.internal/stella/app-service@sha256:7d9c...",
"digest": {
"sha256": "7d9cd5f1a2a0dd9a41a2c43a5b7d8a0bcd9e34cf39b3f43a70595c834f0a4aee"
}
},
"timeWindow": {
"from": "2025-11-14T00:00:00Z",
"to": "2025-11-21T09:05:00Z"
},
"artifacts": [
{
"id": "vuln-report-trivy",
"type": "VULN_REPORT",
"source": "Trivy@0.53.0",
"path": "reports/trivy/app-service-7d9c-vulns.json",
"mediaType": "application/json",
"digest": {
"sha256": "db569aa8a1b847a922b7d61d276cc2a0ccf99efad0879500b56854b43265c09a"
},
"attestation": {
"path": "attestations/vuln-scan-trivy.dsse.json",
"digest": {
"sha256": "2e613df97fe2aa9baf7a8dac9cfaa407e60c808a8af8e7d5e50c029f6c51a54b"
}
}
},
{
"id": "sbom-cyclonedx",
"type": "SBOM",
"source": "Syft@1.0.0",
"path": "sbom/app-service-7d9c-cyclonedx.json",
"mediaType": "application/vnd.cyclonedx+json",
"digest": {
"sha256": "9477b3a9410423b37c39076678a936d5854aa2d905e72a2222c153e3e51ab150"
},
"attestation": {
"path": "attestations/sbom-syft.dsse.json",
"digest": {
"sha256": "3ebf5dc03f862b4b2fdef201130f5c6a9bde7cb0bcf4f57e7686adbc83c9c897"
}
}
},
{
"id": "vex-decisions",
"type": "VEX",
"source": "StellaOps",
"path": "vex/app-service-7d9c-vex.json",
"mediaType": "application/json",
"digest": {
"sha256": "b56f0d05af5dc4ba79ccc1d228dba27a0d9607eef17fa7faf569e3020c39da83"
}
},
{
"id": "policy-eval-prod-admission",
"type": "POLICY_EVAL",
"source": "StellaPolicyEngine@2.1.0",
"path": "policy-evals/prod-admission.json",
"mediaType": "application/json",
"digest": {
"sha256": "cf8617dd3a63b953f31501045bb559c7095fa2b6965643b64a4b463756cfa9c3"
},
"attestation": {
"path": "attestations/policy-prod-admission.dsse.json",
"digest": {
"sha256": "a7ea883ffa1100a62f0f89f455b659017864c65a4fad0af0ac3d8b989e1a6ff3"
}
}
}
],
"vexDecisions": [
{
"decisionId": "8a3d0b5a-1e07-4b57-b6a1-1a29ce6c889e",
"vulnerabilityId": "CVE-2023-12345",
"status": "NOT_AFFECTED",
"path": "vex/CVE-2023-12345-app-service.json",
"digest": {
"sha256": "b56f0d05af5dc4ba79ccc1d228dba27a0d9607eef17fa7faf569e3020c39da83"
}
}
],
"integrity": {
"rootHash": "f4ede91c4396f9dfdacaf15fe0293c6349f467701f4ef7af6a2ecd4f5bf42254",
"hashAlgorithm": "sha256"
}
}
]
}

View File

@@ -0,0 +1,257 @@
{
"$id": "https://stella.ops/schema/vex-decision.json",
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "VexDecision",
"description": "VEX-style statement attached to a finding + subject, representing a vulnerability exploitability decision",
"type": "object",
"required": [
"id",
"vulnerabilityId",
"subject",
"status",
"justificationType",
"createdBy",
"createdAt"
],
"properties": {
"id": {
"type": "string",
"format": "uuid",
"description": "Internal stable ID for this decision"
},
"vulnerabilityId": {
"type": "string",
"description": "CVE, GHSA, or other vulnerability identifier",
"examples": ["CVE-2023-12345", "GHSA-xxxx-yyyy-zzzz"]
},
"subject": {
"$ref": "#/$defs/SubjectRef",
"description": "The artifact or SBOM component this decision applies to"
},
"status": {
"type": "string",
"enum": [
"NOT_AFFECTED",
"AFFECTED_MITIGATED",
"AFFECTED_UNMITIGATED",
"FIXED"
],
"description": "VEX status following OpenVEX semantics"
},
"justificationType": {
"type": "string",
"enum": [
"CODE_NOT_PRESENT",
"CODE_NOT_REACHABLE",
"VULNERABLE_CODE_NOT_IN_EXECUTE_PATH",
"CONFIGURATION_NOT_AFFECTED",
"OS_NOT_AFFECTED",
"RUNTIME_MITIGATION_PRESENT",
"COMPENSATING_CONTROLS",
"ACCEPTED_BUSINESS_RISK",
"OTHER"
],
"description": "Justification type inspired by CSAF/VEX specifications"
},
"justificationText": {
"type": "string",
"maxLength": 4000,
"description": "Free-form explanation supporting the justification type"
},
"evidenceRefs": {
"type": "array",
"items": {
"$ref": "#/$defs/EvidenceRef"
},
"description": "Links to PRs, commits, tickets, docs supporting this decision"
},
"scope": {
"$ref": "#/$defs/VexScope",
"description": "Environments and projects where this decision applies"
},
"validFor": {
"$ref": "#/$defs/ValidFor",
"description": "Time window during which this decision is valid"
},
"attestationRef": {
"$ref": "#/$defs/AttestationRef",
"description": "Reference to the signed attestation for this decision"
},
"supersedesDecisionId": {
"type": "string",
"format": "uuid",
"description": "ID of a previous decision this one supersedes"
},
"createdBy": {
"$ref": "#/$defs/ActorRef",
"description": "User who created this decision"
},
"createdAt": {
"type": "string",
"format": "date-time",
"description": "ISO-8601 timestamp when decision was created"
},
"updatedAt": {
"type": "string",
"format": "date-time",
"description": "ISO-8601 timestamp when decision was last updated"
}
},
"$defs": {
"SubjectRef": {
"type": "object",
"required": ["type", "name", "digest"],
"properties": {
"type": {
"type": "string",
"enum": ["IMAGE", "REPO", "SBOM_COMPONENT", "OTHER"],
"description": "Type of artifact this subject represents"
},
"name": {
"type": "string",
"description": "Human-readable subject name (e.g. image ref, package name)",
"examples": ["registry.internal/stella/app-service@sha256:7d9c..."]
},
"digest": {
"type": "object",
"additionalProperties": {
"type": "string"
},
"description": "Algorithm -> digest map (e.g. sha256 -> hex string)",
"examples": [{"sha256": "7d9cd5f1a2a0dd9a41a2c43a5b7d8a0bcd9e34cf39b3f43a70595c834f0a4aee"}]
},
"sbomNodeId": {
"type": "string",
"description": "Optional SBOM node/bomRef identifier for SBOM_COMPONENT subjects"
}
}
},
"EvidenceRef": {
"type": "object",
"required": ["type", "url"],
"properties": {
"type": {
"type": "string",
"enum": ["PR", "TICKET", "DOC", "COMMIT", "OTHER"],
"description": "Type of evidence link"
},
"title": {
"type": "string",
"description": "Human-readable title for the evidence"
},
"url": {
"type": "string",
"format": "uri",
"description": "URL to the evidence resource"
}
}
},
"VexScope": {
"type": "object",
"properties": {
"environments": {
"type": "array",
"items": {
"type": "string"
},
"description": "Environment names where decision applies (e.g. prod, staging)",
"examples": [["prod", "staging"]]
},
"projects": {
"type": "array",
"items": {
"type": "string"
},
"description": "Project/service names where decision applies"
}
},
"description": "If empty/null, decision applies to all environments and projects"
},
"ValidFor": {
"type": "object",
"properties": {
"notBefore": {
"type": "string",
"format": "date-time",
"description": "Decision is not valid before this timestamp (defaults to creation time)"
},
"notAfter": {
"type": "string",
"format": "date-time",
"description": "Decision expires after this timestamp (recommended to set)"
}
}
},
"AttestationRef": {
"type": "object",
"properties": {
"id": {
"type": "string",
"description": "Internal attestation identifier"
},
"digest": {
"type": "object",
"additionalProperties": {
"type": "string"
},
"description": "Content digest of the attestation"
},
"storage": {
"type": "string",
"description": "Storage location (OCI ref, bundle path, or URL)",
"examples": ["oci://registry.internal/stella/attestations@sha256:2e61..."]
}
}
},
"ActorRef": {
"type": "object",
"required": ["id", "displayName"],
"properties": {
"id": {
"type": "string",
"description": "User identifier"
},
"displayName": {
"type": "string",
"description": "Human-readable display name"
}
}
}
},
"examples": [
{
"id": "8a3d0b5a-1e07-4b57-b6a1-1a29ce6c889e",
"vulnerabilityId": "CVE-2023-12345",
"subject": {
"type": "IMAGE",
"name": "registry.internal/stella/app-service@sha256:7d9c...",
"digest": {
"sha256": "7d9cd5f1a2a0dd9a41a2c43a5b7d8a0bcd9e34cf39b3f43a70595c834f0a4aee"
}
},
"status": "NOT_AFFECTED",
"justificationType": "VULNERABLE_CODE_NOT_IN_EXECUTE_PATH",
"justificationText": "Vulnerable CLI helper is present in the image but never invoked in the running service.",
"evidenceRefs": [
{
"type": "PR",
"title": "Document non-usage of CLI helper",
"url": "https://git.example.com/stella/app-service/merge_requests/42"
}
],
"scope": {
"environments": ["prod", "staging"],
"projects": ["app-service"]
},
"validFor": {
"notBefore": "2025-11-21T10:15:00Z",
"notAfter": "2026-05-21T10:15:00Z"
},
"createdBy": {
"id": "user-123",
"displayName": "Alice Johnson"
},
"createdAt": "2025-11-21T10:15:00Z"
}
]
}

View File

@@ -0,0 +1,39 @@
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Options;
using StellaOps.Infrastructure.Postgres.Connections;
using StellaOps.Infrastructure.Postgres.Options;
namespace StellaOps.Authority.Storage.Postgres;
/// <summary>
/// PostgreSQL data source for the Authority module.
/// Manages connections with tenant context for authentication and authorization data.
/// </summary>
public sealed class AuthorityDataSource : DataSourceBase
{
/// <summary>
/// Default schema name for Authority tables.
/// </summary>
public const string DefaultSchemaName = "auth";
/// <summary>
/// Creates a new Authority data source.
/// </summary>
public AuthorityDataSource(IOptions<PostgresOptions> options, ILogger<AuthorityDataSource> logger)
: base(CreateOptions(options.Value), logger)
{
}
/// <inheritdoc />
protected override string ModuleName => "Authority";
private static PostgresOptions CreateOptions(PostgresOptions baseOptions)
{
// Use default schema if not specified
if (string.IsNullOrWhiteSpace(baseOptions.SchemaName))
{
baseOptions.SchemaName = DefaultSchemaName;
}
return baseOptions;
}
}
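// Usage sketch (hypothetical wiring, not part of this class): the data source is expected
// to be bound from configuration and registered once per module, for example:
//
//   services.Configure<PostgresOptions>(configuration.GetSection("Postgres"));
//   services.AddSingleton<AuthorityDataSource>();
//
// The configuration section name and singleton lifetime above are assumptions; the actual
// registration lives in the module's service collection extensions.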

View File

@@ -0,0 +1,232 @@
-- Authority Schema Migration 001: Initial Schema
-- Creates the authority schema for IAM, tenants, users, and tokens
-- Create schema
CREATE SCHEMA IF NOT EXISTS authority;
-- Tenants table
CREATE TABLE IF NOT EXISTS authority.tenants (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id TEXT NOT NULL UNIQUE,
name TEXT NOT NULL,
display_name TEXT,
status TEXT NOT NULL DEFAULT 'active' CHECK (status IN ('active', 'suspended', 'deleted')),
settings JSONB NOT NULL DEFAULT '{}',
metadata JSONB NOT NULL DEFAULT '{}',
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
created_by TEXT,
updated_by TEXT
);
CREATE INDEX idx_tenants_status ON authority.tenants(status);
CREATE INDEX idx_tenants_created_at ON authority.tenants(created_at);
-- Users table
CREATE TABLE IF NOT EXISTS authority.users (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id TEXT NOT NULL REFERENCES authority.tenants(tenant_id),
username TEXT NOT NULL,
email TEXT,
display_name TEXT,
password_hash TEXT,
password_salt TEXT,
password_algorithm TEXT DEFAULT 'argon2id',
status TEXT NOT NULL DEFAULT 'active' CHECK (status IN ('active', 'inactive', 'locked', 'deleted')),
email_verified BOOLEAN NOT NULL DEFAULT FALSE,
mfa_enabled BOOLEAN NOT NULL DEFAULT FALSE,
mfa_secret TEXT,
failed_login_attempts INT NOT NULL DEFAULT 0,
last_login_at TIMESTAMPTZ,
last_password_change_at TIMESTAMPTZ,
password_expires_at TIMESTAMPTZ,
metadata JSONB NOT NULL DEFAULT '{}',
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
created_by TEXT,
updated_by TEXT,
UNIQUE(tenant_id, username),
UNIQUE(tenant_id, email)
);
CREATE INDEX idx_users_tenant_id ON authority.users(tenant_id);
CREATE INDEX idx_users_status ON authority.users(tenant_id, status);
CREATE INDEX idx_users_email ON authority.users(tenant_id, email);
-- Roles table
CREATE TABLE IF NOT EXISTS authority.roles (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id TEXT NOT NULL REFERENCES authority.tenants(tenant_id),
name TEXT NOT NULL,
display_name TEXT,
description TEXT,
is_system BOOLEAN NOT NULL DEFAULT FALSE,
metadata JSONB NOT NULL DEFAULT '{}',
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
UNIQUE(tenant_id, name)
);
CREATE INDEX idx_roles_tenant_id ON authority.roles(tenant_id);
-- Permissions table
CREATE TABLE IF NOT EXISTS authority.permissions (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id TEXT NOT NULL REFERENCES authority.tenants(tenant_id),
name TEXT NOT NULL,
resource TEXT NOT NULL,
action TEXT NOT NULL,
description TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
UNIQUE(tenant_id, name)
);
CREATE INDEX idx_permissions_tenant_id ON authority.permissions(tenant_id);
CREATE INDEX idx_permissions_resource ON authority.permissions(tenant_id, resource);
-- Role-Permission assignments
CREATE TABLE IF NOT EXISTS authority.role_permissions (
role_id UUID NOT NULL REFERENCES authority.roles(id) ON DELETE CASCADE,
permission_id UUID NOT NULL REFERENCES authority.permissions(id) ON DELETE CASCADE,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
PRIMARY KEY (role_id, permission_id)
);
-- User-Role assignments
CREATE TABLE IF NOT EXISTS authority.user_roles (
user_id UUID NOT NULL REFERENCES authority.users(id) ON DELETE CASCADE,
role_id UUID NOT NULL REFERENCES authority.roles(id) ON DELETE CASCADE,
granted_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
granted_by TEXT,
expires_at TIMESTAMPTZ,
PRIMARY KEY (user_id, role_id)
);
-- API Keys table
CREATE TABLE IF NOT EXISTS authority.api_keys (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id TEXT NOT NULL REFERENCES authority.tenants(tenant_id),
user_id UUID REFERENCES authority.users(id) ON DELETE CASCADE,
name TEXT NOT NULL,
key_hash TEXT NOT NULL,
key_prefix TEXT NOT NULL,
scopes TEXT[] NOT NULL DEFAULT '{}',
status TEXT NOT NULL DEFAULT 'active' CHECK (status IN ('active', 'revoked', 'expired')),
last_used_at TIMESTAMPTZ,
expires_at TIMESTAMPTZ,
metadata JSONB NOT NULL DEFAULT '{}',
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
revoked_at TIMESTAMPTZ,
revoked_by TEXT
);
CREATE INDEX idx_api_keys_tenant_id ON authority.api_keys(tenant_id);
CREATE INDEX idx_api_keys_key_prefix ON authority.api_keys(key_prefix);
CREATE INDEX idx_api_keys_user_id ON authority.api_keys(user_id);
CREATE INDEX idx_api_keys_status ON authority.api_keys(tenant_id, status);
-- Tokens table (access tokens)
CREATE TABLE IF NOT EXISTS authority.tokens (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id TEXT NOT NULL REFERENCES authority.tenants(tenant_id),
user_id UUID REFERENCES authority.users(id) ON DELETE CASCADE,
token_hash TEXT NOT NULL UNIQUE,
token_type TEXT NOT NULL DEFAULT 'access' CHECK (token_type IN ('access', 'refresh', 'api')),
scopes TEXT[] NOT NULL DEFAULT '{}',
client_id TEXT,
issued_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
expires_at TIMESTAMPTZ NOT NULL,
revoked_at TIMESTAMPTZ,
revoked_by TEXT,
metadata JSONB NOT NULL DEFAULT '{}'
);
CREATE INDEX idx_tokens_tenant_id ON authority.tokens(tenant_id);
CREATE INDEX idx_tokens_user_id ON authority.tokens(user_id);
CREATE INDEX idx_tokens_expires_at ON authority.tokens(expires_at);
CREATE INDEX idx_tokens_token_hash ON authority.tokens(token_hash);
-- Refresh Tokens table
CREATE TABLE IF NOT EXISTS authority.refresh_tokens (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id TEXT NOT NULL REFERENCES authority.tenants(tenant_id),
user_id UUID NOT NULL REFERENCES authority.users(id) ON DELETE CASCADE,
token_hash TEXT NOT NULL UNIQUE,
access_token_id UUID REFERENCES authority.tokens(id) ON DELETE SET NULL,
client_id TEXT,
issued_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
expires_at TIMESTAMPTZ NOT NULL,
revoked_at TIMESTAMPTZ,
revoked_by TEXT,
replaced_by UUID,
metadata JSONB NOT NULL DEFAULT '{}'
);
CREATE INDEX idx_refresh_tokens_tenant_id ON authority.refresh_tokens(tenant_id);
CREATE INDEX idx_refresh_tokens_user_id ON authority.refresh_tokens(user_id);
CREATE INDEX idx_refresh_tokens_expires_at ON authority.refresh_tokens(expires_at);
-- Sessions table
CREATE TABLE IF NOT EXISTS authority.sessions (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id TEXT NOT NULL REFERENCES authority.tenants(tenant_id),
user_id UUID NOT NULL REFERENCES authority.users(id) ON DELETE CASCADE,
session_token_hash TEXT NOT NULL UNIQUE,
ip_address TEXT,
user_agent TEXT,
started_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
last_activity_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
expires_at TIMESTAMPTZ NOT NULL,
ended_at TIMESTAMPTZ,
end_reason TEXT,
metadata JSONB NOT NULL DEFAULT '{}'
);
CREATE INDEX idx_sessions_tenant_id ON authority.sessions(tenant_id);
CREATE INDEX idx_sessions_user_id ON authority.sessions(user_id);
CREATE INDEX idx_sessions_expires_at ON authority.sessions(expires_at);
-- Audit log table
CREATE TABLE IF NOT EXISTS authority.audit (
id BIGSERIAL PRIMARY KEY,
tenant_id TEXT NOT NULL,
user_id UUID,
action TEXT NOT NULL,
resource_type TEXT NOT NULL,
resource_id TEXT,
old_value JSONB,
new_value JSONB,
ip_address TEXT,
user_agent TEXT,
correlation_id TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_audit_tenant_id ON authority.audit(tenant_id);
CREATE INDEX idx_audit_user_id ON authority.audit(user_id);
CREATE INDEX idx_audit_action ON authority.audit(action);
CREATE INDEX idx_audit_resource ON authority.audit(resource_type, resource_id);
CREATE INDEX idx_audit_created_at ON authority.audit(created_at);
CREATE INDEX idx_audit_correlation_id ON authority.audit(correlation_id);
-- Function to update updated_at timestamp
CREATE OR REPLACE FUNCTION authority.update_updated_at()
RETURNS TRIGGER AS $$
BEGIN
NEW.updated_at = NOW();
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
-- Triggers for updated_at
CREATE TRIGGER trg_tenants_updated_at
BEFORE UPDATE ON authority.tenants
FOR EACH ROW EXECUTE FUNCTION authority.update_updated_at();
CREATE TRIGGER trg_users_updated_at
BEFORE UPDATE ON authority.users
FOR EACH ROW EXECUTE FUNCTION authority.update_updated_at();
CREATE TRIGGER trg_roles_updated_at
BEFORE UPDATE ON authority.roles
FOR EACH ROW EXECUTE FUNCTION authority.update_updated_at();

View File

@@ -0,0 +1,62 @@
namespace StellaOps.Authority.Storage.Postgres.Models;
/// <summary>
/// Represents a tenant entity in the auth schema.
/// </summary>
public sealed class TenantEntity
{
/// <summary>
/// Unique tenant identifier.
/// </summary>
public required Guid Id { get; init; }
/// <summary>
/// Tenant slug/key (unique).
/// </summary>
public required string Slug { get; init; }
/// <summary>
/// Display name.
/// </summary>
public required string Name { get; init; }
/// <summary>
/// Optional description.
/// </summary>
public string? Description { get; init; }
/// <summary>
/// Contact email for the tenant.
/// </summary>
public string? ContactEmail { get; init; }
/// <summary>
/// Tenant is enabled.
/// </summary>
public bool Enabled { get; init; } = true;
/// <summary>
/// Tenant settings as JSON.
/// </summary>
public string Settings { get; init; } = "{}";
/// <summary>
/// Tenant metadata as JSON.
/// </summary>
public string Metadata { get; init; } = "{}";
/// <summary>
/// When the tenant was created.
/// </summary>
public DateTimeOffset CreatedAt { get; init; }
/// <summary>
/// When the tenant was last updated.
/// </summary>
public DateTimeOffset UpdatedAt { get; init; }
/// <summary>
/// User who created the tenant.
/// </summary>
public string? CreatedBy { get; init; }
}

View File

@@ -0,0 +1,112 @@
namespace StellaOps.Authority.Storage.Postgres.Models;
/// <summary>
/// Represents a user entity in the auth schema.
/// </summary>
public sealed class UserEntity
{
/// <summary>
/// Unique user identifier.
/// </summary>
public required Guid Id { get; init; }
/// <summary>
/// Tenant this user belongs to.
/// </summary>
public required string TenantId { get; init; }
/// <summary>
/// Username (unique per tenant).
/// </summary>
public required string Username { get; init; }
/// <summary>
/// Email address (unique per tenant).
/// </summary>
public required string Email { get; init; }
/// <summary>
/// User's display name.
/// </summary>
public string? DisplayName { get; init; }
/// <summary>
/// Argon2id password hash.
/// </summary>
public string? PasswordHash { get; init; }
/// <summary>
/// Password salt.
/// </summary>
public string? PasswordSalt { get; init; }
/// <summary>
/// User is enabled.
/// </summary>
public bool Enabled { get; init; } = true;
/// <summary>
/// Email has been verified.
/// </summary>
public bool EmailVerified { get; init; }
/// <summary>
/// MFA is enabled for this user.
/// </summary>
public bool MfaEnabled { get; init; }
/// <summary>
/// MFA secret (encrypted).
/// </summary>
public string? MfaSecret { get; init; }
/// <summary>
/// MFA backup codes (encrypted JSON array).
/// </summary>
public string? MfaBackupCodes { get; init; }
/// <summary>
/// Number of failed login attempts.
/// </summary>
public int FailedLoginAttempts { get; init; }
/// <summary>
/// Account locked until this time.
/// </summary>
public DateTimeOffset? LockedUntil { get; init; }
/// <summary>
/// Last successful login time.
/// </summary>
public DateTimeOffset? LastLoginAt { get; init; }
/// <summary>
/// When the password was last changed.
/// </summary>
public DateTimeOffset? PasswordChangedAt { get; init; }
/// <summary>
/// User settings as JSON.
/// </summary>
public string Settings { get; init; } = "{}";
/// <summary>
/// User metadata as JSON.
/// </summary>
public string Metadata { get; init; } = "{}";
/// <summary>
/// When the user was created.
/// </summary>
public DateTimeOffset CreatedAt { get; init; }
/// <summary>
/// When the user was last updated.
/// </summary>
public DateTimeOffset UpdatedAt { get; init; }
/// <summary>
/// User who created this user.
/// </summary>
public string? CreatedBy { get; init; }
}

View File

@@ -0,0 +1,48 @@
using StellaOps.Authority.Storage.Postgres.Models;
namespace StellaOps.Authority.Storage.Postgres.Repositories;
/// <summary>
/// Repository interface for tenant operations.
/// </summary>
public interface ITenantRepository
{
/// <summary>
/// Creates a new tenant.
/// </summary>
Task<TenantEntity> CreateAsync(TenantEntity tenant, CancellationToken cancellationToken = default);
/// <summary>
/// Gets a tenant by ID.
/// </summary>
Task<TenantEntity?> GetByIdAsync(Guid id, CancellationToken cancellationToken = default);
/// <summary>
/// Gets a tenant by slug.
/// </summary>
Task<TenantEntity?> GetBySlugAsync(string slug, CancellationToken cancellationToken = default);
/// <summary>
/// Gets all tenants with optional filtering.
/// </summary>
Task<IReadOnlyList<TenantEntity>> GetAllAsync(
bool? enabled = null,
int limit = 100,
int offset = 0,
CancellationToken cancellationToken = default);
/// <summary>
/// Updates a tenant.
/// </summary>
Task<bool> UpdateAsync(TenantEntity tenant, CancellationToken cancellationToken = default);
/// <summary>
/// Deletes a tenant.
/// </summary>
Task<bool> DeleteAsync(Guid id, CancellationToken cancellationToken = default);
/// <summary>
/// Checks if a tenant slug exists.
/// </summary>
Task<bool> SlugExistsAsync(string slug, CancellationToken cancellationToken = default);
}

View File

@@ -0,0 +1,76 @@
using StellaOps.Authority.Storage.Postgres.Models;
namespace StellaOps.Authority.Storage.Postgres.Repositories;
/// <summary>
/// Repository interface for user operations.
/// </summary>
public interface IUserRepository
{
/// <summary>
/// Creates a new user.
/// </summary>
Task<UserEntity> CreateAsync(UserEntity user, CancellationToken cancellationToken = default);
/// <summary>
/// Gets a user by ID.
/// </summary>
Task<UserEntity?> GetByIdAsync(string tenantId, Guid id, CancellationToken cancellationToken = default);
/// <summary>
/// Gets a user by username.
/// </summary>
Task<UserEntity?> GetByUsernameAsync(string tenantId, string username, CancellationToken cancellationToken = default);
/// <summary>
/// Gets a user by email.
/// </summary>
Task<UserEntity?> GetByEmailAsync(string tenantId, string email, CancellationToken cancellationToken = default);
/// <summary>
/// Gets all users for a tenant with optional filtering.
/// </summary>
Task<IReadOnlyList<UserEntity>> GetAllAsync(
string tenantId,
bool? enabled = null,
int limit = 100,
int offset = 0,
CancellationToken cancellationToken = default);
/// <summary>
/// Updates a user.
/// </summary>
Task<bool> UpdateAsync(UserEntity user, CancellationToken cancellationToken = default);
/// <summary>
/// Deletes a user.
/// </summary>
Task<bool> DeleteAsync(string tenantId, Guid id, CancellationToken cancellationToken = default);
/// <summary>
/// Updates the user's password hash.
/// </summary>
Task<bool> UpdatePasswordAsync(
string tenantId,
Guid userId,
string passwordHash,
string passwordSalt,
CancellationToken cancellationToken = default);
/// <summary>
/// Records a failed login attempt.
/// </summary>
Task<int> RecordFailedLoginAsync(
string tenantId,
Guid userId,
DateTimeOffset? lockUntil = null,
CancellationToken cancellationToken = default);
/// <summary>
/// Records a successful login.
/// </summary>
Task RecordSuccessfulLoginAsync(
string tenantId,
Guid userId,
CancellationToken cancellationToken = default);
}
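
The failed/successful login helpers are meant to be composed by an authentication service. A hedged sketch of that flow is shown below; the lockout threshold, lockout window, and the password-verification placeholder are illustrative assumptions, not part of this change.

```csharp
// Illustrative sketch: account lockout handling built on IUserRepository.
// MaxFailedAttempts, LockoutWindow, and VerifyPassword are assumptions.
public sealed class LoginService
{
    private const int MaxFailedAttempts = 5;
    private static readonly TimeSpan LockoutWindow = TimeSpan.FromMinutes(15);

    private readonly IUserRepository _users;

    public LoginService(IUserRepository users) => _users = users;

    public async Task<bool> TryLoginAsync(string tenantId, string username, string password, CancellationToken ct)
    {
        var user = await _users.GetByUsernameAsync(tenantId, username, ct);
        if (user is null || !user.Enabled)
        {
            return false;
        }

        if (user.LockedUntil is { } lockedUntil && lockedUntil > DateTimeOffset.UtcNow)
        {
            return false; // still locked out
        }

        if (!VerifyPassword(password, user.PasswordHash, user.PasswordSalt))
        {
            // Lock the account once this attempt crosses the threshold.
            var lockUntil = user.FailedLoginAttempts + 1 >= MaxFailedAttempts
                ? DateTimeOffset.UtcNow + LockoutWindow
                : (DateTimeOffset?)null;
            await _users.RecordFailedLoginAsync(tenantId, user.Id, lockUntil, ct);
            return false;
        }

        await _users.RecordSuccessfulLoginAsync(tenantId, user.Id, ct);
        return true;
    }

    private static bool VerifyPassword(string password, string? hash, string? salt)
        => hash is not null; // placeholder: real code would verify the Argon2id hash
}
```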

View File

@@ -0,0 +1,194 @@
using Microsoft.Extensions.Logging;
using Npgsql;
using StellaOps.Authority.Storage.Postgres.Models;
using StellaOps.Infrastructure.Postgres.Repositories;
namespace StellaOps.Authority.Storage.Postgres.Repositories;
/// <summary>
/// PostgreSQL repository for tenant operations.
/// </summary>
public sealed class TenantRepository : RepositoryBase<AuthorityDataSource>, ITenantRepository
{
private const string SystemTenantId = "_system";
/// <summary>
/// Creates a new tenant repository.
/// </summary>
public TenantRepository(AuthorityDataSource dataSource, ILogger<TenantRepository> logger)
: base(dataSource, logger)
{
}
/// <inheritdoc />
public async Task<TenantEntity> CreateAsync(TenantEntity tenant, CancellationToken cancellationToken = default)
{
const string sql = """
INSERT INTO auth.tenants (id, slug, name, description, contact_email, enabled, settings, metadata, created_by)
VALUES (@id, @slug, @name, @description, @contact_email, @enabled, @settings::jsonb, @metadata::jsonb, @created_by)
RETURNING id, slug, name, description, contact_email, enabled, settings::text, metadata::text, created_at, updated_at, created_by
""";
await using var connection = await DataSource.OpenSystemConnectionAsync(cancellationToken).ConfigureAwait(false);
await using var command = CreateCommand(sql, connection);
AddParameter(command, "id", tenant.Id);
AddParameter(command, "slug", tenant.Slug);
AddParameter(command, "name", tenant.Name);
AddParameter(command, "description", tenant.Description);
AddParameter(command, "contact_email", tenant.ContactEmail);
AddParameter(command, "enabled", tenant.Enabled);
AddJsonbParameter(command, "settings", tenant.Settings);
AddJsonbParameter(command, "metadata", tenant.Metadata);
AddParameter(command, "created_by", tenant.CreatedBy);
await using var reader = await command.ExecuteReaderAsync(cancellationToken).ConfigureAwait(false);
await reader.ReadAsync(cancellationToken).ConfigureAwait(false);
return MapTenant(reader);
}
/// <inheritdoc />
public async Task<TenantEntity?> GetByIdAsync(Guid id, CancellationToken cancellationToken = default)
{
const string sql = """
SELECT id, slug, name, description, contact_email, enabled, settings::text, metadata::text, created_at, updated_at, created_by
FROM auth.tenants
WHERE id = @id
""";
return await QuerySingleOrDefaultAsync(
SystemTenantId,
sql,
cmd => AddParameter(cmd, "id", id),
MapTenant,
cancellationToken).ConfigureAwait(false);
}
/// <inheritdoc />
public async Task<TenantEntity?> GetBySlugAsync(string slug, CancellationToken cancellationToken = default)
{
const string sql = """
SELECT id, slug, name, description, contact_email, enabled, settings::text, metadata::text, created_at, updated_at, created_by
FROM auth.tenants
WHERE slug = @slug
""";
return await QuerySingleOrDefaultAsync(
SystemTenantId,
sql,
cmd => AddParameter(cmd, "slug", slug),
MapTenant,
cancellationToken).ConfigureAwait(false);
}
/// <inheritdoc />
public async Task<IReadOnlyList<TenantEntity>> GetAllAsync(
bool? enabled = null,
int limit = 100,
int offset = 0,
CancellationToken cancellationToken = default)
{
var sql = """
SELECT id, slug, name, description, contact_email, enabled, settings::text, metadata::text, created_at, updated_at, created_by
FROM auth.tenants
""";
if (enabled.HasValue)
{
sql += " WHERE enabled = @enabled";
}
sql += " ORDER BY name, id LIMIT @limit OFFSET @offset";
return await QueryAsync(
SystemTenantId,
sql,
cmd =>
{
if (enabled.HasValue)
{
AddParameter(cmd, "enabled", enabled.Value);
}
AddParameter(cmd, "limit", limit);
AddParameter(cmd, "offset", offset);
},
MapTenant,
cancellationToken).ConfigureAwait(false);
}
/// <inheritdoc />
public async Task<bool> UpdateAsync(TenantEntity tenant, CancellationToken cancellationToken = default)
{
const string sql = """
UPDATE auth.tenants
SET name = @name,
description = @description,
contact_email = @contact_email,
enabled = @enabled,
settings = @settings::jsonb,
metadata = @metadata::jsonb
WHERE id = @id
""";
var rows = await ExecuteAsync(
SystemTenantId,
sql,
cmd =>
{
AddParameter(cmd, "id", tenant.Id);
AddParameter(cmd, "name", tenant.Name);
AddParameter(cmd, "description", tenant.Description);
AddParameter(cmd, "contact_email", tenant.ContactEmail);
AddParameter(cmd, "enabled", tenant.Enabled);
AddJsonbParameter(cmd, "settings", tenant.Settings);
AddJsonbParameter(cmd, "metadata", tenant.Metadata);
},
cancellationToken).ConfigureAwait(false);
return rows > 0;
}
/// <inheritdoc />
public async Task<bool> DeleteAsync(Guid id, CancellationToken cancellationToken = default)
{
const string sql = "DELETE FROM auth.tenants WHERE id = @id";
var rows = await ExecuteAsync(
SystemTenantId,
sql,
cmd => AddParameter(cmd, "id", id),
cancellationToken).ConfigureAwait(false);
return rows > 0;
}
/// <inheritdoc />
public async Task<bool> SlugExistsAsync(string slug, CancellationToken cancellationToken = default)
{
const string sql = "SELECT EXISTS(SELECT 1 FROM auth.tenants WHERE slug = @slug)";
var result = await ExecuteScalarAsync<bool>(
SystemTenantId,
sql,
cmd => AddParameter(cmd, "slug", slug),
cancellationToken).ConfigureAwait(false);
return result;
}
private static TenantEntity MapTenant(NpgsqlDataReader reader) => new()
{
Id = reader.GetGuid(0),
Slug = reader.GetString(1),
Name = reader.GetString(2),
Description = GetNullableString(reader, 3),
ContactEmail = GetNullableString(reader, 4),
Enabled = reader.GetBoolean(5),
Settings = reader.GetString(6),
Metadata = reader.GetString(7),
CreatedAt = reader.GetFieldValue<DateTimeOffset>(8),
UpdatedAt = reader.GetFieldValue<DateTimeOffset>(9),
CreatedBy = GetNullableString(reader, 10)
};
}

View File

@@ -0,0 +1,353 @@
using Microsoft.Extensions.Logging;
using Npgsql;
using StellaOps.Authority.Storage.Postgres.Models;
using StellaOps.Infrastructure.Postgres.Repositories;
namespace StellaOps.Authority.Storage.Postgres.Repositories;
/// <summary>
/// PostgreSQL repository for user operations.
/// </summary>
public sealed class UserRepository : RepositoryBase<AuthorityDataSource>, IUserRepository
{
/// <summary>
/// Creates a new user repository.
/// </summary>
public UserRepository(AuthorityDataSource dataSource, ILogger<UserRepository> logger)
: base(dataSource, logger)
{
}
/// <inheritdoc />
public async Task<UserEntity> CreateAsync(UserEntity user, CancellationToken cancellationToken = default)
{
const string sql = """
INSERT INTO auth.users (
id, tenant_id, username, email, display_name, password_hash, password_salt,
enabled, email_verified, mfa_enabled, mfa_secret, mfa_backup_codes,
settings, metadata, created_by
)
VALUES (
@id, @tenant_id, @username, @email, @display_name, @password_hash, @password_salt,
@enabled, @email_verified, @mfa_enabled, @mfa_secret, @mfa_backup_codes,
@settings::jsonb, @metadata::jsonb, @created_by
)
RETURNING id, tenant_id, username, email, display_name, password_hash, password_salt,
enabled, email_verified, mfa_enabled, mfa_secret, mfa_backup_codes,
failed_login_attempts, locked_until, last_login_at, password_changed_at,
settings::text, metadata::text, created_at, updated_at, created_by
""";
await using var connection = await DataSource.OpenConnectionAsync(user.TenantId, "writer", cancellationToken)
.ConfigureAwait(false);
await using var command = CreateCommand(sql, connection);
AddUserParameters(command, user);
await using var reader = await command.ExecuteReaderAsync(cancellationToken).ConfigureAwait(false);
await reader.ReadAsync(cancellationToken).ConfigureAwait(false);
return MapUser(reader);
}
/// <inheritdoc />
public async Task<UserEntity?> GetByIdAsync(string tenantId, Guid id, CancellationToken cancellationToken = default)
{
const string sql = """
SELECT id, tenant_id, username, email, display_name, password_hash, password_salt,
enabled, email_verified, mfa_enabled, mfa_secret, mfa_backup_codes,
failed_login_attempts, locked_until, last_login_at, password_changed_at,
settings::text, metadata::text, created_at, updated_at, created_by
FROM auth.users
WHERE tenant_id = @tenant_id AND id = @id
""";
return await QuerySingleOrDefaultAsync(
tenantId,
sql,
cmd =>
{
AddParameter(cmd, "tenant_id", tenantId);
AddParameter(cmd, "id", id);
},
MapUser,
cancellationToken).ConfigureAwait(false);
}
/// <inheritdoc />
public async Task<UserEntity?> GetByUsernameAsync(string tenantId, string username, CancellationToken cancellationToken = default)
{
const string sql = """
SELECT id, tenant_id, username, email, display_name, password_hash, password_salt,
enabled, email_verified, mfa_enabled, mfa_secret, mfa_backup_codes,
failed_login_attempts, locked_until, last_login_at, password_changed_at,
settings::text, metadata::text, created_at, updated_at, created_by
FROM auth.users
WHERE tenant_id = @tenant_id AND username = @username
""";
return await QuerySingleOrDefaultAsync(
tenantId,
sql,
cmd =>
{
AddParameter(cmd, "tenant_id", tenantId);
AddParameter(cmd, "username", username);
},
MapUser,
cancellationToken).ConfigureAwait(false);
}
/// <inheritdoc />
public async Task<UserEntity?> GetByEmailAsync(string tenantId, string email, CancellationToken cancellationToken = default)
{
const string sql = """
SELECT id, tenant_id, username, email, display_name, password_hash, password_salt,
enabled, email_verified, mfa_enabled, mfa_secret, mfa_backup_codes,
failed_login_attempts, locked_until, last_login_at, password_changed_at,
settings::text, metadata::text, created_at, updated_at, created_by
FROM auth.users
WHERE tenant_id = @tenant_id AND email = @email
""";
return await QuerySingleOrDefaultAsync(
tenantId,
sql,
cmd =>
{
AddParameter(cmd, "tenant_id", tenantId);
AddParameter(cmd, "email", email);
},
MapUser,
cancellationToken).ConfigureAwait(false);
}
/// <inheritdoc />
public async Task<IReadOnlyList<UserEntity>> GetAllAsync(
string tenantId,
bool? enabled = null,
int limit = 100,
int offset = 0,
CancellationToken cancellationToken = default)
{
var sql = """
SELECT id, tenant_id, username, email, display_name, password_hash, password_salt,
enabled, email_verified, mfa_enabled, mfa_secret, mfa_backup_codes,
failed_login_attempts, locked_until, last_login_at, password_changed_at,
settings::text, metadata::text, created_at, updated_at, created_by
FROM auth.users
WHERE tenant_id = @tenant_id
""";
if (enabled.HasValue)
{
sql += " AND enabled = @enabled";
}
sql += " ORDER BY username, id LIMIT @limit OFFSET @offset";
return await QueryAsync(
tenantId,
sql,
cmd =>
{
AddParameter(cmd, "tenant_id", tenantId);
if (enabled.HasValue)
{
AddParameter(cmd, "enabled", enabled.Value);
}
AddParameter(cmd, "limit", limit);
AddParameter(cmd, "offset", offset);
},
MapUser,
cancellationToken).ConfigureAwait(false);
}
/// <inheritdoc />
public async Task<bool> UpdateAsync(UserEntity user, CancellationToken cancellationToken = default)
{
const string sql = """
UPDATE auth.users
SET username = @username,
email = @email,
display_name = @display_name,
enabled = @enabled,
email_verified = @email_verified,
mfa_enabled = @mfa_enabled,
mfa_secret = @mfa_secret,
mfa_backup_codes = @mfa_backup_codes,
settings = @settings::jsonb,
metadata = @metadata::jsonb
WHERE tenant_id = @tenant_id AND id = @id
""";
var rows = await ExecuteAsync(
user.TenantId,
sql,
cmd =>
{
AddParameter(cmd, "tenant_id", user.TenantId);
AddParameter(cmd, "id", user.Id);
AddParameter(cmd, "username", user.Username);
AddParameter(cmd, "email", user.Email);
AddParameter(cmd, "display_name", user.DisplayName);
AddParameter(cmd, "enabled", user.Enabled);
AddParameter(cmd, "email_verified", user.EmailVerified);
AddParameter(cmd, "mfa_enabled", user.MfaEnabled);
AddParameter(cmd, "mfa_secret", user.MfaSecret);
AddParameter(cmd, "mfa_backup_codes", user.MfaBackupCodes);
AddJsonbParameter(cmd, "settings", user.Settings);
AddJsonbParameter(cmd, "metadata", user.Metadata);
},
cancellationToken).ConfigureAwait(false);
return rows > 0;
}
/// <inheritdoc />
public async Task<bool> DeleteAsync(string tenantId, Guid id, CancellationToken cancellationToken = default)
{
const string sql = "DELETE FROM auth.users WHERE tenant_id = @tenant_id AND id = @id";
var rows = await ExecuteAsync(
tenantId,
sql,
cmd =>
{
AddParameter(cmd, "tenant_id", tenantId);
AddParameter(cmd, "id", id);
},
cancellationToken).ConfigureAwait(false);
return rows > 0;
}
/// <inheritdoc />
public async Task<bool> UpdatePasswordAsync(
string tenantId,
Guid userId,
string passwordHash,
string passwordSalt,
CancellationToken cancellationToken = default)
{
const string sql = """
UPDATE auth.users
SET password_hash = @password_hash,
password_salt = @password_salt,
password_changed_at = NOW()
WHERE tenant_id = @tenant_id AND id = @id
""";
var rows = await ExecuteAsync(
tenantId,
sql,
cmd =>
{
AddParameter(cmd, "tenant_id", tenantId);
AddParameter(cmd, "id", userId);
AddParameter(cmd, "password_hash", passwordHash);
AddParameter(cmd, "password_salt", passwordSalt);
},
cancellationToken).ConfigureAwait(false);
return rows > 0;
}
/// <inheritdoc />
public async Task<int> RecordFailedLoginAsync(
string tenantId,
Guid userId,
DateTimeOffset? lockUntil = null,
CancellationToken cancellationToken = default)
{
const string sql = """
UPDATE auth.users
SET failed_login_attempts = failed_login_attempts + 1,
locked_until = @locked_until
WHERE tenant_id = @tenant_id AND id = @id
RETURNING failed_login_attempts
""";
var result = await ExecuteScalarAsync<int>(
tenantId,
sql,
cmd =>
{
AddParameter(cmd, "tenant_id", tenantId);
AddParameter(cmd, "id", userId);
AddParameter(cmd, "locked_until", lockUntil);
},
cancellationToken).ConfigureAwait(false);
return result;
}
/// <inheritdoc />
public async Task RecordSuccessfulLoginAsync(
string tenantId,
Guid userId,
CancellationToken cancellationToken = default)
{
const string sql = """
UPDATE auth.users
SET failed_login_attempts = 0,
locked_until = NULL,
last_login_at = NOW()
WHERE tenant_id = @tenant_id AND id = @id
""";
await ExecuteAsync(
tenantId,
sql,
cmd =>
{
AddParameter(cmd, "tenant_id", tenantId);
AddParameter(cmd, "id", userId);
},
cancellationToken).ConfigureAwait(false);
}
private static void AddUserParameters(NpgsqlCommand command, UserEntity user)
{
AddParameter(command, "id", user.Id);
AddParameter(command, "tenant_id", user.TenantId);
AddParameter(command, "username", user.Username);
AddParameter(command, "email", user.Email);
AddParameter(command, "display_name", user.DisplayName);
AddParameter(command, "password_hash", user.PasswordHash);
AddParameter(command, "password_salt", user.PasswordSalt);
AddParameter(command, "enabled", user.Enabled);
AddParameter(command, "email_verified", user.EmailVerified);
AddParameter(command, "mfa_enabled", user.MfaEnabled);
AddParameter(command, "mfa_secret", user.MfaSecret);
AddParameter(command, "mfa_backup_codes", user.MfaBackupCodes);
AddJsonbParameter(command, "settings", user.Settings);
AddJsonbParameter(command, "metadata", user.Metadata);
AddParameter(command, "created_by", user.CreatedBy);
}
private static UserEntity MapUser(NpgsqlDataReader reader) => new()
{
Id = reader.GetGuid(0),
TenantId = reader.GetString(1),
Username = reader.GetString(2),
Email = reader.GetString(3),
DisplayName = GetNullableString(reader, 4),
PasswordHash = GetNullableString(reader, 5),
PasswordSalt = GetNullableString(reader, 6),
Enabled = reader.GetBoolean(7),
EmailVerified = reader.GetBoolean(8),
MfaEnabled = reader.GetBoolean(9),
MfaSecret = GetNullableString(reader, 10),
MfaBackupCodes = GetNullableString(reader, 11),
FailedLoginAttempts = reader.GetInt32(12),
LockedUntil = GetNullableDateTimeOffset(reader, 13),
LastLoginAt = GetNullableDateTimeOffset(reader, 14),
PasswordChangedAt = GetNullableDateTimeOffset(reader, 15),
Settings = reader.GetString(16),
Metadata = reader.GetString(17),
CreatedAt = reader.GetFieldValue<DateTimeOffset>(18),
UpdatedAt = reader.GetFieldValue<DateTimeOffset>(19),
CreatedBy = GetNullableString(reader, 20)
};
}

View File

@@ -0,0 +1,55 @@
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;
using StellaOps.Authority.Storage.Postgres.Repositories;
using StellaOps.Infrastructure.Postgres;
using StellaOps.Infrastructure.Postgres.Options;
namespace StellaOps.Authority.Storage.Postgres;
/// <summary>
/// Extension methods for configuring Authority PostgreSQL storage services.
/// </summary>
public static class ServiceCollectionExtensions
{
/// <summary>
/// Adds Authority PostgreSQL storage services.
/// </summary>
/// <param name="services">Service collection.</param>
/// <param name="configuration">Configuration root.</param>
/// <param name="sectionName">Configuration section name for PostgreSQL options.</param>
/// <returns>Service collection for chaining.</returns>
public static IServiceCollection AddAuthorityPostgresStorage(
this IServiceCollection services,
IConfiguration configuration,
string sectionName = "Postgres:Authority")
{
services.Configure<PostgresOptions>(configuration.GetSection(sectionName));
services.AddSingleton<AuthorityDataSource>();
// Register repositories
services.AddScoped<ITenantRepository, TenantRepository>();
services.AddScoped<IUserRepository, UserRepository>();
return services;
}
/// <summary>
/// Adds Authority PostgreSQL storage services with explicit options.
/// </summary>
/// <param name="services">Service collection.</param>
/// <param name="configureOptions">Options configuration action.</param>
/// <returns>Service collection for chaining.</returns>
public static IServiceCollection AddAuthorityPostgresStorage(
this IServiceCollection services,
Action<PostgresOptions> configureOptions)
{
services.Configure(configureOptions);
services.AddSingleton<AuthorityDataSource>();
// Register repositories
services.AddScoped<ITenantRepository, TenantRepository>();
services.AddScoped<IUserRepository, UserRepository>();
return services;
}
}
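
A registration sketch for a host is shown below. The configuration keys under `Postgres:Authority` (for example `ConnectionString`) are assumptions based on `PostgresOptions`; `SchemaName` matches the `auth` default above. Adjust names to the actual option properties.

```csharp
// Illustrative host wiring for the Authority service; appsettings keys are assumptions.
//
// appsettings.json:
// {
//   "Postgres": {
//     "Authority": {
//       "ConnectionString": "Host=postgres;Database=stellaops;Username=authority;Password=***",
//       "SchemaName": "auth"
//     }
//   }
// }
var builder = WebApplication.CreateBuilder(args);

builder.Services.AddAuthorityPostgresStorage(builder.Configuration);

// ITenantRepository and IUserRepository can now be constructor-injected.
var app = builder.Build();
app.Run();
```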

View File

@@ -0,0 +1,21 @@
<?xml version="1.0" ?>
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<TargetFramework>net10.0</TargetFramework>
<ImplicitUsings>enable</ImplicitUsings>
<Nullable>enable</Nullable>
<LangVersion>preview</LangVersion>
<TreatWarningsAsErrors>true</TreatWarningsAsErrors>
<RootNamespace>StellaOps.Authority.Storage.Postgres</RootNamespace>
</PropertyGroup>
<ItemGroup>
<None Include="Migrations\**\*.sql" CopyToOutputDirectory="PreserveNewest" />
</ItemGroup>
<ItemGroup>
<ProjectReference Include="..\..\..\__Libraries\StellaOps.Infrastructure.Postgres\StellaOps.Infrastructure.Postgres.csproj" />
</ItemGroup>
</Project>

View File

@@ -0,0 +1,50 @@
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Options;
using Npgsql;
using StellaOps.Infrastructure.Postgres.Connections;
using StellaOps.Infrastructure.Postgres.Options;
namespace StellaOps.Concelier.Storage.Postgres;
/// <summary>
/// PostgreSQL data source for the Concelier (vulnerability) module.
/// Manages connections for advisory ingestion, merging, and vulnerability data.
/// </summary>
/// <remarks>
/// The Concelier module stores global vulnerability data that is not tenant-scoped.
/// Advisories and their metadata are shared across all tenants.
/// </remarks>
public sealed class ConcelierDataSource : DataSourceBase
{
/// <summary>
/// Default schema name for Concelier/vulnerability tables.
/// </summary>
public const string DefaultSchemaName = "vuln";
/// <summary>
/// Creates a new Concelier data source.
/// </summary>
public ConcelierDataSource(IOptions<PostgresOptions> options, ILogger<ConcelierDataSource> logger)
: base(CreateOptions(options.Value), logger)
{
}
/// <inheritdoc />
protected override string ModuleName => "Concelier";
/// <inheritdoc />
protected override void ConfigureDataSourceBuilder(NpgsqlDataSourceBuilder builder)
{
base.ConfigureDataSourceBuilder(builder);
// Reserved for future search-related configuration; the advisory search_vector
// column is maintained by database triggers, so no extra Npgsql setup is needed yet.
}
private static PostgresOptions CreateOptions(PostgresOptions baseOptions)
{
if (string.IsNullOrWhiteSpace(baseOptions.SchemaName))
{
baseOptions.SchemaName = DefaultSchemaName;
}
return baseOptions;
}
}

View File

@@ -0,0 +1,261 @@
-- Vulnerability Schema Migration 001: Initial Schema
-- Creates the vuln schema for advisories and vulnerability data
-- Create schema
CREATE SCHEMA IF NOT EXISTS vuln;
-- Enable pg_trgm for fuzzy text search
CREATE EXTENSION IF NOT EXISTS pg_trgm;
-- Sources table (feed sources)
CREATE TABLE IF NOT EXISTS vuln.sources (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
key TEXT NOT NULL UNIQUE,
name TEXT NOT NULL,
source_type TEXT NOT NULL,
url TEXT,
priority INT NOT NULL DEFAULT 0,
enabled BOOLEAN NOT NULL DEFAULT TRUE,
config JSONB NOT NULL DEFAULT '{}',
metadata JSONB NOT NULL DEFAULT '{}',
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_sources_enabled ON vuln.sources(enabled, priority DESC);
-- Feed snapshots table
CREATE TABLE IF NOT EXISTS vuln.feed_snapshots (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
source_id UUID NOT NULL REFERENCES vuln.sources(id),
snapshot_id TEXT NOT NULL,
advisory_count INT NOT NULL DEFAULT 0,
checksum TEXT,
metadata JSONB NOT NULL DEFAULT '{}',
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
UNIQUE(source_id, snapshot_id)
);
CREATE INDEX idx_feed_snapshots_source ON vuln.feed_snapshots(source_id);
CREATE INDEX idx_feed_snapshots_created ON vuln.feed_snapshots(created_at);
-- Advisory snapshots table (point-in-time snapshots)
CREATE TABLE IF NOT EXISTS vuln.advisory_snapshots (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
feed_snapshot_id UUID NOT NULL REFERENCES vuln.feed_snapshots(id),
advisory_key TEXT NOT NULL,
content_hash TEXT NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
UNIQUE(feed_snapshot_id, advisory_key)
);
CREATE INDEX idx_advisory_snapshots_feed ON vuln.advisory_snapshots(feed_snapshot_id);
CREATE INDEX idx_advisory_snapshots_key ON vuln.advisory_snapshots(advisory_key);
-- Advisories table (main vulnerability data)
CREATE TABLE IF NOT EXISTS vuln.advisories (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
advisory_key TEXT NOT NULL UNIQUE,
primary_vuln_id TEXT NOT NULL,
source_id UUID REFERENCES vuln.sources(id),
title TEXT,
summary TEXT,
description TEXT,
severity TEXT CHECK (severity IN ('critical', 'high', 'medium', 'low', 'unknown')),
published_at TIMESTAMPTZ,
modified_at TIMESTAMPTZ,
withdrawn_at TIMESTAMPTZ,
provenance JSONB NOT NULL DEFAULT '{}',
raw_payload JSONB,
search_vector TSVECTOR,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_advisories_vuln_id ON vuln.advisories(primary_vuln_id);
CREATE INDEX idx_advisories_source ON vuln.advisories(source_id);
CREATE INDEX idx_advisories_severity ON vuln.advisories(severity);
CREATE INDEX idx_advisories_published ON vuln.advisories(published_at);
CREATE INDEX idx_advisories_modified ON vuln.advisories(modified_at);
CREATE INDEX idx_advisories_search ON vuln.advisories USING GIN(search_vector);
-- Advisory aliases table (CVE, GHSA, etc.)
CREATE TABLE IF NOT EXISTS vuln.advisory_aliases (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
advisory_id UUID NOT NULL REFERENCES vuln.advisories(id) ON DELETE CASCADE,
alias_type TEXT NOT NULL,
alias_value TEXT NOT NULL,
is_primary BOOLEAN NOT NULL DEFAULT FALSE,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
UNIQUE(advisory_id, alias_type, alias_value)
);
CREATE INDEX idx_advisory_aliases_advisory ON vuln.advisory_aliases(advisory_id);
CREATE INDEX idx_advisory_aliases_value ON vuln.advisory_aliases(alias_type, alias_value);
CREATE INDEX idx_advisory_aliases_cve ON vuln.advisory_aliases(alias_value)
WHERE alias_type = 'CVE';
-- Advisory CVSS scores table
CREATE TABLE IF NOT EXISTS vuln.advisory_cvss (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
advisory_id UUID NOT NULL REFERENCES vuln.advisories(id) ON DELETE CASCADE,
cvss_version TEXT NOT NULL,
vector_string TEXT NOT NULL,
base_score NUMERIC(3,1) NOT NULL,
base_severity TEXT,
exploitability_score NUMERIC(3,1),
impact_score NUMERIC(3,1),
source TEXT,
is_primary BOOLEAN NOT NULL DEFAULT FALSE,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
UNIQUE(advisory_id, cvss_version, source)
);
CREATE INDEX idx_advisory_cvss_advisory ON vuln.advisory_cvss(advisory_id);
CREATE INDEX idx_advisory_cvss_score ON vuln.advisory_cvss(base_score DESC);
-- Advisory affected packages table
CREATE TABLE IF NOT EXISTS vuln.advisory_affected (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
advisory_id UUID NOT NULL REFERENCES vuln.advisories(id) ON DELETE CASCADE,
ecosystem TEXT NOT NULL,
package_name TEXT NOT NULL,
purl TEXT,
version_range JSONB NOT NULL DEFAULT '{}',
versions_affected TEXT[],
versions_fixed TEXT[],
database_specific JSONB,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_advisory_affected_advisory ON vuln.advisory_affected(advisory_id);
CREATE INDEX idx_advisory_affected_ecosystem ON vuln.advisory_affected(ecosystem, package_name);
CREATE INDEX idx_advisory_affected_purl ON vuln.advisory_affected(purl);
CREATE INDEX idx_advisory_affected_purl_trgm ON vuln.advisory_affected USING GIN(purl gin_trgm_ops);
-- Advisory references table
CREATE TABLE IF NOT EXISTS vuln.advisory_references (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
advisory_id UUID NOT NULL REFERENCES vuln.advisories(id) ON DELETE CASCADE,
ref_type TEXT NOT NULL,
url TEXT NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_advisory_references_advisory ON vuln.advisory_references(advisory_id);
-- Advisory credits table
CREATE TABLE IF NOT EXISTS vuln.advisory_credits (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
advisory_id UUID NOT NULL REFERENCES vuln.advisories(id) ON DELETE CASCADE,
name TEXT NOT NULL,
contact TEXT,
credit_type TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_advisory_credits_advisory ON vuln.advisory_credits(advisory_id);
-- Advisory weaknesses table (CWE)
CREATE TABLE IF NOT EXISTS vuln.advisory_weaknesses (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
advisory_id UUID NOT NULL REFERENCES vuln.advisories(id) ON DELETE CASCADE,
cwe_id TEXT NOT NULL,
description TEXT,
source TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
UNIQUE(advisory_id, cwe_id)
);
CREATE INDEX idx_advisory_weaknesses_advisory ON vuln.advisory_weaknesses(advisory_id);
CREATE INDEX idx_advisory_weaknesses_cwe ON vuln.advisory_weaknesses(cwe_id);
-- KEV flags table (Known Exploited Vulnerabilities)
CREATE TABLE IF NOT EXISTS vuln.kev_flags (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
advisory_id UUID NOT NULL REFERENCES vuln.advisories(id) ON DELETE CASCADE,
cve_id TEXT NOT NULL,
vendor_project TEXT,
product TEXT,
vulnerability_name TEXT,
date_added DATE NOT NULL,
due_date DATE,
known_ransomware_use BOOLEAN NOT NULL DEFAULT FALSE,
notes TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
UNIQUE(advisory_id, cve_id)
);
CREATE INDEX idx_kev_flags_advisory ON vuln.kev_flags(advisory_id);
CREATE INDEX idx_kev_flags_cve ON vuln.kev_flags(cve_id);
CREATE INDEX idx_kev_flags_date ON vuln.kev_flags(date_added);
-- Source states table (cursor tracking)
CREATE TABLE IF NOT EXISTS vuln.source_states (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
source_id UUID NOT NULL REFERENCES vuln.sources(id) UNIQUE,
cursor TEXT,
last_sync_at TIMESTAMPTZ,
last_success_at TIMESTAMPTZ,
last_error TEXT,
sync_count BIGINT NOT NULL DEFAULT 0,
error_count INT NOT NULL DEFAULT 0,
metadata JSONB NOT NULL DEFAULT '{}',
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_source_states_source ON vuln.source_states(source_id);
-- Merge events table (advisory merge audit)
CREATE TABLE IF NOT EXISTS vuln.merge_events (
id BIGSERIAL PRIMARY KEY,
advisory_id UUID NOT NULL REFERENCES vuln.advisories(id),
source_id UUID REFERENCES vuln.sources(id),
event_type TEXT NOT NULL,
old_value JSONB,
new_value JSONB,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_merge_events_advisory ON vuln.merge_events(advisory_id);
CREATE INDEX idx_merge_events_created ON vuln.merge_events(created_at);
-- Function to update search vector
CREATE OR REPLACE FUNCTION vuln.update_advisory_search_vector()
RETURNS TRIGGER AS $$
BEGIN
NEW.search_vector =
setweight(to_tsvector('english', COALESCE(NEW.primary_vuln_id, '')), 'A') ||
setweight(to_tsvector('english', COALESCE(NEW.title, '')), 'B') ||
setweight(to_tsvector('english', COALESCE(NEW.summary, '')), 'C') ||
setweight(to_tsvector('english', COALESCE(NEW.description, '')), 'D');
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
-- Trigger for search vector
CREATE TRIGGER trg_advisories_search_vector
BEFORE INSERT OR UPDATE ON vuln.advisories
FOR EACH ROW EXECUTE FUNCTION vuln.update_advisory_search_vector();
-- Update timestamp function
CREATE OR REPLACE FUNCTION vuln.update_updated_at()
RETURNS TRIGGER AS $$
BEGIN
NEW.updated_at = NOW();
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
-- Triggers
CREATE TRIGGER trg_sources_updated_at
BEFORE UPDATE ON vuln.sources
FOR EACH ROW EXECUTE FUNCTION vuln.update_updated_at();
CREATE TRIGGER trg_advisories_updated_at
BEFORE UPDATE ON vuln.advisories
FOR EACH ROW EXECUTE FUNCTION vuln.update_updated_at();
CREATE TRIGGER trg_source_states_updated_at
BEFORE UPDATE ON vuln.source_states
FOR EACH ROW EXECUTE FUNCTION vuln.update_updated_at();
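
The `gin_trgm_ops` index on `advisory_affected.purl` exists to accelerate fuzzy/substring package lookups. A hedged sketch of that query path is shown below; it uses plain Npgsql rather than a repository, and the connection string and purl pattern are placeholders.

```csharp
// Illustrative sketch: substring search over affected-package purls, the query
// shape that idx_advisory_affected_purl_trgm (GIN + pg_trgm) is meant to serve.
using Npgsql;

await using var dataSource = NpgsqlDataSource.Create(
    "Host=postgres;Database=stellaops;Username=concelier;Password=***");

await using var command = dataSource.CreateCommand(
    """
    SELECT a.advisory_key, COALESCE(a.severity, 'unknown') AS severity, af.purl
    FROM vuln.advisory_affected af
    JOIN vuln.advisories a ON a.id = af.advisory_id
    WHERE af.purl ILIKE @pattern
    ORDER BY a.modified_at DESC
    LIMIT 20
    """);
command.Parameters.AddWithValue("pattern", "%pkg:npm/lodash%");

await using var reader = await command.ExecuteReaderAsync();
while (await reader.ReadAsync())
{
    Console.WriteLine($"{reader.GetString(0)} [{reader.GetString(1)}] {reader.GetString(2)}");
}
```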

View File

@@ -0,0 +1,82 @@
namespace StellaOps.Concelier.Storage.Postgres.Models;
/// <summary>
/// Represents an advisory entity in the vuln schema.
/// </summary>
public sealed class AdvisoryEntity
{
/// <summary>
/// Unique advisory identifier.
/// </summary>
public required Guid Id { get; init; }
/// <summary>
/// Advisory key (unique identifier, e.g., "ghsa:GHSA-xxxx").
/// </summary>
public required string AdvisoryKey { get; init; }
/// <summary>
/// Primary vulnerability ID (CVE, GHSA, etc.).
/// </summary>
public required string PrimaryVulnId { get; init; }
/// <summary>
/// Source that provided this advisory.
/// </summary>
public Guid? SourceId { get; init; }
/// <summary>
/// Advisory title.
/// </summary>
public string? Title { get; init; }
/// <summary>
/// Brief summary.
/// </summary>
public string? Summary { get; init; }
/// <summary>
/// Full description.
/// </summary>
public string? Description { get; init; }
/// <summary>
/// Severity level.
/// </summary>
public string? Severity { get; init; }
/// <summary>
/// When the advisory was published.
/// </summary>
public DateTimeOffset? PublishedAt { get; init; }
/// <summary>
/// When the advisory was last modified.
/// </summary>
public DateTimeOffset? ModifiedAt { get; init; }
/// <summary>
/// When the advisory was withdrawn (if applicable).
/// </summary>
public DateTimeOffset? WithdrawnAt { get; init; }
/// <summary>
/// Provenance information as JSON.
/// </summary>
public string Provenance { get; init; } = "{}";
/// <summary>
/// Raw payload from the source as JSON.
/// </summary>
public string? RawPayload { get; init; }
/// <summary>
/// When the record was created.
/// </summary>
public DateTimeOffset CreatedAt { get; init; }
/// <summary>
/// When the record was last updated.
/// </summary>
public DateTimeOffset UpdatedAt { get; init; }
}

View File

@@ -0,0 +1,62 @@
namespace StellaOps.Concelier.Storage.Postgres.Models;
/// <summary>
/// Represents a vulnerability feed source entity.
/// </summary>
public sealed class SourceEntity
{
/// <summary>
/// Unique source identifier.
/// </summary>
public required Guid Id { get; init; }
/// <summary>
/// Unique source key (e.g., "nvd", "ghsa", "osv").
/// </summary>
public required string Key { get; init; }
/// <summary>
/// Display name.
/// </summary>
public required string Name { get; init; }
/// <summary>
/// Source type (e.g., "nvd", "osv", "github").
/// </summary>
public required string SourceType { get; init; }
/// <summary>
/// Source URL.
/// </summary>
public string? Url { get; init; }
/// <summary>
/// Priority for merge precedence (higher = more authoritative).
/// </summary>
public int Priority { get; init; }
/// <summary>
/// Source is enabled.
/// </summary>
public bool Enabled { get; init; } = true;
/// <summary>
/// Source-specific configuration as JSON.
/// </summary>
public string Config { get; init; } = "{}";
/// <summary>
/// Source metadata as JSON.
/// </summary>
public string Metadata { get; init; } = "{}";
/// <summary>
/// When the record was created.
/// </summary>
public DateTimeOffset CreatedAt { get; init; }
/// <summary>
/// When the record was last updated.
/// </summary>
public DateTimeOffset UpdatedAt { get; init; }
}

View File

@@ -0,0 +1,320 @@
using Microsoft.Extensions.Logging;
using Npgsql;
using StellaOps.Concelier.Storage.Postgres.Models;
using StellaOps.Infrastructure.Postgres.Repositories;
namespace StellaOps.Concelier.Storage.Postgres.Repositories;
/// <summary>
/// PostgreSQL repository for advisory operations.
/// </summary>
/// <remarks>
/// Advisory data is global (not tenant-scoped) as vulnerability information
/// is shared across all tenants.
/// </remarks>
public sealed class AdvisoryRepository : RepositoryBase<ConcelierDataSource>, IAdvisoryRepository
{
private const string SystemTenantId = "_system";
/// <summary>
/// Creates a new advisory repository.
/// </summary>
public AdvisoryRepository(ConcelierDataSource dataSource, ILogger<AdvisoryRepository> logger)
: base(dataSource, logger)
{
}
/// <inheritdoc />
public async Task<AdvisoryEntity> UpsertAsync(AdvisoryEntity advisory, CancellationToken cancellationToken = default)
{
const string sql = """
INSERT INTO vuln.advisories (
id, advisory_key, primary_vuln_id, source_id, title, summary, description,
severity, published_at, modified_at, withdrawn_at, provenance, raw_payload
)
VALUES (
@id, @advisory_key, @primary_vuln_id, @source_id, @title, @summary, @description,
@severity, @published_at, @modified_at, @withdrawn_at, @provenance::jsonb, @raw_payload::jsonb
)
ON CONFLICT (advisory_key) DO UPDATE SET
primary_vuln_id = EXCLUDED.primary_vuln_id,
source_id = COALESCE(EXCLUDED.source_id, vuln.advisories.source_id),
title = COALESCE(EXCLUDED.title, vuln.advisories.title),
summary = COALESCE(EXCLUDED.summary, vuln.advisories.summary),
description = COALESCE(EXCLUDED.description, vuln.advisories.description),
severity = COALESCE(EXCLUDED.severity, vuln.advisories.severity),
published_at = COALESCE(EXCLUDED.published_at, vuln.advisories.published_at),
modified_at = COALESCE(EXCLUDED.modified_at, vuln.advisories.modified_at),
withdrawn_at = EXCLUDED.withdrawn_at,
provenance = vuln.advisories.provenance || EXCLUDED.provenance,
raw_payload = EXCLUDED.raw_payload
RETURNING id, advisory_key, primary_vuln_id, source_id, title, summary, description,
severity, published_at, modified_at, withdrawn_at, provenance::text, raw_payload::text,
created_at, updated_at
""";
await using var connection = await DataSource.OpenSystemConnectionAsync(cancellationToken).ConfigureAwait(false);
await using var command = CreateCommand(sql, connection);
AddParameter(command, "id", advisory.Id);
AddParameter(command, "advisory_key", advisory.AdvisoryKey);
AddParameter(command, "primary_vuln_id", advisory.PrimaryVulnId);
AddParameter(command, "source_id", advisory.SourceId);
AddParameter(command, "title", advisory.Title);
AddParameter(command, "summary", advisory.Summary);
AddParameter(command, "description", advisory.Description);
AddParameter(command, "severity", advisory.Severity);
AddParameter(command, "published_at", advisory.PublishedAt);
AddParameter(command, "modified_at", advisory.ModifiedAt);
AddParameter(command, "withdrawn_at", advisory.WithdrawnAt);
AddJsonbParameter(command, "provenance", advisory.Provenance);
AddJsonbParameter(command, "raw_payload", advisory.RawPayload);
await using var reader = await command.ExecuteReaderAsync(cancellationToken).ConfigureAwait(false);
await reader.ReadAsync(cancellationToken).ConfigureAwait(false);
return MapAdvisory(reader);
}
/// <inheritdoc />
public async Task<AdvisoryEntity?> GetByIdAsync(Guid id, CancellationToken cancellationToken = default)
{
const string sql = """
SELECT id, advisory_key, primary_vuln_id, source_id, title, summary, description,
severity, published_at, modified_at, withdrawn_at, provenance::text, raw_payload::text,
created_at, updated_at
FROM vuln.advisories
WHERE id = @id
""";
return await QuerySingleOrDefaultAsync(
SystemTenantId,
sql,
cmd => AddParameter(cmd, "id", id),
MapAdvisory,
cancellationToken).ConfigureAwait(false);
}
/// <inheritdoc />
public async Task<AdvisoryEntity?> GetByKeyAsync(string advisoryKey, CancellationToken cancellationToken = default)
{
const string sql = """
SELECT id, advisory_key, primary_vuln_id, source_id, title, summary, description,
severity, published_at, modified_at, withdrawn_at, provenance::text, raw_payload::text,
created_at, updated_at
FROM vuln.advisories
WHERE advisory_key = @advisory_key
""";
return await QuerySingleOrDefaultAsync(
SystemTenantId,
sql,
cmd => AddParameter(cmd, "advisory_key", advisoryKey),
MapAdvisory,
cancellationToken).ConfigureAwait(false);
}
/// <inheritdoc />
public async Task<AdvisoryEntity?> GetByVulnIdAsync(string vulnId, CancellationToken cancellationToken = default)
{
const string sql = """
SELECT id, advisory_key, primary_vuln_id, source_id, title, summary, description,
severity, published_at, modified_at, withdrawn_at, provenance::text, raw_payload::text,
created_at, updated_at
FROM vuln.advisories
WHERE primary_vuln_id = @vuln_id
""";
return await QuerySingleOrDefaultAsync(
SystemTenantId,
sql,
cmd => AddParameter(cmd, "vuln_id", vulnId),
MapAdvisory,
cancellationToken).ConfigureAwait(false);
}
/// <inheritdoc />
public async Task<IReadOnlyList<AdvisoryEntity>> SearchAsync(
string query,
string? severity = null,
int limit = 50,
int offset = 0,
CancellationToken cancellationToken = default)
{
var sql = """
SELECT id, advisory_key, primary_vuln_id, source_id, title, summary, description,
severity, published_at, modified_at, withdrawn_at, provenance::text, raw_payload::text,
created_at, updated_at,
ts_rank(search_vector, websearch_to_tsquery('english', @query)) as rank
FROM vuln.advisories
WHERE search_vector @@ websearch_to_tsquery('english', @query)
""";
if (!string.IsNullOrEmpty(severity))
{
sql += " AND severity = @severity";
}
sql += " ORDER BY rank DESC, modified_at DESC, id LIMIT @limit OFFSET @offset";
return await QueryAsync(
SystemTenantId,
sql,
cmd =>
{
AddParameter(cmd, "query", query);
if (!string.IsNullOrEmpty(severity))
{
AddParameter(cmd, "severity", severity);
}
AddParameter(cmd, "limit", limit);
AddParameter(cmd, "offset", offset);
},
MapAdvisory,
cancellationToken).ConfigureAwait(false);
}
/// <inheritdoc />
public async Task<IReadOnlyList<AdvisoryEntity>> GetBySeverityAsync(
string severity,
int limit = 100,
int offset = 0,
CancellationToken cancellationToken = default)
{
const string sql = """
SELECT id, advisory_key, primary_vuln_id, source_id, title, summary, description,
severity, published_at, modified_at, withdrawn_at, provenance::text, raw_payload::text,
created_at, updated_at
FROM vuln.advisories
WHERE severity = @severity
ORDER BY modified_at DESC, id
LIMIT @limit OFFSET @offset
""";
return await QueryAsync(
SystemTenantId,
sql,
cmd =>
{
AddParameter(cmd, "severity", severity);
AddParameter(cmd, "limit", limit);
AddParameter(cmd, "offset", offset);
},
MapAdvisory,
cancellationToken).ConfigureAwait(false);
}
/// <inheritdoc />
public async Task<IReadOnlyList<AdvisoryEntity>> GetModifiedSinceAsync(
DateTimeOffset since,
int limit = 1000,
CancellationToken cancellationToken = default)
{
const string sql = """
SELECT id, advisory_key, primary_vuln_id, source_id, title, summary, description,
severity, published_at, modified_at, withdrawn_at, provenance::text, raw_payload::text,
created_at, updated_at
FROM vuln.advisories
WHERE modified_at > @since
ORDER BY modified_at, id
LIMIT @limit
""";
return await QueryAsync(
SystemTenantId,
sql,
cmd =>
{
AddParameter(cmd, "since", since);
AddParameter(cmd, "limit", limit);
},
MapAdvisory,
cancellationToken).ConfigureAwait(false);
}
/// <inheritdoc />
public async Task<IReadOnlyList<AdvisoryEntity>> GetBySourceAsync(
Guid sourceId,
int limit = 100,
int offset = 0,
CancellationToken cancellationToken = default)
{
const string sql = """
SELECT id, advisory_key, primary_vuln_id, source_id, title, summary, description,
severity, published_at, modified_at, withdrawn_at, provenance::text, raw_payload::text,
created_at, updated_at
FROM vuln.advisories
WHERE source_id = @source_id
ORDER BY modified_at DESC, id
LIMIT @limit OFFSET @offset
""";
return await QueryAsync(
SystemTenantId,
sql,
cmd =>
{
AddParameter(cmd, "source_id", sourceId);
AddParameter(cmd, "limit", limit);
AddParameter(cmd, "offset", offset);
},
MapAdvisory,
cancellationToken).ConfigureAwait(false);
}
/// <inheritdoc />
public async Task<long> CountAsync(CancellationToken cancellationToken = default)
{
const string sql = "SELECT COUNT(*) FROM vuln.advisories";
var result = await ExecuteScalarAsync<long>(
SystemTenantId,
sql,
null,
cancellationToken).ConfigureAwait(false);
return result;
}
/// <inheritdoc />
public async Task<IDictionary<string, long>> CountBySeverityAsync(CancellationToken cancellationToken = default)
{
const string sql = """
SELECT COALESCE(severity, 'unknown') as severity, COUNT(*) as count
FROM vuln.advisories
GROUP BY severity
ORDER BY severity
""";
var results = await QueryAsync(
SystemTenantId,
sql,
null,
reader => (
Severity: reader.GetString(0),
Count: reader.GetInt64(1)
),
cancellationToken).ConfigureAwait(false);
return results.ToDictionary(r => r.Severity, r => r.Count);
}
private static AdvisoryEntity MapAdvisory(NpgsqlDataReader reader) => new()
{
Id = reader.GetGuid(0),
AdvisoryKey = reader.GetString(1),
PrimaryVulnId = reader.GetString(2),
SourceId = GetNullableGuid(reader, 3),
Title = GetNullableString(reader, 4),
Summary = GetNullableString(reader, 5),
Description = GetNullableString(reader, 6),
Severity = GetNullableString(reader, 7),
PublishedAt = GetNullableDateTimeOffset(reader, 8),
ModifiedAt = GetNullableDateTimeOffset(reader, 9),
WithdrawnAt = GetNullableDateTimeOffset(reader, 10),
Provenance = reader.GetString(11),
RawPayload = GetNullableString(reader, 12),
CreatedAt = reader.GetFieldValue<DateTimeOffset>(13),
UpdatedAt = reader.GetFieldValue<DateTimeOffset>(14)
};
}
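
A hedged ingestion sketch using the upsert path is shown below; the advisory field values and the provenance payload shape are illustrative assumptions.

```csharp
// Illustrative sketch: upserting one advisory during feed ingestion.
// All field values and the provenance shape are placeholders.
public sealed class AdvisoryIngestExample
{
    private readonly IAdvisoryRepository _advisories;

    public AdvisoryIngestExample(IAdvisoryRepository advisories) => _advisories = advisories;

    public async Task IngestAsync(CancellationToken ct)
    {
        var advisory = new AdvisoryEntity
        {
            Id = Guid.NewGuid(),                        // ignored on conflict; the existing row id wins
            AdvisoryKey = "osv:GHSA-xxxx-xxxx-xxxx",
            PrimaryVulnId = "CVE-2025-0000",
            Title = "Example vulnerability",
            Severity = "high",
            ModifiedAt = DateTimeOffset.UtcNow,
            Provenance = """{"source":"osv","ingestedAt":"2025-11-28T00:00:00Z"}"""
        };

        var stored = await _advisories.UpsertAsync(advisory, ct);
        // stored.Id is the persisted row id (pre-existing when the advisory_key already existed).
    }
}
```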

View File

@@ -0,0 +1,75 @@
using StellaOps.Concelier.Storage.Postgres.Models;
namespace StellaOps.Concelier.Storage.Postgres.Repositories;
/// <summary>
/// Repository interface for advisory operations.
/// </summary>
public interface IAdvisoryRepository
{
/// <summary>
/// Creates or updates an advisory (upsert by advisory_key).
/// </summary>
Task<AdvisoryEntity> UpsertAsync(AdvisoryEntity advisory, CancellationToken cancellationToken = default);
/// <summary>
/// Gets an advisory by ID.
/// </summary>
Task<AdvisoryEntity?> GetByIdAsync(Guid id, CancellationToken cancellationToken = default);
/// <summary>
/// Gets an advisory by key.
/// </summary>
Task<AdvisoryEntity?> GetByKeyAsync(string advisoryKey, CancellationToken cancellationToken = default);
/// <summary>
/// Gets an advisory by primary vulnerability ID (CVE, GHSA, etc.).
/// </summary>
Task<AdvisoryEntity?> GetByVulnIdAsync(string vulnId, CancellationToken cancellationToken = default);
/// <summary>
/// Searches advisories by full-text search.
/// </summary>
Task<IReadOnlyList<AdvisoryEntity>> SearchAsync(
string query,
string? severity = null,
int limit = 50,
int offset = 0,
CancellationToken cancellationToken = default);
/// <summary>
/// Gets advisories by severity.
/// </summary>
Task<IReadOnlyList<AdvisoryEntity>> GetBySeverityAsync(
string severity,
int limit = 100,
int offset = 0,
CancellationToken cancellationToken = default);
/// <summary>
/// Gets advisories modified since a given time.
/// </summary>
Task<IReadOnlyList<AdvisoryEntity>> GetModifiedSinceAsync(
DateTimeOffset since,
int limit = 1000,
CancellationToken cancellationToken = default);
/// <summary>
/// Gets advisories by source.
/// </summary>
Task<IReadOnlyList<AdvisoryEntity>> GetBySourceAsync(
Guid sourceId,
int limit = 100,
int offset = 0,
CancellationToken cancellationToken = default);
/// <summary>
/// Counts total advisories.
/// </summary>
Task<long> CountAsync(CancellationToken cancellationToken = default);
/// <summary>
/// Counts advisories by severity.
/// </summary>
Task<IDictionary<string, long>> CountBySeverityAsync(CancellationToken cancellationToken = default);
}
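
A minimal consumer sketch for the interface above (illustrative only, not part of this change set): `GetModifiedSinceAsync` is the incremental-sync primitive, and a caller that keeps its own watermark can page through changes as shown below. The `AdvisoryExportLoop` name and the `publish` callback are assumptions.

```csharp
using StellaOps.Concelier.Storage.Postgres.Models;
using StellaOps.Concelier.Storage.Postgres.Repositories;

public sealed class AdvisoryExportLoop
{
    private readonly IAdvisoryRepository _advisories;

    public AdvisoryExportLoop(IAdvisoryRepository advisories) => _advisories = advisories;

    /// <summary>Publishes advisories modified after <paramref name="watermark"/> and returns the new watermark.</summary>
    public async Task<DateTimeOffset> ExportChangesAsync(
        DateTimeOffset watermark,
        Func<AdvisoryEntity, Task> publish,
        CancellationToken ct)
    {
        while (!ct.IsCancellationRequested)
        {
            var batch = await _advisories.GetModifiedSinceAsync(watermark, 1000, ct);
            if (batch.Count == 0)
            {
                break; // caller persists the watermark and polls again later
            }

            foreach (var advisory in batch)
            {
                await publish(advisory);
            }

            // Rows are ordered by modified_at, so the last row advances the watermark.
            // Note: rows sharing the boundary timestamp could be skipped by the strict '>' filter;
            // a production exporter would tie-break on id as well.
            watermark = batch[^1].ModifiedAt ?? watermark;
        }

        return watermark;
    }
}
```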

View File

@@ -0,0 +1,53 @@
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;
using StellaOps.Concelier.Storage.Postgres.Repositories;
using StellaOps.Infrastructure.Postgres;
using StellaOps.Infrastructure.Postgres.Options;
namespace StellaOps.Concelier.Storage.Postgres;
/// <summary>
/// Extension methods for configuring Concelier PostgreSQL storage services.
/// </summary>
public static class ServiceCollectionExtensions
{
/// <summary>
/// Adds Concelier PostgreSQL storage services.
/// </summary>
/// <param name="services">Service collection.</param>
/// <param name="configuration">Configuration root.</param>
/// <param name="sectionName">Configuration section name for PostgreSQL options.</param>
/// <returns>Service collection for chaining.</returns>
public static IServiceCollection AddConcelierPostgresStorage(
this IServiceCollection services,
IConfiguration configuration,
string sectionName = "Postgres:Concelier")
{
services.Configure<PostgresOptions>(configuration.GetSection(sectionName));
services.AddSingleton<ConcelierDataSource>();
// Register repositories
services.AddScoped<IAdvisoryRepository, AdvisoryRepository>();
return services;
}
/// <summary>
/// Adds Concelier PostgreSQL storage services with explicit options.
/// </summary>
/// <param name="services">Service collection.</param>
/// <param name="configureOptions">Options configuration action.</param>
/// <returns>Service collection for chaining.</returns>
public static IServiceCollection AddConcelierPostgresStorage(
this IServiceCollection services,
Action<PostgresOptions> configureOptions)
{
services.Configure(configureOptions);
services.AddSingleton<ConcelierDataSource>();
// Register repositories
services.AddScoped<IAdvisoryRepository, AdvisoryRepository>();
return services;
}
}
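
Hypothetical host wiring for the extension above (not part of this change; the `ConnectionString` property name is an assumption about `PostgresOptions`):

```csharp
// Program.cs — a sketch of how a web service could opt in to Concelier PostgreSQL storage.
// Assumes an appsettings.json section shaped roughly as:
//   "Postgres": { "Concelier": { "ConnectionString": "Host=...;Database=stellaops", "SchemaName": "vuln" } }
using StellaOps.Concelier.Storage.Postgres;

var builder = WebApplication.CreateBuilder(args);

// Binds the "Postgres:Concelier" section and registers ConcelierDataSource + repositories.
builder.Services.AddConcelierPostgresStorage(builder.Configuration);

var app = builder.Build();
app.Run();
```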

View File

@@ -0,0 +1,21 @@
<?xml version="1.0" ?>
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<TargetFramework>net10.0</TargetFramework>
<ImplicitUsings>enable</ImplicitUsings>
<Nullable>enable</Nullable>
<LangVersion>preview</LangVersion>
<TreatWarningsAsErrors>true</TreatWarningsAsErrors>
<RootNamespace>StellaOps.Concelier.Storage.Postgres</RootNamespace>
</PropertyGroup>
<ItemGroup>
<None Include="Migrations\**\*.sql" CopyToOutputDirectory="PreserveNewest" />
</ItemGroup>
<ItemGroup>
<ProjectReference Include="..\..\..\__Libraries\StellaOps.Infrastructure.Postgres\StellaOps.Infrastructure.Postgres.csproj" />
</ItemGroup>
</Project>

View File

@@ -0,0 +1,50 @@
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Options;
using Npgsql;
using StellaOps.Infrastructure.Postgres.Connections;
using StellaOps.Infrastructure.Postgres.Options;
namespace StellaOps.Excititor.Storage.Postgres;
/// <summary>
/// PostgreSQL data source for the Excititor (VEX) module.
/// Manages connections with tenant context for VEX statements and dependency graphs.
/// </summary>
/// <remarks>
/// The Excititor module handles high-volume graph data (nodes/edges) and requires
/// optimized queries for graph traversal and VEX consensus computation.
/// </remarks>
public sealed class ExcititorDataSource : DataSourceBase
{
/// <summary>
/// Default schema name for Excititor/VEX tables.
/// </summary>
public const string DefaultSchemaName = "vex";
/// <summary>
/// Creates a new Excititor data source.
/// </summary>
public ExcititorDataSource(IOptions<PostgresOptions> options, ILogger<ExcititorDataSource> logger)
: base(CreateOptions(options.Value), logger)
{
}
/// <inheritdoc />
protected override string ModuleName => "Excititor";
/// <inheritdoc />
protected override void ConfigureDataSourceBuilder(NpgsqlDataSourceBuilder builder)
{
base.ConfigureDataSourceBuilder(builder);
// Configure for high-throughput graph operations
}
private static PostgresOptions CreateOptions(PostgresOptions baseOptions)
{
if (string.IsNullOrWhiteSpace(baseOptions.SchemaName))
{
baseOptions.SchemaName = DefaultSchemaName;
}
return baseOptions;
}
}
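
A bulk-load sketch for the high-volume tables called out in the remarks above (illustrative only, not part of this change): the `vex.graph_nodes`/`vex.graph_edges` tables created in the migration that follows are natural candidates for Npgsql's binary COPY path rather than row-by-row INSERTs. The `EdgeRow` record and the already-open connection are assumptions.

```csharp
using Npgsql;
using NpgsqlTypes;

public sealed record EdgeRow(Guid GraphRevisionId, long FromNodeId, long ToNodeId, string EdgeType);

public static class GraphEdgeBulkLoader
{
    public static async Task<ulong> BulkInsertAsync(
        NpgsqlConnection connection,
        IEnumerable<EdgeRow> edges,
        CancellationToken ct)
    {
        const string copySql = """
            COPY vex.graph_edges (graph_revision_id, from_node_id, to_node_id, edge_type, attributes)
            FROM STDIN (FORMAT BINARY)
            """;

        await using var importer = await connection.BeginBinaryImportAsync(copySql, ct);
        foreach (var edge in edges)
        {
            await importer.StartRowAsync(ct);
            await importer.WriteAsync(edge.GraphRevisionId, NpgsqlDbType.Uuid, ct);
            await importer.WriteAsync(edge.FromNodeId, NpgsqlDbType.Bigint, ct);
            await importer.WriteAsync(edge.ToNodeId, NpgsqlDbType.Bigint, ct);
            await importer.WriteAsync(edge.EdgeType, NpgsqlDbType.Text, ct);
            await importer.WriteAsync("{}", NpgsqlDbType.Jsonb, ct); // COPY bypasses column defaults
        }

        return await importer.CompleteAsync(ct); // rows written
    }
}
```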

View File

@@ -0,0 +1,324 @@
-- VEX Schema Migration 001: Initial Schema
-- Creates the vex schema for VEX statements and dependency graphs
-- Create schema
CREATE SCHEMA IF NOT EXISTS vex;
-- Projects table
CREATE TABLE IF NOT EXISTS vex.projects (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id TEXT NOT NULL,
name TEXT NOT NULL,
display_name TEXT,
description TEXT,
repository_url TEXT,
default_branch TEXT,
settings JSONB NOT NULL DEFAULT '{}',
metadata JSONB NOT NULL DEFAULT '{}',
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
created_by TEXT,
UNIQUE(tenant_id, name)
);
CREATE INDEX idx_projects_tenant ON vex.projects(tenant_id);
-- Graph revisions table
CREATE TABLE IF NOT EXISTS vex.graph_revisions (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
project_id UUID NOT NULL REFERENCES vex.projects(id) ON DELETE CASCADE,
revision_id TEXT NOT NULL UNIQUE,
parent_revision_id TEXT,
sbom_digest TEXT NOT NULL,
feed_snapshot_id TEXT,
policy_version TEXT,
node_count INT NOT NULL DEFAULT 0,
edge_count INT NOT NULL DEFAULT 0,
metadata JSONB NOT NULL DEFAULT '{}',
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
created_by TEXT
);
CREATE INDEX idx_graph_revisions_project ON vex.graph_revisions(project_id);
CREATE INDEX idx_graph_revisions_revision ON vex.graph_revisions(revision_id);
CREATE INDEX idx_graph_revisions_created ON vex.graph_revisions(project_id, created_at DESC);
-- Graph nodes table (BIGSERIAL for high volume)
CREATE TABLE IF NOT EXISTS vex.graph_nodes (
id BIGSERIAL PRIMARY KEY,
graph_revision_id UUID NOT NULL REFERENCES vex.graph_revisions(id) ON DELETE CASCADE,
node_key TEXT NOT NULL,
node_type TEXT NOT NULL,
purl TEXT,
name TEXT,
version TEXT,
attributes JSONB NOT NULL DEFAULT '{}',
UNIQUE(graph_revision_id, node_key)
);
CREATE INDEX idx_graph_nodes_revision ON vex.graph_nodes(graph_revision_id);
CREATE INDEX idx_graph_nodes_key ON vex.graph_nodes(graph_revision_id, node_key);
CREATE INDEX idx_graph_nodes_purl ON vex.graph_nodes(purl);
CREATE INDEX idx_graph_nodes_type ON vex.graph_nodes(graph_revision_id, node_type);
-- Graph edges table (BIGSERIAL for high volume)
CREATE TABLE IF NOT EXISTS vex.graph_edges (
id BIGSERIAL PRIMARY KEY,
graph_revision_id UUID NOT NULL REFERENCES vex.graph_revisions(id) ON DELETE CASCADE,
from_node_id BIGINT NOT NULL REFERENCES vex.graph_nodes(id) ON DELETE CASCADE,
to_node_id BIGINT NOT NULL REFERENCES vex.graph_nodes(id) ON DELETE CASCADE,
edge_type TEXT NOT NULL,
attributes JSONB NOT NULL DEFAULT '{}'
);
CREATE INDEX idx_graph_edges_revision ON vex.graph_edges(graph_revision_id);
CREATE INDEX idx_graph_edges_from ON vex.graph_edges(from_node_id);
CREATE INDEX idx_graph_edges_to ON vex.graph_edges(to_node_id);
-- VEX statements table
CREATE TABLE IF NOT EXISTS vex.statements (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id TEXT NOT NULL,
project_id UUID REFERENCES vex.projects(id),
graph_revision_id UUID REFERENCES vex.graph_revisions(id),
vulnerability_id TEXT NOT NULL,
product_id TEXT,
status TEXT NOT NULL CHECK (status IN (
'not_affected', 'affected', 'fixed', 'under_investigation'
)),
justification TEXT CHECK (justification IN (
'component_not_present', 'vulnerable_code_not_present',
'vulnerable_code_not_in_execute_path', 'vulnerable_code_cannot_be_controlled_by_adversary',
'inline_mitigations_already_exist'
)),
impact_statement TEXT,
action_statement TEXT,
action_statement_timestamp TIMESTAMPTZ,
first_issued TIMESTAMPTZ NOT NULL DEFAULT NOW(),
last_updated TIMESTAMPTZ NOT NULL DEFAULT NOW(),
source TEXT,
source_url TEXT,
evidence JSONB NOT NULL DEFAULT '{}',
provenance JSONB NOT NULL DEFAULT '{}',
metadata JSONB NOT NULL DEFAULT '{}',
created_by TEXT
);
CREATE INDEX idx_statements_tenant ON vex.statements(tenant_id);
CREATE INDEX idx_statements_project ON vex.statements(project_id);
CREATE INDEX idx_statements_revision ON vex.statements(graph_revision_id);
CREATE INDEX idx_statements_vuln ON vex.statements(vulnerability_id);
CREATE INDEX idx_statements_status ON vex.statements(tenant_id, status);
-- VEX observations table
CREATE TABLE IF NOT EXISTS vex.observations (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id TEXT NOT NULL,
statement_id UUID REFERENCES vex.statements(id) ON DELETE CASCADE,
vulnerability_id TEXT NOT NULL,
product_id TEXT NOT NULL,
observed_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
observer TEXT NOT NULL,
observation_type TEXT NOT NULL,
confidence NUMERIC(3,2),
details JSONB NOT NULL DEFAULT '{}',
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
UNIQUE(tenant_id, vulnerability_id, product_id, observer, observation_type)
);
CREATE INDEX idx_observations_tenant ON vex.observations(tenant_id);
CREATE INDEX idx_observations_statement ON vex.observations(statement_id);
CREATE INDEX idx_observations_vuln ON vex.observations(vulnerability_id, product_id);
-- Linksets table
CREATE TABLE IF NOT EXISTS vex.linksets (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id TEXT NOT NULL,
name TEXT NOT NULL,
description TEXT,
source_type TEXT NOT NULL,
source_url TEXT,
enabled BOOLEAN NOT NULL DEFAULT TRUE,
priority INT NOT NULL DEFAULT 0,
filter JSONB NOT NULL DEFAULT '{}',
metadata JSONB NOT NULL DEFAULT '{}',
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
UNIQUE(tenant_id, name)
);
CREATE INDEX idx_linksets_tenant ON vex.linksets(tenant_id);
CREATE INDEX idx_linksets_enabled ON vex.linksets(tenant_id, enabled, priority DESC);
-- Linkset events table
CREATE TABLE IF NOT EXISTS vex.linkset_events (
id BIGSERIAL PRIMARY KEY,
linkset_id UUID NOT NULL REFERENCES vex.linksets(id) ON DELETE CASCADE,
event_type TEXT NOT NULL,
statement_count INT NOT NULL DEFAULT 0,
error_message TEXT,
metadata JSONB NOT NULL DEFAULT '{}',
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_linkset_events_linkset ON vex.linkset_events(linkset_id);
CREATE INDEX idx_linkset_events_created ON vex.linkset_events(created_at);
-- Consensus table (VEX consensus state)
CREATE TABLE IF NOT EXISTS vex.consensus (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id TEXT NOT NULL,
vulnerability_id TEXT NOT NULL,
product_id TEXT NOT NULL,
consensus_status TEXT NOT NULL,
contributing_statements UUID[] NOT NULL DEFAULT '{}',
confidence NUMERIC(3,2),
computed_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
metadata JSONB NOT NULL DEFAULT '{}',
UNIQUE(tenant_id, vulnerability_id, product_id)
);
CREATE INDEX idx_consensus_tenant ON vex.consensus(tenant_id);
CREATE INDEX idx_consensus_vuln ON vex.consensus(vulnerability_id, product_id);
-- Consensus holds table
CREATE TABLE IF NOT EXISTS vex.consensus_holds (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
consensus_id UUID NOT NULL REFERENCES vex.consensus(id) ON DELETE CASCADE,
hold_type TEXT NOT NULL,
reason TEXT NOT NULL,
held_by TEXT NOT NULL,
held_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
released_at TIMESTAMPTZ,
released_by TEXT,
metadata JSONB NOT NULL DEFAULT '{}'
);
CREATE INDEX idx_consensus_holds_consensus ON vex.consensus_holds(consensus_id);
CREATE INDEX idx_consensus_holds_active ON vex.consensus_holds(consensus_id, released_at)
WHERE released_at IS NULL;
-- Unknown snapshots table
CREATE TABLE IF NOT EXISTS vex.unknowns_snapshots (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id TEXT NOT NULL,
project_id UUID REFERENCES vex.projects(id),
graph_revision_id UUID REFERENCES vex.graph_revisions(id),
snapshot_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
unknown_count INT NOT NULL DEFAULT 0,
metadata JSONB NOT NULL DEFAULT '{}'
);
CREATE INDEX idx_unknowns_snapshots_tenant ON vex.unknowns_snapshots(tenant_id);
CREATE INDEX idx_unknowns_snapshots_project ON vex.unknowns_snapshots(project_id);
-- Unknown items table
CREATE TABLE IF NOT EXISTS vex.unknown_items (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
snapshot_id UUID NOT NULL REFERENCES vex.unknowns_snapshots(id) ON DELETE CASCADE,
vulnerability_id TEXT NOT NULL,
product_id TEXT,
reason TEXT NOT NULL,
metadata JSONB NOT NULL DEFAULT '{}'
);
CREATE INDEX idx_unknown_items_snapshot ON vex.unknown_items(snapshot_id);
CREATE INDEX idx_unknown_items_vuln ON vex.unknown_items(vulnerability_id);
-- Evidence manifests table
CREATE TABLE IF NOT EXISTS vex.evidence_manifests (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id TEXT NOT NULL,
statement_id UUID REFERENCES vex.statements(id) ON DELETE CASCADE,
manifest_type TEXT NOT NULL,
content_hash TEXT NOT NULL,
content JSONB NOT NULL,
source TEXT,
collected_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
metadata JSONB NOT NULL DEFAULT '{}'
);
CREATE INDEX idx_evidence_manifests_tenant ON vex.evidence_manifests(tenant_id);
CREATE INDEX idx_evidence_manifests_statement ON vex.evidence_manifests(statement_id);
-- CVSS receipts table
CREATE TABLE IF NOT EXISTS vex.cvss_receipts (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id TEXT NOT NULL,
statement_id UUID REFERENCES vex.statements(id) ON DELETE CASCADE,
vulnerability_id TEXT NOT NULL,
cvss_version TEXT NOT NULL,
vector_string TEXT NOT NULL,
base_score NUMERIC(3,1) NOT NULL,
environmental_score NUMERIC(3,1),
temporal_score NUMERIC(3,1),
computed_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
metadata JSONB NOT NULL DEFAULT '{}'
);
CREATE INDEX idx_cvss_receipts_tenant ON vex.cvss_receipts(tenant_id);
CREATE INDEX idx_cvss_receipts_statement ON vex.cvss_receipts(statement_id);
CREATE INDEX idx_cvss_receipts_vuln ON vex.cvss_receipts(vulnerability_id);
-- Attestations table
CREATE TABLE IF NOT EXISTS vex.attestations (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id TEXT NOT NULL,
statement_id UUID REFERENCES vex.statements(id),
subject_digest TEXT NOT NULL,
predicate_type TEXT NOT NULL,
predicate JSONB NOT NULL,
signature TEXT,
signature_algorithm TEXT,
signed_by TEXT,
signed_at TIMESTAMPTZ,
verified BOOLEAN NOT NULL DEFAULT FALSE,
verified_at TIMESTAMPTZ,
metadata JSONB NOT NULL DEFAULT '{}',
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_attestations_tenant ON vex.attestations(tenant_id);
CREATE INDEX idx_attestations_statement ON vex.attestations(statement_id);
CREATE INDEX idx_attestations_subject ON vex.attestations(subject_digest);
-- Timeline events table
CREATE TABLE IF NOT EXISTS vex.timeline_events (
id BIGSERIAL PRIMARY KEY,
tenant_id TEXT NOT NULL,
project_id UUID REFERENCES vex.projects(id),
statement_id UUID REFERENCES vex.statements(id),
event_type TEXT NOT NULL,
event_data JSONB NOT NULL DEFAULT '{}',
actor TEXT,
correlation_id TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_timeline_events_tenant ON vex.timeline_events(tenant_id);
CREATE INDEX idx_timeline_events_project ON vex.timeline_events(project_id);
CREATE INDEX idx_timeline_events_statement ON vex.timeline_events(statement_id);
CREATE INDEX idx_timeline_events_created ON vex.timeline_events(tenant_id, created_at);
CREATE INDEX idx_timeline_events_correlation ON vex.timeline_events(correlation_id);
-- Update timestamp function
CREATE OR REPLACE FUNCTION vex.update_updated_at()
RETURNS TRIGGER AS $$
BEGIN
NEW.updated_at = NOW();
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
-- Triggers
CREATE TRIGGER trg_projects_updated_at
BEFORE UPDATE ON vex.projects
FOR EACH ROW EXECUTE FUNCTION vex.update_updated_at();
CREATE TRIGGER trg_linksets_updated_at
BEFORE UPDATE ON vex.linksets
FOR EACH ROW EXECUTE FUNCTION vex.update_updated_at();
-- vex.statements tracks freshness via last_updated (it has no updated_at column),
-- so it needs its own trigger function.
CREATE OR REPLACE FUNCTION vex.update_last_updated()
RETURNS TRIGGER AS $$
BEGIN
NEW.last_updated = NOW();
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER trg_statements_updated_at
BEFORE UPDATE ON vex.statements
FOR EACH ROW EXECUTE FUNCTION vex.update_last_updated();

View File

@@ -0,0 +1,67 @@
namespace StellaOps.Excititor.Storage.Postgres.Models;
/// <summary>
/// Represents a project entity in the vex schema.
/// </summary>
public sealed class ProjectEntity
{
/// <summary>
/// Unique project identifier.
/// </summary>
public required Guid Id { get; init; }
/// <summary>
/// Tenant this project belongs to.
/// </summary>
public required string TenantId { get; init; }
/// <summary>
/// Project name (unique per tenant).
/// </summary>
public required string Name { get; init; }
/// <summary>
/// Display name.
/// </summary>
public string? DisplayName { get; init; }
/// <summary>
/// Project description.
/// </summary>
public string? Description { get; init; }
/// <summary>
/// Repository URL.
/// </summary>
public string? RepositoryUrl { get; init; }
/// <summary>
/// Default branch name.
/// </summary>
public string? DefaultBranch { get; init; }
/// <summary>
/// Project settings as JSON.
/// </summary>
public string Settings { get; init; } = "{}";
/// <summary>
/// Project metadata as JSON.
/// </summary>
public string Metadata { get; init; } = "{}";
/// <summary>
/// When the project was created.
/// </summary>
public DateTimeOffset CreatedAt { get; init; }
/// <summary>
/// When the project was last updated.
/// </summary>
public DateTimeOffset UpdatedAt { get; init; }
/// <summary>
/// User who created the project.
/// </summary>
public string? CreatedBy { get; init; }
}

View File

@@ -0,0 +1,134 @@
namespace StellaOps.Excititor.Storage.Postgres.Models;
/// <summary>
/// VEX status values per OpenVEX specification.
/// </summary>
public enum VexStatus
{
/// <summary>Product is not affected by the vulnerability.</summary>
NotAffected,
/// <summary>Product is affected by the vulnerability.</summary>
Affected,
/// <summary>Vulnerability is fixed in this product version.</summary>
Fixed,
/// <summary>Vulnerability is under investigation.</summary>
UnderInvestigation
}
/// <summary>
/// VEX justification codes per OpenVEX specification.
/// </summary>
public enum VexJustification
{
/// <summary>The vulnerable component is not present.</summary>
ComponentNotPresent,
/// <summary>The vulnerable code is not present.</summary>
VulnerableCodeNotPresent,
/// <summary>The vulnerable code is not in execute path.</summary>
VulnerableCodeNotInExecutePath,
/// <summary>The vulnerable code cannot be controlled by adversary.</summary>
VulnerableCodeCannotBeControlledByAdversary,
/// <summary>Inline mitigations already exist.</summary>
InlineMitigationsAlreadyExist
}
/// <summary>
/// Represents a VEX statement entity in the vex schema.
/// </summary>
public sealed class VexStatementEntity
{
/// <summary>
/// Unique statement identifier.
/// </summary>
public required Guid Id { get; init; }
/// <summary>
/// Tenant this statement belongs to.
/// </summary>
public required string TenantId { get; init; }
/// <summary>
/// Project this statement applies to.
/// </summary>
public Guid? ProjectId { get; init; }
/// <summary>
/// Graph revision this statement is associated with.
/// </summary>
public Guid? GraphRevisionId { get; init; }
/// <summary>
/// Vulnerability ID (CVE, GHSA, etc.).
/// </summary>
public required string VulnerabilityId { get; init; }
/// <summary>
/// Product identifier (PURL or product key).
/// </summary>
public string? ProductId { get; init; }
/// <summary>
/// VEX status.
/// </summary>
public required VexStatus Status { get; init; }
/// <summary>
/// Justification for not_affected status.
/// </summary>
public VexJustification? Justification { get; init; }
/// <summary>
/// Impact statement describing effects.
/// </summary>
public string? ImpactStatement { get; init; }
/// <summary>
/// Action statement describing remediation.
/// </summary>
public string? ActionStatement { get; init; }
/// <summary>
/// When action statement was issued.
/// </summary>
public DateTimeOffset? ActionStatementTimestamp { get; init; }
/// <summary>
/// When statement was first issued.
/// </summary>
public DateTimeOffset FirstIssued { get; init; }
/// <summary>
/// When statement was last updated.
/// </summary>
public DateTimeOffset LastUpdated { get; init; }
/// <summary>
/// Source of the statement.
/// </summary>
public string? Source { get; init; }
/// <summary>
/// URL to source document.
/// </summary>
public string? SourceUrl { get; init; }
/// <summary>
/// Evidence supporting the statement as JSON.
/// </summary>
public string Evidence { get; init; } = "{}";
/// <summary>
/// Provenance information as JSON.
/// </summary>
public string Provenance { get; init; } = "{}";
/// <summary>
/// Statement metadata as JSON.
/// </summary>
public string Metadata { get; init; } = "{}";
/// <summary>
/// User who created the statement.
/// </summary>
public string? CreatedBy { get; init; }
}

View File

@@ -0,0 +1,75 @@
using StellaOps.Excititor.Storage.Postgres.Models;
namespace StellaOps.Excititor.Storage.Postgres.Repositories;
/// <summary>
/// Repository interface for VEX statement operations.
/// </summary>
public interface IVexStatementRepository
{
/// <summary>
/// Creates a new VEX statement.
/// </summary>
Task<VexStatementEntity> CreateAsync(VexStatementEntity statement, CancellationToken cancellationToken = default);
/// <summary>
/// Gets a VEX statement by ID.
/// </summary>
Task<VexStatementEntity?> GetByIdAsync(string tenantId, Guid id, CancellationToken cancellationToken = default);
/// <summary>
/// Gets VEX statements for a vulnerability.
/// </summary>
Task<IReadOnlyList<VexStatementEntity>> GetByVulnerabilityAsync(
string tenantId,
string vulnerabilityId,
CancellationToken cancellationToken = default);
/// <summary>
/// Gets VEX statements for a product.
/// </summary>
Task<IReadOnlyList<VexStatementEntity>> GetByProductAsync(
string tenantId,
string productId,
CancellationToken cancellationToken = default);
/// <summary>
/// Gets VEX statements for a project.
/// </summary>
Task<IReadOnlyList<VexStatementEntity>> GetByProjectAsync(
string tenantId,
Guid projectId,
int limit = 100,
int offset = 0,
CancellationToken cancellationToken = default);
/// <summary>
/// Gets VEX statements by status.
/// </summary>
Task<IReadOnlyList<VexStatementEntity>> GetByStatusAsync(
string tenantId,
VexStatus status,
int limit = 100,
int offset = 0,
CancellationToken cancellationToken = default);
/// <summary>
/// Updates a VEX statement.
/// </summary>
Task<bool> UpdateAsync(VexStatementEntity statement, CancellationToken cancellationToken = default);
/// <summary>
/// Deletes a VEX statement.
/// </summary>
Task<bool> DeleteAsync(string tenantId, Guid id, CancellationToken cancellationToken = default);
/// <summary>
/// Gets the effective VEX status for a vulnerability/product combination.
/// Applies lattice logic for status precedence.
/// </summary>
Task<VexStatementEntity?> GetEffectiveStatementAsync(
string tenantId,
string vulnerabilityId,
string productId,
CancellationToken cancellationToken = default);
}

View File

@@ -0,0 +1,385 @@
using Microsoft.Extensions.Logging;
using Npgsql;
using StellaOps.Excititor.Storage.Postgres.Models;
using StellaOps.Infrastructure.Postgres.Repositories;
namespace StellaOps.Excititor.Storage.Postgres.Repositories;
/// <summary>
/// PostgreSQL repository for VEX statement operations.
/// </summary>
public sealed class VexStatementRepository : RepositoryBase<ExcititorDataSource>, IVexStatementRepository
{
/// <summary>
/// Creates a new VEX statement repository.
/// </summary>
public VexStatementRepository(ExcititorDataSource dataSource, ILogger<VexStatementRepository> logger)
: base(dataSource, logger)
{
}
/// <inheritdoc />
public async Task<VexStatementEntity> CreateAsync(VexStatementEntity statement, CancellationToken cancellationToken = default)
{
const string sql = """
INSERT INTO vex.statements (
id, tenant_id, project_id, graph_revision_id, vulnerability_id, product_id,
status, justification, impact_statement, action_statement, action_statement_timestamp,
source, source_url, evidence, provenance, metadata, created_by
)
VALUES (
@id, @tenant_id, @project_id, @graph_revision_id, @vulnerability_id, @product_id,
@status, @justification, @impact_statement, @action_statement, @action_statement_timestamp,
@source, @source_url, @evidence::jsonb, @provenance::jsonb, @metadata::jsonb, @created_by
)
RETURNING id, tenant_id, project_id, graph_revision_id, vulnerability_id, product_id,
status, justification, impact_statement, action_statement, action_statement_timestamp,
first_issued, last_updated, source, source_url,
evidence::text, provenance::text, metadata::text, created_by
""";
await using var connection = await DataSource.OpenConnectionAsync(statement.TenantId, "writer", cancellationToken)
.ConfigureAwait(false);
await using var command = CreateCommand(sql, connection);
AddStatementParameters(command, statement);
await using var reader = await command.ExecuteReaderAsync(cancellationToken).ConfigureAwait(false);
await reader.ReadAsync(cancellationToken).ConfigureAwait(false);
return MapStatement(reader);
}
/// <inheritdoc />
public async Task<VexStatementEntity?> GetByIdAsync(string tenantId, Guid id, CancellationToken cancellationToken = default)
{
const string sql = """
SELECT id, tenant_id, project_id, graph_revision_id, vulnerability_id, product_id,
status, justification, impact_statement, action_statement, action_statement_timestamp,
first_issued, last_updated, source, source_url,
evidence::text, provenance::text, metadata::text, created_by
FROM vex.statements
WHERE tenant_id = @tenant_id AND id = @id
""";
return await QuerySingleOrDefaultAsync(
tenantId,
sql,
cmd =>
{
AddParameter(cmd, "tenant_id", tenantId);
AddParameter(cmd, "id", id);
},
MapStatement,
cancellationToken).ConfigureAwait(false);
}
/// <inheritdoc />
public async Task<IReadOnlyList<VexStatementEntity>> GetByVulnerabilityAsync(
string tenantId,
string vulnerabilityId,
CancellationToken cancellationToken = default)
{
const string sql = """
SELECT id, tenant_id, project_id, graph_revision_id, vulnerability_id, product_id,
status, justification, impact_statement, action_statement, action_statement_timestamp,
first_issued, last_updated, source, source_url,
evidence::text, provenance::text, metadata::text, created_by
FROM vex.statements
WHERE tenant_id = @tenant_id AND vulnerability_id = @vulnerability_id
ORDER BY last_updated DESC, id
""";
return await QueryAsync(
tenantId,
sql,
cmd =>
{
AddParameter(cmd, "tenant_id", tenantId);
AddParameter(cmd, "vulnerability_id", vulnerabilityId);
},
MapStatement,
cancellationToken).ConfigureAwait(false);
}
/// <inheritdoc />
public async Task<IReadOnlyList<VexStatementEntity>> GetByProductAsync(
string tenantId,
string productId,
CancellationToken cancellationToken = default)
{
const string sql = """
SELECT id, tenant_id, project_id, graph_revision_id, vulnerability_id, product_id,
status, justification, impact_statement, action_statement, action_statement_timestamp,
first_issued, last_updated, source, source_url,
evidence::text, provenance::text, metadata::text, created_by
FROM vex.statements
WHERE tenant_id = @tenant_id AND product_id = @product_id
ORDER BY last_updated DESC, id
""";
return await QueryAsync(
tenantId,
sql,
cmd =>
{
AddParameter(cmd, "tenant_id", tenantId);
AddParameter(cmd, "product_id", productId);
},
MapStatement,
cancellationToken).ConfigureAwait(false);
}
/// <inheritdoc />
public async Task<IReadOnlyList<VexStatementEntity>> GetByProjectAsync(
string tenantId,
Guid projectId,
int limit = 100,
int offset = 0,
CancellationToken cancellationToken = default)
{
const string sql = """
SELECT id, tenant_id, project_id, graph_revision_id, vulnerability_id, product_id,
status, justification, impact_statement, action_statement, action_statement_timestamp,
first_issued, last_updated, source, source_url,
evidence::text, provenance::text, metadata::text, created_by
FROM vex.statements
WHERE tenant_id = @tenant_id AND project_id = @project_id
ORDER BY last_updated DESC, id
LIMIT @limit OFFSET @offset
""";
return await QueryAsync(
tenantId,
sql,
cmd =>
{
AddParameter(cmd, "tenant_id", tenantId);
AddParameter(cmd, "project_id", projectId);
AddParameter(cmd, "limit", limit);
AddParameter(cmd, "offset", offset);
},
MapStatement,
cancellationToken).ConfigureAwait(false);
}
/// <inheritdoc />
public async Task<IReadOnlyList<VexStatementEntity>> GetByStatusAsync(
string tenantId,
VexStatus status,
int limit = 100,
int offset = 0,
CancellationToken cancellationToken = default)
{
const string sql = """
SELECT id, tenant_id, project_id, graph_revision_id, vulnerability_id, product_id,
status, justification, impact_statement, action_statement, action_statement_timestamp,
first_issued, last_updated, source, source_url,
evidence::text, provenance::text, metadata::text, created_by
FROM vex.statements
WHERE tenant_id = @tenant_id AND status = @status
ORDER BY last_updated DESC, id
LIMIT @limit OFFSET @offset
""";
return await QueryAsync(
tenantId,
sql,
cmd =>
{
AddParameter(cmd, "tenant_id", tenantId);
AddParameter(cmd, "status", StatusToString(status));
AddParameter(cmd, "limit", limit);
AddParameter(cmd, "offset", offset);
},
MapStatement,
cancellationToken).ConfigureAwait(false);
}
/// <inheritdoc />
public async Task<bool> UpdateAsync(VexStatementEntity statement, CancellationToken cancellationToken = default)
{
const string sql = """
UPDATE vex.statements
SET status = @status,
justification = @justification,
impact_statement = @impact_statement,
action_statement = @action_statement,
action_statement_timestamp = @action_statement_timestamp,
source = @source,
source_url = @source_url,
evidence = @evidence::jsonb,
provenance = @provenance::jsonb,
metadata = @metadata::jsonb
WHERE tenant_id = @tenant_id AND id = @id
""";
var rows = await ExecuteAsync(
statement.TenantId,
sql,
cmd =>
{
AddParameter(cmd, "tenant_id", statement.TenantId);
AddParameter(cmd, "id", statement.Id);
AddParameter(cmd, "status", StatusToString(statement.Status));
AddParameter(cmd, "justification", statement.Justification.HasValue
? JustificationToString(statement.Justification.Value)
: null);
AddParameter(cmd, "impact_statement", statement.ImpactStatement);
AddParameter(cmd, "action_statement", statement.ActionStatement);
AddParameter(cmd, "action_statement_timestamp", statement.ActionStatementTimestamp);
AddParameter(cmd, "source", statement.Source);
AddParameter(cmd, "source_url", statement.SourceUrl);
AddJsonbParameter(cmd, "evidence", statement.Evidence);
AddJsonbParameter(cmd, "provenance", statement.Provenance);
AddJsonbParameter(cmd, "metadata", statement.Metadata);
},
cancellationToken).ConfigureAwait(false);
return rows > 0;
}
/// <inheritdoc />
public async Task<bool> DeleteAsync(string tenantId, Guid id, CancellationToken cancellationToken = default)
{
const string sql = "DELETE FROM vex.statements WHERE tenant_id = @tenant_id AND id = @id";
var rows = await ExecuteAsync(
tenantId,
sql,
cmd =>
{
AddParameter(cmd, "tenant_id", tenantId);
AddParameter(cmd, "id", id);
},
cancellationToken).ConfigureAwait(false);
return rows > 0;
}
/// <inheritdoc />
public async Task<VexStatementEntity?> GetEffectiveStatementAsync(
string tenantId,
string vulnerabilityId,
string productId,
CancellationToken cancellationToken = default)
{
// VEX lattice precedence: fixed > not_affected > affected > under_investigation
const string sql = """
SELECT id, tenant_id, project_id, graph_revision_id, vulnerability_id, product_id,
status, justification, impact_statement, action_statement, action_statement_timestamp,
first_issued, last_updated, source, source_url,
evidence::text, provenance::text, metadata::text, created_by
FROM vex.statements
WHERE tenant_id = @tenant_id
AND vulnerability_id = @vulnerability_id
AND product_id = @product_id
ORDER BY
CASE status
WHEN 'fixed' THEN 1
WHEN 'not_affected' THEN 2
WHEN 'affected' THEN 3
WHEN 'under_investigation' THEN 4
END,
last_updated DESC
LIMIT 1
""";
return await QuerySingleOrDefaultAsync(
tenantId,
sql,
cmd =>
{
AddParameter(cmd, "tenant_id", tenantId);
AddParameter(cmd, "vulnerability_id", vulnerabilityId);
AddParameter(cmd, "product_id", productId);
},
MapStatement,
cancellationToken).ConfigureAwait(false);
}
private static void AddStatementParameters(NpgsqlCommand command, VexStatementEntity statement)
{
AddParameter(command, "id", statement.Id);
AddParameter(command, "tenant_id", statement.TenantId);
AddParameter(command, "project_id", statement.ProjectId);
AddParameter(command, "graph_revision_id", statement.GraphRevisionId);
AddParameter(command, "vulnerability_id", statement.VulnerabilityId);
AddParameter(command, "product_id", statement.ProductId);
AddParameter(command, "status", StatusToString(statement.Status));
AddParameter(command, "justification", statement.Justification.HasValue
? JustificationToString(statement.Justification.Value)
: null);
AddParameter(command, "impact_statement", statement.ImpactStatement);
AddParameter(command, "action_statement", statement.ActionStatement);
AddParameter(command, "action_statement_timestamp", statement.ActionStatementTimestamp);
AddParameter(command, "source", statement.Source);
AddParameter(command, "source_url", statement.SourceUrl);
AddJsonbParameter(command, "evidence", statement.Evidence);
AddJsonbParameter(command, "provenance", statement.Provenance);
AddJsonbParameter(command, "metadata", statement.Metadata);
AddParameter(command, "created_by", statement.CreatedBy);
}
private static VexStatementEntity MapStatement(NpgsqlDataReader reader) => new()
{
Id = reader.GetGuid(0),
TenantId = reader.GetString(1),
ProjectId = GetNullableGuid(reader, 2),
GraphRevisionId = GetNullableGuid(reader, 3),
VulnerabilityId = reader.GetString(4),
ProductId = GetNullableString(reader, 5),
Status = ParseStatus(reader.GetString(6)),
Justification = ParseJustification(GetNullableString(reader, 7)),
ImpactStatement = GetNullableString(reader, 8),
ActionStatement = GetNullableString(reader, 9),
ActionStatementTimestamp = GetNullableDateTimeOffset(reader, 10),
FirstIssued = reader.GetFieldValue<DateTimeOffset>(11),
LastUpdated = reader.GetFieldValue<DateTimeOffset>(12),
Source = GetNullableString(reader, 13),
SourceUrl = GetNullableString(reader, 14),
Evidence = reader.GetString(15),
Provenance = reader.GetString(16),
Metadata = reader.GetString(17),
CreatedBy = GetNullableString(reader, 18)
};
private static string StatusToString(VexStatus status) => status switch
{
VexStatus.NotAffected => "not_affected",
VexStatus.Affected => "affected",
VexStatus.Fixed => "fixed",
VexStatus.UnderInvestigation => "under_investigation",
_ => throw new ArgumentException($"Unknown VEX status: {status}", nameof(status))
};
private static VexStatus ParseStatus(string status) => status switch
{
"not_affected" => VexStatus.NotAffected,
"affected" => VexStatus.Affected,
"fixed" => VexStatus.Fixed,
"under_investigation" => VexStatus.UnderInvestigation,
_ => throw new ArgumentException($"Unknown VEX status: {status}", nameof(status))
};
private static string JustificationToString(VexJustification justification) => justification switch
{
VexJustification.ComponentNotPresent => "component_not_present",
VexJustification.VulnerableCodeNotPresent => "vulnerable_code_not_present",
VexJustification.VulnerableCodeNotInExecutePath => "vulnerable_code_not_in_execute_path",
VexJustification.VulnerableCodeCannotBeControlledByAdversary => "vulnerable_code_cannot_be_controlled_by_adversary",
VexJustification.InlineMitigationsAlreadyExist => "inline_mitigations_already_exist",
_ => throw new ArgumentException($"Unknown VEX justification: {justification}", nameof(justification))
};
private static VexJustification? ParseJustification(string? justification) => justification switch
{
null => null,
"component_not_present" => VexJustification.ComponentNotPresent,
"vulnerable_code_not_present" => VexJustification.VulnerableCodeNotPresent,
"vulnerable_code_not_in_execute_path" => VexJustification.VulnerableCodeNotInExecutePath,
"vulnerable_code_cannot_be_controlled_by_adversary" => VexJustification.VulnerableCodeCannotBeControlledByAdversary,
"inline_mitigations_already_exist" => VexJustification.InlineMitigationsAlreadyExist,
_ => throw new ArgumentException($"Unknown VEX justification: {justification}", nameof(justification))
};
}
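
For callers that already hold a set of statements in memory, the ordering `GetEffectiveStatementAsync` pushes into SQL can be mirrored with a short LINQ sketch (illustrative only, not part of this change):

```csharp
using StellaOps.Excititor.Storage.Postgres.Models;

public static class VexLattice
{
    // Same precedence the repository encodes in its CASE expression:
    // fixed > not_affected > affected > under_investigation.
    private static int Rank(VexStatus status) => status switch
    {
        VexStatus.Fixed => 1,
        VexStatus.NotAffected => 2,
        VexStatus.Affected => 3,
        VexStatus.UnderInvestigation => 4,
        _ => int.MaxValue
    };

    public static VexStatementEntity? PickEffective(IEnumerable<VexStatementEntity> statements) =>
        statements
            .OrderBy(s => Rank(s.Status))
            .ThenByDescending(s => s.LastUpdated)
            .FirstOrDefault();
}
```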

View File

@@ -0,0 +1,53 @@
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;
using StellaOps.Excititor.Storage.Postgres.Repositories;
using StellaOps.Infrastructure.Postgres;
using StellaOps.Infrastructure.Postgres.Options;
namespace StellaOps.Excititor.Storage.Postgres;
/// <summary>
/// Extension methods for configuring Excititor PostgreSQL storage services.
/// </summary>
public static class ServiceCollectionExtensions
{
/// <summary>
/// Adds Excititor PostgreSQL storage services.
/// </summary>
/// <param name="services">Service collection.</param>
/// <param name="configuration">Configuration root.</param>
/// <param name="sectionName">Configuration section name for PostgreSQL options.</param>
/// <returns>Service collection for chaining.</returns>
public static IServiceCollection AddExcititorPostgresStorage(
this IServiceCollection services,
IConfiguration configuration,
string sectionName = "Postgres:Excititor")
{
services.Configure<PostgresOptions>(configuration.GetSection(sectionName));
services.AddSingleton<ExcititorDataSource>();
// Register repositories
services.AddScoped<IVexStatementRepository, VexStatementRepository>();
return services;
}
/// <summary>
/// Adds Excititor PostgreSQL storage services with explicit options.
/// </summary>
/// <param name="services">Service collection.</param>
/// <param name="configureOptions">Options configuration action.</param>
/// <returns>Service collection for chaining.</returns>
public static IServiceCollection AddExcititorPostgresStorage(
this IServiceCollection services,
Action<PostgresOptions> configureOptions)
{
services.Configure(configureOptions);
services.AddSingleton<ExcititorDataSource>();
// Register repositories
services.AddScoped<IVexStatementRepository, VexStatementRepository>();
return services;
}
}

View File

@@ -0,0 +1,21 @@
<?xml version="1.0" ?>
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<TargetFramework>net10.0</TargetFramework>
<ImplicitUsings>enable</ImplicitUsings>
<Nullable>enable</Nullable>
<LangVersion>preview</LangVersion>
<TreatWarningsAsErrors>true</TreatWarningsAsErrors>
<RootNamespace>StellaOps.Excititor.Storage.Postgres</RootNamespace>
</PropertyGroup>
<ItemGroup>
<None Include="Migrations\**\*.sql" CopyToOutputDirectory="PreserveNewest" />
</ItemGroup>
<ItemGroup>
<ProjectReference Include="..\..\..\__Libraries\StellaOps.Infrastructure.Postgres\StellaOps.Infrastructure.Postgres.csproj" />
</ItemGroup>
</Project>

View File

@@ -0,0 +1,326 @@
-- Notify Schema Migration 001: Initial Schema
-- Creates the notify schema for notifications, channels, and delivery tracking
-- Create schema
CREATE SCHEMA IF NOT EXISTS notify;
-- Channel types
DO $$ BEGIN
CREATE TYPE notify.channel_type AS ENUM (
'email', 'slack', 'teams', 'webhook', 'pagerduty', 'opsgenie'
);
EXCEPTION
WHEN duplicate_object THEN null;
END $$;
-- Delivery status
DO $$ BEGIN
CREATE TYPE notify.delivery_status AS ENUM (
'pending', 'queued', 'sending', 'sent', 'delivered', 'failed', 'bounced'
);
EXCEPTION
WHEN duplicate_object THEN null;
END $$;
-- Channels table
CREATE TABLE IF NOT EXISTS notify.channels (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id TEXT NOT NULL,
name TEXT NOT NULL,
channel_type notify.channel_type NOT NULL,
enabled BOOLEAN NOT NULL DEFAULT TRUE,
config JSONB NOT NULL DEFAULT '{}',
credentials JSONB,
metadata JSONB NOT NULL DEFAULT '{}',
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
created_by TEXT,
UNIQUE(tenant_id, name)
);
CREATE INDEX idx_channels_tenant ON notify.channels(tenant_id);
CREATE INDEX idx_channels_type ON notify.channels(tenant_id, channel_type);
-- Rules table (notification routing rules)
CREATE TABLE IF NOT EXISTS notify.rules (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id TEXT NOT NULL,
name TEXT NOT NULL,
description TEXT,
enabled BOOLEAN NOT NULL DEFAULT TRUE,
priority INT NOT NULL DEFAULT 0,
event_types TEXT[] NOT NULL DEFAULT '{}',
filter JSONB NOT NULL DEFAULT '{}',
channel_ids UUID[] NOT NULL DEFAULT '{}',
template_id UUID,
metadata JSONB NOT NULL DEFAULT '{}',
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
UNIQUE(tenant_id, name)
);
CREATE INDEX idx_rules_tenant ON notify.rules(tenant_id);
CREATE INDEX idx_rules_enabled ON notify.rules(tenant_id, enabled, priority DESC);
-- Templates table
CREATE TABLE IF NOT EXISTS notify.templates (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id TEXT NOT NULL,
name TEXT NOT NULL,
channel_type notify.channel_type NOT NULL,
subject_template TEXT,
body_template TEXT NOT NULL,
locale TEXT NOT NULL DEFAULT 'en',
metadata JSONB NOT NULL DEFAULT '{}',
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
UNIQUE(tenant_id, name, channel_type, locale)
);
CREATE INDEX idx_templates_tenant ON notify.templates(tenant_id);
-- Deliveries table
CREATE TABLE IF NOT EXISTS notify.deliveries (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id TEXT NOT NULL,
channel_id UUID NOT NULL REFERENCES notify.channels(id),
rule_id UUID REFERENCES notify.rules(id),
template_id UUID REFERENCES notify.templates(id),
status notify.delivery_status NOT NULL DEFAULT 'pending',
recipient TEXT NOT NULL,
subject TEXT,
body TEXT,
event_type TEXT NOT NULL,
event_payload JSONB NOT NULL DEFAULT '{}',
attempt INT NOT NULL DEFAULT 0,
max_attempts INT NOT NULL DEFAULT 3,
next_retry_at TIMESTAMPTZ,
error_message TEXT,
external_id TEXT,
correlation_id TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
queued_at TIMESTAMPTZ,
sent_at TIMESTAMPTZ,
delivered_at TIMESTAMPTZ,
failed_at TIMESTAMPTZ
);
CREATE INDEX idx_deliveries_tenant ON notify.deliveries(tenant_id);
CREATE INDEX idx_deliveries_status ON notify.deliveries(tenant_id, status);
CREATE INDEX idx_deliveries_pending ON notify.deliveries(status, next_retry_at)
WHERE status IN ('pending', 'queued');
CREATE INDEX idx_deliveries_channel ON notify.deliveries(channel_id);
CREATE INDEX idx_deliveries_correlation ON notify.deliveries(correlation_id);
CREATE INDEX idx_deliveries_created ON notify.deliveries(tenant_id, created_at);
-- Digests table (aggregated notifications)
CREATE TABLE IF NOT EXISTS notify.digests (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id TEXT NOT NULL,
channel_id UUID NOT NULL REFERENCES notify.channels(id),
recipient TEXT NOT NULL,
digest_key TEXT NOT NULL,
event_count INT NOT NULL DEFAULT 0,
events JSONB NOT NULL DEFAULT '[]',
status TEXT NOT NULL DEFAULT 'collecting' CHECK (status IN ('collecting', 'sending', 'sent')),
collect_until TIMESTAMPTZ NOT NULL,
sent_at TIMESTAMPTZ,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
UNIQUE(tenant_id, channel_id, recipient, digest_key)
);
CREATE INDEX idx_digests_tenant ON notify.digests(tenant_id);
CREATE INDEX idx_digests_collect ON notify.digests(status, collect_until)
WHERE status = 'collecting';
-- Quiet hours table
CREATE TABLE IF NOT EXISTS notify.quiet_hours (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id TEXT NOT NULL,
user_id UUID,
channel_id UUID REFERENCES notify.channels(id),
start_time TIME NOT NULL,
end_time TIME NOT NULL,
timezone TEXT NOT NULL DEFAULT 'UTC',
days_of_week INT[] NOT NULL DEFAULT '{0,1,2,3,4,5,6}',
enabled BOOLEAN NOT NULL DEFAULT TRUE,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_quiet_hours_tenant ON notify.quiet_hours(tenant_id);
-- Maintenance windows table
CREATE TABLE IF NOT EXISTS notify.maintenance_windows (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id TEXT NOT NULL,
name TEXT NOT NULL,
description TEXT,
start_at TIMESTAMPTZ NOT NULL,
end_at TIMESTAMPTZ NOT NULL,
suppress_channels UUID[],
suppress_event_types TEXT[],
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
created_by TEXT,
UNIQUE(tenant_id, name)
);
CREATE INDEX idx_maintenance_windows_tenant ON notify.maintenance_windows(tenant_id);
CREATE INDEX idx_maintenance_windows_active ON notify.maintenance_windows(start_at, end_at);
-- Escalation policies table
CREATE TABLE IF NOT EXISTS notify.escalation_policies (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id TEXT NOT NULL,
name TEXT NOT NULL,
description TEXT,
enabled BOOLEAN NOT NULL DEFAULT TRUE,
steps JSONB NOT NULL DEFAULT '[]',
repeat_count INT NOT NULL DEFAULT 0,
metadata JSONB NOT NULL DEFAULT '{}',
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
UNIQUE(tenant_id, name)
);
CREATE INDEX idx_escalation_policies_tenant ON notify.escalation_policies(tenant_id);
-- Escalation states table
CREATE TABLE IF NOT EXISTS notify.escalation_states (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id TEXT NOT NULL,
policy_id UUID NOT NULL REFERENCES notify.escalation_policies(id),
incident_id UUID,
correlation_id TEXT NOT NULL,
current_step INT NOT NULL DEFAULT 0,
repeat_iteration INT NOT NULL DEFAULT 0,
status TEXT NOT NULL DEFAULT 'active' CHECK (status IN ('active', 'acknowledged', 'resolved', 'expired')),
started_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
next_escalation_at TIMESTAMPTZ,
acknowledged_at TIMESTAMPTZ,
acknowledged_by TEXT,
resolved_at TIMESTAMPTZ,
resolved_by TEXT,
metadata JSONB NOT NULL DEFAULT '{}'
);
CREATE INDEX idx_escalation_states_tenant ON notify.escalation_states(tenant_id);
CREATE INDEX idx_escalation_states_active ON notify.escalation_states(status, next_escalation_at)
WHERE status = 'active';
CREATE INDEX idx_escalation_states_correlation ON notify.escalation_states(correlation_id);
-- On-call schedules table
CREATE TABLE IF NOT EXISTS notify.on_call_schedules (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id TEXT NOT NULL,
name TEXT NOT NULL,
description TEXT,
timezone TEXT NOT NULL DEFAULT 'UTC',
rotation_type TEXT NOT NULL DEFAULT 'weekly' CHECK (rotation_type IN ('daily', 'weekly', 'custom')),
participants JSONB NOT NULL DEFAULT '[]',
overrides JSONB NOT NULL DEFAULT '[]',
metadata JSONB NOT NULL DEFAULT '{}',
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
UNIQUE(tenant_id, name)
);
CREATE INDEX idx_on_call_schedules_tenant ON notify.on_call_schedules(tenant_id);
-- Inbox table (in-app notifications)
CREATE TABLE IF NOT EXISTS notify.inbox (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id TEXT NOT NULL,
user_id UUID NOT NULL,
title TEXT NOT NULL,
body TEXT,
event_type TEXT NOT NULL,
event_payload JSONB NOT NULL DEFAULT '{}',
read BOOLEAN NOT NULL DEFAULT FALSE,
archived BOOLEAN NOT NULL DEFAULT FALSE,
action_url TEXT,
correlation_id TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
read_at TIMESTAMPTZ,
archived_at TIMESTAMPTZ
);
CREATE INDEX idx_inbox_tenant_user ON notify.inbox(tenant_id, user_id);
CREATE INDEX idx_inbox_unread ON notify.inbox(tenant_id, user_id, read, created_at DESC)
WHERE read = FALSE AND archived = FALSE;
-- Incidents table
CREATE TABLE IF NOT EXISTS notify.incidents (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id TEXT NOT NULL,
title TEXT NOT NULL,
description TEXT,
severity TEXT NOT NULL DEFAULT 'medium' CHECK (severity IN ('critical', 'high', 'medium', 'low')),
status TEXT NOT NULL DEFAULT 'open' CHECK (status IN ('open', 'acknowledged', 'resolved', 'closed')),
source TEXT,
correlation_id TEXT,
assigned_to UUID,
escalation_policy_id UUID REFERENCES notify.escalation_policies(id),
metadata JSONB NOT NULL DEFAULT '{}',
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
acknowledged_at TIMESTAMPTZ,
resolved_at TIMESTAMPTZ,
closed_at TIMESTAMPTZ,
created_by TEXT
);
CREATE INDEX idx_incidents_tenant ON notify.incidents(tenant_id);
CREATE INDEX idx_incidents_status ON notify.incidents(tenant_id, status);
CREATE INDEX idx_incidents_severity ON notify.incidents(tenant_id, severity);
CREATE INDEX idx_incidents_correlation ON notify.incidents(correlation_id);
-- Audit log table
CREATE TABLE IF NOT EXISTS notify.audit (
id BIGSERIAL PRIMARY KEY,
tenant_id TEXT NOT NULL,
user_id UUID,
action TEXT NOT NULL,
resource_type TEXT NOT NULL,
resource_id TEXT,
details JSONB,
correlation_id TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_audit_tenant ON notify.audit(tenant_id);
CREATE INDEX idx_audit_created ON notify.audit(tenant_id, created_at);
-- Update timestamp function
CREATE OR REPLACE FUNCTION notify.update_updated_at()
RETURNS TRIGGER AS $$
BEGIN
NEW.updated_at = NOW();
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
-- Triggers
CREATE TRIGGER trg_channels_updated_at
BEFORE UPDATE ON notify.channels
FOR EACH ROW EXECUTE FUNCTION notify.update_updated_at();
CREATE TRIGGER trg_rules_updated_at
BEFORE UPDATE ON notify.rules
FOR EACH ROW EXECUTE FUNCTION notify.update_updated_at();
CREATE TRIGGER trg_templates_updated_at
BEFORE UPDATE ON notify.templates
FOR EACH ROW EXECUTE FUNCTION notify.update_updated_at();
CREATE TRIGGER trg_digests_updated_at
BEFORE UPDATE ON notify.digests
FOR EACH ROW EXECUTE FUNCTION notify.update_updated_at();
CREATE TRIGGER trg_escalation_policies_updated_at
BEFORE UPDATE ON notify.escalation_policies
FOR EACH ROW EXECUTE FUNCTION notify.update_updated_at();
CREATE TRIGGER trg_on_call_schedules_updated_at
BEFORE UPDATE ON notify.on_call_schedules
FOR EACH ROW EXECUTE FUNCTION notify.update_updated_at();
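
A hypothetical dispatcher sketch (not part of this change) showing how the partial index `idx_deliveries_pending` could back a polling worker; the claim query uses `FOR UPDATE SKIP LOCKED` so concurrent workers do not pick the same rows. The open connection and batch size are assumptions.

```csharp
using Npgsql;

public static class DeliveryDispatcher
{
    private const string ClaimSql = """
        UPDATE notify.deliveries d
        SET status = 'sending', attempt = d.attempt + 1
        WHERE d.id IN (
            SELECT id
            FROM notify.deliveries
            WHERE status IN ('pending', 'queued')
              AND (next_retry_at IS NULL OR next_retry_at <= NOW())
            ORDER BY created_at
            LIMIT @batch
            FOR UPDATE SKIP LOCKED
        )
        RETURNING d.id
        """;

    public static async Task<IReadOnlyList<Guid>> ClaimBatchAsync(
        NpgsqlConnection connection,
        int batchSize,
        CancellationToken ct)
    {
        await using var command = new NpgsqlCommand(ClaimSql, connection);
        command.Parameters.AddWithValue("batch", batchSize);

        var claimed = new List<Guid>();
        await using var reader = await command.ExecuteReaderAsync(ct);
        while (await reader.ReadAsync(ct))
        {
            claimed.Add(reader.GetGuid(0));
        }

        return claimed;
    }
}
```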

View File

@@ -0,0 +1,81 @@
namespace StellaOps.Notify.Storage.Postgres.Models;
/// <summary>
/// Channel types for notifications.
/// </summary>
public enum ChannelType
{
/// <summary>Email channel.</summary>
Email,
/// <summary>Slack channel.</summary>
Slack,
/// <summary>Microsoft Teams channel.</summary>
Teams,
/// <summary>Generic webhook channel.</summary>
Webhook,
/// <summary>PagerDuty integration.</summary>
PagerDuty,
/// <summary>OpsGenie integration.</summary>
OpsGenie
}
/// <summary>
/// Represents a notification channel entity.
/// </summary>
public sealed class ChannelEntity
{
/// <summary>
/// Unique channel identifier.
/// </summary>
public required Guid Id { get; init; }
/// <summary>
/// Tenant this channel belongs to.
/// </summary>
public required string TenantId { get; init; }
/// <summary>
/// Channel name (unique per tenant).
/// </summary>
public required string Name { get; init; }
/// <summary>
/// Type of channel.
/// </summary>
public required ChannelType ChannelType { get; init; }
/// <summary>
/// Channel is enabled.
/// </summary>
public bool Enabled { get; init; } = true;
/// <summary>
/// Channel configuration as JSON.
/// </summary>
public string Config { get; init; } = "{}";
/// <summary>
/// Channel credentials as JSON (encrypted).
/// </summary>
public string? Credentials { get; init; }
/// <summary>
/// Channel metadata as JSON.
/// </summary>
public string Metadata { get; init; } = "{}";
/// <summary>
/// When the channel was created.
/// </summary>
public DateTimeOffset CreatedAt { get; init; }
/// <summary>
/// When the channel was last updated.
/// </summary>
public DateTimeOffset UpdatedAt { get; init; }
/// <summary>
/// User who created the channel.
/// </summary>
public string? CreatedBy { get; init; }
}

View File

@@ -0,0 +1,138 @@
namespace StellaOps.Notify.Storage.Postgres.Models;
/// <summary>
/// Delivery status values.
/// </summary>
public enum DeliveryStatus
{
/// <summary>Delivery is pending.</summary>
Pending,
/// <summary>Delivery is queued for sending.</summary>
Queued,
/// <summary>Delivery is being sent.</summary>
Sending,
/// <summary>Delivery was sent.</summary>
Sent,
/// <summary>Delivery was confirmed delivered.</summary>
Delivered,
/// <summary>Delivery failed.</summary>
Failed,
/// <summary>Delivery bounced.</summary>
Bounced
}
/// <summary>
/// Represents a notification delivery entity.
/// </summary>
public sealed class DeliveryEntity
{
/// <summary>
/// Unique delivery identifier.
/// </summary>
public required Guid Id { get; init; }
/// <summary>
/// Tenant this delivery belongs to.
/// </summary>
public required string TenantId { get; init; }
/// <summary>
/// Channel used for this delivery.
/// </summary>
public required Guid ChannelId { get; init; }
/// <summary>
/// Rule that triggered this delivery.
/// </summary>
public Guid? RuleId { get; init; }
/// <summary>
/// Template used for this delivery.
/// </summary>
public Guid? TemplateId { get; init; }
/// <summary>
/// Current delivery status.
/// </summary>
public DeliveryStatus Status { get; init; } = DeliveryStatus.Pending;
/// <summary>
/// Recipient address/identifier.
/// </summary>
public required string Recipient { get; init; }
/// <summary>
/// Notification subject.
/// </summary>
public string? Subject { get; init; }
/// <summary>
/// Notification body.
/// </summary>
public string? Body { get; init; }
/// <summary>
/// Event type that triggered this notification.
/// </summary>
public required string EventType { get; init; }
/// <summary>
/// Event payload as JSON.
/// </summary>
public string EventPayload { get; init; } = "{}";
/// <summary>
/// Current attempt number.
/// </summary>
public int Attempt { get; init; }
/// <summary>
/// Maximum number of attempts.
/// </summary>
public int MaxAttempts { get; init; } = 3;
/// <summary>
/// Next retry time.
/// </summary>
public DateTimeOffset? NextRetryAt { get; init; }
/// <summary>
/// Error message if failed.
/// </summary>
public string? ErrorMessage { get; init; }
/// <summary>
/// External ID from the channel provider.
/// </summary>
public string? ExternalId { get; init; }
/// <summary>
/// Correlation ID for tracing.
/// </summary>
public string? CorrelationId { get; init; }
/// <summary>
/// When the delivery was created.
/// </summary>
public DateTimeOffset CreatedAt { get; init; }
/// <summary>
/// When the delivery was queued.
/// </summary>
public DateTimeOffset? QueuedAt { get; init; }
/// <summary>
/// When the delivery was sent.
/// </summary>
public DateTimeOffset? SentAt { get; init; }
/// <summary>
/// When the delivery was confirmed delivered.
/// </summary>
public DateTimeOffset? DeliveredAt { get; init; }
/// <summary>
/// When the delivery failed.
/// </summary>
public DateTimeOffset? FailedAt { get; init; }
}
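
A retry-scheduling sketch (illustrative only, not part of this change) showing how `Attempt`, `MaxAttempts`, and `NextRetryAt` are meant to interact; the exponential-backoff policy itself is an assumption:

```csharp
using StellaOps.Notify.Storage.Postgres.Models;

public static class DeliveryRetryPolicy
{
    /// <summary>
    /// Returns the next retry time for a failed send, or null when attempts are exhausted
    /// and the delivery should transition to the 'failed' status.
    /// </summary>
    public static DateTimeOffset? NextRetry(DeliveryEntity delivery, DateTimeOffset now)
    {
        var nextAttempt = delivery.Attempt + 1;
        if (nextAttempt >= delivery.MaxAttempts)
        {
            return null;
        }

        // 1, 2, 4, ... minute backoff (assumed policy, not defined by the schema).
        var delay = TimeSpan.FromMinutes(Math.Pow(2, delivery.Attempt));
        return now + delay;
    }
}
```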

View File

@@ -0,0 +1,38 @@
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Options;
using StellaOps.Infrastructure.Postgres.Connections;
using StellaOps.Infrastructure.Postgres.Options;
namespace StellaOps.Notify.Storage.Postgres;
/// <summary>
/// PostgreSQL data source for the Notify module.
/// Manages connections with tenant context for notifications and delivery tracking.
/// </summary>
public sealed class NotifyDataSource : DataSourceBase
{
/// <summary>
/// Default schema name for Notify tables.
/// </summary>
public const string DefaultSchemaName = "notify";
/// <summary>
/// Creates a new Notify data source.
/// </summary>
public NotifyDataSource(IOptions<PostgresOptions> options, ILogger<NotifyDataSource> logger)
: base(CreateOptions(options.Value), logger)
{
}
/// <inheritdoc />
protected override string ModuleName => "Notify";
private static PostgresOptions CreateOptions(PostgresOptions baseOptions)
{
if (string.IsNullOrWhiteSpace(baseOptions.SchemaName))
{
baseOptions.SchemaName = DefaultSchemaName;
}
return baseOptions;
}
}

View File

@@ -0,0 +1,264 @@
using Microsoft.Extensions.Logging;
using Npgsql;
using StellaOps.Infrastructure.Postgres.Repositories;
using StellaOps.Notify.Storage.Postgres.Models;
namespace StellaOps.Notify.Storage.Postgres.Repositories;
/// <summary>
/// PostgreSQL repository for notification channel operations.
/// </summary>
public sealed class ChannelRepository : RepositoryBase<NotifyDataSource>, IChannelRepository
{
/// <summary>
/// Creates a new channel repository.
/// </summary>
public ChannelRepository(NotifyDataSource dataSource, ILogger<ChannelRepository> logger)
: base(dataSource, logger)
{
}
/// <inheritdoc />
public async Task<ChannelEntity> CreateAsync(ChannelEntity channel, CancellationToken cancellationToken = default)
{
const string sql = """
INSERT INTO notify.channels (
id, tenant_id, name, channel_type, enabled, config, credentials, metadata, created_by
)
VALUES (
@id, @tenant_id, @name, @channel_type::notify.channel_type, @enabled,
@config::jsonb, @credentials::jsonb, @metadata::jsonb, @created_by
)
RETURNING id, tenant_id, name, channel_type::text, enabled,
config::text, credentials::text, metadata::text, created_at, updated_at, created_by
""";
await using var connection = await DataSource.OpenConnectionAsync(channel.TenantId, "writer", cancellationToken)
.ConfigureAwait(false);
await using var command = CreateCommand(sql, connection);
AddParameter(command, "id", channel.Id);
AddParameter(command, "tenant_id", channel.TenantId);
AddParameter(command, "name", channel.Name);
AddParameter(command, "channel_type", ChannelTypeToString(channel.ChannelType));
AddParameter(command, "enabled", channel.Enabled);
AddJsonbParameter(command, "config", channel.Config);
AddJsonbParameter(command, "credentials", channel.Credentials);
AddJsonbParameter(command, "metadata", channel.Metadata);
AddParameter(command, "created_by", channel.CreatedBy);
await using var reader = await command.ExecuteReaderAsync(cancellationToken).ConfigureAwait(false);
await reader.ReadAsync(cancellationToken).ConfigureAwait(false);
return MapChannel(reader);
}
/// <inheritdoc />
public async Task<ChannelEntity?> GetByIdAsync(string tenantId, Guid id, CancellationToken cancellationToken = default)
{
const string sql = """
SELECT id, tenant_id, name, channel_type::text, enabled,
config::text, credentials::text, metadata::text, created_at, updated_at, created_by
FROM notify.channels
WHERE tenant_id = @tenant_id AND id = @id
""";
return await QuerySingleOrDefaultAsync(
tenantId,
sql,
cmd =>
{
AddParameter(cmd, "tenant_id", tenantId);
AddParameter(cmd, "id", id);
},
MapChannel,
cancellationToken).ConfigureAwait(false);
}
/// <inheritdoc />
public async Task<ChannelEntity?> GetByNameAsync(string tenantId, string name, CancellationToken cancellationToken = default)
{
const string sql = """
SELECT id, tenant_id, name, channel_type::text, enabled,
config::text, credentials::text, metadata::text, created_at, updated_at, created_by
FROM notify.channels
WHERE tenant_id = @tenant_id AND name = @name
""";
return await QuerySingleOrDefaultAsync(
tenantId,
sql,
cmd =>
{
AddParameter(cmd, "tenant_id", tenantId);
AddParameter(cmd, "name", name);
},
MapChannel,
cancellationToken).ConfigureAwait(false);
}
/// <inheritdoc />
public async Task<IReadOnlyList<ChannelEntity>> GetAllAsync(
string tenantId,
bool? enabled = null,
ChannelType? channelType = null,
int limit = 100,
int offset = 0,
CancellationToken cancellationToken = default)
{
var sql = """
SELECT id, tenant_id, name, channel_type::text, enabled,
config::text, credentials::text, metadata::text, created_at, updated_at, created_by
FROM notify.channels
WHERE tenant_id = @tenant_id
""";
if (enabled.HasValue)
{
sql += " AND enabled = @enabled";
}
if (channelType.HasValue)
{
sql += " AND channel_type = @channel_type::notify.channel_type";
}
sql += " ORDER BY name, id LIMIT @limit OFFSET @offset";
return await QueryAsync(
tenantId,
sql,
cmd =>
{
AddParameter(cmd, "tenant_id", tenantId);
if (enabled.HasValue)
{
AddParameter(cmd, "enabled", enabled.Value);
}
if (channelType.HasValue)
{
AddParameter(cmd, "channel_type", ChannelTypeToString(channelType.Value));
}
AddParameter(cmd, "limit", limit);
AddParameter(cmd, "offset", offset);
},
MapChannel,
cancellationToken).ConfigureAwait(false);
}
/// <inheritdoc />
public async Task<bool> UpdateAsync(ChannelEntity channel, CancellationToken cancellationToken = default)
{
const string sql = """
UPDATE notify.channels
SET name = @name,
channel_type = @channel_type::notify.channel_type,
enabled = @enabled,
config = @config::jsonb,
credentials = @credentials::jsonb,
metadata = @metadata::jsonb
WHERE tenant_id = @tenant_id AND id = @id
""";
var rows = await ExecuteAsync(
channel.TenantId,
sql,
cmd =>
{
AddParameter(cmd, "tenant_id", channel.TenantId);
AddParameter(cmd, "id", channel.Id);
AddParameter(cmd, "name", channel.Name);
AddParameter(cmd, "channel_type", ChannelTypeToString(channel.ChannelType));
AddParameter(cmd, "enabled", channel.Enabled);
AddJsonbParameter(cmd, "config", channel.Config);
AddJsonbParameter(cmd, "credentials", channel.Credentials);
AddJsonbParameter(cmd, "metadata", channel.Metadata);
},
cancellationToken).ConfigureAwait(false);
return rows > 0;
}
/// <inheritdoc />
public async Task<bool> DeleteAsync(string tenantId, Guid id, CancellationToken cancellationToken = default)
{
const string sql = "DELETE FROM notify.channels WHERE tenant_id = @tenant_id AND id = @id";
var rows = await ExecuteAsync(
tenantId,
sql,
cmd =>
{
AddParameter(cmd, "tenant_id", tenantId);
AddParameter(cmd, "id", id);
},
cancellationToken).ConfigureAwait(false);
return rows > 0;
}
/// <inheritdoc />
public async Task<IReadOnlyList<ChannelEntity>> GetEnabledByTypeAsync(
string tenantId,
ChannelType channelType,
CancellationToken cancellationToken = default)
{
const string sql = """
SELECT id, tenant_id, name, channel_type::text, enabled,
config::text, credentials::text, metadata::text, created_at, updated_at, created_by
FROM notify.channels
WHERE tenant_id = @tenant_id
AND channel_type = @channel_type::notify.channel_type
AND enabled = TRUE
ORDER BY name, id
""";
return await QueryAsync(
tenantId,
sql,
cmd =>
{
AddParameter(cmd, "tenant_id", tenantId);
AddParameter(cmd, "channel_type", ChannelTypeToString(channelType));
},
MapChannel,
cancellationToken).ConfigureAwait(false);
}
private static ChannelEntity MapChannel(NpgsqlDataReader reader) => new()
{
Id = reader.GetGuid(0),
TenantId = reader.GetString(1),
Name = reader.GetString(2),
ChannelType = ParseChannelType(reader.GetString(3)),
Enabled = reader.GetBoolean(4),
Config = reader.GetString(5),
Credentials = GetNullableString(reader, 6),
Metadata = reader.GetString(7),
CreatedAt = reader.GetFieldValue<DateTimeOffset>(8),
UpdatedAt = reader.GetFieldValue<DateTimeOffset>(9),
CreatedBy = GetNullableString(reader, 10)
};
private static string ChannelTypeToString(ChannelType channelType) => channelType switch
{
ChannelType.Email => "email",
ChannelType.Slack => "slack",
ChannelType.Teams => "teams",
ChannelType.Webhook => "webhook",
ChannelType.PagerDuty => "pagerduty",
ChannelType.OpsGenie => "opsgenie",
_ => throw new ArgumentException($"Unknown channel type: {channelType}", nameof(channelType))
};
private static ChannelType ParseChannelType(string channelType) => channelType switch
{
"email" => ChannelType.Email,
"slack" => ChannelType.Slack,
"teams" => ChannelType.Teams,
"webhook" => ChannelType.Webhook,
"pagerduty" => ChannelType.PagerDuty,
"opsgenie" => ChannelType.OpsGenie,
_ => throw new ArgumentException($"Unknown channel type: {channelType}", nameof(channelType))
};
}
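
A minimal usage sketch for the repository above: resolving the enabled channels of one type before dispatching. The `SlackDispatcher` class, the tenant id, and the idea of parsing `Config` for a webhook URL are illustrative assumptions; only the `IChannelRepository` call mirrors the code in this commit.

// Sketch (illustrative): fan out to every enabled Slack channel for a tenant.
public sealed class SlackDispatcher
{
    private readonly IChannelRepository _channels;

    public SlackDispatcher(IChannelRepository channels) => _channels = channels;

    public async Task DispatchAsync(string tenantId, CancellationToken ct)
    {
        IReadOnlyList<ChannelEntity> targets =
            await _channels.GetEnabledByTypeAsync(tenantId, ChannelType.Slack, ct);

        foreach (var channel in targets)
        {
            // channel.Config is a JSON document; extracting the webhook URL and
            // posting the message is left to the real dispatcher.
        }
    }
}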


@@ -0,0 +1,363 @@
using Microsoft.Extensions.Logging;
using Npgsql;
using StellaOps.Infrastructure.Postgres.Repositories;
using StellaOps.Notify.Storage.Postgres.Models;
namespace StellaOps.Notify.Storage.Postgres.Repositories;
/// <summary>
/// PostgreSQL repository for notification delivery operations.
/// </summary>
public sealed class DeliveryRepository : RepositoryBase<NotifyDataSource>, IDeliveryRepository
{
/// <summary>
/// Creates a new delivery repository.
/// </summary>
public DeliveryRepository(NotifyDataSource dataSource, ILogger<DeliveryRepository> logger)
: base(dataSource, logger)
{
}
/// <inheritdoc />
public async Task<DeliveryEntity> CreateAsync(DeliveryEntity delivery, CancellationToken cancellationToken = default)
{
const string sql = """
INSERT INTO notify.deliveries (
id, tenant_id, channel_id, rule_id, template_id, status, recipient,
subject, body, event_type, event_payload, max_attempts, correlation_id
)
VALUES (
@id, @tenant_id, @channel_id, @rule_id, @template_id, @status::notify.delivery_status, @recipient,
@subject, @body, @event_type, @event_payload::jsonb, @max_attempts, @correlation_id
)
RETURNING *
""";
await using var connection = await DataSource.OpenConnectionAsync(delivery.TenantId, "writer", cancellationToken)
.ConfigureAwait(false);
await using var command = CreateCommand(sql, connection);
AddDeliveryParameters(command, delivery);
await using var reader = await command.ExecuteReaderAsync(cancellationToken).ConfigureAwait(false);
await reader.ReadAsync(cancellationToken).ConfigureAwait(false);
return MapDelivery(reader);
}
/// <inheritdoc />
public async Task<DeliveryEntity?> GetByIdAsync(string tenantId, Guid id, CancellationToken cancellationToken = default)
{
const string sql = "SELECT * FROM notify.deliveries WHERE tenant_id = @tenant_id AND id = @id";
return await QuerySingleOrDefaultAsync(
tenantId,
sql,
cmd =>
{
AddParameter(cmd, "tenant_id", tenantId);
AddParameter(cmd, "id", id);
},
MapDelivery,
cancellationToken).ConfigureAwait(false);
}
/// <inheritdoc />
public async Task<IReadOnlyList<DeliveryEntity>> GetPendingAsync(
string tenantId,
int limit = 100,
CancellationToken cancellationToken = default)
{
const string sql = """
SELECT * FROM notify.deliveries
WHERE tenant_id = @tenant_id
AND status IN ('pending', 'queued')
AND (next_retry_at IS NULL OR next_retry_at <= NOW())
AND attempt < max_attempts
ORDER BY created_at, id
LIMIT @limit
""";
return await QueryAsync(
tenantId,
sql,
cmd =>
{
AddParameter(cmd, "tenant_id", tenantId);
AddParameter(cmd, "limit", limit);
},
MapDelivery,
cancellationToken).ConfigureAwait(false);
}
/// <inheritdoc />
public async Task<IReadOnlyList<DeliveryEntity>> GetByStatusAsync(
string tenantId,
DeliveryStatus status,
int limit = 100,
int offset = 0,
CancellationToken cancellationToken = default)
{
const string sql = """
SELECT * FROM notify.deliveries
WHERE tenant_id = @tenant_id AND status = @status::notify.delivery_status
ORDER BY created_at DESC, id
LIMIT @limit OFFSET @offset
""";
return await QueryAsync(
tenantId,
sql,
cmd =>
{
AddParameter(cmd, "tenant_id", tenantId);
AddParameter(cmd, "status", StatusToString(status));
AddParameter(cmd, "limit", limit);
AddParameter(cmd, "offset", offset);
},
MapDelivery,
cancellationToken).ConfigureAwait(false);
}
/// <inheritdoc />
public async Task<IReadOnlyList<DeliveryEntity>> GetByCorrelationIdAsync(
string tenantId,
string correlationId,
CancellationToken cancellationToken = default)
{
const string sql = """
SELECT * FROM notify.deliveries
WHERE tenant_id = @tenant_id AND correlation_id = @correlation_id
ORDER BY created_at, id
""";
return await QueryAsync(
tenantId,
sql,
cmd =>
{
AddParameter(cmd, "tenant_id", tenantId);
AddParameter(cmd, "correlation_id", correlationId);
},
MapDelivery,
cancellationToken).ConfigureAwait(false);
}
/// <inheritdoc />
public async Task<bool> MarkQueuedAsync(string tenantId, Guid id, CancellationToken cancellationToken = default)
{
const string sql = """
UPDATE notify.deliveries
SET status = 'queued'::notify.delivery_status,
queued_at = NOW()
WHERE tenant_id = @tenant_id AND id = @id AND status = 'pending'
""";
var rows = await ExecuteAsync(
tenantId,
sql,
cmd =>
{
AddParameter(cmd, "tenant_id", tenantId);
AddParameter(cmd, "id", id);
},
cancellationToken).ConfigureAwait(false);
return rows > 0;
}
/// <inheritdoc />
public async Task<bool> MarkSentAsync(string tenantId, Guid id, string? externalId = null, CancellationToken cancellationToken = default)
{
const string sql = """
UPDATE notify.deliveries
SET status = 'sent'::notify.delivery_status,
sent_at = NOW(),
external_id = COALESCE(@external_id, external_id)
WHERE tenant_id = @tenant_id AND id = @id AND status IN ('queued', 'sending')
""";
var rows = await ExecuteAsync(
tenantId,
sql,
cmd =>
{
AddParameter(cmd, "tenant_id", tenantId);
AddParameter(cmd, "id", id);
AddParameter(cmd, "external_id", externalId);
},
cancellationToken).ConfigureAwait(false);
return rows > 0;
}
/// <inheritdoc />
public async Task<bool> MarkDeliveredAsync(string tenantId, Guid id, CancellationToken cancellationToken = default)
{
const string sql = """
UPDATE notify.deliveries
SET status = 'delivered'::notify.delivery_status,
delivered_at = NOW()
WHERE tenant_id = @tenant_id AND id = @id AND status = 'sent'
""";
var rows = await ExecuteAsync(
tenantId,
sql,
cmd =>
{
AddParameter(cmd, "tenant_id", tenantId);
AddParameter(cmd, "id", id);
},
cancellationToken).ConfigureAwait(false);
return rows > 0;
}
/// <inheritdoc />
public async Task<bool> MarkFailedAsync(
string tenantId,
Guid id,
string errorMessage,
TimeSpan? retryDelay = null,
CancellationToken cancellationToken = default)
{
var sql = """
UPDATE notify.deliveries
SET status = CASE
WHEN attempt + 1 < max_attempts AND @retry_delay IS NOT NULL THEN 'pending'::notify.delivery_status
ELSE 'failed'::notify.delivery_status
END,
attempt = attempt + 1,
error_message = @error_message,
failed_at = CASE WHEN attempt + 1 >= max_attempts OR @retry_delay IS NULL THEN NOW() ELSE failed_at END,
next_retry_at = CASE
WHEN attempt + 1 < max_attempts AND @retry_delay IS NOT NULL THEN NOW() + @retry_delay
ELSE NULL
END
WHERE tenant_id = @tenant_id AND id = @id
""";
var rows = await ExecuteAsync(
tenantId,
sql,
cmd =>
{
AddParameter(cmd, "tenant_id", tenantId);
AddParameter(cmd, "id", id);
AddParameter(cmd, "error_message", errorMessage);
AddParameter(cmd, "retry_delay", retryDelay);
},
cancellationToken).ConfigureAwait(false);
return rows > 0;
}
/// <inheritdoc />
public async Task<DeliveryStats> GetStatsAsync(
string tenantId,
DateTimeOffset from,
DateTimeOffset to,
CancellationToken cancellationToken = default)
{
const string sql = """
SELECT
COUNT(*) as total,
COUNT(*) FILTER (WHERE status = 'pending') as pending,
COUNT(*) FILTER (WHERE status = 'sent') as sent,
COUNT(*) FILTER (WHERE status = 'delivered') as delivered,
COUNT(*) FILTER (WHERE status = 'failed') as failed,
COUNT(*) FILTER (WHERE status = 'bounced') as bounced
FROM notify.deliveries
WHERE tenant_id = @tenant_id
AND created_at >= @from
AND created_at < @to
""";
await using var connection = await DataSource.OpenConnectionAsync(tenantId, "reader", cancellationToken)
.ConfigureAwait(false);
await using var command = CreateCommand(sql, connection);
AddParameter(command, "tenant_id", tenantId);
AddParameter(command, "from", from);
AddParameter(command, "to", to);
await using var reader = await command.ExecuteReaderAsync(cancellationToken).ConfigureAwait(false);
await reader.ReadAsync(cancellationToken).ConfigureAwait(false);
return new DeliveryStats(
Total: reader.GetInt64(0),
Pending: reader.GetInt64(1),
Sent: reader.GetInt64(2),
Delivered: reader.GetInt64(3),
Failed: reader.GetInt64(4),
Bounced: reader.GetInt64(5));
}
private static void AddDeliveryParameters(NpgsqlCommand command, DeliveryEntity delivery)
{
AddParameter(command, "id", delivery.Id);
AddParameter(command, "tenant_id", delivery.TenantId);
AddParameter(command, "channel_id", delivery.ChannelId);
AddParameter(command, "rule_id", delivery.RuleId);
AddParameter(command, "template_id", delivery.TemplateId);
AddParameter(command, "status", StatusToString(delivery.Status));
AddParameter(command, "recipient", delivery.Recipient);
AddParameter(command, "subject", delivery.Subject);
AddParameter(command, "body", delivery.Body);
AddParameter(command, "event_type", delivery.EventType);
AddJsonbParameter(command, "event_payload", delivery.EventPayload);
AddParameter(command, "max_attempts", delivery.MaxAttempts);
AddParameter(command, "correlation_id", delivery.CorrelationId);
}
private static DeliveryEntity MapDelivery(NpgsqlDataReader reader) => new()
{
Id = reader.GetGuid(reader.GetOrdinal("id")),
TenantId = reader.GetString(reader.GetOrdinal("tenant_id")),
ChannelId = reader.GetGuid(reader.GetOrdinal("channel_id")),
RuleId = GetNullableGuid(reader, reader.GetOrdinal("rule_id")),
TemplateId = GetNullableGuid(reader, reader.GetOrdinal("template_id")),
Status = ParseStatus(reader.GetString(reader.GetOrdinal("status"))),
Recipient = reader.GetString(reader.GetOrdinal("recipient")),
Subject = GetNullableString(reader, reader.GetOrdinal("subject")),
Body = GetNullableString(reader, reader.GetOrdinal("body")),
EventType = reader.GetString(reader.GetOrdinal("event_type")),
EventPayload = reader.GetString(reader.GetOrdinal("event_payload")),
Attempt = reader.GetInt32(reader.GetOrdinal("attempt")),
MaxAttempts = reader.GetInt32(reader.GetOrdinal("max_attempts")),
NextRetryAt = GetNullableDateTimeOffset(reader, reader.GetOrdinal("next_retry_at")),
ErrorMessage = GetNullableString(reader, reader.GetOrdinal("error_message")),
ExternalId = GetNullableString(reader, reader.GetOrdinal("external_id")),
CorrelationId = GetNullableString(reader, reader.GetOrdinal("correlation_id")),
CreatedAt = reader.GetFieldValue<DateTimeOffset>(reader.GetOrdinal("created_at")),
QueuedAt = GetNullableDateTimeOffset(reader, reader.GetOrdinal("queued_at")),
SentAt = GetNullableDateTimeOffset(reader, reader.GetOrdinal("sent_at")),
DeliveredAt = GetNullableDateTimeOffset(reader, reader.GetOrdinal("delivered_at")),
FailedAt = GetNullableDateTimeOffset(reader, reader.GetOrdinal("failed_at"))
};
private static string StatusToString(DeliveryStatus status) => status switch
{
DeliveryStatus.Pending => "pending",
DeliveryStatus.Queued => "queued",
DeliveryStatus.Sending => "sending",
DeliveryStatus.Sent => "sent",
DeliveryStatus.Delivered => "delivered",
DeliveryStatus.Failed => "failed",
DeliveryStatus.Bounced => "bounced",
_ => throw new ArgumentException($"Unknown delivery status: {status}", nameof(status))
};
private static DeliveryStatus ParseStatus(string status) => status switch
{
"pending" => DeliveryStatus.Pending,
"queued" => DeliveryStatus.Queued,
"sending" => DeliveryStatus.Sending,
"sent" => DeliveryStatus.Sent,
"delivered" => DeliveryStatus.Delivered,
"failed" => DeliveryStatus.Failed,
"bounced" => DeliveryStatus.Bounced,
_ => throw new ArgumentException($"Unknown delivery status: {status}", nameof(status))
};
}
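
MarkFailedAsync above folds retry scheduling into a single UPDATE: the attempt counter is incremented, and the row either returns to 'pending' with a next_retry_at (when attempts remain and a delay is supplied) or becomes terminally 'failed'. A minimal caller sketch, assuming an exponential backoff policy that is purely illustrative (the 30-second base and doubling factor are not mandated anywhere in this commit):

// Sketch: report a send failure and let the repository decide retry vs. terminal failure.
static TimeSpan BackoffFor(int attempt) =>
    TimeSpan.FromSeconds(30 * Math.Pow(2, attempt));   // illustrative policy

static async Task HandleSendFailureAsync(
    IDeliveryRepository deliveries,
    DeliveryEntity delivery,
    Exception error,
    CancellationToken ct)
{
    await deliveries.MarkFailedAsync(
        delivery.TenantId,
        delivery.Id,
        errorMessage: error.Message,
        retryDelay: BackoffFor(delivery.Attempt),
        cancellationToken: ct);
}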


@@ -0,0 +1,53 @@
using StellaOps.Notify.Storage.Postgres.Models;
namespace StellaOps.Notify.Storage.Postgres.Repositories;
/// <summary>
/// Repository interface for notification channel operations.
/// </summary>
public interface IChannelRepository
{
/// <summary>
/// Creates a new channel.
/// </summary>
Task<ChannelEntity> CreateAsync(ChannelEntity channel, CancellationToken cancellationToken = default);
/// <summary>
/// Gets a channel by ID.
/// </summary>
Task<ChannelEntity?> GetByIdAsync(string tenantId, Guid id, CancellationToken cancellationToken = default);
/// <summary>
/// Gets a channel by name.
/// </summary>
Task<ChannelEntity?> GetByNameAsync(string tenantId, string name, CancellationToken cancellationToken = default);
/// <summary>
/// Gets all channels for a tenant.
/// </summary>
Task<IReadOnlyList<ChannelEntity>> GetAllAsync(
string tenantId,
bool? enabled = null,
ChannelType? channelType = null,
int limit = 100,
int offset = 0,
CancellationToken cancellationToken = default);
/// <summary>
/// Updates a channel.
/// </summary>
Task<bool> UpdateAsync(ChannelEntity channel, CancellationToken cancellationToken = default);
/// <summary>
/// Deletes a channel.
/// </summary>
Task<bool> DeleteAsync(string tenantId, Guid id, CancellationToken cancellationToken = default);
/// <summary>
/// Gets enabled channels by type.
/// </summary>
Task<IReadOnlyList<ChannelEntity>> GetEnabledByTypeAsync(
string tenantId,
ChannelType channelType,
CancellationToken cancellationToken = default);
}


@@ -0,0 +1,90 @@
using StellaOps.Notify.Storage.Postgres.Models;
namespace StellaOps.Notify.Storage.Postgres.Repositories;
/// <summary>
/// Repository interface for notification delivery operations.
/// </summary>
public interface IDeliveryRepository
{
/// <summary>
/// Creates a new delivery.
/// </summary>
Task<DeliveryEntity> CreateAsync(DeliveryEntity delivery, CancellationToken cancellationToken = default);
/// <summary>
/// Gets a delivery by ID.
/// </summary>
Task<DeliveryEntity?> GetByIdAsync(string tenantId, Guid id, CancellationToken cancellationToken = default);
/// <summary>
/// Gets pending deliveries ready to send.
/// </summary>
Task<IReadOnlyList<DeliveryEntity>> GetPendingAsync(
string tenantId,
int limit = 100,
CancellationToken cancellationToken = default);
/// <summary>
/// Gets deliveries by status.
/// </summary>
Task<IReadOnlyList<DeliveryEntity>> GetByStatusAsync(
string tenantId,
DeliveryStatus status,
int limit = 100,
int offset = 0,
CancellationToken cancellationToken = default);
/// <summary>
/// Gets deliveries by correlation ID.
/// </summary>
Task<IReadOnlyList<DeliveryEntity>> GetByCorrelationIdAsync(
string tenantId,
string correlationId,
CancellationToken cancellationToken = default);
/// <summary>
/// Marks a delivery as queued.
/// </summary>
Task<bool> MarkQueuedAsync(string tenantId, Guid id, CancellationToken cancellationToken = default);
/// <summary>
/// Marks a delivery as sent.
/// </summary>
Task<bool> MarkSentAsync(string tenantId, Guid id, string? externalId = null, CancellationToken cancellationToken = default);
/// <summary>
/// Marks a delivery as delivered.
/// </summary>
Task<bool> MarkDeliveredAsync(string tenantId, Guid id, CancellationToken cancellationToken = default);
/// <summary>
/// Marks a delivery as failed with retry scheduling.
/// </summary>
Task<bool> MarkFailedAsync(
string tenantId,
Guid id,
string errorMessage,
TimeSpan? retryDelay = null,
CancellationToken cancellationToken = default);
/// <summary>
/// Gets delivery statistics for a time range.
/// </summary>
Task<DeliveryStats> GetStatsAsync(
string tenantId,
DateTimeOffset from,
DateTimeOffset to,
CancellationToken cancellationToken = default);
}
/// <summary>
/// Delivery statistics.
/// </summary>
public sealed record DeliveryStats(
long Total,
long Pending,
long Sent,
long Delivered,
long Failed,
long Bounced);
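
A sketch of one polling pass built on this interface: claim pending rows, send them, and record the outcome. The `send` delegate stands in for a real channel client and is an assumption; the repository calls match the contract above.

// Sketch: process one batch of pending deliveries for a tenant.
static async Task ProcessPendingAsync(
    IDeliveryRepository deliveries,
    Func<DeliveryEntity, CancellationToken, Task<string?>> send,
    string tenantId,
    CancellationToken ct)
{
    var batch = await deliveries.GetPendingAsync(tenantId, 50, ct);

    foreach (var delivery in batch)
    {
        // MarkQueuedAsync only succeeds while the row is still 'pending', so a
        // false result means another worker already claimed this delivery.
        if (!await deliveries.MarkQueuedAsync(tenantId, delivery.Id, ct))
            continue;

        var externalId = await send(delivery, ct);
        await deliveries.MarkSentAsync(tenantId, delivery.Id, externalId, ct);
        // A provider callback would later drive MarkDeliveredAsync or MarkFailedAsync.
    }
}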


@@ -0,0 +1,55 @@
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;
using StellaOps.Infrastructure.Postgres;
using StellaOps.Infrastructure.Postgres.Options;
using StellaOps.Notify.Storage.Postgres.Repositories;
namespace StellaOps.Notify.Storage.Postgres;
/// <summary>
/// Extension methods for configuring Notify PostgreSQL storage services.
/// </summary>
public static class ServiceCollectionExtensions
{
/// <summary>
/// Adds Notify PostgreSQL storage services.
/// </summary>
/// <param name="services">Service collection.</param>
/// <param name="configuration">Configuration root.</param>
/// <param name="sectionName">Configuration section name for PostgreSQL options.</param>
/// <returns>Service collection for chaining.</returns>
public static IServiceCollection AddNotifyPostgresStorage(
this IServiceCollection services,
IConfiguration configuration,
string sectionName = "Postgres:Notify")
{
// Bind the section to the default (unnamed) options instance; NotifyDataSource
// resolves IOptions<PostgresOptions>, which reads the default options, not named ones.
services.Configure<PostgresOptions>(configuration.GetSection(sectionName));
services.AddSingleton<NotifyDataSource>();
// Register repositories
services.AddScoped<IChannelRepository, ChannelRepository>();
services.AddScoped<IDeliveryRepository, DeliveryRepository>();
return services;
}
/// <summary>
/// Adds Notify PostgreSQL storage services with explicit options.
/// </summary>
/// <param name="services">Service collection.</param>
/// <param name="configureOptions">Options configuration action.</param>
/// <returns>Service collection for chaining.</returns>
public static IServiceCollection AddNotifyPostgresStorage(
this IServiceCollection services,
Action<PostgresOptions> configureOptions)
{
services.Configure(configureOptions);
services.AddSingleton<NotifyDataSource>();
// Register repositories
services.AddScoped<IChannelRepository, ChannelRepository>();
services.AddScoped<IDeliveryRepository, DeliveryRepository>();
return services;
}
}
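
Host wiring, as a sketch: either bind the `Postgres:Notify` configuration section or configure options in code. The exact property set on `PostgresOptions` (for example a connection-string key) is assumed here rather than taken from this commit; `SchemaName` and `NotifyDataSource.DefaultSchemaName` are the only members the snippet relies on.

// Sketch: registering Notify PostgreSQL storage in an ASP.NET Core host.
var builder = WebApplication.CreateBuilder(args);

// Option 1: bind from configuration (defaults to the "Postgres:Notify" section).
builder.Services.AddNotifyPostgresStorage(builder.Configuration);

// Option 2: configure explicitly.
builder.Services.AddNotifyPostgresStorage(options =>
{
    options.SchemaName = NotifyDataSource.DefaultSchemaName;
    // options.ConnectionString = "...";  // assumed property name; supply from secrets
});

var app = builder.Build();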


@@ -0,0 +1,21 @@
<?xml version="1.0" ?>
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<TargetFramework>net10.0</TargetFramework>
<ImplicitUsings>enable</ImplicitUsings>
<Nullable>enable</Nullable>
<LangVersion>preview</LangVersion>
<TreatWarningsAsErrors>true</TreatWarningsAsErrors>
<RootNamespace>StellaOps.Notify.Storage.Postgres</RootNamespace>
</PropertyGroup>
<ItemGroup>
<None Include="Migrations\**\*.sql" CopyToOutputDirectory="PreserveNewest" />
</ItemGroup>
<ItemGroup>
<ProjectReference Include="..\..\..\__Libraries\StellaOps.Infrastructure.Postgres\StellaOps.Infrastructure.Postgres.csproj" />
</ItemGroup>
</Project>


@@ -0,0 +1,583 @@
using Microsoft.Extensions.Logging;
using StellaOps.Orchestrator.Core.Domain;
namespace StellaOps.Orchestrator.Core.Backfill;
/// <summary>
/// Configuration options for the backfill manager.
/// </summary>
public sealed record BackfillManagerOptions
{
/// <summary>
/// Maximum number of events allowed in a single backfill request.
/// </summary>
public long MaxEventsPerBackfill { get; init; } = 1_000_000;
/// <summary>
/// Maximum duration allowed for a backfill operation.
/// </summary>
public TimeSpan MaxBackfillDuration { get; init; } = TimeSpan.FromHours(24);
/// <summary>
/// Data retention period - backfills cannot extend beyond this.
/// </summary>
public TimeSpan RetentionPeriod { get; init; } = TimeSpan.FromDays(90);
/// <summary>
/// Default TTL for processed event records.
/// </summary>
public TimeSpan DefaultProcessedEventTtl { get; init; } = TimeSpan.FromDays(30);
/// <summary>
/// Number of sample event keys to include in previews.
/// </summary>
public int PreviewSampleSize { get; init; } = 10;
/// <summary>
/// Estimated events per second for duration estimation.
/// </summary>
public double EstimatedEventsPerSecond { get; init; } = 100;
}
/// <summary>
/// Coordinates backfill operations with safety validations.
/// </summary>
public interface IBackfillManager
{
/// <summary>
/// Creates a new backfill request with validation.
/// </summary>
Task<BackfillRequest> CreateRequestAsync(
string tenantId,
Guid? sourceId,
string? jobType,
DateTimeOffset windowStart,
DateTimeOffset windowEnd,
string reason,
string createdBy,
int batchSize = 100,
bool dryRun = false,
bool forceReprocess = false,
string? ticket = null,
TimeSpan? maxDuration = null,
CancellationToken cancellationToken = default);
/// <summary>
/// Validates a backfill request and runs safety checks.
/// </summary>
Task<BackfillRequest> ValidateRequestAsync(
string tenantId,
Guid backfillId,
string updatedBy,
CancellationToken cancellationToken = default);
/// <summary>
/// Generates a preview of what a backfill would process (dry-run).
/// </summary>
Task<BackfillPreview> PreviewAsync(
string tenantId,
Guid? sourceId,
string? jobType,
DateTimeOffset windowStart,
DateTimeOffset windowEnd,
int batchSize = 100,
CancellationToken cancellationToken = default);
/// <summary>
/// Starts execution of a validated backfill request.
/// </summary>
Task<BackfillRequest> StartAsync(
string tenantId,
Guid backfillId,
string updatedBy,
CancellationToken cancellationToken = default);
/// <summary>
/// Pauses a running backfill.
/// </summary>
Task<BackfillRequest> PauseAsync(
string tenantId,
Guid backfillId,
string updatedBy,
CancellationToken cancellationToken = default);
/// <summary>
/// Resumes a paused backfill.
/// </summary>
Task<BackfillRequest> ResumeAsync(
string tenantId,
Guid backfillId,
string updatedBy,
CancellationToken cancellationToken = default);
/// <summary>
/// Cancels a backfill request.
/// </summary>
Task<BackfillRequest> CancelAsync(
string tenantId,
Guid backfillId,
string updatedBy,
CancellationToken cancellationToken = default);
/// <summary>
/// Gets the current status of a backfill request.
/// </summary>
Task<BackfillRequest?> GetStatusAsync(
string tenantId,
Guid backfillId,
CancellationToken cancellationToken = default);
/// <summary>
/// Lists backfill requests with filters.
/// </summary>
Task<IReadOnlyList<BackfillRequest>> ListAsync(
string tenantId,
BackfillStatus? status = null,
Guid? sourceId = null,
string? jobType = null,
int limit = 50,
int offset = 0,
CancellationToken cancellationToken = default);
}
/// <summary>
/// Provides event counting for backfill estimation.
/// </summary>
public interface IBackfillEventCounter
{
/// <summary>
/// Estimates the number of events in a time window.
/// </summary>
Task<long> EstimateEventCountAsync(
string tenantId,
string scopeKey,
DateTimeOffset windowStart,
DateTimeOffset windowEnd,
CancellationToken cancellationToken);
/// <summary>
/// Gets sample event keys from a time window.
/// </summary>
Task<IReadOnlyList<string>> GetSampleEventKeysAsync(
string tenantId,
string scopeKey,
DateTimeOffset windowStart,
DateTimeOffset windowEnd,
int sampleSize,
CancellationToken cancellationToken);
}
/// <summary>
/// Validates backfill safety conditions.
/// </summary>
public interface IBackfillSafetyValidator
{
/// <summary>
/// Runs all safety validations for a backfill request.
/// </summary>
Task<BackfillSafetyChecks> ValidateAsync(
BackfillRequest request,
long estimatedEvents,
TimeSpan estimatedDuration,
CancellationToken cancellationToken);
}
/// <summary>
/// Default implementation of backfill safety validator.
/// </summary>
public sealed class DefaultBackfillSafetyValidator : IBackfillSafetyValidator
{
private readonly ISourceValidator _sourceValidator;
private readonly IOverlapChecker _overlapChecker;
private readonly BackfillManagerOptions _options;
public DefaultBackfillSafetyValidator(
ISourceValidator sourceValidator,
IOverlapChecker overlapChecker,
BackfillManagerOptions options)
{
_sourceValidator = sourceValidator;
_overlapChecker = overlapChecker;
_options = options;
}
public async Task<BackfillSafetyChecks> ValidateAsync(
BackfillRequest request,
long estimatedEvents,
TimeSpan estimatedDuration,
CancellationToken cancellationToken)
{
var warnings = new List<string>();
var errors = new List<string>();
// Check source exists
var sourceExists = true;
if (request.SourceId.HasValue)
{
sourceExists = await _sourceValidator.ExistsAsync(
request.TenantId, request.SourceId.Value, cancellationToken);
if (!sourceExists)
{
errors.Add($"Source {request.SourceId} not found.");
}
}
// Check for overlapping backfills
var hasOverlap = await _overlapChecker.HasOverlapAsync(
request.TenantId,
request.ScopeKey,
request.WindowStart,
request.WindowEnd,
request.BackfillId,
cancellationToken);
if (hasOverlap)
{
errors.Add("An active backfill already exists for this scope and time window.");
}
// Check retention period
var retentionLimit = DateTimeOffset.UtcNow - _options.RetentionPeriod;
var withinRetention = request.WindowStart >= retentionLimit;
if (!withinRetention)
{
errors.Add($"Window start {request.WindowStart:O} is beyond the retention period ({_options.RetentionPeriod.TotalDays} days).");
}
// Check event limit
var withinEventLimit = estimatedEvents <= _options.MaxEventsPerBackfill;
if (!withinEventLimit)
{
errors.Add($"Estimated {estimatedEvents:N0} events exceeds maximum allowed ({_options.MaxEventsPerBackfill:N0}).");
}
else if (estimatedEvents > _options.MaxEventsPerBackfill * 0.8)
{
warnings.Add($"Estimated {estimatedEvents:N0} events is approaching the maximum limit.");
}
// Check duration limit
var maxDuration = request.MaxDuration ?? _options.MaxBackfillDuration;
var withinDurationLimit = estimatedDuration <= maxDuration;
if (!withinDurationLimit)
{
errors.Add($"Estimated duration {estimatedDuration} exceeds maximum allowed ({maxDuration}).");
}
// Check quota availability (placeholder - always true for now)
var quotaAvailable = true;
// Add warnings for large backfills
if (request.WindowDuration > TimeSpan.FromDays(7))
{
warnings.Add("Large time window may take significant time to process.");
}
if (request.ForceReprocess)
{
warnings.Add("Force reprocess is enabled - events will be processed even if already seen.");
}
return new BackfillSafetyChecks(
SourceExists: sourceExists,
HasOverlappingBackfill: hasOverlap,
WithinRetention: withinRetention,
WithinEventLimit: withinEventLimit,
WithinDurationLimit: withinDurationLimit,
QuotaAvailable: quotaAvailable,
Warnings: warnings,
Errors: errors);
}
}
/// <summary>
/// Validates that a source exists.
/// </summary>
public interface ISourceValidator
{
/// <summary>
/// Checks if a source exists.
/// </summary>
Task<bool> ExistsAsync(string tenantId, Guid sourceId, CancellationToken cancellationToken);
}
/// <summary>
/// Checks for overlapping backfill operations.
/// </summary>
public interface IOverlapChecker
{
/// <summary>
/// Checks if there's an overlapping active backfill.
/// </summary>
Task<bool> HasOverlapAsync(
string tenantId,
string scopeKey,
DateTimeOffset windowStart,
DateTimeOffset windowEnd,
Guid? excludeBackfillId,
CancellationToken cancellationToken);
}
/// <summary>
/// Default implementation of the backfill manager.
/// </summary>
public sealed class BackfillManager : IBackfillManager
{
private readonly IBackfillRepository _backfillRepository;
private readonly IBackfillSafetyValidator _safetyValidator;
private readonly IBackfillEventCounter _eventCounter;
private readonly IDuplicateSuppressor _duplicateSuppressor;
private readonly BackfillManagerOptions _options;
private readonly ILogger<BackfillManager> _logger;
public BackfillManager(
IBackfillRepository backfillRepository,
IBackfillSafetyValidator safetyValidator,
IBackfillEventCounter eventCounter,
IDuplicateSuppressor duplicateSuppressor,
BackfillManagerOptions options,
ILogger<BackfillManager> logger)
{
_backfillRepository = backfillRepository;
_safetyValidator = safetyValidator;
_eventCounter = eventCounter;
_duplicateSuppressor = duplicateSuppressor;
_options = options;
_logger = logger;
}
public async Task<BackfillRequest> CreateRequestAsync(
string tenantId,
Guid? sourceId,
string? jobType,
DateTimeOffset windowStart,
DateTimeOffset windowEnd,
string reason,
string createdBy,
int batchSize = 100,
bool dryRun = false,
bool forceReprocess = false,
string? ticket = null,
TimeSpan? maxDuration = null,
CancellationToken cancellationToken = default)
{
var request = BackfillRequest.Create(
tenantId: tenantId,
sourceId: sourceId,
jobType: jobType,
windowStart: windowStart,
windowEnd: windowEnd,
reason: reason,
createdBy: createdBy,
batchSize: batchSize,
dryRun: dryRun,
forceReprocess: forceReprocess,
ticket: ticket,
maxDuration: maxDuration);
await _backfillRepository.CreateAsync(request, cancellationToken);
_logger.LogInformation(
"Created backfill request {BackfillId} for scope {ScopeKey} from {WindowStart} to {WindowEnd}",
request.BackfillId, request.ScopeKey, windowStart, windowEnd);
return request;
}
public async Task<BackfillRequest> ValidateRequestAsync(
string tenantId,
Guid backfillId,
string updatedBy,
CancellationToken cancellationToken = default)
{
var request = await _backfillRepository.GetByIdAsync(tenantId, backfillId, cancellationToken)
?? throw new InvalidOperationException($"Backfill request {backfillId} not found.");
request = request.StartValidation(updatedBy);
await _backfillRepository.UpdateAsync(request, cancellationToken);
// Estimate event count
var estimatedEvents = await _eventCounter.EstimateEventCountAsync(
tenantId, request.ScopeKey, request.WindowStart, request.WindowEnd, cancellationToken);
// Calculate estimated duration
var estimatedDuration = TimeSpan.FromSeconds(estimatedEvents / _options.EstimatedEventsPerSecond);
// Run safety validations
var safetyChecks = await _safetyValidator.ValidateAsync(
request, estimatedEvents, estimatedDuration, cancellationToken);
request = request.WithSafetyChecks(safetyChecks, estimatedEvents, estimatedDuration, updatedBy);
await _backfillRepository.UpdateAsync(request, cancellationToken);
_logger.LogInformation(
"Validated backfill request {BackfillId}: {EstimatedEvents} events, safe={IsSafe}",
backfillId, estimatedEvents, safetyChecks.IsSafe);
return request;
}
public async Task<BackfillPreview> PreviewAsync(
string tenantId,
Guid? sourceId,
string? jobType,
DateTimeOffset windowStart,
DateTimeOffset windowEnd,
int batchSize = 100,
CancellationToken cancellationToken = default)
{
var scopeKey = GetScopeKey(sourceId, jobType);
// Estimate total events
var estimatedEvents = await _eventCounter.EstimateEventCountAsync(
tenantId, scopeKey, windowStart, windowEnd, cancellationToken);
// Get already processed count
var processedCount = await _duplicateSuppressor.CountProcessedAsync(
scopeKey, windowStart, windowEnd, cancellationToken);
// Get sample event keys
var sampleKeys = await _eventCounter.GetSampleEventKeysAsync(
tenantId, scopeKey, windowStart, windowEnd, _options.PreviewSampleSize, cancellationToken);
// Calculate estimates
var processableEvents = Math.Max(0, estimatedEvents - processedCount);
var estimatedDuration = TimeSpan.FromSeconds(processableEvents / _options.EstimatedEventsPerSecond);
var estimatedBatches = (int)Math.Ceiling((double)processableEvents / batchSize);
// Run safety checks
var tempRequest = BackfillRequest.Create(
tenantId, sourceId, jobType, windowStart, windowEnd,
"preview", "system", batchSize);
var safetyChecks = await _safetyValidator.ValidateAsync(
tempRequest, estimatedEvents, estimatedDuration, cancellationToken);
return new BackfillPreview(
ScopeKey: scopeKey,
WindowStart: windowStart,
WindowEnd: windowEnd,
EstimatedEvents: estimatedEvents,
SkippedEvents: processedCount,
ProcessableEvents: processableEvents,
EstimatedDuration: estimatedDuration,
EstimatedBatches: estimatedBatches,
SafetyChecks: safetyChecks,
SampleEventKeys: sampleKeys);
}
public async Task<BackfillRequest> StartAsync(
string tenantId,
Guid backfillId,
string updatedBy,
CancellationToken cancellationToken = default)
{
var request = await _backfillRepository.GetByIdAsync(tenantId, backfillId, cancellationToken)
?? throw new InvalidOperationException($"Backfill request {backfillId} not found.");
request = request.Start(updatedBy);
await _backfillRepository.UpdateAsync(request, cancellationToken);
_logger.LogInformation("Started backfill request {BackfillId}", backfillId);
return request;
}
public async Task<BackfillRequest> PauseAsync(
string tenantId,
Guid backfillId,
string updatedBy,
CancellationToken cancellationToken = default)
{
var request = await _backfillRepository.GetByIdAsync(tenantId, backfillId, cancellationToken)
?? throw new InvalidOperationException($"Backfill request {backfillId} not found.");
request = request.Pause(updatedBy);
await _backfillRepository.UpdateAsync(request, cancellationToken);
_logger.LogInformation("Paused backfill request {BackfillId}", backfillId);
return request;
}
public async Task<BackfillRequest> ResumeAsync(
string tenantId,
Guid backfillId,
string updatedBy,
CancellationToken cancellationToken = default)
{
var request = await _backfillRepository.GetByIdAsync(tenantId, backfillId, cancellationToken)
?? throw new InvalidOperationException($"Backfill request {backfillId} not found.");
request = request.Resume(updatedBy);
await _backfillRepository.UpdateAsync(request, cancellationToken);
_logger.LogInformation("Resumed backfill request {BackfillId}", backfillId);
return request;
}
public async Task<BackfillRequest> CancelAsync(
string tenantId,
Guid backfillId,
string updatedBy,
CancellationToken cancellationToken = default)
{
var request = await _backfillRepository.GetByIdAsync(tenantId, backfillId, cancellationToken)
?? throw new InvalidOperationException($"Backfill request {backfillId} not found.");
request = request.Cancel(updatedBy);
await _backfillRepository.UpdateAsync(request, cancellationToken);
_logger.LogInformation("Canceled backfill request {BackfillId}", backfillId);
return request;
}
public Task<BackfillRequest?> GetStatusAsync(
string tenantId,
Guid backfillId,
CancellationToken cancellationToken = default)
{
return _backfillRepository.GetByIdAsync(tenantId, backfillId, cancellationToken);
}
public Task<IReadOnlyList<BackfillRequest>> ListAsync(
string tenantId,
BackfillStatus? status = null,
Guid? sourceId = null,
string? jobType = null,
int limit = 50,
int offset = 0,
CancellationToken cancellationToken = default)
{
return _backfillRepository.ListAsync(tenantId, status, sourceId, jobType, limit, offset, cancellationToken);
}
private static string GetScopeKey(Guid? sourceId, string? jobType)
{
return (sourceId, jobType) switch
{
(Guid s, string j) when !string.IsNullOrEmpty(j) => Watermark.CreateScopeKey(s, j),
(Guid s, _) => Watermark.CreateScopeKey(s),
(_, string j) when !string.IsNullOrEmpty(j) => Watermark.CreateScopeKey(j),
_ => throw new ArgumentException("Either sourceId or jobType must be specified.")
};
}
}
/// <summary>
/// Repository interface for backfill persistence (imported for convenience).
/// </summary>
public interface IBackfillRepository
{
Task<BackfillRequest?> GetByIdAsync(string tenantId, Guid backfillId, CancellationToken cancellationToken);
Task CreateAsync(BackfillRequest request, CancellationToken cancellationToken);
Task UpdateAsync(BackfillRequest request, CancellationToken cancellationToken);
Task<IReadOnlyList<BackfillRequest>> ListAsync(
string tenantId,
BackfillStatus? status,
Guid? sourceId,
string? jobType,
int limit,
int offset,
CancellationToken cancellationToken);
}
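
End to end, the manager is intended to be driven as preview → create → validate → start, with the dry-run preview surfacing safety warnings before anything beyond the request itself is persisted. A sketch of that flow; the tenant id, reason, and operator name are illustrative:

// Sketch: run a scoped backfill over the last three days of events.
static async Task RunBackfillAsync(IBackfillManager manager, Guid sourceId, CancellationToken ct)
{
    var window = EventTimeWindow.LastDays(3);

    var preview = await manager.PreviewAsync(
        "tenant-1", sourceId, null, window.Start, window.End, cancellationToken: ct);

    if (!preview.SafetyChecks.IsSafe)
        return; // surface preview.SafetyChecks.Errors to the operator instead

    var request = await manager.CreateRequestAsync(
        "tenant-1", sourceId, null, window.Start, window.End,
        "reprocess after parser fix", "ops@example", cancellationToken: ct);

    request = await manager.ValidateRequestAsync("tenant-1", request.BackfillId, "ops@example", ct);
    request = await manager.StartAsync("tenant-1", request.BackfillId, "ops@example", ct);
}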


@@ -0,0 +1,318 @@
namespace StellaOps.Orchestrator.Core.Backfill;
/// <summary>
/// Tracks processed events for duplicate suppression.
/// </summary>
public interface IDuplicateSuppressor
{
/// <summary>
/// Checks if an event has already been processed.
/// </summary>
/// <param name="scopeKey">Scope identifier.</param>
/// <param name="eventKey">Unique event identifier.</param>
/// <param name="cancellationToken">Cancellation token.</param>
/// <returns>True if the event was already processed.</returns>
Task<bool> HasProcessedAsync(string scopeKey, string eventKey, CancellationToken cancellationToken);
/// <summary>
/// Checks multiple events for duplicate status.
/// </summary>
/// <param name="scopeKey">Scope identifier.</param>
/// <param name="eventKeys">Event identifiers to check.</param>
/// <param name="cancellationToken">Cancellation token.</param>
/// <returns>Set of event keys that have already been processed.</returns>
Task<IReadOnlySet<string>> GetProcessedAsync(string scopeKey, IEnumerable<string> eventKeys, CancellationToken cancellationToken);
/// <summary>
/// Marks an event as processed.
/// </summary>
/// <param name="scopeKey">Scope identifier.</param>
/// <param name="eventKey">Unique event identifier.</param>
/// <param name="eventTime">Event timestamp.</param>
/// <param name="batchId">Optional batch/backfill identifier.</param>
/// <param name="ttl">Time-to-live for the record.</param>
/// <param name="cancellationToken">Cancellation token.</param>
Task MarkProcessedAsync(
string scopeKey,
string eventKey,
DateTimeOffset eventTime,
Guid? batchId,
TimeSpan ttl,
CancellationToken cancellationToken);
/// <summary>
/// Marks multiple events as processed.
/// </summary>
/// <param name="scopeKey">Scope identifier.</param>
/// <param name="events">Events to mark as processed.</param>
/// <param name="batchId">Optional batch/backfill identifier.</param>
/// <param name="ttl">Time-to-live for the records.</param>
/// <param name="cancellationToken">Cancellation token.</param>
Task MarkProcessedBatchAsync(
string scopeKey,
IEnumerable<ProcessedEvent> events,
Guid? batchId,
TimeSpan ttl,
CancellationToken cancellationToken);
/// <summary>
/// Counts processed events within a time range.
/// </summary>
/// <param name="scopeKey">Scope identifier.</param>
/// <param name="from">Start of time range.</param>
/// <param name="to">End of time range.</param>
/// <param name="cancellationToken">Cancellation token.</param>
/// <returns>Count of processed events.</returns>
Task<long> CountProcessedAsync(string scopeKey, DateTimeOffset from, DateTimeOffset to, CancellationToken cancellationToken);
/// <summary>
/// Removes expired records (cleanup).
/// </summary>
/// <param name="batchLimit">Maximum records to remove per call.</param>
/// <param name="cancellationToken">Cancellation token.</param>
/// <returns>Number of records removed.</returns>
Task<int> CleanupExpiredAsync(int batchLimit, CancellationToken cancellationToken);
}
/// <summary>
/// Event data for duplicate tracking.
/// </summary>
public sealed record ProcessedEvent(
/// <summary>Unique event identifier.</summary>
string EventKey,
/// <summary>Event timestamp.</summary>
DateTimeOffset EventTime);
/// <summary>
/// In-memory duplicate suppressor for testing.
/// </summary>
public sealed class InMemoryDuplicateSuppressor : IDuplicateSuppressor
{
private readonly Dictionary<string, Dictionary<string, ProcessedEventEntry>> _store = new();
private readonly object _lock = new();
private sealed record ProcessedEventEntry(
DateTimeOffset EventTime,
DateTimeOffset ProcessedAt,
Guid? BatchId,
DateTimeOffset ExpiresAt);
public Task<bool> HasProcessedAsync(string scopeKey, string eventKey, CancellationToken cancellationToken)
{
lock (_lock)
{
if (!_store.TryGetValue(scopeKey, out var scopeStore))
return Task.FromResult(false);
if (!scopeStore.TryGetValue(eventKey, out var entry))
return Task.FromResult(false);
// Check if expired
if (entry.ExpiresAt < DateTimeOffset.UtcNow)
{
scopeStore.Remove(eventKey);
return Task.FromResult(false);
}
return Task.FromResult(true);
}
}
public Task<IReadOnlySet<string>> GetProcessedAsync(string scopeKey, IEnumerable<string> eventKeys, CancellationToken cancellationToken)
{
var now = DateTimeOffset.UtcNow;
var result = new HashSet<string>();
lock (_lock)
{
if (!_store.TryGetValue(scopeKey, out var scopeStore))
return Task.FromResult<IReadOnlySet<string>>(result);
foreach (var eventKey in eventKeys)
{
if (scopeStore.TryGetValue(eventKey, out var entry) && entry.ExpiresAt >= now)
{
result.Add(eventKey);
}
}
}
return Task.FromResult<IReadOnlySet<string>>(result);
}
public Task MarkProcessedAsync(
string scopeKey,
string eventKey,
DateTimeOffset eventTime,
Guid? batchId,
TimeSpan ttl,
CancellationToken cancellationToken)
{
var now = DateTimeOffset.UtcNow;
var entry = new ProcessedEventEntry(eventTime, now, batchId, now + ttl);
lock (_lock)
{
if (!_store.TryGetValue(scopeKey, out var scopeStore))
{
scopeStore = new Dictionary<string, ProcessedEventEntry>();
_store[scopeKey] = scopeStore;
}
scopeStore[eventKey] = entry;
}
return Task.CompletedTask;
}
public Task MarkProcessedBatchAsync(
string scopeKey,
IEnumerable<ProcessedEvent> events,
Guid? batchId,
TimeSpan ttl,
CancellationToken cancellationToken)
{
var now = DateTimeOffset.UtcNow;
var expiresAt = now + ttl;
lock (_lock)
{
if (!_store.TryGetValue(scopeKey, out var scopeStore))
{
scopeStore = new Dictionary<string, ProcessedEventEntry>();
_store[scopeKey] = scopeStore;
}
foreach (var evt in events)
{
scopeStore[evt.EventKey] = new ProcessedEventEntry(evt.EventTime, now, batchId, expiresAt);
}
}
return Task.CompletedTask;
}
public Task<long> CountProcessedAsync(string scopeKey, DateTimeOffset from, DateTimeOffset to, CancellationToken cancellationToken)
{
var now = DateTimeOffset.UtcNow;
long count = 0;
lock (_lock)
{
if (_store.TryGetValue(scopeKey, out var scopeStore))
{
count = scopeStore.Values
.Count(e => e.ExpiresAt >= now && e.EventTime >= from && e.EventTime < to);
}
}
return Task.FromResult(count);
}
public Task<int> CleanupExpiredAsync(int batchLimit, CancellationToken cancellationToken)
{
var now = DateTimeOffset.UtcNow;
var removed = 0;
lock (_lock)
{
foreach (var scopeStore in _store.Values)
{
var expiredKeys = scopeStore
.Where(kvp => kvp.Value.ExpiresAt < now)
.Take(batchLimit - removed)
.Select(kvp => kvp.Key)
.ToList();
foreach (var key in expiredKeys)
{
scopeStore.Remove(key);
removed++;
}
if (removed >= batchLimit)
break;
}
}
return Task.FromResult(removed);
}
}
/// <summary>
/// Result of filtering events through duplicate suppression.
/// </summary>
public sealed record DuplicateFilterResult<T>(
/// <summary>Events that should be processed (not duplicates).</summary>
IReadOnlyList<T> ToProcess,
/// <summary>Events that were filtered as duplicates.</summary>
IReadOnlyList<T> Duplicates,
/// <summary>Total events evaluated.</summary>
int Total)
{
/// <summary>
/// Number of events that passed filtering.
/// </summary>
public int ProcessCount => ToProcess.Count;
/// <summary>
/// Number of duplicates filtered.
/// </summary>
public int DuplicateCount => Duplicates.Count;
/// <summary>
/// Duplicate percentage.
/// </summary>
public double DuplicatePercent => Total > 0 ? Math.Round((double)DuplicateCount / Total * 100, 2) : 0;
}
/// <summary>
/// Helper methods for duplicate suppression.
/// </summary>
public static class DuplicateSuppressorExtensions
{
/// <summary>
/// Filters a batch of events, removing duplicates.
/// </summary>
/// <typeparam name="T">Event type.</typeparam>
/// <param name="suppressor">Duplicate suppressor.</param>
/// <param name="scopeKey">Scope identifier.</param>
/// <param name="events">Events to filter.</param>
/// <param name="keySelector">Function to extract event key.</param>
/// <param name="cancellationToken">Cancellation token.</param>
/// <returns>Filter result with events to process and duplicates.</returns>
public static async Task<DuplicateFilterResult<T>> FilterAsync<T>(
this IDuplicateSuppressor suppressor,
string scopeKey,
IReadOnlyList<T> events,
Func<T, string> keySelector,
CancellationToken cancellationToken)
{
if (events.Count == 0)
return new DuplicateFilterResult<T>([], [], 0);
var eventKeys = events.Select(keySelector).ToList();
var processed = await suppressor.GetProcessedAsync(scopeKey, eventKeys, cancellationToken).ConfigureAwait(false);
var toProcess = new List<T>();
var duplicates = new List<T>();
foreach (var evt in events)
{
var key = keySelector(evt);
if (processed.Contains(key))
{
duplicates.Add(evt);
}
else
{
toProcess.Add(evt);
}
}
return new DuplicateFilterResult<T>(toProcess, duplicates, events.Count);
}
}
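
Putting the suppressor and the filter extension together: screen a fetched batch, process only the new events, then record them so the next overlapping window skips them. `FetchedEvent` and the 30-day TTL are illustrative; the suppressor calls match the interface above.

// Sketch: duplicate-safe handling of one fetched batch.
sealed record FetchedEvent(string Key, DateTimeOffset OccurredAt);

static async Task<IReadOnlyList<FetchedEvent>> FilterAndRecordAsync(
    IDuplicateSuppressor suppressor,
    string scopeKey,
    IReadOnlyList<FetchedEvent> batch,
    Guid? backfillId,
    CancellationToken ct)
{
    var result = await suppressor.FilterAsync(scopeKey, batch, e => e.Key, ct);

    // ... process result.ToProcess here ...

    await suppressor.MarkProcessedBatchAsync(
        scopeKey,
        result.ToProcess.Select(e => new ProcessedEvent(e.Key, e.OccurredAt)),
        backfillId,
        TimeSpan.FromDays(30),
        ct);

    return result.ToProcess;
}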


@@ -0,0 +1,220 @@
namespace StellaOps.Orchestrator.Core.Backfill;
/// <summary>
/// Represents an event-time window for batch processing.
/// </summary>
public sealed record EventTimeWindow(
/// <summary>Start of the window (inclusive).</summary>
DateTimeOffset Start,
/// <summary>End of the window (exclusive).</summary>
DateTimeOffset End)
{
/// <summary>
/// Duration of the window.
/// </summary>
public TimeSpan Duration => End - Start;
/// <summary>
/// Whether the window is empty (zero duration).
/// </summary>
public bool IsEmpty => End <= Start;
/// <summary>
/// Whether a timestamp falls within this window.
/// </summary>
public bool Contains(DateTimeOffset timestamp) => timestamp >= Start && timestamp < End;
/// <summary>
/// Whether this window overlaps with another.
/// </summary>
public bool Overlaps(EventTimeWindow other) =>
Start < other.End && End > other.Start;
/// <summary>
/// Creates the intersection of two windows.
/// </summary>
public EventTimeWindow? Intersect(EventTimeWindow other)
{
var newStart = Start > other.Start ? Start : other.Start;
var newEnd = End < other.End ? End : other.End;
return newEnd > newStart ? new EventTimeWindow(newStart, newEnd) : null;
}
/// <summary>
/// Splits the window into batches of the specified duration.
/// </summary>
public IEnumerable<EventTimeWindow> Split(TimeSpan batchDuration)
{
// Validate eagerly so misuse surfaces at the call site, not on first enumeration.
if (batchDuration <= TimeSpan.Zero)
throw new ArgumentOutOfRangeException(nameof(batchDuration), "Batch duration must be positive.");
return SplitCore(batchDuration);
}
private IEnumerable<EventTimeWindow> SplitCore(TimeSpan batchDuration)
{
var current = Start;
while (current < End)
{
var batchEnd = current + batchDuration;
if (batchEnd > End)
batchEnd = End;
yield return new EventTimeWindow(current, batchEnd);
current = batchEnd;
}
}
/// <summary>
/// Creates a window from a duration ending at the specified time.
/// </summary>
public static EventTimeWindow FromDuration(DateTimeOffset end, TimeSpan duration) =>
new(end - duration, end);
/// <summary>
/// Creates a window covering the last N hours from now.
/// </summary>
public static EventTimeWindow LastHours(int hours, DateTimeOffset? now = null)
{
var endTime = now ?? DateTimeOffset.UtcNow;
return FromDuration(endTime, TimeSpan.FromHours(hours));
}
/// <summary>
/// Creates a window covering the last N days from now.
/// </summary>
public static EventTimeWindow LastDays(int days, DateTimeOffset? now = null)
{
var endTime = now ?? DateTimeOffset.UtcNow;
return FromDuration(endTime, TimeSpan.FromDays(days));
}
}
/// <summary>
/// Configuration for event-time window computation.
/// </summary>
public sealed record EventTimeWindowOptions(
/// <summary>Minimum window size (prevents too-small batches).</summary>
TimeSpan MinWindowSize,
/// <summary>Maximum window size (prevents too-large batches).</summary>
TimeSpan MaxWindowSize,
/// <summary>Overlap with previous window for late-arriving events.</summary>
TimeSpan OverlapDuration,
/// <summary>Maximum lag allowed before triggering alerts.</summary>
TimeSpan MaxLag,
/// <summary>Default lookback for initial fetch when no watermark exists.</summary>
TimeSpan InitialLookback)
{
/// <summary>
/// Default options for hourly batching.
/// </summary>
public static EventTimeWindowOptions HourlyBatches => new(
MinWindowSize: TimeSpan.FromMinutes(5),
MaxWindowSize: TimeSpan.FromHours(1),
OverlapDuration: TimeSpan.FromMinutes(5),
MaxLag: TimeSpan.FromHours(2),
InitialLookback: TimeSpan.FromDays(7));
/// <summary>
/// Default options for daily batching.
/// </summary>
public static EventTimeWindowOptions DailyBatches => new(
MinWindowSize: TimeSpan.FromHours(1),
MaxWindowSize: TimeSpan.FromDays(1),
OverlapDuration: TimeSpan.FromHours(1),
MaxLag: TimeSpan.FromDays(1),
InitialLookback: TimeSpan.FromDays(30));
}
/// <summary>
/// Computes event-time windows for incremental processing.
/// </summary>
public static class EventTimeWindowPlanner
{
/// <summary>
/// Computes the next window to process based on current watermark.
/// </summary>
/// <param name="now">Current time.</param>
/// <param name="highWatermark">Current high watermark (null for initial fetch).</param>
/// <param name="options">Window configuration options.</param>
/// <returns>The next window to process, or null if caught up.</returns>
public static EventTimeWindow? GetNextWindow(
DateTimeOffset now,
DateTimeOffset? highWatermark,
EventTimeWindowOptions options)
{
DateTimeOffset windowStart;
if (highWatermark is null)
{
// Initial fetch: start from initial lookback
windowStart = now - options.InitialLookback;
}
else
{
// Incremental fetch: start from watermark minus overlap
windowStart = highWatermark.Value - options.OverlapDuration;
// If we're caught up (watermark + min window > now), no work needed
if (highWatermark.Value + options.MinWindowSize > now)
{
return null;
}
}
// Calculate window end (at most now, at most max window from start)
var windowEnd = windowStart + options.MaxWindowSize;
if (windowEnd > now)
{
windowEnd = now;
}
// Ensure minimum window size
if (windowEnd - windowStart < options.MinWindowSize)
{
// If window would be too small, extend end (but not past now)
windowEnd = windowStart + options.MinWindowSize;
if (windowEnd > now)
{
return null; // Not enough data accumulated yet
}
}
return new EventTimeWindow(windowStart, windowEnd);
}
/// <summary>
/// Calculates the current lag from the high watermark.
/// </summary>
public static TimeSpan CalculateLag(DateTimeOffset now, DateTimeOffset highWatermark) =>
now - highWatermark;
/// <summary>
/// Determines if the lag exceeds the maximum allowed.
/// </summary>
public static bool IsLagging(DateTimeOffset now, DateTimeOffset highWatermark, EventTimeWindowOptions options) =>
CalculateLag(now, highWatermark) > options.MaxLag;
/// <summary>
/// Estimates the number of windows needed to catch up.
/// </summary>
public static int EstimateWindowsToProcess(
DateTimeOffset now,
DateTimeOffset? highWatermark,
EventTimeWindowOptions options)
{
if (highWatermark is null)
{
// Initial fetch
var totalDuration = options.InitialLookback;
return (int)Math.Ceiling(totalDuration / options.MaxWindowSize);
}
var lag = CalculateLag(now, highWatermark.Value);
if (lag <= options.MinWindowSize)
return 0;
return (int)Math.Ceiling(lag / options.MaxWindowSize);
}
}
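
A worked sketch of the planner loop with hourly batching: with no watermark, the first window starts 7 days back and is capped at one hour; each later call starts 5 minutes before the previous window's end (the overlap), and the loop stops once the watermark is within `MinWindowSize` of now. Persisting the watermark between iterations is assumed, not shown.

// Sketch: drain the backlog for one scope using hourly windows.
var options = EventTimeWindowOptions.HourlyBatches;
DateTimeOffset? highWatermark = null;   // null => initial 7-day lookback

while (true)
{
    var window = EventTimeWindowPlanner.GetNextWindow(DateTimeOffset.UtcNow, highWatermark, options);
    if (window is null)
        break; // caught up

    // Fetch and process events whose event time falls in [window.Start, window.End).
    // The 5-minute overlap re-reads the tail of the previous window, so duplicate
    // suppression (above) keeps reprocessing idempotent.

    highWatermark = window.End;
}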


@@ -0,0 +1,502 @@
using Microsoft.Extensions.Logging;
using StellaOps.Orchestrator.Core.Domain;
namespace StellaOps.Orchestrator.Core.DeadLetter;
/// <summary>
/// Notification channel types.
/// </summary>
public enum NotificationChannel
{
Email,
Slack,
Teams,
Webhook,
PagerDuty
}
/// <summary>
/// Notification rule for dead-letter events.
/// </summary>
public sealed record NotificationRule(
Guid RuleId,
string TenantId,
string? JobTypePattern,
string? ErrorCodePattern,
ErrorCategory? Category,
Guid? SourceId,
bool Enabled,
NotificationChannel Channel,
string Endpoint,
int CooldownMinutes,
int MaxPerHour,
bool Aggregate,
DateTimeOffset? LastNotifiedAt,
int NotificationsSent,
DateTimeOffset CreatedAt,
DateTimeOffset UpdatedAt,
string CreatedBy,
string UpdatedBy)
{
/// <summary>Creates a new notification rule.</summary>
public static NotificationRule Create(
string tenantId,
NotificationChannel channel,
string endpoint,
string createdBy,
string? jobTypePattern = null,
string? errorCodePattern = null,
ErrorCategory? category = null,
Guid? sourceId = null,
int cooldownMinutes = 15,
int maxPerHour = 10,
bool aggregate = true)
{
var now = DateTimeOffset.UtcNow;
return new NotificationRule(
RuleId: Guid.NewGuid(),
TenantId: tenantId,
JobTypePattern: jobTypePattern,
ErrorCodePattern: errorCodePattern,
Category: category,
SourceId: sourceId,
Enabled: true,
Channel: channel,
Endpoint: endpoint,
CooldownMinutes: cooldownMinutes,
MaxPerHour: maxPerHour,
Aggregate: aggregate,
LastNotifiedAt: null,
NotificationsSent: 0,
CreatedAt: now,
UpdatedAt: now,
CreatedBy: createdBy,
UpdatedBy: createdBy);
}
/// <summary>Checks if this rule matches the given entry.</summary>
public bool Matches(DeadLetterEntry entry)
{
if (!Enabled) return false;
if (SourceId.HasValue && entry.SourceId != SourceId.Value) return false;
if (Category.HasValue && entry.Category != Category.Value) return false;
if (!string.IsNullOrEmpty(JobTypePattern))
{
if (!System.Text.RegularExpressions.Regex.IsMatch(entry.JobType, JobTypePattern))
return false;
}
if (!string.IsNullOrEmpty(ErrorCodePattern))
{
if (!System.Text.RegularExpressions.Regex.IsMatch(entry.ErrorCode, ErrorCodePattern))
return false;
}
return true;
}
/// <summary>Checks if this rule is within rate limits.</summary>
public bool CanNotify(DateTimeOffset now, int notificationsSentThisHour)
{
if (!Enabled) return false;
if (notificationsSentThisHour >= MaxPerHour) return false;
if (LastNotifiedAt.HasValue)
{
var elapsed = now - LastNotifiedAt.Value;
if (elapsed < TimeSpan.FromMinutes(CooldownMinutes))
return false;
}
return true;
}
/// <summary>Records a notification sent.</summary>
public NotificationRule RecordNotification(DateTimeOffset now) =>
this with
{
LastNotifiedAt = now,
NotificationsSent = NotificationsSent + 1,
UpdatedAt = now
};
}
/// <summary>
/// Notification log entry.
/// </summary>
public sealed record NotificationLogEntry(
Guid LogId,
string TenantId,
Guid RuleId,
IReadOnlyList<Guid> EntryIds,
NotificationChannel Channel,
string Endpoint,
bool Success,
string? ErrorMessage,
string? Subject,
int EntryCount,
DateTimeOffset SentAt);
/// <summary>
/// Notification payload for dead-letter events.
/// </summary>
public sealed record DeadLetterNotificationPayload(
string TenantId,
string EventType,
IReadOnlyList<DeadLetterEntrySummary> Entries,
DeadLetterStatsSnapshot? Stats,
DateTimeOffset Timestamp,
string? ActionUrl);
/// <summary>
/// Summary of a dead-letter entry for notifications.
/// </summary>
public sealed record DeadLetterEntrySummary(
Guid EntryId,
Guid OriginalJobId,
string JobType,
string ErrorCode,
ErrorCategory Category,
string FailureReason,
string? RemediationHint,
bool IsRetryable,
int ReplayAttempts,
DateTimeOffset FailedAt);
/// <summary>
/// Stats snapshot for notifications.
/// </summary>
public sealed record DeadLetterStatsSnapshot(
long PendingCount,
long RetryableCount,
long ExhaustedCount);
/// <summary>
/// Interface for dead-letter event notifications.
/// </summary>
public interface IDeadLetterNotifier
{
    /// <summary>Notifies when a new entry is added to the dead-letter store.</summary>
Task NotifyNewEntryAsync(
DeadLetterEntry entry,
CancellationToken cancellationToken);
/// <summary>Notifies when an entry is successfully replayed.</summary>
Task NotifyReplaySuccessAsync(
DeadLetterEntry entry,
Guid newJobId,
CancellationToken cancellationToken);
/// <summary>Notifies when an entry exhausts all replay attempts.</summary>
Task NotifyExhaustedAsync(
DeadLetterEntry entry,
CancellationToken cancellationToken);
/// <summary>Sends aggregated notifications for pending entries.</summary>
Task SendAggregatedNotificationsAsync(
string tenantId,
CancellationToken cancellationToken);
}
/// <summary>
/// Interface for notification delivery.
/// </summary>
public interface INotificationDelivery
{
/// <summary>Sends a notification to the specified endpoint.</summary>
Task<bool> SendAsync(
NotificationChannel channel,
string endpoint,
DeadLetterNotificationPayload payload,
CancellationToken cancellationToken);
}
/// <summary>
/// Repository for notification rules.
/// </summary>
public interface INotificationRuleRepository
{
Task<NotificationRule?> GetByIdAsync(string tenantId, Guid ruleId, CancellationToken cancellationToken);
Task<IReadOnlyList<NotificationRule>> ListAsync(string tenantId, bool enabledOnly, CancellationToken cancellationToken);
Task<IReadOnlyList<NotificationRule>> GetMatchingRulesAsync(string tenantId, DeadLetterEntry entry, CancellationToken cancellationToken);
Task CreateAsync(NotificationRule rule, CancellationToken cancellationToken);
Task<bool> UpdateAsync(NotificationRule rule, CancellationToken cancellationToken);
Task<bool> DeleteAsync(string tenantId, Guid ruleId, CancellationToken cancellationToken);
Task<int> GetNotificationCountThisHourAsync(string tenantId, Guid ruleId, CancellationToken cancellationToken);
Task LogNotificationAsync(NotificationLogEntry log, CancellationToken cancellationToken);
}
/// <summary>
/// Default dead-letter notifier implementation.
/// </summary>
public sealed class DeadLetterNotifier : IDeadLetterNotifier
{
private readonly INotificationRuleRepository _ruleRepository;
private readonly IDeadLetterRepository _deadLetterRepository;
private readonly INotificationDelivery _delivery;
private readonly TimeProvider _timeProvider;
private readonly ILogger<DeadLetterNotifier> _logger;
public DeadLetterNotifier(
INotificationRuleRepository ruleRepository,
IDeadLetterRepository deadLetterRepository,
INotificationDelivery delivery,
TimeProvider timeProvider,
ILogger<DeadLetterNotifier> logger)
{
_ruleRepository = ruleRepository ?? throw new ArgumentNullException(nameof(ruleRepository));
_deadLetterRepository = deadLetterRepository ?? throw new ArgumentNullException(nameof(deadLetterRepository));
_delivery = delivery ?? throw new ArgumentNullException(nameof(delivery));
_timeProvider = timeProvider ?? throw new ArgumentNullException(nameof(timeProvider));
_logger = logger ?? throw new ArgumentNullException(nameof(logger));
}
public async Task NotifyNewEntryAsync(
DeadLetterEntry entry,
CancellationToken cancellationToken)
{
var rules = await _ruleRepository.GetMatchingRulesAsync(entry.TenantId, entry, cancellationToken)
.ConfigureAwait(false);
var now = _timeProvider.GetUtcNow();
foreach (var rule in rules)
{
if (rule.Aggregate)
{
// Skip immediate notification for aggregated rules
continue;
}
var notificationsThisHour = await _ruleRepository.GetNotificationCountThisHourAsync(
entry.TenantId, rule.RuleId, cancellationToken).ConfigureAwait(false);
if (!rule.CanNotify(now, notificationsThisHour))
{
continue;
}
await SendNotificationAsync(rule, "new_entry", [entry], null, cancellationToken)
.ConfigureAwait(false);
}
}
public async Task NotifyReplaySuccessAsync(
DeadLetterEntry entry,
Guid newJobId,
CancellationToken cancellationToken)
{
var rules = await _ruleRepository.GetMatchingRulesAsync(entry.TenantId, entry, cancellationToken)
.ConfigureAwait(false);
var now = _timeProvider.GetUtcNow();
foreach (var rule in rules)
{
var notificationsThisHour = await _ruleRepository.GetNotificationCountThisHourAsync(
entry.TenantId, rule.RuleId, cancellationToken).ConfigureAwait(false);
if (!rule.CanNotify(now, notificationsThisHour))
{
continue;
}
var payload = new DeadLetterNotificationPayload(
TenantId: entry.TenantId,
EventType: "replay_success",
Entries: [ToSummary(entry)],
Stats: null,
Timestamp: now,
ActionUrl: null);
var success = await _delivery.SendAsync(rule.Channel, rule.Endpoint, payload, cancellationToken)
.ConfigureAwait(false);
await LogNotificationAsync(rule, [entry.EntryId], success, null, cancellationToken)
.ConfigureAwait(false);
}
}
public async Task NotifyExhaustedAsync(
DeadLetterEntry entry,
CancellationToken cancellationToken)
{
var rules = await _ruleRepository.GetMatchingRulesAsync(entry.TenantId, entry, cancellationToken)
.ConfigureAwait(false);
var now = _timeProvider.GetUtcNow();
foreach (var rule in rules)
{
var notificationsThisHour = await _ruleRepository.GetNotificationCountThisHourAsync(
entry.TenantId, rule.RuleId, cancellationToken).ConfigureAwait(false);
if (!rule.CanNotify(now, notificationsThisHour))
{
continue;
}
await SendNotificationAsync(rule, "exhausted", [entry], null, cancellationToken)
.ConfigureAwait(false);
}
}
public async Task SendAggregatedNotificationsAsync(
string tenantId,
CancellationToken cancellationToken)
{
var rules = await _ruleRepository.ListAsync(tenantId, enabledOnly: true, cancellationToken)
.ConfigureAwait(false);
var now = _timeProvider.GetUtcNow();
var stats = await _deadLetterRepository.GetStatsAsync(tenantId, cancellationToken).ConfigureAwait(false);
foreach (var rule in rules.Where(r => r.Aggregate))
{
var notificationsThisHour = await _ruleRepository.GetNotificationCountThisHourAsync(
tenantId, rule.RuleId, cancellationToken).ConfigureAwait(false);
if (!rule.CanNotify(now, notificationsThisHour))
{
continue;
}
// Get pending entries matching this rule
var options = new DeadLetterListOptions(
Status: DeadLetterStatus.Pending,
Category: rule.Category,
Limit: 10);
var entries = await _deadLetterRepository.ListAsync(tenantId, options, cancellationToken)
.ConfigureAwait(false);
// Filter to only matching entries
var matchingEntries = entries.Where(e => rule.Matches(e)).ToList();
if (matchingEntries.Count == 0)
{
continue;
}
var statsSnapshot = new DeadLetterStatsSnapshot(
PendingCount: stats.PendingEntries,
RetryableCount: stats.RetryableEntries,
ExhaustedCount: stats.ExhaustedEntries);
await SendNotificationAsync(rule, "aggregated", matchingEntries, statsSnapshot, cancellationToken)
.ConfigureAwait(false);
}
}
private async Task SendNotificationAsync(
NotificationRule rule,
string eventType,
IReadOnlyList<DeadLetterEntry> entries,
DeadLetterStatsSnapshot? stats,
CancellationToken cancellationToken)
{
var now = _timeProvider.GetUtcNow();
var payload = new DeadLetterNotificationPayload(
TenantId: rule.TenantId,
EventType: eventType,
Entries: entries.Select(ToSummary).ToList(),
Stats: stats,
Timestamp: now,
ActionUrl: null);
string? errorMessage = null;
bool success;
try
{
success = await _delivery.SendAsync(rule.Channel, rule.Endpoint, payload, cancellationToken)
.ConfigureAwait(false);
}
catch (Exception ex)
{
success = false;
errorMessage = ex.Message;
_logger.LogError(ex, "Failed to send {EventType} notification for rule {RuleId}", eventType, rule.RuleId);
}
await LogNotificationAsync(rule, entries.Select(e => e.EntryId).ToList(), success, errorMessage, cancellationToken)
.ConfigureAwait(false);
if (success)
{
var updatedRule = rule.RecordNotification(now);
await _ruleRepository.UpdateAsync(updatedRule, cancellationToken).ConfigureAwait(false);
_logger.LogInformation(
"Dead-letter notification sent: tenant={TenantId}, channel={Channel}, eventType={EventType}",
rule.TenantId, rule.Channel, eventType);
}
else
{
_logger.LogWarning(
"Dead-letter notification failed: tenant={TenantId}, channel={Channel}, eventType={EventType}",
rule.TenantId, rule.Channel, eventType);
}
}
private async Task LogNotificationAsync(
NotificationRule rule,
IReadOnlyList<Guid> entryIds,
bool success,
string? errorMessage,
CancellationToken cancellationToken)
{
var log = new NotificationLogEntry(
LogId: Guid.NewGuid(),
TenantId: rule.TenantId,
RuleId: rule.RuleId,
EntryIds: entryIds,
Channel: rule.Channel,
Endpoint: rule.Endpoint,
Success: success,
ErrorMessage: errorMessage,
Subject: null,
EntryCount: entryIds.Count,
SentAt: _timeProvider.GetUtcNow());
await _ruleRepository.LogNotificationAsync(log, cancellationToken).ConfigureAwait(false);
}
private static DeadLetterEntrySummary ToSummary(DeadLetterEntry entry) =>
new(
EntryId: entry.EntryId,
OriginalJobId: entry.OriginalJobId,
JobType: entry.JobType,
ErrorCode: entry.ErrorCode,
Category: entry.Category,
FailureReason: entry.FailureReason,
RemediationHint: entry.RemediationHint,
IsRetryable: entry.IsRetryable,
ReplayAttempts: entry.ReplayAttempts,
FailedAt: entry.FailedAt);
}
/// <summary>
/// No-op notifier for when notifications are disabled.
/// </summary>
public sealed class NullDeadLetterNotifier : IDeadLetterNotifier
{
public static readonly NullDeadLetterNotifier Instance = new();
private NullDeadLetterNotifier() { }
public Task NotifyNewEntryAsync(DeadLetterEntry entry, CancellationToken cancellationToken) =>
Task.CompletedTask;
public Task NotifyReplaySuccessAsync(DeadLetterEntry entry, Guid newJobId, CancellationToken cancellationToken) =>
Task.CompletedTask;
public Task NotifyExhaustedAsync(DeadLetterEntry entry, CancellationToken cancellationToken) =>
Task.CompletedTask;
public Task SendAggregatedNotificationsAsync(string tenantId, CancellationToken cancellationToken) =>
Task.CompletedTask;
}
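// Delivery sketch: a minimal webhook-only INotificationDelivery. It assumes a plain
// JSON POST to the rule's endpoint is an acceptable wire format; Slack/Teams/PagerDuty
// formatting and any retry policy are intentionally out of scope here.
public sealed class WebhookNotificationDeliverySketch : INotificationDelivery
{
    private readonly HttpClient _httpClient;

    public WebhookNotificationDeliverySketch(HttpClient httpClient) =>
        _httpClient = httpClient ?? throw new ArgumentNullException(nameof(httpClient));

    public async Task<bool> SendAsync(
        NotificationChannel channel,
        string endpoint,
        DeadLetterNotificationPayload payload,
        CancellationToken cancellationToken)
    {
        if (channel != NotificationChannel.Webhook)
        {
            return false; // this sketch only understands plain webhooks
        }

        var json = System.Text.Json.JsonSerializer.Serialize(payload);
        using var content = new StringContent(json, System.Text.Encoding.UTF8, "application/json");
        using var response = await _httpClient.PostAsync(endpoint, content, cancellationToken)
            .ConfigureAwait(false);
        return response.IsSuccessStatusCode;
    }
}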


@@ -0,0 +1,578 @@
using StellaOps.Orchestrator.Core.Domain;
namespace StellaOps.Orchestrator.Core.DeadLetter;
/// <summary>
/// Represents a classified error with remediation guidance.
/// </summary>
public sealed record ClassifiedError(
    /// <summary>Error code (e.g., "ORCH-TRN-001").</summary>
string ErrorCode,
/// <summary>Error category.</summary>
ErrorCategory Category,
/// <summary>Human-readable description.</summary>
string Description,
/// <summary>Remediation hint for operators.</summary>
string RemediationHint,
/// <summary>Whether this error is potentially retryable.</summary>
bool IsRetryable,
/// <summary>Suggested retry delay if retryable.</summary>
TimeSpan? SuggestedRetryDelay);
/// <summary>
/// Classifies errors and provides remediation hints.
/// </summary>
public interface IErrorClassifier
{
/// <summary>Classifies an exception into a categorized error.</summary>
ClassifiedError Classify(Exception exception);
/// <summary>Classifies an error code and message.</summary>
ClassifiedError Classify(string errorCode, string message);
/// <summary>Classifies based on HTTP status code and message.</summary>
ClassifiedError ClassifyHttpError(int statusCode, string? message);
}
/// <summary>
/// Default error classifier with standard error codes and remediation hints.
/// </summary>
public sealed class DefaultErrorClassifier : IErrorClassifier
{
/// <summary>Known error codes with classifications.</summary>
public static class ErrorCodes
{
// Transient errors (ORCH-TRN-xxx)
public const string NetworkTimeout = "ORCH-TRN-001";
public const string ConnectionRefused = "ORCH-TRN-002";
public const string DnsResolutionFailed = "ORCH-TRN-003";
public const string ServiceUnavailable = "ORCH-TRN-004";
public const string GatewayTimeout = "ORCH-TRN-005";
public const string TemporaryFailure = "ORCH-TRN-099";
// Not found errors (ORCH-NF-xxx)
public const string ImageNotFound = "ORCH-NF-001";
public const string SourceNotFound = "ORCH-NF-002";
public const string RegistryNotFound = "ORCH-NF-003";
public const string ManifestNotFound = "ORCH-NF-004";
public const string ResourceNotFound = "ORCH-NF-099";
// Auth errors (ORCH-AUTH-xxx)
public const string InvalidCredentials = "ORCH-AUTH-001";
public const string TokenExpired = "ORCH-AUTH-002";
public const string InsufficientPermissions = "ORCH-AUTH-003";
public const string CertificateError = "ORCH-AUTH-004";
public const string AuthenticationFailed = "ORCH-AUTH-099";
// Rate limit errors (ORCH-RL-xxx)
public const string RateLimited = "ORCH-RL-001";
public const string QuotaExceeded = "ORCH-RL-002";
public const string ConcurrencyLimitReached = "ORCH-RL-003";
public const string ThrottlingError = "ORCH-RL-099";
// Validation errors (ORCH-VAL-xxx)
public const string InvalidPayload = "ORCH-VAL-001";
public const string InvalidConfiguration = "ORCH-VAL-002";
public const string SchemaValidationFailed = "ORCH-VAL-003";
public const string MissingRequiredField = "ORCH-VAL-004";
public const string ValidationFailed = "ORCH-VAL-099";
// Upstream errors (ORCH-UP-xxx)
public const string RegistryError = "ORCH-UP-001";
public const string AdvisoryFeedError = "ORCH-UP-002";
public const string DatabaseError = "ORCH-UP-003";
public const string ExternalServiceError = "ORCH-UP-099";
// Internal errors (ORCH-INT-xxx)
public const string InternalError = "ORCH-INT-001";
public const string StateCorruption = "ORCH-INT-002";
public const string ProcessingError = "ORCH-INT-003";
public const string UnexpectedError = "ORCH-INT-099";
// Conflict errors (ORCH-CON-xxx)
public const string DuplicateJob = "ORCH-CON-001";
public const string VersionMismatch = "ORCH-CON-002";
public const string ConcurrentModification = "ORCH-CON-003";
public const string ConflictError = "ORCH-CON-099";
// Canceled errors (ORCH-CAN-xxx)
public const string UserCanceled = "ORCH-CAN-001";
public const string SystemCanceled = "ORCH-CAN-002";
public const string TimeoutCanceled = "ORCH-CAN-003";
public const string OperationCanceled = "ORCH-CAN-099";
}
private static readonly Dictionary<string, ClassifiedError> KnownErrors = new()
{
// Transient errors
[ErrorCodes.NetworkTimeout] = new(
ErrorCodes.NetworkTimeout,
ErrorCategory.Transient,
"Network operation timed out",
"Check network connectivity and firewall rules. If the target service is healthy, increase timeout settings.",
IsRetryable: true,
SuggestedRetryDelay: TimeSpan.FromMinutes(1)),
[ErrorCodes.ConnectionRefused] = new(
ErrorCodes.ConnectionRefused,
ErrorCategory.Transient,
"Connection refused by target host",
"Verify the target service is running and accessible. Check firewall rules and network policies.",
IsRetryable: true,
SuggestedRetryDelay: TimeSpan.FromMinutes(2)),
[ErrorCodes.DnsResolutionFailed] = new(
ErrorCodes.DnsResolutionFailed,
ErrorCategory.Transient,
"DNS resolution failed",
"Verify the hostname is correct. Check DNS server configuration and network connectivity.",
IsRetryable: true,
SuggestedRetryDelay: TimeSpan.FromMinutes(1)),
[ErrorCodes.ServiceUnavailable] = new(
ErrorCodes.ServiceUnavailable,
ErrorCategory.Transient,
"Service temporarily unavailable (503)",
"The target service is temporarily overloaded or under maintenance. Retry with exponential backoff.",
IsRetryable: true,
SuggestedRetryDelay: TimeSpan.FromMinutes(5)),
[ErrorCodes.GatewayTimeout] = new(
ErrorCodes.GatewayTimeout,
ErrorCategory.Transient,
"Gateway timeout (504)",
"An upstream service took too long to respond. This is typically transient; retry with backoff.",
IsRetryable: true,
SuggestedRetryDelay: TimeSpan.FromMinutes(2)),
[ErrorCodes.TemporaryFailure] = new(
ErrorCodes.TemporaryFailure,
ErrorCategory.Transient,
"Temporary failure",
"A transient error occurred. Retry the operation after a brief delay.",
IsRetryable: true,
SuggestedRetryDelay: TimeSpan.FromMinutes(1)),
// Not found errors
[ErrorCodes.ImageNotFound] = new(
ErrorCodes.ImageNotFound,
ErrorCategory.NotFound,
"Container image not found",
"Verify the image reference is correct (repository, tag, digest). Check registry access and that the image exists.",
IsRetryable: false,
SuggestedRetryDelay: null),
[ErrorCodes.SourceNotFound] = new(
ErrorCodes.SourceNotFound,
ErrorCategory.NotFound,
"Source configuration not found",
"The referenced source may have been deleted. Verify the source ID and recreate if necessary.",
IsRetryable: false,
SuggestedRetryDelay: null),
[ErrorCodes.RegistryNotFound] = new(
ErrorCodes.RegistryNotFound,
ErrorCategory.NotFound,
"Container registry not found",
"Verify the registry URL is correct. Check DNS resolution and that the registry is operational.",
IsRetryable: false,
SuggestedRetryDelay: null),
[ErrorCodes.ManifestNotFound] = new(
ErrorCodes.ManifestNotFound,
ErrorCategory.NotFound,
"Image manifest not found",
"The image exists but the manifest is missing. The image may have been deleted or the tag moved.",
IsRetryable: false,
SuggestedRetryDelay: null),
[ErrorCodes.ResourceNotFound] = new(
ErrorCodes.ResourceNotFound,
ErrorCategory.NotFound,
"Resource not found",
"The requested resource does not exist. Verify the resource identifier is correct.",
IsRetryable: false,
SuggestedRetryDelay: null),
// Auth errors
[ErrorCodes.InvalidCredentials] = new(
ErrorCodes.InvalidCredentials,
ErrorCategory.AuthFailure,
"Invalid credentials",
"The provided credentials are invalid. Update the registry credentials in the source configuration.",
IsRetryable: false,
SuggestedRetryDelay: null),
[ErrorCodes.TokenExpired] = new(
ErrorCodes.TokenExpired,
ErrorCategory.AuthFailure,
"Authentication token expired",
"The authentication token has expired. Refresh credentials or re-authenticate to obtain a new token.",
IsRetryable: true,
SuggestedRetryDelay: TimeSpan.FromMinutes(1)),
[ErrorCodes.InsufficientPermissions] = new(
ErrorCodes.InsufficientPermissions,
ErrorCategory.AuthFailure,
"Insufficient permissions",
"The authenticated user lacks required permissions. Request access from the registry administrator.",
IsRetryable: false,
SuggestedRetryDelay: null),
[ErrorCodes.CertificateError] = new(
ErrorCodes.CertificateError,
ErrorCategory.AuthFailure,
"TLS certificate error",
"Certificate validation failed. Verify the CA bundle or add the registry's certificate to trusted roots.",
IsRetryable: false,
SuggestedRetryDelay: null),
[ErrorCodes.AuthenticationFailed] = new(
ErrorCodes.AuthenticationFailed,
ErrorCategory.AuthFailure,
"Authentication failed",
"Unable to authenticate with the target service. Verify credentials and authentication configuration.",
IsRetryable: false,
SuggestedRetryDelay: null),
// Rate limit errors
[ErrorCodes.RateLimited] = new(
ErrorCodes.RateLimited,
ErrorCategory.RateLimited,
"Rate limit exceeded (429)",
"Request rate limit exceeded. Reduce request frequency or upgrade service tier. Will auto-retry with backoff.",
IsRetryable: true,
SuggestedRetryDelay: TimeSpan.FromMinutes(5)),
[ErrorCodes.QuotaExceeded] = new(
ErrorCodes.QuotaExceeded,
ErrorCategory.RateLimited,
"Quota exceeded",
"Usage quota has been exceeded. Wait for quota reset or request quota increase.",
IsRetryable: true,
SuggestedRetryDelay: TimeSpan.FromHours(1)),
[ErrorCodes.ConcurrencyLimitReached] = new(
ErrorCodes.ConcurrencyLimitReached,
ErrorCategory.RateLimited,
"Concurrency limit reached",
"Maximum concurrent operations limit reached. Reduce parallel operations or increase limit.",
IsRetryable: true,
SuggestedRetryDelay: TimeSpan.FromMinutes(1)),
[ErrorCodes.ThrottlingError] = new(
ErrorCodes.ThrottlingError,
ErrorCategory.RateLimited,
"Request throttled",
"Request was throttled due to rate limits. Retry with exponential backoff.",
IsRetryable: true,
SuggestedRetryDelay: TimeSpan.FromMinutes(2)),
// Validation errors
[ErrorCodes.InvalidPayload] = new(
ErrorCodes.InvalidPayload,
ErrorCategory.ValidationError,
"Invalid job payload",
"The job payload is malformed or invalid. Review the payload structure and fix validation errors.",
IsRetryable: false,
SuggestedRetryDelay: null),
[ErrorCodes.InvalidConfiguration] = new(
ErrorCodes.InvalidConfiguration,
ErrorCategory.ValidationError,
"Invalid configuration",
"Source or job configuration is invalid. Review and correct the configuration settings.",
IsRetryable: false,
SuggestedRetryDelay: null),
[ErrorCodes.SchemaValidationFailed] = new(
ErrorCodes.SchemaValidationFailed,
ErrorCategory.ValidationError,
"Schema validation failed",
"Input data failed schema validation. Ensure data conforms to the expected schema.",
IsRetryable: false,
SuggestedRetryDelay: null),
[ErrorCodes.MissingRequiredField] = new(
ErrorCodes.MissingRequiredField,
ErrorCategory.ValidationError,
"Missing required field",
"A required field is missing from the input. Provide all required fields.",
IsRetryable: false,
SuggestedRetryDelay: null),
[ErrorCodes.ValidationFailed] = new(
ErrorCodes.ValidationFailed,
ErrorCategory.ValidationError,
"Validation failed",
"Input validation failed. Review the error details and correct the input.",
IsRetryable: false,
SuggestedRetryDelay: null),
// Upstream errors
[ErrorCodes.RegistryError] = new(
ErrorCodes.RegistryError,
ErrorCategory.UpstreamError,
"Container registry error",
"The container registry returned an error. Check registry status and logs for details.",
IsRetryable: true,
SuggestedRetryDelay: TimeSpan.FromMinutes(5)),
[ErrorCodes.AdvisoryFeedError] = new(
ErrorCodes.AdvisoryFeedError,
ErrorCategory.UpstreamError,
"Advisory feed error",
"Error fetching from advisory feed. Check feed URL and authentication. May be temporary.",
IsRetryable: true,
SuggestedRetryDelay: TimeSpan.FromMinutes(15)),
[ErrorCodes.DatabaseError] = new(
ErrorCodes.DatabaseError,
ErrorCategory.UpstreamError,
"Database error",
"Database operation failed. Check database connectivity and status.",
IsRetryable: true,
SuggestedRetryDelay: TimeSpan.FromMinutes(1)),
[ErrorCodes.ExternalServiceError] = new(
ErrorCodes.ExternalServiceError,
ErrorCategory.UpstreamError,
"External service error",
"An external service dependency failed. Check service status and connectivity.",
IsRetryable: true,
SuggestedRetryDelay: TimeSpan.FromMinutes(5)),
// Internal errors
[ErrorCodes.InternalError] = new(
ErrorCodes.InternalError,
ErrorCategory.InternalError,
"Internal processing error",
"An internal error occurred. This may indicate a bug. Please report if persistent.",
IsRetryable: false,
SuggestedRetryDelay: null),
[ErrorCodes.StateCorruption] = new(
ErrorCodes.StateCorruption,
ErrorCategory.InternalError,
"State corruption detected",
"Internal state corruption detected. Manual intervention may be required.",
IsRetryable: false,
SuggestedRetryDelay: null),
[ErrorCodes.ProcessingError] = new(
ErrorCodes.ProcessingError,
ErrorCategory.InternalError,
"Processing error",
"Error during job processing. Review job payload and configuration.",
IsRetryable: false,
SuggestedRetryDelay: null),
[ErrorCodes.UnexpectedError] = new(
ErrorCodes.UnexpectedError,
ErrorCategory.InternalError,
"Unexpected error",
"An unexpected error occurred. This may indicate a bug. Please report with error details.",
IsRetryable: false,
SuggestedRetryDelay: null),
// Conflict errors
[ErrorCodes.DuplicateJob] = new(
ErrorCodes.DuplicateJob,
ErrorCategory.Conflict,
"Duplicate job detected",
"A job with the same idempotency key already exists. This is expected for retry scenarios.",
IsRetryable: false,
SuggestedRetryDelay: null),
[ErrorCodes.VersionMismatch] = new(
ErrorCodes.VersionMismatch,
ErrorCategory.Conflict,
"Version mismatch",
"Resource version conflict detected. Refresh and retry the operation.",
IsRetryable: true,
SuggestedRetryDelay: TimeSpan.FromSeconds(5)),
[ErrorCodes.ConcurrentModification] = new(
ErrorCodes.ConcurrentModification,
ErrorCategory.Conflict,
"Concurrent modification",
"Resource was modified concurrently. Refresh state and retry.",
IsRetryable: true,
SuggestedRetryDelay: TimeSpan.FromSeconds(5)),
[ErrorCodes.ConflictError] = new(
ErrorCodes.ConflictError,
ErrorCategory.Conflict,
"Resource conflict",
"A resource conflict occurred. Check for concurrent operations.",
IsRetryable: true,
SuggestedRetryDelay: TimeSpan.FromSeconds(10)),
// Canceled errors
[ErrorCodes.UserCanceled] = new(
ErrorCodes.UserCanceled,
ErrorCategory.Canceled,
"Canceled by user",
"Operation was canceled by user request. No action required unless retry is desired.",
IsRetryable: false,
SuggestedRetryDelay: null),
[ErrorCodes.SystemCanceled] = new(
ErrorCodes.SystemCanceled,
ErrorCategory.Canceled,
"Canceled by system",
"Operation was canceled by the system (e.g., shutdown, quota). May be automatically rescheduled.",
IsRetryable: true,
SuggestedRetryDelay: TimeSpan.FromMinutes(5)),
[ErrorCodes.TimeoutCanceled] = new(
ErrorCodes.TimeoutCanceled,
ErrorCategory.Canceled,
"Canceled due to timeout",
"Operation exceeded its time limit. Consider increasing timeout or optimizing the operation.",
IsRetryable: true,
SuggestedRetryDelay: TimeSpan.FromMinutes(2)),
[ErrorCodes.OperationCanceled] = new(
ErrorCodes.OperationCanceled,
ErrorCategory.Canceled,
"Operation canceled",
"The operation was canceled. Check cancellation source for details.",
IsRetryable: false,
SuggestedRetryDelay: null)
};
/// <inheritdoc />
public ClassifiedError Classify(Exception exception)
{
ArgumentNullException.ThrowIfNull(exception);
return exception switch
{
OperationCanceledException => KnownErrors[ErrorCodes.OperationCanceled],
TimeoutException => KnownErrors[ErrorCodes.NetworkTimeout],
HttpRequestException httpEx => ClassifyHttpException(httpEx),
_ when exception.Message.Contains("connection refused", StringComparison.OrdinalIgnoreCase)
=> KnownErrors[ErrorCodes.ConnectionRefused],
_ when exception.Message.Contains("DNS", StringComparison.OrdinalIgnoreCase)
=> KnownErrors[ErrorCodes.DnsResolutionFailed],
_ when exception.Message.Contains("timeout", StringComparison.OrdinalIgnoreCase)
=> KnownErrors[ErrorCodes.NetworkTimeout],
_ when exception.Message.Contains("certificate", StringComparison.OrdinalIgnoreCase)
=> KnownErrors[ErrorCodes.CertificateError],
_ when exception.Message.Contains("unauthorized", StringComparison.OrdinalIgnoreCase)
=> KnownErrors[ErrorCodes.AuthenticationFailed],
_ when exception.Message.Contains("forbidden", StringComparison.OrdinalIgnoreCase)
=> KnownErrors[ErrorCodes.InsufficientPermissions],
_ => new ClassifiedError(
ErrorCodes.UnexpectedError,
ErrorCategory.InternalError,
exception.GetType().Name,
$"Unexpected error: {exception.Message}. Review stack trace for details.",
IsRetryable: false,
SuggestedRetryDelay: null)
};
}
/// <inheritdoc />
public ClassifiedError Classify(string errorCode, string message)
{
ArgumentException.ThrowIfNullOrWhiteSpace(errorCode);
if (KnownErrors.TryGetValue(errorCode, out var known))
{
return known;
}
// Try to infer from error code prefix
var category = errorCode switch
{
_ when errorCode.StartsWith("ORCH-TRN-", StringComparison.Ordinal) => ErrorCategory.Transient,
_ when errorCode.StartsWith("ORCH-NF-", StringComparison.Ordinal) => ErrorCategory.NotFound,
_ when errorCode.StartsWith("ORCH-AUTH-", StringComparison.Ordinal) => ErrorCategory.AuthFailure,
_ when errorCode.StartsWith("ORCH-RL-", StringComparison.Ordinal) => ErrorCategory.RateLimited,
_ when errorCode.StartsWith("ORCH-VAL-", StringComparison.Ordinal) => ErrorCategory.ValidationError,
_ when errorCode.StartsWith("ORCH-UP-", StringComparison.Ordinal) => ErrorCategory.UpstreamError,
_ when errorCode.StartsWith("ORCH-INT-", StringComparison.Ordinal) => ErrorCategory.InternalError,
_ when errorCode.StartsWith("ORCH-CON-", StringComparison.Ordinal) => ErrorCategory.Conflict,
_ when errorCode.StartsWith("ORCH-CAN-", StringComparison.Ordinal) => ErrorCategory.Canceled,
_ => ErrorCategory.Unknown
};
var isRetryable = category is ErrorCategory.Transient or ErrorCategory.RateLimited or ErrorCategory.UpstreamError;
return new ClassifiedError(
errorCode,
category,
message,
"Unknown error code. Review the error message for details.",
isRetryable,
isRetryable ? TimeSpan.FromMinutes(5) : null);
}
/// <inheritdoc />
public ClassifiedError ClassifyHttpError(int statusCode, string? message)
{
return statusCode switch
{
400 => KnownErrors[ErrorCodes.ValidationFailed],
401 => KnownErrors[ErrorCodes.AuthenticationFailed],
403 => KnownErrors[ErrorCodes.InsufficientPermissions],
404 => KnownErrors[ErrorCodes.ResourceNotFound],
408 => KnownErrors[ErrorCodes.NetworkTimeout],
409 => KnownErrors[ErrorCodes.ConflictError],
429 => KnownErrors[ErrorCodes.RateLimited],
500 => KnownErrors[ErrorCodes.InternalError],
502 => KnownErrors[ErrorCodes.ExternalServiceError],
503 => KnownErrors[ErrorCodes.ServiceUnavailable],
504 => KnownErrors[ErrorCodes.GatewayTimeout],
_ when statusCode >= 400 && statusCode < 500 => new ClassifiedError(
$"HTTP-{statusCode}",
ErrorCategory.ValidationError,
message ?? $"HTTP {statusCode} error",
"Client error. Review request parameters.",
IsRetryable: false,
SuggestedRetryDelay: null),
_ when statusCode >= 500 => new ClassifiedError(
$"HTTP-{statusCode}",
ErrorCategory.UpstreamError,
message ?? $"HTTP {statusCode} error",
"Server error. May be transient; retry with backoff.",
IsRetryable: true,
SuggestedRetryDelay: TimeSpan.FromMinutes(2)),
_ => new ClassifiedError(
$"HTTP-{statusCode}",
ErrorCategory.Unknown,
message ?? $"HTTP {statusCode}",
"Unexpected HTTP status. Review response for details.",
IsRetryable: false,
SuggestedRetryDelay: null)
};
}
private ClassifiedError ClassifyHttpException(HttpRequestException ex)
{
if (ex.StatusCode.HasValue)
{
return ClassifyHttpError((int)ex.StatusCode.Value, ex.Message);
}
// No status code - likely a connection error
return ex.Message switch
{
_ when ex.Message.Contains("connection refused", StringComparison.OrdinalIgnoreCase)
=> KnownErrors[ErrorCodes.ConnectionRefused],
_ when ex.Message.Contains("name resolution", StringComparison.OrdinalIgnoreCase)
=> KnownErrors[ErrorCodes.DnsResolutionFailed],
_ when ex.Message.Contains("SSL", StringComparison.OrdinalIgnoreCase) ||
ex.Message.Contains("TLS", StringComparison.OrdinalIgnoreCase)
=> KnownErrors[ErrorCodes.CertificateError],
_ => KnownErrors[ErrorCodes.ExternalServiceError]
};
}
}
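// Consumption sketch: how a failure handler might use the classifier to decide
// between retry and dead-lettering. The surrounding retry/dead-letter plumbing is
// assumed to exist elsewhere and is not shown.
public static class ErrorClassifierUsageSketch
{
    public static (bool Retry, TimeSpan Delay, string OperatorHint) OnJobFailure(
        IErrorClassifier classifier,
        Exception exception)
    {
        var classified = classifier.Classify(exception);

        // Transient, rate-limit and upstream errors come back retryable with a
        // suggested delay; validation/auth errors surface the remediation hint
        // so the operator can fix configuration instead of retrying blindly.
        return (
            classified.IsRetryable,
            classified.SuggestedRetryDelay ?? TimeSpan.Zero,
            classified.RemediationHint);
    }
}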


@@ -0,0 +1,221 @@
using StellaOps.Orchestrator.Core.Domain;
namespace StellaOps.Orchestrator.Core.DeadLetter;
/// <summary>
/// Repository for dead-letter entry persistence.
/// </summary>
public interface IDeadLetterRepository
{
/// <summary>Gets a dead-letter entry by ID.</summary>
Task<DeadLetterEntry?> GetByIdAsync(
string tenantId,
Guid entryId,
CancellationToken cancellationToken);
/// <summary>Gets a dead-letter entry by original job ID.</summary>
Task<DeadLetterEntry?> GetByOriginalJobIdAsync(
string tenantId,
Guid originalJobId,
CancellationToken cancellationToken);
/// <summary>Lists dead-letter entries with filtering and pagination.</summary>
Task<IReadOnlyList<DeadLetterEntry>> ListAsync(
string tenantId,
DeadLetterListOptions options,
CancellationToken cancellationToken);
/// <summary>Counts dead-letter entries with filtering.</summary>
Task<long> CountAsync(
string tenantId,
DeadLetterListOptions options,
CancellationToken cancellationToken);
/// <summary>Creates a new dead-letter entry.</summary>
Task CreateAsync(
DeadLetterEntry entry,
CancellationToken cancellationToken);
/// <summary>Updates an existing dead-letter entry.</summary>
Task<bool> UpdateAsync(
DeadLetterEntry entry,
CancellationToken cancellationToken);
/// <summary>Gets entries pending replay that are retryable.</summary>
Task<IReadOnlyList<DeadLetterEntry>> GetPendingRetryableAsync(
string tenantId,
int limit,
CancellationToken cancellationToken);
/// <summary>Gets entries by error code.</summary>
Task<IReadOnlyList<DeadLetterEntry>> GetByErrorCodeAsync(
string tenantId,
string errorCode,
DeadLetterStatus? status,
int limit,
CancellationToken cancellationToken);
/// <summary>Gets entries by category.</summary>
Task<IReadOnlyList<DeadLetterEntry>> GetByCategoryAsync(
string tenantId,
ErrorCategory category,
DeadLetterStatus? status,
int limit,
CancellationToken cancellationToken);
/// <summary>Gets aggregated statistics.</summary>
Task<DeadLetterStats> GetStatsAsync(
string tenantId,
CancellationToken cancellationToken);
/// <summary>Gets a summary of actionable entries grouped by error code.</summary>
Task<IReadOnlyList<DeadLetterSummary>> GetActionableSummaryAsync(
string tenantId,
int limit,
CancellationToken cancellationToken);
/// <summary>Marks expired entries.</summary>
Task<int> MarkExpiredAsync(
int batchLimit,
CancellationToken cancellationToken);
/// <summary>Purges old resolved/expired entries.</summary>
Task<int> PurgeOldEntriesAsync(
int retentionDays,
int batchLimit,
CancellationToken cancellationToken);
}
/// <summary>
/// Options for listing dead-letter entries.
/// </summary>
public sealed record DeadLetterListOptions(
DeadLetterStatus? Status = null,
ErrorCategory? Category = null,
string? JobType = null,
string? ErrorCode = null,
Guid? SourceId = null,
Guid? RunId = null,
bool? IsRetryable = null,
DateTimeOffset? CreatedAfter = null,
DateTimeOffset? CreatedBefore = null,
string? Cursor = null,
int Limit = 50,
bool Ascending = false);
/// <summary>
/// Aggregated dead-letter statistics.
/// </summary>
public sealed record DeadLetterStats(
long TotalEntries,
long PendingEntries,
long ReplayingEntries,
long ReplayedEntries,
long ResolvedEntries,
long ExhaustedEntries,
long ExpiredEntries,
long RetryableEntries,
IReadOnlyDictionary<ErrorCategory, long> ByCategory,
IReadOnlyDictionary<string, long> TopErrorCodes,
IReadOnlyDictionary<string, long> TopJobTypes);
/// <summary>
/// Summary of dead-letter entries grouped by error code.
/// </summary>
public sealed record DeadLetterSummary(
string ErrorCode,
ErrorCategory Category,
long EntryCount,
long RetryableCount,
DateTimeOffset OldestEntry,
string? SampleReason);
/// <summary>
/// Repository for replay audit records.
/// </summary>
public interface IReplayAuditRepository
{
/// <summary>Gets audit records for an entry.</summary>
Task<IReadOnlyList<ReplayAuditRecord>> GetByEntryAsync(
string tenantId,
Guid entryId,
CancellationToken cancellationToken);
/// <summary>Gets a specific audit record.</summary>
Task<ReplayAuditRecord?> GetByIdAsync(
string tenantId,
Guid auditId,
CancellationToken cancellationToken);
/// <summary>Creates a new audit record.</summary>
Task CreateAsync(
ReplayAuditRecord record,
CancellationToken cancellationToken);
/// <summary>Updates an audit record (completion).</summary>
Task<bool> UpdateAsync(
ReplayAuditRecord record,
CancellationToken cancellationToken);
/// <summary>Gets audit records for a new job ID (to find replay source).</summary>
Task<ReplayAuditRecord?> GetByNewJobIdAsync(
string tenantId,
Guid newJobId,
CancellationToken cancellationToken);
}
/// <summary>
/// Replay attempt audit record.
/// </summary>
public sealed record ReplayAuditRecord(
Guid AuditId,
string TenantId,
Guid EntryId,
int AttemptNumber,
bool Success,
Guid? NewJobId,
string? ErrorMessage,
string TriggeredBy,
DateTimeOffset TriggeredAt,
DateTimeOffset? CompletedAt,
string InitiatedBy)
{
/// <summary>Creates a new audit record for a replay attempt.</summary>
public static ReplayAuditRecord Create(
string tenantId,
Guid entryId,
int attemptNumber,
string triggeredBy,
string initiatedBy,
DateTimeOffset now) =>
new(
AuditId: Guid.NewGuid(),
TenantId: tenantId,
EntryId: entryId,
AttemptNumber: attemptNumber,
Success: false,
NewJobId: null,
ErrorMessage: null,
TriggeredBy: triggeredBy,
TriggeredAt: now,
CompletedAt: null,
InitiatedBy: initiatedBy);
/// <summary>Marks the replay as successful.</summary>
public ReplayAuditRecord Complete(Guid newJobId, DateTimeOffset now) =>
this with
{
Success = true,
NewJobId = newJobId,
CompletedAt = now
};
/// <summary>Marks the replay as failed.</summary>
public ReplayAuditRecord Fail(string errorMessage, DateTimeOffset now) =>
this with
{
Success = false,
ErrorMessage = errorMessage,
CompletedAt = now
};
}
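// Lifecycle sketch: one ReplayAuditRecord per attempt - created before the replay is
// executed, then completed or failed once the outcome is known. Persistence via
// IReplayAuditRepository is elided; the trigger and identity values are placeholders.
public static class ReplayAuditLifecycleSketch
{
    public static ReplayAuditRecord RecordAttempt(
        DeadLetterEntry entry,
        int attemptNumber,
        Guid? newJobId,
        string? errorMessage,
        TimeProvider timeProvider)
    {
        var audit = ReplayAuditRecord.Create(
            entry.TenantId,
            entry.EntryId,
            attemptNumber,
            triggeredBy: "manual",
            initiatedBy: "ops@example.internal", // placeholder identity
            now: timeProvider.GetUtcNow());

        return newJobId.HasValue
            ? audit.Complete(newJobId.Value, timeProvider.GetUtcNow())
            : audit.Fail(errorMessage ?? "unknown failure", timeProvider.GetUtcNow());
    }
}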


@@ -0,0 +1,472 @@
using Microsoft.Extensions.Logging;
using StellaOps.Orchestrator.Core.Domain;
namespace StellaOps.Orchestrator.Core.DeadLetter;
/// <summary>
/// Options for replay manager configuration.
/// </summary>
public sealed record ReplayManagerOptions(
/// <summary>Default maximum replay attempts.</summary>
int DefaultMaxReplayAttempts = 3,
/// <summary>Default retention period for dead-letter entries.</summary>
TimeSpan DefaultRetention = default,
/// <summary>Minimum delay between replay attempts.</summary>
TimeSpan MinReplayDelay = default,
/// <summary>Maximum batch size for bulk operations.</summary>
int MaxBatchSize = 100,
/// <summary>Enable automatic replay of retryable entries.</summary>
bool AutoReplayEnabled = false,
/// <summary>Delay before automatic replay.</summary>
TimeSpan AutoReplayDelay = default)
{
/// <summary>Default options.</summary>
public static ReplayManagerOptions Default => new(
DefaultMaxReplayAttempts: 3,
DefaultRetention: TimeSpan.FromDays(30),
MinReplayDelay: TimeSpan.FromMinutes(5),
MaxBatchSize: 100,
AutoReplayEnabled: false,
AutoReplayDelay: TimeSpan.FromMinutes(15));
}
/// <summary>
/// Result of a replay operation.
/// </summary>
public sealed record ReplayResult(
bool Success,
Guid? NewJobId,
string? ErrorMessage,
DeadLetterEntry UpdatedEntry);
/// <summary>
/// Result of a batch replay operation.
/// </summary>
public sealed record BatchReplayResult(
int Attempted,
int Succeeded,
int Failed,
IReadOnlyList<ReplayResult> Results);
/// <summary>
/// Manages dead-letter entry replay operations.
/// </summary>
public interface IReplayManager
{
/// <summary>Replays a single dead-letter entry.</summary>
Task<ReplayResult> ReplayAsync(
string tenantId,
Guid entryId,
string initiatedBy,
CancellationToken cancellationToken);
/// <summary>Replays multiple entries by ID.</summary>
Task<BatchReplayResult> ReplayBatchAsync(
string tenantId,
IReadOnlyList<Guid> entryIds,
string initiatedBy,
CancellationToken cancellationToken);
/// <summary>Replays all pending retryable entries matching criteria.</summary>
Task<BatchReplayResult> ReplayPendingAsync(
string tenantId,
string? errorCode,
ErrorCategory? category,
int maxCount,
string initiatedBy,
CancellationToken cancellationToken);
/// <summary>Resolves an entry without replay.</summary>
Task<DeadLetterEntry> ResolveAsync(
string tenantId,
Guid entryId,
string notes,
string resolvedBy,
CancellationToken cancellationToken);
/// <summary>Resolves multiple entries without replay.</summary>
Task<int> ResolveBatchAsync(
string tenantId,
IReadOnlyList<Guid> entryIds,
string notes,
string resolvedBy,
CancellationToken cancellationToken);
}
/// <summary>
/// Job creator interface for replay operations.
/// </summary>
public interface IJobCreator
{
/// <summary>Creates a new job from a dead-letter entry payload.</summary>
Task<Job> CreateFromReplayAsync(
string tenantId,
string jobType,
string payload,
string payloadDigest,
string idempotencyKey,
string? correlationId,
Guid replayOf,
string createdBy,
CancellationToken cancellationToken);
}
/// <summary>
/// Default replay manager implementation.
/// </summary>
public sealed class ReplayManager : IReplayManager
{
private readonly IDeadLetterRepository _deadLetterRepository;
private readonly IReplayAuditRepository _auditRepository;
private readonly IJobCreator _jobCreator;
private readonly IDeadLetterNotifier _notifier;
private readonly TimeProvider _timeProvider;
private readonly ReplayManagerOptions _options;
private readonly ILogger<ReplayManager> _logger;
public ReplayManager(
IDeadLetterRepository deadLetterRepository,
IReplayAuditRepository auditRepository,
IJobCreator jobCreator,
IDeadLetterNotifier notifier,
TimeProvider timeProvider,
ReplayManagerOptions options,
ILogger<ReplayManager> logger)
{
_deadLetterRepository = deadLetterRepository ?? throw new ArgumentNullException(nameof(deadLetterRepository));
_auditRepository = auditRepository ?? throw new ArgumentNullException(nameof(auditRepository));
_jobCreator = jobCreator ?? throw new ArgumentNullException(nameof(jobCreator));
_notifier = notifier ?? throw new ArgumentNullException(nameof(notifier));
_timeProvider = timeProvider ?? throw new ArgumentNullException(nameof(timeProvider));
_options = options ?? ReplayManagerOptions.Default;
_logger = logger ?? throw new ArgumentNullException(nameof(logger));
}
public async Task<ReplayResult> ReplayAsync(
string tenantId,
Guid entryId,
string initiatedBy,
CancellationToken cancellationToken)
{
ArgumentException.ThrowIfNullOrWhiteSpace(tenantId);
ArgumentException.ThrowIfNullOrWhiteSpace(initiatedBy);
var entry = await _deadLetterRepository.GetByIdAsync(tenantId, entryId, cancellationToken)
.ConfigureAwait(false);
if (entry is null)
{
throw new InvalidOperationException($"Dead-letter entry {entryId} not found.");
}
return await ReplayEntryAsync(entry, "manual", initiatedBy, cancellationToken).ConfigureAwait(false);
}
public async Task<BatchReplayResult> ReplayBatchAsync(
string tenantId,
IReadOnlyList<Guid> entryIds,
string initiatedBy,
CancellationToken cancellationToken)
{
ArgumentException.ThrowIfNullOrWhiteSpace(tenantId);
ArgumentNullException.ThrowIfNull(entryIds);
ArgumentException.ThrowIfNullOrWhiteSpace(initiatedBy);
if (entryIds.Count > _options.MaxBatchSize)
{
throw new ArgumentException($"Batch size {entryIds.Count} exceeds maximum {_options.MaxBatchSize}.");
}
var results = new List<ReplayResult>();
var succeeded = 0;
var failed = 0;
foreach (var entryId in entryIds)
{
try
{
var entry = await _deadLetterRepository.GetByIdAsync(tenantId, entryId, cancellationToken)
.ConfigureAwait(false);
if (entry is null)
{
results.Add(new ReplayResult(
Success: false,
NewJobId: null,
ErrorMessage: $"Entry {entryId} not found.",
UpdatedEntry: null!));
failed++;
continue;
}
var result = await ReplayEntryAsync(entry, "batch", initiatedBy, cancellationToken)
.ConfigureAwait(false);
results.Add(result);
if (result.Success)
succeeded++;
else
failed++;
}
catch (Exception ex)
{
_logger.LogError(ex, "Failed to replay entry {EntryId}", entryId);
results.Add(new ReplayResult(
Success: false,
NewJobId: null,
ErrorMessage: ex.Message,
UpdatedEntry: null!));
failed++;
}
}
return new BatchReplayResult(
Attempted: entryIds.Count,
Succeeded: succeeded,
Failed: failed,
Results: results);
}
public async Task<BatchReplayResult> ReplayPendingAsync(
string tenantId,
string? errorCode,
ErrorCategory? category,
int maxCount,
string initiatedBy,
CancellationToken cancellationToken)
{
ArgumentException.ThrowIfNullOrWhiteSpace(tenantId);
ArgumentException.ThrowIfNullOrWhiteSpace(initiatedBy);
var effectiveLimit = Math.Min(maxCount, _options.MaxBatchSize);
IReadOnlyList<DeadLetterEntry> entries;
if (!string.IsNullOrEmpty(errorCode))
{
entries = await _deadLetterRepository.GetByErrorCodeAsync(
tenantId, errorCode, DeadLetterStatus.Pending, effectiveLimit, cancellationToken)
.ConfigureAwait(false);
}
else if (category.HasValue)
{
entries = await _deadLetterRepository.GetByCategoryAsync(
tenantId, category.Value, DeadLetterStatus.Pending, effectiveLimit, cancellationToken)
.ConfigureAwait(false);
}
else
{
entries = await _deadLetterRepository.GetPendingRetryableAsync(tenantId, effectiveLimit, cancellationToken)
.ConfigureAwait(false);
}
var results = new List<ReplayResult>();
var succeeded = 0;
var failed = 0;
foreach (var entry in entries)
{
if (!entry.CanReplay)
{
continue;
}
try
{
var result = await ReplayEntryAsync(entry, "auto", initiatedBy, cancellationToken)
.ConfigureAwait(false);
results.Add(result);
if (result.Success)
succeeded++;
else
failed++;
}
catch (Exception ex)
{
_logger.LogError(ex, "Failed to replay entry {EntryId}", entry.EntryId);
results.Add(new ReplayResult(
Success: false,
NewJobId: null,
ErrorMessage: ex.Message,
UpdatedEntry: entry));
failed++;
}
}
return new BatchReplayResult(
Attempted: results.Count,
Succeeded: succeeded,
Failed: failed,
Results: results);
}
public async Task<DeadLetterEntry> ResolveAsync(
string tenantId,
Guid entryId,
string notes,
string resolvedBy,
CancellationToken cancellationToken)
{
ArgumentException.ThrowIfNullOrWhiteSpace(tenantId);
ArgumentException.ThrowIfNullOrWhiteSpace(resolvedBy);
var entry = await _deadLetterRepository.GetByIdAsync(tenantId, entryId, cancellationToken)
.ConfigureAwait(false);
if (entry is null)
{
throw new InvalidOperationException($"Dead-letter entry {entryId} not found.");
}
var now = _timeProvider.GetUtcNow();
var resolved = entry.Resolve(notes, resolvedBy, now);
await _deadLetterRepository.UpdateAsync(resolved, cancellationToken).ConfigureAwait(false);
_logger.LogInformation(
"Resolved dead-letter entry {EntryId} for job {JobId}. Notes: {Notes}",
entryId, entry.OriginalJobId, notes);
return resolved;
}
public async Task<int> ResolveBatchAsync(
string tenantId,
IReadOnlyList<Guid> entryIds,
string notes,
string resolvedBy,
CancellationToken cancellationToken)
{
ArgumentException.ThrowIfNullOrWhiteSpace(tenantId);
ArgumentNullException.ThrowIfNull(entryIds);
ArgumentException.ThrowIfNullOrWhiteSpace(resolvedBy);
var resolved = 0;
var now = _timeProvider.GetUtcNow();
foreach (var entryId in entryIds)
{
try
{
var entry = await _deadLetterRepository.GetByIdAsync(tenantId, entryId, cancellationToken)
.ConfigureAwait(false);
if (entry is null || entry.IsTerminal)
{
continue;
}
var resolvedEntry = entry.Resolve(notes, resolvedBy, now);
await _deadLetterRepository.UpdateAsync(resolvedEntry, cancellationToken).ConfigureAwait(false);
resolved++;
}
catch (Exception ex)
{
_logger.LogError(ex, "Failed to resolve entry {EntryId}", entryId);
}
}
return resolved;
}
private async Task<ReplayResult> ReplayEntryAsync(
DeadLetterEntry entry,
string triggeredBy,
string initiatedBy,
CancellationToken cancellationToken)
{
if (!entry.CanReplay)
{
return new ReplayResult(
Success: false,
NewJobId: null,
ErrorMessage: $"Entry cannot be replayed: status={entry.Status}, attempts={entry.ReplayAttempts}/{entry.MaxReplayAttempts}, retryable={entry.IsRetryable}",
UpdatedEntry: entry);
}
var now = _timeProvider.GetUtcNow();
// Mark entry as replaying
var replaying = entry.StartReplay(initiatedBy, now);
await _deadLetterRepository.UpdateAsync(replaying, cancellationToken).ConfigureAwait(false);
// Create audit record
var auditRecord = ReplayAuditRecord.Create(
entry.TenantId,
entry.EntryId,
replaying.ReplayAttempts,
triggeredBy,
initiatedBy,
now);
await _auditRepository.CreateAsync(auditRecord, cancellationToken).ConfigureAwait(false);
try
{
// Create new job with updated idempotency key
var newIdempotencyKey = $"{entry.IdempotencyKey}:replay:{replaying.ReplayAttempts}";
var newJob = await _jobCreator.CreateFromReplayAsync(
entry.TenantId,
entry.JobType,
entry.Payload,
entry.PayloadDigest,
newIdempotencyKey,
entry.CorrelationId,
entry.OriginalJobId,
initiatedBy,
cancellationToken).ConfigureAwait(false);
// Mark replay successful
now = _timeProvider.GetUtcNow();
var completed = replaying.CompleteReplay(newJob.JobId, initiatedBy, now);
await _deadLetterRepository.UpdateAsync(completed, cancellationToken).ConfigureAwait(false);
// Update audit record
var completedAudit = auditRecord.Complete(newJob.JobId, now);
await _auditRepository.UpdateAsync(completedAudit, cancellationToken).ConfigureAwait(false);
_logger.LogInformation(
"Replayed dead-letter entry {EntryId} as new job {NewJobId}",
entry.EntryId, newJob.JobId);
// Notify on success
await _notifier.NotifyReplaySuccessAsync(completed, newJob.JobId, cancellationToken)
.ConfigureAwait(false);
return new ReplayResult(
Success: true,
NewJobId: newJob.JobId,
ErrorMessage: null,
UpdatedEntry: completed);
}
catch (Exception ex)
{
_logger.LogError(ex, "Failed to replay entry {EntryId}", entry.EntryId);
// Mark replay failed
now = _timeProvider.GetUtcNow();
var failed = replaying.FailReplay(ex.Message, initiatedBy, now);
await _deadLetterRepository.UpdateAsync(failed, cancellationToken).ConfigureAwait(false);
// Update audit record
var failedAudit = auditRecord.Fail(ex.Message, now);
await _auditRepository.UpdateAsync(failedAudit, cancellationToken).ConfigureAwait(false);
// Notify on exhausted
if (failed.Status == DeadLetterStatus.Exhausted)
{
await _notifier.NotifyExhaustedAsync(failed, cancellationToken).ConfigureAwait(false);
}
return new ReplayResult(
Success: false,
NewJobId: null,
ErrorMessage: ex.Message,
UpdatedEntry: failed);
}
}
}
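// Operational sketch: a periodic drain that replays pending transient failures in
// small batches. How this gets scheduled (hosted service, cron job) is an assumption;
// the tenant id, batch size, and initiator string are placeholders.
public static class ReplayDrainSketch
{
    public static async Task<int> DrainOnceAsync(
        IReplayManager replayManager,
        string tenantId,
        ILogger logger,
        CancellationToken cancellationToken)
    {
        var result = await replayManager.ReplayPendingAsync(
            tenantId,
            errorCode: null,                    // no error-code filter
            category: ErrorCategory.Transient,  // only drain transient failures
            maxCount: 50,
            initiatedBy: "system:replay-drain",
            cancellationToken).ConfigureAwait(false);

        logger.LogInformation(
            "Replay drain finished: attempted={Attempted}, succeeded={Succeeded}, failed={Failed}",
            result.Attempted, result.Succeeded, result.Failed);

        return result.Succeeded;
    }
}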


@@ -0,0 +1,39 @@
namespace StellaOps.Orchestrator.Core.Domain;
/// <summary>
/// Represents an artifact produced by a job execution.
/// Artifacts are immutable outputs with content digests for provenance.
/// </summary>
public sealed record Artifact(
/// <summary>Unique artifact identifier.</summary>
Guid ArtifactId,
/// <summary>Tenant owning this artifact.</summary>
string TenantId,
/// <summary>Job that produced this artifact.</summary>
Guid JobId,
/// <summary>Run containing the producing job (if any).</summary>
Guid? RunId,
/// <summary>Artifact type (e.g., "sbom", "scan-result", "attestation", "log").</summary>
string ArtifactType,
/// <summary>Storage URI (e.g., "s3://bucket/path", "file:///local/path").</summary>
string Uri,
/// <summary>Content digest (SHA-256) for integrity verification.</summary>
string Digest,
/// <summary>MIME type (e.g., "application/json", "application/vnd.cyclonedx+json").</summary>
string? MimeType,
/// <summary>Artifact size in bytes.</summary>
long? SizeBytes,
/// <summary>When the artifact was created.</summary>
DateTimeOffset CreatedAt,
/// <summary>Optional metadata JSON blob.</summary>
string? Metadata);
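// Construction sketch: callers are expected to digest the stored bytes themselves.
// Lowercase hex SHA-256 mirrors the hashing convention used elsewhere in the module
// (an assumption); the artifact type and MIME type below are placeholder values.
public static class ArtifactSketch
{
    public static Artifact ForSbom(
        string tenantId,
        Guid jobId,
        Guid? runId,
        byte[] content,
        string storageUri, // e.g. "s3://bucket/path" - supplied by the artifact store
        TimeProvider timeProvider) =>
        new(
            ArtifactId: Guid.NewGuid(),
            TenantId: tenantId,
            JobId: jobId,
            RunId: runId,
            ArtifactType: "sbom",
            Uri: storageUri,
            Digest: Convert.ToHexString(System.Security.Cryptography.SHA256.HashData(content)).ToLowerInvariant(),
            MimeType: "application/vnd.cyclonedx+json",
            SizeBytes: content.LongLength,
            CreatedAt: timeProvider.GetUtcNow(),
            Metadata: null);
}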


@@ -0,0 +1,250 @@
namespace StellaOps.Orchestrator.Core.Domain;
/// <summary>
/// Represents an immutable audit log entry for orchestrator operations.
/// Captures who did what, when, and with what effect.
/// </summary>
public sealed record AuditEntry(
/// <summary>Unique audit entry identifier.</summary>
Guid EntryId,
/// <summary>Tenant owning this entry.</summary>
string TenantId,
/// <summary>Type of audited event.</summary>
AuditEventType EventType,
/// <summary>Resource type being audited (job, run, source, quota, etc.).</summary>
string ResourceType,
/// <summary>Resource identifier being audited.</summary>
Guid ResourceId,
/// <summary>Actor who performed the action.</summary>
string ActorId,
/// <summary>Actor type (user, system, worker, api-key).</summary>
ActorType ActorType,
/// <summary>IP address of the actor (if applicable).</summary>
string? ActorIp,
/// <summary>User agent string (if applicable).</summary>
string? UserAgent,
/// <summary>HTTP method used (if applicable).</summary>
string? HttpMethod,
/// <summary>Request path (if applicable).</summary>
string? RequestPath,
/// <summary>State before the change (JSON).</summary>
string? OldState,
/// <summary>State after the change (JSON).</summary>
string? NewState,
/// <summary>Human-readable description of the change.</summary>
string Description,
/// <summary>Correlation ID for distributed tracing.</summary>
string? CorrelationId,
/// <summary>SHA-256 hash of the previous entry for chain integrity.</summary>
string? PreviousEntryHash,
/// <summary>SHA-256 hash of this entry's content for integrity.</summary>
string ContentHash,
/// <summary>Sequence number within the tenant's audit stream.</summary>
long SequenceNumber,
/// <summary>When the event occurred.</summary>
DateTimeOffset OccurredAt,
/// <summary>Optional metadata JSON blob.</summary>
string? Metadata)
{
/// <summary>
/// Creates a new audit entry with computed hash.
/// </summary>
public static AuditEntry Create(
string tenantId,
AuditEventType eventType,
string resourceType,
Guid resourceId,
string actorId,
ActorType actorType,
string description,
string? oldState = null,
string? newState = null,
string? actorIp = null,
string? userAgent = null,
string? httpMethod = null,
string? requestPath = null,
string? correlationId = null,
string? previousEntryHash = null,
long sequenceNumber = 0,
string? metadata = null)
{
var entryId = Guid.NewGuid();
var occurredAt = DateTimeOffset.UtcNow;
// Compute content hash from entry data
var contentToHash = $"{entryId}|{tenantId}|{eventType}|{resourceType}|{resourceId}|{actorId}|{actorType}|{description}|{oldState}|{newState}|{occurredAt:O}|{sequenceNumber}";
var contentHash = ComputeSha256(contentToHash);
return new AuditEntry(
EntryId: entryId,
TenantId: tenantId,
EventType: eventType,
ResourceType: resourceType,
ResourceId: resourceId,
ActorId: actorId,
ActorType: actorType,
ActorIp: actorIp,
UserAgent: userAgent,
HttpMethod: httpMethod,
RequestPath: requestPath,
OldState: oldState,
NewState: newState,
Description: description,
CorrelationId: correlationId,
PreviousEntryHash: previousEntryHash,
ContentHash: contentHash,
SequenceNumber: sequenceNumber,
OccurredAt: occurredAt,
Metadata: metadata);
}
/// <summary>
/// Verifies the integrity of this entry's content hash.
/// </summary>
public bool VerifyIntegrity()
{
var contentToHash = $"{EntryId}|{TenantId}|{EventType}|{ResourceType}|{ResourceId}|{ActorId}|{ActorType}|{Description}|{OldState}|{NewState}|{OccurredAt:O}|{SequenceNumber}";
var computed = ComputeSha256(contentToHash);
return string.Equals(ContentHash, computed, StringComparison.OrdinalIgnoreCase);
}
/// <summary>
/// Verifies the chain link to the previous entry.
/// </summary>
public bool VerifyChainLink(AuditEntry? previousEntry)
{
if (previousEntry is null)
{
return PreviousEntryHash is null || SequenceNumber == 1;
}
return string.Equals(PreviousEntryHash, previousEntry.ContentHash, StringComparison.OrdinalIgnoreCase);
}
private static string ComputeSha256(string content)
{
var bytes = System.Text.Encoding.UTF8.GetBytes(content);
var hash = System.Security.Cryptography.SHA256.HashData(bytes);
return Convert.ToHexString(hash).ToLowerInvariant();
}
}
/// <summary>
/// Types of auditable events in the orchestrator.
/// </summary>
public enum AuditEventType
{
// Job lifecycle events
JobCreated = 100,
JobScheduled = 101,
JobLeased = 102,
JobCompleted = 103,
JobFailed = 104,
JobCanceled = 105,
JobRetried = 106,
// Run lifecycle events
RunCreated = 200,
RunStarted = 201,
RunCompleted = 202,
RunFailed = 203,
RunCanceled = 204,
// Source management events
SourceCreated = 300,
SourceUpdated = 301,
SourcePaused = 302,
SourceResumed = 303,
SourceDeleted = 304,
// Quota management events
QuotaCreated = 400,
QuotaUpdated = 401,
QuotaPaused = 402,
QuotaResumed = 403,
QuotaDeleted = 404,
// SLO management events
SloCreated = 500,
SloUpdated = 501,
SloEnabled = 502,
SloDisabled = 503,
SloDeleted = 504,
SloAlertTriggered = 505,
SloAlertAcknowledged = 506,
SloAlertResolved = 507,
// Dead-letter events
DeadLetterCreated = 600,
DeadLetterReplayed = 601,
DeadLetterResolved = 602,
DeadLetterExpired = 603,
// Backfill events
BackfillCreated = 700,
BackfillStarted = 701,
BackfillCompleted = 702,
BackfillFailed = 703,
BackfillCanceled = 704,
// Ledger events
LedgerExportRequested = 800,
LedgerExportCompleted = 801,
LedgerExportFailed = 802,
// Worker events
WorkerClaimed = 900,
WorkerHeartbeat = 901,
WorkerProgressReported = 902,
WorkerCompleted = 903,
// Security events
AuthenticationSuccess = 1000,
AuthenticationFailure = 1001,
AuthorizationDenied = 1002,
ApiKeyCreated = 1003,
ApiKeyRevoked = 1004
}
/// <summary>
/// Types of actors that can perform auditable actions.
/// </summary>
public enum ActorType
{
/// <summary>Human user via UI or API.</summary>
User = 0,
/// <summary>System-initiated action (scheduler, background job).</summary>
System = 1,
/// <summary>Worker process.</summary>
Worker = 2,
/// <summary>API key authentication.</summary>
ApiKey = 3,
/// <summary>Service-to-service call.</summary>
Service = 4,
/// <summary>Unknown or unidentified actor.</summary>
Unknown = 99
}
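The ContentHash, PreviousEntryHash and SequenceNumber fields form a per-tenant hash chain, so an auditor can detect tampering by walking entries in sequence order. A minimal verification sketch using only the members defined above (the AuditChainVerifier name is illustrative, not part of this module):
public static class AuditChainVerifier
{
    /// <summary>
    /// Walks entries ordered by SequenceNumber and returns the sequence number of the
    /// first entry whose content hash or chain link fails, or null if the chain is intact.
    /// </summary>
    public static long? FindFirstBreak(IReadOnlyList<AuditEntry> orderedEntries)
    {
        AuditEntry? previous = null;
        foreach (var entry in orderedEntries)
        {
            if (!entry.VerifyIntegrity() || !entry.VerifyChainLink(previous))
            {
                return entry.SequenceNumber;
            }
            previous = entry;
        }
        return null;
    }
}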


@@ -0,0 +1,429 @@
namespace StellaOps.Orchestrator.Core.Domain;
/// <summary>
/// Represents a request to backfill/reprocess events within a time window.
/// </summary>
public sealed record BackfillRequest(
/// <summary>Unique backfill request identifier.</summary>
Guid BackfillId,
/// <summary>Tenant this backfill applies to.</summary>
string TenantId,
/// <summary>Source to backfill (null if job-type scoped).</summary>
Guid? SourceId,
/// <summary>Job type to backfill (null if source-scoped).</summary>
string? JobType,
/// <summary>Normalized scope key.</summary>
string ScopeKey,
/// <summary>Current status of the backfill.</summary>
BackfillStatus Status,
/// <summary>Start of the time window to backfill (inclusive).</summary>
DateTimeOffset WindowStart,
/// <summary>End of the time window to backfill (exclusive).</summary>
DateTimeOffset WindowEnd,
/// <summary>Current processing position within the window.</summary>
DateTimeOffset? CurrentPosition,
/// <summary>Total events estimated in the window.</summary>
long? TotalEvents,
/// <summary>Events successfully processed.</summary>
long ProcessedEvents,
/// <summary>Events skipped due to duplicate suppression.</summary>
long SkippedEvents,
/// <summary>Events that failed processing.</summary>
long FailedEvents,
/// <summary>Number of events to process per batch.</summary>
int BatchSize,
/// <summary>Whether this is a dry-run (preview only, no changes).</summary>
bool DryRun,
/// <summary>Whether to force reprocessing (ignore duplicate suppression).</summary>
bool ForceReprocess,
/// <summary>Estimated duration for the backfill.</summary>
TimeSpan? EstimatedDuration,
/// <summary>Maximum allowed duration (safety limit).</summary>
TimeSpan? MaxDuration,
/// <summary>Results of safety validation checks.</summary>
BackfillSafetyChecks? SafetyChecks,
/// <summary>Reason for the backfill request.</summary>
string Reason,
/// <summary>Optional ticket reference for audit.</summary>
string? Ticket,
/// <summary>When the request was created.</summary>
DateTimeOffset CreatedAt,
/// <summary>When processing started.</summary>
DateTimeOffset? StartedAt,
/// <summary>When processing completed.</summary>
DateTimeOffset? CompletedAt,
/// <summary>Actor who created the request.</summary>
string CreatedBy,
/// <summary>Actor who last modified the request.</summary>
string UpdatedBy,
/// <summary>Error message if failed.</summary>
string? ErrorMessage)
{
/// <summary>
/// Window duration.
/// </summary>
public TimeSpan WindowDuration => WindowEnd - WindowStart;
/// <summary>
/// Progress percentage (0-100).
/// </summary>
public double ProgressPercent => TotalEvents > 0
? Math.Round((double)(ProcessedEvents + SkippedEvents + FailedEvents) / TotalEvents.Value * 100, 2)
: 0;
/// <summary>
/// Whether the backfill is in a terminal state.
/// </summary>
public bool IsTerminal => Status is BackfillStatus.Completed or BackfillStatus.Failed or BackfillStatus.Canceled;
/// <summary>
/// Creates a new backfill request.
/// </summary>
public static BackfillRequest Create(
string tenantId,
Guid? sourceId,
string? jobType,
DateTimeOffset windowStart,
DateTimeOffset windowEnd,
string reason,
string createdBy,
int batchSize = 100,
bool dryRun = false,
bool forceReprocess = false,
string? ticket = null,
TimeSpan? maxDuration = null)
{
if (windowEnd <= windowStart)
throw new ArgumentException("Window end must be after window start.", nameof(windowEnd));
if (batchSize <= 0 || batchSize > 10000)
throw new ArgumentOutOfRangeException(nameof(batchSize), "Batch size must be between 1 and 10000.");
var scopeKey = (sourceId, jobType) switch
{
(Guid s, string j) when !string.IsNullOrEmpty(j) => Watermark.CreateScopeKey(s, j),
(Guid s, _) => Watermark.CreateScopeKey(s),
(_, string j) when !string.IsNullOrEmpty(j) => Watermark.CreateScopeKey(j),
_ => throw new ArgumentException("Either sourceId or jobType must be specified.")
};
var now = DateTimeOffset.UtcNow;
return new BackfillRequest(
BackfillId: Guid.NewGuid(),
TenantId: tenantId,
SourceId: sourceId,
JobType: jobType,
ScopeKey: scopeKey,
Status: BackfillStatus.Pending,
WindowStart: windowStart,
WindowEnd: windowEnd,
CurrentPosition: null,
TotalEvents: null,
ProcessedEvents: 0,
SkippedEvents: 0,
FailedEvents: 0,
BatchSize: batchSize,
DryRun: dryRun,
ForceReprocess: forceReprocess,
EstimatedDuration: null,
MaxDuration: maxDuration,
SafetyChecks: null,
Reason: reason,
Ticket: ticket,
CreatedAt: now,
StartedAt: null,
CompletedAt: null,
CreatedBy: createdBy,
UpdatedBy: createdBy,
ErrorMessage: null);
}
/// <summary>
/// Transitions to validating status.
/// </summary>
public BackfillRequest StartValidation(string updatedBy)
{
if (Status != BackfillStatus.Pending)
throw new InvalidOperationException($"Cannot start validation from status {Status}.");
return this with
{
Status = BackfillStatus.Validating,
UpdatedBy = updatedBy
};
}
/// <summary>
/// Records safety check results.
/// </summary>
public BackfillRequest WithSafetyChecks(BackfillSafetyChecks checks, long? totalEvents, TimeSpan? estimatedDuration, string updatedBy)
{
return this with
{
SafetyChecks = checks,
TotalEvents = totalEvents,
EstimatedDuration = estimatedDuration,
UpdatedBy = updatedBy
};
}
/// <summary>
/// Transitions to running status.
/// </summary>
public BackfillRequest Start(string updatedBy)
{
if (Status != BackfillStatus.Validating)
throw new InvalidOperationException($"Cannot start from status {Status}.");
if (SafetyChecks?.HasBlockingIssues == true)
throw new InvalidOperationException("Cannot start backfill with blocking safety issues.");
return this with
{
Status = BackfillStatus.Running,
StartedAt = DateTimeOffset.UtcNow,
CurrentPosition = WindowStart,
UpdatedBy = updatedBy
};
}
/// <summary>
/// Updates progress after processing a batch.
/// </summary>
public BackfillRequest UpdateProgress(
DateTimeOffset newPosition,
long processed,
long skipped,
long failed,
string updatedBy)
{
if (Status != BackfillStatus.Running)
throw new InvalidOperationException($"Cannot update progress in status {Status}.");
return this with
{
CurrentPosition = newPosition,
ProcessedEvents = ProcessedEvents + processed,
SkippedEvents = SkippedEvents + skipped,
FailedEvents = FailedEvents + failed,
UpdatedBy = updatedBy
};
}
/// <summary>
/// Pauses the backfill.
/// </summary>
public BackfillRequest Pause(string updatedBy)
{
if (Status != BackfillStatus.Running)
throw new InvalidOperationException($"Cannot pause from status {Status}.");
return this with
{
Status = BackfillStatus.Paused,
UpdatedBy = updatedBy
};
}
/// <summary>
/// Resumes a paused backfill.
/// </summary>
public BackfillRequest Resume(string updatedBy)
{
if (Status != BackfillStatus.Paused)
throw new InvalidOperationException($"Cannot resume from status {Status}.");
return this with
{
Status = BackfillStatus.Running,
UpdatedBy = updatedBy
};
}
/// <summary>
/// Completes the backfill successfully.
/// </summary>
public BackfillRequest Complete(string updatedBy)
{
if (Status != BackfillStatus.Running)
throw new InvalidOperationException($"Cannot complete from status {Status}.");
return this with
{
Status = BackfillStatus.Completed,
CompletedAt = DateTimeOffset.UtcNow,
CurrentPosition = WindowEnd,
UpdatedBy = updatedBy
};
}
/// <summary>
/// Fails the backfill with an error.
/// </summary>
public BackfillRequest Fail(string error, string updatedBy)
{
return this with
{
Status = BackfillStatus.Failed,
CompletedAt = DateTimeOffset.UtcNow,
ErrorMessage = error,
UpdatedBy = updatedBy
};
}
/// <summary>
/// Cancels the backfill.
/// </summary>
public BackfillRequest Cancel(string updatedBy)
{
if (IsTerminal)
throw new InvalidOperationException($"Cannot cancel from terminal status {Status}.");
return this with
{
Status = BackfillStatus.Canceled,
CompletedAt = DateTimeOffset.UtcNow,
UpdatedBy = updatedBy
};
}
}
/// <summary>
/// Status of a backfill request.
/// </summary>
public enum BackfillStatus
{
/// <summary>Request created, awaiting validation.</summary>
Pending,
/// <summary>Running safety validations.</summary>
Validating,
/// <summary>Actively processing events.</summary>
Running,
/// <summary>Temporarily paused.</summary>
Paused,
/// <summary>Successfully completed.</summary>
Completed,
/// <summary>Failed with error.</summary>
Failed,
/// <summary>Canceled by operator.</summary>
Canceled
}
/// <summary>
/// Results of backfill safety validation checks.
/// </summary>
public sealed record BackfillSafetyChecks(
/// <summary>Whether the source exists and is accessible.</summary>
bool SourceExists,
/// <summary>Whether there are overlapping active backfills.</summary>
bool HasOverlappingBackfill,
/// <summary>Whether the window is within retention period.</summary>
bool WithinRetention,
/// <summary>Whether the estimated event count is within limits.</summary>
bool WithinEventLimit,
/// <summary>Whether estimated duration is within max duration.</summary>
bool WithinDurationLimit,
/// <summary>Whether required quotas are available.</summary>
bool QuotaAvailable,
/// <summary>Warning messages (non-blocking).</summary>
IReadOnlyList<string> Warnings,
/// <summary>Error messages (blocking).</summary>
IReadOnlyList<string> Errors)
{
/// <summary>
/// Whether there are any blocking issues.
/// </summary>
public bool HasBlockingIssues => !SourceExists || HasOverlappingBackfill || !WithinRetention
|| !WithinEventLimit || !WithinDurationLimit || Errors.Count > 0;
/// <summary>
/// Whether the backfill is safe to proceed.
/// </summary>
public bool IsSafe => !HasBlockingIssues;
/// <summary>
/// Creates successful safety checks with no issues.
/// </summary>
public static BackfillSafetyChecks AllPassed() => new(
SourceExists: true,
HasOverlappingBackfill: false,
WithinRetention: true,
WithinEventLimit: true,
WithinDurationLimit: true,
QuotaAvailable: true,
Warnings: [],
Errors: []);
}
/// <summary>
/// Preview result for dry-run backfill.
/// </summary>
public sealed record BackfillPreview(
/// <summary>Scope being backfilled.</summary>
string ScopeKey,
/// <summary>Start of the time window to backfill (inclusive).</summary>
DateTimeOffset WindowStart,
/// <summary>End of the time window to backfill (exclusive).</summary>
DateTimeOffset WindowEnd,
/// <summary>Estimated total events in window.</summary>
long EstimatedEvents,
/// <summary>Events that would be skipped (already processed).</summary>
long SkippedEvents,
/// <summary>Events that would be processed.</summary>
long ProcessableEvents,
/// <summary>Estimated duration.</summary>
TimeSpan EstimatedDuration,
/// <summary>Number of batches required.</summary>
int EstimatedBatches,
/// <summary>Safety validation results.</summary>
BackfillSafetyChecks SafetyChecks,
/// <summary>Sample of event keys that would be processed.</summary>
IReadOnlyList<string> SampleEventKeys);
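Taken together, the factory and transition methods above give a backfill a strict lifecycle (Pending → Validating → Running → Completed/Failed/Canceled, with Paused as a detour). A happy-path sketch; tenant, actor and window values are placeholders:
var actor = "ops@example.internal";
var request = BackfillRequest.Create(
    tenantId: "tenant-a",
    sourceId: Guid.NewGuid(),        // placeholder source
    jobType: null,
    windowStart: DateTimeOffset.UtcNow.AddDays(-1),
    windowEnd: DateTimeOffset.UtcNow,
    reason: "Reprocess after connector fix",
    createdBy: actor);
request = request.StartValidation(actor);
request = request.WithSafetyChecks(
    BackfillSafetyChecks.AllPassed(),
    totalEvents: 5_000,
    estimatedDuration: TimeSpan.FromMinutes(10),
    updatedBy: actor);
request = request.Start(actor);
// Each processed batch advances the window position and the counters behind ProgressPercent.
request = request.UpdateProgress(
    newPosition: request.WindowStart.AddHours(6),
    processed: 1_200,
    skipped: 30,
    failed: 0,
    updatedBy: "backfill-worker-1");
request = request.Complete("backfill-worker-1");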


@@ -0,0 +1,42 @@
namespace StellaOps.Orchestrator.Core.Domain;
/// <summary>
/// Represents a dependency edge in a job DAG (Directed Acyclic Graph).
/// The child job cannot start until the parent job succeeds.
/// </summary>
public sealed record DagEdge(
/// <summary>Unique edge identifier.</summary>
Guid EdgeId,
/// <summary>Tenant owning this edge.</summary>
string TenantId,
/// <summary>Run containing these jobs.</summary>
Guid RunId,
/// <summary>Parent job ID (must complete first).</summary>
Guid ParentJobId,
/// <summary>Child job ID (depends on parent).</summary>
Guid ChildJobId,
/// <summary>Edge type (e.g., "success", "always", "failure").</summary>
string EdgeType,
/// <summary>When this edge was created.</summary>
DateTimeOffset CreatedAt);
/// <summary>
/// Edge types defining dependency semantics.
/// </summary>
public static class DagEdgeTypes
{
/// <summary>Child runs only if parent succeeds.</summary>
public const string Success = "success";
/// <summary>Child runs regardless of parent outcome.</summary>
public const string Always = "always";
/// <summary>Child runs only if parent fails.</summary>
public const string Failure = "failure";
}
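The edge types above are enough to express fan-out/fan-in dependencies; for illustration, a small DAG for a single run (all identifiers are placeholders):
// scan -> (analyze-os, analyze-lang) -> report; the report job runs even if an analyzer fails.
var runId = Guid.NewGuid();
Guid scan = Guid.NewGuid(), analyzeOs = Guid.NewGuid(), analyzeLang = Guid.NewGuid(), report = Guid.NewGuid();
DagEdge Edge(Guid parent, Guid child, string type) =>
    new(Guid.NewGuid(), "tenant-a", runId, parent, child, type, DateTimeOffset.UtcNow);
var edges = new[]
{
    Edge(scan, analyzeOs, DagEdgeTypes.Success),
    Edge(scan, analyzeLang, DagEdgeTypes.Success),
    Edge(analyzeOs, report, DagEdgeTypes.Always),
    Edge(analyzeLang, report, DagEdgeTypes.Always),
};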


@@ -0,0 +1,292 @@
namespace StellaOps.Orchestrator.Core.Domain;
/// <summary>
/// Represents a job that has been moved to the dead-letter store after exhausting retries
/// or encountering a non-retryable error.
/// </summary>
public sealed record DeadLetterEntry(
/// <summary>Unique dead-letter entry identifier.</summary>
Guid EntryId,
/// <summary>Tenant owning this entry.</summary>
string TenantId,
/// <summary>Original job that failed.</summary>
Guid OriginalJobId,
/// <summary>Run the job belonged to (if any).</summary>
Guid? RunId,
/// <summary>Source the job was processing (if any).</summary>
Guid? SourceId,
/// <summary>Job type (e.g., "scan.image", "advisory.nvd").</summary>
string JobType,
/// <summary>Job payload JSON (inputs, parameters).</summary>
string Payload,
/// <summary>SHA-256 digest of the payload.</summary>
string PayloadDigest,
/// <summary>Idempotency key from original job.</summary>
string IdempotencyKey,
/// <summary>Correlation ID for distributed tracing.</summary>
string? CorrelationId,
/// <summary>Current entry status.</summary>
DeadLetterStatus Status,
/// <summary>Classified error code.</summary>
string ErrorCode,
/// <summary>Human-readable failure reason.</summary>
string FailureReason,
/// <summary>Suggested remediation hint for operators.</summary>
string? RemediationHint,
/// <summary>Error classification category.</summary>
ErrorCategory Category,
/// <summary>Whether this error is potentially retryable.</summary>
bool IsRetryable,
/// <summary>Number of attempts made by original job.</summary>
int OriginalAttempts,
/// <summary>Number of replay attempts from dead-letter.</summary>
int ReplayAttempts,
/// <summary>Maximum replay attempts allowed.</summary>
int MaxReplayAttempts,
/// <summary>When the job originally failed.</summary>
DateTimeOffset FailedAt,
/// <summary>When the entry was created in dead-letter store.</summary>
DateTimeOffset CreatedAt,
/// <summary>When the entry was last updated.</summary>
DateTimeOffset UpdatedAt,
/// <summary>When the entry expires and can be purged.</summary>
DateTimeOffset ExpiresAt,
/// <summary>When the entry was resolved (if applicable).</summary>
DateTimeOffset? ResolvedAt,
/// <summary>Resolution notes (if resolved).</summary>
string? ResolutionNotes,
/// <summary>Actor who created/submitted the original job.</summary>
string CreatedBy,
/// <summary>Actor who last updated the entry.</summary>
string UpdatedBy)
{
/// <summary>Default retention period for dead-letter entries.</summary>
public static readonly TimeSpan DefaultRetention = TimeSpan.FromDays(30);
/// <summary>Default maximum replay attempts.</summary>
public const int DefaultMaxReplayAttempts = 3;
/// <summary>Whether this entry is in a terminal state.</summary>
public bool IsTerminal => Status is DeadLetterStatus.Replayed
or DeadLetterStatus.Resolved
or DeadLetterStatus.Exhausted
or DeadLetterStatus.Expired;
/// <summary>Whether more replay attempts are allowed.</summary>
public bool CanReplay => !IsTerminal && IsRetryable && ReplayAttempts < MaxReplayAttempts;
/// <summary>Creates a new dead-letter entry from a failed job.</summary>
public static DeadLetterEntry FromFailedJob(
Job job,
string errorCode,
string failureReason,
string? remediationHint,
ErrorCategory category,
bool isRetryable,
DateTimeOffset now,
TimeSpan? retention = null,
int? maxReplayAttempts = null)
{
ArgumentNullException.ThrowIfNull(job);
ArgumentException.ThrowIfNullOrWhiteSpace(errorCode);
ArgumentException.ThrowIfNullOrWhiteSpace(failureReason);
var effectiveRetention = retention ?? DefaultRetention;
var effectiveMaxReplays = maxReplayAttempts ?? DefaultMaxReplayAttempts;
return new DeadLetterEntry(
EntryId: Guid.NewGuid(),
TenantId: job.TenantId,
OriginalJobId: job.JobId,
RunId: job.RunId,
SourceId: null, // Would be extracted from payload if available
JobType: job.JobType,
Payload: job.Payload,
PayloadDigest: job.PayloadDigest,
IdempotencyKey: job.IdempotencyKey,
CorrelationId: job.CorrelationId,
Status: DeadLetterStatus.Pending,
ErrorCode: errorCode,
FailureReason: failureReason,
RemediationHint: remediationHint,
Category: category,
IsRetryable: isRetryable,
OriginalAttempts: job.Attempt,
ReplayAttempts: 0,
MaxReplayAttempts: effectiveMaxReplays,
FailedAt: job.CompletedAt ?? now,
CreatedAt: now,
UpdatedAt: now,
ExpiresAt: now.Add(effectiveRetention),
ResolvedAt: null,
ResolutionNotes: null,
CreatedBy: job.CreatedBy,
UpdatedBy: "system");
}
/// <summary>Marks entry as being replayed.</summary>
public DeadLetterEntry StartReplay(string updatedBy, DateTimeOffset now)
{
if (!CanReplay)
throw new InvalidOperationException($"Cannot replay entry in status {Status} with {ReplayAttempts}/{MaxReplayAttempts} attempts.");
return this with
{
Status = DeadLetterStatus.Replaying,
ReplayAttempts = ReplayAttempts + 1,
UpdatedAt = now,
UpdatedBy = updatedBy
};
}
/// <summary>Marks entry as successfully replayed.</summary>
public DeadLetterEntry CompleteReplay(Guid newJobId, string updatedBy, DateTimeOffset now)
{
if (Status != DeadLetterStatus.Replaying)
throw new InvalidOperationException($"Cannot complete replay from status {Status}.");
return this with
{
Status = DeadLetterStatus.Replayed,
ResolvedAt = now,
ResolutionNotes = $"Replayed as job {newJobId}",
UpdatedAt = now,
UpdatedBy = updatedBy
};
}
/// <summary>Marks replay as failed.</summary>
public DeadLetterEntry FailReplay(string reason, string updatedBy, DateTimeOffset now)
{
if (Status != DeadLetterStatus.Replaying)
throw new InvalidOperationException($"Cannot fail replay from status {Status}.");
var newStatus = ReplayAttempts >= MaxReplayAttempts
? DeadLetterStatus.Exhausted
: DeadLetterStatus.Pending;
return this with
{
Status = newStatus,
FailureReason = reason,
UpdatedAt = now,
UpdatedBy = updatedBy
};
}
/// <summary>Manually resolves the entry without replay.</summary>
public DeadLetterEntry Resolve(string notes, string updatedBy, DateTimeOffset now)
{
if (IsTerminal)
throw new InvalidOperationException($"Cannot resolve entry in terminal status {Status}.");
return this with
{
Status = DeadLetterStatus.Resolved,
ResolvedAt = now,
ResolutionNotes = notes,
UpdatedAt = now,
UpdatedBy = updatedBy
};
}
/// <summary>Marks entry as expired for cleanup.</summary>
public DeadLetterEntry MarkExpired(DateTimeOffset now)
{
if (IsTerminal)
throw new InvalidOperationException($"Cannot expire entry in terminal status {Status}.");
return this with
{
Status = DeadLetterStatus.Expired,
UpdatedAt = now,
UpdatedBy = "system"
};
}
}
/// <summary>
/// Dead-letter entry lifecycle states.
/// </summary>
public enum DeadLetterStatus
{
/// <summary>Entry awaiting operator action or replay.</summary>
Pending = 0,
/// <summary>Entry currently being replayed.</summary>
Replaying = 1,
/// <summary>Entry successfully replayed as a new job.</summary>
Replayed = 2,
/// <summary>Entry manually resolved without replay.</summary>
Resolved = 3,
/// <summary>Entry exhausted all replay attempts.</summary>
Exhausted = 4,
/// <summary>Entry expired and eligible for purge.</summary>
Expired = 5
}
/// <summary>
/// Error classification categories for dead-letter entries.
/// </summary>
public enum ErrorCategory
{
/// <summary>Unknown or unclassified error.</summary>
Unknown = 0,
/// <summary>Transient infrastructure error (network, timeout).</summary>
Transient = 1,
/// <summary>Resource not found (image, source, etc.).</summary>
NotFound = 2,
/// <summary>Authentication or authorization failure.</summary>
AuthFailure = 3,
/// <summary>Rate limiting or quota exceeded.</summary>
RateLimited = 4,
/// <summary>Invalid input or configuration.</summary>
ValidationError = 5,
/// <summary>Upstream service error (registry, advisory feed).</summary>
UpstreamError = 6,
/// <summary>Internal processing error (bug, corruption).</summary>
InternalError = 7,
/// <summary>Resource conflict (duplicate, version mismatch).</summary>
Conflict = 8,
/// <summary>Operation canceled by user or system.</summary>
Canceled = 9
}
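The replay counters and status transitions above are what an operator-facing replay endpoint would drive; a minimal sketch, assuming failedJob is a Job that has already reached the Failed state and that resubmission of the payload happens elsewhere:
var now = DateTimeOffset.UtcNow;
var entry = DeadLetterEntry.FromFailedJob(
    failedJob,
    errorCode: "UPSTREAM_5XX",
    failureReason: "Registry returned 503 while fetching the image manifest",
    remediationHint: "Check registry availability, then replay",
    category: ErrorCategory.UpstreamError,
    isRetryable: true,
    now: now);
if (entry.CanReplay)
{
    entry = entry.StartReplay("ops@example.internal", DateTimeOffset.UtcNow);
    // Resubmit the original payload as a new job (not shown), then record the outcome.
    var newJobId = Guid.NewGuid(); // placeholder for the replayed job's identifier
    entry = entry.CompleteReplay(newJobId, "ops@example.internal", DateTimeOffset.UtcNow);
}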


@@ -0,0 +1,69 @@
namespace StellaOps.Orchestrator.Core.Domain;
/// <summary>
/// Represents an operational incident triggered by threshold breaches.
/// Incidents are generated when failure rates exceed configured limits.
/// </summary>
public sealed record Incident(
/// <summary>Unique incident identifier.</summary>
Guid IncidentId,
/// <summary>Tenant affected by this incident.</summary>
string TenantId,
/// <summary>Incident type (e.g., "failure_rate", "quota_exhausted", "circuit_open").</summary>
string IncidentType,
/// <summary>Incident severity (e.g., "warning", "critical").</summary>
string Severity,
/// <summary>Affected job type (if applicable).</summary>
string? JobType,
/// <summary>Affected source (if applicable).</summary>
Guid? SourceId,
/// <summary>Human-readable incident title.</summary>
string Title,
/// <summary>Detailed incident description.</summary>
string Description,
/// <summary>Current incident status.</summary>
IncidentStatus Status,
/// <summary>When the incident was created.</summary>
DateTimeOffset CreatedAt,
/// <summary>When the incident was acknowledged.</summary>
DateTimeOffset? AcknowledgedAt,
/// <summary>Actor who acknowledged the incident.</summary>
string? AcknowledgedBy,
/// <summary>When the incident was resolved.</summary>
DateTimeOffset? ResolvedAt,
/// <summary>Actor who resolved the incident.</summary>
string? ResolvedBy,
/// <summary>Resolution notes.</summary>
string? ResolutionNotes,
/// <summary>Optional metadata JSON blob.</summary>
string? Metadata);
/// <summary>
/// Incident lifecycle states.
/// </summary>
public enum IncidentStatus
{
/// <summary>Incident is open and unacknowledged.</summary>
Open = 0,
/// <summary>Incident acknowledged by operator.</summary>
Acknowledged = 1,
/// <summary>Incident resolved.</summary>
Resolved = 2
}


@@ -0,0 +1,81 @@
namespace StellaOps.Orchestrator.Core.Domain;
/// <summary>
/// Represents a unit of work to be executed by a worker.
/// Jobs are scheduled, leased to workers, and tracked through completion.
/// </summary>
public sealed record Job(
/// <summary>Unique job identifier.</summary>
Guid JobId,
/// <summary>Tenant owning this job.</summary>
string TenantId,
/// <summary>Optional project scope within tenant.</summary>
string? ProjectId,
/// <summary>Run this job belongs to (if any).</summary>
Guid? RunId,
/// <summary>Job type (e.g., "scan.image", "advisory.nvd", "export.sbom").</summary>
string JobType,
/// <summary>Current job status.</summary>
JobStatus Status,
/// <summary>Priority (higher = more urgent). Default 0.</summary>
int Priority,
/// <summary>Current attempt number (1-based).</summary>
int Attempt,
/// <summary>Maximum retry attempts.</summary>
int MaxAttempts,
/// <summary>SHA-256 digest of the payload for determinism verification.</summary>
string PayloadDigest,
/// <summary>Job payload JSON (inputs, parameters).</summary>
string Payload,
/// <summary>Idempotency key for deduplication.</summary>
string IdempotencyKey,
/// <summary>Correlation ID for distributed tracing.</summary>
string? CorrelationId,
/// <summary>Current lease ID (if leased).</summary>
Guid? LeaseId,
/// <summary>Worker holding the lease (if leased).</summary>
string? WorkerId,
/// <summary>Task runner ID executing the job (if applicable).</summary>
string? TaskRunnerId,
/// <summary>Lease expiration time.</summary>
DateTimeOffset? LeaseUntil,
/// <summary>When the job was created.</summary>
DateTimeOffset CreatedAt,
/// <summary>When the job was scheduled (quota cleared).</summary>
DateTimeOffset? ScheduledAt,
/// <summary>When the job was leased to a worker.</summary>
DateTimeOffset? LeasedAt,
/// <summary>When the job completed (terminal state).</summary>
DateTimeOffset? CompletedAt,
/// <summary>Earliest time the job can be scheduled (for backoff).</summary>
DateTimeOffset? NotBefore,
/// <summary>Terminal status reason (failure message, cancel reason, etc.).</summary>
string? Reason,
/// <summary>ID of the original job if this is a replay.</summary>
Guid? ReplayOf,
/// <summary>Actor who created/submitted the job.</summary>
string CreatedBy);
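The record carries PayloadDigest and IdempotencyKey but does not prescribe how they are derived; a sketch consistent with the lowercase-hex SHA-256 helpers used elsewhere in this module (the exact canonicalisation and key recipe are assumptions):
using System.Security.Cryptography;
using System.Text;
static string Sha256Hex(string value) =>
    Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(value))).ToLowerInvariant();
// Digest over the payload JSON as submitted; an idempotency key that combines tenant,
// job type and payload digest makes identical submissions deduplicate naturally.
var payload = "{\"imageRef\":\"registry.example/app@sha256:abc\"}";
var payloadDigest = Sha256Hex(payload);
var idempotencyKey = Sha256Hex($"tenant-a|scan.image|{payloadDigest}");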


@@ -0,0 +1,48 @@
namespace StellaOps.Orchestrator.Core.Domain;
/// <summary>
/// Represents an immutable history entry for job state changes.
/// Provides audit trail for all job lifecycle transitions.
/// </summary>
public sealed record JobHistory(
/// <summary>Unique history entry identifier.</summary>
Guid HistoryId,
/// <summary>Tenant owning this entry.</summary>
string TenantId,
/// <summary>Job this history entry belongs to.</summary>
Guid JobId,
/// <summary>Sequence number within the job's history (1-based).</summary>
int SequenceNo,
/// <summary>Previous job status.</summary>
JobStatus? FromStatus,
/// <summary>New job status.</summary>
JobStatus ToStatus,
/// <summary>Attempt number at time of transition.</summary>
int Attempt,
/// <summary>Lease ID (if applicable).</summary>
Guid? LeaseId,
/// <summary>Worker ID (if applicable).</summary>
string? WorkerId,
/// <summary>Reason for the transition.</summary>
string? Reason,
/// <summary>When this transition occurred.</summary>
DateTimeOffset OccurredAt,
/// <summary>When this entry was recorded.</summary>
DateTimeOffset RecordedAt,
/// <summary>Actor who caused this transition.</summary>
string ActorId,
/// <summary>Actor type (system, operator, worker).</summary>
string ActorType);


@@ -0,0 +1,30 @@
namespace StellaOps.Orchestrator.Core.Domain;
/// <summary>
/// Job lifecycle states. Transitions follow the state machine:
/// Pending → Scheduled → Leased → (Succeeded | Failed | Canceled | TimedOut)
/// Failed jobs may transition to Pending via replay.
/// </summary>
public enum JobStatus
{
/// <summary>Job enqueued but not yet scheduled (e.g., quota exceeded).</summary>
Pending = 0,
/// <summary>Job scheduled and awaiting worker lease.</summary>
Scheduled = 1,
/// <summary>Job leased to a worker for execution.</summary>
Leased = 2,
/// <summary>Job completed successfully.</summary>
Succeeded = 3,
/// <summary>Job failed after exhausting retries.</summary>
Failed = 4,
/// <summary>Job canceled by operator or system.</summary>
Canceled = 5,
/// <summary>Job timed out (lease expired without completion).</summary>
TimedOut = 6
}
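The state machine described in the summary can be enforced with a small guard; the helper below is illustrative and covers only the documented transitions (direct cancellation of Pending or Scheduled jobs is not listed above, so it is deliberately omitted):
public static class JobStatusTransitions
{
    /// <summary>Returns true when the transition is allowed by the documented state machine.</summary>
    public static bool IsValid(JobStatus from, JobStatus to) => (from, to) switch
    {
        (JobStatus.Pending, JobStatus.Scheduled) => true,
        (JobStatus.Scheduled, JobStatus.Leased) => true,
        (JobStatus.Leased, JobStatus.Succeeded) => true,
        (JobStatus.Leased, JobStatus.Failed) => true,
        (JobStatus.Leased, JobStatus.Canceled) => true,
        (JobStatus.Leased, JobStatus.TimedOut) => true,
        (JobStatus.Failed, JobStatus.Pending) => true, // replay
        _ => false
    };
}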


@@ -0,0 +1,60 @@
namespace StellaOps.Orchestrator.Core.Domain;
/// <summary>
/// Represents rate-limit and concurrency quotas for job scheduling.
/// Quotas are scoped to tenant and optionally job type.
/// </summary>
public sealed record Quota(
/// <summary>Unique quota identifier.</summary>
Guid QuotaId,
/// <summary>Tenant this quota applies to.</summary>
string TenantId,
/// <summary>Job type this quota applies to (null = all job types).</summary>
string? JobType,
/// <summary>Maximum concurrent active (leased) jobs.</summary>
int MaxActive,
/// <summary>Maximum jobs per hour (sliding window).</summary>
int MaxPerHour,
/// <summary>Burst capacity for token bucket.</summary>
int BurstCapacity,
/// <summary>Token refill rate (tokens per second).</summary>
double RefillRate,
/// <summary>Current available tokens.</summary>
double CurrentTokens,
/// <summary>Last time tokens were refilled.</summary>
DateTimeOffset LastRefillAt,
/// <summary>Current count of active (leased) jobs.</summary>
int CurrentActive,
/// <summary>Jobs scheduled in current hour window.</summary>
int CurrentHourCount,
/// <summary>Start of current hour window.</summary>
DateTimeOffset CurrentHourStart,
/// <summary>Whether this quota is currently paused (operator override).</summary>
bool Paused,
/// <summary>Operator-provided reason for pause.</summary>
string? PauseReason,
/// <summary>Ticket reference for quota change audit.</summary>
string? QuotaTicket,
/// <summary>When the quota was created.</summary>
DateTimeOffset CreatedAt,
/// <summary>When the quota was last updated.</summary>
DateTimeOffset UpdatedAt,
/// <summary>Actor who last modified the quota.</summary>
string UpdatedBy);
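The BurstCapacity, RefillRate, CurrentTokens and LastRefillAt fields describe a token bucket that is refilled lazily at scheduling time; a minimal sketch under that assumption (hourly-window accounting and persistence are omitted):
static (Quota Updated, bool Admitted) TryConsumeToken(Quota quota, DateTimeOffset now)
{
    // Refill based on elapsed time, capped at the burst capacity.
    var elapsedSeconds = Math.Max(0, (now - quota.LastRefillAt).TotalSeconds);
    var tokens = Math.Min(quota.BurstCapacity, quota.CurrentTokens + elapsedSeconds * quota.RefillRate);
    if (quota.Paused || tokens < 1.0 || quota.CurrentActive >= quota.MaxActive)
    {
        // Not admitted: keep the refreshed token count without consuming.
        return (quota with { CurrentTokens = tokens, LastRefillAt = now }, false);
    }
    return (quota with { CurrentTokens = tokens - 1.0, LastRefillAt = now }, true);
}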


@@ -0,0 +1,78 @@
namespace StellaOps.Orchestrator.Core.Domain;
/// <summary>
/// Represents a run (batch/workflow execution) containing multiple jobs.
/// Runs group related jobs (e.g., scanning an image produces multiple analyzer jobs).
/// </summary>
public sealed record Run(
/// <summary>Unique run identifier.</summary>
Guid RunId,
/// <summary>Tenant owning this run.</summary>
string TenantId,
/// <summary>Optional project scope within tenant.</summary>
string? ProjectId,
/// <summary>Source that initiated this run.</summary>
Guid SourceId,
/// <summary>Run type (e.g., "scan", "advisory-sync", "export").</summary>
string RunType,
/// <summary>Current aggregate status of the run.</summary>
RunStatus Status,
/// <summary>Correlation ID for distributed tracing.</summary>
string? CorrelationId,
/// <summary>Total number of jobs in this run.</summary>
int TotalJobs,
/// <summary>Number of completed jobs (succeeded + failed + canceled).</summary>
int CompletedJobs,
/// <summary>Number of succeeded jobs.</summary>
int SucceededJobs,
/// <summary>Number of failed jobs.</summary>
int FailedJobs,
/// <summary>When the run was created.</summary>
DateTimeOffset CreatedAt,
/// <summary>When the run started executing (first job leased).</summary>
DateTimeOffset? StartedAt,
/// <summary>When the run completed (all jobs terminal).</summary>
DateTimeOffset? CompletedAt,
/// <summary>Actor who initiated the run.</summary>
string CreatedBy,
/// <summary>Optional metadata JSON blob.</summary>
string? Metadata);
/// <summary>
/// Run lifecycle states.
/// </summary>
public enum RunStatus
{
/// <summary>Run created, jobs being enqueued.</summary>
Pending = 0,
/// <summary>Run is executing (at least one job leased).</summary>
Running = 1,
/// <summary>All jobs completed successfully.</summary>
Succeeded = 2,
/// <summary>Run completed with some failures.</summary>
PartiallySucceeded = 3,
/// <summary>All jobs failed.</summary>
Failed = 4,
/// <summary>Run canceled by operator.</summary>
Canceled = 5
}
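The counters above are sufficient to derive a terminal aggregate status once every job has finished; one plausible mapping (an assumption, not necessarily the shipped scheduler logic):
static RunStatus DeriveTerminalStatus(Run run)
{
    // Assumes CompletedJobs == TotalJobs and the run was not cancelled by an operator.
    if (run.SucceededJobs == run.TotalJobs) return RunStatus.Succeeded;
    if (run.SucceededJobs == 0) return RunStatus.Failed;
    return RunStatus.PartiallySucceeded;
}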


@@ -0,0 +1,341 @@
namespace StellaOps.Orchestrator.Core.Domain;
/// <summary>
/// Immutable ledger entry for run execution records.
/// Provides a tamper-evident history of run outcomes with provenance to artifacts.
/// </summary>
public sealed record RunLedgerEntry(
/// <summary>Unique ledger entry identifier.</summary>
Guid LedgerId,
/// <summary>Tenant owning this entry.</summary>
string TenantId,
/// <summary>Run this entry records.</summary>
Guid RunId,
/// <summary>Source that initiated the run.</summary>
Guid SourceId,
/// <summary>Run type (scan, advisory-sync, export).</summary>
string RunType,
/// <summary>Final run status.</summary>
RunStatus FinalStatus,
/// <summary>Total jobs in the run.</summary>
int TotalJobs,
/// <summary>Successfully completed jobs.</summary>
int SucceededJobs,
/// <summary>Failed jobs.</summary>
int FailedJobs,
/// <summary>When the run was created.</summary>
DateTimeOffset RunCreatedAt,
/// <summary>When the run started executing.</summary>
DateTimeOffset? RunStartedAt,
/// <summary>When the run completed.</summary>
DateTimeOffset RunCompletedAt,
/// <summary>Total execution duration.</summary>
TimeSpan ExecutionDuration,
/// <summary>Actor who initiated the run.</summary>
string InitiatedBy,
/// <summary>SHA-256 digest of the run's input payload.</summary>
string InputDigest,
/// <summary>Aggregated SHA-256 digest of all outputs.</summary>
string OutputDigest,
/// <summary>JSON array of artifact references with their digests.</summary>
string ArtifactManifest,
/// <summary>Sequence number in the tenant's ledger.</summary>
long SequenceNumber,
/// <summary>SHA-256 hash of the previous ledger entry.</summary>
string? PreviousEntryHash,
/// <summary>SHA-256 hash of this entry's content.</summary>
string ContentHash,
/// <summary>When this ledger entry was created.</summary>
DateTimeOffset LedgerCreatedAt,
/// <summary>Correlation ID for tracing.</summary>
string? CorrelationId,
/// <summary>Optional metadata JSON.</summary>
string? Metadata)
{
/// <summary>
/// Creates a ledger entry from a completed run.
/// </summary>
public static RunLedgerEntry FromCompletedRun(
Run run,
IReadOnlyList<Artifact> artifacts,
string inputDigest,
long sequenceNumber,
string? previousEntryHash,
string? metadata = null)
{
if (run.CompletedAt is null)
{
throw new InvalidOperationException("Cannot create ledger entry from an incomplete run.");
}
var ledgerId = Guid.NewGuid();
var ledgerCreatedAt = DateTimeOffset.UtcNow;
// Build artifact manifest
var artifactManifest = BuildArtifactManifest(artifacts);
// Compute output digest from all artifact digests
var outputDigest = ComputeOutputDigest(artifacts);
// Compute execution duration
var startTime = run.StartedAt ?? run.CreatedAt;
var executionDuration = run.CompletedAt.Value - startTime;
// Compute content hash for tamper evidence
var contentToHash = $"{ledgerId}|{run.TenantId}|{run.RunId}|{run.SourceId}|{run.RunType}|{run.Status}|{run.TotalJobs}|{run.SucceededJobs}|{run.FailedJobs}|{run.CreatedAt:O}|{run.StartedAt:O}|{run.CompletedAt:O}|{inputDigest}|{outputDigest}|{sequenceNumber}|{previousEntryHash}|{ledgerCreatedAt:O}";
var contentHash = ComputeSha256(contentToHash);
return new RunLedgerEntry(
LedgerId: ledgerId,
TenantId: run.TenantId,
RunId: run.RunId,
SourceId: run.SourceId,
RunType: run.RunType,
FinalStatus: run.Status,
TotalJobs: run.TotalJobs,
SucceededJobs: run.SucceededJobs,
FailedJobs: run.FailedJobs,
RunCreatedAt: run.CreatedAt,
RunStartedAt: run.StartedAt,
RunCompletedAt: run.CompletedAt.Value,
ExecutionDuration: executionDuration,
InitiatedBy: run.CreatedBy,
InputDigest: inputDigest,
OutputDigest: outputDigest,
ArtifactManifest: artifactManifest,
SequenceNumber: sequenceNumber,
PreviousEntryHash: previousEntryHash,
ContentHash: contentHash,
LedgerCreatedAt: ledgerCreatedAt,
CorrelationId: run.CorrelationId,
Metadata: metadata);
}
/// <summary>
/// Verifies the integrity of this ledger entry.
/// </summary>
public bool VerifyIntegrity()
{
var contentToHash = $"{LedgerId}|{TenantId}|{RunId}|{SourceId}|{RunType}|{FinalStatus}|{TotalJobs}|{SucceededJobs}|{FailedJobs}|{RunCreatedAt:O}|{RunStartedAt:O}|{RunCompletedAt:O}|{InputDigest}|{OutputDigest}|{SequenceNumber}|{PreviousEntryHash}|{LedgerCreatedAt:O}";
var computed = ComputeSha256(contentToHash);
return string.Equals(ContentHash, computed, StringComparison.OrdinalIgnoreCase);
}
/// <summary>
/// Verifies the chain link to the previous entry.
/// </summary>
public bool VerifyChainLink(RunLedgerEntry? previousEntry)
{
if (previousEntry is null)
{
return PreviousEntryHash is null || SequenceNumber == 1;
}
return string.Equals(PreviousEntryHash, previousEntry.ContentHash, StringComparison.OrdinalIgnoreCase);
}
private static string BuildArtifactManifest(IReadOnlyList<Artifact> artifacts)
{
var entries = artifacts.Select(a => new
{
a.ArtifactId,
a.ArtifactType,
a.Uri,
a.Digest,
a.MimeType,
a.SizeBytes,
a.CreatedAt
});
return System.Text.Json.JsonSerializer.Serialize(entries);
}
private static string ComputeOutputDigest(IReadOnlyList<Artifact> artifacts)
{
if (artifacts.Count == 0)
{
return ComputeSha256("(no artifacts)");
}
// Sort by artifact ID for deterministic ordering
var sortedDigests = artifacts
.OrderBy(a => a.ArtifactId)
.Select(a => a.Digest)
.ToList();
var combined = string.Join("|", sortedDigests);
return ComputeSha256(combined);
}
private static string ComputeSha256(string content)
{
var bytes = System.Text.Encoding.UTF8.GetBytes(content);
var hash = System.Security.Cryptography.SHA256.HashData(bytes);
return Convert.ToHexString(hash).ToLowerInvariant();
}
}
/// <summary>
/// Represents a ledger export operation.
/// </summary>
public sealed record LedgerExport(
/// <summary>Unique export identifier.</summary>
Guid ExportId,
/// <summary>Tenant requesting the export.</summary>
string TenantId,
/// <summary>Export status.</summary>
LedgerExportStatus Status,
/// <summary>Export format (json, ndjson, csv).</summary>
string Format,
/// <summary>Start of the time range to export.</summary>
DateTimeOffset? StartTime,
/// <summary>End of the time range to export.</summary>
DateTimeOffset? EndTime,
/// <summary>Run types to include (null = all).</summary>
string? RunTypeFilter,
/// <summary>Source ID filter (null = all).</summary>
Guid? SourceIdFilter,
/// <summary>Number of entries exported.</summary>
int EntryCount,
/// <summary>URI where the export is stored.</summary>
string? OutputUri,
/// <summary>SHA-256 digest of the export file.</summary>
string? OutputDigest,
/// <summary>Size of the export in bytes.</summary>
long? OutputSizeBytes,
/// <summary>Actor who requested the export.</summary>
string RequestedBy,
/// <summary>When the export was requested.</summary>
DateTimeOffset RequestedAt,
/// <summary>When the export started processing.</summary>
DateTimeOffset? StartedAt,
/// <summary>When the export completed.</summary>
DateTimeOffset? CompletedAt,
/// <summary>Error message if export failed.</summary>
string? ErrorMessage)
{
/// <summary>
/// Creates a new pending export request.
/// </summary>
public static LedgerExport CreateRequest(
string tenantId,
string format,
string requestedBy,
DateTimeOffset? startTime = null,
DateTimeOffset? endTime = null,
string? runTypeFilter = null,
Guid? sourceIdFilter = null)
{
if (string.IsNullOrWhiteSpace(format))
{
throw new ArgumentException("Format is required.", nameof(format));
}
var validFormats = new[] { "json", "ndjson", "csv" };
if (!validFormats.Contains(format.ToLowerInvariant()))
{
throw new ArgumentException($"Invalid format. Must be one of: {string.Join(", ", validFormats)}", nameof(format));
}
return new LedgerExport(
ExportId: Guid.NewGuid(),
TenantId: tenantId,
Status: LedgerExportStatus.Pending,
Format: format.ToLowerInvariant(),
StartTime: startTime,
EndTime: endTime,
RunTypeFilter: runTypeFilter,
SourceIdFilter: sourceIdFilter,
EntryCount: 0,
OutputUri: null,
OutputDigest: null,
OutputSizeBytes: null,
RequestedBy: requestedBy,
RequestedAt: DateTimeOffset.UtcNow,
StartedAt: null,
CompletedAt: null,
ErrorMessage: null);
}
/// <summary>
/// Marks the export as started.
/// </summary>
public LedgerExport Start() => this with
{
Status = LedgerExportStatus.Processing,
StartedAt = DateTimeOffset.UtcNow
};
/// <summary>
/// Marks the export as completed.
/// </summary>
public LedgerExport Complete(string outputUri, string outputDigest, long outputSizeBytes, int entryCount) => this with
{
Status = LedgerExportStatus.Completed,
OutputUri = outputUri,
OutputDigest = outputDigest,
OutputSizeBytes = outputSizeBytes,
EntryCount = entryCount,
CompletedAt = DateTimeOffset.UtcNow
};
/// <summary>
/// Marks the export as failed.
/// </summary>
public LedgerExport Fail(string errorMessage) => this with
{
Status = LedgerExportStatus.Failed,
ErrorMessage = errorMessage,
CompletedAt = DateTimeOffset.UtcNow
};
}
/// <summary>
/// Status of a ledger export operation.
/// </summary>
public enum LedgerExportStatus
{
Pending = 0,
Processing = 1,
Completed = 2,
Failed = 3,
Canceled = 4
}
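As with the audit chain, export consumers can verify the ledger offline using only the members above; a minimal sketch:
static bool VerifyLedgerChain(IReadOnlyList<RunLedgerEntry> orderedEntries)
{
    // Entries must be ordered by SequenceNumber before verification.
    RunLedgerEntry? previous = null;
    foreach (var entry in orderedEntries)
    {
        if (!entry.VerifyIntegrity() || !entry.VerifyChainLink(previous))
        {
            return false;
        }
        previous = entry;
    }
    return true;
}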


@@ -0,0 +1,60 @@
namespace StellaOps.Orchestrator.Core.Domain;
/// <summary>
/// Represents a scheduled job trigger (cron-based or interval-based).
/// Schedules automatically create jobs at specified times.
/// </summary>
public sealed record Schedule(
/// <summary>Unique schedule identifier.</summary>
Guid ScheduleId,
/// <summary>Tenant owning this schedule.</summary>
string TenantId,
/// <summary>Optional project scope within tenant.</summary>
string? ProjectId,
/// <summary>Source that will be used for jobs.</summary>
Guid SourceId,
/// <summary>Human-readable schedule name.</summary>
string Name,
/// <summary>Job type to create.</summary>
string JobType,
/// <summary>Cron expression (6-field with seconds); evaluated in the schedule's timezone.</summary>
string CronExpression,
/// <summary>Timezone for cron evaluation (IANA, e.g., "UTC", "America/New_York").</summary>
string Timezone,
/// <summary>Whether the schedule is enabled.</summary>
bool Enabled,
/// <summary>Job payload template JSON.</summary>
string PayloadTemplate,
/// <summary>Job priority for scheduled jobs.</summary>
int Priority,
/// <summary>Maximum retry attempts for scheduled jobs.</summary>
int MaxAttempts,
/// <summary>Last time a job was triggered from this schedule.</summary>
DateTimeOffset? LastTriggeredAt,
/// <summary>Next scheduled trigger time.</summary>
DateTimeOffset? NextTriggerAt,
/// <summary>When the schedule was created.</summary>
DateTimeOffset CreatedAt,
/// <summary>When the schedule was last updated.</summary>
DateTimeOffset UpdatedAt,
/// <summary>Actor who created the schedule.</summary>
string CreatedBy,
/// <summary>Actor who last modified the schedule.</summary>
string UpdatedBy);
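Computing NextTriggerAt from CronExpression and Timezone is left to the scheduler loop; a sketch using the Cronos package (whether this codebase actually depends on Cronos is an assumption):
using Cronos;
static DateTimeOffset? ComputeNextTrigger(Schedule schedule, DateTimeOffset from)
{
    if (!schedule.Enabled) return null;
    // 6-field expressions (with seconds) require CronFormat.IncludeSeconds.
    var cron = CronExpression.Parse(schedule.CronExpression, CronFormat.IncludeSeconds);
    var zone = TimeZoneInfo.FindSystemTimeZoneById(schedule.Timezone);
    return cron.GetNextOccurrence(from, zone);
}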


@@ -0,0 +1,423 @@
using System.Text.Json;
namespace StellaOps.Orchestrator.Core.Domain;
/// <summary>
/// Signed manifest providing provenance chain from ledger entries to artifacts.
/// Enables verification of artifact authenticity and integrity.
/// </summary>
public sealed record SignedManifest(
/// <summary>Unique manifest identifier.</summary>
Guid ManifestId,
/// <summary>Manifest schema version.</summary>
string SchemaVersion,
/// <summary>Tenant owning this manifest.</summary>
string TenantId,
/// <summary>Type of provenance (run, export, attestation).</summary>
ProvenanceType ProvenanceType,
/// <summary>Subject of the provenance (run ID, export ID, etc.).</summary>
Guid SubjectId,
/// <summary>Provenance statements (JSON array).</summary>
string Statements,
/// <summary>Artifact references with digests (JSON array).</summary>
string Artifacts,
/// <summary>Materials (inputs) used to produce the artifacts (JSON array).</summary>
string Materials,
/// <summary>Build environment information (JSON object).</summary>
string? BuildInfo,
/// <summary>SHA-256 digest of the manifest payload (excluding signature).</summary>
string PayloadDigest,
/// <summary>Signature algorithm used.</summary>
string SignatureAlgorithm,
/// <summary>Base64-encoded signature.</summary>
string Signature,
/// <summary>Key ID used for signing.</summary>
string KeyId,
/// <summary>When the manifest was created.</summary>
DateTimeOffset CreatedAt,
/// <summary>Expiration time of the manifest (if any).</summary>
DateTimeOffset? ExpiresAt,
/// <summary>Additional metadata (JSON object).</summary>
string? Metadata)
{
/// <summary>
/// Current schema version for manifests.
/// </summary>
public const string CurrentSchemaVersion = "1.0.0";
/// <summary>
/// Creates an unsigned manifest from a ledger entry.
/// The manifest must be signed separately using SigningService.
/// </summary>
public static SignedManifest CreateFromLedgerEntry(
RunLedgerEntry ledger,
string? buildInfo = null,
string? metadata = null)
{
var statements = CreateStatementsFromLedger(ledger);
var artifacts = ledger.ArtifactManifest;
var materials = CreateMaterialsFromLedger(ledger);
var payloadDigest = ComputePayloadDigest(
ledger.TenantId,
ProvenanceType.Run,
ledger.RunId,
statements,
artifacts,
materials);
return new SignedManifest(
ManifestId: Guid.NewGuid(),
SchemaVersion: CurrentSchemaVersion,
TenantId: ledger.TenantId,
ProvenanceType: ProvenanceType.Run,
SubjectId: ledger.RunId,
Statements: statements,
Artifacts: artifacts,
Materials: materials,
BuildInfo: buildInfo,
PayloadDigest: payloadDigest,
SignatureAlgorithm: "none",
Signature: string.Empty,
KeyId: string.Empty,
CreatedAt: DateTimeOffset.UtcNow,
ExpiresAt: null,
Metadata: metadata);
}
/// <summary>
/// Creates an unsigned manifest from a ledger export.
/// </summary>
public static SignedManifest CreateFromExport(
LedgerExport export,
IReadOnlyList<RunLedgerEntry> entries,
string? buildInfo = null,
string? metadata = null)
{
if (export.Status != LedgerExportStatus.Completed)
{
throw new InvalidOperationException("Cannot create manifest from incomplete export.");
}
var statements = CreateStatementsFromExport(export, entries);
var artifacts = CreateExportArtifacts(export);
var materials = CreateExportMaterials(entries);
var payloadDigest = ComputePayloadDigest(
export.TenantId,
ProvenanceType.Export,
export.ExportId,
statements,
artifacts,
materials);
return new SignedManifest(
ManifestId: Guid.NewGuid(),
SchemaVersion: CurrentSchemaVersion,
TenantId: export.TenantId,
ProvenanceType: ProvenanceType.Export,
SubjectId: export.ExportId,
Statements: statements,
Artifacts: artifacts,
Materials: materials,
BuildInfo: buildInfo,
PayloadDigest: payloadDigest,
SignatureAlgorithm: "none",
Signature: string.Empty,
KeyId: string.Empty,
CreatedAt: DateTimeOffset.UtcNow,
ExpiresAt: null,
Metadata: metadata);
}
/// <summary>
/// Signs the manifest with the provided signature.
/// </summary>
public SignedManifest Sign(string signatureAlgorithm, string signature, string keyId, DateTimeOffset? expiresAt = null)
{
if (string.IsNullOrWhiteSpace(signatureAlgorithm))
{
throw new ArgumentException("Signature algorithm is required.", nameof(signatureAlgorithm));
}
if (string.IsNullOrWhiteSpace(signature))
{
throw new ArgumentException("Signature is required.", nameof(signature));
}
if (string.IsNullOrWhiteSpace(keyId))
{
throw new ArgumentException("Key ID is required.", nameof(keyId));
}
return this with
{
SignatureAlgorithm = signatureAlgorithm,
Signature = signature,
KeyId = keyId,
ExpiresAt = expiresAt
};
}
/// <summary>
/// Checks if the manifest is signed.
/// </summary>
public bool IsSigned => !string.IsNullOrEmpty(Signature) && SignatureAlgorithm != "none";
/// <summary>
/// Checks if the manifest has expired.
/// </summary>
public bool IsExpired => ExpiresAt.HasValue && ExpiresAt.Value < DateTimeOffset.UtcNow;
/// <summary>
/// Verifies the payload digest integrity.
/// </summary>
public bool VerifyPayloadIntegrity()
{
var computed = ComputePayloadDigest(TenantId, ProvenanceType, SubjectId, Statements, Artifacts, Materials);
return string.Equals(PayloadDigest, computed, StringComparison.OrdinalIgnoreCase);
}
/// <summary>
/// Parses the artifact manifest into typed objects.
/// </summary>
public IReadOnlyList<ArtifactReference> GetArtifactReferences()
{
if (string.IsNullOrEmpty(Artifacts) || Artifacts == "[]")
{
return Array.Empty<ArtifactReference>();
}
return JsonSerializer.Deserialize<List<ArtifactReference>>(Artifacts) ?? [];
}
/// <summary>
/// Parses the material manifest into typed objects.
/// </summary>
public IReadOnlyList<MaterialReference> GetMaterialReferences()
{
if (string.IsNullOrEmpty(Materials) || Materials == "[]")
{
return Array.Empty<MaterialReference>();
}
return JsonSerializer.Deserialize<List<MaterialReference>>(Materials) ?? [];
}
/// <summary>
/// Parses the statements into typed objects.
/// </summary>
public IReadOnlyList<ProvenanceStatement> GetStatements()
{
if (string.IsNullOrEmpty(Statements) || Statements == "[]")
{
return Array.Empty<ProvenanceStatement>();
}
return JsonSerializer.Deserialize<List<ProvenanceStatement>>(Statements) ?? [];
}
private static string CreateStatementsFromLedger(RunLedgerEntry ledger)
{
var statements = new List<ProvenanceStatement>
{
new(
StatementType: "run_completed",
Subject: $"run:{ledger.RunId}",
Predicate: "produced",
Object: $"outputs:{ledger.OutputDigest}",
Timestamp: ledger.RunCompletedAt,
Metadata: JsonSerializer.Serialize(new
{
ledger.RunType,
ledger.FinalStatus,
ledger.TotalJobs,
ledger.SucceededJobs,
ledger.FailedJobs,
ledger.ExecutionDuration
})),
new(
StatementType: "chain_link",
Subject: $"ledger:{ledger.LedgerId}",
Predicate: "follows",
Object: ledger.PreviousEntryHash ?? "(genesis)",
Timestamp: ledger.LedgerCreatedAt,
Metadata: JsonSerializer.Serialize(new
{
ledger.SequenceNumber,
ledger.ContentHash
}))
};
return JsonSerializer.Serialize(statements);
}
private static string CreateMaterialsFromLedger(RunLedgerEntry ledger)
{
var materials = new List<MaterialReference>
{
new(
Uri: $"input:{ledger.RunId}",
Digest: ledger.InputDigest,
MediaType: "application/json",
Name: "run_input")
};
return JsonSerializer.Serialize(materials);
}
private static string CreateStatementsFromExport(LedgerExport export, IReadOnlyList<RunLedgerEntry> entries)
{
var statements = new List<ProvenanceStatement>
{
new(
StatementType: "export_completed",
Subject: $"export:{export.ExportId}",
Predicate: "contains",
Object: $"entries:{entries.Count}",
Timestamp: export.CompletedAt ?? DateTimeOffset.UtcNow,
Metadata: JsonSerializer.Serialize(new
{
export.Format,
export.EntryCount,
export.StartTime,
export.EndTime,
export.RunTypeFilter,
export.SourceIdFilter
}))
};
// Add chain integrity statement
if (entries.Count > 0)
{
var first = entries.MinBy(e => e.SequenceNumber);
var last = entries.MaxBy(e => e.SequenceNumber);
if (first is not null && last is not null)
{
statements.Add(new ProvenanceStatement(
StatementType: "chain_range",
Subject: $"export:{export.ExportId}",
Predicate: "covers",
Object: $"sequence:{first.SequenceNumber}-{last.SequenceNumber}",
Timestamp: export.CompletedAt ?? DateTimeOffset.UtcNow,
Metadata: JsonSerializer.Serialize(new
{
FirstEntryHash = first.ContentHash,
LastEntryHash = last.ContentHash
})));
}
}
return JsonSerializer.Serialize(statements);
}
private static string CreateExportArtifacts(LedgerExport export)
{
var artifacts = new List<ArtifactReference>
{
new(
ArtifactId: export.ExportId,
ArtifactType: "ledger_export",
Uri: export.OutputUri ?? string.Empty,
Digest: export.OutputDigest ?? string.Empty,
MediaType: GetMediaType(export.Format),
SizeBytes: export.OutputSizeBytes ?? 0)
};
return JsonSerializer.Serialize(artifacts);
}
private static string CreateExportMaterials(IReadOnlyList<RunLedgerEntry> entries)
{
var materials = entries.Select(e => new MaterialReference(
Uri: $"ledger:{e.LedgerId}",
Digest: e.ContentHash,
MediaType: "application/json",
Name: $"run_{e.RunId}")).ToList();
return JsonSerializer.Serialize(materials);
}
private static string GetMediaType(string format) => format.ToLowerInvariant() switch
{
"json" => "application/json",
"ndjson" => "application/x-ndjson",
"csv" => "text/csv",
_ => "application/octet-stream"
};
private static string ComputePayloadDigest(
string tenantId,
ProvenanceType provenanceType,
Guid subjectId,
string statements,
string artifacts,
string materials)
{
var payload = $"{tenantId}|{provenanceType}|{subjectId}|{statements}|{artifacts}|{materials}";
var bytes = System.Text.Encoding.UTF8.GetBytes(payload);
var hash = System.Security.Cryptography.SHA256.HashData(bytes);
return Convert.ToHexString(hash).ToLowerInvariant();
}
}
/// <summary>
/// Types of provenance tracked by manifests.
/// </summary>
public enum ProvenanceType
{
/// <summary>Provenance for a completed run.</summary>
Run = 0,
/// <summary>Provenance for a ledger export.</summary>
Export = 1,
/// <summary>Provenance for an attestation.</summary>
Attestation = 2
}
/// <summary>
/// Reference to an artifact in a manifest.
/// </summary>
public sealed record ArtifactReference(
Guid ArtifactId,
string ArtifactType,
string Uri,
string Digest,
string MediaType,
long SizeBytes);
/// <summary>
/// Reference to a material (input) in a manifest.
/// </summary>
public sealed record MaterialReference(
string Uri,
string Digest,
string MediaType,
string Name);
/// <summary>
/// A provenance statement in a manifest.
/// </summary>
public sealed record ProvenanceStatement(
string StatementType,
string Subject,
string Predicate,
string Object,
DateTimeOffset Timestamp,
string? Metadata);
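Putting the pieces together: a manifest is created unsigned from a ledger entry, signed by whatever signing service the host wires in, and later checked for structural integrity before signature verification. A sketch, where ledger is an existing RunLedgerEntry and the signature value is a placeholder rather than real cryptographic output:
var manifest = SignedManifest.CreateFromLedgerEntry(ledger);
// The PayloadDigest is the value a signing service would sign; the call itself is host-specific.
var signature = "<base64-signature-from-signing-service>"; // placeholder
manifest = manifest.Sign(
    signatureAlgorithm: "ed25519",            // assumed algorithm identifier
    signature: signature,
    keyId: "orchestrator-signing-key-1");     // assumed key reference
// A consumer first checks structure, then verifies the signature over PayloadDigest.
var structurallyValid = manifest.VerifyPayloadIntegrity() && manifest.IsSigned && !manifest.IsExpired;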


@@ -0,0 +1,567 @@
namespace StellaOps.Orchestrator.Core.Domain;
/// <summary>
/// Service Level Objective type.
/// </summary>
public enum SloType
{
/// <summary>Availability SLO (percentage of successful requests).</summary>
Availability,
/// <summary>Latency SLO (percentile-based response time).</summary>
Latency,
/// <summary>Throughput SLO (minimum jobs processed per period).</summary>
Throughput
}
/// <summary>
/// Time window for SLO computation.
/// </summary>
public enum SloWindow
{
/// <summary>Rolling 1 hour window.</summary>
OneHour,
/// <summary>Rolling 1 day window.</summary>
OneDay,
/// <summary>Rolling 7 day window.</summary>
SevenDays,
/// <summary>Rolling 30 day window.</summary>
ThirtyDays
}
/// <summary>
/// Alert severity for SLO violations.
/// </summary>
public enum AlertSeverity
{
/// <summary>Informational - SLO approaching threshold.</summary>
Info,
/// <summary>Warning - SLO at risk.</summary>
Warning,
/// <summary>Critical - SLO likely to be breached.</summary>
Critical,
/// <summary>Emergency - SLO breached.</summary>
Emergency
}
/// <summary>
/// Service Level Objective definition.
/// </summary>
public sealed record Slo(
/// <summary>Unique SLO identifier.</summary>
Guid SloId,
/// <summary>Tenant this SLO belongs to.</summary>
string TenantId,
/// <summary>Human-readable name.</summary>
string Name,
/// <summary>Optional description.</summary>
string? Description,
/// <summary>Type of SLO.</summary>
SloType Type,
/// <summary>Job type this SLO applies to (null = all job types).</summary>
string? JobType,
/// <summary>Source ID this SLO applies to (null = all sources).</summary>
Guid? SourceId,
/// <summary>Target objective (e.g., 0.999 for 99.9% availability).</summary>
double Target,
/// <summary>Time window for SLO evaluation.</summary>
SloWindow Window,
/// <summary>For latency SLOs: the percentile (e.g., 0.95 for P95).</summary>
double? LatencyPercentile,
/// <summary>For latency SLOs: the target latency in seconds.</summary>
double? LatencyTargetSeconds,
/// <summary>For throughput SLOs: minimum jobs per period.</summary>
int? ThroughputMinimum,
/// <summary>Whether this SLO is actively monitored.</summary>
bool Enabled,
/// <summary>When the SLO was created.</summary>
DateTimeOffset CreatedAt,
/// <summary>When the SLO was last updated.</summary>
DateTimeOffset UpdatedAt,
/// <summary>Actor who created the SLO.</summary>
string CreatedBy,
/// <summary>Actor who last modified the SLO.</summary>
string UpdatedBy)
{
/// <summary>Calculates the error budget as a decimal (1 - target).</summary>
public double ErrorBudget => 1.0 - Target;
/// <summary>Creates a new availability SLO.</summary>
public static Slo CreateAvailability(
string tenantId,
string name,
double target,
SloWindow window,
string createdBy,
string? description = null,
string? jobType = null,
Guid? sourceId = null)
{
ValidateTarget(target);
var now = DateTimeOffset.UtcNow;
return new Slo(
SloId: Guid.NewGuid(),
TenantId: tenantId,
Name: name,
Description: description,
Type: SloType.Availability,
JobType: jobType,
SourceId: sourceId,
Target: target,
Window: window,
LatencyPercentile: null,
LatencyTargetSeconds: null,
ThroughputMinimum: null,
Enabled: true,
CreatedAt: now,
UpdatedAt: now,
CreatedBy: createdBy,
UpdatedBy: createdBy);
}
/// <summary>Creates a new latency SLO.</summary>
public static Slo CreateLatency(
string tenantId,
string name,
double percentile,
double targetSeconds,
double target,
SloWindow window,
string createdBy,
string? description = null,
string? jobType = null,
Guid? sourceId = null)
{
ValidateTarget(target);
if (percentile < 0 || percentile > 1)
throw new ArgumentOutOfRangeException(nameof(percentile), "Percentile must be between 0 and 1");
if (targetSeconds <= 0)
throw new ArgumentOutOfRangeException(nameof(targetSeconds), "Target latency must be positive");
var now = DateTimeOffset.UtcNow;
return new Slo(
SloId: Guid.NewGuid(),
TenantId: tenantId,
Name: name,
Description: description,
Type: SloType.Latency,
JobType: jobType,
SourceId: sourceId,
Target: target,
Window: window,
LatencyPercentile: percentile,
LatencyTargetSeconds: targetSeconds,
ThroughputMinimum: null,
Enabled: true,
CreatedAt: now,
UpdatedAt: now,
CreatedBy: createdBy,
UpdatedBy: createdBy);
}
/// <summary>Creates a new throughput SLO.</summary>
public static Slo CreateThroughput(
string tenantId,
string name,
int minimum,
double target,
SloWindow window,
string createdBy,
string? description = null,
string? jobType = null,
Guid? sourceId = null)
{
ValidateTarget(target);
if (minimum <= 0)
throw new ArgumentOutOfRangeException(nameof(minimum), "Throughput minimum must be positive");
var now = DateTimeOffset.UtcNow;
return new Slo(
SloId: Guid.NewGuid(),
TenantId: tenantId,
Name: name,
Description: description,
Type: SloType.Throughput,
JobType: jobType,
SourceId: sourceId,
Target: target,
Window: window,
LatencyPercentile: null,
LatencyTargetSeconds: null,
ThroughputMinimum: minimum,
Enabled: true,
CreatedAt: now,
UpdatedAt: now,
CreatedBy: createdBy,
UpdatedBy: createdBy);
}
/// <summary>Updates the SLO with new values.</summary>
public Slo Update(
string? name = null,
string? description = null,
double? target = null,
bool? enabled = null,
string? updatedBy = null)
{
if (target.HasValue)
ValidateTarget(target.Value);
return this with
{
Name = name ?? Name,
Description = description ?? Description,
Target = target ?? Target,
Enabled = enabled ?? Enabled,
UpdatedAt = DateTimeOffset.UtcNow,
UpdatedBy = updatedBy ?? UpdatedBy
};
}
/// <summary>Disables the SLO.</summary>
public Slo Disable(string updatedBy) =>
this with
{
Enabled = false,
UpdatedAt = DateTimeOffset.UtcNow,
UpdatedBy = updatedBy
};
/// <summary>Enables the SLO.</summary>
public Slo Enable(string updatedBy) =>
this with
{
Enabled = true,
UpdatedAt = DateTimeOffset.UtcNow,
UpdatedBy = updatedBy
};
/// <summary>Gets the window duration as a TimeSpan.</summary>
public TimeSpan GetWindowDuration() => Window switch
{
SloWindow.OneHour => TimeSpan.FromHours(1),
SloWindow.OneDay => TimeSpan.FromDays(1),
SloWindow.SevenDays => TimeSpan.FromDays(7),
SloWindow.ThirtyDays => TimeSpan.FromDays(30),
_ => throw new InvalidOperationException($"Unknown window: {Window}")
};
private static void ValidateTarget(double target)
{
if (target <= 0 || target > 1)
throw new ArgumentOutOfRangeException(nameof(target), "Target must be between 0 (exclusive) and 1 (inclusive)");
}
}
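/// <summary>
/// Minimal usage sketch for the Slo record. Tenant, actor, and target values below are
/// illustrative placeholders, not project defaults.
/// </summary>
internal static class SloUsageSketch
{
    public static Slo Example()
    {
        var slo = Slo.CreateAvailability(
            tenantId: "tenant-acme",
            name: "advisory-ingest-availability",
            target: 0.999,                 // 99.9% of jobs must succeed in the window
            window: SloWindow.ThirtyDays,
            createdBy: "ops@example.org");
        // ErrorBudget is 1 - Target (0.001 here); GetWindowDuration() resolves to 30 days.
        _ = slo.ErrorBudget;
        _ = slo.GetWindowDuration();
        return slo;
    }
}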
/// <summary>
/// Current state of an SLO including burn rate and budget consumption.
/// </summary>
public sealed record SloState(
/// <summary>The SLO this state belongs to.</summary>
Guid SloId,
/// <summary>Tenant this state belongs to.</summary>
string TenantId,
/// <summary>Current SLI value (actual performance).</summary>
double CurrentSli,
/// <summary>Total events/requests in the window.</summary>
long TotalEvents,
/// <summary>Good events (successful) in the window.</summary>
long GoodEvents,
/// <summary>Bad events (failed) in the window.</summary>
long BadEvents,
/// <summary>Error budget consumed (0-1 where 1 = fully consumed).</summary>
double BudgetConsumed,
/// <summary>Error budget remaining (0-1 where 1 = fully available).</summary>
double BudgetRemaining,
/// <summary>Current burn rate (1.0 = consuming budget at sustainable rate).</summary>
double BurnRate,
/// <summary>Projected time until budget exhaustion (null if not burning).</summary>
TimeSpan? TimeToExhaustion,
/// <summary>Whether the SLO is currently met.</summary>
bool IsMet,
/// <summary>Current alert severity based on budget consumption.</summary>
AlertSeverity AlertSeverity,
/// <summary>When this state was computed.</summary>
DateTimeOffset ComputedAt,
/// <summary>Start of the evaluation window.</summary>
DateTimeOffset WindowStart,
/// <summary>End of the evaluation window.</summary>
DateTimeOffset WindowEnd)
{
/// <summary>Creates a state indicating no data is available.</summary>
public static SloState NoData(Guid sloId, string tenantId, DateTimeOffset now, SloWindow window)
{
var windowDuration = GetWindowDuration(window);
return new SloState(
SloId: sloId,
TenantId: tenantId,
CurrentSli: 1.0, // Assume good when no data
TotalEvents: 0,
GoodEvents: 0,
BadEvents: 0,
BudgetConsumed: 0,
BudgetRemaining: 1.0,
BurnRate: 0,
TimeToExhaustion: null,
IsMet: true,
AlertSeverity: AlertSeverity.Info,
ComputedAt: now,
WindowStart: now - windowDuration,
WindowEnd: now);
}
private static TimeSpan GetWindowDuration(SloWindow window) => window switch
{
SloWindow.OneHour => TimeSpan.FromHours(1),
SloWindow.OneDay => TimeSpan.FromDays(1),
SloWindow.SevenDays => TimeSpan.FromDays(7),
SloWindow.ThirtyDays => TimeSpan.FromDays(30),
_ => TimeSpan.FromDays(1)
};
}
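/// <summary>
/// Minimal sketch of how BudgetConsumed and BurnRate could be derived from raw counts.
/// The service that populates SloState is not shown here, so these formulas are assumptions
/// following the conventional SRE definition: observed error rate divided by the error budget.
/// </summary>
internal static class SloMathSketch
{
    public static (double BudgetConsumed, double BurnRate) Derive(long goodEvents, long badEvents, double sloTarget)
    {
        var total = goodEvents + badEvents;
        var errorBudget = 1.0 - sloTarget;
        if (total == 0 || errorBudget <= 0)
            return (0.0, 0.0); // no traffic, or a 100% target leaves no budget to burn
        var errorRate = (double)badEvents / total;
        var burnRate = errorRate / errorBudget;       // 1.0 = burning at exactly the sustainable pace
        var budgetConsumed = Math.Min(1.0, burnRate); // over a fully elapsed window, consumption tracks burn rate, capped at 1
        return (budgetConsumed, burnRate);
    }
}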
/// <summary>
/// Alert budget threshold configuration.
/// </summary>
public sealed record AlertBudgetThreshold(
/// <summary>Unique threshold identifier.</summary>
Guid ThresholdId,
/// <summary>SLO this threshold applies to.</summary>
Guid SloId,
/// <summary>Tenant this threshold belongs to.</summary>
string TenantId,
/// <summary>Budget consumed percentage that triggers this alert (0-1).</summary>
double BudgetConsumedThreshold,
/// <summary>Burn rate threshold that triggers this alert.</summary>
double? BurnRateThreshold,
/// <summary>Severity of the alert.</summary>
AlertSeverity Severity,
/// <summary>Whether this threshold is enabled.</summary>
bool Enabled,
/// <summary>Notification channel for this alert.</summary>
string? NotificationChannel,
/// <summary>Notification endpoint for this alert.</summary>
string? NotificationEndpoint,
/// <summary>Cooldown period between alerts.</summary>
TimeSpan Cooldown,
/// <summary>When an alert was last triggered.</summary>
DateTimeOffset? LastTriggeredAt,
/// <summary>When the threshold was created.</summary>
DateTimeOffset CreatedAt,
/// <summary>When the threshold was last updated.</summary>
DateTimeOffset UpdatedAt,
/// <summary>Actor who created the threshold.</summary>
string CreatedBy,
/// <summary>Actor who last modified the threshold.</summary>
string UpdatedBy)
{
/// <summary>Creates a new alert threshold.</summary>
public static AlertBudgetThreshold Create(
Guid sloId,
string tenantId,
double budgetConsumedThreshold,
AlertSeverity severity,
string createdBy,
double? burnRateThreshold = null,
string? notificationChannel = null,
string? notificationEndpoint = null,
TimeSpan? cooldown = null)
{
if (budgetConsumedThreshold < 0 || budgetConsumedThreshold > 1)
throw new ArgumentOutOfRangeException(nameof(budgetConsumedThreshold), "Threshold must be between 0 and 1");
var now = DateTimeOffset.UtcNow;
return new AlertBudgetThreshold(
ThresholdId: Guid.NewGuid(),
SloId: sloId,
TenantId: tenantId,
BudgetConsumedThreshold: budgetConsumedThreshold,
BurnRateThreshold: burnRateThreshold,
Severity: severity,
Enabled: true,
NotificationChannel: notificationChannel,
NotificationEndpoint: notificationEndpoint,
Cooldown: cooldown ?? TimeSpan.FromHours(1),
LastTriggeredAt: null,
CreatedAt: now,
UpdatedAt: now,
CreatedBy: createdBy,
UpdatedBy: createdBy);
}
/// <summary>Checks if this threshold should trigger based on current state.</summary>
public bool ShouldTrigger(SloState state, DateTimeOffset now)
{
if (!Enabled) return false;
// Check cooldown
if (LastTriggeredAt.HasValue && (now - LastTriggeredAt.Value) < Cooldown)
return false;
// Check budget consumed threshold
if (state.BudgetConsumed >= BudgetConsumedThreshold)
return true;
// Check burn rate threshold if set
if (BurnRateThreshold.HasValue && state.BurnRate >= BurnRateThreshold.Value)
return true;
return false;
}
/// <summary>Records that this threshold was triggered.</summary>
public AlertBudgetThreshold RecordTrigger(DateTimeOffset now) =>
this with
{
LastTriggeredAt = now,
UpdatedAt = now
};
}
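/// <summary>
/// Minimal threshold-evaluation sketch. The SloState is fabricated with a "with" expression
/// purely to drive ShouldTrigger; identifiers and the 30-minute cooldown are placeholder values.
/// </summary>
internal static class ThresholdUsageSketch
{
    public static AlertBudgetThreshold Example(Guid sloId)
    {
        var now = DateTimeOffset.UtcNow;
        var threshold = AlertBudgetThreshold.Create(
            sloId: sloId,
            tenantId: "tenant-acme",
            budgetConsumedThreshold: 0.75,
            severity: AlertSeverity.Warning,
            createdBy: "ops@example.org",
            cooldown: TimeSpan.FromMinutes(30));
        var state = SloState.NoData(sloId, "tenant-acme", now, SloWindow.OneDay)
            with { BudgetConsumed = 0.80, BurnRate = 2.0 };
        if (threshold.ShouldTrigger(state, now))
        {
            // Recording the trigger starts the cooldown so duplicate alerts are suppressed.
            threshold = threshold.RecordTrigger(now);
        }
        return threshold;
    }
}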
/// <summary>
/// SLO alert event.
/// </summary>
public sealed record SloAlert(
/// <summary>Unique alert identifier.</summary>
Guid AlertId,
/// <summary>SLO this alert relates to.</summary>
Guid SloId,
/// <summary>Threshold that triggered this alert.</summary>
Guid ThresholdId,
/// <summary>Tenant this alert belongs to.</summary>
string TenantId,
/// <summary>Severity of the alert.</summary>
AlertSeverity Severity,
/// <summary>Alert message.</summary>
string Message,
/// <summary>Budget consumed at time of alert.</summary>
double BudgetConsumed,
/// <summary>Burn rate at time of alert.</summary>
double BurnRate,
/// <summary>Current SLI value at time of alert.</summary>
double CurrentSli,
/// <summary>When the alert was triggered.</summary>
DateTimeOffset TriggeredAt,
/// <summary>When the alert was acknowledged (null if not acknowledged).</summary>
DateTimeOffset? AcknowledgedAt,
/// <summary>Who acknowledged the alert.</summary>
string? AcknowledgedBy,
/// <summary>When the alert was resolved (null if not resolved).</summary>
DateTimeOffset? ResolvedAt,
/// <summary>How the alert was resolved.</summary>
string? ResolutionNotes)
{
/// <summary>Creates a new alert from an SLO state and threshold.</summary>
public static SloAlert Create(
Slo slo,
SloState state,
AlertBudgetThreshold threshold)
{
var message = threshold.BurnRateThreshold.HasValue && state.BurnRate >= threshold.BurnRateThreshold.Value
? $"SLO '{slo.Name}' burn rate {state.BurnRate:F2}x exceeds threshold {threshold.BurnRateThreshold.Value:F2}x"
: $"SLO '{slo.Name}' error budget {state.BudgetConsumed:P1} consumed exceeds threshold {threshold.BudgetConsumedThreshold:P1}";
return new SloAlert(
AlertId: Guid.NewGuid(),
SloId: slo.SloId,
ThresholdId: threshold.ThresholdId,
TenantId: slo.TenantId,
Severity: threshold.Severity,
Message: message,
BudgetConsumed: state.BudgetConsumed,
BurnRate: state.BurnRate,
CurrentSli: state.CurrentSli,
TriggeredAt: state.ComputedAt,
AcknowledgedAt: null,
AcknowledgedBy: null,
ResolvedAt: null,
ResolutionNotes: null);
}
/// <summary>Acknowledges the alert.</summary>
public SloAlert Acknowledge(string acknowledgedBy, DateTimeOffset now) =>
this with
{
AcknowledgedAt = now,
AcknowledgedBy = acknowledgedBy
};
/// <summary>Resolves the alert.</summary>
public SloAlert Resolve(string notes, DateTimeOffset now) =>
this with
{
ResolvedAt = now,
ResolutionNotes = notes
};
/// <summary>Whether this alert has been acknowledged.</summary>
public bool IsAcknowledged => AcknowledgedAt.HasValue;
/// <summary>Whether this alert has been resolved.</summary>
public bool IsResolved => ResolvedAt.HasValue;
}
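/// <summary>
/// Minimal alert lifecycle sketch; the actor name and resolution note are placeholders.
/// </summary>
internal static class AlertLifecycleSketch
{
    public static SloAlert Example(Slo slo, SloState state, AlertBudgetThreshold threshold)
    {
        var alert = SloAlert.Create(slo, state, threshold);
        alert = alert.Acknowledge("oncall@example.org", DateTimeOffset.UtcNow);
        return alert.Resolve("Upstream recovered; error rate back under budget.", DateTimeOffset.UtcNow);
    }
}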


@@ -0,0 +1,42 @@
namespace StellaOps.Orchestrator.Core.Domain;
/// <summary>
/// Represents a job source (producer) that submits jobs to the orchestrator.
/// Examples: Concelier, Excititor, Scheduler, Export Center, Policy Engine.
/// </summary>
public sealed record Source(
/// <summary>Unique source identifier.</summary>
Guid SourceId,
/// <summary>Tenant owning this source.</summary>
string TenantId,
/// <summary>Human-readable source name (e.g., "concelier-nvd").</summary>
string Name,
/// <summary>Source type/category (e.g., "advisory-ingest", "scanner", "export").</summary>
string SourceType,
/// <summary>Whether the source is currently enabled.</summary>
bool Enabled,
/// <summary>Whether the source is paused (throttled by operator).</summary>
bool Paused,
/// <summary>Operator-provided reason for pause (if paused).</summary>
string? PauseReason,
/// <summary>Ticket reference for pause audit trail.</summary>
string? PauseTicket,
/// <summary>Optional configuration JSON blob.</summary>
string? Configuration,
/// <summary>When the source was created.</summary>
DateTimeOffset CreatedAt,
/// <summary>When the source was last updated.</summary>
DateTimeOffset UpdatedAt,
/// <summary>Actor who last modified the source.</summary>
string UpdatedBy);


@@ -0,0 +1,60 @@
namespace StellaOps.Orchestrator.Core.Domain;
/// <summary>
/// Represents a dynamic rate-limit override (throttle) for a source or job type.
/// Throttles are temporary pause/slow-down mechanisms, often in response to upstream pressure.
/// </summary>
public sealed record Throttle(
/// <summary>Unique throttle identifier.</summary>
Guid ThrottleId,
/// <summary>Tenant this throttle applies to.</summary>
string TenantId,
/// <summary>Source to throttle (null if job-type scoped).</summary>
Guid? SourceId,
/// <summary>Job type to throttle (null if source-scoped).</summary>
string? JobType,
/// <summary>Whether this throttle is currently active.</summary>
bool Active,
/// <summary>Reason for the throttle (e.g., "429 from upstream", "Manual pause").</summary>
string Reason,
/// <summary>Optional ticket reference for audit.</summary>
string? Ticket,
/// <summary>When the throttle was created.</summary>
DateTimeOffset CreatedAt,
/// <summary>When the throttle expires (null = indefinite).</summary>
DateTimeOffset? ExpiresAt,
/// <summary>Actor who created the throttle.</summary>
string CreatedBy);
/// <summary>
/// Reason categories for throttle creation.
/// </summary>
public static class ThrottleReasons
{
/// <summary>Upstream returned 429 Too Many Requests.</summary>
public const string UpstreamRateLimited = "upstream_429";
/// <summary>Upstream returned 503 Service Unavailable.</summary>
public const string UpstreamUnavailable = "upstream_503";
    /// <summary>Upstream repeatedly returned 5xx errors.</summary>
public const string UpstreamErrors = "upstream_5xx";
/// <summary>Manual operator intervention.</summary>
public const string ManualPause = "manual_pause";
/// <summary>Circuit breaker triggered.</summary>
public const string CircuitBreaker = "circuit_breaker";
/// <summary>Quota exhausted.</summary>
public const string QuotaExhausted = "quota_exhausted";
}
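/// <summary>
/// Minimal sketch of raising a temporary throttle after an upstream 429. The tenant, ticket,
/// and 15-minute expiry are illustrative assumptions, not project defaults.
/// </summary>
internal static class ThrottleUsageSketch
{
    public static Throttle Example(Guid sourceId)
    {
        var now = DateTimeOffset.UtcNow;
        return new Throttle(
            ThrottleId: Guid.NewGuid(),
            TenantId: "tenant-acme",
            SourceId: sourceId,
            JobType: null,
            Active: true,
            Reason: ThrottleReasons.UpstreamRateLimited,
            Ticket: "OPS-1234",
            CreatedAt: now,
            ExpiresAt: now.AddMinutes(15), // expires automatically; null would mean indefinite
            CreatedBy: "orchestrator");
    }
}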


@@ -0,0 +1,162 @@
namespace StellaOps.Orchestrator.Core.Domain;
/// <summary>
/// Represents an event-time watermark for tracking processing progress.
/// Watermarks are scoped by source, by job type, or by a source + job type combination (normalized into ScopeKey).
/// </summary>
public sealed record Watermark(
/// <summary>Unique watermark identifier.</summary>
Guid WatermarkId,
/// <summary>Tenant this watermark belongs to.</summary>
string TenantId,
/// <summary>Source this watermark tracks (null if job-type scoped).</summary>
Guid? SourceId,
/// <summary>Job type this watermark tracks (null if source-scoped).</summary>
string? JobType,
/// <summary>Normalized scope key for uniqueness.</summary>
string ScopeKey,
/// <summary>Latest processed event time (high watermark).</summary>
DateTimeOffset HighWatermark,
/// <summary>Earliest event time in current window (low watermark for windowing).</summary>
DateTimeOffset? LowWatermark,
/// <summary>Monotonic sequence number for ordering.</summary>
long SequenceNumber,
/// <summary>Total events processed through this watermark.</summary>
long ProcessedCount,
/// <summary>SHA-256 hash of last processed batch for integrity verification.</summary>
string? LastBatchHash,
/// <summary>When the watermark was created.</summary>
DateTimeOffset CreatedAt,
/// <summary>When the watermark was last updated.</summary>
DateTimeOffset UpdatedAt,
/// <summary>Actor who last modified the watermark.</summary>
string UpdatedBy)
{
/// <summary>
/// Creates a scope key for source-scoped watermarks.
/// </summary>
public static string CreateScopeKey(Guid sourceId) =>
$"source:{sourceId:N}";
/// <summary>
/// Creates a scope key for job-type-scoped watermarks.
/// </summary>
public static string CreateScopeKey(string jobType) =>
$"job_type:{jobType.ToLowerInvariant()}";
/// <summary>
/// Creates a scope key for source+job-type scoped watermarks.
/// </summary>
public static string CreateScopeKey(Guid sourceId, string jobType) =>
$"source:{sourceId:N}:job_type:{jobType.ToLowerInvariant()}";
/// <summary>
/// Creates a new watermark with initial values.
/// </summary>
public static Watermark Create(
string tenantId,
Guid? sourceId,
string? jobType,
DateTimeOffset highWatermark,
string createdBy)
{
var scopeKey = (sourceId, jobType) switch
{
(Guid s, string j) when !string.IsNullOrEmpty(j) => CreateScopeKey(s, j),
(Guid s, _) => CreateScopeKey(s),
(_, string j) when !string.IsNullOrEmpty(j) => CreateScopeKey(j),
_ => throw new ArgumentException("Either sourceId or jobType must be specified.")
};
var now = DateTimeOffset.UtcNow;
return new Watermark(
WatermarkId: Guid.NewGuid(),
TenantId: tenantId,
SourceId: sourceId,
JobType: jobType,
ScopeKey: scopeKey,
HighWatermark: highWatermark,
LowWatermark: null,
SequenceNumber: 0,
ProcessedCount: 0,
LastBatchHash: null,
CreatedAt: now,
UpdatedAt: now,
UpdatedBy: createdBy);
}
/// <summary>
/// Advances the watermark after successful batch processing.
/// </summary>
public Watermark Advance(
DateTimeOffset newHighWatermark,
long eventsProcessed,
string? batchHash,
string updatedBy)
{
if (newHighWatermark < HighWatermark)
throw new ArgumentException("New high watermark cannot be before current high watermark.", nameof(newHighWatermark));
return this with
{
HighWatermark = newHighWatermark,
SequenceNumber = SequenceNumber + 1,
ProcessedCount = ProcessedCount + eventsProcessed,
LastBatchHash = batchHash,
UpdatedAt = DateTimeOffset.UtcNow,
UpdatedBy = updatedBy
};
}
/// <summary>
/// Sets the event-time window bounds.
/// </summary>
public Watermark WithWindow(DateTimeOffset lowWatermark, DateTimeOffset highWatermark)
{
if (highWatermark < lowWatermark)
throw new ArgumentException("High watermark cannot be before low watermark.");
return this with
{
LowWatermark = lowWatermark,
HighWatermark = highWatermark,
UpdatedAt = DateTimeOffset.UtcNow
};
}
}
/// <summary>
/// Snapshot of watermark state for observability.
/// </summary>
public sealed record WatermarkSnapshot(
string ScopeKey,
DateTimeOffset HighWatermark,
DateTimeOffset? LowWatermark,
long SequenceNumber,
long ProcessedCount,
TimeSpan? Lag)
{
/// <summary>
/// Creates a snapshot from a watermark with calculated lag.
/// </summary>
public static WatermarkSnapshot FromWatermark(Watermark watermark, DateTimeOffset now) =>
new(
ScopeKey: watermark.ScopeKey,
HighWatermark: watermark.HighWatermark,
LowWatermark: watermark.LowWatermark,
SequenceNumber: watermark.SequenceNumber,
ProcessedCount: watermark.ProcessedCount,
Lag: now - watermark.HighWatermark);
}
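/// <summary>
/// Minimal sketch of advancing a source-scoped watermark after a successful batch.
/// Tenant, actor, batch size, and the truncated hash are illustrative placeholders.
/// </summary>
internal static class WatermarkUsageSketch
{
    public static WatermarkSnapshot Example(Guid sourceId)
    {
        var watermark = Watermark.Create(
            tenantId: "tenant-acme",
            sourceId: sourceId,
            jobType: null,
            highWatermark: DateTimeOffset.UtcNow.AddHours(-1),
            createdBy: "concelier-ingest");
        // A batch of 250 events was processed up to "now"; advance and record the batch hash.
        var now = DateTimeOffset.UtcNow;
        watermark = watermark.Advance(
            newHighWatermark: now,
            eventsProcessed: 250,
            batchHash: "9f86d081884c7d65...", // truncated SHA-256, illustrative only
            updatedBy: "concelier-ingest");
        // Lag in the snapshot is measured against wall-clock time for observability dashboards.
        return WatermarkSnapshot.FromWatermark(watermark, DateTimeOffset.UtcNow);
    }
}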


@@ -0,0 +1,450 @@
using StellaOps.Orchestrator.Core.Domain;
namespace StellaOps.Orchestrator.Core.RateLimiting;
/// <summary>
/// Adaptive rate limiter that combines token bucket, concurrency limiting, and backpressure handling.
/// Provides per-tenant/job-type rate limiting with automatic adaptation to upstream pressure.
/// </summary>
public sealed class AdaptiveRateLimiter
{
private readonly TokenBucket _tokenBucket;
private readonly ConcurrencyLimiter _concurrencyLimiter;
private readonly BackpressureHandler _backpressureHandler;
private readonly HourlyCounter _hourlyCounter;
private readonly object _lock = new();
/// <summary>
/// Tenant ID this limiter applies to.
/// </summary>
public string TenantId { get; }
/// <summary>
/// Job type this limiter applies to (null = all types).
/// </summary>
public string? JobType { get; }
/// <summary>
/// Maximum jobs per hour.
/// </summary>
public int MaxPerHour { get; }
/// <summary>
/// Whether the limiter is paused by operator.
/// </summary>
public bool IsPaused { get; private set; }
/// <summary>
/// Reason for pause (if paused).
/// </summary>
public string? PauseReason { get; private set; }
/// <summary>
/// Creates a new adaptive rate limiter from quota configuration.
/// </summary>
public AdaptiveRateLimiter(Quota quota, TimeProvider? timeProvider = null)
{
ArgumentNullException.ThrowIfNull(quota);
TenantId = quota.TenantId;
JobType = quota.JobType;
MaxPerHour = quota.MaxPerHour;
IsPaused = quota.Paused;
PauseReason = quota.PauseReason;
_tokenBucket = new TokenBucket(
quota.BurstCapacity,
quota.RefillRate,
quota.CurrentTokens,
quota.LastRefillAt);
_concurrencyLimiter = new ConcurrencyLimiter(
quota.MaxActive,
quota.CurrentActive);
_backpressureHandler = new BackpressureHandler(
baseDelay: TimeSpan.FromSeconds(1),
maxDelay: TimeSpan.FromMinutes(5),
failureThreshold: 3,
jitterFactor: 0.2);
_hourlyCounter = new HourlyCounter(
quota.MaxPerHour,
quota.CurrentHourCount,
quota.CurrentHourStart);
}
/// <summary>
/// Creates a new adaptive rate limiter with explicit configuration.
/// </summary>
public AdaptiveRateLimiter(
string tenantId,
string? jobType,
int maxActive,
int maxPerHour,
int burstCapacity,
double refillRate)
{
TenantId = tenantId ?? throw new ArgumentNullException(nameof(tenantId));
JobType = jobType;
MaxPerHour = maxPerHour;
_tokenBucket = new TokenBucket(burstCapacity, refillRate);
_concurrencyLimiter = new ConcurrencyLimiter(maxActive);
_backpressureHandler = new BackpressureHandler();
_hourlyCounter = new HourlyCounter(maxPerHour);
}
/// <summary>
/// Attempts to acquire permission to execute a job.
/// </summary>
/// <param name="now">Current time.</param>
/// <returns>Result indicating whether acquisition was successful and why.</returns>
public RateLimitResult TryAcquire(DateTimeOffset now)
{
lock (_lock)
{
// Check if paused
if (IsPaused)
{
return RateLimitResult.Denied(RateLimitDenialReason.Paused, PauseReason);
}
// Check backpressure
if (!_backpressureHandler.ShouldAllow(now))
{
var snapshot = _backpressureHandler.GetSnapshot(now);
return RateLimitResult.Denied(
RateLimitDenialReason.Backpressure,
snapshot.LastFailureReason,
retryAfter: snapshot.TimeRemaining);
}
// Check hourly limit
if (!_hourlyCounter.TryIncrement(now))
{
var hourlySnapshot = _hourlyCounter.GetSnapshot(now);
return RateLimitResult.Denied(
RateLimitDenialReason.HourlyLimitExceeded,
$"Hourly limit of {MaxPerHour} exceeded",
retryAfter: hourlySnapshot.TimeUntilReset);
}
// Check concurrency
if (!_concurrencyLimiter.TryAcquire())
{
// Rollback hourly counter
_hourlyCounter.Decrement();
var concurrencySnapshot = _concurrencyLimiter.GetSnapshot();
return RateLimitResult.Denied(
RateLimitDenialReason.ConcurrencyLimitExceeded,
$"Concurrency limit of {concurrencySnapshot.MaxActive} exceeded");
}
// Check token bucket
if (!_tokenBucket.TryConsume(now))
{
// Rollback concurrency and hourly counter
_concurrencyLimiter.Release();
_hourlyCounter.Decrement();
var waitTime = _tokenBucket.EstimatedWaitTime(now);
return RateLimitResult.Denied(
RateLimitDenialReason.TokensExhausted,
"Token bucket exhausted",
retryAfter: waitTime);
}
return RateLimitResult.Allowed();
}
}
/// <summary>
/// Releases a concurrency slot when a job completes.
/// </summary>
public void Release()
{
lock (_lock)
{
_concurrencyLimiter.Release();
}
}
/// <summary>
/// Records an upstream failure for backpressure calculation.
/// </summary>
/// <param name="statusCode">HTTP status code from upstream.</param>
/// <param name="retryAfter">Optional Retry-After header value.</param>
/// <param name="now">Current time.</param>
/// <returns>Backpressure result.</returns>
public BackpressureResult RecordUpstreamFailure(int statusCode, TimeSpan? retryAfter = null, DateTimeOffset? now = null)
{
lock (_lock)
{
return _backpressureHandler.RecordFailure(statusCode, retryAfter, now);
}
}
/// <summary>
/// Records a successful upstream request.
/// </summary>
public void RecordUpstreamSuccess()
{
lock (_lock)
{
_backpressureHandler.RecordSuccess();
}
}
/// <summary>
/// Pauses the limiter.
/// </summary>
/// <param name="reason">Reason for pause.</param>
public void Pause(string reason)
{
lock (_lock)
{
IsPaused = true;
PauseReason = reason;
}
}
/// <summary>
/// Resumes the limiter.
/// </summary>
public void Resume()
{
lock (_lock)
{
IsPaused = false;
PauseReason = null;
}
}
/// <summary>
/// Gets a snapshot of the current limiter state.
/// </summary>
/// <param name="now">Current time.</param>
/// <returns>Snapshot of limiter state.</returns>
public AdaptiveRateLimiterSnapshot GetSnapshot(DateTimeOffset now)
{
lock (_lock)
{
return new AdaptiveRateLimiterSnapshot(
TenantId: TenantId,
JobType: JobType,
IsPaused: IsPaused,
PauseReason: PauseReason,
TokenBucket: _tokenBucket.GetSnapshot(now),
Concurrency: _concurrencyLimiter.GetSnapshot(),
Backpressure: _backpressureHandler.GetSnapshot(now),
HourlyCounter: _hourlyCounter.GetSnapshot(now));
}
}
/// <summary>
/// Exports the current state to a quota record for persistence.
/// </summary>
/// <param name="quotaId">Original quota ID.</param>
/// <param name="now">Current time.</param>
/// <param name="updatedBy">Actor performing the update.</param>
/// <returns>Quota record with current state.</returns>
public Quota ExportToQuota(Guid quotaId, DateTimeOffset now, string updatedBy)
{
lock (_lock)
{
var tokenSnapshot = _tokenBucket.GetSnapshot(now);
var concurrencySnapshot = _concurrencyLimiter.GetSnapshot();
var hourlySnapshot = _hourlyCounter.GetSnapshot(now);
return new Quota(
QuotaId: quotaId,
TenantId: TenantId,
JobType: JobType,
MaxActive: concurrencySnapshot.MaxActive,
MaxPerHour: MaxPerHour,
BurstCapacity: tokenSnapshot.BurstCapacity,
RefillRate: tokenSnapshot.RefillRate,
CurrentTokens: tokenSnapshot.CurrentTokens,
LastRefillAt: tokenSnapshot.LastRefillAt,
CurrentActive: concurrencySnapshot.CurrentActive,
CurrentHourCount: hourlySnapshot.CurrentCount,
CurrentHourStart: hourlySnapshot.HourStart,
Paused: IsPaused,
PauseReason: PauseReason,
QuotaTicket: null,
                CreatedAt: now, // Note: callers persisting this export should carry over the original quota's CreatedAt.
UpdatedAt: now,
UpdatedBy: updatedBy);
}
}
}
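/// <summary>
/// Minimal acquisition-loop sketch for the adaptive rate limiter. The limits and refill rate
/// below are illustrative; the refill rate is assumed to be tokens per second.
/// </summary>
internal static class RateLimiterUsageSketch
{
    public static void Example()
    {
        var limiter = new AdaptiveRateLimiter(
            tenantId: "tenant-acme",
            jobType: "advisory-ingest",
            maxActive: 4,
            maxPerHour: 120,
            burstCapacity: 10,
            refillRate: 0.5);
        var decision = limiter.TryAcquire(DateTimeOffset.UtcNow);
        if (!decision.IsAllowed)
        {
            // Callers are expected to requeue the job and honour RetryAfter when provided.
            return;
        }
        try
        {
            // ... run the job; if the upstream answers 429, feed it back so backpressure kicks in:
            limiter.RecordUpstreamFailure(statusCode: 429, retryAfter: TimeSpan.FromSeconds(30));
        }
        finally
        {
            limiter.Release(); // always free the concurrency slot acquired by TryAcquire
        }
    }
}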
/// <summary>
/// Result of a rate limit acquisition attempt.
/// </summary>
public sealed record RateLimitResult(
bool IsAllowed,
RateLimitDenialReason? DenialReason,
string? DenialMessage,
TimeSpan? RetryAfter)
{
/// <summary>
/// Creates an allowed result.
/// </summary>
public static RateLimitResult Allowed() => new(true, null, null, null);
/// <summary>
/// Creates a denied result.
/// </summary>
public static RateLimitResult Denied(
RateLimitDenialReason reason,
string? message = null,
TimeSpan? retryAfter = null) =>
new(false, reason, message, retryAfter);
}
/// <summary>
/// Reasons for rate limit denial.
/// </summary>
public enum RateLimitDenialReason
{
/// <summary>Limiter is paused by operator.</summary>
Paused,
/// <summary>In backpressure backoff period.</summary>
Backpressure,
/// <summary>Hourly request limit exceeded.</summary>
HourlyLimitExceeded,
/// <summary>Concurrency limit exceeded.</summary>
ConcurrencyLimitExceeded,
/// <summary>Token bucket exhausted.</summary>
TokensExhausted
}
/// <summary>
/// Snapshot of adaptive rate limiter state.
/// </summary>
public sealed record AdaptiveRateLimiterSnapshot(
string TenantId,
string? JobType,
bool IsPaused,
string? PauseReason,
TokenBucketSnapshot TokenBucket,
ConcurrencySnapshot Concurrency,
BackpressureSnapshot Backpressure,
HourlyCounterSnapshot HourlyCounter);
/// <summary>
/// Tracks requests per hour with automatic reset.
/// </summary>
public sealed class HourlyCounter
{
private readonly object _lock = new();
private int _currentCount;
private DateTimeOffset _hourStart;
/// <summary>
/// Maximum allowed requests per hour.
/// </summary>
public int MaxPerHour { get; }
/// <summary>
/// Creates a new hourly counter.
/// </summary>
public HourlyCounter(int maxPerHour, int currentCount = 0, DateTimeOffset? hourStart = null)
{
if (maxPerHour <= 0)
throw new ArgumentOutOfRangeException(nameof(maxPerHour), "Max per hour must be positive.");
MaxPerHour = maxPerHour;
_currentCount = currentCount;
_hourStart = hourStart ?? TruncateToHour(DateTimeOffset.UtcNow);
}
/// <summary>
/// Attempts to increment the counter.
/// </summary>
/// <param name="now">Current time.</param>
/// <returns>True if increment was allowed, false if limit reached.</returns>
public bool TryIncrement(DateTimeOffset now)
{
lock (_lock)
{
MaybeResetHour(now);
if (_currentCount < MaxPerHour)
{
_currentCount++;
return true;
}
return false;
}
}
/// <summary>
/// Decrements the counter (for rollback).
/// </summary>
public void Decrement()
{
lock (_lock)
{
if (_currentCount > 0)
_currentCount--;
}
}
/// <summary>
/// Gets a snapshot of the counter state.
/// </summary>
public HourlyCounterSnapshot GetSnapshot(DateTimeOffset now)
{
lock (_lock)
{
MaybeResetHour(now);
var nextHour = _hourStart.AddHours(1);
var timeUntilReset = nextHour - now;
return new HourlyCounterSnapshot(
MaxPerHour: MaxPerHour,
CurrentCount: _currentCount,
HourStart: _hourStart,
TimeUntilReset: timeUntilReset > TimeSpan.Zero ? timeUntilReset : TimeSpan.Zero);
}
}
private void MaybeResetHour(DateTimeOffset now)
{
var currentHour = TruncateToHour(now);
if (currentHour > _hourStart)
{
_hourStart = currentHour;
_currentCount = 0;
}
}
private static DateTimeOffset TruncateToHour(DateTimeOffset dt) =>
new(dt.Year, dt.Month, dt.Day, dt.Hour, 0, 0, dt.Offset);
}
/// <summary>
/// Snapshot of hourly counter state.
/// </summary>
public sealed record HourlyCounterSnapshot(
int MaxPerHour,
int CurrentCount,
DateTimeOffset HourStart,
TimeSpan TimeUntilReset)
{
/// <summary>
/// Remaining requests in current hour.
/// </summary>
public int Remaining => Math.Max(0, MaxPerHour - CurrentCount);
/// <summary>
/// Whether the hourly limit has been reached.
/// </summary>
public bool IsExhausted => CurrentCount >= MaxPerHour;
}
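/// <summary>
/// Minimal sketch of the hourly window reset behaviour, using a hypothetical limit of 3 per hour.
/// </summary>
internal static class HourlyCounterSketch
{
    public static void Example()
    {
        var counter = new HourlyCounter(maxPerHour: 3);
        var now = DateTimeOffset.UtcNow;
        counter.TryIncrement(now);                    // true
        counter.TryIncrement(now);                    // true
        counter.TryIncrement(now);                    // true
        _ = counter.TryIncrement(now);                // false - hourly limit reached
        _ = counter.TryIncrement(now.AddHours(1));    // true - the window reset automatically
        var snapshot = counter.GetSnapshot(now.AddHours(1));
        _ = snapshot.Remaining;                       // 2 remaining in the new hour
    }
}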
